First published online June 17, 2005; 10.1104/pp.105.060541
Plant Physiology 138:1457-1468 (2005)
© 2005 American Society of Plant Biologists

Compilation of mRNA Polyadenylation Signals in Arabidopsis Revealed a New Signal Element and Potential Secondary Structures 1,[w]

Department of Botany, Miami University, Oxford, Ohio 45056 (J.C.L., P.C.W., Q.Q.L.); Ohio Supercomputer Center, Columbus, Ohio 43212 (E.A.S.); Cray, Inc., Brighton, Michigan 48116 (D.G.S.); and The Institute of Genomic Research, Rockville, Maryland 20850 (B.J.H.)
Using a novel program, SignalSleuth, and a database containing authenticated polyadenylation [poly(A)] sites, we analyzed the composition of mRNA poly(A) signals in Arabidopsis (Arabidopsis thaliana) and reevaluated previously described cis-elements within the 3'-untranslated regions (3'-UTRs), including near upstream elements and far upstream elements. As predicted, high-consensus signal patterns are absent. The AAUAAA signal topped the list of near upstream element patterns but was found within the predicted location in only approximately 10% of 3'-UTRs. More importantly, we identified a new set of poly(A) signals, named cleavage elements, flanking both sides of the cleavage site. These cis-elements were not previously revealed by conventional mutagenesis and are proposed to act as a cluster of signals for cleavage site recognition. Moreover, a single-nucleotide profile scan of the 3'-UTRs unveiled a distinct arrangement of alternating stretches of U and A nucleotides, which led to a prediction of the formation of secondary structures. Using an RNA secondary structure prediction program, mFold, we identified three main types of secondary structures in the sequences analyzed. Surprisingly, these secondary structures were all disrupted by previously constructed mutations in these regions. These results will enable us to revise the current model of plant poly(A) signals and to develop tools to predict 3'-ends for gene annotation.
Messenger RNA polyadenylation is a crucial step during the maturation of most eukaryotic mRNA, in which a polyadenine [poly(A)] tract is added to the cleaved 3'-end of a precursor mRNA (pre-mRNA) posttranscriptionally. Such a modification of mRNA has been shown to affect its stability, translatability, and nuclear-to-cytoplasmic export (Zhao et al., 1999
The polyadenylation process requires two major components: the cis-elements or poly(A) signals of the pre-mRNA, and the trans-acting factors that carry out the cleavage and addition of the poly(A) tail at the 3'-end. These trans-acting factors are a complex of about 25 to 30 proteins involved in signal recognition, cleavage, and polyadenylation (Proudfoot, 2004
Previous understanding of these signal elements was derived mostly through conventional genetic and some biochemical analyses, which are both tedious and time consuming to perform. The availability of genomic, full-length cDNA and expressed sequence tag (EST) sequences through large-scale genome-sequencing projects makes it possible to search for poly(A) signals using bioinformatics tools (Graber et al., 1999b
Conventional genetic mutagenesis studies revealed that plant poly(A) signals are composed of three major groups: far upstream elements (FUE), near upstream elements (NUE; an AAUAAA-like element), and cleavage sites (CSs; Rothnie, 1996
The full scope of the prominent patterns of plant poly(A) signals has not been revealed previously, and this has been an obstacle toward using such information for gene annotation and for better understanding of how the plant polyadenylation machinery operates. The major principles of gene annotations are based on the identification of functional RNA and coding sequences (Arabidopsis Genome Initiative, 2000
With the advancement of genomic research and availability of large numbers of plant ESTs, particularly of the model species Arabidopsis (Arabidopsis thaliana), we will be able to collect large-scale datasets for genome-wide poly(A) signal analysis. In this article, we report on efforts to characterize regions of significance in which poly(A) signals reside. Our database consists of two datasets of 3'-UTR sequences covering about 17,000 independent genes, one with 8,160 ESTs with authenticated poly(A) sites, the other with 16,211 full-length cDNA downloaded from The Arabidopsis Information Resource (TAIR). Both datasets were searched independently with supercomputers to probe for the signal pattern locations based on a working model built with conventional genetic analyses of plant poly(A) signals (Hunt and Messing, 1998
The NUE
To compile plant poly(A) signals using a computer program, it is necessary to generate a numeric model or location of the cis-elements that are sought. To this end, we constructed a working model based on the characterized plant poly(A) signals by conventional genetic or biochemical approaches on a few genes, including the cauliflower mosaic virus (CaMV) 35S transcript, the pea (Pisum sativum) small subunit of Rubisco (rbcS), the Agrobacterium T-DNA ocs gene, and the maize (Zea mays) 27-kD protein gene (Rothnie, 1996
We started with the NUE because it is somewhat better understood from the literature and is expected to be more conserved than the FUE. The NUE is defined as a signal element located between 13 and 30 nt upstream of the CS (position -1 anchored at the last nucleotide of the 3'-end of each cDNA sequence; thus the upstream sequence and the downstream sequence will have a "-" or "+" designation, respectively; Rothnie, 1996
A few striking features were found when searching the NUE region of the sequences, namely, that a few patterns had a much larger deviation than the rest of the top 50 patterns. The pattern AAUAAA came out at the top of the list in this region, followed by other less dominant ones. However, in this region, AAUAAA can only account for about 10% of the signals. Second, AAUAAA and related patterns are located at the expected position of the working model at about 13 to 30 nt upstream from the CS (Rothnie, 1996
To determine whether there is a new signal element around the CS, another dataset (UTR + downstream) was created to include 100 nt of genomic sequence downstream of the CS for each of the sequences in the 3'-UTR 8-K dataset. When the region of the sequences (-15 to +20) was scanned for predominant patterns, the full peaks were seen collectively, with the CS in the center (Fig. 1B). Due to the nature of the SignalSleuth program, the patterns are counted from the position of the rightmost nucleotide (from 3' to 5'). Thus, considering the 6-nt size patterns, most of the patterns with the highest counts are across the CS. Moreover, the region -10 to +15 is highly saturated with such patterns (the top 1,000 lists for 3- to 11-nt scan results are available in Supplemental Table I). It is clear that this region of the RNA contains a signal element that was not previously documented in plants. The new signal element is termed cleavage element (CE). The SignalSleuth program also has the capability to generate an output in the form of a 2-D image, as explained in "Materials and Methods," where the width represents the full length of the 400-nt sequence, and each red pixel represents one of the top 50 patterns. Thus, the locations of the top 50 patterns on the sequences can be marked on the image. As shown in Figure 1, C and D, these 2-D images clearly demonstrate the existence of the top signal patterns located within the NUE or CE regions (vertical bands) in the 3'-UTR dataset, while no significant signals can be seen in the random DNA sequence dataset (Fig. 1E). Note that these top 50 patterns were reranked in narrower regions manually corresponding only to the NUE (-10 to -30) or the CE (-10 to +10), respectively. This stipulation ensures visual evidence that the top patterns on the NUE or CE lists are representative of each element rather than a sum of the two elements.
It seems that the CE patterns are located mainly in two regions, one at the CS, the other a few nucleotides apart at the right of the CS.
From mutagenesis analysis of the FUE region, it has been defined that this is a region with low conservation for cis-acting element patterns. Molecular evidence suggested that the FUE region should span about 60 to 100 nt with combinations of motifs from 6 to 18 nt in length (Sanfacon et al., 1991
The NUE, CE, and FUE compilations were also done with the 16-K dataset. Similar results, including rankings of the signals, were found (see Supplemental Table II).
The NUE and CE patterns seemed to be notably rich in A and U nucleotides. This prompted us to analyze the nucleotide composition profile of the 3'-UTR sequences in the databases. A full-scan sweep of the 8-K dataset from the -250 to +100 nt positions revealed intriguing findings (Fig. 3A). First, the distributions of A and U are clearly distinct, where the ups and downs of the curves complement each other. This is true between -200 and +60 nt, covering a 260-nt region. The only exception is at the CS, where C seemed to have a spike (for the previously known YA dinucleotide). Second, distinct A and U profiles are also seen in different signal elements. The FUE region has a high U content, while the NUE region has a high A content, with a clear transition between them. We noticed that the CE region has a complex, but clear, nucleotide composition with alternating A- and U-rich submotifs. Third, the location of the elements can be clearly identified as previously proposed, e.g. the NUE is located at about 20 nt upstream of the CS and the FUE at about -25 to -160, covering a region of over 100 nt. The occurrence of the NUE and the CE is consistent with the alignment shown in the images in Figure 1, C and D. Finally, the CS -1 position can be recognized by an A in about 73% of the sequences, with a distinct U or C at the -2 position in a total of 80% of the sequences (see Fig. 3B). C at the -2 position was 5.64-fold more frequent (2,553 versus 453 counts) than at the adjacent -1 position.
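The single-nucleotide scan described above amounts to tabulating, column by column, how often each nucleotide occurs across sequences aligned at the CS. The following is a minimal sketch of that idea, not the authors' code; the function name `nucleotide_profile` and the toy sequences are invented for illustration.

```python
from collections import Counter


def nucleotide_profile(seqs, alphabet="ACGU"):
    """Return one dict per aligned column giving the fraction of each nucleotide."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for column in zip(*seqs):  # iterate over aligned positions
        counts = Counter(column)
        profile.append({nt: counts.get(nt, 0) / len(seqs) for nt in alphabet})
    return profile


# Invented toy windows, each ending at the last nucleotide before the CS.
toy = ["UUGUCA", "UAUUUA", "AUUACA", "UUAUUG"]
profile = nucleotide_profile(toy)
print(profile[-1]["A"])  # fraction of A in the final column
```

Plotting the per-column fractions for A and U against position would give curves of the kind the text describes for Figure 3A.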
To verify whether such a nucleotide distribution profile holds in the broader range of sequences, we also scanned the 16-K dataset. The results showed similar patterns of nucleotide distribution at this region (see Supplemental Fig. 1). This result indicates that both datasets, which cover about 17,000 unique genes of the Arabidopsis genome, possess similar profiles of poly(A) signals. Based on the information presented here, we propose a new model for Arabidopsis mRNA poly(A) signals (Fig. 3C). From the single-nucleotide scan analysis, there is no obvious spatial separation among the three types of signals, FUE, NUE, and CE. However, the locations of the signals seem to be well positioned, where the transition from one to the other is complete. The proposed CE contains a subset of small cis-elements: two U-rich sequences flanking both sides of the CS.
The occurrence of patterns flanking CSs and the order of an alternate arrangement of residues of complementarity for the poly(A) signals (Fig. 3A) indicate the possibility of the formation of higher order structures. To explore this, we used the mFold 2.3 model analysis as described by Zuker (2003)
Mutagenesis Data Support the Existence of the Predicted Secondary Structures

The secondary structure described above was based on in silico prediction. Support of such structures would be strongest from experimental evidence if the alteration of these structures could interrupt the function of the poly(A) signals. Here we analyzed published data based on conventional mutagenesis on a set of genes. It was this kind of classical analysis that contributed to the understanding of the poly(A) signals in plants.
From the predicted secondary structure of pea rbcS E-9 3'-UTR (Fig. 5A), it is clear that the primary and cryptic CS (based on Hunt and MacDonald, 1989
The finer correlation of the secondary structure and CS efficiency can be better illustrated by the linker-scanning (LS) analysis on the same rbcS 3'-UTR as described (Mogen et al., 1992
The mutagenesis analysis of CaMV poly(A) signals also offers a clue pertaining to the importance of the secondary structures. Deletion of the CaMV NUE pattern AAUAAA almost abolished the use of a corresponding poly(A) site (only 15% that of the wild type; see figure 9 of Rothnie, 1996
Using in silico analysis tools, we compiled Arabidopsis nuclear mRNA poly(A) signals from two independently produced 3'-UTR datasets covering about 17,000 independent genes. Beyond confirming the previous working model on the NUE and FUE, we revealed complex nucleotide distribution patterns around the CS and poly(A) site. The signal surrounding the CS is named CE here. A set of prevailing, although not highly conserved, patterns that are potentially poly(A) signals for each of the three elements are presented. Conserved secondary structures surrounding the CSs were also predicted using the RNA secondary structure prediction program, mFold. Using data from the literature, it is confirmed that these structures are important for the functionality of the signals because only those mutations that altered secondary structures had impact on the efficiency of the signals. These findings should serve as a new starting point for plant poly(A) signal study, e.g. the basis for mutagenesis tests of CE, the design of a program to predict poly(A) sites for genome annotation purposes, and for finding alternative poly(A) sites. A new working model for Arabidopsis mRNA poly(A) signals has emerged. As shown in Figure 3, the location of the FUE and the NUE has been updated based on this large-scale analysis, where the FUE region spans 60 to 125 nt, the NUE region 6 to 10 nt, but the CE is clearly expanded from the original CS (only 2 nt) to include two U-rich regions before and after the CS, both spanning about 5 to 10 nt. A closer view of the CS indicates a sharp nucleotide composition change where the U before the CS is highly desirable and a few Us also follow (Fig. 3B). Such a model could serve well in designing a computer algorithm to scan genomic sequences for possible poly(A) sites.
Conventional genetic analysis of plant poly(A) signals was not able to reveal the significance of the sequence elements surrounding the CS. This may be due partially to the signal element not being strong enough to be readily detected. The CE contains a long stretch of sequence to confirm its existence (although such a hypothesis is subject to further testing). It was postulated that there may be a U-rich region surrounding the YA dinucleotide (Hunt, 1994
The U-rich domain after the CE described here differs from the downstream elements found in animal systems, which are disrupted by about 15 nt of nonconserved sequences after the CS (Zhao et al., 1999
Comparing the FUE and NUE signal patterns we compiled to those characterized in pea rbcS, CaMV, figwort mosaic virus, rice tungro bacilliform virus (RTBV), nos, ocs, and maize 27-kD protein gene (Sanfacon and Hohn, 1990
Actual mutagenesis studies done in virus, yeast, and humans (Shen et al., 1999
The deletion of these regions that contain relevant stem loops has been demonstrated to accompany the loss of poly(A) activity. This may be due to disruption of recognition of the higher order structures by protein factors. As mentioned in Zarudnaya et al. (2003)
Compiling the 8-K Poly(A) Sites within Arabidopsis Genome Sequences

All Arabidopsis (Arabidopsis thaliana) transcript sequences, including ESTs and partial or complete cDNA sequences, were downloaded from GenBank on September 1, 2004. Using the trimpoly program, included in The Institute of Genomic Research (TIGR) Gene Indices seqclean software (http://www.tigr.org/tdb/tgi/software), transcripts containing terminal poly(A) sequences were identified and trimmed. The terminal transcript nucleotide of each trimmed polyadenylated transcript was classified as a poly(A) site. Since the trimpoly tool trims low-quality regions from transcript sequence ends in addition to poly(A) sequences, our analysis included only those trimmed poly(A) site transcript ends identified by trimpoly that were followed by a stretch of 8 to 15 nt with at least 80% adenine content. These criteria proved sufficient to differentiate the presumed genuinely polyadenylated sequences from those with low-quality sequence ends, disregarding other sequences trimmed by trimpoly due to low sequence quality rather than terminal poly(A) content. Checking the set of poly(A) sites identified in the genome, and limiting a sequence composition analysis to the 8 bp beyond the CS, a maximum of 6.8% of the sequences could be falsely classified by using these criteria alone. Using the 15-bp maximal window size, we found the maximal false-positive rate drops to 5.6%.
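The adenine-content criterion above can be illustrated with a short check. This is a sketch under assumptions, not the trimpoly implementation: the function name `looks_polyadenylated` is hypothetical, and it assumes the trimmed tail is available as a plain string that begins immediately after the candidate poly(A) site.

```python
def looks_polyadenylated(tail, window=8, min_a_fraction=0.8):
    """True if the first `window` nt after the candidate site are at least 80% A."""
    if not 8 <= window <= 15:
        raise ValueError("the text describes windows of 8 to 15 nt")
    segment = tail[:window].upper()
    if len(segment) < window:
        return False  # too short to evaluate with this window
    return segment.count("A") / window >= min_a_fraction


print(looks_polyadenylated("AAAAAAAGAAAA"))  # 7 of the first 8 nt are A
print(looks_polyadenylated("AAGCAAAAAA"))    # only 6 of the first 8 nt are A
```

A wider window is stricter about isolated non-A residues near the site, which is in line with the lower false-positive rate the text reports for the 15-bp window.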
From the sequences of 191,301 Arabidopsis ESTs and 35,557 mRNA sequences obtained from GenBank, we found 10,735 sequences containing poly(A) sequences, which align almost perfectly to the Arabidopsis genome. Approximately one-half of the polyadenylated sequences were derived from full-length cDNAs. The final assembly of 9,298 ESTs was further filtered through methods described by Beaudoing and colleagues (2000) in which sequences containing stretches of As within 10 bp after the CS may denote internal priming contamination. The genome alignments of the trimmed poly(A)-containing sequences provide the identity of 8,160 poly(A) sites within all five chromosomes. The genomic sequence position corresponding to the poly(A) site of each relevant transcript sequence was identified via sequence alignment. The Program to Assemble Spliced Alignments (PASA) pipeline (Haas et al., 2003
The 16-K dataset consists of the 3'-UTR terminal 300 nt from the assembled 16,211 Arabidopsis full-length cDNAs as described (Haas et al., 2003).

Comparing the two datasets used here, one containing 8,160 ESTs (8 K) with authenticated poly(A) sites and the other with 16,211 full-length cDNAs (16 K), the 8-K dataset contains 584 EST sequences that are not found in the 16-K dataset, representing 442 unique genes. There are also 10,474 genes that are unique to the 16-K dataset, and 5,737 genes are common to both datasets. Thus, the combined total number of genes analyzed is about 17,000. Both the 8-K and 16-K datasets are available at http://www.users.muohio.edu/liq.
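The gene counts quoted above can be cross-checked with simple arithmetic. The variable names below are invented for illustration; the numbers are those given in the text.

```python
genes_only_8k = 442      # genes found only in the 8-K dataset
genes_only_16k = 10_474  # genes found only in the 16-K dataset
genes_shared = 5_737     # genes present in both datasets

genes_16k = genes_only_16k + genes_shared  # genes covered by the 16-K dataset
genes_8k = genes_only_8k + genes_shared    # genes covered by the 8-K dataset
genes_total = genes_only_8k + genes_only_16k + genes_shared

print(genes_16k, genes_8k, genes_total)
```

The 16-K subtotal matches the 16,211 full-length cDNAs, and the union of 16,653 genes is what the text rounds to about 17,000.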
A program, SignalSleuth, was created to perform an exhaustive search of varying size patterns within a subregion of a large set of sequences. (The code can be downloaded at http://www.users.muohio.edu/liq.) The program was developed, installed, and run on Cray computers located at both Cray facilities in Wisconsin and the Ohio Supercomputer Center (OSC). With the use of the Cray Bioinformatics Library (CBL; Cray, Inc., 2004

The algorithm used in the program starts out by reading the sequence data from a FASTA file using the CBL routine cb_read_fasta. The program then enters a triply nested loop, looping over pattern size, the number of sequences, and the location within a given sequence. After entering the outermost loop, pattern size, the program allocates enough memory to hold all possible combinations of the four unique characters {A, C, G, T} in n locations, where n is the size of the pattern for this trip through the loop. The program then begins at the starting location for the first nucleotide, in the subregion of interest, within the first sequence. The program copies the first pattern length worth of characters from the sequence into a temporary variable and compresses it, using the CBL routine cb_compress, into a 2-bit compressed form by picking out the second and third bits from each character in the variable. Since there are only four possible characters, only 2 bits of information are needed {00, 01, 10, and 11} to store this information. Shifting the bits in this 2-bit compressed variable to the rightmost bits of the pattern, the variable can then be used as an integer to index into the all-possible combination array and increment that location. With this location now tallied, the code shifts to the right one character in the input sequence and repeats the process. When all the characters within the subregion for this sequence are processed, the code advances to the next sequence and repeats the process for the subregion in the next sequence.
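The 2-bit indexing scheme described above can be sketched in plain Python. This is an illustration of the technique only: it does not call the Cray Bioinformatics Library, the function names are invented, and it updates the index with a rolling shift instead of re-copying each window, which is a simplification of the copy-and-compress step in the text. It assumes the sequences contain only A, C, G, and T (or U).

```python
def two_bit(ch):
    """Second and third bits of the ASCII code: A->0, C->1, T/U->2, G->3."""
    return (ord(ch) >> 1) & 0b11


def count_patterns(seqs, size, start, end):
    """Histogram of every `size`-nt pattern lying inside seq[start:end]."""
    counts = [0] * (4 ** size)  # one slot per possible pattern
    mask = 4 ** size - 1        # keeps only the last `size` nucleotides
    for seq in seqs:
        region = seq[start:end].upper()
        index = 0
        for pos, ch in enumerate(region):
            index = ((index << 2) | two_bit(ch)) & mask
            if pos >= size - 1:  # a full window has been read
                counts[index] += 1
    return counts


def decode(index, size, letters="ACUG"):
    """Turn an array index back into its nucleotide pattern."""
    return "".join(
        letters[(index >> (2 * (size - 1 - i))) & 0b11] for i in range(size)
    )


counts = count_patterns(["AAUAAA", "CAAUAAAG"], size=6, start=0, end=8)
best = max(range(len(counts)), key=counts.__getitem__)
print(decode(best, 6), counts[best])
```

Because the index is a plain integer, tallying a window is a single array increment, which is the property the text relies on for exhaustive counting.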
The program continues in this fashion until all sequences have been processed. At the end of this search process, the all-possible combination array contains a histogram of how frequently each combination was found within the target regions of all the sequences. The next step is to search this array to find the largest number, or set of largest numbers, such as the top 50 most common patterns in the target regions of the sequences. The program then converts the indices into this array from their 2-bit compressed form back into full 8-bit ASCII characters, and the characters associated with each index are printed out.

This scanning algorithm took on several different variations, based on user-defined parameters. With these parameters, a particular pattern can be counted once or multiple times per sequence, and if multiple counts are allowed, a gap can be defined regarding when to start counting the particular pattern again. This helps to prevent short repeated patterns from being overly represented. For example, if a tract of UAUAUAUAUAUA were encountered for a 6-nt window size pattern, each frame of UAUAUA would be counted as the same pattern, resulting in an overrepresented count for this particular pattern. With a gap defined, the algorithm allows a particular repeated pattern to be counted again on the same sequence only if it falls outside a particular exclusion window size on the sequence (Fig. 7A). If the single count option is used instead, a particular pattern is counted only once per sequence, which may result in an underrepresented count of that pattern. These repetitive patterns can be observed in Figure 1, described in the following section.
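The effect of the exclusion window on a repetitive tract can be shown with a small sketch. The exact rule used by SignalSleuth is given in Figure 7A and is not reproduced here; the version below assumes that a hit is counted only when it starts at least `gap` nt after the previously counted hit, and the function name is hypothetical.

```python
def count_with_exclusion(seq, pattern, gap):
    """Count `pattern`, skipping hits that start within `gap` nt of the last counted hit."""
    count = 0
    last = None  # start position of the most recently counted hit
    start = seq.find(pattern)
    while start != -1:
        if last is None or start - last >= gap:
            count += 1
            last = start
        start = seq.find(pattern, start + 1)  # allow overlapping hits
    return count


tract = "UAUAUAUAUAUA"
print(count_with_exclusion(tract, "UAUAUA", gap=1))   # every frame counted
print(count_with_exclusion(tract, "UAUAUA", gap=6))   # non-overlapping hits only
print(count_with_exclusion(tract, "UAUAUA", gap=12))  # effectively once per tract
```

With no effective gap the 12-nt tract contributes four counts for UAUAUA; widening the gap reduces this toward a single count.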
The ranking of the counts in the array of patterns of all-possible combinations is based on a deviation factor from the median value, which is termed delta, and is defined as the difference between the maximal count and the median count of a respective pattern. Pattern counts deviating the farthest from the median are ranked the highest. The selection of the top 50 is justified by the reduction of this deviation among the top 1,000 signals because this deviation drops sharply after the first 50 patterns, as seen in Figure 7B.
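The ranking step can be sketched as follows. The text does not spell out over which set of counts the maximum and median are taken; this sketch assumes each pattern has a list of counts by position and computes delta as the maximum minus the median of that list. The names `delta` and `rank_by_delta` and the toy numbers are invented.

```python
from statistics import median


def delta(position_counts):
    """Deviation of the peak count from the median count for one pattern."""
    return max(position_counts) - median(position_counts)


def rank_by_delta(counts_by_pattern, top=50):
    """Patterns ordered by decreasing delta, truncated to the `top` entries."""
    scored = {p: delta(c) for p, c in counts_by_pattern.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top]


# Invented toy counts by position for three 6-nt patterns.
toy_counts = {
    "UUUUUU": [10, 11, 12, 10, 11],
    "AAUAAA": [2, 3, 40, 4, 2],
    "GCGCGC": [0, 1, 0, 1, 0],
}
ranking = rank_by_delta(toy_counts)
print(ranking[0], delta(toy_counts["AAUAAA"]))
```

Under this reading, a pattern that is common everywhere scores low, while one that peaks sharply at a particular position scores high.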
After the most common patterns were found, the next task was to illustrate where these common patterns fall within the sequences, and to see if they were more common, for example, in the NUE region as opposed to the rest of the sequence. To accomplish this, additional code was added to the program to form a graphic picture of the locations of the patterns within each sequence. Imagine a picture that is 8,160 pixels tall and 400 pixels wide, where a pixel is a dot on the screen or printed on a page. In this picture each pixel represents the starting location of one of the top 50 patterns within the sequence. Referring to Figure 1C, the program was run to search the NUE region for the top 50 most common patterns. With this list, the program then turned on the pixel that corresponds to those patterns as they are located in each sequence. Notice that, even though these patterns can be found throughout all the sequences, they are clearly more common in the NUE region, as marked at the top of the image. Similarly, the program was run to search the region near the CS, and its top 50 most common patterns plotted in Figure 1D. Again, distinct vertical bands can be seen on either side of the CS. Figure 1E is used as a control to show what an image would look like using random data. For these images, no exclusion window was used, so long repeats of short patterns can be seen as small horizontal bars in the images.
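The image construction described above can be sketched as a matrix with one row per sequence and one column per nucleotide position. This is an illustration only: the function names are invented, the toy sequences are much shorter than the 400-nt sequences in the text, and summing columns is added here as a simple way to see the vertical bands numerically.

```python
def pattern_image(seqs, patterns, width=400):
    """One row per sequence; a cell is 1 where any listed pattern starts."""
    image = []
    for seq in seqs:
        row = [0] * width
        for pattern in patterns:
            start = seq.find(pattern)
            while start != -1 and start < width:
                row[start] = 1  # turn on the pixel at the pattern's start
                start = seq.find(pattern, start + 1)
        image.append(row)
    return image


def column_totals(image):
    """Number of lit pixels in each column; peaks correspond to vertical bands."""
    return [sum(col) for col in zip(*image)]


toy_seqs = ["GGAAUAAAGG", "CCAAUAAACC", "AAUAAAGGGG"]
image = pattern_image(toy_seqs, ["AAUAAA"], width=10)
totals = column_totals(image)
print(totals)
```

Since no exclusion window is applied here, consecutive hits of a short repeat would light adjacent pixels in a row, which corresponds to the small horizontal bars mentioned in the text.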
The predictions of secondary structure of the RNA region surrounding poly(A) sites were carried out by an RNA secondary structure prediction program, mFold (Zuker, 2003
We thank Mouin Hourani, Mingjun Tang, and Minghui Ding for technical assistance, and Michael Hughes for helping with statistical analysis. We also thank Kim Davis for helping with mFold analysis, and Paula Strenski for her invaluable assistance in editing and reviewing this manuscript. We are grateful for the use of Cray's SV1ex supercomputer, located in Chippewa Falls, Wisconsin, and the Ohio Supercomputer Center's SV1 located in Columbus, Ohio.

Received January 31, 2005; returned for revision April 29, 2005; accepted May 3, 2005.
1 This work was supported in part by Miami University (Shoupp Award and Botany Department Academic Challenge Grant), the Ohio Plant Biotechnology Consortium, and the National Science Foundation (grant no. MCB0313472 to Q.Q.L.).
2 Present address: Department of Medicine, Mount Sinai School of Medicine, 1425 Madison Ave., New York, NY 10029-6574.
[w] The online version of this article contains Web-only data. Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.105.060541.
* Corresponding author; e-mail liq{at}muohio.edu; fax 513-529-4243.
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
Beaudoing E, Freier S, Wyatt JR, Claverie JM, Gautheret D (2000) Patterns of variant polyadenylation signal usage in human genes. Genome Res 10: 1001-1010
Cray, Inc. (2004) Man Page Collection: Bioinformatics Library Procedures. http://www.cray.com/craydoc/manuals/S-2397-21/S-2397-21.pdf
Graber JH, Cantor CR, Mohr SC, Smith TF (1999a) Genomic detection of new yeast pre-mRNA 3'-end-processing signals. Nucleic Acids Res 27: 888-894
Graber JH, Cantor CR, Mohr SC, Smith TF (1999b) In silico detection of control signals: mRNA 3'-end-processing sequences in diverse species. Proc Natl Acad Sci USA 96: 14055-14060
Graber JH, McAllister GD, Smith TF (2002) Probabilistic prediction of Saccharomyces cerevisiae mRNA 3'-processing sites. Nucleic Acids Res 30: 1851-1858
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31: 5654-5666
Hajarnavis A, Korf I, Durbin R (2004) A probabilistic model of 3' end formation in Caenorhabditis elegans. Nucleic Acids Res 32: 3392-3399
Hunt AG (1994) Messenger RNA 3' end formation in plants. Annu Rev Plant Physiol Plant Mol Biol 45: 47-60
Hunt AG, MacDonald MH (1989) Deletion analysis of the polyadenylation signal of a pea ribulose-1,5-bisphosphate carboxylase small-subunit gene. Plant Mol Biol 13: 125-138
Hunt AG, Messing J (1998) mRNA Polyadenylation in Plants. In J Bailey-Serres, DR Gallie, eds, A Look beyond Transcription Mechanisms Determining mRNA Stability and Translation in Plants. American Society of Plant Physiologists, Rockville, MD, pp 29-39
Li QQ, Hunt AG (1995) A near upstream element in a plant polyadenylation signal consists of more than six bases. Plant Mol Biol 28: 927-934
Li QQ, Hunt AG (1997) The polyadenylation of RNA in plants. Plant Physiol 115: 321-325
MacDonald CC, Redondo JL (2002) Reexamining the polyadenylation signal: Were we wrong about AAUAAA? Mol Cell Endocrinol 190: 1-8
MacDonald MH, Mogen BD, Hunt AG (1991) Characterization of the polyadenylation signal from the T-DNA-encoded octopine synthase gene. Nucleic Acids Res 19: 5575-5581
Mogen BD, MacDonald MH, Graybosch R, Hunt AG (1990) Upstream sequences other than AAUAAA are required for efficient messenger RNA 3'-end formation in plants. Plant Cell 2: 1261-1272
Mogen BD, MacDonald MH, Leggewie G, Hunt AG (1992) Several distinct types of sequence elements are required for efficient mRNA 3' end formation in a pea rbcS gene. Mol Cell Biol 12: 5406-5414
Phillips C, Kyriakopoulou CB, Virtanen A (1999) Identification of a stem-loop structure important for polyadenylation at the murine IgM secretory poly(A) site. Nucleic Acids Res 27: 429-438
Proudfoot N (2004) New perspectives on connecting messenger RNA 3' end formation to transcription. Curr Opin Cell Biol 16: 272-278
Proudfoot NJ, Furger A, Dye MJ (2002) Integrating mRNA processing with transcription. Cell 108: 501-512
Rothnie HM (1996) Plant mRNA 3'-end formation. Plant Mol Biol 32: 43-61
Rothnie HM, Chen G, Futterer J, Hohn T (2001) Polyadenylation in rice tungro bacilliform virus: cis-acting signals and regulation. J Virol 75: 4184-4194
Rothnie HM, Reid J, Hohn T (1994) The contribution of AAUAAA and the upstream element UUUGUA to the efficiency of mRNA 3'-end formation in plants. EMBO J 13: 2200-2210
Ruff M, Krishnaswamy S, Boeglin M, Poterszman A, Mitschler A, Podjarny A, Rees B, Thierry JC, Moras D (1991) Class II aminoacyl transfer RNA synthetases: crystal structure of yeast aspartyl-tRNA synthetase complexed with tRNA(Asp). Science 252: 1682-1689
Ruschak AM, Mathews DH, Bibillo A, Spinelli SL, Childs JL, Eickbush TH, Turner DH (2004) Secondary structure models of the 3' untranslated regions of diverse R2 RNAs. RNA 10: 978-987
Sanfacon H, Brodmann P, Hohn T (1991) A dissection of the cauliflower mosaic virus polyadenylation signal. Genes Dev 5: 141-149
Sanfacon H, Hohn T (1990) Proximity to the promoter inhibits recognition of cauliflower mosaic virus polyadenylation signal. Nature 346: 81-84
Shen LX, Basilion JP, Stanton VP Jr (1999) Single-nucleotide polymorphisms can cause different structural folds of mRNA. Proc Natl Acad Sci USA 96: 7871-7876
Teixeira A, Tahiri-Alaoui A, West S, Thomas B, Ramadass A, Martianov I, Dye M, James W, Proudfoot NJ, Akoulitchev A (2004) Autocatalytic RNA cleavage in the human beta-globin pre-mRNA promotes transcription termination. Nature 432: 526-530
Van Helden J, Olmo M, Perez-Ortin JE (2000) Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucleic Acids Res 28: 1000-1010
Wu L, Ueda T, Messing J (1993) 3'-end processing of the maize 27 kDa zein mRNA. Plant J 4: 535-544
Zarudnaya MI, Kolomiets IM, Potyahaylo AL, Hovorun DM (2003) Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures. Nucleic Acids Res 31: 1375-1386
Zhao J, Hyman L, Moore C (1999) Formation of mRNA 3' ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol Mol Biol Rev 63: 405-445
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31: 3406-3415