Plant Physiology 136:3223-3233 (2004)
© 2004 American Society of Plant Biologists
Maximizing the Efficacy of SAGE Analysis Identifies Novel Transcripts in Arabidopsis1,[w]
Agriculture and Agri-Food Canada, Saskatoon Research Centre, Saskatoon, Saskatchewan, Canada
The efficacy of using Serial Analysis of Gene Expression (SAGE) to analyze the transcriptome of the model dicotyledonous plant Arabidopsis was assessed. We describe an iterative tag-to-gene matching process that exploits the availability of the whole genome sequence of Arabidopsis. The expression patterns of 98% of the annotated Arabidopsis genes could theoretically be evaluated through SAGE and, using an iterative matching process, 79% could be identified by a tag found at a unique site in the genome. A total of 145,170 reliable experimental tags from two Arabidopsis leaf tissue SAGE libraries were analyzed, of which 29,632 were distinct. The majority (93%) of the 12,988 experimental tags observed more than once could be matched within the Arabidopsis genome. However, only 78% were matched to a single locus within the genome, reflecting the complexities associated with working in a highly duplicated genome. In addition to a comprehensive assessment of gene expression in Arabidopsis leaf tissue, we describe evidence of transcription from pseudo-genes as well as evidence of alternative mRNA processing and anti-sense transcription. This collection of experimental SAGE tags could be exploited to assist in the ongoing annotation of the Arabidopsis genome.
Global gene expression analysis has been widely adopted as a tool to uncover candidate genes and elucidate regulatory pathways controlling important traits in a number of species. Two popular strategies being employed are microarray analysis and Serial Analysis of Gene Expression (SAGE). DNA microarrays (Schena et al., 1995
SAGE analysis has been successfully applied to transcript profiling in a number of eukaryotic species including Saccharomyces cerevisiae (Velculescu et al., 1997
The most common form of the SAGE procedure isolates a 14-bp tag from the 3' most NlaIII restriction site of every mRNA found within a sample. In theory, a random tag sequence of this length provides sufficient complexity to uniquely identify its gene of origin since the tag is extracted from a defined position within the transcript. However, the organization of most genomes results in nonrandom DNA sequence that limits the ability of SAGE to unambiguously match the isolated tag to the gene of origin (Lash et al., 2000 Here we describe an assessment of the efficacy of the SAGE technique as a tool for gene expression profiling in Arabidopsis. The assembled annotated whole genome sequence was utilized to generate theoretical SAGE tag data sets for the Arabidopsis nuclear and organelle genomes. An analysis of these data sets established the level of ambiguity associated with theoretical tag-to-gene matching in this species. A comprehensive tag-to-gene mapping analysis was carried out for experimental SAGE tags generated from two Arabidopsis SAGE libraries. A total of 145,170 reliable tags were sequenced, of which 29,632 tags were distinct and 12,988 of these distinct tags were observed more than once. Using a novel iterative matching process, 10,080 (78%) of these tags were unambiguously matched to their representative gene and 6,800 (52%) were matched to the canonical position. A number of the remaining tags that matched noncanonical positions indicate the presence of novel transcripts both in the sense and the anti-sense orientation.
Efficacy of Tag-to-Gene Mapping within the Arabidopsis Genome
To fully exploit the available genome sequence for Arabidopsis, we chose to utilize the annotated Arabidopsis sequence available from The Institute for Genomic Research (TIGR). The 30,799 nuclear genes from the annotated Arabidopsis genome sequence (TIGR annotation release v4.0) were partitioned into three categories based on supporting experimental evidence: (1) annotated genes that possess defined untranslated region (UTR) sequences, (2) predicted genes based on numerous gene prediction algorithms, and (3) pseudo-genes. The SAGE protocol isolates the 10 bp adjacent to the 3' most anchoring enzyme site in a transcript. Together with the 4-bp enzyme recognition site, these bases form the 14-bp SAGE tag used to identify the transcript of origin. To facilitate our tag-to-gene matching, the canonical SAGE tag was defined as the 3' most SAGE tag within each annotated gene. Although the short recognition sequence of the anchoring enzyme will result in a proportion of the canonical tags being derived from 3' UTR sequence, the random distribution of the restriction site may result in the isolation of canonical tags from exonic or 5' UTR sequence.
A default 5' and 3' UTR was determined in order to analyze effectively the predicted Arabidopsis genes that do not possess defined UTR sequence. The frequency distributions for the lengths of both the 5' and 3' UTR sequences were determined using the 17,754 annotated genes with defined UTRs (Fig. 1). The UTR length that included 95% of each distribution was determined to be 350 bp and 500 bp for the 5' UTR and 3' UTR, respectively. Artificial UTRs of these lengths were used to extend those predicted genes lacking sufficient annotation. The distribution for the 3' UTR length revealed 113 (0.6%) annotated genes with a UTR of 31 bp, which suggested errors in the annotation (Fig. 1B).
To limit potential errors, 3' UTR sequences of insufficient length to be encompassed by 95% of the distribution (less than 90 bp in length) were excluded and treated as having no defined UTR sequence and thus were assigned a UTR of default length. Based on these analyses, the annotated Arabidopsis nuclear genes were divided into 16,550 genes with experimentally defined UTR sequences and 12,031 conceptual transcripts with UTR sequences set to the default lengths. The defined and default nuclear transcripts can be found at http://www.brassica.ca/SAGE/.
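The choice of a default UTR length that encompasses 95% of an observed length distribution can be illustrated with a short sketch. This is only an illustration: the function name `length_covering`, the nearest-rank definition of the percentile, and the toy lengths are assumptions, not the procedure or data used in this study.

```python
def length_covering(lengths, percent=95):
    """Smallest observed length L such that at least `percent` % of the
    values are <= L (nearest-rank percentile; an assumed definition)."""
    if not lengths:
        raise ValueError("no lengths supplied")
    ordered = sorted(lengths)
    # 1-based nearest rank, computed with integer arithmetic (ceiling division)
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Toy 3' UTR lengths in bp (invented for illustration, not the paper's data).
toy_utr3_lengths = [120, 150, 180, 200, 210, 230, 250, 260, 300, 320,
                    340, 360, 380, 400, 420, 440, 460, 480, 500, 900]

default_utr3 = length_covering(toy_utr3_lengths, 95)
print(default_utr3)
```

With real annotation data the input would be the list of UTR lengths for the genes with defined UTRs, computed separately for the 5' and 3' ends.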
Ideally, each individual SAGE tag should unambiguously identify a unique transcript. However, it is inevitable that a fraction of tags will match multiple locations within the genome and this fraction will increase as the genome size increases. To assess the theoretical efficiency of NlaIII derived SAGE tag-to-gene matching in Arabidopsis, the canonical SAGE tag for each gene or pseudo-gene was extracted from the nuclear and organelle data sets (Table I). Ninety-eight percent of the annotated genes contained an NlaIII site, while the remaining 2% would only be detected by changing the anchoring enzyme. SAGE analysis in Arabidopsis also has the potential to unambiguously differentiate between 79% of all canonical tags, despite the fact that 17% of Arabidopsis genes are duplicated in tandem between 2 and 23 times within the genome (Arabidopsis Genome Initiative, 2000
In certain instances SAGE may not capture the tags equivalent to the annotated canonical site, for example, due to alternative mRNA processing. Therefore, the current annotation may restrict the matching of legitimate tags. For these reasons, theoretical SAGE tags were also considered from every NlaIII site from exonic sequences and from immature transcript sequences. As expected, this analysis increased the level of ambiguity such that 55% of the 216,669 theoretical SAGE tags could be uniquely assigned to their annotated gene of origin (Supplemental Table I, available at www.plantphysiol.org). This ambiguity was further compounded after including intergenic sequence that resulted in 42% of the 431,518 theoretical tags matching a unique position within the Arabidopsis genomic sequence (Supplemental Table I).
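The extraction of canonical tags and the estimate of how many genes they identify uniquely can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and toy transcripts are hypothetical, and sites with fewer than 10 bp of downstream sequence are simply skipped, which may differ from how such sites were handled in the actual analysis.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII recognition site
TAG_LEN = 14      # 4-bp site plus the 10 bp immediately downstream


def canonical_tag(transcript):
    """Return the 14-bp tag at the 3'-most NlaIII site, or None if absent."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    while pos != -1:
        if pos + TAG_LEN <= len(seq):
            return seq[pos:pos + TAG_LEN]
        # Assumption: a site too close to the 3' end is skipped and the
        # next site upstream is tried instead.
        pos = seq.rfind(ANCHOR, 0, pos + len(ANCHOR) - 1)
    return None


def unique_fraction(transcripts):
    """Return (fraction of genes with a tag, fraction with a unique tag)."""
    tags = {gene: canonical_tag(seq) for gene, seq in transcripts.items()}
    counts = Counter(tag for tag in tags.values() if tag is not None)
    with_tag = [gene for gene, tag in tags.items() if tag is not None]
    unique = [gene for gene in with_tag if counts[tags[gene]] == 1]
    return len(with_tag) / len(tags), len(unique) / len(tags)


# Toy transcripts (invented): geneA and geneB share a canonical tag,
# geneC has a unique tag, geneD has no NlaIII site.
toy_transcripts = {
    "geneA": "AAA" + "CATG" + "A" * 10 + "TTT" + "CATG" + "C" * 10 + "GG",
    "geneB": "GGG" + "CATG" + "C" * 10 + "AA",
    "geneC": "TTTT" + "CATG" + "ACGTACGTAC" + "TT",
    "geneD": "ACGTACGTACGT",
}

print(unique_fraction(toy_transcripts))
```

Extending the same idea to every NlaIII site, rather than only the 3'-most one, enlarges the set of theoretical tags and therefore the chance that two locations share a tag, which is the increase in ambiguity described above.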
The SAGE protocol also allows the directionality of each tag to be determined, and anti-sense tags have been reported in C. elegans and more recently in plants (Jones et al., 2001 All of the theoretical NlaIII SAGE tags described above form an invaluable resource for tag-to-gene matching in Arabidopsis and these data can be accessed from http://www.brassica.ca/SAGE/.
The data from two SAGE libraries generated from Arabidopsis leaf tissue were combined resulting in the extraction of 184,580 tags prior to quality assessment. The Phred sequence trace quality scores were used to remove all tags of low sequence quality (Phred score of <20) due to the 1% error rate associated with single pass sequencing (Hillier et al., 1996
We focused our analysis on tag-to-gene matching for the 12,988 distinct tags that were encountered more than once and provide only limited analysis of singletons. An iterative process was employed to match these tags to their representative Arabidopsis genes; the highest level of confidence was assigned to tags that matched the canonical site within a mature transcript sequence. This analysis assigned 6,800 (52%) tags to a unique canonical position within the Arabidopsis nuclear and organelle data sets (Table II, Canonical Tag Matches). An additional 345 (3%) tags matched canonical positions but could not be unambiguously assigned to a single position. These included instances where a distinct tag identified multiple members of a gene family and where distinct tags were matched to multiple unrelated genes (Table II, Canonical Tag Matches).
The unmatched experimental tags were compared to the theoretical tags extracted from all possible NlaIII restriction sites within the nuclear and organelle data sets (Table II, Cumulative Tag Matches). This resulted in a cumulative total of 9,186 (71%) of the unique tags being matched to a single site within either an annotated gene or pseudo-gene (Table II, Cumulative Tag Matches). The remaining unmatched tags were compared to theoretical tags flanking every NlaIII restriction site in data sets generated from all available genomic sequences to allow for incompletely processed hnRNA, misannotation of splice sites, and unannotated transcripts. A further 894 (7%) of the unmatched tags were identical to a unique site within either intronic or intergenic sequence.
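The tiered logic of the iterative matching described above can be sketched in a few lines. This is an illustrative outline only: the function name, the tier names, and the toy lookup tables are hypothetical, and the real data sets are far larger. A tag is compared to each data set in order of priority, and only tags with no match at one level are passed on to the next.

```python
def iterative_match(tag, tiers):
    """Match a tag against ordered (tier_name, {tag: [locations]}) pairs.

    Returns (tier_name, status, locations), where status is "unique",
    "ambiguous", or "unmatched". Matching stops at the first tier that
    contains the tag, even if the match there is ambiguous.
    """
    for tier_name, index in tiers:
        locations = index.get(tag, [])
        if len(locations) == 1:
            return tier_name, "unique", locations
        if len(locations) > 1:
            return tier_name, "ambiguous", locations
    return None, "unmatched", []


# Toy lookup tables (invented), listed from highest to lowest priority.
toy_tiers = [
    ("canonical", {"CATGAAAAAAAAAA": ["geneA"],
                   "CATGCCCCCCCCCC": ["geneB", "geneC"]}),
    ("all_genic_sites", {"CATGGGGGGGGGGG": ["geneD:exon2"]}),
    ("genomic", {"CATGTTTTTTTTTT": ["chr1:12345"]}),
]

for tag in ["CATGAAAAAAAAAA", "CATGCCCCCCCCCC", "CATGTTTTTTTTTT", "CATGACGTACGTAC"]:
    print(tag, iterative_match(tag, toy_tiers))
```

Because matching stops at the first tier containing the tag, a tag with an ambiguous canonical match is not re-examined at lower-priority levels, in keeping with the statement that only unmatched tags were carried forward.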
These combined analyses matched 12,138 (93%) distinct tags to the Arabidopsis genome, 2,058 matched multiple sites, and 850 tags remained unmatched. In total, 10,080 (78%) of the distinct SAGE tags observed more than once could be unambiguously assigned to a single location within the Arabidopsis genomic sequence. This was in contrast to previous publications where the highest level of tag-to-gene matching in Arabidopsis was 57% (Fizames et al., 2004
The 30 most abundant SAGE tags identified within this experiment and their genes of origin are presented in Table III. As anticipated, the majority of the identified genes encode proteins involved in photosynthesis, with the remainder classified within oxidative stress, energy production, and cell division categories. The most abundant tag, which comprised 1.7% of the experimental tags, matched At2g34420, a member of a large gene family encoding a Photosystem II chlorophyll a/b binding protein. An additional tag (CATGCTCGGAGCCC) was present at a frequency of 0.4% and matched members of the same gene family including At2g34420. Using the iterative process, 21 of the 30 abundant tags (70%) were unambiguously matched to a single genomic location, although it was noted that two of these tags were unable to distinguish between their possible alternate gene transcripts. The remaining tags matched multiple locations. Three of these could not distinguish duplicate members of gene families that arose presumably via concerted evolution or relatively recent gene duplication events. Four of the SAGE tags matched multiple genes each annotated with a different predicted biological function. It is possible that the use of the conceptual transcripts could generate erroneous matches. However, in only one case was an additional match due to a tag being identified within the UTR of a conceptual transcript.
Alternative Transcript Processing
The precision of gene identification through SAGE allows tags to be unambiguously matched to noncanonical sites within transcripts. This has the potential to identify uncharacterized differential processing events. For example, six of the most highly abundant SAGE tags were unambiguously matched to a noncanonical position (Table III) and in each case the corresponding annotated canonical tag occurred at a low frequency providing evidence for alternative transcription.
The annotation provided by TIGR details differential processing events for only 1,267 Arabidopsis genes resulting in 2,678 alternatively spliced transcripts (Haas et al., 2003
Current annotation for the gene At3g47470 details a single iso-form, although two distinct SAGE tags were unambiguously matched to this gene. A noncanonical tag (CATGAACAAATTTG) was observed 777 times (Table III) while the canonical tag (CATGTGGCAACAGT) was found 149 times, suggesting this is not an artifact of incomplete digestion. The presence of alternate iso-forms for this gene was corroborated by the presence of full-length cDNA and 3' EST sequences deposited in GenBank representing these forms (for example AY093080 and AF325012). SAGE also has the ability to detect the presence of alternative transcriptional termination in addition to the identification of alternatively processed transcripts. This was observed for At3g16770 where the canonical tag (CATGTGTAAATAAG) was identified 21 times compared to a noncanonical tag (CATGGCTTATGATG) that was found 340 times. These SAGE tags matched different full length cDNA sequences submitted to GenBank (AY087488 and AY035100) that had a conserved translational stop codon but differed in the length of the 3' UTR, presumably as the result of differential transcriptional termination. The discovery of this phenomenon through SAGE analysis is particularly interesting in light of recent evidence demonstrating the role of alternative transcriptional termination in the regulation of FCA gene expression (Simpson et al., 2003 The analysis was extended to include singleton SAGE tags matching a unique position in the genome to establish an estimate for the frequency of alternative mRNA processing in Arabidopsis. This increased the number of genes with multiple tag matches to 3,038. Of these, 2,248, representing 17% of the 12,934 genes unambiguously assigned SAGE tag matches when including singletons, had more abundant noncanonical tags suggesting they were unlikely to be the result of incomplete digestion and could be due to alternate mRNA processing.
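The criterion used above, a noncanonical tag that is more abundant than the canonical tag of the same gene, can be expressed as a simple filter. The sketch below is an illustration only: the function name and data layout are assumptions, the counts for At3g47470 and At3g16770 are those quoted in the text, and the third entry is an invented control.

```python
def flag_alternative_processing(observations):
    """Return, per gene, the noncanonical tags that outnumber the canonical tag.

    observations: {gene: {"canonical": count, "noncanonical": {tag: count}}}
    """
    flagged = {}
    for gene, obs in observations.items():
        dominant = {tag: count
                    for tag, count in obs["noncanonical"].items()
                    if count > obs["canonical"]}
        if dominant:
            flagged[gene] = dominant
    return flagged


observations = {
    # Counts quoted in the text above.
    "At3g47470": {"canonical": 149, "noncanonical": {"CATGAACAAATTTG": 777}},
    "At3g16770": {"canonical": 21, "noncanonical": {"CATGGCTTATGATG": 340}},
    # Invented control: a rare noncanonical tag, as expected from
    # incomplete digestion, is not flagged.
    "hypothetical_gene": {"canonical": 200, "noncanonical": {"CATGACGTACGTAC": 3}},
}

print(flag_alternative_processing(observations))
```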
The orientation of each SAGE tag extracted from an mRNA transcript is known, enabling potential anti-sense transcripts to be detected. It is unlikely that all of the remaining 850 unmatched distinct tags result from experimental artifacts since singletons were excluded from the tag-to-gene analysis. The unmatched tags were compared to all possible theoretical tags from the anti-sense strand in all data sets (Table IV). This resulted in the assignment of 259 (2%) tags to a unique anti-sense site within the Arabidopsis nuclear and organelle mRNA data sets and a further 147 (1%) matched a single location in the intronic or intergenic sequences. The remaining 387 (3%) tags that failed to match the data sets could have been derived from sequences spanning an unpredicted intron/exon boundary or from unsequenced regions of the genome or they may represent experimental artifacts.
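Theoretical anti-sense tags can be generated by extracting tags from the reverse complement of each sequence. The sketch below illustrates the idea under stated assumptions: the function names and the toy sequence are hypothetical, and this is not the software used in the study. Because the NlaIII site CATG is its own reverse complement, every site can in principle yield one sense tag and one anti-sense tag.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def all_tags(seq, anchor="CATG", tag_len=14):
    """All full-length tags (anchor plus downstream bases) on the given strand."""
    seq = seq.upper()
    tags = []
    pos = seq.find(anchor)
    while pos != -1:
        if pos + tag_len <= len(seq):
            tags.append(seq[pos:pos + tag_len])
        pos = seq.find(anchor, pos + 1)
    return tags


def antisense_tags(seq):
    """Tags that a transcript from the opposite strand would produce."""
    return all_tags(reverse_complement(seq))


# Toy sequence (invented) with a single NlaIII site.
toy = "GGTTAACCGG" + "CATG" + "ACGTTTGGAA"
print(all_tags(toy))        # sense-strand tag
print(antisense_tags(toy))  # anti-sense tag from the same site
```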
The majority of these anti-sense SAGE tags were detected at low levels that may be expected if anti-sense molecules perform a regulatory function. However, some anti-sense tags were found among the most abundant tags observed. For example, the tag CATGGTCTCTCCAG was present 93 times and was unambiguously matched to At2g37220 in the anti-sense orientation.
We analyzed leaf tissue expression profiles detected using the alternate platforms of MPSS (Meyers et al., 2004
The analysis was extended to include all experimental SAGE tags to provide an estimate for the level of anti-sense transcription in Arabidopsis as some anti-sense molecules may be physiologically active at low concentrations. A total of 1,165 of the remaining unmatched distinct SAGE tags were assigned a match to an annotated Arabidopsis gene in an anti-sense direction after iterative matching to the sense strand. This identified 966 Arabidopsis genes with anti-sense transcription of which 518 were also detected by either MPSS or microarray, with 191 genes common across all three technologies.
Efficacy of SAGE Tag-to-Gene Matching within Arabidopsis
Arabidopsis is a widely used model for the study of plant biology due to its small genome size (approximately 120 Mb), low amount of repetitive DNA and fast generation time. Over the years, a large collection of associated genetic and genomics resources has been amassed and these have been augmented by the completion of the whole genome sequence (Arabidopsis Genome Initiative, 2000
Conceptual transcripts were constructed for 12,031 nuclear genes by the addition of a 5' and 3' UTR of defined length to obtain the correct canonical tag from predicted Arabidopsis genes that have no annotated UTR sequences. This approach has recently been verified by work in D. melanogaster where it was demonstrated that 52% of canonical SAGE tags were located within the UTR sequence (Pleasance et al., 2003
Almost all Arabidopsis nuclear and organelle genes (98%) can be distinguished using NlaIII based SAGE analysis and 79% of theoretical canonical tags, which are considered the most biologically relevant, can be unambiguously assigned to their gene of origin (Table I). It is necessary to consider noncanonical theoretical tags to identify the origin of unmatched tags in any SAGE experiment. In Arabidopsis, the specificity of tag matching (27%) was marginally lower than that observed in other nonplant model species (35% in D. melanogaster and C. elegans; Pleasance et al., 2003
Previous SAGE analyses in Arabidopsis have utilized the available UniGene data set for tag-to-gene matching (Ekman et al., 2003
An iterative process was employed to allow comprehensive tag-to-gene matching for Arabidopsis leaf tissue. The high level of SAGE tag quantitation makes this one of the most comprehensive SAGE analyses of global gene expression in a plant species to date. Conservative sequence quality analysis was also employed to alleviate problems associated with the single pass sequencing of SAGE data; this allowed 145,170 (79%) SAGE tags to be analyzed with confidence, of which 29,632 were unique.
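The quality threshold used in this study, a Phred score of 20, corresponds to a base-calling error probability of 1% under the standard Phred definition, Q = -10 log10(P). The sketch below illustrates that relationship and a simple threshold filter. The function names and toy scores are assumptions; only the two tag totals are taken from the text.

```python
def phred_to_error_probability(q):
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def keep_reliable(scored_tags, min_phred=20):
    """Keep tags whose quality score is at least `min_phred`.

    scored_tags: iterable of (tag, score) pairs; how the score is
    summarized per tag is left open here (an assumption of this sketch).
    """
    return [tag for tag, score in scored_tags if score >= min_phred]


# Toy tags with invented quality scores.
toy_scored = [("CATGAAAAAAAAAA", 35.0),
              ("CATGCCCCCCCCCC", 12.5),
              ("CATGGGGGGGGGGG", 20.0)]
reliable = keep_reliable(toy_scored)

# Totals reported in the text: 184,580 tags extracted, 145,170 retained.
retained_percent = round(100 * 145170 / 184580)

print(phred_to_error_probability(20), reliable, retained_percent)
```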
Almost 80% of the 12,988 tags that were observed two or more times could be unambiguously matched to a single genic location in the sense orientation, and these matches identified 8,293 different annotated Arabidopsis genes. Of these matches, 84% were to nuclear transcripts comprising annotated UTR sequence. A total of 12,934 genes were uniquely identified by extending the tag-to-gene matching to incorporate the singleton data. Additionally, a greater proportion of these matches were assigned to the conceptual transcripts and pseudo-genes, perhaps reflecting their relative abundance within the transcriptome. Utilizing a comprehensive set of global arrays for Arabidopsis, Yamada et al. (2003)
In Arabidopsis, 93% of the SAGE tags could be assigned a match. However, 16% of these tags were assigned to more than one location within the genome (Table II, Cumulative Tag Matches). Matching of tags to multiple members of gene families or multiple alternative transcripts of the same gene still allows biological inferences to be made from the SAGE data since the predicted function in each case is identical. However, both the qualitative and quantitative value of the tag data is reduced since further investigation needs to be made on an individual basis. This can be compounded when a tag matches multiple genes with apparently unrelated functions. The majority of tags were matched to a unique genomic location in previously published SAGE analyses (Jones et al., 2001
A minority of the tags observed greater than once (3%) did not match any sequence within the Arabidopsis genome data sets. These tags may originate from unsequenced heterochromatic regions of the Arabidopsis genome, unpredicted intron splice junctions, or they may be artifacts of the SAGE protocol. These questions should be resolved once the genome is completely sequenced and the depth of EST sequencing increases. Nevertheless, the use of the full genome sequence has achieved a more comprehensive tag assignment than other SAGE studies in Arabidopsis where tags with no matches were as high as 70% (Jung et al., 2003
It has been suggested that SAGE tags can assist with the annotation of genomic sequence by facilitating the discovery of novel transcripts and providing confirmatory evidence for hypothetical transcripts (Saha et al., 2002
The nuclear genome is almost 250 times larger than the combined organelle genomes. However, 0.3% of the experimental tags were assigned matches to canonical tags from transcripts present on either the mitochondrial (366.9 kb) or the chloroplast genome (154.4 kb). Together these comprised 3% of the total number of experimental tags extracted in this analysis. The highest level of mRNA synthesis for the organelles was detected in the chloroplast, which reflects the fact that they are present at a high copy number in leaf cells and that the leaf is predominantly a photosynthetic organ. However, there is growing evidence that complex signaling and gene regulation between the organelle and nuclear genomes plays a role in controlling a plant's physiological responses, particularly in interactions with the environment (Pfannschmidt et al., 2001
The dichotomy of SAGE analysis lies in the requirement for comprehensive sequence data. Although SAGE tags can be generated from any organism, the ability to analyze and exploit the data is dependent upon the tags being unambiguously matched to their gene of origin. In the present analysis, the annotated Arabidopsis genome sequence provides an excellent resource with which to exploit SAGE data. Although deference was given to canonical tag matches, all possible NlaIII derived tags were considered, which allowed a number of novel transcripts to be uncovered.
Alternatively processed transcripts were discovered within the SAGE data at a level of between 4% and 17%, the lower level being consistent with previous observations in plants (Haas et al., 2003
An intriguing observation was the presence of anti-sense transcripts within the SAGE data. The anti-sense tags could be explained by experimental errors such as mispriming from internal poly(A) tracts during second strand synthesis of the cDNA, from spurious promoter regions present on the noncoding strand or from illegitimate transcriptional read through from an adjacent gene or pseudo-gene found on the opposite strand (Elrouby and Bureau, 2001
SAGE tags were also detected for a number of annotated pseudo-genes. These presumed nonfunctional paralogs have arisen either through gene duplication, segmental genome duplication, or retrotransposition. The occurrence of pseudo-genes has been estimated to be 19% in the human genome (Dunham et al., 1999
SAGE has proved a valuable tool in Arabidopsis to gain insight into the metabolic processes functioning within leaf tissue. Utilizing the available Arabidopsis genome sequence has allowed a comprehensive assignment of SAGE tags to their particular gene of origin. Although qualified by the evident ambiguity of some of the matches, SAGE has uncovered a number of novel transcripts, the biological significance of which has yet to be established. The theoretical SAGE tags made available through this analysis (http://www.brassica.ca/SAGE/) should assist in the effective matching of experimental SAGE tags in future analyses.
The efficacy of SAGE in Arabidopsis has proved to be similar to that of other model systems. However, for the larger crop genomes, the underlying duplication so ubiquitous in recent plant genome evolution could limit the information captured from such data and may warrant the application of techniques that isolate longer sequence tags such as LongSAGE (Saha et al., 2002
Plant Materials
Leaf tissue was collected from plants grown under axenic conditions to eliminate contaminating sequence from potential pathogens. Arabidopsis (Col-4) seeds were treated with 10 mL of a 30% NaOCl solution for 10 min followed by treatment with an equal volume of 70% ethanol for 5 min before being washed three times for 5 min in an equal volume of dH2O. The axenic seeds were germinated on 0.5x Murashige and Skoog media. Control seedlings were grown at 22°C and 125 µE light. The tissue above the media was harvested after 14 d, including leaf and shoot material, and used for RNA extraction. For cold treated tissue, the Arabidopsis plants were grown as described for the control plants but were exposed to low temperature, 4°C and 125 µE light, for a period of 30 min prior to the tissue being harvested and used for RNA extraction.
Total RNA was extracted from 10 g of Arabidopsis plant tissue using the Total RNA extraction kit and mRNA was purified from 600 mg total RNA using an mRNA purification kit according to the manufacturer's protocol (Amersham Biosciences, Baie d'Urfe, Canada). Five micrograms of poly(A) RNA was used to generate double-stranded cDNA using a cDNA synthesis kit (Gibco-BRL, Gaithersburg, MD) according to the manufacturer's protocol with the exception that a biotinylated oligo(dT) primer was used to prime synthesis of the first strand cDNA.
SAGE procedures were performed according to the original protocol (Velculescu et al., 1995
The experimental SAGE tags were extracted from approximately 4,600 primary sequence reads using cSAGE software developed at Agriculture and Agri-Food Canada (AAFC) Saskatoon. cSAGE is a UNIX tool written in C that allows automated analysis of SAGE sequence data from FASTA files and is freely available upon request. The raw sequence data was processed by cSAGE such that ditags were rejected from further analysis if they were less than 24 bp or greater than 28 bp in length or if the mean sequence quality across each ditag had a Phred (Ewing et al., 1998
A suite of tailored Perl modules was developed to create Arabidopsis sequence data sets for tag matching using the available TIGR XML data (release v4, http://www.tigr.org/tdb/e2k1/ath1/) and GenBank data (http://www.ncbi.nlm.nih.gov) for the chloroplast (accession no. NC_000932) and mitochondrial (accession nos. Y08501 and Y08502) genomes. Listed in order of matching priority, three data sets were generated: (1) the exonic data set, derived from annotated exonic sequences from the nuclear, chloroplast, and mitochondria genomes; (2) the intron data set composed of the annotated immature transcript sequences; and (3) the pseudo-genome data set encompassing the entire Arabidopsis sequence from the nuclear, chloroplast, and mitochondria genomes. It was necessary to generate conceptual transcripts for a subset of the annotated genes in order to utilize the TIGR data. The 17,754 (58%) Arabidopsis genes with experimentally verified UTR sequence were used to calculate the mean length and distribution of the UTRs after removal of all intronic regions. From these distributions, the length sufficient to encompass 95% of each distribution was determined to specify a default 5' and 3' UTR for those genes with limited annotation. These default UTRs were extracted from the Arabidopsis XML files of the pseudo-chromosomes utilizing the coordinates defining each predicted gene to identify the adjacent sequence. The extension of the UTR sequence was terminated at the beginning of the adjacent gene where extension of the predicted gene sequence created overlaps between adjacent genes. However, where data from full-length cDNAs was used to support the gene annotation and overlapping genes were documented these genes remained unaltered. For comparative purposes, data sets were also generated from the Arabidopsis UniGene Build number 41 (as extracted from http://www.ncbi.nlm.nih.gov/UniGene in November 2003).
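The rule for building conceptual transcripts, extending a predicted gene by a default UTR length but stopping at the start of the adjacent gene, can be sketched as a coordinate calculation. This is an illustration only: the function name, the use of 1-based inclusive plus-strand coordinates, and the example positions are assumptions. The authors' implementation was a set of Perl modules that is not reproduced here.

```python
def extend_3prime(gene_end, next_gene_start=None, default_len=500):
    """End coordinate of a conceptual transcript on the plus strand.

    The gene is extended by `default_len` bp (500 bp is the default 3' UTR
    length given in the text) but never into the adjacent downstream gene.
    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    extended = gene_end + default_len
    if next_gene_start is not None:
        extended = min(extended, next_gene_start - 1)
    # If the neighbouring gene already overlaps, leave the gene unaltered.
    return max(extended, gene_end)


print(extend_3prime(1000, 2000))  # full default extension
print(extend_3prime(1000, 1300))  # truncated at the adjacent gene
print(extend_3prime(1000, 900))   # overlapping neighbour, no extension
```

The same calculation, mirrored and using the 350-bp default, would apply to the 5' end and to genes on the minus strand.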
For MPSS we extracted signature sequences from the LEF and LES libraries, both of which were derived from 21-d-old leaf material (http://mpss.udel.edu/at/java.html). We only considered MPSS tags of reliable sequence quality (T) that were present at significant levels (>3 tpm) and had only a single hit to the genome. The sense tags matched to annotated Arabidopsis genes were identified by the classes 1 (inside annotated gene), 2 (within 500-bp 3' of annotated gene), or 7 (within 17 bp of an exon boundary; spliced). The anti-sense tags matched to annotated genes were identified by classes 3 (anti-sense to annotated gene) or 6 (within intron, anti-sense strand).
For global microarray analysis we obtained the expression data described in Yamada et al. (2003)
Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession number GSM30396.
We thank Diana Bekkaoui and Lian Hao for technical assistance, Dr. Larry Pelcher at NRC-Plant Biotechnology Institute, Saskatoon for use of ELVIS, and Drs. Hossein Borhan, Dwayne Hegedus, and Andrew Sharpe, all at Saskatoon Research Centre, for critical reading of the manuscript.
Received March 23, 2004; returned for revision July 14, 2004; accepted July 16, 2004.
1 This work was supported in part by the Genome Prairie project Functional Genomics of Abiotic Stress in Crop Plants and in part by the Agriculture and Agri-Food Canada Canadian Crop Genomics Initiative.
[w] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.104.043406. * Corresponding author; e-mail parkini{at}agr.gc.ca; fax 3069567247.
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796815[CrossRef][Medline] Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, et al (2000) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol 18: 630634[CrossRef][ISI][Medline]
Carter MJ, Milton ID (1993) An inexpensive and simple method for DNA purifications on silica particles. Nucleic Acids Res 21: 1044 Carpousis AJ, Vanzo NF, Raynal LC (1999) mRNA degradation. A tale of poly(A) and multiprotein machines. Trends Genet 15: 2428[CrossRef][ISI][Medline] Cock JM, Swarup R, Dumas C (1997) Natural antisense transcripts of the S locus receptor kinase gene and related sequences in Brassica oleracea. Mol Gen Genet 255: 514524[CrossRef][ISI][Medline] Dolfini S, Consonni G, Mereghetti M, Tonelli C (1993) Antiparallel expression of the sense and antisense transcripts of maize alpha-tubulin genes. Mol Gen Genet 241: 161169[CrossRef][Medline] Dunham I, Shimizu N, Roe BA, Chissoe S, Hunt AR, Collins JE, Bruskiewich R, Beare DM, Clamp M, Smink LJ, et al (1999) The DNA sequence of human chromosome 22. Nature 402: 489495[CrossRef][Medline]
Ekman DR, Lorenz WW, Przybyla AE, Wolfe NL, Dean JF (2003) SAGE analysis of transcriptome responses in Arabidopsis roots exposed to 2,4,6-trinitrotoluene. Plant Physiol 133: 13971406
Elrouby N, Bureau TE (2001) A novel hybrid open reading frame formed by multiple cellular gene transductions by a plant long terminal repeat retroelement. J Biol Chem 276: 4196341968
Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175185
Fizames C, Munos S, Cazettes C, Nacry P, Boucherez J, Gaymard F, Piquemal D, Delorme V, Commes T, Doumas P, et al (2004) The Arabidopsis root transcriptome by serial analysis of gene expression. Gene identification using the genome sequence. Plant Physiol 134: 6780 Gibbings JG, Cook BP, Dufault MR, Madden SL, Khuri S, Turnbull CJ, Dunwell JM (2003) Global transcript analysis of rice leaf and seed using SAGE technology. Plant Biotechnol J 1: 271285[CrossRef][Medline] Gorski SM, Chittaranjan S, Pleasance ED, Freeman JD, Anderson CL, Varhol RJ, Coughlin SM, Zuyderduyn SD, Jones SJ, Marra MA (2003) A SAGE approach to discovery of genes involved in autophagic cell death. Curr Biol 13: 358363[CrossRef][ISI][Medline] Hansen NJ, Kristensen P, Lykke J, Mortensen KK, Clark BF (1995) A fast, economical and efficient method for DNA purification by use of a homemade bead column. Biochem Mol Biol Int 3: 461465
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31: 5654-5666
Hayes R, Kudla J, Gruissem W (1999) Degrading chloroplast mRNA: the role of polyadenylation. Trends Biochem Sci 24: 199-202
Hillier LD, Lennon G, Becker M, Bonaldo MF, Chiapelli B, Chissoe S, Dietrich N, DuBuque T, Favello A, Gish W, et al (1996) Generation and analysis of 280,000 human expressed sequence tags. Genome Res 6: 807-828
Jones SJ, Riddle DL, Pouzyrev AT, Velculescu VE, Hillier L, Eddy SR, Stricklin SL, Baillie DL, Waterston R, Marra MA (2001) Changes in gene expression associated with developmental arrest and longevity in Caenorhabditis elegans. Genome Res 8: 1346-1352
Jung SH, Lee JY, Lee DH (2003) Use of SAGE technology to reveal changes in gene expression in Arabidopsis leaves undergoing cold stress. Plant Mol Biol 52: 553-567
Lash AE, Tolstoshev CM, Wagner L, Schuler GD, Strausberg RL, Riggins GJ, Altschul SF (2000) SAGEmap: a public gene expression resource. Genome Res 10: 1051-1060
Lee JY, Lee DH (2003) Use of serial analysis of gene expression technology to reveal changes in gene expression in Arabidopsis pollen undergoing cold stress. Plant Physiol 132: 517-529
Lorenz WW, Dean JF (2002) SAGE profiling and demonstration of differential gene expression along the axial developmental gradient of lignifying xylem in loblolly pine (Pinus taeda). Tree Physiol 5: 301-310
MacIntosh GC, Wilkerson C, Green PJ (2001) Identification and analysis of Arabidopsis expressed sequence tags characteristic of non-coding RNAs. Plant Physiol 127: 765-776
Matsumura H, Nirasawa S, Terauchi R (1999) Transcript profiling in rice (Oryza sativa L.) seedlings using serial analysis of gene expression (SAGE). Plant J 20: 719-726
Meyers BC, Lee DK, Vu TH, Tej SS, Edberg SB, Matvienko M, Tindell LD (2004) Arabidopsis MPSS. An online resource for quantitative expression analysis. Plant Physiol 135: 801-813
Nam DK, Lee S, Zhou G, Cao X, Wang C, Clark T, Chen T, Rowley JD, Wang SM (2002) Oligo(dT) primer generates a high frequency of truncated cDNAs through internal Poly(A) priming during reverse transcription. Proc Natl Acad Sci USA 99: 6152-6156
Olsen MA, Schechter LE (1999) Cloning, mRNA localization and evolutionary conservation of a human 5-HT7 receptor pseudogene. Gene 227: 63-69
Patankar S, Munasinghe A, Shoaibi A, Cummings LM, Wirth DF (2001) Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of anti-sense transcripts in the malarial parasite. Mol Biol Cell 12: 3114-3125
Pfannschmidt T, Allen JF, Oelmuller R (2001) Principles of redox control in photosynthesis gene expression. Physiol Plant 112: 1-9
Pfannschmidt T (2003) Chloroplast redox signals: how photosynthesis controls its own genes. Trends Plant Sci 8: 33-41
Pleasance ED, Marra MA, Jones SJ (2003) Assessment of SAGE in transcript identification. Genome Res 13: 1203-1215
Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16: 1616-1626
Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE (2002) Using the transcriptome to annotate the genome. Nat Biotechnol 20: 508-512
Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467-470
Schuster G, Lisitsky I, Klaff P (1999) Polyadenylation and degradation of mRNA in the chloroplast. Plant Physiol 120: 937-944
Simpson G, Dijkwel PP, Quesada V, Henderson I, Dean C (2003) FY is an RNA 3' end-processing factor that interacts with FCA to control the Arabidopsis floral transition. Cell 113: 777-787
Vanhee-Brossollet C, Vaquero C (1998) Do natural antisense transcripts make sense in eukaryotes? Gene 211: 1-9
Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270: 484-487
Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE, Hieter P, Vogelstein B, Kinzler KW (1997) Characterisation of the yeast transcriptome. Cell 88: 243-251
Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, et al (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302: 842-846
Zhang L, Zhou W, Velculescu VE, Kern SE, Hruban RH, Hamilton SR, Vogelstein B, Kinzler KW (1997) Gene expression profiles in normal and cancer cells. Science 276: 1268-1272
Zhu W, Schlueter SD, Brendel V (2003) Refined annotation of the Arabidopsis genome by complete expressed sequence tag mapping. Plant Physiol 132: 469-484