First published online May 15, 2009; 10.1104/pp.109.138214 Plant Physiology 150:1135-1146 (2009) © 2009 American Society of Plant Biologists OPEN ACCESS ARTICLE
TriFLDB: A Database of Clustered Full-Length Coding Sequences from Triticeae with Applications to Comparative Grass Genomics[C],[W],[OA]
Plant Science Center, RIKEN, Yokohama 230–0045, Japan (K.M., T.Y., T.S., K.S.); and Kihara Institute for Biological Research, Yokohama City University, Yokohama 710–0046, Japan (Y.O.)
The Triticeae Full-Length CDS Database (TriFLDB) contains available information regarding full-length coding sequences (CDSs) of the Triticeae crops wheat (Triticum aestivum) and barley (Hordeum vulgare) and includes functional annotations and comparative genomics features. TriFLDB provides a search interface using keywords for gene function and related Gene Ontology terms and a similarity search for DNA and deduced translated amino acid sequences to access annotations of Triticeae full-length CDS (TriFLCDS) entries. Annotations consist of similarity search results against several sequence databases and domain structure predictions by InterProScan. The deduced amino acid sequences in TriFLDB are grouped with the proteome datasets for Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and sorghum (Sorghum bicolor) by hierarchical clustering in stepwise thresholds of sequence identity, providing hierarchical clustering results based on full-length protein sequences. The database also provides sequence similarity results based on comparative mapping of TriFLCDSs onto the rice and sorghum genome sequences, which together with current annotations can be used to predict gene structures for TriFLCDS entries. To provide the possible genetic locations of full-length CDSs, TriFLCDS entries are also assigned to the genetically mapped cDNA sequences of barley and diploid wheat, which are currently accommodated in the Triticeae Mapped EST Database. These relational data are searchable from the search interfaces of both databases. The current TriFLDB contains 15,871 full-length CDSs from barley and wheat and includes putative full-length cDNAs for barley and wheat, which are publicly accessible. This informative content provides an informatics gateway for Triticeae genomics and grass comparative genomics. TriFLDB is publicly available at http://TriFLDB.psc.riken.jp/.
The recent accumulation of nucleotide sequences for agricultural species, including crops and domestic animals, now permits the application of genome-wide comparative analyses of model organisms with the goal of identifying key genes involved in phenotypic characteristics (Cogburn et al., 2007
Integrative databases that house the sequences of systematically collected full-length cDNA clones have become fundamental initial resources for the bold promotion of the study of genomics in various organisms (Hayashizaki, 2003
The Poaceae are a plant family that includes four major food staple crop species: wheat (Triticum aestivum), maize (Zea mays), rice (Oryza sativa), and barley (Hordeum vulgare). cDNA and/or genome sequence data for crops of the Poaceae have recently been accumulating in the public domain. Completion of the whole-genome sequencing of rice and its curated annotation using full-length cDNA data have benefited comparative plant genomics by increasing our understanding of genome-wide features and accelerating practical cereal breeding (International Rice Genome Sequencing Project, 2005
To integrate our genomic knowledge of plants and facilitate further discoveries, many public databases that contain important plant genomics resources and that have effective interfaces have been established (Supplemental Fig. S1). PlantGDB, The Institute for Genomic Research (TIGR) Gene Indices, TIGR Plant Transcript Assemblies, and HarvEST provide clustered and representative transcript sequences resulting from advances in large-scale EST compilation. Each of these databases is useful not only for the provision of comprehensive transcripts, but also for comparisons among plant species (Liang et al., 2000
Therefore, to fill the gap in our knowledge of full-length CDSs of the Triticeae and, thus, to facilitate comparative grass genomics, we gathered the relational annotations of full-length CDSs of wheat and barley into a new database with the following specific properties. The first property was to provide predicted domain structures as well as other protein domain-oriented annotations of entire amino acid sequences that have been deduced from full-length CDSs and from CDSs clustered with proteome datasets of other plant species. The second was to provide seamless cross references to previously released sequence data resources, which was accomplished by annotating each of the database entries with possible identical sequences and/or counterparts in various transcripts and also by annotating the modeled proteome data resources of plant species, all with related reference links. The aim of this was to integrate knowledge and thus increase our understanding of gene annotations. Third, each of the entries in the database was related to the genetically mapped cDNAs of barley and diploid wheat, which in turn were bidirectionally integrated with TriMEDB. This yields a synergistic data relationship and extends the application of these resources to provide potential genetic positions of full-length transcripts on linkage maps of Triticeae in silico.
Here we describe our novel database. The Triticeae Full-Length CDS Database (TriFLDB) integrates knowledge of full-length CDSs of Triticeae crops with insights into comparative grass genomics. Currently, TriFLDB consists of 8,530 wheat and 7,341 barley putative full-length CDSs and related information. TriFLDB can be accessed via the Web interface at http://TriFLDB.psc.riken.jp/.
Dataset, Design, and Search Interface of TriFLDB
The dataset integrated into the initial version of TriFLDB is summarized in Table I. Full-length CDSs were predicted using the full-length open reading frame (ORF) methods employed in the japonica rice full-length cDNA project (Kikuchi et al., 2003
We integrated full-length CDS data for wheat and barley with various annotations into a database capable of providing insights for comparative genomics. First, we retrieved full-length cDNA data and protein data deduced from full-length CDSs and analyzed them bioinformatically. This yielded sequence annotations, hierarchical protein clustering, and sequence similarity-based mapping of Triticeae full-length CDSs onto the rice and sorghum genomes (Fig. 1).
To access housed full-length CDS entries, TriFLDB provides a Web-based search interface enabling keyword and sequence similarity searches (Fig. 2). It is possible to search with keyword strings from BLAST definitions as well as with identifiers from databases such as PFAM, Prosite, and Panther. Gene Ontology (GO) terms assigned in the InterProScan results can also be used, as can predicted chromosomal locations from TriMEDB (Fig. 2A). National Center for Biotechnology Information (NCBI) BLAST has also been implemented on the TriFLDB Web site. The BLAST service allows users to perform a homology search against multiple sequence datasets. The database for this BLAST service consists of wheat and barley full-length cDNAs and their translated amino acid sequences, as well as the Arabidopsis proteome dataset from The Arabidopsis Information Resource (TAIR) and the Rice Annotation Project Database (RAP-DB) and TIGR rice databases (Fig. 2B). These search interfaces provide users with effective access to Triticeae full-length CDS (TriFLCDS) entries by using various types of queries that are also used in the databases for other plant species. For wheat and barley, this approach allows knowledge from model organisms, such as rice and Arabidopsis, to be used for gene discovery and crop improvement (Bellgard et al., 2004
Annotation of Triticeae Full-Length CDSs
The Web interface displays information on TriFLCDSs that includes the results of CDS predictions and the nucleotide and deduced protein sequences (Fig. 3, A and B). To provide annotations based on sequence similarity, nucleotide sequences of TriFLCDS entries were used as the query to search against the sequence sets provided in various public data resources. Assigning full-length CDSs to clustered representative transcript sequences makes it possible to use complete ORFs, which facilitates the molecular elucidation of CDS function and gene structure. TriFLCDS entries were therefore assigned to clustered, representative transcript sequences of wheat and barley using separate BLASTN searches against the NCBI UniGene, PlantGDB, TIGR Gene Index, and HarvEST databases. In total, 7,030 (95.8%) full-length CDSs from barley and 7,719 (90.5%) from wheat were assigned to at least one representative transcript derived from these clustered cDNA sequence datasets (Supplemental Fig. S2A).
To obtain clues about gene function, TriFLCDS entries were also searched against the annotated protein datasets of Arabidopsis, rice (RAP-DB and TIGR), and sorghum, as well as against representative nonredundant protein data repositories (nr of NCBI and UniProt of the European Bioinformatics Institute [EBI]). We found hits with significant similarity to more than 80% of the TriFLCDS entries in Arabidopsis and to at least 87% in rice and sorghum (Supplemental Fig. S2B). The results of the similarity searches for each of the TriFLCDS entries are shown on the Web interface, and, whenever possible, links to the original data for each hit are provided so as to enable browsing of additional related information (Fig. 3C). For domain-based functional annotation, the deduced protein data were subjected to a domain search using InterProScan. In total, 13,162 (82.9%) entries were assigned to at least one identifier of the database used in InterPro. Using the Web interface, the user can browse each of the results of the domain search, along with the predicted GO classification (Fig. 3, D and E). A synopsis of the results of the similarity search against various sequence resources is shown on the Web interface, and this should allow researchers to determine the annotation status of the searched entries and the predicted annotation of the most likely counterparts in other databases. This should help users to build hypotheses that are related to gene function. To construct a dataset that relates the proteins predicted in TriFLDB to those of other plant species, we grouped TriFLCDSs hierarchically into homologous clusters with the protein datasets for Arabidopsis, rice, and sorghum. Clustering with a 90% identity threshold produced 10,639 clusters containing one or more protein sequences derived from wheat or barley full-length CDSs. 
This indicates that the current version of TriFLDB contains putative full-length CDSs that correspond to more than 10,000 nonredundant genes (Supplemental Table S1). Hierarchically clustered data have been added to TriFLDB and are presented together with information on the domain structure predicted for each protein sequence. This information can be browsed via a Web-based hierarchical structure viewer, which contains annotated domain data as well as hyperlinks to the reference databases (Fig. 4). The interface shows the structure and relationships of the modeled proteomes of Arabidopsis, rice, and sorghum, and includes TriFLDB entries that are clustered according to global amino acid identities. Since all of the TriFLCDS entries in the viewer are reciprocally linked on each annotation page, the user can navigate to the detailed annotation pages of other TriFLCDS entries classified in the same cluster. To provide clues for sequence comparison among clustered proteins, the interface provides four kinds of hyperlinks to internal and external data resources. The user can inspect a multiple alignment of each clustered dataset, which is available in ClustalW format and is shown in a subwindow opened from the alignment hyperlink in each cluster. The protein domain search results from InterProScan for all clustered entries can be browsed, and hyperlinks to the original protein domain knowledge resources are also provided. The domain identifiers listed for each of the sequence entries should allow a clear assessment of which domain structures are shared among the clustered sequences. Hyperlinks to referenced annotations in the modeled proteome datasets of Arabidopsis, rice, and sorghum are also provided to permit seamless comparison of domain structures among clustered genes.
The detailed annotations of each of the TriFLCDS entries that have been inferred via sequence similarity as well as predicted protein domains should facilitate the prediction of possible gene functions, as well as the configuration of further functional analyses and/or the narrowing down of candidate genes in Triticeae.
Genetic localization of full-length CDSs will greatly facilitate the positional cloning of targeted genes in wheat and barley. We related mapped EST markers to full-length cDNA sequences and to CDSs of barley and wheat to generate a table showing the map locations of full-length transcripts in Triticeae. Out of 3,605 mapped cDNAs, 2,182 (60.5%) demonstrated significant similarity to full-length CDSs of either barley or wheat (Table III). TriFLDB entries assigned to mapped wheat and barley cDNA markers can be searched using wheat and barley chromosome names, and relational links are provided on the Web interface together with additional annotations (Fig. 5, A and B). The user can browse information on corresponding cDNA markers at the TriMEDB interface (Fig. 5C) and can search for cDNA markers related to full-length CDSs via the TriMEDB search interface (http://TriMEDB.psc.riken.jp/cgi-bin/TriMEDB/marker_search.pl). The integration of mapped ESTs with full-length CDSs can provide valuable information, especially when accompanied by annotations, such as predictions of whole-gene structure. This information can be used to coordinate nucleotide polymorphism discoveries with marker development. Moreover, genome-scale genotyping will facilitate forward genetic approaches, such as QTL analyses and association studies (Varshney et al., 2006
Assignment and Assembly of Wheat and Barley ESTs into TriFLCDSs
Full-length CDSs are useful for obtaining accurate sequence clusters and for the assembly of cDNA sequences. To determine the relationships between TriFLCDSs and the released ESTs of wheat and barley, we conducted BLAST similarity searches with the ESTs against TriFLCDS entries. Each query EST demonstrating
Comparative Mapping of TriFLCDSs onto the Rice and Sorghum Genomes
To visualize predicted exon-intron structures and the comparative genomic features of Triticeae transcripts in rice and sorghum, sequence similarity-based mapping of TriFLCDSs onto the rice and sorghum genomes was performed. The Generic Genome Browser (Gbrowse; Donlin, 2007
The database structure of the current version of TriFLDB and its relationship with TriMEDB are depicted in a schematic diagram showing the data handling and generated relational datasets with corresponding Web interfaces (Supplemental Fig. S3). The genome resources related to Triticeae species will continue to accumulate (Schulte et al., 2009
This integrative Web-based database interface provides information on putative full-length CDSs of wheat and barley that will facilitate the comparative genomics of grasses. The database should meet the broad demands of researchers who need to search for information related to Triticeae genes with the goal of a greater understanding of Gramineae species. The database should accelerate progress in Triticeae genomics and plant comparative genomics, as well as facilitate molecular breeding programs.
Prediction and Retrieval of Full-Length CDSs
We retrieved cDNA sequences of completely sequenced wheat (Triticum aestivum) full-length cDNAs using a primer walking method with the Phred/Phrap package (Ewing and Green, 1998
The sequences were first checked for sequence contamination and extensive simple repeats using the SeqClean script (http://compbio.dfci.harvard.edu/tgi/software/). Vector sequences were then trimmed using the UniVec_Core database of NCBI (http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html) with the cross_match utility of the Phred/Phrap package. Contamination was identified via BLASTN sequence similarity searches against both the Escherichia coli K12 genome (U00096) and the bacteriophage phi_X174 (J02482) genome sequences. Sequences producing hits with an E value below the threshold of 1e-100 were removed.
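The contamination filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the BLASTN hits are available in the standard 12-column tab-delimited format (query ID in column 1, E value in column 11), and the function names `contaminated_ids` and `drop_contaminants` are hypothetical.

```python
def contaminated_ids(blast_lines, max_evalue=1e-100):
    """Collect query IDs that have at least one hit with an E value below max_evalue.

    Assumes 12-column tabular BLAST output; column 11 (index 10) is the E value.
    """
    flagged = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < max_evalue:
            flagged.add(query_id)
    return flagged


def drop_contaminants(sequences, flagged):
    """Return a copy of the {id: sequence} mapping without the flagged IDs."""
    return {sid: seq for sid, seq in sequences.items() if sid not in flagged}
```

The same pattern would apply to the organelle subtraction step with a different threshold, since only the `max_evalue` argument changes.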
CDS prediction was performed based on the longest ORF using those sequences that had passed through the sequence cleaning step. As supporting information, we used the results for full-length CDS prediction from DECODER (Fukunishi and Hayashizaki, 2001
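The longest-ORF criterion used for CDS prediction can be illustrated with a small sketch. It is a toy under stated assumptions, not DECODER and not the authors' implementation: an ORF is taken here to run from the first ATG in a frame to the next in-frame stop codon, only the standard stop codons are used, and the function name `longest_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_orf(seq, both_strands=False):
    """Return (start, end, strand) of the longest ATG..stop ORF, or None.

    Coordinates are 0-based and end-exclusive on the scanned strand;
    the stop codon is included in the reported span.
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    best = None
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if best is None or end - start > best[1] - best[0]:
                        best = (start, end, strand)
                    start = None  # look for the next ORF in this frame
    return best
```

Requiring both a start and an in-frame stop is what makes a predicted CDS "full length" in this sketch; a sequence with an open-ended ORF returns None.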
To annotate the CDSs of TriFLDB with predicted gene functions, we searched the sequence data against the following protein and nucleotide datasets using the BLAST algorithm (Altschul et al., 1997
Conserved domains in the deduced protein sequence of each TriFLCDS were identified with InterProScan and the InterPro database (http://www.ebi.ac.uk/interpro/). The domain data were also used to assign GO terms to each TriFLCDS, which are also available as search query terms for the TriFLCDSs. Links to each of the original datasets interrelated with the TriFLCDS entries are provided on the TriFLDB Web interface.
The nonredundant set of TriFLCDSs and groupings with other plant proteins on the basis of sequence similarity assists in the identification of the unique genes of Triticeae plants, as well as in acquiring proteins with sequence similarity to those in other plants. Through the use of the CD-HIT package (Li and Godzik, 2006
As of April 15, 2008, the dbEST database of NCBI (NCBI-GenBank Flat File Release 165.0) contained more than 0.5 million entries for barley and more than 1 million for wheat. These sequences were retrieved from GenBank and were cleaned up as follows. First, low-complexity and/or repetitive sequences were removed using SeqClean with the default parameter settings. Repetitive sequence regions of the remaining sequences were identified and masked with RepeatMasker (http://www.repeatmasker.org/), with optional use of the nonredundant Gramineae repeat-sequence dataset derived from TIGR as the target database (Ouyang and Buell, 2004
To provide comparative sequence mapping information for the TriFLCDS entries that were allocated to the genome sequences of rice and sorghum, we mapped the nucleotide sequences of TriFLCDSs onto the genome sequences of rice (International Rice Genome Sequencing Project v. 4, http://rgp.dna.affrc.go.jp/IRGSP/download.html) and sorghum (JGI v. 1.4) based on nucleotide sequence similarity. A combination of BLASTN and SIM4 (Pidoux et al., 2003
Pairwise alignment using SIM4 with default parameter settings was then performed to predict the genomic structure in the comparative alignment between the two sequences that were used as input. The comparative genome mapping results have been implemented in Gbrowse with the gene annotations for rice and sorghum provided by RAP-DB and JGI, respectively. To identify TriFLCDSs that map onto nonannotated regions of each genome, TriFLCDSs homologous to plant organelle sequences were first filtered out; the remaining TriFLCDSs were mapped onto both genomes, and the mapped regions were compared with the genome annotations RAP-DB v. 2 and Sbi 1.4. For the organelle filter, the wheat and barley chloroplast genomes (AB042240, EF115541) and the wheat mitochondrial genome (AP008982) were searched using BLASTN with a threshold E value of less than 1e-20 to subtract possible FLCDSs derived from the organelle genomes.
To assign genetically mapped ESTs to the full-length transcripts of the TriFLDB entries, we searched the dataset of 15,871 TriFLCDS nucleotide sequences with the mapped EST markers housed in TriMEDB (http://TriMEDB.psc.riken.jp/) using BLASTN with a threshold e value of less than 1e-130. The table of relationships between the mapped ESTs and the full-length transcripts generated by this homology search was imported into TriMEDB as a database for Cmap (http://gmod.org/wiki/Cmap) to visualize linkage map images. The comparative data from the mapping of cDNA markers of TriMEDB onto the rice genome were also integrated into the Gbrowse interface of TriFLDB. Cross referencing between the Web interfaces of TriMEDB and TriFLDB was also implemented.
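A relationship table of this kind can be sketched from tabular BLASTN output. The sketch below makes several assumptions that the text does not state: hits are in the standard 12-column tab-delimited format with the mapped EST marker as the query and the TriFLCDS entry as the subject, and only the highest-scoring hit per marker is kept. The function name `best_hits` is hypothetical.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-130):
    """Map each marker ID to its best (cds_id, evalue, bitscore) hit.

    Assumes 12-column tabular BLAST output: query, subject, ..., E value
    (index 10), bit score (index 11). Hits at or above max_evalue are ignored.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        marker, cds = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= max_evalue:
            continue
        if marker not in best or bitscore > best[marker][2]:
            best[marker] = (cds, evalue, bitscore)
    return best
```

Keeping one hit per marker yields a simple marker-to-CDS lookup; retaining every hit under the threshold would be the alternative if several TriFLCDS entries should inherit the same map position.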
The following materials are available in the online version of this article.
The authors thank Dr. K. Sato of Okayama University, Japan, for permitting the integration of released data into TriFLDB. The authors also thank Dr. T. Close of the University of California for permitting integration of the released data from HarvEST barley v. 1.68 to update TriMEDB. We also thank Dr. Y. Hayashizaki of the RIKEN Omics Science Center for DECODER.
Received March 7, 2009; accepted May 8, 2009; published May 15, 2009.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Kazuo Shinozaki (sinozaki{at}rtc.riken.jp).
[C] Some figures in this article are displayed in color online but in black and white in the print edition.
[W] The online version of this article contains Web-only data.
[OA] Open access articles can be viewed online without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.109.138214
* Corresponding author; e-mail sinozaki{at}rtc.riken.jp.
Alexandrov NN, Brover VV, Freidin S, Troukhan ME, Tatarinova TV, Zhang H, Swaller TJ, Lu YP, Bouck J, Flavell RB, et al (2009) Insights into corn genes derived from large-scale cDNA sequencing. Plant Mol Biol 69: 179–194
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402
Bellgard M, Ye J, Gojobori T, Appels R (2004) The bioinformatics challenges in comparative analysis of cereal genomes: an overview. Funct Integr Genomics 4: 1–11
Bossolini E, Wicker T, Knobel PA, Keller B (2007) Comparison of orthologous loci from small grass genomes Brachypodium and rice: implications for wheat genomics and grass genome annotation. Plant J 49: 704–717
Carollo V, Matthews DE, Lazo GR, Blake TK, Hummel DD, Lui N, Hane DL, Anderson OD (2005) GrainGenes 2.0. An improved resource for the small-grains community. Plant Physiol 139: 643–651
Childs KL (2009) Genomic and genetic database resources for the grasses. Plant Physiol 149: 132–136
Childs KL, Hamilton JP, Zhu W, Ly E, Cheung F, Wu H, Rabinowicz PD, Town CD, Buell CR, Chan AP (2007) The TIGR Plant Transcript Assemblies database. Nucleic Acids Res 35: D846–D851
Close TJ, Wanamaker S, Roose ML, Lyon M (2007) HarvEST: an EST database and viewing software. Methods Mol Biol 406: 161–178
Cogburn LA, Porter TE, Duclos MJ, Simon J, Burgess SC, Zhu JJ, Cheng HH, Dodgson JB, Burnside J (2007) Functional genomics of the chicken—a model organism. Poult Sci 86: 2059–2094
Conte MG, Gaillard S, Lanau N, Rouard M, Perin C (2008) GreenPhylDB: a database for plant comparative genomics. Nucleic Acids Res 36: D991–D998
Dong Q, Kroiss L, Oakley FD, Wang BB, Brendel V (2005) Comparative EST analyses in plant systems. Methods Enzymol 395: 400–418
Donlin MJ (2007) Using the Generic Genome Browser (GBrowse). Curr Protoc Bioinformatics Chapter 9: Unit 9.9
Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V (2008) PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res 36: D959–D965
Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186–194
Faris JD, Zhang Z, Fellers JP, Gill BS (2008) Micro-colinearity between rice, Brachypodium, and Triticum monococcum at the wheat domestication locus Q. Funct Integr Genomics 8: 149–164
Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, et al (2008) Ensembl 2008. Nucleic Acids Res 36: D707–D714
Fukunishi Y, Hayashizaki Y (2001) Amino acid translation program for full-length cDNA sequences with frameshift errors. Physiol Genomics 5: 81–87
Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki H, Baldarelli R, Hayashizaki Y, Okazaki Y (2003) CDS annotation in full-length cDNA sequence. Genome Res 13: 1478–1487
Hayashizaki Y (2003) RIKEN mouse genome encyclopedia. Mech Ageing Dev 124: 93–102
Horan K, Lauricha J, Bailey-Serres J, Raikhel N, Girke T (2005) Genome cluster database: a sequence family analysis platform for Arabidopsis and rice. Plant Physiol 138: 47–54
Huang X, Madan A (1999) CAP3: a DNA sequence assembly program. Genome Res 9: 868–877
Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi KO, Barrero RA, Tamura T, Yamaguchi-Kabata Y, Tanino M, et al (2004) Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol 2: e162
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793–800
Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, Antonio BA, Aono H, Apweiler R, Bruskiewich R, et al (2007) Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res 17: 175–183
Jaiswal P, Ni J, Yap I, Ware D, Spooner W, Youens-Clark K, Ren L, Liang C, Zhao W, Ratnapu K, et al (2006) Gramene: a bird's eye view of cereal genomes. Nucleic Acids Res 34: D717–D723
Jia J, Fu J, Zheng J, Zhou X, Huai J, Wang J, Wang M, Zhang Y, Chen X, Zhang J, et al (2006) Annotation and expression profile analysis of 2073 full-length cDNAs from stress-induced maize (Zea mays L.) seedlings. Plant J 48: 710–727
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, et al (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301: 376–379
Lai J, Dey N, Kim CS, Bharti AK, Rudd S, Mayer KF, Larkins BA, Becraft P, Messing J (2004) Characterization of the maize endosperm transcriptome and its comparison to the rice genome. Genome Res 14: 1932–1937
Lee Y, Quackenbush J (2003) Using the TIGR gene index databases for biological discovery. Curr Protoc Bioinformatics Chapter 1: Unit 1.6
Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658–1659
Liang C, Jaiswal P, Hebbard C, Avraham S, Buckler ES, Casstevens T, Hurwitz B, McCouch S, Ni J, Pujar A, et al (2008) Gramene: a growing plant comparative genomics resource. Nucleic Acids Res 36: D947–D953
Liang F, Holt I, Pertea G, Karamycheva S, Salzberg SL, Quackenbush J (2000) An optimized protocol for analysis of EST sequences. Nucleic Acids Res 28: 3657–3665
Maeda N, Kasukawa T, Oyama R, Gough J, Frith M, Engstrom PG, Lenhard B, Aturaliya RN, Batalov S, Beisel KW, et al (2006) Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. PLoS Genet 2: e62
Mitchell RA, Castells-Brooke N, Taubert J, Verrier PJ, Leader DJ, Rawlings CJ (2007) Wheat Estimated Transcript Server (WhETS): a tool to provide best estimate of hexaploid wheat transcript sequence. Nucleic Acids Res 35: W148–W151
Mochida K, Kawaura K, Shimosaka E, Kawakami N, Shin IT, Kohara Y, Yamazaki Y, Ogihara Y (2006) Tissue expression map of a large number of expressed sequence tags and its application to in silico screening of stress response genes in common wheat. Mol Genet Genomics 276: 304–312
Mochida K, Saisho D, Yoshida T, Sakurai T, Shinozaki K (2008) TriMEDB: a database to integrate transcribed markers and facilitate genetic studies of the tribe Triticeae. BMC Plant Biol 8: 72
Ouyang S, Buell CR (2004) The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants. Nucleic Acids Res 32: D360–D363
Ozdemir BS, Hernandez P, Filiz E, Budak H (2008) Brachypodium genomics. Int J Plant Genomics 2008: 536104
Paterson AH (2008) Genomics of sorghum. Int J Plant Genomics 2008: 362451
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, et al (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551–556
Paterson AH, Freeling M, Sasaki T (2005) Grains of knowledge: genomics of model cereals. Genome Res 15: 1643–1650
Paux E, Sourdille P, Salse J, Saintenac C, Choulet F, Leroy P, Korol A, Michalak M, Kianian S, Spielmeyer W, et al (2008) A physical map of the 1-gigabase bread wheat chromosome 3B. Science 322: 101–104
Pidoux AL, Richardson W, Allshire RC (2003) Sim4: a novel fission yeast kinetochore protein required for centromeric silencing and chromosome segregation. J Cell Biol 161: 295–307
Ralph SG, Chun HJ, Kolosova N, Cooper D, Oddy C, Ritland CE, Kirkpatrick R, Moore R, Barber S, Holt RA, et al (2008) A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis). BMC Genomics 9: 484
Sakurai T, Satou M, Akiyama K, Iida K, Seki M, Kuromori T, Ito T, Konagaya A, Toyoda T, Shinozaki K (2005) RARGE: a large-scale database of RIKEN Arabidopsis resources ranging from transcriptome to phenome. Nucleic Acids Res 33: D647–D650
Sato S, Nakamura Y, Asamizu E, Isobe S, Tabata S (2007) Genome sequencing and genome resources in model legumes. Plant Physiol 144: 588–593
Sato K, Shin IT, Seki M, Shinozaki K, Yoshida H, Takeda K, Yamazaki Y, Conte M, Kohara Y (2009) Development of 5006 full-length cDNAs in barley: a tool for accessing cereal genomics resources. DNA Res 16: 81–89
Sato S, Tabata S (2006) Lotus japonicus as a platform for legume research. Curr Opin Plant Biol 9: 128–132
Schulte D, Close TJ, Graner A, Langridge P, Matsumoto T, Muehlbauer G, Sato K, Schulman AH, Waugh R, Wise RP, et al (2009) The international barley sequencing consortium—at the threshold of efficient access to the barley genome. Plant Physiol 149: 142–147
Seki M, Shinozaki K (2009) Functional genomics using RIKEN Arabidopsis thaliana full-length cDNAs. J Plant Res (in press)
Spannagl M, Noubibou O, Haase D, Yang L, Gundlach H, Hindemitt T, Klee K, Haberer G, Schoof H, Mayer KF (2007) MIPSPlantsDB—plant database resource for integrative and comparative plant genome research. Nucleic Acids Res 35: D834–D840
Tanaka T, Antonio BA, Kikuchi S, Matsumoto T, Nagamura Y, Numa H, Sakai H, Wu J, Itoh T, Sasaki T, et al (2008) The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res 36: D1028–D1033
Tochitani S, Hayashizaki Y (2007) Functional screening revisited in the postgenomic era. Mol Biosyst 3: 195–207
Varshney RK, Hoisington DA, Tyagi AK (2006) Advances in cereal genomics and applications in crop breeding. Trends Biotechnol 24: 490–499
Wall PK, Leebens-Mack J, Muller KF, Field D, Altman NS, dePamphilis CW (2008) PlantTribes: a gene and gene family resource for comparative genomics in plants. Nucleic Acids Res 36: D970–D976
Ware D (2007) Gramene: a resource for comparative grass genomics. Methods Mol Biol 406: 315–330
Yamasaki C, Murakami K, Fujii Y, Sato Y, Harada E, Takeda J, Taniya T, Sakate R, Kikugawa S, Shimada M, et al (2008) The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts. Nucleic Acids Res 36: D793–D799
Zhang H, Sreenivasulu N, Weschke W, Stein N, Rudd S, Radchuk V, Potokina E, Scholz U, Schweizer P, Zierold U, et al (2004) Large-scale analysis of the barley transcriptome based on expressed sequence tags. Plant J 40: 276–290
Zhu W, Buell CR (2007) Improvement of whole-genome annotation of cereals through comparative analyses. Genome Res 17: 299–310