Plant Physiol, December 2002, Vol. 130, pp. 1606-1613
UPDATE ON GRAMENE
ABSTRACT
Gramene (http://www.gramene.org) is a comparative genome mapping database for grasses and a community resource for rice (Oryza sativa). It combines a semi-automatically generated database of cereal genomic and expressed sequence tag sequences, genetic maps, map relations, and publications, with a curated database of rice mutants (genes and alleles), molecular markers, and proteins. Gramene curators read and extract detailed information from published sources, summarize that information in a structured format, and establish links to related objects both inside and outside the database, providing seamless connections between independent sources of information. Genetic, physical, and sequence-based maps of rice serve as the fundamental organizing units and provide a common denominator for moving across species and genera within the grass family. Comparative maps of rice, maize (Zea mays), sorghum (Sorghum bicolor), barley (Hordeum vulgare), wheat (Triticum aestivum), and oat (Avena sativa) are anchored by a set of curated correspondences. In addition to sequence-based mappings found in comparative maps and rice genome displays, Gramene makes extensive use of controlled vocabularies to describe specific biological attributes in ways that permit users to query those domains and make comparisons across taxonomic groups. Proteins are annotated for functional significance using gene ontology terms that have been adopted by numerous model species databases. Genetic variants including phenotypes are annotated using plant ontology terms common to all plants and trait ontology terms that are specific to rice. In this paper, we present a brief overview of the search tools available to the plant research community in Gramene.
INTRODUCTION
Gramene is a relational database and Web site that serves as a comparative mapping database for the grasses and a community resource for rice (Oryza sativa; Ware et al., 2002). The U.S. Department of Agriculture (USDA) funded the database in October 2000 to better leverage the newly available rice genomic sequence (Sasaki and Burr, 2000; Yu et al., 2002) as a foundation for the study of comparative genomics of agriculturally important members of the grass family. The first release of Gramene to the community was made in January 2002. Gramene provides an integrated resource of comparative genetic and physical maps between rice and other grasses based upon orthologous sequences and brings structure to the wealth of genetic and phenotypic information available among the grasses.
In this paper, we will walk through the Gramene Web site as a way to familiarize the reader with the main types of information available in Gramene. In addition to interactive Web-based access, all of the raw data sets and annotations are downloadable in bulk form from the Gramene FTP site. Researchers wishing to create mirrors of Gramene or to reuse the Gramene database and visualizations can obtain the Gramene software under open source terms.
MAP SEARCH
The Gramene main navigation bar (Fig. 1A), found on all of the Gramene pages, provides explicit links to various database search tools, submission forms, documentation, and access to data sets found within the Web site. We begin by exploring the grass genetic and comparative maps available under the map search link.
Genetic maps are used as entry points to identify regions of colinearity among the cereals (Ahn and Tanksley, 1993; Gale and Devos, 1998; Feuillet and Keller, 1999). Researchers typically will begin by selecting a single genetic map to display, and will then add additional maps from the same or different species, building up a comparative display (Fig. 1B). One way to enter a map display is to select a map study and a specific linkage group from a set of pull-down menus. Gramene also offers a "feature search" that allows researchers to identify maps that contain a particular feature of interest such as a genetic marker. Maps can be adjusted in a number of ways, including flipping linkage groups and expanding regions to see additional details.
Once a map view is generated, it provides the user with an interactive display containing links to external databases and internal displays within Gramene. For instance, when the researcher selects a genetic marker he/she is taken to a detail page that provides links to additional information related to that marker, such as marker assay conditions or polymorphism rate. If the genetic marker is found on several cereal maps in addition to rice, then the detail page will take the user directly to other species-specific databases, thereby facilitating the biologist's ability to traverse multiple databases and providing interoperability between plant community resources.
The comparative map tool allows multiple maps to be viewed simultaneously, which is useful for both between-species (Fig. 1B) and within-species (Fig. 2A) map comparisons. To add maps to the display, researchers simply select additional maps from the pull-down menus. This can be repeated ad infinitum. As an alternative, researchers who are primarily interested in comparative mapping may enter the map viewer via the matrix view (Fig. 1C), which provides a table with all pairwise combinations of maps and the number of correspondences (markers) between them. By referring to the matrix, a biologist may easily see that the Cornell RFLP map study (Causse et al., 1994; Van Deynze et al., 1998; Wilson et al., 1999) contains the highest number of correspondences between available cereal maps at this time (Fig. 1C) and will likely provide the most utility for cross-species comparisons in Gramene. However, the Japanese Rice Genome Project RFLP map study (Harushima et al., 1998) provides the highest number of correspondences to the rice physical map (Fig. 1C).
Rather than limiting researchers to a single static view of map to map correspondences, our comparative map displays are dynamically generated based upon a set of automated and curated correspondences at the individual marker level. The current correspondences are based on sequence similarity among markers, identity based on marker name, and curated manual correspondences. Each correspondence carries explicit supporting evidence, allowing researchers to understand the strength and significance of the map correspondences and to control the type of correspondences to display.
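The correspondence model described above can be illustrated with a minimal sketch. It is not Gramene's actual schema or code: the `Correspondence` record, the evidence labels, and the map and marker names are all hypothetical, chosen only to show how marker-level correspondences with explicit evidence could drive both a filtered display and the pairwise counts of a matrix view.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One marker-level link between two maps (hypothetical record layout)."""
    map_a: str
    feature_a: str
    map_b: str
    feature_b: str
    evidence: str  # assumed labels: "sequence_similarity", "name_identity", "curated"


def filter_by_evidence(correspondences, allowed):
    """Keep only correspondences whose evidence type is in `allowed`."""
    return [c for c in correspondences if c.evidence in allowed]


def correspondence_matrix(correspondences):
    """Count correspondences for each unordered pair of maps."""
    counts = Counter()
    for c in correspondences:
        pair = tuple(sorted((c.map_a, c.map_b)))
        counts[pair] += 1
    return counts


# Illustrative data only; none of these maps or markers come from Gramene.
EXAMPLE = [
    Correspondence("rice_genetic_1", "m1", "maize_genetic_1", "m1", "name_identity"),
    Correspondence("rice_genetic_1", "m2", "maize_genetic_1", "m2x", "sequence_similarity"),
    Correspondence("maize_genetic_1", "m3", "rice_genetic_1", "m3", "curated"),
    Correspondence("rice_genetic_1", "m1", "rice_fpc", "m1", "name_identity"),
]

if __name__ == "__main__":
    for pair, n in sorted(correspondence_matrix(EXAMPLE).items()):
        print(pair, n)
    print(len(filter_by_evidence(EXAMPLE, {"curated"})), "curated correspondence(s)")
```

Storing the evidence type on every record, rather than keeping one pre-computed alignment per map pair, is what would let a display be regenerated for whichever evidence types a user chooses to show.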
PHYSICAL MAPS
In addition to the side-by-side genetic map display shown in Figure 1B, the Gramene map viewer allows researchers to study the relationship between genetic maps and physical (clone) maps. The current physical maps contained in Gramene are fingerprint contig (FPC) maps generated from digested genomic clones in bacterial artificial chromosome (BAC) or P1 artificial chromosome (PAC) vectors (Chen et al., 2002). In Gramene, the rice FPC map has been enhanced to show the positions of both the original hybridized RFLP markers and sequence-based markers derived from genomic and cDNA clones. In addition, we have placed all available simple sequence repeat (SSR) markers onto the FPC map. By aligning sequence-based genetic markers to the emerging rice sequence from the International Rice Genome Sequencing Project (IRGSP), we are able to produce an integrated map that relates the genetic, physical, and sequence maps of rice. This integrated map, called the "I-Map," allows researchers to move from the genetic maps of maize (Zea mays), wheat (Triticum aestivum), or barley (Hordeum vulgare) to the rice genome sequence.
Figure 2 illustrates how researchers can use the I-Map to move between genetic, physical, and sequence maps within a species. In Figure 2A, the researcher searched for the Waxy locus, wx, and identified it in three genetic maps of rice, which are displayed side by side. The researcher then added the Clemson University Genomics Institute FPC map, thereby identifying a large contig, ctg129, that contains the region spanning Waxy. The researcher then clicked on ctg129, displaying the detailed view shown in Figure 2B (map), along with the tabular text summary shown in Figure 2B (table). The image and the table together provide an entry point to the sequenced clones that span the contig, and indicate that the Waxy locus is most likely to be located in clone AP002542 because this clone contains a genetic marker, S1084, that is close to Waxy on one of the genetic maps. By selecting a clone, the researcher is taken to the detailed display of the annotated sequence as described in the next section.
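The Waxy walk-through amounts to following links from a locus to nearby markers, from markers to a contig, and from the contig to the sequenced clones that carry those markers. The sketch below expresses that traversal with plain dictionaries. Only the identifiers wx, S1084, ctg129, and AP002542 come from the text; the table names, the second clone, and the lookup structure are assumptions made for illustration and do not describe Gramene's internal data model.

```python
# Hypothetical lookup tables standing in for genetic, physical, and sequence maps.
MARKERS_NEAR_LOCUS = {"wx": ["S1084"]}          # genetic map: locus -> close markers
CONTIG_OF_MARKER = {"S1084": "ctg129"}          # FPC map: marker -> contig
CLONES_IN_CONTIG = {"ctg129": ["AP002542", "hypothetical_clone_2"]}
MARKERS_ON_CLONE = {                            # sequence map: clone -> aligned markers
    "AP002542": {"S1084"},
    "hypothetical_clone_2": set(),
}


def candidate_clones(locus):
    """Return (contig, clone, marker) triples for clones carrying a marker near `locus`."""
    hits = []
    for marker in MARKERS_NEAR_LOCUS.get(locus, []):
        contig = CONTIG_OF_MARKER.get(marker)
        for clone in CLONES_IN_CONTIG.get(contig, []):
            if marker in MARKERS_ON_CLONE.get(clone, set()):
                hits.append((contig, clone, marker))
    return hits


if __name__ == "__main__":
    print(candidate_clones("wx"))
```

A clone returned this way is only a candidate: as in the text, it is suggested because it carries a marker that lies close to the locus on a genetic map, not because the locus itself has been placed on the sequence.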
RICE GENOME BROWSER
The rice genome browser in Gramene is a modified version of the Ensembl (Hubbard et al., 2002) genome browser, a tool developed collaboratively by the European Bioinformatics Institute and Sanger Center for the human genome project. In Gramene, the rice genome browser contains two views: an overview (Fig. 3A) and detailed view (Fig. 3B). This browser integrates many types of information relating to the genome, including raw features such as cross-species sequence alignments, and derived annotations, such as predicted gene structures. A partial list of the information currently displayed on the browser includes GenBank annotations from IRGSP, gene predictions based on FGENESH (Salamov and Solovyev, 2000), genetic markers from cereals, rice coding sequences, sequence tagged DNAs ranging from rice BAC ends to expressed sequence tags (ESTs) to SSRs, and clusters derived from ESTs from rice and other cereals. Sequence tagged clones and genetic markers are downloaded from GenBank whereas the clusters (theoretical contigs) were obtained from multiple sources such as The Institute for Genomic Research (Quackenbush et al., 2001) and Beijing Genomics Institute (Yu et al., 2002). All data sources are acknowledged via the "Collaborators" link on the main navigation bar, whereas annotations generated within Gramene are detailed in a materials and methods section available from the "Documentation" link on the navigation bar.
The detailed view (Fig. 3B) allows users to zoom in and out or to move along the genomic clone. When users click on a feature or annotation, they are presented with a drop-down menu (Fig. 3B) which provides links to the detail pages in Gramene as well as to external databases such as GrainGenes or MaizeDB (Polacco and Coe, 1999; Matthews et al., 2003). Users can further customize the display by showing or hiding feature tracks, changing the color or type of feature, or compacting or expanding a track. A biologist can enter the genome browser by several methods: via a text search for a genomic clone or GenBank accession number, via a comparative map view by clicking on a genomic clone or genetic marker, or via the results of a BLAST search of the rice IRGSP clones. A biologist can traverse back to the physical and genetic map views by clicking on genetic markers that are displayed in the overview and detail panels or by following links from sequenced BAC ends that have been aligned to the genome. In the case of the rice PAC clone P0679C08 (Fig. 3A), RFLP markers C76 and S1084 and SSR marker RM190 all provide links back to the genetic maps found in Figure 2A. This image also shows the important role played by rice and cereal ESTs in the genome browser. In this case, the multi-exon gene prediction from IRGSP that is shown at the top of the detailed view is supported by multiple rice and cereal ESTs. These simultaneously validate the prediction and provide a basis for establishing future cross-species relationships. A biologist can obtain detailed information about each gene prediction by clicking on the transcript and selecting "protein information" from the drop-down menu that appears. This will bring the researcher to a protein detail page from the Gramene protein database, another useful resource for the cereal biologist.
THE USE OF ONTOLOGIES
We now discuss the curated portions of Gramene. An important organizing principle of the curated portion of Gramene is its use of ontologies to describe information on proteins, genes, alleles, and phenotypes. Ontologies are controlled vocabularies that are shared by database communities working on different taxa. We use the gene ontology (GO; Ashburner et al., 2000) to describe the attributes of gene products in multiple species, the plant ontology (PO) to describe anatomical features and developmental stages in diverse plant species, and the trait ontology (TO) to describe how, when and where a rice-specific character (or trait) is evaluated or quantified (Jaiswal et al., 2002). The use of controlled vocabularies allows Gramene to support complex queries and allows cross-referencing of information within and between databases (interoperability). For example, the GO is shared among multiple metazoan databases, whereas the PO is in use for Arabidopsis, maize, and rice. The use of shared ontologies allows researchers to relate genes that affect a particular life stage or organ in rice to ones that affect the corresponding life stage or organ in maize or Arabidopsis. Combined with map-based cross-species correspondence, this provides a powerful tool for researchers seeking to find candidate genes for quantitative trait loci and other traits.
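To make the benefit of a shared vocabulary concrete, the following minimal sketch shows how annotating genes from different species with the same ontology term turns a cross-species question into a simple lookup. The gene names and term identifiers are placeholders invented for this example (they are not real GO, PO, or TO accessions), and the flat list of triples is an assumed representation, not Gramene's.

```python
from collections import defaultdict

# (species, gene, ontology term) triples. All values are hypothetical placeholders.
ANNOTATIONS = [
    ("rice", "rice_gene_1", "PO:example_leaf"),
    ("rice", "rice_gene_2", "PO:example_root"),
    ("maize", "maize_gene_1", "PO:example_leaf"),
    ("arabidopsis", "arabidopsis_gene_1", "PO:example_leaf"),
    ("maize", "maize_gene_2", "PO:example_seed"),
]


def genes_by_species(annotations, term):
    """Group the genes annotated with `term` by species."""
    grouped = defaultdict(list)
    for species, gene, annotated_term in annotations:
        if annotated_term == term:
            grouped[species].append(gene)
    return dict(grouped)


def counterparts(annotations, gene):
    """Genes from other species that share at least one ontology term with `gene`."""
    terms = {t for _, g, t in annotations if g == gene}
    own_species = {s for s, g, _ in annotations if g == gene}
    return sorted(
        {g for s, g, t in annotations if t in terms and s not in own_species}
    )


if __name__ == "__main__":
    print(genes_by_species(ANNOTATIONS, "PO:example_leaf"))
    print(counterparts(ANNOTATIONS, "rice_gene_1"))
```

The lookup works only because every species uses the same term for the same organ or stage; with free-text descriptions, the equality test in `genes_by_species` would have to be replaced by error-prone string matching.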
PROTEIN DATABASE
The Gramene protein database is a collection of curated protein entries from all the species, subspecies, and cultivars of Oryza sp. presently available from SPTrEMBL and GenBank. The most recent updates of protein entries were imported en masse from the August 2002 release of SPTrEMBL. The September 2002 release of Gramene has 12,310 entries representing 12,939 GenBank entries. Of these, 5,022 protein entries, or about 40% of the total, have annotations examined and validated by the Gramene curatorial staff.
A biologist will typically enter the protein database by searching for a protein name (e.g. granule-bound glycogen [starch] synthase), a gene name (e.g. waxy), a subspecies name (e.g. indica or japonica), a cultivar/strain name (e.g. Nipponbare or IR36), or a database accession number (e.g. P19395), or by typing the SPTrEMBL database ID (e.g. UGST_ORYSA or Q9ZSU7). Other ways to enter the protein database are by searching for a particular protein domain (e.g. "zinc finger") or a GO term.
The protein detail page (Fig. 4A) displays the name of the protein, the gene name, the EC number (if the protein happens to be an enzyme and has been classified by the International Union for Biochemistry and Molecular Biology), the database cross references to GenBank and SPTrEMBL, and its location on the rice genome, which links to the genome browser. The association section provides curated information on a gene product's molecular function, its role in a biological process, and its possible localization in a cellular component using the GO. The protein entry page also provides detail about the protein family to which the gene product belongs, based on the pair-wise similarity to consensus domains identified by the Pfam and PROSITE databases. In the example of P19395 (Fig. 4A), the protein was assigned to a family of glycosyl transferases (Pfam: PF00534; Glycos_transf_1). Gramene also allows the user to look for other curated entries from rice that share these features by clicking on the URL next to "Other Members of this Family," thus presenting a global overview of entries from all species and germplasm accessions in rice. In this example, there are 21 different protein entries in the database that are members of this family. To further support cross-species analysis, Gramene links to the NCBI BLink service, which searches for homologs and orthologs from rice, Arabidopsis, the grasses, all monocots, dicots, fungi, and/or metazoa.
The protein entry page may also provide information on in silico or experimentally analyzed features such as transmembrane domains, cleavage sites, and so forth. We currently annotate proteins using only GO terms; in the future, however, users will be able to search for information on the functional consequences of alterations in the protein using PO terms to describe affected plant anatomy and growth stages. As shown in Figure 4A, the supporting evidence for each assertion is documented, usually by referring to a literature reference. The same standard for documenting supporting evidence is upheld throughout the curated division of Gramene.
PHENOTYPE SEARCH
In September 2002, Gramene released a new database of rice genes and alleles, composed of classically identified phenotypic variants (marker genes) of rice. The gene and allele database is a curated resource providing collective information about publicly available mutant stocks of rice (Oryza sp.). It includes descriptions of phenotypic variants and alleles associated with morphological, developmental, and agronomically important phenotypes, variants of physiological or morphological characters, biochemical functions, and isozymes. The current version of the rice genes and alleles database houses information on over 400 phenotypic variants curated from the published literature.
The main entry point to the genes and alleles database is the "Phenotype search" link on the Gramene navigation bar. A researcher can search the database using a gene symbol, gene name, or a Gramene accession number. The database also provides a full text search of the phenotype descriptions. The search result presents a comprehensive summary of all the data associated with the particular phenotypic variant in the database and provides links to associated objects. Figure 4B shows the detail page associated with the lesion mimic disease-10 locus lrd10. The page displays lrd10's name, symbol, synonym, phenotypic description, literature references, and two images that demonstrate the phenotype. Other curated information includes information about the gene's allelic variants, the genetic background in which the alleles are observed, and the environmental conditions in which the alleles are assayed. Links to Gramene's sequence, map, and protein divisions are provided when feasible.
The gene and allele database is in the early stages of development, and more phenotypic variants will be curated and useful features will be added in the future. Users are encouraged to send suggestions and comments to help improve the utility of the phenotype search by contacting the Gramene curators by e-mail at gramene{at}gramene.org or by using the feedback button that appears at the bottom of every page.
THIRD-PARTY SUBMISSION OF DATA
The inclusion of curated literature is often limited by time and language constraints. To facilitate data retrieval and integration and to encourage community involvement, a Web-accessible submission form allows community members to submit newly identified mutants or mapped genes that have been accepted for publication or have already been published in scientific journals. The submission forms are available from the Gramene navigation bar under the submission link.
FUTURE ENHANCEMENTS
Over the past two years, Gramene has built an information resource for the grass community to use the rapidly emerging rice genome sequence and the reservoir of genetic maps available from the cereal community. In the next year, we aim to stay current with the emerging rice genome sequence, to continue to enhance existing tools within Gramene, and to continue the development of controlled vocabularies in collaboration with colleagues in the greater biological community. The major new effort will be to curate and incorporate phenotypes associated with quantitative trait loci studies published in the literature. This will substantially enhance the existing genes and alleles database and provide a valuable resource for researchers seeking to identify the genes associated with agronomically significant traits among the cereals.
FOOTNOTES
Received September 24, 2002; returned for revision September 29, 2002; accepted October 7, 2002.
2 These authors contributed equally to the paper.
* Corresponding author; e-mail SRM4{at}cornell.edu; fax 607-255-6683.
1 This work was supported by the USDA Initiative for Future Agriculture and Food Systems (IFAFS) (grant no. 00-52100-9622) and by a USDA-Agricultural Research Service specific cooperative agreement (grant no. 58-1907-0-041).
www.plantphysiol.org/cgi/doi/10.1104/pp.015248.