Plant Physiol, December 2002, Vol. 130, pp. 1585-1586

UPDATE ON GENOMICS
Current Status of the Sequence of the Rice Genome and Prospects for Finishing the First Monocot Genome1


C. Robin Buell*

The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850


    ARTICLE

Rice (Oryza sativa) is the first grass species to be sequenced, and as of September 2002, there are four draft genome sequences available. All four drafts are available to the academic community, although two drafts have some limitations with respect to access and distribution. Although none of the four draft sequences is complete, they collectively provide our first view of the landscape and the content of a monocot genome.

The first rice genome sequence made accessible in large tracts was that of O. sativa subsp japonica cv Nipponbare, generated by the International Rice Genome Sequencing Project (IRGSP; Sasaki and Burr, 2000), an international consortium of public laboratories. Using a bacterial artificial chromosome (BAC)-by-BAC approach, the IRGSP has generated draft sequence for 3,083 BAC or P1 artificial chromosome (PAC) clones, which is available through GenBank/DNA Data Bank of Japan (DDBJ)/EMBL (as of September 17, 2002). These 3,083 BAC/PAC clones represent 426 Mb of sequence; assuming an overlap of 15% between the clones, this corresponds to 362 Mb of unique sequence. With an estimated genome size of 430 Mb (Arumuganathan and Earle, 1991), this represents 84% of the rice genome. Alignment of the IRGSP sequence with 13,895 sequenced genetic markers reveals that 11,442 markers can be anchored to a BAC/PAC clone using high-stringency criteria (http://www.tigr.org/tdb/e2k1/osa1/BACmapping/description.shtml), indicating that, based on coverage of markers, the IRGSP sequence represents 82% of the genome. A graphic depiction of the anchoring of the BAC/PAC clones to the chromosomes can be viewed at http://www.tigr.org/tdb/e2k1/osa1/BACmapping/description.shtml. Most of each chromosome is clearly represented; the exceptions are regions with few or no genetic markers with which to anchor the BAC/PAC clones. Likewise, regions where it is technically difficult to identify BAC/PAC clones (telomeres, centromeres, and nucleolar-organizing regions) are under-represented in the IRGSP sequence.
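The two coverage estimates in the preceding paragraph follow from simple arithmetic. The Python sketch below reproduces them from the figures quoted in the text; the variable and function names are illustrative only, and the 15% overlap is the assumption stated above rather than a measured value.

```python
# Figures quoted in the text (September 2002); names are illustrative.
TOTAL_CLONE_SEQUENCE_MB = 426   # summed sequence of the 3,083 BAC/PAC clones
ASSUMED_OVERLAP = 0.15          # assumed overlap between clones (an assumption)
GENOME_SIZE_MB = 430            # estimate of Arumuganathan and Earle (1991)
MARKERS_TOTAL = 13_895          # sequenced genetic markers aligned
MARKERS_ANCHORED = 11_442       # markers anchored to a BAC/PAC clone


def unique_sequence_mb(total_mb, overlap):
    """Discount the total clone sequence by the assumed overlap fraction."""
    return total_mb * (1.0 - overlap)


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


unique_mb = unique_sequence_mb(TOTAL_CLONE_SEQUENCE_MB, ASSUMED_OVERLAP)
length_coverage = percent(unique_mb, GENOME_SIZE_MB)
marker_coverage = percent(MARKERS_ANCHORED, MARKERS_TOTAL)

print(f"unique sequence: {unique_mb:.0f} Mb")
print(f"coverage by sequence length: {length_coverage:.0f}%")
print(f"coverage by anchored markers: {marker_coverage:.0f}%")
```

The two estimates rest on different assumptions (clone overlap and genome size for the first, marker distribution for the second), so their close agreement at 84% and 82% lends some support to both.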

Although the majority of the IRGSP sequence is draft sequence, approximately a third of the sequence is finished (1,023 BAC/PAC clones as of September 12, 2002; http://www.tigr.org/tdb/e2k1/osa1/BACmapping/description.shtml). In fact, manuscripts describing the sequence, annotation, and analysis of chromosomes 1 and 4 are in press (T. Sasaki and B. Hin, personal communication) and a manuscript on chromosome 10 is in preparation (C.R. Buell, W. McCombie, J. Messing, and R.A. Wing, personal communication) highlighting the role of the IRGSP in finishing the rice genome. In addition, the overall quality of draft sequence generated by the IRGSP is high with the bulk of the sequence being 10×, phase 2 sequence, with 10× being the level of sequence coverage and phase 2 reflecting the fact that the contigs are ordered and oriented when deposited in GenBank (http://www.ncbi.nlm.nih.gov/HTGS/). Although the immediate goal of the IRGSP is completion of a phase 2 draft of the rice genome by the end of 2002 (http://rgp.dna.affrc.go.jp/rgp/press_conference.html), the ultimate goal is that of a finished rice genome.
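As a quick check on the "approximately a third" figure, the finished fraction can be computed from the clone counts given above. This is only a sketch: it counts clones rather than bases, and the two counts were taken a few days apart (September 12 and September 17, 2002), so the result is approximate.

```python
# Clone counts quoted in the text; names are illustrative.
FINISHED_CLONES = 1_023   # finished BAC/PAC clones (September 12, 2002)
TOTAL_CLONES = 3_083      # all sequenced BAC/PAC clones (September 17, 2002)

finished_fraction = FINISHED_CLONES / TOTAL_CLONES
print(f"finished clones: {finished_fraction:.1%}")
```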

Annotation for the IRGSP BAC/PAC clones is available for finished clones in GenBank/DDBJ/EMBL. Annotation data for unfinished sequences are generated through automated annotation processes and are available from The Institute for Genomic Research (http://www.tigr.org/tigr-scripts/e2k1/irgsp.spl) and the Rice Genome Program (http://rgp.dna.affrc.go.jp/giot/INE.html). Although manually curated annotation is always preferred over automated annotation, automated annotation is a valuable resource for sequences that are not yet finished. Other analyses of the rice genome, such as alignment with expressed sequence tags from other monocot species, identification of motifs/domains within the rice proteome, analysis of repetitive sequences, and identification of syntenic sequences, are available through several public sources (http://www.tigr.org/tdb/e2k1/osa1/; http://rgp.dna.affrc.go.jp/; http://www.gramene.org).

Draft sequence of Nipponbare, the same japonica cultivar sequenced by the IRGSP, is available from two separate private sources, Pharmacia (Peapack, NJ) and Syngenta (San Diego). The Pharmacia draft sequence was generated using a BAC-by-BAC approach and represents 259 Mb of sequence (Barry, 2001). Academic scientists can obtain this draft sequence under an access agreement with Pharmacia (http://www.rice-research.org). An agreement between Pharmacia and the IRGSP has resulted in the incorporation of the Pharmacia BAC clones and sequence into the IRGSP sequence. The Syngenta draft sequence was generated using a whole-genome shotgun sequencing approach and provides 93% coverage of the genome (Goff et al., 2002). This draft sequence is available through a licensing agreement with Syngenta (http://www.tmri.org). Although the Syngenta draft sequence has been annotated, the annotation data are not available to the public. Insights into the rice genome and proteome gained from the Syngenta draft sequence were recently reported by Goff et al. (2002), who estimated that the rice genome encodes between 32,000 and 50,000 proteins. Their comparative analyses revealed not only a high degree of homology between rice genes and those of other cereals but also synteny between rice and other cereals, especially maize (Zea mays). These analyses extend previous studies on the high degree of similarity between rice and other cereals (Gale and Devos, 1998) and further highlight the role rice can have in cereal comparative genomics.

A draft sequence of O. sativa subsp indica cv 93-11 was reported by the Beijing Genomics Institute (BGI; Yu et al., 2001, 2002). This draft, generated through a whole-genome shotgun sequencing approach, represents 360 Mb of assembled sequence and provides a resource not only for gene discovery in rice subsp indica but also for rice comparative genomics. Unlike the Pharmacia and Syngenta draft sequences, the BGI sequence is freely available via the BGI web site (http://btn.genomics.org.cn/rice) and through GenBank/DDBJ/EMBL. An analysis of the BGI sequence suggests that the rice genome encodes between 46,022 and 55,612 proteins (Yu et al., 2002), consistent with the estimate made by Goff et al. (2002). To date, rice has the most genes of any sequenced organism, almost twice as many as the dicotyledonous model plant, Arabidopsis (Arabidopsis Genome Initiative, 2000). In a comparative study between rice and Arabidopsis, rice was found to have a homolog for approximately 81% of the proteins in the Arabidopsis genome (Yu et al., 2002), suggesting substantial overlap in the genes required for basic plant functions in monocots and dicots. However, in the reciprocal comparison, a homolog in Arabidopsis could be found for only one-half of the rice proteins.

Although these four draft sequences provide a rich resource for data mining, they have limitations. The nature of draft sequence, regardless of source, is that it contains errors and is incomplete. The errors can be simple sequencing errors (incorrect bases, low-quality regions) or larger in scale, such as misassemblies. However, the main disadvantage of draft sequence is its incompleteness. Not only can the gene of interest be truncated in the draft sequence because of sequencing gaps, but some portions of the genome are not represented at all. Telomeres, centromeres, and other regions that are difficult to sequence are absent, under-represented, or misassembled in these draft sequences. Thus, studies that require a more complete sequence, such as those of chromosome structure and organization, cannot be performed with these draft sequences.

Obtaining a finished rice genome is necessary if rice is to be used as the base species in comparative genomics in cereals (Goff, 2002; Leach et al., 2002). It is anticipated that the IRGSP will take the lead in finishing the genome. Because the IRGSP has finished one-third of the genome to date and will have the remainder of the genome ready for finishing in December 2002, finishing the rice genome seems to be on a positive track. In addition, even once the rice genome is finished and annotation deposited in GenBank at the BAC level, it will be essential to have annotation of the genome at a level comparable with that of Arabidopsis if rice is to be leveraged to other monocot genomes. Construction of pseudomolecules (reference molecules) of the chromosomes, along with uniform, high-quality annotation, will be required for researchers to gain maximal information from the rice genome sequence. Although the rice genome sequence and annotation are incomplete today, multiple centers are contributing to finishing the sequence and to improving the annotation. From these efforts, a truly finished and well-annotated reference sequence for the first cereal species will be available.

    FOOTNOTES

Received September 18, 2002; returned for revision September 25, 2002; accepted September 25, 2002.

1 The work on rice genome sequencing at TIGR was supported by the U.S. Department of Agriculture (grant no. 99-35317-8275), by the National Science Foundation (grant no. DBI998282), and by the U.S. Department of Energy (grant no. DE-FG02-99ER2035).

* E-mail rbuell{at}tigr.org; fax 301-838-0208.

www.plantphysiol.org/cgi/doi/10.1104/pp.014878.


    LITERATURE CITED

  • Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
  • Arumuganathan K, Earle ED (1991) Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9: 208-218
  • Barry GF (2001) The use of the Monsanto draft rice genome sequence in research. Plant Physiol 125: 1164-1165
  • Gale MD, Devos KM (1998) Comparative genetics in the grasses. Proc Natl Acad Sci USA 95: 1971-1974
  • Goff SA (2002) Collaborating on the rice genome. Science 296: 45-46
  • Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100
  • Leach J, McCouch S, Slezak T, Sasaki T, Wessler S (2002) Why finishing the rice genome matters. Science 296: 45
  • Sasaki T, Burr B (2000) International Rice Genome Sequencing Project: the effort to completely sequence the rice genome. Curr Opin Plant Biol 3: 138-141
  • Yu J, Hu S, Wang J, Li S, Wong KSG, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al (2001) A draft sequence of the rice (Oryza sativa ssp. indica) genome. Chin Sci Bull 46: 1937-1942
  • Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92
© 2002 American Society of Plant Biologists


