Plant Physiol, December 2002, Vol. 130, pp. 1594-1597
UPDATE ON THE MAIZE GENOME SEQUENCING PROJECT

INTRODUCTION

On September 20, 2002, the National Science Foundation (NSF) announced the launch of the Maize Genome Sequencing Project. The momentum for this endeavor has been building within the maize (Zea mays) genetics and larger plant science community for several years. Reasons for launching a concerted effort at this time are at least 4-fold. First, advances in DNA sequencing technology now allow faster sequencing at a lower cost than in the past. Second, new high-resolution, high-throughput DNA fingerprinting methods should yield a minimum clone set colinear with the genetic map of the maize genome. Third, promising approaches to preparing fractions of the maize genome enriched for genes have been developed. Fourth, comparative analyses of maize with rice (Oryza sativa) or Arabidopsis suggest that the genome sequences of these two species will not be sufficient to understand the precise details of maize gene content and expression. This Update reviews the project goals and the expected deliverables deriving from the two funded consortia.

WHY SEQUENCE MAIZE?

The cereals, including maize and rice, account for 70% of
food production worldwide, and in addition, maize is an economically important crop in the United States. Maize is also the best-studied and
most tractable genetic system among the cereals, making it the premier
model system for studying this important group of crops, as well as
other monocots. Although cereals are of economic importance and a
greater understanding of their genes will have great impact, much
interesting biology can also be learned from these species. For
example, the recent diversification of the grasses makes them an ideal
collective system for dissecting genetic control of morphological and
genomic diversity (for review, see Kellogg, 2001).
Comparative analyses of several cereal genomes, including maize, rice,
sorghum (Sorghum bicolor), wheat (Triticum aestivum), and barley (Hordeum vulgare), have shown
extensive conservation of gene content and order at the level of the
overall genetic map (Gale and Devos, 1998). However,
local rearrangements often interfere with microsynteny, providing
evidence for the differentiation of grass genomes (Tikhonov et al., 1999;
Keller and Feuillet, 2000; Dubcovsky et al., 2001; Fu and Dooner, 2002;
Li and Gill, 2002; Song et al., 2002).
These rearrangements include tandem gene duplications, small
inversions, and translocations of one or a few genes between
chromosomes. Thus, rice is likely to be too diverged to serve as a
resource for efficient map-based cloning of maize traits. However, once
the structure of the maize genome is better understood in relation to
rice, reciprocal comparative studies will be possible among grasses
(for review, see Freeling, 2001). Having the maize
sequence is also likely to benefit rice genome annotation, as
illustrated by comparisons of the mouse and human genomes
(Gregory et al., 2002).
Our current picture of the maize genome is largely derived from data
generated by projects previously funded by the NSF Plant Genome Research Program. Two deep coverage bacterial
artificial chromosome (BAC) libraries (Cone et al., 2002;
Tomkins et al., 2002) have been produced,
and an integrated genetic/physical map using a high-resolution
agarose fingerprinting method (Cone et al., 2002;
http://www.genome.arizona.edu/fpc/maize/) is being generated.
At present, maize sequence data comprise expressed sequence tags
(ESTs), genome survey sequences (GSSs) from transposon-tagged sites,
random clone insert sequences, a few BAC clone end-sequences (BAC-ends), and sample sequences of genome subclones selected for
hypomethylation or the presence of long open reading frames (ORFs). A
handful of maize BAC clones have also been sequenced completely.
Comparison of predicted maize ORFs from these sequence data with the
Arabidopsis proteome suggests that maize-specific or highly diverged
proteins contribute to a maize proteome that is anticipated to be about
50,000 proteins, or about twice the size of that of Arabidopsis
(Brendel et al., 2002). More than 30% of the
EST-derived ORFs and more than 70% of GSS-derived ORFs of maize do not
match any Arabidopsis protein. Although a number of these
maize ORFs are likely to be present in rice, it will be interesting to
determine how many genes differ between maize and rice. The upcoming sequence
resources should significantly expand our knowledge of the gene space
of flowering plants and should also help to elucidate possible
differences in gene content between the monocot and dicot lineages.

CHALLENGES OF SEQUENCING THE MAIZE GENOME

The maize genome represents a significant new challenge for
sequencing. At 2,500 million bp, the maize genome is about 20 times
larger than that of Arabidopsis, about six times larger than that of
rice, and about the same size as the human genome. However, its
organization is more complex than that of the other genomes sequenced to date.
The genes of maize compose only about 20% of the genome and are
organized into islands of variable size that are scattered throughout a
sea of highly conserved, high-copy retrotransposons and other
repetitive sequences (San Miguel et al., 1996). Given
this complex organization, how should the maize genome be sequenced? On
July 2, 2001, NSF sponsored (DBI-0126620) a workshop in St. Louis to
discuss technical approaches for a Maize Genome Sequencing Project.
This workshop included academic, governmental, and industrial
scientists with expertise in genome analysis as well as observers
representing federal funding agencies (NSF, Department of Energy,
National Institutes of Health, and U.S. Department of Agriculture) and
U.S. corn growers' associations. All participants at the workshop
agreed that genome sequencing and the placement of maize genes on a
cross-referenced physical-genetic map was a feasible, worthy and timely
goal that should be given a high priority. The workshop report
(Bennetzen et al., 2001) described several strategies
that could be used to focus sequencing efforts on gene-rich regions
within complex genomes such as maize. The cost of identifying most of
the maize genes and placing them on the integrated physical and genetic
map was estimated at approximately $52 million.
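
The size comparisons in this section can be turned into rough numbers. The following is a back-of-the-envelope sketch, not a measurement: it uses only the figures quoted above (2,500 million bp, the approximate 20-fold and 6-fold ratios, and the roughly 20% gene fraction), and the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the text.
MAIZE_GENOME_MB = 2500        # "2,500 million bp"
RATIO_TO_ARABIDOPSIS = 20     # "about 20 times larger" than Arabidopsis
RATIO_TO_RICE = 6             # "about six times larger" than rice
GENE_FRACTION = 0.20          # genes "compose only about 20% of the genome"

# Genome sizes implied by the quoted ratios (approximate by construction).
implied_arabidopsis_mb = MAIZE_GENOME_MB / RATIO_TO_ARABIDOPSIS
implied_rice_mb = MAIZE_GENOME_MB / RATIO_TO_RICE

# Split of the maize genome into gene islands and everything else.
gene_space_mb = MAIZE_GENOME_MB * GENE_FRACTION
non_genic_mb = MAIZE_GENOME_MB - gene_space_mb

print(f"Implied Arabidopsis genome: ~{implied_arabidopsis_mb:.0f} Mb")
print(f"Implied rice genome:        ~{implied_rice_mb:.0f} Mb")
print(f"Maize gene space:           ~{gene_space_mb:.0f} Mb")
print(f"Maize non-genic remainder:  ~{non_genic_mb:.0f} Mb")
```

On these figures, the gene space of maize alone (about 500 Mb) would be several times the size of the entire Arabidopsis genome, which is one way to see why strategies that target gene-rich regions were attractive.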
One area highlighted in the fiscal year 2002 NSF Plant Genome Research Program Solicitation (NSF 01-158; http://www.nsf.gov/pubsys/ods/getpub.cfm?nsf01158) was large-scale DNA sequencing of specific regions (e.g. gene-rich regions) or clones of large plant genomes. As a result of this competition, two projects were awarded $10.2 million over 2 years to begin to sequence the maize genome. One project (DBI-0211851: http://www.fastlane.nsf.gov/servlet/showaward?award=0211851) led by Joachim Messing of Rutgers University includes Rod Wing and Cari Soderlund of the University of Arizona, Francis Quetier of Genoscope (Evry, France), Hans-Werner Mewes and Klaus Mayer of the Munich Information Centre for Protein Sequences, and Michele Morgante of DuPont (Wilmington, DE) as a technical consultant. The other project (DBI-0221536: http://www.fastlane.nsf.gov/servlet/showaward?award=0221536), led by Karel Schubert of the Danforth Center (St. Louis), includes Roger Beachy also of the Danforth Center, Cathy Whitelaw and John Quackenbush of The Institute for Genomic Research (TIGR; Rockville, MD), Nathan Lakey of Orion Genomics (St. Louis), and Jeff Bennetzen of Purdue University (West Lafayette).
The project led by Joachim Messing will deliver a
high-resolution, sequence-ready map of the maize genome. This map will
integrate 450,000 fluorescent-based BAC clone fingerprint reads,
450,000 end sequences from 225,000 BACs, and 10× shotgun sequence of
about 140 BACs seeded from about 10 points throughout the genome
(totaling about 20 Mb of sequence or 1% of the genome). The project
led by Karel Schubert will evaluate two gene enrichment technologies ("methylation-filtering" [Rabinowicz et al., 1999]
and "high Cot selection"). One million end
reads from 250,000 clones from each of the methylation-filtered
libraries and the high Cot libraries will be
assembled into contigs, annotated, and placed on the maize and rice
genome maps. Together the two projects will yield the resources listed
as the highest priorities in the maize genome sequencing workshop
report. The integrated outcomes will be a maize sequence resource that
will allow analysis of the overall architecture of the genome,
including the size and distribution of the gene islands, the gene
densities within these, and the range of gene structures. In addition,
these resources will provide a minimal clone set that is colinear with
the genetic map, providing the foundation for future large-scale
sequencing of the maize genome, and the proof-of-concept for new
methods to rapidly and selectively enrich for the genes of any large,
complex plant genome in a cost-effective manner.
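
The deliverables quoted above are internally consistent, as a quick check shows. The sketch below uses only numbers from this section. Two points are assumptions made for illustration: that the 20 Mb of shotgun sequence is spread evenly over the 140 BACs, and that the one million end reads are the combined total across both enriched libraries, with both ends of each clone read.

```python
# Consistency check on the project deliverables quoted in the text.
MAIZE_GENOME_MB = 2500

# Messing project: BAC-end sequencing.
bac_end_sequences = 450_000
bacs_end_sequenced = 225_000
ends_per_bac = bac_end_sequences / bacs_end_sequenced

# Messing project: fully sequenced seed BACs.
seed_bacs = 140
seed_sequence_mb = 20
# Assumption: the 20 Mb is spread evenly across the 140 BACs.
implied_kb_per_bac = seed_sequence_mb * 1000 / seed_bacs
genome_fraction = seed_sequence_mb / MAIZE_GENOME_MB

# Schubert project: gene-enriched libraries.
clones_per_library = 250_000
libraries = 2          # methylation-filtered and high Cot
ends_per_clone = 2     # assumption: both ends of each clone are read
total_end_reads = clones_per_library * libraries * ends_per_clone

print(f"End sequences per BAC:      {ends_per_bac:.0f}")
print(f"Implied sequence per BAC:   ~{implied_kb_per_bac:.0f} kb")
print(f"Fraction of genome covered: {genome_fraction:.1%}")
print(f"Gene-enriched end reads:    {total_end_reads:,}")
```

The computed genome fraction is 0.8%, which matches the "1% of the genome" quoted in the text once rounded.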
A meeting was held at NSF on September 18, 2002, to coordinate the two projects. Discussions at this meeting included how data would be shared between projects and communicated to the public. A single advisory committee will be formed to advise both projects, and frequent conference calls are planned both within and among the consortia. In addition, a single Web site from which all project data and related maize resources can be accessed was proposed.

FINDING PROJECT DATA

It will be important for the project data to be rapidly and
readily accessible to maize researchers and the broader scientific community. The plans for release of each of the major deliverables are
as follows. (a) All sequence data, including trace files, will be
automatically deposited with GenBank within 24 h, or at most
within a week, after production. Thus, public access to the data
will be achieved with the shortest possible turn-around time. (b)
BAC-end sequences from the Messing project will be deposited in dbGSS
(http://www.ncbi.nlm.nih.gov/dbGSS/index.html). (c) A minimum
tiling path of the BACs will be derived and displayed with the
fingerprint contig (FPC) software (Soderlund et al., 2000) at Arizona
Genomics Institute (Tucson; http://www.genome.arizona.edu/fpc/maize/). (d) Sequenced BACs will be initially deposited in the
high throughput genomic sequences division of GenBank
(http://www.ncbi.nlm.nih.gov/HTGS/). (e) Subsequent annotation of a
total of 20 Mb of distinct BAC sequences will be conducted
at Munich Information Centre for Protein Sequences. (f) Trimmed
single-pass end sequences derived from methylation-filtered and high
Cot clones will be submitted to the high
throughput genomic sequences division of GenBank. (g)
Methylation-filtered and high Cot clones will be
available through the Arizona Genomics Institute
(http://www.genome.arizona.edu/fpc/maize/).
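
Item (c) refers to deriving a minimum tiling path, that is, the smallest set of overlapping BACs that spans a contig. The idea can be illustrated with the classic greedy interval-cover strategy shown below. This is a simplified sketch and not the method implemented in FPC: the clone names and coordinates are hypothetical, positions are in arbitrary contig units, and a real tiling path would also require a minimum overlap between neighboring clones rather than allowing them merely to abut.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end) in arbitrary contig units


def minimal_tiling_path(clones: List[Clone], start: int, end: int) -> List[Clone]:
    """Greedy interval cover: choose the fewest clones that span [start, end].

    Raises ValueError if the clones leave a gap in the requested region.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered = start
    i = 0
    while covered < end:
        best = None
        # Among clones that start at or before the covered position,
        # keep the one that extends furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


# Hypothetical contig of five overlapping clones.
example_clones: List[Clone] = [
    ("b0001", 0, 100),
    ("b0002", 20, 150),
    ("b0003", 90, 220),
    ("b0004", 140, 260),
    ("b0005", 210, 300),
]
tiling_path = minimal_tiling_path(example_clones, 0, 300)
print([name for name, _, _ in tiling_path])
```

In this example three of the five clones are enough to span the contig. Clones that are fully bridged by their neighbors are dropped, which is what keeps the number of BACs to be sequenced low.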
Using known maize genes as auxiliary templates, all gene-enriched
sequences will be assembled into contigs at TIGR using programs similar
to the TIGR tools currently used for EST assembly in the production of
TIGR gene indices (Liang et al., 2000). There will be no
restrictions on the public use of these gene assemblies. The FPC maps
and BAC-end sequences will serve as anchors for tying the TIGR
assemblies to the maize genome. Both projects will map their sequence
assemblies to the rice genome for comparative analysis. As the results
of these analyses become available, they will be disseminated through a
single Web site (http://www.maizegenome.org). This site will include
a BLAST server with links from sequences to the individual project
sites and a general description of the overall program with current
progress toward the sequencing and mapping goals. This site will also
provide a forum for community input and feedback.
From experience with Arabidopsis (and virtually all eukaryotic genome projects), annotation of gene structure and functional characterization of gene products will be a time-consuming endeavor that will continue well beyond the initial release of sequence data and their preliminary characterization. However, the informatics plan for this project provides the community with immediate access to all the primary data. Thus, researchers can look forward to a rich data-mining resource for their particular gene or genes of interest from the very beginning of the project.
The U.S. Department of Agriculture (USDA) is funding complementary efforts to integrate cereal genome sequence data such as these into a larger context. A next-generation maize genetics/genomics database (MGDb, http://www.mgdb.org/) under development by Volker Brendel (Iowa State University, in collaboration with the USDA Agricultural Research Service [ARS]) will provide comprehensive access to maize genetic and genomic data (including the new maize genome sequence data from these NSF projects). Gramene (http://www.gramene.org/) is a new data resource for comparative genome analysis in the grasses (Lincoln Stein, Cold Spring Harbor Laboratory; Sam Cartinhour, USDA/ARS, Ithaca; Susan McCouch, Cornell University), which links genome sequence and map data from rice to maps (physical and genetic) and DNA sequence of other cereals (including maize). The convergence of all these efforts will give plant biologists an unprecedented detailed view of plant genome content and organization within the next few years. In addition, it will provide an important public resource for growers and breeders.
Within the coming year, we should have a fuller picture of the architecture of the maize genome that includes the approximate size range and distribution of the gene islands, and an idea of the structure of a typical gene. In addition, we should have the first substantial public maize gene sequence collection that includes information about the promoter, upstream, and downstream sequences. This resource will set the stage for a large-scale sequencing effort. It is essential that the community be engaged in this endeavor from the very beginning. Feedback from the community will be critical for the maintenance, update, and dissemination of the project outcomes. The success of the Maize Genome Sequencing Project will very much depend on the extent to which the community becomes involved.

FOOTNOTES

Received October 2, 2002; returned for revision October 4, 2002; accepted October 7, 2002.
* Corresponding author; e-mail chandler{at}ag.arizona.edu; fax 520-621-7186.
www.plantphysiol.org/cgi/doi/10.1104/pp.015594.