Plant Physiology 139:1612-1624 (2005)
© 2005 American Society of Plant Biologists
Structure and Architecture of the Maize Genome1,[W]
Munich Information Center for Protein Sequences, Institute for Bioinformatics, Gesellschaft für Strahlenforschung Research Center for Environment and Health, D-85764 Neuherberg, Germany (G.H., H.G., K.F.X.M.); Broad Institute of the Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02141 (S.Y., C.R., S.R., B.B., C.N.); Plant Genome Initiative at Rutgers, Waksman Institute, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854 (A.K.B., G.F., J.M.); and Arizona Genomics Institute, University of Arizona, Tucson, Arizona 85721 (E.B., R.A.W.)
Maize (Zea mays or corn) plays many varied and important roles in society. It is not only an important experimental model plant, but also a major livestock feed crop and a significant source of industrial products such as sweeteners and ethanol. In this study we report the systematic analysis of contiguous sequences of the maize genome. We selected 100 random regions averaging 144 kb in size, representing about 0.6% of the genome, and generated a high-quality dataset for sequence analysis. This sampling contains 330 annotated genes, 91% of which are supported by expressed sequence tag data from maize and other cereal species. Genes averaged 4 kb in size with five exons, although the largest was over 59 kb with 31 exons. Gene density varied over a wide range from 0.5 to 10.7 genes per 100 kb and genes did not appear to cluster significantly. The total repetitive element content we observed (66%) was slightly higher than previous whole-genome estimates (58%–63%) and consisted almost exclusively of retroelements. The vast majority of genes can be aligned to at least one sequence read derived from gene-enrichment procedures, but only about 30% are fully covered. Our results indicate that much of the increase in genome size of maize relative to rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana) is attributable to an increase in number of both repetitive elements and genes.
Maize (Zea mays or corn) has a wide variety of uses and broad economic impact. It is a significant food source for humans, a chief ingredient in livestock feed, and is the source of a wide range of manufactured products, including sweeteners, fuel, and adhesives. It also has a long and storied history as a model organism in genetic studies. The combination of its genetic and economic importance has made maize a prime organism for genomic studies (for review, see Messing, 2005
In the absence of a genome sequence, studies of selected regions of the maize genome and comparisons to related species have been carried out. Comparative genetic analyses (Hulbert et al., 1990
Existing data suggest that plant genomes are much more dynamic than similarly related animal genomes in terms of size, gene content, organization, and repeat content (for review, see Messing, 2005
There are a variety of strategies for sequencing whole genomes, and part of the goal of this work was to generate a reference sequence for evaluation of an appropriate sequencing strategy for the maize genome. Suitability of a sequencing strategy to a genome depends on the character of the genome, the state of the technology, and availability of funding. Published strategies include whole-genome shotgun, clone by clone, various reduced representation shotgun (RRS) methods, and various combinations of these (Lander et al., 2001 To this end we randomly selected 100 bacterial artificial chromosomes (BACs) from the genome of the maize inbred line B73 for sequence analysis. They were sequenced to deep coverage and manually curated to derive an accurate consensus. This provided a high-quality reference sequence representing approximately 0.6% of the genome that can serve as a basis for both an unbiased study of genome content and evaluation of potential strategies for sequencing the whole maize genome. Based on the sequence information from this large random sampling, we undertook an assessment of the organization and structure of genes, repeat sequence families, and of the coverage by RRS datasets.
Sequencing and Assembly
With the goal of sampling random regions from the maize genome, we selected 100 BAC clones from inbred B73. To avoid bias, these clones were taken from three different BAC libraries made by using different restriction digests (see Nelson et al., 2005 We sequenced, assembled, and manually curated these clones (see "Materials and Methods") to generate the optimal consensus sequence, producing a high-quality dataset on which all of our analyses are based. After curation, 89 BACs yielded ordered and oriented sequence assemblies, while the remaining 11 clones are not fully ordered. One of these (AC147814) represents typical tandemly repetitive regions associated with cytogenetically defined knobs. Another BAC clone (AC150267) not included in the set of the 100 regions contains ribosomal RNA gene sequences, illustrating that the selection process also yielded clones that are recalcitrant to assembly. Since these regions were selected at random, the BACs have a wide range of sizes (22.6–227.5 kb), with an average of 143.8 kb. The singletons are smaller overall with an average size of 82.5 kb, as compared to 163.2 kb for the mapped clones (Supplemental Table III). This is not surprising, since fingerprints of smaller clones have fewer bands and thus less information content. In selecting a clone path, larger clones would typically be chosen for sequencing. The combined length of the BACs is 14.38 Mb and represents roughly 0.6% of the total maize genome or, for comparison, 3.7% of the rice genome and 12.3% of the Arabidopsis genome (Table I).
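The relationship between the sampled length and the quoted genome fractions can be checked with a short calculation. The sketch below back-computes the genome size implied by each percentage; the function name `implied_genome_size_mb` is illustrative, and the only inputs are the 14.38 Mb total and the rounded percentages given in the text, so the results are approximate.

```python
# Back-of-the-envelope check: genome size implied by "sample is X% of genome".
# Inputs are the rounded figures quoted in the text; outputs are approximate.

SAMPLE_MB = 14.38  # combined length of the 100 BACs

# Fraction of each genome represented by the sample, as quoted in the text.
REPORTED_FRACTIONS = {
    "maize": 0.006,
    "rice": 0.037,
    "Arabidopsis": 0.123,
}


def implied_genome_size_mb(sample_mb: float, fraction: float) -> float:
    """Return the genome size (Mb) implied by a sample covering `fraction` of it."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return sample_mb / fraction


if __name__ == "__main__":
    for species, fraction in REPORTED_FRACTIONS.items():
        size = implied_genome_size_mb(SAMPLE_MB, fraction)
        print(f"{species}: ~{size:.0f} Mb implied by {fraction:.1%}")
```

Because "roughly 0.6%" is a rounded value, the implied maize size comes out near 2.4 Gb, slightly above the 2.3-Gb figure used elsewhere in the article.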
Annotation
Accurate gene annotation of maize sequences poses significant challenges. The presence of transposable elements, whether whole or fragmented, in a genome often leads to overprediction of genes. To counter this, one can remove any repeated sequences from the gene set. However, as a consequence, large gene families can be mistaken for repeat sequences, leading to underprediction of genes. Thus, our annotation methods must strike a careful balance. The 100 BAC clones were annotated using a semiautomated pipeline and additional manual inspection and adjustment of gene models (see "Materials and Methods"). To address the issue of falsely predicted genes, the potential gene models were surveyed for the presence of putative repetitive sequences (Messing et al., 2004 To minimize bias in analyses that extrapolate to the entire maize genome, we defined a high-confidence gene set (HCGS) of full-length genes with end-to-end protein database matches. For example, estimation of gene size or the coverage of genic sequences by reduced representation sequencing methods requires a full-length gene set. We have used both the complete and HCGS gene sets for distinct analyses (see below). In defining the HCGS by manual inspection of protein alignments, we identified a reference subset of 172 genes that were very similar in length and sequence to previously described proteins in the nonredundant database. The average length ratio between a reference protein and its counterpart in the nonredundant database was 0.97 (±0.18 s), and the amino acid identity of the alignment was 0.68 (±0.16 s). In the following, we have used both datasets in our analysis.
We set out to use our annotations to characterize the maize gene set. HCGS gene size falls into a broad distribution ranging from under 1 kb to 59.1 kb, with an average of 4 kb and a median of 2.6 kb. The average gene size for the full set of 330 genes was 3 kb. By comparison, the gene sizes of rice (2.6 kb) and Arabidopsis (2 kb) are significantly smaller (Table I; Arabidopsis Genome Initiative, 2000
Gene density also falls into a broad distribution. We observed between zero and 17 HCGS genes per clone (Fig. 1B). Of the 100 clones, 78 contained at least one gene, while the remaining 22 contained none. Because of the wide range in BAC sizes, it is more appropriate to use a normalized measure of gene density, such as genes per 100 kb (Fig. 1C). Using this measure, gene density varied over an 18-fold range (0.5–10.7) with an average of 1.2 genes/100 kb. Using the full set of 330 predicted genes, the average density increases to 2.3 genes/100 kb or one gene every 43.5 kb. Both of these values are markedly lower than those reported for the rice genome (one gene every 9.9 kb) and for the Arabidopsis genome (one gene every 4 kb; Table I). The broad range observed is in line with previous observations based on the sequencing of a 346-kb region containing the storage protein gene cluster on chromosome 4S in inbred BSSS53 (Song et al., 2001 In using the data from the random BAC clones to estimate the total density and number of genes in the maize genome, one must take into account the variability observed as well as edge effects, since the individual BACs will often contain partial genes. An alternative method would be to use the average HCGS gene length along with the predicted total gene space to calculate an approximate predicted gene number of the maize genome.
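The normalized density figures for the full gene set follow directly from the gene count and the combined BAC length. A minimal sketch of that arithmetic, using only the 330 genes and 14.38 Mb reported in the text (the function names are illustrative):

```python
# Normalized gene density: genes per 100 kb, and the reciprocal spacing in kb.

def genes_per_100kb(gene_count: int, length_kb: float) -> float:
    """Gene density normalized to 100 kb of sequence."""
    if length_kb <= 0:
        raise ValueError("length_kb must be positive")
    return gene_count / length_kb * 100.0


def kb_per_gene(density_per_100kb: float) -> float:
    """Average spacing (kb per gene) for a given density per 100 kb."""
    if density_per_100kb <= 0:
        raise ValueError("density must be positive")
    return 100.0 / density_per_100kb


if __name__ == "__main__":
    total_kb = 14.38 * 1000  # combined length of the 100 BACs, in kb
    density = genes_per_100kb(330, total_kb)
    print(f"full gene set: {density:.1f} genes/100 kb")
    print(f"spacing at 2.3 genes/100 kb: {kb_per_gene(2.3):.1f} kb per gene")
```

The same `genes_per_100kb` function can be applied per clone, which is what makes BACs of very different sizes comparable.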
The full set of 330 genes predicted in this study cover 7% of the nucleotides in the sequenced BACs, leading to the extrapolation that the genic space for the whole 2.3-Gb genome totals approximately 167 Mb. Although this is likely to be an underestimate since partial genes are included, this number is consistent with a previous estimate of 177 Mb for the maize transcriptome (Messing et al., 2004
We assessed the guanine + cytosine (G + C) content of exons and introns using just the HCGS because the high level of conservation of these genes across species means the splice-site locations can be considered high confidence. Most strikingly, there were clear differences in GC content between coding and noncoding (intron and untranslated regions) sequences within genes. Exons varied from 40% to over 75% GC with a mean of 55.4% (Table I). Intronic sequences ranged from 30% to 60% with an average of 42.3%. However, there was no significant difference between the GC content of the HCGS and the full gene set.
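For readers who want to reproduce this kind of comparison on their own annotations, GC content per feature class can be computed as below. This is a sketch under stated assumptions: the sequence and the exon/intron coordinates are toy values, intervals are 0-based and half-open, and ambiguous bases such as N are simply excluded from the denominator; the article does not describe how such bases were handled.

```python
# GC content of exons versus introns for one gene model (toy data).

from typing import Iterable, Tuple

Interval = Tuple[int, int]  # 0-based start, exclusive end


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no unambiguous bases")
    return sum(1 for b in bases if b in "GC") / len(bases)


def region_gc(seq: str, intervals: Iterable[Interval]) -> float:
    """GC content over the concatenation of the given intervals."""
    return gc_content("".join(seq[start:end] for start, end in intervals))


if __name__ == "__main__":
    # Hypothetical gene: exon, intron, exon.
    gene = "GCGCGC" + "ATATGC" + "GGCC"
    exons = [(0, 6), (12, 16)]
    introns = [(6, 12)]
    print(f"exon GC:   {region_gc(gene, exons):.2f}")
    print(f"intron GC: {region_gc(gene, introns):.2f}")
```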
Besides the overall difference in GC content of exons and introns, we also observed a polarity of GC content of both introns and exons decreasing in the direction of transcription, with the translational start marked by a steep increase in GC content (Supplemental Fig. 3). These observations are consistent with findings in rice (Wong et al., 2002
Since expressed sequences provide the most reliable data for confirmation of gene calls, we compared our gene annotation to existing ESTs from maize and other plant genomes. The publicly available maize collection of about 397,000 ESTs has been clustered to 49,991 unigenes, although these clusters include paralogous sequences (Lai et al., 2004a
Interestingly, including ESTs from dicot and gymnosperm species (Arabidopsis, Medicago, and Pinus) yields a very different result. We compared the full set of 330 predicted genes against each EST dataset at both the DNA sequence level (using BLASTN; Altschul et al., 1990
One likely explanation for this trend is that codon usage differs greatly from monocots to dicots and gymnosperms. The marked dissimilarities in GC content between Arabidopsis and maize genes are consistent with large deviations in codon usage. For instance, maize prefers the GCC codon for Ala, while Arabidopsis prefers the GCT codon (Supplemental Table V; Supplemental Fig. 5). Knowledge of codon usage has been critical in the design of transgenes to be expressed in plants. For instance, the huge success with producing maize varieties resistant to European corn borer was largely based on synthesizing a gene for an insect-toxin protein from Bacillus thuringiensis using codons preferred by the plant host (Perlak et al., 1991
The distribution of maize genes relative to repeat sequences has been the object of much interest. Distribution of genes across a sample of 10 clones is shown in Figure 3. Our data show that in almost all cases, single or at most a few genes are separated by repeat elements, although it is possible that larger clusters of genes will be found when longer contiguous sequences become available. This raises some questions about the widely accepted theory that the maize genome consists of gene islands separated by large blocks of repeat elements (SanMiguel and Bennetzen, 1998
Repeat Elements
Our random sample of 0.6% of the maize genome allows us a relatively unbiased view of its repeat content. Previous characterizations of the repeat content of the maize genome were based on genome survey sequences (GSS; Meyers et al., 2001
Analyses based on fully sequenced BACs give us the opportunity to study full-length repeats. BAC sequences were screened for repeat elements using RepeatMasker (www.repeatmasker.org; A.F.A. Smit and P. Green, RepeatMasker, version 2.1) with a customized plant repeat library (http://mips.gsf.de). The underlying repeat sequences were compiled from different sources, clustered into a nonredundant set of 5,707 sequences, and classified by a hierarchical repeat classification scheme. This repeat library was then used to mask and classify repeat sequences in BAC clones. Based on this analysis, we found the known repeat content of the 100 random BACs to be about 66% (Table II), somewhat higher than the estimates of 58% repeat elements from BES representing one-eighth-fold coverage of the genome (Messing et al., 2004
The end sequences of a 50,000-member small insert library of sheared genomic DNA (Whitelaw et al., 2003 As shown by the graphical distribution of repeat sequences, contiguous repetitive regions are frequently interrupted by regions that contain neither repeats nor genes. It is possible that these represent members of repeat or gene families that have degenerated beyond detection or functional sequences not yet well defined. These regions make up more than a quarter of the genome. Of the full set of 330 genes, 34 genes (10.3%; 11.6% for the HCGS) harbor repeats within their introns. The detected repeat types within introns differed significantly from the overall repeat content in maize. About one third of these repeats belong to DNA transposons as compared to 1.28% for the entire genome, indicating a substantial enrichment of this repeat type within introns. Figure 4 shows three gene models that contain repeat elements in their introns.
To compare repeat content in maize and rice, we selected a similar number of random BACs from rice subsp. japonica cv Nipponbare and subjected them to the same analysis (175 pseudo BACs, i.e. 200 kb cut equally from all 12 chromosomes; see Messing et al., 2004
Despite the significant expansion of known repeat families from rice to maize, it is not sufficient to fully explain the size difference between their genomes. In rice, 69% of the genome is repeat free, which totals 276 Mb compared to 34% in maize, totaling 804 Mb. This 3-fold increase of repeat-free sequence can in part be explained by the WGD event, which occurred as recently as 4.8 mya through the hybridization of two closely related ancestors of maize (Swigo
The high density of repeat sequences, low gene density, and small average gene size of the maize genome make alternative gene-enrichment sequencing strategies very attractive. To test the effectiveness of this approach, we have aligned sequence reads/contigs (GSSs) derived from two gene-enrichment protocols (Whitelaw et al., 2003
About 93% of the HCGS had at least one corresponding alignment within the GSS collection (Fig. 5), which is slightly higher than maize EST coverage (85%, as described above). However, only 29% of the genes have GSS alignments covering greater than 90% of their length. This result is similar to a previous study of 78 full-length cDNAs (FLCs) reporting that at least 95% aligned to at least one GSS and about 18% of the FLCs were completely covered (Springer et al., 2004
Alignments of GSSs against annotated BAC clones (Fig. 3) revealed deep clusters of filtered sequence reads occurring both in genes and in intergenic regions, both repetitive (e.g. in BAC AC145262.7) and nonrepetitive (e.g. in BAC AC148169.2). The clusters in introns show that a significant percentage of genes contain repetitive elements (11%) such as solo LTRs or miniature inverted-repeat transposable elements (MITEs) present in intronic sequences as shown above (Fig. 5). One can envision that such genes might be underrepresented by enrichment procedures. Indeed, although 94% of genes in our analysis that contain repeats in their introns were tagged by at least one GSS, their total coverage was relatively lower than the coverage for all genes (40% and 51%, respectively). The upstream and downstream sequences of genes showed decreased coverage by the GSS (Supplemental Fig. 6), although we did not observe any pronounced gradients of coverage internal to genes. As a result, UTRs and promoter sequences may be underrepresented in GSS sequences. In addition, GSS clusters that do not represent known repeats are worthy of further study, as they may either identify previously unknown repeats or a particular bias of the GSS datasets.
The two GSS datasets together have tagged the majority of the analyzed genes with at least one read, demonstrating that these methods provide a significant enrichment and enable exploration of the genic space in maize. However, upstream and downstream sequences as well as genes containing intronic repeats are underrepresented. Full-length sequences of these biologically important regions may therefore require other sequencing approaches, such as traditional shotgun sequencing of BAC clones. Alignments also show that hypomethylated DNA sequences are not restricted to gene sequences. Recently, it was shown that certain retrotransposon element families are not only hypomethylated but also transcribed (Messing et al., 2004
With the goal of gaining a relatively unbiased view of the maize genome, we have sampled 100 randomly selected BACs representing 0.6% of the genome, defined their content of genes and repeats, and used these data to characterize the structure and architecture of the maize genome. The maize genome is substantially larger than those of two previously sequenced plant genomes, Arabidopsis and rice. Our work shows this to be a function of the repeat, gene, and intergenic content of maize. Our analysis shows that at least 66% of the genome consists of repetitive elements. This is a lower bound, since there are undoubtedly additional repeats in the genome including sequences that have not yet been characterized or that have diverged too far from known repeats. Retrotransposons are far more frequent than DNA transposons in the maize genome, while in rice the opposite is true. Since retrotransposons are so much larger, this partially explains the significant difference in the sizes of the maize and rice genomes. Repeats are found in the introns of 11% of genes, which explains the relative increase in size of introns compared to exons of rice and Arabidopsis. The repeat types found within introns tend to be short, with a higher frequency of DNA transposons than the rest of the genome, and frequent occurrence of solo LTRs, indicating a possible selective pressure against large elements in these maize introns. Of the BACs sequenced in this study, 80% were found to contain genes. Full-length genes average 4 kb in length, somewhat larger than in rice and Arabidopsis. Longer introns in maize, due in part to transposon insertions, are responsible for most of the increase in gene size. The density of genes is widely variable, ranging from 0.5 to 10.7 genes per 100 kb over a relatively even distribution, and does not suggest that a large fraction of genes are tightly clustered in islands. 
Based on these data, we estimate that maize has roughly 42,000 to 56,000 genes, substantially more than rice or Arabidopsis. This reflects the history of the maize genome, which includes a relatively recent WGD event, subsequent gene loss, and expansion of gene families. The WGD also appears to contribute to an increase in intergenic space void of apparent repeat sequences. In contrast to sequencing large stretches of genomic DNA, previous samplings of the maize genome focused on methods designed to enrich unique sequences relative to repeats. Available datasets from two such methods were evaluated against our representative gene set. We found that although 93% of genes are at least partially represented in the enriched sample, less than 30% of genes are fully covered by the enriched data. Further, biases exist that indicate that not all sequences of biological interest will be obtained easily. Our results suggest that filtering methods aimed at separating genes from the rest of the genome are an efficient way to begin to sample unique sequences in the maize genome, but will probably be of limited effectiveness for generating a complete representation of the maize gene set due to inherent biases in the data. Our data show that generating high-quality sequences from large insert clones is an effective method for sampling the repeat and gene content of the maize genome. Further, because maize BACs are linked to the physical map, they provide a resource to generate anchored sequences of the genome.
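The gene-number range can be approximated by dividing the extrapolated genic space by an average gene length. The sketch below uses the 167 Mb genic space and the 4-kb (HCGS) and 3-kb (full set) average gene sizes reported in the Results. That this division is how the 42,000 to 56,000 range was derived is an assumption on our part; the text does not spell out the arithmetic.

```python
# Rough gene-count estimate: total genic space divided by average gene length.
# Assumption: this mirrors the "average gene length x total gene space" method
# mentioned in the Results; the exact calculation is not given in the text.

def estimated_gene_count(genic_space_mb: float, avg_gene_kb: float) -> float:
    """Approximate number of genes fitting in the given genic space."""
    if genic_space_mb <= 0 or avg_gene_kb <= 0:
        raise ValueError("inputs must be positive")
    return genic_space_mb * 1000.0 / avg_gene_kb


if __name__ == "__main__":
    genic_space_mb = 167.0  # extrapolated genic space reported in the text
    for label, avg_kb in (("HCGS average (4 kb)", 4.0), ("full-set average (3 kb)", 3.0)):
        count = estimated_gene_count(genic_space_mb, avg_kb)
        print(f"{label}: ~{count:,.0f} genes")
```

The two averages bracket the estimate because the full gene set includes partial genes at BAC edges, which lowers the mean length and raises the implied count.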
Selection of Clones
Genomic libraries of maize (Zea mays) inbred B73 have been constructed in BACs with three different enzymes, HindIII, EcoRI, and MboI (see Nelson et al., 2005
BAC DNA was sheared into random fragments and size fractionated. Two different sizes (4 kb and 10 kb) were selected. Care was taken to hold inserts of shotgun libraries within a narrow size range. Inserts were sequenced from both ends using universal primers (Vieira and Messing, 1982
A repeat sequence library was built as described in the text and used to mask the BAC sequences that were then analyzed for their coding potential by applying extrinsic (homology based) and intrinsic (ab initio gene prediction methods) criteria and methods. As a first pass, potential gene models required either homology to known genes/ESTs or prediction by at least two gene finders. Genes were detected by applying FGeneSH++ (Salamov and Solovyev, 2000
The methyl- and C0t-filtered sequence reads available at TIGR (http://www.tigr.org/tdb/tgi/maize/) were used to determine the coverage of genes by the filtered sequence reads. All filtered sequence reads were compared against the 100 BACs by BLASTN sequence comparison. To anchor a clone to a genomic location, an alignment length of at least 90% of the clone length and a minimal sequence identity of 98% over the alignment length were required. Genomic/exonic/intronic coverage was determined on a nucleotide basis and was normalized to the length of the respective segment.
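The anchoring thresholds and the length-normalized coverage measure described above can be expressed compactly. The following is an illustrative sketch, not the pipeline used in the study: the `Hit` record, its field names, and the example alignments are hypothetical, coordinates are 0-based and half-open, and parsing of BLASTN output is left out.

```python
# Filter read-to-BAC alignments by the stated thresholds, then compute the
# fraction of a segment (gene, exon, or intron) covered by anchored reads.

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    read_length: int     # length of the filtered sequence read
    aligned_length: int  # length of the alignment
    identity: float      # fraction of identical positions in the alignment
    start: int           # alignment start on the BAC (0-based, inclusive)
    end: int             # alignment end on the BAC (exclusive)


def is_anchored(hit: Hit, min_len_frac: float = 0.90, min_identity: float = 0.98) -> bool:
    """Apply the >=90% alignment length and >=98% identity criteria."""
    return (hit.aligned_length >= min_len_frac * hit.read_length
            and hit.identity >= min_identity)


def covered_fraction(seg_start: int, seg_end: int, hits: Iterable[Hit]) -> float:
    """Fraction of nucleotides in [seg_start, seg_end) covered by any hit."""
    if seg_end <= seg_start:
        raise ValueError("segment must have positive length")
    clipped = sorted(
        (max(h.start, seg_start), min(h.end, seg_end))
        for h in hits
        if h.start < seg_end and h.end > seg_start
    )
    covered, furthest = 0, seg_start
    for start, end in clipped:
        if end > furthest:  # count only bases not already covered
            covered += end - max(start, furthest)
            furthest = end
    return covered / (seg_end - seg_start)


if __name__ == "__main__":
    hits = [
        Hit(700, 690, 0.99, 100, 790),    # passes both thresholds
        Hit(800, 500, 0.99, 600, 1100),   # alignment too short
        Hit(600, 580, 0.97, 0, 580),      # identity too low
        Hit(500, 480, 0.985, 700, 1180),  # passes both thresholds
    ]
    anchored = [h for h in hits if is_anchored(h)]
    print(f"anchored reads: {len(anchored)} of {len(hits)}")
    print(f"coverage of segment 0-1000: {covered_fraction(0, 1000, anchored):.2f}")
```

Merging overlapping alignments before counting matters here, because deep clusters of reads over the same bases would otherwise inflate coverage above 100%.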
The following is a list of the Web sites referenced in this study: www.broad.mit.edu/annotation/plants/maize/randomclones.html (sequence and assembly data for the 100 random clones); pgir.rutgers.edu (the Plant Genome Initiative at Rutgers, sequencing the maize genome project); and www.maizeseq.org (the DuPont/Monsanto/Ceres maize Sequence Information Sharing program). The list of the 100 accessions deposited into GenBank can be found in the Supplemental Table II. Received July 21, 2005; returned for revision September 11, 2005; accepted October 5, 2005.
1 This work was supported by the National Science Foundation Plant Genome (grant no. 0211851). Work at the Munich Information Center for Protein Sequences was in part supported by the Genomanalyse im biologischen System Pflanze program of the German Ministry for Education and Research.
2 These authors contributed equally to the paper. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Joachim Messing (messing{at}mbcl.rutgers.edu).
[W] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.105.068718. * Corresponding author; e-mail messing{at}mbcl.rutgers.edu; fax 732-445-0072.
Ahn S, Tanksley SD (1993) Comparative linkage maps of the rice and maize genomes. Proc Natl Acad Sci USA 90: 7980–7984
Alleman M, Doctor J (2000) Genomic imprinting in plants: observations and evolutionary implications. Plant Mol Biol 43: 147–161
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815
Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES (2002) ARACHNE: a whole-genome shotgun assembler. Genome Res 12: 177–189
Bedell JA, Budiman MA, Nunberg A, Citek RW, Robbins D, Jones J, Flick E, Rholfing T, Fries J, Bradford K, et al (2005) Sorghum genome sequencing by methylation filtration. PLoS Biol 3: e13
Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, et al (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31: 365–370
Brunner S, Fengler K, Morgante M, Tingey S, Rafalski A (2005) Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 17: 343–360
Brunner S, Keller B, Feuillet C (2003) A large rearrangement involving genes and low-copy DNA interrupts the microcollinearity between rice and barley at the Rph7 locus. Genetics 164: 673–683
Chen M, SanMiguel P, de Oliveira AC, Woo S-S, Zhang H, Wing RA, Bennetzen JL (1997) Microcolinearity in sh2-homologous regions of the maize, rice, and sorghum genomes. Proc Natl Acad Sci USA 94: 3431–3435
Cone KC, McMullen MD, Bi IV, Davis GL, Yim YS, Gardiner JM, Polacco ML, Sanchez-Villeda H, Fang Z, Schroeder SG, et al (2002) Genetic, physical, and informatics resources for maize: on the road to an integrated map. Plant Physiol 130: 1598–1605
Feuillet C, Keller B (1999) High gene density is conserved at syntenic loci of small and large grass genomes. Proc Natl Acad Sci USA 96: 8265–8270
Fu H, Dooner HK (2002) Intraspecific violation of genetic colinearity and its implication in maize. Proc Natl Acad Sci USA 99: 9573–9578
Fu Y, Hsia AP, Guo L, Schnable PS (2004) Types and frequencies of sequencing errors in methyl-filtered and high Cot maize genome survey sequences. Plant Physiol 135: 2040–2050
Gale MD, Devos KM (1998) Comparative genetics in the grasses. Proc Natl Acad Sci USA 95: 1971–1974
Guo M, Rupe MA, Danilevskaya ON, Yang X, Hu Z (2003) Genome-wide mRNA profiling reveals heterochronic allelic variation and a new imprinted gene in hybrid maize endosperm. Plant J 36: 30–44
Hulbert SH, Richter TE, Axtell JD, Bennetzen JL (1990) Genetic mapping and characterization of sorghum and related crops by means of maize DNA probes. Proc Natl Acad Sci USA 87: 4251–4255
Ilic K, SanMiguel PJ, Bennetzen JL (2003) A complex history of rearrangement in an orthologous region of the maize, sorghum, and rice genomes. Proc Natl Acad Sci USA 100: 12265–12270
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793–800
Jaffe DB, Butler J, Gnerre S, Mauceli E, Lindblad-Toh K, Mesirov JP, Zody MC, Lander ES (2003) Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res 13: 91–96
Lai J, Dey N, Kim C-S, Bharti AK, Rudd S, Mayer KFX, Larkins BA, Becraft P, Messing J (2004a) Characterization of the maize endosperm transcriptome and its comparison to the rice genome. Genome Res 14: 1932–1937
Lai J, Li Y, Messing J, Dooner HK (2005) Gene movement by Helitron transposons contributes to the haplotype variability of maize. Proc Natl Acad Sci USA 102: 9068–9073
Lai J, Ma J, Swigo
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921
Langham RJ, Walsh J, Dunn M, Ko C, Goff SA, Freeling M (2004) Genomic duplication, fractionation and the origin of regulatory novelty. Genetics 166: 935–945
Lisch D, Carey CC, Dorweiler JE, Chandler VL (2002) A mutation that prevents paramutation in maize also reverses Mutator transposon methylation and silencing. Proc Natl Acad Sci USA 99: 6130–6135
Lukashin AV, Borodovsky M (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 26: 1107–1115
Lund G, Das OP, Messing J (1995) Tissue-specific DNase I-sensitive sites of the maize P gene and their changes upon epimutation. Plant J 7: 797–807
Messing J (2005) Maize genomics. In D Leister, ed, Plant Functional Genomics. Haworth's Food Products Press, Binghamton, NY, pp 279–303
Messing J, Bharti AK, Karlowski WM, Gundlach H, Kim HR, Yu Y, Wei F, Fuks G, Soderlund C, Mayer KFX, et al (2004) Sequence composition and genome organization of maize. Proc Natl Acad Sci USA 101: 14349–14354
Meyers BC, Tingey SV, Morgante M (2001) Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res 11: 1660–1676
Moore G, Devos KM, Wang Z, Gale MD (1995) Cereal genome evolution: grasses, line up and form a circle. Curr Biol 5: 737–739
Nelson WM, Bharti AK, Butler E, Wei F, Fuks G, Kim HR, Wing RA, Messing J, Soderlund CA (2005) Whole-genome validation of high information content fingerprinting. Plant Physiol 139: 27–38
Palmer LE, Rabinowicz PD, O'Shaughnessy AL, Balija VS, Nascimento LU, Dike S, de la Bastide M, Martienssen RA, McCombie WR (2003) Maize genome sequencing by methylation filtration. Science 302: 2115–2117
Perlak FJ, Fuchs RL, Dean DA, McPherson SL, Fischhoff DA (1991) Modification of the coding sequence enhances plant expression of insect control protein genes. Proc Natl Acad Sci USA 88: 3324–3328
Rabinowicz PD, Schutz K, Dedhia N, Yordan C, Parnell LD, Stein L, McCombie WR, Martienssen RA (1999) Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome. Nat Genet 23: 305–308
Ramakrishna W, Dubcovsky J, Park Y-J, Busso C, Emberton J, SanMiguel P, Bennetzen JL (2002a) Different types and rates of genome evolution detected by comparative sequence analysis of orthologous segments from four cereal genomes. Genetics 169: 1389–1400
Ramakrishna W, Emberton J, SanMiguel P, Ogden M, Llaca V, Messing J, Bennetzen JL (2002b) Comparative sequence analysis of the sorghum Rph region and the maize Rp1 resistance gene complex. Plant Physiol 130: 1728–1738
Rayburn AL, Biradar DP, Bullock DG, McMurphy LM (1993) Nuclear DNA content in F1 hybrids of maize. Heredity 70: 294–300
Rice Chromosome 10 Sequencing Consortium (2003) In-depth view of structure, activity, and evolution of rice chromosome 10. Science 300: 1566–1569
Salamov AA, Solovyev VV (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10: 516–522
SanMiguel P, Bennetzen JL (1998) Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons. Ann Bot (Lond) 82: 37–44
Schoof H, Ernst R, Nazarov V, Pfeifer L, Mewes HW, Mayer KF (2004) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics. Nucleic Acids Res 32: D373–D376
Soderlund C, Humphrey S, Dunhum A, French L (2002) Contigs built with fingerprints, markers and FPC V4.7. Genome Res 10: 1772–1787
Song R, Llaca V, Linton E, Messing J (2001) Sequence, regulation, and evolution of the maize 22-kD alpha zein gene family. Genome Res 11: 1817–1825
Song R, Llaca V, Messing J (2002) Mosaic organization of orthologous sequences in grass genomes. Genome Res 13: 1549–1555
Song R, Messing J (2003) Gene expression of a gene family in maize based on noncollinear haplotypes. Proc Natl Acad Sci USA 100: 9055–9060
Springer NM, Xu X, Barbazuk WB (2004) Utility of different gene enrichment approaches toward identifying and sequencing the maize gene space. Plant Physiol 136: 3023–3033
Swigo
Tarchini R, Biddle P, Wineland R, Tingy S, Rafalski A (2000) The complete sequence of 340 kb DNA around the rice Adh1-Adh2 region reveals interrupted colinearity with maize chromosome 4. Plant Cell 12: 381–391
Tikhonov AP, SanMiguel PJ, Nakajima Y, Gorenstein NM, Bennetzen JL, Avramova Z (1999) Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc Natl Acad Sci USA 96: 7409–7414
Usuka J, Zhu W, Brendel V (2000) Optimal spliced alignments of homologous cDNA to a genomic DNA template. Bioinformatics 16: 203–211
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al (2001) The sequence of the human genome. Science 291: 1304–1351
Vieira J, Messing J (1982) The pUC plasmids, an M13mp7 derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19: 259–268
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520–562
Whitelaw CA, Barbazuk WB, Pertea G, Chan AP, Cheung F, Lee Y, Zheng L, van Heeringen S, Karamycheva S, Bennetzen JL, et al (2003) Enrichment of gene-coding sequences in maize by genome filtration. Science 302: 2118–2120
Wong GKS, Wang J, Tao L, Tan J, Zhang JG, Passey DA, Yu J (2002) Compositional gradients in gramineae genes. Genome Res 12: 851–856
Wu J, Yamagata H, Hayashi-Tsugane M, Hijishita S, Fujisawa M, Shibata M, Ito Y, Nakamura M, Sakaguchi M, Yoshihara R, et al (2004) Composition and structure of the centromeric region of rice chromosome 8. Plant Cell 16: 967–976
Yim YS, Davis GL, Duru NA, Musket TA, Linton EW, Messing JW, McMullen MD, Soderlund CA, Polacco ML, Gardiner JM, et al (2002) Characterization of three maize bacterial artificial chromosome libraries toward anchoring of the physical map to the genetic map using high-density bacterial artificial chromosome filter hybridization. Plant Physiol 130: 1686–1696
Yuan Y, SanMiguel PJ, Bennetzen JL (2002) Methylation-spanning linker libraries link gene-rich regions and identify epigenetic boundaries in Zea mays. Genome Res 12: 1345–1349
Yuan Y, SanMiguel PJ, Bennetzen JL (2003) High-Cot sequence analysis of the maize genome. Plant J 34: 249–255; erratum Yuan Y, SanMiguel PJ, Bennetzen JL (2003) Plant J 36: 430