First published online August 12, 2009; 10.1104/pp.109.143370 Plant Physiology 151:483-495 (2009) © 2009 American Society of Plant Biologists OPEN ACCESS ARTICLE
Computational Finishing of Large Sequence Contigs Reveals Interspersed Nested Repeats and Gene Islands in the rf1-Associated Region of Maize 1,[W],[OA]

Bioinformatics and Computational Biology (B.A.K., R.P.W.), Department of Plant Pathology and Center for Plant Responses to Environmental Stresses (B.A.K., R.P.W.), and Corn Insects and Crop Genetics Research, United States Department of Agriculture-Agricultural Research Service (R.P.W.), Iowa State University, Ames, Iowa 50011–1020
The architecture of grass genomes varies on multiple levels. Large clusters of long terminal repeat retrotransposons occupy significant portions of the intergenic regions, and islands of protein-encoding genes are interspersed among these repeat clusters. Advanced assembly techniques are therefore required both to obtain completely finished genomes and to investigate the distribution of genes and transposable elements. To characterize the organization and distribution of repeat clusters and gene islands across large grass genomes, we present 961- and 594-kb sequence contigs associated with the rf1 (for restorer of fertility1) locus in the near-centromeric region of maize (Zea mays) chromosome 3. We present two methods for computational finishing of highly repetitive bacterial artificial chromosome clones that proved successful in closing all sequence gaps caused by transposable element insertions. Sixteen repeat clusters were observed, ranging in length from 23 to 155 kb. These repeat clusters consist almost exclusively of long terminal repeat retrotransposons, and the relative ages of insertion (the "paleontology" of each cluster) vary throughout the cluster. Gene islands contain from one to four predicted genes, giving a gene density of one gene per 16 kb within gene islands and one gene per 111 kb over the entire sequenced region. When compared with the rice (Oryza sativa) and sorghum (Sorghum bicolor) genomes, the two sequence contigs retain gene collinearity of 50% and 71%, respectively, rising to 70% and 100%, respectively, for high-confidence gene models. Collinear genes within single gene islands show that, although most expansion of the maize genome has occurred in the repeat clusters, gene islands are not immune and have also grown in both intragene and intergene locations.
Sequencing of the maize (Zea mays) genome is nearing completion (Bennetzen et al., 2001
The landscape of the maize genome presents an interesting challenge for both sequencing and subsequent annotation. A high density of long terminal repeat (LTR) retrotransposons has directly affected the size of many plant genomes, including maize (SanMiguel et al., 1996
Previous studies of large contiguous regions of maize have provided a general view of the landscape of the genome. Unfinished sequence totaling 7.8 Mb from chromosome 1 and 6.6 Mb from chromosome 9 shows a gene density of one gene per 33 and 27 kb, respectively (Bruggmann et al., 2006
In order to characterize large contiguous regions of maize sequence, we identified and sequenced two B73 BAC contigs from the centromeric region of chromosome 3. These contigs of 961 and 594 kb correspond to contigs 117 and 119, respectively, on maize WebFPC (Wei et al., 2007 Using probes designed from the 5.5-kb cosegregating restriction fragment and the p6140-1 cDNA, we identified two BAC contigs spanning the rf1 locus. Sixteen BACs were sequenced to completion to provide high-quality finished sequence. Here, we present two methods for computational finishing of highly repetitive grass genomes, which were used successfully to close 11 TE-induced gaps. Sixteen nested repeat clusters were found, each spanning up to 155 kb and containing a variety of LTR retrotransposon types and insertion ages. Genes are tightly clustered, with a density of one gene per 16 kb within gene islands. Finally, comparative analysis with rice (Oryza sativa) and sorghum (Sorghum bicolor) shows that, while many genes are retained across all three species, genes have also been both lost and translocated across the genomes.
Mapping, Sequence, and Assembly of Maize rf1 Contigs
Analysis of Multiple B73 Maize BAC Libraries Leads to Two Separated rf1-Associated Contigs
After the National Science Foundation-sponsored maize physical mapping project was under way (NSF-PGR no. 9872655), additional BAC clones were identified by hybridization to the ZMMBBb and ZMMBBc libraries and subsequently via in silico overlaps from the maize WebFPC database (Coe et al., 2002
Sequencing and Initial Assembly of Maize BACs
At this stage, BAC assemblies were as complete as laboratory-based finishing could make them (Table I, Post-Finish Gaps). The remaining gaps were closed with computational methods. BAC sequences were assembled with two programs, CAP3 (Huang and Madan, 1999
Finished BAC clones were verified by restriction digest analysis. In nongap regions, base quality is well within sequencing standards, with fewer than one error per 10⁵ bp in each BAC assembly. In the minimal tiling path, adjacent BACs overlap by 32 kb on average, although in the overlap region between ZMMBBb0211C05 and ZMMBBb0331I02 multiple BACs were sequenced to resolve mapping discrepancies. Fully assembled, rf1-C1 is 961 kb and rf1-C2 is 594 kb; both have been submitted to GenBank (Benson et al., 2006
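The restriction digest check described above can be mirrored computationally by predicting the fragments an assembled BAC should produce. The sketch below is a minimal illustration, assuming a linear, fully assembled sequence and the standard HindIII cut site (A^AGCTT); the helper name is hypothetical, and this is not the verification procedure used in this study.

```python
from __future__ import annotations

HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence
CUT_OFFSET = 1           # HindIII cuts A^AGCTT, one base into the site


def hindiii_fragments(seq: str) -> list[int]:
    """Predict HindIII fragment lengths (bp) for a linear sequence (hypothetical helper)."""
    seq = seq.upper()
    cuts = []
    hit = seq.find(HINDIII_SITE)
    while hit != -1:
        cuts.append(hit + CUT_OFFSET)
        hit = seq.find(HINDIII_SITE, hit + 1)
    # Fragments run between consecutive boundaries: sequence start, each cut, sequence end.
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(hindiii_fragments("GGAAGCTTCCCAAGCTTG"))  # [3, 9, 6]
```

In practice, only predicted fragments within the resolvable size range of the gel would be compared with the observed fingerprint bands.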
Characterization of Repetitive Gaps in Maize Sequence Assembly

In particular, two methods proved very useful for resolving maize sequence gaps that could not be closed with traditional laboratory-based finishing methods. Eleven gaps in the BAC assemblies were closed with purely computational methods. Two types of gap-causing misassembly were common in maize BACs, both involving the duplicated LTRs of retrotransposons. The first type resembles any misassembly caused by a duplicated region within a BAC: the traces for one LTR all assemble into the second copy, breaking the sequence of the first LTR and causing a gap. This was seen most often in TEs with long LTRs, where a whole sequence trace, or even both end sequences of an entire subclone, fell within the LTR boundaries. It was also common in recently inserted LTR retrotransposons, because fewer polymorphisms have accumulated between their two LTRs since insertion, which caused more assembly confusion. The second common type of misassembly, also caused by retrotransposon LTRs, arises when an LTR retrotransposon has nested into one of the LTRs of an existing LTR retrotransposon. In this type, the gap can be found in either of the two LTRs of the first retrotransposon (Fig. 2A). Once this insertion occurs, the sequence of one LTR is interrupted by the sequence of the nested transposon. A gap arises when, during assembly, sequence from the complete LTR aligns incorrectly to both LTR locations, removing the join between the interrupted LTR and the nested TE. This recruitment causes a gap: one or both of the contig ends that point into the gap now carry assembled traces belonging to the other LTR (Fig. 2B), which can produce one or two gaps in the nested LTR or one gap in the unnested LTR of the original LTR retrotransposon.
The closure of the final unfinished gap, found in ZMMBBb0331I02 (Table I), has been hindered by long strings of simple repeat sequence. Simple repeats, such as homonucleotide polymers (AAAA), dinucleotide polymers (GAGAGA), or longer repeated segments, hinder sequencing by allowing the DNA polymerase to slip on the DNA template or sequencing product, resulting in either loss of the polymerase or unreadable sequence beyond the difficult region. On one contig end, this gap has a 305-bp string of GA repeats. On the other side, reading from the gap into the contig, there are approximately 700 bp of unique sequence, followed by 396 bp of GA repeats, 620 bp of TTAGGG repeats, and 50 bp of AT repeats. Plasmid subclones surrounding the gap failed to close it when sequenced with the transposon-bombing method, and primers designed from the surrounding region failed to amplify PCR products. Sequencing from primers designed in the innermost unique regions yields less than 100 bp of sequence. Together, these results suggest a stable hairpin across this region that prevents complete sequencing, possibly with a fifth simple repeat section still within the gap.
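The simple repeat structure described above (GA, TTAGGG, and AT runs) can be located in a contig end with a regular-expression search. This is a hypothetical sketch: the function name and the minimum copy number used to call a run are assumptions, and it finds only perfect tandem copies, whereas real simple repeats are often interrupted.

```python
import re


def find_tandem_runs(seq: str, motif: str, min_copies: int = 10):
    """Return half-open (start, end) coordinates of perfect tandem runs of `motif`
    with at least `min_copies` consecutive copies (hypothetical helper)."""
    motif = motif.upper()
    # Builds a pattern such as (?:GA){10,} for motif "GA" and min_copies 10.
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    contig_end = "CCC" + "GA" * 12 + "TTT" + "TTAGGG" * 5 + "A"
    print(find_tandem_runs(contig_end, "GA"))         # [(3, 27)]
    print(find_tandem_runs(contig_end, "TTAGGG", 3))  # [(30, 60)]
```

On this scale, the 305-bp GA string at one contig end corresponds to roughly 152 copies of the dinucleotide.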
Computational Methods for Closing Difficult Gaps: Genome-Based Approach
The first step in the genome-based approach was to analyze both contigs flanking the gap with TEnest (Kronmiller and Wise, 2008
Once the nesting structure of the TEs was identified by the process above, a DNA sequence could be filled in to span the gap. The sequence surrounding the gap was built to resemble the predicted nested TE structure, in three sections. First, the split LTR is completed by supplying its missing sequence from the corresponding full-length LTR. Second, the join point between the split LTR and the nested TE on one side of the gap identifies exactly where the nested TE lies on the other side of the split LTR. Finally, the sequence of the nested TE is added to complete the sequence spanning the gap. A low-quality backbone phd file (Ewing et al., 1998
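The three-part construction above can be illustrated with a simplified string model. In this sketch (hypothetical helper names, not the authors' code), the insertion point is located from the LTR sequence observed immediately 5' of the nested TE in one contig end, and the split LTR is rebuilt by placing the nested TE at that position within the full-length LTR. It assumes the two LTRs are identical and ignores indels; the optional target-site duplication length is an assumption, since LTR retrotransposon insertions typically duplicate a few base pairs of target sequence.

```python
def locate_insertion(full_ltr: str, ltr_flank: str) -> int:
    """Return the coordinate in `full_ltr` just after `ltr_flank`, the LTR sequence
    observed immediately 5' of the nested TE in a contig end."""
    hit = full_ltr.find(ltr_flank)
    if hit == -1 or full_ltr.find(ltr_flank, hit + 1) != -1:
        raise ValueError("flank must occur exactly once in the full-length LTR")
    return hit + len(ltr_flank)


def build_split_ltr(full_ltr: str, nested_te: str, insert_pos: int, tsd_len: int = 0) -> str:
    """Rebuild the interrupted LTR as LTR prefix + nested TE + LTR suffix.

    With tsd_len > 0, the last tsd_len bases before the insertion point are
    repeated after the nested TE to model a target-site duplication."""
    if not (0 <= tsd_len <= insert_pos <= len(full_ltr)):
        raise ValueError("insertion point or TSD length out of range")
    return full_ltr[:insert_pos] + nested_te + full_ltr[insert_pos - tsd_len:]


if __name__ == "__main__":
    ltr = "AAAACCCC"
    pos = locate_insertion(ltr, "AAAA")                 # 4
    print(build_split_ltr(ltr, "TTT", pos))             # AAAATTTCCCC
    print(build_split_ltr(ltr, "TTT", pos, tsd_len=2))  # AAAATTTAACCCC
```

A sequence built this way would serve only as a low-quality backbone onto which real traces are assembled, not as final sequence.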
Computational Methods for Closing Difficult Gaps: Sequence-Based Approach

This sequence-based approach was most useful for the simpler gaps caused by duplicated regions in the BAC that had collapsed into a single region during assembly. In these misassemblies, the collapsed traces were identified by their plasmid mate pairs anchored in unique sequence and were forced to assemble into the correct duplicate copy. This process also proved helpful for building a backbone phd sequence when closing gaps by the genome-based approach described above. Often, the sequences needed to span a gap were hard to identify, or did not match the predicted backbone sequence well enough to be placed by assembly or by hand; the sequence-based method helped draw them to the correct location.
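The mate-pair logic above can be sketched as a simple placement rule. This is a hypothetical, simplified model: each collapsed read records where it aligns within the repeat unit and where its mate is anchored in unique sequence, and the read is assigned to whichever repeat copy makes the implied insert size closest to the expected library size. The 3-kb default echoes the subclone library size given in this article; the tolerance is an assumption, and read orientation is ignored for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CollapsedRead:
    name: str
    offset_in_repeat: int  # alignment start of the read within the repeat unit (bp)
    mate_pos: int          # assembly coordinate of its mate in unique sequence (bp)


def assign_copy(read: CollapsedRead, copy_starts: list[int],
                insert_size: int = 3000, tolerance: int = 1000) -> Optional[int]:
    """Index of the repeat copy whose placement best matches the expected insert
    size, or None if no copy falls within `tolerance` (orientation ignored)."""
    best, best_err = None, None
    for i, start in enumerate(copy_starts):
        implied = abs(start + read.offset_in_repeat - read.mate_pos)
        err = abs(implied - insert_size)
        if err <= tolerance and (best_err is None or err < best_err):
            best, best_err = i, err
    return best


if __name__ == "__main__":
    copies = [10_000, 50_000]  # hypothetical starts of the two repeat copies
    print(assign_copy(CollapsedRead("r1", 500, 13_400), copies))  # 0
    print(assign_copy(CollapsedRead("r2", 500, 53_600), copies))  # 1
```

Reads that fit neither copy (None) would be left for manual inspection rather than forced into place.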
The two sequence contigs were annotated for repeats with TEnest (Kronmiller and Wise, 2008
A clear separation between gene regions and repeat regions can be seen when large sections of the maize genome are evaluated. In maize, this phenomenon is known as oceans and islands: islands of genes are found within oceans of repetitive clusters (SanMiguel et al., 1998
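Under the oceans-and-islands view, gene islands can be read directly off the repeat cluster coordinates as the gaps between consecutive clusters, which is how their lengths are described later in this article. A minimal sketch with hypothetical coordinates follows. Measured this way, n non-overlapping clusters on one contig bound n - 1 islands, which is consistent with 16 clusters on two contigs yielding 14 gene islands.

```python
from __future__ import annotations


def gene_islands(clusters: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the gaps between consecutive repeat clusters on one contig.

    `clusters` holds half-open (start, end) coordinates in bp; overlapping or
    abutting clusters leave no island between them. Sequence beyond the first
    and last cluster is not counted as an island."""
    ordered = sorted(clusters)
    islands = []
    reach = ordered[0][1] if ordered else 0  # furthest cluster end seen so far
    for start, end in ordered[1:]:
        if start > reach:
            islands.append((reach, start))
        reach = max(reach, end)
    return islands


if __name__ == "__main__":
    clusters = [(140_000, 200_000), (0, 50_000), (60_000, 100_000)]
    islands = gene_islands(clusters)
    print(islands)                       # [(50000, 60000), (100000, 140000)]
    print([e - s for s, e in islands])   # [10000, 40000]
```

Because island boundaries come straight from the cluster definition, changing that definition (for example, merging nearby clusters) changes the island lengths, as noted in the text.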
TEnest displays clusters of TE insertions as multiple layers of chronologically inserted TEs nested into one another. As repeat clusters become denser and more complex, the heights (levels) of these TE insertion clusters increase. The heights of the TEnest-displayed repeat clusters observed here track cluster length: large repeat clusters contain more TE insertions and therefore more levels of nested TEs. The largest repeat group, repeat cluster 13 (RC13) at 155 kb, has 18 TE insertions, 12 of which are full LTR retrotransposons (Table II), and a height of six nested TEs. Estimated times since TE insertion are spread evenly across the repeat clusters; larger clusters do not have younger or older LTR retrotransposon insertions than smaller clusters. As expected, older insertions are found lower in nested clusters and younger insertions at higher levels. Partial TE insertions, which arise from whole TEs that have undergone deletion or rearrangement or that have mutated so much that characterization becomes difficult, are most often found at the lowest levels of nested TE clusters and thus correspond to the oldest TE insertions. This is expected, because once enough mutations have accumulated, the identified TE fragments can no longer be reconstructed into complete elements.
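The nesting heights discussed above can be approximated from TE coordinates alone. In this hypothetical sketch, each TE is represented by the outermost span of its fragments, and its level is one plus the number of other TEs whose spans enclose it, so older elements that were inserted into sit at lower levels and younger nested elements at higher ones. This only approximates TEnest's chronological display, which reconstructs nesting from sequence; the coordinates are invented.

```python
from __future__ import annotations


def nesting_levels(tes: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Level 1 = not enclosed by any other TE; each enclosing TE adds one level."""
    levels = {}
    for name, (start, end) in tes.items():
        enclosing = sum(
            1
            for other, (o_start, o_end) in tes.items()
            if other != name
            and o_start <= start and end <= o_end
            and (o_start, o_end) != (start, end)
        )
        levels[name] = 1 + enclosing
    return levels


if __name__ == "__main__":
    cluster = {
        "old": (0, 20_000),       # hypothetical oldest element, split by later insertions
        "mid": (5_000, 15_000),   # inserted into "old"
        "young": (8_000, 12_000), # inserted into "mid"
        "lone": (25_000, 30_000), # not nested
    }
    levels = nesting_levels(cluster)
    print(levels)                 # {'old': 1, 'mid': 2, 'young': 3, 'lone': 1}
    print(max(levels.values()))   # cluster height: 3
```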
LTR retrotransposons were examined for differing insertion patterns between repeat clusters. Of the three most abundant retrotransposons found in maize (Meyers et al., 2001 Distances between clusters, which can also be characterized as gene island lengths, range from 4 to 98 kb and average 33 kb. These sizes depend heavily on the definition of repeat clusters and would change significantly with modifications to that rule that split or merged repeat cluster sets. As the definition of repeat clusters implies, gene islands are not devoid of TE insertions. We also attempted to characterize the differences between TEs inserted within TE clusters and those in gene islands. Many of the TE insertions within gene islands are partial LTR retrotransposons: 18 of the 36 TEs in gene islands. This suggests that ancient TE insertions occurred in these areas and have since mutated until they can no longer be recognized as complete elements. Eleven whole LTR retrotransposons were found in gene islands. These are not young, recently integrated LTR retrotransposons but rather older, yet complete, insertions; the recently inserted LTR retrotransposons are seen almost exclusively at the top levels of repeat clusters. There is one observed exception: a Shadowspawn LTR retrotransposon inserted into rf1-C2 at 389 kb has an estimated insertion time of 0.231 million years ago. Also seen is a Ji retrotransposon (0.154 million years ago) nested within an older Huck element (0.654 million years ago) in a gene island between 456 and 507 kb on rf1-C2. The Huck element follows the observed pattern of older LTR retrotransposons in gene islands and the younger Ji does not; however, because Ji is inserted within the Huck element, selection against its insertion may be weaker than if it had inserted directly into the gene island, where it would have a greater chance of disrupting nearby gene functions.
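Insertion ages such as those quoted above are commonly estimated from the divergence between the two LTRs of a single element, which are identical at insertion. This section does not spell out the calculation, so the sketch below is an assumption rather than the method used here: it applies the Kimura (1980) two-parameter distance K, listed in the references, and converts it to time with T = K/(2r). The default rate r = 1.3 × 10⁻⁸ substitutions per site per year is an illustrative value commonly applied to grass LTR retrotransposons, not necessarily the rate used in this article.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine-purine and pyrimidine-pyrimidine changes


def kimura_distance(ltr5: str, ltr3: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned LTRs.
    Positions with gaps or ambiguous bases are skipped."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a != b:
            if frozenset((a, b)) in TRANSITIONS:
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("divergence too high for the Kimura correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_years(ltr5: str, ltr3: str, rate: float = 1.3e-8) -> float:
    """Time since insertion, T = K / (2r); `rate` is an assumed substitution rate."""
    return kimura_distance(ltr5, ltr3) / (2 * rate)


if __name__ == "__main__":
    ltr5 = "A" * 100
    ltr3 = "G" + "A" * 99  # one transition in 100 aligned sites (toy example)
    print(round(insertion_age_years(ltr5, ltr3) / 1e6, 2))  # 0.39 (million years)
```

Because T scales inversely with r, ages computed under a different assumed rate differ by the same factor.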
Sequence files with repeats masked by TEnest were used for gene prediction. These masked files were analyzed with three programs: GeneSeqer (Schlueter et al., 2003
Complete gene models were identified for all 14 predicted genes. Gene model and exon coordinates are given in Supplemental Table S1. Predicted functions were assigned to nine of the identified genes (Table III). Genes to which we could not assign a function were labeled in one of two ways: predicted, if the predicted protein aligns along its full length to other submitted, functionally uncharacterized proteins; or hypothetical, if the predicted protein aligns only partially to submitted proteins. Hypothetical genes, although they have complete gene model predictions, are suspect because of their incomplete alignments and may be pseudogenes or false predictions. The five genes without assigned functions comprise one predicted gene and four hypothetical genes. A gene density of one gene per 111 kb is much lower than other observed gene densities over long stretches of the maize genome: for example, one gene per 19 kb over 2.8 Mb (Brunner et al., 2005 The lengths of predicted genes range from 180 to 1,578 bp (with introns removed), with a median of 719 bp and a mean of 798 bp. Full genes (including introns) range from 180 to 16,472 bp, with a median of 1,212 bp and a mean of 2,786 bp. Exons have a median length of 117 bp and a mean of 205 bp. The number of exons per gene ranges from one to 14. Introns have a median length of 151 bp and a mean of 529 bp. In one case, a TE inserted within an intron has increased the length of that intron: gene 10 on rf1-C2, a mov/MPN/PAD-1 family protein, has an almost complete Jaws retrotransposon within intron 5. Characterizing the 16 nested TE clusters yielded 14 gene islands. Because our repeat cluster definition (explained above) did not allow repeat clusters to contain predicted non-transposon-related genes, all of the predicted genes are found in these 14 gene islands.
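The overall density figure follows directly from the contig lengths and gene count reported in this article, and the short calculation below reproduces it alongside the median/mean summaries used for genes, exons, and introns. The per-gene lengths in the example are invented for illustration, since individual values appear only in Supplemental Table S1; the within-island figure of one gene per 16 kb depends on island boundaries and is not recomputed here.

```python
import statistics

CONTIG_KB = {"rf1-C1": 961, "rf1-C2": 594}  # contig lengths reported in this article
PREDICTED_GENES = 14


def kb_per_gene(total_kb: float, n_genes: int) -> float:
    """Average kilobases of sequence per predicted gene."""
    return total_kb / n_genes


def length_summary(lengths_bp):
    """Median and mean length, as reported for genes, exons, and introns."""
    return statistics.median(lengths_bp), statistics.mean(lengths_bp)


if __name__ == "__main__":
    density = kb_per_gene(sum(CONTIG_KB.values()), PREDICTED_GENES)  # 1,555 kb / 14 genes
    print(f"one gene per {density:.0f} kb")  # one gene per 111 kb
    print(length_summary([180, 700, 1578]))  # hypothetical lengths, for illustration only
```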
Although genes within or between gene islands do not appear to form tight clusters, clustering of genes is obvious at the contig-wide scale. Gene islands have only one or a few predicted gene annotations, and no gene island contains a large cluster of genes.
To examine sequence collinearity between grass genomes, the 14 gene islands were aligned to the rice assembly (International Rice Genome Sequencing Project, 2005 Seven of the 14 predicted maize genes align to the rice genome, all seven in a syntenic location on rice chromosome 1. As illustrated in Figure 4 and Table III, predicted maize genes 1, 2, 5, 7, and 8 on rf1-C1 correspond to gene exons on rice chromosome 1 between 8.2 and 8.4 Mb, in conserved order. rf1-C2 genes 10 and 13 also align to gene exons on rice chromosome 1 in conserved order and orientation, approximately 5 Mb farther along the rice chromosome at 13.2 Mb. Of the seven genes in conserved collinear locations, two (genes 8 and 13) are in reverse orientation relative to maize. One nonpredicted region on the maize contigs, near 85 kb on rf1-C1, aligns to rice gene Os01g14670 on rice chromosome 1, also in this conserved location near 8.2 Mb. These conserved gene regions show expanded intragene distance in maize compared with rice, as expected from the higher density of repeat clusters surrounding gene islands.
Ten of the 14 predicted maize genes align to the sorghum genome. On rf1-C1, predicted genes 1, 2, and 4 align in conserved order and orientation to a 50-kb region on sorghum chromosome 3 near 5.4 Mb. The same set of predicted maize genes, along with genes 5, 6, 7, and 8, is also found on sorghum chromosome 3 near 10.2 Mb (Table III; Fig. 4). This shows that the region corresponding to at least 500 kb of the maize sequence is duplicated on this sorghum chromosome, whereas only one copy of the region is found in rice and only one in the currently sequenced maize genome. As in the rice comparison, the nonpredicted region near 85 kb on rf1-C1 aligns to sorghum chromosome 3 at both 5.4 and 10.2 Mb. Gene predictions on rf1-C2 show that genes 10, 12, and 13 are shared between maize and sorghum across this contig in similar order and orientation. The four maize genes without sorghum counterparts are the four hypothetical gene predictions, further suggesting that these may not be real genes. The seven predicted maize genes found in conserved order on rice chromosome 1 are all among the 10 genes conserved in sorghum. The two genes in conserved order and location but in reverse orientation in rice have the same orientation in maize and sorghum, suggesting that their orientation changed either in the rice lineage after it split from the maize/sorghum lineage or in the maize/sorghum ancestor. Three maize genes are found in two locations on sorghum chromosome 3, and these genes are not duplicated in the rice genome. Nor are these three genes duplicated in the initial maize genome sequence, either on chromosome 3 or elsewhere.
By sequence length, 78% of the rf1-associated contigs consist of repetitive sequences (Table I). For such a highly repetitive genome, maize BAC clones are not overly difficult to assemble. Compared with the assembly of much less repetitive genomes, such as rice (35% repetitive; International Rice Genome Sequencing Project, 2005 Seven of the predicted maize genes are conserved in the rice genome, and 10 are conserved in the sorghum genome. One nonpredicted gene region is conserved in both rice and sorghum; it is not near any predicted maize gene, which suggests that it is a pseudogene. Fifty percent of predicted maize genes are in collinear locations on rice chromosome 1, and 71% are collinear with sorghum chromosome 3. For high-confidence gene models (the predicted maize genes excluding those termed hypothetical), 70% are in collinear locations on rice chromosome 1 and 100% on sorghum chromosome 3. Of the genes conserved in the compared organisms, 27% are not collinear between maize and rice, and 23% are not in collinear locations between maize and sorghum. Gene islands are not conserved in their entirety at their orthologous locations; rather, each gene island contains one or two collinear genes, with its other genes found at other chromosomal locations or absent from the comparison organism. In the maize-to-rice comparison, one gene island contains at least two genes in the collinear region, and the distance between these two genes has expanded almost 7-fold in maize. In the maize-to-sorghum comparison, three sets of genes are found in which two genes of one gene island lie in the collinear region.
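The collinearity percentages above can be recomputed from the gene counts given in the comparative analysis: 14 predicted genes, four of them hypothetical, with seven collinear in rice and 10 in sorghum. The sketch assumes, as the 70% and 100% figures imply, that every collinear gene is a high-confidence (nonhypothetical) model.

```python
PREDICTED = 14
HYPOTHETICAL = 4
COLLINEAR = {"rice chromosome 1": 7, "sorghum chromosome 3": 10}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


if __name__ == "__main__":
    high_confidence = PREDICTED - HYPOTHETICAL
    for target, n in COLLINEAR.items():
        # Assumes every collinear gene is a high-confidence model.
        print(target, percent(n, PREDICTED), percent(n, high_confidence))
    # rice chromosome 1 50 70
    # sorghum chromosome 3 71 100
```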
One set of genes has a similar intergene distance in maize and sorghum, one set has expanded approximately 3-fold in maize relative to sorghum, and the final set, the same set observed in the maize-to-rice comparison, has undergone an almost 9-fold increase in intergene distance in the maize genome. Although intergene distance has increased most often between gene islands, genome growth is not limited to repeat clusters. In several instances, genes at the ends of collinear regions in rice and sorghum had no maize counterpart; given the increased intergene distances, however, these genes may lie beyond the ends of our sequenced contigs. Sixteen repeat clusters were identified across the two sequenced contigs. These clusters are 23 to 155 kb long and contain a variety of TEs and LTR retrotransposons with a range of insertion ages. In a few cases, several LTR retrotransposon families are tightly grouped within one or two repeat clusters, which may indicate preferential nesting of TEs. Recent insertions of LTR retrotransposons, which can be considered the currently active, replicating, and transposing elements, are found almost exclusively in the top levels of nested repeat clusters. Insertions at these locations are farther from genes, so mutations there have a less detrimental effect on the organism. Gene islands, located between repeat clusters, range from 4 to 98 kb long and contain from one to four gene predictions. The average gene density is one gene per 16 kb for islands that contain genes. This density varies among islands; larger gene islands do not necessarily contain more genes. Although it may be an artifact of our definition of repeat oceans and gene islands, TE insertions in gene islands occur on a much smaller scale than in the large nested repeat clusters.
In all but one case, LTR retrotransposon insertions in gene islands are estimated to be older than the younger TE insertions in the upper levels of repeat clusters. This suggests that TE integrations near genes are rare or are not selected for, possibly because of their potential to cause plant-altering mutations. One LTR retrotransposon lies within an intron of predicted gene 10, increasing the size of the intron by 4.5 kb. The rice and sorghum orthologs of maize predicted gene 10 do not share this TE-driven increase in intron length. The architecture of the maize genome varies along its length. From comparative sequence analysis of related grass genomes to the clustering of genes or repeats, diversity is observed at different sequence scales and across various sequence lengths. We hope the assembly techniques presented here will assist the community, ultimately providing long contiguous grass genome assemblies that facilitate examination of the genome as a whole.
Identification of BACs in the rf1 Region
Three maize (Zea mays) rf1-m allele families (rf1-m3207, rf1-m7323, and rf1-m7212; Wise et al., 1996 Two different BAC genomic library filters were obtained from the Clemson University Genomics Institute, ZMMBBa and ZMMBBb. After probing, additional ZMMBBb and ZMMBBc (Children's Hospital of Oakland Research Institute) BACs were computationally identified using maize WebFPC (http://www.genome.arizona.edu/fpc/maize/).
Genomic DNA, cDNA, AIMS, and RFLP probe fragments were labeled by random priming with [
BAC DNA was extracted by a modified alkaline lysis protocol obtained from the Clemson University Genomics Institute. BACs were digested with HindIII and run on a 0.9% LE agarose gel for fingerprint analysis. TIFF images were edited for lane tracking, individual band calling, and size fractionation with IMAGE software (Sulston et al., 1989
Sequenced BAC ends from each of the original putative contigs were used to make low-copy overgo probes, which were designed using Overgo Maker software (http://genome.wustl.edu/tools/software/overgo.cgi). Each overgo consists of a pair of 24-mer oligonucleotides that share an 8-bp complementary overlap and have a GC content of 40% to 60%. The oligonucleotides were annealed to each other, and a fill-in reaction was performed using [
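The overgo geometry described above can be expressed in a few lines. This is a sketch under stated assumptions, not the Overgo Maker algorithm: the helper names are hypothetical, the GC range is checked on the full 40-bp target (the text does not say whether it applies to the target or to each oligo), and there is no screen for repetitive or high-copy sequence, which the real probe design required. After annealing, the fill-in reaction extends each oligo along the other to produce a labeled 40-bp duplex.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_overgo(target: str, oligo_len: int = 24, overlap: int = 8,
                  gc_range: tuple = (0.40, 0.60)):
    """Split a target into an overgo pair (hypothetical helper).

    The forward oligo is the first `oligo_len` bases; the reverse oligo is the
    reverse complement of the last `oligo_len` bases, so their 3' ends pair over
    `overlap` bases. Returns None if the target's GC content is outside `gc_range`."""
    span = 2 * oligo_len - overlap  # 40 bp for 24-mers with an 8-bp overlap
    if len(target) != span:
        raise ValueError(f"target must be {span} bp")
    if not gc_range[0] <= gc_fraction(target) <= gc_range[1]:
        return None
    forward = target[:oligo_len].upper()
    reverse = reverse_complement(target[span - oligo_len:])
    return forward, reverse


if __name__ == "__main__":
    target = "ACGTTGCAAG" * 4  # hypothetical 40-bp BAC-end segment, 50% GC
    forward, reverse = design_overgo(target)
    print(forward)
    print(reverse)
```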
BAC clones were sequenced by MWG Biotech. BACs were sheared and cloned into 3-kb subclone libraries, and subclones were end sequenced to a coverage of 8× to 10×. The BAC sequences were initially assembled with the phred/phrap package (Ewing and Green, 1998
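The 8× to 10× coverage target translates into a read count through the standard relation c = NL/G (number of reads × read length / insert length). The sketch below uses an assumed 150-kb BAC insert and 700-bp usable read length purely for illustration; neither value is given in this section. It also reports the Lander-Waterman expectation e^(-c) for the fraction of bases left uncovered, which assumes uniformly random read placement and says nothing about repeat-induced misassembly.

```python
import math


def reads_needed(coverage: float, insert_bp: int, read_len_bp: int) -> int:
    """Reads required for a target coverage, from c = N * L / G."""
    return math.ceil(coverage * insert_bp / read_len_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman expectation for the fraction of bases with no read."""
    return math.exp(-coverage)


if __name__ == "__main__":
    INSERT_BP = 150_000  # assumed BAC insert size (illustrative)
    READ_LEN = 700       # assumed usable read length (illustrative)
    for c in (8, 10):
        n = reads_needed(c, INSERT_BP, READ_LEN)
        # End sequencing yields two reads per subclone.
        print(c, n, math.ceil(n / 2), f"{expected_uncovered_fraction(c):.1e}")
```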
Finishing assembly was conducted with phrap and CAP3 (Huang and Madan, 1999
Assembly of repetitive gap regions was aided with the use of TEnest (Kronmiller and Wise, 2008
Sequence files masked with TEnest were used for gene predictions. Three programs were used: GeneSeqer (Schlueter et al., 2003
Orthologous regions were identified using the VISTA comparative genomics tools (Dubchak et al., 2000

Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers EF517601 (rf1-C1) and EF517600 (rf1-C2).
The following materials are available in the online version of this article.
We thank Karin Gobelman-Werner for expert technical assistance in construction of the sequence-ready BAC contigs.

Received June 24, 2009; accepted August 3, 2009; published August 12, 2009.
1 This work was supported by the U.S. Department of Agriculture-National Research Initiative (grant no. 2002–35301–12064).
2 Present address: Mendel Biotechnology, Inc., 3935 Point Eden Way, Hayward, CA 94545–3720.

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Roger P. Wise (rpwise{at}iastate.edu).
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription.

www.plantphysiol.org/cgi/doi/10.1104/pp.109.143370

* Corresponding author; e-mail rpwise{at}iastate.edu.
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815
Bennetzen JL, Chandler VL, Schnable P (2001) National Science Foundation-sponsored workshop report: maize genome sequencing project. Plant Physiol 127: 1572–1578
Bennetzen JL, Ma J, Devos KM (2005) Mechanisms of recent genome size variation in flowering plants. Ann Bot (Lond) 95: 127–132
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2006) GenBank. Nucleic Acids Res 34: D16–D20
Bray N, Dubchak I, Pachter L (2003) AVID: a global alignment program. Genome Res 13: 97–102
Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res 13: 721–731
Bruggmann R, Bharti AK, Gundlach H, Lai J, Young S, Pontaroli AC, Wei F, Haberer G, Fuks G, Du C, et al (2006) Uneven chromosome contraction and expansion in the maize genome. Genome Res 16: 1241–1251
Brunner S, Fengler K, Morgante M, Tingey S, Rafalski A (2005) Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 17: 343–360
Celniker SE, Wheeler DA, Kronmiller B, Carlson JW, Halpern A, Patel S, Adams M, Champe M, Dugan SP, Frise E, et al (2002) Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol 3: RESEARCH0079
Chandler VL, Brendel V (2002) The maize genome sequencing project. Plant Physiol 130: 1594–1597
Coe E, Cone K, McMullen M, Chen SS, Davis G, Gardiner J, Liscum E, Polacco M, Paterson A, Sanchez-Villeda H, et al (2002) Access to the maize genome: an integrated physical and genetic map. Plant Physiol 128: 9–12
Couronne O, Poliakov A, Bray N, Ishkhanov T, Ryaboy D, Rubin E, Pachter L, Dubchak I (2003) Strategies and tools for whole-genome alignments. Genome Res 13: 73–80
Dubchak I, Brudno M, Loots GG, Pachter L, Mayor C, Rubin EM, Frazer KA (2000) Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res 10: 1304–1306
Duvick DN, Snyder RJ, Anderson EG (1961) The chromosomal location of Rf1, a restorer gene for cytoplasmic pollen sterile maize. Genetics 46: 1245–1252
Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186–194
Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175–185
Feinberg AP, Vogelstein B (1983) A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal Biochem 132: 6–13
Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Res 32: W273–W279
Frey M, Stettner C, Gierl A (1998) A general method for gene isolation in tagging approaches: amplification of insertion mutagenised sites (AIMS). Plant J 13: 717–721
Fu Y, Emrich SJ, Guo L, Wen TJ, Ashlock DA, Aluru S, Schnable PS (2005) Quality assessment of maize assembled genomic islands (MAGIs) and large-scale experimental verification of predicted genes. Proc Natl Acad Sci USA 102: 12282–12287
Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. Genome Res 8: 195–202
Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B, et al (2005) Structure and architecture of the maize genome. Plant Physiol 139: 1612–1624
Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF (2006) Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res 16: 1252–1261
Huang X, Madan A (1999) CAP3: a DNA sequence assembly program. Genome Res 9: 868–877
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793–800
Kaminker JS, Bergman CM, Kronmiller B, Carlson J, Svirskas R, Patel S, Frise E, Wheeler DA, Lewis SE, Rubin GM, et al (2002) The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective. Genome Biol 3: RESEARCH0084
Kidwell MG, Lisch DR (2000) Transposable elements and host genome evolution. Trends Ecol Evol 15: 95–99
Kimmel B, Palozzolo M, Martin C, Boeke JD, Devine SE (1997) Transposon-Mediated DNA Sequencing. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16: 111–120
Kronmiller BA, Wise RP (2008) TEnest: automated chronological annotation and visualization of nested plant transposable elements. Plant Physiol 146: 45–59
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921
Lukashin AV, Borodovsky M (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 26: 1107–1115
Ma J, Bennetzen JL (2004) Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA 101: 12404–12410
Mao L, Wood TC, Yu Y, Budiman MA, Tomkins J, Woo S, Sasinowski M, Presting G, Frisch D, Goff S, et al (2000) Rice transposable elements: a survey of 73,000 sequence-tagged-connectors. Genome Res 10: 982–990
Marra MA, Kucaba TA, Dietrich NL, Green ED, Brownstein B, Wilson RK, McDonald KM, Hillier LW, McPherson JD, Waterston RH (1997) High throughput fingerprint analysis of large-insert clones. Genome Res 7: 1072–1084
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16: 1046–1047
Meyers BC, Tingey SV, Morgante M (2001) Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res 11: 1660–1676
Palmer LE, Rabinowicz PD, O'Shaughnessy AL, Balija VS, Nascimento LU, Dike S, de la Bastide M, Martienssen RA, McCombie WR (2003) Maize genome sequencing by methylation filtration. Science 302: 2115–2117
Pampanwar V, Engler F, Hatfield J, Blundy S, Gupta G, Soderlund C (2005) FPC Web tools for rice, maize, and distribution. Plant Physiol 138: 116–126
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, et al (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551–556
Piegu B, Guyot R, Picault N, Roulin A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA, et al (2006) Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Res 16: 1262–1269
Quesneville H, Bergman CM, Andrieu O, Autard D, Nouaud D, Ashburner M, Anxolabehere D (2005) Combined evidence annotation of transposable elements in genome sequences. PLOS Comput Biol 1: 166–175
Rabinowicz PD, Bennetzen JL (2006) The maize genome as a model for efficient sequence analysis of large plant genomes. Curr Opin Plant Biol 9: 149–156
Rabinowicz PD, Schutz K, Dedhia N, Yordan C, Parnell LD, Stein L, McCombie WR, Martienssen RA (1999) Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome. Nat Genet 23: 305–308
Salamov AA, Solovyev VV (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10: 516–522
SanMiguel P, Bennetzen JL (1998) Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons. Ann Bot (Lond) 82: 37–44
SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL (1998) The paleontology of intergene retrotransposons of maize. Nat Genet 20: 43–45
SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, et al (1996) Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 765–768
Schlueter SD, Dong Q, Brendel V (2003) GeneSeqer@PlantGDB: gene structure prediction in plant genomes. Nucleic Acids Res 31: 3597–3600
Soderlund C, Humphray S, Dunham A, French L (2000) Contigs built with fingerprints, markers, and FPC V4.7. Genome Res 10: 1772–1787
Soderlund C, Longden I, Mott R (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput Appl Biosci 13: 523–535
Song R, Llaca V, Linton E, Messing J (2001) Sequence, regulation, and evolution of the maize 22-kD alpha zein gene family. Genome Res 11: 1817–1825
Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, et al (2002) The generic genome browser: a building block for a model organism system database. Genome Res 12: 1599–1610
Sulston J, Mallett F, Durbin R, Horsnell T (1989) Image analysis of restriction enzyme fingerprint autoradiograms. Comput Appl Biosci 5: 101–106
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al (2002) Initial sequencing and comparative analysis of the mouse genome.
Nature 420: 520–562[CrossRef][Medline] Wei F, Coe E, Nelson W, Bharti AK, Engler F, Butler E, Kim H, Goicoechea JL, Chen M, Lee S, et al (2007) Physical and genetic structure of the maize genome reflects its complex evolutionary history. PLoS Genet 3: e123[CrossRef][Medline] Wessler SR (2006) The maize community welcomes the maize genome sequencing project. Curr Opin Plant Biol 9: 147–148[CrossRef][Web of Science] Whitelaw CA, Barbazuk WB, Pertea G, Chan AP, Cheung F, Lee Y, Zheng L, van Heeringen S, Karamycheva S, Bennetzen JL, et al (2003) Enrichment of gene-coding sequences in maize by genome filtration. Science 302: 2118–2120 Wise RP, Dill CL, Schnable PS (1996) Mutator-induced mutations of the rf1 nuclear fertility restorer of T-cytoplasm maize alter the accumulation of T-urf13 mitochondrial transcripts. Genetics 143: 1383–1394[Abstract] Wise RP, Gobelman-Werner K, Pei D, Dill CL, Schnable PS (1999) Mitochondrial transcript processing and restoration of male fertility in T-cytoplasm maize. J Hered 90: 380–385 Yuan Y, SanMiguel PJ, Bennetzen JL (2003) High-Cot sequence analysis of the maize genome. Plant J 34: 249–255[CrossRef][Web of Science][Medline] Zhang Q, Arbuckle J, Wessler SR (2000) Recent, extensive, and preferential insertion of members of the miniature inverted-repeat transposable element family Heartbreaker into genic regions of maize. Proc Natl Acad Sci USA 97: 1160–1165