Plant Physiology 136:3486-3503 (2004)
© 2004 American Society of Plant Biologists

Sequence and Comparative Analysis of the Maize NB Mitochondrial Genome1,[w]

Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri 63108 (S.W.C., P.M., H.S., R.K.W.); University of Utah, Eccles Institute of Genetics, Salt Lake City, Utah 84112 (C.M.-R.F., M.G.); Division of Biological Sciences, University of Missouri, Columbia, Missouri 65211 (J.O.A., M.T., S.K., C.T., L.M., K.J.N.); and Donald Danforth Plant Science Center, St. Louis, Missouri 63132 (B.B.)
The NB mitochondrial genome found in most fertile varieties of commercial maize (Zea mays subsp. mays) was sequenced. The 569,630-bp genome maps as a circle containing 58 identified genes encoding 33 known proteins, 3 ribosomal RNAs, and 21 tRNAs that recognize 14 amino acids. Among the 22 group II introns identified, 7 are trans-spliced. There are 121 open reading frames (ORFs) of at least 300 bp, only 3 of which exist in the mitochondrial genome of rice (Oryza sativa). In total, the identified mitochondrial genes, pseudogenes, ORFs, and cis-spliced introns extend over 127,555 bp (22.39%) of the genome. Integrated plastid DNA accounts for an additional 25,281 bp (4.44%) of the mitochondrial DNA, and phylogenetic analyses raise the possibility that copy correction with DNA from the plastid is an ongoing process. Although the genome contains six pairs of large repeats that cover 17.35% of the genome, small repeats (20-500 bp) account for only 5.59%, and transposable element sequences are extremely rare. MultiPip alignments show that maize mitochondrial DNA has little sequence similarity with other plant mitochondrial genomes, including that of rice, outside of the known functional genes. After eliminating genes, introns, ORFs, and plastid-derived DNA, nearly three-fourths of the maize NB mitochondrial genome is still of unknown origin and function.
Mitochondrial genomes have been sequenced from a large number of protists, algae, fungi, and animals, but from few plants (for review, see Burger et al., 2003
Although angiosperm mitochondrial genomes are at least 10 times larger than those of mammals, the total number of known genes they encode is fewer than twice as many as their mammalian counterparts. The mtDNAs of both plants and animals include genes for ribosomal RNAs, tRNAs, and several subunits of the oxidative phosphorylation complexes. The greatest difference is that some of the ribosomal proteins and some of the proteins involved in the biogenesis of cytochrome c are coded for by mtDNA in plants, whereas they are coded for by nuclear DNA in animals. In several angiosperm genera, two subunits of the succinate dehydrogenase complex are also coded for by mtDNA (Adams et al., 2001
One inference from the small number of sequenced plant mitochondrial genomes is that their sizes vary independently of the number of functional genes. The mitochondrial genome of the liverwort, M. polymorpha, was reported to be 184 kb and to encode 66 identified genes, including ribosomal and tRNAs (Oda et al., 1992
Comparative analyses of mitochondrial genes have shown that, with rare exceptions (Palmer et al., 2000
It is not clear why plant mitochondrial genomes rearrange so readily, or how their genomes expand and contract over such short evolutionary times. Complete mitochondrial sequence data are needed for many more plants, including closely related taxa, to address the question of how rapid changes in their intergenic regions occur. Relationships among grasses have been extensively studied (e.g. Freeling, 2001
Previous sequencing of plant mitochondrial genomes used cloned DNA: cosmids for M. polymorpha (Oda et al., 1992
The final assembly of the maize NB mitochondrial sequences generated a single circular map of 569,630 bp (Fig. 1), larger than any of the previously sequenced plant mitochondrial genomes and remarkably similar to the estimates from early mapping studies (570 kb; Lonsdale et al., 1984
The actual NB mitochondrial genome complexity is 520 kb, which is calculated by removing one copy of each of the large (>500 bp) repeats from the 570-kb circle. Either of these numbers is still larger than any previously sequenced plant mitochondrial genome. In comparison, the rice mitochondrial genome has been reported to have an overall size of 490.5 kb (Notsu et al., 2002
The maize NB main mitochondrial genome contains 58 identified genes, including 34 genes coding for 33 known proteins (Table I). They include 22 proteins of the electron transport chain, namely nine subunits of complex I: NADH dehydrogenase subunits 1, 2, 3, 4, 4L, 5, 6, 7, and 9 (NAD1, 2, 3, 4, 4L, 5, 6, 7, and 9); one subunit of complex III: apocytochrome b (cob); three subunits of complex IV: cytochrome c oxidase subunits 1, 2, and 3 (COX 1, 2, and 3); five subunits of complex V: ATP synthase F1 subunits 1, 4, 6, 8, and 9 (ATP 1, 4, 6, 8, and 9); and four subunits of a complex involved in the biogenesis of cytochrome c: subunits B, C, and F (CCMB, C, FN, and FC). Also present is a gene formerly called orfX, now renamed mttB, that codes for a transporter protein and is reported to be homologous to the Escherichia coli tatC gene (Bonnard and Grienenberger, 1995
Despite the higher genome complexity of the maize NB mitochondrial genome, it actually encodes two fewer proteins than rice (33 in maize versus 35 in rice; Table I). As has been previously noted, when plant mitochondrial genomes differ in gene content, it is usually in the number of ribosomal proteins present (for review, see Palmer et al., 2000
The functional copies of mitochondrial rps11 and rps14 lie in the nucleus in both rice and maize, but the rice mitochondrial genome retains pseudogenes of each (Notsu et al., 2002
Maize NB mtDNA has 3 ribosomal RNA genes (5S, 18S, and 26S) and 32 tRNA genes (Sangaré et al., 1990
Although plant mitochondria use the universal genetic code and require tRNAs for all 20 amino acids, which mitochondrial tRNAs are actually encoded by the mtDNAs of plants is quite variable (Table I). Functional tRNA genes for six amino acids (Ala, Arg, Gly, Leu, Thr, and Val) are missing from the NB mtDNA. Since all of them are required for protein synthesis in mitochondria, they are presumably encoded by the nuclear genome and imported from the cytosol into the mitochondria (see Maréchal-Drouard et al., 1993
Twenty-one of the expressed tRNAs display a classic cloverleaf structure, whereas each of the two tRNA-Sers (tRNA-SerGCU and tRNA-SerUGA) folds into an unusual five-loop secondary structure. Five of the tRNA genes (trnD, trnN, trnI, trnQ, and trnP) are present in duplicate because they are located within a 17-kb repeated sequence. Posttranscriptional modification from C to U within the anticodon sequence is necessary to generate a functional tRNA-Ile, which is similar to the situation reported in potato mitochondria (Weber et al., 1990
A total of 22 identifiable group II introns are present within 8 of the protein-coding genes, including 7 trans-spliced introns that are part of nad1, nad2, and nad5. Fifteen cis-spliced introns are located in cox2, nad1, nad2, nad4, nad5, nad7, ccmFC, and rps3. The numbers and locations of introns are almost identical to those in the other sequenced angiosperm mitochondrial genomes. The only differences found are that sugar beet lacks the rps3 intron and the second intron of nad4 (Kubo et al., 2000
The functional mitochondrial rRNA and tRNA genes of the sequenced angiosperms lack introns, but insertions are found within four of the tRNA pseudogenes in the maize NB mtDNA. The mitochondrial
An ORF was defined as an in-frame sequence 300 bp or longer that is bounded by a start and a stop codon, with no match to a coding sequence in the public databases. This definition excludes smaller proteins and does not indicate whether the sequence is expressed. There are 121 mitochondrial ORFs, most of which are unique to maize, and 7 plastid-derived ORFs (Supplemental Table I). None of the maize mtDNA ORFs are maintained among all sequenced higher-plant mtDNAs; only one (orf140-b) is found in Arabidopsis and three (orf99-a, orf140-b, and orf146-a) are found in rice. Compared with maize, orf99-a is the same size in rice, orf146-a is slightly shorter in rice, and orf140-b is shorter in rice and slightly longer in Arabidopsis. Ten of the maize ORFs are found within cis-spliced introns, nine of them within nad2. A well-known intron-located gene in plants is the maturase-related mat-r gene (Wahleithner et al., 1990 Only two ORFs (orf186 and orf127) have been found to be truly chimeric in the NB mitochondrial genome of maize. A short segment (79 nucleotides) derived from cox2 resides within orf186, whereas orf127 contains 209 bp from rps12. Some other genes or portions of genes are duplicated because they are present within repeated DNA. The atp1 gene is present within the 15-kb repeat; nad1-exon1, a trans-spliced exon, lies within the 17-kb repeat (Fig. 1). In addition, exons from rps3 (exon 1) and nad2 (exons 4 and 5) are duplicated because their adjacent introns span the borders of the 11-kb and 17-kb repeats, respectively (Fig. 1).
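The ORF definition given above (an in-frame stretch of at least 300 bp bounded by a start and a stop codon) can be illustrated with a short sketch. This is not the annotation pipeline used in the study; it assumes ATG as the only start codon and the universal stop codons, treats the sequence as linear rather than circular, ignores RNA editing, and omits the step that excludes sequences matching known coding sequences in the public databases. The function names are hypothetical.

```python
# Minimal ORF scan: ATG ... stop, in frame, at least min_len bp long.
# Assumptions (not from the source): ATG-only starts, linear sequence,
# no database filtering of known genes.
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Return (strand, start, end) tuples for ORFs of at least min_len bp.

    Coordinates are 0-based and end-exclusive on the scanned strand;
    the length counted includes the start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        orfs.append((strand, start, end))
                    start = None  # look for the next start codon
    return orfs


# A synthetic 306-bp ORF: ATG, 100 alanine codons, then TAA.
example = "ATG" + "GCT" * 100 + "TAA"
print(find_orfs(example))
```

As the text notes, a length threshold of this kind excludes smaller proteins and says nothing about whether a sequence is expressed.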
In the NB mitochondrial genome, there are many more ORFs (
The NB mitochondrial genome contains two large insertions of ctDNA, one of 12.6 kb (Stern and Lonsdale, 1982
Detailed analyses revealed that the 12.6-kb segment appears to be part of a much larger portion of the IR and adjacent DNA that has come to reside in the mtDNA (Fig. 2A). The transferred region would have extended approximately 21 kb from plastid coordinate 82 to 103 kb, with the additional sequences located on either side of the segment originally reported by Stern and Lonsdale (1982) The transferred ctDNA segments are present as single copies, except for those that occur in the NB mtDNA large repeats. Of the 18 segments larger than 100 bp, only 3 are also present in rice. One of them, a 550-bp portion of rpoC1 (RNA polymerase subunit C1 from maize plastid coordinate 25 kb), is part of a 7-kb plastid segment in the rice mitochondrial genome. The other two fragments together constitute a single 4.1-kb region in the mitochondrial DNA, as described below. The 4.1-kb (4,098 bp) region of plastid origin is located between maize mitochondrial coordinates 346.5 kb and 350.5 kb, and is a composite of two separate regions of ctDNA. This segment includes homologs of the plastid genes rbcL, rpl23, rpl2, orf75, trnH, and rps19 (Fig. 2B). In the plastid genome, rpl23 through rps19 are located in the IR (segment C in Fig. 2A), whereas rbcL and an rpl23 pseudogene are located outside of the IR in the large single-copy region (segment A in Fig. 2A). The joining of these two sequences appears to have been the result of recombination between the 260 bp that is common to both the rpl23 pseudogene and the IR rpl23.
The rice mitochondrial genome contains a similar composite region at a sequence homologous to the maize insertion site (Fig. 2B, top). The right end of the plastid insertion ends at the same point within rps19 in both maize and rice. At the other (rbcL) end, the rice plastid-derived sequence extends an additional 3.3 kb upstream. Rice and maize also have different points of recombination within the overlapping region in rpl23 (Fig. 2B, bottom). The fact that the same terminus and the same insertion point, along with the unusual organization of the 4.1-kb region, occur in both maize and rice mtDNA suggests that this transfer of plastid DNA occurred before the divergence of rice and maize. However, if this hypothesis were correct, the plastid-derived insert in maize mtDNA would be expected to be more similar to the plastid-derived insert in rice mtDNA than to its own plastid sequence. In fact, the opposite is observed; the plastid-derived insert in maize mtDNA is more similar to maize plastid DNA, as is shown by phylogenetic analyses of the concatenated 4.1-kb region (Fig. 2C). The finding that the maize 4.1-kb plastid-derived mitochondrial insert groups with maize ctDNA, and that the similar rice plastid-derived mitochondrial insert groups with rice ctDNA, held true for distance, maximum parsimony, and maximum likelihood methods, as well as in analyses using components of each sequence separately, e.g. the rbcL or the IR segment and individual genes (data not shown). Similar results were obtained for the mtDNA copy of rbcL by Cummings et al. (2003)
If the plastid DNA transfer did indeed occur after the divergence of maize and rice, one must account for the facts that (1) the IR plastid segment has the same right end breakpoint and insertion point, and (2) both segments of the insertion appear to be roughly contemporary in both species. This would require that, when the IR segments from the plastid genomes of maize and rice inserted into the maize and rice mitochondrial genomes, recombination occurred using the same plastid and mitochondrial sequences, yielding identical products at the rps19 end. This would have been followed in each species by recombination with the rbcL-containing segment within their regions of homology within rpl23. If both events in both species occurred well after the divergence of the maize and rice lineages, the sequences would yield the phylogenetic results obtained here and by Cummings et al. (2003) An alternative explanation for the phylogenetic results is that the chimeric, plastid-derived region was generated within the mitochondrial genome prior to the divergence of rice and maize, but that copy correction is occurring; i.e. there is an ongoing process in which DNA from the plastid recombines with homologous, plastid-derived DNA in the mitochondrion. Thus, the presence and organization of the 4.1-kb region would be old, but the plastid sequences comprising it would be recently acquired. In the copy-correction scenario, the transfer of plastid DNA into the mitochondrion could be a relatively common event in which the transferred DNA integrates only rarely, but more freely participates in recombination with existing mtDNA sequences of plastid origin. The recombinogenic nature of plant mitochondrial genomes would assist in this process, as would the proclivity of plant mitochondria to take up exogenous DNA.
The six largest plastid-derived regions within maize NB mtDNA are found to be at least 96% identical to their plastid counterparts, which provides further evidence for recent uptake of plastid sequences by the mitochondrial genome. Indeed, the 12.6-kb region is 99.8% identical over the nucleotides common to the mitochondrial and plastid genomes. Although this type of data has been considered evidence of their recent acquisition (Cummings et al., 2003
A variety of repetitive DNA motifs occur in the maize NB mitochondrial genome. There are five pairs of large directly repeated sequences, with repeat lengths of 14,936; 11,092; 5,270; 719; and 543 bp, as well as one large IR of 16,870 bp (Fig. 1). Overall, the large repeats account for 17.35% of the genome. The 0.7-kb and 0.5-kb repeats are small enough that our sequencing strategy allowed an individual determination of each repeat sequence. One copy of the 0.7-kb repeat is slightly larger (725 bp) due to a 6-bp insertion.
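The arithmetic linking the large repeats to genome coverage and genome complexity can be checked with a few lines. The lengths below are those quoted in the text; the sketch assumes each large repeat is present in exactly two copies of equal length (it ignores the 6-bp insertion in one copy of the 0.7-kb repeat), and the variable and function names are illustrative.

```python
# Repeat lengths and genome size as quoted in the text.
GENOME_BP = 569_630
DIRECT_REPEATS_BP = [14_936, 11_092, 5_270, 719, 543]
INVERTED_REPEAT_BP = 16_870


def large_repeat_summary(genome_bp, repeat_lengths):
    """Summarize coverage by two-copy repeats and the resulting complexity."""
    one_copy = sum(repeat_lengths)   # one copy of each repeat
    covered = 2 * one_copy           # assumption: every repeat occurs twice
    return {
        "one_copy_bp": one_copy,
        "covered_bp": covered,
        "covered_pct": round(100 * covered / genome_bp, 2),
        # Complexity: genome size after removing one copy of each repeat.
        "complexity_bp": genome_bp - one_copy,
    }


summary = large_repeat_summary(
    GENOME_BP, DIRECT_REPEATS_BP + [INVERTED_REPEAT_BP]
)
print(summary)
```

Under these assumptions the complexity comes to about 520 kb and the covered fraction agrees with the reported 17.35% to within rounding.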
Studies with cucurbit mtDNAs (Lilly and Havey, 2001
An SDR family was defined as sequences having at least 90% similarity and identical length. Similar families were grouped into superfamilies, as shown in Figure 3. For example, all four SDR families within superfamily 13 (Fig. 3A) have 22 bp in common, but the individual families, with two to four members each, have larger or smaller terminal extensions. Superfamily 2 (Fig. 3B) is an example of a simple sequence repeat superfamily, containing families that each have the sequence TACTA tandemly repeated four to five times. It is a large superfamily, having 13 families with as many as 17 members each and containing 91 members overall. It accounts for 2,011 bp of the NB genome. The largest group is superfamily 1 (Fig. 3C), with 30 SDR families that include 117 sequences and account for 3,797 bp of the NB genome. The families constitute a sliding sample across a "consensus" region (e.g. SDR274 in Fig. 3C), such that the repeats at the top of the alignment have no region of overlap with the sequences at the bottom of the alignment.
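The family criterion described above (identical length and at least 90% similarity) can be sketched as a simple grouping routine. This is only an illustration: the study used in-house scripts based on BLASTN, whereas the sketch below uses ungapped position-by-position identity and a greedy comparison against the first member of each family, both of which are assumptions. The function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have identical length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_into_families(seqs, min_identity=0.90):
    """Greedily group sequences of equal length and >= min_identity.

    Each sequence joins the first family whose representative (its first
    member) has the same length and sufficient identity; otherwise it
    founds a new family.
    """
    families = []
    for s in seqs:
        for fam in families:
            rep = fam[0]
            if len(rep) == len(s) and identity(rep, s) >= min_identity:
                fam.append(s)
                break
        else:
            families.append([s])
    return families


repeats = [
    "ACGTACGTACGTACGTACGT",   # 20 bp
    "ACGTACGTACGTACGTACGA",   # 20 bp, one mismatch (95% identical)
    "TTTTTTTTTTTTTTTTTTTT",   # 20 bp, unrelated
    "ACGTACGTACGTACGTACGTA",  # 21 bp, so a different family by length
]
for family in group_into_families(repeats):
    print(len(family), family[0])
```

Because length must match exactly, related repeats with longer or shorter terminal extensions fall into separate families, which is why the text groups such families into superfamilies.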
There are 197 SDR families, of which 143 are grouped into 31 superfamilies. The remaining 54 families each contain 2 to 5 members. In both groups, 99% of the SDRs are 20 to 50 bp long, and 57% are 20 to 30 bp long (Table III). In a genome of the size of the maize NB mitochondrion, most of the observed repeats of less than 30 bp are probably the result of chance. Analyses of a set of 10 computer-generated 600-kb genomes (see "Materials and Methods") having the same base composition as the maize NB genome showed that, of a total of 2,783 repeats, 2,321 (83.4%) were 20 to 24 bp and 405 (14.6%) were 25 to 29 bp. There were no repeats in the randomly generated genomes larger than 45 bp, and an average of only one of 40 to 44 bp and two of 35 to 39 bp. Thus, repeats longer than 40 bp within the NB mitochondrial genome are probably not the result of chance. Overall, the maize NB mitochondrial genome has an under-representation of SDRs of 20 to 50 bp, and an over-representation of larger repeats. Nonetheless, the total contribution of repeats of less than 500 bp in length to the NB genome is only 31,843 bp, or 5.59% (Table III).
Searches for transposable element sequences in the NB mitochondrial genome showed little evidence for this type of repetitive DNA. Using the Institute for Genomic Research (TIGR) grass transposable element database as a reference (http://www.tigr.org/tdb/e2k1/plant.repeats/index.shtml) and with a minimum match of 50 bp, only four small fragments (50-277 bp) with similarity to known retrotransposons were found (Supplemental Table II). Searching for IRs of 11 to 14 bp separated by 100 to 700 bp with FindMITE (Tu 2001
An in silico search for the presence of NB mtDNA within the nuclear genome of the B73 inbred line was performed. The Assembled Zea mays Database (AZM; Whitelaw et al., 2003
To estimate the amount of mtDNA present in the total maize nuclear genome, the extent of mitochondrial sequence within a random, unfiltered B73 genome shotgun library (Whitelaw et al., 2003 The amount of mtDNA in the nuclear genome of maize inbred B73 appears to be at least 290 kb, which is a rough estimate. There may be mitochondrial sequences within the nuclear genome that are closely associated with repetitive sequences that would be excluded during high Cot selection, or methylated sequences that would be excluded from methyl-filtered libraries. In addition, the less than 1% random sampling of B73 sequences may not be representative of the nuclear genome as a whole.
The transfer of individual organelle genes to the nucleus is well documented (e.g. Palmer et al., 2000
Comparisons of the angiosperm mitochondrial genomes were conducted using MultiPipMaker (Schwartz et al., 2000
Although genes generally are not clustered in the NB mtDNA, there are regions relatively richer or poorer in known genes (Fig. 4, A versus B). The MultiPip alignments show that the similarities among the mtDNAs are associated mainly with known coding regions (Fig. 4). Indeed, coding-region conservation can be high even in comparisons with M. polymorpha and Reclinomonas americana, a protist containing the most bacteria-like mtDNA (Lang et al., 1997
Of the 569,630-bp maize NB mitochondrial genome, 27.9% is at least 90% identical to rice mtDNA (minimum match of 20 contiguous bp). Conversely, 39.5% of the rice mitochondrial sequence is similar to maize NB mtDNA. Although the numbers of nucleotides in common are the same, the percentage is larger because the rice genome is smaller. When the stringency is decreased from 90% to 80%, the proportion of NB present in rice increases minimally, from 27.8% to 29.0% of the genome, and the proportion of rice present in NB increases from 39.5% to 41.0%. Decreasing the stringency from 80% to 70% does not increase the proportions further. Only 9.5% of the maize NB mtDNA is shared with Arabidopsis mtDNA and 13.8% of the Arabidopsis mitochondrial genome is shared with maize. Thus, even between two grasses, most of the mitochondrial sequences do not seem to have an identifiably common ancestry, and between a monocot and a dicot, all that the mitochondrial genomes seem to have in common are genes (Fig. 4). Most of the regions identified in maize mtDNA as containing ORFs are not conserved among the other taxa (Fig. 4B). Still, some of the mtDNA without known genes, but encompassing several maize ORFs or intergenic regions, was found to be shared between maize and rice mtDNAs (Fig. 4A). However, except for orf99a, orf140b, and orf146, the sequences that were ORFs in maize were not ORFs in rice (e.g. Fig. 4B). A striking feature of the maize NB mitochondrial genome is that most of the intergenic regions are not conserved with mtDNA even from another grass, and, in fact, they showed no sequence similarity to any other known sequences. The maize NB and rice mitochondrial genomes have approximately the same number of SDRs that are at least 50 bp long (Table III), repeats of a length that are unlikely to be present due to chance. This is despite the fact that the rice genome is 15% smaller than the maize NB genome and has a complexity that is only 77% that of NB.
Because maize and rice are both grasses, separated by only about 50 million years of evolution (see Gaut, 2002
Sequence Conservation between Two Fertile Genotypes of Maize
The other major mitochondrial genotype in male-fertile North American cultivated maize is termed NA. It was identified in the A188 inbred line, which is the cytoplasm present in most of the lines used to transform maize, and it has a larger, rearranged genome relative to NB (Fauron and Casper, 1994
NA was used as the reference genome in the MultiPip analysis shown in Figure 6. Of this 40 kb, only 26,080 bp are also present as long, homologous regions in NB, and they are found in four locations in the NB mtDNA. By design, the sequences for NA and NB begin at the same point, and the first 6,834 bp are shared (I in Fig. 6). There is a single nucleotide substitution and a single 10-bp gap in NB relative to NA. Following a 465-bp gap in NB relative to NA, segment II (6,065 bp) is present in an inverted orientation and includes 4 nucleotide substitutions. Segment II is composed of sequences homologous to the maize R1 plasmid (found as a free plasmid in two Latin American lines of maize; Weissinger et al., 1982
In addition to the large segments of shared sequence, there are eight small regions of similarity (86, 69, 53, 40, 36, 30, 29, 25, and 24 bp) that are found at dispersed locations in the NB genome. The smallest are not depicted in Figure 6. An additional 36 small sequences (25-182 bp long, with at least 78% identity) that are present in segments I to IV are also repeated elsewhere in NB mtDNA. The first 40 kb of the NA genome also contains three insertions of plastid DNA. The smallest, starting at 10,351 bp, is identical to that at the same position in NB. The second smallest constitutes the first 260 bp of segment III. The third plastid-derived segment is a 3.7-kb region that contains four genes: trnV, trnM, atpE, and atpB. Interestingly, this ctDNA insertion is not present in the NB mtDNA. The rice mitochondrial genome also contains the 3.7-kb ctDNA insertion. Other than the plastid-derived sequence, approximately one-fourth of the rest of the first 40 kb of the NA mitochondrial genome is present in rice mtDNA. Segments II and III and much of segment I are absent from the rice mitochondrial genome. Most of segment IV, which contains three mitochondrial genes, is shared among NA, NB, and rice mtDNAs. Further comparisons are ongoing and will be reported elsewhere.
Although the maize mitochondrial genome is the largest that has been sequenced to date, the known gene space in maize is not larger than those of other sequenced plant mitochondrial genomes. In the 570-kb maize NB mitochondrial genome, identifiable mitochondrial exons comprise only 35.4 kb (6.22%) of the genome, and the cis introns account for 25.7 kb (4.51%). It is not yet known how much more of the genome is accounted for by trans-spliced introns, as well as by upstream and downstream regulatory regions, but it is unlikely to be a major fraction. Plastid-derived sequences comprise only 4.44%, and SDRs (20-49 bp) only 3.47% of the NB genome. Thus, compared to other sequenced plant mitochondrial genomes, the maize mitochondrial genome has even more DNA of undetermined origin. The mechanisms for rapid mitochondrial genome rearrangement and expansion in plants remain enigmatic. Plant mitochondrial DNAs could be very interesting for evolutionary studies, because they are composed of two components: (1) slowly evolving DNA that includes coding regions and (2) noncoding DNA of obscure origin. Sequencing and comparing mtDNAs from a large number of closely related taxa should assist in understanding the evolution of plant mitochondrial genomes.
mtDNA Preparation
Seeds from normal, male-fertile maize (Zea mays) inbred B37N were obtained from Pioneer-Hi-Bred (Des Moines, IA). Mitochondria were prepared by differential centrifugation, and mtDNA was purified on CsCl gradients (Fauron et al., 1987
To generate random fragments, the mtDNA was processed in a French press (Schriefer et al., 1990
The DNA fragments were cloned into plasmid vector pOTMI (Lander et al., 2001
DNA template was purified on a Packard Bioscience DNATrak (CCS Packard, Torrance, CA) robot using a magnetic bead preparation (Hawkins et al., 1997 DNA was eluted from the paramagnetic beads with double-distilled water. The plates were placed on ring magnets during DNA transfer. Plates were sealed with foil tape and stored at -20°C.
ABI DyeTerminator (Sunnyvale, CA) reactions were used for sequencing. The reactions were performed in a 384-well format and were assembled using a BiomekFX robot. Thermocycling reaction times were as previously reported (Lander et al., 2001
After the library was constructed, an initial batch of 96 clones was purified and sequenced to assess library quality, after which other clones from the library were placed in the sequencing queue. From all templates processed, 80% were used for assembly, after low quality reads had been removed by the ASP script in use at the GSC. Chromatograms were processed using Phred (Ewing and Green, 1998
The maize NB mtDNA was sequenced by the whole genome shotgun method to a coverage depth of 22x. Following sequence assembly by Phrap, the database was viewed in Consed (Gordon et al., 1998
The primary database used for annotation was AceDB (http://www.acedb.org/). ORFs were initially identified using Artemis (Rutherford et al., 2000
The circular representation of the genome was drawn using DOCIRCLE (M. Gibson and C. Fauron, unpublished data), a program that draws circular DNA maps with genome features taken from text files. It is written in C++ and uses Embedded JavaScript (SpiderMonkey, http://www.mozilla.org/js/spidermonkey/) configuration files to specify how the figure is drawn.
Alignments were generated using MultiPipMaker, a web-based tool for genomic sequence alignments (Schwartz et al., 2000
Sequences were initially aligned using Pileup (GCG), ClustalW (Thompson et al., 1994
To search for transposable elements, the TIGR transposable element database was used in a BLASTN search of the NB genome. A minimum length of 15 bp and 90% sequence identity was required for assignment of a match. MITEs were searched for using the program FindMITE (Tu, 2001 SDRs were defined as sequences of at least 20 bp but less than 500 bp that are present more than once in the NB genome, are at least 90% identical in sequence, and are of exactly the same length. The NB genome was searched against itself using National Center for Biotechnology Information (NCBI) BLASTALL. Using RepeatExtractor (http://zeamtdna.missouri.edu/nlsap-gui.htm), an in-house script, all self-matches were removed (i.e. the query against its exact self), as were the 12 matches of the 6 known repeats of greater than 500 bp. BLAST lists reverse complements as separate elements, so all found repeats were compared, and reverse complement matches were combined if they were unique, or one of them was eliminated if they were redundant. BLAST sometimes erroneously finds small repeats only among large repeats, so large repeats were masked. However, repeats that are present exclusively inside a large repeat were counted, as well as repeats that are present both outside and inside a large repeat. For example, a repeat with one copy outside a large repeat and one inside would be counted three times. Other parameters were as noted in the text. The SDRs were then grouped into superfamilies using an in-house script based on BLASTN and aligned with Pileup (GCG). Ten random genomes were generated whose lengths were approximately the same as that of maize NB (600 kb) and whose overall nucleotide composition matched NB. The pseudo-genomes were strings of nucleotides generated using probabilities weighted to the proportions of the nucleotides in the NB genome, and verified in the resulting product. These genomes were subjected to the same SDR analyses as the NB genome itself.
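The random-genome control described above can be sketched as follows. This is an illustration rather than the in-house procedure: the base frequencies are placeholders (the nucleotide composition of NB is not given in this text), repeats are detected as exact k-mer matches on one strand instead of BLAST matches at 90% identity, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Placeholder composition; substitute the proportions measured in the
# real genome. These values are an assumption for illustration only.
ASSUMED_FREQS = {"A": 0.28, "C": 0.22, "G": 0.22, "T": 0.28}


def random_genome(length, base_freqs, seed=None):
    """Draw each nucleotide independently, weighted by base_freqs."""
    rng = random.Random(seed)
    bases = list(base_freqs)
    weights = [base_freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def composition(seq):
    """Observed base frequencies, used to verify the generated product."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in "ACGT"}


def repeated_kmers(seq, k=20):
    """Map each exact k-mer occurring more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# A small pseudo-genome keeps the demonstration fast; the study used
# ten genomes of about 600 kb each.
genome = random_genome(50_000, ASSUMED_FREQS, seed=0)
print(composition(genome))
print(len(repeated_kmers(genome, k=20)), "repeated 20-mers")
```

Running the same repeat search on the real genome and on such pseudo-genomes gives a baseline for how many short repeats are expected by chance alone.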
B73 nuclear sequences available from TIGR as assemblies of methyl-filtered and high-Cot sequences (AZM release 2.0; http://www.tigr.org/tdb/tgi/maize/; Whitelaw et al., 2003 This is likely to yield an underestimate because there may be mitochondrial sequences within the nuclear genome that are closely associated with repetitive sequences that would be excluded during high-Cot selection, or methylated sequences that would be excluded from methyl-filtered libraries. The complete sequence described in this study, Zea mays ssp. mays cytotype NB mtDNA, is GenBank accession number AY506529. The partial sequence described in this study, Zea mays ssp. mays cytotype NA mtDNA, is GenBank accession number AY705912.
We thank John Spieth, Brandi Chiapelli, Warren Gish, William Nash, Jill Cifrese, and Leah Westgate for their contributions to this project. We thank Laurence Maréchal-Drouard, Jeffrey Palmer, David Stern, and Pankaj Jaiswal for helpful comments on the data and Makedonka Mitreva for help with final formatting and submission. We are grateful to our educational partners at Truman State University, Diane Janick-Buckner, Brent Buckner, and their students, particularly Anup Parikh, for input to the project. Received April 16, 2004; returned for revision August 25, 2004; accepted August 25, 2004.
1 This work was supported by the National Science Foundation Plant Genome Research Program (grant no. DBI-0110168).
2 Present address: Magpie Systems, 4085 South 300 West, Salt Lake City, UT 84107.
3 Present address: Genome Science and Technology Program, University of Tennessee, Knoxville, TN 37996.
[w] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.104.044602.
* Corresponding author; e-mail sclifton{at}watson.wustl.edu; fax 314-286-1810.
Adams KL, Rosenbleuth M, Qiu YL, Palmer JD (2001) Multiple losses and transfers to the nucleus of two mitochondrial succinate dehydrogenase genes during angiosperm evolution. Genetics 158: 1289-1300
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410
Backert S, Nielsen BL, Börner T (1997) The mystery of the rings: structure and replication of mitochondrial genomes from higher plants. Trends Plant Sci 2: 477-484
Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL (2002) The Pfam protein families database. Nucleic Acids Res 30: 276-280
Bendich AJ (1993) Reaching for the ring: the study of mitochondrial genome structure. Curr Genet 24: 279-290
Bogsch EG, Sargent F, Stanley NR, Berks BC, Robinson C, Palmer T (1998) An essential component of a novel bacterial protein export system with homologues in plastids and mitochondria. J Biol Chem 273: 18003-18006
Bonnard G, Grienenberger JM (1995) A gene proposed to encode a transmembrane domain of an ABC transporter is expressed in wheat mitochondria. Mol Gen Genet 246: 91-99
Brennicke A, Zabaleta E, Dombrowski S, Hoffmann M, Binder S (1999) Transcription signals of mitochondrial and nuclear genes for mitochondrial proteins in dicot plants. J Hered 90: 345-350
Burger G, Gray MW, Lang BF (2003) Mitochondrial genomes: anything goes. Trends Genet 19: 709-716
Cummings MP, Nugent JM, Olmstead RG, Palmer JD (2003) Phylogenetic analysis reveals five independent transfers of the chloroplast gene rbcL to the mitochondrial genome in angiosperms. Curr Genet 43: 131-138
Dietrich A, Small I, Cosset A, Weil JH, Maréchal-Drouard L (1996) Editing and import: strategies for providing plant mitochondria with a complete set of functional transfer RNAs. Biochimie 78: 518-529
Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186-194
Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175-185
Fauron C, Casper M, Gao Y, Moore B (1995) The maize mitochondrial genome: dynamic, yet functional. Trends Genet 11: 228-235
Fauron CM, Casper M (1994) A second type of normal maize mitochondrial genome: an evolutionary link. Genetics 137: 875-882
Fauron CM-R, Abbott AG, Brettell RIS, Gesteland RF (1987) Maize mitochondrial DNA rearrangements between the normal type, the Texas male sterile cytoplasm, and a fertile revertant CMS-T regenerated plant. Curr Genet 11: 339-346
Fauron CM-R, Havlik M (1988) The BamHI/XhoI, SmaI restriction maps of the normal maize mitochondrial genotype B37. Nucleic Acids Res 16: 10395
Freeling M (2001) Grasses as a single genetic system: reassessment 2001. Plant Physiol 125: 1191-1197
Gaut BS (2002) Evolutionary dynamics of grass genomes. New Phytol 154: 15-28
Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. Genome Res 8: 195-202
Handa H (2003) The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res 31: 5907-5916
Hawkins TL, McKernan KJ, Jacotot LB, MacKenzie JB, Richardson PM, Lander ES (1997) A magnetic attraction to high-throughput genomics. Science 276: 1887-1889
Koch M, Haubold B, Mitchell-Olds T (2001) Molecular systematics of the Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences. Am J Bot 88: 534-544
Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T (2000) The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res 28: 2571-2576
Kumar R, Maréchal-Drouard L, Akama K, Small I (1996) Striking differences in mitochondrial tRNA import between different plant species. Mol Gen Genet 252: 404-411
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al (2001) Initial sequencing and analysis of the human genome. Nature 409: 860-921
Lang BF, Burger G, O'Kelly CJ, Cedergren R, Golding GB, Lemieux C, Sankoff D, Turmel M, Gray MW (1997) An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature 387: 493-497
Leon P, Walbot V, Bedinger P (1989) Molecular analysis of the linear 2.3 kb plasmid of maize mitochondria: apparent capture of tRNA genes. Nucleic Acids Res 17: 4089-4099
Lilly JW, Havey MJ (2001) Small, repetitive DNAs contribute significantly to the expanded mitochondrial genome of cucumber. Genetics 159: 317-328
Lonsdale DM, Hodge TP, Fauron CM (1984) The physical map and organisation of the mitochondrial genome from the fertile cytoplasm of maize. Nucleic Acids Res 12: 9249-9261
Lonsdale DM, Hodge TP, Howe CJ, Stern DB (1983) Maize mitochondrial DNA contains a sequence homologous to the ribulose-1,5-bisphosphate carboxylase large subunit gene of chloroplast DNA. Cell 34: 1007-1014
Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 955-964
Maier RM, Neckermann K, Igloi GL, Kössel H (1995) Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol 251: 614-628
Maréchal-Drouard L, Weil JH, Dietrich A (1993) Transfer RNAs and transfer RNA genes in plants. Annu Rev Plant Physiol Plant Mol Biol 44: 13-32
Marienfeld JR, Unseld M, Brennicke A (1999) The mitochondrial genome of Arabidopsis is composed of both native and immigrant information. Trends Plant Sci 4: 495-502
Martin W (2003) Gene transfer from organelles to the nucleus: frequent and in big chunks. Proc Natl Acad Sci USA 100: 8612-8614
Maul JE, Lilly JW, Cui L, dePamphilis CW, Miller W, Harris EH, Stern DB (2002) The Chlamydomonas reinhardtii plastid chromosome: islands of genes in a sea of repeats. Plant Cell 14: 2659-2679
Meyer LJ (2004) Tissue-specific ORF and gene expression analysis in maize mitochondria. MS thesis. University of Missouri, Columbia, MO
Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K (2002) The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics 268: 434-445
Oda K, Yamato K, Ohta E, Nakamura Y, Takemura M, Nozato N, Akashi K, Kanegae T, Ogura Y, Kohchi T, et al (1992) Gene organization deduced from the complete sequence of liverwort Marchantia polymorpha mitochondrial DNA. A primitive form of plant mitochondrial genome. J Mol Biol 223: 1-7
Palmer JD (1990) Contrasting modes and tempos of genome evolution in land plant organelles. Trends Genet 6: 115-120
Palmer JD, Adams KL, Cho Y, Parkinson CL, Qiu YL, Song K (2000) Dynamic evolution of plant mitochondrial genomes: mobile genes and introns and highly variable mutation rates. Proc Natl Acad Sci USA 97: 6960-6966
Parsons JD (1995) Miropeats: graphical DNA sequence comparisons. Comput Appl Biosci 11: 615-619
Perrotta G, Grienenberger JM, Gualberto JM (2002) Plant mitochondrial rps2 genes code for proteins with a C-terminal extension that is processed. Plant Mol Biol 50: 523-533
Richly E, Leister D (2004) NUMTs in sequenced eukaryotic genomes. Mol Biol Evol 21: 1081-1084
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B (2000) Artemis: sequence visualization and annotation. Bioinformatics 16: 944-945
Sangaré A, Weil JH, Grienenberger JM, Fauron C, Lonsdale D (1990) Localization and organization of tRNA genes on the mitochondrial genomes of fertile and male sterile lines of maize. Mol Gen Genet 223: 224-232
Schriefer LA, Gebauer BK, Qui LQ, Waterston RH, Wilson RK (1990) Low pressure DNA shearing: a method for random DNA sequence analysis. Nucleic Acids Res 18: 7455-7456
Schwartz S, Elnitski L, Li M, Weirauch M, Riemer C, Smit A, Green ED, Hardison RC, Miller W (2003) MultiPipMaker and supporting tools: alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res 31: 3518-3524
Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W (2000) PipMaker: a web server for aligning two genomic DNA sequences. Genome Res 10: 577-586
Senthilkumar P, Narayanan KK (1999) Analysis of rice mitochondrial genome organization using pulsed-field gel electrophoresis. J Biosci 24: 215-222
Staden R (1996) The Staden sequence analysis package. Mol Biotechnol 5: 233-241
Stern DB, Lonsdale DM (1982) Mitochondrial and chloroplast genomes of maize have a 12-kilobase DNA sequence in common. Nature 299: 698-702
Swofford D (2002) Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4. Sinauer Associates, Sunderland, MA
Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29: 22-28
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680
Tu Z (2001) Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae. Proc Natl Acad Sci USA 98: 1699-1704
Unseld M, Marienfeld JR, Brandt P, Brennicke A (1997) The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet 15: 57-61
Wahleithner JA, MacFarlane JL, Wolstenholme DR (1990) A sequence encoding a maturase-related protein in a group II intron of a plant mitochondrial nad1 gene. Proc Natl Acad Sci USA 87: 548-552
Weber F, Dietrich A, Weil JH, Maréchal-Drouard L (1990) A potato mitochondrial isoleucine tRNA is coded for by a mitochondrial gene possessing a methionine anticodon. Nucleic Acids Res 18: 5027-5030
Weissinger AK, Timothy DH, Levings CS, Hu WWL, Goodman MM (1982) Unique plasmid-like mitochondrial DNAs from indigenous maize races of Latin America. Proc Natl Acad Sci USA 79: 1-5
Wendl MC, Dear S, Hodgson D, Hillier L (1998) Automated sequence preprocessing in a large-scale sequencing environment. Genome Res 8: 975-984
Whitelaw CA, Barbazuk WB, Pertea G, Chan AP, Cheung F, Lee Y, Zheng L, van Heeringen S, Karamycheva S, Bennetzen JL, et al (2003) Enrichment of gene-coding sequences in maize by genome filtration. Science 302: 2118-2120