A reevaluation of rice mitochondrial evolution based on the complete sequence of male-fertile and male-sterile mitochondrial genomes.

Plant mitochondrial genomes have features that distinguish them radically from their animal counterparts: a high rate of rearrangement, of uptake and loss of DNA sequences, and an extremely low point mutation rate. Perhaps the most unique structural feature of plant mitochondrial DNAs is the presence of large repeated sequences involved in intramolecular and intermolecular recombination. In addition, rare recombination events can occur across shorter repeats, creating rearrangements that result in aberrant phenotypes, including pollen abortion, which is known as cytoplasmic male sterility (CMS). Using next-generation sequencing, we pyrosequenced two rice (Oryza sativa) mitochondrial genomes that belong to the indica subspecies. One genome is normal, while the other carries the wild abortive-CMS. We find that numerous rearrangements in the rice mitochondrial genome occur even between close cytotypes during rice evolution. Unlike maize (Zea mays), a closely related species also belonging to the grass family, integration of plastid sequences did not play a role in the sequence divergence between rice cytotypes. This study also uncovered an excellent candidate for the wild abortive-CMS-encoding gene; like most of the CMS-associated open reading frames that are known in other species, this candidate was created via a rearrangement, is chimeric in structure, possesses predicted transmembrane domains, and coopted the promoter of a genuine mitochondrial gene. Our data give new insights into rice mitochondrial evolution, correcting previous reports.

Plant mitochondrial genomes share several unique structural features that distinguish them from both mammalian and fungal mitochondrial genomes. The size of the plant mitochondrial genome is highly variable and quite large, ranging from at least 200 kb to over 2,000 kb (Palmer, 1990; Alverson et al., 2011). Variation in size and also in genome organization of the plant mitochondrial genome can occur in a single species (Fauron et al., 1995). In contrast, the genomes of animal and fungal mitochondria are more compact and range in size from 15 to 75 kb (Kessler and Avise, 1985; Gray et al., 1999). Although angiosperm mitochondrial genomes have greatly expanded when compared with animal genomes, with, for example, the Arabidopsis (Arabidopsis thaliana) mitochondrial genome being 367 kb (Unseld et al., 1997) versus the human mitochondrial genome being 20 times smaller, the size difference is accounted for by the extraordinarily high proportion (greater than 80%) of noncoding sequence in Arabidopsis mitochondrial (mt)DNA. As a result, the total number of known genes encoded by angiosperm mitochondrial genomes is fewer than twice as many as their mammalian counterparts. The greatest difference in encoded proteins is due to the presence of genes for ribosomal proteins and for some of the proteins involved in the biogenesis of cytochrome c, which are mitochondrially encoded in plants but are nucleus encoded in animals.
Perhaps the most unique structural feature of angiosperm mtDNAs is the presence of large repeated sequences that are partly responsible for their large size and the size variation observed in the same species (Notsu et al., 2002; Allen et al., 2007). Angiosperm mtDNAs have also grown because of the frequent incorporation of plastid and nuclear DNA (Stern and Lonsdale, 1982; Marienfeld et al., 1999). Conversely, gene losses occur frequently in angiosperm mitochondrial genomes; ribosomal proteins exhibit over 200 losses in angiosperms (Palmer et al., 2000). Similarly, gene losses have resulted in some of the plant mitochondrial tRNAs being encoded by the nucleus and imported to the mitochondrion (Duchêne et al., 2009), unlike in animals. Some of the functional mitochondrially encoded tRNAs in plants are of plastid origin; these are the only examples of foreign DNA transfer known to result in functional genes.
Angiosperm mitochondrial genomes are recombinationally active. The vast majority of plant mitochondrial genomes that have been analyzed possess structures known as recombination repeats, which are sequences present in at least two copies and which occur in multiple genomic environments that can be readily detected by physical mapping (Stern and Palmer, 1984;Fauron et al., 1995). As a result, the mitochondrial genome within even the same plant can be envisioned as a dynamic population of different molecules generated through homologous recombination (Folkerts and Hanson, 1989;Fauron et al., 1995). In addition to the high frequency of recombination between defined recombination repeats, rare recombination between other repeated sequences sometimes occurs in plant mitochondrial genomes. These rare events occasionally occur at detectable levels and are likely the source of substoichiometric molecules termed sublimons that persist together with the main mitochondrial genome (Small et al., 1987). Nuclear genes evidently control the mitochondrial substoichiometric shifting between different sublimons, such as the fertility restorer (Fr) gene in Phaseolus vulgaris (Mackenzie and Chase, 1990) and several genes in Arabidopsis, MutS Homolog1 (MSH1), organellar single-stranded DNA binding protein (OSB1), and recombination A3 (RECA3) (for review, see Arrieta-Montiel and Mackenzie, 2011). Consequences of these rare recombination events can manifest themselves through plant aberrant phenotypes (Sakamoto et al., 1996;Kubo and Newton, 2008), in particular, cytoplasmic male sterility (CMS; Pruitt and Hanson, 1989;Pla et al., 1995).
CMS is a phenotypic trait that is widespread among plants and results in the inability of the plant to produce viable pollen (Laser and Lersten, 1972). In all the characterized species, the mitochondrion has been shown to bear the causal defect. Generally, the regions whose expression is associated with CMS contain unusual open reading frames (ORFs) that are often chimeric in structure and frequently cotranscribed with conventional mitochondrial genes (for review, see Hanson and Bentolila, 2004). Natural suppressors of CMS, called restorers of fertility, are found in the nucleus and have the ability to restore the production of pollen to plants carrying the deleterious mitochondrial CMS-associated gene. CMS/restorer of fertility systems have been widely used in hybrid seed production, because they eliminate the need to emasculate the parental line used as the female. The hybrid will be fully male fertile providing that the male parental line carries the restorer of fertility in its nuclear genome.
Another unique characteristic of plant mitochondrial genomes is the generally very low, with few exceptions, rate of point mutation when compared with plastid DNA, nuclear DNA, or animal mtDNA (for review, see Palmer, 1990). The point mutation rate in angiosperm mtDNA is roughly four times slower than in plastid DNA and 100 times slower than in animal mtDNA (Palmer and Herbon, 1988). The intriguing aspect of this exceptionally low point mutation rate is its occurrence over the entire plant mitochondrial genome, even though most of it is constituted by noncoding DNA (Palmer and Herbon, 1988; Allen et al., 2007). In conclusion, the consensus view of angiosperm mtDNA evolutionary dynamics is an extraordinarily high rate of rearrangements, genome growth and shrinkage, and incorporation of foreign DNA but a very slow rate of nucleotide sequence divergence.
In this study, we report the complete sequences of two rice (Oryza sativa) mitochondrial genomes obtained by Roche/454 pyrosequencing technology. One mitochondrial genome comes from a maintainer line and is referred to as normal (N), while the other carries the wild abortive CMS and is hereafter referred to as WA-CMS. There are over 60 types of rice CMS lines that can be categorized into three types, namely wild-abortive (WA) type, Honglian (HL) type, and Boro II (BT) type, based on their inheritance, morphology of abortive pollen, and identity of the fertility-restorer genes (for review, see Li et al., 2007). About 90% of the three-line hybrid rice in China is composed of hybrids derived from the WA-CMS lines. Despite its crucial economic importance, the causative mitochondrial gene for WA-CMS is unknown. In addition to allowing the identification of a candidate for the WA-CMS-associated ORF, our data give new insights into the evolution of the rice mitochondrial genome. The findings reported in this study contradict some previous work on rice mitochondrial genome variation (Tian et al., 2006).
A striking result from the reference sequence mapping is the presence of numerous areas where no reads from either N or WA-CMS matched the reference sequence. An example of such a gap is given in Supplemental Figure S1, where the area from coordinates 36,402 to 41,914 is not covered by reads from the N mitochondrion. By contrast, this gap is absent from the WA-CMS mitochondrion, as many reads match this area (Supplemental Fig. S1).
In most of the cases where a gap was observed in either genome, N or WA-CMS, bridging contigs were found on either side of the gap that partially match the sequence upstream or downstream of the gap. The other parts of these bridging contigs mapped to another region of the reference sequence. We counted nine of these rearrangements in the N mitochondrial genome where there is an interruption of the colinearity with the reference sequence (Supplemental Fig. S2). The majority of the rearrangements bridge sequences bordering gaps, except for three of them, 2-4, 2-5, and 2-8, that bridge sequences bordering a gap to a sequence where there is no gap with the reference sequence (Supplemental Fig. S2). For instance, 2-8 links 243K, which is downstream of the gap 243K-244K, to 304K. At position 304,040, some reads continue colinearly with the reference sequence, while others match up to 304,040 but not downstream (Supplemental Fig. S3). In the rare cases where a sequence bordering a gap is not linked through rearrangement to another location of the reference sequence, this sequence is always found to be duplicated and the duplicated sequence does not border a gap. For example, the sequence 136K-141K is duplicated as 41K-47K. 136K (=41K) is linked through rearrangement 2-2 to 170K; 141K, which borders the gap 141K-148K and does not show any link to a rearrangement, is actually identical to 47K, which does not border a gap (Supplemental Fig. S2). Similarly, the sequence 148K-150K, which is surrounded by the gaps 141K-148K and 150K-154K and lacks any link to a rearrangement, is duplicated at the coordinates 206K-208K; 206K-208K does not border any gap in the N mitochondrial genome (Supplemental Fig. S2). The most likely explanation for the genesis of the N mitochondrial genome in this particular area is the loss of the whole sequence 130K-154K through recombination.
We found the same extent of missing sequence from the Nipponbare mitochondrial genome in the WA-CMS mitochondrion as in the N mitochondrion (Supplemental Fig. S4). Some of the gaps in the WA-CMS mitochondrial assembly are shared with the N mitochondrial assembly, for instance, the block 141K-148K, while other gaps are specific to WA-CMS, such as the block 218.8K-220K (Supplemental Fig. S4). It appears that most of the common gaps in WA-CMS and N are found in or close to the third largest duplicated region in Nipponbare (144K-167K; Supplemental Fig. S4).
With 15 rearrangements relative to the Nipponbare mitochondrial genome, WA-CMS shows more rearrangements than N, which exhibited only nine. Two of these rearrangements are common to N and WA-CMS: 2-6 and 2-9, which link 164.5K to 390K and 274K to 421K, respectively (Supplemental Figs. S2 and S4). These common losses and rearrangements most likely occurred in the common ancestor of N and WA-CMS after its lineage separated from that of Nipponbare.
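The notion of a gap used above (a stretch of the reference to which no read maps) can be illustrated with a short sketch. This is not the pipeline used in this study: it assumes that read alignments have already been reduced to 1-based, inclusive (start, end) intervals on the Nipponbare reference, and the function name `find_gaps` and the toy intervals are hypothetical, chosen only so that the uncovered stretch mirrors the 36,402 to 41,914 gap of Supplemental Figure S1.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive reference coordinates


def find_gaps(read_hits: List[Interval], ref_length: int) -> List[Interval]:
    """Return the reference intervals that are not covered by any mapped read."""
    gaps: List[Interval] = []
    covered_to = 0  # highest reference position covered so far
    for start, end in sorted(read_hits):
        if start > covered_to + 1:
            # Nothing maps between the previous coverage and this read.
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    if covered_to < ref_length:
        gaps.append((covered_to + 1, ref_length))
    return gaps


# Hypothetical read intervals on a 50,000-bp stretch of reference.
toy_hits = [(1, 36_401), (30_000, 36_401), (41_915, 50_000)]
print(find_gaps(toy_hits, ref_length=50_000))  # [(36402, 41914)]
```

A real analysis would also need a minimum-coverage threshold and a way to treat reads that map to duplicated regions, both of which are left out here.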

Validation of the Rearrangements by PCR
In order to validate the rearrangements detected in N and WA-CMS mitochondrial genomes by 454 sequencing, we performed PCR with primers designed to amplify regions specifically associated with the rearrangements. These primers cannot yield a PCR product from the Nipponbare mitochondrion, either because they are too far apart or in the wrong orientation. All the rearrangements were validated by the production of PCR amplicons of the expected size (Fig. 1); furthermore, sequencing of the amplicons confirmed their sequence identity (data not shown).
Figure 1. Rearrangements in the N and WA-CMS mitochondria confirmed by PCR. Shown are negative images of an ethidium bromide-stained gel loaded with PCR products amplified with primers designed to produce amplicons specifically associated with the rearrangements. Above each bracket are given the rearrangements with the same numbers used in Figures 2 and 3. Each set of PCRs includes three lanes, a negative control (−; no DNA template), the N mitochondrion template (N), and the WA-CMS mitochondrion template (W). The PCRs were replicated four times and stopped at an incremental number of cycles (as shown at left). The left panels contain PCR products of small size amplified with a regular Taq enzyme, while the right panels contain larger amplicons obtained with the Herculase enzyme (Stratagene). The primers could not lead to PCR amplification with the Nipponbare mitochondrion as a template, either because they are too distant (2-1, 67 kb; 4-8, 88 kb; 4-13, 277 kb) or in the wrong orientation. At 20 cycles, the PCR products are only detected with the primers specific for each template rearrangement: 2-1, 2-2, and 2-5 for N and 4-3, 4-7, 4-1, 4-8, and 4-13 for WA-CMS. Primers associated with 2-6 and 2-9, the common rearrangements found in both N and WA-CMS, lead to the amplification of PCR products with both templates at 20 cycles. As the number of cycles increases, PCR products become detectable in the lanes where the template used does not carry the rearrangement. At 35 cycles, all the lanes with the presence of a template in the PCR show an amplicon, except for 4-8 with the N template. The PCR at 25 cycles with 4-13 primers did not work, as a PCR product is readily detectable at 20 cycles.
Most of the rearrangements specific to one mitochondrial genome are present at a substoichiometric level in the other genome. At 20 cycles of PCR amplification, only templates carrying specific rearrangements were able to produce a detectable PCR product. For instance, 2-1 primers, specific to the 2-1 rearrangement carried by the N mitochondrion, induce the production of an amplicon only with the N mitochondrion template at 20 cycles (Fig. 1). Nevertheless, as the number of cycles increases to 25, the presence of a product starts to be detectable with the WA-CMS template. At 35 cycles, the plateau of amplification is almost reached, with an abundance of 2-1-specific product with the WA-CMS template nearly comparable to the one obtained with the N template (Fig. 1). We can rule out an accidental contamination of the PCR as a cause for these nonspecific amplifications, because the negative control with no DNA template does not produce any amplicon. It is also unlikely that the N and WA-CMS templates contaminated each other during their isolation, because for the specific amplification of 4-8, no PCR product is detectable with the N template even after 35 cycles (Fig. 1). Primers associated with 2-6 and 2-9, the two rearrangements shared by N and WA-CMS, are the only primers producing a PCR product with both templates at 20 cycles, thus supporting the presence of these rearrangements at similar and abundant levels in both mitochondrial genomes (Fig. 1). The substoichiometric amount of rearrangements specific to one genome in the other genome is enough to be detected by PCR amplification but not enough to have been picked up by the coverage depth of the 454 sequencing that was performed.
Because tissue culture has been reported in several instances to induce mitochondrial genome rearrangements (Shirzadegan et al., 1991), we verified that the rearrangements observed in the N and WA-CMS mitochondrial genomes were not caused by the cell cultures used in this study. A similar PCR approach was followed as above, but with DNA templates being total DNA extracted from rice leaves (Supplemental Fig. S5). As described previously, after 20 cycles of amplification, PCR products were detected only when primers specific for a rearrangement were used with templates carrying the rearrangement (Supplemental Fig. S5). Thus, the rearrangements in rice mitochondrial genomes are not induced by in vitro culture. As the number of cycles was increased to 35, a PCR product was observed with templates not harboring the specific rearrangement. The finding of rearrangements specific to one genome in the other genome at substoichiometric levels was also not an artifact due to in vitro culture.
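The reasoning that a product appearing only at higher cycle numbers reflects a rarer template can be made concrete with a rule-of-thumb calculation. The sketch below is an idealization and not a measurement from this study: it assumes exponential amplification with a constant per-cycle efficiency and an identical detection threshold for both templates, neither of which holds strictly for end-point PCR read on a gel. The function name `fold_difference` is hypothetical.

```python
def fold_difference(cycles_abundant: int, cycles_rare: int,
                    efficiency: float = 1.0) -> float:
    """Approximate abundance ratio of two templates from detection cycles.

    Assumes each cycle multiplies the product by (1 + efficiency), so a
    template first detected n cycles later started about (1 + efficiency)**n
    times less abundant.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return (1.0 + efficiency) ** (cycles_rare - cycles_abundant)


# Detection at 20 cycles versus 25 cycles, with perfect doubling assumed.
print(fold_difference(20, 25))  # 32.0
```

Under these assumptions, a five-cycle delay corresponds to a roughly 30-fold lower starting abundance, which is consistent in spirit with rearrangements that are detectable by PCR yet too rare to be sampled at the sequencing depth used.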

Assembly of N and WA-CMS Mitochondrial Genomes in a Contiguous Sequence
The following guidelines allowed us to assemble the N and WA-CMS mitochondrial genomes into a contiguous sequence. We arbitrarily chose a starting point with the constraint that the sequence should end at the same point. During the reference sequence assembly, we did not find any point that was not related to another; every sequence bordering a gap was related through rearrangement to another location of the reference sequence. Thus, the physical maps of the rice mtDNAs can be configured as master circles, even though such large circular molecules may not exist in vivo (Bendich, 1996). For the N mitochondrion, we started at coordinate 47K, went downstream toward coordinate 41K, and then followed rearrangement 2-2, which links coordinate 41K to coordinate 170K (Supplemental Fig. S2). From there, the assembly path goes upstream toward coordinate 195K, then follows rearrangement 2-7, which connects coordinate 195K to coordinate 244K, and sequentially follows the colinear blocks with Nipponbare until the next rearrangement. Ambiguity arises in this process when rearrangements occur at positions that do not border gaps. At these positions, the assembly path has two alternatives: to follow the rearrangement or to continue along the block colinear with Nipponbare. For example, going downstream from coordinate 421K toward coordinate 410K, the assembly path could follow rearrangement 2-4, which connects coordinate 410K to coordinate 160K; alternatively, it could go downstream from coordinate 421K to coordinate 391K, bypassing rearrangement 2-4 (Supplemental Fig. S2). The outcome of these alternative routes is the production of several equally likely conformations for the N mitochondrial genome.
We followed this assembly process until all the reference sequences matched by the N reads were included and all the rearrangements detected previously were incorporated in a contiguous sequence. Another constraint put to this system was the choice to favor the conformation presenting the smallest sequence length. A linear representation of one of the conformations for the N mitochondrial genome reconstructed from Nipponbare colinear blocks with rearrangements is given in Figure 2. The resulting sequence is 637 kb long and includes three large duplicated sequences ranging in size from 74 to 115 kb. The longest colinear segment with Nipponbare is 111 kb long, from rearrangement 2-6 coordinate 390K to rearrangement 2-4 coordinate 279K (Fig. 2). We were able to produce another conformation for the N mitochondrion given the constraints stated above (Supplemental Fig. S6). Assuming that the N mitochondrial genome exists as a master circle, this alternative conformation is actually identical to the one presented in Figure 2.
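The bookkeeping behind such an assembly path can be sketched as a list of colinear blocks, each written as (start, end) in Nipponbare coordinates and traversed in either direction. This is only an illustration under stated assumptions: coordinates are rounded to the nearest kb as in the text, the names `block_length` and `path_length` are hypothetical, and the extra sequence carried by noncontiguous rearrangements is passed in as a single number rather than modeled.

```python
from typing import List, Tuple

# (start_kb, end_kb) in Nipponbare coordinates; start > end means the block
# is traversed downstream, toward smaller coordinates.
Block = Tuple[int, int]


def block_length(block: Block) -> int:
    """Length in kb of one colinear block, whatever its orientation."""
    start, end = block
    return abs(end - start)


def path_length(blocks: List[Block], extra_kb: int = 0) -> int:
    """Total length of an assembly path: colinear blocks plus any extra
    sequence carried by noncontiguous rearrangements."""
    return sum(block_length(b) for b in blocks) + extra_kb


# First two blocks of the N path described in the text: 47K to 41K, then
# rearrangement 2-2 to 170K and on to 195K.
print(path_length([(47, 41), (170, 195)]))  # 31

# Longest colinear segment in N: 390K to 279K.
print(block_length((390, 279)))  # 111
```

Choosing the shortest conformation, as done here, then amounts to comparing `path_length` across the alternative block orders that include every covered reference sequence and every detected rearrangement.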
The same assembly strategy with similar constraints was adopted to establish a contiguous sequence for the WA-CMS mitochondrial genome (Fig. 3). The WA-CMS mitochondrial genome is much shorter than that of N, 402 kb versus 637 kb, respectively. The WA-CMS genome possesses only two duplicated regions, 4 and 32 kb long. Contrary to the N mitochondrial genome, whose three repeats are direct repeats in the same orientation, the WA-CMS repeats are inverted repeats (Fig. 3). Due to the higher number of rearrangements in WA-CMS than in N (15 versus nine) relative to Nipponbare, the longest colinear segment with Nipponbare is only 59 kb, from rearrangement 4-13 coordinate 113K to rearrangement 4-1 coordinate 54K (Fig. 3). This colinear segment is smaller than the one found in N. If we consider the genome complexity as defined by Allen et al. (2007), that is, only one copy of each repeat (more than 0.5 kb) is considered, the differences between N and WA-CMS are much reduced. The genome complexity of WA-CMS, with 364,100 bp, is actually larger than the complexity of N, which is 345,415 bp. As a comparison, the complexity of Nipponbare is in a similar range, with 357,349 bp. An alternative conformation of identical length was obtained for the WA-CMS mitochondrial genome (Supplemental Fig. S7). This alternative conformation results from the inversion of the sequence between the two repeated rearrangements 4-10 (Fig. 3; Supplemental Fig. S7); contrary to the alternative conformations found for the N mitochondrial genome, the two alternative sequences for WA-CMS are not identical even if the genome is represented as a master circle.
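Genome complexity, in the sense used above, is the total length with every repeat longer than 0.5 kb counted only once. A minimal sketch of that calculation follows; the function name `genome_complexity` is hypothetical, and the input uses the rounded WA-CMS figures from the text (402 kb total, two-copy repeats of about 4 and 32 kb), so the result only approximates the reported 364,100 bp.

```python
from typing import List, Tuple


def genome_complexity(total_bp: int, repeats: List[Tuple[int, int]],
                      min_repeat_bp: int = 500) -> int:
    """Total length minus the redundant copies of each large repeat.

    repeats holds (repeat_length_bp, copy_number) pairs. Repeats of
    min_repeat_bp or shorter are ignored, that is, left fully counted.
    """
    redundant = sum(
        length * (copies - 1)
        for length, copies in repeats
        if length > min_repeat_bp and copies > 1
    )
    return total_bp - redundant


# Rounded WA-CMS values taken from the text; exact repeat sizes would be needed
# to reproduce the reported complexity to the base pair.
print(genome_complexity(402_000, [(4_000, 2), (32_000, 2)]))  # 366000
```

The same subtraction explains why N, despite being 637 kb long, has the smallest complexity of the three genomes: most of its extra length consists of second copies of three large duplications.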
The two contiguous sequences established for the N and WA-CMS mitochondrial genomes served as a template to reassemble the totality of 454 reads obtained for each genome by performing an assembly with SeqMan software. The resulting sequences have been deposited at GenBank (accession nos. JF281153 and JF281154) and used in all the analyses we performed.

Sequence Gains and Losses in N and WA-CMS Are Linked to Rearrangements
As we already noticed during the reference sequence assembly, all the sequence gaps in the N and WA-CMS mitochondrial genomes are linked to rearrangements (Supplemental Figs. S2 and S4). These Nipponbare sequence losses amount to 24,065 and 22,791 bp in N and WA-CMS, respectively (counting the losses only once when they are duplicated). These sequence losses in N and WA-CMS represent 6.73% and 6.38%, respectively, of the Nipponbare complexity (24,065/357,349 and 22,791/357,349). A large part of the sequence in Nipponbare that is missing in both N and WA-CMS is shared and maps close to or within the third largest Nipponbare duplicated region (orange blocks in Supplemental Figs. S2 and S4); this common missing sequence is 14,507 bp long and represents 60% and 64% of the Nipponbare sequences that are absent in N and WA-CMS, respectively. That the majority of the Nipponbare missing sequence is identical in N and WA-CMS supports their common origin relative to Nipponbare. Nevertheless, after their separation, the N and WA-CMS mitochondrial genomes experienced more losses, as evidenced by missing sequences specific to either N or WA-CMS (e.g. 195K-197K in the N mitochondrion; Supplemental Fig. S2).
Figure 2. Linear representation of the N mitochondrial genome. Colinear blocks from Nipponbare were reassembled according to the rearrangements found in the N mitochondrion. At the end of each block, the Nipponbare coordinates are given. Above the coordinates is given the rearrangement number in italics. 2-6 and 2-9 are rearrangements also found in WA-CMS. 2-4, 2-5, and 2-8 are connecting sequences that do not border gaps, whose coordinates are given in red. These positions, like 304K, are duplicated, with one copy being part of a rearrangement while the other is not. Contiguous rearrangements are underlined. The color in Nipponbare blocks is similar to the one in Figure 3 and is taken from the original paper by Notsu et al. (2002). Underneath the contiguous sequence, colored blocks show the three large duplicated segments in the N mitochondrion with their respective sizes. The total length of the N mitochondrial genome is 637 kb and is slightly off here because noncontiguous rearrangements are represented as contiguous rearrangements.
We also looked for sequences in N and WA-CMS that are not found in Nipponbare (i.e. sequences specific to these genomes). For simplicity, we will refer to these sequences as sequence gains, even though it is equally possible that these sequences were present in a common rice ancestor and lost in Nipponbare. As observed for the losses, the majority of the sequence gains were linked to rearrangements (Table I). We already mentioned that some of the rearrangements found in N and WA-CMS are contiguous between Nipponbare blocks, while others carry extra sequence (Figs. 2 and 3; Supplemental Table S1). These noncontiguous rearrangements are responsible for most of the sequence gains in N and WA-CMS; 27,104 bp of the 28,742 bp, or 94% of the WA-CMS sequence absent from Nipponbare, are carried by the rearrangements (Table I). Similarly, 87% of the N-specific 12,509 bp absent from Nipponbare is found in rearrangements detected in the N genome. The N genome shows smaller sequence gains than the WA-CMS genome, because five out of nine rearrangements in N are noncontiguous while 12 out of 15 rearrangements are noncontiguous in WA-CMS. Sequence gains were only counted once when they were located in duplicated sequences. Relative to genome complexities, the gains in N and WA-CMS represent 3.62% and 7.89%, respectively.
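The percentages quoted in this section follow directly from the base-pair counts and genome complexities given in the text. The short script below recomputes them as a consistency check; the variable names and the helper `pct` are ours, and every number is copied from the text rather than derived from the sequence data.

```python
# Genome complexities in bp (one copy of each repeat longer than 0.5 kb).
COMPLEXITY = {"Nipponbare": 357_349, "N": 345_415, "WA-CMS": 364_100}

# Nipponbare sequence missing from each genome, and the part missing from both.
LOSS = {"N": 24_065, "WA-CMS": 22_791}
SHARED_LOSS = 14_507

# Sequence present in each genome but absent from Nipponbare.
GAIN = {"N": 12_509, "WA-CMS": 28_742}
WA_GAIN_IN_REARRANGEMENTS = 27_104


def pct(part: int, whole: int, digits: int = 2) -> float:
    """Percentage of whole represented by part, rounded to digits decimals."""
    return round(100.0 * part / whole, digits)


for genome in ("N", "WA-CMS"):
    print(genome,
          "loss vs Nipponbare complexity:", pct(LOSS[genome], COMPLEXITY["Nipponbare"]),
          "| shared fraction of loss:", pct(SHARED_LOSS, LOSS[genome], 0),
          "| gain vs own complexity:", pct(GAIN[genome], COMPLEXITY[genome]))

print("WA-CMS gain carried by rearrangements:",
      pct(WA_GAIN_IN_REARRANGEMENTS, GAIN["WA-CMS"], 0))
```

Note that losses are expressed relative to the Nipponbare complexity, whereas gains are expressed relative to the complexity of the genome that carries them.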

Sequences Present in Newly Assembled Genomes and Absent from Nipponbare Are Found in Grass Mitochondrial Genomes and in the Rice Nuclear Genome
Most of the sequences specific to the N and WA-CMS genomes but absent in the Nipponbare mitochondrion are found in the mtDNA of other members of the grass family (Poaceae). A total of 68% and 82% of the sequences absent from the Nipponbare mitochondrion but present in WA-CMS and N, respectively, are found in the grass family mtDNAs (Table I). For instance, the 1,145 bp carried by rearrangement 4-12 in the WA-CMS mitochondrial genome and absent from Nipponbare is found not only in the rice N genome (rearrangement 2-3) but also integrally in the giant timber bamboo (Bambusa oldhamii) mitochondrion. A total of 870 bp of this sequence is found in the wheat (Triticum aestivum) mtDNA, while the same 660-bp fragment is present in both the eastern gamagrass (Tripsacum dactyloides) and sorghum (Sorghum bicolor) mtDNAs. A smaller fragment of 346 bp from this sequence is found in the teosinte (Zea luxurians) and maize (Zea mays) mitochondrial genomes. This observation suggests that this sequence was found in the mitochondrion of a common ancestor to the grass family and that it was lost partially or totally during speciation. A total of 268 bp of this sequence lies in the nucleus of Nipponbare on chromosome 12, the location of a massive transfer of DNA between the nucleus and the mitochondrion. A total of 46% of the Nipponbare mitochondrial sequence is found on chromosome 12, with a 40-kb region being the longest fragment homologous to mtDNA.
Figure 3. Linear representation of the WA-CMS mitochondrial genome. Colinear blocks from Nipponbare were reassembled according to the rearrangements found in the WA-CMS mitochondrion. At the end of each block, the Nipponbare coordinates are given. Above the coordinates is given the rearrangement number in italics. 2-6 and 2-9 are rearrangements also found in N. 4-2, 4-11, and 4-13 are connecting sequences that do not border gaps, whose coordinates are given in red. These positions, like 215K, are duplicated, with one copy being part of a rearrangement while the other is not. Contiguous rearrangements are underlined. The color in Nipponbare blocks is similar to the one in Figure 2 and is taken from the original paper by Notsu et al. (2002). Underneath the contiguous sequence, colored blocks show the two duplicated segments in the WA-CMS mitochondrion with their respective sizes. The total length of the WA-CMS mitochondrial genome is 402 kb and is slightly off here because noncontiguous rearrangements are represented as contiguous rearrangements.
Transfer of DNA between the mitochondrion and the nucleus has been documented in Nipponbare, but whether the direction of the flux of DNA is from mitochondrion to nucleus or vice versa is difficult to establish (Notsu et al., 2002). However, because of the presence of this sequence in many grass mitochondria, it seems likely that it was also present in Nipponbare but lost after transfer to the nucleus. Incidentally, the bamboo mitochondrion not only carries the 1,145-bp sequence from the 4-12 and 2-3 rearrangements that are absent in Nipponbare, but adjacent to it is a sequence homologous to Nipponbare at the 390K coordinate. In other words, the existence of portions of two rearrangements that are found in the WA-CMS and N mitochondria is supported by phylogenetic evidence.
A similar observation can be made for the sequence absent from Nipponbare mtDNA and carried by the rearrangement 2-9, which is shared by the N and WA-CMS mitochondrial genomes. This sequence is integrally found in the mitochondria of two rice subspecies, O. sativa indica and Oryza rufipogon (Fujii et al., 2010), but also on chromosome 12 of the Nipponbare nuclear genome. Part of this sequence is also found in the mitochondrion of members of the grass family, such as sorghum and bamboo. In this case, the whole rearrangement (i.e. the sequence not found in Nipponbare and the contiguous sequences homologous to Nipponbare coordinates 274K and 421K [rearrangement 2-9 in Figs. 2 and 3]) is integrally present in the O. sativa indica and O. rufipogon mitochondrial genomes.
The DNA flux between the mitochondrion and the nucleus is illustrated by the large fraction of the sequence absent from the Nipponbare mitochondrion but present in the WA-CMS and N mitochondria and found in the rice nucleus, 50% and 73%, respectively (Table I). Although chloroplasts are also known to exchange DNA with mitochondria (Lonsdale et al., 1983;Notsu et al., 2002), a very small fraction of the sequences absent from Nipponbare but present in the WA-CMS and N mitochondria is found in the chloroplast, 1% and 2%, respectively (Table I). The amount of sequence absent from Nipponbare but present in the WA-CMS and N mitochondria with no homology in the database is rather significant, 17% and 10%, respectively (Table I). Sequence gains in WA-CMS and N overlap: 91% of the sequence absent in Nipponbare but present in the N mitochondrion is also found in WA-CMS, supporting a common lineage to these two mitochondrial genomes.

The Published Assemblies of the 93-11 and PA64S Mitochondrial Genomes Show a Unique and Suspiciously High Level of Sequence Conservation and Synteny
In order to place the amount of sequence gain and loss in the rice mitochondrial genomes in a broader context, we compared N and WA-CMS not only with Nipponbare but also with 93-11 and PA64S. The Nipponbare mitochondrial genome was the first to be reported in rice and was obtained by sequencing physically contiguous phage clones (Notsu et al., 2002). 93-11 (indica) and PA64S (japonica) were assembled from whole-genome shotgun projects aimed at sequencing the nuclear genomes (Tian et al., 2006). The first striking result is the absence or near absence of missing sequence between Nipponbare, 93-11, and PA64S (Table II). The level of conservation between these three mitochondrial genomes is astonishing; the only minor difference resides in the presence of a 500-bp sequence in 93-11 that is absent from Nipponbare and PA64S. The Nipponbare and PA64S mitochondrial genomes do not show any difference in the composition of their sequences in terms of sequence gain or loss. As a reminder, we showed previously that 24 and 23 kb of Nipponbare are missing in N and WA-CMS, respectively. As a comparison point, we included data in Table II from the assembled maize mitochondrial genomes (Allen et al., 2007); NA is missing 1.97% of the sequence present in NB (approximately 10 kb), even though NA and NB are the two maize fertile mitochondrial genomes that are the most similar genomes in the maize study (Allen et al., 2007; Table II).
Table I footnotes. a, The absence of a rearrangement means that the sequence is not associated with a rearrangement. b, Sequence length is given in bp. Their order of appearance here is according to their coordinates in the WA-CMS and N mitochondria.
Another surprising feature of the published 93-11 and PA64S mitochondrial genomes relative to Nipponbare is the total absence of rearrangements between these genomes; the previously reported sequences of these three genomes are totally colinear. Given that we have shown a strong link between sequence gains and losses and rearrangements, this absence of rearrangements between Nipponbare, 93-11, and PA64S is somewhat expected, because of the lack of sequence gains and losses between these three genomes. The published report concludes that the genome sizes for the three lines are very similar, 491, 492, and 491 kb for Nipponbare, 93-11, and PA64S, respectively. The invariance of both the large repeats and the mitochondrial genomic size in 93-11 and PA64S relative to Nipponbare contradicts data gathered from maize, where total genome size varies widely even between NA and NB, from 701 to 570 kb, respectively (Allen et al., 2007).

Conserved Gene Repertoire and DNA Polymorphism
The two newly sequenced mitochondrial genomes contain the same basic repertoire of 55 genes as Nipponbare, which comprises 35 known protein-coding genes, three ribosomal RNAs (5S, 18S, and 26S), and 17 tRNA genes (Notsu et al., 2002). Three genes do not contain a DNA-encoded ATG start codon. For nad1 and nad4L, the DNA-encoded ACG is modified to a start codon by RNA editing (Notsu et al., 2002). The start codon for mat-r, a putative maturase, is inferred to be AGA by comparison with other organisms in the public databases. Three genes do not possess a DNA-encoded stop codon; their stop codons are created by RNA editing, from CAA for atp6 and from CGA for ccmFc and atp9 (Notsu et al., 2002). Comparison of the coding sequences for the 35 protein-coding genes revealed 26 single-nucleotide polymorphisms (SNPs) and one insertion/deletion among the rice mitochondrial genomes (Table III). These 26 SNPs can be partitioned into 11 synonymous changes that do not alter the encoded amino acid and 15 nonsynonymous changes that modify the identity of the encoded amino acid. It is unclear why nad6 in Nipponbare exhibits so many polymorphisms relative to nad6 genes in the other rice mitochondrial genomes (Table III). Nevertheless, among the 11 SNPs found in nad6, four are corrected by RNA editing in Nipponbare (Notsu et al., 2002). For instance, the C at position 476 is edited to U, resulting in an amino acid change from Ser to Leu, which is the amino acid encoded in the other rice mitochondrial genomes (Table III). It is known that the conversion of a C to a U in a mitochondrial RNA generally results in a codon encoding an amino acid similar to that found in other plants or microorganisms at the homologous amino acid position (Covello and Gray, 1993). Thus, it is likely that C at position 557 in Nipponbare and PA64S cox3 is edited to U, resulting in a Ser-to-Leu change; Leu is the conserved amino acid found in a wide range of plant COX3 proteins (Supplemental Fig. S8).
This particular site was not reported to be edited by Notsu et al. (2002), presumably because it is partially edited and escaped their RNA-editing detection method, which relied upon comparing sequence traces from cDNA and genomic DNA. The C at position 1,249 in rpl2 of N, WA-CMS, 93-11, and PA64S is genomically encoded as a T in Nipponbare and therefore might be subjected to RNA editing (Table III). Eight of the 26 SNPs found in the mitochondrial gene-coding sequence could be corrected by RNA editing. We also found two SNPs in the ribosomal RNAs, rrn18 and rrn26 (Table III). Despite the many rearrangements found in the rice mitochondrial genomes, there is a high level of sequence conservation at the nucleotide level. The range of nucleotide substitution in the mitochondrial genes varies from 0.3 per 10 kb for N versus WA-CMS to 7.2 per 10 kb for Nipponbare versus 93-11 (Table IV). The fact that the lowest nucleotide substitution rate among the five genotypes exists between N and WA-CMS illustrates their close relationship. This is an unanticipated result, given that WA-CMS is thought to have originated from the O. rufipogon/Oryza nivara complex, the ancestor of cultivated rice in Asia (Li et al., 2005). Nevertheless, the WA-CMS lines have been developed from crosses between wild rice or traditional rice varieties (O. nivara, O. rufipogon, Oryza glaberrima, and O. sativa indica) as a maternal parent and early-matured indica rice varieties (such as Zhen-Shan 97) as the recurrent paternal lines (Li et al., 2007). Therefore, it is possible that N and WA-CMS mitochondrial genomes originate from the same species, O. sativa indica, explaining their observed low nucleotide substitution rate. The highest levels of nucleotide polymorphism in mitochondrial genes are found in comparisons with Nipponbare and are attributable largely to nad6; nad6 in Nipponbare exhibits 11 SNPs with all the other mitochondrial genomes (Table III). Even though some of the polymorphisms carried by the nad6 gene in Nipponbare are corrected by RNA editing (see above), the reason for this level of polymorphism is unclear.
A similar observation was made in maize, where the atp4 gene of CMS-T has 10 nucleotide substitutions relative to the other four cytotypes examined (Allen et al., 2007). Aside from nucleotide substitution rates in mitochondrial genes, which are rather high with Nipponbare or low between N and WA-CMS, the remaining comparative rates are very similar, 1.5 to 2.1 SNPs per 10 kb (Table IV). The nucleotide substitution rates calculated over the whole mitochondrial genome are higher than the rates in mitochondrial genes. The rate of SNPs per 10 kb varies from 1.8 between 93-11 and PA64S to 10.9 between 93-11 and Nipponbare (Table IV). There is generally a good agreement between the substitution rates calculated over the whole genome and those restricted to the mitochondrial genes; these rates for the most part fit the subspecies origin of the mitochondrial genomes. The second lowest nucleotide substitution rate across the whole genome, 4.7 SNPs per 10 kb, is found between N and WA-CMS (Table IV). N is more closely related to 93-11, another indica member, than to PA64S and Nipponbare, which belong to japonica rice subspecies, with substitution rates of 5.2, 6, and 7 SNPs per 10 kb, respectively (Table IV). Similarly, WA-CMS, an indica member, exhibits its highest nucleotide substitution rate in comparison with Nipponbare (10.4 SNPs per 10 kb; Table IV).
Table III. Substitutions and insertions/deletions within rice mitochondrial genes relative to Nipponbare. The position of the SNP is given relative to the start site of each protein-coding gene or mature rRNA. Each SNP is capitalized in the corresponding codon, with the amino acid change in parentheses (only one amino acid is given when the polymorphism is synonymous). I, Identical form to Nipponbare. The underlined C in nad6 at positions 474, 476, 562, and 567 have been reported to be edited in Nipponbare (Notsu et al., 2002).
The largest nucleotide substitution rate was found between Nipponbare and 93-11, which belong to the japonica and indica subspecies, respectively. The surprising result comes from the very low nucleotide substitution rate, 1.8 SNPs per 10 kb, observed between 93-11 and PA64S. PA64S has a composite background, having incorporated genetic materials from all three major cultivated rice subspecies, indica, japonica, and javanica, but its mitochondria are maternally inherited from a japonica ancestor (Tian et al., 2006). Accordingly, PA64S should be more closely related to Nipponbare than to 93-11, an expectation that is not supported by the corresponding nucleotide substitution rates of 9.3 and 1.8 SNPs per 10 kb, respectively (Table IV). In addition, the N and WA-CMS genomes, which show the lowest nucleotide substitution rate for mitochondrial genes, exhibit a substitution rate for the whole genome that is 2.6 times higher than the one found between 93-11 and PA64S.
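As an illustration of how a rate such as those in Table IV can be expressed, the following minimal sketch counts substitutions between two sequences that are assumed to be already aligned to the same length, skips alignment gaps, and normalizes the count to 10 kb. The function name and the gap convention are assumptions made for this example; they do not describe the software actually used in this study.

```python
def snps_per_10kb(seq_a: str, seq_b: str) -> float:
    """Substitutions per 10 kb between two pre-aligned, equal-length sequences.

    Columns containing a gap character ('-') in either sequence are treated
    as insertions/deletions and are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # indel column, not a substitution
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches * 10_000 / compared


# Toy example: two substitutions over 10,000 aligned positions.
reference = "A" * 10_000
variant = "C" + "A" * 9_998 + "G"
print(snps_per_10kb(reference, variant))
```

On this scale, the whole-genome rate between N and WA-CMS (4.7 SNPs per 10 kb) divided by that between 93-11 and PA64S (1.8 SNPs per 10 kb) gives the 2.6-fold ratio quoted above.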

ORFs Found in Rice Mitochondrial Genomes
We used Artemis software (Rutherford et al., 2000) to identify ORFs in the N and WA-CMS mitochondrial genomes. A threshold of 150 amino acids was chosen in order to be able to compare ORFs in the newly sequenced genomes with the ones previously annotated in Nipponbare (Notsu et al., 2002) and in 93-11 and PA64S (Tian et al., 2006). Notsu et al. (2002) reported 19 ORFs in Nipponbare, among which only 10 were found to be transcribed. We found more ORFs in N and WA-CMS than in Nipponbare, 31 and 19, respectively (Table V). Therefore, we decided to reassess the number of ORFs (more than 150 amino acids) in Nipponbare but also in 93-11 and PA64S, since for the latter two, the annotated ORFs were found by homology to Nipponbare. The number of missed ORFs amounts to 10, 13, and 13 for Nipponbare, 93-11, and PA64S, respectively (Table V). Compared with mitochondrial genes, the mitochondrial ORFs present a high level of variability: of the 35 different ORFs identified, only 17 are present in all mitochondrial genomes. The variability in mitochondrial ORFs can take several forms, either presence versus absence or, more often, a length polymorphism (Table V). For instance, orf194 present in Nipponbare, 93-11, and PA64S lies in a region that has been lost in N and WA-CMS; therefore, it is absent from these genomes (Table V). The most frequent cause of length polymorphism in ORFs is a frameshift, with generally an insertion (or deletion) of one nucleotide. The case of orf117 found in Nipponbare, N, and WA-CMS, while its homolog in 93-11 and PA64S is 159 amino acids long (Table V), is unique, because the frameshift is caused by the insertion in Nipponbare (or, conversely, by deletion in 93-11) of a four-nucleotide sequence. This four-nucleotide sequence is present in a tandem repeat in Nipponbare but only at one location in 93-11. The second most frequent origin of length polymorphism in ORFs is in-frame insertion/deletion (Table V).
For instance, nine nucleotides have been inserted in orf176 in Nipponbare and PA64S or, conversely, deleted in its homolog orf173 in N, WA-CMS, and 93-11. Rearrangement can also modify the length of an ORF by fusing on a new sequence; the occurrence of orf161c in N and orf161b in WA-CMS is caused by the rearrangements 2-3 and 4-12, respectively. The most unusual source of length polymorphism is nucleotide substitution: the Glu (GAA) at position 560 in orf682 found in 93-11 and PA64S is modified to a stop codon (TAA) in orf579 found in Nipponbare, N, and WA-CMS.
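The length threshold and the stop-codon example above can be illustrated with a minimal six-frame ORF scan. This is a sketch under stated assumptions, not a description of how Artemis identifies ORFs: an ORF is taken here to run from the first ATG encountered in a frame to the next in-frame stop codon of the standard genetic code, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 150):
    """Return (strand, start, end, aa_len) for every ATG-to-stop ORF.

    Coordinates are 0-based, end-exclusive, and given on the scanned strand.
    aa_len counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3
                    if aa_len >= min_aa:
                        hits.append((strand, start, i + 3, aa_len))
                    start = None
    return hits


# Toy sequences: a single GAA -> TAA substitution shortens the ORF,
# analogous to the orf682/orf579 length polymorphism.
long_form = "ATG" + "GAA" * 5 + "TAA"
short_form = "ATG" + "GAA" * 2 + "TAA" + "GAA" * 2 + "TAA"
print(find_orfs(long_form, min_aa=3))
print(find_orfs(short_form, min_aa=3))
```

With the default `min_aa=150`, the same function would apply the threshold used for the comparison in Table V; the toy sequences use a much lower threshold only so that the effect of the premature stop codon is visible.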
We compared the number of differences in the pool of ORFs for every pair of mitochondrial genomes in order to estimate their relationships. As with the whole-genome nucleotide substitution rates, the lowest number of differences was found between 93-11 and PA64S (Table V). Only one difference was detected between these two genomes; orf173b found in 93-11 possesses three additional amino acids in PA64S. The rest of the comparisons held a more homogeneous range of differences, from eight differences between N and WA-CMS to 15 differences between WA-CMS and PA64S (Table V). The second lowest number of different ORFs was found between N and WA-CMS, thus supporting the close relationship between these two mitochondrial genomes. The largest ORF found in mitochondrial genomes far exceeds the size of the largest mitochondrial gene; it is also one of the most variable in our sampling of rice mitochondrial genomes, since it is 871 amino acids long in Nipponbare, 1,054 amino acids long in N, 93-11, and PA64S, and 1,075 amino acids long in WA-CMS (Table V). This ORF is homologous to rpoB, a chloroplast RNA polymerase gene. This ORF lies on the largest plastid fragment that was incorporated into the rice mitochondrial genome (Notsu et al., 2002). In maize, the largest ORFs found in the mitochondrial genomes appear to be degenerate copies of the DNA polymerase or RNA polymerase genes derived from mitochondrial plasmids (Allen et al., 2007).

Candidate WA-CMS-Associated ORFs
An important goal of this work was to determine whether a candidate ORF could be identified that might cause the male sterility phenotype of the WA-CMS line. In maize, a comparison of the sequences of five maize mitochondrial genomes, two fertile and three sterile, allowed the identification of two of the three CMS-associated ORFs (Allen et al., 2007). Despite the fact that the identified CMS-associated ORFs do not share sequence homology, they exhibit a common set of features (for review, see Hanson and Bentolila, 2004). Known CMS-associated ORFs are often chimeric in structure, arising from recombination between legitimate mitochondrial genes and unknown ORFs or between different genuine mitochondrion-coding regions. In addition, many of the CMS-associated ORFs encode products that carry predicted transmembrane domains, and a number of the encoded proteins have been shown experimentally to be loosely associated with, or integrated into, the inner mitochondrial membrane.
We screened the WA-CMS mitochondrial genome for the presence of specific ORFs that were chimeric and whose products carry predicted transmembrane domains. Artemis software identified 104 ORFs (more than 100 amino acids) in the WA-CMS mitochondrial genome (Supplemental Table S2). The search for these ORFs in Nipponbare and N by TBLASTN, which compares a protein query against a nucleotide database translated in six frames, resulted in the finding of 15 ORFs among the 104 ORFs that were specific to the WA-CMS genome. These 15 WA-CMS-specific ORFs were either absent or present in a truncated form in Nipponbare and N (Supplemental Table S2). A homology search by BLASTN of these 15 ORFs, with a database composed of mitochondrial and plastid genes including rRNA and tRNA, led to the discovery of four chimeric ORFs, orf126, orf127, orf133, and orf200 (Supplemental Table S3). The products encoded by these chimeric ORFs were submitted to a search for transmembrane domains using the TMHMM server version 2.0 (Krogh et al., 2001). Among the four chimeric ORFs specific to WA-CMS, only orf126 encodes a product that possesses two predicted transmembrane domains (Supplemental Table S3).
The 96 amino acids on the N terminus of ORF126 are identical to the ones found in ORF284, which is present in all the mitochondrial genomes (Table V; Fig. 4). ORF126 and ORF284 share 10 common amino acids with rpl2 and 95 nucleotides in their promoter area upstream of the start codon (Fig. 4). The start codon for ORF126 and ORF284 was arbitrarily chosen to be the first encoded Met in the coding sequence, while the start codon for rpl2 is the second Met and is five codons downstream from the start codon of ORF126 and ORF284 (Fig. 4). The 30 amino acids on the C terminus of ORF126 are only found in this ORF. This sequence with the 121 nucleotides downstream of the stop codon is unique to the WA-CMS mitochondrial genome, as it has no detectable significant similarity with any sequence in the public database. It is noteworthy that this sequence is associated with the rearrangement 4-9 specific to the WA-CMS mitochondrial genome (Table I). The presence of the 30-amino acid unknown reading frame in the C terminus of ORF126 is responsible for the occurrence of a second predicted transmembrane domain, while ORF284 only possesses one predicted transmembrane domain (Fig. 4).
Because Boro-type CMS in rice is caused by orf79 (Akagi et al., 1994), we decided to repeat the same strategy developed to uncover orf126, with a lower size range for the identification of ORFs, from 50 to 100 amino acids. This new search identified 549 ORFs in WA-CMS, among which 62 were specific to this genome (data not shown). Among these 62 WA-CMS-specific ORFs, orf86d was the only one to be chimeric and to encode a product with a predicted transmembrane domain. ORF86D is homologous in its N terminus to the 44 amino acids of ORF61, which is found in Nipponbare and N (Fig. 4). A frameshift between orf86d and orf61 creates a predicted transmembrane domain only in the product encoded by the former (Fig. 4). orf86d possesses in its sequence a fragment 22 nucleotides long that is found in atp1, although in the antisense direction (Fig. 4). Of the two ORFs that are thus candidates for the WA-CMS-associated male sterility phenotype, we favor orf126 over orf86d as the WA-CMS-associated ORF, because it possesses part of the rpl2 promoter and thus is likely to be expressed.
We verified by reverse transcription (RT)-PCR that orf126 was expressed in rice plants carrying the WA-CMS mitochondrial genome both in inflorescence and leaf (Fig. 5, top panel). The presence of an RT-PCR product specific to orf126 correlates completely with the presence of the WA-CMS genome, which carries this orf (compare top and bottom panels in Fig. 5). In the two sets of lines we studied, orf126 was expressed in WA-CMS lines (Fig. 5, lanes 1 and 5) and in the hybrid, WA-CMS × restorer (Fig. 5, lanes 4 and 8). As expected, orf126 was not expressed in inflorescences of plants carrying the N mitochondrial genome. The absence of an orf126 RT-PCR product in restorer lines (Fig. 5, lanes 3 and 7) is not due to the presence of a nuclear restorer gene but to the absence of a WA-CMS mitochondrial genome in these lines, as illustrated by the PCR experiment (Fig. 5, bottom panel). An RNA-blot survey showed that the restorer lines possess a mitochondrial genome that is very similar to the N mitochondrial genome (data not shown).
Because plant mitochondrial genomes are nearly entirely transcribed at least at a basal level (Holec et al., 2006), we checked the steady-state level of orf126 transcript in the inflorescence of a WA-CMS rice plant in comparison with other mitochondrial genes. A quantitative RT-PCR experiment demonstrated that the orf126 steady-state level in WA-CMS inflorescence is similar to rps12, an essential gene coding for the small ribosomal protein subunit S12 (Fig. 6). In addition, the level of the orf126 transcript is significantly higher than orf187 and orf241, two ORFs that are conserved among the rice mitochondrial genomes but whose functions are unknown (Table V; Fig. 6).

Short Repeats and Rearrangements
A genome assembly with Nipponbare as a reference genome allowed us to detect overlapping fragments at the site of several rearrangements, referred to as contiguous here. Four of the nine rearrangements detected in the N mitochondrion and three of the 15 rearrangements in the WA-CMS mitochondrion are contiguous (Supplemental Table S1). The seven repeat sequences found in Nipponbare fragments that have merged in the N and WA-CMS mitochondria range from five to 39 nucleotides (Supplemental Table S1). We searched for small dispersed repeats (SDRs), defined as sequences of at least 25 bp (but less than 500 bp) that are present more than once in the genome, are at least 90% identical in sequence, and are of exactly the same length. Repeats were discovered using RepeatExtractor (Clifton et al., 2004). A total of 116 and 131 SDR families were found in the N and WA-CMS mitochondria, respectively (Supplemental Tables S4 and S5). Each family comprises at least two members and up to seven or eight members for WA-CMS and N, respectively. The largest family was similar in both genomes and contains repeats of 34 nucleotides (Supplemental Tables S4 and S5). The fraction of the genome covered by these SDRs is relatively small in both genomes, since it represents 13,594 bp (3.94%) of N and 16,406 bp (4.51%) of WA-CMS. Some of these SDR families could be arranged in superfamilies based on sequence homology (Supplemental Fig. S9). The SDRs were mapped on the Nipponbare mitochondrial genome and their coordinates compared with the break points of the rearrangements detected in N and WA-CMS. None of the SDRs in the N mitochondrion mapped close to or at the break points of the rearrangements observed in N. By contrast, several of the SDRs detected in the WA-CMS mitochondrion were located close to or at the break points of some of the rearrangements found in WA-CMS (Table VI). SDR67 is actually the repeat sequence found to overlap in the rearrangement 4-2 (Supplemental Tables S1 and S5).
This sequence is also found eight nucleotides away from the break point of rearrangement 4-1; similarly, four nucleotides separate SDR61 from the break point of rearrangement 4-9 (Table VI). SDR67 was the only SDR detected that is located integrally in both Nipponbare fragments that are involved in the rearrangement. Only parts of SDR46, which is 32 nucleotides long, map to the two Nipponbare break points that are merged in rearrangement 4-7 (Table VI). On the other hand, the complete sequences of SDR85 and SDR115 are found at the break point of only one of the Nipponbare fragments involved in rearrangements 4-3 and 4-8, respectively (Table VI).
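The comparison of SDR coordinates with rearrangement break points can be sketched as a simple interval-to-point distance check. The example below is only an illustration: the function name, the 10-nucleotide window, and all names and coordinates in the toy data are hypothetical and are not taken from Table VI.

```python
def sdrs_near_breakpoints(sdrs, breakpoints, max_gap=10):
    """List (sdr, rearrangement, gap) for SDRs at or near a break point.

    sdrs:        dict mapping an SDR name to (start, end), inclusive coordinates
    breakpoints: dict mapping a rearrangement name to a break-point coordinate
    gap:         0 when the break point lies inside the SDR, otherwise the
                 distance in nucleotides to the nearest SDR end
    """
    hits = []
    for sdr, (start, end) in sdrs.items():
        for rearrangement, pos in breakpoints.items():
            if start <= pos <= end:
                gap = 0
            else:
                gap = min(abs(pos - start), abs(pos - end))
            if gap <= max_gap:
                hits.append((sdr, rearrangement, gap))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical coordinates on a reference genome, for illustration only.
toy_sdrs = {"SDR_a": (100, 133), "SDR_b": (500, 531)}
toy_breakpoints = {"rearr_1": 141, "rearr_2": 520, "rearr_3": 900}
print(sdrs_near_breakpoints(toy_sdrs, toy_breakpoints))
```

In this toy case one repeat overlaps a break point (gap 0) and another ends eight nucleotides away from one, the two situations distinguished in the text.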

Plastid Sequences in Rice Mitochondrial Genomes
A search for plastid homologous sequences in the N, WA-CMS, and Nipponbare mitochondrial genomes was performed by using the Nipponbare chloroplast genome as a query in BLASTN. Most of the plastid fragments we uncovered in Nipponbare are similar to those reported by Notsu et al. (2002), except for four fragments ranging in size from 37 to 864 bp (Table VII). Four small fragments (less than 37 bp) reported in the previous study are absent from our results because of the default threshold used in our BLASTN search. The most striking aspect of this analysis is the very high conservation of plastid fragments across the rice mitochondrial genomes (Table VII). Fragment 7, at 272 bp long, was the only plastid sequence specific to N and WA-CMS that is absent from the Nipponbare mtDNA; all the other plastid sequences were common to the three mitochondrial genomes. These data strongly suggest that the transfer of plastid sequence to the rice mitochondrion predates the separation of the indica and japonica subspecies.
The largest plastid sequence is 6,749 bp and is found in the WA-CMS mitochondrion; this sequence is represented as two fragments in Nipponbare and N, due presumably to a similar deletion of 58 bp in both genomes. The most parsimonious explanation is the incorporation of the whole fragment in the ancestor mitochondrial genome, then a deletion in the Nipponbare and N mitochondria via a rearrangement. Interestingly, there is an imperfect direct short repeat, AGTTC upstream of the 58 bp and AGATTC downstream, that likely served as a recombination site in Nipponbare and N. The plastid sequences found in the rice mitochondrial genomes account for 22,440, 22,714, and 22,824 bp for Nipponbare, N, and WA-CMS, respectively, or 6.28%, 6.58%, and 6.27% of the corresponding genome complexities.

Rice Mitochondrial Sequences in the Nuclear Genome
A well-characterized example of the escape of DNA from the plant mitochondrion is the recent loss in evolutionary time of cox2, encoding the cytochrome oxidase subunit II (Nugent and Palmer, 1991). COXII is encoded by the mitochondrial genome in all flowering plants that have been analyzed except for mung bean (Vigna radiata) and cowpea (Vigna unguiculata), where cox2 is a nuclear gene. More recently, sequencing of Arabidopsis chromosome 2 revealed the presence of a stretch of 270 kb of sequence that is nearly identical to that of the Arabidopsis mitochondrial genome (Lin et al., 1999). Notsu et al.'s (2002) analysis of the Nipponbare mitochondrial genome reported 43 rice nuclear sequences that covered 13.4% (48,060 bp) of the mitochondrial genome. Because the rice nuclear sequence was not complete at the time of that previous study, we decided to reinvestigate the amount of rice nuclear sequence that shows homology to the different rice mitochondrial genomes. A BLAST search was performed using the Nipponbare nuclear sequence as the subject and the three mitochondrial sequences, Nipponbare, N, and WA-CMS, as queries. The first observation is the very similar and large amount of mitochondrial sequence originating from the three mitochondrial genomes present in the rice nuclear DNA; overall, 62% to 63% of the mitochondrial genomes are found in the nuclear genome (Table VIII). More interestingly, the mitochondrial sequences found in the rice nuclear genome are not evenly distributed along the chromosomes (Table VIII). Chromosome 12 has been the recipient of a massive invasion of mtDNA, as about half the mitochondrial genome lies on this particular chromosome. The level of similarity between the mitochondrial sequences and their homologs in the nucleus, referred to as nucleomitochondrial sequences hereafter, is very high, suggesting that the DNA transfer might have occurred recently in evolutionary time.
For instance, the largest nucleomitochondrial fragment found in chromosome 12 is 40,405 bp long when compared with the Nipponbare mitochondrial genome; only 45 SNPs are found between this fragment and its mitochondrial homolog, resulting in a 99.9% identity. Because of the mitochondrial genome rearrangements that we described above, the largest nucleomitochondrial fragment is 38,285 and 51,989 bp when compared with WA-CMS and N mtDNAs, respectively.
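The identity quoted for the largest nucleomitochondrial fragment follows directly from the two numbers given above. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the calculation assumes that the 45 SNPs are the only differences counted over the 40,405-bp fragment.

```python
def percent_identity(aligned_length: int, mismatches: int) -> float:
    """Percentage of identical positions over an ungapped alignment."""
    if aligned_length <= 0 or not 0 <= mismatches <= aligned_length:
        raise ValueError("invalid alignment length or mismatch count")
    return 100.0 * (aligned_length - mismatches) / aligned_length


# Largest chromosome 12 fragment versus its Nipponbare mitochondrial homolog.
identity = percent_identity(40_405, 45)
print(f"{identity:.2f}%")
```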
There has been extensive reshuffling of homologous sequences between the mitochondrial sequences and their nucleomitochondrial counterparts, explaining why the longest nucleomitochondrial fragment on chromosome 12 represents only one-fourth of the mitochondrial area covered by this chromosome. For instance, on chromosome 12 directly contiguous and upstream of the longest nucleomitochondrial fragment, whose homolog maps to coordinates 92K-52K on the Nipponbare mitochondrial genome, lies a fragment whose homolog maps to coordinates 317K-313K on the Nipponbare mitochondrial genome. A reorganization of the nucleomitochondrial fragment on chromosome 2 of Arabidopsis relative to its mitochondrial homolog was similarly observed (Lin et al., 1999).
Figure 6. Expression of mitochondrial genes in the inflorescence of a WA-CMS rice plant relative to orf126. The expression of the mitochondrial genes was measured by quantitative RT-PCR and normalized with orf126, the candidate WA-CMS-associated gene whose value was arbitrarily fixed to 100. The values are averages from three repeated measurements. * P < 0.05, ** P < 0.01.

Sequencing Is an Appropriate Technology to Assemble the Mitochondrial Genome When Coupled to a Reference Genome
The Roche/454 GS FLX utilizes pyrosequencing technology to produce millions of 200- to 400-nucleotide DNA-sequencing reads in a single run. This technology belongs to the second generation of sequencing technologies and enabled us to sequence two rice mitochondrial genomes at a rather modest cost. One of the mitochondrial genomes carries the gene responsible for a male sterility phenotype called WA-CMS. This particular CMS has been extensively used in China to produce hybrid rice, but little is known at the molecular level of the causative gene responsible for the CMS. The fertility restorer to WA-CMS, a nuclear suppressor gene able to revert the male sterility phenotype, has also not been identified. A conventional approach to identify CMS-associated ORFs has relied on northern screening, where transcript profiles of CMS and restored lines are compared and candidate genes for the CMS identified by their modified expression pattern in the presence of a restorer gene (for review, see Hanson and Bentolila, 2004). We tried this strategy by using a set of probes covering 35 protein-coding genes and 10 transcribed ORFs on northern blots containing inflorescence RNA from rice WA-CMS and restored lines, but we did not uncover any candidate for the CMS-associated gene (data not shown). On the other hand, a study in maize was able to identify two known CMS-associated ORFs by comparing the finished shotgun sequences of multiple mitochondrial genomes, two fertile and three male sterile (Allen et al., 2007). However, that study failed to uncover the ORF associated with one of the maize CMS types, CMS-C, which remains unidentified.
The sequencing strategy for the maize study was a whole-genome shotgun method relying on conventional technology and the previous establishment of maps generated from restriction mapping studies (Fauron et al., 1995). Our sequencing strategy, like that for maize, was a whole-genome shotgun, but we did not have any previous knowledge of the genomic architecture of the two newly sequenced genomes.
Instead, in our assembly effort, we relied on the previous release of the Nipponbare mitochondrial genome (Notsu et al., 2002). Following a set of logical rules, namely integrating all the reads and rearrangements detected with the Nipponbare genome, starting and finishing at the same point, and producing the smallest genome map, we succeeded in establishing a contiguous map for both newly sequenced genomes, N and WA-CMS (Figs. 2 and 3). Some alternative configurations were also produced for both genomes (Supplemental Figs. S6 and S7). In both representations, the N genome possesses three large direct repeats (greater than 70 kb), while the WA-CMS genome appears more compact, with only one large inverted repeat (greater than 30 kb). We attempted to test the validity of the large repeats we defined in our assembly by measuring the distribution of the reads along the genome. One of the tracks in the genome browser we developed represents a histogram of the frequency of the reads for each 10 or 100 nucleotides. However, the results of this analysis were inconclusive.
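The read-distribution track described above can be sketched as a binned depth profile. The example below is a minimal illustration under stated assumptions: reads are represented as half-open (start, end) intervals on the assembled map, the function name is hypothetical, and this is not the code behind the genome browser used in this study. In principle, a region belonging to a repeat present in two copies would be expected to show roughly twice the depth of single-copy sequence, although, as noted, the analysis here was inconclusive.

```python
def depth_histogram(read_spans, genome_len, bin_size=100):
    """Mean read depth per bin along a linear genome map.

    read_spans: iterable of (start, end) half-open, 0-based read intervals
    bin_size:   bin width in nucleotides (for example 10 or 100)
    """
    n_bins = -(-genome_len // bin_size)  # ceiling division
    covered = [0] * n_bins  # total read-covered bases falling in each bin
    for start, end in read_spans:
        for pos in range(max(start, 0), min(end, genome_len)):
            covered[pos // bin_size] += 1
    # The last bin may be shorter than bin_size.
    sizes = [min(bin_size, genome_len - i * bin_size) for i in range(n_bins)]
    return [c / s for c, s in zip(covered, sizes)]


# Toy map of 30 nucleotides: one read spans everything, a second read
# covers only the middle third, which therefore shows double depth.
print(depth_histogram([(0, 30), (10, 20)], genome_len=30, bin_size=10))
```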

The Evolution of the Rice Mitochondrial Genome Resembles That of Maize
Our data highlight several common features between the evolution of the rice and maize mitochondrial genomes. Like maize, the rice mitochondrial genomes experienced numerous rearrangements resulting in the interruption of syntenic regions when genomes are aligned. We detected nine rearrangements between N and Nipponbare and 15 between WA-CMS and Nipponbare. The largest contiguous sequence in the pairwise comparisons between Nipponbare, N, and WA-CMS is 111 kb and was found between Nipponbare and N. In maize, even between the two fertile cytotypes, NA and NB, 16 rearrangements occurred, resulting in the largest contiguous region being 161 kb long (Allen et al., 2007). We found evidence that some of the rearrangements were likely to be the result of recombination between short repeats. All the contiguous rearrangements detected in N and WA-CMS involved the presence of a repeat, from five to 39 nucleotides, at the site where the two Nipponbare fragments were joined (Supplemental Table S1). Likewise, approximately half of the rearrangement sites in maize have fragment ends that overlap, but with sizes that range from a single nucleotide to more than 4,500 bp (Allen et al., 2007). The involvement of short repeats in the rearrangement detected in the WA-CMS mitochondrion was also suggested by the mapping of some short dispersed repeats at or close to the rearrangement breaking points (Table VI). These repeats have been found in the genes that fused to produce the chimeric genes responsible for CMS in Petunia (Pruitt and Hanson, 1989) and in the nonchromosomal stripe mutations in maize (Newton et al., 1990). In addition to these phenotype-causing rearrangements, recombination at short repeats has also been reported to cause rearrangements between closely related genomes and the production of gene duplications and pseudogenes in mitochondrial genomes of normal plants. 
For instance, a decanucleotide repeat lying in the Oenothera rrn26 gene and 7.5 kb downstream was postulated to cause the presence of one of the mitochondrial circular molecules observed in this species (Manna and Brennicke, 1986). Our data strongly support the occurrence of sporadic recombination at short repeats as a way to generate rearrangements between rice mitochondrial genomes. In addition, our data reveal the presence of recombinationally active larger repeats in both N and WA-CMS mitochondrial genomes, resulting in near equimolar amounts of readthrough sequences colinear with the reference sequence and recombined sequences (Supplemental Fig. S3). The orientation of the recombination repeats found in the N and WA-CMS genomes, respectively direct and inverted, is predicted to give rise to subgenomes or to produce different isomers. These results corroborate the analysis of mitochondrial recombination in Arabidopsis, where large repeated sequences (greater than 1 kb) appear to mediate high-frequency recombination while intermediate repeated sequences (50-500 bp) display evidence of low frequency and asymmetric DNA exchange (for review, see Arrieta-Montiel and Mackenzie, 2011). This latter recombinational activity is controlled by the nuclear gene MSH1 and accounts for most of the diversity of the Arabidopsis mitochondrial genome observed in different accessions (Arrieta-Montiel et al., 2009). We also found numerous sequence gains and losses between Nipponbare, N, and WA-CMS and demonstrated that the majority of these sequences were in the vicinity of the rearrangements. All the Nipponbare missing sequences in N and WA-CMS were bordered by sequences that have rearranged (Supplemental Figs. S2 and S4). Conversely, 94% and 87% of the sequences absent in Nipponbare but present in WA-CMS and N, respectively, were found in the proximity of rearrangements (Table I). 
The proximity of sequence gains and losses with rearrangements evokes the model of plant mitochondrial genome evolution proposed by Small et al. (1989). In a slight variation of this model, an unusual subgenome resulting from rare recombination via a short repeated sequence recombines with a normal subgenome produced by active recombination via large repeated sequences. Recombination between these two subgenomes results in a larger genome that possesses a novel arrangement with a deletion and a duplication (Hanson and Folkerts, 1992). The amount of sequence gain and loss between Nipponbare, N, and WA-CMS, when quantified in relation to genome complexities, is in the same range as the maize cytotypes (Table II). We found a relationship between the level of complexity of a given genome and the amount of missing sequence of this genome in the other genomes. WA-CMS, which possesses the largest complexity, exhibits the highest amount of missing sequence in N and Nipponbare; conversely, N, with the lowest complexity, shows the lowest amount of missing sequence in the other genomes (Table II). Likewise, CMS-C, the maize cytotype with the lowest complexity, shows the least amount of missing sequence in the other maize cytotypes (Allen et al., 2007).
Since the first report by Stern and Lonsdale (1982) of a 12-kb piece of chloroplast DNA residing in the maize mitochondrial genome, the presence of chloroplast DNA in mitochondrial genomes has been documented in a number of species (e.g. rice, rapeseed [Brassica napus], maize, and Arabidopsis; Unseld et al., 1997; Notsu et al., 2002; Handa, 2003; Allen et al., 2007). Plastid-derived sequence is very highly conserved among the Nipponbare, N, and WA-CMS mitochondrial genomes. Only one fragment of 272 bp was present in N and WA-CMS but absent in Nipponbare; all the other plastid fragments were similar in the three mitochondrial genomes. This result is the only marked difference between rice and maize. In maize, the amount of plastid-originated sequence is much more variable among the cytotypes analyzed (Allen et al., 2007). Moreover, in the comparisons between the fertile and CMS maize mitochondrial genomes, a large proportion of the missing sequences are exogenous, originating either from the plastid genome or from mitochondrial plasmids. Most of the difference between NA and NB, the two fertile maize cytotypes, comes from the presumed loss in the NA genome of a 9.4-kb fragment of plastid origin (Allen et al., 2007).
Despite the rearrangements and sequence gains/losses, the nucleotide conservation is very high among rice mitochondrial genomes when homologous fragments are compared (Table IV). Only one SNP was detected between the N and WA-CMS mitochondrial genomes in rps1 (Table III), resulting in a rate of 0.3 SNP per 10 kb between these two genomes. This rate is identical to the one calculated between the fertile maize cytotypes NA and NB, which also exhibit only one SNP in the whole set of mitochondrial genes (Allen et al., 2007). When calculated over the whole genome, the rate of SNP per 10 kb is increased compared with the one restricted to the mitochondrial genes because of the selective pressure exerted on the genes (Table IV). The rate of nucleotide substitution over the whole rice genome shows generally higher values than the one reported for maize; as an illustration, the nucleotide substitution rate is 1.6 SNP per 10 kb between NA and NB, the closest cytotypes, and 7.1 SNPs per 10 kb between CMS-S and CMS-T, the most divergent maize cytotypes (Allen et al., 2007). This slight increase in rice nucleotide substitution rate might just reflect a difference in the sampling of the two studies, as the comparison in rice was made between cytotypes belonging to different subspecies, indica and japonica.
The presence at the substoichiometric level of rearrangements specific to one mitochondrial genome in the other genome was supported by a semiquantitative PCR amplification (Fig. 1). The term sublimons has been introduced to describe such low-abundance mtDNA molecules, whose existence was first demonstrated in maize (Small et al., 1987). Small et al. (1987) were able to detect, after prolonged exposure of Southern blots, the presence in the maize N genome of atpA types characteristic of either the CMS-S or CMS-T cytoplasm. It has been proposed that sublimons may be the substrates from which new plant mitochondrial genomic configurations are generated (Hanson and Folkerts, 1992). Our data support this model of mitochondrial evolution; the rearranged genomes carried by sublimons could experience a modification in their abundance and become fixed depending on the nuclear background. It has been reported that different types of novel DNAs in CMS-S revertants were obtained depending on the nuclear background (Small et al., 1988). In addition, the MSH1 nuclear gene in Arabidopsis has been shown to control mitochondrial substoichiometric shifting (Abdelnoor et al., 2003). Interestingly, the presence of a rearranged fragment in a msh mutant with a maternal distorted leaf phenotype was detected at the substoichiometric level in the wild plant (Sakamoto et al., 1996). It has been reported that tissue culture might induce substoichiometric shifting (i.e. that substoichiometric recombinant forms can become predominant; Kanazawa et al., 1994). However, rearrangements observed between the N, WA-CMS, and Nipponbare mitochondrial genomes were not induced by the cell suspension cultures used in this study, because all the tested rearrangements were detected in planta (Supplemental Fig. S5).

A Comparative Analysis of Newly Assembled Rice Mitochondrial Genomes with Previous Reports Suggests a Faulty Assembly for 93-11 and PA64S
A striking result of our analysis is the complete or nearly complete absence of sequence gain/loss between Nipponbare,  genomes were also found to be completely colinear with Nipponbare (Tian et al., 2006), supporting in a specious way our findings of a strong relationship between rearrangements and sequence gains and losses. Another surprising result of our analysis relates to the small number of reported SNPs between PA64S (japonica) and 93-11 (indica), which exhibit 65 nucleotide substitutions over the whole genome, or 1.8 SNP per 10 kb (Table IV). A similar observation was reported by Tian et al. (2006), the group that assembled both genomes, even though the authors counted 96 SNPs between 93-11 and PA64S instead of 65 SNPs. We believe that our count of SNPs is correct, as we used two different methods to count the number of SNPs, one through the use of BLASTN and the other with a procedure in the SeqMan software, and both resulted in the same number of SNPs, 65, between 93-11 and PA64S. The reported low substitution rate between 93-11 and PA64S seems incorrect, because a comparison between N and WA-CMS, the two closely related indica genomes newly assembled in this study, results in the finding of 159 SNPs (Table IV). The apparent closeness of the 93-11 and PA64S mitochondrial genomes was also reflected in a set of ORFs almost identical in the two genomes (Table V). We believe that these features are merely artifacts coming from a faulty assembly.
Previously, Tian et al. (2006) followed an original strategy to assemble the mitochondrial genomes of 93-11 and PA64S. These authors belong to the group that produced the whole-genome shotgun nuclear sequence of these two cultivars. From the very large amount of raw sequencing data, these authors extracted the mitochondrial sequences through the use of an in silico sieve by BLASTing the sequences against known rice mitochondrial genome sequences. Following this extraction, sequence reads were assembled into a contiguous sequence. There are two risks associated with this strategy. First, we have shown that around 60% of the mitochondrial genome has been transferred to the nucleus (Table VIII). Moreover, the similarity between mitochondrial sequences and their nuclear homologs, herein designated nucleomitochondrial sequences, is very high. For instance, the longest nucleomitochondrial sequence carried by chromosome 12 that is homologous to the Nipponbare mitochondrial genome is around 40 kb and exhibits only 45 SNPs with its mitochondrial homolog, resulting in a 99.9% similarity rate. It is thus impossible to distinguish between mitochondrial sequences and nucleomitochondrial sequences. Second, the configuration of the nucleomitochondrial sequence can differ markedly from the mitochondrial sequence. While the 40-kb nucleomitochondrial fragment, or more accurately its mitochondrial homolog, maps to coordinates 52K-92K, the contiguous nucleomitochondrial fragment directly upstream maps to coordinates 317K-313K. This intense reshuffling of rice mitochondrial sequences between the mitochondrial genome and the nucleus had already been reported (Fig. 2 in Notsu et al., 2002). Not only did the nucleomitochondrial sequences pass through the in silico screen, but their mere presence could blur the detection of rearrangements that are very likely to be present in 93-11 and PA64S. 
The stoichiometry between the nucleomitochondrial reads and the mitochondrial reads, however, should favor the overrepresentation of the latter because of the sheer number of mitochondria per plant cell. It is thus still uncertain why the published PA64S and 93-11 assemblies are so similar to the Nipponbare mitochondrial genome, showing no rearrangement between themselves or with Nipponbare. In addition to this genomic colinearity, the PA64S and 93-11 mitochondrial genomes are suspiciously similar, exhibiting a very low nucleotide substitution rate for genomes belonging to two different rice subspecies.
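The 99.9% similarity quoted above for the chromosome 12 fragment follows directly from the SNP count and the fragment length. The sketch below reproduces that arithmetic; the function names are assumptions made for illustration, and because the fragment length is given only as "around 40 kb", the values are approximate.

```python
def percent_identity(n_snps, aligned_bp):
    """Percentage of aligned positions that are identical."""
    return 100.0 * (1 - n_snps / aligned_bp)


def snps_per_10kb(n_snps, aligned_bp):
    """Substitution rate expressed as SNPs per 10 kb of aligned sequence."""
    return n_snps / aligned_bp * 10_000


# Values from the text: 45 SNPs over a fragment of about 40 kb.
identity = percent_identity(45, 40_000)  # about 99.89, i.e. 99.9% when rounded
rate = snps_per_10kb(45, 40_000)         # about 11 SNPs per 10 kb
```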
A recent study reported the global genomic reorganization of two rice CMS mitochondrial genomes (Fujii et al., 2010). Twelve and 15 rearrangements were detected between Nipponbare and LD-CMS and between Nipponbare and CW-CMS, respectively. Misled by the previous report on 93-11 and PA64S, the authors suggested that genomic rearrangements were dynamic in CMS lines in comparison with rice cultivars. Our results prove that rearrangements are not exclusive to rice CMS mitochondrial genomes but are likely to be widespread in rice, even among closely related genomes like N and WA-CMS.

Identification of orf126 as a Plausible Candidate for the WA-CMS Causative Gene
A large increase in rice yield has been achieved in part by using heterosis through the production of hybrid rice. It is estimated that the average yield of hybrid rice is at least 20% more than that of inbred rice, feeding 70 million more people annually. Seventy percent of the hybrid rice in China is derived from WA-CMS lines (Li et al., 2007). Prior to this report, little molecular data were available on this crucial agronomic trait.
We have identified orf126 as a putative candidate for the WA-CMS-associated gene. orf126 possesses all the characteristics associated with a CMS-causative gene. It is a chimeric ORF that has coopted a genuine promoter and is expressed in the rice inflorescence with a level of steady-state transcript similar to rps12 (Fig. 6). The encoded product is predicted to carry transmembrane domains, and part of its coding sequence is unique to the WA-CMS mitochondrial genome, as it has no homology to any known sequence in the database. The presence of the nuclear restorer gene does not impact the abundance of orf126 RNA in the rice inflorescence (Fig. 5). However, a precedent for fertility restoration that does not affect the CMS-associated RNA has been documented in the Ogura CMS in radish (Raphanus sativus; Uyttewaal et al., 2008). In situ hybridization experiments showed that the radish restorer had no effect on the accumulation of orf138 mRNA, the Ogura gene that specifies CMS, in young anthers of radish plants. Immunolocalization and immunoprecipitation experiments supported a role for the radish restorer in the translational regulation of the orf138 mRNA (Uyttewaal et al., 2008). More experiments are needed to confirm whether orf126 is the rice WA-CMS-associated gene.
Recently, an unedited orfB transcript specific to the WA-CMS cytoplasm was identified as a putative candidate for inducing the male sterility phenotype (Das et al., 2010). However, it should be noted that this longer and unedited version of the orfB transcript represents only a small fraction of the pool of orfB transcripts found in the sterile lines; the remainder of the transcript is similar in size and editing status to the fertile lines. In addition, only a limited number of segregating progeny (five sterile and 24 fertile plants) were used to demonstrate the cosegregation of editing of the longer orfB transcript and fertility restoration. It is possible, therefore, that editing of the long orfB transcript has no relevance to fertility restoration but happens to be controlled by a gene linked to the fertility-restorer gene.
Another search for a candidate WA-CMS-associated gene was conducted by investigating the mitochondrial genome of Zhen-Shan 97A, a WA-CMS line, through the use of primers based on the Nipponbare sequence (Liu et al., 2007). As shown in this study, there are numerous rearrangements between the Nipponbare and WA-CMS mitochondrial genomes, so that the coverage of the WA-CMS mitochondrial genome in the study by Liu et al. (2007) was incomplete. Indeed, by using the Nipponbare sequence for primer prediction, the authors were unable to amplify three fragments they expected to find.
We have used deep-sequencing technology and successfully assembled plant mitochondrial genomes based on reference sequences. Our data correct some previous misconceptions on rice mitochondrial genome variation; in addition, a candidate gene for the most widely used rice CMS cytoplasm has been identified.

Plant Material
Seeds from the rice (Oryza sativa) N and WA-CMS lines were collected at the Philippines Rice Research Institute and generously provided by Anthony Alfonso. The N cytoplasm line is IR6888B, and the WA-CMS line is the F1 resulting from the cross IR6888A (WA-CMS) × IR62161R (restorer). Calli were obtained from seeds according to the protocol followed by Garg et al. (2002). Briefly, mature seeds were dehusked and sterilized in 70% (v/v) ethanol for 2 to 3 min and then transferred into 50% (v/v) Clorox solution for 40 min with gentle shaking. The seeds were rinsed several times with sterile water. The sterilized seeds were then plated for callus induction on Murashige and Skoog medium (Sigma) supplemented with 3.0 mg L⁻¹ 2,4-dichlorophenoxyacetic acid, 0.2 mg L⁻¹ 6-benzylaminopurine, 300 mg L⁻¹ casein hydrolysate, 30 g L⁻¹ maltose, and 3.0 g L⁻¹ phytagel, pH 5.8, and grown for 21 d at 25°C in the dark. After this initial induction, calli were cut in half in sterile conditions every 2 weeks and replaced on calli-inducing fresh plates in the dark. When calli reached a critical size, they were transferred to liquid culture (same composition as the induction medium without phytagel) to get suspension cells.
Mitochondrial isolation from suspension cultures followed the protocol established for Petunia (Gillman et al., 2007). DNA extraction was performed by a cetyl-trimethyl-ammonium bromide protocol (Fulton et al., 1995). The steps followed to obtain a high-quality mtDNA are shown in Supplemental Figure S10.

Sequencing
Mitochondrial DNA from the maintainer, N cytoplasm, and from WA-CMS was sent to the University of Illinois Roy J. Carver Biotechnology Center (RJCBC) to be subjected to Roche/454 sequencing. One-fourth of a picotiter plate was occupied by both mtDNAs, which had been sheared to fragments 500 to 800 bp in length and ligated to adaptors for amplification and sequencing. Barcodes were added during the library preparation to allow multiplexing. A total of 76,886 reads were obtained for the N mitochondrion, totaling around 16 Mb (15,953,355 bp), averaging approximately 200 nucleotides per read. A total of 48,604 reads among the 76,886 reads matched the Nipponbare mitochondrial genome by BLAST search (e value < 1e-23), implying that around 60% of the isolated DNA was from mitochondrial origin for the N cytoplasm. Assuming as a first approximation that the length of the mitochondrial genome in this study is comparable to Nipponbare (490 kb; Notsu et al., 2002), we can estimate the coverage of N mitochondrion sequencing to be around 20× (48,604 × 0.2/490). For the WA-CMS mitochondrion, 85,885 reads were obtained, totaling around 18 Mb (17,673,762 bp), again averaging approximately 200 nucleotides per read. Among these reads, 34,088 reads matched the Nipponbare mitochondrial genome by BLAST search (e value < 1e-23), implying that in the case of the WA-CMS mitochondrion, 40% of the isolated DNA was from mitochondrial origin. Nevertheless, the sequencing coverage for the WA-CMS mitochondrion could be estimated to be 14× (34,088 × 0.2/490), which is acceptable for the purpose of assembly.
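The mitochondrial read fractions and coverage estimates given above can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (read counts, a mean read length of about 0.2 kb, and the 490-kb Nipponbare genome as a first approximation of genome size); the function names are assumptions made for illustration.

```python
def mito_fraction(matching_reads, total_reads):
    """Fraction of reads that match the reference mitochondrial genome."""
    return matching_reads / total_reads


def estimated_coverage(matching_reads, mean_read_kb, genome_kb):
    """Approximate fold coverage: total matching sequence / genome size."""
    return matching_reads * mean_read_kb / genome_kb


# N cytoplasm: 48,604 of 76,886 reads matched Nipponbare.
n_frac = mito_fraction(48_604, 76_886)         # roughly 0.63
n_cov = estimated_coverage(48_604, 0.2, 490)   # roughly 20-fold

# WA-CMS: 34,088 of 85,885 reads matched Nipponbare.
wa_frac = mito_fraction(34_088, 85_885)        # roughly 0.40
wa_cov = estimated_coverage(34_088, 0.2, 490)  # roughly 14-fold
```

Because both the read length and the genome size are approximations, these figures are order-of-magnitude estimates rather than exact depths.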

Genome Assembly
De novo assembly with Newbler was performed on the totality of the sequencing reads for each mitochondrial genome by the RJCBC, the facility that conducted the 454 sequencing. The largest contigs assembled by Newbler were 55 and 43 kb for the N and WA-CMS mitochondrial genomes, respectively. The presence of large repeats in the sequenced plant mitochondrial genomes (Notsu et al., 2002; Allen et al., 2007) clearly hampered the assembly of the reads into a contiguous sequence of a size comparable to the mitochondrial genome (490 kb) of Nipponbare, a rice cultivar belonging to the japonica subspecies whose mitochondrial genome was previously sequenced by conventional methods (Notsu et al., 2002).
Given these unsatisfactory results, we decided to adopt a reference sequence-based assembly by setting up a genome browser containing either the indica 93-11 (I) or the japonica Nipponbare (J) mitochondrion genome as a reference and tracks containing the individual reads and the contigs from N and WA-CMS (http://cbsuss03.tc.cornell.edu/cgi-bin/gbrowse/ricemt/; Supplemental Fig. S1). The coordinates when the reference genome is 93-11 (I) are very similar to the coordinates when the reference genome is Nipponbare (J), because the only difference between these two genomes is the presence of a 500-nucleotide insertion in 93-11 that is absent in Nipponbare, a point we discuss thoroughly in "Discussion." Several tracks can be selected at will by the user; one track displays the annotated transcripts, while algn2 and algn4 represent the Newbler contigs assembled by the RJCBC for the N and WA-CMS mtDNAs, respectively. Break2 and break4 represent the reads from N and WA-CMS, respectively, that map only partially at certain positions of the reference genome. We included tracks depicting contigs from the two genomes that we reassembled from the totality of the reads and that map to the reference genome with a different threshold by using BLAST (all contigs for 02 for N, all contigs for 04 for WA-CMS, at 10e-23, 10e-24, 10e-25). We also added the Newbler contigs for the two mitochondrial genomes that we reassembled from the totality of the reads and that map to the reference genome with different thresholds by using BLAST (strain02 | newbler contigs for N, strain04 | newbler contigs for WA-CMS, at 10e-23, 10e-25). The totality of the reads mapping to a specific position of the reference genome is accessible for both genomes (reads2 and reads4 for N and WA-CMS, respectively). Finally, histograms representing the frequency of the reads are provided on the tracks reads2 or reads4 frequency for N and WA-CMS, respectively.

Annotation
The Nipponbare mitochondrial genome annotation (Notsu et al., 2002) was used to infer the annotation for both indica mitochondrial genomes: tRNA, rRNA, and Nipponbare mitochondrial annotated genes were BLASTed against the two indica sequences. ORFs were identified using Artemis software (Rutherford et al., 2000), which allows the use of a threshold to identify ORFs. In addition to known genes, ORFs of at least 150 codons were annotated in both maintainer and WA-CMS mitochondrial genomes in order to be consistent with the annotated ORFs in Nipponbare. ORFs of at least 150 nucleotides (50 codons or more) were identified in the WA-CMS mitochondrial genome in order to identify candidate CMS-associated ORFs.
Chimeric ORFs were identified by BLASTing all nongenic ORFs against a database containing all the known organelle genes, rRNA, and tRNAs.
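The length threshold applied to ORFs can be illustrated with a simple six-frame scan. This is a simplified stand-in and not the algorithm used by Artemis: it assumes that an ORF runs from an ATG to the first in-frame stop codon and that the threshold counts sense codons only, both of which are assumptions made here. The function names and the toy sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=50):
    """Return (strand, frame, start, end, n_codons) for ATG..stop ORFs.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and includes the stop codon. n_codons excludes the stop.
    ORFs without an in-frame stop codon are not reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None:
                    if codon == "ATG":
                        orf_start = i
                elif codon in STOPS:
                    n_codons = (i - orf_start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, orf_start, i + 3, n_codons))
                    orf_start = None
    return orfs


# Toy sequence: one 6-codon ORF (ATG plus five GCT codons) and a stop.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
toy_orfs = find_orfs(toy, min_codons=6)
```

With the default `min_codons=50`, the scan corresponds to the 150-nucleotide threshold used to search for candidate CMS-associated ORFs; `min_codons=150` corresponds to the threshold used for consistency with the Nipponbare annotation.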

Transmembrane Domain Prediction
The prediction of transmembrane helices in proteins encoded by the ORFs was performed using the TMHMM server version 2.0 (Krogh et al., 2001).

Genome Analysis
First, the assembled genomes in this study, the WA-CMS and maintainer mitochondrial genomes, and the rice mitochondrial genomes already sequenced and available in the GenBank database, Nipponbare (Notsu et al., 2002), 93-11, and PA64S (Tian et al., 2006), were analyzed by BLAST comparisons in order to identify the duplicated sequences. Genome complexities defined as described (Allen et al., 2007; i.e. only one copy of each repeat [greater than 0.5 kb] is considered) were then determined for each mitochondrial genome. The alignments between each pair of mitochondrial genomes were performed on the genome complexities by using BLASTN. This procedure allowed us to identify the sequence gains and losses that occurred during rice mitochondrial evolution. Homologous segments for each pair of genomes analyzed were submitted to an analysis by BLAST and SeqMan software (DNASTAR) to determine the number of SNPs.
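The notion of genome complexity used here, in which only one copy of each repeat larger than 0.5 kb is counted, can be sketched as a subtraction of the redundant repeat copies from the total genome length. The example below is a minimal illustration under the assumption that repeat families do not overlap one another; the function name and the toy coordinates are hypothetical and do not correspond to any genome in this study.

```python
def genome_complexity(genome_length, repeat_families, min_repeat_bp=500):
    """Genome length after keeping a single copy of each large repeat.

    repeat_families: list of families, each a list of (start, end)
    coordinates (0-based, end-exclusive) for the copies of one repeat.
    Families whose longest copy is not greater than `min_repeat_bp`
    are ignored. Families are assumed not to overlap each other.
    """
    redundant = 0
    for copies in repeat_families:
        lengths = [end - start for start, end in copies]
        if not lengths or max(lengths) <= min_repeat_bp:
            continue
        # Keep the longest copy and subtract all the others.
        redundant += sum(lengths) - max(lengths)
    return genome_length - redundant


# Hypothetical genome of 500 kb with one 70-kb repeat in two copies
# and one 300-bp repeat that falls below the 0.5-kb threshold.
toy_families = [
    [(0, 70_000), (200_000, 270_000)],
    [(100, 400), (5_000, 5_300)],
]
toy_complexity = genome_complexity(500_000, toy_families)
```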

Data Representation
Linear representations of the WA-CMS and maintainer mitochondrial genomes illustrating the numerous rearrangements with Nipponbare were drawn to scale by hand.

Analysis of Short Repeats
SDRs were defined as sequences of at least 25 bp (but less than 500 bp) that are present more than once in the genome, are at least 90% identical in sequence, and are of exactly the same length. SDRs were identified by using RepeatExtractor, an in-house program developed in the Newton laboratory (Clifton et al., 2004).
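The SDR criteria above can be expressed as a short check on a pair of candidate sequences, together with a naive search for exact repeated seeds. This sketch is not RepeatExtractor and makes no claim about how that program works; the function names are assumptions, only the forward strand is scanned, and the seed search finds exact matches only, so it would serve at most as a starting point for finding repeats that are 90% identical.

```python
from collections import defaultdict


def is_sdr_pair(a, b, min_len=25, max_len=500, min_identity=0.90):
    """True if two sequences satisfy the SDR definition given in the text:
    same length, at least 25 bp, less than 500 bp, and >= 90% identical."""
    if len(a) != len(b):
        return False
    if not (min_len <= len(a) < max_len):
        return False
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a) >= min_identity


def exact_repeat_seeds(genome, k=25):
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Toy genome: a 25-nucleotide unit repeated twice around a short spacer.
unit = "ACGTTGCAAGCTTACGGATCCATGA"
toy_genome = unit + "TTTTT" + unit
seeds = exact_repeat_seeds(toy_genome, k=25)
```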

Real-Time Quantitative RT-PCR Conditions and Analysis
RNA extracted by a combination of TRIzol and the PureLink RNA minikit (Invitrogen) was cleared from contaminating trace amounts of DNA by using a turbo DNA-free kit (Ambion). Quantification of RNA was performed with a nanodrop spectrophotometer (Nanodrop Technologies). cDNA was produced with SuperScript III reverse transcriptase (Invitrogen) and random decamers (Ambion). Real-time PCRs were followed with a Bio-Rad MyiQ iCycler Single Color RT-PCR detection system using iQ SYBR Green Supermix, containing Taq polymerase, deoxyribonucleotide triphosphates, SYBR Green I, and buffers (Bio-Rad). Reactions were initiated by incubating the samples at 95°C for 3 min to activate Taq polymerase, followed by 40 cycles of 10 s at 95°C and 30 s at 55°C. Melting-curve analysis was performed starting at 55°C with stepwise temperature elevations of 0.5°C every 10 s to check for nonspecific products. PCR primer sequences used to amplify the different genes assayed are listed in Supplemental Table S6. Reactions contained 10 μL of 2× SYBR Green Master Mix reagent (Bio-Rad), 5 μL of cDNA (0.2 ng μL⁻¹), and 100 nM of each product-specific primer in a final volume of 20 μL. Data were analyzed using the MyiQ Software system (Bio-Rad); standard curves (from 0.01 to 10 ng with a 10× increment) were used to derive the amount of starting cDNA material for the different genes assayed in the inflorescences of a WA-CMS plant.
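Deriving a starting amount from a standard curve is a generic calculation: the threshold cycle (Ct) is regressed on the logarithm of the known input amounts, and the fitted line is inverted for an unknown sample. The sketch below illustrates that calculation with a 10-fold dilution series from 0.01 to 10 ng; it is not the MyiQ software's implementation, and the Ct values and function names are hypothetical.

```python
import math


def fit_standard_curve(amounts_ng, ct_values):
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting amount in ng."""
    return 10 ** ((ct - intercept) / slope)


# Dilution series as described in the text; Ct values are invented.
standards_ng = [0.01, 0.1, 1.0, 10.0]
toy_ct = [30.00, 26.68, 23.36, 20.04]

slope, intercept = fit_standard_curve(standards_ng, toy_ct)
unknown_ng = quantity_from_ct(25.02, slope, intercept)
```

A slope close to -3.32 cycles per 10-fold dilution, as in this invented series, corresponds to a doubling of product at each cycle.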
Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers JF281153 and JF281154.

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Snapshot of the genome browser.
Supplemental Figure S2. Gaps and rearrangements in the N mitochondrial genome.
Supplemental Figure S3. Example of a rearrangement bridging a sequence with no gap.
Supplemental Figure S4. Gaps and rearrangements in the WA-CMS mitochondrial genome.
Supplemental Figure S5. Rearrangements are present in the mitochondrial genome of rice plants and are not caused by tissue culture.
Supplemental Figure S6. Linear representation of an alternate conformation of the N mitochondrial genome.
Supplemental Figure S7. Linear representation of an alternate conformation of the WA-CMS mitochondrial genome.
Supplemental Figure S8. Alignment of COX3 proteins from different organisms surrounding the polymorphism detected in rice.
Supplemental Figure S9. Examples of hyperfamilies of small dispersed repeats found in N and WA-CMS mitochondrial genomes.
Supplemental Figure S10. Flowchart illustrating how high-quality rice mitochondrial DNA was obtained.
Supplemental Table S1. Coordinates of the rearrangements and the gaps found in N and WA-CMS mitochondrial genomes relative to Nipponbare.
Supplemental Table S3. ORFs (>100 aa) specific to WA-CMS that might cause the male sterility phenotype.
Supplemental Table S4. Small dispersed repeats in N mitochondrion.
Supplemental Table S5. Small dispersed repeats in WA-CMS mitochondrion.
Supplemental Table S6. Primers used in this study.