Conservation of plastid sequences in the plant nuclear genome for millions of years facilitates endosymbiotic evolution.

The nuclear genome of eukaryotes contains large amounts of cytoplasmic organelle DNA (nuclear integrants of organelle DNA [norgs]). The recent sequencing of many mitochondrial and chloroplast genomes has made it possible to investigate the potential role of norgs in endosymbiotic evolution. In this article, we describe a new polymerase chain reaction-based method that allows the identification and evolutionary study of recent and older norgs in a range of eukaryotes. We tested this method in the genus Nicotiana and obtained sequences from seven nuclear integrants of plastid DNA (nupts) totaling 25 kb in length. These nupts were estimated to have been transferred 0.033 to 5.81 million years ago. Comparison of the spectrum of mutations in the potential protein-coding sequences with that in the noncoding sequences of each nupt revealed that nupts evolve in a nuclear-specific manner and are evolving neutrally. Indels were more frequent in noncoding regions than in potential coding sequences of former chloroplastic DNA, most probably because noncoding regions contain more homopolymeric sequences. Unexpectedly, some potential protein-coding sequences within the nupts still contained intact open reading frames after up to 5.81 million years in the nucleus. These results suggest that chloroplast genes transferred to the nucleus have, in some cases, several million years to acquire nuclear regulatory elements and become functional. The different factors influencing this time frame and the potential role of nupts in endosymbiotic gene transfer are discussed.

More than a billion years ago, the ancestors of mitochondria and plastids were free-living eubacteria that were sequentially engulfed by a precursor of the nucleated cell (Timmis et al., 2004). Since these two separate endosymbiotic events, organelle DNA has been continuously transferred and integrated into the nucleus. Insertions of organelle DNA are referred to as numts (nuclear integrants of mitochondrial DNA; Lopez et al., 1994) and nupts (nuclear integrants of plastid DNA; Timmis et al., 2004), or collectively as norgs (nuclear integrants of organelle DNA; Leister, 2005). Transfer of mitochondrial and plastid DNA to the nucleus has been shown to occur at a very high frequency (Thorsness and Fox, 1990; Huang et al., 2003; Stegemann et al., 2003; Sheppard et al., 2008). Several molecular mechanisms involved in the integration of mitochondrial DNA into the nucleus have been suggested and are likely to be similar for nupts.
These include the degradation and lysis of mitochondria, the encapsulation of mitochondrial DNA inside the nucleus, the direct physical association between mitochondria and the nucleus with membrane fusions, or the entry and incorporation of mitochondrial DNA into nuclear chromosomes (for review, see Hazkani-Covo et al., 2010). During evolution, this DNA transfer and integration into the nucleus have resulted in the functional relocation of many organelle genes to the nucleus, leading to a massive reduction in the size of organelle genomes compared with those of their free-living prokaryotic ancestors. Although DNA transfer is frequent, functional transfer of organelle genes is very rare, since expression requires the acquisition of nuclear gene regulatory elements and, usually, a targeting peptide if the protein is to replace the organellar equivalent (Stegemann and Bock, 2006; Lloyd and Timmis, 2011). In addition, insertions of organelle DNA have created new genes and nuclear exons encoding parts of novel proteins (Noutsos et al., 2007; Kleine et al., 2009) and novel putative nuclear gene regulatory elements (Knoop and Brennicke, 1991). Analysis of the Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), Chlamydomonas reinhardtii, and Cyanidioschyzon merolae proteomes suggests that about 14% of nuclear genes are of plastid origin (Deusch et al., 2008).
A detailed study of how nupts evolve after their integration into the nucleus is necessary to provide a deeper understanding of the mechanisms by which chloroplast genes become functional in the nucleus.
Studies of the evolution of nupts and numts have been possible for only a few seed plants whose nuclear and organelle genomes have been sequenced (Shahmuradov et al., 2003; Richly and Leister, 2004; Huang et al., 2005; Matsuo et al., 2005; Noutsos et al., 2005; Guo et al., 2008). These studies were based on sequenced genomes from which contiguous norg sequences may have been excluded from shotgun assemblies because they are difficult or impossible to distinguish from contaminating chloroplast or mitochondrial DNA. In addition, unequivocal studies of norg evolution were hampered in earlier work by the paucity of available plastome and mitochondrial genome sequences. Here, we describe a PCR-derived method that amplifies recent and older norgs while avoiding amplification of the high-copy-number organellar genomes present in total cellular DNA. The method requires the organellar genome sequences of several closely related species, from which primers can be designed to specifically target norgs.
The method was tested in the genus Nicotiana and allowed the isolation of 25 kb of sequence from seven nupts. The origins of the nupts were determined by phylogenetic analyses, and their ages were estimated by comparing each nupt with the related chloroplastic sequences from the native plastome and six other plastomes from other Solanaceae species (Shinozaki et al., 1986; Schmitz-Linneweber et al., 2002; Chung et al., 2006; Daniell et al., 2006; Kahlau et al., 2006; Yukawa et al., 2006). The evolutionary forces acting on the potential protein-coding and noncoding nupt sequences were also analyzed. Finally, we discuss the potential evolutionary role of recent and older nupts (several million years old) containing intact open reading frames (ORFs) with respect to the time frame available for activation in the nucleus.

Specific Amplification of nupt Sequences in Nicotiana
The study of the evolutionary fate and dynamics of norg sequences requires comparison of nuclear sequences with the native cytoplasmic organelle (mitochondrion or chloroplast) genome and with those of closely related species (Hazkani-Covo et al., 2003; Hazkani-Covo et al., 2010). Currently, the plastome sequences of more than a hundred angiosperms are available, most of them from species of the families Brassicaceae, Fabaceae, Poaceae, and Solanaceae. The genus Nicotiana was chosen for this study because DNA transfer from the chloroplast to the nucleus is known to occur at a very high frequency in Nicotiana tabacum (Huang et al., 2003; Stegemann and Bock, 2006) and because the plastome sequences of three Nicotiana spp. and four closely related Solanaceae species are available.
Comparison of plastome sequences from several closely related species allows identification of recent substitutions and indels. Placing a PCR primer in a region recently deleted from the plastome enables specific targeting of nupts transferred before the deletion, because the primer cannot anneal to the high-copy-number plastomes in total cellular DNA. We identified deletion events larger than 20 bp that were unique to the chloroplast genome of Nicotiana tomentosiformis by aligning the plastome sequences of Nicotiana sylvestris, N. tomentosiformis, N. tabacum, Atropa belladonna, and Solanum lycopersicum. Similarly, specific deletions larger than 20 bp were identified in the plastome of N. sylvestris. These deletions were also present in the N. tabacum plastome. The two species share the same deletions relative to other Solanaceae because N. sylvestris was the maternal diploid parent of the allotetraploid N. tabacum, which was formed <0.2 million years ago (Mya; Clarkson et al., 2005), and the plastomes of these two species differ by only nine nucleotides (Yukawa et al., 2006). These comparisons revealed seven deletions unique to the plastome of N. tomentosiformis (five in the large single-copy [LSC] region and two in the inverted repeat [IR] region) and two that characterized the plastomes of N. sylvestris and N. tabacum (one in the LSC and one in the IR region; Fig. 1A). All of these deletions were flanked by direct repeats of 2 to 24 bp (Fig. 1B). The presence of plastomic deletion events unique to N. sylvestris or to N. tomentosiformis indicates that they occurred after the divergence of these two species about 4.5 to 11 Mya (Wikström et al., 2001; Clarkson et al., 2005). We assumed that, before the deletions occurred, these regions could have been transferred from the chloroplast to the nucleus, where they may still be found.
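The deletion screen described above can be sketched as a scan over a multiple alignment. The Python below is a minimal illustration, not the authors' pipeline. It assumes the plastomes are already aligned as equal-length strings with '-' for gaps. The function name `unique_deletions` and the toy sequence labels are hypothetical. It reports runs of more than 20 alignment columns in which only the target plastome has a gap.

```python
def unique_deletions(alignment, target, min_len=21):
    """Return (start, end) alignment-column ranges (0-based, end-exclusive)
    where `target` has a run of at least `min_len` gap columns while every
    other sequence has a base in each of those columns.

    `alignment` maps sequence names to equal-length aligned strings ('-' = gap).
    min_len=21 encodes "larger than 20 bp".
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    ncol = lengths.pop()
    tgt = alignment[target]
    others = [s for name, s in alignment.items() if name != target]

    def gap_only_in_target(col):
        return tgt[col] == "-" and all(s[col] != "-" for s in others)

    hits, start = [], None
    # Iterate one column past the end so that a run touching the end is closed.
    for col in range(ncol + 1):
        if col < ncol and gap_only_in_target(col):
            if start is None:
                start = col
        elif start is not None:
            if col - start >= min_len:
                hits.append((start, col))
            start = None
    return hits


# Toy alignment (hypothetical sequences): a 25-column gap unique to one plastome.
toy = {
    "N_tomentosiformis": "ACGT" + "-" * 25 + "ACGT",
    "N_sylvestris": "ACGT" + "A" * 25 + "ACGT",
    "S_lycopersicum": "ACGT" + "C" * 25 + "ACGT",
}
print(unique_deletions(toy, "N_tomentosiformis"))  # [(4, 29)]
```

Requiring every other sequence to carry bases is a strict simplification. Real alignments may need tolerance for shared gaps or ambiguous bases.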
We targeted these hypothetical nupts by positioning one primer of each pair within a deleted region (Fig. 2), so as to avoid amplifying the closely related chloroplast sequences. Five deletions (four in N. tomentosiformis and one in N. tabacum) permitted suitable primers to be designed, whereas four were unsuitable because similar sequences were present in the plastome or because of high AT content.
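The two rejection criteria mentioned here, a similar site in the plastome and high AT content, can be sketched as a simple primer filter. This is a hypothetical illustration. The thresholds (`min_mismatch=4`, `max_at=0.65`) are assumptions chosen for the example, not values reported in this study. A real screen would also check melting temperature, self-complementarity, and the junction of the circular plastome.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def at_fraction(seq):
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def closest_match_mismatches(primer, genome):
    """Smallest Hamming distance between the primer (either strand) and any
    same-length window of `genome` (treated here as a linear sequence)."""
    genome = genome.upper()
    k = len(primer)
    best = k
    for probe in (primer.upper(), reverse_complement(primer)):
        for i in range(len(genome) - k + 1):
            mism = sum(a != b for a, b in zip(probe, genome[i:i + k]))
            best = min(best, mism)
    return best


def primer_is_suitable(primer, plastomes, min_mismatch=4, max_at=0.65):
    """Hypothetical filter: reject AT-rich primers and primers with a close match
    in any plastome, since such a site would let the plastome outcompete the nupt."""
    if at_fraction(primer) > max_at:
        return False
    return all(closest_match_mismatches(primer, g) >= min_mismatch for g in plastomes)
```

The brute-force window scan is chosen for clarity. For whole 155-kb plastomes, an indexed aligner would be the practical choice.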
To validate the method, nupt sequences were amplified from total cellular DNA of N. sylvestris, N. tabacum, and N. tomentosiformis after digestion with the restriction enzyme HpaII, and the products were compared with those from undigested templates (Fig. 2B). HpaII cuts unmethylated plastid DNA, whereas the methylated nuclear genome remains essentially intact (Ayliffe et al., 1998). If the PCR specifically amplifies nupts, the amplified fragment should not become fainter when HpaII-digested DNA is used as template. Four of the primer pairs (corresponding to three different plastomic deletions) permitted specific amplification of four nupt regions, three in N. tomentosiformis and one in N. tabacum (Fig. 3). The length of each nupt was explored by extending the distance to the second primer in 500-bp steps until no product of the expected size was obtained.

Evolution of Nicotiana nupt Sequences
Five nupts of 2 to 5 kb, corresponding to three different chloroplastic regions, were amplified in N. tomentosiformis, and two 4-kb nupts corresponding to the same chloroplastic region were amplified in N. tabacum (Table I), totaling about 25 kb of nuclear sequence similar to the chloroplastic LSC region. Although no major rearrangements were present, the nupt sequences showed substitutions and short indels compared with the plastome. Transitions were 2.5 times more frequent than transversions, and 62% of the mutations were G:C/A:T transitions (Fig. 4). These results are similar to the transition/transversion ratio of 2.4 and the 59% of G:C/A:T transitions observed for de novo spontaneous mutations in the Arabidopsis nuclear genome (Ossowski et al., 2010). The proportions of G:C/A:T transitions in the noncoding sequences (45.5%) and in the potential protein-coding sequences (55.5%) were relatively similar. The spectrum of mutations observed in each nupt is consistent with most of the changes having occurred in the nucleus rather than in the plastid.
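The pooled substitution classes of Fig. 4 and the transition/transversion ratio can be computed from aligned plastome/nupt base pairs. The Python below is a minimal sketch, assuming the input is a list of (plastome base, nupt base) pairs at substituted sites. It follows the convention of pooling complementary changes (A->G with T->C, and so on). The class labels such as "G:C>A:T" and the function names are hypothetical.

```python
from collections import Counter

# Complement table used to pool complementary changes (e.g. A->G with T->C).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PAIR = {"A": "A:T", "T": "T:A", "G": "G:C", "C": "C:G"}
TRANSITIONS = {"A:T>G:C", "G:C>A:T"}


def substitution_class(cp_base, nupt_base):
    """Pooled class of a plastome -> nupt change, e.g. ('C', 'T') -> 'G:C>A:T'."""
    cp, nu = cp_base.upper(), nupt_base.upper()
    if cp == nu or cp not in PAIR or nu not in PAIR:
        raise ValueError(f"not a single-base substitution: {cp_base}->{nupt_base}")
    # Report every change on the strand where the plastome base is A or G.
    if cp in "TC":
        cp, nu = cp.translate(COMPLEMENT), nu.translate(COMPLEMENT)
    return f"{PAIR[cp]}>{PAIR[nu]}"


def mutation_spectrum(pairs):
    """Counts per pooled class (six possible) and the transition/transversion ratio."""
    counts = Counter(substitution_class(cp, nu) for cp, nu in pairs)
    ts = sum(counts[c] for c in TRANSITIONS)
    tv = sum(counts.values()) - ts
    ratio = ts / tv if tv else float("inf")
    return counts, ratio


counts, ratio = mutation_spectrum([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "A")])
print(ratio)  # 1.5
```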
The 25 kb of nupt sequence contained 32 deletions and 15 insertions compared with the respective plastomes (Table I). Six of the 32 deletions were larger than 5 bp, and all but one of these were flanked by direct repeats of 2 to 8 bp (Fig. 5), suggesting that replication slippage was responsible (Bzymek and Lovett, 2001). The other 26 deletions mainly (77%) involved single nucleotides within homopolymeric stretches of two to nine nucleotides. We also observed 15 short insertions of 1 to 3 bp. After weighting by the lengths of the noncoding and potential coding sequences, deletions and insertions occurred more frequently in noncoding sequences (78% of deletions and 79% of insertions) than in potential protein-coding sequences (22% and 21%, respectively). In view of this result, the evolution of potential protein-coding and noncoding sequences was examined for any evidence of purifying selection (Table I). Both the potential protein-coding and the noncoding sequences evolved at the rate expected for random mutations.
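One way to read "after weighting by the lengths" is to convert event counts into per-base-pair rates before computing each category's share. The sketch below assumes that reading, which the text does not spell out. The function name and the example numbers are hypothetical, not the Table I values.

```python
def length_weighted_shares(events_noncoding, bp_noncoding, events_coding, bp_coding):
    """Return (% noncoding, % coding) after converting counts to per-bp rates,
    so that a longer category is not credited with more events simply for its size."""
    if bp_noncoding <= 0 or bp_coding <= 0:
        raise ValueError("region lengths must be positive")
    rate_nc = events_noncoding / bp_noncoding
    rate_cds = events_coding / bp_coding
    total = rate_nc + rate_cds
    if total == 0:
        return 0.0, 0.0
    return 100 * rate_nc / total, 100 * rate_cds / total


# Hypothetical counts: 3 deletions in 1,000 bp noncoding vs 3 in 3,000 bp coding.
print(length_weighted_shares(3, 1000, 3, 3000))  # roughly (75.0, 25.0)
```

With equal raw counts, the shorter noncoding category receives the larger share because its per-bp rate is three times higher.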
Phylogenetic Relationships between Each nupt and Their Related Solanaceae Chloroplastic Sequences and Estimation of the Age of the Chloroplastic-to-Nucleus Transfer Events
Each nupt was aligned with related chloroplastic sequences from seven Solanaceae species, and the origin of the chloroplast-to-nucleus transfer events was determined by phylogenetic analyses using the maximum likelihood and neighbor-joining methods, with Olea europaea as the outgroup (Fig. 6; Supplemental Figs. S2-S7). In each phylogenetic tree, the sequences were divided into two clades. One clade always contained the chloroplastic sequences from species belonging to the Solanoideae subfamily (except A. belladonna in Supplemental Fig. S5), while the other contained the chloroplastic sequences from the Nicotianoideae subfamily and the nupt sequence. The Nicotianoideae clade was always divided into two sister subclades. One subclade was composed of the N. sylvestris and N. tabacum chloroplastic sequences, and the other always contained the N. tomentosiformis chloroplastic sequence and the nupt sequence, indicating that the nupt was transferred to the nucleus after the divergence of N. tomentosiformis and N. sylvestris but before the diagnostic deletions occurred in the plastome. Using the spontaneous mutation rate of the Arabidopsis nuclear genome (Ossowski et al., 2010), these nupt sequences were estimated to have been inserted into the nucleus between 1.06 and 5.81 Mya (Table II). These results agree with molecular clock analyses predicting that N. tomentosiformis and N. sylvestris diverged 4.5 to 11 Mya (Wikström et al., 2001; Clarkson et al., 2005). Similarly, we estimate that the two nupt sequences obtained from N. tabacum were transferred to the nucleus 0.033 and 4.86 Mya in an ancestor of N. tomentosiformis, after the divergence of the N. tomentosiformis and N. sylvestris lineages.
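A mutation-accumulation age estimate of this kind reduces to dividing the substitutions assigned to the nuclear copy by the product of aligned length and per-site rate. The Python below is a sketch under stated assumptions, not a reproduction of the authors' calculation. It assumes a strict clock, uses the Arabidopsis rate of about 7 × 10^-9 substitutions per site per generation (Ossowski et al., 2010), and assumes one generation per year. It also assumes that plastid-lineage changes have already been excluded using the phylogeny. The function name is hypothetical.

```python
ARABIDOPSIS_MU = 7e-9  # base substitutions per site per generation (Ossowski et al., 2010)


def nupt_age_years(nuclear_substitutions, aligned_bp, mu=ARABIDOPSIS_MU,
                   generation_time_years=1.0):
    """Estimate the time since a plastid-to-nucleus transfer.

    generations = k / (L * mu), where k counts substitutions assigned to the
    nuclear copy after transfer and L is the aligned length. Only the nupt
    lineage is assumed to accumulate these changes, so there is no factor of 2.
    """
    if aligned_bp <= 0 or mu <= 0:
        raise ValueError("length and rate must be positive")
    generations = nuclear_substitutions / (aligned_bp * mu)
    return generations * generation_time_years


# Hypothetical example: 30 nuclear-specific substitutions over 4,000 bp.
print(nupt_age_years(30, 4000))  # about 1.07 million years
```

Substituting a different per-site rate or generation time rescales every estimate proportionally. The ages are therefore only as reliable as the assumption that the Arabidopsis nuclear rate applies to Nicotiana.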
The closer relationship of these nupts to the N. tomentosiformis chloroplast sequence than to the N. tabacum and N. sylvestris chloroplast sequences indicates that they were transferred before the polyploidization event (<0.2 Mya) that led to the formation of N. tabacum (Clarkson et al., 2005). The absence of shared nuclear substitutions (type 1) in nupts derived from the same chloroplastic region (Table II; nupts 1a and 1b in N. tabacum, or nupts 4a and 4b in N. tomentosiformis) implies that these nupts arose from two independent transfer events, rather than from one transfer event followed by an early duplication in the nuclear genome and different mutation rates in the two nupt regions.
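This inference, like the orthology test for nupt 1a later in the Results, rests on sorting nupt-versus-plastome differences into those shared by two nupts and those unique to each. The Python below is a minimal sketch. It assumes the plastome and both nupts are already aligned to equal length and skips gapped columns. The function name is hypothetical.

```python
def partition_substitutions(plastome, nupt_a, nupt_b):
    """Return (shared, only_a, only_b): aligned sites where both nupts carry the
    same non-plastome base, and sites where only one nupt differs (or the two
    differ from the plastome with different bases). Gapped columns are skipped."""
    if not (len(plastome) == len(nupt_a) == len(nupt_b)):
        raise ValueError("sequences must be aligned to equal length")
    shared = only_a = only_b = 0
    for cp, a, b in zip(plastome.upper(), nupt_a.upper(), nupt_b.upper()):
        if "-" in (cp, a, b):
            continue
        diff_a, diff_b = a != cp, b != cp
        if diff_a and diff_b and a == b:
            shared += 1  # same derived base in both copies
        else:
            only_a += int(diff_a)
            only_b += int(diff_b)
    return shared, only_a, only_b


print(partition_substitutions("ACGTACGT", "ACGAACGC", "ACGAACTT"))  # (1, 1, 1)
```

Under this reading, many shared derived substitutions (such as the 139 shared by the two nupt 1a copies) point to a single insertion inherited by both. A lack of shared substitutions, as for nupts 1a and 1b, points to independent transfers.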

Evolution of Potential Protein-Coding nupt Sequences
Figure 2. Example of the method used to specifically amplify nupts in N. tomentosiformis. A, Multiple alignment of part of the plastomes of N. tomentosiformis (NC_007602), N. tabacum (NC_001879), N. sylvestris (NC_007500), A. belladonna (NC_004561), and S. lycopersicum (NC_007898) using Geneious (http://www.geneious.com). This alignment shows a 47-bp deletion unique to the N. tomentosiformis plastome, absent from the other Solanaceae plastomes. Assuming this region was present in the plastome of an ancestor of N. tomentosiformis, we searched the extant nuclear genome of this species for a nupt transferred before the deletion event. Positioning one primer (here primer 4R) within this deleted chloroplast region avoids amplification of the high-copy-number plastomes and favors amplification of a nupt in N. tomentosiformis that was transferred to the nucleus before the deletion occurred in the chloroplast genome. The other primer was designed to an adjacent sequence that is present in all Solanaceae plastomes. B, Verification of the specific amplification of a nupt sequence in N. tomentosiformis using primers 4.2F and 4R shown in A. Twenty nanograms of N. sylvestris, N. tabacum, and N. tomentosiformis genomic DNA, with or without HpaII digestion, was used as template for PCR amplification. HpaII is a methylation-sensitive enzyme that digests the unmethylated plastome but not the methylated nuclear genome. A fainter band was obtained when N. sylvestris and N. tabacum genomic DNA was restricted with HpaII than with undigested genomic DNA. This was expected because this primer pair amplifies the plastome in N. sylvestris and N. tabacum. By contrast, the band obtained from HpaII-restricted N. tomentosiformis genomic DNA was stronger than that obtained from the undigested sample, suggesting specific amplification of a nupt sequence. -ve, no-template control.
The five nupts obtained in N. tabacum and N. tomentosiformis contained 13 protein-coding regions from eight different plastid genes (atpB, atpE, matK, ndhC, ndhJ, ndhK, psbA, and psbB). Within the nupts, eight of the 13 reading frames contained premature stop codons, caused mainly by nucleotide substitutions but also by deletions leading to frameshifts (Table II). Two other reading frames (matK and psbA) in the 0.033-million-year-old nupt 1b were identical to the plastome sequences. The reading frame was also maintained in the remaining three protein-coding sequences, corresponding to the atpE (nupt 3) and psbB (nupts 4a and 4b) genes. These sequences were present in older nupts inserted between 1.06 and 5.81 Mya, but they differ from the plastome sequences by nonsynonymous substitutions (NSSs). Even though it is very unlikely that any of these five protein-coding genes is functional, they might be weakly transcribed, since the psbA chloroplast promoter has weak nuclear activity that becomes strong if the gene is present in multiple copies (Cornelissen and Vandewiele, 1989; Stegemann and Bock, 2006; Lloyd and Timmis, 2011).
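The ORF criteria used here (original start and stop codons retained, no premature stop, no frameshifting length change) and the synonymous/nonsynonymous comparison can be sketched in Python. This is a minimal illustration under stated assumptions. Sequences are ungapped, in-frame, uppercase ACGT strings aligned codon for codon to the plastid reference, and the function names are hypothetical. The synonymous/nonsynonymous tally is deliberately crude and is not necessarily how the counts in Table II were obtained.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_intact_orf(seq, starts=("ATG",)):
    """True if seq begins with an allowed start codon, ends with a stop codon,
    is a whole number of codons, and has no internal (premature) stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3:
        return False  # too short, or a length implying a frameshift
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in starts or CODON_TABLE.get(codons[-1]) != "*":
        return False
    # Unknown codons (e.g. containing N) also disqualify the ORF.
    return all(CODON_TABLE.get(c) not in ("*", None) for c in codons[:-1])


def count_syn_nonsyn(reference, query):
    """Crude count of differing sites in synonymous vs nonsynonymous codons.

    Every differing site in a codon is assigned to one class according to
    whether the whole codon still encodes the same amino acid. This is a
    simplification, not a Nei-Gojobori or maximum-likelihood estimator.
    """
    reference, query = reference.upper(), query.upper()
    if len(reference) != len(query) or len(reference) % 3:
        raise ValueError("need equal-length, in-frame, ungapped coding sequences")
    syn = nonsyn = 0
    for i in range(0, len(reference), 3):
        r, q = reference[i:i + 3], query[i:i + 3]
        diffs = sum(x != y for x, y in zip(r, q))
        if diffs == 0:
            continue
        if CODON_TABLE[r] == CODON_TABLE[q]:
            syn += diffs
        else:
            nonsyn += diffs
    return syn, nonsyn
```

The `starts` parameter is included so that alternative start codons can be admitted if a particular plastid gene is known to use one.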
The several substitutions that distinguish the nuclear atpE and psbB protein-coding sequences from the related chloroplastic sequences made it possible to test for nuclear transcripts. Because mRNAs of these two genes are highly abundant in the chloroplast, thermostable restriction enzymes able to cut only the chloroplastic cDNAs were used during PCR amplification to favor amplification of any nuclear transcripts. Even when two thermostable restriction enzymes were used together, amplification of chloroplastic transcripts could not be avoided (data not shown); thus, the presence of nuclear transcripts could not be verified. The conceptually translated amino acid sequences of the nupt atpE and psbB sequences were compared with those of the related chloroplast genes in five plants (Fig. 7) to determine whether the genes could retain the same function after acquiring nuclear gene regulatory elements. The atpE and psbB chloroplastic sequences were retrieved from five plastomes belonging to a moss, a red alga, a green alga, and two land plants. Among these five plants, the psbB chloroplastic sequences (508 to 512 amino acids in length) show only 19.7% identical sites, whereas the atpE chloroplastic sequences (132 to 141 amino acids) have 68.6% identical sites, indicating the presence of greater purifying selection in the chloroplast for psbB. Compared with the five chloroplast sequences inspected, the nuclear psbB sequences have four NSSs (nupt 4a) and 13 NSSs (nupt 4b) at sites where no NSS occurred in the five plastomic sequences, and the nuclear atpE sequence (nupt 3) has three NSSs at such sites (Fig. 7). Moreover, the psbB and atpE (nupt 4b) sequences present more NSSs than synonymous substitutions and are evolving neutrally, whereas the recent psbB (nupt 4a) sequence presents the same number of NSSs and synonymous substitutions compared with the [...]
Table I. Origin and evolution of Nicotiana nupt-containing plastomic protein-coding and noncoding regions. The species (N. tabacum [N. tab.] or N. tomentosiformis [N. tom.]) from which the nupts were amplified are shown. The length and the number of substitutions (transitions/transversions [Transi./Transv.]) within each entire nupt sequence, and those confined to potential protein-coding or noncoding regions, are given. The percentages of substitutions (Subs.) in potential protein-coding regions (CDS) compared with noncoding regions (Non CDS) are given for each nupt and for all the nupts obtained. The lengths of the deletions (Del.) and insertions present in each potential nupt protein-coding or noncoding region are shown.
Figure 4. [...] N. tomentosiformis. The percentage frequency of six types of nucleotide substitution in the nupt DNA relative to seven Solanaceae plastomes is presented (e.g. AT/GC indicates that A or T is present in the chloroplast DNA and G or C, respectively, is present in the nupt). Complementary mutations, such as A/G and T/C, are pooled. Error bars indicate SEs of the mean.
One nupt obtained from N. tabacum (nupt 1a) contained two large deletions (28 and 80 bp) relative to the related plastomes. By designing a primer spanning the site of a deletion, the orthologous region of that nupt could be specifically amplified in closely related species (Fig. 8A). Using different primer pairs designed to specifically amplify the nupt 1a fragment, products were obtained from N. tomentosiformis but not from N. sylvestris (Fig. 8B). This result agrees with the phylogenetic analysis of this nupt, which showed that it was transferred from the chloroplast to the nucleus in an ancestor of N. tomentosiformis after that lineage diverged from N. sylvestris and before the polyploidization event that formed N. tabacum. Since N. tomentosiformis is the paternal parent of the allotetraploid N. tabacum, this nupt was transferred to the nucleus in an ancestor of N. tomentosiformis and was then paternally inherited by N. tabacum. The entire nupt 1a fragment could not be amplified from N. tomentosiformis, so we designed several primers (Fig. 8A) to determine what remains of that nupt. Compared with the 4,207-bp fragment found in N. tabacum, only 3,768 bp could be isolated from N. tomentosiformis. Most of the remaining 439 bp must have been lost after N. tomentosiformis and N. tabacum diverged, most likely through a deletion or a rearrangement. Comparing the 3,768- and 4,207-bp fragments with the related chloroplastic sequences showed that the nupt 1a fragment obtained from N. tomentosiformis was orthologous to the N. tabacum nupt: the two nupts shared 139 nucleotide mutations relative to the plastome sequences, while only four substitutions were unique to the N. tabacum nupt and four to the N. tomentosiformis version. Based on the age of nupt 1a and on the fact that it was inserted before the N. tabacum polyploidization event, N. tomentosiformis and N. tabacum are estimated to have diverged <152,000 years ago, which is close to the time of the polyploidization event that led to N. tabacum (<0.2 Mya; Clarkson et al., 2005).
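The 152,000-year figure can be reproduced with simple arithmetic under one reading, which is our assumption because the calculation is not spelled out here. In that reading, the four substitutions unique to the N. tomentosiformis copy of nupt 1a are divided by its 3,768-bp length and by the Arabidopsis nuclear rate of about 7 × 10^-9 substitutions per site per generation (Ossowski et al., 2010), with one generation per year assumed:

```latex
t \approx \frac{k}{L\,\mu}
  = \frac{4}{3768 \times 7 \times 10^{-9}}
  \approx 1.52 \times 10^{5}\ \text{generations}
  \approx 152{,}000\ \text{years}
```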

DISCUSSION
Elucidating the mechanisms by which chloroplast genes become functional in the nucleus requires detailed studies of how nupts evolve after their integration into the nucleus. Studies of nupt evolution have been possible for only a few seed plants whose nuclear and plastid genomes have been fully sequenced, mainly Arabidopsis and rice (Shahmuradov et al., 2003; Richly and Leister, 2004; Huang et al., 2005; Matsuo et al., 2005; Noutsos et al., 2005; Guo et al., 2008). However, because few plastome sequences were available at the time of these studies, it was impossible to determine whether the nucleotide differences between a nupt and the related chloroplast sequence arose from mutations in the nucleus or reflected differences between the extant and ancient plastomes. Moreover, these studies were based on sequenced genomes from which nupts may have been excluded from shotgun assemblies because they were considered to be contaminating DNA. In this article, we present a PCR-derived method that allows the amplification of norgs of various ages, and we tested it in Nicotiana to examine the fate of nupts after their nuclear insertion. With the increasing number of sequenced chloroplast and mitochondrial genomes, this method should be useful for characterizing nupts and numts, especially in species in which careful searches of the sequenced nuclear genome revealed only a few nupts or numts, such as C. reinhardtii, mosquito (Aedes aegypti), and Drosophila melanogaster (Kleine et al., 2009). The ages of the norgs it uncovers will depend on when the diagnostic deletions occurred in the organelle genome.
Seven nupts between 2 and 5 kb in length (25 kb in total), encompassing several chloroplast genes and obtained from two Nicotiana spp., contained substitutions and short indels. These amplified nupts were not rearranged, but they may belong to larger tight or loose clusters of organelle DNA (Richly and Leister, 2004). Their ages ranged from 0.033 to 5.81 million years. The presence of several nupts of different ages that derive from the same plastomic region and share no nuclear nucleotide substitutions indicates that chloroplast DNA is continuously integrated and fixed in the nuclear genome. The ease of isolating these large, unrearranged nupt fragments in Nicotiana suggests that they are only the tip of the iceberg. From these and previously published results (Ayliffe and Timmis, 1992), it seems that unrearranged nupt regions can survive longer in Nicotiana than nupts of a similar size in rice, where nupts have an estimated half-life of only 0.5 million years (Matsuo et al., 2005). This difference might reflect the ability of the larger Nicotiana nuclear genome to tolerate a higher nupt content (Smith et al., 2011). Alternatively, the presence of older nupts in Nicotiana than in rice could be explained by the exclusion of some nupts from the assembly of the rice genome. The retention of large nupt regions for several million years was somewhat unexpected in Nicotiana, since a high proportion of N. tabacum nupts are quickly deleted, even within a few generations (Sheppard and Timmis, 2009). Most of the isolated nupts were obtained from the diploid species N. tomentosiformis, suggesting that the fixation and longevity of nupts are similar in the allotetraploid N. tabacum and its diploid progenitors. One of these nupts was present in two different Nicotiana species but was slightly shorter in N. tomentosiformis than in N. tabacum. The position and number of nupt loci were shown to vary greatly between different diploid maize (Zea mays) lines (Roark et al., 2010); it will therefore be interesting to test whether similar variation occurs between N. tabacum and its diploid progenitors or whether they share many nupts at similar physical locations.
To determine the nature of nupt sequence divergence, the spectrum of mutations observed in the nupts was compared with the spectrum of de novo spontaneous mutations determined by sequencing several Arabidopsis nuclear genomes (Ossowski et al., 2010). Our results show that nupts and the Arabidopsis nuclear genome have similar proportions of each nucleotide substitution class. Assuming that the spectrum of mutations differs between the Nicotiana chloroplast and nuclear genomes and that the Arabidopsis and Nicotiana nuclear genomes evolve similarly, we conclude that nupts evolve in a nuclear-specific manner, as was presumed from previous studies of norgs (Huang et al., 2005; Schmitz et al., 2005). In the nupt sequences, G:C/A:T transitions, which result partly from the deamination of methylated cytosines, are overrepresented compared with other substitution classes. A similar overabundance of this type of transition is observed in the Arabidopsis nuclear genome, indicating that it occurs at a similar rate in nupts as in any unselected region of the nuclear genome. In contrast with previous studies (Huang et al., 2005), these results suggest that nupts are not always hypermethylated. Together with the observation of a similar number of substitutions in the noncoding and the potential protein-coding sequences, these results show that these nupts are evolving neutrally (Graur and Li, 2000), consistent with their nonfunctionality. Surprisingly, even though these nupts are evolving neutrally, indels were observed more frequently in noncoding than in potential protein-coding sequences; this difference can be ascribed to the higher number of homopolymeric sequences in noncoding regions.
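The comparison of substitution numbers in coding versus noncoding nupt sequence can be made explicit with a simple 2 × 2 contingency test. This is a swapped-in illustration, since the text does not name a statistical test. The sketch computes a Pearson chi-square (1 df, no continuity correction) on substituted versus unsubstituted sites. It assumes sites are independent, and the function names are hypothetical. A nonsignificant result is consistent with neutrality but does not prove it, and for small counts an exact test would be more appropriate.

```python
def chi_square_2x2(subs_coding, bp_coding, subs_noncoding, bp_noncoding):
    """Pearson chi-square (1 df, no continuity correction) comparing the fraction
    of substituted sites in coding vs noncoding sequence."""
    observed = [
        [subs_coding, bp_coding - subs_coding],
        [subs_noncoding, bp_noncoding - subs_noncoding],
    ]
    row_totals = [bp_coding, bp_noncoding]
    col_totals = [subs_coding + subs_noncoding,
                  bp_coding + bp_noncoding - subs_coding - subs_noncoding]
    grand = bp_coding + bp_noncoding
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:  # an empty column contributes nothing
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


CHI2_CRIT_1DF_005 = 3.841  # chi-square critical value for 1 df at alpha = 0.05


def consistent_with_equal_rates(subs_coding, bp_coding, subs_noncoding, bp_noncoding):
    return chi_square_2x2(subs_coding, bp_coding, subs_noncoding, bp_noncoding) < CHI2_CRIT_1DF_005
```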
Most of the nupt gene sequences that we obtained contain a premature stop codon due to NSSs and/or deletions. However, a few recent and older nupts contained potential protein-coding sequences with intact ORFs (matK, psbA, atpE, and psbB), in which the original start and stop codons were retained and no indels were present, although substitutions often were. These chloroplastic ORFs present in nupts have a window of at least several million years to become functional in the nucleus by acquiring appropriate nuclear regulatory elements. The first step of functionalization will be to express the gene by acquiring a nuclear promoter and a polyadenylation signal, and the second step will be to obtain a transit peptide if the protein is to be targeted back to the chloroplast (Martin and Herrmann, 1998; Bock and Timmis, 2008).
Table II. Origin, age, and evolution of several Nicotiana nupt regions in comparison with their corresponding chloroplastic protein-coding sequences. The species (N. tabacum [N. tab.] or N. tomentosiformis [N. tom.]) from which the nupts were amplified and the approximate age of each nupt are shown. For each nupt, the different potential protein-coding regions, their lengths, and the total number of substitutions are identified by comparison with the corresponding chloroplastic sequences in Nicotiana. The mutations in the chloroplastic protein-coding regions correspond to the number of substitutions that occurred since the Nicotiana species diverged from other members of the Solanaceae. The nature (deletion or NSS) and position of mutations disrupting an ORF within some potential nupt protein-coding regions are presented. The number of NSSs is given only for genes that do not contain a frameshift mutation.
From previous studies (Baker and Schatz, 1987; Herman et al., 1990), it seems that expression of the chloroplastic gene in the nucleus is less likely to occur than acquisition of a transit peptide (Martin and Herrmann, 1998). The time available to acquire these nuclear regulatory elements will vary depending on the length of the gene, the nature of its coding sequence, the importance of particular sites for the gene to remain functional, the physical location of the nupt, and chance, owing to the stochastic nature of random mutations. The number of nuclear regulatory elements that the nupt must acquire during this time frame will also vary between genes, because some organelle genes already contain information for protein targeting into the organelle (Ueda et al., 2008), because a chloroplastic promoter can be transcriptionally active in the nucleus immediately (Cornelissen and Vandewiele, 1989; Lloyd and Timmis, 2011), or because some AT-rich 3′ untranslated regions can be used as RNA cleavage and polyadenylation sites (Stegemann and Bock, 2006). After activation in the nucleus, two functional copies will exist in different cellular compartments; if they are functionally equivalent, the silencing of one or the other will depend on chance mutations (Martin and Herrmann, 1998; Adams et al., 1999), which will tend to favor retention of the chloroplastic copy. This bias arises from the higher substitution rate in the nucleus than in the chloroplast genome (Wolfe et al., 1987), the organization of chloroplast genes in operons, the mainly uniparental inheritance of plastids, and the rarity of plastid fusion (Wicke et al., 2011). However, if a successfully activated chloroplast gene loses its function in the nucleus, many subsequent nupt integrations and activations remain possible. By contrast, if the gene is lost from the chloroplast, the nucleus becomes the permanent location of that gene.
This explains why the functional replacement of a chloroplast gene by a nuclear gene has occurred several times independently during the evolution of land plants (Millen et al., 2001; Keeling and Palmer, 2008). It is very unlikely that any of the four genes with an intact ORF in this study could become functional and replace their plastid equivalents, since there are currently no known species in which this has occurred. However, these nupts are only a small sample, and it is very likely that many other nupts contain chloroplast genes with an intact ORF that might become functional later. Some chloroplast genes can still be functionally transferred to the nucleus. For example, accD (Magee et al., 2010), infA (Millen et al., 2001), rpl22 (Gantt et al., 1991), and rpl32 (Cusack and Wolfe, 2007; Ueda et al., 2007) were recently functionally transferred from the chloroplast to the nucleus in some angiosperms (for review, see Rousseau-Gueutin et al., 2011). The number of genes that can be functionally transferred might be larger, since analyses of the gene content of all plastomes sequenced so far showed that the ndh, psaI, rps16, rpl23, rpl33, or ycf4 genes have been lost in some angiosperms (Magee et al., 2010).
In conclusion, our analysis of Nicotiana nupts shows that some chloroplastic protein-coding sequences can retain an intact ORF for at least several million years in the nucleus. While some genes must remain in the plastome to maintain redox balance and thus presumably cannot relocate (Allen, 1993; Puthiyaveetil et al., 2008), other genes could in principle be functionally transferred to the nucleus but have nonetheless failed to relocate during evolution so far. Our results seem to indicate that the lack of observed relocation for this latter class of genes (relocatable but unrelocated) is not due to an intrinsic difficulty in acquiring nuclear gene regulatory elements before the ORF decays. It is possible that these genes fail to transfer functionally because of constraints on their amino acid sequences needed to maintain function, or because loss of the functional copy is much more likely to happen in the nucleus than in the chloroplast. However, the continuously high rate of norg transposition ensures further attempts at relocation, accounting for the observed reduction in gene content in the cytoplasmic organellar genomes.

Plant Material, Plant Growth Conditions, and Isolation of DNA or RNA
The diploid Nicotiana spp. (2n = 2x = 24) Nicotiana tomentosiformis and Nicotiana sylvestris and the derived allotetraploid species (2n = 4x = 48) Nicotiana tabacum cv Petite Havana were grown in soil in a controlled-environment chamber under 14-h-light/10-h-dark and 25°C day/18°C night conditions. Genomic DNA of each species was isolated from 100 mg of fresh leaves using the DNeasy plant mini kit (Qiagen). RNA from N. tomentosiformis was prepared using an RNeasy plant mini kit (Qiagen), and genomic DNA contamination was removed using a TURBO DNA-free kit (Ambion). Reverse transcription was then performed using an Advantage RT-for-PCR kit (Clontech) with an oligo(dT) primer. All kits were used in accordance with the manufacturers' instructions.

Figure 8. Amplification in N. tomentosiformis of the orthologous region of the nupt 1a obtained in N. tabacum. A, Schematic representation of the nupt 1a region obtained in N. tabacum and N. tomentosiformis. The locations of protein-coding and noncoding regions are indicated by bars and black lines, respectively. Primers used for amplification and for sequencing are represented by solid and dashed arrows, respectively. The SP6 and T7 universal primers used for sequencing the beginning and the end of the nupt region are not shown in this diagram. The white triangle indicates the position of the deletion that occurred in the N. tabacum plastome and that permitted the design of a primer that did not amplify the chloroplast genome in this species. Gray triangles indicate the positions of the deletions that occurred in the N. tabacum nupt 1a. The design of primers (1adelF and 1adelR) in the sequence spanning the deletion present in the nupt 1a permitted the amplification of an orthologous copy in N. tomentosiformis. B, PCR amplification of the nupt 1a region using the 1F + 1adelR, 1adelF + 1.7R, 1adelF + 1.8R, or 1adelF + 1R primer pairs with N. sylvestris, N. tomentosiformis, or N. tabacum DNA as template. A PCR product was always obtained in N. tabacum, the species in which the nupt 1a region was initially found. No PCR product was obtained when using N. sylvestris as a template. A PCR product was obtained in N. tomentosiformis only with the 1F + 1adelR and 1adelF + 1.7R primer pairs. Since no PCR products were obtained in N. tomentosiformis with 1adelF + 1.8R or 1adelF + 1R, most of the last 439 bp at the 3′ end of the nupt 1a is likely absent from this species because of a deletion or a rearrangement. -ve indicates the no-template control. The size of the PCR product is indicated on the right.

Design of Primer Pairs That Specifically Amplify nupts
The plastome sequences of five Solanaceae species were aligned using Geneious 5.4 (http://www.geneious.com) to detect deletions that occurred in the plastome of a given Nicotiana species, in order to design primers that would specifically amplify nupts from that species and avoid amplifying the high-copy-number plastome. The specific amplification of nupts was verified by comparing PCR amplifications obtained with total cellular DNA that had or had not been digested with the HpaII restriction enzyme. Of the 10 primer pairs tested, the four that were found to specifically amplify nupt sequences in N. tabacum and N. tomentosiformis are listed in Supplemental Table S1.
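The deletion-scanning step can be sketched in Python. This is a minimal illustration, not the Geneious workflow used in the study: it assumes the plastomes are already aligned and available as equal-length strings with "-" marking gaps, and the function name `target_specific_deletions`, the `min_len` threshold, and the toy alignment are hypothetical. It reports gap runs in the target plastome that are ungapped in every other plastome; a primer placed in such a region should not anneal to the target plastome, but could still anneal to nupts that integrated before the deletion arose.

```python
from typing import Dict, List, Tuple


def target_specific_deletions(
    alignment: Dict[str, str], target: str, min_len: int = 10
) -> List[Tuple[int, int]]:
    """Return (start, end) alignment columns (end exclusive) of gap runs in
    `target` that are at least `min_len` long and ungapped in all other
    sequences. Hypothetical helper for illustration only."""
    target_seq = alignment[target]
    length = len(target_seq)
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must all have the same length")
    others = [seq for name, seq in alignment.items() if name != target]

    runs: List[Tuple[int, int]] = []
    i = 0
    while i < length:
        if target_seq[i] != "-":
            i += 1
            continue
        # Extend to the end of this gap run in the target plastome.
        j = i
        while j < length and target_seq[j] == "-":
            j += 1
        # Keep the run only if every other plastome has bases across it,
        # i.e. the deletion is specific to the target.
        shared_bases = all("-" not in seq[i:j] for seq in others)
        if shared_bases and j - i >= min_len:
            runs.append((i, j))
        i = j
    return runs


# Invented toy alignment, not real plastome sequence.
alignment = {
    "N_tabacum": "ACGT-----ACGT",
    "S_lycopersicum": "ACGTTTGCAACGT",
    "A_belladonna": "ACGTTTGCAACGT",
}
print(target_specific_deletions(alignment, "N_tabacum", min_len=3))  # [(4, 9)]
```

In practice each candidate region would still have to be checked empirically, as was done here with digested versus undigested DNA.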

PCR Amplification and Sequencing of Cloned Fragments
Each nupt region was amplified using the Expand long-range deoxyribonucleotide pack (Roche). All amplifications were carried out in a total volume of 25 µL containing 5 µL of 5× buffer, 1.25 µL of 10 mM deoxyribonucleotide mix, 1 µL of each primer (10 µM), 1.5 µL of 100% DMSO, 0.35 µL of Taq polymerase (5 units/µL), and 20 ng of template DNA. Cycling conditions were 92°C for 3 min, followed by 10 cycles of 92°C for 10 s, 59°C for 15 s, and 68°C (1 min per kb), then 25 cycles of 92°C for 10 s, 59°C for 15 s, and 68°C (1 min per kb + 20 s), and a final extension at 68°C for 5 min. The primer pairs 1F and 1R, 2F and 2R, 3F and 3R, and 4F and 4R were used to amplify the nupt 1, 2, 3, and 4 regions, respectively. The PCR products corresponding to the nupt 1, 2, 3, and 4 regions were purified (QIAquick PCR purification kit; Qiagen) and cloned into the pGEM-T vector (pGEM-T Vector System I; Promega). Because this polymerase produces blunt ends, an A-tailing reaction was performed using Taq DNA polymerase before ligation. Ligations, transformations, and plating were done according to the manufacturer's protocols. The presence of an insert of the expected size was checked by colony PCR. All procedures (PCR, ligation, and cloning) were performed twice to minimize biases. Plasmids carrying the cloned fragments were purified for each nupt using the GenElute Plasmid Miniprep kit (Sigma-Aldrich). Preliminary polymorphism analyses of these amplicons were carried out using single-strand conformation polymorphism analysis on acrylamide gels (Rousseau-Gueutin et al., 2009) to identify recombinant plasmids that could carry different cloned fragments (nupt sequences) covering the same chloroplastic region. Finally, four or five cloned fragments were sequenced for each nupt type to detect any errors introduced during amplification.
The sequences of the primers used to sequence the cloned fragments are in Supplemental Table S1, and the relative positions of the primers used for the amplification and the sequencing of each nupt region are presented in Figure 3.
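As a quick check of the PCR recipe, the final concentration of each component in the 25 µL reaction follows from C_final = C_stock × V_added / V_total. The sketch below assumes the volumes are in microliters and the primer stocks are 10 µM (units that can be garbled to "mL" and "mM" during text extraction); the component labels and variable names are ours, not the authors'.

```python
from typing import Dict, List, Tuple

TOTAL_VOLUME_UL = 25.0

# (component, volume added in µL, stock concentration, stock unit)
# Assumed reading of the recipe; the polymerase is given in units per µL.
RECIPE: List[Tuple[str, float, float, str]] = [
    ("reaction buffer", 5.0, 5.0, "x"),
    ("dNTP mix", 1.25, 10.0, "mM"),
    ("forward primer", 1.0, 10.0, "uM"),
    ("reverse primer", 1.0, 10.0, "uM"),
    ("DMSO", 1.5, 100.0, "% v/v"),
    ("polymerase", 0.35, 5.0, "U/uL"),
]


def final_concentrations(
    recipe: List[Tuple[str, float, float, str]], total_ul: float
) -> Dict[str, Tuple[float, str]]:
    """Return the final concentration (or total enzyme units) per component."""
    result: Dict[str, Tuple[float, str]] = {}
    for name, volume_ul, stock, unit in recipe:
        if unit == "U/uL":
            # The enzyme is reported as total units in the reaction.
            result[name] = (volume_ul * stock, "U per reaction")
        else:
            # C_final = C_stock * V_added / V_total
            result[name] = (stock * volume_ul / total_ul, unit)
    return result


for name, (value, unit) in final_concentrations(RECIPE, TOTAL_VOLUME_UL).items():
    print(f"{name}: {value:.2f} {unit}")
```

Under these assumptions the reaction contains 1× buffer, 0.5 mM dNTP mix, 0.4 µM of each primer, 6% DMSO, and 1.75 U of polymerase.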

Sequence Analyses
The sequences obtained, excluding primer-binding sequences, were compared with the N. tabacum mitochondrial genome using BLAST to verify that they were of nuclear rather than mitochondrial origin. Each nupt sequence was then aligned using Geneious 5.4 (http://www.geneious.com) with the homologous chloroplast sequences from seven species of the Solanaceae (Atropa belladonna, NC_004561; N. sylvestris, NC_007500; N. tomentosiformis, NC_007602; N. tabacum, NC_001879; Solanum bulbocastanum, NC_007943; Solanum lycopersicum, NC_007898; and Solanum tuberosum, NC_008096) and one species of the Oleaceae (Olea europaea, NC_013707). The nucleotide divergence between each nupt sequence and the related chloroplastic sequences corresponds to the sum of the mutations accumulated in the chloroplast genome (type 2 substitutions) and in the nuclear genome (type 1 substitutions). A mutation observed in a nupt was considered to have arisen in the nuclear genome if the same nucleotide was not present at that position in any of the corresponding chloroplast sequences, assuming that the plastomes of a given species show no sequence variation. Deletions, insertions, and substitutions were identified by visual inspection of the alignments. For all substitutions, the class of transition or transversion was determined. All potential protein-coding sequences present in each nupt region were translated using Geneious to determine whether the ORF contained a premature stop codon due to an NSS or a frameshift. The numbers of nonsynonymous and synonymous substitutions were determined in the potential protein-coding sequences when possible. For each nupt and for the total nupt sequences, the evolution of potential protein-coding sequences and noncoding sequences was compared to detect any selection pressure. Finally, the nonsynonymous and synonymous substitution rates between the N. tomentosiformis atpE chloroplastic sequence and the related potential protein-coding sequence obtained from nupt 3 were computed using the Nei-Gojobori method (Jukes-Cantor) in MEGA5 software (Tamura et al., 2011), and the type of selection acting on the nupt atpE sequence was tested using a codon-based Z test of selection. Similar analyses were performed between the N. tomentosiformis psbB chloroplastic sequence and the related potential protein-coding sequences obtained from nupt 4a and nupt 4b.
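Two of the rules above, assigning a substitution to the nuclear genome and checking whether a potential coding sequence still carries an intact ORF, can be sketched as follows. This is a simplified illustration, not the Geneious/MEGA5 procedure used in the study: it assumes gap-free, pre-aligned sequences of equal length, classifies transitions and transversions relative to a chosen reference plastome (a hypothetical parameter), considers only the three standard stop codons, and does not check the start codon. The toy sequences are invented.

```python
from typing import Dict, List, Optional, Tuple

PURINES = {"A", "G"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def substitution_class(a: str, b: str) -> str:
    """Transition if both bases are purines or both are pyrimidines."""
    return "transition" if (a in PURINES) == (b in PURINES) else "transversion"


def nuclear_substitutions(
    nupt: str, plastids: Dict[str, str], reference: str
) -> List[Tuple[int, str, str, str]]:
    """Type 1 (nuclear) substitutions: columns where the nupt base is absent
    from every plastid sequence, assuming no intraspecific plastome variation."""
    calls = []
    for i, base in enumerate(nupt):
        column = {seq[i] for seq in plastids.values()}
        if base not in column:
            ref_base = plastids[reference][i]
            calls.append((i, ref_base, base, substitution_class(ref_base, base)))
    return calls


def premature_stop(cds: str) -> Optional[int]:
    """Index of the first in-frame stop codon before the last codon, or None."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


def has_intact_orf(cds: str) -> bool:
    """No frameshift (length divisible by 3), terminal stop, no premature stop."""
    return len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS and premature_stop(cds) is None


plastids = {"N_tom": "ATGAAACCCTAA", "N_syl": "ATGAAACCCTAA", "S_lyc": "ATGAAGCCCTAA"}
nupt = "ATGAAGCTCTAA"
print(nuclear_substitutions(nupt, plastids, "N_tom"))  # [(7, 'C', 'T', 'transition')]
print(has_intact_orf(nupt))  # True
print(premature_stop("ATGTAGCCCTAA"))  # 1
```

In the toy example, the G at column 5 is shared with S_lyc and is therefore not counted as a nuclear substitution, whereas the T at column 7 appears in no plastid sequence and is counted.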

Phylogenetic Analyses
The data matrices generated for each nupt region and the related chloroplast sequences from seven Solanaceae species were analyzed using maximum likelihood in PHYML (Guindon and Gascuel, 2003) and neighbor joining (Saitou and Nei, 1987). Phylogenetic analyses were conducted using Geneious (http://www.geneious.com) with sequences from O. europaea as an outgroup. Bootstrap analyses were performed with 10,000 replicates (Felsenstein, 1985). The model of sequence evolution for the maximum likelihood analyses was selected using the Modeltest Web Server (http://darwin.uvigo.es) with default options, which returned the general time reversible (GTR) model (Tavaré, 1986) as the best-fit model for these data. The neighbor-joining analyses used the Tamura-Nei model (Tamura and Nei, 1993).
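Each bootstrap replicate is built by resampling alignment columns with replacement (Felsenstein, 1985). A tree is then inferred from every pseudo-alignment, and the support for a clade is the fraction of replicate trees that contain it. The sketch below covers only the resampling and the support calculation; tree inference is left to a program such as PhyML, and the function names and toy alignment are ours.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw alignment columns with replacement to build one pseudo-alignment.

    The same column indices are used for every taxon so that sites stay aligned.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]], clade: FrozenSet[str]) -> float:
    """Fraction of replicate trees (each given as a set of clades) containing `clade`."""
    if not replicate_clades:
        return 0.0
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return hits / len(replicate_clades)


rng = random.Random(42)  # fixed seed for a reproducible example
alignment = {"nupt": "ACGTACGT", "N_tom": "ACGTACGA", "O_eur": "TCGAACGA"}
replicate = bootstrap_replicate(alignment, rng)
print(replicate)
```

Drawing one shared list of column indices, rather than resampling each sequence independently, is what keeps homologous sites aligned across taxa in every replicate.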

Estimating the Age of the nupt Sequences
Comparing each nupt sequence with the related chloroplast sequences from three Nicotiana spp. and four other closely related species of the same family allowed us to distinguish the mutations that occurred in the nucleus (type 1) from those that reflect differences between the extant and ancestral chloroplast genomes (type 2). Assuming that the rate of type 1 mutations approximates the nuclear spontaneous mutation rate (7 × 10⁻⁹ base substitutions per site per generation) when the nupt sequences evolve neutrally, we estimated the age of each nupt fragment. This nuclear spontaneous mutation rate was recently obtained in Arabidopsis from the sequencing of the complete nuclear genomes of five individuals derived by 30 generations of single-seed descent from the reference strain Columbia-0 (Ossowski et al., 2010). On the basis of this rate, and assuming one generation per year, a 1-million-year-old nupt is expected to accumulate seven nuclear substitutions (type 1 mutations) per kilobase (7 × 10⁻⁹ × 10⁶ generations × 10³ sites = 7).
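The age estimate reduces to dividing the per-site density of type 1 substitutions by the per-generation rate. A minimal sketch, assuming strictly neutral, clock-like accumulation with no correction for multiple hits, and assuming one generation per year by default (the `generation_time_years` parameter and the function name are our additions):

```python
NUCLEAR_RATE = 7e-9  # base substitutions per site per generation (Ossowski et al., 2010)


def nupt_age_years(
    type1_substitutions: int,
    length_bp: int,
    rate: float = NUCLEAR_RATE,
    generation_time_years: float = 1.0,
) -> float:
    """Estimate the time since a nupt integrated into the nucleus.

    age = (type 1 substitutions / sites) / rate * generation time
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    substitutions_per_site = type1_substitutions / length_bp
    generations = substitutions_per_site / rate
    return generations * generation_time_years


# Seven type 1 substitutions per kilobase correspond to about 1 million years.
print(f"{nupt_age_years(7, 1000) / 1e6:.2f} million years")  # 1.00 million years
```

Because the estimate scales linearly with generation time, a species with a two-year generation time would yield ages twice as large for the same substitution density.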

Transcriptional Activity of Protein-Coding Genes Present in nupts
Some potential protein-coding genes obtained in three N. tomentosiformis nupts contained neither frameshifts nor premature stop codons but had some substitutions compared with the related chloroplast sequences (atpE in nupt 3 and psbB in nupts 4a and 4b). To detect any transcript from the nuclear atpE gene (nupt 3), we looked for restriction sites for thermostable enzymes that are present in the chloroplast atpE gene but not in the nupt. Using such thermostable restriction endonucleases during reverse transcription-PCR would block amplification from chloroplast transcripts but allow potential nuclear atpE transcripts to be amplified. The chloroplast atpE gene contains PvuII and TspRI restriction sites, whereas the atpE sequence of nupt 3 does not (Supplemental Fig. S1). The N. tomentosiformis cDNA was digested with the PvuII and TspRI thermostable restriction enzymes, and amplification of any potential nuclear atpE transcripts using the atpE-F and atpE-R primers (Supplemental Table S1) was tested.
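Screening for enzymes that cut the chloroplast atpE sequence but not the nupt copy amounts to searching both sequences for each recognition site. A minimal sketch, assuming the recognition sequences CAGCTG for PvuII and CASTG for TspRI (S = C or G, as listed by enzyme suppliers) and scanning only the given strand, which is sufficient here because both sites are palindromic. The helper names and toy sequences are hypothetical, and whether an enzyme stays active under reverse transcription-PCR conditions is not addressed.

```python
import re
from typing import Dict, List

# IUPAC nucleotide codes needed for the sites below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]", "N": "[ACGT]"}

# Assumed recognition sequences; both are palindromic, so one strand suffices.
ENZYMES = {"PvuII": "CAGCTG", "TspRI": "CASTG"}


def site_positions(seq: str, site: str) -> List[int]:
    """0-based start positions of a (possibly degenerate) site, overlaps included."""
    pattern = "".join(IUPAC[base] for base in site)
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def chloroplast_only_sites(
    cp_seq: str, nupt_seq: str, enzymes: Dict[str, str] = ENZYMES
) -> Dict[str, List[int]]:
    """Enzymes with at least one site in the chloroplast copy and none in the nupt."""
    selected = {}
    for name, site in enzymes.items():
        cp_hits = site_positions(cp_seq, site)
        if cp_hits and not site_positions(nupt_seq, site):
            selected[name] = cp_hits
    return selected


chloroplast = "AACAGCTGTTCACTGAA"
nupt = "AACAGTTGTTCATTGAA"
print(chloroplast_only_sites(chloroplast, nupt))  # {'PvuII': [2], 'TspRI': [10]}
```

An enzyme selected this way only blocks chloroplast-derived amplification if its site lies between the two primers, so in practice the site positions would be checked against the amplicon boundaries.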
Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers JN559756 to JN559762.

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S2. Phylogenetic tree obtained using the maximum likelihood method (GTR model) with N. tabacum nupt 1a sequence and related chloroplastic sequences from representatives of the Solanaceae.
Supplemental Figure S3. Phylogenetic tree obtained using the maximum likelihood method (GTR model) with N. tomentosiformis and N. tabacum nupt 1a sequence and related chloroplastic sequences from representatives of the Solanaceae.
Supplemental Figure S4. Phylogenetic tree obtained using the maximum likelihood method (GTR model) with N. tabacum nupt 1b sequence and related chloroplastic sequences from representatives of the Solanaceae.
Supplemental Figure S5. Phylogenetic tree obtained using the maximum likelihood method (GTR model) with N. tomentosiformis nupt 2 sequence and related chloroplastic sequences from representatives of the Solanaceae.
Supplemental Figure S6. Phylogenetic tree obtained using the maximum likelihood method (GTR model) with N. tomentosiformis nupt 3 sequence and related chloroplastic sequences from representatives of the Solanaceae.
Supplemental Figure S7. Phylogenetic tree obtained using the maximum likelihood method (GTR model) with N. tomentosiformis nupt 4a sequence and related chloroplastic sequences from representatives of the Solanaceae.