First published online April 30, 2004; 10.1104/pp.103.031245
Plant Physiology 135:412-420 (2004)
© 2004 American Society of Plant Biologists
A Comparison of Rice Chloroplast Genomes1,[w]
Institute of Genetics and Developmental Biology (J.T., H.X., X.Z., W.Z., J.Y, H.Y., L.Z.) and Beijing Genomics Institute, Chinese Academy of Sciences, Beijing 101300, China (J.T., H.X., X.Z., S.H., W.T., J.W., J.W., J.Y., H.Y, L.Z.); National Hybrid Rice Research and Development Center, Changsha 410125, China (M.C.); and Hangzhou Genomics Institute, Hangzhou 310007, China (S.H., J.W.)
Using high quality sequence reads extracted from our whole genome shotgun repository, we assembled two chloroplast genome sequences from two rice (Oryza sativa) varieties, one from 93-11 (a typical indica variety) and the other from PA64S (an indica-like variety with maternal origin of japonica), which are both parental varieties of the super-hybrid rice, LYP9. Based on the patterns of high sequence coverage, we partitioned chloroplast sequence variations into two classes, intravarietal and intersubspecific polymorphisms. Intravarietal polymorphisms refer to variations within 93-11 or PA64S. Intersubspecific polymorphisms were identified by comparing the major genotypes of the two subspecies represented by 93-11 and PA64S, respectively. Some of the minor genotypes occurring as intravarietal polymorphisms in one variety existed as major genotypes in the other subspecific variety, thus giving rise to intersubspecific polymorphisms. In our study, we found that the intersubspecific variations of 93-11 (indica) and PA64S (japonica) chloroplast genomes consisted of 72 single nucleotide polymorphisms and 27 insertions or deletions. The intersubspecific polymorphism rates between 93-11 and PA64S were 0.05% for single nucleotide polymorphisms and 0.02% for insertions or deletions, nearly 8 and 10 times lower than their respective nuclear genomes. Based on the total number of nucleotide substitutions between the two chloroplast genomes, we dated the divergence of indica and japonica chloroplast genomes as occurring approximately 86,000 to 200,000 years ago.
The intracellular organelle chloroplast has its own genome that encodes a number of chloroplast-specific components (for review, see Palmer, 1985
Recently, the whole genome shotgun approach has been successfully applied to sequencing nuclear genomes for large eukaryotes, such as Drosophila (Adams et al., 2000
One of the goals in the Super-Hybrid Rice Genome Project carried out at the Beijing Genomics Institute has been to sequence parental genomes of a super-hybrid rice cultivar, Liang-You-Pei-Jiu, or LYP-9 (Yi and Xiao, 2000
Sequence Assemblies and Validation
We reassembled the Nipponbare chloroplast genome using 28,000 reads (about 110 times the length of the chloroplast genome) released by Syngenta Company (referred to as Nipponbare-G). Nipponbare-G (accession no. AY522330 in GenBank) is 134,551 bp long, 26 bp longer than previously reported for Nipponbare-H (Hiratsuka et al., 1989). For each assembly project, we screened the sequencing data for PA64S and 93-11 and selected 52,000 chloroplast sequencing reads (with an average length greater than 540 bp at quality Q20), which amount to about 250 times the genome length. The reads were assembled into one major contig in each project based on the quality of our selected raw data. Some smaller contigs were excluded because of either low coverage (i.e. the nuclear equivalent coverage was less than 4 for the entire data set) or less than perfect identity (i.e. <100% over 500 bp and <99% over the entire length) to the main contig. Each genome was finally assembled into one contig, 134,551 bp for PA64S (AY522331) and 134,496 bp for 93-11 (AY522329). The PA64S chloroplast genome is the same length as that of Nipponbare-G and 55 bp longer than that of 93-11. These data are also publicly available at our institutional website (http://www.genomics.org.cn/bgi/rice/main.htm). To validate our three assemblies, we simulated their restriction maps with all 6-bp cutting restriction enzymes. The maps are identical except for those of Nipponbare-H, which showed some length polymorphisms for enzymes with CCCGGG as the cutting site, including Cfr9I, SmaI, and XmaI. The differences can be attributed to GG and CC deletions at sequence locations 98,867 and 116,251 bp, respectively, in the Nipponbare-H assembly compared with the 93-11, PA64S, and Nipponbare-G assemblies. We noted that one fragment of 20.8 kb in Nipponbare-H corresponds to three fragments of 17.2 kb, 1.8 kb, and 1.8 kb in the other three assemblies.
Furthermore, when we digested chloroplast DNA samples isolated from 93-11, PA64S, and Nipponbare with SmaI, the three predicted restriction fragments observed each differed from the 20.8-kb predicted fragment of the Nipponbare-H assembly (Supplemental Fig. 1, available at www.plantphysiol.org). We concluded that the three chloroplast sequences from 93-11, PA64S, and Nipponbare-G are very similar, as predicted. The Nipponbare-H assembly may be somewhat anomalous as the templates used for sequences may be variants of the Nipponbare chloroplast. Thus, the differences found between our assemblies and the Nipponbare sequence may be attributed to different methods used for the respective sequencing.
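The in silico restriction mapping described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `circular_digest` and the toy sequences are hypothetical, and the only enzyme facts assumed are that SmaI cuts the CCCGGG site bluntly in the middle (offset 3), whereas XmaI and Cfr9I cut after the first C (offset 1). The sketch treats the chloroplast genome as a circular molecule and shows how losing a single recognition site, as with the GG or CC deletions in Nipponbare-H, merges neighboring fragments into one longer fragment.

```python
def circular_digest(sequence, site, cut_offset):
    """Return fragment lengths (longest first) for a circular molecule.

    sequence   -- DNA sequence of the circular genome
    site       -- recognition site, e.g. "CCCGGG"
    cut_offset -- cut position within the site (3 for SmaI, 1 for XmaI)
    """
    seq = sequence.upper()
    site = site.upper()
    n = len(seq)
    # Append the start of the sequence so sites spanning the origin are found.
    wrapped = seq + seq[: len(site) - 1]
    cuts = set()
    start = wrapped.find(site)
    while start != -1:
        cuts.add((start + cut_offset) % n)
        start = wrapped.find(site, start + 1)
    if not cuts:
        return [n]  # uncut circle
    ordered = sorted(cuts)
    fragments = []
    for i, cut in enumerate(ordered):
        nxt = ordered[(i + 1) % len(ordered)]
        # A single cut linearizes the circle into one full-length fragment.
        fragments.append((nxt - cut) % n or n)
    return sorted(fragments, reverse=True)


# Toy 30-bp circle with two SmaI sites, and a mutant lacking "GG" in the second site.
TOY = "CCCGGG" + "AAAA" + "CCCGGG" + "T" * 14
MUTANT = "CCCGGG" + "AAAA" + "CCCG" + "T" * 14

print(circular_digest(TOY, "CCCGGG", 3))
print(circular_digest(MUTANT, "CCCGGG", 3))
```

Comparing the two fragment lists gives the same kind of evidence used in the text: an assembly that lacks a site predicts fewer, longer fragments than the gel shows.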
Sequence Polymorphisms
The chloroplast genome is believed to be clonal and has its own replication and DNA repair systems. A given cell, such as that of the plant leaf, often contains 400 to 1,600 copies of the chloroplast genome (Pyke, 1999). In this study, we did not detect any intervarietal polymorphisms between the PA64S and Nipponbare-G sequences. The PA64S chloroplast genome is typical of a japonica variety, as predicted from its breeding genealogy, and does not appear to have diverged from its common japonica ancestor. The alignments between 93-11 and PA64S/Nipponbare-G indicated that 93-11 is a typical indica variety, in agreement with its recorded breeding history. Therefore, the polymorphisms we identified can be regarded as examples of intersubspecific (indica and japonica) polymorphisms.
A total of 72 SNPs, including 30 transitions and 42 transversions, were identified between the 93-11 and PA64S chloroplast genomes. The frequencies of major and minor genotypes at each polymorphic site were categorized (Table I and Supplemental Table I). In general, SNPs in the chloroplast genome occurred at a rate of 5 in 10,000 bases, which is about 8 times lower than that in its nuclear genome (estimated as 0.43%;Yu et al., 2002
Among the SNPs between the 93-11 and PA64S chloroplast genomes, only 15 SNPs (about 21%) did not change the GC content, of which 2 SNPs were transversions between G and C, and the other 57 SNPs were related to GC content changes. There were 7 variable sites that involved simple repetitive sequences. Four of them were reverse-complemented sequences, from AGACCAAG, CGTT, TTT, and AAA to CTTGGTCT, AACG, AAA, and TTT, respectively. The number of SNPs in intergenic regions (55 SNPs) was roughly three times that in gene-coding regions (17 SNPs), and several in the gene-coding regions have produced amino acid changes (Table II). Only one hotspot region (GCTT/AAGC) was detected in ORF321 between 93-11 and PA64S, resulting in one amino acid change, from Leu to Ser. Therefore, it is expected that SNPs between intersubspecific chloroplast sequences may not give rise to significant functional changes among different rice varieties.
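The SNP bookkeeping in the preceding paragraphs (transition versus transversion, and whether a substitution alters GC content) follows simple rules that can be sketched in Python. The function names and the toy SNP list below are hypothetical and are not taken from the paper's tables. Note that every transition (A/G or C/T) changes GC content, so the GC-neutral substitutions can only be the A/T and G/C transversions, which agrees with the 15 GC-neutral SNPs reported above.

```python
BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
GC_BASES = {"G", "C"}


def classify_snp(ref, alt):
    """Return (kind, changes_gc) for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid SNP: {ref!r} -> {alt!r}")
    pair = {ref, alt}
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    is_transition = pair <= PURINES or pair <= PYRIMIDINES
    kind = "transition" if is_transition else "transversion"
    # GC content changes when exactly one of the two bases is G or C.
    changes_gc = (ref in GC_BASES) != (alt in GC_BASES)
    return kind, changes_gc


def transversion_transition_ratio(snps):
    """Ratio of transversions to transitions for an iterable of (ref, alt)."""
    kinds = [classify_snp(ref, alt)[0] for ref, alt in snps]
    transitions = kinds.count("transition")
    if transitions == 0:
        raise ValueError("no transitions in input")
    return kinds.count("transversion") / transitions


# Hypothetical example list, not the paper's data.
toy_snps = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C"), ("A", "C")]
for ref, alt in toy_snps:
    print(ref, alt, classify_snp(ref, alt))
print(transversion_transition_ratio(toy_snps))
```

Applied to the counts reported in the text (42 transversions and 30 transitions), the same ratio is 42/30 = 1.4.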
Since chloroplast genomes of PA64S and Nipponbare-G are identical with regard to intervarietal polymorphisms, we carefully inspected the intravarietal changes. Almost all of the minor genotypes in PA64S and Nipponbare-G were also found to be the major genotypes in 93-11 assembly or vice versa (Table I and Supplemental Table I). In almost all the cases, we found more intravarietal than intersubspecific SNPs, indicating that only some of intravarietal mutations were fixed gradually and inherited stably among different rice subspecies (Fig. 1). One exceptional site was noted, located at 51,349 bp (positioned in the PA64S sequence). In PA64S, among 142 sequencing reads covering this locus, the major genotype was T (82%) and the minor was C (18%), but this minor genotype (C) was not found in the Nipponbare-G assembly among the 234 sequences we carefully surveyed. One potentially important observation is that the minor genotype frequency at each SNP site in 93-11 is nearly twice as high as that in PA64S and Nipponbare-G (Table I). We are not able to generalize these findings to other indica varieties at the present time but speculate that the indica chloroplast population may be more polymorphic than those of japonica.
InDels among rice chloroplast genomes are quite limited in number compared to SNPs. Only 27 InDels in the 93-11-to-PA64S comparisons were detected. Frequencies of the major and minor genotypes in different varieties are summarized in Table III. The cumulative length differential attributable to the InDels is 55 bp, which is consistent with the total difference in length between the 93-11 and PA64S assemblies. In general, the InDel rate calculated from the 93-11-to-PA64S comparison is about 0.02%, nearly one-half of the SNP rate of chloroplast genomes (0.05%) and about 10 times lower than that of its nuclear counterpart (a genome average of 0.23%; Yu et al., 2002
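As a quick arithmetic check of the rates quoted in this section, the following Python sketch recomputes them from the reported counts. The choice of the 93-11 assembly length (134,496 bp) as the denominator is an assumption made here for illustration; the text does not state which length was used, and the function name is hypothetical.

```python
def polymorphism_rate_percent(n_sites, genome_length_bp):
    """Polymorphic sites per base, expressed as a percentage."""
    return 100.0 * n_sites / genome_length_bp


GENOME_LENGTH_BP = 134_496  # 93-11 assembly; assumed denominator
NUCLEAR_SNP_RATE = 0.43     # percent, as cited in the text
NUCLEAR_INDEL_RATE = 0.23   # percent, as cited in the text

snp_rate = polymorphism_rate_percent(72, GENOME_LENGTH_BP)
indel_rate = polymorphism_rate_percent(27, GENOME_LENGTH_BP)

print(f"SNP rate:   {snp_rate:.3f}%")
print(f"InDel rate: {indel_rate:.3f}%")
print(f"nuclear/chloroplast SNP ratio:   {NUCLEAR_SNP_RATE / snp_rate:.1f}")
print(f"nuclear/chloroplast InDel ratio: {NUCLEAR_INDEL_RATE / indel_rate:.1f}")
```

Under this assumption the rates round to 0.05% and 0.02%, in line with the figures given in the text.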
Experimental Confirmation of Sequence Polymorphisms between Indica and Japonica
Although sequence redundancies (about 250 times the coverage of the total chloroplast genome length) guaranteed high-quality assemblies, we nevertheless conducted experiments to confirm several of the observed variations. Primers were designed (see "Materials and Methods") to amplify the two InDels, D-69 and I-32 (Supplemental Fig. 2), and one hotspot region (located at 62,474 bp of the 93-11 assembly) that harbors the nucleotide changes from AGACCAAG in 93-11 to CTTGGTCT in PA64S. The amplified DNA fragments were subsequently sequenced, and all were verified at the sequence level (Fig. 2). The results demonstrated that the two InDels are duplications or deletions of low complexity sequences or simple repeats. The mutation hotspot was also verified as a reverse-complement variation. Among the polymorphisms between the two subspecific chloroplast genomes identified by us, only D-69 was reported previously (Kanno and Hirai, 1993
We attempted to validate some of the InDels found between Nipponbare-H and other japonica varieties. A pair of primers was designed to test the InDel I-15, which represents a 15-bp deletion (CGAATTCCTATAGTA) located at position 53,857 bp in the Nipponbare-H sequence. This deletion, as well as three other predicted variations, was not found in the indica or japonica varieties used for our experiment, including cultivars of Nipponbare (Fig. 2c and Fig. 3). Exhaustive searches for such a 15-bp deletion among all available raw data traces, amounting to over 550 times sequence redundancy, did not yield a single relevant sequence either (data not shown).
Segregation analysis of the InDels D-69 and I-32 in the F2 populations from the cross combination of PA64S and 93-11 showed that all individual plants in the F2 generation having PA64S as the maternal parent had the same PCR product band as PA64S, further validating our sequence assemblies (Supplemental Fig. 3). In order to study the distribution of the two large InDels, D-69 and I-32, 27 different cultivars from indica and japonica subspecies were surveyed. The result showed that these InDels were common polymorphisms between indica and japonica subspecies, with only one exception occurring in an indica variety (Supplemental Fig. 4). These polymorphisms were absent in only 1 out of 35 japonica varieties (Supplemental Fig. 5) and 7 out of 27 indica varieties (Supplemental Fig. 6). This result is in accordance with previous reports (Dally and Second, 1990
While it is well known that chloroplast genomes in plant leaf cells are not absolutely homogeneous, rigorous confirmation of the nature of their differences requires cloning and sequencing. We have defined a way to study such intravarietal polymorphisms, and we have demonstrated that minor genotypes are detectable at frequencies ranging from a few percent to a few tens of percent. It is noteworthy that a few polymorphic sites were found to have more than one minor genotype. For example, at the polymorphic site located at 51,292 bp in the 93-11 chloroplast genome, among 198 reads surveyed, the major genotype is A (85%), and the two minor genotypes are T (5%) and G (10%). In addition, each major genotype detected in the intersubspecific chloroplast genome comparisons, such as between 93-11 and PA64S, could also be identified as either a major or a minor genotype at the corresponding polymorphic site among the intravarietal variations. Often, the intravarietal minor genotype in one subspecies would be observed as a major genotype in the other subspecies, or vice versa. This finding suggests that the minor genotypes are chloroplast in origin rather than the result of nuclear or mitochondrial DNA contamination. For example, one chloroplast genome polymorphic site with A in 93-11 and T in PA64S is found in 93-11 with A as the major genotype and T as a minor genotype. For PA64S, T is the major genotype and A is a minor genotype. Therefore, the frequency at which a minor genotype is detected at a given polymorphic site provides a useful statistical basis for comparing sequence variations among the chloroplast genomes.
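The major and minor genotype frequencies discussed above amount to counting the base calls of all reads covering one site. The Python sketch below shows one way to do that; the function name is hypothetical, and the read counts (168 A, 20 G, 10 T) are illustrative values chosen only to approximate the 85%, 10%, and 5% reported for the site at 51,292 bp. They are not the paper's raw data.

```python
from collections import Counter

BASES = set("ACGT")


def genotype_frequencies(base_calls):
    """Return [(base, count, fraction), ...] sorted from major to minor.

    base_calls -- iterable of single-base calls, one per read covering the site;
                  calls other than A, C, G, or T (e.g. "N") are ignored.
    """
    counts = Counter(b.upper() for b in base_calls if b.upper() in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable base calls at this site")
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(base, n, n / total) for base, n in ranked]


# Hypothetical pileup of 198 reads at one polymorphic site.
pileup = ["A"] * 168 + ["G"] * 20 + ["T"] * 10
for base, n, fraction in genotype_frequencies(pileup):
    print(f"{base}: {n} reads ({fraction:.1%})")
```

The first entry of the returned list is the major genotype; any remaining entries are minor genotypes whose fractions can be compared across varieties.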
The sequence polymorphisms among multiple copies of the chloroplast genomes are inherited maternally as an intravarietal population, but the inheritability of the resultant chloroplast mutations is different from that of endoreduplication, which frequently occurs in certain somatic cells of rapidly growing plant tissues (Joubes and Chevalier, 2000
To evaluate the role of inter-genomic gene transfer between the organelle and the nuclear genomes, we surveyed all publicly available rice genomic sequences for the presence of any major and minor genotype sequences found in this study. Of the 99 polymorphisms (including 72 SNPs and 27 InDels) discovered as intersubspecific chloroplast DNA variations, eleven SNPs and seven InDels did not have matching variable sites in the homologous nuclear counterparts. This result suggests that most of the inter-subspecific SNPs and some of the InDels may be quite old, or, alternatively, that recent transfer events between the chloroplast and nuclear genomes have occurred. The chloroplast homologous sequences are nevertheless easily identifiable from the surrounding sequences and higher variation rate coexisting in the nuclear genome. In a previous study of the chloroplast homologous sequences in rice mitochondrial genome, it was reported that a total of sixteen chloroplast sequences (about 22 kb), ranging from 32 bases to 6.8 kb in length were dispersed throughout the mitochondrial genome (490.570 kb; Nakazono and Hirai, 1993
Most of the InDels in chloroplast genomes between two rice subspecies exist as short and simple repetitive sequences in noncoding regions, which, therefore, may not have functional consequences. Furthermore, only a few SNPs in coding sequences cause amino acid changes in the chloroplast encoded proteins. The rate of transversion versus transition in intersubspecific SNPs is different from that in intravarietal SNPs. The rate of transversion versus transition in the intersubspecific SNPs between 93-11 and PA64S is 1.4, and a similar rate is found between chloroplast genomes 93-11 and Nipponbare-G. On the other hand, the rate of transversion versus transition in the 205 intravarietal SNPs of the Nipponbare-G is only 0.4, clearly biased toward transitions and consistent with previously reported results (Alain et al., 2002
Chloroplast genomes diverge at a much different rate than their nuclear genomes. The overall sequence difference between rice subspecific varieties in the nuclear genomes is about 130 times higher than that of the chloroplast (0.12%; Yu et al., 2002
Sequence Assembly and Analysis
High quality sequencing reads were extracted from our whole genome shotgun sequence repository (continual nucleotide length more than 50 bp at Phred value Q20; http://www.genomics.org.cn; Yu et al., 2002
93-11 is a typical Oryza sativa indica cultivar that was bred at the Jiangsu Academy of Agricultural Sciences, China (Dai et al., 1997). The japonica varieties used for the experiments are Taibei309, LiJiangXinTuanHeiGu, Lemont, ShenNong1033, DV10, CBB7, DV85, ZhongHua8, C418, JiangNanXiangNuo, E32, DiGu, JingXi17, MiYang46, 02428, Yongjing36, Qiuguang, Wanhui31, Yongjing27, Nongken58, Jiejieqing, Qihongqing, Jin1244-2, Ji86-11, Jia64, Chujing23xuan, Chunjiangnuo3, Huangjinqing, Heizhong, and Manhonggu. The indica varieties are TeQing, MingHui63, NanJing6, XiaoQingZhan, MoLiZhan, ZhaiYeQing8, Gui630, IR24, IRBB5, B6532-MR-2-5-1, B6582F-MR-14-1-2-3, 1nongzhu, Pin9501, Paozhugu4, Qingzhen8, Hao'an2, Luweidao2, Guisi, Yigenmiao, Dalidao, Youzhidao2, Luweidao, Erkuaigu3, Haonuo2, and Wangdao1. The three javanica varieties are SR3, C bao, and Dular.
Extraction of chloroplast DNA and restriction digestion were carried out according to published protocols (McCouth et al., 1988
The sequences of the three primer pairs for the two larger InDels (D-69 and I-32) and one mutation hotspot region (or highly variable region; Ogihara et al., 2002
Sequence data from this article have been deposited with the GenBank data library under accession numbers AY522330, AY522331, and AY522329.
We are grateful to Dr. Qian Qian (Chinese National Center for Rice Improvement) and Syngenta Company (http://www.tmri.org) for kindly supplying the rice materials and the sequencing reads of Nipponbare, respectively. We thank Drs. Gwendolyn Zahner and Lin Wu for critical reading of the manuscript.
Received August 1, 2003; returned for revision January 28, 2004; accepted February 10, 2004.
1 This work was supported by project grants from the Chinese Academy of Sciences to J.Y. and H.Y. and by grants from the National Natural Science Foundation of China (90208001) and the Chinese Academy of Sciences (KSCX2SW306) to L.Z.
2 These authors contributed equally to the paper.
[w] The online version of this article contains Web-only data. Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.103.031245. * Corresponding author; e-mail lhzhu{at}genetics.ac.cn; fax 861064873428.
Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al (2000) The genome sequence of Drosophila melanogaster. Science 287: 2185-2195
Alain V, Denis M, Magali SC, Andrè E (2002) A review on SNP and other types of molecular markers and their use in animal genetics. Genet Sel Evol 34: 275-305
Carle GF, Frank M, Olson MV (1986) Electrophoretic separations of large DNA molecules by periodic inversion of the electric field. Science 232: 65-68
Dai ZY, Zhao BH, Liu XJ (1997) A new medium indica variety with fine quality, high yield and muti-disease resistance. Jiangsu Agricultural Sciences 1: 13-14
Dally AM, Second G (1990) Chloroplast DNA diversity in wild and cultivated species of rice (genus Oryza, Section Oryza). Cladistic-mutation and genetic-distance analysis. Theor Appl Genet 80: 209-222
De Las Rivas J, Lozano JJ, Ortiz AR (2002) Comparative analysis of chloroplast genomes: functional annotation, genome-based phylogeny, and deduced evolutionary patterns. Genome Res 12: 567-583
Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186-194
Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100
Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. Genome Res 8: 195-202
Gordon D, Desmarais C, Green P (2001) Automated finishing with autofinish. Genome Res 11: 614-625
Hiratsuka J, Shimada H, Whittier R, Ishibashi T, Sakamoto M, Mori M, Kondo C, Honji Y, Sun CR, Meng BY (1989) The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol Gen Genet 217: 185-194
Joubes J, Chevalier C (2000) Endoreduplication in higher plants. Plant Mol Biol 43: 735-745
Kanno A, Hirai A (1993) A transcription map of the chloroplast genome from rice (Oryza sativa). Curr Genet 23: 166-174
Kato T, Kaneko T, Sato S, Nakamura Y, Tabata S (2000) Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res 7: 323-330
Kohler S, Delwiche CF, Denny PW, Tilney LG, Webster P, Wilson RJ, Palmer JD, Roos DS (1997) A plastid of probable green algal origin in Apicomplexan parasites. Science 275: 1485-1489
McCouth SR, Kochert G, Yu ZH (1988) Molecular mapping of rice chromosomes. Theor Appl Genet 76: 815-829
Morton BR, Clegg MT (1993) A chloroplast DNA mutational hotspot and gene conversion in a noncoding region near rbcL in the grass family (Poaceae). Curr Genet 24: 357-365
Muse SV (2000) Examining rates and patterns of nucleotide substitution in plants. Plant Mol Biol 42: 25-43
Nakazono M, Hirai A (1993) Identification of the entire set of transferred chloroplast DNA sequences in the mitochondrial genome of rice. Mol Gen Genet 236: 341-346
Ogihara Y, Isono K, Kojima T, Endo A, Hanaoka M, Shiina T, Terachi T, Utsugi S, Murata M, Mori N, et al (2002) Structural features of a wheat plastome as revealed by complete sequencing of chloroplast DNA. Mol Genet Genomics 266: 740-746
Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, et al (1986) Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322: 572-574
Palmer JD (1985) Comparative organization of chloroplast genomes. Annu Rev Genet 19: 325-354
Pyke KA (1999) Plastid division and development. Plant Cell 11: 549-556
Shimada H, Sugiura M (1991) Fine structural features of the chloroplast genome: comparison of the sequenced chloroplast genomes. Nucleic Acids Res 19: 983-995
Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K, et al (1986) The complete nucleotide sequence of tobacco chloroplast genome: its gene organization and expression. EMBO J 5: 2043-2049
Sugiura M (1989) The chloroplast chromosomes in land plants. Annu Rev Cell Biol 5: 51-70
Sun CQ, Wang K, Yoshimura A, Doi K (2002) Genetic differentiation for nuclear, mitochondrial and chloroplast genomes in common wild rice (Oryza rufipogon Griff.) and cultivated rice (Oryza sativa L.). Theor Appl Genet 104: 1335-1345
Triboush SO, Danilenko NG, Davydenko OG (1998) A method for isolation of chloroplast DNA and mitochondrial DNA from sunflower. Plant Mol Biol Rep 16: 183-189
Turmel M, Otis C, Lemieux C (1999) The complete chloroplast DNA sequence of the green alga Nephroselmis olivacea: insights into the architecture of ancestral chloroplast genomes. Proc Natl Acad Sci USA 96: 10248-10253
Wang J, Wong GK, Ni P, Han Y, Huang X, Zhang J, Ye C, Zhang Y, Hu J, Zhang K, et al (2002) RePS: a sequence assembler that masks exact repeats identified from the shotgun data. Genome Res 12: 824-831
Wolfe KH, Li WH, Sharp PM (1987) Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci USA 84: 9054-9058
Yi JZ, Xiao WZ (2000) The production technology of the Liang-You-Pei-Jiu (LYPJ). Hybrid Rice 1: 76-77
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al (2001) A draft sequence of the rice (Oryza sativa ssp. indica) genome. Chin Sci Bull 46: 1937-1942
Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92