First published online October 8, 2008; 10.1104/pp.108.127902 Plant Physiology 148:1740-1759 (2008) © 2008 American Society of Plant Biologists OPEN ACCESS ARTICLE
Differential Accumulation of Retroelements and Diversification of NB-LRR Disease Resistance Genes in Duplicated Regions following Polyploidy in the Ancestor of Soybean
Department of Biology, Indiana University, Bloomington, Indiana 47405 (R.W.I., T.A., A.D., S.H., S.M.d.C., M.M., R.P., A.W.); Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108 (C.A.-T., E.C., S.B.C., B.C., R.D., N.D.Y.); Virtual Reality Application Center, Iowa State University, Ames, Iowa 50011 (E.C.); United States Department of Agriculture-Agricultural Research Service and Department of Agronomy, Iowa State University, Ames, Iowa 50011 (S.B.C.); Institut de Biotechnologie des Plantes, UMR CNRS 8618, INRA, Université Paris Sud, 91 405 Orsay, France (N.W.G.C., M.S., V.T., V.G.); Genoscope/Commissariat à l'Energie Atomique-Centre National de Séquençage, 91 057 Evry, France (A.C., S.S., B.S.); Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019 (S.D., H.L., M.O., I.S., J.Y., B.A.R.); L.H. Bailey Hortorium, Department of Plant Biology, Cornell University, Ithaca, New York 14853 (A.N.E., D.I., B.E.P., S.S.-B., J.J.D.); Department of Crop and Soil Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061 (N.G., J.M., A.N., M.B.R., D.M.T., M.A.S.M.); Department of Agronomy, Purdue University, West Lafayette, Indiana 47907 (C.S.H., S.J., J.W.); Commonwealth Scientific and Industrial Research Organization Plant Industry, Canberra, Australian Capital Territory 2601, Australia (B.E.P.); and Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211 (M.B.R.)
The genomes of most, if not all, flowering plants have undergone whole genome duplication events during their evolution. The impact of such polyploidy events is poorly understood, as is the fate of most duplicated genes. We sequenced an approximately 1 million-bp region in soybean (Glycine max) centered on the Rpg1-b disease resistance gene and compared this region with a region duplicated 10 to 14 million years ago. These two regions were also compared with homologous regions in several related legume species (a second soybean genotype, Glycine tomentella, Phaseolus vulgaris, and Medicago truncatula), which enabled us to determine how each of the duplicated regions (homoeologues) in soybean has changed following polyploidy. The biggest change was in retroelement content, with homoeologue 2 having expanded to 3-fold the size of homoeologue 1. Despite this accumulation of retroelements, over 77% of the duplicated low-copy genes have been retained in the same order and appear to be functional. This finding contrasts with recent analyses of the maize (Zea mays) genome, in which only about one-third of duplicated genes appear to have been retained over a similar time period. Fluorescent in situ hybridization revealed that the homoeologue 2 region is located very near a centromere. Thus, pericentromeric localization, per se, does not result in a high rate of gene inactivation, despite greatly accelerated retrotransposon accumulation. In contrast to low-copy genes, nucleotide-binding-leucine-rich repeat disease resistance gene clusters have undergone dramatic species/homoeologue-specific duplications and losses, with some evidence for partitioning of subfamilies between homoeologues.
The comparative approach to studying genes and genomes is a powerful method for addressing both fundamental and applied questions in genome evolution (Paterson, 2006
Soybean is an attractive choice for genome evolution studies because it is a major food crop, it is a legume (a large and diverse plant family that is both ecologically and economically important; Doyle and Luckow, 2003
The soybean genome has undergone at least two rounds of whole genome duplication, one estimated to have occurred 10 to 14 million years ago (mya) and a second more ancient event estimated to have occurred 50 to 60 mya (Shoemaker et al., 2006
One of our specific goals was to assess the impact of polyploidy on the evolution of disease resistance genes, which are among the most rapidly evolving and polymorphic genes known in plants (Meyers et al., 1999 These comparisons allowed us to address several fundamental questions relating to NB-LRRs, polyploidy, and genome evolution in legumes. Do rates of gene evolution vary between homoeologous regions? To what extent have genome rearrangements occurred in these homoeologous segments? Does one member of a homoeologous chromosome pair rearrange preferentially, or are the rearrangements evenly distributed? Are some duplicated copies of particular gene classes (e.g. NB-LRRs) preferentially lost following polyploidy? Do NB-LRRs behave differently from other clustered gene families? What impact have retrotransposons had on the evolution of this region, and do homoeologues differ in this regard?
Assembly of Bacterial Artificial Chromosome Contigs for Sequencing
We assembled and sequenced an approximately 1-Mb bacterial artificial chromosome (BAC) contig from soybean cv Williams 82 centered on the Rpg1-b gene (Supplemental Fig. S1; see "Materials and Methods"; Ashfield et al., 1998 Supplemental Figure S1 shows the specific BAC clones that were selected for sequencing. Assembly of these BAC sequences into supercontigs resulted in 553,148 bp of unique sequence in the left contig and 453,942 bp in the right contig. Combined with the gap sequence from the WGS data, this represented 1,064,642 bp in total. This megabase region served as our reference sequence for identifying the homoeologous region(s) in cv Williams 82, both homoeologues in soybean line PI96983 and G. tomentella accession G1403, and the single orthologous region in P. vulgaris accession G19833 and M. truncatula var Jemalong.
To identify these homologous regions, we screened BAC libraries using DNA hybridization probes derived from low-copy protein-coding genes identified in the Williams 82 sequence (Supplemental Table S1; Supplemental Fig. S1). BAC clones that hybridized to two or more probes were then fingerprinted and end sequenced. A combination of fingerprint information, probe hybridization patterns, and end sequence information was used to assemble contigs and identify a minimum tiling path for sequencing. Supplemental Figure S1 shows a physical map of all of the BACs selected for sequencing and the probes that hybridized to each. Note that we identified BAC clones in soybean cv Williams 82 that appear to represent homoeologous regions derived from two different whole genome duplication events (Fig. 1; see below; Shoemaker et al., 2006 Based on probe hybridization patterns, we obtained good coverage of H2 in Williams 82 and the orthologous and homoeologous regions in the other taxa; however, there remained a number of gaps in most BAC contigs (Supplemental Fig. S1). For soybean H2, this was likely due to expansion by retroelement insertions (see below), making it difficult to identify individual BAC clones containing genes homologous to two or more H1 probes. For the other taxa, gaps may be due to the lower depth of the BAC libraries screened (average of eight genome equivalents), the presence of regions that are unstable in E. coli, and/or genomic rearrangements relative to the reference Williams 82 sequence.
To address whether genomic rearrangements were a possible cause of the gaps in the soybean H2 contig, we genetically mapped several BAC clones that spanned H2 from Williams 82 using microsatellite markers (Akkaya et al., 1995 The soybean Williams 82 H2 supercontig (the combination of contigs, individual BACs, and gaps corresponding to this region) contained two gaps (Supplemental Fig. S1). To determine the size of these gaps, we again compared our sequence with the 7x WGS sequence. All three BAC contigs from the Williams 82 H2 region were contained within 7x scaffold 55 of the WGS sequence, confirming our genetic mapping data. To our surprise, both gaps were very large (gap 1 = 819,550 bp and gap 2 = 835,706 bp). This analysis also revealed that the homology between the H1 contig and the WGS 7x scaffold 55 sequence extended 177,494 bp on the left flank of the H2 supercontig and 730,000 bp on the right flank. Thus, the 1,064,642-bp H1 region corresponds to a region of 3,434,337 bp in H2, raising the question of whether H1 had lost DNA or H2 had gained DNA.
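The size comparison above is simple bookkeeping on the reported lengths. As a minimal sketch, assuming the 3,434,337-bp H2 total is the plain sum of the left flank, the H2 supercontig (sequenced BACs plus both gaps), and the right flank, the H2/H1 size ratio and the implied amount of sequenced H2 BAC DNA follow directly; the function and variable names are illustrative only.

```python
# Lengths (bp) reported in the text for soybean cv Williams 82.
H1_REGION = 1_064_642
H2_REGION = 3_434_337
GAPS = (819_550, 835_706)    # gap 1 and gap 2 in the H2 supercontig
FLANKS = (177_494, 730_000)  # homology to H1 beyond the left and right ends


def h2_bookkeeping(h1, h2, gaps, flanks):
    """Return the H2/H1 size ratio and the H2 length not covered by gaps or flanks."""
    ratio = h2 / h1
    # Assumption: h2 = left flank + sequenced BACs + gaps + right flank.
    bac_sequence = h2 - sum(gaps) - sum(flanks)
    return ratio, bac_sequence


ratio, bac_bp = h2_bookkeeping(H1_REGION, H2_REGION, GAPS, FLANKS)
print(f"H2 is {ratio:.2f}x the size of H1; implied BAC sequence: {bac_bp:,} bp")
```

The ratio of about 3.2 corresponds to the roughly 3-fold expansion of homoeologue 2 described in the abstract.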
To determine the origin of the differences in sequence content between soybean homoeologues H1 and H2, we annotated both sequences and compared their gene contents with the homologous region from Phaseolus. BAC sequences were annotated using a semiautomated approach to identify both protein-coding genes and repetitive elements (see "Materials and Methods"). We then aligned BAC contigs based on positions of conserved low-copy genes. Figure 2A and Supplemental Figure S2A show an alignment between homoeologues 1 and 2 of soybean cv Williams 82 and Phaseolus. The comparison with Phaseolus enabled us to infer whether differences between H2 and H1 represented losses or insertions, as any genes shared between Phaseolus and one or both Glycine homoeologues presumably were present in their most recent common ancestor. This alignment revealed that H2 contains many more retroelement insertions than either H1 or Phaseolus, and as a consequence, the low-copy genes have been spread apart in H2. The degree of expansion is not constant along H2, with some regions affected more than others (Fig. 2A). This expansion explains our failure to identify BAC clones that spanned gap 1 and gap 2 in our H2 contig assembly, as our initial criteria for identifying homoeologous BACs required that they contain at least two low-copy genes found on H1.
Neither Phaseolus nor soybean H1 contained the high retrotransposon content observed in soybean H2, suggesting that retrotransposons have accumulated in H2 in the 10 to 14 million years since the divergence of H1 and H2 from their common ancestor. Consistent with this hypothesis, a similar retroelement-mediated expansion of H2 was also observed in G. tomentella (Fig. 2B; Supplemental Fig. S2B); thus, the propensity for H2 to accumulate retroelement insertions was conditioned prior to the separation of soybean and G. tomentella, which occurred 5 to 7 mya (Fig. 1). A diverse collection of retroelements was found in H2 of both soybean and G. tomentella, including copia-like and gypsy-like long terminal repeat (LTR) retrotransposons, as well as LINE elements (Wawrzynski et al., 2008
In both plants and animals, centromeric and pericentromeric regions of the genome are enriched in repetitive elements, including retroelements (Lin et al., 2005
Given the rapid expansion of retrotransposon content in H2 of soybean, we asked whether this would correlate with a more rapid loss of genes duplicated by polyploidy. To estimate gene loss, we first identified all non-NB-LRR genes present in either H1 or H2 that were also present in a syntenic position in Phaseolus (connected by lines in Fig. 2A and Supplemental Fig. S2A), which defined a minimal set of genes present in this region in the ancestral chromosome of H1 and H2. Of these 35 genes, 27 were present in H2, indicating that H2 has lost 23% (8 of 35) of these ancestral non-NB-LRR genes following polyploidy. By contrast, all 35 genes were retained on H1, indicating that there has been little gene loss from H1 following polyploidy. Thus, gene loss has been strongly biased toward H2, which may be related to the increased retrotransposon activity in this region. Note that there are 13 predicted low-copy genes present only on H1 and over 70 predicted low-copy genes present only on H2 and not in Phaseolus (Supplemental Fig. S2A). We speculate that these represent genes or gene fragments that were transposed into these positions by retroelements or other mobile DNAs. In support of this speculation, the majority of these predicted genes contain open reading frames shorter than 500 bp and are not present in the soybean EST collection; thus, they are unlikely to be functional. We also identified three genes that are conserved in H1 and H2 but are missing from the syntenic position in Phaseolus (indicated by dotted lines in Fig. 2A and Supplemental Fig. S2A). These genes presumably were lost from Phaseolus after Phaseolus and Glycine diverged or, alternatively, they were inserted after this time point and before the divergence of H1 and H2.
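The loss estimate treats the 35 Phaseolus-anchored genes as the minimal ancestral set, so the percentages reduce to a ratio per homoeologue. A minimal sketch using the counts from the text; the function name is illustrative.

```python
def retention(ancestral, retained):
    """Percentages of an ancestral gene set retained and lost on one homoeologue."""
    kept = 100 * retained / ancestral
    return kept, 100 - kept


h1_kept, h1_lost = retention(35, 35)  # all 35 ancestral genes on H1
h2_kept, h2_lost = retention(35, 27)  # 27 of 35 on H2
print(f"H1: {h1_kept:.1f}% retained; H2: {h2_kept:.1f}% retained, {h2_lost:.1f}% lost")
```

The H2 figure (about 77% retained, 23% lost) is the same count that underlies the "over 77%" retention quoted in the abstract.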
To confirm that H1 and H2 were indeed derived from the most recent whole genome duplication event, we analyzed nucleotide substitution rates at silent sites (Ks) for 15 low-copy genes spread over the entire aligned region (indicated by letters A–O in Fig. 2A; described in Supplemental Table S3). When averaged over multiple homoeologous gene pairs, such Ks analyses provide an approximate age of genome duplication events (Lynch and Conery, 2000
The variation in Ks values between individual gene pairs implies that substitution rates for individual genes vary considerably, even for genes in the same genomic region. To test whether gene conversion might account for this variation, and to determine whether the two homoeologues were undergoing substitutions at the same rate, we compared each homoeologue pair with its common orthologue in Phaseolus. Ks analysis revealed nearly identical substitution rates for H1 and H2, with mean Ks values of 0.260 ± 0.082 for the H1:Phaseolus comparison and 0.256 ± 0.067 for the H2:Phaseolus comparison (n = 13 genes for both; Table I; Supplemental Table S2). To rule out gene conversion events between homoeologues, which would reduce the apparent differences between H1 and H2, we calculated the ratio of the H1:H2 Ks value to the H1:Phaseolus Ks value for each low-copy gene. In the absence of gene conversion, this ratio should be roughly the same for all genes, assuming relatively constant substitution rates for individual genes subsequent to the divergence between soybean and Phaseolus. A gene conversion event between H1 and H2 would artificially lower the Ks value for the H1:H2 comparison relative to the H1:Phaseolus comparison. As shown in Supplemental Table S2, 11 of 13 genes showed similar ratios (range, 0.36–0.62), indicating that gene conversion is not a cause of the gene-to-gene variation in substitution rates. Only gene H showed an unusually low ratio (0.19), while gene D showed a high ratio (0.90). The mean ratio for all 13 genes was 0.49 ± 0.16, consistent with the estimated divergence dates of H1 and H2 (approximately 10 mya) compared with Phaseolus and Glycine (approximately 20 mya). 
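The gene conversion screen reduces to one ratio per gene, Ks(H1:H2)/Ks(H1:Phaseolus), compared against the other genes: conversion between homoeologues lowers the numerator only, so a converted gene should stand out with a low ratio. A minimal sketch; the fold band around the median used to flag outliers is our assumption rather than a criterion stated in the text, and the Ks values below are hypothetical, not those in Supplemental Table S2.

```python
from statistics import median


def conversion_screen(ks_h1_h2, ks_h1_out, fold=1.5):
    """Return per-gene Ks(H1:H2)/Ks(H1:outgroup) ratios plus genes outside
    a fold band around the median ratio (the band width is an assumption)."""
    ratios = {gene: ks_h1_h2[gene] / ks_h1_out[gene] for gene in ks_h1_h2}
    mid = median(ratios.values())
    # A ratio well below the rest is the signature expected from H1:H2 gene conversion.
    low = sorted(g for g, r in ratios.items() if r < mid / fold)
    # A ratio well above the rest marks H1:H2 divergence that is large relative to the outgroup.
    high = sorted(g for g, r in ratios.items() if r > mid * fold)
    return ratios, low, high


# Hypothetical Ks values for five genes (not the published data).
ks_h1_h2 = {"g1": 0.130, "g2": 0.117, "g3": 0.143, "g4": 0.050, "g5": 0.234}
ks_h1_pv = {gene: 0.260 for gene in ks_h1_h2}

ratios, low, high = conversion_screen(ks_h1_h2, ks_h1_pv)
print(low, high)
```

With these toy values, g4 plays the role of a gene like H (low ratio) and g5 that of a gene like D (high ratio).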
The fact that the homoeologues have retained nearly identical substitution rates (see phylogenetic analysis below) suggests that the gene-to-gene variation in substitution rate is a property of each gene that has been maintained after polyploidy and is not strongly influenced by proximity to the centromere or other external factors. The conservation of the majority of homoeologous gene duplicates implies that both copies of each homoeologous gene pair continue to function and are under selection (Fig. 2A). This inference was supported by analysis of nonsynonymous to synonymous nucleotide substitution ratios (Ka/Ks), which had a mean of 0.31 ± 0.19 for the 15 gene pairs compared (Supplemental Table S2), indicating that these gene duplicates remain under purifying selection. This conclusion was further supported by the identification of ESTs derived from both members of 23 of 34 homoeologous gene pairs analyzed (Supplemental Table S4; six pairs had no identifiable ESTs for either gene), demonstrating that both the H1 and H2 gene copies are transcribed. Thus, while more gene loss has occurred from H2 than from H1, proximity to a centromere and accumulation of repetitive elements on H2 have not caused the inactivation and/or loss of most low-copy genes on H2, nor have they led to pronounced diversifying selection acting on homoeologous genes.
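The Ka/Ks reasoning can be sketched as a simple classifier: ω = Ka/Ks well below 1 indicates purifying selection, near 1 indicates relaxed constraint, and above 1 indicates diversifying selection. A minimal sketch only; the ±0.1 band around 1 is an arbitrary assumption, and a formal analysis would test ω against 1 statistically rather than threshold a point estimate. The input values are hypothetical, chosen to give the reported mean of 0.31.

```python
def selection_regime(ka, ks, tol=0.1):
    """Crude label for a gene pair from its Ka/Ks ratio (tol is an arbitrary band around 1)."""
    if ks == 0:
        raise ValueError("Ks is zero; Ka/Ks is undefined")
    omega = ka / ks
    if omega < 1 - tol:
        return omega, "purifying"
    if omega > 1 + tol:
        return omega, "diversifying"
    return omega, "near-neutral"


omega, label = selection_regime(0.031, 0.100)  # hypothetical Ka and Ks
print(f"Ka/Ks = {omega:.2f} -> {label}")
```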
To evaluate the evolutionary history of low-copy genes in the sequenced regions more carefully, we estimated phylogenies for 15 low-copy genes conserved between soybean H1 and H2, indicated by letters A through O in Figure 2A and Supplemental Figure S2A (see Supplemental Table S3 for individual gene names and descriptions). Maximum parsimony (MP), maximum likelihood (ML), and Bayesian inference (BI) phylogenetic analyses were conducted on all homologues of these 15 low-copy genes (Supplemental Fig. S4). Data sets included all taxa and homoeologous segments, where possible, ranging from 6 to 13 gene sequences per analysis. Phylogenies for each gene were mostly congruent across phylogenetic methods, with the exception of genes C and D, for which MP weakly disagreed with ML and BI. All 15 gene phylogenies were consistent with the expected relationships among genes from Glycine homoeologues, Phaseolus, and Medicago (Fig. 1; Supplemental Fig. S4), thus corroborating the assignment of BACs to homoeologues in Glycine. These trees also confirmed that low-copy genes on H2 are evolving at roughly the same rate as their homoeologues on H1. Tajima relative rate tests using all sites showed that the branch lengths subtending H1 and H2 did not differ significantly for 13 of 15 low-copy genes (Kumar et al., 2004 These phylogenetic analyses also revealed lineage-specific expansion and loss of low-copy genes, both between species and between homoeologues. For example, gene M, which belongs to the root nodulin MtN21 protein family, has been tandemly duplicated in Phaseolus. This event took place after the split between Phaseolus and Glycine, as the two Phaseolus copies are positioned sister to each other in the phylogenetic tree (Supplemental Fig. S4). Similarly, gene E (encoding a member of the 12-oxophytodienoate reductase protein family) has been duplicated independently in Phaseolus and in H1 of G. tomentella (Gtd H1).
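The Tajima relative rate test compares two ingroup sequences (here an H1 and an H2 homoeologue) against an outgroup: m1 counts sites where only sequence 1 carries a unique nucleotide and m2 sites where only sequence 2 does, and under equal rates (m1 - m2)^2/(m1 + m2) follows a chi-square distribution with 1 df. A minimal sketch of the nucleotide version on prealigned sequences; skipping every gapped column is our simplification, and the toy sequences are hypothetical.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima relative rate test for two aligned ingroup sequences and an outgroup.

    m1: sites where seq1 differs while seq2 matches the outgroup.
    m2: sites where seq2 differs while seq1 matches the outgroup.
    Returns (m1, m2, chi2, p) with chi2 = (m1 - m2)^2 / (m1 + m2) on 1 df.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if "-" in (a, b, o):
            continue  # skip gapped columns
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return m1, m2, chi2, p


m1, m2, chi2, p = tajima_relative_rate("CCCAAAAAAA", "AAAAAAAAAG", "AAAAAAAAAA")
print(m1, m2, chi2, round(p, 4))
```

Sites where all three sequences differ contribute to neither count, so they do not affect the test.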
The Gtd H2 homoeologue of this gene has been mostly lost, with only a small part (16%) of the gene remaining. The presence of multiple transposable elements surrounding this gene suggests transposable element-mediated gene loss through recombination. In addition to gene loss possibly mediated by transposable element insertions, pseudogenization is evident in H2. The Gtd H2 homoeologue of gene J contains premature stop codons, suggesting a loss of gene function and transition to a pseudogene. The release of this gene from selective constraint is apparent from its inflated branch length relative to its soybean homologues (Supplemental Fig. S4), implying a higher rate of molecular evolution after speciation. This is also reflected in Ka/Ks ratios: the average Ka/Ks ratio of all pairwise gene J comparisons not involving Gtd H2 gene J (0.387 ± 0.0075) is significantly smaller (t test, P < 0.0001) than the average of comparisons involving Gtd H2 gene J (0.624 ± 0.0307), implying less selective constraint on Gtd H2 gene J.
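Detecting premature stop codons amounts to translating the coding sequence in frame and looking for stops before the final codon. A minimal sketch, assuming the input is a coding sequence already in frame from its start codon and read with the standard genetic code; frameshifts, which commonly accompany pseudogenization, would need alignment-aware handling that this sketch omits. The toy sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(cds):
    """Return 0-based codon indices of in-frame stop codons before the final codon."""
    cds = cds.upper()
    # Split into complete codons; any trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


print(premature_stops("ATGAAATAGCCCTAA"))  # → [2]
```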
Although collinearity between H1 and H2 of soybean is quite good for low-copy genes (Fig. 2B; Supplemental Fig. S2B), this is not the case for most NB-LRR genes. NB-LRR genes are commonly grouped into two major phylogenetic subclasses, those containing an N-terminal domain homologous to the Toll and Interleukin-1 receptor (TIR) and those lacking a TIR domain (non-TIR; Meyers et al., 1999 In contrast to the left half of the alignment between soybean H1 and H2, in the right half of the alignment H2 contains more non-TIR-NB-LRR genes than H1 (16 copies in H2 versus seven in H1). Although there are a few conserved low-copy genes scattered through this region, the relative positions of the non-TIR-NB-LRR genes do not appear to be conserved in relation to these low-copy genes. The lack of collinearity of the non-TIR-NB-LRRs and their differences in copy number argue for relatively frequent gene gains and losses, which is investigated further below.
Alignment of the soybean H1 and H2 regions with the orthologous regions of G. tomentella revealed a high level of conservation of low-copy genes but major differences in NB-LRR content and retroelement content as well as differences in copy number of a family of protein kinases on H1 (Fig. 2B). Low-copy gene order on G. tomentella H1 is nearly identical to that of soybean. Analysis of 15 conserved gene pairs gave a mean Ks value of 0.064 ± 0.034, consistent with a divergence time of 5 to 7 mya for these two species (Supplemental Table S2). Interestingly, there are examples of low-copy gene loss unique to G. tomentella H2 not observed in soybean H2, indicating that homoeologue-specific gene loss has continued subsequent to the divergence of these two species, albeit at a slow rate.
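The Ks-based dates rely on the molecular clock relation T = Ks/(2r), where r is the synonymous substitution rate per site per year and the factor of 2 reflects substitutions accumulating independently along both diverging lineages. A minimal sketch; the rate r = 6.1 × 10⁻⁹ is an assumption (a value commonly applied to legumes), not one stated in this section. The inputs are the soybean:G. tomentella Ks (0.064) above and the mean soybean H1:H2 Ks (0.122) reported in the comparison with H3.

```python
# Assumed synonymous substitution rate (substitutions per site per year);
# 6.1e-9 is a value often applied to legumes, used here as an assumption.
RATE = 6.1e-9


def divergence_time(ks, rate=RATE):
    """Approximate divergence time (years) from Ks, using T = Ks / (2 * rate)."""
    return ks / (2 * rate)


for label, ks in [("soybean H1:H2", 0.122), ("soybean:G. tomentella", 0.064)]:
    print(f"{label}: Ks = {ks} -> ~{divergence_time(ks) / 1e6:.1f} mya")
```

Under this assumed rate, the two comparisons fall at roughly 10 and 5 mya, in line with the 10 to 14 mya and 5 to 7 mya estimates used in the text.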
As described above, G. tomentella H2 has accumulated a large number of retroelements compared with H1, similar to what was observed in soybean. However, the precise locations of retroelement insertions in G. tomentella differ from those in soybean, with the distances between any two low-copy genes varying substantially between these two species. Consistent with this, many of the retroelements in G. tomentella have intact LTRs and represent insertions that occurred within the last 4 million years (Wawrzynski et al., 2008
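LTR retrotransposon insertions are commonly dated from the divergence between the two LTRs of a single element, which are identical at insertion: T = K/(2r), with K a corrected distance between the 5' and 3' LTRs. A minimal sketch using the Kimura two-parameter correction; the rate r = 1.3 × 10⁻⁸ substitutions per site per year is an assumption (a value widely used for plant LTR elements), not the rate used in the cited analysis, and the toy LTRs are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned LTRs.

    Columns with gaps or ambiguous bases in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age(ltr5, ltr3, rate=1.3e-8):
    """Years since insertion, T = K / (2 * rate); the rate is an assumed value."""
    return k2p_distance(ltr5, ltr3) / (2 * rate)


ltr_5prime = "A" * 100
ltr_3prime = "GG" + "A" * 98  # two A->G transitions over 100 sites
print(f"age ~ {insertion_age(ltr_5prime, ltr_3prime):,.0f} years")
```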
We also compared soybean cv Williams 82 with a second accession of soybean, PI96983, which was selected because it differs from Williams 82 functionally at several disease resistance loci mapped to the Rpg1 region (Ashfield et al., 1998
We also compared the soybean cv Williams 82 H1 and H2 sequences with the recently released M. truncatula genome sequence. Medicago diverged from Glycine about 50 mya, or about 30 mya before the Phaseolus/Glycine split (Fig. 1). A region on Medicago chromosome 8 was found to share synteny with the H1 and H2 sequences, with the similarity to H1 extending over approximately 700 kb in soybean and 900 kb in Medicago (Fig. 2D). This alignment revealed islands of well-conserved low-copy gene order but very poor conservation of NB-LRRs. Also notable is the expansion of three different gene families in Medicago: protein kinases, sulfotransferases, and signal peptidases (Fig. 2D). The complete absence of NB-LRRs from the left three-quarters of the aligned region in Medicago is particularly striking, given the presence of shared low-copy genes flanking this region. It is not clear whether the Medicago lineage lost these NB-LRRs or whether NB-LRRs were inserted in the lineage that gave rise to Phaseolus and Glycine. Significantly, of all the NB-LRRs found in the Medicago genome, the two most similar to soybean Rpg1-b are located within 400 kb of the aligned region in Medicago (Fig. 3). As discussed in more detail below, the subfamily of NB-LRRs that includes Rpg1-b is distributed over nearly 500 kb in soybean, which may account for the seeming lack of collinearity between the Medicago Rpg1-b homologues and soybean Rpg1-b. On the right side of the alignment shown in Figure 2D, there is a cluster of TIR-NB-LRRs that is located in a syntenic position in Medicago and soybean. Consistent with this syntenic position, these Medicago genes are the most closely related to these soybean TIR-NB-LRRs of any NB-LRRs in the Medicago genome (data not shown). 
Phylogenetic analysis indicates that this cluster in both Medicago and Glycine has undergone duplications subsequent to the split between Medicago and Glycine but that the common ancestor of Medicago and Glycine contained at least two TIR-NB-LRRs at this position, which gave rise to the members present in this cluster in both species today (Fig. 3, B and C).
Comparison of Soybean H3 with H1 and H2
As mentioned above, we identified a set of BACs in the Williams 82 library that contained homologues of several H1 low-copy genes but appeared to represent a much older duplication event than the divergence of homoeologues 1 and 2. To estimate the time of this duplication, we determined Ks values for all gene duplicates (seven pairs). The average Ks value for H1:H3 comparisons was 0.488 ± 0.098 (Supplemental Table S2), which compares with an average Ks value of 0.122 ± 0.035 for H1:H2 comparisons. Thus, the older duplication (H3) is roughly four times as old as the H2 duplication, which would place the duplication event approximately 40 to 50 mya. Medicago and Glycine share a presumed genome-wide duplication event estimated to have occurred 50 to 60 mya (Fig. 1; Mudge et al., 2005
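The relative dating of H3 needs no absolute rate, only the assumption that synonymous sites have evolved at a roughly constant rate in both comparisons: the ratio of mean Ks values scales the date of the recent duplication. A minimal sketch using the mean Ks values from the text; the function name is illustrative.

```python
def relative_age(ks_old, ks_recent, recent_range_mya):
    """Scale the date of a recent duplication by the ratio of mean Ks values.

    Assumes a constant synonymous substitution rate for both comparisons.
    """
    ratio = ks_old / ks_recent
    low, high = recent_range_mya
    return ratio, (low * ratio, high * ratio)


ratio, (low, high) = relative_age(0.488, 0.122, (10, 14))
print(f"H1:H3 / H1:H2 Ks ratio = {ratio:.1f}; scaled window = {low:.0f}-{high:.0f} mya")
```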
Alignment of H3 with H1 and H2 also revealed conserved synteny (Fig. 2E). Of the 35 non-NB-LRR genes found on H3 (counting tandemly duplicated carbohydrate transporters and kinases as single genes), 11 are found on H1, H2, or both. Notably, H3 contains a small cluster of non-TIR-NB-LRRs in a position roughly equivalent to the Rpg1-b cluster in H1. Because the H3 duplication predates the divergence of Medicago and Glycine (Pfeil et al., 2005
To understand the evolution of the NB-LRRs in the sequenced region better, we performed phylogenetic analyses using the NB region of each NB-LRR, which is the most highly conserved domain. We first divided the NB-LRRs into TIR and non-TIR subclasses. The NB regions from all members of each subclass from all of the taxa sequenced were aligned and then analyzed for potential recombination events (see "Materials and Methods"). Any genes with apparent recombination events in the NB region were eliminated from the phylogenetic analysis because recombination mixes different histories and leads to misleading phylogenetic inferences. Based on these analyses, we eliminated one non-TIR gene and one TIR gene that showed evidence of recombination within the NB region. The remaining 89 genes were then subjected to Bayesian analysis, and a tree was constructed for each subclass (Fig. 3). Several important conclusions can be drawn from analysis of Figure 3. First, there clearly has been recent expansion of non-TIR-NB-LRR clusters in Glycine H1, Glycine H2, and Phaseolus, as evidenced by terminal branches with multiple closely related genes from individual taxa. Although most of this recent expansion occurs as tandemly repeated genes (note clusters of similarly colored boxes in Fig. 3), there are two clear cases where recent duplications are spread over several hundred kilobases. This is best illustrated by the soybean Rpg1-b subclade on H1 (shown in purple in Fig. 3). Rpg1-b is located approximately 200 kb away from the highly similar gene W21F22_29. The latter gene is located approximately 50 kb away and in the opposite orientation from another highly similar gene, W221b6_21, which is >100 kb away from a fourth highly similar gene, W10n21_4. In addition, genes belonging to the 42i18_2 subclade (shown in blue in Fig. 3) are spread over several hundred kilobases of soybean H1. 
In both examples, there are NB-LRRs from other clades intermixed, as indicated by the differently colored boxes. How this pattern of gene duplication arose is difficult to explain, but it is unlikely to be a result of unequal crossing over between NB-LRRs, as such events should eliminate the collinearity of intervening low-copy genes when comparing soybean with G. tomentella or Phaseolus, but collinearity has been maintained. The phylogenetic and physical analyses revealed that multiple NB-LRR lineages in this region predate the split between Phaseolus and Glycine. For example, the blue, purple, and red clades in Figure 3 are all shared between Phaseolus and Glycine. Interestingly, all three clades have undergone relatively recent expansion in Phaseolus as a result of tandem duplication events. In contrast, Glycine H2 appears to have lost the purple clade completely and reduced the number of blue clade members. However, there are more members of the red clade on H2 than on H1 in soybean, suggesting that there may have been some partitioning of NB-LRR subfamilies between homoeologues subsequent to the most recent polyploidy event. The phylogenetic analysis also revealed a clade (represented in orange in Fig. 3) that appears to have recently expanded on Glycine H1 but that is absent from both Phaseolus and Glycine H2. Its absence from both Phaseolus and H2 suggests that this clade arose subsequent to the H1:H2 split or, alternatively, that it has been lost from both the Phaseolus and H2 lineages. Because this orange clade is positioned between the blue clade and the red/purple clades on the phylogenetic tree (Fig. 3), and all three of the latter clades predate the Phaseolus/Glycine split, the most likely scenario is that this lineage has been lost independently from this region in Phaseolus and Glycine H2. There also appears to have been gene loss in Medicago, particularly of the non-TIR-NB-LRR class, or at least a failure to expand. 
The Medicago NB-LRR genes most similar to Rpg1-b (Mt74f16_58 and Mt74f16_73) are located approximately 500 kb away from the aligned regions (brown arrows at the bottom of the Medicago line in Fig. 3) and group outside all of the non-TIR-NB-LRR genes in the phylogenetic tree shown in Figure 3. A BLAST search of the complete 7x WGS soybean genome sequence using these two Medicago NB-LRRs as queries failed to find any additional soybean genes that were more similar than those on H1 and H2; thus, it is unclear whether Medicago underwent any significant losses of non-TIR-NB-LRRs or whether these genes instead expanded in Glycine and Phaseolus.
To determine whether the rapid evolution of the NB-LRR families is a general property of clustered homologous genes or is confined to NB-LRRs in this region, we performed a phylogenetic analysis on two non-NB-LRR gene families that are also located within the sequenced region. The first gene set encodes a family of putative carbohydrate transporters and is represented by four tandemly repeated genes on H1 of both soybean cultivars sequenced and on H1 of G. tomentella (orange boxes in Fig. 2 and Supplemental Fig. S2). Note that in Figure 2B and Supplemental Figure S2B, G. tomentella paralogues 1 and 2 are not shown because they fall into a gap in the G. tomentella BAC contig. The presence of paralogues 1 and 2 in this position was confirmed by screening a second G. tomentella BAC library derived from a tetraploid accession, G1134, and then sequencing a homologous BAC. Based on phylogenies, the four carbohydrate transporter genes arose by tandem duplication sometime prior to the split of soybean and G. tomentella (Fig. 4A). This observation indicates that, unlike NB-LRRs, these repeated genes are quite stable and are not undergoing rapid birth and death (no gains or losses since the split of soybean and G. tomentella 5–7 mya). It also indicates that there has been little recombination or gene conversion between these four copies; such exchanges would have caused concerted evolution and erased the orthologous relationships between soybean and G. tomentella within each of the four gene pairs.
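The argument that the tandem copies have kept their one-to-one orthology can be phrased as a distance check: under concerted evolution, paralogues within a species would become more similar to one another than to their cross-species orthologues. A minimal sketch of that heuristic; the copy names and distances are hypothetical (not values from Fig. 4A), and a tree-based analysis such as the one in the text is the more rigorous test.

```python
def orthology_preserved(dist, copies_a, copies_b):
    """True if each copy i in species A is closer to copy i in species B than either
    copy is to any of its own paralogues. dist maps frozenset({x, y}) -> distance."""
    def d(x, y):
        return dist[frozenset((x, y))]

    for i, (a, b) in enumerate(zip(copies_a, copies_b)):
        orthologue = d(a, b)
        paralogues = [d(a, c) for j, c in enumerate(copies_a) if j != i]
        paralogues += [d(b, c) for j, c in enumerate(copies_b) if j != i]
        # Concerted evolution would pull paralogues closer than orthologues.
        if orthologue >= min(paralogues, default=float("inf")):
            return False
    return True


def make_dist(entries):
    """Build a symmetric distance lookup from (x, y, distance) tuples."""
    return {frozenset((x, y)): v for x, y, v in entries}


soy, gt = ["Gm1", "Gm2"], ["Gt1", "Gt2"]  # hypothetical copy names
stable = make_dist([("Gm1", "Gt1", 0.06), ("Gm2", "Gt2", 0.06),
                    ("Gm1", "Gm2", 0.25), ("Gt1", "Gt2", 0.25)])
homogenized = make_dist([("Gm1", "Gt1", 0.06), ("Gm2", "Gt2", 0.06),
                         ("Gm1", "Gm2", 0.01), ("Gt1", "Gt2", 0.01)])
print(orthology_preserved(stable, soy, gt), orthology_preserved(homogenized, soy, gt))
```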
H2 in soybean contains just one carbohydrate transporter gene. This gene (gmw2-12n11_11) is sister to copy 1 of the Gmw H1 cluster (gmw1-173d12_7), rather than falling outside the four Gmw H1 genes. This arrangement suggests that there may have been four copies of this gene present in the common ancestor of H1 and H2 and that H2 has lost three of the four copies during the last 10 million years. Consistent with this hypothesis, Ks analysis of the four Gmw H1 copies indicates that these duplications arose between 17 and 27 mya (Ks = 0.199–0.297), well before the polyploidy event and spanning the time of divergence between Phaseolus and Glycine, while the Ks value for the H1 (gmw1-173d12_7):H2 (gmw2-12n11_11) comparison is 0.144, as expected for the most recent whole genome duplication. H3 in soybean also contains a tandem cluster of three carbohydrate transporter genes in the syntenic position (Fig. 2E), suggesting that this cluster dates back at least 50 mya. However, in the phylogenetic tree, these three H3 copies group together, which suggests either that the duplications on H3 occurred after the 50-mya genome duplication event or that some recombination/gene conversion among the H3 copies has homogenized their sequences. Ks analysis indicates that gene gmw2-129e12_19 diverged from the other two genes approximately 50 mya (Ks = 0.575–0.611), close to the time at which the ancient polyploidy event is thought to have occurred. Meanwhile, comparison of the two most closely related genes (gmw2-129e12_14 and gmw2-129e12_16) gives a Ks of 0.205, indicating that this duplication occurred well after the ancient polyploidy event.
The large differences in Ks values suggest that recombination among the three genes has been minimal and that independent birth and death has occurred in this cluster on H1, H2, and H3, albeit over a much larger time frame than was observed for the NB-LRR family. Phaseolus contains at least two tandem copies (pva1-47b16_2 and _3) of the carbohydrate transporter gene in the syntenic position (Fig. 2A; additional copies could be present in the adjacent gap). Based on branch lengths (Fig. 4), both of these copies appear to have been evolving at a faster rate than the most closely related Glycine homologues (paralogue 2, which includes gmw2-173d12_5). A Tajima relative rates test of each member of the Glycine clade confirmed this conclusion (P < 0.01 using gmp1-34b24_7 or _9 as the outgroup). The topology of Figure 4A suggests that one of the two Phaseolus copies is orthologous to the three Glycine paralogue 2 genes, with the other gene representing a duplication lost from Glycine. However, the low posterior probability of this node (0.76; Fig. 4A) leaves open the possibility that both genes are coorthologous to the three genes in the Glycine clade.
We also analyzed a collection of kinase genes, which are more broadly distributed across soybean H1 (aqua boxes in Fig. 2 and Supplemental Fig. S2). We aligned the kinase domains of all kinases identified in all of the sequenced BAC clones and constructed a phylogenetic tree using Bayesian analysis. This tree revealed three distinct kinase families, one of which was specific to Phaseolus (Fig. 4B). Each member of the unique Phaseolus kinase family (family 3 in Fig. 4B) is located adjacent to an NB-LRR gene (pva1-76g17_5, pva1-76g17_9, and pva1-118d24_1 in Supplemental Fig. S2A), suggesting that the recent expansion of this kinase family may be related to its location amid an R gene cluster. The largest kinase family (family 1) included members from Phaseolus and all three soybean homoeologues as well as from H1 and H2 of G. tomentella. Two of the genes from this family are located adjacent to each other in soybean but oriented in opposite directions (Fig. 2A; Supplemental Fig. S2A; genes gmw1-42i18_7 and gmw1-42i18_8). Phylogenetic analysis (Fig. 4B) revealed that gene gmw1-42i18_8 has a clear orthologue in G. tomentella (gtd1-6a2_11), a clear homoeologue on soybean H2 (gmw2-171o6_5), and a clear orthologue in Phaseolus (pva1-34g17_10), all of which are located in similar syntenic positions, indicating that this gene has been conserved for at least 20 million years. The adjacent gene in soybean, gmw1-42i18_7, appears to be conserved in G. tomentella as well (its orthologue is gtd1-6a1_15) and, just as in soybean, is oriented in the opposite direction to gtd1-6a2_11 (Fig. 2B). Thus, this gene pair has been quite stable, similar to the carbohydrate transporter family and unlike NB-LRR clusters. However, G. tomentella H1 contains an additional member of this kinase family (gtd1-6a11_19) adjacent to the inverted gene pair. Phylogenetically, this additional kinase is sister to the inverted gene pair; thus, it likely derives from an earlier duplication event, and the corresponding copy may have been lost from soybean. H3 also contains two members of this family in the syntenic position (gmw2-48a19_4 and gmw2-48a19_5). Based on the tree topology, one copy (gmw2-48a19_5) is the homoeologue of all of the H1 and H2 genes in this family, consistent with H3 being derived from a much older duplication event. The second copy (gmw2-48a19_4) is derived from a still older duplication event. The last kinase family (family 2) displays a phylogenetic pattern similar to that of the low-copy gene families described above, with clearly identifiable orthologues and homoeologues (Fig. 4B).
Specifically, gene gmw1-21a17_12 from soybean cv Williams 82 has a clear orthologue in Phaseolus (pva1-144m6_7) and a homoeologue on soybean H3 (gmw2-91b16_16), all of which occupy similar syntenic positions, indicating that this kinase has occupied this genomic position for over 50 million years. However, this family has no H2 homoeologue, indicating that the H2 copy has been lost. The phylogenetic analysis also revealed a kinase subclade within family 1 that is unique to Glycine H2 (gmw2-171o6_1, gmp1-63m15_14, and gtd1-31b20_12) and has no apparent homoeologue on H1 or orthologue in Phaseolus. This pattern suggests a gene insertion event that occurred after the 10-mya Glycine polyploidy event and before the soybean-G. tomentella divergence. In summary, although there have been some duplications and losses across taxa and homoeologues in both the carbohydrate transporter family and the kinase families, overall these families are not experiencing the high rates of birth, death, and translocation observed in the NB-LRR family. This observation suggests that the rapid evolution of NB-LRRs is not simply a by-product of their clustering. Intriguingly, the one exception to this general rule is a small kinase family in Phaseolus that appears to be undergoing duplications alongside an immediately adjacent NB-LRR.
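The Tajima relative rates test applied above to the Phaseolus carbohydrate transporter copies asks whether two ingroup sequences have accumulated different numbers of changes since they diverged, using an outgroup to assign each change to a lineage. The text does not specify which form of the test was used; the Python sketch below assumes Tajima's 1D (nucleotide) form, which counts sites where only one ingroup sequence differs from the other two and skips gapped or ambiguous columns:

```python
import math
from typing import Tuple

SKIP = {"-", "N"}  # gap and ambiguous characters are ignored


def tajima_1d(seq1: str, seq2: str, outgroup: str) -> Tuple[int, int, float, float]:
    """Tajima's 1D relative rate test for two ingroup sequences and an outgroup.

    m1 = sites where only seq1 differs (seq2 == outgroup != seq1)
    m2 = sites where only seq2 differs (seq1 == outgroup != seq2)
    Under equal rates, chi2 = (m1 - m2)**2 / (m1 + m2) follows a
    chi-square distribution with 1 degree of freedom.
    """
    if not len(seq1) == len(seq2) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if a in SKIP or b in SKIP or c in SKIP:
            continue
        if b == c and a != b:
            m1 += 1
        elif a == c and b != a:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper-tail probability of a chi-square with 1 df: erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value


if __name__ == "__main__":
    # Toy alignment (hypothetical): seq1 carries two unique changes.
    print(tajima_1d("TCGTACGTAA", "ACGTACGTAC", "ACGTACGTAC"))
```

A small P value indicates that one ingroup lineage has accumulated more changes than the other; in practice the test would be run on the real codon alignments, which this toy example does not reproduce.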
The above analyses were designed to address multiple questions regarding the evolution of the soybean genome, particularly the effects of polyploidy. By comparing the sequences of two homoeologous regions within soybean with the single orthologous region in Phaseolus, we were able to infer the likely polarity of most changes. In addition, by comparing the soybean cv Williams 82 sequences with allelic sequences in soybean line PI96983 and with orthologous sequences in G. tomentella, we were able to gain insights into the tempo and mode of changes that are driving homoeologous sequences apart in soybean.
The most significant insights arising from our analyses relate to chromosomal rearrangements within Glycine that placed the H2 homoeologue adjacent to a centromere. Whether this occurred as a result of polyploidy is unknown. Homoeologous recombination is known to occur in polyploids (Gaeta et al., 2007
The relocation of the H2 region adjacent to a centromere is correlated with a dramatic increase in retrotransposon content in H2. Although it is well known that plant and animal centromeres are enriched in repetitive sequences, including retrotransposons (Lin et al., 2005
Expansion of retroelement content in the centromeric region may be selected for during meiosis, as these repeats are thought to promote microtubule binding, which in turn increases the frequency with which a given chromosome ends up in the egg nucleus (Henikoff et al., 2001
In addition to containing high levels of repetitive DNA, pericentromeric regions typically are heterochromatic in structure (i.e. highly condensed). As a consequence, pericentromeric regions are often regarded as being low in both gene content and gene expression. Therefore, it is striking that the majority of the H2 low-copy genes have been conserved and continue to be expressed. Thus, pericentromeric location, per se, does not cause gene silencing or rapid loss of genes. Moreover, the synonymous substitution rates (Ks) observed in H2 low-copy genes were nearly identical to those observed in their homoeologues on H1 (Supplemental Table S2); thus, pericentromeric location does not alter substitution frequency, either.
Instead, our data indicate that synonymous substitution frequencies are determined by intrinsic properties of individual genes rather than by extrinsic forces such as genomic context. We observed wide variation in Ks values for individual low-copy genes on soybean H1 when comparing these genes with Phaseolus (2.75-fold; Supplemental Table S2), but there was little difference between homoeologous pairs (maximum fold difference of 1.29). Thus, the properties of individual genes that determine synonymous substitution rates must be conserved following polyploidization. Although the cause of gene-to-gene synonymous variation is unknown, Zhang and colleagues (2002)
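The two statistics contrasted above, the spread of Ks across genes and the largest difference within any homoeologous pair, are simple ratios. This Python sketch computes both; the gene names and Ks values are hypothetical and are not the values in Supplemental Table S2:

```python
from typing import Dict, Iterable, Tuple


def fold_range(values: Iterable[float]) -> float:
    """Largest value divided by the smallest (values must be positive)."""
    vals = list(values)
    if not vals or min(vals) <= 0:
        raise ValueError("need at least one positive value")
    return max(vals) / min(vals)


def max_pair_fold(pairs: Dict[str, Tuple[float, float]]) -> float:
    """Largest within-pair fold difference across homoeologous gene pairs."""
    return max(max(a, b) / min(a, b) for a, b in pairs.values())


# HYPOTHETICAL Ks values versus Phaseolus as (H1 copy, H2 copy); illustration only.
example_pairs = {
    "gene_A": (0.20, 0.22),
    "gene_B": (0.35, 0.33),
    "gene_C": (0.50, 0.54),
}

h1_values = [h1 for h1, _ in example_pairs.values()]
print(f"fold range across H1 genes: {fold_range(h1_values):.2f}")
print(f"max within-pair fold difference: {max_pair_fold(example_pairs):.2f}")
```

In this toy data set the genes differ 2.5-fold from one another, while no homoeologous pair differs by more than 1.1-fold, which is the same qualitative pattern the text describes for the real genes.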
In the soybean regions that we analyzed, the majority (approximately 77%) of low-copy gene duplicates derived from the most recent polyploidy event have been maintained over the course of 10 million years. A similarly high level of duplicate retention was also observed in two prior comparisons of homoeologous soybean BAC sequences (Schlueter et al., 2006
A 77% retention rate for homoeologous gene duplicates in soybean appears to be much higher than that reported for maize (Bruggmann et al., 2006
The maize study also analyzed differences in retrotransposon content between homoeologues (Bruggmann et al., 2006
In contrast to the low-copy genes, NB-LRR-encoding genes are not well conserved between soybean H1 and H2. Most notably, there appears to have been reciprocal loss of these genes between H1 and H2. Loss of NB-LRR-encoding genes following polyploidy appears to be a general phenomenon, as NB-LRRs are highly underrepresented in duplicated regions of the Arabidopsis genome (Cannon et al., 2004
Although the loss of NB-LRRs primarily occurred from H2, one NB-LRR subfamily was preferentially lost from H1 compared with H2 (red genes in Fig. 3). Such partitioning of NB-LRR subfamilies between homoeologues should facilitate sequence divergence, as it would be expected to reduce unequal crossover and gene conversion events between NB-LRR copies (Mondragon-Palomino and Gaut, 2005
Independent of polyploidy, our data show that NB-LRR gene clusters in both Glycine and Phaseolus are rapidly evolving, as evidenced by the phylogenetic trees shown in Figure 3. These trees display clusters of genes from the same taxa at terminal nodes, indicating recent duplication events. In addition, alignment of the two soybean genotypes shows significant changes in NB-LRR gene number and arrangement (dashed red boxes in Fig. 2C and Supplemental Fig. S2C). Rapid birth and death of NB-LRR genes have been observed in many plant species and are usually attributed to unequal crossover events, both within and between NB-LRR genes in a cluster (Michelmore and Meyers, 1998
Although the tandem arrangement of NB-LRR genes is thought to be necessary for the rapid birth and death events observed in NB-LRR clusters (e.g. to facilitate unequal crossover events), it is evidently not sufficient, because we observed a cluster of carbohydrate transporter-like genes in the Glycine H1 contigs that is surprisingly stable, with four copies maintained since at least the split between soybean and G. tomentella (Fig. 4). In addition, phylogenetic analysis indicates that there has been little, if any, concerted evolution among these four genes (Fig. 4), suggesting that gene conversion events are rare. It is unclear at present why some tandem gene clusters appear to recombine frequently and others do not. Phylogenetic analyses of 50 large gene families in the Arabidopsis genome revealed a large variation in apparent tandem duplication rates among families (Cannon et al., 2004
BAC Libraries
All BAC libraries used in this project are available through the Clemson University Genomics Institute (https://www.genome.clemson.edu/cgi-bin/orders/). Two libraries of soybean (Glycine max Williams 82) were used in this project. The gmw1 library (CUGI GM_WBa) was constructed in R. Shoemaker's laboratory (Iowa State University) and contains 5.4 genome equivalents. The gmw2 library (CUGI GM_WBb) was constructed at the Clemson University Genomics Institute and contains 12 genome equivalents. The soybean PI96983 library (gmp1; CUGI GM_PBb) was constructed by BIO S&T and contains 6.8 genome equivalents. The Glycine tomentella diploid accession G1403 library (gtd1; CUGI GT_GBa) and tetraploid accession G1134 library (gtt1; CUGI GT_GBb) were also made by BIO S&T and contain 9.7 genome equivalents and 8 genome equivalents, respectively. The Phaseolus vulgaris accession G19833 library (pva1; CUGI PV_GBa) was made by Matthew Blair at the International Center for Tropical Agriculture and contains 12 genome equivalents. Additional library details, such as average insert sizes and restriction enzymes used, can be obtained from the Clemson University Genomics Institute Web site.
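Genome equivalents translate into the expected chance that any particular single-copy locus is represented in a library. Under the standard Clarke-Carbon approximation, P = 1 − e^(−c) for c genome equivalents, assuming clones sample the genome uniformly at random (real libraries carry cloning biases, so this is an upper-bound style estimate). A short Python sketch applied to the coverages listed above:

```python
import math

# Genome equivalents reported for each BAC library in the text above.
LIBRARY_COVERAGE = {
    "gmw1 (CUGI GM_WBa)": 5.4,
    "gmw2 (CUGI GM_WBb)": 12.0,
    "gmp1 (CUGI GM_PBb)": 6.8,
    "gtd1 (CUGI GT_GBa)": 9.7,
    "gtt1 (CUGI GT_GBb)": 8.0,
    "pva1 (CUGI PV_GBa)": 12.0,
}


def prob_locus_present(genome_equivalents: float) -> float:
    """Probability that a given locus is in the library, P = 1 - exp(-c).

    Clarke-Carbon approximation; assumes clones sample the genome uniformly
    at random, which real libraries only approximate.
    """
    if genome_equivalents < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-genome_equivalents)


if __name__ == "__main__":
    for name, cov in LIBRARY_COVERAGE.items():
        print(f"{name}: {cov:g}x -> P(locus present) = {prob_locus_present(cov):.5f}")
```

Under these assumptions the 5.4-equivalent gmw1 library has roughly a 99.5% chance of containing a given locus, and the 12-equivalent libraries exceed 99.99%.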
Assembly of the H1 BAC contig from soybean cv Williams 82 was initiated during the cloning of the Rpg1-b disease resistance gene (Ashfield et al., 2003
After sequencing the Williams 82 BACs, we identified low-copy protein-coding genes conserved in Arabidopsis (see "Annotation Protocols" below). Low copy number in soybean was verified by searching The Institute for Genomic Research (TIGR) soybean Transcript Assembly database (http://tigrblast.tigr.org/euk-blast/plantta_blast.cgi). A subset of these low-copy gene sequences was then used as DNA hybridization probes (Supplemental Table S1; Supplemental Fig. S1) to screen BAC libraries of soybean cv Williams 82, soybean line PI96983, G. tomentella diploid accession G1403, G. tomentella tetraploid accession G1134, and P. vulgaris accession G19833. BAC clones that hybridized to two or more probes were then fingerprinted and end sequenced. A combination of fingerprint information, probe hybridization patterns, and end sequence information was used to assemble contigs and identify a minimum tiling path for sequencing. For G. tomentella tetraploid accession G1134, only a single BAC containing the carbohydrate transporter gene family was analyzed.
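Two of the steps above, keeping clones that hybridized to at least two probes and choosing a minimum tiling path, can be illustrated with a small sketch. This is a hypothetical Python example, not the pipeline used here: it assumes each clone has already been placed at (start, end) contig coordinates from fingerprint and end-sequence data, and it treats tiling-path selection as a greedy interval-cover problem:

```python
from typing import Dict, List, Set, Tuple

# (clone name, start, end) in contig coordinates (bp); hypothetical layout.
Clone = Tuple[str, int, int]


def clones_hitting_two_or_more(hits: Dict[str, Set[str]]) -> List[str]:
    """Clones that hybridized to at least two distinct probes."""
    return sorted(clone for clone, probes in hits.items() if len(probes) >= 2)


def minimum_tiling_path(clones: List[Clone], min_overlap: int = 0) -> List[str]:
    """Greedy interval cover: fewest clones spanning the contig end to end.

    Starting from a clone anchored at the contig's left end, repeatedly pick,
    among clones overlapping the tiled segment by at least `min_overlap` bp,
    the one whose right end reaches furthest.
    """
    if not clones:
        return []
    left = min(start for _, start, _ in clones)
    right = max(end for _, _, end in clones)
    path: List[str] = []
    covered = left  # right edge of the segment tiled so far
    while covered < right:
        best = None
        for name, start, end in clones:
            if path:
                fits = start <= covered - min_overlap
            else:
                fits = start == left
            if fits and end > covered and (best is None or end > best[2]):
                best = (name, start, end)
        if best is None:
            raise ValueError("contig has a gap or insufficient overlap")
        path.append(best[0])
        covered = best[2]
    return path


if __name__ == "__main__":
    layout = [("A", 0, 150), ("B", 100, 260), ("C", 120, 300),
              ("D", 280, 420), ("E", 250, 400)]
    print(minimum_tiling_path(layout))                  # no overlap required
    print(minimum_tiling_path(layout, min_overlap=30))  # demand 30-bp overlaps
```

In this toy layout, clone B is skipped because C also overlaps A and reaches further; requiring 30 bp of overlap between neighbors forces E into the path, because D overlaps C by only 20 bp.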
The detailed procedures for large-insert genomic DNA isolation, random shotgun cloning, fluorescence-based DNA sequencing, and subsequent analysis have been described previously (Bodenteich et al., 1993
Sequencing reactions were performed as described previously (Chissoe et al., 1995
All sequenced BACs have been deposited in GenBank and assigned accession numbers (Supplemental Table S5).
Genes were predicted using the dicot (Arabidopsis) matrix of FGENESH (Salamov and Solovyev, 2000
Repetitive sequences, including retrotransposons, were identified through a multistep iterative process. We used the program LTR_STRUC as the first step in identifying retrotransposons in sequenced BACs (McCarthy and McDonald, 2003
Genomic regional alignments were generated using similarity comparisons of predicted proteins. Synteny images were generated using custom Perl scripts (available on request) and the GD-SVG image library. Gene correspondences were calculated using BLASTALL (Altschul et al., 1997
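The section does not describe how BLASTALL hits were converted into gene correspondences. One common approach, shown here purely as an assumption and not necessarily the procedure used in this study, is to pair genes that are each other's best hit by bit score. A minimal Python sketch over legacy BLASTALL tabular output (-m 8, 12 tab-separated columns with the bit score last); the gene names in the example are hypothetical:

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Best subject per query, by bit score, from 12-column tabular BLAST output."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query == subject:
            continue  # ignore self-hits
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(a_vs_b: Iterable[str], b_vs_a: Iterable[str]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, (b, _) in ab.items() if b in ba and ba[b][0] == a}


if __name__ == "__main__":
    # Hypothetical tabular hits between two gene sets, A and B.
    a_vs_b = [
        "gA1\tgB1\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-100\t500",
        "gA1\tgB2\t60.0\t300\t120\t2\t1\t300\t1\t300\t1e-20\t150",
        "gA2\tgB2\t85.0\t250\t37\t1\t1\t250\t1\t250\t1e-80\t400",
    ]
    b_vs_a = [
        "gB1\tgA1\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-100\t500",
        "gB2\tgA2\t85.0\t250\t37\t1\t1\t250\t1\t250\t1e-80\t400",
    ]
    print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))
```

Reciprocal best hits are a conservative criterion: in tandem families such as the NB-LRR clusters, true orthologues can be missed when recent paralogues outscore them.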
Exons from 15 conserved low-copy genes spanning the region were aligned, along with available Medicago orthologues, using MUSCLE (Edgar, 2004
Ks values were estimated from the low-copy gene alignments used for phylogeny estimation. Before analysis, each alignment was checked for the correct reading frame. Ks was then determined using the yn00 algorithm implemented in PAML 3.15 (Yang, 1997
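The reading-frame check described above could be automated with a simple test. This is a hypothetical Python sketch, assuming the standard genetic code and removing alignment gaps before checking; it flags sequences whose length is not a multiple of 3 and premature (internal) stop codons, while allowing a single terminal stop:

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_reading_frame(cds: str) -> List[str]:
    """Return a list of frame problems; an empty list means none were found.

    Alignment gaps ('-') are removed first, so the check applies to the
    underlying coding sequence. A single terminal stop codon is allowed.
    """
    seq = cds.upper().replace("-", "")
    problems: List[str] = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    n_full = len(seq) // 3
    codons = [seq[3 * i:3 * i + 3] for i in range(n_full)]
    for i, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {i + 1}")
    return problems


if __name__ == "__main__":
    for s in ["ATGAAATAA", "ATGTAAAAA", "ATGAA", "ATG---AAATAA"]:
        print(s, check_reading_frame(s) or "frame OK")
```

Sequences that fail either test would need their exon boundaries or alignment revisited before synonymous distances are estimated.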
In-frame alignments of low-copy genes including soybean H1 and H2 and Phaseolus copies were submitted to MEGA (Kumar et al., 2004
NB-LRRs were subdivided into TIR and non-TIR classes, and a separate phylogenetic analysis was performed on each class. An approximately 900-bp region spanning from the P-loop (VGMGG in Rpg1-b) to the MHD motif (MHDLL in Rpg1-b) was used to construct phylogenetic trees. Amino acid sequences were initially aligned using ClustalW (Thompson et al., 1994
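Locating the P-loop to MHD interval in each NB-LRR protein can be sketched as a motif search. A minimal Python sketch, assuming exact matches to the Rpg1-b motifs given above (VGMGG and MHDLL); homologues often carry variant motifs, so a real analysis would need degenerate patterns or an alignment-based approach, and the toy protein below is hypothetical:

```python
from typing import Optional


def ploop_to_mhd(protein: str, ploop: str = "VGMGG", mhd: str = "MHDLL") -> Optional[str]:
    """Return the segment from the P-loop motif through the MHD motif.

    Uses exact matches to the Rpg1-b motifs by default; returns None if either
    motif is absent or the MHD motif does not follow the P-loop.
    """
    seq = protein.upper()
    start = seq.find(ploop)
    if start == -1:
        return None
    end = seq.find(mhd, start + len(ploop))
    if end == -1:
        return None
    return seq[start:end + len(mhd)]


def nucleotide_span(segment: str) -> int:
    """Length in bp of the coding sequence for an amino acid segment."""
    return 3 * len(segment)


if __name__ == "__main__":
    toy = "MAAVGMGGKTTLLLLMHDLLRR"  # hypothetical protein fragment
    seg = ploop_to_mhd(toy)
    print(seg, nucleotide_span(seg) if seg else None)
```

A segment of roughly 300 residues corresponds to the approximately 900-bp nucleotide region used for the NB-LRR trees.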
For the carbohydrate transporter tree, we included genes from the neotetraploid G. tomentella accession G1134 on BAC clone gtt1-298n2, as we had a gap in our BAC contig from G. tomentella accession G1403 covering two of the four transporter genes on H1. Amino acid sequences were initially aligned using ClustalW (Thompson et al., 1994
Soybean plants (cv Williams 82) were grown under standard greenhouse conditions (16-h daylength and 27°C daytime temperature). Root tips for somatic chromosome preparations were sampled and treated with 8-hydroxyquinoline according to previously published methods (Walling et al., 2005
Plasmid/BAC clones were purified using Qiagen maxiprep kits according to the manufacturer's instructions. Approximately 1 µg of purified plasmid DNA was labeled with either digoxigenin or biotin using Nick Translation Kits (Roche). The DNA-labeling reaction was kept at 15°C for 2 h, after which unincorporated nucleotides were removed using Qiagen PCR columns.
FISH of BAC clones onto DNA fibers (fiber-FISH) was performed as described (Jackson et al., 1998
Mitotic chromosome FISH was performed as described previously (Jiang et al., 1995
BAC and BAC end sequences were searched for two, three, or four nucleotide repeat motifs that had a minimum repeat length of 15 using the SSRIT script (Temnykh et al., 2001
Sequence data from this article can be found in the GenBank/EMBL data libraries under the accession numbers listed in Supplemental Tables S1 and S5.
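The SSR search described in the Methods (two-, three-, or four-nucleotide motifs, minimum repeat length of 15) can be expressed as a small regular-expression search. This Python sketch is not SSRIT, which is a Perl script; it interprets "minimum repeat length of 15" as at least 15 nt of perfect repeat (an assumption) and skips motifs that are themselves repeats of a shorter unit, such as AA or ATAT, so that mononucleotide and dinucleotide runs are not reported under a longer motif size:

```python
import re
from math import ceil
from typing import Iterable, List, Tuple


def is_composite(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_length: int = 15,
              motif_sizes: Iterable[int] = (2, 3, 4)) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs >= min_length nt."""
    seq = seq.upper()
    hits: List[Tuple[int, str, int]] = []
    for k in motif_sizes:
        min_units = ceil(min_length / k)  # repeats needed to reach min_length
        # A k-mer followed by at least (min_units - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_composite(motif):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence: an (AG)8 and a (CAT)5 repeat plus a poly(A) run.
    demo = "GGCC" + "AG" * 8 + "CCC" + "CAT" * 5 + "G" + "A" * 20
    print(find_ssrs(demo))
```

Because repeats can be read in more than one phase (CATCAT... is also ATCATC...), the reported motif depends on where the match starts; tools such as SSRIT typically normalize motifs, which this sketch does not attempt.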
The following materials are available in the online version of this article.
We thank Randy Shoemaker, Barbara Baker, and Chris Pires for serving on the advisory committee for this project. We also thank Randy Shoemaker for help with screening of BAC libraries. We thank Mounier Elharam and Jennifer Lewis at the University of Oklahoma's Advanced Center for Genome Technology for contributing to the DNA sequencing on the ABI3730 and Steve Kenton, Shaoping Lin, and Ying Fu for their helpful discussions on sequencing through difficult regions. Computer support was provided by the Indiana University Information Technology Services Research Database Complex, the Computational Biology Service Unit from Cornell University, which is partially funded by Microsoft Corporation, and the Advanced Center for Genome Technology.
Received August 10, 2008; accepted October 6, 2008; published October 8, 2008.
1 This work was supported by the National Science Foundation Plant Genome Research Program (grant no. DBI–0321664 to R.W.I., M.A.S.M., N.D.Y., B.A.R., and J.J.D.) and by a grant from Genoscope/Commissariat à l'Energie Atomique-Centre National de Séquençage (to V.G.). A.N.E. was supported by a National Science Foundation Systematics Award (grant no. DEB–0516673).
2 Present address: Trait Genetics and Technology, Dow AgroSciences LLC, Indianapolis, IN 46268. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Roger W. Innes (rinnes{at}indiana.edu).
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription. www.plantphysiol.org/cgi/doi/10.1104/pp.108.127902 * Corresponding author; e-mail rinnes{at}indiana.edu.
Adams KL, Cronn R, Percifield R, Wendel JF (2003) Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc Natl Acad Sci USA 100: 4649–4654
Adams KL, Percifield R, Wendel JF (2004) Organ-specific silencing of duplicated genes in a newly synthesized cotton allotetraploid. Genetics 168: 2217–2226
Akkaya MS, Bhagwat AA, Cregan PB (1995) Integration of simple sequence repeat DNA markers into a soybean linkage map. Crop Sci 35: 1439–1445
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402
Ashfield T, Bocian A, Held D, Henk AD, Marek LF, Danesh D, Penuela S, Meksem K, Lightfoot DA, Young ND, et al (2003) Genetic and physical localization of the soybean Rpg1-b disease resistance gene reveals a complex locus containing several tightly linked families of NBS-LRR genes. Mol Plant Microbe Interact 16: 817–826
Ashfield T, Danzer JR, Held D, Clayton K, Keim P, Saghai Maroof MA, Webb PM, Innes RW (1998) Rpg1, a soybean gene effective against races of bacterial blight, maps to a cluster of previously identified disease resistance genes. Theor Appl Genet 96: 1013–1021
Ashfield T, Keen NT, Buzzell RI, Innes RW (1995) Soybean resistance genes specific for different Pseudomonas syringae avirulence genes are allelic, or closely linked, at the RPG1 locus. Genetics 141: 1597–1604
Ashfield T, Ong LE, Nobuta K, Schneider CM, Innes RW (2004) Convergent evolution of disease resistance gene specificity in two flowering plant families. Plant Cell 16: 309–318
Blanc G, Wolfe KH (2004) Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16: 1679–1691
Bodenteich A, Chissoe S, Wang YF, Roe BA (1993) Shotgun cloning as the strategy of choice to generate templates for high-throughput dideoxynucleotide sequencing. In JC Venter, ed, Automated DNA Sequencing and Analysis Techniques. Academic Press, London, pp 42–50
Bomblies K, Lempe J, Epple P, Warthmann N, Lanz C, Dangl JL, Weigel D (2007) Autoimmune response as a mechanism for a Dobzhansky-Muller-type incompatibility syndrome in plants. PLoS Biol 5: e236
Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433–438
Bruggmann R, Bharti AK, Gundlach H, Lai J, Young S, Pontaroli AC, Wei F, Haberer G, Fuks G, Du C, et al (2006) Uneven chromosome contraction and expansion in the maize genome. Genome Res 16: 1241–1251
Cannon SB, Mitra A, Baumgarten A, Young ND, May G (2004) The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol 4: 10
Chin DB, Arroyo-Garcia R, Ochoa OE, Kesseli RV, Lavelle DO, Michelmore RW (2001) Recombination and spontaneous mutation at the major cluster of resistance genes in lettuce (Lactuca sativa). Genetics 157: 831–849
Chissoe SL, Bodenteich A, Wang YF, Wang YP, Burian D, Clifton SW, Crabtree J, Freeman A, Iyer K, Jian L, et al (1995) Sequence and analysis of the human ABL gene, the BCR gene, and regions involved in the Philadelphia chromosomal translocation. Genomics 27: 67–82
Comai L (2000) Genetic and epigenetic interactions in allopolyploid plants. Plant Mol Biol 43: 387–399
Doyle JJ, Flagel LE, Paterson AH, Rapp RA, Soltis DE, Soltis PS, Wendel JF (2008) Evolutionary genetics of genome merger and doubling in plants. Annu Rev Genet 42: (in press)
Doyle JJ, Luckow MA (2003) The rest of the iceberg: legume diversity and evolution in a phylogenetic context. Plant Physiol 131: 900–910
Dubcovsky J, Dvorak J (2007) Genome plasticity a key factor in the success of polyploid wheat under domestication. Science 316: 1862–1866
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797
Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8: 186–194
Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8: 175–185
Gaeta RT, Pires JC, Iniguez-Luy F, Leon E, Osborn TC (2007) Genomic changes in resynthesized Brassica napus and their effect on gene expression and phenotype. Plant Cell 19: 3403–3417
Gordon D, Abajian C, Green P (1998) Consed: a graphical tool for sequence finishing. Genome Res 8: 195–202
Gore MA, Hayes AJ, Jeong SC, Yue YG, Buss GR, Maroof S (2002) Mapping tightly linked genes controlling potyvirus infection at the Rsv1 and Rpv1 region in soybean. Genome 45: 592–599
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52: 696–704
Guindon S, Lethiec F, Duroux P, Gascuel O (2005) PHYML Online: a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res 33: W557–559
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98
Hayes AJ, Jeong SC, Gore MA, Yu YG, Buss GR, Tolin SA, Maroof MA (2004) Recombination within a nucleotide-binding-site/leucine-rich-repeat gene cluster produces new variants conditioning resistance to soybean mosaic virus in soybeans. Genetics 166: 493–503
Hayes AJ, Yue YG, Saghai Maroof MA (2000) Expression of two soybean resistance gene candidates shows divergence of paralogous single-copy genes. Theor Appl Genet 101: 789–795
Hegarty MJ, Hiscock SJ (2008) Genomic clues to the evolutionary success of polyploid plants. Curr Biol 18: R435–R444
Henikoff S, Ahmad K, Malik HS (2001) The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293: 1098–1102
Henikoff S, Malik HS (2002) Centromeres: selfish drivers. Nature 417: 227
Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754–755
Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23: 254–267
Hyten DL, Song Q, Zhu Y, Choi IY, Nelson RL, Costa JM, Specht JE, Shoemaker RC, Cregan PB (2006) Impacts of genetic bottlenecks on soybean genome diversity. Proc Natl Acad Sci USA 103: 16666–16671
Innes RW (2004) Guarding the goods: new insights into the central alarm system of plants. Plant Physiol 135: 695–701
Jackson SA, Wang ML, Goodman HM, Jiang J (1998) Application of fiber-FISH in physical mapping of Arabidopsis thaliana. Genome 41: 566–572
Jeong SC, Hayes AJ, Biyashev RM, Saghai Maroof MA (2001) Diversity and evolution of a non-TIR-NBS sequence family that clusters to a chromosomal "hotspot" for disease resistance genes in soybean. Theor Appl Genet 103: 406–414
Jiang J, Gill BS, Wang GL, Ronald PC, Ward DC (1995) Metaphase and interphase fluorescence in situ hybridization mapping of the rice genome with bacterial artificial chromosomes. Proc Natl Acad Sci USA 92: 4487–4491
Jorda L, Vera P (2000) Local and systemic induction of two defense-related subtilisin-like protease promoters in transgenic Arabidopsis plants: luciferin induction of PR gene expression. Plant Physiol 124: 1049–1058
Kass R, Raftery A (1995) Bayes factors. J Am Stat Assoc 90: 773–795
Kumar S, Tamura K, Nei M (2004) MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 5: 150–163
Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R (2001) REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res 29: 4633–4642
Lai J, Ma J, Swigonova Z, Ramakrishna W, Linton E, Llaca V, Tanyolac B, Park YJ, Jeong OY, Bennetzen JL, et al (2004) Gene loss and movement in the maize genome. Genome Res 14: 1924–1931
Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, Newburg L (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174–181
Lavin M, Herendeen P, Wojciechowski M (2005) Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the tertiary. Syst Biol 54: 575–594
Leitch AR, Leitch IJ (2008) Genomic plasticity and the diversity of polyploid plants. Science 320: 481–483
Lin JY, Jacobus BH, SanMiguel P, Walling JG, Yuan Y, Shoemaker RC, Young ND, Jackson SA (2005) Pericentromeric regions of soybean (Glycine max L. Merr.) chromosomes consist of retroelements and tandemly repeated DNA and are structurally and evolutionarily labile. Genetics 170: 1221–1230
Luo MC, Thomas C, You FM, Hsiao J, Ouyang S, Buell CR, Malandro M, McGuire PE, Anderson OD, Dvorak J (2003) High-throughput fingerprinting of bacterial artificial chromosomes using the snapshot labeling kit and sizing of restriction fragments by capillary electrophoresis. Genomics 82: 378–389
Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155
Lynch M, Force A (2000) The probability of duplicate gene preservation by subfunctionalization. Genetics 154: 459–473
Ma J, Devos KM, Bennetzen JL (2004) Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res 14: 860–869
Ma J, SanMiguel P, Lai J, Messing J, Bennetzen JL (2005) DNA rearrangement in orthologous orp regions of the maize, rice and sorghum genomes. Genetics 170: 1209–1220
Ma J, Wing RA, Bennetzen JL, Jackson SA (2007) Plant centromere organization: a dynamic structure with conserved functions. Trends Genet 23: 134–139
Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y (2005) Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci USA 102: 5454–5459
Malik HS, Henikoff S (2002) Conflict begets complexity: the evolution of centromeres. Curr Opin Genet Dev 12: 711–718
Marek LF, Mudge J, Darnielle L, Grant D, Hanson N, Paz M, Huihuang Y, Denny R, Larson K, Foster-Hartnett D, et al (2001) Soybean genomic survey: BAC-end sequences near RFLP and SSR markers. Genome 44: 572–581
Marek LF, Shoemaker RC (1997) BAC contig development by fingerprint analysis in soybean. Genome 40: 420–427
Margulies EH, Birney E (2008) Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nat Rev Genet 9: 303–313
Martin D, Rybicki E (2000) RDP: detection of recombination amongst aligned sequences. Bioinformatics 16: 562–563
Martin DP, Posada D, Crandall KA, Williamson C (2005a) A modified Bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res Hum Retroviruses 21: 98–102
Martin DP, Williamson C, Posada D (2005b) RDP2: recombination detection and analysis from sequence alignments. Bioinformatics 21: 260–262
Maughan PJ, Saghai Maroof MA, Buss GR (2000) Identification of quantitative trait loci controlling sucrose content in soybean (Glycine max). Mol Breed 6: 105–111
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16: 1046–1047
McCarthy EM, McDonald JF (2003) LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19: 362–367
Membre N, Bernier F, Staiger D, Berna A (2000) Arabidopsis thaliana germin-like proteins: common and specific features point to a variety of functions. Planta 211: 345–354
Meyers BC, Dickerman AW, Michelmore RW, Sivaramakrishnan S, Sobral BW, Young ND (1999) Plant disease resistance genes encode members of an ancient and diverse protein family within the nucleotide-binding superfamily. Plant J 20: 317–332
Meyers BC, Kozik A, Griego A, Kuang H, Michelmore RW (2003) Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis. Plant Cell 15: 809–834
Michelmore RW, Meyers BC (1998) Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process. Genome Res 8: 1113–1130
Mitsuhara I, Iwai T, Seo S, Yanagawa Y, Kawahigasi H, Hirose S, Ohkawa Y, Ohashi Y (2008) Characteristic expression of twelve rice PR1 family genes in response to pathogen infection, wounding, and defense-related signal compounds (121/180). Mol Genet Genomics 279: 415–427
Mondragon-Palomino M, Gaut BS (2005) Gene conversion and the evolution of three leucine-rich repeat gene families in Arabidopsis thaliana. Mol Biol Evol 22: 2444–2456
Moreno C, Lazar J, Jacob HJ, Kwitek AE (2008) Comparative genomics for detecting human disease genes. Adv Genet 60: 655–697
Mudge J, Cannon SB, Kalo P, Oldroyd GE, Roe BA, Town CD, Young ND (2005) Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biol 5: 15
Nagy ED, Bennetzen JL (2008) Pathogen corruption and site-directed recombination at a plant disease resistance gene cluster. Genome Res (in press)
Nobuta K, Ashfield T, Kim S, Innes RW (2005) Diversification of non-TIR class NB-LRR genes in relation to whole-genome duplication events in Arabidopsis. Mol Plant Microbe Interact 18: 103–109
Noel L, Moores TL, van der Biezen EA, Parniske M, Daniels MJ, Parker JE, Jones JD (1999) Pronounced intraspecific haplotype divergence at the RPP5 complex disease resistance locus of Arabidopsis. Plant Cell 11: 2099–2112
Osmark P, Boyle B, Brisson N (1998) Sequential and structural homology between intracellular pathogenesis-related proteins and a group of latex proteins. Plant Mol Biol 38: 1243–1246
Padidam M, Sawyer S, Fauquet CM (1999) Possible emergence of new geminiviruses by frequent recombination. Virology 265: 218–225
Paterson AH (2005) Polyploidy, evolutionary opportunity, and crop adaptation. Genetica 123: 191–196
Paterson AH (2006) Leafing through the genomes of our major crop plants: strategies for capturing unique information. Nat Rev Genet 7: 174–184
Pfeil BE, Schlueter JA, Shoemaker RC, Doyle JJ (2005) Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families. Syst Biol 54: 441–454
Posada D, Crandall KA (1998) MODELTEST: testing the model of DNA substitution. Bioinformatics 14: 817–818
Posada D, Crandall KA (2001) Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc Natl Acad Sci USA 98: 13757–13762
Roe B (2004) Shotgun library construction for DNA sequencing. In S Zhao, M Stodolsky, eds, Bacterial Artificial Chromosomes, Vol 1: Library Construction, Physical Mapping, and Sequencing. Humana Press, Totowa, NJ, pp 171–187
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574
Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users and for biologist programmers. In S Krawetz, S Misener, eds, Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 365–386
Saghai Maroof MA, Biyashev RM, Yang GP, Zhang Q, Allard RW (1994) Extraordinarily polymorphic microsatellite DNA in barley: species diversity, chromosomal locations, and population dynamics. Proc Natl Acad Sci USA 91: 5466–5470
Salamov AA, Solovyev VV (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10: 516–522
Sambrook J, Fritsch EF, Maniatis T, editors (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
Schlueter JA, Scheffler BE, Schlueter SD, Shoemaker RC (2006) Sequence conservation of homeologous bacterial artificial chromosomes and transcription of homeologous genes in soybean (Glycine max L. Merr.). Genetics 174: 1017–1028
Schranz ME, Song BH, Windsor AJ, Mitchell-Olds T (2007) Comparative genomics in the Brassicaceae: a family-wide perspective. Curr Opin Plant Biol 10: 168–175
Shoemaker RC, Schlueter J, Doyle JJ (2006) Paleopolyploidy and gene duplication in soybean and other legumes. Curr Opin Plant Biol 9: 104–109
Singh RJ, Chung GH, Nelson RL (2007) Landmark research in legumes. Genome 50: 525–537
Swigonova Z, Lai J, Ma J, Ramakrishna W, Llaca V, Bennetzen JL, Messing J (2004) Close split of sorghum and maize genome progenitors. Genome Res 14: 1916–1923
Swofford DL (2002) PAUP*: Phylogenetic Analysis Using Parsimony (* and Other Methods), Ed 4.0b.
Sinauer Associates, Sunderland, MA Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch S (2001) Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res 11: 1441–1452 Thomas BC, Pedersen B, Freeling M (2006) Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res 16: 934–946 Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680 Tian D, Traw MB, Chen JQ, Kreitman M, Bergelson J (2003) Fitness costs of R-gene-mediated resistance in Arabidopsis thaliana. Nature 423: 74–77[CrossRef][Web of Science][Medline] Vahedian M, Shi L, Zhu T, Okimoto R, Danna K, Keim P (1995) Genomic organization and evolution of the soybean SB92 satellite sequence. Plant Mol Biol 29: 857–862[CrossRef][Medline] Van K, Kim DH, Cai CM, Kim MY, Shin JH, Graham MA, Shoemaker RC, Choi BS, Yang TJ, Lee SH (2008) Sequence level analysis of recently duplicated regions in soybean [Glycine max (L.) Merr.] genome. DNA Res 15: 93–102 Walling JG, Pires JC, Jackson SA (2005) Preparation of samples for comparative studies of plant chromosomes using in situ hybridization methods. Methods Enzymol 395: 443–460[Medline] Wawrzynski A, Ashfield T, Chen NWG, Mammadov J, Nguyen A, Podicheti R, Cannon SB, Thareau V, Ameline-Torregrosa C, Cannon E, et al (2008) Replication of nonautonomous retroelements in soybean appears to be both recent and common. Plant Physiol 148: 1760–1771 Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. 
Comput Appl Biosci 13: 555–556 Zhang L, Vision TJ, Gaut BS (2002) Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol Biol Evol 19: 1464–1473 This article has been cited by other articles: