Molecular and Chromosomal Evidence for Allopolyploidy in Soybean 1[OA]

Recent studies have documented that the soybean ( Glycine max ) genome has undergone two rounds of large-scale genome and/or segmental duplication. To shed light on the timing and nature of these duplication events, we characterized and analyzed two subfamilies of high-copy centromeric satellite repeats, CentGm-1 and CentGm-2, using a combination of computational and molecular cytogenetic approaches. These two subfamilies of satellite repeats mark distinct subsets of soybean centromeres and, in at least one case, a pair of homologs, suggesting their origins from an allopolyploid event. The satellite monomers of each subfamily are arranged in large tandem arrays, and intermingled monomers of the two subfamilies were not detected by ﬂuorescence in situ hybridization on extended DNA ﬁbers nor at the sequence level. This indicates that there has been little recombination and homogenization of satellite DNA between these two sets of centromeres. These satellite repeats are also present in Glycine soja , the proposed wild progenitor of soybean, but could not be detected in any other relatives of soybean examined in this study, suggesting the rapid divergence of the centromeric satellite DNA within the Glycine genus. Together, these observations provide direct evidence, at molecular and chromosomal levels, in support of the hypothesis that the soybean genome has experienced a recent allopolyploidization event.

At least 50% to 70% of land plants are estimated to be polyploid (Wendel, 2000), which may be an underestimate because recent genome sequencing has revealed that even putatively diploid genomes contain vestiges of polyploidy, or at least large segmental duplications (Vision et al., 2000;Paterson et al., 2004). Polyploidy is especially prevalent among crop plants, suggesting that polyploidy may provide an advantage in domestication (Udall and Wendel, 2006). Despite the incidence of polyploidy and the logical extension of its importance in evolution and adaptation, relatively little is known about how and when these major events occurred and why they were fixed in many plants, including soybean (Glycine max).
There have been several rounds of polyploidization and/or segmental duplication in soybean on the basis of chromosome number (Lackey, 1980), multiple hybridizing RFLP fragments (Shoemaker et al., 1996), and analysis of duplicated ESTs (Blanc and Wolfe, 2004;Schlueter et al., 2004). Whole-chromosome homology, as revealed by fluorescence in situ hybridization (FISH), provides additional evidence for polyploidy in soybean . Collectively, these data indicate that there have been two large-scale duplication events that occurred within the last 50 million years.
Centromeres of multicellular eukaryotes are generally composed of high-copy, satellite repeats such as the a-satellites of human (Willard, 1985), CentO of rice (Oryza sativa; Dong et al., 1998), the pAL1 satellites of Arabidopsis (Arabidopsis thaliana; Murata et al., 1994), and CentC of maize (Zea mays;Ananiev et al., 1998). Despite the importance of centromeres for chromosomal segregation in cell division, DNA sequences at centromeres are, paradoxically, not well conserved, even within a single genus (Lee et al., 2005). Thus, in an allopolyploid, formed by hybridization between two species, it is not unexpected that the centromeres differ at the sequence level, as seen in allopolyploids of Arabidopsis formed within the past approximately 300,000 years (Kamm et al., 1995;Pontes et al., 2004).
The ancestor of soybean and the remainder of the genus Glycine has been hypothesized to have been formed via a polyploid event within the last 15 million years ; however, it remains unclear whether the event was allo-or autopolyploid (Kumar and Hymowitz, 1989;Straub et al., 2006). Here we report that the soybean genome harbors two classes of centromere-specific repeats that mark subsets of chromosomes, suggesting that Glycine is allopolyploid.

Identification and Cytological Verification of Centromeric Satellite Repeats in Soybean
To identify centromeric satellite repeats, we constructed a whole-genome shotgun (WGS) library of soybean, and generated 1,454 WGS sequences (Gen-Bank CL867099-CL868434; Lin et al., 2005). Subsequently, a repeat database (http://www.soymap.org) was constructed using a repeat-finding program, RECON (Bao and Eddy, 2002). Because satellite repeats have been found to be a class of DNA that has the highest copy number in completely sequenced genomes of plants, such as Arabidopsis and rice, we hypothesized that the highest-copy-number repeats in the WGS database of soybean might represent centromeric satellite repeats. Therefore, we isolated and analyzed the largest family of repeats revealed by RECON that were composed mostly of 92-bp mono-mers and were found to be arranged in a head-to-tail pattern in both WGS sequences and RECON-assembled sequences, a feature typical of centromeric satellite repeats (Jiang et al., 2003). A 92-bp sequence from this family was isolated and an oligo-based probe was developed for FISH and found to localize to centromeric regions of 14 out of 20 chromosome pairs (Fig. 1, A-D). In accordance with commonly accepted nomenclature of plant centromeric sequences, this sequence is hereafter referred to as CentGm-1. We used CentGm-1 to search against the nonredundant nucleotide sequence database deposited in GenBank, and found that CentGm-1 is highly similar (.95%) to previously reported SB92 repeats (Vahedian et al., 1995). SB92 repeats were found to hybridize to four to five genomic locations, whereas, in this study it hybridized to 14 centromere pairs, probably a result of more reliable FISH results due to better chromosome preparation techniques.
Since most plant centromeres consist of large arrays of species-specific satellite repeats (Martinez-Zapater et al., 1986;Ananiev et al., 1998;Dong et al., 1998), we asked whether other centromeric satellite repeats existed in the centromeres that were not detected by CentGm-1. Therefore, we searched the repeat database Figure 1. FISH analysis of centromere repeats in soybean. A, Merged image of B to D, CentGm-1 (green) and CentGm-2 (red; arrows indicate both sequences at the same centromere and arrowheads indicate both sequences, but predominantly CentGm-2). B, DAPI-stained chromosomes. C, CentGm-2-labeled red fluorophore marks eight pairs of centromeres (two pairs overlap with CentGm-1). D, CentGm-1 labeled with green fluorophore marks 14 sets of centromeres (bar = 5 mm). E, Centromeric region of a soybean chromosome at the premeiotic pachytene stage. BAC 76J21 is shown in red (labeled digoxigenin detected with rhodamine anti-digoxigenin) marking the pericentromeric repeats whereas CentGm-2 (labeled biotin detected with AF498) marks the primary constriction (centromere). F, FISH of CentGm-2 (labeled biotin detected with two layers of AF498 streptavidin using goat antistreptavidin as an intermediate) and CentGm-1 (labeled digoxigenin detected with mouse antidigoxigenin followed by AF568 anti-mouse) on extended DNA fibers from soybean showing little or no colocalization of signal (bar = 10 mm). that we obtained from the WGS sequences using a tandem repeat finding program (Benson, 1999), and identified another subfamily of high-copy satellite repeats similar in structure to CentGm-1. This subfamily of repeats, named CentGm-2, is composed of 91-bp monomers, sharing approximately 80% sequence identity on average with CentGm-1. In FISH experiments, an oligo probe designed for CentGm-2 localized to eight chromosome pairs (Fig. 1, A-D). A combined FISH experiment with both repeats revealed that the two repeats mark 18 nonoverlapping sets of chromosomes at the centromeric regions (Fig. 1A). Two pairs of centromeres had both sequences; however, one pair was primarily composed of CentGm-1 (Fig. 1D, arrowheads).
To confirm that these sequences were enriched specifically in the centromeric regions and not pericentromeric regions, we utilized a previously isolated bacterial artificial chromosome (BAC), 76J21, that localizes to the pericentromeric regions of all 40 chromosomes (Lin et al., 2005) and contains previously described soybean repeats STR120 (Morgante et al., 1997) and the retrotransposon SIRE1 (Laten et al., 1998). FISH of BAC 76J21 in combination with CentGm-1 and CentGm-2 repeats to premeiotic pachytene chromosomes revealed that both centromeric repeats marked the primary constrictions exclusively (Fig. 1E). Moreover, CentGm-1 localized to the leading edges of chromosomes at mitotic anaphase ( Fig. 2A). Due to the squashing of the cell during preparation, some of the chromosomes have been rearranged relative to the mitotic poles/plate, but a majority of the chromosomes still have the CentGm-1 localized on the leading edge that would interact with the spindle apparatus.
CentGm-1 and CentGm-2 Are Found on a Pair of Homologs in Soybean Previously, we identified potential chromosome homologs in soybean by cross hybridization of BACs to duplicated segments within the soybean genome . Although CentGm-1 and CentGm-2 are specific to mostly nonoverlapping sets of chromosomes, we asked whether the two centromeric repeats would differentially target the previously identified homologs. Therefore, two BAC clones that cross hybridize to these two homologs were combined in a single FISH experiment with the two centromeric sequences, CentGm-1 and CentGm-2. The results indicated that each pair of homologs is indeed marked by a different centromeric repeat (Fig. 2B). The chromosome corresponding to linkage group 19 is marked by CentGm-1 and the other unidentified chromosome by Cent-Gm-2.

Phylogenetic Analysis of the Centromeric Satellite Repeats in Soybean
The two monomeric sequences used for FISH analysis were used to search against the WGS database by BLASTN and identified 1,001 CentGm-1 and 712 CentGm-2 monomers. Four-hundred sixty-four intact monomer sequences were randomly chosen, aligned, and a distance tree was constructed. Two major clusters were found (Fig. 3A), supporting the hypothesis that there are two major groups of centromeric repeats in the genome with greater than 90% bootstrap support. These results were supported by parsimony analyses (data not shown). The overall mean distance of monomers within clades was 0.134 6 0.015 and 0.131 6 0.018 for CentGm-1 and CentGm-2, respectively; whereas the mean distance between the monomers of CentGm-1 and CentGm-2 was 0.275 6 0.054. Although a cutoff of 60% sequence identity and 80% match length was employed in the BLASTN searches described above, we did not find any other satellite repeats related to CentGm-1 and CentGm-2. Figure 2. FISH analysis of CentGm-1 repeats during mitosis and CentGm-1 and CentGm-2 repeats on homologous chromosomes. A, CentGm-1 localized to leading edges of chromosomes at mitotic anaphase. CentGm-1 labeled with biotin and detected with AF488 streptavidin (green channel) and chromosomes were counterstained with propidium iodide (red channel); bar = 5 mm. B, Analysis of the distribution of the two centromeric repeats in a pair of homologous chromosomes in soybean. BACs 161O23 and 38L01 labeled with biotin and detected with AF488 streptavidin (green channel, arrows) that cross hybridize to duplicated regions in two homologs  were used in conjunction with CentGm-1 (labeled with digoxigenin and detected with rhodamine anti-digoxigenin, red channel) and CentGm-2 (labeled and detected with both fluorophores to result in yellow). One pair of homologs is marked by CentGm-1 (linkage group E), while the other pair is marked by CentGm-2. Lines show positions of centromeres between homologous chromosomes.

The Organization of Centromeric Repeat Arrays in Soybean
The observation that two pairs of soybean centromeres contain both CentGm-1 and CentGm-2 was intriguing. To further shed light on the organization of centromeric satellite arrays, we conducted three independent fiber-FISH experiments with two slides each using differentially labeled CentGm-1 and CentGm-2 sequences as probes. As illustrated in Figure 1F, all large fiber segments examined (e.g. more than a megabase in size) are composed of either CentGm-1 or CentGm-2, suggesting an absence of extensive rearrangement and reshuffling of CentGm-1 and CentGm-2 within centromeres. This parallels the observation that no WGS clones screened in this study contained both CentGm-1 and CentGm-2 sequences.
We performed computational analyses on five shotgun sequencing clones that contained CentGm sequences on both ends of the clone (three clones with CentGm-1 and two with CentGm-2 on both ends) to determine the level of monomer variation within defined regions (regions the size of a shotgun clone, approximately 4 kb). The amount of variation between monomers from either end of a clone, using the Kimura model, mirrored the variation seen within monomer clades described previously. For CentGm-1 the overall mean distances ranged from 0.096 to 0.136 and for CentGm-2 the two comparisons were 0.076 and 0.096.

Rapid Divergence of Centromeric Satellite Repeats within the Glycine Genus
DNA from a representative set of Glycine species was Southern blotted and probed with both centromeric repeats. Both CentGm-1 and CentGm-2 (data not shown) hybridized only to genomic DNA of the cultigen soybean and its intercompatible annual relative and immediate progenitor, Glycine soja (Fig. 3B). It is most likely that the centromeric repeats are arranged in the same patterns in G. soja as in soybean, as seen in 92/91-bp ladders on Southern blots (CentGm-1 shown in Fig. 3A, lane 1). Ladders arise from tandemly arrayed repeats that either do not completely digest or have sequence mutations such that monomers, dimers, and higher multimers are seen on the Southern. The absence of hybridization to CentGm-1 and CentGm-2 in the other relatives suggests the rapid divergence of centromeric satellite repeats within Glycine, as seen in Oryza (Lee et al., 2005). The faint band seen in Glycine pescadrensis (Fig. 3B, lane 6) is likely due to nonspecific hybridization of high M r DNA, since the same signal is seen with other noncentromeric DNA probes (data not shown).
relative to its phaseoloid legume relatives (e.g. Phaseolus, Vigna), most of which are 2n = 20 or 2n = 22. Other lines of evidence indicate that the soybean genome has undergone two large-scale genome-wide duplication or polyploidization events (Shoemaker et al., 1996;Blanc and Wolfe, 2004;Pagel et al., 2004;Schlueter et al., 2004Schlueter et al., , 2006Walling et al., 2005;Innes et al., 2008). Phylogenetic analyses of homologous gene pairs have shown that the older of the two putative polyploid duplications is shared by Glycine, Medicago, and Lotus, and therefore occurred no later than in their common ancestor (Pfeil et al., 2005;Cannon et al., 2006), which is estimated from chloroplast phylogenies to have existed around 54 million years ago (MYA; Lavin et al., 2005). This date is more consistent with a 46 MYA estimate (Schlueter et al., 2004) than a 16 to 17 MYA estimate (Blanc and Wolfe, 2004), and suggests that the dates of 10 to 15 MYA for the more recent duplication (Schlueter et al., 2004) is probably a closer estimate than the 3 to 5 MYA estimate (Blanc and Wolfe, 2004) as discussed by Shoemaker et al. (2006).
Regardless of the divergence dates of the progenitor genomes, an unresolved question is whether these polyploid events were fundamentally allo-or auto-in nature. Phylogenetic analysis can identify a pattern consistent with allopolyploidy, in which each homolog is most closely related to a different progenitor ortholog. No such pattern has been observed in gene phylogenies of Glycine (Straub et al., 2006). This pattern could be due to autopolyploidy, or, as we suggest is more likely, to the extinction of diploid progenitors (Fig. 4); in the latter case, gene phylogenies simply cannot resolve the issue.
In this study, we employed a novel approach to address this question and found two soybean centromere-specific satellite repeat classes that have mostly nonoverlapping distributions. The presence of two different centromeric repeat classes in the soybean genome suggests the existence of two subgenomes, which were already differentiated from one another cytologically, that were brought together by hybridization. In plants, such fixed hybridity defines genetic allopolyploidy, in which homologous (Huskins, 1932) chromosomes generally do not pair, and thus show disomic segregation. In contrast, genetic autopolyploids possess chromosomes similar enough to pair at meiosis, whether as multivalents or as random pairs of bivalents, and thus show polysomic segregation. This permits continued interaction of homologous sequences across all parental chromosomes, and can lead to recombination and segregational loss of parental sequences. However, even in allopolyploids, homologous sequences can continue to interact: many repeat families show homogenization across nonhomologous chromosomes (concerted evolution, e.g. through ectopic gene conversion). Indeed centromeric repeat families are a classic example of this phenomenon (Alexandrov et al., 1988;King et al., 1995;Galian and Vogler, 2003;Hall et al., 2005), and the similarity of repeats within the CentH-1 and CentH-2 classes, two centromere satellites of allopolyploid Arabidopsis suecica, indicates that these repeats can interact across nonhomologous chromosomes.
The surprising finding in Glycine is the low level of recombination or homogenization of the two subfamilies of satellites, which have persisted in the same genome for at least 5 million years (Fig. 4). The homogenization of centromeric satellite repeats has been found to be a relatively rapid process. For instance, extensive rearrangement and reshuffling of CentO satellite repeats in rice centromeres has occurred within the last half-million years (Ma and Bennetzen, 2006;Ma and Jackson, 2006;Ma et al., 2007). In rice and Arabidopsis, both of which are believed to be paleoployploids or extensive paleoaneuploids (Vision et al., 2000;Vandepoele et al., 2003), only single families of centromeric satellite repeats, e.g. CentO in rice (Dong et al., 1998) and pAL1 in Arabidopsis (Kamm et al., 1995), were found in respective genomes. Even in maize, an allotetraploid The diploid progenitor genomes of Glycine diverged at point 1, which was followed by polyploidy at point 2, which led to the modern chromosome number of 2n = 40 in diploid Glycine species, including soybean. The divergence of these now-extinct x progenitors occurred about twice as long ago as the divergence between the soybean lineage and the lineage of perennial Glycine species. The polyploidy event could have occurred anywhere between points 1 and 3. In A, polyploidy occurred very close to the time of divergence of progenitor genomes, possibly within a single species (taxonomic autopolyploidy); close relationship of progenitor genomes could permit pairing among their chromosomes (genetic autopolyploidy) and could lead to segregational loss of parental sequences or homogenization of repeats across parental genomes. In B, polyploidy occurred considerably later than the divergence of parental genomes; parents likely would be differentiated genetically and taxonomically when hybridization occurred (taxonomic allopolyploidy), also leading to disomic pairing (genetic allopolyploidy). MY, Million years. formed about 5 MYA from two diploid progenitors that diverged from a common diploid ancestor about 5 to 12 MYA (Gaut and Doebley, 1997;Swigonova et al., 2004), only CentC centromeric satellite repeats were identified and found in all centromeric regions of maize (Ananiev et al., 1998;Jiang et al., 2003). This indicates that the centromeric satellite repeats from the two diploid progenitors of maize, which had evolved independently for about 7 million years before their reunion, have been highly homogenized in the maize genome, probably by conversion, intercentromeric DNA exchanges, and numerous intracentromeric rearrangements (Ma and Bennetzen, 2006;Ma and Jackson, 2006).
The fact that the CentGm-1 probe targets more centromeres than CentGm-2 is intriguing. Assuming that the CentGm-1 and CentGm-2 donor genomes contributed equal numbers of chromosomes upon the formation of the polyploid genome and subsequent normal cell division, we would not expect unequal numbers of chromosomes carrying the two satellite repeats. It is possible that the CentGm-1 progenitor had more chromosomes than the CentGm-2 progenitor, although the real scenario cannot be revealed based solely on our current data. This is not a unique observation, since it was observed in several Arabidopsis species that there were multiple centromeric repeats that marked unequal subsets of centromeres (Kawabe and Nasuda, 2006).
The most likely explanation for the unequal distribution of centromere types in soybean is that paleopolyploid Glycine originated as a cross between two now presumed to be extinct 2n = 20 plants (Fig. 4), followed by partial homogenization of one centromeric repeat class by the other. The presence of both repeat classes in two pairs of centromeres suggests that the formation of chimeric tandem repeats, rather than homogenization, has also been the outcome of interactions between chromosomes bearing different repeat classes.
Given the rapid evolution of species-specific centromeric heterochromatin repeat sequences (Lee et al., 2005), it is not initially surprising that there was no hybridization of soybean repeats to genomic DNA of perennial soybean species. However, this observation must be reconciled with the even longer-term maintenance of two repeat classes in the soybean genome. The perennial Glycine species, like soybean, are 2n = 40 (one species is 2n = 38), and share a common ancestor with the soybean lineage roughly 5 MYA, based on silent site divergences of numerous nuclear genes (Innes et al., 2008). Thus, these species should possess centromeric repeats homologous with CentGm-1 and CentGm-2. Assuming clock-like rates of evolution, their CentGm-1-like repeats should be approximately only half as diverged from CentGm-1 as CentGm-1 and CentGm-2 are from each other. The same should be true of their CentGm-2-like repeats. Sequences roughly 90% similar should have been detectable by genomic Southern hybridization, but no hybridization was observed. This could be explained by homogenization between CentGm-1 and CentGm-2 throughout the history of soybean, though at a lower level than homogenization within either repeat family. This could cause both sequence families to diverge concertedly from homologous CentGm-like sequences in perennial relatives. This hypothesis predicts that perennial Glycine species may also possess two clusters of centromeric repeats that would be more similar to one another than to CentGm-1 and CentGm-2.
Because it is unknown how much interaction has occurred between CentGm-1 and CentGm-2 since they were brought together in the same genome, we cannot determine how divergent these repeats were at the time of hybridization. If they were already well differentiated (as suggested by their persistence as separate groups), this would be most consistent with the progenitors belonging to different species, suggesting taxonomic allopolyploidy. The unequal number of centromeres bearing the different repeat types would then be part of the rearrangement process that has led to the scrambling of the soybean genome such that homologous regions are scattered among different chromosomes, consistent with other mapping in soybean Shoemaker et al., 2006). However, the unequal number of homologous centromere classes could also be attributable to random segregational loss of parental chromosomes during a period of tetrasomic chromosome association in a genetic autopolyploid, or the newly formed polyploid Glycine could have been a segmental allopolyploid, with attributes of both auto-and allopolyploidy at different loci. One explanation for the observed centromeric structure in soybean would be some type of allopolyploidy followed by diploidization leading to a present day pseudodiploid, similar to what was recently hypothesized for Boechera holboellii (Kantama et al., 2007). Ultimately, the exact mode of origin and thus the formal distinction is less important than the observation that homologous variation currently exists in the Glycine genome, and that the soybean is a fixed polyploid hybrid for centromeres, as it is for many other loci.

Identification of Centromeric Repeats from Genome Shotgun Sequence Data
A total of 25,082 soybean (Glycine max) shotgun sequences comprising approximately 11.4 MB were used for the de novo identification of repeats using RECON (Bao and Eddy, 2002). The most frequently occurring families from the RECON output were selected and annotated using BLASTN (e = 10 24 ) with the nonredundant database at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). Families with satellite repeats were selected, and all the satellite repeats were combined into a single family. These satellite sequences were then assembled into contigs using PHRAP (http://www.phrap.org/phredphrapconsed.html). The contigs were searched for tandem repeats using the program Tandem Repeats Finder (Benson, 1999). High-copy number repeats were selected from the Tandem Repeats Finder output.

FISH of Centromeric Repeats
Chromosome preparations and FISH followed published procedures (Kato et al., 2004), with the following modifications. Soybean seeds (cv Williams 82) were surfaced sterilized overnight and then germinated for 3 d at 24°C in petri dishes on germination medium. Seedling roots were excised and subjected to pressurized nitrous oxide treatment (Kato, 1999) at room temperature for 55 min to induce mitotic arrest. Roots were then quickly fixed in ice-cold 90% acetic acid for 10 min and washed six times with distilled water. The terminal 2 to 3 mm of each root tip was then excised and individually transferred to microfuge tubes containing 20 mL of enzyme solution containing 1% Pectolyase Y-23 (MP Biomedicals) and 2% Onozuka R-10 cellulase (Research Products International) in citric buffer (10 mM sodium citrate, 10 mM sodium EDTA, pH 5.5) and digested for 55 min in a 37°C water bath. Tubes were transferred to wet ice and washed twice with 70% ethyl alcohol. Root tips were then macerated in the residual approximately 100 mL of liquid using a dissecting probe and then centrifuged at 2,000 relative centrifugal force for 4 s and the supernatant poured off onto a paper towel. Each cell pellet was then resuspended in 35 mL of room temperature glacial acetic acid. To prepare individual slides for hybridization, 5 mL of cell suspension was applied to each glass slide (Gold Seal) in a moistened paper towel-covered humid chamber. Once the solvent had evaporated (5-10 min), slides were to UV crosslinked under optimal crosslink setting in a model XL-1000 UV crosslinker (Spectronics Corporation). Next, 10 mL of blocking solution (23 SSC, 13 Tris-EDTA, 10 mg/mL sonicated salmon sperm DNA) and a flexible plastic coverslip (Fisher Scientific) were applied. Groups of slides were then denatured in a covered tray in a covered boiling water bath for 5 min and then transferred to a chilled metal plate on wet ice. Ten microliters of the following hybridization solution was then applied per slide: 23 SSC, 13 Tris-EDTA, and 0.2 ng/mL (2 ng total) of each fluorescently labeled DNA oligonucleotide (Integrated DNA Technologies). The following oligonucleotide pairs were used in combination: Pr91-C:AGTAAAAAGTTATTGTCGTTTGAATTT (CentGm-2) and Pr92-C:AGTTAAAAGTTATGACCATTTGAATTT (CentGm-1). The Pr91 oligonucleotide was labeled with red fluor (5# TEX 615) and the Pr92 oligonucleotide was labeled with green fluor (5# 6-FAM). Slides were hybridized overnight at 55°C in a humidified hybridization chamber. For posthybridization washes, slides were first dipped in room temperature 23 SSC to remove cover glasses and then washed for 5 min in 23 SSC at 55°C for 5 min in a 55°C air incubator. Wash solution was removed and chromosome spreads were covered with 10 mL of 4#,6-diamino-2-phenilinodole (DAPI) in Vectashield mounting medium (Vector Laboratories) and a glass coverslip.
Images were collected on an Olympus BX61 microscope using Applied Spectral Imaging (Vistas) software and a COOL-1300QS CCD camera (VDS Vosskü hler). Raw TIF format image files were imported into Adobe Photoshop CS2; the resolution was then increased from 72 to 200 dpi. Images were converted from 8-to 16-bit mode. Next, using the levels menu, cytoplasmic background was subtracted using the Set Black Point tool, and images were converted back to 8-bit mode. To construct the final images, the blue (DAPI) channel was removed and replaced with 100% black fill and then reintroduced as a separate grayscale layer that was then set to 25% opacity, thereby converting chromosomes from blue to gray.
For meiotic chromosome and DNA fiber FISH, plants (cv Williams 82) were grown under standard greenhouse conditions (16-h daylength and 27°C daytime temperature). Florets were collected for meiotic chromosome preparations according to Walling et al. (2005) and stored in Carnoy's solution at 4°C until used to prepare chromosome spreads. Nuclei extraction and fiber FISH were performed as described previously .
Plasmid clones of CentGm-1 and CentGm-2 were purified using Qiagen miniprep kit according to manufacturer's instructions. Approximately one microgram of purified plasmid DNA was labeled with either digoxigenin or biotin using Nick translation kits (Roche). The DNA labeling reaction was kept at 15°C for 2 h and then cleaned using Qiagen PCR columns.
FISH of plasmid clones on DNA fibers (fiber-FISH) was performed as previously described . AlexaFluor (AF) 488 streptavidin (Invitrogen) detected the biotin label. This signal was amplified by layering goat anti-streptavidin conjugated with biotin (Vector Laboratories) followed by a second application of AF488 streptavidin. Digoxigenin labels were detected using mouse anti-digoxigenin (Roche) followed by AF568 antimouse (Invitrogen). Mitotic and meiotic chromosome FISH was performed as previously described (Jiang et al., 1995). Biotin-labeled probes were detected using a single layer of AF488 streptavidin, and the digoxigenin-labeled probes were detected using a single layer of sheep anti-digoxigenin conjugated with rhodamine (Roche). Posthybridization formamide treatments and stringency washes were the same. FISH images were captured using a Photometrics Cool Snap HG camera attached to a Nikon Eclipse 80i fluorescence microscope. Images were adjusted and analyzed using Metamorph (Universal Imaging). Further cropping and labeling of images was performed using Adobe Photoshop CS v. 8.0 for Macintosh.

Computational Analysis of Centromeric Repeats
CentGm-1 and CentGm-2 were used to query the soybean genome shotgun sequences using BLASTN at default parameters to get 1,337 and 1,260 hits, respectively. These hits were parsed using cutoffs of 80% length and 60% sequence identity, which reduced the hit number to 1,001 and 712, respectively. These criteria were chosen so as to retain potentially diverged sequences. Since CentGm-1 and CentGm-2 share approximately 80% similarity with each other, we expected both sets of sequences to be represented at this stringency. Sequences corresponding to these hits were extracted using custom PERL scripts. Since the minimum length was 80%, some of the hits were truncated, and we therefore chose a random set of 464 nontruncated sequences. The 464 sequences were aligned using ClustalX (Thompson et al., 1994) using default options. The alignment was manually edited in Jalview (Clamp et al., 2004) and a neighbor-joining (Saitou and Nei, 1987) tree was constructed from this alignment using a 1,000 replicate bootstrap analysis (Kumar et al., 2004). FigTree version1.2.2 (http://tree.bio.ed.ac.uk/software/ figtree/) was used to draw and view the unrooted neighbor-joining tree.
To determine between the two clusters, a single consensus sequence for each cluster was created using multiple sequence alignments from the program Emma (http://www.hku.hk/bruhk/emboss/emma.html), an interface to the ClustalW program. The consensus sequences were then manually curated and aligned to each other using MEGA (Kumar et al. 2004). Distance between the two clusters was calculated using the Kimura-2 parameter model and complete deletion option.

Southern Analysis
Plant genomic DNA was extracted from young leaf tissue using a standard cetyl trimethyl ammonium bromide extraction protocol. For each species, 1 mg of plant genomic DNA was restriction digested with 6 units of HindIII (New England Biolabs) in a 37°C water bath overnight and separated on a 0.8% agarose gel. DNA from the gels was blotted onto Zeta-Probe GT genomic tested blotting membrane (Bio-Rad). The membrane was prehybridized for at least 30 min in Church hybridization buffer (1% bovine serum albumin/1 mM EDTA/7% SDS/0.5 M sodium phosphate) at 58°C. Probes were prepared using the Rediprime II random prime labeling system (Amersham Biosciences). Before the probes were used for hybridization, they were purified using the QIAquick nucleotide removal kit (Qiagen). The probe was hybridized to the membrane at 58°C overnight. After hybridization, the membrane was washed in 1.53 SSC/0.1% SDS for 30 min at 58°C, then in 13 SSC/0.1% SDS for 30 min. The membrane was exposed overnight to a Fujifilm BAS-MS imaging plate and digitally scanned using a Fuji FLA-5000 bio imaging analyzer.
Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers CL866971 to CL866979, CL866987 to CL868441, CL876820 to CL876828, CL877016 to CL884613, and CL884615 to CL900625.