- © 2005 American Society of Plant Biologists
Abstract
Tomato (Lycopersicon esculentum) is a model species for molecular biology research and a candidate for large-scale genome sequencing. Pericentromeric heterochromatin constitutes a large portion of the tomato chromosomes. However, knowledge of the structure, organization, and evolution of such regions remains very limited. Here, we report the analysis of a 198-kb sequence near the FER gene, located in a distal part of pericentromeric heterochromatin on the long arm of tomato chromosome 6. Nine genes, one pseudogene, and 55 transposable elements (TEs) were identified, showing a low gene density (19.8 kb/gene) and a high content of transposable elements (>45% of the sequence). Six genes (56B23_g3, g5, g7, g8, g9, and g10) have perfect matches (>98% identity) with tomato expressed sequence tags. Two genes (56B23_g1 and g6), which share <98% sequence identity with expressed sequence tags, were confirmed for transcriptional activity by reverse transcription-PCR. The genes were not uniformly distributed along the sequence and grouped into gene islands separated by stretches of retrotransposons, forming a pattern similar to that found in the gene-rich regions of the large genomes of maize (Zea mays) and Triticeae. Long terminal repeat retrotransposons account for 60% of the TE sequence length. Sixteen of 55 TEs were completely new and remain unclassified. Surprisingly, five of the seven identified DNA transposons were closely associated with coding regions. The action of transposable elements and DNA rearrangements form the molecular basis of the dynamic genome evolution at the FER locus. Multiple rounds of genome duplication in Arabidopsis (Arabidopsis thaliana) and subsequent gene loss have generated a mosaic pattern of conservation between tomato and Arabidopsis orthologous sequences. Our data show that the distal parts of pericentromeric heterochromatin may contain many valuable genes and that these regions form an evolutionarily active part of the tomato genome.
The chromosomes of higher eukaryotes are cytologically characterized by the presence of lightly and strongly staining material called euchromatin and heterochromatin, respectively. In plant genomes, euchromatic DNA contains most of the active genes and heterochromatin was generally thought to be transcriptionally silent or devoid of genes. Heterochromatic DNA is mainly confined to centromeric regions and to variable lengths of DNA flanking centromeres of each chromosome, called pericentromeric regions. While centromeric regions are essential for correct segregation of chromosomes during mitosis and meiosis, the function of pericentromeric regions in plants remains unknown. In the Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) genomes, centromere regions consist of specific tandemly repeated satellite DNA, occasionally interrupted by the insertion of Gypsy-type long terminal repeat (LTR) retrotransposons (Copenhaver et al., 1999; Wu et al., 2004; Zhang et al., 2004), while pericentromeric regions are rich in middle repetitive elements, including transposons, retrotransposons, and pseudogenes (Copenhaver et al., 1999). Surprisingly, more than 200 expressed genes in the pericentromeric regions of the Arabidopsis genome (Copenhaver et al., 1999) and several active genes within the rice chromosome 8 centromere (Nagaki et al., 2004) have been discovered, indicating that the transcription of these genes is not affected by the heterochromatic environment.
The Solanaceae family contains many economically important species such as tomato (Lycopersicon esculentum), potato (Solanum tuberosum), tobacco (Nicotiana tabacum), eggplant (Solanum melongena), pepper (Capsicum spp.), and petunia species. Tomato is the model species in Solanaceae as it has a diploid genome (2n = 24), the smallest genome size (953 Mb) among Solanaceae (Arumuganathan and Earle, 1991), a high-density genetic map (Tanksley et al., 1992), and a large mutant collection (http://tgrc.ucdavis.edu). Cytologically, tomato pachytene chromosomes show a typical morphology. Wide chromatic regions were observed on each side of the achromatic structure of the centromere, while distal regions of chromosome arms stain very lightly (Brown, 1949; Barton, 1950). In tomato, chromatic regions are pericentromeric heterochromatin consisting of a mosaic of highly and lightly chromosome-specific chromatic zones. In contrast to Arabidopsis and rice, tomato chromosomes have wide blocks of pericentromeric heterochromatin accounting for 77% of the nuclear DNA (Peterson et al., 1996), while euchromatin is limited to small contiguous stretches in the distal parts of the chromosome arms. A similar dominance of heterochromatin in genome organization was observed in Medicago truncatula, a model genome for legumes, for which pericentromeric heterochromatin represents 60% of the whole genome (Kulikova et al., 2001). Very little is known about the composition and organization of heterochromatin regions in the tomato genome. In Arabidopsis and rice as well as in numerous other plant species, heterochromatin DNA is mainly composed of repetitive DNA sequences. Considering cytological observations, a large amount of repetitive DNA could be expected in the tomato genome. 
In contrast, DNA reassociation experiments indicated that most of the tomato genome (73%) is composed of single-copy sequences, suggesting that many of them must reside in the heterochromatin regions (Peterson et al., 1998). Analysis of 1,205 bacterial artificial chromosome (BAC) end sequences has confirmed that the tomato genome is composed of 70% unique or uncharacterized sequences and that repetitive sequence accounts for around 12% (Budiman et al., 2000). Initial deletion mapping experiments indicated that pericentromeric heterochromatin regions might contain few genes. However, there is increasing evidence that these regions contain important genes, such as the root-knot nematode resistance gene Mi (van Daelen et al., 1993), the tobacco mosaic virus resistance gene Tm-2A (Motoyoshi et al., 1996), and the Jointless-2 gene (Budiman et al., 2004). Altogether, these results suggest an atypical composition and structure of pericentromeric heterochromatin in tomato.
Recently, the FER gene involved in the iron uptake of roots has been identified and isolated from the tomato BAC clone 56B23, which was mapped close to the centromere of chromosome 6 (Ling et al., 2002). Here, we report fluorescence in situ hybridization (FISH) experiments that show that 56B23 resides in the pericentromeric heterochromatin. Sequence analysis of 198 kb of the BAC 56B23 revealed a complex structure of active genes embedded in a transposon-rich environment. DNA rearrangements in tomato and multiple duplications in Arabidopsis were the mechanisms responsible for the mosaic organization of conservation between tomato and Arabidopsis.
RESULTS
Pericentromeric Localization of BAC 56B23 on Chromosome 6 by FISH
The tomato BAC clone 56B23 that carries the FER gene previously had been mapped to a genetic interval between markers TG590 and TG118 at a distance of approximately 10 cM from the chromosome 6 centromere (Tanksley et al., 1992; Ling et al., 1996). To detect its physical location on the chromosome, we labeled the DNA isolated from 56B23 and hybridized it to pachytene chromosomes. Cytologically, 56B23 resides in a deeply staining region of the distal part of the pericentromeric heterochromatic region on the long arm of chromosome 6 (Fig. 1). The length from 56B23 to the centromere is about one-sixth of the total length of chromosome 6.
Figure 1. Physical mapping of BAC 56B23 by pachytene chromosome FISH. A, The BAC 56B23 (green signal) is mapped to the distal part of the heterochromatic region on the long arm of chromosome 6; Cent indicates centromere. Chromosomes were stained with 4′,6-diamidino-2-phenylindole and pseudocolored red. B, Inverted grayscale image of the same chromosome as in A. The black portion represents the heterochromatic region of the chromosome. Bars = 5 μm.
Sequence Organization of BAC 56B23 from Tomato
The 198 kb of the BAC 56B23 was completely sequenced (accession no. AY678298). The overall GC content of 33.8% and a coding region GC content of 42.5% were similar to tomato genomic sequences analyzed in previous studies (Messeguer et al., 1991; Ku et al., 2000; Mao et al., 2001; Van der Hoeven et al., 2002). While coding regions accounted for 4.5% of the total sequence, putative transposable elements (TEs) and nonannotated sequences represented 45.5% and 50% of the whole BAC sequence, respectively.
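The GC content figures quoted above are simple base-composition ratios. A minimal sketch of how such a value could be computed is given below; the function name `gc_content` and the toy sequence are illustrative assumptions and are not taken from the BAC data or from the authors' pipeline.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


# Hypothetical toy sequence, used only to illustrate the calculation.
toy = "ATGCATATTA"
print(f"GC content: {gc_content(toy):.1%}")
```

Applied separately to the full BAC sequence and to the concatenated coding regions, a function of this kind would give the two values compared in the text.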
Nine predicted genes and one pseudogene were identified, giving a gene density of 19.8 kb/gene (Table I; Fig. 2). While two predicted genes (56B23_g6 and g10) are intron free, seven genes (56B23_g1, g2, g3, g5, g7, g8, and g9) have multiple exons, ranging from two to nine exons (Table I). All the predicted genes were found to be highly conserved in the Lycopersicon or Solanum expressed sequence tag (EST) databases (Table I), supporting the gene prediction. Six of them (g3, g5, g7, g8, g9, and g10) have perfect matches with tomato ESTs (>98% sequence identity). The genes g1 and g6 share 91% and 95% identity with tomato ESTs (Table I), while the best EST hit of 56B23_g2 was found in potato (CK256320, 94.6% identity). The three genes with lower identities to ESTs were further analyzed by reverse transcription (RT)-PCR for confirmation of active transcription. Transcripts of 56B23_g1 and 56B23_g6 were identified, while the transcription of 56B23_g2 was not observed under our experimental conditions (data not shown), indicating that the expression of 56B23_g2 is possibly low, is induced under specific conditions, or is limited to a specific tissue or cell type. These results suggest that the annotated genes or closely related paralogs are transcriptionally active except for 56B23_g4, which is predicted to be nonfunctional. This pseudogene contains the remnant of one exon partially similar to an internal part of an Arabidopsis gene (84% amino acid identity with CAA19721, between positions 90–134 of 211). 56B23_g5 spans 16.9 kb of sequence. The unusually large size is mostly due to the size of the first intron (12.6 kb), which was caused by the insertion of three transposable elements (Fig. 2).
Table I. Predicted genes in the BAC clone 56B23
Le, Tomato; Ns, Nicotiana sylvestris; St, potato; id., nucleotide identity.
Figure 2. Physical map of the 198-kb sequence of BAC clone 56B23. Black and gray boxes represent exons and introns of predicted genes and arrowheads indicate the transcriptional orientation of genes. The FER gene is indicated in red. (p) and asterisk indicate partial gene and pseudogene, respectively. Colored boxes represent the different types of TEs according to data included in Supplemental Table I. Horizontal red lines indicate the local duplication in the upstream region of the FER gene and horizontal gray lines indicate stretches of LTR retrotransposons. Numbers in brackets indicate the estimated date of LTR retrotransposon insertions according to Table III.
The predicted genes were mainly distributed in three locations within BAC 56B23. The proximal part of the BAC contained the partial 56B23_g1 gene. The middle part included four genes (56B23_g2 to g5) and covered a region from 35 kb to 88.5 kb with a gene density of 13.3 kb/gene. The distal part grouped five genes (56B23_g6 to g10) in a sequence of approximately 32 kb and here the gene density reaches up to 6.4 kb/gene. These results show that the genes are nonuniformly distributed along this tomato BAC clone.
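The gene densities quoted for the whole BAC and for the two gene islands follow from dividing each interval length by its gene count. The short sketch below redoes that arithmetic with the lengths and counts given in the text; the dictionary labels and the helper name `gene_density` are illustrative only.

```python
def gene_density(length_kb: float, n_genes: int) -> float:
    """Return the average number of kb per gene for a region."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return length_kb / n_genes


# (region length in kb, number of genes), as stated in the text.
regions = {
    "whole BAC (9 genes + 1 pseudogene)": (198.0, 10),
    "middle island (35 to 88.5 kb, g2 to g5)": (88.5 - 35.0, 4),
    "distal island (about 32 kb, g6 to g10)": (32.0, 5),
}

densities = {name: gene_density(*values) for name, values in regions.items()}

for name, value in densities.items():
    print(f"{name}: {value:.1f} kb/gene")
```

The middle island evaluates to 13.375 kb/gene, close to the 13.3 kb/gene reported; the whole-BAC and distal values reproduce 19.8 and 6.4 kb/gene.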
In addition to predicted genes, a high density of repetitive DNA was observed. In total, 55 TEs were identified, accounting for approximately 93 kb of sequence and representing 45.5% of the whole BAC sequence (Table II; Supplemental Table I). The majority of all identified TE sequences fell into the class I retrotransposon group. These elements represented approximately 28% of the whole BAC sequence and 59.9% of all the TE sequences identified in this analysis. The second most important family was the unclassified elements, representing 20.7% of all TE sequences. The class II group was the smallest with 21 elements divided into three types (transposons, foldbacks, and miniature inverted-repeat transposable elements [MITEs]). They contributed 9% (seven elements accounting for 8.4 kb), 9.6% (10 elements for 8.9 kb), and 0.8% (four elements for 0.7 kb) of all the annotated TE sequences, respectively (Table II).
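As a consistency check on the class II figures, each type's share of the annotated TE sequence can be recomputed from the lengths given above. The sketch assumes the approximate total of 93 kb of TE sequence as the denominator; variable and function names are illustrative.

```python
TE_TOTAL_KB = 93.0  # approximate total TE sequence stated in the text

# Sequence length (kb) per class II type, as stated in the text.
class2_kb = {"transposons": 8.4, "foldbacks": 8.9, "MITEs": 0.7}


def share_of_te(length_kb: float, total_kb: float = TE_TOTAL_KB) -> float:
    """Return a length as a percentage of the total annotated TE sequence."""
    return 100.0 * length_kb / total_kb


shares = {name: round(share_of_te(kb), 1) for name, kb in class2_kb.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}% of TE sequence")
```

With these inputs the shares round to 9.0%, 9.6%, and 0.8%, in line with the values reported for transposons, foldbacks, and MITEs.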
Table II. Summary of identified TEs in the tomato BAC clone 56B23
The 18 identified retrotransposons can be further subdivided into LTR and non-LTR retrotransposons. LTR retrotransposons were classified into Copia and Gypsy types according to the specific position of integrase and transcriptase-RNaseH genes (Kumar and Bennetzen, 1999). Here, five members each of the Copia and Gypsy types were found. Only three LTR retrotransposons, Silvia (Copia), Caterina-1, and Caterina-2 (Gypsy), appeared to be complete elements (Fig. 2). Although they contain sequences similar to known plant retrotransposon genes, these complete elements were all found to be defective. Accumulated mutations were responsible for stop codons in the reading frames as well as for numerous frameshift mutations. The other retrotransposons were found to be partial (conservation of one end of the element only), fragmented (conservation of internal sequences), or degenerated (conservation of small parts of the element). Three putative terminal repeat in miniature (TRIM) elements (Witte et al., 2001) and three putative soloLTRs were also predicted.
A classification of retrotransposons into subfamilies was carried out using the nucleotide conservation of LTRs and internal parts of the elements (Supplemental Table I). With the exception of the three Caterina elements, each LTR retrotransposon belonged to an independent subfamily. The Toto1-like TRIM element was found highly homologous to a complete and autonomous retrotransposon identified earlier (Toto1; AF220602) in genomic sequences from currant tomato (Lycopersicon pimpinellifolium). This result suggests that TRIM elements might be generated from complete elements of the same family by unequal recombination events and might coexist as partial and complete elements in the same species. The Tina element, a degenerated and partial long interspersed nuclear element (LINE) was found inserted in the first intron of the gene 56B23_g5. The Tina element itself was disrupted by the insertion of the foldback element Ajax (Fig. 2). Altogether, our data indicate a high diversity of retrotransposon families in tomato.
The 18 class I retrotransposons described in this study were mainly clustered within three intergenic regions. The proximal part of the BAC (between 56B23_g1 and g2), the middle part (between 56B23_g5 and g6), and the very distal part (downstream of 56B23_g10) contained five LTR retrotransposons within a distance of 24.5 kb, five LTR retrotransposons within a distance of 42.7 kb, and three LTR retrotransposons within a distance of 4 kb, respectively (Fig. 2). The remaining five retrotransposons were dispersed throughout the BAC clone. These results indicate that LTR retrotransposons are not uniformly distributed but clustered in long stretches of sequence, contributing to the increase of the intergenic distances. The divergence between the two LTRs of the same element was used to estimate the relative age of LTR retrotransposon insertion (SanMiguel et al., 1998). Of the 17 LTR retrotransposons identified, only six had intact LTRs at both ends. The nucleotide identity of LTRs in these complete elements ranged from 89.2% for Monica to 98.9% for Silvia. Two different groups of insertion times were found. While five elements showed ancient insertion times from 5 to 8 million years ago (MYA), only one retrotransposon, Silvia, was estimated to have integrated recently (0.74 MYA; Table III). The full-length element Silvia is nested in the degenerated Copia LTR retrotransposon Anouck, demonstrating that the insertion of Silvia was a more recent event than the Anouck insertion. Although the insertion time of degenerated and truncated elements cannot be estimated, we conclude that successive insertion of LTR retrotransposons has shaped this region of tomato chromosome 6.
Table III. Estimated times of retrotransposon insertion
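The dating approach rests on the fact that the two LTRs of an element are identical at insertion and diverge afterward, so that the insertion time is T = K / (2r), with K the distance between the LTRs and r the substitution rate. The sketch below illustrates this relation only. The Jukes-Cantor correction and the rate of 6.5e-9 substitutions per site per year are assumptions made for the example; neither the distance model nor the rate behind Table III is stated in this section, so the output should not be expected to reproduce the reported dates.

```python
import math

# Assumed substitution rate (per site per year); not stated in the text.
ASSUMED_RATE = 6.5e-9


def jukes_cantor_distance(identity: float) -> float:
    """Convert LTR nucleotide identity (0 to 1) into a Jukes-Cantor distance."""
    p = 1.0 - identity  # observed proportion of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("identity must be in the interval (0.25, 1.0]")
    if p == 0.0:
        return 0.0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_mya(identity: float, rate: float = ASSUMED_RATE) -> float:
    """Estimate insertion time in million years as T = K / (2 * rate)."""
    k = jukes_cantor_distance(identity)
    return k / (2.0 * rate) / 1e6


# LTR identities quoted in the text for the two extreme elements.
for name, identity in (("Silvia", 0.989), ("Monica", 0.892)):
    print(f"{name}: about {insertion_time_mya(identity):.2f} MYA (assumed rate)")
```

Whatever rate is chosen, the ordering is preserved: the element with the more similar LTRs (Silvia) is inferred to be the younger insertion.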
The 56B23 BAC clone contains 21 elements related to class II TEs, subclassified into transposons, foldback, and MITE repeat types. Transposons were identified by sequence similarities with known plant transposon nucleotide sequences and transposase proteins. In addition to sequence similarity, putative foldbacks and MITEs were identified on the basis of the absence of coding capacities, the presence of well conserved terminal inverted repeat (TIR) structures at both ends of putative TEs, the presence of target site duplications flanking the TIRs, and/or the presence of multiple copies in public sequence databases. According to their length, these TEs were subclassified into foldback (>300 bp) and MITE (<300 bp) types.
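The length rule used to separate foldbacks from MITEs can be written as a one-line classifier. In this sketch the treatment of an element of exactly 300 bp is an assumption, since the text gives only ">300 bp" and "<300 bp"; the function name is hypothetical. The rule applies only to elements that already meet the structural criteria listed above (no coding capacity, conserved TIRs, target site duplications).

```python
FOLDBACK_MIN_BP = 300  # threshold from the text; handling of exactly 300 bp is assumed


def classify_by_length(length_bp: int) -> str:
    """Label a TIR-bearing, noncoding element as 'foldback' or 'MITE' by length."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return "foldback" if length_bp >= FOLDBACK_MIN_BP else "MITE"


# Example lengths in bp, chosen for illustration.
for length in (44, 273, 1500):
    print(length, classify_by_length(length))
```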
Seven transposons were classified into the four subgroups Sol3 (Oosumi and Belknap, 1997), POGO-like (Feschotte and Mouches, 2000), Alien-like (Pozueta-Romero et al., 1996), and Tam3-like (Rubin et al., 2001), ranging in size from 440 to 2,152 bp (Supplemental Table I). Two of the seven transposons identified (Thomas-1 and Thomas-2) were considered to be complete elements. Although their TIRs and target site duplications could not be clearly identified, these two transposons carried transposase-like genes. Five of the seven identified transposons were located less than 1.4 kb from the 5′ or 3′ end of genes (Fig. 2). Interestingly, the Sol3 nonautonomous transposon Sol3-1 was found in the promoter region of 56B23_g2 at a distance of 0.3 kb from the predicted transcription initiation site. These results support the observations of a close association between class II TEs and coding regions (Wessler, 1998). Ten different types of foldback elements were classified. All of the foldback elements are short, ranging in size from 0.3 kb for Trojan to 1.5 kb for Kalypso. A notable accumulation of four elements (Cyclop, Kalypso, Ulysses, and Penelope) was observed within an interval of 8.3 kb in the upstream region of the FER gene (56B23_g3, Fig. 2). Four MITEs, ranging in size from 44 to 273 bp, were also identified.
The tomato BAC clone also contains 16 putative TEs showing neither coding capacities nor common structural features of TEs classified previously. The unclassified elements have been identified by the presence of multiple complete or partial copies in available Solanaceae genomic sequences (Supplemental Table I). In some cases, the observation of the disruption of known nucleotide structures led to the identification of unclassified elements (XB, Fig. 2).
DNA Rearrangements
We have found evidence of DNA rearrangements linked to TEs in the tomato sequence studied. Of the 17 LTR retrotransposons identified, only six (including TRIM elements) are intact elements. The presence of truncated elements and soloLTRs in the remaining LTR retrotransposons indicates multiple DNA rearrangements such as unequal and illegitimate recombination similar to those observed in Arabidopsis, rice, maize (Zea mays), and Triticeae genomes (Devos et al., 2002; Ma et al., 2004).
DNA rearrangements were also observed outside TEs. The inactive 56B23_g4 gene showed deletions in coding regions in both 5′ and 3′ sequences, and a duplication of a segment of approximately 2 kb occurred in the 5′ region of the FER gene (56B23_g3, Fig. 2). The duplicated segments, called DR1 and DR2, were in direct orientation and separated by 2.3 kb. While DR1 showed a size of approximately 2 kb, DR2 covered a distance of 7 kb, representing a size expansion of 3.5-fold. Nucleotide alignments between DR1 and DR2 revealed that the DR2 region was fragmented into four well-conserved segments (88.6% to 93.1% nucleotide identity). DR2 segments were separated by the insertion of transposable elements that were not present in the DR1 counterpart. Comparative analysis of the two duplicated segments allowed us to propose a model of genome evolution in the FER gene region (Fig. 3). The ancestral locus of the FER gene underwent at least two insertions before the duplication. Two putative unclassified elements (XF-1 and XG-1) were nested within the ancestor of the DR1 segment (Fig. 3A). Thomas-1 was also inserted 0.5 kb from the 3′ end of the FER gene. In the second step, the duplication of DR1 generated the DR2 segment (Fig. 3B). DR2 was then disrupted by four distinct TEs, Sol3-2 (Sol3 transposon) and Kalypso, Ulysses, and Penelope (foldbacks), increasing the size of the segment to 7 kb. Altogether, the sequence analysis revealed the presence of 11 complete, partial, or disrupted elements within a distance of 15.3 kb surrounding the FER gene.
Figure 3. Model for the evolution of the upstream region of the FER gene. A, Step 1, two unclassified elements XF-1 and XG-1 inserted within the promoter region of the FER gene. B, Step 2, a fragment of 2 kb located in the promoter region of the FER gene was duplicated in direct orientation, creating a paralogous segment carrying copies of the XF-1 and XG-1 unclassified transposable elements (XF-2 and XG-2, respectively). Two additional transposable elements (Cyclop and Tapir-like) have been inserted between the two paralogous segments. C, Step 3, four TEs (Kalypso, Sol3-2, Ulysses, and Penelope) inserted within the duplicated segment DR2 increasing the size of the fragment from 2 kb to 7 kb.
Microcolinearity between the Tomato BAC 56B23 and the Arabidopsis Genome
Nine genes and one pseudogene annotated in the tomato BAC 56B23 were used as queries for BLASTP searches against a local database composed of all putative annotated proteins from Arabidopsis. The number of Arabidopsis homologs ranged from one for 56B23_g1 to more than 250 for 56B23_g2 (Supplemental Table II). The genomic location of homologs led to the identification of three Arabidopsis BAC clones from chromosomes 2, 3, and 5. None of them were located close to Arabidopsis centromeres. Each carried several coding regions that are homologous to annotated genes present on tomato BAC 56B23. Homologs of 56B23_g9 and g10 were found conserved in the same orientation between tomato and Arabidopsis chromosome 2. In tomato, the two genes were separated by a distance of 5.16 kb on 56B23, whereas the Arabidopsis homologs At2g29960 and At2g29950 were separated by a distance of 2.78 kb (Supplemental Table II). This represented a limited intergenic expansion in tomato by a factor of 1.8. Five tomato genes (56B23_g1, g2, g7, g8, and g9) separated by a distance of approximately 187 kb on 56B23 showed homology to four Arabidopsis genes (At3g55960, At3g55950, At3g55940, and At3g55920) in a region of approximately 19 kb on chromosome 3 (Supplemental Tables II and III). Comparison of distances indicates an expansion by a factor of approximately 10 in tomato. Although the gene order was strictly conserved, the transcriptional orientation of the tomato gene 56B23_g2 was reversed compared to its Arabidopsis homolog (At3g55950). Moreover, a gene duplication has occurred in tomato (56B23_g7 and g8, phosphoinositide-phospholipase), whereas only one such gene was present in the Arabidopsis At3g55940 locus. Three tomato genes (56B23_g7, g8, and g9) separated by a distance of approximately 25 kb showed homology to four Arabidopsis genes (At5g58700, At5g58690, At5g58670, and At5g58710) in a region of 15 kb on chromosome 5 (Supplemental Tables II and III). This represents an expansion of distance by a factor of 1.6 in tomato. Here, three Arabidopsis genes were found similar to the tomato duplicated genes 56B23_g7 and g8.
The remaining tomato genes, which did not show any homologs in the three regions on Arabidopsis chromosomes 2 (At2g29960-At2g29950), 3 (At3g55960-At3g55920), and 5 (At5g58670-At5g58710), have homologs located elsewhere in the Arabidopsis genome (Supplemental Table II). However, no additional colinearity was found between Arabidopsis and the tomato BAC 56B23. Our results indicate conservation of microcolinearity between the distal and proximal parts of the tomato BAC clone and several Arabidopsis genomic segments located on different chromosomes. The central part of the tomato BAC, encompassing four predicted genes (56B23_g3 to g6), among them the FER gene (56B23_g3), was not found to be colinear with any Arabidopsis segment.
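The expansion factors quoted for the three colinear segments are ratios of the tomato distance to the corresponding Arabidopsis distance. The sketch below recomputes them from the distances given in the text; because several of those distances are stated only approximately, the ratios agree with the reported factors (1.8, about 10, and 1.6) only to within rounding. The labels and the function name are illustrative.

```python
def expansion_factor(tomato_kb: float, arabidopsis_kb: float) -> float:
    """Return how many times longer the tomato interval is than its counterpart."""
    if arabidopsis_kb <= 0:
        raise ValueError("arabidopsis_kb must be positive")
    return tomato_kb / arabidopsis_kb


# (tomato distance, Arabidopsis distance) in kb, as stated in the text.
segments = {
    "chromosome 2 (g9 to g10 intergenic)": (5.16, 2.78),
    "chromosome 3 (g1 to g9)": (187.0, 19.0),
    "chromosome 5 (g7 to g9)": (25.0, 15.0),
}

factors = {name: expansion_factor(*kb) for name, kb in segments.items()}

for name, factor in factors.items():
    print(f"{name}: expansion of about {factor:.2f}x in tomato")
```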
To investigate the mechanisms of genome evolution at the origin of the mosaic conservation of the colinearity, the three colinear Arabidopsis segments were subjected to a detailed analysis. Two of the three Arabidopsis regions that are colinear with tomato fall into the “recent” duplication of the Arabidopsis genome that occurred before the Arabidopsis/Brassica rapa split, 24 to 40 MYA (Blanc et al., 2003). Whereas the chromosome 2 At2g29960-At2g29950 region was not found duplicated, the chromosome 5 (At5g58670-At5g58710) and 3 (At3g55960-At3g55920) regions were found duplicated in chromosomes 3 and 2, respectively (Supplemental Fig. 4). To characterize the duplicated segments, 100 predicted genes of Arabidopsis were extracted from duplicated blocks surrounding the At5g58670-At5g58710 and At3g55960-At3g55920 regions. Genes were translated into proteins and were compared with themselves using the BLASTP algorithm. Results confirmed the segmental duplications between chromosomes 2 and 3, surrounding At3g55960-At3g55920, and between chromosomes 3 and 5, surrounding At5g58670-At5g58710 (Supplemental Fig. 4). Only two genes located on the paralogous segment around At5g58670-At5g58710 (At3g47220 and At3g47290, Supplemental Table II) were found similar to genes in the tomato BAC 56B23 (56B23_g7 and g8). These results indicate that few of the genes colinear between tomato and the Arabidopsis At5g58670-At5g58710 and At3g55960-At3g55920 regions were conserved on their paralogous segments after the large duplication, as observed at the ovate locus on tomato chromosome 2 (Ku et al., 2000). These observations suggest an extensive gene loss in the duplicated segments of Arabidopsis, which is mainly responsible for the mosaic conservation of colinear genes observed between tomato and Arabidopsis.
DISCUSSION
Low Gene Density and High TE Content Characterize the FER Gene Region
The tomato BAC clone 56B23 carrying the FER gene genetically mapped about 10 cM from the centromere of chromosome 6 (Ling et al., 1996, 2002). Cytologically, it is located in the distal part of the pericentromeric heterochromatin of the chromosome 6 long arm. Sequence analysis of the 198 kb of the 56B23 BAC identified nine genes and one pseudogene, giving a gene density of 19.8 kb/gene. This is low compared to previous genomic sequencing studies in tomato, in which the gene density ranged from 5 to 17 kb/gene (Rossberg et al., 2001; Van der Hoeven et al., 2002). Compared to Arabidopsis, it is similar to the gene density (20 kb/gene) in the centromeric region of chromosome 2 and lower than the gene density (11–14 kb/gene) in the pericentromeric region (Copenhaver et al., 1999). In contrast to low gene density, a high density of TEs was identified, with TEs accounting for 45.5% of the whole sequence of the BAC clone. This is much higher than estimates based on the complete sequences of gene-containing BAC clones and the random sequence of BAC ends (Budiman et al., 2000). Similarly, TEs were found significantly more abundant in pericentromeric heterochromatin regions than on chromosome arms (Copenhaver et al., 1999). Because BAC 56B23 was located by FISH mapping in the distal part of the pericentromeric region, we suggest that this segment might be a transition region physically separating the gene-rich euchromatin from the centromeric heterochromatin regions of the chromosome 6.
Most of the identified genes of the FER locus were shown to be active or potentially active, despite a heterochromatic environment. The presence of such heterochromatic genes has been shown in several organisms including Drosophila melanogaster, Arabidopsis, and rice. In D. melanogaster and Arabidopsis heterochromatin regions, 453 annotated genes and 200 expressed genes have been identified, respectively (Copenhaver et al., 1999; Hoskins et al., 2002). In D. melanogaster, where heterochromatic genes have been initially studied, it appears that a heterochromatin environment is necessary for the expression of heterochromatic genes (Weiler and Wakimoto, 1995). It is not known if such gene adaptation occurred for heterochromatic genes of the tomato genome.
The structure and composition of transcriptionally active heterochromatin regions remain poorly understood in eukaryotes and particularly in plants. In tomato, previous analyses identified several valuable genes in the pericentromeric regions of chromosomes 6, 9, and 12 (van Daelen et al., 1993; Motoyoshi et al., 1996; Budiman et al., 2004). The root-knot nematode resistance locus Mi resides in the pericentromeric region of the chromosome 6 short arm (van Daelen et al., 1993). A short genomic sequence of 52 kb spanning the locus was isolated and sequenced (Milligan et al., 1998) and five open reading frames were discovered. Three of them are conserved resistance genes organized in a cluster. A survey of the TE content revealed the presence of several transposons and a degenerated retrotransposon (Milligan et al., 1998; R. Guyot, X. Cheng, B. Keller, and H.-Q. Ling, unpublished data). Unfortunately, the small size of the sequenced locus and the absence of any data on the physical location on pachytene chromosome limit the comparison with the FER locus.
The organization of the tomato genome in large blocks of heterochromatin, the presence of active genes and transposable elements in the distal part of these regions and an ongoing project of large-scale genome sequencing (the International Solanaceae Genome Initiative, http://www.sgn.cornell.edu) indicate that the tomato genome is an attractive model to explore in detail the composition and organization of the pericentromeric heterochromatin regions.
Genome Organization and Evolution at the FER Locus
The detailed characterization of the BAC clone 56B23 revealed a complex organization of genes and TEs. Genes were not uniformly distributed along the BAC, but were clustered in gene islands separated by stretches of retrotransposons. This class of elements accounts for approximately 28% of the BAC sequence. Successive insertion events were evident from the identification of a nested retrotransposon structure. The estimation of the relative LTR retrotransposon insertions suggested that these elements were active from 8 MYA until recently (0.74 MYA). The accumulation of retrotransposons is responsible for the observed distribution of genes and possibly for the increase of intergenic distances between gene islands.
In total, 55 potential TEs were identified. The majority of them had not been described before in the tomato genome. TEs were classified into 44 different types, demonstrating a high diversity of classes and types. Heterogeneity of retrotransposon families has been described based on PCR experiments using degenerate primers for both Gypsy and Copia retrotransposon families in the tomato genome. Four active and highly heterogeneous families of Copia retrotransposons have been found in the Lycopersicon chilense genome (Yanez et al., 1998) and six families of Gypsy were reported from L. esculentum (Su and Brown, 1997). LTR retrotransposon structure, organization, and evolution have been actively studied in euchromatin of cereals, in which they contribute 50% to 80% of the genome size (SanMiguel et al., 1998). Similar patterns of retrotransposon insertions as well as recombination events removing such elements were found in the tomato FER region. LTR retrotransposons acting as target sites for insertion of additional elements participate in the formation of long stretches of nested elements, as observed in gene-rich regions of maize, wheat, and barley (SanMiguel et al., 1998; Vicient et al., 1999; Wicker et al., 2001). Mechanisms of LTR retrotransposon removal were also identified in tomato, similar to mechanisms described in Arabidopsis, rice, and Triticeae genomes (Devos et al., 2002; Wicker et al., 2003; Ma et al., 2004).
Besides retrotransposons, a high variety of Class II elements was also identified in this region. Class II elements represent 12% of the TE number and were found closely associated with genes (Wessler, 1998). Sol3-1, a nonautonomous transposon, was found inserted within the promoter of 56B23_g2. A promoter location of Sol3 was also observed previously in the tomato 1-aminocyclopropane-1-carboxylate synthase (Shiu et al., 1998) and polygalacturonase genes (Oosumi et al., 1995), and a function as regulatory sequences was proposed for these elements (Shiu et al., 1998). The association between Sol3 and promoter regions and, more generally, the association between transposons and genes in Solanaceae will make them useful for the future development of targeted tagging tools in these genomes. Unclassified elements account for about 10% of the BAC clone. These elements are probably remnants of deletion and recombination, which would explain why no common structure with known classes of transposable elements was found. Large-scale sequencing of the tomato genome will allow the identification of the full-length elements and the determination of their complete structures.
The dynamic evolution of the tomato genome is based on DNA rearrangements and can be easily studied near the FER gene. There, a local duplication of a promoter fragment of 2 kb predated four TE insertions in one of the two duplicated copies. In total, 11 TEs were identified, increasing the local TE density to 1.3 kb/TE. TE insertions and a local duplication made the region a hot spot of genome evolution. Transposable elements were reported to have an important impact on the evolution and expression of genes in species with large genomes such as maize and Triticeae. Here, the accumulation of diverse types of TEs upstream and downstream of the FER gene raises the question of whether they affect FER gene activity.
DNA rearrangements and the accumulation of TEs demonstrate that the pericentromeric transition region around the FER gene is an evolutionarily dynamic part of the tomato genome. Furthermore, the recent activity of LTR retrotransposons and the presence of active genes indicate that the organization of the FER locus is evolutionarily young. Altogether, these observations suggest that the FER region is at a dynamic evolutionary transition stage from gene-rich euchromatin to centromeric heterochromatin regions.
Mosaic Pattern of Microcolinearity Conservation between the Tomato FER Gene Transition Region and the Arabidopsis Genome
Our results indicate a complex relationship of microcolinearity between the tomato pericentromeric BAC 56B23 and three distinct regions on Arabidopsis chromosomes 2, 3, and 5. While the colinear region on chromosome 2 was not duplicated in Arabidopsis, the regions on chromosomes 3 and 5 represent recently duplicated blocks of the Arabidopsis genome, a duplication estimated to have occurred 24 to 40 MYA (Blanc et al., 2003). However, very few colinear genes were found conserved in the Arabidopsis paralogous segments, indicating an extensive process of selective gene loss or gene movement subsequent to the Arabidopsis duplication. Similar observations were described for the tomato chromosome 2 ovate region (Ku et al., 2000), where the Arabidopsis colinear segments were found duplicated prior to gene loss. Close examination of the regions on Arabidopsis chromosomes 2, 3, and 5 that are colinear with tomato revealed intragenomic colinearity. The conservation of gene order among the three segments indicated successive ancestral duplications of the regions, prior to the “recent” duplications. Ancestral large-scale duplications in Arabidopsis have been described in detail and there is good evidence that Arabidopsis has undergone three rounds of genome duplications (Simillion et al., 2002). We performed a relative dating of Arabidopsis and tomato segments by phylogenetic analysis using the cyclophilin genes, which are the only genes common to all the segments studied (R. Guyot, X. Cheng, B. Keller, and H.-Q. Ling, unpublished data). Our results suggest that the ancestral duplication between the At5g58670-At5g58710 and At2g29960-At2g29950 regions occurred after the divergence of tomato and Arabidopsis. This ancestral duplication must have happened between 90 and 150 MYA, and the recent duplication in Arabidopsis 24 to 40 MYA (Blanc et al., 2003).
The accumulation of large-scale duplications and subsequent gene loss is clearly of great significance for the conservation of colinearity between distantly related plant species. The central part of the tomato BAC 56B23, comprising four genes, was not found colinear with any Arabidopsis segment. However, homologs of the four genes were found conserved, but dispersed throughout the Arabidopsis genome. This colinearity disruption can be generated by gene movements in Arabidopsis, in tomato, or in both. In Arabidopsis, extensive gene loss on duplicated segments generated hidden duplications, leading to an apparent gene movement and generating perturbations in the colinearity between Arabidopsis and other plant species. In tomato, the disruption of colinearity is accompanied by a large variation in size compared to the Arabidopsis chromosome 3 colinear segments. Expansion is due to the presence of noncolinear genes and insertions of numerous transposable elements. Gene movements can also be the result of the dynamic evolution of this part of the genome through the action of transposable elements. Further detailed and extensive analysis of pericentromeric heterochromatin transition regions is needed to provide a more global view of the structure and evolution of this highly interesting part of the tomato genome.
MATERIALS AND METHODS
FISH Mapping
Immature tomato (Lycopersicon esculentum) flower buds of about 3.0 mm were harvested and fixed in Carnoy's solution (ethanol:glacial acetic acid = 3:1). Microsporocytes at meiosis were squashed in acetocarmine solution according to Wu (1967). Slides were frozen in liquid nitrogen. After removal of the coverslips, the slides were dehydrated through an ethanol series (70%, 90%, and 100%) prior to use in FISH. The FISH procedure used for chromosomes was performed according to Jiang et al. (1995). BAC DNA of the tomato BAC clone 56B23 was isolated using a standard alkaline extraction and labeled by nick translation with digoxigenin-16-dUTP (Roche Diagnostics, Indianapolis). Probes were detected with fluorescein isothiocyanate-conjugated sheep anti-digoxigenin (Roche Diagnostics). Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole in an antifade solution (Vector Laboratories, Burlingame, CA). Chromosomes and FISH signal images were captured with an Olympus BX61 fluorescence microscope in conjunction with a microCCD camera. Greyscale images were captured for each color channel and then merged using Image-Pro Plus software.
RT-PCR Analysis
Total RNA was extracted from roots and shoots of tomato with Trizol reagent (Invitrogen, Carlsbad, CA). After elimination of genomic DNA contamination by treatment with DNase I (Promega, Madison, WI) at 37°C for 30 min, 3 μg of total RNA was used for RT with an anchored oligo(dT) primer and 200 units of Moloney murine leukemia virus reverse transcriptase (Invitrogen) in a reaction volume of 20 μL according to the manufacturer's instructions. RT-PCR analysis was performed following the protocol described by Li et al. (2004) with the gene-specific primers (5′-GTACCGAGAACATGTGAAAG-3′ and 5′-TTCCTCGCTTGTCAATGCAG-3′ for 56B23_g1, 5′-CCCAAACGACTGTATTTCAG-3′ and 5′-GTACCAGCAGCCTTCATTGG-3′ for 56B23_g2, 5′-GAGATAGTGGAAGTAGTTGC-3′ and 5′-CTCTTCTCATCACCCATCAC-3′ for 56B23_g6). The PCR conditions were 35 cycles of 60 s at 94°C, 60 s at 53°C, and 60 s at 72°C, followed by a final 6-min extension at 72°C. The PCR products were then cloned into pGEM T-easy vector (Promega, Madison, WI) and sequenced.
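As a rough illustration of how the listed primers relate to the 53°C annealing step, the sketch below computes the GC fraction and an approximate melting temperature for each primer. It assumes the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a coarse estimate for short oligonucleotides; the text does not state how the primers were designed or evaluated, and the function names are illustrative.

```python
# Gene-specific primers as listed in the text (5' to 3').
PRIMERS = {
    "56B23_g1": ("GTACCGAGAACATGTGAAAG", "TTCCTCGCTTGTCAATGCAG"),
    "56B23_g2": ("CCCAAACGACTGTATTTCAG", "GTACCAGCAGCCTTCATTGG"),
    "56B23_g6": ("GAGATAGTGGAAGTAGTTGC", "CTCTTCTCATCACCCATCAC"),
}


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in deg C (Wallace rule, an assumption here)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for gene, (forward, reverse) in PRIMERS.items():
        print(gene, wallace_tm(forward), wallace_tm(reverse))
```

A common rule of thumb places the annealing temperature a few degrees below the lower of the two primer estimates; more accurate nearest-neighbor models would give different absolute values.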
BAC Sequencing and Analysis
BAC DNA was isolated using a large-construct kit (Qiagen, Valencia, CA) and the subcloning library was constructed using a TOPO shotgun cloning kit from Invitrogen. After shotgun sequencing, the sequences were assembled using the PHRED/PHRAP software (Ewing et al., 1998). A total of 2,639 reads were produced, giving an average coverage of 8.4×. Remaining gaps between sequence contigs were filled by PCR. The sequence was deposited in GenBank under the accession number AY678298.
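The relationship between read count and coverage can be checked with simple arithmetic. The sketch below assumes that coverage is defined as total read bases divided by insert length and uses the 198-kb length reported in the abstract; the mean read length is not stated in the text and is derived here only as an implied value.

```python
def coverage(n_reads, mean_read_bp, target_bp):
    """Average coverage = total sequenced bases / target length."""
    return n_reads * mean_read_bp / target_bp


def implied_mean_read_length(n_reads, target_bp, fold_coverage):
    """Mean read length implied by a reported coverage (derived, not reported)."""
    return fold_coverage * target_bp / n_reads


N_READS = 2639        # reads, from the text
BAC_LENGTH = 198_000  # bp, from the 198-kb figure in the abstract
COVERAGE = 8.4        # fold coverage, from the text

read_len = implied_mean_read_length(N_READS, BAC_LENGTH, COVERAGE)
print(f"implied mean read length: {read_len:.0f} bp")
```

Under these assumptions the implied mean usable read length is about 630 bp.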
Sequence Analysis and Gene Annotation Method
DNA sequences were analyzed using BLAST algorithms (Altschul et al., 1997) against public and local nucleotide databases. Detailed analysis was performed with the GCG Sequence Analysis Software Package version 10.1 (Madison, WI), the EMBOSS package (Rice et al., 2000), and by dot plot (DOTTER; Sonnhammer and Durbin, 1995). Putative genes were determined by a combination of coding region prediction software (GENSCAN, FGENESH, and MZEF with Arabidopsis and/or monocot matrix, available through the RiceGAAS Web site, http://ricegaas.dna.affrc.go.jp) and similarity searches against transcript and protein sequence databases. Gene predictions as well as amino acid and nucleotide alignments were manually evaluated to improve automatic predictions. Final annotation was performed with the Artemis tool (Rutherford et al., 2000).
Annotation and Classification of Transposable Elements
Putative elements were annotated by amino acid and nucleotide similarity searches against local databases of plant TEs using an E-value cutoff of E < 10⁻¹⁰. Ends of putative TEs were carefully investigated using dot-plot alignment of the query sequence against itself or against public Solanaceae genomic sequences. Novel elements were predicted according to the structural features of TEs. Dating experiments were carried out using full-length LTR retrotransposons and TRIM elements. The 5′ and 3′ LTRs were aligned using GAP (GCG) and the Kimura two-parameter distance (K) was calculated with MEGA-2 (Kumar et al., 2001). The average substitution rate of 6.96 × 10⁻⁹ substitutions per synonymous site per year (Moniz de Sa and Drouin, 1996) was used to estimate the relative age of LTR retrotransposon integration as in SanMiguel et al. (1998). The putative elements were classified according to their mobility mechanisms and named according to rules established for the Triticeae repeat database (TREP, http://wheat.pw.usda.gov/ITMI/Repeats/).
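The dating step can be sketched as follows. This is a minimal illustration, not the MEGA-2 implementation: it assumes the two LTRs are already aligned, skips gapped or ambiguous columns, computes the Kimura two-parameter distance K from the proportions of transitions (P) and transversions (Q), and converts K to an age with T = K / (2r), the relation used in SanMiguel et al. (1998). Function names are illustrative.

```python
import math

# Substitution rate quoted in the text (substitutions per site per year).
RATE = 6.96e-9

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned LTR sequences."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Undefined (math domain error) for saturated, highly diverged pairs.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_age_years(k, rate=RATE):
    """Estimated time since insertion: both LTRs diverge, hence the factor 2."""
    return k / (2 * rate)
```

With this rate, a distance of about 0.0103 corresponds to roughly 0.74 MYA, and a distance of about 0.111 to roughly 8 MYA, the two ends of the activity range discussed for the BAC.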
Colinearity Analysis with the Arabidopsis Genome
BLASTP searches were conducted with the tomato genes as query with a cutoff of E < 10⁻¹⁰, against a local database composed of Arabidopsis (Arabidopsis thaliana) sequences, downloaded from TAIR (http://www.arabidopsis.org). The Arabidopsis duplicated blocks and colinearity between tomato and Arabidopsis segments were displayed and analyzed using the Genome Pixelizer package (Cannon et al., 2003).
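The E-value filtering step can be illustrated with a short sketch. It assumes that the BLASTP results are available in the standard 12-column tab-delimited format (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E-value, bit score); the text does not say which output format was used, and the sample rows below are invented placeholders, not results from the study.

```python
import csv
import io

E_CUTOFF = 1e-10  # cutoff stated in the text


def filter_hits(tabular_text, cutoff=E_CUTOFF):
    """Return (query, subject, evalue) tuples for hits with evalue < cutoff."""
    hits = []
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])  # column 11 holds the E-value
        if evalue < cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical example rows (placeholder identifiers and values).
SAMPLE = (
    "query_A\tsubject_1\t62.5\t240\t90\t0\t1\t240\t5\t244\t3e-85\t310\n"
    "query_A\tsubject_2\t31.0\t100\t69\t0\t10\t109\t20\t119\t2e-04\t42.0\n"
)

print(filter_hits(SAMPLE))
```

Only the first placeholder row passes the cutoff; hits retained this way would then be ordered by chromosomal position to look for colinear gene runs.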
Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession number AY678298.
Acknowledgments
We thank two anonymous reviewers and the editor for their valuable comments.
Footnotes
- 1 This work was supported by the Chinese Ministry of Science and Technology (grant nos. 2004AA222061 to H.-Q.L. and 2002AA225011 to Z.C.), by the Chinese National Science Foundation (Talented Young Scientist grant no. 30225029 to H.-Q.L. and no. 30325008 to Z.C.), and by the Swiss National Science Foundation (grant nos. 3100–65114 to B.K. and 31–55288.98 to H.-Q.L.).
- 2 These authors contributed equally to the paper.
- [w] The online version of this article contains Web-only data.
- Received December 11, 2004.
- Revised April 12, 2005.
- Accepted April 13, 2005.
- Published July 11, 2005.