First published online December 23, 2004; 10.1104/pp.104.053215
Plant Physiology 137:168-175 (2005)
© 2005 American Society of Plant Biologists
Site Preferences of Insertional Mutagenesis Agents in Arabidopsis
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724 (X.P., L.S.); and Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany (Y.L.)
We have performed a comparative analysis of the insertion sites of engineered Arabidopsis (Arabidopsis thaliana) insertional mutagenesis vectors that are based on the maize (Zea mays) transposable elements and Agrobacterium T-DNA. The transposon-based agents show marked preference for high GC content, whereas the T-DNA-based agents show preference for low GC content regions. The transposon-based agents show a bias toward insertions near the translation start codons of genes, while the T-DNAs show a predilection for the putative transcriptional regulatory regions of genes. The transposon-based agents also have higher insertion site densities in exons than do the T-DNA insertions. These observations show that the transposon-based and T-DNA-based mutagenesis techniques could complement one another well, and neither alone is sufficient to achieve the goal of saturation mutagenesis in Arabidopsis. These results also suggest that transposon-based mutagenesis techniques may prove the most effective for obtaining gene disruptions and for generating gene traps, while T-DNA-based agents may be more effective for activation tagging and enhancer trapping. From the patterns of insertion site distributions, we have identified a set of nucleotide sequence motifs that are overrepresented at the transposon insertion sites. These motifs may play a role in the transposon insertion site preferences. These results could help biologists to study the mechanisms of insertions of the insertional mutagenesis agents and to design better strategies for genome-wide insertional mutagenesis.
Insertional mutagenesis techniques are key resources for studying the gene functions of Arabidopsis (Arabidopsis thaliana). These techniques use either maize (Zea mays) transposable elements (Fedoroff, 1989
T-DNA insertional mutagenesis techniques use a portion of the tumor-inducing plasmid from A. tumefaciens that in nature induces crown galls by transferring T-DNA into the nucleus of plant cells. Upon infection, the T-DNA is transferred into host cells and inserts into the nuclear genome (Gordon, 1998
Previous studies (Rubin and Spradling, 1982
We have developed Arabidopsis thaliana Insertion Database (ATIDB) to store information about insertional mutagenesis lines and to analyze the distributions of their insertion sites (Pan et al., 2003
Transposon-Based Agents Have Marked Preference for High GC Content whereas T-DNA-Based Agents Show Preference for Low GC Content
We stratified the Arabidopsis genome by GC content as described in "Materials and Methods" and calculated the insertion frequency of each of the four insertional mutagenesis agents. As there are few extremely GC-poor (0%-10%) or GC-rich (70%-100%) regions, we removed these regions from our analysis. As shown in Figure 1, the insertion frequencies of Ds and dSpm transposons increase significantly with increasing GC content, from approximately 3 insertions/Mb in 10% to 20% GC regions to 17 insertions/Mb in 60% to 70% GC regions. In contrast, T-DNAs insert preferentially into low GC content regions, especially the 20% to 30% GC region. Changing the window size from 20 to 50 bp or shifting the starting position of the window had no noticeable effect on these measurements.
Transposon Insertions Preferentially Occur at the 5' Ends of Coding Regions while T-DNA-Based Agents Favor Upstream Regions
Using gene annotation data from The Institute for Genomic Research (TIGR; Wartman et al., 2003
To determine whether there is an insertion site bias relative to the coding regions, we took the region 900 bp upstream of a translation start codon and divided it equally into 9 subregions. We also took the region 1,000 bp downstream of the start codon and divided it equally into 10 subregions. Then, we calculated the positional distribution of insertion sites relative to the start codon (Fig. 3). The T-DNAs show a striking preference for the upstream region, starting from approximately 100 bp upstream of the start codon. In contrast, more transposon insertions are located downstream of the translation start sites, especially within the 200 bp downstream of the ATG.
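The binning described above can be sketched in Python as follows. This is an illustration only, not the authors' code: the function names, the 0-based coordinates, and the handling of strand are assumptions introduced here. Each subregion is 100 bp wide (900 bp / 9 and 1,000 bp / 10).

```python
from collections import Counter

def position_bin(site, start_codon, strand="+", bin_size=100,
                 upstream=900, downstream=1000):
    """Return the subregion index of an insertion relative to a start codon.

    Negative indices are upstream (-1 covers 1..100 bp upstream, -9 covers
    801..900 bp upstream); indices 0..9 are downstream. Returns None when the
    site lies outside the analysed region. Coordinates are assumed 0-based.
    """
    # Orient the offset so that positive always means "downstream of the ATG".
    offset = site - start_codon if strand == "+" else start_codon - site
    if offset < -upstream or offset >= downstream:
        return None
    # Floor division maps -100..-1 to -1 and 0..99 to 0.
    return offset // bin_size

def positional_distribution(sites, start_codon, strand="+"):
    """Count insertion sites per subregion around one start codon."""
    bins = (position_bin(s, start_codon, strand) for s in sites)
    return Counter(b for b in bins if b is not None)

if __name__ == "__main__":
    # Hypothetical insertion coordinates around a start codon at position 1000.
    print(positional_distribution([950, 960, 1150, 1010, 5000], 1000))
```

Summing such counters over all annotated genes, separately for each agent, would give a distribution of the kind plotted in Figure 3.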
To determine whether the preferences of transposon and T-DNA insertions for different genome regions reflect differing preferences for GC-rich sequence, we calculated the GC content of 20-bp genomic sequences centered at the insertion sites of each insertional mutagenesis agent in different regions, in parallel with sequences randomly taken from each region. As shown in Table II, the GC content of transposon insertion sites is higher than that of T-DNA insertion sites in almost every region. This confirms that transposon insertions prefer high GC content sites, whereas T-DNA insertions favor low GC content regions. The coding exons have high GC content, which may lead to a higher frequency of transposon insertion sites than of T-DNA insertion sites in this region. The Ds insertion sites in 5' UTRs have high GC content, which probably contributes to the preference of Ds insertions for this compartment. In the other genome regions, the GC content at insertion sites does not differ significantly from the control.
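A minimal Python sketch of this comparison is given below. It is not the authors' implementation: the function names, the 0-based coordinates, the choice to skip sites too close to a sequence end, and the fixed random seed are all assumptions made for illustration.

```python
import random

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def flank(genome, site, width=20):
    """Return the width-bp window centered at an insertion site, or None
    if the window would run past either end of the sequence."""
    start = site - width // 2
    if start < 0 or start + width > len(genome):
        return None
    return genome[start:start + width]

def mean_gc_at_sites(genome, sites, width=20):
    """Mean GC fraction of the windows centered at the given sites."""
    windows = [w for w in (flank(genome, s, width) for s in sites) if w]
    if not windows:
        raise ValueError("no insertion site has a complete flanking window")
    return sum(gc_fraction(w) for w in windows) / len(windows)

def mean_gc_random(genome, n, width=20, seed=0):
    """Mean GC fraction of n windows drawn uniformly at random (the control)."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(genome) - width + 1) for _ in range(n)]
    return sum(gc_fraction(genome[s:s + width]) for s in starts) / n

if __name__ == "__main__":
    toy_region = "A" * 30 + "G" * 30          # hypothetical sequence
    print(mean_gc_at_sites(toy_region, [30, 45]))
    print(mean_gc_random(toy_region, 500))
```

In practice the `genome` argument would be the concatenated sequence of one compartment (for example, all coding exons), so that the random control is drawn from the same kind of region as the insertion sites.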
Transposon Insertions Recognize a Set of Sequence Motifs
As described in "Materials and Methods," we used the Multiple EM for Motif Elicitation (MEME) algorithm to search for motifs that are overrepresented at insertion sites using randomly selected subsets of the insertion sites as training sets. Motifs that were identified by MEME in two independent training sets were pursued further. In this manner, we identified three full invariant candidate motifs and six partial motifs with one or more ambiguous bases (Table III) to study further. We found that motif 11 contains motif 1, both motifs 21 and 22 contain motif 2, and motifs 31, 32, and 33 all contain motif 3. To assess the significance of these candidate motifs, we calculated their occurrence frequencies in 20-bp genomic sequences centered at the insertion sites of the corresponding insertional mutagenesis agent. As a control, we used 6 sets of 500 20-bp sequences randomly selected from the entire genome. As shown in Table III, nine candidate motifs are overrepresented at Ds insertion sites. Motifs 1, 2, 11, 21, 22, 31, and 33 are also overrepresented at dSpm insertion sites (P value < 0.05 or 0.01).
None of the above motifs occur more frequently near GABI-Kat T-DNA and FLAGdb T-DNA insertion sites than over the entire genome (the control), suggesting that T-DNA insertion site preference may be independent of small sequence motifs.
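The occurrence-frequency calculation can be illustrated with the following Python sketch. It is an assumption-laden illustration rather than the procedure actually used: the motif shown (`GGYT`) and the toy windows are hypothetical, ambiguous bases are assumed to follow IUPAC codes, and only the given strand is searched.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_regex(motif):
    """Compile a motif with IUPAC ambiguity codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))

def occurrence_frequency(motif, windows):
    """Fraction of windows that contain the motif at least once."""
    pattern = motif_regex(motif)
    hits = sum(1 for w in windows if pattern.search(w.upper()))
    return hits / len(windows)

if __name__ == "__main__":
    # Hypothetical flanking windows and a hypothetical partial motif.
    site_windows = ["AAGGCTAA", "TTTTTTTT", "AAGGTTAA", "CCCCCCCC"]
    print(occurrence_frequency("GGYT", site_windows))
```

Applying the same function to windows centered at insertion sites and to each of the random control sets gives the pairs of frequencies that are compared for overrepresentation.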
To determine whether the identified sequence motifs play a role in the insertion site preference of transposable elements independently of GC content or position relative to protein-coding genes, we stratified the data set by calculating the occurrence frequency of each motif in 20-bp genomic sequences centered at either Ds or dSpm insertion sites in the region of 60% to 70% GC content and in the region 200 bp downstream of the translation start site, respectively (Table IV). As a control, we used 6 sets of 500 20-bp sequences selected at random from each test region of the Arabidopsis genome, which controls for any association between the identified motifs and nucleotide sequences common in that region. All of the identified motifs are also overrepresented at the 3' ends of Ds flanking sequences and the 5' ends of dSpm flanking sequences (Table V). Because each full motif is contained in one or more partial motifs (Table III), we need only consider the occurrence frequencies of the partial motifs in a region. As shown in Table IV, motifs 11, 21, 22, 31, and 33 at dSpm transposon insertion sites and motifs 11, 21, 31, 32, and 33 at Ds transposon insertion sites are overrepresented in the region of 60% to 70% GC content. The summed occurrence frequencies of the overrepresented motifs at Ds and dSpm insertion sites are approximately 34% and 25%, respectively, compared to 22% and 19% in the control. This indicates that the insertion site preference for GC-rich regions is correlated with the distribution of these motifs. Interestingly, motifs 11, 21, 22, 31, and 33 at dSpm transposon insertion sites and all 6 partial motifs at Ds transposon insertion sites are also overrepresented in the region 200 bp downstream of the translation start codon. The summed frequencies of the overrepresented motifs at dSpm and Ds insertion sites are approximately 25% and 33%, respectively, compared to 19% and 24% in the control.
This suggests that downstream insertion preference of the transposon-based agents may in part be due to the distribution of these motifs.
In this study, we used large data sets from several insertional mutagenesis agents to analyze insertion site distributions. The size of these data sets gives a robust picture of the integration process and allows a direct comparison among different insertional mutagenesis agents in Arabidopsis. We found that both transposon and T-DNA agents have insertion site preferences but that these preferences are distinctly different.
Our results suggest that both Ds and dSpm transposon-based insertional mutagenesis agents in Arabidopsis prefer sites with high GC content. This result has not previously been reported in plants and may shed light on the mechanism of insertion of these vectors. Alternatively, insertions in high GC content regions may be preferentially selected because the GC content of the flanking region enhances expression of the vector's selectable antibiotic resistance marker. Interestingly, our results are similar to the results of Liao et al. (2000)
In contrast to the results with transposon-based agents, we found that T-DNA-based agents prefer low GC content regions. This is consistent with the finding of Brunaud et al. (2002)
Transposon-based insertional mutagenesis agents preferentially occur at the 5' ends of gene coding regions, while T-DNA insertions favor the regions immediately upstream of the start codon, 3' UTRs, and intergenic regions. Our findings are supported by a previous study (Parinov et al., 1999 While we cannot determine whether some of these biases are the result of primary insertion-site preference at the time of vector insertion or are due to subsequent selection of mutated lines, in practical terms these observations show that these two kinds of mutagenesis agents could complement one another well, and neither alone is sufficient to achieve the goal of saturation mutagenesis in Arabidopsis. These observations also suggest that transposons may be more effective for gene disruptions and gene traps, while T-DNAs are more effective for activation tagging and enhancer trapping. The latter has been proven to be practically useful in Arabidopsis due to extensive functional redundancy in its genome, while knockout mutants often result in no obvious phenotype.
Another interesting result of our study is that transposon-based insertional mutagenesis agents in Arabidopsis may recognize a small set of sequence motifs. Craig (1997)
In contrast to our findings in the transposon-based insertional mutagenesis agents, we were unable to identify sequence motifs associated with T-DNA insertion using the motif discovery algorithm described in "Materials and Methods." However, Brunaud et al. (2002)
Insertion Sites
We annotated the insertion sites for four major insertional mutagenesis agents in Arabidopsis (Arabidopsis thaliana): the maize (Zea mays) transposable elements Ds and dSpm and the Agrobacterium tumefaciens T-DNAs from GABI-Kat and FLAGdb/FST and then stored them in ATIDB at http://gremlin6.zool.iastate.edu/. The annotation of Ds and dSpm transposon insertion sites was described by Pan et al. (2003) We retrieved all insertion sites of the four insertional mutagenesis agents from ATIDB at http://gremlin6.zool.iastate.edu/cgi-perl/insertion_tab?Line_type=X (where X = GT:ET for Ds, X = SM for dSpm transposons, X = FTN for FLAGdb and X = GTN for GABI-Kat T-DNAs). After removal of duplicated insertion sites, 6,318 Ds and 6,258 dSpm transposon insertion sites and 7,131 GABI-Kat and 11,762 FLAGdb T-DNA insertion sites remained for further analysis (Table I).
Release 4 of the Arabidopsis genome and its annotation data were downloaded from the FTP site of the TIGR Arabidopsis genome annotation database (Wartman et al., 2003
We partitioned the genome into deciles according to GC content across nonoverlapping windows of 20, 30, 40, and 50 bp and calculated the insertion frequency in each compartment by dividing the number of insertion sites in each partition by the total number of base pairs in the partition.
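The partitioning and frequency calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names are hypothetical, coordinates are taken to be 0-based, a window with exactly 100% GC is placed in the top decile, and a trailing partial window is ignored.

```python
from collections import defaultdict

def gc_decile(window):
    """GC decile (0-9) of a window; 0 means 0% to <10% GC, 9 means 90% to 100%."""
    gc = sum(base in "GC" for base in window.upper())
    return min(gc * 10 // len(window), 9)

def insertion_frequency_by_gc(genome, sites, window=20):
    """Insertions per Mb in each GC decile, using nonoverlapping windows."""
    bp_per_decile = defaultdict(int)
    hits_per_decile = defaultdict(int)
    decile_of_window = []
    # Assign every complete nonoverlapping window to a GC decile.
    for start in range(0, len(genome) - window + 1, window):
        d = gc_decile(genome[start:start + window])
        decile_of_window.append(d)
        bp_per_decile[d] += window
    # Count each insertion site in the decile of the window that contains it.
    for pos in sites:
        idx = pos // window
        if 0 <= idx < len(decile_of_window):
            hits_per_decile[decile_of_window[idx]] += 1
    # Insertion sites divided by base pairs in the partition, scaled to per Mb.
    return {d: hits_per_decile[d] * 1e6 / bp_per_decile[d]
            for d in sorted(bp_per_decile)}

if __name__ == "__main__":
    toy_genome = "AT" * 10 + "GC" * 10        # hypothetical 40-bp sequence
    print(insertion_frequency_by_gc(toy_genome, [5, 25, 30]))
```

Rerunning the function with `window` set to 30, 40, or 50 corresponds to the check that window size does not change the trend.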
We calculated the numbers of insertion sites in upstream regions, 5' UTRs, coding exons, introns, 3' UTRs, and intergenic regions in the entire genome of Arabidopsis. The number of insertion sites per megabase pair in each compartment was obtained by dividing the number of insertion sites in that compartment by the total size of the compartment. We defined the upstream region as 900 bp upstream of the translation start codon. If a 5' UTR was annotated, we truncated the upstream region after the transcriptional start site. The 3' UTR includes the untranslated exons and introns at the 3' end of a gene. The intergenic region is the region between two nearby genes. The heterochromatic region used for this analysis is at the location from 1,591,342 to 2,001,638 bp on chromosome 4, which is within the interval that includes bacterial artificial chromosomes from T5L23 to T27D20 as described in the previous publication (Cold Spring Harbor Laboratory, 2000
We used MEME (Bailey and Elkan, 1994
To verify the candidate motifs, we used the Regulatory Sequence Analysis Tools-DNA pattern match program (Helden et al., 2000
We thank Robert Martienssen's group at Cold Spring Harbor Laboratory, Mike Bevan's group at John Innes Centre, UK, Bernd Weisshaar's group at the Max Planck Institute for Plant Breeding Research, Germany, the FLAGdb/FST, France, and the Institute of Molecular Agrobiology, Singapore for allowing us to use their insertional mutagenesis data. We are grateful to Peter D'Eustachio for reading this manuscript. We also thank Michael Zhang, Jonathan Clarke, and Hong Liu for helpful comments on the data analysis and thank two anonymous referees for valuable suggestions.
Received September 9, 2004; returned for revision November 13, 2004; accepted November 15, 2004.
1 Present address: Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010.
Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.104.053215.
* Corresponding author; e-mail xpan@iastate.edu; fax 515-294-6755.
Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen H, Shinn P, Stevenson DK, Zimmerman J, Barajas P, Cheuk R, et al (2003) Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301: 653-657
Azpiroz-Leehan R, Feldmann KA (1997) T-DNA insertion mutagenesis in Arabidopsis: going back and forth. Trends Genet 13: 152-156
Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp 28-36
Balzergue S, Dubreucq B, Chauvin S, Le-Clainche I, Le Boulaire F, de Rose R, Samson F, Biaudet V, Lecharney A, Cruaud C, et al (2001) Improved PCR-walking for large-scale isolation of plant T-DNA borders. Biotechniques 30: 496-504
Bancroft I, Dean C (1993) Transposition pattern of the maize element Ds in Arabidopsis thaliana. Genetics 134: 1221-1229
Brazma A, Jonassen I, Vilo J, Ukkonen E (1998) Predicting gene regulatory elements in silico on a genomic scale. Genome Res 8: 1202-1215
Brunaud V, Balzergue S, Dubreucq B, Aubourg S, Samson F, Chauvin S, Bechtold N, Cruaud C, DeRose R, Pelletier G, et al (2002) T-DNA integration into the Arabidopsis genome depends on sequences of pre-insertion sites. EMBO Rep 3: 1152-1157
Craig NL (1997) Target site selection in transposition. Annu Rev Biochem 66: 437-474
Fedoroff N (1989) Maize transposable elements. In M Howe, D Ber, eds, Mobile DNA. American Society for Microbiology, Washington, pp 375-411
Freund RJ, Wilson WJ (1993) Statistical Methods. Academic Press, San Diego
Gordon MP (1998) Discovery of the T-DNA of Agrobacterium tumefaciens. In S Kung, S Yang, eds, Discoveries in Plant Biology, Vol 1. World Scientific, Singapore, pp 111-115
Helden JV, André B, Collado-Vides J (2000) A web site for the computational analysis of yeast regulatory sequences. Yeast 16: 177-187
James DW Jr, Lim E, Keller J, Plooy I, Ralston E, Dooner HK (1995) Directed tagging of the Arabidopsis FATTY ACID ELONGATION1 (FAE1) gene with the maize transposon Activator. Plant Cell 7: 309-319
Jones JDG, Carland FC, Lim E, Ralston E, Dooner HK (1990) Preferential transposition of the maize element Activator to linked chromosomal locations in tobacco. Plant Cell 2: 701-707
Koncz C, Nemeth K, Redei GP, Schell J (1992) T-DNA insertional mutagenesis in Arabidopsis. Plant Mol Biol 20: 963-976
Kuromori T, Hirayama T, Kiyosue Y, Takabe H, Mizukado S, Sakurai T, Akiyama K, Kamiya A, Ito T, Takuya T, et al (2004) A collection of 11800 single-copy Ds transposon insertion lines in Arabidopsis. Plant J 37: 897-905
Li Y, Rosso MG, Strizhov N, Viehoever P, Weisshaar B (2003) GABI-Kat SimpleSearch: a flanking sequence tags (FST) database for the identification of T-DNA insertion mutants in Arabidopsis thaliana. Bioinformatics 19: 1441-1442
Liao GC, Rehm EJ, Rubin GM (2000) Insertion site preferences of the P transposable element in Drosophila melanogaster. Proc Natl Acad Sci USA 97: 3347-3351
Machida C, Onouchi H, Koizumi J, Hamada S, Semiarti E, Torikai S, Machida Y (1997) Characterization of the transposition pattern of the Ac element in Arabidopsis thaliana using endonuclease I-SceI. Proc Natl Acad Sci USA 94: 8675-8680
Pan X, Liu H, Clarke J, Jones J, Bevan M, Stein L (2003) ATIDB: Arabidopsis Thaliana insertion database. Nucleic Acids Res 31: 1245-1251
Parinov S, Sevugan M, Ye D, Yang W-C, Kumaran M, Sundaresan V (1999) Analysis of flanking sequences from Dissociation insertion lines: a database for reverse genetics in Arabidopsis. Plant Cell 11: 2263-2270
Rosso MG, Li Y, Strizhov N, Reiss B, Dekker K, Weisshaar B (2003) An Arabidopsis thaliana T-DNA mutagenized population (GABI-Kat) for flanking sequence tag-based reverse genetics. Plant Mol Biol 53: 247-259
Rubin GM, Spradling AC (1982) Genetic transformation of Drosophila with transposable element vectors. Science 218: 348-353
Samson F, Brunaud V, Balzergue S, Dubreucq B, Lepiniec L, Pelletier G, Caboche M, Lecharny A (2002) FLAGdb/FST: a database of mapped flanking insertion sites (FSTs) of Arabidopsis thaliana T-DNA transformants. Nucleic Acids Res 30: 94-97
Sessions A, Burke E, Presting G, Aux G, McElver J, Patton D, Dietrich B, Ho P, Bacwaden J, Ko C, et al (2002) The high-throughput Arabidopsis reverse genetics system. Plant Cell 14: 2985-2994
Smith D, Yanai Y, Liu Y-G, Ishiguro S, Okada K, Shibata D, Whittier RF, Fedoroff NV (1996) Characterization and mapping of Ds-GUS-T-DNA lines for targeted insertional mutagenesis. Plant J 10: 721-732
Sundaresan V, Springer P, Volpe T, Haward S, Jones JD, Dean C, Ma H, Martienssen R (1995) Patterns of gene action in plant development revealed by enhancer trap and gene trap transposable elements. Genes Dev 9: 1797-1810
Szabados L, Kovacs I, Oberschall A, Abraham E, Kerekes I, Zsigmond L, Nagy R, Alvarado M, Krasovskaja I, Gal M, et al (2002) Distribution of 1000 sequenced T-DNA tags in the Arabidopsis genome. Plant J 32: 233-242
The Cold Spring Harbor Laboratory, Washington University Genome Sequencing Center, and PE Biosystems Arabidopsis Sequencing Consortium (2000) The complete sequence of a heterochromatic island from a higher eukaryote. Cell 100: 377-386
Thomas CM, Jones DA, English JJ, Carroll BJ, Bennetzen JL, Harrison K, Burbidge A, Bishop GJ, Jones JD (1994) Analysis of the chromosomal distribution of transposon-carrying T-DNAs in tomato using the inverse polymerase chain reaction. Mol Gen Genet 242: 573-585
Tinland B (1996) The integration of T-DNA into plant genomes. Trends Plant Sci 1: 178-183
Tissier AF, Marillonnet S, Klimyuk V, Patel K, Torres MA, Murphy G, Jones JDG (1999) Multiple independent defective Suppressor-mutator transposon insertions in Arabidopsis: a tool for functional genomics. Plant Cell 11: 1841-1852
Vigdal TJ, Kaufman CD, Izsvak Z, Voytas DF, Ivics Z (2002) Common physical properties of DNA affecting target site selection of sleeping beauty and other Tc1/mariner transposable elements. J Mol Biol 323: 441-452
Wartman JR, Haas BJ, Hannick LI, Smith RK Jr, Maiti R, Ronning CM, Chan AP, Yu C, Ayele M, Whitelaw CA, et al (2003) Annotation of the Arabidopsis genome. Plant Physiol 132: 461-468