First published online September 15, 2006; doi:10.1104/pp.106.083642. Plant Physiology 142:820-830 (2006). © 2006 American Society of Plant Biologists
Divergence of the Dof Gene Families in Poplar, Arabidopsis, and Rice Suggests Multiple Modes of Gene Evolution after Duplication¹,[W]
Department of Plant Sciences, University of Tennessee, Knoxville, Tennessee 37996 (X.Y., G.A.T., Z.-M.C.); and Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831 (X.Y., G.A.T.)
It is widely accepted that gene duplication is a primary source of genetic novelty. However, the evolutionary fate of duplicated genes remains largely unresolved. Ohno's classical Duplication-Retention-Non/Neofunctionalization theory and more recently proposed alternatives each explain one or more aspects of gene fate after duplication. These alternatives include subfunctionalization (or duplication-degeneration-complementation) and subneofunctionalization. Duplicated genes are also affected by epigenetic changes. We constructed a phylogenetic tree using Dof (DNA binding with one finger) protein sequences from poplar (Populus trichocarpa Torr. & Gray ex Brayshaw), Arabidopsis (Arabidopsis thaliana), and rice (Oryza sativa). From the phylogenetic tree, we identified 27 pairs of paralogous Dof genes at the terminal nodes. Analysis of the protein motif structure of the Dof paralogs and their ancestors revealed six different gene fates after duplication. One pair of duplicated poplar Dof genes has identical motif structure and similar expression patterns. Computational prediction nevertheless identified differential Arg methylation sites between the two copies, suggesting that epigenetic changes may contribute to the divergence of duplicated genes. Analysis of reverse transcription-PCR, massively parallel signature sequencing, and microarray data revealed that the paralogs differ in expression pattern. Furthermore, analysis of nonsynonymous and synonymous substitution rates indicated that divergence of the duplicated genes was driven by positive selection. About one-half of the motifs in Dof proteins were shared by non-Dof proteins in the three plant species, indicating that motif co-option may be one of the forces driving gene diversification. We provide evidence that three hypotheses are complementary to, rather than alternatives to, each other: Ohno's Duplication-Retention-Non/Neofunctionalization, subfunctionalization/duplication-degeneration-complementation, and subneofunctionalization.
Darwin's positive selection theory cannot adequately explain the rapid rise and early diversification of more than 250,000 flowering plant species (Darwin and Seward, 1903
Based on the above information, we hypothesized that both genetic and epigenetic changes are involved in the evolution of duplicated genes (Fig. 1). Genetic changes in proteins fall into three types:
- Retention (R): a copy retains the original motif organization and function.
- Degeneration (D): a copy degenerates, losing one or more motifs and functions.
- Neofunctionalization (N): a copy acquires one or more motifs and functions.

Considering both copies of a duplicated gene, these three types of change in coding regions give six possible combinations: RR, RD, RN, DD, NN, and ND. Epigenetic changes can also cause functional diversification in duplicated genes that share the same motif structure. These changes may occur in proteins (e.g. protein methylation) or in promoter regions (e.g. DNA methylation), and they can turn an RR pair into the RD or DD type. RD and RN correspond to Ohno's hypothesis (Ohno, 1970
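The count of six comes from choosing an unordered pair of single-copy fates from R, D, and N, with repetition allowed. That gives C(3 + 2 - 1, 2) = 6 combinations. The text writes the mixed degeneration/neofunctionalization pair as ND here and as DN in Table II; as an unordered pair these are the same outcome. The Python sketch below enumerates the combinations:

```python
from itertools import combinations_with_replacement
from math import comb

# Single-copy fates as defined in the text:
# R = retention, D = degeneration, N = neofunctionalization
FATES = ("R", "D", "N")

def paralog_pair_fates(fates=FATES):
    """Return every unordered pair of fates for a duplicated gene pair.

    Order within a pair does not matter (ND and DN are one outcome), so
    pairs are drawn with replacement but without regard to order.
    """
    return ["".join(pair) for pair in combinations_with_replacement(fates, 2)]

pairs = paralog_pair_fates()
print(pairs)                                  # ['RR', 'RD', 'RN', 'DD', 'DN', 'NN']
print(len(pairs) == comb(len(FATES) + 1, 2))  # True: C(3 + 2 - 1, 2) = 6
```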
In this study, we compared members of a plant-specific gene family, the Dof (DNA binding with one finger) transcription factors, in three angiosperms: poplar (Populus trichocarpa Torr. & Gray ex Brayshaw), Arabidopsis (Arabidopsis thaliana), and rice (Oryza sativa). All three have been completely sequenced and have undergone at least one round of genome-wide duplication (Bowers et al., 2003
The Dof Gene Family and Conserved Motifs in the Dof Proteins
We used the 66 Dof protein sequences from Arabidopsis and rice to query the recently sequenced poplar genome. This search identified 41 poplar Dof genes (Supplemental Table S1), and we manually verified that each was unique. These genes were analyzed together with the 36 Arabidopsis and 30 rice Dof genes (Supplemental Table S1).
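The homology search can be sketched as filtering tabular BLAST output and keeping each poplar hit once. This mirrors the uniqueness check described above. The sketch rests on three assumptions:
- The output is the standard 12-column tabular format: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
- The e-value cutoff is a hypothetical 1e-10; the paper does not state a threshold here.
- All gene IDs are invented.

```python
import io

def unique_hits(tabular, max_evalue=1e-10):
    """Collect unique subject IDs from 12-column tabular BLAST output.

    Columns (tab-separated): qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. The e-value cutoff is an
    illustrative assumption, not a value taken from the paper.
    """
    best = {}  # subject ID -> lowest e-value seen across all queries
    for line in tabular:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        subject, evalue = cols[1], float(cols[10])
        if evalue <= max_evalue:
            best[subject] = min(evalue, best.get(subject, evalue))
    return best

# Hypothetical hits: two queries hit the same poplar gene model
demo = io.StringIO(
    "AtDof1\tpoplar_gene_A\t55.0\t120\t50\t2\t1\t120\t5\t124\t1e-30\t110\n"
    "OsDof2\tpoplar_gene_A\t50.0\t118\t55\t3\t1\t118\t5\t122\t1e-25\t100\n"
    "OsDof2\tpoplar_gene_B\t30.0\t60\t40\t5\t1\t60\t10\t70\t1e-3\t35\n"
)
hits = unique_hits(demo)
print(sorted(hits))  # ['poplar_gene_A']
```

Keying a dictionary by subject ID collapses redundant hits from different queries into a single candidate gene.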
A total of 41 conserved motifs were identified across the 107 Dof protein sequences (Table I). Motif 1 was identified as the Dof domain using the Conserved Domain Search Service (Marchler-Bauer and Bryant, 2004
Gene Duplicates and Ancestral Protein Sequences
We constructed a maximum-likelihood phylogenetic tree using the full-length protein sequences of the Dof genes (Fig. 2). From this tree, we identified 27 pairs of paralogous genes at the terminal nodes, all well supported by bootstrap analysis. We predicted the immediate ancestral protein sequence of each of the 27 paralog pairs. We then identified the protein motif structure of the duplicated genes and their ancestors (Fig. 3).
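Finding paralog pairs at terminal nodes amounts to finding "cherries": internal nodes whose two children are both leaves from the same species. The sketch below makes three assumptions:
- The tree is held as nested tuples of the form `(left, right, bootstrap)`, with gene names as leaves.
- Species can be read from a hypothetical name prefix (`Ptr_`, `At`, `Os`).
- The bootstrap cutoff is an illustrative 50, because the cutoff used for Fig. 2 is not stated here.

The toy tree is invented and does not reproduce Fig. 2.

```python
def species_of(gene):
    """Infer species from a gene-name prefix (a naming convention assumed here)."""
    for prefix, species in (("Ptr_", "poplar"), ("At", "Arabidopsis"), ("Os", "rice")):
        if gene.startswith(prefix):
            return species
    raise ValueError(f"unknown prefix: {gene}")

def terminal_paralogs(node, min_support=50):
    """Return same-species sister-leaf pairs whose parent node is well supported.

    A node is either a leaf (gene-name string) or a tuple
    (left_child, right_child, bootstrap_support).
    """
    if isinstance(node, str):
        return []
    left, right, support = node
    if isinstance(left, str) and isinstance(right, str):
        same_species = species_of(left) == species_of(right)
        return [(left, right)] if same_species and support >= min_support else []
    return terminal_paralogs(left, min_support) + terminal_paralogs(right, min_support)

# Toy tree with hypothetical names and supports
tree = (
    (("Ptr_geneA", "Ptr_geneB", 98), ("At_gene1", "Os_gene1", 70), 60),
    ("Ptr_geneC", "Ptr_geneD", 45),
    100,
)
print(terminal_paralogs(tree))  # [('Ptr_geneA', 'Ptr_geneB')]
```

Two pairs are excluded in this example. The At/Os cherry is a pair of orthologs, not paralogs. The Ptr_geneC/Ptr_geneD cherry falls below the assumed support cutoff.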
Genetic Divergence of the Dof Paralogs after Gene Duplication
All six evolutionary outcomes of genetic change (Fig. 1) were found among the 27 Dof paralog pairs (Table II): 10 RR, six RN, two RD, three NN, three DN, and three DD. According to Ohno's hypothesis (Ohno, 1970
About 37% of the paralog pairs (10 of 27, the RR type) retained the ancestral motif organization in their protein sequences. These genes may still be in the process of diverging. This possibility is supported by the lower average ks of RR paralogs compared with NN, DN, or DD paralogs (Fig. 4). Synonymous substitutions accumulate roughly in proportion to the time since duplication, so a lower ks suggests that the RR pairs arose more recently. A second explanation for the RR paralogs is that Dof genes are transcription factors, which have been preferentially retained in duplicate form, as shown in Arabidopsis (Seoighe and Gehring, 2004
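As a consistency check, the counts reported for Table II sum to the 27 pairs, and the RR share is 10/27 ≈ 37%. The short Python sketch below uses only those reported counts:

```python
from collections import Counter

# Outcome counts for the 27 Dof paralog pairs, as reported for Table II
outcomes = Counter({"RR": 10, "RN": 6, "RD": 2, "NN": 3, "DN": 3, "DD": 3})

total = sum(outcomes.values())
# Percentage of pairs in each outcome class, rounded to one decimal place
shares = {fate: round(100 * n / total, 1) for fate, n in outcomes.items()}

print(total)         # 27
print(shares["RR"])  # 37.0
```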
Although the evolutionary fate of the promoter regions of RR paralogs is not clear, the expression patterns of the sampled duplicate genes in poplar clearly suggest that some duplicates have diverged in function after duplication. For example, in the six tissues examined, Ptr_DOF02 and Ptr_DOF15 are generally expressed more strongly than Ptr_DOF06 and Ptr_DOF25, respectively. Ptr_DOF28, by contrast, is expressed more strongly than Ptr_DOF30 in the shoot tip but more weakly in leaf (Fig. 5). Expression divergence between duplicates was also found in other types of paralogs (Table III; Figs. 5 and 6). This suggests that the regulation of Dof paralogs may evolve rapidly. Because changes in the promoter regions of duplicate genes can result in subfunctionalization (Force et al., 1999
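Expression divergence between two paralogs across tissues can be summarized with two simple measures:
- Per-tissue log2 ratios, which capture a pattern such as "stronger in shoot tip, weaker in leaf".
- The Pearson correlation of the two expression profiles.

The sketch below uses hypothetical relative expression values. The numbers and most tissue names are invented for illustration and are not taken from Figs. 5 or 6.

```python
import math

def log2_ratios(a, b):
    """Per-tissue log2(a / b); positive means copy a is expressed more strongly."""
    return {t: math.log2(a[t] / b[t]) for t in a}

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Hypothetical relative expression levels for one paralog pair in six tissues
copy_a = {"root": 1.0, "stem": 2.0, "leaf": 0.5, "shoot_tip": 8.0, "tissue5": 1.0, "tissue6": 2.0}
copy_b = {"root": 1.0, "stem": 2.0, "leaf": 2.0, "shoot_tip": 2.0, "tissue5": 1.0, "tissue6": 2.0}

ratios = log2_ratios(copy_a, copy_b)
print(ratios["shoot_tip"], ratios["leaf"])  # 2.0 -2.0

tissues = list(copy_a)
r = pearson([copy_a[t] for t in tissues], [copy_b[t] for t in tissues])
```

The ratios localize where the copies differ. The correlation summarizes how similar the overall tissue profiles are, so two copies can be correlated yet differ several-fold in one tissue.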
Driving Forces for Genetic Divergence
To explore whether Darwinian positive selection drove gene divergence after duplication, we calculated the ratio of nonsynonymous to synonymous substitution rates (ka/ks) for the coding regions of some recently duplicated paralogs, using a sliding window of 20 amino acids. In general, ka/ks > 1 indicates positive selection, ka/ks < 1 indicates negative (purifying) selection, and ka/ks = 1 indicates neutral evolution (Wang et al., 2005
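The sliding-window procedure and the interpretation rule can be sketched as follows. The sketch makes four assumptions, none of which the text states:
- The proportions of nonsynonymous and synonymous differences per site (pN, pS) have already been estimated for each window, for example with a tool such as K-Estimator (Comeron, 1999).
- The Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) is applied to each proportion.
- The window advances one codon at a time.
- "ka/ks = 1" is treated as a small tolerance band.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if p >= 0.75:
        return math.inf  # correction undefined: sites are saturated
    return -0.75 * math.log(1 - 4 * p / 3)

def windows(n_codons, size=20, step=1):
    """Yield (start, end) codon indices of each sliding window (end exclusive).

    size=20 follows the text; step=1 is an assumption.
    """
    for start in range(0, n_codons - size + 1, step):
        yield start, start + size

def classify(ka, ks, tol=0.05):
    """Label a window by its ka/ks ratio; tol is an illustrative tolerance."""
    if ks == 0 or math.isinf(ka) or math.isinf(ks):
        return "undefined"  # ratio cannot be interpreted
    ratio = ka / ks
    if ratio > 1 + tol:
        return "positive selection"
    if ratio < 1 - tol:
        return "purifying selection"
    return "neutral"

print(list(windows(22)))                                 # [(0, 20), (1, 21), (2, 22)]
print(classify(jukes_cantor(0.10), jukes_cantor(0.05)))  # positive selection
print(classify(jukes_cantor(0.02), jukes_cantor(0.20)))  # purifying selection
```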
Furthermore, more than one-half (21) of the 41 motifs in Table I were also found in non-Dof genes in the three plant species (Table IV). We hypothesize that co-option of motifs from non-Dof genes may be an important source of domain expansion in Dof genes. Such domain fusion or co-option has also been observed in other gene families (Raff, 1996
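Deciding which motifs are shared with non-Dof proteins is a set operation over motif hits. The sketch below assumes that motif occurrences have already been mapped to protein IDs. The motif numbers and protein IDs are hypothetical.

```python
def motifs_shared_with_non_dof(hits, dof_ids):
    """Return motifs that occur in at least one protein outside the Dof family.

    hits maps a motif label to the set of protein IDs containing it.
    """
    # A non-empty set difference means some carrier of the motif is not a Dof protein
    return sorted(m for m, proteins in hits.items() if proteins - dof_ids)

dof_ids = {"dof_a", "dof_b", "dof_c"}
hits = {
    1: {"dof_a", "dof_b", "dof_c"},      # found only in Dof proteins
    2: {"dof_a", "other_x"},             # also found in a non-Dof protein
    3: {"dof_b", "other_y", "other_z"},
}
shared = motifs_shared_with_non_dof(hits, dof_ids)
print(shared)                                      # [2, 3]
print(f"{len(shared)}/{len(hits)} motifs shared")  # 2/3 motifs shared
```

With the counts reported in the text, 21 of 41 motifs is about 51%, consistent with "more than one-half".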
Epigenetic Divergence
To investigate the possible involvement of epigenetic changes, we computationally predicted potential methylation sites in the protein sequences of duplicated poplar genes (RR) that share the same motif structure. Two Arg methylation sites were predicted within the motif regions of Ptr_DOF14, whereas its duplicate, Ptr_DOF04, has no predicted methylation sites (Fig. 9). Interestingly, the expression pattern of Ptr_DOF04 is very similar to that of Ptr_DOF14 (Fig. 5). Because Ptr_DOF04 and Ptr_DOF14 are similar in both expression and protein motif structure, we suggest that the diversification of these two genes may have resulted from epigenetic changes such as Arg methylation. Arg methylation has been reported to play important roles in RNA processing, transcriptional regulation, and signal transduction (Bedford and Richard, 2005
Sequence differences in the promoter regions may account for the expression differences between duplicated genes (Table III; Figs. 5 and 6). Even so, epigenetic mechanisms may also contribute to the expression divergence of these genes.
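The text does not show how the methylation sites were predicted; the reference list does include the MeMo web tool (Chen et al., 2006). The sketch below does not reproduce that predictor. It is a crude, swapped-in heuristic with two steps:
- Flag Arg residues that are directly followed by Gly (RG/RGG context, a context commonly targeted by arginine methyltransferases).
- Report only the flagged residues that fall inside motif intervals.

The sequence and motif coordinates are hypothetical.

```python
import re

def rg_sites(seq):
    """0-based positions of Arg residues immediately followed by Gly (RG/RGG)."""
    # Zero-width lookahead so that overlapping matches are all reported
    return [m.start() for m in re.finditer(r"(?=RG)", seq)]

def sites_in_motifs(sites, motifs):
    """Keep sites inside any motif interval (0-based, end exclusive)."""
    return [s for s in sites if any(start <= s < end for start, end in motifs)]

# Hypothetical protein fragment and motif coordinates
seq = "MSRGGAKPLRGNDTQRGA"
motifs = [(0, 6), (12, 18)]
sites = rg_sites(seq)
print(sites)                           # [2, 9, 15]
print(sites_in_motifs(sites, motifs))  # [2, 15]
```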
Our results concern the gene functions conferred by coding regions. In those terms, the duplication and subsequent divergence of the Dof gene family in the three plant species do not fit any single existing model on its own. These models are Ohno's classical Duplication-Retention-Non/Neofunctionalization (DRNNF) model and the more recently proposed alternatives, subfunctionalization/duplication-degeneration-complementation (SF/DDC) and subneofunctionalization (SNF). We conclude that the existing models are complementary to, not alternatives to, one another. We anticipate that the six gene fates (RR, RD, RN, DD, NN, and ND) may also apply to other gene families, in varying proportions. We also suggest that epigenetics may play an important role in gene diversification after duplication. Our analysis of the Dof gene families in poplar, Arabidopsis, and rice leads us to a further conclusion: after a gene duplication event, the evolution of the duplicated genes is driven by purifying selection, Darwinian positive selection, local duplication and translocation, and domain co-option. Their divergent expression may also be affected by epigenetic regulation.
Sequences
The Arabidopsis (Arabidopsis thaliana) Dof gene name list was obtained from two Arabidopsis transcription factor databases (http://Arabidopsis.med.ohio-state.edu/AtTFDB/ and http://datf.cbi.pku.edu.cn). The corresponding coding and protein sequences were downloaded from http://www.arabidopsis.org/ (The Institute for Genomic Research [TIGR] annotation release 5). The 5' end of AT5G62430 was truncated and was corrected according to http://datf.cbi.pku.edu.cn. The rice (Oryza sativa) Dof gene name list was obtained from http://ricetfdb.bio.uni-potsdam.de/, and the corresponding coding and protein sequences were downloaded from http://www.tigr.org/ (TIGR rice pseudomolecules release 3). Os03g42200 was manually corrected. A search of the expressed sequence tag database showed that the 5' end of Os07g48570 was truncated; it was corrected according to http://ricetfdb.bio.uni-potsdam.de/. The gene model 9640.m03713 in TIGR rice pseudomolecules release 2 was assigned as Os12g38200 in release 3. The sequence of Os12g38200 was found to be incorrect and was replaced by the sequence of 9640.m03713. This replacement was confirmed by examining the pseudomolecules of the rice genome (Build 4.0) recently released by the International Rice Genome Sequencing Project (International Rice Genome Sequencing Project, 2005
Pairwise alignments of the paralogous nucleotide sequences were made using ClustalW (Thompson et al., 1994
A multiple alignment analysis was performed with M-Coffee (Wallace et al., 2006
Gapped Ancestral Sequence Prediction (Edwards and Shields, 2004
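GASP reconstructs whole gapped ancestral sequences. As a much simpler illustration of ancestral state reconstruction, the sketch below applies Fitch parsimony to a single alignment column on a rooted binary tree. It has two passes:
- A bottom-up pass that computes candidate state sets and counts substitutions.
- A top-down pass that picks one most-parsimonious state per node.

This is a swapped-in textbook technique, not the GASP algorithm. The tree and residues are hypothetical.

```python
def fitch_up(node, column, sets):
    """Bottom-up pass: record candidate state sets; return substitution count.

    node is a leaf name (str) or a (left, right) tuple; column maps each
    leaf name to its residue in one alignment column ('-' may mark a gap).
    """
    if isinstance(node, str):
        sets[node] = {column[node]}
        return 0
    cost = fitch_up(node[0], column, sets) + fitch_up(node[1], column, sets)
    left, right = sets[node[0]], sets[node[1]]
    if left & right:
        sets[node] = left & right
        return cost
    sets[node] = left | right
    return cost + 1  # a union means one extra substitution

def fitch_down(node, sets, parent_state=None, assigned=None):
    """Top-down pass: pick one state per node, preferring the parent's state."""
    assigned = {} if assigned is None else assigned
    candidates = sets[node]
    state = parent_state if parent_state in candidates else sorted(candidates)[0]
    assigned[node] = state
    if not isinstance(node, str):
        fitch_down(node[0], sets, state, assigned)
        fitch_down(node[1], sets, state, assigned)
    return assigned

# Hypothetical tree ((paralog1, paralog2), outgroup) and one alignment column
tree = (("paralog1", "paralog2"), "outgroup")
column = {"paralog1": "K", "paralog2": "R", "outgroup": "K"}
sets = {}
cost = fitch_up(tree, column, sets)
states = fitch_down(tree, sets)
ancestor = ("paralog1", "paralog2")  # immediate ancestor of the paralog pair
print(cost, states[ancestor])  # 1 K
```

Repeating this column by column would give one parsimonious ancestral sequence for the paralog pair. Unlike GASP, it does not model substitution probabilities or treat gaps specially.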
Protein motifs of the Dof genes were identified statistically using MEME (Bailey and Elkan, 1994
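MEME reports each motif as a position-specific probability matrix. Locating a motif in a protein then amounts to scoring every window by its log-odds against background frequencies. The sketch below covers only that scoring step. It is not MEME's EM fitting or its reported statistics, and it assumes three things:
- A hypothetical three-position motif.
- A uniform background over the 20 amino acids.
- A pseudocount of 0.5.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_matrix(counts, pseudocount=0.5):
    """Turn per-position residue counts into log2-odds vs. a uniform background."""
    background = 1 / len(AMINO_ACIDS)
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(AMINO_ACIDS)
        matrix.append({aa: math.log2((column.get(aa, 0) + pseudocount) / total / background)
                       for aa in AMINO_ACIDS})
    return matrix

def best_hit(seq, matrix):
    """Return (start, score) of the highest-scoring window in seq.

    seq must contain only the 20 standard residues.
    """
    width = len(matrix)
    scores = [(sum(matrix[i][seq[s + i]] for i in range(width)), s)
              for s in range(len(seq) - width + 1)]
    score, start = max(scores)
    return start, score

# Hypothetical aligned motif instances summarized as counts per position
counts = [{"C": 9, "S": 1}, {"K": 6, "R": 4}, {"C": 10}]
matrix = log_odds_matrix(counts)
start, score = best_hit("MAKCRCLL", matrix)
print(start)  # 3
```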
The tissue-specific expression analysis of Arabidopsis genes was performed using Meta-Analyzer in GENEVESTIGATOR (Zimmermann et al., 2004
For multiple-tissue RT-PCR analysis of gene expression in poplar (Populus trichocarpa) "Nisqually-1," stem and leaf tissues were taken from plants grown in vitro on media containing Murashige and Skoog salts (Murashige and Skoog, 1962
Comparative analysis of the 1,000-bp region upstream of the translation start codon (ATG) was performed using the GATA program (Nix and Eisen, 2005
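Extracting the region analyzed with GATA depends on the gene's strand. For a minus-strand gene, the upstream region lies at higher forward-strand coordinates and must be reverse-complemented. The sketch below uses 0-based coordinates and makes these assumptions:
- For plus-strand genes, `atg` is the index of the A of ATG.
- For minus-strand genes, `atg` is the forward-strand index of the T in "CAT".
- The toy sequences and the 5-bp flank stand in for real data and the 1,000-bp default.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def upstream_region(genome, atg, strand, length=1000):
    """Return `length` bp immediately 5' of a start codon, read 5'->3' on the gene's strand.

    Coordinates are 0-based. For '+', atg is the index of the A of ATG.
    For '-', atg is the forward-strand index of the T in 'CAT' (the
    complement of the A), so the upstream region lies to its right.
    Regions are truncated at sequence ends.
    """
    if strand == "+":
        return genome[max(0, atg - length):atg]
    if strand == "-":
        return reverse_complement(genome[atg + 1:atg + 1 + length])
    raise ValueError("strand must be '+' or '-'")

plus = "TTTTTACGTCATGAAA"   # ATG starts at index 10
minus = "TTTCATGGACGAAA"    # 'CAT' at indices 3-5, i.e. ATG on the minus strand
print(upstream_region(plus, 10, "+", length=5))   # ACGTC
print(upstream_region(minus, 5, "-", length=5))   # CGTCC
```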
Protein sequence alignment was performed with M-Coffee (Wallace et al., 2006
Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers CAA46875 (Dof1) and CAA56287 (Dof2).
The following materials are available in the online version of this article.
We thank F. Chen and R.C. Moore for reviewing the manuscript and for their valuable comments. We also thank the anonymous reviewers for their insightful comments on the manuscript.
Received May 15, 2006; accepted August 26, 2006; published September 15, 2006.
1 This work was supported by the National Science Foundation (grant no. 0421743 to G.A.T. and Z.-M.C.), by the U.S. Department of Energy/Oak Ridge National Laboratory (subcontract to Z.-M.C.), and by the Tennessee Agricultural Experiment Station. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: (Max) Zong-Ming Cheng (zcheng{at}utk.edu).
[W] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.106.083642
* Corresponding author; e-mail zcheng{at}utk.edu; fax 865-974-5365.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410
Bailey TL, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2: 28-36
Bailey TL, Gribskov M (1998) Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14: 48-54
Bedford MT, Richard S (2005) Arginine methylation an emerging regulator of protein function. Mol Cell 18: 263-272
Bender J (2002) Plant epigenetics. Curr Biol 12: R412-R414
Blanc G, Wolfe KH (2004) Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16: 1679-1691
Boisvert FM, Chenard CA, Richard S (2005) Protein interfaces in signaling regulated by arginine methylation. Sci STKE 2005: re2
Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433-438
Cameron RA, Chow SH, Berney K, Chiu TY, Yuan QA, Kramer A, Helguero A, Ransick A, Yun M, Davidson EH (2005) An evolutionary constraint: strongly disfavored class of change in DNA sequence during divergence of cis-regulatory modules. Proc Natl Acad Sci USA 102: 11769-11774
Carroll SB (2001) Chance and necessity: the evolution of morphological complexity and diversity. Nature 409: 1102-1109
Chen H, Xue Y, Huang N, Yao X, Sun Z (2006) MeMo: a web tool for prediction of protein methylation modifications. Nucleic Acids Res 34: W249-W253
Comeron JM (1999) K-Estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15: 763-764
Darwin F, Seward AC, editors (1903) More Letters of Charles Darwin, Vol 2. John Murray, London
Davies TJ, Barraclough TG, Chase MW, Soltis PS, Soltis DE, Savolainen V (2004) Darwin's abominable mystery: insights from a supertree of the angiosperms. Proc Natl Acad Sci USA 101: 1904-1909
De Bodt S, Maere S, Van de Peer Y (2005) Genome duplication and the origin of angiosperms. Trends Ecol Evol 20: 591-597
Edwards RJ, Shields DC (2004) GASP: gapped ancestral sequence prediction for proteins. BMC Bioinformatics 5: 123
Force A, Cresko WA, Pickett FB, Proulx SR, Amemiya C, Lynch M (2005) The origin of subfunctions and modular gene regulation. Genetics 170: 433-446
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531-1545
Gu Z, Steinmetz LM, Gu X, Scharfe C, Davis RW, Li WH (2003) Role of duplicate genes in genetic robustness against null mutations. Nature 421: 63-66
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52: 696-704
He X, Zhang J (2005a) Gene complexity and gene duplicability. Curr Biol 15: 1016-1021
He X, Zhang J (2005b) Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169: 1157-1164
Hughes AL (1994) The evolution of functionally novel proteins after gene duplication. Proc Biol Sci 256: 119-124
Hughes AL (2005) Gene duplication and the origin of novel proteins. Proc Natl Acad Sci USA 102: 8791-8792
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793-800
Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO (2006) Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 6: 29
Lee DY, Teyssier C, Strahl BD, Stallcup MR (2005) Role of protein methylation in regulation of transcription. Endocr Rev 26: 147-170
Li WH, Yang J, Gu X (2005) Expression divergence between duplicate genes. Trends Genet 21: 602-607
Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155
Marchler-Bauer A, Bryant SH (2004) CD-Search: protein domain annotations on the fly. Nucleic Acids Res 32: W327-W331
Moore RC, Purugganan MD (2003) The early stages of duplicate gene evolution. Proc Natl Acad Sci USA 100: 15682-15687
Moore RC, Purugganan MD (2005) The evolutionary dynamics of plant duplicate genes. Curr Opin Plant Biol 8: 122-128
Murashige T, Skoog F (1962) A revised medium for rapid growth and bioassay with tobacco tissue cultures. Physiol Plant 15: 473-497
Nix DA, Eisen MB (2005) GATA: a graphic alignment tool for comparative sequence analysis. BMC Bioinformatics 6: 9
Ohno S (1970) Evolution by Gene Duplication. Springer-Verlag, Heidelberg
Raes J, Vandepoele K, Simillion C, Saeys Y, Van de Peer Y (2003) Investigating ancient duplication events in the Arabidopsis genome. J Struct Funct Genomics 3: 117-129
Raff R (1996) The Shape of Life. University of Chicago Press, Chicago
Rapp RA, Wendel JF (2005) Epigenetics and plant evolution. New Phytol 168: 81-91
Rodin SN, Parkhomchuk DV, Rodin AS, Holmquist GP, Riggs AD (2005) Repositioning-dependent fate of duplicate genes. DNA Cell Biol 24: 529-542
Rodin SN, Riggs AD (2003) Epigenetic silencing may aid evolution by gene duplication. J Mol Evol 56: 718-729
Seoighe C, Gehring C (2004) Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet 20: 461-464
Shiu SH, Karlowski WM, Pan R, Tzeng YH, Mayer KF, Li WH (2004) Comparative analysis of the receptor-like kinase family in Arabidopsis and rice. Plant Cell 16: 1220-1234
Sterck L, Rombauts S, Jansson S, Sterky F, Rouze P, Van de Peer Y (2005) EST data suggest that poplar is an ancient polyploid. New Phytol 167: 165-170
Taylor JS, Raes J (2004) Duplication and divergence: the evolution of new genes and old ideas. Annu Rev Genet 38: 615-643
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680
Tocchini-Valentini GD, Fruscoloni P, Tocchini-Valentini GP (2005) Structure, function, and evolution of the tRNA endonucleases of Archaea: an example of subfunctionalization. Proc Natl Acad Sci USA 102: 8933-8938
Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596-1604
Wallace IM, O'Sullivan O, Higgins DG, Notredame C (2006) M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res 34: 1692-1699
Wang W, Zheng H, Yang S, Yu H, Li J, Jiang H, Su J, Yang L, Zhang J, McDermott J, et al (2005) Origin and evolution of new exons in rodents. Genome Res 15: 1258-1264
Yanagisawa S (2002) The Dof family of plant transcription factors. Trends Plant Sci 7: 555-560
Yanagisawa S (2004) Dof domain proteins: plant-specific transcription factors associated with diverse phenomena unique to plants. Plant Cell Physiol 45: 386-391
Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W (2004) GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol 136: 2621-2632