First published online February 27, 2003; 10.1104/pp.102.016311
Plant Physiol, March 2003, Vol. 131, pp. 1042-1053
A Phylogenomic Investigation of CYCLOIDEA-Like TCP Genes in the Leguminosae¹
Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh EH3 5LR, United Kingdom (H.C., R.T.P., Q.C.B.C.); Institute of Cell and Molecular Biology, University of Edinburgh, Edinburgh EH9 3JR, United Kingdom (H.C., Q.C.B.C.); Shanghai Institute of Plant Physiology, Chinese Academy of Sciences, 300 Fenglin Road, Shanghai 200032, China (D.L.); and Genetics Department, John Innes Centre, Colney Lane, Norwich NR4 7UH, United Kingdom (E.C.)
Numerous TCP genes (transcription factors with a TCP domain) occur in legumes. Genes of this class in Arabidopsis (TCP1) and snapdragon (Antirrhinum majus; CYCLOIDEA) have been shown to be asymmetrically expressed in developing floral primordia, and in snapdragon, they are required for floral zygomorphy (bilaterally symmetrical flowers). These genes are therefore particularly interesting in Leguminosae, a family that is thought to have evolved zygomorphy independently from other zygomorphic angiosperm lineages. Using a phylogenomic approach, we show that homologs of TCP1/CYCLOIDEA occur in legumes and may be divided into two main classes (LEGCYC group I and II), apparently the result of an early duplication, and each class is characterized by a typical amino acid signature in the TCP domain. Furthermore, group I genes in legumes may be divided into two subclasses (LEGCYC IA and IB), apparently the result of a duplication near the base of the papilionoid legumes or below. Most papilionoid legumes investigated have all three genes present (LEGCYC IA, IB, and II), inviting further work to investigate possible functional difference between the three types. However, within these three major gene groups, the precise relationships of the paralogs between species are difficult to determine probably because of a complex history of duplication and loss with lineage sorting or heterotachy (within-site rate variation) due to functional differentiation. The results illustrate both the potential and the difficulties of orthology determination in variable gene families, on which the phylogenomic approach to formulating hypotheses of function depends.
The considerable advances in plant
developmental genetics from a few model species have provided a
starting point for studying plant morphological diversity and evolution
at the molecular level. Genes that control development have been
implicated in the evolution of novel phenotypes (for review, see
Baum, 1998). Comparative expression studies rely on a phylogenetic framework to help
identify candidate genes (Eisen, 1998). In snapdragon (Antirrhinum majus L. [Lamiales,
Veronicaceae]), floral dorsal identity is controlled by two closely
related nuclear genes CYC and DICHOTOMA
(DICH; Luo et al., 1996). The Leguminosae is one such plant family where zygomorphy is believed
to have evolved separately from the Lamiales (Stebbins, 1974).
Within the Papilionoideae, a few taxa with atypical near radial
symmetry have traditionally been considered basal members of this
subfamily, even transitional between caesalpinioids and papilionoids
(Polhill, 1981). In the model legumes Lotus japonicus, soybean (Glycine
max), and pea (Pisum sativum), CYC-like
genes have been isolated, and in the case of L. japonicus, two genes have been found to be asymmetrically expressed in the developing flower (D. Luo, unpublished data). This
study aims to expand these findings to other taxa from other major
papilionoid groups such as the dalbergioid and genistoid clades as well
as basal lineages (Pennington et al., 2001).
As functional gene studies expand from model organisms to related
species, it becomes necessary to identify the functional counterparts
of genes well-characterized in model species. The phylogenomic method
proposes that orthology (i.e. common descent) is a likely predictor of
functional equivalence (Eisen, 1998).
Legume CYC Sequence Characterization
Thirty-eight sequences with a TCP and R domain were amplified using primers LEGCYC/F1 and R1 in 16 different taxa. Sequence number per taxon ranged from one to four, with only one sequence isolated from non-papilionoid taxa. However, basal papilionoid taxa, such as S. jorori and Dussia macroprophyllata Harms, had multiple copies comparable in number with more derived papilionoid species (see Table I for summary and GenBank accession nos.). No evident sequence modifications (e.g. premature stop codons) were observed in papilionoids with unusual floral morphology.
Fragment length ranged from 274 bp (Pisum 1) to 427 bp (Clitoria 1), with a mean length of 333.81 (± 40.2) bp. These fragments were also highly variable in sequence (at the amino acid and nucleotide levels), with numerous substitutions and indel events in the region between the TCP and R domains. As a result, unambiguous sequence alignment for all legume CYC-like sequences was only possible in the TCP and R domains.
Position of Legume CYC-Like Sequences in the TCP Gene Family
TCP domains of seven legume CYC-like protein sequences from two
species, C. purpurea and L. japonicus, were analyzed in the context of the TCP gene
family. Analysis of the TCP domain peptide matrix using protein
distance, parsimony, maximum likelihood (ML), and Bayesian methods
resulted in congruent trees with strong support values for the major
groups. Figure 4 shows the protein ML
unrooted phylogram, with support values obtained by Bayesian analysis
of the data. The 50% majority rule (MR) protein distance and maximum parsimony trees are also shown for comparison (Figs.
5 and 6, respectively). All analyses strongly suggest that the TCP gene family
can be divided into two main groups: the PCF group (recovered in every
analysis with 100% support values) and a second group containing
CYC/TB1 and, among others, the five Arabidopsis genes (TCP1, TCP12,
TCP18, TCP2, and TCP24) with an R domain. These results confirm the
conclusions of Cubas (2002).
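As an illustration of how a 50% majority-rule summary works, the following minimal Python sketch counts how often each clade appears across a set of trees and keeps those found in more than half of them. The tree representation (each tree as a set of clades, each clade a frozenset of taxon names) and the toy input are assumptions made for this example; they are not the data or software used in the study.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades found in more than `threshold` of trees.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n_trees = len(trees)
    return {
        clade: count / n_trees
        for clade, count in counts.items()
        if count / n_trees > threshold
    }


# Hypothetical toy input: four trees over a few TCP gene names.
trees = [
    {frozenset({"CYC", "DICH"}), frozenset({"TCP12", "TCP18"})},
    {frozenset({"CYC", "DICH"}), frozenset({"TCP12", "TB1"})},
    {frozenset({"CYC", "DICH"}), frozenset({"TCP12", "TCP18"})},
    {frozenset({"CYC", "DICH"}), frozenset({"TCP18", "TB1"})},
]

support = majority_rule_clades(trees)
for clade, freq in support.items():
    print(sorted(clade), freq)
```

In this toy input the CYC plus DICH clade is present in every tree and is retained, whereas a clade present in exactly half of the trees is not, because the rule requires strictly more than 50%. The same clade frequencies, computed over trees sampled by a Bayesian analysis, are what is reported as Bayesian support.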
All analyses suggest that the legume CYC (LEGCYC) sequences from C. purpurea and L. japonicus (with the exception of Cadia 4) form a strongly supported group (found in 92% of Bayesian trees). This monophyletic group (here called LEGCYC) is sister to the CYC-TCP1 clade in the ML, Bayesian (Fig. 4) and distance (Fig. 5) trees. LEGCYC genes are therefore putative orthologs of CYC and TCP1. Cadia 4 is recovered in ML (Fig. 4) and distance (Fig. 5) analyses in the clade containing TB1, TCP12, and TCP18. The parsimony analysis is not informative because the relationship between the LEGCYC clade, Cadia 4, the CYC/LCYC/DICH clade, TCP1, TCP12, TCP18, and TB1 collapses in a 50% MR consensus tree (Fig. 6).
Evolution of LEGCYC Genes: Partial TCP and R Nucleotide Analyses
To recover major groups within the LEGCYC genes, we analyzed a matrix of 29 legume nucleotide sequences, rooted using snapdragon CYC and DICH, chosen to represent the full range of papilionoid legume taxa and sequence variation. The legume sequences could only be aligned with the snapdragon sequences using the highly conserved TCP and R domains. Parsimony analysis of the 67 informative sites out of 145 in the partial TCP and R nucleotide sequences produced 168 trees with a minimal length of 278 steps (additional branch swapping did not recover any more maximum parsimony trees), a consistency index (CI) of 0.424 and a retention index (RI) of 0.636, indicating fairly high homoplasy (parallel evolution) in the data. A strict consensus tree (Fig. 7), rooted on snapdragon genes CYC and DICH, resolves only one large supported clade within the ingroup (corresponding to group II, see below). Otherwise, only the relationships between sequences from different species of the same genus (e.g. Lupinus spp.) or related genera (e.g. Anthyllis and Lotus spp.) were supported in this analysis.
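For readers unfamiliar with the two homoplasy indices quoted above, the sketch below shows their standard ensemble definitions: CI is the minimum possible number of steps divided by the observed tree length, and RI is (maximum steps − observed steps) / (maximum steps − minimum steps). The function names and the toy step counts are assumptions for illustration. The last lines are only a back-of-envelope inversion of the reported CI and tree length, and which CI variant the software reported (for example, with or without uninformative characters) is not stated in the text.

```python
def consistency_index(min_steps, observed_steps):
    """Ensemble CI: minimum conceivable steps / observed tree length."""
    return min_steps / observed_steps


def retention_index(min_steps, max_steps, observed_steps):
    """Ensemble RI: (max - observed) / (max - min)."""
    return (max_steps - observed_steps) / (max_steps - min_steps)


# Hypothetical toy values: a data set needing at least 10 steps and at most 30,
# mapped onto a tree of 20 steps.
toy_ci = consistency_index(min_steps=10, observed_steps=20)
toy_ri = retention_index(min_steps=10, max_steps=30, observed_steps=20)
print(toy_ci, toy_ri)

# Back-of-envelope check using the values reported in the text
# (CI = 0.424, tree length = 278 steps): implied minimum number of steps.
implied_min_steps = 0.424 * 278
print(round(implied_min_steps))
```

A CI of 1 would mean no homoplasy at all, so values near 0.4 indicate that well over half of the inferred changes are repeated or reversed somewhere on the tree.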
Model-based methods, such as Bayesian inference, are less sensitive to long-branch attraction and may therefore be better alternatives for analyzing homoplastic data. Bayesian analysis (Fig. 8) recovered two groups of legume sequences with support values (called here group I and group II). Group II had very high (97%) Bayesian support, whereas group I had weak support of 52%. Both groups include species from basal as well as more derived papilionoids and would appear to represent an early duplication event. However, relationships between sequences other than from closely related species or genera (e.g. Lupinus spp.) were difficult to interpret.
Therefore, although parsimony analysis of this small data set did not resolve relationships between LEGCYC genes well, Bayesian analysis gave a more fully resolved tree. The poor performance of parsimony analysis was probably due to high homoplasy in the data set coupled with the low number of informative characters with consequent low phylogenetic signal.
Evolution of LEGCYC Genes: Inclusion of Sequence Data between the TCP and R Domains
The region between the TCP and R domains was then added to the initial data set, together with additional legume sequences. Due to the high length and sequence variability of this region, it could not be aligned with nonlegume sequences, and so all analyses are unrooted. Furthermore, because of length variability, alignment was difficult even within legumes. For this reason some of the positions in which the alignment was ambiguous were excluded from the analysis (300 aligned positions). Eight LEGCYC sequences were excluded altogether from this analysis for the same reason. The remaining 38 sequences covered 292 unambiguously aligned characters, which required the insertion of 34 gaps of 1- to 6-bp triplets for alignment. Parsimony analysis of the resulting 153 parsimony informative characters from the extended data set resulted in a single most parsimonious tree of 748 steps, with CI = 0.452 and RI = 0.601. The tree recovered two clades (groups I and II from the previous analyses) with a bootstrap value of 65%, although sequence relationships within these groups had little bootstrap support with the exception of sequences from closely related taxa (Fig. 9). The topology of the ML tree and the 50% MR consensus tree from the Bayesian analysis was identical, with only three nodes collapsing in the Bayesian consensus tree. The topology of those trees was also similar to the tree from the parsimony analysis, but the level of support for the nodes (estimated by Bayesian inference) was much higher in the model-based analysis.
For instance, groups I and II were recovered in the Bayesian analysis with high support (Fig. 10). Comparison of the partial TCP domains of amino acid sequences from groups I and II showed that there were five synapomorphies, which suggests these clades are genuine (Fig. 11). These groupings were also supported by considerable differences in the variable region, such as presence or absence of motifs, which could not be included in the analysis.
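The kind of comparison described above can be sketched as a search for alignment columns that are fixed within each group but differ between the groups. The following Python example is a minimal sketch under stated assumptions: the function name is hypothetical, and the short peptide strings are invented for illustration and are not LEGCYC sequences. Fixed differences found this way are only candidate synapomorphies; deciding which state is derived needs an outgroup or a tree.

```python
def diagnostic_sites(group_a, group_b):
    """Return 0-based alignment columns fixed within each group but differing between them.

    Both groups are lists of aligned sequences of equal length.
    """
    length = len(group_a[0])
    sites = []
    for i in range(length):
        states_a = {seq[i] for seq in group_a}
        states_b = {seq[i] for seq in group_b}
        # A column is diagnostic when each group shows a single state
        # and the two groups show different states.
        if len(states_a) == 1 and len(states_b) == 1 and states_a != states_b:
            sites.append(i)
    return sites


# Invented toy alignment (not real data): two groups of three sequences each.
group_i = ["RKDRHSKI", "RKDRHSKV", "RKDRHSKI"]
group_ii = ["RRDRHTKI", "RRDRHTKL", "RRDRHTKI"]

sites = diagnostic_sites(group_i, group_ii)
print(sites)
```

In the toy alignment the last column is excluded because it varies within both groups, which is the usual reason a site fails to serve as a group signature.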
Within group I, two sequences from most taxa were found. These segregated into two clades (A and B, see Fig. 9), which for the most part contained one sequence per taxon, with a few exceptions (for example Machaerium 1 and 2). Clade A contained one LEGCYC sequence from representatives from both the genistoid (Lupinus spp., Cadia sp., and Acosmium spp.) and robinioid (Lotus spp. and Anthyllis sp.) clades, whereas clade B contained another LEGCYC sequence from these taxa. Although these clades have no bootstrap support in the parsimony analysis, they were found in the ML tree and in most Bayesian trees. This suggests a putative orthology relationship between sequences within these clades (IA and IB) and a further conserved duplication in LEGCYC sequences (LEGCYC IA and IB) of possible functional significance.
Presence of TCP1/CYC Orthologs in Leguminosae
In the TCP gene family analyses, evidence from sequence similarity (PROTDIST) and evolution (ML and Bayesian analyses) strongly suggests that the legume CYC-like sequences examined here are homologous to the floral symmetry genes in snapdragon, CYC and DICH, and to the adaxially expressed floral gene TCP1 in Arabidopsis. Within this legume clade, a lower estimate of three CYC-like copies was found within the Papilionoideae, in species ranging from the basal-most clade (S. jorori) to higher papilionoids (e.g. the robinioid A. hermannia). Because of their apparent orthology with snapdragon CYC, these genes are candidates for floral developmental genes in the Leguminosae. However, these analyses, many of which lead to poorly resolved trees, highlight some of the difficulties in making detailed orthology statements within gene families and CYC-like genes in particular.
Complex Evolution of CYC-Like Genes in the Leguminosae
No simple pattern of gene evolution tracking organismal phylogeny
within the legume CYC family was recovered in the
phylogenetic analyses. Possible confounding factors include
intermediate levels of concerted evolution, variation in the rate of
sequence evolution, and independent gene loss and duplication events,
which render the interpretation of gene trees difficult (Doyle,
1994). Different levels of variation in different parts of the sequences also
made analysis difficult. The highly conserved TCP and R boxes were
alignable but contained little phylogenetic information, whereas the variable region contained much variation but
was difficult to align. Furthermore, the variation in the TCP and R
domains was mainly at the synonymous third codon position and had a
high degree of homoplastic variation (accounting for two-thirds of the
steps required). High levels of homoplasy, possibly resulting in
long-branch attraction and therefore artificial groupings, are suggested
by the low support values of the trees from this analysis and the
collapse of many nodes in the maximum parsimony strict consensus trees.
Also, because the analysis includes clades between which functional
differentiation may exist, particular amino acid positions may be
subject to different selection pressure in different parts of the tree.
This within-site rate variation is termed heterotachy (Lopez et al.,
2002).
Two Major Subgroups (I and II) of Legume CYC-Like Genes Represent a Probable Early Duplication
Despite the problematic nature of the data, certain patterns do emerge from the analyses. Results of the rooted Bayesian analysis suggest that LEGCYC genes can be divided into two main groups (referred to as I and II), which are characterized by different amino acid signatures. The results of the unrooted legume analyses of the extended dataset are also consistent with the two-group hypothesis, and these groups, although only moderately supported by maximum parsimony, are strongly supported by Bayesian inference. Taxa ranging from the basal-most papilionoids to highly derived species (from the "inverse repeat loss" clade, e.g. pea) have both groups of genes, suggesting that these genes probably diverged after a duplication event before the evolution of the Papilionoideae. In addition to the putative amino acid synapomorphies in the TCP domain (Fig. 11), these groups are also distinguished by specific motifs in the otherwise variable region between the TCP and R domains.
Evidence for Two Subgroups (IA and IB) of Group I LEGCYC Sequences
Within group I, one other major duplication event appears to have occurred, giving rise to two subgroups IA and IB. We recovered genes belonging to both clades in a wide range of the species sampled here, implying that this duplication occurred at least early in the diversification of the papilionoids. However, the relationships between sequences within these groups appear
complex and require further investigation. Even though our sampling is
fairly extensive compared with many studies of developmental gene
phylogeny, further sampling may help resolve relationships within and
between gene copies. However, these results are in agreement with a
trend of independent duplications, and possible losses, with rapid gene
evolution outside of the conserved TCP and R domains, previously
documented in CYC-like gene families from other plant
groups (e.g. Gesneriaceae; Citerne et al.,
2000).
The Limitations and Potential of Phylogenomics
The lack of resolution resulting from problematic analyses (particularly using parsimony) highlights the limitations of phylogenomics, at least in rapidly evolving genes with high levels of homoplasy and in gene families where functional differentiation may lead to high levels of heterotachy (within-site rate variation). These problems may lead to difficulties in robust orthology estimation and hence functional prediction. In this study, Bayesian inference gives better resolution than parsimony; with the large amount of homoplasy in these data it is likely that model-based methods such as Bayesian inference will outperform parsimony. The recognition of a major legume CYC-like (LEGCYC) group in this study does, however, suggest likely candidate genes for functional equivalents of CYC/TCP1. Furthermore, within this group of legume CYC candidates, further subgroups are recognized in this study (LEGCYC IA, IB, and II), inviting investigation of possible functional differences between these. Thus even where phylogenetic analyses are difficult, partial resolution may still enable hypotheses to be generated. Although we recognize the limitations of phylogenomics, we still regard this approach as extremely promising even with relatively intractable gene families.
Molecular Methods: DNA Extraction, PCR, Cloning, and Sequencing
For each species, genomic DNA was extracted from either
fresh or silica dried leaf material following a modification of the cetyl-trimethyl-ammonium bromide procedure of Doyle and Doyle (1987). The region delimited by the conserved TCP and R domains was
amplified using primers LEGCYC/F1, 5'-TCA GGG SYT GAG GGA CCG-3', and
LEGCYC/R1, 5'-TCC CTT GCT CTT GCT CTT GC-3'. These primers were
designed based on available sequences of CYC-like genes from Lotus japonicus and soybean (Glycine max;
D. Luo, unpublished data), compared with nucleotide sequences of the
TCP and R domains from snapdragon (Antirrhinum majus;
CYC, Y16313; and DICH, AF199465),
Arabidopsis (TCP1, AC002130; TCP12,
AC011914; and TCP18, AP001303) and maize (Zea
mays subsp. mays; TB1, AF340199).
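The forward primer LEGCYC/F1 contains two IUPAC ambiguity codes (S = C or G; Y = C or T), so it is in practice a mixture of four oligonucleotides. The short Python sketch below expands a degenerate primer into its explicit sequences; the function name and dictionary are assumptions for illustration, and only the two primer sequences themselves come from the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """List every explicit sequence encoded by a degenerate primer (spaces ignored)."""
    primer = primer.replace(" ", "").upper()
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


forward = expand_primer("TCA GGG SYT GAG GGA CCG")      # LEGCYC/F1
reverse = expand_primer("TCC CTT GCT CTT GCT CTT GC")   # LEGCYC/R1

for seq in forward:
    print(seq)
print(len(forward), len(reverse))
```

The reverse primer LEGCYC/R1 has no ambiguity codes and therefore expands to a single 20-nucleotide sequence.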
PCR amplifications were carried out using Taq and reagents (Bioline, London) in a 50-µL mix containing 2.5 µL of 50 mM MgCl2, 5 µL of a 2 mM dNTP
mix, 2.5 µL of each primer (10 µM; MWG Biotech, Ebersberg,
Germany), 1 unit of BIOTAQ, and 10 to 20 ng of DNA. Conditions
consisted of an initial denaturation step at 94°C (3 min), followed
by 30 cycles of denaturation at 94°C (1 min), annealing at 50°C to
55°C (30 s), and extension at 72°C (30 s), followed by a final
extension step at 72°C (5 min). PCR products were purified using the
QIAquick PCR Purification Kit (Qiagen Ltd, Dorking, Surrey, UK) and
then cloned using TOPO-TA Cloning Kit for Sequencing (Invitrogen,
Carlsbad, CA). Dye-terminator cycle sequencing was carried out using
Thermosequenase II (Amersham Biosciences UK, Little Chalfont,
Buckinghamshire, UK). Samples were analyzed on an ABI 377 Prism
Automatic DNA Sequencer (Applied Biosystems, Foster City, CA). In taxa
of particular interest (Cadia purpurea and
Lupinus nanus), 36 and 39 clones were sequenced,
respectively. In addition, the entire open reading frame of two gene
pairs in C. purpurea and
L. nanus was sequenced by genome walking
(modified from Siebert et al., 1995).
Phylogenetic Analysis: Taxon and Sequence Selection
CYC-like genes from legumes were placed in the context of the
TCP gene family, represented by certain key sequences from
L. japonicus and C.
purpurea (Lotus japonicus 1 and 2, Cadia 1-4; Table I).
To simplify the analysis, certain Arabidopsis TCP genes belonging to
the PCF group (Cubas, 2002
Results from these analyses guided the choice of sequences sampled to investigate the evolution of CYC-like genes in the legume family, using nucleotides of the TCP and R domains, with CYC, DICH, and TCP1 as outgroups. Twenty-nine taxa were sampled to represent the phylogenetic range of the papilionoids. For the detailed analysis within the legumes including the nucleotide
region between the TCP and R domains, a larger number of species was
used, with representatives from the three subfamilies Caesalpinioideae,
Mimosoideae, and Papilionoideae (Table
II). Particular emphasis was placed on
sampling representatives from all major papilionoid groups defined by
current molecular phylogenetic evidence (Doyle et al.,
1997).
DNA Sequence Alignment
Unambiguous alignment of all 54 legume CYC-like
DNA sequences from 25 taxa was only possible in the TCP and R domains
and reduced the matrix to 145 nucleotide characters. However, by
excluding certain problematic sequences, it was possible to align
certain parts of the variable region between these two conserved
domains as protein sequences that were then analyzed as nucleotide
sequences. Protein sequences were aligned using ClustalX
(Thompson et al., 1997).
Phylogenetic Analysis
Protein Methods
Protein distance analysis was carried out using programs from the PHYLIP package (Felsenstein, 1993).
DNA Methods
Maximum parsimony analysis was carried out using PAUP* 4.0b7 (Phylogenetic Analysis Using Parsimony, version 4.0b7, Sinauer Associates, Sunderland, MA). Heuristic searches with 1,000 random addition replicates (to avoid local optima) and tree bisection reconnection (TBR) branch swapping were conducted with steepest descent and multrees options selected. A maximum of 10 minimal length trees was retained per replicate, and a further heuristic search by TBR was carried out on the shortest trees. Branch support values were calculated by 1,000 bootstrap replicates with simple sequence addition and a maximum of 10 minimal length trees retained per replicate. This search method was carried out both for the TCP and R nucleotide matrices, as well as the matrix incorporating certain variable regions. Bayesian phylogenetic analysis of the TCP plus R data set was carried out using MrBayes v2.01 (Huelsenbeck and Ronquist, 2001 -distribution; Rodriguez et al., 1990
Distribution of Materials
Upon request, all novel materials described in this publication will be made available in a timely manner for noncommercial research purposes, subject to the requisite permission from any third-party owners of all or parts of the material. Obtaining permissions will be the responsibility of the requestor.
We thank the Royal Botanic Garden Edinburgh for use of laboratory and glasshouse facilities, the horticultural staff, and the laboratory staff (particularly Michelle Hollingsworth and Alex Ponge) for assistance. Julie Hofer (John Innes Centre) and Susan Barker (University of Western Australia) kindly made available DNA samples. We thank Debbie White (RBGE) for the photographs.
Received October 16, 2002; returned for revision November 20, 2002; accepted December 29, 2002.
1 This work was supported by The Carnegie Trust for the Universities of Scotland and by the Systematics Association.
2 Present address: Botanical Garden and Centre for Plant Research, University of British Columbia, 6804 Southwest Marine Drive, Vancouver, BC, Canada V6T 1Z4.
* Corresponding author; e-mail h.citerne{at}rbge.org.uk; fax 44-131-248-2901.
Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.102.016311.