First published online March 2, 2007; 10.1104/pp.107.095760 Plant Physiology 143:1452-1466 (2007) © 2007 American Society of Plant Biologists OPEN ACCESS ARTICLE
PlanTAPDB, a Phylogeny-Based Resource of Plant Transcription-Associated Proteins1,[C],[W],[OA]
Plant Biotechnology, Faculty of Biology, University of Freiburg, D-79104 Freiburg, Germany
Diversification of transcription-associated protein (TAP) families during land plant evolution is a key process yielding increased complexity of plant life. Understanding the evolutionary relationships between these genes is crucial for gaining insight into plant evolution. We have determined a substantial set of TAPs that are focused on, but not limited to, land plants using PSI-BLAST searches and subsequent filtering and clustering steps. Phylogenies were created in an automated way using a combination of distance and maximum likelihood methods. Comparison of the data to previously published work confirmed their accuracy and usefulness for the majority of gene families. Evidence is presented that the flowering plant apical stem cell regulator WUSCHEL evolved from an ancestral homeobox gene that was already present after the water-to-land transition. The presence of distinct expanded gene families, such as COP1 and HIT in moss, is discussed within the evolutionary backdrop. Comparative analyses revealed that almost all angiosperm transcription factor families were already present in the earliest land plants, whereas many are missing among unicellular algae. A global analysis not only of transcription factors but also of transcriptional regulators and novel putative families is presented. A wealth of data about plant TAP families and all data accrued throughout their automated detection and analysis are made available via the PlanTAPDB Web interface. Evolutionary relationships of these genes are readily accessible to the nonexpert with a mouse click. Initial analyses of selected gene families revealed that PlanTAPDB can easily be used for knowledge discovery.
The coordinated expression control of the entirety of genes in a given cell determines its physiological state, morphology, and identity in the organism. Reprogramming the set of transcribed genes during development or physiological adaptation requires modulated activation and deactivation of regulatory factors. In eukaryotes, the transcription of protein-coding genes is controlled by complex networks of transcription-associated proteins (TAPs). Specific transcription factors (TFs) activate or repress transcription of their target genes by binding to cis-active elements. Further transcriptional regulators (TRs) include the following: (1) coactivators and corepressors, which bind and influence TFs; (2) general transcription initiation factors, which recognize core promoter elements and recruit components of the basal transcription machinery; and (3) chromatin remodeling factors, which affect the accessibility of DNA through histone modifications and DNA methylation. The modular nature of TFs, possessing DNA-binding and protein-protein interaction domains, facilitates the high diversity of transcriptional regulation.
Changes in transcriptional regulation enhance complexity at the genetic level and thus can generate novel signal transduction pathways. Such changes, mediated by recombined complexes of regulatory proteins as well as by altered regulatory sequence elements, were repeatedly proposed to be a major driving force of evolution (Doebley and Lukens, 1998
The evolution of eukaryotic TF genes involves the processes of specific amplification of common families through duplication and diversification, as well as the shuffling of functional domains, resulting in lineage-specific families that can facilitate novel networks of protein-protein interactions and can take over new functions. In plants, the evolution and expansion of specific gene families seem to be more pronounced than in other eukaryotes (Lespinet et al., 2002
In recent years, much emphasis was placed on the understanding of regulatory networks controlling the transcription of genes. Genome-wide comparative analyses aid in revealing the evolution of transcriptional regulation that underlies the diversity of organisms. TAP genes and transcriptional networks have been studied extensively in unicellular organisms (e.g. Kyrpides and Ouzounis, 1999
While phylogenetic studies have been carried out for single TAP families, e.g. sigma factors, LEAFY (LFY)/FLO, MADS, and AP2 (Ichikawa et al., 2004
Availability: All resources are available via the PlanTAPDB Web interface (http://www.cosmoss.org/bm/plantapdb).
In terms of evolution, mosses are located half way between seed plants and algae and were therefore chosen as an offset for the global phylogenetic analysis of plant TAPs. In addition, mosses morphologically resemble the first plants that occupied the land (Kenrick and Crane, 1997
Filtering and Clustering of PSI-BLAST Results
During the PSI-BLAST searches, 369,118 hits were generated, representing a total of 144,941 distinct protein sequences (Fig. 1). To deal with the differences in degree of conservation and family size between gene families, we deployed an iterative six-step filtering scheme that optimizes the applied filtering criteria and the selected PSI-BLAST iteration for each query sequence individually. The most stringent step (6), demanding at least 45% sequence identity and 300 amino acids in alignment length, was designed to reduce domain-derived superfamilies to family or subfamily level. Smaller and more diverse superkingdom-spanning families were handled via the least stringent step (1), allowing hits from the fringe of the "twilight-zone" (Rost, 1999
While it greatly improves taxon sampling, the strategy to use both a huge multispecies-containing database like UniProt and the individual whole-genome protein predictions results in the detection of identical protein sequences from these overlapping databases. In addition, the same locus is often represented by more than one protein sequence due to divergent predicted gene models, splice variants, as well as sequencing and annotation errors. To cope with this problem, redundant copies of genes were eliminated prior to all functional analyses using an identity cutoff of In addition, a homology-reduced set of the 540 clusters was compiled to infer phylogenies (Fig. 1). Phylogenetic inference of large clusters is computationally costly, and the interpretation and inference of results from huge trees is difficult. As a total of 102 clusters had more than 150 members, these were condensed via stepwise homology reduction until the threshold of 150 members was reached. The homology-reduced clusters contain 29,317 cluster members in total, 26,595 of which are distinct. The average pairwise distances within clusters were found to be in the range of 12% to 95% identity with an average of 44%.
Due to errors introduced by the alignment algorithm, a certain fraction of columns in a multiple sequence alignment (MSA) generates noise that disturbs correct inference of phylogenetic relationships (Castresana, 2000
Many approaches to phylogenomics rely solely on a distance approach using neighbor joining (NJ; Saitou and Nei, 1987). However, NJ is known to be susceptible to noisy data, provides no confidence measures, and makes it hard to compute reliable distances for strongly divergent sequences. Probabilistic approaches, such as maximum likelihood (ML) and Bayesian methods, are known to overcome most of these problems, but both are very time consuming and thus usually not applied in large-scale phylogenomics approaches. We followed a combined approach by calculating ML consensus branch lengths using gamma-distributed rates from bootstrapped NJ topologies (Fig. 1). We compared published phylogenies of plant TAP families to those created by the approach presented here. In general, the same topology was recovered and the same conclusions could be drawn from the automatically generated phylogenies described here. For example, homologs of the floral regulator LFY, a plant-specific TF, are present in all land plants. The LFY phylogeny is characterized by two deep clefts separating (1) angiosperms from gymnosperms and ferns and (2) mosses from gymnosperms and ferns (Maizel et al., 2005
The functional annotation of the 540 candidate TAP clusters was inferred from identified InterPro domains and associated Gene Ontology (GO) terms (Camon et al., 2004
TAP clusters with the same functional annotation (main and subfamily), which had not been merged during single linkage clustering due to the stringent parameters applied there, were manually grouped, resulting in 138 families of TAPs (Supplemental Table S1). This resulted in a total number of 14,680 nonredundant TAP family members, while the remaining overlap among the families was reduced to 3.6% (14,157 distinct nonredundant family members; Fig. 1), indicating a good separation of the gene families. Fifty-four of the TAP families are represented by more than one cluster of deviating but partially overlapping composition. These multiple clusters depict the particular TAP family either from a different taxonomic perspective (e.g. restricted to the plant lineage versus covering all kingdoms) or comprise different subfamilies. Because large TAP gene families are substantially divergent outside of their conserved domains, it appears more reasonable to deduce phylogenies from subgroups to be able to utilize as much homologous sequence information as possible. The phylogenetic trees were therefore derived for each of the 300 separate TAP clusters. We divided the TAP families into three categories according to their molecular function and associated GO terms: (1) DNA-binding TFs (59), which comprise direct activators or repressors of transcription; (2) TRs (56), comprising basal TFs interacting with RNA polymerase II or the core promoter, coactivators/corepressors, and chromatin remodeling factors; and (3) proteins with unknown function and/or domains that are possibly associated with transcriptional regulation (PT, 23; Fig. 1).
Previously, plant TF gene families were globally identified in two seed plants, Arabidopsis and rice (http://arabtfdb.bio.uni-potsdam.de/v1.1/, http://ricetfdb.bio.uni-potsdam.de/v2.1/; Riechmann et al., 2000
To analyze the level of completeness of our dataset, we compared numbers of PlanTAPDB family members with the size of well-known Arabidopsis TAP families. In Supplemental Table S2, those PlanTAP families that were previously described by Riechmann and colleagues (Riechmann et al., 2000
The PlanTAPDB Web interface (http://www.cosmoss.org/bm/plantapdb) provides dynamic access to the results generated in this study. TAP gene families can be retrieved by their accession numbers and identifiers or queried via keyword searches among the family annotations. In addition, all 37,247 TAP cluster sequences (Fig. 1) can be queried using BLAST. The PlanTAPDB portal gives an overview of all available families of TFs, TRs, and PTs in the form of grouped lists or a clickable image map of their overall taxonomic profile (described below). Both provide access to the PlanTAPDB family entry of interest via hyperlinks. The family viewer displays the results of the comprehensive manual annotation process (main family, subfamily, consensus InterPro domains), as well as literature references and the list of annotated family members (including a graphical representation of their domain structure) for each of the 138 TAP families. The extensive information available for every member, e.g. InterPro domains and taxon information, is cross-linked to the primary databases. The individual taxonomic profile, as well as species names and several other parameters, can be used to filter the family member list. All member sequences can be retrieved selectively in FASTA format.
The cluster(s) of which a PlanTAPDB family is composed can be accessed via links to the corresponding cluster view(s) and contain the following features: (1) the cluster's description and an optional comment that provides additional information derived from the manual annotation process; (2) the distance matrix and detailed statistics in the form of histograms and box plots, describing the cluster's sequence diversity as found in the redundancy removal and the homology reduction phase of TreePipe; (3) a graphical overview describing the distribution of the sum-of-pairs score, Shannon's entropy score, the gap ratio, and the column removal threshold along the length of the complete alignment used in the selection of conserved sites; (4) the initial alignment of all cluster members used to build the distance matrix as well as the filtered alignment, which was used to infer the phylogeny, viewable and downloadable via the Jalview applet (Clamp et al., 2004
Previous global comparative studies of plant TAP gene families focused mainly on the subgroup of DNA-binding TFs in seed plants (for review, see Qu and Zhu, 2006
There seems to be a trend that total amounts of TAPs (Fig. 2) are associated with the number of cell types in the respective organism (there is no significant difference between Arabidopsis and rice [P = 0.84], but P. patens differs significantly from both the flowering plants and the algae in this regard [P < 0.001]). A correlation of numbers of TFs with organism complexity (which might be defined as number of cell types) has previously been described for animals (Levine and Tjian, 2003
The gene family data (Fig. 3A) reveal an extensive (4.7-fold) increase in the number of different TF gene families with the transition from the three algae (average 12.0 ± 5.0 families) to the three land plants studied here (average 56.7 ± 0.6). The number of TR families exhibits the same trend (average 28.7 ± 4.0 versus 54.0 ± 1.0), but less pronounced (1.9-fold), indicating an increased importance of TF genes for the evolution of the three land plants in question. Consistent with this, components of the basal transcriptional machinery and general TFs are known to be conserved across the three domains of life, while DNA-binding TFs have been shown to evolve in a lineage-specific way in plants as well as in animals (Coulson and Ouzounis, 2003
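The fold changes quoted above follow directly from the reported averages. As a quick check of the arithmetic (the variable names are illustrative only and not part of PlanTAPDB):

```python
# Average numbers of gene families as reported in the text.
tf_families_algae = 12.0
tf_families_land_plants = 56.7
tr_families_algae = 28.7
tr_families_land_plants = 54.0

# Fold increase from algae to land plants, rounded to one decimal place.
tf_fold = round(tf_families_land_plants / tf_families_algae, 1)
tr_fold = round(tr_families_land_plants / tr_families_algae, 1)

print(f"TF families: {tf_fold}-fold, TR families: {tr_fold}-fold")
```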
Species-Specific Expansion of Individual TAP Families
The absolute size of the 138 annotated TAP families for the above-mentioned six species is shown in Supplemental Table S1. The size distribution of the Arabidopsis TF gene families correlates well with published results (Qu and Zhu, 2006
As an example, members of a distinct branch of the His triad family (TF033, HIT) known from animals (Kijas et al., 2006
For visualization of the distribution of TAP family members across all taxonomic lineages, a taxonomic profile was created and is presented as a heat map in Figure 4. Initial tests using taxonomic resolution fixed at the kingdom or order level, respectively, were not able to resolve the expected phylogeny of the contributing taxa using columnwise clustering (data not shown). Therefore, those taxonomic groups that contributed significantly to the overall distribution were selected as columns; the remainder of the Eubacteria, protists, plants, and animals were gathered into the respective "other" columns. Thus, a nonredundant representation of the taxonomic distribution was created that is able to resolve the expected phylogeny using columnwise clustering. To overcome the sampling bias presented by fully sequenced genomes, the columns were normalized. Subsequent clustering yielded the significantly correlated groups depicted in Figure 4. The top half of the taxonomic profile contains families that are predominantly present in plants. Within these, the first significantly correlated cluster is almost completely composed of large plant TF families, most of which have been described as plant specific before (highlighted by green text color), while the second cluster contains a mixture of plant TAP families not yet discovered in Asterids. Only a few families, mostly TRs, are abundant in both prokaryotes and eukaryotes (located mainly in the middle part of the profile). The families in the second half of the profile are shared between plants and other eukaryotes and are sometimes present in Eubacteria and Archaea as well. The TR families accumulate within these clusters, especially in the lowest part. This distribution correlates very well with published data (Riechmann et al., 2000
The WUSCHEL/WOX Phylogeny
The HB/WUSCHEL (WUS) family (TF032_373) exhibits a rigorous land plant-specific taxonomic profile, comprising the species Arabidopsis, tomato (Solanum lycopersicum), poplar (Populus spp.), rice, and P. patens. The consensus domains for this family are Homeobox (IPR001356), Homeodomain_like (IPR009057), and Homeodomain-rel (IPR012287). During redundancy filtering, 10 nearly identical sequences belonging to Arabidopsis, rice, and poplar were removed. The average identity between the remaining sequences is relatively low (36.26%); therefore, the alignment was reduced from an initial 950 columns to 167 columns that could be unequivocally aligned, comprising mainly the actual homeobox domain. Due to the low degree of conservation of the WUS-related (WOX) gene family (e.g. 30.6% amino acid identity between Arabidopsis WOX9 and WOX14), several annotated homologs were not detected by the PSI-BLAST search and thus are missing from the above-mentioned phylogeny. To add those, all annotated Arabidopsis WUS/WOX sequences were retrieved from Swiss-Prot. After retrieval of the remainder of the sequences using the PlanTAPDB Web interface, MSA and tree reconstruction were performed. The phylogeny is available via the Web interface as well, as an example for manually curated data to be added upon request. The resulting tree (Fig. 5) is clearly separated into two clusters, one containing Arabidopsis WUS itself as well as the majority of WOX sequences, and the other containing Arabidopsis WOX10, WOX13, and WOX14. While WUS has been shown to be involved in shoot meristem maintenance (Mayer et al., 1998
The COP1 Phylogeny
The three uppermost clusters of the taxonomic profile (Fig. 4) contain families that are generally present in plants and also appear erratically in other taxonomic groups. Among those, the PT family PT024 (COP1) can be found. It attracts attention because of the overrepresentation of moss sequences that is apparent from both the taxonomic profile (Fig. 4) and the species-specific expansion (Supplemental Table S1), which is in contrast to the generally lower number of P. patens TAPs as compared to rice and Arabidopsis (Figs. 2 and 3). In angiosperms, the E3 ubiquitin ligase COP1 acts as a photomorphogenesis/skotomorphogenesis switch by degradation of downstream factors in the dark, while it is inactivated by nuclear depletion in the light (Holm and Deng, 1999
Caveats
PlanTAPDB users should be aware that the automated homolog detection and clustering approach resulted in the loss of some gene families, i.e. a low percentage (approximately 4%) of plant TAP families is missing. In addition, on average 19% of the gene family members known from well-annotated genomes are lacking. To present phylogenetic trees that can be viewed on a normal computer screen, large gene families have been reduced to contain a maximum of 150 homology-condensed members. Due to the fragmentary nature of the data (incomplete genome/transcriptome data, fragmentary sequences, sampling bias), the phylogenetic analyses might be biased or flawed. Taken together, users should take appropriate caution concerning the points raised above while interpreting the data.
The PlanTAPDB resource might be used as a starting point for knowledge discovery. Using the family and cluster annotation available through the Web interface, designated gene families can be located, e.g. by name or member sequence accession number. MSAs of the gene families as well as arbitrary sequence subsets can be retrieved. The taxonomic profile (Fig. 4, also available via the Web interface) and the overrepresentation analysis (Supplemental Table S1) might be employed to detect biased taxonomic distribution. Descriptive data, such as sequence conservation, gene family size, species distribution, and alignment properties, are available. Cross-links to sequence, domain, and literature databases enable simple access to related information. Finally, the phylogenetic trees offer an evolutionary vantage point for nonexperts.
So far, most comparative analyses dealing with plant TAPs have focused on TFs of Arabidopsis and rice. To broaden our evolutionary understanding of transcriptional regulation in plants, we have included three algae and a moss in the present analysis, as well as the complete UniProt database. In addition, we have analyzed both TFs and TRs, and have detected several novel PT families. Using automated methods, a stringent detection and representation of gene clusters has been established that can easily be expanded to cover more genomes in the future, while manual curation of gene clusters into families assures their quality. High-quality phylogenetic trees were created from these clusters and are available through an easy-to-use Web interface together with a multitude of accompanying data, such as alignments, domain-based family annotation, and taxonomic profiles. Instant knowledge discovery using PlanTAPDB is straightforward, as has been demonstrated using several examples. In addition, such comparative data can be applied to aid phylogenomics. The general expansion of both the total number of TAP genes and the number of TAP families seems to coincide with organism complexity. A dramatic increase in the complexity of transcriptional regulation, particularly at the level of TFs, might have occurred after the development of multicellularity or after the transition from water to land. Subsequently, during land plant evolution, the intricacy of the previously established TF families increased again, possibly reflecting large-scale morphological and physiological changes paralleling angiosperm radiation. Apart from these general trends, distinct TAP gene families were subject to expansion in individual species. Interesting details about the evolution of the stem cell regulator WUS, the photomorphogenesis switch COP1, and the genotoxic stress-related HIT gene family were revealed.
Sequence Datasets
For the identification of Physcomitrella patens transcription-associated EST sequences, National Center for Biotechnology Information (NCBI) Entrez (Geer and Sayers, 2003
The results and resources presented here were generated using an automated phylogeny pipeline that utilizes BLAST and PSI-BLAST (Altschul et al., 1997
NCBI GenBank was queried using the keywords "transcription factor," "transcription activator," "transcription repressor," and "transcription regulator," as well as taxon IDs of Viridiplantae and nongreen algae (txids 33090, 136419, 3027, 33682, 38254, 2830, 2763, 33634). Additionally, Arabidopsis loci were extracted from TAIR matching the keyword "transcription factor." With this reference set of 7,476 TAPs, the clustered P. patens EST sequences were searched by TBLASTN. A total of 286 PFAM HMM profiles and 67 PROSITE patterns of transcription-associated domains without taxonomic restriction were used for motif searches in the same database. A total of 1,592 nonredundant P. patens candidate TAP sequences were identified. Full-length closest homologs of the 1,592 moss candidate TAP transcripts were determined via BLASTX (Altschul et al., 1997
PSI-BLAST searches were performed against the UniProt Knowledgebase, all available whole-genome predicted protein databases of plants and algae, and the predicted ORF of the P. patens virtual transcripts using an E-value threshold of 1E-4, a profile inclusion threshold of 1E-5, and four iterations. Up to 500 results per query were considered and parsed into the TreePipeDB. Each result set (composed of one query and its hits after one of the four PSI-BLAST iterations) was run through a series of six filter steps with increasing stringency concerning the length and percent identity of the PSI-BLAST matches (step 1: 25% identity/50-amino acid alignment length; step 2: 30%/60 amino acids; step 3: 35%/80 amino acids; step 4: 45%/100 amino acids; step 5: 45%/150 amino acids; step 6: 45%/300-amino acid length). For each query and iteration, the filtering process determines the first filtering step that reduces the result set to
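The six filter steps listed above can be written down as a small lookup table. The following is a minimal sketch, not the TreePipe implementation: the `Hit` record and the function names are assumptions made for illustration, and the rule that selects a step for each query and iteration is deliberately left out because it is not fully stated here.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One PSI-BLAST match (field names are hypothetical)."""
    subject_id: str
    percent_identity: float
    alignment_length: int  # in amino acids


# (minimum % identity, minimum alignment length) for steps 1 to 6, as given in the text.
FILTER_STEPS = {
    1: (25.0, 50),
    2: (30.0, 60),
    3: (35.0, 80),
    4: (45.0, 100),
    5: (45.0, 150),
    6: (45.0, 300),
}


def apply_filter_step(hits, step):
    """Keep only hits that meet both thresholds of the given filter step."""
    min_identity, min_length = FILTER_STEPS[step]
    return [
        h for h in hits
        if h.percent_identity >= min_identity and h.alignment_length >= min_length
    ]


def surviving_counts(hits):
    """Number of hits remaining after each step, from least to most stringent."""
    return {step: len(apply_filter_step(hits, step)) for step in FILTER_STEPS}


example_hits = [Hit("a", 26.0, 55), Hit("b", 46.0, 120), Hit("c", 50.0, 310)]
print(surviving_counts(example_hits))
```

Comparing the per-step counts for one result set shows how quickly a domain-derived superfamily shrinks as stringency increases.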
Single-linkage clustering using a stringent hit-coverage-based distance measure was implemented in Perl and the TreePipeDB backend. Result sets of two queries were merged if they shared at least one hit covering the same region of this hit sequence. The length of the region to be shared depends on the previously selected filter step, namely, the most stringent filter step possible (e.g. result set A overlaps with B on hit X). A was filtered using step 6 and B using step 5. Hence, A and B can only then be merged into a cluster if they overlap to at least 300 amino acids (step 6 criteria) on sequence X. Result sets without any significant overlaps were added as single-query clusters. For all cluster members, the corresponding NCBI taxonomy annotation was retrieved and stored in TreePipeDB.
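The merge rule described above can be sketched with a union-find structure. This is an illustration in Python under assumptions, not the Perl/TreePipeDB code: the dictionary layout of a result set, the inclusive coordinate convention, and all function names are hypothetical. The required overlap is taken from the more stringent of the two filter steps, as in the A/B example in the text.

```python
# Minimum alignment length (amino acids) demanded by each filter step.
FILTER_MIN_LENGTH = {1: 50, 2: 60, 3: 80, 4: 100, 5: 150, 6: 300}


def overlap_length(region_a, region_b):
    """Length of the overlap of two inclusive (start, end) regions on one hit."""
    return max(0, min(region_a[1], region_b[1]) - max(region_a[0], region_b[0]) + 1)


def can_merge(set_a, set_b):
    """True if both result sets share a hit with a sufficiently long common region."""
    required = FILTER_MIN_LENGTH[max(set_a["step"], set_b["step"])]
    shared_hits = set_a["hits"].keys() & set_b["hits"].keys()
    return any(
        overlap_length(set_a["hits"][h], set_b["hits"][h]) >= required
        for h in shared_hits
    )


def single_linkage(result_sets):
    """Group query result sets into clusters; unmerged sets stay as singletons."""
    parent = {query: query for query in result_sets}

    def find(query):
        while parent[query] != query:
            parent[query] = parent[parent[query]]  # path compression
            query = parent[query]
        return query

    queries = list(result_sets)
    for i, qa in enumerate(queries):
        for qb in queries[i + 1:]:
            if can_merge(result_sets[qa], result_sets[qb]):
                parent[find(qa)] = find(qb)

    clusters = {}
    for query in queries:
        clusters.setdefault(find(query), set()).add(query)
    return sorted(sorted(members) for members in clusters.values())


example = {
    "A": {"step": 6, "hits": {"X": (1, 400)}},
    "B": {"step": 5, "hits": {"X": (50, 380)}},
    "C": {"step": 5, "hits": {"X": (300, 460)}},
    "D": {"step": 1, "hits": {"Y": (1, 60)}},
}
print(single_linkage(example))
```

In this toy input, A and B overlap by 331 residues on hit X and are merged, whereas C overlaps neither A nor B far enough and D shares no hit, so both remain single-query clusters.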
For the removal of redundant sequences, a MSA was performed using MAFFT FFT-NS-2 and pairwise distances were calculated using the EMBOSS distmat program. This alignment was used to infer initial phylogenies of the complete clusters. The resulting matrix was scanned for sequence pairs from the same species with a distance
Multiple alignments for a given cluster were performed using MAFFT G-INSI and ProbCons (clusters
Phylogenies for the representative cluster members were inferred using a Perl program on all clusters. After generation of 100 bootstrapped alignments using seqboot from the PHYLIP package, ML distance matrices were computed for these alignments using puzzleboot as implemented in Tree-Puzzle. These distance matrices were then used to infer topologies by applying the NJ algorithm as implemented in PHYLIP's neighbor program. Afterward, the resulting 100 trees were used to create a ML consensus topology using Tree-Puzzle. For the two steps where Tree-Puzzle was used to compute maximum likelihoods, eight gamma-distributed rates were used to model mutation rate heterogeneity and full (exact) ML parameter estimation was performed for each gene family. Manual ML trees were created using the same parameter settings. The WAG (Whelan and Goldman, 2001
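The bootstrapping step that precedes distance calculation can be illustrated as follows. This is a minimal sketch of the idea behind resampling alignment columns with replacement; it is not PHYLIP's seqboot, and the alignment representation (a dictionary of equal-length strings) and the function names are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_columns = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {
        name: "".join(sequence[c] for c in columns)
        for name, sequence in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate a list of bootstrapped alignments (100 by default, as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


toy_alignment = {"seq1": "MKV-LA", "seq2": "MRVQLA", "seq3": "MKI-LG"}
replicates = bootstrap_replicates(toy_alignment)
print(len(replicates), replicates[0])
```

Each replicate would then be turned into a distance matrix and an NJ topology; those steps depend on the external programs named above and are not reproduced here.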
The nonredundant cluster member sequences were annotated using InterProScan 4.2 with all available databases of the InterPro Release 12.1. The annotated domains and associated GO terms were stored in the TreePipeDB. InterProScan searches (Quevillon et al., 2005
The PlanTAPDB family sizes in six genera, Arabidopsis, rice, P. patens, C. reinhardtii, C. merolae, and T. pseudonana, were inferred using the NCBI taxonomy information of the nonredundant list of family members. These values were normalized using the total amount of members per group (TF, TR, or PT) in order to account for the general differences in TAP family sizes. If the fraction of family members in a given species deviated from the arithmetic average of the group with a z score of
For visualization of the taxonomic composition of the TAP families (taxonomic profile), all taxa were allocated into 20 nonredundant taxonomic groups that were chosen because they contributed significantly to the distribution of NCBI taxonomy strings. After normalization for taxonomic group size (columnwise log ratio per average), the rows were used for average-linkage clustering with a centered Pearson-correlation distance and heat map visualization using Cluster 3.0 and JavaTreeview 1.0.12 (Eisen et al., 1998
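The normalization and distance measure named above can be sketched in a few lines. This is one possible reading, written in plain Python rather than Cluster 3.0: rows are TAP families, columns are taxonomic groups, each count is divided by its column average before taking the logarithm, and the distance between two rows is one minus their centered Pearson correlation. The base-2 logarithm and the pseudocount are assumptions that are not specified in the text.

```python
import math


def columnwise_log_ratio(matrix, pseudocount=1.0):
    """Log2 ratio of each (pseudocount-shifted) value to its column average."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    column_means = [
        sum(row[j] + pseudocount for row in matrix) / n_rows for j in range(n_cols)
    ]
    return [
        [math.log2((row[j] + pseudocount) / column_means[j]) for j in range(n_cols)]
        for row in matrix
    ]


def pearson_distance(x, y):
    """1 - r, where r is the centered Pearson correlation of two rows."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant row")
    return 1.0 - covariance / (norm_x * norm_y)


# Toy counts: three families (rows) across three taxonomic groups (columns).
counts = [[10, 0, 2], [8, 1, 1], [0, 6, 7]]
profile = columnwise_log_ratio(counts)
print(round(pearson_distance(profile[0], profile[1]), 3))
```

Rows with similar taxonomic distributions obtain a distance near 0 and end up adjacent after average-linkage clustering, while anticorrelated rows approach a distance of 2.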
Hypothesized differences in the size distribution of TAP gene families between organisms (Fig. 3B) were tested using two-sided t tests assuming unequal variances. Fisher's exact test was used to test for hypothesized differences between total number of genes of the six organisms (Fig. 2A). The resulting P values were adjusted for multiple testing by calculating the false discovery rate (Benjamini and Hochberg, 1995
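The multiple-testing adjustment can be illustrated with a short pure-Python version of the Benjamini-Hochberg step-up procedure. It is a sketch for clarity and not the code used in the study; the function name and the example P values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value and enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in benjamini_hochberg(raw)])
```

Each raw P value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.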
The following materials are available in the online version of this article.
We thank T. Kretsch, T. Laux, and M. Woriedh for helpful discussions, A.K. Prowse for critically reading the manuscript, and several anonymous reviewers for helpful comments. Received January 10, 2007; accepted February 19, 2007; published March 2, 2007.
1 This work was supported by the German Research Foundation (grant nos. Re 837/73 and Re 837/101 to R.R.).
2 These authors contributed equally to the paper. The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Stefan A. Rensing (stefan.rensing{at}biologie.uni-freiburg.de).
[C] Some figures in this article are displayed in color online but in black and white in the print edition.
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription. www.plantphysiol.org/cgi/doi/10.1104/pp.107.095760 * Corresponding author; e-mail stefan.rensing{at}biologie.uni-freiburg.de; fax 497612036945.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57: 289-300
Benton D (1990) Recent changes in the GenBank On-line Service. Nucleic Acids Res 18: 1517-1520
Bierfreund NM, Tintelnot S, Reski R, Decker EL (2004) Loss of GH3 function does not affect phytochrome-mediated development in a moss, Physcomitrella patens. J Plant Physiol 161: 823-835
Blanc G, Wolfe KH (2004) Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16: 1679-1691
Byzova MV, Franken J, Aarts MG, de Almeida-Engler J, Engler G, Mariani C, Van Lookeren Campagne MM, Angenent GC (1999) Arabidopsis STERILE APETALA, a multifunctional gene regulating inflorescence, flower, and ovule development. Genes Dev 13: 1002-1014
Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R (2004) The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res (Database issue) 32: D262-D266
Carles CC, Choffnes-Inada D, Reville K, Lertpiriyapong K, Fletcher JC (2005) ULTRAPETALA1 encodes a SAND domain putative transcriptional regulator that controls shoot and floral meristem activity in Arabidopsis. Development 132: 897-911
Carroll SB (2005) Evolution at two levels: on genes and form. PLoS Biol 3: 1159-1166
Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17: 540-552
Clamp M, Cuff J, Searle SM, Barton GJ (2004) The Jalview Java alignment editor. Bioinformatics 20: 426-427
Coulson RM, Enright AJ, Ouzounis CA (2001) Transcription-associated protein families are primarily taxon-specific. Bioinformatics 17: 95-97
Coulson RMR, Ouzounis CA (2003) The phylogenetic diversity of eukaryotic transcription. Nucleic Acids Res 31: 653-660
Do CB, Mahabhashyam MS, Brudno M, Batzoglou S (2005) ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 15: 330-340
Doebley J, Lukens L (1998) Transcriptional regulators and the evolution of plant form. Plant Cell 10: 1075-1082
Edgar RC (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5: 113
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863-14868
Felsenstein J (1989) PHYLIP: Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166
Fiala KI, Sokal RR (1985) Factors determining the accuracy of cladogram estimation: evaluation using computer-simulation. Evolution Int J Org Evolution 39: 609-622
Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, et al (2006) Pfam: clans, web tools and services. Nucleic Acids Res (Database issue) 34: D247-D251
Frickey T, Lupas AN (2004) PhyloGenie: automated phylome generation and analysis. Nucleic Acids Res 32: 5231-5238
Fuellen G, Spitzer M, Cullen P, Lorkowski S (2003) BLASTing proteomes, yielding phylogenies. In Silico Biol 3: 313-319
Gao G, Zhong Y, Guo A, Zhu Q, Tang W, Zheng W, Gu X, Wei L, Luo J (2006) DRTF: a database of rice transcription factors. Bioinformatics 22: 1286-1287
Geer RC, Sayers EW (2003) Entrez: making use of its power. Brief Bioinform 4: 179-184
Gouret P, Vitiello V, Balandraud N, Gilles A, Pontarotti P, Danchin EG (2005) FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics 6: 198
Gueven N, Becherel OJ, Kijas AW, Chen P, Howe O, Rudolph JH, Gatti R, Date H, Onodera O, Taucher-Scholz G, et al (2004) Aprataxin, a novel protein that protects against genotoxic stress. Hum Mol Genet 13: 1081-1093
Guo A, He K, Liu D, Bai S, Gu X, Wei L, Luo J (2005) DATF: a database of Arabidopsis transcription factors. Bioinformatics 21: 2568-2569
Gutierrez RA, Green PJ, Keegstra K, Ohlrogge JB (2004) Phylogenetic profiling of the Arabidopsis thaliana proteome: What proteins distinguish plants from other organisms? Genome Biol 5: R53
Haecker A, Gross-Hardt R, Geiges B, Sarkar A, Breuninger H, Herrmann M, Laux T (2004) Expression dynamics of WOX genes mark cell fate decisions during early embryonic patterning in Arabidopsis thaliana. Development 131: 657-668
Hannaert V, Saavedra E, Duffieux F, Szikora JP, Rigden DJ, Michels PA, Opperdoes FR (2003) Plant-like traits associated with metabolism of Trypanosoma parasites. Proc Natl Acad Sci USA 100: 1067-1071
Hedges SB, Blair JE, Venturi ML, Shoe JL (2004) A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol 4: 2
Holm M, Deng XW (1999) Structural organization and interactions of COP1, a light-regulated developmental switch. Plant Mol Biol 41: 151-158
Hsia CC, McGinnis W (2003) Evolution of transcription factor function. Curr Opin Genet Dev 13: 199-206
Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ (2006) The PROSITE database. Nucleic Acids Res (Database issue) 34: D227-D230
Ichikawa K, Sugita M, Imaizumi T, Wada M, Aoki S (2004) Differential expression on a daily basis of plastid sigma factor genes from the moss Physcomitrella patens. Regulatory interactions among PpSig5, the circadian clock, and blue light signaling mediated by cryptochromes. Plant Physiol 136: 4285-4298
Iseli C, Jongeneel CV, Bucher P (1999) ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. In Proceedings of International Conference on Intelligent Systems for Molecular Biology. American Association for Artificial Intelligence, Menlo Park, CA, pp 138-148
Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, Antonio BA, Aono H, Apweiler R, Bruskiewich R, et al (2007) Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res 17: 175-183
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8: 275-282
Kamisugi Y, Cuming AC, Cove DJ (2005) Parameters determining the efficiency of gene targeting in the moss Physcomitrella patens. Nucleic Acids Res 33: e173
Kasahara M, Kagawa T, Sato Y, Kiyosue T, Wada M (2004) Phototropins mediate blue and red light-induced chloroplast movements in Physcomitrella patens. Plant Physiol 135: 1388-1397
Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33: 511-518
Kenrick P, Crane PR (1997) The origin and early evolution of plants on land. Nature 389: 33-39
Kieffer M, Stern Y, Cook H, Clerici E, Maulbetsch C, Laux T, Davies B (2006) Analysis of the transcription factor WUSCHEL and its functional homologue in Antirrhinum reveals a potential mechanism for their roles in meristem maintenance. Plant Cell 18: 560-573
Kijas AW, Harris JL, Harris JM, Lavin MF (2006) Aprataxin forms a discrete branch in the HIT (histidine triad) superfamily of proteins with both DNA/RNA binding and nucleotide hydrolase activities. J Biol Chem 281: 13939-13948
Kyrpides NC, Ouzounis CA (1999) Transcription in archaea.
Proc Natl Acad Sci USA 96: 85458550 Lang D, Eisinger J, Reski R, Rensing SA (2005) Representation and high-quality annotation of the Physcomitrella patens transcriptome demonstrates a high proportion of proteins involved in metabolism in mosses. Plant Biol 7: 238250[CrossRef][Medline] Larroux C, Fahey B, Liubicich D, Hinman VF, Gauthier M, Gongora M, Green K, Worheide G, Leys SP, Degnan BM (2006) Developmental expression of transcription factor genes in a demosponge: insights into the origin of metazoan multicellularity. Evol Dev 8: 150173[CrossRef][Web of Science][Medline] Laubinger S, Marchal V, Gentilhomme J, Wenkel S, Adrian J, Jang S, Kulajta C, Braun H, Coupland G, Hoecker U (2006) Arabidopsis SPA proteins regulate photoperiodic flowering and interact with the floral inducer CONSTANS to regulate its stability. Development 133: 32133222 Leibfried A, To JP, Busch W, Stehling S, Kehle A, Demar M, Kieber JJ, Lohmann JU (2005) WUSCHEL controls meristem function by direct regulation of cytokinin-inducible response regulators. Nature 438: 11721175[CrossRef][Medline] Lespinet O, Wolf YI, Koonin EV, Aravind L (2002) The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 12: 10481059 Levine M, Tjian R (2003) Transcription regulation and animal diversity. Nature 424: 147151[CrossRef][Medline] Madan Babu M, Teichmann SA, Aravind L (2006) Evolutionary dynamics of prokaryotic transcriptional regulatory networks. J Mol Biol 358: 614633[CrossRef][Web of Science][Medline] Maizel A, Busch MA, Tanahashi T, Perkovic J, Kato M, Hasebe M, Weigel D (2005) The floral regulator LEAFY evolves by substitutions in the DNA binding domain. Science 308: 260263 Mayer KF, Schoof H, Haecker A, Lenhard M, Jurgens G, Laux T (1998) Role of WUSCHEL in regulating stem cell fate in the Arabidopsis shoot meristem. 
Cell 95: 805815[CrossRef][Web of Science][Medline] Messina DN, Glasscock J, Gish W, Lovett M (2004) An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res 14: 20412047 Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S, Guo N, Muruganujan A, Doremieux O, Campbell MJ, et al (2005) The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res (Database issue) 33: D284D288 Mittmann F, Brucker G, Zeidler M, Repp A, Abts T, Hartmann E, Hughes J (2004) Targeted knockout in Physcomitrella reveals direct actions of phytochrome in the cytoplasm. Proc Natl Acad Sci USA 101: 1393913944 Newsham KK (2003) UV-B radiation arising from stratospheric ozone depletion influences the pigmentation of the Antarctic moss Andreaea regularis. Oecologia 135: 327331[Web of Science][Medline] Oravecz A, Baumann A, Mate Z, Brzezinska A, Molinier J, Oakeley EJ, Adam E, Schafer E, Nagy F, Ulm R (2006) CONSTITUTIVELY PHOTOMORPHOGENIC1 is required for the UV-B response in Arabidopsis. Plant Cell 18: 19751990 Perez-Rueda E, Collado-Vides J, Segovia L (2004) Phylogenetic distribution of DNA-binding transcription factors in bacteria and archaea. Comput Biol Chem 28: 341350[CrossRef][Web of Science][Medline] Qu LJ, Zhu YX (2006) Transcription factor families in Arabidopsis: major progress and outstanding issues for future research. Curr Opin Plant Biol 9: 544549[CrossRef][Web of Science][Medline] Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R (2005) InterProScan: protein domains identifier. Nucleic Acids Res 33: W116W120 Reece-Hoyes JS, Deplancke B, Shingles J, Grove CA, Hope IA, Walhout AJ (2005) A compendium of Caenorhabditis elegans regulatory transcription factors: a resource for mapping transcription regulatory networks. 
Genome Biol 6: R110[CrossRef][Medline] Rensing SA, Rombauts S, Van de Peer Y, Reski R (2002) Moss transcriptome and beyond. Trends Plant Sci 7: 535538[CrossRef][Web of Science][Medline] Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, et al (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31: 224228 Rice P, Longden I, Bleasby A (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16: 276277[CrossRef][Web of Science][Medline] Riechmann JL, Heard J, Martin G, Reuber L, Jiang CZ, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR, et al (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290: 21052110 Riese M, Faigl W, Quodt V, Verelst W, Matthes A, Saedler H, Munster T (2005) Isolation and characterization of new MIKC*-Type MADS-box genes from the moss Physcomitrella patens. Plant Biol (Stuttg) 7: 307314[CrossRef][Medline] Rosenberg MS (2005) Evolutionary distance estimation and fidelity of pair wise sequence alignment. BMC Bioinformatics 6: 102[CrossRef][Medline] Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12: 8594 Sabelli PA, Larkins BA (2006) Grasses like mammals? Redundancy and compensatory regulation within the retinoblastoma protein family. Cell Cycle 5: 352355[Web of Science][Medline] Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406425[Abstract] Satou Y, Satoh N (2005) Cataloging transcription factor and major signaling molecule genes for functional genomic studies in Ciona intestinalis. 
Dev Genes Evol 215: 580596[CrossRef][Web of Science][Medline] Schaffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF (2001) Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res 29: 29943005 Schiefthaler U, Balasubramanian S, Sieber P, Chevalier D, Wisman E, Schneitz K (1999) Molecular analysis of NOZZLE, a gene involved in pattern formation and early sporogenesis during sex organ development in Arabidopsis thaliana. Proc Natl Acad Sci USA 96: 1166411669 Schiex T, Gouzy J, Moisan A, de Oliveira Y (2003) FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences. Nucleic Acids Res 31: 37383741 Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18: 502504 Seoighe C, Gehring C (2004) Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet 20: 461464[CrossRef][Web of Science][Medline] Shigyo M, Hasebe M, Ito M (2006) Molecular evolution of the AP2 subfamily. Gene 366: 256265[CrossRef][Web of Science][Medline] Shiu SH, Shih MC, Li WH (2005) Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol 139: 1826 Sicheritz-Ponten T, Andersson SG (2001) A phylogenomic approach to microbial evolution. Nucleic Acids Res 29: 545552 Siegfried KR, Eshed Y, Baum SF, Otsuga D, Drews GN, Bowman JL (1999) Members of the YABBY gene family specify abaxial cell fate in Arabidopsis. Development 126: 41174128[Abstract] Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al (2002) The Bioperl toolkit: Perl modules for the life sciences. Genome Res 12: 16111618 Tautz D (2000) Evolution of transcriptional regulation. 
Curr Opin Genet Dev 10: 575579[CrossRef][Web of Science][Medline] Theissen G, Becker A, Di Rosa A, Kanno A, Kim JT, Munster T, Winter KU, Saedler H (2000) A short history of MADS-box genes in plants. Plant Mol Biol 42: 115149[CrossRef][Web of Science][Medline] Theissen G, Münster T, Henschel K (2001) Why don't mosses flower? New Phytol 150: 18[CrossRef][Web of Science] Trouiller B, Schaefer DG, Charlot F, Nogue F (2006) MSH2 is essential for the preservation of genome integrity and prevents homeologous recombination in the moss Physcomitrella patens. Nucleic Acids Res 34: 232242 Uenaka H, Wada M, Kadota A (2005) Four distinct photoreceptors contribute to light-induced side branch formation in the moss Physcomitrella patens. Planta 222: 623631[CrossRef][Web of Science][Medline] Valdar WS (2002) Scoring residue conservation. Proteins 48: 227241[CrossRef][Web of Science][Medline] Waller RF, McFadden GI (2005) The apicoplast: a review of the derived plastid of apicomplexan parasites. Curr Issues Mol Biol 7: 5779[Web of Science][Medline] Weigel D, Ahn JH, Blazquez MA, Borevitz JO, Christensen SK, Fankhauser C, Ferrandiz C, Kardailsky I, Malancharuvil EJ, Neff MM, et al (2000) Activation tagging in Arabidopsis. Plant Physiol 122: 10031013 Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18: 691699 Xiong Y, Liu T, Tian C, Sun S, Li J, Chen M (2005) Transcription factors in rice: a genome-wide comparative analysis between monocots and eudicots. Plant Mol Biol 59: 191203[CrossRef][Web of Science][Medline] Yi C, Deng XW (2005) COP1: from plant photomorphogenesis to mammalian tumorigenesis. Trends Cell Biol 15: 618625[CrossRef][Web of Science][Medline] Zhou DX, Bisanz-Seyer C, Mache R (1995) Molecular cloning of a small DNA binding protein with specificity for a tissue-specific negative element within the rps1 promoter. 
Nucleic Acids Res 23: 11651169 Zmasek CM, Eddy SR (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics 17: 383384 This article has been cited by other articles: