Plant Physiology 135:773-782 (2004)
© 2004 American Society of Plant Biologists
Genome-Wide ORFeome Cloning and Analysis of Arabidopsis Transcription Factor Genes1,[w]
Peking-Yale Joint Center for Plant Molecular Genetics and Agrobiotechnology, College of Life Sciences, and the National Laboratory of Protein Engineering and Plant Genetic Engineering, Peking University, Beijing 100871, China (W.G., Y.-P.S., L.-G.M., Y.P., Y.-L.D., D.-H.W., C.-X.D., Y.-H.C., X.-Y.Y., Y.G., D.Z., Y.L., H.-Y.G., L.-J.Q., S.-N.B., J.-D.Z., Y.-X.Z.); State Key Laboratory of Genetic Engineering, Department of Biochemistry, School of Life Sciences, Fudan University, Shanghai 200433, China (J.-Y.Y., L.-D.H., X.-P.W.); Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China (X.-F.L., X.-L.T., J.-Y.M., J.-Y.L., J.Z.); Key Lab. of MOE for Plant Developmental Biology, College of Life Sciences, Wuhan University, Wuhan 430072, China (L.M., Y.-T.L.); School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China (D.-B.Z.); Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 05620 USA (Y.-L.L., L.-G.M., S.P.D.-K., X.W.D.); and Shanghai Institute of Plant Physiology and Ecology, Shanghai 200032, China (H.H.)
Here, we report our effort to generate an ORFeome collection for the Arabidopsis transcription factor (TF) genes. In total, ORFeome clones representing 1,282 Arabidopsis TF genes have been obtained in the Gateway high-throughput cloning pENTR vector, including 411 genes whose annotations lack cDNA support. All the ORFeome inserts have also been mobilized into a yeast expression destination vector, with an estimated 85% of clones expressing the respective proteins. Sequence analysis of these clones revealed that 34 of them matched neither the reported cDNAs nor the current predicted open-reading-frame sequences. Among those, novel alternative splicing of TF gene transcripts is responsible for the observed differences in at least five genes. However, those alternative splicing events do not appear to be differentially regulated among the distinct Arabidopsis tissues examined. Lastly, expression of those TF genes in 17 distinct Arabidopsis organ types and in cultured cells was profiled using a 70-mer oligo microarray.
Transcription factors (TFs) play critical roles in all aspects of a higher plant's life cycle. It is the programmed and regulated interactions between TFs and genomic DNA that bring a genome to life and define many of its functional features (Grandori et al., 2000
Although extensive studies have been carried out for functional analysis of individual TFs, the function of only a small fraction of these TFs has been revealed so far (Riechmann, 2002
Further functional analysis of TF genes requires a careful examination of the encoded proteins and their interactions in the cell. A prerequisite for genome-wide analysis of TF genes at the protein level is a collection of cDNA clones with intact open-reading-frames (ORFs). Unfortunately, the initial identification of TF genes in the Arabidopsis genome sequence was carried out mainly by ab initio gene predictions, sequence homology comparisons, motif analysis, and other nonexperimental methods (The Arabidopsis Genome Initiative, 2000
Here we report our genome-wide effort to generate Arabidopsis TF ORFeome clones, which succeeded in covering 1,282 unique Arabidopsis TF genes. In the process, sequence analysis of our ORFeome clones allowed us to correct a number of errors in the annotation of these genes. Further, comprehensive expression profiles of those TF genes across the Arabidopsis life cycle were generated. This ORFeome clone collection has been deposited in the Arabidopsis stock center and is available to the research community for in-depth functional analysis of Arabidopsis TF genes.
By searching the MIPS database using previously described InterPro and GenBank accessions as family identifiers (Riechmann et al., 2000
As expected, most of the ORFeome clones matched the existing cDNA sequences or gene annotation, based on single-read sequencing from both ends of each clone. Our sequence analysis also revealed differences in 39 clones. Among those, 34 ORFeome clones were completely sequenced, and their differences from either reported cDNA sequences or annotation in public databases were confirmed (Tables II and III). Table II summarizes the 15 TF genes that have previously deposited cDNA sequences but nevertheless show clear sequence discrepancies between our ORFeome clones and the cDNA. Table III summarizes the 19 Arabidopsis TF genes that had predicted annotation without experimental support. In these cases, we were able to correct inaccuracies in the gene annotation using our ORFeome clones. All 34 ORFeome clone sequences were submitted to GenBank, and their corresponding accession numbers are listed in Tables II and III.
As a way to confirm the intactness of the ORFeome clones and to test the feasibility of high-throughput expression of TF proteins, all the ORFeome clone inserts in our collection were transferred into a yeast expression vector (pYTV; see Fig. 1) for expression analysis in yeast. About 300 representative ORFeome clones in the pYTV vector were selected from different TF gene families, and their expression in yeast was examined via protein-blotting analyses. Using antibodies against a His-tag fused to the ORFeome inserts in the pYTV vector, our results indicated that up to 85% of ORFeome clones expressed TF proteins above the detection limit. Examination of protein size by SDS-PAGE indicated that about 90% of the expressed proteins were of the expected Mr (data not shown). For example, as illustrated in Figure 2, of the yeast protein extracts from 12 distinct TF ORFeome clones, 10 produced strong protein blot signals that matched the calculated protein sizes, while 1 protein migrated significantly more slowly than predicted (Fig. 2). Protein extract from 1 clone failed to produce a detectable amount of protein (Fig. 2). These results largely confirmed the intactness of our ORFeome clones. The small number of clones (<10%) that failed to produce proteins of the expected size (with most migrating more slowly) suggests that the proteins encoded by these ORFeome clones may have unusual conformations or promiscuous interactions with other proteins in yeast, such that they migrated at a larger apparent size than expected. For the clones whose proteins could not be detected, the proteins might be unstable or expressed at very low levels in yeast.
An Arabidopsis 70-mer oligo microarray covering more than 25,000 Arabidopsis genes was used for organ-specific expression analysis (L.G. Ma, N. Sun, X.G. Liu, Y.L. Jiao, H.Y. Zhao, and X.W. Deng, unpublished data). On this array, 1,222 of the 1,282 ORFeome genes were present. The profiling data for those 1,222 TF genes from the 17 representative Arabidopsis organs and suspension-cultured cells were extracted from the above-mentioned data set and allowed us to estimate the relative expression abundance of each transcript in different organs. As the detailed characterization of the AP2/EREBP and MYB families of TF genes will be reported separately, the expression patterns for the remaining 858 cloned TF genes are summarized in this report. The expression patterns of the MADS family of TF genes among the 17 organs and cultured cells are illustrated in Figure 3, while the expression patterns for all 858 TF genes are shown in an appendix figure available at www.plantphysiol.org. One notable feature of the TF gene expression profile is that the vast majority of the TF genes exhibited organ-specific expression patterns. There are, however, some notable exceptions. For example, four genes (At5g65670, At2g22430, At2g18160, and At1g30970) were found to be expressed at high levels in all organs and cultured cells tested in the current work. The relative expression levels of the 858 TF genes follow a similar distribution in most organ types (Fig. 4), with a general pattern not much different from the total gene expression level distribution in each organ type (data not shown).
As described in a previous analysis (L.G. Ma, N. Sun, X.G. Liu, Y.L. Jiao, H.Y. Zhao, and X.W. Deng, unpublished data), close comparison between the known expression patterns of several well-characterized TF genes and the microarray results is a valuable means of validating our microarray data. The genes examined in that work (L.G. Ma, N. Sun, X.G. Liu, Y.L. Jiao, H.Y. Zhao, and X.W. Deng, unpublished data) included the PISTILLATA (Goto and Meyerowitz, 1994
We also examined the number (and percentage) of the 858 TF genes whose expression can be detected experimentally in each organ type and in any of the organs examined. This analysis revealed that the expression of 831 (97%) of the 858 genes can be detected in at least one of the 17 organs or cultured cells examined. This result confirms that the vast majority of known and predicted TFs are expressed during Arabidopsis development, while the percentage of TF genes expressed in each organ type varied from 37.4% (silique 8 d post-pollination) to 75.8% (petal; see Fig. 6A). We also calculated the number of genes exhibiting their highest relative expression level in each organ type. As shown in Figure 6B, the percentage of TF genes with highest expression varies from organ to organ. Vegetative organs have large numbers of TF genes at their highest expression level. This may be consistent with the fact that vegetative organs are where most metabolic activities reside. About 20% of TF genes exhibit their highest expression levels in flower organs, which may hint at special roles in flower development and reproduction. The germinating seed has the largest number of TF genes with highest expression among all organs examined here. It is interesting to note that the germinating seed has a very low percentage of the total TF genes with detectable expression (Fig. 6A). This result indicates that during seed germination, a relatively large fraction of the highly expressed genes are TFs. This is consistent with the idea that those early expressed genes initiate the developmental and metabolic processes that follow.
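The detection statistics in this paragraph (genes detected in at least one organ, and the percentage detected per organ) can be derived from a gene-by-organ table of detection calls. The following is only a minimal sketch: the function name, the toy data, and the boolean-call representation are illustrative assumptions, and the criterion used to call a gene "detected" is not specified here.

```python
from typing import Dict, List, Tuple


def detection_summary(
    detected: Dict[str, List[bool]], organs: List[str]
) -> Tuple[int, float, Dict[str, float]]:
    """Summarize detection calls.

    detected maps a gene ID to one boolean per organ (hypothetical layout).
    Returns (genes detected in >= 1 organ, that count as a percentage,
    percentage of genes detected in each organ).
    """
    n_genes = len(detected)
    n_any = sum(1 for calls in detected.values() if any(calls))
    pct_by_organ = {
        organ: 100.0 * sum(calls[i] for calls in detected.values()) / n_genes
        for i, organ in enumerate(organs)
    }
    return n_any, 100.0 * n_any / n_genes, pct_by_organ


# Toy example with made-up gene names and calls (not data from the paper).
organs = ["petal", "root"]
toy_calls = {
    "geneA": [True, False],
    "geneB": [True, True],
    "geneC": [False, False],
    "geneD": [True, False],
}
n_any, pct_any, pct_by_organ = detection_summary(toy_calls, organs)
print(n_any, pct_any, pct_by_organ)

# The reported figure: 831 of 858 genes detected in at least one sample.
reported_pct = 100.0 * 831 / 858
print(round(reported_pct))
```

The last two lines simply check the arithmetic behind the 97% figure quoted in the text.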
For the 34 TF genes where our ORFeome sequences differ from the prior cDNA sequence or predicted gene annotation (Tables II and III), we examined whether alternative splicing contributes to the observed differences. Indeed, we were able to confirm by RT-PCR (Fig. 7) that alternative splicing variants were present for five TF genes (At1g26260, At1g72050, At3g46590, At4g26640, and At4g29930) and were responsible for the observed differences in their ORFeome clone sequences. Different mRNA forms of At1g72050 and At4g26640 appear to be generated by using extra or different exons located at the very 5' end, while At4g29930 and At3g46590 include alternative exons at the 3' end of the mature RNAs. In At1g26260, one mRNA form simply retains the first intron, which is spliced out in the other form of the transcript. In the case of At1g72050, the previously reported mature transcript contains 5 exons with a 975 bp ORF (GenBank accession nos. AY054225 and AY066042), and its encoded protein possesses one C2H2-type Zinc-finger domain (PSSIM-Id: 20248) plus a partial domain (PSSIM-Id: 21389) that is incomplete at both ends. By using two extra exons at the 5' end of the transcript, the new ORF predicted from our cloned mRNA species is 1,239 bp in length and has two C2H2-type Zinc-finger domains, a scenario resembling a previously reported alternative splicing event for the rice Myb7 gene (Magaraggia et al., 1997
In all five cases, the typical GT-AG dinucleotide splice junctions were observed in the alternatively spliced transcripts. To test whether alternative splicing of these genes is developmentally regulated, we designed specific PCR primer pairs for the two alternatively spliced transcripts of each of those five genes and used semiquantitative RT-PCR to examine the presence and abundance of those alternative mRNAs in selected Arabidopsis tissue samples (Fig. 7). The alternatively spliced transcripts for each of those five genes were present at similar abundance in all tissue types tested (data not shown), suggesting that alternative splicing of these genes is constitutive and is not regulated by the developmental or environmental conditions tested.
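Checking for the GT-AG junctions mentioned above amounts to confirming that each inferred intron begins with GT and ends with AG on the coding strand. The sketch below illustrates that check; the function name, the 0-based half-open coordinate convention, and the toy sequences are assumptions for illustration and are not taken from the paper.

```python
def has_canonical_splice_sites(genomic: str, intron_start: int, intron_end: int) -> bool:
    """Return True if the intron genomic[intron_start:intron_end] starts with
    GT and ends with AG (coding-strand sequence, 0-based half-open coordinates)."""
    intron = genomic[intron_start:intron_end].upper()
    if len(intron) < 4:
        # Too short to carry both a donor and an acceptor dinucleotide.
        return False
    return intron.startswith("GT") and intron.endswith("AG")


# Toy sequences (made up): exon + intron + exon.
canonical = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
non_canonical = "ATGGCC" + "GCAAGTTTTCAG" + "GGATAA"

print(has_canonical_splice_sites(canonical, 6, 18))
print(has_canonical_splice_sites(non_canonical, 6, 18))
```

In practice the intron coordinates would come from aligning the cloned cDNA sequence to the genomic sequence; that alignment step is outside the scope of this sketch.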
Plant Materials
Arabidopsis (ecotype Columbia) plants were grown in fully automated growth chambers (Conviron, Canada) under 16 h of light in each 24-h period. Plants were maintained at 23°C during the light period and 21°C during the dark period. To provide additional RNA samples covering TF genes that may not be expressed under normal growth conditions, Arabidopsis plants at the 6 to 8 rosette leaf stage were subjected to the following 8 treatments and then used for total RNA isolation: (1) NaCl treatment: whole pots were submerged in 300 mM NaCl for 8 h; (2) heat shock (heat): plants were preconditioned at 37°C for 2 h before being transferred to 45°C for another 2 h; (3) UV treatment: plants were irradiated with UV light (100 J m⁻²) for 6 h; (4) water depletion treatment: entire plants were uprooted, placed on filter papers, and allowed to dry for 6 h; (5) ethylene treatment: plants were placed in a closed jar containing 100 ppm C2H4 for 24 h; (6) cold treatment: plants were placed in a 4°C cold room for 8 h; (7) wound treatment: rosette leaves were cut into approximately 5 mm strips and left in the growth chamber for 8 h before being used for RNA isolation; (8) dark adaptation: plants were placed in darkness for 48 h before being harvested for RNA isolation.
All known and predicted TF genes were selected from the MIPS Arabidopsis genome database (http://www.mips.biochem.mpg.de/proj/thal/db/index.html) as of December 21, 2000. Each gene was identified by its chromosome locus ID (e.g. At5g61270). The MIPS Arabidopsis database August 17, 2003 update was used as our final reference for gene annotation.
Total RNA was isolated from pooled Arabidopsis plant samples, harvested at the 6 to 8 rosette leaf stage before bolting and from plants a week after flowering, using the RNeasy plant mini kit (QIAGEN, Germany) and was quantified at 260 nm with a spectrophotometer. This RNA sample was used as a generic initial template for RT-PCR. For any TF genes that could not be cloned from those generic RNA samples, RNA samples from specifically treated Arabidopsis plants (see "Plant Materials" section) were used as an alternative template for RT-PCR amplification. Three µg of total RNA was reverse transcribed using the SuperScript First-Strand Synthesis System for RT-PCR (Invitrogen, Carlsbad, CA) in a total volume of 20 µL. Primers for ubiquitin amplification (forward: 5'-GGTGCTAAGAAGAGGAAGAAT-3' and reverse: 5'-CTCCTTCTTTCTGGTAAACGT-3') were added as the internal control together with gene-specific primers. PCR was performed using Pfu polymerase (Sangon, China). For tissue-specific expression analyses, different plant materials harvested at the indicated stages were used. PCR products were purified with a gel extraction kit (CLONTECH Laboratories, Palo Alto, CA), cloned into the pENTR/D/TOPO vector (Invitrogen), and verified by sequencing using M13 primers. Primers for different TF genes were designed using information obtained from the Arabidopsis genome. The forward primer contained the sequence 5'-CACCACAAA-3' at the 5' end. The CACC base paired with the overhang sequence, GTGG, in the pENTR TOPO vector (Fig. 1A). The yeast expression vector, pYTV, was a modified version of pDEST 52 (Invitrogen). The original tag was removed and replaced by 3×FLAG, 6×His, a 3C cleavage site, and 2×IgG binding protein, added as C-terminal tags to facilitate purification of the fusion protein (Fig. 1B).
To clone the gene of interest in frame with the C-terminal tags present in pYTV, the reverse primer was designed so that the stop codon of the target gene was omitted from the final PCR product used for initial cloning into the pENTR TOPO vector (Fig. 1C).
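The primer design rules described above (a 5'-CACCACAAA-3' prefix on the forward primer and a reverse primer that excludes the stop codon) can be sketched as follows. This is an illustration under stated assumptions, not the authors' design procedure: the function names, the fixed annealing length, and the choice to start the gene-specific part of the forward primer at the ATG are all hypothetical, and no melting-temperature or specificity checks are attempted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
FORWARD_PREFIX = "CACCACAAA"  # 5' extension stated in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(orf: str, anneal_len: int = 20) -> tuple:
    """Return (forward, reverse) primers for an ORF given 5'->3' from ATG to stop.

    anneal_len is an assumed gene-specific length; the paper does not state one.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0 or not orf.startswith("ATG"):
        raise ValueError("ORF must start with ATG and be a multiple of 3 long")
    if orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must end with a stop codon")
    body = orf[:-3]  # drop the stop codon so C-terminal tags stay in frame
    if len(body) < anneal_len:
        raise ValueError("ORF shorter than requested annealing length")
    forward = FORWARD_PREFIX + body[:anneal_len]
    reverse = reverse_complement(body[-anneal_len:])
    return forward, reverse


# Toy ORF (made up, 11 codons including the stop codon).
toy_orf = "ATGGCTTCTAAAGGTGAACTTGCTGGTCATTAA"
fwd, rev = design_primers(toy_orf, anneal_len=12)
print(fwd)
print(rev)
```

Because the reverse primer anneals immediately upstream of the stop codon, the amplified product ends with the last sense codon, which is what allows an in-frame fusion to the C-terminal tags.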
Total proteins extracted from 50 to 75 µL of saturated yeast culture expressing the target genes were fractionated by SDS-PAGE. Each blot was probed with 1 µg/mL monoclonal antipolyhistidine antibody (R&D Systems, Minneapolis) and visualized after incubation with goat anti-mouse AP-conjugated secondary antibody (Promega, Madison, WI).
Gene-specific 70-mer oligos were designed by Qiagen based on Arabidopsis genome annotation data available on February 20, 2002 (http://omad.qiagen.com/download/genelist/arabidopsis_V1_384.prn), and the microarray slides were printed at Yale University as described (L.G. Ma, N. Sun, X.G. Liu, Y.L. Jiao, H.Y. Zhao, and X.W. Deng, unpublished data). The signal was scanned at 532 nm (Cy3) and 635 nm (Cy5) with an Axon GenePix 4000B scanner (Axon, Foster City, CA) at 5-µm resolution and quantified with Axon GenePix Pro 3.0 image analysis software. Intensities from different organs were normalized by equalizing the median value of all gene intensities from each organ. The normalized intensity value for each gene was considered its relative expression level (L.G. Ma, N. Sun, X.G. Liu, Y.L. Jiao, H.Y. Zhao, and X.W. Deng, unpublished data).
Sequence data from this article have been deposited with the EMBL/GenBank data libraries under the accession numbers listed in Tables II and III.
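The median-equalization step described for the microarray data can be illustrated with a short sketch. The text states only that the median of all gene intensities was equalized across organs; the common target used below (the mean of the per-organ medians), the function name, and the toy values are assumptions made for illustration.

```python
from statistics import mean, median
from typing import Dict, List


def median_normalize(intensities: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each organ's intensities so that all organs share the same median.

    intensities maps an organ name to its list of gene intensities.
    The shared target median is the mean of the per-organ medians (an assumption).
    """
    medians = {organ: median(values) for organ, values in intensities.items()}
    target = mean(medians.values())
    return {
        organ: [v * target / medians[organ] for v in values]
        for organ, values in intensities.items()
    }


# Toy intensities (made up): petal signals are uniformly half of root signals.
raw = {
    "root": [100.0, 200.0, 400.0],
    "petal": [50.0, 100.0, 200.0],
}
normalized = median_normalize(raw)
print({organ: median(values) for organ, values in normalized.items()})
```

After scaling, a gene's normalized intensity can be compared across organs as a relative expression level, which is how the values are used in the text.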
We thank Dr. Lei Li for commenting on this manuscript.
Received March 5, 2004; returned for revision April 20, 2004; accepted April 20, 2004.
1 This work was supported by a grant from the Chinese National Natural Science Foundation (grant no. 30221120261).
2 These authors contributed equally to the paper.
[w] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.104.042176.
* Corresponding authors; e-mail xingwang.deng@yale.edu or zhuyx@water.pku.edu.cn; fax 203-432-5726.
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
Burgeff C, Liljegren SJ, Tapia-López R, Yanofsky MF, Alvarez-Buylla ER (2002) MADS-box gene expression in lateral primordia, meristems and differentiated tissues of Arabidopsis thaliana roots. Planta 214: 365-372
Chen W, Provart NJ, Glazebrook J, Katagiri F, Chang H-S, Eulgem T, Mauch F, Luan S, Zou G, Whitham SA, et al (2002) Expression profile matrix of Arabidopsis transcription factor genes suggests their putative functions in response to environmental stresses. Plant Cell 14: 559-574
Dimova DK, Stevaux O, Frolov MV, Dyson NJ (2003) Cell cycle-dependent and cell cycle-independent control of transcription by the Drosophila E2F/RB pathway. Genes Dev 17: 2308-2320
Flanagan CA, Hu Y, Ma H (1996) Specific expression of the AGL1 MADS-box gene suggests regulatory functions in Arabidopsis gynoecium and ovule development. Plant J 10: 343-353
Flanagan CA, Ma H (1994) Spatially and temporally regulated expression of the MADS-box gene AGL2 in wild-type and mutant Arabidopsis flowers. Plant Mol Biol 26: 581-595
Goto K, Meyerowitz EM (1994) Function and regulation of the Arabidopsis floral homeotic gene PISTILLATA. Genes Dev 8: 1548-1560
Grandori C, Cowley SM, James LP, Eisenman RN (2000) The Myc/Max/Mad network and the transcriptional control of cell behavior. Annu Rev Cell Dev Biol 16: 653-699
Heim MA, Jakoby M, Werber M, Martin C, Weisshaar B, Bailey PC (2003) The basic helix-loop-helix transcription factor family in plants: a genome-wide study of protein structure and functional diversity. Mol Biol Evol 20: 735-747
Honma T, Goto K (2000) The Arabidopsis floral homeotic gene PISTILLATA is regulated by discrete cis-elements responsive to induction and maintenance signals. Development 127: 2021-2030
Honma T, Goto K (2001) Complexes of MADS-box proteins are sufficient to convert leaves into floral organs. Nature 409: 525-528
Hosoda K, Imamura A, Katoh E, Hatta T, Tachiki M, Yamada H, Mizuno T, Yamazaki T (2002) Molecular structure of the GARP family of plant Myb-related DNA binding motifs of the Arabidopsis response regulators. Plant Cell 14: 2015-2029
Huang H, Tudor M, Weiss CA, Hu Y, Ma H (1995) The Arabidopsis MADS-box gene AGL3 is widely expressed and encodes a sequence-specific DNA-binding protein. Plant Mol Biol 28: 549-567
Jack T, Brockman LL, Meyerowitz EM (1992) The homeotic gene APETALA3 of Arabidopsis thaliana encodes a MADS box and is expressed in petals and stamens. Cell 68: 683-697
Jiao Y, Yang H, Ma L, Sun N, Yu H, Liu T, Gao Y, Gu H, Chen Z, Wada M, et al (2003) A genome-wide analysis of blue-light regulation of Arabidopsis transcription factor gene expression during seedling development. Plant Physiol 133: 1480-1493
Kofuji R, Sumikawa N, Yamasaki M, Kondo K, Ueda K, Ito M, Hasebe M (2003) Evolution and divergence of the MADS-box gene family based on genome-wide expression analysis. Mol Biol Evol 20: 1963-1977
Kohler C, Hennig L, Spillane C, Pien S, Gruissem W, Grossniklaus U (2003) The polycomb-group protein MEDEA regulates seed development by controlling expression of the MADS-box gene PHERES1. Genes Dev 17: 1540-1553
Ma H, Yanofsky MF, Meyerowitz EM (1991) AGL1-AGL6, an Arabidopsis gene family with similarity to floral homeotic and transcription factor genes. Genes Dev 5: 484-495
Ma LG, Zhao HY, Deng XW (2003) Analysis of the mutational effects of the COP/DET/FUS loci on genome expression profiles reveals their overlapping yet not identical roles in regulating Arabidopsis seedling development. Development 130: 969-981
Magaraggia F, Solinas G, Valle G, Giovinazzo G, Coraggio I (1997) Maturation and translation mechanisms involved in the expression of a myb gene of rice. Plant Mol Biol 35: 1003-1008
Mandel MA, Gustafson-Brown C, Savidge B, Yanofsky MF (1992) Molecular characterization of the Arabidopsis floral homeotic gene APETALA1. Nature 360: 273-277
Mandel MA, Yanofsky MF (1995) The Arabidopsis AGL18 MADS box gene is expressed in inflorescence meristems and is negatively regulated by APETALA1. Plant Cell 7: 1763-1771
Michaels SD, Amasino RM (1999) FLOWERING LOCUS C encodes a novel MADS domain protein that acts as a repressor of flowering. Plant Cell 11: 949-956
Ng M, Yanofsky MF (2001) Activation of the Arabidopsis B class homeotic genes by APETALA1. Plant Cell 13: 739-753
Parenicova L, de Folter S, Kieffer M, Horner DS, Favalli C, Busscher J, Cook HE, Ingram RM, Kater MM, Davies B, et al (2003) Molecular and phylogenetic analyses of the complete MADS-box transcription factor family in Arabidopsis: new openings to the MADS world. Plant Cell 15: 1538-1551
Riechmann JL (2002) Transcriptional regulation: a genomic overview. In CR Somerville, EM Meyerowitz, eds, The Arabidopsis Book. American Society of Plant Biologists, Rockville, MD, doi/10.1199/tab.0085, http://www.aspb.org/publications/arabidopsis/
Riechmann JL, Heard J, Martin G, Reuber L, Jiang CZ, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR, et al (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290: 2105-2110
Riechmann JL, Ratcliffe OJ (2000) A genomic perspective on plant transcription factors. Curr Opin Plant Biol 3: 423-434
Rigaut G, Shevchenko A, Rutz B, Wilm M, Mann M, Seraphin B (1999) A generic protein purification method for protein complex characterization and proteome exploration. Nat Biotechnol 17: 1030-1035
Rounsley SD, Ditta GS, Yanofsky MF (1995) Diverse roles for MADS box genes in Arabidopsis development. Plant Cell 7: 1259-1269
Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, Oono Y, et al (2002) Functional annotation of a full-length Arabidopsis cDNA collection. Science 296: 141-147
Shinozaki K, Yamaguchi-Shinozaki K (2000) Molecular responses to dehydration and low temperature: differences and cross-talk between two stress signaling pathways. Curr Opin Plant Biol 3: 217-223
Shuai B, Reynaga CG, Springer PS (2002) The LATERAL ORGAN BOUNDARIES gene defines a novel, plant-specific gene family. Plant Physiol 129: 747-761
Toledo-Ortiz G, Hug E, Quail PH (2003) The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell 15: 1749-1770
Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, et al (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302: 842-846