Tissue-specific expression patterns of Arabidopsis NF-Y transcription factors suggest potential for extensive combinatorial complexity.

All aspects of plant and animal development are controlled by complex networks of transcription factors. Transcription factors are essential for converting signaling inputs, such as changes in daylength, into complex gene regulatory outputs. While some transcription factors control gene expression by binding to cis-regulatory elements as individual subunits, others function in a combinatorial fashion. How individual subunits of combinatorial transcription factors are spatially and temporally deployed (e.g. expression-level, posttranslational modifications and subcellular localization) has profound effects on their control of gene expression. In the model plant Arabidopsis (Arabidopsis thaliana), we have identified 36 Nuclear Factor Y (NF-Y) transcription factor subunits (10 NF-YA, 13 NF-YB, and 13 NF-YC subunits) that can theoretically combine to form 1,690 unique complexes. Individual plant subunits have functions in flowering time, embryo maturation, and meristem development, but how they combine to control these processes is unknown. To assist in the process of defining unique NF-Y complexes, we have created promoter:β-glucuronidase fusion lines for all 36 Arabidopsis genes. Here, we show NF-Y expression patterns inferred from these promoter:β-glucuronidase lines for roots, light- versus dark-grown seedlings, rosettes, and flowers. Additionally, we review the phylogenetic relationships and examine protein alignments for each NF-Y subunit family. The results are discussed with a special emphasis on potential roles for NF-Y subunits in photoperiod-controlled flowering time.

Eukaryotic gene expression is often controlled by combinatorial transcription factors (Singh, 1998; Wolberger, 1998; Remenyi et al., 2004). Combinatorial transcription factors are multiprotein complexes that derive their gene regulatory capacity from both intrinsic properties and the properties of their trans-acting partners (Singh, 1998). Participation in such higher order complexes allows an organism to use single transcription factors to control multiple genes with different temporal and spatial expression patterns.
MADS box transcription factors represent a well-studied example of this phenomenon (Messenguy and Dubois, 2003; Yamaguchi and Hirano, 2006). Through the formation of homodimers, heterodimers, and heteromultimers, MADS box proteins bind unique cis-regulatory elements and control, for example, most of the floral organ fates in plants (Kaufmann et al., 2005). While less well studied in the plant lineage, the numerous heterotrimeric NF-Y (for Nuclear Factor Y) transcription factors might provide similar levels of combinatorial diversity for transcriptional fine-tuning.
NF-Y transcription factors are likely found in all eukaryotes and have roles in the regulation of diverse genes (McNabb et al., 1995; Edwards et al., 1998; Maity and de Crombrugghe, 1998; Mantovani, 1999). In mammals, where their biochemistry is well described, the NF-Y transcription factor complex is composed of three unique subunits: NF-YA, NF-YB, and NF-YC. Assembly of the NF-Y heterotrimer in mammals follows a strict, stepwise pattern (Sinha et al., 1995). Initially, a heterodimer is formed in the cytoplasm between the subunits NF-YB and NF-YC. This dimer then translocates to the nucleus, where the third subunit, NF-YA, is recruited to generate the mature, heterotrimeric NF-Y transcription factor (Frontini et al., 2004; Kahle et al., 2005). Mature NF-Y binds promoters with the core pentamer nucleotide sequence CCAAT, and this can result in either positive or negative transcriptional regulation (Jahroudi, 2002, 2003; Ceribelli et al., 2008). Bioinformatic analyses indicate that 25% to 30% of all mammalian promoters have predicted NF-Y binding sites (Bucher, 1990; Testa et al., 2005), and recent chromatin immunoprecipitation data demonstrate additional widespread NF-Y binding in nonpromoter sites. Suggesting the importance of binding context, NF-Y-regulated gene expression can be tissue specific, developmentally regulated, or constitutive (Maity and de Crombrugghe, 1998). Predictably, NF-Y function is essential for mammalian development (Hu and Maity, 2000; Bhattacharya et al., 2003).
Despite the wide cellular distribution and functional variability of NF-Y-regulated genes, most eukaryotic genomes have only one or two genes encoding each NF-Y subunit (Maity and de Crombrugghe, 1998). For example, humans and mice encode only one copy of each subunit. Thus, there is minimal combinatorial diversity in the subunit composition of the heterotrimeric mammalian NF-Y. In contrast, Arabidopsis (Arabidopsis thaliana) has multiple genes encoding each subunit (10 NF-YA, 13 NF-YB, and 13 NF-YC homologs; this article). This NF-Y expansion is a general feature of the plant lineage, including monocots and dicots. Because the complex is a heterotrimer with one subunit of each type, the 36 Arabidopsis NF-Y subunits can theoretically combine to generate 10 × 13 × 13 = 1,690 unique transcription factors.
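The theoretical figure of 1,690 complexes is simply the product of the three family sizes. The following Python sketch enumerates the combinations under the same assumption the text makes, namely that any NF-YA can pair with any NF-YB and any NF-YC. The subunit names are generated from the numbering scheme of Table I, and the function names are illustrative only.

```python
from itertools import product

# Family sizes reported in the text for Arabidopsis.
FAMILY_SIZES = {"A": 10, "B": 13, "C": 13}


def subunit_names(family, count):
    """Generate names such as NF-YA1 ... NF-YA10 for one subunit family."""
    return [f"NF-Y{family}{i}" for i in range(1, count + 1)]


def enumerate_heterotrimers(sizes=FAMILY_SIZES):
    """List every theoretical NF-YA/NF-YB/NF-YC combination.

    Assumes no biological restriction on which subunits can assemble,
    so this is an upper bound rather than a set of known complexes.
    """
    families = [subunit_names(f, sizes[f]) for f in ("A", "B", "C")]
    return list(product(*families))


complexes = enumerate_heterotrimers()
print(len(complexes), "theoretical heterotrimers")
```

Tissue-specific expression data can then be used to prune this list, for example by discarding any combination whose three genes are never expressed in the same tissue.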
NF-Y function in the plant lineage is poorly understood, yet many of the mechanistic details are likely conserved across plant, animal, and fungal lineages. This inference comes from strong, cross-kingdom conservation of amino acid residues with well-characterized importance in mammalian and yeast NF-Y functions (Mantovani et al., 1994; Sinha et al., 1995, 1996; Coustry et al., 1996; Kim et al., 1996; Romier et al., 2003). Additionally, several groups have demonstrated that each of the three plant NF-Y proteins can substitute for their yeast counterparts in gene expression assays (Masiero et al., 2002; Ben-Naim et al., 2006). Finally, our own preliminary data suggest that dominant negative Arabidopsis NF-YC subunits can be readily predicted from existing mammalian and yeast data (B.F. Holt and R.W. Kumimoto, unpublished data). Hence, animal and yeast models provide excellent starting points for further investigations of the numerous plant NF-Y proteins.
No complete plant NF-Y complex has been described, but individual subunits are increasingly known to be involved in a number of important processes. LEAFY COTYLEDON1 (LEC1 or NF-YB9; Table I) was the earliest cloned and described plant NF-Y. LEC1 has strong expression in the developing embryo and is necessary for controlling the transition from embryo to adult status (West et al., 1994; Lotan et al., 1998; Lee et al., 2003). Evidence comes from the precocious development of trichomes on lec1 cotyledons and the presence of embryo-like tissue on the adult leaves of LEC1-overexpressing plants. The closely related LEC1-LIKE (L1L or NF-YB6) is also involved in embryogenesis (Kwong et al., 2003). While LEC1 has been studied for well over a decade, the requirements for specific NF-YA and NF-YC subunits remain elusive. This may be partly due to the complexity of genetic and biochemical studies in the embryo. However, another likely reason that no complete plant NF-Y complex has been described is extensive redundancy between NF-Y subunits. Several new phenotypes associated with overexpression and mutant versions of NF-Y subunits promise to provide the anchor points for describing complete in planta complexes.
The Medicago truncatula NF-YA subunit, MtHAP2-1, is expressed in a narrow region of the root nodule meristem and is essential for postinitiation nodule development (Combier et al., 2006). Interestingly, spatial and temporal maintenance of MtHAP2-1 nodule expression is dually controlled by microRNA169 (Jones-Rhoades and Bartel, 2004) and a small RNA-binding peptide (uORF1p) encoded by the MtHAP2-1 leader sequence (Combier et al., 2008). Thus, as with the narrow expression of LEC1 and L1L in embryos, MtHAP2-1 expression is quite finely controlled. This contrasts with animal systems, where NF-Y expression is largely ubiquitous, and, unsurprisingly, suggests refinement and specialization of NF-Y function in the plant lineage.
Plant NF-Y function also appears to be important for responses to drought stress. Although a specific mechanism remains unclear, overexpression of NF-YB1 and its ortholog in maize (Zea mays), ZmNF-YB2, leads to enhanced drought resistance (Nelson et al., 2007). Although the ability of NF-YB1 to promote drought resistance is clear, no loss-of-function data were provided to demonstrate an actual biological role in Arabidopsis. In contrast, a recent publication shows both overexpression and loss-of-function data for NF-YA5 (Li et al., 2008). Overexpression of NF-YA5 reduced drought susceptibility, anthocyanin production, and stomatal aperture, while nf-ya5 mutants had the expected opposite phenotype in each instance. As with the role of MtHAP2-1 in nodule meristem development, miRNA169 also plays a regulatory role in drought resistance. Specifically, the miRNA169 precursors miRNA169a and miRNA169c are strongly down-regulated by drought treatments in an abscisic acid (ABA)-dependent manner. Similar to NF-YA5 loss of function, 35S:miRNA169 plants were more drought susceptible (Li et al., 2008). Although none has yet been identified, the involvement of both NF-YA and NF-YB in plant drought resistance suggests that complete NF-Y complexes with NF-YC subunits will eventually be discovered.
Perhaps the most interesting recent discovery is the involvement of NF-Y in the control of photoperiod-regulated flowering time. In Arabidopsis, the key regulator of photoperiod-induced flowering time is the zinc-finger-type transcriptional activator encoded by CONSTANS (CO; Redei, 1962; Koornneef et al., 1991; Putterill et al., 1995). CO mRNA levels are controlled by the circadian clock and oscillate on a daily basis: CO expression peaks during the day in long-day (LD) conditions and during the night in short-day conditions (Suarez-Lopez et al., 2001). Peak expression during the day is essential for CO activity because the protein is rapidly degraded in the dark (Valverde et al., 2004). Under LD conditions, CO protein accumulates and induces the expression of FLOWERING LOCUS T (FT). Recent evidence strongly suggests that FT protein is the primary component of "florigen," the long-sought mobile flowering signal (Izawa, 2007; Jaeger and Wigge, 2007; Mathieu et al., 2007; Tamaki et al., 2007). Therefore, in LD-grown co mutants, FT does not accumulate to sufficient levels to induce flowering (Kardailsky et al., 1999; Samach et al., 2000). While the genetics of this biological process have been carefully examined, precisely how CO interacts with DNA to trigger FT expression remains unclear.
Several publications strongly suggest that NF-Y transcription factors are intimately involved in photoperiod-regulated flowering. Research groups studying flowering time in both tomato (Solanum lycopersicum) and Arabidopsis identified NF-YB and NF-YC subunits as CO-interacting proteins via yeast two-hybrid assays (Ben-Naim et al., 2006; Wenkel et al., 2006). Additionally, two independent research groups described moderate flowering time delays for the same nf-yb2 allele (Cai et al., 2007; Chen et al., 2007). Furthermore, overexpression of NF-YB2 resulted in significantly more rapid flowering. Finally, Kumimoto et al. (2008) found that nf-yb2 nf-yb3 double mutant plants flower as late as co mutants. Because of NF-Y's well-characterized role as a transcription-activating, DNA-binding complex in mammals and yeast, these results immediately suggest a possible platform for CO interactions with DNA. Additionally, the region of interaction between CO and NF-Y proteins is highly conserved in many CO-like genes (Wenkel et al., 2006). This suggests that the CO/NF-Y regulatory module might be paradigmatic for other CO-like DNA interactions and developmental processes.
The emerging picture has plant NF-Y complexes acting as essential regulatory hubs for many processes. While functions for individual NF-Y genes are beginning to emerge, overlapping functionality remains a persistent problem for further investigations. Additionally, demonstration of a complete NF-Y complex remains elusive. In this article, we present updated phylogenetic trees and alignments for all 36 Arabidopsis NF-Y proteins. As determined by functional analyses in yeast and mammals, we clearly identify the essential amino acids for each subunit type and specifically discuss these data in the context of recent findings on the NF-Y requirement in flowering. Furthermore, we test the utility of mammalian and yeast-derived positional weight matrices for defining CCAAT sites in plant promoters. Finally, we examine the tissue- and development-specific expression patterns for all 36 NF-Y genes using stable promoter:GUS fusions. The resulting plant lines will facilitate the discovery of complete NF-Y complexes and are freely available to academic researchers. Collectively, we hope the following data and review will serve as an entry point for other researchers interested in plant NF-Y function.

Notes on Nomenclature
Various nomenclatures are used for the three NF-Y families. The three most widely used names in various organisms are CBF (for CCAAT-binding factor), HAP (for histone or heme-associated protein), and NF-Y. Additionally, the unique subunits are alternatively assigned numerical or letter designations, and those designations often do not match across genera. For example, NF-YA in Homo sapiens is homologous to CBF-B in Rattus norvegicus, HAP2 in Saccharomyces cerevisiae, and HAPB in Aspergillus nidulans. Because there are still relatively few Arabidopsis NF-Y papers and we provide information on all 36 genes, we discussed nomenclature options with curators from The Arabidopsis Information Resource (TAIR). In Arabidopsis, these genes are alternatively called AtHAP, AtNF-Y, NF-Y, and CBF (Edwards et al., 1998; Kusnetsov et al., 1999; Gusmaroli et al., 2001; Kumimoto et al., 2008). Due to their usage for unrelated genes, HAP (for HAPLESS) and CBF (for C-Repeat/DRE Binding Factor) nomenclature options are both problematic and confusing. Additionally, TAIR curators are encouraging limited usage of the "At" designation. Thus, we jointly chose the NF-Y nomenclature following the number scheme initially suggested by Gusmaroli et al. (2001, 2002) and expanded here with seven new genes. To avoid unnecessary complexity, we respectfully request that future Arabidopsis researchers follow this updated naming scheme (Table I).
Table I. Suggested NF-Y nomenclature. Nomenclature is the same as originally proposed (Gusmaroli et al., 2001, 2002), except the At designations are removed and seven new genes (bold) have been added (see note on nomenclature). AGI, Arabidopsis Genome Initiative number.

NF-Y Phylogenies and Alignments
Published phylogenies are available for Arabidopsis, rice (Oryza sativa), and wheat (Triticum aestivum) plant NF-Y families (Gusmaroli et al., 2001, 2002; Yang et al., 2005; Stephenson et al., 2007). A comprehensive examination of all Arabidopsis transcription factors concluded that there are 36 total NF-Y genes (10 NF-YA, 13 NF-YB, and 13 NF-YC homologs). Nevertheless, published phylogenies only list 29 to 30 genes (Gusmaroli et al., 2001, 2002; Yang et al., 2005). Therefore, we repeated the BLAST searches for each NF-Y family (Altschul et al., 1990). In each case, BLAST searches were performed with several members of each NF-Y family using amino acid sequences from highly conserved (across genera) regions. Although there is no absolute rule for inclusion/exclusion from a gene family, in each case we chose the last member of the family based on very obvious and large breaks in the BLAST-derived E value score (e.g. using NF-YB1 in BLAST analysis, we accepted NF-YB family members with E values ranging from 2e-50 to 2e-06; the next closest nonfamily member had an E value of 0.11). Our searches confirmed the previous finding of 36 total NF-Y genes and are presented as three phylogenetic trees (Fig. 1) and three amino acid alignments (Figs. 2-4).

NF-YA Family
NF-YA proteins represent a unique transcription factor class lacking obvious homology to other described proteins (Maity and de Crombrugghe, 1998). NF-YA proteins are characterized by the presence of Gln (Q)- and Ser/Thr (S/T)-rich NH2 termini, a subunit interaction domain (NF-YB/NF-YC interaction), and a DNA-binding domain (Olesen and Guarente, 1990; Xing et al., 1993, 1994). The protein interaction and DNA-binding domains are well conserved between plant and other eukaryote lineages (Fig. 2). As with other eukaryotes, NF-YA NH2-terminal regions are also characterized by an overall high composition of Q and S/T residues. The Q-rich regions of NF-YA and NF-YC are thought to act redundantly in transcriptional activation (di Silvio et al., 1999). However, it is notable that plant NF-YA proteins have low Q:S ratios (1.0:2.1, Q:S), while the yeast, human, and rat NF-YA proteins have the opposite relationship (2.9:1.0, Q:S). It is currently unknown how these changes might affect the transcriptional activation potential of plant NF-Y complexes.
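A Q:S ratio of the kind quoted above can be obtained by counting Gln and Ser residues and scaling so that the smaller count equals 1.0. The sketch below is one possible way to do this and is not necessarily how the published values were derived: it assumes the counts are pooled across all sequences in a group, and the input fragments are invented for illustration rather than taken from real NF-YA proteins.

```python
def q_to_s_ratio(sequences):
    """Return the pooled Gln:Ser ratio, scaled so the smaller count is 1.0.

    Pooling counts across sequences is an assumption of this sketch;
    averaging per-protein ratios would be an equally plausible choice.
    """
    q_count = sum(seq.upper().count("Q") for seq in sequences)
    s_count = sum(seq.upper().count("S") for seq in sequences)
    if q_count == 0 or s_count == 0:
        raise ValueError("ratio is undefined when either residue is absent")
    smaller = min(q_count, s_count)
    return round(q_count / smaller, 1), round(s_count / smaller, 1)


# Hypothetical N-terminal fragments, invented purely for illustration.
toy_ser_rich = ["MSSQSTSSAQ", "SSPQSSTQSA"]
toy_gln_rich = ["MQQSQQAQQS", "QQTQSQQPQA"]

print("Ser-rich toy set, Q:S =", q_to_s_ratio(toy_ser_rich))
print("Gln-rich toy set, Q:S =", q_to_s_ratio(toy_gln_rich))
```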
The alignment in Figure 2 highlights the cross-kingdom conservation of NF-YA proteins in their interaction and DNA-binding domains. Functionally required amino acids were previously determined by mutagenesis in yeast and mammalian NF-YA (HAP2 and CBF-B, respectively; Xing et al., 1993). Of the three plant NF-Y families, functionally required amino acids are most highly conserved in the NF-YA proteins ("required" throughout refers to data from yeast or mammals; almost no amino acid requirements have been experimentally determined for plant NF-Y proteins). There is one notable change that is apparently specific to the plant lineage: the required Arg (R) residue at position 9 is almost always Gly (G) or Ala (A) in the plant lineage. This is true for all Arabidopsis NF-YA proteins as well as the majority of other plant NF-YA sequences (rice, poplar [Populus spp.], grape [Vitis vinifera], moss, wheat, etc.). All other required amino acids from yeast are absolutely conserved in all 10 Arabidopsis NF-YA proteins.
The conservation of specific amino acid residues across highly divergent NF-YA lineages strongly suggests functional conservation. This is particularly relevant when considering the recent finding that CO, a master regulator of floral transitions, physically interacts with NF-YB and NF-YC proteins (Ben-Naim et al., 2006; Wenkel et al., 2006). In this sense, CO seems to act like an NF-YA protein. Accordingly, the CCT (for CO, CO-LIKE, TOC1) domain of CO and CO-LIKE (COL) proteins shares a region of apparent homology to the NF-YA proteins (see "CCT Cons." line in Fig. 2; Wenkel et al., 2006). In a stretch of 41 amino acids, the CCT-containing proteins CO and COL1-5 share 32% identity with the NF-YA proteins (Fig. 2; Supplemental Figs. S1 and S2). Furthermore, previously identified co alleles, and mutant alleles in related CCT domain proteins, represent amino acid changes in residues that are conserved between the CCT domain and NF-YA proteins (Wenkel et al., 2006). The similarities between the conserved domains of NF-YA and COL suggest the possibility of another layer of combinatorial interactions with plant NF-Y complexes.
For the majority of NF-YB and NF-YC proteins, required amino acids are well conserved (Figs. 3 and 4; Sinha et al., 1995, 1996; Kim et al., 1996). Phylogenetically distant Arabidopsis family members, such as NF-YB11 to -YB13 and NF-YC10 to -YC13 (Fig. 1), are much more likely to have undergone changes in required amino acids (Fig. 3). Because the required amino acids are so highly conserved across evolutionary time and space, it is likely that nonconservative changes will significantly alter protein function. In this regard, LEC1 (NF-YB9) and LEC1-like (NF-YB6) have Asp (D) residues where Lys (K) is found in yeast, mammals, and most plants and is required for mammalian NF-YB function (position 29 in Fig. 3; actual change is K55D). Although LEC1 has other differences from plant and animal NF-YB proteins, K55D is the only change in a required amino acid and reversion of this alteration in LEC1 eliminates rescue of the lec1 embryonic desiccation-intolerance phenotype. Thus, LEC1 appears to have evolved a novel function.
NF-YB2 and NF-YB3, redundant players in photoperiod-related floral transitions (Kumimoto et al., 2008), have replaced the required Ser (S) at position 48 (Fig. 3) with Gly (S66G and S72G, respectively). Although Gly and Ser are both very small amino acids, Gly is more hydrophobic. In fact, many of the changes in the plant NF-YB and NF-YC proteins result in changes in hydropathy (Yang et al., 2005). It would be interesting to see if reversions to the apparently ancestral Ser residue would eliminate the rescue of flowering time defects in either the nf-yb2 or nf-yb3 mutant. Furthermore, it is possible that this change is necessary for the CO/NF-Y interaction and might prevent formation of the normal NF-Y heterotrimer in favor of CO/NF-YB/YC. There is a precedent for alterations at this exact position driving the formation of non-NF-YA-containing heterotrimers. For rice OsNF-YB1, there is an unusual Asp in this position, and this difference preferentially drives the formation of an OsMADS18/NF-YB/YC heterotrimer (Masiero et al., 2002). As eight of 13 Arabidopsis NF-YB proteins have alterations from the expected Ser, further experimentation on this residue is important.
Figure 1. NF-Y family phylogenies. Phylogenetic trees for each family were constructed by neighbor joining using the conserved regions illustrated in Figures 2 to 4. Phylogenetic trees were also generated for full-length proteins and did not differ substantially from those shown (data not shown). Reliability values at each branch represent bootstrap samples (2,000 replicates). All trees were determined and constructed using MEGA 4 (Tamura et al., 2007). Note that NF-YB12 and NF-YB13 are sometimes included in a separate DR1-related, two-member protein family (see "Gene Families" at TAIR). Nevertheless, NF-YB12 and -YB13 clearly show much stronger homology to the larger NF-YB family than to any other proteins, and there is currently no functional evidence to differentiate them as unique. Thus, we consider them to be divergent, but related, NF-YB family members.
To date, very little is known regarding the NF-YC proteins. Phylogenetic analyses and amino acid alignments suggest that there are two distinct clades. One clade consists of NF-YC1 to -YC4 and -YC9 and is still very similar to distant yeast and mammalian NF-YC lineages (Figs. 1 and 4). Members of the second clade, consisting of NF-YC5 to -YC8 and NF-YC10 to -YC13, are increasingly divergent from the ancestral NF-YC. Members of this clade have numerous nonconservative changes from required amino acids, suggesting that they may be evolving functions inconsistent with yeast and mammalian NF-Y functions.

CCAAT Motif in Arabidopsis
In yeast and mammals, many CCAAT sites have been experimentally defined (Mantovani, 1998; Testa et al., 2005). CCAAT sites can be in either orientation and are approximately 13 bp in length, including the approximately centrally located CCAAT pentamer. Yeast and mammalian NF-Y binding sites are defined as C,Pu,Pu,C,C,A,A,T,C/G,A/G,G,A/C,G and are typically located between −50 and −100 bp relative to the transcription start site (TSS; Mantovani, 1998). In humans and yeast, mutations in the flanking nucleotides typically have moderate effects on NF-Y binding, while mutations in the core CCAAT pentamer essentially abolish binding (Dorn et al., 1987). Because in vivo targets have not been identified, there is no descriptive positional weight matrix (PWM) describing plant CCAAT motifs. Although there is strong amino acid homology between the yeast, mammalian, and plant NF-Y proteins, there is no formal evidence that plant NF-Y complexes bind CCAAT sites. Nevertheless, select members of each Arabidopsis subunit family have been successfully used in yeast CCAAT-binding assays (Edwards et al., 1998; Ben-Naim et al., 2006; Kumimoto et al., 2008). Thus, for at least a subset of Arabidopsis NF-Y complexes, the ability to bind in planta CCAAT motifs likely still exists.
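The 13-bp consensus given above can be written as a regular expression and searched in both orientations. The Python sketch below does this with exact matching, which is stricter than real NF-Y binding because flanking mismatches are reported to have only moderate effects. The function names and the coordinate convention (0-based start of the site on the forward strand) are choices made for this illustration.

```python
import re

# Consensus from the text: C,Pu,Pu,C,C,A,A,T,C/G,A/G,G,A/C,G (Pu = A or G).
CONSENSUS = ["C", "AG", "AG", "C", "C", "A", "A", "T", "CG", "AG", "G", "AC", "G"]
WIDTH = len(CONSENSUS)

# A lookahead lets finditer report overlapping matches as well.
PATTERN = re.compile("(?=(" + "".join(f"[{c}]" for c in CONSENSUS) + "))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_consensus_sites(seq):
    """Find exact consensus matches on both strands.

    Returns (start, strand) tuples, where start is the 0-based position
    of the 13-bp site on the forward strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in PATTERN.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert reverse-strand match positions back to forward coordinates.
    hits += [(len(seq) - m.start() - WIDTH, "-") for m in PATTERN.finditer(rc)]
    return sorted(hits)


example_site = "CAGCCAATCAGAG"  # one sequence that satisfies the consensus
print(find_consensus_sites("TTT" + example_site + "TTT"))
print(find_consensus_sites("AA" + reverse_complement(example_site) + "A"))
```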
Figure 2. Arabidopsis NF-YA family alignment, including the consensus CCT domain from CO and COL1-5. Sequences correspond to the conserved regions in NF-YA proteins across various lineages. Hs, Homo sapiens; Rn, Rattus norvegicus; Sc, Saccharomyces cerevisiae. Numbers in parentheses correspond to actual amino acid numbers; numbers at top are for reference in the text. Three black boxes in the alignment correspond to three nuclear localization signals that are collectively required for HsNF-YA binding to importin β (Kahle et al., 2005). In the NF-Y Cons. (consensus) line, uppercase letters represent identity in more than 80% of sequences, lowercase letters represent 50% or greater identity, and x represents less than 50% identity. Required amino acid (AA) residues are from the literature (Xing et al., 1993). CCT Cons. (consensus) was determined from separate alignment of CO and COL1-5. Small black boxes in the CCT Cons. line represent two amino acids that are required for the physical interaction between tomato COL1 and two tomato NF-YC proteins (THAP5a and THAP5c; Ben-Naim et al., 2006). CCT = NF-YA ($) and CCT ≠ NF-YA (#) lines identify amino acids that are required for NF-YA function and are either conserved or not conserved, respectively, in CO and COL1-5.
We addressed whether or not a yeast/mammalian PWM would find similar enrichments of CCAAT sites in Arabidopsis promoters (Mantovani, 1998). Our Arabidopsis promoter data set was assembled by TAIR and consisted of 16,851 sequences corresponding to −1,000 to +200 from the TSS. Only promoters with known TSS were used. For humans, we obtained a previously described set of 13,010 sequences corresponding to the same region relative to the TSS (FitzGerald et al., 2004). Computer-based scanning of each human sequence for PWM matches (≥90% match threshold) gave results consistent with previous publications (i.e. there is a strong peak of putative CCAAT sites at approximately −100 from the TSS; Fig. 5A). In contrast, Arabidopsis promoters are largely devoid of this strong enrichment of putative CCAAT sites. Although the PWM does predict a small number of CCAAT sites in Arabidopsis (approximately one of 20 promoters examined), the human peak is almost eight times higher. These data suggest that Arabidopsis NF-Y-binding sites have diverged significantly from those of yeast and mammals.
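The PWM scan with a 90% match threshold can be illustrated with a small Python sketch. Several things here are assumptions rather than details from the study: the matrix is a toy built from the published consensus (the actual matrix derived from experimentally defined sites is not reproduced here), the probabilities 0.97 and 0.70 are arbitrary, and "90% match" is interpreted as a score at least 90% of the way from the minimum to the maximum possible log-odds score, which is one common convention.

```python
import math

CONSENSUS = ["C", "AG", "AG", "C", "C", "A", "A", "T", "CG", "AG", "G", "AC", "G"]
CORE = range(3, 8)  # positions of the CCAAT pentamer within the 13-bp site
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_toy_pwm(core_weight=0.97, flank_weight=0.70, background=0.25):
    """Build a hypothetical log-odds matrix from the consensus.

    Preferred bases share core_weight (core) or flank_weight (flanks);
    the remaining probability is split among the other bases.
    """
    pwm = []
    for i, allowed in enumerate(CONSENSUS):
        weight = core_weight if i in CORE else flank_weight
        column = {}
        for base in "ACGT":
            if base in allowed:
                p = weight / len(allowed)
            else:
                p = (1.0 - weight) / (4 - len(allowed))
            column[base] = math.log2(p / background)
        pwm.append(column)
    return pwm


def relative_score(window, pwm):
    """Scale the window score to 0..1 between the worst and best possible."""
    score = sum(col[base] for col, base in zip(pwm, window))
    lowest = sum(min(col.values()) for col in pwm)
    highest = sum(max(col.values()) for col in pwm)
    return (score - lowest) / (highest - lowest)


def scan(seq, pwm, threshold=0.9):
    """Return (start, strand) for every window scoring at or above threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if set(window) <= set("ACGT") and relative_score(window, pwm) >= threshold:
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand))
    return hits


def sites_per_100_promoters(promoters, pwm, threshold=0.9):
    """Normalize hit counts so data sets of different sizes are comparable."""
    total = sum(len(scan(p, pwm, threshold)) for p in promoters)
    return 100.0 * total / len(promoters)


pwm = build_toy_pwm()
with_site = "TTTCAGCCAATCAGAGTTT"     # perfect consensus match
core_mutant = "TTTCAGCCGATCAGAGTTT"   # CCAAT changed to CCGAT
print(scan(with_site, pwm), scan(core_mutant, pwm))
```

With these toy weights, a single mismatch in the pentamer drops a window below the threshold while a single flanking mismatch does not, mirroring the reported difference between core and flanking mutations.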
[Figure 3 caption, beginning missing] ... Figure 2. Note that the NF-YC interaction domain extends across two independent regions and partly overlaps with the DNA-binding and NF-YA interaction domains. To eliminate a gap of nonhomology, the amino acid sequence SECS was removed from NF-YB11 between the L and P residues at position 20/21. Required AA, Required amino acid.
Arabidopsis NF-Y Transcription Factors
Plant Physiol. Vol. 149, 2009
Because there is currently only one confirmed Arabidopsis NF-Y binding site (Kusnetsov et al., 1999), we were not able to test if a specialized plant PWM could find more putative CCAAT sites. Furthermore, because of the many possible plant NF-Y complexes, it might be more likely that plant NF-Y binding sites have coevolved with specific complexes. Nevertheless, we could assess the frequency of CCAAT-only sequences in Arabidopsis and humans (i.e. no flanking sequences; Fig. 5B). Interestingly, when flanking sequences are removed from the search, Arabidopsis has more CCAAT/ATTGG sequences than humans. Additionally, both Arabidopsis and humans have peaks in the expected −50 to −100 window. Thus, the core pentamer is quite common in Arabidopsis, and there is currently no reason to believe that Arabidopsis has fewer CCAAT sites than humans.
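The pentamer-only comparison amounts to locating every CCAAT or ATTGG in each promoter, converting the position to a coordinate relative to the TSS, and normalizing counts per 100 promoters. The Python sketch below shows one way to do this. It assumes each promoter string covers −1,000 to +200, so that index 1,000 corresponds to the TSS; the 50-bp bin width and the function names are arbitrary choices for illustration.

```python
from collections import Counter

UPSTREAM = 1000  # assumed number of bases before the TSS in each sequence


def pentamer_positions(promoter, upstream=UPSTREAM):
    """Return TSS-relative start positions of CCAAT and ATTGG occurrences."""
    seq = promoter.upper()
    positions = []
    for motif in ("CCAAT", "ATTGG"):  # forward motif and its reverse complement
        start = seq.find(motif)
        while start != -1:
            positions.append(start - upstream)
            start = seq.find(motif, start + 1)
    return sorted(positions)


def binned_sites_per_100(promoters, bin_width=50):
    """Histogram of pentamer positions, expressed per 100 promoters.

    Keys are the lower edge of each bin (e.g. -100 covers -100 to -51).
    """
    counts = Counter()
    for promoter in promoters:
        for pos in pentamer_positions(promoter):
            counts[(pos // bin_width) * bin_width] += 1
    return {b: 100.0 * c / len(promoters) for b, c in sorted(counts.items())}


# Synthetic 1,200-bp promoters, invented for illustration.
toy_promoters = [
    "G" * 920 + "CCAAT" + "G" * 275,  # pentamer at -80
    "G" * 940 + "ATTGG" + "G" * 255,  # reverse-orientation pentamer at -60
    "G" * 1200,                       # no pentamer
    "G" * 1200,
]
print(binned_sites_per_100(toy_promoters))
```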

Promoter:GUS Fusions
As with other multigene families, researchers interested in Arabidopsis NF-Y genes must contend with overlapping functionality. Based on our own experience, loss-of-function mutations in NF-Y rarely have any obvious phenotypes. From the lack of complete plant NF-Y complexes described in the literature, we infer that other groups have also confronted this problem. We reasoned that assembling complete complexes would be greatly facilitated if reporter gene fusions were available for all 36 NF-Y genes. Accordingly, for each gene we cloned 1,000 bp of upstream sequence (5′ untranslated region and promoter region) in front of an enhanced GFP (eGFP):GUS reporter gene fusion (Karimi et al., 2002). For each gene, we collected at least two independent, stable (third-generation) transformants and analyzed these in numerous plant tissues throughout normal development. We expect the data available from these lines to greatly facilitate the discovery of complete, in planta NF-Y complexes.
[Figure 4 caption, beginning missing] ... Figure 2. Note that the NF-YA interaction domain extends across two separate regions. The DNA-binding domain in NF-YC consists of the two amino acids AR (found in most NF-YC homologs). To eliminate two large gaps of nonhomology, the amino acid sequence DTLTRS was removed from NF-YC7 between the S and D residues at position 57/58 and YVNFQK was removed from NF-YC12 between the paired I residues at 18/19. Required AA, Required amino acid.

Root Expression Patterns
As a first qualitative measure of the reporter lines, we compared the GUS expression patterns in roots with a high-quality data set based on fluorescence-activated cell sorting (FACS; Birnbaum et al., 2005) and numerous standardized microarray data sets available for online visualization at Genevestigator and The Botany Array Resource (Schmid et al., 2005; Toufighi et al., 2005; Grennan, 2006). Figure 6 shows the promoter:GUS expression patterns in the root tip for all 36 genes. Although precise quantitative comparisons are difficult, we generally found that GUS expression levels and patterns were consistent with expectations from FACS and online data. For example, GUS and FACS data sets agree that NF-YA2 is primarily expressed in the maturation zone of the root (Fig. 6). It is important to note that all GUS pictures are static representations of dynamic expression patterns. For example, NF-YB2 is very weakly expressed in the meristematic region of elongated roots (Fig. 6). Nevertheless, in young roots that are just emerging from the pericycle, NF-YB2 expression is very strong throughout the root tip (data not shown).
We found only one obvious problem for the root GUS expression patterns: NF-YB10 had essentially no GUS expression. Nevertheless, FACS, Genevestigator, and our own reverse transcription (RT)-PCR results show that it is strongly expressed in the roots. Additionally, NF-YA5 did not express anywhere in our experiments, but another publication shows broad expression in the aboveground plant that is highly ABA inducible (Li et al., 2008). Our construct was not ABA inducible (data not shown). This suggests that ABA induction of NF-YA5 relies on promoter elements beyond 1 kb upstream of the start codon. Overall, for 34 of 36 NF-Y genes, we find that our GUS patterns are largely consistent with published expression patterns. Our NF-YA5 and NF-YB10 results are included in all figures but are boxed in red wherever they clearly differ from the published record.

Dark- Versus Light-Grown Expression Patterns
To further demonstrate the usefulness of the promoter:GUS fusions, we compared our results with a previous publication examining NF-Y functions in blue light perception and ABA signaling (Warpeha et al., 2007). RT-PCR was used by the authors to identify NF-Y expression in 6-d-old dark-grown seedlings. For the NF-YC genes, the authors examined NF-YC1 through NF-YC9 and reported that NF-YC1, -YC4, and -YC9 were expressed. We confirmed this finding and added NF-YC3 to the list (Fig. 7). It is not surprising to find that NF-YC3 is expressed in a similar fashion to NF-YC1, -YC4, and -YC9, as all four genes are very closely related and likely arose from recent duplication events (Fig. 1). Additionally, we also tested NF-YC10 and the newly identified NF-YC11 to -YC13. Demonstrating that NF-YC expression patterns in dark-grown seedlings are more complicated than previously realized, we also found that NF-YC10, -YC11, and -YC12 are strongly expressed in the dark. There were no significant GUS expression differences between light- and dark-grown seedlings (Fig. 7).
Figure 5. Promoter searches for CCAAT motifs in Arabidopsis and humans. A, Arabidopsis and human promoter data sets were searched for sequence matches to a PWM based on a large human and yeast data set of experimentally defined CCAAT sites. To control for differences in the size of each data set, numbers of positive matches (90% threshold for positive match) are presented as sites per 100 promoters examined. B, The same data sets were searched for the simple presence of the sequence CCAAT or its reverse complement ATTGG. Note that Arabidopsis is the top line in B.

Arabidopsis NF-Y Transcription Factors
Plant Physiol. Vol. 149, 2009
In addition to the NF-YC data, it was previously reported that only NF-YA5, NF-YB6, and NF-YB9 are expressed in 6-d-old dark-grown seedlings (Warpeha et al., 2007). To further validate our promoter:GUS fusions, we additionally examined the NF-YA and NF-YB expression patterns (data not shown). Our findings were inconsistent with the previous report in that we saw clear expression for all 10 NF-YA genes (NF-YA5 was weak) as well as NF-YB2, -YB3, -YB5, -YB6, -YB8, -YB9, and -YB12. In our experiments, NF-YB6 (L1L) and NF-YB9 (LEC1) were among the weakest of the GUS-positive NF-YB lines: GUS staining for NF-YB9 lines required extensive incubation (2 d) in 5-bromo-4-chloro-3-indoxyl-β-D-glucuronide solution. This finding is consistent with online microarray data and the known, narrow roles for LEC1 and L1L in embryogenesis, but it does not rule out a role in blue light perception (Lotan et al., 1998; Kwong et al., 2003; Lee et al., 2003; Warpeha et al., 2007). We additionally confirmed our GUS results by examining the presence or absence of NF-Y gene expression by RT-PCR (data not shown). Our RT-PCR data were consistent with the GUS data presented in Figure 7. Additionally, we generated a comparative table for our GUS expression data versus publicly available microarray data for the dark-grown seedlings, root tips, rosettes, and flowers (Figs. 6-9; Supplemental Table S1).

Rosette Expression Patterns
We examined the rosette expression patterns of all 36 NF-Y genes (Fig. 8). In general, the NF-Y expression patterns are spatially complex with highly variable levels of intensity. For example, numerous genes have trichome expression. In some instances, this is part of a larger staining pattern (e.g. NF-YA7, NF-YB2, and NF-YC3), while in others, the staining is much more specific to the trichome (e.g. NF-YA1 and NF-YB12). Several genes from each family have clear vascular expression patterns. These genes are particularly interesting because CO function in floral induction is a phloem-specific process (Ayre and Turgeon, 2004). Furthermore, it is now well established that FT accumulates in the phloem and then translocates to the shoot meristem (Corbesier et al., 2007; Jaeger and Wigge, 2007; Mathieu et al., 2007; Tamaki et al., 2007). If, as current data suggest, NF-Y proteins form an interaction platform for CO at the FT promoter, this process would also be expected to take place in the leaf phloem.
Supporting this model, NF-YB2 and NF-YB3 are strongly expressed in the vasculature (Fig. 8), and they are known to redundantly control photoperiod-regulated flowering time (Kumimoto et al., 2008). NF-YB2 and NF-YB3 are very closely related proteins sharing 94% amino acid identity in their conserved domains (Figs. 1 and 3). NF-YB8 and NF-YB12 also have vascular expression patterns and might be expected to have a role in flowering. Nevertheless, the vascular expression of NF-YB8 is much weaker than that of either NF-YB2 or -YB3 and does not appear to extend into the second set of true leaves. NF-YB12 is phylogenetically distant from both NF-YB2 and -YB3, and nf-yb2 nf-yb12 double mutants show no additive delay in flowering time over the nf-yb2 single mutant (B.F. Holt, R.W. Kumimoto, and N. Siefers, unpublished data). Therefore, especially considering that nf-yb2 nf-yb3 double mutants phenocopy co mutants (Kumimoto et al., 2008), NF-YB2 and NF-YB3 are likely the major NF-YB components of the photoperiod-regulated flowering response. The next important challenge is to describe the entire NF-Y complex involved in flowering.
Figure 6. NF-Y expression patterns in 6-d-old Arabidopsis root tips. Note that none of the roots continue to have broad, whole root expression above the approximate level where these pictures were cut off (i.e. mature roots do not generally have NF-Y expression in the cortical and epidermal layers of the root). Many NF-Y genes have strong expression in the root stele; these are generally the same genes that are expressed in the leaf vasculature of rosettes (Fig. 8). All panels are shown at the same magnification. Bar in Col-0 panel = 200 μm.
NF-YC1, -YC3, -YC4, -YC9, -YC11, and -YC12 are all expressed in the vasculature and, therefore, are potential targets for flowering time control. Beyond the first set of true leaves, NF-YC1 and NF-YC11 expression does not generally extend into the vasculature. Of the remaining four genes, NF-YC3, -YC4, and -YC9 are all very strongly expressed in the vasculature and are very closely related; in fact, NF-YC3 and -YC9 are identical across their 78-amino acid conserved regions. Unlike single mutations in NF-YB2 and -YB3 (Cai et al., 2007; Chen et al., 2007; Kumimoto et al., 2008), we did not measure any flowering delays for nf-yc1, nf-yc3, nf-yc4, nf-yc9, or nf-yc11 (B.F. Holt, R.W. Kumimoto, and N. Siefers, unpublished data). Although this does not formally exclude involvement for any of the other NF-YC genes, we suspect that overlapping functionality will be important here as well. Assuming that they are involved, the NF-YA genes may be the most complicated: most of them have some level of vasculature expression, and the overall family conservation is much higher than for the NF-YB and NF-YC families (Figs. 1, 2, and 8). Interestingly, plants expressing 35S:miR169, which is predicted to target all NF-YA mRNAs except NF-YA4, -YA6, and -YA7 (Jones-Rhoades and Bartel, 2004), do not show a flowering-time phenotype (Li et al., 2008; W.-X. Li and J.-K. Zhu, personal communication).

Flower Expression Patterns
As in the rosettes, floral NF-Y expression patterns are quite variable and complex. Some of the genes with ubiquitous expression in rosettes are much more restricted in the floral organs. For example, NF-YB3 is widely expressed in the rosette but is restricted to the filaments in flowers. Alternatively, genes such as NF-YB7 are minimally expressed in the rosettes and ubiquitously expressed in the flowers. The potential for using these tissue-specific patterns to infer likely NF-Y complexes is simply illustrated by the stigmas; in the pictured developmental stage, only NF-YA7, NF-YB7, NF-YB12, NF-YC3, and NF-YC12 have strong stigma expression. To our knowledge, no phenotypes related to floral organs have been reported for the NF-Y genes. This small set may be a good place to start.

DISCUSSION
There have been numerous NF-Y duplications unique to the plant lineage. However, compared with yeast and mammals, much less functional information is currently available. This is likely due to overlapping functionality between NF-Y subunits, which has resulted in very few plant NF-Y genes being isolated in forward genetic mutant screens. Additionally, when plant NF-Y proteins have been associated with specific functions, identifying their interacting partners has been complicated by the number of possible NF-YA/B/C combinations. Although their existence is virtually certain, no single complete NF-Y complex has ever been described in plants.
Figure 7. NF-YC expression patterns in light- versus dark-grown plants. Pairs of light- and dark-grown seedlings are shown. To facilitate direct comparison, plants were grown as described previously (Warpeha et al., 2007). Dark-grown plants were infiltrated and stained in the dark to prevent possible light activation of promoter:GUS fusions. Bar in Col-0 panel = 500 μm.
To simplify future analyses and aid in the discovery of complete complexes, we have created a set of publicly available expression lines. The gross morphological expression patterns appear to be accurate for 34 of 36 lines. Arabidopsis NF-Y genes are expressed in almost all plant tissues we examined, and this was typically true for at least one of each subunit type per tissue examined (Figs. 6-9). The epidermis and cortex of mature roots were the only tissues largely devoid of NF-Y expression. The tissue- and development-specific expression patterns presented here will simplify the process of identifying complete NF-Y complexes.
It is interesting that plants utilize so many NF-Y genes while other complicated organisms need only single copies of each subunit. This expansion is also true for other Arabidopsis transcription factors, such as the Myb, Myc, and MADS proteins. Evolutionary biologists have long postulated that infrequent gene duplication events typically result in the loss or inactivation of one of the duplicated copies (Lynch and Force, 2000; Lynch et al., 2001; Meyer, 2003; Ward and Durrett, 2004; Moore and Purugganan, 2005). However, whole genome sequencing has demonstrated that gene duplications take place more commonly than expected and are often maintained indefinitely in the apparent absence of functional diversification (Meyer, 2003). For many of the NF-Y genes, especially NF-YA, the high degree of conservation strongly suggests the maintenance of ancestral functions. Perhaps plant NF-Y proteins have maintained their core ancestral DNA-binding and complex-forming functions while refining the ways in which they interact with specific cis- and trans-elements.
Our searches for CCAAT motifs using a yeast/mammalian PWM suggest divergence of the cis-elements bound by plant NF-Y (Fig. 5). Alternatively, there may simply be fewer NF-Y-regulated genes in plants. Because of the amino acid conservation across lineages, we favor the idea that plant NF-Y complexes still bind sequences with a central CCAAT motif but the surrounding bases of their cognate cis-elements have evolved from those of other lineages. Obviously, there is a great need for direct, in vivo measurements of NF-Y/DNA interactions if we hope to understand how these proteins have uniquely evolved in the plant lineage.
Figure 9. NF-Y expression patterns in flowers. Flowers were harvested from plants at approximate "principal growth stage" 6.50 (Boyes et al., 2001; Kjemtrup et al., 2003). Bar in Col-0 panel = 2 mm.
The addition of novel, plant-specific trans-interacting factors to NF-Y complexes may also greatly modify their functionality. For example, there is at least one known example of the NF-YB/YC dimer being coopted by another transcription factor (OsMADS18; Masiero et al., 2002). Additionally, there is evidence that CO and CO-Like proteins can physically associate with specific NF-Y subunits (Ben-Naim et al., 2006;Wenkel et al., 2006), but it remains to be seen if these complexes can bind CCAAT motifs in vivo.
One exciting prospect is that many of the CCT domain proteins might require NF-Y complexes to exert their regulatory effect on specific promoters. For example, CO might exert its regulatory effects on FT expression by competing for the position of NF-YA occupancy in NF-Y complexes (Wenkel et al., 2006). The coincidence of light (LD conditions) and peak CO expression might allow CO to outcompete NF-YA for utilization of specific NF-YB/C dimers. In this formulation, specific NF-YA proteins might act as negative regulators of CO-mediated flowering. As predicted by this replacement model, overexpression of several NF-YA subunits delayed flowering (Wenkel et al., 2006). Additionally, as a variation on the replacement model, there may be no need for NF-YA in a CO/NF-YB/YC complex. To date, no supporting loss-of-function data have been published regarding NF-YA subunits in flowering.
While there are clear similarities between NF-YA and CCT-containing proteins, there are important unanswered questions with a replacement model. First, the region of similarity between the two domains is almost exclusively within the NF-YA DNA-binding domain. There is essentially no similarity between CCT proteins and the NF-YA subunit interaction domain (Fig. 2). Thus, the alignments do not support a direct equivalency between NF-YA and CCT proteins for interactions with NF-YB/YC dimers. Additionally, while the DNA-binding domains of NF-YA proteins share clear similarities with CCT domains, none of the required His residues are shared. His residues are absolutely essential for NF-Y complex binding to CCAAT motifs (Xing et al., 1993). Therefore, assuming that the replacement model is correct, one would not expect the resulting CO/NF-YB/YC complexes to bind at CCAAT motifs. An alternative model for CO/NF-Y interactions places CO docking on preassembled, mature NF-Y/DNA complexes but not actually displacing the NF-YA subunit. If this docking model is correct, we would expect appropriate NF-YA loss-of-function alleles to result in flowering delays. Although this is just one example of NF-Y interactions, it may prove paradigmatic for how NF-Y complexes uniquely function and fine-tune plant gene expression.
After numerous rounds of duplication, plant NF-Y proteins have likely evolved numerous unique cis- and trans-interactions and have clearly become much more highly regulated in their tissue- and development-specific expression patterns. This in turn suggests a refinement and narrowing in their gene targets. We expect that this refinement will be in marked contrast to the broader and more universal transcriptional activation potential expected in animal systems. As there are now several very interesting NF-Y-associated developmental and stress-responsive processes, we expect that our collective understanding of NF-Y complexes and their plant-specific functions will expand rapidly in the next few years.

Plant Growth Conditions
All Arabidopsis (Arabidopsis thaliana) plants used are in the Columbia (Col-0) ecotype. Plants for rosette and flower expression patterns (Figs. 8 and 9) were grown at 23°C in a standard LD light regime (16 h of light/8 h of dark). Plants were grown in medium containing equal parts Farfard C2 Mix and Metromix 200 supplemented with 40 g of Marathon pesticide and dilute Peters fertilizer (NPK = 20:20:20). Plants were watered throughout with dilute fertilizer (approximately one-tenth the recommended regular feeding levels). Root and light/dark experimental plants (Figs. 6 and 7) were grown on sterile plates. To allow appropriate comparisons, root and light/dark plants were grown exactly as described by Birnbaum et al. (2003) and Warpeha et al. (2007), respectively.

Phylogenies and Alignments
Individual subunits were identified by standard BLAST searches at TAIR and the National Center for Biotechnology Information (Altschul et al., 1990). Full-length protein sequences were imported into MEGA 4, where phylogenies and alignments were created (Tamura et al., 2007). Multiple sequence alignments were done by ClustalX (Thompson et al., 2002). Alignments and phylogenies were created from both full-length protein sequences and truncated sequences identical to those shown in Figures 2 to 4. All phylogenetic trees presented in this article were derived from the truncated alignments of highly homologous regions. Neighbor-joining and bootstrap methods were employed as described previously (Hall, 2008). Figures 2 to 4 were created with BOXSHADE (created by K. Hofmann and M. Baron) within the Mobyle Web portal (http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py).

CCAAT Searches
Using the Perl TFBS module (Lenhard and Wasserman, 2002), we generated a 16-base PWM from the nucleotide frequencies in 178 confirmed yeast and mammalian NF-Y binding sites (Mantovani, 1998). We searched the human and Arabidopsis data sets with this PWM, counting only those sequences that achieved a match score of 90% or greater of the maximum possible score. Match scores were calculated as a sum across positions of the weights for the observed nucleotide at that position. Arabidopsis promoter sequences were assembled as described in the text by TAIR data curators Aleksey Kleitman and Leonore Reiser.
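The scoring and counting steps described above can be illustrated with a minimal Python sketch. It is not the Perl TFBS code used for the analysis: the function names, the pseudocount, the uniform background frequency, and the log-odds weighting are all assumptions made for illustration. In particular, the 90% threshold is implemented here as a score rescaled between the minimum and maximum attainable scores, which is one common reading of "90% of the maximum possible score". The last function sketches the simple CCAAT/ATTGG search of Figure 5B.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build log-odds weights from aligned, equal-length binding sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def relative_score(pwm, window):
    """Sum the weights of the observed bases, rescaled so that the worst
    possible window scores 0.0 and the best possible window scores 1.0."""
    raw = sum(weights[base] for weights, base in zip(pwm, window))
    lowest = sum(min(weights.values()) for weights in pwm)
    highest = sum(max(weights.values()) for weights in pwm)
    return (raw - lowest) / (highest - lowest)


def hits_per_100(pwm, promoters, threshold=0.9):
    """Count windows on either strand scoring at or above the threshold,
    normalized to sites per 100 promoters."""
    width = len(pwm)
    hits = 0
    for seq in promoters:
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - width + 1):
                window = strand[i:i + width]
                # Skip windows containing ambiguous bases such as N.
                if set(window) <= set(BASES) and relative_score(pwm, window) >= threshold:
                    hits += 1
    return 100.0 * hits / len(promoters)


def literal_ccaat_per_100(promoters):
    """Count exact CCAAT or ATTGG occurrences per 100 promoters."""
    hits = sum(seq.count("CCAAT") + seq.count("ATTGG") for seq in promoters)
    return 100.0 * hits / len(promoters)
```

Normalizing to sites per 100 promoters, as in Figure 5, keeps the counts comparable between data sets of different sizes.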

Promoter:GUS Fusions
To clone each NF-Y promoter region, primers were designed with partial B1 and B2 sites (according to standard Gateway protocols; Invitrogen) and gene-specific sequence. The resulting PCR products were then used as templates in a second PCR step containing full-length B1 and B2 primers. The products of this second step were then cloned into pDONR207 (Invitrogen). After confirming the correct sequence for each promoter, each was transferred by LR recombination reaction to the eGFP/GUS fusion-containing binary vector pGWFS7 (Karimi et al., 2002). For each promoter, the sequence represented 1,000 bp starting at 2 bp downstream of the ATG start codon and extending 995 bp upstream of the start codon. The 2 bp downstream of the ATG placed the gene-specific ATG in frame with the eGFP/GUS start codon. We chose this conservative strategy to avoid possible aberrant expression patterns arising from the gene-specific start codon being out of frame with the reporter gene. After transferring each construct into plants by standard Agrobacterium tumefaciens-based methods (Bechtold et al., 1993), we confirmed the correct transgene in each line by PCR amplification from plant DNA and restriction digestion. All pictures show stable, third-generation plants with single T-DNA insertions. At least two lines were examined per promoter:GUS fusion.
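As a check on the fragment arithmetic (995 bp upstream + 3 bp of ATG + 2 bp downstream = 1,000 bp), the following Python sketch extracts such a fragment from a genomic sequence. The function name and its arguments are hypothetical, and the sketch assumes a plus-strand gene whose ATG position is already known; it does not model primer design or the Gateway attB sequences.

```python
def promoter_fragment(genomic, atg_index, upstream=995, downstream=2):
    """Return the upstream bases, the ATG, and the bases just after it.

    genomic   -- uppercase plus-strand sequence (assumption: gene on plus strand)
    atg_index -- 0-based index of the A in the start codon
    """
    if genomic[atg_index:atg_index + 3] != "ATG":
        raise ValueError("atg_index does not point at an ATG")
    start = atg_index - upstream
    end = atg_index + 3 + downstream
    if start < 0 or end > len(genomic):
        raise ValueError("sequence too short for the requested fragment")
    return genomic[start:end]


# Toy example: 995 filler bases, the start codon, then coding sequence.
toy = "C" * 995 + "ATG" + "GCTTTT"
fragment = promoter_fragment(toy, atg_index=995)
print(len(fragment))  # 995 + 3 + 2 bases
```

For a gene on the minus strand, the same coordinates would be applied to the reverse complement of the genomic sequence.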

Microscopy
Macrophotography was used to visualize the Arabidopsis rosettes in Figure 8. For this, an Olympus DP71 CCD camera was fitted with a Pentax KC 50-mm Adapter and mounted above the subject with a Nikon MKII optic light providing illumination from above. SPOT software (version 4.6) was used to record the pictures. For Figures 6, 7, and 9, we used an Olympus BX41 microscope with an Insight 2 Megapixel Color Mosaic CCD camera. We used SPOT software (version 4.6) to record the pictures.

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Phylogenetic tree for the CO-Like proteins.
Supplemental Figure S2. Full alignment of CCT domains from CO and COL1-5 with the Arabidopsis, human (Hs), rat (Rn), and yeast (Sc) NF-Y.