ATX3, ATX4, and ATX5 Encode Putative H3K4 Methyltransferases and Are Critical for Plant Development

Characterization of ATX3/4/5 completes the elucidation of Arabidopsis Trithorax homologs and enhances understanding of the occurrence, distribution, and function of H3K4me2 and H3K4me3 in plants.

Methylation of Lys residues in the tail of histone H3 is a key regulator of chromatin state and gene expression, conferred by a large family of enzymes containing an evolutionarily conserved SET domain. Among the main types of SET domain proteins are those controlling H3K4 di- and trimethylation. The genome of Arabidopsis (Arabidopsis thaliana) encodes 12 such proteins, including five ARABIDOPSIS TRITHORAX (ATX) proteins and seven ATX-Related proteins. Here, we examined three previously unexplored ATX proteins, ATX3, ATX4, and ATX5. We found that they exhibit similar domain structures and expression patterns and are redundantly required for vegetative and reproductive development. Concurrent disruption of the ATX3, ATX4, and ATX5 genes caused a marked genome-wide reduction in H3K4me2 and H3K4me3 levels and resulted in ectopic expression of thousands of genes. Furthermore, the atx3/atx4/atx5 triple mutations produced exaggerated phenotypes when combined with the atx2 mutation but not with atx1. Together, we conclude that ATX3, ATX4, and ATX5 are redundantly required for H3K4 di- and trimethylation at thousands of sites across the genome, and that the genomic features associated with the targeted regions differ from those of ATXR3/SDG2-controlled sites in Arabidopsis.

Changes in chromatin state involve several types of modification of an array of histone residues, allowing fine-tuning of genes with diverse transcriptional profiles. A major regulator of chromatin state and controller of gene expression is methylation of Lys residues in the tail of histone H3 (Thorstensen et al., 2011). This modification is conferred by a large family of enzymes, the histone Lys methyltransferases. Most of them contain an evolutionarily conserved SET (suppressor of variegation, enhancer of zeste and trithorax) domain, which is responsible for methyltransferase activity (Tschiersch et al., 1994; Lei et al., 2012). The Arabidopsis (Arabidopsis thaliana) genome encodes 49 such proteins, known as the SET DOMAIN GROUP (SDG; www.chromdb.org; Baumbusch et al., 2001; Ng et al., 2007; Berr et al., 2010).
SDG proteins are classified into five distinct classes based on their activity and domain architecture (Springer et al., 2003; Zhao and Shen, 2004; Pontvianne et al., 2010). One of the main classes is class III, which contains proteins acting on Lys 4 of histone H3 (H3K4; Jackson et al., 2002; Alvarez-Venegas et al., 2003; Lindroth et al., 2008; Xu et al., 2008; Jacob et al., 2009). There are five class III SDG proteins in Arabidopsis, named ATX1-5, all of which are homologs of the Trithorax proteins found throughout eukaryotes (Veerappan et al., 2008; Schuettengruber et al., 2011). Only two of them have been characterized so far. ATX1 was found to be involved in H3K4 trimethylation and required for root, leaf, and floral organ development (Baumbusch et al., 2001; Alvarez-Venegas et al., 2003; Alvarez-Venegas and Avramova, 2005; Fromm and Avramova, 2014; Napsucialy-Mendivil et al., 2014), as well as for transcriptional regulation of several stress-response genes (Ding et al., 2011a, 2011b). ATX2 has a sequence fairly similar to that of ATX1, but the protein has divergent biochemical properties: it exhibits H3K4 dimethylation rather than trimethylation activity and influences expression of a gene set largely different from that affected by ATX1 (Pien et al., 2008; Saleh et al., 2008).
In addition to the five ATX proteins, Arabidopsis has seven ATX-Related (ATXR) proteins, of which only two have been characterized (Baumbusch et al., 2001; Avramova, 2009). ATXR7/SDG25 is involved in flowering time regulation through activating expression of FLOWERING LOCUS C (FLC) (Berr et al., 2009; Tamada et al., 2009). ATXR3/SDG2, in contrast, has a much broader role: by controlling H3K4 trimethylation at thousands of loci genome-wide, it affects a large number of processes in both sporophyte and gametophyte development (Berr et al., 2010; Guo et al., 2010).
H3K4me2 and H3K4me3 are both marks of active chromatin. Interestingly, however, they display distinct genomic profiles (Zhang et al., 2009; Du et al., 2013). Within genes, H3K4me3 marks tend to appear upstream of H3K4me2 marks (Zhang et al., 2009) and are more narrowly spread (van Dijk et al., 2010). There is also evidence that the two marks display different dynamics in response to external stimuli (van Dijk et al., 2010). These observations indicate that H3K4me2 and H3K4me3 are controlled by distinct cellular mechanisms and sets of genes. However, although a large number of ATX and ATXR proteins have been identified, the exact roles of individual genes have not been well defined, particularly for genes involved in H3K4 dimethylation. To elucidate the roles of ATX proteins, we characterized the three previously unexplored ATX family members, ATX3, ATX4, and ATX5. We found that they control a large number of genes critical for both vegetative and reproductive development. Furthermore, they affect H3K4me2 and H3K4me3 at thousands of sites throughout the genome. Their effect on H3K4me2 makes them the first enzymes with genome-wide H3K4me2 activity identified so far in Arabidopsis. ATX3/4/5 resemble ATXR3/SDG2 in that they control genome-wide H3K4me3 profiles. However, the loci affected by ATX3/4/5 are largely distinct from those affected by ATXR3/SDG2, indicating that ATX3/4/5 act in a pathway separate from ATXR3/SDG2.

RESULTS
ATX3, ATX4, and ATX5 Exhibit Common Evolutionary Origin, Identical Domain Structure, and Similar Expression Patterns
The five Arabidopsis TRX genes have been shown to form two subfamilies, one containing ATX1 and ATX2 and the other composed of ATX3, ATX4, and ATX5 (Avramova, 2009). To examine the evolutionary origin of these genes, we reconstructed a phylogeny using homologs of ATX genes from rice (Oryza sativa), maize (Zea mays), and Physcomitrella patens (Fig. 1A). Rice has one putative ATX1/2 homolog (LOC_Os09g04890) and two putative ATX3/4/5 homologs (LOC_Os01g46700 and LOC_Os01g11952; Ng et al., 2007). The maize genome has one putative ATX1/2 homolog (GRMZM2G013794) and three putative ATX3/4/5 homologs (GRMZM2G372928, GRMZM2G085266, and GRMZM2G170412; Springer et al., 2003). In Physcomitrella, we identified three ATX1/2 homologs (XP_001766115, XP_001767466, and XP_001780587) and a single ATX3/4/5 homolog (XP_001777592) using BLAST searches. The phylogeny showed that the two plant ATX subfamilies originated early in plant evolution, as both groups are already present in Physcomitrella (Fig. 1A). However, the paralogs within each subfamily arose more recently and independently in different plant lineages.
The main feature distinguishing the two ATX subfamilies is that members of the ATX1/2 subfamily contain a domain associated with SET in trithorax (of either the F/Y-rich N terminus or F/Y-rich C terminus type), whereas members of the ATX3/4/5 subfamily lack this domain and instead contain an additional plant homeodomain (PHD) finger (Fig. 1B). The ATX3/4/5 domain architecture and its amino acid sequence are conserved in all three Arabidopsis proteins (Fig. 1B; Supplemental Fig. S1), suggesting that they could exhibit similar biochemical activities.
To begin investigating the functional properties of ATX3, ATX4, and ATX5, we first examined the expression patterns of the corresponding genes. To do this, we generated transgenic plants carrying the promoters of the three genes fused to the GUS coding region. The three transgenes exhibited similar expression patterns, although ATX5:GUS showed stronger overall expression than ATX3:GUS and ATX4:GUS (Fig. 1C). All three transgenes were expressed in cotyledons, leaves, and hypocotyls, but not in roots (Fig. 1C). Strong expression was also observed in vascular tissues and trichomes (Fig. 1C). In reproductive tissues, expression was detected in sepals, petals, anthers, filaments, styles, and stigmas, but not in mature pollen (Fig. 1C). To verify these expression patterns, we conducted real-time quantitative PCR (RT-qPCR). Consistent with the GUS results, transcripts of ATX3, ATX4, and ATX5 were present in seedlings, stems, leaves, green buds, and open flowers, whereas only low levels were detected in roots (Fig. 1D).
Taken together, our results indicate that Arabidopsis ATX3, ATX4, and ATX5 are phylogenetically closely related and share the same protein domain structure and similar tissue-specific expression patterns, suggesting that the three genes may have similar functions. To test whether ATX3, ATX4, and ATX5 are functionally redundant, we conducted mutant analyses. We obtained two T-DNA insertion mutants for each of the three ATX genes from various Arabidopsis mutant resources (see "Materials and Methods") and verified the insertion sites by sequencing their flanking regions (Fig. 2A). RT-PCR did not detect transcripts of the disrupted genes in any of the six mutants (Supplemental Fig. S2), suggesting that the mutations are null. However, none of the single mutants exhibited a discernible vegetative or reproductive phenotype under normal growth conditions. Double mutants in all three combinations (atx3/atx4, atx3/atx5, and atx4/atx5) likewise showed no apparent phenotype. Triple mutants, by contrast, displayed several strong phenotypes. We generated triple mutants from all eight possible combinations of atx mutant alleles; because all combinations exhibited nearly identical phenotypes, we focused on the atx3-1/atx4-1/atx5-1 triple mutant for further phenotypic characterization.
Figure 1. Phylogenetic relationship, protein domain structures, and expression patterns of Arabidopsis ATX3, ATX4, and ATX5 genes. A, Bayesian phylogeny reconstruction of ATX homologs in Arabidopsis, maize, rice, and Physcomitrella patens. Numbers next to branches indicate posterior probability values. The scale indicates the number of substitutions per site. Protein sequences were aligned using ClustalX (Jeanmougin et al., 1998), and phylogeny reconstruction was conducted using MrBayes (Huelsenbeck et al., 2001). B, Domain organization of Arabidopsis TRX proteins. Protein structure was analyzed using NCBI-CD and SMART searches. Abbreviations: PWWP, domain named after a conserved Pro-Trp-Trp-Pro motif; FYRN, F/Y-rich N terminus; FYRC, F/Y-rich C terminus; PHD, plant homeodomain. C, Analysis of spatial and temporal expression patterns of ATX3 (top), ATX4 (middle), and ATX5 (bottom) using promoter-GUS fusions. Bars = 1 mm. D, RT-qPCR analysis of ATX3, ATX4, and ATX5 expression. Expression levels relative to that of the ACTIN2 gene are presented with SEs. Values are means of three independent experiments.
atx3-1/atx4-1/atx5-1 triple mutant plants grew normally at early seedling stages (1 to 7 d; Fig. 2B), but obvious abnormalities appeared at 2 weeks of growth. In particular, leaves of 2-week-old seedlings were much smaller than those of wild-type plants (Fig. 2C). Furthermore, growth was drastically retarded throughout the vegetative and reproductive stages, resulting in dwarf plants with small leaf rosettes (Fig. 2, D and E). Flowering time was not significantly affected in mutant plants. However, seed set was dramatically reduced: mutant plants produced an average of 21 seeds per silique, compared with 55 seeds per silique in the wild type (n = 50; Fig. 2, F and G).
To determine the cause of the reduced seed set, we examined flower development in the atx3-1/atx4-1/atx5-1 mutant in detail. Stamen development was significantly delayed starting at stage 12 (Fig. 2H): at that point, wild-type anthers had matured and were yellowish, whereas anthers in the triple mutant remained green. From stage 13 onward, mutant stamens were also much shorter than wild-type stamens, which prevented pollen release from the anther sacs onto the stigma (Fig. 2H). However, aniline blue staining showed that pollen grains that did land and germinate on the stigma produced normally growing pollen tubes (Fig. 2I). After pollen release, anthers in the mutant were also slow to senesce (Fig. 2H). Moreover, mutant anthers were smaller than wild-type anthers and contained aborted pollen grains (Supplemental Fig. S3A), although most mature pollen grains developed normally (Supplemental Fig. S3B). The gynoecium in mutant plants appeared normal (Supplemental Fig. S3C). These observations implicate defects in anther development as the cause of the reduced seed set in atx3/4/5 mutant plants.
Taken together, our mutant analyses indicate that ATX3, ATX4, and ATX5 are redundantly required for both vegetative and reproductive development.

Disruption of ATX3/4/5 Causes Decreases of H3K4 Di- and Trimethylation at Sites throughout the Genome
To study the biochemical functions of ATX3, ATX4, and ATX5, we first examined their activity in vitro by expressing the C-terminal, SET-domain-containing regions of the corresponding proteins in E. coli, strictly following the method previously used to produce recombinant SDG2 protein (Guo et al., 2010). However, despite obtaining sufficient protein quantities, we were unable to detect histone methyltransferase (HMTase) activity using recombinant mouse H3 histones as substrate (Supplemental Fig. S4). We therefore reasoned that the proteins may require additional factors for HMTase activity, such as interacting proteins or posttranslational modifications.
We then investigated the activity of the three proteins in planta, using western blots to compare the overall levels of several histone modifications (H3K4me1, H3K4me2, H3K4me3, H3K9me2, H3K27me3, and H3K36me3) in wild-type and atx3/atx4/atx5 triple-mutant plants. Levels of H3K4me2 and H3K4me3 were consistently lower in the triple mutant than in wild type (Fig. 3A). Quantification of the blots using ImageJ software (https://imagej.nih.gov/ij/) further revealed that H3K4me2 and H3K4me3 were each reduced by ~25%, a statistically significant decrease (Fig. 3B). In contrast, H3K4me1, H3K9me2, H3K27me3, and H3K36me3 were not affected. These data suggest that ATX3/4/5 specifically catalyze H3K4 di- and trimethylation.
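The ~25% figure is a ratio of normalized band intensities. As a minimal sketch, assuming each modification band is normalized to a loading-control band (total H3 is assumed here; the text does not name the control) and that normalized values are averaged across replicates, the reduction could be computed as follows. The function name `relative_reduction` and all intensity values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_reduction(wt_mark: Sequence[float], wt_ctrl: Sequence[float],
                       mut_mark: Sequence[float], mut_ctrl: Sequence[float]) -> float:
    """Percent reduction of a histone mark in the mutant relative to wild type.

    Each argument holds band intensities (e.g. exported from ImageJ), one value
    per replicate. Within each replicate the mark is first normalized to the
    loading-control band (assumed here to be total H3).
    """
    wt_norm = [m / c for m, c in zip(wt_mark, wt_ctrl)]
    mut_norm = [m / c for m, c in zip(mut_mark, mut_ctrl)]
    return 100.0 * (1.0 - mean(mut_norm) / mean(wt_norm))


# Hypothetical intensities for three replicates (arbitrary units).
wt_k4me3, wt_h3 = [1000.0, 1100.0, 900.0], [500.0, 550.0, 450.0]
mut_k4me3, mut_h3 = [750.0, 825.0, 675.0], [500.0, 550.0, 450.0]
print(round(relative_reduction(wt_k4me3, wt_h3, mut_k4me3, mut_h3), 1))  # 25.0
```

A statistical test on the per-replicate normalized values would then address significance; the sketch stops at the ratio itself.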
To learn more about the properties of ATX3, ATX4, and ATX5, we examined the distribution of H3K4me2 and H3K4me3 marks in wild-type and atx3/4/5 mutant plants using chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq; Supplemental Table S1). In wild-type plants, we identified 20,585 H3K4me2 sites (Supplemental Table S2) and 14,496 H3K4me3 sites (Supplemental Table S3). In atx3/4/5 mutant plants, the numbers of H3K4me2 and H3K4me3 sites were substantially lower (Supplemental Tables S4 and S5). As is typical for H3K4me2 and H3K4me3 (Zhang et al., 2009), the strongest enrichment occurred around gene transcription start sites (TSSs) in both wild-type and mutant plants (Fig. 3, C and D). Although the overall distributions of H3K4me2 and H3K4me3 largely overlapped between wild-type and atx3/4/5 mutant plants, a simulation test (see "Materials and Methods") showed that the signal intensity around peak summits near TSSs was significantly decreased in the mutant for both marks (Student's t test, P < 2.2e-16).
In the atx3/4/5 mutant, we identified 2409 sites with decreased H3K4me2 and 2375 sites with decreased H3K4me3, representing 11.7% and 16.4% of all wild-type H3K4me2 and H3K4me3 sites, respectively. These data demonstrate that ATX3/4/5 shape the landscape of both H3K4me2 and H3K4me3 throughout the genome (Fig. 3E; Supplemental Fig. S5).
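The two percentages follow directly from the site counts given above. A quick check using only those reported numbers (the helper name `percent_of` is ours):

```python
def percent_of(affected: int, total: int) -> float:
    """Share of wild-type sites that lose signal in the mutant, in percent."""
    return 100.0 * affected / total


wt_sites = {"H3K4me2": 20585, "H3K4me3": 14496}  # Supplemental Tables S2 and S3
decreased = {"H3K4me2": 2409, "H3K4me3": 2375}   # sites decreased in atx3/4/5

for mark, total in wt_sites.items():
    print(mark, f"{percent_of(decreased[mark], total):.1f}%")
# H3K4me2 11.7%
# H3K4me3 16.4%
```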
The sites affected in the atx3/4/5 mutant could be grouped into three classes. Class I (n = 634) contains sites exhibiting a decrease in both H3K4me2 and H3K4me3 … Table S8). The existence of class II and III sites is intriguing: because H3K4me2 serves as the substrate for producing H3K4me3, changes in H3K4me3 would be expected to accompany changes in H3K4me2. One possible explanation is that most of these sites already lack H3K4me2 or H3K4me3 in wild type (Supplemental Fig. S6, B and D). However, as many as 26% of class II sites (Supplemental Fig. S6C) and 22% of class III sites (Supplemental Fig. S6E) carried both H3K4me2 and H3K4me3 in wild type. This finding suggests that other HMTases might act on the same chromosomal sites independently of ATX3/4/5.
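Grouping affected sites by which mark they lose amounts to an interval-overlap test between the two sets of decreased regions. The sketch below assumes regions are half-open (chromosome, start, end) tuples and that any overlap of at least one base counts as "the same site"; these conventions, the category labels, and the example coordinates are hypothetical. The `both` category corresponds to class I as defined in the text; the other two hold sites that lost only one of the two marks.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_sites(me2_down: List[Interval],
                   me3_down: List[Interval]) -> Dict[str, List[Interval]]:
    """Group regions with decreased H3K4me2 and/or H3K4me3.

    'both'     : H3K4me2-decreased regions overlapping an H3K4me3-decreased region
    'me2_only' : H3K4me2-decreased regions with no H3K4me3 counterpart
    'me3_only' : H3K4me3-decreased regions with no H3K4me2 counterpart
    """
    both, me2_only = [], []
    for r in me2_down:
        if any(overlaps(r, s) for s in me3_down):
            both.append(r)
        else:
            me2_only.append(r)
    me3_only = [s for s in me3_down
                if not any(overlaps(s, r) for r in me2_down)]
    return {"both": both, "me2_only": me2_only, "me3_only": me3_only}


# Hypothetical decreased regions, for illustration only.
me2 = [("Chr1", 100, 600), ("Chr1", 5000, 5400), ("Chr2", 200, 700)]
me3 = [("Chr1", 300, 500), ("Chr2", 900, 1200)]
classes = classify_sites(me2, me3)
print({k: len(v) for k, v in classes.items()})
# {'both': 1, 'me2_only': 2, 'me3_only': 1}
```

The pairwise loop is quadratic; for genome-scale peak sets a sorted sweep or an interval tree would be the usual choice.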
To investigate how ATX3, ATX4, and ATX5 act redundantly to regulate H3K4me2/3 levels at specific regions, we selected six loci from the list of genomic sites affected in the atx3/4/5 triple mutant (Supplemental Table S6) and used ChIP-qPCR to measure H3K4me2 and H3K4me3 levels in single and double mutants compared with the triple mutant. Only mutants carrying atx5, either as a single mutation or in double-mutant combinations, showed a decrease in H3K4me2/3, although this decrease was significantly smaller than that in the triple mutant (Supplemental Fig. S7). This effect was seen at four of the six loci. Notably, the degrees of H3K4me2 and H3K4me3 reduction in the triple mutant measured by ChIP-qPCR were similar to those observed by ChIP-seq. Altogether, these results suggest that ATX3, ATX4, and ATX5 act redundantly, with ATX5 playing a larger role than ATX3 and ATX4.

ATX3/4/5 Activity and Transcription
The distribution of H3K4me2 and H3K4me3 generally reflects the distribution of genes (Zhang et al., 2009). The H3K4me2 and H3K4me3 regions affected in the atx3/4/5 triple mutant overlapped 2542 and 2574 genes, respectively. Because some sites overlap two genes located too close together to be separated, these gene numbers are larger than the corresponding numbers of H3K4me2 and H3K4me3 sites. To investigate whether any functional groups of genes were disproportionately targeted by ATX3/4/5, we performed a gene ontology (GO) category analysis (Du et al., 2010). Most GO categories were equally affected, suggesting that ATX3/4/5 targets are not biased toward any functional class of genes (Supplemental Fig. S8).
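Why the gene count can exceed the site count is easiest to see with a toy overlap calculation. The sketch assumes half-open (chromosome, start, end) coordinates and counts a gene as hit when at least one base overlaps a site; the gene IDs and coordinates are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def genes_hit(sites: List[Interval], genes: Dict[str, Interval]) -> Set[str]:
    """IDs of all genes overlapped by at least one site."""
    hit = set()
    for chrom, start, end in sites:
        for gene_id, (g_chrom, g_start, g_end) in genes.items():
            if chrom == g_chrom and start < g_end and g_start < end:
                hit.add(gene_id)
    return hit


# Hypothetical layout: GENE_A and GENE_B are separated by a short gap
# that a single broad peak spans.
genes = {
    "GENE_A": ("Chr1", 0, 1000),
    "GENE_B": ("Chr1", 1200, 3000),
    "GENE_C": ("Chr1", 4800, 6000),
}
sites = [("Chr1", 900, 1300), ("Chr1", 5000, 5500)]
print(len(sites), "sites ->", len(genes_hit(sites, genes)), "genes")
# 2 sites -> 3 genes
```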
To investigate the effect of ATX3/4/5-mediated chromatin modification on gene transcription, we conducted RNA-seq analysis of wild-type and atx3/4/5 triple-mutant plants. We identified 1946 differentially expressed genes, of which 932 were up-regulated in the atx3/4/5 mutant (Supplemental Table S9) and 1014 were down-regulated (Supplemental Table S10). Interestingly, decreases in H3K4me2/3 at individual gene loci were not always accompanied by corresponding reductions in transcript levels (Fig. 3F), and vice versa (Supplemental Fig. S9). We confirmed these findings using H3K4me2 and H3K4me3 ChIP-qPCR and RT-qPCR on 15 genes selected to represent all three classes of ATX3/4/5 target sites (Supplemental Fig. S10).
We next examined how ATX3/4/5 relate to the other ATX proteins by investigating the effects of atx1 and atx2 mutations on H3K4 di- and trimethylation at six genomic sites that we had found to be regulated by ATX3/4/5 (Supplemental Table S6). ChIP-qPCR analyses showed that atx1 caused a decrease in H3K4me3 at five of the six sites (Fig. 4C). In contrast, atx2 showed lower H3K4me2 at only one site. These data suggest that most genomic sites targeted by ATX3/4/5 are also targets of ATX1, but not of ATX2. Consequently, ATX1 and ATX3/4/5 likely function in the same pathway, whereas ATX2 acts separately.
ATX3/4/5-Affected Genomic Sites Are Distinct from Those Controlled by SDG2
Similar to ATX3/4/5, ATXR3/SDG2 also controls H3K4me3 at thousands of sites throughout the genome (Berr et al., 2010; Guo et al., 2010). To investigate whether ATX3/4/5 act in the same way as SDG2, we examined H3K4me3 patterns in the sdg2 mutant using ChIP-seq. We found 6723 regions with reduced H3K4me3 levels in sdg2 (Fig. 5A), constituting 46.4% of all H3K4me3 sites in the genome (Supplemental Tables S11 and S12). This number is substantially larger than the number of H3K4me3 sites controlled by ATX3/4/5. Moreover, the ATX3/4/5-controlled and SDG2-controlled H3K4me3 sites overlapped only partially (Fig. 5A; Supplemental Fig. S12, A-C).
To examine what distinguishes ATX3/4/5 and SDG2 target sites, we compared their characteristics. H3K4me3 sites targeted by ATX3/4/5 exhibited considerably lower levels of gene-body CG methylation (Fig. 5B) and higher levels of H3K27me3 (Fig. 5C) than SDG2-targeted sites. In addition, the H3K4me3 sites targeted by ATX3/4/5 were markedly smaller than those targeted by SDG2, with reduced peak width, height, and intensity (Fig. 5, D-F). Moreover, genes containing ATX3/4/5 target sites showed, on average, lower expression levels than genes overlapping SDG2-targeted H3K4me3 sites (Fig. 5G). These findings further indicate that ATX3/4/5 and SDG2 function in different ways. The same analysis performed on ATX3/4/5-targeted H3K4me2 sites showed a pattern similar to that of H3K4me3, except that the peak height and intensity of H3K4me2 were higher at ATX3/4/5-targeted sites than across all H3K4me2 sites in the genome (Supplemental Fig. S13).

DISCUSSION
To elucidate the network controlling chromatin modification by SET domain proteins, we characterized three Arabidopsis H3K4 methyltransferases, ATX3, ATX4, and ATX5. Our analyses revealed that the three proteins facilitate H3K4 di- as well as trimethylation at a large number of genomic sites. They act in the same genetic pathway, which is required for both vegetative and reproductive development and is distinct from the previously identified pathway for genome-wide H3K4 trimethylation defined by ATXR3/SDG2.
All five ATX proteins are homologs of Trithorax, a large multifunctional protein containing a SET domain, PHD fingers, and PWWP domains (Fig. 1B). The SET domain of Trithorax has been shown to have Lys methyltransferase activity with substrate specificity for histone H3K4 (Smith et al., 2004; Tie et al., 2014). However, how the substrate specificity of these Trithorax proteins is conferred remains largely unknown.
Our data further show that the action of ATX3, ATX4, and ATX5 has a substantial effect on the transcriptome. Interestingly, not all affected transcripts contain ATX3/4/5-dependent H3K4me2/3 sites. A simple explanation could be that the genes lacking these sites represent downstream effects of genes directly targeted by ATX3/4/5. Alternatively, not all changes in gene expression are necessarily accompanied by alterations in H3K4me2/3 marks. It is also possible that other chromatin features are altered in the atx3/4/5 mutant plants, leading to secondary effects on gene expression.

Functional Redundancies of ATX Genes
Our genetic experiments showed that ATX3, ATX4, and ATX5 act redundantly. ChIP-qPCR analyses further suggested that they operate in a dosage-dependent manner and that their effects are additive, although ATX5 has a stronger effect than ATX3 and ATX4. This stronger effect could result from ATX5 being expressed at a higher level than ATX3 and ATX4 (Fig. 1, C and D). We note, however, that our results do not exclude the possibility that at other genomic sites the relationship among ATX3, ATX4, and ATX5 differs from that at the six sites we examined.
We also examined the functional overlap between ATX3/4/5 and proteins from the other clade of the ATX family tree. Phenotypic analyses and ChIP-qPCR experiments indicated that ATX2 acts independently of ATX3/4/5 in regulating H3K4me2. In contrast, our evidence suggests that ATX1 function partially overlaps with that of ATX3/4/5 in controlling H3K4me3. ATX1 could act in the same complex as ATX3/4/5; alternatively, ATX1 and ATX3/4/5 may target the same genomic sites. The latter possibility seems more likely because the functions of ATX1 and ATX3/4/5 overlap only partially, as evidenced in particular by the milder phenotype of the atx1 mutant compared with atx3/4/5.
Our biochemical and genetic studies also demonstrated that ATX3/4/5 act in a pathway different from the one defined by SDG2. First, ATX3/4/5 and SDG2 affect largely different sets of genomic target sites. Second, the ATX3/4/5-controlled H3K4me3 sites show distinct features compared with the sites targeted by SDG2, which suggests that the activities of ATX3/4/5 and SDG2 are controlled by different mechanisms. In particular, ATX3/4/5-targeted genic loci have considerably lower gene-body CG methylation and higher H3K27me3 levels than SDG2 targets, raising the question of why and how such characteristics are conferred. In general, gene transcription is known to correlate positively with gene-body methylation and negatively with the H3K27me3 mark (Zhang et al., 2006; Kouzarides, 2007; Wang et al., 2015; Yang et al., 2016). Indeed, we found that genes at ATX3/4/5-targeted loci are expressed at significantly lower levels than SDG2 targets. Therefore, ATX3/4/5 and SDG2 appear to target two classes of genic sites with different transcriptional activity. The next intriguing question is how this association is established. Song et al. (2015) recently showed that an H3K4 methyltransferase complex can directly interact with a specific transcription factor to regulate gene expression in Arabidopsis. We therefore speculate that ATX3/4/5 and SDG2 may interact with different transcription factors to facilitate gene expression and direct the deposition of the H3K4me3 mark at specific loci. In addition, two recent studies in mammals found that the width of H3K4me3 sites is associated with specific cell identities and transcriptional profiles (Benayoun et al., 2014; Lei et al., 2015). It is therefore likely that ATX3/4/5 and SDG2 diverged to modulate transcriptional programs in different tissues or at different developmental stages.
SDG2 may play a larger role than ATX3/4/5 in facilitating transcription of genes needed during leaf development, when tissue samples were collected for transcriptome analyses in this study. In other tissues, such as flowers, where ATX3/4/5 are more abundantly expressed than SDG2, ATX3/4/5 may be more critical. It would be very interesting to compare transcriptome profiles and assess the relative requirement for ATX3/4/5 versus SDG2 in several different tissue types. Finally, our in vitro experiments, while not conclusive, suggest that ATX3/4/5 may require different cofactors or posttranslational modifications than SDG2.
CONCLUSION
We examined three previously uncharacterized SET-domain-containing Arabidopsis histone methyltransferases, ATX3, ATX4, and ATX5. The three proteins share an ancient origin, predating the split between monocots and dicots, but have diverged from each other more recently. They also have similar domain structures and expression patterns. They are redundantly required for plant vegetative and reproductive development, and their loss results in profound mutant defects. Like SDG2, but unlike most other characterized SET domain proteins, they act genome-wide. Interestingly, the genomic sites affected by ATX3, ATX4, and ATX5 are largely different from those affected by SDG2. This observation suggests that ATX3, ATX4, and ATX5 define a novel major pathway for H3K4 di- and trimethylation that is distinct from the SDG2-controlled pathway. Our characterization of ATX3, ATX4, and ATX5 completes the elucidation of ATX genes in Arabidopsis. Because the ATX gene family is widely conserved across the plant kingdom, this study should enhance understanding of the occurrence, distribution, and function of H3K4me2 and H3K4me3 in plants.

Reproductive Development Analyses
Flower developmental stages were identified according to Smyth et al. (1990) and Alvarez-Buylla et al. (2010). To examine pollen germination, pollen tubes were stained with aniline blue as described by Wang et al. (2011). 4′,6-Diamidino-2-phenylindole staining was performed as described by McCormick (2004). Pollen viability staining was performed as described by Alexander (1969). Paraffin sectioning was performed as described by Dou et al. (2011).

Generating and Analyzing Transgenic Promoter-GUS Fusion Arabidopsis (Arabidopsis thaliana) Plants
Promoter regions of ATX3 (−1296 to −1 bp), ATX4 (−1916 to −20 bp), and ATX5 (−2150 to −1 bp) were PCR-amplified from genomic DNA using primers ATX3-1L/1R, ATX4-1L/1R, and ATX5-1L/1R, respectively, and cloned into the pGEM-T vector (Promega). The resulting inserts were confirmed by sequencing and subcloned into pCAMBIA1305 using the SalI and BamHI restriction sites. Agrobacterium tumefaciens strain GV3101 was used to introduce the constructs into Arabidopsis plants by floral dipping (Clough and Bent, 1998). Transgenic plants were selected using 25 µg/mL hygromycin. Histochemical GUS assays were performed on 20 independent T2 transgenic plants for each construct as previously described (Jefferson et al., 1987).

RNA Extraction and RT-PCR
RNA was extracted from 3-week-old rosette leaves using the RNeasy Plant Mini kit (Qiagen). cDNA was synthesized from 5 µg of total RNA using the ProtoScript First Strand cDNA Synthesis Kit (New England Biolabs) following the manufacturer's instructions. RT-PCR reactions were performed as previously described using the primers listed in Supplemental Table S13. Transcript levels were estimated using the comparative Ct (threshold cycle) method, with ACTIN2 as an internal control for data normalization. Data shown in Figure 1D are averages of three independent experiments.
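The comparative Ct calculation can be sketched as follows. This assumes the standard 2^-ΔCt / 2^-ΔΔCt formulation with roughly 100% amplification efficiency for both the target and ACTIN2 primers; the text does not state efficiency corrections, and the function names and Ct values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def relative_to_reference(ct_target: float, ct_ref: float) -> float:
    """2^-dCt: target abundance relative to the reference gene (e.g. ACTIN2)."""
    return 2.0 ** -(ct_target - ct_ref)


def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """2^-ddCt: change in a sample relative to a calibrator (e.g. wild type)."""
    d_sample = ct_target_sample - ct_ref_sample
    d_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(d_sample - d_calibrator)


def mean_se(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error across independent experiments."""
    return mean(values), stdev(values) / sqrt(len(values))


print(relative_to_reference(26.0, 20.0))         # 0.015625
print(fold_change_ddct(27.0, 20.0, 26.0, 20.0))  # 0.5
```

Expression "relative to ACTIN2", as plotted in Figure 1D, corresponds to the first function; `mean_se` gives the mean and SE over the three experiments.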

ChIP-seq Library Construction and Sequencing
ChIP was performed as previously described (Xue et al., 2015). Illumina libraries were constructed using the ChIP-seq DNA Sample Prep Kit (Illumina) according to the manufacturer's protocol. Libraries were sequenced using the Single-End Cluster Generation and 100-cycle Sequencing Kits (Illumina) on the Illumina HiSeq 2500 sequencing system according to the manufacturer's instructions.

Processing and Mapping Illumina Reads to the Genome
Base calling and read quality control were performed following the standard Illumina protocol. Reads passing quality control were aligned to the Arabidopsis genome (TAIR10; www.arabidopsis.org) using Bowtie v1.0.1 allowing for no more than two mismatches (Langmead et al., 2009). Only reads mapped uniquely to the genome were used for further analysis. SAMtools v1.2 was used to convert the mapped reads into BAM files (Li et al., 2009).

Peak Calling
To identify peaks in ChIP-seq datasets, MACS (version 1.4.2) was used with matching numbers of reads in treatment and control (Zhang et al., 2008). Peak calling was performed with the following parameters: bandwidth = 150 nucleotides; m-fold = 5 to 50; P value cutoff = 1e-05. ATX3/4/5- or SDG2-dependent sites were identified in two filtering steps. First, H3K4me3 ChIP-seq peaks were called using the wild-type sample as treatment and the corresponding mutant sample as control. Next, peaks were called using the wild-type sample as treatment and the corresponding input sample as control.
Regions overlapping between the two steps were defined as ATX3/4/5- or SDG2-dependent sites. This procedure was devised to avoid the effects of occasional sequence coverage fluctuations, which affect between 0.2% and 5.8% of all reads (Supplemental Fig. S14; Supplemental Tables S6-S8 and S12).
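The two-step filter reduces to intersecting two peak lists. A minimal sketch, assuming MACS peaks have been parsed into half-open (chromosome, start, end) tuples and that each retained site is clipped to the overlap of the two calls (keeping the whole wild-type-versus-mutant peak would be an equally plausible reading of the text). The function name and coordinates are hypothetical; the Chr2 call mimics a coverage-fluctuation artifact that shows no enrichment over input.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def dependent_sites(peaks_vs_mutant: List[Interval],
                    peaks_vs_input: List[Interval]) -> List[Interval]:
    """Keep only regions supported by both peak-calling steps.

    peaks_vs_mutant: peaks called with wild type as treatment, mutant as control.
    peaks_vs_input:  peaks called with wild type as treatment, input as control.
    Each retained region is clipped to the overlap of the two calls.
    """
    kept = []
    for chrom, s1, e1 in peaks_vs_mutant:
        for chrom2, s2, e2 in peaks_vs_input:
            if chrom == chrom2 and s1 < e2 and s2 < e1:
                kept.append((chrom, max(s1, s2), min(e1, e2)))
    return sorted(kept)


vs_mutant = [("Chr1", 100, 400), ("Chr2", 1000, 1300)]
vs_input = [("Chr1", 250, 600), ("Chr3", 50, 300)]
print(dependent_sites(vs_mutant, vs_input))  # [('Chr1', 250, 400)]
```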

Simulation Test
Differences in H3K4me2 and H3K4me3 around TSSs between wild-type and atx3/4/5 mutant plants were quantified using a simulation method (King et al., 2000). Eighty percent of all H3K4me2/3 sites from wild-type and mutant plants were randomly sampled, and the average read density within 50 bp up- and downstream of the peak summit was calculated; this was repeated 1000 times, and significance was determined by a paired t test.
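One reading of this simulation is sketched below. It assumes per-site summit densities are available for the same sites in both samples, so that wt[i] and mut[i] are paired, and that each iteration draws the same random 80% of sites from both. To stay dependency-free, the p-value uses a normal approximation to the t distribution, which is close when there are ~1000 paired values; `scipy.stats.ttest_rel` would give the exact t-based value. All function names and data are hypothetical.

```python
import math
import random
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def resample_means(wt: Sequence[float], mut: Sequence[float],
                   fraction: float = 0.8, n_iter: int = 1000,
                   seed: int = 0) -> Tuple[List[float], List[float]]:
    """Average summit read density over repeated random subsets of sites.

    wt[i] and mut[i] are mean read densities within +/-50 bp of the summit of
    the same site i (pairing is an assumption of this sketch).
    """
    rng = random.Random(seed)
    k = int(len(wt) * fraction)
    wt_means, mut_means = [], []
    for _ in range(n_iter):
        idx = rng.sample(range(len(wt)), k)
        wt_means.append(sum(wt[i] for i in idx) / k)
        mut_means.append(sum(mut[i] for i in idx) / k)
    return wt_means, mut_means


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired t statistic and a two-sided p-value (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Hypothetical densities for 200 sites, mutant ~25% lower with noise.
gen = random.Random(1)
wt = [gen.uniform(5, 50) for _ in range(200)]
mut = [x * 0.75 + gen.gauss(0, 1) for x in wt]
wt_means, mut_means = resample_means(wt, mut)
t, p = paired_t(wt_means, mut_means)
print(t > 0, p < 0.05)  # True True
```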

RNA-seq Analyses
Polyadenylated mRNA was purified using the Dynabeads mRNA Purification kit (Invitrogen), and libraries were constructed with the NEBNext mRNA Library Prep Master Mix Set (New England Biolabs) following the manufacturer's instructions. Sequencing was performed using the Paired-End Cluster Generation kits with 125 cycles on the HiSeq 2500 sequencing platform. Quality-controlled reads were aligned to the Arabidopsis genome using TopHat-2.0.10 (Trapnell et al., …, 2012). Gene expression levels were normalized to fragments per kilobase of transcript per million mapped reads using Cufflinks-2.2.1 with default settings (Trapnell et al., 2012). Genes showing at least a 2-fold change with FDR < 0.05 between wild type and mutant were considered differentially expressed (Supplemental Tables S9 and S10).
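The normalization and the differential-expression cutoff can be written out as below. FPKM follows its usual definition; the classification applies the stated thresholds (at least 2-fold in either direction, FDR < 0.05) to per-gene values such as the FPKMs and q-values Cuffdiff reports. The pseudocount of 1 that guards against zero FPKM, the function names, and the example numbers are assumptions of this sketch, not part of the described pipeline.

```python
import math


def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def classify_change(fpkm_wt: float, fpkm_mut: float, fdr: float,
                    min_fold: float = 2.0, max_fdr: float = 0.05,
                    pseudo: float = 1.0):
    """Return 'up', 'down' or None for one gene (mutant relative to wild type).

    A gene counts as differentially expressed when FDR < max_fdr and the fold
    change is at least min_fold in either direction. The pseudocount avoids
    division by zero and is an assumption of this sketch.
    """
    if fdr >= max_fdr:
        return None
    log2fc = math.log2((fpkm_mut + pseudo) / (fpkm_wt + pseudo))
    threshold = math.log2(min_fold)
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return None


print(fpkm(500, 2000, 20_000_000))        # 12.5
print(classify_change(10.0, 25.0, 0.01))  # up
print(classify_change(10.0, 4.0, 0.001))  # down
print(classify_change(10.0, 30.0, 0.2))   # None (FDR too high)
```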

DNA Methylation Analysis
The BS-seq datasets were downloaded from a published study and analyzed as previously described (Tang et al., 2016).
In brief, the data used for the DNA methylation analysis were downloaded from the Gene Expression Omnibus (GEO; accession no. GSE83802). Clean reads were mapped to TAIR10 using BRAT-BW v2.0.1 with parameters -m 2 -i 0 -a 1000. Duplicate reads were removed, and mapped Cs and Ts were counted at each cytosine on the forward and reverse strands of the reference. For metagene plots, a sliding window of 500 nucleotides with a step of 50 nucleotides was used, and methylation levels in the CG, CHG, and CHH contexts were plotted across gene-coding regions plus 2 kb upstream of the TSS and 2 kb downstream of the transcription termination site (TTS).
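The sliding-window step can be sketched for a single region as follows. The sketch assumes per-cytosine counts of C (methylated) and T (unmethylated) calls for one sequence context and computes a read-weighted level, sum(C) / sum(C + T), in each 500-nt window advanced by 50 nt. How windows are aggregated across genes of different lengths is not described in the text and is left out; the function name and counts are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def window_methylation(sites: Iterable[Tuple[int, int, int]],
                       region_start: int, region_end: int,
                       window: int = 500, step: int = 50
                       ) -> List[Tuple[int, Optional[float]]]:
    """Weighted methylation level in sliding windows across one region.

    sites: (position, C reads, T reads) for cytosines of a single context
    (e.g. CG). For a plus-strand gene the region would be
    [gene_start - 2000, gene_end + 2000). Level = sum(C) / sum(C + T);
    None marks windows without coverage.
    """
    sites = sorted(sites)
    profile = []
    start = region_start
    while start + window <= region_end:
        c_reads = t_reads = 0
        for pos, c, t in sites:
            if start <= pos < start + window:
                c_reads += c
                t_reads += t
        total = c_reads + t_reads
        profile.append((start, c_reads / total if total else None))
        start += step
    return profile


demo = [(10, 8, 2), (300, 5, 5), (550, 0, 10)]
print(window_methylation(demo, 0, 600))
# [(0, 0.65), (50, 0.5), (100, 0.25)]
```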

H3K27me3 Analysis
The H3K27me3 datasets were downloaded from GEO (accession no. GSE65329; Cui et al., 2016) and analyzed using the same method as described above for the H3K4me2/3 data. Profiles span 1 kb upstream of the TSS, the gene body, and 1 kb downstream of the TTS.
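Profiles that span flanks plus gene bodies of varying length usually rely on scaling each gene body to a fixed number of bins. The sketch below illustrates that idea for one plus-strand gene, with fixed-width 100-bp bins across each 1-kb flank and 20 equal-fraction bins across the body. The bin numbers and the counting of read positions are assumptions, since the text only says the H3K4me2/3 method was reused; averaging over genes and library-size normalization are not shown.

```python
from typing import Iterable, List


def metagene_counts(read_positions: Iterable[int], tss: int, tts: int,
                    flank: int = 1000, flank_bins: int = 10,
                    body_bins: int = 20) -> List[int]:
    """Read counts in upstream, scaled gene-body and downstream bins.

    Assumes a plus-strand gene with tss < tts. Flanks use fixed-width bins
    (flank / flank_bins bp each); the gene body is divided into body_bins
    equal fractions so that genes of different lengths can be averaged.
    """
    counts = [0] * (2 * flank_bins + body_bins)
    bin_bp = flank / flank_bins
    body_len = tts - tss
    for pos in read_positions:
        if tss - flank <= pos < tss:
            counts[int((pos - (tss - flank)) / bin_bp)] += 1
        elif tss <= pos < tts:
            counts[flank_bins + int((pos - tss) * body_bins / body_len)] += 1
        elif tts <= pos < tts + flank:
            counts[flank_bins + body_bins + int((pos - tts) / bin_bp)] += 1
    return counts


profile = metagene_counts([4500, 5000, 6999, 7500, 9000], tss=5000, tts=7000)
print(len(profile), sum(profile))  # 40 4 (the read at 9000 lies outside)
```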

Accession Numbers
Sequencing datasets are available for download from the NCBI GEO under series GSE73972. A summary of the datasets is provided in Supplemental Table S1.

Supplemental Data
The following supplemental materials are available.
Supplemental Figure S5. IGV images of representative regions selected from all five chromosomes showing altered patterns of H3K4me2 and H3K4me3.
Supplemental Figure S7. Levels of H3K4me2/3 in six representative genes verified using ChIP-qPCR.
Supplemental Figure S9. Lack of correlation between a decrease in gene transcript levels and H3K4me2/3 levels in the atx3/4/5 triple mutant.
Supplemental Figure S11. Quantification of the fresh weight of 4-week-old plants.
Supplemental Figure S14. IGV images showing artifactual H3K4me2/3 peaks prior to the second filtering step of ChIP-seq data.
Supplemental Table S1. Summary of sequencing datasets used in this study.
Supplemental Table S2. Distribution of H3K4me2 sites in wild-type Arabidopsis.
Supplemental Table S3. Distribution of H3K4me3 sites in wild-type Arabidopsis.
Supplemental Table S9. List of genes up-regulated in the atx3/4/5 triple mutant relative to wild type.
Supplemental Table S10. List of genes down-regulated in the atx3/atx4/atx5 triple mutant relative to wild type.
Supplemental Table S11. Distribution of H3K4me3 sites in the sdg2 mutant.
Supplemental Table S13. List of primers used in the study.