Functional analysis of the group 4 late embryogenesis abundant proteins reveals their relevance in the adaptive response during water deficit in Arabidopsis.

Late-Embryogenesis Abundant (LEA) proteins accumulate to high levels during the last stages of seed development, when desiccation tolerance is acquired, and in vegetative and reproductive tissues under water deficit, leading to the hypothesis that these proteins play a role in the adaptation of plants to this stress condition. In this work, we obtained the accumulation patterns of the Arabidopsis (Arabidopsis thaliana) group 4 LEA proteins during different developmental stages and plant organs in response to water deficit. We demonstrate that overexpression of a representative member of this group of proteins confers tolerance to severe drought in Arabidopsis plants. Moreover, we show that deficiency of LEA proteins in this group leads to susceptible phenotypes upon water limitation, during germination, or in mature plants after recovery from severe dehydration. Upon recovery from this stress condition, mutant plants showed a reduced number of floral and axillary buds when compared with wild-type plants. The lack of these proteins also correlates with a reduced seed production under optimal irrigation, supporting a role in fruit and/or seed development. A bioinformatic analysis of group 4 LEA proteins from many plant genera showed that there are two subgroups, originated through ancient gene duplication and a subsequent functional specialization. This study represents, to our knowledge, the first genetic evidence showing that one of the LEA protein groups is directly involved in the adaptive response of higher plants to water deficit, and it provides data indicating that the function of these proteins is not redundant to that of the other LEA proteins.


INTRODUCTION
Water deficit is a common environmental condition that leads to various responses that may help in the adaptation or adjustment of an organism to the stress. It is considered one of the most important environmental stresses influencing plant productivity (Bray, 1997; Morison et al., 2008). The adverse effects of this environmental stress need to be counteracted mainly because of the increasing soil desertification in cultivated and uncultivated regions, which requires plants to tolerate drying periods and elevated salt concentrations in the soil, often accompanied by extreme temperatures. This need, together with the interest in understanding the mechanisms by which plants sense and respond to these environmental cues, is among the most important reasons to study in detail the responses that have been selected in plants to cope with water deficit.
The acquisition of desiccation tolerance during late stages of seed development is correlated with the induction of a set of small, highly hydrophilic proteins called Late Embryogenesis Abundant (LEA) proteins (Dure et al., 1989). These proteins are ubiquitous in plants and, although there are several classifications, we will follow that of Battaglia et al. (2008), where they are classified into seven groups on the basis of sequence similarity. Analysis of the protein sequences in these groups from different plant species defined distinctive motifs within groups (Dure, 1993;Battaglia et al., 2008). The number of members is different for each LEA protein group and varies according to the plant species. Most LEA proteins are hydrophilins, a set of proteins characterized by their biased amino acid composition, rich in glycines, and other small and/or charged residues, and their high hydrophilicity index (Garay-Arroyo et al., 2000).
This amino acid composition promotes their flexible structure in solution, existing mainly as random coils, with the exception of the hydrophobic or atypical LEA proteins (Singh et al., 2005). Moreover, hydrophilic LEA proteins from groups 2, 3 and 4 show a prevalence of typical spectroscopic patterns of intrinsically unstructured proteins (IUP), with the occurrence of transitions from IUP to ordered conformations in the presence of helix-promoting solvents or air drying (McCubbin et al., 1985; Russouw et al., 1995; Eom et al., 1996; Lisse et al., 1996; Ismail et al., 1999; Wolkers et al., 2001; Soulages et al., 2002, 2003; Goyal et al., 2003; Shih et al., 2004; Tolleter et al., 2007). Their high content of water-interacting residues facilitates the scavenging of water molecules, which is of special importance during developmental stages where a programmed desiccation of tissues takes place, as in the dry seed (Dure et al., 1989), or when cells experience changes in their water status (Colmenero-Flores et al., 1999). Remarkably, there is also an elevated induction in the expression of these proteins in vegetative
In this work we focus on the study of the group 4 LEA proteins of Arabidopsis thaliana. With only three genes in the genome (AtLEA4-1, AtLEA4-2 and AtLEA4-5), the AtLEA 4 group is one of the smallest groups in Arabidopsis (Hundertmark and Hincha, 2008; Battaglia et al., 2008), which makes it accessible for a 'loss-of-function' analysis. The LEA 4 proteins are characterized by a high content of A, T and G amino acid residues, the latter highly represented in unstructured proteins. They have a conserved N-terminal domain of 70-80 residues, predicted to form amphipathic α-helices, and a less conserved C-terminal region with variable size and random coil structure (Dure, 1993). Like other LEA proteins, the LEA 4 group accumulates to high levels in all embryo tissues of dry seeds (Roberts et al., 1993).
Recently, Wise (2002) performed a bioinformatics analysis and questioned whether group 4 LEA proteins constitute a group distinct from group 3. The algorithm used the over- or under-representation of particular amino acids within small motifs in the protein, giving rise to a different classification for these proteins (Wise, 2003). In support of the original classification proposed by Dure et al. (1989), and because of the high sequence conservation within this group in plants, we present genetic and functional evidence that group 4 of LEA proteins is indeed a distinct group conserved in the plant kingdom. The results reported here show that over-expression of one of the AtLEA 4 proteins in Arabidopsis leads to plants that endure severe water deficit better than their wild type counterparts, and that a reduction in the accumulation levels of these proteins leads to plants that are more sensitive to water-limiting conditions than the wild type. Altogether, these data constitute the first direct evidence indicating that LEA 4 proteins are involved in the adaptive response of vascular plants to water deficit.

Response to Water Deficit Treatments
To gain insight into the function of LEA proteins in the adaptation of vascular plants to water deficit, we carried out a functional analysis of the Arabidopsis LEA 4 protein family, because it is one of the three LEA protein groups with the fewest members (two for group 1, three for group 4, and three for group 6) (Battaglia et al., 2008; Hundertmark and Hincha, 2008). These data were confirmed with a BLASTp analysis using a reported LEA 4 homologue from cotton as query (LEA D-113, NCBI Acc. No. M19406); it retrieved three proteins that make up the Arabidopsis LEA 4 family, encoded in loci At1g32560, At2g35300 and At5g06760. In reference to their chromosomal location, we named the corresponding proteins AtLEA4-1, AtLEA4-2 and AtLEA4-5 (Supplemental Fig. S1). As predicted by Dure (1993), the proteins in this group were conserved in the amino-terminal portion (AtLEA4-1: 1-78, AtLEA4-2: 1-74 and AtLEA4-5: 1-76), for which α-helix and "coiled-coil" structures were predicted in silico (Lupas et al., 1991; McGuffin et al., 2000). In contrast, their C-terminal region (AtLEA4-1: 79-134, AtLEA4-2: 75-97 and AtLEA4-5: 77-158; corresponding to a length of 56, 23 and 82 amino acids, respectively) showed a putative random coil structure (Supplemental Fig. S1).
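The residue ranges quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: it takes the 1-based, inclusive coordinates reported in the text and recomputes the C-terminal lengths (56, 23 and 82 residues). The dictionary layout and the helper name `span_length` are illustrative choices, and the total length is assumed to equal the last C-terminal coordinate.

```python
# Residue ranges (1-based, inclusive) exactly as reported in the text.
# The data structure and function names are illustrative assumptions.
REGIONS = {
    "AtLEA4-1": {"n_terminal": (1, 78), "c_terminal": (79, 134)},
    "AtLEA4-2": {"n_terminal": (1, 74), "c_terminal": (75, 97)},
    "AtLEA4-5": {"n_terminal": (1, 76), "c_terminal": (77, 158)},
}


def span_length(start, end):
    """Number of residues in an inclusive range start..end."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Length of the variable C-terminal region of each protein.
c_terminal_lengths = {
    name: span_length(*parts["c_terminal"]) for name, parts in REGIONS.items()
}

# Fraction of each protein covered by the conserved N-terminal portion,
# assuming the protein ends at the last C-terminal coordinate.
n_terminal_fraction = {
    name: span_length(*parts["n_terminal"]) / parts["c_terminal"][1]
    for name, parts in REGIONS.items()
}

for name in REGIONS:
    print(name, c_terminal_lengths[name], round(n_terminal_fraction[name], 2))
```

Under these assumptions the conserved amino-terminal portion covers most of AtLEA4-2 but only about half of AtLEA4-5, which is consistent with the description of a C-terminal region of variable size.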
The presence and abundance of a LEA transcript and the corresponding protein during a developmental stage or in response to stress in a plant organ can provide information about their sensitivity to different types of environmental adverse conditions, which can be useful in the elucidation of their function. Hence, we analyzed the LEA 4 group transcript and protein accumulation patterns during embryogenesis and in seedlings of Arabidopsis plants grown under optimal irrigation, and in plants subjected to water deficit treatments. The results from RT-PCR experiments using total RNA showed that, in agreement with available microarray data (Schmid et al., 2005;Winter et al., 2007;Hruz et al., 2008), transcripts of the AtLEA 4 family could be detected in flowers, and during embryo development, but the highest abundance was detected at the dry seed stage. After seed germination, their transcript levels showed a significant reduction (Fig. 1A).
Western blot experiments showed that AtLEA4-1 protein accumulated abundantly in flowers and immature siliques (Fig. 1B). Unexpectedly, in dry seeds during stratification and germination the 14.9 kD AtLEA4-1 protein was undetectable. Instead, a protein with an apparent higher molecular mass (AtLEA4-1-L) was specifically recognized by immunopurified LEA4-1 antibodies in dry or stratified seed protein
response to ABA, even though AtLEA4-5 transcript accumulated to levels similar to those observed upon salt treatments. The conspicuous contrast between transcript and protein accumulation patterns in response to ABA again suggested the involvement of a post-transcriptional control for the adjustment of protein levels. Overall, these results demonstrated that the three members of the LEA 4 family were differentially expressed under normal developmental stages and upon stress treatments, suggesting that a functional diversification occurred in the course of evolution.

Arabidopsis Adult Plants
Arabidopsis plants were transformed by floral dipping using an Agrobacterium strain carrying a 35S::AtLEA4-5::NOS fusion, which led to the constitutive expression of the AtLEA4-5 gene (Fig. 3A). We selected the AtLEA4-5 gene for over-expression analysis because, among the three family members, it showed the strongest response to water deficit treatments as determined by its transcript and protein accumulation patterns. Once the over-expression of the protein of interest was verified under control and drought conditions (Fig. 3A and 3B), five independent homozygous transgenic Arabidopsis lines were used to analyze their phenotype under optimal and limiting irrigation (see Materials and methods for details). These lines over-expressed the AtLEA4-5 protein under both growth conditions, showing higher accumulation levels than wild type even upon water deficit treatment (Fig. 3A and 3B). The phenotypic analysis of these over-expression lines showed that many characteristics throughout development were similar to those shown by wild type (Fig. 8 and data not shown). To evaluate the contribution of the AtLEA4-5 protein to the tolerance of Arabidopsis plants to low water availability, the selected transgenic lines were subjected to water deficit during germination and in the adult stage. During germination, the effect of high concentrations of NaCl (250 mM) and of high osmolarity imposed by mannitol (350 mM) was determined by monitoring germination (radicle emergence) for up to 16 days. The transgenic homozygous lines over-expressing AtLEA4-5 protein did not show a significantly improved ability to germinate under these conditions: their germination rate was similar to that observed for wild type seeds (Fig. 5).
In the adult stage, two different experiments were carried out growing the plants in a substrate with low water retention. To evaluate the impact of AtLEA4-5 over-expression on the production of buds after recovery from severe dehydration, plants at the flowering stage were subjected to water deficit by halting irrigation for 14 days, at which point the substrate water potential (Ψ substrate) was approximately -6.45 (± 0.57) MPa (Supplemental Fig. S2); plants were then rehydrated and allowed to recover for 10 days, after which axillary and floral buds were counted (see Materials and methods for details). As shown in Figures 3C, 3D and Supplemental Figure S2, the transgenic lines over-expressing AtLEA4-5 protein showed a better recovery from this severe dehydration treatment than wild type plants.
Homozygous AtLEA4-5 over-producing lines showed not only recovery of vegetative tissues (e.g. rosette leaves), but also a higher number of axillary and floral buds, in contrast to wild type plants, which recovered some of their rosette leaves but were unable to maintain buds (Fig. 3E, Supplemental Table S1). Total biomass accumulation recorded at the end of the recovery period showed that four of the five 35S::AtLEA4-5::NOS lines presented a significantly higher biomass when compared to wild type plants (Fig. 3D).
In an independent experiment, wild type and AtLEA4-5 over-expressing plants at the flowering stage were subjected to water deficit by halting irrigation for 10 days, when Ψ substrate was approximately -4.62 (± 0.62) MPa; at this point, complete plants were harvested to determine relative water content (RWC) (see Materials and methods for details). The results showed that plants over-expressing AtLEA4-5 protein exhibited a higher RWC upon water deficit when compared to wild type plants (Fig. 3C). These data support the conclusion that the overproduction of AtLEA4-5 protein confers tolerance to water-limiting conditions in Arabidopsis plants, as determined by RWC, biomass accumulation, and bud maintenance after recovery from severe dehydration.
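For readers unfamiliar with RWC, the sketch below shows how it is commonly computed from three weights of the same sample. This is an assumption about the measurement: the text refers to Materials and methods for the actual protocol, and the formula used here, RWC (%) = 100 × (fresh − dry) / (turgid − dry), is the widely used textbook definition rather than one stated in this section. The function name and the example weights are hypothetical.

```python
def relative_water_content(fresh_weight, turgid_weight, dry_weight):
    """Relative water content in percent.

    Uses the common definition 100 * (FW - DW) / (TW - DW); this is an
    assumed formula, not one quoted from the text.
    """
    if not dry_weight < turgid_weight:
        raise ValueError("turgid weight must exceed dry weight")
    if not dry_weight <= fresh_weight <= turgid_weight:
        raise ValueError("fresh weight must lie between dry and turgid weight")
    return 100.0 * (fresh_weight - dry_weight) / (turgid_weight - dry_weight)


# Hypothetical weights in grams, for illustration only (not data from the study).
stressed_sample = relative_water_content(
    fresh_weight=0.50, turgid_weight=1.00, dry_weight=0.20
)
watered_sample = relative_water_content(
    fresh_weight=0.90, turgid_weight=1.00, dry_weight=0.20
)
print(round(stressed_sample, 1), round(watered_sample, 1))
```

Because the value is normalized to the fully turgid weight of the same sample, it allows plants of different sizes to be compared, which is presumably why it is reported alongside biomass and bud counts.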

Show Sensitive Phenotypes in Response to Osmotic Stress During Germination and to Drought Treatments in Adult Plants
In this work we analyzed the phenotype of mutants affecting the three members of the LEA4 gene family in response to water deficit conditions. A line carrying a non-autonomous Suppressor-mutator transposon insertion (dSpm; Tissier et al., 1999) in the promoter of AtLEA4-5 was obtained from the European Arabidopsis NASC stock centre (N122943).

Southern blot experiments confirmed the dSpm transposon insertion site in the AtLEA4-5 gene promoter as described in the NASC database (Supplemental Fig. S3).
To investigate whether this mutation affects the production of the corresponding protein, northern and western blot analyses were carried out. The results showed that, compared to wild type plants, this insertion led to a severe reduction in the levels of AtLEA4-5 transcript and protein in dry seeds and in roots from adult plants grown under water deficit (Fig. 4A, B, C).
The phenotype of this mutant was analyzed during germination in media with or without NaCl (250 mM) or mannitol (350 mM), as well as in adult plants subjected to dehydration-rehydration treatments. As shown in Figure 5, mutant seeds were unable to withstand the stress treatments: their germination rate was significantly lower than that of wild type seeds, indicating that AtLEA4-5 protein is necessary for optimal germination efficiency under water deficit. To evaluate the participation of the AtLEA4-5 protein in the ability of the plant to maintain the production of floral and axillary buds under water limitation, plants were grown under optimal irrigation conditions until bolting, and at this point they were subjected to a dehydration treatment until the Ψ substrate was approximately -4.62 (± 0.62) MPa (note that the dehydration treatment in this experiment was less severe than that used with over-expressing lines); subsequently, plants were rehydrated during a 6-day recovery period (see Materials and methods for details). After this time, axillary and floral buds were counted and then plants were harvested to determine total biomass accumulation. The results in Figure 6 show significant differences in biomass accumulation after recovery from drought between mutant and wild type plants, and that the production of buds in the AtLEA4-5 mutants was affected by the dehydration-rehydration treatment when compared to wild type plants. In this case, wild type plants recovered some buds in contrast to the phenotype of wild type plants
Because no mutants were available for the AtLEA4-1 and AtLEA4-2 genes, an artificial microRNA (a-miR) construct was used for their post-transcriptional silencing.
The high sequence similarity between the AtLEA4-1 and AtLEA4-2 genes allowed the design of an a-miR (a-miR 4-1/2) able to target both genes for silencing, using a region of maximal nucleotide identity between both transcripts (Supplemental Fig. S4). The precursor of a-miR 4-1/2 was expressed under the control of the 35S promoter to favor an efficient silencing of the genes of interest. Those lines showing the highest a-miR accumulation were selected to validate the expected silencing. The mature a-miR 4-1/2 was detected by small RNA northern blot in various T3 lines using 2-week-old plants grown under optimal conditions (Fig. 7A). The results from western blot analyses using protein extracts from adult plants of the selected lines subjected to dehydration indicated that the designed a-miR was functional and able to silence the expression of both genes (Fig. 7B). Once the silencing was confirmed to be functional, these lines were phenotypically characterized, applying the same treatments described above for the AtLEA4-5 insertion mutant. In the germination assays, the major effect was observed when seeds were germinated in the presence of NaCl (250 mM), where the AtLEA4-1 and AtLEA4-2 silenced seeds showed 20% germination compared to 70% for wild type seeds. When mannitol (350 mM) was used, no differences in the final germination percentage were detected; however, significant differences were observed between the germination rates of the silenced mutants and the wild type (Fig. 5). An increased susceptibility to dehydration was also detected in experiments using adult plants subjected to dehydration-rehydration, where mutants showed a significant reduction in the number of buds when compared to wild type plants (Fig. 6 and Supplemental Table S2).
To construct a triple mutant affecting the production of the three members of the AtLEA4 group, and because both mutants (the dSpm insertion in AtLEA4-5 and a-miR 4-1/2) are resistant to BASTA, we generated a different mutant affecting the expression of AtLEA4-5, carrying kanamycin resistance as the selection marker. To this end, we obtained a silencing mutant using RNA interference (RNAi) against AtLEA4-5 by the expression of
Because silencing was stable only in the T1 and T2 generations, all experiments with these lines were carried out using T2 homozygous plants. In this generation we selected independent transgenic lines showing different levels of reduction in AtLEA4-5 protein accumulation (lines 3-5, Fig. 7C), which were phenotypically characterized prior to the generation of the triple mutant. Analyses were carried out using Arabidopsis adult plants subjected to dehydration treatment until Ψ substrate = -4.62 (± 0.62) MPa was reached under greenhouse conditions. The results from these analyses showed that RNAi 4-5 silenced lines accumulated less biomass, recovered fewer buds after six days of re-hydration and produced a lower total seed number per plant than wild type plants (Supplemental Table S1 and Fig. S5). Homozygous T2 RNAi 4-5 silenced plants (Mutant line 3 in Fig. 7C) were crossed with homozygous T2 a-miR 4-1/2 silenced plants (Mutant lines 3 and 4 in Fig. 7B), choosing those lines where the silencing of the corresponding transcripts was most efficient. From the products of this cross containing both silencing constructs (a-miR 4-1/2 and RNAi 4-5), we selected those lines that showed the lowest protein levels in the F2 generation for further characterization (Supplemental Fig. S6). Two independent lines were subjected to dehydration-rehydration treatments like those applied for the phenotypic analyses described above.
The results showed a significantly lower dry biomass and a lower recovery of buds after dehydration-rehydration in those plants affected in the production of the group 4 LEA proteins, compared to wild type plants grown in the same pot (Fig. 6A and 6B). As for the RNAi 4-5 silenced plants, the silencing was functional only in the F1 and F2 generations: some homozygous F3 lines showed wild type protein levels as well as a wild type phenotype upon dehydration-rehydration treatments (data not shown). Because RNA interference was not effective in reducing AtLEA4-5 protein accumulation in seeds, none of the lines containing the RNAi 4-5 construct were used for phenotypic characterization during germination.
Because the three protein members of LEA group 4 accumulated to high levels in dry seeds, we analyzed the effect that their lower levels could have on seed production under optimal growth conditions. To this end, individual seedlings of the different lines used in this study were grown in pots containing soil (Metromix) under optimal irrigation until senescence. Seeds were collected from each plant and the dry biomass of seeds was determined. The comparison of their seed biomass showed that the absence of AtLEA4-5 protein, low levels of AtLEA4-1 and AtLEA4-2 proteins, or low levels of the complete protein family led to a lower seed biomass than in wild type plants, even when plants were grown during their complete life cycle under optimal growth conditions (Fig. 8). The higher AtLEA4-5 protein levels obtained when AtLEA4-5 was over-expressed did not confer any advantage on this phenotype when compared to wild type plants (Fig. 8).
Structural and phylogenetic analysis of group 4 LEA proteins points to an early gene duplication that gave rise to two distinct subgroups with arguably divergent functions.
In a previous work, we showed that group 4 LEA proteins can be defined based on sequence similarity (Battaglia et al., 2008). In that work, we reported the identification of two subgroups, where the proteins differed in the length of their carboxy terminus.
Five sequence motifs were identified, three of which were common to all proteins, while the other two were present only in the longer proteins. In the present work, we report a more thorough analysis, starting with a collection of 74 LEA group 4 proteins from angiosperms, gymnosperms and bryophyta. A multiple sequence alignment showed a broadly conserved amino region, and a much more variable carboxy region, where sequence repetitions and rearrangements were common (data not shown). Given that such variability confounds the creation of a correct multiple alignment for the carboxy region, to describe the motif structure of the group 4 LEA proteins we used MEME, a motif discovery tool that does not rely on alignments (Bailey and Gribskov, 1998). Ten motifs were discovered. MEME is sensitive to the starting collection of sequences, so the new motifs do not match precisely those reported previously. Thus, we will not refer to the previous nomenclature. Motifs 1, 2 and 4 are almost universally distributed, being present in 70, 74 and 66 of the 77 proteins, respectively. The actual prevalence of motif 4 could be higher, because, being the leftmost motif, it could easily be lost during cDNA construction. Motifs 6 and 7, although distinct, must be divergent variants of the same motif, because they are related in sequence, occupy equivalent positions in the proteins, and are mutually exclusive: proteins have either motif 6 (35 proteins) or motif 7 (32 proteins). Motif 5 is also very common; it is present in 46 proteins, where it usually lies near the C-terminus, and sometimes it is present more than once per protein. Motifs 3, 9 and 10 appear in 13, 7 and 12 proteins, respectively. When present, they appear in proteins that have motif 7 and never in association with those that have motif 6 (Supplemental Table S3 and Supplemental Fig. S7). Motif 8 is only present in the three proteins from lettuce. 
It might be related to motif 3 because it has a similar size, occupies a similar position in the proteins and, like motif 3, is only present in proteins that carry motif 7. However, motifs 3 and 8 are not similar in sequence.
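The motif counts reported above can be turned into prevalence figures with simple arithmetic. The sketch below is illustrative only: it uses the counts quoted in the text and takes 77 proteins as the denominator, as in the phrase "of the 77 proteins"; the count of 3 for motif 8 is taken from the three lettuce proteins. Variable and function names are assumptions made for this example.

```python
# Counts as quoted in the text; 77 is the denominator used there.
TOTAL_PROTEINS = 77
MOTIF_COUNTS = {1: 70, 2: 74, 3: 13, 4: 66, 5: 46, 6: 35, 7: 32, 8: 3, 9: 7, 10: 12}


def prevalence_percent(count, total=TOTAL_PROTEINS):
    """Percentage of proteins in the collection that carry a motif."""
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return 100.0 * count / total


prevalence = {motif: prevalence_percent(n) for motif, n in MOTIF_COUNTS.items()}

# Motifs 6 and 7 are described as mutually exclusive, so their counts can be
# added to obtain the number of proteins carrying one of the two.
with_motif_6_or_7 = MOTIF_COUNTS[6] + MOTIF_COUNTS[7]
without_motif_6_or_7 = TOTAL_PROTEINS - with_motif_6_or_7

for motif in sorted(prevalence, key=prevalence.get, reverse=True):
    print(f"motif {motif}: {MOTIF_COUNTS[motif]}/{TOTAL_PROTEINS} "
          f"({prevalence[motif]:.1f}%)")
print("proteins with motif 6 or 7:", with_motif_6_or_7)
```

Read this way, motif 2 is the most widely distributed motif in the collection, and the proteins lacking a detectable motif 6 or 7 are the complement of the two mutually exclusive sets.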
The Pfam (listed as LEA_1, PFAM 03760) alignment that defines this family encompasses motifs 4, 1, 2 and either motif 6 or 7, in that order. We called this region of the proteins "the main conserved block" (MCB). Because there are no gaps between the four motifs and the region has the same length in all proteins, the MCB could be easily aligned. We used the MCB to reconstruct the phylogeny of the family. Although none of several methods achieved high reliability for all branches (something we impute to having many sequences and relatively few alignment columns) the resulting best trees from all strategies were very similar and they are generally consistent with the species phylogeny, except for a deep branching which probably indicates a very ancient gene duplication (a representative tree is shown in Fig. 9). The deepest divide was found between proteins having motif 6 and those having motif 7, showing a bootstrap value of 62/100 with neighbor joining, and 80/100 with Fitch, but bootstrap values became 99/100 if we ignored the wheat LEA, which has a very poor motif 6 (MAST e-value = 0.00037, see Supplemental Table S3 for the sequences used and motif presence). This divide was also the most reliable with a maximum likelihood approach (Proml; bootstrap was 34/100, and 96/100 if the wheat LEA is ignored). The divide was not an artifact of the great difference in sequence between motif 6 and 7, because the same divide and similar bootstrap values were obtained using motif 2 alone, instead of the whole MCB. In general, motif 2 was informative of the evolutionary history of the complete proteins, indicating that it represents the defining core of group 4 LEA proteins, and that throughout evolution the acquisition and loss of the other motifs has not been rampant. 
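The bootstrap values quoted above (for example 62/100) count how often a branch reappears when the tree is rebuilt from resampled alignments. As a minimal sketch of the resampling step only, the code below draws alignment columns with replacement to build one pseudo-replicate. It does not reproduce the tree-building programs used in the study; the toy sequences, the function name `bootstrap_alignment` and the seed are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    `alignment` maps sequence names to equal-length aligned strings.
    Columns are sampled with replacement, and the same sampled columns
    are used for every sequence so that rows stay aligned.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    sampled_columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][i] for i in sampled_columns)
        for name in names
    }


# Toy alignment, invented for illustration (not sequences from the study).
toy_alignment = {
    "protein_a": "MQSAKEKL",
    "protein_b": "MQSGKEKV",
    "protein_c": "MESAKDKL",
}

rng = random.Random(0)  # fixed seed so the example is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(replicates[0])
```

In a full analysis, a tree would be inferred from each of the 100 replicates, and the support for a branch would be the number of replicate trees containing it. A short alignment such as the MCB offers few distinct columns to resample, which is one reason support can be modest when many sequences are compared.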
The presence of proteins with either motif 6 or motif 7, which allowed the prediction of the associated "accessory motifs", together with the finding that the deepest dichotomy in the phylogenetic tree was clearly attributable to the presence of either of these two motifs, indicates that group 4 LEA proteins should be divided into two subgroups, 4A or 4B, according to the occurrence in a protein of motif 6 or motif 7. Because both subgroups were found within and outside the angiosperms (4A exists in several conifers, while 4B was present in the moss Physcomitrella patens), the gene duplication must have predated the evolution of seed plants. Furthermore, the side-by-side persistence of both subgroups, after hundreds of millions of years, in broadly distributed taxa, suggests that the duplication gave rise to functional divergence in such a way that one subgroup cannot be substituted for the other. This acquisition of specialized sub-functions would explain why the expression of the two Arabidopsis proteins from subgroup 4A (AtLEA4-1 and AtLEA4-2) overlapped only partially with the expression of the one from subgroup 4B (AtLEA4-5).
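The subgroup rule described above can be written as a small decision function. This is a sketch under stated assumptions: the input is the set of motif numbers detected in a protein, motif 6 maps to subgroup 4A and motif 7 to 4B as in the text, and motifs 3, 8, 9 and 10 are treated as the accessory motifs reported only alongside motif 7. The function names, the labels "ambiguous" and "unassigned", and the example motif sets are hypothetical and are not taken from Supplemental Table S3.

```python
ACCESSORY_MOTIFS = {3, 8, 9, 10}  # reported only in proteins carrying motif 7


def classify_group4(motifs):
    """Assign a group 4 LEA protein to subgroup 4A or 4B from its motif set."""
    has_6 = 6 in motifs
    has_7 = 7 in motifs
    if has_6 and has_7:
        # The text describes the two motifs as mutually exclusive, so this
        # case would indicate a problem with motif detection.
        return "ambiguous"
    if has_6:
        return "4A"
    if has_7:
        return "4B"
    return "unassigned"


def accessory_motifs_consistent(motifs):
    """True if accessory motifs appear only together with motif 7."""
    carries_accessory = bool(ACCESSORY_MOTIFS & set(motifs))
    return (not carries_accessory) or (7 in motifs)


# Hypothetical motif sets, for illustration only.
examples = {
    "example_short_protein": {4, 1, 2, 6},
    "example_long_protein": {4, 1, 2, 7, 3, 5},
    "example_partial_cdna": {1, 2},
}

for name, motifs in examples.items():
    print(name, classify_group4(motifs), accessory_motifs_consistent(motifs))
```

Returning "unassigned" rather than forcing a call reflects the fact that some sequences in the collection lack a detectable motif 6 or 7, for instance when a cDNA is incomplete.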

DISCUSSION
The high conservation of LEA protein families in the plant kingdom and the high correlation between their abundance and water deficit conditions point to a relevant role for these proteins under this condition. Even though circumstantial evidence has suggested the participation of these proteins in the adaptation of higher plants to low water availability, no direct genetic evidence in this regard had been reported. In this work, we analyzed the group 4 LEA genes and provide data showing that they participate in the adaptive response to this environmental stress in Arabidopsis plants, throughout their life cycle.
Even though there is some information regarding the transcript accumulation patterns for the group 4 LEA genes in Arabidopsis (Delseny et al., 2001;Hoth et al., 2002;Seki et al., 2002;Oono et al., 2003;Schmid et al., 2005;Winter et al., 2007;Hruz et al., 2008), we were interested in knowing their accumulation under the growth conditions and developmental stages relevant for this work, as well as to have information on the correlation between the abundance of a transcript and its corresponding protein. Our results and those reported previously agree in that the highest accumulation for all group 4 LEA transcripts occurs at the desiccation tolerance acquisition stage during seed development and up to 2 DAG seedlings. Also, our results concur that AtLEA4-5 transcript is the one that reaches the highest abundance in most stress conditions tested, in particular in response to hyperosmotic, drought and ABA treatments, when compared with the other two members of the family. The two genes that showed the highest similarity, AtLEA4-1 and AtLEA4-2, also showed a high correlation in their transcript accumulation patterns, suggesting similar functions, in agreement with our phylogenetic analysis, which localized both genes in the same clade or subgroup (see below).
Transcript levels contrasted with protein abundance in mature flowers and during seedling development in the case of AtLEA4-1, where protein levels are similar to those accumulated in seeds, even though low transcript abundance was detected.
Interestingly, we found that, in dried and stratified seeds, AtLEA4-1 protein migrated with an apparent higher molecular mass suggesting that under severe dehydration this protein experiences a post-translational modification or conformational changes.
Unexpectedly, in the case of AtLEA4-2, we were unable to detect the protein at the predicted molecular mass (10.5 kD); instead, we specifically detected a protein with a higher molecular mass, whose accumulation pattern correlates with that obtained for its transcript. A possible explanation for this observation is that, in contrast to AtLEA4-5 and AtLEA4-1, secondary structure prediction analysis suggests the lowest percentage of random coil (20% of the protein) for AtLEA4-2, indicating that the higher structural order in this protein could favor the formation of homo- or hetero-oligomers. In some cases (Figs. 1B and 2B), instead of one band, two bands were detected, suggesting the formation of two types of oligomers due to preferential interactions. Notice that all the members of the AtLEA4 protein family are predicted to form "coiled-coil" structures (Lupas et al., 1991), which are involved in protein-protein interactions, reinforcing the hypothesis that these proteins may interact with each other and/or with other protein partners in the cell. In the case of AtLEA4-2, most of its residues (80%) are predicted to be involved in the formation of "coiled-coils", supporting the existence of strong interactions between AtLEA4-2 monomers. The analysis of the protein accumulation patterns for the different members of this family suggests the participation of post-transcriptional control mechanisms to modulate protein levels, as has been suggested for other LEA proteins (Colmenero-Flores et al., 1999). One example of this situation is evident in the case of AtLEA4-5 transcript/protein upon ABA and salt treatments (Fig. 2), where transcript accumulation levels are similar for both conditions whereas protein levels in response to ABA are much higher than in response to NaCl, indicating that ABA may be involved in a
and AtLEA4-2 were silenced using a specific artificial microRNA, indicating that there is no redundancy in their participation under these adverse environments.
This interpretation was further supported by the results obtained from plants where the three genes of the family were silenced (RNAi 4-5 X a-miR 4-1/2). These plants, with undetectable or low levels of the three proteins, showed reduced biomass accumulation and a reduced production of axillary and floral buds in response to drought treatments.
The participation of this gene family in the plant response to water limitation was also supported by the results obtained from the over-expression of the AtLEA4-5 gene that showed a higher production of axillar and floral buds compared to wild type plants, when plants were subjected to severe dehydration-rehydration treatments. An additional observation from these experiments was that plants over-producing AtLEA4-5 protein showed a higher ability to restore their tissues after rehydration (see Supplemental Material). These results indicated that the over-production of AtLEA4-5 protein is able to confer a higher tolerance to severe drought treatment (>85% substrate water loss). Interestingly, we found that even under optimal irrigation conditions, the seed biomass from the different mutants (dSpm 4-5; amiR 4-1/2; RNAi 4-5; triple silenced) was lower than that of wild type plants maybe due to a protecting role of these proteins on the plant machinery needed for optimal fruit and/or seed development. Also, it should be considered the presumed protective role that these proteins may have on the meristematic regions or primordia (in agreement with their effect on the number of floral and axillary buds under water deficit), which would be relevant for the formation of inflorescences and consequently in the total number of seeds. Because the determination of seed size by thousand seed weight (data not shown) indicates that none of the mutants affect this phenotype, the possibility that inefficient seed filling is responsible of lower seed biomass in the different mutants could be discarded. Plants overproducing AtLEA4-5 did not show a higher seed biomass under optimal irrigation suggesting that during normal seed development higher levels than those present in wild type seeds are not required for a successful seed production.
The in silico analysis of the known group 4 LEA proteins confirmed the proposal by Battaglia et al. (2008) that group 4 LEA proteins can be defined through sequence conservation, which is a much stronger criterion than amino acid properties (Wise, 2003; Wise and Tunnacliffe, 2004), and that the group is formed by two subgroups, each characterized by different sequence motifs. Motifs 1, 2, 4 and 6/7 are so broadly distributed that they could be the signature of the group. However, only 52 out of the 77 sequences included in this work have all four motifs. If we were to choose a single motif as the signature of the group, it would probably be motif 2, not only because it is present in most sequences, but also because its phylogeny is informative of the phylogeny of the complete proteins. By superimposing the NCBI taxonomy of the organisms on the phylogeny of group 4 LEA proteins, we concluded that the two subgroups originated from a very early duplication that predated the branching of monocots and dicots, because the two types of proteins can be found in both taxa. [...] duplication gave rise to functional divergence or sub-functionalization. A reasonable hypothesis is that, in protecting other proteins or cellular structures, both subgroups have a similar mechanism but different targets. This prediction is supported by the fact that both subgroups share a common core (formed by motifs 4, 1 and 2, and either motif 6 or 7, two motifs that are clearly related) but differ in the presence of additional motifs that could be involved in specificity. In the case of Arabidopsis, the two group 4A proteins (AtLEA4-1 and AtLEA4-2) have similar patterns of expression that differ from that of the group 4B protein (AtLEA4-5). Our experimental observations correlate well with the conclusion from the bioinformatic analysis, because dissimilar patterns are what would be expected for proteins whose functions have diverged.
Since the first recognition of several families of these hydrophilic and flexible [...] We also showed that their specific induction and high abundance are necessary, but not sufficient, to adapt to drought stress, suggesting that each LEA gene has evolved to help in the adaptive process of higher plants depending on the developmental stage or particular tissues as well as stress type and severity. These results also support the idea that there is no functional redundancy among the different LEA protein groups. Altogether, the genetic and phylogenetic evidence presented in this work strongly supports the essential role that group 4 LEA proteins play in the adaptive process to water deficit in higher plants.
Furthermore, we include a phenotypical analysis that considers the impact of drought on Arabidopsis plants at the adult stage, focusing on the survival and/or recovery of reproductive organs. This type of analysis allowed us to assign to these proteins a role in the protection of reproductive organs, a property relevant for the offspring rather than for the vegetative tissues, and consequently important from the evolutionary and agronomic point of view.

Plant Material and Growth Conditions
Wild type Arabidopsis thaliana (Columbia) seeds were germinated in MS 1X:

Water Deficit Treatments
The expression pattern analyses of group 4 LEA transcripts and proteins were carried out at different developmental stages as indicated. Some experiments were performed using two week-old seedlings grown in vitro in MS 1X and subjected to Phenotypic analyses were carried out during germination or in adult plants as

Statistical analyses
The germination experiments included three replicates (plates) with 100 seeds per replicate. The data were accumulated over time and fit to sigmoidal dose-response curves with variable slope, $Y = \mathrm{Bottom} + \dfrac{\mathrm{Top} - \mathrm{Bottom}}{1 + 10^{(\mathrm{LogEC50} - X)\cdot\mathrm{Hillslope}}}$, also called the four-parameter logistic equation. Bottom is the Y value at the bottom plateau (constrained to zero); Top is the Y value at the top plateau; LogEC50 is the X value when the response is halfway between Bottom and Top; Hillslope describes the steepness of the curve. The null hypothesis was that two curve fit parameters (Hillslope and LogEC50) from each data set were the same. For the water deficit treatments, the experimental design included three replicates (pots) with 4-6 plants in each replicate.
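As an illustration of the four-parameter logistic equation used for the germination curves, the following minimal Python sketch evaluates the model at a few time points. It is not the Prism 5 fitting procedure used in this work; the function name and the parameter values (Top, LogEC50, Hillslope) are hypothetical and chosen only to show the shape of the curve, with Bottom constrained to zero as in the text.

```python
def four_parameter_logistic(x, top, log_ec50, hill_slope, bottom=0.0):
    """Sigmoidal dose-response curve with variable slope.

    Y = Bottom + (Top - Bottom) / (1 + 10 ** ((LogEC50 - X) * Hillslope))
    Bottom defaults to zero, matching the constraint described in the text.
    """
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - x) * hill_slope))


if __name__ == "__main__":
    # Hypothetical parameters, NOT values from the paper:
    # top plateau of 95% germination, half-maximal response at day 2.
    top, log_ec50, hill_slope = 95.0, 2.0, 1.5
    for day in (0, 1, 2, 3, 4, 6):
        y = four_parameter_logistic(day, top, log_ec50, hill_slope)
        print(f"day {day}: {y:.1f}% accumulated germination")
```

By construction, the curve passes through the midpoint between Bottom and Top when X equals LogEC50, which is why differences in LogEC50 and Hillslope between genotypes capture delays and changes in the steepness of germination.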
The dehydration/rehydration experiments were analyzed by one-way ANOVA, and significant differences between groups were identified through Dunnett's or Tukey's multiple comparisons post-tests. Bartlett's test for equal variances showed no significant differences at the P<0.05 level within each group; thus, one-way ANOVA could be applied in all cases. The null hypothesis was that there were no significant differences in the means between groups. All the statistical analyses and curve fitting were performed using Prism 5 for Mac OS X (GraphPad Software, Inc., La Jolla, CA).
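To make the one-way ANOVA step concrete, the sketch below computes the F statistic and its degrees of freedom from first principles using only the Python standard library. It does not reproduce the Prism 5 analysis: the P value, Bartlett's test and the Dunnett's/Tukey's post-tests are not computed, and the bud counts per genotype are invented for illustration only.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical bud counts per plant for three genotypes (not data from the paper).
    wild_type = [12.0, 14.0, 13.0, 15.0]
    mutant = [8.0, 9.0, 7.0, 10.0]
    overexpressor = [16.0, 17.0, 15.0, 18.0]
    f_stat, df_b, df_w = one_way_anova_f([wild_type, mutant, overexpressor])
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that the variation between genotype means is large relative to the variation within each genotype; the null hypothesis of equal means is then evaluated against the F distribution with the reported degrees of freedom.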

AtLEA4-5 constructions
The AtLEA4-5 open reading frame (477 bp) was cloned into pBluescript KS+ (AmpR, Stratagene) from a cDNA library (leaf, flowers and siliques) using specific primers, which add NcoI and SalI restriction sites (5'-AAA CCA TGG AGT CGA TGA AAG AAA C-3'; 5'-GCG GTC GAC CCG TTT ATC CAG TAT ATC C-3'). This cDNA was used as template for random labeling of PCR fragments to make a specific probe for hybridization [...] from this screening three lines were selected for phenotypic analysis.
For AtLEA4-5 antibody production, AtLEA4-5 ORF was subcloned as NcoI/SalI fragment into pTRC99A vector (Amp R , Amann et al., 1988) to induce the overproduction of the native recombinant protein in E. coli (see antibodies section).
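As a quick sanity check of the cloning design, the following sketch locates the NcoI (CCATGG) and SalI (GTCGAC) recognition sequences in the two primers quoted above. The helper names are hypothetical and the script is only an illustration, not part of the original methods.

```python
# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"NcoI": "CCATGG", "SalI": "GTCGAC"}


def normalize(primer):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return "".join(primer.split()).upper()


def find_sites(primer):
    """Map each enzyme whose site occurs in the primer to its 0-based position."""
    seq = normalize(primer)
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    forward = "AAA CCA TGG AGT CGA TGA AAG AAA C"
    reverse = "GCG GTC GAC CCG TTT ATC CAG TAT ATC C"
    print("forward primer:", find_sites(forward))
    print("reverse primer:", find_sites(reverse))
```

The forward primer carries the NcoI site, whose internal ATG provides the start codon, and the reverse primer carries the SalI site, consistent with the subcloning of the ORF as an NcoI/SalI fragment.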

Genomic DNA extraction and Southern blot experiments
For PCR experiments, quick mini-preparations of genomic DNA from Arabidopsis were obtained as described by Edwards et al. (1991). Genomic DNA was obtained by large-scale extractions as described in Taylor et al. (1993). Southern blots were carried out using genomic DNA (40 μg) from wild type and mutant homozygous lines from the European Arabidopsis Stock Centre (NASC ID: N122943), digested with ClaI, HindIII and BamHI, and separated by electrophoresis. Transfer and hybridizations were carried out following standard protocols under stringent conditions. The AtLEA4-5 probe was obtained by random labeling, with [α-32P]dCTP (3000 Ci mmol-1), of an 800 bp PCR product amplified using specific primers for the AtLEA4-5 gene. After washing under high stringency conditions, the membrane was exposed to Kodak film using an intensifying white screen (Amersham).

RNA extraction and northern blot experiments
Total RNA was extracted from flowers, immature siliques and dry seeds according to Vicient & Delseny (1999

Protein extraction and western blot
Total proteins from flowers, immature siliques, dry seeds and vegetative organs were extracted as described by Hurkman and Tanaka (1986) with some modifications. Zymed) for 1 hour at room temperature, washed with PBS 1X and developed with peroxidase substrates from Supersignal West pico (Pierce). Membranes were exposed to X-ray films (Kodak).

Antibodies preparation
AtLEA4-5 antibodies were produced using native recombinant protein, whereas antibodies against AtLEA4-1 and AtLEA4-2 were obtained using GST fusion proteins.
Recombinant proteins were purified from E. coli after induction with IPTG. For the purification of AtLEA4-5 native protein, pelleted cells were resuspended in ice-cold buffer containing Tris 20 mM, NaCl 50 mM, PMSF 1mM and lysed by sonication. After centrifugation, cleared supernatant was boiled 10 minutes and put in ice for 10 minutes. This step enriches boiling soluble proteins from bacteria, most of it being the recombinant protein (Jepson & Close, 1995). To obtain AtLEA4-1-GST and AtLEA4-2-GST fusion proteins, pelleted cells were resuspended in ice-cold PBS and sonicated three times. After addition of Triton X-100 (1% final concentration), cell debris were discarded and supernatant was incubated with glutathione-agarose beads (Sigma) and washed three times with cold PBS. Soluble fusion protein was obtained after boiling in SDS 10%.
For polyclonal antibody production, purified proteins were separated on polyacrylamide gels and stained with Coomassie blue. The band corresponding to the protein of interest was eluted from the gel, equilibrated with distilled water, emulsified with complete Freund's adjuvant (Gibco, Grand Island, NY) at a 1:1 ratio, homogenized exhaustively and used to immunize New Zealand female rabbits through multiple intradermal injections. Pre-immune serum was taken beforehand. The titer and specificity of the antibodies were monitored throughout the inoculation protocol using dilutions of the purified recombinant proteins and protein extracts from dry seeds and from osmotically stressed plants. The IgG fraction was precipitated from serum with saturated ammonium sulphate and stored at 4°C with 0.02% sodium azide. To verify the specific recognition of the antibodies for their own antigen, antibodies for each of the three group members were cross-tested with the purified recombinant proteins by western blot. As a negative control, membranes were also incubated with pre-immune serum. Immunopurified antibodies were obtained by affinity purification according to Lillie and Brown (1987). Custom rabbit polyclonal antibodies specific for AtLEA4-2 were raised against a peptide corresponding to a region not conserved among the members of this family (CSKEAQAKADLHQSK) (GenScript Corporation, Piscataway, NJ). This antibody specifically recognized a GST-AtLEA4-2 fusion protein produced in bacteria. To verify the antibody specificity, competition experiments were carried out using increasing concentrations of the corresponding purified antigens (recombinant AtLEA4-5, AtLEA4-2-GST and AtLEA4-2 peptide).
[...] Lang et al., 2005). In all, 77 sequences were analyzed after translating them in silico (the complete list, including the "short names" used in this work, is shown in Supplemental Table 3).
The predicted hydrophilicity, molecular mass and pI of the proteins of interest were verified using Kyte and Doolittle (1982) plots and Protean (Protein sequence analysis software, DNASTAR), respectively. Their predicted secondary structure was established using PSIPRED (Protein Structure Prediction Server, McGuffin et al., 2000) and COILS programs (Prediction of "coiled-coil" regions in proteins, Lupas et al., 1991).
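For readers unfamiliar with Kyte and Doolittle plots, the sketch below shows how such a hydropathy profile can be computed: each residue is assigned its Kyte-Doolittle hydropathy index and the values are averaged over a sliding window. This is an illustrative reimplementation, not the software used in this work; the function names and the window size of 5 are assumptions, and the example sequence is the AtLEA4-2 peptide quoted in the antibodies section.

```python
# Kyte-Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean index over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def hydropathy_profile(sequence, window=5):
    """Mean hydropathy of every window of `window` consecutive residues."""
    return [
        gravy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    peptide = "CSKEAQAKADLHQSK"  # AtLEA4-2 peptide used for antibody production
    print(f"GRAVY: {gravy(peptide):.2f}")
    for position, value in enumerate(hydropathy_profile(peptide), start=1):
        print(f"window starting at residue {position}: {value:.2f}")
```

Negative values indicate hydrophilic stretches, which is the behavior expected for LEA proteins.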
Sequence motifs were inferred with MEME (Multiple EM for Motif Elicitation; Bailey and Gribskov, 1998; Bailey et al., 2006) from 72 group 4 LEA proteins (all but the five proteins from Physcomitrella patens) using default parameters. The model used was zoops (zero or one occurrence per sequence). Ten motifs were obtained, and these were searched using the MEME-associated tool MAST (Motif Alignment & Search Tool; Bailey and Gribskov, 1998; Bailey et al., 2006) over the complete collection (this time including the moss proteins).
Alignments were made with either the whole proteins, their "main conserved block" (MCB), which is the concatenation of motifs 4-1-2-6/7 found by MEME, or with motif 2 alone. All phylogenetic analyses were made with the Phylip suite of phylogenetic programs (Felsenstein, 2005). Bootstrapped data set were obtained with Seqboot

Supplemental Data
The following material is available through the online version of this paper:
Supplemental Figure S1. Sequence similarity between group 4 LEA proteins in Arabidopsis and prediction of "coiled-coil" regions in these proteins.
Supplemental Figure S2. Phenotypic analysis of plants overexpressing  protein.
Supplemental Figure S3. Description of the transposon insertion mutant  in the AtLEA4-5 gene.
Supplemental Figure S4. Design of artificial microRNA to silence AtLEA4-1 and AtLEA4-2 genes and construction of RNAi-triggered silencing for AtLEA4-5 gene.
Supplemental Figure S5.
Table S1. Phenotypic analysis of T3 homozygous adult plants overexpressing AtLEA4-5 protein (35S::AtLEA4-5::NOS) during drought and after recovery from stress.
Table S2. Phenotypic analysis of T4 homozygous transposon insertion mutant in AtLEA4-5 gene and of homozygous T2 PTGS mutants in AtLEA4 gene family during drought and after recovery from stress.
Table S3. Taxonomic classification of the sequences used for motif search, sequence number and sequence name depicted in Figure 9.

Figure 4. Reduction in the expression levels of AtLEA4-5 transcript and its corresponding protein in the transposon insertion mutant (dSpm). A) Northern blot analysis using a specific probe for AtLEA4-5 and total RNA (10 μg) from dry seeds of wild type (WT) and dSpm mutant plants. Reversible staining with methylene blue after transfer was used as loading reference. B) Western blot analysis using specific antibodies against AtLEA4-5 and total protein extracts (10 μg) from dry seeds of WT and dSpm mutants. C) Western blot analysis using protein extracts from roots under dehydration (5 μg) of WT and dSpm mutants. D) Western blot analysis using total protein extracts (10 μg
Figure 5. Accumulated germination percentage of wild type (WT) and AtLEA4 transgenic lines under optimal growth conditions or under stress. Germination was quantified by radicle emergence using seeds of homozygous lines plated on A) standard MS medium (control), B) MS with 0.35 M mannitol (osmotic stress) or C) MS with 0.25 M NaCl (ionic + osmotic stress). Transgenic lines used were: 35S::AtLEA4-5::NOS construction in WT background (OE 4-5) and 35S::AtLEA4-5::NOS construction in dSpm mutant background (dSpm Compl). Also, the insertion mutant in the AtLEA4-5 gene (dSpm 4-5) and the double mutant in AtLEA4-1 and AtLEA4-2 genes silenced with an artificial microRNA construct (a-miR 4-1/2) were analyzed. Seeds were stratified for 3 days and incubated in a growth chamber at 25°C for the indicated time. Error bars indicate standard error of three replicates (n=300), which were fit to a sigmoidal dose-response curve. Significant differences between genotypes were found in three parameters of the curve fit (steepness of the curve, Y value at the top plateau and X value when the response is halfway between bottom and top) at P>0.0011 (A), P<0.0001 (B) and P<0.0001 (C).
Figure 6. Phenotypic analysis of adult plants with altered accumulation levels of AtLEA4 protein family. Seedlings grown in vitro for two weeks were transplanted to a low-water retention substrate and kept under optimum irrigation with nutrient solution until flowering, under greenhouse conditions. Wild type (WT) and homozygous lines were grown in the same pot. Dehydration was followed by loss of water from the substrate and pots were rotated in the tray every two days to maintain uniform water loss during drought treatment. A) Biomass of whole plants under control (well-irrigated plants) or after 6 days of recovery from stress. One-way ANOVA was applied for each treatment to compare the performance of the different lines. This analysis showed significant differences between lines after recovery from drought (P<0.0001). Bars indicate mean ± SE (n=8). B) Number of axillary and floral buds per plant under optimum irrigation (n=4) or after 6 days of recovery from stress (n=8). Significant differences between groups were found using one-way ANOVA under control (well-irrigated plants, P=0.0095) and after recovery from stress (P<0.0001). Bars indicate mean ± SE. Different letters show significant differences between bars (P<0.05) as indicated by Tukey's post-tests. Homozygous lines were used in all experiments: transposon insertion in the AtLEA4-5 gene (dSpm 4-5), PTGS single mutant with RNAi-directed silencing of AtLEA4-5 gene (RNAi 4-5), PTGS double mutants in AtLEA4-1 and AtLEA4-2 genes using an artificial microRNA (a-miR 4-1/2) and the resulting F2 crosses from mutants of RNAi with a-miR (triple mutant). This figure also shows data from lines ectopically overexpressing AtLEA4-5 protein (35S::AtLEA4-5::NOS) in WT
Figure 7. Post-transcriptional gene silencing of AtLEA4 genes using an artificial microRNA (a-miR 4-1/2) to silence AtLEA4-1 and AtLEA4-2, and RNAi to silence AtLEA4-5 transcripts (RNAi 4-5).
A) Northern blot using antisense probe of a-miR 4-1/2 and RNA (20 μg) from homozygous transgenic seedlings grown under optimal conditions to confirm the constitutive expression of mature a-miR 4-1/2. Wild type (WT) and RNAi 4-5 lines (3-5) were used as controls, where a-miRNAs do not accumulate. Reversible staining with methylene blue after transfer was used as loading reference. B) Western blot using specific antibodies for AtLEA4-1 and AtLEA4-2 proteins and total protein extracts (10 μg) from adult plants grown under drought to show their accumulation levels in WT and in homozygous silenced plants (2-4). F: flower, L: leaf, R: root. Arrows show the proteins migrating with the expected molecular mass for the corresponding monomer size; the arrowhead shows the higher molecular mass band specifically detected with AtLEA4-2 antibodies. The selected lines for further phenotypic analysis were 3 and 4. C) Western blot using antibodies against AtLEA4-5 protein and total protein extracts (10 μg) from homozygous RNAi 4-5 seedlings (1-5) grown in vitro for two weeks and immersed in liquid MS, where they were treated for 8 h without (C) or with 25% PEG solution (S, stress), showing different silencing levels. The selected lines for further phenotypic analyses were those showing the lowest AtLEA4-5 protein accumulation (3, 4 and 5). Reversible staining with Ponceau red after transfer was used as loading reference in B and C.
Total seed production of AtLEA4 single (RNAi 4-5, dSpm 4-5), double (a-miR 4-1/2) and triple mutants grown under optimal irrigation. Plants were germinated in vitro and transplanted to develop and set seeds under optimum irrigation conditions. Seeds from wild type (WT) and two independent homozygous lines from each construct used in this study were harvested during the entire productive cycle until senescence. The transgenic lines overexpressing AtLEA4-5 gene in the dSpm mutant (dSpm 4-5) and wild type backgrounds (4-5 OE) are shown as controls. Bars indicate mean ± SE (n=5). The numbers in parentheses show the lines selected from each construction (as indicated in Figures 4 and 7). Significant differences among genotypes were determined by one-way ANOVA (P<0.0001) statistical analysis. Different letters show significant differences between groups as indicated by Dunnett's post-tests (P<0.05).