Plant Physiol, March 2003, Vol. 131, pp. 1209-1219

Arabidopsis Proteins Containing Similarity to the Universal Stress Protein Domain of Bacteria


David Kerk,* Joshua Bulgrien, Douglas W. Smith, and Michael Gribskov

Department of Biology, Point Loma Nazarene University, 3900 Lomaland Drive, San Diego, California 92106 (D.K., J.B.); Division of Biology, 0116, University of California, San Diego, La Jolla, California 92093-0116 (D.W.S.); and San Diego Supercomputer Center, 0505, University of California, San Diego, La Jolla, California 92093-0505 (M.G.)


    ABSTRACT

We have collected a set of 44 Arabidopsis proteins with similarity to the USPA (universal stress protein A of Escherichia coli) domain of bacteria. The USPA domain is found either in small proteins, or it makes up the N-terminal portion of a larger protein, usually a protein kinase. Phylogenetic tree analysis based upon a multiple sequence alignment of the USPA domains shows that these domains of protein kinases 1.3.1 and 1.3.2 form distinct groups, as do the protein kinases 1.4.1. This indicates that their USPA domain structures have diverged appreciably and suggests that they may subserve distinct cellular functions. Two USPA fold classes have been proposed: one based on Methanococcus jannaschii MJ0577 (1MJH) that binds ATP, and the other based on the Haemophilus influenzae universal stress protein (1JMV), highly similar to E. coli UspA, which does not bind ATP. A set of common residues involved in ATP binding in 1MJH and conserved in similar bacterial sequences is also found in a distinct cluster of Arabidopsis sequences. Threading analysis, which examines aspects of secondary and tertiary structure, confirms this Arabidopsis sequence cluster as highly similar to 1MJH. This structural approach can distinguish between the characteristic fold differences of 1MJH-like and 1JMV-like bacterial proteins and was used to assign the complete set of candidate Arabidopsis proteins to one of these fold classes. It is clear that all the plant sequences have arisen from a 1MJH-like ancestor.


    INTRODUCTION

The "USPA domain" is a recently identified protein structure now known to be widespread in prokaryotic organisms, both bacterial and archaeal. It is named after the UspA (universal stress protein A) of Escherichia coli, which was originally identified because of its prominence in stationary phase cells. Genetic evidence has subsequently shown that UspA mediates survival of cells starved for a wide variety of nutrients, exposed to toxic chemicals, and exposed to osmotic stress or UV light damage (Nystrom and Neidhardt, 1992, 1993, 1994). It is a Ser and Thr phosphoprotein, which is phosphorylated by the Tyr phosphoprotein TypA (Freestone et al., 1997, 1998). The precise biochemical function of UspA is unknown. However, the fact that it is vital to stationary phase cell growth, and the observation that uspA gene transcription is up-regulated during the shift from Glc to acetate metabolism, has led to the suggestion that it is involved in coordinating this metabolic shift (Nystrom and Neidhardt, 1993; Tao et al., 1999). The first solved structure in the USPA domain family (1MJH) was obtained using the Methanococcus jannaschii protein MJ0577 (Zarembinski et al., 1998). This protein surprisingly contained a bound molecule of ATP in the crystal structure. Biochemical analysis showed that although MJ0577 binds ATP, it can only hydrolyze it in the presence of uncharacterized additional proteins from an M. jannaschii crude cell extract. It was suggested that MJ0577 mediates an ATP-dependent function, such as acting as a molecular switch. However, no cellular role has been discovered for this protein. More recently, the structure of the universal stress protein of Haemophilus influenzae (1JMV), which has 68% sequence similarity to E. coli UspA, has been solved (Sousa and McKay, 2001). In contrast to MJ0577, this protein cannot bind ATP. 
This has led to the suggestion that there are two distinct folds encompassing the USPA domain family: one exemplified by the ATP-binding structure of 1MJH and the other exemplified by the non-ATP-binding structure of 1JMV.

The USPA domain appears to be part of an ancient protein structural family. Zarembinski et al. (1998) noted in their characterization of the 1MJH structure that it has significant similarities to other solved structures: human electron transport flavoprotein, DNA photolyase, and tyrosyl-tRNA synthetase. These initial observations have recently been confirmed and extended by Aravind et al. (2002), who have performed an extensive evolutionary analysis. Based on patterns of conserved structural features, they propose that the USPA domain is part of a larger protein structural family, whose members had already diversified and were present in the last universal common ancestor of all extant life. They suggest that the ancestral function of the USPA domain was nucleotide binding and signal transduction.

We have performed extensive database searches utilizing techniques exploiting primary sequence similarity and higher order structural features to assemble a set of sequences in Arabidopsis that encode USPA domains. Our analysis has revealed both free-standing small proteins reminiscent of those found in bacteria and larger proteins where the USPA domain comprises but a part of the total sequence. Our evidence indicates that all Arabidopsis USPA domain-containing sequences have evolved from a 1MJH-like ancestor.


    RESULTS

Repetitive database searches with conserved sequence motifs and sequence profiles produced a set of 44 Arabidopsis proteins plus prokaryotic proteins, each containing domains with similarity to the bacterial USPA domain. A multiple sequence alignment is presented in Figure 1. The alignment is annotated with features of secondary structure revealed in the structure of 1MJH (Zarembinski et al., 1998). This consists of five beta strands, alternating with four alpha helices. A number of conserved blocks of hydrophobic sequence are readily apparent that correspond to residues in beta 1, alpha 1, beta 2, beta 3, alpha 3, beta 4, and beta 5. Five sequences, identified in the legend to Figure 1, were dropped from the alignment because they were truncated and did not extend through all the known secondary structure elements.



Figure 1.   Multiple sequence alignment (MSA) of the USPA domains of the USPA proteins. See text for identification and discussion of these sequences. Arabidopsis sequence numbers from The Institute for Genomic Research (TIGR) are used for plant sequences; otherwise, National Center for Biotechnology Information (NCBI), Protein Data Bank, or Swiss-PROT names are used for prokaryotic sequences; NCBI gi numbers are given in Figure 3. Three shades of gray are used to designate amino acids conserved at a given position in the multiple sequence alignment to at least 85%: darkest shade, hydrophobic residues (AMILVCWF); middle shade, Gly (G); and lightest shade, acidic residues (DE). Shading was done using ClustalX (Thompson et al., 1997). Secondary structure elements and ATP-binding residues shown are from the structure of 1MJH (Zarembinski et al., 1998). The following sequences were dropped from the data set at the alignment stage, because they were truncated (did not contain all the known secondary structure elements): At5g49050, At5g57680, At2g45910, At3g58450, and At5g61560. Sequences At3g58450 and At5g49050 were identified independently by NCBI and posted in the Conserved Domain Database (CDD). Sequence At3g58450 was independently identified and posted by Inter-Pro.
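The shading rule in the Figure 1 legend (a column is shaded when at least 85% of the sequences carry a residue from one class) can be illustrated with a short sketch. This is not the ClustalX implementation; the function name, the handling of gap characters (counted in the denominator), and the toy alignment are assumptions made for illustration only.

```python
# Residue classes as listed in the Figure 1 legend.
CLASSES = {
    "hydrophobic": set("AMILVCWF"),
    "glycine": set("G"),
    "acidic": set("DE"),
}


def shaded_columns(alignment, threshold=0.85):
    """Return {column index: class name} for every column in which at least
    `threshold` of the sequences carry a residue from a single class.

    Assumption: gaps ('-') count toward the number of sequences in a column.
    """
    if not alignment:
        return {}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    shaded = {}
    for col in range(length):
        column = [seq[col] for seq in alignment]
        for name, residues in CLASSES.items():
            fraction = sum(res in residues for res in column) / len(column)
            if fraction >= threshold:
                shaded[col] = name
                break  # the classes are disjoint, so one match is enough
    return shaded


# Hypothetical toy alignment, not data from the paper.
toy_alignment = ["IVGD-", "LVGEA", "MAGDK", "VIGES"]
print(shaded_columns(toy_alignment))
```

In the toy alignment the first two columns are fully hydrophobic, the third is all Gly, the fourth is all acidic, and the last column falls below the threshold and stays unshaded.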

The residues in the structure of 1MJH that make contact with the bound molecule of ATP have been identified. We examined these positions in the alignment, their conservation in closely related bacterial proteins, and their representation in Arabidopsis sequences.

Position D13 in 1MJH (ILYPTD) is involved in coordinating a Mn2+ ion, which in turn binds to the phosphate groups. This residue is conserved in most of the bacterial and Arabidopsis sequences. The V at position 41 of 1MJH (VILLHV) hydrogen bonds to adenine. This position is conserved in many of the aligned sequences. The G at position 127 of 1MJH (IIIMG) hydrogen bonds to Rib. This is present in nearly all the sequences. Position G130 (IMGSHG) hydrogen bonds with the beta phosphate and is conserved in the small group of MJH-like bacterial sequences (YXIE_BACSU, 1MJH, MJ0531, and MTH993) and a subset of Arabidopsis proteins. The G at position 140 in 1MJH (GSVTEN) does not bind to ATP, but it is conserved in all the MJH-like bacteria, suggesting it may be important. It may serve to position subsequent residues in the proper binding conformation. It is conserved in a subset of the Arabidopsis sequences. Position S141 in 1MJH (GSVTEN) hydrogen bonds to the gamma phosphate and is conserved in the MJH-like bacterial sequences and a subset of the Arabidopsis sequences. The V at position 142 in 1MJH (GSVTEN) is conserved in the MJH-like bacteria and in a subset of the Arabidopsis sequences. In 1MJH, T at position 143 (GSVTEN) hydrogen bonds to the alpha phosphate. This residue is replaced by A or S in the MJH-like bacterial sequences and by S in a subset of Arabidopsis sequences. Table I summarizes the analysis of the subset of Arabidopsis sequences with the best conservation of amino acid residues known to be functionally important in 1MJH or conserved in its closest bacterial relatives. Five sequences (At1g11360, At3g03270, At3g11930, At4g27320, and At5g54430) contain all of them.
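The residue-by-residue comparison described above can be sketched as a simple lookup. The positions and accepted residues below are taken from the text (with T143 allowed to be A or S, as in the 1MJH-like bacterial sequences); the function names and the second example query are hypothetical, and the mapping from alignment columns to 1MJH positions is assumed to have been done already.

```python
# 1MJH positions discussed in the text and the residues accepted at each.
# T143 is replaced by A or S in the 1MJH-like bacterial sequences.
EXPECTED = {
    13: {"D"},
    41: {"V"},
    127: {"G"},
    130: {"G"},
    140: {"G"},
    141: {"S"},
    142: {"V"},
    143: {"T", "A", "S"},
}


def conserved_positions(residues):
    """Map each 1MJH position to True/False for a query sequence.

    `residues` maps a 1MJH position to the query residue aligned there.
    A position missing from `residues` (for example a gap) counts as
    not conserved.
    """
    return {pos: residues.get(pos) in allowed for pos, allowed in EXPECTED.items()}


def has_full_signature(residues):
    """True when every position in EXPECTED is conserved."""
    return all(conserved_positions(residues).values())


# 1MJH itself, using the residues named in the text.
mjh = {13: "D", 41: "V", 127: "G", 130: "G", 140: "G", 141: "S", 142: "V", 143: "T"}

# Hypothetical query: S at position 143, nothing aligned at position 130.
query = {13: "D", 41: "V", 127: "G", 140: "G", 141: "S", 142: "V", 143: "S"}

print(has_full_signature(mjh))
print(conserved_positions(query))
```

A table such as Table I is essentially the output of `conserved_positions` collected over all candidate sequences.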


                              
Table I.   Analysis of Arabidopsis USPA domain proteins for residues conserved in 1MJH-like bacterial sequences

Figures 2 and 3 present alternative renderings of a phylogenetic tree constructed from the multiple sequence alignment of Figure 1. Several distinct groups are revealed by this analysis. The Arabidopsis sequences presented in Table I are designated "1MJH like, plant," are shown in bold in Figures 2 and 3, and their branches are thickened. A striking aspect of Figure 3 is the concentration of hits for various types of bacterial USPA domain models with the Arabidopsis sequences in this group, as compared with the sequence set as a whole. This indicates that a substantial degree of sequence conservation is readily detectable between the Arabidopsis sequences and their bacterial counterparts.



Figure 2.   Radial phylogenetic tree from USPA protein sequence comparisons. Branch lengths are in arbitrary units. Sequence names are those used in Figure 1. See text for definitions and discussion appropriate to the clustered groupings shown. Names of Arabidopsis sequences from Table I are in bold, and branches are broadened. Only branches with ClustalW neighbor-joining (NJ) bootstrap values of 40% or higher are shown.



Figure 3.   Topographic cladogram with additional information for USPA protein sequences. Branch lengths for the cladogram are unit length. Representative bootstrap values are shown; the value above the line is the ClustalW NJ value, and the value below the line is the parsimony value (see "Materials and Methods"). An appropriate NCBI gi number is provided for each taxon. The PlantsP plant phosphorylation database (Gribskov et al., 2001) identification number is shown for all plant kinase sequences. The availability of expressed sequence tags (Yes/No) and the number are indicated if "Yes," and the availability of a full-length cDNA is indicated (Yes/No). The sequence names shown correspond to those shown in Figure 1, and cluster designations correspond to those shown in Figure 2. Sequences At5g20310 and At3g61410 are denoted S for short; these sequences also are not present in the PlantsP database because they are not kinases (see text) and, hence, are designated NA for not applicable. Names of Arabidopsis sequences from Table I are in bold, and branches are broadened. Only branches with ClustalW NJ bootstrap values of 40% or higher are shown. Sequences designated with the letters "a," "b," and/or "c" were independently identified and posted in the NCBI CDD, Inter-Pro (IPR006015), and Pfam (PF00582) public databases, respectively.

However, despite their highly suggestive pattern of conserved residues, and the clear similarity with bacterial USPA domain models, this group of sequences nevertheless fails to form a single well-supported clade when subjected to conventional phylogenetic tree analysis. A cluster of seven sequences (At4g27320, At5g54430, At3g21210, At1g11360, At3g53990, At3g03270, and At3g17020) receives moderate bootstrap support (735/1,000 NJ, 95/200 parsimony) and features more strongly supported subclusters. Two other clusters are apparent in the NJ tree, but these each receive weak support (At3g01520, At5g14680, and At1g68300: 427/1,000 NJ and 66/200 parsimony; At3g11930, At2g47710, At1g09740, and At3g62550: 450/1,000 NJ and 41/200 parsimony). These are short sequences in which the USPA domain comprises essentially the whole protein. This architecture is very reminiscent of the bacterial proteins initially characterized with this domain.

A second group of sequences (Figs. 2 and 3; "Small Plant" cluster) containing At2g03720, At1g69080, At5g17390, At3g03290, and At1g44760 receives moderate bootstrap support in NJ (626/1,000), weaker support in parsimony (51/200), and also comprises small proteins. Inspection of the alignment shows that these sequences are considerably more divergent and fail to contain a number of the functionally important residues characterized in the bacterial sequence 1MJH.

Three groups are formed that consist primarily of protein kinases, where the USPA domain resides at the N terminus of the sequence, upstream of the kinase domain (in the analysis presented here, the kinase domain was identified and removed before USPA domain analysis). One cluster (Figs. 2 and 3; protein kinase 1.4.1 cluster) with strong bootstrap support in NJ (995/1,000 NJ) and somewhat weaker support in parsimony (77/200) consists of kinases of the 1.4.1 (receptor-like cytoplasmic kinase IX) family: At1g16760, At1g78940, At3g20200, At2g24370, At4g31230, At2g07020, At5g35380, At1g72760, At1g17540, At5g12000, At5g26150, At4g25160, and At5g61550. This cluster also contains one small protein: At5g20310. In the NJ tree, but not in the parsimony tree, this cluster contains an additional 1.4.1 kinase (At3g49060) paired with the small sequence At3g61410 (in the parsimony tree, this pair is preserved but placed elsewhere). Another cluster (Figs. 2 and 3; protein kinase 1.3.2 cluster) with strong bootstrap support (1,000/1,000 NJ; 196/200 parsimony) contains kinases of the 1.3.2 (receptor-like cytoplasmic kinase VI) family: At1g77280, At1g21590, and At5g63940. There is also a pair of sequences (Figs. 2 and 3; protein kinase 1.3.1 cluster) of the 1.3.1 (Pro-rich receptor kinase) family: At3g13690 and At1g55200. These cluster together with 1.3.2 kinases in the NJ tree with weak NJ bootstrap support (366/1,000); however, they are placed elsewhere in the parsimony tree.

The topology of the clustering of bacterial sequences differs slightly between the two phylogenetic tree inference methods. Figures 2 and 3 show the NJ topology, where there are two well-supported clusters of bacterial sequences. 1MJH, together with MJ0531, MTH993, and YXIE_BACSU, forms one group (926/1,000) that is in agreement with previously published work (Zarembinski et al., 1998). USPA_ECOLI and 1JMV cluster tightly, which also confirms published work (Sousa and McKay, 2001). Together with YECG_ECOLI, these sequences form another well-supported group (998/1,000). In the parsimony tree, all the bacterial sequences form a single cluster with good bootstrap support (150/200). Finally, in the NJ tree, the Arabidopsis sequences At4g13450 and At3g25930 are associated with these bacterial sequences. The bootstrap support for these associations is not strong, however, and close examination of the alignment reveals no obvious support for this placement. Our conclusion is that this represents noise in the analysis. This is reinforced by the observation that these Arabidopsis sequences are placed elsewhere in the parsimony tree.
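Because the NJ support values are quoted out of 1,000 replicates and the parsimony values out of 200, it can help to put both on a percentage scale before comparing them. The sketch below does only that arithmetic for the clusters whose counts are given in the text; the short labels and helper names are assumptions chosen for illustration.

```python
NJ_REPLICATES = 1000        # NJ support is quoted as n/1,000
PARSIMONY_REPLICATES = 200  # parsimony support is quoted as n/200

# (label, NJ count, parsimony count), counts as quoted in the text.
SUPPORT = [
    ("1MJH-like cluster of seven", 735, 95),
    ("At3g01520 / At5g14680 / At1g68300", 427, 66),
    ("At3g11930 / At2g47710 / At1g09740 / At3g62550", 450, 41),
    ("Small Plant", 626, 51),
    ("protein kinase 1.4.1", 995, 77),
    ("protein kinase 1.3.2", 1000, 196),
]


def as_percent(count, total):
    """Convert a bootstrap count into a percentage of replicates."""
    return 100.0 * count / total


def support_percentages(rows):
    """Return (label, NJ %, parsimony %) for each cluster."""
    return [
        (label, as_percent(nj, NJ_REPLICATES), as_percent(pars, PARSIMONY_REPLICATES))
        for label, nj, pars in rows
    ]


for label, nj_pct, pars_pct in support_percentages(SUPPORT):
    print(f"{label}: NJ {nj_pct:.1f}%, parsimony {pars_pct:.1f}%")
```

On this scale the cluster of seven sits at 73.5% (NJ) and 47.5% (parsimony), whereas the 1.3.2 kinase cluster reaches 100% and 98%.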

The USPA containing proteins are clearly divergent at the primary sequence level. This is apparent from inspection of the alignment in Figure 1 and is reflected in the weak or moderate bootstrap support for many of the clusters identified and the occasional topological differences between the NJ and parsimony trees. To confirm the Arabidopsis sequences as USPA domain candidates, protein three-dimensional structure threading analysis was performed as described in "Materials and Methods." In this approach, we used as a reference the solved crystal structures of 1MJH (Zarembinski et al., 1998) and 1JMV (Sousa and McKay, 2001). Results are presented in Table II. Initial results with the bacterial sequence set showed that the techniques are able to distinguish between the subtle difference in the conformations of the folded structure of 1MJH and 1JMV. The 123D+ method (Alexandrov et al., 1995) correctly assigns a superior Z score for the compatibility of MJ0531, MTH993, and YXIE_BACSU with the structure of 1MJH. It also correctly assigns USPA_ECOLI and YECG_ECOLI higher scores for compatibility with the structure of 1JMV. The 3D-PSSM method (Kelley et al., 2000) also makes correct fold assignment scores for the 1MJH-like group. However, it misassigns a higher score to YECG_ECOLI for compatibility with 1MJH rather than 1JMV. For both methods, there was a strong preference for assigning the highest scores to the USPA structures 1MJH or 1JMV among the various structures represented in the fold libraries. Thus, for all Arabidopsis query sequences except one, the top two hits returned from the fold libraries were always these two. Thus, the threading methods could readily distinguish a preference of the Arabidopsis sequences for these folds, as opposed to other alpha/beta structures present in the fold libraries. For the 123D+ method, all Arabidopsis sequences produced a stronger hit with the structure of 1MJH than with 1JMV. 
When the stronger score for the pair is considered for each Arabidopsis sequence, the Z scores range from 6.96 to 23.36, with only 3/44 being less than 9.0. The mean Z score of all the Arabidopsis sequences with the 1MJH structure (15.35 ± 5.14 [SD]) is significantly greater than the mean Z score with the 1JMV structure (12.18 ± 4.26; P = 0.0011). The set of Arabidopsis sequences described in Table I as sharing critical conserved residues with the structure of 1MJH, and designated in Figures 2 and 3 as "1MJH like," are in bold in Table II. It is apparent by simple inspection that these sequences have very high 123D+ Z scores when compared by higher order structural criteria with the solved structure of 1MJH. In fact, the mean Z score for these sequences is 21.65 ± 1.31. This score is nearly identical to the mean score of the various bacterial sequences with respect to their "parent" sequence (22.16 ± 2.13) and is significantly higher than that of the remainder of the USPA sequences (12.41 ± 3.22; P = 3.81e-17). This analysis confirms the cluster of Arabidopsis sequences initially defined by the consideration of conserved putative ATP-binding residues presented in Table I and extends the region of high structural similarity to the entire molecule.
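The comparison of mean Z scores can be reproduced approximately from the summary statistics alone. The text does not state which statistical test was used, so the sketch below assumes an unequal-variance (Welch) t statistic and a sample size of 44 for each group; both are assumptions, and the result is only an illustration of the arithmetic, not a reconstruction of the reported P value.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom
    computed from group means, standard deviations, and sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


N = 44  # assumption: all 44 Arabidopsis sequences were scored against both structures

# Means and SDs as reported: 1MJH 15.35 +/- 5.14, 1JMV 12.18 +/- 4.26.
t_stat, dof = welch_t(15.35, 5.14, N, 12.18, 4.26, N)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Because the same sequences are threaded against both structures, a paired test on the per-sequence score differences would be another reasonable choice; that requires the individual scores from Table II rather than the summary values.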


                              
Table II.   Threading scores for bacterial USPA proteins and Arabidopsis candidates

With the 3D-PSSM method, the best score for each structure was at the 95% confidence level for all the Arabidopsis sequences except At3g13690, where the best hit was at the 90% level, and At1g55200, where it was at the 80% level. Two sequences identified in the legend to Table II were excluded from the analysis for failing to produce hits to the bacterial structures at the 50% confidence level. For most of the Arabidopsis sequences, the 3D-PSSM method is in agreement with the 123D+ method, assigning a higher score to the hit with the structure of 1MJH than to that of 1JMV. There are six exceptions to this, where the stronger score was assigned to the hit with the 1JMV structure: At5g63940, At3g21210, At1g72760, At3g03290, At4g25160, and At3g49060. A consideration of the clustering pattern in the phylogenetic tree makes it very unlikely that any of these represent accurate results, however. In each case, one of these sequences is placed in a highly supported cluster with one or more other sequences where both threading methods agree in assigning the higher score to the 1MJH hit. We consider it much less likely that two sequences from two different lineages would converge in such a fashion than that the 3D-PSSM method has erroneously assigned an incorrect score. In summary, the threading data strongly support the members of the Arabidopsis sequence set as valid USPA domain candidates, they show that the sequences have the highest compatibility with the 1MJH structure, and they support the presence of a cluster of sequences with particularly high structural similarity to 1MJH.

All Arabidopsis sequences were examined for additional conserved protein domains through the use of the NCBI CDD (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml). Three of the family 1.4.1 protein kinases contain detectable U box domains at their extreme C termini: At5g26150 (E = 3e-18), At4g25160 (E = 3e-18), and At5g61550 (E = 2e-16). Sequence At3g49060, also a 1.4.1 kinase, contains a U box (E = 3e-20) downstream of the kinase domain but upstream of a Myosin-tail domain (E = 6e-04) that lies at the C terminus. This C-terminal region obtains high scores in methods that predict likely coiled-coil conformation: Coils (http://www.ch.embnet.org/software/COILS_form.html; Lupas et al., 1991) and MultiCoil (http://multicoil.lcs.mit.edu/cgi-bin/multicoil; Wolf et al., 1997). Finally, sequence At3g21210 represents a peculiar case. It is a long protein with a USPA domain at the amino terminus. There is no kinase domain detectable by profile methods. This sequence contains the conserved stretch GSVS corresponding to residues 140 to 143 of 1MJH. It is placed in the tree in the assemblage of "1MJH-like sequences" cited in Table I. In support of this, it receives a high Z score from 123D+ for threading with the 1MJH structure (19.16). However, an examination of the alignment shows that it has apparently suffered a deletion of most of the residues between the G corresponding to G127 of 1MJH and G140.


    DISCUSSION

Arabidopsis proteins that contain the USPA domain represent a sufficiently diverse group of sequences that they cannot be identified using primary sequence comparisons alone. Simple pair-wise comparisons with database sequences (e.g. BLAST) do not have sufficient resolving power to identify these sequences with significant scores utilizing bacterial USPA sequence queries. Rather, alternative approaches that use the combined power of multiple sequence-based models (machine learning-based motifs, position-specific scoring matrices, and Hidden Markov Models [HMMs]) were necessary to find them. Conventional phylogenetic analysis based upon sequence comparison often failed to produce robust clustering patterns with high bootstrap support. However, meaningful patterns could be discerned within this sequence set when approaches were used that probed additional levels of protein structure, drawing upon the resource of solved three-dimensional USPA domain structures. When patterns of conservation of critical residues known to be involved in a common biochemical function, namely ATP binding, were examined, a set of Arabidopsis sequences emerged, the "1MJH-like" sequences (see Figs. 2 and 3), which clearly share these critical features with the bacterial structure 1MJH. When threading algorithms were applied, which draw upon conservation of secondary and tertiary structural features, this collection of Arabidopsis sequences was strongly supported as a distinct group. This general approach of examination of protein structural features meaningfully extends analyses using only sequence comparisons.
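To illustrate why a position-specific scoring matrix can be more sensitive than a single pair-wise comparison, the sketch below builds a small log-odds matrix from a handful of aligned motif instances and slides it along a query. It is a teaching example only, not the software used in this study; the uniform background, the pseudocount, the function names, and all sequences other than the 1MJH stretch GSVTEN are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(motifs, pseudocount=1.0):
    """Build a log2-odds scoring matrix from equal-length motif instances.

    Assumptions: uniform background frequencies and a flat pseudocount.
    """
    width = len(motifs[0])
    if any(len(m) != width for m in motifs):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for motif in motifs:
            counts[motif[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return pssm


def best_window(pssm, sequence):
    """Return (score, start) of the highest-scoring window in `sequence`."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the scoring matrix")
    windows = []
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        windows.append((score, start))
    return max(windows)


# GSVTEN is the 1MJH stretch quoted in the text; the two variants and the
# query are hypothetical examples of the T-to-S and T-to-A replacements.
motif_instances = ["GSVTEN", "GSVSEN", "GSVAEN"]
pssm = build_pssm(motif_instances)

score, start = best_window(pssm, "MKLIGSVSENAA")
print(f"best window starts at index {start} with score {score:.2f}")
```

The matrix tolerates variation at the fourth motif position while still rewarding the invariant positions, which is the property that lets profile-style models find divergent family members.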

The solved USPA domain structures (1MJH and 1JMV) show that they fold into an alpha/beta conformation featuring a slightly twisted planar surface of several parallel beta strands, with alpha helices adjacent on either side of this surface (Zarembinski et al., 1998; Sousa and McKay, 2001). Structural comparison methods have revealed that the two structures are readily superimposable over most of their length: beta 1, alpha 1, beta 2, beta 3, alpha 3, beta 4, and beta 5 (Sousa and McKay, 2001). These regions of similarity are readily detectable in the bacterial sequences presented in the alignment of Figure 1. The similarity of the Arabidopsis sequences to the bacterial sequences in these regions is also evident from the alignment. That the Arabidopsis sequences are likely to be able to assume this specific set of secondary structural elements is indicated by the strong scores in the threading analysis. The solved bacterial structures are multimers, consisting of two (1MJH) or four protomers (1JMV) that appear to interact with each other at beta 5. This is confirmed by biochemical evidence consistent with a dimer conformation of the native bacterial proteins (Freestone et al., 1997; Sousa and McKay, 2001). Conserved hydrophobic residues in beta 5 apparently mediate this protomer interaction. An examination of the alignment data shows that the five Arabidopsis sequences presented in Table I with the highest degree of conservation with the structure of 1MJH (At3g11930, At3g03270, At1g11360, At5g54430, and At4g27320) all possess a highly conserved hydrophobic stretch in beta 5 that would be consistent with such an interaction. This, plus the fact that these are small proteins similar to their bacterial counterparts, suggests that these proteins may also assume a dimer association.

There are distinctive differences between the structures of the solved bacterial proteins. These occur in alpha 2, alpha 4, and the loop between beta 4 and alpha 4. The 1MJH structure contains a long alpha 2 helix. In contrast, the 1JMV structure has a short alpha 2 helix, corresponding to the distal portion of the helix of 1MJH. The proximal portion of alpha 2 in 1JMV apparently has a degree of conformational flexibility, assuming a more disordered conformation in three of the four protomers in the crystal, and a short beta 2' strand in the fourth. Finally, helix alpha 4 is somewhat shorter in 1MJH than it is in 1JMV (Sousa and McKay, 2001). Two threading methods were used in this study to indicate the degree of compatibility between query sequences and solved USPA structures. Detailed examination of the predicted secondary structures obtained with analysis of the bacterial sequences showed that neither method was entirely successful in modeling all the subtle structural variations that exist between the 1MJH and 1JMV structures (data not shown). However, there was sufficient discriminating ability to result in correct overall relative score assignments, particularly for the 123D+ method.

The most distinctive difference between the solved bacterial structures, however, occurs in the region between beta 4 and alpha 4, and at the beginning of alpha 4. In 1MJH, there is a long loop, and residues in that loop and adjacent to it on either end participate in binding interactions with the Rib and phosphate groups of ATP. In contrast, this loop region is shorter in 1JMV, lacks the distal binding residues beyond the G at the end of beta 4, and cannot bind ATP (Sousa and McKay, 2001). The combination of primary sequence alignment and strong threading Z scores in 123D+ agree in defining a set of small Arabidopsis proteins that appear to have descended from a 1MJH-like ancestor and to have retained sequence features consistent with a possible ATP-binding function. However, the T143 of 1MJH is replaced by S in all these plant sequences. In other 1MJH-like bacterial sequences, this T residue is also replaced by S (YXIE_BACSU) or A (MJ0531 and MTH993). It will take biochemical experiments to determine the ATP-binding capabilities of these "1MJH-like" proteins. The remainder of the threading data are in overall agreement that all of the Arabidopsis sequences identified in this study have descended from a 1MJH-like ancestor.

It is apparent from the structure of the phylogenetic tree (Figs. 2 and 3) that the sequences of the 1.4.1, 1.3.1, and 1.3.2 kinases have diverged considerably from those of the small Arabidopsis proteins. This suggests that the USPA domains of these proteins serve distinct cellular functions from each other and from the smaller proteins. The Arabidopsis sequence At5g20310 (and also possibly At3g61410) represents a small protein that is contained within the fairly strongly supported cluster together with the 1.4.1 kinases. The simplest explanation for these results is that this sequence has lost a kinase domain as a secondary event after acquisition by a kinase ancestor of the USPA domain. Finally, the remaining small proteins show sequence features that indicate considerable divergence. They lack the conserved residues necessary for ATP binding beyond the conserved G at the end of beta 4.

There is fragmentary evidence concerning the possible function of the small Arabidopsis USPA proteins in vivo, some of it consistent with a role in stress responses. Hohnjec et al. (2000), in a study of genes differentially expressed in the nodules of legumes, identified the broad bean (Vicia faba) nodulin protein VfENOD18. Their analysis showed this protein to have primary sequence similarity to that of 1MJH and its relatives, including conserved ATP-binding residues. They postulated that this protein might mediate an ATP-dependent function. Using BLAST searches against Arabidopsis expressed sequence tags, they identified three similar sequences ("AtE1," "AtE3," and "AtE6"), which turned out to be identical to proteins At3g53990, At3g17020, and At3g03270, respectively. These are among our set of best conserved "1MJH-like" sequences (Table I). Zegzouti et al. (1999) performed a study of ethylene-induced gene expression in tomato (Lycopersicon esculentum), using differential display techniques. Ethylene is a plant hormone with well-documented effects on physiological processes such as fruit ripening, senescence, cell elongation, and leaf abscission, as well as on responses to environmental stresses such as pathogens, wounding, and desiccation (Abeles et al., 1992; Lelievre et al., 1997). Zegzouti et al. (1999) identified several transcripts up-regulated by ethylene in various tissues. ER (ethylene responsive) 6 was a transcript that showed "moderate" up-regulation in response to ethylene in late fruit ripening but had a constitutive pattern of expression in leaves and roots. The Arabidopsis homolog to protein ER6, At1g09740, is a member of our data set. It also belongs to the highly conserved "1MJH-like" set of small USPA protein sequences. Other members of this "1MJH-like" group have been annotated with suggested functions because of similarity detected by BLAST searches. These include: At3g17020, At3g53990, and At3g62550 ("putative ER6" or "ER6 like"); At3g11930 ("ethylene responsive"); At2g21620 ("auxin regulated"); and At3g21210 ("CHP-rich zinc finger protein"). It should be emphasized that no experimental data have been reported to support these structural assignments. Finally, Yamaguchi-Shinozaki et al. (1992) identified a set of transcripts that were induced by desiccation (RD [desiccation responsive]) in Arabidopsis. The transcript for their clone "RD2" appeared in response to fairly severe desiccation (7 h, by which time the plants had lost more than 75% of their initial weight). RD2 corresponds to sequence At2g21620. This is a small USPA protein that shares some of the sequence features of the "1MJH-like" set in the crucial region responsible for binding to the ribose and phosphates of ATP: it has the conserved residues corresponding to 1MJH G130 and V142 but lacks the other conserved residues. In both the NJ and parsimony trees, it is placed close to the "1MJH-like" sequence set, but the low bootstrap support reflects its divergence.

The function of the USPA-containing protein kinases in Arabidopsis is more enigmatic. Several sequences in families 1.3.1 and 1.3.2 are annotated as being similar to the Pto protein or the Pto interactor protein Pti1. These are protein kinases that in tomato form part of the well-studied signaling pathway that mediates resistance to bacterial speck disease (Sessa and Martin, 2000). We performed BLAST searches with the family 1.3.1 and 1.3.2 kinase sequences and found hits to Pto with E values of approximately e-50, and hits to Pti1 with E values of approximately e-60, indicating a high degree of sequence similarity and probable common ancestry. In each case, the hits with the 1.3.1 kinases were slightly stronger than those with the 1.3.2 kinases. By way of comparison, the BLAST hit between the tomato Pto protein and the Arabidopsis homolog has an E value of approximately e-109. One sequence of family 1.4.1, At5g61550, is annotated as a putative disease resistance protein. We subjected all the 1.4.1 kinases in our set to BLAST analysis and found hits to Pto with E values of approximately e-45, but no hits to Pti1. Aravind et al. (2002) identified protein kinase "7488259" as being one of several Arabidopsis kinases containing a conserved U box motif. This sequence is identical to At2g45910, a protein kinase of the 1.4.1 family, which was dropped from our data set because the USPA domain it contains is truncated. However, our analysis of the remaining 1.4.1 kinases confirms that several have U box domains. These are modified zinc finger structures that are thought to be important in the process of protein ubiquitination (Aravind and Koonin, 2000) and suggest involvement of 1.4.1 kinases in this activity (Aravind et al., 2002). Sequence At3g49060, another 1.4.1 kinase, contains a C-terminal myosin tail motif in addition to the U box motif. This, plus the high scores in secondary structure prediction methods, indicates that this portion of the molecule probably assumes a coiled-coil conformation, which often mediates protein-protein interactions.

In summary, our structural and phylogenetic analyses indicate that USPA domains from a 1MJH-like ancestor have been dispersed in several sets of Arabidopsis proteins. The available evidence indicates possible roles in stress-related responses, though it is unlikely that any of the Arabidopsis proteins correspond closely to the universal stress proteins of prokaryotes. Our data should provide a basis for the systematic experimental investigation of these proteins in their natural setting.


    MATERIALS AND METHODS

Retrieval of USPA Domain Sequences from Databases

The initial objects of investigation were protein kinases previously classified into groups by maximal linkage (a sequence must match all sequences in a group to a given threshold rather than just matching one) and all-against-all BLAST (Gribskov et al., 2001). Kinases of the 1.3.2 and 1.4.1 families (Gribskov et al., 2001) were analyzed for location of their kinase domains by use of the ProfileScan server (http://hits.isb-sib.ch/cgi-bin/PFSCAN), which searches queries against the HMMs of the Pfam database (Bateman et al., 2000) and the generalized profiles of the Swiss Institute of Bioinformatics (Bucher et al., 1996). This region was removed from the sequences, and the remaining N and C terminal sequence, if more than 20 amino acids, was analyzed further. Later in the process, protein kinases of the 1.3.1 family were identified as described below and similarly processed to remove the kinase domain.
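The domain-removal step above can be illustrated with a minimal Python sketch. It assumes the kinase domain has already been located (for example from a ProfileScan hit) and is given as 1-based, inclusive residue coordinates; the function name, the coordinate convention, and the return format are illustrative assumptions, not the authors' actual procedure.

```python
from typing import List, Tuple

# The text retains flanking sequence only "if more than 20 amino acids".
MIN_FLANK_LENGTH = 20


def flanking_regions(sequence: str, domain_start: int, domain_end: int,
                     min_length: int = MIN_FLANK_LENGTH) -> List[Tuple[str, str]]:
    """Return the N- and C-terminal segments left after excising a domain.

    domain_start and domain_end are 1-based, inclusive positions
    (an assumed convention for this sketch).
    Only segments strictly longer than min_length are returned.
    """
    if not (1 <= domain_start <= domain_end <= len(sequence)):
        raise ValueError("domain coordinates fall outside the sequence")
    n_term = sequence[:domain_start - 1]   # residues before the domain
    c_term = sequence[domain_end:]         # residues after the domain
    flanks = [("N-terminal", n_term), ("C-terminal", c_term)]
    return [(label, seg) for label, seg in flanks if len(seg) > min_length]
```

For a protein whose kinase domain occupies the C-terminal portion, only the N-terminal remainder would pass the length filter and go on to further analysis.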

Sequences were initially subjected to BLASTP (Altschul et al., 1997) searches at NCBI. Hits were observed between some query sequences and the models for USPA domains in the NCBI CDD (CDD5811) and Pfam databases (PF00582). These sequences were then collected into a set and subjected to Multiple Em for Motif Elicitation (MEME; Bailey and Elkan, 1995; http://meme.sdsc.edu/meme/website/meme.html) analysis, which finds conserved motifs using expectation maximization. The NCBI nonredundant database was then searched for sequences with significant similarity using the Motif Alignment and Search Tool (MAST; Bailey and Gribskov, 1998; http://meme.sdsc.edu/meme/website/mast.html). High-scoring sequences with the proper set of conserved motifs were then retrieved, the sequence set was expanded, and the MEME/MAST procedure was iterated until no new high-scoring hits were obtained. Later in the process, ClustalW (Higgins et al., 1996) was used to make a multiple sequence alignment, and a profile was generated with the ProfileMake program at the MotifWeb server (http://motifweb.sdsc.edu/) using the method of Gribskov and Veretnik (1996). The nonredundant protein sequence database was searched, and high-scoring sequences were retrieved and placed into the alignment. Sequences with an acceptable level of similarity as judged by eye were retained. Alignments were manually edited and used to produce new profiles. This database search and profile generation procedure was iterated until no new high-scoring hits were obtained. In some instances, sequences were used as queries in BLASTP searches: at NCBI to discover their degree of similarity to known plant disease resistance proteins and at TIGR (http://www.tigr.org/tdb/e2k1/ath1/) or the PlantsP database (http://PlantsP.sdsc.edu) to discover the standard Arabidopsis identification number (TIGR version 2.0).
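Both search procedures described above share the same control flow: build a model from the current sequence set, search the database, add new high-scoring hits, and repeat until nothing new turns up. The following Python sketch shows only that loop. The `search` callable is a hypothetical stand-in for the model-building and database-search step (it is not a real interface to any of the tools named above), and the `max_rounds` safeguard is an added assumption.

```python
from typing import Callable, Iterable, Set


def iterate_until_converged(seed_ids: Iterable[str],
                            search: Callable[[Set[str]], Set[str]],
                            max_rounds: int = 50) -> Set[str]:
    """Expand a sequence set until a search round returns no new hits.

    search: hypothetical callable that takes the current set of sequence
            identifiers and returns the identifiers of accepted hits.
    max_rounds: safety limit (an assumption; the text gives no such limit).
    """
    collected = set(seed_ids)
    for _ in range(max_rounds):
        hits = search(collected)
        new_hits = hits - collected       # hits not already in the set
        if not new_hits:                  # converged: nothing new was found
            return collected
        collected |= new_hits             # expand the set and search again
    raise RuntimeError("search did not converge within max_rounds")
```

In practice the manual steps mentioned in the text (checking motif content, judging similarity by eye, editing alignments) would sit inside the `search` step, so this loop captures the stopping rule rather than the full workflow.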

Several public databases independently cataloged Arabidopsis USPA domain proteins concurrently with our work. NCBI used position-specific scoring matrices (PSSMs) and posted the results in CDD (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) under accession number CDD5811. HMMs were used by InterPro (http://www.ebi.ac.uk/interpro/) and Pfam (http://pfam.wustl.edu/), and the resulting sequence sets were posted under accession numbers IPR006015 and PF00582, respectively. Identification of these proteins by these resources is indicated in Figure 3.

Threading Analysis

Sequences were subjected to threading analysis, which tests their compatibility with a solved reference folded structure. Two different implementations were used: 3D-PSSM (Kelley et al., 2000; http://www.sbg.bio.ic.ac.uk/~3dpssm/) and 123D+ (Alexandrov et al., 1995; http://123d.ncifcrf.gov/run123D+.html). Each technique utilizes a combination of primary sequence similarity, predicted secondary structure, and amino acid exposure/solvation potentials to arrive at an estimate of compatibility, expressed as an E value and confidence interval (3D-PSSM) or Z score (123D+).

Multiple Sequence Alignment and Phylogenetic Tree Analysis

The structural alignment between amino acid sequences of solved bacterial USPA structures 1MJH (Methanococcus jannaschii protein MJ0577), 1JMV (Haemophilus influenzae universal stress protein), and the universal stress protein of Escherichia coli produced by Sousa and McKay (2001) was used as a starting point. A small collection of bacterial USPA proteins obtained from the literature was added to this alignment by using the profile alignment option of ClustalW, followed by manual editing. The set of Arabidopsis USPA protein candidates was then added to this bacterial sequence alignment using the profile alignment feature of ClustalW, followed by manual editing. Phylogenetic trees were inferred by the NJ algorithm (Saitou and Nei, 1987) as implemented in ClustalW and by maximum parsimony as implemented in PHYLIP (Felsenstein, 1996). Each starting alignment was reshuffled by bootstrap resampling (1,000× for NJ, 200× for parsimony), followed by tree inference, and finally the inference of a consensus tree. The tree topology presented is that generated by NJ, with bootstrap support at critical nodes indicated as a percentage.
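The bootstrap resampling step can be sketched in Python as follows. Each replicate draws alignment columns with replacement, keeping every column intact across all sequences, and produces a pseudo-alignment of the original length. This is a generic illustration of the technique under stated assumptions (an alignment held as a name-to-sequence dictionary, hypothetical function names); it is not the ClustalW or PHYLIP implementation, and tree inference and consensus building are left out.

```python
import random
from typing import Dict, List, Optional


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column choice to every sequence so columns stay intact.
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int,
                         seed: Optional[int] = None) -> List[Dict[str, str]]:
    """Generate n_replicates resampled alignments (e.g. 1,000 for NJ)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

A tree would then be inferred from each replicate, and the percentage of replicate trees containing a given grouping gives the bootstrap support reported at that node.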

Examination of Conserved Protein Domains

Conserved protein domains were examined by using the CDD at NCBI (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) in conjunction with BLAST searches.

    FOOTNOTES

Received October 9, 2002; returned for revision November 3, 2002; accepted December 18, 2002.

1 This work was supported by the National Science Foundation (grant nos. NSF ROA DBI-9975808/PTLOMA and NSF DBI-9975808).

* Corresponding author; e-mail dkerk@ptloma.edu; fax 619-849-2598.

Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.102.016006.


    LITERATURE CITED

  • Abeles FB, Morgan PW, Saltveit ME (1992) Ethylene in Plant Biology, Ed 2. Academic Press, San Diego, pp 1-432
  • Alexandrov NN, Nussinov R, Zimmer RM (1995) Fast protein fold recognition via sequence to structure alignment and contact capacity potentials. In L Hunter, TE Klein, eds, Pacific Symposium on Biocomputing '96. World Scientific Publishing, Singapore, pp 53-72
  • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402
  • Aravind L, Anantharaman V, Koonin EV (2002) Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world. Proteins Struct Funct Genet 48: 1-14
  • Aravind L, Koonin EV (2000) The U box is a modified RING finger: a common domain in ubiquitination. Curr Biol 10: R132-R134
  • Bailey TL, Elkan C (1995) Unsupervised learning of multiple motifs in biopolymers using EM. Machine Learning 21: 51-80
  • Bailey TL, Gribskov M (1998) Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14: 48-54
  • Bateman A, Birney E, Durbin R, Eddy SR, Howe KI, Sonnhammer ELL (2000) The Pfam protein families database. Nucleic Acids Res 28: 263-266
  • Bucher P, Karplus K, Moeri N, Hofmann K (1996) A flexible search technique based on generalized profiles. Comput Chem 20: 3-24
  • Felsenstein J (1996) Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol 266: 418-426
  • Freestone P, Nystrom T, Trinei M, Norris V (1997) The universal stress protein, UspA, of Escherichia coli is phosphorylated in response to stasis. J Mol Biol 274: 318-324
  • Freestone P, Trinei M, Clarke SC, Nystrom T, Norris V (1998) Tyrosine phosphorylation in Escherichia coli. J Mol Biol 279: 1045-1051
  • Gribskov M, Fana F, Harper J, Hope DA, Harmon AC, Smith DW, Tax FE, Zhang G (2001) PlantsP: a functional genomics database for plant phosphorylation. Nucleic Acids Res 29: 111-113
  • Gribskov M, Veretnik S (1996) Identification of sequence patterns with profile analysis. Methods Enzymol 266: 198-211
  • Higgins DG, Thompson JD, Gibson TJ (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol 266: 383-401
  • Hohnjec N, Kuster H, Albus U, Frosch SC, Becker JD, Puhler A, Perlick AM, Fruhling M (2000) The broad bean nodulin VfENOD18 is a member of a novel family of plant proteins with homologies to the bacterial MJ0577 superfamily. Mol Gen Genet 264: 241-250
  • Kelley LA, MacCallum RM, Sternberg MJE (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 299: 499-520
  • Lelievre J-M, Latche A, Jones B, Bouzayen M, Pech JC (1997) Ethylene and fruit ripening. Physiol Plant 101: 727-739
  • Lupas A, Van Dyke M, Stock J (1991) Predicting coiled coils from protein sequences. Science 252: 1162-1164
  • Nystrom T, Neidhardt FC (1992) Cloning, mapping and nucleotide sequencing of a gene encoding a universal stress protein in Escherichia coli. Mol Microbiol 6: 3187-3198
  • Nystrom T, Neidhardt FC (1993) Isolation and properties of a mutant of Escherichia coli with an insertional inactivation of the uspA gene, which encodes a universal stress protein. J Bacteriol 175: 3949-3956
  • Nystrom T, Neidhardt FC (1994) Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Mol Microbiol 11: 537-544
  • Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406-425
  • Sessa G, Martin GB (2000) Protein kinases in the plant defense response. Adv Bot Res 32: 379-398
  • Sousa MC, McKay DB (2001) Structure of the universal stress protein of Haemophilus influenzae. Structure 9: 1135-1141
  • Tao H, Bausch C, Richmond C, Blattner FR, Conway T (1999) Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol 181: 6425-6440
  • Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876-4882
  • Wolf E, Kim PS, Berger B (1997) MultiCoil: a program for predicting two- and three-stranded coiled coils. Protein Sci 6: 1179-1189
  • Yamaguchi-Shinozaki K, Koizumi M, Urao S, Shinozaki K (1992) Molecular cloning and characterization of 9 cDNAs for genes that are responsive to desiccation in Arabidopsis thaliana: sequence analysis of one cDNA clone that encodes a putative transmembrane channel protein. Plant Cell Physiol 33: 217-224
  • Zarembinski TI, Hung L-W, Mueller-Dieckmann H-J, Kim K-K, Yokota H, Kim R, Kim S-H (1998) Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. Proc Natl Acad Sci USA 95: 15189-15193
  • Zegzouti H, Jones B, Frasse P, Marty C, Maitre B, Latche A, Pech J-C, Bouzayen M (1999) Ethylene-regulated gene expression in tomato fruit: characterization of novel ethylene-responsive and ripening-related genes isolated by differential display. Plant J 18: 589-600
© 2003 American Society of Plant Biologists


