Plant Physiol, March 2003, Vol. 131, pp. 1209-1219
Arabidopsis Proteins Containing Similarity to the Universal
Stress Protein Domain of Bacteria
David Kerk,* Joshua Bulgrien, Douglas W. Smith, and Michael Gribskov
Department of Biology, Point Loma Nazarene University, 3900 Lomaland Drive, San Diego, California 92106 (D.K., J.B.); Division of
Biology, 0116, University of California, San Diego, La Jolla,
California 92093-0116 (D.W.S.); and San Diego Supercomputer Center,
0505, University of California, San Diego, La Jolla, California
92093-0505 (M.G.)
ABSTRACT
We have collected a set of 44 Arabidopsis proteins with
similarity to the USPA (universal stress protein A of
Escherichia coli) domain of bacteria. The USPA domain is
found either in small proteins, or it makes up the N-terminal portion
of a larger protein, usually a protein kinase. Phylogenetic tree
analysis based upon a multiple sequence alignment of the USPA domains
shows that these domains of protein kinases 1.3.1 and 1.3.2 form
distinct groups, as do the protein kinases 1.4.1. This indicates that
their USPA domain structures have diverged appreciably and suggests
that they may subserve distinct cellular functions. Two USPA fold
classes have been proposed: one based on Methanococcus
jannaschii MJ0577 (1MJH) that binds ATP, and the other based on
the Haemophilus influenzae universal stress protein
(1JMV), highly similar to E. coli UspA, which does not
bind ATP. A set of common residues involved in ATP binding in 1MJH and
conserved in similar bacterial sequences is also found in a distinct
cluster of Arabidopsis sequences. Threading analysis, which examines
aspects of secondary and tertiary structure, confirms this Arabidopsis
sequence cluster as highly similar to 1MJH. This structural approach
can distinguish between the characteristic fold differences of
1MJH-like and 1JMV-like bacterial proteins and was used to assign the
complete set of candidate Arabidopsis proteins to one of these fold
classes. It appears that all the plant sequences have arisen from a
1MJH-like ancestor.
INTRODUCTION
The "USPA domain" is a recently
identified protein structure now known to be widespread in prokaryotic
organisms, both bacterial and archaeal. It is named after the UspA
(universal stress protein A) of Escherichia coli, which was
originally identified because of its prominence in stationary phase
cells. Genetic evidence has subsequently shown that UspA mediates
survival of cells starved for a wide variety of nutrients, exposed to
toxic chemicals, and exposed to osmotic stress or UV light damage
(Nystrom and Neidhardt, 1992, 1993,
1994). It is a Ser and Thr phosphoprotein, which is phosphorylated by the Tyr phosphoprotein TypA (Freestone et al., 1997, 1998). The precise biochemical function of
UspA is unknown. However, the fact that it is vital to stationary phase
cell growth, and the observation that uspA gene
transcription is up-regulated during the shift from Glc to acetate
metabolism, has led to the suggestion that it is involved in
coordinating this metabolic shift (Nystrom and Neidhardt,
1993 ; Tao et al., 1999 ). The first solved
structure in the USPA domain family (1MJH) was obtained using the
Methanococcus jannaschii protein MJ0577 (Zarembinski et al., 1998 ). This protein surprisingly contained a bound
molecule of ATP in the crystal structure. Biochemical analysis showed
that although MJ0577 binds ATP, it can only hydrolyze it in the
presence of uncharacterized additional proteins from an M. jannaschii crude cell extract. It was suggested that MJ0577
mediates an ATP-dependent function, such as acting as a molecular
switch. However, no cellular role has been discovered for this protein.
More recently, the structure of the universal stress protein of
Haemophilus influenzae (1JMV), which has 68% sequence
similarity to E. coli UspA, has been solved (Sousa
and McKay, 2001 ). In contrast to MJ0577, this protein cannot
bind ATP. This has led to the suggestion that there are two distinct
folds encompassing the USPA domain family: one exemplified by the
ATP-binding structure of 1MJH and the other exemplified by the
non-ATP-binding structure of 1JMV.
The USPA domain appears to be part of an ancient protein structural
family. Zarembinski et al. (1998) noted in their
characterization of the 1MJH structure that it has significant
similarities to other solved structures: human electron transport
flavoprotein, DNA photolyase, and tyrosyl-tRNA synthetase. These
initial observations have recently been confirmed and extended by
Aravind et al. (2002) , who have performed an extensive
evolutionary analysis. Based on patterns of conserved structural
features, they propose that the USPA domain is part of a larger protein
structural family, whose members had already diversified and were
present in the last universal common ancestor of all extant life. They
suggest that the ancestral function of the USPA domain was nucleotide
binding and signal transduction.
We have performed extensive database searches utilizing techniques
exploiting primary sequence similarity and higher order structural
features to assemble a set of sequences in Arabidopsis that encode USPA
domains. Our analysis has revealed both free-standing small proteins
reminiscent of those found in bacteria and larger proteins where the
USPA domain comprises but a part of the total sequence. Our evidence
indicates that all Arabidopsis USPA domain-containing sequences have
evolved from a 1MJH-like ancestor.
RESULTS
Repetitive database searches with conserved sequence motifs and
sequence profiles produced a set of 44 Arabidopsis proteins plus
prokaryotic proteins, each containing domains with similarity to the
bacterial USPA domain. A multiple sequence alignment is presented in
Figure 1. The alignment is annotated with
features of secondary structure revealed in the structure of 1MJH
(Zarembinski et al., 1998 ). This consists of five beta
strands, alternating with four alpha helices. A number of conserved
blocks of hydrophobic sequence are readily apparent that correspond to
residues in beta 1, alpha 1, beta 2, beta 3, alpha 3, beta 4, and beta
5. Five sequences, identified in the legend to Figure 1, were dropped from the alignment because they were truncated and did not extend through all the known secondary structure elements.

Figure 1.
Multiple sequence alignment (MSA) of the USPA
domains of the USPA proteins. See text for identification and
discussion of these sequences. Arabidopsis sequence numbers from The
Institute for Genomic Research (TIGR) are used for plant sequences;
otherwise, National Center for Biotechnology Information (NCBI),
Protein Data Bank, or Swiss-PROT names are used for prokaryotic
sequences; NCBI gi numbers are given in Figure 3. Three shades of gray
are used to designate amino acids conserved at a given position in the
multiple sequence alignment to at least 85%: darkest shade,
hydrophobic residues (AMILVCWF); middle shade, Gly (G); and lightest
shade, acidic residues (DE). Shading was done using ClustalX
(Thompson et al., 1997 ). Secondary structure elements
and ATP-binding residues shown are from the structure of 1MJH
(Zarembinski et al., 1998 ). The following sequences were
dropped from the data set at the alignment stage, because they were
truncated (did not contain all the known secondary structure elements):
At5g49050, At5g57680, At2g45910, At3g58450, and At5g61560. Sequences
At3g58450 and At5g49050 were identified independently by NCBI and
posted in the Conserved Domain Database (CDD). Sequence At3g58450 was
independently identified and posted by Inter-Pro.
The residues in the structure of 1MJH that make contact with the bound
molecule of ATP have been identified. We examined these positions in
the alignment, their conservation in closely related bacterial
proteins, and their representation in Arabidopsis sequences.
Position D13 in 1MJH (ILYPTD) is involved in
coordinating a Mn2+ ion, which in turn binds
to the phosphate groups. This residue is conserved in most of the
bacterial and Arabidopsis sequences. The V at position 41 of 1MJH
(VILLHV) hydrogen bonds to adenine. This
position is conserved in many of the aligned sequences. The G at
position 127 of 1MJH (IIIMG) hydrogen bonds to Rib. This is present in nearly all the sequences. Position G130 (IMGSHG) hydrogen bonds with the
beta phosphate and is conserved in the small group of 1MJH-like
bacterial sequences (YXIE_BACSU, 1MJH, MJ0531, and MTH993) and a subset
of Arabidopsis proteins. The G at position 140 in 1MJH
(GSVTEN) does not bind to ATP, but it is
conserved in all the 1MJH-like bacteria, suggesting that it is
functionally important, perhaps by positioning subsequent residues in the
proper binding conformation. It is conserved in a subset of the Arabidopsis sequences.
Position S141 in 1MJH (GSVTEN) hydrogen bonds to
the gamma phosphate and is conserved in the 1MJH-like bacterial
sequences and a subset of the Arabidopsis sequences. The V at position
142 in 1MJH (GSVTEN) is conserved in the
1MJH-like bacteria and in a subset of the Arabidopsis sequences. In
1MJH, T at position 143 (GSVTEN) hydrogen bonds
to the alpha phosphate. This residue is replaced by A or S in the
1MJH-like bacterial sequences and by S in a subset of Arabidopsis
sequences. Table I summarizes the
analysis of the subset of Arabidopsis sequences with the best
conservation of amino acid residues known to be functionally important
in 1MJH or conserved in its closest bacterial relatives. Five sequences
(At1g11360, At3g03270, At3g11930, At4g27320, and At5g54430) contain all
of them.
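The residue-by-residue comparison described above can be illustrated with a short sketch. It is not the procedure used in this study. The positions and residues are those quoted in the text for 1MJH. The input format (a mapping from 1MJH position to the aligned residue), the function names, and the divergent "toy" sequence are assumptions introduced only for illustration.

```python
# Positions use 1MJH numbering as quoted in the text. The allowed residues
# reflect the substitutions the text reports (T143 replaced by S or A).
ATP_SITE = {
    13: {"D"},             # coordinates the Mn2+ ion that binds the phosphates
    41: {"V"},             # hydrogen bonds to adenine
    127: {"G"},            # hydrogen bonds to Rib
    130: {"G"},            # hydrogen bonds to the beta phosphate
    140: {"G"},            # no ATP contact, but conserved in 1MJH-like bacteria
    141: {"S"},            # hydrogen bonds to the gamma phosphate
    142: {"V"},            # conserved in 1MJH-like bacteria
    143: {"T", "S", "A"},  # alpha phosphate; S or A occurs in relatives
}


def conserved_positions(residues):
    """Return the sorted 1MJH positions at which `residues` matches ATP_SITE.

    `residues` maps a 1MJH position to the one-letter residue aligned there
    (hypothetical input format; "-" or a missing key is treated as a gap).
    """
    return sorted(
        pos for pos, allowed in ATP_SITE.items()
        if residues.get(pos, "-") in allowed
    )


def has_full_site(residues):
    """True if every position in ATP_SITE carries an allowed residue."""
    return len(conserved_positions(residues)) == len(ATP_SITE)


# 1MJH itself, using the residues quoted in the text.
mjh = {13: "D", 41: "V", 127: "G", 130: "G",
       140: "G", 141: "S", 142: "V", 143: "T"}

# Hypothetical divergent sequence that lacks the residues beyond G127.
toy = {13: "D", 41: "I", 127: "G", 130: "-", 140: "A"}

print(conserved_positions(mjh))
print(conserved_positions(toy))
```

A sequence for which has_full_site returns True would correspond to the five Table I sequences that contain all of the residues; partial matches correspond to the remaining "subset" cases.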
Figures 2 and
3 present alternative renderings of a
phylogenetic tree constructed from the multiple sequence alignment of Figure 1. Several distinct groups are revealed by this analysis. The
Arabidopsis sequences presented in Table I are designated "1MJH like,
plant," are shown in bold in Figures 2 and 3, and their branches are
thickened. A striking aspect of Figure 3 is the concentration of hits
for various types of bacterial USPA domain models with the Arabidopsis
sequences in this group, as compared with the sequence set as a whole.
This indicates that a substantial degree of sequence conservation is
readily detectable between the Arabidopsis sequences and their
bacterial counterparts.

Figure 2.
Radial phylogenetic tree from USPA protein
sequence comparisons. Branch lengths are in arbitrary units. Sequence
names are those used in Figure 1. See text for definitions and
discussion appropriate to the clustered groupings shown. Names of
Arabidopsis sequences from Table I are in bold, and branches are
broadened. Only branches with ClustalW neighbor-joining (NJ) bootstrap
values of 40% or higher are shown.

Figure 3.
Topographic cladogram with additional information
for USPA protein sequences. Branch lengths for the cladogram are unit
length. Representative bootstrap values are shown; the value above the
line is the ClustalW NJ value, and the value below the line is the
parsimony value (see "Materials and Methods"). An appropriate NCBI
gi number is provided for each taxon. The PlantsP plant phosphorylation
database (Gribskov et al., 2001) identification number
is shown for all plant kinase sequences. The availability of expressed
sequence tags is indicated (Yes/No), with the number given if "Yes," and
the availability of a full-length cDNA is indicated (Yes/No). The
sequence names shown correspond to those shown in Figure 1, and cluster
designations correspond to those shown in Figure 2. Sequences At5g20310
and At3g61410 are denoted S for short; these sequences also are not
present in the PlantsP database because they are not kinases (see text)
and, hence, are designated NA for not applicable. Names of Arabidopsis
sequences from Table I are in bold, and branches are broadened. Only
branches with ClustalW NJ bootstrap values of 40% or higher are shown.
Sequences designated with the letters "a," "b," and/or "c"
were independently identified and posted in the NCBI CDD, Inter-Pro
(IPR006015), and Pfam (PF00582) public databases, respectively.
Despite their highly suggestive pattern of conserved residues
and their clear similarity to bacterial USPA domain models,
this group of sequences fails to form a single well-supported clade when subjected to conventional phylogenetic tree
analysis. A cluster of seven sequences (At4g27320, At5g54430, At3g21210, At1g11360, At3g53990, At3g03270, and At3g17020) receives moderate bootstrap support (735/1,000 NJ; 95/200 parsimony) and features more strongly supported subclusters. Two other clusters are
apparent in the NJ tree, but each receives weak support (At3g01520, At5g14680, and At1g68300: 427/1,000 NJ and 66/200 parsimony; At3g11930, At2g47710, At1g09740, and At3g62550: 450/1,000 NJ and 41/200
parsimony). These are short sequences in which the USPA domain
comprises essentially the whole protein. This architecture is very
reminiscent of the bacterial proteins initially characterized with this
domain. A second group of sequences (Figs. 2 and 3; "Small Plant"
cluster) containing At2g03720, At1g69080, At5g17390, At3g03290, and
At1g44760 receives moderate bootstrap support in NJ (626/1,000), weaker
support in parsimony (51/200), and also comprises small proteins.
Inspection of the alignment shows that these sequences are considerably
more divergent and fail to contain a number of the functionally
important residues characterized in the bacterial sequence 1MJH. Three
groups are formed that consist primarily of protein kinases, where the
USPA domain resides at the N terminus of the sequence, upstream of the
kinase domain (in the analysis presented here, the kinase domain was
identified and removed before USPA domain analysis). One cluster (Figs.
2 and 3; protein kinase 1.4.1 cluster) with strong bootstrap support in
NJ (995/1,000) and somewhat weaker support in parsimony (77/200)
consists of kinases of the 1.4.1 (receptor-like cytoplasmic kinase IX)
family: At1g16760, At1g78940, At3g20200, At2g24370, At4g31230,
At2g07020, At5g35380, At1g72760, At1g17540, At5g12000, At5g26150,
At4g25160, and At5g61550. This cluster also contains one small
protein: At5g20310. In the NJ tree, but not in the parsimony tree, this
cluster contains an additional 1.4.1 kinase (At3g49060) paired with the
small sequence At3g61410 (in the parsimony tree, this pair is preserved
but placed elsewhere). Another cluster (Figs. 2 and 3; protein kinase
1.3.2 cluster) with strong bootstrap support (1,000/1,000 NJ; 196/200
parsimony) contains kinases of the 1.3.2 (receptor-like cytoplasmic
kinase VI) family: At1g77280, At1g21590, and At5g63940. There is also a
pair of sequences (Figs. 2 and 3; protein kinase 1.3.1 cluster) of the
1.3.1 (Pro-rich receptor kinase) family: At3g13690 and At1g55200. These
cluster together with 1.3.2 kinases in the NJ tree with weak NJ
bootstrap support (366/1,000); however, they are placed elsewhere in
the parsimony tree. The topology of the clustering of bacterial
sequences differs slightly between the two phylogenetic tree inference
methods. Figures 2 and 3 show the NJ topology, where there are two
well-supported clusters of bacterial sequences. 1MJH, together with
MJ0531, MTH993, and YXIE_BACSU, form one group (926/1,000) that is in
agreement with previously published work (Zarembinski et al.,
1998 ). USPA_ECOLI and 1JMV cluster tightly, which also confirms
published work (Sousa and McKay, 2001 ). Together with
YECG_ECOLI, these sequences form another well-supported group
(998/1,000). In the parsimony tree, all the bacterial sequences form a
single cluster with good bootstrap support (150/200). Finally, in the
NJ tree, the Arabidopsis sequences At4g13450 and At3g25930 are
associated with these bacterial sequences. The bootstrap support for
these associations is not strong, however, and close examination of the
alignment reveals no obvious support for this placement. Our conclusion
is that this represents noise in the analysis. This is reinforced by
the observation that these Arabidopsis sequences are placed elsewhere
in the parsimony tree.
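Because the NJ and parsimony values are reported out of different numbers of replicates (1,000 and 200, respectively, judging from the denominators above), it can help to put them on a common scale. The following is a minimal sketch of that conversion. The function names are hypothetical, and the 40% cutoff is the display threshold stated in the legends to Figures 2 and 3, applied here to the NJ values only.

```python
def support_fraction(count, replicates):
    """Convert a bootstrap count (e.g. 735 of 1,000) to a fraction in [0, 1]."""
    if replicates <= 0 or not 0 <= count <= replicates:
        raise ValueError("count must lie between 0 and replicates")
    return count / replicates


def meets_display_threshold(nj_count, nj_replicates=1000, threshold=0.40):
    """True if the NJ bootstrap value reaches the figure display cutoff."""
    return support_fraction(nj_count, nj_replicates) >= threshold


# Values quoted in the text for the cluster of seven "1MJH-like" sequences.
nj = support_fraction(735, 1000)
parsimony = support_fraction(95, 200)
print(f"NJ {nj:.1%}, parsimony {parsimony:.1%}")

# The 1.3.1/1.3.2 association (366/1,000 NJ) falls below the 40% cutoff.
print(meets_display_threshold(366))
```

Expressed this way, the "moderate" cluster of seven sequences has NJ support near three-quarters of replicates but parsimony support below one-half, which is the kind of disagreement between methods discussed above.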
The USPA-containing proteins are clearly divergent at the primary
sequence level. This is apparent from inspection of the alignment in
Figure 1 and is reflected in the weak or moderate bootstrap support for
many of the clusters identified and the occasional topological
differences between the NJ and parsimony trees. To confirm the
Arabidopsis sequences as USPA domain candidates, protein
three-dimensional structure threading analysis was performed as
described in "Materials and Methods." In this approach, we used as
a reference the solved crystal structures of 1MJH (Zarembinski et al., 1998 ) and 1JMV (Sousa and McKay, 2001 ).
Results are presented in Table II.
Initial results with the bacterial sequence set showed that the
techniques can distinguish the subtle differences between the
folded conformations of 1MJH and 1JMV. The
123D+ method (Alexandrov et al., 1995) correctly assigns
a superior Z score for the compatibility of MJ0531, MTH993, and
YXIE_BACSU with the structure of 1MJH. It also correctly assigns USPA_ECOLI and YECG_ECOLI higher scores for compatibility with the
structure of 1JMV. The 3D-PSSM method (Kelley et al.,
2000) also assigns correct fold scores for the
1MJH-like group. However, it misassigns a higher score to YECG_ECOLI
for compatibility with 1MJH rather than 1JMV. For both methods, there
was a strong preference for assigning the highest scores to the USPA
structures 1MJH or 1JMV among the various structures represented in the
fold libraries: for all Arabidopsis query sequences except one, the top two hits returned from the fold libraries were these
two. Thus, the threading methods could readily distinguish a preference
of the Arabidopsis sequences for these folds, as opposed to other
alpha/beta structures present in the fold libraries. For the 123D+
method, all Arabidopsis sequences produced a stronger hit with the
structure of 1MJH than with 1JMV. When the stronger score for the pair
is considered for each Arabidopsis sequence, the Z scores range from
6.96 to 23.36, with only 3/44 being less than 9.0. The mean Z score of
all the Arabidopsis sequences with the 1MJH structure (15.35 ± 5.14 [SD]) is significantly greater than the
mean Z score with the 1JMV structure (12.18 ± 4.26;
P = 0.0011). The set of Arabidopsis sequences described
in Table I as sharing critical conserved residues with the structure of 1MJH, and designated in Figures 2 and 3 as "1MJH like," are in bold
in Table II. It is apparent by simple inspection that these sequences
have very high 123D+ Z scores when compared by higher order structural
criteria with the solved structure of 1MJH. In fact, the mean Z score
for these sequences is 21.65 ± 1.31. This score is nearly
identical to the mean score of the various bacterial sequences with
respect to their "parent" sequence (22.16 ± 2.13) and is
significantly higher than that of the remainder of the USPA sequences
(12.41 ± 3.22; P = 3.81e-17). This analysis
confirms the cluster of Arabidopsis sequences initially defined by the consideration of conserved putative ATP-binding residues presented in
Table I and extends the region of high structural similarity to the
entire molecule.
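The text does not state which statistical test produced the P values above. As an illustration only, the sketch below computes a Welch two-sample t statistic and its approximate degrees of freedom from the reported means and SDs for the 1MJH and 1JMV comparisons. The choice of test, the function name, and the sample size of 44 per group (taken from the "3/44" count above) are assumptions, and no P value is derived here.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Works from summary statistics (mean, SD, sample size) of two groups.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Reported 123D+ Z scores: 15.35 +/- 5.14 (1MJH) versus 12.18 +/- 4.26 (1JMV).
# n = 44 per group is an assumption based on the 44 Arabidopsis sequences.
t, df = welch_t(15.35, 5.14, 44, 12.18, 4.26, 44)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Under these assumptions the difference in means is a little more than three standard errors, which is in line with the small P value reported, although the exact value depends on the test and tail convention actually used.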
With the 3D-PSSM method, the best score for each structure was at
the 95% confidence level for all the Arabidopsis sequences except
At3g13690, where the best hit was at the 90% level, and At1g55200,
where it was at the 80% level. Two sequences identified in the legend
to Table II were excluded from the analysis for failing to produce hits
to the bacterial structures at the 50% confidence level. For most of
the Arabidopsis sequences, the 3D-PSSM method is in agreement with the
123D+ method, assigning a higher score to the hit with the structure of
1MJH than to that of 1JMV. There are six exceptions to this, where the
stronger score was assigned to the hit with the 1JMV structure:
At5g63940, At3g21210, At1g72760, At3g03290,
At4g25160, and At3g49060. A consideration of the clustering
pattern in the phylogenetic tree makes it very unlikely that any of
these represent accurate results, however. In each case, one of these
sequences is placed in a highly supported cluster with one or more
other sequences where both threading methods agree in assigning the
higher score to the 1MJH hit. We consider it much less likely that two
sequences from two different lineages would converge in such a fashion
than that the 3D-PSSM method has assigned an
incorrect score. In summary, the threading data strongly support the
members of the Arabidopsis sequence set as valid USPA domain
candidates, they show that the sequences have the highest compatibility
with the 1MJH structure, and they support the presence of a cluster of
sequences with particularly high structural similarity to 1MJH.
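The comparison between the two threading methods can be summarized as a simple rule: for each sequence, take the structure with the higher score under each method, then flag sequences on which the methods disagree for closer inspection. The sketch below illustrates that bookkeeping. The data layout, the function names, and the scores for "seqA" and "seqB" are hypothetical, and the sketch assumes that a higher score means better compatibility for both methods.

```python
def preferred_fold(score_1mjh, score_1jmv):
    """Return the structure with the higher score, or None on a tie."""
    if score_1mjh > score_1jmv:
        return "1MJH"
    if score_1jmv > score_1mjh:
        return "1JMV"
    return None


def compare_methods(scores):
    """Map each sequence to (123D+ fold, 3D-PSSM fold, methods agree?).

    `scores` has the hypothetical layout
    {name: {"123D+": (s_1mjh, s_1jmv), "3D-PSSM": (s_1mjh, s_1jmv)}}.
    """
    summary = {}
    for name, by_method in scores.items():
        fold_a = preferred_fold(*by_method["123D+"])
        fold_b = preferred_fold(*by_method["3D-PSSM"])
        summary[name] = (fold_a, fold_b, fold_a == fold_b)
    return summary


# Hypothetical scores, for illustration only (not data from Table II).
toy_scores = {
    "seqA": {"123D+": (20.0, 15.0), "3D-PSSM": (0.9, 0.7)},
    "seqB": {"123D+": (12.0, 10.0), "3D-PSSM": (0.6, 0.8)},
}

result = compare_methods(toy_scores)
disagreements = [name for name, (_, _, agree) in result.items() if not agree]
print(disagreements)
```

In the analysis described above, sequences flagged in this way were resolved by reference to their placement in well-supported phylogenetic clusters rather than by either score alone.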
All Arabidopsis sequences were examined for additional conserved
protein domains through the use of the NCBI CDD
(http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml). Three of the
family 1.4.1 protein kinases contain detectable U box domains at their
extreme C termini: At5g26150 (E = 3e-18), At4g25160 (E = 3e-18), and At5g61550 (E = 2e-16). Sequence At3g49060, also a
1.4.1 kinase, contains a U box (E = 3e-20) downstream of the
kinase domain but upstream of a Myosin-tail domain (E = 6e-04) that lies at the C terminus. This C-terminal region scores
highly with methods that predict coiled-coil conformation: Coils (http://www.ch.embnet.org/software/COILS_form.html; Lupas et al., 1991) and MultiCoil
(http://multicoil.lcs.mit.edu/cgi-bin/multicoil; Wolf et
al., 1997). Finally, sequence At3g21210 represents a peculiar case. It is a long protein with a USPA domain at the amino terminus. There is no kinase domain detectable by profile methods. This sequence
contains the conserved stretch GSVS corresponding to residues 140 to
143 of 1MJH. It is placed in the tree in the assemblage of "1MJH-like
sequences" cited in Table I. In support of this, it receives a high Z
score from 123D+ for threading with the 1MJH structure (19.16).
However, an examination of the alignment shows that it has apparently
suffered a deletion of most of the residues between the G corresponding
to G127 of 1MJH and G140.
DISCUSSION
Arabidopsis proteins that contain the USPA domain represent a
sufficiently diverse group of sequences that they cannot be identified
using primary sequence comparisons alone. Simple pair-wise comparisons
with database sequences (e.g. BLAST) using bacterial USPA sequence
queries do not have sufficient resolving power to identify these
sequences with significant scores. Rather, alternative approaches that
use the combined power of multiple sequence-based models (machine
learning-based motifs, position-specific scoring matrices, and Hidden
Markov Models [HMMs]) were necessary to find them. Conventional
phylogenetic analysis based upon sequence comparison often failed to
produce robust clustering patterns with high bootstrap support.
However, meaningful patterns could be discerned within this sequence
set when approaches were used that probed additional levels of protein
structure, drawing upon the resource of solved three-dimensional USPA
domain structures. When patterns of conservation of critical residues
known to be involved in a common biochemical function, namely ATP
binding, were examined, a set of Arabidopsis sequences emerged, the
"1MJH-like" sequences (see Figs. 2 and 3), which clearly share
these critical features with the bacterial structure 1MJH. When
threading algorithms were applied, which draw upon conservation of
secondary and tertiary structural features, this collection of
Arabidopsis sequences was strongly supported as a distinct group. This
general approach of examination of protein structural features
meaningfully extends analyses using only sequence comparisons.
The solved USPA domain structures (1MJH and 1JMV) show that they fold
into an alpha/beta conformation featuring a slightly twisted planar
surface of several parallel beta strands, with alpha helices adjacent
on either side of this surface (Zarembinski et al.,
1998 ; Sousa and McKay, 2001 ). Structural
comparison methods have revealed that the two structures are readily
superimposable over most of their length: beta 1, alpha 1, beta 2, beta
3, alpha 3, beta 4, and beta 5 (Sousa and McKay, 2001 ).
These regions of similarity are readily detectable in the bacterial
sequences presented in the alignment of Figure 1. The similarity of the
Arabidopsis sequences to the bacterial sequences in these regions is
also evident from the alignment. That the Arabidopsis sequences are likely to be able to assume this specific set of secondary structural elements is indicated by the strong scores in the threading analysis. The solved bacterial structures are multimers, consisting of two (1MJH)
or four (1JMV) protomers that appear to interact with each other at
beta 5. This is supported by biochemical evidence consistent with a
dimer conformation of the native bacterial proteins (Freestone et al., 1997; Sousa and McKay, 2001). Conserved
hydrophobic residues in beta 5 apparently mediate this
protomer interaction. An examination of the alignment data shows that
the five Arabidopsis sequences presented in Table I with the highest
degree of conservation with the structure of 1MJH (At3g11930,
At3g03270, At1g11360, At5g54430, and At4g27320) all possess a highly
conserved hydrophobic stretch in beta 5 that would be consistent with
such an interaction. This, plus the fact that these are small proteins
similar to their bacterial counterparts, suggests that these proteins
may also assume a dimer association.
There are distinctive differences between the structures of the solved
bacterial proteins. These occur in alpha 2, alpha 4, and the loop
between beta 4 and alpha 4. The 1MJH structure contains a long alpha 2 helix. In contrast, the 1JMV structure has a short alpha 2 helix,
corresponding to the distal portion of the helix of 1MJH. The proximal
portion of alpha 2 in 1JMV apparently has a degree of conformational
flexibility, assuming a more disordered conformation in three of the
four protomers in the crystal, and a short beta 2' strand in the
fourth. Finally, helix alpha 4 is somewhat shorter in 1MJH than it is
in 1JMV (Sousa and McKay, 2001 ). Two threading methods
were used in this study to indicate the degree of compatibility between
query sequences and solved USPA structures. Detailed examination of the
predicted secondary structures obtained with analysis of the bacterial
sequences showed that neither method was entirely successful in
modeling all the subtle structural variations that exist between the
1MJH and 1JMV structures (data not shown). However, there was
sufficient discriminating ability to result in correct overall relative
score assignments, particularly for the 123D+ method.
The most distinctive difference between the solved bacterial
structures, however, occurs in the region between beta 4 and alpha 4, and at the beginning of alpha 4. In 1MJH, there is a long loop, and
residues in that loop and adjacent to it on either end participate in
binding interactions with the Rib and phosphate groups of ATP. In
contrast, this loop region is shorter in 1JMV, lacks the distal binding
residues beyond the G at the end of beta 4, and cannot bind ATP
(Sousa and McKay, 2001 ). The combination of primary
sequence alignment and strong threading Z scores in 123D+ agree in
defining a set of small Arabidopsis proteins that appear to have
descended from a 1MJH-like ancestor and to have retained sequence
features consistent with a possible ATP-binding function. However, the
T143 of 1MJH is replaced by S in all these plant sequences. In other
1MJH-like bacterial sequences, this T residue is also replaced by
S (YXIE_BACSU) or A (MJ0531 and MTH993). It will take biochemical
experiments to determine the ATP-binding capabilities of these
"1MJH-like" proteins. The remainder of the threading data are in
overall agreement that all of the Arabidopsis sequences identified in
this study have descended from a 1MJH-like ancestor.
It is apparent from the structure of the phylogenetic tree (Figs. 2 and
3) that the sequences of the 1.4.1, 1.3.1, and 1.3.2 kinases have
diverged considerably from those of the small Arabidopsis proteins.
This suggests that the USPA domains of these proteins serve distinct
cellular functions from each other and from the smaller proteins. The
Arabidopsis sequence At5g20310 (and also possibly At3g61410) represents
a small protein that is contained within the fairly strongly supported
cluster together with the 1.4.1 kinases. The simplest explanation for
these results is that this sequence has lost a kinase domain as a
secondary event after acquisition by a kinase ancestor of the USPA
domain. Finally, the remaining small proteins show sequence features
that indicate considerable divergence. They lack the conserved residues
necessary for ATP binding beyond the conserved G at the end of beta 4.
There is fragmentary evidence available concerning possible function of
the small Arabidopsis USPA proteins in vivo, some of it consistent with
a role in stress responses. Hohnjec et al. (2000) , in a
study of genes differentially expressed in the nodules of legumes,
identified the broad bean (Vicia faba) nodulin protein VfENOD18. Their analysis showed this protein to have primary sequence similarity to that of 1MJH and its relatives, including conserved ATP-binding residues. They postulated this protein might mediate an
ATP-dependent function. Utilizing BLAST searches against Arabidopsis expressed sequence tags, they identified three similar sequences ("AtE1," "AtE3," and "AtE6"), which turned out to be
identical to proteins At3g53990, At3g17020, and At3g03270,
respectively. These are among our set of best conserved "1MJH-like"
sequences (Table I). Zegzouti et al. (1999) performed a
study of ethylene-induced gene expression in tomato (Lycopersicon
esculentum), using differential display techniques. Ethylene
serves as a plant hormone that has been well documented in a number of
studies to exert effects on physiological processes such as fruit
ripening, senescence, cell elongation, and leaf abscission, as well as
responses to environmental stresses such as pathogens,
wounding, and desiccation (Abeles et al., 1992 ;
Lelievre et al., 1997 ). Zegzouti et al.
(1999) identified several transcripts up-regulated by ethylene
in various tissues. ER (ethylene responsive) 6 was a transcript that
showed "moderate" up-regulation in response to ethylene in late
fruit ripening but had a constitutive pattern of expression in leaves
and roots. The Arabidopsis homolog to protein ER6, At1g09740, is a
member of our data set. This is also one of the highly conserved
"1MJH-like" small USPA protein sequence set. Other members of this
"1MJH-like" group have been annotated with suggested functions
because of similarity detected by BLAST searches. These include:
At3g17020, At3g53990, and At3g62550 ("putative ER6" or "ER6
like"); At3g11930 ("ethylene responsive"); At2g21620 ("auxin
regulated"); and At3g21210 ("CHP-rich zinc finger
protein"). It should be emphasized that no experimental data have
been reported to support these functional assignments. Finally,
Yamaguchi-Shinozaki et al. (1992) identified a set of
transcripts that were induced by desiccation (RD [desiccation responsive]) in Arabidopsis. The transcript for their clone "RD2" appeared in response to fairly severe desiccation (7 h, by which time the plants had lost more than 75% of their initial weight). RD2
corresponds to sequence At2g21620. This is a small USPA protein that
shares some of the sequence features of the "1MJH-like" set in the
crucial region responsible for binding to the ribose and phosphates of
ATP: It has the conserved residues corresponding to 1MJH G130 and V142
but lacks the other conserved residues. In both the NJ and parsimony
trees, it is placed close to the "1MJH-like" sequence set, but the
low bootstrap support reflects its divergence.
The function of the USPA-containing protein kinases in Arabidopsis is
more enigmatic. Several sequences in families 1.3.1 and 1.3.2 are
annotated as being similar to the Pto protein or the Pto interactor
protein Pti1. These are protein kinases that in tomatoes form part of
the well-studied signaling pathway that mediates resistance to
bacterial speck disease (Sessa and Martin, 2000). We
performed BLAST searches with the family 1.3.1 and 1.3.2 kinase
sequences and found hits to Pto with E values of approximately e-50, and
hits to Pti1 with E values of approximately e-60, indicating a high
degree of sequence similarity and probable common ancestry. In each
case, the hits with the 1.3.1 kinases were slightly stronger than those
with the 1.3.2 kinases. By way of comparison, the BLAST score between
the tomato Pto protein and the Arabidopsis homolog has an E value of
approximately e-109. One sequence of family 1.4.1, At5g61550, is
annotated as a putative disease resistance protein. We subjected all
the 1.4.1 kinases in our set to BLAST analysis and found hits to Pto
with E values of about e-45, but no hits to Pti1. Aravind et al.
(2002) identified protein kinase "7488259" as being one of
several Arabidopsis kinases containing a conserved U box motif. This
sequence is identical to At2g45910, a protein kinase of the 1.4.1 family, which was dropped from our data set because the USPA domain it
contains is truncated. However, our analysis of the remaining 1.4.1 kinases confirms that several have U box domains. These are modified
RING finger structures that are thought to be important in the process
of protein ubiquitination (Aravind and Koonin, 2000) and
suggest involvement of 1.4.1 kinases in this activity (Aravind
et al., 2002). Sequence At3g49060, another 1.4.1 kinase, in
addition to the U box motif, contains a C-terminal myosin tail motif.
This, plus the high scores in secondary structure prediction
methods, indicates that this portion of the molecule probably assumes a
coiled-coil conformation, which often mediates protein-protein interactions.
In summary, our structural and phylogenetic analyses indicate that USPA
domains from a 1MJH-like ancestor have been dispersed in several sets
of Arabidopsis proteins. The available evidence indicates possible
roles in stress-related responses, though it is unlikely that any of
the Arabidopsis proteins correspond closely to the universal stress
proteins of prokaryotes. Our data should provide a basis for the
systematic experimental investigation of these proteins in their
natural setting.
 |
MATERIALS AND METHODS |
Retrieval of USPA Domain Sequences from Databases
The initial objects of investigation were protein kinases
previously classified into groups by maximal linkage (a sequence must
match all sequences in a group to a given threshold rather than just
matching one) and all-against-all BLAST (Gribskov et al.,
2001). Kinases of the 1.3.2 and 1.4.1 families (Gribskov et al., 2001) were analyzed to locate their kinase
domains using the ProfileScan server
(http://hits.isb-sib.ch/cgi-bin/PFSCAN), which searches queries against
the HMMs of the Pfam database (Bateman et al., 2000) and
the generalized profiles of the Swiss Institute of Bioinformatics
(Bucher et al., 1996). The kinase domain was removed from the
sequences, and any remaining N- or C-terminal sequence longer than 20 amino acids was analyzed further. Later in the process, protein
kinases of the 1.3.1 family were identified as described below and
similarly processed to remove the kinase domain.
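The domain-removal step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually run by the authors: the function name `flanking_regions`, the 0-based half-open domain coordinates, and the toy sequence are hypothetical, and only the 20-residue cutoff is taken from the text.

```python
def flanking_regions(sequence, domain_start, domain_end, min_length=20):
    """Return the N- and C-terminal flanks left after removing a domain.

    domain_start and domain_end are 0-based, half-open coordinates (an
    assumption of this sketch). A flank is kept only if it is longer than
    min_length residues, mirroring the "more than 20 amino acids" rule.
    """
    if not 0 <= domain_start <= domain_end <= len(sequence):
        raise ValueError("domain coordinates fall outside the sequence")
    flanks = {
        "N": sequence[:domain_start],
        "C": sequence[domain_end:],
    }
    return {side: seq for side, seq in flanks.items() if len(seq) > min_length}


# Toy example: 30-residue N flank, 50-residue "kinase domain", 5-residue C flank.
toy = "A" * 30 + "K" * 50 + "C" * 5
kept = flanking_regions(toy, 30, 80)
print(sorted(kept))  # lists which flanks passed the length filter
```

In practice the domain coordinates would come from the ProfileScan report; here they are supplied by hand.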
Sequences were initially subjected to BLASTP (Altschul et al.,
1997) searches at NCBI. Hits were observed between some query sequences and the models for USPA domains in the NCBI CDD (CDD5811) and
Pfam (PF00582) databases. These sequences were then collected into a
set and subjected to Multiple Em for Motif Elicitation (MEME; Bailey and Elkan, 1995;
http://meme.sdsc.edu/meme/website/meme.html) analysis, which finds
conserved motifs using expectation maximization. The NCBI nonredundant
database was then searched for sequences with significant similarity
using the Motif Alignment and Search Tool (MAST;
Bailey and Gribskov, 1998;
http://meme.sdsc.edu/meme/website/mast.html). High-scoring sequences
with the proper set of conserved motifs were then retrieved, the
sequence set was expanded, and the MEME/MAST
procedure was iterated until no new high-scoring
hits were obtained. Later in the process, ClustalW
(Higgins et al., 1996) was used to make a multiple
sequence alignment, and a profile was generated with the ProfileMake
program at the MotifWeb server (http://motifweb.sdsc.edu/) using the
method of Gribskov and Veretnik (1996). The nonredundant protein sequence database was searched with this profile, and high-scoring sequences were
retrieved and placed into the alignment. Sequences with an acceptable
level of similarity as judged by eye were retained. Alignments were
manually edited and used to produce new profiles. This database search
and profile generation procedure was iterated until no new high-scoring
hits were obtained. In some instances, sequences were used as queries
in BLASTP searches: at NCBI to discover their degree of similarity to
known plant disease resistance proteins and at TIGR
(http://www.tigr.org/tdb/e2k1/ath1/) or the PlantsP
database (http://PlantsP.sdsc.edu) to discover the standard Arabidopsis identification number (TIGR version 2.0).
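The iterate-until-no-new-hits logic shared by the motif-based and profile-based searches above can be sketched as follows. This is an illustrative outline only: `iterate_search`, the `search` callback, and the toy neighbor table are hypothetical stand-ins, and no real MEME, MAST, or profile-search call is made. The manual inspection and editing steps described in the text are not modeled.

```python
def iterate_search(seed_ids, search, max_rounds=50):
    """Expand a sequence set until a search round returns no new members.

    search is a caller-supplied function (hypothetical here) that takes the
    current set of sequence identifiers, builds a model from them, and
    returns the identifiers of acceptable high-scoring database hits.
    """
    members = set(seed_ids)
    for _ in range(max_rounds):
        new_hits = set(search(members)) - members
        if not new_hits:
            break  # converged: the last round added nothing new
        members |= new_hits
    return members


# Toy stand-in for a database search: each sequence "finds" its listed neighbors.
toy_neighbors = {"s1": ["s2"], "s2": ["s3"], "s3": [], "s4": ["s1"]}


def toy_search(members):
    return [hit for m in members for hit in toy_neighbors.get(m, [])]


final_set = iterate_search(["s1"], toy_search)
print(sorted(final_set))
```

The `max_rounds` cap is a safety limit added for the sketch; the text specifies only that iteration stopped when no new high-scoring hits appeared.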
Several public databases independently cataloged
Arabidopsis USPA domain proteins concurrently with our work. NCBI
used position-specific scoring matrices (PSSMs) and posted its results in CDD
(http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) under accession
number CDD5811. HMMs were used by InterPro
(http://www.ebi.ac.uk/interpro/) and Pfam (http://pfam.wustl.edu/), and
the resulting sequence sets were posted under accession numbers
IPR006015 and PF00582, respectively. Identification of these proteins
by these resources is indicated in Figure 3.
Threading Analysis
Sequences were subjected to threading analysis, which tests
their compatibility with a solved reference folded structure. Two
different implementations were used: 3D-PSSM (Kelley et al., 2000;
http://www.sbg.bio.ic.ac.uk/~3dpssm/) and 123D+
(Alexandrov et al., 1995;
http://123d.ncifcrf.gov/run123D+.html). Each technique utilizes a combination of primary sequence similarity, predicted secondary structure, and amino acid exposure/solvation potentials to
arrive at an estimate of compatibility, expressed as an E value and
confidence level (3D-PSSM) or Z score (123D+).
Multiple Sequence Alignment and Phylogenetic Tree Analysis
The structural alignment between amino acid sequences of solved
bacterial USPA structures 1MJH (Methanococcus jannaschii
protein MJ0577), 1JMV (Haemophilus influenzae universal
stress protein), and the universal stress protein of Escherichia
coli produced by Sousa and McKay (2001) was used
as a starting point. A small collection of bacterial USPA proteins
obtained from the literature was added to this alignment by using the
profile alignment option of ClustalW, followed by manual editing. The
set of Arabidopsis USPA protein candidates was then added to this
bacterial sequence alignment using the profile alignment feature of
ClustalW, followed by manual editing. Phylogenetic trees were inferred
by the NJ algorithm (Saitou and Nei, 1987) as
implemented in ClustalW and by maximum parsimony as implemented in
PHYLIP (Felsenstein, 1996). Each starting alignment was
subjected to bootstrap resampling (1,000× for NJ, 200× for
parsimony), followed by tree inference and, finally, inference of a
consensus tree. The tree topology presented is that generated by NJ,
with bootstrap support at critical nodes indicated as a percentage.
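The bootstrap step can be illustrated with a short Python sketch. It shows only the generic idea of resampling alignment columns with replacement and expressing support as a percentage. It does not reproduce the ClustalW or PHYLIP implementations, tree inference is omitted, and the names `bootstrap_alignment`, `support_percent`, and the toy alignment are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    alignment maps sequence names to equal-length aligned strings.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement; reuse them for every sequence.
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def support_percent(replicate_flags):
    """Percentage of replicates in which a given grouping was recovered."""
    return 100.0 * sum(replicate_flags) / len(replicate_flags)


toy_alignment = {"seqA": "MKV-LT", "seqB": "MRVALT", "seqC": "MKIALS"}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), "pseudo-replicate alignments generated")
```

Each replicate would then be passed to a tree-building program, and `support_percent` applied to a list recording whether a node of interest appeared in each replicate tree.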
Examination of Conserved Protein Domains
Conserved protein domains were examined by using the CDD at NCBI
(http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) in conjunction
with BLAST searches.
 |
FOOTNOTES |
Received October 9, 2002; returned for revision November 3, 2002; accepted December 18, 2002.
1 This work was supported by the National Science
Foundation (grant nos. NSF ROA DBI-9975808/PTLOMA and NSF
DBI-9975808).
* Corresponding author; e-mail dkerk{at}ptloma.edu; fax
619-849-2598.
Article, publication date, and citation information can be found at
www.plantphysiol.org/cgi/doi/10.1104/pp.102.016006.
 |
LITERATURE CITED |
- Abeles FB, Morgan PW, Saltveit ME (1992) Ethylene in Plant Biology, Ed 2. Academic Press, San Diego, pp 1-432
- Alexandrov NN, Nussinov R, Zimmer RM (1995) Fast protein fold recognition via sequence to structure alignment and contact capacity potentials. In L Hunter, T E Klein, eds, Pacific Symposium on Biocomputing '96. World Scientific Publishing, Singapore, pp 53-72
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402
- Aravind L, Anantharaman V, Koonin EV (2002) Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world. Proteins Struct Funct Genet 48: 1-14
- Aravind L, Koonin EV (2000) The U box is a modified RING finger: a common domain in ubiquitination. Curr Biol 10: R132-R134
- Bailey TL, Elkan C (1995) Unsupervised learning of multiple motifs in biopolymers using EM. Machine Learning 21: 51-80
- Bailey TL, Gribskov M (1998) Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14: 48-54
- Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer ELL (2000) The Pfam protein families database. Nucleic Acids Res 28: 263-266
- Bucher P, Karplus K, Moeri N, Hofmann K (1996) A flexible search technique based on generalized profiles. Comput Chem 20: 3-24
- Felsenstein J (1996) Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol 266: 418-426
- Freestone P, Nystrom T, Trinei M, Norris V (1997) The universal stress protein, UspA, of Escherichia coli is phosphorylated in response to stasis. J Mol Biol 274: 318-324
- Freestone P, Trinei M, Clarke SC, Nystrom T, Norris V (1998) Tyrosine phosphorylation in Escherichia coli. J Mol Biol 279: 1045-1051
- Gribskov M, Fana F, Harper J, Hope DA, Harmon AC, Smith DW, Tax FE, Zhang G (2001) PlantsP: a functional genomics database for plant phosphorylation. Nucleic Acids Res 29: 111-113
- Gribskov M, Veretnik S (1996) Identification of sequence patterns with profile analysis. Methods Enzymol 266: 198-211
- Higgins DG, Thompson JD, Gibson TJ (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol 266: 383-401
- Hohnjec N, Kuster H, Albus U, Frosch SC, Becker JD, Puhler A, Perlick AM, Fruhling M (2000) The broad bean nodulin VfENOD18 is a member of a novel family of plant proteins with homologies to the bacterial MJ0577 superfamily. Mol Gen Genet 264: 241-250
- Kelley LA, MacCallum RM, Sternberg MJE (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 299: 499-520
- Lelievre J-M, Latche A, Jones B, Bouzayen M, Pech JC (1997) Ethylene and fruit ripening. Physiol Plant 101: 727-739
- Lupas A, Van Dyke M, Stock J (1991) Predicting coiled coils from protein sequences. Science 252: 1162-1164
- Nystrom T, Neidhardt FC (1992) Cloning, mapping and nucleotide sequencing of a gene encoding a universal stress protein in Escherichia coli. Mol Microbiol 6: 3187-3198
- Nystrom T, Neidhardt FC (1993) Isolation and properties of a mutant of Escherichia coli with an insertional inactivation of the uspA gene, which encodes a universal stress protein. J Bacteriol 175: 3949-3956
- Nystrom T, Neidhardt FC (1994) Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Mol Microbiol 11: 537-544
- Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406-425
- Sessa G, Martin GB (2000) Protein kinases in the plant defense response. Adv Bot Res 32: 379-398
- Sousa MC, McKay DB (2001) Structure of the universal stress protein of Haemophilus influenzae. Structure 9: 1135-1141
- Tao H, Bausch C, Richmond C, Blattner FR, Conway T (1999) Functional genomics: expression analysis of Escherichia coli growing on minimal and rich media. J Bacteriol 181: 6425-6440
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876-4882
- Wolf E, Kim PS, Berger B (1997) MultiCoil: a program for predicting two- and three-stranded coiled coils. Protein Sci 6: 1179-1189
- Yamaguchi-Shinozaki K, Koizumi M, Urao S, Shinozaki K (1992) Molecular cloning and characterization of 9 cDNAs for genes that are responsive to desiccation in Arabidopsis thaliana: sequence analysis of one cDNA clone that encodes a putative transmembrane channel protein. Plant Cell Physiol 33: 217-224
- Zarembinski TI, Hung L-W, Mueller-Dieckmann H-J, Kim K-K, Yokota H, Kim R, Kim S-H (1998) Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. Proc Natl Acad Sci USA 95: 15189-15193
- Zegzouti H, Jones B, Frasse P, Marty C, Maitre B, Latche A, Pech J-C, Bouzayen M (1999) Ethylene-regulated gene expression in tomato fruit: characterization of novel ethylene-responsive and ripening-related genes isolated by differential display. Plant J 18: 589-600
© 2003 American Society of Plant Biologists