- Copyright © 2002 American Society of Plant Physiologists
Abstract
Glycosylphosphatidylinositol (GPI) anchoring of proteins provides a potential mechanism for targeting to the plant plasma membrane and cell wall. However, relatively few such proteins have been identified. Here, we develop a procedure for database analysis to identify GPI-anchored proteins (GAPs) based on their possession of common features. In a comprehensive search of the annotated Arabidopsis genome, we identified 167 novel putative GAPs in addition to the 43 previously described candidates. Many of these 210 proteins show similarity to characterized cell surface proteins. The predicted GAPs include homologs of β-1,3-glucanases (16), metallo- and aspartyl proteases (13), glycerophosphodiesterases (6), phytocyanins (25), multi-copper oxidases (2), extensins (6), plasma membrane receptors (19), and lipid transfer proteins (18). Classical arabinogalactan (AG) proteins (13), AG peptides (9), fasciclin-like proteins (20), COBRA and 10 homologs, and novel potential signaling peptides that we name GAPEPs (8) were also identified. A further 34 proteins of unknown function were predicted to be GPI anchored. A surprising finding was that over 40% of the proteins identified here have probable AG glycosylation modules, suggesting that AG glycosylation of cell surface proteins is widespread. This analysis shows that GPI anchoring is likely to be a major modification in plants that is used to target a specific subset of proteins to the cell surface for extracellular matrix remodeling and signaling.
Conventional membrane proteins possess one or more transmembrane domains that traverse the hydrophobic lipid bilayer. A glycosylphosphatidylinositol (GPI) anchor is an alternative means of attaching a protein to the membrane (for review, see Udenfriend and Kodukula, 1995), and it is found in all eukaryotic organisms. The C terminus of a GPI-anchored protein (GAP) is covalently attached via phosphoethanolamine and a conserved glycan to phosphatidylinositol or a ceramide (Kinoshita and Inoue, 2000). GAPs have many features that distinguish them from proteins with transmembrane domains. The anchor can be removed by the action of specific phospholipases (Griffith and Ryan, 1999), converting the protein into a water-soluble form. In many organisms, GAPs are found specifically at the outer leaflet of the plasma membrane (PM). Importantly, GPI anchoring can target proteins to the PM in a polarized or localized manner, for example to the apical membrane of polarized mammalian epithelial cells (Le Gall et al., 1995) or the axon of neurons (Brown et al., 2000). As part of this targeting mechanism, GAPs are thought to associate with lipid rafts (Muniz and Riezman, 2000; Ikonen, 2001), and there is evidence for rafting of GAPs in plants (Peskan et al., 2000). Therefore, this class of proteins forms a potentially important group of molecules involved in plant cell surface generation and remodeling (Sherrier et al., 1999).
GAPs have only relatively recently been discovered in plants. Arabinogalactan (AG) proteins (AGPs), cell surface proteoglycans, are the best characterized class of such proteins (Youl et al., 1998; Oxley and Bacic, 1999; Sherrier et al., 1999; Svetek et al., 1999). Already, 12 classical AGPs, five AG peptides, and 17 fasciclin-like AGPs have been predicted to be GPI anchored in Arabidopsis (Gaspar et al., 2001). There is evidence for anchoring of a nitrate reductase in sugar beet (Beta vulgaris) and barley (Hordeum vulgare; Kunze et al., 1997), and a purple acid phosphatase in Spirodela oligorrhiza (Nakazato et al., 1998; Nishikoori et al., 2001). However, in general, the types of plant proteins that use this mode of membrane attachment are unknown. We have previously shown that there are abundant GAPs that are not AGPs on the surface of Arabidopsis callus cells (Sherrier et al., 1999), and tobacco (Nicotiana tabacum) protoplasts also display a range of GAPs at their surface (Takos et al., 1997). The few Arabidopsis proteins other than AGPs that have been proposed to be GPI anchored are COBRA (Schindelman et al., 2001), four blue-copper binding proteins (Nersissian et al., 1998), and four matrix metalloproteinases (MMPs; Maidment et al., 1999), yielding 43 candidates (including the AGPs, AG peptides, and fasciclin-like AGPs). All of these proteins are also predicted to be localized to the PM or cell wall. Moreover, polarized localization of certain AGPs (McCabe et al., 1997; Majewska-Sawka and Nothnagel, 2000) and the COBRA protein in root cells (Schindelman et al., 2001) has been reported.
GPI anchoring of proteins could serve several different functions. First, GAPs may serve as links between the PM and the cell wall. The GPI anchor would keep the protein attached to the membrane, while the carbohydrate moiety of proteins such as AGPs could interact with other cell wall components (Kohorn, 2000). Second, GPI anchoring may facilitate a fast switch between the two spatially close but distinct extracellular locations of the PM and the cell wall through cleavage by specific phospholipases. This cleavage could activate or deactivate a protein by separating it from substrates, ligands, or other interaction partners or by changing its conformation (Butikofer et al., 2001). Third, the released protein could become a structural component of the cell wall (Kohorn, 2000). The remnant of the anchor on the protein may serve to link it covalently to the cell wall, as is the case in yeast (Saccharomyces cerevisiae; Kollar et al., 1997). Fourth, the anchor may target the protein to a particular area of the PM underlying the cell wall where its function is needed. Fifth, GPI anchoring may cause cotargeting of otherwise unrelated proteins. Thus, proteins with complementary functions might become juxtaposed, increasing their working efficiency.
All known proteins that become GPI anchored have a cleavable N-terminal secretion signal for translocation into the endoplasmic reticulum. They also have a hydrophobic C terminus, which most likely forms a transient transmembrane domain. It is thought to function as a recognition signal for a transamidase, which cleaves the C-terminal hydrophobic propeptide and transfers the protein to a prefabricated GPI anchor (Kinoshita and Inoue, 2000). Certain sequence constraints exist for the cleavage site (the “ω-site”) and the residues surrounding it (for review, see Udenfriend and Kodukula, 1995; Eisenhaber et al., 1998; Coussen et al., 2001). Therefore, it is possible to identify probable GAPs using these amino acid sequence features.
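To make the idea of ω-site constraints concrete, the following is a minimal sketch, assuming the commonly cited rules that the ω residue is small (Ser, Asn, Ala, Gly, Asp, or Cys), that ω+2 is also small (Gly, Ala, or Ser), and that a short spacer separates ω+2 from a hydrophobic C-terminal propeptide. The residue sets, lengths, and the function name `candidate_omega_sites` are illustrative assumptions, not the parameters used in this study.

```python
# Illustrative omega-site rules (assumptions, not this study's parameters).
OMEGA_RESIDUES = set("SNAGDC")     # small residues commonly reported at omega
OMEGA_PLUS2_RESIDUES = set("GAS")  # small residues commonly reported at omega+2
HYDROPHOBIC = set("AILMFVWC")      # crude hydrophobic residue set (hypothetical)


def candidate_omega_sites(seq, tail_len=15, min_spacer=5, max_spacer=12,
                          min_hydrophobic_frac=0.6):
    """Return 0-based indices that satisfy the illustrative omega-site rules.

    Index i qualifies if seq[i] is an allowed omega residue, seq[i + 2] is an
    allowed omega+2 residue, and min_spacer..max_spacer residues (composition
    not checked here) separate omega+2 from a hydrophobic C-terminal tail.
    """
    seq = seq.upper()
    n = len(seq)
    if n < tail_len + min_spacer + 3:
        return []
    tail_start = n - tail_len
    tail = seq[tail_start:]
    if sum(aa in HYDROPHOBIC for aa in tail) / tail_len < min_hydrophobic_frac:
        return []  # no sufficiently hydrophobic C-terminal propeptide
    sites = []
    # Largest i still leaves at least min_spacer residues before the tail.
    for i in range(tail_start - min_spacer - 2):
        spacer = tail_start - (i + 3)  # residues strictly between omega+2 and tail
        if spacer > max_spacer:
            continue
        if seq[i] in OMEGA_RESIDUES and seq[i + 2] in OMEGA_PLUS2_RESIDUES:
            sites.append(i)
    return sites


# A made-up C-terminal fragment: Ser at index 5 is omega, Gly at index 7 is omega+2.
example = "MKTEQ" + "SEG" + "KTEPKQ" + "LLVAIFLLAVMLIVA"
print(candidate_omega_sites(example))  # [5]
```

Rules like these are deliberately permissive; they can only flag candidate sites, which is why the screen described here combines them with independent tests for the N- and C-terminal signals.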
A computer-based algorithm for the identification of GAPs in metazoans and protozoans (“big-Pi”) based on the analysis of the C-terminal sequence has recently been described (Eisenhaber et al., 1999; Coussen et al., 2001). Screens based on the N-terminal and C-terminal signals have been effective for analyzing the yeast genome (Caro et al., 1997; Hamada et al., 1998, 1999). However, no method optimized for large-scale screening of plant sequences is available to date. Here, we develop a procedure that effectively identifies GAPs from Arabidopsis. The 210 proteins identified are likely to function at the PM-cell wall interface.
RESULTS AND DISCUSSION
There are no absolutely conserved sequence motifs that can be used to predict unambiguously the addition of a GPI anchor to a protein. Nevertheless, we found that the application of several simple sequence analysis tests was a highly selective procedure. All known GAPs from all organisms have an N-terminal signal sequence for targeting to the endoplasmic reticulum, a hydrophobic C-terminal sequence, and no internal transmembrane helices (TMHs; Udenfriend and Kodukula, 1995). Therefore, a search program termed “GPT” (GPI-anchoring Prediction Tool) was devised that performed three tests for the presence of these hydrophobic segments in primary amino acid sequences. The permissible location, length, and hydrophobicity of each of these segments could be independently varied. The possession of all three characteristics identified a potential GAP (see “Materials and Methods”). The list produced by GPT was refined using independent algorithms to confirm that each protein conformed to these expected characteristics. Only sequences that contained a potential ω cleavage site (Udenfriend and Kodukula, 1995) were included in the list of predicted GAPs. Because there are few experimentally demonstrated GAPs in plants, this procedure was first optimized and tested using known yeast GAPs (see “Materials and Methods”). Yeast was chosen because a large number of GAPs have already been identified (Hamada et al., 1998, 1999), and a recent study demonstrated some conservation of the machinery involved in synthesis of GAPs between plants and yeast (Takos et al., 2000). The procedure correctly identified 93% of known yeast GAPs with less than 2% obvious false positives (see “Materials and Methods”).
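The three GPT tests can be pictured as sliding-window hydropathy checks on three regions of a sequence. The sketch below is a hypothetical reconstruction, not GPT itself: it assumes the Kyte-Doolittle hydropathy scale, and the region sizes, window widths, and threshold (`n_region`, `c_region`, `sig_window`, `tmh_window`, `threshold`) are placeholder values, since GPT's actual location, length, and hydrophobicity settings are given in “Materials and Methods”. It also omits the SignalP, TMHMM, and ω-site refinements.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(segment, window):
    """Highest mean hydropathy over any `window` consecutive residues, or -inf."""
    scores = [KD.get(aa, 0.0) for aa in segment.upper()]
    if len(scores) < window:
        return float("-inf")
    return max(sum(scores[i:i + window]) / window
               for i in range(len(scores) - window + 1))


def gpt_like_screen(seq, n_region=40, c_region=30, sig_window=11,
                    tmh_window=19, threshold=1.6):
    """Hypothetical three-test filter modeled on the GPT description.

    Test 1: a hydrophobic stretch near the N terminus (signal sequence).
    Test 2: a hydrophobic stretch near the C terminus (anchor signal).
    Test 3: no TMH-length hydrophobic stretch in between.
    """
    seq = seq.upper()
    has_signal = max_window_hydropathy(seq[:n_region], sig_window) >= threshold
    has_c_tail = max_window_hydropathy(seq[-c_region:], sig_window) >= threshold
    internal = seq[n_region:max(n_region, len(seq) - c_region)]
    no_internal_tmh = max_window_hydropathy(internal, tmh_window) < threshold
    return has_signal and has_c_tail and no_internal_tmh


# Made-up test sequences, not real Arabidopsis proteins.
signal = "MK" + "L" * 12 + "ASA"
mature = "DEKRSTNQ" * 8
c_term = "SGSKTEPK" + "LLVAIFLLAVMLIVA"
gap_like = signal + mature + c_term
with_tmh = signal + "DEKRSTNQ" * 4 + "L" * 20 + "DEKRSTNQ" * 4 + c_term
print(gpt_like_screen(gap_like), gpt_like_screen(with_tmh))  # True False
```

Keeping each test as an independent threshold mirrors the statement that location, length, and hydrophobicity could be varied separately; tuning them against a set of known GAPs, as was done here with yeast, is what makes such a filter selective.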
Analysis of Proteins Encoded in the Arabidopsis Genome
Analysis of the Arabidopsis protein database (March 29, 2001) of 29,337 partially redundant sequences using GPT identified 443 potential GAPs. Refinement of the analysis by SignalP V2.0 (Nielsen et al., 1997) and TMHMM (Sonnhammer et al., 1998), confirming the presence of a signal sequence and absence of TMHs, reduced the number of candidate sequences to 263. After elimination of sequences without a potential C-terminal ω cleavage site, 202 candidates remained, six of which were removed because of redundancy. All 196 sequences obtained were sorted into classes based on homologies to other proteins. Only four sequences could be identified as false positives because they had known subcellular localizations other than the PM. A further four sequences were removed, as their prediction from genomic sequence was found to be incorrect. A renewed screen of the Arabidopsis protein database (dated August 8, 2001) resulted in the addition of four sequences. The list comprised 192 putative GAPs (Table I). Based on high sequence similarity to these proteins, 18 further sequences, which did not pass through all the filters of the screen, were judged likely to be GAPs. They were added to the list and are shown in parentheses in Table I, increasing the total number of candidate GAPs to 210. These sequences represent approximately 0.8% of the estimated 25,498 genes (The Arabidopsis Genome Initiative, 2000). A putative function could be ascribed to 176 of the proteins based on sequence homology or on previous reports.
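The candidate counts reported for each filtering step can be checked with simple arithmetic. All numbers below are taken from the paragraph above; the variable names are ours and purely illustrative.

```python
# Candidate counts reported for each step of the screen.
gpt_hits = 443
after_signalp_tmhmm = 263        # signal sequence present, no internal TMH
after_omega = 202                # potential C-terminal omega site present
nonredundant = after_omega - 6   # six redundant sequences removed
false_positives = 4              # known localization other than the PM
misannotated = 4                 # incorrect prediction from genomic sequence
renewed_screen = 4               # added from the August 8, 2001 database
screened_gaps = nonredundant - false_positives - misannotated + renewed_screen
similarity_additions = 18        # shown in parentheses in Table I
total_candidates = screened_gaps + similarity_additions
genome_fraction = total_candidates / 25_498  # estimated number of genes

print(nonredundant, screened_gaps, total_candidates, f"{genome_fraction:.1%}")
# 196 192 210 0.8%
```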
Table I. Putative GPI-anchored proteins in Arabidopsis
It is likely that a few candidate GAPs will not have been identified because the annotation of the Arabidopsis genome, particularly of small genes, is incomplete. It is clearly crucial for this analysis that proteins are correctly annotated, particularly at the N and C termini. Furthermore, SignalP and TMHMM V1.0 have prediction accuracies of 85% (Nielsen et al., 1997) and 97% (Krogh et al., 2001), respectively, suggesting that a few proteins may have been wrongly excluded. Nevertheless, we believe the proteins presented in Table I represent all the major types of GAP in Arabidopsis.
Predicted Families of GAPs
AGPs and AG Peptides (22)
AGPs and AG peptides are members of the superfamily of cell wall Hyp-rich glycoproteins (HRGPs; Knox, 1995; Kieliszewski, 2001), and are characteristically highly glycosylated. They are thought to have multiple functions in plant growth and development (Showalter, 2001). “Classical” AGPs consist largely of multiple Hyp-rich glycosylation modules (glycomodules; Kieliszewski and Lamport, 1994), whereas the recently described AG peptides possess a single such glycomodule, being just 10 to 15 amino acids in their predicted mature form (Schultz et al., 2000; Gaspar et al., 2001). Classical AGPs and AG peptides have been predicted to become GPI anchored, and anchoring of AtAGP10 from Arabidopsis and a variety of AGPs from other plants has already been verified experimentally (Youl et al., 1998; Oxley and Bacic, 1999; Sherrier et al., 1999; Svetek et al., 1999; Schultz et al., 2000).
The analysis identified all 12 classical AGPs and four of the five AG peptides previously reported (Gaspar et al., 2001). It is interesting that one further putative classical AGP (AAC63647.1) and four novel AG peptides (BAB09787.1, AAG21543.1, CAB41186.1, and AAF01556.1) were also identified. The identification of so many members of this family of proteins confirmed that the screen was very thorough and effective in identifying candidate GAPs. The single AG peptide AtAGP16 that was not recognized by GPT has an unusually long 9-residue hydrophilic region C-terminal to the potential hydrophobic anchor addition signal, but was included in Table I on the basis of close similarity to the other AG peptides and the previous prediction (Schultz et al., 2000). The potential significance of GPI anchoring of these proteins has been widely reviewed (Majewska-Sawka and Nothnagel, 2000; Schultz et al., 2000; Gaspar et al., 2001).
Extensin-Related Proteins (6)
Extensins, like AGPs, are members of the HRGP superfamily (Knox, 1995; Kieliszewski, 2001). They are characterized by long stretches of S(Hyp)4 motifs that become arabinosylated glycomodules. This analysis predicted six extensin-related GAPs with several such glycomodules.
Extensins are thought to have structural roles in the cell wall. Their physical properties are highly variable due to specific glycosylation, hydroxylation, and crosslinking with other cell wall components (Kieliszewski and Lamport, 1994; Knox, 1995). One of these six extensin-related GAPs (CAB87672.1) also has probable AG glycomodules. Chimeric proteins with AG glycomodules and extensin glycomodules have been reported previously (Lind et al., 1994; Schultz et al., 1997; Bosch et al., 2001). There are no previous reports of GAPs related to extensins.
Phytocyanin-Like Proteins (25)
Phytocyanins are a superfamily of blue mono-copper binding proteins (Nersissian et al., 1998). This study identified 13 phytocyanins and 12 phytocyanin-related early nodulin-like proteins. Together, they probably form the largest group of GAPs in Arabidopsis.
All known phytocyanins have secretion signals and are thought to be extracellular proteins. There are four subfamilies: stellacyanins, uclacyanins, mavicyanins, and plantacyanins. The studied stellacyanins and uclacyanins have an HRGP domain and a hydrophobic C terminus (Nersissian et al., 1998), and it has previously been speculated that they are GPI anchored (van Gysel et al., 1993; Nersissian et al., 1998). The processing of N and C termini of cucumber (Cucumis sativus) stellacyanin supports this prediction (Nersissian et al., 1996). The HRGP domain contains probable AG glycomodules (see discussion of AG glycomodules), facilitating cell wall interactions (Nersissian et al., 1998). Eight of the sequences identified in this study can be classified as uclacyanins, and five as stellacyanins based on the composition of their putative copper-binding sites (Nersissian et al., 1998). These include all four previously reported Arabidopsis phytocyanins (van Gysel et al., 1993; Nersissian et al., 1998). It is interesting that not all of the Arabidopsis uclacyanins and stellacyanins identified here contain AG glycomodules (Table I).
Mono-copper binding proteins are believed to function as mobile electron carriers in electron-transport systems, but the substrates of the phytocyanins are unknown. Their redox-reactive center is much more accessible than that of plastocyanins, and this makes the phytocyanins more likely to interact with low-Mr compounds than with protein electron mediators (Hart et al., 1996; Nersissian et al., 1998). The current view on phytocyanins holds that they are redox-active proteins most likely involved in the primary defense response, oxidative stress, or in lignin formation at the cell surface (Drew and Gatehouse, 1994; Nersissian et al., 1998; Miller et al., 1999; Ezaki et al., 2000; Zhang et al., 2000). Copper ions have recently been implicated in hydroxyl radical-mediated cell wall remodeling (Fry, 1998; Fry et al., 2001). It is an intriguing possibility that the phytocyanin-like proteins identified here might be involved in this process.
Two members of a family of phytocyanin-related early nodulins, MtENOD16 and MtENOD20, have been reported in the model legume Medicago truncatula and were predicted to become GPI anchored (Greene et al., 1998). These phytocyanin-related early nodulins lack the metal-binding-site Cys, suggesting that these proteins do not bind copper (Greene et al., 1998). Of the 25 proteins identified in this study, 12 lack this Cys, and therefore belong to the early nodulin-like subgroup of phytocyanin-related proteins. Eight have probable AG glycomodules.
COBRA Family Proteins (11)
COBRA is important for root formation in Arabidopsis (Benfey et al., 1993; Schindelman et al., 2001). In this screen, COBRA and two additional sequences that show 64% and 71% amino acid identity to COBRA were identified. Three further proteins (probably incorrectly annotated) were included in Table I based on high sequence identity. A second family of five proteins with lower levels of similarity to COBRA was also identified. Three of these have probable AG glycomodules at three conserved positions.
Based on COBRA's primary sequence, the protein was predicted to become GPI anchored (Schindelman et al., 2001). Although anchoring has yet to be directly demonstrated, the targeting to the PM supports the prediction (Schindelman et al., 2001). In mutant COBRA plants, root cells expand radially rather than longitudinally, suggesting that COBRA is necessary for oriented cell expansion. The mutant roots also have decreased levels of cellulose, implying a role for COBRA in cell wall maintenance. It is possible that COBRA directs cellulose synthesis to specific areas of the cell surface (Schindelman et al., 2001).
Glycerophosphodiesterase-Like Proteins (6)
Five closely related proteins showed low levels of similarity to bacterial glycerophosphodiesterases (Porcella et al., 2000). A sixth probably incorrectly annotated protein was included in Table I based on high sequence identity. Proteomic analysis of callus GAPs demonstrated that AtGPIP2 is one of these proteins (Sherrier et al., 1999; P. Dupree, M. Mann, and B. Kuester, unpublished data). This supports the prediction that these proteins are GPI anchored. A related PM glycerophosphodiesterase homolog from mammals has recently been implicated in lipid metabolism and signal transduction, but this protein is not GPI anchored (Zheng et al., 2000). Like many of the proteins found in the screen (see below), all of these glycerophosphodiesterase-like proteins contain probable AG glycomodules. It has already been shown that AtGPIP2 has AG glycosylation (Sherrier et al., 1999).
HIP-Like Proteins (3)
The screen identified two novel proteins showing distant homology to Hedgehog interacting protein (HIP; Chuang and McMahon, 1999). A third incorrectly annotated similar protein was included in Table I. Proteomic analysis indicated that one of these HIP-like proteins is AtGPIP1, the major GAP expressed in Arabidopsis callus (Sherrier et al., 1999; P. Dupree, M. Mann, and B. Kuester, unpublished data), confirming the prediction.
In various animals, Hedgehog is a diffusible protein morphogen that affects cell patterning and development (Zeng et al., 2001). HIP, a mouse PM protein, is a negative regulator of Hedgehog function, perhaps by sequestering free Hedgehog (Chuang and McMahon, 1999). There is no known homolog of Hedgehog in Arabidopsis. However, given the extensive homology, it seems plausible that AtGPIP1 may have developmental importance by interacting with an extracellular protein, and we are currently investigating this possibility.
Fasciclin-Like Proteins (20)
Two distinct families of proteins with one or two fasciclin-like (fas1) domains were identified. The larger family, whose members contain stretches of AG glycomodules alternating with fasciclin-like domains, has been called fasciclin-like AGPs (Schultz et al., 2000). In addition to the 14 proteins initially identified in the screen, a further four highly related sequences that do not have hydrophobic C termini were included in Table I because they are probably incorrectly annotated. One fasciclin-like protein has already been shown to possess AG glycosylation (AtAGP8; FLA8) (Schultz et al., 2000), and the existence of 16 homologs in the Arabidopsis genome has been reported (Gaspar et al., 2001). In addition to these sequences, we identified one further fasciclin-like AGP (AAF02137.1).
The two members of the second family of fasciclin-like proteins (AAD32933.1 and CAB45494.1) contain a single fas1 domain. They are only very distantly related to the fasciclin-like AGPs, with different residues conserved in the fas1 domain. Also, they do not contain AG glycomodules.
Fasciclins are developmentally important cell surface proteins in various organisms ranging from algae (Huber and Sumper, 1994) to humans (Kawamoto et al., 1998; Kim et al., 2000). In fruit fly (Drosophila melanogaster), they appear to play a major role in growth cone guidance (Grenningloh et al., 1991; Prokop, 1999). Some fruit fly fasciclins such as Fas1 are GPI anchored and have been shown to function in signaling via Tyr kinases (Elkins et al., 1990). Thus, the plant fasciclin-like proteins are good candidates for receptors or cell adhesion molecules (Schultz et al., 2000).
Cell Wall Hydrolytic Enzymes (17)
Fifteen of the identified sequences have significant sequence identity to known β-1,3-glucanases. A sixteenth protein with a weak ω cleavage site was included in Table I based on high sequence similarity. In addition, a polygalacturonase-like protein was identified. There are no previous reports of GPI-anchored cell wall hydrolytic enzymes from plants, but there are GPI-anchored cell wall-modifying glucanases in yeast (Caro et al., 1997; Hamada et al., 1998).
β-1,3-Glucanases form at least two large families in plants. One comprises extracellular and acidic enzymes, whereas members of the other family are vacuolar and basic (Van den Bulcke et al., 1989). All 16 predicted GPI-anchored glucanases have an acidic pI, consistent with an extracellular localization. The β-1,3-glucanases are “pathogenesis-related” proteins that are induced in response to pathogen attack (Vogeli-Lange et al., 1988; Stintzi et al., 1993; Beffa and Meins, 1996). They may break down β-1,3-glucan of fungal cell walls, which in turn can produce elicitors to trigger a complex defense response and increase resistance to infection (Klarzynski et al., 2000). β-1,3-glucanases have also been implicated in certain developmental processes. Plant β-1,3-glucan (callose) breakdown is considered crucial for pollen development (Bucciaglia and Smith, 1994), and callose is also synthesized and broken down at the forming cell plate (Samuels et al., 1995; Otegui and Staehelin, 2000). PM or cell wall-localized GPI-anchored β-1,3-glucanases would be ideally localized for site-specific remodeling of cell wall components. The GPI anchor may cause localized targeting of the enzymes to those parts of the cell where the restructuring takes place.
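The method used to calculate the pI values is not described in this section. As a minimal sketch of how an acidic pI can be predicted from sequence alone, the following finds the pH at which the Henderson–Hasselbalch net charge is zero by bisection. The pKa values are approximate textbook-style figures chosen as an assumption; dedicated tools use other pKa sets and will give somewhat different numbers.

```python
# Approximate pKa values (an assumption; published pKa sets differ).
PKA = {"Nterm": 9.69, "Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33,
       "Y": 10.07, "H": 6.00, "K": 10.53, "R": 12.48}


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    pos = 1 / (1 + 10 ** (ph - PKA["Nterm"]))  # protonated N terminus
    neg = 1 / (1 + 10 ** (PKA["Cterm"] - ph))  # deprotonated C terminus
    for aa in seq.upper():
        if aa in "KRH":
            pos += 1 / (1 + 10 ** (ph - PKA[aa]))
        elif aa in "DECY":
            neg += 1 / (1 + 10 ** (PKA[aa] - ph))
    return pos - neg


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH of zero net charge; charge falls as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(isoelectric_point("DDDDEEEE") < 7, isoelectric_point("KKKKRRRR") > 7)
# True True
```

In practice, the predicted mature sequence (after removal of the signal peptide and C-terminal propeptide) would be the more appropriate input for judging the charge of the cell surface form of these enzymes.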
Proteases (13)
A total of 13 proteases were identified as putatively GPI anchored. They fall into two main subfamilies of aspartyl or metalloproteases, plus one Cys protease-like protein.
One family of seven members shows homology to a chloroplast nucleoid-binding aspartyl protease from tobacco (Nakano et al., 1997; Murakami et al., 2000). The homology is restricted to the protease domain of the chloroplast protein (Nakano et al., 1997), and the putative DNA-binding domain is absent. Two of the seven proteins have probable AG glycomodules, further supporting the predicted cell surface localization.
A second family consists of five MMPs (Maidment et al., 1999). At1-MMP, which has no consensus ω cleavage site, and At4-MMP, which failed the SignalP test, were included based on high sequence identity. A soybean (Glycine max) MMP appears to be cell wall associated (Pak et al., 1997), and it was previously suggested that At1-, At2-, At3-, and At5-MMPs might be GPI anchored (Maidment et al., 1999).
Extracellular proteases could be involved in the remodeling and degradation of plant extracellular matrix (ECM) proteins. MMPs have been studied in detail in animals (Massova et al., 1998; Nagase and Woessner, 1999), where they degrade proteins of the ECM and are thus involved in a multitude of physiological processes. Some mammalian MMPs favor Hyp-rich proteins such as collagen. The plant cell wall contains an abundance of Hyp-rich proteins such as HRGPs (Knox, 1995; Kieliszewski, 2001), and many of the proteins found in this screen contain HRGP domains.
LTPL Proteins (18)
Lipid transfer proteins (LTPs) are a group of extracellular proteins of many proposed functions (Kader, 1996, 1997). In this screen, 18 sequences with distant homology to Arabidopsis LTPs were identified. They belong to the family of LTP-like (LTPL) proteins (Kader, 1997). Nine of these proteins have probable AG glycomodules.
Like the LTPLs identified here, plant LTPs are thought to be secreted, as they have signal peptides. There are many reports confirming their extracellular or cell wall localization (Bernhard et al., 1991; Sterk et al., 1991; Segura et al., 1993; Thoma et al., 1993, 1994). The 140- to 204-amino acid LTPLs identified in this screen are somewhat larger than LTPs, which are less than 120 amino acids. Even though plant LTPs are divergent, they all have four conserved disulfide bonds that may increase extracellular protein stability. This arrangement of invariant Cys residues is conserved in the 18 LTPLs, suggesting that they have a fold similar to that of the LTPs and that they are also extracellular.
The function of LTPs is still somewhat unclear. Several studies have established their ability to bind lipids, and they are capable of nonspecific lipid transfer in vitro. They may function in the deposition of cutin monomers for the formation of extracellular waxes. Better established are the antipathogenic properties of certain LTPs. Furthermore, expression of LTPs can be induced by biotic and abiotic stress: fungal, bacterial, and viral infection, temperature, light, and osmotic stress (for review, see Kader, 1996, 1997). An LTP-like protein, stylar Cys-rich adhesin, has recently been implicated in pollen tube cell adhesion (Mollet et al., 2000; Park et al., 2000).
Bp10-Like (2)
Two of the putative GPI-anchored Arabidopsis proteins are highly similar to Bp10 from oilseed rape (Brassica napus; Albani et al., 1992). Bp10 is a pollen-specific gene with homology to ascorbate oxidases, which belong to the family of multi-copper oxidases. Ascorbate has been shown to be very important for the crosslinking of cell wall components and redox reactions on the plant cell surface during cell elongation (Cordoba and Gonzalez-Reyes, 1994). High ascorbate oxidase activity in the cell wall can be correlated with areas of rapid cell growth (Smirnoff and Wheeler, 2000). Ascorbate has also been suggested to function in scission of plant cell wall polysaccharides (Fry, 1998; Fry et al., 2001). Thus, the Bp10 homologs reported here might be involved in cell wall remodeling.
Receptor-Like Proteins (19)
A group of 19 proteins with homology to receptor-like kinases (RLKs) were identified. Plant Ser/Thr RLKs are characterized by an extracellular receptor domain, a single TMH, and a cytoplasmic kinase domain. They can be further classified according to the nature of their extracellular domain (for review, see McCarty and Chory, 2000). All sequences reported here are homologous only to the extracellular receptor domains of plant RLKs. The putative GPI-anchoring signal is C-terminal to this receptor domain, and the proteins lack a cytosolic kinase domain. We propose that they are GPI-anchored receptors.
Eleven sequences with homology to plant Ser/Thr kinases were identified. They fall into four subfamilies. Six proteins (BAB11105.1, BAB11104.1, BAB11106.1, AAF19714.1, AAD12705.1, and AAF26777.1) have two DUF26 domains (pfam01657), which are common in plant Ser/Thr kinases. The first four of these are related to the receptor domain of RLK3 from Arabidopsis, which is induced in response to pathogens and oxidative stress (Czernic et al., 1999), and have a single probable AG glycomodule near the C terminus. The second subfamily comprises two sequences (CAA18495.1 and AAF79910.1) showing very high similarity to PRK5, an Arabidopsis receptor protein kinase whose receptor domain is related to the pathogenesis-related-5 family (Wang et al., 1996). One of these proteins (AAF79910.1) has a charged residue in the C-terminal hydrophobic domain. One sequence (AAF79572.1) has a lectin-related receptor domain; lectin-type Ser/Thr RLKs have been reported, but their function is unknown (Herve et al., 1999). The remaining two sequences (AAF07793.1 and AAB81674.1) show low levels of similarity to uncharacterized putative receptor kinases from Arabidopsis. Both have two LysM domains (smart00257), which are thought to bind peptidoglycan in bacteria (Bateman and Bycroft, 2000), but their function in eukaryotes is unknown.
The remaining eight sequences from this screen are homologous to the Cf-2/Cf-5 family of tomato (Lycopersicon esculentum) disease resistance proteins (Dickinson et al., 1993; Dixon et al., 1998) and to the meristem and organ development protein Clavata2 from Arabidopsis (Jeong et al., 1999). Both families are related to Leu-rich repeat RLKs, but have very short cytoplasmic tails and lack the kinase activity. Clavata2 has been suggested to function in heterodimeric receptors in conjunction with RLKs (Jeong et al., 1999). The same could be true for the putative GAPs identified in this screen. It is interesting to note the variable number of Leu-rich repeats in the sequences reported here, ranging from nine to 26. A similar variability has been described within the Cf-5 family (Dixon et al., 1998), and may provide a means of altering the binding specificity of the receptor domain.
GAPEPs (8)
A family of five sequences with no detectable homology to known proteins was identified in the screen. In their unprocessed form, they are only 69 to 70 amino acids long, leaving a mere 17 to 18 amino acids after cleavage of the predicted signal peptides and hydrophobic C termini. Therefore, we propose the name GAPEP (GPI-anchored peptide) (AtGAPEP1–5, BAB08290.1, BAB08815.1, AAF14841.1, BAB09700.1, and CAB62355.1). This novel family has at least three more members that lack a consensus ω cleavage site (AtGAPEP6–8, BAA96979.1, AAF14840.1, and BAB09698.1). The GAPEPs have 65% sequence similarity in their predicted mature forms and are charged, with basic and acidic residues in conserved positions. A similarly small GAP, CD52, has been reported in mammals (Domagala and Kurpisz, 2001). We speculate that these peptides could be signaling molecules.
Other Miscellaneous GAPs (6)
The remaining functionally annotated proteins were the embryo-specific protein 3 (CAB87275.1) and a homolog (BAA97185.1); the auxin-induced protein AIR12 (AAF02148.1; Neuteboom et al., 1999); a protein related to the oilseed rape pollen-specific protein BNM1 (BAB10290.1; Treacy et al., 1997); a Bcp1-like protein, probably involved in plant male fertility (AAF97962.1; Xu et al., 1993, 1995); and a Gly-rich protein (CAB78588.1).
Unknown and Hypothetical Proteins (34)
Of the 210 sequences in Table I, 34 could not be assigned a function on the basis of homology to known proteins. There were 25 unknown proteins whose sequence has been predicted from genomic DNA, and whose expression has been confirmed by existence of at least one expressed sequence tag, and nine hypothetical proteins whose sequence has been predicted from genomic DNA, but without further evidence for their expression.
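The family sizes given in the section headings, together with the unknown and hypothetical proteins, can be tallied as a consistency check against the totals of 176 proteins with a putative function and 210 candidates overall. The subfamily breakdowns in the comments come from the corresponding sections; the dictionary itself is only an illustrative bookkeeping device.

```python
# Family sizes from the section headings; sub-sums from the section text.
families = {
    "AGPs and AG peptides": 13 + 9,            # classical AGPs + AG peptides
    "Extensin-related proteins": 6,
    "Phytocyanin-like proteins": 8 + 5 + 12,   # uclacyanins + stellacyanins + ENOD-like
    "COBRA family proteins": 11,
    "Glycerophosphodiesterase-like proteins": 6,
    "HIP-like proteins": 3,
    "Fasciclin-like proteins": 14 + 4 + 2,     # screen hits + misannotated + single-fas1
    "Cell wall hydrolytic enzymes": 16 + 1,    # beta-1,3-glucanases + polygalacturonase-like
    "Proteases": 7 + 5 + 1,                    # aspartyl + MMPs + Cys protease-like
    "LTPL proteins": 18,
    "Bp10-like": 2,
    "Receptor-like proteins": 11 + 8,          # kinase-related + Cf-2/Cf-5/Clavata2-related
    "GAPEPs": 5 + 3,                           # with / without consensus omega site
    "Other miscellaneous GAPs": 6,
}
unknown_or_hypothetical = 25 + 9

with_putative_function = sum(families.values())
total = with_putative_function + unknown_or_hypothetical
print(with_putative_function, total)  # 176 210
```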
Anticipated and Unanticipated Families of GAP
All three Arabidopsis proteins that we could use as a positive control were correctly identified: AtGPIP1 and AtGPIP2 (Sherrier et al., 1999) and AtAGP10 (Schultz et al., 2000). In addition, all their correctly annotated homologs were identified. Other protein families that had previously been suggested to have GPI-anchored members were also identified: AG peptides (Schultz et al., 2000), fasciclin-like proteins (Gaspar et al., 2001), phytocyanins (Nersissian et al., 1998), MMPs (Maidment et al., 1999), and the COBRA protein (Schindelman et al., 2001). In addition, the number of putative GAPs from these families was enlarged by four AG peptides, one putative classical AGP, three fasciclin-like proteins, 21 phytocyanin-like proteins, and 10 COBRA-related proteins.
Hitherto, several families have had no reported GPI-anchored members in plants. However, the β-1,3-glucanases, aspartyl proteases, extensin-related proteins, LTPL proteins, and receptor-like proteins have homologs with known PM or extracellular localizations, compatible with the notion of GPI anchoring. Furthermore, known GAPs from yeast include glucanases and proteases (Caro et al., 1997; Hamada et al., 1998, 1999), supporting the prediction of GPI-anchored analogs in Arabidopsis.
It is interesting to note that not all families of secreted proteins have GPI-anchored relatives. Of all the cell wall hydrolases, only β-1,3-glucanases and one potential polygalacturonase were found. There were a few weak cellulase and pectin methylesterase candidates whose GPI anchoring we cannot exclude (not shown). Notably, there were no candidate xyloglucan endotransglycosylases (Fry et al., 1992; Campbell and Braam, 1999) or expansins (Cosgrove, 2000). Although some receptor-kinase-like proteins appear to have GPI anchors, we found no putative wall-associated kinases (Kohorn, 2001). Surprisingly, there were no homologs of nitrate reductase (Kunze et al., 1997) or purple acid phosphatase (Nakazato et al., 1998; Nishikoori et al., 2001). Homologs of a variety of mammalian GAPs, such as the folate receptor or prion proteins, were also not found, probably reflecting the large differences between the extracellular environments of plants and animals.
AG Glycomodules Are Widespread in GAP
One of the most surprising features of this analysis is that over 40% of the proteins contain probable AG glycomodules. In addition to the 40 classical AGPs, AG peptides, and fasciclin-like AGPs that we identified, a further 49 candidates for AG glycosylation were found (bold in Table I). Thus, it appears that a significant proportion of GPI-anchored cell-surface proteins may become AG glycosylated.
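The "over 40%" figure follows directly from the counts in this paragraph and the 210 candidates in Table I:

$$
\frac{40 + 49}{210} = \frac{89}{210} \approx 0.424 \quad (\text{about } 42\%)
$$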
AG glycosylated proteins possess one or more small Hyp-rich glycomodules (Shpak et al., 1999, 2001; Schultz et al., 2000; Kieliszewski, 2001). There are two different types of Hyp-rich glycomodules defined by the contiguity hypothesis (Kieliszewski and Lamport, 1994). First, AG glycomodules contain clustered noncontiguous Hyps that direct attachment of complex AG-type chains. Second, arabinosylated glycomodules contain clustered contiguous Hyps. This hypothesis has been corroborated by studies using synthetic oligopeptides, allowing reliable prediction of glycosylation of glycomodules (Shpak et al., 1999, 2001). Consistent with this idea, AG glycomodules based on a “PAPAP” motif are frequently found in Arabidopsis classical AGPs (Schultz et al., 2000). Moreover, all the AG peptides have a single such AG glycomodule (Schultz et al., 2000). The six GPI-anchored glycerophosphodiesterase-like proteins have just a few potential sites for AG glycosylation that loosely satisfy the criteria, yet at least one becomes AG glycosylated (Sherrier et al., 1999). The remaining 83 proteins identified here have AG peptide modules as defined by the contiguity hypothesis, often with perfect or imperfect PAPAP motifs or runs of four or more XP motifs, and therefore we believe that most of these are likely to become AG glycosylated.
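The contiguity criteria above can be turned into a rough sequence scan. The sketch below is a hypothetical illustration and not the procedure used to compile Table I. It treats every Pro as a potential Hyp, although hydroxylation in vivo depends on sequence context. It counts perfect PAPAP motifs and reports runs of four or more XP dipeptides with isolated Pro as candidate AG glycomodules. It also flags runs of three or more contiguous Pro as candidate arabinosylation modules; that cutoff is an arbitrary choice of ours. Imperfect PAPAP motifs are not handled.

```python
import re

# Assumption: every Pro is treated as a potential Hyp.
# Four or more X-P dipeptides with X != P (isolated, noncontiguous Pro),
# not immediately followed by another Pro.
XP_RUN = re.compile(r"(?:[^P]P){4,}(?!P)")
# Zero-width lookahead so that overlapping PAPAP occurrences are all counted.
PAPAP = re.compile(r"(?=PAPAP)")
# Hypothetical cutoff: three or more contiguous Pro = candidate arabinosylation module.
CONTIGUOUS_PRO = re.compile(r"P{3,}")


def glycomodule_summary(seq):
    """Summarize candidate Hyp glycomodules in a one-letter protein sequence."""
    return {
        "papap": len(PAPAP.findall(seq)),
        "xp_runs": [m.group() for m in XP_RUN.finditer(seq)],
        "contiguous_pro": [m.group() for m in CONTIGUOUS_PRO.finditer(seq)],
    }


def likely_ag_glycosylated(seq):
    """Noncontiguous Pro clusters (PAPAP or XP runs) suggest AG glycosylation."""
    summary = glycomodule_summary(seq)
    return summary["papap"] > 0 or bool(summary["xp_runs"])
```

For example, the AGP-like fragment `MSAPAPAPSPTPEK` yields one PAPAP motif and the XP run `APAPAPSPTP`. An extensin-like `SPPPP` repeat yields only contiguous-Pro blocks and is not flagged as an AG candidate.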
AG glycosylation appears to be of major importance for several specific classes of putative GAPs. In particular, two-thirds of the phytocyanin-like proteins have probable AG glycomodules. Other families with a substantial proportion of putatively AG glycosylated members include the LTPLs, the receptor-like proteins, the COBRA-like proteins, and the glycerophosphodiesterase-like family. It is interesting that no glucanases, no HIP-like proteins, and only two proteases seem likely to become thus modified.
What could be the possible function of the glycosylation? It is becoming clear that AGPs are very important for cell development, implying they might be involved in signaling and adhesion (for review, see Majewska-Sawka and Nothnagel, 2000). The carbohydrate chains could physically link the proteins to the ECM (Kohorn, 2000) or could bind substrates and ligands. The latter seems plausible for the fasciclin-like and the receptor-like proteins. Alternatively, the carbohydrate moiety might conceal the C terminus of the protein, thus rendering it inaccessible to other proteins such as proteases. AG glycosylation would be particularly suitable as a protective shield because only a few Hyps suffice for the attachment of large carbohydrate side chains (Kieliszewski, 2001). In the case of the phytocyanins, the AGP domain might prevent contact between the reactive N terminus and components of the PM. It is also possible that just as glycosylation is increasingly being found to be important in protein sorting in mammalian cells (Benting et al., 1999; Hauri et al., 2000), AG glycosylation may provide a targeting signal that directs the proteins through the secretory pathway to a specific part of the cell surface.
PERSPECTIVE
Cell walls are highly dynamic structures. Cell elongation may require localized wall loosening and expansion, and other events like trichome development, root tip growth, pollen tube growth, and wounding require highly localized and temporally controlled remodeling events. Under these conditions, GPI anchoring may provide the mechanism necessary for specific wall component targeting. Thus, it is particularly apposite that our analysis revealed families of GAPs that are likely to function in ECM modification such as the proteases, glucanases, and COBRA proteins. Wall remodeling may also be the role of phytocyanins and the putative multi-copper oxidases homologous to Bp10.
The growing pollen tube is a paradigm for extreme cell polarization requiring specific protein targeting to the growing tip, and GPI anchoring may be especially important in pollen. Several of the sequences found in this screen are related to pollen-specific proteins: the Bp10-like proteins and the BNM1 protein. The major family of β-1,3-glucanases described here are related to pollen-specific β-1,3-glucanases (Huecas et al., 2001), and callose, a β-1,3-glucan, is an important polysaccharide in pollen walls (Rhee and Somerville, 1998). Furthermore, AGPs have also been shown to function in pollen growth (Roy et al., 1998; Bosch et al., 2001).
Our analysis supports a role of GPI anchoring in signal generation and reception. The peptides that appear to have a GPI anchor (GAPEPs) are too small to have an enzymatic activity, and we speculate that they could function in intercellular signaling. AG peptides are also potential signaling molecules (Schultz et al., 2000), and AGPs have already been implicated in signaling and development (Nothnagel, 1997; Majewska-Sawka and Nothnagel, 2000; Schultz et al., 2000). It is perhaps surprising to find numerous PM receptor-like proteins that have no cytosolic domains. However, there are many reports of GAPs participating in signal transduction across the PM in animals (Marmor and Julius, 2000; Saarma, 2000; Wang, 2001).
The present study is the most comprehensive analysis of Arabidopsis GAPs to date. Many of the identified proteins can be broadly categorized as being involved in signaling, adhesion, stress response, and cell wall remodeling, providing some insight into the purposes of GPI anchoring. The number and variety of proteins reported here establish the GPI anchor as a major protein modification and means of protein targeting in Arabidopsis, and more generally in plants.
MATERIALS AND METHODS
The yeast (Saccharomyces cerevisiae) protein database was retrieved from the ftp server for the Genome Databases Group (Department of Genetics, Stanford University School of Medicine; ftp://genome-ftp.stanford.edu/pub/yeast). The annotated Arabidopsis protein database was retrieved from The Arabidopsis Information Resource (ftp://tairpub:tairpub@ftp.Arabidopsis.org/home/tair/Sequences/blast_datasets).
In the first phase of the procedure, a Perl-based program, GPT, was used to screen the database for sequences of potential GAPs. The program performed three pass-or-fail checks for hydrophobic stretches in amino acid sequences. Only proteins with a putative N-terminal signal peptide, a hydrophobic C terminus, and no internal TMHs were identified as potentially GPI anchored. For all calculations of summed hydrophobicities, GPT used the thermodynamic Goldman-Engelman-Steitz hydropathy scale (Engelman et al., 1986).
Because there are few experimentally demonstrated GAPs in plants, GPT was trained and benchmarked using a set of 35 known GAPs from yeast (Hamada et al., 1998) to find the most stringent parameter settings for the hydrophobic segments that would still allow the identification of all 35 proteins. A further 11 known yeast GAPs (Hamada et al., 1999) were used as a jack-knife control. The most stringent parameters that still allowed correct identification of all sequences in the training set were determined as follows: signal peptide length of 10 amino acids, hydrophobicity of −14 kJ mol⁻¹; hydrophobic signal length at the C terminus of 12 amino acids, hydrophobicity of −18 kJ mol⁻¹; TMH length of 17 amino acids, hydrophobicity of −34 kJ mol⁻¹. The hydrophobic stretch of the putative signal peptide was restricted to the first 30 residues. The program was allowed to ignore up to five charged or hydrophilic residues at the extreme C terminus. With these optimized parameters, GPT generated a list of 159 potential GAPs from the yeast protein database of 8,996 (partly redundant) sequences.
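The three pass-or-fail checks described above can be sketched as follows. GPT itself was written in Perl and its source is not reproduced here, so this is a hypothetical Python re-implementation that uses only the quoted parameters. Several of its details are our own assumptions. The Goldman-Engelman-Steitz values (kcal/mol, as commonly tabulated) are negated and converted to kJ/mol so that hydrophobic windows score negative, matching the sign of the quoted thresholds. The residues allowed to be skipped at the C terminus are `HYDROPHILIC`. "Internal" TMHs are searched only between residue 30 and the C-terminal segment.

```python
# GES hydrophobicity scale (Engelman et al., 1986), kcal/mol, positive = hydrophobic.
GES_KCAL = {
    "F": 3.7, "M": 3.4, "I": 3.1, "L": 2.8, "V": 2.6, "C": 2.0, "W": 1.9,
    "A": 1.6, "T": 1.2, "G": 1.0, "S": 0.6, "P": -0.2, "Y": -0.7, "H": -3.0,
    "Q": -4.1, "N": -4.8, "E": -8.2, "K": -8.8, "D": -9.2, "R": -12.3,
}
KCAL_TO_KJ = 4.184

# Parameters quoted in the text: window length (residues), threshold (kJ/mol).
SIGNAL_LEN, SIGNAL_MAX, SIGNAL_REGION = 10, -14.0, 30
CTERM_LEN, CTERM_MAX, CTERM_SKIP = 12, -18.0, 5
TMH_LEN, TMH_MAX = 17, -34.0

# Assumption: residues that may be ignored at the extreme C terminus.
HYDROPHILIC = set("DEKRHNQST")


def residue_energy(aa):
    """Assumed sign convention: hydrophobic residues score negative (kJ/mol).
    Unknown residue codes score 0."""
    return -GES_KCAL.get(aa, 0.0) * KCAL_TO_KJ


def min_window_sum(seq, length, start=0, end=None):
    """Lowest (most hydrophobic) summed energy of any window in seq[start:end]."""
    values = [residue_energy(aa) for aa in seq[start:end]]
    if len(values) < length:
        return float("inf")  # region too short: no window can pass
    window = sum(values[:length])
    best = window
    for i in range(length, len(values)):
        window += values[i] - values[i - length]  # slide by one residue
        best = min(best, window)
    return best


def has_signal_peptide(seq):
    # Hydrophobic stretch confined to the first SIGNAL_REGION residues.
    return min_window_sum(seq, SIGNAL_LEN, 0, SIGNAL_REGION) <= SIGNAL_MAX


def has_hydrophobic_c_terminus(seq):
    # Test the window ending at the C terminus, then after skipping up to
    # CTERM_SKIP trailing residues, provided each skipped residue is hydrophilic.
    for skip in range(CTERM_SKIP + 1):
        if skip and seq[-skip] not in HYDROPHILIC:
            break
        end = len(seq) - skip
        if end < CTERM_LEN:
            break
        if min_window_sum(seq, CTERM_LEN, end - CTERM_LEN, end) <= CTERM_MAX:
            return True
    return False


def has_internal_tmh(seq):
    # Assumption: "internal" excludes the signal region and the C-terminal segment.
    end = len(seq) - (CTERM_LEN + CTERM_SKIP)
    return min_window_sum(seq, TMH_LEN, SIGNAL_REGION, end) <= TMH_MAX


def passes_gpt_screen(seq):
    return (has_signal_peptide(seq)
            and has_hydrophobic_c_terminus(seq)
            and not has_internal_tmh(seq))
```

Under this scheme, for a fixed window length, the most stringent threshold that still passes every training protein is the least negative of those proteins' best-window scores. This is one possible reading of how the "most stringent parameters" were chosen, not a description of GPT's actual search.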
In the second phase of the procedure, the list produced by GPT was refined to remove false positives. Proteins were first analyzed with SignalP V2.0 (Nielsen et al., 1997), setting the cut-off to P ≥ 0.9 certainty (http://www.cbs.dtu.dk/services/SignalP-2.0/#submission). The sequences that passed this screen were checked for TMHs using TMHMM (Sonnhammer et al., 1998; http://www.cbs.dtu.dk/services/TMHMM-1.0). The final test was a screen for the presence of a suitable ω cleavage site near the C terminus, employing the rules established by Udenfriend and Kodukula (1995). Proteins narrowly failing these tests but highly similar to proteins that passed them were not discarded. Duplicated sequences were removed.
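The final ω-site test can be approximated as follows. This is our own simplified reading of the Udenfriend and Kodukula (1995) rules, and the exact rule set applied in this study is not given here. A candidate ω residue must be small (Ser, Asp, Cys, Ala, Gly, or Asn), ω+2 must be Gly, Ala, or Ser, and the ω site must lie a short spacer upstream of the C-terminal hydrophobic stretch. The `tail_start` argument and the default spacer bounds of 8 to 12 residues are hypothetical parameters of this sketch.

```python
# Simplified omega-site rules (our reading, not the study's exact criteria).
OMEGA_RESIDUES = set("SDCAGN")
OMEGA_PLUS_2_RESIDUES = set("GAS")


def candidate_omega_sites(seq, tail_start, min_spacer=8, max_spacer=12):
    """Return 0-based positions i at which seq[i] could be the omega site.

    tail_start: 0-based index where the C-terminal hydrophobic stretch begins
    (e.g. located by a hydrophobicity scan). The distance tail_start - i must
    lie between min_spacer and max_spacer (hypothetical defaults).
    """
    sites = []
    for i in range(max(0, tail_start - max_spacer), tail_start - min_spacer + 1):
        if i + 2 >= len(seq):
            continue  # omega+2 would fall outside the sequence
        if seq[i] in OMEGA_RESIDUES and seq[i + 2] in OMEGA_PLUS_2_RESIDUES:
            sites.append(i)
    return sites
```

A protein would pass this step if the list is non-empty. Returning positions rather than a boolean also shows where the mature protein would be cleaved.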
In the yeast optimization, 34 of the 35 sequences initially used to define the GPT parameters and nine of the 11 jack-knife control sequences were correctly identified. In addition, 23 further candidates were identified. Fourteen of these have been reported in previous in silico studies of yeast GAPs (Caro et al., 1997; Hamada et al., 1998). One was a false positive. The remaining eight sequences are candidates for novel yeast GAPs.
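Expressed as recall on the known yeast GAPs, the benchmark reads as follows. The counts come from the text above and the percentages are simple arithmetic on them:

$$
\begin{aligned}
\text{training set: } & 34/35 \approx 97.1\% \\
\text{jack-knife control: } & 9/11 \approx 81.8\% \\
\text{all known yeast GAPs: } & (34+9)/(35+11) = 43/46 \approx 93.5\% \\
\text{additional candidates: } & 23 = 14 + 1 + 8
\end{aligned}
$$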
All sequence alignments and calculations of sequence identities were performed with Clustal W (Thompson et al., 1994) on the Network Protein Sequence Analysis server of the Pôle Bio-Informatique Lyonnaise (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_clustalw.html). BLAST searches (Altschul et al., 1990, 1997) for homologous sequences from all organisms were performed on the Cambridge Biological Sciences server (BLAST 2.1; http://www.bio.cam.ac.uk/cgi-bin/blast2/blastallsrs.pl). BLAST searches of the Arabidopsis genome were performed using The Arabidopsis Information Resource BLAST 2.0 (http://www.Arabidopsis.org/Blast). Searches for conserved domains were performed using Reverse Position-Specific Blast (Altschul et al., 1997) on the National Center for Biotechnology Information server (http://www.ncbi.nlm.nih.gov/BLAST).
ACKNOWLEDGMENT
We thank William Matthews for his help in the initial stages of this project.
Footnotes
1 This work was supported by the Biotechnology and Biological Sciences Research Council and by the European Commission. G.H.H.B. also received a scholarship from the Studienstiftung des Deutschen Volkes.
2 Present address: Department of Plant and Soil Sciences and Delaware Biotechnology Institute, University of Delaware, Newark, DE 19717.
3 Present address: Hebrew University Jerusalem, Department of Biological Chemistry, Alexander Silberman Institute of Life Sciences, Givat Ram, IL–91904 Jerusalem, Israel.
* Corresponding author; e-mail p.dupree{at}bioc.cam.ac.uk; fax 44–1223–333345.
Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.010884.
- Received September 27, 2001.
- Revision received November 8, 2001.
- Accepted January 7, 2002.