Plant Physiology 135:783-800 (2004) © 2004 American Society of Plant Biologists
Specification of the Peroxisome Targeting Signals Type 1 and Type 2 of Plant Peroxisomes by Bioinformatics Analyses
Albrecht-von-Haller-Institute for Plant Sciences, Department for Plant Biochemistry, D37077 Goettingen, Germany
To specify the C-terminal peroxisome targeting signal type 1 (PTS1) and the N-terminal PTS2 for higher plants, a maximum number of plant cDNAs and expressed sequence tags that are homologous to PTS1- and PTS2-targeted plant proteins was retrieved from the public databases and the primary structure of their targeting domains was analyzed for conserved properties. According to their high overall frequency in the homologs and their widespread occurrence in different orthologous groups, nine major PTS1 tripeptides ([SA][RK][LM]> without AKM> plus SRI> and PRL>) and two major PTS2 nonapeptides (R[LI]x5HL) were defined that are considered good indicators for peroxisomal localization if present in unknown proteins. A lower but significant number of homologs contained 1 of 11 minor PTS1 tripeptides or of 9 minor PTS2 nonapeptides, many of which have not been identified before in plant peroxisomal proteins. The region adjacent to the PTS peptides was characterized by specific conserved properties as well, such as a pronounced incidence of basic and Pro residues and a high positive net charge, which probably play an auxiliary role in peroxisomal targeting. By contrast, several peptides with assumed peroxisomal targeting properties were not found in any of the 550 homologs and hence play, if at all, only a minor role in peroxisomal targeting. Based on the definition of these major and minor PTS and on the recognition of additional conserved properties, the accuracy of predicting peroxisomal proteins can be raised and plant genomes can be screened for novel proteins of peroxisomes more successfully.
Peroxisomes are ubiquitous small cell organelles that are involved in a variety of oxidative metabolic processes. Plant peroxisomes participate in recycling of P-glycolate produced by the oxygenase activity of Rubisco during photorespiration and they are the site of fatty acid β-oxidation. Some enzymes of the main metabolic pathways of peroxisomes have been cloned only recently, such as the photorespiratory enzymes Ala-glyoxylate aminotransferase (AGT) and Glu-glyoxylate aminotransferase (GGT; Liepman and Olsen, 2001
Peroxisomal proteins are encoded in the nucleus and synthesized in the cytoplasm on free ribosomes with targeting signals that specify their delivery to peroxisomes (Lazarow and Fujiki, 1985
Experimental studies revealed a functional degeneracy of the PTS1 motif: a small uncharged residue at position 3, a basic residue at position 2, and a nonpolar residue at position 1 ([SAC][KRH]L>; Gould et al., 1989
The PTS2 of mammalian and yeast proteins has been defined as a nonapeptide with four conserved amino acid residues separated by five variable amino acid residues ([RK][LVI]x5[HQ][LA]; Swinkels et al., 1991
Prediction of the subcellular localization of unknown proteins is a challenge of the postgenomic era but requires targeting motifs of high specificity to maximize the number of true positives and minimize that of false positives. Major difficulties in predicting PTS1-targeted peroxisomal proteins are (1) the small size of the signal, (2) the missing cleavage site that would provide additional diagnostic information, (3) the lower hierarchy of the PTS1 as compared to N-terminal signals (Neuberger et al., 2003b
We aimed to identify novel genes encoding putative peroxisomal proteins in the Arabidopsis genome (Arabidopsis Genome Initiative, 2000
Identification of Sequences Homologous to PTS1- and PTS2-Targeted Proteins
Thanks to large-scale sequencing projects of whole plant genomes (Arabidopsis, Oryza sativa) and ESTs (as of August 2003, 27 plant species with >20,000 ESTs each) the large number of DNA sequences in the public databases provides an enormous amount of biological information that can possibly be used to specify the signals that are indicative of targeting proteins to plant peroxisomes. According to our current knowledge, the targeting mechanism of specific peroxisomal proteins is conserved within the plant kingdom. No peroxisomal protein has been reported that contains a PTS1 in one plant species while carrying a PTS2 or another peroxisomal targeting signal in an ortholog in another plant species or vice versa. Thus, if a specific plant protein is targeted to peroxisomes by a PTS1 (or PTS2), all plant orthologs of this protein are expected to possess a PTS1 (or PTS2). About two-thirds of the currently known matrix proteins from plant peroxisomes contain a C-terminal canonical tripeptide of the PTS1 motif defined by Hayashi et al. (1997)
The starting point of the database search for homologs of PTS1/2-targeted peroxisomal proteins was a retrieval of the Arabidopsis orthologs of all PTS1/2-targeted plant proteins from the protein database (Fig. 1). The genes that were cloned from Arabidopsis were supplemented by the Arabidopsis orthologs of known PTS1/2-targeted matrix proteins, which were identified by sequence similarity. In case of unusually large gene families in Arabidopsis, such as that of the glycolate oxidase (GOX)-related proteins comprising five genes (Reumann, 2002
Bioinformatics Specification of the PTS1 Motif for Higher Plants
In total, 391 sequences were retrieved that are homologous to PTS1-targeted peroxisomal matrix proteins and derive from various higher plant species. Of these sequences, 73 represented full-length homologs and 318 homologous ESTs (81%). About one-fourth of the sequences were derived from monocotyledons and three-quarters from dicotyledons and in total from about 80 different plant species. The total number of different C-terminal tripeptides was 39, one-half of which were found several times and represented 95% of the sequences. The remaining 19 tripeptides were not found in any other homologous sequence. Considering the large number of possible tripeptides, these 19 sequences with unique C-terminal tripeptides (5%) are estimated to represent the maximum number of false positives, i.e. homologs corresponding to nonperoxisomal isoforms, incorrectly annotated genes, or incorrectly sequenced ESTs. It can be concluded that the criteria to select the ESTs were reasonably chosen and that the statistical analysis was not considerably disturbed by sequencing errors or an unwanted extraction of sequences derived from nonperoxisomal isoforms.
Within each orthologous group, on average 30.1 ± 14.5 homologs from different plant species were found, reflecting the high sensitivity of the homology analysis as well as the fact that most known proteins of plant peroxisomes are abundant enzymes of primary metabolism, the cDNAs of which are represented in many EST collections. The homologs of each enzyme contained on average 6.5 ± 2.5 different C-terminal tripeptides (all sequences included) and at least 5.2 ± 1.9 different true PTS1 tripeptides (sequences with unique tripeptides excluded). This number indicates that the sequence of the PTS1 is medium conserved within an orthologous group and that a considerable number of different PTS1 tripeptides are produced if summarized over all 13 orthologous groups. The large majority of PTS1 sequences of an orthologous group contained a canonical C-terminal tripeptide included in the plant-specific PTS1 motif defined by Hayashi (93% ± 4%, SOX homologs excluded, [SACP][KR][LMI]>; Hayashi et al., 1997
The possible 24 tripeptides of the conservative Hayashi motif ([SACP][KR][LMI]>) turned out to differ largely in the frequency at which they occur in the homologs of PTS1-targeted proteins. Nine major PTS1 tripeptides can be defined, each of them comprising at least 10 ESTs and occurring in at least 3 different orthologous groups (Fig. 3). They are headed by far by SRL> (91 sequences, 23.3%), which accordingly represents the prototypical plant PTS1, followed by SRM> and SKL>, PRL>, ARL>, and SRI>, and finally AKL>, SKM>, and ARM>. These major PTS1 tripeptides ([SA][RK][LM]> without AKM> plus SRI> and PRL>) are thought to represent the most reliable indicators for peroxisomal localization and covered altogether 84% of the homologous sequences of PTS1-targeted proteins (Fig. 3).
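The classification described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function name `classify_pts1` and the three labels are hypothetical, while the tripeptide sets are taken from the definitions in the text (the nine major PTS1 tripeptides and the 24 tripeptides of the Hayashi motif).

```python
import re

# Nine major PTS1 tripeptides as defined in the text:
# [SA][RK][LM]> without AKM>, plus SRI> and PRL>.
MAJOR_PTS1 = ({a + b + c for a in "SA" for b in "RK" for c in "LM"} - {"AKM"}) | {"SRI", "PRL"}

# The 24 tripeptides of the conservative Hayashi motif [SACP][KR][LMI]>.
HAYASHI_MOTIF = re.compile(r"[SACP][KR][LMI]")


def classify_pts1(protein_seq):
    """Classify the C-terminal tripeptide of a protein sequence.

    Returns 'major' for one of the nine major PTS1 tripeptides,
    'canonical' for the remaining tripeptides of the Hayashi motif,
    and 'other' for everything else (hypothetical labels).
    """
    tripeptide = protein_seq[-3:].upper()
    if len(tripeptide) < 3:
        return "other"
    if tripeptide in MAJOR_PTS1:
        return "major"
    if HAYASHI_MOTIF.fullmatch(tripeptide):
        return "canonical"
    return "other"
```

A sequence ending in SRL> is reported as "major", whereas one ending in AKM> falls into the Hayashi motif but not into the major set and is reported as "canonical".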
A smaller number of sequences contained any of the tripeptides PKL>, PRM>, SKI>, CKL>, or CRL>, most of which were found in ESTs and have not been reported in plant peroxisomal matrix proteins before. These tripeptides are defined as minor PTS1 tripeptides and considered functional but low-abundant PTS1. If only a single sequence with a particular C-terminal tripeptide per orthologous group is considered, the overall result is similar even though the frequency of some tripeptides is lower (e.g. SRL>, SRM>, SRI>, SKL>, and PRL>), indicating their dominance in particular orthologous groups (Fig. 2), whereas that of others is higher (e.g. SKI>, PRM>, and CRL>), reflecting the fact that these tripeptides occur in a number of orthologous groups above average. The tripeptides AKI>, CRM>, and CRI> were found in single sequences and probably also represent low-abundant variations of the plant PTS1. By contrast, no homologs of PTS1-targeted proteins were detected that contained any of the remaining tripeptides included in the most conservative pattern of the plant-specific PTS1 (ARI>, AKM>, PRI>, PKM>, PKI>, CKM>, and CKI>; Hayashi et al., 1997
If particular amino acid combinations are by far more widespread as PTS1 tripeptides than other closely related combinations, the question arises whether rules can be deduced about which combinations yield a functional PTS1. The frequency of each amino acid residue at its specific position within the tripeptide was calculated and the amino acids classified accordingly (Fig. 4A). Highly abundant amino acids in PTS1 are S, R, and L of the tripeptide SRL>, medium abundant K and M, and the remaining low abundant (Fig. 4A). The amino acids differed also in the number of different PTS1 tripeptides in which they occurred. 
The residues S, R, and L were found in a large number of different tripeptides (9 to 11) and seem to reveal a broad tolerance with respect to the residues at the remaining positions, whereas other amino acids, such as P, C, N, and I, were restricted to a small number of very specific tripeptides and mostly to combinations with two residues of the motif S[RK]L> (Fig. 4B). Combinations exclusively of amino acids none of which belongs to the prototypical plant PTS1 SRL> were not found in any plant protein (e.g. A, P, and C at position 3; K, N, M, and S at position 2; and M, I, and V at position 1).
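The two position-specific statistics described here (how often a residue occurs at a given position, and in how many different tripeptides it occurs) can be sketched as follows. The function names are hypothetical and the input in the usage note is made-up toy data, not the data set of the study; positions are indexed 0 to 2 from left to right within the tripeptide.

```python
from collections import Counter, defaultdict


def position_frequencies(tripeptides):
    """Fraction of sequences carrying each residue at each tripeptide position."""
    if not tripeptides:
        raise ValueError("at least one tripeptide is required")
    counts = [Counter(), Counter(), Counter()]
    for tri in tripeptides:
        if len(tri) != 3:
            raise ValueError(f"not a tripeptide: {tri!r}")
        for index, residue in enumerate(tri):
            counts[index][residue] += 1
    total = len(tripeptides)
    return [{aa: n / total for aa, n in pos.items()} for pos in counts]


def tripeptide_diversity(tripeptides):
    """Number of different tripeptides in which each (position, residue) occurs."""
    seen = defaultdict(set)
    for tri in set(tripeptides):
        for index, residue in enumerate(tri):
            seen[(index, residue)].add(tri)
    return {key: len(tris) for key, tris in seen.items()}
```

With the toy input `["SRL", "SRL", "SKL", "ARM"]`, S accounts for three-quarters of the residues at the first position and occurs in two different tripeptides (SRL and SKL).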
Conserved Properties of the PTS1 Domain
Sequences upstream of the PTS1 may have an auxiliary function as accessory elements in targeting proteins to peroxisomes and can provide additional indications for peroxisomal localization of unknown proteins. To investigate whether PTS1-targeted proteins contain conserved sequences upstream of the PTS1 tripeptide, the C-terminal 18 amino acids of the PTS1 targeting domains were analyzed successively in groups of tripeptides both per orthologous group (data not shown) and per group of identical PTS1 tripeptides (Fig. 5). The content of basic residues, R and K, was found to increase significantly from an average value in the core protein (about 10%, position 18 to 7) to 24% in the tripeptide in front of the PTS1 and to 32% in the PTS1 tripeptide itself (Fig. 5A). Thus, most PTS1-targeted proteins carry a second basic residue closely in front of the PTS1 tripeptide. The rise in the content of basic residues is accompanied by a decrease in the content of acidic residues toward the C terminus (Fig. 5A). The uneven distribution of charged residues leads to an increasing positive net charge toward the C-terminal end with a total positive charge of the C-terminal 6-mer of 1.6 and results in a significant increase of the pI to 12 (Fig. 5, B and C). Because of the low SD, the pI in particular is a useful additional criterion to confirm the postulated peroxisomal localization of unknown proteins. In addition, the PTS1 domain is characterized by a high probability of P occurring in front of the PTS1 (Fig. 5D), showing that plant PTS1 proteins contain on average 0.70 ± 0.61 P residues within the 6-mer preceding the PTS1 tripeptide.
For proteins with an unusually low positive charge directly in front of the PTS1 peptide, such as those carrying AKL> and SKL> (Fig. 6, A and B), the positive charge seems to be spread over a larger region of 9 to 12 residues upstream of the PTS1 tripeptide, leading to similar values for the entire domain compared with the other PTS1 groups. In parallel, the position of P also is shifted further upstream (Fig. 6B). Closer inspection of sequence variations in the targeting domain within an orthologous group revealed that changes of the PTS from the prototypical targeting peptide SRL> or SKL> to low-abundant peptides of presumably weaker targeting efficiency (see "Discussion") are often accompanied by a simultaneous addition of further basic and/or P residues in close proximity to the PTS1 peptide (Fig. 7, A-C). Overall, the second basic residue was mostly located at position 4 or at position 6. Only for proteins with M at position 1 of the PTS1 tripeptide, such as ARM>, PRM>, SKM>, and SRM>, which all contain a second basic residue in close proximity to the PTS1 tripeptide except for one-half of the GGT homologs, a preferential localization of the second basic residue at position 4 was found (Fig. 7D).
In light of targeting prediction an important question is whether different conserved properties function independently of each other or can compensate for each other. In case of an auxiliary function of specific amino acids in front of the PTS1 tripeptide in facilitating recognition by the cytosolic receptor Pex5, basic and P residues seem to be able to complement the accessory targeting function of each other. Most proteins with M at position 1 of the PTS1 peptide lack a P residue in front of the PTS1 tripeptide but are characterized by a high positive charge above average in the 3-mer in front of the PTS1 tripeptide (Fig. 6, C and D). Vice versa, proteins terminating with SRI>, most of which are AGT homologs, do not carry a pronounced positive net charge outside of the PTS1 tripeptide but contain about 2 P residues in the 6-mer adjacent to the PTS1 (Fig. 6, E and F). In summary, the PTS1 targeting domain comprises about 12 to 15 residues with characteristic properties, and the 3-mer directly in front of the PTS1 tripeptide is characterized by a positive net charge and a high P content. At least one of these properties seems to be required in plant PTS1 proteins to enhance targeting to peroxisomes.
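As a rough illustration of how these accessory properties might be checked for a candidate protein, the following Python sketch computes the net charge of the 3-mer in front of the C-terminal tripeptide and counts P residues in the adjacent 3-mer and 6-mer. The function names and the decision rule in `has_accessory_support` (positive charge in the 3-mer, or at least one P in the 6-mer) are assumptions made for this example; the text states only that at least one of the two properties seems to be required and gives no numeric cutoffs.

```python
BASIC = set("KR")
ACIDIC = set("DE")


def net_charge(peptide):
    """Net charge as the number of basic (K, R) minus acidic (D, E) residues."""
    return sum(aa in BASIC for aa in peptide) - sum(aa in ACIDIC for aa in peptide)


def accessory_features(protein_seq):
    """Describe the region directly in front of the C-terminal tripeptide."""
    seq = protein_seq.upper()
    if len(seq) < 9:
        raise ValueError("sequence must cover the tripeptide and the preceding 6-mer")
    three_mer = seq[-6:-3]  # 3-mer directly in front of the tripeptide
    six_mer = seq[-9:-3]    # 6-mer adjacent to the tripeptide
    return {
        "charge_3mer": net_charge(three_mer),
        "prolines_3mer": three_mer.count("P"),
        "prolines_6mer": six_mer.count("P"),
    }


def has_accessory_support(protein_seq):
    """Assumed rule: positive charge in the 3-mer or at least one P in the 6-mer."""
    features = accessory_features(protein_seq)
    return features["charge_3mer"] > 0 or features["prolines_6mer"] > 0
```

Such a check would complement, not replace, the tripeptide itself; a protein passing it without a plausible PTS1 tripeptide would not be a candidate.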
About one-third of the proteins from plant peroxisomes contain an N-terminal PTS2 (Supplemental Table SI). Except for the peroxisomal Hsp70 homolog (Wimmer et al., 1997
Upon analysis of the neighboring regions of the PTS2 nonapeptide for conserved properties, the size of the PTS2 targeting domain was defined as a region of approximately 15 residues surrounding roughly symmetrically the PTS2 nonapeptide (position -3 to 12). Common features of the PTS2 domain are the following (Fig. 10): (1) a high incidence of R, (2) a low content of acidic residues (Fig. 10A), (3) a high positive charge of the targeting domain of on average 2.2 ± 0.7 (Fig. 10B), (4) a pronounced increase in pI, of which the low SD within the x5 region is most noteworthy (Fig. 10C), (5) a strict absence of P in front of the nonapeptide and within the x5 region, contrasting with a frequent presence immediately downstream of the PTS2, and (6) a high incidence of A, L, and V in front of the nonapeptide and within the x5 region (Fig. 10D). In addition, the sequences imply a conserved secondary structure of the PTS2 domain because in the large majority of sequences examined, the nonapeptide seemed to be located at the end of a short α-helix (data not shown).
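A scan for the two major PTS2 nonapeptides (R[LI]x5HL) in the N-terminal region of a protein can be sketched with a regular expression. The function name `find_major_pts2` is hypothetical; the default window of 50 residues mirrors the N-terminal 50 amino acids that were saved for the analysis of PTS2 proteins, and positions are reported 1-based from the N terminus.

```python
import re

# Lookahead so that overlapping candidate nonapeptides are all reported.
MAJOR_PTS2 = re.compile(r"(?=(R[LI].{5}HL))")


def find_major_pts2(protein_seq, window=50):
    """Return (1-based start, nonapeptide) for each R[LI]x5HL match near the N terminus."""
    n_terminal = protein_seq[:window].upper()
    return [(m.start() + 1, m.group(1)) for m in MAJOR_PTS2.finditer(n_terminal)]
```

A match alone says nothing about the conserved properties of the surrounding domain listed above (charge, P distribution, secondary structure), which would have to be inspected separately.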
The large amount of biological information in the public databases opens the possibility to apply bioinformatics tools to build up novel hypotheses and to answer biological questions. The complete genome sequences of Arabidopsis and Oryza allow now the extraction of genes encoding unknown proteins with a putative PTS1 or PTS2. To increase the prediction accuracy of peroxisomal localization, an exact definition of the PTS is crucial. Experimental studies have provided valuable information on plant-specific PTS motifs but suffer from three important drawbacks. First, the number of experimentally analyzed PTS peptides is limited, and solid support for the assumption that the amino acids of different functional PTS can freely be combined is currently lacking. For instance, inclusion of G and T (position 3) in the permissive PTS1 motif is solely based on positive targeting results of three tripeptides (GRL>, GKL>, and TKL>; Mullen et al., 1997a
Bioinformatics analyses can possibly provide additional information to specify targeting motifs. Complete plant genome sequences provide novel essential information on the size of gene families, the subcellular localization of different isoforms, and on sequence similarity shared between orthologs and paralogs, all of which are prerequisite for an unambiguous identification of homologous sequences that correspond to specific isoforms in species lacking a sequenced genome. Especially kingdom-specific variations of targeting signals can be deduced from such bioinformatics analyses. The frequency at which PTS1 and PTS2 peptides occur in plant proteins reflects a close to final stage of the ongoing evolutionary optimization of targeting signals and reflects semiquantitatively the targeting efficiency of these peptides (see below). The large data set of PTS-targeted sequences also allows a study of the targeting signals within their native context and facilitates the detection of yet unrecognized accessory sequences with an auxiliary targeting function by sequence conservation. Critical to the specificity of targeting motifs deduced from such analyses is a reasonable compromise between the identification of a maximum number of homologous sequences and a minimum number of false positives. A major factor to increase the size of the data set to about 550 sequences (catalase homologs excluded) and thereby more than 10-fold as compared to the number of cloned genes was the use of EST databases, which contributed about 80% of the sequences. To prevent that the higher rate of sequencing errors of ESTs led to an erroneous identification of PTS1 tripeptides particularly because of their localization next to a stop codon, reasonable requirements for the selection of ESTs were chosen and all sequences manually inspected. 
In cases of a problematic differentiation between homologs corresponding to peroxisomal or nonperoxisomal isoforms due to the small length of ESTs, further bioinformatics tools were applied to identify the homologs of interest but three peroxisomal proteins had to be excluded beforehand.
The number of false PTS peptides determines the minimum number of sequences required to judge a particular tri- or nonapeptide a functional PTS and was therefore estimated. Considering that the number of different C-terminal tripeptides is 8,000 and that the number of noncanonical tripeptides that can be created by single amino acid substitutions from canonical PTS1 is 448, it is not expected that 2 identical tripeptides can derive from canonical PTS1 by random point mutations within a total number of 39 different detected tripeptides and 23 sequences that contained a tripeptide found only once or twice. Therefore, those sequences containing unique targeting peptides are considered to represent the maximum number of false positives, which is relatively low (PTS1, 5%; PTS2, 0.6%). Because many of these unique peptides are either included in the conservative PTS1 motif defined by Hayashi et al. (1997)
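The two counts quoted above can be reproduced by brute-force enumeration. The sketch below assumes that "canonical PTS1" refers to the 24 tripeptides of the Hayashi motif [SACP][KR][LMI]> and that substitutions are counted at the amino acid level (any of the 19 other residues at one position), not at the codon level; under these assumptions the enumeration yields the stated values. The helper name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# The 24 canonical tripeptides of the Hayashi motif [SACP][KR][LMI]>.
canonical = {"".join(t) for t in product("SACP", "KR", "LMI")}


def single_substitution_neighbors(tripeptides):
    """Tripeptides reachable by one amino acid substitution, excluding the input set."""
    neighbors = set()
    for tri in tripeptides:
        for pos in range(3):
            for aa in AMINO_ACIDS:
                if aa != tri[pos]:
                    neighbors.add(tri[:pos] + aa + tri[pos + 1:])
    return neighbors - set(tripeptides)


total_tripeptides = len(AMINO_ACIDS) ** 3
noncanonical = single_substitution_neighbors(canonical)
print(total_tripeptides, len(canonical), len(noncanonical))  # 8000 24 448
```

The same number follows by hand: 16 × 2 × 3 = 96 variants at the first position, 4 × 18 × 3 = 216 at the second, and 4 × 2 × 17 = 136 at the third, which sum to 448.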
A certain degree of sequence variation of the PTS within an orthologous group is required to allow the identification of PTS peptides rather independently of accessory elements specific to particular orthologous groups and to reach high coverage of all PTS1/2 peptides present in higher plants despite the relatively low number of input proteins as compared to the large size of the estimated peroxisomal proteome (Fukao et al., 2002
The high number of homologous sequences identified, the low number of false positives, and the medium degree of sequence variability of the PTS within orthologous groups allowed a statistically significant analysis of the frequency of particular PTS tripeptides in the plant kingdom. Regarding first the PTS1 proteins, not expected and most important is the result that only a small specific subset of PTS1 tripeptides is widespread in plant proteins and seems to constitute functional PTS peptides. Nine PTS1 tripeptides ([SA][RK][LM]> without AKM> plus SRI> and PRL>) have been defined as major PTS, of which PRL> and SRI> had shown only weak targeting efficiencies in experimental studies (Hayashi et al., 1997
From a position-specific quantitative analysis of the amino acids, it can be concluded which amino acid residues are tolerated in PTS1 peptides. Overall, the eleven amino acids of the restrictive PTS1 motif were indeed present in functional PTS ([CASP][KR][ILM]>; Hayashi et al., 1997
The strikingly different abundance of PTS1 peptides in plant proteins is interpreted from an evolutionary point of view in the following way: (1) a strong selection pressure has optimized the primary structure of PTS1 tripeptides with the result that the abundance of specific peptides in nature differs drastically nowadays, and (2) the current abundance of specific peptides in plant proteins correlates roughly with their targeting efficiency. In line with experimental results, highly abundant peptides, defined as major PTS peptides in this study, can be regarded as peptides with strong peroxisomal targeting properties (Kragler et al., 1998
The region upstream of the PTS1 tripeptide was also characterized by conserved properties. According to our analysis, the PTS1 targeting domain with characteristic and diagnostic properties comprises about 12 to 15 amino acids, similar to recent results for fungi and mammals (Neuberger et al. 2003a
Taken together, conserved properties upstream of the PTS1 tripeptide represent important information to further increase the prediction accuracy and shed new light on previous experimental results. The negative charge present upstream of the PTS1 in
As for the PTS1, only a specific subset of PTS2 nonapeptides seems to be widespread in the plant kingdom and to constitute functional PTS. Two major (R[LI]x5HL) and nine minor PTS2 nonapeptides were defined. In support of the conservative variant of the plant PTS2 motif (R[LIQ]x5HL, Kato et al., 1998
The PTS2 RQx5HL is unusual in several aspects. The nonapeptide was present in all plant thiolase homologs and exclusively present in this orthologous group. Moreover, to the best of our knowledge, Q (position 2) is absent in all known PTS2-targeted proteins from mammals and fungi including thiolase homologs. The fact that no stable single point mutation [LITMAV]
Strikingly similar to PTS1-targeted proteins, the PTS2 targeting domain was characterized by a high content of basic residues accompanied by a lack of acidic residues, resulting in a basic pI and a high positive charge. In this case, the positive net charge was concentrated upstream of the nonapeptide and within the x5 region, whereas P was predominantly present downstream of the PTS2. In addition, the PTS2 domain was significantly enriched in three hydrophobic residues, namely A, L, and V, and mostly forms a short
Application of the major PTS peptides to genome screens is expected to lead to the identification of a large number of interesting novel proteins from plant peroxisomes. By contrast, those peptides that have been included in experimentally determined plant PTS motifs but not been defined as functional PTS in this study are not recommended for any genome screens. For minor PTS peptides, true positives are expected, but an elevated number of nonperoxisomal proteins needs to be anticipated as well. It needs to be pointed out that the indicative value of peroxisomal targeting differs significantly within the group of minor PTS peptides. The tripeptide PKL>, for instance, is relatively close to the empirical threshold for peptides defined as major PTS peptides and close to the frequency of the major PTS1 ARM>, whereas other minor PTS1 tripeptides were rare and only found in single orthologous groups (e.g. GOX, SML>; SOX, SNL>, ANL>, SNM>, SSM>). In this regard, unknown structural properties of SOX are suspected to be responsible for the unusual tolerance of these noncanonical PTS1 peptides as compared to other PTS1-targeted proteins and need to be investigated experimentally. Taken together, targeting to peroxisomes should only be predicted if a minor PTS is detected in at least two to three different orthologous groups and present in combination with other conserved properties of peroxisomal targeting domains (Table I). As a general rule of thumb, peroxisomal targeting of atypical putative PTS1 can be predicted in the following way: First, the higher the abundance of each single amino acid of the tripeptide of interest in all PTS1 proteins examined in this study, the higher the probability that this tripeptide targets a protein to plant peroxisomes. Second, a low-abundant amino acid is only found in a functional tripeptide in the presence of one to two high-abundant amino acids with presumably high targeting properties. 
To improve targeting prediction, the amino acids of the plant-specific motif are now given in order of abundance and deduced targeting strength (PTS1, [SAPC][RKNMS][LMIV]>). For PTS2 proteins, R (position 1), L (position 2, 9), H (position 8), and, to a lesser degree, I (position 2, 9) represent residues of presumably strong targeting properties; at least three of these residues need to be present to constitute a functional PTS2 nonapeptide (R[LIQTMAV]x5H[LIF]).
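The PTS2 rule stated above can be sketched as a small check. The sketch reads "at least three of these residues" as counting four slots, namely R at position 1, L or I at position 2, H at position 8, and L or I at position 9; this reading and the function names are assumptions made for the example. The permissive motif R[LIQTMAV]x5H[LIF] is taken from the text.

```python
import re

PERMISSIVE_PTS2 = re.compile(r"R[LIQTMAV].{5}H[LIF]")


def strong_residue_count(nonapeptide):
    """Count the slots occupied by residues of presumably strong targeting properties."""
    p = nonapeptide.upper()
    return (p[0] == "R") + (p[1] in "LI") + (p[7] == "H") + (p[8] in "LI")


def is_plausible_pts2(nonapeptide):
    """True if the nonapeptide fits the permissive motif and has at least three strong residues."""
    p = nonapeptide.upper()
    if len(p) != 9 or PERMISSIVE_PTS2.fullmatch(p) is None:
        return False
    return strong_residue_count(p) >= 3
```

Under this reading, RQx5HL and RLx5HF pass with three strong residues each, whereas a hypothetical RAx5HF fits the permissive motif but carries only two and is rejected.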
Before starting time-consuming experiments, confirmation of the postulated peroxisomal localization of unknown proteins by further bioinformatics analyses is recommended. Postulated targeting to plant peroxisomes can sometimes be supported by results provided by new subcellular prediction software for peroxisomes despite their lack of plant-specific algorithms (PeroxiP; Emanuelsson et al., 2003
The number of plant genomes, the number and size of EST collections, and, most important, the number of known proteins of plant peroxisomes will increase in the near future and affect the results provided in this study. Therefore, this definition of functional PTS peptides represents an intermediate result. Whereas the frequency and the strong indicative properties of major PTS are not expected to change significantly, new database entries will affect the frequency of minor and currently undetected PTS. The tripeptide CRM> found in a single sequence, for instance, was currently excluded but is expected to represent a rare but functional PTS1 (Kragler et al., 1998
In summary, the major and minor PTS deduced from this study are recommended for genome screens for novel peroxisomal matrix proteins. In the Arabidopsis genome, about 170 proteins with a major PTS and 110 with a minor PTS are detected (S. Reumann, C. Ma, S. Lemke, and L. Babujee, unpublished data). We are currently setting up a database of Arabidopsis proteins with a putative PTS1 or PTS2 that will be presented in a forthcoming article and publicly available in the near future.
Selection of the Arabidopsis Proteins Suitable for the Bioinformatics Analysis
All peroxisomal sequences that were cloned from Arabidopsis and encode PTS1/2-targeted proteins were retrieved. The Arabidopsis orthologs of PTS1/2-targeted proteins that were only cloned from other plant species were identified based on sequence similarity using blastp (http://www.ncbi.nlm.nih.gov/BLAST/; nonredundant database; E value threshold: 10; matrix BLOSUM62, gap costs: existence: 11, extension: 1; Supplemental Table SI). Proteins were used for subsequent bioinformatics analyses if they fulfilled the following criteria: (1) localization of one protein of an orthologous group to plant peroxisomes by biochemical studies using native proteins or by in vivo localization studies using fusion proteins; (2) identification of the PTS by redirection of the protein with deleted PTS to the cytosol as compared to the full-length protein; and (3) possible differentiation between homologs corresponding to peroxisomal isoforms or to isoforms from other cell compartments by sequence similarity. With respect to the enzyme CHY1 (Zolman et al., 2001
To identify the plant homologs of known PTS1-targeted proteins from various plant species, the entire amino acid sequence of the Arabidopsis orthologs was blasted against the nonredundant database and the database of ESTs of GenBank at the National Center for Biotechnology Information using blastp and tblastn, respectively (http://www.ncbi.nlm.nih.gov/; E value threshold: 10; matrix BLOSUM62, gap costs: existence: 11, extension: 1; Supplemental Table SI). All blast searches were updated in August and September 2003, when the nonredundant database contained about 1,500,000 sequences and the database dbEst at NCBI the following number of sequences (for different species of the same genus, only the species with the largest EST collection is listed): 200,000 EST clones were selected for further analysis if they met the following requirements: (1) sufficient size (mostly longer than 150 bp), (2) sufficient sequence quality (no undefined nucleotides "x" in the sequence) and no stop codon within the C-terminal 20 amino acids (PTS1 homologs) or the N-terminal 50 amino acids (PTS2), (3) homology throughout the entire domain, and (4) identification as the homolog corresponding to the peroxisomal isoforms based on sequence similarity or analysis by multiple sequence alignments. Some ESTs were excluded because an abrupt reduction of sequence conservation in the C-terminal region suggested an upstream sequencing error leading to a frame shift. For the selected ESTs, the nucleotide sequence was then translated (http://www.expasy.org/tools/dna.html). Regarding the homologs of PTS1-targeted proteins, EST sequences were only included if the peptide that aligned with the C terminus of the open reading frame was followed by a stop codon (about 5 amino acids up- or downstream of the PTS1 of the query). 
Regarding the homologs of PTS2-targeted proteins, EST sequences were only included if the peptide that aligned with the N terminus of the peroxisomal protein was preceded by an M (about 20 amino acids upstream of the PTS2 of the query). The amino acid sequence of the C-terminal 20 amino acids (PTS1 proteins) or that of the N-terminal 50 amino acids (PTS2 proteins) was saved for later analysis of the targeting domain for conserved properties. For plants with large EST collections, several homologous cDNAs or ESTs from the same plant species or the same genus were present in the database that differed only slightly (e.g. Sorghum bicolor and S. propinquum, Triticum aestivum and T. monococcum, Oryza sativa and O. minuta). In an attempt to distinguish sequences with sequencing errors from sequences of closely related genes, an empirical threshold was chosen. Regarding homologs of PTS1-targeted proteins, additional sequences were excluded unless the C-terminal 20 amino acids differed by at least three amino acids (<90% sequence identity) compared to another homologous sequence from the same plant species or genus. The N-terminal 20 amino acids of PTS2-targeted proteins were handled analogously. To facilitate detection of these ESTs, homologous ESTs of the plant species or genus of interest were often used as query. For peroxisomal proteins that are encoded by several genes in Arabidopsis (e.g. ACX1, HIBCH, thiolase, citrate synthase, multi-functional protein, etc.), the application of this empirical rule resulted in a number of homologous sequences in large EST collections that was close to the size of the gene family in Arabidopsis.
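The empirical redundancy threshold can be sketched as a greedy filter over the saved 20-residue domains. The function names and the input layout (a list of genus and domain pairs) are assumptions for this example; the threshold of at least three differing residues is taken from the text. The filter is order dependent, and the text does not say in which order sequences were compared.

```python
def hamming(a, b):
    """Number of differing positions between two equally long sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def filter_near_duplicates(domains, min_diff=3):
    """Keep a domain only if it differs by at least min_diff residues
    from every domain already kept for the same genus."""
    kept = []
    for genus, domain in domains:
        is_duplicate = any(
            kept_genus == genus and hamming(domain, kept_domain) < min_diff
            for kept_genus, kept_domain in kept
        )
        if not is_duplicate:
            kept.append((genus, domain))
    return kept
```

Two Oryza domains that differ at a single position would be collapsed into one entry, whereas the same pair from two different genera would both be retained.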
For analysis of amino acid composition, charge, and pI of the targeting domain, the sequences were grouped according to their PTS peptide. Groups of canonical PTS1 tripeptides containing fewer than five sequences (CKL>, CRL>, SKI>) and the noncanonical tripeptides (ANL>, SML>, SNL>, SNM>, SRV>, and SSM>) were combined and analyzed together. Similarly, groups of PTS2 nonapeptides containing fewer than five sequences (RLx5HF, RLx5HI, RIx5HI, RAx5HL, RAx5HI, and RVx5HL) were analyzed together. For PTS1 sequences, the C-terminal 18 amino acids were analyzed in groups of three amino acids and provided with position-specific numbers relative to the PTS1 tripeptide (PTS1, position 1 to 3; 3-mer preceding the PTS1, position 4 to 6). For PTS2 sequences, the residues of the nonapeptide were numbered according to the traditional system (PTS2, position 1 to 9) and residues in front of and behind the nonapeptide were provided with negative and positive numbers, respectively. The amino acid composition of PTS2 proteins was analyzed in two groups of three amino acids in front of the nonapeptide (position -6 to -4 and position -3 to -1), the first two conserved residues of the nonapeptide (Rx, position 1, 2), the five residues of the unspecific x5 region (position 3 to 7), the other two conserved residues of the nonapeptide (Hx, position 8, 9), as well as two groups of three amino acids behind the nonapeptide (position 10 to 12 and position 13 to 15). For these groups, the amino acid composition, the absolute content of basic and acidic residues, the charge (difference of basic and acidic residues), and the pI were calculated (http://us.expasy.org/tools/protparam.html) and mean values and SD were determined. Secondary structure was analyzed using PredictProtein (http://www.embl-heidelberg.de/predictprotein/predictprotein.html).
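The grouping by PTS1 tripeptide and the charge calculation (difference of basic and acidic residues) for the C-terminal 18 amino acids can be sketched as below. The function names are hypothetical, and counting only Lys and Arg as basic and Asp and Glu as acidic (His excluded) is an assumption about the convention; pI calculation is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BASIC = frozenset("KR")   # assumed basic residues (His not counted)
ACIDIC = frozenset("DE")  # assumed acidic residues


def net_charge(peptide: str) -> int:
    """Charge as the difference of basic and acidic residue counts."""
    basic = sum(res in BASIC for res in peptide)
    acidic = sum(res in ACIDIC for res in peptide)
    return basic - acidic


def charge_by_triplet(domain: str) -> List[Tuple[str, int]]:
    """Split the C-terminal 18 amino acids into six 3-mers and return
    (3-mer, charge) pairs, starting with the PTS1 tripeptide."""
    if len(domain) != 18:
        raise ValueError("expected the C-terminal 18 amino acids")
    triplets = [domain[i:i + 3] for i in range(0, 18, 3)]
    return [(t, net_charge(t)) for t in reversed(triplets)]


def group_by_pts1(domains: List[str]) -> Dict[str, List[str]]:
    """Group targeting domains by their C-terminal tripeptide."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for domain in domains:
        groups[domain[-3:]].append(domain)
    return dict(groups)
```

Mean values and SD per group would then be computed over the per-sequence charges, for example with `statistics.mean` and `statistics.stdev`.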
I thank Martin Fulda, Ivo Feussner, Hans-Walter Heldt, and the students of my group for stimulating discussions and critical comments on the manuscript with special thanks to Katharina Pawlowski. Received November 2, 2003; returned for revision January 22, 2004; accepted January 22, 2004.
1 This work was supported by the Deutsche Forschungsgemeinschaft (Re1304/21).
[w] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.103.035584. * E-mail sreuman{at}gwdg.de; fax 49551395749.
Amery L, Brees C, Baes M, Setoyama C, Miura R, Mannaerts GP, Van Veldhoven PP (1998) C-terminal tripeptide Ser-Asn-Leu (SNL) of human D-aspartate oxidase is a functional peroxisome-targeting signal. Biochem J 336: 367-371
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
Cutler SR, Ehrhardt DW, Griffitts JS, Somerville CR (2000) Random GFP::cDNA fusions enable visualization of subcellular structures in cells of Arabidopsis at a high frequency. Proc Natl Acad Sci USA 97: 3718-3723
Dammai V, Subramani S (2001) The human peroxisomal targeting signal receptor, Pex5p, is translocated into the peroxisomal matrix and recycled to the cytosol. Cell 105: 187-196
de Hoop MJ, Ab G (1992) Import of proteins into peroxisomes and other microbodies. Biochem J 286: 657-669
del Rio LA, Corpas FJ, Sandalio LM, Palma JM, Gomez M, Barroso JB (2002) Reactive oxygen species, antioxidant systems and nitric oxide in peroxisomes. J Exp Bot 53: 1255-1272
Dyer JM, McNew JA, Goodman JM (1996) The sorting sequence of the peroxisomal integral membrane protein PMP47 is contained within a short hydrophilic loop. J Cell Biol 133: 269-280
Eilers T, Schwarz G, Brinkmann H, Witt C, Richter T, Nieder J, Koch B, Hille R, Hansch R, Mendel RR (2001) Identification and biochemical characterization of Arabidopsis thaliana sulfite oxidase. A new player in plant sulfur metabolism. J Biol Chem 276: 46989-46994
Elgersma Y, Vos A, van den Berg M, van Roermund CW, van der Sluijs P, Distel B, Tabak HF (1996) Analysis of the carboxyl-terminal peroxisomal targeting signal 1 in a homologous context in Saccharomyces cerevisiae. J Biol Chem 271: 26375-26382
Emanuelsson O, Elofsson A, von Heijne G, Cristobal S (2003) In silico prediction of the peroxisomal proteome in fungi, plants and animals. J Mol Biol 330: 443-456
Flynn CR, Mullen RT, Trelease RN (1998) Mutational analyses of a type 2 peroxisomal targeting signal that is capable of directing oligomeric protein import into tobacco BY-2 glyoxysomes. Plant J 16: 709-720
Froman BE, Edwards PC, Bursch AG, Dehesh K (2000) ACX3, a novel medium-chain acyl-coenzyme A oxidase from Arabidopsis. Plant Physiol 123: 733-742
Fukao Y, Hayashi M, Hara-Nishimura I, Nishimura M (2003) Novel glyoxysomal protein kinase, GPK1, identified by proteomic analysis of glyoxysomes in etiolated cotyledons of Arabidopsis thaliana. Plant Cell Physiol 44: 1002-1012
Fukao Y, Hayashi M, Nishimura M (2002) Proteomic analysis of leaf peroxisomal proteins in greening cotyledons of Arabidopsis thaliana. Plant Cell Physiol 43: 689-696
Fulda M, Shockey J, Werber M, Wolter FP, Heinz E (2002) Two long-chain acyl-CoA synthetases from Arabidopsis thaliana involved in peroxisomal fatty acid beta-oxidation. Plant J 32: 93-103
Gatto GJ Jr, Maynard EL, Guerrerio AL, Geisbrecht BV, Gould SJ, Berg JM (2003) Correlating structure and affinity for PEX5:PTS1 complexes. Biochemistry 42: 1660-1666
Gietl C, Faber KN, van der Klei IJ, Veenhuis M (1994) Mutational analysis of the N-terminal topogenic signal of watermelon glyoxysomal malate dehydrogenase using the heterologous host Hansenula polymorpha. Proc Natl Acad Sci USA 91: 3151-3155
Glover JR, Andrews DW, Subramani S, Rachubinski RA (1994) Mutagenesis of the amino targeting signal of Saccharomyces cerevisiae 3-ketoacyl-CoA thiolase reveals conserved amino acids required for import into peroxisomes in vivo. J Biol Chem 269: 7558-7563
Gould SJ, Keller GA, Hosken N, Wilkinson J, Subramani S (1989) A conserved tripeptide sorts proteins to peroxisomes. J Cell Biol 108: 1657-1664
Gould SG, Keller GA, Subramani S (1987) Identification of a peroxisomal targeting signal at the carboxy terminus of firefly luciferase. J Cell Biol 105: 2923-2931
Hansen H, Didion T, Thiemann A, Veenhuis M, Roggenkamp R (1992) Targeting sequences of the two major peroxisomal proteins in the methylotrophic yeast Hansenula polymorpha. Mol Gen Genet 235: 269-278
Hayashi M, Aoki M, Kato A, Kondo M, Nishimura M (1996a) Transport of chimeric proteins that contain a carboxy-terminal targeting signal into plant microbodies. Plant J 10: 225-234
Hayashi M, Aoki M, Kato A, Nishimura M (1997) Changes in targeting efficiencies of proteins to plant microbodies caused by amino acid substitutions in the carboxyl-terminal tripeptide. Plant Cell Physiol 38: 759-768
Hayashi H, De Bellis L, Ciurli A, Kondo M, Hayashi M, Nishimura M (1999) A novel acyl-CoA oxidase that can oxidize short-chain acyl-CoA in plant peroxisomes. J Biol Chem 274: 12715-12721
Hayashi H, De Bellis L, Hayashi Y, Nito K, Kato A, Hayashi M, Hara-Nishimura I, Nishimura M (2002) Molecular characterization of an Arabidopsis acyl-coenzyme A synthetase localized on glyoxysomal membranes. Plant Physiol 130: 2019-2026
Hayashi M, Toriyama K, Kondo M, Nishimura M (1998) 2,4-Dichlorophenoxybutyric acid-resistant mutants of Arabidopsis have defects in glyoxysomal fatty acid beta-oxidation. Plant Cell 10: 183-195
Hayashi M, Tsugeki R, Kondo M, Mori H, Nishimura M (1996b) Pumpkin hydroxypyruvate reductases with and without a putative C-terminal signal for targeting to microbodies may be produced by alternative splicing. Plant Mol Biol 30: 183-189
Hooks MA, Kellas F, Graham IA (1999) Long-chain acyl-CoA oxidases of Arabidopsis. Plant J 20: 1-13
Igarashi D, Miwa T, Seki M, Kobayashi M, Kato T, Tabata S, Shinozaki K, Ohsumi C (2003) Identification of photorespiratory glutamate:glyoxylate aminotransferase (GGAT) gene in Arabidopsis. Plant J 33: 975-987
Jones JM, Morrel JC, Gould SJ (2001) Multiple distinct targeting signals in integral peroxisomal membrane proteins. J Cell Biol 153: 1141-1150
Kamigaki A, Mano S, Terauchi K, Nishi Y, Tachibe-Kinoshita Y, Nito K, Kondo M, Hayashi M, Nishimura M, Esaka M (2003) Identification of peroxisomal targeting signal of pumpkin catalase and the binding analysis with PTS1 receptor. Plant J 33: 161-175
Karpichev IV, Small GM (2000) Evidence for a novel pathway for the targeting of a Saccharomyces cerevisiae peroxisomal protein belonging to the isomerase/hydratase family. J Cell Sci 113: 533-544
Kato A, Hayashi M, Kondo M, Nishimura M (1996) Targeting and processing of a chimeric protein with the N-terminal presequence of the precursor to glyoxysomal citrate synthase. Plant Cell 8: 1601-1611
Kato A, Takeda-Yoshikawa Y, Hayashi M, Kondo M, Hara-Nishimura I, Nishimura M (1998) Glyoxysomal malate dehydrogenase in pumpkin: cloning of a cDNA and functional analysis of its presequence. Plant Cell Physiol 39: 186-195
Klein AT, van den Berg M, Bottger G, Tabak HF, Distel B (2002) Saccharomyces cerevisiae acyl-CoA oxidase follows a novel, non-PTS1, import pathway into peroxisomes that is dependent on Pex5p. J Biol Chem 277: 25011-25019
Kliebenstein DJ, Monde RA, Last RL (1998) Superoxide dismutase in Arabidopsis: an eclectic enzyme family with disparate regulation and protein localization. Plant Physiol 118: 637-650
Kragler F, Lametschwandtner G, Christmann J, Hartig A, Harada JJ (1998) Identification and analysis of the plant peroxisomal targeting signal 1 receptor NtPEX5. Proc Natl Acad Sci USA 95: 13336-13341
Kunze M, Kragler F, Binder M, Hartig A, Gurvitz A (2002) Targeting of malate synthase 1 to the peroxisomes of Saccharomyces cerevisiae cells depends on growth on oleic acid medium. Eur J Biochem 269: 915-922
Lametschwandtner G, Brocard C, Fransen M, Van Veldhoven P, Berger J, Hartig A (1998) The difference in recognition of terminal tripeptides as peroxisomal targeting signal 1 between yeast and human is due to different affinities of their receptor Pex5p to the cognate signal and to residues adjacent to it. J Biol Chem 273: 33635-33643
Lazarow PB, Fujiki Y (1985) Biogenesis of peroxisomes. Annu Rev Cell Biol 1: 489-530
Liepman AH, Olsen LJ (2001) Peroxisomal alanine:glyoxylate aminotransferase (AGT1) is a photorespiratory enzyme with multiple substrates in Arabidopsis thaliana. Plant J 25: 487-498
Liepman AH, Olsen LJ (2003) Alanine aminotransferase homologs catalyze the glutamate:glyoxylate aminotransferase reaction in peroxisomes of Arabidopsis. Plant Physiol 131: 215-227
Lopez-Huertas E, Charlton WL, Johnson B, Graham IA, Baker A (2000) Stress induces peroxisome biogenesis genes. EMBO J 19: 6770-6777
Marzioch M, Erdmann R, Veenhuis M, Kunau WH (1994) PAS7 encodes a novel yeast member of the WD-40 protein family essential for import of 3-oxoacyl-CoA thiolase, a PTS2-containing protein, into peroxisomes. EMBO J 13: 4908-4918
Mekhedov S, de Ilarduya OM, Ohlrogge J (2000) Toward a functional catalog of the plant genome. A survey of genes for lipid biosynthesis. Plant Physiol 122: 389-402
Mullen RT, Lee MS, Flynn CR, Trelease RN (1997a) Diverse amino acid residues function within the type 1 peroxisomal targeting signal. Implications for the role of accessory residues upstream of the type 1 peroxisomal targeting signal. Plant Physiol 115: 881-889
Mullen RT, Lee MS, Trelease RN (1997b) Identification of the peroxisomal targeting signal for cottonseed catalase. Plant J 12: 313-322
Murphy MA, Phillipson BA, Baker A, Mullen RT (2003) Characterization of the targeting signal of the Arabidopsis 22-kD integral peroxisomal membrane protein. Plant Physiol 133: 813-828
Nakamura T, Meyer C, Sano H (2002) Molecular cloning and characterization of plant genes encoding novel peroxisomal molybdoenzymes of the sulphite oxidase family. J Exp Bot 53: 1833-1836
Nakamura T, Yokota S, Muramoto Y, Tsutsui K, Oguri Y, Fukui K, Takabe T (1997) Expression of a betaine aldehyde dehydrogenase gene in rice, a glycinebetaine nonaccumulator, and possible localization of its protein in peroxisomes. Plant J 11: 1115-1120
Neuberger G, Maurer-Stroh S, Eisenhaber B, Hartig A, Eisenhaber F (2003a) Motif refinement of the peroxisomal targeting signal 1 and evaluation of taxon-specific differences. J Mol Biol 328: 567-579
Neuberger G, Maurer-Stroh S, Eisenhaber B, Hartig A, Eisenhaber F (2003b) Prediction of peroxisomal targeting signal 1 containing proteins from amino acid sequence. J Mol Biol 328: 581-592
Osumi T, Tsukamoto T, Hata S (1992) Signal peptide for peroxisomal targeting: replacement of an essential histidine residue by certain amino acids converts the amino-terminal presequence of peroxisomal 3-ketoacyl-CoA thiolase to a mitochondrial signal peptide. Biochem Biophys Res Commun 186: 811-818
Osumi T, Tsukamoto T, Hata S, Yokota S, Miura S, Fujiki Y, Hijikata M, Miyazawa S, Hashimoto T (1991) Amino-terminal presequence of the precursor of peroxisomal 3-ketoacyl-CoA thiolase is a cleavable signal peptide for peroxisomal targeting. Biochem Biophys Res Commun 181: 947-954
Pause B, Saffrich R, Hunziker A, Ansorge W, Just WW (2000) Targeting of the 22 kDa integral peroxisomal membrane protein. FEBS Lett 471: 23-28
Rehling P, Marzioch M, Niesen F, Wittke E, Veenhuis M, Kunau WH (1996) The import receptor for the peroxisomal targeting signal 2 (PTS2) in Saccharomyces cerevisiae is encoded by the PAS7 gene. EMBO J 15: 2901-2913
Reumann S (2002) The photorespiratory pathway of leaf peroxisomes. In A Baker, IA Graham, eds, Plant Peroxisomes: Biochemistry, Cell Biology and Biotechnological Applications, Ed 1. Kluwer Academic Publishers, Dordrecht, the Netherlands, pp 141-189
Sacksteder KA, Jones JM, South ST, Li X, Liu Y, Gould SJ (2000) PEX19 binds multiple peroxisomal membrane proteins, is predominantly cytoplasmic, and is required for peroxisome membrane synthesis. J Cell Biol 148: 931-944
Sanders PM, Lee PY, Biesgen C, Boone JD, Beals TP, Weiler EW, Goldberg RB (2000) The Arabidopsis DELAYED DEHISCENCE1 gene encodes an enzyme in the jasmonic acid synthesis pathway. Plant Cell 12: 1041-1061
Schwartz BW, Sloan JS, Becker WM (1991) Characterization of genes encoding hydroxypyruvate reductase in cucumber. Plant Mol Biol 17: 941-947
Shockey JM, Fulda MS, Browse JA (2002) Arabidopsis contains nine long-chain acyl-coenzyme A synthetase genes that participate in fatty acid and glycerolipid metabolism. Plant Physiol 129: 1710-1722
Sommer JM, Cheng QL, Keller GA, Wang CC (1992) In vivo import of firefly luciferase into the glycosomes of Trypanosoma brucei and mutational analysis of the C-terminal targeting signal. Mol Biol Cell 3: 749-759
Stintzi A, Browse J (2000) The Arabidopsis male-sterile mutant, opr3, lacks the 12-oxophytodienoic acid reductase required for jasmonate synthesis. Proc Natl Acad Sci USA 97: 10625-10630
Strassner J, Schaller F, Frick UB, Howe GA, Weiler EW, Amrhein N, Macheroux P, Schaller A (2002) Characterization and cDNA-microarray expression analysis of 12-oxophytodienoate reductases reveals differential roles for octadecanoid biosynthesis in the local versus the systemic wound response. Plant J 32: 585-601
Subramani S (1993) Protein import into peroxisomes and biogenesis of the organelle. Annu Rev Cell Biol 9: 445-478
Swinkels BW, Gould SJ, Bodnar AG, Rachubinski RA, Subramani S (1991) A novel, cleavable peroxisomal targeting signal at the amino-terminus of the rat 3-ketoacyl-CoA thiolase. EMBO J 10: 3255-3262
Swinkels BW, Gould SJ, Subramani S (1992) Targeting efficiencies of various permutations of the consensus C-terminal tripeptide peroxisomal targeting signal. FEBS Lett 305: 133-136
Van der Leij I, Franse MM, Elgersma Y, Distel B, Tabak HF (1993) PAS10 is a tetratricopeptide-repeat protein that is essential for the import of most matrix proteins into peroxisomes of Saccharomyces cerevisiae. Proc Natl Acad Sci USA 90: 11782-11786
Wimmer B, Lottspeich F, van der Klei I, Veenhuis M, Gietl C (1997) The glyoxysomal and plastid molecular chaperones (70-kDa heat shock protein) of watermelon cotyledons are encoded by a single gene. Proc Natl Acad Sci USA 94: 13624-13629
Wimmer C, Schmid M, Veenhuis M, Gietl C (1998) The plant PTS1 receptor: similarities and differences to its human and yeast counterparts. Plant J 16: 453-464
Zolman BK, Monroe-Augustus M, Thompson B, Hawes JW, Krukenberg KA, Matsuda SPT, Bartel B (2001) chy1, an Arabidopsis mutant with impaired