First published online April 22, 2005; 10.1104/pp.104.056069
Plant Physiology 138:490-515 (2005)
© 2005 American Society of Plant Biologists
GENETICS, GENOMICS, AND MOLECULAR EVOLUTION
Genome-Based Examination of Chlorophyll and Carotenoid Biosynthesis in Chlamydomonas reinhardtii
Martin Lohr2,*,
Chung-Soon Im2 and
Arthur R. Grossman
Institut für Allgemeine Botanik, Johannes Gutenberg-Universität, 55099 Mainz, Germany (M.L.); and The Carnegie Institution, Department of Plant Biology, Stanford, California 94305 (C.-S.I., A.R.G.)
ABSTRACT
The unicellular green alga Chlamydomonas reinhardtii is a particularly important model organism for the study of photosynthesis since this alga can grow heterotrophically, and mutants in photosynthesis are therefore conditional rather than lethal. The recently developed tools for genomic analyses of this organism have allowed us to identify most of the genes required for chlorophyll and carotenoid biosynthesis and to examine their phylogenetic relationships with homologous genes from vascular plants, other algae, and cyanobacteria. Comparative genome analyses revealed some intriguing features associated with pigment biosynthesis in C. reinhardtii; in some cases, there are additional conserved domains in the algal and plant but not the cyanobacterial proteins that may directly influence their activity, assembly, or regulation. For some steps in the chlorophyll biosynthetic pathway, we found multiple gene copies encoding putative isozymes. Phylogenetic studies, theoretical evaluation of gene expression through analysis of expressed sequence tag data and codon bias of each gene, enabled us to generate hypotheses concerning the function and regulation of the individual genes, and to propose targets for future research. We have also used quantitative polymerase chain reaction to examine the effect of low fluence light on the level of mRNA accumulation encoding key proteins of the biosynthetic pathways and examined differential expression of those genes encoding isozymes that function in the pathways. This work is directing us toward the exploration of the role of specific photoreceptors in the biosynthesis of pigments and the coordination of pigment biosynthesis with the synthesis of proteins of the photosynthetic apparatus.
Over the past several decades, the unicellular green alga Chlamydomonas reinhardtii has been an outstanding system for dissecting the function of various proteins involved in photosynthesis (Grossman, 2000 ; Harris, 2001 ; Rochaix, 2002 ). The ability of this alga to grow heterotrophically in the dark by metabolizing exogenous acetate has made it relatively easy to isolate a broad range of C. reinhardtii mutants that adversely affect photosynthetic function (Levine, 1969 ). Mutants defective for photosynthesis are readily analyzed at the genetic level as this organism has a relatively simple and short life cycle (Quarmby, 1994 ). Furthermore, a variety of physiological, biochemical, genetic, and molecular tools have been applied to studies of C. reinhardtii, making it an ideal model system for elucidating biological processes (for review, see Grossman, 2000 ; Harris, 2001 ; Rochaix, 2002 ; Grossman et al., 2004 ).
Recently, considerable progress has been made in the genomic analysis of C. reinhardtii. The generation of extensive cDNA information (http://www.chlamy.org/search.html; Shrager et al., 2003) and a draft genome sequence (http://genome.jgi-psf.org/chlre2) is enabling researchers to learn more about the gene content of the C. reinhardtii genome and about the structure and expression of those genes. Furthermore, genome-based approaches have recently been applied to C. reinhardtii to elucidate the dynamics of the photosynthetic apparatus in response to nutrient and light conditions (Simpson and Stern, 2002; Grossman et al., 2003; Im et al., 2003; Shrager et al., 2003; Zhang et al., 2004; Y. Wang, Z. Sun, M.H. Horken, C.S. Im, Y. Xiang, A.R. Grossman, and D.P. Weeks, unpublished data).
Areas of interest with respect to light utilization in plants have focused on the involvement of pigments in both photosynthetic processes and the sensing and control of cellular processes through environmental light signals. Chlorophyll (Chl) and carotenoids are ubiquitous among photosynthetic organisms and play important roles in the function of the photosynthetic apparatus, in the management of excitation energy, and in the integration of photosynthetic function and biogenesis of the photosynthetic membranes with the regulation of other cellular processes. Both Chl and carotenoid molecules bind to proteins integral to the photosynthetic machinery, where they absorb light energy that is ultimately converted into chemical bond energy (in the form of sugars) and also function in efficiently managing the use of excitation energy. Carotenoids also participate in redox reactions (Tracewell et al., 2001; Frank and Brudvig, 2004), in the protection of organisms from photodamage by quenching singlet oxygen and triplet Chl species (Siefermann-Harms, 1987; Frank and Cogdell, 1993; Yamamoto and Bassi, 1996; Formaggio et al., 2001; Baroli et al., 2003), and in the dissipation of excess absorbed light energy via interactions with singlet excited Chl molecules (Demmig-Adams, 1990; Demmig-Adams et al., 1996; Yamamoto and Bassi, 1996; Niyogi, 1999; Baroli and Niyogi, 2000; Pogson and Rissler, 2000; Ma et al., 2003). Carotenoids may even help stabilize membrane structure (Havaux and Niyogi, 1999).
Interestingly, intermediates in the Chl biosynthetic pathway may serve as signaling molecules that communicate the status of the pathway to the transcriptional machinery in the nucleus of the cell, thereby regulating levels of proteins that require Chl for their function (such as light-harvesting Chl-binding proteins; Johanningmeier and Howell, 1984 ; Johanningmeier, 1988 ; Kropat et al., 1997 ; Strand et al., 2003 ), and it appears that the biosynthesis of Chl is intimately linked to the presence and/or synthesis of the light-harvesting complex (LHC) polypeptides (Xu et al., 2001 ). It is likely that Chl and carotenoid biosynthesis are precisely controlled to meet the demands of growing cells under a range of light conditions, and because intermediates in the former pathway are unstable and photoreactive, the accumulation of some intermediates in Chl biosynthesis can elicit the formation of damaging, reactive oxygen species. Although the synthesis of both Chl and carotenoids occurs within chloroplasts, in vascular plants all of the enzymes of the pathway are encoded by nuclear genes and are synthesized in the cytoplasm of the cell as precursor polypeptides with amino-terminal extensions (transit peptides) that enable them to pass through the double membrane of the chloroplast envelope and to their site of function within the organelle.
Chl is a cyclic tetrapyrrole with a centrally coordinated Mg2+ ion. The synthesis of Chl in plants and algae proceeds along the C5 pathway, in which the first dedicated precursor of the pathway, 5-aminolevulinic acid (ALA), is synthesized from a Glu molecule (Fig. 1). Two molecules of ALA are then condensed to form porphobilinogen, and four porphobilinogen molecules are joined to form the first linear tetrapyrrole of the pathway, hydroxymethylbilane. The hydroxymethylbilane is then cyclized, followed by decarboxylation and oxidation reactions to form protoporphyrin IX. Mg2+ is inserted into the protoporphyrin IX molecule, and the resulting Mg2+ protoporphyrin IX molecule is methylated, followed by a cyclization reaction that forms the cyclopentanone ring and sequential reduction steps to form chlorophyllide a. The reduction of protochlorophyllide to chlorophyllide can be catalyzed by two different enzymes: the nucleus-encoded, strictly light-dependent protochlorophyllide oxidoreductase (LPOR), common to all photosynthetic eukaryotes and cyanobacteria, or a light-independent (dark-active) enzyme complex (DPOR) that is not present in angiosperms. The latter is composed of three subunits (ChlB, ChlL, and ChlN) that are encoded by the plastid genome. Phytylation of chlorophyllide a yields Chl a, while oxidation of chlorophyllide a could yield chlorophyllide b followed by phytylation to form Chl b. This pathway and its regulation have been reviewed recently (Reinbothe et al., 1996; Suzuki et al., 1997; Beale, 1999; Vavilin and Vermaas, 2002; Cornah et al., 2003; Grossman et al., 2004).
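The precursor stoichiometry implied by the early steps of the pathway can be tallied in a few lines. The following is only an illustrative sketch; the constant and function names are hypothetical and chosen for this example, and the ratios are taken directly from the description above (one Glu per ALA, two ALA per porphobilinogen, four porphobilinogen per hydroxymethylbilane).

```python
# Ratios stated in the text for the C5 pathway (names are illustrative).
GLU_PER_ALA = 1   # each ALA is synthesized from one Glu molecule
ALA_PER_PBG = 2   # two ALA are condensed to form porphobilinogen
PBG_PER_HMB = 4   # four porphobilinogen form hydroxymethylbilane


def ala_per_tetrapyrrole() -> int:
    """Number of ALA molecules consumed per linear tetrapyrrole."""
    return ALA_PER_PBG * PBG_PER_HMB


def glu_per_tetrapyrrole() -> int:
    """Number of Glu molecules consumed per linear tetrapyrrole."""
    return GLU_PER_ALA * ala_per_tetrapyrrole()


print(ala_per_tetrapyrrole(), glu_per_tetrapyrrole())  # prints "8 8"
```

Under these ratios, each tetrapyrrole ring that eventually becomes Chl draws on eight Glu molecules.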

Figure 1. The Chl biosynthetic pathway. The names of the two major chlorophylls present in C. reinhardtii are in bold. Full enzyme names are as follows: GTS, glutamyl-tRNA synthetase; GTR, glutamyl-tRNA reductase; GSA, Glu-1-semialdehyde aminotransferase; ALAD, ALA dehydratase; PBGD, porphobilinogen deaminase; UROS, uroporphyrinogen III synthase; UROD, uroporphyrinogen III decarboxylase; CPX, coproporphyrinogen III oxidase; PPX, protoporphyrinogen IX oxidase; CHLD, protoporphyrin IX Mg-chelatase subunit D; CHLI, protoporphyrin IX Mg-chelatase subunit I; CHLH, protoporphyrin IX Mg-chelatase subunit H; PPMT, Mg-protoporphyrin IX methyltransferase; CHL27, Mg-protoporphyrin IX monomethylester cyclase subunit; DCR, divinyl protochlorophyllide reductase; LPOR, NADPH:protochlorophyllide oxidoreductase; CAO, chlorophyllide a oxygenase; CHS, Chl synthase; UPM, uroporphyrinogen III methyltransferase; and Fe-Chel, ferrochelatase. In the case of multiple commonly used synonyms, see Grossman et al. (2004) ; abbreviations indicative of enzymatic function were preferred. Throughout the text, the same abbreviations, in capital letters, are used for protein and gene designations, the latter being italicized. For chemical structures of intermediates in the pathway, see Beale (1999) or Vavilin and Vermaas (2002) .
The carotenoids are isoprenoids that belong to the tetraterpenoid group. Their basic structure is a C40 backbone containing a network of conjugated double bonds that form an extended π-electron system; this accounts for the ability of these molecules to absorb in both the UV and visible region of the light spectrum. Carotenoids that consist exclusively of hydrogen and carbon atoms are collectively termed carotenes. However, most naturally occurring carotenoids are oxygenated at one or more positions, placing them into the xanthophyll subgroup, which has been associated with managing the utilization of light energy in plants and algae (Demmig-Adams, 1990; Niyogi, 1999).
The biosynthesis of carotenoids (Fig. 2) starts with the formation of isopentenyl-diphosphate, the general precursor of all isoprenoids. In vascular plants and green algae, isopentenyl-diphosphate used for carotenogenesis is synthesized exclusively in the plastid by the recently discovered methylerythritol phosphate (MEP) pathway (Lichtenthaler, 1999; Rodriguez-Concepcion and Boronat, 2002; Rohmer, 2003). The first carotenoid, phytoene, results from the bonding of two C20 molecules, each derived from the condensation of four C5-isoprenoid units, to build the symmetrical C40 backbone. This is followed by extension of the π-electron system through sequential desaturation steps and cyclization of the ends of the molecule to generate carotenes. Finally, the introduction of oxygen groups onto the molecule generates xanthophylls. Details of carotenoid biosynthesis have been the subject of several recent reviews (Cunningham and Gantt, 1998; Hirschberg, 2001; Grossman et al., 2004).

Figure 2. The carotenoid biosynthetic pathway. The names of the major carotenoids present in C. reinhardtii are in bold. Pigments in brackets have not yet been observed in C. reinhardtii. Full enzyme names are as follows: DXS, 1-deoxy-D-xylulose-5-phosphate synthase; DXR, 1-deoxy-D-xylulose-5-phosphate reductoisomerase; CMS, 4-diphosphocytidyl-2-C-methyl-D-erythritol synthase; CMK, 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase; MCS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, 1-hydroxy-2-methyl-2-(E)-butenyl-4-diphosphate synthase; IDS, isopentenyl diphosphate:dimethylallyl diphosphate synthase; IDI, isopentenyl diphosphate:dimethylallyl diphosphate isomerase; GGPS, geranylgeranyl pyrophosphate synthase; PSY, phytoene synthase; PDS, phytoene desaturase; ZDS, ζ-carotene desaturase; CRTISO, carotenoid isomerase; LCYB, lycopene β-cyclase; LCYE, lycopene ε-cyclase; CHYB, carotene β-hydroxylase; CHYE, carotene ε-hydroxylase; ZEP, zeaxanthin epoxidase; VDE, violaxanthin deepoxidase; NSY, neoxanthin synthase; LSY, loroxanthin synthase; BKT, carotene β-ketolase; and GGR, geranylgeranyl reductase. In the case of multiple commonly used synonyms, see Grossman et al. (2004); abbreviations indicative of enzymatic function were preferred. Throughout the text, the same abbreviations, in capital letters, are used for protein and gene designations, the latter being italicized. For chemical structures of intermediates in the MEP pathway, see Rodriguez-Concepcion and Boronat (2002), and for chemical structures of later intermediates, see Cunningham and Gantt (1998) or Hirschberg (2001).
Whole-genome information is being generated for a number of photosynthetic eukaryotes (Arabidopsis Genome Initiative, 2000; Yu et al., 2002; Goff et al., 2002; Armbrust et al., 2004; Matsuzaki et al., 2004), and a nearly completed genome sequence (http://genome.jgi-psf.org/chlre2) as well as a wealth of cDNA information are available for C. reinhardtii. In this article, we exploit this genomic information to define the different genes encoding enzymes involved in Chl and carotenoid biosynthesis in C. reinhardtii, focusing on the relationship of the predicted protein sequences of this alga to those of Arabidopsis (Arabidopsis thaliana) and Synechocystis PCC 6803. The analyses are specifically restricted to nuclear genes that encode proteins with enzymatic activity in the biosynthetic pathways leading to the formation of Chl and carotenoids. We have learned about the structure of these genes and aspects of protein function based on comparisons of the deduced amino acid sequences, analyzed the encoded proteins for the presence of organellar targeting presequences, and identified different potential isozymes associated with specific reactions in the biosynthetic pathways. In addition, we have examined codon usage of the different genes, accumulation of the mRNAs derived from the different isogenes, and the influence of light on their expression levels. These analyses have enabled us to generate hypotheses concerning the function and regulation of proteins involved in the biosynthesis of both Chl and carotenoids.
RESULTS AND DISCUSSION
General Comparison of Chl and Carotenoid Biosynthetic Genes from C. reinhardtii with Similar Genes from Arabidopsis and Synechocystis PCC 6803
The genes predicted to encode most of the polypeptides known to be directly involved in the biosynthesis of Chl and carotenoids in vascular plants were identified in the current version (assembly v2.0) of the C. reinhardtii genome and GenBank expressed sequence tag (EST) entries as of August 2004. Features of these genes have been compiled and are summarized in Table I (Chl genes) and Table II (carotenoid genes). These tables are intended to provide readers with a summary of the information with respect to genes encoding the enzymes of the Chl and carotenoid biosynthetic pathways and to serve as a resource to use for more in-depth analyses/experimentation. As indicated in the tables, some genes still contain gaps and/or have only partial cDNA coverage. Also, a number of gene models predicted from analysis of the genomic DNA sequence are incorrect, partly a consequence of sequence gaps, but also caused by noncanonical intron borders; often the correct mature transcript sequence can be inferred from available cDNA information. All gene models that we recognized as flawed are italicized in Tables I and II. Specific information on incorrect model prediction is included in the manual annotation of the respective gene models on the Joint Genome Institute (JGI) genome browser (http://genome.jgi-psf.org/chlre2/chlre2.home.html). Furthermore, we have performed additional cDNA sequencing for some of these genes to clarify or add needed sequence information (Tables I and II; see also "Materials and Methods").
Table I. Comparison of Chl biosynthetic genes of C. reinhardtii, the vascular plant Arabidopsis, and the cyanobacterium Synechocystis PCC 6803
C. reinhardtii genes encoding putative Chl biosynthetic enzymes (see Fig. 1 for full names of gene products) were analyzed for completeness and cDNA coverage, results being indicated by the following abbreviations: N, not available; P, partial; C, complete; C, complete cDNA sequences which were generated for this publication (see "Materials and Methods" for accession numbers); n.h., no homolog; n.i., not yet identified; and n.p., not present. Gene models and sequence lengths recognized as incomplete or erroneous are italicized, as are those data that are biased by this circumstance and therefore are preliminary; gene models and additional information can be found at http://genome.jgi-psf.org/chlre2/chlre2.home.html (use model number as search term under "Advanced Search"). For comparative analyses, homologous protein sequences of C. reinhardtii, Arabidopsis, and Synechocystis PCC 6803 were aligned with ClustalW, and the alignments corrected where necessary. Length of putative presequences was determined as N-terminal extension of C. reinhardtii proteins as compared to cyanobacterial homologs. Then, nonconserved ends at the N terminus (presequences) and C terminus of proteins were clipped to yield alignments containing only the putative functional cores. From these truncated alignments, the percentage of positions with identical (ident.) or identical plus similar (simil.) amino acid positions were calculated using the BioEdit software. Targeting prediction was done with the software tools TargetP (TarP), iPSORT (iPS), and Predotar (Pred), results being indicated by the following abbreviations: M, mitochondrial; P, plastid; and n, no targeting signal predicted.
Table II. Comparison of carotenoid biosynthetic genes of C. reinhardtii, the vascular plant Arabidopsis, and the cyanobacterium Synechocystis PCC 6803
Data for putative carotenoid biosynthetic genes (see Fig. 2 for full names of gene products) were compiled, analyzed, and presented as described for the Chl biosynthetic genes in the legend of Table I. (Note that abbreviations are the same as in Table I).
Alignments of the predicted amino acid sequences from the homologous Chl and carotenoid biosynthesis genes from C. reinhardtii, the vascular plant Arabidopsis, and the cyanobacterium Synechocystis PCC 6803 were constructed and compared with respect to the lengths of the encoded proteins, their degree of conservation (expressed as percent identity/similarity), and the number of shared introns for the eukaryotic sequences. The presence of putative targeting presequences and additional conserved domains exclusively present in eukaryotic homologs were also investigated, with results of the analyses summarized in Tables I and II.
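The identity and similarity percentages reported in Tables I and II were calculated with BioEdit from ClustalW alignments clipped to the putative functional cores. The sketch below only illustrates that kind of calculation on a toy pairwise alignment. The similarity groups, the clipping rule (trim to the first and last columns in which no sequence has a gap), the choice of denominator, and all function names are assumptions made for this example and are not claimed to reproduce BioEdit's scoring.

```python
# Hypothetical groups of "similar" residues; BioEdit's own scheme may differ.
SIMILAR_GROUPS = ("ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def is_similar(a: str, b: str) -> bool:
    """True if both residues fall into the same illustrative group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def trim_to_core(seqs: list[str]) -> list[str]:
    """Clip N- and C-terminal columns outside the region shared by all sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    shared = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    if not shared:
        return ["" for _ in seqs]
    start, end = shared[0], shared[-1] + 1
    return [s[start:end] for s in seqs]


def percent_identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (% identical, % identical plus similar) over ungapped columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0.0
    ident = sum(x == y for x, y in pairs)
    simil = sum(x == y or is_similar(x, y) for x, y in pairs)
    return 100.0 * ident / len(pairs), 100.0 * simil / len(pairs)


# Toy alignment: the eukaryotic sequence carries a three-residue N-terminal extension.
alga = "MASTKLVDE"
cyano = "---TKIVDD"
core_alga, core_cyano = trim_to_core([alga, cyano])
ident, simil = percent_identity_similarity(core_alga, core_cyano)
print(f"{ident:.1f} {simil:.1f}")  # prints "66.7 100.0"
```

Clipping before scoring matters because presequences are poorly conserved; including them would lower the apparent identity of the functional core.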
The predicted Chl and carotenoid biosynthesis genes from C. reinhardtii and Arabidopsis are consistently larger than those of Synechocystis PCC 6803, suggesting that the eukaryotic polypeptides may contain organellar-targeting presequences and/or additional domains within the mature proteins. This was further examined by aligning each of the predicted proteins from C. reinhardtii with homologous sequences from several vascular plants and cyanobacteria (alignments not shown); these alignments confirmed that the sequences from C. reinhardtii and vascular plants contain N-terminal extensions, usually between 30 and 90 amino acids, relative to the homologous cyanobacterial sequences. The sizes of the N-terminal extensions on the C. reinhardtii polypeptides are presented in Tables I and II. In some cases, an additional conserved N-terminal domain, probably part of the mature polypeptide, was present on the C. reinhardtii and Arabidopsis proteins, relative to the cyanobacterial homolog. This additional sequence probably evolved after the origin of plastids. For predicted proteins containing an additional conserved N-terminal domain that appears to be present in the mature protein, the presequence sizes specified in Tables I and II are marked with asterisks. The potential significance of these domains is discussed in more detail below.
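The presequence lengths in Tables I and II were inferred as the N-terminal extension of the C. reinhardtii protein relative to its cyanobacterial homolog. A minimal sketch of that measurement on a pairwise alignment follows; the function name and the toy sequences are hypothetical, and real alignments would first need the manual correction described in the legend of Table I.

```python
def n_terminal_extension(eukaryote_aln: str, cyanobacterium_aln: str) -> int:
    """Count eukaryotic residues that precede the first cyanobacterial residue.

    Both arguments are rows of the same alignment, with '-' marking gaps.
    """
    if len(eukaryote_aln) != len(cyanobacterium_aln):
        raise ValueError("aligned sequences must have equal length")
    # Find the first alignment column occupied by the cyanobacterial sequence.
    first = next(
        (i for i, c in enumerate(cyanobacterium_aln) if c != "-"),
        len(cyanobacterium_aln),
    )
    # Gaps in the eukaryotic row do not count toward the extension length.
    return sum(1 for c in eukaryote_aln[:first] if c != "-")


# Toy example: six residues precede the start of the cyanobacterial homolog.
print(n_terminal_extension("MASRQLTKLVDE", "------TKIVDD"))  # prints "6"
```

An extension measured this way is an upper bound on the transit peptide length, because it also includes any conserved N-terminal domain that remains part of the mature protein.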
In Tables I and II, presequence lengths, as inferred from the amino acid sequence alignments, are also compared to results from cleavage site prediction by ChloroP (Emanuelsson et al., 1999 ). In general, ChloroP predicts shorter presequences, and only in the case of glutamyl-tRNA synthetase (GTS), ALA dehydratase (ALAD), uroporphyrinogen III decarboxylase 2 (UROD2), 1-hydroxy-2-methyl-2-(E)-butenyl-4-diphosphate synthase (HDS), and geranylgeranyl reductase (GGR) do the predicted cleavage sites fall within conserved regions of the aligned proteins. We are aware of only one enzyme, coproporphyrinogen III oxidase 1 (CPX1), for which the cleavage site has been determined directly by sequencing the N terminus of the mature protein (Quinn et al., 1999 ); in this case, the experimental data and ChloroP cleavage site prediction are congruent. However, in other cases, incorrect predictions by ChloroP are likely to occur, and only N-terminal sequencing of mature proteins will provide valid data on the lengths and cleavage sites for presequences.
We also analyzed the C. reinhardtii deduced protein sequences with the targeting prediction tools TargetP (Nielsen et al., 1997; Emanuelsson et al., 2000), ChloroP (Emanuelsson et al., 1999), Predotar (Small et al., 2004), and iPSORT (Bannai et al., 2002). Often these tools, developed primarily for use with vascular plant sequences, are not able to differentiate between C. reinhardtii mitochondrial and plastidic targeting signals since chloroplast transit peptides in this alga share features with both mitochondrial and plastid presequences of vascular plants (Franzen et al., 1990). However, in spite of the shortcomings of these programs, organellar targeting was predicted for nearly all of the proteins analyzed by at least two of the three software tools (Tables I and II). Carotenogenic ζ-carotene desaturase (ZDS; step 32; steps are marked in Figs. 1 and 2 and are listed in tables) was the sole protein for which only a single algorithm predicts its localization to an organelle.
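The "at least two of the three tools" criterion can be expressed as a simple vote over the per-tool calls, using the codes from the legend of Table I (M, mitochondrial; P, plastid; n, no targeting signal predicted). The sketch below is illustrative only: the function name and the example call patterns are hypothetical and are not taken from the tables. Because these tools often cannot distinguish plastid from mitochondrial signals in this alga, both M and P are counted as organellar.

```python
ORGANELLAR_CODES = {"M", "P"}  # mitochondrial or plastid, per the Table I legend


def organellar_consensus(predictions: dict[str, str], min_votes: int = 2) -> bool:
    """True if at least `min_votes` tools predict any organellar targeting."""
    votes = sum(1 for call in predictions.values() if call in ORGANELLAR_CODES)
    return votes >= min_votes


# Hypothetical call patterns, for illustration only.
mixed_calls = {"TargetP": "P", "iPSORT": "P", "Predotar": "M"}
single_call = {"TargetP": "n", "iPSORT": "P", "Predotar": "n"}

print(organellar_consensus(mixed_calls))  # prints "True"
print(organellar_consensus(single_call))  # prints "False"
```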
The MEP-pathway enzyme HDS (step 26) appears to have an exceptionally short leader sequence, as deduced from both cDNA and genomic information. The open reading frame (ORF) contains an eight-amino acid sequence that precedes the first conserved motif (YCES). However, both TargetP and iPSORT predicted targeting of this polypeptide to the chloroplast, while Predotar suggested mitochondrial localization (Table II). Based on both TargetP and ChloroP, HDS has a putative organellar-targeting presequence of 22 amino acids, although the latter algorithm did not confirm that the presequence was involved in chloroplast localization. Since the conservation within the HDS polypeptide begins at amino acid nine and the transit peptide is predicted to be represented by the first 22 amino acids, it is conceivable that the HDS targeting sequence is not cleaved from the protein upon import into the chloroplast. This has recently been shown to be the case for CP29, a Chl-binding light-harvesting protein, of C. reinhardtii (Turkina et al., 2004 ).
Some Proteins Exhibit Significantly Less Conservation
Most proteins in Tables I and II are highly conserved among C. reinhardtii, Arabidopsis, and Synechocystis PCC 6803, sharing more than 60% pairwise amino acid identity and approximately 80% amino acid similarity. In many cases, the close phylogenetic relationship between C. reinhardtii and Arabidopsis genes is supported by the presence of one or more conserved intron positions. However, the deduced sequences for some Chl and carotenoid biosynthetic enzymes exhibit a significantly lower level of conservation. A lack of conservation is striking for the uroporphyrinogen III synthase (UROS; step 6); this protein is poorly conserved among all three of the organisms examined in this analysis.
In bacteria, UROS is the product of hemD. A number of (cyano)bacterial species, including Synechocystis PCC 6803, contain a hemD-like gene predicted to encode a hybrid protein representing a fusion of uroporphyrinogen III methyltransferase (UPM) with UROS (Panek and O'Brian, 2002). The occurrence of UROS as a domain of a fusion protein in Synechocystis PCC 6803 could in part explain the low similarity between the cyanobacterial UROS domain of the fusion protein and the UROS of C. reinhardtii. UPM catalyzes the first committed step in the biosynthesis of siroheme, which involves the methylation of the product formed by the UROS reaction (Fig. 1). Fusion of UPM with UROS in the (cyano)bacterial proteins may therefore have a role in regulating the allocation of pathway intermediates between siroheme and Chl formation. Arabidopsis (At5g40850; Leustek et al., 1997) and C. reinhardtii (C_940006) both contain homologs of UPM; it will be important to examine the expression of these proteins and their potential for interactions with UROS.
Several proteins that are part of the Chl biosynthetic pathway of C. reinhardtii are represented by multiple genes coding for putative isozymes. Some of these predicted proteins, specifically the isoforms UROD3 (step 7) and CPX2 (step 8) and a putative H-subunit of the magnesium (Mg)-chelatase (CHLH2; step 10c), appear to have diverged significantly from their counterparts in Arabidopsis and Synechocystis PCC 6803. As a first approximation, this can be explained by a relaxed pressure to conserve genes that are represented by multiple copies on the genome, allowing for the evolution of enzymes with altered function(s) or expression patterns. A more detailed analysis of the potential isozymes is presented below.
With respect to the carotenoid biosynthetic pathway, the sequence of the enzyme isopentenyl diphosphate:dimethylallyl diphosphate isomerase (IDI; step 28) is not highly conserved. The C. reinhardtii and Arabidopsis enzymes are 43% identical and 62% similar at the amino acid sequence level. Low conservation for this protein was previously noted by Cunningham and Gantt (2000) ; they concluded that IDI from green algae and vascular plants were likely to have separate origins. The cyanobacterial (type II) IDI is completely unrelated to the eukaryotic enzyme (Steinbacher et al., 2003 ). Furthermore, the late carotenogenic enzymes in C. reinhardtii and Arabidopsis, including the lycopene cyclases (LCYB, step 34; LCYE, step 35), the carotenoid hydroxylases (CHYB, step 36; CHYE, step 37), and zeaxanthin epoxidase (ZEP; step 38), are significantly less conserved than the other enzymes in the pathway. These proteins have no cyanobacterial homologs, with the exception of the lycopene cyclases in the genus Prochlorococcus.
Several deduced proteins that function in Chl and carotenoid biosynthesis in C. reinhardtii and Arabidopsis have strong similarity to each other but low levels of conservation relative to their cyanobacterial homologs. The Chl genes in this group encode the plastidic GTS (step 1), the porphobilinogen deaminase (PBGD; step 5), and the CPX (step 8). The carotenoid genes in this group encode most enzymes of the MEP pathway (steps 21–27), namely, deoxy-xylulose-5-phosphate synthase (DXS), 4-diphosphocytidyl-2-methyl-erythritol synthase (CMS), 4-diphosphocytidyl-2-methyl-erythritol kinase (CMK), 2-methyl-erythritol-2,4-cyclodiphosphate synthase (MCS), and HDS. For all of these genes, identity/similarity between Synechocystis PCC 6803 and C. reinhardtii is about 20% lower than between C. reinhardtii and Arabidopsis (Tables I and II). The low similarity between MEP enzymes of vascular plants and cyanobacteria was noted previously (Lange et al., 2000); a similar degree of difference in these proteins between C. reinhardtii and Synechocystis PCC 6803 is noted here. Lange et al. (2000) suggested the most likely explanation for this observation to be a lateral transfer of these genes from other eubacteria into cyanobacteria subsequent to the primary endosymbiosis that led to evolution of extant plastids.
The nuclear genes encoding proteins involved in Chl and heme biosynthesis in C. reinhardtii may have originated either from an ancestral chloroplast or mitochondrion. Indeed, the CPX proteins from vascular plants and C. reinhardtii are more similar to human and yeast CPX than to any of the cyanobacterial homologs. CrCPX1 has 55% (73%) and CrCPX2 has 43% (64%) identity (similarity) to CPX of human, and only 42% (57%) and 35% (55%) identity (similarity) with Synechocystis PCC 6803 CPX, respectively (see Table I and below). Similarly, the green algal and vascular plant PBGD are most closely related to the homologous enzyme from α-proteobacteria, which are considered to be among the closest known eubacterial relatives of mitochondria (Gray et al., 1999). The similarity between the plant enzyme and the mitochondrial PBGD from animals, however, is much lower, while a comparison of the latter with eubacterial sequences revealed highest similarity to the enzyme from cyanobacteria. To unravel this apparent paradox, more detailed studies will be necessary to elucidate the phylogenetic relationship of these enzymes from a larger set of taxa.
Conserved Domains in C. reinhardtii and Vascular Plant Proteins That Are Absent in the Cyanobacterial Homologs
Since both Chl and carotenoids are synthesized in plastids, N-terminal plastid targeting signals are expected to be associated with all nucleus-encoded proteins that participate in the synthesis of these pigments. The targeting signals generally display little or no conservation at the primary sequence level (von Heijne et al., 1989 ). Several enzymes involved in pigment biosynthesis in vascular plants were observed to have conserved domains absent from their cyanobacterial counterparts. These conserved domains are generally composed of 20 to 40 amino acids, and are mostly located at the N terminus of the protein between the targeting signal and the first common domain to the Arabidopsis/C. reinhardtii and cyanobacterial homologs. These additional sequences are part of the mature protein but are probably not essential for catalytic activity since they are absent in the cyanobacterial enzymes. Removal of the N-terminal extensions from phytoene synthase (PSY; Misawa et al., 1994 ), lycopene cyclases LCYB and LCYE (Hugueney et al., 1995 ; Cunningham et al., 1996 ), and the chlorophyllide a oxygenase (CAO; Nagata et al., 2004 ) did not result in a loss of enzymatic activity. Hence, these N-terminal extensions may have a regulatory function, either through interactions with metabolites or other polypeptides. This is supported to some extent by the finding that expression of tomato (Lycopersicon esculentum) PSY lacking the conserved N-terminal motif in Escherichia coli resulted in higher phytoene production than the expression of tomato PSY for which just the targeting sequence had been removed (Misawa et al., 1994 ).
In the following analyses, we focus on domains of Chl and carotenoid biosynthesis enzymes conserved between vascular plants and C. reinhardtii but not present in the bacterial homologs. Identification of domains conserved only within the green algal lineage would require genomic/cDNA sequence information from additional green algal genera. In the Chl biosynthetic pathway, the three early enzymes glutamyl-tRNA reductase (GTR), Glu-1-semialdehyde aminotransferase (GSA), and ALAD (steps 2–4), as well as the Mg-protoporphyrin IX methyltransferase PPMT (step 11), possess an N-terminal conserved domain of 15 to 20 amino acids present in both C. reinhardtii and vascular plant enzymes (Supplemental Fig. 1). While GTR, ALAD, and PPMT probably acquired this sequence after the establishment of plastids within host cells, in the case of GSA the conserved domain is also present in some of the cyanobacterial homologs (e.g. species from the genus Prochlorococcus). In other cyanobacteria, including Synechocystis PCC 6803, a remnant of the sequence still appears to be present in the genome contiguous to the N terminus of the ORF, but it no longer appears to be part of the ORF (see Supplemental Fig. 1).
In the carotenoid biosynthetic pathway, the enzymes 1-deoxy-D-xylulose-5-phosphate reductoisomerase (DXR; step 22), phytoene desaturase (PDS; step 31), and ZDS (step 32) share conserved N-terminal extensions of up to 40 amino acids (Supplemental Fig. 2). Interestingly, the extensions associated with LCYB and LCYE display only a very low level of conservation between vascular plants and green algae but are well conserved within each of the two clades (data not shown). However, our alignments include sequences from only three green algae for LCYB (C. reinhardtii, Volvox carteri, and Haematococcus pluvialis) and two for LCYE (C. reinhardtii and V. carteri), with all of these sequences from the order Volvocales (data not shown). Therefore, additional analyses of conserved domains associated with the two cyclases would benefit from a broader taxon sampling. As hypothesized by Grossman et al. (2004), the putatively conserved domains of LCYB and LCYE might interact with LHC apoproteins, leading to altered enzyme activity (see also below). We suggest that the conserved domains on these polypeptides in vascular plants and algae represent targets for regulatory processes and exciting areas for future research.
CAO (step 15) from vascular plants contains a particularly large N-terminal extension. The first CAO gene sequenced was from C. reinhardtii (Tanaka et al., 1998 ); the identification of this sequence facilitated the subsequent identification of the homologous gene from Arabidopsis (Espineda et al., 1999 ; Rüdiger et al., 1999 ). Partial sequences of CAO genes from the prochlorophytes Prochloron didemnii and Prochlorothrix hollandica were also obtained (Tomitani et al., 1999 ). Biochemical studies on Arabidopsis CAO revealed that, at least in vitro, only chlorophyllide a can be used as a substrate for catalyzing the formation of chlorophyllide b (Oster et al., 2000 ). A comparison of full-length CAO sequences from various organisms (Nagata et al., 2004 ) has demonstrated that the mature Arabidopsis and Oryza sativa enzymes contain an N-terminal extension with a highly conserved A-domain of approximately 130 amino acids and a less conserved B-domain of 30 amino acids; both domains are absent from the prochlorophyte and C. reinhardtii CAO. Although this N-terminal extension was shown to be dispensable for the catalytic activity of the Arabidopsis enzyme, it was hypothesized to play a role in regulation of enzyme activity (Nagata et al., 2004 ). An interesting observation made by Vermaas and coworkers (Xu et al., 2001 ) was that the activity of CAO from Arabidopsis expressed in Synechocystis PCC 6803 could be strongly enhanced by coexpressing it with an apoprotein of the LHCII from pea (Pisum sativum). Although no stably assembled LHCII was detectable in the cyanobacterial transformants and the newly formed Chl b accumulated mainly in the core complexes of both photosystems, the results suggest that an interaction between CAO and apo-LHCII may modulate the activity of the enzyme (Xu et al., 2001 ). Nagata et al. (2004) have suggested that the A-domain of CAO is critical for this interaction to occur.
In the C. reinhardtii genome, a number of small EST sequences are located upstream and in close proximity to the CAO gene. A model predicted for CAO by GreenGenie (genie.8.14) suggests that the C. reinhardtii ORF, originally predicted by Tanaka et al. (1998), might extend into these small EST sequences, generating a coding region with an additional 600 bp at the 5' end of the gene and containing one additional intron. Another gene prediction tool, GENSCAN (http://genes.mit.edu/GENSCAN.html), also suggests the presence of an alternative start codon, extending the CAO N terminus by 182 amino acids. We confirmed these predictions by sequencing a long C. reinhardtii cDNA clone (AV626430) and used the deduced sequence to search sequence reads from the V. carteri whole-genome shotgun (WGS) library. We were able to assemble the complete V. carteri CAO coding region from these reads. Figure 3 shows an alignment of the deduced CAO proteins from C. reinhardtii, V. carteri, Dunaliella salina, Arabidopsis, O. sativa, and P. hollandica; the D. salina sequence is likely to be incomplete. The N-terminal regions of CAO from green algae and vascular plants exhibit significant similarities. In addition, secondary structure predictions for the putative N-terminal extensions of CAO indicate an extended α-helix that aligns between the algal and vascular plant proteins. Finally, examination of the presequence for a putative plastid targeting sequence also corroborates the presence of an N-terminal extension on the C. reinhardtii CAO. The original C. reinhardtii (BAA33964) and D. salina (BAA82481) CAO sequences deposited in GenBank were not predicted by TargetP, iPSORT, or Predotar to have a presequence that routes the protein to the plastid, while such a presequence was predicted by all three of these algorithms for the extended forms of CAO from C. reinhardtii and V. carteri.
However, recent biochemical evidence from immunological analyses of the putative CrCAO polypeptide using an AtCAO antibody suggests that mature CAO from C. reinhardtii has an approximate molecular mass of 51 kD (Eggink et al., 2004). This would correspond to the theoretical mass of 51.4 kD calculated from the CAO sequence (BAA33964) determined by Tanaka et al. (1998). By contrast, CrCAO with the conserved extension, after removal of the 29-amino acid presequence predicted by ChloroP, would yield a mature protein with a predicted molecular mass of 69 kD. Therefore, it is critical to sequence the N-terminal part of the mature, chloroplast-localized protein.
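The comparison of theoretical masses above amounts to summing average residue masses over the mature sequence and adding one water molecule. The following Python sketch illustrates that arithmetic only; it is not necessarily how the values quoted in the text were obtained. The function name `protein_mass_kd`, the `presequence_length` parameter, and the use of standard average residue masses are assumptions made for illustration, and the actual CAO sequences would have to be retrieved from the GenBank accessions cited in the text.

```python
# Average residue masses in daltons (free amino acid minus one water).
# Standard textbook values; assumed here, not taken from the paper.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N and C termini


def protein_mass_kd(sequence, presequence_length=0):
    """Return the theoretical average mass (kD) of a protein.

    presequence_length is the number of N-terminal residues removed
    (e.g. a predicted transit peptide) before the mass is computed.
    """
    mature = sequence.upper()[presequence_length:]
    if not mature:
        raise ValueError("no residues left after removing the presequence")
    total = sum(RESIDUE_MASS[aa] for aa in mature) + WATER
    return total / 1000.0
```

With a full-length sequence in hand, calling the function once with `presequence_length=0` and once with the predicted transit peptide length gives the two masses that would be compared against the immunoblot estimate.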

Figure 3. Protein alignment of chlorophyllide a oxygenase (CAO) from C. reinhardtii with the homologous enzymes from the green algae V. carteri and D. salina (sequence probably incomplete), the vascular plants Arabidopsis and O. sativa, and the full-length sequence of CAO from the prochlorophyte P. hollandica. At conserved sites, black boxes indicate identical amino acids, while gray boxes denote similar amino acids indicating conservative replacements (according to PAM250 matrix). Note the large N-terminal conserved domains of the putative full-length proteins from C. reinhardtii and V. carteri having significant sequence similarity to the corresponding domains in the CAO from the two vascular plants. Thick bars above these domains indicate putative α-helical stretches predicted by PSIPRED (McGuffin et al., 2000). The CAO cDNA sequence from C. reinhardtii has been deposited in GenBank under the accession number AY860816. Sequence accessions for other species are as follows: V. carteri, assembled from WGS reads ABSY196646.g1, AOBN193539.y1, AOBN79537.x1, AOBN24721.x1, ABSY137769.b2, ABSY7818.g1, ABSY158303.x1, AOBN193539.x3, AOBN24721.x1, ABSY203416.b1, AOBN191145.x1, AOBN43705.x1, ABSY3682.g2, ABSY23169.g1, AOBN163119.x1, AOBN14760.y1, AOBN191145.y1; D. salina, BAA82481; Arabidopsis, BAA82484; O. sativa, AB021310; and P. hollandica, BAD02269.
If the additional conserved domain at the N terminus of the C. reinhardtii CAO is confirmed to be part of the mature polypeptide, it will be important to establish whether or not it interacts with the C. reinhardtii LHC polypeptides and the specificity of these interactions, if they occur. It is possible that the low degree of conservation between extensions of the green algal and vascular plant CAO can be explained by corresponding differences in the green algal and vascular plant LHC polypeptides (Elrad and Grossman, 2004 ), and a need for the two proteins to coevolve.
HDS (step 26), which catalyzes the penultimate step in formation of active isoprene by the MEP pathway, is another potential target for additional research that would help elucidate regulatory processes involved in pigment biosynthesis. The C. reinhardtii HDS gene model (Table II) predicts the presence of an extended insertion of about 260 amino acids with significant sequence similarity to the analogous domain from HDS of Arabidopsis (Querol et al., 2002). This domain is located in the central part of the protein. Although this domain is absent from bacterial HDS, the Arabidopsis HDS can complement an HDS-null mutant of E. coli (Querol et al., 2002). The presence of this additional domain in the C. reinhardtii HDS was confirmed by sequencing a putative HDS cDNA (AV626792). An alignment of the HDS sequences from C. reinhardtii, Arabidopsis, and Synechocystis PCC 6803 is available as Supplemental Figure 3. The significance of the additional domain(s) in the plant and algal enzyme is not known, but it is interesting to note that the amino acid identity of this domain between C. reinhardtii and Arabidopsis is only 49%, while the remaining parts of the proteins are 74% identical (data not shown). It is reasonable to speculate that the inserted eukaryotic domain of HDS may be involved in regulation and that the details of this regulation may be somewhat different in C. reinhardtii and Arabidopsis.
Two Carotenogenic Genes of Vascular Plants Have No Identified Homologs in C. reinhardtii, while the Presence of Another Carotenogenic Gene on the C. reinhardtii Genome Was Unexpected
For most genes known to be directly involved in the biosynthesis of Chl and carotenoids in vascular plants, we were able to identify homologs in the current version of the C. reinhardtii genome. However, we were unable to identify C. reinhardtii genes encoding the plant enzymes violaxanthin deepoxidase (VDE; step 39) and neoxanthin synthase (NSY; step 40). Since the current version of the C. reinhardtii genome is only about 90% complete, the missing genes might still be discovered in the fractions of the genome that have not yet been sequenced. However, we do not regard this as very likely, for the reasons explained below.
NSY has only been identified in two species of the family Solanaceae, potato (Solanum tuberosum) and tomato (Al-Babili et al., 2000; Bouvier et al., 2000). In the complete genome of Arabidopsis, no gene homologous to NSY was detected (Hirschberg, 2001). Interestingly, NSY from tomato and potato turned out to be paralogous to the two lycopene cyclases (LCYB and LCYE), common to all vascular plants, and the closely related capsanthin-capsorubin synthase (CCS) from bell pepper (Capsicum annuum). Furthermore, both NSY (Ronen et al., 2000) and CCS (Hugueney et al., 1995) were shown to possess lycopene-cyclase activity. Therefore, it is conceivable that in plants lacking a separate NSY, one of the two lycopene cyclases (most likely LCYB based on its similarity to NSY) might be responsible for the formation of neoxanthin from violaxanthin, possibly triggered by interactions with the neoxanthin-binding proteins of LHCII (Grossman et al., 2004). Alternatively, an enzyme unrelated to NSY of the Solanaceae might be responsible for neoxanthin formation in Arabidopsis and C. reinhardtii.
VDE catalyzes the deepoxidation of violaxanthin as part of the photoprotective xanthophyll cycle (Yamamoto et al., 1999 ). Genes encoding VDE have been sequenced from several vascular plants (Bugos et al., 1998 ). Neither the current version of the C. reinhardtii genome nor WGS reads available from the closely related alga V. carteri contain any sequences with significant similarity to VDE from vascular plants. These results suggest that green algae may use a deepoxidating enzyme with characteristics different from those of the plant enzyme. This suggestion is supported by the observation that dithiothreitol, a potent inhibitor of vascular plant VDE, does not prevent violaxanthin deepoxidation in high light-exposed cultures of C. reinhardtii (K. Niyogi, personal communication). Identification of the deepoxidase from C. reinhardtii by map-based cloning is currently in progress (Anwaruzzaman et al., 2004 ).
Surprisingly, we detected a gene coding for a putative β-carotene ketolase (BKT; step 42), based on similarity to BKT from the green alga H. pluvialis. BKT introduces a keto-group at C(4) of β-ionone rings and, in conjunction with β-carotene hydroxylase (CHYB), catalyzes the formation of the ketocarotenoid astaxanthin (Lotan and Hirschberg, 1995; Breitenbach et al., 1996). Interestingly, the genes coding for CHYB and BKT are contiguous on the C. reinhardtii genome, with BKT located on the same strand and just upstream of CHYB. To the best of our knowledge, astaxanthin has not been detected in C. reinhardtii, and our attempts to detect it (using HPLC) in both nutrient-replete and nutrient-limited cultures have been unsuccessful (data not shown). The putative BKT gene in C. reinhardtii appears to be expressed since it is represented by a cDNA clone (1024014H04) in the EST database. Since the BKT gene in the current version of the C. reinhardtii genome database contains two large gaps, we sequenced the corresponding cDNA. As the alignment in Figure 4 demonstrates, the central part of the C. reinhardtii and H. pluvialis homologs is highly conserved at the amino acid level (70% identity and 82% similarity). However, BKT from C. reinhardtii is predicted to have a C-terminal extension of about 115 amino acids, which is absent from any ketolase previously characterized. It will be interesting to examine the functional significance of this amino acid extension, which might relate to the absence of astaxanthin in C. reinhardtii.
C. reinhardtii and Arabidopsis Differ Significantly in the Number of Putative Isozymes Involved in the Biosynthesis of Chl and Carotenoids
In Arabidopsis, there are often multiple genes coding for putative isozymes that function at a number of different steps in the pathway for Chl synthesis (Lange and Ghassemian, 2003 ). By contrast, most reactions in the analogous pathway in C. reinhardtii are catalyzed by unique gene products, with the exception of UROD, CPX, CHLI, CHLH, and CHL27 (Table I). The reactions of the carotenogenic pathway in C. reinhardtii are all catalyzed by unique gene products (with the caveat that there are still some gaps in the genome sequence). In Arabidopsis, there do appear to be isogenes for DXS, IDI, geranylgeranyl diphosphate synthase (GGPS; step 29), and CHYB (Table II).
The increased number of isozymes associated with pigment biosynthesis in vascular plants relative to C. reinhardtii or cyanobacteria may be related to increased regulatory demands and perhaps also to different local environments (e.g. in cells of different tissue types). As an example for the Chl biosynthesis pathway, the expression of GTR1 (HEMA1) in Arabidopsis was highest in green tissue and under stringent light control, while GTR2 was mainly expressed in roots and flowers in a light-independent manner (McCormac et al., 2001 ; Ujwal et al., 2002 ). Organ-specific expression of two GTR isogenes from barley (Hordeum vulgare; Bougri and Grimm, 1996 ) has also been reported. With respect to the carotenoid pathway, DXS was shown to be encoded by two different genes in the legume Medicago truncatula. DXS1 was expressed in a variety of developing tissues, with the exception of the roots. DXS2 expression was strongly stimulated in roots upon colonization with mycorrhizal fungi (Walter et al., 2002 ). There are both organ- and organellar-specific isoforms of GGPS in Arabidopsis (Zhu et al., 1997 ; Okada et al., 2000 ). There appear to be 12 different GGPS isogenes in Arabidopsis (Lange and Ghassemian, 2003 ), although expression of only five has been demonstrated (Okada et al., 2000 ). There is one GGPS and three other ORFs on the C. reinhardtii genome that encode related, prenyl transferase-like proteins. Alignment of these sequences with GGPS and related prenyl transferases from Arabidopsis and other vascular plants (data not shown) has tentatively enabled us to assign the products of the other genes the functions geranyl pyrophosphate synthase (gene model C_490103), farnesyl pyrophosphate synthase (C_120115), and solanesyl pyrophosphate synthase (C_1690011).
Putative Isozymes That Function in Chl Biosynthesis in C. reinhardtii and Their Relationship to Enzymes from Other Photosynthetic Organisms
The C. reinhardtii isogenes involved in Chl biosynthesis are UROD (step 7), CPX (step 8), two subunit genes of the Mg-chelatase (step 10), CHLI and CHLH, and the recently identified CHL27 (step 12). The CHL27 protein appears to be involved in catalyzing the formation of the cyclopentanone ring of Chl (Moseley et al., 2000 ; Tottey et al., 2003 ). As a first step toward a detailed characterization of potential isozymes associated with pigment biosynthesis in C. reinhardtii, we searched for homologs in the genomes of Arabidopsis and O. sativa, in vascular plant EST databases, in the genomes of the red alga Cyanidioschyzon merolae, the diatom Thalassiosira pseudonana, as well as in the current cyanobacterial databases.
In C. reinhardtii, UROD is the first enzyme in the Chl biosynthetic pathway encoded by multiple genes (step 7), three in this case. A comparison among the predicted UROD proteins of C. reinhardtii is presented in Supplemental Figure 4. The encoded proteins have 43% to 55% identity and 67% to 76% similarity among themselves, and expression of all three of the UROD isogenes is supported by EST sequence data. Vascular plants in general appear to contain at least two different UROD genes; two isogenes were identified in Arabidopsis, potato, tobacco (Nicotiana tabacum), and barley. For O. sativa and Zea mays, cDNA data suggest the occurrence of three isogenes (but see below). The genome of C. merolae also contains two UROD isogenes, while the T. pseudonana genome harbors three isogenes. The cyanobacterial genomes (eight complete and five partial) each contain a single UROD gene.
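Identity and similarity percentages such as those quoted for the UROD isozymes are computed column by column from a pairwise alignment. The sketch below, in Python, shows one simple way to do this. It rests on two assumptions that are not taken from the paper: columns containing a gap in either sequence are excluded from the count, and "similar" residues are defined by a small set of illustrative conservative-replacement groups rather than by a substitution matrix such as PAM250. The function name and the example sequences are hypothetical.

```python
# Illustrative conservative-replacement groups (an assumption for this
# sketch; a substitution matrix such as PAM250 would normally be used).
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both inputs must have the same length. Columns with a gap in either
    sequence are skipped (an assumption; other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared
```

For the toy alignment `identity_and_similarity("MKV-LA", "MRVTLG")`, five ungapped columns are compared, three of which are identical and two of which fall in the same group. Because the choice of denominator and similarity definition changes the percentages, values from different tools are only comparable when the same conventions are used.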
The phylogenetic relationship among UROD isoforms from different organisms is depicted in Figure 5. The cyanobacterial enzymes cluster at the base of the neighbor-joining tree, while the eukaryotic enzymes fall into three groups, with CrUROD1 from C. reinhardtii located at the base of a cluster also containing UROD1 from vascular plants. Similarly, CrUROD2 from C. reinhardtii and UROD2 from vascular plants form a second cluster. Both clusters have high bootstrap support. The third UROD cluster is divided into two subclusters containing the red algal and diatom isozymes, and while the phylogenetic position of CrUROD3 from C. reinhardtii is less well resolved, it appears to fall into a subcluster with one each of the red algal and the diatom isoforms. The other UROD from C. merolae and the two remaining isoenzymes from T. pseudonana (TpUROD3 was assembled from unplaced WGS reads) comprise the other subcluster. A very similar branching pattern, with similar bootstrap values, resulted from a maximum-likelihood analysis employing 100 bootstrap replicates (data not shown).

Figure 5. Neighbor-joining tree for UROD constructed from protein alignment (314 amino acid positions) of isozymes from C. reinhardtii with homologs from other algae, vascular plants, and cyanobacteria. The protein sequence from the cyanobacterium Gloeobacter violaceus PCC 7421 was used to root the tree. Bootstrap support higher than 50% is indicated at respective nodes (n = 1,000). Species and sequence accessions are as follows: Arabidopsis 1, NP_181581, and 2, NP_850587; C. merolae 1, CMP083C, and 2, CME194C; G. violaceus PCC 7421, NP_926823; H. vulgare 1, CAA58039 and 2, assembled from CB883221, AL508038, BG299928, BJ483280, BJ455669, AV834454; L. esculentum 1, assembled from BE462354, AI778274, AW624527, BI924627, BF112811, BI421978, and 2, assembled from BI928533, BI926422, BG124909, BI929528, BG129172, BM411365; Nostoc PCC 7120, NP_487949; O. sativa 1, AK070859, 2, AK106203, and 3, AK110601; Prochlorococcus marinus CCMP 1375, NP_875471; P. marinus MIT 9313, NP_894278; S. tuberosum 1, assembled from CV474374, BE921296, BQ047515, BE923279, BQ113938, BG097583, BM404774, and 2, assembled from CK276507, AW906474, BG097407, BE341237, CK276508, CK268172, BQ118921; Synechococcus WH 8102, NP_897588; Synechocystis PCC 6803, NP_442753; T. pseudonana 1, newV2.0.genewise.20.4.1 [thaps1:129030], 2, genewise.8.624.1 [thaps1:45777], and 3, assembled from unplaced WGS reads PQI71568x1, SXZ37045y1, PQI86858y1, SXZ116998.y1, SXZ24083x1, SXZ24793x1, SXZ7875.x1, SXZ62420y1, SXZ28894y1, PQI126422.y1; Trichodesmium erythraeum IMS101, ZP_00326423; and Z. mays 1, O81220, 2, assembled from CD434648, CD435388, CD436605, and 3, assembled from CD442487, CD444385, CD977856, CA400912.
Surprisingly, OsUROD3 (from O. sativa) is most closely related to CrUROD2. However, the OsUROD3 sequence is supported by a single cDNA entry (AK110601), and we failed to retrieve any additional EST or genomic sequence data for this putative gene from the O. sativa databases. Hence, the single cDNA may represent a contamination of the cDNA library with an unidentified green alga. This is corroborated by comparative analyses of GC content and codon usage of the O. sativa UROD genes. While OsUROD1 and OsUROD2 have a GC content of 50% and 54% and an effective number of codons (ENC) used of 55.6 and 57.9, respectively, the OsUROD3 sequence has a strong bias both with respect to GC (64%) and ENC (36.2) values, which are more similar to the values expected for ORFs of green algae like C. reinhardtii or V. carteri (see Table III and below).
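The GC comparison used above to flag OsUROD3 as a possible contaminant is a simple base count over the ORF. A minimal Python sketch follows. The function name `gc_content` and the example sequence are hypothetical, and ambiguous bases (anything other than A, C, G, or T) are ignored, which is an assumption of this sketch rather than something stated in the paper. The ENC values are not computed here.

```python
def gc_content(orf):
    """Return the GC content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; other characters
    such as N are ignored (an assumption of this sketch).
    """
    counted = [base for base in orf.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)
```

Applied to each UROD coding sequence from O. sativa, a function like this would yield the kind of per-gene values compared in the text; a value far from the other rice genes and close to those typical of green algal ORFs is what raises the suspicion of contamination.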
Table III. Comparison of EST data and ENC values for ORFs in the C. reinhardtii genome encoding putative proteins involved in biosynthesis of chlorophylls (1–16) or carotenoids (21–43)
C. reinhardtii EST clones deposited in GenBank (http://www.ncbi.nlm.nih.gov) were grouped into three categories: (1) unstressed (clones of projects 874, 894, and 1,024 from Shrager et al., 2003 ; clones whose accession begins with "AV" from Asamizu et al., 1999 , 2000 ); (2) stressed (projects 963, 1,031, and 3,510 [=1,115] from Shrager et al., 2003 ; projects 832 and 833); (3) other (projects 1,030, 3,511 (=1,112), and 925 from Shrager et al., 2003 ; clones whose accession begins with "BP" from Asamizu et al., 2004). EST clones with more than one sequence entry in GenBank (5'- and 3'-reads) were counted only once. The detailed frequencies of EST clones in each project are available as Supplemental Tables I and II. The ENC used by each ORF was calculated according to Wright (1990) . For UROD, CPX, CHLI, and CHLH, the respective isogene that is represented by the majority of EST clones in cDNA libraries generated from unstressed cells and at the same time has the lowest ENC value is in bold.
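For reference, the ENC statistic of Wright (1990) mentioned in this legend can be written as follows. This is the standard form of the estimator as generally presented; the symbols are chosen here for illustration. For an amino acid with k synonymous codons used n times in the ORF, with p_i the frequency of the i-th codon, the codon homozygosity F is estimated, averaged over all amino acids in each degeneracy class (2, 3, 4, and 6 codons), and combined:

```latex
\hat{F} = \frac{n \sum_{i=1}^{k} p_i^{2} - 1}{n - 1},
\qquad
\hat{N}_c = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}
```

The value ranges from 20 (one codon used per amino acid, strongest bias) to 61 (all synonymous codons used equally), so a lower ENC indicates a stronger codon bias.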
The UROD reaction is positioned at a branch point of tetrapyrrole biosynthesis, competing with UMP for the substrate uroporphyrinogen III. A comparison of intron positions among the C. reinhardtii and Arabidopsis isogenes (Table I; Supplemental Fig. 4) reveals that some intron positions are conserved between the different UROD isogenes within a given organism. These findings suggest that a gene duplication occurred after the endosymbiotic event that presaged the evolution of the chloroplast in eukaryotic plant cells. Furthermore, since all plant and algal species that we examined contain at least two different genes encoding putative UROD isozymes, it is possible that the UROD isozymes fulfill different roles in the cell; they may no longer be functionally equivalent. Therefore, it will be useful to characterize the expression patterns and localization of the putative UROD isozymes.
The product of the reaction catalyzed by UROD, coproporphyrinogen III, is oxidized by CPX (step 8), which is encoded by two different genes in C. reinhardtii. EST sequences are available for both CPX genes (Tables I and III). The full-length sequence of CPX1 was previously reported, and its gene product was purified and shown to be localized in the plastid (Hill and Merchant, 1995 ; Quinn et al., 1999 ). The deduced CPX protein sequences are compared in Supplemental Figure 5. The genomes of Arabidopsis and T. pseudonana also contain two potential CPX genes, while the C. merolae genome has a single CPX gene. The two CPX isogenes from Arabidopsis are very similar at the nucleotide level, which probably reflects a recent gene duplication. However, CPX2 from Arabidopsis does not appear to encode a functional product since the ORF contains a frame shift (Santana et al., 2002 ). A single copy of the CPX gene was detected in other vascular plant genomes.
As noted earlier, CPX from vascular plants and C. reinhardtii is most similar to the mitochondrial enzyme from animals and fungi. The two CPX genes from C. reinhardtii have no intron positions in common, and the deduced amino acid sequences of the isozymes differ significantly (Supplemental Fig. 5); the amino acid sequences of the two CPX proteins from T. pseudonana are also very different. The CPX1 isozymes of the two algae cluster with the single CPX proteins from vascular plants and C. merolae and the CPX homolog from the prasinophyte Ostreococcus tauri. By contrast, C. reinhardtii and T. pseudonana CPX2 isoforms group in a separate cluster, positioned between the cyanobacterial and animal/fungal CPX clusters (Fig. 6). The same branching pattern could be reproduced with high bootstrap support (n = 100) by a maximum-likelihood analysis of the data set (data not shown). As there is a close relationship between algal CPX2 and mitochondrial CPX, it will be important to establish the subcellular location(s) of CPX2 in C. reinhardtii. Cyanobacteria of the genus Nostoc also have two CPX genes. However, these isoforms cluster within the cyanobacterial branch of the tree, suggesting that they are the result of a recent, local gene duplication that is restricted to a subgroup of the cyanobacteria.

Figure 6. Neighbor-joining tree for CPX constructed from protein alignment (275 amino acid positions) of isozymes from C. reinhardtii with homologs from other algae, vascular plants, fungi, animals, and cyanobacteria. The protein sequence from the enterobacterium E. coli was used to root the tree. Bootstrap support higher than 50% is indicated at respective nodes (n = 1,000). Species and sequence accessions are as follows: Anabaena variabilis ATCC 29413 1, ZP_00159111, and 2, ZP_00158651; Arabidopsis 1, CAD12661, and 2 (pseudogene), NP_567256; C. merolae, CMO136C; Drosophila melanogaster, NP_524777; Escherichia coli, BAB36730; Gallus gallus, XP_416596; Gloeobacter violaceus PCC 7421, NP_926822; Homo sapiens, NP_000088; H. vulgare, CAA58037; L. esculentum, BT014254; Mus musculus, BAA03840; N. tabacum, CAA58038; Nostoc PCC 7120 1, NP_484694, and 2, NP_485400; O. tauri, AAS88901; O. sativa, XP_473852; Prochlorococcus marinus CCMP 1986, NP_893699; P. marinus MIT 9313, NP_895533; S. cerevisiae, AAA34529; Synechocystis PCC 6803, NP_440183; T. pseudonana 1, newV2.0.genewise.69.87.1 [thaps1:154535], and 2, newV2.0.grail.67.2.1 [thaps1:109130]; and Yarrowia lipolytica, XP_504574.
Interestingly, vascular plants generally seem to contain two protoporphyrinogen IX oxidase (PPX) isozymes (step 9), which catalyze the step in Chl and heme biosynthesis immediately following CPX. In tobacco, one of these isozymes has been shown to be plastid specific, while the other was localized to mitochondria (Lermontova et al., 1997 ). The three algal genomes that we examined each contain a single PPX gene, encoding a protein that is most similar to the plastid-specific PPX from tobacco and the PPX1 (At4g01690) from Arabidopsis (Table I).
Mg-chelatase (step 10) is situated at another important branch point in the tetrapyrrole biosynthetic pathway, catalyzing the committed step leading to Chl formation. This reaction has been recognized as an important target for regulation and has been the focus of several studies (e.g. see Walker and Willows, 1997 ; |