Plant Physiology 133:1051-1071 (2003)
© 2003 American Society of Plant Biologists
GENOME ANALYSIS
Genome-Wide Characterization of the Lignification Toolbox in Arabidopsis
Jeroen Raes, Antje Rohde, Jørgen Holst Christensen, Yves Van de Peer, and Wout Boerjan
Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology, Ghent University, Technologiepark 927, B9052 Gent, Belgium
ABSTRACT
Lignin, one of the most abundant terrestrial biopolymers, is indispensable for plant structure and defense. With the availability of the full genome sequence, large collections of insertion mutants, and functional genomics tools, Arabidopsis constitutes an excellent model system to unravel the monolignol biosynthetic pathway in depth. In a genome-wide bioinformatics survey of the Arabidopsis genome, 34 candidate genes were annotated that encode proteins homologous to the 10 presently known enzymes of the monolignol biosynthesis pathway, nine of which have not been described before. By combining evolutionary analysis of these 10 gene families with in silico promoter analysis and expression data (from a reverse transcription-polymerase chain reaction analysis on an extensive tissue panel, mining of expressed sequence tags from publicly available resources, and assembling expression data from literature), 12 genes could be pinpointed as the most likely candidates for a role in vascular lignification. Furthermore, a possible novel link was detected between the presence of the AC regulatory promoter element and the biosynthesis of G lignin during vascular development. Together, these data describe the full complement of monolignol biosynthesis genes in Arabidopsis, provide a unified nomenclature, and serve as a basis for further functional studies.
Lignin is an aromatic heteropolymer that is mainly present in secondary thickened plant cells, where it provides rigidity and impermeability to the cell walls. In addition, lignin deposition may be induced upon wounding and infection to protect plant tissues against invading pathogens. Lignin is composed of different phenylpropanoids, predominantly the monolignols p-coumaryl, coniferyl, and sinapyl alcohols that differ in their degree of methoxylation (Fig. 1). When these monolignols are incorporated into lignin, they are called p-hydroxyphenyl (H), guaiacyl (G), and syringyl (S) units, respectively. In addition to the three monolignols, other phenylpropanoids, such as hydroxycinnamyl aldehydes, hydroxycinnamyl acetates, hydroxycinnamyl p-hydroxybenzoates, hydroxycinnamyl p-coumarates, and hydroxycinnamate esters, are also present in the polymer (Ralph et al., 2001 ; Boerjan et al., 2003 ). Considerable variation exists in lignin composition between taxa, cell types, and developmental and environmental conditions.

Figure 1. The monolignol biosynthetic pathway. All the enzymatic reactions presented in the pathway have been demonstrated at least in vitro. Because of the variety in isoenzymes and kinetic properties, alternative routes through the metabolic pathway may exist. A question mark after an enzyme name means that the substrate has not been tested yet with this enzyme. For reactions with a single question mark, direct conversion has been detected, but the respective enzyme is unknown, whereas for those with a double question mark, no direct conversion has been detected.
Over the last decade, there has been a tremendous effort in cloning new genes involved in the monolignol biosynthetic pathway and in tackling the enzyme kinetics of the corresponding proteins and the role these enzymes play in controlling the amount and composition of lignin to be deposited in the cell wall (Anterola and Lewis, 2002 ; Humphreys and Chapple, 2002 ; Boerjan et al., 2003 ). As a consequence, the monolignol biosynthetic pathway has virtually been rewritten, although the exact route toward the monolignols is still a matter of debate (Fig. 1).
Although enzymatic assays and transgenic plants have contributed extensively to our understanding of the in vivo role of the enzymes, the role of individual gene family members has been more difficult to tackle, a limitation that can only be overcome in plant species such as Arabidopsis, for which the genome sequence and efficient reverse genetics tools are available (Arabidopsis Genome Initiative [AGI], 2000 ). Furthermore, the advent of genome-wide microarrays will make it possible to study the transcriptional differences that are the consequence of single gene perturbations and will allow the often pleiotropic phenotype of particular mutants to be explained at the molecular level.
As a first step toward studying the role of individual family members, we have undertaken a bioinformatics approach to identify, in Arabidopsis, all the gene family members of all monolignol biosynthesis genes known today. In many cases, only a subset of a given gene family, mostly obtained by homology-based gene isolation, has been characterized in the past. As a consequence, more distant family members might not have been discovered when, for example, primers were designed on only a few members of the family. This has led to an important bias in the range of sequence data available in public databases.
Here, we have used sensitive computational methods to delineate, in Arabidopsis, all members of the gene families currently known to be involved in monolignol biosynthesis. The integration of expression studies and promoter sequence analyses of the individual family members with phylogenetic analysis of the family has allowed us to select 12 genes as the most likely candidates to be involved in the developmental lignification in vascular tissues. Importantly, the promoter comparisons revealed a possible link between G lignin biosynthesis and the presence of the AC element that is correlated with a strong xylem expression. Together, these data describe the full complement of monolignol biosynthesis genes in Arabidopsis, introduce a unifying nomenclature for all genes of the pathway (Table I), and serve as a basis for further functional studies.
Table I. Unifying nomenclature for gene families investigated in this study
Nomenclature was chosen to accommodate as much as possible previously published names. For explanation, see text.
RESULTS
A semiautomatic structural annotation and a phylogeny-based classification were performed using prediction results, experimental data, and information from homologous sequences (see "Materials and Methods"). A total of 34 candidate monolignol biosynthesis genes were annotated, of which nine had, to our knowledge, never been described before (Table I). In addition, 27 closely related superfamily members ("likes") were identified in this process (Table I). To get a first insight into whether all these genes are indeed expressed and, more importantly, whether their expression pattern correlates with developmental lignification, their expression was analyzed in a set of tissues and for six developmental stages of inflorescence stem known to contain a high portion of lignifying cells. These data were compared with previous expression data from Arabidopsis and with information extracted from public databases of expressed sequence tags (ESTs). In addition, putative promoter elements, which drive expression during lignification, in pathogen and wound responses, and after induction by stress-related hormones, and potential subcellular localization signals were identified (due to size limitations, tables compiling all these data are available as supplemental data and at http://www.psb.ugent.be/bioinformatics/lignin/ and are indexed by an "s" throughout the manuscript).
PAL
PAL (E.C. 4.3.1.5) is the first enzyme of the general phenylpropanoid pathway and catalyzes the nonoxidative deamination of Phe to trans-cinnamic acid and NH3 (Fig. 1). PAL mediates the influx from primary metabolism into the phenylpropanoid pathway and becomes rate limiting when its activity is reduced below a threshold of 20% to 25% in transgenic tobacco (Nicotiana tabacum; Bate et al., 1994 ; Sewalt et al., 1997 ).
By using a thorough semiautomated annotation method, four genes encoding PAL proteins were detected in the Arabidopsis genome, three of which have been described previously (Ohl et al., 1990; Wanner et al., 1995). The phylogenetic analysis of PAL genes from various species provided no evidence for different classes in the PAL gene family (Fig. 2), although PAL1 is most closely related to PAL2, and PAL3 always clusters together with PAL4 (data not shown). The duplication that created the two PAL groups (PAL1 and PAL2 and PAL3 and PAL4) in Arabidopsis has been postulated to have predated the monocot-dicot split (Wanner et al., 1995), but this timing is not confirmed by our phylogenetic tree (Fig. 2).

Figure 2. Neighbor-joining tree of the PAL family, inferred from Kimura corrected evolutionary distances. Bootstrap values (Neighbor-Joining/Maximum Likelihood; NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per nucleic acid. Clusters of sequences are represented as triangles with a height equal to the average distance separating the terminal nodes from the deepest branching point in the cluster and a base proportional to the number of sequences composing it. Species and GenBank identification numbers of non-Arabidopsis sequences included in this tree are: dicots, Populus (169453, 485808, and 1109640), Glycine (18376), Trifolium (437711), Citrus (4808125, 4808127, and 1276902), Rubus (7208613 and 7208615), Camellia (662270), Petroselinum (534892), Nicotiana (170349), Digitalis (2631994), and Lactuca (18001006); monocots, Oryza (20280 and 871493); and gymnosperms, Pinus (1143311). Arath, Arabidopsis; Pinta, pine (Pinus taeda).
PAL1 and PAL2 are not only structurally very similar, but they also share common promoter elements and a similar expression pattern (supplemental Table Is). mRNAs from both genes are most abundant in roots and stems, where the expression increases during the later stages of development (Fig. 3; supplemental Table Is; Wanner et al., 1995). Analysis of the fusion between the AtPAL1 promoter and β-glucuronidase (GUS) revealed that the expression is located in the vascular tissues (Ohl et al., 1990; Leyva et al., 1995). Besides PAL1 and PAL2, PAL4 also is highly expressed in stem tissue, as shown by our RT-PCR expression analysis (Fig. 3). In addition, PAL2 and PAL4 are abundantly expressed in the seeds, as judged from the EST data (Fig. 3; supplemental Table Is). Although all four genes are almost ubiquitously expressed in the tissues investigated in this study, PAL3 seems to be generally expressed at a lower level (supplemental Table Is; Wanner et al., 1995; Mizutani et al., 1997; Ruegger et al., 1999). PAL1 was one of the first plant defense genes identified, and its involvement in pathogen infection and abiotic stress has been studied. Among the ESTs derived from diverse stresses, PAL1 and PAL2 are clearly the most important stress-responsive family members, with 20 of 41 ESTs and 17 of 50 ESTs in total, respectively, even taking into account the relative database sizes (supplemental Table Is). In line with this, a number of regulatory elements, shown to be involved in promoter responsiveness to elicitors, wounding, and pathogen infection, were found in these genes using the stringent search method (see "Materials and Methods"; supplemental Table Is).

Figure 3. Expression profiles of all 34 monolignol biosynthesis genes. Semiquantitative expression was determined using reverse transcription (RT)-PCR (see "Materials and Methods"). Due to different PCR dynamics of shorter or longer amplification products, only different tissues for a particular gene may be compared. It should be noted that CAD7 and CAD8 arose during a recent duplication event (described in detail by Tavares et al., 2000 ) and could not be distinguished in the RT-PCR analysis because of their high sequence similarity: 98% and 94% identity in the coding regions and putative 3'-untranslated regions, respectively. S, Seedling; R, root; L, leaf; F, flower; Si, green siliques; St, stem (at 1-, 3-, 5-, 10-, 15-, and 20-cm length).
In addition, and in accordance with the expression pattern, the promoters of PAL1 and PAL2 contain well-conserved AC elements that specify vascular expression of phenylpropanoid genes (supplemental Table Is; Ohl et al., 1990 ; Hauffe et al., 1993 ; Hatton et al., 1995 ; Wanner et al., 1995; Lacombe et al., 2000 ). An A box, proposed to work in conjunction with the AC elements in the parsley (Petroselinum crispum) PAL1 and PAL4 genes (Logemann et al., 1995 ), was not detected in the Arabidopsis PAL1 and PAL2 promoters (supplemental Table Is). PAL4 contains an A box but lacks an AC element. Interestingly, an H box and a G box were found in the PAL4 promoter. This combination of cis-elements was shown to be sufficient for the feed-forward induction of the chalcone synthase (CHS) promoter by p-coumaric acid in bean (Phaseolus vulgaris; Loake et al., 1992 ; Lindsay et al., 2002 ). This observation may indicate that PAL4 is regulated by the reaction product of C4H.
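The kind of in silico promoter scan described here can be illustrated with a minimal sketch. It assumes the three AC element consensus sequences commonly cited in the literature (AC-I, AC-II, and AC-III, as written in the code); the text above does not list the motifs, so these strings, the function names, and the toy promoter are illustrative assumptions and not the search method used in this study.

```python
# Assumed AC element consensus sequences (not stated in the text above).
AC_ELEMENTS = {
    "AC-I": "ACCTACC",
    "AC-II": "ACCAACC",
    "AC-III": "ACCTAAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ac_elements(promoter):
    """List exact AC element matches on both strands.

    Returns (element name, strand, 0-based start on the given strand)
    tuples, sorted by start position.
    """
    promoter = promoter.upper()
    hits = []
    for name, motif in AC_ELEMENTS.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            start = promoter.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = promoter.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical promoter fragment: AC-I on the plus strand,
# AC-II on the minus strand (seen as GGTTGGT on the plus strand).
toy_promoter = "TTTACCTACCGGGGGTTGGTAAA"
print(find_ac_elements(toy_promoter))
```

An exact-match scan like this is the simplest possible case; a search that tolerates mismatches or uses position weight matrices would report more candidate sites and would need a stringency threshold.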
In conclusion, all PAL genes are expressed in the inflorescence stem, a tissue with a high portion of lignifying cells. However, the presence of an AC element qualifies PAL1 and PAL2 as the most likely candidates to be involved in monolignol biosynthesis in the vascular lignifying cells. In accordance, the corresponding mutants show defects in lignin formation (A. Rohde and W. Boerjan, unpublished data).
C4H
C4H (E.C. 1.14.13.11) controls the conversion of cinnamate into p-coumarate (Fig. 1). C4H (CYP73A5) belongs to the cytochrome P450-dependent monooxygenases, like the two other hydroxylases in the pathway (C3H, F5H). So far, only one C4H gene has been described in Arabidopsis (Bell-Lelong et al., 1997 ; Mizutani et al., 1997 ; Urban et al., 1997 ). Although multiple family members have been detected in other plants (Betz et al., 2001 , and refs. therein), we could not find any evidence for additional CYP73 genes in Arabidopsis. Phylogenetic analysis shows that two classes of C4H genes exist in plants (Fig. 4; Nedelkina et al., 1999 ; Betz et al., 2001 ). Furthermore, the tree topology indicates that the origin of these two classes has predated the divergence of gymnosperms and angiosperms, suggesting that class II members must have existed at some time in the evolution for most plant lineages. The Arabidopsis C4H gene belongs to class I; a class II homolog was most probably lost during the evolution of this species.

Figure 4. Neighbor-joining tree of the C4H family, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per amino acid. Clusters of sequences are represented as described in Figure 2. Species and GenBank identification numbers of non-Arabidopsis sequences included in this tree are: Class I dicots, Populus (12276037, 3915089, and 3915096), Gossypium (9965899 and 9965897), Petroselinum (3915088), Ruta (13548653), Citrus (8572559 and 14210375), Catharanthus (1351206), Lithospermum (16555879 and 16555877), Capsicum (3603454 and 12003968), Zinnia (3915112), Helianthus (417863), Glycine (3915111), Phaseolus (586082), Glycyrrhiza (3915095), Cicer (14917048), Medicago (586081), and Pisum (3915077 and 9957081); Class II dicots, Mesembryanthemum (4206116), Citrus (7650489), Phaseolus (7430650), and Nicotiana (14423323 and 14423325); monocots, Triticum (10442761) and Sorghum (14192803); and gymnosperms, Pinus (4566493). Arath, Arabidopsis; Pinta, pine.
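As an illustration of what a "Kimura corrected" distance in substitutions per amino acid means, the sketch below applies Kimura's empirical correction for protein sequences, d = -ln(1 - p - 0.2p²), where p is the observed fraction of differing residues. This is an assumption for illustration: the caption does not state which formula or software settings were used, and the function names and the toy alignment are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed fraction of differing residues over ungapped aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_protein_distance(p):
    """Kimura's empirical correction: d = -ln(1 - p - 0.2 * p**2).

    The correction accounts for multiple substitutions at the same site
    and is undefined once the logarithm's argument reaches zero.
    """
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(arg)


# Toy aligned peptides differing at 1 of 10 positions (hypothetical data).
p = p_distance("MKTAYIAKQR", "MKTAYLAKQR")
print(p, round(kimura_protein_distance(p), 4))
```

The corrected distance is always at least as large as the observed fraction of differences, and the gap widens as sequences diverge, which is why corrected distances are preferred for tree building across distant taxa.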
C4H is expressed in all tissues and upon exposure to light, wounding, and fungal infection (supplemental Table IIs; Bell-Lelong et al., 1997 ; Meyer et al., 1998 ; Nair et al., 2002 ). In our RT-PCR experiment, C4H expression increased during the later stages of stem development (Fig. 3). Activity of AtC4H::GUS coincides with vascular cells in the inflorescence stem and in leaves, but in roots the promoter is active in all cells, giving the strongest expression in this tissue (Bell-Lelong et al., 1997 ; Nair et al., 2002 ). A strong C4H expression is also found in siliques and seeds, where it could be involved in the production of sinapate esters (Chapple et al., 1994 ). In addition, the C4H promoter contains an H box, which might be responsible for induction of C4H expression after elicitation.
By TargetP (Emanuelsson et al., 2000 ), the C4H protein is predicted to contain an endoplasmic reticulum (ER)-targeting peptide. However, this peptide coincides with the membrane anchor region of P450 enzymes, whose features are a stretch of hydrophobic amino acids, followed by a small region rich in basic amino acids and a hinge region of the conserved (P/I)PGPx(G/P)xP sequence (Chapple, 1998 ). All class II C4H proteins included in the phylogenetic analysis (Fig. 4) show a divergent hinge and basic amino acid region. Although the function of these class II C4H proteins is unclear at the moment, the shared degeneration of this crucial region could be an important clue in discovering their function.
4CL
4CL (E.C. 6.2.1.12) catalyzes the formation of CoA esters of p-coumaric acid, caffeic acid, ferulic acid, 5-hydroxyferulic acid, and sinapic acid (Fig. 1; Lee et al., 1997 ; Hu et al., 1998 ). The plethora of additional potential substrates may explain why there are many 4CL isoenzymes in most plants. In addition to the different substrate specificities, the genes typically have a distinct spatio-temporal expression pattern (Lewis and Yamamoto, 1990 ; Hu et al., 1998 ; Harding et al., 2002 ).
We detected four 4CL and nine 4CL-like genes in the Arabidopsis genome. Phylogenetic analysis of the predicted proteins, together with characterized 4CL proteins and luciferases, acetate, and fatty acid CoA-ligases (other adenylate-forming enzymes; data not shown), shows that 4CL proteins fall into two classes (Fig. 5; Ehlting et al., 1999 ; Cukovic et al., 2001 ). Three of the Arabidopsis proteins belong to class I (4CL1, 4CL2, and 4CL4) and 4CL3 to class II; the remaining nine are classified as 4CL like because they do not correspond to any of the 4CL or other enzyme classes mentioned above.

Figure 5. Consensus of two neighbor-joining trees of the 4CL and 4CL-like proteins, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per amino acid. Clusters of sequences are represented as described in Figure 2. Species and GenBank identification numbers of non-Arabidopsis sequences included in this tree are: Class I dicots, Solanum (398963, 398965, and 5163399), Capsicum (12003966), Nicotiana (12229631, 7428495, and 12229632), Lithospermum (1117778), Petroselinum (112800 and 112801), Rubus (9651915 and 9651917), Populus (7437854, 7437855, 14289344, 18032806, 7437852, and 15636677), and Amorpha (17063848); gymnosperms, Pinus 4CL (7437872); Class II monocots, Lolium (7188335) and Oryza (12229650); Class II dicots, Lithospermum (9988455), Glycine (18266852), and Populus (7437853 and 14289346); monocots, Oryza (112802), Lolium (7188337 and 7188339); and 4CL-like, Oryza (12039389). Arath, Arabidopsis; Pinta, pine.
Our expression analysis showed that 4CL genes are expressed in almost all investigated tissues, with 4CL4 having the most restricted expression (Fig. 3). The latter observation is supported by the smallest number of ESTs found for 4CL4 among the 4CL genes (supplemental Table IIIs). 4CL1 and 4CL2 are expressed throughout inflorescence stem development, and expression increases during the later stages (supplemental Table IIIs; Lee et al., 1995; Mizutani et al., 1997; Ehlting et al., 1999). In contrast, 4CL3 and 4CL4 are expressed only during the later stages of inflorescence stem development (Fig. 3; supplemental Table IIIs). The expression of 4CL3 is not affected by wounding and Peronospora parasitica infection, in clear contrast to the class I 4CL genes (supplemental Table IIIs; Ehlting et al., 1999). In accordance with the expression analysis, the promoters of both 4CL1 and 4CL2 contain AC elements. Furthermore, the promoter analysis identified an AT-rich sequence motif in the 4CL4 promoter and an H box in the 4CL3 and 4CL4 promoters, hinting at a role in particular stress responses (Seki et al., 1996; Rushton et al., 2002).
In conclusion, 4CL1 and 4CL2 are the best candidates for a function in monolignol biosynthesis during developmental lignification, as suggested previously by Ehlting et al. (1999). Their expression correlates with tissues containing a high portion of lignifying cells, and AC elements are present in their promoters. In contrast, 4CL3 (class II) was suggested to channel activated p-coumarate to CHS and subsequently to flavonoid biosynthesis (Ehlting et al., 1999). 4CL4 (class I), although expressed more specifically or at a lower level, might have yet another substrate specificity. In soybean (Glycine max), a single amino acid deletion determines whether or not 4CL can use sinapic acid as a substrate (Lindermayr et al., 2003), a function lacking for 4CL1, 4CL2, and 4CL3 in Arabidopsis (Ehlting et al., 1999). Interestingly, 4CL4 shows a similar deletion in the region coding for the substrate-binding pocket, suggesting that this gene may have acquired an altered substrate specificity toward sinapic acid after duplication. A recent paper shows that 4CL4 is indeed able to convert sinapic acid (Schneider et al., 2003).
HCT
HCT belongs to a large family of acyltransferases that are involved in the biosynthesis of diverse secondary metabolites. Only recently was the first HCT purified from tobacco stems and the corresponding gene cloned (Hoffmann et al., 2003). In tobacco, HCT catalyzes the conversion of p-coumaroyl-CoA and caffeoyl-CoA to the corresponding shikimate or quinate esters (Fig. 1). These shikimate and quinate esters, themselves important intermediates in the phenylpropanoid pathway, have been shown recently to be good substrates for C3H (Kühnl et al., 1987; Schoch et al., 2001; Franke et al., 2002a, 2002b; Nair et al., 2002). Moreover, HCT also catalyzes the reverse transesterification (Hoffmann et al., 2003). Therefore, HCT might play a critical role up- and downstream of C3H. For the Arabidopsis HCT homolog, a biochemical activity similar to that of the tobacco HCT has been shown (Hoffmann et al., 2003).
Here, one HCT gene was detected in the Arabidopsis genome (Fig. 6). Because only two homologs were characterized and the family is apparently well conserved (approximately 60% identity between monocot and dicot members; data not shown), no more distantly related genes were included.

Figure 6. Neighbor-joining tree of the HCT family, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per amino acid. Species and GenBank identification numbers of non-Arabidopsis sequences included in this tree are: Ipomoea (6469032), Oryza (21740518), and Nicotiana (27475615). Arath, Arabidopsis; Ipoba, Ipomoea batatas; Nicta, tobacco; Orysa, rice (Oryza sativa).
The expression analysis shows that HCT is expressed in all tissues investigated but strongly in the inflorescence stem (Fig. 3; supplemental Table IVs). The promoter contains an AC element. The high and ubiquitous expression is confirmed by the second highest number of ESTs found for the 10 gene families analyzed (supplemental Table IVs). Interestingly, the combined presence of an H and a G box was observed, as for PAL4 and F5H2, suggesting transcriptional regulation by the pathway intermediate p-coumaric acid (Loake et al., 1992 ).
C3H
C3H was originally named after its suspected function in C3-hydroxylation of p-coumaric acid, but recently, CYP98A3 (C3H1) was shown to preferentially convert the shikimate and quinate esters of p-coumaric acid into the corresponding caffeic acid conjugates, whereas p-coumaric acid and p-coumaroyl-CoA were not substrates of this enzyme (Fig. 1; Schoch et al., 2001 ; Franke et al., 2002b ; Nair et al., 2002 ).
We detected three C3H genes in the Arabidopsis genome, which all belong to the CYP98 class of the P450 enzymes. Only a few proteins of this class could be found from other species for phylogenetic analysis (Fig. 7). Arabidopsis C3H1 clusters with all known C3Hs of other species, whereas C3H2 and C3H3 (CYP98A8 and CYP98A9, respectively) probably constitute a different class that diverged before the gymnosperm-angiosperm split (Fig. 7).

Figure 7. Neighbor-joining tree of the C3H family, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per amino acid. Species and GenBank identification numbers of non-Arabidopsis sequences included in this tree are: Sesamum (17978831), Sorghum (5915857), Pinus (17978651), and Glycine (5915858). Arath, Arabidopsis; Glyma, soybean; Sesin, Sesamum indicum; Sorbi, Sorghum bicolor; Pinta, pine.
The expression analysis shows that C3H1 is expressed in all tissues, an observation that is supported by ESTs from various tissues (supplemental Table Vs). Previous studies detected the highest expression in the vascular tissues of stem and root (supplemental Table Vs; Schoch et al., 2001; Franke et al., 2002b; Nair et al., 2002). In contrast, C3H2 and C3H3 are expressed only during particular stages of inflorescence stem development: C3H2 is expressed in older stems and C3H3 in young developing stems (Fig. 3). The fact that only one EST is found for C3H2 and none for C3H3 suggests that they are either conditionally regulated or expressed at low levels (supplemental Table Vs). The promoter analysis reveals a well-conserved AC element in the promoter of C3H1, in agreement with its vascular expression detected by the GUS reporter system (Nair et al., 2002).
Analysis of the N terminus by TargetP predicts the C3H1 protein to contain an ER-targeting peptide, but it overlaps, as for C4H, with the membrane anchor region of P450 enzymes. The C3H1 protein has previously been localized in the membrane fraction in yeast (Franke et al., 2002b ). In contrast to C3H1, the sequences of C3H2 and C3H3 are divergent in both the stretch of basic amino acids and the hinge region of the membrane anchor. Because these regions are necessary for the correct insertion of the enzyme in the membrane (Chapple, 1998 ), the degeneration of this region suggests they are not membrane-anchored proteins. It should be noted that C3H2 and C3H3 do not hydroxylate shikimate and quinate esters of p-coumaric acid (Schoch et al., 2001 ). In conclusion, C3H1 is involved in the monolignol pathway, as is functionally demonstrated with the ref8 (reduced epidermal fluorescence) mutant (Franke et al., 2002a , 2002b ).
CCoAOMT
CCoAOMT (E.C. 2.1.1.104) catalyzes the methylation of caffeoyl-CoA to feruloyl-CoA (in vitro and in vivo) and 5-hydroxyferuloyl-CoA to sinapoyl-CoA (at least in vitro) and is, together with COMT, responsible for the methylation of the monolignol precursors (Fig. 1; Ye et al., 1994 ; Zhong et al., 1998 ; Pinçon et al., 2001 ).
Seven putative members of the CCoAOMT gene family were detected in the Arabidopsis genome (Fig. 8). Plant CCoAOMT genes fall into two classes: Class I contains the Arabidopsis CCoAOMT1 gene together with the majority of experimentally characterized CCoAOMT genes (e.g. Zhong et al., 1998 ; Meyermans et al., 2000 ), whereas class II consists of six Arabidopsis genes and a few sequences from other species. The latter class does not closely resemble most of the certified CCoAOMT genes but contains an experimentally characterized chickweed (Stellaria longipes) CCoAOMT able to methylate caffeoyl-CoA (Zhang and Chinnappa, 1997 ).

Figure 8. Neighbor-joining tree of the CCoAOMT family, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per nucleic acid. Clusters of sequences are represented as described in Figure 2. Species and GenBank identification numbers of non-Arabidopsis sequences included in this tree are: Class I dicots, Populus (2960355, 857577, 13249170, and 2960357), Zinnia (533120), Petroselinum (169648), Nicotiana (2511736), Citrus (6561880), Vitis (1000518), and Eucalyptus (5739372 and 1934858); gymnosperms, Pinus CCoAOMT (4104458); Class II dicots, Stellaria (438896) and Populus (1785476); and monocots, Zea (5101869, 5101867) and Oryza (5091496 and 5257255 [three genes]). Arath, Arabidopsis; Pinta, pine.
CCoAOMT1 is expressed in all tissues investigated and has by far the highest number of ESTs (Fig. 3; supplemental Table VIs). Moreover, the CCoAOMT1 gene has two AC elements in its promoter. CCoAOMT1 is highly expressed in the basal portion of the inflorescence as compared with the apical portion (Goujon et al., 2003 ). Of the class II genes, CCoAOMT5 and CCoAOMT7 are expressed in all tissues, but only the expression of CCoAOMT7 increases during the later stages of inflorescence stem development. Furthermore, CCoAOMT4 and CCoAOMT5 are also expressed at all stages of inflorescence stem development. Others, such as CCoAOMT2, CCoAOMT3, and CCoAOMT6, are expressed toward the end of inflorescence stem development (Fig. 3). Few ESTs have been found for most genes of class II (supplemental Table VIs).
CCoAOMT genes of other species were shown to be responsive to pathogens or elicitors (e.g. Pakusch et al., 1991; Chen et al., 2000); corresponding promoter elements were identified in CCoAOMT1, CCoAOMT2, and CCoAOMT3 (supplemental Table VIs). CCoAOMT3 has an extended N-terminal sequence, not shared by any of the other CCoAOMTs, predicted to be an ER-targeting peptide.
Based on its clustering in class I, its expression characteristics and level, and the presence of two AC elements in its promoter, CCoAOMT1 is the main candidate gene to be involved in the monolignol pathway during developmental lignification.
CCR
CCR (E.C. 1.2.1.44) catalyzes the conversion of cinnamoyl-CoA esters to their respective cinnamaldehydes and is the first enzyme of the monolignol-specific part of the lignin biosynthetic pathway (Fig. 1). The two previously described CCR genes and five new CCR-like genes were found (Fig. 9; Jones et al., 2001; Lauvergeat et al., 2001).

Figure 9. Neighbor-joining tree of the CCR family, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per amino acid. Clusters of sequences are represented as described in Figure 2. Species and GenBank identification numbers of non-Arabidopsis sequences included in this tree are: CCR dicots, Eucalyptus (7431407, 7431408, and 10304406) and Populus (7239228, 2960364, and 9998901); CCR monocots, Lolium (9964087), Saccharum (3341511 and 17978549), and Zea (7431410 and 3242328); gymnosperms, Pinus CCR (17978649), Zea CCR2 (3668115), and Oryza CCR-like (13486725, 13486726, and 18307514); and CCR-like angiosperms, Oryza (15624051). Arath, Arabidopsis; Orysa, rice; Pinta, pine; Zeama, maize (Zea mays).
|
|
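The "Kimura corrected evolutionary distances" named in the figure legends can be sketched as follows. The legends do not state which variant was used, so this example assumes one commonly used form of Kimura's correction for protein sequences, d = -ln(1 - p - 0.2p^2), where p is the observed fraction of differing residues. The function names and the toy sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over aligned, ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_distance(seq_a, seq_b):
    """Assumed Kimura correction: d = -ln(1 - p - 0.2 * p**2)."""
    p = p_distance(seq_a, seq_b)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(arg)


# Hypothetical aligned peptides differing at 1 of 10 positions (p = 0.1).
d = kimura_distance("MKTAYIAKQR", "MKTAYLAKQR")
print(round(d, 4))
```

The correction raises the distance above the raw proportion of differences, because multiple substitutions at the same site are otherwise undercounted. A matrix of such pairwise distances is the input that a neighbor-joining algorithm requires.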
CCR1 is highly expressed in all tissues examined, whereas CCR2 is expressed in all tissues except flowers, siliques, and the earliest stage of inflorescence stem development (Fig. 3). Although CCR2 was hardly detected in stems by RNA gel blots (Lauvergeat et al., 2001), the more sensitive RT-PCR clearly reveals CCR2 expression in the inflorescence stem (Fig. 3). For both genes, expression increases with age during inflorescence stem development (Fig. 3). In agreement with the differences in expression levels of CCR1 and CCR2 (Lauvergeat et al., 2001), 10-fold more ESTs are found for CCR1 than for CCR2 (supplemental Table VIIs). Both genes are induced by Xanthomonas campestris infection, and ESTs linked with stress and pathogen infection have been detected (Lauvergeat et al., 2001; supplemental Table VIIs). The promoter of CCR1 contains a well-conserved AC element, which is consistent with its function in lignification and its strong expression in stems (Lauvergeat et al., 2001; supplemental Table VIIs).
In conclusion, CCR1 and CCR2 are expressed during both developmental lignification and pathogen response, as documented by our expression analysis and ESTs (Fig. 3; supplemental Table VIIs). The role of CCR1 in lignification has clearly been established through the irx4 (irregular xylem) mutant characterization (Jones et al., 2001 ). Although CCR2 seems to be implicated in stress and elicitor response (Lauvergeat et al., 2001 ), the expression results do not exclude a (minor) role for CCR2 in developmental lignification.
 |
F5H
|
|---|
F5H, also called coniferaldehyde 5-hydroxylase, is a cytochrome P450-dependent monooxygenase (CYP84) that is required for the production of syringyl lignin because it is responsible for the 5-hydroxylation of coniferaldehyde and/or coniferyl alcohol (Fig. 1; Humphreys et al., 1999 ; Li et al., 2000 ; Humphreys and Chapple, 2002 ).
The Arabidopsis genome harbors two F5H homologs, both belonging to the CYP84 family of the P450 monooxygenases. F5H1 (CYP84A1) has been characterized in Arabidopsis, Liquidambar styraciflua, and Brassica napus (Meyer et al., 1996 ; Osakabe et al., 1999 ; Nair et al., 2000 ), whereas F5H2 (CYP84A4), a more divergent member of the CYP84 family, is described for the first time, to our knowledge, in this study. So far, no genes that closely resemble F5H2 have been detected in other plants, although the phylogeny indicates that the two proteins found in Arabidopsis diverged before the divergence of the different Rosidae subfamilies (Fig. 10).

|
Figure 10. Neighbor-joining tree of the F5H family, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per amino acid. GenBank identification numbers of non-Arabidopsis sequences included in this tree are: Populus CYP84A4 (6688937), Lycopersicon CYP84A2 (5002354), Liquidambar CYP84A3 (5731998), and Brassica F5H1, F5H2, and F5H3 (10197650, 10197652, and 10197654). Arath, Arabidopsis; Brana, Brassica napus; Liqst, Liquidambar styraciflua; Lyces, tomato (Lycopersicon esculentum); Poptr, Populus trichocarpa.
|
|
Our expression analysis revealed that F5H1 is expressed in all tissues and that its expression increases during inflorescence stem development (Fig. 3), in accordance with results of earlier studies (supplemental Table VIIIs; Meyer et al., 1998; Ruegger et al., 1999; Goujon et al., 2003). F5H1 was also expressed in several other tissues but mainly in young and senescent leaves and in roots (Meyer et al., 1996; Ruegger et al., 1999). In contrast to F5H1, F5H2 had the strongest expression in the early stages of inflorescence stem development (Fig. 3). Only two ESTs were found for F5H1 and none for F5H2 (supplemental Table VIIIs).
The promoter analysis identified an H box in both genes and, in addition, a G box in F5H2, suggesting that both genes may be inducible and that F5H2 may be regulated by p-coumarate (Loake et al., 1992; Lindsay et al., 2002). Moreover, F5H1 and F5H2 contain a fully conserved membrane anchor region. In addition, F5H2 is predicted to contain an ER-targeting peptide that coincides with the region of the membrane anchor of P450 enzymes. Remarkably, no AC element was detected for either F5H gene, although F5H1 had been shown to be involved in lignification through the analysis of the fah1 mutant (Chapple et al., 1992).
 |
COMT
|
|---|
COMT (E.C. 2.1.1.68) was originally postulated to be a bifunctional enzyme methylating caffeic acid and 5-hydroxyferulic acid. However, in vitro and transgenic studies revealed that the predominant role of COMT is the methylation of 5-hydroxyconiferaldehyde and/or 5-hydroxyconiferyl alcohol to sinapaldehyde and/or sinapyl alcohol, respectively (Fig. 1; Osakabe et al., 1999 ; Li et al., 2000 ; Chen et al., 2001 ; Guo et al., 2001 ; Parvathi et al., 2001 ; Goujon et al., 2003 ).
We detected only one COMT gene in the Arabidopsis genome. Furthermore, 13 proteins similar to COMT were detected that clustered between the functionally characterized COMT clade and the cluster containing the hydroxycinnamic acid/hydroxycinnamoyl-CoA ester O-methyltransferase protein (AEOMT; Li et al., 1997, 1999), i.e. among proteins that have been shown to use a wide variety of substrates (Fig. 11; Vernon and Bohnert, 1992; Maxwell et al., 1993; Pellegrini et al., 1993; Takeshita et al., 1995). Because the role of AEOMT in the monolignol pathway is still a matter of debate (Anterola et al., 2002), and other COMT candidate genes of conifers clustered much more closely to the known COMTs, it is unclear whether these 13 genes play any role in the monolignol pathway. Therefore, these genes were classified as COMT-like genes. As a consequence, only one class of COMTs exists in plants (Fig. 11; Maury et al., 1999).

|
Figure 11. Neighbor-joining tree of the COMT family, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per amino acid. Clusters of sequences are represented as described in Figure 2. Species and GenBank Identifier numbers of non-Arabidopsis sequences included in this tree are: COMT dicots, Populus (7528266, 762870, 231757, 444327, 7332271, 7447887, and 762872), Stylosanthes (1582580), Medicago (116908), Prunus (3913295), Fragaria (6760443), Liquidambar (5732000), Chrysosplenium (1184041 and 567077), Vitis (7271883), Capsicum (3421382, 7488967, and 12003964), Nicotiana (480082 and 480083), Eucalyptus (1169009 and 5739365), Clarkia (2832224 and 3913289), Mesembryanthemum (7447880), Thalictrum (4808522, 4808524, 4808526, 4808528, and 4808530), Catharanthus (18025321), Ocimum (5031492, 5031494), and Zinnia (642952); COMT monocots, Lolium (4104220, 4104222, 4104224, and 2388664), Sorghum (18033964), Saccharum (3341509), Zea (729135), and Festuca (14578611, 14578613, 14578615, and 14578617); COMT gymnosperms, Pinus (15524083), Picea (COMT-C7 and COMT-C16; M.H. Walter, personal communication); Nicotiana Catechol-OMT III (542050); Glycyrrhiza OMT (1669591), Medicago OMT (7447884), Mesembryanthemum IMT1 (1170555), Coptis sinapoyl-Glc:malate sinapoyltransferase (SMT; 758580), and Medicago O-diphenol OMT (6688808); and AEOMT gymnosperms, Pinus (7447883, 1777386, and 4574324). Arath, Arabidopsis; Copja, Coptis japonica; Glyec, Glycyrrhiza echinata; Medsa, alfalfa; Mescr, Mesembryanthemum crystallinum; Nicta, tobacco.
|
|
Our RT-PCR data show that COMT is expressed in all tissues investigated, and the numerous ESTs point toward a generally high and ubiquitous expression (Fig. 3; supplemental Table IXs). The 99 COMT ESTs, one-fifth of which are stress related, represent almost twice the number found for any other gene in this analysis (supplemental Table IXs). COMT expression is particularly high in the inflorescence stem, with an increase during the later stages of development (Fig. 3; supplemental Table IXs). Correspondingly, COMT::GUS expression occurs in xylem, differentiating fibers, and mature phloem (Goujon et al., 2003). Unlike many other monolignol biosynthesis genes, COMT has no AC elements in its promoter. In fact, to the best of our knowledge, AC elements have never been reported in COMT promoters of other plants either.
Interestingly, the COMT protein might be myristoylated. The N-terminal MGSTAETQLTPVQVTDDE sequence was identified as a "twilight zone" myristoylation signal, which corresponds both with truly myristoylated proteins and with false positives (Maurer-Stroh et al., 2002 ). Myristoylation is generally associated with cell membrane anchoring or, as recently shown for an Arabidopsis protein kinase, ER attachment (Lu and Hrabak, 2002 ). Pending the experimental verification of this observation, the putative localization of the COMT protein indicates a new research avenue in the field of monolignol channeling and export.
 |
CAD
|
|---|
CAD (E.C. 1.1.1.195) catalyzes the last step in monolignol biosynthesis, i.e. the reduction of cinnamyl aldehydes into their corresponding alcohols (Fig. 1). CAD reduces various aldehydes, present in different cell types or during different stages of development. Besides the function in developmentally regulated lignification, a number of CAD genes have been characterized for their response to plant pathogens (Kiedrowski et al., 1992 ).
Here, nine putative CAD genes were detected in the Arabidopsis genome (Table I; Tavares et al., 2000 ; Sibout et al., 2003 ). Our phylogenetic analysis revealed that eight of the CAD proteins fall into three classes, whereas CAD9 is more divergent (Fig. 12). CAD2 and CAD6, belonging to the class I CADs, closely resemble CAD proteins that have been characterized for their involvement in lignification in other species. The topology of the tree indicates furthermore that the class I "true" CAD clade diverged from the other CADs before the angiosperm-gymnosperm split (Fig. 12).

|
Figure 12. Neighbor-joining tree of the CAD family, inferred from Kimura corrected evolutionary distances. Bootstrap values (NJ/ML) above 50% are shown at the internodes. The scale measures evolutionary distance in substitutions per amino acid. Clusters of sequences are represented as described in Figure 2. Species and GenBank Identifier numbers of non-Arabidopsis sequences included in this tree are: Class I dicots, Populus (421814, 1168734, 9998899, and 7239226), Nicotiana (231676 and 231675), Medicago (399168), Aralia (1168727), Zinnia (1944403), Eucalyptus (1705554, 10281656, 399165, 10719920, and 3913185); Class I monocots, Saccharum (10719916), Zea (3913182 and 7430938), Lolium (3913181), Festuca (15428276, 15428278, 15428280, and 15428282); gymnosperm CAD, Picea (584872 and 10719915), Pinus (107623, 3334135, 1168733, and 3372645); Class II dicots, Stylosanthes (3913194), Apium (12643507), Petroselinum (1168732), Lycopersicon (8099340 and 7430935), Mesembryanthemum (10720090), Fragaria (10720093 and 13507210), and Populus (14279694); and Class III dicots, Stylosanthes (3913193) and Medicago (10720088). Arath, Arabidopsis.
|
|
Class II CADs (CAD3, CAD4, and CAD5) cluster with a number of alcohol dehydrogenases with diverse substrate preferences, such as the poplar (Populus tremuloides) sinapyl alcohol dehydrogenase (Li et al., 2001), the celery (Apium graveolens) mannitol dehydrogenase (Williamson et al., 1995), and the parsley ELI3/CAD proteins (Kiedrowski et al., 1992; Logemann et al., 1997). CAD4 (AtELI3-1) and CAD5 (AtELI3-2) have been identified previously as responsive to elicitor treatments and Pseudomonas syringae infection (Kiedrowski et al., 1992). Moreover, CAD5 has a substrate specificity distinct from "true" CADs, mannitol dehydrogenase, and aromatic alcohol:NADP+ oxidoreductase and was, therefore, named benzyl alcohol dehydrogenase (BAD; Somssich et al., 1996).
Class III CADs (CAD1, CAD7, and CAD8) cluster in a group with an alcohol dehydrogenase from alfalfa (Medicago sativa), which is able to catalyze the reduction of cinnamaldehyde, sinapaldehyde, and coniferaldehyde, but also several aliphatic aldehydes and various substituted benzaldehydes (Brill et al., 1999 ). Being very divergent from class I "true" CADs, this class also represents a group of multisubstrate alcohol dehydrogenases.
All CAD genes, except CAD2, CAD4, and CAD5, are expressed in all stages of inflorescence stem development (Fig. 3). Moreover, CAD2 and CAD6 are expressed in the inflorescence stem close to the bundle and interfascicular cambium, as revealed by promoter::GUS constructs (Sibout et al., 2003 ). Expression of most CAD genes is documented by ESTs, except for CAD7 and CAD8, which are nevertheless expressed, as indicated in the RT-PCRs (Fig. 3; supplemental Table Xs).
The promoter analysis revealed that CAD6 from class I and CAD5 from class II contain AC elements (supplemental Table Xs). In addition, an A box was detected in the CAD6 promoter. The fact that only one gene in the pathway contains both an AC element and an A box casts doubt on the previous assumption that an A box works in conjunction with AC elements (Log |