- © 2005 American Society of Plant Biologists
Abstract
Wood formation is a fundamental biological process with significant economic interest. While lignin biosynthesis is currently relatively well understood, the pathways leading to the synthesis of the key structural carbohydrates in wood fibers remain obscure. We have used a functional genomics approach to identify enzymes involved in carbohydrate biosynthesis and remodeling during xylem development in the hybrid aspen Populus tremula × tremuloides. Microarrays containing cDNA clones from different tissue-specific libraries were hybridized with probes obtained from narrow tissue sections prepared by cryosectioning of the developing xylem. Bioinformatic analyses using the sensitive tools developed for carbohydrate-active enzymes allowed the identification of 25 xylem-specific glycosyltransferases belonging to the Carbohydrate-Active EnZYme families GT2, GT8, GT14, GT31, GT43, GT47, and GT61 and nine glycosidases (or transglycosidases) belonging to the Carbohydrate-Active EnZYme families GH9, GH10, GH16, GH17, GH19, GH28, GH35, and GH51. While no genes encoding either polysaccharide lyases or carbohydrate esterases were found among the secondary wall-specific genes, one putative O-acetyltransferase was identified. These wood-specific enzyme genes constitute a valuable resource for future development of engineered fibers with improved performance in different applications.
Wood constitutes the oldest source of renewable energy for mankind and an important raw material for construction materials, textiles, pulp, and paper, as well as many other products. Recent scientific and technological advances offer new possibilities to expand the use of plant fibers toward novel types of products, such as biocomposite materials. In particular, the advent of biotechnology for short-rotation forestry and advances in enzyme technology will allow innovative fiber engineering to alter the structure, composition, and properties of the raw material. Wood formation is thus a fundamental biological process with significant economic interest.
Wood tissue develops by terminal differentiation of the vascular cambium. Xylogenesis progresses through several stages, including cell division, cell expansion, secondary cell wall formation, and programmed cell death. Although xylem fibers constitute the main body of wood biomass, other cell types contribute substantial heterogeneity. Poplar wood is typically composed of 33% (v/v) vessel elements, 53% to 55% fibers, 11% to 14% ray parenchyma, and less than 1% axial parenchyma (Mellerowicz et al., 2001). Each cell type has a characteristic structure and chemical composition and a different rate of differentiation. The resulting wood properties depend on the size, shape, and arrangement of different cell types in the xylem, as well as the structure, composition, and morphology of the secondary cell walls. Poplar wood consists of about 50% cellulose, 30% hemicellulose, and 20% lignin (Balatinecz et al., 2001). The major hemicellulose in aspens is O-acetyl-(4-O-methylglucurono)-xylan, but a smaller portion of O-acetylated glucomannan is also present (Gustavsson et al., 2001; Teleman et al., 2003). While the overall process of lignin biosynthesis has been studied intensively (for review, see Boerjan et al., 2003), this is not the case for the biosynthesis of the cell wall carbohydrate polymers.
An analysis of the entries in the Carbohydrate-Active EnZYme database (CAZy; http://afmb.cnrs-mrs.fr/CAZY/) reveals that the number of genes encoding glycosyltransferases (GTs) and glycoside hydrolases (GHs) is far greater in plants than in any other organisms sequenced so far (Coutinho et al., 2003b). This apparently reflects the importance of these enzymes in the biosynthesis and remodeling of the cell wall carbohydrate polymers during different stages of plant life. However, the majority of the corresponding genes belong to large gene families encoding enzymes with diverse substrate and product specificities. Therefore, their precise roles and functions are difficult to resolve based on the gene sequences alone (Henrissat et al., 2001; Coutinho et al., 2003a). Plant GTs are typically membrane bound, which makes their isolation and biochemical characterization difficult. Even other wood-specific enzymes present considerable challenges because of the inherent resilience of the source material. Consequently, the biosynthesis and enzymatic remodeling of the secondary cell wall carbohydrate polymers have remained poorly understood.
Functional genomics offers the means to identify enzymes and other proteins involved in developmental processes, including wood formation. The Swedish Populus Genome Project has previously established a database of 102,019 expressed sequence tag (EST) sequences from 19 different cDNA libraries from the European aspen (Populus tremula), the hybrid aspen (Populus tremula × tremuloides T89), and the black cottonwood (Populus trichocarpa; http://www.populus.db.umu.se; Sterky et al., 1998, 2004; Bhalerao et al., 2003). We have previously carried out expression profiling during wood formation using a microarray of 2,995 cDNA clones from a broad cambial region library. cDNA targets were prepared by tangential cryosectioning of tissue sections representing different stages of developing wood, including phloem (Ph), meristematic cells (A), early expansion (B), late expansion (C), secondary cell wall formation (D), and programmed cell death (E; Hertzberg et al., 2001). A larger poplar microarray with 13,490 cDNA fragments from 7 different, tissue-specific poplar cDNA libraries has been constructed to identify genes involved in leaf development (Andersson et al., 2004) and cellulose biosynthesis (Djerbi et al., 2004). The estimated redundancy on the 13.5 K microarray is 28%, indicating that approximately 10,000 unique genes are represented on the array. In this article, we have mined the old as well as previously unpublished poplar microarray data to identify xylem-specific genes encoding carbohydrate-active enzymes (CAZymes).
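The unique-gene estimate quoted above follows from simple arithmetic on the clone count and the redundancy figure. The snippet below is a minimal sketch of that check; it assumes that "redundancy" means the fraction of spotted clones duplicating a gene already represented on the array, and the function name is purely illustrative.

```python
def estimate_unique_genes(n_clones: int, redundancy: float) -> int:
    """Estimate the number of unique genes on an array.

    Assumption (not stated explicitly in the text): `redundancy` is the
    fraction of clones that duplicate a gene already present on the array.
    """
    if not 0.0 <= redundancy < 1.0:
        raise ValueError("redundancy must lie in [0, 1)")
    return round(n_clones * (1.0 - redundancy))


# Values taken from the text: 13,490 cDNA fragments, 28% redundancy.
unique_genes = estimate_unique_genes(13_490, 0.28)
print(unique_genes)
```

Under this reading the estimate comes out a little below 10,000, which agrees with the rounded figure given in the text.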
RESULTS AND DISCUSSION
Expression Profiling over the Developing Xylem
The 13.5 K microarray was hybridized using targets prepared from zones A to E during xylem development, as described earlier (Hertzberg et al., 2001). In order to retrieve (1) a broad group of candidate genes expressed over the developing cambium and (2) a narrower group of genes showing a relatively higher level of expression specifically in zone D, the expression data obtained were subjected to two stages of filtering. Filter I was tuned to select clones that had at least 2-fold higher expression in zone D compared to the average level of expression in all the zones (A–E). The first filtering step yielded a collection of 1,141 clones. These clones were subjected to a second filtering step that required, in addition, a higher signal in zone D than in neighboring zones C and E. The resulting 450 clones were grouped into 9 clusters based on their expression profiles across the entire wood-forming zone (A–E) using self-organizing maps (Fig. 1). The results obtained were validated in silico by comparative analysis with data obtained by independent experiments by Hertzberg et al. (2001) and Djerbi et al. (2004; Tables I and II). Because of the redundancy of the 13.5 K microarray, some of the genes are represented by more than one EST clone. With few exceptions, these redundant clones show similar transcript profiles, and they fall into the same clusters.
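The two-stage filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline actually used: it assumes linear-scale signal values with no missing data points, and the clone names and expression values are hypothetical. The subsequent clustering by self-organizing maps is not attempted here.

```python
from statistics import mean

ZONES = ("A", "B", "C", "D", "E")


def passes_filter_one(profile: dict, fold: float = 2.0) -> bool:
    """Filter I: zone D signal is at least `fold` times the mean over zones A to E."""
    return profile["D"] >= fold * mean(profile[zone] for zone in ZONES)


def passes_filter_two(profile: dict) -> bool:
    """Filter II: passes filter I and zone D is higher than neighboring zones C and E."""
    return (
        passes_filter_one(profile)
        and profile["D"] > profile["C"]
        and profile["D"] > profile["E"]
    )


# Hypothetical linear-scale expression profiles (not data from the study).
profiles = {
    "clone_1": {"A": 1.0, "B": 1.0, "C": 2.0, "D": 10.0, "E": 3.0},
    "clone_2": {"A": 0.5, "B": 0.5, "C": 1.0, "D": 8.0, "E": 9.0},
    "clone_3": {"A": 4.0, "B": 4.0, "C": 5.0, "D": 6.0, "E": 4.0},
}

filter_one_hits = [name for name, p in profiles.items() if passes_filter_one(p)]
filter_two_hits = [name for name, p in profiles.items() if passes_filter_two(p)]
print(filter_one_hits, filter_two_hits)
```

In this toy set, clone_2 is enriched in zone D relative to the overall average but peaks in zone E, so it is retained by filter I and rejected by filter II. This corresponds to the distinction drawn between the broader and the strictly D-specific gene groups.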
Clustering using self-organizing maps of 450 genes. Genes with less than two valid data points in sections C to E or with a signal less than 2-fold higher in D than in A to C and E were filtered out.
Strictly D-specific genes encoding CAZymes: a selection of genes that passed both filter I and filter II
Ratios are presented in log2 scale, and log2 ratios >1 are in bold.
Genes encoding CAZymes highly expressed during secondary cell wall formation: a selection of genes that passed filter I but did not fulfill the requirements of filter II
Ratios are presented in log2 scale, and log2 ratios >1 are in bold.
Annotation of the identified ESTs or full-length cDNAs was carried out with biocomputing tools routinely used to assign novel protein sequences to the CAZy database (http://afmb.cnrs-mrs.fr/CAZY; Coutinho and Henrissat, 1999). To avoid common misannotations introduced by BLAST-based automated annotation, each gene was compared against a library of individual catalytic and ancillary modules, so that family assignment was not impaired by the frequent modularity of CAZymes. Among the 1,140 xylem-specific cDNA clones that passed the first filtering, 34 individual genes turned out to encode putative CAZymes, and 23 of these also passed the second, more stringent filtering (Table I). Thus, genes encoding 11 CAZymes passed filter I but did not fulfill the requirements of filter II (Table II). In contrast with the large number of GT- and GH-encoding genes that were found, no genes encoding polysaccharide lyases or carbohydrate esterases were identified. In Table III, the Populus secondary cell wall-related CAZymes identified here are listed together with their closest hits in Arabidopsis (Arabidopsis thaliana) and with genes reported to have high expression in the secondary walls in Arabidopsis (Oh et al., 2003), Zinnia (Zinnia elegans; Demura et al., 2002), loblolly pine (Pinus taeda; Whetten et al., 2001; Lorenz and Dean, 2002), black locust (Robinia pseudoacacia; Yang et al., 2003), and cotton (Gossypium hirsutum; Pear et al., 1996; Zhao et al., 2001; Li et al., 2002; Ji et al., 2003). Although not all of the experiments on these species are strictly comparable to ours, coinciding expression in secondary walls serves as an additional quality control.
Closest Arabidopsis homologs to the Populus CAZymes and published secondary wall-related expression data from other relevant plants
Classification of CAZymes Highly Expressed during Secondary Cell Wall Formation
Glycosyl Transferases
Family GT2
This is the largest GT family, with almost 3,200 members in the CAZy database, which was updated on October 1, 2004. Although few members of this family have been biochemically characterized, many different donor/acceptor specificities have been identified, all forming glycosidic linkages of stereochemistry opposite to that of the starting nucleotide sugar. This large family contains both single-addition and multiple-addition (or processive) GTs. Arabidopsis apparently contains two single-addition GT2 GTs (presumably dolichol-phosphate β-mannosyltransferase and dolichol-phosphate β-glucosyltransferase) and 40 processive GTs, which include cellulose synthases (CesA).
Nine different CesA genes have been identified so far in the EST libraries of PopulusDB (Sterky et al., 1998; Djerbi et al., 2004; Table IV). Based on the different microarray analyses (Hertzberg et al., 2001) and complementary experiments by real-time PCR (Djerbi et al., 2004), four CesA genes (PttCesA1, PttCesA3-1, PttCesA3-2, and PttCesA9) were found to be particularly highly expressed during xylogenesis. The full-length secondary cell wall-associated CesA genes from hybrid aspen cluster in the same phylogenetic groups as the xylem-specific CesA genes previously identified in other poplars, rice (Oryza sativa), barley (Hordeum vulgare), and Arabidopsis (Table IV), thus providing additional support for their involvement in secondary cell wall biosynthesis.
Percentage identities/positives of the protein sequences of CesA proteins identified in poplar, Arabidopsis, barley, and rice
Gene sequences were retrieved from the Stanford Web site (http://cellwall.stanford.edu). Boldface indicates genes involved in secondary cell wall synthesis (data from Taylor et al., 2000, 2003; Joshi, 2003; Tanaka et al., 2003; Burton et al., 2004; Djerbi et al., 2004). At, Arabidopsis; Ptr, P. tremuloides; Hv, barley; Os, rice.
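For readers unfamiliar with how a percentage identity is derived from a pairwise alignment, the sketch below shows one common convention. It is an illustration only: the two aligned peptides are toy strings rather than CesA sequences, the choice to skip gapped columns is an assumption (other tools normalize differently), and the "positives" score, which requires a substitution matrix, is not attempted.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the ungapped columns of an alignment.

    Assumption: columns where either sequence carries a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical peptides, one gapped column, one mismatch).
identity = percent_identity("MKV-LAT", "MRVQLAT")
print(f"{identity:.1f}%")
```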
Processive GT2 GTs in plants have been assigned to different subfamilies, and the more distantly related cellulose synthase-like (CSL) proteins apparently synthesize noncellulosic polysaccharides (Richmond and Somerville, 2001). This has been demonstrated for the guar β(1→4)-mannan synthase in one CSL subfamily (Dhugga et al., 2004). One CSL gene, PttGT2A, was found to be highly expressed in the wood-forming tissues of poplar (Table I). Since the molecular mechanism of catalysis appears to be conserved within the GT families (Campbell et al., 1997; Coutinho et al., 2003a) and since CSL proteins are clearly related to GTs synthesizing β-linked polysaccharides such as cellulose and mannan, PttGT2A is also likely to be involved in the synthesis of a β-linked polysaccharide, such as glucomannan or O-acetyl-(4-O-methylglucurono)-xylan.
The closest homolog of PttGT2A is AtCSLA9 from Arabidopsis. A T-DNA insertion mutation of AtCSLA9 renders the mutant (rat4) resistant to Agrobacterium transformation. No differences were observed in the linkage structure of insoluble noncellulosic polysaccharides because of the defective gene (Nam et al., 1999; Zhu et al., 2003). However, elevated levels of Gal were detected in the mutant plants, as would be expected if AtCSLA9 were a bona fide mannan synthase. Since the PttGT2A transcript is abundant at the onset of secondary cell wall formation, PttGT2A might be either a mannan or a xylan synthase.
Family GT8
The 13.5 K Populus microarray contains 13 genes assigned to family GT8, and 8 of these (PttGT8A–8H) exhibit a high level of expression in zone D (Tables I and II). All known members of this family catalyze the formation of α-glycosidic linkages. GT8 contains enzymes with many different known substrate specificities (e.g. α-glucosyltransferase, α-galactosyltransferase, glycogenin, and galactinol synthase) as well as enzymes of unknown donor/acceptor specificity. Phylogenetic analysis of all plant GT8 sequences (Fig. 2; Lao et al., 2003) reveals several distinct subfamilies. Only one subfamily contains enzymes with known activities (galactinol synthase), but no poplar sequence was found in this subfamily. PttGT8A to PttGT8C belong to another subfamily (Fig. 2), which contains sequences with distant similarity to animal glycogenins. Glycogenin is an autocatalytic, self-glucosylating protein that serves as the primer for glycogen synthesis. It has been proposed that a homologous enzyme in this subfamily (GenPept CAD89094) is involved in the initiation of starch biosynthesis in Zea mays (Chatterjee and Burrell, 2003). The Arabidopsis ortholog At3g18660 has a putative transit peptide for plastid localization, consistent with a role in starch synthesis. However, PttGT8A is highly expressed during the D and E zones, and both PttGT8B and PttGT8C cluster together with the secondary cell wall-specific CesAs (data not shown). The corresponding enzymes are thus unlikely to contribute to starch biosynthesis in poplar. However, it cannot be excluded that glycogenin-like proteins may have a priming function for other cell wall polysaccharides.
Phylogenetic analysis of plant GTs from family GT8. Profile-aligned Populus sequences are in bold, and partial sequences are indicated. Subfamilies with at least one characterized enzyme are outlined by a thick black line and those without defined activities by a thin gray line.
PttGT8D and PttGT8G belong to another GT8 subfamily (Fig. 2). PttGT8D is coexpressed with the secondary cell wall-related CesA genes, showing a sharp peak of expression in the D zone, while PttGT8G is also up-regulated in the C zone. Enzymes in this subfamily are presumably linked to pectin synthesis (Bouton et al., 2002), although the precise donor/acceptor specificity remains to be established.
PttGT8E and PttGT8F are found in yet another subfamily, which contains enzymes with unknown donor/acceptor specificity but with a predicted role in pectin biosynthesis (Lao et al., 2003; Fig. 2). There is no pectin biosynthesis after the onset of secondary wall formation in wood fibers and vessel elements. However, living parenchyma cells make pectins, which participate in the formation of protective and isotropic layers as well as tyloses (Fujii et al., 1981; Rioux et al., 1998). These wall structures are formed following secondary wall deposition. Since the secondary wall is deposited earlier in vessel elements and contact ray cells (Mellerowicz et al., 2001), PttGT8D, PttGT8E, PttGT8F, and PttGT8G may be specifically involved in the pectin biosynthesis in the contact ray cells. PttGT8H, whose up-regulation is even more pronounced in the E zone than in the D zone, cannot be assigned to any of the subfamilies identified (Fig. 2).
Family GT14
Two xylem-specific GT14 members, PttGT14A and PttGT14B, were identified by expression profiling. The characterized members of family GT14 enzymes of animal origin catalyze the transfer of β(1→6)-linked N-acetylglucosaminyl and β-linked xylosyl residues to proteins. So far, not a single plant family GT14 GT has been characterized. A phylogenetic analysis of this family shows that all the plant sequences cluster to a single subfamily, different from the subfamilies containing the animal β(1→6)-N-acetylglucosaminyltransferases and the β-xylosyltransferases (data not shown). Whether this reflects taxonomy or enzyme specificity differences is not known and awaits functional characterization.
Family GT31
Two genes in this family, PttGT31A and PttGT31B, were highly expressed in the tissues undergoing secondary cell wall synthesis (zone D). Family GT31 is a large eukaryotic family grouping together a few characterized animal β(1→3)-N-acetylglucosaminyltransferases, β(1→3)-galactosyltransferases, and β(1→3)-N-acetylgalactosaminyltransferases, together with a large number of GTs of unknown donor/acceptor specificity. Enzymes in this family form β-glycosidic linkages from α-linked nucleotide sugars. No plant homolog has been functionally characterized, but a high expression of one ortholog in Arabidopsis has been demonstrated in secondary xylem (Oh et al., 2003; Table III). The plant GT31 enzymes form distinct subfamilies, which may well coincide with more than one donor/acceptor specificity.
Family GT43
Family GT43 groups together only eukaryotic GTs. While functional (β-glucuronyltransferases) and structural data are reasonably abundant in the animal field, no function has been established experimentally for the plant homologs. Phylogenetic analysis divides plant GT43 sequences into two large subfamilies (Fig. 3). The two poplar xylem-specific genes in this family (PttGT43A and PttGT43B) are very similar to each other and encode proteins in the same subfamily. The deduced protein sequence of a cotton gene, which is highly expressed during fiber development (GenPept AAQ54338), is a close homolog of PttGT43A and PttGT43B. Furthermore, a member of the GT43 family is highly expressed in Arabidopsis xylem (Oh et al., 2003; Table III). The transcriptional clustering of PttGT43B together with the CesAs (data not shown) also suggests a role during fiber synthesis in poplar.
Phylogenetic analysis of plant GTs from family GT43. Legend as in Figure 2.
Family GT47
Out of the 10 family GT47 genes represented on the Populus array, four genes (PttGT47A–PttGT47D) were highly expressed in zone D. Mechanistic conservation in the families suggests that enzymes in GT47 form glycosidic bonds of stereochemistry opposite to that of the nucleotide sugar donor. The phylogenetic tree representing all plant GT47 enzymes (Fig. 4) reveals five subfamilies similar to those established previously (Zhong and Ye, 2003; Li et al., 2004). Two of these subfamilies contain enzymes with known activities. One of them contains the recently identified pectin β-glucuronyltransferase NpGUT1 from Nicotiana plumbaginifolia (Iwai et al., 2002). The nolac-H18 mutation affecting the NpGUT1 gene leads to a weak intercellular attachment of callus, and analysis of the pectin structure of the nolac-H18 mutant suggests that the enzyme transfers GlcUA to pectin rhamnogalacturonan II (Iwai et al., 2002). PttGT47A and PttGT47D belong to this subfamily, suggesting that they may be involved in the transfer of glucuronyl side chains to the poplar secondary wall 4-O-methylglucuronoxylan. Alternatively, the enzymes might synthesize rhamnogalacturonan II in ray cells that are forming the protective layer (Fujii et al., 1981). Subfamily A contains an Arabidopsis xyloglucan β-galactosyltransferase (MUR3; Madson et al., 2003), but the poplar ortholog of MUR3 was not represented on these microarrays. PttGT47B and PttGT47C belong to the uncharacterized subfamilies E and C, respectively (Fig. 4).
Phylogenetic analysis of plant GTs from family GT47. Legend as in Figure 2.
Family GT61
Only one candidate in this family (PttGT61A) has been identified in the Populus EST database. This gene is transcribed at a high level during xylogenesis over zones B to D (Table II). Family GT61 is a eukaryotic family with diverse plant, animal, and fungal proteins. In contrast with most GT families, only plant-derived members of this family have been characterized. The phylogenetic tree of plant GT61 enzymes reveals three subfamilies (Fig. 5). PttGT61A belongs to a subfamily that contains β(1→2)-xylosyltransferases from Arabidopsis, rice, and Physcomitrella. These enzymes catalyze the transfer of Xyl from UDP-Xyl to the core β-linked Man of N-linked oligosaccharides of glycoproteins (Strasser et al., 2000). Consequently, PttGT61A probably has no direct role in the synthesis of structural carbohydrates, but its presence among the secondary wall-specific genes reinforces the observation that correct protein glycosylation is required for efficient cellulose synthesis (Lukowitz et al., 2001; Williamson et al., 2002).
Phylogenetic analysis of plant GTs from family GT61. Legend as in Figure 2.
Glycosidases and Transglycosylases
Family GH9
Only one member of this family, PttCel9A, was up-regulated during secondary cell wall formation in Populus. The family GH9 enzymes are typically endoglucanases acting on β(1→4)-glucan polymers (cellulose). Genes encoding GH9 enzymes have been found in both prokaryotic and eukaryotic organisms, with 25 putative family GH9 members identified in the Arabidopsis genome. PttCel9A contains an N-terminal cytoplasmic tail followed by a membrane anchor and a C-terminal catalytic domain (Rudsander et al., 2003). The deduced protein sequence of PttCel9A shares 82% sequence identity with the Arabidopsis enzyme KOR1, which was identified through the cellulose-deficient dwarf mutant korrigan (Nicol et al., 1998). KOR1 protein is abundant during intensive cellulose biosynthesis, and the corresponding gene is highly expressed during secondary cell wall synthesis (Szyjanowicz et al., 2004). KOR1 has been proposed to release nascent cellulose chains from a putative sitosterol-glucoside primer (Peng et al., 2002) or to contribute to cellulose microfibril assembly or editing (Molhoj et al., 2002). Recombinant variants of Cel9A from Brassica napus (Molhoj et al., 2001) and hybrid aspen (Master et al., 2004) have been expressed in Pichia pastoris without the cytoplasmic and membrane-spanning domains. The purified proteins hydrolyzed low-substituted, soluble β(1→4)-glucan but not hemicelluloses. The specific activity of the recombinant PttCel9A was orders of magnitude lower than that of corresponding microbial enzymes (Master et al., 2004), suggesting that efficient cellulose degradation is not the primary role of the enzyme.
Family GH10
One putative Populus xylanase gene, PttXYN10, was highly up-regulated in zone D. Most known family GH10 enzymes are endo-β(1→4)-xylanases found in many different microbes and plants. The Arabidopsis genome contains 12 family GH10 genes (http://afmb.cnrs-mrs.fr/CAZY/). In addition to a catalytic module, other functional modules are commonly found in family GH10 xylanases. PttXYN10 is similar to AtXyn1, which is predominantly expressed in vascular bundles (Suzuki et al., 2002). Full-length sequencing reveals three consecutive family CBM22 carbohydrate-binding modules at the N terminus of the predicted protein sequence. It thus seems that the PttXYN10 gene encodes an endoxylanase, which may be involved in xylan hydrolysis or remodeling during xylogenesis.
Family GH16
Close to 30 different family GH16 genes have been identified in the PopulusDB, and 16 of them were found in wood-forming tissues (N. Nishikubo and E. Mellerowicz, unpublished data). This family groups together prokaryotic and eukaryotic enzymes of different substrate specificities, but all of the plant members characterized so far act exclusively on xyloglucan (Rose et al., 2002). Some plant family GH16 enzymes, including the hybrid aspen PttXET16A (Å. Kallas, S. Denman, K. Piens, H. Henriksson, P. Johansson, T.A. Jones, H. Brumer, and T.T. Teeri, unpublished data), are strict xyloglucan endotransglycosylases (XET; E.C. 2.4.1.207), whereas others have varying degrees of xyloglucan endohydrolase (XH; E.C. 3.2.1.151) activity (Rose et al., 2002). Only one member of GH16, PttXET16L, exhibited a high level of transcription during secondary cell wall formation in poplar. PttXET16L is most similar to At1g14720 (XTH28, also known as XTR2). Constitutive expression of XTR2 has been reported under various conditions in Arabidopsis (Xu et al., 1995; Akamatsu et al., 1999), with specific up-regulation in the developing secondary xylem (Oh et al., 2003).
Although xyloglucan is not supposed to be an abundant component of secondary walls, XET activity has been demonstrated recently in xylem and phloem fibers at an early stage of secondary cell wall formation (Bourquin et al., 2002). Xyloglucan incorporation into primary walls of fibers is probably completed at this stage, but xyloglucan degradation has been detected in xylogenic Zinnia cultures during secondary wall deposition (Ohdaira et al., 2002). PttXET16L belongs to the subfamily 3 of XTH genes, which also contains the putative hydrolytic xyloglucan endotransglucosylases/hydrolases (XTH) from Nasturtium seeds (Rose et al., 2002). It is thus possible, although not yet experimentally demonstrated, that these enzymes contribute to xyloglucan degradation by endohydrolytic activity.
Family GH17
Only one member of family GH17, PttGH17A, was found among the secondary cell wall-associated genes of hybrid aspen. This family contains enzymes able to degrade mainly β(1→3)- or β(1→3)(1→4)-glucans. They have been well characterized in plants, both at the biochemical and at the three-dimensional structural level. The D-specific cDNA fragment (PttGH17A) shows high similarity with a β(1→3)-glucanase (SgGN) from willow (Salix gilgiana Seemen; Futamura et al., 2000). Both PttGH17A and SgGN show high similarity with anther-specific β(1→3)-glucanases from Arabidopsis and Brassica (Hird et al., 1993). Some β(1→3)-glucanases have a glycosylphosphatidylinositol anchor, but the Arabidopsis homologs of the poplar PttGH17A lack this feature (Borner et al., 2002). About one-half of the 49 GH17 family members contained in the Arabidopsis genome are appended to an auxiliary module of unknown function denoted X8 (Henrissat and Davies, 2000). However, the D-specific poplar PttGH17A gene does not appear to include an X8 module. Recently, it has been established that callose [β(1→3)-glucan] occurs as a regular feature in the exoplasmic zone of cotton fiber secondary walls, not just as a result of a stress response (Salnikov et al., 2003). However, these expression data indicate higher expression of PttGH17A in phloem than in xylem (Table I). Since the sieve areas of phloem sieve tubes are particularly rich in callose, PttGH17A may indeed have a role in callose degradation.
Family GH19
One family GH19 gene, PttGH19A, was picked in the screen for genes highly expressed during poplar secondary cell wall synthesis. The only function established so far for members in this family is the hydrolysis of chitin, and plant chitinases are a component of the defense system against fungal infection. However, xylem is unlikely to be the site of primary attack by fungi. The molecular mechanism and the three-dimensional structure of several GH19 chitinases have been determined (Hart et al., 1995; Hahn et al., 2000). Two Glu perform catalysis, one acting as a general acid and the other as a base to activate a water molecule for the single-step displacement reaction. Close homologs to PttGH19A have been identified in other plants, two from Arabidopsis (At3g16920 and At1g05850) and cotton (GenPept AAP80801 and AAP80801) as well as one from rice (GenPept BAC81645), maize (Zea mays; GenPept AAQ31297), and pea (Pisum sativum; GenPept BAC81645). Interestingly, in all of them, including PttGH19A, the catalytic acid seems to be missing.
In Arabidopsis, the mutation ELP1 defective in the At1g05850 gene (also denoted AtCTL) caused ectopic deposition of lignin and aberrant shapes of cells with incomplete cell walls in the pith of inflorescence stems. Ethylene production also was enhanced in this mutant. High expression of the AtCTL gene was detected in seedlings, stems, and flowers during normal plant growth and development. The expression of this gene was not influenced by wounding or induction by salicylic acid, pectin fragments, or ethylene (Zhong et al., 2002). Another Arabidopsis mutant, POM1, is apparently identical to ELP1. In a dendrogram obtained by hierarchical cluster analysis of Arabidopsis lines based on their Fourier transform infrared spectra, the two pom1 alleles clustered together with other cell wall mutants with reduced cellulose content in the walls (Mouille et al., 2003; Pilling and Hofte, 2003). Similar to PttGH19A, a chitinase-like gene, also lacking the apparent catalytic acid, is preferentially expressed in fibers during secondary wall deposition in cotton (Zhang et al., 2004). Different hypotheses can be offered to explain the apparent lack of an essential catalytic residue in these plant proteins. It seems most likely these proteins have acquired another function that does not depend on chitinase or any enzymatic activity. The acquisition of novel function by plant glycosidase scaffolds has been discussed by Coutinho et al. (2003b), who have noted that there is no obvious relationship between the ancestral and the novel, acquired function. Another possibility is that this subset of the GH19 proteins includes enzymes relying on an externally recruited acid/base, as described for an S-glycosidase, myrosinase (Burmeister et al., 2000), or one that uses substrate assistance from a ligand-donated acid. However, further functional analyses in vivo and in vitro are required to uncover the specific function of these xylem-specific proteins.
Family GH28
Family GH28 contains polygalacturonases (PGs), which are among the most important enzymes associated with pectin degradation and cell wall modification in the late phases of fruit ripening, in organ abscission, in pod dehiscence, and in pollen maturation (Dal Degan et al., 2001). Among several PG-like genes of family GH28 in the EST database PopulusDB, only one (PttGH28A) exhibited a high level of expression in zone D. Other examples of putative secondary cell wall-associated endo-PGs have been identified in differentiating tracheary elements of Zinnia (Demura et al., 2002; Nakashima et al., 2004) and in differentiating vascular tissue in tomato (Lycopersicon esculentum; Sitrit et al., 1999). It has been suggested that PGs may participate in the primary wall degradation between secondary thickenings of primary xylem (Ohdaira et al., 2002) and in the regions where future perforations develop in secondary xylem vessel elements (for review, see Mellerowicz et al., 2001). Pits, on the other hand, retain a thin layer of their primary wall (pit membrane) that probably undergoes some form of wall modification before cell autolysis. Pit membranes contain high levels of pectin (Zwieniecki et al., 2001), which could act as a substrate for the xylem-specific PGs.
Family GH35
Two D-specific genes (PttBGal35A and PttBGal35B) in family GH35 were also identified. This family is one of the few isofunctional families in the CAZy database, as it contains only β-galactosidases from plants, animals, fungi, bacteria, and archaea. It is thus likely that all plant homologs are β-galactosidases. In plants, these enzymes form a multigene family with 18 copies in Arabidopsis, and the characterized members act on various substrates, including arabinogalactans, galactolipids, and pectin, to release Gal (Smith and Gross, 2000; Esteban et al., 2003). A particular feature of plant GH35 members is that about one-half of them carry a C-terminal extension distantly related to animal rhamnose-binding lectins, suggesting that this extension is a carbohydrate-binding module. Of the two genes that we have identified, only PttBGal35B encodes this C-terminal putative binding domain. β-Galactosidases might play a role in pit membrane modification and hydrolysis of primary walls in the perforations, as discussed earlier, by degrading galactan side chains of rhamnogalacturonan I or xyloglucan. Alternatively, they could act on the arabinogalactan antenna of the arabinogalactan proteins (AGPs).
Family GH51
Only one secondary cell wall-related gene was found in family GH51 (PttGH51A). Family GH51 is isofunctional, and all characterized members, bacterial and eukaryotic, have α-l-arabinofuranosidase activity. α-l-Arabinofuranosidases are accessory enzymes releasing Ara from substrates such as arabinan, arabinoxylan, gum arabic, and arabinogalactan (Saha, 2000). PttGH51A is very similar to α-l-arabinofuranosidases from apple (Malus domestica), tomato (LeARF1), Japanese pear (Pyrus pyrifolia), and Arabidopsis (AtASD1 and AtASD2).
Since Ara is rare in poplar wood, a likely function for α-l-arabinofuranosidase during the secondary cell wall formation is modification of AGPs. Another possibility is that arabinofuranosidase participates in the pectin degradation in the defined cell wall areas, such as developing perforations or pit membranes. A similar expression pattern of other pectin-degrading enzymes, such as β-galactosidases and PGs, suggests degradation of homogalacturonan and rhamnogalacturonan I during the secondary wall deposition. In support of the latter hypothesis are the data on sugars released to the medium by a xylogenetic Zinnia culture at the stage of secondary cell wall formation (Stacey et al., 1995; Ohdaira et al., 2002). Linkage analysis indicated a degradation of homogalacturonan and side chains of rhamnogalacturonan I (arabinan and galactan) rather than a degradation of AGP antenna.
Other Secondary Cell Wall-Specific Enzymes and Proteins
Interestingly, some of the enzymes known to be involved in xylogenesis, such as Suc synthases in family GT4 (Babb and Haigler, 2001), were not secondary wall-specific in poplar. In our earlier work (Hertzberg et al., 2001), one Suc synthase was up-regulated in both zones C and D, whereas the present dataset showed up-regulation only in zone C.
Hemicelluloses, such as xyloglucan (Pauly and Scheller, 2000), glucuronoxylan (Timell, 1967), and glucomannan (Teleman et al., 2003), as well as pectins (Pauly and Scheller, 2000), are often O-acetylated. It has been shown that acetyl-CoA is the donor substrate for O-acetylation of cell wall polysaccharides (Pauly and Scheller, 2000). Recent results show a positive correlation between xyloglucan O-acetylation and fucosylation, suggesting that a suitable substrate for at least one Arabidopsis O-acetyltransferase is fucosylated xyloglucan (Perrin et al., 2003). Cas1p is a membrane protein required for the O-acetylation of glucuronoxylomannan in the fungal pathogen Cryptococcus neoformans, and four orthologs were found in Arabidopsis (Janbon et al., 2001). By performing a TBLASTN search with the Cas1p sequence against the PopulusDB, a gene coding for a putative O-acetyltransferase was identified. The transcript of the Populus putative O-acetyltransferase is highly expressed in zone D, which is consistent with a role in O-acetylation of the secondary wall polysaccharides glucomannan and glucuronoxylan. Nevertheless, functional characterization of this gene and its products is required to verify or discard its hypothesized role in O-acetylation.
Another interesting class of proteins is those associated with cortical microtubules, which have been suggested to facilitate the alignment of cellulose microfibrils as they are deposited in the cell wall (Baskin, 2001). Notably, among 14 different tubulin genes present in the cambial cDNA library of poplar, 10 were very highly expressed during late expansion (C) and secondary wall biosynthesis (D; data not shown).
The rest of the xylem-specific genes encoded many proteins of unknown function (data not shown). Some of these genes encode proteins with putative transmembrane domains or glycosylphosphatidylinositol anchors, and it is possible that new GTs remain to be identified in this group. Another abundant group of unknown genes seems to encode proteins containing domains involved in DNA-DNA or DNA-protein interactions and signaling events.
In summary, this functional genomic approach allowed the identification of 25 GTs in 7 different GT families and 9 GHs from 8 different GH families in poplar xylem. Even though the precise substrate specificity currently can be predicted only for a few of these enzymes, the assignments to CAZy families performed here help to narrow future functional studies to a limited number of enzymes and polysaccharides. Finally, the remaining collection of xylem-specific genes of entirely unknown function offers a rich source of new targets for comprehensive studies of the secondary cell wall biosynthesis in plants.
MATERIALS AND METHODS
Microarray Analysis
Construction of the Populus 13.5 K microarray containing cDNA fragments, printed in duplicate, from seven tissue-specific libraries has been described by Andersson et al. (2004). Complementary DNA for microarray hybridization was obtained by reamplifying material from a study by Hertzberg et al. (2001), consisting of five tissue sections over the developing xylem, i.e. meristematic cells (A), early expansion (B), late expansion (C), secondary cell wall formation (D), and programmed cell death (E). Because there is no obvious reference tissue section in developing xylem, a mixture of equal amounts of the individual samples (A + B + C + D + E) was used as a common reference. This ensures that genes expressed in only one tissue will be represented in the reference, thereby decreasing the risk of losing such genes in the quality filter process. However, the transcript profiles of genes that are highly expressed in one sample will be truncated because that sample comprises one-fifth of the reference.
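The truncation described above can be illustrated with a short calculation. The following is a minimal sketch, assuming an idealized case in which signal is proportional to transcript abundance and the reference is an exact equal-parts pool of the five sections; the function name and the expression values are hypothetical and are not taken from the dataset.

```python
def ratios_vs_pooled_reference(expression):
    """Return sample/reference ratios for one gene.

    `expression` maps each section (A-E) to an idealized expression level.
    The reference is modeled as an equal-parts pool, so its level is the
    mean of the individual sections (an assumption of this sketch).
    """
    pool = sum(expression.values()) / len(expression)
    if pool == 0:
        raise ValueError("gene is not expressed in any section")
    return {section: level / pool for section, level in expression.items()}


# Hypothetical gene expressed only in section D (arbitrary units).
d_only = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 100.0, "E": 0.0}
d_only_ratios = ratios_vs_pooled_reference(d_only)

# Hypothetical gene expressed equally in all sections.
uniform = {section: 40.0 for section in "ABCDE"}
uniform_ratios = ratios_vs_pooled_reference(uniform)
```

Under these assumptions, a gene expressed exclusively in one section cannot exceed a ratio of 5 against the pool, however strongly it is expressed, because that section contributes one-fifth of the reference.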
All amplification and hybridization steps were carried out as described previously (Hertzberg et al., 2001), except that each Cy3-labeled sample was hybridized twice (A–C) or three times (D and E) against a Cy5-labeled pool of all samples. The slides were scanned at 5-μm resolution using ScanArray 4000/ScanArray Lite scanners (Perkin-Elmer, Boston). The images were quantified and analyzed using GenePix Pro 5.0 (Axon Instruments, Union City, CA). Spot intensity was computed as a median foreground minus median background signal. A quality filter was applied where the following conditions classified a spot as bad: (1) The spot was manually flagged as bad; (2) the spot intensity (background subtracted) was lower than 2 times a global background estimate (calculated as the median of mean background pixel intensities) in one or both channels; and (3) more than 30% of the pixels were saturated in both channels. If one or more of these criteria were met, the spot was flagged and excluded from further analysis. The data were normalized to a median of 1 for the ratios of all genes by using the intensity-dependent locally weighted linear regression (Lowess) procedure as implemented in GeneSpring 6.1 (Silicon Genetics, Redwood City, CA). Intensities of within-slide replicates were averaged before calculating ratios. The complete dataset from this hybridization has been published elsewhere (Schrader et al., 2004).
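The three quality-filter conditions can be sketched in code as follows. This is an illustration only, not the procedure as implemented in GenePix Pro or GeneSpring: the class and field names are hypothetical, and estimating the global background separately for each channel is an assumption, since the text does not state how the two channels were handled.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Channel:
    fg_median: float           # median foreground pixel intensity
    bg_median: float           # median background pixel intensity
    bg_mean: float             # mean background pixel intensity
    saturated_fraction: float  # fraction of saturated pixels (0 to 1)

    @property
    def intensity(self):
        # Spot intensity: median foreground minus median background.
        return self.fg_median - self.bg_median


@dataclass
class Spot:
    cy3: Channel
    cy5: Channel
    manually_flagged: bool = False


def global_background(spots, channel):
    # Median of the mean background intensities, per channel (assumption).
    return median(getattr(spot, channel).bg_mean for spot in spots)


def is_bad(spot, bg_cy3, bg_cy5):
    # (2) intensity below 2x global background in one or both channels.
    low_signal = (spot.cy3.intensity < 2 * bg_cy3
                  or spot.cy5.intensity < 2 * bg_cy5)
    # (3) more than 30% saturated pixels in both channels.
    saturated = (spot.cy3.saturated_fraction > 0.30
                 and spot.cy5.saturated_fraction > 0.30)
    # (1) manual flag, or either of the automatic criteria.
    return spot.manually_flagged or low_signal or saturated


# Hypothetical spots: acceptable, weak in Cy3, and saturated in both channels.
spots = [
    Spot(Channel(1000, 100, 110, 0.0), Channel(900, 100, 105, 0.0)),
    Spot(Channel(250, 100, 100, 0.0), Channel(900, 100, 100, 0.0)),
    Spot(Channel(60000, 100, 120, 0.5), Channel(60000, 100, 115, 0.4)),
]
bg_cy3 = global_background(spots, "cy3")
bg_cy5 = global_background(spots, "cy5")
flags = [is_bad(spot, bg_cy3, bg_cy5) for spot in spots]
```

Note the asymmetry between the criteria: a weak signal in a single channel is enough to reject a spot, whereas saturation rejects a spot only when it affects both channels.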
The secondary cell wall-specific genes were selected for this study based on the following criteria: at least two valid data points in each of the sections, C to E, and a signal at least 2-fold higher in D than in the common reference channel (a pool of tissues A–E). A second, more stringent selection required the signal in D to be >2-fold higher than in any of the other samples. Nine clusters were generated for the more stringent selection using self-organizing maps in GeneSpring 6.1 (Silicon Genetics). Hybridization of the 2 × 13.5 K microarray with targets prepared from the developing xylem and cambium/phloem tissues has been described in detail by Djerbi et al. (2004).
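The two selection steps can be expressed as a small filter. The sketch below makes several assumptions that the text does not specify: replicate ratios are combined by an arithmetic mean, the "signal" compared between sections is the normalized ratio to the common reference, and sections without valid data are ignored in the stringent comparison. All names and example values are hypothetical.

```python
from statistics import mean

SECTIONS = ("A", "B", "C", "D", "E")


def section_means(ratios):
    """Return (mean ratio, number of valid points) for each section.

    `ratios` maps a section to its replicate ratios against the common
    reference; None marks a spot removed by the quality filter.
    """
    result = {}
    for section in SECTIONS:
        valid = [r for r in ratios.get(section, []) if r is not None]
        result[section] = (mean(valid), len(valid)) if valid else (None, 0)
    return result


def classify(ratios):
    """Return (passes_basic, passes_stringent) for one gene."""
    means = section_means(ratios)
    # At least two valid data points in each of the sections C to E.
    enough_data = all(means[s][1] >= 2 for s in ("C", "D", "E"))
    d_mean = means["D"][0]
    # Signal in D at least 2-fold higher than the common reference.
    basic = enough_data and d_mean >= 2.0
    # Stringent: D more than 2-fold higher than every other section.
    others = [means[s][0] for s in SECTIONS
              if s != "D" and means[s][0] is not None]
    stringent = basic and all(d_mean > 2.0 * m for m in others)
    return basic, stringent


# Hypothetical genes: D-specific, D-enriched only, and too few D data points.
gene_x = {"A": [0.3, 0.4], "B": [0.5, 0.5], "C": [1.0, 1.2],
          "D": [3.0, 3.2, 3.4], "E": [0.8, None, 1.0]}
gene_y = {"A": [0.2, 0.2], "B": [0.5, 0.5], "C": [1.5, 1.7],
          "D": [2.5, 2.5, 2.7], "E": [1.0, 1.0, 1.0]}
gene_z = {"A": [0.3, 0.4], "B": [0.5, 0.5], "C": [1.0, 1.2],
          "D": [3.0, None, None], "E": [0.8, 0.9, 1.0]}
results = [classify(g) for g in (gene_x, gene_y, gene_z)]
```

In this sketch, gene_y passes the first selection but fails the stringent one because its signal in C is more than half of its signal in D, and gene_z is rejected outright for having only one valid data point in D.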
Full-Length Cloning
The sequences available for bioinformatic analysis consisted of ESTs, contig sequences (Sterky et al., 2004; http://www.populus.db.umu.se), and full-length cDNA sequences, as indicated in Tables I and II. For each identified CAZyme, the longest available sequence was used for the detailed modular annotation. For full-length sequencing, the insert of the cDNA clone corresponding to the EST sequence was sequenced on ABI 3700 sequencers (Applied Biosystems, Foster City, CA) by a primer-walking strategy. In most cases, the complete open reading frame could be revealed, but some gene sequences remained partial. Full-length sequences for a subset of genes were obtained by using the FirstChoice RLM-RACE kit (Ambion, Austin, TX). Initial cambial and xylem RNA was extracted with the RNeasy plant mini kit (Qiagen, Valencia, CA). The 5′ sequences were joined in silico with the existing partial sequences to obtain full-length cDNA sequences. For a selection of genes, the full-length cDNA was cloned from cDNA synthesized from hybrid aspen cambial RNA by using an oligoT primer, followed by PCR amplification with gene-specific primers. The PCR products were cloned into a pGEM-T easy vector (Promega, Madison, WI) and sequenced.
Sequence Analysis and Data Storage
All Populus ESTs, contig sequences, and full-length sequences were compared against an annotated sequence library derived from the CAZy database (29,442 entries as of October 1, 2004). The CAZy database classification assigns nonoverlapping segments of each protein sequence to one of the different families of enzymes (presently including GHs and transglycosylases, GTs, polysaccharide lyases, and carbohydrate esterases). Because CAZymes are frequently modular, we have undertaken a systematic analysis and assignment of the modular structure of these enzymes (Coutinho and Henrissat, 1999). More than 56,000 modules arranged in 360 families have already been defined. Sequence segments corresponding to these modules are compiled into a library that is used for updating the CAZy database with new sequences by BLAST-based analyses (Altschul et al., 1997). The assignment of the ESTs reported here was performed following exactly the same procedure as that used for updating CAZy. For low levels of sequence identity, assignments were verified by the presence of known family signatures. Occasionally, hydrophobic cluster analysis (Callebaut et al., 1997) was used to help refine the boundaries of distantly related modules. Because CAZymes often display variable modular structures, only matches to catalytic modules were retained.
Phylogenetic Analyses
The evolutionary relationships between plant CAZymes were analyzed in a three-step procedure. First, the catalytic domains from full sequences of plant origin belonging to relevant CAZy families were excised and aligned using Muscle 3.0 (Edgar, 2004). Second, this initial alignment served as a profile for the alignment of all EST-derived sequences and other relevant incomplete sequences using the profile mode of ClustalW 1.83 (Thompson et al., 1994). Finally, phylogenetic trees were calculated from the resulting ClustalW alignments using 1,000 bootstrap steps. The trees were displayed using Treeview 1.6.6 (Page, 1996).
Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession numbers (in parentheses) PttCesA1 (AY573571), PttCesA2 (AY573572), PttCesA3-1 (AY573573), PttCesA3-2 (AY573574), PttCesA9 (BI128217, BI128207), PttGT2A (AI165974), PttGT8A (CK087305, AI161580), PttGT8B (AY935502), PttGT8C (AY935503), PttGT8D (AI165903, BI127989), PttGT8E (BI128621, BU821382), PttGT8F (BI127969), PttGT8G (AI164749, BI135686), PttGT8H (BI129494), PttGT14A (AY935509), PttGT14B (AY935510), PttGT31A (BI132214), PttGT31B (BI130614), PttGT43A (AY935504), PttGT43B (AY935505), PttGT47A (AY935506), PttGT47B (AY935507), PttGT47C (AY935508), PttGT47D (BI131711), PttGT61A (BI131605, AI162640), PttCel9A (AY660967), PttXyn10A (AY935501), PttGH16L (BU821749), PttGH17A (BI128906), PttGH19A (AI163580, CK102151, CK092039), PttGH28A (AI164358), PttGal35A (AI164669), PttGal35B (BI128674, BI130571), and PttGH51A (BI120952).
Footnotes
- Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.104.055087.
- 1 This work was supported by the Wallenberg Foundation and the European Union (project no. QLK5–CT2001–00443).
- 2 Present address: Commonwealth Scientific and Industrial Research Organization (CSIRO) Livestock Industries, 306 Carmody Road, St. Lucia, Queensland 4067, Australia.
- Received October 15, 2004.
- Revised December 28, 2004.
- Accepted January 6, 2005.
- Published February 25, 2005.