- Copyright © 2002 American Society of Plant Physiologists
Identification of the biosynthetic enzymes involved in cell wall biosynthesis remains one of the major unsolved problems of plant biology. Of the major polysaccharides of the plant cell wall, pectins and hemicelluloses are synthesized in the Golgi, and callose and cellulose are synthesized at the plasma membrane. The evidence is now quite extensive that the catalytic subunits of cellulose synthase are encoded by members of the large CESA gene family (Arioli et al., 1998; Fagard et al., 2000; Holland et al., 2000; Taylor et al., 2000). With a few exceptions, however, the genes for the enzymes of pectin and hemicellulose biosynthesis have not been identified (Edwards et al., 1999; Perrin et al., 1999). Nothing is currently known about the genes encoding the enzymes that catalyze the synthesis of the hemicellulose backbones.
The primary cell walls of all higher plants contain large amounts of cellulose, and, consistent with this, CESA genes are found throughout the plant kingdom (Richmond, 2000; Richmond and Somerville, 2000). In contrast, the hemicelluloses of dicotyledons and graminaceous monocotyledons (cereals) are distinct. Whereas dicots contain large amounts of pectin and xyloglucan, cereals contain low amounts of pectin and xyloglucan, large amounts of glucuronoarabinoxylan, and, at least in some tissues, the cereal-specific polymer (1–3),(1–4)-β-d-glucan (also known as mixed-linked glucan) (Carpita and Gibeaut, 1993; Carpita, 1996). On the basis of these structural differences, dicots and cereals would be expected to have distinct panoplies of hemicellulose biosynthetic enzymes.
Plants contain a superfamily of genes, called CSL (cellulose synthase-like), whose amino acid sequences are related to the CESA genes. The Csl proteins are predicted to be integral membrane proteins and contain a sequence, the “D,D,D,QXXRW” motif, that seems to be characteristic of processive glycosyl transferases (Saxena and Brown, 1995). On these grounds, it has been proposed that the CSL genes encode the catalytic subunits of the enzymes that synthesize the hemicellulose backbones (Richmond and Somerville, 2000, 2001).
Although no biochemical function has yet been elucidated for any CSL gene, three studies implicate them in wall biosynthesis. Root hairs of Arabidopsis plants that are mutated in AtCSLD3 are defective, apparently because of abnormal cell walls (Favery et al., 2001; Wang et al., 2001). A gene (NaCSLD1) that is highly expressed in Nicotiana alata pollen tubes, whose walls are composed almost entirely of callose and cellulose, has been proposed to encode a pollen-specific cellulose synthase (Doblin et al., 2001). Arabidopsis mutants in AtCSLA9 have increased resistance to Agrobacterium tumefaciens, which binds to plant cell walls at an early stage of infection (Nam et al., 1999).
With the completion of the Arabidopsis genome, every CSL gene in this plant has been identified (Richmond and Somerville, 2001). The rice (Oryza sativa) genome is expected to be complete by the end of 2002, and currently, approximately 50% of the rice genome is available either publicly in GenBank or through Monsanto's password-protected web site (http://www.rice-research.org). Approximately 80,000 rice expressed sequence tags (ESTs) and the actual corresponding cDNAs are also in the public domain.
We present here an analysis of the CSL genes present in the available rice sequence databases. We have identified 37 CSL genes and have deduced full-length protein coding sequences for 23 of them (Table I). The genes were identified by BLAST searches of GenBank (nonredundant and dbEST) and the Monsanto database using the Arabidopsis CesA and Csl proteins as queries. Richmond's web page (http://cellwall.stanford.edu) served as a very useful starting point for the analysis. cDNAs corresponding to all OsCSL ESTs were obtained from the appropriate sources and sequenced completely. Most of the cDNAs came from the Rice Genome Research Program (http://rgp.dna.affrc.go.jp). The Rice Genome Research Program cDNA clones were of high quality; all but one were viable and accurately annotated. The one exception, D22177, was chimeric, containing OsCSLA2 at one end and a predicted DNA-binding protein at the other. For all sequences, the corresponding proteins were deduced using gene prediction software from GeneMark (Atlanta; http://opal.biology.gatech.edu/GeneMark) and Softberry, Inc. (White Plains, NY; http://www.softberry.com), and by manual alignment with the Arabidopsis Csl proteins and with each other. The sequences were aligned with Clustal X and presented with TreeView (Glasgow, UK) and CorelDraw (Ottawa, ON, Canada) (Thompson et al., 1994; Page, 1996; Jeanmougin et al., 1998).
Table I. The CSL superfamily of rice
Like the Arabidopsis Csl proteins, all of the rice Csl proteins are predicted to be integral membrane proteins. All except two have the QXXRW motif (Saxena and Brown, 1995). The exceptions are OsCslA10, which has RXXRW, and OsCslE2, which has LXXRW, at the equivalent positions. All of the OsCsl proteins have a DXD motif approximately 120 to 250 amino acids upstream of QXXRW.
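The motif criteria described above can be illustrated with a short scan over a protein sequence. This is a minimal sketch and not the procedure used in the analysis: it assumes that X stands for any residue, reads "DXD" in the simplest way (Asp, any residue, Asp), and measures the upstream distance between motif start positions. The function name `scan_motifs` and its parameters are hypothetical.

```python
import re

# Assumption: "X" means any single residue in both motifs.
QXXRW = re.compile(r"Q..RW")
# Lookahead so that overlapping D.D sites (e.g. "DADAD") are all reported.
DXD = re.compile(r"(?=D.D)")


def scan_motifs(protein, min_gap=120, max_gap=250):
    """Return a list of (qxxrw_start, dxd_starts) tuples (0-based positions).

    dxd_starts holds every DXD site whose start lies between min_gap and
    max_gap residues upstream of the start of that QXXRW site.
    """
    dxd_sites = [m.start() for m in DXD.finditer(protein)]
    hits = []
    for q in QXXRW.finditer(protein):
        upstream = [d for d in dxd_sites
                    if min_gap <= q.start() - d <= max_gap]
        hits.append((q.start(), upstream))
    return hits


# Toy sequence: a DXD site 150 residues upstream of a QXXRW site.
toy = "A" * 10 + "DAD" + "A" * 147 + "QAARW" + "A" * 5
print(scan_motifs(toy))
```

A variant such as RXXRW (OsCslA10) or LXXRW (OsCslE2) would return no hits with this pattern, so a real screen would need a relaxed first position to flag such exceptions rather than miss them.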
The results indicate that there are both striking similarities and striking differences between the CSL genes of rice and Arabidopsis, which may reflect the similarities and differences in the hemicellulose composition of dicots and graminaceous monocots. Arabidopsis and rice both contain members of the CSLA, CSLC, CSLD, and CSLE families with no consistent distinctions between the two species (Fig. 1). However, the rice and Arabidopsis sequences differ in at least three respects.
Figure 1. Unrooted phylogenetic tree of Csl proteins from rice and Arabidopsis. Only the deduced full-length rice Csl (OsCsl) proteins are included. The Arabidopsis Csl coding sequences were deduced by the same criteria used for the rice proteins, and the sizes of many of the AtCsl proteins differ slightly from those given by Richmond (http://cellwall.stanford.edu). All of the Arabidopsis CslB, CslD, CslE, and CslG proteins are included, but for clarity only three of nine AtCslA, three of five AtCslC, and a sampling of maize (Zea mays), rice, and Arabidopsis CesA proteins are shown; inclusion of the others did not significantly change any of the relationships. The length of each deduced protein, in number of amino acids, is indicated after the protein name.
First, rice has a group of CSL genes, the products of which are related to CesA and CslD but nonetheless form a distinct group separate from either of these families (Fig. 1). These proteins are also significantly shorter than the CesA or CslD proteins because of truncation at their N termini (Fig. 1). On these grounds, we propose that these genes constitute a new cereal-specific family, for which we propose the name CSLF. (As with earlier classifications of the CSL genes [Richmond and Somerville, 2001], the family designations are solely for nomenclatural convenience and do not necessarily reflect any underlying functional relationships).
The products of OsCSLF1 and OsCSLF2 have >98% amino acid identity, but the two are clearly different genes based on a number of nucleotide differences in their 5′- and 3′-untranslated regions. OsCSLF1, OsCSLF2, OsCSLF3, and OsCSLF4 are physically linked within an approximately 49-kb region on PAC AP004261. Consistent with this, OsCSLF3 and OsCSLF4 are on the same overlapping Monsanto contigs (Table I). It is not yet known if any of the other OsCSL genes are clustered, although some are on the same chromosomes (Table I).
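For readers unfamiliar with how a figure such as ">98% amino acid identity" is obtained, the following is a minimal sketch of one common convention: counting identical residues over the aligned columns in which neither sequence has a gap. The text does not state which convention was used here, so the gap handling, the function name `percent_identity`, and the toy sequences are all assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns with a gap in either row are ignored, so the
    denominator is the number of columns where both rows have a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (not real OsCslF sequences): 4 of 5 ungapped columns match.
print(percent_identity("MKT-LL", "MKTALV"))  # 80.0
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when gaps are present.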
Some doubt remains about the accuracy of the deduced amino acid sequence of OsCSLF7. It appears to be both the most divergent and the shortest of the OsCSLF family (Fig. 1). The structure of OsCSLF7, with a short N-terminal exon followed by a large (4 kb) intron (Fig. 2), is one that in our experience is particularly hard for gene prediction programs to call correctly. The structure of OsCSLF7 should be considered tentative until a full-length cDNA is sequenced.
Figure 2. Intron/exon structures of the full-length rice CSL genes. Exons are indicated by solid boxes and introns by white boxes. Vertical black lines indicate the position of the QXXRW motif. The number of introns for each gene is indicated in parentheses after the gene name. The genes are drawn to scale; the bar in the lower left indicates 1 kb.
Full-length coding sequences for OsCSLF5 and OsCSLF6 are not available, and the two deduced partial proteins do not overlap. Therefore, it is possible that these two partial proteins derive from the same gene.
A second major difference between Arabidopsis and rice is the deep branching between their respective members in the CslB family. All six Arabidopsis CslB proteins form one cluster, whereas the two rice CslB-like proteins form a related but distinct branch. No rice proteins cluster tightly with the AtCslB sequences. In contrast to the OsCslF proteins, the deduced CslB-like proteins of the two species are similar in size (Fig. 1). We attempted to analyze other CslB and CslB-like proteins, based on EST sequences, from other dicots and cereals to see if the dichotomy shown in Figure 1 would hold up. Two partial Sorghum bicolor CslB-like proteins could be reliably assembled from public ESTs, and both of these (SbCslB2 accession nos. A286049 and BE594529; SbCslB3 nos. BE597410 and BG463462; see http://cellwall.stanford.edu) aligned more closely with the rice CslB-like proteins than with the AtCslB family (data not shown). This supports the hypothesis that the cereal CslB-like proteins constitute a distinct family, and we therefore propose the name CSLH for the rice CSLB-like genes.
A third salient feature of the tree (Fig. 1) is that rice apparently lacks any CSLG family, members of which are widespread in dicots and have not been found so far in any monocot. This observation was made earlier by Richmond and Somerville (2001).
Arabidopsis is predicted to have 30 CSL genes (Richmond and Somerville, 2001), whereas rice has at least 37 (Table I). A number of the rice genome survey sequences predict the existence of additional OsCSL genes (see http://cellwall.stanford.edu), but because of their short lengths, unavailability for further sequencing, and lack of utility for predicting intron/exon structure, they have not been included in the current analysis. Rice and Arabidopsis differ in the number of predicted genes in each of the “common” families. Arabidopsis and rice have nine and 10 CSLA genes, five and nine CSLC genes, six and four CSLD genes, and one and five CSLE genes, respectively.
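The family counts above can be tallied to show how many genes in each species fall within the four shared families. The sketch below uses only the numbers stated in the text; the variable and function names are hypothetical, and the rice total of 37 is a lower bound ("at least 37").

```python
# Genes per shared family as stated in the text: (Arabidopsis, rice).
COMMON_FAMILIES = {
    "CSLA": (9, 10),
    "CSLC": (5, 9),
    "CSLD": (6, 4),
    "CSLE": (1, 5),
}

# Total predicted CSL genes per species; the rice figure is a minimum.
TOTAL_ARABIDOPSIS = 30
TOTAL_RICE_MIN = 37


def shared_family_totals(counts):
    """Sum the per-family counts for each species."""
    arabidopsis = sum(at for at, _ in counts.values())
    rice = sum(os_ for _, os_ in counts.values())
    return arabidopsis, rice


at_shared, os_shared = shared_family_totals(COMMON_FAMILIES)
print(at_shared, os_shared)                    # 21 28
print(TOTAL_ARABIDOPSIS - at_shared)           # 9 genes outside the shared families
print(TOTAL_RICE_MIN - os_shared)              # at least 9 genes outside the shared families
```

By this arithmetic, 21 of the 30 Arabidopsis genes and 28 of the at least 37 rice genes belong to the shared families; the remainder in each species fall in the families that distinguish the two.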
Intron/exon structures were deduced for all of the full-length OsCSL genes (Fig. 2). The OsCESA, OsCSLA, OsCSLH, and OsCSLE families tend to have more introns than OsCSLD, OsCSLC, and OsCSLF. In Arabidopsis, the AtCSLD family has the fewest introns (Richmond and Somerville, 2000). Intron number also tends to be conserved within a family (Fig. 2).
Genes in the CSL superfamily are currently the most promising candidates for encoding the glycosyl synthases that make the hemicellulose backbones of plant cell walls (Richmond and Somerville, 2001). Although all plant cell walls have similarities in their polysaccharide composition, the hemicelluloses of dicots and cereals show marked differences (Carpita, 1996). This dimorphism is expected to be reflected in distinct patterns of wall biosynthetic enzymes and hence of the genes encoding them. Consistent with both the similarities and the differences between the walls of dicots and cereals, the CSL gene superfamily shows both conservation and divergence between Arabidopsis and rice.
ACKNOWLEDGMENTS
We thank Robin Hawley (Michigan State University-Department of Energy [MSU-DOE]) for technical assistance and Todd Richmond (NimbleGen Systems, Inc., Madison, WI) for useful discussions. We thank Weiqing Zeng (MSU-DOE) for sharing his analyses of the Arabidopsis CSL genes. For cDNA clones, we thank the Rice Genome Research Program of the National Institute of Agrobiological Resources (Tsukuba, Japan); the Department of Cytogenetics (National Institute of Agricultural Sciences and Technology, Suwon City, Korea); the Department of Plant Breeding (Cornell University, Ithaca, NY); and Christine Michalowski (University of Arizona, Tucson). All cDNA clones mentioned in this paper are available to nonprofit researchers directly from the original source or, with their written permission, from the corresponding author.
Footnotes
- 1 This work was supported in part by the U.S. Department of Energy, by the Division of Energy Biosciences, and by the National Science Foundation Plant Genome Program (to Natasha Raikhel [MSU-DOE], Ken Keegstra [MSU-DOE], and J.D.W.).
- * Corresponding author; e-mail walton{at}msu.edu; fax 517–353–9168.
- www.plantphysiol.org/cgi/doi/10.1104/pp.010875.
- Received September 25, 2001.
- Revision received September 29, 2001.
- Accepted November 2, 2001.