Genetic resources for maize cell wall biology.

Grass species represent a major source of food, feed, and fiber crops and potential feedstocks for biofuel production. Most of the biomass is contributed by cell walls that are distinct in composition from those of all other flowering plants. Identifying cell wall-related genes and their functions underpins a fundamental understanding of growth and development in these species. Toward this goal, we are building a knowledge base of the maize (Zea mays) genes involved in cell wall biology, their expression profiles, and the phenotypic consequences of mutation. Over 750 maize genes were annotated and assembled into gene families predicted to function in cell wall biogenesis. Comparative genomics of maize, rice (Oryza sativa), and Arabidopsis (Arabidopsis thaliana) sequences reveals differences in gene family structure between grass species and a reference eudicot species. Analysis of transcript profile data for cell wall genes in developing maize ovaries revealed that expression within families differed by up to 100-fold. When transcriptional analyses of developing ovaries before pollination from Arabidopsis, rice, and maize were contrasted, distinct sets of cell wall genes were expressed in grasses. These differences in gene family structure and expression between Arabidopsis and the grasses underscore the requirement for a grass-specific genetic model for functional analyses. A UniformMu population proved to be an important resource in both forward- and reverse-genetics approaches to identify hundreds of mutants in cell wall genes. A forward screen of field-grown lines by near-infrared spectroscopy of mature leaves yielded several dozen lines with heritable spectroscopic phenotypes. Pyrolysis-molecular beam mass spectrometry confirmed that several nir mutants had altered carbohydrate-lignin compositions.

The C4 grasses, with their high photosynthetic efficiency compared with C3 plants, provide more than half of the world's calories in human nutrition and for grazing animals (Langenheim and Thimann, 1982). Lignocellulosic biomass of grasses, comprising plant cell walls, is also regarded as a sustainable and renewable feedstock for biofuels (Ragauskas et al., 2006). Although the Arabidopsis (Arabidopsis thaliana) genome sequence has provided entry points for the identification of many wall-related genes, the functions of grass-specific genes need to be elucidated to gain genetic control of biomass yield and quality in food crops and in bioenergy grasses (Carpita and McCann, 2008). The close evolutionary relationship of the C4 grasses of the Panicoid subfamily and the syntenic organization of grass genomes make possible the rapid translation of genes that impact biomass characteristics identified in maize (Zea mays) into more genetically recalcitrant species. Maize also provides a wealth of tools built on a century of breeding experience and genetic research. In this study, we provide, to our knowledge, the first comprehensive evaluation of the maize genes responsible for the biogenesis of the special type II cell walls of grasses, and we validate a strategy to determine gene function.
The genome sequences of Arabidopsis (Arabidopsis Genome Initiative, 2000), rice (Oryza sativa; International Rice Genome Sequencing Project, 2005), and maize inbred line B73 (Schnable et al., 2009; http://www.maizesequence.org) provide essential inventories for comparative genomic analyses of cell wall-related genes in Arabidopsis and other flowering plants with the grass species. Grasses have cell walls that are distinct in composition from those of all other flowering plants. Arabidopsis has a type I cell wall: a framework of cellulose microfibrils cross-linked by xyloglucans and embedded in a matrix of acid-rich pectic polysaccharides (McCann and Roberts, 1991; Carpita and Gibeaut, 1993). Type II cell walls of the grasses, such as maize and rice, have a framework of cellulose microfibrils cross-linked primarily with glucuronoarabinoxylans (GAXs). Pectins are a small proportion of the matrix polymers, with GAXs providing most of the negatively charged matrix of the type II cell wall (Carpita, 1996). Structural proteins are deposited and cross-linked at the cessation of growth of type I walls, whereas networks of hydroxycinnamic acid-rich phenylpropanoids covalently linked to GAXs are deposited in the type II wall. Furthermore, a mixed-linkage (1→3),(1→4)-β-D-glucan is synthesized during cell expansion of grasses and hydrolyzed when growth ceases (Carpita and Gibeaut, 1993; Carpita, 1996). All angiosperms synthesize certain polymers in common, but these vary in abundance and possess subtle distinctions in fine structure.
With nearly half of the genes in plant genomes yet to be functionally annotated, an estimated 1,000 genes of unknown function may encode cell wall-related proteins (Yong et al., 2005). Analysis of coregulated genes using microarrays in Arabidopsis has implicated unannotated genes as candidates in secondary wall formation (Brown et al., 2005; Kubo et al., 2005; Persson et al., 2005). Microarrays with maize coding sequences are currently available but incomplete (http://www.maizearray.org/), whereas massively parallel sequencing technologies allow relative transcript abundance to be measured on a genome-wide scale (Eveland et al., 2008). From this data set of transcripts expressed in the maize developing ovary before pollination, we profiled the major primary cell wall-related genes. We then compared these profiles with microarray data for Arabidopsis and rice ovaries sampled at the same stage of development. Our results document that orthology often cannot be inferred from homology and that the sets of genes expressed in a model dicotyledonous species and in the model grasses differ from those that would be predicted.
Even for Arabidopsis, only a small percentage of genes have a biochemically validated functional annotation. The identification of mutants provides important clues to specific gene function in wall biogenesis. From the T-DNA insertional lines of Arabidopsis (Alonso et al., 2003), we generated over 1,000 homozygous lines mutated in annotated cell wall-related genes (http://cellwall.genomics.purdue.edu/families). To generate a comparable resource of maize mutants, we screened the UniformMu population by both forward- and reverse-genetics approaches. This collection has a uniform, inbred background, and the highly mutagenic lines (transposon-on) can be readily stabilized genetically (transposon-off) to prevent additional insertional events occurring in lines of interest (http://www.plantgdb.org/prj/). We constructed DNA grids from over 15,000 independent, pedigreed lines that enable PCR-based screening of the population for insertional mutants in selected cell wall genes. In addition, we identified Robertson's Mutator (Mu)-tagged maize cell wall mutants by analysis of Mu-flanking DNA sequences derived from the DNA grids and in a forward screen of mutagenized populations by near-infrared (NIR) spectroscopy.
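As a rough illustration of how a pooled DNA grid can be decoded, the sketch below assumes a simple two-dimensional design in which DNA is pooled by row and by column, so that a PCR product in one row pool and one column pool points to a single line. The pooling layout, the 48 x 48 dimensions, and the function names are hypothetical and are not taken from the UniformMu protocol.

```python
from itertools import product


def candidate_wells(positive_rows, positive_cols):
    """Return every (row, col) coordinate consistent with the positive pools."""
    rows = sorted(set(positive_rows))
    cols = sorted(set(positive_cols))
    return list(product(rows, cols))


def well_to_line_index(row, col, n_cols):
    """Map a grid coordinate to a 0-based line index (row-major layout, assumed)."""
    return row * n_cols + col


if __name__ == "__main__":
    # Hypothetical 48 x 48 grid: one positive row pool and one positive column pool.
    hits = candidate_wells(positive_rows=[12], positive_cols=[30])
    print(hits)
    print([well_to_line_index(r, c, n_cols=48) for r, c in hits])
```

When more than one row or column pool is positive, the intersections are ambiguous, and each candidate line would need to be confirmed by PCR on individual DNA samples.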
From identification and assembly of the many families of cell wall-related genes in dicots and grasses, we estimate that plants devote 10% of their genomes to cell wall biogenesis (Yong et al., 2005). We have classified cell wall-related genes of Arabidopsis, rice, and maize into gene families whose products function in six stages of wall biogenesis: substrate generation, polysaccharide synthesis, membrane trafficking, assembling and turnover, secondary wall formation, and signaling (http://cellwall.genomics.purdue.edu; McCann and Carpita, 2005; Yong et al., 2005). Here, we present several examples of families from these six functional stages and their organization in species with different cell wall types. A database that adopted our classification scheme was developed for other species but not for maize (Girke et al., 2004), and an additional resource for a selected number of maize cell wall-related genes that are the most similar sequences to annotated Arabidopsis genes was reported (Guillaumie et al., 2007). However, as we describe here, the structure of the gene families in grasses and their comparative expression have diverged to such an extent that identification of only the closest Arabidopsis homologs will not provide a comprehensive approach to identify and characterize the unique genes that are responsible for the differences in the structures of type I and type II walls. More recently, the members of a great many of the glycosyl transferase (GT) gene families of rice were compared with those of several dicots and inferred to be largely nonorthologous (Cao et al., 2008).
We report here a comprehensive inventory of maize cell wall genes, an analysis of comparative expression profiles in a single tissue type between rice, maize, and Arabidopsis, an effective means to generate and identify mutants with defects in cell wall-related genes, and a robust high-throughput screen for cell wall mutants independent of visible phenotypes yet reflective of cell wall composition and architecture.

RESULTS AND DISCUSSION
Building Families of Cell Wall-Related Genes in Arabidopsis, Rice, and Maize
Sequences of annotated cell wall-related genes of Arabidopsis and their paralogs were used in a BLAST search (Altschul et al., 1990) to find the putative orthologs and paralogous sequences in rice and, subsequently, the homologous sequences in maize. Until very recently, the maize sequence was largely unannotated and incomplete; thus, computer algorithms used to find common sequences based on keywords or full-length sequences, as described by Guillaumie et al. (2007), may have missed many relevant genes. We used the newly released maize genome sequence (Schnable et al., 2009) to attain a complete view of many cell wall gene families in maize for comparison with rice and Arabidopsis. Dendrograms of cell wall gene families of Arabidopsis, maize, and rice were developed for each species individually and in combination. In every instance where ambiguity as to classification or large family member expansions occurred, predicted protein sequences were compared and analyzed for motifs that validated the annotation using ProScan (Zdobnov and Apweiler, 2001).
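The homology search described above can be sketched as a filter over tabular BLAST results. The example assumes the 12-column tabular layout written by current BLAST+ releases with `-outfmt 6`; the E-value cutoff and the subject identifiers in the usage example are hypothetical placeholders, not values used in this study.

```python
import csv
import io
from collections import defaultdict

# Column order of the standard 12-column BLAST tabular format (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_homologs(tabular_text, max_evalue=1e-10):
    """Group subject hits by query, keeping hits at or below an E-value cutoff.

    The default cutoff is an illustrative assumption.
    """
    hits = defaultdict(list)
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(FIELDS, row))
        if float(record["evalue"]) <= max_evalue:
            hits[record["qseqid"]].append(
                (record["sseqid"], float(record["bitscore"]))
            )
    # Sort each query's candidates by bit score, best first.
    return {q: sorted(s, key=lambda hit: -hit[1]) for q, s in hits.items()}


if __name__ == "__main__":
    # Made-up rows for illustration; the subject names are placeholders.
    rows = [
        ["At3g61130", "subject_1", "70.0", "500", "150", "0",
         "1", "500", "1", "500", "1e-120", "650"],
        ["At3g61130", "subject_2", "30.0", "80", "56", "2",
         "10", "90", "5", "85", "0.5", "30"],
    ]
    text = "\n".join("\t".join(r) for r in rows)
    print(candidate_homologs(text))
```

A filter of this kind only nominates candidates; as noted in the text, ambiguous classifications were further checked by motif analysis of the predicted proteins.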
About 60% of the Arabidopsis genome is annotated with respect to predicted function of its protein products (Swarbreck et al., 2008), and several Web sites have assembled gene families based on known functionalities. For example, the Carbohydrate-Active Enzyme database (http://www.cazy.org/) assembles families of GTs, glycosyl hydrolases (GHs), and other carbohydrate-metabolizing enzymes. There are 91 gene families of evolutionarily distinct GTs and 112 GHs, with Arabidopsis and rice genes populating 40 and 34 of them, respectively. Although the total number of GTs is higher in rice than in Arabidopsis, 550 versus 445, the numbers of GHs are about the same, 419 for rice and 403 for Arabidopsis. Only a few groups within families of cell wall-related genes have similar numbers of members for Arabidopsis and grasses (Table I). We observed differences in the number of members of a family, in numbers of members of a single group of a family, and in the presence of new family groups (or loss of family groups). This is likely a consequence of duplication and divergence in the genomes since the last common ancestor, resulting in splitting of a single gene function between paralogs (subfunctionalization), new function in a duplicate gene (neofunctionalization), or a combination of both events (subneofunctionalization). For example, the monosaccharide transporter genes of Arabidopsis and rice have duplicated, adapted, and diverged extensively (Johnson and Thomas, 2007). Differences in the duplication and retention of genes in animals and hominoid lineages are proposed to account for their evolutionary differences (Fortna et al., 2004; Hughes and Friedman, 2004). In 12 genotyped Drosophila species, 41% of the gene families have undergone expansion or contraction in different species, even though the overall gene number in the family remains relatively constant (Hahn et al., 2007).
Furthermore, compared with sorghum (Sorghum bicolor; Paterson et al., 2009), maize appears to have retained significantly more duplications resulting from the tetraploidization that occurred before the maize-sorghum divergence rather than from independent tandem duplication (Schnable et al., 2009). As we describe below, EST/cDNA data support differential expression of each of the duplicates, indicating their neofunctionalization and subfunctionalization after the tetraploidization event.
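The GT and GH totals quoted above for rice and Arabidopsis can be compared directly as ratios. The short sketch below uses only the four numbers given in the text; the dictionary and function names are illustrative.

```python
# Totals quoted in the text from the Carbohydrate-Active Enzyme database.
FAMILY_TOTALS = {
    "GT": {"rice": 550, "Arabidopsis": 445},
    "GH": {"rice": 419, "Arabidopsis": 403},
}


def rice_to_arabidopsis_ratio(enzyme_class):
    """Ratio of rice to Arabidopsis gene totals for one enzyme class."""
    totals = FAMILY_TOTALS[enzyme_class]
    return totals["rice"] / totals["Arabidopsis"]


if __name__ == "__main__":
    for enzyme_class in FAMILY_TOTALS:
        print(enzyme_class, round(rice_to_arabidopsis_ratio(enzyme_class), 2))
```

On these totals, rice has about 24% more GTs and about 4% more GHs than Arabidopsis. Summed totals of this kind do not capture the differences in group membership within individual families that are described above.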

Genes of Substrate Generation: Nucleotide-Sugar Interconversion Pathways
The numbers of genes in the 10 families that encode the enzymes of the nucleotide-sugar interconversion pathways, which are responsible for the formation of the basic sugar building blocks of many cell wall carbohydrates (Fig. 1A; Reiter and Vanzin, 2001), vary little between Arabidopsis and the grasses. These families are combined for convenience into one dendrogram, and evolutionary relationships are relevant within a single family (Fig. 1B). With the exception of maize duplications in several families, genes appear to be in orthologous relationships (Fig. 1B). Deoxysugars Fuc and Rha are characteristic of several polymers in type I walls but are of much lower abundance in type II walls. While Fuc is a key sugar in the structure of rhamnogalacturonan (RG) II and certain type I xyloglucans, it is absent from grass RG II (Thomas et al., 1989) and xyloglucan (Gibeaut et al., 2005). Still, Fuc is a component of N-linked glycoproteins, so retention of GMD and GER genes in rice and maize is expected (Fig. 1B).
Although UDP-Ara is synthesized in the pyranose form, a substantial portion of the Ara in cell wall polysaccharides is in the furanose form. Konishi et al. (2007) discovered that the "reversibly glycosylated protein" (RGP; Dhugga et al., 1997) can act as an isomerase that interconverts UDP-Arap and UDP-Araf. There are several maize and rice homologs to the five Arabidopsis RGP genes (Supplemental Fig. S1; Drakakaki et al., 2006). There is one known splice variant of the rice gene, Os03g40270; the three maize members closely related to Arabidopsis RGP3 share 93% protein sequence conservation. This could represent a recent duplication event, as only one rice RGP3 homolog (Os07g41360) is found associated with a fourth RGP3-related maize gene in a group separate from the five Arabidopsis RGP genes.

Genes of Substrate Generation: Phenylpropanoid Metabolism
The large and diverse families associated with phenylpropanoid biosynthesis show an expansion of the grass genes relative to those of Arabidopsis, consistent with the relative importance of the phenylpropanoid network in walls of grasses (Carpita, 1996). As with the families of nucleotide-sugar interconversion, these families are combined into a single dendrogram, and evolutionary relationships are relevant within a single family (Fig. 2).
Grass-dominated clades are also observed in the gene families encoding cinnamyl alcohol dehydrogenase (CAD) and the cytochrome P450 monooxygenases (trans-cinnamate 4-hydroxylase, p-coumaroyl-shikimate/quinate 3′-hydroxylase, ferulate [coniferyl alcohol/aldehyde] 5-hydroxylase, caffeoyl-CoA 3-O-methyl transferase, and 4-coumaric acid CoA-ligase). The structure of the gene family encoding cinnamoyl-CoA reductases (CCRs) has an overall 3-fold expansion in numbers of grass sequences (Fig. 2B). Broad expansion and divergence among the grass-specific clades of families suggests novel functions associated with the synthesis of the complex phenylpropanoid network that is made in primary as well as secondary walls of grass species.

Genes of Polysaccharide Synthesis: Processive GTs
The CesA/Csl (for cellulose synthase-like) superfamily and callose synthases belong to the GT2 family and encode processive synthases of cellulose and other β-linked backbone polymers (Fig. 3; Delmer, 1999; Holland et al., 2000). First discovered in developing cotton (Gossypium hirsutum) fiber cells, plant CesA proteins possess four "U-domains" considered essential for catalysis of (1→4)-β-linked glucans in which one residue is oriented 180° with respect to each neighbor (Pear et al., 1996). The CesA genes of rice and maize appear to form orthologous clusters with Arabidopsis CesAs known to function at specific stages of cell growth and differentiation (Fig. 3; Holland et al., 2000; Vergara and Carpita, 2001; Appenzeller et al., 2004). At least three CesA genes are expressed in growing cells producing primary cell wall cellulose, whereas three different CesA genes are coexpressed in cells engaged in secondary wall cellulose synthesis in Arabidopsis (Taylor et al., 2003), rice (Tanaka et al., 2003), maize (Appenzeller et al., 2004), and barley (Hordeum vulgare; Burton et al., 2004). The clade structure and expression patterns suggest conserved cooperative functions for cellulose synthase catalytic units in primary versus secondary walls. However, six maize CesAs are duplicated relative to their rice and Arabidopsis putative orthologs.
The Csl genes encode proteins that each contain the four U-motifs but lack several key sequences of CesAs, such as the zinc-finger domains and/or plant-specific sequences (Delmer, 1999). An exception may be the CslD group, some of which encode the zinc-finger domains and may constitute the cellulose synthases of tip-growing cells of root hairs (Favery et al., 2001) and pollen tubes (Doblin et al., 2001), with CSLD1 (Os10g42750) in rice having a similar role. Of the eight distinct classes of Csls, several are represented in both grasses and Arabidopsis. From heterologous expression studies, the CslA group encodes (1→4)-β-D-(gluco)mannan synthases (Liepman et al., 2005). While several Arabidopsis CslAs cluster, those of both rice and maize are more diverse (Fig. 3). CslCs encode synthases of β-glucans that are possibly the backbones of xyloglucans (Cocuron et al., 2007). The CslF group is unique to the grasses (Hazen et al., 2002) and encodes a synthase of mixed-linkage (1→3),(1→4)-β-D-glucans (Burton et al., 2006), a polysaccharide that among the angiosperms is unique to the grasses (Poales). It is the clearest example of the divergence of a group within an entire family that functions in the synthesis of a grass-specific polysaccharide. The functions of the CslBs, CslEs, and CslGs are unknown, but within the CesA/Csl superfamily CslE genes are the most similar to cyanobacterial CesAs, which are implicated as progenitors of modern plant cellulose synthase genes (Nobles and Brown, 2004). Inclusion of grass sequences alters the clade structures proposed for gene families when Arabidopsis sequences alone are considered (Hazen et al., 2002). For example, CslE and CslG may belong to the same clade; similarly, CslB and CslH are predicted to be derived from a common ancestral gene (Fig. 3).

Genes of Polysaccharide Synthesis: The Nonprocessive GTs
From over 35 nonprocessive GT gene families, we will focus on a subset of five. The families GT8 and GT47 are exceptionally well populated with members that function in substitution of cell wall pectic and cross-linking glycan backbones with several kinds of sugars. Family GT31 of galactosyl transferases may broadly function in the synthesis of the galactan backbones of AGPs. Families GT34 and GT37 encode xylosyl and fucosyl transferases involved in xyloglucan synthesis.
Family GT8 comprises over 40 genes in five groups in both Arabidopsis and the grasses (Fig. 4A). The functions of only a few genes have been determined. Group D, the largest, contains GALACTURONOSYL TRANSFERASE1 (GAUT1 [At3g61130]; Sterling et al., 2006), which encodes what is thought to be a synthase of homogalacturonans. Because GT8 family members encode transferases with retaining mechanisms of glycosyl transfer, resulting in α-linked polysaccharides, they are thought to be primarily involved in pectin synthesis (Scheller et al., 2007). However, α-linked residues are also found in many types of nonpectic polysaccharides, such as the (1→2)-α-GlcA of GAXs (Carpita, 1996), and an enzyme essential for glucuronoxylan synthesis is encoded by the group B gene GAUT-like (GATL1 [PARVUS]; At1g19300; Lee et al., 2007a). Despite roughly equal numbers of total GT8 genes, Arabidopsis group B genes are significantly expanded in number, whereas the grass genes are more abundant in groups D and E (Fig. 4A).
The five-group family GT47 in Arabidopsis encodes enzymes with at least four different types of transferase activities, including galactosyl, arabinosyl, xylosyl, and glucuronosyl transferases, all with inverting mechanisms of glycosyl transfer. Numerous duplications of both maize and rice genes are evident in comparison with their closest Arabidopsis homologs in groups A, B, and E. This resulted in a minimum of seven subclades for the grass genomes, one of which is so substantially diverged as to constitute a new grass-specific group F (Fig. 4B). In contrast, Arabidopsis genes form distinct subgroups with numerous duplications in groups A, C, and D compared with their grass homologs (Fig. 4B). With the exception of the group C XYLOGALACTURONAN DEFICIENT1 (At5g33290), which encodes a xylosyl transferase that adds the (1→3)-β-Xyl units to homogalacturonan (Jensen et al., 2008), the three other enzymatic activities are encoded by unclustered Arabidopsis genes. Of these, group A includes MUR3 (At2g20370), which encodes a galactosyl transferase that adds the (1→2)-β-D-Gal residue of the first xylosyl residue from the reducing end of the repeating heptasaccharide unit of xyloglucan (Madson et al., 2003). Group B includes ARABINOSE DEFICIENT1 (ARAD1; At2g35100), which is involved in the (1→5)-α-arabinan synthesis (Harholt et al., 2006), and group E includes AtGUT1 (At1g27440), AtGUT2 (At5g61840), and FRA8 (At2g28110), which encode glucuronosyl transferases (Zhong et al., 2005; Peña et al., 2007). While maize and rice homologs are observed for each of these Arabidopsis transferase genes, a significant challenge remains to establish the function of even the Arabidopsis genes of this large family.
Figure 1. Genes of the nucleotide-sugar interconversion pathways. A, Schematic of pathways for plant nucleotide-sugar interconversion. The committed step to synthesis of uronic acids and pentoses is catalyzed by UDP-Glc dehydrogenase (UGD); isoforms exhibit different catalytic activities that indicate varied functions (Karkonen et al., 2005). The function of the UDP-GlcA decarboxylase (carboxyl-lyase) was established for the UXS family in barley (Zhang et al., 2005), with homology to the SUD/AUD group proposed for Arabidopsis (Reiter and Vanzin, 2001). Apiose, the essential monosaccharide in the boron didiester cross-linking of RG II, is synthesized by enzymes encoded by members of the AXS group, which converts irreversibly UDP-GlcA to a mixture of UDP-apiose and UDP-Xyl. A reduction in the levels of these synthases results in an RG II deficiency and cell wall abnormalities (Ahn et al., 2006). Although pectins are a minor component of the walls of grasses, an apiose-containing RG II with only slightly modified side groups is present (Thomas et al., 1989). Three groups of C-4 epimerases have been annotated: the UDP-Glc 4-epimerases (UGEs), including REB1, that interconvert UDP-Glc and UDP-Gal (Seifert et al., 2002; Nguema-Ona et al., 2006); the UDP-GlcA 4-epimerases (GAEs) that interconvert UDP-GlcA and UDP-GalA (Usadel et al., 2004); and the UDP-Xyl 4-epimerases (UXEs), including MUR4, that interconvert UDP-Xyl and UDP-Ara (Burget et al., 2003). GDP-Fuc is synthesized de novo from GDP-Man via two enzymes, a 4,6-dehydratase (GMD), such as MUR1, and a 3,5-epimerase-reductase (GER; Reiter and Vanzin, 2001). B, These evolutionarily distinct families are combined for convenience into one dendrogram; evolutionary relationships are relevant only within a single family. For accession numbers of all genes in these families, see http://cellwall.genomics.purdue.edu/families/1-1/. Color coding for all dendrograms of gene families (Figs. 1-7; Supplemental Figs. S1-S3) is Arabidopsis (red), rice (green), and maize (blue), with numbers of genes in each group indicated. Expression levels for maize genes are indicated as the numbers of reads obtained in the sequencing runs, with 10 or more considered highly or moderately expressed (dark blue boxes), whereas one to nine reads are considered low expression (light blue boxes). Maize gene expression was compared with Arabidopsis and rice expression in developing ovary from public sources, primarily NCBI Gene Expression Omnibus (Barrett et al., 2007), as visualized in Genevestigator (https://www.genevestigator.ethz.ch/). Genevestigator was the primary model where expression in ovary is compared with that in other organs and tissues. Genes minimally expressed in Arabidopsis and rice are noted with light red and light green boxes, respectively, whereas genes moderately to highly expressed are noted with dark red and dark green boxes. Whether expressed or not, known mutants are indicated after the gene annotation.
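The read-count convention stated in the legend of Figure 1 (10 or more reads for moderate to high expression, one to nine reads for low expression) can be written as a simple classifier. This is a minimal sketch: the label names are illustrative, and the handling of zero reads is an assumption, because the legend does not define a category for genes with no reads.

```python
def expression_class(reads):
    """Classify a maize gene by read count using the Figure 1 legend thresholds."""
    if reads < 0:
        raise ValueError("read counts cannot be negative")
    if reads >= 10:
        return "moderate_to_high"  # dark blue boxes in the dendrograms
    if reads >= 1:
        return "low"               # light blue boxes in the dendrograms
    return "not_detected"          # assumption: zero reads is not defined in the legend


# Hypothetical read counts, for illustration only.
example_counts = {"gene_a": 0, "gene_b": 4, "gene_c": 10, "gene_d": 250}
classified = {gene: expression_class(n) for gene, n in example_counts.items()}
```

Fixed thresholds on raw read counts do not account for sequencing depth or transcript length, so the classes are best read as coarse bins within this data set rather than as values comparable across experiments.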
GT31 is a five-group family in Arabidopsis, for which a galactosyl transferase function associated with N-glycan formation is ascribed to a single member in group B (Fig. 5; Strasser et al., 2007). However, Qu et al. (2008) suggest that members of other groups of GT31 function in galactan backbone formation of AGPs. When grass sequences are included, a new group F emerges, which contains a single Arabidopsis gene with weak homology to genes in groups A and B. Five pairs of rice and maize genes in group F appear after the Arabidopsis-grass divergence (Fig. 5).
In Arabidopsis, the eight-membered family GT34 encodes transferases shown to xylosylate the glucan backbone of xyloglucan (Faik et al., 2002) and both (1→2)-α- and (1→6)-α-galactosyl transferases (Edwards et al., 1999). Inclusion of the rice (10) and maize (18) sequences expands the group structure of the gene family from three to five, with several duplications of grass genes homologous to a single Arabidopsis gene (Supplemental Fig. S2A). The XXT1 (At3g62720) and XXT2 (At4g02500) group genes encode the xyloglucan-specific xylosyl transferases (Cavalier et al., 2008). Family GT37 encodes fucosyl transferases, with MUR2 (FUT1; At2g03220) encoding the enzyme that forms the fucosylated trisaccharide side group of xyloglucan in Arabidopsis (Sarria et al., 2001; Vanzin et al., 2002). Consistent with the retention of genes to synthesize Fuc despite the lack of fucosylation of grass RG II (Thomas et al., 1989) and xyloglucan (Gibeaut et al., 2005), the maize and rice sequences fall into two new groups, B and C, distinct from Arabidopsis (Supplemental Fig. S2B).
From informatics-based analyses, GT43 has been predicted to contain genes encoding the (1→4)-β-D-xylan synthase of GAX (Mitchell et al., 2007), despite an expectation that the Csls would encode the processive synthases for (1→4)-β-linked glycans (Hazen et al., 2002). Peña et al. (2007) showed that the irx9 mutant phenotype was a result of the failure to elongate primed xylan chains. Based on the Pfam family PF02458 motif, the designation GT61 could be a misnomer, encoding not a sugar transferase but a GAX-specific feruloyl transferase (Mitchell et al., 2007). The GT77 members, Arabidopsis RGX1 and RGX2, encode (1→3)-α-D-xylosyl transferases thought to be involved in the synthesis of RG II (Egelund et al., 2006). Egelund et al. (2007) showed that the rrd1 and rrd2 mutants of Arabidopsis are Ara deficient in the wall residue after pectin and GAX extraction, consistent with a deficiency in extensin arabinosylation and with the appearance of homologs associated as the major GT family in Chlamydomonas, an organism with Hyp-rich protein walls (Lee et al., 2007b; U. Goodenough, personal communication).
Figure 2. Genes of phenylpropanoid substrate synthesis. A, The current view of the metabolic pathways from Phe or Tyr to hydroxycinnamic acids and monolignols. The PAL1 and PAL2 genes encoding PAL (Rohde et al., 2004) and CAD-C and CAD-D (Sibout et al., 2005) were identified as genes involved in lignification in the floral stem of Arabidopsis. The fah1 mutant lacking sinapate esters was characterized as ferulate (coniferyl aldehyde/alcohol) 5-hydroxylase (Meyer et al., 1996; Humphreys et al., 1999). An extension of a genetic screen for reduced epidermal fluorescence mutants resulted in the discovery of ref3 and ref8 (Franke et al., 2002a, 2002b), which were found to encode a cinnamate 4-hydroxylase (C4H) and a p-coumaroyl-shikimate/quinate 3′-hydroxylase (C3′H), respectively. The latter enzyme had been identified by Schoch et al. (2001) after a phylogenetic analysis of Arabidopsis cytochrome P450 enzymes. Identification of the substrate of this enzyme was aided by early studies on chlorogenic acid biosynthesis by Heller and Kühnl (1985) and Kühnl et al. (1987). Generation of p-coumaroyl-CoA from p-coumaric acid is catalyzed by HCT. This enzyme is also responsible for the transfer of the caffeoyl moiety of chlorogenic acid (caffeoyl-quinate) and caffeoyl-shikimate to CoA, based on studies by Hoffmann et al. (2003) that built on early studies by Stöckigt and Zenk (1974), Rhodes and Wooltorton (1976), and Ulbrich and Zenk (1980). Selection of irregular xylem mutants also proved helpful in identifying genes of phenylpropanoid metabolism, as the irx4 mutant was determined to result from a defective CCR gene (Jones et al., 2001). A knockout of CAFFEIC ACID O-METHYLTRANSFERASE1 (COMT1) results in lignins with strongly reduced levels of syringyl units in maize (Vignols et al., 1995) and, subsequently, in Arabidopsis (Goujon et al., 2003). 4CL, 4-Coumaric acid CoA-ligase; CALDH, coniferaldehyde dehydrogenase; CCoAOMT, caffeoyl-CoA 3-O-methyl transferase; F5H, ferulate (coniferyl alcohol/aldehyde) 5-hydroxylase; TAL, Tyr ammonia-lyase. B, These evolutionarily distinct families are combined for convenience into one dendrogram; evolutionary relationships are relevant only within a single family. Color scheme and dendrogram labeling are as described in the legend of Figure 1. For accession numbers of all genes in these families, see http://cellwall.genomics.purdue.edu/families/1-3/. Functional classification of phenylpropanoid genes in maize, rice, and Arabidopsis was based on characteristic signal peptides and motifs identified using InterProScan (http://www.ebi.ac.uk/Tools/InterProScan/) but does not imply that enzymatic function has been experimentally verified.

Genes of Wall Assembly and Rearrangement
Expansins are involved in wall loosening during growth and do so by disruption of hydrogen bonds (Cosgrove, 2000). Members of the expansin superfamily are related to the GH45 family of glucosidase genes, but the proteins have no measurable glycolytic activity (Sampedro and Cosgrove, 2005). The Arabidopsis expansin family comprises 35 genes in two major groups, those encoding the α-expansins and β-expansins, and two minor groups encoding the α-expansin-like A and β-expansin-like B proteins (Fig. 6A; Sampedro and Cosgrove, 2005). In rice and maize, the expansin family is slightly expanded in the α-expansin group and greatly expanded in the β-expansin group containing the class 1 pollen allergens (Fig. 6A; Sampedro and Cosgrove, 2005; Valdivia et al., 2007). The grasses also have a subgroup of β-expansins, grass group 2 pollen allergens, which lack the GH45-related expansin domain (Sampedro and Cosgrove, 2005). The class B expansin and an original set of class A expansin genes show a conserved microsynteny across many grass species (Valdivia et al., 2007).
Figure 3. Genes of the CesA/Csl superfamily. At least three CesA genes are coexpressed during primary wall formation, and mutants in each of them, AtCesA1 (rsw1; Arioli et al., 1998), AtCesA6 (procuste; Fagard et al., 2000), and AtCesA3 (cev1; Ellis et al., 2002; eli1; Caño-Delgado et al., 2003), result in cellulose deficiencies, indicating that all three are essential for cellulose synthesis. The irregular xylem mutants AtCesA8 (irx1), AtCesA7 (irx3), and AtCesA4 (irx5) are deficient in cellulose synthesis specifically in secondary walls (Taylor et al., 2003). The root-hairless mutant kojak was traced to a mutation in the CslD3 gene proposed to be a cellulose synthase in these tip-growing cells (Favery et al., 2001). Rice CslD1 and maize CslD5 are apparent orthologs, as mutations in each result in the reduced root hair phenotype. Heterologous expression of the Arabidopsis CslA9 in Drosophila cells in culture confirmed the role of this gene in mannan synthesis (Liepman et al., 2005). Expression of a barley CslF gene in Arabidopsis resulted in the de novo appearance of epitopes of the mixed-linkage (1→3),(1→4)-β-D-glucan (Burton et al., 2006), and characterization of a CslC coexpressed with a xyloglucan-specific xylosyl transferase in Pichia resulted in the synthesis of extended glucan polymers (Cocuron et al., 2007). Color scheme and dendrogram labeling are as described in the legend of Figure 1. See http://cellwall.genomics.purdue.edu/families/2-1/ for accession numbers of all genes in this superfamily.
Two large gene families function in cell wall expansion. The GH16 family contains the xyloglucan endo-β-transglucosylase/hydrolase genes (XTHs), so named because of the two distinct activities of enzymes they encode (Rose et al., 2002). Xyloglucan endo-β-hydrolase cleaves the glucan backbones of xyloglucan, a function essential for mobilization of seed reserve xyloglucan (Buckeridge et al., 2000), and the xyloglucan endo-β-transglucosylase (XET) cleaves xyloglucans but can ligate a donor chain to an acceptor chain, retethering xyloglucans during growth (Rose et al., 2002). The XTH family of Arabidopsis has 33 members assembled in at least three groups, and rice and maize have numerous members in all of them (Fig. 6B). Hrmova et al. (2007) suggested that, in the grasses, XETs may function to couple xyloglucans to cellulosic oligomers or to the mixed-linkage (1→3),(1→4)-β-D-glucan. However, because the rates of reaction with these alternative acceptor substrates are a small fraction of that of galactosylated xyloglucan, the physiological significance in vivo is still in question. In particular, group B XTHs show a tight association of Arabidopsis sequences, with a much looser clustering of grass homologs into three distinct subgroups. Group C is largely populated with grass members, whereas group A includes several Arabidopsis members (Fig. 6B). Xyloglucans can reach an abundance of about 10% of wall mass in growing tissues of grasses (Gibeaut et al., 2005), and XET activity may serve more significant functions in grasses than previously thought (Yokoyama et al., 2004).
Figure 4. Genes of major nonprocessive GT families. A, In GT8, the largest clade, group D, encodes GAUT1, the only protein established to iteratively extend GalA units typical of homogalacturonan (Sterling et al., 2006). This large family of retaining transferase genes encodes several putative GAUTs and three distinct subgroups of GATL proteins. GAUT8, or QUASIMODO1 (QUA1), is involved in the synthesis of RG II (Mouille et al., 2007), whereas the group C GATL1 and a secondary wall-associated group A member (Sterling et al., 2006) have no established function. The PARVUS gene (Lao et al., 2004) is involved in the synthesis of the tetrasaccharide primer of xylan synthesis (Lee et al., 2007a). See http://cellwall.genomics.purdue.edu/families/2-3-1/ for accession numbers of all genes in this family. B, Group A of GT47 contains MUR3, which encodes a galactosyl transferase that adds the (1→2)-β-D-Gal residue of the first xylosyl residue from the reducing end of the repeating heptasaccharide unit of xyloglucan (Madson et al., 2003). In group B, ARAD1 functions in (1→5)-α-L-arabinan synthesis (Harholt et al., 2006), and in group C, XYLOGALACTURONAN DEFICIENT1 (XGD1) encodes a xylosyl transferase that adds the (1→3)-β-Xyl units to homogalacturonan (Jensen et al., 2008). Group E contains FRAGILE FIBER8 (FRA8), which functions in the synthesis of the tetrasaccharide primer for xylan synthase (Peña et al., 2007), and putative glucuronosyl transferase genes, GUT1 and GUT2, that add substituents to secondary wall xylans (Zhong et al., 2005). Color scheme and dendrogram labeling are as described in the legend of Figure 1. For accession numbers of all genes in this family, see http://cellwall.genomics.purdue.edu/families/2-3-2/.

Genes of Wall Modification and Disassembly
Most families contain larger numbers of maize and rice genes compared with those of Arabidopsis, but the smaller number of Arabidopsis genes is not strictly a result of its compact genome. This is particularly evident in the genes devoted to pectin metabolism, where the numbers of Arabidopsis genes greatly exceed those of the grasses (Fig. 7). The GH28 family that encodes polygalacturonases (PGases) and the carbohydrate esterase family CE8 genes that encode pectin methylesterases (PMEs) constitute two of the larger families found in angiosperms (González-Carranza et al., 2007). The PGase family consists of seven groups, where multiple duplication events in groups B, E, and G resulted in a much larger number of genes in Arabidopsis than in maize or rice (Fig. 7A). Several multiple duplications are also observed in maize and rice in groups A, C, D, and H, suggesting novel subfunctionalizations in the grasses despite the low amounts of pectins in their cell walls. Group B is Arabidopsis only, and a novel grass-specific group H diverged from an Arabidopsis-only group F. Other subgroups are evident within the group nomenclature established for Arabidopsis sequences (Fig. 7A).
Similarly, family PME comprises five groups, three of which, C, D, and E, have reduced group members in rice and maize compared with Arabidopsis (Fig. 7B). However, the clustering of these few genes on several chromosomes suggests that they arose by consecutive tandem duplications. The presence of two grass-specific clusters in groups A and C, with one and no close Arabidopsis homolog, respectively, also suggests a novel subfunctionalization in the grasses.
A single member of the GT31, GALT1 from group B, has been shown to be a galactosyl transferase required in the synthesis of N-glycan hybrid structures (Strasser et al., 2007). Recent bioinformatics approaches indicate that many of the members of this family may encode (1→3)-β-galactosyl transferases (Qu et al., 2008). For accession numbers of all genes in this family, see http://cellwall.genomics.purdue.edu/families/2-3-5/.
Figure 6. Genes of wall assembly and rearrangement. A, The Expansin gene family. This family comprises two major groups, the α-expansins and β-expansins, and two minor groups, the α-like and β-like expansins (Sampedro and Cosgrove, 2005). The grasses have high numbers of α- and β-expansin genes in all groups, but Arabidopsis has a disproportionately high number of α-expansins. The crystal structure of one expansin, the β-expansin EXPB1 (Zea m1), a maize group 1 pollen allergen, has been solved (Yennawar et al., 2006). Color scheme and dendrogram labeling are as described in the legend of Figure 1. See http://cellwall.genomics.purdue.edu/families/4-1-1/ for accession numbers of all genes in this family. B, The XTH gene family. Three major groups of XTH genes have been identified, but transferase and/or hydrolase activities have not been systematically defined. Mutants with defects in XTH24 (meri5; Verica and Medford, 1997), XTH22 (tch4; Xu et al., 1995), and XTH28 (formerly XTR2; Akamatsu et al., 1999) all result in altered growth responses. Even though xyloglucan is a specific substrate, the xyloglucan-poor grasses have nearly equal representation in all three major groups. For accession numbers of all genes in this family, see http://cellwall.genomics.purdue.edu/families/4-2/.

Maize Cell Wall Biology
Plant Physiol. Vol. 151, 2009
Figure 7. Genes of pectin modification. A, The PGase gene family. The polygalacturonan hydrolase gene family in Arabidopsis comprises three groups based on protein structure, which can be further divided into subgroups (González-Carranza et al., 2007). Inclusion of maize and rice sequences defines a new group H with grass-only sequences. A mutation in a member of group A causes failure of separation of pollen tetrads (quartet3 [qrt3]; Rhee et al., 2003). See http://cellwall.genomics.purdue.edu/families/4-3-3/ for accession numbers of all genes in this family. B, The PME gene family. The original Arabidopsis gene family comprises five major groups (Louvet et al., 2006), of which groups C, D, and E are enriched in Arabidopsis sequences. The mutation qrt1 also results in a pollen cell-separation phenotype similar to qrt3 (Francis et al., 2006). For accession numbers of all genes in this family, see http://cellwall.genomics.purdue.edu/families/4-5-1/. Color scheme and dendrogram labeling are as described in the legend of Figure 1.

The COBRA Gene Family
The COBRA gene family encodes glycosylphosphatidylinositol-anchored proteins that are associated with cellulose biosynthesis and orientation but for which the biochemical function remains unknown (Supplemental Fig. S3; Schindelman et al., 2001; Roudier et al., 2002, 2005; Brady et al., 2007). A mutation in the founding member of the COBRA family in Arabidopsis results in severe inhibition of root elongation and the progressive radial swelling of the cortical cells away from the root tip (Hauser et al., 1995). Rice Brittle culm1 (Bc1) is expressed primarily in vascular regions of the leaves and culm, and the mutant bc1 phenotype results in organ brittleness. Mutations in Arabidopsis COBRA-like4 (COBL4), the closest homolog of the rice Bc1 gene, result in plants with normal morphology but stems with reduced tensile strength (Brown et al., 2005). The maize ortholog of COBL4 and Bc1 is Brittle stalk2 (Bk2), and there is an additional COBL4-like gene each in rice and maize. Ching et al. (2006) reported that the loss of mechanical strength in bk2 is due to a reduction in the synthesis of secondary wall cellulose, but Sindhu et al. (2007) propose that the protein functions in a patterning of lignin-cellulosic architecture that maintains organ flexibility, as the cellulose deficiency is present in juvenile plants but the brittle phenotype occurs only after the developmental transition to adult plants.

Expression Profiles of Genes for the Primary Cell Wall in Ovary Tissues
Massively parallel sequencing technologies enhance coverage and quantification of transcripts (Eveland et al., 2008). The long-read, 3′ untranslated region (UTR)-anchored, gene-specific transcript profiling strategy can readily distinguish closely related gene family members (Supplemental Table S1) and quantify their abundance across a wide range of expression (Fig. 8). In a study of developing maize ovaries, Eveland et al. (2008) resolved transcripts for most CesA genes of maize and found that CesA family members exhibited nearly 100-fold differences in transcript abundance.
As described here, the completion of the maize genome sequence and our annotation of the gene families allowed us to interrogate the database of transcript abundances for many additional, coexpressed members of several cell wall-related gene families. We compared Roche Applied Science 454-based sequencing of 3′ UTR-rich expression profiles generated from developing maize ovaries before pollination with those of the publicly available Arabidopsis and rice microarray data. The Genevestigator anatomy expression tool (Grennan, 2006; https://www.genevestigator.ethz.ch/), combined with the National Center for Biotechnology Information's (NCBI's) Gene Expression Omnibus (Barrett et al., 2007), allowed comparison of homologous sequences expressed in developing ovaries dissected from Arabidopsis and rice before pollination.
From 14,822 unique contigs searched by BLAST analysis, significant matches were made to 30% of the unique maize cell wall genes (167 of 556). In comparison, we identified matches for 28% of expressed cell wall genes in Arabidopsis (141 of 498) and 38% of expressed cell wall genes in rice (192 of 501; https://www.genevestigator.org/). Thus, 454 sequencing compares favorably with microarray analyses for capturing gene expression within a specific tissue when a complete sequence is available. In addition, the specificity of the 3′ UTR sequences often enabled resolution of expression for putative paralogous transcripts that aligned to a single cDNA sequence.
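The coverage figures quoted above follow directly from the reported counts. As a quick check, the short Python sketch below recomputes the three percentages; the dictionary layout and the function name `percent_matched` are illustrative choices, not part of the original analysis.

```python
# Counts of (expressed cell wall genes matched, cell wall genes considered),
# taken from the text above.
matches = {
    "maize": (167, 556),
    "Arabidopsis": (141, 498),
    "rice": (192, 501),
}


def percent_matched(matched, total):
    """Return the matched fraction as a whole-number percentage."""
    return round(100 * matched / total)


summary = {species: percent_matched(m, t) for species, (m, t) in matches.items()}
print(summary)
```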
Transcript abundances for individual members of several cell wall gene families in maize also revealed wide differences in expression (Fig. 8). Arabidopsis and rice expression from public sources was compared with maize expression patterns in ovary tissue (Fig. 8; Supplemental Table S1) to identify putative orthologous and/or divergent cell wall-related genes. We assume that the functional equivalence of orthologous sequences requires homologous gene expression in the equivalent tissues of different species. For the genes of the nucleotide interconversion pathway, transcripts of at least one family member were detected for all 10 families of rice, nine in Arabidopsis, and eight in maize (Figs. 1B and 8). In families GME, RHM/UER, UGD, and GMD, apparently orthologous genes are expressed in all species, whereas in SUD/AUD (UXS), UGE, and GAE families, more divergent members are expressed in the grasses (Fig. 1B).
Maize and rice showed moderate to high levels of expression for multiple genes in all eight families involved in phenylpropanoid biosynthesis (Figs. 2B and 8). In contrast, very few Arabidopsis genes in these families were expressed in ovary tissue, with isolated instances of coexpression with the closest grass homolog. The much higher number of phenylpropanoid genes expressed in the ovary tissues of the grasses underscores the phenylpropanoid richness of even the primary wall of these species. Guillaumie et al. (2007) reported expression of a limited number of genes in the phenylpropanoid pathway, showing that some were more highly expressed in young stems, whereas expression of others was greater in developing internodes.
All three primary wall CesAs of Arabidopsis (CesA1, CesA3, and CesA6) were expressed in developing ovaries of the grasses, with six in rice and eight in maize (Fig. 3). The CslA and CslC groups show a large departure in sequence between the grasses and Arabidopsis, but few instances of expression of the closest homologs are observed between rice and maize. No expression of the maize CslF genes was detected in the ovary tissue (Fig. 3). The grass-specific (1→3),(1→4)-β-D-glucan synthase encoded by these genes is typically absent from the meristematic tissues, and the β-glucan is synthesized in abundance only when cell expansion ensues (Carpita, 1984; McCann et al., 2007).
For the two major GT families, GT8 and GT47, many Arabidopsis, rice, and maize genes are expressed, but the closest grass homolog to an Arabidopsis gene is rarely expressed (Fig. 4). In GT8, Arabidopsis and rice have at least one member expressed from each of the five groups (Fig. 4A). Three members of maize GT8 are highly expressed in developing maize ovaries (Figs. 4A and 8). Two members in group A are related to Arabidopsis PGS1P1 (At1g77130), and the third is related to group C genes GATL1 (PARVUS; At1g19300; Lee et al., 2007a) and rice Os04g44850.
Family GT47 contains at least one Arabidopsis member expressed in four of its five represented groups (Fig. 4B). Group E is dominated by grass genes, with a total of six rice and nine maize genes, four of the maize genes being coexpressed with rice genes. Two clades of group D show Arabidopsis and maize coexpression, whereas group C shows only rice expression. In group B, grass homologs closest to the arabinosyl transferase gene ARAD1 (At2g35100; Harholt et al., 2006) are expressed, but several other related grass genes are also expressed, including one each of maize and rice in the new group F (Fig. 4B). Collectively, a diverse group of at least 17 different GT47 genes are expressed in developing maize ovaries (Fig. 8). Neofunctionalization of arabinosyl transferases might be expected in the grasses. For example, in grasses, the arabinosyl units are attached mostly on the O-3 positions along the xylan backbone, whereas in type I walls, the arabinosyl units are mostly on the O-2 positions (Carpita and Gibeaut, 1993). Similarly, the expression of the closest homolog between Arabidopsis and the grasses in families GT31 (Fig. 5), GT34, and GT37 (Supplemental Fig. S2) is the exception rather than the rule. Expression within the GT31 family varied over a 10-fold range in developing maize ovaries (Fig. 8).
Figure 8. Expression profiles of cell wall genes in developing maize ovaries. Maize cell wall genes were classified by function according to pathway and/or gene family (the latter as defined in Figs. 1-7; Supplemental Figs. S1-S3). Relative transcript abundance within each class was quantified by the frequency of 3′ UTR-anchored cDNA sequences (number of reads) from each transcript in the data set of Eveland et al. (2008). The sequence counts for each transcript are plotted on a log scale (note range in abundance). Transcripts are labeled with their ZM2G maize sequence identifiers (maizesequence.org). The groups shown include two major biosynthetic processes, nucleotide sugar interconversion (1.1) and phenylpropanoid biosynthesis (1.3), and five large gene families, xyloglucan endotransglucosylase/hydrolases (4.2), expansins (4.1.1), GT8 (2.3.1), GT47 (2.3.2), and GT31 (2.3.5). See Supplemental Table S1 for mRNAs of cell wall genes not shown here.
Thirteen unique maize expansin family members were expressed (Fig. 6A), with mRNAs from all of these abundant in the maize developing ovary (Fig. 8). These transcripts included five α- and six β-group members. Numerous rice expansin genes (13) were also expressed (Fig. 6A), compared with only six in Arabidopsis, with five of the latter limited to the α-group (Fig. 6A). No Arabidopsis genes encoding α-like or β-like expansins were found to be expressed in the developing ovary of that species, whereas two each of the maize and rice genes encoding α-like expansin genes were expressed. The large increase in numbers of expansin genes expressed in grasses compared with Arabidopsis indicates that cell wall loosening by expansins may play a greater role in development of ovary tissue in grasses.
Five members of family XTH showed significant expression in Arabidopsis, compared with 13 in maize and six in rice, with at least one of these genes from each of three XTH subgroups (Fig. 6B). None of these genes was a close homolog between Arabidopsis and either of the grasses (Fig. 6B). At least 11 maize XTHs were among the abundantly expressed genes in developing ovaries (Fig. 8), consistent with their possible contribution to expansion and/or remodeling of the xyloglucan-rich primary cell walls of grasses (Rose et al., 2002; Yokoyama et al., 2004; Gibeaut et al., 2005).
Arabidopsis has markedly expanded PGase and PME gene families compared with the grasses, but multiple members of many groups were highly expressed in Arabidopsis, maize, and rice ovary tissues. For PGases, 16 Arabidopsis, 20 rice, and 12 maize genes show significant expression, with most expressed from group A and one closely homologous expression group for all three species (Fig. 7A). Group B has two highly expressed genes in this Arabidopsis-only grouping. For PMEs, the 17 Arabidopsis, 15 rice, and 13 maize genes showed significant expression in most groups, but the single maize gene in group E was not expressed. In only two instances was expression of the closest homologs between all three species observed (Fig. 7B).
To summarize, the large differences in expression patterns between Arabidopsis and the grasses with respect to the cell wall gene families underscore the vast amount of work needed to establish specific gene function. For small gene families, such as those of nucleotide-sugar interconversion, orthology can be reasonably inferred by expression of closely related homologs within similar cell types. However, our analyses show that this is a rare occurrence rather than a common one. For large families, such as those of GTs, expansion of certain clades through successive duplications in maize and rice indicates a large potential for neofunctionalization and subfunctionalization during evolution. Such relationships are especially prominent in families such as the PGase, PME, and those associated with the phenylpropanoid synthesis pathway, which may relate to the specific architecture of the different cell wall types. While common motif predictions clearly classify genes to a family, protein modeling and BLAST score analyses are currently insufficient to make specific predictions of gene function and orthology. Thus, what is needed is an empirical approach to gene function, such as the characterization of mutants in genes known to be expressed in a cell-specific context. In addition, expression analysis using large-scale techniques, such as 454 sequencing of 3′ UTRs, allows for rapid assignment of expression patterns that, if performed in enough tissue types or time courses, could help to elucidate function in space and time of specific duplicated genes or altered gene patterns in different cell wall types.
Figure 9. Example of results obtained during a reverse-genetics screen for Mu inserts in CslA7 using the UniformMu DNA grids. In this instance, no phenotype was detected, but visible features appear in other mutants (e.g. cslD5 in Fig. 10). A, Top panels show PCR products obtained from a screen of x and y axes from a given DNA grid. A total of 48 lanes appear in double rows on each panel, and each lane represents a pooled fraction of DNA from 48 families (one to two plants each). Collectively, 2,304 families are screened in each grid. Bottom panels localize a gene-specific Mu insert to a grid coordinate of x34, y41 via hybridization to a gene-specific ³²P-labeled probe. B, Nested primers are used to amplify a selected Mu flank for sequence validation using a combination of gene-specific and Mu-specific primers. Here, forward primers FA and FB are used for CslA7, in respective combination with the TIR sequences TIR6 and TIR8, characteristic of Mus. The process progresses from the original plant DNA (black line), to the initial PCR product (striped line), and then to the final product (white line) used for sequencing. Confirmation of a Mu insert is followed by field and greenhouse tests for heritability and possible phenotypic features.

Forward and Reverse Genetics to Identify Specific Mutants in Wall-Relevant Genes
For Arabidopsis, the T-DNA insertional mutant lines have provided an invaluable resource for genetic functional analyses (Alonso et al., 2003). The UniformMu population was developed by introgressing Robertson's Mu into genetically uniform W22 and B73 inbreds (Settles et al., 2007), and a database was developed as a similar resource to facilitate high-throughput molecular analysis of Mu-tagged mutants and gene knockouts in maize (http://www.maizegdb.org/documentation/uniformmu/). High-throughput sequencing of Mu insertion sites in UniformMu lines has enabled in silico mapping of random Mu insertions in the maize genome and facilitated cloning of transposon-tagged mutants (Suzuki et al., 2008). Hence, the UniformMu resources enable both forward- and reverse-genetics applications. Over 100 putative transposon knockouts of maize genes have been analyzed by gene-specific PCR and tested to confirm heritability (Settles et al., 2007). We identified 72 insertion sites in 63 maize cell wall genes (Supplemental Table S2). While not meant as an exhaustive search for knockouts of all maize genes, this set represents a substantial increase over the handful of previously known cell wall-related mutants in maize. This resource will continue to grow as current and new Mu insertions are placed with the appropriate genes and seed stocks (http://www.maizegdb.org/documentation/uniformmu/).
To identify mutants with Mu inserts in specific genes, we have developed DNA grids for high-throughput screening of the UniformMu population. These grids can be screened by PCR with Mu- and gene-specific primer pairs. Resulting candidate mutants are sequence verified and tested for heritability in progeny of the appropriate line. Each UniformMu grid allows concurrent screening of DNA from 2,304 lines in a 48 × 48 arrangement of pooled extracts. Individual lines are traced by the presence of their DNA in given pools from the x and y axes. These materials include a total of eight UniformMu grids, allowing 18,432 lines to be tested. All lines chosen for the grid are transposon-off and genetically stable mutants.
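The two-axis pooling logic described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that pools are numbered 1 to 48 on each axis and that lines are indexed row by row within a grid; the function names and the numbering convention are hypothetical and are not taken from the UniformMu documentation.

```python
GRID_SIDE = 48  # pools per axis in one grid (from the text)
N_GRIDS = 8     # number of UniformMu grids (from the text)


def lines_per_grid(side=GRID_SIDE):
    """Number of lines represented in one side x side grid."""
    return side * side


def candidate_coordinates(positive_x, positive_y):
    """Return every (x, y) intersection of PCR-positive x and y pools.

    A single positive pool on each axis identifies one line. Several
    positives per axis leave ambiguous intersections that would need
    follow-up testing.
    """
    return [(x, y) for x in sorted(positive_x) for y in sorted(positive_y)]


def line_index(grid, x, y, side=GRID_SIDE):
    """Map a 1-based (grid, x, y) to a 0-based line index.

    The row-by-row numbering used here is an assumption for illustration.
    """
    return (grid - 1) * side * side + (y - 1) * side + (x - 1)


total_lines = N_GRIDS * lines_per_grid()
single_hit = candidate_coordinates({34}, {41})          # coordinate cited in Figure 9
ambiguous_hits = candidate_coordinates({3, 34}, {7, 41})
```

With one positive pool per axis the intersection is unique, whereas two positives per axis give four candidate coordinates, which is one reason sequence verification of each candidate remains necessary.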
To identify putative Mu inserts in the gene of interest, grids were screened with gene-specific primers in combination with a second primer designed to hybridize with diverse Mu terminal inverted repeats (TIR6; Fig. 9). Two PCR methods were also tested for each gene screened, since standard PCR protocols worked best for some genes and touch-down PCR for others (for the latter, PCR annealing temperatures were decreased in increments so that they could "touch down"). Products were tested by hybridizing Southern blots of resulting PCR products with gene-specific ³²P-labeled fragments from wild-type genes. Positive results were followed by dilution of the initial PCR product (from TIR6 + gene-specific primer) and reamplification using nested Mu primers (TIR8 + gene-specific primers where available; Fig. 9). The second product was sequence verified, and plants were grown to test heritability of the mutation. Progressive adaptation of this screening procedure resulted in a manyfold enhancement of specificity for the target genes and gave rise to confirmed gene-specific insertions (Supplemental Table S2).
Initial work was directed toward identifying mutants in the Csl gene families. Plants carrying homozygous Mu insertions in CslD5 showed a clear root hair-deficient phenotype (Fig. 10A) and were similar in both Mu- and Ac-induced knockouts of this gene (Fig. 10B) and the Arabidopsis kojak mutant (Favery et al., 2001). In the maize cslD5 mutant, root hairs initiate but fail to elongate to the degree observed for wild-type root hairs (Fig. 10, C-E). Microscopy of live, tip-growing root hairs (data not shown) indicated that those of mutant seedlings were susceptible to bursting at their tips, thus prematurely terminating their elongation. There were no aboveground phenotypic features evident for cslD5 mutant plants in field or greenhouse analyses. Linkage between the root hair-deficient phenotype and the Mu insert in CslD5 was confirmed by testing for a similar phenotype in an Ac insertional allele (accession no. AC027037) obtained from the Ac/Ds resources (Kolkman et al., 2005). These results confirm a conserved role of CslD5 orthologs in root hair elongation by Arabidopsis, rice, and maize (Fig. 10; Favery et al., 2001; Kim et al., 2007). Other Mu transposon insertions identified in Csl genes, such as cslA7, have no obvious phenotype.
Figure 10. Phenotype of a Mu insertion in CslD5. A, Seedlings are root hair deficient. B, Both Mu and Ac inserts near the 5′ end of the CslD5 gene confer a similar phenotype. C to E, Progressive magnification of wild-type (WT) and cslD5 mutant roots reveals the extent of surface differences. Note that although the cslD5 roots appear hairless, they retain a capacity to initiate, but not necessarily elongate, hairs (note hair initials in E). [See online article for color version of this figure.]
Figure 11. Classification of maize mutants nir23 (A, C, and E) and nir27 (B, D, and F) based on NIR spectra. A and B, Baseline-corrected and area-normalized NIR spectra for the mutant (red) and wild-type control (W22; blue). C and D, Digital subtractions (mutant - wild type) of the spectra shown in A and B, respectively. E and F, PCA score plots showing how the mutant and wild-type leaf samples can be distinguished from each other based on their spectral characteristics. The insets show the percentage of correctly classified samples using a multivariate model with increasing numbers of PCs. In both mutants, a model based on the first three PCs results in 100% correct classification. NIR spectra and their digital subtractions from W22 for all 39 nir mutants can be found at http://cellwall.genomics.purdue.edu/families/7/. [See online article for color version of this figure.]
Supplemental Table S3. Populations of samples of mutant and W22 (wild type) are classified correctly on the basis of five PCs, accounting for over 90% of the correct classification for both mutants. Loadings of PC1 to PC3 indicate that the cell walls of the mutant have a higher carbohydrate-to-lignin ratio in the mutant, as evidenced by the following ions in PC1 of m/z 55, 73, 85, and 98 diagnostic of carbohydrate, m/z 114 specifically
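The touch-down PCR mentioned in the screening procedure lowers the annealing temperature in increments until a final temperature is reached. The text gives no temperatures or cycle numbers, so the Python sketch below uses purely hypothetical values (65°C start, 55°C final, 1°C steps, 20 cycles at the final temperature) to show the shape of such a schedule; it is not the protocol used in this study.

```python
def touchdown_schedule(start_c, final_c, step_c, cycles_at_final):
    """Return the annealing temperature (degrees C) for each PCR cycle.

    The temperature drops by step_c per cycle from start_c until it
    reaches final_c, then stays at final_c for cycles_at_final cycles.
    """
    if step_c <= 0:
        raise ValueError("step_c must be positive")
    if start_c < final_c:
        raise ValueError("start_c must not be below final_c")
    temps = []
    current = start_c
    while current > final_c:
        temps.append(current)
        current -= step_c
    temps.extend([final_c] * cycles_at_final)
    return temps


# Hypothetical parameters chosen only to illustrate the schedule.
schedule = touchdown_schedule(start_c=65.0, final_c=55.0, step_c=1.0, cycles_at_final=20)
```

Starting above the expected annealing optimum favors specific priming in the early cycles, which is the usual rationale for trying this variant when a standard protocol gives poor products.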

Screening for "Invisible" Phenotypes
The forward-genetics screen of the UniformMu population identified independently 29 maize mutants with distinct visible phenotypes, including dwarfs, mutants with altered morphology or altered leaf texture, mutants affecting cell-cell adhesion, and a novel brown midrib mutant (Supplemental Fig. S4). We also used NIR spectroscopy in a high-throughput screen for alterations in wall composition in leaves for which no visible phenotype was obvious. We acquired NIR spectra from mature dried leaves representing 2,200 segregating F2 families of the UniformMu population. Spectral data were subjected to a multivariate statistical analysis designed to identify unusual spectra indicative of putative mutants. Only those Mu insertion lines where 15% to 30%, or three to six per 20 plants of the F2 siblings, gave similar "spectrotypes" that were distinct from the wild type were counted in this total. Of several hundred lines that were tentatively selected as potential mutants, 39 NIR spectrotype (nir) mutants were identified (1.8% of the lines screened) and found to be heritable. Six of these nir mutants displayed subtle changes in leaf texture, color, architecture, or disease susceptibility/disease lesion mimic, but the other 33 were indistinguishable from the W22 wild-type controls under normal field conditions. The NIR spectra, principal components analysis (PCA), and a characterization of probable chemical phenotypes for all of these are provided at our Web site (http://cellwall.genomics.purdue.edu/families/7/). Examples of two of these mutants with distinct spectral differences are shown in Figure 11.
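The counting rule above (three to six distinct plants per 20 F2 siblings, i.e. 15% to 30%) can be written as a simple filter. The Python sketch below is illustrative: the function name is hypothetical, and the remark that this window brackets the 25% expected for a single recessive allele segregating in an F2 family is our reading rather than a statement from the text. The last lines recompute the reported recovery rate of 39 mutants from 2,200 families.

```python
def passes_segregation_filter(n_distinct, n_plants=20, low_pct=15, high_pct=30):
    """Return True if the share of plants with a distinct spectrotype
    lies within [low_pct, high_pct] percent of the family.

    Integer arithmetic avoids floating-point edge effects at the bounds.
    """
    if not 0 <= n_distinct <= n_plants:
        raise ValueError("n_distinct must be between 0 and n_plants")
    return low_pct * n_plants <= 100 * n_distinct <= high_pct * n_plants


# Which counts out of 20 plants would be retained under this rule.
retained_counts = [n for n in range(21) if passes_segregation_filter(n)]

# Recovery rate reported in the text: 39 heritable nir mutants from 2,200 families.
recovery_pct = round(100 * 39 / 2200, 1)
```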
The cell wall compositions of the nir mutants were analyzed using pyrolysis-molecular beam mass spectrometry (PyMBMS) as described (Evans and Milne, 1987; Tuskan et al., 1999). This method relies on thermal degradation of the cell wall constituents under anoxic conditions to provide information about hexose and pentose content and the content and composition of phenolic compounds derived from lignin and hydroxycinnamic acids. PCA of the PyMBMS spectra from isolated cell walls showed that at least six of the 39 nir mutants had altered carbohydrate-lignin interactions of different kinds. When applied to PyMBMS profiles, PCA resolves the nir23 and nir27 mutants based on differences in the relative abundance of ions diagnostic for carbohydrate and aromatics (Supplemental Table S3), indicating a low-lignin phenotype (Fig. 12). Seeds from the 39 nir mutants are available through the Maize Genetics Stock Center (http://maizecoop.cropsci.uiuc.edu/).
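The spectral workflow used for both the NIR and PyMBMS data (area normalization, mutant minus wild-type subtraction, and PCA) can be sketched in a few lines of Python with NumPy. Everything below is a hedged illustration on synthetic data: the spectra, the position of the mutant-specific feature, and the function names are invented for the example, and PCA is computed here by singular value decomposition, which may differ from the software used in the study.

```python
import numpy as np


def area_normalize(spectra):
    """Scale each spectrum (one per row) so its intensities sum to 1."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.sum(axis=1, keepdims=True)


def pca_scores(spectra, n_components=3):
    """Return PCA scores and loadings from the mean-centered spectra."""
    centered = spectra - spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    return scores, loadings


# Synthetic example: 10 wild-type and 10 mutant spectra over 50 channels.
rng = np.random.default_rng(0)
n_channels = 50
base = np.linspace(1.0, 2.0, n_channels)
feature = np.zeros(n_channels)
feature[10:15] = 0.5  # hypothetical mutant-specific band
wild_type = base + 0.01 * rng.standard_normal((10, n_channels))
mutant = base + feature + 0.01 * rng.standard_normal((10, n_channels))

spectra = area_normalize(np.vstack([wild_type, mutant]))
difference = spectra[10:].mean(axis=0) - spectra[:10].mean(axis=0)  # mutant - wild type
scores, loadings = pca_scores(spectra, n_components=3)
```

In this toy case the first principal component separates the two groups, and its loadings point to the channels that differ, which mirrors how diagnostic ions or wavelengths are read from loadings in the analyses described above.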

CONCLUSIONS AND FUTURE PROSPECTS
Genes that encode the enzymes for cell wall biogenesis are assembled in families common to all plants. However, publication of the maize genome sequence (Schnable et al., 2009) has allowed a comparative genomics analysis with rice and Arabidopsis genome sequences that highlights distinctions in the family substructure for cell wall-related genes of the grasses. In a few instances, homologous sequences indicate potential orthologous functions, whereas in others, a clear expansion of groups is observed that suggests the evolution of novel functions specific to the type II-walled grasses or the type I-walled Arabidopsis. Expression profiling by massively parallel cDNA sequencing indicates that many of these novel genes are expressed in addition to the much smaller proportion of the apparent orthologs. We conclude that potential orthologs cannot be identified solely by identifying sequences with the highest sequence similarities.
As the C4 grasses are key bioenergy species (Ragauskas et al., 2006; Carpita and McCann, 2008), knowledge of cell wall structure and biosynthesis will be pivotal to improve biomass yield and quality. Gene discovery in maize can be translated to increased yields, improved digestibility, and saccharification potential through the microsynteny among bioenergy grass genomes. As described here, forward- and reverse-genetics screens of a genetically stabilized insertional mutant collection like UniformMu (http://maizeGDB.org/documentation/UniformMu) offer a powerful means to establish gene function comparable to the resources available for Arabidopsis. Activator (Bai et al., 2007) and RescueMu (Fernandes et al., 2004) are additional resources of insertional lines for forward- or reverse-genetics screens, and TILLING (for Targeting-Induced Local Lesions in Genomes) in maize (Weil and Monde, 2007) provides a means of generating rich allelic series in essentially any gene. The natural diversity of a population can be exploited in conventional breeding, but having well-mapped populations of recombinant inbred lines, such as the intermated B73 × Mo17 (IBM) population, allows rapid identification of quantitative trait loci of the genes relevant for biomass improvement (Lee et al., 2002). In fact, Hazen and colleagues (2003) discovered several quantitative trait loci affecting cell wall sugar composition in the caryopsis pericarp using the IBM lines. The Nested Association Mapping lines, 200 recombinant inbred lines each derived from crosses of B73 with 25 diverse inbreds, include many high-biomass tropical maize inbreds that capture a substantial amount of the existing genetic diversity of the species (Yu et al., 2008; http://www.panzea.org). These rich genetic resources greatly facilitate the discovery of biomass-relevant genes, especially when coupled with the rapid screens of carbohydrate and aromatic composition by NIR spectroscopy and PyMBMS that we describe here.
Not only for biofuels production but also for food and feed derived from grass species, maize offers an ideal model system because of a sequenced genome, an accounting of a great many cell wall-related genes, a single-gene knockout strategy, high-throughput screens for nonvisible phenotypes, and a vast genetic diversity that can be exploited in testing individual and multiple gene effects (Carpita and McCann, 2008). The next challenge will be to define the novel functions of grass cell wall-related genes at the biochemical level and their roles in plant growth and development.

Rice and Maize Sequences
Rice (Oryza sativa) gene families were constructed by querying the most current rice peptide sequence database from the J. Craig Venter Institute (http://www.jcvi.org; formerly www.tigr.org) with Arabidopsis protein sequences. The flat file was downloaded and converted to a database usable by NCBI's BLAST with formatdb (Altschul et al., 1990). A custom DOS shell script was used to direct the BLAST through multiple sequence files using the following parameters: protein-protein BLAST search, expect value of 10^-20, and no alignment output. The BLAST results were parsed using a custom C++ script to scan and place the queried Arabidopsis gene name, associated rice gene names, and match score values for any score greater than 200 into a Microsoft Excel file. Duplicate matches due to multiple hits to the same rice sequence from closely related Arabidopsis sequences were eliminated to generate a unique rice gene list, which was used to extract sequences from the rice database with the fastacmd program from NCBI (Altschul et al., 1990) in a custom DOS shell script.
Maize (Zea mays) gene families were identified as described for rice genes using the newly completed gene annotations from the maize sequencing project (Schnable et al., 2009; http://www.maizesequence.org). All protein sequences used in the construction of the dendrograms are available at our Web site (http://cellwall.genomics.purdue.edu).
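The filtering and de-duplication step described for the rice BLAST results can be illustrated with a short sketch. This is written in Python rather than the custom C++ and DOS shell scripts used in the study, and it assumes a hypothetical three-column, tab-separated input (query name, subject name, score); the identifiers in the example are placeholders, not real gene names. Only the score threshold of 200 comes from the text.

```python
import csv
import io


def unique_subject_hits(tabular_text, min_score=200.0):
    """Keep hits scoring above min_score and collapse duplicates so that
    each subject (e.g. rice) sequence appears once, paired with its
    best-scoring query (e.g. Arabidopsis) sequence.

    The three-column layout (query, subject, score) is an assumption
    made for this sketch.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, score = row[0], row[1], float(row[2])
        if score <= min_score:
            continue  # the text keeps only scores greater than 200
        if subject not in best or score > best[subject][1]:
            best[subject] = (query, score)
    return best


# Placeholder identifiers; two queries hit the same subject sequence.
example = (
    "AT_query_A\trice_gene_1\t512.0\n"
    "AT_query_B\trice_gene_1\t498.0\n"
    "AT_query_A\trice_gene_2\t180.0\n"
    "AT_query_B\trice_gene_3\t240.5\n"
)
hits = unique_subject_hits(example)
unique_rice_ids = sorted(hits)
```

The resulting list of unique subject identifiers corresponds to the gene list that would then be passed to a sequence-retrieval step.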

Sequence Alignments and Dendrogram Development
Dendrograms were assembled from protein-coding sequences by the neighbor-joining method in ClustalW (Saitou and Nei, 1987; Chenna et al., 2003). The parameters used were for a slow, accurate tree with gap open penalty of 10, gap extension penalty of 0.05, and a Gonnet weight matrix for proteins for multiple alignments; a gap open penalty of 10, gap extension penalty of 0.1, and a Gonnet weight matrix for proteins for pairwise alignments; and bootstrapped 1,000 times. After the initial multiple alignment, individual clade alignments were checked using Multalin (Corpet, 1988; http://www-archbac.u-psud.fr/genomics/multalin.html). Matches to conserved regions within groups of family clades with suspect alignments were manually checked using InterProScan (Zdobnov and Apweiler, 2001; http://www.ebi.ac.uk/Tools/InterProScan/), and nonmatching members of the families were removed. Dendrograms were drawn using TreeDyn (Chevenet et al., 2006; http://www.treedyn.org/). The dendrograms exist as FLASH files with interactive links on our Web site (http://cellwall.genomics.purdue.edu).
To evaluate relative transcript abundances of phylogenetically classified maize cell wall genes in an expanding tissue (Figs. 1-8), maize cell wall genes were matched (greater than 85% identity) to 3′ UTR-anchored cDNA consensus sequences in the developing ovary dataset of Eveland et al. (2008) using BLASTN. Transcript abundance was quantified as the number of 454 sequence reads assigned to each consensus sequence (Eveland et al., 2008).

Rice and Arabidopsis Sequence Location and Protein Motif Analyses
Sequence locations and distances were verified using the Gbrowse designed for rice (http://www.jcvi.org) and Arabidopsis (http://www.arabidopsis.org). Proteins were checked for conserved motifs using ProScan (Zdobnov and Apweiler, 2001) from the European Bioinformatics Institute Web site (http://www.ebi.ac.uk/interpro/).

Mu Flank Matches to Maize Genes
A data set of transposon-flanking sequences was constructed from the B73 draft genome by extracting 1-kb segments flanking 2,725 independent, germinal insertion sites identified in UniformMu (MaizeGDB.org/documentation/UniformMu/). The flanking sequences were annotated by BLASTX searches of rice (Michigan State University Osa1; http://rice.plantbiology.msu.edu/) and Arabidopsis (Arabidopsis Genome Initiative) protein sequence databases using a cutoff expect value of 10^-7. Sequences related to individual maize genes for each family were found using BLASTN using an expect value of 10^-10 and a score of greater than 100. The results were parsed with a custom C++ script and unique Mu flanks matched with their best corresponding maize gene sequence. Some variation from exact matches was allowed due to comparing sequences of W22 with B73 inbreds; however, most expect values were well above 10^-100.
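Extracting flanking segments around an insertion site amounts to slicing the reference sequence on either side of a coordinate. The sketch below is a hedged illustration in Python, not the procedure used in the study: it assumes that 1 kb is taken on each side of the insertion (the text does not say whether the 1 kb is per side or in total), that coordinates are 0-based, and that segments are simply truncated at the ends of the sequence. The function name is hypothetical.

```python
def flanking_segments(sequence, insertion_index, flank=1000):
    """Return (left, right) segments of up to `flank` bases on each side
    of a 0-based insertion point in `sequence`.

    Segments are truncated when the insertion lies near either end.
    Taking 1 kb per side is an assumption made for this sketch.
    """
    if not 0 <= insertion_index <= len(sequence):
        raise ValueError("insertion_index lies outside the sequence")
    left = sequence[max(0, insertion_index - flank):insertion_index]
    right = sequence[insertion_index:insertion_index + flank]
    return left, right


# Toy 4-kb sequence standing in for a stretch of reference genome.
toy_genome = "ACGT" * 1000
left, right = flanking_segments(toy_genome, 1500)
```

In a real workflow, each pair of segments would be written out in FASTA format and used as a query in the BLASTX and BLASTN searches described above.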

Screening of the UniformMu Population by NIR
Over 2,200 F2 families of a segregating UniformMu population were sown in 16 to 20 seeds per 17-foot (5.2 m) row, with 30-inch-wide (76.2 cm) spacing between rows (over 40,000 plants) during the 2003 and 2004 seasons on irrigated fields, rotated with soybeans (Glycine max), at Purdue University's Agronomy Center for Research and Education in West Lafayette, Indiana. Plants were bar-code tagged with unique identifiers at the five-leaf stage, and visible segregant phenotypes were documented photographically. In a single day of collection, a crew of 16 workers excised 7.5-cm sections at mid-leaf blade from each plant at mid height (adult leaf 5 or leaf 6), mounted them flat with bar-code tags in glassine envelopes, and air dried them at 50°C. The majority of the viable plants were self-pollinated to preserve potential homozygous mutant lines.
NIR spectra (average of 30 spectral acquisitions) were obtained from dried leaf samples using a hand-held probe connected to a FieldSpec Pro NIR spectrometer (Analytical Spectral Devices). The reflectance of light from the leaf segment in the range between 350 and 2,500 nm was recorded relative to a Gore-Tex disc as white reference. Baseline-corrected and area-normalized data sets of the spectra were then used in the chemometric analyses. Most of the PCA was carried out using WIN-DAS software (Kemsley, 1998). Multivariate partial least squares and some of the PCAs were also carried out using Matlab 6.5.1 (The MathWorks). Linear discriminant analysis was used to develop a discriminative calibration model to classify spectra into groups. Mahalanobis distance was used as the distance metric to measure the distance of each observation (spectrum) from each group center. Linear discriminant analysis using squared Mahalanobis distance metrics was applied to the PCA scores of original data (Kemsley, 1998).
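The classification step, in which each spectrum is assigned to the group whose center is nearest in squared Mahalanobis distance, can be sketched as follows. This is a minimal Python illustration and not the WIN-DAS or Matlab code used in the study. It assumes that each spectrum has already been reduced to two PCA scores, that the groups share a pooled within-group covariance (as in classical linear discriminant analysis), and that the toy scores and group names are invented for the example.

```python
def group_mean(points):
    """Mean of a list of (pc1, pc2) score pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pooled_covariance(groups):
    """Pooled within-group covariance (sxx, sxy, syy) of 2-D scores."""
    sxx = sxy = syy = 0.0
    dof = 0
    for points in groups.values():
        mx, my = group_mean(points)
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        dof += len(points) - 1
    return (sxx / dof, sxy / dof, syy / dof)


def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance using the inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is not positive definite")
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def classify(point, groups):
    """Assign a point to the group with the nearest center."""
    cov = pooled_covariance(groups)
    dists = {name: mahalanobis_sq(point, group_mean(pts), cov)
             for name, pts in groups.items()}
    return min(dists, key=dists.get), dists


# Invented PCA scores for two groups of spectra.
groups = {
    "wild_type": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "mutant": [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)],
}
label, distances = classify((4.0, 4.5), groups)
```

Working on PCA scores rather than raw spectra keeps the covariance matrix small and invertible, which is one common reason for applying discriminant analysis after PCA.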

PyMBMS
A custom-built molecular beam mass spectrometer using an Extrel model TQMS C50 mass spectrometer was used for pyrolysis vapor analysis (Evans and Milne, 1987; Tuskan et al., 1999). The quartz pyrolysis reactor described previously was replaced with a commercially available pyrolysis unit and autosampler (Frontier model no. PY-2020 iD). Minor modifications were made to incorporate the autosampler inlet pyrolysis system onto the molecular beam mass spectrometer. Maize samples were ground to pass through a 2-mm screen and were placed into 80-µL stainless-steel sample cups. The samples were inserted into the pyrolysis oven with helium flowing through at 2 L min^-1 (at standard pressure and temperature). The autosampler furnace was electronically maintained at 500°C, and the interface was set to 350°C. A quarter-inch transfer line used to interface the autosampler pyrolysis unit to the molecular beam mass spectrometer was electronically heated to approximately 350°C. The total pyrolysis time was 2 min, although the pyrolysis reaction was completed in less than 20 s. The mass-to-charge ratio was set between m/z 30 and m/z 450.
The Unscrambler software program (CAMO; version 9.7) was used to normalize background-subtracted data based on total ion content and to perform PCA. The normalized triplicate MBMS spectra for each sample were averaged before PCA. Peak assignments (Supplemental Table S3) were made according to Evans and Milne (1987) and Boon (1989).
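The normalization and replicate averaging that precede PCA can be illustrated with a short sketch. This is Python written for illustration, not the Unscrambler workflow itself. It assumes that each spectrum is already background-subtracted and is stored as a mapping from m/z to intensity; the m/z values and intensities in the example are invented.

```python
def normalize_total_ion(spectrum):
    """Scale a background-subtracted spectrum so intensities sum to 1."""
    total = sum(spectrum.values())
    if total <= 0:
        raise ValueError("total ion intensity must be positive")
    return {mz: intensity / total for mz, intensity in spectrum.items()}


def average_replicates(replicates):
    """Normalize each replicate by total ion content, then average
    the normalized intensities at every m/z across replicates.

    An m/z missing from a replicate is treated as zero intensity.
    """
    normalized = [normalize_total_ion(r) for r in replicates]
    all_mz = sorted(set().union(*normalized))
    n = len(normalized)
    return {mz: sum(r.get(mz, 0.0) for r in normalized) / n for mz in all_mz}


# Three invented replicate spectra with two ions each.
triplicate = [
    {60: 2.0, 124: 2.0},
    {60: 1.0, 124: 3.0},
    {60: 3.0, 124: 1.0},
]
mean_spectrum = average_replicates(triplicate)
```

Normalizing before averaging means that a replicate with a larger sample mass, and therefore a larger total signal, does not dominate the mean spectrum.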

Reverse-Genetics Screens of DNA Grids
Eight grids were constructed for reverse genetics from the UniformMu population at the University of Florida. All seeds used for grids were Mu-off individuals, each carrying an estimated 10 unique, stable inserts, plus 50 other known inserts from W22 and parental sources. Each seed was also pedigreed, with ancestral information for seven generations. The screens focused on 50 cell wall-related genes in families hypothesized to have specific roles in grass-type cell walls.
We developed a two-tiered approach to these screens. The first was a "standard PCR screen" used for all 50 target genes, and the second was an "advanced screen" used where more intensive effort was warranted to identify a mutant. Standard screens were done with a Mu-specific primer plus a forward-directed, gene-specific primer to target approximately 1.5 kb of the upper coding sequence of each gene in x axis samples. Advanced screens expanded the target sequence to 2.5 kb, incorporated additional PCR protocols, and tested samples from both axes of the grids. This approach more than doubled the return of Mu inserts sought and was used to supplement the standard screen wherever possible (42% of grid screenings).
Sequences of the reference chromosomes have been deposited in GenBank as accession numbers CM000777 to CM000786. Sequences are also available at http://www.maizesequence.org.

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. The RGP gene family.
Supplemental Figure S3. The Cobra gene family.
Supplemental Figure S4. Examples of visible mutants selected from the UniformMu population.
Supplemental Table S1. Genes expressed in maize ovaries before pollination.
Supplemental Table S2. Currently available cell wall-related Mu knockout insertion lines.
Supplemental Table S3. Origins of m/z fragments from PyMBMS.

ACKNOWLEDGMENTS
We thank Bill Foster, Phil Devillez, and Javier Campos for technical expertise in the field operations and NIR screens at Purdue, and Wayne Avigne and Susan Latshaw for technical expertise and reverse-genetics PCR at the University of Florida. We thank Robert Sykes for technical assistance with the PyMBMS. We also thank Cristal Musser for her dedicated service as project coordinator and the many National Science Foundation Research Experience for Undergraduates and undergraduate research interns who contributed throughout many phases of this work.