An mRNA blueprint for C 4 photosynthesis derived from comparative transcriptomics of closely related C 3 and C 4 species

C 4 photosynthesis involves alterations to the biochemistry, cell biology, and development of leaves. Together these modifications increase the efficiency of photosynthesis, and despite the apparent complexity of the pathway, it has evolved at least 45 times independently within the angiosperms. To provide insight into the extent to which gene expression is altered between C 3 and C 4 leaves, and to identify candidates associated with the C 4 pathway we used massively-parallel mRNA sequencing of closely related C 3 ( Cleome spinosa ) and C 4 ( Cleome gynandra ) species. Gene annotation was facilitated by the phylogenetic proximity of Cleome and Arabidopsis. Up to 603 transcripts differ in abundance between these C 3 and C 4 leaves. This includes 17 transcription factors, putative transport proteins, as well as genes that in A. thaliana are implicated in chloroplast movement expansion, connectivity, and cell-wall modification. These are all characteristics known to alter in a C 4 leaf, but which previously had remained undefined at the molecular level. We also document large shifts in overall transcription profiles for selected functional classes. Our approach defines the extent to which transcript abundance in these C 3 and C 4 leaves differs, provides a blueprint for the NAD-ME C 4 pathway operating in a dicotyledon, and furthermore identifies potential regulators. We anticipate that comparative transcriptomics of closely related species will provide deep insight into the evolution of other complex traits.


Introduction
C 4 photosynthesis is a complex biological trait which enables plants to either accumulate biomass at a much faster rate or live in adverse environments compared with 'ordinary' plants (Hatch, 1987;Osborne and Freckleton, 2009). These C 4 plants have added a CO 2 concentration mechanism on top of their regular photosynthetic carbon fixation which makes them not only more efficient at assimilating inorganic carbon; they frequently also have higher water and nitrogen use efficiencies (Black, 1973;Oaks, 1994;Osborne and Freckleton, 2009). Beyond the basic biochemistry, our understanding of the C 4 photosynthesis is limited.
The principle of C 4 photosynthesis is deceivingly simple: instead of using Ribulose 1,5-Bisphosphate Carboxylase Oxygenase (RuBisCO) as the primary carbon fixing enzyme, C 4 plants use PEP carboxylase (PEPC). Unlike RuBisCO, PEPC is more specific for inorganic carbon (Hatch, 1987). Since the C 4 cycle is an add-on rather than a replacement for RubisCO and the Calvin-Benson Cycle, the prefixed CO 2 is transported in a bound form, a C 4 acid, hence the name, to the site of RuBisCO. The C 4 cycle generates high concentrations of CO 2 around RubisCO (Hatch, 1987) and this increases the rate of photosynthesis because competition between CO 2 and O 2 at the active site of RuBisCO is reduced (Jordan and Ogren, 1984). In most C 4 plants, concentrating CO 2 around RuBisCO involves the reactions of photosynthesis being partitioned between bundle sheath (BS) and mesophyll (M) cells as well as changes to cell biology and leaf development (Hatch, 1987;Sage, 2004), although in some lineages, C 4 photosynthesis operates within individual cells (Reiskind et al., 1989;Keeley, 1998;Voznesenskaya et al., 2001;Voznesenskaya et al., 2002;Voznesenskaya et al., 2003).
In all known C 4 plants CO 2 enters M cells and is converted into bicarbonate by carbonic anhydrase (CA). Phosphoenolpyruvate carboxylase (PEPC) then combines HCO 3 with phosphoenolpyruvate (PEP) to generate the C 4 oxaloacetic acid (OAA), which is rapidly converted into either aspartate or malate. These C 4 acids then diffuse to the site of RuBisCO through abundant plasmodesmata, where C 4 acid decarboxylases release CO 2 (Hatch, 1987).
Three distinct C 4 acid decarboxylases known as NADP-dependent malic enzyme (NADP-ME), NAD-dependent malic enzyme (NAD-ME) and phosphoenolpyruvate carboxykinase (PEPCK) have been co-opted into the C 4 pathway, and this has been used to define three biochemical subtypes of C 4 photosynthesis. The three carbon compound released after decarboxylation diffuses back to the M cells, and is converted to PEP catalysed by pyruvate,orthophosphate dikinase (PPDK) (Hatch and Slack, 1968). Because the enzymes involved in the C 4 cycle are found in the cytosol, chloroplasts and mitochondria, a significant amount of transport across organellar membranes is required for the C 4 cycle to operate.
However, few genes encoding transporters that allow the increased intracellular flux of metabolites required for C 4 photosynthesis have been identified (Bräutigam et al., 2008a;Majeran and van Wijk, 2009). In addition, we have a very limited understanding of mechanisms controlling the altered cell biology and morphology associated with C 4 leaves.
The C 4 cycle likely affects not only the relatively small number of enzymes and transport proteins needed to perform the core reactions but, given the consequences on the ecological performance of the plants, also a range of other processes.
The gaps in our understanding of mechanisms underlying C 4 photosynthesis limits insight into a metabolic pathway that has evolved repeatedly at least 45 times in plants (Sage, 2004), and so is of interest in terms of understanding a remarkable example of convergent evolution.
In addition, because C 4 plants are amongst the most productive on the planet and the pathway is associated with increased water and nitrogen use efficiencies (Brown, 1999), it has been suggested that characteristics of C 4 photosynthesis should be placed into C 3 crops (Matsuoka et al., 2001;Mitchell and Sheehy, 2006;Hibberd et al., 2008). A more complete understanding of genes involved in C 4 photosynthesis is fundamental to attempts at placing components of the C 4 pathway into C 3 crops to increase yield.  (Flicek and Birney, 2009;Bräutigam and Gowik, 2010). We chose to compare the C 4 plant Cleome gynandra with the C 3 plant Cleome spinosa since they are members of the same genus and are closely related to Arabidopsis thaliana (Brown et al., 2005;Marshall et al., 2007). Given the close phylogenetic relationship, we can take advantage of the well annotated Arabidopsis genome (Swarbreck et al., 2008) and its known genome history (Bowers et al., 2003;Haberer et al., 2004;Thomas et al., 2006) to identify and quantify the biological functions regulated at the level of transcript abundance in the C 4 species compared with the C 3 species. Although the experiment will also capture variation in the abundance of transcripts associated with differences between the species that do not relate to C 4 photosynthesis, the close proximity of the Cleome species should reduce this effect. We chose to use mature fully differentiated leaves for the analysis since we wanted to minimize the influence of species-specific effects during leaf differentiation but rather focus on transcript profiles when C 4 photosynthesis is fully operational. Once this profile is defined, analysis of developmental stages may reveal how the profile is achieved during differentiation.
By comparing the transcriptomes of closely related C 3 and C 4 species, we will test (i) whether cross species transcriptomic comparisons are feasible, (ii) the degree to which the core C 4 cycle enzymes and transport proteins are regulated at the level of transcript abundance, (iii) whether the changes in metabolism associated with C 4 photosynthesis are associated with additional unexpected shifts in transcript profiles in leaves of C 4 compared with C 3 plants, and (iv) define candidates for additional functions critical to C 4 photosynthesis based on unbiased observation of the data. By analyzing the complete transcriptome, we define the maximal extent to which the C 4 pathway alters leaf transcript profiles.

Physiological analysis of C 3 and C 4 leaves confirms C 4 metabolism in C. gynandra
To confirm that the C. spinosa and C. gynandra leaves we used for transcriptomic analysis were using C 3 and C 4 photosynthesis respectively, we analyzed the steady-state levels of metabolites associated with the C 4 cycle. For example, large quantities of aspartate, alanine and pyruvate are produced in M and BS cells of NAD-ME C 4 leaves, and they were 19, 3.9, and 3.6 times more abundant respectively, in C. gynandra compared with C. spinosa (Table   S1). In contrast, and in agreement with the lower demand for the photorespiration in C 4 leaves, glycerate and glycolate, intermediates of the photorespiratory cycle, were 4.5 and 1.9 times more abundant in C. spinosa (Table S1). We also determined the extractable activities of PEPC, aspartate aminotransferase (AspAT), NAD dependent malate dehydrogenase (NAD-MDH), NAD-ME and alanine aminotransferase (AlaAT). Except for NAD-MDH, significantly higher activities of the enzymes required for the C 4 cycle were measured in C.
gynandra leaf extracts ( Figure S1). The metabolite profiling of leaf extracts using GC-EI-TOF and the enzyme activity assays showed that the plants we used for digital gene expression analysis had clear differences in their metabolite profiles and enzyme activities, and these were consistent with functional C 3 and C 4 photosynthesis operating in leaves of C.

The leaf transcriptomes for closely related C 3 and C 4 species are qualitatively similar
To obtain sequence tags for digital gene expression (DGE) analysis from C. spinosa (C 3 ) and C. gynandra (C 4 ), RNA was isolated from mature leaves of each species and prepared for 454 sequencing. One sequencing run on a GS FLX sequencing system was conducted on leaf cDNA isolated from either C. gynandra or C. spinosa. From C. spinosa, we obtained 70,564,592 nucleotides (nts) and from C. gynandra 91,851,136 nts of raw sequence that after quality control corresponded to 65,525,139 nts and 85,681,233 nts, respectively ( Table 1).
The mean read-length of the cleaned sequence reads was 232 nts for C. gynandra and 230 nts for C. spinosa (Table 1).
To exclude program specific mapping artifacts and to test whether the C. gynandra and C.
spinosa libraries behave robustly similar during mapping, two different programs, BLAST and BLAT, were used to align the reads to Arabidopsis as the reference genome. To define the most suitable mapping parameters an array of parameters for mappings in both the DNA and protein space were tested (Table 2). Neither the C. gynandra nor the C. spinosa library mapped well to Arabidopsis cDNAs in the DNA space using BLAT or BLAST, although the differences are more dramatic for BLAT (Table 2). In the protein space, however, the proportion of mapped reads increased dramatically. When 75% amino acid sequence identity was required, three quarters of the reads could be mapped with BLAT, resulting in 1.48 and 1.57 average mappings per read, respectively. Even with the most lenient mapping parameters the proportion of mapped reads did not exceed 83% with BLAT and 78.8% with BLAST (Table   2). In all mapping attempts, the C. gynandra and the C. spinosa read libraries yielded qualitatively similar mapping results, irrespective of mapping program or parameters.
To obtain a stringent yet inclusive mapping, the mapping conducted in protein space at ≥ 75% identity with BLAT was chosen and this mapping file was parsed by in-house scripts to keep only the read match with the highest number of matching bases. For a more lenient mapping, a BLAST mapping at a cut-off of 1e -5 was chosen and parsed to keep only the best BLAST Hit for each read. For each AGI the number of matching reads was counted and the hit count was then transformed to Reads Per Million (RPM) to normalize for the number of reads available for each species. After parsing, the sequenced libraries matched between 50.5% and 55.3% of the genes in the Arabidopsis reference (Table S2).
To assess whether the datasets for the two different species and the two different mappings were qualitatively similar, we tested the coverage of the functional classes. Overall about 50% of all genes were represented in both species with the BLAT ( Figure 1A) and the BLAST mapping ( Figure 1B). Although the majority of gene classes were represented by more than 50% of genes in each class for both mappings the classes 'function unknown', 'putative lipid transfer protein', 'storage protein' and 'defence' were under-represented compared to all genes ( Figure 1A and B). Genes present in the organellar genomes were not well represented (Table S3). Genes classified into primary metabolism including 'photosynthesis', 'central carbon', 'nitrogen metabolism', 'amino acid' and 'nucleotide metabolism' as well as many cellular processes were well-represented categories, and about four-fifths of genes predicted to be involved in the C 4 pathway were detected in both species. Overall the pattern of detection in the different gene classes was similar for both species and independent of the program used for the mapping ( Figure 1A and B).

Transcripts of known C 4 genes are more abundant with one exception
Detailed analysis of known C 4 genes showed that all but one gene necessary for the core C 4 cycle of NAD-malic enzyme type plants were massively upregulated in C. gynandra compared with C. spinosa. Transcripts encoding PEP carboxylase were upregulated 78-fold, aspartate aminotransferase were upregulated 343-fold, NAD-malic enzymes upregulated 27fold and 21-fold, respectively, and alanine aminotransferase was upregulated 29-fold (Table   3). The results for the BLAT and the BLAST mappings were similar with one exception. In the BLAST mapping the reads mapping to PEPC were split onto two genes in the Arabidopsis reference genome whereas they mapped to only one gene in the BLAT mapping (Table 3).
Transcripts encoding mitochondrial malate dehydrogenases were increased only 1.3-fold (Table S3). Not only were genes associated with the C 4 pathway upregulated compared to C 3 but they also had high absolute read counts between 1,800 and 4,806 RPM.

The leaf transcriptomes for closely related C 3 and C 4 species are quantitatively different
Before undertaking detailed analysis of differences in transcript abundance between C.
gynandra and C. spinosa we used quantitative polymerase chain reactions (qPCR) to confirm estimates of transcript abundance identified by RNA-Seq. We chose genes whose transcript abundance differed over four orders of magnitude, and used qPCR to assess their abundance.
qPCR was performed on both the cDNA used for RNA-Seq and cDNA generated from RNA isolated from leaves in a separate experiment. This approach provided strong support for the differences in abundance of transcripts between the two species that we determined from RNA-Seq ( Figure 2). Overall, this showed that the ratios of transcript abundance obtained by RNA-Seq-based DGE are suitable for calling differentially expressed genes between two related species.
gynandra and 327/345 (1.5%/1.6%) transcripts being more abundant in C. spinosa ( Figure 3A and B, 'all'). We tested whether significantly changed transcripts are enriched in functional categories and whether they were more highly expressed in the C 4 or the C 3 species. While the qualitative classification of detected genes showed a very similar pattern between C.
spinosa and C. gynandra ( Figure 1A and B), the quantitative analysis revealed massive differences in representation between gene classes in the C 3 and the C 4 species ( Figure 3A and B). The transcript profile generated by the BLAT mapping ( Figure 3A) is similar to the one generated by the BLAST mapping ( Figure 3B), although not all genes called as significantly regulated were identical (Table S3). The classes containing the highest percentage of changed genes are the photosynthetic classes as well as the C 4 cycle, Calvin-Benson Cycle, and photorespiration ( Figure 3A and B). The latter two have lower steady-state mRNA levels in C 4 leaf tissue ( Figure 3A and B bottom) while the photosynthetic classes of photosystem I, cyclic electron flow, and cytochrome b6/f complex as well as the C 4 cycle have higher levels in C 4 leaf tissue ( Figure 3A and B top). A number of classes involved in primary metabolism also have lower steady state transcript levels in C 4 tissues: one carbon compound metabolism, other central carbon metabolism, shikimate pathway and amino acid metabolism. Protein synthesis also has lower steady state transcript levels which are limited to cytosolic and plastidic protein synthesis genes ( Figure S3). Among the classes with higher steady state transcript levels are starch metabolism, cofactor synthesis, putative lipid transfer proteins, nitrogen metabolism, and beta-1,3 glucan metabolism. The quantitative pattern ( Figure 3A and B) is similar to the qualitative pattern ( Figure 1A and B) with regard to the influence of the mapping program; the BLAT and BLAST mappings look remarkably similar with the exception of shikimate metabolism.

Transcripts with similar patterns of abundance compared with bona fide C 4 genes and RuBisCO
The list of 13,662 transcripts detected in either C. spinosa or C. gynandra tissues and the list of 603 transcripts which are differentially regulated between both species (Table S3, BLAST mapping) prompted us to determine which transcripts showed changes in abundance similar to the core C 4 genes or RuBisCO subunit encoding genes. Such transcripts display both a large fold-change between the C 3 and the C 4 plants and large absolute read numbers.
For example, among the transcripts encoding putative transport proteins, three plastidic transport proteins, the phosphoenolpyruvate phosphate translocator PPT, a putative bile acid: sodium symporter, and a putative proton:sodium antiporter, two mitochondrial dicarboxylate carriers and one plasmamembrane intrinsic protein were massively upregulated in C 4 C.
gynandra (Table 4). No transcripts encoding transport proteins are found to be downregulated to a comparable degree. Among metabolic genes, two cytosolic carbonic anhydrases, one of which (CA4; see Table 4) is likely tethered to the plasma membrane, an adenylate kinase and a pyrophosphatase are upregulated at levels comparable to those of C 4 cycle genes. Many proteins of unknown function showed differential expression, the most striking case being a putative lipid transfer protein, also annotated as an extensin-like protein. Based on annotation and differential expression pattern, several transcripts predicted to encode known C 4 functions that have not yet been assigned to genes, such as CHUP1 and actin for chloroplast positioning or callose degrading enzymes for regulating plasmodesmatal opening were identified (Table   4).

Regulatory genes that are significantly changed
The transcript profiles of these C 3 and the C 4 species identify a number of regulatory proteins that are candidates for maintaining C 4 status. Among transcripts encoding proteins with regulatory functions, 43 were significantly upregulated in either C. gynandra or C. spinosa ( Figure 3A and B). These include bona fide transcription factors, protein phosphatases and kinases, and the regulatory proteins of the pyruvate dehydrogenase complex (up in C 4 ), of PPDK (up in C 4 ) and of RuBisCO (down in C 4 ). Only seventeen transcription factors are significantly changed, seven of those have higher steady state mRNA levels compared to the C 3 leaf tissue while ten have lower steady state mRNA levels (Table 5).
In addition to the detailed quantitative and qualitative analysis of read mappings to generate ESTs for both species, contigs were assembled from cleaned reads for each species as described previously (Weber et al., 2007;Bräutigam et al., 2008b)  Since no read alignment program has emerged as the consensus program for NGS data analysis, two different programs were used for mapping and the output compared in all cases.
The output proved robust against changing the mapping program both qualitatively and quantitatively. When we mapped the quarter million reads obtained from each species of Cleome to a minimized TAIR 9 release of the A. thaliana genome they corresponded to ~11,000 loci. As the minimized TAIR 9 dataset contains 21,972 gene loci, the reads we collected in C. gynandra and C. spinosa represent approximately 50% of the transcriptome. In Likewise, certain pathways of secondary metabolism are likely restricted to defined tissues or developmental stages making it unlikely that we would pick up many of these genes when profiling leaf libraries. Based on the gene detection pattern, the two plant species did not encounter different biotic or abiotic stressors or are not in different stages of growth as very similar genes were detected in both species ( Figure 1A and B, Figure 3A and B).
Finally, only a very small proportion of transcripts showed significant differences in abundance between the two different species (Tables S2 and S3) and these changes were enriched in a limited number of functional classes ( Figure 3A and B). We conclude that cross species mapping in protein space is a feasible strategy to compare different species as long as an equidistant reference is available (Bräutigam and Gowik, 2010).
Transcripts derived from core C 4 cycle genes are more abundant in the C 4 species C 4 photosynthesis has evolved convergently in many different lineages of plants (Sage, 2004) and in many cases the alterations to expression of specific genes has been related to transcriptional regulation (summarized in Sheen, 1999). Our genome scale analysis allowed us to compare the steady state transcript levels for all candidate C 4 genes at the same time. For all of the enzymes where a change in total extractable activity could be shown ( Figure S1), a higher mRNA level of at least one isoform as judged from the read count is also present (  1997;Bräutigam et al., 2008a), is also upregulated 20-fold indicating that this transport protein is regulated at the level of mRNA abundance. Based on similarities in transcript abundance to known C 4 genes, our comparative RNA-seq also identified likely additional components needed for C 4 photosynthesis. When PPDK was characterized it was proposed that adenylate kinase as well as inorganic pyrophosphatase need to be abundant in C 4 chloroplasts (Hatch and Slack, 1968). RNA-seq confirmed this prediction and showed that the upregulation also occurs at the level of transcript abundance. Taken together, we found that almost all transcripts encoding the proteins required for the core C 4 cycle have higher steady state mRNA levels and we propose that, at least in C. gynandra, the activity of C 4 cycle enzymes and transport proteins is controlled at least partially at the level of transcript abundance. photorespiratory genes. The photosynthetic genes, the Calvin-Benson Cycle and photorespiratory genes (in C 3 ) and the C 4 cycle genes (in C 4 ) are those with the highest read counts of the genes with known function (Table S3). Although it is currently not possible to quantify absolute transcript levels since the genomes of neither Cleome species has been sequenced, the high read counts obtained for the genes of central carbon metabolism and photosynthesis indicate that the steady state level of transcripts are high. Since the most altered gene classes are also those that contain the genes with the highest absolute read count, it is not clear whether C 4 photosynthesis lowers or raises the demand on protein synthesis and accessory pathways such as amino acid synthesis. However, both the protein synthesis and the amino acid metabolism classes contain more genes that have lower steady state levels in C 4 leaf tissue (Figure 3). Within the protein synthesis gene class, many transcripts encoding structural components of plastidic and cytosolic ribosomes were reduced ( Figure S3). This was not the case for components of mitochondrial ribosomes ( Figure S3) indicating that there is not a general effect on translation but that the effect is likely specific to ribosomes involved in translation for the Calvin-Benson Cycle and photorespiration. The protein to fresh-weight ratio is also lower in C 4 leaf tissue compared to C 3 leaf tissue ( Figure S2). We propose that plastidic ribosomes are relieved of the high translation load associated with the large subunit of RuBisCO and the cytosolic ribosomes need to translate fewer transcripts associated with central carbon metabolism as well as the small subunit of RuBisCO. The reduced production of proteins in the leaves of C 4 plants is considered important in increasing nitrogen use efficiency because the rate of photosynthesis per unit nitrogen in the leaf is increased (Oaks, 1994). Our data indicate that there is also likely a significant saving in the nitrogen provision in the leaf because fewer ribosomes as well as fewer proteins for central carbon metabolism are required.
The dataset contains two additional gene classes, beta-1,3 glucan metabolism and putative lipid transfer proteins, that showed differences in transcript abundance between C. gynandra than C. spinosa that could be explained within the current framework of knowledge of C 4 photosynthesis. The C 4 pathway requires efficient exchange of metabolites between M and

BS cells via large numbers of plasmodesmata connecting both cell types while the BS cell
wall of many C 4 plants is suberized to reduce diffusion of CO 2 away from RuBisCO (Hatch, 1987). Transcripts encoding three distinct glucan 1,3-beta-glucosidases (Table 4)  There are additional changes in the transcript profile that are less easily explained. Among the gene classes containing more genes with significantly higher transcript levels in C 4 leaf tissue are starch metabolism, cofactor synthesis and nitrogen metabolism and heat shock/protein folding (in order of decreasing number of significantly different genes). On the other hand it is difficult to conceive why genes involved in metal handling are frequently lower in transcript level in C 4 leaf tissues ( Figure 3A and B). These changes may be connected to currently unknown phenomena relating to the C 4 pathway or be part of differences not relating to C 4 photosynthesis between the two species. Overall the global analysis of transcription on the level of functional classes reveals unexpected shifts in transcript profiles which can be explained based on the current knowledge about the C 4 pathway while a range of, albeit smaller, changes remain enigmatic.
Finally, our global transcriptional analysis of C 4 and C 3 leaf tissues not only allows testing hypotheses about the C 4 pathway on a global scale but also allowed genes with expression patterns similar to those of known C 4 genes to be identified. The phylogenetic proximity of the Cleomaceae to A. thaliana allows the identification of the orthologues in A. thaliana which will facilitate translational research into the model species (Brown et al., 2005).

Candidates for additional C 4 related genes
The identification of transport proteins involved in the C 4 cycle lags behind that of enzymes considering that the C 4 cycle requires the intracellular transport of pyruvate, PEP, aspartate and alanine across different organellar membranes (Bräutigam and Weber, 2010). A wide range of C 4 plants take up pyruvate into chloroplasts from the M in co-transport with sodium (Aoki et al., 1994;Aoki and Kanai, 1997), which might explain the requirement for sodium as a micronutrient in many C 4 species (Brownell and Crossland, 1972). Since the rate of pyruvate transport into C 4 M cell chloroplasts occurs at or exceeds the apparent rate of CO 2 assimilation, sodium-coupled pyruvate import implies a large influx of sodium into these chloroplasts, but the transporter has not yet been identified at the molecular level (Aoki and Kanai, 1997). Our finding that a putative plastidic proton:sodium symporter (NHD1) is 16fold up-regulated in C. gynandra prompts us to hypothesize that it functions in exporting sodium from the chloroplast in order to maintain the sodium gradient required for import of pyruvate. In addition, we found strong upregulation of a putative bile acid:sodium cotransporter in C. gynandra. Interestingly, up-regulation of the putative bile acid:sodium cotransporter or of NHD1 was not observed in maize (Bräutigam et al., 2008a), which belongs to a group of C 4 plants that show proton-dependent, not sodium-dependent transport of pyruvate into M cell chloroplasts (Aoki et al., 1994;Aoki and Kanai, 1997). PEP generated from pyruvate in M cell chloroplasts is exported from these chloroplasts by PPT, thereby providing the substrate for the cytosolic PEPC reaction. Accordingly, transcripts encoding PPT are 20-fold up-regulated in C. gynandra, likely reflecting the increased requirement for transport of PEP (Table 3). In contrast to what has been observed for the NADP-ME-type C 4 plant maize by quantitative proteomic analysis (Bräutigam et al., 2008a), we did not detect increased transcript abundance of the putative M chloroplast oxaloacetate/malate exchanger DiT1 (Taniguchi et al., 2002;Renne et al., 2003;Taniguchi et al., 2004) (Table S3). This is consistent with the fact that OAA/malate shuttling across the M cell chloroplast envelope membrane is not required for NAD-ME-type C 4 photosynthesis (Bräutigam and Weber, 2010; Weber and von Caemmerer, 2010). The mitochondrial dicarboxylate carriers are prime suspects for the C 4 acid importer into the mitochondria where decarboxylation takes place (Table 4). The initial uptake of inorganic carbon and its conversion to bicarbonate may be facilitated by the concerted action of a membrane intrinsic protein channelling the gas and a carbonic anhydrase that is predicted to be membrane bound (Table 4).
Chloroplasts in the BS of C. gynandra are larger than those in the BS of C 3 species and as in many other C 4 plants are positioned in a strictly centripetal pattern (Marshall et al., 2007;Voznesenskaya et al., 2007). Transcripts derived from the GC1 (GIANT CHLOROPLAST1) gene were more abundant in C. gynandra than in C. spinosa (Table 4)

Controlling and maintaining a C 4 state in leaf tissue
Our estimate that around 603 transcripts accumulate differentially in leaves of C 3 and C 4 species provides insight into the extent to which gene expression profiles change in C 4 leaves.
For example, the fact that 258 transcripts were more abundant in the leaves of the C 4 compared to the species indicates that about 2.8% of the leaf transcriptome differentially accumulates in C 4 leaves (Tables S2 and 6). To compare the complexity of the C 4 pathway with other multigenic traits, we assessed the number of transcripts that are known to be regulated by sugars, cold, diurnal and circadian rhythms as well as attack by pests and pathogens (Table 6). Interestingly, the alterations in transcript abundance of leaves of C.
gynandra compared with those of C. spinosa, were greater than those observed in response to cold treatment, and lower than those induced by glucose feeding, those occurring during pathogen attack, and of the response to both diurnal and circadian rhythms. As significant progress has been made in understanding sugar signalling (Rolland et al., 2006), pathogen attack (Wise et al., 2007) and the control of gene expression in response to the diurnal cycle and circadian rhythms (Imaizumi et al., 2007), it should be possible to identify the regulators responsible for these alterations in transcript abundance in a C 4 leaf compared with a C 3 leaf.
The changes in transcript abundance that we document in a C 4 leaf compared with a C 3 leaf likely over-represents the changes in transcript abundance actually associated with C 4 photosynthesis on a whole leaf basis, as some differences in gene expression are likely due to the phylogenetic distance between C. gynandra and C. spinosa. A more confident estimate of the extent to which the leaf transcriptome is altered in association with C 4 photosynthesis will be generated when additional con-generic pairs of C 3 and C 4 species are subjected to deep www.plantphysiol.org on August 29, 2017 -Published by Downloaded from Copyright © 2010 American Society of Plant Biologists. All rights reserved. transcriptome analysis, and shared transcripts are identified. Between M and BS cells, the alterations in gene expression may be greater than those that we have defined for whole leaves. For example, up to 18% of genes are estimated to be differentially expressed between M and BS cells of maize (Sawers et al., 2007). However, it is not clear how different the transcript profiles of M and BS cells are in a dicot C 3 leaf, and until this is defined, it is not possible to infer the extent to which transcript abundance alters in these cell-types in association with C 4 photosynthesis.
As we sampled from mature leaves to capture the differences between C 3 and C 4 leaves at the point of fully differentiated pathways we likely also captured regulatory genes needed to maintain C 4 architecture and metabolism in mature leaves. Of the seventeen transcription factors significantly altered (Table 5), GOLDEN2-LIKE1 (GLK1), has previously been implicated in regulating genes important in C 4 photosynthesis. In maize GOLDEN2 (G2) controls functional differentiation of chloroplasts in BS cells (Langdale and Kidner, 1994), and GOLDEN2-LIKE1 has been implicated in the expression of photosynthesis genes in M cells (Rossini et al., 2001). The fact that GLK1 transcripts are significantly more abundant in leaves of C. gynandra would not necessarily be predicted as previous work indicates it becomes specialised in BS cells of C 4 leaves, but not that its abundance is altered significantly. This implies that the increase in abundance of GLK1 transcripts may not simply be due to its involvement in C 4 photosynthesis. When over-expression of GLK1 was induced in A. thaliana the abundance of 114 transcripts were altered (Waters et al., 2009). We assessed the extent to which the genes that are controlled by GLK1 alter in abundance in leaves of C. gynandra compared to C. spinosa, and found that only 19 genes were shared between the two datasets. This may be due to a number of factors that could include the following; that there are differences in the targets of GLK1 in A. thaliana and C. gynandra; that a number of other transcriptional regulators are more important than GLK1 in maintaining patterns of photosynthesis gene expression in C. gynandra; or that a rapid Evolution did not stop in the lineage to the reference genome of A. thaliana after the Cleomaceae branch diverged. Hence there may be genes which were tandem duplicated or retained after the whole genome duplication event of the Brassicaceae that are absent in either of the Cleomaceae (Bräutigam and Gowik, 2010). To avoid mapping problems such as splitting of reads or mapping errors due to differential retention of genes in either Cleomaceae or Arabidopsis, we created a minimal genome for mapping. The remnants of the last whole genome duplication in the lineage of the Brassicaceae (Bowers et al., 2003;Thomas et al., 2006) and the tandem duplicated genes (Haberer et al., 2004) were reduced to one representative for each based on the TAIR9 coding sequence set. In each case, the gene with the lowest AGI code was retained for mapping. For each gene, the supplemental files store whether there are duplicates and which duplicates match the gene (Tables S3 and S5). We recommend recovery of the associated duplicated genes followed by a detailed analysis with phylogenetic trees to define the true orthologue when translating the results of Cleomaceae analyses to Arabidopsis research.
The 454 sequence reads were mapped onto coding sequences of the minimalized TAIR9 genome by BLAT (Kent, 2002) and BLAST (Altschul et al., 1997) with varying parameters and the output was parsed with in-house PERL scripts to retain only the best matching AGI for each sequence read and the best BLAST hit, respectively. Differentially expressed transcripts were identified using the Poisson statistics developed by Audic and Claverie (Audic and Claverie, 1997) followed by a Bonferroni-correction to account for the accumulation of alphatype errors when conducting multiple pair-wise comparisons (Audic and Claverie, 1997).

Plant material and qPCR analysis
Both species were grown in a growth chamber in long day conditions (16 hrs light / 8 hrs dark) under 350 µmol photons m -1 s -1 , at 22°C, and 65% relative humidity prior to samples being taken for qPCR. qPCR was conducted on the same samples used for RNA-Seq, and also on mature leaves collected at noon grown in the growth cabinet. For qPCR RNA was isolated using TriPure reagent (Roche Applied Science). RNA was treated with DNase I (Promega) and purified with RNeasy Mini Kit (Qiagen). First-strand cDNA was then synthesised with SuperScriptII reverse transcriptase (Invitrogen) using 4 µg of RNA and oligo(dT) primers (Roche Applied Science). qRT-PCR was carried out with 96-well plates using a DNA Engine thermal cycler, Chromo4 real-time detector (BioRad), SYBR Green JumpStart Taq Ready Mix (Sigma) and 15-fold dilution of the cDNA as a template. Initial denaturation was carried out at 94°C for 2 min, then followed by 40 cycles of 94°C for 20 s, 60°C for 30 s, 72°C for 30 s, and 75°C for 5 s. Primers were designed to have melting temperatures of 60±0.5°C and to produce amplicons of 91 to 189 bp. The specificity of the primers and lack of primer dimers in the PCR reaction were verified using agarose gel electrophoresis and melting curve analysis. C T values were generated for three technical replicates and four independent biological replicates per species. The comparative 2 -∆∆CT method was used to quantify relative abundance of transcripts (Livak and Schmittgen, 2001).

Polar metabolite, chlorophyll, protein and enzyme activity analysis
For metabolite analysis, mature leaves from 56-day-old plants were collected in the middle of the light period and immediately frozen in liquid nitrogen. Three independent biological replicates were used. The tissues were ground in a mortar and a 50 mg fresh weight aliquot was extracted using the procedure described by Lee and Fiehn (Lee and Fiehn, 2008). Ribitol was used as an internal standard for data normalization. For GC-EI-TOF analysis, samples were processed and analyzed according to Lee and Fiehn (Lee and Fiehn, 2008). Enzyme activities, chlorophyll and protein content were determined according to (Hausler et al., 2001).

Supplemental Material
The following materials are available in the online version of this article. Table S1. Relative abundance of predominant metabolites detected by GC-EI-TOF in C. gynandra and in C. spinosa confirms a functional C 4 pathway in the leaves of C. gynandra used for analysis of transcript abundance. Table S2. Number of gene loci and number of differentially expressed genes detected with two different mapping programs, BLAT and BLAST. Table S3. Excel worksheet providing quantitative information for all reads mapped onto the reference genome from Arabidopsis thaliana. Table S4. Comparison of mapping parameters with two different programs, BLAST and BLAT, using a minimized and a complete reference genome. Table S5. Segmental and tandem duplicates in the Arabidopsis thaliana genome. Figure S1. Quantitation of marker enzyme activities in leaf extracts of C. spinosa and C. gynandra. PEPC, PEP carboxylase; NADME, NAD-dependent malic enzyme; PEPCK, PEP carboxykinase; AlaAT, alanine aminotransferase; AspAT, aspartate aminotransferase; NADMDH, NAD-dependent malate dehydrogenase. Figure S2. Protein to fresh weight and protein to chlorophyll ratios in leaves of C. gynandra and C. spinosa.

Supplemental
Supplemental Figure S3. The percentage of genes encoding ribosomal components with significantly lower abundance of transcripts in C 4 (blue bars), or significantly higher abundance of transcripts in C 4 (red bars) based on the total number of genes in each annotation class (in parentheses on the y axes).      Tables   Table 1. Massively parallel signature sequencing allows large-scale assembly of transcripts in both C. spinosa and C. gynandra after comparison with the TAIR8 Arabidopsis database. One GS FLX sequencing run allowed significant generation of sequence for both species and the vast majority of these could be used to assemble contigs, and then matched to Arabidopsis genes.