Expression quantitative trait loci analysis of two genes encoding Rubisco activase in soybean.

Rubisco activase (RCA) catalyzes the activation of Rubisco in vivo and plays a crucial role in photosynthesis. However, until now, little was known about the molecular genetics of RCA in soybean (Glycine max), one of the most important legume crops. Here, we cloned and characterized two genes encoding the longer α-isoform and the shorter β-isoform of soybean RCA (GmRCAα and GmRCAβ, respectively). The two corresponding cDNAs are divergent in both the translated and 3′ untranslated regions. Analysis of genomic DNA sequences suggested that the corresponding mRNAs are transcripts of two different genes rather than products of a single alternatively spliced pre-mRNA. Two additional putative α-form RCA-encoding genes, GmRCA03 and GmRCA14, and one additional β-form RCA-encoding gene, GmRCA11, were also isolated. To examine the function and modulation of RCA genes in soybean, we determined the expression levels of GmRCAα and GmRCAβ, Rubisco initial activity, photosynthetic rate, and seed yield in 184 soybean recombinant inbred lines. Correlations of gene expression levels with the three other traits indicate that RCA genes could play an important role in regulating soybean photosynthetic capacity and seed yield. Expression quantitative trait loci mapping revealed four trans-expression quantitative trait loci for GmRCAα and GmRCAβ. These results could provide a new approach for modulating RCA genes to improve photosynthetic rate and plant growth in soybean and other plants.

Photosynthesis is a major target for improving crop productivity, and considerable research has been carried out to select and breed genotypes with a superior photosynthetic rate (PN; Sinclair et al., 2004). In higher plants, photosynthesis is usually limited at the step of CO2 assimilation catalyzed by Rubisco (Hartman and Harpel, 1994; Spreitzer and Salvucci, 2002). Rubisco activity is regulated by complex mechanisms in vivo. Numerous studies have shown that Rubisco can be maintained in an active state by the continued action of a second protein, Rubisco activase (RCA; Portis, 2003). RCA activity is thought to be a key regulatory point for photosynthesis under different environmental stress conditions (Crafts-Brandner and Salvucci, 2000; Pollock et al., 2003). Plants expressing reduced levels of RCA exhibit decreased PN and/or growth (Mate et al., 1996; Eckardt et al., 1997; He et al., 1997), and those with very low or no RCA expression cannot survive in atmospheric CO2 (Somerville et al., 1982; Salvucci et al., 1985, 1986; Mate et al., 1993; von Caemmerer et al., 2005). These results make the modulation of RCA an attractive experimental goal for improving CO2 fixation rates and, ultimately, crop productivity.
RCA is an AAA+ (ATPases associated with a variety of cellular activities) protein that functions like a molecular chaperone (Sanchez de Jimenez et al., 1995), catalyzing the activation of Rubisco in vivo by the ATP-dependent removal of various inhibitory sugar phosphates (Portis, 2003). Based on many mutagenesis studies of RCA and/or Rubisco, Portis et al. (2008) proposed the following model for the mechanism of RCA action. First, RCA binds to Rubisco through electrostatic and other interactions involving amino acid region 89 to 94 of Rubisco and amino acid region 311 to 314 of RCA. Second, ATP hydrolysis promotes movement of the C-terminal sensor-2 domain of RCA (which includes amino acid region 311 to 314), with the Arg residue in the sensor-2 domain possibly establishing this coupling. Third, because of the interaction at amino acid positions 89 to 94, and probably elsewhere, the N-terminal domain of Rubisco moves accordingly, which could break the interactions among Glu-60 in the N-terminal domain of Rubisco, Lys-334 in loop 6, and the bound sugar phosphate. Finally, loop 6 becomes free to move out of the active site, and the bound sugar phosphate is free to dissociate. In this way, RCA frees the active sites of Rubisco for spontaneous carbamoylation by CO2 and metal binding, thereby activating the Rubisco holoenzyme. Activated Rubisco catalyzes the carboxylation of ribulose 1,5-bisphosphate to form two molecules of 3-phosphoglycerate when CO2 is ample (Portis et al., 2008).
RCA gene expression appears to be tissue specific in all higher plants examined: it occurs almost exclusively in the green parts of the plant and is developmentally regulated by leaf age and light (Watillon et al., 1993; Liu et al., 1996). For example, circadian oscillations of RCA mRNA levels have been detected in tomato (Solanum lycopersicum), apple (Malus domestica), Arabidopsis, and rice (Martino-Catt and Ort, 1992; Watillon et al., 1993; Liu et al., 1996; To et al., 1999). Changes in mRNA levels may result from transcriptional regulation, posttranscriptional regulation, or both (Chen and Rajewsky, 2007). A nuclear run-on analysis showed that in Arabidopsis the rhythmic oscillation is controlled at the transcriptional level, with RCA mRNA synthesis correlating with RCA mRNA accumulation (Pilgrim and McClung, 1993).
For many years, quantitative trait locus (QTL) analysis has been used to detect the determinants of important agronomic and physiological traits (Mouille et al., 2006; Cui et al., 2008; Tisne et al., 2008), providing valuable information for gene discovery and crop improvement. Diversity in gene expression is one of the mechanisms underlying phenotypic diversity among individuals. Therefore, analyzing the determinants of candidate gene expression not only helps in understanding the mechanisms of phenotypic variation but also provides a way to improve phenotypes by modulating gene expression. With advances in gene expression profiling, an approach named "genetical genomics" has been put forward to identify the determinants of gene expression (Jansen and Nap, 2001). This approach treats mRNA expression levels as quantitative traits in a segregating population and maps expression QTLs (eQTLs) that control expression levels in vivo. For almost any gene analyzed in a segregating population, eQTL analysis can identify the genomic regions influencing its expression level. The genetical genomics approach has been used to identify eQTLs regulating gene expression (Potokina et al., 2006; Sladek and Hudson, 2006). Soybean (Glycine max) is one of the most important legume crops and a typical allotetraploid (Shoemaker et al., 2006). However, to date, the only information concerning RCA in soybean is the existence of two RCA isoforms detected by immunoblotting (Salvucci et al., 1987). In this study, we cloned and characterized two soybean RCA genes, GmRCAα and GmRCAβ, encoding the longer α-isoform and the shorter β-isoform of RCA, respectively. We also isolated three soybean RCA-like genes and analyzed their phylogenetic relationships to GmRCAα and GmRCAβ.
Correlation analysis of RCA gene expression levels, Rubisco initial activity, PN, and seed yield in a set of soybean recombinant inbred lines (RILs) showed that RCA gene expression levels could affect photosynthetic capacity and plant growth, and eQTL mapping revealed four trans-eQTLs for GmRCAα and GmRCAβ. The presence of multiple RCA family members and their different expression patterns, viewed in the context of long-term genome duplication, offer insights into soybean evolution. Taken together, these data provide new information for modulating RCA genes to improve PN and, ultimately, seed yield in soybean and other plants.

Characterization of GmRCAα and GmRCAβ cDNAs
The cDNAs of GmRCAα and GmRCAβ contained a complete open reading frame (ORF) and partial 3′ untranslated sequences. The predicted proteins encoded by GmRCAα and GmRCAβ contained 478 and 443 amino acids, with calculated molecular masses of 52.29 and 48.63 kD, respectively (Fig. 1). The first 58 amino acids at the N terminus of both proteins were predicted to be chloroplast transit peptides (ChloroP version 1.1 server; http://www.cbs.dtu.dk/services/ChloroP/). Thus, the predicted mature proteins encoded by GmRCAα and GmRCAβ contained 420 and 385 amino acids, with calculated molecular masses of 46.76 and 43.14 kD, respectively (Fig. 1). The deduced protein sequence of GmRCAα contained a 36-amino acid extension at the C terminus (Fig. 1, in boldface), including two Cys residues known to be involved in redox regulation (Zhang and Portis, 1999; Salvucci et al., 2003). Two conserved ATP-binding domains, GGKGQGKS and LFIND (Shen and Ogren, 1992), were identified at amino acid positions 169 to 176 and 229 to 233, respectively, in soybean RCA; here, numbers correspond to amino acid positions in the complete GmRCAα protein sequence (Fig. 1). According to studies in spinach and tobacco (Shen et al., 1991; van de Loo and Salvucci, 1998), Lys-175 in the first domain of soybean RCA is associated with Rubisco activation and ATPase activities, and Asp-233 in the second domain is necessary for the precise coordination of the γ-phosphate and, therefore, for subunit aggregation.
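The mature-protein lengths follow from removing the 58-residue transit peptide (478 - 58 = 420 and 443 - 58 = 385), and each molecular mass is the sum of the residue masses plus one water. The sketch below illustrates that calculation under stated assumptions: it uses commonly tabulated average residue masses (ExPASy-style values), ignores post-translational modifications, and is not the authors' tool, so its output for the real sequences may differ slightly from the values in Fig. 1. The function names are illustrative only.

```python
# Average residue masses (Da) of the 20 standard amino acids, i.e. the free
# amino acid mass minus one water (commonly tabulated ExPASy-style averages).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water restores the free N- and C-terminal groups


def average_mass_kda(seq: str) -> float:
    """Average molecular mass (kD) of an unmodified polypeptide."""
    seq = seq.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


def mature_length(full_length: int, transit_peptide: int = 58) -> int:
    """Residues left after cleaving an N-terminal transit peptide."""
    return full_length - transit_peptide


print(mature_length(478), mature_length(443))
```

Applied to a full deduced sequence, `average_mass_kda` gives the precursor mass, and applied to the sequence after the cleavage site it gives the mature-protein mass.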
In addition to the 36-amino acid C-terminal extension of the α-isoform, the two RCA isoforms differ at 41 positions, including eight residues in the putative transit peptides (Fig. 1). Clear differences were also observed between the 3′ untranslated regions of the GmRCAα and GmRCAβ cDNAs, although only partial sequences were obtained (Fig. 2). The divergence in both the ORF and the 3′ untranslated regions of the cDNAs encoding the RCA α- and β-isoforms suggests that the two forms of soybean RCA are encoded by different genes.

Analysis of Genomic Fragments of GmRCAα and GmRCAβ Genes
Soybean genomic DNA was analyzed to determine whether the GmRCAα and GmRCAβ mRNAs could arise from a single alternatively spliced gene. Amplification of genomic DNA using gene-specific primers (Supplemental Table S1) yielded PCR fragments of 3,533 bp for GmRCAα and 3,704 bp for GmRCAβ. As shown in Figure 3, the exon, intron, and flanking sequences (including the 3′ and 5′ regions) of the GmRCAα genomic fragment were 1,434, 2,057, and 42 bp long, respectively, whereas those of the GmRCAβ genomic fragment were 1,329, 2,175, and 200 bp. Alignment analysis (Spidey; http://www.ncbi.nlm.nih.gov/spidey/) showed that the genomic sequences of GmRCAα and GmRCAβ precisely matched their corresponding cDNA ORFs and contained six and five introns, respectively.
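The reported fragment lengths can be cross-checked with simple arithmetic: exon, intron, and flanking lengths should sum to the amplicon length, and in both genes the exonic length equals exactly three times the number of predicted residues (478 × 3 = 1,434; 443 × 3 = 1,329). The sketch below encodes those checks; the function name is hypothetical, and the exon count assumes every intron lies inside the coding region, so n introns split it into n + 1 exons.

```python
def check_genomic_fragment(total_bp, exon_bp, intron_bp, flank_bp,
                           n_introns, n_residues):
    """Consistency checks for one intron-containing genomic fragment."""
    return {
        # exon + intron + flanking lengths should add up to the amplicon
        "parts_sum_to_total": exon_bp + intron_bp + flank_bp == total_bp,
        # exonic length versus three nucleotides per predicted residue
        "exon_bp_is_3x_residues": exon_bp == 3 * n_residues,
        # assumes all introns are internal to the coding region
        "n_exons": n_introns + 1,
    }


# Lengths as reported in the text (Fig. 3) and residue counts from Fig. 1.
FRAGMENTS = {
    "GmRCAα": dict(total_bp=3533, exon_bp=1434, intron_bp=2057,
                   flank_bp=42, n_introns=6, n_residues=478),
    "GmRCAβ": dict(total_bp=3704, exon_bp=1329, intron_bp=2175,
                   flank_bp=200, n_introns=5, n_residues=443),
}

for name, values in FRAGMENTS.items():
    print(name, check_genomic_fragment(**values))
```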
Comparison of the genomic sequences of GmRCAα and GmRCAβ using the DNAMAN program (http://www.lynnon.com) showed that the two sequences matched poorly, with only 40.6% identity. BLAST analysis against the soybean genome sequence (http://www.phytozome.com/soybean.php) showed that GmRCAα and GmRCAβ are located on chromosomes Gm02 and Gm18, respectively. These findings are consistent with the presence of two genes encoding the RCA α- and β-isoforms in soybean.
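Percent identity depends on how the alignment is scored and which columns are counted, and DNAMAN's exact convention is not stated in the text. The sketch below assumes a pre-computed pairwise alignment with gaps written as "-" and counts identity over columns where neither sequence has a gap; other programs divide by the alignment length or by the shorter sequence, which gives different numbers for the same pair. A column-wise comparison of this kind can also count differing positions, as in the 41 amino acid differences noted for the two isoforms. The function name and toy sequences are hypothetical.

```python
def compare_aligned(a: str, b: str, gap: str = "-"):
    """Compare two aligned sequences column by column.

    Returns (percent identity over gap-free columns, number of mismatched
    gap-free columns). Columns with a gap in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0, 0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs), len(pairs) - matches


# Toy alignment for illustration only, not soybean data.
print(compare_aligned("ACGTAC-T", "ACCTACGT"))
```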

Western-Blot Analysis Detected Two RCA Isoforms
Extracts from soybean and Arabidopsis leaves, Escherichia coli transformed with pET-30a expressing the ORF of GmRCAα or GmRCAβ, E. coli transformed with empty pET-30a, and untransformed E. coli were separated by SDS-PAGE and probed with polyclonal Arabidopsis RCA antibodies. As shown in Figure 4, two polypeptides of approximately 47 and 43 kD were detected in Arabidopsis (lane 1), consistent with previous findings (Salvucci et al., 1987). Two polypeptides with similar molecular masses were also detected in soybean (lanes 4 and 5), again consistent with the observations of Salvucci et al. (1987), with one difference: we detected higher levels of the soybean RCA α-isoform than of the β-isoform, whereas Salvucci et al. (1987) detected higher levels of the β-isoform. Because we used an anti-Arabidopsis RCA antibody and Salvucci et al. (1987) used an anti-spinach RCA antibody, this discrepancy might reflect a difference in affinity between the two antibodies. However, differences in soybean genotype or leaf developmental stage between the two studies cannot be excluded.

Figure 1. Alignment of amino acids deduced from GmRCAα and GmRCAβ cDNAs. Numbers indicate amino acid positions in the putative GmRCAα protein sequence: regular numbers for the complete protein sequence and boldface numbers for the mature protein. The arrow indicates the putative cleavage site of the putative transit peptide. The two conserved ATP-binding domains are marked with a solid line and double lines, respectively. The sequence in boldface indicates the additional amino acids at the C terminus of GmRCAα. Diamonds indicate the conserved Cys residues involved in redox regulation. Dots in the GmRCAα sequence represent gaps introduced to optimize the alignment. Dashes in the GmRCAβ sequence represent amino acids identical to those of GmRCAα.
As shown in Figure 4, both recombinant proteins, GmRCAα and GmRCAβ, were immunoreactive with the anti-Arabidopsis RCA antibody (lanes 2 and 3), whereas the controls showed no specific bands (lanes 6 and 7). Recombinant GmRCAα was approximately 4 kD larger than GmRCAβ. Because both proteins carried an additional N-terminal His tag of 5.31 kD from the expression vector pET-30a, their molecular masses were larger than those of the native RCA α- and β-isoforms in soybean leaves (lanes 4 and 5). These results confirm that GmRCAα and GmRCAβ encode the soybean RCA α- and β-isoforms, respectively.

Sequence Analysis of Three Additional RCA-Like Genes in Soybean
BLAST analysis of the GmRCAβ ORF sequence against the soybean genome sequence (http://www.phytozome.com/soybean.php) revealed three regions, on chromosomes Gm03, Gm11, and Gm14, that may contain additional RCA genes. The corresponding cDNAs, tentatively designated GmRCA03, GmRCA11, and GmRCA14, were cloned using gene-specific primers (Supplemental Table S1) and sequenced. The calculated molecular masses of the predicted proteins encoded by GmRCA11, GmRCA03, and GmRCA14 were 48.67, 52.28, and 52.43 kD, respectively. Because these values are close to those of GmRCAβ (48.63 kD) and GmRCAα (52.29 kD), GmRCA11 may encode an RCA β-isoform, and GmRCA03 and GmRCA14 may encode RCA α-isoforms.
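The isoform assignment above amounts to a nearest-reference comparison of predicted precursor masses. A minimal sketch, assuming the GmRCAα and GmRCAβ precursor masses from Fig. 1 as the two references; the function name is hypothetical, and mass alone is only suggestive, since the presence or absence of the C-terminal extension would be the more decisive feature.

```python
# Precursor masses (kD) reported for the two characterized isoforms (Fig. 1).
REFERENCE_KDA = {"alpha": 52.29, "beta": 48.63}


def nearest_isoform(mass_kda: float, refs: dict = REFERENCE_KDA) -> str:
    """Return the reference isoform whose predicted mass is closest."""
    return min(refs, key=lambda name: abs(refs[name] - mass_kda))


# Predicted masses of the three additional genes, as given in the text.
for gene, mass in {"GmRCA11": 48.67, "GmRCA03": 52.28,
                   "GmRCA14": 52.43}.items():
    print(gene, nearest_isoform(mass))
```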
The phylogenetic relationships among GmRCAα, GmRCAβ, GmRCA03, GmRCA11, and GmRCA14 were inferred from the sequences encoding the deduced mature polypeptides (Fig. 5). GmRCA03 is sister to a lineage comprising GmRCAα, GmRCAβ, GmRCA11, and GmRCA14, within which GmRCAα and GmRCA14 form one sublineage and GmRCAβ and GmRCA11 form another. However, only the latter sublineage (GmRCAβ and GmRCA11) received strong bootstrap support.

Correlation among Gene Expression, Rubisco Initial Activity, PN, and Seed Yield
Gene-specific primers (Fig. 2; Supplemental Table S1) were used to determine the expression levels of GmRCAα and GmRCAβ. Because the expression of housekeeping genes varies inherently, choosing a proper endogenous reference gene is indispensable for accurate normalization of mRNA samples. For reliable estimation of the relative expression level of a target gene, the reverse transcription (RT)-PCR amplification efficiency of the endogenous reference gene should equal that of the target gene. In this study, the soybean tubulin gene was used as an endogenous reference to control for sample-to-sample variation in the amount of cDNA (Potokina et al., 2006), because its amplification efficiency was similar to those of both GmRCAα and GmRCAβ (as reflected by the slopes of the lines shown in Fig. 6).
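The efficiency check rests on the standard-curve relation between the slope of Ct against log10(template amount) and amplification efficiency, E = 10^(-1/slope) - 1, so a slope near -3.32 corresponds to roughly 100% efficiency (doubling per cycle). The sketch below assumes that relation and an efficiency-corrected ΔCt normalization against the reference gene; the text does not state which quantification formula was used, and the function names and Ct values are hypothetical.

```python
import math


def fit_slope(log10_amounts, cts):
    """Least-squares slope of Ct against log10(template amount)."""
    n = len(cts)
    mx = sum(log10_amounts) / n
    my = sum(cts) / n
    sxx = sum((x - mx) ** 2 for x in log10_amounts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_amounts, cts))
    return sxy / sxx


def efficiency(slope):
    """Amplification efficiency from a standard-curve slope (1.0 = 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def relative_expression(ct_target, ct_ref, e_target=1.0, e_ref=1.0):
    """Target quantity normalized to the reference gene in the same sample.

    Starting quantity is proportional to (1 + E) ** -Ct, so this ratio
    cancels differences in the amount of cDNA loaded per sample.
    """
    return (1 + e_target) ** (-ct_target) / (1 + e_ref) ** (-ct_ref)


# Hypothetical 10-fold dilution series, about 3.3 cycles apart.
slope = fit_slope([0, 1, 2, 3], [30.0, 26.7, 23.4, 20.1])
print(round(slope, 2), round(efficiency(slope), 3))
print(relative_expression(ct_target=22.0, ct_ref=18.0))
```

Comparing `efficiency` for the reference and target genes is the check the text describes; when the two efficiencies are close, the ratio reduces to the familiar 2^-ΔCt form.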
Gene expression levels of GmRCAα and GmRCAβ, Rubisco initial activity, PN, and seed yield were measured in a set of RILs derived from soybean cv Nannong1138-2 and cv Kefeng No.1 (Supplemental Table S2), and correlations among these five traits were analyzed (Table I). The expression level of GmRCAα was strongly and positively correlated with that of GmRCAβ, and the expression levels of both genes correlated positively with Rubisco initial activity, PN, and seed yield. Correlations between GmRCAβ expression and the other traits were stronger than those for GmRCAα. Rubisco initial activity showed a significant positive correlation with PN and seed yield, and PN correlated positively with seed yield.
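Table I reports correlation coefficients with significance levels. The sketch below assumes Pearson's product-moment correlation (the text does not name the coefficient) and computes the t statistic, t = r·sqrt((n - 2)/(1 - r²)), that is conventionally compared with a Student t distribution with n - 2 degrees of freedom to obtain P; with all 184 RILs measured, n - 2 = 182. The function names and toy values are hypothetical, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy per-line values (hypothetical): expression level vs. seed yield.
expression = [1.0, 1.4, 0.8, 1.9, 1.2, 1.6]
seed_yield = [2.1, 2.6, 2.0, 2.7, 2.2, 2.9]
r = pearson_r(expression, seed_yield)
print(round(r, 3), round(t_statistic(r, len(expression)), 2))
```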

eQTL and QTL Mapping
Expression levels of GmRCAα and GmRCAβ, Rubisco initial activity, PN, and seed yield in the RILs followed continuous distributions, consistent with quantitative genetic variation (data not shown). Subsequent QTL analyses were performed on the mean value of each trait for each RIL (Table II; Fig. 7). We detected 12 eQTLs or QTLs across all traits examined; each trait was controlled by two or three eQTLs or QTLs dispersed among the chromosomes. Individual eQTLs or QTLs explained 4.24% to 12.58% of the total phenotypic variation (r2) of a given trait, and one-third of the eQTLs or QTLs had r2 values exceeding 10%. eQTLs or QTLs with positive and negative allelic effects were
Two eQTLs on linkage group (LG) A1 were detected for GmRCAα (Table II; Fig. 7). eQTL qRaA1.1 explained 10.19% of the total phenotypic variance, with Kefeng No.1 alleles increasing GmRCAα expression at this locus. eQTL qRaA1.2 explained 9.56% of the phenotypic variance, with positive alleles from Nannong1138-2. For GmRCAβ, two eQTLs were detected, on LG-A2 and LG-I. The eQTLs qRbA2.1 and qRbI.1 explained 9.20% and 12.58% of the total phenotypic variance, respectively. Additive effect values indicated that Kefeng No.1 alleles were positive for qRbI.1 but not for qRbA2.1. None of the eQTLs colocalized with any of the QTLs described below for Rubisco initial activity, PN, or seed yield, or with QTLs identified for agronomic and resistance traits using the same mapping population in other studies (Chen et al., 2008; Cui et al., 2008).
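The r2 values and allele directions above come from QTL analysis of RIL means; the table notes refer to marker intervals and LOD peaks, but the mapping algorithm is not detailed here. As a simplified illustration only, the sketch below performs single-marker analysis at one marker in a RIL population, where lines are essentially homozygous for one parent's allele: the additive effect is half the difference between the two genotype-class means, r2 is the fraction of phenotypic variance explained by the marker, and LOD = (n/2)·log10(RSS0/RSS1) compares the marker model with the no-marker model. The genotype codes, function name, and data are hypothetical.

```python
import math


def single_marker_scan(genotypes, phenotypes):
    """Single-marker QTL test for a RIL population.

    genotypes: 'A' (parent A allele) or 'B' (parent B allele) per line;
    any other code is treated as missing and that line is skipped.
    Returns (additive effect, r2, LOD). A positive additive effect means
    the parent A allele increases the trait.
    """
    a = [p for g, p in zip(genotypes, phenotypes) if g == "A"]
    b = [p for g, p in zip(genotypes, phenotypes) if g == "B"]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two lines per genotype class")
    pooled = a + b
    n = len(pooled)
    grand = sum(pooled) / n
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    rss0 = sum((p - grand) ** 2 for p in pooled)  # null: one common mean
    rss1 = (sum((p - mean_a) ** 2 for p in a)
            + sum((p - mean_b) ** 2 for p in b))  # one mean per class
    additive = (mean_a - mean_b) / 2
    r2 = 1 - rss1 / rss0
    lod = (n / 2) * math.log10(rss0 / rss1)
    return additive, r2, lod


# Hypothetical marker genotypes ('-' = missing) and trait means.
print(single_marker_scan(["A", "A", "B", "B", "-"], [10, 12, 6, 8, 99]))
```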
QTLs for Rubisco Initial Activity, PN, and Seed Yield
Two QTLs were detected for Rubisco initial activity, on LG-B2 and LG-D2 (Table II; Fig. 7). The QTLs qRactB2.1 and qRactD2.1 explained 5.24% and 9.81% of the total phenotypic variance, respectively. Additive effect values indicated that Nannong1138-2 alleles were positive for both QTLs. QTL qRactD2.1 possibly colocalizes with QTL qPND2.1 (described below for PN), because their confidence intervals on LG-D2 overlap and their additive effects have the same direction, with positive alleles from Nannong1138-2.
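The colocalization argument combines two criteria: overlapping confidence intervals, set (per the table notes) as the region within a 1-LOD drop on either side of the peak, and the same direction of additive effect. A minimal sketch of both checks, assuming a LOD profile evaluated on a grid of map positions along one linkage group (no interpolation between grid points); the function names, profile, and intervals are hypothetical.

```python
def support_interval(positions_cm, lods, drop=1.0):
    """Return (left, right) map positions within `drop` LOD of the peak.

    positions_cm must be sorted; the interval grows outward from the
    peak while the LOD stays at or above peak - drop.
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = right = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions_cm[left], positions_cm[right]


def possibly_colocalized(ci_1, ci_2, effect_1, effect_2):
    """Overlapping intervals (same linkage group) and same-sign effects."""
    overlap = ci_1[0] <= ci_2[1] and ci_2[0] <= ci_1[1]
    same_direction = (effect_1 > 0) == (effect_2 > 0)
    return overlap and same_direction


ci = support_interval([0, 5, 10, 15, 20], [1.0, 2.5, 3.2, 2.0, 0.5])
print(ci, possibly_colocalized(ci, (8, 14), 0.3, 0.2))
```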
Three QTLs were detected for PN, on LG-C1, LG-D2, and LG-E (Table II; Fig. 7). These QTLs, qPNC1.1, qPND2.1, and qPNE.1, explained 6.97%, 9.35%, and 12.24% of the total phenotypic variance, respectively. Additive effects indicated that Kefeng No.1 alleles were positive for qPNE.1 but not for qPNC1.1 or qPND2.1. None of these QTLs colocalized with any QTL described below for seed yield.

DISCUSSION
The mRNAs Encoding Soybean RCA α- and β-Isoforms Are Transcripts of Separate Genes
In the plant species studied so far, RCA is present either as a larger α-isoform and a smaller β-isoform or as the β-isoform only (Portis, 2003). In plants with two RCA isoforms, such as Arabidopsis, spinach (Werneke et al., 1989), and rice, the two forms are encoded by mRNAs produced by alternative splicing of the pre-mRNA transcribed from a single RCA gene. Cotton is so far the only plant known to have two RCA isoforms encoded by different genes (Salvucci et al., 2003). In this study, we found that the mRNAs encoding the soybean RCA α- and β-isoforms were also transcribed from two different RCA genes. This conclusion is supported by the following evidence: (1) the large difference in amino acid composition between GmRCAα and GmRCAβ (Fig. 1) cannot be explained by the RCA transcript splicing mechanism observed in Arabidopsis, spinach, and rice; (2) although only partial 3′ untranslated regions of the cDNAs were cloned and sequenced, GmRCAα and GmRCAβ differed considerably in this region (Fig. 2); (3) the genomic sequences of GmRCAα and GmRCAβ shared only 40.6% identity; (4) a BLAST survey against the soybean genome sequence placed GmRCAα and GmRCAβ on different chromosomes; and (5) no common eQTL was detected for GmRCAα and GmRCAβ (Table II; Fig. 7). Although the mRNAs encoding the soybean RCA α- and β-isoforms were transcribed from two genes, we cannot exclude the possibility that an alternative splicing mechanism for RCA mRNA also exists in soybean. First, the structure of GmRCAα is similar to that of the alternatively spliced RCA gene in other plants. GmRCAα genomic DNA has six splice junctions (Fig. 3). At the sixth junction (counting from the beginning of the ORF), a 266-bp intron separates the sequence encoding the 36-amino acid extension from the rest of the coding region. About 10 bp beyond this splice junction is an ochre stop codon, which, if read through in frame, would yield a polypeptide of approximately the size of the β-isoform. This phenomenon has also been observed in cotton (Salvucci et al., 2003). However, a search of the soybean EST data (http://www.ncbi.nlm.nih.gov/) with the sequence spanning the splice junction revealed no matching ESTs, and no cDNAs were amplified using specific primers designed against the predicted alternatively spliced β-isoform transcript (data not shown). This indicates that the GmRCAα transcript is unlikely to be alternatively spliced.

Table I. Correlation coefficients and significance of correlations among GmRCAα and GmRCAβ expression levels, Rubisco initial activity, PN, and seed yield in soybean recombinant inbred lines. *P < 0.05; **P < 0.01.
Second, other putative α-isoform-encoding RCA genes, such as GmRCA03, might be alternatively spliced. Sequencing the N-terminal amino acids of the soybean α- and β-isoforms would be a direct way to clarify how the two RCA isoforms are produced, because this region might differ between the polypeptides encoded by GmRCAβ and GmRCAα or other possible α-isoform-encoding RCA genes (M.E. Salvucci, personal communication).
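The read-through argument can be made concrete: if intron sequence is retained in the reading frame, translation continues codon by codon until the first in-frame stop (TAA is the ochre codon), and the length of the product follows from where that stop falls. The sketch below assumes a 0-based start index that is already in frame; the soybean sequence is not reproduced in the text, so the example uses a hypothetical toy sequence, and the function names are illustrative.

```python
STOP_CODONS = {"TAA": "ochre", "TAG": "amber", "TGA": "opal"}


def first_in_frame_stop(seq: str, start: int = 0):
    """Return (index, name) of the first stop codon in frame with `start`.

    Returns None if no in-frame stop codon occurs before the sequence ends.
    """
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i, STOP_CODONS[codon]
    return None


def sense_codons_before(seq: str, start: int = 0):
    """Number of sense codons translated before the first in-frame stop."""
    hit = first_in_frame_stop(seq, start)
    return None if hit is None else (hit[0] - start) // 3


toy = "GCTGAAGTATAAGC"  # hypothetical: exon codons, then retained intron
print(first_in_frame_stop(toy), sense_codons_before(toy))
```

Note that the out-of-frame TGA at positions 2 to 4 of the toy sequence is correctly ignored; only codons in the reading frame of `start` are tested.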

Expression Levels of RCA Genes Could Modulate Photosynthetic Capacity and Plant Growth
Much attention has focused on the function of RCA since its discovery. RCA may exert considerable control over photosynthesis (Portis, 2003; Salvucci and Crafts-Brandner, 2004; Kim and Portis, 2005; von Caemmerer et al., 2005; Salvucci et al., 2006; Weston et al., 2007; Hendrickson et al., 2008) and plant growth (Wu et al., 2006; Kurek et al., 2007), especially under heat stress. So far, studies investigating the effect of RCA on photosynthesis and/or plant growth have used rca mutants with reduced or increased levels of RCA (Pollock et al., 2003; von Caemmerer et al., 2005; Salvucci et al., 2006; Wu et al., 2006, 2007; Salvucci, 2008). In our study, a new strategy was used to investigate the effect of RCA on photosynthetic capacity and seed yield in soybean: Rubisco initial activity, PN, seed yield, and gene expression levels were measured in a set of soybean RILs to examine the function of GmRCAα and GmRCAβ. Like the cotton RCA genes (Salvucci et al., 2003), the soybean GmRCAα and GmRCAβ genes appear to be functionally equivalent to the alternatively spliced RCA genes in other plants.

Cui et al. (2008): starting with "q," followed by an abbreviation of the trait name, the name of the linkage group, and the number of QTLs (eQTLs) affecting the trait on the linkage group. Ra, GmRCAα expression level; Rb, GmRCAβ expression level; Ract, Rubisco initial activity; SY, seed yield.
b Linkage group on which QTLs (eQTLs) were mapped.
c Marker intervals within which QTLs (eQTLs) were mapped.
d Position from the first marker on each linkage group.
e Confidence intervals were set as the map interval corresponding to a 1-LOD decline on either side of the LOD peak.
The significant correlations between the expression of both GmRCAα and GmRCAβ and Rubisco initial activity, PN, and seed yield indicate that these genes could play a role in increasing photosynthetic capacity and seed yield (Table I). However, the correlation coefficients between gene expression and Rubisco initial activity, PN, and seed yield were relatively small (Table I). This was also reflected in the absence of coincident QTLs (eQTLs) between gene expression levels and the other three traits examined (Table II; Fig. 7). The coincidence of QTLs for two traits, with allelic effects in the direction expected from the relationship between the traits, is strong evidence that the two traits are causally related (Thumma et al., 2001). Thus, we conclude that factors other than GmRCAα and GmRCAβ limit photosynthetic capacity and seed yield. One possible reason for our failure to detect coincident QTLs is the nonstressful experimental conditions we used. In experiments with rca mutants under nonstressful conditions, a strong relationship between RCA levels and CO2 assimilation rates was observed only when RCA levels were reduced below about 30% of wild-type levels (Mate et al., 1996; Eckardt et al., 1997; von Caemmerer et al., 2005); under heat stress, however, RCA correlated closely with Rubisco activity even at high relative RCA concentrations (Salvucci et al., 2006; Salvucci, 2008). Another possible confounding factor is the genetic background of the mapping population used in this study. Because Rubisco initial activity, PN, and seed yield are all multifactorial traits, their relationships with RCA gene expression are expected to be complex and to vary among plant materials.

Fu et al. (2006). QTLs (eQTLs) represented by bars are shown on the left of the linkage groups, close to their corresponding markers. The lengths of the bars are proportional to the confidence intervals of the corresponding QTLs (eQTLs), as shown in Table II.
A search of SOYBASE (http://soybase.org, accessed 2009-9-28) revealed many QTLs mapped in the same or nearby genomic regions as those we identified for RCA gene expression. Within the regions where eQTLs for GmRCAα were identified, a QTL for seed cell wall polysaccharides and a QTL for resistance to Sclerotinia sclerotiorum were found near markers sat267 and satt648 on LG-A1, respectively (Table II). Within the regions where GmRCAβ eQTLs were identified, a QTL for seed nitrogen accumulation and a QTL for leaf area were identified near markers A262_2 and I on LG-A2, respectively (Table II). Both A262_2 and I lie within the marker interval BE820148-sat_162 on the soybean consensus linkage map (Song et al., 2004). In addition, two QTLs for seed yield and one QTL for plant height were detected near marker sct_189 on LG-I (sct_189 and satt440 are 1.1 centimorgans [cM] apart on the consensus linkage map; Table II). Cell wall polysaccharides, seed nitrogen accumulation, leaf area, seed yield, and plant height represent different aspects of plant growth. The coincidence of QTLs for these traits with eQTLs for RCA gene expression suggests that RCA expression might exert control over plant growth, consistent with the results of other studies (Wu et al., 2006; Kurek et al., 2007). RCA genes play an important role in regulating plant growth under abiotic stresses such as high temperature (Crafts-Brandner and Salvucci, 2000; Salvucci and Crafts-Brandner, 2004). Because S. sclerotiorum infection represents a form of biotic stress, further experiments are needed to determine whether RCA genes also regulate plant growth under biotic stresses such as S. sclerotiorum.

eQTL Analysis Provides New Insights into the Modulation of RCA in Vivo
The relationship between a structural gene's map position and that of its eQTL provides an indication of how the gene is regulated (Potokina et al., 2006). If the positions of a gene and its eQTL are congruent, cis-regulation can be inferred, meaning that allelic polymorphism in the gene itself, or in closely linked regulatory elements, directly affects the gene's expression. Such a pattern was observed for the Ser carboxypeptidase I gene, Cxp1, where colocalization of the Cxp1 eQTL and the structural gene provided circumstantial evidence that the observed differences in gene expression levels result from cis-regulation (Potokina et al., 2006). In our study, the eQTLs for the two soybean RCA genes did not colocalize with the genes themselves. A BLAST survey of the soybean genome sequence (http://www.phytozome.com/soybean.php) using the sequences of markers closely linked to the eQTLs showed that the four markers (sat_267, satt385, sat_171, and satt648; Table II) linked to qRaA1.1 or qRaA1.2 are located on chromosome Gm05, whereas the GmRCAα gene is on chromosome Gm02. The four markers (BE820148, sat_162, satt440, and satt102; Table II) linked to qRbA2.1 or qRbI.1 are located on chromosome Gm08 or Gm20, whereas GmRCAβ is on chromosome Gm18. This result suggests that the observed differences in RCA gene expression could be the consequence of trans-regulation, that is, regulation mainly by trans-acting factors. A similar phenomenon has been observed for a set of genes involved in lignin biosynthesis in Eucalyptus (Kirst et al., 2004): most of these genes were significantly influenced by two eQTLs on linkage groups 4 and 9, whereas the structural genes were distributed throughout the genome.
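A first-pass version of this cis/trans reasoning can be written as a chromosome comparison. The sketch below uses only the chromosome assignments given above (markers for the GmRCAα eQTLs on Gm05 versus the gene on Gm02; markers for the GmRCAβ eQTLs on Gm08 or Gm20 versus the gene on Gm18). It is a simplification under stated assumptions: the function name is hypothetical, and a real cis call would also require physical positions showing that the eQTL interval spans the gene, which are not given here.

```python
def classify_eqtl(gene_chrom: str, marker_chroms) -> str:
    """Chromosome-level cis/trans call for one eQTL.

    Linked markers on a different chromosome from the gene imply trans
    regulation. Sharing the chromosome is only a prerequisite for cis;
    confirming cis would also need the eQTL interval to span the gene.
    """
    if gene_chrom in set(marker_chroms):
        return "possible cis"
    return "trans"


# Chromosome assignments as reported in the text.
print("GmRCAα:", classify_eqtl("Gm02", ["Gm05", "Gm05", "Gm05", "Gm05"]))
print("GmRCAβ:", classify_eqtl("Gm18", ["Gm08", "Gm20"]))
```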
As we and many others have shown, RCA plays an important role in regulating photosynthesis and plant growth. Genetic engineering experiments have been carried out to improve RCA activity (Wu et al., 2006; Kurek et al., 2007). Can breeders modulate RCA gene expression by design? The GmRCAα and GmRCAβ eQTLs we identified could make it possible to increase RCA gene expression, and ultimately RCA activity and seed yield, through marker-assisted breeding methods such as QTL pyramiding, a process of assembling favorable alleles at several QTLs for a specific trait from different loci to produce superior genotypes (Xu, 1997). However, our study provides only first-order knowledge about the genetic determinism of RCA expression levels in soybean. Because RCA activity decreases severely under heat stress (Crafts-Brandner and Salvucci, 2000; Salvucci and Crafts-Brandner, 2004), and because QTLs (eQTLs) detected across different materials and multiple environments are the most valuable for breeding, further eQTL mapping of RCA genes in a range of soybean materials under different environments is warranted.
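As a rough illustration of how the mapped eQTLs might be used in marker-assisted selection, the sketch below screens lines for favorable alleles at all four eQTLs, one step toward pyramiding them. The favorable parent at each locus is read from the additive-effect directions described in the eQTL results; the line names and genotypes are hypothetical, and using the flanking-marker genotype as a proxy for the eQTL allele ignores recombination between marker and QTL.

```python
# Parent contributing the expression-increasing allele at each eQTL,
# read from the additive-effect directions described in the results.
FAVORABLE_PARENT = {
    "qRaA1.1": "Kefeng No.1",
    "qRaA1.2": "Nannong1138-2",
    "qRbA2.1": "Nannong1138-2",
    "qRbI.1": "Kefeng No.1",
}


def favorable_count(genotype: dict) -> int:
    """Count eQTLs at which a line carries the favorable parental allele.

    genotype maps eQTL name -> parent whose allele the line carries at the
    closest flanking marker (a proxy that ignores marker-QTL recombination).
    """
    return sum(genotype.get(q) == p for q, p in FAVORABLE_PARENT.items())


def pyramided_lines(lines: dict) -> list:
    """Names of lines carrying favorable alleles at every eQTL."""
    return [name for name, g in lines.items()
            if favorable_count(g) == len(FAVORABLE_PARENT)]


# Hypothetical lines, for illustration only.
lines = {
    "RIL-x": dict(FAVORABLE_PARENT),
    "RIL-y": {**FAVORABLE_PARENT, "qRbI.1": "Nannong1138-2"},
}
print(pyramided_lines(lines), favorable_count(lines["RIL-y"]))
```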

Genome Duplication and Evolution Might Have Led to Multiple Family Members and Different Expression Patterns of RCA Genes in Soybean
Polyploidy is a crucial force in plant evolution, and many angiosperms have experienced one or more episodes of polyploidization (Adams and Wendel, 2005). Cotton (Salvucci et al., 2003) and soybean (this study) each have separate genes encoding the two RCA isoforms. Compared with old polyploid species such as rice and Arabidopsis (Werneke et al., 1989), which contain one alternatively spliced RCA gene encoding two RCA isoforms, cotton and soybean experienced additional genome duplications within the past 5 million years (Adams and Wendel, 2005). Did the recent genome duplication give rise to two separate genes encoding two RCAs? The lack of information on soybean's diploid progenitors (Shoemaker et al., 2006) makes it difficult to test this hypothesis directly in soybean. However, two separate genes encoding two RCA isoforms exist in diploid cotton species (Salvucci et al., 2003). In addition, Rundle and Zielinski (1991) showed that the alternatively spliced RCA gene (rcaA) in barley evolutionarily preceded the second one (rcaB, encoding the RCA β-isoform): the former existed 150 million years ago, when monocots and eudicots diverged, whereas the latter appeared only after this lineage split (Rundle and Zielinski, 1991). Based on this information, we hypothesize that the separate genes encoding the two RCA isoforms in soybean might have arisen in an early whole-genome duplication event. During this event, the primitive alternatively spliced RCA gene might first have been duplicated, and during the subsequent long-term diploidization process (Doyle et al., 2008), the duplicates diverged to encode either the α- or the β-isoform of soybean RCA. This hypothesis is supported by the argument that the diploid progenitor(s) of soybean underwent an early large-scale genome duplication event (Shoemaker et al., 2006).
A BLAST search against the soybean genome, combined with cDNA cloning, suggested that soybean contains five RCA genes. Small families of RCA genes have also been observed in tobacco (Qian and Rodermel, 1993). Both soybean and tobacco underwent a recent round of whole or segmental genome duplication within approximately the last 5 to 10 million years (Adams and Wendel, 2005). During long-term diploidization, the retention of gene duplicates may vary among plant species; for example, LYK gene duplicates are retained more often in legumes than in Arabidopsis or rice (Zhang et al., 2007). The number of RCA genes could therefore also vary among recently polyploidized plant species. To account for the five putative RCA genes now present in soybean, we postulate at least two rounds of RCA gene duplication. This is consistent with the argument that soybean has undergone at least two rounds of whole genome duplication, although the time estimates for these rounds differ: approximately 14 and 44 million years ago (Schlueter et al., 2004) or approximately 4 and 16 million years ago (Blanc and Wolfe, 2004).
Duplicate genes may retain the original gene function, subfunctionalize, neofunctionalize (i.e. acquire a new function), or be silenced (Wendel, 2000; Doyle et al., 2008). In this study, the significant correlations of GmRCAα and GmRCAβ expression levels with Rubisco initial activity, PN, and seed yield (Table I) suggest that the α- and β-isoforms of RCA might play similar roles in soybean. However, eQTL mapping indicates that the two genes might have different expression patterns, since the expression levels of GmRCAα and GmRCAβ were controlled by different loci on different linkage groups (Table II). These data suggest that soybean RCA genes might have undergone subfunctionalization, in which ancestral functions are partitioned among duplicate genes (Doyle et al., 2008).
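Whether a correlation between expression level and a trait across RILs (as summarized in Table I) is significant is conventionally judged from Pearson's r and a t statistic with n - 2 degrees of freedom (n = 184 lines here). The following is a generic sketch of that computation under that assumption, not the authors' analysis pipeline (the statistical software is not stated in this section); the function names and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical line means: relative expression vs. Rubisco initial activity.
expression = [0.8, 1.1, 1.4, 0.9, 1.6, 1.2]
activity = [10.2, 12.0, 14.1, 10.9, 15.3, 12.8]
r = pearson_r(expression, activity)
print(f"r = {r:.3f}, t = {t_statistic(r, len(expression)):.2f}")
```

In practice, `scipy.stats.pearsonr` returns both r and a two-sided p-value; the hand-written version above only makes the arithmetic explicit.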
In this study, we analyzed only the expression levels of GmRCAα and GmRCAβ, although, as mentioned above, soybean may contain at least five RCA genes. Could the other three RCA genes also affect photosynthesis and yield? What is the relative contribution of each gene to overall RCA function in vivo? Multiple RCA genes could allow finer control over protein expression (M.E. Salvucci, personal communication); if so, how does each of these genes respond to stresses? Future studies of the expression of all soybean RCA genes under different environmental conditions could address these questions and provide a better understanding of the function and evolution of RCA genes in soybean.

CONCLUSION
GmRCAα and GmRCAβ encode the longer α-isoform and the shorter β-isoform of soybean RCA, respectively. Their mRNAs are transcribed from two separate genes, and their expression levels are controlled by different loci on different linkage groups. The correlations between gene expression levels and Rubisco initial activity, PN, and seed yield suggest that RCA genes could play an important role in regulating photosynthetic capacity and plant growth. Mapping analysis revealed four trans-acting eQTLs for GmRCAα and GmRCAβ, which may be useful in future marker-assisted breeding. The multiple family members and different expression patterns of RCA genes might be consequences of whole genome duplication and long-term duplicate evolution in soybean.

Plant Material and Plant Growth Conditions
Soybean (Glycine max 'Kefeng No.1') was used for gene cloning and western-blot analysis of RCA. Plants were field grown under natural conditions at Nanjing Agricultural University. Sowing was carried out on May 28, 2007. Once the third euphylls had expanded, fully expanded leaves were collected and frozen immediately in liquid nitrogen, then stored at -80°C until further use.
A soybean RIL population derived from a cross between Kefeng No.1 and cv Nannong1138-2 was used to determine expression levels of RCA genes, PN, Rubisco initial activity, and seed yield. This population consists of 184 F7:11 lines developed by single-seed descent at the National Center for Soybean Improvement of China. The planting experiment was conducted under natural conditions at Jiangpu Experimental Station, Nanjing Agricultural University. Genotypes were grown individually in pots containing 3 L of soil in a completely randomized design with six replications (one pot per replication). Nine seeds were sown per pot, and 7 d after emergence, plants were thinned to one per pot. To reduce environmental effects on phenotypic evaluation, the RILs were divided into three groups according to the maturity times observed in previous years (data not shown). The groups were sown at different times, on May 8, 15, and 22, 2007, so that all lines were at a similar growth stage when trait data were collected. Nutrients and water were supplied in ample amounts throughout the experiment to avoid nutrient and drought stress. At the R6 stage of development, mature upper third leaves were collected individually from three plants of each RIL in the morning (9:00-11:30 AM) on a sunny day, frozen immediately in liquid nitrogen, and stored at -80°C until further use.
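A completely randomized design with six replications (one pot per replication) means that every (line, replicate) pot is assigned to a position purely at random. The sketch below shows one way to generate such a layout; the RIL identifiers, the numbering of positions, and the seed are hypothetical, and because this section does not say whether randomization was done across all lines or within each maturity group, the function can simply be applied to each group separately.

```python
import random


def crd_layout(lines, reps, seed=None):
    """Completely randomized design: shuffle every (line, replicate) pot
    over consecutive positions 1..len(lines) * reps."""
    pots = [(line, rep) for line in lines for rep in range(1, reps + 1)]
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    rng.shuffle(pots)
    return {position: pot for position, pot in enumerate(pots, start=1)}


# Hypothetical RIL identifiers; the actual line codes are not given here.
rils = [f"RIL{i:03d}" for i in range(1, 185)]
layout = crd_layout(rils, reps=6, seed=2007)
print(len(layout), layout[1])
```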

Isolation of Genomic DNA and Synthesis of cDNA
Genomic DNA was extracted from young leaflets of soybean using the cetyl-trimethyl-ammonium bromide protocol as described by Weising et al. (1995). Total RNA was isolated from leaves using the RNeasy Plant Mini Kit (Qiagen) and was then treated with 10 units of RNase-free DNase I (TaKaRa).
First-strand cDNA was synthesized in a final volume of 20 µL containing 4 µL of 5× buffer, 1 µg of total RNA, 500 mM oligo(dT)18 primer, 10 units of avian myeloblastosis virus reverse transcriptase (TaKaRa), 1 mM deoxyribonucleotide triphosphates, and 20 units of RNase inhibitor (Promega).

Sequence Retrieval and Primer Design
Soybean RCA genes were identified using an in silico mRNA subtraction strategy. Homologue searches in the soybean EST database (http://www.ncbi.nlm.nih.gov/) and the tentative consensus sequences database (http://compbio.dfci.harvard.edu/tgi/) were performed using the Arabidopsis (Arabidopsis thaliana) RCA gene (GenBank accession no. 818558) as a query. RCA-like EST and tentative consensus sequences were downloaded and assembled, and ORFs were then predicted from the resulting contigs. As a result, two putative cDNAs with complete ORFs encoding RCA-like proteins in soybean were obtained and tentatively designated GmRCAα (the sequence with the longer ORF) and GmRCAβ (the sequence with the shorter ORF). After verifying the predicted GmRCAα and GmRCAβ sequences by gene cloning and recombinant protein expression, we performed another round of homologue searches against the soybean genome sequence using the GmRCAβ ORF sequence as a query. This search predicted three additional RCA-like genes, designated GmRCA03, GmRCA11, and GmRCA14, located on chromosomes Gm03, Gm11, and Gm14, respectively. On the basis of this sequence information, gene-specific primers (Supplemental Table S1) were designed for PCR amplification from cDNA and genomic DNA, expression of recombinant RCA protein, and/or real-time quantitative PCR analysis.
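The ORF-prediction step, and the convention of naming the contig with the longer ORF α and the one with the shorter ORF β, can be illustrated with a minimal six-frame longest-ORF finder. This is a generic sketch, not the tool the authors used (none is named here); the 100-codon minimum and all function names are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq, min_codons=100):
    """Longest ATG...stop ORF (stop codon included) over all six frames.

    Returns "" if no ORF reaches min_codons (a hypothetical cutoff)."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop gives the longest ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) // 3 >= min_codons and len(orf) > len(best):
                        best = orf
                    start = None
    return best


def name_isoforms(contig_a, contig_b):
    """Label the longer ORF as the alpha candidate and the shorter as beta."""
    orf_a, orf_b = longest_orf(contig_a), longest_orf(contig_b)
    if len(orf_a) >= len(orf_b):
        return {"GmRCAα": orf_a, "GmRCAβ": orf_b}
    return {"GmRCAα": orf_b, "GmRCAβ": orf_a}
```

Scanning both strands matters because contigs assembled from ESTs can come out in either orientation.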

Cloning and Sequence Analysis of PCR Products
PCR products obtained from leaf genomic DNA and cDNA were separated on 1% agarose gels and purified with a gel extraction kit (HuaShun) according to the manufacturer's protocol. The purified products were cloned into the pGEM-T vector (Promega) and sequenced (Invitrogen). Sequence analysis was performed using DNAMAN software (http://www.lynnon.com), Spidey (http://www.ncbi.nlm.nih.gov/spidey/), and the ChloroP version 1.1 server (http://www.cbs.dtu.dk/services/ChloroP/). The Mr of each predicted protein was calculated using the BioXM program (version 2.6; http://www.bio-soft.net/format/bioxm.htm). A phylogenetic tree was constructed using MEGA (version 4.1; Kumar et al., 2008).
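An Mr estimate for a predicted protein is typically the sum of average amino acid residue masses plus one water molecule for the free termini; for a chloroplast-targeted protein such as RCA, the mature protein can be estimated by first removing the transit peptide predicted by ChloroP. The sketch below uses standard average residue masses; it is not BioXM's implementation, whose mass table may differ slightly, and the function name and `transit_peptide_len` parameter are hypothetical.

```python
# Average (not monoisotopic) residue masses in Da, i.e. amino acid mass minus water.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water per chain (free N- and C-termini)


def protein_mass(seq, transit_peptide_len=0):
    """Average mass (Da) of a polypeptide, optionally after removing an
    N-terminal transit peptide of the given length (e.g. from ChloroP)."""
    mature = seq.upper().rstrip("*")[transit_peptide_len:]
    if not mature:
        raise ValueError("empty sequence after transit-peptide removal")
    unknown = set(mature) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in mature) + WATER
```

Dividing the result by 1,000 gives the value in kD, the unit usually used when comparing predicted sizes with bands on a western blot.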

Expression of Recombinant RCA Proteins
GmRCAa and GmRCAb cDNAs were used as templates for PCR amplification. PCR products were digested with BamHI and XhoI restriction enzymes and introduced into the pET-30a expression vector (Novagen). The resulting constructs were introduced into Escherichia coli strain BL21 (DE3) (Novagen). Expression of these two recombinant RCA isoforms was performed as described previously (van de Loo and Salvucci, 1996).
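Directional cloning with BamHI and XhoI assumes that neither recognition site occurs inside the amplified ORF, since an internal site would cut the insert during digestion. A minimal check can be sketched as follows; the recognition sequences are the standard ones (GGATCC for BamHI, CTCGAG for XhoI), while the function name is hypothetical and whether the authors performed such a check is not stated.

```python
# Recognition sequences (both palindromic, so searching one strand is enough).
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}


def internal_sites(insert, sites=SITES):
    """Map enzyme name -> 0-based start positions of its site within `insert`."""
    insert = insert.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        pos = insert.find(motif)
        while pos != -1:
            positions.append(pos)
            pos = insert.find(motif, pos + 1)  # step by one to allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


orf = "ATGGGATCCAAACTCGAGTAA"  # toy sequence, not a soybean RCA ORF
print(internal_sites(orf))  # {'BamHI': [3], 'XhoI': [12]}
```

An empty result means the sites added through the primer tails are the only ones the enzymes will cut.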

Leaf Protein Extraction
Soybean (Kefeng No.1) or Arabidopsis (ecotype Columbia, C14) leaf samples (40 mg) were ground with a mortar and pestle in liquid nitrogen until pulverized. Two milliliters of extraction buffer (20% [v/v] glycerol, 0.25% [w/v] bovine serum albumin, 1% [v/v] Triton X-100, 50 mM HEPES/KOH [pH 7.5], 10 mM MgCl2, 1 mM EDTA, 1 mM benzamidine, and 1 mM phenylmethylsulfonyl fluoride) was added immediately. The slurry was centrifuged for 10 min at 12,000g at 4°C. The protein-containing supernatant was used to determine Rubisco initial activity. The same leaf protein extraction procedure was followed for western-blot analysis, except that 1 g of leaf per sample was used.

Western-Blot Analysis of RCA Protein
Protein extracts were subjected to SDS-PAGE using a 12.5% acrylamide resolving gel (Mini Protean II System; Bio-Rad; Fling and Gregerson, 1986). Separated proteins were then transferred to polyvinylidene difluoride membranes, and nonspecific binding of antibodies was blocked with 5% nonfat dried milk in phosphate-buffered saline (pH 7.4) for 2 h at room temperature. Membranes were then incubated overnight at 4°C with polyclonal Arabidopsis anti-RCA antibodies (aA-18; sc-15864; Santa Cruz Biotechnology) diluted 1:3,000 in phosphate-buffered saline plus 1% nonfat milk. Immune complexes were detected using rabbit anti-goat IgG (H+L) horseradish peroxidase (BS30503; Bioworld Technology). The color was developed with a solution containing 3,3′-diaminobenzidine tetrahydrochloride as the peroxidase substrate, and membranes were scanned.

A LOD threshold of 2.0 for declaring a QTL (eQTL) was employed. Low thresholds may not be useful in plant breeding programs, but they have been shown to help in understanding relationships among traits (Thumma et al., 2001).
The position of the maximum LOD score along the interval was taken as the position of the QTL (eQTL), and the region within 1 LOD unit of the maximum was taken as the confidence interval. Additive effects of the detected QTLs (eQTLs) were estimated from composite interval mapping results as the mean effect of replacing both Nannong1138-2 alleles at the locus of interest with Kefeng No.1 alleles. Thus, a positive effect means that the Kefeng No.1 allele increases the trait value. The contribution of each identified QTL (eQTL) to total phenotypic variance (r²) was estimated by variance component analysis. QTL (eQTL) nomenclature was adapted as described previously: names start with "q," followed by an abbreviation of the trait name, the name of the linkage group, and a serial number for the QTLs (eQTLs) affecting the trait on that linkage group.
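Given a LOD profile along a linkage group, the peak position and the 1-LOD confidence interval described above, together with a declaration threshold of 2.0, can be sketched as follows. This operates on a precomputed LOD scan and does not reproduce composite interval mapping itself (the mapping software is not named in this section); the function name and the example scan are hypothetical, and the interval is reported at the evaluated grid points rather than interpolated.

```python
def qtl_peak_and_interval(positions_cm, lod, threshold=2.0, drop=1.0):
    """Return (peak_cM, (left_cM, right_cM)) for the highest LOD peak in one scan,
    or None if the peak is below `threshold`. The interval is the contiguous
    region around the peak whose LOD stays within `drop` units of the maximum."""
    if not lod or len(positions_cm) != len(lod):
        raise ValueError("positions and LOD scores must be non-empty and equal in length")
    peak = max(range(len(lod)), key=lod.__getitem__)
    if lod[peak] < threshold:
        return None
    cutoff = lod[peak] - drop
    left = peak
    while left > 0 and lod[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lod) - 1 and lod[right + 1] >= cutoff:
        right += 1
    return positions_cm[peak], (positions_cm[left], positions_cm[right])


# Hypothetical scan along one linkage group (positions in cM).
positions = [0, 2, 4, 6, 8, 10]
scores = [0.5, 1.8, 3.2, 4.0, 2.9, 1.0]
print(qtl_peak_and_interval(positions, scores))  # (6, (4, 6))
```

A linkage group carrying two separate peaks would need the scan to be split at the LOD minimum between them before applying this function to each part.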

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Table S1. Primer pairs used in this research.