Plant Physiology 137:1211-1227 (2005) © 2005 American Society of Plant Biologists
Sequencing and Analysis of Common Bean ESTs. Building a Foundation for Functional Genomics
Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Apartado 66210 Cuernavaca, Morelos, Mexico (M.R., L.B.-L., S.S., A.M.-S., G.H., M.L.); Agronomy and Plant Genetics (M.R., C.P.V.) and Plant Biology (M.A.G.), University of Minnesota, St. Paul, Minnesota 55108; Plant Science Research Unit, U.S. Department of Agriculture, Agricultural Research Service, St. Paul, Minnesota 55108 (C.P.V.); and International Center for Tropical Agriculture, Cali, Colombia (M.W.B.)
Although common bean (Phaseolus vulgaris) is the most important grain legume in the developing world for human consumption, few genomic resources exist for this species. The objectives of this research were to develop expressed sequence tag (EST) resources for common bean and assess nodule gene expression through high-density macroarrays. We sequenced a total of 21,026 ESTs derived from 5 different cDNA libraries, including nitrogen-fixing root nodules, phosphorus-deficient roots, developing pods, and leaves of the Mesoamerican genotype, Negro Jamapa 81. The fifth source of ESTs was a leaf cDNA library derived from the Andean genotype, G19833. Of the total high-quality sequences, 5,703 ESTs were classified as singletons, while 10,078 were assembled into 2,266 contigs producing a nonredundant set of 7,969 different transcripts. Sequences were grouped according to 4 main categories, metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%), and further subdivided into 15 subcategories. Comparisons to other legume EST projects suggest that an entirely different repertoire of genes is expressed in common bean nodules. Phaseolus-specific contigs, gene families, and single nucleotide polymorphisms were also identified from the EST collection. Functional aspects of individual bean organs were reflected by the 20 contigs from each library composed of the most redundant ESTs. The abundance of transcripts corresponding to selected contigs was evaluated by RNA blots to determine whether gene expression determined by laboratory methods correlated with in silico expression. Evaluation of root nodule gene expression by macroarrays and RNA blots showed that genes related to nitrogen and carbon metabolism are integrated for ureide production. Resources developed in this project provide genetic and genomic tools for an international consortium devoted to bean improvement.
Common bean (Phaseolus vulgaris) is the most important grain legume for direct human consumption; it comprises 50% of the grain legumes consumed worldwide (McClean et al., 2004
Partial sequencing of cDNA inserts or expressed sequence tags (ESTs) obtained from many plant tissues and organs has been used as an effective method of gene discovery, molecular marker generation, and transcript pattern characterization. It is an efficient approach for identifying a large number of plant genes expressed during different developmental stages and in response to a variety of environmental conditions. In addition, once ESTs are generated, they provide a resource for transcript-profiling experiments. Currently, only the grasses surpass the legumes (Fabaceae family) for the number of publicly available ESTs. There are nearly 986,000 nucleotide sequences representing the Fabaceae family available from the National Center for Biotechnology Information (NCBI) taxonomy browser (October, 2004; http://www.ncbi.nlm.nih.gov/Taxonomy). Over 92% of the ESTs deposited for the Fabaceae family are derived from the model legumes Medicago truncatula and Lotus japonicus and the crop legume soybean (Glycine max). Despite the importance of common beans as a crop legume, very little EST information is currently publicly available. Only 575 ESTs from common bean and 20,120 ESTs from the related species, runner bean (Phaseolus coccineus), have been deposited in GenBank's EST database. For this reason, we have undertaken a survey of the bean transcriptome by analyzing ESTs from diverse organs. Our research has been performed within the framework of Phaseomics, the international consortium for Phaseolus genomics (Broughton et al., 2003
Nitrogen (N) and phosphorus (P) are critical macronutrients required for plant growth. In the bean-growing regions of the developing world, soils are frequently depleted in N and P (Graham and Vance, 2003
Features of Generated ESTs
In an effort to develop an EST platform for common bean, 5 cDNA libraries were constructed, 4 from the Mesoamerican cultivar Negro Jamapa 81 and 1 from the Andean cultivar G19833. The sources of RNA to construct each library were pods, leaves, P-deficient roots, and nodules for the Mesoamerican genotype and leaves for the Andean genotype. Single-pass 5' sequencing resulted in 3,400 to 4,900 ESTs from each of the Mesoamerican and Andean libraries (sequences deposited in GenBank, accession nos. CV528971–CV544303). In addition, single-pass 3' sequencing of the Andean genotype yielded an additional 854 sequences. In total, 21,026 ESTs were sequenced (Table I). This number includes the 575 common bean ESTs already present in GenBank. Between 19% and 33% of the sequenced ESTs from the 5 libraries were discarded and not considered for contig assembly due to low-quality sequence or the absence of insert in the clone. In addition, clones identified as chimeric or alternative splice products were not included in contig assembly. Redundant ESTs were grouped into contigs using the program Phrap (http://www.phrap.org/phredphrap/phrap.html). Of the total 15,781 EST sequences considered acceptable for contig assembly, 5,703 were classified as singletons and the remaining 10,078 assembled into 2,266 contigs ranging in EST redundancy from 2 to 264 (Table II). Library-specific contigs ranged from 44 to 228, depending upon the organ. Total contigs and singletons comprised a nonredundant gene set of 7,969 different transcripts. All EST sequences, contig images, single-nucleotide polymorphism (SNP), and gene family data analyses are available (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).
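The assembly counts reported above can be cross-checked with a few lines of arithmetic. The following is only a bookkeeping sketch using the numbers given in the text; the variable names are illustrative, and the overall discard fraction is derived here rather than stated by the authors.

```python
# Counts reported in the text for the common bean EST assembly.
total_sequenced = 21_026   # all ESTs, including those already in GenBank
accepted = 15_781          # ESTs considered acceptable for contig assembly
singletons = 5_703
ests_in_contigs = 10_078
contigs = 2_266

# Every accepted EST is either a singleton or a member of a contig.
assert singletons + ests_in_contigs == accepted

# Each contig and each singleton counts as one putative transcript.
nonredundant = contigs + singletons

# Overall fraction of ESTs set aside before assembly (derived, not reported).
discarded_fraction = (total_sequenced - accepted) / total_sequenced

print(nonredundant)                     # 7969
print(round(100 * discarded_fraction))  # 25
```

The derived overall discard rate of about 25% falls inside the 19% to 33% range reported per library.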
Functional Annotation
To identify putative functions for genes encoding ESTs, BLASTX analysis was used to compare the common bean contigs and singletons to the Uniref 100 protein database (Apweiler et al., 2004
Contigs Composed of Most Abundant ESTs
The frequency (abundance) of ESTs comprising a contig, together with the source of the contig, can provide insights into gene expression levels and biochemical functions occurring in an organ or tissue. Therefore, to identify genes that were highly expressed in certain tissues, we identified the contigs that were most abundantly represented in pods, leaves, P-deficient roots, and root nodules. The 20 contigs from each library composed of the most redundant ESTs are shown in Table III. Those contigs having ESTs from a single organ source are noted as specific. Given our methodology, contigs may appear in the top 20 of multiple tissues. A larger version of Table III, including the UniProt accession numbers, is available at our Web site (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).
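The selection of the most redundant contigs per library, and the flagging of contigs whose ESTs come from a single organ, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input format of (contig, library) pairs, and the choice to rank contigs by their EST count within each library are all assumptions.

```python
from collections import Counter, defaultdict

def summarize_contigs(est_assignments, top_n=20):
    """est_assignments: iterable of (contig_id, library) pairs, one per EST."""
    per_contig = defaultdict(Counter)
    for contig_id, library in est_assignments:
        per_contig[contig_id][library] += 1

    # A contig is "specific" when all of its ESTs come from one library.
    specific = {c for c, libs in per_contig.items() if len(libs) == 1}

    # Rank contigs within each library by the number of ESTs from that library
    # (assumed ranking rule); ties are broken by contig id for reproducibility.
    libraries = {lib for libs in per_contig.values() for lib in libs}
    top = {}
    for lib in libraries:
        ranked = sorted(
            ((c, libs[lib]) for c, libs in per_contig.items() if libs[lib] > 0),
            key=lambda item: (-item[1], item[0]),
        )
        top[lib] = ranked[:top_n]
    return top, specific

# Toy input: contig c1 has ESTs from two libraries, c2 and c3 from one each.
toy = [("c1", "pod")] * 3 + [("c1", "leaf")] + [("c2", "pod")] * 2 + [("c3", "leaf")] * 4
top, specific = summarize_contigs(toy, top_n=2)
```

Under this ranking rule a contig such as c1 can appear in the top list of more than one library, which matches the remark that contigs may appear in the top 20 of multiple tissues.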
Since pods were collected over a range of maturity dates, contigs composed of abundant ESTs reflect genes involved in both pod and seed growth and development. Contigs related to seed traits have homology to albumins (UniProt accessions Q39837 and Q9ZQX0), lectin (Q8L683), lipoxygenases (O24320, P27481, and Q9FQF9), acid phosphatases (O49855), -glucosidase (Q9XJ67), and lipid transfer proteins (O24440 and Q8W539). By comparison, contigs related to pod function included photosynthetic proteins such as chlorophyll a/b-binding proteins (Q39831, Q40512, Q43437, Q9LKI0, Q9LKI1, Q9SQL2, Q9XF89, and Q9XQB1), PSI reaction center protein (Q9S7N7), and storage protein (O23808). Unexpectedly, a contig annotated as nodulin 26 (contig 2,670, Q39882), which corresponds to a membrane transporter, contained 20 pod-derived ESTs. Nodulin 26 ESTs were also found in leaves and roots, but not in nodules. A contig containing numerous ESTs for alcohol dehydrogenase (contig 2,662, Q8LJR2) was also found in pods. Of the 20 contigs noted as those containing numerous ESTs in pods, 6 were pod specific.
The leaf contigs composed of the most abundant ESTs from both the Mesoamerican and Andean cultivars are shown in Table III. As expected, many contigs from leaf ESTs of both cultivars were related to photosynthesis and similar processes. Among the contigs from the Andean cultivar are several involved in amino acid metabolism. These were not evident in the Mesoamerican sequences. Conversely, there are 9 contigs in the Mesoamerican group that had no comparable sequences in the Andean leaf group, including 2 nodulin 30s (Q39882 and Q41121), 1 leghemoglobin (Q03972), and a carbonic anhydrase (Q9XQB0), which is not represented in the Andean cultivar. Thus, the complement of contigs between the two germplasm sources was quite distinct. These differences in contigs may represent genotypic, growth condition, and/or developmental stage variables.
Because root ESTs were derived from P-stressed plants, contigs composed of abundant root ESTs reflect not only root function, but also functions that may be related to stress. This is exemplified in the five root contigs containing the most abundant ESTs that have homology to a stress-related pathogenesis protein (P25985), an extensin (Q41707), a plasma membrane intrinsic protein (Q9XGG8), a metallothionein (Q75NH5), and an S-adenosyl-methionine (SAM) decarboxylase (Q8W3Y2), all of which are related to biotic/abiotic stress. Notably, several other contigs encode putative transport/membrane, oxidative stress, transcription factor, and phosphatase proteins. Five of the most abundant root contigs were found only in the root library.
Nodule contigs composed of the most abundant ESTs have homology to putative proteins involved in core functions related to N fixation, including oxygen control (leghemoglobin [Q03972] and ascorbate peroxidase [Q41712]), C metabolism (Suc synthase [Q8GTA3], Suc nonfermenting protein 1 [SNF1; Q9XIW0], aldolase [O65735], malate dehydrogenase [MDH; Q9FSF0]), amino acid synthesis (Gln synthetase N-1 [P00965]), and ureide synthesis (uricase [P53763] and inosine dehydrogenase [Q84XA3]). Interestingly, several of the nodule contigs encode putative proteins functioning in plant-microbe interactions, for example, CDR-1 (Q6XBF8), 2-on-2 hemoglobin (Q6QDC2), epoxide hydrolase (Q9ZP87), hypersensitive-induced response protein (Q6L4S3), and polygalacturonase (O81798). Putative membrane-trafficking and transport proteins (Q7XJQ3), nodulins 24 (P04145) and 55 (Q02917), and annexin (O65848) were also highly represented. Surprisingly, of the 20 nodule contigs shown in Table III, none were found only in nodules. Several other contigs composed of 2 to 10 ESTs were nodule specific. A complete list of the contigs containing a higher number of ESTs is available (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).
In recent years, considerable effort has focused on the identification of nodule-enhanced or nodule-specific genes. To allow comparisons between projects, the 340 nodule-specific M. truncatula EST contigs identified by Fedorova et al. (2002)
Ten contigs (477; 616; 642; 825; 917; 1,041; 1,067; 1,372; 1,843; and 2,376) were identified with no or Phaseolus-only BLASTX hits to the Uniref 100 protein database or to non-Phaseolus sequences in the database of legume sequences. To verify that these contigs were indeed Phaseolus specific, TBLASTX was used to compare them to the EST_others database and the Arabidopsis (Arabidopsis thaliana) genome. Comparisons to the EST_others database would detect homology to genes expressed in a variety of conditions. Comparisons to the Arabidopsis genome allowed identification of sequences whose expression had not been detected in other species and could also be used to find homology to genes that have not yet been predicted. These additional analyses confirmed that 9 of the 10 contig sequences were indeed Phaseolus specific. Full-length sequencing of the ESTs in these contigs and RNA-blot expression studies may provide further insight into the function of these genes.
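The first screening step described above, keeping contigs with no BLASTX hits or with hits only to Phaseolus sequences, amounts to a simple filter over per-contig hit lists. The sketch below assumes a hypothetical input in which each contig maps to the species names of its hits; the function name, the data layout, and the toy species assignments are illustrative and not taken from the authors' pipeline.

```python
def phaseolus_specific_candidates(hits_by_contig, genus="Phaseolus"):
    """hits_by_contig maps contig id -> list of species names of its hits."""
    candidates = []
    for contig_id, species_list in hits_by_contig.items():
        # all() is True for an empty list, so contigs with no hits are kept,
        # as are contigs whose hits all belong to the given genus.
        if all(species.split()[0] == genus for species in species_list):
            candidates.append(contig_id)
    return sorted(candidates)

# Toy input with made-up hit lists for three contig ids.
toy_hits = {
    "477": [],
    "616": ["Phaseolus vulgaris", "Phaseolus coccineus"],
    "2654": ["Glycine max", "Phaseolus vulgaris"],
}
candidates = phaseolus_specific_candidates(toy_hits)
print(candidates)  # ['477', '616']
```

Candidates that pass this filter would still need the follow-up TBLASTX comparisons described in the text before being called Phaseolus specific.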
Single-linkage clustering, as described by Graham et al. (2004)
SNPs were identified between the Andean and the Mesoamerican genotypes by comparing the Andean leaf ESTs against all ESTs from the tissue libraries of the Mesoamerican genotype. A total of 645 contigs (28% of the total) contained at least 1 sequence from both genotypes and could be mined for potential SNPs. Two different criteria were used to identify SNPs. High-quality SNPs were confirmed by two or more sequences from each genotype showing the same base change. A total of 138 high-quality SNPs were found in 72 contigs. Lower quality SNPs were confirmed by one sequence in one genotype and at least two sequences in the other. A total of 421 SNPs, representing 196 contigs, were identified in this class. Together, these 559 SNPs corresponded to 199 contigs, giving an average of 2.8 SNPs per contig. As expected, the majority of the SNPs were due to base pair mutations (94.9%) compared to insertion-deletion events (5.1%). Among the base pair mutations, transversions (34.5%) were less common than transitions (65.6%) and, among the transitions, cytosine-to-thymine mutations (65.1%) were more common than adenine-to-guanine mutations (34.9%). SNPs were found in a range of contigs. Due to the nature of the comparison between EST libraries conducted here, where Andean ESTs were all from leaf tissue, many of the SNPs were found in contigs representing highly expressed leaf genes involved in the structure of PSI and PSII, and in the CO2 assimilation process. Confirming their high level of expression in leaf tissue, the photosynthesis-related genes were homologous to the contigs with the greatest number of ESTs, ranging from >20 up to 161 individual sequences in the case of contig 2,685 with homology to the ribulose bisphosphate carboxylase precursor (Q43874).
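The two support criteria and the transition/transversion distinction used above can be written down compactly. The sketch below is an illustration only: the function names and the per-genotype read-count inputs are assumptions, and it does not reproduce how the authors extracted candidate SNPs from the contig alignments.

```python
# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def snp_quality(andean_support, meso_support):
    """Classify a candidate SNP by the number of supporting reads per genotype."""
    low, high = sorted((andean_support, meso_support))
    if low >= 2:
        return "high"   # two or more sequences from each genotype
    if low == 1 and high >= 2:
        return "lower"  # one sequence in one genotype, at least two in the other
    return None         # not enough support under either criterion

def substitution_type(base_a, base_b):
    """Label a base substitution as a transition or a transversion."""
    pair = frozenset((base_a.upper(), base_b.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError("expected two different bases from A, C, G, T")
    return "transition" if pair in TRANSITIONS else "transversion"

# Totals reported in the text: 138 high-quality plus 421 lower quality SNPs.
total_snps = 138 + 421
snps_per_contig = round(total_snps / 199, 1)
print(total_snps, snps_per_contig)  # 559 2.8
```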
Tissue-specific or tissue-enhanced ESTs were chosen from nodules, pods, leaves, and P-deficient root cDNA libraries to verify transcript abundance in different plant tissues by RNA blots. Five ESTs were selected to verify nodule-specific and/or nodule-enhanced expression (Fig. 2A). All were highly expressed in nodules, with a sulfate transporter (contig 2,167), SNF (contig 2,434), and leghemoglobin appearing to be expressed only in nodules. Two different-size RNAs were detected with the SNF-like cDNA probe. Most of the pod ESTs selected for RNA-blot analysis (Fig. 2B) are expressed in a pod-enhanced manner, independent of the EST redundancy, since pod storage protein (contig 2,671) is represented by 44 pod ESTs and myoinositol-1-P synthase (contig 2,532) is represented by 5 pod ESTs in this cDNA library. Lipoxygenase (contig 2,628) transcript is detected in pods, but also in leaves, with the greatest expression in stems.
Figure 2C shows that, with the exception of a hypothetical protein (contig 2,608), most of the ESTs selected from the 2 leaf cDNA libraries are expressed in a leaf-enhanced manner and leaf-specific expression was detected for a carbonic anhydrase (contig 2,534) transcript. Interestingly, a transcript of a lower Mr for plastidic aldolase (contig 2,668) was detected in nodules as compared to other organs. The unexpected hybridization of Rubisco (contig 2,682) to nodule RNA is puzzling. However, the different size of the transcript detected in nodules could reflect a chimeric clone or the presence of a very abundant transcript in nodules with high homology with Rubisco. Nodule ESTs annotated as Rubisco can be found in M. truncatula and L. japonicus databases. Transcript abundance analysis of 7 selected ESTs from the P-deficient root library (Fig. 2D) shows that only 2 (pathogenesis-related protein [contig 2,665] and aquaporin [contig 2,522]) were more highly expressed in roots than in other tissue. The pathogenesis-related protein contig is composed of 38 root ESTs and the aquaporin contig is composed of 4 root ESTs. Independent of the number of ESTs, transcript levels of aquaporin are higher than those of pathogenesis-related protein in roots. The other ESTs in Figure 2D were selected as specific sequences of a P-deficient root cDNA library, but none show root-enhanced expression. The transcript of a putative phosphatase (contig 2,286) EST was clearly detected in P-deficient roots, but was not detected in any other tissue, including roots grown in the presence of P, suggesting that this phosphatase plays a specific role in phosphate release processes that take place in roots under P deprivation.
Macroarray approaches, as described previously (Fedorova et al., 2002
From the 3 to 5 independent nylon filter arrays hybridized with first-strand cDNA from nodules, roots, leaves, stems, and pods, only those replicates (2 or 3) with a high determination coefficient (r2
Figure 3A shows that nodule-to-root expression ratios were lower than the ratios of nodules to the other organs. This might be due to the fact that either (1) the roots used for RNA isolation and macroarray hybridization were obtained from nodulated bean plants after nodules were removed; or (2) nodules are derived from root cortical cells. The data shown in Figure 3A revealed that 31 ESTs had 5-fold or higher nodule-root expression ratios. Of these, 2 ESTs identified as villin 2 (NOD_247_F07) and Suc synthase (contig 2,654) showed the highest expression ratio (8; Table IV). Forty-nine ESTs had a higher expression in roots as compared to nodules (Fig. 3A). Of these, an EST identified as ring-H2 finger protein (contig 905) had the highest expression in roots versus nodules (expression ratio = 12).
Greater differences in ratios of gene expression were observed when comparing nodules with leaves and stems; these large ratios reflect very different functions between nodules and those source organs (Fig. 3, B and C). In nodules versus leaves and stems, 188 and 294 ESTs had expression ratios of 10 or higher, respectively (Fig. 3, B and C). Of these, 99 and 138 ESTs were expressed 20-fold or more in nodules than in leaves and stems, respectively. In the comparisons of nodules versus leaves and nodules versus stems, totals of 6 and 26 ESTs, respectively, were found with expression ratios higher than 50. As shown in Table IV, at least 15 ESTs showed very high expression ratios (ranging from 52–135) in both the nodule-leaf and nodule-stem comparisons. The functional categories of these ESTs were identified as proteins for nodulation or nodulins, such as leghemoglobin (contig 2,686), nodulin 30 (contig 2,679), and early nodulin 55-2 (contig 2,589), as well as proteins involved in C metabolism, defense, or regulation. Data from Figure 3, B and C, show that 61 and 44 ESTs, respectively, were more highly expressed in leaves and in stems than in nodules. The most highly expressed ESTs in leaves versus nodules were identified as VirF-interacting protein (NOD_225_E10; expression ratio = 9), ring-H2 finger protein (contig 905; expression ratio = 8), and one without homology to known genes (contig 2,009; expression ratio = 6); the first 2 were also highly expressed in roots and pods as compared to nodules.
Pods, as well as nodules, can be considered as sink organs; pods receive photosynthate from the leaves and mobilize N for pod development and seed formation. In general, expression ratios found in nodules versus pods were not as high as those found when comparing leaves and stems (Fig. 3). A total of 197 ESTs had nodule-pod expression ratios higher than 10. Of these, 65 had 20-fold or higher expression ratios. Only 3 ESTs (nodulin 30, an unknown protein, and a hypothetical protein) had nodule-pod expression ratios higher than 50. Forty-three ESTs were more highly expressed in pods than in nodules (Fig. 3D); VirF-interacting protein (NOD_225_E10; expression ratio = 9) and a nonidentified EST showed the highest expression in pods versus nodules (expression ratio = 6).
Although transcriptome studies of genes related to nodule N and C metabolism have been reported for the model legumes M. truncatula (Györgyey et al., 2000 At least 11 enzymes of C metabolism appeared to have enhanced expression as evidenced by either the expression ratio of nodule-root in macroarrays or abundant ESTs (Table V). Notably Suc synthase (contig 2,654) and phosphoenolpyruvate (PEP) carboxylase (PEPC; contig 2,265), enzymes that contribute to sugar use, had high expression (Figs. 2 and 4). Several enzymes involved in general glycolysis (triose phosphate isomerase [contig 2,550], phosphoglycerate kinase [contig 2,537], and enolase [contig 2,622]), also had enhanced transcript levels (Fig. 4), as well as Glc-6-P dehydrogenase (Table V), a key source of NADPH for nodules.
With respect to N metabolism, four enzymes related to initial assimilation of fixed N into Gln had enhanced expression. In addition, another two enzymes related to ureide metabolism had enhanced expression.
Confirmation of macroarray results and contig analysis for common bean root nodule genes involved in C and N metabolism was obtained through RNA blots (Fig. 4). Even though expression of most genes involved in C metabolism that we tested was not nodule specific, the greatest transcript abundance was usually found in nodules. Reflecting nodule function, those genes involved in N metabolism are most clearly expressed preferentially in nodules (Fig. 4). In contrast with soybean (Lee et al., 2004
In this article, we provide an initial platform for functional genomics of common bean by the identification of almost 8,000 unique genes assembled from more than 20,000 ESTs sequenced from various plant organs. These sequences enrich the collection of ESTs in this important crop and provide new understanding of bean metabolism, development, and adaptation to stress. Roughly 3,400 to 4,900 ESTs were sequenced from each of 5 cDNA libraries of different bean tissues, and we identified 2,266 contigs (with 2 or more ESTs each), which were classified into 15 functional subgroups. Of these contigs, 36% represented sequences of unknown function or had no homology to previously identified proteins in the UniProt database (Apweiler et al., 2004
A comparison of EST redundancy in contigs having sequences derived from multiple organs can provide a broad overview of gene expression and biochemical functions occurring within an organ (Colebatch et al., 2002
Transcript expression evaluated by macroarrays provides a detailed picture of nodule biology, particularly C and N metabolism. While whole-nodule transcript studies of L. japonicus and M. truncatula (Colebatch et al., 2002
We found several enzymes related to sugar use and glycolysis to be up-regulated in nodules that are also reflected in the contigs containing abundant ESTs from nodules. Suc synthase, the initial enzyme in Suc cleavage, which is critical for N fixation, is highly expressed in bean nodules (Fig. 4) and the corresponding contig (2,654) has 21 ESTs from nodules. Interestingly, the five enzymes of glycolysis (Table V; Fig. 4) that we find enhanced in bean nodules are involved in the synthesis of 3-C intermediates that ultimately lead to PEP, which is the fundamental backbone for both malate and Asn synthesis (Deroche and Carrayol, 1988
Malate is considered the primary C source in nodules used by bacteroids for energy to reduce N (Appels and Haaker, 1991
The initial assimilation of fixed N into the nodule amino acid pool is catalyzed by glutamine synthetase (GS) and NADH-GOGAT in concert with Asp aminotransferase (AAT; Gantt et al., 1992
During the review of our submission, Lee et al. (2004) Although the abundance of ESTs within a contig derived from an organ or tissue can frequently correlate with transcript expression within the tissue, conclusions drawn from in silico analyses can be misleading. We chose 20 ESTs that were specific to or highly enhanced in a particular organ and evaluated their expression in various organs by RNA blot (Fig. 2). Although many of the ESTs gave expression patterns similar to that expected from in silico data, several had abundant expression in organs other than the one from which they were selected. For example, lipoxygenase (contig 2,628), a hypothetical protein (contig 2,632), and zinc finger protein (contig 2,266) derived from pods, leaves, and roots, respectively, have quite high expression in other organs. These results could be due to several factors, including mRNA stability, growth conditions, and developmental stage. An added complexity in correlating in silico results with RNA blots is the occurrence of contigs as multigene families. In fact, of the 2,266 contigs we identified, 943 belonged to gene families. At this stage of limited sequencing, most (557) of the gene families are composed of 2 sequences, while 36 gene families contain 10 sequences, and 3 gene families contain 60+ sequences. From this analysis, we can conclude that 21% (3,358) of the ESTs used for contig assembly are members of gene families. Taken together, our findings show the necessity of verifying in silico EST expression data by RNA blots and/or quantitative reverse transcription-PCR.
This study also showed the utility of mining EST collections in common bean for SNPs. To reduce errors caused by single-pass sequencing and low base quality values, we used two different criteria for identifying SNPs. Lower quality SNPs were supported by one sequence in one genotype and at least two sequences in the other. Using these criteria, a SNP could be found every 508 bp. High-quality SNPs were supported by at least two sequences from each genotype. Similarly, these criteria identified a SNP every 601 bp. By combining these data, we identified 559 SNPs in 214 kb of SNP-containing contigs, giving a SNP every 387 bp. These values are similar to those found for equivalent comparisons made in other inbreeding species of plants, but less frequent than in maize (Tenaillon et al., 2001 Because of our overriding interests in bean root nodule development and function, this project was initiated to focus on global characterization of bean nodule transcripts. This priority is evidenced by our in-depth analysis of bean nodule metabolism. As the Phaseomics consortium coalesced and defined its goals, it became apparent that the bean community needed EST profiles of additional bean organs. Thus, we sequenced ESTs from pods, leaves, and P-deficient roots. Future reports will concentrate on more detailed characterization of and research with ESTs from other bean organs and development of SNP-based molecular markers from the current set of EST sequences.
Plant Material
Two genotypes of common bean (Phaseolus vulgaris) were used for library construction. The first was the Mesoamerican cultivar Negro Jamapa 81, plants of which were grown in greenhouses at Centro de Investigación sobre Fijación de Nitrógeno (CIFN)/Universidad Nacional Autónoma de México (Cuernavaca, Mexico) and at University of Minnesota (St. Paul), as previously reported (Ortega et al., 1992
A total of 5 cDNA libraries were made, 4 from Negro Jamapa 81 and 1 from G19833. In the case of Negro Jamapa 81, total RNA was isolated from different plant organs: (1) young (1.55 cm) and mature (15 cm) pods from inoculated plants; (2) leaves from 15-d-old nodulated plants; (3) roots from P-deficient plants; and (4) mature effective nodules harvested after 15 dpi with R. tropici CIAT 899. For all the libraries made from Negro Jamapa 81, poly(A+) RNA was obtained from total RNA using oligo(dT) cellulose. The poly(A+) RNA used for the pod library was obtained from total RNA combined from young and mature pods in a 1:1 (w/v) ratio. Conversion of polyadenylated RNA to cDNA was performed in the phage Uni-ZAP XR with a Stratagene (La Jolla, CA) synthesis and cloning kit. The cDNA synthesis of poly(A+) mRNA was primed by oligo(dT)-XhoI adapter primer with MMLV-reverse transcriptase, while the second strand was synthesized via polymerase I ribonuclease H coincubation. EcoRI adapter was added to the blunted double-stranded cDNA followed by XhoI digestion. Recovered cDNA was directionally cloned into the EcoRI-XhoI Uni-ZAP XR vector, according to the manufacturer's instructions. The cDNA from all libraries was size selected via Sephacryl S-500 spin columns as part of the procedure described by the manufacturer (Stratagene). The fifth cDNA library, made for the genotype G19833, was prepared from total RNA isolated from leaves and vegetative meristems of 3-week-old plants. For this library, poly(A+) RNA was purified and reverse transcribed, and cDNAs were directionally cloned into the NotI/SalI sites of the pCMVSport6.0 vector (Invitrogen, Carlsbad, CA).
For conversion of the 4 Negro Jamapa 81 cDNA phage libraries (ZAP XR vector) into the plasmid form (pBluescript), mass excision was performed, according to the procedure described by the manufacturer (Stratagene). Single colonies of Escherichia coli strain SOLR carrying the excised phagemid were replicated, and glycerol stocks were stored in microtiter plates at −80°C. Plasmid DNA from a nodule cDNA library was isolated using the QIAprep 96 Turbo Miniprep kit, according to the manufacturer's instructions (Qiagen, Valencia, CA). The plasmid DNA isolation of the other three libraries was made by a modified alkaline lysis method. Sequencing of the plasmid cDNA was performed by the Advanced Genetic Analysis Center (St. Paul) for the pod, root, and nodule libraries and at the CCG (Cuernavaca, Mexico) for the leaf library. Standard T3 sequencing primer was used for 5' single-stranded sequencing. For the G19833 library, the clones were transformed into E. coli EMDH12S cells, which were plated on Q plates with carbenicillin (100 mg L−1). A Q-Bot was used to pick and array colonies into plates and filters. Plasmid DNA was isolated using a modified alkaline lysis method and the individual cDNAs were sequenced either from the 5' end, using a SP6 primer, or from the 3' end with a T7 primer at the Clemson University Genomics Institute (Clemson, SC) and at CIAT.
Common bean EST sequences were analyzed using a processing pipeline developed by the Center for Computational Genomics and Bioinformatics (CCGB) at the University of Minnesota (Lamblin et al., 2003
BLASTX (Altschul et al., 1997) comparisons against the Uniref 100 protein database (August, 2004; Apweiler et al., 2004
Contigs with no or Phaseolus-only BLASTX (Altschul et al., 1997
To allow comparisons between EST projects, the nodule-specific M. truncatula EST contigs identified by Fedorova et al. (2002)
For northern analysis, RNA was extracted from 0.2 g of frozen nodule, root, stem, and leaf using an RNA extraction kit (BIO-101, Irvine, CA). The RNA (10 µg) was denatured in 50% formamide, 17% formaldehyde, and 10% MOPS buffer (200 mM MOPS, pH 7.0, 50 mM Na-acetate, and 1 mM EDTA) at 65°C for 5 min. Twenty micrograms of total RNA were separated on 1.2% agarose gel containing 2.2 M formaldehyde in MOPS buffer and transferred to positively charged nylon membranes (Hybond-N+; Amersham, Buckinghamshire, UK) by downward capillary transfer in 20× SSC. After a 30-min prehybridization (300 mM Na2HPO4, pH 7.2, 7% SDS), the blot was hybridized for 24 h at 65°C with [32P]-labeled specific probes. After stringent washing, radioactive membranes were exposed to x-ray film (Kodak, Rochester, NY) overnight at −70°C. Three repetitions were done for each probe and similar results were obtained. The blots shown are representative of the three repetitions.
The cDNA portion of each nodule EST was amplified by PCR, using standard T3 and T7 primers. Before spotting, the quality of each PCR product was evaluated by gel electrophoresis. The PCR products were spotted in replicate onto Gene Screen Plus membranes (NEN Life Science Products, Boston) using the Q-Bot (Genetix, Boston) automated spotting system with a 96-pin gravity gridding head with 0.4-mm pin diameter.
Total RNA was isolated from mature nodules elicited by R. tropici CIAT 899 and nodule-deprived roots, leaves, and stems from inoculated Negro Jamapa 81 bean plants at 18 dpi. Pod RNA was obtained from a mixture of young developing and mature pods taken from two independent sources. In two independent experiments, RNA was isolated from the organs of plants grown under similar conditions. Total RNA was also isolated from P-deficient roots. Radiolabeled cDNA probes were synthesized by reverse transcription of 30 µg of total RNA for 1 h in the presence of 50 µCi
Radioactivity of each spot was quantified using a Phosphor Screen imaging system (Molecular Dynamics, Sunnyvale, CA). The signal intensity of each spot was determined automatically using the software Array-Pro Analyzer (Media Cybernetics, Carlsbad, CA), which normalizes quantified signals against the background. The normalized intensities were reported in Excel (Microsoft, Redmond, WA) files and linked to the corresponding cDNA clone. To restrict the analysis to highly reproducible experiments, linear regression analysis was performed for each pair of membrane replicas; only those replicas for which the linear model could explain at least 80% of the variation (determination coefficient r² ≥ 0.8) were retained. Genes were considered reliably expressed if they showed intensity/background ratios greater than 1.5 in all related parallel hybridizations. A final gene set was obtained by joining the genes expressed in each organ and removing all duplications. Single expression values per organ were then calculated as the average expression of each gene across the sets of correlated replicas. Given that the expression differences between any two organs follow a bell-shaped distribution (data not shown), the t test for paired observations was applied to determine whether genes show significantly different expression values from organ to organ. We also applied the nonparametric Wilcoxon signed-rank test for matched pairs, which does not rely upon the assumption of normality. Both tests strongly supported the hypothesis of differential expression (P < 0.001). The housekeeping gene polyubiquitin (EST NOD_206_B07) served as an internal normalization control for calculating expression ratios between pairs of organs. The signal intensity value of each gene was divided by the signal value of the polyubiquitin EST in the respective organ.
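The replica-filtering and expression-call criteria in this paragraph can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis actually run for the study (which used Array-Pro Analyzer and Excel): the function names, the replica labels, and the example values are all hypothetical, and the same made-up numbers stand in for both the normalized signals and the intensity/background ratios.

```python
from itertools import combinations


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def correlated_pairs(replicas, min_r2=0.8):
    """Replica pairs whose linear model explains at least min_r2 of the variation."""
    return [
        (a, b)
        for a, b in combinations(sorted(replicas), 2)
        if r_squared(replicas[a], replicas[b]) >= min_r2
    ]


def reliably_expressed(ratios_by_replica, min_ratio=1.5):
    """Spot indices whose intensity/background ratio exceeds min_ratio in every replica."""
    n_spots = len(next(iter(ratios_by_replica.values())))
    return [
        i
        for i in range(n_spots)
        if all(ratios[i] > min_ratio for ratios in ratios_by_replica.values())
    ]


# Hypothetical values for five spots on three membrane replicas.
replicas = {
    "rep1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rep2": [1.1, 2.1, 2.9, 4.2, 4.8],
    "rep3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = correlated_pairs(replicas)
kept = {name: replicas[name] for pair in pairs for name in pair}
expressed = reliably_expressed(kept)
```

In this toy data set only rep1 and rep2 pass the r² threshold, so the expression call is made on those two replicas alone.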
Normalized expression ratios were estimated by dividing the polyubiquitin-normalized signal intensities in nodules by the polyubiquitin-normalized signal intensities in the other organs. Original signal intensities and transformed data of all experiments are available from our Web site (http://www.ccg.unam/phaseolusest/Data_download.htm; see also supplemental data online).
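The two-step normalization described above (division by the polyubiquitin signal within each organ, then division of the nodule value by the value in another organ) can be written out as follows. This is a minimal sketch under stated assumptions: the function names, the gene labels `gene_a` and `gene_b`, and all signal values are hypothetical; only the polyubiquitin EST identifier NOD_206_B07 comes from the text.

```python
REFERENCE_EST = "NOD_206_B07"  # polyubiquitin EST used as the internal control


def normalize_to_reference(signals, reference_id=REFERENCE_EST):
    """Divide every signal in one organ by that organ's reference-gene signal."""
    reference = signals[reference_id]
    return {gene: value / reference for gene, value in signals.items()}


def expression_ratios(nodule, other, reference_id=REFERENCE_EST):
    """Ratio of reference-normalized nodule signal to reference-normalized signal elsewhere."""
    nodule_norm = normalize_to_reference(nodule, reference_id)
    other_norm = normalize_to_reference(other, reference_id)
    return {
        gene: nodule_norm[gene] / other_norm[gene]
        for gene in nodule_norm
        if gene in other_norm and other_norm[gene] > 0  # skip genes with no usable denominator
    }


# Hypothetical signal intensities for two organs.
nodule_signals = {"NOD_206_B07": 200.0, "gene_a": 800.0, "gene_b": 100.0}
root_signals = {"NOD_206_B07": 100.0, "gene_a": 100.0, "gene_b": 100.0}

ratios = expression_ratios(nodule_signals, root_signals)
```

With these made-up numbers, `gene_a` comes out four times higher in nodules than in roots and `gene_b` half as high, while the reference gene itself has a ratio of 1 by construction.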
In order to identify gene families, the common bean contigs and singletons were combined into a single dataset. TBLASTX (E-value cutoff of 10⁻¹²) was used to compare the dataset against itself. As described by Graham et al. (2004)
The ace file output of Phrap was used as input to the PolyBayes SNP detection program, along with the base quality values assigned by Phred for each of the contigged sequences. Perl scripts were used to parse the PolyBayes output file and identify SNPs in two categories. High-probability SNPs had SNP probability values >0.5, and the specific SNP was found in two EST sequences from each genotype. Lower probability SNPs had SNP probability values >0.5, and the SNP was found in one EST from one genotype and at least two ESTs from the other. Perl scripts were used to identify and store 50 bp of sequence on either side of the SNP.
Upon request, all novel materials described in this publication will be made available in a timely manner for noncommercial research purposes, subject to the requisite permission from any third-party owners of all or parts of the material. Obtaining any permission will be the responsibility of the requester.
Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession numbers CV528971 through CV544303.
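As a rough illustration of the two SNP categories and the flanking-sequence step described in the SNP detection paragraph above, the logic might look like the sketch below. The original work used Perl scripts on PolyBayes output; this Python version does not parse any PolyBayes file, its function names and inputs are hypothetical, and it reads "two EST sequences from each genotype" as "at least two", which is an assumption.

```python
def classify_snp(probability, est_support, threshold=0.5):
    """Assign a SNP to the 'high' or 'lower' probability category, or None.

    est_support is a pair giving the number of ESTs supporting the call in
    each of the two genotypes (hypothetical input format).
    """
    if probability <= threshold:
        return None
    fewer, more = sorted(est_support)
    if fewer >= 2:  # assumption: two or more ESTs from each genotype
        return "high"
    if fewer == 1 and more >= 2:  # one EST in one genotype, at least two in the other
        return "lower"
    return None


def flanking_sequence(consensus, position, flank=50):
    """Return (left flank, SNP base, right flank) around a 0-based position."""
    start = max(0, position - flank)
    left = consensus[start:position]
    right = consensus[position + 1 : position + 1 + flank]
    return left, consensus[position], right


# Hypothetical contig consensus with a SNP at 0-based position 60.
consensus = "A" * 60 + "G" + "C" * 60
left, base, right = flanking_sequence(consensus, 60)
category = classify_snp(0.9, (2, 3))
```

Near a contig end the flank is simply shorter than 50 bp, since the slice stops at the sequence boundary.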
We acknowledge the technical assistance provided by Mike Atkins, Mike Palmer, and Jeff Tomkins at the Clemson University Genomics Institute and help from Monica C. Muñoz, Eliana Gaitan, and Joe Tohme at CIAT. We also gratefully acknowledge Guillermo Dávila and Rosa I. Santamaria for providing the facility and technical assistance for DNA sequencing at CCG, Universidad Nacional Autónoma de México, and Eric Verdorn for assistance in bioinformatics at the University of Minnesota.
Received October 20, 2004; returned for revision January 21, 2005; accepted January 30, 2005.
1 This work was supported in part by Consejo Nacional de Ciencia y Tecnología, Mexico (grant no. G31751B at CCG), U.S. Department of Agriculture, Agricultural Research Service, Current Research Information System (project no. 36402100001900D at the University of Minnesota), and by U.S. Agency for International Development at International Center for Tropical Agriculture. M.R. received a postdoctoral fellowship from Consejo Nacional de Ciencia y Tecnología, Mexico.
2 These authors contributed equally to the paper.
[w] The online version of this article contains Web-only data. www.plantphysiol.org/cgi/doi/10.1104/pp.104.054999.
* Corresponding author; e-mail lara@ccg.unam.mx; fax 527773174357.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402
Appels MA, Haaker H (1991) Glutamate oxalacetate transaminase in pea root nodules. Plant Physiol 95: 740–747
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res 32: D115–D119
Blair MW, Pedraza F, Buendia HF, Gaitán-Solís E, Beebe SE, Gepts P, Tohme J (2003) Development of a genome-wide anchored microsatellite map for common bean (Phaseolus vulgaris L.). Theor Appl Genet 107: 1362–1374
Broughton WJ, Hernández G, Blair M, Beebe S, Gepts P, Vanderleyden J (2003) Beans (Phaseolus spp.) – model food legume. Plant Soil 252: 55–128
Colebatch G, Desbrosses G, Ott T, Krusell L, Montanari O, Kloska S, Kopka J, Udvardi MK (2004) Global changes in transcription orchestrate metabolic differentiation during symbiotic nitrogen fixation in Lotus japonicus. Plant J 39: 487–512
Colebatch G, Sebastian K, Ben T, Susanne F, Thomas A, Udvardi MK (2002) Novel aspects of symbiotic nitrogen fixation uncovered by transcript profiling with cDNA arrays. Mol Plant Microbe Interact 15: 411–420
Cook DR (1999) Medicago truncatula – a model in the making. Curr Opin Plant Biol 2: 301–304
Deroche ME, Carrayol E (1988) Nodule phosphoenolpyruvate carboxylase: a review. Physiol Plant 74: 775–782
Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res 8: 175–185
Fedorova M, Van De Mortel J, Matsumoto PA, Cho J, Town CD, VandenBosch KA, Gantt JS, Vance CP (2002) Genome-wide identification of nodule-specific transcripts in the model legume Medicago truncatula. Plant Physiol 130: 519–537
Food and Agriculture Organization of the United Nations (2001) FAOSTAT Agriculture Data. http://www.fao.org/Statistics
Gantt JS, Larson RJ, Farnham MW, Pathirana SM, Miller SS, Vance CP (1992) Aspartate aminotransferase in effective and ineffective alfalfa nodules. Plant Physiol 98: 868–878
Gepts P (1998) Origin and evolution of common bean: past events and recent trends. Hort Sci 33: 1124–1130
Graham MA, Silverstein KAT, Cannon SB, VandenBosch KA (2004) Computational identification and characterization of novel genes from legumes. Plant Physiol 135: 1179–1197
Graham PH, Vance CP (2003) Legumes: importance and constraints to greater use. Plant Physiol 131: 872–877
Györgyey J, Vaubert D, Jimenez-Zurdo JI, Charon C, Troussard L, Kondorosi A, Kondorosi E (2000) Analysis of Medicago truncatula nodule expressed sequence tags. Mol Plant Microbe Interact 13: 62–71
Handberg K, Stougaard J (1992) Lotus japonicus, an autogamous, diploid legume species for classical and molecular genetics. Plant J 2: 487–496
Journet EP, van Tuinen D, Gouzy J, Crespeau H, Carreau V, Farmer MJ, Niebel A, Schiex T, Jaillon O, Chatagnier O, et al (2002) Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis. Nucleic Acids Res 30: 5579–5592
Lamblin AF, Crow JA, Johnson JE, Silverstein KA, Kunau TM, Kilian A, Benz D, Stromvik M, Endre G, VandenBosch KA, et al (2003) MtDB: a database for personalized data mining of the model legume Medicago truncatula transcriptome. Nucleic Acids Res 31: 196–201
Lara M, Porta H, Padilla J, Folch J, Sánchez F (1984) Heterogeneity of glutamine synthetase polypeptides in Phaseolus vulgaris L. Plant Physiol 76: 1019–1023
Lee HL, Hur CG, Oh CJ, Kim HB, Park SY, An CS (2004) Analysis of the root nodule-enhanced transcriptome in soybean. Mol Cells 18: 53–62
Liao H, Yan X, Rubio G, Beebe SE, Blair MW, Lynch JP (2004) Basal root gravitropism and phosphorus acquisition efficiency in common bean. Funct Plant Biol 31: 959–970
McClean P, Kami J, Gepts P (2004) Genomic and genetic diversity in common bean. In RF Wilson, HT Stalker, EC Brummer, eds, Legume Crop Genomics. AOCS Press, Champaign, IL, pp 60–82
Morales M, Roig E, Monforte AJ, Arús P, Garcia-Mas J (2004) Single-nucleotide polymorphisms detected in expressed sequence tags of melon (Cucumis melo L.). Genome 47: 352–360
Ortega JL, Sánchez F, Soberón M, Lara M (1992) Regulation of nodule glutamine synthetase by CO2 levels in bean (Phaseolus vulgaris L.). Plant Physiol 98: 584–587
Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G, Sultana R, White J (2001) The TIGR gene indices: analysis of gene transcript sequence in highly sampled eukaryotic species. Nucleic Acids Res 29: 159–164
Russell J, Booth A, Fuller J, Harrower B, Hedley P, Machray G, Powell W (2004) A comparison of sequence-based polymorphism and haplotype content in transcribed and anonymous regions of the barley genome. Genome 47: 389–398
Silvente S, Camas A, Lara M (2003) Molecular cloning of the cDNA encoding aspartate aminotransferase from bean root nodules and determination of its role in nodule nitrogen metabolism. J Exp Bot 54: 1545–1551
Temple SJ, Vance CP, Gantt JS (1998) Glutamate synthase and nitrogen assimilation. Trends Plant Sci 3: 51–56
Tenaillon MI, Sawkins MC, Long AD, Gaut RL, Doebley JF, Gaut BS (2001) Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc Natl Acad Sci USA 98: 9161–9166
Uhde-Stone C, Zinn KE, Ramírez-Yañez M, Li A, Vance CP, Allan DL (2003) Nylon filters array reveal different gene expression in proteoid roots of white lupin in response to phosphorus deficiency. Plant Physiol 131: 1064–1079
Yan X, Liao H, Beebe SE, Blair MW, Lynch JP (2005) Molecular mapping of QTLs associated with root hairs and acid exudation as related to phosphorus uptake in common bean. Plant Soil (in press)
Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, Cregan PB (2003) Single-nucleotide polymorphisms in soybean. Genetics 163: 1123–1134