First published online August 29, 2002; 10.1104/pp.006833
Plant Physiol, October 2002, Vol. 130, pp. 519-537

Genome-Wide Identification of Nodule-Specific Transcripts in the Model Legume Medicago truncatula¹

Departments of Agronomy and Plant Genetics, 1991 Upper Bedford Circle (M.F., J.v.d.M., P.A.M., C.P.V.) and Plant Biology, 1445 Gortner Avenue (K.A.V., J.S.G.), University of Minnesota, St. Paul, Minnesota 55108; United States Department of Agriculture-Agricultural Research Service, St. Paul, Minnesota 55108 (C.P.V.); and The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850 (J.C., C.D.T.)
The Medicago truncatula expressed sequence tag (EST) database (Gene Index) contains over 140,000 sequences from 30 cDNA libraries. This resource offers the possibility of identifying previously uncharacterized genes and assessing the frequency and tissue specificity of their expression in silico. Because M. truncatula forms symbiotic root nodules, unlike Arabidopsis, this is a particularly important approach in investigating genes specific to nodule development and function in legumes. Our analyses have revealed 340 putative gene products, or tentative consensus sequences (TCs), expressed solely in root nodules. These TCs were represented by two to 379 ESTs. Of these TCs, 3% appear to encode novel proteins, 57% encode proteins with a weak similarity to the GenBank accessions, and 40% encode proteins with strong similarity to the known proteins. Nodule-specific TCs were grouped into nine categories based on the predicted function of their protein products. Besides previously characterized nodulins, other examples of highly abundant nodule-specific transcripts include plantacyanin, agglutinin, embryo-specific protein, and purine permease. Six nodule-specific TCs encode calmodulin-like proteins that possess a unique cleavable transit sequence potentially targeting the protein into the peribacteroid space. Surprisingly, 114 nodule-specific TCs encode small Cys cluster proteins with a cleavable transit peptide. To determine the validity of the in silico analysis, expression of 91 putative nodule-specific TCs was analyzed by macroarray and RNA-blot hybridizations. Nodule-enhanced expression was confirmed experimentally for the TCs composed of five or more ESTs, whereas the results for those TCs containing fewer ESTs were variable.
The rapidly expanding field of genomics provides vast opportunities for evaluating the coordinated functioning and expression of thousands of genes (Lockhart and Winzeler, 2000).

In recent years, Medicago truncatula and Lotus japonicus have emerged as model systems for genomic approaches to plant-microbe symbiotic associations (Barker et al., 1990).

The creation of a large-scale EST database, the M. truncatula Gene Index (MtGI; http://www.tigr.org/tdb/mtgi), from the results of an international effort in high-throughput sequencing, offers the prospect of in silico identification of genes whose expression is specific for or greatly enhanced by symbiosis. Release 4.0 of MtGI was made public in September 2001 and contains over 140,000 sequence entries from 30 non-normalized cDNA libraries representing various vegetative and reproductive organs. Based upon sequence overlap, all ESTs are processed into a nonredundant set of clustered tentative consensus sequences (TCs) and singletons (unique nonoverlapping sequences; Quackenbush et al., 2000).

The potential of in silico analysis of EST collections has been demonstrated for a number of plant species (Sasaki et al., 1994).

The objectives of our studies were to assess whether Boolean analysis of in silico expression data would be a useful genome-wide approach for identifying novel genes specific to the development and functioning of root nodules. The language of the Boolean formalism (Genoud and
Métraux, 1999
In Silico Identification of Nodule-Specific TCs

Among the 30 cDNA libraries represented in Release 4.0 of MtGI, five were prepared from mRNA extracted from nodules at different developmental stages (Table I). Three major stages of development can be distinguished. The early nodule MtBB library was prepared from emerging nodules attached to the root segments, before detection of N2 fixation (E.-P. Journet, personal communication). The R108Mt, GVN, and Nodulated Root libraries represent mature nodules actively fixing N2. It should be noted that the MtBB and the Nodulated Root libraries were prepared from a mixture of roots and nodules and, therefore, potentially contain sequences expressed in root tissues as well as in nodules. Finally, the GVSN library represents senescent nodules. In total, 20,347 EST sequences in MtGI are from nodule libraries, which comprise 14.4% of the 141,501-EST dataset. Given that the other cDNA libraries represent all major plant organs (roots, leaves, stems, flowers, pods, and seeds), this number appears to be sufficient for sketching the nodule-specific transcriptome.
The language of Boolean formalism was applied to screen MtGI Release 4.0 and to identify those TCs composed of ESTs derived exclusively from the MtBB, R108Mt, GVN, Nodulated Root, or GVSN libraries (operator "OR"), but not from any other library (operator "NOT"). This search revealed 340 entries as nodule-specific TCs. All of these TCs are posted on the M. truncatula Consortium Web site (http://www.medicago.org). Each nodule-specific TC sequence is clustered from individual overlapping ESTs and, therefore, putatively represents a unique transcript, presumably from a single gene. Variability in the number of ESTs comprising each TC likely reflects differences in the abundance of the transcripts from the corresponding genes.

Nodule-specific TCs were grouped into four categories based on the number of ESTs contributing to an individual TC contig. Notably, 70% of nodule-specific TCs are represented by two to four ESTs, 17% of the TCs contain five to nine ESTs each, and 7% of the TCs contain 10 to 19 ESTs. Approximately 6% of the TCs contain 20 or more ESTs each. Assuming that the number of ESTs comprising a single TC reflects gene expression level, the current categorization of TCs composed of few ESTs as nodule specific may not be final. The likelihood of finding transcripts in non-nodule libraries after deeper sequencing should be considered. This scenario has already proven true for a number of such TCs upon comparison of MtGI Release 3.0 with Release 4.0, which was supplemented with 13,877 additional EST sequences.

In addition to the 340 nodule-specific contigs (TCs), the MtGI contains 1,867 singletons also sequenced from nodule libraries. They were not considered for further analysis because their nodule-specific status is questionable due to the limited number of identified transcripts.

All 340 nodule-specific TCs were again analyzed using BLASTX and grouped into three categories based on the statistical significance of their matches to proteins in the GenBank protein database: novel (zero matches in the database), strong similarity (E values less than
10

Some 40% (137) of the nodule-specific TCs showed strong similarity to known GenBank sequences, whereas the remaining 57% (193) of the TCs exhibited weak similarity with GenBank sequences (E values higher than
10

Characterization of Nodule-Specific TCs

The 137 TCs showing strong similarity to the GenBank
protein accessions were subdivided into nine categories based upon
the putative function of their strongest
BLASTX score (Fig. 1; Tables II and III).
Of these TCs, function could be predicted for 76 (55%) TCs.
Twenty-three (17%) TCs encoded proteins
of unknown function, previously described in legumes as nodule
specific, or nodulins (Legocki and Verma, 1980
Nine of the functionally defined TCs corresponded to leghemoglobins (Lbs). Lb genes are among the most abundantly expressed nodule-specific genes. Each Lb-encoding TC was composed of 13 (TC31876) to 379 (TC35566) ESTs. Nodulin TCs containing the greatest number of ESTs corresponded to MtN22, ENOD20, nodulin-25, ENOD18, MtN29, MtN1, and EnodGRP5, with each containing 84, 29, 23, 20, 19, 13, and 11 ESTs, respectively. The putative functions or cellular locations of the identified nodulins are listed in Table II. It is worth noting that a number of nodulin TCs contain a high
proportion of ESTs from the MtBB library. This library corresponds to
early nodule development before N2 fixation. At
least 47% of the ESTs in each of TC28588, TC29418, TC28970, TC36450,
TC29982, TC28429, TC37466, TC33130, and TC35962 came from the MtBB
library, indicating that they are early nodulins induced before the
onset of N2 fixation (Nap and Bisseling, 1990 In comparison, among the 660 Lb ESTs sequenced, only three ESTs originate from the MtBB early nodule library. Lbs would be expected to represent a low number of ESTs in MtBB because they are usually most highly expressed in mature N2-fixing nodules (GVN, R108Mt, and Nodulated Root libraries). In contrast to early nodulins, all four ESTs comprising TC40954, which
is similar to M. sativa nodule-specific protein nms22, are
derived from the GVSN library representing senescent nodules. Another
nodule-specific TC (TC40868), also sharing some similarity with nms22
(E value of 10 Because the function of most nodulins is unresolved, we analyzed
the amino acid sequences deduced from their TCs by the PSORT (prediction of protein sorting signals and
localization sites) and the Inter-Pro (identification of protein
functional domains) programs (Table II). These analyses suggest that
homologs of three nodulins, ENOD12 (TC28970), MtN22 (TC31873 and
TC31874), and nodulin-25 (TC35677 and TC35678), possess a cleavable
N-terminal sequence targeting the protein into the endomembrane system
or outside the cell. N-Terminal signal peptides deduced for
MtN22-like TCs are identical. Likewise, the deduced N-terminal signal
peptides for nodulin-25-like TCs are also identical. MtN22- and
nodulin-25-type signal peptides are more similar to each other (48%
identity) than to those of the ENOD12-type signal peptide (31%
and 25% identity, respectively). The N-terminal sequence of M. sativa nodulin-25 was earlier proposed to target the protein into
the PBS of the nodule (Kiss et al., 1990 Despite the original definition of nodulins as genes expressed
exclusively in legume root nodules, eight of the 23 TCs corresponding to the known nodulins also have strong similarities (E values of
10 Besides those encoding Lbs, a group of nodule-specific TCs with strong similarity to genes of known function includes 12 (9%) related to metabolism, 9 (7%) related to transport, 28 (19%) related to signal transduction, 15 (11%) related to cell structure/maintenance, and three (2%) related to growth factor/hormone processes (Table III). Among these groups of TCs, those having the greatest number of ESTs encoded peroxidase precursor (nine), carbonic anhydrase (seven), purine permease (14), calmodulins (14 and 10), bark agglutinin precursor (40 and 12), plantacyanin (45), B12D protein (14), and embryo-specific protein (10). The majority of the TCs, however, are composed of four or fewer ESTs. Two nodule-specific TCs (TC32103 and TC36302) encode proteins that are
similar to a bark lectin-related polypeptide/agglutinin of R. pseudoacacia and Cicer arietinum Basic blue copper protein, or plantacyanin (TC32101), is encoded by
another highly expressed nodule-specific TC. The nucleotide sequence of
TC32101 is 97% identical to that of nodulin MsNod202 encoding a
plantacyanin from M. sativa (Jiménez-Zurdo et al., 2000).

Two proteins encoded by nodule-specific TC36259 and TC35428, assembled from 10 and two ESTs, respectively, are 77% identical and are similar to Arabidopsis embryo-specific protein (GenBank accession no. AB019235). Some 40% of the clones comprising TC36259 were sequenced from the early nodule library (MtBB).

One of the unexpected outcomes of the in silico survey for nodule-specific TCs was the identification of TC41286, which encodes a protein similar to a Rubisco small subunit, a photosynthesis-related protein normally observed in green tissues. This TC consists of three ESTs derived from the MtBB, Nodulated Root, and GVSN libraries. Surprisingly, the statistical significance of the similarity of the deduced amino acid sequence of TC41286 to Rubisco small subunits of nonleguminous woody plants Betula verrucosa and L. laricina (E values of 10

Unique Nodule-Specific Calmodulin-Like Proteins

Six nodule-specific TCs with similarity to calmodulins were identified in silico (TC35910, TC35911, TC35912, TC34223, TC41252, and TC37063). The number of ESTs comprising each of these TCs varied from three (TC34223) to 14 (TC35910; Table III). Based upon BLASTX comparisons, the deduced amino acid sequence identity to known calmodulins was lower for the nodule-specific TCs (38%-70%) than for the two TCs encoding typical calmodulins and expressed in various other tissues of M. truncatula (TC31994 and TC35885, 100% identity). Therefore, the proteins encoded by these nodule-specific TCs were named calmodulin-like proteins.

Complete coding sequences (CDS) were obtained for all six of these TCs, and also for the two TCs encoding typical calmodulins expressed in various other tissues of M. truncatula (TC31994 and TC35885). To verify the assembly of each contig, at least one representative cDNA clone was completely resequenced for each TC. Complete cDNA sequences corresponding to all nodule-specific calmodulin-like TCs and to the two typical calmodulin TCs are deposited in GenBank under the accession numbers AF494212 through AF494220. With two exceptions, the complete CDS length of calmodulin-like TCs and typical calmodulin TCs was comparable (767-983 bp). The CDS for TC34223 was considerably smaller (567 bp), apparently due to an internal deletion. TC37063 appeared to be assembled from two types of clones, identical throughout the entire sequence but different in length due to an extension of the 3' region in one of them. Therefore, two versions of TC37063 were proposed, TC37063-s (short, 501 bp) and TC37063-l (long, 781 bp). The 280-bp-long extension at the 3' end of TC37063-l occurred almost entirely in the 3'-untranslated region; however, the deduced amino acid sequence of TC37063-l is also slightly longer (12 additional amino acids preceding the stop codon).
Notably, all TCs, including those encoding typical calmodulins, possess relatively long 3'-untranslated regions.

Four nodule-specific TCs encode longer calmodulin-like polypeptides (140-179 amino acids for TC35910, TC35911, TC35912, and TC34223) than the others (TC41252, 116 amino acids; TC37063-s, 103 amino acids; and TC37063-l, 115 amino acids). TC31994 and TC35885 encode calmodulin polypeptides of 149 amino acids, similar to most known calmodulins (Reddy, 2001).
The alignment of the deduced amino acid sequences of nodule-specific calmodulin-like TCs, typical calmodulin TCs, and several calmodulins from other organisms is shown in Figure 2. Typical calmodulins possess four Ca2+-binding domains (EF hand motifs; boxed in Fig. 2), each including several highly conserved residues that form Ca2+-binding sites (underlined amino acids). For example, calmodulins of Medicago sativa, bean, T. pyriformis, T. gondii, and both typical calmodulins of M. truncatula contain all four domains. The Inter-Pro program used to determine Ca2+-binding motifs in nodule-specific calmodulin-like TCs showed that these TCs do not contain all four complete Ca2+-binding domains. The optimal amino acid alignment of calmodulin-like proteins with typical calmodulins produces a gap in the amino acid stretch of four calmodulin-like TCs. This gap occurs in the region corresponding to domain II. Three calmodulin-like proteins (TC35910, TC35911, and TC34223) contain complete domains III and IV only; TC35912 contains domain IV; TC41252 contains domain I, and both versions of TC37063 contain domain II. However, many functionally important amino acid residues in the regions corresponding to the missing complete EF motifs are still conserved in all calmodulin-like TCs.
A remarkable feature unique to all nodule-specific calmodulin-like proteins is a conserved 40-amino acid-long N-terminal extension, which is absent from all typical calmodulins. As predicted by PSORT analysis, these N-terminal peptides contain putative cleavable signal sequences (24 or 18 amino acids long) that potentially target the proteins into the endomembrane system or outside the cell. Typical calmodulins (including M. truncatula TC31994 and TC35885) lack an N-terminal extension encoding a signal sequence. As predicted by PSORT analysis, the TC31994 and TC35885 polypeptides are localized in the cytoplasm, typical of the common calmodulins (Zielinski, 1998).

Interestingly, the signal peptides of calmodulin-like proteins are very similar to those found in nodulin-25 (TC35677 and TC35678). For example, there is 75% similarity between the signal peptides of TC35678 (nodulin-25) and TC35910 (nodule-specific calmodulin-like protein). In both cases, the cleavage site is predicted to occur after the first 24 amino acids of the polypeptide. However, the mature nodulin-25 does not show any similarity to calmodulins and, as determined by Inter-Pro scanning, lacks any EF hand domains.

To determine whether any other M. truncatula sequences, besides nodulin-25, contain such an N-terminal signal motif, we searched the MtGI database with the amino acid signal sequences of calmodulin-like proteins (TBLASTN analysis). No other TCs appear to have such a signal peptide. Among the singletons, only two accessions with a similar signal motif were found (AW127197 and BE999027). Both ESTs were sequenced from nodule libraries (GVN and GVSN, respectively), and their closest nucleotide matches are nodule-specific calmodulin-like proteins (BLASTN against MtGI). Moreover, searching the entire National Center for Biotechnology Information (NCBI) protein database did not reveal additional accessions with similar signal motifs (BLASTP analysis).

Nodule-Specific TCs Encoding CCPs

Five types of CCPs showing some similarity to previously
described pea (Pisum sativum) nodulin 3 (ENOD3), nodulin 6, and nodulin 14 (Scheres et al., 1990
Three types of changes were found in the first Cys cluster: (a) for 22 of the 114 predicted proteins, the Asp was not conserved; (b) for two predicted proteins (encoded by TC37420 and TC40754), the Cys were separated by three or 12 amino acids instead of five; and (c) for three predicted proteins, the second Cys was replaced by a Trp or Tyr, possibly due to sequencing errors (Cys is encoded by TGT or TGC, whereas Trp is encoded by TGG, and Tyr is encoded by TAT or TAC). Therefore, the predominant structure of the first Cys cluster for M. truncatula CCPs is more accurately described as "Cys-X5-Cys." Deviations from the proposed model were also observed in the structure of the second Cys cluster: (a) in six and three of the predicted proteins, the two Cys were separated by five and six amino acids, respectively, instead of four amino acids; and (b) in seven predicted proteins, one of the Cys was substituted by Phe, Tyr, or Leu. However, similar to the situation with the first cluster, all of these substitutions may be a result of a single nucleotide sequencing error. Our data indicate that the structure of the second Cys cluster can be best described as "Cys-X4-6-Cys." Because some ESTs for CCP-encoding TCs were sequenced from the MtBB library and others from the GVSN library, expression of CCP genes appears to be induced before the onset of N2 fixation and to extend through nodule senescence.

Validation of Nodule-Specific TCs Identified in Silico through Macroarrays and RNA Blots

To assess whether genes identified as nodule specific via in silico analysis showed enhanced expression in nodules in vivo, transcript abundance for selected TCs was evaluated by macroarray hybridization and RNA-blot analysis. The 91 TCs chosen for macroarray analysis were composed of a variable number of ESTs: 13 TCs contained 20 or more ESTs, 13 TCs contained 10 to 19 ESTs, 28 TCs contained five to nine ESTs, and 37 TCs contained two to four ESTs.
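The Cys cluster spacing described above for the CCPs ("Cys-X5-Cys" for the first cluster and "Cys-X4-6-Cys" for the second) can be written as a pair of regular expressions over one-letter amino acid codes. The following is only a minimal sketch and not the procedure used in this study: the function name, the requirement that the second cluster start after the first one ends, and the toy peptide are all assumptions made for illustration.

```python
import re

# First cluster: two Cys separated by exactly five residues (Cys-X5-Cys).
FIRST_CLUSTER = re.compile(r"C.{5}C")
# Second cluster: two Cys separated by four to six residues (Cys-X4-6-Cys).
SECOND_CLUSTER = re.compile(r"C.{4,6}C")


def find_cys_clusters(protein):
    """Return ((start, end), (start, end)) for the two clusters, or None.

    Assumption: the second cluster is searched only downstream of the
    first one, so the two matches never overlap.
    """
    first = FIRST_CLUSTER.search(protein)
    if first is None:
        return None
    second = SECOND_CLUSTER.search(protein, first.end())
    if second is None:
        return None
    return first.span(), second.span()


# Hypothetical peptide invented for this example (not a real CCP sequence).
toy_peptide = "MKAACDEFGHCKKKKLLCLMNPCAA"
print(find_cys_clusters(toy_peptide))
```

A sequence that lacks either cluster returns `None`, which makes it easy to flag the deviating proteins (for example, those with altered Cys spacing) for manual inspection.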
Each TC on the macroarray was represented by two different cDNA clones, and each clone was spotted in duplicate. The experiment evaluated the hybridization intensities for each spot on three different filters probed with radioactively labeled cDNAs derived from nodule, leaf, or root mRNA. Four macroarray hybridizations were performed, each using independently harvested tissue for mRNA extraction. We determined the average nodule-to-root (N:R) and nodule-to-leaf (N:L) ratios of the intensities of hybridization signal for each TC sequence. Table V represents the final N:R and N:L averages from all four experiments. We defined TCs as being nodule enhanced when gene expression in nodules exceeded that in other tissues by at least 2-fold.
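The ratio calculation and the 2-fold criterion can be illustrated with a short sketch. This is a hedged illustration rather than the authors' analysis: the order of averaging (spots first, then experiments), the absence of background subtraction or normalization, the function names, and the intensity values are all assumptions. Requiring both ratios to reach the threshold is one reading of "other tissues."

```python
from statistics import mean


def average_ratios(experiments):
    """Average N:R and N:L ratios for one TC across experiments.

    `experiments` is a list of dicts with keys 'nodule', 'root', and
    'leaf', each holding the spot intensities for that filter
    (two clones spotted in duplicate give four values).
    Assumption: spots are averaged within an experiment first, and the
    per-experiment ratios are then averaged.
    """
    nr = [mean(e["nodule"]) / mean(e["root"]) for e in experiments]
    nl = [mean(e["nodule"]) / mean(e["leaf"]) for e in experiments]
    return mean(nr), mean(nl)


def is_nodule_enhanced(nr, nl, threshold=2.0):
    """Apply the at-least-2-fold criterion to both ratios."""
    return nr >= threshold and nl >= threshold


# Hypothetical intensities for a single TC in two experiments.
demo = [
    {"nodule": [100, 100, 100, 100], "root": [25, 25, 25, 25], "leaf": [50, 50, 50, 50]},
    {"nodule": [80, 80, 80, 80], "root": [20, 20, 20, 20], "leaf": [40, 40, 40, 40]},
]
nr, nl = average_ratios(demo)
print(nr, nl, is_nodule_enhanced(nr, nl))
```

Keeping the two ratios separate matters because a TC can clear the threshold against roots and still fail it against leaves.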
For all 91 nodule-specific TCs, the average N:R ratio exceeded 2-fold, confirming that expression of all these genes was enhanced in nodules as compared with roots. However, the average N:L ratio was equal to or exceeded 2-fold for only 72 of the TCs, whereas for 19 TCs it was below this value. Of these 19 TCs, three TCs were represented by six or seven ESTs, and the remaining 16 TCs were composed of three to five ESTs each. Overall, the results of the macroarray hybridizations indicate that the in silico-based nodule-specific assignment may not be correct for TCs composed of five or fewer ESTs. Final verification of the nodule-specific/-enhanced status for such TCs will require more sensitive experimental methods, such as real-time PCR.

From the 91 TCs selected for macroarrays, a subset of nine TCs, each composed of four to 25 ESTs, was examined by RNA-blot analysis. Transcript abundance was evaluated in nodule, senescent nodule, root, leaf, flower, and pod tissues. Equivalent loading of RNA was verified by probing blots with a 28S RNA probe. RNA-blot analysis confirmed the nodule-specific/-enhanced nature of TC32516, TC29264, TC35910, TC36259, TC31903, TC28580, and TC29160 (Fig. 4). Expression of two other nodule-specific TCs (TC40870 and TC28421) dramatically increased during nodule senescence (Fig. 4). Expression of TC28421 (encoding Cys proteinase) was almost undetectable in active N2-fixing nodules by RNA-blot analysis. Not surprisingly, this TC assembly is composed of five ESTs from GVSN (senescent nodule library) and of only one EST from the N2-fixing nodule library (R108Mt). Results of both in silico and in vivo northern analyses for TC28421 indicate why macroarray results did not reveal enhanced transcript abundance in N2-fixing nodules as compared with leaves.
Several other TCs with fewer than five ESTs each were also examined by RNA-blot analysis (data not shown) and showed extremely low levels of hybridization in all tissues. RNA blots could not clearly confirm their in silico classification as nodule specific.
In this report, we have extended the understanding of plant genes involved in symbiotic nitrogen fixation by identifying in silico 340 genes (TCs) that appear to be expressed solely in root nodules. Nodule-specific TCs represent 2.6% of the total TCs annotated in the MtGI. They were identified by applying Boolean search operators to screen 12,925 TCs assembled from over 140,000 ESTs. Nodule-specific TCs are composed of between two and 379 ESTs. Although EST sequencing previously has been successfully used on a limited scale to identify genes that have nodule-enhanced expression (Szczyglowski et al., 1997

Several advantages of an in silico genome-wide approach are immediately evident. Foremost, the number of gene sequences that can be evaluated is virtually unlimited and the analysis is quite rapid. Second, Boolean search operators can simultaneously be applied to ESTs identified from a large number of cDNA libraries reflecting various organs and tissues. Third, the search can be organized to answer a range of questions, such as: (a) which genes are expressed only in selected libraries and not in all others (i.e. nodule specific), (b) which genes are expressed in common in related libraries (e.g. root and shoot meristems), and (c) which genes are represented in all libraries (i.e. constitutively expressed). Last, microarray analysis of gene expression may be limited by its availability and cost, whereas in silico expression profiling is available to anyone with Internet access.

Determining the validity of using in silico expression data as a true reflection of in vivo transcript abundance is extremely important.
Audic and Claverie (1997)

It should be acknowledged that experimental validation of in silico data on macroarrays and RNA blots is complicated by the potential cross-hybridization of closely related sequences. This problem has already been partially addressed in several publications in relation to
microarray (Girke et al., 2000

One of the merits of in silico analysis is the opportunity to obtain an
overview of the variety of nodule-specific TCs. Although function can
be predicted for the protein products of 76 (22%) nodule-specific TCs
(TCs for Lbs, TCs related to metabolism, transport, signal
transduction, cell structure and maintenance, growth factors, and
hormone regulation), 264 (78%) nodule-specific TCs remain functionally
uncharacterized. These include TCs from a weak similarity category, TCs
for novel proteins and nodulins, unclassified TCs, and TCs similar to
hypothetical, unknown, and putative proteins of Arabidopsis and rice.
At least 31% of nodule-specific TCs have strong homology to sequences
from nonlegume species. These are TCs from a strong similarity category
excepting those encoding Lbs and the majority of nodulins. Thus, it
appears that a significant proportion of nodule-specific functions are
performed by recruiting genes common to all plants. In contrast,
approximately one-half of nodule-specific TCs appear to represent
genes unique to legumes. This can be deduced from the fact that
corresponding transcripts could not be found in nonlegume species
by either BLASTX (GenBank protein database) or TBLASTX
(GenBank EST database) analyses. Legume-specific TCs are those encoding
novel proteins, Lbs, and the majority of nodulins (from a strong
similarity category), and CCPs (from a weak similarity category). The
remaining nodule-specific TCs belonging to the weak similarity category are a potential resource for revealing more legume-specific genes. The
fact that complete genomic sequences of Arabidopsis and rice are
already available (AGI, 2000 Boolean search analysis revealed several functionally diverse
nodule-specific TCs whose role in nodules was previously overlooked. These include TCs that encode proteins similar to: (a) purine permease,
a high-affinity transporter for adenine, cytosine, and purine
derivatives (Gillissen et al., 2000 Calcium is well recognized as a second messenger, playing a vital role
in plant responses to biotic and abiotic stimuli (Zielinski, 1998 Another provocative role for nodule calmodulin-like proteins would
involve regulation of nodule Glu decarboxylase. This enzyme requires
activation by Ca2+-bound calmodulin to convert
Glu to Although this is the first report on plant-encoded calmodulin-like
proteins related to legume nodule functioning, a rhizobium-encoded calmodulin-like protein, termed calsymin, has been recently identified in the bean microsymbiont Rhizobium etli (Xi et al., 2000 A group of 114 nodule-specific TCs was defined as encoding CCPs. The
first CCP gene (ENOD3) was reported for pea (Scheres et al., 1990

It is noteworthy that a group of plant defensins, apparently encoding proteinase inhibitors and known for their antifungal activity, also possess several conserved Cys clusters and an N-terminal signal sequence (Maitra and Cushman, 1998

Although we have identified 340 putative nodule-specific genes (nodulins) through an in silico approach, our results need to be viewed conservatively. As originally defined, nodulin genes are those expressed exclusively in nodules (Legocki and Verma, 1980

It should also be acknowledged that the parameters of the Boolean analysis of the M. truncatula EST collection used to identify strictly nodule-specific TCs disregard a large group of genes that are also critically involved in nodule functioning and are expressed in a nodule-enhanced, rather than a nodule-specific, manner. For example, TCs encoding Gln synthetase (TC35731), Suc synthase (TC31899), sulfate transporter (TC29347), and hexose transporter (TC29639) would be in this group. Last, 1,867 singletons have been sequenced from nodule libraries. However, because of their apparently low level of expression, it is not possible to confidently assign them a nodule-specific pattern.

A relatively small number of sequences in the database
appear to be derived from M. truncatula plastid and
mitochondrial genomes. Some of these sequences can be assembled into
TCs. However, inspection of the original unprocessed sequence data
shows that none of the plastid or mitochondrial-like DNAs, for which
the complete sequence is available, have a
poly(A+) tail at their 3' end. We suspect that
they may have originated from organellar DNA and not from organellar
transcripts. Because nodule libraries were constructed from material
collected from tissue containing large numbers of S. meliloti, the possibility exists that some nodule ESTs may have
been derived from rhizobium. Therefore, we examined MtGI for the
presence of S. meliloti sequences (Galibert et al., 2001 A number of factors have made the in silico identification of nodule-specific transcripts possible. First, the international community has created a large EST dataset (over 140,000 entries). Second, the ESTs are derived from a collection of libraries constructed from a wide variety of organs, and the data is archived in a relational database. Third, each of the libraries has been sequenced to considerable depth. These factors are extremely important for the validity of an in silico approach, and should be carefully considered for any genome scale analysis.
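Because the ESTs are archived in a relational database, the nodule-only screen (library A OR B OR ..., but NOT any other library) reduces to a grouped query. The sketch below uses Python's built-in sqlite3 module with a hypothetical one-table schema; the table name `est`, its columns, the demo TC identifiers, and the non-nodule library labels are assumptions for illustration and do not reflect the actual MtGI schema or the queries used in this study.

```python
import sqlite3

# The five nodule libraries named in the text.
NODULE_LIBRARIES = ("MtBB", "R108Mt", "GVN", "Nodulated Root", "GVSN")


def build_demo_db():
    """Create an in-memory database with a hypothetical EST table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE est (est_id TEXT PRIMARY KEY, tc_id TEXT, library TEXT)"
    )
    rows = [
        ("est1", "TC_demo1", "GVN"),        # TC_demo1: nodule libraries only
        ("est2", "TC_demo1", "MtBB"),
        ("est3", "TC_demo1", "GVSN"),
        ("est4", "TC_demo2", "GVN"),        # TC_demo2: nodule plus leaf
        ("est5", "TC_demo2", "demo_leaf"),
        ("est6", "TC_demo3", "demo_root"),  # TC_demo3: no nodule ESTs
        ("est7", "TC_demo3", "demo_leaf"),
        ("est8", None, "GVN"),              # singleton, not part of any TC
    ]
    conn.executemany("INSERT INTO est VALUES (?, ?, ?)", rows)
    return conn


def nodule_specific_tcs(conn):
    """Return (tc_id, EST count) for TCs whose ESTs all come from nodule libraries."""
    placeholders = ", ".join("?" for _ in NODULE_LIBRARIES)
    query = f"""
        SELECT tc_id, COUNT(*) AS n_ests
        FROM est
        WHERE tc_id IS NOT NULL
        GROUP BY tc_id
        HAVING SUM(CASE WHEN library IN ({placeholders}) THEN 0 ELSE 1 END) = 0
        ORDER BY tc_id
    """
    return conn.execute(query, NODULE_LIBRARIES).fetchall()


print(nodule_specific_tcs(build_demo_db()))
```

The `HAVING` clause counts ESTs from any non-nodule library within each TC and keeps only TCs where that count is zero, which is the "NOT" half of the Boolean screen. The EST count returned alongside each TC is the quantity used in the text to judge how reliable the nodule-specific assignment is.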
Database Analyses, Sequencing, and Sequence Analyses

Structured query language was applied to analyze the MtGI
Release 4.0 (http://www.tigr.org/tdb/mtgi) and identify a subset of TCs
containing only ESTs from the nodule libraries (MtBB, R108Mt, GVN,
Nodulated Root, and GVSN). These nodule-specific TCs were reanalyzed
using BLASTX against the NCBI protein database
(http://www.ncbi.nlm.nih.gov/BLAST). Additional analysis using TBLASTX
was performed for TCs with zero matches in the protein database. The
GCG Wisconsin software package (Genetics Computer Group, Madison,
WI) was used for sequence analysis and comparisons. Inter-Pro
(http://www.ebi.ac.uk/interpro; Apweiler et al., 2001

Plant Material and Growth Conditions

Seeds of Medicago truncatula [Gaertn.], line A17 of cv Jemalong (T. Huguet, unpublished data), were surface sterilized for 10 min in sulfuric acid, germinated on petri plates for 2 d, then planted in a sand:vermiculite mix. After planting, seeds were inoculated with Sinorhizobium meliloti 10F51 as described by Egli et al. (1989).

RNA Extraction and RNA-Blot Hybridization

Total RNA was extracted from nodule, senescent nodule, root, leaf, flower, and pod tissues as described by Gregerson et al. (1993).

Macroarray Hybridization

At least two individual clones were evaluated for each TC by
macroarray hybridization. cDNA inserts cloned into pBluescript were
amplified by PCR of 2 µL of 150-µL overnight bacterial cultures using standard T3 and T7 primers. The quality of each PCR product was
examined by gel electrophoresis. Approximately 100 ng of each PCR
product was spotted in duplicate onto GeneScreen Plus membranes (NEN
Life Science Products, Boston). Each experiment evaluated three
membranes hybridized with either 32P-labeled nodule, root,
or leaf first strand cDNA probes. Single-stranded probes were
synthesized from total RNA using SuperScript II reverse transcriptase
(Invitrogen Life Technologies, Carlsbad, CA). The reaction
mixture included 7 µL of RNA primer solution [30 µg of total RNA
and 0.5 µg of oligo(dT)12-18 primer, annealed by heating
to 70°C for 10 min], 4 µL of 5× first strand buffer, 2 µL of
0.1 M dithiothreitol, 1 µL of dNTP mix (2.5 mM dCTP, 2.5 mM dGTP, 2.5 mM dTTP,
and 0.0625 mM dATP), 5 µL of [
Received April 5, 2002; returned for revision May 9, 2002; accepted June 2, 2002.
¹ This work was supported by the National Science Foundation (Plant Genome Project no. 9872664) and by the U.S. Department of Agriculture-Agricultural Research Service (grant no. CRIS 3640-21000-014-00D). This is a joint contribution of the U.S. Department of Agriculture-Agricultural Research Service and the Minnesota Agricultural Experimental Station Scientific Journal Series.
* Corresponding author; e-mail vance004{at}umn.edu; fax 651-649-5058.
Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.006833.