First published online February 24, 2002; 10.1104/pp.010681
Plant Physiol, March 2002, Vol. 128, pp. 896-910

Comparison of RNA Expression Profiles Based on Maize Expressed Sequence Tag Frequency Analysis and Micro-Array Hybridization

Department of Biological Sciences, Stanford University, Stanford, California 94305-5020 (J.F., V.W.); Departments of Zoology and Genetics (V.B., X.G., S.L.) and Statistics (V.B.), Iowa State University, Ames, Iowa 50011-3260; and Department of Plant Sciences, University of Arizona, Tucson, Arizona 85721-0001 (V.L.C., R.P.E., D.W.G., E.A.P.)
Assembly of 73,000 expressed sequence tags (ESTs) representing multiple organs and developmental stages of maize (Zea mays) identified approximately 22,000 tentative unique genes (TUGs) at the criterion of 95% identity. Based on sequence similarity, overlap between any two of nine libraries with more than 3,000 ESTs ranged from 4% to 20% of the constituent TUGs. The most abundant ESTs were recovered from only one or a minority of the libraries, and only 26 EST contigs had members from all nine EST sets (presumably representing ubiquitously expressed genes). For several examples, ESTs for different members of gene families were detected in distinct organs. To study this further, two types of micro-array slides were fabricated, one containing 5,534 ESTs from 10- to 14-d-old endosperm, and the other 4,844 ESTs from immature ear, estimated to represent about 2,800 and 2,500 unique genes, respectively. Each array type was hybridized with fluorescent cDNA targets prepared from endosperm and immature ear poly(A+) RNA. Although the 10- to 14-d-old postpollination endosperm TUGs showed only 12% overlap with immature ear TUGs, endosperm target hybridized with 94% of the ear TUGs, and ear target hybridized with 57% of the endosperm TUGs. Incomplete EST sampling of low-abundance transcripts contributes to an underestimate of shared gene expression profiles. Reassembly of ESTs at the criterion of 90% identity suggests how cross hybridization among gene family members can overestimate the overlap in genes expressed in micro-array hybridization experiments.
A central goal of genome analysis is
to identify and classify all the genes of a particular species.
Functional genomics seeks to understand the precise roles of these
genes, including unique and redundant functions. Apart from
Arabidopsis, for which the complete genome is already available, gene
discovery in most plants is primarily based on sample sequencing of
expressed sequence tags (ESTs) prepared from cDNA copies of polyadenylated mRNA
(Lim et al., 1996).

Alternative and potentially more powerful methods for profiling gene
expression require prior knowledge of the gene sequences garnered from
an EST or genome sequencing project, but measure RNA expression more
directly. One such method relies on PCR amplification of mRNA and
restriction digestion patterns of the resulting cDNAs to enumerate
expressed genes identified by the lengths of fragments generated (Bruce
et al., 2000).

Complete interpretation of gene profiling results depends on knowledge
of the underlying genome structure. Ideally, the complete genome
sequence would be available, with accurate prediction of all the genes
and their alternative transcripts. For Arabidopsis, a near complete
genome sequence is available, but current annotation is incomplete (for
review, see Cho and Walbot, 2001).

Global analysis of maize (Zea mays) genome structure
indicates that a relatively recent allotetraploidization event occurred approximately 11.5 million years ago (MYA) between grass species that
had diverged approximately 20 MYA (Gaut and Doebley,
1997).

Given the complication of recent duplications within the modern maize genome, we were interested in comparing gene expression profiling results and conclusions based on EST sampling with those derived from micro-array hybridization. Here, we report gene discovery results, list widely expressed genes, and determine the extent of overlap of EST sequence representation in maize based on about 73,000 ESTs drawn mainly from nine developmental stages. Micro-arrays fabricated with ESTs from either developing endosperm or immature ear were hybridized with the source and heterologous RNA samples. These micro-arrays were analyzed for reproducibility of hybridization results, quantification of transcript levels compared with EST recovery, and the extent of overlap in RNA expression profile between endosperm and ear. Both the EST and initial micro-array analyses demonstrate quantitative differences in expression profiles, but micro-array analysis detected a much greater qualitative overlap in gene expression between the tissues than EST sequencing did.
Assignment of ESTs as Singlets or Members of Contigs

The publicly available maize ESTs are periodically assembled into
unique contigs, annotated, and made available via the ZmDB database (http://www.zmdb.iastate.edu; Gai et al., 2000).

In the following sections, we report more detailed analyses of nine cDNA projects with at least 3,000 EST entries each; the projects are listed in Table I. They collectively define 17,096 TUGs composed of 9,597 singlets and 7,499 contigs. Contig assembly depends on both EST length, whose per-project average ranged from 380 to 520 nucleotides in the Maize Gene Discovery projects considered here, and sequence quality, which is very high (Table I). Plasmid templates were sequenced from only one end in most cases; bidirectional sequencing was used throughout projects 707 and 946 and on a limited basis in other projects as an aid in contig assembly. Of the 12,208 pairs of 5' and 3' sequences available, 8,882 were grouped into single contigs.
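The following Python sketch makes the singlet/contig distinction concrete. It groups ESTs into TUGs by single-linkage clustering and joins two ESTs whenever they overlap with at least 95% identity over at least 40 bases, which is the criterion given for ZmDB assemblies in the Discussion. This is a toy under stated assumptions. The overlap test is an ungapped, same-strand comparison written here only for illustration, whereas the actual ZmDBAssembler protocol relies on BLAST and CAP3. The names `best_ungapped_identity` and `assemble` are hypothetical.

```python
from itertools import combinations


def best_ungapped_identity(a: str, b: str, min_len: int) -> float:
    """Highest percent identity over any ungapped end-overlap (or containment)
    of at least min_len bases between a and b.

    Toy stand-in for BLAST/CAP3: no gaps, no reverse complements, no quality weights.
    """
    best = 0.0
    # offset = position in a where b starts (negative: b starts before a)
    for offset in range(-(len(b) - min_len), len(a) - min_len + 1):
        start_a, start_b = max(offset, 0), max(-offset, 0)
        length = min(len(a) - start_a, len(b) - start_b)
        if length < min_len:
            continue
        matches = sum(a[start_a + i] == b[start_b + i] for i in range(length))
        best = max(best, 100.0 * matches / length)
    return best


def assemble(ests: dict[str, str], min_identity: float = 95.0, min_len: int = 40):
    """Single-linkage clustering: ESTs joined by any qualifying overlap share a TUG.

    Returns (contigs, singlets): contigs as lists of EST names, singlets as names.
    """
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(ests, 2):
        if best_ungapped_identity(ests[a], ests[b], min_len) >= min_identity:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for name in ests:
        clusters.setdefault(find(name), []).append(name)
    contigs = [members for members in clusters.values() if len(members) > 1]
    singlets = [members[0] for members in clusters.values() if len(members) == 1]
    return contigs, singlets
```

Calling `assemble(ests, min_identity=90.0, min_len=60)` would mimic the relaxed 90%/60-base criterion used later to estimate cross hybridization. Single linkage fits here because a chain of overlapping ESTs should collapse into one TUG even when the ESTs at the two ends of the chain do not overlap each other.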
Comparing the ESTs recovered from different cDNA libraries, 24% to 43% of the ESTs from a given library were apparently unique to their specific source (Table I). Moreover, about two-thirds (11,280) of the TUGs are composed of sequences from a single library (Table II). Within each library, the majority (54%-78%) of TUGs were accounted for by singlets and by contigs with only two or three ESTs, all from that library. These ESTs should represent highly to moderately expressed genes in that tissue source because ESTs of rarely expressed genes are unlikely to be sampled.
Pair-Wise Comparison of EST Representation

More detailed pair-wise comparisons between individual projects are presented in Figure 1. The shared TUGs are shown between two stages of tassel development (618 and 946) in Figure 1A, between tassel (618) and ear (606) inflorescences at the stage of spikelet formation in Figure 1B, and between immature ear (606) and 10- to 14-d-old endosperm (605) in Figure 1C. The smaller pie chart in each of Figure 1, A through C, represents the fraction of TUGs containing at least one EST from each of the two libraries being compared, as a measure of the overlap between the two EST projects. The "C + C" slice (purple) comprises contigs (C) with at least two ESTs from each of the two libraries. The "S + S" and "S + C" slices, in blue and red, respectively, indicate contigs composed of either two singlets (S) or a singlet from one of the libraries and a contig from the other library. The primary pie chart has four colored slices. Using Figure 1A as an example, the "contig 618" (pink) and "contig 946" (green) slices indicate contigs composed of ESTs from just that library. The largest slices in the pair-wise analysis are the singlets in each library: "singlet 618" in yellow and "singlet 946" in orange.
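The pie-chart classes can be computed directly from contig membership. In the sketch below, each TUG is given as the list of source-library numbers of its member ESTs, and only ESTs from the two libraries being compared are counted. The input format and the function names are hypothetical. The published comparisons may also have involved reassembling the two projects together, which this sketch does not attempt.

```python
from collections import Counter


def classify_pair(tug_members: list[list[str]], lib_a: str, lib_b: str) -> Counter:
    """Count TUGs in each pie-chart class for a comparison of lib_a with lib_b."""
    classes: Counter = Counter()
    for members in tug_members:
        n_a = members.count(lib_a)
        n_b = members.count(lib_b)
        if n_a and n_b:  # shared TUG: goes into the smaller pie chart
            if n_a >= 2 and n_b >= 2:
                classes["C + C"] += 1
            elif n_a == 1 and n_b == 1:
                classes["S + S"] += 1
            else:
                classes["S + C"] += 1
        elif n_a:  # TUG seen only in lib_a
            classes[f"contig {lib_a}" if n_a >= 2 else f"singlet {lib_a}"] += 1
        elif n_b:  # TUG seen only in lib_b
            classes[f"contig {lib_b}" if n_b >= 2 else f"singlet {lib_b}"] += 1
        # TUGs with no ESTs from either library are ignored
    return classes


def percent_overlap(classes: Counter) -> float:
    """Shared TUGs (S + S, S + C, C + C) as a percentage of all TUGs in the comparison."""
    shared = classes["C + C"] + classes["S + S"] + classes["S + C"]
    total = sum(classes.values())
    return 100.0 * shared / total if total else 0.0
```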
As shown in Figure 1A, TUGs from tassel primordia before organ differentiation (project 946) share only 14% of TUGs with tassels after organ differentiation (project 618). Tassel and ear at the same stage of spikelet differentiation show only 10% overlap (Fig. 1B). The extent of overlap between these inflorescence projects is similar to the extent of overlap between 10- to 14-d-old endosperm (i.e. endosperm at the stage when storage protein genes are first transcribed) and immature ear (Fig. 1C). Because our micro-array analysis focused on the comparison of endosperm and immature ear, this comparison merits additional detail. The 3,113 endosperm TUGs (from 6,109 ESTs) and 2,595 ear TUGs (from 4,845 ESTs) overlap by only 12%, and the common sequences are drawn approximately equally from the S + S, S + C, and C + C classes. Of 2,163 singlets in 605 and 1,850 singlets in 606, only 181 contigs formed when the two groups of singlets were combined. In these two projects, >95% of the sequences are from the 5' end of each cDNA clone. To assess the significance of the overlap percentage, we randomly halved two of the largest EST projects, 614 and 946, and calculated the degree of overlap between the two halves of each project. Because these comparisons are between samples of ESTs from the same source, 100% overlap is expected once sampling is deep enough that every transcript is recovered multiple times; in less complete samples, singlets can only be represented in one half or the other. Based on six repeated random assignments, the range of overlap was found to be 41% to 44% for 614 and 32% to 34% for 946, with about equal fractions of the S + S, S + C, and C + C classes. Thus, we conclude that incomplete EST sampling is only one of the factors contributing to the low overlap between different sources. To compare the overlap of the other projects with projects 605 and 606, doughnut charts were constructed as shown in Figure 2.
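The random-halving control can be sketched in the same way. This Python example assumes that each EST has already been assigned to a TUG, and it reuses those assignments for both halves rather than reassembling each half. The function name, the seed, and the overlap definition are illustrative assumptions. Overlap is taken here as shared TUGs divided by all TUGs present in either half.

```python
import random


def split_half_overlap(est_to_tug: dict[str, str], n_repeats: int = 6, seed: int = 0) -> list[float]:
    """Percent TUG overlap between random halves of one EST project.

    Overlap = TUGs hit by both halves / TUGs hit by either half.
    Existing TUG assignments are reused rather than reassembling each half.
    """
    rng = random.Random(seed)
    names = sorted(est_to_tug)
    overlaps = []
    for _ in range(n_repeats):
        shuffled = names[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        tugs_a = {est_to_tug[n] for n in shuffled[:half]}
        tugs_b = {est_to_tug[n] for n in shuffled[half:]}
        overlaps.append(100.0 * len(tugs_a & tugs_b) / len(tugs_a | tugs_b))
    return overlaps
```

At the extremes, the overlap is necessarily 0% when every EST is its own singlet and 100% when all ESTs fall in one contig. Real projects such as 614 and 946 fall between these two limits.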
Each concentric ring in a doughnut chart represents the TUG overlap between two projects, one of which is kept constant across all comparisons (the white ring). The first three elements of each ring (starting at the top and moving clockwise) represent the percentage of TUGs containing at least one EST from each of the two projects being compared. This is equivalent to the secondary pie charts in Figure 1, and the same color scheme is used. These shared sequences represent 9.7% to 17.2% of the total TUGs in the comparisons of endosperm (Fig. 2A) and ear (Fig. 2B) with the other projects. Note that endosperm and ear typically share different contigs with the other projects. The next two elements of each ring are contigs composed of ESTs from only one of the libraries in the comparison. As mentioned for the pie charts, the largest elements in the pair-wise analysis of libraries are the singlets in each library. These singlet classes comprise 27.6% to 37.5% of the total TUGs for endosperm and 25.0% to 34.5% for immature ear in comparisons to all eight other libraries.
Pair-wise comparisons for the other libraries gave similar results. Additional pie and doughnut charts are displayed at http://zmdb.iastate.edu/zmdb/publications/Fetal01-sm.html. The general conclusion is that at this level of EST sampling of developmentally staged organs or organ mixtures, distinctive suites of genes are detected, with low overlap among organs. Of the EST projects examined, mixed adult organs (707) show the largest number of distinctive contigs that match neither 605 nor 606. Mixed adult organs (707) and embryos (687) have the highest percentages of distinct singlets in pair-wise comparisons. Because EST sampling was not exhaustive, many expressed genes may have been overlooked. In particular, genes with low constitutive expression, or with moderate expression confined to very limited domains within organs, may not be represented by an EST.

Abundant ESTs

Up to 5% of the ESTs sequenced from a library assembled into contigs containing five or more ESTs, all from that library (Table III). These contigs presumably represent genes that are highly expressed in particular tissues. Knowledge of EST representation provides candidate genes for recovery of promoters conferring stage- or organ-specific expression. Compilations of maize contigs with a user-specified percent representation in a specific organ can be generated at http://www.tigr.org/tdb/zmgi/.
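The abundance filter behind Table III can be expressed as a short function. The sketch below takes a dictionary that maps each TUG to the source libraries of its ESTs. It selects contigs with five or more ESTs drawn solely from one library, and it reports the percentage of a library's ESTs that fall into such contigs. The names and the input format are hypothetical.

```python
def abundant_specific_contigs(tug_libs: dict[str, list[str]], min_ests: int = 5) -> dict[str, list[str]]:
    """Map each library to its contigs with >= min_ests ESTs, all from that library."""
    result: dict[str, list[str]] = {}
    for tug, libs in tug_libs.items():
        if len(libs) >= min_ests and len(set(libs)) == 1:
            result.setdefault(libs[0], []).append(tug)
    return result


def percent_in_abundant(tug_libs: dict[str, list[str]], library: str, min_ests: int = 5) -> float:
    """Percentage of a library's ESTs that fall in its abundant, library-specific contigs."""
    total = sum(libs.count(library) for libs in tug_libs.values())
    abundant = abundant_specific_contigs(tug_libs, min_ests).get(library, [])
    in_abundant = sum(len(tug_libs[tug]) for tug in abundant)
    return 100.0 * in_abundant / total if total else 0.0
```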
Highly sampled contigs are listed at
http://zmdb.iastate.edu/zmdb/publications/Fetal01-sm.html.
All are found in at least two libraries, and nine were represented in
all libraries. Five of the 30 most abundant EST clusters have no
significant match at GenBank and could represent novel maize
genes or possibly genes from fungi, insects, or other organisms
associated with maize. It is surprising that although retrotransposons
make up about two-thirds of the maize genome (SanMiguel et al., 1996)

Ubiquitously expressed genes are represented by 26 contigs found in all
nine of the EST projects. As shown in Table V, available at
http://zmdb.iastate.edu/zmdb/publications/Fetal01-sm.html, all but three of these 26 contigs have more than one EST from each of
the nine libraries. Based on similarity to known maize genes or high
similarity to genes of known function in other organisms, nearly all of
these widely expressed ESTs represent "housekeeping" functions.
Despite their ubiquity, the majority of these widely expressed genes
exhibit a skewed distribution among libraries, with abundant
representation in only one or a few libraries. For example, of the 172

Duplicate Genes with Quantitatively Different Expression Patterns

For pairs of closely related contigs, we asked if there were examples of high representation in one library combined with absence in other libraries. TUC01-12-19-1991.1 and TUC01-09-30-4459.2 share >99% sequence similarity with the maize cytosolic glyceraldehyde-3-phosphate dehydrogenase genes Gpc3 and Gpc4, respectively. Eleven ESTs derived from the 660 library were from Gpc4 (TUC01-09-30-4459.2), but none were found for Gpc3 (TUC01-12-19-1991.1). TUC01-12-19-4269.1 and TUC05-31-1869.1 share approximately 95% nucleotide sequence identity and have high overall similarity to maize zein protein. TUC01-12-19-4269.1 is expressed in early embryo, as judged by the presence of 15 ESTs derived from the early embryo library (687), whereas TUC05-31-1869.1 has no early embryo ESTs. In contrast, TUC05-31-1869.1 contains ESTs from the endosperm library (605), whereas TUC01-12-19-4269.1 has none from the endosperm EST set. Contamination of embryo tissue samples by endosperm could explain these results, although in that case we would have expected to recover multiple types of zein ESTs. A third example of possible tissue-specific expression is provided by TUC01-26-861.2 and TUC01-12-19-3881.1, which share approximately 89% sequence similarity to a subunit of the vacuolar proton ATPase. TUC01-12-19-3881.1 is expressed in root, as judged by the contribution of 23 ESTs from library 614, with no representatives from other libraries; no root ESTs contribute to TUC01-26-861.2. Considering all aspects of EST analysis, the organs and tissues sampled show distinct gene expression profiles. This suggests that detailed analysis of the organ distribution of ESTs among gene family members will provide hypotheses about tissue-specific expression that can be tested further.
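Two kinds of library-distribution queries appear in this section. The first finds contigs present in all nine projects, and the second finds closely related contig pairs with complementary representation. Both reduce to simple comparisons of per-library EST counts. In this sketch, the function names, the `Counter` input format, and the 10-EST cutoff for "high representation" are assumptions made for illustration, since the text states no numerical cutoff. Candidate paralog pairs are assumed to have been identified beforehand by sequence similarity.

```python
from collections import Counter


def ubiquitous_tugs(tug_counts: dict[str, Counter], libraries: set[str]) -> list[str]:
    """TUGs with at least one EST from every library in `libraries`."""
    return sorted(
        tug for tug, counts in tug_counts.items()
        if all(counts[lib] > 0 for lib in libraries)
    )


def complementary_libraries(counts_a: Counter, counts_b: Counter, min_count: int = 10) -> list[tuple[str, str]]:
    """Libraries where one paralog has >= min_count ESTs and the other has none.

    Returns (library, 'a' or 'b') pairs naming the highly represented member.
    min_count is a hypothetical cutoff, not one stated in the paper.
    """
    hits = []
    for lib in sorted(set(counts_a) | set(counts_b)):
        # Counter returns 0 for libraries in which a contig has no ESTs
        if counts_a[lib] >= min_count and counts_b[lib] == 0:
            hits.append((lib, "a"))
        elif counts_b[lib] >= min_count and counts_a[lib] == 0:
            hits.append((lib, "b"))
    return hits
```

For example, with hypothetical counts `Counter({"605": 2})` for one paralog and `Counter({"660": 11, "605": 1})` for the other, only library 660 is flagged. This mirrors the Gpc3/Gpc4 case described above.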
Micro-Array Hybridization Analysis of Gene Expression

The EST analysis indicates low TUG overlap between libraries, despite the availability of relatively long ESTs representing 42,824 5' and 19,687 3' sequences from a total of 51,665 plasmids (some of which were sequenced multiple times) analyzed from the nine major libraries. This raises the important question of whether frequency distributions of ESTs within libraries adequately represent mRNA abundances within the organs and tissues examined. As an alternative means of assaying mRNA expression patterns, we examined hybridization profiles on the 605 endosperm and 606 ear micro-arrays. Micro-arrays of the 605 and 606 projects were printed separately on glass slides. The 605 micro-arrays were printed in two formats: as a single array with adjacent duplicate elements (605.04) and as two adjacent arrays (605.03). A panel of control elements was spotted as duplicate elements at the top and bottom of the single array (605.04) or as single elements at the top and bottom of each replicate array (605.03). The 606 micro-arrays were printed as a single array with adjacent triplicate elements; controls were spotted as triplicate elements at the top and bottom of the array. Array formats are described in more detail in "Materials and Methods" and at http://zmdb.iastate.edu/zmdb/microarray/arrays-info.html. Controls include individual clones specifically selected for this purpose, as well as ESTs identified through data mining of the sequences present within the individual libraries. A description of the controls used on the micro-arrays can be found at http://zmdb.iastate.edu/zmdb/microarray/controls.html. To evaluate the reproducibility of hybridization signals obtained from the micro-arrays, data from several different types of experiments were used.
These include experiments in which (a) only one labeled RNA was used in the hybridization, (b) a mixture of the same RNA labeled separately with Cy3 or Cy5 was used, or (c) a mixture of RNA from two different tissues labeled separately with Cy3 or Cy5 was used in reciprocal pair-wise hybridizations (dye reversal experiments). To compare gene expression patterns of endosperm and ear tissues, poly(A+) mRNA was prepared from 10- to 14-d-old endosperm and 1- to 2-cm ear primordia, at similar stages and of the same genotype as those used for library construction. Details of the labeling and hybridization protocols are provided in "Materials and Methods" and at http://zmdb.iastate.edu/zmdb/microarray/protocols.html. Descriptions of experiments, hybridization images, and original data sets are available at http://zmdb.iastate.edu/zmdb/microarray/data.html. Simple linear correlation analysis was used in pair-wise comparisons to evaluate variation within and between micro-arrays and within and between slides. The Pearson correlation coefficient (R) was computed to describe quantitatively the strength of the relationship between replicates.

Micro-Array Reproducibility within Single Glass Slides

In this set of experiments, we employed the 605.04 slides containing a single micro-array with duplicate adjacent array elements. Correlation analysis was performed between the adjacent elements, comparing the absolute signal intensities of the two replicates. We also employed the 606 slides, which contain a single micro-array but with triplicate adjacent array elements. In this case, the signal intensities for the first and third spots were compared (comparable correlation coefficients were observed for the other two possible pair-wise comparisons). If two labels were applied to the same slide, comparisons of signal intensities from each channel, as well as the ratio of signal intensities, are reported.
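The reproducibility statistics reported below are Pearson correlation coefficients between replicate elements. They are computed either on raw channel intensities or on log10-transformed signal ratios. A minimal NumPy sketch of both calculations follows. The function names and the Cy5/Cy3 orientation of the ratio are assumptions.

```python
import numpy as np


def replicate_r(rep1, rep2) -> float:
    """Pearson R between matched replicate elements (e.g. adjacent spots)."""
    x = np.asarray(rep1, dtype=float)
    y = np.asarray(rep2, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def log_ratio_r(cy5_rep1, cy3_rep1, cy5_rep2, cy3_rep2) -> float:
    """Pearson R between log10(Cy5/Cy3) ratios of two sets of replicate elements."""
    ratio1 = np.log10(np.asarray(cy5_rep1, dtype=float) / np.asarray(cy3_rep1, dtype=float))
    ratio2 = np.log10(np.asarray(cy5_rep2, dtype=float) / np.asarray(cy3_rep2, dtype=float))
    return replicate_r(ratio1, ratio2)
```

A ratio combines the noise of two measurements. For that reason, R computed on log ratios is expected to be lower than R computed on single-channel intensities from the same spots.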
For these highly local within-slide comparisons (over distances of approximately 200 µm), R values ranged from 0.92 to 0.98 for the individual channel intensity values (five and eight separate hybridizations of the 605 and 606 micro-arrays, respectively). Comparison of the ratio of signals (log10 transformed) between replicates (two separate hybridizations of each micro-array type) yielded R values ranging from 0.77 to 0.95. Because the ratio values combine the variation of two individual signal measurements, they would be expected to be more variable. In the next experiment, we employed the 605.03 slides, containing duplicate adjacent micro-arrays on the same glass slide, to compare the reproducibility of hybridization within the same slide but over larger distances (18 mm). The signal intensity produced by each array element in one micro-array was compared with that of the corresponding element in the second array on the same slide. Correlation analysis yielded R values ranging from 0.95 to 0.97 in two separate hybridizations. We also examined reproducibility of hybridization within a glass slide when a mixture of two different preparations of the same RNA was applied. In this experiment, we used total RNA from ear, separately labeled with Cy3 or Cy5, and applied the mixture to two 606 slides. The signal intensity produced by each array element in one channel was compared with that of the corresponding element in the second channel on the same slide. Correlation analysis yielded R values of 0.99 in two separate hybridizations.

Between Micro-Array Reproducibility

In the next series of experiments, we compared the signal intensities of hybridization produced by each array element on one slide with the corresponding element on a second slide. In these experiments, the mean signal intensities for replicate elements in the Cy3 channel on one slide were compared with mean signal intensities in the Cy5 channel on a second slide.
In this dye reversal experiment, R values ranged from 0.61 to 0.92 for all comparisons involving two and three separate hybridizations of the 605 and 606 micro-arrays, respectively. Hybridization in the dye reversal experiment was more variable between replicate glass slides than between replicate elements within the same slide.

Micro-Array-Based Analysis of Gene Expression in Different Tissues

After establishing the reproducibility of hybridization,
micro-arrays were used to examine the patterns of hybridization between tissues, with the goal of estimating shared expression. Mixtures of
labeled targets from endosperm and ear RNA were applied to 606 ear and
605 endosperm micro-arrays in dye-reversal experiments. To
normalize signal intensities, we applied rank correlation analysis to
identify a subset of the plant control genes whose expression patterns
are similar among tissues (as described in "Materials and
Methods" and more fully at
http://zmdb.iastate.edu/zmdb/publications/Fetal01-sm.html). After two iterations of correlation analysis of hybridization signals
and exclusion of outliers, the slope of the resulting trend line was used to normalize signal intensities between channels. Control genes that were consistently expressed more highly in endosperm than ear tissue included vacuolar ATPase and EFs 1

The mean of the observed signal intensities in each channel and the coefficient of variation for the ratio of signals were calculated from the replicate elements on each slide. A comparison of the coefficient of variation of the signal ratios versus signal intensity indicates that noise in the signal ratios diminishes as signal intensity increases (Fig. 3); this pattern did not differ among slides. Typically, at signal intensities of 2,000 units (about 3% of the maximum signal intensity) and above, the coefficient of variation of the hybridization signal ratios is nearly constant; we used this threshold to identify ESTs with specific hybridization signals above background noise.
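One way to implement the normalization step is sketched below. A line through the origin is fitted to the two channels' signals for the selected control elements. Controls whose residuals exceed two standard deviations are then dropped and the line is refitted, for two iterations in total. The text specifies only two iterations, exclusion of outliers, and use of the trend-line slope. The through-origin least-squares fit, the 2-SD outlier rule, and the function name are therefore assumptions. The rank-correlation step that selects the control genes is assumed to have been done already.

```python
import numpy as np


def normalization_slope(ch1, ch2, n_iter: int = 2, z_cut: float = 2.0) -> float:
    """Slope of ch2 vs. ch1 for control elements, refitted after outlier exclusion.

    Assumptions: least-squares line through the origin, and outliers defined
    as residuals larger than z_cut standard deviations of the kept residuals.
    """
    x = np.asarray(ch1, dtype=float)
    y = np.asarray(ch2, dtype=float)
    keep = np.ones(x.shape, dtype=bool)
    slope = np.sum(x * y) / np.sum(x * x)  # initial fit on all controls
    for _ in range(n_iter):
        resid = y - slope * x
        sd = resid[keep].std()
        if sd == 0:  # kept controls already lie exactly on the line
            break
        new_keep = np.abs(resid) <= z_cut * sd
        if new_keep.sum() < 2:  # too few controls left to refit
            break
        keep = new_keep
        slope = np.sum(x[keep] * y[keep]) / np.sum(x[keep] ** 2)
    return float(slope)
```

Dividing the second channel by the returned slope places both channels on a common scale before ratios are computed.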
Based on the comparison of TUGs identified in the EST sampling, we
found only 12% of TUGs in common, using the criterion of 95% match
over 40 bases. Micro-array hybridization is conducted at a stringency
at which 90% matching over 60 bases should suffice to form a stable
hybrid; therefore, we can calculate an expectation for the percent
"overlap" between endosperm and ear by conducting a new EST
assembly of the elements printed on the 605 (endosperm) and 606 (ear)
micro-arrays. This new estimate is conservative because experimental
data indicate that cross hybridization occurs on micro-arrays when
individual gene targets retain 80% to 85% similarity to one or more
of 142 Arabidopsis cytochrome P450 genes (Xu et al., 2001).

For experimental comparison of TUG expression patterns between tissues, the signal intensity of a contig (composed of multiple ESTs) was represented by the maximum signal intensity of all ESTs in that contig. Figures 4 and 5 compare the hybridization signal intensities of endosperm and ear targets for each TUG on an endosperm and an ear micro-array, respectively. Data are presented for only one experiment of each micro-array type. Patterns of hybridization did not differ between the replicate micro-arrays of each type, or in a subsequent replicate of the entire experiment with new RNA samples hybridized to pairs of ear and endosperm slides (data not shown). On the endosperm micro-array (Fig. 4), the signal intensity for most TUGs is greater for target derived from endosperm than for target derived from ear over the entire signal intensity range. This pattern differs from that observed on the ear micro-array (Fig. 5), where the signal intensity for most TUGs is similar for targets from both tissues.
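Collapsing EST-level signals to one value per TUG, by taking the maximum over member ESTs as described above, is a one-pass reduction. The sketch below also counts the TUGs whose signal is higher for one target than for the other, which is the comparison summarized in Figures 4 and 5. The EST-to-TUG mapping and the function names are hypothetical. ESTs absent from the mapping are treated as singlets named after themselves.

```python
def tug_signals(est_signal: dict[str, float], est_to_tug: dict[str, str]) -> dict[str, float]:
    """Represent each TUG by the maximum signal among its member ESTs."""
    signals: dict[str, float] = {}
    for est, value in est_signal.items():
        # Assumption: an EST missing from the mapping is a singlet named after itself.
        tug = est_to_tug.get(est, est)
        signals[tug] = max(value, signals.get(tug, float("-inf")))
    return signals


def percent_higher(sig_a: dict[str, float], sig_b: dict[str, float]) -> float:
    """Percentage of TUGs (present in both) with higher signal for target a than b."""
    common = sig_a.keys() & sig_b.keys()
    higher = sum(sig_a[t] > sig_b[t] for t in common)
    return 100.0 * higher / len(common) if common else 0.0
```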
Figures 6 and 7 show the cumulative percentage of TUGs that hybridize with endosperm and ear targets as a function of hybridization signal intensity on an endosperm and an ear micro-array, respectively. Mixtures of labeled targets from endosperm and ear RNA were applied to two 606 ear and two 605 endosperm micro-arrays in dye reversal experiments. The entire experiment was repeated with independent RNA preparations. On endosperm micro-arrays, at a conservative threshold of 2,000 units, an average of 83% of the TUGs hybridize with endosperm target and 57% hybridize with ear target. Approximately 10% of the TUGs that hybridize with endosperm target have saturating signal intensities, whereas fewer than 1% of those that hybridize with ear target reach values this high. As would be expected from the scatter plot (Fig. 4), a larger percentage of TUGs hybridize with endosperm target than with ear target over the entire range of signal intensity values on an endosperm micro-array (Fig. 6). This pattern differs from that observed on the ear micro-array, where, averaged over four experiments, 94% of the TUGs hybridize with targets from both tissues; data from one experiment are shown in Figure 7. Approximately the same percentage of TUGs (1%) hybridize with ear and endosperm targets at maximum signal intensity. Over the entire range of signal intensity values, the percentage of TUGs that hybridize with ear target is only slightly greater (<15%) than with endosperm target on the ear micro-array (Fig. 7).
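The cumulative curves in Figures 6 and 7 give the percentage of TUGs whose signal meets or exceeds each intensity value. The sketch below computes that percentage. It also computes the percentage of TUGs at or above threshold with both targets, which is the quantity behind the 94% figure for the ear micro-array. Treating the threshold as inclusive is an assumption, and so are the function names.

```python
import numpy as np


def percent_above(signals, thresholds) -> list[float]:
    """Cumulative percentage of TUGs whose signal is >= each threshold."""
    s = np.asarray(signals, dtype=float)
    return [100.0 * np.count_nonzero(s >= t) / s.size for t in thresholds]


def percent_both_above(sig_a, sig_b, threshold: float = 2000.0) -> float:
    """Percentage of TUGs at or above threshold with both targets."""
    a = np.asarray(sig_a, dtype=float)
    b = np.asarray(sig_b, dtype=float)
    return 100.0 * np.count_nonzero((a >= threshold) & (b >= threshold)) / a.size
```

For example, `percent_above(signals, [2000, 65535])` gives the share of TUGs above the 2,000-unit threshold and the saturated share. The second value assumes a 16-bit scanner with a maximum of 65,535, an assumption that is consistent with 2,000 units being about 3% of the maximum signal.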
Our general conclusions are that the TUGs contained within the endosperm project are typically more endosperm specific and are generally expressed at a higher level in endosperm, with about 10% of them giving saturating signal intensities. In contrast, the TUGs contained within the ear project are expressed more equally in these two tissues. Furthermore, the micro-array experiments demonstrate that the overlap is substantially greater than would be predicted by EST sampling or even by a 90% criterion of EST assembly: approximately 57% and 94% of the TUGs hybridized above the 2,000-unit threshold with the heterologous target on the endosperm and ear micro-arrays, respectively.
EST sequencing is a quick and economical method for discovering genes with moderate to abundant transcript levels. By sampling diverse organs at discrete developmental stages, high-quality ESTs assembled at stringent criteria can indicate which members of a gene family are expressed at quantitatively higher levels at specific stages of the plant life cycle. There are currently more than 106,000 maize ESTs in the public databases. A central annotation problem for EST collections is to estimate redundancy and to cluster ESTs into contigs that represent unique gene fragments. Such analysis is periodically performed at the ZmDB maize genome database (http://zmdb.iastate.edu). The most recent assembly, of September 30, 2001, resulted in 28,220 TUGs composed of 15,095 contigs and 13,125 singlets. The number of TUGs is typically overestimated because ESTs from non-overlapping parts of the same transcript are counted as separate TUGs until an EST spanning both is recovered. EST assembly that clusters sequences based on a minimum criterion of
95% sequence identity in a region of >40 bases (criteria used in the
ZmDB assemblies) should separate loci that are derived from the two
progenitor species of modern maize provided sequence polymorphisms
accumulate at a rate greater than one base change per 40 bases every 20 MYA (Gaut and Doebley, 1997).

The most striking feature of the EST collection for maize is that
relatively few ubiquitously expressed genes were identified. By
RNA-excess DNA-RNA hybridization analysis, it was estimated that
approximately 5,000 genes were expressed in common among the major
organ systems of tobacco (Nicotiana tabacum; Kamalay and Goldberg, 1980).

The second major finding is the low extent of overlap of TUGs between
EST projects. These results indicate that for the readily recovered
transcript classes, each tissue and developmental stage sampled has a
relatively distinctive suite of moderately to highly expressed genes,
often specific members of gene families. It seems likely that
individual members of gene families are, in general, expressed in
distinct patterns within maize. Ear (606) and tassel primordia (618) at
the same stage of development, just after specification of floral
organs, exhibit just 10% TUG overlap (Fig. 1B). Tassels at three
stages of development

Because EST sampling cannot fully define mRNA
representation, we turned to micro-array analysis to test the conclusion that tissues and organs express largely discrete suites of
genes. Micro-arrays of ESTs or other representations of genes are a
powerful tool for identifying genes that are coordinately expressed
during a particular environmental treatment, in a specific genetic
background, or at a defined developmental stage. Array analysis, which
examines the covariance of expression patterns, can identify genes with
similar quantitative and qualitative aspects of expression. Confidence
in interpreting the results comes from identifying previously
well-studied genes in the expected patterns such as during acquisition
of systemic acquired resistance (Maleck et al., 2000).

In our study, we used ESTs as indicators of tissue specificity and then
used micro-array experiments to ask how well such preliminary evidence
on expression predicted hybridization behavior with a second tissue.
Confidence in the results provided by micro-arrays is enhanced if it
can be demonstrated that the process of hybridization is intrinsically
accurate. Our results indicate very high within-slide reproducibility,
with somewhat lower reproducibility between slides. These results are
similar to other published data, in most cases exceeding the
corresponding values reported (Girke et al., 2000).

In experiments with differentially labeled endosperm and ear targets, we detected extensive hybridization to both the 605 (endosperm) and 606 (ear) micro-arrays: ear target hybridized with 57% of the endosperm TUGs, and endosperm target hybridized with 94% of the ear TUGs. These results set an upper bound on the overlap in gene expression between endosperm and immature ear because cross hybridization among closely related gene family members has undoubtedly occurred. Reassembly of contigs at the 90% match criterion reduced the proportion of unique TUGs within each array type to just 30% of the elements printed, and it greatly increased the size of some contigs as multiple gene family members co-assembled. Relevant to the experimental results, the 90% match criterion predicted 40% endosperm target and 46% ear target hybridization to the heterologous array, based solely on the types of ESTs recovered in each project; this calculation cannot take into account shallow sampling within an EST project. At an even lower criterion, an even larger fraction of the ESTs within each array would assemble into contigs, and the predicted overlap between ear and endosperm RNA samples would be higher still. The true extent of overlap in gene expression between ear and endosperm
most likely lies between the estimates based on ESTs and micro-array
hybridization. First, EST sampling is clearly incomplete, resulting in an
underestimate. Second, hybridization can occur among members of gene
families, overestimating the number of expressed genes in common using
micro-array profiling. A recent study using micro-arrays fabricated
with seed-derived ESTs of Arabidopsis similarly found 60% to 77%
overlap in expression with heterologous organs (Girke et al.,
2000 Because maize gene families have diverged during the approximately 20 MYA since the separation of the two species that later formed the
allotetraploid leading up to modern maize (Gaut and Doebley, 1997
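The contig-based overlap estimates discussed above reduce to set operations on TUG membership. These are the Project Overlap calculation in Materials and Methods and the 90%-criterion prediction of heterologous hybridization. Below is a minimal Python sketch of both. It assumes a hypothetical mapping from each EST to its TUG (contig or singlet) and from each EST to its project. It collapses the paper's seven pair categories, which are not all reproduced in this text, into a simple shared/not-shared split. It is therefore an illustration, not the authors' procedure.

```python
from collections import defaultdict

def projects_per_tug(est_to_tug, est_to_project):
    """Map each TUG (contig or singlet) to the set of EST projects among its members."""
    members = defaultdict(set)
    for est, tug in est_to_tug.items():
        members[tug].add(est_to_project[est])
    return members

def pairwise_overlap(members, proj_a, proj_b):
    """Fraction of TUGs with ESTs from either project that contain ESTs from both."""
    relevant = [p for p in members.values() if proj_a in p or proj_b in p]
    shared = [p for p in relevant if proj_a in p and proj_b in p]
    return len(shared) / len(relevant) if relevant else 0.0

def predicted_cross_hybridization(members, array_proj, target_proj):
    """Fraction of TUGs represented on array_proj's array that also contain
    ESTs from target_proj, i.e. the elements a heterologous target is
    predicted to hybridize with on the basis of EST recovery alone."""
    on_array = [p for p in members.values() if array_proj in p]
    hit = [p for p in on_array if target_proj in p]
    return len(hit) / len(on_array) if on_array else 0.0

# Hypothetical toy assembly: seven ESTs from projects 605 and 606 in five TUGs.
est_to_tug = {"e1": "T1", "e2": "T1", "e3": "T2", "e4": "T3",
              "e5": "T3", "e6": "T4", "e7": "T5"}
est_to_project = {"e1": "605", "e2": "606", "e3": "605", "e4": "605",
                  "e5": "606", "e6": "606", "e7": "605"}
members = projects_per_tug(est_to_tug, est_to_project)
print(pairwise_overlap(members, "605", "606"))               # 0.4
print(predicted_cross_hybridization(members, "605", "606"))  # 0.5
```

Reassembling at 90% rather than 95% identity would change `est_to_tug` by merging gene family members into larger contigs. Both fractions can then only stay the same or rise, which matches the trend described in the text.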
EST Sequencing
cDNA libraries were prepared from nine different tissue sources. EST collections are identified by three-digit project numbers assigned at the Stanford Genome Technology Center (Palo Alto, CA; Table I). Detailed descriptions of the tissue samples used for cDNA library construction are provided at http://zmdb.iastate.edu/zmdb/EST/libraries.html (for the analysis reported here, data from EST projects 707 and 945 were combined and listed as project 707 because these cDNAs were derived from the same library). Bacterial colonies containing cloned cDNAs were transferred into 96-well deep-well blocks containing Terrific Broth (1.2 mL per well).
Except for library 486, which was sequenced in part using gel-based equipment (ABI377, Applied Biosystems), sequencing was performed with MegaBACE capillary sequencers (Molecular Dynamics, Sunnyvale, CA). Base calling and quality assessment were performed using Phred (Ewing and Green, 1998).
EST Assembly
EST contig assembly of all maize (Zea mays) ESTs in GenBank is periodically performed at ZmDB using the ZmDBAssembler protocol (http://zmdb.iastate.edu/zmdb/EST/assembly.html).
ZmDBAssembler provides the logical flow between several third-party programs used in the protocol, including BLAST (Altschul et al., 1997).
The assembly creates two sequence classes: TUCs (contigs) and TUSs (singlets). Contigs are EST clusters with two or more member ESTs. Singlets are ESTs that are not significantly similar to any other EST. The combined TUCs and TUSs represent an approximate set of TUGs. "Tentative" indicates that all classifications are subject to frequent change as new ESTs are added to the assembly. New assemblies of available ESTs are compiled approximately every 4 months and reported at ZmDB. The analysis in this paper is based on the assembly of September 7, 2000. TUC names are assigned based on the last assembly date that changed a given TUC. For example, TUC09-07-5391.1 is a contig assembled on September 7, 2000; 5391 is the contig number in that particular assembly. The terminal digit differs from 1 only if a preliminary contig was split by refined analysis with CAP3. The history of TUC names over successive assemblies can be traced at ZmDB.
Project Overlap
The degree of overlap between two EST projects was assessed as the fraction of TUGs from the two projects that fall into common contigs, using all ESTs available in GenBank. This procedure estimates the fraction of genes expressed in both of the conditions represented by the two projects. Specifically, for each TUG, we derived the count of member ESTs from each cDNA library. Then, for each pair of projects, contigs were placed into one of the first five categories below, whereas singlets were assigned to one of the last two categories: contig/contig […]
The fraction of TUGs placed in the first three categories gives the degree of overlap. Because EST sampling was incomplete, the actual overlap is expected to be higher than calculated.
Micro-Array Fabrication
A detailed description of micro-array construction, data sets, and hybridization methods can be found at http://zmdb.iastate.edu/zmdb/microarray. For this study, micro-arrays representing the 605 and 606 EST projects were printed separately on glass slides. Copies of the slides can be ordered online at http://zmdb.iastate.edu/zmdb/microarray/ordering.html. The amplified inserts of all clones from the 605 EST project were printed on the "605 endosperm micro-arrays," generating arrays with considerable internal redundancy of EST representation for some genes.
In contrast, before PCR amplification and printing on the "606 ear micro-arrays," the 606 EST project was consolidated by removing 1,932 clones for which no sequence data were available. Information on the ESTs contained within each project and their locations on the micro-array slides is provided at http://zmdb.iastate.edu/zmdb/microarray/libraries.html. In addition to the project maize ESTs, a panel of controls was selected for printing on all micro-arrays (see http://zmdb.iastate.edu/zmdb/microarray/controls.html). A liquid-handling robot (Beckman-Coulter Biomek 2000, Fullerton, CA) was used to transfer 5-µL aliquots (1/10 volume) of the selected amplicons to a new plate. Each amplicon was diluted with 15 µL of sterile water and stored at […]
The 605 and 606 projects (plus associated controls) were printed on separate glass slides. The 605 micro-arrays were printed in two formats: as a single array with adjacent duplicate elements (605.04) and as two adjacent arrays (605.03). The 606 micro-arrays were printed as a single array with three replicate spots for each element. The control plates were printed twice (at the start and end of each print). The 605 micro-arrays were produced from 84 sample plates (8,064 elements) and 93 different controls, giving a total of 16,500 array elements with a center-to-center spacing of 190 µm. Of these, the 5,534 elements with sequences of sufficient length and quality were reported to GenBank; these ESTs represented approximately 2,800 TUGs. The 606 micro-arrays were produced from 52 sample plates (4,980 elements) and two control plates (113 controls), for a total of 15,618 array elements with 195-µm spacing; these ESTs represented approximately 2,550 TUGs. A more detailed description of the format used to print each micro-array can be found at http://zmdb.iastate.edu/zmdb/microarray/arrays-info.html.
Preparation of RNA Samples and Micro-Array Hybridization
RNA samples were prepared from endosperm 10 to 14 d after pollination and from 1- to 2-cm immature ears of self-pollinated OH43 inbred plants grown in spring 2000 in Tucson, AZ. RNA was purified from tissue samples pulverized in liquid nitrogen. Total RNA was isolated using TRIzol (GibcoBRL Life Technologies, Rockville, MD), and poly(A+) mRNA was purified using DynaBeads Oligo (dT)25 (Dynal A.S., Oslo) according to the manufacturers' instructions. For each sample, either 200 µg of total RNA or 4 µg of poly(A+) mRNA was labeled with either Cy3- or Cy5-dUTP (products PA53022 and PA55022, Amersham Pharmacia, Piscataway, NJ). The labeling reaction was carried out with Sigma's AMV-RT Kit (product STR1-KT) according to the manufacturer's instructions.
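The spot totals reported above for the two array types are consistent with a simple rule. Every sample element is replicated on the slide: twice on the 605 arrays and three times on the 606 arrays. Every control is printed twice (at the start and end of the print) with the same replication. This rule is inferred from the arithmetic rather than stated in the printing protocol. A short Python check, assuming standard 96-well plates:

```python
WELLS_PER_PLATE = 96  # assumption: standard 96-well sample plates

def total_spots(sample_elements, n_controls, replicates, control_prints=2):
    """Spots on one slide: each sample element, plus each control printed
    control_prints times, is spotted `replicates` times."""
    return (sample_elements + control_prints * n_controls) * replicates

# 605 endosperm arrays: 84 full plates, 93 controls, duplicate spots.
assert 84 * WELLS_PER_PLATE == 8064
print(total_spots(8064, 93, replicates=2))   # 16500
# 606 ear arrays: 4,980 sample elements, 113 controls, triplicate spots.
print(total_spots(4980, 113, replicates=3))  # 15618
```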
A more detailed description of RNA isolation and labeling protocols can be found at http://zmdb.iastate.edu/zmdb/microarray/protocols.html.
Before hybridization, slides were held face down over a 42°C water bath for 5 to 10 s to rehydrate the array elements and then snap dried on a 70°C to 80°C heat block for 3 to 10 s. DNA was cross-linked to the glass slide using 65 mJ/cm² of 254-nm UV-C radiation (FB UVXL-1000 UV Cross Linker set to 650 × 100 µJ/cm², Fisher Scientific, Pittsburgh). Slides were then washed for 2 min in 1% (w/v) SDS on an orbital shaker, washed for 2 min in 95°C water, rinsed by plunging rapidly 10 to 20 times in a 100% ethanol bath at room temperature, and immediately dried by centrifugation at 50g to 100g for 2 to 5 min. Dry arrays were used immediately or stored at room temperature for less than 7 d.
Hybridization followed the slide manufacturer's recommended protocol with slight modifications (Sigma Technical Bulletin MB-745). The hybridization mixture consisted of 4 µg of each labeled mRNA target (or, if total RNA was used, 200 µg of each labeled target), 2 µL of Liquid Block (Amersham product RP3601), 4 µL of 20× SSC buffer, and 1 µL of 2% (w/v) SDS, in a final volume of 30 µL. The mixture was denatured at 95°C for 2 min and then transferred to ice. The hybridization mixture was applied to a micro-array slide preheated to 65°C on a heat block and quickly covered with a coverslip (Sigma Hybrislips Z36 591-2). The micro-array was then immediately transferred to a prewarmed hybridization chamber (a 50-mL plastic screw-top tube containing a paper towel moistened with 3× SSC) and incubated overnight (8-12 h) at 62°C. To terminate hybridization, the slide was given successive 5-min washes in 2× SSC and 0.5% (w/v) SDS at 62°C, then in 0.5× SSC and finally in 0.05× SSC at room temperature. Slides were immediately dried by centrifugation at 50g to 100g for several minutes.
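The hybridization recipe above fixes the final composition of the 30-µL mixture, although the text does not state it. Below is a worked sketch using the standard dilution relation C1V1 = C2V2. It assumes the stated stock concentrations and that the 30-µL final volume includes all components. It also converts the cross-linker setting into a dose.

```python
def final_concentration(stock, volume_added_ul, final_volume_ul):
    """Dilution relation C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock * volume_added_ul / final_volume_ul

FINAL_VOLUME_UL = 30.0  # stated final volume of the hybridization mixture

ssc = final_concentration(20.0, 4.0, FINAL_VOLUME_UL)  # 4 uL of 20x SSC
sds = final_concentration(2.0, 1.0, FINAL_VOLUME_UL)   # 1 uL of 2% (w/v) SDS
print(f"SSC {ssc:.2f}x, SDS {sds:.3f}% (w/v)")         # SSC 2.67x, SDS 0.067% (w/v)

# UV cross-linker: a setting of 650 in units of 100 uJ/cm^2.
dose_mj_per_cm2 = 650 * 100 / 1000                     # uJ -> mJ
print(dose_mj_per_cm2)                                 # 65.0
```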
A more detailed description of DNA immobilization and hybridization can be found at http://zmdb.iastate.edu/zmdb/microarray/protocols.html.
Micro-Array Data Acquisition and Analysis
Slides were scanned within 8 h of hybridization using a GSI Lumonics ScanArray 3000 (Packard BioChip Technologies, Billerica, MA). Spot finding and analysis of signal intensity were carried out using ImaGene software (BioDiscovery, Los Angeles). For each element on the micro-array, net signal intensity was computed as the median signal minus the median value of the local background. Local background was calculated as the median signal in a ring-shaped area surrounding the element, beginning 40 µm from the element and 70 µm wide.
To normalize the signal intensities between different tissues, we applied rank correlation to the signal intensity values of 78 elements to identify control genes whose expression does not vary between tissues. The 78 elements comprise the two replicates of each of 39 separate maize controls (http://gremlin3.zool.iastate.edu/zmdb/microarray/controls.html). Elements whose signal intensity ranks in the two channels differed by 20% of the total number of ranks (16 ranks) were excluded from the normalization because these genes are not expressed similarly in the two tissues. For the remaining control genes, the mean signal intensity of replicate elements in one channel (tissue 1) was compared with the mean signal intensity in the second channel (tissue 2) on the same slide, and the slope of the trend line was used to normalize the signal intensity values of the control elements. After this normalization, some control elements with high ranks in both channels still had signal ratios >2 or <0.5, suggesting tissue-specific expression. These elements were excluded and the normalization was repeated. The slope of the resulting trend line was used to normalize all elements on the slide.
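The two computations described above are local-background subtraction and iterative, control-based normalization. Both can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ImaGene's implementation:

- The spot is treated as a circle of known center and radius.
- The 10-µm pixel size is a hypothetical value.
- The "trend line" is assumed to be a least-squares line through the origin.
- Each control is represented by one value per channel, for example the mean of its replicate elements.
- The second pass screens every retained control for ratios outside 0.5 to 2, not only the high-ranked ones.

```python
import math
from statistics import median

def net_signal(image, cx, cy, spot_radius_um, pixel_um=10.0, gap_um=40.0, ring_um=70.0):
    """Median spot signal minus median local background.

    image is a 2-D list of pixel intensities; (cx, cy) is the spot centre in
    pixels. Background pixels lie in a ring that starts gap_um beyond the spot
    edge and is ring_um wide. pixel_um is a hypothetical scanner resolution.
    """
    inner = spot_radius_um + gap_um
    outer = inner + ring_um
    spot, ring = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d_um = math.hypot(x - cx, y - cy) * pixel_um  # distance from centre in µm
            if d_um <= spot_radius_um:
                spot.append(value)
            elif inner <= d_um < outer:
                ring.append(value)
    return median(spot) - median(ring)

def ranks(values):
    """1-based ranks (ties broken by position)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def slope_through_origin(xs, ys):
    """Least-squares slope of y = b*x (assumed form of the trend line)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def normalization_slope(ch1, ch2, rank_fraction=0.2, low=0.5, high=2.0):
    """Two-pass slope from control signals in channel 1 (ch1) and channel 2 (ch2)."""
    n = len(ch1)
    max_shift = round(rank_fraction * n)  # 16 ranks for 78 control elements
    r1, r2 = ranks(ch1), ranks(ch2)
    # Pass 1: keep controls whose rank differs by fewer than max_shift ranks.
    kept = [i for i in range(n) if abs(r1[i] - r2[i]) < max_shift]
    slope = slope_through_origin([ch1[i] for i in kept], [ch2[i] for i in kept])
    # Pass 2: drop controls whose normalized ratio is still >2 or <0.5, then refit.
    kept = [i for i in kept if low <= (ch2[i] / slope) / ch1[i] <= high]
    return slope_through_origin([ch1[i] for i in kept], [ch2[i] for i in kept])

def normalize(ch2_all, slope):
    """Rescale every channel-2 element on the slide onto the channel-1 scale."""
    return [v / slope for v in ch2_all]
```

Dividing channel 2 by the final slope makes the retained controls cluster around a ratio of 1. Any remaining departure from 1 in non-control elements is then read as a difference between tissues.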
The genes contributing to normalization are listed at http://gremlin3.zool.iastate.edu/zmbd/microarray/605+606controls.html. Further details of normalization can be found at http://gremlin3.zool.iastate.edu/zmbd/microarray/furtherdetails.html. Ratios of signal intensities were calculated by dividing the signal intensity from the experimental condition (e.g. RNA from the heterologous tissue) by that from the control condition (e.g. RNA from the same tissue used to make the micro-array elements). The mean and coefficient of variation of the observed signal intensities in each channel, and of the signal ratio, were calculated from the two replicate elements (605 endosperm micro-array) or three replicate elements (606 ear micro-array) on each slide. Simple linear correlation analysis was used in pairwise comparisons to evaluate variation within and between micro-arrays and within and between slides; the Pearson correlation coefficient (R) was computed to describe quantitatively the strength of the relationship between replicates. When comparing TUG expression patterns between tissues, the signal intensity for a contig was represented by the maximum signal intensity among all member ESTs of the contig. Contig signal intensity was determined for each channel separately, so the maximum signal in each channel does not necessarily come from the same EST. More information on data analysis can be found at http://gremlin3.zool.iastate.edu/zmdb/microarray/protocols.html.
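The replicate statistics and contig-level summary described above are straightforward to compute. Below is a minimal Python sketch. It assumes each element's replicate signals are available as a list and each contig's member ESTs as (channel 1, channel 2) pairs. The sample standard deviation is an assumption, since the text does not say which was used. `pearson_r` is written out rather than relying on `statistics.correlation`, which exists only in Python 3.10 and later.

```python
import math
from statistics import mean, stdev

def signal_ratio(experimental, control):
    """Experimental-condition signal divided by control-condition signal."""
    return experimental / control

def coefficient_of_variation(replicates):
    """Sample SD divided by mean across the replicate spots of one element."""
    return stdev(replicates) / mean(replicates)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired replicate measurements."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def contig_signal(member_signals):
    """Per-channel maximum over a contig's member ESTs.

    member_signals holds (channel_1, channel_2) pairs, one per EST; the two
    maxima are taken independently, so they may come from different ESTs.
    """
    return (max(s[0] for s in member_signals), max(s[1] for s in member_signals))

# Hypothetical contig with two member ESTs on the array.
print(contig_signal([(1200.0, 300.0), (800.0, 900.0)]))  # (1200.0, 900.0)
```

In the example, the channel 1 maximum comes from the first EST and the channel 2 maximum from the second. This is exactly the situation the text warns about when interpreting contig-level ratios.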
We thank Brian Nakao, Gurpreet Randhawa, Bret Schneider, and Khaled Sarsour of the Stanford Genome Technology Center for their work in EST production sequencing and members of the maize community for supplying cDNA libraries, as listed at http://zmdb.iastate.edu/give. Liqun Xing made substantial contributions to the data analysis in the early stages of this work while he was at Iowa State University. We thank Dominic DeCianne (University of Arizona) for assistance with PCR amplification of ESTs, gel electrophoresis, and micro-array printing.
Received August 1, 2001; returned for revision October 2, 2001; accepted December 3, 2001. 1 This work was supported by the National Science Foundation Plant Genome Research Program as part of the Maize Gene Discovery, DNA Sequencing, and Phenotypic Analysis project (grant no. DBI-9872657).
* Corresponding author; e-mail walbot{at}stanford.edu; fax 650-725-8221.
Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.010681.