First published online July 15, 2009; 10.1104/pp.109.139030 Plant Physiology 151:16-33 (2009) © 2009 American Society of Plant Biologists OPEN ACCESS ARTICLE
RiceArrayNet: A Database for Correlating Gene Expression from Transcriptome Profiling, and Its Application to the Analysis of Coexpressed Genes in Rice[1],[C],[W],[OA]
Division of Bioscience and Bioinformatics, Myong Ji University, Yongin, Kyonggido 449–728, Korea (T.-H.L., T.T.M.P., S.I.S., J.-K.K., B.H.N.); Genomics Genetics Institute, GreenGene BioTech, Inc., Yongin, Kyonggido 449–728, Korea (T.-H.L., Y.-K.K.); Division of Applied Life Sciences, Gyeongsang National University, Jinju 660–701, Korea (K.Y.K.); Division of Molecular and Life Sciences, Pohang University of Science and Technology, Pohang 790–784, Korea (G.A.); Department of Plant Pathology, University of California, Davis, California 95616 (K.-H.J.); Department of Plant Sciences and BIO5 Institute, University of Arizona, Tucson, Arizona 85721 (D.W.G.); School of Agricultural Biotechnology, Seoul National University, Seoul 151–921, Korea (M.K.); and National Academy of Agricultural Science, Rural Development Administration, Suwon 441–707, Korea (U.-H.Y.)
Microarray data can be used to understand the relationships between the genes involved in various biological systems of an organism, given the availability of databases of gene expression measurements from the complete spectrum of experimental conditions and materials. However, there have been no reports, to date, of such a database being constructed for rice (Oryza sativa). Here, we describe the construction of such a database, called RiceArrayNet (RAN; http://www.ggbio.com/arraynet/), which provides information on coexpression between genes in terms of correlation coefficients (r values). The average number of coexpressed genes is 214, with SD of 440 at r ≥ 0.5. Given the correlation between genes in a gene pair, the degrees of closeness between genes can be visualized in a relational tree and a relational network. The distribution of correlated genes according to degree of stringency shows how each gene is related to other genes. As an application of RAN, the 16-member L7Ae ribosomal protein family was explored for coexpressed genes and gene expression values within and between rice and Arabidopsis (Arabidopsis thaliana), and common and unique features in coexpression partners and expression patterns were observed for these family members. We observed a correlation pattern between Os01g0968800, a drought-responsive element-binding transcription factor, Os02g0790500, a trehalose-6-phosphate synthase, and Os06g0219500, a small heat shock factor, reflecting the fact that genes responding to the same biological stresses are regulated together. The RAN database can be used as a tool to gain insight into a particular gene by examining its coexpression partners.
Microarray technology provides high-throughput genome-wide measurements of gene transcription levels and promises to yield insights into the biological processes involved in gene regulation. This technology has revolutionized biological research by providing opportunities for researchers to inspect gene expression across the entire genome of the organism of interest (Schena et al., 1995
A number of analytical tools have been developed to extract gene relationships and functions from microarray data. Most of these tools provide clustering methods to group genes that show similar expression patterns under well-designed experimental conditions. The methods are based on algorithms to calculate pairwise relations and similarity measures, assuming that those clustered genes are related, and they greatly simplify the analysis of huge microarray data sets (Eisen et al., 1998
Centralized data storage systems for genome-wide expression profiling have been constructed both for individual organisms and for multiple species. As an example of the former, AtGenExpress contains more than 500 data sets from experiments examining Arabidopsis (Arabidopsis thaliana) development and responses to stress, light, pathogens, and hormone responses, based on the Affymetrix ATH1 GeneChip (Schmid et al., 2005
In reality, a gene may be part of several biological processes, and its expression is subject to controls for maximum efficiency. For example, cells have evolved efficient gene expression mechanisms in response to external signals in order to adapt to changing environments. In these concerted processes, many genes are likely to be subject to coregulation: they may be induced or repressed together or inversely. The accumulation of microarray data has provided good opportunities to correlate and understand patterns of gene expression simultaneously, both individually and in relation to other genes. Efforts to evaluate gene expression in the biological context of an organism, for the complete spectrum of experimental conditions and materials reported in the database, have been promising. In the case of Arabidopsis, a model plant for dicotyledons, strong evidence suggests that related genes, such as those involved in cell wall synthesis, are coregulated (Manfield et al., 2006
Rice (Oryza sativa) is a major, and financially important, crop worldwide and has been used as a model plant for monocotyledons because of the availability of its complete genomic sequence and full-length cDNA libraries. A map-based, finished-quality sequence that covers 95% of the 389-Mb genome, including virtually all of the euchromatin and two complete centromeres, is an invaluable source for research (International Rice Genome Sequencing Project, 2005
We analyzed transcriptome profiles using the Rice 60k Microarray (Jung et al., 2005
We employed the RAN database to study coexpression patterns in rice. The distribution of the correlation coefficients according to the degree of stringency shows how closely a given gene is coexpressed with other genes in the genome. Across the entire genome, the average number of coexpressed genes is 214, with SD of 440 at r
Database Content
Expression data from 183 microarrays were collected from the samples of either wild-type or mutant rice organs, such as leaf, root, flower, and callus, at various developmental stages (Supplemental Table S1). The experiments were performed to test how gene expression is modulated and reprogrammed in response to various biotic and abiotic stresses and hormone treatments. RAN was designed to be flexible in terms of choosing query genes and finding coexpressed genes. Users can directly input the oligomer identifiers (IDs) or spot numbers used to design the Rice 60k Microarray, or their gene IDs as annotated by TIGR or RAP, in order to search for coexpressed genes. Additionally, because RAN stores oligomer matches against various sequence databases, such as GenBank NR, Swiss-Prot, and the National Center for Biotechnology Information Conserved Domain Database (Marchler-Bauer et al., 2005
The correlation information between genes is presented in three different ways. First, the gene coexpression relationships can be visualized in a cluster diagram or network, where genes that have close expression relationships form a cluster (and close network), so researchers can easily detect groups of coexpressed genes. For example, given the oligomer ID Os056379_01, representing the ribosomal protein L7Ae, Os10g0124000, a relational tree of gene expression (Fig. 1A) represents the global degrees of correlation between genes, while a network (Fig. 1B) shows how the 36 genes are correlated with each other. In the network, genes are denoted as filled circles and located such that their proximity represents the closeness of their relationships, with colored edges showing the sign of the r values. A red edge denotes a gene pair with a positive r value, while a green edge denotes a negative value. In addition, the color of an edge deepens and its line thickens as the absolute value of the r value increases. A gene with more correlations has more edges, which results in subnetworks. Thus, the researcher can observe the gene relationships in perspective with the graphs. Below the tree view (or network view), the coexpressed genes are listed (Supplemental Table S2). Each row in the list contains the spot number, RAP2 ID, TIGR ID, the most similar Arabidopsis gene, and a cluster or group number to which the oligomer belongs. The spot numbers of genes used as seeds (input for a retrieval) in the previous section are marked with asterisks. Statistical information for each gene in the tree (or network) and the other genes is provided on separate pages as a list, with particular information for each correlation between genes in a gene pair (Table I). Statistics on the correlation coefficients between genes are given in descending order of r values, including the significance level or P value, calculated based on a t distribution (Manfield et al., 2006
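The per-pair statistics described above can be sketched in a few lines. This is a minimal illustration, assuming that r is the Pearson correlation coefficient computed over the log ratios of two genes across n arrays and that significance is assessed with the usual t statistic on n - 2 degrees of freedom. The function names are hypothetical and are not part of RAN.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("profiles must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Illustration only: r = 0.5 observed over 183 arrays (the array count in RAN).
print(round(t_statistic(0.5, 183), 2))
```

The P value would then be read from a t distribution with n - 2 degrees of freedom using a statistics library; that lookup is left out here to keep the sketch free of external dependencies.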
The Distribution of Correlated Genes According to Degree of Stringency
To examine the distribution of r values, we calculated the average (µ) and SD (
The Rice Ribosomal Protein L7Ae Shows Both Similar and Unique Patterns of Coexpression in a Member-to-Member Comparison with the Arabidopsis Family
If RAN accurately reflects the coexpression of genes, then ribosomal proteins might be good test candidates, as these proteins should be under a coordinated mechanism of regulation to maintain stoichiometric ratios for efficient gene expression in Arabidopsis (Barakat et al., 2001
In the 315 genes retrieved with Os10g0124000 under the parameters of r
The correlation coefficient distribution for genes varies depending on the gene, and performing coexpression analysis on different data sets could yield spurious coexpression results. It is likely that different suites of genes get turned on in response to stress or at different developmental stages. To avoid a biased analysis when performed under strict conditions for two different data sets, we selected the top-ranked 5% of the coexpressed genes for the members of the family in each database, around 1,700 out of the 33,689 genes identifiable from RAN and 1,150 out of the 22,765 genes from ACT, as described in "Materials and Methods." In the subsequent analysis, all of the members of the MCL0 group and one of the other subgroups were used. A DREB transcription factor, Os01g0968800 (DREB1F), was chosen as an external control. When rice coexpressed genes are directly compared, members in MCL0, MCL1, and MCL2 showed values of 354 to 869, and the most commonly expressed genes were from Os05g0490100 in MCL1 and Os09g0507800 in MCL2 (Table II). In contrast, the members showed relatively low numbers, 155 to 323, with MCL3. This may reflect the divergence of members of MCL3 from the others (Supplemental Fig. S2). Compared with Os01g0968800 (DREB1F), an external control that is likely to be expressed in response to drought, the coexpressed genes produced even lower numbers, 75 to 105. The commonly coexpressed genes of MCL0, -1, and -2 groups (Os10g0124000, Os03g0241200, Os02g0728600, Os05g0490100, and Os09g0507800) included 197 genes. This number drops to 31 when the MCL3 member Os07g0150200 is considered. A similar analysis was performed for Arabidopsis. Interestingly, the members of this family from Arabidopsis belong to MCL0 and show 712 to 908 coexpressed genes among them (Table II). This average is higher than that of rice members. There are 629 commonly coexpressed genes of Arabidopsis among all the MCL0 members: AT4G12600, AT4G22380, AT5G20160, and AT5G08180. Next, we asked how many coexpressed genes of rice have homologs in the Arabidopsis counterparts. As it is difficult to define the exact counterpart in the comparison between species, we first did BLASTp analysis and considered the genes with scores of 100 or higher to be the tentative counterparts (Supplemental Table S4; Supplemental Methods). Like other members in rice, Arabidopsis members (MCL0) showed more coexpressed genes, 372 to 452, with MCL0, MCL1, and MCL2 and lower numbers, 169 to 174, with the MCL3 member Os07g0150200. Like other rice members, they also showed lower numbers, 131 to 156, with Os01g0968800 (DREB1F). Lastly, the number of genes commonly coexpressed with both rice MCL0, -1, and -2 and Arabidopsis MCL0 is 118 (Supplemental Table S7). This table shows that almost 60% of the coexpressed genes of the rice MCL0, -1, and -2 groups have counterparts in Arabidopsis. A simulation test under the same conditions was performed 100 times, giving an average of around 50 with an SD of 16.6.
Significance analysis showed a P value of 0, suggesting that the value 118 is very significant. Enriched GO terms were tested for the 118 genes. In molecular function, 39 genes have been given the identifier GO:0003735 (structural constituent of ribosome; fdr = 0). Fifteen and five genes have been given the codes GO:0003723 (RNA binding; fdr = 0) and GO:0051082 (unfolded protein binding; fdr = 0.0111), respectively. In the cellular component, 12, three, and three genes have been given GO:0022627 (cytosolic small ribosomal subunit; fdr = 0), GO:0005732 (small nucleolar ribonucleoprotein complex; fdr = 0.0183), and GO:0022625 (cytosolic large ribosomal subunit; fdr = 0.0110), respectively. In biological processes, 43, seven, and four genes have been given the IDs GO:0006412 (translation; fdr = 0), GO:0006457 (protein folding; fdr = 0.0081), and GO:0006364 (rRNA processing; fdr = 0.0042), respectively. As is shown with an individual member, Os10g0124000, under the condition of r
Comparison of Gene Expression of Ribosomal Protein L7Ae in Rice and Arabidopsis at Developmental Stages and in Response to Abiotic Stresses
Although it is difficult to compare gene expression directly between rice, a monocot, and Arabidopsis, a dicot, we assumed that the expression pattern(s) of a gene(s) in a gene family in rice would be comparable to that (those) of Arabidopsis. A distribution of log ratios to calculate r values as shown in Figure 2A using Os10g0124000 suggested that the genes in the ribosomal protein L7Ae consistently decreased in specific organs, compared with the callus or anther (Supplemental Table S8). These values were also compared with those of Arabidopsis, obtained from microarray collections in AFGN and Genevestigator. In the comparison, microarray sets from samples as described above in the rice database were compared with those performed with samples of similar tissues at similar developmental stages of Arabidopsis, as described in Supplemental Tables S9 and S10. The log ratios of the rice genes Os10g0124000, Os03g0241200, and Os02g0728600 in MCL0, Os05g0490100 in MCL1, Os09g0507800 in MCL2, and Os07g0150200 in MCL3 were compared with those of AT4G12600, AT4G22380, AT5G20160, and AT5G08180 in MCL0 (Fig. 4). These genes were strongly decreased in the leaf, root, and flower, compared with the callus and in response to abscisic acid (ABA) treatment, and were relatively weakly decreased in seeds compared with the callus and in response to abiotic stresses such as drought and cold. The comparisons between rice members show positive Pearson correlations, even though they are less significant (Supplemental Table S11). Interestingly, Os07g0150200 in MCL3, which showed lower numbers of coexpressed genes with other members of the L7Ae family, shows significant correlation with Os10g0124000 in MCL0 and Os09g0507800 in MCL2.
Many Arabidopsis members show positive Pearson correlations not only with other Arabidopsis members but also with rice members. In contrast, DREB1F (Os01g0968800) was compared as a control, and the gene is induced by a plant hormone, ABA, and abiotic stresses such as drought and cold, unlike the ribosomal protein L7Ae family. Os01g0968800 shows even negative correlation with most ribosomal protein L7Ae members. These data show that gene expression of ribosomal protein L7Ae in rice and Arabidopsis could be under similar control mechanisms at various developmental stages and in response to abiotic stresses.
Drought-Related Genes Might Be Coregulated
We further applied the RAN database to dissect coexpression patterns in stress-related genes, such as DREB, T6pS, and SHSP genes (Jang et al., 2003
The 11 T6pS genes were extracted from rice RAP2. OrthoMCL analysis suggests that these genes belong to a group (Supplemental Table S14; nine members shown on the chip). Os01g0730300 showed percentage identities ranging from 59.2% to 70.9% with Os05g0517200, Os09g0397300, and Os03g0224300. It was also highly identical with genes from Arabidopsis, such as AT1G23870, AT1G68020, AT1G70290, AT2G18700, and AT4G17770, with percentage identities ranging from 27.8% to 66.6% (data not shown). Os01g0730300, Os02g0790500, and Os09g0397300 have 3,789, 2,679, and 2,498 identical genes, respectively. A total of 2,622 out of the 3,789 genes of Os01g0730300 have GO terms (Supplemental Table S15). Notable GO terms include 214 GO:0050896 (response to stimulus; fdr = 0.0367) and 10 GO:0009832 (plant-type cell wall biogenesis; fdr = 0.10) as biological processes. Fifty-seven genes are given GO:0044428 (nuclear part; fdr = 0) as a cellular component. Thus, these coexpressed genes are involved in more varied biological functions than DREB. The individual genes include a CBS domain-containing protein, a PGPD14 protein, a β-galactosidase precursor, an extensin, and a protein prenyltransferase domain-containing protein, among others.
The DREBs are important transcription factors that induce a set of abiotic stress-related genes and impart stress endurance to plants. From the preceding analysis, we observed that drought-related genes might be coexpressed, with one of the highest numbers of coexpressed genes for DREB, Os01g0968800 (data not shown). We observed a similar partitioning of members of a gene family according to coexpression patterns in stress-related genes, even including SHSP genes (data not shown). Since RAN can make a relational tree from multiple seeds, we used three keywords: Os01g0968800, a DREB1; Os02g0790500, a T6pS; and Os06g0219500, a SHSP. RAN produced a dense relational tree with these three keywords (Fig. 5A). The tree shows not only the close relationship between these three genes with short edges but also the genes correlated with each. The relational tree shows that T6pS and SHSP have a close expressional correlation with DREB1 but that their expressional correlation with each other is relatively distant. In fact, the r values between T6pS and DREB1 and between SHSP and DREB1 are 0.529 and 0.614, respectively, while the r value between T6pS and SHSP is 0.394. Os10g0124000, the ribosomal protein L7Ae, also has a relatively low r value, 0.348, with Os01g0968800. Fewer coexpressed genes at r
Following the introduction of microarray technology in biology, much interest and effort have been devoted to describing the gene expression patterns of individual genes or groups of genes across genomic scales (Boldrick et al., 2002 0.5; many of these coexpressed genes encode ribosomal proteins, suggesting that these proteins are expressed in stoichiometric ratios for efficient translation, as seen in Arabidopsis. In contrast, other members of this gene family show lower numbers of coexpressed genes than seen for Os10g0124000 and Os03g0241200, indicating that coexpression can be partitioned across a gene family. To avoid bias in the analysis when performed under strict conditions for two different data sets, we selected the top-ranked 5% of the correlated genes of a "seed gene" in each database and compared the coexpressed genes within and between species. Major rice groups (MCL0–MCL2) have around 350 to 870 (1%–2%) genes within the species, while Arabidopsis has 710 to 910 (3%–4%) within the species. Rice has 12 members in the family and likely evolved more divergently between members. It is notable that an MCL3 member, Os07g0150200, which is further diverged in the phylogenetic tree, shares the lowest number of coexpressed genes not only with rice members but also with Arabidopsis members. Indeed, in comparison with Os01g0968800 (DREB1F), an external control that is likely to be expressed in response to drought, the coexpressed genes showed even lower numbers, while there are 118 commonly coexpressed genes by both rice major MCL groups and Arabidopsis MCL0. Analyses of the coexpressed genes by GO enriched terms, either by an individual gene (e.g. Os10g0124000) or by groups of genes, show that most of these GO terms are predominantly involved in protein synthesis, as protein binding, ribosomal constituents, and ribosomal RNA synthesis are included in these categories.
To evaluate the significance of the r value, a P value was generally used, taking a statistical perspective (Manfield et al., 2006
It is a challenge to determine how well the r value reflects the real correlation. We first adopted an r value from the distribution of the numbers of genes among eight r value bins. As the r value bins decrease from 0.7 to 0.2 in the coinduced regions (r values of 0.3–0.7 in Fig. 4), the skew in the number of genes in each of the eight r value bins shifts from right to left. In particular, the number of genes with coexpressed genes in the region (0, 100) is maximized at r
At least 18 genes in the family of rice and Arabidopsis were retrieved and first grouped with OrthoMCL. The program identifies identical proteins based on sequence similarities and distinguishes orthologs from paralog relationships without intensive computational phylogenetic analysis. The result suggests that the 16 members consist of four groups. For each individual gene, the coexpressed genes were compared with the Arabidopsis genes. While the numbers of correlated genes of rice in MCL0 range from 55 to 345 at r
Beyond revealing the coexpressed gene characteristics of gene families, RAN can also be used for comparisons of gene expression patterns between rice and Arabidopsis. The gene expression values, as exemplified in the case of Os10g0124000 (Fig. 2), suggest the gene expression of the family is comparable with those values of genes from the Arabidopsis microarray collections such as AFGN and Genevestigator. The results show that, for both species, transcripts from the gene encoding a ribosomal protein L7Ae consistently decreased in specific organs compared with callus or anther tissues and decreased only slightly in seeds compared with callus and in response to abiotic stresses such as drought and cold (Fig. 4). These data show that RAN not only reveals the coexpressed gene characteristics of the gene family but can also be used in the comparison of gene expression patterns between rice and Arabidopsis.
As a model dicot plant, the Arabidopsis genome was sequenced (Arabidopsis Genome Initiative, 2000
We also analyzed genes involved in the drought stress response in rice. As stress responses are complicated processes involving transcription factors, enzymes, and effectors (Agarwal et al., 2006
The DREBs are important transcription factors that induce a set of abiotic stress-related genes and impart stress endurance to plants. The two DREB transcription factors, DREB1 and DREB2, are involved in two separate signal transduction pathways that respond under conditions of low temperature and dehydration, respectively (Agarwal et al., 2006
Several Web-based microarray analysis tools are currently available (Owen et al., 2003
Phylogenetic analysis by comparison of whole genome sequences suggests that flowering plants such as Arabidopsis and rice followed their own evolutionary paths after the monocot-dicot divergence from a common ancestral angiosperm (Bowers et al., 2003
RAN is a data resource that provides information on coexpressed genes in rice, based on two-dye microarray data. The closeness of coexpression between two genes is represented by the correlation coefficient and the statistical significance of the r value. The correlated gene groups are conveniently depicted in a relational tree and a relational network. RAN not only reveals the coexpressed gene characteristics of the gene family but can also be used in the comparison of gene expression between rice and Arabidopsis. Coexpression patterns in stress-related genes responding to the same biological pressures are shown to be regulated together. These results show that data obtained from a given experimental design could be cross-checked in RAN, and a new experiment could then easily be designed. Moreover, RAN is designed to be extended to other related plants, and the same database structure can be applied to construct a comprehensive resource on expression correlation between genes in any organism.
Microarray Data
As of April 2008, expression data from 183 microarrays were collected using either wild-type or mutant rice (Oryza sativa) organs, such as the leaf, root, flower, and callus, at various developmental stages. Various treatments were applied to the plant under research conditions. These include biotic and abiotic stresses and hormones (Supplemental Table S1). The microarray data are processed as described previously (Jung et al., 2005
Noncorrelation of the signal and background intensities is confirmed by plotting the base 2 log background intensity on the x axis against the base 2 log background-subtracted intensity on the y axis. Before normalization, the normal distribution and linear relations of the Cy3 and Cy5 intensities are tested by qqplot and a linear regression model, respectively, in the R statistical language (http://cran.r-project.org). The spatial effects on the chip during the hybridization process are checked with spatial.func in the SMA package. The variance differences between the Cy3 and Cy5 intensities within the microarray are tested with the t test under the assumptions of both uniform and nonuniform variances. One- and two-way ANOVAs of the signal intensity differences between microarrays are performed. The median pixel intensities are transformed as log ratios with base 2 and then adjusted by block-by-block Lowess normalization for each slide (Yang et al., 2002
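The log-ratio transformation mentioned above can be illustrated with a short sketch. It assumes that the background is subtracted from each channel before the ratio is taken and that the ratio is oriented as Cy5 over Cy3; neither detail is stated in the text, and the function name is hypothetical. The block-by-block Lowess adjustment is not reproduced here.

```python
import math


def log2_ratio(fg_cy5, bg_cy5, fg_cy3, bg_cy3):
    """Base 2 log ratio of background-subtracted median intensities.

    Assumption: Cy5 is the numerator and Cy3 the denominator.
    Returns None when either corrected signal is not positive,
    because the logarithm is undefined for such spots.
    """
    signal_cy5 = fg_cy5 - bg_cy5
    signal_cy3 = fg_cy3 - bg_cy3
    if signal_cy5 <= 0 or signal_cy3 <= 0:
        return None
    return math.log2(signal_cy5 / signal_cy3)


# A spot whose corrected Cy5 signal is four times its Cy3 signal.
print(log2_ratio(1000, 200, 300, 100))
```

Returning None for spots at or below background is one possible convention; flagging or filtering such spots upstream would serve equally well.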
Microarray data are collected and processed as described previously (Jung et al., 2005
An additional standard score, the Z score, of the r value of a gene pair (designated "query pair") is calculated from the r values of all gene pairs including a given gene to determine the Z score in the query pair and the rest of the 58,416 genes. According to the central limit theorem, the distribution of all possible r values of a gene should be a normal distribution because the sample size is large enough, and we confirmed that most of the distributions are normal (Supplemental Table S4). Thus, the Z score is calculated with the mean (µ) and SD (
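A minimal sketch of this standard score follows, assuming the conventional definition z = (r - µ) / SD, where µ and SD are taken over the r values of all gene pairs that include the given gene. Whether RAN uses the population or the sample SD is not stated, so the population form is used here as an assumption, and the function name is hypothetical.

```python
import statistics


def z_score(query_r, all_r):
    """Standard score of one r value against all r values of the same gene.

    Assumption: the population standard deviation is used.
    """
    mu = statistics.fmean(all_r)
    sd = statistics.pstdev(all_r)
    if sd == 0:
        raise ValueError("all r values are identical; the Z score is undefined")
    return (query_r - mu) / sd


# Toy data: five r values for one gene, scoring the largest of them.
toy_r = [0.1, 0.2, 0.3, 0.4, 0.5]
print(round(z_score(0.5, toy_r), 3))
```

In practice `all_r` would hold one r value for each of the other genes on the array, so the choice between population and sample SD has a negligible effect.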
To draw a relational tree, RAN first searches relational genes with a minimum absolute r value and a depth set by a researcher from each gene (denoted as the seed) among the query genes selected by the researcher and makes a list from all the genes together. An option for "depth" is provided that determines the genes directly coexpressed: primary with the input gene and secondary with the genes coexpressed with the primary gene. Subsequently, a distance table, based on the r values between all the listed genes, is created. The distance (designated as d) between any two genes is determined by the formula d = 1 – | r | (D'haeseleer, 2005
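The search and distance steps described above can be sketched as follows. This is an illustration under stated assumptions, not RAN's implementation: the r values are held in a nested dictionary, the depth option is treated as a breadth-first search limit, and a pair with no stored r value is treated as uncorrelated (d = 1). All function names are hypothetical. The three r values in the example are the ones reported in this article for Os01g0968800 (DREB1), Os02g0790500 (T6pS), and Os06g0219500 (SHSP).

```python
from collections import deque


def symmetric_table(pairs):
    """Turn {(gene_a, gene_b): r} into a nested dict readable in both directions."""
    table = {}
    for (a, b), r in pairs.items():
        table.setdefault(a, {})[b] = r
        table.setdefault(b, {})[a] = r
    return table


def related_genes(seeds, r_table, min_abs_r, depth):
    """Collect genes reachable from the seeds within `depth` coexpression steps."""
    found = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        gene, level = queue.popleft()
        if level == depth:
            continue  # do not expand beyond the requested depth
        for partner, r in r_table.get(gene, {}).items():
            if abs(r) >= min_abs_r and partner not in found:
                found.add(partner)
                queue.append((partner, level + 1))
    return found


def distance_table(genes, r_table):
    """Pairwise distances d = 1 - |r|; missing pairs are assumed to have r = 0."""
    table = {}
    for a in genes:
        for b in genes:
            r = 1.0 if a == b else r_table.get(a, {}).get(b, 0.0)
            table[(a, b)] = 1.0 - abs(r)
    return table


pairs = {
    ("Os01g0968800", "Os02g0790500"): 0.529,  # DREB1 and T6pS
    ("Os01g0968800", "Os06g0219500"): 0.614,  # DREB1 and SHSP
    ("Os02g0790500", "Os06g0219500"): 0.394,  # T6pS and SHSP
}
r_table = symmetric_table(pairs)
genes = related_genes(["Os02g0790500"], r_table, min_abs_r=0.5, depth=2)
distances = distance_table(genes, r_table)
print(sorted(genes))
```

With a minimum |r| of 0.5, a depth of 1 from the T6pS seed reaches only DREB1, while a depth of 2 also reaches SHSP through DREB1. The T6pS to SHSP distance, 1 - 0.394 = 0.606, is the largest of the three.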
Genes in a family were retrieved from RAP2 rice genome annotation (http://rapdb.dna.affrc.go.jp/) or TAIR Arabidopsis (Arabidopsis thaliana) genome annotation version 8 (http://www.arabidopsis.org/) using keywords. To draw the phylogenetic tree, a two-step analysis is applied. First, ortholog groups are tested with OrthoMCL (http://www.orthomcl.org/). This analysis compartmentalizes the family into two groups. Second, the amino acid sequences of each group are aligned with ClustalW, and then a distance matrix of the alignment is calculated using the protdist program in the PHYLIP package. The matrix is transformed into a tree by the neighbor program. The tree is tested with 1,000 bootstrap replicates generated by the seqboot program. The bootstrapping values are reported in place of the branch lengths.
To test genes coexpressed with the ribosomal protein L7Ae gene Os10g0124000 (oligomer ID Os056379_01; spot no. B11032220), lists of clustered genes were retrieved using this gene as the input word or "seed," as shown in Figure 1.
To test coexpressed genes in the ribosomal protein L7Ae family within and between rice and Arabidopsis, the top-ranked 5% of coexpressed genes within each family were retrieved from RAN and ACT. To reduce the variation caused by oligomers from predicted genes in the Rice 60k Microarray, only the 33,689 oligomers that unanimously matched the recently annotated RAP2 are used. In the Arabidopsis design, ATH1 contains 22,765 genes that are positively identified by the chip design. The numbers of the top-ranked 5% of genes are 1,684 and 1,138 for rice and Arabidopsis, respectively. These numbers of genes are used to compare coexpressed genes with the ribosomal protein L7Ae family within and between rice and Arabidopsis. Os01g0968800 (DREB1F), known to be expressed in response to drought, is used as an external control. In the comparison of the coexpressed genes between species, BLASTp analysis is performed for the two species, and genes with scores of 100 or higher are considered to be the tentative counterparts (Supplemental Table S4). GO analysis was performed with GoMiner (Ashburner et al., 2000
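The top-ranked 5% cutoffs quoted above follow from simple arithmetic, which the sketch below reproduces. Rounding down to a whole gene is an assumption that happens to match the reported counts; ranking by descending r (rather than by absolute value) is also an assumption, and the function names are hypothetical.

```python
def top_count(n_genes, percent=5):
    """Number of genes in the top `percent` of a list, rounded down."""
    return n_genes * percent // 100


def top_ranked(r_by_gene, count):
    """The `count` genes with the highest r values against a seed gene."""
    ranked = sorted(r_by_gene, key=r_by_gene.get, reverse=True)
    return set(ranked[:count])


def commonly_coexpressed(*gene_sets):
    """Genes present in every one of the given top-ranked sets."""
    return set.intersection(*gene_sets)


print(top_count(33689))  # rice oligomers matched to RAP2
print(top_count(22765))  # Arabidopsis genes on ATH1

# Toy example with made-up gene names and r values for two seed genes.
seed_a = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": -0.2}
seed_b = {"g1": 0.7, "g2": 0.2, "g3": 0.6, "g4": 0.0}
print(commonly_coexpressed(top_ranked(seed_a, 2), top_ranked(seed_b, 2)))
```

Comparing sets of a fixed size per seed, rather than applying one r threshold to both databases, is what lets the rice and Arabidopsis lists be compared despite their different r distributions.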
Rice microarray log ratios were retrieved for those used in the calculation of r values as shown in Figure 2A. Initial analysis using Os10g0124000 suggests that the gene for the ribosomal protein L7Ae consistently decreased in specific organs, compared with the callus or anther. We also searched the expression values of the gene under conditions of plant hormone, ABA, and abiotic stresses such as drought and cold to test the ribosomal protein expression. A drought-responsive transcription factor, Os01g0968800, was also searched. The microarray sets ranged from 3 to 12. The distribution of the correlation coefficients for these genes varies in response to stress or at different developmental stages. To avoid bias in the analysis, microarray collections in AFGN and Genevestigator were searched for those sets performed with similar organs at similar developmental stages and experimental conditions (Supplemental Table S9). For example, organs and tissues are directly retrieved from the average values from the Web site. The lemma and palea of rice are sepal equivalents in Arabidopsis that nourish and protect florets and developing kernels. As experiments for drought stress, RNA samples are prepared from rice leaves after stress treatment for 2 to 6 h, and the values for drought for Arabidopsis are retrieved from the values marked as "Stress: drought_green_early," in which leaves are harvested at 0.5, 1, and 3 h after the onset of treatment. As microarray data in RAN are prepared by the two-dye method depicted in Figure 4, the microarray values of Arabidopsis are compared with the control rice data set and transformed by log2 transformation.
The following materials are available in the online version of this article.
We thank Drs. I.W. Manfield and J.R. Bradford of ACT at the University of Leeds for their helpful advice.
Received March 27, 2009; accepted July 6, 2009; published July 15, 2009.
1 This work was supported by the Crop Functional Genomics Center of the Frontier Research Program, funded by the Ministry of Science and Technology (grant no. CG1210 to M.K. and grant no. CG1122 to B.H.N.), by the BioGreen21 Program (grant no. 20070401034008 to Y.-K.K. and grant no. 20090101060028 to B.H.N.), by the Rural Development Administration of the Republic of Korea, and by the Brain Korea 21 Project (grants to T.-H.L. and B.H.N.).
2 These authors contributed equally to the article.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Baek Hie Nahm (bhnahm{at}mju.ac.kr).
[C] Some figures in this article are displayed in color online but in black and white in the print edition.
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.109.139030
* Corresponding author; e-mail bhnahm{at}mju.ac.kr.