Exploring tomato gene functions based on coexpression modules using graph clustering and differential coexpression approaches.

Gene-to-gene coexpression analysis provides fundamental information and is a promising approach for predicting unknown gene functions in plants. We investigated various associations in the gene expression of tomato (Solanum lycopersicum) to predict unknown gene functions in an unbiased manner. We obtained more than 300 microarrays from publicly available databases and our own hybridizations, and here, we present tomato coexpression networks and coexpression modules. The topological characteristics of the networks were highly heterogenous. We extracted 465 total coexpression modules from the data set by graph clustering, which allows users to divide a graph effectively into a set of clusters. Of these, 88% were assigned systematically by Gene Ontology terms. Our approaches revealed functional modules in the tomato transcriptome data; the predominant functions of coexpression modules were biologically relevant. We also investigated differential coexpression among data sets consisting of leaf, fruit, and root samples to gain further insights into the tomato transcriptome. We now demonstrate that (1) duplicated genes, as well as metabolic genes, exhibit a small but significant number of differential coexpressions, and (2) a reversal of gene coexpression occurred in two metabolic pathways involved in lycopene and flavonoid biosynthesis. Independent experimental verification of the findings for six selected genes was done using quantitative real-time polymerase chain reaction. Our findings suggest that differential coexpression may assist in the investigation of key regulatory steps in metabolic pathways. The approaches and results reported here will be useful to prioritize candidate genes for further functional genomics studies of tomato metabolism.


Introduction
23.5% involving a stress condition, 23.5% a genotype, and 17.6% a pathogen ( Figure 2A). They also corresponded to different organs, including fruits (44%), leaves (25.1%), and roots (11.7%) ( Figure 2B). Based on the organ type in the metadata, we divided the original expression data matrix (307 arrays × 9,989 probe-sets, defined as "all data") into 3 sub-matrices that we called "leaves", "fruits", and "roots". We used these matrices, as well as "all data", in further analyses ( Figure 1). The distribution of the 3 expression matrices appeared normal (Supplemental Figure S1A). To construct the coexpression networks, we calculated correlation matrices using Pearson's correlation coefficients. The correlation coefficients for each organ were also in normal distribution (Supplemental Figure S1B).

The Topological Characteristics of Tomato Coexpression Networks Are Highly Heterogenous
We next constructed tomato coexpression networks. To evaluate whether they manifested the common properties of a complex network, such as power-law degree distribution, we investigated the topological characteristics of the network (e.g., degree distribution and average path length) (Table I). We found that the degree distribution of the coexpression network followed a power law ( Figure   3A). In the case of the coexpression network with r ≥ 0.6 (p < 2.1e-31, Pearson's correlation statistical test), the degree distribution of "all data" network followed a power-law with a degree exponent of γ = 1.67. Here, P(k) ~ k −γ , where γ represents the degree exponent. The average path length and the average clustering coefficient of this network were 2.65 and 0.45, respectively, implying small-world properties and high modularity in the network (see review by (Barabasi and Oltvai, 2004)). Figure 3B shows a partial coexpression network generated using "all data"; it was based on the list of first-order genes that neighbored and were coexpressed with the LeMADS-Rin gene, which encodes a MADS box transcriptional factor regulating fruit ripening-related genes (Giovannoni et al., 1995;Vrebalov et al., 2002). The network shows how the neighboring genes of LeMADS-Rin correlate with each other.

Response to cold
Module 435, consisting of 33 genes, was related to "response to cold" (FDR = 4.3e-2). This module included genes encoding cytosolic ascorbate peroxidase 2 and dehydroascorbate reductase 2. These genes are involved in the ascorbate-glutathione cycle that scavenges reactive oxygen species, particularly H 2 O 2 .

Jasmonic acid metabolic process
Module 457 (8 genes) was involved in the "jasmonic acid metabolic process" (FDR = 8.7e-3). It contained the LeMTS1 gene encoding tomato monoterpene synthase. A homeobox knotted-1-like gene, LeT6 (Chen et al., 1997), was also included in this module. The LeT6 gene is essential for meristem maintenance and the process of leaf initiation (see Discussion).

Comparative Analyses of Topological Characteristics in Tomato Transcriptome Coexpression in Different Organ Datasets
To obtain further insight into the tomato transcriptome, we employed a comparative approach to organ-specific coexpression. Using 3 sub-datasets, "leaves", "fruits", and "roots", we first assessed the topological characteristics of each coexpression network. As shown in Figure 5A, the degree distribution of each network (r ≥ 0.6) was indicative of a typical power-law distribution in a wide range of values for the degree k. We also calculated several graph-theoretic statistics including the number of edges, average path length, and clustering coefficient (for example, see (Barabasi and Oltvai, 2004)) ( Table I). In the "roots" dataset, we observed that the highest number of edges and the highest clustering coefficient were 5,997,644 and 0.70, respectively. The average path length in 3 organs was smaller than 3, indicating that these networks are characterized by small-network property. To assess the degree of node overlap 1 0 among the 3 datasets, we investigated similarity scores for nodes based on their connection partners using the Jaccard coefficient, which assesses the interaction between two graphs by assessing the tendency that links simultaneously are present in both graphs. Figure 5B shows the relationship between the average of the Jaccard coefficient and the correlation coefficient, and it indicates that the larger the cut-off of the correlation coefficient, the smaller the average similarity. In this graph, we observed that the roots dataset had the highest Jaccard coefficient in all ranges.

Direct Measures of Differential Coexpressions in the Tomato Transcriptome
We next focused on changes in coexpression patterns between gene expression levels. We posited that a significant coexpression between given genes may be found under one condition but not another and that the changes elicited under one condition may be reversed under the other condition. An example of changed coexpressions is shown in Figure 6. The correlation coefficient between Les.107.1.S1_at (SGN-U565450, cyclin A2) and Les.1081.1.S1_at (SGN-U577270, putative ribosomal protein) was negative (r = -0.66) in leaves but, positive (r = 0.71) in fruits. To compare coexpression patterns among organs, we first visualized the 3 correlation matrices using pseudo-heatmaps (Supplemental Figure S2). While genes involved in photosynthesis were highly coexpressed in leaves and fruits, in roots they showed no remarkable coexpression within the pathway. For other pathways, root data displayed overall positive coexpressions.
To obtain the difference between coexpressions in the 3 organs, we used Fisher's Z-transformation. This approach can test directly for differential coexpression by testing, for example, the null hypothesis H 0 : r L = r F . Here, r L and r F indicate that coexpression is calculated over the leaves and fruits samples, respectively (see Methods). In 3 comparisons, "fruits" vs. "roots", "leaves" vs. "fruits", and "leaves" vs. "roots", the numbers of significantly different correlation pairs were 753,133, 826,190, and 395,814, respectively (FDR < 1e-10) ( Figure  7). Full lists of the differential coexpressions are shown in Supplemental Data 2. Using GO terms, we characterized the gene pairs with significant differential coexpressions (Table IV and Supplemental Table S4). For example, in the transition from r > 0.7 in leaves to r < -0.7 in fruits, the differential coexpression 1 1 included genes associated with cell wall modifications (FDR = 7.37e-03), while in the opposite transition (r < -0.7 in leaves to r > 0.7 in fruits), it included genes involved in flower development . In the case of the GO term "flower development", we observed 17 annotated probes including LeMADS-Rin and other genes encoding the MADS-box protein (Supplemental Table S4).

Duplicated Genes Show a Small But Significant Number of Differential Coexpressions
We further investigated whether differential coexpressions exist between duplicated genes. We first classified the all probe-set "target" sequences into similarity clusters (referred to as gene families; see Methods). Consequently, 1677 obtained clusters were regarded as gene families in this analysis. For 1677 gene families including duplicated genes we calculated the number of significantly different coexpressions (FDR < 1e-10). The numbers of duplicated genes that were differentially coexpressed among organs is listed in Table V and  Supplemental Table S5. Overall, we were able to observe a small but significant number of differential coexpression relationships (Table V). For example, a gene family, Markov Cluster (MC)100, was characterized by the GO terms "photosynthesis, light harvesting (FRD = 6.1e-5)" and "protein-chromophore linkage (FDR = 6.1e-5)". In MC100, there were 19 differentially coexpressed pairs between leaves and roots, while there were 41 pairs between fruits and roots.
To gain further insights into genes with significant differential coexpression, we investigated the number of significant differential coexpressions (FDR < 1e-10) for each of the 256 metabolic pathways from the LycoCyc (Bombarely et al., 2011) (Table VI and Supplemental Table S6). For example, there were 15 differential coexpressions in "glycolysis IV (plant cytosol)" between fruits and roots but only 3 differential coexpressions between leaves and roots. The coexpression patterns in biosynthetic pathways associated with the Calvin cycle, sugar degradation, photorespiration, and the tricarboxylic acid (TCA) cycle, as well as glycolysis, were usually different among organs. In comparisons between leaves and fruits, but not in other comparisons, there were 3 differentially coexpressed pairs in anthocyanine biosynthesis. On the other hand, gene pairs in flavonoid biosynthesis were different when compared to roots. We also studied duplicated genes encoding isozymes that 1 2 were primarily derived from LycoCyc. Most of these genes were inferred from computational analysis without any human curation. As shown in Supplemental Table S7, we observed quite a few pairs with significantly different coexpression (FDR < 1e-10) between duplicated genes (isoforms). For example, the number of significantly different coexpression genes encoding beta-fructofuranosidase was 4 of 528 possible gene pairs.

Differential Coexpressions Provide Clues of Key Regulatory Steps in Metabolic Pathways
To interpret differential coexpression in metabolic pathways, we investigated, in detail, 2 biosynthetic pathways, lycopene and flavonoid, in a proof-of-concept study.

Lycopene biosynthesis
The changes in coexpressions between organs involved in carotenoid biosynthesis are of particular interest because this pathway contributes to the production of lycopene and β-carotene, which can be utilized as indicators of tomato quality. The mapping of differential coexpressions between leaves and fruits onto a lycopene biosynthesis pathway ( short-chain dehydrogenase/reductase (SDR) homolog [LesAffx.68802.1.S1_at (SGN-U580225)] was highly positive (r = 0.72) in fruits and, mildly negative (r = -0.31) in leaves.

Independent Experimental Verification of The Microarray Datasets Using qRT-PCR
To independently confirm the expression profiling data, 6 expressed genes identified in our microarray results were assayed with qRT-PCR at 7 experimental conditions using the same RNA sample that was used for our original experiment consisting of 20 microarrays (see Methods). We chose 6 genes that were involved in photosynthesis, flavonoid biosynthesis, and carotenoid biosynthesis, which corresponded with 3 types of expression pattern: (1) leaf-specific (2) fruit-specific and (3) moderate ( Figure 10A). We verified that the expression patterns were consistent between microarray and qRT-PCR analyses and showed very good reproducibility (Supplemental Figure S3). Thus, we can state the followings: (i) the siginificant coexpression in Module 4, which includes the genes zep and psad, was reproducible (r = 0.84, p = 3.63e-6 in Figure  supports, at least in part, our approach (see also Figure 9). These verification results suggest that the resulting coexpressions derived from 307 collected datasets, as well as each sub-dataset ("leaves", "fruits", and "roots"), are relevant, although we must note coexpression between genes with low-level expression.

Discussion
We presented a comprehensive coexpression network analysis based on over 300 tomato microarrays and investigated the topological properties of the network. The degree distribution of the coexpression network followed the heterogeneous power law. The sub-datasets from 3 organs, leaf-, fruit-, and root samples, also exhibited high heterogeneity and modularity (e.g., power-law degree distributions and small-world properties). This result indicates that the properties of the network are consistent with earlier reports on typical coexpression networks (Stuart et al., 2003). We constructed genome-wide tomato coexpression networks with a correlation cut-off value of r ≥ 0.6 (p < 2.1e-31, Pearson's correlation statistical test) using "all data". Our threshold selection was supported, at least partially, by earlier studies. Aoki and colleagues showed that the Arabidopsis coexpression network had a minimal network density at a cut-off r value ranging from 0.55 to 0.60 (Aoki et al., 2007). Using simulated datasets, Elo and colleagues demonstrated that a cut-off value of r = 0.6 yielded both low error and high reproducibility (Elo et al., 2007).
We clustered densely coexpressed genes by graph clustering, a non-targeted approach that divides the tomato coexpression network into gene modules. We detected 465 coexpression modules and assessed them by GO term enrichment analysis. A coexpression module-based approach applied to 67 microarrays from 24 different tissues in tomato provided strong predictions for gene functions, such as the pathways involved in flavonoid biosynthesis (Ozaki et al., 2010). Based on the evolutionary conservation of coexpressions in the Solanaceae family, Miozzi and colleagues developed the data-mining tool, ORTom, which predicts functional relationships among tomato genes (Miozzi et al., 2010). The Tomato Functional Genomics Database (TFGD) offers a web-based retrieval tool for coexpressed genes (Fei et al., 2011). There are several differences between our approach and these previous studies: First, our study involved a much larger number of microarrays (>300) and covered wider experimental conditions, including various stress experiments. Second, we www.plantphysiol.org on January 6, 2018 -Published by Downloaded from Copyright © 2012 American Society of Plant Biologists. All rights reserved. 1 5 introduced differential coexpression into our plant transcriptome study. In addition, the extracted modules carried many unknown genes ( Figure 4). Finally, the expression profiles of 6 selected genes were independently verified by qRT-PCR (Supplemental Figure S3 and Figure 10).
The coexpression-module-based approach by graph clustering is an important approach to facilitate predictions (Mentzen and Wurtele, 2008;Atias et al., 2009;Fukushima et al., 2009;Mao et al., 2009). In our previous works (Fukushima et al., 2009;Fukushima et al., 2011), we used DPClus-based graph clustering to characterize gene coexpression networks and metabolomic correlation networks. Although we believe that DPClus is also useful for gene coexpression networks, it has a limitation on the number of nodes (limitation: n < 5,000). To this end, we chose IPCA (Li et al., 2008) that enables us to perform the graph clustering for larger-scale networks in this study. We demonstrated that 88% of all coexpression modules in the tomato were assigned by GO terms, although the modules to which we assigned GO terms overlapped markedly. The tomato genome sequencing project and enhanced genomic annotations in tomato (Barone et al., 2008;Bombarely et al., 2011) will improve such characterizations of coexpression modules. We highlighted 4 modules, Module 1, Module 4, Module 435, and Module 457 (Table II). The coexpression between genes encoding cyclin B1 and histone H4 in Module 1 (GO term: DNA endoreduplication) has been observed in Arabidopsis [See ATTED-II database; (Obayashi et al., 2011)]. Module 4 (GO term "photosynthesis") is consistent with previous reports in Arabidopsis and rice (Mentzen and Wurtele, 2008;Fukushima et al., 2009). Genes in Module 435 (GO term "response to cold") are reasonable in the sense that abiotic stresses, including cold stress, are related to oxidative stress that perturbs almost all functions in a plant cell. Module 457 (GO term "Jasmonic acid metabolic process") was also relevant because it included the LeMTS1 gene. According to van Schie and colleagues, the expression of the gene in tomato leaves was induced by jasmonic acid treatment, and this gene encoded a linalool synthase (van Schie et al., 2007). We also observed the LeT6 gene (Chen et al., 1997) in this module. Although both genes were highly coexpressed, their relationship is currently unclear. Using the arf6arf8 double mutant, Tabata and his colleagues showed that Arabidopsis auxin response factors 6 and 8 regulate jasmonic acid biosynthesis and floral organ development via class 1-KNOX genes (Tabata et al., 2009). This mutant is similar to that found in plants harboring a mutation associated with the www.plantphysiol.org on January 6, 2018 -Published by Downloaded from Copyright © 2012 American Society of Plant Biologists. All rights reserved.
1 6 biosynthetic pathway of jasmonic acid, which causes morphologic abnormality in floral organ development. Our clustering results may suggest a yet unknown transcriptional coordination between jasmonic acid biosynthesis and KNOX genes in tomato. The predominant functions of the coexpression modules detected here show biological relevance. Our top-down approach revealed good candidates for functional modules in the tomato transcriptome. That is, when using the "guilt-by-association" principle, such a coexpression-module-based approach can help to predict the function of unknown genes within a module, although coexpressed genes are not necessarily involved in the same biological process.
We studied differential coexpressions among tomato leaves, fruits, and roots using Fisher's Z-transformation. Direct measurements showed that duplicated genes as well as metabolic genes exhibit a small but significant number of differential coexpressions. Gene clusters assigned the GO term "flower development" as an example of switching coexpression, included LeMADS-Rin (Table IV and Supplemental Table S4). Regarding the link between the identification of differential coexpression and the identification of coexpressed gene modules, we describe that these are complementary relationship. The reason of this statement is 2-fold: Because a gene coexpression can be interpreted as a 'fingerprint' of the underlying transcriptional network, it can be used to compare 2 physiological states of cellular systems. That is, differential coexpression approach leads to a systematic difference in transcriptomic levels among such organs. Secondly, the comparative coexpression study here provides a way to use the observed coexpressions to find additional information about the plant transcriptome. It is obvious that identifying coexpressed gene modules is not sufficient, because differentially coexpressed gene groups also reflect the changes in underlying transcriptional regulation.
We also demonstrated the existence of several pairs within LycoCyc pathway (enzymatic isoforms) that had significantly different coexpression between duplicated genes. In our proof-of-concept study, we investigated various patterns of differential coexpressions in 2 biosynthetic pathways associated with lycopene and flavonoid. In the lycopene biosynthesis pathway, we identified 4 significant differences in coexpression between leaves and fruits ( Figure 8). Of the genes involved, psy encodes a key regulatory enzyme (Cazzonelli and Pogson, 2010 in tomato fruits is highly likely from a proof-of-concept point of view regarding differential coexpression approaches. The differential coexpression between the 2 isoforms of the psy gene may reflect organ-specific regulation on the transcription level. We found that 2 genes encoding the ε-ring hydroxylase and the SDR homolog, exhibited markedly different coexpression patterns in leaves and fruits. In fruits, the high coexpression (r = 0.73) between the genes encoding ε-ring hydroxylase and SDR was partly consistent with the observation that 2 abscisic acid-deficient mutants, sitiens and flacca, exhibit a decrease in lutein levels (Galpaz et al., 2008). Our approach suggests that the correlation between ABA and lutein levels in tomato plants is attributable to their transcriptional coordination.
Based on transcriptome coexpression analysis, several flavonoid biosynthetic genes in Arabidopsis had characterized (Luo et al., 2007;Yonekura-Sakakibara et al., 2007;Yonekura-Sakakibara et al., 2008;Yonekura-Sakakibara et al., 2012). Flavonoids in plants play significant roles in many biological processes such as protecting plants against ultraviolet light, providing pigmentation in fruits and flowers, and in protecting against diseases and pests in higher plants (Buer et al., 2010). In the tomato, the total flavonoid concentration in roots is lower than that in leaves and fruits (Zornoza and Esteban, 1984); among the 3 organs, tomato fruits manifest the highest concentration of total flavonoids. Earlier studies had detected up to 70 flavonoids including naringenin chalcone, kaempferol, and quercetin in tomato fruits (cv. Micro-Tom) (Moco et al., 2006;Iijima et al., 2008). Our results showed strong coexpression patterns among genes encoding chalcone synthase, chalcone isomerase, and flavanone 3-hydroxylase in fruits ( Figure 9). However, there was no coexpression between these genes in roots. In addition, we found a reversal of coexpression between genes encoding chalcone synthase and chalcone isomerase. This suggests a pronounced change in the underlying transcriptional regulation of the flavonoid pathway. Taken together, these observations suggest that at least in lycopene and flavonoid pathways, these differential coexpressions reflect a key regulatory mechanism among organs.
Because a marked change in coexpression patterns among organs may reflect underlying transcriptional changes directly or indirectly, the genes related to these differential coexpressions may act distinctly in those organs. Differential coexpression analysis represents a good data-mining approach to prioritizing candidate genes in further functional genomics studies on computationally www.plantphysiol.org on January 6, 2018 -Published by Downloaded from Copyright © 2012 American Society of Plant Biologists. All rights reserved. 1 8 assigned genes encoding an enzyme, for example. Additional work is needed to evaluate the conservation of differential coexpression patterns among species, and more efficient tools for their assessment in plants would be helpful. For example, like Kappa-view 4 (Sakurai et al., 2011), the mapping of differential coexpressions onto metabolic pathways is highly useful in the field of plant research. We offer our extensive analysis of the coexpression networks to aid researchers in the selection and prioritization of candidate genes in studies on tomato functional genomics.

Materials and Methods Plant Material and Growth Conditions
We developed a plant irradiation system based on applying light-emitting diodes (LEDs) to a leaf. We used this system to investigate changes in transcript profiles of tomato leaves and fruits grown under different light conditions. Seeds from tomatoes (Solanum lycopersicum, cv. Reiyo) were sown in 72-cell plant trays (Takii Seed, Japan) and grown in a commercial soil mix (Napura Soil Mixes, Yanmar, Japan) for 2 weeks, in a growth chamber (MKV DREAM, Japan) at 25°C/20°C (light/dark) and 900 ppm CO 2 concentration with a light/dark cycle of 16 h/8 h for 2 weeks at Chiba University, Matsudo, Japan. Then, seedlings were transferred to pots one by one (final size: 2.4 L) and grown in a growth chamber (Asahi Kogyosha, Japan). Photosynthetic photon flux (PPF) level in the growth chamber was adjusted to 450-500 μ mol m -2 s -1 when we measured at the meristem of each tomato plant (light source: Ceramic metal halide lamps). After plant flowering in Summer 2010, we removed leaves and trusses from each plant, with the exception of the second truss, a leaf below the second truss and the meristem. LED irradiation at 0, 200, and 1000 μ mol m -2 s -1 was directly applied to the leaf using a plant irradiation system (humidity, 70%; CO 2 concentration, 900 ppm). Plant material was harvested after 0, 1, and 2 weeks. Twenty samples were analyzed (14 leaf samples and 6 pericarp samples), and 2-3 biological replicates were used.

GeneChip Microarray Analysis
Analysis was performed using the Affymetrix GeneChip® Tomato Genome Array (Santa Clara, CA, USA) according to the manufacturer's instructions. This microarray includes more than 9,000 transcripts and does not cover all of the www.plantphysiol.org on January 6, 2018 -Published by Downloaded from Copyright © 2012 American Society of Plant Biologists. All rights reserved.
1 9 approximately 35,000 genes encoded in the tomato genome, which are located largely in euchromatic regions (Van der Hoeven et al., 2002). RNA was extracted using the standard procedure of Affymetrix. Our own data have been deposited in the Gene Expression Omnibus (GEO) (Barrett et al., 2011) and are accessible through the GEO series accession number GSE35020.

Quantitative Real-Time PCR Analysis
Total RNA was extracted using the RNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA). Reverse transcription of each total DNase-treated (Qiagen) RNA sample was performed using the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen, Carlsbad, CA, USA). qRT-PCR with the first-strand cDNA using the Fast SYBR Green Master Mix (Applied Biosystems, Foster City, CA, USA) was performed on the ABI StepOnePlus Real Time PCR system (Applied Biosystems). qRT-PCR primers used in the present study are listed in Supplementary Table S8.

Data Collection and Preprocessing
We collected 307 GeneChips from publicly available databases including the GEO, ArrayExpress (Parkinson et al., 2011), and Tomato Functional Genomics Database (TFGD) (Fei et al., 2011), and included 20 of our own hybridizations (GEO accession, GSE35020), resulting in a total of 327 arrays. The collected dataset contains 17 experiments (see also Figure 2 and Supplemental Data 1). The raw CEL data were normalized by the robust multichip average (RMA) (Irizarry et al., 2003) with Bioconductor (Gentleman et al., 2004). The resulting values were normalized to the same range by means-based scale normalization.
Probe-sets with the prefix " RPTR" or "AFFX" were removed. Using the detection call (present/absent) from the MAS5 algorithm (http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf) with default settings, we excluded all probe-sets with absent calls in all samples. For the visual inspection of the quality of microarrays based on chip pseudo-images of residual weights and signed residuals, we used the R packages affy (Gautier et al., 2004) and affyPLM (Gentleman et al., 2005). We next performed a quality check of the microarrays based on the Kolmogorov-Smirnov goodness-of-fit statistic D (Persson et al., 2005) and discarded all GeneChips with D ≥ 0.15, resulting in 307 high-quality datasets for www.plantphysiol.org on January 6, 2018 -Published by Downloaded from Copyright © 2012 American Society of Plant Biologists. All rights reserved.

Graph Clustering and Functional Enrichment Analysis
To extract coexpression modules, we utilized the graph clustering algorithm IPCA (Li et al., 2008), an extension of the original DPClus algorithm (Altaf-Ul-Amin et al., 2006). DPClus is based on the combined periphery and density of graphs to extract dense subgraphs. The parameters we used were: modules with sizes smaller than 5 (S = 5), a shortest path length of 2 (P = 2), and a parameter "T in " of 0.6 (T = 0.6).
We used BiNGO (Maere et al., 2005), a tool to assess over-or under-representation in a set of genes, for the analysis of significantly over-represented GO categories among coexpression modules detected by graph clustering. We selected a GO term with the best p-value within the BP, CC, and MF domains. The Benjamini and Hochberg correction (Benjamini and Hochberg, 1995) for false discovery rate (FDR) was applied in functional enrichment analysis.

Calculating Differential Coexpressions
We used the term "differential coexpression" to describe significant correlations between given genes found under one-, but not another, condition. This definition also included instances in which the correlations were changed to the opposite direction under 2 conditions (for an example, see Figure 6). It provided a means to identify associations that were dramatically changed by mutations, organs, or treatments. For example, the correlation between gene expressions was calculated over the leaves sample, r L , and over the fruits sample, r F . In this case, the differential coexpression could be elucidated by testing the www.plantphysiol.org on January 6, 2018 -Published by Downloaded from Copyright © 2012 American Society of Plant Biologists. All rights reserved.
2 1 null-hypothesis H 0 : r L = r F . To test whether 2 coexpressions were significantly different from one another, the correlations were transformed by Fisher's Z-transformation. If there were 2 correlations with sample sizes n 1 and n 2 , respectively, they were each transformed into Fisher's z values, z = 1/2[ln(1+r) /(1-r)]. Under the null hypothesis that the population correlations are equal, the Z value, , has approximately a normal distribution. We also calculated the local false-discovery rate for all correlation tests with the 'fdrtool' package (Strimmer, 2008).

Definition of duplicated genes
The tomato target sequences from the Affymetrix website (http://www.affymetrix.com/Auth/analysis/downloads/data/Tomato.target.zip) were utilized in an all-against-all BLASTX search (cutoff threshold, E-value < 1e-5). We used the Markov chain clustering (MCL) algorithm (http://micans.org/mcl/; Van Dongen, 2000) to assign the target sequences to clusters. The 1677 obtained clusters were regarded as gene families in this analysis.

Gene Annotation
Annotation information (Release 31) for tomato was downloaded from the Affymetrix website (http://www.affymetrix.com/Auth/analysis/downloads/na31/ivt/Tomato.na31.anno t.csv.zip). Gene ontology of tomato genes was based on information from TFGD  Figure 1: Work-flow for extracting coexpression modules and for differential coexpression analyses among organs. Coexpression modules for each gene were generated by graph clustering without regard to functional properties.

Figure Legends
(1) Quality checks of microarrays were performed with robust regression techniques and the Kolmogorov-Smirnov goodness-of-fit statistic D (See Methods). We discarded 20 arrays with low-quality scores (D ≥ 0.15). Probe sets with the prefix "AFFX" and "RPTR" were excluded.
(2) We rejected 220 probes with the detection call "absent" across all samples.     The correlation between the expression levels of 2 genes was significantly different in "leaves" and "fruits" (FDR = 9.27e-14). The axes represent relative gene expressions. Note that 'differential coexpression' was completely different from differential expression, and the mean level of given genes was significantly different between 2 organs (see Methods).          The correlation between the expression levels of 2 genes was significantly different in "leaves" and "fruits" (FDR = 9.27e-14). The axes represent relative gene expressions. Note that 'differential coexpression' was completely different from differential expression, and the mean level of given genes was significantly different between 2 organs (see Methods).    Fukushima et al. Figure 10