Patterns of Metabolite Changes Identified from Large-Scale Gene Perturbations in Arabidopsis Using a Genome-Scale Metabolic Network1[OPEN]

Global patterns of metabolic responses upon single gene perturbations are specific to gene functions, but they are coordinated with characteristics of the perturbed genes. Metabolomics enables quantitative evaluation of metabolic changes caused by genetic or environmental perturbations. However, little is known about how perturbing a single gene changes the metabolic system as a whole and which network and functional properties are involved in this response. To answer this question, we investigated the metabolite profiles from 136 mutants with single gene perturbations of functionally diverse Arabidopsis (Arabidopsis thaliana) genes. Fewer than 10 metabolites were changed significantly relative to the wild type in most of the mutants, indicating that the metabolic network was robust to perturbations of single metabolic genes. These changed metabolites were closer to each other in a genome-scale metabolic network than expected by chance, supporting the notion that the genetic perturbations changed the network more locally than globally. Surprisingly, the changed metabolites were close to the perturbed reactions in only 30% of the mutants of the well-characterized genes. To determine the factors that contributed to the distance between the observed metabolic changes and the perturbation site in the network, we examined nine network and functional properties of the perturbed genes. Only the isozyme number affected the distance between the perturbed reactions and changed metabolites. This study revealed patterns of metabolic changes from large-scale gene perturbations and relationships between characteristics of the perturbed genes and metabolic changes.

Rational and quantitative assessment of metabolic changes in response to genetic modification (GM) is an open question and in need of innovative solutions. Nontargeted metabolite profiling can detect thousands of compounds, but it is not easy to understand the significance of the changed metabolites in the biochemical and biological context of the organism. To better assess the changes in metabolites from nontargeted metabolomics studies, it is important to examine the changed metabolites in the context of the genome-scale metabolic network of the organism.
Although metabolomics data have the potential for identifying the roles of genes that are associated with metabolic phenotypes, the biochemical mechanisms that link functions of genes with metabolic phenotypes are still poorly characterized. For example, we do not yet know the principles behind how perturbing the expression of a single gene changes the metabolic system as a whole. Large-scale metabolomics data have provided useful resources for linking phenotypes to genotypes (Fiehn et al., 2000;Roessner et al., 2001;Tikunov et al., 2005;Schauer et al., 2006;Lu et al., 2011;Fukushima et al., 2014). For example, Lu et al. (2011) compared morphological and metabolic phenotypes from more than 5,000 Arabidopsis chloroplast mutants using gas chromatography (GC)-and liquid chromatography (LC)-mass spectrometry (MS). Fukushima et al. (2014) generated metabolite profiles from various characterized and uncharacterized mutant plants and clustered the mutants with similar metabolic phenotypes by conducting multidimensional scaling with quantified metabolic phenotypes. Nonetheless, representation and analysis of such a large amount of data remains a challenge for scientific discovery (Lu et al., 2011). In addition, these studies do not examine the topological and functional characteristics of metabolic changes in the context of a genome-scale metabolic network. To understand the relationship between genotype and metabolic phenotype, we need to investigate the metabolic changes caused by perturbing the expression of a gene in a genome-scale metabolic network perspective, because metabolic pathways are not independent biochemical factories but are components of a complex network (Berg et al., 2002;Merico et al., 2009).
Much progress has been made in the last 2 decades to represent metabolism at a genome scale (Terzer et al., 2009). The advances in genome sequencing and emerging fields such as biocuration and bioinformatics enabled the representation of genome-scale metabolic network reconstructions for model organisms (Bassel et al., 2012). Genome-scale metabolic models have been built and applied broadly from microbes to plants. The first step toward modeling a genome-scale metabolism in a plant species started with developing a genome-scale metabolic pathway database for Arabidopsis (AraCyc; Mueller et al., 2003) from reference pathway databases (Kanehisa and Goto, 2000;Karp et al., 2002;Zhang et al., 2010). Genomescale metabolic pathway databases have been built for several plant species Zhang et al., 2005Zhang et al., , 2010Urbanczyk-Wochniak and Sumner, 2007;May et al., 2009;Dharmawardhana et al., 2013;Monaco et al., 2013Monaco et al., , 2014Van Moerkercke et al., 2013;Chae et al., 2014;Jung et al., 2014). Efforts have been made to develop predictive genome-scale metabolic models using enzyme kinetics and stoichiometric flux-balance approaches (Sweetlove et al., 2008). de Oliveira Dal'Molin et al. (2010) developed a genome-scale metabolic model for Arabidopsis and successfully validated the model by predicting the classical photorespiratory cycle as well as known key differences between redox metabolism in photosynthetic and nonphotosynthetic plant cells. Other genomescale models have been developed for Arabidopsis (Poolman et al., 2009;Radrich et al., 2010;Mintz-Oron et al., 2012), C. reinhardtii (Chang et al., 2011;Dal'Molin et al., 2011), maize (Dal'Molin et al., 2010Saha et al., 2011), sorghum (Sorghum bicolor;Dal'Molin et al., 2010), and sugarcane (Saccharum officinarum; Dal' Molin et al., 2010). These predictive models have the potential to be applied broadly in fields such as metabolic engineering, drug target discovery, identification of gene function, study of evolutionary processes, risk assessment of genetically modified crops, and interpretations of mutant phenotypes (Feist and Palsson, 2008;Ricroch et al., 2011).
Here, we interrogate the metabotypes caused by 136 single gene perturbations of Arabidopsis by analyzing the relative concentration changes of 1,348 chemically identified metabolites using a reconstructed genomescale metabolic network. We examine the characteristics of the changed metabolites (the metabolites whose relative concentrations were significantly different in mutants relative to the wild type) in the metabolic network to uncover biological and topological consequences of the perturbed genes.

Comprehensive Metabolite Profiles from Single Gene Perturbations
To systematically characterize metabotypes resulting from gene perturbations, we selected 136 Arabidopsis genes across 11 functional categories of metabolism in the Arabidopsis metabolic database AraCyc (Mueller et al., 2003;Fig. 1;Supplemental Fig. S1; Supplemental Table S1). Among these, 29 are genes with experimentally demonstrated enzymatic function, while 33 have significant sequence similarity to known enzymatic genes. In addition, we selected 45 experimentally uncharacterized genes whose involvement in specific metabolic pathways was predicted computationally using the gene cofunction network AraNet ("Materials and Methods;" Lee et al., 2010). AraNet integrates functional genomics data to make probabilistic functional links between genes, which can be used to predict functions of uncharacterized genes using the guilt-by-association principle. Finally, 29 genes that could not be assigned to any functional category by existing methods were selected. We selected the genes based on the absence of a visible phenotype from the gene perturbation, the absence of homologs from recent gene duplications, and easily detected expression in 2-week-old leaves in AtGenExpress data (levels of at least 100; Supplemental Fig. S1). Functions of the well-characterized and predicted genes in this study spanned all metabolism categories (Fig. 1).
Wild-type ecotype Columbia (Col-0) seedlings and mutant seedlings of the 136 selected genes were maintained under standard growth conditions, and metabolites were analyzed as described previously (Bais et al., 2010Quanbeck et al., 2012). We detected 4,483 distinct metabolite peaks from the leaves of 2-week-old seedlings using 11 mass spectrometric platforms, including seven targeted and four nontargeted platforms. We identified 1,348 metabolites based on mass fragmentation patterns and chromatographic behavior (comparison with authentic standards; Fig. 2A) and classified them into nine categories based on systematic names, molecular structures, and functional roles in metabolism. Among the 1,348 chemically defined metabolites, fatty acids and lipids constituted the largest class (51%), followed by specialized metabolites (18%), carbohydrates (10%), amino acids (7%), and other less represented categories (Fig. 2B). The chemically defined metabolites were mapped to 839 metabolic reactions in AraCyc to evaluate the global metabolic changes in the metabolic network of reactions ("Materials and Methods;" Supplemental Fig. S2). The entire metabolite profiling data for all the mutants are available online (http://www.plantmetabolomics.org; Bais et al., 2010) and (http://www.metnetdb.org/PMR; Wurtele et al., 2012;Hur et al., 2013).

Specificity of Changed Metabolites across 136 Single Gene Perturbations
To determine the patterns of the changed metabolites caused by a gene perturbation, we identified metabolites whose relative concentrations were significantly different in mutants relative to the wild type among the 1,348 chemically defined metabolites (Supplemental Table S2; Supplemental Fig. S3). Over 80% (113/136) of the mutants showed fewer than 10 changed metabolites (Fig. 3A). Thirty-one mutants (23%) showed no changed metabolites. Only two mutants, SALK_015522C (AT5G36880) and SALK_069657C (AT1G32200), showed more than 50 changed metabolites. These results indicate that most single gene perturbations of metabolic genes did not rewire the metabolic network globally. The identity of changed metabolites varied among the mutants, suggesting that metabolic changes were specific to the perturbed genes (Fig. 3B). The same pattern of metabolic specificity and diversity was observed when genes with different degrees of characterization (e.g. well characterized, predicted by homology, predicted by AraNet, and unknown) were compared (Supplemental Figs. S4 and S5). Lipids and fatty acids were most commonly affected in the mutants, which may result from the prevalence of this class of metabolites in the analytical platforms in this study. The composition of the changed metabolites was independent from the number of changed metabolites Figure 1. Distribution of the genes selected for this study across 11 functional domains of metabolism in the Arabidopsis metabolic pathway database AraCyc (Mueller et al., 2003). Different colored bars in each domain of metabolism indicate proportions of genes with different degrees of knowledge about their roles in metabolism. Genes associated with more than one functional domain were categorized to multiple domains of metabolism (Supplemental Table S1). The number in parenthesis on the x axis indicates the number of pathways belonging to each functional domain of metabolism.   Table S3). These results demonstrate that metabolic responses from single gene perturbations are quantitatively and qualitatively specific and diverse.
To test whether the specificity of metabolic responses from the mutants corroborated the functional role of the genes, we examined the changed metabolites in the mutants of well-characterized genes. Fatty acids and lipids were enriched in the changed metabolites of perturbed genes involved in fatty acid and lipid metabolism (P = 0.04, hypergeometric test; Fig.  3C). Specialized metabolites were also enriched in the changed metabolites of perturbed genes involved in specialized metabolism, though it was not statistically significant (P = 0.08, hypergeometric test; Fig. 3D). We did not observe a statistical enrichment of amino acids and carbohydrates in the changed metabolites of perturbed genes involved in amino acid and carbohydrate metabolism (Fig. 3, E and F). Composition of the changed metabolites remained similar across the metabolic domains whereas the number of changed metabolites decreased when only three replicates were used to identify the changed metabolites (Supplemental Fig. S8). These results suggest that mutations in different classes of metabolic genes have different impacts on metabolism and that genes involved in fatty acid and lipid metabolism tend to specifically affect fatty acids and lipids. However, we cannot exclude the possibility that the different number of detected metabolites in the metabolic domains may have led to the observed statistical enrichment patterns.

Evidence of Functional Links between Changed Metabolites and Perturbed Genes
To test whether the metabolite profiles we generated were consistent with previously observed changes in metabolism, we examined the changed metabolites and the perturbed enzymatic functions of two mutants associated to well-characterized genes involved in fatty acid and lipid metabolism and specialized metabolism. One of the well-characterized genes in our study, AT1G32200 (Acyltransferase1 [ATS1]), encodes a chloroplast glycerol-3-P acyltransferase that synthesizes 1-acylglycerol 3-P (lysophosphatidic acid) through the acyl-CoAdependent acylation of sn-glycerol-3-P (Nishida et al., 1993). Lines carrying a transfer DNA insertion in the gene (ats1, SALK_069657C) lack most of the plastidic glycerol-3-P acyltransferase activity and are disrupted in the galactoglycerolipid biosynthesis pathway ( Fig.  4A; Kunst et al., 1988;Xu et al., 2006). In this mutant, the amount of galactolipids that contained the plastid-derived diacylglycerol (DAG) moiety decreased, whereas the amount of galactolipids that contained the endoplasmic reticulum (ER)-derived DAG moiety increased, perhaps to maintain plastid membrane function ( Fig. 4A; Okazaki et al., 2013). Plastid-derived glycerolipids have a 16carbon fatty acid at the sn-2 position, and ER-derived glycerolipids preferentially carry an 18-carbon fatty acid at the sn-2 position (Heinz and Roughan, 1983;Browse et al., 1986;Okazaki et al., 2013). The metabotype of ats1 lines showed a significant decrease in the plastid-derived 16-carbon fatty acids of monogalactosyl DAGs, digalactosyl DAGs, and sulfoquinovosyl DAGs and a significant increase in the 18-carbon fatty acids of monogalactosyl DAG, digalactosyl DAG, and sulfoquinovosyl DAG made in the ER compared with the wild type (Fig. 4B). These results show that the metabotypes of ats1 were consistent to changes that would be caused by perturbation of the enzyme encoded by ATS1.
Another well-characterized gene in our study, AT5G13930 (Transparent Testa4 [TT4]), encodes a naringeninchalcone synthase that produces a precursor of naringenin, 2'4'6'4-tetrahydroxychalcone (Shirley et al., 1995;Dana et al., 2006). Lines carrying a transfer DNA insertion in TT4 (tt4, SALK_020583) lack the naringenin-chalcone synthase activity and are blocked in the flavonoid biosynthesis pathway (Supplemental Fig. S9; Shirley et al., 1995). The metabotypes of tt4 lines showed that one metabolite, naringenin, was significantly changed in abundance in the mutant compared with the wild type (4-fold decrease in the mutant, P = 2.39 3 10 -3 , false discovery rate [FDR]-adjusted Student's t test). This indicates that the lack of naringenin-chalcone synthase activity from TT4 perturbation decreased the concentration of naringenin. Kusano et al. (2011) and Yonekura-Sakakibara et al. (2008) found that the relative concentrations of kaempferol 3-O-rhamnoside 7-O-rhamnoside, kaempferol 3-O-glucoside 7-O-rhamnoside, and quercetin 3-O-glucoside 7-O-rhamnoside from tt4 were significantly changed when compared with the wild type. These results also indicate that metabolic changes are associated with the loss of the enzymatic function of TT4, because all these metabolites are derived from the glycosylation of kaempferol, which in turn is generated from naringenin.
These metabotype patterns demonstrate that metabolic changes of the perturbed genes are not random, but can be closely associated with the enzymatic functions of the perturbed genes in the metabolic pathways. These observations prompted us to examine whether the changed metabolites are closely linked to each other and how the changed metabolites are linked to the perturbed genes in the genome-scale metabolic network.

Patterns of Changed Metabolites in the Reconstructed Metabolic Network
To investigate the effects caused by perturbing the expression of a metabolic gene on the metabolic system as a whole, we assessed metabolic changes in the context of a metabolic network. A metabolic network consists of a collection of metabolic pathways composed of biochemical reactions where metabolites are used as substrates to generate products. Therefore, the metabolic system can be represented by a network where a metabolite and a reaction are connected if the metabolite is a substrate or a product of the reaction in a pathway. We reconstructed a metabolic network of Arabidopsis using the pathway database AraCyc version 8.0 (Mueller et al., 2003;Zhang et al., 2010). The network consists of 2,689 reactions and 2,625 metabolites connected by 7,455 metabolite-reaction links (Supplemental Table S4; Supplemental Fig. S10).
To determine whether the changed metabolites in the mutants are catalyzed by reactions in closely related pathways or just randomly associated, we assessed the distance (number of enzymatic steps) between the changed metabolites in the network. The distances among the changed metabolites were significantly shorter than expected by chance (P = 1.0 3 10 -4 , Student's t test) and the distance among the unchanged metabolites (P = 9.15 3 10 -8 , Student's t test) across the mutants ( Fig. 5A; Supplemental Fig. S11). As an example, the average distance among the changed metabolites of the mutant ats1 (SALK_069657C, AT1G32200) in the network was 7.68 links compared with 9.52 links from random expectation (Fig. 5B). These results suggest that the changed metabolites from gene perturbations are not randomly associated but closely linked to each other in related pathways or in the same pathway.

The Number of Isozymes Affects the Distance between the Perturbed Reactions and Changed Metabolites
Metabolite concentrations depend on the rate of the reactions that consume or produce them. Thus, changing the enzymes' abundance by perturbing the expression of the encoding genes would change the concentration of the metabolites that are directly or closely linked to the reactions catalyzed by these enzymes. This somewhat obvious hypothesis has not been tested systematically. To test this hypothesis, we asked whether the changed metabolites were closely associated to the perturbed reactions in the network by measuring the distance between the changed metabolites and the perturbed reactions catalyzed by wellcharacterized enzymes. Twenty-one out of the 29 well-characterized enzymatic mutants were evaluated because eight mutants did not have changed metabolites in the network. In 13 perturbed reactions of seven mutants, the changed metabolites were significantly closer to the perturbed reactions than the same number of randomly chosen compounds among the detected metabolites (P , 0.05, Student's t test). In the remaining 26 reactions in 14 mutants, the changed metabolites were not closer to the perturbed reactions than expected by chance (Fig. 6A). This unexpected result indicates that perturbing a reaction has a more complex effect on the metabolic network than previously assumed.
To understand the causes that underpin the difference in the distance between the changed metabolites and site of perturbation in the network, we grouped the genes into two categories: (1) genes that cause metabotypes that are close to the mutated reaction than expected by chance (proximal metabotype) and (2) genes that cause metabotypes that are not close to the mutated reaction than expected by chance (nonproximal metabotype). We compared biological and network characteristics of the perturbed genes and reactions between the two groups of genes. At the enzyme-coding gene level, we evaluated: (1) the expression levels in 2-week-old rosette leaves and (2) the existence of paralogs. At the reaction level, we evaluated: (1) the number of changed metabolites upon a reaction perturbation, (2) the average distance among the metabolites affected by the changes in the reaction, (3) the number of metabolites connected to a perturbed reaction, (4) the catalytic speed of a perturbed reaction (a rate-limiting reaction), (5) the uniqueness of a perturbed reaction on consuming or producing a metabolite, (6) the number of predicted isozymes catalyzing a perturbed reaction, and (7) the location of the perturbed reaction in the network.
Among the nine properties tested, only the number of isozymes of a perturbed reaction was significantly smaller in the proximal metabotype than the nonproximal metabotype group (P = 1.15 3 10 -4 , Student's t test; Fig. 6B; Supplemental Fig. S12). Furthermore, the number of predicted isozymes of the perturbed reactions positively correlated with the distance between the changed metabolites and the perturbed reaction (R = 0.41, R 2 = 0.17, P = 0.01, Fisher's z test; Fig. 6C). These results indicate that metabolite abundance is affected not simply by the adjacency to the perturbed reaction, but by the characteristics of the associated enzyme-encoding genes.

DISCUSSION
This study assessed patterns caused by single gene perturbations by combining genetic, metabolomic, and network analysis. By identifying significantly changed metabolites across 136 single gene mutants, we observed that changed metabolites varied quantitatively and qualitatively. In about 80% of the mutants, perturbing the expression of a gene affected the concentration of less than 10 metabolites. Only two mutants affected more than 50 metabolites. This observation supports the concept that the overall metabolic network is highly robust against single gene mutations (Ishii et al., 2007).
To examine the global properties of the changed metabolites across the 136 mutants, we reconstructed a metabolic network for Arabidopsis using the AraCyc database. We removed the 24 currency metabolites (proton, water, oxygen molecule, NADP + , NADPH, ATP, diphosphate, carbon dioxide, phosphate, ADP, coA, UDP, NAD + , NADH, AMP, ammonia, hydrogen peroxide, oxidized electron acceptor, reduced electron acceptor, 3-5-ADP, GDP, carbon monoxide, GTP, and FAD; Supplemental Table S5) from the network because they are associated with many reactions in the cell, thus creating many biologically unrealistic shortcuts in the metabolic network Zeng, 2003a, 2003b;Verkhedkar et al., 2007). For example, water is an essential compound involved in most biochemical reactions. The biological paths among reactions created by water could be neither functionally associated nor biologically meaningful in metabolic pathway analyses because water is not functionally specific but ubiquitous in metabolism. To analyze the impact of removing the 24 currency metabolites in the network, we measured three topological properties of the network before and after removing the currency metabolites. We compared the average distance of all possible shortest paths, the fraction of reactions in connected components, and the number of links before and after removing the currency metabolites. On average, the shortest path between any two nodes before removing currency metabolites (4.29) was much shorter than the average distance after removing currency metabolites (9.00). In addition, more than 40% of the links (5,236/12,691) were created by the 24 currency metabolites. About 90% of the reactions (2,400/2,689) were included in the largest component of the network after deleting the currency metabolites. These analyses indicate that the deletion of the currency metabolites did not disrupt the structure of the overall network but increased the possibility of identifying functionally and biologically meaningful metabolic paths in the network.
We found that changed metabolites were closer to each other than the same number of randomly selected detected metabolites in the network. Because metabolites are connected to reactions in the network when the metabolite is a substrate or a product of the reaction, the proximity of changed metabolites implies that they are likely participating in the same pathway or closely related pathways. As an example, the changed metabolites of ats1 involved in DAG biosynthesis were enriched in lipids and were significantly closer to each other than among all lipids (P = 0.01, Student's t test) and among randomly sampled detected compounds (P = 3.61 3 10 -15 , Student's t test; Figs. 4 and 5B; Supplemental Table S2). However, we also observed dispersed patterns of metabotypes in the network, suggesting that enzymes encoded by these genes could be involved in multiple reactions or regulatory pathways rather than in specific enzymatic functions, which could affect metabolism more globally when perturbed. Another possibility is that these enzymes catalyze the so called hub metabolites, metabolites highly associated with other metabolites in the metabolic network; thus, disturbing these enzymes would likely have global effects. Moing et al. (2011), using a network-based analysis to obtain a global overview of the associations among 715 metabolites in three ripening stages of melon (Cucumis melo) fruit, identified that 13% (96/715) of these metabolites were hub metabolites. Alternatively, missing reactions in the network could increase the path lengths among the changed metabolites. Although the changed metabolites were close to each other in the network, they were not immediately adjacent to each other in many of the mutants. This could reflect the plasticity of a metabolic network against gene mutations by rerouting or rewiring biochemical pipelines (Harrison et al., 2007;Hanada et al., 2011). Fendt et al., 2010 also found that, in yeast, a perturbation in central carbon metabolism enzymes increased the local (but not necessarily immediate) reaction substrate concentrations, probably as a passive mechanism to minimize differences in flux upon metabolic disturbance. Better characterization of the flux control coefficients and other kinetic properties of the enzymes and reactions could help explain the exact nature of this type of pattern of metabotypes, but such efforts are outside the scope of this study.
We also found that changed metabolites were not always close to the perturbed reactions in mutants. In only seven out of the 21 well-characterized mutants, the changed metabolites were close to the perturbed reactions (proximal metabotypes). To analyze which properties of the mutated genes or perturbed reactions are responsible for the distance between the changed metabolites and the site of perturbation in the network, we investigated six biological and three metabolic network properties of the mutated genes or perturbed reactions. Among them, only the number of predicted isozymes was correlated with the proximity of the changed Figure 6. Effect of the number of predicted isozymes on the distance between changed metabolites and perturbed reactions in the metabolic network. A, Pairwise tests on the distance between changed metabolites and perturbed reactions for each well-characterized mutant. Dots represent the average distance from the perturbed reactions to the changed metabolites and the same number of randomly selected metabolites. Red dots are the perturbed reactions with significant closeness to the changed metabolites. B, Distribution of the number of isozymes between the mutants with proximal metabotype (changed metabolites are closer to perturbed reactions than expected by chance) and the mutants with nonproximal metabotype (changed metabolites are not closer to the perturbed reactions than expected by chance). C, Correlation between the number of isozymes of the perturbed reactions and the distance between the changed metabolites and the perturbed reaction. metabolites to the site of perturbation (Fig. 6, B and C; Supplemental Fig. S12). When there were fewer isozymes annotated to a perturbed reaction, the changed metabolites were closer to the perturbed reaction in the metabolic network. Possibly, the absence of isozymes for a reaction prevents the substitution or compensation of the enzymatic function of the perturbed reaction, resulting in the changed metabolites in the vicinity of the corresponding reactions. On the other hand, the presence of several isozymes might allow the divergence of some of them to perform new biochemical functions, resulting in catalyzing a different (distant) reaction in the metabolic network (Schmidt et al., 2003;Fani and Fondi, 2009). Additionally, the isozymes could be expressed in different cell types, and the observed metabotype could represent an average of several cell type-specific metabotypes. In these scenarios, perturbing a gene with many isozymes is likely to cause metabolic changes in functionally different parts of the metabolic network. With our current experimental design focusing on whole leaves, we were not able to examine this possibility.
While growing evidence indicates that metabolomics is valuable for understanding the biological roles of genes (Fiehn et al., 2000;Saito and Matsuda, 2010), there are still challenges to be overcome (Mendes, 2006). The first fundamental issue is the large number of unidentified metabolites ( Fig. 2A). About 70% of the detected compounds in our study are not annotated due either to the complexity of the chemical structure of the compounds or the lack of available standards. Second, metabolic changes compared with the wild type are not always detectable for single gene mutations. For example, we could not detect any changes in metabolites in 23% of the mutants (Supplemental Fig. S6, A and B). One possibility is the variability of the levels of detected metabolites arising from biological and technical noise. Perhaps, transient metabolites would not be detected in the current setting. Alternatively, the robustness of the metabolic system caused by genetic or network-level redundancy could also prevent the detection of metabolic changes in the single-gene mutants (Cornelius et al., 2011). Third, detecting changed metabolites in a single time point and from different organs may cause distinct metabotypes even though the same mutants are studied. By surveying other metabolomics studies (Yonekura-Sakakibara et al., 2008;Kusano et al., 2011) of tt4 and tt5 mutants, the metabotypes from the previous studies were similar, but not identical, to the metabotypes from our study. We identified flavonoid naringenin and naringenin derivatives including genistein, genistin, luteolin-39-7-O-glucoside, naringenin-7-O-glucoside, and naringin (Lapcik et al., 2006), whereas the previous studies identified kaempferol 3-O-rhamnoside 7-O-rhamnoside, kaempferol 3-O-glucoside 7-O-rhamnoside, and quercetin 3-O-glucoside 7-O-rhamnoside as changed metabolites in tt4 and tt5 mutants. Because flavonoids are known to be involved in responding to different ultraviolet light conditions and toxic compounds in soil such as aluminum, it is possible that differences in growth conditions could affect the metabolic changes of flavonoids in different studies (Winkel-Shirley, 2002). Nonetheless, both naringenin and kaempferol are known to be associated with the catalytic activity of the enzymes encoded by TT4 and TT5; thus, the metabolic changes from the previous studies and this study were both associated with the metabolic functions of the mutated genes.
In addition to the limitation of metabolite identification, different metabolic network models represent different sets of reactions. To minimize the impact of this limitation, we reconstructed the metabolic network using the AraCyc database, which is extensively curated with experimentally validated data. However, about 38% of the chemically identified metabolites (509/1,348) were still not present in the database, which may reduce the power of network analysis on the metabolic response caused by gene perturbation (Supplemental Fig. S2). Detecting and identifying all of the metabolites in a metabolic network in untargeted studies using incomplete reference compound libraries can be challenging (Hegeman, 2010). Because we evaluated metabolic changes of gene perturbations based on only a fraction of the metabolites in the network, the conclusions we made may change if there were more metabolites detected and identified. In addition, orphan reactions (reactions that are not connected to any other reactions) in pathway databases may be indicators of missing metabolites and enzymatic activities as well as the incompleteness of a metabolic network (Mueller et al., 2003), which could also limit the ability to link metabotypes to the perturbed genes using metabolic network analysis.
Our study reveals characteristics of metabotypes caused by single gene mutations in a genome-scale metabolic network. By examining biological and topological properties of metabolic changes in the context of a genomescale network, metabolomics can be used as a tool to identify the relationships between changed metabolites and their potential impact on the metabolic system and biology of the organism. This type of analysis could be useful in assessing the effects of GM on crops. For example, we found that perturbations generated by mutating single genes could affect metabolites that are proximal or nonproximal to the perturbation sites. By targeting genes that show only proximal metabotypes, undesired and unknown effects could potentially be minimized and, in parallel, more direct responses could be achieved. In addition, metabolomics combined with metabolic network analysis has the potential to systematically evaluate the substantial equivalence between GM and non-GM organisms. For example, the amount of essential or toxic compounds in GM plants can be quantified by comparing the metabolic profiles with the non-GM plants (Smith et al., 2010;van Rijssen et al., 2013). Developing tools that enable comprehensive evaluation of the transgene's effects upon the GM crops will be important to fully understand the impact of the GM (Ricroch et al., 2011). As experimental methods and data resolution improve, our understanding of the metabolome and the relationship between genotypes and metabolic phenotypes should become clearer (Fernie, 2007). Emerging tools and resources such as genome-scale metabolic networks, quantitative network modeling, and metabolomics may help assess the effects of GM on metabolism and may facilitate rational assessment of unintended effects of GM on metabolism.

Selecting Uncharacterized Genes Using AraNet
To select candidates from the pool of mutants perturbed in the expression of previously uncharacterized genes, we predicted functions of the corresponding gene products based on their functional similarity to known enzymes using AraNet Hwang et al., 2011). We queried AraNet using wellcharacterized genes for each metabolic pathway from AraCyc 8.0 (Mueller et al., 2003) and identified candidate genes encoding missing enzymes in the pathway. The gene function predictability of AraNet across 11 functional categories of metabolism is shown in Supplemental Figure S13. Out of the top 200 candidates for each pathway with area-under-the-curve scores greater than 0.7, we chose genes with the highest log likelihood score that met the following conditions: (1) gene expression in 2-week-old leaves was greater than 100 in AtGenExpress data (Schmid et al., 2005), (2) there were no recently duplicated genes (Blanc and Wolfe, 2004), (3) there were no known functional annotations (Lamesch et al., 2012), and (4) there were no visible phenotype (Supplemental Table S1).

Plant Growth Condition
Wild-type Col-0 and mutant seedlings of the 136 selected genes (129 homozygous lines carrying knockout alleles and seven overexpressing lines using the 35S promoter) were grown under consistent conditions to minimize environmental effects as described in Quanbeck et al. (2012). Seeds were sown on sterile Murashige and Skoog basal salt mixture supplemented with 0.1% (w/v) Suc and 13 liquid vitamin solution in 100-mm 3 100-mm 3 15-mmsquare petri dishes. Seeds were arranged in a single horizontal line, and each dish contained between 18 and 20 seeds. After sowing the seeds, the plates were wrapped with Micropore tape and then stored horizontally for 4 d at 4°C, with illumination of 1 mE m -2 s -1 to break seed dormancy. On the 5th day, plates were moved to the growth room and held in a vertical position in wire rack holders for 16 d under a 24-h regimen of constant fluorescent illumination of 50 mE m -2 s -1 and temperature of 23°C to 25°C. On the twentieth day after sowing the seeds, dishes were opened, and the aerial portions of the plants were harvested immediately upon plate opening and were frozen in liquid nitrogen. Details of the plant growth condition for each batch are available on PlantMetabolomics.org (http://www.plantmetabolomics.org) and also described in Bais et al. (2010Bais et al. ( , 2012 and Quanbeck et al. (2012). All of the plants were grown and harvested at Iowa State University and sent out to collaborating laboratories for metabolite extraction and profiling.

Metabolomics Analytical Platforms and Metabolite Profiling
We used 11 analytic platforms (seven targeted and four nontargeted) to profile the concentrations of 1,348 metabolites. We first distinguished 4,482 peaks for putative compounds from the leaves of 2-week-old seedlings using 11 mass spectrometric platforms, and we then identified 1,348 metabolites based on the mass fragmentation patterns by comparison with those of authentic standards using the National Institute of Standards and Technology library. The targeted platforms detected amino acids using GC with a flame ionization detector, isoprenoids using HPLC with diode array detection and MS, lipids using electrospray ionization-tandem MS, and ceramides, cuticle waxes, fatty acids, and phytosterols using GC-MS. The nontargeted platforms included capillary electrophoresis-MS, LC-MS, GC-time-of-flight-MS, and ultra-performance LC-quadrupole time-of-flight-MS. These nontargeted platforms captured various metabolites including amino acids, fatty acids, alcohols, carbohydrates, nucleosides, chalcones, flavonoids, glucosinolates, and terpenoids. Metabolic profiling data are available at http://plantmetabolomics.vrac. iastate.edu/ver2/datasets/overview.php (Bais et al., 2010), and details of the extraction protocols for each platform are available at http://plantmetabolomics. vrac.iastate.edu/ver2/tutorials/protocols.php (Bais et al., 2010.

Classifying Detected Metabolites
The chemically defined compounds were classified into nine categories (fatty acid and lipid, specialized metabolism, carbohydrate, amino acid, nucleotide, cofactor, plant hormone, amine and polyamine, and others) based on the AraCyc compound ontology (Mueller et al., 2003). We wrote a Java program to map profiled metabolite names to common names and synonyms in AraCyc using text matching. We then manually classified the unmapped metabolites based on the classification from chemical entities of biological interest (de Matos et al., 2010), Kyoto Encyclopedia of Genes and Genomes (Kanehisa et al., 2012), PubChem (Wang et al., 2012), Wikipedia, and Google searches. We exercised caution in adding unmapped metabolites to the database. In cases where there was clear additional evidence for the metabolite's existence in Arabidopsis (Arabidopsis thaliana), for example, based on literature citations stored in the Knapsack database (Afendi et al., 2012), the compound was added to AraCyc. However, in cases where the compound was typically associated with bacteria, such as aurantimycin, we used it for metabotype analysis but did not add it to the AraCyc database. Because the network analysis required compounds to be connected to reactions, we did not introduce unmapped compounds into the network that were not linked to a reaction in AraCyc.

Identifying Significantly Changed Metabolites
To identify metabolites that were significantly changed in the mutants compared with wild-type plants, we processed the metabolomics data through several filtering steps (Supplemental Fig. S3). First, we removed metabolites that were below detection limit. To check the consistency of the metabolic concentrations within biological replicates, we evaluated the correlation of three replicates for the amino acid-and ceramide-targeted platforms and capillary electrophoresis-MS and LC-MS-based untargeted platforms; three to six replicates for fatty acid-, cuticle wax-, phytosterol-, plastidial isoprenoid-, and lipid-targeted platforms and the ultra-performance LC-quadrupole time-of-flight-MS-based untargeted platform; or six replicates for the GC-time-of-flight-MS-based untargeted platform and removed the replicate that weakly correlated with the other replicates (correlation threshold , 0.7). About 0.7% of the profiles were removed due to weak correlation among biological replicates, indicating that more than 99% of the profiles were highly reproducible. Plants were grown in several experimental batches. In each batch for each analytic platform, metabolites were normalized using the median concentration of log-transformed values (log 2 ) across all genotypes within a platform for each metabolite to minimize possible technical differences among the batches (Quackenbush, 2002). We conducted three independent FDR-adjusted Student's t tests (Benjamin and Hochberg, 1993) between mutants and two sets of wild-type (Col-0) plants to identify metabolites that were significantly changed in the mutants (Supplemental Fig. S3). A metabolite was deemed significantly changed in the mutant only if the FDR-adjusted Student's t tests were significant in the relative concentrations between a mutant and two independent sets of wild-type plants (FDR-adjusted Student's t test , 0.10) and an FDR-adjusted Student's t test was not significant between the two sets of wildtype plants (FDR-adjusted Student's t test $ 0.10). Significantly changed metabolites for the mutants are listed in Supplemental Table S2.

Enrichment Test for Changed Metabolites across Functional Categories of Genes
To check whether the changed metabolites were enriched in a functional category, we conducted a hypergeometric test on the functional category of metabolites for each functional category of the genes. First, we counted the number of changed metabolites by metabolite functional category for each mutant and then examined whether the changed metabolites belonging to each category was enriched within a mutant by comparing them with the background frequency (the number of all identified metabolites in each metabolite category). Second, we tested whether the number of mutants in a functional category had an enriched category of metabolites using a hypergeometric test.

Reconstructing a Genome-Scale Metabolic Network
We extracted reactions, enzymes, genes, and compounds from the AraCyc database (version 8.0; Mueller et al., 2003) and converted them into a bipartite metabolic network with two classes of disjoint nodes containing 2,689 reactions and 2,625 metabolites (Supplemental Table S4; Supplemental Fig. S10). Nodes represent either metabolites or reactions, and edges (links) represent the association between metabolites and reactions. Because both a reactioncentric network (a node is a reaction and an edge is a metabolite) and a metabolite-centric network (a node is a metabolite and an edge is a reaction) could not specify the links of metabolites and reactions in the networks, respectively, we reconstructed a genome-scale metabolic network as an undirected bipartite network. We did not consider the directionality of metabolic reactions because only five reactions have directionality supported by experimental evidences in AraCyc. A metabolite was connected to a reaction if the metabolite was a substrate or a product of the reaction. The constructed metabolic network is a bipartite network because nodes were divided into two disjoint sets, U (metabolites) and V (reactions), such that every edge connected a node in U to one in V; that is, U and V were independent sets (Diestel, 2005). We then removed 24 currency metabolites (proton, water, oxygen molecule, NADP + , NADPH, ATP, diphosphate, carbon dioxide, phosphate, ADP, CoA, UDP, NAD + , NADH, AMP, ammonia, hydrogen peroxide, oxidized electron acceptor, reduced electron acceptor, 3-5-ADP, GDP, carbon monoxide, GTP, and FAD) from the reconstructed metabolic network (Supplemental Table S5). Because they take part in many reactions in the cell, thereby creating many biologically unrealistic shortcuts in the metabolic network, any path-based measurement of the network could be biased Zeng, 2003a, 2003b;Verkhedkar et al., 2007). To select currency metabolites, we prioritized metabolites by the connectivity (degree) of metabolites in the reconstructed metabolic network and then manually checked the links created by them by evaluating whether they were functionally meaningful.

Distance Measurement in the Metabolic Network
With the reconstructed metabolic network, the shortest distance among the metabolites and reactions was calculated. The distance of two nodes is 1 when they are directly connected in the network, and the distance of two nodes was measured only if they were connected in the network. We wrote a Java program to calculate a shortest distance based on the breadth-first search algorithm (Russell and Norvig, 2009). A metabolic path was ignored if the path was constructed from reactants to reactants or from products to products in a metabolic reaction.
Using the distance calculation described above, we measured adjacency of changed metabolites by calculating shortest distances among changed metabolites. To calculate the average shortest distance among changed metabolites, we determined pairwise shortest distances among all changed metabolites and averaged them for each mutant. We also measured the distance between a changed metabolite and the perturbed reactions in the network. We determined the shortest distances between changed metabolites and a perturbed reaction and then averaged them for each perturbed reaction of each mutant. Because the reconstructed metabolic network is an undirected network, the directionality of a reaction is not taken into consideration for the distance calculation.

Statistical Test for Network Distance Analyses
To check the statistical significance for the distance analysis, we compared the average shortest distance among the changed metabolites with the average shortest distance among randomly selected metabolites in the reconstructed metabolic network. Because the changed metabolites associated with the mutants are a subset of detected metabolites in the network, we randomly selected the same number of detected metabolites as the number of changed metabolites from each mutant and calculated shortest distances among the randomly sampled metabolites. One hundred independent selections and calculations were made to determine the average shortest distances of the random expectation from each mutant. Only the chemically identified (named) metabolites in the metabolic network were taken into consideration for random expectation of the distance of changed metabolites. Average distance among metabolites was measured over only the connected metabolites. To determine whether the distance among the changed metabolites was significantly different from random expectation, we performed Student's t test between the average shortest paths among the changed metabolites and the average shortest paths among the randomly selected compounds in the 136 mutants (Fig. 5A). In addition, the average distance between changed metabolites was compared with the average distance between unchanged metabolites using Student's t test (Supplemental Fig. S11). To determine whether the distance between the changed metabolites and the perturbed reaction was significantly different from random expectation, we performed one-sample Student's t test between the distance of the average shortest path of the changed metabolites to the perturbed reaction and the average shortest paths of randomly selected metabolites to the perturbed reaction (100 samples) for each well-characterized mutant (Fig. 6A).

Statistical Test of Biological and Topological Properties of Genes
We tested six biological and three topological properties that could distinguish between the two groups of mutants associated with the well-characterized genes with different proximity to the changed metabolites. The number of predicted isozymes was determined by counting the number of enzymes associated with a reaction. The rate limitability of a reaction was curated by PMN (2014). The catalytic irreversibility of a reaction and the catalytic uniqueness of a reaction were obtained from AraCyc (Mueller et al., 2003). Expression levels of the genes in 2-week-old rosette leaves were taken from the AtGenExpress database (Schmid et al., 2005). The number of significantly changed metabolites was determined in this study. Sets of paralogous genes were obtained from Thomas et al. (2006). The degree of a reaction is the number of links from the reaction to directly connected neighboring reactions through shared metabolites in the reconstructed network. The betweenness centrality of a reaction is the number of shortest paths from all nodes that pass through that reaction. These two topological properties were calculated by using the open source Java Universal Network/Graph framework (Madadhain et al., 2005). Average shortest distance was calculated using a Java program as described in the previous section. To evaluate statistical difference between the two groups, we conducted Student's t tests or Fisher's exact tests depending on the types of data (P , 0.05; Fig. 6B; Supplemental Fig. S12).
Sequence data from this article can be found in Supplemental Table S1.

Supplemental Data
The following supplemental materials are available.
Supplemental Figure S1. Criteria used for selecting target genes.
Supplemental Figure S2. Composition of the detected compounds that are named and mapped in AraCyc.
Supplemental Figure S3. Procedure used to identify significantly changed metabolites in the mutants compared with the wild type.
Supplemental Figure S4. Diversity of changed metabolites within the four types of the selected genes.
Supplemental Figure S5. Specificity of changed metabolites within the four types of the selected genes.
Supplemental Figure S6. Composition and number of the changed metabolites across the mutants.
Supplemental Figure S7. Comparison between the proportion of each type of compounds and the number of changed metabolites across the mutants.
Supplemental Figure S8. Proportions of the changed metabolites in four metabolic domains across well-characterized mutants when three biological replicates were used to identify changed metabolites.
Supplemental Figure S9. Illustration of the flavonoid biosynthesis pathway.
Supplemental Figure S10. Visualization of the reconstructed metabolic network based on AraCyc.
Supplemental Figure S11. The distance among changed metabolites across all the mutants and the well-characterized mutants.
Supplemental Figure S12. Comparison of biological and topological properties of metabolic phenotypes between proximal and nonproximal metabotype groups of mutants of known genes.
Supplemental Figure S13. Gene function predictability of metabolic pathways in AraNet.