Assessing the metabolic impact of nitrogen availability using a compartmentalized maize leaf genome-scale model.

Maize (Zea mays) is an important C4 plant due to its widespread use as a cereal and energy crop. A second-generation genome-scale metabolic model for the maize leaf was created to capture C4 carbon fixation and investigate nitrogen (N) assimilation by modeling the interactions between the bundle sheath and mesophyll cells. The model contains gene-protein-reaction relationships, elemental and charge-balanced reactions, and incorporates experimental evidence pertaining to the biomass composition, compartmentalization, and flux constraints. Condition-specific biomass descriptions were introduced that account for amino acids, fatty acids, soluble sugars, proteins, chlorophyll, lignocellulose, and nucleic acids as experimentally measured biomass constituents. Compartmentalization of the model is based on proteomic/transcriptomic data and literature evidence. With the incorporation of information from the MetaCrop and MaizeCyc databases, this updated model spans 5,824 genes, 8,525 reactions, and 9,153 metabolites, an increase of approximately 4 times the size of the earlier iRS1563 model. Transcriptomic and proteomic data have also been used to introduce regulatory constraints in the model to simulate an N-limited condition and mutants deficient in glutamine synthetase, gln1-3 and gln1-4. Model-predicted results achieved 90% accuracy when comparing the wild type grown under an N-complete condition with the wild type grown under an N-deficient condition.


One-Sentence Summary
A cell-type and leaf tissue-specific model provides new insights into nitrogen metabolism in the maize leaf. (N), gln1-3 and gln1-4 mutants (Martin et al., 2006), has provided a more accurate assessment of N metabolism within the maize leaf. Moreover, since integration of transcriptome, proteome, and metabolome data appeared not to be straightforward (Amiour et al., 2012;, the development of a model could help to identify putative candidate genes, proteins, and metabolic pathways contributing to plant growth and development. Maize is a C 4 plant that overcomes the inefficiencies of ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), to capture O 2 over the preferred CO 2 , by separating the photosynthetic carbon (C) fixation process into two cell types: the bundle sheath and mesophyll cells. In comparison to C 3 plants, this separation allows C 4 plants to have a lower rate of photorespiration, higher rate of photosynthesis at high light intensities (under standard air and temperature conditions), and a higher photosynthetic N use efficiency (NUE) (Christin and Osborne, 2013;Driever and Kromdijk, 2013;Peterhansel et al., 2013;Sage, 2014;Wang et al., 2014). A C 4 -specific maize GSM could provide insight into N metabolism and provide cues for improving NUE (i.e. the vegetative biomass or grain yield produced per unit of N present in the soil). Since N is the major limiting factor in agricultural production among mineral fertilizers (Vitousek et al., 1997;Hirel et al., 2007;Andrews and Lea, 2013;Andrews et al., 2013) and NUE is estimated to be far below 50% in cereal grains (Raun and Johnson, 1999), improving NUE is essential for improving overall productivity in maize (Hirel and Gallais, 2011). Arabidopsis thaliana (The Arabidopsis Initiative, 2000), Arabidopsis lyrata (Hu et al., 2011), Glycine max (Schmutz et al., 2010), Oryza sativa (Goff et al., 2002;Yu et al., 2002), Populus trichocarpa (Tuskan et al., 2006), Sorghum bicolor (Paterson et al., 2009), Theobroma cacao (Tuskan et al., 2006), and Zea mays (Schnable et al., 2009).
Gene annotations of the whole-genome sequences have been used to determine the reactions within an organism and therefore build a GSM. FBA calculates all reaction fluxes in a metabolic network based on the optimization of an objective function (typically the maximization of the biomass yield). A quasi-steady state is assumed and flux constraints are set based on the specific media or the reversibility of reactions derived from thermodynamics. Incorporation of "omics" data into GSMs is achieved through appropriate constraints on fluxes that restrict metabolic flows to only conditionrelevant phenotypes.
During the last few years, multiple methods have been developed to integrate "omics" data into GSMs. Proteomic and transcriptomic data have been used to apply flux constraints on corresponding reactions determined by gene-protein-reaction (GPR) associations. The GIMME (Becker and Palsson, 2008), iMAT (Shlomi et al., 2008), and MADE (Jensen and Papin, 2011) algorithms use a "switch" approach to turn on/off reactions based on expression levels. The GIMME algorithm turns off reactions based on a user-specified threshold of the expression level. The iMAT algorithm turns on a minimal set of reactions associated with low expression data in order to achieve a userspecified metabolic function. The MADE algorithm incorporates related experimental datasets into the model to activate or repress reactions based on the progression of the experimental conditions. A different class of algorithms, known as the "valve" approach, was developed to incorporate proteomic and transcriptomic data by constraining the allowable flux ranges of reactions. The E-Flux method incorporates a user-specified function to convert gene expression data to flux constraints (Colijn et al., 2009 Li et al., 2010;Chang et al., 2012) as opposed to the Arabidopsis-based compartmentalization adopted in the previous iRS1563 maize model (Saha et al., 2011).
Light reactions have been expanded from an aggregate reaction (as described in the iRS1563 model) to multiple reactions for each complex with the inclusion of a thylakoid membrane compartment. In contrast to the C4GEM maize model , which focuses exclusively on primary metabolism in maize, the developed model also spans secondary metabolism by including all reactions known to occur within the maize leaf tissue. The model includes as many as 763 secondary metabolism reactions (without including duplicate counting due to compartmentalization). Through the incorporation of "omics" data, regulatory restrictions are introduced in the model to switch-off/on reactions under the N + WT, N -WT condition and two GS knockout mutants (gln1-3 and gln1-4) in the vegetative (V) stage during which the plant absorbs and assimilates N for root and leaf biomass production (Amiour et al., 2012;.
Reactions linked to genes or proteins with significantly different expression levels between N + WT and N -WT condition, as well as the gln1-3 and gln1-4 mutants versus the N + WT are conditionally turned on or off accordingly. The metabolite pool is simulated by maximizing the total flux through a metabolite (i.e. flux-sum) as a proxy for the metabolite turnover rate (Chung and Lee, 2009

Effect of Nitrogen Conditions on Biomass Components
Biomass components were measured in the N + WT condition, as well as for each N background (N -WT, gln1.3 and gln1.4 mutants). Table I and Figure 1 display the composition of the classes of biomass metabolites and Supplemental Table S1 indicates the specific biomass measurements in all modeled conditions. As expected, in the majority of cases, the N -WT condition produced a smaller concentration of biomass components than the N + WT, the gln1-3, and the gln1-4 mutant conditions. However, the concentration of amino acids produced was about five times higher in the gln1-4 mutant than the gln1-3 mutant, resulting in comparable amino acid concentrations between the gln1-4 mutant and the N + WT, as well as between the gln1-3 mutant and N -WT 1 0 conditions. The similar amino acid concentrations between the gln1-4 mutant and N + WT condition in the vegetative stage help to confirm that the GS1-4 isozyme is essential in plant maturity and has a smaller effect compared to the GS1-3 isozyme, at the vegetative stage. As expected, the concentration of starch was higher in the N -WT condition than the N + WT condition. Under Nconditions, the breakdown of starch is limited by the amount of N available (Tercé-Laforgue et al., 2004;Amiour et al., 2012). Due to the limited N available, the starch is stored rather than broken down to produce other biomass components. The stained micrograph depicting the starch visible in the N + WT, gln1-3 mutant, and gln1-4 mutant conditions are available in Supplemental Figure S1.
The condition-specific biomass concentrations have been incorporated in the maize leaf model to more accurately represent metabolism under each condition.

Development of the second-generation maize leaf model
The second-generation maize leaf model was developed using a combination of gene, protein, and reaction information from the previously developed maize model iRS1563

1
Finally, 303 specific reactions were added to model glycerolipid synthesis as shown in Supplemental Figure S2 and Table S2 (Moore, 1982;Murata, 1983;Murata and Tasaka, 1997;Mekhedov et al., 2000;Bachlava et al., 2009;Li-Beisson et al., 2010;Rolland et al., 2012). To the best of our knowledge, this is the first plant model to include detailed glycerolipid synthesis. Aggregate reactions were included to link specific two-tailed glycerolipids to the experimentally measured single lipids (see Supplemental Table S2).
Compiling transcriptomic and proteomic compartmentalization data with literature-based pathways yielded a model of 3,587 reactions, leaving 2,880 unique reactions, still with their localization unknown.
Once reactions were compartmentalized based on transcriptomic data, proteomic data, and published literature, the reactions were divided into two groups. The first group (core set) includes reactions with known localization, while the second group (non-core set) spans reactions known to occur within the maize leaf but with no localization evidence.
Whenever possible, core reactions were unblocked by first adding reaction(s) from the non-core set to one or multiple compartment(s) and second by appending inter-or intracellular transporter(s) (see Materials and Methods section). By following this approach, 1,032 unique reactions with previously unknown localization were assigned to compartments and 729 transporters were added. The remaining 1,848 unique reactions were assigned to compartments based on available pathway information or assigned to the cytosol of both the bundle sheath and mesophyll cells.
With all the reactions assigned to specific compartments, thermodynamically infeasible cycles that were generated due to the overly permissive inclusion of reactions in the model, as well as lack of reaction directionality information, were subsequently identified and eliminated. By first restricting the directionality of reactions and second removing reactions, it was possible to eliminate all thermodynamically infeasible cycles in the model. By this process we restricted the directionality of 36 reactions and removed 2,055 reactions from the model (Table II)

Incorporation of transcriptomic and proteomic data in the model
In order to more accurately model the N + WT, N -WT, and GS mutant conditions in maize, GPR associations mapped the gene transcripts and proteins that were statistically expressed at a low level to reactions that were turned-off in the model. However, no essential reactions to the model, which are required for biomass formation, were altered.
For example, the δ -aminolevulinic acid dehydratase reaction was experimentally determined to be higher in the N + WT condition, suggesting it should be restricted in the N -WT condition. However, when the flux through the δ -aminolevulinic acid dehydratase reaction is restricted to zero, biomass cannot be formed, as this reaction produces  Table S3). N perturbations within the leaf tissue were modeled by combining the incorporation of transcriptomic and proteomic data with the unique biomass compositions for each condition.
The minimal set of reactions, whose elimination causes a decrease in biomass yield, was determined for the N + WT, N -WT, gln1-3 mutant, and gln1-4 mutant conditions. There 1 3 are six reactions across the conditions that encompass the minimal set of reactions, as summarized in Table III. Of the 83 reactions with restricted flux in the N + WT condition, only two reactions were identified to affect biomass yield. These two reactions are the conversion of ethanol to acetaldehyde through either ethanol oxidoreductase involving NAD + or a hydrogen peroxide-dependent oxidation of ethanol catalyzed by catalase (Boamfa et al., 2005). These two reactions have a very slight effect on biomass formation as biomass yield drops by less than 1%. As expected, we find that many of the reactions that correspond to genes that are significantly down-regulated in the N + WT condition do not hinder biomass formation. In the N -WT condition, none of the reactions have an effect on the biomass yield suggesting, as expected, that the decreased amount of N is the main limiting factor in biomass yield. In the gln1-3 mutant condition, three of the 100 reactions, which are switched off based on "omics" data, affect the biomass yield. These switch from sucrose to starch synthesis, and inhibit photosynthesis at high CO 2 levels in Arabidopsis resulting in the inhibition of plant growth (Strand et al., 2000). Finally, the regulatory restrictions for the gln1-4 mutant involve only nine reactions of which one affected the biomass drain (i.e., ribose-5-phosphate isomerase reaction). The lack of ribose-5-phosphate isomerase reaction has been experimentally shown to cause premature death and affect cellulose synthesis in Arabidopsis (Howles et al., 2006;Xiong et al., 2009). A comparison of the number of reactions that affect the GS mutants suggests that at the V stage, the impact of the gln1-4 mutation is less severe than that occurring in the gln1-3 mutant. Such a finding is not surprising, since it has been shown that the gene encoding the GS1-3 isozyme is constitutively expressed irrespective of the leaf development stage and that the expression of the gene encoding the GS1-4 isozyme is much lower and only enhanced at later stages of leaf development (Hirel et al., 2005).
Although only a subset of reactions affect the biomass production in the N + WT, gln1-3 mutant, and gln1-4 mutant conditions, the additional regulation will have an effect on the flux predictions within the model.

Flux range variations among conditions
The flux range of each reaction was determined in the N + WT, N -WT, gln1-3 mutant, and gln1-4 mutant conditions under the assumption that the biomass is being maximized.
The flux range of a reaction in the N -WT, gln1-3 mutant, and gln1-4 mutant conditions was compared to the flux range in the N + WT reference condition to determine reactions whose flux ranges must deviate from the N + WT flux range. This indicates that the flux   1 6

Comparison of model predictions to metabolomic data
The metabolomic data were compared to flux predictions within the model in each of the various N background conditions. The increasing or decreasing trend of the metabolite concentration, displayed in Figure 3, was qualitatively compared to the change in the flux-sum range determined by the model, as displayed in Figure 4. The flux-sum is a measure of the amount of flow through the reactions associated with either the production or consumption of the metabolite. A variability analysis of flux-sum was performed and flux-sum ranges, normalized by the biomass rate, that do not overlap between the N background condition and the N + WT condition were analyzed. An increase/decrease in the flux-sum (i.e., used as a proxy for the metabolite pool) of a metabolite between the N -WT condition and the N + WT condition and between the two GS mutants and the N + WT condition was compared with the metabolite concentration changes. Figure 4 demonstrates the importance of restricting fluxes based on transcriptomic and proteomic data. In the N -WT condition, the accuracy changes from 13% to 90% when the flux constraints based on "omics" data are incorporated. Without the incorporation of these constraints, all flux-sum ranges normalized by the biomass rate are predicted higher in the N -WT condition. The identified flux-sum levels are included in Supplemental Table   S6. The flux-sum variability approach is able to predict the change in metabolite pool sizes more accurately when the flux ranges are similar to the wild-type condition, as in the N -WT condition. Between the N -WT and N + WT condition, only approximately 7% of the reactions active in either condition have flux ranges at the maximum biomass that do not overlap between the two conditions. In the gln1-3 and gln1-4 mutant conditions, the fluxes are significantly perturbed with 49% and 45% of the active reactions at maximum biomass resulting in non-overlapping ranges compared to the N + WT condition, respectively. The accuracy of flux-sum in the gln1-3 mutant and gln1-4 mutant condition with "omics"-based constraints incorporated reach 53% and 25% accuracy with 8 of 15 metabolites predicted correctly and 1 of 4 metabolites predicted correctly in the gln1-3 and gln1-4 mutant conditions, respectively. This level of prediction accuracy is far below what was seen for the N -WT alluding to a tenuous connection between concentration changes and gene expression levels when the genetic background changes. We explored the efficacy of the flux-sum method under different genetic backgrounds for a much more well studied and data-rich organism (i.e., E. coli) to explore whether the dissonance between gene expression levels and concentrations was maize-specific or applied broadly. We applied flux-sum variability to the Ishii et al. (2007)  We also decided to explore whether the dissonance between gene expression levels and concentration ranges was caused due to a deficiency in the proposed flux-sum method.
As an alternative, we used flux imbalance analysis ( prediction of compartment-specific metabolites whose associated reactions do not carry flux under maximum biomass formation. Comparable results between the flux imbalance analysis and flux-sum analysis in the N -WT condition provide independent backing regarding the validity of the flux-sum concept.

Concluding remarks
We have introduced a second-generation model that is specific for the leaf tissue of maize and differentiates between the bundle sheath and mesophyll cell types. By incorporating transcriptomic and proteomic data into the model, we were able to reproduce the metabolomic data with up to 90% accuracy when comparing the N -WT and N + WT conditions. Ethanol oxidoreductase/catalase, glyceraldehyde-3-phosphate dehydrogenase, fructose-bisphosphate aldolase, fructose bisphosphatase, and ribose-5-phosphate isomerase were shown to be important genes in suppressing biomass formation in the modeled conditions. In order to study the impact of these genes on plant biomass production when optimal N is provided, their functional validation can then be undertaken using transgenic technologies, mutagenesis, or by association genetics, either at the single gene or genome-wide level (Simons et al., 2014). The model also predicted a modification of the flux of metabolites formed during glutathione catabolism in the gln1-4 mutant conditions compared to the N + WT condition. This modification is predicted to compensate for the lack of GS1-4 by using the glutamate and pyruvate derived from glutathione to produce alanine. Thus, it will be interesting to determine whether the increase in alanine is related to the importance of the enzyme alanine aminotransferase in the improvement of plant productivity in general and NUE in particular (Good and Beatty, 2011;McAllister et al., 2013). In all N background conditions (i.e. N -WT, gln1-3 mutant, and gln1-4 mutant conditions), we find that the flux through chlorophyll biosynthesis, and those pathways directly related to chlorophyll biosynthesis, decrease confirming the important link between N metabolism and chlorophyll synthesis through the use of its precursor glutamate (Forde and Lea, 2007). The leaf model, with the addition of other maize tissue-specific models, can be integrated into a whole-plant genome-scale model for maize. By determining a required metabolic function that is specific to each tissue, tissue-specific models can be created ensuring that only relevant 1 9 reactions are included in each tissue. Future efforts will focus on tissue-specific models for the kernel, stalk, tassel, and root tissues. These tissue-specific models will follow community (Zomorrodi and Maranas, 2012;Zomorrodi et al., 2014) and multi-tissue human models (Duarte et al., 2007;Bordbar et al., 2011;Thiele et al., 2013) reconstruction principles. The tissues can be linked using inter-tissue transport reactions with the stalk tissue acting as the central transporter among the various tissues and particularly to developing ear (Cañas et al., 2012). A whole-plant genome-scale model of maize will help to elucidate the flow of N from the root to the other tissues in the plant, from the shoot to the ear, and within the developing ear (Cañas et al., 2010). By modeling the entire plant, non-intuitive bottlenecks in N metabolism can be determined, which then can be used to suggest genetic interventions through mutagenesis, transgenic technology, or maker-assisted section to increase the NUE in maize. In addition, the flow of sugars to the kernel tissue can be analyzed to guide the increase of carbohydrate/sugar content of maize kernel by breaking the inverse relationship existing between carbohydrates and proteins (Feil et al., 1990). Apart from its crucial role as a food crop, maize is also used for cellulosic biofuels. To this end, the amount and composition of cell wall polymers is important in developing cellulosic maize. Lignin not only provides rigidity to the maize plant (Sticklen, 2008;Vanholme et al., 2008), but also makes digestion of cellulosic and hemicellulosic sugars difficult during delignification (Li et al., 2008). Recent research endeavors have focused on altering lignin content, since plant viability and fitness are affected by lignin reductions (Li et al., 2008;Bonawitz et al., 2014). Therefore, by utilizing the whole-plant genome-scale model a system-wide implication of these genetic disruptions can be quantitatively assessed, thus facilitating new strategies for reducing lignin content without affecting the mechanical integrity of the maize plant.

Yield Components Analysis
Kernel yield, its components, and the N content of different parts of the plant at stages of development from silking to maturity were determined according to the method described

RNA and DNA Preparation
Total RNA was extracted as described by Verwoerd et al. (1989) Table S1. Statistical tests were computed and combined for each probe set using the logtransformed data. A significant probe set indicates an adjusted p-value was less than the effective α -level (α=0.05) in at least one of the two gene selection tests. A filtering procedure was used to exclude data points with low signal intensities (Amean < 7.0) that are considered biologically unreliable.

Total Protein Extraction, Solubilization, and Quantification
A TCA/acetone protein precipitation was performed as described by Méchin et al.

Two-dimensional Electrophoresis, Gel Staining, and Image Analysis
Total protein extraction, solublization, and quantification were performed as described by Méchin et al. (2007). Solubilized proteins (300µg) were subjected to twodimensional gel (2-D gel) electrophoresis and identified by liquid chromatography-mass spectrometry as described by Amiour et al. (2012)

Protein Identification by LC-MS/MS
Spot digestion and LC-MS/MS were performed as described by Martin et al. (2006). Ingel digestion was performed with the Progest system (Genomic Solution). Gel pieces were washed twice by successive separate baths of 10% acetic acid, 40% ethanol, and acetonitrile (ACN). The pieces were then washed twice with successive baths of 25 mM NH 4 CO 3 and ACN. Digestion was subsequently performed for 6 h at 37°C with 125 ng of modified trypsin (Promega) dissolved in 20% methanol and 20 mM NH 4 CO 3 . The peptides were extracted successively with 2% trifluoroacetic acid (TFA) and 50% ACN and then with ACN. Peptide extracts were dried in a vacuum centrifuge and suspended in 20 mL of 0.05% TFA, 0.05% formic acid, and 2% ACN. HPLC was performed on an Ultimate LC system combined with a Famos Autosampler, and a Switchos II microcolumn switch system (Dionex). Trypsin digestion was declared with one possible cleavage. Cys carboxyamidomethylation and Met oxidation were set to static and variable modifications, respectively. A multiple-threshold filter was applied at the peptide level: Xcorr magnitude were up to 1.7, 2.2, 3.3, and 4.3 for peptides with one, two, three, and

Metabolite Extraction and Analyses
Lyophilized leaf material was used for metabolite extraction. Approximately 20 mg of the powder was extracted in 1 ml of 80% ethanol/20% distilled water for an hour at 4°C.
During extraction, the samples were continuously agitated and then centrifuged for 5 min at 15,000 rpm. The supernatant was removed and the pellet was subjected to a further extraction in 60% ethanol and finally in water at 4°C, as described above. All supernatants were combined to form the aqueous alcoholic extract.
Nitrate was determined by the method of Cataldo et al. (1975). Total soluble amino acids were determined by the Rosen colorimetric method with leucine as a standard (Rosen, 1957). Chlorophyll was estimated using 10 mg of fresh leaf material (Arnon, 1949). The total N content of 2 mg of lyophilized material was determined in a N elemental analyzer using the combustion method of Dumas (Flash 2000, Thermo Scientific, Cergy-Pontoise, France). Starch content was determined as described by Ferrario-Méry et al. (1998).
Total lipids were extracted from frozen leaf material according to Miquel and Browse (1992). Individual lipids were purified from the extracts by one-dimensional thin layer chromatography on silica gel 60 plates (Ohnishi and Yamada, 1980;Lepage, 1967), The neutral monosaccharide composition was measured on 5 mg of dried alcohol insoluble material after hydrolysis in 2.5 M trifluoroacetic acid for 1.5 h at 100 °C as described by Harholt et al. (2006). To determine the cellulose content, the residual pellet obtained after the monosaccharide analysis was rinsed twice with ten volumes of water and hydrolysed with H 2 SO 4 as described by Updegraff (1969). The released glucose was diluted 500 times and then quantified using an HPAEC-PAD chromatography as was measured. The lignin content was calculated using the following formula:

Metabolome Analysis
All steps were adapted from the original protocol described by Fiehn (2006) following the procedure described by Amiour et al. (2012).

Model Development and Curation
Here, m i and b i are the RPKM abundance of gene i in the mesophyll and bundle sheath cells, respectively (Chang et al., 2012). A gene that is only expressed in one cell type will have a R i of 1, while a gene that is equally expressed in both cell types will have a R i of 0.
As suggested by Chang et al., a threshold of 0.8 or a five-fold abundance difference is adopted to assign gene cell type specificity. In the absence of RPKM information, an adjusted spectral count (adjSPC) along with the fold change difference between the mesophyll and bundle sheath cells was used to determine gene cell type specificity (Friso et al., 2010). adjSPC is the number of mass spectra identified for a protein normalized by the number of unique spectral counts. Since low counts are not statistically informative, a cutoff of 10 was used for adjSPC (Zybailov et al., 2008;Kim et al., 2009). Similar to the threshold used for RPKM data, a five-fold difference between the mesophyll and bundle sheath cell type normalized spectral abundance factor (nSAF) was used to determine cellular specificity of any gene (Friso et al., 2010). nSAF is a weighted adjSPC based on the number of theoretical tryptic peptides with a relevant length (Ehleringer et al., 1997;Friso et al., 2010). Additional intracellular compartmentalization was carried out based  (Sun et al., 2009). An original set of intercellular and intracellular transporters was determined based on literature evidence (Alberte and Thornber, 1977;Leegood, 1985;Stitt and Heldt, 1985;Furbank et al., 1989;Weiner and Heldt, 1992;Doulis et al., 1997;Burgener et al., 1998;Taniguchi et al., 2004;Sowiński et al., 2008;Friso et al., 2010). In the subsequent standardization step, the MetRxn knowledgebase (Kumar et al., 2012) as well as manual curation was used to standardize the description of metabolites and reactions such as fixing stoichiometric errors (i.e., elemental or charge imbalances) and incomplete atomistic detail (e.g. absence of stereospecificity, and presence of R-group(s)). Reactions and metabolites were given KEGG identifiers where available or were otherwise given new identifiers (in the form of MR or MC, respectively). Reaction directionality was adopted from the manually curated MetaCrop database, as available, and from the MaizeCyc database for the remaining reactions.
In the next step of model development, all reactions (including metabolic, intra-and extra-cellular transport reactions) were divided into two categories based on the evidence www.plantphysiol.org on August 29, 2017 -Published by Downloaded from Copyright © 2014 American Society of Plant Biologists. All rights reserved. of their inter-and intra-cellular compartmental specificity. The core set contains all metabolic reactions with experimental or literature-backed evidence of intracellular or intercellular compartmentalization, as well as known intracellular and intercellular transporters. The non-core set contains reactions with partial or completely absent localization information. Barring any conflicting evidence, these reactions were provisionally placed in all compartments. An optimization formulation (as shown below) was developed by imposing flow though the maximal number of core reactions while including minimal intra-and inter-cellular transporters and minimal participation of noncore reactions in various compartments. A parsimony criterion was used to apportion non-core functions, so that core functions could be restored. Furthermore, in order to restore a core function, the resolution strategy was prioritized in the following order: (i) apportion non-core reaction(s) in one/multiple compartment(s), (ii) add intra-cellular transporter(s) and (iii) add inter-cellular transporter(s). To this end, an objective function was formulated by taking the weighted sum of number of non-core reactions, intra-and inter-cellular transporters by providing weights of 1, 10 4 and 10 6 , respectively for these three groups of reactions. However, it is important to ensure that any resolution strategy does not cause thermodynamically infeasible cycles. Therefore, each of these solutions was further checked and those reactions that result in the formation of a cycle were rejected. For each core reaction, multiple solutions were determined and the solution that fixes the largest number of core reactions was accepted. When required, manual curation was used to delineate between multiple solutions. This approach is analogous to the one proposed by Mintz-Oron et al. (2012) but does not rely on a complicated scoring system.
It is also computationally less taxing, as it activates one core reaction at a time. encompasses an overall set of reactions M={1,…,m} and a set of metabolites N={1,…,n}.
In addition, binary variable y j is defined as: The task of identifying the minimal set of additional reactions that enable flux through a core reaction j* is posed as the following mixed integer linear programming problem.
Subject to: Here, S ij is the stoichiometric coefficient of metabolite i in reaction j and v j is the flux value of reaction j. Parameters v j,min and v j,max denote the minimum and maximum allowable fluxes for reaction j, respectively. v j* represents the core reaction flux that is currently being unblocked and ε is a small value to ensure a threshold amount of flux through each core reaction. c 1 , c 2 , and c 3 represent weights associated with each set of reactions (i.e., non-core set, intracellular transporters set, and intercellular transporters set, respectively). In this formulation, the objective function (1) minimizes the number of added reactions (from three reaction sets as mentioned earlier) so as to restore flux flow through reaction j*. We chose values of 1, 10 4 , and 10 6 for c 1 , c 2 , and c 3 , respectively, so metabolic reactions without experimental or literature evidence for compartmental specificity are added to specific compartment(s) before including additional transport reactions with no literature evidence. Constraint set (2) represents the pseudo-steady state assumption, while constraint (3) determines the threshold amount of flux necessary through j*. Bounds on core reaction fluxes are imposed by constraint set (4), while constraint set (5) ensures that only reactions from those three sets having non-zero flow are added to the model. This algorithm is repeated for each core reaction j* to ensure flux and, hence, provides compartmentalization assignments for 431 metabolic reactions by assigning them to at least one compartment, adding 1,032 total metabolic reactions to the model as shown in Table III. The reactions identified by the above-mentioned algorithm plus the reactions from the core set constituted two new sets, a set of reactions with resolved compartmental information and a set whose location still needs resolution as shown in Figure 5.
Reactions from the latter set that are known to occur within the maize leaf tissue, but were not in the initial model were added to intra/inter-cellular compartments manually In the final step, as shown in Figure 5, the GapFind/GapFill (Kumar et al., 2007) procedure was applied to identify blocked/dead-end metabolites and subsequently restore  Table IV. Reactions with GPRs associated with significantly lowered transcriptomic and proteomic expression are switched off under the corresponding conditions. Metabolite turnover rates were determined based on the flux-sum analysis method (Chung and Lee, 2009) and compared to the metabolomic data. The range of the flux-sum or the flow through each metabolite with experimental measurements was maximized/minimized as follows: Subject to: Here set E represents the set of metabolites with experimental measurements and set LE represents reactions with statistically lower expression of gene transcripts and/or proteins.
The formulation was run in an iterative manner for each metabolite with experimental measurements. The formulation was also repeated for each individual condition ensuring the proper nutrients and simulated knockouts were considered. By linearizing the objective function the resulting formulation is a mixed integer linear programming problem similar to the description by Chung and Lee (2009). Therefore, the basic idea is to determine the range of the flux-sum of a metabolite (for which metabolomic data is 1 available) under a given condition by switching off reaction fluxes corresponding to gene transcripts and/or proteins with lower expression levels (i.e. constraint 7). The flux-sum ranges were determined at the maximum biomass for the condition as displayed in constraint 8. Predictions were only made when the flux-sum ranges did not overlap between the N background condition and the N + WT condition and when the direction of change in all compartments was consistent. In this way, the compartment-specific predictions of the flux-sum ranges were compared to tissue-specific experimental measurements. The flux-sum levels in the N -WT, gln1-3 mutant, and gln1-4 mutant conditions were compared to the reference N + WT condition to find the qualitative trend in the change of metabolite pool size between the conditions. Flux variability analysis (FVA) was used to determine the flux range of each reaction under maximum biomass by subsequently maximizing and minimizing the flux through each reaction. The flux range of each reaction for the N -WT, gln1-3 mutant, and gln1-4 mutant conditions was compared to the reference N + WT condition. Flux ranges that did not overlap between one of the N background conditions and the reference condition were further analyzed. These are reactions that must change in response to the limited amount of N or the mutant conditions. Finally, for each condition, the minimum number of reactions which, when not regulated, will restore the biomass to the yield obtained when no "omics" based regulation is applied were determined. This was done by identifying the minimal set of reactions, which are included in the "omics" based regulation, that when active would allow for a biomass yield equivalent to the yield under no "omics" based regulation. This set of reactions represent the reactions whose restriction affects the biomass yield.

TABLES Table I. Experimental content of classes of metabolites in different conditions
The biomass components were determined experimentally for each of the conditions (N + WT, N -WT, gln1-3 mutant, and gln1-4 mutant). Values are the mean of three replicates unless indicated by *, representing that two replicate measurements were taken. Biomass measurements for the specific metabolites within each class are displayed in Supplemental  The minimum set of reactions that are down-regulated as a result of the inclusion of proteomic and transcriptomic data and affect biomass synthesis are displayed. The corresponding condition is displayed for each reaction as well as the role of the reaction.

Reaction
Condition (

Number of gene transcripts, proteins, and metabolites that significantly vary
The WT condition for each study was combined to create one uniform WT condition.
The number of gene transcripts, proteins, and metabolites that statistically vary are displayed below.