Plant Physiol.

Plant Physiol, December 2001, Vol. 127, pp. 1590-1594

SCIENTIFIC CORRESPONDENCE

Using Combinatorial Design to Study Regulation by Multiple Input Signals. A Tool for Parsimony in the Post-Genomics Era¹


Dennis E. Shasha, Andrei Y. Kouranov, Laurence V. Lejay, Michael F. Chou, and Gloria M. Coruzzi*

Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, New York 10012 (D.E.S.); Department of Biology, New York University, 100 Washington Square East, New York, New York 10003 (A.Y.K., L.V.L., M.F.C., G.M.C.)


    INTRODUCTION

For many systems in biology, genes, pathways, or metabolites are regulated or synthesized in response to multiple "input" signals. Herein, we describe how combinatorial design can be used to define a small set of experiments that will effectively explore the effects of all possible combinations of multiple inputs on such regulation.

Historically, understanding how a single signal (e.g. hormone) might potentially regulate a specific gene or the synthesis of a metabolite has been a daunting task that has required molecular biology, genetics, and cell biology. The new challenge in the post-genomics era is to understand how whole genomes or metabolomes respond not only to single signals, but also to collections of potentially interacting inputs.

Hormone interaction is a classic example of multiple input regulation. In plant biology, this includes the antagonistic effects of auxin and cytokinin, or abscisic acid and gibberellin, in regulating plant growth and development (Milborrow, 1970; Bertell and Eliasson, 1992; Beemster and Baskin, 2000; White and Rivin, 2000; Gomez-Cadenas et al., 2001). Owing in large part to molecular genetic approaches in Arabidopsis, the molecular basis for these types of regulation has recently begun to emerge. For example, cytokinin-signaling pathways converge on auxin-signaling pathways (Coenen and Lomax, 1997; Ross and O'Neill, 2001), and sugar-signaling pathways have been shown to converge with hormone-signaling pathways (e.g. ethylene and abscisic acid; Zhou et al., 1998; Arenas-Huertero et al., 2000; Huijser et al., 2000; Gibson et al., 2001; Rook et al., 2001). These pathways also converge with other signaling pathways: Sugar status and hormones have recently been shown to affect, respectively, developmental responses (Xiao et al., 2000; Hanson et al., 2001) and gene responses to nitrogen (Taniguchi et al., 1998; Zhang et al., 1999).

The intersections of "input" signaling pathways have been termed "cross talk" (Knight and Knight, 2001), and the interdependence of such "input" signals has been referred to as "combinatorial control" (Singh, 1998) and more recently as "matrix effect" (Coruzzi and Zhou, 2001). A complex picture is emerging in which signaling systems are subject to a "matrix" effect in which downstream responses are dependent on the interaction of multiple variables including, for example, cell type, developmental stage, metabolic status, and environmental conditions (Moller and Chua, 1999; Roitsch, 1999; Stitt, 1999; Coruzzi and Zhou, 2001).

In our ongoing studies to explore effects of nitrogen on gene expression, a complex picture has been emerging wherein gene responses to nitrogen sources appear to be dependent on multiple variables including starvation, light, and carbon status, to name a few (Lam et al., 1998; Oliveira and Coruzzi, 1999; Coruzzi and Zhou, 2001). At present, most gene chip or microarray experiments are concerned with responses to one variable or input at a time, where other variables are given fixed values. Such an approach leaves open the question as to whether changing the other variables might alter the influence of the input being tested. To explore the effects of multiple input interactions in a systematic and thorough way, we have begun to apply "combinatorial design" to define small sets of experiments needed to effectively investigate these "matrix" interactions on gene expression. This type of approach should become even more important in the post-genomics era as a means to effectively and efficiently test the responses of whole genomes to multiple inputs.


    USING COMBINATORIAL DESIGN TO EFFECTIVELY AND EFFICIENTLY TEST EFFECTS OF MULTIPLE INPUTS ON A BIOLOGICAL SYSTEM: CASE STUDIES

The "matrix effect" referred to above tells us that many inputs could potentially influence the regulation of a target gene, pathway, or metabolite. But suppose that a researcher thought that only a subset of these inputs in fact had this influence. This implies that the other inputs don't matter. How would one test such a hypothesis? Combinatorial design is one approach. We briefly describe the inspiration for this approach from the field of software testing and then explain its application to case studies in plant biology.

In software testing, the hypothesis (sometimes shown to be wrong) is that the software is correct under all input combinations. That is, if one could show that varying the inputs doesn't matter as far as correctness of the output is concerned, then the software is deemed to be "correct." For example, suppose there are 10 possible inputs, each having four possible values. This leads to 4^10 combinations of inputs, or 1,048,576 tests of the software. Running so many tests is infeasible because software testing is still a manual art. Cohen et al. (1997) observe that it may not be necessary to test every possible input combination (a number that grows exponentially with the number of independent variables), but rather to test a much smaller set of input combinations that covers every value of each independent variable and every pair of values from each pair of independent variables. In general, this relaxed constraint allows several pair-wise combinations to be tested at once, significantly reducing the otherwise exhaustive test set. This insight leads to a test design in which the number of tests grows very slowly with the number of inputs. For the above example (10 inputs with four possible values each), the number of tests required under this hypothesis is merely 52, a reduction by a factor of about 20,000. For another case where there are six inputs (a-f) and three possible values for each input (0, 1, or 2), the number of tests required is only 21, as compared with testing all 3^6, or 729, possible combinations. Note that there may be many possible test sets that satisfy the above pair-wise requirement. One possible set consisting of 21 tests is illustrated in Table I, where the notation "any value" means that any legal value of the given input (0, 1, or 2) can be used.


                              
Table I.   One possible "parsimonious test set" for exploring the effects of six different inputs (a-f), each with three input values (0, 1, or 2)
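The counts behind the software-testing example can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and do not come from Cohen et al. (1997), and the pair count is a plain combinatorial fact rather than a description of their algorithm.

```python
from math import comb

def exhaustive_tests(n_inputs, n_values):
    """Number of tests needed to try every combination of input values."""
    return n_values ** n_inputs

def value_pairs_to_cover(n_inputs, n_values):
    """Number of distinct value pairs (over all pairs of inputs) that a
    pair-wise design has to cover at least once."""
    return comb(n_inputs, 2) * n_values ** 2

def pairs_per_test(n_inputs):
    """A single test exercises one value pair for every pair of inputs."""
    return comb(n_inputs, 2)

print(exhaustive_tests(10, 4))      # 1048576
print(value_pairs_to_cover(10, 4))  # 720
print(pairs_per_test(10))           # 45
print(exhaustive_tests(6, 3))       # 729
```

Because one test of the 10-input example covers 45 of the 720 value pairs at once, a pair-wise design can be far smaller than the 1,048,576 exhaustive tests.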

Now, computing a subset of the possibilities is not the same as computing all of them, so even if the software passes these tests, it may not be correct. Combinatorial design is a tradeoff between effort and thoroughness. However, the pragmatic fact is that this form of software testing picks up virtually all errors (Cohen et al., 1997).
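To make the pair-wise idea concrete, the following minimal sketch builds a test set with a simple greedy heuristic: it repeatedly picks the candidate test that covers the most value pairs not yet covered. This is an assumption-laden illustration, not the method of Cohen et al. (1997); it scans every candidate and is therefore practical only for small cases such as six inputs with three values each, and it is not guaranteed to reproduce the 21 tests of Table I or to find the smallest possible set. The names `pairs_of` and `greedy_pairwise` are hypothetical.

```python
from itertools import combinations, product

def pairs_of(test):
    """All ((input, value), (input, value)) pairs that one test exercises."""
    return {((i, test[i]), (j, test[j]))
            for i, j in combinations(range(len(test)), 2)}

def greedy_pairwise(levels, n_inputs):
    """Greedy pair-wise test set: a sketch, not a minimal design."""
    candidates = list(product(levels, repeat=n_inputs))
    # Every value pair that has to appear in at least one chosen test.
    uncovered = set()
    for candidate in candidates:
        uncovered |= pairs_of(candidate)
    tests = []
    while uncovered:
        # Choose the candidate covering the most still-uncovered pairs.
        best = max(candidates, key=lambda t: len(pairs_of(t) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best)
    return tests

tests = greedy_pairwise((0, 1, 2), 6)
print(len(tests), "tests instead of", 3 ** 6)
```

Any set produced this way covers every pair of values for every pair of inputs, but, as noted above, it remains a sample of the 729 combinations rather than a proof that no higher-order interaction exists.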

The insight gained from software testing is that if there is a set of inputs that probably don't matter, then one can determine whether they in fact do matter using a small number of tests. Applying this idea to biology works as follows: If you are given a set S of inputs and you believe that a subset C are the only ones that matter, then you can use combinatorial design to show (or strongly suggest) that inputs S - C (the set difference) don't matter.

So, to put this principle into practice, we use as an example a case study in which there are six inputs: light, starvation, carbon, inorganic N (NH4NO3), and two forms of organic N (Glu and Gln), and each of these inputs has three possible values (0, low [L], high [H]). As in the example above, to test how these six inputs interact to regulate gene expression, we would have to test 3^6, or 729, possible combinations. Allowing for replication required in northerns, microarrays, or metabolic profiles, this would be an unreasonable (and expensive) number of treatments to test. However, by using combinatorial design, we can propose a subset of these experiments that would give a good approximation of the experimental space.

To start, we might first hypothesize that one input (e.g. "light") is the only input that really matters. Then we would test all three values of the input "light" (0, L, or H) and determine whether the output (e.g. gene expression) changes depending on varying values of the other inputs. This entails applying combinatorial design to the other five inputs and combining the resulting treatments with each possible value of "light" (see Table II). So, for each pair of the remaining five inputs (each of which can be one of three values, 0, L, or H), nine pair-wise combinations (three × three) are tested (for example, see the bold entries in Table II). If all possible pairs of the five inputs are tested, this results in a combinatorial design set of 17 treatments. These 17 treatments are then tested in the context of every value for light (0, L, or H), resulting in 3 × 17, or 51, total experiments (see Table II).


                              
Table II.   Combinatorial design predicts 51 treatments are necessary (out of a possible 729 combinations) to test the effects of "light," under all possible combinations of starvation, carbon, NH4NO3, Glu, and Gln, each with input values of 0, L, or H

For each of the three values for light (0, L, or H; separated by lines of space), note that each pair of the other inputs has all pair-wise combinations represented. For example, the entries in bold font show all nine (three × three) combinations of carbon and NH4NO3. However, scrutiny of any two columns will reveal that all nine possible input combinations are also represented. Although combinatorial design guarantees that, at a minimum, all pair-wise combinations are represented, in general, replication of some combinations will also occur.
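The construction and the column-by-column scrutiny described for Table II can be sketched in Python. The example below is a toy under stated assumptions: the base design is an exhaustive set for just two other inputs (not the 17 treatments of Table II, which are not reproduced here), and the helper names `cross_with_hypothesis` and `missing_pairs` are hypothetical.

```python
from itertools import combinations, product

LEVELS = ("0", "L", "H")

def cross_with_hypothesis(hypothesis_levels, base_treatments):
    """Repeat the base design once for every value of the hypothesized input."""
    return [(h,) + tuple(row) for h in hypothesis_levels for row in base_treatments]

def missing_pairs(treatments, levels=LEVELS):
    """For each value of column 0 (the hypothesized input), report value
    pairs of the remaining columns that never occur together."""
    missing = []
    n_cols = len(treatments[0])
    for h in sorted({t[0] for t in treatments}):
        block = [t for t in treatments if t[0] == h]
        for i, j in combinations(range(1, n_cols), 2):
            seen = {(t[i], t[j]) for t in block}
            for pair in product(levels, repeat=2):
                if pair not in seen:
                    missing.append((h, i, j, pair))
    return missing

# Toy base design: all nine combinations of two other inputs.
base = list(product(LEVELS, repeat=2))
design = cross_with_hypothesis(LEVELS, base)
print(len(design))                # 27
print(missing_pairs(design))      # []
print(missing_pairs(design[1:]))  # [('0', 1, 2, ('0', '0'))]
```

With a 17-treatment base design in place of the toy one, the same crossing step yields the 3 × 17 = 51 treatments of Table II, and an empty result from the check corresponds to every pair of columns showing all nine combinations within each light value.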

We would next check the output (e.g. gene expression) to determine whether any of the other five inputs (e.g. starvation, carbon, NH4NO3, Glu, or Gln) caused the output to vary. If the output did not vary in response to these other inputs, then "light" is the only input that matters, or is the dominant regulating input. However, if the output did vary when various combinations of the other inputs were tested, then we would next test the hypothesis that two inputs in fact matter (e.g. light and carbon). In our working example of six inputs (each with three values, 0, L, or H), if two inputs are hypothesized to matter then, using combinatorial design, 135 tests are needed. This is still a big savings over 729, the number required to test all possible combinations of six inputs.
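The check described above amounts to grouping treatments by the hypothesized input(s) and asking whether the output still varies within a group. A minimal sketch follows; the measurements are invented purely for illustration, and the `tolerance` threshold and function names are assumptions rather than part of the published approach (in practice, replication and a proper statistical test would take the place of a fixed threshold).

```python
from collections import defaultdict

def output_spread_by_group(results, hypothesis_cols):
    """Group treatments by the hypothesized inputs and return the
    max - min output observed within each group."""
    groups = defaultdict(list)
    for treatment, output in results:
        key = tuple(treatment[c] for c in hypothesis_cols)
        groups[key].append(output)
    return {key: max(vals) - min(vals) for key, vals in groups.items()}

def hypothesis_holds(results, hypothesis_cols, tolerance):
    """True if no group varies by more than `tolerance`."""
    spreads = output_spread_by_group(results, hypothesis_cols)
    return all(s <= tolerance for s in spreads.values())

# Hypothetical data: treatment = (light, carbon), output = expression level.
results = [
    (("0", "0"), 1.0), (("0", "H"), 3.0),
    (("L", "0"), 5.0), (("L", "H"), 5.1),
    (("H", "0"), 9.0), (("H", "H"), 9.2),
]
print(hypothesis_holds(results, [0], tolerance=0.5))     # False
print(hypothesis_holds(results, [0, 1], tolerance=0.5))  # True
```

In this toy data set, "light alone matters" fails because the output changes with carbon when light is 0, whereas "light and carbon matter" is consistent with every observation.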

Combinatorial design helps refine hypotheses as well. Suppose that in the above example, "light" determined the output (e.g. gene expression) only when its value was L or H. Then the output (e.g. gene expression) would vary depending on other inputs only when the "light" value was 0. By observing the differences in other inputs that caused the output to vary, one may arrive at some other hypothesis (e.g. when "light" is 0, then carbon determines the output). Forty-six experiments would be required to test this hypothesis, some of which are repeats of the original 51, and far fewer than the 135 tests needed to test two inputs.
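This refinement step can also be sketched in code: for each value of the primary input, find whether the output still varies and, if so, which single other input accounts for the variation. As before, this is a sketch under assumptions: the data are invented, the fixed `tolerance` stands in for a real statistical criterion, and `refine` is a hypothetical helper name.

```python
from collections import defaultdict

def spread(values):
    return max(values) - min(values)

def refine(results, names, primary, tolerance):
    """For each value of `primary` where the output still varies, list the
    other inputs that, taken alone, account for that variation."""
    p = names.index(primary)
    by_primary = defaultdict(list)
    for treatment, output in results:
        by_primary[treatment[p]].append((treatment, output))
    report = {}
    for value, rows in by_primary.items():
        if spread([out for _, out in rows]) <= tolerance:
            continue  # the primary input alone determines the output here
        explaining = []
        for c, name in enumerate(names):
            if c == p:
                continue
            sub = defaultdict(list)
            for treatment, out in rows:
                sub[treatment[c]].append(out)
            if all(spread(vals) <= tolerance for vals in sub.values()):
                explaining.append(name)
        report[value] = explaining
    return report

# Hypothetical data: treatment = (light, carbon, Glu).
names = ("light", "carbon", "Glu")
results = [
    (("0", "0", "0"), 1.0), (("0", "0", "H"), 1.1),
    (("0", "H", "0"), 3.0), (("0", "H", "H"), 3.1),
    (("L", "0", "H"), 5.0), (("L", "H", "0"), 5.1),
    (("H", "0", "0"), 9.0), (("H", "H", "H"), 9.1),
]
print(refine(results, names, "light", tolerance=0.5))  # {'0': ['carbon']}
```

The result suggests the refined hypothesis from the text (when "light" is 0, carbon determines the output), which would then be tested with its own combinatorial design.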


    SUMMARY AND PROSPECTS FOR COMBINATORIAL DESIGN IN THE POST-GENOMICS ERA

In addition to our proposed application of combinatorial design to biology described above, other applications to biology, including genomic ones, have also begun to emerge. Kerr and Churchill (2001) propose the use of combinatorial design in the construction of microarray experiments. They show that a balanced block design can avoid confounding factors such as placement on the chip and the dye used. Ben-Dor et al. (2000) use combinatorial design to specify the sequences of DNA to place into each cell of a microarray to minimize cross hybridization.

Our proposed application of combinatorial design to biology can potentially be used in the analysis of complex systems to allow researchers to determine which inputs do "matter" in a biological response and more importantly, to rapidly define which inputs don't matter, by using this streamlined and parsimonious experimental approach. The "read out" of the response can be changes in gene expression (DNA chips and microarrays), metabolite profiles (metabolomics), or even developmental responses. Using combinatorial design to reduce the experimental data set is important, not only because it is cost effective, but also because it presents a small number of datasets whose analysis will give an appropriate answer.

Finally, is it really possible to extrapolate what might be true for software testing to much more complex biological systems? We think that with care, the answer is yes. The reason is that the biological pathways so far elucidated can be modeled using fairly simple boolean circuits (Genoud et al., 2001). If some set of independent variables matter but a collection of tests suggested by combinatorial design doesn't discover their importance, then the underlying boolean circuit involving those variables would have to be quite complex. The "care" required has to do with performing repeats whenever results are not clear-cut because each experiment carries much more weight when one performs only a few. Finally, one can always view combinatorial design as a disciplined form of sampling, an approach that is taken whenever the number of inputs and outputs, if tested exhaustively, would otherwise make such biological experiments completely intractable.

    FOOTNOTES

Received August 28, 2001; accepted September 24, 2001.

1 This work was supported by the National Institutes of Health (grant no. GM32877 to G.M.C.) and by the National Science Foundation (grant no. IIS-9988636 to D.E.S.).

* Corresponding author; email gloria.coruzzi@nyu.edu; fax 212-995-4204.

www.plantphysiol.org/cgi/doi/10.1104/pp.010683.



© 2001 American Society of Plant Physiologists



