Plant Physiol, June 2002, Vol. 129, pp. 394-437
Summaries of National Science Foundation-Sponsored Arabidopsis 2010 Projects and National Science Foundation-Sponsored Plant Genome Projects That Are Generating Arabidopsis Resources for the Community
Edited by
Deciphering the functions of the
approximately 25,000 genes encoded in the Arabidopsis genome is an
extraordinarily complex, challenging, and expensive undertaking. In the
United States, federal funding for Arabidopsis genomics research is
coordinated by an interagency governmental program called the National
Plant Genome Initiative (NPGI). Established in 1998, NPGI has been
instrumental in the creation of two relatively large programs at
the National Science Foundation (NSF), the Plant Genome Program and the
Arabidopsis 2010 Program. NPGI strongly supported the Arabidopsis
Genome Initiative in its goal to obtain the first complete sequence of
a plant genome. The Arabidopsis sequence, published in December 2000 (Arabidopsis Genome Initiative, 2000)
The editors of this Plant Physiology special issue devoted to Arabidopsis-related research thought that it would be useful for the community to compile summaries of the Arabidopsis 2010 projects and the Plant Genome projects that are producing Arabidopsis resources. We hope that the following project summaries and/or progress reports will be a valuable and relatively concise source of information that will catalyze the widespread dissemination of the huge body of data being generated about the Arabidopsis genome. Most of the projects have an associated Web site, the URL of which is indicated at the top of each summary.
Principal Investigator (PI): David Bird, North Carolina State University, david_bird{at}ncsu.edu; Co-PI: Sandra Clifton, Washington University School of Medicine, sclifton{at}watson.wustl.edu; Co-PI: Thomas Kepler, Santa Fe Institute, kepler{at}santafe.edu; Co-PI: Joseph Kieber, University of North Carolina, Chapel Hill, jkieber{at}biomass.bio.unc.edu; Co-PI: Charles Opperman, North Carolina State University, warthog{at}unity.ncsu.edu; Co-PI: Jeffrey Thorne, North Carolina State University, thorne{at}brooks.statgen.ncsu.edu
NSF Plant Genome Project No. 0077503; http://www.nematode.net
Plants inhabit a complex environment in the
rhizosphere. Understanding the many interactions they experience with
other organisms (including parasites) is crucial to truly appreciate
plant development and function. Nematodes, which are ubiquitous soil
animals, are the most successful and cosmopolitan plant parasites (Bird
and Koltai, 2000).
Nematode Gene Discovery
Based on our best understanding of the phylogenetic relationships
within the genus Meloidogyne and the biological differences between species (host range, susceptibility of host R loci,
etc.), we have initiated expressed sequence tag (EST) sequencing
of six species (McCarter et al., 2000).
In addition to immediate public submission of all EST data to GenBank, we established a Web site with tools to provide easier access to parasitic nematode sequence data. This site (http://www.nematode.net) allows BLAST and text searches of subsets of available ESTs (by species, library, clade, etc.) and NemaGene clusters. We also provide ftp access to all EST sequences and a viewer to inspect raw sequence trace data.
Gene Expression Profiling
In a compatible interaction between plants and
Meloidogyne spp., developmental changes ensue in the roots
and local changes in cytokinin and auxin levels are implicated. Thus,
examining the patterns of gene expression in Arabidopsis plants in
response to cytokinins and auxins is likely to be informative about the nematode-plant interaction; conversely, examining transcriptional changes during nematode infection may shed light on hormone regulation during other plant processes. Our preliminary gene expression studies
using Arabidopsis Affymetrix GeneChips (Affymetrix, Santa Clara,
CA) suggest that exogenous application of hormones induces a diverse set of genes that exhibit a range of
induction kinetics. Analysis of these data has been done with a new,
nonparametric method of array normalization that we developed. Our
experiments using Arabidopsis are complemented with array experiments
using spotted tomato (Lycopersicon esculentum) and
nematode ESTs and in situ PCR of nematode-infected tissue (Koltai and
Bird, 2000).
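The nonparametric normalization method mentioned above is not described in this summary. As a generic illustration of what a nonparametric array normalization can look like, the following minimal sketch implements quantile normalization, a well-known rank-based technique that is assumed here purely for illustration and is not claimed to be the authors' method. The function name and the toy intensities are hypothetical.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Map every array onto a shared reference distribution using ranks only.

    arrays: list of equal-length lists of intensities, one list per chip.
    Ties are broken by original probe order, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value across chips.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two chips, three probes each (made-up values).
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
```

Because only ranks are used, the procedure makes no assumption about the shape of the intensity distribution, which is the sense in which such methods are called nonparametric.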
PI: Hans J. Bohnert, University of Illinois, bohnerth{at}life.uiuc.edu; Co-PI: Ray A. Bressan, Purdue University, bressan{at}hort.purdue.edu; Co-PI: Robert Burnap, Oklahoma State University, burnap{at}okstate.edu; Co-PI: John C. Cushman, University of Nevada, Reno, jcushman{at}unr.edu; Co-PI: David W. Galbraith, University of Arizona, galbraith{at}arizona.edu; Co-PI: Paul M. Hasegawa, Purdue University, paul.m.hasegawa.1{at}purdue.edu; Co-PI: Rolf A. Prade, Oklahoma State University, prade{at}okstate.edu; Co-PI: Jian-Kang Zhu, University of Arizona, jkzhu{at}ag.arizona.edu
NSF Plant Genome Project No. 9813360; http://www.stress-genomics.org; http://www.OSMID.org
How plants respond to stress in the environment is crucial to their productivity and survival. Among the yield-reducing factors, abiotic stresses play a significant role. Based on the prevailing view of stress resistance and sensitivity as multigenic traits, we initiated a genome-wide, phylogenetic analysis involving transcriptional profiling of wild-type and stress-related mutants with a focus on sensing and response pathways that constitute the functional basis of osmotic and ionic stress tolerance. Our goal is to determine the number and functional complexity of essential, important, or ancillary genes that prepare plants to cope with stress. The comparative evolutionary approach includes a survey of cellular and organismal stress tolerance response pathways in halophytes and glycophytes alike by carrying out transcript expression analysis in a variety of model species including yeast (Saccharomyces cerevisiae), Aspergillus nidulans, Dunaliella salina, Mesembryanthemum crystallinum, rice (Oryza sativa), and Arabidopsis.
Benefiting from the Arabidopsis genome sequence and resources, a mutant generation and characterization pipeline has by now provided more than 200,000 T-DNA tagged lines in various genetic backgrounds, many of which affect the stress response phenotype and facilitate detection of mutations in specific stress signaling pathways. The sequenced cyanobacterium Synechocystis sp. PCC6803, which is easily manipulated with gene replacement by homologous recombination, serves as a model for studying the effects of fundamental stress responses on photosynthesis, ion homeostasis, reactive oxygen species scavenging, and respiration. A full-genome Synechocystis sp. DNA microarray has been generated and PCR-based deletion mutagenesis is used to assess the functions of genes identified by microarray analysis. This approach, for example, is being used to determine the functions of five paralogous Na+/H+ antiporter genes. The results to date indicate that redundancy corresponds, in part, to a functional mosaic; that is, specific paralogs seem to be dedicated to specific aspects of different physical parameters (e.g. pH and salinity) or carbon uptake that impacts pH homeostasis. In addition, microarrays let us explore the regulatory cascades involved in the multiphasic cell growth patterns and physiological activity that are observed after salt shock of this simple autotroph.
Saprophytes, such as yeast and A. nidulans, must
rapidly adapt metabolism and constantly monitor their changing
environment. In the multicellular fungal salt tolerance model, A. nidulans vegetative growth requires positive turgor pressure.
Deletion of hogA, the nonredundant MAP kinase of the
high-osmotic glycerol pathway, partially reduces the
ability of the fungus to grow on high salt but severely affects
cell wall biogenesis and disrupts cell and nuclear division synchrony.
Transcriptome (microarrays by "aspergillus-genomics.org") analyses
of wild type and
As part of our gene discovery efforts, we used repetitive rounds of
differential subtraction screening to identify 84 salt-regulated genes
in Arabidopsis, the majority of which were not previously known to be
salt responsive. Additional mutants, whose characterization is ongoing,
will increase this number. Six of these were implicated in playing
pivotal roles in the SOS signal pathway to mediate ion
homeostasis and salt tolerance. In addition, we have identified a set
of transcripts that comprise a common salinity stress response pathway for cell-specific functions involved in restructuring of the
proteome (RNA and protein turnover and new synthesis). These
latter genes are involved in pathways that preserve cell integrity, protein chaperoning, ion, water and metabolic homeostasis, and radical scavenging and detoxification. Overall, approximately 5%
to 10% of all transcripts are altered in the model organisms for which
salt stress-related microarray datasets have been generated (yeast,
Synechocystis sp., M. crystallinum, rice,
and Arabidopsis). One set of changes is indicative of the "salt
stress emergency response," which has a species-specific threshold.
Such changes are transient unless stress overwhelms the
defense capacity of the species. Responses to salinity stress in the
multicellular models include these cell-specific response categories,
but additionally encompass functions in longer term
adaptation and long-distance integration.
We have generated and provide to the community Arabidopsis T-DNA-tagged mutants defective in stress tolerance and/or stress signaling. Our screen also identifies lines that have defects in metabolism, transcription and translation machinery, and in protein targeting within cells. Moreover, we are able to provide a large set of tagged Arabidopsis mutant lines useful for different screens. We have established, annotated, and provide ESTs for the core set of stress-related transcripts from the glycophytic Arabidopsis and rice, based on the finding that stressed plants include a significant population of transcripts that are not expressed in the unstressed state. We have provided stress-related ESTs from naturally tolerant
species. We have assembled DNA microarrays for the core set of stress-related
transcripts and ESTs for expression analysis in the three higher plant
models.
Results from this program provide knowledge for future plant improvement; for example, by providing transcripts and microarray data that can be incorporated into genetic engineering of elite germplasm and marker-assisted breeding programs. Several collaborations have started that use salt- and drought-induced transcripts as markers and initial results have correlated such transcripts with quantitative trait locus (QTL) regions. Salinity and drought stresses constitute a permanent and increasing agronomic problem in many areas of the world. Long-term irrigation agriculture, for example, which is about three times more productive than rain-fed agriculture, inevitably will continue to suffer production losses due to increased soil salinity. Plant breeding has not yet produced varieties suitable for use in such environments. Our work provides genes and (Arabidopsis) mutants, putative functions by homology and analysis, evolutionary comparisons, and a description of gene expression changes during stress in comparison with the unstressed state over the lifetime of several model species. The work has resulted in a number of publications, including: Bressan
et al. (2001)
PI: Nick Carpita, Purdue University, carpita{at}btny.purdue.edu; Co-PI: Sara Patterson, University of Wisconsin, spatters{at}facstaff.wisc.edu; Co-PI: Tony Bleecker, University of Wisconsin, bleecker{at}facstaff.wisc.edu; Cooperator: Maureen McCann, John Innes Centre, UK, maureen.mccann{at}bbsrc.ac.uk
NSF Plant Genome Project No. 0077719; http://www.btny.purdue.edu/cellwalls
Plant cell walls are composed of independent but
interacting networks of carbohydrates, proteins, and aromatic
substances (McCann and Roberts, 1991).
We have developed Fourier transform infrared (FTIR) microspectroscopy
as a powerful and selective screening technique to identify broad
classes of cell wall biogenesis-related genes (Chen et al., 1998).
Having established the screening and selection protocols and the throughput necessary to accomplish the logistic goals, a comprehensive team has been assembled to identify the genes of potential mutants and to determine their function in a biochemical and cellular context. In addition to the FTIR screen, we have strengthened the overall program to include other spectroscopic approaches. Steve Thomas (Colorado State University, Fort Collins) uses near-infrared spectroscopy as an ultrahigh-throughput means to identify maize secondary wall mutants in plants in the field, and June Medford (Colorado State University) has developed optical coherence microscopy to nondestructively characterize morphological mutants whose defects may arise from changes in wall architecture. "Reverse genetics" will be used to uncover mutant phenotypes resulting from insertions of transposon and T-DNA in genes that are already known to be wall biogenesis-related in maize (Don McCarty and Karen Koch) and in Arabidopsis (Sara Patterson and Tony Bleecker, University of Wisconsin, Madison). Efficient systematic protocols employing biochemical, spectroscopic, and cytological approaches were developed in parallel to deduce specific defects in wall metabolism that result in the infrared phenotypes revealed by our screens. Wolf-Dieter Reiter (University of Connecticut, Storrs), Brad Reuhs (Purdue University, West Lafayette, IN), and Nick Carpita (Purdue University) will develop high-throughput biochemical and spectroscopic technologies to determine linkage structure, polysaccharide unit sequence structures, and wall architecture. Chris Staiger (Purdue University), Maureen McCann (John Innes Centre, Norwich, UK), and June Medford are developing cytological approaches to identify cellular bases of defects that affect the wall structure.
Wilfred Vermerris (Purdue University) and Steve Thomas (Colorado State University) are coordinating studies of mutations that affect the polyphenolic structures of maize cell walls. As the heritability of the mutations is confirmed, the plant biology community will be informed of them through a Web site that will be created as a repository for all cell wall-related genomics, and a system will be devised to dispense them. A major practical goal is to generate plants with genetically defined variation in composition and architecture to permit assessment of modifications on wall properties and plant development. Because cell walls are an enormously important source of raw material, we anticipate that several of the genes we identify and characterize, as well as several of the plants with genetically defined alterations, will be of economic importance. Examples include the modification of pectin cross linking or cell-cell adhesion to increase shelf life of fruits and vegetables, the enhancement of dietary fiber contents of cereals, the improvement of yield and quality of fibers, and the relative allocation of carbon to wall biomass for biofuels. The expertise required to fulfill the goals of this project is interdisciplinary, and as part of the effort we will assemble postdoctoral teams to broadly overlap these disciplines and establish an interdisciplinary doctoral student training program in genetics and molecular biology of the plant cell wall.
PI: Gloria Coruzzi, New York University, gloria.coruzzi{at}nyu.edu; Co-PI: Nigel Crawford, University of California, San Diego, ncrawford{at}ucsd.edu; Co-PI: Dan Bush, University of Illinois, U.S. Department of Agriculture-Agricultural Research Service/Plant Biology, Urbana, IL, dbush{at}uiuc.edu; Co-PI: Bud Mishra, New York University, Courant Institute of Math and Computer Sciences, mishra{at}cs.nyu.edu; Collaborator: Dennis Shasha, New York University, Courant Institute of Math and Computer Sciences, shasha{at}cs.nyu.edu
NSF Arabidopsis 2010 Project No. 0115586; http://www.nyu.edu/fas/biology/n2010
The goals of our Arabidopsis 2010 genome project, entitled "N2010: Nitrogen Networks in Plants," are to identify networks of genes regulated by nitrogen (N) levels, and to further identify the regulatory genes and cis-acting DNA elements involved in this regulation. These results should substantially advance our understanding of the regulation of N metabolism in the context of plant growth and development, as well as provide new insights into our understanding of complex regulatory metabolic gene networks in plants. Given the central role of N availability and metabolism in crop productivity, these results should also have broad agricultural impacts.
The Arabidopsis genome project has uncovered a large set of genes (600+)
involved in the uptake, metabolism, and allocation of N.
Expression studies on a small subset of genes encoding N-metabolic/transport proteins have shown that N levels regulate their
transcription. Proposed N signals include nitrate, ammonium, Glu, Gln,
and C to N balance (Coruzzi and Bush, 2001).
In our ongoing studies of N regulation of gene expression, a complex
picture has been emerging. N regulation of gene expression appears to
be dependent on multiple variables including starvation, light, and
carbon status, to name a few (Coruzzi and Zhou, 2001).
To our knowledge, a first set of genome chip experiments
related to nitrate signaling was conducted by the Crawford lab (Wang et
al., 2000).
A computer cluster will store the large amounts of data generated in this project, which will be provided via a publicly accessible Web page (http://www.nyu.edu/FAS/biology/N2010/). This Web site will include microarray expression datasets, gene identification information, and all software developed in this project. The new software will include new clustering algorithms, cis-search algorithms, as well as the bioinformatic tool "PathExplore," which can be used to query expression datasets to search for coregulated genes in pathways, as described above. These new resources will be linked to the major plant databases for the widest possible distribution of information.
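The internals of PathExplore and of the project's clustering algorithms are not given in this summary. As a hedged illustration of what a query for coregulated genes might involve, the sketch below ranks genes by the Pearson correlation of their expression profiles with a query gene. The gene names, expression values, and the 0.9 threshold are hypothetical, and the approach is a common textbook one rather than the project's own software.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no coregulation signal
    return sxy / sqrt(sxx * syy)


def coregulated(profiles, query, threshold=0.9):
    """Return genes whose profile correlates with the query gene's profile."""
    target = profiles[query]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != query and pearson(target, profile) >= threshold
    )


# Hypothetical expression values across four N treatments.
profiles = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [1.1, 2.1, 3.9, 8.2],  # tracks geneA closely
    "geneC": [8.0, 4.0, 2.0, 1.0],  # opposite trend
    "geneD": [3.0, 3.0, 3.0, 3.0],  # unresponsive
}
hits = coregulated(profiles, "geneA")
```

Restricting `profiles` to the members of a single metabolic pathway would turn the same function into a pathway-scoped query.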
PI: Jeffrey L. Dangl, University of North Carolina, Chapel Hill, dangl{at}emailunc.edu
NSF Arabidopsis 2010 Project No. 0114795; http://www.bio.unc.edu/faculty/dangl/lab/superpage.html
Plants deploy an innate immune response after infection,
in addition to passive protection afforded by waxy cuticular layers and
preformed antimicrobials. Plant-pathogen interactions, particularly those involving biotrophic parasites, are governed by specific interactions between pathogen avr (avirulence) gene loci and
an allele of the corresponding plant disease resistance (R)
locus. When the avr gene is present in the pathogen and the corresponding R allele is present in the
host, the result is disease resistance. If either is inactive or
absent, disease results. R products recognize, directly
or indirectly, avr-dependent signals and trigger the chain
of signal transduction events culminating in a halt of pathogen growth.
Specific R-mediated immunity is layered atop one or more
basal response pathways. Basal defenses stop pathogen spread after
disease onset, protecting the organism at the cost of some tissue
destruction. Genetic overlap between specific and basal resistance
responses suggests that one function of R-mediated signaling
is to more rapidly and effectively deploy shared effector functions
(Dangl and Jones, 2001).
Genetic screens, almost exclusively in Arabidopsis, defined loci
required for R gene action (Feys and Parker, 2000).
We identified, mapped, and cloned the Arabidopsis RPM1 gene,
which conditions resistance to Pseudomonas syringae strains
carrying the avrRpm1 gene (Grant et al., 1995).
We aim to understand "the function of a network of genes," a stated
2010 Project goal (see Table I). The
network begins with RPM1 and the genes required for its
function. However, loci so defined will overlap with loci required for
other R gene functions. Some of the genes to be studied were
identified by forward genetics; thus, we know they are relevant to this
signaling network. Some were isolated in yeast two-hybrid screens and
subsequent reverse genetic analyses confirmed their role in
RPM1-mediated or disease resistance-related
processes. Finally, some are molecular relatives of genes found via the
first two approaches, and we want to test the notion that they function
in similar disease resistance pathways. We cover two small multigene
families. We intend to make available all the mutants and reagents that we
generate. The first publications funded by the 2010 Project, and the
NSF grant that preceded it, have recently been published or are currently in
press (Tornero and Dangl, 2001).
PI: Deborah Delmer, University of California, Davis, dpdelmer{at}ucdavis.edu; Co-PI: Candace Haigler, Texas Tech University, candace.haigler{at}ttu.edu; Co-PI: Allan Zipf, Alabama A&M University, aamzip01{at}aamu.edu; Co-PI: Andrew Spicer, Texas A&M, Houston, aspicer{at}ibt.tamu.edu; Unfunded Collaborator: Kanwarpal Dhugga, Pioneer HiBred, kanwarpal.dhugga{at}pioneer.com
NSF Plant Genome Project No. 0110173; http://www-plb.ucdavis.edu/labs/Delmer/
Cellulose (β-1,4-glucan) represents a major sink for carbon
in plants where it exists as a key cell wall polymer. The pattern and
extent of cellulose microfibril deposition contribute to patterns of
morphogenesis, to the unique characteristics of specialized cell types,
and to the strength and flexibility of plant stems. Cellulose is used
extensively as fuel, timber, fiber, forage, and chemical cellulose.
Manipulation of the patterns and extent of cellulose deposition, the
dimensions and crystallinity of the microfibrils, or the ratio of
cellulose to other sinks such as lignin or starch, can be expected to
improve the quality of many economically important plants. This project
seeks to continue work initiated in a previous NSF Plant Genome Grant
to study the functional genomics of the CesA gene family
proposed to encode the catalytic subunits of the multicomponent
cellulose synthase enzyme complex. The new project also extends these
objectives to include discovery and characterization of other genes
that are critical for the process. Research focuses on plants of
economic importance where modifications of this process could yield
most benefit.
Ongoing work includes:
(a) studies of expression patterns of all 10 of the Arabidopsis CesA genes and their related ancestors, the CslD genes. We are defining developmental patterns of expression for all of these genes and also identifying potential pairs or triplets of CesA that are required as functional units within a single cell type, examining effects of carbon status and light on gene expression, and testing the hypothesis that the related CslD genes are the cellulose synthases of tip-growing cells;
(b) with respect to maize, these studies will identify expression patterns for four key ZmCesA genes and relate these to any phenotypes generated in the four different selected Mu insertion lines that are mutated in these respective genes;
(c) further testing of the hypothesis that at least two distinct CesA proteins and the Korrigan cellulase protein are all required for cellulose synthase complex formation and function; this is being done by co-expressing and analyzing complex formation and the ability to make cellulose when combinations of these genes are expressed in yeast and tobacco (Nicotiana tabacum) Bright-Yellow 2 cells;
(d) completion of characterization of the first identified CesA gene from an alga;
(e) determination of the comparative topology of a plant CesA protein in the plasma membrane with its related ancestor in animals, hyaluronan synthase, to relate structure of the proteins to their functions in the synthesis of the glucan chains of cellulose;
(f) a description of the evolution, diversity, and map locations of CesA genes in cotton, studies that should shed light on the evolution of tetraploid cotton and also identify polymorphisms in these genes to contribute to the genome maps of diploid and tetraploid cottons; and
(g) microarray experiments to study global expression patterns of large numbers of genes in Arabidopsis, maize, and cotton under conditions in which we know CesA gene expression is affected, with the goal of identifying other genes that are important for cellulose synthesis in plants.
The project will also make available useful tools for the scientific
community such as seeds of transgenic Arabidopsis expressing the
reporter gene
PI: Xinnian Dong, Duke University, xdong{at}acpub.duke.edu; Co-PI: Frederick M. Ausubel, Massachusetts General Hospital, ausubel{at}molbio.mgh.harvard.edu; Co-PI: Shauna Somerville, Carnegie Institute, Stanford University, shauna{at}andrew.stanford.edu
NSF Arabidopsis 2010 Project No. 0114783; http://genetics.mgh.harvard.edu/ausubelweb/nsf2010/nsf2010.htm
Plants respond to pathogen attack through a variety of
signaling pathways consisting of a large number of regulatory as well as effector genes. During the past several years, many defense-related genes have been identified through genetic analysis conducted in
Arabidopsis. Importantly, Arabidopsis exhibits all of the major kinds
of defense responses present in other plants (Glazebrook, 1999).
We will utilize Affymetrix GeneChips to identify Arabidopsis defense-related genes after infection with a variety of obligate biotrophic pathogens and necrotrophic pathogens. Based on this analysis, we will construct custom microarrays consisting of spotted 60 to 75 mers (pathoarrays) corresponding to defense-related genes (a list of the genes identified so far can be found at http://genetics.mgh.harvard.edu/ausubelweb/nsf2010/NSF_2010.html). The custom pathoarrays will be made available to the Arabidopsis community at a nominal cost with the expectation that the experimental results generated using these pathoarrays will be deposited in PMIDB. We will use the custom pathoarrays and Arabidopsis defense-related mutants to define the expression signatures resulting from the activation of defense pathways.
We will create a Web-accessible PMIDB. This database, which will be accessible at http://genetics.mgh.harvard.edu/ausubelweb/nsf2010/NSF2010.html, will be developed during the next 4 years and will contain standardized experimental procedures for analyzing host defense responses, a list of all the pathogenesis-related mutants and their phenotypes, a list of defense-related genes with links to various sequence databases, and expression profiles of different plant-pathogen interactions and different defense-related mutants. PMIDB is being constructed as part of a larger databasing project (Integrated Microarray Database System [IMDS]) currently under way in the Department of Molecular Biology at the Massachusetts General Hospital. IMDS is designed to allow both local researchers and their external collaborators to store, retrieve, and analyze microarray data via a Web interface. In addition, the public will be able to view published microarray data and protocols. The IMDS will be capable of handling both spotted microarray data and Affymetrix chip data.
In addition, the database will be capable of storing both the raw microarray data and sets of normalized data. The data input and retrieval software for the database will be open source and the database itself will be implemented in MySQL, an open source relational database management system. PMIDB (IMDS) is fully compliant with the Minimum Information About a
Microarray Experiment (MIAME) recommendations, which outline the minimum
information required to verify array-based gene expression profiling
experiments (Brazma et al., 2001).
Both the Web-accessible user interface and the table structure of PMIDB (IMDS) have been designed to support a variety of detailed queries on the database, allowing the retrieval of microarray experiments based on mutant description, tissue and developmental stage characteristics of source plants, and detailed aspects of growth conditions and sample treatments. The ability to track multiple treatments and time courses within a given experiment has been built into the database. The combination of these features (Web accessibility, compliance with standard exchange model and exchange format specifications, ability to store both spotted array and Affymetrix data, flexibility to store sets of normalized microarray data in addition to raw data, and robust data retrieval capabilities designed for plant microarray specific experimental needs) is not currently available in other implemented microarray databases. Thus, we feel that implementation of PMIDB (IMDS) is required to meet the storage and analysis needs for plant microarray experiments.
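The actual PMIDB (IMDS) table structure is not reproduced in this summary. To make the described query capabilities concrete, the following minimal sketch shows one hypothetical relational layout that tracks genotype, tissue, treatments, time points, and both raw and normalized signals. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; every table name, column name, and value is an assumption made for illustration.

```python
import sqlite3

# Hypothetical schema; not the real PMIDB (IMDS) table structure.
SCHEMA = """
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    genotype TEXT NOT NULL,            -- wild type or mutant description
    tissue TEXT NOT NULL,
    developmental_stage TEXT,
    growth_conditions TEXT
);
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    treatment TEXT NOT NULL,
    hours_after_treatment REAL NOT NULL
);
CREATE TABLE measurement (
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    gene_id TEXT NOT NULL,
    raw_signal REAL NOT NULL,
    normalized_signal REAL,            -- stored alongside the raw value
    PRIMARY KEY (sample_id, gene_id)
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up time course."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO experiment VALUES (1, 'mutant-1', 'leaf', 'rosette', 'short day')"
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, 1, ?, ?)",
        [(1, "mock", 0.0), (2, "pathogen", 6.0), (3, "pathogen", 24.0)],
    )
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?, ?)",
        [(1, "gene-1", 100.0, 1.0), (2, "gene-1", 400.0, 4.1), (3, "gene-1", 900.0, 8.7)],
    )
    return conn


def time_course(conn, genotype, tissue, gene_id):
    """Retrieve one gene's signals for a genotype and tissue, ordered by time."""
    cursor = conn.execute(
        """
        SELECT s.treatment, s.hours_after_treatment, m.raw_signal, m.normalized_signal
        FROM experiment e
        JOIN sample s ON s.experiment_id = e.id
        JOIN measurement m ON m.sample_id = s.id
        WHERE e.genotype = ? AND e.tissue = ? AND m.gene_id = ?
        ORDER BY s.hours_after_treatment
        """,
        (genotype, tissue, gene_id),
    )
    return cursor.fetchall()


conn = build_demo_db()
rows = time_course(conn, "mutant-1", "leaf", "gene-1")
```

Keeping treatments and time points in their own `sample` table is what lets a single experiment hold several treatments and a full time course, the capability the text singles out.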
PI: Joseph R. Ecker, The Salk Institute for Biological Studies, ecker{at}salk.edu; Co-PI: Ronald W. Davis, Stanford Genome Technology Center, Stanford University, dbowe{at}sequence.stanford.edu; Co-PI: Athanasios Theologis, Plant Gene Expression Center, University of California, Berkeley, theo{at}nature.berkeley.edu
NSF Plant Genome Project No. 9975718/0196098; http://signal.salk.edu/SSP/index.html
Project Summary
To carry out functional genomic and proteomic studies using the recently completed Arabidopsis genomic sequence, we must be able to readily manipulate and express all of the genes. Unfortunately, current computational approaches for Arabidopsis gene prediction are not able to precisely predict or, in some cases, even recognize many of the genes. These limitations prohibit the use of new emerging technologies for global functional analysis of genes. The aim of our program is to experimentally define the transcription units for all Arabidopsis genes. This will provide an accurate determination of the gene structures and allow the construction of full-length cDNAs for each gene. Determining the sequences of the transcription units will resolve ambiguities in the annotated genomic sequence and allow precise positioning of introns/exons and 5' transcription start and 3' polyadenylation addition sites. The identification of full-length cDNAs for all Arabidopsis genes is of primary importance for the entire plant biology community because these clones will be essential for many future global functional genomic and proteomic studies.
Responsibilities and Deliverables of the Salk, Stanford, Plant Gene Expression Center Consortium (SSPC) include: (a) isolation and complete sequencing of full-length cDNAs for 8,000 genes with immediate depositing of cDNA sequences in GenBank; (b) construction of 8,000 open reading frame (ORF) clones into a universal recombination plasmid vector (pUNI).
The ORF clones, which are fully sequenced, validated, and error free, are deposited in the Arabidopsis Biological Resource Center (ABRC) at Ohio State University (Columbus; no Material Transfer Agreement required). Among the 8,000 ORF clones, 7,000 will be constructed by PCR from full-length cDNAs and the last 1,000 are being identified from nonexpressed annotated genes (hypothetical); and (c) identification of novel Arabidopsis transcription units using custom Affymetrix genome tiling arrays and mRNA samples prepared from various plant tissues and conditions.
Methodology
The strategy for isolating full-length/ORF cDNA clones for 8,000 Arabidopsis genes is shown in Figure 1. The strategy utilizes three complementary approaches for achieving our goals.
Approach 1
Construction of ORF clones by reverse transcriptase (RT)-PCR (see below for the terminology of various clones). Sixty percent of the Arabidopsis genes have an identified EST and the source of mRNA for this clone is known. Using RT-PCR and gene-specific primers at the ATG and TAA, full-length cDNA can be isolated for a large number of these genes. The annotated ATG/TAA can be tested experimentally to determine whether it is correct by designing RT-PCR primers for potential upstream ATG(s) using the genome sequence.
Approach 2
The second approach utilizes the RIKEN Arabidopsis full-length (RAFL) clones constructed by Dr. Kazuo Shinozaki (RIKEN Genome Center, Tokyo, http://www.gsc.riken.go.jp/Plant/index.html). This collection was made available to SSPC by an agreement between the RIKEN Genome Science Center and the Salk Institute, Stanford University, and the University of California (Berkeley). The RAFL collection consists of approximately 15,000 clones representing approximately 10,500 unique Arabidopsis genes. The RAFL cDNAs (R clones) have been sequenced by the SSPC (Table II). Subsequently, the ORF of each RAFL clone is transferred into a pUNI vector by PCR/subcloning and each of the ORF clones (U clones) is then fully sequenced. The sequences of the error-free U clones are deposited in GenBank, whereas the clones themselves are deposited with the ABRC.
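Approach 1 mentions checking an annotated start codon by looking for potential upstream ATGs in the genome sequence. A minimal sketch of that search is shown below, assuming an intron-free upstream region on the forward strand; the function name and the toy sequence are hypothetical, and a real pipeline would also have to handle introns, the reverse strand, and primer design.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_in_frame_atgs(genomic, annotated_start):
    """List 0-based positions of in-frame ATGs upstream of the annotated start.

    The scan walks back one codon at a time and stops at the first in-frame
    stop codon, because an ATG beyond it could not extend the same ORF.
    """
    if genomic[annotated_start:annotated_start + 3] != "ATG":
        raise ValueError("annotated_start does not point at an ATG")

    candidates = []
    pos = annotated_start - 3
    while pos >= 0:
        codon = genomic[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            candidates.append(pos)
        pos -= 3
    return sorted(candidates)


# Toy sequence: TAG ATG CCC ATG GGG TAA, annotated start at position 9.
toy = "TAGATGCCCATGGGGTAA"
candidates = upstream_in_frame_atgs(toy, 9)
```

Each candidate position would be a place to anchor an alternative forward RT-PCR primer; whether a product is obtained then tests the annotation experimentally, as the text describes.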
Approach 3. Finally, we have developed a novel strategy for identifying the "missing genes" that utilizes custom high-density genome tiling arrays constructed by Affymetrix. Use of several different types of custom high-density oligonucleotide arrays has allowed the identification of numerous transcriptional units that, thus far, have not been found in any of the deep EST or cDNA collections. We have developed protocols for labeling mRNA and calibrating the hybridization conditions for the transcript-mapping chip. Importantly, we have also developed a first-generation software tool for scanning the genome tiling arrays that allows interpretation of this massive amount of expression data (see http://signal.salk.edu/msample.html).
SSPC Deliverables
Funding from our first-year award was used to develop protocols for all the steps in the strategy to carry out the experiments using the Affymetrix arrays. The second year of funding was primarily used for large-scale cDNA sequencing and construction/sequencing of the ORF clones. Essential to the entire enterprise was the development of a cDNA sequence and mapping database, software for automating the sequencing and annotation procedures for full-length cDNA sequencing and ORF production, and software for analysis of high-density genome tiling arrays.
Full-Length cDNA and ORF Clones Construction and Sequencing Production
Below are definitions of the various cDNA clone types being generated by the SSPC and the total number of clones constructed, sequenced, and submitted to GenBank from each class as of March 15, 2002 (Table II). Additional details about the DNA sequences and clone/vector information can be found at the SSPC Web site (http://signal.salk.edu/SSP/index.html). 
This site contains all of the SSPC data in one location for ease of access to the community, with links to each of the three participants' Web sites.
Preparation of mRNAs for Transcription Unit Discovery
We have prepared 107 distinct mRNA populations from a variety of plant tissues and treatments.
Hybridization Data
All data from the Affymetrix pilot tiling chip and whole genome chip hybridization experiments used for transcription unit discovery will be available at the end of the project.
Overall Assessment of Cost
With a 3-year budget of $7.5 million (direct/indirect cost), our NSF-funded Arabidopsis full-length cDNA sequencing and ORF clone construction project is the largest publicly funded program of its type. This amount of funding translates to approximately $500 per sequence-validated cDNA clone. An equivalent project called the Mammalian Gene Collection (http://mgc.nci.nih.gov/Info/ProjectSummary) is being carried out under the sponsorship of 19 National Institutes of Health and National Cancer Institute Institutes and involves 22 academic laboratories and companies. The current total unique full-length cDNAs (as of March 23, 2002) are 7,646 (human) and 4,416 (mouse; Mus musculus) for a cost of $25 million. This amount of funding translates to approximately $2,000 per cDNA (with no ORF clones). Therefore, the SSPC project compares favorably with other similar public projects.
Material Distribution
DNA Sequences
All completed cDNA sequences are immediately deposited in GenBank. A variety of cDNA search tools are available on our Web site (http://signal.salk.edu/cgi-bin/sspsearch).
cDNA Clones
Sequence-validated, error-free ORF clones in pUNI51 are deposited and available through the ABRC (http://godot.ncgr.org/abrc). Beginning at the end of April 2002, all of the RIKEN Arabidopsis full-length (RAFL/R clone) cDNA clones whose full-length cDNA sequences have been determined by the SSPC will be available from the RIKEN Bioresource Center. 
Contact the Bioresource Center (PI: Dr. Masatomo Kobayasi, kobayasi{at}rtc.riken.go.jp) for any of the Arabidopsis RAFL cDNA clones. These clones will also become available through the ABRC. See our "where to order from" Web page for further details (http://signal.salk.edu/SSP/ssporder.html).
Summary
The creation of an easy-to-use graphical Web interface (Salk Institute Genome Analysis Laboratory [SIGnAL] Arabidopsis Gene Mapping Tool) to our cDNA database and the availability of the corresponding full-length cDNAs and ORF clones in public stock centers provide researchers with ready access to their genes of interest. Full-length cDNAs and ORF clones are a prerequisite for the construction of whole proteome arrays, for high-throughput protein structural studies, and for the rapid creation of protein fusions (green fluorescent protein [GFP], tandem affinity purification tagged, etc.). For example, the ability to rapidly create translational fusions of any protein tag to any Arabidopsis protein will allow large-scale in vivo protein complex/mass spectrometry (MS) studies. These resources will allow investigators to begin to test hypotheses about plant gene function at an unprecedented rate and an unprecedented scale (i.e. thousands of genes in parallel).
Citation of the Project
Because we plan to submit the results of this study for publication, we request that you do not cite this project summary as a reference to our project. Instead, until publication, we suggest the following acknowledgment: "We thank the SSPC and the RIKEN Genome Science Center for providing the sequence-validated full-length cDNAs." Finally, we request that investigators include the GenBank accession numbers for RAFL cDNAs and SSPC ORF clones in all publications that describe cDNAs produced by our consortium.
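The per-clone figures quoted under Overall Assessment of Cost can be checked with a few lines of arithmetic. The sketch below is illustrative only: the budgets and the Mammalian Gene Collection clone counts come from the text, whereas the SSPC clone count is not stated there and is back-calculated here from the quoted figure of approximately $500 per clone (an assumption). The function and variable names are hypothetical.

```python
def cost_per_clone(total_budget_usd: float, clone_count: int) -> float:
    """Return the average cost in US dollars per clone."""
    if clone_count <= 0:
        raise ValueError("clone_count must be positive")
    return total_budget_usd / clone_count


# Mammalian Gene Collection figures quoted in the text (as of March 23, 2002).
MGC_BUDGET_USD = 25_000_000
MGC_CLONES = 7_646 + 4_416  # human + mouse unique full-length cDNAs

# SSPC: the budget is from the text. The clone count is an ASSUMPTION,
# back-calculated from the quoted ~$500 per sequence-validated clone.
SSPC_BUDGET_USD = 7_500_000
SSPC_IMPLIED_CLONES = round(SSPC_BUDGET_USD / 500)

mgc_cost = cost_per_clone(MGC_BUDGET_USD, MGC_CLONES)
sspc_cost = cost_per_clone(SSPC_BUDGET_USD, SSPC_IMPLIED_CLONES)

print(f"MGC:  ${mgc_cost:,.0f} per cDNA")    # about $2,000, as stated
print(f"SSPC: ${sspc_cost:,.0f} per clone")  # $500 by construction
```

On these numbers the Mammalian Gene Collection works out to a little over $2,000 per cDNA, consistent with the rounded value given above.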
PI: Joseph R. Ecker, The Salk Institute for Biological Studies, ecker{at}salk.edu NSF Arabidopsis 2010 Project No. 0115103; http://signal.salk.edu/tabout.html Overall Goals With the availability of the entire Arabidopsis genome sequence
(Arabidopsis Genome Initiative, 2000
Method
The SIGnAL has established high-throughput genome sequencing methods to identify the sites of Agrobacterium tumefaciens T-DNA insertions in the Arabidopsis genome. Individual T-DNA-transformed plants from the Alonso/Crosby/Ecker collection (Arabidopsis ecotype Columbia [Col-0] strain) are grown in a 96-well format, genomic DNA is prepared, flanking plant DNA is recovered by adapter ligation/suppression PCR amplification of the T-DNA insertion site, and DNA sequences of the products are determined. As is typical for T-DNA transformation, approximately 50% of the transformed plants contain more than one T-DNA integration event. However, no attempt is made to physically separate the products before sequencing because this would create unmanageable tracking issues. In most cases where two or more plant flanking sequences are amplified from a single plant line, a single high-quality DNA sequence is obtained from the longest insertion site PCR product. Each T-DNA sequence is aligned with the latest version of the annotated Arabidopsis genome in GenBank (current version: January 10, 2002). A single best location (based on E value) for each insertion sequence is determined, and annotation of a best approximation of insertion site is added (5'-untranslated region, exon, intron, and 3'-untranslated region; see frequently asked questions [FAQ] page for more details). The sequence data is made available via a Web-accessible graphical interface, the SIGnAL Arabidopsis Gene Mapping Tool (http://signal.salk.edu/cgi-bin/tdnaexpress), that provides both text and DNA searches of the insertion mutant database. All T-DNA insertion site sequences with genome homology are deposited into GenBank (GSS Division) and also provided to The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org).
Resources Made Available
Each month, seeds (approximately 100-µL vol.) from each Salk T-DNA insertion line are deposited with the ABRC. 
The ABRC distributes seed to the community and to the Nottingham Arabidopsis Stock Centre (UK); the SIGnAL laboratory does not distribute seeds to individual investigators. ABRC is propagating a subset of the Salk T-DNA insertion mutants. Each month, we provide ABRC with approximately 6,250 insertion lines (approximately 63 boxes of 100 individual T3 generation seeds) and a corresponding gene "hit" list. This allows the ABRC to prioritize its seed propagation program to initially focus on amplification of plant lines containing insertions within genes (versus lines with T-DNA insertions between genes). Importantly, no attempt is being made to identify lines that are homozygous for the insertion. Investigators are cautioned to confirm the presence of the expected T-DNA insertion using PCR (see FAQ page). We have made every attempt to reduce tracking and contamination problems. However, as in other high-throughput operations, mechanical or human error makes such problems inevitable. Therefore, the Salk insertion lines are provided to the ABRC "as is." Users are expected to confirm our results before initiating their experiments. Please check our FAQ page (http://signal.salk.edu/tdna_FAQs.html) for experimental details regarding the confirmation of insertion targets before contacting the PI with questions.
Progress
During the first 6 months since initiation of our funding, we have identified approximately 32,500 sequence-indexed insertion lines and made the seeds for each corresponding Salk mutant available through the ABRC. This corresponds to approximately 5,000 insertion mutants per month and translates to approximately 9,000 unique gene mutations. To put these results in perspective, our 6-month total for identified Arabidopsis gene mutations is greater than the entire accumulated total number of community-identified gene mutations available in the public domain. 
Cost
The full cost of the project (direct and indirect costs) for propagation of individual lines, preparation of genomic DNA, ligation/PCR amplification of plant flanking sequences, DNA sequencing reaction/product separation, sequence analysis, insertion site gene annotation, database development, cleaning/packaging of seed, and bar coding/shipment of individual mutant lines is $20.00 per individual Salk insertion line.
Citation
Because we plan to submit the results of this study for publication, we request that you do not cite this project summary as a reference to our project. Instead, until publication, we suggest the following acknowledgment: "We thank the Salk Institute Genomic Analysis Laboratory for providing the sequence-indexed Arabidopsis T-DNA insertion mutants." Finally, we request that investigators include the Salk accession number (Salk_xxxx) in all publications that describe mutants deposited by our laboratory with the ABRC.
Summary
The creation of an easy-to-use graphical Web interface (SIGnAL Arabidopsis Gene Mapping Tool) in conjunction with our database containing the insertion site sequence information and the availability of the corresponding mutant lines in public stock centers provide researchers with ready access to complete or partial loss-of-function mutants in most Arabidopsis genes, allowing the testing of hypotheses about gene function at an unprecedented rate.
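The Method above assigns each insertion sequence a single best genome location based on E value and then annotates the approximate insertion site (5'-untranslated region, exon, intron, or 3'-untranslated region). A minimal sketch of that selection and annotation logic follows. It is not the SIGnAL pipeline: the `Hit` and `Feature` records, the function names, and all coordinates are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Hit:
    """One alignment of a flanking sequence to the genome (hypothetical record)."""
    chromosome: str
    position: int    # genome coordinate where the flanking sequence starts
    e_value: float   # alignment E value; smaller means a better match


@dataclass(frozen=True)
class Feature:
    """One annotated gene region (hypothetical record)."""
    kind: str        # "5'-UTR", "exon", "intron", or "3'-UTR"
    start: int
    end: int         # inclusive


def best_hit(hits: Iterable[Hit]) -> Optional[Hit]:
    """Return the hit with the lowest E value, or None if there are no hits."""
    return min(hits, key=lambda h: h.e_value, default=None)


def annotate(hit: Hit, features: Dict[str, List[Feature]]) -> str:
    """Name the annotated region containing the hit, else 'intergenic'."""
    for feature in features.get(hit.chromosome, []):
        if feature.start <= hit.position <= feature.end:
            return feature.kind
    return "intergenic"


# Made-up gene model and alignments, for illustration only.
features = {
    "chr1": [
        Feature("5'-UTR", 1000, 1099),
        Feature("exon", 1100, 1499),
        Feature("intron", 1500, 1699),
        Feature("exon", 1700, 1999),
        Feature("3'-UTR", 2000, 2149),
    ]
}
hits = [Hit("chr3", 52000, 1e-5), Hit("chr1", 1550, 1e-80)]

top = best_hit(hits)
print(top.chromosome, top.position, annotate(top, features))  # chr1 1550 intron
```

Keeping only the single best location mirrors the text, but it also means that a second insertion in the same line goes unreported, which is one reason users are asked to confirm each insertion by PCR.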
PI: Asim Esen, Virginia Polytechnic Institute and State University, aevatan{at}vt.edu; PI: Jonathan E. Poulton, University of Iowa, jonathan-poulton{at}uiowa.edu; Co-PI: Chi-lien Cheng, University of Iowa, chi-lien-cheng{at}uiowa.edu; Co-PI: Ming-Che Shih, University of Iowa, ming-che-shih{at}uiowa.edu; Co-PI: Mohamed Ali, Virginia State University, amohamed{at}vsu.edu; Co-PI: Brenda Winkel-Shirley, Virginia Polytechnic Institute and State University, winkel{at}vt.edu; Co-PI: David R. Bevan, Virginia Polytechnic Institute and State University, drbevan{at}vt.edu; International Collaborator: Bernard Henrissat, Centre National de la Recherche Scientifique, France, bernie{at}cfmb.cnrs-mrs.fr; International Collaborator: Birger L. Møller, The Royal Agricultural and Veterinary University, Denmark, blm{at}kvl.dk NSF Arabidopsis 2010 Projects 0114666 and 0115937; http://www.biology.uiowa.edu/Arabidopsis/; http://www.biol.vt.edu/faculty/esen/glycosidaselab Widely distributed in animals, plants, and microbes,
O-glycoside hydrolases (EC 3.2.1.-) catalyze the cleavage of
chemical bonds between two or more carbohydrates or between a
carbohydrate and a non-carbohydrate moiety. This collaborative research
project will focus on approximately 75 members of two related families of Arabidopsis glycoside hydrolases. Family 1 includes Specific objectives of our 2010 Project are: (a) to undertake
phylogenetic analyses of the Arabidopsis Resources available to the public include: (a) cloned cDNAs of
all Arabidopsis
PI: Mark Estelle, Indiana University, mestelle{at}mail.utexas.edu; Co-PI: Bonnie Bartel, Rice University, bartel{at}bioc.rice.edu; Co-PI: John L. Celenza, Boston University, celenza{at}bio.bu.edu; Co-PI: Jerry D. Cohen, University of Minnesota, cohen047{at}tc.umn.edu; Co-PI: Jennifer Normanly, University of Massachusetts, normanly{at}chemserv.chem.umass.edu NSF Plant Genome Project No. 0077769; http://www.auxin.org The plant hormone auxin (indole-3-acetic acid [IAA]) functions in a multitude of plant growth and developmental processes. Modern molecular and genetic approaches combined with the development of highly sensitive methods of measuring IAA levels have allowed significant advances in the understanding of auxin biosynthesis, transport, metabolism, and response in Arabidopsis. Mutants with visible "auxin phenotypes" have revealed that the pathways involved in IAA homeostasis form a complex network with considerable redundancy. The goal of this project is to build from these insights by identifying novel genes and pathways involved in the physiological maintenance of auxin homeostasis (e.g. synthesis, transport, degradation, conjugation, etc.). Toward this end, we are in the process of developing a high-throughput analytical screen for chemically generated Arabidopsis mutants possessing slightly altered IAA levels. We expect that many of these mutants will not have visible phenotypes and thus should allow us to identify genes that would not have been identified in a conventional screen for auxin phenotypes. For selected mutants displaying altered IAA levels, we will isolate the corresponding wild-type gene and characterize its involvement in IAA homeostasis and function. In addition, we are developing several novel high-throughput screens for mutants affected in aspects of auxin homeostasis or response. 
A High-Throughput Analytical Screen for Mutants
The current method for measurement of IAA levels in tissue involves prepurification by solid-phase extraction (SPE) of tissue homogenate followed by IAA methylation, purification by HPLC, and gas chromatography-selective ion monitoring-MS quantification using isotope dilution analysis with [13C6]IAA. This method is very time- and labor-intensive, and thus is not amenable to the high-throughput screening requirements that are central to this project's objective. To develop a protocol for measuring IAA tissue titers that is compatible with high-throughput screening, we will modify the current method of IAA tissue quantification. First, we will alter the mechanics and technical aspects of the tissue homogenization and extraction, SPE, and methylation steps to allow them to be performed robotically in a 96-well format. Second, because HPLC purification is inherently incompatible with high-throughput screening, we will replace this step, which precedes gas chromatography-selective ion monitoring-MS, with an IAA immunopurification step. We are in the process of developing a novel monoclonal IAA antibody for use in this method. In developing the steps involving automated IAA extraction from Arabidopsis seedling tissue, we have performed preliminary evaluations of two methods of tissue disruption that are amenable to a high-throughput format: (a) serial freeze-thaw cycling, and (b) tissue disruption by rapid agitation in the presence of beads. The commercial availability of high-throughput freeze-thaw cycling and rapid agitation equipment makes these methods excellent candidates for our automated tissue extraction step. In addition, in developing high-throughput SPE prepurification of IAA from tissue extracts, we have collected SPE columns in the 96-well plate format containing amino sorbent (or an anion exchanger similar to an amino functional group) from about 10 manufacturers. 
We are in the process of evaluating and comparing these SPE plates to determine which SPE column and sorbent in the 96-well format most effectively and efficiently meets our automated IAA extraction needs.
Functional Screen for Auxin-Metabolizing Enzymes
Because pathways for IAA conjugation and catabolism are likely redundant, standard mutant screens may have difficulty identifying genes involved in IAA metabolism. Therefore, we propose to screen Arabidopsis cDNA expression libraries for these activities. Libraries amenable to expression in E. coli and yeast will be used. Candidate genes then will be analyzed by reverse genetic techniques in Arabidopsis.
Gain-of-Function Screens
We have generated >30,000 lines transformed with a 35S-cDNA library (LeClere and Bartel, 2001).
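The analytical screen described above quantifies IAA by isotope dilution with [13C6]IAA. In its simplest form, a known amount of labeled standard is added to the homogenate, and the endogenous amount is estimated from the ratio of the unlabeled to the labeled ion signal. The sketch below shows only that core ratio calculation under stated assumptions: it works in molar amounts, assumes equal instrument response for the two forms, and omits the corrections for natural isotope abundance and standard purity that a real protocol would apply. The function name and the example numbers are hypothetical.

```python
def endogenous_iaa_pmol_per_g(
    standard_added_pmol: float,
    unlabeled_signal: float,
    labeled_signal: float,
    fresh_weight_g: float,
) -> float:
    """Estimate endogenous IAA (pmol per g fresh weight) by isotope dilution.

    Simplified model (assumption): the ratio of ion signals equals the molar
    ratio of endogenous IAA to the added [13C6]IAA standard.
    """
    if labeled_signal <= 0:
        raise ValueError("labeled_signal must be positive")
    if fresh_weight_g <= 0:
        raise ValueError("fresh_weight_g must be positive")
    ratio = unlabeled_signal / labeled_signal
    return standard_added_pmol * ratio / fresh_weight_g


# Made-up example: 10 pmol of standard added to 0.05 g of seedling tissue,
# with the unlabeled ion giving half the signal of the labeled ion.
estimate = endogenous_iaa_pmol_per_g(
    standard_added_pmol=10.0,
    unlabeled_signal=4500.0,
    labeled_signal=9000.0,
    fresh_weight_g=0.05,
)
print(f"{estimate:.1f} pmol IAA per g fresh weight")
```

Because the standard is added before purification, losses during SPE, methylation, or immunopurification affect both forms equally and cancel in the ratio, which is what makes the approach attractive for an automated 96-well workflow.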
PI: Robert J. Ferl, University of Florida, robferl{at}mail.ifas.ufl.edu NSF Arabidopsis 2010 Project No. 0114501; http://www.hos.ufl.edu/ferllab/14-3-3_Proteins/14-3-3_Proteins.htm One of the central paradigms of signal transduction is
that phosphorylation of an enzyme can alter its activity or subcellular location. In recent years, this paradigm has undergone a significant modification in that, for many signal transduction events,
phosphorylation alone is not enough to accomplish the transition in
activity or location. Often phosphorylation is only the first
step in the signal-induced transition and the second requisite step is
the binding of 14-3-3 proteins to complete the signal transduction event
(Ferl, 1996). The specific goals of the project are to identify in the Arabidopsis genome client proteins that possess 14-3-3 docking sites and to assay the interaction strength of the potential docking site against the various members of the family of 14-3-3 proteins. This will result in a tested, predictive algorithm for identifying the spectrum of 14-3-3 interactions that would be expected for each client protein. These data, combined with emerging data from 14-3-3 knockouts and 14-3-3 isoform expression patterns, should provide key insights into the potential regulatory functions of the 14-3-3 signal-mediating molecules, and a central, fully characterized database for predicting 14-3-3-mediated signaling in diverse pathways. Initial searches of the Arabidopsis genome predict that 14-3-3s may interact with as many as 10% to 20% of the Arabidopsis gene products. By highlighting potential regulatory sites across diverse gene families and metabolic pathways, this project will provide an opportunity to integrate many of the Arabidopsis 2010 and Genome Projects identifying specific functional aspects of Arabidopsis gene families.
PI: Mary Lou Guerinot, Dartmouth College, Guerinot{at}Dartmouth.edu; Co-PI: David Eide, University of Missouri, eided{at}missouri.edu; Co-PI: Michael Gribskov, University of California, San Diego, gribskov{at}SDSC.edu; Co-PI: Jeffrey F. Harper, The Scripps Research Institute, harper{at}scripps.edu; Co-PI: David E. Salt, Purdue University, salt{at}hort.purdue.edu; Co-PI: Julian I. Schroeder, University of California, San Diego, Julian{at}biomail.ucsd.edu; Co-PI: John M. Ward, University of Minnesota, Minneapolis-St. Paul, jward{at}tc.umn.edu NSF Plant Genome Project No. 0077378; http://plantst.sdsc.edu/ Uptake and translocation of mineral nutrients in plants is essential for plant growth and human nutrition. Despite recent advances in identifying genes involved in nutrient transport, the systems that control acquisition of individual nutrients remain largely unknown. The major objective of the proposed research is to identify gene networks that control uptake and accumulation of a wide array of plant nutrients and toxic metals. The approach makes use of recent technical advances in inductively coupled plasma (ICP)-MS that now permit the measurement of multiple elements in 1 to 2 min per plant sample. Identifying genes controlling solute uptake and accumulation has significance for agriculture, human health, and the environment. For example, enhancing the ability of a crop plant to mobilize soil nutrients should reduce the use of fertilizers, thereby making agriculture more cost efficient and less polluting. Because plants are the primary source of food for humans, either directly or through animal feed, the nutritional value of plants is of central importance to human health. The most widespread nutritional problem in the world is iron deficiency. Increasing the ability of plants to provide higher levels of minerals, such as iron, will have a dramatic impact on human health. 
Furthermore, understanding the pathways by which toxic metals accumulate in plants will enable the engineering of plants to exclude toxic metals and create healthier food sources, or to extract toxic metals from the soil to clean up polluted lands and water. This project will functionally identify many important genes, including
those that are involved in: (a) mobilizing nutrients in the
rhizosphere, (b) cellular uptake and efflux systems, (c) subcellular
compartmentalization of solutes, (d) the operation of phloem and xylem
translocation systems, (e) central regulatory mechanisms, (f) sensing
nutrient levels, and (g) controlling root structure. This functional
genomic investigation will provide the first integrated picture, to our
knowledge, of the genes involved in a fundamental feature of
all living systems
Proposal Goals
The main aims of the proposal are to: (a) use bioinformatics to identify genes that potentially encode transporters; (b) use mRNA expression profiling to identify genes that change expression in response to nutrient deprivation or overfeeding; (c) use nutrient profiling to screen for mutant plants with abnormal element compositions (ICP will be used in a high-throughput strategy to determine the relative element composition of approximately 50,000 mutagenized plants); (d) use yeast to obtain functional predictions of plant orthologs (the primary approach will be to conduct ICP nutrient profiling of approximately 5,000 knockout lines of yeast); (e) establish a Web site to provide access to data sets and enhanced annotation of genes; and (f) initiate collaborative research focused on selected mutations that control accumulation of Fe, Zn, K, Na, Ca, Se, and Cd to further demonstrate the power of this novel approach.
Progress Highlights
We have now analyzed 3,000 of the 5,000 viable knockout strains of yeast via ICP. We will complete the analysis by September 2002. The ICP data is available on the Web (http://plantst.sdsc.edu/) and can be searched for mutants whose metal content is increased or decreased relative to wild type. Screening of Arabidopsis lines is also under way and this data will also be available on the Web. All plants showing a reproducible >2 SD increase/decrease from wild type in their ion profile are being defined as mutants of interest. Seeds are collected from these plants and the progeny reanalyzed to confirm their mutant status. We are focusing on 18 elements including Na, P, K, Ca, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, and Cd. As an ongoing part of this project, we are developing a Web site on
mineral nutrient uptake and translocation
(http://plantst.sdsc.edu/). Annotation remains a challenge and we
will be implementing a community-based approach to the annotation of
transporter families. To facilitate gene expression studies, an
improved annotation for the GeneChip Arabidopsis Genome Array is
available for downloading from the Web
(http://www-biology.ucsd.edu/labs/schroeder/genechip.html; Ghassemian et al., 2001). To identify target genes for analysis in this project, all potential
membrane proteins in the Arabidopsis genome were grouped into families
of unknown, known, or predicted function (Maser et al., 2001).
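The screening criterion given under Progress Highlights (a deviation of more than 2 SD from wild type in the ion profile) can be sketched as a per-element z-score test. The code below is a minimal illustration, not the project's analysis software: the function name, the data layout, and all concentrations are hypothetical, and it tests a single measurement, whereas the text additionally requires the deviation to be reproducible and confirmed in progeny.

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_elements(
    wild_type: Dict[str, List[float]],
    sample: Dict[str, float],
    threshold_sd: float = 2.0,
) -> Dict[str, float]:
    """Return {element: z-score} for elements deviating beyond the threshold.

    wild_type maps each element to replicate wild-type measurements;
    sample maps each element to the single value measured in one plant.
    """
    flagged = {}
    for element, value in sample.items():
        reference = wild_type[element]
        sd = stdev(reference)  # sample standard deviation of wild type
        if sd == 0:
            continue  # no spread in the reference; z-score is undefined
        z = (value - mean(reference)) / sd
        if abs(z) > threshold_sd:
            flagged[element] = z
    return flagged


# Made-up concentrations (arbitrary units), for illustration only.
wild_type = {
    "Fe": [100.0, 102.0, 98.0, 101.0, 99.0],
    "Zn": [50.0, 52.0, 48.0, 51.0, 49.0],
    "Cd": [1.0, 1.2, 0.8, 1.1, 0.9],
}
candidate = {"Fe": 110.0, "Zn": 51.0, "Cd": 0.5}

for element, z in flag_elements(wild_type, candidate).items():
    direction = "increase" if z > 0 else "decrease"
    print(f"{element}: {direction} ({z:+.1f} SD)")
```

In this made-up example Fe is flagged as increased and Cd as decreased, while Zn stays within 2 SD of wild type. With 18 elements tested per plant, some single-measurement outliers are expected by chance, which is consistent with the requirement above that progeny be reanalyzed before a line is called a mutant.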
PI: Andrew D. Hanson, University of Florida, Gainesville, adha{at}mail.ifas.ufl.edu; PI: Yair Shachar-Hill, New Mexico State University, Las Cruces, yairhill{at}nmsu.edu NSF Arabidopsis 2010 Projects 0114117 and 0113620; http://www.hos.ufl.edu/meteng/1Cpage1.html The network of C1 reactions provides
C1 units for use in biosynthesis. It is crucial
to plant metabolism, but many of the network's enzymes are
known poorly or not at all. Accordingly, the goal of this
collaborative project is to determine functions for genes that
putatively encode: 10-formyl-tetrahydrofolate (THF) deformylase,
sarcosine oxidase, formamidase, 5-formyl-THF cycloligase, S-formylglutathione hydrolase, Glu forminotransferase, Met
To date, we have shown that the putative 5-formyl-THF cycloligase gene encodes this activity, and that the enzyme is mitochondrial, which is not the case in other eukaryotes. We have also confirmed that the putative S-formylglutathione hydrolase gene specifies this enzyme. This research will meet 2010 project objectives by assigning biochemical and metabolic functions to most of the unexpected, unexplained, and unexplored genes in plant C1 metabolism. The broader impact will be 3-fold. First, relating to crop improvement, many efforts to genetically engineer plants for human benefit involve changes to C1 metabolism, making it vital to understand C1 metabolism such that it can be engineered successfully. Second, relating to basic plant biochemistry, C1 metabolism is perhaps the least well-understood area of plant primary metabolism despite its central position in processes such as photorespiration, lignification, and alkaloid synthesis. Third, relating to biochemistry in general, the plant C1 metabolic network is special, not merely a minor variation on those in bacteria, yeast, or mammals. The project Web site (http://www.hos.ufl.edu/meteng/1Cpage1.html) contains a detailed outline of the proposed research and lists recent publications. This site will be used to post research results as they become available, and to catalog the full-length cDNAs, antibodies, and seed stocks generated in the project.
PI: Alice C. Harmon, University of Florida, harmon{at}botany.ufl.edu; Co-PI: John C. Cushman, University of Nevada, Reno, jcushman{at}unr.edu; Co-PI: Jeffery F. Harper, The Scripps Research Institute, harper{at}scripps.edu; Co-PI: Estelle M. Hrabak, University of New Hampshire, emhrabak{at}cisunix.unh.edu; Co-PI: Michael R. Sussman, University of Wisconsin, Madison, msussman{at}facstaff.wisc.edu NSF Arabidopsis 2010 Project No. 0114769; http://plantsp.sdsc.edu The genes under study belong to the Arabidopsis
calcium-dependent protein kinases (CDPKs)/SNF1-related kinase (SnRK)
family of protein kinases (Harmon, 2001). This 2010 project has three scientific goals that will not only provide information about the function of 64 of these protein kinases and materials for their study, but will also provide methods of analysis that can be applied to the study of other families of protein kinases. The educational goal is an outreach project, in which we will work with junior college instructors to develop course materials and Web-based virtual labs. The data from this project will be deposited in TAIR (http://www.Arabidopsis.org) and PlantsP (http://plantsp.sdsc.edu) databases, and DNA constructs will be made available through the Arabidopsis Stock Center (http://www.Arabidopsis.org/abrc).
Scientific Goals and Progress
Identify Kinase Substrates Using Substrate Traps
Substrate traps are kinase constructs that have been engineered to stabilize or prolong their interaction with protein substrates. These constructs will be used in yeast or bacterial two-hybrid systems to identify substrates and interacting proteins. We are using a proven substrate trap design (Patharkar and Cushman, 2000).
Determine the Subcellular Locations of Membrane-Associated or Compartmentalized Kinases
Our approach is to co-express and visualize by confocal microscopy both kinases tagged with yellow fluorescent protein (YFP) or GFP and markers specific for different membrane or cytoskeletal proteins tagged with cyan or red fluorescent protein. Evaluation of candidate marker proteins is in progress, and cloning and construction of tagged kinases is under way.
Use MS to Identify Substrates Phosphorylated by Kinases in Vitro and Map the Phosphorylation Sites in Substrates and Kinases
Arabidopsis proteins in cell extracts that are phosphorylated in vitro by recombinant protein kinases will be isolated and identified by MS and their sites of phosphorylation will be sequenced. 
This approach will yield consensus phosphorylation motifs for the kinases and will help address the question of overlaps in substrate specificity between kinase isoforms. Cloning of full-length kinases in expression vectors is under way, and methods for isolating phosphoproteins and peptides are in development.
PI: Richard Jorgensen, University of Arizona, raj{at}ag.arizona.edu; Co-PI: Judith Bender, Johns Hopkins University, bender{at}welchlink.welch.jhu.edu; Co-PI: Vicki Chandler, University of Arizona, chandler{at}ag.arizona.edu; Co-PI: Karen Cone, University of Missouri, conek{at}missouri.edu; Co-PI: , Purdue University, gelvin{at}bilbo.bio.purdue.edu; Co-PI: Heidi Kaeppler, University of Wisconsin, hfkaeppl{at}facstaff.wisc.edu; Co-PI: Shawn Kaeppler, University of Wisconsin, smkaeppl{at}facstaff.wisc.edu; Co-PI: David Mount, University of Arizona, mount{at}u.arizona.edu; Co-PI: Craig Pikaard, Washington University, pikaard{at}biology.wustl.edu; Co-PI: Eric Richards, Washington University, richards{at}biology.wustl.edu NSF Plant Genome Project No. 9975930; http://www.chromdb.org The goal of this project is to generate and analyze mutations in the full complement of genes in Arabidopsis and maize that contribute to chromatin-level gene regulation. Sequence similarity searches of the Arabidopsis genome sequence have identified 180 predicted non-histone chromatin proteins in the following classes: histone acetyltransferases (12), histone deacetylases (18), SWI2/SNF2 homologs (21, not including putative recombination/repair proteins), components of SWI2/SNF2 complexes (seven), DNA methyltransferases (seven), methyl DNA-binding proteins (12), nucleosome/chromatin assembly factors (25), linker histones (five), SET domain proteins (35), MAR binding factor (one), global transcription factors (six), gene silencing factors (five), bromodomain proteins not included in other categories (25), and chromodomain proteins not included in other categories (one). Information about these genes, including splicing models, predicted protein sequences, and availability of mutants, is provided at The Plant Chromatin Database, ChromDB (http://www.chromdb.org). Gene function data generated in this project will also be made available via ChromDB. 
Dominant Negative Mutations
Because genetic tests of chromatin gene function such as nucleolar dominance and gene silencing effects require or are more efficiently carried out with dominant mutations, dominant negative mutations are being produced for each target chromatin gene using double-stranded RNA (dsRNA) silencing. This involves introduction into Arabidopsis of transgenes producing dsRNA molecules homologous to target genes. Two or more independent, single-copy, homozygous dsRNA lines are being produced for each target Arabidopsis gene and deposited with the ABRC. dsRNA mutations are also being generated for 100 target chromatin genes in maize. Where dominant negative mutants are lethal or deleterious, dexamethasone-inducible dominant negative mutations are being generated by use of a dexamethasone-inducible dsRNA construct. dsRNA vectors are available via ChromDB.
Insertional Mutations
T-DNA insertional mutations have been identified in the Feldmann Arabidopsis T-DNA collection for most histone encoding, histone acetyltransferase, and histone deacetylase genes. In addition, the Salk T-DNA collection deposited at ABRC is being searched with chromatin gene sequences as queries to identify potential mutants. All mutants identified are listed at ChromDB (http://www.chromdb.org) with accession numbers.
Functional Characterization of Mutations
All mutations in Arabidopsis and maize will be characterized to determine their effects on genetic transmission, plant growth, and development, and in a comprehensive battery of biochemical and epigenetic tests. These tests include DNA methylation, histone acetylation, the processes of epimutation and paramutation, reactivation of silenced transgenes and transposons, the efficiency of A. tumefaciens T-DNA integration, and nucleolar dominance. 
GAL4 Tethering Assays
Finally, fusions of chromatin gene products (especially histone deacetylases) to the GAL4 DNA-binding domain are being tested in Arabidopsis for effects on a set of 35S:luciferase reporter transgenes possessing a GAL4 upstream activating sequence to determine the ability of candidate genes to reverse or promote the formation of repressive chromatin. A set of 25 single-copy GAL4UAS:35S:luciferase reporter loci at known locations in the Arabidopsis genome, in a variety of sequence contexts, is being produced and will be deposited at ABRC. A smaller set of minimal 35S reporter loci is also being generated to assess the ability of proteins such as histone acetyltransferases to activate gene expression.
PI: Joseph Kieber, University of North Carolina, jkieber{at}unc.edu; Co-PI: G. Eric Schaller, University of New Hampshire, egs{at}cisunix.unh.edu; Co-PI: Estelle Hrabak, University of New Hampshire, emhrabak{at}cisunix.unh.edu; Co-PI: Robert M. Pope, University of North Carolina, Chapel Hill, rmpope{at}emailunc.edu NSF Arabidopsis 2010 Project No. 0114965; http://www.bio.unc.edu/research/two-component/default.htm Two-component systems are the primary means by which
bacteria sense and respond to environmental stimuli (Stock et al.,
2000). The expression pattern of many of these genes will be determined using
a combination of GUS fusions and RNA in situ analysis. The
location of representative proteins within the cell is also being delineated. To facilitate localization studies, 10 monoclonal antibodies are being generated to marker proteins, each of which resides at a distinct membrane location within the cell. To identify proteins that interact with these signaling elements and to determine the in vivo interactions among the various members of these protein families, protein complexes from Arabidopsis will be purified and
analyzed using a tandem affinity purification procedure (Rigaut et al.,
1999). The data from these studies are being deposited on a publicly accessible Web page (http://www.bio.unc.edu/research/two-component/default.htm). A link to this Web site has been established on the TAIR database, and we will coordinate with TAIR to deposit data as appropriate. The knockout seeds will be made publicly available through deposition in the ABRC Stock Center as they become characterized. The monoclonal antibodies that are raised against the membrane marker proteins will be available for the cost of shipping through the University of New Hampshire and the cell lines will also be deposited with the American Type Culture Collection. These studies should uncover the functions of these two-component signaling elements in Arabidopsis. The proteins encoded by these gene families are predicted to interact; thus, our studies should aid in the development of a paradigm for signaling specificity among interacting members of large gene families. In addition, tools will be developed that will be generally applicable in defining the subcellular location of proteins in Arabidopsis.
PI: Eric Lam, Rutgers University, lam{at}aesop.rutgers.edu; Co-PI: Robert A. Martienssen, Cold Spring Harbor Labs, martiens{at}cshl.org; Co-PI: Richard W. McCombie, Cold Spring Harbor Labs, mccombie{at}cshl.org; Co-PI: David L. Spector, spector{at}cshl.org NSF Plant Genome Project No. 0077617; http://aesop.rutgers.edu/~lamlab/ccharting.html The nucleus is the subcellular organelle in which the bulk
of the genomic information within a eukaryotic cell is organized. From
studies using hybridization technologies and microscopy work with
serial or optical sections of fixed cells, a picture of an organized
subnuclear structure has emerged. More recently, the application of the
GFP as an in vivo tag of genomic DNA has allowed the visualization of
chromatin in live cells of animals, fungi, and plants (Kato and Lam,
2001). To achieve the objective of visualizing and charting sequences corresponding to each of the chromosomes of Arabidopsis, we will deploy three distinct autofluorescent proteins (AFPs), fused to three different heterologous DNA-binding proteins, as in vivo tags for concatameric binding site arrays that correspond to high-affinity targets for these proteins. We will generate about 1,000 insertions with one of these binding site arrays that will be dispersed within the five chromosomes of Arabidopsis. The relative locations of these tagged regions will be measured against two other reference insertions that carry the other two distinct binding site arrays, which can be visualized with the corresponding AFP-protein fusions. Comparative analyses of the relative positions between defined regions of the genome in space and time will provide novel information about the organizational principles that control the structure and dynamics of chromatin. Concurrent with our optical studies to track the relative subnuclear location and movement for distinct regions of the genome, we will quantitate the effects of genome location on the transcriptional potential of a reporter gene. Together, these studies should provide the first comprehensive three-dimensional physical and transcription activity maps for a genome, to our knowledge, and should contribute significantly to our understanding of the roles that subnuclear location may play in controlling gene expression. Our proposed study should generate more than 1,000 mapped insertion lines of Arabidopsis with three distinct and optically tractable AFP tags at defined locations within the genome. These materials should be invaluable for the characterization of chromatin-related mutations that affect gene expression and development. The number of such mutations is likely to rapidly increase due to the efforts of several genome projects that have been funded by the NSF in the past few years.
In the future, we also intend to apply the molecular tools that we have generated from this project to important crop plants such as maize and rice. The fusion of cutting edge imaging technology with the wealth of classical and modern cytogenetics in maize should provide new perspectives on global control of genetic information as well as epigenetic phenomena such as paramutation. These new insights will facilitate our understanding of how genomic information is organized in plants and how gene expression can be regulated at a global scale. As such, the tools and knowledge generated by this proposed work should benefit future efforts to improve the quality and yield of crop plants. The following information/materials will be generated and made available to the community: (a) binary vectors for regulated expression of AFP fusions as DNA tracking systems for in vivo analysis of chromatin organization, (b) approximately 1,000 mapped lines of transgenic Arabidopsis containing multicolored tracking systems for studies of chromatin dynamics, (c) distance maps for the 1,000 dispersed insertions relative to two different sets of reference insertions, and (d) relative expression levels for a common reporter gene (35S-Luc-nos) at the 1,000 mapped insertion sites to compare the effects of genome location on gene activity.
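Deliverable (c), the distance maps of dispersed insertions relative to reference insertions, can be pictured with a minimal sketch. The summary does not say how distances will be measured, so this assumes straight-line (Euclidean) distances between 3D positions of fluorescent spots in arbitrary units. The function name `distance_map`, the line and reference names, and all coordinates are hypothetical.

```python
import math

def distance_map(tagged, references):
    """Return {line: {reference: distance}} for 3D spot coordinates.

    Assumption: each tagged locus and each reference insertion has been
    reduced to a single (x, y, z) position in the same nucleus.
    """
    return {
        line: {ref: math.dist(pos, ref_pos) for ref, ref_pos in references.items()}
        for line, pos in tagged.items()
    }

# Hypothetical coordinates (arbitrary units) for two reference insertions
# and two of the dispersed tagged insertions.
references = {"ref_A": (0.0, 0.0, 0.0), "ref_B": (3.0, 4.0, 0.0)}
tagged = {"line_001": (3.0, 0.0, 0.0), "line_002": (0.0, 4.0, 0.0)}

dmap = distance_map(tagged, references)
for line, dists in dmap.items():
    print(line, dists)
```

Repeating such measurements over many nuclei and time points would give distributions rather than single values; the sketch shows only the per-nucleus calculation.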
PI: Norman G. Lewis, Washington State University, Pullman, lewisn{at}wsu.edu; Co-PI: Laurence B. Davin, Washington State University, Pullman, davin{at}wsu.edu; Co-PI: Vincent R. Franceschi, Washington State University, Pullman, vfrances{at}wsu.edu NSF Arabidopsis 2010 Project No. 0117260; http://ibc.wsu.edu/lewislab/nsf/index.html Establishing the physiological function of 248 Arabidopsis enzymes and proteins presumed to be involved in various networks of phenylpropanoid-acetate metabolism is the overall goal of this ongoing study. There are two main objectives. The first is to identify the networks associated with phenylpropanoid coupling/polymerization (e.g. leading to lignins, lignans, suberins, sporopollenins, etc.), including how these enzymes/proteins function. The second is to identify precisely the different networks in Arabidopsis that are involved in the conversion of Phe through to the monolignols. In both objectives, functions of specific enzymes will be demonstrated in vitro, and the true physiological roles of these proteins will be elucidated by demonstrating temporal and spatial correlation with segments of the metabolic pathway networks involved. This work will define the organization of the various phenylpropanoid radical-radical coupling and related metabolic processes in Arabidopsis through its entire life cycle. The benefits to the scientific community will include rapid dissemination of results (before publication) through a Web site linked to the Arabidopsis sites, and provision of research materials (genes, constructs, recombinant proteins, and transgenic and mutant plants) as needed. Another important benefit will be new knowledge of these hitherto difficult systems (e.g. coupling/polymerization) involving macromolecular assemblies.
For enzymes/genes chosen that ultimately prove not to be involved in these pathways, metabolite profiling is expected to provide clues as to function, which will then be examined as well. Further, in addition to lignification, this study will shed important light on other highly regulated radical-radical phenolic coupling systems in vivo including: construction of seed coats and metabolites therein, generation of the (strengthened) matrix of trichomes, formation of suberized tissue and strengthening flower stalks, biosynthesis of sporopollenin (a remarkably stable component of pollen grains), reinforcement of cutinized tissue, cross-linking cell wall carbohydrates through hydroxycinnamic acid (phenolic) coupling, and production of a plethora of defense-related compounds. In addition to publication in peer-reviewed journals, the research findings and the information and materials generated by this research will be made available to public databases, and will be updated monthly by posting data/information on a dedicated Web site (http://ibc.wsu.edu/lewislab/nsf/index.html) that will be linked to the Arabidopsis network sites relevant to the project. This information will include: recombinant protein expression vectors constructed by our laboratory, gene identification and gene function analysis, kinetic data, gene expression profiles, metabolite and lignin analysis, in situ hybridization data, and light microscopy documentation. A multipronged strategy has been taken to address the research goals. Initially, a series of "digital" northern assays were carried out to identify possible tissue distributions of each gene of interest; these data can be conveniently depicted in graphical form and are derived from The Institute for Genomic Research (TIGR; Rockville, MD) database of (tentative consensus) sequence ESTs. Although this information provides some inkling as to the tissue in which a gene is likely to be expressed, the data are still very preliminary and incomplete.
Accordingly, this study will significantly enhance the information in the database for the genes selected. In the first 6 months of this research project, monolignol pathway enzymes (initially genes encoding Phe ammonia lyase [four genes], cinnamoyl CoA reductase [nine genes], cinnamyl alcohol dehydrogenases [13 genes], and putative allylic phenylpropanoid double bond reductases [10 genes]) have all been cloned, and for the most part expressed in recombinant form, as a prelude to establishing physiological functions (i.e. identifying the corresponding phenylpropanoid network(s) involved). In addition, monolignol coupling enzymes and proteins (16 dirigent proteins, 71 peroxidases, and 23 laccases) are also under investigation in this initial phase. The 16 dirigent proteins have been individually cloned and are being overexpressed in Arabidopsis in sense and antisense orientations, as well as in insect cells. In addition, the promoter regions of each dirigent gene have been isolated (approximately 670-2,035 bp) and various reporter gene strategies are being utilized to define the different patterns of expression in individual tissues (e.g. from stems to seeds). An analogous approach is under way to study the role of the various peroxidase(s) and laccase(s).
PI: Emmanuel Liscum, University of Missouri, liscume{at}missouri.edu NSF Arabidopsis 2010 Project No. 0114992; http://www.biosci.missouri.edu/liscum/NPH3-RPT2family.html Of the approximately 11,000 predicted protein families
encoded by the Arabidopsis nuclear genome, only approximately 150 appear to have no paralogs in non-plant species (Arabidopsis Genome
Initiative, 2000). Loss-of-function alleles are being isolated from a number of publicly
available T-DNA-insertional mutant lots, whereas gain-of-function alleles are being generated by driving (over)expression of each gene
from its native promoter fused to four transcriptional enhancer elements from the cauliflower mosaic virus 35S promoter. We
expect that this "activation expression" approach (derived in name
and concept from Weigel et al., 2000 We are developing an amplified fragment-length
polymorphism-based (Liscum, 1999
PI: Alan Lloyd, University of Texas, Austin, lloyd{at}uts.cc.utexas.edu; Collaborator: Vaughan Symonds, University of Texas, Austin, vsymonds{at}mail.utexas.edu NSF Arabidopsis 2010 Project No. 0114976; http://www.Arabidopsis2010.org Variation is the essence of genetics. The most widely
exploited method for functional analysis of genes in Arabidopsis and other model genetic organisms has been to use investigator-imposed mutation to induce genetic variation of interest. There are limitations to this approach. Mutations in essential genes are often difficult to
isolate and work with due to lethality or morbidity. Furthermore, genes
that are even partially redundant may never be revealed by this method.
This is compounded in Arabidopsis by the finding that 64% of the
composite ORFs have a match elsewhere in the genome (Vision et al.,
2000). In addition to the classical genetic approach, there is a largely unexploited source of natural genetic and phenotypic variation readily available for use in Arabidopsis. Several hundred independent Arabidopsis ecotypes or accessions are publicly available and these contain quantitative character variation far beyond the common lab strains. Because Arabidopsis is almost entirely a self-pollinating plant, individual plants of these accessions are essentially homozygous inbred lines. One way to exploit this natural variation is to use quantitative genetics and develop means to map QTLs between homozygous parental pairs of heretofore-unused natural accessions. A QTL analysis is a way of simultaneously mapping multiple loci that are responsible for observable segregating trait variances in progeny from the hybridization of two individuals. Specifically, a QTL analysis provides estimates of how many and which regions of the genome affect the variation of a trait segregating between the progeny of two inbred lines, often with the ultimate goal of identifying the specific QTL involved. The approach relies on quantitative rather than qualitative differences, so that the function of many loci, which may be duplicate, essential, or ecotype specific, can be determined by this method. Although QTLs can be mapped in the F2 generation, the production of genetically mapped RILs provides a permanent resource that alleviates most of the work involved in QTL mapping. At present, relatively few accessions have been incorporated into mapping populations. The goal of this project is to develop a resource for the scientific
community that promotes expanded use of natural genetic variation
toward the objective of assigning a function to each Arabidopsis gene.
At least four new sets of mapped RILs will be produced from pairs of
previously unused lines that demonstrate as wide an array of variation
as possible. Ninety-five individuals, from each set of 400 F8 generation RILs, will be mapped at 100 loci,
generating maps with a density of approximately 6 cM. Potential RIL
parents from 100 wild-type accessions have been systematically analyzed
for genetic and phenotypic variation using pair-wise distances and
principal component analysis, respectively (Fig. 1). Pairs of parents
will be chosen from these lines based on phenotype variation and
segregating F2 variance. Simple sequence length
DNA polymorphisms (Bell and Ecker, 1994) To date, we have screened eight phenotypes in 100 accessions and constructed a principal component analysis with preliminary data to determine lines with maximum phenotypic differences (Fig. 1). We have generated 1,000 random pair-wise crosses that include accessions at the extremes of the principal component analysis spectrum as well as lines that are phenotypically similar. We are scoring F2 from selected crosses to analyze segregation variation. We have scored 130 accessions for simple sequence length DNA polymorphisms at 14 of 20 eventual loci. These data have been only partially analyzed, but we have determined that there is an average of 20.6 (SD = 7.9) alleles per marker and that any one marker will not amplify in approximately 9% of the lines. These data will be placed on a Web site linked to TAIR.
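The idea behind QTL mapping in RILs, as described above, can be illustrated with the simplest possible analysis: at each marker, compare the mean trait value of lines carrying one parental allele with that of lines carrying the other. This is a sketch of single-marker analysis only, not the method the project specifies; real studies typically use interval mapping and significance thresholds. The function name, marker names, genotype coding ("A"/"B" for the two homozygous parental classes), and all data values are hypothetical.

```python
import math
from statistics import mean, variance

def single_marker_scan(genotypes, phenotypes):
    """Compare trait means between the two parental genotype classes.

    genotypes:  {marker: list of "A"/"B" calls, one per RIL}
    phenotypes: list of trait values, one per RIL, in the same order
    Returns {marker: (mean_B - mean_A, Welch t statistic)}.
    """
    results = {}
    for marker, calls in genotypes.items():
        a = [p for g, p in zip(calls, phenotypes) if g == "A"]
        b = [p for g, p in zip(calls, phenotypes) if g == "B"]
        if len(a) < 2 or len(b) < 2:
            continue  # too few lines in one class to estimate a variance
        diff = mean(b) - mean(a)
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        t = diff / se if se > 0 else float("nan")
        results[marker] = (diff, t)
    return results

# Hypothetical data: 8 RILs scored for one trait and genotyped at 2 markers.
phenotypes = [10.0, 11.0, 9.0, 10.5, 15.0, 16.0, 14.5, 15.5]
genotypes = {
    "m1": ["A", "A", "A", "A", "B", "B", "B", "B"],  # tracks the trait
    "m2": ["A", "B", "A", "B", "A", "B", "A", "B"],  # unrelated to the trait
}

scan = single_marker_scan(genotypes, phenotypes)
best = max(scan, key=lambda m: abs(scan[m][1]))
print("marker with the strongest association:", best)
```

Because RILs are essentially homozygous, each line falls cleanly into one of two classes per marker, which is what makes such a permanent mapped population reusable for any trait scored later.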
PI: David Meinke, Oklahoma State University, meinke{at}okstate.edu; Co-PI: Allan Dickerman, Virginia Polytechnic Institute and State University, dickerman{at}vt.edu; Co-PI: David Patton, Syngenta, david.patton{at}syngenta.com NSF Arabidopsis 2010 Project No. 0114866; http://www.seedgenes.org This project will contribute to an important objective of the 2010 Program: determining which genes perform essential and nonredundant functions during plant growth and development. Meeting this objective will require the identification of genes that function at different phases of the life cycle. We have chosen to focus on seed development and genes that give a visible seed phenotype when disrupted by mutation. Arabidopsis appears to contain 500 to 750 such EMB genes required for seed development and another 200 genes required for normal seed pigmentation. Our goal is to coordinate the collection, analysis, and presentation of information on these genes based on cloning of mutant alleles. Determining the cellular functions of these essential genes will complement research in other labs on the biochemical activities of specific gene products. The result will be an integrated view of essential gene functions in a model plant. Project objectives are to approach saturation for cloned EMB and seed pigment genes, standardize phenotypic characterization of the corresponding mutants, understand the functions of these genes in growth and development, determine through expression studies and comparative sequence analysis why these genes are essential, and integrate this information into a simple and robust database accessible through the Web. The Meinke lab will coordinate the project and contribute expertise in the analysis of seed mutants. Allan Dickerman and colleagues at VPI will provide expertise in database design and expression analysis. 
Project deliverables include public access to information and seed stocks for 500 mutants defective in 300 different EMB genes, similar information for another 100 pigment mutants defective in 75 genes, expression data for genes active in young seeds, and a database that should serve as a model for presenting synthesized information on large collections of mutants. Release of the first version of the database is scheduled for March 2002 (http://www.seedgenes.org). Profiles of the first 100 essential genes examined, including those described in existing research publications, will be presented at that time. Future releases are scheduled at 6-month intervals through September 2005. Members of the Arabidopsis community are encouraged to contribute information on additional EMB genes and mutant alleles identified in their own laboratories. This project was made possible through a large-scale T-DNA mutagenesis
program initiated at Syngenta 5 years ago. Several hundred tagged
mutants defective in seed development have been identified through
forward genetic analysis of 120,000 insertion lines (McElver et al.,
2001).
PI: Blake C. Meyers, University of California, Davis, bcmeyers{at}vegmail.ucdavis.edu NSF Plant Genome Project No. 0110528; http://mpss.ucdavis.edu The primary goal of this project is to demonstrate the
utility of a novel technology called massively parallel signature
sequencing (MPSS) for the quantification of gene expression in plants.
MPSS is a rapid method to produce 17-bp sequence tags that are
precisely representative of the population of messenger RNAs in a given tissue (Brenner et al., 2000). The expression data and tag sets will provide a resource for
determining the precise level of expression of many or most of the
genes in the Arabidopsis genome either under unperturbed or under
certain treatment conditions. The 17-bp MPSS tags are derived from the
3' end of a mRNA and provide a virtually unique, experimentally derived
identifier for each expressed gene. These data are comparable with those derived by the more commonly used technique, serial analysis of gene expression (SAGE; Velculescu et al.,
1995). The MPSS sequence data are most informative when the tags are compared with either a completely sequenced genome or with large collections of ESTs. Therefore, the libraries generated from Arabidopsis (ecotype Col-0) will be compared with the complete genomic sequence of this model plant. The comparison with annotations of the DNA sequence identifies the individual genes from which the tags are derived. These data can then be used for the following types of analyses: to quantify and experimentally confirm gene expression and mRNA transcripts in diverse tissues (including shoot, root, inflorescence, silique, anthers, and callus), to measure expression strength and tissue specificity of particular genes, to estimate the frequency of alternative polyadenylation in plant tissues, to study coregulated gene pairs, and to assess global transcriptional changes in response to specific treatments. The data will facilitate gene discovery by providing experimental results that can be compared with annotated genomic sequences. Furthermore, the data can serve as a starting point for assigning functions to unknown genes by demonstrating the presence and levels of specific transcripts in distinct tissues. Sequence tags generated through this project will be accessible via the Web in a custom database and an interface will provide graphical and statistical tools for data analyses. This database and the user interface are presently under construction, and will be online at http://mpss.ucdavis.edu. The database will allow users to perform "electronic northerns" on any gene of interest, to assess and compare global patterns of gene expression for the sampled tissues, or to compare changes in expression of one or several genes across different tissues. The MPSS data also will be deposited in public gene expression databases (e.g. the SAGE database at the National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/SAGE/).
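The tag-to-gene comparison and the "electronic northern" described above can be sketched as follows. The summary says only that the 17-bp tags derive from the 3' end of an mRNA; this sketch additionally assumes that a tag starts at the GATC site closest to the 3' end of the transcript, and that counts are normalized to tags per million. The function names, gene names, sequences, and counts are all hypothetical.

```python
TAG_LEN = 17

def expected_tag(transcript, site="GATC", tag_len=TAG_LEN):
    """Return the tag starting at the 3'-most `site`, or None if absent.

    Assumption (not stated in the summary): the tag begins at the
    restriction site nearest the 3' end of the transcript.
    """
    pos = transcript.rfind(site)
    if pos == -1 or pos + tag_len > len(transcript):
        return None
    return transcript[pos:pos + tag_len]

def electronic_northern(tag_counts_by_tissue, transcripts):
    """Return {gene: {tissue: tags per million}} for unambiguous tags."""
    tag_to_genes = {}
    for gene, seq in transcripts.items():
        tag = expected_tag(seq)
        if tag is not None:
            tag_to_genes.setdefault(tag, []).append(gene)

    table = {}
    for tissue, counts in tag_counts_by_tissue.items():
        total = sum(counts.values())
        for tag, n in counts.items():
            genes = tag_to_genes.get(tag, [])
            if len(genes) == 1:  # skip unmatched tags and tags shared by several genes
                table.setdefault(genes[0], {})[tissue] = 1e6 * n / total
    return table

# Hypothetical annotated transcripts and tag counts for two tissues.
transcripts = {
    "geneA": "ATGCCGATCAAACCCGGGTTTAAACCC",
    "geneB": "ATGGATCTTTTGGGGCCCCAAAATTTTGG",
}
tag_a = expected_tag(transcripts["geneA"])
tag_b = expected_tag(transcripts["geneB"])
unmatched = "GATC" + "T" * 13  # a tag with no annotated gene

tag_counts = {
    "root": {tag_a: 30, tag_b: 10, unmatched: 60},
    "shoot": {tag_a: 5, tag_b: 45},
}

northern = electronic_northern(tag_counts, transcripts)
for gene, by_tissue in northern.items():
    print(gene, by_tissue)
```

Tags that match no annotated gene, like `unmatched` here, are the ones that could point to unannotated transcripts, which is how such data can feed back into gene discovery.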
PI: Richard Michelmore, University of California, Davis, rwmichelmore{at}ucdavis.edu; Co-PI: Andrew Bent, University of Wisconsin, a-bent{at}uiuc.edu; Co-PI: Scot Hulbert, Kansas State University, shulbrt{at}plantpath.ksu.edu; Co-PI: Jan Leach, Kansas State University, Jeleach{at}ksu.edu; Senior Personnel: Blake Meyers, University of California, Davis, bcmeyers{at}vegmail.ucdavis.edu. NSF Plant Genome Project No. 9975971; http://www.niblrrs.ucdavis.edu Overall project objectives are to: (a) categorize nucleotide-binding site (NBS)-Leu-rich repeat encoding genes in Arabidopsis into functional classes, (b) identify and characterize homologous genes in rice and maize, (c) initiate detailed functional characterization of members representing each class in Arabidopsis as well as rice and maize, and (d) train students and postdocs in comparative and functional genomics, especially global expression analysis. Genes encoding NBS-LRR-containing proteins belong to one of the most prevalent families in plant genomes, comprising an estimated 0.6% of all genes in Arabidopsis. However, little is known of their function. Sequence motifs indicate that they act at the beginning of signaling pathways. To date, the only demonstrated function for these genes is in disease or pest resistance. However, they may also be involved in other aspects of plant biology including plant development and responses to the abiotic environment. We are carrying out a detailed bioinformatics and functional analysis of NBS-LRR-encoding genes in Arabidopsis. We have carefully reevaluated the annotation of all NBS-LRR-encoding genes in the Col-0 sequence. Approximately one-third were potentially misannotated. Corrections are being validated by wet lab experimentation. Very little is currently known concerning the expression of NBS-LRR-encoding genes. 
To determine which NBS-LRR genes are expressed and therefore more likely to be functional as well as to determine the range of expression patterns exhibited by these genes, we have been investigating their expression using a variety of methods, including RACE-PCR, MPSS, and gene trap lines. For selected NBS-LRR resistance genes, we will identify the common and different components among defense responses elicited by different resistance genes, relate the structures of different R gene products to the downstream signaling pathways that they activate, and use microarray expression-profiling methods to find expression signatures that allow functional classification of NBS-LRR genes of unknown function. We are generating plant lines that use a dexamethasone-inducible promoter system to control the expression of known R genes and avr genes in transgenic Arabidopsis plants as well as to provide ligand-independent assays for NBS-LRR genes of unknown function. We are determining expression profiles of plants that are undergoing gene-for-gene resistance in response to P. syringae pathogens that trigger known R/avr pathways using Affymetrix arrays. Four well-characterized R/avr gene pairs have been analyzed in our initial studies: RPS2/avrRpt2, RPS4/avrRps4, RPS5/avrPphB, and RPM1/avrRpm1. Initial analyses have revealed some genes that are regulated only for a particular avr/R interaction and other genes that are regulated in common for multiple avr/R interactions. Some of these genes have been previously suggested or shown to be involved in disease resistance, but most of them are either implicated in previously unrelated biochemical processes or are uncharacterized genes. To detect and characterize the diversity of NBS-LRR genes in rice and maize, we have isolated (as PCR products) or mined NBS sequences from rice and maize databases. 
Our analysis indicates that there may be 120 or more different NBS-LRR families in these two monocot species with variable numbers of members (one to 25 or more) per family. We have begun to map the different NBS-LRR-encoding genes in two rice cv Indica × Japonica populations. Both families are segregating for major genes and QTLs for two important rice diseases, bacterial blight and blast. We have also established cooperative efforts to begin mapping these genes in other cereals for comparative analysis. We have been using deletion mutants (in collaboration with Dr. Hei Leung, International Rice Research Institute, Philippines) to associate putative R gene function with NBS-LRR groups. To develop collections of pathogen stress response genes for determining signature profiles of rice and maize avr/R gene interactions, we have constructed pathogen stress-induced cDNA libraries from rice and maize using suppression subtractive hybridization. The rice cDNA libraries are enriched for genes whose expression is altered during interactions between rice and the bacterial blight pathogen Xanthomonas oryzae pv oryzae. We have constructed microarrays carrying our induced libraries in collaboration with Dr. Patrick S. Schnable (Iowa State University, Ames) and Dr. H. Leung. We have developed a series of relational mySQL databases for the project. These are available and searchable online via our Web page (http://www.niblrrs.ucdavis.edu). This contains our recent data about the complete set of NBS-LRR-encoding genes in Arabidopsis Col-0, including phylogenetic, sequence, and FASTA analyses as well as links to TIGR and MIPS pages for each gene, and physical map information. We also developed a program GenomePixelizer for displaying custom maps and relationships between genes. This is downloadable from our Web site. 
Our work will define the different classes of resistance genes and identify those with other functions; this will establish how many NBS-LRR-encoding genes act as resistance genes and how many control processes unrelated to defense. It will also identify sets of downstream genes regulated by each class. This will provide the basis for detailed studies into the action of specific NBS-LRR and downstream genes and will provide tools to facilitate the practical manipulation of plant disease resistance. In addition, it may provide the ability to manipulate diverse aspects of plant development and physiology.
PI: Stephen M. Mount, University of Maryland, College Park, sm193{at}umail.umd.edu; Co-PI: Caren Chang, University of Maryland, College Park, cc203{at}umail.umd.edu; Co-PI: Steven L. Salzberg, TIGR, salzberg{at}tigr.org NSF Arabidopsis 2010 Project No. 0114792; http://www.tigr.org/2010-splicing/ Reaching the goal of the NSF 2010 project will require the
acquisition of protein sequences for all Arabidopsis genes, yet many
genes in Arabidopsis are currently known only from genomic sequence
data. Although protein sequences can be deduced from an intron/exon
structure inferred by computational methods, accurate gene annotation
in the absence of reliable experimental evidence remains difficult to
achieve. The most accurate de novo gene finders still predict correct
intron/exon structures for less than 50% of Arabidopsis genes (Pavy et
al., 1999). Splice site selection depends not only on splice site consensus
sequences, which are fairly well understood, but also on additional information including sequences at the branch site (Simpson et al.,
2002). Thus far, we have compiled a list of all the exons from 5,000 cDNAs recently provided by Ceres, Inc. (freely available at http://www.tigr.org/tdb/e2k1/ath1/ceres/ceres.shtml) and are beginning to mine these data for ESEs. We have also constructed two vectors for
experimental verification of ESE function in transgenic Arabidopsis. These two nearly identical vectors are designed to display
ESE-dependent GUS expression that reflects skipping versus inclusion of
an enhancer-dependent exon. Both vectors contain an intron-exon-intron
unit chosen based on evidence of relatively weak signals for activation
of the exon splice sites, such that the addition of an ESE is likely to
enhance exon inclusion. We are in the process of introducing two
promising ESE candidates into this vector system to test for
enhancer-dependent splicing. One candidate is a purine-rich motif
similar to canonical animal ESEs and over-represented in Arabidopsis
exons (unpublished data), whereas the other is a sequence
recognized in vitro by the Arabidopsis SR protein atRSZp22 (Lopato et
al., 1999). Ultimately, this project will generate some 2,000 publicly available transgenic lines, available through the ABRC, carrying splicing reporter genes with defined candidate splicing enhancer sequences. A description of marker gene expression for each splicing enhancer candidate, including a description of all expression patterns and images of typical and selected patterns, will be available through the Internet at our Web site (http://www.tigr.org/2010-splicing/) with links to the seed stocks. Finally, improved gene finding and gene annotation will be realized directly as improvements to the gene annotations themselves (http://www.tigr.org/tdb/ath1/htmls/ath1.html) and as improved performance by the GlimmerM server (http://www.tigr.org/softlab/glimmer/glimmer.html).
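Mining exons for candidate ESEs, as mentioned above, can be approached in many ways; the summary does not describe the procedure used. One simple possibility, shown here purely as an illustrative sketch, is to rank short motifs by how over-represented they are in exons relative to introns. The function names, the choice of 6-mers, the pseudocount, and the toy sequences are all assumptions.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def exon_enrichment(exons, introns, k=6, pseudocount=1.0):
    """Rank k-mers by (exon frequency) / (intron frequency), highest first.

    A pseudocount keeps the ratio finite for k-mers absent from introns.
    """
    ex = kmer_counts(exons, k)
    intr = kmer_counts(introns, k)
    ex_total = sum(ex.values())
    in_total = sum(intr.values())
    if ex_total == 0 or in_total == 0:
        raise ValueError("sequences are too short for the chosen k")
    scores = {
        kmer: ((n + pseudocount) / ex_total) / ((intr[kmer] + pseudocount) / in_total)
        for kmer, n in ex.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy sequences (not real Arabidopsis data): purine-rich exons, T-rich introns.
exons = ["GAAGAAGAAGAA", "CTGAAGAAGCT"]
introns = ["TTTTATTTTCTTTT", "ATTTTGTTTTAT"]

ranked = exon_enrichment(exons, introns)
print("top candidate motif:", ranked[0][0])
```

Motifs that rank highly in such a scan are only candidates; the reporter vectors described above are what would test whether a motif actually promotes exon inclusion.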
PI: Timothy Nelson, Yale University, timothy.nelson{at}yale.edu
NSF Arabidopsis 2010 Project No. 0114648; http://plantgenomics.biology.yale.edu
The technique of laser capture microdissection (LCM) will be developed and optimized for plant tissues. This is a method originally developed for animal tissues, whereby individual cells are harvested by tacking them to a plastic film with a low-power infrared laser that can be aimed at single cells while viewing the tissue slice under a microscope. Using this method, it is possible to recover specific cell types or developmental stages from complex tissues consisting of many cell types. Cells recovered in this manner can be analyzed with regard to gene expression profiles, protein profiles, and other properties. Our major effort is to test a variety of tissue preparation methods and RNA/DNA/protein recovery methods to adapt LCM to plants. This project will optimize the LCM technique, using a variety of plant tissue sources and purity markers for specific cells.
LCM provides the ability to evaluate the function of individual cells in the context of a complex tissue of different cell types. In studies of animals, this has already permitted the evaluation of a single tumor cell in the background of many normal cells, without resorting to tissue disruption and cell fractionation methods that may disturb characteristics of the cell. In plants, this will permit similar studies of the function of individual cells. For example, what genes are expressed or not expressed in an individual plant cell subjected to attack by a pathogenic bacterium or fungus? How are the cells that initiate a new leaf different from their neighboring cells that do not? The ability to answer questions at this level of refinement will permit advances in the understanding and manipulation of plant growth and development and in the treatment and prevention of plant diseases.
The "deliverable" for this project is a package of protocols for the use of LCM on plant tissues of various types, ages, sources, and conditions. Detailed protocols will be provided to the biological community at the Web site (http://plantgenomics.biology.yale.edu). Tissues with simple organizations and relatively large cells are being tested first, and more complex tissues with smaller cells will follow. Accordingly, we developed initial protocols for isolation of bundle sheath and mesophyll cells from maize leaves and of mesophyll cells from Arabidopsis leaves. We are now developing protocols for smaller cell types in developing tissues of Arabidopsis and maize, including meristematic, provascular, and procambial cells. Protocols optimized for Arabidopsis and maize tissues will be tested and adapted for a wide range of species. We are at the same time optimizing protocols for recovery of RNA of sufficient length and quality for expression profiling. These properties are highly dependent on the combination of tissue fixation and RNA extraction protocols.
PI: Magnus Nordborg, University of Southern California, magnus{at}usc.edu; Co-PI: Martin Kreitman, University of Chicago, mkre{at}midway.uchicago.edu; Co-PI: Joy Bergelson, University of Chicago, jbergels{at}midway.uchicago.edu
NSF Arabidopsis 2010 Project No. 0115062; http://magnolia.usc.edu
Background
Consider a particular site in the genome of some species. Trace the ancestry of all the currently existing copies of this site (for a diploid species, twice the number of currently existing individuals). These copies must be related in a tree-like fashion to the most recent common ancestor of that site. Now consider a second site, linked to the first one, and carry out the same exercise. In the absence of recombination between the sites, the genealogical tree for the second site must be identical to the tree for the first site. However, if recombination occurred between the sites in some ancestral chromosome, the two sites need no longer have identical genealogical trees, and could well have different most recent common ancestors.
The genealogy of the genome can thus be thought of as a "walk through tree space." Because of historical recombination events, the genome is broken up into segments, each with its own tree, and the trees gradually change as we move along the chromosomes from one segment to the next. The visible consequence of this genealogical structure is linkage disequilibrium (LD): nonrandom associations of alleles in haplotypes (for review, see Nordborg and Tavare, 2002).
There is currently a great deal of interest in LD and haplotype structure, especially in humans (Patil et al., 2001).
Population genetics theory predicts that LD should be unusually extensive in highly selfing species like Arabidopsis (Nordborg, 2000).
Goals
We will sequence approximately 2,000 short fragments distributed throughout the genome in a sample of 96 accessions. This means on the order of four fragments per cM, or one fragment every 50 kb. The fragments will be of length 500 to 700 bp, depending on how much can be sequenced in a single read. In other words, rather than resequencing one more accession in addition to Col, we will resequence 1% of the genome in approximately 100 accessions. We will also develop a database to make the data available and bioinformatics tools to make them useful.
Methods
Fragments will be PCR amplified and both strands sequenced using standard methods. All reactions will be carried out in plates, and set up using liquid-handling robots. Ad hoc software that incorporates publicly available components will be used to design PCR primers. We will preferentially target non-coding regions to maximize the amount of polymorphism detected. The sample of 96 accessions will be chosen as follows: Roughly one-half will consist of stock center accessions, chosen so as to include most accessions that are currently being used in developing RILs. Col will be included as a control. This part of the sample will also include the accessions used in a single-nucleotide polymorphism detection project at the Max-Planck Institute for Chemical Ecology (Jena, Germany). The remainder of the sample will consist of a stratified sample of several freshly collected accessions from each of a number of populations in Europe and the U.S. The purpose of this is to get further insight into the population structure of Arabidopsis (for details, see http://magnolia.usc.edu).
Resources that will be made available to the public include: (a)
roughly 2,000 high-quality sequences of 500- to 700-bp segments from
each of 95 accessions (in addition to Col). Our estimates indicate that this will be equivalent to at least 10,000 polymorphic markers in these accessions; (b) a database of primer sequences used in
the study and amplification conditions; (c) a highly flexible, extensible Web-accessible relational database with the polymorphism data; (d) links (in both directions) to the TAIR database, allowing rich annotation of the regions sequenced, and a plan to migrate the
database to TAIR upon completion of the project; (e) Web-based tools
for accessing and querying the database, in particular with respect to
the haplotype structure and genomic genealogy of the species; and (f) a
standard set of accessions for analyzing the genetic basis of naturally
occurring phenotypic variation. Because these accessions have already
been genotyped, it may be possible to map genes for traits of interest
in silico (Grupe et al., 2001).
Progress
High-throughput sequencing will begin in the spring of 2002. We anticipate that 500 fragments will be completed by the end of the summer of 2002, and a further 500 by the end of the year. Sequences will be made available in some form as soon as they are finished, although the database and other software tools will be under continuous development throughout the period of the grant. For further information, see http://magnolia.usc.edu. Seeds of all accessions used will be made available to the stock centers for distribution during the spring of 2002.
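The linkage disequilibrium discussed in the Background of this summary is commonly quantified for a pair of biallelic sites by the coefficient D and the squared correlation r^2. The following is a minimal sketch of that calculation, not the project's own software; the function name and the 0/1 allele coding are assumptions made for illustration. It takes one two-site haplotype per sampled chromosome, which is a convenient input for accessions that are largely homozygous.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) tuples, one per
    sampled chromosome. Alleles are coded 0/1 (hypothetical encoding).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Frequency of allele 1 at each site.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    # Frequency of the haplotype carrying allele 1 at both sites.
    p_ab = counts[(1, 1)] / n
    # D measures the departure from random association of alleles.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d, d * d / denom


# Complete association: only two haplotypes are present in the sample.
linked = [(1, 1)] * 4 + [(0, 0)] * 4
# Random association: all four haplotypes are equally frequent.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2
```

With these toy samples, `ld_stats(linked)` gives r^2 = 1 and `ld_stats(unlinked)` gives r^2 = 0; real data from the fragments described above would fall between these extremes.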
PI: Daphne Preuss, Howard Hughes Medical Institute, University of Chicago, dpreuss{at}midway.uchicago.edu; Co-PI: Laurens Mets, University of Chicago, l-mets{at}uchicago.edu
Plant Genome Project No. 9872641; http://preuss.bsd.uchicago.edu; http://mets.bsd.uchicago.edu
The Arabidopsis centromeres, like those of most higher eukaryotes, are located within highly heterochromatic and repetitive portions of the genome. These chromosomal regions are highly condensed and bind a unique assembly of proteins that directs chromosome assortment during cell division, regulates homologous chromosome pairing during meiosis, mediates sister chromatid adhesion during mitosis and meiosis, and modulates transcription of centromere-linked genes. Some data suggest that the primary DNA sequence carries cues that specify all of these functions; alternatively, the DNA at the centromere might play a structural role, or might contain epigenetic modifications that are inherited through the cell cycle. DNA sequencing of centromere regions, combined with genetic and molecular assays that recapitulate centromere function, will be required to understand these processes.
Our NSF project began in 1998 and aimed to characterize all five
regions that provide centromere functions in Arabidopsis, as well as
two centromere regions from Chlamydomonas
reinhardtii. Both model organisms provide the unusual
opportunity to perform tetrad analysis.
We analyzed marker segregation in Arabidopsis pollen tetrads,
performing a genome-wide analysis that defined centromere positions and
demonstrated that Arabidopsis cells carefully control the distribution
and number of sites of genetic exchange (Preuss et al., 1994).
Analysis of centromere DNA sequences promises to clarify the
evolutionary forces that act in regions of limited recombination, as
well as to improve the understanding of the role of DNA sequence patterns in chromosome segregation. We compared the non-repetitive sequences of all five Arabidopsis centromeres to each other, showing that they share limited (1%-7%) similarity. We found 41 families of conserved centromere sequences, AtCCS
(http://preuss.bsd.uchicago.edu/Arabidopsis.genome.html), that are
enriched in the centromeric and pericentromeric regions. Apart from the
AtCCS sequences, most centromere DNA is not shared between chromosomes,
complicating efforts to derive clear evolutionary relationships. In
contrast, genetic and cytological assays indicate that homologous
centromeres are highly conserved among Arabidopsis accessions, albeit
subject to large rearrangements (Copenhaver et al., 1999).
Ongoing investigations in this project focus on understanding the
controls that regulate the expression of the many genes found within
the centromere regions (Arabidopsis Genome Initiative, 2000).
PI: John Quackenbush, TIGR, johnq{at}tigr.org; Co-Investigator: Heenam Kim, TIGR, hkim{at}tigr.org
NSF Arabidopsis 2010 Project No. 0117281; http://atarrays.tigr.org
We will use whole genome Arabidopsis microarrays to survey gene expression in a variety of tissues and under a variety of growth conditions, including environmental stresses and pathogen response, to develop a detailed picture of patterns of gene expression throughout the complete lifetime of the plant. Ultimately, the goal of this project will be to integrate gene expression data with genome sequence and annotation, and to exploit the temporal and spatial information obtained to provide functional annotation and to search for conserved regulatory functions. Although this proposal has a large biological component, the generation of these data using whole genome microarrays will be essential for the computational analysis that forms the intellectual core of this project. Starting from germinating seeds and following plants through their lifecycle, we will survey expression in whole plants and the various plant tissue components and use these data to produce an integrated picture of expression during the plant lifecycle. By examining correlated patterns of gene expression in tissues and through time, our secondary goal will be to begin to provide functional associations for many of the hypothetical and unknown genes identified within the genome sequence.
We will begin to build a gene expression encyclopedia for Arabidopsis, tracing expression patterns through the lifecycle in both whole plants and in major tissues. RNA will be collected from individual tissues and whole plants each morning during a 6-week period starting from initiation of germination and following through to seed production and plant senescence. Because root tissue is difficult to obtain from soil-grown plants, additional plants will be grown on agar to facilitate root collection.
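The idea of using correlated expression patterns to suggest functional associations can be illustrated with a small sketch. This is not the project's analysis pipeline; the gene names, the toy profiles, and the 0.9 threshold are all hypothetical. Each profile is a list of expression measurements across the same ordered set of time points or tissues, and genes whose profiles correlate strongly with a query gene are reported as candidates for shared regulation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpressed(profiles, query, threshold=0.9):
    """Return names of genes whose profile correlates with the query gene
    at or above the threshold, ordered from strongest to weakest."""
    hits = []
    for name, profile in profiles.items():
        if name == query:
            continue
        r = pearson(profiles[query], profile)
        if r >= threshold:
            hits.append((r, name))
    return [name for r, name in sorted(hits, reverse=True)]


# Toy log-ratio profiles over four time points (hypothetical values).
profiles = {
    "known_gene": [1.0, 2.0, 3.0, 4.0],
    "unknown_1": [2.0, 4.0, 6.0, 8.0],   # same trend as the known gene
    "unknown_2": [4.0, 3.0, 2.0, 1.0],   # opposite trend
    "unknown_3": [1.0, 3.0, 2.0, 4.0],   # loosely similar trend
}
```

Here `coexpressed(profiles, "known_gene")` returns only `unknown_1`. A correlation of this kind is a hint about function rather than evidence of it, which is why the text speaks of beginning to provide functional associations.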
RNA will be extracted from whole plants and plant tissues and archived for analysis in sufficient quantities so that measurements can be repeated in the future as the gene models and the arrays evolve. RNA will be labeled and cohybridized to the arrays with the appropriate reference sample. All experiments will be conducted in duplicate using a flip-dye design in which the Cy3 and Cy5 labels are swapped between reference and query samples in subsequent hybridizations to minimize any possible dye-specific effects. Hybridization images will be analyzed using TIGR Spotfinder to determine relative hybridization intensities and all data will be recorded in the AGED database. All hybridizations will be analyzed using the control genes included in the arrays as well as other criteria to determine the quality of the hybridization assay; any questionable or low-quality assays will be repeated. All hybridization data will be presented on the TIGR Web site using the Minimum Information About a Microarray Experiment (MIAME)/MGED standards (see http://www.mged.org).
Following the completion of a hybridization series, all data will be analyzed using the suite of clustering and pattern recognition tools included in the TIGR MultiExperiment Viewer, including hierarchical clustering, self-organizing maps, principal component analysis, k-means clustering, and support vector machines, as well as any other available software tools, to identify putative spatial and temporal patterns of gene expression, to identify coregulated genes, and to begin to provide functional inferences for the hypothetical and unknown genes. Gene expression data will be mapped back to the completed Arabidopsis genome and cross-referenced to the genomes and orthologous genes in other plants and a variety of eukaryotic species. This will allow expression data from Arabidopsis to be leveraged against expression data in other species to provide a more complete picture of gene function and regulation.
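The reasoning behind the flip-dye design can be made concrete with a short sketch. It assumes a simple multiplicative dye bias that is the same in both hybridizations, which is an idealization; the function and argument names are hypothetical and do not correspond to TIGR Spotfinder or any other tool named above. In hybridization A the query sample is labeled with Cy5 and the reference with Cy3, and in hybridization B the labels are swapped, so the dye bias enters both log ratios with the same sign while the biological signal changes sign.

```python
import math


def flip_dye_log_ratio(cy5_a, cy3_a, cy5_b, cy3_b):
    """Combine a flip-dye pair of intensities for one array element.

    Hybridization A: query labeled with Cy5, reference with Cy3.
    Hybridization B: labels swapped (query Cy3, reference Cy5).
    Returns (log2 query/reference estimate, log2 dye-bias estimate).
    """
    m_a = math.log2(cy5_a / cy3_a)  # signal + dye bias
    m_b = math.log2(cy5_b / cy3_b)  # -signal + dye bias
    signal = (m_a - m_b) / 2.0      # the bias cancels in the difference
    bias = (m_a + m_b) / 2.0        # the signal cancels in the sum
    return signal, bias


# Toy example: the query is truly 4-fold higher than the reference, and the
# Cy5 channel reads 2-fold too bright (hypothetical intensities).
signal, bias = flip_dye_log_ratio(cy5_a=800.0, cy3_a=100.0,
                                  cy5_b=200.0, cy3_b=400.0)
```

In this toy case the combined estimate recovers a log2 ratio of 2 (4-fold) and a dye bias of 1 (2-fold), whereas hybridization A alone would have suggested an 8-fold difference.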
Finally, we will attempt to identify gene regulatory networks if possible. Our goal will be to provide a complete analysis of the Col-0 ecotype during year 1. We will also make an attempt to characterize expression in at least one physiologically interesting mutant or environmental stressor per year in years 2 and 3. By selecting a distinct phenotype, and correlating expression differences with developmental or morphological differences, we hope to provide additional information on possible gene function. All data generated will be presented on the TIGR Web site, and attempts will be made both to provide genomic context information by mapping the expression data onto the available genome sequence and to place genes into functional categories and metabolic pathways.
PI: Natasha Raikhel, University of California, Riverside, natasha.raikhel{at}ucr.edu; Co-PI: Kenneth Keegstra, Michigan State University, keegstra{at}msu.edu; Co-PI: Jonathan Walton, Michigan State University, walton{at}msu.edu
Plant Genome Project No. 9975815; http://www.cepceb.ucr.edu/members/raikhel.htm#functional
Despite the importance of cell walls to the biology of
plants, little is known about the biosynthesis of their major
macromolecular components. From the known complexity of cell wall
structure we can predict that wall synthesis requires hundreds of
enzymes, but biochemical approaches have been unsuccessful in
identifying and characterizing more than a few of them. Comparative
molecular genetic studies have not been useful because the walls of
other organisms, such as bacteria and yeast, are fundamentally
different in composition, structure, and function from those of plants. We posit that a genomics-based approach is particularly appropriate for
attacking intractable problems in plant biology such as cell wall
architecture and biosynthesis. Recent advances in genomics make it
possible to identify large numbers of genes as being candidates for
involvement in particular processes. With the identification of
candidate genes for biosynthetic enzymes and regulatory proteins comes
the challenge of analyzing the functions of these genes and of the
proteins they encode. This task is particularly critical for
understanding the numerous genes whose functions are unique to plants
(Keegstra and Raikhel, 2001).
Our NSF Genomic Grant was initiated on December 1, 1999, and is now in its third year. Three investigators are collaborating on this project: Natasha Raikhel (University of California, Riverside) and Kenneth Keegstra and Jonathan Walton (Michigan State University, East Lansing). Our long-term goal is to understand how hemicelluloses are synthesized, delivered to the cell surface, and incorporated into the wall matrix. Our first step toward this goal is to identify and characterize the polypeptides that mediate polysaccharide biosynthesis. We are working with several plant species, with emphasis on Arabidopsis as a dicot model to investigate xyloglucan (XyG) biosynthesis (Raikhel and Keegstra) and maize and rice as monocot models to study the hemicelluloses of grasses (Walton). For this Special Issue, we will discuss only our main accomplishments with Arabidopsis, but hope to discuss our work with rice/maize and other work with Arabidopsis in a future issue.
Before the initiation of our genomic grant, we identified the
fucosyltransferase involved in xyloglucan biosynthesis, AtFUT1 (Perrin
et al., 1999).
Genomic strategies for determining gene function require two
independent steps: the identification of candidate genes and evaluation
of the function of the candidates. One effective strategy for
evaluating the function of candidate genes is to measure the enzymatic
activity of the gene products. Such a strategy requires a reliable
enzymatic assay and until recently, an acceptor-dependent assay
was not available for the XyG xylosyltransferases. Thus, we invested
considerable effort in establishing a biochemical assay for the XyG
alpha-(1,6)-xylosyltransferase. Using pea (Pisum sativum) microsomes that are capable of XyG biosynthesis
(White et al., 1993)
Bioinformatic analyses revealed several candidates for the
xylosyltransferases, but the biochemical characterization of the xylosyltransferase activity from peas led us to focus on seven Arabidopsis genes with sequence similarity to a fenugreek
(Trigonella foenumgraecum)
alpha-(1,6)-galactosyltransferase that is involved in galactomannan
biosynthesis (Edwards et al., 1999).
Although xylosyltransferase activity was not observed with five other putative AtPXTs when they were tested in this assay, it is possible that they require acceptors that already contain a xylosyl residue and are involved in adding other xylosyl residues to the XyG backbone. We are continuing genetic and biochemical analysis of these putative glycosyltransferase genes and their products in search of their biological function.
PI: Mary A. Schuler, University of Illinois, maryschu{at}uiuc.edu; Co-PI: Mark Band, University of Illinois, markband{at}uiuc.edu; Co-PI: Lei Liu, University of Illinois, leiliu{at}uiuc.edu; Co-PI: Stephen Sligar, University of Illinois, s-sligar{at}uiuc.edu; Collaborator: Hans Bohnert, University of Illinois, bohnerth{at}life.uiuc.edu; Collaborator: Daniele Werck-Reichhart, Centre National de la Recherche Scientifique, Strasbourg, France, wreck{at}mailserver.u-strasbg.fr
NSF Arabidopsis 2010 Project No. 0115068; http://Arabidopsis-P450.biotec.uiuc.edu
With a group based largely at the University of Illinois (Urbana-Champaign), we have begun tackling the functional characterization of the 273 cytochrome P450 monooxygenase (P450) genes that exist within the Arabidopsis genome. These enzymes are involved in a diverse array of biosynthetic functions, which are either shared with other plants or specific to Arabidopsis, and in catabolic/detoxicative functions that may or may not be shared with other species. (P450s are involved in the synthesis of lignin, UV protectants [flavonoids], pigments [anthocyanins], defense compounds [isoflavonoids, phytoalexins, hydroxamic acids, and terpenes], fatty acids, hormones [gibberellins and brassinosteroids], and accessory pigments [carotenoids].) Because of their roles in this wide diversity of metabolic processes and their relative lack of posttranslational modification, we believe that they serve as downstream reporters for the direct activation of many different biochemical pathways responding to chemical, developmental, and environmental cues. We have, at first, aimed to define the tissue-specific expression patterns of these genes by creating microarrays containing all P450 sequences and representative biochemical pathway marker sequences.
These arrays are being used to define P450 transcript profiles with respect to developmental stage, tissue specificity, and a variety of internal and external chemical cues as well as changing environmental conditions (UV damage, pathogen attack, insect attack, cold stress, etc.) and to collate these with the response patterns of defined biochemical pathways so that we can begin to assign prospective function(s) to each P450 sequence and define the expression profiles in different tissues. The second series of experiments dealing with a smaller subset of these P450s (approximately 30 at a time) is focused on cloning full-length P450 cDNAs and co-expressing them in yeast and baculovirus systems with their electron transfer partner, NADPH P450 reductase. The third series of experiments is aimed at incorporating these overexpressed P450s and P450 reductase into a novel membrane-scaffolding system containing His-6-tagged membrane scaffold protein and natural lipids that has been used to solubilize a number of purified mammalian membrane proteins for functional analyses. Assembly of individual Arabidopsis P450s into these membrane-scaffolded discs will provide the basis for high-throughput substrate profiling of these P450s to determine which substrate(s) are most preferred by a particular P450. The fourth series of experiments is focused on visualizing at the cellular level the responses of these P450 genes to various chemical and environmental stresses using P450 promoter:reporter fusion genes expressed in transgenic Arabidopsis. This level of cellular visualization should provide a more accurate record of P450 gene expression than will be generated in the microarray profiling described above. Combined together, these results should serve to elucidate this model plant's biochemical responses to a variety of stress conditions and provide genomic technology tools needed for assessment of the diverse P450 gene family as well as other membrane protein families. 
The microarray, substrate profiling, and cellular imaging data generated on this project will be combined with a bioinformatics assessment of existing ESTs in an evolving Web site (http://Arabidopsis-P450.biotec.uiuc.edu) that will describe the activities and expression profiles of the divergent array of sequences that make up the Arabidopsis P450 superfamily.
PI: Jen Sheen, Massachusetts General Hospital, sheen{at}molbio.mgh.harvard.edu; Co-PI: Frederick M. Ausubel, Massachusetts General Hospital, ausubel{at}molbio.mgh.harvard.edu; Co-PI: Kan Wang, Iowa State University, kanwang{at}iastate.edu
NSF Plant Genome Project No. 007692; genetics.mgh.harvard.edu/sheenweb/
Mitogen-activated protein kinase (MAPK) cascades are evolutionarily conserved signaling modules with essential regulatory functions in eukaryotes, including yeasts, worms, flies, frogs, mammals, and plants. Numerous studies have shown that plant MAPKs are activated by abiotic stresses, pathogens, pathogen-derived elicitors, and plant hormones. The Arabidopsis genome and EST sequencing projects have revealed large gene families encoding MAPKs and their immediate upstream regulators, MAPK kinases (MAPKKs) and MAPKK kinases (MAPKKKs). However, little is known about the constitution of plant MAPK cascades and the specific roles that particular MAPK cascade genes play in particular plant signal transduction pathways. We propose a comprehensive approach based on genomic information to generate MAPK-, MAPKK-, and MAPKKK-related resources including engineered clones and transgenic plants. This MAPK tool set then will be used in conjunction with transient expression analysis to determine the function of all Arabidopsis MAPK cascade genes involved in essential plant signaling pathways. Because the functions of MAPK cascades in plant signal transduction pathways are likely conserved, our studies using the Arabidopsis genome resources will have broad implications and applications in other plant species.
Objectives
We are using Arabidopsis to elucidate the functions of conserved MAPK cascades in abiotic stress responses, defense against pathogen attack, and hormone signaling. MAPK cascades will first be functionally defined in plant cell systems.
Engineered genes encoding constitutively activated MAPK cascade components will then be introduced into crop plants such as maize and soybean (Glycine max) to test their agricultural applications and values. The specific aims include: (a) to develop reporter genes for pathogen and stress response pathways and hormone signaling pathways in protoplast transient expression assays, (b) to clone all Arabidopsis MAPK cascade genes, (c) to perform functional analyses of all Arabidopsis MAPK cascade genes in protoplasts, (d) to carry out functional analysis of functionally defined MAPK cascade genes in transgenic Arabidopsis, (e) to perform functional analysis of functionally defined MAPK cascade genes in transgenic maize and soybean, and (f) to develop a Web-accessible public database containing MAPK-related information.
Progress
First, we have established Arabidopsis protoplast assays for the
analyses of MAPK cascades controlled by osmotic, oxidative, pathogen-derived elicitor, and hormone signals by using distinct reporter genes in each pathway (Asai et al., 2000).
PI: Chris Somerville, Carnegie Institution, Stanford, CA, crs{at}andrew2.stanford.edu
NSF Arabidopsis 2010 Project No. 0114562; http://cellwall.stanford.edu/cesa/index.shtml
A recent breakthrough in research concerning the
biogenesis of plant polysaccharides was the identification, by genomic
methods, of genes encoding cellulose synthase in cotton fibers (Pear et al., 1996).
Reiterative database searches using the Arabidopsis CesA
sequences as the initial query sequences revealed a large superfamily of at least 41 CESA-like genes in Arabidopsis. Based on
predicted protein sequences, we have grouped these genes into seven
clearly distinguishable families (Richmond and Somerville, 2000).
The goal of this project is to determine the biological function of the Csl proteins. Based on the sequence similarity to cellulose synthase, it is hypothesized that the CSL genes encode processive glycosyltransferases that may catalyze the synthesis of some of the non-cellulosic polymers that make up plant cell walls and other exopolysaccharides such as stylar secretions and mucilage, or the glycosyl residues on arabinogalactan proteins. The technical approach that will be used to determine the function of the CSL genes exploits the recently completed full genome sequence of Arabidopsis. Most or all of the CSL genes will be inactivated by insertional mutagenesis. In addition, the expression of the CSL genes will be altered by producing transgenic plants that have increased or decreased accumulation of mRNA for specific CSL genes. The effects of the mutations and transgenic events on the growth and development of the plants and on plant polysaccharide composition will be analyzed. If changes in polysaccharide composition are observed, the identity of the altered polysaccharides will be determined. This information will be used to develop enzyme assays for the corresponding enzymes. The enzymatic function of the CSL gene products will be measured in mutants and wild-type plants to associate each CSL gene with a specific enzyme of known catalytic activity. In parallel with analyses of the catalytic function of the CSL gene products, we will characterize the effects of inactivation of the CSL genes, and overexpression of selected genes, on the growth and development of the mutant plants. The types of measurements that will be carried out to quantify phenotypic effects will depend on which polysaccharides are altered in the mutant plants and where the genes are expressed.
Thus, for instance, alterations in mucilage composition may have substantially different consequences from alterations in cell wall polysaccharide composition. In preliminary studies, we have identified mutations in 29 of the CESA and CSL genes. FTIR spectrophotometry of cell walls from several of the mutants indicates significant changes in cell wall structure or composition. In addition, fractionation and chemical analysis of the polysaccharides in several of the mutants indicate significant changes in chemical composition. Thus, the preliminary results are consistent with the underlying hypothesis. However, we do not yet know the precise catalytic function of the Csl proteins.
PI: Dina A. St. Clair, University of California, Davis, dastclair{at}ucdavis.edu; Co-PI: Rebecca W. Doerge, Purdue University, doerge{at}purdue.edu; Co-PI: Richard W. Michelmore, University of California, Davis, rwmichelmore{at}ucdavis.edu
NSF Arabidopsis 2010 Project No. 0115109; http://elp.ucdavis.edu
Quantitative differences in the expression of genes involved in disease resistance responses are being investigated using a functional genomics approach that involves a novel application of QTL analysis to microarray data. We will identify regulatory QTLs controlling natural variation in induced gene expression patterns (i.e. expression level polymorphisms, ELPs) through QTL analysis of microarray data for ELPs from genetically segregating RIL populations. Dissection of regulatory networks using genetic analysis of natural allelic variation will provide an efficient method for searching for regulatory loci at the systems biology level and will avoid the unnatural, traumatic perturbations to gene regulation that are caused by extreme mutations. QTL dissection of natural variation is complementary to mutant analysis: it is likely to reveal different aspects of the regulatory network controlling disease resistance, because qualitatively inherited resistance genes do not account for all aspects of complex pathways. We will: (a) develop integrated molecular and statistical approaches for the dissection of quantitatively inherited traits and QTLs; (b) determine if ELPs involved in the variation of disease resistance pathways in Arabidopsis are due to regulatory QTLs, structural QTLs, or both; and (c) characterize individual genes at the molecular level that encode the regulatory QTLs. This will be accomplished by surveying accessions for natural variation in ELPs in response to induction of defense-related pathways by salicylic acid and jasmonic acid using Affymetrix chips for the preliminary global screen, and spotted microarrays to confirm reproducible ELPs.
Populations of RILs derived from crosses between polymorphic accessions will then be phenotyped for ELPs using targeted DNA microarrays designed with novel applications of statistical methods. QTLs associated with ELPs will be mapped by employing an innovative application of established QTL mapping methodologies, including composite interval mapping and permutation thresholds. This approach will allow the identification of regulatory QTLs, a subset of which will be cloned using a combination of candidate gene and extreme allele approaches. The methodologies we establish will be applicable whether the quantitative traits are measured at the mRNA level (as here), or at the protein, metabolite, or macrophenotypic levels as the technology for such global measurements becomes available. The ultimate goal is to associate phenotypic ELPs with QTLs to determine the functional genomics of QTLs controlling a trait.
To accomplish this project, we are using collections of accessions and RIL populations, databases with the Arabidopsis genome sequence and linkage maps, and software for microarray data analysis, QTL mapping, and statistical analysis of quantitative data. We have established a public Web site (http://elp.ucdavis.edu) that will store the microarray and QTL mapping data we generate. We will also deposit the data in established Arabidopsis public databases for the benefit of the Arabidopsis research community. Our methodologies will also be available on our Web site. This project provides multidisciplinary training at the interface of quantitative and molecular genetics, statistics, and genomics for postdoctoral researchers, graduate students, undergraduate students, and high school students.
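The permutation thresholds mentioned in this summary can be sketched as follows. This is an illustration only: it replaces composite interval mapping with a much simpler single-marker test (the absolute difference in mean expression between the two parental genotype classes), and all function names, the 0/1 genotype coding, and the toy data are assumptions. The expression values are shuffled among the RILs many times, the largest test statistic across all markers is recorded for each shuffle, and an upper percentile of that null distribution serves as a genome-wide significance threshold.

```python
import random


def marker_stat(genotypes, phenotype):
    """Absolute difference in mean phenotype between the two genotype
    classes (coded 0 and 1) at one marker."""
    class0 = [p for g, p in zip(genotypes, phenotype) if g == 0]
    class1 = [p for g, p in zip(genotypes, phenotype) if g == 1]
    if not class0 or not class1:
        return 0.0
    return abs(sum(class0) / len(class0) - sum(class1) / len(class1))


def max_stat(markers, phenotype):
    """Largest single-marker statistic across the whole marker set."""
    return max(marker_stat(g, phenotype) for g in markers)


def permutation_threshold(markers, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the maximum
    statistic when phenotypes are shuffled among lines."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        null_max.append(max_stat(markers, shuffled))
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]


# Toy data: 40 RILs, 3 markers; expression depends on the first marker.
markers = [[0] * 20 + [1] * 20, [0, 1] * 20, [0, 0, 1, 1] * 10]
phenotype = [2.0 * g + 0.1 * (i % 5) for i, g in enumerate(markers[0])]
```

A marker whose observed statistic exceeds `permutation_threshold(markers, phenotype)` would be declared associated with the expression trait. Taking the maximum across markers in each shuffle is what accounts for testing many markers at once.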
PI: Michael R. Sussman, University of Wisconsin, Madison, msussman{at}factsaff.wisc.edu; Co-PI: Richard Amasino, University of Wisconsin, Madison, amasino{at}biochem.wisc.edu; Co-PI: Patrick Krysan, University of Wisconsin, Madison, fpat{at}biotech.wisc.edu NSF Arabidopsis 2010 Project No. 0116945; http://www.biotech.wisc.edu/2010/ We have been funded to develop and implement novel methods
for genotyping Arabidopsis mutants through the use of high-density oligonucleotide arrays. Our ultimate goal is to provide the Arabidopsis community with a knockout allele of every gene in the genome through the use of both T-DNA insertional mutations and point mutations induced
by a chemical mutagen. The genotyping methods that we are developing
make use of the maskless array synthesizer (MAS), which is a novel
technology for fabricating oligonucleotide arrays (Singh-Gasson et al.,
1999). We are using this flexible platform to make a series of high-density oligonucleotide arrays that will allow us to: (a) develop a less expensive and more rapid means of obtaining hundreds of thousands of flanking sequences for T-DNA insertions, and (b) identify plants containing specific stop codon mutations in a chemically mutagenized population. Both of these projects are devoted to the goal of obtaining a knockout for each and every gene in the Arabidopsis genome, but it is important to note that the technologies being developed are applicable to other species as well. Through our past experience with T-DNA mutagenesis, we have identified two main challenges that need to be addressed to get the most out of reverse genetics in Arabidopsis. One challenge is the presence of a large number of small genes in the Arabidopsis genome, and the other is the prevalence of tandemly repeated genes. Because the probability of finding a mutation in a given gene is target size dependent, one must search through an unreasonably large population of randomly induced mutants to find knockouts in all of the small genes in a genome. Although one could simply generate and catalog larger and larger populations of mutagenized individuals, such an approach leads to diminishing returns. The presence of tandemly repeated genes presents a slightly different challenge. Because closely related members of a gene family can often display functional redundancy, an important tool in reverse genetics is the ability to combine multiple mutations in a single plant so that functional redundancy can be overcome and underlying mutant phenotypes can be revealed. When two members of a gene family are not closely linked on a chromosome, this can be accomplished easily by simply performing a genetic cross between parent plants carrying each of the two mutations. 
When the members of a gene family are tightly linked in the form of a tandemly repeated gene cluster, the situation is more challenging. In particular, one would have to screen through a prohibitively high number of progeny to find the extremely rare recombinant between closely spaced genes. The challenges posed by small genes and tandemly duplicated genes are being addressed in this proposal by creating large populations of T-DNA-mutagenized Arabidopsis plants in which the T-DNA element contains a transposon lacking a transposase gene. By sequencing the flanking DNA in each of these insertionally mutagenized plants, we can create a database in which an insertion is present on average every 1 to 2 kb throughout the genome. These plants can then be used as a platform for launching the dormant transposon into linked regions of the genome by simply crossing the T-DNA line to a plant expressing a transposase gene. The progeny of this cross will contain plants in which the original T-DNA location remains disrupted and neighboring members of the tandem gene cluster have also been mutagenized. In addition, by choosing a T-DNA launching line in which the T-DNA has landed next to a small gene of interest, one can easily saturate a small, targeted region of the chromosome with transposon insertions and thereby increase the likelihood of obtaining a knockout in that small gene. Small genes are a major problem with current T-DNA technology because, even with 200,000 lines, a statistical analysis shows that 10% to 20% of the genes will not be mutated using standard T-DNA technology. Because reverse genetics can only reach its full potential when all of the genes have been knocked out, it is essential to have strategies that allow for full saturation of the genome with mutations. 
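The dependence of knockout recovery on target size can be made concrete with a minimal sketch. It assumes that insertions land uniformly at random across the genome, that each line carries one insertion, and that the genome is roughly 125 Mb; under those assumptions the number of insertions in a gene is approximately Poisson distributed. The function names and the gene sizes in the example are hypothetical, and real T-DNA insertion is not perfectly uniform, so the numbers are illustrative only.

```python
import math

GENOME_BP = 125_000_000  # assumed approximate Arabidopsis genome size

def hit_probability(gene_bp, n_insertions, genome_bp=GENOME_BP):
    """Probability that at least one of n random insertions falls in a gene.

    Poisson approximation: the expected number of hits is n * gene / genome.
    """
    return 1.0 - math.exp(-n_insertions * gene_bp / genome_bp)

def insertions_needed(gene_bp, target=0.95, genome_bp=GENOME_BP):
    """Random insertions required to hit a gene with the target probability."""
    return math.ceil(-math.log(1.0 - target) * genome_bp / gene_bp)

if __name__ == "__main__":
    for gene_bp in (500, 1000, 2000, 5000):
        p = hit_probability(gene_bp, 200_000)
        n = insertions_needed(gene_bp)
        print(f"{gene_bp:>5} bp gene: P(hit with 200,000 insertions) = {p:.3f}; "
              f"insertions for 95% = {n:,}")
```

Halving the gene size doubles the number of random insertions required for the same probability of a hit, which is why simply enlarging the population gives diminishing returns for small genes and why locally targeted transposon mobilization is attractive.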
The advantage of using high-density oligonucleotide arrays for genotyping large populations of T-DNA mutagenized lines is that a single DNA chip can map the precise locations of at least 1,000 T-DNA insertions in a single hybridization reaction. In our experiments, we are making DNA chips in which the entire genome is tiled end to end on both strands. Seed from the T-DNA transformed plants for our study will be collected in "pools of 10," allowing us to handle a much larger number of total lines than if we harvested seed from individual lines. Genomic DNA will be isolated from these "pools of 10," and thermal asymmetric interlaced PCR will be used to amplify the genomic DNA immediately flanking the T-DNA insertion sites. By using three-dimensional pooling strategies and hybridizing 1,000 lines on a single DNA chip, we should be able to map 40,000 T-DNA insertion sites using fewer than 200 DNA chips. A great advantage of high-density oligonucleotide arrays over arrays containing longer sequences of double-stranded DNA is that oligonucleotide arrays allow each strand of the genome to be independently analyzed so that one can easily pinpoint insertions with a fine enough resolution to definitively determine whether the T-DNA insertion is within a gene or just outside of it. In addition, because large pools of T-DNA lines can be simultaneously analyzed, one can efficiently search through a much larger population of T-DNA lines than could be handled with the same amount of resources using single-line sequencing. Another use of the MAS that is being explored is the creation of an optimized high-density oligonucleotide array in which one can search through an ethyl methanesulfonate (EMS)-mutagenized population of Arabidopsis for specific point mutations. 
For example, one can use this method to search for the "knockouts" present in an EMS population that are caused by the creation of stop codons within exons at the 5' end of protein-coding genes due to the prevalent G/C to A/T transitions that EMS is known to induce. It is hoped that this technology can be developed to allow pools of DNA representing multiple independent EMS lines to be simultaneously analyzed. Because this method relies on EMS as the mutagen rather than T-DNA, it will be readily applicable for use in any other organism for which a large amount of DNA sequence is known (e.g. rice). Although one can use simple hybridization as the method for chip-based genotyping of these single-base polymorphisms, the development of inverse chemistries, in which the 3' OH is available at the free end of the oligonucleotide, will allow numerous enzyme-based methods for mutation detection to be implemented. We plan to explore these strategies in the hope that they may prove more robust for detecting single-base polymorphisms with a minimal number of oligonucleotides and a higher degree of sensitivity. In summary, our project is focused on using oligonucleotide arrays as tools for cataloging large numbers of mutations in Arabidopsis. The product of these studies will be a collection of cataloged T-DNA insertions and EMS mutations present in a searchable database accessible on a Web site. The presence of a "hit" within a gene of interest in this database will lead the investigator to a single batch of seed derived from a "pool of 10 parent plants." When one is working with a small gene, or a tandemly duplicated set of genes, the T-DNA element that maps in or near the gene of interest can be mobilized into the neighboring sequences to knock out the small gene of interest, or to create multiple knockouts for tandemly duplicated genes. 
In contrast, the T-DNA and transposon will never move unless crossed to a transposase source, so one need not worry about the stability of the mutations. It is hoped that through the combination of the resources that we develop and those generated by the Salk T-DNA single-line sequencing project headed by Joe Ecker, the Arabidopsis community will be able to access knockout mutations for any gene in Arabidopsis with a minimum of effort.
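The set of EMS-induced changes that an array would need to detect can be enumerated directly from a coding sequence. The following is a minimal sketch, not the project's array-design software: it lists every position at which a single G-to-A or C-to-T change on the coding strand (the two ways a G/C to A/T transition can appear on that strand) converts a sense codon into a stop codon. The function name and the example sequence are hypothetical, and positions are reported 0-based relative to the start of the coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# A G/C to A/T transition appears on the coding strand as G>A,
# or as C>T when the altered G lies on the opposite strand.
EMS_TRANSITIONS = {"G": "A", "C": "T"}

def ems_stop_sites(cds):
    """Return (position, codon, mutant_codon) for each single EMS-type
    transition that turns a sense codon into a stop codon."""
    cds = cds.upper()
    hits = []
    # Walk complete codons only; ignore a trailing partial codon.
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop codon
        for offset, base in enumerate(codon):
            new_base = EMS_TRANSITIONS.get(base)
            if new_base is None:
                continue  # A and T are not changed by this mutation class
            mutant = codon[:offset] + new_base + codon[offset + 1:]
            if mutant in STOP_CODONS:
                hits.append((start + offset, codon, mutant))
    return hits

# Hypothetical toy coding sequence: ATG CAA TGG CGA GCT TAA
example = "ATGCAATGGCGAGCTTAA"
for position, codon, mutant in ems_stop_sites(example):
    print(position, codon, "->", mutant)
```

Only a few codons (CAA, CAG, CGA, and TGG) can be converted to stop codons in this way, so the number of oligonucleotide features needed per gene is small, and restricting the search to the 5' exons, as described above, reduces it further.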
PI: Richard N. Trelease, Arizona State University, trelease.dick{at}asu.edu NSF Arabidopsis 2010 Project No. 0091826; http://lsweb.la.asu.edu/rtrelease The overall goal of the project is to identify Arabidopsis and related genes coding for peroxisomal membrane proteins (PMPs) that are involved in the biogenesis and functioning of peroxisomes, and thereafter to conduct experiments aimed at elucidating the metabolic and/or biogenetic function(s) of the gene products. A multipronged cell/molecular biological approach (primary efforts) will be correlated with results from genetic analyses of available knockout mutants (our work and that of colleagues). The PMPs that will be examined are grouped into two main categories, i.e. those that perform "housekeeping" functions (e.g. transporters, constitutive and conditional enzymes, etc.) and those that participate in peroxisomal biogenesis (e.g. replication, proliferation, and differentiation). Examples of the former category of PMPs are ascorbate peroxidase, PXA1 transporter, porin(s) PMP34/36, prenylated DnaJ, myristoylated CDPK, and monodehydroascorbate reductase. The latter category of PMPs includes "peroxins" (Pex). Of the 23 peroxin genes (PEX) described thus far (mostly from viable yeast mutants with peroxisomal biogenesis defects), 15 Arabidopsis peroxin orthologs were identified (http://lsweb.la.asu.edu/rtrelease). All but one of the 15 established peroxins are peripheral or integral membrane proteins. It remains to be determined whether all 14 of the Arabidopsis orthologs are PMPs and whether they participate in plant peroxisomal biogenesis. As progress is made, our public Web site will list the full-length genes that are subcloned into expression vectors and will provide data pertaining to the function of each of the gene products. 
Our multipronged approach includes the following general methods: (a) organelle isolation; (b) membrane protein biochemistry; (c) transient and stable transformations of Arabidopsis suspension-cultured cells; (d) immunofluorescence, confocal, and electron microscopy; (e) PCR, RT-PCR, DNA sequencing, etc., to acquire full-length DNAs, and to mutate putative targeting/sorting/functional sequences; (f) genetic analyses of available knockouts; and (g) computer-aided DNA analyses. Our findings thus far have been numerous, but diverse, and therefore
not appropriate for publication. Several focused studies are under way;
four members of the group will present poster papers at the 2002 American Society of Plant Biologists meetings. In addition, sufficient
data for two papers are being assembled into manuscripts. In one,
Arabidopsis suspension-cultured cells are shown to be an excellent
system for elucidating the sorting/targeting pathways of proteins that
function within the major organelles of plant cells. The other
manuscript focuses on the endoplasmic reticulum sorting pathway
of the AtPex10p ortholog. Research by Trelease (2002)