Plant Physiology 138:1310-1317 (2005)
© 2005 American Society of Plant Biologists

The SOL Genomics Network. A Comparative Resource for Solanaceae Biology and Beyond¹

Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853 (L.A.M., T.H.S., N.T., B.S., R.B., J.B., C.L., M.H.W., R.A., Y.W., E.V.H., E.R.K., S.D.T.); and Department of Field Crops, Vegetables, and Genetics, Faculty of Agriculture, Hebrew University of Jerusalem, Jerusalem, Israel
The SOL Genomics Network (SGN; http://sgn.cornell.edu) is a rapidly evolving comparative resource for the plants of the Solanaceae family, which includes important crop and model plants such as potato (Solanum tuberosum), eggplant (Solanum melongena), pepper (Capsicum annuum), and tomato (Solanum lycopersicum). The aim of SGN is to relate these species to one another using a comparative genomics approach and to tie them to the other dicots through the fully sequenced genome of Arabidopsis (Arabidopsis thaliana). SGN currently houses map and marker data for Solanaceae species, a large expressed sequence tag collection with computationally derived unigene sets, an extensive database of phenotypic information for a mutagenized tomato population, and associated tools such as real-time quantitative trait loci. Recently, the International Solanaceae Project (SOL) was formed as an umbrella organization for Solanaceae research in over 30 countries to address important questions in plant biology. The first cornerstone of the SOL project is the sequencing of the entire euchromatic portion of the tomato genome. SGN is collaborating with other bioinformatics centers in building the bioinformatics infrastructure for the tomato sequencing project and implementing the bioinformatics strategy of the larger SOL project. The overarching goal of SGN is to make information available in an intuitive comparative format, thereby facilitating a systems approach to investigations into the basis of adaptation and phenotypic diversity in the Solanaceae family, other species in the Asterid clade such as coffee (Coffea arabica), Rubiaceae, and beyond.
The SOL Genomics Network (SGN; http://sgn.cornell.edu) is a genomics information resource for the Solanaceae family and related families in the Asterid clade, with the aim of building a comparative bioinformatics platform for answering questions about adaptation, evolution, development, defense, biochemistry, and other facets of this clade. To date, SGN's efforts have focused primarily on four areas: (1) cataloging and maintaining genetic maps and markers of the Solanaceae species; (2) disseminating sequence information for the different species of Solanaceae, mostly in the form of expressed sequence tags (ESTs), for which SGN generates and publishes unigene builds; (3) cataloging and publishing phenotypic information; and (4) assembling, analyzing, and publishing data from the recently commenced sequencing of the tomato (Solanum lycopersicum) genome. Unlike many other plant resources on the Web, which often focus on a single plant species, such as the Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org) on Arabidopsis (Arabidopsis thaliana; Rhee et al., 2003
The Solanaceae have held great interest for many researchers, breeders, and consumers for a long time. Indeed, the Solanaceae family is composed of more than 3,000 species, including the tuber-bearing potato (Solanum tuberosum), a number of fruit-bearing vegetables (tomato, eggplant [Solanum melongena], and peppers [Capsicum annuum]), ornamental plants (petunias [Petunia hybrida], Nicotiana), plants with edible leaves (Solanum aethiopicum, Solanum macrocarpon), and medicinal plants (e.g. Datura, Capsicum; Knapp, 2002
To meaningfully analyze the gene-to-phenotype relationships, a large amount of sequencing information is necessary. The most cost-effective way to get sufficient sequence information to address the SOL questions is to sequence a high-quality reference genome and then map sequences from other genomes onto the reference sequence. Hence, the first cornerstone of the SOL project is the sequencing of the full euchromatic portion of the tomato genome by an international consortium of 10 countries. Concomitantly, SOL will build a bioinformatics platform that allows intuitive and unrestricted access for researchers, and integrates information from all Solanaceae research into a one-stop shop on the Web that will ultimately allow approaching Solanaceae biology from a systems biology perspective. In collaboration with other bioinformatics centers involved in SOL, SGN is actively building this infrastructure, which will be distributed in nature. It will rely on bioMOBY (Wilkinson and Links, 2002
Like most other plant databases, SGN can be accessed through an easy-to-use Web interface. The SGN homepage was recently revamped to improve the usability of the site. It now contains an intuitively organized Getting Started section that provides links to the major features and Web pages, such as data overview pages, search pages, map and markers, resources, etc. In the lower part of the screen is a section providing links to related sites of interest, such as the Tomato Expression Database (TED; Fei et al., 2004
All pages on SGN contain a toolbar at the top with links to the most frequently used sections of the site for easy navigation. The toolbar consists of the SGN logo, which is also a link to the SGN homepage, a quick search function that searches the SGN databases and Web pages, and a menu bar with pull-down menus providing quick links to specific pages grouped by menu topic.

Search pages for several types of data are available, such as searches for markers, unigenes, expressed sequence tags (ESTs), EST libraries, bacterial artificial chromosomes (BACs), and profiles of registered SGN users. Using the marker search, markers can be queried by name, map, organism, map position, and whether a marker has associated information such as an overgo probe. Using the BAC search, BACs can be searched by name, including wild cards, presence of end sequence, and matches to overgo probes. Currently, about 75,000 BACs have been end sequenced from a HindIII library (Budiman et al., 2000).

On SGN, all information is freely accessible to all users. A login system exists only for the purpose of a user-managed database of Solanaceae researchers and for submission of EST sequences, and is required for sequencing centers that participate in the tomato sequencing program to update the BAC status information in the SGN database. In the near future, login will also allow users to comment on data objects, such as markers and unigenes, and to make user-contributed annotations to genes.

In summary, SGN strives to operate under these guiding principles: (1) all data should be accessible without restrictions; (2) original data should be stored wherever possible (chromatograms, assembly files, gel images, etc.) to ensure complete reproducibility; (3) all data should be attributed to the submitters and data generators; (4) all annotations should be carried out using standard vocabularies and annotation guidelines; (5) free and open-source software is used where possible and SGN-developed software is made available to all as open source; and (6) all the data are loaded into interconnected SGN relational databases such that, ultimately, a systems approach to Solanaceae biology becomes possible.
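The BAC search criteria mentioned above (name with wild cards, presence of end sequence, matches to overgo probes) can be illustrated with a small sketch. This is not SGN's implementation, which is described as Perl on top of MySQL; the `Bac` record, its field names, the clone names, and the probe names are all hypothetical, and shell-style patterns from Python's standard `fnmatch` module stand in for whatever wildcard syntax the site accepts.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Bac:
    """Hypothetical BAC record; field names are illustrative, not SGN's schema."""
    name: str
    has_end_sequence: bool
    overgo_matches: tuple = ()


def search_bacs(bacs, name_pattern="*", require_end_sequence=False, overgo_probe=None):
    """Return the BACs whose name matches a shell-style wildcard pattern and
    that pass the optional end-sequence and overgo-probe filters."""
    hits = []
    for bac in bacs:
        if not fnmatchcase(bac.name, name_pattern):
            continue
        if require_end_sequence and not bac.has_end_sequence:
            continue
        if overgo_probe is not None and overgo_probe not in bac.overgo_matches:
            continue
        hits.append(bac)
    return hits


# Made-up example records (placeholder names, not real clones or probes).
EXAMPLE_BACS = [
    Bac("BAC_001A01", True, ("probe_A",)),
    Bac("BAC_001B07", False),
    Bac("BAC_002C03", True, ("probe_B",)),
]
```

Calling `search_bacs(EXAMPLE_BACS, "BAC_001*")` keeps the first two records, while adding `require_end_sequence=True` narrows the result to clones that have an end sequence.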
The SGN database consists of a number of interrelated relational databases implemented in MySQL (http://www.mysql.com). Most software is written in Perl. The Web site uses the Apache (http://www.apache.org) Web server with the mod_perl integrated Perl interpreter. In keeping with the philosophy of open systems and open-source software, all servers and most development machines run the Debian distribution of the GNU/Linux operating system. More information on the database schemas, software, and setup at SGN can be found on the SGN Web site (http://sgn.cornell.edu).
SGN Solanaceae Unigene Builds

As no full genome sequence of a representative Solanaceae species is yet available, much of the existing sequence data on SGN consists of EST datasets for Solanaceae species. However, as the tomato sequencing project begins to bear fruit, SGN's focus will change more to genomic sequence data. From these EST datasets and other known transcript sequences, unigenes are assembled in an effort to approximate the transcriptome set of each organism. SGN currently produces unigene builds for Solanaceae species that have EST sequences available with associated chromatograms. Unigene builds are available for tomato, potato, pepper, eggplant, and petunia (see Table I). A Web interface is also available for submitting new sequence datasets.
In contrast to many other unigene assembly methods, the SGN custom unigene assembly pipeline starts at the level of the raw chromatogram in order to apply the same quality standards to all data, thereby increasing the consistency and overall quality of the builds. The assembly pipeline, which is tightly integrated with the SGN database, works as follows. First, the chromatograms are base called with phred and the raw sequences are loaded into our database. Next, the sequences are processed to determine a high-quality region excluding low-quality or cloning vector sequences. Then, Escherichia coli or lambda phage contamination is detected with an automated National Center for Biotechnology Information (NCBI) BLAST search, and contaminated sequences are flagged in the database. Inserts that contain sequences matching the multiple cloning site of the vector are flagged as chimeric. A second chimera screen is also applied that attempts to align the ends of a read with any Arabidopsis coding sequences. If the two ends match unrelated Arabidopsis genes, the sequence is flagged as chimeric. Flagged sequences are not used in subsequent unigene builds. The unigene assembly proceeds through a custom preclustering program, which generates clusters that are fed into the cap3 program (Huang and Madan, 1999
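The screening steps that precede assembly can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the SGN pipeline itself: the `Read` record and its fields are hypothetical, the quality threshold of 20 is an arbitrary choice, the high-quality region is approximated as the longest run of bases at or above that threshold, and "unrelated Arabidopsis genes" is simplified to "different gene identifiers". BLAST and vector matching are assumed to have been run already, with their outcomes stored on the record.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Read:
    """Hypothetical EST read after base calling; not SGN's actual schema."""
    name: str
    quals: List[int]                   # per-base phred quality scores
    contaminant_hit: bool = False      # E. coli or lambda phage BLAST match
    has_cloning_site: bool = False     # insert matches the vector cloning site
    # Best Arabidopsis coding-sequence hit for each end of the read (or None).
    end_hits: Tuple[Optional[str], Optional[str]] = (None, None)


def high_quality_region(quals, min_qual=20):
    """Return (start, end) of the longest run with quality >= min_qual.

    `end` is exclusive; (0, 0) means no base passed the threshold.
    """
    best = (0, 0)
    start = None
    # A sentinel below any threshold closes a run that reaches the last base.
    for i, q in enumerate(list(quals) + [-1]):
        if q >= min_qual:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def screen_flags(read):
    """Return the list of flags that would exclude this read from a build."""
    flags = []
    if read.contaminant_hit:
        flags.append("contaminated")
    five_prime, three_prime = read.end_hits
    ends_disagree = bool(five_prime and three_prime and five_prime != three_prime)
    if read.has_cloning_site or ends_disagree:
        flags.append("chimeric")
    return flags


def usable_reads(reads):
    """Keep only reads with no flags, as flagged reads are left out of builds."""
    return [r for r in reads if not screen_flags(r)]
```

Keeping the flags as data rather than deleting reads mirrors the text: flagged sequences stay in the database and are simply skipped when a unigene build is made.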
The SGN database currently houses six maps, the Tomato E x PEN 2000 (Fulton et al., 2002
Phenotypic Information

SGN houses a collection of phenotypic information in the Genes That Make Tomatoes database describing a mutant population of 13,000 Solanum lycopersicum M2 families of the M82 variety. This comprehensive mutant population is a useful basic resource for exploring gene function. These mutants were generated using ethyl methanesulfonate and fast-neutron mutagenesis, and the plants were visually phenotyped in the field and then categorized into a morphological catalog encompassing 15 primary and 48 secondary categories. Currently 3,417 mutations have been cataloged; among them are most of the previously described phenotypes from the monogenic mutant collection of the TGRC, plus over 1,000 new mutants with multiple alleles per locus. The phenotypic database indicates that most mutations fall into more than a single category (they are pleiotropic), with some organs (e.g. leaves) more prone to alterations than others. All data and images can be searched and accessed through SGN. The mutants were generated and phenotyped and the database is administered by the Zamir lab at the Hebrew University in Jerusalem, Israel. Another tool is real-time quantitative trait loci, which presents the results of eight independent phenotyping experiments on an isogenic inbred line population and allows viewing correlations online in real time. A tighter integration of phenotypic data with the molecular and mapping data on SGN is planned for the future.
The tomato sequencing project was initiated in 2004 by a consortium of 10 countries, with each of the following countries sequencing one chromosome: Korea (chromosome 2), China (chromosome 3), Great Britain (chromosome 4), India (chromosome 5), The Netherlands (chromosome 6), France (chromosome 7), Japan (chromosome 8), Spain (chromosome 9), and Italy (chromosome 12). The United States sequenced three chromosomes (chromosomes 1, 10, and 11; see Fig. 2). The tomato genome is composed of approximately 950 Mb of DNA, more than 75% of which is heterochromatin and largely devoid of genes. The majority of genes are found in long contiguous stretches of gene-dense euchromatin located on the distal portions of each chromosome arm. The sequencing strategy is to sequence a minimal tiling path of BAC clones through the approximately 220 Mb of euchromatin. The starting points for sequencing the genome will be 1,500 anchor points, where the physical map has been linked to the genetic map using overgo probes. The results of the overgo analysis are available on SGN (http://sgn.cornell.edu). In addition, SGN is involved in setting up part of the infrastructure for the tomato sequencing project, such as a BAC registry, so that the status of each BAC in the sequencing pipeline can be tracked by users and sequencers alike. The finished BAC sequences will be deposited in both GenBank and SGN. On SGN, annotations will be included that will be viewable online, based on Gbrowse (Stein et al., 2002
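As a quick check on the figures above, 220 Mb of euchromatin out of roughly 950 Mb is about 23% of the genome, which agrees with the statement that more than 75% is heterochromatin. The idea of a minimal tiling path can also be illustrated with a short sketch. It assumes, purely for illustration, that each BAC already has known start and end coordinates on a shared axis; in practice clone overlaps have to be inferred from physical-map and end-sequence data, and the greedy interval cover shown here is a textbook technique, not the consortium's documented procedure. Clone names and coordinates are invented.

```python
def minimal_tiling_path(clones, start, end):
    """Greedy interval cover: choose the fewest clones covering [start, end].

    `clones` is a list of (name, clone_start, clone_end) tuples on one
    coordinate axis. Returns the chosen clone names in order along the
    region, or raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = start
    i = 0
    while covered_to < end:
        best = None
        # Among clones that start at or before the covered point,
        # take the one that extends coverage the farthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Invented clones (coordinates in kb) overlapping along a 300-kb region.
EXAMPLE_CLONES = [
    ("A", 0, 100),
    ("B", 50, 180),
    ("C", 90, 200),
    ("D", 190, 300),
    ("E", 150, 250),
]
```

For the example clones, `minimal_tiling_path(EXAMPLE_CLONES, 0, 300)` selects A, C, and D, skipping B and E because they add no coverage that the chosen clones do not already provide.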
To date, the curational activities have focused on maps and markers, and the functional annotation, primarily by automatic means, of the unigene sets for Solanaceae species generated at SGN. However, with the start of tomato sequencing, SGN will be involved in the annotation of the tomato genome. A pipeline is currently being developed by the tomato sequencing consortium that allows the automatic annotation of gene structures, repeats, RNAs, and other features on BAC sequences. The pipeline will be based on a distributed system such that, for each analysis, the best tool available for a given task can be pulled into the pipeline as a Web service, such as a bioMOBY service (Wilkinson and Links, 2002
There are myriad other Solanaceae resources on the Web besides SGN. The most important ones are listed on the Solanaceae Resources links page on SGN, some of which are summarized in Table III.
SGN is a relatively new database that is rapidly developing into a comprehensive resource for comparative biology between members of the Solanaceae family, closely related plants, such as coffee (Coffea arabica), and beyond. The tomato genome sequence will provide the foundation for a new sequence-based, comparative resource that will aid in linking the Solanaceae to each other and outward to other species and families where enough sequence information is available. As time progresses, the hypothesis-driven research enabled by large repositories of biological data like SGN will continue to gain importance. Instead of only answering simple queries, data repositories and plant Web sites should strive to provide the enabling context that leads to further questions, allowing a researcher to posit new hypotheses that can be verified in silico or in vitro. The ultimate goal of SGN is to make information available in an intuitive comparative format that will make this type of science possible, thus facilitating a systems approach to investigating the basis of adaptation and phenotypic diversity in the Solanaceae.

Received February 3, 2005; returned for revision April 18, 2005; accepted May 3, 2005.
1 This work was supported by the National Science Foundation (grant nos. 0116076, 9872617, 975866, and 0421634) for the SGN and the tomato sequencing project.

* Corresponding author; e-mail lam87@cornell.edu; fax 607-255-6683.

www.plantphysiol.org/cgi/doi/10.1104/pp.105.060707.
Adams-Phillips L, Barry C, Giovannoni J (2004) Signal transduction systems regulating fruit ripening. Trends Plant Sci 9: 331-338

Alexander L, Grierson D (2002) Ethylene biosynthesis and action in tomato: a model for climacteric fruit ripening. J Exp Bot 53: 2039-2055

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25-29

Bogdanove AJ, Martin GB (2000) AvrPto-dependent Pto-interacting proteins and AvrPto-interacting proteins in tomato. Proc Natl Acad Sci USA 97: 8836-8840

Brummell DA, Harpster MH (2001) Cell wall metabolism in fruit softening and quality and its manipulation in transgenic plants. Plant Mol Biol 47: 311-340

Budiman MA, Mao L, Wood TC, Wing RA (2000) A deep-coverage tomato BAC library and prospects toward development of an STC framework for genome sequencing. Genome Res 10: 129-136

Doganlar S, Frary A, Daunay MC, Lester RN, Tanksley SD (2002) A comparative genetic linkage map of eggplant (Solanum melongena) and its implications for genome evolution in the Solanaceae. Genetics 161: 1697-1711

Fei Z, Tang X, Alba RM, White JA, Ronning CM, Martin GB, Tanksley SD, Giovannoni JJ (2004) Comprehensive EST analysis of tomato and comparative genomics of fruit ripening. Plant J 40: 47-59

Fernie AR, Willmitzer L (2001) Molecular and biochemical triggers of potato tuber development. Plant Physiol 127: 1459-1465

Fray RG, Grierson D (1993) Molecular genetics of tomato fruit ripening. Trends Genet 9: 438-443

Fulton TM, Van der Hoeven R, Eannetta NT, Tanksley SD (2002) Identification, analysis, and utilization of conserved ortholog set markers for comparative genomics in higher plants. Plant Cell 14: 1457-1467

Gebhardt C, Valkonen JP (2001) Organization of genes controlling disease resistance in the potato genome. Annu Rev Phytopathol 39: 79-102

Giovannoni JJ (2004) Genetic regulation of fruit development and ripening. Plant Cell (Suppl) 16: S170-S180

Gray J, Picton S, Shabbeer J, Schuch W, Grierson D (1992) Molecular biology of fruit ripening and its manipulation with antisense genes. Plant Mol Biol 19: 69-87

Gur A, Semel Y, Cahaner A, Zamir D (2004) Real time QTL of complex phenotypes in tomato interspecific introgression lines. Trends Plant Sci 9: 107-109

Hamilton AJ, Fray RG, Grierson D (1995) Sense and antisense inactivation of fruit ripening genes in tomato. Curr Top Microbiol Immunol 197: 77-89

Huang X, Madan A (1999) CAP3: a DNA sequence assembly program. Genome Res 9: 868-877

Knapp S (2002) Tobacco to tomatoes: a phylogenetic perspective on fruit diversity in the Solanaceae. J Exp Bot 53: 2001-2022

Lawrence CJ, Dong Q, Polacco ML, Seigfried TE, Brendel V (2004) MaizeGDB, the community database for maize genetics and genomics. Nucleic Acids Res 32: D393-D397

Lee JM, Nahm SH, Kim YM, Kim BD (2004) Characterization and molecular genetic mapping of microsatellite loci in pepper. Theor Appl Genet 108: 619-627

Lewis SE, Searle SM, Harris N, Gibson M, Lyer V, Richter J, Wiel C, Bayraktaroglir L, Birney E, Crosby MA, et al (2002) Apollo: a sequence annotation editor. Genome Biol 3: RESEARCH0082

Li L, Li C, Howe GA (2001) Genetic analysis of wound signaling in tomato. Evidence for a dual role of jasmonic acid in defense and female fertility. Plant Physiol 127: 1414-1417

Menda N, Semel Y, Peled D, Eshed Y, Zamir D (2004) In silico screening of a saturated mutation library of tomato. Plant J 38: 861-872

Mulder NJ, Apweiler R, Attwood RK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, et al (2003) The Interpro database, 2003 brings increased coverage and new features. Nucleic Acids Res 31: 315-318

Pedley KF, Martin GB (2003) Molecular basis of Pto-mediated resistance to bacterial speck disease in tomato. Annu Rev Phytopathol 41: 215-243

Prat S, Frommer WB, Hofgen R, Keil M, Kossmann J, Koster-Topfer M, Liu XJ, Muller B, Pena-Cortes H, Rocha-Sosa M, et al (1990) Gene expression during tuber development in potato plants. FEBS Lett 268: 334-338

Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, et al (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31: 224-228

Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, et al (2002) The generic genome browser: a building block for a model organism system database. Genome Res 12: 1599-1610

Tanksley SD (2004) The genetic, developmental, and molecular bases of fruit size and shape variation in tomato. Plant Cell (Suppl) 16: S181-S189

Tanksley SD, Ganal MW, Prince JP, de Vicente MC, Bonierbale MW, Broun P, Fulton TM, Giovannoni JJ, Grandillo S, Martin GB, et al (1992) High density molecular linkage maps of the tomato and potato genomes. Genetics 132: 1141-1160

Wilkinson MD, Links M (2002) BioMOBY: an open source biological web services proposal. Brief Bioinform 3: 331-341

Zdobnov EM, Apweiler R (2001) InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847-848