First published online June 6, 2008; 10.1104/pp.108.119560 Plant Physiology 147:1788-1799 (2008) © 2008 American Society of Plant Biologists OPEN ACCESS ARTICLE
A Community-Based Annotation Framework for Linking Solanaceae Genomes with Phenomes1,[C],[OA]
Department of Plant Breeding and Genetics, and Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
The amount of biological data available in the public domain is growing exponentially, and there is an increasing need for infrastructural and human resources to organize, store, and present the data in a proper context. Model organism databases (MODs) invest great effort in functionally annotating genomes and phenomes through in-house curators. The SOL Genomics Network (SGN; http://www.sgn.cornell.edu) is a clade-oriented database (COD), which provides a more scalable and comparative framework for biological information. SGN has recently spearheaded a new approach by developing community annotation tools to expand its curation capacity. These tools allow some curation to be delegated to qualified researchers while preserving the in-house curators' full editorial control. Here we describe the background, features, implementation, results, and development road map of SGN's community annotation tools for curating genotypes and phenotypes. Since the inception of this project in late 2006, interest and participation from the Solanaceae research community have been strong and have grown continuously, to the extent that we plan to expand the framework to accommodate more plant taxa. All data, tools, and code developed at SGN are freely available to download and adapt.
Biological databases have become one of the principal drivers of research and innovation in biology. For plants, model organism databases (MODs), such as The Arabidopsis Information Resource (TAIR; Swarbreck et al., 2008
The early sequenced genomes, such as Mus musculus (Eppig et al., 2007
However, for the databases and the plant community, two important limitations remain. First, these model organism systems cannot be used to annotate the specific biology of other plants or plant clades, and, second, the centralized approach is not scalable beyond the existing model organisms without a concomitant scaling up of funding. Therefore, radically different methods must be developed to annotate more organisms, such as those of the Solanaceae clade, and to enhance the quality and scale of curation. The most compelling prototype approaches involve the research community in the annotation process in some way. We refer to these strategies broadly as community annotation. Currently, annotation jamborees are the most commonly practiced form of community annotation (Pennisi, 2000
Herein we describe a community annotation approach for gene and phenotype data that leverages the existing database infrastructure at SOL Genomics Network (SGN), including data from the ongoing International Tomato Sequencing Project (The Tomato Sequencing Consortium, unpublished data). We think this approach will be successful because: (1) the Solanaceae research community has a well-established tradition of unrestricted collaboration and sharing of data and materials; (2) this community annotation software is written with user-friendliness as a primary design goal, enabling scientists to utilize structured vocabularies and other advanced annotation tools with relatively little training; (3) SGN curators routinely provide necessary guidelines and technical support to community annotators; and, most importantly, (4) there is a significant worldwide social trend toward open collaboration and data sharing on the Web.
We are in the midst of a revolution in how people use computers to share data on the Web, as evidenced by the recent success of social networking sites that have made sharing user-generated content popular. Among those sites are Flickr (www.flickr.com) for photos and YouTube (www.youtube.com) for short videos. Bioinformatics has a long tradition of sharing information, programs, and code; Web sites designated for hosting Open Source software include SourceForge.net, BioPerl (Stajich et al., 2002
This social networking and data-sharing movement, together with the new paradigm for the Web often termed Web 2.0, which relies heavily on technologies that provide a richer and more user-friendly experience, supplies the critical ingredients for bringing successful community annotation to biology.
The Solanaceae are an excellent system to showcase such community annotation systems. With their exceptionally conserved genomes, yet extremely diverse phenotypic variation and adaptations to natural and agricultural environments, they include species such as tomato (Solanum lycopersicum), potato (Solanum tuberosum), pepper (Capsicum annuum), and petunia (Petunia hybrida), which are valuable model systems for research as well as major food crops or commercial products. Our system builds upon the bioinformatics platform for addressing Solanaceae diversity, SGN (http://www.sgn.cornell.edu), a clade-oriented database (COD) containing genomic, genetic, and taxonomic information (Mueller et al., 2005
SGN is also the bioinformatics hub for the ongoing international project to fully sequence the euchromatic portion of the tomato genome. This project will provide a high-quality reference to interpret the sequence organization of other Solanaceae crops and serve as the basis for understanding how plants diversify and adapt to new and adverse environments. Thus, the tomato genome coupled with automated and user-contributed gene annotations will reveal novel phenotypes of agronomic and commercial value for the entire Solanaceae and related families of the Asterid clade.
The SGN community annotation effort has produced the necessary software for user-friendly Web interfaces for annotation and data display, back-end data modeling, storage, and auditing. The ease of use of the annotation tools combined with clear annotation guidelines has encouraged the Solanaceae research community to actively participate in the annotation process as measured by the continued increase in number of locus and phenotype annotations.
At the time of this writing, approximately 12 months after the introduction of community annotation functionalities on SGN, a total of 183 loci have been annotated by the community. Ninety-five of these loci have designated editors, 42 in total, who are experts on the locus or loci. The extent of annotation by the community ranges from creating a new locus or phenotype entry to adding data to, or editing data in, an existing entry. The contributed annotations include alleles, sequences, publications, ontology term annotations, images, phenotyped accessions, and locus-locus associations. The phenotype database also contains user-submitted information, including more than 6,000 phenotyped accessions from 17 distinct populations. Phenotypes are usually batch loaded into the database by SGN curators, and the submitter has editorial privileges similar to those in the locus database. Phenotypes can also be added manually via the Web interface (see "Materials and Methods").
A gene is defined as the genomic sequence corresponding to a transcribed unit in the genome. The Solanaceae and tomato, in particular, have rich historic collections of gene descriptions based on morphological and biochemical phenotypes (Butler, 1952
Each locus in our database has a unique name and symbol and must be associated with an organism. Currently, our database contains locus information for tomato, potato, pepper, eggplant (Solanum melongena), tobacco (Nicotiana tabacum), and henbane (Hyoscyamus niger; Table I). Locus data include links to GenBank accessions, supporting literature, SGN markers and unigenes, and Gene Ontology (GO) and Plant Ontology (PO) annotations (Fig. 1). To aid the community in locus annotation for a given species, we periodically update the database with selected bulk information from these sources. This allows the community annotators to utilize and complement this information with their manual annotations. To populate the database, we have developed an automated pipeline to process new locus data and upload and update existing links and annotations (see "Materials and Methods").
User Interface
Gene/Locus Search
The search application for the locus database allows users to search for loci using locus or allele names, symbols, and/or synonyms. In addition, more advanced search criteria are available for limiting search results to a specific organism, allele-associated phenotype, chromosome, GO or PO term name, synonym, or ID/name of a locus editor, GenBank accession, loci that have associated gene sequence, loci with a mapped location, or loci with ontology annotation. Search results are displayed as a list of loci matching the search parameters with links to separate pages showing the details of each locus. Because the search application searches both the locus and allele datasets and there may be multiple alleles for each locus, the same locus may be shown in the result table multiple times, once for each allele. Clicking a locus name in the search results displays the locus page with the following sections.
Locus Details
If the locus has a known chromosomal location, a chromosome glyph is shown, and if the locus is associated with a known marker, the marker is shown on the glyph in its genetic map location (Fig. 2A). The chromosome glyph and the marker name are clickable links to the SGN comparative viewer (Mueller et al., 2008
Any SGN submitter has the ability to add and delete locus synonyms. Clicking on the add/remove link leads to the locus synonyms page, where synonyms may be added or removed. If the locus information was obtained from another organization, a corresponding link is displayed. At the bottom of the section is information about the locus editors and the editing history. Every locus entry is assigned one or more editors who have full editing and deleting privileges. The name of each editor is shown as a clickable link leading to the editor's personal details page (see "Materials and Methods") where users can find contact details of the editor, followed by the date when the locus was created in the database, the date of its last update, and which editor made the last update.
Notes and Figures
Accessions and Images
Known Alleles
Associated Loci
SolCyc Links
Sequence Annotation
Literature Annotations
Ontology Annotations
Solanaceae Phenotype Module
A phenotype is the observable trait of an individual. Phenotyping records are kept with individual accessions because the phenotypes of single plants may vary with genetic background, the environment, phenotyping methodologies, and human inconsistencies in scoring for traits. Each accession in the phenotype database has a unique name and is associated with a population (see "Materials and Methods"). Currently, the database contains 6,921 accessions from 17 populations (Table II). Individual accession data include images, underlying loci and alleles, phenotypic attributes, the genetic makeup of each plant, germplasms, and ontology annotations (Fig. 1). The database is usually populated with batch information for large datasets. Accession entries, for example mutants, can also be added to the database by submitters using the Web interface.
User Interface
Phenotype Search
The phenotype search function allows users to search for accessions using keywords from the name or phenotype descriptors. An advanced search can be done using filters for a specific population, PO, or Solanaceae Phenotype (SP) term, name of an accession editor, accessions with associated loci, or accessions with ontology annotation. Search results are displayed as a list of accessions matching the search parameters with links to separate pages showing the details of each accession. Clicking an accession name in the search results displays the accession detail page (Fig. 5), divided into the following sections.
Accession Details
Each accession in the database has a unique name, free-text description, population name, the name of the person who submitted the record, and references to loci identified in the accession (Fig. 5A). Each accession may be associated with more than one locus if it carries variation in more than one gene. As of this writing, the community annotation database has information on accessions of tomato and eggplant mutants, cultivars, mapping and quantitative trait loci (QTL) populations, breeder lines, transgenic accessions, and introgression lines.
Images
Phenotype Data
Genotype Data
Alleles
Germplasms
Ontology Annotations
We have developed a comprehensive database for the community annotation of loci and phenotypes, providing functionality for extensive annotation based on free-text descriptions, controlled vocabularies, images, sequences, and literature references. Users with submitter accounts can contribute information using easy-to-use Web interfaces. All submitted data are immediately visible to all users, facilitating review and discussion of annotations as they emerge. While only submitters can modify data, all registered users can contribute knowledge using the forum-like comments option available on each page. Quality control pipelines and rigorous submission tracking ensure that only high-quality annotations are published on the site. As of March 2008, the database contains 3,604 loci, 1,014 publications, and 6,921 plant accessions (Table I). There are 42 community submitters who contributed most of the phenotyped accessions and information on approximately 200 loci. This community annotation effort provides a medium and tools for the Solanaceae research community to annotate their genes and phenotypes, thereby helping to keep the data in the database as accurate, current, and accessible as possible. Nevertheless, community annotation is only one aspect of the curation capacity at SGN; it complements the larger-scale automated and in-house curator annotations. Critical metrics for our system's success are the number of community annotators and the number of annotations they make. If the current rate of subscription continues, we expect the number of community annotators to grow by about 100 every year. Our outreach program actively solicits contributions from leading scientists through direct e-mail contact, presentations at conferences, and publications in leading journals. Our goal is to have at least 200 annotators by the end of 2009, a number we regard as the critical mass for the system to be useful.
We predict that, in a few years, online annotation will be a normal part of any biologist's routine and that our system will scale well to thousands of annotators. The literature contains a vast amount of information on genes and mutants that has yet to be integrated into any electronic database in a format that makes it computationally accessible. The SGN community database aims to help close this gap by providing an easy way for the most knowledgeable members of the community to contribute this information. Our system allows submitters to edit almost any data type associated with a locus or a phenotype, so even partial data excluded from publication, corrections, or supplemental information can be added to SGN community annotation pages. In 2007 alone, more than 1,300 Solanaceae-related papers were published (about 90% of which were on either tomato or potato) and more than 150 Solanaceae mRNA sequences were submitted to GenBank. Researchers from the community are by far the best resource for reviewing gene information and extracting relevant data from their own publications. Because of space limitations or a focus on specific traits or processes, papers and supplementary materials do not always include all useful data gained from experiments. SGN provides the research community with a platform for sharing such supplementary information, which may be useful to other researchers.
The inferred cross-links between phenomes and genomes provide a resource for studying genome evolution and the resulting phenotype variation in plants (Ori et al., 2007
In recent years, much progress has been made in defining standard controlled vocabularies for biology, called ontologies, which seek to provide standard machine-readable ways to describe general processes shared by different organisms. Ontologies greatly facilitate meaningful cross-species queries between disparate databases by providing a common semantic framework that can be used in searches and comparisons. Among ontologies, the GO (Gene Ontology Consortium, 2008
Beyond the borders of the Solanaceae community, several other approaches have recently been developed. For example, EcoliWiki (http://ecoliwiki.net) has deployed an installation of MediaWiki (http://mediawiki.org) as a hub for community annotation of Escherichia coli K-12. Wikis have advantages in that they are simple to set up and maintain, user-friendly, and already familiar to many users due to the popularity of Wikipedia (http://wikipedia.org). For bioinformatics purposes, the most significant limitation of traditional wikis is that the wiki's content is stored in a mostly unstructured manner and without any semantic metadata. This makes large-scale automated analysis of wiki content difficult and error prone, which limits the usefulness of such resources because large-scale analyses are the bedrock of modern bioinformatics. In contrast, the SGN community annotation system stores data in a highly structured relational database, an ideal basis for large-scale bioinformatics analyses. Despite its limitations, wiki-style free-text editing can be an excellent option for community editing of information whose structure may not be known in advance. Currently, SGN community annotation pages allow submission of free-text comments at the bottom of each page that can be used for this purpose.
Ultimately, our objective is to present on each locus and phenotype page the entire story of a gene, including not only its descriptors, synonyms, and functions, but also its history, provenance, mapping, cloning, and sequencing, and all the experimental steps, people, and methods involved in its characterization. Each page will essentially be presented as a free-standing publication, creating a permanent, yet evolving, entry that can be cited and referenced. With the growing number of gene descriptors and annotations, MODs are becoming central actors in a community effort to develop a unified gene nomenclature and gold standards for annotation, not only for maintaining similar guidelines within an organism's research community, but also for comparative searches across taxa. Journals are beginning to collaborate with databases to set nomenclature standards and naming conventions. Since July 2007, manuscripts submitted to Plant Physiology have been required to supply a TAIR locus identifier for Arabidopsis gene data (http://www.plantphysiol.org/misc/ifora.shtml). The benefits of this policy include the prevention of nomenclature conflicts (since TAIR arbitrates the nomenclature) and the assured availability of up-to-date gene information. We intend to provide the research community with a similar system of stable identifiers, naming conventions, and annotation standards for the Solanaceae.
SGN is the first among the major plant databases to put the control of the information directly in the hands of community experts, with SGN curators acting as editors in the annotation process, rather than exclusively as authors. As a result, SGN annotations are more up to date and richer with detailed descriptions, images, and several levels of gene-to-phenotype cross-links, than would otherwise be possible without a large curatorial staff. We would be happy to collaborate with other research communities to help start community annotation efforts of other organisms and clades.
Platform Technologies
SGN stores and indexes most of its data using the open source PostgreSQL database system (http://www.postgresql.org). Most software developed at SGN is written in object-oriented Perl and JavaScript. User data submission forms are written using AJAX techniques to provide powerful and user-friendly interfaces. The SGN Web site uses the Apache (http://www.apache.org) Web server with the mod_perl integrated Perl interpreter. All servers and most development machines run the Debian distribution of the GNU/Linux operating system. More information on the database schemas, software, and setup at SGN can be found on the SGN Web site (http://sgn.cornell.edu).
The first step toward implementing a system for representing phenotype-to-genotype relationships was to design a database schema for storing Solanaceae loci and phenotypes with cross-references between the two datasets (Fig. 1). The following conceptual data types are used, many of which map directly to Perl classes and/or tables in the PostgreSQL database.
Locus: Central data type representing descriptive genetic information on plausible transcribed units in the genome. Each locus has a unique name and symbol, synonyms, allele data, and related sequences. It is annotated with supporting literature records, and its phenotypes are described using controlled vocabularies.
Allele: Alternative form of a locus. It may originate from natural or induced variation. Alleles allow representation of multiple products and phenotypes of a single locus in an organism.
Phenotype: Measurable traits and characteristics of individuals within a defined population. Phenotypes are stored as text descriptors of alleles and individual accessions, annotated images, quantitative measurements, and controlled vocabulary terms.
Accession: Single member of a predefined population. Annotated with phenotypic and genotypic attributes, such as images, locations on a genetic map, and controlled vocabulary terms. Cross-referenced with loci via associations with alleles (accession to allele to locus).
Population: Collection of individuals (accessions) sharing a common genetic background or a common phenotyping or genotyping scheme. A population may be genetically homogeneous, such as mutant collections in a specific background or isogenic inbred lines, or may be a heterogeneous collection of plants of different genetic backgrounds that have been characterized using similar methodologies.
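As an illustration only, the conceptual data types listed above can be sketched as Python data classes. SGN's own implementation uses Perl classes and PostgreSQL tables; every class, field name, and example value here is a hypothetical stand-in chosen for readability, not the actual schema. The sketch shows the accession-to-allele-to-locus cross-reference.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Locus:
    name: str
    symbol: str
    organism: str
    synonyms: List[str] = field(default_factory=list)


@dataclass
class Allele:
    symbol: str
    locus: Locus          # each allele is an alternative form of one locus
    phenotype: str = ""   # free-text phenotype descriptor


@dataclass
class Population:
    name: str


@dataclass
class Accession:
    name: str
    population: Population
    alleles: List[Allele] = field(default_factory=list)

    def loci(self) -> Set[str]:
        """Follow accession -> allele -> locus and return the locus symbols."""
        return {allele.locus.symbol for allele in self.alleles}


# Hypothetical example data: one accession carrying variation in two loci.
population = Population(name="example mutant population")
locus_a = Locus(name="example locus a", symbol="ela", organism="Solanum lycopersicum")
locus_b = Locus(name="example locus b", symbol="elb", organism="Solanum lycopersicum")
accession = Accession(
    name="example-accession-1",
    population=population,
    alleles=[Allele("ela-1", locus_a), Allele("elb-1", locus_b)],
)
```

Calling `accession.loci()` on this example returns the symbols of both loci, because the accession never points at a locus directly, only at alleles.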
In the PostgreSQL database system, data are represented as tables with rows and columns that hold data and also refer to other tables. To maintain a comprehensive audit trail for every data point change, each user-updateable table associated with the community annotation system stores certain standardized metadata for each database record such as creation date, modification date, owner ID, the ID of the user who submitted an update, and obsoleteness information (Supplemental Fig. S1). The owner of the record is usually authorized to edit and delete (actually, obsolete) the information. All users can view the information. The core structure of the database (Fig. 1) consists of a locus table for storing gene descriptors and an individual table for storing phenotype descriptors of accessions. More than 30 additional tables store related information, such as alleles, image data, and annotations. Plant accessions (individuals) can be linked to an allele through a linking table allowing many relationships between loci, accessions, and images. Thus, accessions with mutations in several loci can be represented easily. Each locus has a default allele used as a place holder to allow associating phenotypes to loci in the absence of allele information. Over time, as genes are sequenced and annotated with allele information, the locus-phenotype associations may be refined to include more specific allele information. Sequences, publications, biochemical pathways, controlled vocabulary terms, and other general data used for annotation are primarily stored using a slightly modified version of the GMOD Chado database schema (Mungall et al., 2007
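The table layout described above can be approximated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module in place of PostgreSQL so that it runs anywhere, and all table and column names are assumptions made for illustration rather than the real SGN or Chado schema. It shows the audit metadata columns, the default allele created as a placeholder for each locus, and the linking table that lets one accession be associated with several loci.

```python
import sqlite3

# Hypothetical, simplified schema; the real SGN tables differ.
SCHEMA = """
CREATE TABLE locus (
    locus_id      INTEGER PRIMARY KEY,
    symbol        TEXT NOT NULL UNIQUE,
    owner_id      INTEGER NOT NULL,            -- user who owns the record
    updated_by    INTEGER,                     -- user who made the last update
    create_date   TEXT DEFAULT CURRENT_TIMESTAMP,
    modified_date TEXT,
    obsolete      INTEGER NOT NULL DEFAULT 0   -- 1 = hidden from Web displays
);
CREATE TABLE allele (
    allele_id  INTEGER PRIMARY KEY,
    locus_id   INTEGER NOT NULL REFERENCES locus(locus_id),
    symbol     TEXT NOT NULL,
    is_default INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE individual (
    individual_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE
);
CREATE TABLE individual_allele (              -- linking table
    individual_id INTEGER NOT NULL REFERENCES individual(individual_id),
    allele_id     INTEGER NOT NULL REFERENCES allele(allele_id),
    PRIMARY KEY (individual_id, allele_id)
);
"""


def add_locus(conn, symbol, owner_id):
    """Insert a locus together with its placeholder default allele."""
    cur = conn.execute(
        "INSERT INTO locus (symbol, owner_id) VALUES (?, ?)", (symbol, owner_id)
    )
    locus_id = cur.lastrowid
    conn.execute(
        "INSERT INTO allele (locus_id, symbol, is_default) VALUES (?, ?, 1)",
        (locus_id, symbol),
    )
    return locus_id


def link_accession_to_locus(conn, individual_id, locus_id):
    """Associate an accession with a locus through the locus's default allele."""
    allele_id = conn.execute(
        "SELECT allele_id FROM allele WHERE locus_id = ? AND is_default = 1",
        (locus_id,),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO individual_allele (individual_id, allele_id) VALUES (?, ?)",
        (individual_id, allele_id),
    )


def loci_for_accession(conn, individual_id):
    """Return the symbols of all loci linked to an accession, sorted by symbol."""
    rows = conn.execute(
        """SELECT DISTINCT l.symbol
           FROM individual_allele ia
           JOIN allele a ON a.allele_id = ia.allele_id
           JOIN locus l ON l.locus_id = a.locus_id
           WHERE ia.individual_id = ?
           ORDER BY l.symbol""",
        (individual_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
locus_a = add_locus(conn, "locus_a", owner_id=1)
locus_b = add_locus(conn, "locus_b", owner_id=1)
acc_id = conn.execute(
    "INSERT INTO individual (name) VALUES ('example-accession-1')"
).lastrowid
link_accession_to_locus(conn, acc_id, locus_a)
link_accession_to_locus(conn, acc_id, locus_b)
```

When a specific allele is later characterized, the row in the linking table could be repointed from the default allele to the new allele without touching the locus or accession records.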
From the outset, the community annotation system was designed with an eye toward participation from Web site users in a variety of roles. Each logged-in user has an account type, which is used as the first level of granularity for assigning database access and editing privileges. Web access to view all data is unrestricted and does not require registration. The default user type is user, which carries permission for posting comments on pages for loci and individuals and on other pages on the site. Submitter accounts are granted only to users who have been individually vetted by an SGN curator, since these accounts carry privileges for submitting new data and editing many existing entries. Submitter accounts are generally available to anyone with a legitimate interest and expertise in Solanaceae research, and a request for a submitter-class account is typically granted within 24 h. There is also a third user type for SGN staff, curator, which carries administrative privileges. Any SGN submitter may add a new locus or request locus editing privileges for the purpose of curation and annotation of genes already existing in the database. To obtain locus editor privileges, a user must first create an account by clicking on the login link from the toolbar on any SGN page (http://www.sgn.cornell.edu/solpeople/login.pl) and follow the instructions after clicking the sign up for an account link. This will create an account of type user. User accounts can be upgraded to submitter upon request by e-mailing to sgn-feedback{at}sgn.cornell.edu (using the link provided in the footer of every SGN Web page), or by requesting editor privileges for a specific locus by clicking on the request editor privileges link from the relevant locus page.
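The account types and privileges described above lend themselves to a brief sketch. The following Python is a minimal illustration, not SGN's Perl code; the function names are hypothetical, and the rule that curators may edit any locus is an assumption inferred from their administrative privileges.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class AccountType(Enum):
    USER = "user"
    SUBMITTER = "submitter"
    CURATOR = "curator"


@dataclass
class Person:
    person_id: int
    account_type: AccountType


@dataclass
class LocusRecord:
    symbol: str
    editor_ids: Set[int] = field(default_factory=set)


def can_view(person: Optional[Person]) -> bool:
    # Viewing is unrestricted and needs no registration (person may be None).
    return True


def can_comment(person: Optional[Person]) -> bool:
    # Any logged-in account, including the default "user" type, may comment.
    return person is not None


def can_add_annotation(person: Optional[Person]) -> bool:
    # Submitters and curators may add synonyms, alleles, publications, etc.
    return person is not None and person.account_type in (
        AccountType.SUBMITTER,
        AccountType.CURATOR,
    )


def can_edit_locus(person: Optional[Person], locus: LocusRecord) -> bool:
    # Full edit/delete rights: designated locus editors, plus curators
    # (the curator rule is an assumption made for this sketch).
    if person is None:
        return False
    if person.account_type is AccountType.CURATOR:
        return True
    return (
        person.account_type is AccountType.SUBMITTER
        and person.person_id in locus.editor_ids
    )


reader = Person(1, AccountType.USER)
submitter = Person(2, AccountType.SUBMITTER)
editor = Person(3, AccountType.SUBMITTER)
curator = Person(4, AccountType.CURATOR)
locus = LocusRecord("locus_a", editor_ids={3})
```

Here `submitter` can add annotations to `locus` but cannot edit or delete the locus entry itself, whereas `editor` can, because only `editor` is listed among the locus editors.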
When users log in to the SGN Web site, they are directed to the site's central hub for user-based functions, called MySGN (http://www.sgn.cornell.edu/solpeople/top-level.pl), which provides entry points to many of the site's user-based tools, including the community annotation functions. On this page, users with submitter accounts can find a summary of all loci for which they have editor privileges, as well as a list of their recently annotated loci. It also has a link for viewing all recent changes to the community annotations. Each user's publicly visible SGN person detail page also shows a list of loci for which they are editors.
All user-editable data types have designated owners. Loci, alleles, phenotyped accessions, images, and annotations can be submitted by any SGN submitter or curator. The submitter becomes the object owner by default.
Loci
When a locus editor is logged in, the edit and delete links on the locus page become active. Clicking the edit link opens an editable form (Fig. 2A), and clicking the delete link brings up a delete confirmation dialog. Any user with a submitter-class account can add synonyms, alleles, sequences, publications, and ontology annotations to a locus record.
Alleles
Gene Networks
Files and Images
An image-specific detail page is available, with metadata including a description and the user who uploaded the image; that user is usually also the owner of the image and has edit and delete privileges. As with loci and individuals, images are never deleted from the file system or the database, but are only set to obsolete in the image table. The image page also contains tags (general text descriptors) that may be added or deleted by any SGN submitter in a similar manner to the locus synonyms. All the objects associated with an image are listed below it, with links to the relevant Web pages. The same image may be associated with one or more individuals and also with other object types in SGN's database, thus creating a general image object not restricted to a specific data type.
Accessions
Biochemical Pathways
Sequence Annotations
Literature Annotations
Ontology Annotations
Curator-verified annotations are submitted periodically by SGN to the GO and PO consortiums and are available for browsing on their respective Web sites. The SP ontology, developed at SGN, is mapped to PO and PATO terms for entity-quality-value annotations of qualitative and quantitative traits.
User Comments
The sources for original bulk data include NCBI (GenBank for sequences and PubMed for literature), the TGRC for loci and plant accessions, and individual labs. To aid the community in locus annotation for a given species, we periodically populate or repopulate the database with selected bulk information from these sources using automated data-processing pipelines, which process new locus data and upload and update existing links and annotations. These data serve as a seed for the community annotators to refine, build upon, and complement. Newly characterized genes are also added individually to the database by SGN curators or by members of the community as they are published.
Web Data
User Edits
Deleting entries (locus or phenotype) and annotations using the Web system does not remove the information from the database, but only flags the item as obsolete, whereupon it is excluded from Web displays. This means that delete operations can be reverted, thus preventing data loss caused by accidents or malicious users. Back-end administration features allow SGN curators to view all annotation changes organized by date. A further layer of control over community data input is the restriction on the type of data that can be added to a specific field. For example, to describe functions and phenotypes of a locus, annotators are limited to using ontology terms from a browsable list already existing in the SGN database. In general, data used to annotate an object such as a locus must already exist in an internal or external database. This minimizes random data entry and spamming.
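The obsolete-flag behavior described above can be sketched in a few lines. This is a simplified in-memory Python illustration under assumed names (`Entry`, `EntryStore`, and their methods are hypothetical); the production system stores the flag in PostgreSQL tables.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    entry_id: int
    description: str
    obsolete: bool = False


class EntryStore:
    """Keeps every entry forever; 'delete' only hides it from displays."""

    def __init__(self) -> None:
        self._entries: Dict[int, Entry] = {}

    def add(self, entry: Entry) -> None:
        self._entries[entry.entry_id] = entry

    def delete(self, entry_id: int) -> None:
        # Flag as obsolete instead of removing the record.
        self._entries[entry_id].obsolete = True

    def restore(self, entry_id: int) -> None:
        # Reverting a delete is just clearing the flag.
        self._entries[entry_id].obsolete = False

    def visible(self) -> List[Entry]:
        # What a Web display would show: non-obsolete entries only.
        return [e for e in self._entries.values() if not e.obsolete]

    def all_entries(self) -> List[Entry]:
        # What a curator's back-end view could show: everything.
        return list(self._entries.values())


store = EntryStore()
store.add(Entry(1, "example locus entry"))
store.add(Entry(2, "example phenotype entry"))
store.delete(2)
```

After `store.delete(2)` the second entry no longer appears in `store.visible()` but is still present in `store.all_entries()`, so `store.restore(2)` brings it back unchanged.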
Storing History
On each update of locus and accession details, the previous version of the information is transferred from primary tables in the relational database to a set of history tables, which are nearly identical in structure to the primary tables. When a locus owner or curator is logged in, the Web interface provides a clickable link to display all the changes previously made, and the name of the person who made each change. This history module enables easy tracking and reverting of data, providing an essential undo function for managing community-generated content.
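One way to picture the history mechanism is the sketch below, which copies the current row into a history table before each update. It uses Python's `sqlite3` module rather than PostgreSQL, and the table names, column names, and the `revert_locus` helper are assumptions introduced for illustration; the source does not describe how reverting is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE locus (
        locus_id    INTEGER PRIMARY KEY,
        symbol      TEXT NOT NULL,
        description TEXT,
        updated_by  INTEGER
    );
    -- Nearly identical structure, plus bookkeeping columns.
    CREATE TABLE locus_history (
        history_id  INTEGER PRIMARY KEY,
        locus_id    INTEGER NOT NULL,
        symbol      TEXT NOT NULL,
        description TEXT,
        updated_by  INTEGER,
        archived_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    """
)


def update_locus(conn, locus_id, description, user_id):
    """Archive the current version, then apply the update, in one transaction."""
    with conn:
        conn.execute(
            """INSERT INTO locus_history (locus_id, symbol, description, updated_by)
               SELECT locus_id, symbol, description, updated_by
               FROM locus WHERE locus_id = ?""",
            (locus_id,),
        )
        conn.execute(
            "UPDATE locus SET description = ?, updated_by = ? WHERE locus_id = ?",
            (description, user_id, locus_id),
        )


def revert_locus(conn, locus_id, user_id):
    """Undo the last change by re-applying the most recently archived version."""
    row = conn.execute(
        """SELECT description FROM locus_history
           WHERE locus_id = ? ORDER BY history_id DESC LIMIT 1""",
        (locus_id,),
    ).fetchone()
    if row is None:
        raise LookupError("no earlier version recorded for this locus")
    update_locus(conn, locus_id, row[0], user_id)


def history(conn, locus_id):
    """Return archived (description, updated_by) pairs, oldest first."""
    return conn.execute(
        """SELECT description, updated_by FROM locus_history
           WHERE locus_id = ? ORDER BY history_id""",
        (locus_id,),
    ).fetchall()


conn.execute("INSERT INTO locus VALUES (1, 'locus_a', 'first description', 10)")
update_locus(conn, 1, "second description", 11)
```

Because a revert is itself recorded as an update, the version being replaced is archived too, so no state is lost by undoing a change.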
We would like to thank Esther Van Der Knaap, Roger Chetelat, and Dani Zamir and all submitters for contributing data to the phenotyped populations and locus database, and Anuradha Pujar for contributing to the development of the Solanaceae Phenotype Ontology. We would also like to thank two anonymous reviewers for their helpful comments. Received March 23, 2008; accepted May 9, 2008; published June 6, 2008.
1 This work was supported by the National Research Initiative Plant Genome Program of the U.S. Department of Agriculture Cooperative State Research, Education, and Extension Service (BARD grant no. FI–370–2005) and the National Science Foundation (grant no. 2007–02777). The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Lukas A. Mueller (lam87{at}cornell.edu).
[C] Some figures in this article are displayed in color online but in black and white in the print edition.
[OA] Open Access articles can be viewed online without a subscription. www.plantphysiol.org/cgi/doi/10.1104/pp.108.119560 * Corresponding author; e-mail lam87{at}cornell.edu.
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796–815
Avraham S, Tung CW, Ilic K, Jaiswal P, Kellogg EA, McCouch S, Pujar A, Reiser L, Rhee SY, Sachs MM, et al (2008) The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Res 36: D449–D454
Butler L (1952) The linkage map of the tomato. J Hered 43: 25–35
Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, et al (2008) The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res 36: D623–D631
Crosby MA, Goodman JL, Strelets VB, Zhang P, Gelbart WM, FlyBase Consortium (2007) FlyBase: genomes by the dozen. Nucleic Acids Res 35: D486–D491
Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE, Mouse Genome Database Group (2007) The mouse genome database (MGD): new features facilitating a model system. Nucleic Acids Res 35: D630–D637
Eshed Y, Zamir D (1995) An introgression line population of Lycopersicon pennellii in the cultivated tomato enables the identification and fine mapping of yield-associated QTL. Genetics 141: 1147–1162
Gene Ontology Consortium (2008) The Gene Ontology project in 2008. Nucleic Acids Res 36: D440–D444
Gonzalo MJ, van der Knaap E (2008) A comparative analysis into the genetic bases of morphology in tomato varieties exhibiting elongated fruit shape. Theor Appl Genet 116: 647–656
Lawrence CJ, Schaeffer ML, Seigfried TE, Campbell DA, Harper LC (2007) MaizeGDB's new data types, resources and activities. Nucleic Acids Res 35: D895–D900
Liang C, Jaiswal P, Hebbard C, Avraham S, Buckler ES, Casstevens T, Hurwitz B, McCouch S, Ni J, Pujar A, et al (2008) Gramene: a growing plant comparative genomics resource. Nucleic Acids Res 36: D947–D953
Menda N, Semel Y, Peled D, Eshed Y, Zamir D (2004) In silico screening of a saturated mutation library of tomato. Plant J 38: 861–872
Muller HM, Kenny EE, Sternberg PW (2004) Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol 2: e309
Mueller LA, Mills AA, Skwarecki B, Buels RM, Menda N, Tanksley SD (2008) The SGN comparative map viewer. Bioinformatics 24: 422–423
Mueller LA, Solow TH, Taylor N, Skwarecki B, Buels R, Binns J, Lin C, Wright MH, Ahrens R, Wang Y, et al (2005) The SOL Genomics Network: a comparative resource for Solanaceae biology and beyond. Plant Physiol 138: 1310–1317
Mungall CJ, Emmert DB, FlyBase Consortium (2007) A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics 23: i337–i346
Ohyanagi H, Tanaka T, Sakai H, Shigemoto Y, Yamaguchi K, Habara T, Fujii Y, Antonio BA, Nagamura Y, Imanishi T, et al (2006) The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucleic Acids Res 34: D741–D744
Ori N, Cohen AR, Etzioni A, Brand A, Yanai O, Shleizer S, Menda N, Amsellem Z, Efroni I, Pekker I, et al (2007) Regulation of LANCEOLATE by miR319 is required for compound-leaf development in tomato. Nat Genet 39: 787–791
Pennisi E (2000) Ideas fly at gene-finding jamboree. Science 287: 2182–2184
Riley M, Abe T, Arnaud MB, Berlyn MK, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T, et al (2006) Escherichia coli K-12: a cooperatively developed annotation snapshot—2005. Nucleic Acids Res 34: 1–9
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al (2002) The BioPerl toolkit: Perl modules for the life sciences. Genome Res 12: 1611–1618
Stein L (2001) Genome annotation: from sequence to biology. Nat Rev Genet 2: 493–503
Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al (2008) The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res 36: D1009–D1014
Xiao H, Jiang N, Schaffner E, Stockinger EJ, van der Knaap E (2008) A retrotransposon-mediated gene duplication underlies morphological variation of tomato fruit. Science 319: 1527–1530