|
|
||||||||
|
Plant Physiology 138:55-58 (2005) © 2005 American Society of Plant Biologists The Maize Genetics and Genomics Database. The Community Resource for Access to Diverse Maize Data1Department of Genetics, Development and Cell Biology (C.J.L., V.B.), and Department of Statistics (V.B.), Iowa State University, Ames, Iowa 500113260; and Agricultural Research Service, United States Department of Agriculture, Ames, Iowa 500113260 (T.E.S.)
The Maize Genetics and Genomics Database (MaizeGDB) serves the maize (Zea mays) research community by making a wealth of genetics and genomics data available through an intuitive Web-based interface. The goals of the MaizeGDB project are 3-fold: to provide a central repository for public maize information; to present the data through the MaizeGDB Web site in a way that recapitulates biological relationships; and to provide an array of computational tools that address biological questions in an easy-to-use manner at the site. In addition to these primary tasks, MaizeGDB team members also serve the community of maize geneticists by lending technical support for community activities, including the annual Maize Genetics Conference and various workshops, teaching researchers to use both the MaizeGDB Web site and Community Curation Tools, and engaging in collaboration with individual research groups to make their unique data types available through MaizeGDB.
The Maize Genetics and Genomics Database (MaizeGDB) is the community resource for maize (Zea mays) data and can be accessed online at http://www.maizegdb.org. Data types stored at MaizeGDB include (but are not limited to) sequence, locus, variation, probe, map, metabolic pathway, phenotype, quantitative trait locus (QTL) experiment, stock, and contact information for hundreds of maize researchers worldwide (for review, see Lawrence et al., 2004 The team of people who work at MaizeGDB seek to serve the community of maize geneticists not only by making data generated by maize researchers available through the MaizeGDB site, but also by engaging in various community service projects. The MaizeGDB team supports the annual Maize Genetics Conference by maintaining the conference Web site and facilitating the abstract collection process, provides technical assistance for training workshops (e.g. the Maize Genetics, Genomics and Bioinformatics Workshop, which took place in March 2004 at the International Maize and Wheat Improvement Center [CIMMYT] in Mexico City), and sends out announcements to the community of maize geneticists via e-mail as directed by the Maize Genetics Executive Committee (http://www.maizegdb.org/mgec.php). In addition, by providing data descriptions for the general public, which can be found on each data center page at MaizeGDB (e.g. a description for gene product can be viewed at http://www.maizegdb.org/gene_product.php#dld), the MaizeGDB team works to educate the general public about the importance of maize genetic research.
It is the aim of this article to illustrate the breadth of information made available through MaizeGDB, to convey the method by which the information is curated and made accessible, and to relate how the database infrastructure was built and is currently maintained. Detailed information regarding historical aspects of the MaizeGDB project and a review of various data types made available through the MaizeGDB site are described elsewhere (Lawrence et al., 2004
In September 2004, under the guidance of the National Plant Genome Initiative (NPGI), the National Science Foundation (NSF), the U.S. Department of Energy (DOE), and the U.S. Department of Agriculture (USDA) announced that research funds would be made available to sequence the maize genome, and a solicitation for grant proposals was circulated (see http://www.nsf.gov/pubs/2004/nsf04614/nsf04614.txt). It was noted in this announcement that proposals should utilize existing, previously funded resources, one of the resources listed explicitly being the maize community genome database.
Currently, MaizeGDB provides a portal to maize genome sequencing information (which can be viewed at http://www.maizegdb.org/genome). Displayed on that page are documents outlining details concerning the maize genome sequencing endeavor, links to maize sequence repositories, information outlining the diversity of sequencing strategies currently employed, and a list of relevant publications germane to the maize genome sequencing project. As the maize genome gets sequenced, MaizeGDB will adapt to provide a sequence-centered portal to all maize data similar to the one provided by The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org; Rhee et al., 2003
Presently, the number of people working at MaizeGDB is quite small when compared to the personnel associated with other database projects like TAIR (http://www.arabidopsis.org), Gramene (http://www.gramene.org), and the Solanaceae Genomics Network (SGN; http://www.sgn.cornell.edu). In order to serve as a central site for making large numbers of sequences, contigs, assemblies, and, eventually, a fully sequenced maize genome available at MaizeGDB, it is necessary that partnerships be forged between MaizeGDB and other databases, sequencing projects, and large-scale generators of maize data. In an initial effort to engage in such a collaboration, a pipeline for getting sequence data into MaizeGDB has been developed wherein all available maize sequences are downloaded from GenBank (http://www.ncbi.nih.gov/Genbank) by staff working at the Plant Genome Database (PlantGDB; http://www.plantgdb.org; Dong et al., 2004
Over the course of the past year, cytological map images generated by the Cytogenetic Map of Maize project (http://www.cytomaize.org; Koumbaris and Bass, 2003
Web Interfaces Allow Access to Data within a Biological Context
The data stored at MaizeGDB are made available through a series of interconnected Web pages. Researchers can also contact the MaizeGDB team at mgdb{at}iastate.edu to request access to Web-based read-only SQL tools allowing direct queries on the curation copy of the database. These pages are coded in HTML, and most are automatically generated by PHP and Perl scripts. Through the Web interface (accessible at http://www.maizegdb.org), each data display page shows detailed information on a specific biological entity (e.g. a locus), as well as basic information about data associated with it (e.g. maps, variations, probes, and citations are among data types associated with loci), and links to related off-site resources (e.g. locus pages link directly to Gramene; Ware et al., 2002 MaizeGDB's method of data delivery is aimed at making information available within the framework of its scientific meaning. Data displays place specific pieces of data within a biological context. For example, if you arrive at a map page by way of a locus page, the locus that was last visited is highlighted within the map display. Not only does using the biological relatedness of data types in conjunction with following a researcher's clickstream enable the interface to reflect real biological relationships, it also aids researchers by causing the site to seem to follow their actual train of thought. The following usage case demonstrates the interrelatedness of different types of biological information, reveals MaizeGDB's method of recapitulating those interrelationships, and highlights the placement of links to off-site resources that can help researchers gain access to related information that is not stored at MaizeGDB.
An intrepid researcher visits MaizeGDB in the hopes of finding information that would help her to design multiplex PCR primers to genotype F2 plants. She is working to determine the transmission of different variants of chromosome 10, and wishes to develop a protocol for diagnosing which chromosome 10 variants are present in any given plant growing in a half-acre field. Because the researcher knows that the expressed sequence tag (EST) probe p-csu571 labels bands that migrate differentially on Southern blots between the two backgrounds of interest (Mroczek, 2003
Most of the continued development of the MaizeGDB interface is guided by members of the maize (Zea mays) genetics research community: Community members have sent hundreds of suggestions and requests concerning methods to find and display data. To aid in encouraging community feedback, a highly utilized context-sensitive feedback tool appears at the bottom of every page. The needs of researchers serve as the major impetus for interface development, and addressing those needs directly allows for tools to be developed that are both timely and germane to the needs of maize geneticists. An example of a research need that guided interface development comes from Bill Sheridan, a maize geneticist working in the Department of Biology at the University of North Dakota. Dr. Sheridan contacted the MaizeGDB team seeking an easy method to summarize which simple sequence repeats (SSRs) detected bacterial artificial chromosomes that were also associated with genetically mapped markers. Dr. Sheridan worked with the MaizeGDB team to design a table-generating tool that provides approximate map locations for markers, the SSRs for those markers, and bacterial artificial chromosomes detected by the SSRs (see the links to each of the 10 maize chromosomes beneath the heading "Mapped & Anchored SSRs" at http://www.maizegdb.org/ssr.php). Dr. Sheridan was able to use this tool for his research and was pleased that his input guided the development of such a useful tool. This sort of interface development to address specific research needs typifies the method by which members of the MaizeGDB team work alongside researchers to help them gain access to complex relationships documented in the database. In summary, MaizeGDB's interface was initially designed to provide a context for interpreting maize data, and continued interface development is driven by specific input from and collaborative design with members of the maize research community at large.
At present, the MaizeGDB team does not have any individual member dedicated solely to data curation. Instead, all team members curate data as the need arises and in accordance with his or her particular knowledge base. Most data are added to the curation copy of the database (see below for a description of how each copy of the database is utilized) in bulk and are contributed by community members directly. Feedback from researchers often guides individual data additions and edits. By describing which data to associate with existing records or by explaining why mistakenly associated information should be updated, the community of maize geneticists contributes directly to curating the data stored at MaizeGDB. Moreover, community members can add annotations to records at MaizeGDB by logging in through the annotation link at the top of any MaizeGDB page. Once logged in, researchers can add notes like the one shown in Figure 2 (http://www.maizegdb.org/cgi-bin/displaybacrecord.cgi?id=424644) by clicking the link to "Add your own annotation to this record" shown at the top of virtually all data displays.
For researchers interested in contributing data directly to the database, a set of Java-based Community Curation Tools has been developed and is available for general use. Data types accessible through these tools include clone library, gel pattern, gene product, linkage group, locus, map, map scores, panel of stocks, person, phenotype, primer/enzyme, probe, recombination data, reference, species, stock, term, and variation. By creating data records, researchers become Community Curators who own the records they create and retain the ability to edit owned records directly. To ensure that records entered by community members are complete, a curation level system has been implemented. Newly entered records are considered "submitted" and are checked by a professional curator. Once checked, the records are marked "approved" or "failed," and only "approved" records become publicly accessible through the Web interface. Each time a community member edits a record he or she owns, the record is reassigned the "submitted" curation level and must be reapproved to regain accessibility through the Web interface. Workshops demonstrating the use of the Community Curation Tools were taught at Iowa State University (ISU) in August 2004 and at the Plant and Animal Genome Conference in San Diego, January 2005. To schedule an on-site training session for your research group, contact the MaizeGDB team at mgdb{at}iastate.edu. Professional curators have access to a set of Java-based Professional Curation Tools that were originally created to interact with the Maize Genetics Cooperation Stock Center (MGCSC) MySQL copy of the database and that subsequently were adapted to interact with the ISU Oracle-based curation database. Whereas the Community Curation Tools were designed specifically to allow researchers from the community of maize geneticists to gain limited and controlled access to the database, the Professional Curation Tools allow less restricted access to data, enabling professional curators to create and edit records in an efficient and authoritative manner.
Three copies of the MaizeGDB database exist at ISU: a production copy, a curation (staging) copy, and a test copy. Each copy is housed on a separate machine. The production copy of the database is accessible through http://www.maizegdb.org. This copy is not edited and is accessible by the public through the Web interface. The curation copy of the database is accessible by both community and professional curators via curation tools: It is the copy of the database to which new data are added and within which existing data are edited when the need to do so arises. The curation copy of the database is dumped in a compressed form to file each day. Compressed daily dumps of the curation database are formatted for Oracle and can be accessed at http://goblin1.zool.iastate.edu/ The servers that support MaizeGDB run Oracle 9i, which is licensed every 2 years. The machines that house the various copies of the database are Dell PowerEdge servers (Round Rock, TX) with 2 x 2.0 GHz Xeon processors, 4 GB of RAM, 5 x 73 GB Ultra 320 10K RPM drives with Red Hat Advanced Server 2.1 (Raleigh, NC) installed. All servers are nearly identically configured.
In addition to the copies of the database housed at ISU, a MySQL copy exists at the MGCSC in Urbana/Champaign, Illinois, enabling the staff of the MGCSC to keep track of data associated with maize stocks directly (a service described in detail in Scholl et al., 2003
We are indebted to Darwin A. Campbell for his work as the MaizeGDB database administrator; Qunfeng Dong for his work as the database manager at PlantGDB, the source for all sequence data made available through MaizeGDB; and Marty Sachs, Director of the MGCSC, for his work curating stock and associated data types. We also thank Michael Brekke, systems support specialist; Sanford B. Baran, contract Web developer and creator of the Community Curation Tools; and Jason Carter, information technology specialist for the MGCSC and creator of the Professional Curation Tools that enable the MGCSC and MaizeGDB to maintain data synchronization. MaizeGDB would not have been possible without the legacy work of Ed Coe and Mary Polacco on the original MaizeDB resource. We are grateful for their continued interest and contributions. Received December 31, 2004; returned for revision January 28, 2005; accepted February 20, 2005.
1 This work was supported by a Specific Cooperative Agreement with the Agricultural Research Service, U.S. Department of Agriculture (no. 5836251192 to V.B.). www.plantphysiol.org/cgi/doi/10.1104/pp.104.059196. * Corresponding author; e-mail vbrendel{at}iastate.edu; fax 5152946755.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 33893402
Barker G, Batley J, O'Sullivan H, Edwards KJ, Edwards D (2003) Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP. Bioinformatics 19: 421422
Brendel V, Xing L, Zhu W (2004) Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus. Bioinformatics 20: 11571169 Bruskiewich R, Coe EH, Jaiswal P, McCouch S, Polacco M, Stein L, Vincent L, Ware D (2002) The Plant OntologyTM Consortium and plant ontologies. Comp Funct Genomics 3: 137142[CrossRef]
Dong Q, Schlueter SD, Brendel V (2004) PlantGDB, plant genome database and analysis tools. Nucleic Acids Res 32: D354D359 Koumbaris GL, Bass HW (2003) A new single-locus cytogenetic mapping system for maize (Zea mays L.): overcoming FISH detection limits with marker-selected sorghum (S. propinquum L.) BAC clones. Plant J 35: 647659[CrossRef][ISI][Medline]
Lawrence CJ, Dong Q, Polacco ML, Seigfried TE, Brendel V (2004) MaizeGDB, the community database for maize genetics and genomics. Nucleic Acids Res 32: D393D397 Mroczek RJ (2003) Molecular, genetic, and cytogenetic analysis of the structure and organization of the abnormal chromosome 10 of maize. PhD dissertation. University of Georgia, Athens, GA
Rhee S, Beavis W, Berardini T, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, et al (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31: 224228 Scholl R, Sachs MM, Ware D (2003) Maintaining collections of mutants for plant functional genomics. In E Grotewold, ed, Plant Functional Genomics, Vol 236. Humana Press, Totowa, NJ, pp 311326
Ware DH, Jaiswal P, Ni J, Yap IV, Pan X, Clark KY, Teytelman L, Schmidt SC, Zhao W, Chang K, et al (2002) Gramene, a tool for grass genomics. Plant Physiol 130: 16061613 This article has been cited by other articles:
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| ASPB Publications | PLANT PHYSIOLOGY | THE PLANT CELL | |
|---|---|---|---|