Plant Physiology 135:4-9 (2004)
© 2004 American Society of Plant Biologists
To Give or Not to Give? That Is the Question
Plant Gene Expression Center, Albany, California 94710 (A.T.); Department of Biochemistry, Stanford University School of Medicine, Stanford, California 94305 (R.W.D.); and Stanford Genome Technology Center, Stanford University, Palo Alto, California 94304 (R.W.D.)
While Hamlet experienced "suffering caused by the slings and arrows of outrageous fortune...," many modern day scientists have similar feelings when asked to give and share published data and materials with other members of our community. Herein, we argue that being on the side of the fence that advocates sharing freely published data and materials is an acceptable practice in the scientific community, an opinion based on our scientific journey over the past 40 years.
One of us (A.T.) began as a plant physiologist when experimental plant systems were as numerous as plant species, and the National Institutes of Health was calling on individual principal investigators to write three-page proposals to obtain funds for their favorite study. Plant physiology and biochemistry were the major disciplines of the time. Arabidopsis was known to very few of us, and genetics was practiced by the corn geneticists, a small group of scientists who freely exchanged their mutants published in their newsletter (Maize Genetics Cooperation, 1926-2004).
The discussion on data and material sharing is formidable; however, it can be focused on two groups: (1) materials and data published by individual investigators, and (2) unpublished data produced by large-scale genome and community projects.
A wonderful piece of writing relevant to the subject of data sharing appeared 3 years ago in the Journal of Cell Science (Caveman, 2001).
As scientists we know the value of collaboration in our work. It is where ideas and expertise are shared, reagents are given away, and young scientists learn to think about scientific advances being made as a group effort rather than on an individual basis. Collaboration, however, requires trust, respect, and an eye to mentoring: trust that your ideas will not be stolen or misused; respect for the hard work and effort that brought you to the collaboration; and mentoring for the young scientists who want to develop in this area as independent scientists.
Recently, the National Research Council of the National Academy of Sciences released a report by the Committee on Responsibilities of Authorship in the Biological Sciences, chaired by Tom Cech (Cech, 2003).
Community standards for sharing publication-related data and materials should flow from the general principle that the publication of scientific information is intended to move science forward. More specifically, the act of publishing is a quid pro quo in which authors receive credit and acknowledgment in exchange for disclosure of their scientific findings. An author's obligation is not only to release data and materials to enable others to verify or replicate published findings (as journals already implicitly or explicitly require) but also to provide them in a form on which other scientists can build with further research. All members of the scientific community, whether working in academia, government, or a commercial enterprise, have equal responsibility for upholding community standards as participants in the publication system, and all should be equally able to derive benefits from it.
Four years ago Arthur Kornberg published the ten commandments of enzymology (Kornberg, 2000).
According to the fourteenth commandment, it will be of great importance to establish a federally funded resource center (Fig. 1) for accepting and distributing the materials derived from all the branches of plant biology, similar to that of the Arabidopsis Biological Resource Center (http://www.biosci.ohio-state.edu/).
That the ninth commandment also needs to be amended was suggested by the Editor in Chief of the Proceedings of the National Academy of Sciences (Cozzarelli, 2004).
During the last few years, a heated debate has been taking place among various communities regarding early prepublication data release of large-scale genome sequencing. The debate also extends beyond genomics; it raises the question of whether any community resource-generating project should be subject to the same principles as those for genomic projects. The question is: Can a laboratory analyze and publish the findings of early released data deposited in a public database by a different laboratory? Conflicts have been reported over the early use of the sequences of the malaria parasites Plasmodium falciparum and Trypanosoma brucei (Macilwain, 2000).
Since the initiation of the Human Genome Project, one of its operating principles has been that the data and resources generated should rapidly be made available to the scientific community. This implies the release of data prior to publication. In 1991, the National Human Genome Research Institute (NHGRI) and U.S. Department of Energy established a data release policy that called for release of data and materials no later than 6 months after having been generated (http://www.genome.gov/page.cfm?pageID=10000925). In 1996, the International Human Genome Sequencing Consortium adopted principles for data release (known as the Bermuda Principles; http://www.gene.ucl.ac.uk/hugo/bermuda.htm) that called for the automatic, rapid release of sequence assemblies of 1 to 2 kb or greater to the public databases. Subsequently, in April 1997, NHGRI published a data release policy stating that its grantees engaged in large-scale genomic DNA sequencing should release DNA sequence assemblies of >2 kb within 24 h of their generation. This policy became partially outmoded because it did not adequately address randomly generated whole-genome shotgun data. Such data sets are not assembled until late in a project, so tying data release to assembly could actually have had the opposite effect of slowing the release of sequence data.
Consequently, in December 2000, NHGRI extended its data release policy, calling for raw sequence traces to be submitted weekly to a public trace database. The institute, however, acknowledged that this early data release policy potentially threatened the standard scientific practice that those who generate primary data should have both the right and responsibility to publish the work in a peer-reviewed journal. To prevent such an event, the NHGRI agreed to the inclusion of a statement on the trace data that indicated that users could use the data for all purposes, with the sole exception of the initial publication of the complete genome sequence assembly or other large-scale analyses that the producers planned to publish. This restriction attracted little attention until early 2002, when a community debate began about the merits of allowing any limitation on the use of whole-genome assemblies once they had been submitted to the public databases. To discuss the issue and attempt to resolve their differences, the Wellcome Trust (http://www.wellcome.ac.uk/) organized a meeting of data producers, users, database personnel, journal editors, and funding agency representatives in Fort Lauderdale, Florida, in January 2003. It was unanimously agreed that prepublication release of large-scale genome sequence data has been of tremendous benefit to the scientific research community at large and that it is very important to ensure that such release of sequence data continues. They therefore reaffirmed the Bermuda principles and recommended that they be extended to all types of sequence data. Furthermore, they recognized that other large efforts, designated community resource projects, would increasingly be generating data and other resources that should also be rapidly released to the community in an unrestricted manner. 
To ensure the continuing effectiveness of the system of rapid, prepublication release of data from community resource projects, the meeting attendees concluded that each of the three stakeholders in the system (data producers, data users, and funding agencies) has an active role to play in promulgating this tradition of openness. In response to the Fort Lauderdale meeting, the NHGRI modified its data release policy to implement the system of tripartite responsibility by stating:
The deposited data should be available for all to use without restriction. It was pointed out by the NHGRI that the successful maintenance of the system of rapid, unrestricted, prepublication data release requires constructive behavior from both the sequence producers and users. The community depends on the success of these efforts, and the sequence producers typically face relatively little direct competition. Furthermore, it is not possible to guarantee them the standard scientific incentive of publishing the initial analysis of the data they generate without applying restrictions that might inhibit the broadest possible use of the data by the scientific community. Accordingly, the sequence producers must recognize that even if the sequence data are occasionally used in ways that violate normal standards of scientific etiquette, this is a necessary risk set against the considerable benefits of immediate data release. Sequence users also must accept significant responsibilities. Users of unpublished genomic sequence data must appropriately acknowledge the source of the sequence data through the use of appropriate citations. Users must also recognize that the sequence producers have a legitimate interest in publishing peer-reviewed reports describing and analyzing the sequence that they have produced and that data deposits in databases are not the equivalent of such a publication. The entire scientific community can also help ensure that the system works fairly for all participants through the peer review systems of both journals and funding agencies. NHGRI also encourages the entire scientific community to recognize that the continued success of the system of prepublication data release requires active community-wide support. There should be no restrictions on the use of the genomic sequence data, but the best interests of the community are served when all act responsibly to promote the highest standards of respect for the scientific contribution of others.
In addition, the NHGRI encourages the sequence producer to publish a Project Description, beginning with new genomic sequencing projects that are initiated in 2003. The purpose of the Project Description, which will be a new type of scientific publication, is to inform the scientific community about the sequencing project and to provide a citation that can be used to reference the source of the sequence.
We have seen this issue from both sides of the fence. We sit on one side of the fence as members of the research community: we have been users of the sequence, have benefited from the availability of the genome sequence, and have strongly supported the quick and immediate release of the data. We sat on the other side of the fence in our role as contributors to the yeast (Dietrich et al., 1997) ...
The primary reason for coming to this conclusion is the following. If it is required to immediately release the sequence, we will be at a disadvantage relative to many others who will analyze our data because we have the added responsibilities of managing the sequencing and making the sequence available. Just getting the sequence and making it available (not to mention getting the grant to do it and dealing with the continuing administration of the grant) is a significant amount of work that our competition does not have to worry about. If the scientists leading projects to produce the sequence are to be able to attract the students and postdocs that are necessary to make the projects successful, they need to be able to offer them opportunities to exploit the data that are somewhat better than they would enjoy at other labs not producing the data. The producers of scientific data have historically been awarded the courtesy of ownership of their data. This gives them the privilege of being the first to analyze it and the first to publish findings based on it. They can, in principle, sit on the data for as long as they want, but they do so at their peril, of course, because if they wait too long they may not get continued funding, or they may get scooped. We think most of us will agree that this system has worked more or less (but mostly more) efficiently over the years. It's not clear to us that DNA sequence data should be treated any differently.
We realize that the large sequencing centers are in a somewhat unusual position of having an essential monopoly on data, and the issues this raises will need to be resolved, but we don't think it's in the best interest of anyone to require immediate release of the data. The other consequence of a requirement for immediate data release that we are concerned about is that it will lead to the demise of the sequencing centers. Until now, they have been used largely as core facilities for the worldwide scientific community. We believe this is an unsustainable situation that, if continued, will destroy the large-scale sequencing centers. Whether that is an acceptable outcome might be a debatable issue, but if we think it is desirable to retain large-scale DNA sequencing centers, then we think the data release issues must be resolved without a wholesale requirement for immediate data release. This is because there would otherwise be no incentive for young scientists to get involved in such a project. An analogy may further clarify this point. Imagine a painter posting his unfinished artwork every 24 h outside his studio. One day another painter comes along and takes this painting, finishes it, and puts his name on it. Finally, we think raising in this discussion the precedent of the Bermuda standards of data release is off the mark. We think the Bermuda standards are outdated and no longer apply to the current situation. They were relevant, and helpful, for navigating the data release waters of the past, but the situation has changed. It was important that the sequence data for the established (and widely used) model organisms and for the human be made available as soon as possible because there really were (and are) A LOT of people who could make immediate advances based on that data. 
However, it seems to us that in the post-genomic era, immediate sequence release is not so urgent because the data are not as critical to as many research advances as was the initial genome data. Furthermore, we do not agree with the idea of putting restrictions on the use of data that have been publicly released. We know the problems with this firsthand because we have done the experiment of making our sequences publicly available (via our Web pages) but with restrictions, and it caused problems. The simple solution to these problems is to revert to the tried-and-mostly-true method of allowing the producer of the data to own it until publication. Another possible solution is for these types of projects to be carried out by commercial groups so that students are not at risk of being scooped. A commercial arrangement, however, will be more expensive than one carried out in an academic setting. Thus, to give or not to give, that is the question. It is nobler in the action of scientists indeed TO GIVE.
We thank Sheila McCormick for making us aware of the article by the "Caveman," Mark Johnston for sharing his thoughts regarding release of prepublication data, and Sarah Hake for useful discussions. Received March 18, 2004; returned for revision March 18, 2004; accepted March 19, 2004.
www.plantphysiol.org/cgi/doi/10.1104/pp.104.043083.
* Corresponding author; e-mail theo{at}nature.berkeley.edu; fax 510-559-5678.
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815
Caveman (2001) Send me all of your reagents and ideas. We want to work on the same experiments. J Cell Sci 114: 1037-1038
Cech TR (2003) Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. National Academies Press, Washington, DC, www.nap.edu/books/0309088593/html
Cozzarelli NR (2004) UPSIDE: Uniform Principle for Sharing Integral Data and Materials Expeditiously. Proc Natl Acad Sci USA 101: 3721-3722
Dietrich FS, Mulligan J, Hennessy K, Yelton MA, Allen E, Araujo R, Aviles E, Berno A, Brennan T, Carpenter J, et al. (1997) The nucleotide sequence of Saccharomyces cerevisiae chromosome V. Nature 387 (6632 suppl): 78-81
Kornberg A (2000) Ten commandments: Lessons from the enzymology of DNA replication. J Bacteriol 182: 3613-3618
Kornberg A (2003) Ten commandments of enzymology, amended. Trends Biochem Sci 28: 515-517
Macilwain C (2000) Biologists challenge sequencers on parasite genome publication. Nature 405: 601-602
Maize Genetics Cooperation (1926-2004) Maize Genetics Cooperation Newsletter, 1-78
Marshall E (2002) DNA sequencer protests being scooped with his own data. Science 295: 1206-1207
The National Academies Committee on Responsibilities of Authorship in the Biological Sciences (2003) Sharing publication-related data and materials: responsibilities of authorship in the life sciences. 132: 19-24
Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, et al. (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302: 842-846