A New World of Sharing Scientific Data



A Shift in Sharing
The idea of sharing data and making it publicly available was once inconceivable for many scientists. Hard-earned data was seen as far too precious to be made public and too tempting to the uninspired plagiarizers of the world. However, considering that scientists advance knowledge gained from experimental and modeled data, it could be argued that scientists who do not publish their data are compromising scientific development and, perhaps, leaving their work incomplete (Costello, 2009).

In the past, researchers who were open to sharing their data had limited options, such as publishing data in print media, archiving it in libraries, or sending hard copies in the mail by request (Popkin, 2019). The world of data management has come a long way in the last few decades, and the concept of open data is spreading into all sectors of science and technology. Open, citable data is an important step towards transparency in scientific discovery and collaboration. It is rapidly catching on, as evidenced by more than 100 repositories, institutions, publishers, and societies signing up since last November to the Enabling FAIR Data Project’s Commitment Statement in the Earth, Space, and Environmental Sciences for storing and sharing data (Stall et al., 2019). The principles state that research data should be “findable, accessible, interoperable and reusable” (FAIR) (Wilkinson et al., 2016). This concept has been around for a while, but only recently has it taken off as a major step forward in the scientific community.

Publishing data in online repositories such as the Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC) is not only far less expensive than the options of the past, but it also allows others to instantly access, use, and cite datasets in their own publications. By properly citing datasets, researchers can build on work that has already been done or find ways to collaborate with other researchers, limiting duplication of effort and resources.

“The principles in FAIR are the result of 20 years of collaboration with publishers, data repositories, research funders, researchers and others” (Stall et al., 2019).
The principles recommend that scientific data be:

  • Findable — so that anyone can easily search for and find data and metadata.
  • Accessible — the user should be able to access the data, possibly needing authentication and authorization.
  • Interoperable — comparable data can be analyzed and integrated with necessary applications for analysis, storage, and processing.
  • Reusable — the main goal of FAIR is to optimize the reuse of data. Metadata and data should be as detailed as possible so others can replicate methods in different situations (Costello, 2009).

What is data citation?
Data citation refers to citing a dataset in the same way that articles, journals, and books are referenced in research papers. It gives credit to researchers, data repositories, and research funders, and it benefits environmental stewardship in the scientific community and among the general public (Wilkinson et al., 2016).

Why cite data?
There are many benefits to citing data in publications. One major benefit, which should dispel worries about plagiarism, is accountability: data citation creates accountability for the authors and users of a dataset, reducing the threat of plagiarism once the dataset has been appropriately cited (U.S. Geological Survey, 2019). Data citation also allows other researchers to easily locate and access data in order to replicate or verify results (U.S. Geological Survey, 2019), and that convenient access encourages others to reuse a dataset. Citing datasets creates an official system of acknowledgment and an incentive for data producers, since a dataset becomes a citable contribution to their fellow scientists (U.S. Geological Survey, 2019). Data citation increases the transparency of data production, which in turn encourages researchers to produce higher quality datasets (U.S. Geological Survey, 2019). Open data also exposes the researcher’s work to a much wider audience, boosting their recognition, opening them up to invitations to meetings and opportunities to consult and collaborate with others, and increasing citation rates, since their productivity will be more visible. Besides these major reasons, opening data for others to use can also increase confidence in results and generate goodwill among scientists (Oregon State University, 2019).

How to cite data: standards and best practices
Datasets must be equipped with detailed metadata so others may understand and replicate the scientific procedures that were followed to create the data. Metadata describes the dataset and provides users with descriptions, definitions and units that explain how the data were obtained (Costello, 2009). The more detailed the metadata, the better: detail helps other researchers to more easily search for, access and cite the data (Costello, 2009). A benefit of storing data in a data repository like GRIIDC is that a team of data curation experts does a lot of this work for researchers who may not have much experience in data management. Subject matter experts and data curators are trained to review the data carefully to make sure what they publish is as accurate and useful as possible.
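As a rough illustration of what "detailed metadata" can mean in practice, the sketch below checks a metadata record for missing descriptions, definitions and units before submission. The field names and the `missing_metadata` helper are hypothetical and chosen only for this example; they are not GRIIDC's schema or any repository's actual requirements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    """One measured quantity in a dataset (hypothetical structure)."""
    name: str
    definition: str = ""
    units: str = ""


@dataclass
class DatasetMetadata:
    """Illustrative metadata record; the field names are assumptions."""
    title: str = ""
    description: str = ""
    methods: str = ""
    variables: List[Variable] = field(default_factory=list)


def missing_metadata(meta: DatasetMetadata) -> List[str]:
    """Return human-readable gaps found in the metadata record."""
    gaps = []
    # Free-text fields that describe the dataset as a whole.
    for name in ("title", "description", "methods"):
        if not getattr(meta, name).strip():
            gaps.append(f"missing {name}")
    if not meta.variables:
        gaps.append("no variables described")
    # Each variable needs a definition and units to be reusable.
    for var in meta.variables:
        if not var.definition.strip():
            gaps.append(f"variable '{var.name}' has no definition")
        if not var.units.strip():
            gaps.append(f"variable '{var.name}' has no units")
    return gaps


# Made-up example record with two deliberate gaps.
record = DatasetMetadata(
    title="Example water temperature profiles",
    description="Hypothetical record used only for illustration.",
    variables=[Variable(name="temperature", definition="Water temperature")],
)
print(missing_metadata(record))
```

A check like this only catches empty fields. Judging whether a description is actually clear enough for someone else to replicate the methods still takes a human reviewer, which is the role the data curators described above play.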

Datasets should be cited in the references section of the publication through a data access statement. The statement contains information such as the data repository from which the data were downloaded, the URL, and the Digital Object Identifier (DOI) or accession code needed to access the data (Ball and Duke, 2015). A DOI is a unique string of numbers and letters that identifies a particular dataset. GRIIDC has had success in helping scientists cite their datasets in the reference section. GRIIDC provides a DOI and recommends the following data citation format:

[data originator last name], [data originator first name], [co-data originators]. [title of the dataset] [year dataset registered/published by GRIIDC]. Distributed by: Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC), Harte Research Institute, Texas A&M University – Corpus Christi. doi:
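The template above can be filled in programmatically. The following is a minimal sketch that follows the template's punctuation literally; the function name, its parameters, the assumption that co-originators arrive as already formatted name strings, and the placement of the DOI directly after "doi:" are all choices made for this example, not a GRIIDC tool or API.

```python
# Distributor text copied from the recommended citation template.
DISTRIBUTOR = (
    "Gulf of Mexico Research Initiative Information and Data Cooperative "
    "(GRIIDC), Harte Research Institute, Texas A&M University – Corpus Christi"
)


def format_citation(last_name, first_name, title, year, doi, co_originators=()):
    """Fill in the recommended citation template (hypothetical helper).

    co_originators is assumed to be a sequence of already formatted names.
    """
    names = ", ".join([f"{last_name}, {first_name}", *co_originators])
    return f"{names}. {title} {year}. Distributed by: {DISTRIBUTOR}. doi:{doi}"


# Every value below is a made-up placeholder, including the DOI.
print(
    format_citation(
        last_name="Doe",
        first_name="Jane",
        title="Example dataset title",
        year=2019,
        doi="10.0000/EXAMPLE",
        co_originators=["John Smith"],
    )
)
```

In practice the repository supplies the DOI and the registered title, so a helper like this would mainly keep citations consistent across a reference list.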

Scientific data production is flourishing around the world. Allowing others to use data through collaboration or as a starting point for expanding the research will only maximize its potential and bring more recognition to the authors of the data (Oregon State University, 2019). In addition, datasets are being recognized as legitimate, independent products of research that are as valuable as the publications produced from the data, further benefiting the data creators (Oregon State University, 2019). Valuable data doesn’t have to gather dust in your office anymore. Share that hard-earned work with the world and get credit!

1. Ball, A. & Duke, M. "How to Cite Datasets and Link to Publications." DCC How-to Guides. Edinburgh: Digital Curation Centre (2015).

2. Costello, Mark J. "Motivating online publication of data." BioScience 59.5 (2009): 418-427. doi: 10.1525/bio.2009.59.5.9

3. Oregon State University. “Citation of Datasets.” (2019). https://guides.library.oregonstate.edu/research-data-services/data-manag...

4. Popkin, G. "Data sharing and how it can benefit your scientific career." Nature 569.7756 (2019): 445. doi: 10.1038/d41586-019-01506-x

5. Stall, S., Yarmey, L., Cutcher-Gershenfeld, J., Hanson, B., Lehnert, K., Nosek, B., Parsons, M., Robinson, E., Wyborn, L. "Make scientific data FAIR." Nature 570 (2019): 27-29. doi: 10.1038/d41586-019-01720-7

6. U.S. Geological Survey. “Data Citation.” (2019). https://www.usgs.gov/products/data-and-tools/data-management/data-citation

7. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., et al. “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data 3 (2016): 160018. doi: 10.1038/sdata.2016.18