What is a dataset?

The definition of what constitutes a dataset varies across disciplines and is determined by the community of interest. Usually a dataset is a meaningful group or collection of related data; data can be related because, for example, they were collected during a specified time period or at a specific location, or because they measure a specific parameter. GRIIDC encourages scientists to share all of the research data they collect that another scientist would find useful.

A single dataset can contain multiple files, which may include data, software and models. Generally, a dataset does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues or physical objects. GRIIDC encourages researchers to provide their data in the ‘rawest’ form that will be useful to the general scientific community and permit the widest reuse. For some data types, it may be useful to provide data at different levels of processing; for example, within the proteomics community it is valuable to release ‘raw’ data as well as processed data, to support different reuse cases.

Data shared through GRIIDC are not limited to published data; all data collected with funding from GoMRI should be shared through the GRIIDC system, even if these data will not be included in a peer-reviewed publication.

Ultimately, it is up to the Data Provider to decide which collection of data files will constitute a single dataset. GRIIDC staff and subject matter experts are available to provide recommendations and guidance on how to organize your dataset and which data to include in your dataset submission.