Got Cruise Data?

GRIIDC and NOAA’s National Centers for Environmental Information (NCEI)

Data management is an important part of any research project. The principles of open data are increasingly recognized in the scientific community, and as a result many proposals now require researchers to make their data publicly available. To promote data sharing in the Gulf of Mexico, GRIIDC not only stores and shares data but also submits cruise data to NOAA’s National Centers for Environmental Information (NCEI). NCEI is a long-term repository for oceanic, atmospheric, and geophysical data. Over the past few years, GRIIDC has been collaborating with NCEI to archive datasets collected during oceanographic research cruises. NCEI requires specific documentation and file structures for all datasets. GRIIDC has therefore created specific guidelines to ensure that all documentation required by NCEI is collected before a cruise dataset is accepted into the GRIIDC repository.

Taking these few extra steps may seem arduous, but properly organizing and storing data pays off immensely in the long run. Not only are the data safely stored for long-term retrieval and re-use, they are also easily searchable for others to build on and cite, giving the researcher credit and increasing the exposure of their work. With this in mind, GRIIDC strives to make the process as painless as possible for researchers and requires only some basic information about the cruise in order to fulfill the requirements. All the researcher has to do is fill out a few fields in the GRIIDC Cruise Data Documentation Template spreadsheet and submit it with the data. GRIIDC does the rest!

Is My Data Considered Cruise Data?

Types of cruise data
GRIIDC divides the term “oceanographic cruise data” into five main categories, as shown in the main image above.

Underway Data
This group includes all measurements collected along track using the ship’s onboard, on-deck and flow-through instruments. Ideally, these data are submitted as one dataset. Multiple datasets are acceptable; however, each dataset should contain navigation files that can be used to match each data point with coordinates and time.
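One way to match each underway data point with coordinates and time is a nearest-timestamp lookup against the navigation file. The sketch below is only an illustration, not a GRIIDC tool: the function name `nearest_fix`, the tuple layout of the navigation records, the sample values, and the 60-second tolerance are all assumptions.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_fix(nav, when, tolerance=timedelta(seconds=60)):
    """Return the (time, lat, lon) navigation fix closest to `when`.

    `nav` must be a list of (datetime, lat, lon) tuples sorted by time.
    Returns None when no fix lies within `tolerance` (an assumed default).
    """
    times = [fix[0] for fix in nav]
    i = bisect_left(times, when)
    # Only the fixes just before and just after `when` can be the nearest.
    candidates = nav[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda fix: abs(fix[0] - when))
    return best if abs(best[0] - when) <= tolerance else None


# Made-up navigation fixes, one per minute.
nav = [
    (datetime(2017, 8, 11, 12, 0), 28.900, -90.500),
    (datetime(2017, 8, 11, 12, 1), 28.901, -90.498),
    (datetime(2017, 8, 11, 12, 2), 28.902, -90.496),
]

fix = nearest_fix(nav, datetime(2017, 8, 11, 12, 1, 20))
```

A measurement with no fix inside the tolerance comes back as `None`, which makes gaps in the navigation record visible instead of silently assigning a distant position.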

Stations/Profiles
Similar to the underway data, station data includes all measurements collected at a specific location in the atmosphere, at the ocean surface, or in the water column. Data collected at specific locations or stations can be provided in one or multiple datasets. If there are multiple types of data, please organize them in separate folders.

Samples
The third group includes all measurements or analyses from samples, such as water, sediment or organisms, collected during the cruise. All data and analyses of water samples, nets, sediment cores, and organisms must therefore be submitted following the same guidelines as any other cruise dataset. Each analysis can be submitted as a single dataset or as multiple datasets. Some examples include mass spectrometry, nutrients, primary production, respiration rates, fish measurements, stomach content, infauna and taxonomy. The only type of analysis that will not be considered cruise data is genetics.

Observations
The fourth group includes all observations, such as marine mammal or bird sightings. Observations follow the same guidelines as the “Samples” datasets described above. However, depth may not be necessary and coordinates are usually a close approximation.

Underwater Vehicles
The final group includes all measurements taken with underwater observing tools, such as remotely operated vehicles (ROVs) and towed vehicles. Data from underwater vehicles are considered cruise data; this includes towed instruments such as the Scanfish or ISSIS, as well as submersible dives. Glider data is not considered cruise data. Refer to the glider data submission guidance for more information: https://data.gulfresearchinitiative.org/sites/default/files/uploads/GRII....

GRIIDC Cruise Data Documentation Template
GRIIDC provides a simple Excel spreadsheet called the GRIIDC Cruise Data Documentation Template, found in the User Guides section, that must be submitted with all data collected during an oceanographic cruise. It requires information such as:

    • Start and end dates of the cruise
    • Purpose
    • Sea name
    • Chief scientist
    • Cruise ID
    • Vessel name
    • Keywords
    • Basic information about the researchers involved in data collection

This file should also contain explanations of the folder structure and the contents of folders and files. To avoid delays in publishing your data, please make sure you follow the guidelines. The cruise guidance documentation and template can be found here:
https://data.gulfresearchinitiative.org/training-user-guides
https://data.gulfresearchinitiative.org/sites/default/files/uploads/GRII...

General Data Do’s and Don’ts
For all data collected during a cruise, every sample or data point must include latitude/longitude coordinates, date, and depth. The methods, instruments, and protocols used for data collection and analysis must be included in the metadata. If calibration or device files were used for any of the instruments, they must be included in the dataset.
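A quick check before submission can catch rows that lack one of these required values. The following is a minimal sketch rather than an official GRIIDC validator: the column names in `REQUIRED`, the function name `find_incomplete_rows`, and the sample data are assumptions, and the list of required fields would need adjusting to match your own headers.

```python
import csv
import io

# Assumed column names; rename to match the headers in your own files.
REQUIRED = ("latitude", "longitude", "date", "depth")


def find_incomplete_rows(csv_text, required=REQUIRED):
    """Return (line_number, missing_fields) for rows lacking a required value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for number, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [name for name in required if not (row.get(name) or "").strip()]
        if missing:
            problems.append((number, missing))
    return problems


# Made-up sample with two incomplete rows.
sample = """station,latitude,longitude,date,depth,temperature
S1,28.90,-90.50,2017-08-11,5,29.1
S2,28.95,,2017-08-11,10,28.7
S3,29.00,-90.40,2017-08-12,,28.2
"""

problems = find_incomplete_rows(sample)
# problems == [(3, ['longitude']), (4, ['depth'])]
```

A column that is absent from the file altogether is reported as missing on every row, so the same check also catches a forgotten header.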

Color and format
Excel files should NOT contain any color formatting, comments, formulas, graphs, signatures, or designs. These features may not migrate into new file formats and may get lost. Colors, if not defined, lose meaning and may not be interpreted the same way by all users. Datasets that contain such formatting will be returned to the submitter so that it can be cleared.

Blanks
Please do not leave cells blank. Blank cells leave a lot open to interpretation: Are the data missing? Were they not collected? Did something go wrong? GRIIDC is required to define and explain all blank cells in the data before submitting to NCEI. Therefore, if there is no information available, please use “NaN” for “Not a Number” or “N.D.” for “Not Determined”.
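For data exported as CSV, filling blank cells can be scripted. The sketch below assumes that every blank cell really does mean "no information available"; the function name `fill_blanks` and the sample data are hypothetical, and the placeholder can be switched to “N.D.” where that fits better.

```python
import csv
import io


def fill_blanks(csv_text, placeholder="NaN"):
    """Return the CSV text with every empty or whitespace-only cell replaced."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([cell if cell.strip() else placeholder for cell in row])
    return out.getvalue()


# Made-up sample with two blank cells.
raw = "station,depth,salinity\nS1,5,\nS2,,36.1\n"
filled = fill_blanks(raw)
# filled == "station,depth,salinity\nS1,5,NaN\nS2,NaN,36.1\n"
```

A blanket replacement like this is only appropriate once you have confirmed why each cell is empty; the meaning of the placeholder should still be explained in the documentation.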

Headers
All headers, variables, and columns must be defined, and units should be provided. A Readme text file may be provided if the list of variables is very long. Define any acronyms or abbreviations used in the dataset. An outside user should be able to understand your dataset using this information.

NetCDF Files
All NetCDF files must be CF compliant. Please see the GitHub link for a compliance checker used for all glider and NCEI data.

Best Practices: Dataset File Structure Example:

CTD Data Example:
PELICAN_CTD_20170810-20170812.zip
    Folder: Documentation
        Cruise_Data_Documentation_20170877.xlsx*
        DataDocumentationReadme.txt
        CruiseReport.docx
        StationCoordinates.xlsx
    Folder: CTD
        CTD20170811.cnv
        CTD20170811.hdr
        CTD20170811.hex
        CTD20170811.ros
        CTD20170811.txt
        CTD20170811.xml
        CTD20170811.xmlcon
        CTD20170811_RawData.csv
        CTD20170811_Downcast.csv
        CTD20170811_Binned_Averages.csv

This example, for the research vessel “PELICAN”, shows what the file structure should look like for a station/profile cruise dataset, in this case CTD data. The dataset should contain the cruise documentation in a separate folder, if possible.
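A layout like the one in the example can be checked before upload. The sketch below is a hedged illustration, not part of any GRIIDC tooling: the function name `has_documentation_folder` is hypothetical, the folder name `Documentation` is taken from the example, and the small in-memory archive exists only to make the snippet self-contained.

```python
import io
import zipfile


def has_documentation_folder(zip_file):
    """True if the archive holds at least one file under Documentation/."""
    with zipfile.ZipFile(zip_file) as archive:
        return any(
            name.startswith("Documentation/") and not name.endswith("/")
            for name in archive.namelist()
        )


# Build a tiny archive in memory that mimics the layout in the example.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Documentation/DataDocumentationReadme.txt", "placeholder")
    archive.writestr("CTD/CTD20170811.cnv", "placeholder")

ok = has_documentation_folder(buffer)
```

The same function accepts a path to a real `.zip` file on disk in place of the in-memory buffer.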

For more examples of file structures for other cruise data types, please see our cruise guidance document: https://data.gulfresearchinitiative.org/sites/default/files/uploads/GRII....

Conclusion
Although additional information is required for cruise datasets, collecting this information is important for creating a more usable and long-lasting dataset. Data that are not well documented are unusable, which means the time, effort, and funds put into their collection and analysis are essentially wasted. GRIIDC has made the process of collecting this additional information easy and, as always, GRIIDC is here to help with any data management questions along the way!