Research Outputs
What Research Outputs Are Being Produced?
Research outputs include a variety of materials and data generated during the research process. These may include:
- Samples
- Physical samples (e.g., biological, chemical, geological)
- e.g.:
- Mullineaux #2453 - Physical samples will include animals, rocks, and water column samples. Rocks and animal specimens for which we extract gut contents will be assigned unique sample identifiers, important for provenance for subsamples to be analyzed for microbes.
- e.g.:
- Digital samples (e.g., scanned images, 3D models)
- Physical samples (e.g., biological, chemical, geological)
- Data
- Raw data
- Processed data
- Metadata
- Collections
- Curated sets of samples or data
- Software
- Code, scripts, or applications developed for analysis
- Materials
- Lab protocols, reagents, or experimental setups
- Models
- Statistical, mathematical, or computational models
- e.g:
- BCO-DMO: Bianchi
- Output from model simulations, which will include both physical (T, S, velocities) and biogeochemical (tracer, rates, fluxes) variables. The raw output will be post-processed and analyzed for presentation at meetings and for publication in scientific journals. For model output from the main set of simulations (Table 1 in the proposal), standard model fields (e.g. physical and biogeochemical tracers, current velocities, and the main biogeochemical rates) at monthly resolution will be provided to, and made available through, BCO-DMO and NOAA NCEI in standard NetCDF format.
- BCO-DMO: Bianchi
- e.g:
- Statistical, mathematical, or computational models
- Documents/Posters/Pre-prints/Publications/Videos
- Research papers, conference posters, pre-prints, and multimedia content
- e.g.:
- Mullineaux #2453
- Imagery at the seafloor will include NDSF vehicle imagery (e.g., 4K video) and PI-provided imagery (e.g., time-lapse still).
- Mullineaux #2453
- e.g.:
- Research papers, conference posters, pre-prints, and multimedia content
General example
Note their description of data types, number of experiments performed, file types generated, and repositories for submission.
- BCO-DMO: Alexander
- Description of data types, number of experiments performed, file types generated, and repositories for submission
- Genetic sequencing: mRNA and DNA sequencing from E. huxleyi cultures grown in the lab will be collected. Sequencing will be performed at the Columbia Genome Center (New York, NY). All raw sequence data will be deposited in the short-read archive (SRA) through the NCBI. Assembled genomes will be deposited under the same project number through the NCBI’s Assembly database. Assembled transcriptomes will be deposited on the NCBI’s Transcriptome Shotgun Assembly (TSA) database. Associated annotation files will be uploaded with the genome and transcriptome files to the appropriate database. File types: Short-read archive (.sra), raw sequencing files (.fastq), assembled fasta files (.fasta), annotation files (.gff). Repository: NCBI; accession numbers will be provided to BCO-DMO.
- Physiological experiment data: Physiological experiments carried out on 10 different E. huxleyi strains will be conducted in the lab. Physiological and chemical parameters of the cultures will be collected over time including: growth rate, cell size, Fv/Fm, chlorophyll, particulate organic carbon, particulate inorganic carbon, dissolved nitrogen and phosphorus, and particulate nitrogen and phosphorus. File types: csv files. Repository: BCO-DMO.
- Bioinformatic pipelines: Analysis of all genetic sequencing data will be done through the construction of reproducible pipelines designed in Snakemake. These pipelines will be made public on GitHub and archived and provided a doi through Zenodo. Repository: Zenodo; doi will be provided to BCO-DMO.
Number and Size of Research Outputs
- Number of files, samples, collections, etc.
- Estimated count of each type of output.
- Size of each file, sample, etc.
- File sizes (e.g., MB, GB, TB) for digital outputs.
- Physical dimensions or quantities for physical samples.
- e.g.:
- Sosik LTER
- By mid-year 6 of our project, our IM team stores ~60 TB of data (~55 on premises and ~5 cloud). For WHOI’s institutional research data storage (RDS) on premises, we have >40 TB to share within WHOI a subset of our towed plankton imaging data, 12 TB for a subset of our coupled physical-biological model output, and 1 TB for (low-volume) data products including those served by our web-based REST API. However, additional storage is required for plankton imagery (e.g., PI Sosik’s network attached storage serving ~10 TB publicly available IFCB data), acoustic data collected by vessels and Stingray towed vehicle, and physical and coupled biological-physical model output; these and some other high-volume data types (e.g., high throughput sequencing data) are necessarily managed by PIs.
- Sosik LTER
- e.g.:
Metadata Critical for Data Interpretation and Reuse
- Sensor metadata
- Information about instruments or sensors used to collect data.
- e.g.:
- BCO–DMO: Church
- All underway optical data including imaging flow cytometry will be time-synchronized with the ship’s GPS and thermosalinograph. At sea all data will be visualized in real time to evaluate data quality and instrument stability; data will be stored on a customized raspberry-PI based data logger and backed up onto a separate hard drive. Post-cruise, all continuous optical data will be merged to 1-min averages and submitted to BCO-DMO; Imaging flow cytometry data are sampled at ~20 min intervals; a composite cruise file of all imaged particles and their individual morphometrics and optical properties will be generated and submitted to BCO-DMO.
- BCO–DMO: Church
- e.g.:
- Information about instruments or sensors used to collect data.
- Sampling effort metadata
Details about sampling methods, locations, and conditions
e.g.:
- BCO-DMO: Reitzel - Sequence data will be annotated with essential information for each sample, which will include description of the sample (e.g., location of origin for population, date), method for library generation, and the sequencing method. These data tags for description will be included with data submitted to SRA and BCO-DMO. Nucleic acids extracted from collected animals will be labeled with precise site of origin (latitude, longitude), date of collection, and species and archived in the Reitzel and Burt laboratories. Field temperature data will be stored in flat ASCII files, which can be read easily by different software packages. Field data will include date, time, latitude, longitude, and temperature, as appropriate.