2 Provide Context and Understandability to Your Data
Metadata standards ensure that data are described using a consistent structure and format, giving users the context they need across the data lifecycle, from data management to accessibility and interoperability. For biological data, three important metadata standards to be aware of are the Ecological Metadata Language (EML), ISO 19115, and MIxS.
2.1 Ecological Metadata Language (EML)
What Is It?
Ecological Metadata Language (EML) is a metadata schema (i.e., standard) developed by the ecological community for ecological data, including biological data. Shared data can include an EML file that provides context for all files in the data “package”. EML is expressed in Extensible Markup Language (XML), which gives ecological data a standard structure that is readable by both people and machines.
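As a rough illustration of that XML structure, the Python sketch below builds a bare-bones EML record with the standard library's xml.etree.ElementTree. It assumes EML 2.2.0 (namespace https://eml.ecoinformatics.org/eml-2.2.0) and includes only a dataset title plus a creator and a contact, which is close to the minimum a dataset needs. The package ID, the "system" value, the names, and the title are hypothetical placeholders; any real record should be validated against the official EML schema before it is shared.

```python
import xml.etree.ElementTree as ET

# Namespace for EML 2.2.0 (assumed target version; check the EML docs for the current one).
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"
ET.register_namespace("eml", EML_NS)


def minimal_eml(package_id: str, title: str, given: str, surname: str) -> str:
    """Build a bare-bones EML document with a title, creator, and contact."""
    # packageId and system are attributes of the root element; values here are placeholders.
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "hypothetical-repository"},
    )
    # Child elements in EML are unqualified (no namespace prefix).
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title

    # Creator and contact share the same "party" structure in EML.
    for role in ("creator", "contact"):
        party = ET.SubElement(dataset, role)
        name = ET.SubElement(party, "individualName")
        ET.SubElement(name, "givenName").text = given
        ET.SubElement(name, "surName").text = surname

    return ET.tostring(root, encoding="unicode")


xml_text = minimal_eml("example.1.1", "Example seagrass survey", "Ada", "Smith")
print(xml_text)
```

A real EML document usually adds much more, such as geographic, temporal, and taxonomic coverage, methods, and attribute-level descriptions of each data table.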
For more information, see the full EML documentation or the National Center for Ecological Analysis and Synthesis (NCEAS) EML GitHub repository.
Why?
It provides context and improves reproducibility of the data.
It captures important links and relationships between data, such as a time series, hierarchical taxonomies, IDs, and authoritative material.
EML helps represent ecological information in a standardized way.
EML is required for sharing data through LTER, iLTER, OBIS, GBIF, and Darwin Core Archives (DwC-A).
Top Resources
Tools or packages to help write EML:
For data managers and coders:
For scientists or those not inclined to write scripts:
2.2 ISO 19115
What Is It?
ISO 19115 is a metadata standard for describing geographic data, developed and maintained by the International Organization for Standardization (ISO). Biological data are inherently geographic, especially as we strive to understand how occurrences are affected by ecological or environmental variables. ISO 19115 describes the identification, extent, quality, spatial and temporal schema, spatial reference, and distribution of geographic data. It grew out of the need for flexibility in harmonizing the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) with other geospatial standards. For more information about implementations and extensions of ISO 19115, including for remotely sensed imagery and gridded data, see the Digital Curation Centre ISO 19115 guide or the NCEI Metadata Workbook.
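To make the "extent" information concrete, the Python sketch below builds a geographic bounding box for a study area using xml.etree.ElementTree. It assumes the ISO 19139 XML encoding of ISO 19115 (the gmd and gco namespaces); the newer ISO 19115-3 encoding uses different namespaces, and in a complete record this element would sit inside a gmd:EX_Extent. The function name and coordinates are hypothetical.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces: gmd for metadata elements, gco for basic types such as Decimal.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def bounding_box(west: float, east: float, south: float, north: float) -> ET.Element:
    """Return a gmd:EX_GeographicBoundingBox for a study area in decimal degrees."""
    # West is not required to be less than east: a box may cross the antimeridian.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")

    box = ET.Element(f"{{{GMD}}}EX_GeographicBoundingBox")
    for tag, value in (
        ("westBoundLongitude", west),
        ("eastBoundLongitude", east),
        ("southBoundLatitude", south),
        ("northBoundLatitude", north),
    ):
        # Each bound wraps its number in a gco:Decimal element.
        wrapper = ET.SubElement(box, f"{{{GMD}}}{tag}")
        ET.SubElement(wrapper, f"{{{GCO}}}Decimal").text = str(value)
    return box


extent = bounding_box(west=-71.0, east=-70.0, south=41.0, north=42.5)
print(ET.tostring(extent, encoding="unicode"))
```

Editors such as mdEditor generate this kind of structure from a form, so hand-building XML like this is mainly useful for scripted, repeatable metadata production.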
Why?
It helps to provide important geographic context to data in a standardized way.
Some U.S. federal agencies, such as NOAA, NASA, and USGS, require ISO metadata for sharing data through government repositories.
It can be used to describe individual files, data packages, and collections of datasets.
Top Resources
mdToolkit: mdEditor is an editor for writing ISO 19115 metadata that uses mdJSON as an intermediate format, and mdTranslator translates that metadata into other metadata formats.
2.3 Minimum Information about any (x) Sequence (MIxS)
What Is It?
MIxS (pronounced MIX-ess) is a set of checklists and packages for molecular genomic sequence data, such as DNA and RNA. MIxS is a standard published by the Genomic Standards Consortium (GSC) for molecular biologists and ecologists who create, manage, and archive sequence data. It includes a broad range of environment-specific metadata variables (e.g., soil variables) that augment genome-specific checklists (e.g., for bacteria) and enable interoperability with environmental analyses.
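The pairing of a checklist with an environmental package can be pictured as taking the union of two sets of required fields and checking a sample record against it. The Python sketch below does that. The field names are drawn from MIxS term names, but which terms are required depends on the checklist, the package, and the MIxS version, so both sets here are hypothetical, illustrative subsets rather than the official requirements; the sample record is invented.

```python
from typing import Dict, List, Set

# Illustrative subsets only, NOT the official MIxS requirement lists.
# Consult the MIxS GitHub repository for the authoritative checklists and packages.
HYPOTHETICAL_CHECKLIST_FIELDS: Set[str] = {
    "project_name", "lat_lon", "geo_loc_name", "collection_date",
    "env_broad_scale", "env_local_scale", "env_medium", "seq_meth",
}
HYPOTHETICAL_SOIL_PACKAGE_FIELDS: Set[str] = {"depth", "elev"}


def missing_fields(sample: Dict[str, str], checklist: Set[str], package: Set[str]) -> List[str]:
    """Return required fields (checklist plus package) that are absent or blank, sorted."""
    required = checklist | package
    return sorted(f for f in required if not sample.get(f, "").strip())


# Invented example record: seq_meth is blank and elev is absent.
sample = {
    "project_name": "Example soil survey",
    "lat_lon": "50.586825 6.408977",
    "geo_loc_name": "Germany: Eifel",
    "collection_date": "2021-06-15",
    "env_broad_scale": "temperate forest",
    "env_local_scale": "forest floor",
    "env_medium": "soil",
    "seq_meth": "",
    "depth": "0.1 m",
}
print(missing_fields(sample, HYPOTHETICAL_CHECKLIST_FIELDS, HYPOTHETICAL_SOIL_PACKAGE_FIELDS))
# → ['elev', 'seq_meth']
```

Keeping the checklist and the package as separate sets mirrors how MIxS lets one environmental package be reused across different genome-specific checklists.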
Why?
It helps to provide minimal standardized metadata about genetic sequence data.
It is used by the International Nucleotide Sequence Database Collection (INSDC), whose participating member databases are ROIS-NIG, EMBL-EBI, and NCBI.
Top Resources
MIxS is maintained by the community on GitHub. To propose changes or ask questions, see the MIxS GitHub repository.
Minimum Information about Sequence Data from the Built Environment (MIxS-BE)