1  Provide Context and Understandability to Your Data

1.1 Ecological Metadata Language (EML)

What Is It?

EML is a community-developed metadata schema designed for ecological data, which encompasses biological data. EML is normally presented as Extensible Markup Language (XML). An EML instance (XML document) holds metadata to describe one or more data objects. Data tables are the most common, but almost any data object can be accommodated.
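To make the idea of an EML instance concrete, here is a minimal sketch that builds a small EML-style XML document with Python's standard library. The namespace URI assumes EML 2.2.0, and the function name, the `system` value, and the sample values are hypothetical. The result is deliberately incomplete (for example, a real `dataTable` also needs an attribute list), so it should not be expected to validate against the EML schema as-is.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for EML 2.2.0; change it to match the schema version you target.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"


def build_minimal_eml(package_id, title, creator_surname, data_table_name):
    """Return an EML-style XML string describing a dataset with one data table.

    Hypothetical helper for illustration only; the output omits many elements
    that the EML schema requires.
    """
    ET.register_namespace("eml", EML_NS)

    # Only the root element is namespace-qualified; child elements are not.
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "example"},
    )

    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title

    # Who created the data.
    creator = ET.SubElement(dataset, "creator")
    creator_name = ET.SubElement(creator, "individualName")
    ET.SubElement(creator_name, "surName").text = creator_surname

    # Who to contact about the data (here, the same person).
    contact = ET.SubElement(dataset, "contact")
    contact_name = ET.SubElement(contact, "individualName")
    ET.SubElement(contact_name, "surName").text = creator_surname

    # One data object described by this metadata record.
    table = ET.SubElement(dataset, "dataTable")
    ET.SubElement(table, "entityName").text = data_table_name

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_minimal_eml("example.1.1", "Stream temperature", "Doe", "temps.csv"))
```

In practice, the dedicated EML authoring tools are a better choice than hand-built XML because they handle the required elements and validation.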

Why?

  • Provides context to your data and improves its reproducibility.
  • Can capture linked data relationships within EML (dataset series)
  • Standardized representation of information.
  • EML was designed for ecological data, which encompasses biological data.
  • Its taxonomic fields cover relationships (hierarchies), IDs, and authoritative material.

Key Information

  • EML Schema

  • Mandatory for LTER, iLTER, OBIS, GBIF, Darwin Core Archive (DwC-A)

  • Maintained by NCEAS, which also manages the GitHub repository

  • Usually, what you would submit to a repository is a “data package” consisting of an EML document and one or more data objects.

Top References

Tools or packages to help write EML:

1.2 ISO 19115

What Is It?

ISO 19115 is a content standard for describing geographic data, published by the International Organization for Standardization (ISO). At its most basic, it is written in narrative form with class diagrams. There are many implementations and extensions (e.g., https://www.dcc.ac.uk/resources/metadata-standards/iso-19115).

Why?

  • Provide context to your data (biological data is inherently ‘geographic’)

  • Standardized representation of information

  • Mandated by some US federal agencies, including NOAA, NASA, and USGS

  • Can be used at different granularities: to describe data packages or collections as well as individual datasets

What?

  • Evolved from the need to harmonize the FGDC Content Standard for Digital Geospatial Metadata (CSDGM) with other formal and de facto standards that support the documentation of geospatial data and services.

  • Many variations including 19115, 19115-1, 19115-2

  • From NCEI:

    • ISO 19115 Geographic information – Metadata: The ISO standard for documenting geospatial data. 

    • ISO 19115-2 Geographic information – Metadata – Part 2: Extensions for imagery and gridded data: An extension of ISO 19115 used to document information about imagery, gridded data, and remotely sensed data. The root of ISO 19115 metadata records will change from MD_Metadata to MI_Metadata when using ISO 19115-2.

  • Superseded FGDC CSDGM; all users are encouraged to migrate to ISO.

  • Highly flexible for many uses compared with FGDC CSDGM, but having few required elements leaves room for incomplete metadata
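The root-element difference noted above (MD_Metadata for ISO 19115, MI_Metadata for ISO 19115-2) can be checked programmatically. The following is a minimal sketch, not an official validator: the function name is hypothetical, and it compares only the local name of the root element so that it does not depend on any particular namespace URI.

```python
import xml.etree.ElementTree as ET


def iso_root_kind(xml_text):
    """Classify an ISO metadata record by the local name of its root element.

    Hypothetical helper for illustration; it inspects the root element only
    and does not validate the record.
    """
    root = ET.fromstring(xml_text)

    # ElementTree writes qualified tags as "{namespace}LocalName";
    # keep only the part after the closing brace (or the whole tag if unqualified).
    local_name = root.tag.rsplit("}", 1)[-1]

    if local_name == "MI_Metadata":
        return "ISO 19115-2"
    if local_name == "MD_Metadata":
        return "ISO 19115"
    return "unknown"


if __name__ == "__main__":
    # "urn:example" is a placeholder namespace, not the real ISO namespace.
    print(iso_root_kind('<MI_Metadata xmlns="urn:example"/>'))
```

A check like this is useful mainly for sorting a mixed collection of records before handing each to the appropriate schema validation.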

Top References

1.3 Minimum Information about any (x) Sequence (MIxS)

Who?

This is a standard for molecular data, such as DNA and RNA. It is used by molecular biologists and ecologists who generate, manage, and archive this type of sequence data.

What is it?

A set of checklists and packages for genomic sequence data.

Why?

  • Provide minimal standardized metadata about genetic sequence data

  • Agreed upon and published by the Genomic Standards Consortium

  • Used by the INSDC (DDBJ, EMBL-EBI and NCBI)

Key Information

  • MIxS (pronounced MIX-ess) is a suite of checklist standards that introduced the reporting of a breadth of environment-specific metadata variables to augment the genome-specific checklists.

  • Enables mixing and matching of genome checklists and environmental-specific packages.

  • MIxS Structure
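The "mix and match" idea can be sketched as combining the fields of one checklist with the fields of one environmental package and then checking a record against the union. This is an illustration under stated assumptions: the checklist and package names below are hypothetical, and the field groupings are small illustrative subsets modeled on MIxS-style term names, not the official required-field lists.

```python
# Illustrative subsets only (assumption): the real MIxS checklists and
# packages define many more terms and their own requirement levels.
CHECKLIST_FIELDS = {
    "example_checklist": {"collection_date", "lat_lon", "geo_loc_name", "seq_meth"},
}

PACKAGE_FIELDS = {
    "example_water_package": {"depth"},
    "example_soil_package": {"depth", "elev"},
}


def required_fields(checklist, package):
    """Return the union of fields from one checklist and one environmental package."""
    return CHECKLIST_FIELDS[checklist] | PACKAGE_FIELDS[package]


def missing_fields(record, checklist, package):
    """Return a sorted list of required fields that are absent or empty in the record."""
    needed = required_fields(checklist, package)
    return sorted(field for field in needed if not record.get(field))


if __name__ == "__main__":
    sample = {
        "collection_date": "2020-01-01",
        "lat_lon": "36.6 N 121.9 W",
        "depth": "5",
    }
    print(missing_fields(sample, "example_checklist", "example_water_package"))
```

Keeping checklists and packages as separate lookups mirrors the modular structure described above: swapping the package changes the environment-specific fields without touching the sequence-specific ones.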

Top References