4  Make Your Data Interoperable

The term “semantic resources” refers to a suite of information frameworks - such as controlled vocabularies, taxonomies, and ontologies - that provide definitions and context to natural language terms that are found within datasets. Semantic resources help disambiguate what is meant by a certain word or phrase. This facilitates interoperability, machine readability, and cross-dataset comparisons. Controlled vocabularies, taxonomic authorities, habitat classifications, and ontologies are all important semantic resources that are used in biological data standardization.

4.1 Controlled Vocabularies

Controlled vocabularies are pre-determined standardized terms and definitions used to describe a specific entity, collection, parameter, or unit of measurement in either metadata or data. While they facilitate computer-readability, they also reduce ambiguity around terms. Controlled vocabularies are designed to fit specific schema and are routinely updated by the communities that use them.

4.1.1 Natural Environmental Research Council (NERC) Vocabulary Server (NVS) 🌳

What is it?

The Natural Environment Research Council (NERC)-funded Vocabulary Server (NVS) provides access to standardized and hierarchically-organized vocabularies, primarily in oceanographic and associated domains. NVS is managed by the British Oceanographic Data Centre at the National Oceanography Centre (NOC).

Why?

  • It is used by the marine science community in the UK (MEDIN), Europe (SeaDataNet), and globally, by a variety of organisations and networks.

  • By connecting terms held in controlled lists using standards, the data described by these controlled vocabularies become more interoperable and hence more broadly reusable.

  • It becomes possible to build a truly distributed and interoperable data ecosystem across domain boundaries, enabling data reuse no matter the purpose for which they were collected in the first place.

Top Resources

4.1.2 Global Change Master Directory (GCMD) Keywords 🌏

What is it?

The Global Change Master Directory (GCMD) Keywords are a standardized set of terms used to describe Earth science data sets and services. They serve as a common language for categorizing and searching for data related to Earth science, environmental science, and global change research

Why?

  • GCMD Keywords provide a standardized vocabulary for describing Earth science data, ensuring consistency and interoperability across different data sets and repositories.

  • The GCMD keywords describe Earth science data and services consistently and comprehensively in a hierarchical format and follow a codified governance process.

  • The power of the keywords is in their ability to enable scientists to tag their data using a taxonomy of controlled scientific categories. This, in turn, allows those searching for data to discover datasets easily through the use of an established hierarchy.

Top Resources

4.2 Taxonomic Authorities

Taxonomic authorities ensure clarity and consistency of organism taxonomic identity (e.g. species, genus, etc.), regardless of changes to scientific name over time or differences in common name. In addition to nomenclature, taxonomic authorities often also include taxonomy and distribution.

4.2.1 Catalogue of Life (COL) 📒

What Is It?

Catalogue of Life describes itself as, “The most complete authoritative list of the world’s species - maintained by hundreds of global taxonomists.” It brings together information from taxonomists, and taxonomic databases, to construct an integrated view of currently accepted species across all taxonomic groups. A list of source datasets can be found here.

The primary mission of COL is to deliver data, but the tools and services offered by COL also enable taxonomists and other stakeholders to publish and revise species lists for any purpose.

Why?

  • COL adds persistent identifiers that enable users to track changes to a scientific name. 

  • COL helps downstream users consider the most up-to-date past and current characteristics of an organism: its biology, distribution, relevance to humans, and evolutionary history.

Top Resources

4.2.2 Integrated Taxonomic Information System (ITIS) 🦬

What Is It?

ITIS is a taxonomic database, developed and maintained by a partnership of federal agencies that provides reliable information on the nomenclature, taxonomy, and distribution of 1.8 million species of plants, animals, fungi and microbes in North America and the world. ITIS couples each scientific name with a unique taxonomic serial number (TSN), which ensures consistency and accuracy in the naming and classification of species.

Why?

  • Unique taxonomic serial number (TSN) ensures consistent and accurate naming and classification of species

  • ITIS is an important tool for identifying and cataloging species and monitoring their populations. 

Top Resources

4.2.3 Paleobiology Database (PBDB) 🦣

What Is It?

The Paleobiology Database (PBDB) is an expert-curated database that aims to provide taxonomic information for paleobiological taxa of all geological ages. It contains data for almost half a million paleobiological taxa from over 900 different contributors.

Why?

  • Checking your paleobiological taxonomic names against the PBDB will ensure the names are up-to-date based on current taxonomic literature as reflected in the database.

  • PBDB is a significant contributor to the taxonomic backbone of the Global Biodiversity Information Facility (GBIF). This means aligning your taxonomic names with PBDB will make the process of sharing your data easier.

Top Resources

  • PBDB can be accessed via their website, mobile applications, and an API.

  • The PBDB website has a Resources tab where more information about these access points can be found. 

  • The same Resources page also includes information on how to contribute taxonomic information to PBDB. Lessons are available via a playlist of YouTube videos. Note that you will need to be granted Contributory User access to contribute to PBDB. More information about the process and different user types is available in the PBDB User Guide.

4.2.4 World Register of Marine Species (WoRMS) 🐋

What Is It?

The World Register of Marine Species (WoRMS) is an authoritative and comprehensive list of names of marine organisms. It provides detailed information about marine species from around the world, helping anyone interested in marine life to find accurate and up-to-date information about these species, including where they can be found, their characteristics, and how they are related to one another. 

Why?

  • WoRMS content is curated by taxonomic and thematic marine experts. 

  • Each taxonomic group is represented by an expert who has the authority over the content, and is responsible for controlling the quality of the information. Each of these main taxonomic editors can invite several specialists of smaller groups within their area of responsibility to join them. 

  • WoRMS is the taxonomic database used by OBIS, and other important biological initiatives.

Top Resources

4.3 Habitat Classification

Habitat classification is the process of organizing quantitative observations (i.e., data collected by various methods and instruments) about the natural world into meaningful, human-understandable representations and descriptions. Habitat classification standards or systems provide terminology and methodology or guidance so that classification can be performed in a standard manner by individual projects and programs.

4.3.1 Coastal and Marine Ecological Classification Standard (CMECS) 🪸

What is it?

The Coastal and Marine Ecological Classification Standard (CMECS) was developed by a consortium of scientists and coastal managers to meet the needs of inventorying, monitoring and managing natural resources in U.S. and territorial waters. It is a structured dictionary of defined terms, or “ecological units”, that characterize the biotic and abiotic characteristics of benthic habitats in marine, estuarine, and lacustrine settings. CMECS’ ecoloigcal units are descriptive and may also be categorical; they are used for interpreting and classifying observational data and integrating information about physical environments and associated biota. Units are organized in a spatially-scaled hierarchical framework that works well for annotating geospatial data and developing map layers or constructing three-dimensional representations of the ecological conditions and associated biological communities.

CMECS is endorsed by the U.S. Federal Geographic Data Committee (FGDC) as the national standard for ecological classification FGDC-STD-018-2012 CMECS was developed to be compatible with other FGDC-endorsed standards listed below. CMECS adopts the Marine System, Estuarine System (with some modification), and Lacustrine System for the its Aquatic Setting from the Wetlands Classification Standard (FGDC-STD-004-2013), and the CMECS Biotic Component includes some vegetation communities and associations from both the Wetlands Classifciation Standard and the National Vegetation Classification Standard (FGDC-STD-005-2008).

Why?

  • Data collectors and analysts can describe data using standard terminology and organization structures so that information is consistent among projects and over time.
  • Using CMECS enables data discovery, data use and re-use, and broader analytical applications of data federally-funded data assets.

Top Resources

  • Access The CMECS Catalog is the collection of ecological units (defined terms) represented in the Web Ontology Language (OWL) format. The CMECS GitHub provides access to versioned releases of the CMECS OWL file and table and text versions. Details about the CMECS Catalog and information about the update process can be found in the wiki.

  • CMECS on Ecoportal for browsing the most recent version of the CMECS Catalog and linking directly to individual units’ definitions and properties.

  • Documentation background about the CMECS standard and how to use it, including technical guidance and examples of CMECS application in various locations and settings.

  • Recorded Presentation on CMECS

4.3.2 U.S. National Vegetation Classification (NVCS) 🌻

What is it?

The U.S National Vegetation Classification (USNVC) is the comprehensive, standardized, and hierarchical classification system for all vegetation types in the United States. Because several agencies, each with its own sampling protocols, are tasked with mapping and describing vegetation in the United States, the resultant inventories are not automatically interoperable, making vegetation resource monitoring across jurisdictional boundaries and scales challenging. The USNVC, a collaboration between the Ecological Society of America (ESA), NatureServe, and various federal agencies, was created to address this need. It provides a common language that allows for communication and cooperation on vegetation management issues across jurisdictional boundaries for the effective management and conservation of plant communities.

Natural classification at different scales

Vegetation modeling and mapping are relevant to many conservation efforts, including land inventories, wildlife habitat inventories, enhancing natural resource conservation efforts, fire management, invasive species management, and setting national vegetation policies (e.g. biofuels, carbon markets, and ecosystem services).

Why?

  • The USNVC is endorsed by the U.S. Federal Geographic Data Committee (FGDC) as the national standard for vegetation classification (FGDC-STD-005-2008 (Version 2) and provides a methodology and guidelines for vegetation data collection and analysis, ensuring consistent reporting on the nation’s vegetation resources.

  • As a dynamic standard, the USNVC is designed to be easily adapted as new ecological knowledge becomes available.

  • Its hierarchical nature makes classification scalable for diverse applications from vegetation monitoring to broad-scale analyses of trends across North America.

Top Resources

4.3.3 National Wetlands Classification System (?) (NWCS) 🐸

What is it?

The primary objective of the Classification of Wetlands and Deepwater Habitats of the United States, as originally drafted by Cowardin et al. (1979:3), was “to impose boundaries on natural ecosystems for the purposes of inventory, evaluation, and management.” The FGDC Wetlands Classification Standard (WCS) provides minimum requirements and guidelines for classification of both wetlands and deepwater habitats that are consistent with the FGDC Wetlands Mapping Standard (FGDC-STD-015-2009). 

Why?

  • NWCS was developed to support a detailed inventory and periodic monitoring of the Nation’s wet habitats using remote sensing.

  • It has been an official National Standard since 1996 (FGDC-STD-004), and has been the de facto standard for mapping U.S. wetlands and deepwater habitats since 1976

  • The NVC and Wetlands standard is endorsed as a Federal Geographic Data Committee (FGDC) standard in the U.S. for aquatic environmental data so that data collectors and analysts can describe data using standard terminology and organization structures.

Top Resources

  • Wetland Classification codes in table and tool formats

  • The second edition of Classification of Wetlands and Deepwater Habitats of the United States, which outlines the underlying concepts, definitions, systems, and sub-systems.

4.4 Ontologies

Ontologies are semantic resources that describe a set of concepts, their definitions and properties, and the relationships between. They facilitate the machine readability and usability of datasets by enabling semantic reasoners to analyze data and make inferences based on the relationships defined in the ontology.

4.4.1 Environmental Ontology (ENVO) 🌿

What is it?

The Environmental Ontology (ENVO) is an ontology of all things environmental (e.g. systems, components, and processes). ENVO aids humans, machines, and semantic web applications in understanding environmental entities of all kinds, from microscopic to intergalactic scales, increasing the interoperability of environmental descriptions. ENVO terms can be used within various standards, like the MIxS metadata standard (see here) or Darwin Core standard, to describe the materials that compose your sample or the environment where the sample was collected. Integrating ENVO within other standards enhances the ability to integrate environmental information with species occurrence data, improving data quality and usability.

Why?

  • ENVO provides a comprehensive standardized vocabulary for describing habitats, ecosystems, and environmental processes.

  • It incorporates the relationships between objects (Ontology).

  • It is used internationally.

  • Several portals and ontology browsing interfaces already harvest from ENVO.

Top Resources