4 Make Your Data Interoperable
The term “semantic resources” refers to a suite of information frameworks (such as controlled vocabularies, taxonomies, and ontologies) that provide definitions and context for the natural language terms found within datasets. Semantic resources help disambiguate what is meant by a certain word or phrase. This facilitates interoperability, machine readability, and cross-dataset comparisons. Controlled vocabularies, taxonomic authorities, habitat classifications, and ontologies are all important semantic resources that are used in biological data standardization.
4.1 Controlled Vocabularies
Controlled vocabularies are predetermined, standardized terms and definitions used to describe a specific entity, collection, parameter, or unit of measurement in either metadata or data. In addition to making data machine-readable, they reduce ambiguity around terms. Controlled vocabularies are designed to fit specific schemas and are routinely updated by the communities that use them.
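To make the idea concrete, here is a minimal Python sketch of what a controlled vocabulary provides. Several free-text spellings found in raw data sheets are mapped to a single controlled term with a stable identifier. The vocabulary, its identifiers (`UNIT:0001`, `UNIT:0002`), and the synonym lists are hypothetical and invented for illustration. A real project would load them from a published vocabulary such as those described below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One entry in a (hypothetical) controlled vocabulary."""
    identifier: str
    preferred_label: str
    definition: str


# Hypothetical vocabulary: identifiers and definitions are illustrative only.
VOCAB = {
    "UNIT:0001": Term("UNIT:0001", "degrees Celsius", "Temperature on the Celsius scale"),
    "UNIT:0002": Term("UNIT:0002", "metres", "Length in SI metres"),
}

# Free-text variants observed in raw data, mapped to controlled identifiers.
SYNONYMS = {
    "degrees celsius": "UNIT:0001",
    "deg c": "UNIT:0001",
    "°c": "UNIT:0001",
    "celsius": "UNIT:0001",
    "metres": "UNIT:0002",
    "meters": "UNIT:0002",
    "m": "UNIT:0002",
}


def normalize(raw: str) -> Term:
    """Return the controlled term for a free-text label, or raise KeyError."""
    key = " ".join(raw.strip().lower().split())  # ignore case and extra whitespace
    return VOCAB[SYNONYMS[key]]


if __name__ == "__main__":
    for raw in ["Deg C", " °C ", "Meters"]:
        term = normalize(raw)
        print(f"{raw!r} -> {term.identifier} ({term.preferred_label})")
```

Raising an error on unknown labels, rather than guessing, is a deliberate choice: unmatched terms are exactly the ones a person should review before they enter a dataset.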
4.1.1 Natural Environment Research Council (NERC) Vocabulary Server (NVS) 🌳
What is it?
The Natural Environment Research Council (NERC)-funded Vocabulary Server (NVS) provides access to standardized and hierarchically-organized vocabularies, primarily in oceanographic and associated domains. NVS is managed by the British Oceanographic Data Centre at the National Oceanography Centre (NOC).
Why?
It is used by the marine science community in the UK (MEDIN), in Europe (SeaDataNet), and globally by a variety of organisations and networks.
By connecting terms held in controlled lists using standards, the data described by these controlled vocabularies become more interoperable and hence more broadly reusable.
It becomes possible to build a truly distributed and interoperable data ecosystem across domain boundaries, enabling data reuse no matter the purpose for which they were collected in the first place.
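Much of this linking works because each NVS concept has a persistent, resolvable URI. The URI is built from a collection identifier and a term code, following the pattern `http://vocab.nerc.ac.uk/collection/<collection>/current/<term>/`. The Python sketch below only builds and parses such URIs offline; it does not call the NVS service. The term code in the example is illustrative, so confirm any term with the NVS search tools before using it.

```python
import re
from typing import NamedTuple

# NVS concept URIs follow /collection/<collection>/current/<term>/
NVS_BASE = "http://vocab.nerc.ac.uk/collection"
NVS_TERM_RE = re.compile(
    r"^https?://vocab\.nerc\.ac\.uk/collection/(?P<collection>[^/]+)/current/(?P<term>[^/]+)/?$"
)


class NvsTerm(NamedTuple):
    collection: str
    term: str


def build_nvs_uri(collection: str, term: str) -> str:
    """Build the 'current' URI for a term in an NVS collection."""
    return f"{NVS_BASE}/{collection}/current/{term}/"


def parse_nvs_uri(uri: str) -> NvsTerm:
    """Split an NVS concept URI into its collection and term code."""
    match = NVS_TERM_RE.match(uri)
    if match is None:
        raise ValueError(f"not an NVS concept URI: {uri}")
    return NvsTerm(match["collection"], match["term"])


if __name__ == "__main__":
    # P01 is the BODC Parameter Usage Vocabulary; the term code is illustrative.
    uri = build_nvs_uri("P01", "TEMPPR01")
    print(uri)
    print(parse_nvs_uri(uri))
```

Storing the full URI in a dataset, rather than only the bare code, is what lets other systems resolve the term and follow its mappings to related vocabularies.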
Top Resources
The NVS Search can be used to find controlled vocabulary terms
The NVS Vocab Search can search the entire NVS content
SeaDataNet Search searches only collections used by the SeaDataNet data infrastructure
4.1.2 Global Change Master Directory (GCMD) Keywords 🌏
What is it?
The Global Change Master Directory (GCMD) Keywords are a standardized set of terms used to describe Earth science data sets and services. They serve as a common language for categorizing and searching for data related to Earth science, environmental science, and global change research.
Why?
GCMD Keywords provide a standardized vocabulary for describing Earth science data, ensuring consistency and interoperability across different data sets and repositories.
The GCMD keywords describe Earth science data and services consistently and comprehensively in a hierarchical format and follow a codified governance process.
The power of the keywords is in their ability to enable scientists to tag their data using a taxonomy of controlled scientific categories. This, in turn, allows those searching for data to discover datasets easily through the use of an established hierarchy.
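GCMD keywords are usually written as a path from the broadest category to the most specific term, separated by `>`. An example is `EARTH SCIENCE > OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE`. The Python sketch below shows why that hierarchy helps discovery: a query for a broad node also returns datasets tagged with narrower keywords. The dataset IDs are hypothetical, and the keyword paths should be checked against the current GCMD keyword release before reuse.

```python
from collections.abc import Iterable

SEP = ">"


def parse_keyword(path: str) -> tuple[str, ...]:
    """Split a GCMD-style keyword path into normalized levels."""
    return tuple(level.strip().upper() for level in path.split(SEP) if level.strip())


def is_under(keyword: tuple[str, ...], query: tuple[str, ...]) -> bool:
    """True if `keyword` equals `query` or sits below it in the hierarchy."""
    return keyword[: len(query)] == query


def search(catalog: dict[str, Iterable[str]], query: str) -> list[str]:
    """Return IDs of datasets tagged with the query keyword or any narrower one."""
    q = parse_keyword(query)
    return sorted(
        dataset_id
        for dataset_id, keywords in catalog.items()
        if any(is_under(parse_keyword(k), q) for k in keywords)
    )


# Hypothetical catalog: dataset IDs are invented for illustration.
CATALOG = {
    "ds-001": ["EARTH SCIENCE > OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE"],
    "ds-002": ["EARTH SCIENCE > OCEANS > SALINITY/DENSITY > SALINITY"],
    "ds-003": ["EARTH SCIENCE > ATMOSPHERE > PRECIPITATION > PRECIPITATION AMOUNT"],
}

if __name__ == "__main__":
    print(search(CATALOG, "Earth Science > Oceans"))  # ['ds-001', 'ds-002']
    print(search(CATALOG, "EARTH SCIENCE > OCEANS > OCEAN TEMPERATURE"))  # ['ds-001']
```

The sketch compares whole levels (tuples) rather than raw string prefixes. This keeps a query for `OCEAN` from accidentally matching `OCEANS`.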
Top Resources
- More information can be found on the Global Change Master Directory (GCMD) Keywords page of NASA Earthdata (nasa.gov).
4.3 Habitat Classification
Habitat classification is the process of organizing quantitative observations (i.e., data collected by various methods and instruments) about the natural world into meaningful, human-understandable representations and descriptions. Habitat classification standards or systems provide terminology and methodology or guidance so that classification can be performed in a standard manner by individual projects and programs.
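As a small illustration of turning quantitative observations into classes, the Python sketch below assigns salinity measurements to named salinity categories. The class boundaries (0.5, 5, 18, 30, and 40) follow a commonly used simplified form of the Venice System. They are an assumption for this example, not a rule taken from any standard in this section. Consult the relevant standard, including CMECS, for the exact classes and boundaries it prescribes.

```python
from bisect import bisect_right

# Lower class boundaries (salinity) in ascending order. These follow a commonly
# used simplified form of the Venice System and are an assumption for this sketch.
BOUNDS = [0.5, 5.0, 18.0, 30.0, 40.0]
LABELS = ["fresh", "oligohaline", "mesohaline", "polyhaline", "euhaline", "hyperhaline"]


def salinity_class(salinity: float) -> str:
    """Assign a salinity value to a class; each class includes its lower boundary."""
    if salinity < 0:
        raise ValueError("salinity cannot be negative")
    return LABELS[bisect_right(BOUNDS, salinity)]


if __name__ == "__main__":
    # Hypothetical site observations.
    observations = {"site A": 0.2, "site B": 12.4, "site C": 34.9}
    for site, value in observations.items():
        print(site, value, salinity_class(value))
```

Keeping the boundaries in one table makes the classification rule explicit and easy to swap for the one a given standard requires.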
4.3.1 Coastal and Marine Ecological Classification Standard (CMECS) 🪸
What is it?
The Coastal and Marine Ecological Classification Standard (CMECS) was developed by a consortium of scientists and coastal managers to meet the needs of inventorying, monitoring, and managing natural resources in U.S. and territorial waters. It is a structured dictionary of defined terms, or “ecological units”, that describe the biotic and abiotic characteristics of benthic habitats in marine, estuarine, and lacustrine settings. CMECS ecological units are descriptive and may also be categorical; they are used to interpret and classify observational data and to integrate information about physical environments and associated biota. Units are organized in a spatially scaled hierarchical framework that works well for annotating geospatial data, developing map layers, and constructing three-dimensional representations of ecological conditions and associated biological communities.
CMECS is endorsed by the U.S. Federal Geographic Data Committee (FGDC) as the national standard for ecological classification (FGDC-STD-018-2012). CMECS was developed to be compatible with other FGDC-endorsed standards described below. For its Aquatic Setting, CMECS adopts the Marine System, the Estuarine System (with some modification), and the Lacustrine System from the Wetlands Classification Standard (FGDC-STD-004-2013). The CMECS Biotic Component includes some vegetation communities and associations from both the Wetlands Classification Standard and the National Vegetation Classification Standard (FGDC-STD-005-2008).
Why?
- Data collectors and analysts can describe data using standard terminology and organization structures so that information is consistent among projects and over time.
- Using CMECS enables data discovery, data use and reuse, and broader analytical applications of federally funded data assets.
Top Resources
Access: The CMECS Catalog is the collection of ecological units (defined terms) represented in the Web Ontology Language (OWL) format. The CMECS GitHub provides access to versioned releases of the CMECS OWL file, as well as table and text versions. Details about the CMECS Catalog and information about the update process can be found in the wiki.
CMECS on Ecoportal: for browsing the most recent version of the CMECS Catalog and linking directly to individual units’ definitions and properties.
Documentation: background about the CMECS standard and how to use it, including technical guidance and examples of CMECS application in various locations and settings.
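Because the CMECS Catalog is distributed as an OWL file, its units can be read with ordinary tooling. The sketch below uses only Python's standard library to pull class IRIs and labels out of an RDF/XML serialization. It assumes each unit is an `owl:Class` carrying an `rdfs:label`. The inline sample, its `example.org` IRIs, and the unit names are hypothetical and are not taken from the actual catalog. Check the structure and serialization of the real file in the CMECS GitHub releases. For real work, a dedicated RDF library such as rdflib is a better choice than hand-parsing XML.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"  # fully qualified rdf:about attribute name

# Hypothetical RDF/XML fragment; IRIs and labels are invented for illustration.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/cmecs#unit_1">
    <rdfs:label>Example Unit One</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/cmecs#unit_2">
    <rdfs:label>Example Unit Two</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def class_labels(rdf_xml: str) -> dict[str, str]:
    """Return {class IRI: label} for each top-level owl:Class with an rdfs:label."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    for cls in root.findall("owl:Class", NS):
        label = cls.find("rdfs:label", NS)
        iri = cls.get(RDF_ABOUT)
        if iri and label is not None and label.text:
            labels[iri] = label.text.strip()
    return labels


if __name__ == "__main__":
    for iri, label in class_labels(SAMPLE).items():
        print(iri, "->", label)
```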
4.3.2 U.S. National Vegetation Classification (USNVC) 🌻
What is it?
The U.S. National Vegetation Classification (USNVC) is the comprehensive, standardized, and hierarchical classification system for all vegetation types in the United States. Several agencies, each with its own sampling protocols, are tasked with mapping and describing vegetation in the United States. As a result, their inventories are not automatically interoperable, which makes vegetation resource monitoring across jurisdictional boundaries and scales challenging. The USNVC, a collaboration between the Ecological Society of America (ESA), NatureServe, and various federal agencies, was created to address this need. It provides a common language that allows for communication and cooperation on vegetation management issues across jurisdictional boundaries for the effective management and conservation of plant communities.
Vegetation modeling and mapping are relevant to many conservation efforts, including land inventories, wildlife habitat inventories, enhancing natural resource conservation efforts, fire management, invasive species management, and setting national vegetation policies (e.g. biofuels, carbon markets, and ecosystem services).
Why?
The USNVC is endorsed by the U.S. Federal Geographic Data Committee (FGDC) as the national standard for vegetation classification (FGDC-STD-005-2008, Version 2). It provides a methodology and guidelines for vegetation data collection and analysis, ensuring consistent reporting on the nation’s vegetation resources.
As a dynamic standard, the USNVC is designed to be easily adapted as new ecological knowledge becomes available.
Its hierarchical nature makes classification scalable for diverse applications from vegetation monitoring to broad-scale analyses of trends across North America.
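The scalability point can be sketched in Python: records classified at a fine level of the hierarchy can be summed at any coarser level. The level names follow the eight-level USNVC hierarchy (Class, Subclass, Formation, Division, Macrogroup, Group, Alliance, Association). The unit names and mapped areas are hypothetical placeholders. To keep the example short, only the Formation and Group levels are filled in for each association.

```python
from collections import defaultdict

LEVELS = ("Class", "Subclass", "Formation", "Division",
          "Macrogroup", "Group", "Alliance", "Association")

# Hypothetical lineage of each association: {level name: unit name}.
# Unit names are placeholders, not real USNVC units.
LINEAGE = {
    "Association A1": {"Formation": "Formation F1", "Group": "Group G1"},
    "Association A2": {"Formation": "Formation F1", "Group": "Group G2"},
    "Association B1": {"Formation": "Formation F2", "Group": "Group G3"},
}


def roll_up(mapped_area: dict[str, float], level: str) -> dict[str, float]:
    """Sum area mapped to associations (e.g. in hectares) at a coarser level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    totals: dict[str, float] = defaultdict(float)
    for association, area in mapped_area.items():
        if level == "Association":
            unit = association
        else:
            unit = LINEAGE[association].get(level)
            if unit is None:
                raise KeyError(f"no {level} recorded for {association}")
        totals[unit] += area
    return dict(totals)


if __name__ == "__main__":
    mapped = {"Association A1": 120.0, "Association A2": 30.0, "Association B1": 50.0}
    print(roll_up(mapped, "Group"))
    print(roll_up(mapped, "Formation"))
```

The same plot-level records serve a local monitoring report at the Group level and a broad-scale summary at the Formation level, without reclassifying anything.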
Top Resources
Overview of the USNVC Database
ESA’s collection of USNVC resources, including fact sheets, presentations, webinars, and posters.
4.3.3 National Wetlands Classification System (NWCS) 🐸
What is it?
The primary objective of the Classification of Wetlands and Deepwater Habitats of the United States, as originally drafted by Cowardin et al. (1979:3), was “to impose boundaries on natural ecosystems for the purposes of inventory, evaluation, and management.” The FGDC Wetlands Classification Standard (WCS) provides minimum requirements and guidelines for classification of both wetlands and deepwater habitats that are consistent with the FGDC Wetlands Mapping Standard (FGDC-STD-015-2009).
Why?
NWCS was developed to support a detailed inventory and periodic monitoring of the Nation’s wet habitats using remote sensing.
It has been an official national standard since 1996 (FGDC-STD-004) and has been the de facto standard for mapping U.S. wetlands and deepwater habitats since 1976.
Like the NVC, the Wetlands Classification Standard is endorsed by the Federal Geographic Data Committee (FGDC) as a U.S. standard, so that collectors and analysts of aquatic environmental data can describe their data using standard terminology and organizational structures.
Top Resources
Wetland Classification codes in table and tool formats
The second edition of Classification of Wetlands and Deepwater Habitats of the United States, which outlines the underlying concepts, definitions, systems, and sub-systems.
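The alphanumeric codes in the wetland classification tables pack the hierarchy into a short string. For example, `PEM1C` reads as Palustrine system, Emergent class, Persistent subclass, Seasonally Flooded water regime. The Python sketch below decodes that pattern: a system letter, an optional subsystem digit, a two-letter class, an optional subclass digit, and a water regime letter. The lookup tables are deliberately partial and cover only the values used in the examples. Split classes (such as `PEM1/SS1C`) and modifiers beyond the water regime are not decoded. Use the official code tables for anything real.

```python
import re

# Partial lookup tables covering only the examples; use the official tables in practice.
SYSTEMS = {"M": "Marine", "E": "Estuarine", "R": "Riverine", "L": "Lacustrine", "P": "Palustrine"}
SUBSYSTEMS = {
    ("M", "1"): "Subtidal", ("M", "2"): "Intertidal",
    ("E", "1"): "Subtidal", ("E", "2"): "Intertidal",
    ("L", "1"): "Limnetic", ("L", "2"): "Littoral",
}
CLASSES = {"EM": "Emergent", "FO": "Forested", "SS": "Scrub-Shrub", "UB": "Unconsolidated Bottom"}
SUBCLASSES = {("EM", "1"): "Persistent", ("EM", "2"): "Nonpersistent"}
WATER_REGIMES = {
    "A": "Temporarily Flooded", "C": "Seasonally Flooded", "F": "Semipermanently Flooded",
    "H": "Permanently Flooded", "N": "Regularly Flooded", "P": "Irregularly Flooded",
}

CODE_RE = re.compile(
    r"^(?P<system>[MERLP])(?P<subsystem>\d)?(?P<cls>[A-Z]{2})"
    r"(?P<subclass>\d)?(?P<regime>[A-Z])?(?P<modifiers>[a-z0-9]*)$"
)


def _lookup(table: dict, key) -> str:
    """Return the name for a code, or flag it as missing from the sample tables."""
    return table.get(key, f"unrecognized ({key})")


def decode(code: str) -> dict[str, str]:
    """Decode a simple wetland code such as 'PEM1C' into named levels."""
    m = CODE_RE.match(code.strip())
    if m is None:
        raise ValueError(f"unsupported code: {code}")
    system, sub, cls, subcls, regime = m.group("system", "subsystem", "cls", "subclass", "regime")
    result = {"system": SYSTEMS[system], "class": _lookup(CLASSES, cls)}
    if sub:
        result["subsystem"] = _lookup(SUBSYSTEMS, (system, sub))
    if subcls:
        result["subclass"] = _lookup(SUBCLASSES, (cls, subcls))
    if regime:
        result["water_regime"] = _lookup(WATER_REGIMES, regime)
    if m["modifiers"]:
        result["modifiers_not_decoded"] = m["modifiers"]
    return result


if __name__ == "__main__":
    for code in ["PEM1C", "E2EM1P", "L1UBH"]:
        print(code, decode(code))
```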
4.4 Ontologies
Ontologies are semantic resources that describe a set of concepts, their definitions and properties, and the relationships between them. They facilitate the machine readability and usability of datasets by enabling semantic reasoners to analyze data and make inferences based on the relationships defined in the ontology.
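A toy version of such inference fits in a few lines of Python. Given `is_a` (subclass) relationships between concepts, a query for a broader concept also retrieves samples annotated with a more specific one, even though the broader concept was never recorded on those samples. The concept names and sample IDs below are hypothetical and are not ENVO terms. Applying only a transitive `is_a` rule is a simplification; reasoners over OWL ontologies handle many more kinds of axioms.

```python
from collections import deque

# Hypothetical is_a edges: child concept -> list of direct parent concepts.
IS_A = {
    "kelp forest": ["marine ecosystem"],
    "seagrass meadow": ["marine ecosystem"],
    "marine ecosystem": ["aquatic ecosystem"],
    "freshwater lake": ["aquatic ecosystem"],
    "aquatic ecosystem": ["ecosystem"],
}

# Hypothetical sample annotations, each using the most specific concept known.
SAMPLES = {"S1": "kelp forest", "S2": "freshwater lake", "S3": "seagrass meadow"}


def ancestors(concept: str) -> set[str]:
    """All concepts reachable by following is_a edges (the transitive closure)."""
    seen: set[str] = set()
    queue = deque(IS_A.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen


def samples_matching(query: str) -> list[str]:
    """Samples annotated with `query` itself or with any narrower concept."""
    return sorted(
        sid for sid, concept in SAMPLES.items()
        if concept == query or query in ancestors(concept)
    )


if __name__ == "__main__":
    print(samples_matching("marine ecosystem"))
    print(samples_matching("aquatic ecosystem"))
```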
4.4.1 Environmental Ontology (ENVO) 🌿
What is it?
The Environmental Ontology (ENVO) is an ontology of all things environmental (e.g. systems, components, and processes). ENVO aids humans, machines, and semantic web applications in understanding environmental entities of all kinds, from microscopic to intergalactic scales, increasing the interoperability of environmental descriptions. ENVO terms can be used within various standards, such as the MIxS metadata standard or the Darwin Core standard, to describe the materials that compose your sample or the environment where the sample was collected. Integrating ENVO within other standards enhances the ability to integrate environmental information with species occurrence data, improving data quality and usability.
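In MIxS-style metadata, the three environmental fields (`env_broad_scale`, `env_local_scale`, `env_medium`) are commonly filled with a term label followed by its identifier in square brackets, in the form `label [ENVO:nnnnnnnn]`. The Python sketch below checks that a record follows that pattern, assuming one term per field. The identifiers in the example record are placeholders, not real ENVO terms. The check covers syntax only; confirming that an identifier exists in ENVO and matches its label requires looking the term up.

```python
import re

# MIxS environmental fields that take ontology terms.
ENV_FIELDS = ("env_broad_scale", "env_local_scale", "env_medium")

# One term per field: a label, optional whitespace, then an 8-digit ENVO ID in brackets.
TERM_RE = re.compile(r"^(?P<label>[^\[\]]+?)\s*\[(?P<curie>ENVO:\d{8})\]$")


def check_env_fields(record: dict[str, str]) -> list[str]:
    """Return syntax problems in the ENVO-valued fields; an empty list means none found."""
    problems = []
    for field in ENV_FIELDS:
        value = record.get(field, "").strip()
        if not value:
            problems.append(f"{field}: missing")
        elif TERM_RE.match(value) is None:
            problems.append(f"{field}: expected 'label [ENVO:nnnnnnnn]', got {value!r}")
    return problems


# Placeholder IDs: these are NOT real ENVO identifiers.
record = {
    "env_broad_scale": "example biome [ENVO:99999991]",
    "env_local_scale": "example feature [ENVO:99999992]",
    "env_medium": "sea water",  # label given without an identifier
}

if __name__ == "__main__":
    for problem in check_env_fields(record):
        print(problem)
```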
Why?
ENVO provides a comprehensive standardized vocabulary for describing habitats, ecosystems, and environmental processes.
As an ontology, it also encodes the relationships between terms.
It is used internationally.
Several portals and ontology browsing interfaces already harvest from ENVO.
Top Resources
Buttigieg et al. (2013) article introducing ENVO in Journal of Biomedical Semantics
Buttigieg et al. (2016) article revisiting ENVO
Instructions on how to Browse ENVO terms