4  Make Your Data Interoperable

[whole section preamble about “SEMANTIC RESOURCES” (rather than controlled vocabularies, as it was)… Possibly want to connect to framework of understanding, incl. controlled vocabulary, taxonomy, close associations, and ontology = taxonomy + relationships between terms]

Possible content/phrases to incorporate into whole section preamble: Ontology describes a set of concepts or categories in a domain and shows their properties and the relations between them.

Semantic resources (i.e. ______) improve consistency across diverse sources of data and are essential to interoperability. Controlled vocabularies, taxonomic authorities, and habitat classifications are all useful semantic resources for standardizing biological data.

Controlled vocabularies provide standardized terms and definitions for biological concepts

They facilitate searching for data in web portals. They also enable records to be interpreted by computers, which opens data sets up to automated data workflows, computer-aided manipulation, distribution, interoperability, and long-term reuse.

By connecting terms held in controlled lists using standards, the data described by these controlled vocabularies become more interoperable and hence more broadly reusable.

enable records to be interpreted by computers, facilitating automated data workflows, computer aided manipulation, distribution, and interoperability

It becomes possible to build a truly distributed and interoperable data ecosystem across domain boundaries, enabling data reuse no matter the purpose for which they were collected in the first place.

concise, controlled description

Standardized classifications of observational data also allow the information derived from individual projects to be compared with one another and compiled into larger, more extensive representations across space and time. This advances the collective scientific knowledge of natural systems and enables partners to coordinate management activities such as regional assessments, monitoring, and restoration.

When combined with international ontologies such as ENVO (which is used by UNESCO, for example), your data can reach a global audience.

4.1 Controlled Vocabularies

Controlled vocabularies are predetermined, standardized terms and definitions used to describe a specific entity, collection, parameter, or unit of measurement in either metadata or data. They make records easier for computers to read and reduce ambiguity around terms. Controlled vocabularies are designed to fit specific schemas and are routinely updated by the communities that use them.
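
As a minimal sketch of how a controlled vocabulary reduces ambiguity in practice, the Python below checks free-text values against a small list of preferred terms. The list `CONTROLLED_UNITS` and the function `check_terms` are hypothetical and exist only for illustration; a real project would load the published terms from whichever vocabulary it has adopted.

```python
# Hypothetical controlled list: normalized key -> preferred term.
# A real list would be loaded from the vocabulary your project has adopted.
CONTROLLED_UNITS = {
    "degrees celsius": "degrees Celsius",
    "metres": "metres",
    "practical salinity units": "practical salinity units",
}


def check_terms(values, vocabulary):
    """Split values into matched preferred terms and a list of unmatched values."""
    matched, unmatched = {}, []
    for value in values:
        # Normalize case and collapse repeated whitespace before lookup.
        key = " ".join(value.strip().lower().split())
        if key in vocabulary:
            matched[value] = vocabulary[key]
        else:
            unmatched.append(value)
    return matched, unmatched


matched, unmatched = check_terms(
    ["Degrees  Celsius", "metres", "deg C"], CONTROLLED_UNITS
)
print(matched)
print(unmatched)
```

Values that do not match are returned for review rather than silently corrected, so a person decides how each one maps to the vocabulary.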

4.1.1 Environmental Ontology (ENVO) [emoji]

What is it?

The Environmental Ontology (ENVO) is an ontology (i.e. it describes a set of concepts or categories, their properties, and the relations between them) of all things environmental (e.g. systems, components, and processes). ENVO aids humans, machines, and semantic web applications in understanding environmental entities of all kinds, from microscopic to intergalactic scales, increasing the interoperability of environmental descriptions.
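
To illustrate what "concepts plus the relations between them" can look like to a machine, here is a small Python sketch. The identifiers (`EX:001` and so on), labels, and relations are hypothetical placeholders and are not real ENVO terms; actual identifiers and relations should be taken from ENVO itself.

```python
# Hypothetical terms for illustration only; these are NOT real ENVO identifiers.
TERMS = {
    "EX:001": "environmental system",
    "EX:002": "biome",
    "EX:003": "marine biome",
    "EX:004": "estuarine biome",
}

# Each relation is a (subject, relation type, object) triple.
RELATIONS = [
    ("EX:002", "is_a", "EX:001"),
    ("EX:003", "is_a", "EX:002"),
    ("EX:004", "is_a", "EX:002"),
]


def ancestors(term_id, relations, relation="is_a"):
    """Return every term reachable by following one relation type upward."""
    parents = {}
    for subj, rel, obj in relations:
        if rel == relation:
            parents.setdefault(subj, []).append(obj)
    found, stack = [], [term_id]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


marine_ancestors = [TERMS[t] for t in ancestors("EX:003", RELATIONS)]
print(marine_ancestors)
```

Because the relations are explicit, a search for the broader term can also return records annotated with any of its narrower terms, which a flat list of terms cannot support.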

Why?

  • ENVO provides a comprehensive standardized vocabulary for describing habitats, ecosystems, and environmental processes.

  • As an ontology, it captures the relationships between terms, not just the terms themselves.

  • It is used internationally.

  • Several portals and ontology browsing interfaces already harvest from ENVO.

Top Resources

[Do we want to include this fun fact?

ENVO terms can also be used within the MIxS metadata standard (see here), for example to describe the materials that compose your sample or the environment where the sample was collected.

Could be included in either and link to the other]

4.1.2 Natural Environment Research Council (NERC) Vocabulary Server (NVS) [emoji]

What is it?

The NERC Vocabulary Server (NVS), funded by the Natural Environment Research Council (NERC), provides access to standardized, hierarchically organized vocabularies, primarily in oceanographic and associated domains. NVS is managed by the British Oceanographic Data Centre at the National Oceanography Centre (NOC).

Why?

  • It is used by the marine science community in the UK (MEDIN), Europe (SeaDataNet), and globally, by a variety of organisations and networks.

  • By connecting terms held in controlled lists using standards, the data described by these controlled vocabularies become more interoperable and hence more broadly reusable.

  • It becomes possible to build a truly distributed and interoperable data ecosystem across domain boundaries, enabling data reuse no matter the purpose for which they were collected in the first place.

Top Resources

4.1.3 Global Change Master Directory (GCMD) Keywords [emoji.. gonna suggest a globe]

What is it?

The Global Change Master Directory (GCMD) Keywords are a standardized set of terms used to describe Earth science data sets and services. They serve as a common language for categorizing and searching for data related to Earth science, environmental science, and global change research.

Why?

  • GCMD Keywords provide a standardized vocabulary for describing Earth science data, ensuring consistency and interoperability across different data sets and repositories.

  • The GCMD keywords describe Earth science data and services consistently and comprehensively in a hierarchical format and follow a codified governance process.

  • The power of the keywords is in their ability to enable scientists to tag their data using a taxonomy of controlled scientific categories. This, in turn, allows those searching for data to discover datasets easily through the use of an established hierarchy.
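
The hierarchical search described above can be sketched in a few lines of Python. The dataset names are hypothetical, and the keyword paths are written in the style of GCMD keywords for illustration only; they should be checked against the current keyword list before use.

```python
# Hypothetical datasets tagged with hierarchical keyword paths.
# Paths follow the GCMD style but should be verified against the current list.
DATASETS = {
    "buoy_sst_2020": [
        "EARTH SCIENCE > OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE"
    ],
    "ctd_casts": ["EARTH SCIENCE > OCEANS > SALINITY/DENSITY > SALINITY"],
    "forest_plots": [
        "EARTH SCIENCE > BIOSPHERE > VEGETATION > CANOPY CHARACTERISTICS"
    ],
}


def split_keyword(keyword):
    """Turn 'A > B > C' into a tuple of normalized hierarchy levels."""
    return tuple(level.strip().upper() for level in keyword.split(">"))


def find_datasets(datasets, query):
    """Return names of datasets tagged at or below the queried level."""
    prefix = split_keyword(query)
    hits = []
    for name, keywords in datasets.items():
        if any(split_keyword(k)[: len(prefix)] == prefix for k in keywords):
            hits.append(name)
    return sorted(hits)


ocean_hits = find_datasets(DATASETS, "Earth Science > Oceans")
print(ocean_hits)
```

A query at a broad level returns every dataset tagged anywhere beneath it, which is what lets users discover data through the hierarchy.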

Top Resources

4.2 Taxonomic Authorities

[preamble for Taxonomic Authorities]

4.2.1 Catalogue of Life (COL) [emoji]

What is it?

Catalogue of Life describes itself as “The most complete authoritative list of the world’s species - maintained by hundreds of global taxonomists.” It brings together information from taxonomists and taxonomic databases to construct an integrated view of currently accepted species across all taxonomic groups. A list of source datasets can be found here.

The primary mission of COL is to deliver data, but the tools and services offered by COL also enable taxonomists and other stakeholders to publish and revise species lists for any purpose.

Why?

  • COL adds persistent identifiers that enable users to track changes to a scientific name. 

  • COL gives downstream users up-to-date information on an organism’s past and current characteristics: its biology, distribution, relevance to humans, and evolutionary history.

Top Resources

4.2.2 Integrated Taxonomic Information System (ITIS) [emoji]

What is it?

ITIS is a taxonomic database developed and maintained by a partnership of federal agencies. It provides reliable information on the nomenclature, taxonomy, and distribution of 1.8 million species of plants, animals, fungi, and microbes in North America and the world. ITIS couples each scientific name with a unique taxonomic serial number (TSN), which ensures consistency and accuracy in the naming and classification of species.

Why?

  • A unique taxonomic serial number (TSN) ensures consistent and accurate naming and classification of species.

  • ITIS is an important tool for identifying and cataloging species and monitoring their populations. 
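
As a rough illustration of why a stable identifier helps, the Python sketch below combines counts from two surveys that recorded the same taxon under different names. All names and serial numbers are hypothetical placeholders, not real ITIS values, and the lookup assumes that each name has already been resolved to the TSN of its accepted name.

```python
# Hypothetical lookup: name as recorded -> TSN of the accepted name.
# Names and numbers are placeholders, NOT real ITIS records.
NAME_TO_ACCEPTED_TSN = {
    "Examplea prima": 100001,
    "Examplea vetus": 100001,  # older name assumed to resolve to the same taxon
    "Examplea altera": 100002,
}


def counts_by_tsn(records, name_to_tsn):
    """Sum counts per TSN and list any names that could not be resolved."""
    totals, unresolved = {}, []
    for name, count in records:
        tsn = name_to_tsn.get(name.strip())
        if tsn is None:
            unresolved.append(name)
        else:
            totals[tsn] = totals.get(tsn, 0) + count
    return totals, unresolved


survey_a = [("Examplea prima", 12), ("Examplea altera", 3)]
survey_b = [("Examplea vetus", 5), ("Unknownia sp.", 1)]
totals, unresolved = counts_by_tsn(survey_a + survey_b, NAME_TO_ACCEPTED_TSN)
print(totals)
print(unresolved)
```

Keying on the identifier rather than the name string means the two surveys are combined correctly even though they spelled the taxon differently.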

Top Resources

4.2.3 Paleobiology Database (PBDB) [emoji]

What is it?

The Paleobiology Database (PBDB) is an online, expert-curated database that aims to provide taxonomic information for paleobiological taxa of all geological ages. It contains data for almost half a million paleobiological taxa from over 900 different contributors.

Why?

  • Checking your paleobiological taxonomic names against the PBDB will ensure the names are up-to-date based on current taxonomic literature as reflected in the database.

  • PBDB is one of the sources that contribute to the taxonomic backbone of the Global Biodiversity Information Facility (GBIF). This means aligning your taxonomic names with PBDB will make the process of sharing your data easier.

Top Resources

  • PBDB can be accessed via their website, mobile applications, and an API.

  • The PBDB website has a Resources tab where more information about these access points can be found. 

  • The same Resources page also includes information on how to contribute taxonomic information to PBDB. Lessons are available via a playlist of YouTube videos. Note that you will need to be granted Contributory User access to contribute to PBDB. More information about the process and different user types is available in the PBDB User Guide.
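
For readers who want to check names programmatically, the Python sketch below builds a lookup URL for a single taxon name. The base URL, the `taxa/single.json` operation, and the `name` parameter are assumptions drawn from the public PBDB data service and should be confirmed against the API documentation linked from the Resources tab. The helper `taxon_lookup_url` is hypothetical, and the code only constructs the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Assumed base URL of the PBDB data service; confirm against the API documentation.
PBDB_BASE = "https://paleobiodb.org/data1.2"


def taxon_lookup_url(name, base=PBDB_BASE):
    """Build a URL asking the data service for a single taxon by name."""
    query = urlencode({"name": name.strip()})
    return f"{base}/taxa/single.json?{query}"


url = taxon_lookup_url("Tyrannosaurus rex")
print(url)
```

The resulting URL can be opened in a browser or passed to an HTTP client, and the response compared with the names in your own data.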

4.2.4 World Register of Marine Species (WoRMS) 🐋

What is it?

The World Register of Marine Species (WoRMS) is an authoritative and comprehensive list of names of marine organisms. It provides detailed information about marine species from around the world, helping anyone interested in marine life to find accurate and up-to-date information about these species, including where they can be found, their characteristics, and how they are related to one another. 

Why?

  • WoRMS content is curated by taxonomic and thematic marine experts. 

  • Each taxonomic group is represented by an expert who has the authority over the content, and is responsible for controlling the quality of the information. Each of these main taxonomic editors can invite several specialists of smaller groups within their area of responsibility to join them. 

  • WoRMS is the taxonomic database used by OBIS and other important biological initiatives.

Top Resources

4.3 Habitat Classification

Habitat classification is the process of organizing quantitative observations (i.e., data collected by various methods and instruments) about the natural world into meaningful, human-understandable representations and descriptions. Habitat classification standards or systems provide terminology and methodology or guidance so that classification can be performed in a standard manner by individual projects and programs.

4.3.1 Coastal and Marine Ecological Classification Standard (CMECS) [emoji]

What is it?

The Coastal and Marine Ecological Classification Standard (CMECS) is a structured dictionary of terms that describe benthic habitats in marine, estuarine, and lacustrine settings. It was developed by a consortium of scientists and coastal managers to meet the needs of inventorying, monitoring, and managing natural resources in U.S. and territorial waters. CMECS terms are organized in a spatially scaled framework for annotating geospatial data and integrating information about the physical components of aquatic habitats. This framework enables three-dimensional representations of ecological conditions that can be associated with faunal behavior.

Why?

  • CMECS is endorsed as a Federal Geographic Data Committee (FGDC) standard in the U.S. for aquatic environmental data so that data collectors and analysts can describe data using standard terminology and organization structures.

  • Using CMECS enables data discovery, data use and reuse, and broader analytical applications of federally funded data assets.

Top Resources

  • Documentation of the CMECS standard, how to use it (including examples), and how to contribute to its improvement

4.3.2 U.S. National Vegetation Classification (USNVC) [emoji]

What is it?

The U.S. National Vegetation Classification (USNVC) is the comprehensive, standardized, and hierarchical classification system for all vegetation types in the United States. Several agencies, each with its own sampling protocols, are tasked with mapping and describing vegetation in the United States. The resulting inventories are not automatically interoperable, which makes vegetation resource monitoring across jurisdictional boundaries challenging. The USNVC, a collaboration between the Ecological Society of America (ESA), NatureServe, and various federal agencies, was created to address this challenge. It provides a common language for communication and cooperation on vegetation management issues across jurisdictional boundaries, supporting the effective management and conservation of plant communities.

Vegetation modeling and mapping are relevant to many conservation efforts, including land inventories, wildlife habitat inventories, enhancing natural resource conservation efforts, fire management, invasive species management, and setting national vegetation policies (e.g. biofuels, carbon markets, and ecosystem services).

Why?

  • As a dynamic standard, the USNVC is designed to be easily adapted as new ecological knowledge becomes available.

  • Its hierarchical nature makes classification scalable for diverse applications from vegetation monitoring to broad-scale analyses of trends across North America.

  • The USNVC is governed by standards for vegetation data collection and analysis, ensuring consistent reporting on the nation’s vegetation resources.

Top Resources

4.3.3 National Wetlands Classification System (?) (NWCS) [emoji]

What is it?

The primary objective of the Classification of Wetlands and Deepwater Habitats of the United States, as originally drafted by Cowardin et al. (1979:3), was “to impose boundaries on natural ecosystems for the purposes of inventory, evaluation, and management.” The FGDC Wetlands Classification Standard (WCS) provides minimum requirements and guidelines for classification of both wetlands and deepwater habitats that are consistent with the FGDC Wetlands Mapping Standard (FGDC-STD-015-2009). 

Why?

  • NWCS was developed to support a detailed inventory and periodic monitoring of the Nation’s wet habitats using remote sensing.

  • It has been an official National Standard since 1996 (FGDC-STD-004), and has been the de facto standard for mapping U.S. wetlands and deepwater habitats since 1976.

  • The NVC and Wetlands standards are endorsed as Federal Geographic Data Committee (FGDC) standards in the U.S. for aquatic environmental data so that data collectors and analysts can describe data using standard terminology and organization structures.

Top Resources

  • Wetland Classification codes in table and tool formats

  • The second edition of Classification of Wetlands and Deepwater Habitats of the United States, which outlines the underlying concepts, definitions, systems, and sub-systems.