5 Make Your Data Internet Ready
Web services and standards are useful to understand if you collect or manage biological data. Platforms such as NOAA’s NCEI, NASA, OBIS, and GBIF use standard web services to serve data. Web-friendly data standards facilitate the transfer and handling of data via web services by making information visible in a predictable way, promoting online sharing, programmatic discovery, access, and processing of data across platforms and disciplines.
5.1 Web-enabled standards
Web standards are the formal, non-proprietary standards and other technical specifications that define and describe different aspects of the World Wide Web. They are created by standards bodies, which are institutions that bring groups of people together to agree on how technologies should work to best fulfill specific use cases. Web standards are key to global data discovery.
5.1.1 W3C standards
What Is It?
The World Wide Web Consortium (W3C) develops the W3C standards, which serve as building blocks to build internet browsers, web pages, blogs, search engines, and other software that power our experience on the web. Although HTML is its cornerstone, W3C publishes a range of technical reports, which help move the web forward, like CSS, SVG, WOFF, WebRTC, XML, and a growing variety of APIs.
Why?
Developers can create interactive experiences available on any device.
Understanding these standards helps you make your data more FAIR (findable, accessible, interoperable, and reusable).
Top Resources
- W3Schools offers a variety of tutorials for free
5.1.2 Dublin Core Standard
What Is It?
Dublin Core is a metadata standard of 15 ‘core’ terms originally developed for archives and libraries to describe physical or digital resources and details about their collection. Darwin Core is an extension of Dublin Core for biodiversity information.
Why?
Dublin Core can be thought of as the cornerstone of Darwin Core. Conceived in 1994 and first drafted at a 1995 workshop in Dublin, Ohio, Dublin Core provides a simple set of terms for describing digital resources. Standards like Darwin Core build on this foundation and continue to use Dublin Core terms, such as language to describe the language of a record. Because Dublin Core offers a small, universal set of metadata elements (like title, creator, subject, date, identifier) that work across all kinds of digital resources, it has been widely adopted in libraries (to describe books and digital texts), museums (for artifacts and specimens), and across the internet (in data repositories, archives, and search engines).
Its success lies not only in its simplicity but also in its purpose: Dublin Core was designed to make digital resources more discoverable, shareable, and usable. By distilling resource description down to its most essential elements, it ensures consistency while remaining both human- and machine-readable. This balance is what has allowed Dublin Core to become a well-established, widely trusted standard.
Why should you care? As a biologist, you may encounter Dublin Core directly when working with Darwin Core, or indirectly every time you publish a dataset, deposit a record in a repository, or use a data portal. Even if you don’t see it, Dublin Core is almost always at work behind the scenes, enabling data integration, discovery, and reuse. For ESIP and other communities, awareness of Dublin Core matters because it underpins how biodiversity and other scientific data are shared and connected globally.
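To make the element set concrete, the sketch below describes a dataset with a few of the 15 Dublin Core elements and serializes the record as XML. The element names and the namespace URI come from the Dublin Core Metadata Element Set. The record values and the `metadata` wrapper element are invented for illustration, and real repositories may wrap the elements differently.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_xml(record: dict[str, str]) -> str:
    """Serialize a flat dict of Dublin Core elements to an XML string."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")

    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record; every value here is made up for the example.
example_record = {
    "title": "Seagrass shoot density survey",
    "creator": "Example Marine Lab",
    "date": "2021-06-15",
    "language": "en",
    "identifier": "example-dataset-001",
}

print(dublin_core_xml(example_record))
```

Because every element is optional and repeatable in Dublin Core, a record can be as short as the one above. The check against `DC_ELEMENTS` is only a guard for this sketch.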
Top Resources
5.1.3 DataCite
What Is It?
DataCite is an international, not-for-profit organization that aims to improve data citation through web-enabled standards that connect research products and citations. Its metadata schema defines the properties used to describe and cite those products.
Why?
Helps mint persistent identifiers, such as digital object identifiers (DOI), for research products, which enables data archiving and long-term preservation
Helps connect the research product to researchers through other persistent identifiers, such as ORCIDs for researchers or ROR for organizations
Promotes long-term preservation, accessibility, reuse, and attribution of research products with citable contributions to a scholarly record
Connects users and publishing machinery
Top Resources
REST API, which enables retrieval, creation, and update of a DOI metadata record.
Additional documentation on DataCite
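As a rough illustration of retrieving a DOI metadata record through the REST API, the sketch below builds the request URL and, when network access is available, reads the record's attributes. The DOI shown is a placeholder rather than a registered identifier, and the function names are made up for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

DATACITE_API = "https://api.datacite.org"


def doi_metadata_url(doi: str) -> str:
    """Build the REST API URL for one DOI's metadata record."""
    # Keep the "/" between DOI prefix and suffix; percent-encode unsafe characters.
    return f"{DATACITE_API}/dois/{quote(doi, safe='/')}"


def fetch_doi_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Retrieve the metadata attributes for a DOI (requires network access)."""
    request = Request(
        doi_metadata_url(doi),
        headers={"Accept": "application/vnd.api+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # The API responds in JSON:API style; the metadata sits under data.attributes.
    return payload["data"]["attributes"]


# "10.1234/example-doi" is a placeholder, not a registered DOI.
print(doi_metadata_url("10.1234/example-doi"))
```

Retrieval needs no account. Creating or updating a record requires repository credentials, which this sketch does not cover.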
5.1.4 Schema.org
What Is It?
Schema.org provides documentation on a set of extensible schemas, which are schemas whose components users can combine to create other schemas. This enables users to embed structured data on their web pages to help search engines understand the information presented and provide richer search results. Marking up website content with metadata about itself, using schema.org vocabulary and a format such as JSON-LD, makes it easier not only to search websites or data records but also to understand the relationships between them.
Why?
- Make your research more easily and prominently discoverable through major search engines.
Top Resources
You can add schema.org markup to your webpages or records using various online tools, including Google’s Structured Data Markup Helper, or by directly adding code to your webpages.
Documentation on schema.org
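For readers who add markup directly, here is a minimal sketch of what schema.org JSON-LD for a dataset might look like, generated with Python. The `Dataset` type and the properties used are part of the schema.org vocabulary. The dataset details, URL, and function names are hypothetical.

```python
import json


def dataset_jsonld(name: str, description: str, url: str,
                   keywords: list[str], creator_name: str) -> dict:
    """Describe a dataset with the schema.org Dataset type."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "creator": {"@type": "Organization", "name": creator_name},
    }


def as_script_tag(markup: dict) -> str:
    """Wrap JSON-LD in the script element that goes in a page's HTML."""
    # Escape "</" so text inside the JSON cannot close the script element early.
    body = json.dumps(markup, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical dataset; the values and URL are placeholders.
markup = dataset_jsonld(
    name="Seagrass shoot density survey",
    description="Shoot counts per quadrat from an example monitoring program.",
    url="https://example.org/datasets/seagrass-density",
    keywords=["seagrass", "biodiversity", "monitoring"],
    creator_name="Example Marine Lab",
)
print(as_script_tag(markup))
```

The printed `script` element can be pasted into the landing page for the dataset, where search engine crawlers can read it.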
5.2 Web Services
Web services run much of our digital world today. You probably use them through your phone every day, without noticing a thing. You can think of a web service as a waiter at a restaurant. You (the user) order food (a request), the waiter (the web service) takes your order to the kitchen (the server or application), and then brings you back your food (the response). This allows different parts of a computer system or different systems altogether to interact without needing to know how each other works internally. When web services are fully utilized, the result is impressive high-speed analysis, like the analytics shared during football (all types 🙂) games, the Olympics, and other sporting events.
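The waiter analogy can be sketched in a few lines of Python. This toy example starts a throwaway web service on the local machine, sends it one request, and returns the response. The menu, the `/order/...` path, and all names are invented for illustration; real services are far more elaborate, but the request and response pattern is the same.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

MENU = {"soup": "tomato soup", "salad": "garden salad"}  # the "kitchen"


class WaiterHandler(BaseHTTPRequestHandler):
    """The "waiter": turns a request path like /order/soup into a JSON response."""

    def do_GET(self):
        dish = self.path.rsplit("/", 1)[-1]
        found = dish in MENU
        body = json.dumps({"dish": MENU.get(dish), "available": found}).encode("utf-8")
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def place_order(dish: str) -> tuple[int, dict]:
    """Start a local server, send one request, return (status code, parsed body)."""
    server = HTTPServer(("127.0.0.1", 0), WaiterHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        connection.request("GET", f"/order/{dish}")  # the order (request)
        response = connection.getresponse()          # the food (response)
        result = (response.status, json.loads(response.read()))
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(place_order("soup"))
```

The caller only needs to know the address and the shape of the request. How the "kitchen" stores its menu stays hidden, which is the point of the analogy.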
5.2.1 ERDDAP™ Web Service
What Is It?
ERDDAP™ (pronounced ur-dap) is a data server designed to make it easy to access and work with scientific datasets, especially oceanographic and atmospheric data. It provides a simple, consistent way to download, integrate, analyze, and visualize data from many sources. To help compare across datasets, ERDDAP™ standardizes the space/time axes, so users can set constraints without worrying about the original file format. Users can request just a subset of a dataset and then download it in their preferred format—such as CSV, JSON, NetCDF, and more—making ERDDAP™ a flexible tool for customizing data to fit their needs.
Why?
ERDDAP™ is free, open source, and used globally
All information, data, and figures made available via ERDDAP™ are also available via an API, making data programmatically accessible.
ERDDAP™ has a RESTful web service which is designed to be easy for computer programs and scripts to use or interact with.
Used for oceanographic and atmospheric datasets, but also works great for biological and biodiversity-relevant observations
Good for both gridded and tabular data. See the tabledap documentation for tabular datasets and the griddap documentation for gridded datasets.
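To show what programmatic access to a tabular dataset can look like, the sketch below assembles an ERDDAP tabledap request URL: the dataset ID, the desired file type, a comma-separated list of variables, and one constraint per `&`. The server address, dataset ID, and variable names are hypothetical, so substitute values from the ERDDAP server you actually use.

```python
from urllib.parse import quote


def tabledap_url(server: str, dataset_id: str, variables: list[str],
                 constraints: list[str], file_type: str = "csv") -> str:
    """Build a tabledap URL requesting a subset of a tabular dataset."""
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as ">" and ":" but keep "=" readable.
        query += "&" + quote(constraint, safe="=")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


# Hypothetical server, dataset ID, and variable names.
url = tabledap_url(
    server="https://example.org/erddap",
    dataset_id="exampleDataset",
    variables=["time", "latitude", "longitude", "sea_water_temperature"],
    constraints=["time>=2020-01-01T00:00:00Z", "latitude<=45"],
)
print(url)
```

Changing `file_type` to `json` or `nc` requests the same subset in a different format, and the resulting URL can be opened in a browser or passed to a tool such as `pandas.read_csv`.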
Top Resources
CoastWatch Training and specifically ERDDAP basics
Data providers can set up their own ERDDAP server to serve up their data.
Additional overall documentation on ERDDAP can be found here.
5.2.2 Thematic Real-time Environmental Distributed Data Services (THREDDS)
What Is It?
THREDDS is a data server, like ERDDAP. THREDDS is closely tied to OPeNDAP, using it as one of its core protocols for serving scientific data, though it also supports others like WMS, WCS, and NetCDF Subset Service. ERDDAP, on the other hand, is more flexible: it can expose data through OPeNDAP but doesn’t depend on it, instead offering multiple access formats such as CSV, JSON, and custom REST-like APIs.
Why?
- Biologists usually bump into THREDDS when they need to download large environmental datasets, such as climate model outputs, ocean circulation fields, or satellite products, because many data centers use THREDDS catalogs to serve those files. If a biologist wants to go a step further and overlay sea surface temperature maps directly in QGIS to compare with marine protected areas, THREDDS is especially useful since it can provide WMS map services that plug right into GIS tools. By contrast, they often turn to ERDDAP when they want to slice, query, or subset those same datasets for just their study region or species, since ERDDAP makes that kind of filtering and export to CSV or JSON much easier.
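As one hedged illustration of downloading part of a large file from a THREDDS server, the sketch below builds a NetCDF Subset Service request for a single variable within a bounding box and time window. The server address, dataset path, and variable name are hypothetical, and the `/ncss/grid/` path prefix is an assumption that can differ between server versions, so check the access links in the catalog you are using.

```python
from urllib.parse import urlencode


def ncss_subset_url(server: str, dataset_path: str, variable: str,
                    bbox: tuple[float, float, float, float],
                    time_start: str, time_end: str,
                    accept: str = "netcdf") -> str:
    """Build a NetCDF Subset Service URL for a spatial and temporal subset."""
    west, south, east, north = bbox
    params = {
        "var": variable,
        "north": north,
        "south": south,
        "east": east,
        "west": west,
        "time_start": time_start,
        "time_end": time_end,
        "accept": accept,
    }
    # Assumed path layout; some servers expose NCSS under a different prefix.
    return f"{server.rstrip('/')}/ncss/grid/{dataset_path}?{urlencode(params)}"


# Hypothetical server, dataset path, and variable name.
print(ncss_subset_url(
    server="https://example.org/thredds",
    dataset_path="ocean/sst.nc",
    variable="sst",
    bbox=(-130.0, 20.0, -110.0, 45.0),
    time_start="2020-01-01T00:00:00Z",
    time_end="2020-01-31T00:00:00Z",
))
```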
Top Resources
- A comparison of ERDDAP and THREDDS
5.2.3 Web Map Service
What Is It?
A Web Map Service (WMS) is a way to retrieve georegistered map images over the internet to display in applications and web pages. The WMS specifications were developed by the Open Geospatial Consortium (OGC) to enable interoperability and use in web browsers, open-source GIS software (e.g., QGIS), and proprietary GIS software (e.g., Esri).
Why?
- WMS allows you to view and use maps from different sources that host the maps and data used to create them without needing to download them.
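Behind the scenes, a GIS application asks a WMS server for a map image with a GetMap request. The sketch below builds such a request for WMS version 1.3.0. The parameter names come from the OGC specification, while the endpoint and layer name are hypothetical. It uses the `CRS:84` coordinate reference system so that the bounding box is given in longitude, latitude order.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint: str, layer: str,
                   bbox: tuple[float, float, float, float],
                   width: int = 800, height: int = 400,
                   image_format: str = "image/png") -> str:
    """Build a WMS 1.3.0 GetMap URL for one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat) in CRS:84.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty string requests the layer's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    # Leave ",", ":" and "/" unescaped so the URL stays easy to read.
    return f"{endpoint}?{urlencode(params, safe=',:/')}"


# Hypothetical endpoint and layer name.
print(wms_getmap_url(
    endpoint="https://example.org/wms",
    layer="sea_surface_temperature",
    bbox=(-130.0, 20.0, -60.0, 55.0),
))
```

The response to such a request is an image rather than the underlying data, which is why WMS layers can be viewed without downloading the source files.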
Top Resources
OGC WMS Standard: Official specification and protocol details
QGIS WMS Tutorial: Hands-on lesson loading WMS layers
GeoServer WMS Guide: Practical WMS setup and usage
Penn State GEOG 585: Course module on web mapping systems
GISGeography WMS Intro: Clear overview of concepts and uses