5 Make Your Data Internet Ready

Web services and standards are useful to understand if you collect or manage biological data. Platforms such as NOAA’s NCEI, NASA, OBIS, and GBIF use standard web services to serve data. Web-friendly data standards make information visible in a predictable way, which facilitates the transfer and handling of data via web services and promotes online sharing and programmatic discovery, access, and processing of data across platforms and disciplines.

5.1 Web-enabled standards

Web standards are the formal, non-proprietary standards and other technical specifications that define and describe different aspects of the World Wide Web. Web standards are created by standards bodies, which are institutions that invite groups of people to come together and agree on how the technologies should work in the best way to fulfill specific use cases. Web standards are key to global data discovery.

5.1.1 W3C standards

What Is It?

The World Wide Web Consortium (W3C) develops the W3C standards, which serve as building blocks for the internet browsers, web pages, blogs, search engines, and other software that power our experience on the web. Although HTML is its cornerstone, W3C publishes a range of technical reports that help move the web forward, covering technologies such as CSS, SVG, WOFF, WebRTC, XML, and a growing variety of APIs.

Why?

Developers can create interactive experiences available on any device.
Awareness of these standards helps you make your data more FAIR (findable, accessible, interoperable, and reusable).

Top Resources

W3Schools offers a variety of tutorials for free

5.1.2 Dublin Core Standard

What Is It?

Dublin Core is a metadata standard of 15 ‘core’ terms originally developed for archives and libraries to describe physical or digital resources and details about their collection. Darwin Core is an extension of Dublin Core for biodiversity information.
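As a rough illustration of what a Dublin Core description can look like, the sketch below writes a few of the 15 core terms as XML elements in the Dublin Core elements namespace. The wrapping `metadata` element, the function name, and all field values are assumptions made for this example, not part of the standard or of any particular repository's format.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 core terms.
DC_TERMS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields):
    """Return an XML string with one dc:* element per (term, value) pair.

    The root element name "metadata" is an arbitrary choice for this sketch.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for term, value in fields.items():
        if term not in DC_TERMS:
            raise ValueError(f"not a Dublin Core element: {term}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{term}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
record = dublin_core_record({
    "title": "Example reef fish survey",
    "creator": "Example Lab",
    "date": "2024-01-01",
    "type": "Dataset",
})
print(record)
```

Because every element uses a well-known term name, software that has never seen this dataset before can still tell which value is the title and which is the creator.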

Why?

Crowd source: Why is it beneficial to know about Dublin Core for internet ready data?

Top Resources

5.1.3 DataCite

What Is It?

DataCite is an international, not-for-profit organization that aims to improve data citation through web-enabled standards that connect products and citations. Its DataCite metadata schema defines the properties used to describe and cite a research product.

Why?

Helps mint persistent identifiers, such as digital object identifiers (DOI), for research products, which enables data archiving and long-term preservation
Helps connect the research product to researchers through other persistent identifiers, such as ORCIDs for researchers or ROR for organizations
Promotes long-term preservation, accessibility, reuse, and attribution of research products with citable contributions to a scholarly record
Connects users and publishing machinery

Top Resources

REST API, which enables retrieval, creation, and update of a DOI metadata record.
Additional documentation on DataCite
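To give a sense of how the REST API can be used for retrieval, here is a minimal sketch that builds the address of a DOI record and, when called, reads the record's attributes from the JSON response. It assumes the public endpoint at `https://api.datacite.org` and its `/dois/{doi}` route; the DOI `10.1234/example` is a made-up placeholder, and the function names are hypothetical. Creating or updating records requires credentials and is not shown.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base address of the public DataCite REST API.
API_ROOT = "https://api.datacite.org"


def doi_url(doi):
    """Build the URL of a single DOI record (the slash in the DOI is kept)."""
    return f"{API_ROOT}/dois/{quote(doi, safe='/')}"


def fetch_doi_metadata(doi):
    """Fetch a DOI record and return its metadata attributes as a dict.

    Needs network access. The response is expected to be a JSON document
    with the metadata under data -> attributes.
    """
    with urlopen(doi_url(doi), timeout=30) as response:
        return json.load(response)["data"]["attributes"]


# "10.1234/example" is a placeholder, not a real DOI.
example_url = doi_url("10.1234/example")
print(example_url)

# With a real DOI and a network connection, usage might look like:
#   attributes = fetch_doi_metadata("<a real DOI>")
#   print(attributes["titles"])
```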

5.1.4 Schema.org

What Is It?

Schema.org provides documentation on a set of extensible schemas, which are schemas whose components users can combine to create other schemas. This enables users to embed structured data on their web pages to help search engines understand the information presented and provide richer search results. Marking up website content with metadata about itself, using the schema.org vocabulary in a format such as JSON-LD, makes it easier not only to search websites or data records but also to understand the relationships between them.
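As a minimal sketch of such markup, the example below describes a dataset with the schema.org `Dataset` type and wraps the resulting JSON-LD in the `script` tag that is typically placed in a web page. The function names, the selection of properties, and every value (title, URL, keywords) are assumptions chosen for illustration.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Return a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }


def as_script_tag(doc):
    """Wrap a JSON-LD dictionary in the script tag used to embed it in HTML."""
    body = json.dumps(doc, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical dataset details, for illustration only.
doc = dataset_jsonld(
    name="Example reef fish survey",
    description="Counts of reef fish observed along fixed transects.",
    url="https://example.org/datasets/reef-fish",
    keywords=["biodiversity", "reef fish", "survey"],
)
tag = as_script_tag(doc)
print(tag)
```

A search engine that reads the page can then pick out the dataset's name, description, and keywords without having to interpret the visible text.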

Why?

Make your research more easily and prominently discoverable through major search engines.

Top Resources

You can add schema.org markup to your webpages or records using various online tools, including Google’s Structured Data Markup Helper, or by directly adding code to your webpages.
Documentation on schema.org

5.2 Web Services

Web services run much of our digital world today. You probably use them through your phone every day, without noticing a thing. You can think of a web service as a waiter at a restaurant. You (the user) order food (a request), the waiter (the web service) takes your order to the kitchen (the server or application), and then brings you back your food (the response). This allows different parts of a computer system or different systems altogether to interact without needing to know how each other works internally. When web services are fully utilized, they enable impressive high-speed analysis, like the analytics shared during football (all types 🙂) games, the Olympics, and other sporting events.

5.2.1 ERDDAP™ Web Service

What Is It?

ERDDAP™ (pronounced ur-dap) is a data server that offers users a simple and consistent way to download, integrate, analyze, visualize, and map multiple scientific datasets from different sources and scientific communities – typically oceanographic and atmospheric data.

To facilitate comparisons of data from different datasets, requests and results in ERDDAP™ use standardized space/time axes, which makes it easier for users to specify data constraints in requests without having to worry about the data format. ERDDAP™ allows users to request a subset of a dataset, and can convert the subset to a desired file format, such as .csv, .json, or .nc, for download.
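A subset request of this kind is expressed entirely in the URL. The sketch below assembles a request for a tabular dataset: the file extension selects the output format, a comma-separated list names the variables, and each constraint is appended after an `&`. The server address `https://example.org/erddap`, the dataset ID `exampleDataset`, and the function name are placeholders assumed for this example; substitute values from a real ERDDAP™ server.

```python
from urllib.parse import quote


def tabledap_url(server, dataset_id, variables, constraints, file_type="csv"):
    """Build a tabular-data request URL for an ERDDAP server.

    server      -- base address ending in /erddap (placeholder in this sketch)
    variables   -- column names to return
    constraints -- strings such as "time>=2020-01-01T00:00:00Z"
    file_type   -- output format extension, e.g. "csv", "json", or "nc"
    """
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as ">" and "<"; keep "=" readable.
        query += "&" + quote(constraint, safe="=")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


# Hypothetical server and dataset, for illustration only.
url = tabledap_url(
    "https://example.org/erddap",
    "exampleDataset",
    ["time", "latitude", "longitude"],
    ["time>=2020-01-01T00:00:00Z", "latitude<=45"],
)
print(url)
```

Changing `file_type` to `"json"` or `"nc"` asks for the same subset in a different format, which is what lets scripts in different languages share a single request pattern.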

Why?

ERDDAP™ is free, open source, and used globally
All information, data, and figures made available via ERDDAP™ are also available via an API, making data programmatically accessible.
ERDDAP™ has a RESTful web service which is designed to be easy for computer programs and scripts to use or interact with.
Used for oceanographic and atmospheric datasets, but also works great for biological and biodiversity-relevant observations
Good for both gridded and tabular data - See table dataset API docs here, and for gridded datasets here.

Top Resources

CoastWatch Training and specifically ERDDAP basics
Awesome ERDDAP
Overview: Distributed Model Data Access
Data providers can set up their own ERDDAP server to serve up their data.
Additional overall documentation on ERDDAP can be found here.

5.2.2 Thematic Real-time Environmental Distributed Data Services (THREDDS)

What Is It?

The THREDDS server, which was developed prior to ERDDAP, has features and interfaces that make it easier to explore and use data.

Why?

Crowd source: What is beneficial about THREDDS for internet ready data?

Top Resources

A comparison of ERDDAP and THREDDS

5.2.3 Web Map Service

What Is It?

A Web Map Service (WMS) is a way to retrieve georegistered map images over the internet to display in applications and web pages. The WMS specifications were developed by the Open Geospatial Consortium (OGC) to enable interoperability and use in web browsers, open-source GIS software (e.g., QGIS), and proprietary GIS software (e.g., Esri ArcGIS).
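To make the idea concrete, the sketch below builds a WMS 1.3.0 `GetMap` request, the operation that returns a map image for a given layer, bounding box, and image size. The endpoint `https://example.org/wms`, the layer name, and the function name are assumptions for illustration. The example uses the `CRS:84` coordinate reference system so that the bounding box is given in longitude/latitude order.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width=800, height=400,
                   image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in CRS:84.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",          # empty value requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
map_url = wms_getmap_url(
    "https://example.org/wms",
    "sea_surface_temperature",
    (-130, 20, -110, 40),
)
print(map_url)
```

Opening such a URL in a browser, or loading it in GIS software, returns an image rather than the underlying data, which is why no download of the source dataset is needed.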

Why?

WMS lets you view and use maps served by different sources, which host both the maps and the data used to create them, without needing to download them.

Top Resources

Crowd source: What is beneficial about Web Map Services for internet ready data?