Resources for Cloud Data Optimization

Data Formats

Cross-Format Evaluations

Task 51 - Cloud-Optimized Format Study: A NASA cross-format evaluation of the performance characteristics of several formats in cloud environments.

HDF5

Highly Scalable Data Service (HSDS): A REST-based web service for HDF5 data stores, built by The HDF Group, with several optimizations for data access in network environments.

H5Coro: The Cloud-Optimized Read-Only Library

Storage Guidance

Performance Guidelines for Amazon S3: Amazon's recommendations for optimizing performance on S3.

Benchmarking

Pangeo benchmarking: Benchmarking and scaling studies of the Pangeo platform.

Chunking

Making earth science data more accessible: experience with chunking and compression: A presentation by XX of Unidata that explains what chunking is, why it matters for big data access, and how to choose the right chunk shape.
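
To make the chunk-shape trade-off concrete, the sketch below counts how many chunks a read would touch under different chunk shapes. It is only an illustration and is not taken from the presentation: the array dimensions, the three candidate chunk shapes, and the helper name chunks_touched are all hypothetical, and the chunk count is used as a rough proxy for the number of storage requests.

```python
def chunks_touched(start, stop, chunk_shape):
    """Count the chunks intersected by the half-open hyperslab [start, stop).

    start, stop, and chunk_shape are tuples with one entry per axis.
    """
    count = 1
    for lo, hi, size in zip(start, stop, chunk_shape):
        first = lo // size          # index of the first chunk touched on this axis
        last = (hi - 1) // size     # index of the last chunk touched on this axis
        count *= last - first + 1
    return count


# Hypothetical daily global grid: (time, lat, lon)
shape = (365, 720, 1440)

# Hypothetical candidate chunk shapes
candidates = {
    "one map per chunk": (1, 720, 1440),
    "full time series per chunk": (365, 10, 10),
    "balanced": (73, 72, 144),
}

# Two common access patterns
time_series = ((0, 100, 200), (365, 101, 201))  # every time step at one grid cell
single_map = ((0, 0, 0), (1, 720, 1440))        # one time step, whole grid

results = {}
for name, chunk_shape in candidates.items():
    results[name] = (
        chunks_touched(*time_series, chunk_shape),
        chunks_touched(*single_map, chunk_shape),
    )
    print(f"{name}: time series -> {results[name][0]} chunks, "
          f"map -> {results[name][1]} chunks")
```

Under these assumed shapes, a chunking that is ideal for one access pattern is the worst case for the other, while the balanced shape keeps both reads to a moderate number of chunks.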

Categorical Data Standards

Geospatial data communities develop data standards to facilitate the adoption and sharing of data and code across users and platforms. Cloud data providers may see greater use of their data if it adheres to these standards. Below are some examples of data standards, each scoped to a specific category of geospatial data.

Standard for the Exchange of Earthquake Data (SEED)

Committee on Earth Observing Satellites (CEOS) Analysis Ready Data (ARD) for Land (CARD4L)