Please contribute to this site if you have input. We welcome pull requests.

Additionally, the community has noted the following specific needs for input or experimentation:

  • Identifying commonalities across communities, organizations, and source data formats

  • Performance analysis across a variety of data organizations, analysis workloads, and data structure types

  • Chunking and compression options in the context of scalable data access to model output

  • Data on how optimization decisions vary between different access clients, such as remote users, Dask clusters, and Spark clusters

  • Provenance – how to maintain a record of changes as the data gets reformatted and repacked

    • Maintain a source link back to the origin

    • When data is broken into granules, how do you identify the single dataset they originate from?

    • Micro-changes to data – do these qualify as a completely new version?

    • Provenance chaining as data is subset, combined, recombined, and reformatted while passing from hand to hand
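
As a starting point for discussion of the provenance questions above, here is a minimal sketch of one possible way to chain provenance records. It is not an established standard or a community decision. The names `ProvenanceRecord`, `derive`, and `trace_to_origin`, the field layout, and the example URL are all hypothetical and chosen only for illustration. Each record keeps a source link back to the origin, a content hash of the data produced by that step, and the identifier of its parent record.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """One step in a hypothetical provenance chain (illustrative only)."""

    source_url: str           # link back to the original dataset
    operation: str            # e.g. "origin", "subset", "rechunk", "reformat"
    content_hash: str         # SHA-256 of the bytes produced by this step
    parent_id: Optional[str]  # record_id of the previous step; None at the origin

    @property
    def record_id(self) -> str:
        # Hash the record's own fields so the identifier changes whenever
        # the data, the operation, or the parent changes.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def derive(
    parent: Optional[ProvenanceRecord],
    source_url: str,
    operation: str,
    data: bytes,
) -> ProvenanceRecord:
    """Create a record for `data`, linked to `parent` (None for the origin)."""
    return ProvenanceRecord(
        source_url=source_url,
        operation=operation,
        content_hash=hashlib.sha256(data).hexdigest(),
        parent_id=parent.record_id if parent else None,
    )


def trace_to_origin(
    record: ProvenanceRecord, index: Dict[str, ProvenanceRecord]
) -> List[ProvenanceRecord]:
    """Walk parent links from `record` back to the origin record."""
    chain = [record]
    while chain[-1].parent_id is not None:
        chain.append(index[chain[-1].parent_id])
    return chain


# Hypothetical example: an origin file is subset into a granule, then rechunked.
origin = derive(None, "https://example.org/model-output.nc", "origin", b"raw model output")
granule = derive(origin, origin.source_url, "subset", b"raw model")
repacked = derive(granule, origin.source_url, "rechunk", b"raw model")

index = {r.record_id: r for r in (origin, granule, repacked)}
chain = trace_to_origin(repacked, index)
operations = [r.operation for r in chain]
```

In this sketch, any granule can be traced back to a single origin by following `parent_id` links, and a micro-change to the data shows up as a different `content_hash`. Whether such a change should count as a completely new version remains a policy question that the sketch does not answer.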