This document captures Use Cases and Requirements (UCR) for the storage, access, governance, long term preservation and availability of Semantic Web resources via the ESIP Community Ontology Repository (COR). Artifacts to be stored within COR include ontologies, vocabularies, linked data resources, etc. which are of value to the Earth Science community. This document underpins the collaborative work of the Semantic Technologies Committee (STC) as operated by ESIP.

This UCR document represents a collaborative effort by ESIP semantic technologists (and fellow stakeholders) to understand what current and future COR expectations and requirements actually are. The STC aims to engage and collaborate with external parties, on behalf of ESIP, with regard to semantic technologies. Because of these ongoing collaborations, this document will be revised and updated as necessary. Its state should be considered dynamic.

Introduction

The mission of the STC is described in our Vision of the Semantic Technologies Committee. We have produced several use cases that aim to represent a voice for semantic technologists within the Earth Science community. From these use cases, a number of requirements for further work are derived. In this document, use cases, requirements and their relationships are described. Requirements and use cases are also related to the deliverables of the STC.

The requirements described in this document serve a dual purpose: (a) they have been used to evaluate software candidates, e.g. COR, and (b) they provide a roadmap for development planning.

Deliverables

The deliverables of STC are tangible outcomes aligned with the 2015-2020 Strategic Plan Goals. For convenience those deliverables are replicated in this chapter. The charter remains the authoritative source of the definition of deliverables.

Development of COR Use Cases and Requirements

A document setting out the range of community problems and issues that the STC is trying to solve (this document).

Development of a Community Governance Model for the Semantic Web for Earth and Environmental Terminology (SWEET) Vocabulary

The STC will work with NASA JPL to ensure the transition of the SWEET Ontology from NASA JPL to ESIP, where the STC will take community ownership of the resource. Within the scope of this deliverable the STC will address issues surrounding:

  1. hosting SWEET in the ESIP COR,
  2. source code management including the definition of a community development and contribution process,
  3. selecting a suitable open source licensing for SWEET,
  4. establishing a public community forum where stakeholders and interested individuals/groups can follow development,
  5. community development including building the community for the long term sustainability of SWEET as the primary ontological resource within the earth and planetary sciences, and
  6. defining and documenting a release management procedure ensuring that new contributions and developments are made available through formal public open source releases.

Methodology

In order to find out the requirements for the deliverables of the STC, use cases were collected. A use case is a story that describes challenges with respect to semantic data for existing or envisaged information systems. It does not need to adhere to a particular standardised format. Use cases are primarily used as a source of requirements.

The STC has derived requirements from the collected use cases. A requirement is something that needs to be achieved by one or more deliverables and is phrased as a specification of functionality. Requirements can lead to one or more tests that can prove whether the requirement is met.

Use Cases

Use cases that describe current problems or future opportunities for the use of Semantic Web resources within the ESIP community have been gathered by the STC. They were mainly contributed by members of STC, but there were also contributions from other interested parties. In this chapter these use cases are listed and identified. Each use case is related to one or more STC deliverables and to one or more requirements for future deliverables.

Use of Semantics within Search Engines

Lewis John McGibbney (NASA JPL) and Beth Huffer (Lingua Logica)

The ability to use defined terminology derived from a domain vocabulary has the potential to improve certain types of information retrieval tasks. As an example, one can imagine a user engaging in a typical search scenario where a query is entered into a search engine interface and a ranked list of results is returned for the query. Domain semantics, through the use of terms and vocabulary, can be used to augment or refine the user’s query with the aim of retrieving more relevant content.

User profile: A software developer engaged in the development of search tools for the Earth science community. Assume the developer is not familiar with semantics or ontologies.

Scenario: In order to improve the relevancy of search results, the developer of the ACME Earth Science Search Service develops a capability whereby ACME finds standard terms for the search term entered by the ACME user and uses those standard terms to augment the user’s own search term.

Workflow:

  1. A user enters a search term into the ACME search service.
  2. ACME calls the ESIP Ontology portal and finds standard terms that have the same meaning as the term entered by the user.
  3. The ESIP Ontology Portal sends back one or more terms (from one or more ontologies?) matching the ACME user’s input.

Requirements implied by this use case:

  1. The ontology portal has an API via which the ACME system can submit the term to be matched.
  2. The ontology portal can semantically match terms received as input to terms in the ontologies stored there.
  3. The ontology portal can return a set of matching terms to the requesting application.
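
The query augmentation step in the scenario above can be sketched as follows. This is a minimal illustration that assumes the matching terms have already been obtained from the portal. The function name `expand_query` and the sample match table are hypothetical and are not part of any COR API.

```python
from typing import Dict, List


def expand_query(user_term: str, matches: Dict[str, List[str]]) -> str:
    """Build an OR query from the user's term plus any standard terms.

    `matches` maps a lower-cased user term to the standard terms that a
    portal lookup returned for it (assumed to be fetched elsewhere).
    """
    original = user_term.strip()
    terms = [original]
    seen = {original.lower()}
    for standard in matches.get(original.lower(), []):
        if standard.lower() not in seen:  # avoid repeating the same term
            seen.add(standard.lower())
            terms.append(standard)
    return " OR ".join(f'"{t}"' for t in terms)


# Made-up sample data standing in for a portal response.
sample_matches = {"rain": ["rainfall", "rain"]}
print(expand_query("Rain", sample_matches))
```

A term with no matches is passed through unchanged, so the search service still behaves sensibly when the portal returns nothing.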

Browsing a Portal for a Relevant Ontology

Line Pouchard (Purdue University), Beth Huffer (Lingua Logica), and Michael Huhns (University of South Carolina)

User profile: Dr. Jane Anderson is a researcher investigating marine ecosystems. She is gathering data on properties of sea water. She maintains a personal database on her laptop computer in which she records the values for salinity and parts-per-million of manganese.

Scenario: Although the data Dr. Anderson is collecting on sea water is initially being recorded in a private database, she hopes later to publish her data, relate it to data collected by other researchers, and publish her results. In order to ensure that her own data can be discovered and that it will be semantically interoperable with that of other researchers, she would like to use standard terms for data elements and their attributes. She browses the ontology portal to find a standard vocabulary for sea water properties.

Workflow:

  1. Dr. Anderson links to the ontology portal home page.
  2. She enters “sea water” into the search dialogue.
  3. The ontology portal returns a set of terms that match “sea water”, with links to the ontologies/vocabularies in which they are found.
  4. Dr. Anderson selects one of the ontologies.
  5. The ontology portal displays information about the term as it is recorded in the selected ontology and displays related terms.
  6. Dr. Anderson then continues searching within the selected ontology, or opens a different linked ontology, for additional terms that are appropriate for her database.

Requirements implied by this use case:

  1. The ontology portal provides the capability of searching across all of the ontologies it stores.
  2. There is a user interface and/or API that accepts a search term as input and returns appropriate results.
  3. There are links among related concepts within an ontology.

Matching Concepts among Ontologies

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

Each concept in an ontology should be mapped to concepts it matches in other ontologies. Exact matches based on string matching of concept names should be provided automatically by the portal. The portal should also support matches entered manually.
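
Automatic exact matching combined with manually entered matches could look roughly like the sketch below. It assumes each ontology is available as a mapping from concept IRI to concept name. The normalisation rule (case-insensitive, underscores treated as spaces) is an assumption about what "exact" should mean, and all function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Mapping = Tuple[str, str]  # (IRI in ontology A, IRI in ontology B)


def normalise(name: str) -> str:
    """Lower-case, treat underscores as spaces, collapse whitespace."""
    return " ".join(name.lower().replace("_", " ").split())


def exact_matches(a: Dict[str, str], b: Dict[str, str]) -> Set[Mapping]:
    """Pair concepts whose normalised names are identical strings."""
    index: Dict[str, List[str]] = {}
    for iri, name in b.items():
        index.setdefault(normalise(name), []).append(iri)
    return {
        (iri_a, iri_b)
        for iri_a, name in a.items()
        for iri_b in index.get(normalise(name), [])
    }


def all_mappings(automatic: Set[Mapping], manual: Set[Mapping]) -> Set[Mapping]:
    """Manually entered matches are kept alongside the automatic ones."""
    return automatic | manual
```

Indexing one ontology by normalised name keeps the comparison roughly linear in the number of concepts, which matters once many ontologies are matched against each other.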

Annotating Text

Line Pouchard (Purdue University), Beth Huffer (Lingua Logica), and Michael Huhns (University of South Carolina)

User Profile: Roger Brown is a scientist at a prominent university.

Scenario: He recently completed a study on the relationship between x, y and z. The paper he wrote reporting on the results of his study has been accepted for publication in an online journal. The journal requires authors to provide annotations for technical terms found in the document, so that readers can easily access the definitions of such terms. The annotations are especially important because many of the terms used in Dr. Brown’s paper have specialized meanings that are peculiar to his area of research and could easily be misinterpreted by researchers in other disciplines or areas of interest. Annotations are also valuable aids for students.

Workflow:

  1. Dr. Brown accesses the ontology portal’s text annotation tool.
  2. The annotation tool prompts him to upload a text document or enter text directly.
  3. Dr. Brown uploads his document.
  4. The annotation tool prompts Dr. Brown to either select one or more particular ontologies to work from, or select all ontologies.
  5. Dr. Brown selects some ontologies (or selects all ontologies).
  6. The text annotator identifies terms in the uploaded document that match concepts in the selected ontologies.
  7. The text annotator returns a list of concepts from the selected ontologies and indicates the term(s) in the text that the ontology concepts matched, along with information about the ontology in which the concept is found.
  8. Dr. Brown reviews the concept-term matches suggested by the annotation tool and, for each term matched, he indicates whether or not he wants to annotate it with the suggested ontology concept.
  9. The annotation tool inserts hyperlinks to the selected ontology concepts into the text.

Requirements implied by this use case:

  1. The ontology portal includes an annotation tool.
  2. The annotation tool has a UI and/or API that enables users to access the annotation tool.
  3. The annotation tool is able to accept text as input either by uploading a document or by entering text directly.
  4. The annotation tool is able to identify terms in the text that match ontology concepts.
  5. The annotation tool is able to display the extracted terms along with the concepts/ontologies to which they could be mapped.
  6. The annotation tool is able to accept input from users accepting or rejecting suggested matches.
  7. The annotation tool is able to mark up a text document with appropriate hyperlinks.
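
The matching and markup steps of the annotation workflow can be illustrated with the following sketch. It assumes the selected ontologies have been flattened into a mapping from concept label to concept IRI and that plain case-insensitive label matching is sufficient. The names `find_matches` and `annotate` are hypothetical, and the sketch does not HTML-escape the text.

```python
import re
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    start: int
    end: int
    text: str
    iri: str


def find_matches(text: str, concepts: Dict[str, str]) -> List[Match]:
    """Find label occurrences; longer labels win over labels nested in them."""
    found: List[Match] = []
    taken = [False] * len(text)
    for label in sorted(concepts, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            if not any(taken[m.start():m.end()]):
                found.append(Match(m.start(), m.end(), m.group(0), concepts[label]))
                for i in range(m.start(), m.end()):
                    taken[i] = True
    return sorted(found)


def annotate(text: str, accepted: List[Match]) -> str:
    """Insert hyperlinks for the matches the author accepted."""
    out = text
    # Work right to left so earlier offsets stay valid after each insertion.
    for m in sorted(accepted, reverse=True):
        out = out[:m.start] + f'<a href="{m.iri}">{m.text}</a>' + out[m.end:]
    return out
```

Keeping the review step between `find_matches` and `annotate` mirrors the workflow: the author passes only the accepted matches to the markup function.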

Subsetting Ontologies into Projects

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

If (someday) there are large numbers of ontologies in the portal, the portal should support a means to identify subsets of ontologies that can be searched and viewed separately.

User Access

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

A portal should provide both a GUI and a SPARQL endpoint for accessing its functionality and its stored ontologies and concepts.
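
As an illustration of programmatic access through a SPARQL endpoint, the sketch below builds a label search query and the corresponding GET request URL, using the `query` parameter defined by the SPARQL 1.1 Protocol. The endpoint URL is a placeholder rather than the actual COR address, and the use of rdfs:label for concept names is an assumption about how stored ontologies are annotated. No request is sent.

```python
from urllib.parse import urlencode

ENDPOINT = "https://example.org/sparql"  # placeholder, not the real endpoint


def label_search_query(term: str, limit: int = 10) -> str:
    """Return a SPARQL query for concepts whose label contains `term`."""
    # Escape backslashes and quotes so the term is a valid string literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request against the endpoint."""
    return endpoint + "?" + urlencode({"query": query})


print(request_url(ENDPOINT, label_search_query("sea water", limit=5)))
```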

Editing, Extending and Releasing New Versions of an Existing Ontology

Ruth Duerr (Ronin Institute), Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

User Profile: Andrea Carter is an information systems engineer at the Roadrunner Science Technology Corp. Becky Stein is a data scientist whose background is in cryospheric science. Both are well-versed in RDF/OWL, set theory, and first-order logic.

Scenario: Once an earth scientist has located an ontology in a portal that matches the scientist’s interest, the scientist should be able to add new domain concepts to the ontology and modify existing concepts for improvement or correction. The changed ontology should be stored as a new version and should not simply replace the original version.

In order to act as a working testbed for an ontology, the ontology repository must include the concept of released versions of ontologies and working versions, where the advertised and stable URLs point to the latest release, not the latest working copy.

Specifically related to this Scenario, Ms. Carter has been working with Dr. Stein and other Earth science subject matter experts to develop an ontology in RDF for the Cryosphere. She has recently received approval to publicize the ontology and would like to put it in the ESIP repository, in order to make it available for the broad community of ESIP members. However, Ms. Carter is aware that, just like any code, the ontology is likely to undergo changes as functional and/or technical requirements change, and domain knowledge increases. She and Dr. Stein expect to make periodic changes to the ontology, and hope to encourage other subject matter experts, data scientists, and semantic technology developers to contribute to the ontology. Accordingly, some contributors to the ontology may not be well-versed in RDF/OWL and will want to edit it via a user-friendly interface. Moreover, because changes to the ontology have the potential to cause problems for applications that are using it, it will be necessary to ensure that updates to the ontology are managed under a version control system.

Workflow:

  1. Ms. Carter logs into the ontology portal.
  2. She selects the “upload a new ontology” option.
  3. She uploads one or more files in one of several RDF encoding formats (e.g., Turtle, N-Triples) which comprise an ontology of the Cryosphere.
  4. The portal logs the date and time that the ontology was uploaded, and its state.
  5. After uploading, Ms. Carter views the ontology in a browser that allows her to see the class structure, view properties of classes, view any instances of classes, and view properties and their properties.
  6. After verifying that the ontology has been properly uploaded, Ms. Carter “publishes” the ontology, thereby making it available to anyone with access to the portal. She exits the portal.
  7. The following week, Dr. Stein logs in to the ontology portal and retrieves the ontology of the Cryosphere uploaded by Andrea Carter.
  8. The portal displays the ontology, giving Dr. Stein the opportunity to browse or edit it.
  9. Dr. Stein selects the edit ontology option.
  10. Dr. Stein makes various changes to the ontology and saves them.
  11. The ontology portal records the changes, logs the date and time of the changes and the author of the changes.
  12. The portal prompts Dr. Stein to either publish a new version of the ontology or save it as a work in progress.
  13. Dr. Stein
    • Saves it as work in progress, or
    • Publishes it as a new version that can be accessed by anyone.
  14. The ontology portal
    • Prompts Dr. Stein to log out, or
    • Prompts Dr. Stein to indicate whether the previous version of the ontology should remain publicly available.
  15. Dr. Stein
    • Logs out, or
    • Indicates the previous version should remain publicly available, or
    • Indicates the previous version should no longer be publicly available.

Requirements implied by this use case:

  1. There is a user authentication system.
  2. A UI and/or API that enables users to upload ontology files (in a variety of formats?)
  3. A UI that allows users to view an existing ontology.
  4. A UI that allows users to edit an existing ontology.
  5. A version control system.
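
The distinction between working copies and released versions can be modelled as in the sketch below. It is one possible data model under the assumption that versions are numbered sequentially and that the stable URL resolves to the most recent public release. The class and method names are illustrative and do not describe how COR stores versions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Version:
    number: int
    content: str
    author: str
    saved_at: datetime  # the portal logs when each change was saved
    released: bool = False
    public: bool = False


@dataclass
class OntologyRecord:
    versions: List[Version] = field(default_factory=list)

    def save(self, content: str, author: str) -> Version:
        """Store changes as a new working version; nothing is overwritten."""
        v = Version(len(self.versions) + 1, content, author,
                    datetime.now(timezone.utc))
        self.versions.append(v)
        return v

    def release(self, version: Version, keep_previous_public: bool = True) -> None:
        """Publish a version, optionally withdrawing earlier releases."""
        if not keep_previous_public:
            for v in self.versions:
                v.public = False
        version.released = True
        version.public = True

    def stable(self) -> Optional[Version]:
        """What the stable URL serves: the latest public release."""
        released = [v for v in self.versions if v.released and v.public]
        return released[-1] if released else None
```

Because `save` only appends, a work in progress never changes what `stable` returns until it is explicitly released.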

Upload Large Resources

Blake Regalia (NASA JPL)

Upon loading a large dataset (~320K triples), the user immediately noticed errors in the IRI prefixes in the dataset and regenerated and uploaded a few revisions in quick succession. These large datasets strained system resources: COR (or, more specifically, the JVM running in one of the containers) ran out of memory.
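
One way a contributor might catch IRI prefix errors before uploading, and so avoid repeated large uploads, is a streaming check such as the sketch below. It assumes the dataset is serialised as N-Triples (one triple per line) and reads it line by line, so memory use does not grow with file size. The function name and the namespace heuristic (text up to the last '#' or '/') are assumptions for illustration, not a COR feature.

```python
import re
from collections import Counter
from typing import Iterable

# An N-Triples subject IRI is the first <...> token on the line.
SUBJECT_IRI = re.compile(r"^\s*<([^>]*)>")


def subject_namespaces(lines: Iterable[str]) -> Counter:
    """Count subject IRIs by namespace without loading the whole file."""
    counts: Counter = Counter()
    for line in lines:
        m = SUBJECT_IRI.match(line)
        if not m:
            continue  # comment, blank line, or blank-node subject
        iri = m.group(1)
        cut = max(iri.rfind("#"), iri.rfind("/"))
        counts[iri[:cut + 1]] += 1
    return counts
```

Passing an open file object, for example `subject_namespaces(open(path, encoding="utf-8"))`, streams the file. An unexpected namespace with a small count is a likely sign of a mistyped prefix.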

Updating Ontology Metadata

Ruth Duerr (Ronin Institute)

After it was pointed out to me that I had a typo in the dc:title of the Academic Disciplines Ontology I tried to fix that issue by using the COR Edit new version -> Edit metadata facility. However, none of my metadata showed up, so could not be edited.

In particular there was no dc:title field in the list of available fields.

Attempts to update it using the omv:name field on the form failed to update the dc:title field which remained unchanged.

I think that if there is a term in the ontology that is a well-known term (like dc:title), that's what COR should display in its corresponding field. Why make people enter data twice? This also has the advantage of forcing agreement on standard ways of annotating ontologies! It forces an implicit "same as" on terms between two sets of annotations! Badly needed!!!
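
The behaviour requested here could be sketched as follows. The sketch assumes an equivalence table between well-known annotation properties and the fields on the metadata form. Only the dc:title and omv:name pairing comes from this use case; the function names and the table structure are hypothetical and do not reflect how COR is implemented.

```python
from typing import Dict

# Well-known property -> form field treated as equivalent (assumed table).
EQUIVALENTS: Dict[str, str] = {"dc:title": "omv:name"}


def prefill_form(annotations: Dict[str, str],
                 equivalents: Dict[str, str]) -> Dict[str, str]:
    """Show existing well-known annotations in their corresponding fields."""
    return {
        form_field: annotations[prop]
        for prop, form_field in equivalents.items()
        if prop in annotations
    }


def apply_edit(annotations: Dict[str, str], form_field: str, value: str,
               equivalents: Dict[str, str]) -> Dict[str, str]:
    """Write an edited field back to every equivalent property."""
    updated = dict(annotations)  # leave the caller's metadata untouched
    updated[form_field] = value
    for prop, mapped_field in equivalents.items():
        if mapped_field == form_field and prop in updated:
            updated[prop] = value
    return updated
```

With such a table, editing the omv:name field would also correct an existing dc:title, so the title is entered once and the two annotations cannot drift apart.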

Use of Ontology Information in Data Processing Workflows

Tristan Wellman (Science Analytics and Synthesis, U.S. Geological Survey)

A base ontology is created to describe term identifiers, labels, and definitions, which are used for processing data records through OBIS-USA and NOAA NCEI. ESIP COR provides a stable, publicly available endpoint used in the data processing workflow. As part of the workflow, basic ontology information and external supplementary information describing each variable (term) are infused as metadata into NetCDF data files. Real-time feedback could be useful to ensure variable information and ontology information continuously align. As terms are added or modified, ontology versioning is needed to support historical data products which reference this resource.

User Profile: A user or institution that expects to evolve ontology records in an automated workflow and requires reproducibility of the resulting data products that use ontology information.

Scenario: An institution in the Earth science community uses semantic vocabularies stored on public endpoints to describe scientific terms and variables in their data products. When these data products are created or revised, the ontologies should be updated in step. Versioning should be used to reproduce vocabulary information used in historical case studies.

Workflow:

  1. A code-driven analysis package is activated to process a collection of data files.
  2. A series of quality control and processing functions are conducted in the processing workflow.
  3. A processing function calls ESIP COR to match vocabulary terms defined within the cached ontology.
  4. Additional variable (term) information, such as variable type, units, and alias name, is retrieved to enhance default information.
  5. Where vocabulary terms are new or vocabulary information has been revised or enhanced, the ESIP COR instantiation is updated to include the latest publicly available scientific information, potentially in real-time.

Requirements implied by this use case:

  1. The ontology portal has automated versioning capabilities used to preserve ontology definitions in real time. Ontologies can be retrieved by version at user request.
  2. The ontology portal allows authenticated users to update, create, or delete ontologies using a simple API, perhaps generating a modified temporary ontology while preserving the original parent ontology until a review has been completed.
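
The alignment check between a workflow's variable information and the cached ontology can be illustrated with the sketch below. It assumes both sides are available as mappings from a term identifier to a small record of fields such as label and definition. The function name `diff_terms` and the record layout are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Term = Dict[str, str]  # e.g. {"label": ..., "definition": ...}


def diff_terms(cached: Dict[str, Term],
               local: Dict[str, Term]) -> Tuple[List[str], List[str]]:
    """Return (new, revised) term identifiers relative to the cached ontology."""
    new = sorted(t for t in local if t not in cached)
    revised = sorted(t for t in local if t in cached and local[t] != cached[t])
    return new, revised
```

A non-empty result would signal that the stored ontology needs an update, which under the requirements above should produce a new version rather than overwrite the one that historical data products reference.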

Requirements

This chapter lists the requirements for the deliverables of the STC, in alphabetical order.

In some requirements the expression 'recommended way' is used. This means that a single best way of doing something is sought. It does not say anything about the form this recommended way should have, or who should make the recommendation. A recommended way could be a formal or community recommendation or standard from an authoritative body like ESIP, OGC or W3C, but it could just as well be a more informal specification, as long as it is arguably the best way of doing something.

COR shall provide a user authentication system

There is a user authentication system.

COR shall provide an ontology upload mechanism

A graphical user interface and/or API that enables users to upload ontology files (in a variety of formats?)

COR shall enable viewing of existing resources

A graphical user interface or REST API that allows users to view an existing ontology.

COR shall enable editing of existing resources

A graphical user interface or REST API that allows users to edit an existing ontology.

COR shall provide a version control management capability

A version control system.

API documentation shall be provided alongside COR

It is absolutely essential that developer-level API documentation is readily available alongside a SRI such that developers can easily develop client applications around the portal.

COR shall facilitate an upload mechanism for large resources

COR shall provide a capability to upload large multi-GB resources.

Requirements by deliverable

For convenience, this chapter lists requirements grouped by STC deliverable.

Acknowledgements

The editors are grateful for all contributions made to this document, in particular to the contributors of the use cases and to all the members of the STC who helped with deriving and formulating requirements.