This document describes community Use Cases and Requirements (UCR) for the storage, access, governance, long-term preservation and availability of Semantic Web resources via a Semantic Repository Implementation (SRI) potentially hosted by ESIP. Artifacts to be stored within the SRI include ontologies, vocabularies, linked data resources and similar artifacts that are of value to the Earth Science community. This document underpins the collaborative work of the Semantic Technologies Committee (STC) as operated by the ESIP Federation.

This UCR document represents the first unified, collaborative effort by ESIP semantic technologists (and fellow stakeholders) to establish what current and future SRI expectations and requirements actually are. That said, best efforts have been made to recognize previous similar efforts on this front. The STC aims to engage and collaborate with external parties, on behalf of ESIP, with regard to semantic technologies. Because of these ongoing collaborations, this document will be revised and updated as necessary, and its state should be considered dynamic.

Introduction

The mission of the STC, as described in our Vision of the Semantic Technologies Committee, is to .... In particular:

  1. promote research and development of semantic technologies in support of Earth science data discovery, dissemination, and analysis;
  2. collaborate with ESIP members, working groups, clusters, and standing committees on semantic related topics;
  3. foster sharing and reuse of ontologies and controlled vocabularies;
  4. provide a collaborative environment for the development of ontologies and ontology-based standards;
  5. maintain a long-term agenda and roadmap for evaluation of semantic technologies within ESIP; and
  6. collaborate with external agencies, on behalf of ESIP, with regard to semantic technologies.

This document describes the first steps towards these goals, focused specifically on the SRI. Members of the working group and other stakeholders have contributed a number of use cases that aim to represent the voice of semantic technologists within the Earth Science community. From these use cases, a number of requirements for further work are derived. This document describes the use cases, the requirements and the relationships between them. Requirements and use cases are also related to the deliverables of the working group.

The requirements described in this document will be the basis for the evaluation of a suitable SRI for use by the ESIP Semantic Technologies community and hopefully further afield.

Deliverables

The deliverables of STC are tangible outcomes aligned with the 2015-2020 Strategic Plan Goals. For convenience those deliverables are replicated in this chapter. The charter remains the authoritative source of the definition of deliverables.

Development of Semantic Repository Implementation Use Cases and Requirements

A document setting out the range of community problems and issues that the STC is trying to solve (this document).

Evaluation of Semantic Repository Implementation Platform(s)

The STC will undertake testing and evaluation to support ESIP's evaluation of semantic repository solutions, in particular providing insight into:

  1. what current and future ESIP Semantic Technologies Committee (STC) community requirements (CR) are,
  2. a comparative evaluation of different technology stacks/approaches, with the aim of assessing how well each solution satisfies the Use Cases and Requirements defined within this document, and
  3. a business cost justification model that could be used to justify funding semantic web technology stacks within ESIP.

All ESIP members (and anyone else with an interest in an SRI hosted at ESIP) are encouraged to contribute Use Cases and Requirements as described in this document.

Development of a Community Governance Model for the Semantic Web for Earth and Environmental Terminology (SWEET) Vocabulary

The STC will work within NASA JPL to ensure transition of the SWEET Ontology from NASA JPL to ESIP where the STC will take community ownership of the resource. Within the scope of this deliverable the STC will address issues surrounding:

  1. hosting SWEET in the ESIP SRI,
  2. source code management including the definition of a community development and contribution process,
  3. selecting a suitable open source license for SWEET,
  4. establishing a public community mailing list where stakeholders and interested individuals/groups can follow development,
  5. community development, including building the community for the long-term sustainability of SWEET as the primary ontological resource within the Earth and planetary sciences, and
  6. defining and documenting a release management procedure ensuring that new contributions and developments are made available through formal public open source releases.

Methodology

In order to find out the requirements for the deliverables of the Working Group, use cases were collected. For the purpose of the Working Group, a use case is a story that describes challenges with respect to spatial data on the Web for existing or envisaged information systems. It does not need to adhere to a particular standardised format. Use cases are primarily used as a source of requirements, but a use case could be revisited near the time the work of the Working Group reaches completion, to demonstrate that it is now possible to make the use case work.

The Working Group has derived requirements from the collected use cases. A requirement is something that needs to be achieved by one or more deliverables and is phrased as a specification of functionality. Requirements can lead to one or more tests that can prove whether the requirement is met.

Care was taken to derive only requirements that are considered to be in scope for the further work of the Working Group. The scope of the Working Group is determined by the charter. To help keep the requirements in scope, the following questions were applied:

  1. Is the requirement specifically about spatial data on the Web?
  2. Does the use case include data that is published, reused, and accessible via Web technologies?
  3. Does the use case have a description that can lead to a testable requirement?

Use Cases

Use cases that describe current problems or future opportunities for the use of Semantic Web resources within the ESIP community have been gathered by the STC. They were mainly contributed by members of STC, but there were also contributions from other interested parties. In this chapter these use cases are listed and identified. Each use case is related to one or more STC deliverables and to one or more requirements for future deliverables.

Use of Semantics within Search Engines

Lewis John McGibbney (NASA JPL) and Beth Huffer (Lingua Logica)

The ability to use defined terminology derived from domain vocabulary has the potential to improve certain types of information retrieval tasks. As an example, one can imagine a user engaging in a typical search scenario where a query is entered into a search engine interface and a ranked list of results is returned for the query. Domain semantics, through the use of terms and vocabulary, can be utilized to augment or refine the user's query with the aim of retrieving more relevant content for that query.

User profile: A software developer engaged in the development of search tools for the Earth science community. Assume the developer is not familiar with semantics or ontologies.

Scenario: In order to improve the relevancy of search results, the developer of the ACME Earth Science Search Service develops a capability whereby ACME finds standard terms for the search term entered by the ACME user and uses those standard terms to augment the user’s own search term.

Workflow:

  1. A user enters a search term into the ACME search service.
  2. ACME calls the ESIP Ontology Portal and finds standard terms that have the same meaning as the term entered by the user.
  3. The ESIP Ontology Portal sends back one or more terms (from one or more ontologies?) matching the ACME user’s input.

Requirements implied by this use case:

  1. The ontology portal has an API via which the ACME system can submit the term to be matched.
  2. The ontology portal can semantically match terms received as input to terms in the ontologies stored there.
  3. The ontology portal can return a set of matching terms to the requesting application.
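
As a rough illustration of the workflow above, the following sketch augments a user's search term with standard terms. It is not a portal API: the in-memory `INDEX`, its field names, and the functions `match_terms` and `expand_query` are all hypothetical, and the matching is plain case-insensitive label comparison rather than the semantic matching the requirement asks for.

```python
def match_terms(term, index):
    """Return index entries with a label equal to the term (case-insensitive)."""
    key = " ".join(term.lower().split())
    matches = []
    for concept in index:
        labels = [concept["preferred"]] + concept["alternatives"]
        if key in (label.lower() for label in labels):
            matches.append(concept)
    return matches


def expand_query(term, index):
    """Return the user's term followed by any matched labels, without duplicates."""
    expanded = [term]
    for concept in match_terms(term, index):
        for label in [concept["preferred"]] + concept["alternatives"]:
            if label.lower() not in (t.lower() for t in expanded):
                expanded.append(label)
    return expanded


# Hypothetical stand-in for content held by the portal.
INDEX = [
    {"ontology": "example-cryo", "preferred": "sea ice", "alternatives": ["marine ice"]},
    {"ontology": "example-ocean", "preferred": "salinity", "alternatives": ["salt content"]},
]

print(expand_query("Marine Ice", INDEX))
```

In a real deployment the ACME service would obtain the matches over the portal's API instead of from a local list; the expansion step would otherwise look much the same.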

Browsing a Portal for a Relevant Ontology

Line Pouchard (Purdue University), Beth Huffer (Lingua Logica), and Michael Huhns (University of South Carolina)

User profile: Dr. Jane Anderson is a researcher investigating marine ecosystems. She is gathering data on properties of sea water. She maintains a personal database on her laptop computer in which she records the values for salinity and parts-per-million of manganese.

Scenario: Although the data Dr. Anderson is collecting on sea water is initially being recorded in a private database, she hopes later to publish her data, relate it to data collected by other researchers, and publish her results. In order to ensure that her own data can be discovered and that it will be semantically interoperable with that of other researchers, she would like to use standard terms for data elements and their attributes. She browses the ontology portal to find a standard vocabulary for sea water properties.

Workflow:

  1. Dr. Anderson links to the ontology portal home page.
  2. She enters “sea water” into the search dialogue.
  3. The ontology portal returns a set of terms that match “sea water”, with links to the ontologies/vocabularies in which they are found.
  4. Dr. Anderson selects one of the ontologies.
  5. The ontology portal displays information about the term as it is recorded in the selected ontology and displays related terms.
  6. Dr. Anderson then continues searching within the selected ontology, or opens a different linked ontology, for additional terms that are appropriate for her database.

Requirements implied by this use case:

  1. The ontology portal provides the capability of searching across all of the ontologies it stores.
  2. There is a user interface and/or API that accepts a search term as input and returns appropriate results.
  3. There are links among related concepts within an ontology.
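
The requirements above can be pictured with a small sketch of a search that spans every stored ontology and then follows links to related terms. Everything here is assumed for illustration: the `ONTOLOGIES` structure, the ontology names, and the functions `search_all` and `related_terms` are hypothetical, and matching is a simple substring test.

```python
# Hypothetical portal content: ontology name -> term -> linked (related) terms.
ONTOLOGIES = {
    "example-ocean": {
        "sea water": {"related": ["salinity", "sea water temperature"]},
        "salinity": {"related": ["sea water"]},
    },
    "example-chem": {
        "sea water manganese concentration": {"related": []},
    },
}


def search_all(query, ontologies):
    """Return (ontology, term) pairs whose term contains the query text."""
    needle = query.lower().strip()
    hits = []
    for name in sorted(ontologies):
        for term in sorted(ontologies[name]):
            if needle in term.lower():
                hits.append((name, term))
    return hits


def related_terms(ontology, term, ontologies):
    """Return the terms linked to the given term within one ontology."""
    return list(ontologies[ontology][term]["related"])


for ontology, term in search_all("sea water", ONTOLOGIES):
    print(ontology, "|", term, "|", related_terms(ontology, term, ONTOLOGIES))
```

Each hit carries the name of the ontology it came from, which corresponds to the links to the source ontologies described in step 3 of the workflow.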

Matching Concepts among Ontologies

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

Each concept in an ontology should be mapped to concepts it matches in other ontologies. Exact matches based on string matching of concept names should be provided automatically by the portal. The portal should also support matches entered manually.
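
A minimal sketch of this behaviour, under stated assumptions, is given below. The ontology names, concept lists and function names are hypothetical, and treating names as equal after lower-casing and whitespace normalization is an assumption; a stricter portal might compare the strings byte for byte.

```python
from itertools import combinations


def normalize(name):
    """Lower-case a concept name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def automatic_matches(ontologies):
    """Pair concepts from different ontologies whose normalized names are equal."""
    matches = set()
    for name_a, name_b in combinations(sorted(ontologies), 2):
        by_name = {normalize(c): c for c in ontologies[name_b]}
        for concept in ontologies[name_a]:
            other = by_name.get(normalize(concept))
            if other is not None:
                matches.add(((name_a, concept), (name_b, other)))
    return matches


def all_matches(ontologies, manual_matches):
    """Combine automatically derived matches with manually entered ones."""
    return automatic_matches(ontologies) | set(manual_matches)


# Hypothetical content and one hypothetical curator-entered mapping.
ONTOLOGIES = {
    "example-cryo": ["Sea Ice", "Permafrost"],
    "example-land": ["sea ice", "Perennially Frozen Ground"],
}
MANUAL = [(("example-cryo", "Permafrost"), ("example-land", "Perennially Frozen Ground"))]

for match in sorted(all_matches(ONTOLOGIES, MANUAL)):
    print(match)
```

Keeping manual matches in a separate collection means they survive when the automatic matches are recomputed after an ontology changes.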

Annotating Text

Line Pouchard (Purdue University), Beth Huffer (Lingua Logica), and Michael Huhns (University of South Carolina)

User Profile: Dr. Roger Brown is a scientist at a prominent university.

Scenario: He recently completed a study on the relationship between x, y and z. The paper he wrote reporting on the results of his study has been accepted for publication in an online journal. The journal requires authors to provide annotations for technical terms found in the document, so that readers can easily access the definitions of such terms. The annotations are especially important because many of the terms used in Dr. Brown’s paper have specialized meanings that are peculiar to his area of research and could easily be misinterpreted by researchers in other disciplines or areas of interest. Annotations are also valuable aids for students.

Workflow:

  1. Dr. Brown accesses the ontology portal’s text annotation tool.
  2. The annotation tool prompts him to upload a text document or enter text directly.
  3. Dr. Brown uploads his document.
  4. The annotation tool prompts Dr. Brown to either select one or more particular ontologies to work from, or select all ontologies.
  5. Dr. Brown selects some ontologies (or selects all ontologies).
  6. The text annotator identifies terms in the uploaded document that match concepts in the selected ontologies.
  7. The text annotator returns a list of concepts from the selected ontologies and indicates the term(s) in the text that the ontology concepts matched, along with information about the ontology in which the concept is found.
  8. Dr. Brown reviews the concept-term matches suggested by the annotation tool and, for each term matched, he indicates whether or not he wants to annotate it with the suggested ontology concept.
  9. The annotation tool inserts hyperlinks to the selected ontology concepts into the text.

Requirements implied by this use case:

  1. The ontology portal includes an annotation tool.
  2. The annotation tool has a UI and/or API that enables users to access the annotation tool.
  3. The annotation tool is able to accept text as input either by uploading a document or by entering text directly.
  4. The annotation tool is able to identify terms in the text that match ontology concepts.
  5. The annotation tool is able to display the extracted terms along with the concepts/ontologies to which they could be mapped.
  6. The annotation tool is able to accept input from users accepting or rejecting suggested matches.
  7. The annotation tool is able to mark up a text document with appropriate hyperlinks.
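
The matching and mark-up steps of the annotation workflow could be sketched as follows. This is an illustration only: the `CONCEPTS` table, the example.org IRIs and both function names are hypothetical, matching is whole-word and case-insensitive, and the sketch assumes that accepted matches do not overlap.

```python
import re


def find_matches(text, concepts):
    """Return sorted (start, end, matched_text, iri) tuples for labels found in the text."""
    found = []
    for label, iri in concepts.items():
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), m.group(0), iri))
    return sorted(found)


def annotate(text, accepted):
    """Wrap each accepted match in an HTML link, right to left so offsets stay valid."""
    for start, end, matched, iri in sorted(accepted, reverse=True):
        text = text[:start] + f'<a href="{iri}">{matched}</a>' + text[end:]
    return text


# Hypothetical concept labels and IRIs.
CONCEPTS = {
    "sea ice": "https://example.org/onto#SeaIce",
    "albedo": "https://example.org/onto#Albedo",
}

TEXT = "Sea ice raises surface albedo."
suggestions = find_matches(TEXT, CONCEPTS)
# The author accepts only the first suggestion.
print(annotate(TEXT, suggestions[:1]))
```

Separating `find_matches` from `annotate` mirrors the review step in the workflow: the tool proposes matches, and only those the author accepts are written into the text.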

Subsetting Ontologies into Projects

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

If (someday) there are large numbers of ontologies in the portal, the portal should support a means to identify subsets of ontologies that can be searched and viewed separately.

User Access

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

A portal should provide both a GUI and a SPARQL endpoint for accessing its functionality and its stored ontologies and concepts.
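
To make the SPARQL side of this concrete, the sketch below builds a label search query and the corresponding GET request URL, following the SPARQL 1.1 Protocol convention of passing the query text in a `query` parameter. The endpoint address is a placeholder, the function names are hypothetical, and the assumption that concepts carry `rdfs:label` values may not hold for every stored ontology. No request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint address


def label_query(search_text, limit=10):
    """Build a SPARQL query for concepts whose rdfs:label contains the text."""
    escaped = search_text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = label_query("sea water", limit=5)
url = request_url(ENDPOINT, query)
print(query)
print(url)
# Decoding the URL gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == query)
```

A client would send this URL with an `Accept` header such as `application/sparql-results+json` to receive the bindings in JSON.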

Editing, Extending and Releasing New Versions of an Existing Ontology

Ruth Duerr (Ronin Institute), Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

User Profile: Andrea Carter is an information systems engineer at the Roadrunner Science Technology Corp. Becky Stein is a data scientist whose background is in cryospheric science. Both are well-versed in RDF/OWL, set theory, and first-order logic.

Scenario: Once an earth scientist has located an ontology in a portal that matches the scientist’s interest, the scientist should be able to add new domain concepts to the ontology and modify existing concepts for improvement or correction. The changed ontology should be stored as a new version and should not simply replace the original version.

In order to act as a working testbed for an ontology, the ontology repository must distinguish released versions of ontologies from working versions, with the advertised, stable URLs pointing to the latest release rather than the latest working copy.

Specifically related to this Scenario, Ms. Carter has been working with Dr. Stein and other Earth science subject matter experts to develop an ontology in RDF for the Cryosphere. She has recently received approval to publicize the ontology and would like to put it in the ESIP repository, in order to make it available for the broad community of ESIP members. However, Ms. Carter is aware that, just like any code, the ontology is likely to undergo changes as functional and/or technical requirements change, and domain knowledge increases. She and Dr. Stein expect to make periodic changes to the ontology, and hope to encourage other subject matter experts, data scientists, and semantic technology developers to contribute to the ontology. Accordingly, some contributors to the ontology may not be well-versed in RDF/OWL and will want to edit it via a user-friendly interface. Moreover, because changes to the ontology have the potential to cause problems for applications that are using it, it will be necessary to ensure that updates to the ontology are managed under a version control system.

Workflow:

  1. Ms. Carter logs into the ontology portal.
  2. She selects the “upload a new ontology” option.
  3. She uploads one or more files in one of several RDF serialization formats (e.g., Turtle, N-Triples) which comprise an ontology of the Cryosphere.
  4. The portal logs the date and time that the ontology was uploaded, and its state.
  5. After uploading, Ms. Carter views the ontology in a browser that allows her to see the class structure, view properties of classes, view any instances of classes, and view properties and their properties.
  6. After verifying that the ontology has been properly uploaded, Ms. Carter “publishes” the ontology, thereby making it available to anyone with access to the portal. She exits the portal.
  7. The following week, Dr. Stein logs in to the ontology portal and retrieves the ontology of the Cryosphere uploaded by Andrea Carter.
  8. The portal displays the ontology, giving Dr. Stein the opportunity to browse or edit it.
  9. Dr. Stein selects the edit ontology option.
  10. Dr. Stein makes various changes to the ontology and saves them.
  11. The ontology portal records the changes, logs the date and time of the changes and the author of the changes.
  12. The portal prompts Dr. Stein to either publish a new version of the ontology or save it as a work in progress.
  13. Dr. Stein
    • Saves it as work in progress, or
    • Publishes it as a new version that can be accessed by anyone.
  14. The ontology portal
    • Prompts Dr. Stein to log out, or
    • Prompts Dr. Stein to indicate whether the previous version of the ontology should remain publicly available.
  15. Dr. Stein
    • Logs out, or
    • Indicates the previous version should remain publicly available, or
    • Indicates the previous version should no longer be publicly available.

Requirements implied by this use case:

  1. There is a user authentication system.
  2. A UI and/or API that enables users to upload ontology files (in a variety of formats?)
  3. A UI that allows users to view an existing ontology.
  4. A UI that allows users to edit an existing ontology.
  5. A version control system.
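
The distinction between released and working versions in this use case can be sketched with a small in-memory model. The class names, fields and method names below are hypothetical and do not describe any particular repository product; the sketch only shows that every save creates a new version, that a stable URL would resolve to the latest public release, and that an earlier release can be withdrawn from public view.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    number: int
    content: str
    author: str
    saved_at: datetime
    public: bool = False  # True once published and not withdrawn


@dataclass
class OntologyRecord:
    name: str
    versions: list = field(default_factory=list)

    def save(self, content, author, publish=False, keep_previous_public=True):
        """Store a new version; earlier versions are never overwritten."""
        if publish and not keep_previous_public:
            for earlier in self.versions:
                earlier.public = False
        version = Version(len(self.versions) + 1, content, author,
                          datetime.now(timezone.utc), public=publish)
        self.versions.append(version)
        return version

    def latest_release(self):
        """Version a stable URL would resolve to: the newest public one, or None."""
        released = [v for v in self.versions if v.public]
        return released[-1] if released else None

    def working_copy(self):
        """Newest saved version, whether published or not."""
        return self.versions[-1] if self.versions else None


record = OntologyRecord("cryosphere")
record.save("initial triples", "carter", publish=True)
record.save("draft edits", "stein")  # saved as work in progress
print(record.latest_release().number, record.working_copy().number)
```

Recording the author and timestamp on each version corresponds to steps 4 and 11 of the workflow, where the portal logs who changed the ontology and when.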

Requirements

This chapter lists the requirements for the deliverables of the STC, in alphabetical order.

In some requirements the expression 'recommended way' is used. This means that a single best way of doing something is sought. It does not say anything about the form this recommended way should have, or who should make the recommendation. A recommended way could be a formal or community recommendation or standard from an authoritative body like ESIP, OGC or W3C, but it could just as well be a more informal specification, as long as it is arguably the best way of doing something.

A User Authentication System

There is a user authentication system.

Ontology Upload Mechanism

A UI and/or API that enables users to upload ontology files (in a variety of formats?)

UI for Viewing Existing Resources

A UI that allows users to view an existing ontology.

UI for Editing Existing Resources

A UI that allows users to edit an existing ontology.

Version Control

A version control system.

API Documentation

It is essential that developer-level API documentation be readily available alongside an SRI so that developers can easily build client applications around the portal.

Requirements by deliverable

For convenience, this chapter lists requirements grouped by STC deliverable.

Acknowledgements

The editors are grateful for all contributions made to this document, in particular to the contributors of the use cases and to all the members of the STC who helped with deriving and formulating requirements.

Appendix A: History of changes

Changes between the First Public Working Draft and Second Public Working Draft

A summary of the main changes between the First Public Working Draft (published 2015-07-23) and the Second Public Working Draft (this version) of this document:

Changes between the Second Public Working Draft and Third Public Working Draft

A summary of the main changes between the Second Public Working Draft (published 2015-12-17) and the Third Public Working Draft (this version) of this document: