Semantic Technologies Group Semantic Repository Implementation (SRI) Use Cases & Requirements (UCR)

This document captures Use Cases and Requirements (UCR) for the storage, access, governance, long-term preservation, and availability of Semantic Web resources via the ESIP Community Ontology Repository (COR). Artifacts to be stored within COR include ontologies, vocabularies, linked data resources, and other resources that are of value to the Earth science community. This document underpins the collaborative work of the Semantic Technologies Committee (STC) as operated by ESIP.

Use Cases

Use cases that describe current problems or future opportunities for the use of Semantic Web resources within the ESIP community have been gathered by the STC. They were mainly contributed by members of STC, but there were also contributions from other interested parties. In this chapter these use cases are listed and identified. Each use case is related to one or more STC deliverables and to one or more requirements for future deliverables.

Use of Semantics within Search Engines

Lewis John McGibbney (NASA JPL) and Beth Huffer (Lingua Logica)

The ability to use defined terminology derived from domain vocabulary has the potential to improve certain types of information retrieval tasks. As an example, one can imagine a user engaging in a typical search scenario where a query is entered into a search engine interface and a ranked list of results is returned for the query. Domain semantics, through the use of terms and vocabulary, can be used to augment or refine the user’s query with the aim of retrieving more relevant content for that query.

User profile: A software developer engaged in the development of search tools for the Earth science community. Assume the developer is not familiar with semantics or ontologies.

Scenario: In order to improve the relevancy of search results, the developer of the ACME Earth Science Search Service develops a capability whereby ACME finds standard terms for the search term entered by the ACME user and uses those standard terms to augment the user’s own search term.

Workflow:

A user enters a search term into the ACME search service.
ACME calls the ESIP Ontology Portal and finds standard terms that have the same meaning as the term entered by the user.
The ESIP Ontology Portal sends back one or more terms (from one or more ontologies?) matching the ACME user’s input.

Requirements implied by this use case:

The ontology portal has an API via which the ACME system can submit the term to be matched.
The ontology portal can semantically match terms received as input to terms in the ontologies stored there.
The ontology portal can return a set of matching terms to the requesting application.
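
The workflow and requirements above can be sketched as a small query-expansion routine. This is only an illustration: `TermIndex` is a hypothetical in-memory stand-in for the portal's matching API (which this document does not specify), and matching by case-insensitive label equality is a simplifying assumption in place of true semantic matching.

```python
from dataclasses import dataclass, field


@dataclass
class TermIndex:
    """Hypothetical in-memory stand-in for the portal's term-matching service."""

    # Maps a normalized label to the standard terms it corresponds to.
    labels: dict[str, set[str]] = field(default_factory=dict)

    def add(self, standard_term: str, *alt_labels: str) -> None:
        """Register a standard term together with its alternative labels."""
        for label in (standard_term, *alt_labels):
            self.labels.setdefault(label.strip().lower(), set()).add(standard_term)

    def match(self, user_term: str) -> list[str]:
        """Return the standard terms whose labels equal the user's term."""
        return sorted(self.labels.get(user_term.strip().lower(), set()))


def expand_query(user_term: str, index: TermIndex) -> str:
    """Augment the user's term with matching standard terms, OR-joined."""
    terms = [user_term]
    for term in index.match(user_term):
        if term.lower() != user_term.strip().lower():
            terms.append(term)
    return " OR ".join(f'"{t}"' for t in terms)


index = TermIndex()
index.add("sea surface temperature", "SST", "ocean surface temperature")
print(expand_query("SST", index))  # "SST" OR "sea surface temperature"
```

A term with no match is passed through unchanged, so the search service degrades to its ordinary behavior when the portal returns nothing.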

Browsing a Portal for a Relevant Ontology

Line Pouchard (Purdue University), Beth Huffer (Lingua Logica), and Michael Huhns (University of South Carolina)

User profile: Dr. Jane Anderson is a researcher investigating marine ecosystems. She is gathering data on properties of sea water. She maintains a personal database on her laptop computer in which she records the values for salinity and parts-per-million of manganese.

Scenario: Although the data Dr. Anderson is collecting on sea water is initially being recorded in a private database, she hopes later to publish her data, relate it to data collected by other researchers, and publish her results. In order to ensure that her own data can be discovered and that it will be semantically interoperable with that of other researchers, she would like to use standard terms for data elements and their attributes. She browses the ontology portal to find a standard vocabulary for sea water properties.

Workflow:

Dr. Anderson links to the ontology portal home page.
She enters “sea water” into the search dialogue.
The ontology portal returns a set of terms that match “sea water”, with links to the ontologies/vocabularies in which they are found.
Dr. Anderson selects one of the ontologies.
The ontology portal displays information about the term as it is recorded in the selected ontology and displays related terms.
Dr. Anderson then continues searching within the selected ontology, or opens a different linked ontology, for additional terms that are appropriate for her database.

Requirements implied by this use case:

The ontology portal provides the capability of searching across all of the ontologies it stores.
There is a user interface and/or API that accepts a search term as input and returns appropriate results.
There are links among related concepts within an ontology.

Matching Concepts among Ontologies

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

Each concept in an ontology should be mapped to concepts it matches in other ontologies. Exact matches based on string matching of concept names should be provided automatically by the portal. The portal should also support matches entered manually.
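
A minimal sketch of how automatic exact matching and manual mappings might be combined is shown below. The function names and the sample ontologies are hypothetical, and the light normalization (case, underscores, hyphens) is an assumption about what "exact" could mean in practice.

```python
from collections import defaultdict
from itertools import combinations


def normalize(name: str) -> str:
    """Lower-case the name and treat underscores/hyphens as spaces."""
    return " ".join(name.replace("_", " ").replace("-", " ").lower().split())


def exact_matches(ontologies: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Pair concepts from different ontologies whose normalized names are equal.

    `ontologies` maps an ontology id to its concept names; each result is a
    pair of "ontology:concept" strings.
    """
    by_name = defaultdict(list)
    for onto_id, concepts in ontologies.items():
        for concept in concepts:
            by_name[normalize(concept)].append((onto_id, concept))

    pairs = []
    for entries in by_name.values():
        for (o1, c1), (o2, c2) in combinations(entries, 2):
            if o1 != o2:  # only map across ontologies, not within one
                pairs.append((f"{o1}:{c1}", f"{o2}:{c2}"))
    return pairs


def merge_mappings(automatic, manual):
    """Combine automatic and manually entered mappings without duplicates."""
    return sorted(set(automatic) | set(manual))


sample = {
    "ontoA": ["SeaWater", "Salinity"],
    "ontoB": ["salinity", "Sea_Water"],
}
automatic = exact_matches(sample)
manual = [("ontoA:SeaWater", "ontoB:Sea_Water")]
print(merge_mappings(automatic, manual))
```

In this sample, string matching finds the `Salinity`/`salinity` pair but misses `SeaWater`/`Sea_Water`, which is the kind of gap the manually entered matches are meant to cover.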

Annotating Text

Line Pouchard (Purdue University), Beth Huffer (Lingua Logica), and Michael Huhns (University of South Carolina)

User Profile: Roger Brown is a scientist at a prominent university.

Scenario: He recently completed a study on the relationship between x, y and z. The paper he wrote reporting on the results of his study has been accepted for publication in an online journal. The journal requires authors to provide annotations for technical terms found in the document, so that readers can easily access the definitions of such terms. The annotations are especially important because many of the terms used in Dr. Brown’s paper have specialized meanings that are peculiar to his area of research and could easily be misinterpreted by researchers in other disciplines or areas of interest. Annotations are also valuable aids for students.

Workflow:

Dr. Brown accesses the ontology portal’s text annotation tool.
The annotation tool prompts him to upload a text document or enter text directly.
Dr. Brown uploads his document.
The annotation tool prompts Dr. Brown to either select one or more particular ontologies to work from, or select all ontologies.
Dr. Brown selects some ontologies (or selects all ontologies).
The text annotator identifies terms in the uploaded document that match concepts in the selected ontologies.
The text annotator returns a list of concepts from the selected ontologies and indicates the term(s) in the text that the ontology concepts matched, along with information about the ontology in which the concept is found.
Dr. Brown reviews the concept-term matches suggested by the annotation tool and, for each term matched, he indicates whether or not he wants to annotate it with the suggested ontology concept.
The annotation tool inserts hyperlinks to the selected ontology concepts into the text.

Requirements implied by this use case:

The ontology portal includes an annotation tool.
The annotation tool has a UI and/or API that enables users to access the annotation tool.
The annotation tool is able to accept text as input either by uploading a document or by entering text directly.
The annotation tool is able to identify terms in the text that match ontology concepts.
The annotation tool is able to display the extracted terms along with the concepts/ontologies to which they could be mapped.
The annotation tool is able to accept input from users accepting or rejecting suggested matches.
The annotation tool is able to mark up a text document with appropriate hyperlinks.
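
The annotation workflow above (suggest matches, let the user accept or reject them, then insert hyperlinks) could look roughly like the following sketch. The `Concept` type, the example IRIs, and the whole-word, case-insensitive matching rule are assumptions for illustration, not a description of any existing COR tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    label: str     # human-readable label to look for in the text
    iri: str       # where the hyperlink should point
    ontology: str  # ontology in which the concept is found


def find_matches(text: str, concepts: list[Concept]) -> list[tuple[str, Concept]]:
    """Return (term as it appears in the text, concept) suggestions."""
    suggestions = []
    for concept in concepts:
        pattern = re.compile(rf"\b{re.escape(concept.label)}\b", re.IGNORECASE)
        found = pattern.search(text)
        if found:
            suggestions.append((found.group(0), concept))
    return suggestions


def annotate(text: str, accepted: list[Concept]) -> str:
    """Wrap each occurrence of an accepted concept label in an HTML link."""
    if not accepted:
        return text
    by_label = {c.label.lower(): c for c in accepted}
    # Longest labels first so that "sea ice" wins over a shorter label "ice".
    ordered = sorted(by_label, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(label) for label in ordered) + r")\b",
        re.IGNORECASE,
    )

    def link(match: re.Match) -> str:
        concept = by_label[match.group(0).lower()]
        return f'<a href="{concept.iri}">{match.group(0)}</a>'

    # A single pass avoids re-matching text inside links already inserted.
    return pattern.sub(link, text)


concepts = [
    Concept("albedo", "https://example.org/cryo#Albedo", "cryo"),
    Concept("sea ice", "https://example.org/cryo#SeaIce", "cryo"),
    Concept("permafrost", "https://example.org/cryo#Permafrost", "cryo"),
]
text = "Albedo drops when sea ice melts."
suggested = find_matches(text, concepts)
# The user reviews the suggestions and accepts only "sea ice".
accepted = [c for _, c in suggested if c.label == "sea ice"]
print(annotate(text, accepted))
```

Separating `find_matches` from `annotate` mirrors the review step in the workflow: nothing is marked up until the author has accepted a suggestion.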

Subsetting Ontologies into Projects

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

If (someday) there are large numbers of ontologies in the portal, the portal should support a means to identify subsets of ontologies that can be searched and viewed separately.

User Access

Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

A portal should provide both a GUI and a SPARQL endpoint for accessing its functionality and its stored ontologies and concepts.
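
As an illustration of programmatic access, the sketch below builds (but does not send) a SPARQL 1.1 Protocol GET request that searches concept labels. The endpoint URL is a placeholder, and the use of `rdfs:label` is an assumption about how stored ontologies label their concepts.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the portal's actual SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"


def label_search_query(text: str, limit: int = 20) -> str:
    """Build a SPARQL query for concepts whose label contains `text`."""
    # Escape backslashes and quotes so the text is a valid SPARQL string literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?concept ?label WHERE {{
  ?concept rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))
}}
LIMIT {int(limit)}
""".strip()


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Prepare a GET request carrying the query in the `query` parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(label_search_query("sea water"))
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute it against a live endpoint; that step is left out here because no real endpoint is assumed.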

Editing, Extending and Releasing New Versions of an Existing Ontology

Ruth Duerr (Ronin Institute), Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)

User Profile: Andrea Carter is an information systems engineer at the Roadrunner Science Technology Corp. Becky Stein is a data scientist whose background is in cryospheric science. Both are well-versed in RDF/OWL, set theory, and first-order logic.

Scenario: Once an Earth scientist has located an ontology in a portal that matches the scientist’s interest, the scientist should be able to add new domain concepts to the ontology and modify existing concepts for improvement or correction. The changed ontology should be stored as a new version and should not simply replace the original version.

In order to act as a working testbed for an ontology, the ontology repository must include the concept of released versions and working versions of ontologies, where the advertised and stable URLs point to the latest release, not the latest working copy.

Specifically related to this Scenario, Ms. Carter has been working with Dr. Stein and other Earth science subject matter experts to develop an ontology in RDF for the Cryosphere. She has recently received approval to publicize the ontology and would like to put it in the ESIP repository, in order to make it available for the broad community of ESIP members. However, Ms. Carter is aware that, just like any code, the ontology is likely to undergo changes as functional and/or technical requirements change, and domain knowledge increases. She and Dr. Stein expect to make periodic changes to the ontology, and hope to encourage other subject matter experts, data scientists, and semantic technology developers to contribute to the ontology. Accordingly, some contributors to the ontology may not be well-versed in RDF/OWL and will want to edit it via a user-friendly interface. Moreover, because changes to the ontology have the potential to cause problems for applications that are using it, it will be necessary to ensure that updates to the ontology are managed under a version control system.

Workflow:

Ms. Carter logs into the ontology portal.
She selects the “upload a new ontology” option.
She uploads one or more files in one of several RDF serialization formats (e.g., Turtle, N-Triples) which comprise an ontology of the Cryosphere.
The portal logs the date and time that the ontology was uploaded, and its state.
After uploading, Ms. Carter views the ontology in a browser that allows her to see the class structure, view properties of classes, view any instances of classes, and view properties and their properties.
After verifying that the ontology has been properly uploaded, Ms. Carter “publishes” the ontology, thereby making it available to anyone with access to the portal. She exits the portal.
The following week, Dr. Stein logs in to the ontology portal and retrieves the ontology of the Cryosphere uploaded by Andrea Carter.
The portal displays the ontology, giving Dr. Stein the opportunity to browse or edit it.
Dr. Stein selects the edit ontology option.
Dr. Stein makes various changes to the ontology and saves them.
The ontology portal records the changes, logs the date and time of the changes and the author of the changes.
The portal prompts Dr. Stein to either publish a new version of the ontology or save it as a work in progress.
Dr. Stein
- Saves it as work in progress, or
- Publishes it as a new version that can be accessed by anyone.
The ontology portal
- Prompts Dr. Stein to log out, or
- Prompts Dr. Stein to indicate whether the previous version of the ontology should remain publicly available.
Dr. Stein
- Logs out, or
- Indicates the previous version should remain publicly available, or
- Indicates the previous version should no longer be publicly available.

Requirements implied by this use case:

There is a user authentication system.
A UI and/or API that enables users to upload ontology files (in a variety of formats?)
A UI that allows users to view an existing ontology.
A UI that allows users to edit an existing ontology.
A version control system.
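
The distinction between released and working versions described in this use case can be sketched with a small data model. All class and method names here are hypothetical; the sketch only illustrates that saving never overwrites an earlier version and that the stable IRI resolves to the latest public release rather than the latest working copy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Version:
    number: int
    content: str
    author: str
    saved_at: datetime
    released: bool = False  # False means "work in progress"
    public: bool = False    # a released version may later be withdrawn


@dataclass
class OntologyRecord:
    iri: str
    versions: list[Version] = field(default_factory=list)

    def save(self, content: str, author: str, release: bool = False) -> Version:
        """Store a new version; earlier versions are never overwritten."""
        version = Version(
            number=len(self.versions) + 1,
            content=content,
            author=author,
            saved_at=datetime.now(timezone.utc),
            released=release,
            public=release,
        )
        self.versions.append(version)
        return version

    def withdraw(self, number: int) -> None:
        """Make a previously released version no longer publicly available."""
        self.versions[number - 1].public = False

    def resolve(self, number: Optional[int] = None) -> Optional[Version]:
        """Resolve the stable IRI: latest public release, or a given public version."""
        public = [v for v in self.versions if v.released and v.public]
        if number is not None:
            return next((v for v in public if v.number == number), None)
        return public[-1] if public else None


record = OntologyRecord("https://example.org/cryosphere")  # placeholder IRI
record.save("initial triples", "acarter", release=True)    # version 1, published
record.save("draft edits", "bstein")                       # version 2, work in progress
print(record.resolve().number)  # 1: the working copy is not advertised
record.save("reviewed edits", "bstein", release=True)      # version 3, published
print(record.resolve().number)  # 3
```

Each saved version carries its author and timestamp, which corresponds to the logging steps in the workflow.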

Upload Large Resources

Blake Regalia (NASA JPL)

Upon loading a large dataset (~320K triples), the user immediately noticed errors in the IRI prefixes in the dataset, then regenerated and uploaded a few revisions in quick succession. These large datasets had an impact on system resources in that COR (or more specifically the JVM running on one of the containers) ran out of memory.
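
One way a contributor might reduce repeated large uploads is to check IRI namespaces locally before uploading. The sketch below streams an N-Triples file line by line and reports namespaces that were not expected. It is a client-side illustration under stated assumptions (N-Triples input, a rough regular-expression scan rather than a full parser, hypothetical function names) and does not address server-side memory limits.

```python
import io
import re
from collections import Counter
from typing import Iterable

# Rough scan for <...> tokens; not a full N-Triples parser, so a literal
# that happens to contain angle brackets would be counted as well.
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")


def namespace_of(iri: str) -> str:
    """Return the IRI up to and including its last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def count_namespaces(lines: Iterable[str]) -> Counter:
    """Count namespace usage one line at a time, never loading the whole file."""
    counts = Counter()
    for line in lines:
        if line.lstrip().startswith("#"):  # skip comment lines
            continue
        for iri in IRI_PATTERN.findall(line):
            counts[namespace_of(iri)] += 1
    return counts


def unexpected_namespaces(lines: Iterable[str], expected: set[str]) -> dict[str, int]:
    """Return namespaces (with counts) that are not in the expected set."""
    return {ns: n for ns, n in count_namespaces(lines).items() if ns not in expected}


# Stand-in for open("dataset.nt"); the second subject has a misspelled prefix.
sample = io.StringIO(
    '<http://example.org/cryo/Glacier> <http://www.w3.org/2000/01/rdf-schema#label> "Glacier" .\n'
    '<http://example.org/cyro/SeaIce> <http://www.w3.org/2000/01/rdf-schema#label> "Sea ice" .\n'
)
expected = {"http://example.org/cryo/", "http://www.w3.org/2000/01/rdf-schema#"}
print(unexpected_namespaces(sample, expected))
```

Because a file object is itself an iterable of lines, the same functions work on a file opened with `open(path, encoding="utf-8")` without reading it into memory at once.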

Updating Ontology Metadata

Ruth Duerr (Ronin Institute)

After it was pointed out to me that I had a typo in the dc:title of the Academic Disciplines Ontology, I tried to fix that issue by using the COR Edit new version -> Edit metadata facility. However, none of my metadata showed up, so it could not be edited.

In particular there was no dc:title field in the list of available fields.

Attempts to update it using the omv:name field on the form failed to update the dc:title field which remained unchanged.

I think that if there is a term in the ontology that is a well-known term (like dc:title), that's what COR should display in its corresponding field. Why make people enter data twice? This also has the advantage of forcing agreement on standard ways of annotating ontologies! It forces an implicit "same as" on terms between two sets of annotations! Badly needed!!!
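
The implicit "same as" suggested here could be expressed as a small equivalence table between form fields and well-known metadata terms. In this sketch the table contents are assumptions: only the dc:title / omv:name pairing comes from the report above, the second pairing is illustrative, and the OMV namespace IRI should be verified before use. It does not describe how COR actually stores metadata.

```python
DC = "http://purl.org/dc/elements/1.1/"
OMV = "http://omv.ontoware.org/2005/05/ontology#"  # assumed namespace; verify

# Form field -> well-known terms treated as equivalent (hypothetical table).
EQUIVALENT = {
    OMV + "name": [DC + "title"],
    OMV + "description": [DC + "description"],
}


def prefill_form(metadata: dict[str, str]) -> dict[str, str]:
    """Populate form fields, preferring well-known terms already in the ontology."""
    form = {}
    for field_iri, equivalents in EQUIVALENT.items():
        for candidate in (*equivalents, field_iri):
            if candidate in metadata:
                form[field_iri] = metadata[candidate]
                break
    return form


def apply_edit(metadata: dict[str, str], field_iri: str, value: str) -> dict[str, str]:
    """Write an edited value to the field and to every equivalent term present."""
    updated = dict(metadata)
    updated[field_iri] = value
    for equivalent in EQUIVALENT.get(field_iri, []):
        if equivalent in updated:
            updated[equivalent] = value
    return updated


metadata = {DC + "title": "Academic Disiplines Ontology"}  # title with a typo
form = prefill_form(metadata)
fixed = apply_edit(metadata, OMV + "name", "Academic Disciplines Ontology")
print(form[OMV + "name"])
print(fixed[DC + "title"])
```

With this arrangement the existing dc:title shows up in the form, and correcting it once updates both annotations.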

Use of Ontology Information in Data Processing Workflows

Tristan Wellman (Science Analytics and Synthesis, U.S. Geological Survey)

A base ontology is created to describe term identifiers, labels, and definitions, which are used for processing data records through OBIS-USA and NOAA NCEI. ESIP COR provides a stable, publicly available endpoint used in the data processing workflow. As part of the workflow, basic ontology information and external supplementary information describing each variable (term) are infused as metadata into NetCDF data files. Real-time feedback could be useful to ensure that variable information and ontology information remain aligned. As terms are added or modified, ontology versioning is needed to support historical data products which reference this resource.

User Profile: A user or institution that expects to evolve ontology records in an automated workflow and requires reproducibility of the resulting data products that use ontology information.

Scenario: An institution in the Earth science community uses semantic vocabularies stored on public endpoints to describe scientific terms and variables in their data products. When these data products are created or revised, ontologies should be updated in step. Versioning should be used to reproduce vocabulary information used in historical case studies.

Workflow:

A code-driven analysis package is activated to process a collection of data files.
A series of quality control and processing functions are conducted in the processing workflow.
A processing function calls ESIP COR to match vocabulary terms defined within the cached ontology.
Additional variable (term) information, such as variable type, units, and alias name, is retrieved to enhance default information.
Where vocabulary terms are new or vocabulary information has been revised or enhanced, the ESIP COR instantiation is updated to include the latest publicly available scientific information, potentially in real time.

Requirements implied by this use case:

The ontology portal has automated versioning capabilities used to preserve ontology definitions in real time. Ontologies can be retrieved by version at user request.
The ontology portal allows authenticated users to update, create, or delete ontologies using a simple API, perhaps generating a modified temporary ontology while preserving the original parent ontology until a review has been completed.
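
To illustrate why retrieval by version matters for reproducibility, the sketch below keeps a snapshot of the vocabulary for every update and lets a workflow pin the version it used when building variable attributes. The class, the term identifier, and the attribute names are hypothetical, and the attributes are returned as a plain dictionary rather than written with any particular NetCDF library.

```python
from copy import deepcopy
from typing import Optional


class VersionedVocabulary:
    """Hypothetical vocabulary store that snapshots every update."""

    def __init__(self) -> None:
        self._snapshots: list[dict[str, dict]] = [{}]  # version 0 is empty

    @property
    def latest(self) -> int:
        return len(self._snapshots) - 1

    def update(self, term_id: str, **info) -> int:
        """Add or revise a term; returns the number of the new version."""
        snapshot = deepcopy(self._snapshots[-1])
        snapshot.setdefault(term_id, {}).update(info)
        self._snapshots.append(snapshot)
        return self.latest

    def lookup(self, term_id: str, version: Optional[int] = None) -> dict:
        """Return term information as of the given version (default: latest)."""
        snapshot = self._snapshots[self.latest if version is None else version]
        return dict(snapshot[term_id])


def variable_attributes(vocab: VersionedVocabulary, term_id: str,
                        version: Optional[int] = None) -> dict:
    """Build metadata for a data variable, recording the vocabulary version used."""
    pinned = vocab.latest if version is None else version
    attrs = vocab.lookup(term_id, pinned)
    attrs["vocabulary_version"] = pinned
    return attrs


vocab = VersionedVocabulary()
v1 = vocab.update("water_temp", label="Water temperature", units="degC")
v2 = vocab.update("water_temp", alias="temp")  # later enhancement

print(variable_attributes(vocab, "water_temp", version=v1))
print(variable_attributes(vocab, "water_temp"))
```

A historical data product that recorded `vocabulary_version` can be regenerated with exactly the term information it originally used, even after the vocabulary has moved on.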

Introduction

Deliverables

Development of COR Use Cases and Requirements

Development of a Community Governance Model for the Semantic Web for Earth and Environmental Terminology (SWEET) Vocabulary

Methodology

Use Cases

Use of Semantics within Search Engines

Browsing a Portal for a Relevant Ontology

Matching Concepts among Ontologies

Annotating Text

Subsetting Ontologies into Projects

User Access

Editing, Extending and Releasing New Versions of an Existing Ontology

Upload Large Resources

Updating Ontology Metadata

Use of Ontology Information in Data Processing Workflows

Requirements

COR shall provide a user authentication system

COR shall provide an ontology upload mechanism

COR shall enable viewing of existing resources

COR shall enable editing of existing resources

COR shall provide a version control management capability

API documentation shall be provided alongside COR

COR shall facilitate an upload mechanism for large resources

Requirements by deliverable

Acknowledgements