This document captures Use Cases and Requirements (UCR) for the storage, access, governance, long-term preservation and availability of Semantic Web resources via the ESIP Community Ontology Repository (COR). Artifacts to be stored within COR include ontologies, vocabularies, linked data resources and other resources of value to the Earth Science community. This document underpins the collaborative work of the ESIP Semantic Technologies Committee (STC).
This UCR document represents a collaborative effort by ESIP semantic technologists and fellow stakeholders to understand current and future expectations and requirements for COR. The STC aims to engage and collaborate with external parties on behalf of ESIP with regard to semantic technologies. Because these collaborations are ongoing, this document will be revised and updated as necessary, and its state should be considered dynamic.
The mission of the STC is described in our Vision of the Semantic Technologies Committee. We have produced several use cases that aim to give a voice to semantic technologists within the Earth Science community. From these use cases, a number of requirements for further work are derived. This document describes the use cases, the requirements and the relationships between them, and relates both to the deliverables of the STC.
The requirements described in this document serve a dual purpose: (a) they have been used to evaluate software candidates such as COR, and (b) they provide a roadmap for development planning.
The deliverables of the STC are tangible outcomes aligned with the 2015-2020 Strategic Plan Goals. For convenience, those deliverables are replicated in this chapter. The charter remains the authoritative source for the definition of the deliverables.
A document setting out the range of community problems and issues that the STC is trying to solve (this document).
The STC will work with NASA JPL to transition the SWEET Ontology from NASA JPL to ESIP, where the STC will take community ownership of the resource. Within the scope of this deliverable, the STC will address issues surrounding:
To determine the requirements for the deliverables of the STC, use cases were collected. A use case is a story that describes challenges with respect to semantic data for existing or envisaged information systems. It does not need to adhere to any particular standardised format. Use cases are primarily used as a source of requirements.
The STC has derived requirements from the collected use cases. A requirement is something that needs to be achieved by one or more deliverables and is phrased as a specification of functionality. Requirements can lead to one or more tests that can prove whether the requirement is met.
Use cases that describe current problems or future opportunities for the use of Semantic Web resources within the ESIP community have been gathered by the STC. They were mainly contributed by members of STC, but there were also contributions from other interested parties. In this chapter these use cases are listed and identified. Each use case is related to one or more STC deliverables and to one or more requirements for future deliverables.
Lewis John McGibbney (NASA JPL) and Beth Huffer (Lingua Logica)
The ability to use defined terminology derived from a domain vocabulary has the potential to improve certain types of information retrieval tasks. For example, imagine a user engaged in a typical search scenario: a query is entered into a search engine interface and a ranked list of results is returned. Domain semantics, expressed through terms and vocabularies, can be used to augment or refine the user's query with the aim of retrieving content that is more relevant to it.
User profile: A software developer engaged in the development of search tools for the Earth science community. Assume the developer is not familiar with semantics or ontologies.
Scenario: In order to improve the relevancy of search results, the developer of the ACME Earth Science Search Service develops a capability whereby ACME finds standard terms for the search term entered by the ACME user and uses those standard terms to augment the user’s own search term.
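The augmentation step in this scenario can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard terms have already been retrieved from a vocabulary (for instance as preferred and alternative labels). The VOCABULARY table and the function names are hypothetical placeholders, not COR content or a COR API.

```python
# Hypothetical vocabulary: preferred (standard) term -> alternative labels a
# user might type. In practice these would be fetched from a vocabulary in COR.
VOCABULARY: dict[str, set[str]] = {
    "sea surface temperature": {"sst", "ocean surface temperature"},
    "precipitation": {"rainfall", "rain"},
}


def standard_terms_for(user_term: str) -> list[str]:
    """Return preferred terms whose preferred or alternative label equals user_term (case-insensitive)."""
    needle = user_term.strip().lower()
    return [
        preferred
        for preferred, alternatives in VOCABULARY.items()
        if needle == preferred or needle in alternatives
    ]


def augment_query(user_term: str) -> str:
    """Combine the user's own term with any standard terms into one OR query."""
    extra = [t for t in standard_terms_for(user_term) if t != user_term.strip().lower()]
    # Quote every term so that multi-word terms are searched as phrases.
    return " OR ".join(f'"{t}"' for t in [user_term] + extra)


if __name__ == "__main__":
    print(augment_query("SST"))  # "SST" OR "sea surface temperature"
```

The user's own term is always kept, so augmentation can only widen the query; whether to widen or to re-rank is a design choice the search service would still have to make.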
Workflow:
Requirements implied by this use case:
Line Pouchard (Purdue University), Beth Huffer (Lingua Logica), and Michael Huhns (University of South Carolina)
User profile: Dr. Jane Anderson is a researcher investigating marine ecosystems. She is gathering data on properties of sea water. She maintains a personal database on her laptop computer in which she records the values for salinity and parts-per-million of manganese.
Scenario: Although the data Dr. Anderson is collecting on sea water is initially being recorded in a private database, she hopes later to publish her data, relate it to data collected by other researchers, and publish her results. In order to ensure that her own data can be discovered and that it will be semantically interoperable with that of other researchers, she would like to use standard terms for data elements and their attributes. She browses the ontology portal to find a standard vocabulary for sea water properties.
Workflow:
Requirements implied by this use case:
Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)
Each concept in an ontology should be mapped to concepts it matches in other ontologies. Exact matches based on string matching of concept names should be provided automatically by the portal. The portal should also support matches entered manually.
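Automatic exact matching by concept name could look roughly like the following Python sketch, assuming each ontology is available as a mapping from concept IRI to concept name. The normalisation rules (case-folding, splitting camelCase, treating underscores and hyphens as spaces) and the example IRIs are assumptions chosen for illustration. Pairs found this way could, for instance, be recorded as skos:exactMatch links, with manually entered matches stored alongside them.

```python
import re


def normalise(name: str) -> str:
    """Case-fold a concept name, splitting camelCase and treating _ and - as spaces."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)  # "SeaWater" -> "Sea Water"
    name = re.sub(r"[_\-\s]+", " ", name)                # "sea_water" -> "sea water"
    return name.strip().lower()


def exact_matches(a: dict[str, str], b: dict[str, str]) -> list[tuple[str, str]]:
    """Return sorted (IRI in a, IRI in b) pairs whose normalised names are identical.

    Both arguments map concept IRIs to concept names.
    """
    # Index ontology b by normalised name so each lookup is a dictionary access.
    index: dict[str, list[str]] = {}
    for iri, name in b.items():
        index.setdefault(normalise(name), []).append(iri)
    return sorted(
        (iri_a, iri_b)
        for iri_a, name_a in a.items()
        for iri_b in index.get(normalise(name_a), [])
    )


if __name__ == "__main__":
    # Hypothetical IRIs for illustration only.
    ont_a = {"http://example.org/a#SeaWater": "SeaWater", "http://example.org/a#Salinity": "Salinity"}
    ont_b = {"http://example.org/b#sea_water": "sea_water", "http://example.org/b#Depth": "Depth"}
    for pair in exact_matches(ont_a, ont_b):
        print(pair)
```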
Line Pouchard (Purdue University), Beth Huffer (Lingua Logica), and Michael Huhns (University of South Carolina)
User Profile: Roger Brown is a scientist at a prominent university.
Scenario: He recently completed a study on the relationship between x, y and z. The paper he wrote reporting on the results of his study has been accepted for publication in an online journal. The journal requires authors to provide annotations for technical terms found in the document, so that readers can easily access the definitions of such terms. The annotations are especially important because many of the terms used in Dr. Brown’s paper have specialized meanings that are peculiar to his area of research and could easily be misinterpreted by researchers in other disciplines or areas of interest. Annotations are also valuable aids for students.
Workflow:
Requirements implied by this use case:
Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)
If, in the future, the portal holds large numbers of ontologies, it should support a means of identifying subsets of ontologies that can be searched and viewed separately.
Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)
A portal should provide both a GUI and a SPARQL endpoint for accessing its functionality and its stored ontologies and concepts.
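As an illustration of what the SPARQL endpoint enables for client developers, here is a minimal Python client using only the standard library. It follows the SPARQL 1.1 Protocol convention of sending the query in the query parameter of an HTTP GET request and asks for the SPARQL 1.1 JSON results format. The endpoint URL and the example query are hypothetical and do not describe COR's actual endpoint.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint URL

# Example query: concepts whose SKOS preferred label mentions "salinity".
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "salinity"))
}
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a GET request carrying the query in the 'query' URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint: str, query: str) -> list[dict]:
    """Send the query and return the result bindings from the JSON results document."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return json.load(resp)["results"]["bindings"]


# Usage (needs a live endpoint):
# for row in run_query(ENDPOINT, QUERY):
#     print(row["concept"]["value"], row["label"]["value"])
```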
Ruth Duerr (Ronin Institute), Line Pouchard (Purdue University) and Michael Huhns (University of South Carolina)
User Profile: Andrea Carter is an information systems engineer at the Roadrunner Science Technology Corp. Becky Stein is a data scientist whose background is in cryospheric science. Both are well-versed in RDF/OWL, set theory, and first-order logic.
Scenario: Once an Earth scientist has located an ontology in a portal that matches the scientist’s interest, the scientist should be able to add new domain concepts to the ontology and to modify existing concepts for improvement or correction. The changed ontology should be stored as a new version rather than simply replacing the original version.
In order to act as a working testbed for an ontology, the ontology repository must distinguish between released versions and working versions of an ontology, with the advertised, stable URLs pointing to the latest release rather than the latest working copy.
Specifically related to this scenario, Ms. Carter has been working with Dr. Stein and other Earth science subject matter experts to develop an ontology in RDF for the cryosphere. She has recently received approval to publicize the ontology and would like to put it in the ESIP repository in order to make it available to the broad community of ESIP members. However, Ms. Carter is aware that, like any code, the ontology is likely to undergo changes as functional and/or technical requirements change and as domain knowledge increases. She and Dr. Stein expect to make periodic changes to the ontology, and hope to encourage other subject matter experts, data scientists, and semantic technology developers to contribute to it. Some of these contributors may not be well-versed in RDF/OWL and will want to edit the ontology via a user-friendly interface. Moreover, because changes to the ontology have the potential to cause problems for applications that use it, updates to the ontology will need to be managed under a version control system.
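The distinction between released versions and working copies can be sketched as a small Python data model, assuming versions are numbered sequentially. The class, method and field names are hypothetical and do not describe COR's internals. The sketch only shows that every change is committed as a new version and that the stable URL resolves to the newest release, not the newest commit.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyVersion:
    number: int      # sequential version number, starting at 1
    released: bool   # True once this version has been published as a release
    content: str     # serialised ontology (e.g. Turtle); a placeholder here


@dataclass
class OntologyRecord:
    stable_url: str
    versions: list[OntologyVersion] = field(default_factory=list)

    def commit(self, content: str, released: bool = False) -> OntologyVersion:
        """Store a change as a new version; earlier versions are never overwritten."""
        version = OntologyVersion(len(self.versions) + 1, released, content)
        self.versions.append(version)
        return version

    def resolve(self) -> OntologyVersion:
        """What the stable URL serves: the newest released version, never a newer working copy."""
        for version in reversed(self.versions):
            if version.released:
                return version
        raise LookupError(f"{self.stable_url} has no released version yet")

    def get(self, number: int) -> OntologyVersion:
        """Retrieve a specific earlier version by its number."""
        if not 1 <= number <= len(self.versions):
            raise KeyError(number)
        return self.versions[number - 1]


if __name__ == "__main__":
    record = OntologyRecord("https://example.org/ontology/cryosphere")  # hypothetical URL
    record.commit("release 1", released=True)
    record.commit("work in progress")
    print(record.resolve().number)  # 1: the working copy is not served
```

Because nothing is overwritten, applications that depend on an older release can still retrieve it by number while contributors keep working on newer copies.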
Workflow:
Requirements implied by this use case:
Blake Regalia (NASA JPL)
After loading a large dataset (~320K triples), the user immediately noticed errors in the IRI prefixes in the dataset and regenerated and uploaded several revisions in quick succession. These large uploads strained system resources: COR (more specifically, the JVM running in one of its containers) ran out of memory.
Ruth Duerr (Ronin Institute)
After it was pointed out to me that I had a typo in the dc:title of the Academic Disciplines Ontology, I tried to fix the issue by using the COR Edit new version -> Edit metadata facility. However, none of my metadata showed up, so it could not be edited. In particular, there was no dc:title field in the list of available fields. Attempts to update it using the omv:name field on the form failed to update the dc:title field, which remained unchanged.
I think that if the ontology contains a well-known term (like dc:title), that's what COR should display in the corresponding field. Why make people enter data twice? This also has the advantage of forcing agreement on standard ways of annotating ontologies, since it forces an implicit "same as" between terms in the two sets of annotations. Badly needed!
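One way to get the behaviour asked for here is a table of equivalent annotation properties used both to pre-fill the metadata form and to write edits back. The Python sketch below assumes annotations are available as a property-IRI-to-value mapping; the form field names and the equivalence table are hypothetical, and the OMV namespace IRI is an assumption that should be checked against the OMV specification. The Dublin Core namespaces are the standard ones.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
OMV = "http://omv.ontoware.org/2005/05/ontology#"  # assumed OMV namespace; verify

# Each form field lists the annotation properties that can supply it, in order
# of preference. Grouping dc:title, dcterms:title and omv:name under one field
# is the implicit "same as" between the two sets of annotations.
FIELD_SOURCES: dict[str, list[str]] = {
    "title": [DC + "title", DCTERMS + "title", OMV + "name"],
    "description": [DC + "description", DCTERMS + "description"],
}


def prefill(annotations: dict[str, str]) -> dict[str, str]:
    """Fill each form field from the first of its properties present in the ontology."""
    form: dict[str, str] = {}
    for form_field, properties in FIELD_SOURCES.items():
        for prop in properties:
            if prop in annotations:
                form[form_field] = annotations[prop]
                break
    return form


def apply_edit(annotations: dict[str, str], form_field: str, value: str) -> dict[str, str]:
    """Write an edited field back to every equivalent property already present
    (or to the preferred property if none is), returning a new dict."""
    updated = dict(annotations)
    present = [p for p in FIELD_SOURCES[form_field] if p in updated]
    for prop in present or FIELD_SOURCES[form_field][:1]:
        updated[prop] = value
    return updated
```

With this approach, editing the title through the form updates dc:title as well as omv:name, so the two can no longer drift apart.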
Tristan Wellman (Science Analytics and Synthesis, U.S. Geological Survey)
A base ontology is created to describe term identifiers, labels, and definitions, which are used for processing data records through OBIS-USA and NOAA NCEI. ESIP COR provides a stable, publicly available endpoint used in the data processing workflow. As part of the workflow, basic ontology information and external supplementary information describing each variable (term) are embedded as metadata into NetCDF data files. Real-time feedback could help ensure that variable information and ontology information stay continuously aligned. As terms are added or modified, ontology versioning is needed to support historical data products that reference this resource.
User Profile: A user or institution that expects to evolve ontology records in an automated workflow and requires reproducibility of the resulting data products that use ontology information.
Scenario: An institution in the Earth science community uses semantic vocabularies stored on public endpoints to describe scientific terms and variables in its data products. When these data products are created or revised, the ontologies should be updated in step. Versioning should be used to reproduce the vocabulary information used in historical case studies.
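A simple way to support reproducibility in this workflow is to record, alongside each variable's term metadata, the exact ontology version the metadata was taken from. The Python sketch below assumes the term information has already been fetched from COR; the attribute names other than the CF long_name, the example IRIs and the Term record are hypothetical. The resulting dictionary could then be attached to a variable with a NetCDF library, for example the netCDF4 package's Variable.setncatts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    identifier: str   # term IRI or identifier from the ontology
    label: str
    definition: str


def variable_attributes(term: Term, ontology_version_iri: str) -> dict[str, str]:
    """Attributes for one NetCDF variable, pinned to the ontology version used."""
    return {
        "long_name": term.label,                   # CF convention attribute
        "term_identifier": term.identifier,        # hypothetical attribute name
        "term_definition": term.definition,        # hypothetical attribute name
        "ontology_version": ontology_version_iri,  # hypothetical attribute name
    }


if __name__ == "__main__":
    term = Term("https://example.org/term/1", "sea water salinity", "Salt content of sea water.")
    print(variable_attributes(term, "https://example.org/ontology/1.2"))
```

Pinning a version identifier rather than the stable URL means a historical data product keeps pointing at the vocabulary it was actually built with, even after the ontology evolves.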
Workflow:
Requirements implied by this use case:
This chapter lists the requirements for the deliverables of the STC, in alphabetical order.
In some requirements the expression 'recommended way' is used. This means that a single best way of doing something is sought. It does not say anything about the form this recommended way should have, or who should make the recommendation. A recommended way could be a formal or community recommendation or standard from an authoritative body like ESIP, OGC or W3C, but it could just as well be a more informal specification, as long as it is arguably the best way of doing something.
There is a user authentication system.
A graphical user interface and/or API that enables users to upload ontology files (possibly in a variety of formats).
A graphical user interface or REST API that allows users to view an existing ontology.
A graphical user interface or REST API that allows users to edit an existing ontology.
A version control system.
It is absolutely essential that developer-level API documentation be readily available alongside an SRI, so that developers can easily build client applications around the portal.
COR shall provide a capability to upload large multi-GB resources.
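Supporting multi-GB uploads generally means streaming the data rather than holding it in memory, which is also the failure mode seen with the ~320K-triple upload. The Python sketch below reads a file in fixed-size chunks and computes a SHA-256 digest along the way so the receiver can verify the reassembled file. The chunk size and the send callback are hypothetical; COR's actual upload interface is not described in this document.

```python
import hashlib
from typing import BinaryIO, Callable, Iterator

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB; an arbitrary illustrative value


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def upload(stream: BinaryIO, send: Callable[[int, bytes], None],
           chunk_size: int = CHUNK_SIZE) -> str:
    """Pass each chunk and its index to send() and return a SHA-256 digest of the
    whole file, which the receiver can compare with its own digest."""
    digest = hashlib.sha256()
    for index, chunk in enumerate(iter_chunks(stream, chunk_size)):
        digest.update(chunk)
        send(index, chunk)  # hypothetical transport, e.g. one HTTP request per chunk
    return digest.hexdigest()
```

Only one chunk is held in memory at a time on the sending side; the server would need a similar streaming approach when storing and parsing the data.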
For convenience, this chapter lists requirements grouped by STC deliverable.
The editors are grateful for all contributions made to this document, in particular those of the use case contributors and of all the members of the STC who helped derive and formulate requirements.