Requirements for an Ontologies Mapping Tool

Document Information

Project Title: Requirements for an Ontologies Mapping Tool

Brief Project Description:

Document the functional and non-functional requirements for an Ontologies Mapping Tool.

V1.1 incorporates feedback from the Steering committee. 21st Dec 2015

V1.2 edits have been made to align with the RFI process to evaluate existing tools.

Prepared by: The Pistoia Alliance Ontologies Mapping Project team
Date: 28th April 2016
Version: 1.2

Contents

A. Purpose and Motivation

B. In Scope

Functional Requirements

Non-Functional Requirements

C. Out of scope

D. Details for the Functional Requirements

Requirements

A. Purpose and Motivation

This document describes the requirements for an Ontologies Mapping Tool. It is envisioned that this tool will be used in conjunction with the Pistoia Alliance Guidelines for Best Practise and checklist to support Ontologies Application and Mapping in the life sciences. Adoption of the guidelines will facilitate interoperability and mapping between ontologies when the mapping is undertaken using the tool whose requirements are defined in this document.

Existing mapping tools can be compared to the requirements described in this document as part of the RFI process described in the accompanying document. Successful funding for the 2016 phase of this Pistoia Alliance project allows support for existing ontologies mapping tools which substantially satisfy these requirements.

The typical users of this mapping tool are expected to be specialist ontology engineers, information and data scientists who are expert in enterprise search, data integration and/or knowledge management. Such expert users are likely to be familiar and comfortable with ontology sources and editors.

B. In Scope

Functional Requirements

1. User Interface

1.1. Visualisation of ontological space

1.1.1. Numerous view options

1.2. Mapping Alignment Editor

1.2.1. Improving Alignments - includes support for manual modification and refinement procedures such as thresholds to trim alignments

1.2.2. Matching correspondence - Ability to specify if a term is exactly the same, more specific (subclass) or more general (super class)

1.2.3. Edit mapping suggestions - Review and accept or reject suggested mappings based on similarity of URLs or Labels

1.2.4. Ability to align to an upper level ontology - to which other ontologies can be aligned

2. Framework

2.1. Workflow and Evaluation

2.1.1. Workflow

2.1.2. Evaluation metrics

2.2. Ontology Matching algorithms

2.2.1. Supports extensibility

3. Mappings

3.1. Import

3.1.1. Import equivalence mappings

3.1.2. Import source ontologies or vocabularies

3.1.3. Use of external data sources

3.2. Export

3.2.1. Export equivalence mappings

3.2.2. Mapping metadata and documentation

Non-Functional Requirements

1. License restrictions

Any license restrictions of the tool must have no effect on the freedom to exploit the output mappings. The specific terms and conditions of a tool's license must be clear and understandable. This is especially important for commercial tools or those which have restrictions for commercial users, whilst having unrestricted use by academic users.

2. Current Availability and Maintenance

The mapping tool must be currently available through a web site. This also means it must be evident that the tool is maintained by a team of developers. There must be an active contact point or mailing list. There must also be a statement of maintenance of the tool corresponding to the period of investment.

3. Standalone and web service

The mapping tool must be available as standalone code which can be installed locally (Windows, Mac or Linux) and as a web service which uses appropriate standards for secure access.

These requirements are listed in a template sheet designed to capture tool capabilities. See (public) wiki page.

C. Out of scope

  1. The tool will not create new ontologies. Instead it will be used to generate mappings between existing ontologies.
  2. Source ontologies will not be changed. For example, original identifiers will be preserved in any mappings produced by this tool. Edits to source ontologies can be done with an existing ontology editor, such as Protégé.
  3. The tool will not create new algorithms for ontology matching. Instead it will make use of existing ontology matching algorithms.

D. Details for the Functional Requirements

The vision for the requirements of the Ontology Mapping Tool is illustrated in Figure 1. It consists of three major aspects: 1) User Interface, 2) Framework and 3) Mappings Import and Export, which are detailed in the remainder of this document. The interaction between the 1) User Interface and 2) Framework components is likely to be mediated by an Application Programming Interface (API).

Figure 1: Vision for the Tool Requirement

 1. User Interface

1.1. Visualisation of ontological space.

 The user interface will include a high level view of ontologies relevant to a particular domain.

 1.1.1. Numerous view options

    • Ontology classes and/or instances (names) are displayed as a simple list which can be sorted alphabetically.
    • Ontology classes and/or instances (names) are shown as pillars (vertical) and layers (horizontal).
    • Ontology classes and/or instances (names) are displayed as a hierarchical tree which can also show relationships in an expandable form.
    • Ontology classes and/or instances (names) are displayed as a two dimensional matrix.
    • Show any available metadata for source ontologies or mappings (e.g. version, date, contact etc.)
    • Show ontology metrics (e.g. number of classes, properties etc.)
    • Visualise areas of overlap between source ontologies

The purpose of this high level visualisation is to enable the user to understand ontology coverage and delineation. An example of ontology delineation is reference vs. application ontologies where the latter will re-use parts of reference ontologies. Application ontologies may span multiple domains to support a particular activity such as experimental investigation.

Finally, the user interface will provide the facility to drive the import of existing mappings and the export of edited mappings (see section 3 for further detail).

1.2. Mapping Alignment Editor

The user interface will include a mapping alignment editor which will display and allow modification of how individual instances or elements are aligned and the local structure between aligned elements in each ontology. The mapping alignment editor will need to support the ability to control the level of detail displayed.

Although the mapping alignment editor will share the display features of an ontology editor, it will not allow curation of source ontologies, only viewing. Ontology curation is a separate process which should be undertaken using an ontology editor such as Protégé or OBO-Edit and would result in a new version of the source ontology, which falls outside the scope of this project.

 1.2.1. Improving Alignments

The mapping alignment editor will support manual modification to improve the alignment of elements in two selected ontologies. It will also enable application of refinement procedures such as selecting correspondences by applying thresholds to trim an alignment or trim the source ontologies being aligned by the matching algorithm.
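Trimming an alignment by threshold can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `Correspondence` class, the `trim_alignment` function and the identifiers are hypothetical, and confidence scores are assumed to lie between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One suggested mapping between two ontology elements (hypothetical model)."""
    source: str        # identifier of the element in the first ontology
    target: str        # identifier of the element in the second ontology
    confidence: float  # matcher score, assumed to lie in [0, 1]


def trim_alignment(alignment, threshold):
    """Keep only correspondences whose confidence is at or above the threshold."""
    return [c for c in alignment if c.confidence >= threshold]


# Illustrative identifiers only; they are not taken from real ontologies.
alignment = [
    Correspondence("A:0001", "B:0101", 0.95),
    Correspondence("A:0002", "B:0102", 0.60),
    Correspondence("A:0003", "B:0103", 0.30),
]
trimmed = trim_alignment(alignment, threshold=0.5)
```

In an editor, the threshold would typically be adjusted interactively so that the user can see which correspondences are dropped at each setting.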

Following ontology matching, the alignment will need to be checked for consistency. Matching algorithms (described later) will be selected and run with parameters which can be changed iteratively to improve the alignment.

Some matching approaches are based on non-lexical features, such as usage (for instance, being consistently used to specifically annotate the same entities). In this case the tool should also present the "evidence" for the matching, based on whatever knowledge the algorithm makes use of.

 1.2.2. Matching correspondence

 The mapping alignment editor will have the ability to specify if a term is exactly the same, more specific (subclass) or more general (super class).
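One way to represent these three kinds of correspondence is a small enumeration. The names below (`Relation`, `invert`) are hypothetical and only illustrate that "more specific" and "more general" are the same relation read from opposite sides.

```python
from enum import Enum


class Relation(Enum):
    """Relation of the source term to the target term (hypothetical names)."""
    EXACT = "exact"                  # the two terms mean exactly the same
    MORE_SPECIFIC = "more_specific"  # source is a subclass of the target
    MORE_GENERAL = "more_general"    # source is a super class of the target


def invert(relation):
    """Return the relation as seen from the target term's side."""
    if relation is Relation.MORE_SPECIFIC:
        return Relation.MORE_GENERAL
    if relation is Relation.MORE_GENERAL:
        return Relation.MORE_SPECIFIC
    return Relation.EXACT
```

Storing the relation explicitly, rather than assuming equivalence, lets the same mapping be read in either direction without losing information.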

 1.2.3. Edit mapping suggestions

The alignment editor will support the facility to accept or reject the suggested mappings between two matching ontologies. Simple equivalence matching between ontologies is likely to be based on similarity of URLs or labels. In addition, numerous other ontology matching algorithms will be incorporated as described in section 2.
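As a rough sketch of label-based suggestion followed by review, the example below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure. The function names, the 0.8 threshold and the identifiers are assumptions made for illustration; a real tool may use quite different matching techniques.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Case-insensitive similarity of two labels, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_mappings(source_labels, target_labels, threshold=0.8):
    """Pair each source term with its most similar target label, if similar enough.

    Both arguments map a term identifier to its label.
    """
    suggestions = []
    for s_id, s_label in source_labels.items():
        best_id, best_score = None, 0.0
        for t_id, t_label in target_labels.items():
            score = label_similarity(s_label, t_label)
            if score > best_score:
                best_id, best_score = t_id, score
        if best_id is not None and best_score >= threshold:
            suggestions.append((s_id, best_id, best_score))
    return suggestions


def review(suggestions, decisions):
    """Split suggestions into accepted and rejected using the curator's decisions.

    `decisions` maps (source_id, target_id) to True (accept) or False (reject);
    suggestions without a decision are left out of both lists.
    """
    accepted = [s for s in suggestions if decisions.get((s[0], s[1])) is True]
    rejected = [s for s in suggestions if decisions.get((s[0], s[1])) is False]
    return accepted, rejected


# Illustrative identifiers and labels only.
source = {"A:1": "Heart disease", "A:2": "Lung"}
target = {"B:1": "heart disease", "B:2": "Liver"}
suggestions = suggest_mappings(source, target)
accepted, rejected = review(suggestions, {("A:1", "B:1"): True})
```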

1.2.4. Tracking of modifications

Manual modification to an existing match should be stored with some indication of provenance. This is for two reasons. First, we want to keep this "curated" information if a new algorithm is run. Second, we want to capture some idea of the context or assumptions under which the mapping was provided.
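The first reason can be illustrated with a small sketch in which manually curated mappings survive a re-run of a matching algorithm. The `Mapping` fields and the `merge_rerun` policy (curated mappings take precedence for the same source element, and earlier algorithm results are replaced by the new run) are assumptions, not requirements stated in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mapping:
    """A mapping with minimal provenance (hypothetical field names)."""
    source: str
    target: str
    origin: str                      # "algorithm" or "manual"
    curator: Optional[str] = None    # who made a manual change
    rationale: Optional[str] = None  # context or assumptions behind the change


def merge_rerun(existing, new_run):
    """Combine a fresh matcher run with earlier results, keeping manual curation.

    Manually curated mappings always survive; a newly generated mapping is
    added only if no curated mapping already covers the same source element.
    """
    curated = [m for m in existing if m.origin == "manual"]
    curated_sources = {m.source for m in curated}
    fresh = [m for m in new_run if m.source not in curated_sources]
    return curated + fresh


# Illustrative data only.
existing = [
    Mapping("A:1", "B:9", "manual", curator="curator-1", rationale="Checked by hand"),
    Mapping("A:2", "B:2", "algorithm"),
]
new_run = [Mapping("A:1", "B:1", "algorithm"), Mapping("A:3", "B:3", "algorithm")]
merged = merge_rerun(existing, new_run)
```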

1.2.5. Definition of context

We should be able to specify a context under which a mapping makes sense. Contexts are probably hard to define, but some indication of the scope for which some mapping is designed would be useful and stored as metadata for the mapping.

2. Framework

2.1. Workflow and Evaluation

Interaction between the user interface and workflow components is likely to be through an Application Programming Interface (API). The tool framework will be designed to support the ontology mapping (or alignment) life cycle. This ontology management activity reflects how ontologies evolve, so mapping (or matching) between them must follow this evolution too. The dynamic perspective of the Ontology Mapping life cycle is illustrated in Figure 2.

Figure 2. The Ontology Mapping life cycle

2.1.1. Workflow

Support for the ontology mapping life cycle is reflected in the generalised workflow shown in Figure 3. The ontology mapping tool should support users adopting this methodology workflow as a core requirement.

The workflow begins with characterising the need, selecting existing mappings, selecting appropriate matching algorithms (matchers), running them, evaluating the results and correcting choices made (matchers and parameters).

There is also an implicit link between exploitation and the mapping process. Once a mapping is shared with users, they can generate external feedback which must be gathered. This external input will be evaluated and used to improve the mapping through the alignment editor which should accept this feedback easily (e.g. options like correct or annotate mapping elements). 

Figure 3: The workflow for ontology mapping methodology

2.1.2. Evaluation Metrics

The tool will provide metrics for evaluation of imported mappings and of those generated by the ontology matching algorithm(s), which are displayed in the mapping alignment editor as shown in Figure 3. Evaluation will be performed automatically or by manual inspection, where graphical display of the alignment is important. Manual evaluation would be expected to have clearly defined criteria to ensure objectivity. Automated evaluation requires access to a suitable reference mapping (not used by the matching algorithm); samples are extracted from the alignment results and measures such as precision, recall and inconsistency are computed to give an approximation of correctness and completeness. The mapping tool evaluation metrics will support assessment of precision and recall with respect to "locally provided" gold standards. This will be tested by the RFI process for evaluation of existing mapping tools.

Element Similarity

The tool will implement a recognised measure of similarity between two sets of elements. For example, the Hamming distance counts the joint correspondences (those present in both sets) with regard to the overall correspondences of both sets.
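One possible reading of this measure treats each alignment as a set of correspondences and compares the size of their intersection with the size of their union. The formula used below, 1 - |A ∩ R| / |A ∪ R|, is an assumption about the intended definition rather than something stated in this document, and the function name and data are hypothetical.

```python
def hamming_distance(found, reference):
    """Set-based distance between two alignments: 0.0 if identical, 1.0 if disjoint.

    Assumed formulation: 1 - |found ∩ reference| / |found ∪ reference|.
    Both arguments are sets of (source_id, target_id) pairs.
    """
    union = found | reference
    if not union:
        return 0.0  # two empty alignments are treated as identical
    return 1.0 - len(found & reference) / len(union)


# Illustrative identifiers only.
found = {("A:1", "B:1"), ("A:2", "B:2"), ("A:3", "B:9")}
reference = {("A:1", "B:1"), ("A:2", "B:2"), ("A:4", "B:4"), ("A:5", "B:5")}
distance = hamming_distance(found, reference)
```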

Precision and Recall

The tool will support the measures of precision and recall which originate from information retrieval and can be adapted for ontology matching. They are based on the comparison of a generated mapping to a reference mapping alignment, effectively showing which correspondences are discovered (precision) and those that are missed (recall).

Precision measures the ratio of correctly found correspondences (true positives) over the total number of correspondences returned (true positives and false positives). This provides a logical measure of correctness in a mapping alignment.

Recall measures the ratio of correctly found correspondences (true positives) over the total number of expected correspondences (true positives and false negatives i.e. missed). This provides a logical measure of completeness in a mapping alignment.
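The two definitions above can be computed directly when the generated alignment and the reference alignment are each held as a set of correspondences. The function name and the example identifiers are illustrative only.

```python
def precision_recall(found, reference):
    """Return (precision, recall) of a generated alignment against a reference.

    Both arguments are sets of (source_id, target_id) pairs.
    """
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Illustrative identifiers only: 3 correspondences found, 4 expected, 2 in common.
found = {("A:1", "B:1"), ("A:2", "B:2"), ("A:3", "B:9")}
reference = {("A:1", "B:1"), ("A:2", "B:2"), ("A:4", "B:4"), ("A:5", "B:5")}
precision, recall = precision_recall(found, reference)
```

Here two of the three found correspondences are correct (precision 2/3) and two of the four expected correspondences were found (recall 1/2).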

Logical consistency

The tool will support the detection of logical inconsistencies that can be inferred from the structure of the alignment between two ontologies, such as logical disjoints.
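A simple case of such an inconsistency arises when two classes declared disjoint in one ontology are both mapped as equivalent to the same class in the other. The sketch below checks only this one pattern; the function name and class identifiers are hypothetical, and a full implementation would more likely rely on a reasoner.

```python
def find_disjointness_conflicts(equivalences, disjoint_pairs):
    """Report target classes mapped as equivalent to two disjoint source classes.

    equivalences: iterable of (source_class, target_class) equivalence mappings
    disjoint_pairs: iterable of (class_a, class_b) declared disjoint in the source
    Returns a list of (target_class, class_a, class_b) conflicts.
    """
    sources_by_target = {}
    for source, target in equivalences:
        sources_by_target.setdefault(target, set()).add(source)

    conflicts = []
    for a, b in disjoint_pairs:
        for target, sources in sources_by_target.items():
            if a in sources and b in sources:
                conflicts.append((target, a, b))
    return conflicts


# Illustrative identifiers only.
equivalences = [
    ("A:Plant", "B:Organism"),
    ("A:Animal", "B:Organism"),
    ("A:Cell", "B:Cell"),
]
disjoint_pairs = [("A:Plant", "A:Animal")]
conflicts = find_disjointness_conflicts(equivalences, disjoint_pairs)
```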

Composite measures

The framework of the tool will be designed to accommodate numerous algorithms for ontology matching. The user interface of the tool will need to show a composite view (e.g. a dashboard) of measures of similarity, correctness, completeness and consistency for 1) runs of the same algorithm with different parameters and 2) runs of different algorithms and their algorithm settings.

2.2. Ontology Matching algorithms.

2.2.1. Supports extensibility

The tool framework will be designed so that it supports extensibility to allow inclusion of one or more algorithms for ontology matching. The minimal requirement is for the tool implementation to include at least one algorithm, based on matching of classes and/or elements between two source ontologies.
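One common way to achieve this kind of extensibility is a registry into which matching algorithms are plugged under a name. The sketch below is only an illustration of that idea: the `Matcher` signature, the registry, and the single built-in "exact-label" matcher are all assumptions rather than part of the requirements.

```python
from typing import Callable, Dict, List, Tuple

# Assumed matcher signature: takes two {identifier: label} dictionaries and
# returns (source_id, target_id, confidence) triples.
Matcher = Callable[[Dict[str, str], Dict[str, str]], List[Tuple[str, str, float]]]

MATCHERS: Dict[str, Matcher] = {}


def register_matcher(name: str):
    """Decorator that adds a matching algorithm to the registry under `name`."""
    def decorator(func: Matcher) -> Matcher:
        MATCHERS[name] = func
        return func
    return decorator


@register_matcher("exact-label")
def exact_label_matcher(source, target):
    """Minimal matcher: pairs elements whose labels are identical ignoring case."""
    by_label = {label.lower(): t_id for t_id, label in target.items()}
    return [
        (s_id, by_label[label.lower()], 1.0)
        for s_id, label in source.items()
        if label.lower() in by_label
    ]


def run_matcher(name, source, target):
    """Look up a registered matcher by name and run it."""
    return MATCHERS[name](source, target)


# Illustrative identifiers and labels only.
result = run_matcher("exact-label", {"A:1": "Liver"}, {"B:1": "liver", "B:2": "Lung"})
```

Further algorithms would be added by decorating another function with `register_matcher`, without changing the code that runs them.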

The ontology matching algorithms used by the mapping tool should be well documented, preferably through peer-reviewed publication. Another source of algorithm documentation could be the Ontology Alignment Evaluation Initiative (http://www.ontologymatching.org) which hosts annual campaigns (e.g. http://oaei.ontologymatching.org/2015). The tool documentation should give an explanation of the methods used by the matching algorithm(s) and any additional data or information that is used.

Algorithms which implement matching techniques operate at the level of ontology element or structure. Table 1 provides a brief description to illustrate the range of techniques available. This summary is derived from the book entitled ‘Ontology Matching’ 2nd Edition 2013 by Euzenat and Shvaiko, (http://book.ontologymatching.org) where further details can be found.

Table 1: Summary of techniques used by Ontology Matching algorithms

Level | Technique | Description
Element-level | String-based | Often used to match names, identifiers and name descriptions of ontological entities.
Element-level | Language-based | Consider names as words in some natural language such as English.
Element-level | Constraint-based | Deal with internal constraints applied to definitions of entities such as types, multiplicity of attributes and keys.
Element-level | Informal resource-based | Deduce relations between ontology entities based on how they relate to each other.
Element-level | Formal resource-based | Make use of formal resources such as domain-specific ontologies, upper ontologies and linked data.
Structure-level | Graph-based | Compare source ontologies (including database schemas and taxonomies) as nodes on labelled graphs.
Structure-level | Taxonomy-based | Hierarchical classifications consider only the specialisation relation.
Structure-level | Model-based | Match source ontologies based on semantic interpretation.
Structure-level | Instance-based | Compare sets of instances of classes to decide if they match or not.

3. Mappings

3.1. Import

In preparation for import of existing mappings, it is important to consider existing ontologies that are relevant to support 1) a particular domain such as disease and phenotype or 2) a particular activity such as experimental investigation which can span multiple domains. It is also important to understand and characterise the application need because this affects the choice of ontology mapping algorithm and the setting of their parameters.

3.1.1. Import equivalence mappings

The tool will support the import of any existing equivalence mappings between ontologies relevant to the characterised need.

3.1.2. Import source ontologies or vocabularies

The tool will support the import of existing source ontologies or vocabularies (e.g. a vocabulary such as MeSH) for mapping to another ontology or vocabulary.

3.1.3. Use of external data sources

Sources like sameas.org can be utilised for string-based techniques implemented as an algorithm. When the tool exploits external data sources this needs to be documented in detail. The value this brings to the ontology matching result must be shown clearly rather than being hidden. For example, it would be useful to include an option to run the algorithm with and without use of external data sources.

3.2. Export

The tool will export mappings to enable exploitation and sharing of ontology mappings. Annotation of the mappings with metadata will also be supported by the tool.

3.2.1. Export equivalence mappings

The tool will need to support rendering the mapping for exploitation and sharing using a suitable alignment language or format such as OWL, SKOS, the Alignment format and EDOAL. The latter two, the Alignment format and the Expressive and Declarative Ontology Alignment Language (EDOAL), provide specialised representations of mapping alignments.
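As an illustration of one of these options, the sketch below renders mappings as N-Triples using the SKOS mapping properties `skos:exactMatch`, `skos:broadMatch` and `skos:narrowMatch`. The relation names ("exact", "more_specific", "more_general"), the function name and the example.org IRIs are hypothetical; the Alignment format and EDOAL are not shown.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical relation names used by this sketch, mapped to SKOS mapping properties.
SKOS_PROPERTY = {
    "exact": SKOS + "exactMatch",
    "more_specific": SKOS + "broadMatch",  # source is narrower, target is broader
    "more_general": SKOS + "narrowMatch",  # source is broader, target is narrower
}


def to_ntriples(mappings):
    """Render (source_iri, relation, target_iri) triples as N-Triples text."""
    return "".join(
        f"<{source}> <{SKOS_PROPERTY[relation]}> <{target}> .\n"
        for source, relation, target in mappings
    )


# Illustrative IRIs only.
mappings = [
    ("http://example.org/a/Liver", "exact", "http://example.org/b/Liver"),
    ("http://example.org/a/Hepatocyte", "more_specific", "http://example.org/b/Cell"),
]
output = to_ntriples(mappings)
```

Source identifiers are written out unchanged, in line with the requirement that original identifiers are preserved in any mappings produced.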

3.2.2. Mapping Metadata and Documentation

The tool will need to support annotation of an ontologies mapping with suitable metadata and documentation. This is an important aspect which supports exploitation through sharing and reuse of ontology mapping alignments. The purpose of the annotation is to make the process of alignment generation easier to understand.

The metadata for an ontology mapping alignment will include:

    • date of rendering the mapping alignment
    • method for ontology matching with version and parameters
    • description of purpose
    • creator name
    • curator identity and multiple contributions
    • alignment measures
    • manual curation and confidence
    • any limitations of use
    • names of the aligned, source ontologies
    • type of alignment (e.g. 1:1 or *:* correspondence)
    • seed alignment from which the alignment is derived
    • the application context
    • relevant external sources and references
    • any dependency across the mapping alignment
    • context and application supported by the mapping
    • link to a community resource to support the mapping
    • provenance and rationale for any manual curation
    • ability to extend the metadata model, as required
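A metadata record covering some of the items above could be modelled as shown in this sketch. The class, its field names and the example values are hypothetical and cover only a subset of the list; the `extensions` dictionary is one simple way to keep the model extensible.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class MappingMetadata:
    """A subset of the metadata items listed above (field names are hypothetical)."""
    date_rendered: str                 # date of rendering the mapping alignment
    matching_method: str               # method for ontology matching
    method_version: str
    method_parameters: Dict[str, str]
    purpose: str
    creator: str
    source_ontologies: List[str]       # names of the aligned source ontologies
    alignment_type: str                # e.g. "1:1"
    limitations: str = ""
    extensions: Dict[str, str] = field(default_factory=dict)  # room to extend the model


def to_json(metadata):
    """Serialise the metadata record as JSON text for export alongside a mapping."""
    return json.dumps(asdict(metadata), indent=2, sort_keys=True)


# Illustrative values only.
metadata = MappingMetadata(
    date_rendered="2016-04-28",
    matching_method="exact-label",
    method_version="0.1",
    method_parameters={"threshold": "0.8"},
    purpose="Example mapping for illustration",
    creator="Example Creator",
    source_ontologies=["Ontology A", "Ontology B"],
    alignment_type="1:1",
    extensions={"community_resource": "http://example.org/mapping-forum"},
)
exported = to_json(metadata)
```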