Requirements for an Ontologies Mapping service
Document Information
Project Title: | Requirements for an Ontologies Mapping service |
Brief Project Description: | In this document, the Ontologies Mapping Tool Project specifies the requirements for an Ontologies Mapping service. |
Prepared by: | Ian Harrow |
Date: | 15th Dec 2016 |
Version: | 1.0 |
Contents
1.1. Introduction
1.2. Common Business Scenarios
1.2.1. Mapping between Public Ontologies
1.2.2. Mapping between Public and Internal Ontologies
1.3. Benefits of an Ontologies Mapping service
2.1. Scope for Service Requirements
2.2. Find existing ontologies and mappings
2.3. Evaluate the quality of source ontologies
2.4. Evaluate the quality of any existing mappings
2.5. Creation of new or improved pairwise mappings
2.6. Aggregation of pairwise mappings
2.7. Evaluate the quality of new mappings
2.8. Generate metadata to represent the methodology
2.9. Monitor the existing source ontologies or mappings
3.0. Seek new source ontologies or mappings
3.1. Feedback and consultation
3.2. Access to the mappings
3.3. Ownership of the mappings
Service Requirements
1.1. Introduction
These Ontologies Mapping service requirements should be viewed as the definition of a standard for an enterprise-quality service. The service provider is expected to make use of an ontologies mapping tool or ontologies matching algorithm, as detailed in section 2.5. We describe each service requirement, the logical flow between the requirements and how they relate to each other. Not all steps will necessarily apply to every business scenario, although in general these requirements follow the workflow for the creation and maintenance of ontology mappings shown in Figure 1. The detail will depend on the particular service requirements of a client organisation, as driven by its specific applications.
Figure 1: The workflow for the creation and maintenance of ontology mappings
Specific use cases provided by the client organisation could be shared with the service provider, in confidence if desired. Feedback and consultation with the client's users are extremely important (see section 3.1). Here we describe two common business scenarios to illustrate the value of the mapping service:-
1.2. Common Business Scenarios
1.2.1. Mapping between Public Ontologies
The need to map between public ontologies is a relatively straightforward business scenario. It is likely to begin with the selection of core reference ontologies in a particular data domain that are best suited to the requirements of the user. The selection process will benefit from applying the ontologies guidelines developed by the OM project. The selected ontologies will then be mapped to each other to identify equivalent and similar matches, so that the ontologies can be used together with the mappings to gain maximum coverage at high quality. Such mappings will need to be maintained regularly to reflect changes in the source ontologies. These mappings of public ontologies could be shared with the wider community, provided that public funding can be found to sustain this. Client companies who benefit from these mappings are not expected to require secure access to them.
1.2.2. Mapping between Public and Internal Ontologies
Mapping between public and internal ontologies is a much more challenging business scenario. It is likely to begin with the selection of public and internal ontologies in a particular data domain that are best suited to the requirements of the client organisation. This selection will also benefit from applying the ontologies guidelines developed by the OM project. The toughest challenges are likely to arise from the way identifiers are assigned in the internal ontologies. Ideally, the client organisation has used public identifiers (IRIs), or at least used them alongside internal identifiers. Internal identifiers do not prevent ontology matching algorithms from operating, but they are likely to make the mappings much more costly to maintain, because such identifiers must be administered locally and governed by internal change control. Internal identifiers that are not actively managed should be avoided because they are fragile and likely to break. Apart from the management of internal identifiers, internal and public ontologies should be mapped using a similar workflow, so that the selected ontologies can be used together with the mappings to gain maximum coverage at high quality. Mappings to internal identifiers may be sensitive because they can reveal a client organisation's intellectual property. In that case, the service must offer an option to secure access to such mappings.
1.3. Benefits of an Ontologies Mapping service
Saves internal resource (n=17) and Enables better results (n=13) were by far the most frequently cited benefits in answer to the question: What is the most important benefit you would gain from an external ontologies mapping service? This result was obtained from a recent online questionnaire that surveyed Pistoia Alliance members and the Community of Interest (a total of 36 responses were received).
2.1. Scope for Service Requirements
The Service Level Agreement (SLA) for this mapping service is defined in a separate accompanying document.
In scope: Mapping (= matching) of equivalence and similarity between ontologies in the same data domain. Here, ontologies can include hierarchical relationships, taxonomies, classifications and/or vocabularies. The methodology for mapping used by the service must be described clearly (see sections 2.5 and 2.8).
Out of scope: Mapping between ontologies in different data domains. This could come into scope for an extended service, but it would require additional data (e.g. from experimental studies) and algorithms (e.g. natural language processing) to find evidence for matches between the different data domains.
2.2. Find existing ontologies and mappings
The service will find existing source ontologies and mappings that are relevant to the data domain(s) of interest in consultation with the client(s). Any further expectations for finding ontologies and mappings will be determined through consultation with the client(s). It will be important to understand and comply with any license restrictions for the source ontologies and mappings with respect to their usage for generating new mappings and to share this understanding in detail with the client(s).
2.3. Evaluate the quality of source ontologies
The service will use the Pistoia Alliance Ontologies Guidelines for Best Practice, available on a public wiki, to understand the quality of each source ontology and to determine whether it is "fit for purpose". Here, the immediate purpose is mapping between the ontologies, in consultation with the client(s) and according to their user requirements.
2.4. Evaluate the quality of any existing mappings
The service will evaluate the quality of existing mappings and report on the following aspects:-
- Manual assessment including comparison with relevant reference mappings
- Inspection of perfect matches: are they all found, are any omissions justified, and what is missed?
- Evaluation of similar matches based on synonyms
- Evaluation based on axiom similarity (same parents, same class descriptions)
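The checks listed above can be sketched in Python for two small in-memory ontologies. This is a minimal sketch under stated assumptions: each ontology is a dictionary from a hypothetical class identifier to its label, synonyms and parents, and an existing mapping is a set of (ontology 1 class, ontology 2 class) pairs. A real service would read these from the source ontology files and would compare full class descriptions, not only parents.

```python
import re

# Hypothetical toy ontologies: identifier -> label, synonyms and parent identifiers.
ONTO1 = {
    "A:organ": {"label": "Organ", "synonyms": [], "parents": []},
    "A:heart": {"label": "Heart", "synonyms": ["Cardiac organ"], "parents": ["A:organ"]},
    "A:lung": {"label": "Lung", "synonyms": [], "parents": ["A:organ"]},
}
ONTO2 = {
    "B:organ": {"label": "organ", "synonyms": [], "parents": []},
    "B:heart": {"label": "cardiac organ", "synonyms": [], "parents": ["B:organ"]},
    "B:lung": {"label": "lung", "synonyms": [], "parents": ["B:organ"]},
}
# Existing mapping to be evaluated: pairs of (ONTO1 id, ONTO2 id).
MAPPING = {("A:organ", "B:organ"), ("A:heart", "B:heart")}


def normalise(label: str) -> str:
    """Lower-case and replace every run of characters other than a-z and 0-9 with one space."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", label.lower()).split())


def names(entity: dict) -> set:
    """All normalised names of an entity: its label plus any synonyms."""
    return {normalise(n) for n in [entity["label"], *entity["synonyms"]]}


def classify_pair(e1: dict, e2: dict) -> str:
    """Label a mapped pair as a perfect (exact label) match, a synonym match, or neither."""
    if normalise(e1["label"]) == normalise(e2["label"]):
        return "perfect"
    if names(e1) & names(e2):
        return "synonym"
    return "unsupported"


def parents_aligned(a: str, b: str, onto1: dict, onto2: dict, mapping: set) -> bool:
    """Axiom-similarity check: is some parent of a mapped to some parent of b?"""
    return any((p, q) in mapping for p in onto1[a]["parents"] for q in onto2[b]["parents"])


def missed_perfect_matches(onto1: dict, onto2: dict, mapping: set) -> list:
    """Pairs with identical normalised labels that the mapping does not contain."""
    by_label = {}
    for b, e in onto2.items():
        by_label.setdefault(normalise(e["label"]), []).append(b)
    return sorted(
        (a, b)
        for a, e in onto1.items()
        for b in by_label.get(normalise(e["label"]), [])
        if (a, b) not in mapping
    )


for a, b in sorted(MAPPING):
    print(a, b, classify_pair(ONTO1[a], ONTO2[b]), parents_aligned(a, b, ONTO1, ONTO2, MAPPING))
print("missed:", missed_perfect_matches(ONTO1, ONTO2, MAPPING))
```

For the toy data, the organ pair is a perfect match, the heart pair is supported only by a synonym (and by its mapped parents), and the lung pair is reported as a missed perfect match.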
The service will generate measures of quality, which will include 1) the confidence of the match, 2) the level of similarity of the match and 3) any additional measures. The level of similarity is expected to range from equivalent, through close similarity, to broadly similar. These measures of quality will be recorded in the mapping output, which should use a suitable alignment language or format such as OWL, SKOS, the Alignment format or EDOAL. The latter two, the Alignment format and the Expressive and Declarative Ontology Alignment Language (EDOAL), provide specialised representations of mapping alignments.
If additional data has been used to make a match, the output must provide access to that data through links.
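As an illustration of recording a quality measure in one of the formats named above, the sketch below writes a single correspondence as a `Cell` of the Alignment format, where the `measure` element carries the confidence. It assumes the Alignment format namespace `http://knowledgeweb.semanticweb.org/heterogeneity/alignment#` and uses only the relation `=` (equivalence); how close and broad similarity would be encoded (for example as a separate annotation, or with SKOS mapping properties in a SKOS output) is left open here. The entity IRIs are hypothetical.

```python
import xml.etree.ElementTree as ET

ALIGN = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"

# Use readable prefixes when serialising.
ET.register_namespace("align", ALIGN)
ET.register_namespace("rdf", RDF)


def make_cell(entity1: str, entity2: str, relation: str, confidence: float) -> str:
    """Serialise one correspondence, with its confidence, as an Alignment format Cell."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    cell = ET.Element(f"{{{ALIGN}}}Cell")
    for tag, iri in (("entity1", entity1), ("entity2", entity2)):
        ET.SubElement(cell, f"{{{ALIGN}}}{tag}", {f"{{{RDF}}}resource": iri})
    ET.SubElement(cell, f"{{{ALIGN}}}relation").text = relation
    measure = ET.SubElement(cell, f"{{{ALIGN}}}measure", {f"{{{RDF}}}datatype": XSD_FLOAT})
    measure.text = f"{confidence:.2f}"
    return ET.tostring(cell, encoding="unicode")


# Hypothetical IRIs for two equivalent classes in two source ontologies.
print(make_cell("http://example.org/onto1#Heart", "http://example.org/onto2#Heart", "=", 0.93))
```

A full alignment document would wrap many such cells in an `Alignment` element together with the ontology and metadata headers.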
The evaluation metrics of this service will support assessment of mapping quality with respect to "locally provided" gold standards. Examples of metrics to include:-
Element Similarity
Similarity between two sets of correspondences (alignments). For example, the Hamming distance compares the number of correspondences common to both sets with the total number of correspondences in either set.
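The Hamming distance between two alignments can be sketched as follows, using one common formulation, 1 - |A ∩ B| / |A ∪ B|, and representing each correspondence as a hypothetical (entity1, entity2, relation) tuple. A distance of 0 means the two alignments are identical and 1 means they share no correspondence.

```python
def hamming_distance(a: set, b: set) -> float:
    """Hamming distance between alignments: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:  # two empty alignments are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical alignments of (entity1, entity2, relation) correspondences.
A1 = {("A:heart", "B:heart", "="), ("A:lung", "B:lung", "=")}
A2 = {("A:heart", "B:heart", "="), ("A:organ", "B:organ", "=")}
print(hamming_distance(A1, A2))  # one shared out of three distinct correspondences
```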
Precision and Recall
Precision and recall are computed by comparing a generated mapping alignment with a reference alignment: precision shows how many of the discovered correspondences are correct, and recall shows how many of the expected correspondences are discovered rather than missed. Precision is the ratio of correctly found correspondences (true positives) to all found correspondences (true positives plus false positives), and measures the correctness of a mapping alignment. Recall is the ratio of correctly found correspondences (true positives) to the total number of expected correspondences (true positives plus false negatives), and measures the completeness of a mapping alignment.
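Given these definitions, precision and recall reduce to set operations once correspondences are represented as hashable tuples. This minimal sketch assumes a correspondence is an (entity1, entity2) pair and, as a convention chosen here, returns 0.0 when a denominator is empty; the identifiers are hypothetical.

```python
def precision_recall(found: set, reference: set) -> tuple:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    true_positives = len(found & reference)
    # len(found) = TP + FP, len(reference) = TP + FN
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical generated mapping and "locally provided" gold standard.
FOUND = {("A:1", "B:1"), ("A:2", "B:2"), ("A:3", "B:9"), ("A:4", "B:4")}
REFERENCE = {("A:1", "B:1"), ("A:2", "B:2"), ("A:5", "B:5")}
p, r = precision_recall(FOUND, REFERENCE)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Here 2 of the 4 found correspondences are correct, and 2 of the 3 expected correspondences are found.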
Logical consistency
The service will detect and report logical inconsistencies in mappings that can be inferred from the structure of the alignment between two ontologies, such as violations of disjointness axioms.
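One such structural check can be sketched without a full reasoner: if two classes declared disjoint in one ontology are both mapped as equivalent to the same class in the other ontology, the alignment forces them to be equivalent, so both become unsatisfiable. The sketch assumes equivalence mappings as (ontology 1 class, ontology 2 class) pairs and disjointness axioms as a set of two-element frozensets; the identifiers are hypothetical, and a production service would use an OWL reasoner to find all such conflicts.

```python
from itertools import combinations


def _conflicts(groups: dict, disjoint: set) -> list:
    """Within each group of classes mapped to one shared class, find disjoint pairs."""
    return [
        (x, y, shared)
        for shared, members in sorted(groups.items())
        for x, y in combinations(sorted(members), 2)
        if frozenset((x, y)) in disjoint
    ]


def disjointness_conflicts(mapping: set, disjoint1: set, disjoint2: set) -> list:
    """Report (class, class, shared class) triples that violate a disjointness axiom."""
    by_target, by_source = {}, {}
    for a, b in mapping:
        by_target.setdefault(b, set()).add(a)  # ontology 1 classes sharing an ontology 2 class
        by_source.setdefault(a, set()).add(b)  # ontology 2 classes sharing an ontology 1 class
    return _conflicts(by_target, disjoint1) + _conflicts(by_source, disjoint2)


# Hypothetical example: plant and animal are disjoint in ontology 1,
# yet both are mapped as equivalent to the same class in ontology 2.
MAPPING = {("A:plant", "B:organism"), ("A:animal", "B:organism"), ("A:cell", "B:cell")}
DISJOINT1 = {frozenset(("A:plant", "A:animal"))}
print(disjointness_conflicts(MAPPING, DISJOINT1, set()))
```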
2.5. Creation of new or improved pairwise mappings
The service will use the source ontologies and any existing mappings for the data domain(s) of interest, as described in section 2.2. These will be input to the service workflow, which must use "state of the art" matching algorithm(s) and expert curation to generate new ontology mappings or to improve existing ones. The service documentation should explain the methods used by the matching algorithm(s) and any additional data or information that is used. Another source of algorithm documentation is the Ontology Matching website (http://www.ontologymatching.org), which hosts the annual campaigns of the Ontology Alignment Evaluation Initiative (OAEI; e.g. http://oaei.ontologymatching.org/2016).
Algorithms that implement ontology matching techniques operate at the element level and/or the structure level of an ontology. Table 1 briefly describes the range of techniques available. This summary is derived from the book ‘Ontology Matching’ (2nd edition, 2013) by Euzenat and Shvaiko (http://book.ontologymatching.org), where further details can be found.
Table 1: Summary of techniques used by Ontology Matching algorithms
Level | Technique | Description |
Element-level | String-based | Often used to match names, identifiers and name descriptions of ontological entities. |
Element-level | Language-based | Consider names as words in some natural language such as English. |
Element-level | Constraint-based | Deal with internal constraints applied to definitions of entities such as types, multiplicity of attributes and keys. |
Element-level | Informal resource-based | Deduce relations between ontology entities based on how they relate to each other. |
Element-level | Formal resource-based | Make use of formal resources such as domain-specific ontologies, upper ontologies and linked data. |
Structure-level | Graph-based | Compare source ontologies (including database schemas and taxonomies) as nodes on labelled graphs. |
Structure-level | Taxonomy-based | Hierarchical classifications consider only the specialisation relation. |
Structure-level | Model-based | Match source ontologies based on semantic interpretation. |
Structure-level | Instance-based | Compare sets of instances of classes to decide if they match or not. |
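As an illustration of the first row of Table 1, a string-based element-level matcher can be sketched with Python's standard `difflib.SequenceMatcher`, which scores the similarity of two strings between 0 and 1. The normalisation step, the 0.85 threshold and the example identifiers are assumptions for illustration only; state-of-the-art matchers combine many of the techniques in the table.

```python
import re
from difflib import SequenceMatcher


def normalise(label: str) -> str:
    """Lower-case and replace every run of characters other than a-z and 0-9 with one space."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", label.lower()).split())


def string_similarity(label1: str, label2: str) -> float:
    """Similarity in [0, 1] between two normalised labels."""
    return SequenceMatcher(None, normalise(label1), normalise(label2)).ratio()


def best_matches(labels1: dict, labels2: dict, threshold: float = 0.85) -> list:
    """For each entity in labels1, keep its most similar entity in labels2 if above threshold."""
    if not labels2:
        return []
    matches = []
    for id1, label1 in labels1.items():
        score, id2 = max((string_similarity(label1, label2), id2) for id2, label2 in labels2.items())
        if score >= threshold:
            matches.append((id1, id2, round(score, 3)))
    return matches


# Hypothetical labels from two source ontologies.
LABELS1 = {"A:mi": "Myocardial infarction", "A:asthma": "Asthma"}
LABELS2 = {"B:0001": "myocardial-infarction", "B:0002": "Diabetes mellitus"}
print(best_matches(LABELS1, LABELS2))
```

With these labels only the myocardial infarction pair passes the threshold; "Asthma" has no sufficiently similar label and is left unmatched.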
2.6. Aggregation of pairwise mappings
The service should offer the option to aggregate pairwise mappings to increase coverage across a particular data domain, should this be required by a client.
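One possible way to aggregate pairwise mappings, sketched here under the assumption that every correspondence is an exact equivalence, is to merge them into clusters of equivalent classes across all ontologies using a union-find structure. Because equivalence is transitive, a class in ontology A mapped to ontology B, and from there to ontology C, ends up in a single cluster; close or broad similarity is not transitive, so such correspondences should not be chained in this way. The identifiers are hypothetical.

```python
def equivalence_clusters(pairwise_mappings: list) -> list:
    """Merge pairwise equivalence mappings into clusters using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for mapping in pairwise_mappings:
        for a, b in mapping:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical pairwise mappings: ontology A to B, and ontology B to C.
A_TO_B = {("A:1", "B:1"), ("A:2", "B:2")}
B_TO_C = {("B:1", "C:1")}
print(equivalence_clusters([A_TO_B, B_TO_C]))
```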
2.7. Evaluate the quality of new mappings
The service will evaluate the quality of new mappings using the measures already described in section 2.4.
2.8. Generate metadata to represent the methodology
The service will annotate the mapping with suitable metadata and documentation. Its purpose is to make the methodology easier to understand and to enable reproducibility and reuse.
The metadata for ontology mapping alignment will include:-
- date of rendering the mapping alignment
- method for ontology matching with version and parameters
- description of purpose
- creator name
- curator identity and multiple contributions
- alignment measures
- manual curation and confidence
- any limitations of use
- names of the aligned source ontologies
- type of alignment (e.g. 1:1 or *:* correspondence)
- seed alignment from which the alignment is derived
- the application context
- relevant external sources and references
- any dependency across the mapping alignment
- context and application supported by the mapping
- link to a community resource to support the mapping
- provenance and rationale for any manual curation
- an ability to extend the metadata model, as required
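A subset of the metadata listed above could be captured as a simple record, sketched here as a Python dataclass serialised to JSON. The field names, the hypothetical matcher name "ExampleMatcher" and the `extensions` dictionary (one way to meet the requirement to extend the metadata model) are assumptions, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class AlignmentMetadata:
    rendered_on: date                 # date of rendering the mapping alignment
    method: str                       # method for ontology matching ...
    method_version: str               # ... with its version ...
    parameters: dict                  # ... and its parameters
    purpose: str                      # description of purpose
    creator: str                      # creator name
    source_ontologies: tuple          # names of the aligned source ontologies
    alignment_type: str = "1:1"       # e.g. "1:1" or "*:*"
    curators: list = field(default_factory=list)    # curator identities and contributions
    limitations: str = ""             # any limitations of use
    extensions: dict = field(default_factory=dict)  # room to extend the metadata model

    def to_json(self) -> str:
        record = asdict(self)
        record["rendered_on"] = self.rendered_on.isoformat()  # JSON has no date type
        return json.dumps(record, indent=2, sort_keys=True)


meta = AlignmentMetadata(
    rendered_on=date(2016, 12, 15),
    method="ExampleMatcher",  # hypothetical matcher name
    method_version="1.0",
    parameters={"threshold": 0.85},
    purpose="Illustrative example",
    creator="Service provider",
    source_ontologies=("onto1", "onto2"),
    curators=["curator-1"],
    extensions={"seed_alignment": "none"},
)
print(meta.to_json())
```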
2.9. Monitor the existing source ontologies or mappings
The service will monitor the ontology and mapping sources for changes every two weeks. If a change is found, its quality will be evaluated as described in sections 2.3 and 2.4. If the quality is sufficient, the service will proceed to the subsequent steps of the workflow to generate improved mappings.
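Change detection for this fortnightly monitoring can be sketched by comparing a SHA-256 digest of each source's current content with the digest stored at the previous check. Fetching the sources is omitted and the source names are hypothetical. A byte-level digest flags any change, including reformatting, so a service might instead, or additionally, compare version information such as `owl:versionIRI`. Sources with no stored digest are also reported, which covers the new resources sought in section 3.0.

```python
import hashlib
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(weeks=2)  # "every two weeks"


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a source file's bytes."""
    return hashlib.sha256(content).hexdigest()


def check_due(last_checked: datetime, now: datetime) -> bool:
    """True once at least two weeks have passed since the previous check."""
    return now - last_checked >= CHECK_INTERVAL


def changed_sources(current: dict, known_digests: dict) -> list:
    """Names of sources whose digest differs from the stored one, or that are new."""
    return sorted(
        name for name, content in current.items()
        if known_digests.get(name) != digest(content)
    )


# Hypothetical snapshot: onto2 has changed since the last check and onto3 is new.
KNOWN = {"onto1": digest(b"version 1"), "onto2": digest(b"version 1")}
CURRENT = {"onto1": b"version 1", "onto2": b"version 2", "onto3": b"version 1"}
if check_due(datetime(2016, 12, 1), datetime(2016, 12, 15)):
    print(changed_sources(CURRENT, KNOWN))  # sources to re-evaluate (sections 2.3 and 2.4)
```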
3.0. Seek new source ontologies or mappings
The service will seek new source ontologies or mappings that are relevant to the data domain(s) determined by the client(s). If new resources are found, their quality will be evaluated as described in sections 2.3 and 2.4. If the quality is sufficient, the service will proceed to the subsequent steps of the workflow to generate improved mappings.
3.1. Feedback and consultation
The service will include a mechanism for feedback and consultation with the client(s), so that clients can annotate their disagreement with matched elements in a mapping. This mechanism must make such annotations easy to provide. Each annotation will be submitted to the service provider for evaluation, as described in section 2.4. The service provider will then consult the client who provided the feedback to reach a resolution.
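The feedback cycle described above can be sketched as a small state machine: an annotation is submitted, evaluated by the service provider, discussed with the client and then resolved. The state names, fields and example values are hypothetical and only illustrate the flow.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    UNDER_EVALUATION = "under evaluation"  # provider evaluates as in section 2.4
    IN_CONSULTATION = "in consultation"    # provider consults the client
    RESOLVED = "resolved"


# Allowed transitions: each status has exactly one successor until resolution.
NEXT_STATUS = {
    Status.SUBMITTED: Status.UNDER_EVALUATION,
    Status.UNDER_EVALUATION: Status.IN_CONSULTATION,
    Status.IN_CONSULTATION: Status.RESOLVED,
}


@dataclass
class Feedback:
    client: str
    correspondence: tuple  # (ontology 1 entity, ontology 2 entity) being disputed
    comment: str
    status: Status = Status.SUBMITTED

    def advance(self) -> Status:
        """Move the annotation to the next step of the feedback cycle."""
        if self.status not in NEXT_STATUS:
            raise ValueError("feedback has already been resolved")
        self.status = NEXT_STATUS[self.status]
        return self.status


fb = Feedback("Client A", ("A:heart", "B:heart"), "Not equivalent in our usage")
while fb.status is not Status.RESOLVED:
    print(fb.advance().value)
```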
3.2. Access to the mappings
The service must support an appropriate security mechanism to manage access to the mappings. Such managed access will include the option to share a mapping if this is seen as beneficial to the client organisation, or if there is an opportunity to secure public funding. Another required option is to restrict access to a mapping using an acceptable security standard.
3.3. Ownership of the mappings
By agreement, some mappings will be open to the public domain and others will have restricted access. The mapping service must clearly indicate which of these cases applies to each mapping.
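The distinction between open and restricted mappings in sections 3.2 and 3.3 can be sketched as an explicit visibility flag on each mapping plus an access check. The class names, field names and organisations are hypothetical, and the sketch covers only the access decision; authentication and the choice of security standard are outside its scope.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"          # open for the public domain
    RESTRICTED = "restricted"  # access limited to the owner and named organisations


@dataclass
class MappingRecord:
    name: str
    owner: str
    visibility: Visibility
    shared_with: set = field(default_factory=set)


def can_access(mapping: MappingRecord, organisation: str) -> bool:
    """Access decision only; authentication is assumed to happen elsewhere."""
    if mapping.visibility is Visibility.PUBLIC:
        return True
    return organisation == mapping.owner or organisation in mapping.shared_with


public = MappingRecord("public-onto1-onto2", "Client A", Visibility.PUBLIC)
internal = MappingRecord("internal-onto1-public-onto2", "Client A", Visibility.RESTRICTED)
print(can_access(public, "Client B"), can_access(internal, "Client B"))
internal.shared_with.add("Client B")  # Client A agrees to share this mapping
print(can_access(internal, "Client B"))
```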