Pistoia Debates Webinar on Ontologies Mapping 23rd Feb 2017

Links to Video recording and Slideshare

Statistics from the webinar analytics report: 212 attendees from 396 registrants

Questions from the Q & A

Question | Questioner name | Answer | Panelist name
Q1. Can you elaborate on which minimal information standards you are looking at?

mmiller@systemsbiology.org

The standards generated by this project are the ontologies guidelines, mapping tool requirements and mapping service requirements. These are all accessible through this public wiki.

Q2. Is the OM project considering SiLA/AnIML analytical and instrument data ontologies and standards? Can these be considered in Phase 3? Is the OM project primarily focused on experimental ontologies, or does it also cover metadata of the instruments used for data generation, so that instrument metadata can be referenced for validation and experiment reproducibility?

devon.johnston@sila-standard.com

Ontologies in the Disease and Phenotype domain are selected by the funders. However, the project guidelines are applicable to ontologies in any data domain. Likewise, mappings can be found for any public ontologies hosted by EMBL-EBI, but their quality and value will not be assessed by Phase 3.

ian.harrow@pistoiaalliance.org
Q3. How do these mappings interact with OMOP? Or is this a competitive view?

mmeighu@celgene.com

The Pistoia Alliance Ontologies Mapping project is open to collaboration with public organisations such as OHDSI, who provide the OMOP common data model. Mappings produced by Phase 3 will be shared on an openly available website, such as this public wiki, at the appropriate time.

Q4. Roche is an EFPIA member of IMI1 (eTOX) and IMI2 (NexGETS). However, Martin didn't mention Roche's involvement which, in the case of eTOX, delivered the OntoBrowser which is designed to draw disparate terminologies together. Did Roche's 5-year involvement in this project have any impact on its Ontologies Mapping project and, if so, how?

philip.drew@pds-consultants.co.uk

We have been aware of the OntoBrowser provided by the eTOX IMI consortium. (I was still working at NIBR when everything started, and I also used the tool at that time.) We have not considered the tool for ontologies mapping at Roche because it has a clear focus on mappings related to the eTOX project (see http://www.etoxproject.eu/results.html), whereas we are looking for a generic approach. However, there is an indirect influence, as we have been aware of the mapping strategy of the tool. Internally, we make our ontology services available to all our business organizations, so we are hoping to bring some results from the Ontologies Mapping project into the business, although we have not connected with the NexGETS team so far.

Q5. For Martin and Yasmin: where is the greatest value of using semantics - data ingestion, integration or consumption?

merchant_ron@lilly.com

All three areas are important; however, the earlier the better, even planning for ontology use at study design! Semantic enrichment at integration probably provides the greatest measurable value because of the extra information (context) provided by adjacent datasets. Putting off curation until consumption is unwise for two reasons: it leads to repeated "data preparation", and it leaves users to select the datasets to consume without the benefit of semantics. The beauty of measurement, of course, is the ability to quantify where in the workflow the most value is provided (YAF).

Using semantics is an overarching principle for data management and data processing. Based on the assumption that we consider data a key asset in Research and Development, semantics contributes equally to data acquisition, data integration and data access/sharing. Applying semantic principles at ingestion facilitates integration downstream as well as value generation during consumption and analysis. (MR)

Q6. For diseases like cancer, where clinical ontologies vary from tumor type to tumor type, how can one work on normalizing attributes?

kundrar@mskcc.org

This is likely to benefit from collaboration with curators at the most relevant public ontology. Mapping between different source ontologies may also be part of the solution for building an application ontology.
Q7. While ingesting a new data source, is there a prerequisite to have a data model / business model or ontology first?

merchant_ron@lilly.com

For mapping between two ontologies, access to the source ontologies in OWL format is the only requirement. There is no need for any data model or business model.

ian.harrow@pistoiaalliance.org
Q8. For SciBite: with your enrichment, are you adding content/terms to HPO, MeSH directly?

jkranz@post.harvard.edu

Enrichment falls into 3 distinct areas: greater coverage for existing entities in ontologies, new entities in ontologies, and new ontologies in spaces not served by the public domain (often very company specific). As much as possible we feed new entities back to public sources, as having public IDs for these benefits everyone.

Q9. For SciBite - how accurate are the ontology enrichments? What are the recall and precision pre- and post- enrichment?

peter.mcquilton@oerc.ox.ac.uk

This is a very difficult thing to measure. We have BioCreative, mainly for genes, diseases and drugs. But even here, if you take some of the top-scoring open source tools and apply them to patents or internal documents, you can get vastly different results. Further, there are many ontologies and document types for which there is no published data. Thus, we rely on our users to evaluate precision/recall on a case-by-case basis, in their own context, as I think it's the only way to see what happens in "real life".

lee@scibite.com
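As a rough illustration of the case-by-case evaluation described above, precision and recall can be computed by comparing a tool's annotations against a manually curated gold standard for the same documents. This is a minimal sketch, not SciBite's method; the function name and the example identifiers are hypothetical.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted annotations against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (document, term identifier) annotations.
gold = {("doc1", "TERM:1"), ("doc1", "TERM:2"), ("doc2", "TERM:3"), ("doc2", "TERM:4")}
pre_enrichment = {("doc1", "TERM:1"), ("doc2", "TERM:9")}
post_enrichment = {("doc1", "TERM:1"), ("doc1", "TERM:2"), ("doc2", "TERM:3"), ("doc2", "TERM:9")}

print(precision_recall(pre_enrichment, gold))
print(precision_recall(post_enrichment, gold))
```

Running the same comparison on a user's own document types (patents, internal reports) is what makes the figures meaningful in that context.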

Q10. Does the ontology driven search also cover clinical trials? Can we search on the basis of indication + targeted mutation (keeping combination mutations in mind)?

kundrar@mskcc.org

That depends on the nature of the source data, but mining, for example, MEDLINE for clinical trial publication types and extracting normalised mutations linked to indications is indeed very straightforward to do.

Q11. For Simon: How do you distinguish between a sample based attribute and a patient based attribute? How do you connect the two?

kundrar@mskcc.org

Our definition of a biosample in the BioSamples database is very broad (https://www.ebi.ac.uk/biosamples/). We use ontologies to describe the metadata on a sample that can help us distinguish different types of attributes, but this is still a challenge. For example, we have a material type “individual” on this sample https://www.ebi.ac.uk/biosamples/samples/SAMEA2629548 that indicates it is a patient and links through to cell lines derived from this sample. We are also now looking to adopt other standards like PhenoPackets as a way of representing patient phenotypes associated with a biosample.
https://github.com/phenopackets/phenopacket-format/wiki/Overview-of-Phenotype-Exchange-Format
https://github.com/phenopackets/phenopacket-reference-implementation
jupp@ebi.ac.uk
Q12. Where is the greatest value of using semantics/ontologies: unstructured, structured or semi-structured?

merchant_ron@lilly.com

I would say all 3. The act of combining structured + semi/unstructured data through common ontologies has great potential to power many different data analyses. This is why ontologies are so important, providing the semantic glue to connect all these.

lee@scibite.com

Q13. Are there plans to aggregate the OBO Foundry and similar resources together with the ontologies hosted at EBI? (The aim would be to have ontologies harmonized in "ONE" format.)

matthias.negri@abbvie.com

This is essentially what the Ontology Lookup Service tries to do. It includes all the OBO ontologies and additional EBI ontologies and provides a common API based on REST/JSON for accessing them (http://www.ebi.ac.uk/ols/docs/api). OLS does the work to harmonise certain fields like synonyms, definitions etc. You can also download them all from OLS in the OWL format, and they will soon be available all together to query with SPARQL in our RDF platform (https://www.ebi.ac.uk/rdf/). We have also adopted a standard developed within the OBO community for metadata about an ontology; this provides a standard way to describe how you register and access an ontology: https://github.com/OBOFoundry/OBOFoundry.github.io#instructions-for-registry-curators

jupp@ebi.ac.uk
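A minimal sketch of how a client might use a REST/JSON service of this kind. The base URL, the `search` endpoint with its `q` and `ontology` parameters, and the response layout are assumptions for illustration and should be checked against the API documentation linked above. The sample response is invented so that the example runs without network access.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; verify against the OLS API documentation.
OLS_API_BASE = "https://www.ebi.ac.uk/ols/api"


def build_search_url(query, ontology=None, base=OLS_API_BASE):
    """Build a search URL for a free-text query, optionally restricted to one ontology."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return f"{base}/search?{urlencode(params)}"


def extract_terms(response_text):
    """Pull (IRI, label) pairs out of a JSON response with the assumed layout."""
    payload = json.loads(response_text)
    docs = payload.get("response", {}).get("docs", [])
    return [(doc.get("iri"), doc.get("label")) for doc in docs]


# Invented response in the assumed layout, used instead of a live call.
SAMPLE_RESPONSE = json.dumps({
    "response": {
        "numFound": 1,
        "docs": [{"iri": "http://example.org/onto/TERM_1", "label": "example term"}],
    }
})

url = build_search_url("example term", ontology="example")
terms = extract_terms(SAMPLE_RESPONSE)
print(url)
print(terms)
```

Keeping URL construction and response parsing in separate functions means only the two assumed details (endpoint and layout) need to change if the real service differs.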
Q14. What is the relationship between OLS and the other services and NCBO BioPortal out of Stanford, if any?

mmiller@systemsbiology.org

There are a number of ontology repositories, like OLS and BioPortal, all offering similar services. The differences are subtle in some cases, but for OLS at least our focus is on the data and allowing application developers and curators easy access to semantics. OLS contains a smaller set of ontologies that we think are relevant for our community, and our annotation tools aim for precision over recall to support automation of annotation. We collaborate closely with the team at Stanford to share ideas and improve our services, which should ultimately benefit everyone.

All of our services have REST APIs:

http://www.ebi.ac.uk/ols/docs/api
https://www.ebi.ac.uk/spot/zooma/docs/api.html 

The OxO mapping API will be announced soon...

jupp@ebi.ac.uk
Q15. How is EMBL-EBI's OLS different from BioPortal?

jkranz@post.harvard.edu

See the answer to the related Q14 directly above.

jupp@ebi.ac.uk

Q16. How do these EBI ontology tools compare to BioPortal tools with similar functions (such as annotation, ontology mapping)? Do the tools provide REST API access?

jiezheng@upenn.edu

See the answer to the related Q14 above.

jupp@ebi.ac.uk

Q17. For Lee - wonderful work being done at SciBite. Is there a plan to develop or integrate with cognitive processing such as IBM Watson?

monica.elrod@pfizer.com

I think this article (virtual-strategy.com/2016/12/06/why-semantics-data-linking-is-vital-to-artificial-intelligence/?utm_content=buffer44668&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) describes the role of semantic data as a resource to power AI/ML processes. We see our work as providing substrate for input into any computing engine for whatever customers wish to do. Most of our collaborations are driven by real-world customer use cases.

lee@scibite.com
Q18. For Simon - will OxO be distributed for mirror installation?

william.spooner@eaglegenomics.com

Yes, OxO is open source software and is straightforward to install locally. Here is the code, and we’ll soon add some documentation on how to get it set up: https://github.com/EBISPOT/OLS-mapping-service

jupp@ebi.ac.uk
Q19. How can we join this exciting community to share knowledge etc.?

mmeighu@celgene.com

Please contact Ian to join the Community of Interest.

Q20. Real world things typically have different identifiers in different datasets - don't we also need co-reference resolution for actual data integration?

andreas.thalhammer@roche.com

The pair of matched identifiers from the source ontologies could be sufficient to serve that function without the need to assign a new identifier. However, the new mappings database (OxO) hosted by EMBL-EBI and described by Simon may assign unique identifiers to matched pairs (=cross references) in mapping sets in future.

ian.harrow@pistoiaalliance.org
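To illustrate the first point, a set of matched identifier pairs can act as a lookup for joining records from two datasets without minting a new identifier. This is a sketch under stated assumptions: the identifiers, record fields and function name are hypothetical and do not reflect the OxO data model.

```python
from collections import defaultdict


def join_on_mappings(records_a, records_b, mappings):
    """Pair records from two datasets whose identifiers are linked by a mapping.

    mappings is an iterable of (id_in_a, id_in_b) cross references.
    """
    targets = defaultdict(set)
    for id_a, id_b in mappings:
        targets[id_a].add(id_b)

    by_id_b = defaultdict(list)
    for record in records_b:
        by_id_b[record["id"]].append(record)

    joined = []
    for record_a in records_a:
        # sorted() keeps the output order deterministic.
        for id_b in sorted(targets.get(record_a["id"], ())):
            for record_b in by_id_b.get(id_b, ()):
                joined.append((record_a, record_b))
    return joined


# Hypothetical cross reference between two source ontologies.
mappings = [("ONTO_A:001", "ONTO_B:xyz")]
dataset_a = [{"id": "ONTO_A:001", "sample": "s1"}, {"id": "ONTO_A:002", "sample": "s2"}]
dataset_b = [{"id": "ONTO_B:xyz", "patient": "p7"}]

pairs = join_on_mappings(dataset_a, dataset_b, mappings)
print(pairs)
```

Records with no cross reference (here the one annotated with `ONTO_A:002`) are simply left out of the join, which makes gaps in the mapping set easy to spot.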
Q21. How can mapped ontologies be applied to non-English content?

s.tobaben@elsevier.com

Translate each source ontology to a common language first, then map between the ontologies by expert curation and/or an ontology matching algorithm.
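The two steps in this answer can be sketched as follows: translate labels into a common language, then propose candidate mappings by exact match on normalised labels. The translation table stands in for whatever translation step is actually used, and all labels, identifiers and function names are hypothetical. Candidates from a simple matcher like this would still need expert curation.

```python
def normalise(label):
    """Lower-case a label and collapse whitespace so trivial differences do not block a match."""
    return " ".join(label.lower().split())


def translate_labels(terms, translations):
    """Translate term labels via a lookup table; labels without an entry are kept unchanged."""
    return {
        term_id: translations.get(normalise(label), label)
        for term_id, label in terms.items()
    }


def match_by_label(terms_a, terms_b):
    """Propose (id_a, id_b) candidate mappings where normalised labels are identical."""
    index_b = {}
    for term_id, label in terms_b.items():
        index_b.setdefault(normalise(label), []).append(term_id)

    candidates = []
    for id_a, label in terms_a.items():
        for id_b in index_b.get(normalise(label), []):
            candidates.append((id_a, id_b))
    return candidates


# Hypothetical source ontologies: one with German labels, one with English labels.
german_terms = {"DE:1": "Herzinfarkt", "DE:2": "Diabetes mellitus"}
english_terms = {"EN:10": "myocardial infarction", "EN:11": "diabetes mellitus"}
de_to_en = {"herzinfarkt": "myocardial infarction"}

translated = translate_labels(german_terms, de_to_en)
candidates = match_by_label(translated, english_terms)
print(candidates)
```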