AI Data Group 10 Jan 2018 Action Minutes

Date

Jan 10, 2018

Notes: Nick Lynch

Attendees

  • Afrozy, Frederik, Ian, Jabe, Bryn, Helena, Nick, Ted, Carmen, Evelina

Goals & Discussions

  • Comments since last meeting

  • Review the emerging business case (the document should be open for editing; please use a Google account if you can, and let me know if you cannot edit it)

  • Benchmark principles

    • Support datasets for comparative model build

    • Open data for key areas

  • Look at ideas for a data processing toolkit (Afrozy)

  • Project timing and phasing options:

    • Early delivery in 4-5 months, with a second delivery at year end?

  • Next steps, focusing on the business case and data use cases

    • Create one or more business cases for funding by Feb 2018, to pitch to groups and present at the London conference

Timing:

Selecting the business use case area(s) is our key priority.

Data Areas

Item

Summary

Use Cases

Where are the current pain points? We hope to identify these top-down from known problem areas, or by other means.

Data

We need to select areas that have enough data, where that data is accessible or can easily be made accessible.

Longer-term goal: encourage the best possible data access, in parallel with other initiatives.

Toxicology

Is this preclinical, clinical or post-marketing?

Can we link to ETox data sources and to other groups who are collecting data?

Instrument & Lab data

How best to include devices and sensors, not just from the lab but also wider data collection tools?

Possible focus on maintenance information and prediction of issues

Rare Disease areas

Are there suitable groups to partner with here?* Building on the hackathon example from 2017 (Friedreich's ataxia)

*Ian Harrow is a member of the Rare Diseases GO-FAIR group (Marco Roos and Barend Mons) at Leiden, NL. The inaugural meeting is next week.

There is potential value in publishing the use cases and discussion with that group.

Data Quality approaches for AI (Afrozy)

  • Intelligent domain discovery: classifying data fields by applying semantic labels to each column. Domains for columns can be inferred automatically from data patterns using supervised classification techniques.

  • Intelligent anomaly detection: statistical and machine learning approaches to detect data outliers and anomalies. Useful for flagging data quality issues upstream, long before they impact business processes downstream.

  • Intelligent data similarity: detecting duplicates, combining individual data fields into business entities, and propagating user tags across data sets using clustering or recommendation algorithms.

  • Auto-mapping: detecting master data entities across the enterprise and automatically mapping them to the master data model, applying the requisite transformations and quality rules. This can automate data integration from multiple partners and suppliers, improving data quality and efficiency.

  • Intelligent structure discovery: deriving structure from messy device and log data, converting non-standard, non-relational formats (web, IoT, logs) into data that can be used alongside other enterprise datasets. Automated structure discovery will also be useful for metadata extraction and for enriching and improving data.

These use cases can be built using supervised and unsupervised machine learning and recommendation algorithms.
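As a rough illustration of the statistical side of intelligent anomaly detection, the sketch below flags outliers in a single numeric column using a robust modified z-score (median and median absolute deviation). This is one common technique chosen here for illustration, not the approach proposed at the meeting; the function name `flag_outliers`, the 3.5 threshold and the sample readings are all hypothetical.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Illustrative sketch only (hypothetical helper): the median and the median
    absolute deviation (MAD) are used instead of the mean and standard
    deviation because they are less distorted by the outliers themselves.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is undefined;
        # fall back to flagging any value that differs from the median.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 scales the MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical instrument readings with one suspicious value (a likely unit or entry error).
readings = [7.1, 7.3, 7.0, 7.2, 7.4, 71.0, 7.1]
print(flag_outliers(readings))  # → [5]
```

Running a check like this as data arrives, rather than after it has been merged into downstream systems, is what allows quality issues to be flagged upstream.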

Action items

Document areas for AI community
All to read and comment on the business case document
Need to add use cases and examples
Planning an AI group meeting on 13 March in London; more details to follow

Data ideas: actions from the previous meeting

Jabe, Dennis, Frederik, Afrozy: best practice for data
Drawing on other industry examples, including from outside the life sciences
See Afrozy's section above
Industry Examples of AI usage: Terry, Ian
Quick review of current AI uses (Imaging, QSAR activity, others)
Lessons from the hackathon: Nick wrote to all the participants and is awaiting feedback