AI Data Group 10 Jan 2018 Action Minutes

Date

10 Jan 2018

Comments since last meeting
Review the emerging business case (let me know if you cannot edit; the document should be open, but please use a Google account if you can)
Benchmark principles
- Support datasets for comparative model build
- Open data for key areas
Look at ideas for data processing toolkit (Afrozy)
Project Timing & phasing options:
- Early delivery in 4-5 months, second delivery year end?
Next steps focusing on Business case and data use cases
- Create business case(s) for funding by Feb 2018, to pitch to groups and present at the London Conference

Selecting business use case area(s) is our key priority

Item	Summary
Use Cases	Where are the current pain points? We hope to identify these top down from known areas or in other ways
Data	Need to select areas that have enough data, where that data is accessible or can easily be made accessible. Longer-term goal: encourage best data access in parallel with other initiatives
Toxicology	Is this pre-clinical, clinical or post-marketing? Can we link to ETox data sources and other groups who are collecting data?
Instrument & Lab data	How best to include devices and sensors, not just from the lab but from wider data collection tools? Possible focus on maintenance information and prediction of issues
Rare Disease areas	Are there suitable groups to partner with here?* Building on the Hackathon example from 2017 (Friedreich's ataxia)

*Ian Harrow is a member of the Rare Diseases GO-FAIR group (Marco Roos and Barend Mons) at Leiden, NL. The inaugural meeting is next week.

There is potential value in publishing the use cases and discussion there

Intelligent domain discovery: classifying data fields by applying semantic labels to each column. Domains for columns can be inferred automatically from data patterns using supervised classification techniques
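
As a rough illustration of the idea, the sketch below labels a column by majority vote over its values. It is a rule-based stand-in, not the supervised classifier described above: the regular expressions, the label names and the `infer_domain` helper are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical patterns; a trained classifier would replace these rules.
PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]+", re.IGNORECASE),
    "numeric": re.compile(r"-?\d+(\.\d+)?"),
}


def label_value(value):
    """Return the first label whose pattern matches the whole value."""
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(value.strip()):
            return label
    return "text"


def infer_domain(column_values):
    """Label a column with the most common label among its values."""
    if not column_values:
        return "unknown"
    votes = Counter(label_value(v) for v in column_values)
    return votes.most_common(1)[0][0]


print(infer_domain(["2018-01-10", "2018-03-13", "n/a"]))
print(infer_domain(["12.5", "7", "-3"]))
```

Majority voting lets a column keep its label even when a few cells are missing or malformed.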

Intelligent anomaly detection: statistical and machine learning approaches to detect data outliers and anomalies. Useful for flagging data quality issues upstream – long before they impact business processes downstream
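
One simple statistical approach of this kind is the modified z-score, which measures each value against the median and the median absolute deviation (MAD). The sketch below is a minimal example under assumed inputs: the `flag_outliers` name, the 3.5 threshold and the sample readings are illustrative, not agreed by the group.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 scales the MAD so the score is comparable to a z-score.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Illustrative readings with one value recorded in the wrong unit.
readings = [36.6, 36.8, 36.7, 36.9, 36.5, 98.6, 36.7]
print(flag_outliers(readings))
```

Using the median rather than the mean keeps a single extreme value from hiding itself by inflating the spread.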

Intelligent Data Similarity: detecting duplicates, combining individual data fields into business entities, propagating user tags across data sets using clustering/recommendation algorithms
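
For the duplicate-detection part, a minimal sketch using string similarity from the Python standard library is shown below. It is a simple pairwise comparison rather than the clustering or recommendation algorithms mentioned above; the `likely_duplicates` helper, the 0.85 threshold and the supplier names are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case, drop commas and full stops, and collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def likely_duplicates(names, threshold=0.85):
    """Return (a, b, score) for each pair of names at or above threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Made-up supplier names for illustration.
suppliers = ["Acme Labs Ltd.", "ACME Labs Ltd", "Borealis Pharma"]
print(likely_duplicates(suppliers))
```

Comparing every pair is quadratic in the number of records, so larger datasets would need blocking or clustering first.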

Auto-mapping: detect master data entities across the enterprise and automatically map them to the master data model, applying the requisite transformations and quality rules. This can result in intelligent automation of data integration from multiple partners/suppliers, improving data quality and efficiency

Intelligent Structure Discovery: derive structure from messy device and log data, converting non-standard, non-relational formats (web, IoT, logs) to useful data which can be used with other enterprise datasets. Automated structure discovery will also be useful for metadata extraction, enriching and improving data.
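
To show what the target output might look like, the sketch below turns one device log line into a structured record. The log layout, field names and `parse_line` helper are hypothetical, and the pattern is fixed by hand; automated structure discovery would aim to learn such a pattern instead of hard-coding it.

```python
import re

# Assumed layout: ISO timestamp, device id, level, free-text message.
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<device>\S+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)"
)
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Return a dict for a matching log line, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Pull key=value pairs out of the message into their own mapping.
    record["fields"] = dict(KEY_VALUE.findall(record["message"]))
    return record


sample = "2018-01-10T09:15:02 centrifuge-07 WARN rotor_temp=41.5 rpm=12000 vibration high"
print(parse_line(sample))
```

Records in this form can be loaded into a table and joined with other datasets on the device identifier or timestamp.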

The techniques that can be applied to build these use cases are supervised/unsupervised machine learning and recommendation algorithms.

Document Areas for AI Community
Data Group Folder
Working Document
All to read and comment on business case document
- Need to add use cases and examples
Planning an AI group meeting on 13 March in London, more details
Data ideas actions from the previous meeting
- CJ - Clinical Data from Wearables and their analysis compared to current challenges of paper recording
- Afrozy - Commercial data scenarios but also the data best practice applied to that
- Ted - Sepsis prediction, Morphology, Bone suppression
- Ian - Rare Disease GO-FAIR network - opportunity for AI use case?
Jabe, Dennis, Frederik, Afrozy - Best Practice for data
- Drawing on other industry examples, including non-Life Sciences
- See Afrozy section above
Industry Examples of AI usage: Terry, Ian
Quick review of current AI uses (Imaging, QSAR activity, others)
Lessons from Hackathon - Nick wrote to all the participants, awaiting feedback