AI Data Group 10 Jan 2018 Action Minutes
Date
Jan 10, 2018
Notes: Nick Lynch
Attendees
Afrozy, Frederik, Ian, Jabe, Bryn, Helena, Nick, Ted, Carmen, Evelina
Goals & Discussions
Comments since last meeting
Review the emerging business case (let me know if you cannot edit it; it should be open, but please use a Google account if you can)
Benchmark principles
Support datasets for comparative model build
Open data for key areas
Look at ideas for data processing toolkit (Afrozy)
Project Timing & phasing options:
Early delivery in 4-5 months, second delivery at year end?
Next steps focusing on Business case and data use cases
Create business case(s) for funding by Feb 2018, to pitch to groups and present at the London Conference
Timing:
Selecting the business use case area(s) is our key priority
Data Areas
| Item | Summary |
|---|---|
| Use Cases | Where are the current pain points? We hope to identify these top-down from known areas or in other ways. |
| Data | Need to select areas that have enough data, where that data is accessible or can easily be made accessible. Longer-term goal: encourage best data access in parallel with other initiatives. |
| Toxicology | Is this pre-clinical, clinical or post-marketing? Can we link to ETox data sources and other groups who are collecting data? |
| Instrument & Lab data | How best to include devices and sensors, not just from the lab but from wider data collection tools? Possible focus on maintenance information and prediction of issues. |
| Rare Disease areas | Are there suitable groups to partner with here?* Building on the Hackathon example from 2017 (Friedreich's ataxia). |
*Ian Harrow is a member of the Rare Diseases GO-FAIR group (Marco Roos and Barend Mons) at Leiden, NL. The inaugural meeting is next week.
There is potential value in publishing the use cases and discussion there
Data Quality approaches for AI (Afrozy)
Intelligent domain discovery: classifying data fields by applying semantic labels to each column. Domains for columns can be inferred automatically from data patterns using supervised classification techniques
Intelligent anomaly detection: statistical and machine learning approaches to detect data outliers and anomalies. Useful for flagging data quality issues upstream – long before they impact business processes downstream
Intelligent Data Similarity: detecting duplicates, combining individual data fields into business entities, propagating user tags across data sets using clustering/recommendation algorithms
Auto-mapping: detect master data entities across the enterprise and automatically map them to the master data model, applying the requisite transformations and quality rules. This can enable intelligent automation of data integration from multiple partners/suppliers, improving data quality and efficiency
Intelligent Structure Discovery: derive structure from messy device and log data, converting non-standard, non-relational formats (web, IoT, logs) into useful data that can be used with other enterprise datasets. Automated structure discovery will also be useful for metadata extraction, and for enriching and improving data.
The techniques that can be applied to build these use cases are supervised/unsupervised machine learning and recommendation algorithms.
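As an illustration of the anomaly detection idea above, here is a minimal sketch of one simple statistical approach: flagging outliers with a modified z-score based on the median absolute deviation. This was not discussed in the meeting. The function name `flag_outliers`, the 3.5 threshold and the sample readings are assumptions made for illustration only.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Hypothetical helper for illustration. The modified z-score is
    0.6745 * |x - median| / MAD, where MAD is the median absolute deviation.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return []
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Made-up instrument readings with one obvious spike at index 4.
readings = [20.1, 19.8, 20.3, 20.0, 35.7, 19.9, 20.2]
print(flag_outliers(readings))  # prints [4]
```

A median-based score is used here rather than mean and standard deviation because a single large spike distorts the median far less. A machine learning approach would replace the scoring step but could keep the same "flag upstream, review before use" workflow.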
Action items
Data ideas actions from the previous meeting