AI Data Group 10 Jan 2018 Action Minutes


Notes: Nick Lynch


  • Afrozy, Frederik, Ian, Jabe, Bryn, Helena, Nick, Ted, Carmen, Evelina

Goals & Discussions

  • Comments since last meeting

  • Review the emerging business case (it should be open for editing; let me know if you cannot edit, and please use a Google account if you can)
  • Benchmark principles
    • Support datasets for comparative model build
    • Open data for key areas
  • Look at ideas for data processing toolkit (Afrozy)
  • Project Timing & phasing options:
    • Early delivery in 4-5 months, second delivery year end?
  • Next steps focusing on Business case and data use cases
    • Create business case(s) for funding by Feb 2018, to pitch to groups and present at the London Conference


Selecting the business use case area(s) is our key priority

Data Areas

Use Cases

Where are the current pain points? We hope to identify these top down from known areas or in other ways.

We need to select areas that have enough data, where that data is accessible or can easily be made accessible.

Longer-term goal: encourage best data access in parallel with other initiatives.


Is this pre-clinical, clinical or post-marketing?

Can we link to ETox data sources and other groups who are collecting data?

Instrument & Lab data

How best to include devices and sensors, not just from the lab but from wider data collection tools?

Possible focus on maintenance information and prediction of issues

Rare Disease areas

Are there suitable groups to partner with here?* Building on the Hackathon example from 2017 (Friedreich's ataxia)

*Ian Harrow is a member of the Rare Diseases GO-FAIR group (Marco Roos and Barend Mons) at Leiden, NL. The inaugural meeting is next week.

There is potential value in publishing the use cases and discussion there

Data Quality approaches for AI (Afrozy)

  • Intelligent domain discovery: classifying data fields by applying semantic labels to each column. Domains for columns can be inferred automatically from data patterns using supervised classification techniques.
  • Intelligent anomaly detection: statistical and machine learning approaches to detect data outliers and anomalies. Useful for flagging data quality issues upstream, long before they impact business processes downstream.
  • Intelligent data similarity: detecting duplicates, combining individual data fields into business entities, and propagating user tags across data sets using clustering/recommendation algorithms.
  • Auto-mapping: detect master data entities across the enterprise and automatically map them to the master data model, applying the requisite transformations and quality rules. This can result in intelligent automation of data integration from multiple partners/suppliers, improving data quality and efficiency.
  • Intelligent structure discovery: derive structure from messy device and log data, converting non-standard, non-relational formats (web, IoT, logs) into useful data which can be used with other enterprise datasets. Automated structure discovery will also be useful for metadata extraction, enriching and improving data.
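
To make the anomaly detection idea concrete, here is a minimal sketch of one simple statistical approach, the modified z-score based on the median absolute deviation. It was not discussed in the meeting and is only an illustration: the function name, the 3.5 threshold and the sample readings are all assumptions invented for this example.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the baseline.
    The 3.5 cut-off is a commonly quoted rule of thumb, not a group decision.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical body temperature readings; one looks like it was
# recorded in Fahrenheit rather than Celsius.
readings = [36.6, 36.8, 36.7, 36.9, 36.5, 98.6, 36.7]
print(flag_outliers(readings))  # → [5]
```

A check like this could run when data is first loaded, so that suspect values are flagged for review before they reach a model build.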

The techniques that can be applied to build these use cases are supervised and unsupervised machine learning and recommendation algorithms.
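
As an illustration of the data similarity idea above, the following sketch flags likely duplicate records using plain string similarity from the Python standard library. This is a deliberately simple stand-in for the clustering/recommendation algorithms mentioned, not the approach proposed in the meeting; the supplier names, the `normalise` helper and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case, drop commas and full stops, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def likely_duplicates(records, threshold=0.85):
    """Return (i, j, score) for each pair of records whose similarity
    ratio is at or above threshold. Compares every pair, so this is only
    suitable for small lists."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Hypothetical supplier names with one obvious duplicate.
suppliers = ["Acme Labs Ltd.", "ACME Labs Ltd", "Borealis Pharma", "Acme Laboratories"]
for i, j, score in likely_duplicates(suppliers):
    print(suppliers[i], "<->", suppliers[j], score)
```

Pairs above the threshold would still need human review; the threshold trades missed duplicates against false matches and would have to be tuned on real data.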

Action items

  • Document Areas for AI Community
  • Data Group Folder
  • Working Document
  • All to read and comment on business case document

    • Need to add use cases and examples
  • Planning an AI group meeting on 13 March in London, more details
  • Data ideas actions from the previous meeting

    • CJ - Clinical Data from Wearables and their analysis compared to current challenges of paper recording

    • Afrozy - Commercial data scenarios but also the data best practice applied to that

    • Ted - Sepsis prediction, Morphology, Bone suppression

    • Ian - Rare Disease GO-FAIR network - opportunity for AI use case?
  • Jabe, Dennis, Frederik, Afrozy - Best Practice for data
    • Drawing other industry examples including non Life Sciences
    • See Afrozy's section above
  • Industry Examples of AI usage: Terry, Ian

  • Quick review of current AI uses (Imaging, QSAR activity, others)

  • Lessons from Hackathon - Nick wrote to all the participants, awaiting feedback