London 2019 AI Workshop Summary and Material

Thanks to all our speakers and panelists!

Here is the list of talks:

Mark Earll, Data Science Project Lead, Syngenta, “Chemometric and Machine Learning applications at Syngenta”
Colin Batchelor, The Royal Society of Chemistry, “Deep learning and chemical data”
Dennis Wang, University of Sheffield, “ML approaches for dissecting the pharmacological landscape of cancer populations”
Jonas Bostrom, Principal Scientist, AstraZeneca, “AI and Automation in Drug Design, For Real”
Chris Holmes, Professor, Oxford and Alan Turing Institute, “Reproducible AI in health research”
Al Dossetter, Medchemica Ltd, “Accelerating multiple medicinal chemistry projects using AI”
Lei Xie, Professor, CUNY, “Multi-scale drug action models”

In addition, we held multiple panel discussions and a series of short talks during an AI-themed breakout on the main conference day (March 13th).

The workshop agenda, speaker list, and slide decks (shared with the speakers' permission) have been archived.

The conference took place at the Hilton Paddington, London.

Key topics that emerged from the talks and the brainstorming session:

Good science must be reproducible, and multiple tools are available to help with data and model sharing. Chris Holmes of Oxford University and the Alan Turing Institute presented examples of reproducible AI research.
Data is everything in statistical modeling and artificial intelligence. AI models are particularly suitable for "rich data" scenarios.
It is becoming progressively easier and cheaper to produce and capture ever more data. This means that it is also cheap and easy to produce a lot of bad data. Good models cannot be built from bad data, so planning for data quality is a key capability of a scientific organization. A defined data life cycle is important.
- One important summary statement voiced from the audience: "We as IT practitioners invest heavily in talent and infrastructure. But without good data, the best talent and infrastructure are worthless. Yet investment in the data life cycle is insufficient."
- An organization can measure its data management maturity level using the model presented by Chris Willis of Accenture.
Model Versioning: Just like infrastructure and data, mathematical models also have life cycles. Brandon Allgood, co-founder of Numerate, a Bay Area AI start-up, pointed out that it is important to know what these life cycles are and to plan research efforts accordingly.
Regulatory approval of medical products and services based on "black box" AI recognition models is an upcoming challenge.

Notes from the brainstorming session flipchart

Quality:

- Playbook desired

- Need defined dimensions of data quality

- Need standards

- Access to raw data (leading to better reproducibility, etc.)

- Clear metadata

- Define data life cycle

- Biases known

- Invest in data collection and curation processes (and less so in infrastructure and hardware, as is common in the industry driven by engineers)

Please review the feedback given by our London attendees and CoE constituents and feel free to add your own: https://docs.google.com/forms/d/1t5yqA-yRsMZDM6F99LBB5_a3FEpMn9_xmAoo9_w6y6Y/edit