FAIR Annotation of BioAssays

This project aims to convert the biological assay protocols contained in research publications into a machine-readable FAIR (Findable, Accessible, Interoperable, Reusable) format.

Why is this important?

The main objective of the DataFAIRy assay annotation project has evolved. It began as NLP-driven annotation of biological assay protocols and is now the development and promotion of a metadata standard for annotating and reporting the full assay life cycle. Biological assay data are a popular target for post-hoc data mining during research program planning, but most assay descriptions are not in a FAIR form. Making assay information FAIR is expected to increase the efficiency of bench scientists engaged in experiment planning. It should also enable research that currently requires tedious expert literature review. A common data model can also simplify data sharing in collaborative research and the publishing of research results.

Project History

2020: Idea of FAIR metadata annotation (AZ, BMS, Roche, Novartis)
2021: Proof-of-Concept Experiment
- AbbVie joined
- Defined data structure, found NLP technology partner (Collaborative Drug Discovery)
- Curation model: “AI-in-the-loop” = NLP + review by vetted human experts
- Annotated a batch of 498 protocols and deposited them in PubChem
2022: Scale-up Experiment
- GSK joined
- Found high-throughput technology partner (Molecular Connections)
- Annotated over 2,300 protocols; the Collaborative Drug Discovery BioHarmony Annotator was used in the first two phases of the project
- Pistoia Alliance signed an agreement with the US FDA on in-vitro pharmacology standards
2023 - 2024: Push for the New Metadata Standard and Publication
- PA and the US FDA use a modified DataFAIRy annotation template for In-Vitro Pharmacology (IVP)
- Call for an industry standard for assay metadata (AZ, Roche, AbbVie, Novartis supporting)
- A paper describing earlier work on the metadata standard was published in SLAS Discovery
2024 - 2025: Define standards for the entire assay information value cycle
- Vision: (1) Register a protocol ---> (2) Instantiate ---> (3) Capture experimental data ---> (4) Report to regulators
- Picked immunoassays as the initial technology for data and metadata standardization

Why is this a good idea?

Interoperability with regulatory (FDA) data systems ---> streamlined regulatory submissions
Interoperability with vendor (CRO) data systems ---> increased efficiency for customers and more business for vendors
Interoperability between data systems in pharma companies ---> fewer silos, less miscommunication, and less waste
Quality across the board! Increased reproducibility of science
Streamlined publishing
…And yes, maybe cost savings

What will the project achieve?

We will create a data product that provides bioassay protocol information in a machine-readable FAIR form, ready for publishing, regulatory submission, incorporation into scientific reports, and data mining. Because the largest global pharmaceutical companies support this effort, the resulting data model, data ingestion business processes, and software are expected to become de facto industry standards that further facilitate data sharing and collaboration.

How will the project do this?

We conduct extensive business analysis to identify the topics of interest that the future data model should cover. We work with ontologists (most notably, the team behind the BioAssay Ontology, BAO) to define new standard terms when necessary. We combine Natural Language Processing with manual expert review (“AI-in-the-loop”) to produce assay annotations that fit this data model.

Project deliverables

To date, we have performed Proof-of-Concept and Scale-Up experiments and annotated close to 3,000 assay protocols. Our data model was published in SLAS Discovery. The resulting metadata standard is used by In-Vitro Pharmacology, another Pistoia Alliance project, which collaborates with the US FDA. Multiple Pistoia Alliance member firms have expressed interest in harmonizing their proprietary assay registration systems with this emerging standard. We are now developing data standards for the remaining parts of the assay life cycle: protocol instantiation, data capture, and reporting. We use the ELISA immunoassay as the initial use case.

Project Community

The DataFAIRy BioAssay Annotation Project Team:

Vladimir Makarov, Pistoia Alliance, Project Manager

Founding Team:

Isabella Feierberg, Jnana (formerly AZ)
Dana Vanderwall, Digital Lab Consulting (formerly BMS)
Anosha Siripala, Novartis
Martin Romacker, Roche
Samantha Jeschonek, Revvity (formerly CDD and PerkinElmer)

Current Steering Committee:

Timothy Ikeda, AstraZeneca
Rama Balakrishnan, Genentech
Roger Canales, Genentech
Yelena Budovskaya, Genentech
Chris Butler, AbbVie

Supporters:

Mark Musen, Stanford University
Ellen Berg, Alto Predict (formerly Eurofins)
Sheryl Denker, Critical Path Institute (formerly Eurofins)
Christopher Southan, Univ. of Edinburgh; Fellow of the Royal Society of Biology, the Royal Society of Chemistry, and the British Pharmacological Society
Gabriel Backiananthan, Novartis
Wendy Zimmerman, Novartis
Stephen Schurer, Univ. of Miami and BAO
Jignesh Bhate, Molecular Connections
Evan Bolton, PubChem

Collaborative Drug Discovery (CDD):

Jason Harris
Alex Clark
Barry Bunin

In-Vitro Pharmacology Project Team:

Veronique Francois, Pistoia Alliance, Project Manager