FAIR Annotation of BioAssays

This project aims to convert the biological assay protocols contained in research publications into a machine-readable FAIR format.

Why is this important?

The main objective of the DataFAIRy assay annotation project has evolved from NLP-driven annotation of biological assay protocols to the development and promotion of a metadata standard for annotating and reporting the full assay life cycle. Biological assay data are widely used in post-hoc data mining for research program planning, but most assay descriptions are not in a FAIR form. Making assay information FAIR is expected to increase the efficiency of bench scientists engaged in experiment planning and to enable research that currently requires tedious expert literature review. A common data model can also simplify data sharing in collaborative research and the publishing of research results.

Project History

  • 2020: Idea of FAIR metadata annotation: AZ, BMS, Roche, Novartis

  • 2021: Proof of Concept Experiment

    • Abbvie joined

    • Defined data structure, found NLP technology partner (Collaborative Drug Discovery)

    • Curation model: “AI-in-the-loop” = NLP + vetted human expert review

    • Annotated a batch of 498 protocols and deposited in PubChem

  • 2022: Scale-up Experiment

    • GSK joined

    • Found high-throughput technology partner (Molecular Connections)

    • Annotated over 2,300 protocols. The Collaborative Drug Discovery BioHarmony Annotator was used in the first two phases of the project

    • Pistoia Alliance signed an agreement with the US FDA on in-vitro pharmacology standards

  • 2023 - 2024: Push for the new Metadata Standard and Publish

    • Pistoia Alliance (PA) and the US FDA use a modified DataFAIRy annotation template for in-vitro pharmacology (IVP)

    • Call for an industry standard for assay metadata (AZ, Roche, Abbvie, Novartis supporting)

    • A paper describing earlier work on the metadata standard published in SLAS Discovery

  • 2024 - 2025: Define standards for the entire assay information value cycle

    • Vision: (1) Register a protocol ---> (2) Instantiate ---> (3) Capture experimental data ---> (4) Report to regulators

    • Picked immunoassays as the initial technology for data and metadata standardization

Why is this a good idea?

  • Interoperability with regulatory (FDA) data systems ---> streamlined regulatory submissions

  • Interoperability with vendor (CRO) data systems ---> increased efficiency for customers and more business for vendors

  • Interoperability between data systems in pharma companies ---> eliminate silos, miscommunication and waste

  • Quality across the board! Increased reproducibility of science

  • Streamlined publishing

  • …And yes, maybe cost savings

What will the project achieve?

We will create a data product that provides bioassay protocol information in a machine-readable FAIR form, ready for publishing, regulatory submission, incorporation into scientific reports, and data mining. Because the largest global pharmaceutical companies support this effort, we expect the resulting data model, data ingestion business processes, and software to become de facto industry standards that further facilitate data sharing and collaboration.

How will the project do this?

We conduct extensive business analysis of the topics that the future data model must be able to represent. We work with ontologists (most notably those behind the BioAssay Ontology, BAO) to define new standard terms when necessary. We use Natural Language Processing combined with manual expert review (“AI-in-the-loop”) to produce assay annotations that fit this data model.
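As a rough illustration of the “AI-in-the-loop” curation model, the Python sketch below shows one way an NLP-suggested annotation could be held as pending until a human expert accepts or overrides it. The class, the function names, and the example field names and terms are all hypothetical and are not taken from the DataFAIRy template or from any CDD or Molecular Connections software. In a real pipeline the terms would be ontology identifiers (for example from BAO) rather than plain labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    """One metadata field of an assay protocol (hypothetical structure)."""
    field_name: str                      # metadata field being annotated
    suggested_term: str                  # term label proposed by the NLP step
    reviewed_term: Optional[str] = None  # filled in only after expert review

    @property
    def is_approved(self) -> bool:
        # An annotation counts as approved once an expert has reviewed it.
        return self.reviewed_term is not None


def review(annotation: Annotation, expert_term: Optional[str] = None) -> Annotation:
    """Record the expert's decision: accept the suggestion or override it."""
    if expert_term is not None:
        annotation.reviewed_term = expert_term
    else:
        annotation.reviewed_term = annotation.suggested_term
    return annotation


def pending(annotations: List[Annotation]) -> List[Annotation]:
    """Return the annotations that still await expert review."""
    return [a for a in annotations if not a.is_approved]


# Illustrative data only: two NLP suggestions for a single protocol.
protocol = [
    Annotation("assay format", "cell-based format"),
    Annotation("detection method", "fluorescence"),
]
review(protocol[0])                              # expert accepts the suggestion
review(protocol[1], expert_term="luminescence")  # expert overrides it
```

Keeping the NLP suggestion and the reviewed value in separate fields preserves a record of what the expert changed, which is one plausible way to audit the quality of the automated step.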

Project deliverables

To date, we have completed Proof-of-Concept and Scale-Up experiments and annotated close to 3,000 assay protocols. Our data model was published in the journal SLAS Discovery. The resulting metadata standard is used by another Pistoia Alliance project, In-Vitro Pharmacology, which collaborates with the US FDA. Multiple Pistoia Alliance member firms have expressed interest in harmonizing their proprietary assay registration systems with this emerging standard. We are now developing data standards for the remaining parts of the assay life cycle: protocol instantiation, data capture, and reporting. We use the ELISA immunoassay as the initial use case.

Project Community

The DataFAIRy BioAssay Annotation Project Team:

  • Vladimir Makarov, Pistoia Alliance, Project Manager

Founding Team:

  • Isabella Feierberg, Jnana (formerly AZ)

  • Dana Vanderwall, Digital Lab Consulting (formerly BMS)

  • Anosha Siripala, Novartis

  • Martin Romacker, Roche

  • Samantha Jeschonek, Revvity (formerly CDD and PerkinElmer)

Current Steering Committee:

  • Timothy Ikeda, AstraZeneca

  • Rama Balakrishnan, Genentech

  • Roger Canales, Genentech

  • Yelena Budovskaya, Genentech

  • Chris Butler, Abbvie

Supporters:

  • Mark Musen, Stanford University

  • Ellen Berg, Alto Predict (formerly Eurofins)

  • Sheryl Denker, Critical Path Institute (formerly Eurofins)

  • Christopher Southan, Univ. of Edinburgh, and Fellow of Royal Societies of Biology, Chemistry, and the British Pharmacological Society

  • Gabriel Backiananthan, Novartis

  • Wendy Zimmerman, Novartis

  • Stephen Schurer, Univ. of Miami and BAO

  • Jignesh Bhate, Molecular Connections

  • Evan Bolton, PubChem

Collaborative Drug Discovery (CDD):

  • Jason Harris

  • Alex Clark

  • Barry Bunin

In-Vitro Pharmacology Project Team:

  • Veronique Francois, Pistoia Alliance, Project Manager