FAIR Annotation of BioAssays

This project aims to convert the biological assay protocols contained in research publications into a machine-readable FAIR (Findable, Accessible, Interoperable, Reusable) format.

Why is this important?

The main objective of the DataFAIRy assay annotation project has evolved from NLP-driven annotation of biological assay protocols to the development and promotion of a metadata standard for such annotations. Biological assay data are widely used in post-hoc data mining for research program planning, but most assay descriptions are not in a FAIR form. Making assay information FAIR is expected to increase the efficiency of bench scientists engaged in experiment planning and to enable research that currently requires tedious expert literature review. A common data model can also simplify data sharing in collaborative research and the publishing of research results. Most pharmaceutical companies already run internal programs to convert assay protocols into a FAIR format, so the effort of annotating public assay protocols is duplicated across these programs. Shifting the annotation of these public assays to a collaborative project would result in immediate cost savings for the member companies.

What will the project achieve?

We will create a data product that provides bioassay protocol information in a machine-readable FAIR form ready for data mining. Because the largest global pharmaceutical companies support this effort, we expect the resulting data model, data ingestion business processes, and software to become de facto industry standards that further facilitate data sharing and collaboration.

How will the project do this?

We will use Natural Language Processing (NLP) in combination with manual review of automatically produced assay annotations (“AI-in-the-loop”). The Collaborative Drug Discovery BioHarmony Annotator was used in the first two phases of our project, and manual QC was provided by Molecular Connections. Both firms won these contracts through an open competitive bidding process.
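As a rough illustration of this workflow, the sketch below models machine-suggested annotations that a curator then accepts or corrects. It is a minimal sketch under stated assumptions, not the project's data model or the BioHarmony Annotator API: the `Annotation` class, its field names, the `review` function, and the example property and value labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """One machine-suggested property/value pair for an assay protocol.

    All field names here are hypothetical and chosen for illustration only.
    """
    prop: str                       # e.g. "assay format" (illustrative label)
    value: str                      # suggested term label
    confidence: float               # model score; a 0..1 scale is assumed
    status: str = "pending"         # "pending", "accepted", or "corrected"
    reviewer: Optional[str] = None  # who made the manual decision, if anyone


def review(annotations: List[Annotation],
           decisions: Dict[str, str],
           reviewer: str) -> List[Annotation]:
    """Apply a curator's decisions to the machine suggestions.

    `decisions` maps a property name to the value the curator chose.
    A matching value confirms the suggestion; a different value replaces it.
    Suggestions without a decision remain pending.
    """
    for ann in annotations:
        if ann.prop not in decisions:
            continue
        chosen = decisions[ann.prop]
        ann.status = "accepted" if chosen == ann.value else "corrected"
        ann.value = chosen
        ann.reviewer = reviewer
    return annotations


# Hypothetical machine output for one protocol.
suggestions = [
    Annotation("assay format", "cell-based", 0.91),
    Annotation("detection method", "fluorescence", 0.55),
    Annotation("organism", "human", 0.80),
]

# The curator confirms the first suggestion and corrects the second.
reviewed = review(
    suggestions,
    {"assay format": "cell-based", "detection method": "luminescence"},
    reviewer="curator-1",
)
for ann in reviewed:
    print(ann.prop, "->", ann.value, f"[{ann.status}]")
```

Recording the status and the reviewer on every annotation keeps the provenance of each value visible, so downstream users can tell confirmed and corrected values apart from suggestions that have not yet been reviewed.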

Project deliverables

To date, we have performed Proof-of-Concept and Scale-Up experiments and annotated close to 3,000 assay protocols. We are in the process of depositing these results into PubChem. We are also drafting a paper that reports our process and results; this paper was invited by the journal SLAS Discovery. The resulting metadata standard is used by another Pistoia Alliance project, In-Vitro Pharmacology, which collaborates with the US FDA. Multiple Pistoia Alliance member firms have expressed interest in harmonizing their proprietary assay registration systems with this emergent standard.

Project Community

Founding Team:

  • Isabella Feierberg, Jnana (formerly AZ)

  • Dana Vanderwall, Digital Lab Consulting (formerly BMS)

  • Anosha Siripala, Novartis

  • Martin Romacker, Roche

  • Samantha Jeschonek, PerkinElmer (formerly CDD)

Current Steering Committee:

  • Timothy Ikeda, AstraZeneca

  • Rama Balakrishnan, Genentech

  • Roger Canales, Genentech

  • Yelena Budovskaya, Genentech

  • Chris Butler, AbbVie

Supporters:

  • Mark Musen, Stanford University

  • Ellen Berg, Alto Predict (formerly Eurofins)

  • Sheryl Denker (formerly Eurofins)

  • Gabriel Backiananthan, Novartis

  • Wendy Zimmerman, Novartis

  • Stuart Chalk, BAO / U. of N. Florida

  • Paulo van Huffel, Ontoforce

  • Jignesh Bhate, Molecular Connections

  • Evan Bolton, PubChem

Collaborative Drug Discovery (CDD):

  • Jason Harris

  • Alex Clark