FAIR Annotation of BioAssays
This project aims to convert the biological assay protocols contained in research publications into a machine-readable FAIR format.
Why is this important?
The main objective of the DataFAIRy assay annotation project has evolved from NLP-driven annotation of biological assay protocols to the development and promotion of a metadata standard for annotating and reporting the full assay life cycle. Biological assays are a popular data type for post-hoc data mining in research program planning, but most assay descriptions are not FAIR. Making assay information FAIR is expected to increase the efficiency of bench scientists engaged in experiment planning and to enable research that currently requires tedious expert literature review. A common data model can also simplify data sharing in collaborative research and the publishing of research results.
Project History
2020: Idea of FAIR metadata annotation (AZ, BMS, Roche, Novartis)
2021: Proof-of-Concept Experiment
AbbVie joined
Defined the data structure; found an NLP technology partner (Collaborative Drug Discovery)
Curation model: “AI-in-the-loop” = NLP + vetted human expert review
Annotated a batch of 498 protocols and deposited them in PubChem
2022: Scale-up Experiment
GSK joined
Found a high-throughput technology partner (Molecular Connections)
Annotated over 2,300 protocols. The Collaborative Drug Discovery BioHarmony Annotator was used in the first two phases of the project
Pistoia Alliance signed an agreement with the US FDA on in-vitro pharmacology standards
2023–2024: Push for the new metadata standard and publication
Pistoia Alliance (PA) and the US FDA use a modified DataFAIRy annotation template for the In-Vitro Pharmacology (IVP) project
Call for an industry standard for assay metadata (supported by AZ, Roche, AbbVie, Novartis)
Published a paper describing earlier work on the metadata standard in SLAS Discovery
2024–2025: Define standards for the entire assay information value cycle
Vision: (1) Register a protocol ---> (2) Instantiate ---> (3) Capture experimental data ---> (4) Report to regulators
Picked immunoassays as the initial technology for data and metadata standardization
Why is this a good idea?
Interoperability with regulatory (FDA) data systems ---> streamlined regulatory submissions
Interoperability with vendor (CRO) data systems ---> increased efficiency for customers and more business for vendors
Interoperability between data systems in pharma companies ---> eliminate silos, miscommunication and waste
Quality across the board! Increased reproducibility of science
Streamlined publishing
…And yes, maybe cost savings
What will the project achieve?
We will create a data product that provides bioassay protocol information in a machine-readable FAIR form, ready for publishing, regulatory submission, incorporation into scientific reports, and data mining. Because several of the largest global pharmaceutical companies support this effort, the resulting data model, data ingestion business processes, and software are expected to become de facto industry standards that further facilitate data sharing and collaboration.
How will the project do this?
We conduct an extensive business analysis to identify the topics of interest that the future data model must be able to capture. We work with ontologists (most notably the BioAssay Ontology, BAO, team) to define new standard terms when necessary. We combine Natural Language Processing with expert manual review ("AI-in-the-loop") to produce assay annotations that fit this data model.
Project deliverables
To date, we have performed Proof-of-Concept and Scale-Up experiments and annotated close to 3,000 assay protocols. Our data model was published in SLAS Discovery. The resulting metadata standard is used by another Pistoia Alliance project, In-Vitro Pharmacology, which collaborates with the US FDA. Multiple Pistoia Alliance member firms have expressed interest in harmonizing their proprietary assay registration systems with this emerging standard. We are now developing data standards for the remaining parts of the assay life cycle: protocol instantiation, data capture, and reporting. We use the ELISA immunoassay as the initial use case.
Project Community
The DataFAIRy BioAssay Annotation Project Team:
Founding Team:
Current Steering Committee:
Supporters:
Collaborative Drug Discovery (CDD):
In-Vitro Pharmacology Project Team: