The proposed approach would allow natural language queries to be translated into structured queries, executed over standardized data sources (such as Open Targets), and converted into human-readable outputs.
The project does not require participating companies to disclose any of their proprietary data. Instead, they can mine that data using private instances of the described pipeline.
One significant expected outcome is a set of lessons learned on best practices for deployment, prompt-tuning, and fine-tuning of LLMs, and on the limits of their applicability for research purposes. We will seek to publish these lessons for the benefit of the research community.
Another significant outcome is the open-source target discovery pipeline prototype itself.
Improved efficiency and accuracy in target discovery and validation.
Creation of a framework that can be used for other use cases:
- A model of project execution for other pre-competitive core model work.
- Additional prototypes for other common discovery tasks can be created if/when more suitable use cases are identified.
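
The three pipeline stages described at the start of this section (translate a natural language question into a structured query, execute it over a standardized data source, and render the result as readable text) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the function names `translate`, `execute`, `summarize`, and `answer` are hypothetical, the records are made-up placeholders rather than Open Targets data, and a keyword match stands in for the LLM translation step.

```python
from typing import Dict, List

# Hypothetical toy records standing in for a standardized data source.
# The identifiers are placeholders, not real Open Targets data.
TOY_ASSOCIATIONS: List[Dict[str, str]] = [
    {"target": "TARGET_A", "disease": "DISEASE_X"},
    {"target": "TARGET_B", "disease": "DISEASE_Y"},
    {"target": "TARGET_C", "disease": "DISEASE_X"},
]


def translate(question: str) -> Dict[str, str]:
    """Stand-in for the LLM step: map a question to a structured filter."""
    lowered = question.lower()
    for record in TOY_ASSOCIATIONS:
        if record["disease"].lower() in lowered:
            return {"disease": record["disease"]}
    return {}  # no recognizable disease in the question


def execute(query: Dict[str, str]) -> List[Dict[str, str]]:
    """Run the structured filter against the toy data source."""
    if not query:
        return []
    return [
        record
        for record in TOY_ASSOCIATIONS
        if all(record.get(key) == value for key, value in query.items())
    ]


def summarize(rows: List[Dict[str, str]]) -> str:
    """Convert result rows into a human-readable sentence."""
    if not rows:
        return "No matching targets found."
    targets = ", ".join(sorted(row["target"] for row in rows))
    return f"Targets associated with {rows[0]['disease']}: {targets}"


def answer(question: str) -> str:
    """Chain the three stages: translate, execute, summarize."""
    return summarize(execute(translate(question)))


if __name__ == "__main__":
    print(answer("Which targets are linked to disease_x?"))
```

Keeping each stage behind its own function means a private instance could, in principle, swap in a proprietary data source for `execute` without changing the other two stages.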

Alignment with the Pistoia Alliance Strategic Priorities

...

Define the most common research questions in target discovery and validation. Have the project team agree that these are indeed the core target discovery business questions, and rank them by vote according to perceived relative importance. If there are many such questions, select the top ones and agree on exactly how many. The following paper can serve as a starting point for listing relevant competency questions: https://www.sciencedirect.com/science/article/pii/S1359644613001542 (Failure to identify business questions, or picking too many or too few, is a project risk.)

...

Project Phases and Milestones

Phase	Milestones	Deliverables	Est. Date
Initiation	Project charter	A list of candidate pre-competitive projects; One or more projects selected by vote; Project charter is drafted for the winning idea; Raise minimal funds for the Elaboration phase	12/11/23 (Complete)
Elaboration	Development plan; Cost estimates	Risk analysis (see Risk Registry below); Technology analysis to address the identified risks; Work Breakdown Structure (WBS); Cost estimates; Time estimates; Gantt Chart for Construction with additional iterations as needed and a work schedule; Make feasibility decisions before committing to build	Q1 2024
Construction	Target discovery pipeline; Lessons learned published	Target discovery pipeline (detailed deliverables are not yet known); Lessons learned recorded and published	TBD
Transition	Sustainability achieved	Place the prototype into maintenance mode or outsource for continuous development by another organization (e.g. non-profit); Plan extension work, if any	TBD

Risk Registry

Description	Mitigation
Failure to identify business questions, or picking too many or too few	Establish a consensus on the minimal number of business questions
Open Targets may have no ready-to-use Knowledge Graph (KG) implementation and may not be convertible into a KG at reasonable cost	Technology research: review Open Targets; review preliminary work done at AbbVie; if no KG is available, estimate the conversion process; if estimates indicate infeasibility, this may become a gap
Failure to identify a suitable open LLM	Mitigation is not yet known; this represents a gap
Failure to download a large volume of data (at most, all of PubMed) for the prompt-tuning of the LLM	Mitigation is not yet known; this represents a gap
Failure to perform KG generation from text by an LLM	Technology research; if no ready-to-use technology exists, estimate bespoke development (tuning an existing LLM for this purpose)
Failure to perform local KG comparison with calculation of a score	Technology research; if no ready-to-use technology exists, estimate bespoke development; if estimates indicate infeasibility, this may become a gap
Failure to generate a proper query for a KG database system by an LLM	Technology research. Code generation by LLMs is a common task, so this risk may be seen as low
Failure to build a prototypical target discovery pipeline on the limited budget if technical difficulties mount	Schedule the project in phases. Aim to answer known unknowns and establish risk mitigation strategies early, in the Elaboration phase
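
To make the "local KG comparison with calculation of a score" risk more concrete, the sketch below shows one possible scoring approach. It is an assumption for illustration only: the charter does not specify a scoring method, and Jaccard similarity over sets of (subject, predicate, object) triples is simply a common, easy-to-explain baseline. The name `jaccard_score` and the example triples are hypothetical placeholders.

```python
from typing import Set, Tuple

# A knowledge-graph edge as (subject, predicate, object).
Triple = Tuple[str, str, str]


def jaccard_score(graph_a: Set[Triple], graph_b: Set[Triple]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two local graphs given as triple sets.

    Two empty graphs are treated as identical (score 1.0) by convention.
    """
    if not graph_a and not graph_b:
        return 1.0
    return len(graph_a & graph_b) / len(graph_a | graph_b)


if __name__ == "__main__":
    # Placeholder triples, e.g. one graph extracted from text by an LLM
    # and one drawn from a reference KG. Not real data.
    from_text = {
        ("TARGET_A", "associated_with", "DISEASE_X"),
        ("TARGET_B", "associated_with", "DISEASE_Y"),
    }
    from_reference = {
        ("TARGET_A", "associated_with", "DISEASE_X"),
        ("TARGET_C", "associated_with", "DISEASE_X"),
    }
    # One shared triple out of three distinct triples overall.
    print(jaccard_score(from_text, from_reference))
```

An exact-match score like this ignores synonyms and near-equivalent predicates, so a real comparison would likely need entity normalization first; that is part of what the technology research would have to establish.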

Project Stakeholders

Sponsors:

...

Version	Old Version 1	New Version 2
Changes made by	Vladimir Makarov	Vladimir Makarov
Saved on	Dec 18, 2023	Dec 20, 2023
