...
2024.02.12 sub-team call:
Recording: https://pistoiaalliance-org.zoom.us/rec/share/HMVULVgYhhm0afZyU1LJEcm7XmOPwYahu9-xCmeIgCemdwQ7DwRkU6WoubuX4yZd.BFvjGBf6-0qvu1a2?startTime=1707750476000
Passcode: =8T7nf=e
This sub-team is done.
Questions are defined. Some can be readily answered with the Open Targets KG. Good-enough questions are OK; not every question needs to be answerable right away.
Let us now take one or more of the easy-to-address (“green”) questions and feed them to the KG. Get a POC for technical and procedural feasibility.
No need to invest in harder questions now, and no need to expand the Open Targets KG.
Success criterion #1: compare the RAG answer to the opinion of a human scientist who is an expert on the topic of the question.
Success criterion #2: compare the RAG answer (obtained by asking the LLM a plain-language question) to the KG-derived answer produced by an expert data scientist.
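One possible way to make criterion #2 measurable, as a minimal sketch only: assume both the RAG answer and the KG-derived answer can be reduced to a set of entity identifiers, then score the overlap. The function name `compare_answers`, the `AnswerComparison` type, and the identifiers below are hypothetical and not something agreed on the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerComparison:
    precision: float  # share of RAG entities that the reference also lists
    recall: float     # share of reference entities that the RAG answer recovered
    jaccard: float    # size of the overlap relative to the union of both answers


def compare_answers(rag_entities: set[str], reference_entities: set[str]) -> AnswerComparison:
    """Score a RAG answer against a reference answer, both given as entity sets."""
    shared = rag_entities & reference_entities
    union = rag_entities | reference_entities
    precision = len(shared) / len(rag_entities) if rag_entities else 0.0
    recall = len(shared) / len(reference_entities) if reference_entities else 0.0
    jaccard = len(shared) / len(union) if union else 1.0
    return AnswerComparison(precision, recall, jaccard)


# Hypothetical identifiers, for illustration only (not real query results).
rag = {"TARGET_A", "TARGET_B", "TARGET_C"}
kg = {"TARGET_B", "TARGET_C", "TARGET_D", "TARGET_E"}
result = compare_answers(rag, kg)
print(result)
```

Criterion #1 (expert opinion) is harder to reduce to sets; the same scoring would apply only if the expert's answer is also recorded as a list of entities.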
Strategy: at the next large team call, ask the team to validate an approach that encases the Open Targets KG questions as multiple modules with tuned prompts for a RAG system, one module per question. But say we are successful in the POC: then what? We need a better vision of success, one that is exciting and complex.
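The "one module per question" idea could look roughly like the sketch below, under the assumption that each module pairs a tuned prompt template with its own retrieval step. `QuestionModule`, `stub_retrieve`, the module key `q1`, and the prompt wording are all hypothetical; the retrieval function is a placeholder and does not query the Open Targets KG.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QuestionModule:
    question_id: str
    prompt_template: str                  # tuned separately for each question
    retrieve: Callable[[str], list[str]]  # hypothetical KG lookup: entity -> facts

    def build_prompt(self, entity: str) -> str:
        """Fill the tuned template with facts retrieved for one entity."""
        facts = self.retrieve(entity)
        context = "\n".join(f"- {fact}" for fact in facts)
        return self.prompt_template.format(entity=entity, context=context)


def stub_retrieve(entity: str) -> list[str]:
    # Placeholder standing in for a real KG query; returns made-up text.
    return [f"placeholder fact 1 about {entity}", f"placeholder fact 2 about {entity}"]


# Registry with one entry per question; only one hypothetical module is shown.
MODULES = {
    "q1": QuestionModule(
        question_id="q1",
        prompt_template=(
            "Using only the context below, list what is known about {entity}.\n"
            "Context:\n{context}"
        ),
        retrieve=stub_retrieve,
    ),
}

prompt = MODULES["q1"].build_prompt("EXAMPLE_ENTITY")
print(prompt)
```

Keeping the template and the retrieval step together means a question can be tuned or replaced without touching the others.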
Creating many open-source APIs to public data sources is not exciting. Perhaps define a standard for such APIs?
The need for an API standard is one lesson learned.
This is valuable for KG software vendors: ease of “wiring in” additional data, including proprietary data sources
What is the volatility of the data sets that we want to eventually use? For rapidly developing data sources, continuous updates may be needed to accommodate ongoing changes in data source structure. So we need a data standard for rapidly evolving data sources, covering data increments (the easier use case) and completely new data dimensions and variables (the harder use case). Can a data source present itself to a RAG query system to automate data updates?
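To make the "data source presents itself" question concrete, here is a minimal sketch assuming each source publishes a small manifest per release (field names plus a record count). A RAG system could diff two manifests to tell the easier case (same structure, new data) from the harder case (new dimensions or variables). `SourceManifest`, `UpdateKind`, and `classify_update` are hypothetical names, not part of any existing standard.

```python
from dataclasses import dataclass
from enum import Enum


class UpdateKind(Enum):
    NO_CHANGE = "no_change"
    INCREMENT = "increment"          # same fields, new release or record count
    SCHEMA_CHANGE = "schema_change"  # fields were added or removed


@dataclass(frozen=True)
class SourceManifest:
    source_name: str
    release: str
    fields: frozenset[str]  # variables / dimensions the source exposes
    record_count: int


def classify_update(old: SourceManifest, new: SourceManifest) -> UpdateKind:
    """Decide how much work a new release implies for the consuming system."""
    if old.fields != new.fields:
        return UpdateKind.SCHEMA_CHANGE
    if old.release != new.release or old.record_count != new.record_count:
        return UpdateKind.INCREMENT
    return UpdateKind.NO_CHANGE


# Hypothetical manifests, for illustration only.
base = SourceManifest("example_source", "r1", frozenset({"gene", "disease"}), 100)
more_rows = SourceManifest("example_source", "r2", frozenset({"gene", "disease"}), 120)
new_dimension = SourceManifest(
    "example_source", "r3", frozenset({"gene", "disease", "tissue"}), 120
)

print(classify_update(base, more_rows))      # same structure, more data
print(classify_update(base, new_dimension))  # a new variable appeared
```

An increment could be ingested automatically, while a schema change would likely need a person (or a re-tuned prompt) before the new fields are used.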
Possible new risk: will LLMs be confused by similar data types from a multitude of sources?
A provider / vendor / expert ecosystem is needed - not just OpenAI ChatGPT with Medline in it.