...
2024.02.07 Recording (Passcode: L58@v7Dg) Slides | Slides from the talk by Sebastian Lobentanzer
2024.03.20 Recording (Passcode: LZ!jZT4z) Slides | Architecture diagram PNG file
2024.04.17 Recording (Passcode: Yn2!5qJK) Slides | Slides from the talk by Jon Stevens
2024.08.07 Recording (Passcode: %.1&ukfM) Slides | Includes a talk by Peter Dorr: SPARQL query code generation with LLMs
2024.09.04 Recording (Passcode: t3?B*?CX) Slides | Includes a talk by Oleg Stroganov on agents controlling the actions of LLMs | Slides from the talk by Oleg Stroganov
2024.10.16 Recording (Passcode: z&W8bGWL) Slides | Slides by Oleg Stroganov with an update
2024.10.30 Recording (Passcode: 2wJVC=?r) Slides | 2024.10.26 Rancho Bioscience update
2024.11.06 Recording (Passcode: @A4H&P1D) Slides | Slides by Oleg Stroganov with an update
2024.11.19 Email communication: Slides from the report by Oleg Stroganov
2024.12.04 Recording (Passcode: E.?p#b$9) Slides | Slides by Oleg Stroganov with an update
2024.12.18 Recording (Passcode: $O9uxYXy) Slides | Slides by Oleg Stroganov with an update
Github
https://github.com/PistoiaAlliance/LLM
Final Report
Lessons Learned
Modern LLMs have sufficient knowledge of biology embedded in them to answer almost any question we (humans) can think of. This is a source of problems: hallucinations are indistinguishable from true answers, and we cannot fully test the innate ability of the LLMs to translate natural language questions into structured queries (unless we obscure the terms with synonyms unknown to the LLM).
The highest-risk item is generation of the structured query (Cypher or SPARQL) from a plain-English request. Some publications estimate a success rate of about 48% on the first attempt.
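With roughly half of first attempts failing, a generate-validate-retry loop is a natural mitigation. A minimal sketch in Python, where `generate_query` (an LLM wrapper) and `validate` (e.g. a dry-run EXPLAIN against the database) are hypothetical caller-supplied functions, not part of the project's codebase:

```python
from typing import Callable, Optional

def query_with_retries(
    question: str,
    generate_query: Callable[[str], str],  # hypothetical LLM wrapper
    validate: Callable[[str], bool],       # e.g. EXPLAIN the query against the DB
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for a structured query until one validates, or give up."""
    for _ in range(max_attempts):
        candidate = generate_query(question)
        if validate(candidate):
            return candidate
    return None  # caller must handle the failure explicitly
```

Because generation is non-deterministic, re-asking the same question can succeed where the first attempt failed; the validator keeps malformed queries from ever reaching the database.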
The structure of the database used for queries matters. LLMs produce meaningful structured queries more easily for databases with a flat, simple structure.
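To make the schema effect concrete, here is the same question ("which genes are targeted by drug X?") expressed against two hypothetical schemas; both the labels and relationship names are illustrative, not the project's actual model:

```python
# Flat schema: a single direct relationship -- easy for an LLM to produce.
flat = "MATCH (d:Drug {name: $x})-[:TARGETS]->(g:Gene) RETURN g.symbol"

# Deeply modeled schema: several hops through intermediate nodes --
# the LLM must reconstruct the whole path correctly to answer the
# same question, which is where generation tends to fail.
nested = (
    "MATCH (d:Drug {name: $x})-[:HAS_MECHANISM]->(:Mechanism)"
    "-[:ACTS_ON]->(:Protein)<-[:ENCODES]-(g:Gene) RETURN g.symbol"
)
```

Every extra hop is another place for the model to pick a wrong label or reverse an arrow, which is one reason flatter schemas yield higher first-attempt success.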
The form of the prompt matters. LLMs produce meaningful answers more easily from prompts that resemble a story rather than a dry question, even if the details of the story are irrelevant to the main question asked.
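A small sketch of such narrative framing; the persona and wording are invented for illustration and are not a prompt the project prescribes:

```python
def story_prompt(question: str,
                 persona: str = "a biologist preparing a literature review") -> str:
    """Wrap a dry question in a short narrative frame. The framing
    details are deliberately generic -- per the observation above,
    they need not relate to the question itself."""
    return (
        f"I am {persona}. While going through our data I ran into a "
        f"question I could not resolve on my own: {question} "
        f"Please answer as specifically as you can."
    )
```

The dry question is embedded unchanged, so the downstream answer can still be checked against the original request.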
A practically useful system requires filtering or secondary mining of the output in addition to natural language narration.
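Such a filtering pass might be as simple as keeping only rows of the expected entity types and dropping duplicates before narration. A sketch under the assumption that result rows are dicts with "type" and "id" fields (an assumed shape, not the project's actual one):

```python
def filter_results(rows, allowed_types):
    """Secondary mining pass: keep rows of expected types and drop
    exact duplicates before handing results to the narration step."""
    seen, kept = set(), []
    for row in rows:
        key = (row.get("type"), row.get("id"))
        if row.get("type") in allowed_types and key not in seen:
            seen.add(key)
            kept.append(row)
    return kept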
It is extremely important to implement a reliable named entity recognition system. The same acronym can refer to completely different entities, which can be differentiated either from the context (hard) or by asking clarifying questions. The system must also map synonyms. Without these measures, naïve queries in a RAG environment will fail.
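The ambiguity problem can be sketched with a toy lexicon mapping surface forms to candidate entities; a real system would back this with curated ontologies and identifier-mapping services, and the entries below are purely illustrative:

```python
# Toy lexicon: surface form -> candidate entity IDs (illustrative only).
# "CAT" really is ambiguous in biomedical text: the catalase gene,
# the animal, the COPD Assessment Test, ...
LEXICON = {
    "CAT": ["gene:catalase", "organism:cat", "instrument:COPD-assessment-test"],
    "catalase": ["gene:catalase"],
}

def resolve(term: str) -> list:
    """Return candidate entities for a term. An empty list means the
    term is unknown; more than one candidate means the pipeline should
    ask the user a clarifying question instead of guessing."""
    return LEXICON.get(term, [])
```

The key design point is that ambiguity is surfaced to the caller rather than silently resolved, so the system can ask "did you mean the catalase gene?" before any query is generated.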
LLMs may produce different structured queries starting from the same natural language question. These queries may be semantically and structurally correct but may include assumptions about the limit on the number of items to return, or their order, or lack these clauses entirely. These variations are not deterministic; as a result, on different execution rounds the same natural language question may produce different answers. It is necessary to explicitly state the limits, ordering restrictions, and other parameters when asking the question, or to determine the user's intentions in a conversation with a chain of thought.

A related question is whether specifics in the implementation of usual RAG models with a vector database may introduce implicit restrictions on what data is explored by the LLM and what data is not, and thus artificially limit the answers. This may happen without the user knowing the restrictions (and perhaps even without the system's authors knowing that they introduced such restrictions, embedded in the specifics of the system architecture).
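One way to pin down the non-determinism on the system side is a post-processing step that adds an explicit ORDER BY and LIMIT whenever the generated Cypher omits them. A hedged sketch; the defaults `n.name` and `25` are illustrative (the sort key assumes the query binds a variable `n`), not project settings:

```python
import re

def pin_down(cypher: str, default_limit: int = 25,
             order_key: str = "n.name") -> str:
    """Make an LLM-generated Cypher query deterministic by adding an
    explicit ORDER BY (inserted before any existing LIMIT, as Cypher
    requires) and a LIMIT when either clause is missing."""
    q = cypher.strip().rstrip(";")
    if not re.search(r"\bORDER\s+BY\b", q, re.IGNORECASE):
        m = re.search(r"\bLIMIT\b", q, re.IGNORECASE)
        if m:  # ORDER BY must precede LIMIT in Cypher
            q = q[:m.start()] + f"ORDER BY {order_key} " + q[m.start():]
        else:
            q += f" ORDER BY {order_key}"
    if not re.search(r"\bLIMIT\b", q, re.IGNORECASE):
        q += f" LIMIT {default_limit}"
    return q
```

With both clauses made explicit, two generation rounds that produce semantically equivalent queries will at least return the same, identically ordered slice of results.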
Need for an API standard.
...