LLM Selection

This is the subpage for the LLM Selection sub-team: Jon Stevens, Etzard Stolte, Helena Deus, Brian Evarts, Wouter Franke, Matthijs van der Zee.

Notes from the January 24th general PM call:

08:40:27 From Brian Evarts (CPT) to Everyone:
Has anyone tried QLORA or other Quantization techniques for fine tuning?
08:42:05 From stevejs to Everyone:
@Brian we had a QLORA fine-tuned llama2 model that we fine-tuned to increase the sequence length. Quality was OK, but we haven’t used it in production because the model was pretty beefy and we need more infra to increase the speed of the model

Notes from the January 26th small team call:

Recording: https://pistoiaalliance-org.zoom.us/rec/share/0hrNcFRs8SWpojuIySGTrObp7Q_3HB-n_2lxaMKbXDA9tH_dGQ2VqRf0NvLaytl1.UEZyKTY0mUeOhFwc
Passcode: 4=UyzhM$
Transcript: https://pistoiaalliance-org.zoom.us/rec/share/lvJ6tFaEXpApRatSufu9O3KnS0uyDDM3Ojdu3ceCIpXngtSdnm7MglEAIRFP_fGW.pLfIg7Ka3u7gK1KG
Passcode: 4=UyzhM$
Private brainstorming document is at: https://docs.google.com/document/d/1ip5vmGuRXVey1Ml_uiSUURAUf-a6KMCepDC_ovrds4M/edit?usp=sharing
List of candidate LLMs with evaluation criteria: https://docs.google.com/spreadsheets/d/1muOE2zweNl9LvW1yIsUcJRy3gGTDe2C_/edit?usp=sharing&ouid=111803761008578493760&rtpof=true&sd=true

Notes from the February 1st small team call:

Recording: https://pistoiaalliance-org.zoom.us/rec/share/3vT1H30cX_zFgEUtJ828Spj158rN2oqOssBNDc6hC1mti2TQ5G-uxdoBZkN7I8GQ.xtvwbqf---G-qnut?startTime=1706799805000
Passcode: vW*uB7^2
Transcript: https://pistoiaalliance-org.zoom.us/rec/share/dcrTHAezwaqAQtBPJMUSgGAb5LUS8lTiqJquis3yeyo6U6SgTvPVk6dDZ0K6oNIU.7s_B5JDf98QIHXz1
Passcode: vW*uB7^2
The main action item is to add information to the list of candidate LLMs: https://docs.google.com/spreadsheets/d/1muOE2zweNl9LvW1yIsUcJRy3gGTDe2C_/edit?usp=sharing&ouid=111803761008578493760&rtpof=true&sd=true

Notes from the February 15th small team call:

Warning: BioCypher may not be W3C compliant. It needs discussion in the large team before adoption, or we should consider alternatives. So far this is the most important question.
- This team cannot make progress until we make the decision about BioCypher
Focus on smaller, cheaper models first? Pick a handful of models, at various size points, look up performance on general benchmarks
What is the task → that dictates the choice of the benchmarks
Verify that BioChatter has benchmarks for writing Cypher queries
How important is each benchmark? Perhaps create a linear model that combines multiple scores into a single score
Helena: This benchmark answers the question “what are the best embeddings” across a variety of tasks: https://huggingface.co/spaces/mteb/leaderboard
Convert this call into a weekly call at the same time on Thursdays for the next six weeks
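
The idea raised in the notes above, combining multiple benchmark scores into a single score with a linear model, could be sketched roughly as follows. This is only an illustration, not something the team agreed on: the benchmark names, the weights, the model names, and all score values are hypothetical placeholders, and the sketch assumes every benchmark score has already been normalized to the range 0 to 1.

```python
from typing import Dict, List, Tuple


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of benchmark scores (a simple linear model).

    Assumes each score is already normalized to [0, 1].
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing benchmark scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def rank_models(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (model, combined score) pairs, best first."""
    scored = [(model, combined_score(s, weights)) for model, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: how important each benchmark is for our task.
weights = {"general": 1.0, "cypher_generation": 2.0, "embedding": 1.0}

# Hypothetical models and made-up placeholder scores, NOT real results.
candidates = {
    "model_a": {"general": 0.6, "cypher_generation": 0.5, "embedding": 0.7},
    "model_b": {"general": 0.8, "cypher_generation": 0.3, "embedding": 0.6},
}

ranking = rank_models(candidates, weights)
for model, score in ranking:
    print(f"{model}: {score:.3f}")
```

The weights encode the answer to "how important is each benchmark?", so they would follow from the task definition; dividing by the total weight keeps the combined score on the same 0 to 1 scale as the inputs.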