...
This call was short and not recorded
The remaining items in the LLM comparison table are costs for the Llama models (Brian to look up) and the performance figures on BioCypher (here we are dependent on Sebastian and may have to wait)
Based on team members' experience on other projects, we expect that fine-tuning open-source models may depend heavily on the use case and may not be cost-effective
In that case GPT-4 would win
Notes from March 21st small team call:
Recording: https://pistoiaalliance-org.zoom.us/rec/share/tNABZ4XV54gHrEGC4O2aZsxA1UVm6qLlblc3pfGSOKDG8Hwv9cTt4BzRjybAlR_4.-3DaDIOxI2QHusxr Passcode: 8FhD=wtj
Transcript: https://pistoiaalliance-org.zoom.us/rec/share/dEvIc4DaaaxaLr7qW7iapr8cnWlezudOdQXW2LPIzPQmI8nwqoKRM95EJ2VtW3Jm.5f4X7Hwa2GxxrvH6 Passcode: 8FhD=wtj
The call focused on assigning relative weights to the comparison categories. The most important category appears to be accuracy, both in generating queries and in writing plain-text answers from structured input; this in turn requires awareness of biological terminology. Next comes whether the model is open-source, and finally the cost. The other factors are seen as collinear with these three.
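For illustration only, here is a minimal sketch of how the weights could be combined into one score per model once they are agreed. The weight values, the model names and the per-category scores below are all hypothetical placeholders, not figures from the spreadsheet, and a weighted average is just one possible way to combine them.

```python
# Hypothetical weights for the three categories named in the notes.
# The team has not agreed on values; these are placeholders.
WEIGHTS = {"accuracy": 0.5, "open_source": 0.3, "cost": 0.2}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted average of per-category scores (each in 0..1)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total


# Placeholder scores invented for this example (not measured results).
# For "cost", a higher score means cheaper.
candidates = {
    "model_a": {"accuracy": 0.9, "open_source": 0.0, "cost": 0.4},
    "model_b": {"accuracy": 0.7, "open_source": 1.0, "cost": 0.8},
}

# Rank the candidates from best to worst overall score.
ranking = sorted(
    candidates, key=lambda m: weighted_score(candidates[m]), reverse=True
)
for name in ranking:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

Dividing by the sum of the weights means suggested weights do not have to add up to 1, which may make it easier to collect proposals from several people.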
Homework: please review the spreadsheet and suggest values for the weights
Homework: action item for Brian: please fill in your columns in the spreadsheet
New risk identified: some proprietary LLMs, such as ChatGPT, are censored by their authors. This means that when answering scientific questions they may introduce bias that we cannot control. This is a strong argument in favor of uncensored, open-source LLMs.
Based on today's discussion, we have to take back last week's statement that, all else being equal, ChatGPT 4 would win.