
...

Recording: https://pistoiaalliance-org.zoom.us/rec/share/tNABZ4XV54gHrEGC4O2aZsxA1UVm6qLlblc3pfGSOKDG8Hwv9cTt4BzRjybAlR_4.-3DaDIOxI2QHusxr Passcode: 8FhD=wtj
Transcript: https://pistoiaalliance-org.zoom.us/rec/share/dEvIc4DaaaxaLr7qW7iapr8cnWlezudOdQXW2LPIzPQmI8nwqoKRM95EJ2VtW3Jm.5f4X7Hwa2GxxrvH6 Passcode: 8FhD=wtj
Focus on assigning relative weights. The most important categories appear to be accuracy (in generating queries and in writing plain-text answers from structured input), which in turn requires awareness of biological terminology; then whether the model is open-source; and finally cost. The other factors are seen as collinear with these.
Homework: please review the spreadsheet and suggest values for the weights
Homework: action item for Brian: please add information in your columns in the spreadsheet [DONE]
New risk identified: some proprietary LLMs, such as ChatGPT, are censored by their authors. When answering scientific questions they may therefore introduce bias that we cannot control. This is a strong argument in favor of uncensored, open-source LLMs.
Based on today's discussion, we have to retract last week's statement that, all else being equal, ChatGPT 4 would win.
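
To make the weighting exercise concrete, here is a minimal sketch of how weighted totals could be computed once weights are agreed. The weight values, the criterion keys, and the candidate names below are placeholders invented for illustration, not values from the spreadsheet. Scores are assumed to be normalized to the range 0 to 1, and a missing score is skipped with the remaining weights renormalized, which is only one of several possible ways to handle gaps.

```python
from typing import Dict, Optional

# Placeholder weights (assumption): real values are to be proposed in the spreadsheet.
WEIGHTS: Dict[str, float] = {"accuracy": 0.5, "open_source": 0.3, "cost": 0.2}


def weighted_score(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float] = WEIGHTS,
) -> Optional[float]:
    """Weighted mean over the criteria that have a score.

    A score of None marks a value that is not available; its weight is
    dropped and the remaining weights are renormalized.
    """
    available = {
        name: value
        for name, value in scores.items()
        if value is not None and name in weights
    }
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        return None  # nothing to score on
    return sum(weights[name] * value for name, value in available.items()) / total_weight


# Placeholder candidates with made-up scores, only to show the call shape.
candidates = {
    "model_a": {"accuracy": 1.0, "open_source": 0.0, "cost": 0.5},
    "model_b": {"accuracy": 0.8, "open_source": None, "cost": 0.4},
}

ranking = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name]) or 0.0,
    reverse=True,
)
```

Renormalizing over the available criteria keeps candidates with incomplete data comparable, but it also hides the gap, so it may be worth flagging such rows separately in the spreadsheet.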

Notes from March 28th small team call:

Recording: https://pistoiaalliance-org.zoom.us/rec/share/oxGWla7rTcksvYfxMt0NrepIGltJxS6aYo-UUeN5dYQ21F8rNr8IW9LLNQCO-T-Y.OyMU-x9CAyXjcieU Passcode: uwwr&H5A
Transcript: https://pistoiaalliance-org.zoom.us/rec/share/mQT2t0Z0mbcIr8Yq0y1sqsoeo_nByZoTWPw8EwubZDxihARk5mgT8D-Gk_1IYG0a.AjRl0-VIFJyQKNWw Passcode: uwwr&H5A
Prompt size may be important, and we increased its weight in the comparison table
Preferred architecture would allow for swapping of LLMs
Censorship effects are most likely already reflected in the performance scores, which discounts the censorship risk
Given that not all scores are available, we may end up having to do our own evaluation
Consider hosting platforms for open-source models (Amazon Bedrock) instead of renting servers at AWS
- Preference for hosted models with pay-per-token pricing
- Add this dimension to the spreadsheet ACTION for Jon Stevens
Review rankings - ACTION for Brian and Etzard
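
The preference for an architecture that allows swapping LLMs could look roughly like the sketch below. All names here (`LLMBackend`, `EchoBackend`, `UppercaseBackend`, `answer_question`) are hypothetical and do not refer to any existing project code. The stand-in backends do not call a real model; a real backend would wrap a hosted or self-hosted model behind the same method.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Hypothetical interface: anything with a complete() method can be plugged in."""

    def complete(self, prompt: str) -> str:
        ...


class EchoBackend:
    """Stand-in backend that returns the prompt with a tag; no model is called."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


class UppercaseBackend:
    """Second stand-in backend, used only to show that backends are interchangeable."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def answer_question(backend: LLMBackend, question: str) -> str:
    """Build the prompt once and delegate generation to whichever backend is passed in."""
    prompt = f"Question: {question}\nAnswer:"
    return backend.complete(prompt)


# Swapping the model means passing a different backend object; the caller is unchanged.
first = answer_question(EchoBackend(), "x")
second = answer_question(UppercaseBackend(), "x")
```

Keeping prompt construction outside the backend means the same evaluation questions can be run against each candidate model, which would also help if the team ends up doing its own evaluation.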
