A Blind LLM Taste Test

Luke Demi ran a fun session today! We tested five prompts against five different models, scored the results in a matrix, and ranked the models roughly in this order:

  1. Google Bard
  2. Anthropic Claude v2 (https://claude.ai)
  3. GPT-4
  4. Falcon Instruct (40b), self-hosted (https://huggingface.co/tiiuae/falcon-40b-instruct)
  5. Orca mini 3b, on my laptop (https://huggingface.co/psmathur/orca_mini_3b)

I was surprised at Google Bard's performance, as it seems to use internal Google APIs very effectively to provide real-time information.

You can find the matrix we used here:
https://docs.google.com/spreadsheets/d/1kdPH7Z3UKntfYbEeFfeix9USn1Lh8kDxMz6bVKtqkRs/edit?usp=sharing
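For anyone curious how a scoring matrix like that turns into a ranking, here's a minimal Python sketch. The model names match our lineup, but the per-prompt scores below are made-up placeholders (the real numbers live in the spreadsheet above), and the 1-10 scale is an assumption, not necessarily what we used.

```python
from statistics import mean

# Placeholder scores: one entry per prompt, per model.
# These are illustrative values only, not our actual results.
scores = {
    "Google Bard":           [8, 7, 9, 8, 7],
    "Anthropic Claude v2":   [7, 8, 8, 7, 7],
    "GPT-4":                 [8, 7, 7, 7, 6],
    "Falcon Instruct (40b)": [5, 6, 4, 5, 5],
    "Orca mini 3b":          [3, 4, 3, 2, 4],
}

# Rank models by their average score across all five prompts.
ranking = sorted(scores, key=lambda model: mean(scores[model]), reverse=True)

for rank, model in enumerate(ranking, start=1):
    print(f"{rank}. {model} (avg {mean(scores[model]):.1f})")
```

Averaging across prompts is just one way to aggregate; per-prompt winners or a rank-based vote could order the models differently.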