A Blind LLM Taste Test

Luke Demi ran a fun session today! We tested five prompts against five different models, scored the results in a matrix, and ranked the models roughly in this order:

  1. Google Bard
  2. Anthropic Claude v2 (https://claude.ai)
  3. GPT-4
  4. Falcon Instruct (40b), self-hosted (https://huggingface.co/tiiuae/falcon-40b-instruct)
  5. Orca mini 3b, on my laptop (https://huggingface.co/psmathur/orca_mini_3b)

I was surprised at Google Bard's performance, as it seems to use internal Google APIs very effectively to provide real-time information.

You can find the matrix we used here:
https://docs.google.com/spreadsheets/d/1kdPH7Z3UKntfYbEeFfeix9USn1Lh8kDxMz6bVKtqkRs/edit?usp=sharing
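For anyone curious how a scoring matrix like that turns into a ranking, here's a minimal Python sketch. The model names match our lineup, but the per-prompt scores below are made-up placeholders (the real numbers live in the spreadsheet above), and the 1-10 scale is an assumption, not necessarily what we used.

```python
from statistics import mean

# Placeholder scores: one entry per prompt, per model.
# These are illustrative values only, not our actual results.
scores = {
    "Google Bard":           [8, 7, 9, 8, 7],
    "Anthropic Claude v2":   [7, 8, 8, 7, 7],
    "GPT-4":                 [8, 7, 7, 7, 6],
    "Falcon Instruct (40b)": [5, 6, 4, 5, 5],
    "Orca mini 3b":          [3, 4, 3, 2, 4],
}

# Rank models by their average score across all five prompts.
ranking = sorted(scores, key=lambda model: mean(scores[model]), reverse=True)

for rank, model in enumerate(ranking, start=1):
    print(f"{rank}. {model} (avg {mean(scores[model]):.1f})")
```

Averaging across prompts is just one way to aggregate; per-prompt winners or a rank-based vote could order the models differently.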