Luke Demi ran a fun session today: we tested five prompts against five different models, scored the results in a matrix, and ranked the models roughly in this order:
- Google Bard
- [Anthropic Claude v2](https://claude.ai)
- GPT-4
- [Falcon Instruct (40B)](https://huggingface.co/tiiuae/falcon-40b-instruct), self-hosted
- [Orca Mini 3B](https://huggingface.co/psmathur/orca_mini_3b), run on my laptop (see the sketch after this list)
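For the two self-hosted entries, here is a minimal sketch of what local inference can look like with Hugging Face `transformers`, using the Orca Mini 3B checkpoint linked above. This is an assumption on my part about the setup (the session's actual harness may have differed), and the prompt is just a placeholder example:

```python
# Minimal local-inference sketch (assumes `transformers` and `torch` are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "psmathur/orca_mini_3b"  # the laptop-sized model from the ranking above

# Download (or load from cache) the tokenizer and model weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical example prompt, not one of the five we actually tested.
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short completion and print it.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern works for the Falcon 40B Instruct checkpoint by swapping `model_id`, though that one needs far more memory than a laptop typically has.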
I was surprised by Google Bard's performance; it seems to use internal Google APIs very effectively to provide real-time information.
You can find the matrix we used here:
https://docs.google.com/spreadsheets/d/1kdPH7Z3UKntfYbEeFfeix9USn1Lh8kDxMz6bVKtqkRs/edit?usp=sharing