A new analysis tool from UC Berkeley, the Chatbot Arena, pits two AI chatbots against each other to determine which one is the best.
Chatbot Arena is a new AI testing ground designed by UC Berkeley to figure out which language model is the best. The AI battleground pits two random AI models against each other, and you then vote on which one gave the better answer.
All of this is then tallied up in a leaderboard, where GPT-4, which powers ChatGPT, currently reigns supreme. Chatbot Arena currently houses 20 different language models, including open-source models from around the web.
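Under the hood, the leaderboard is driven by Elo ratings, the same scheme used to rank chess players: each vote nudges the winner's score up and the loser's down. Here is a minimal Python sketch of the idea; the model names, baseline rating of 1000, and K-factor of 32 are illustrative assumptions, not the Arena's actual parameters.

```python
# A minimal sketch of how pairwise votes can be tallied into a leaderboard
# using Elo ratings, the system Chatbot Arena's rankings are based on.
# The model names, starting rating, and K-factor are illustrative assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift ratings after one human vote: the winner gains what the loser drops."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain

# Every model starts from the same baseline rating.
ratings = {"gpt-4": 1000.0, "palm-2": 1000.0, "guanaco-33b": 1000.0}

# Each vote is (winner, loser) from one anonymous head-to-head battle.
for winner, loser in [("gpt-4", "palm-2"), ("gpt-4", "guanaco-33b"),
                      ("palm-2", "guanaco-33b")]:
    record_vote(ratings, winner, loser)

# Sort by rating, highest first, to produce the leaderboard.
for model, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rating:.0f}")
```

An upset, where an unfancied model beats a highly rated one, moves more points than an expected win, which is why a shared voting pool can rank 20 models without every pair needing to battle equally often.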
In our own tests, we were introduced to models we wouldn't normally interact with, including PaLM 2 and guanaco-33b.
Speaking with PCMag, the creator, Hao Zhang, said that 40,000 people have taken part in the voting so far. Zhang sees validation by humans as instrumental in the development of language models and generative AI:
“It mostly measures human preference, and its ability to follow instructions and do the task the human wants, which is a very important factor in making a model useful.”
AI boom has led to multiple chatbots
Since the AI boom, language models have been developing at a rapid pace. This includes the likes of DarkBERT, a language model designed to analyze the dark web to help keep users safe.
Meanwhile, Microsoft has invested billions into ChatGPT creator OpenAI, which has resulted in GPT-4-powered AI being integrated directly into Windows 11.
We mentioned above that we tested Chatbot Arena ourselves and found that some of the lesser-known models are still early in development. However, it was fascinating to see them fail in unusual ways. When we asked, "Can you give me a list of Duke Nukem 3D weapons?", some of the models confused the classic boomer shooter with the two earlier games in the series, or simply made up weapons entirely.