LM Arena Accused of Favoring Top AI Labs in Benchmark Manipulation
New study accuses LM Arena of helping AI giants like Meta, OpenAI, and Google manipulate their leaderboard scores.
Matilda
A new study led by researchers from Cohere, Stanford, MIT, and Ai2 raises serious concerns about LM Arena’s practices surrounding its Chatbot Arena benchmark. The paper claims that LM Arena, a crowdsourced AI benchmark platform, allowed leading AI companies such as Meta, OpenAI, Google, and Amazon to privately test multiple AI model variants and selectively withhold the results of lower-performing ones. This preferential treatment allegedly helped those companies secure top spots on the platform’s leaderboard, giving them an unfair advantage over rivals.

What is Chatbot Arena, and how does it work?

Created as an academic research project at UC Berkeley in 2023, Chatbot Arena is a competitive platform where users compare the responses of two competing AI models and vote for the better one. This format aims to provide an unbiased evaluation of AI performance, with a leaderboard that ranks models by user votes. Ho…