April 22, 2025:
Crowdsourced AI Benchmarks Under Scrutiny - Experts are criticizing crowdsourced AI benchmarking platforms such as Chatbot Arena on ethical and academic grounds. Critics, including Emily Bender and Asmelash Teka Hadgu, argue that these benchmarks lack construct validity and are misused by AI labs to make exaggerated claims about model performance. They advocate for diverse, dynamic, professionally tailored benchmarks and suggest compensating evaluators to avoid exploitative practices.
The critics acknowledge that crowdsourcing can provide valuable insights but argue it should not replace other evaluation methods. Chatbot Arena's co-founder emphasizes the platform's role in reflecting community preferences and commits to policy updates to ensure fair evaluations.