April 11, 2025:
Meta's Maverick AI Underperforms on Benchmark - Meta's unmodified Llama-4-Maverick model ranks below competitors on the LM Arena benchmark, following a controversy over Meta's use of an optimized, experimental variant to achieve high scores. The vanilla release, Llama-4-Maverick-17B-128E-Instruct, trails models such as OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet.
Meta acknowledges that it submitted a chat-optimized variant to the benchmark, but emphasizes its commitment to open-source collaboration as the path to improving future models.