Meta Refutes Claims of Llama 4 Benchmark Manipulation

April 7, 2025: Meta's VP of generative AI, Ahmad Al-Dahle, denied claims that the company manipulated benchmark scores for its Llama 4 Maverick and Scout models by training them on test sets. The rumor originated in a Chinese social media post by a user claiming to be a former employee, who alleged that Meta had concealed the models' weaknesses.

Discrepancies in reported model performance, along with the use of an experimental Maverick version for benchmarks, fueled the speculation. Al-Dahle acknowledged that users' experiences with the models have varied and pledged to resolve these issues as implementations stabilize.

AI TECHNOLOGY PERFORMANCE DISPUTE

Meta exec denies the company artificially boosted Llama 4’s benchmark scores