“Got caught” is a misleading way to present what happened. According to the arti...

“Got caught” is a misleading way to present what happened.

According to the article, Meta publicly stated, right below the benchmark comparison, that the version of Llama on LMArena was the experimental chat version:

> According to Meta’s own materials, it deployed an “experimental chat version” of Maverick to LMArena that was specifically “optimized for conversationality”

The AI benchmark in question, LMArena, compares Llama 4 experimental to closed models like ChatGPT 4o latest, and Llama performs better (https://lmarena.ai/?leaderboard).