LMArena was always junk. I work in this space, and while the media takes it seriously, most scientists don't.

Random people ask random stuff and then it measures how good they feel. This is only a worthwhile evaluation if you're Google or Meta or OpenAI and you need to make a chatbot that keeps people coming back. It doesn't measure anything else useful.



I hear AI news from time to time from the MSM in the US, and the only place I've ever seen "LMArena" is on HN and in the LM Studio Discord. At a ratio of 5:1 at least.


It's mentioned quite a bit in the LLM-related subreddits.


Conversation is a two-way street. A good conversation mechanic could elicit better interaction from the users and result in better answers. Stands to reason, anyway.


Llama 1-derived models on it were beating GPT-3.5 by having fewer refusals.



