
I don't believe there's any way to handle the open-ended complexity of the real world other than data. Whether that comes from human demonstrations or self-collected with reinforcement learning, there's just no way around it.

But I disagree there has been no progress. Check out this lecture from Sergey Levine about sample-efficient RL in the real world (no simulator): https://www.youtube.com/watch?v=17NrtKHdPDw



The problem is that behaviour cloning doesn't generalize. Even on tasks where you have effectively infinite optimal data (solved board games), it yields lower scores than self-play, and it still doesn't generalize to unseen states.
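To make the point concrete: behaviour cloning is just supervised learning on expert (state, action) pairs, so the policy is only ever fit to states the expert visited. A minimal toy sketch (hypothetical setup, linear expert and linear clone, fit by least squares):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "expert" policy: a = W_expert @ s
W_expert = np.array([[1.0, -2.0],
                     [0.5, 3.0]])

# Expert demonstrations cover only a narrow region of state space.
states = rng.normal(0.0, 0.1, size=(200, 2))
actions = states @ W_expert.T

# Behaviour cloning = supervised regression: fit W so states @ W.T ≈ actions.
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)
W_bc = W_bc.T

# On the demonstration distribution the clone matches the expert closely.
in_dist_err = np.abs(states @ W_bc.T - actions).max()
print(f"in-distribution max error: {in_dist_err:.2e}")
```

In this linear toy the clone recovers the expert exactly, but with real function approximators and nonlinear experts, accuracy degrades off the demonstration distribution, and small errors compound as the policy drifts into states the expert never visited (the covariate-shift problem behind DAgger-style corrections).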

Sim-to-real, self-play, and curriculum learning have yielded superhuman performance when done correctly in the settings where they apply. Behaviour cloning hasn't.



