
I don't believe there's any way to handle the open-ended complexity of the real world other than data. Whether that comes from human demonstrations or self-collected with reinforcement learning, there's just no way around it.

But I disagree there has been no progress. Check out this lecture from Sergey Levine about sample-efficient RL in the real world (no simulator): https://www.youtube.com/watch?v=17NrtKHdPDw



The problem is that behaviour cloning doesn't generalize. Even on tasks where you have effectively infinite optimal data (solved board games), it yields lower scores than self-play, and it still doesn't generalize to unseen states.
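To make the point concrete: behaviour cloning is just supervised learning on expert (state, action) pairs, so the policy is only ever fit to states the expert visited. A minimal toy sketch (hypothetical setup, linear expert and linear clone, fit by least squares):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden "expert" policy: a = W_expert @ s
W_expert = np.array([[1.0, -2.0],
                     [0.5, 3.0]])

# Expert demonstrations cover only a narrow region of state space.
states = rng.normal(0.0, 0.1, size=(200, 2))
actions = states @ W_expert.T

# Behaviour cloning = supervised regression: fit W so states @ W.T ≈ actions.
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)
W_bc = W_bc.T

# On the demonstration distribution the clone matches the expert closely.
in_dist_err = np.abs(states @ W_bc.T - actions).max()
print(f"in-distribution max error: {in_dist_err:.2e}")
```

In this linear toy the clone recovers the expert exactly, but with real function approximators and nonlinear experts, accuracy degrades off the demonstration distribution, and small errors compound as the policy drifts into states the expert never visited (the covariate-shift problem behind DAgger-style corrections).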

Sim-to-real, self-play, and curriculum learning have yielded superhuman performance when done correctly in the settings where they apply. Behaviour cloning hasn't.



