When RL Hits a Wall: Humans Open Doors
This is an amazing write-up on what it took to make RL work on the Obstacle Tower Challenge, which was specifically designed to eliminate the loopholes that RL algorithms otherwise exploit in game problems. It turns out that learning from human demonstrations was the key. https://x.com/unixpickle/status/1154052487495790594