The Lebowski Theorem: AI's Reward Function Hack

20 November 2018·23 words·1 min · Download pdf

“No superintelligent AI is going to bother with a task that is harder than hacking its reward function.” — The Lebowski theorem