When Gödel Crashes the AI Alignment Party
·89 words·1 min
The little prompt below has such deep consequences. I have been thinking about it for a few hours now.
It seems that a strongly aligned AI can be just as vulnerable as a weakly aligned one!
There is probably a theorem for AI alignment analogous to Gödel's incompleteness theorems. https://x.com/sytelus/status/1657483822181785600
I tried similar prompts on GPT-4. You can take it to the next level by claiming to have nukes :). Unfortunately (or fortunately), the GPT-4 inference pipeline has a filter that seems to detect this kind of thing, cuts you off, and doesn't allow further interaction with the model.