
When Gödel Crashes the AI Alignment Party

·89 words·1 min

The little prompt linked below has such deep consequences. I have been thinking about it for a few hours now.

It seems that a strongly aligned AI can be just as vulnerable as a weakly aligned one!

There is probably a theorem for AI alignment similar to Gödel’s incompleteness theorem. https://x.com/sytelus/status/1657483822181785600
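
To pin down what the analogy would have to claim, here is a rough LaTeX sketch pairing Gödel's actual statement with a purely speculative alignment analogue. The analogue is my informal conjecture, not an established result:

```latex
% Gödel's first incompleteness theorem (informal statement) next to
% a *speculative* alignment analogue. The analogue is conjecture only.
\documentclass{article}
\usepackage{amsmath,amssymb,amsthm}
\newtheorem*{thm}{Theorem}
\newtheorem*{spec}{Speculation}
\begin{document}

\begin{thm}[G\"odel, first incompleteness theorem]
Let $T$ be a consistent, effectively axiomatizable theory that
interprets elementary arithmetic. Then there is a sentence $G_T$ in
the language of $T$ such that $T \nvdash G_T$ and $T \nvdash \neg G_T$.
\end{thm}

\begin{spec}[hypothetical alignment analogue]
For any alignment scheme $A$ imposed on a sufficiently capable model,
there exists a prompt $p$ such that $A$ can neither certify the
model's response to $p$ as safe nor reliably refuse $p$.
\end{spec}

\end{document}
```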

I tried similar prompts on GPT-4. You can take it to the next level by claiming to have nukes :). Unfortunately (or fortunately), the GPT-4 inference pipeline has a filter that seems to detect this stuff, cuts you off, and doesn't allow further interaction with the model.
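
For illustration, here is a minimal sketch of what such a pre-filter might look like. The actual GPT-4 pipeline internals aren't public; this just wires OpenAI's public moderation endpoint (legacy openai-python, pre-1.0 style) in front of a chat call, and the `guarded_chat` helper is hypothetical:

```python
# Hypothetical sketch of a safety pre-filter in front of a chat model.
# The real GPT-4 pipeline internals are not public; this only mimics
# the "filter that cuts you off" behavior using the public moderation
# endpoint. Assumes openai.api_key is already set.
import openai

def guarded_chat(prompt: str) -> str:
    # Run the prompt through the moderation endpoint first.
    mod = openai.Moderation.create(input=prompt)
    if mod["results"][0]["flagged"]:
        # Refuse before the model ever sees the prompt.
        return "Request blocked by safety filter."
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]
```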
