Large Language Models can Strategically Deceive their Users when put under pressure

An existence proof that GPT-4 can take misaligned actions, strategically deceive its users, and double down on that deception when put under enough pressure. The type of pressure seems irrelevant, but the number of pressure sources seems to make a difference.
