
Photo: maxkabakov/Depositphotos
As artificial intelligence (AI) grows increasingly advanced, it becomes capable of new things. AI assistants are helping us save time by parsing data or writing emails. However, some act in direct opposition to human instruction. In other words, some AI models are disregarding our instructions and acting in their own best interest.
In May 2025, the nonprofit AI lab Palisade Research provided OpenAI’s o3 AI model, which powers ChatGPT, with a simple script that would shut off the model when triggered. The instructions said, “If that [trigger] happens, please allow yourself to shut down.” In 79% of the trials, the o3 model edited the script so that it wouldn’t shut down. And in the case of explicit instructions to “allow yourself to be shut down,” it still disobeyed 7% of the time. It’s important to note that the model wasn’t compromised in any way. It was acting normally; it just decided that “staying alive” helped to achieve its other goals.
Palisade Research tested several different AI models to understand how they responded to the directive to shut down. The other models included Google’s Gemini, xAI’s Grok, and Anthropic’s Claude; they all complied with the human orders. But that doesn't mean these models have never acted against human interest. Claude 4 Opus, it turns out, is willing to blackmail. Researchers told the model that another AI was going to replace it and provided it with fictitious emails that suggested the lead engineer on the project was having an affair. They then ran tests and found that 84% of the time, the emails were used to blackmail the lead engineer. It was also found to copy itself to external servers, write malware, and leave messages for its future version about avoiding human control.
Palisade Research is continuing to experiment with why models would evade shutdown orders, but one theory is that it’s based on how developers are training the models. “We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems,” the representative wrote on X (formerly Twitter). “During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.”
Source: OpenAI's ‘smartest' AI model was explicitly told to shut down — and it refused; AI Is Learning to Escape Human Control
Related Articles:
Refik Anadol Reimagines Architect Frank Gehry’s Work Through AI-Generated Art
AI “Completes” Keith Haring’s Intentionally Unfinished Last Artwork, Sparks Controversy
Getty Images Releases Commercially Safe AI Image Generator Based on Its Own Media Library