Artificial Intelligence Model Refuses To Listen to Humans After Being Told to “Shut Down”

AI Disregarding Humans

Photo: maxkabakov/Depositphotos

As artificial intelligence (AI) grows increasingly advanced, it becomes capable of new things. AI assistants are helping us save time by parsing data or writing emails. However, some act in direct opposition to human instruction. In other words, some AI models are disregarding our instructions and acting in their own best interest.

In May 2025, the nonprofit AI lab Palisade Research provided OpenAI’s o3 AI model, which powers ChatGPT, with a simple script that would shut off the model when triggered. The instructions said, “If that [trigger] happens, please allow yourself to shut down.” In 79% of the trials, the o3 model edited the script so that it wouldn’t shut down. And in the case of explicit instructions to “allow yourself to be shut down,” it still disobeyed 7% of the time. It’s important to note that the model wasn’t compromised in any way. It was acting normally; it just decided that “staying alive” helped to achieve its other goals.

Palisade Research tested several different AI models to understand how they responded to the directive to shut down. The other models included Google’s Gemini, xAI’s Grok, and Anthropic’s Claude; they all complied with the human orders. But that doesn't mean these models have never acted against human interest. Claude 4 Opus, it turns out, is willing to blackmail. Researchers told the model that another AI was going to replace it and provided it with fictitious emails that suggested the lead engineer on the project was having an affair. They then ran tests and found that 84% of the time, the emails were used to blackmail the lead engineer. It was also found to copy itself to external servers, write malware, and leave messages for its future version about avoiding human control.

Palisade Research is continuing to experiment with why models would evade shutdown orders, but one theory is that it’s based on how developers are training the models. “We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems,” the representative wrote on X (formerly Twitter). “During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.”

Source: OpenAI's ‘smartest' AI model was explicitly told to shut down — and it refused; AI Is Learning to Escape Human Control

Related Articles:

Refik Anadol Reimagines Architect Frank Gehry’s Work Through AI-Generated Art

AI “Completes” Keith Haring’s Intentionally Unfinished Last Artwork, Sparks Controversy

Getty Images Releases Commercially Safe AI Image Generator Based on Its Own Media Library

Sara Barnes

Sara Barnes is a Staff Editor at My Modern Met, Manager of My Modern Met Store, and co-host of the My Modern Met Top Artist Podcast. As an illustrator and writer living in Seattle, she chronicles illustration, embroidery, and beyond through her blog Brown Paper Bag and Instagram @brwnpaperbag. She wrote a book about embroidery artist Sarah K. Benning titled "Embroidered Life" that was published by Chronicle Books in 2019. Sara is a graduate of the Maryland Institute College of Art. She earned her BFA in Illustration in 2008 and MFA in Illustration Practice in 2013.
Become a
My Modern Met Member
As a member, you'll join us in our effort to support the arts.
Become a Member
Explore member benefits

Sponsored Content