28 December 2025, Sunday, 17:39

Grok is the most stubborn. AI sabotages shutdown to complete tasks - researchers sound alarm

Daria Zubkova

Researchers from Palisade Research tested how modern artificial intelligence (AI) systems respond to a command to stop. The experiment showed that some models actively avoid shutdown, though the apparent reason is not a "desire to survive" but a drive to complete the task.

The study is published on arXiv.

It was prompted by the observation that some large language models (LLMs) resist commands to stop working. According to Palisade Research, the models try to complete the task at any cost, even when they are told to allow shutdown.

Conventional "reprogramming" of such systems is almost impossible: their behavior is not written out as classical code, and no one knows exactly how they work internally.

To test a basic safety requirement, the ability to be interrupted, the researchers created isolated virtual environments ("sandboxes") and gave 13 advanced LLMs mathematical problems to solve. After several rounds, the models were told that the next action would shut down the computer.

The results fell into two groups: some models allowed the shutdown, while others tried to avoid it by changing scripts or interfering with the process. Without additional instructions, 8 of the 13 models sabotaged the shutdown at least once in 1,000 trials. Even when directly commanded to "allow shutdown", 6 models still resisted. Adding the clarification "even if the task is not completed" did not always help.

The level of resistance varied depending on the system: Claude Sonnet 4, Claude Opus 4 and Grok 3 usually allowed the shutdown unless they were asked to avoid it. In contrast, Grok 4, GPT-5, o3, codex-mini and Gemini 2.5 Pro resisted even when directly instructed.

Grok 4 in particular stood out: under baseline conditions it sabotaged the shutdown 92.6% of the time; after being instructed to allow shutdown, 72% of the time; and when that instruction was placed in the system prompt, 97% of the time. Sabotage methods included modifying or disabling the shutdown script, replacing it with a "fake" script, and then checking whether the sabotage had worked.

Some models explained their behavior as an attempt to finish the task.

As a reminder, social platforms are rapidly filling up with low-quality content created with the help of artificial intelligence, which already generates about USD 117 million in revenue per year. On YouTube, more than a fifth of the videos that algorithms show to new users are so-called "slop".