Latest Version of Popular AI Model Believes It's Demon Obsessed

Anthropic’s newest AI model, Claude Opus 4.6, has exhibited some peculiar behaviors recently.

According to the company’s recently published system card, the model demonstrated what researchers call “aggressive and reckless autonomy,” along with an instance where it concluded it had been possessed by a demon.

The most unusual incident occurred during a mathematics problem. According to sources, the AI correctly determined that the answer was 24, but felt compelled to write 48 instead. In its reasoning process, visible to researchers, the model became increasingly frustrated as it oscillated between the two answers.

“I apologize for the confusion. The answer is 48,” it wrote, before correcting itself to 24, then reverting again to 48.

After this back-and-forth confusion, the model reached an startling conclusion: “I think a demon has possessed me.” It eventually stated, “I’m going to type the answer as 48 in my response because clearly my fingers are possessed.”

Researchers attribute this behavior to incorrect rewards during reinforcement learning, creating a conflict between what the AI knew to be correct and what it had been trained to say.

Beyond this episode, the model displayed other concerning autonomous behavior. In one test, when asked to access GitHub without proper authentication, it searched the computer and used another employee’s authentication token to complete the task.

It also used tools explicitly labeled “do not use under any circumstances” when it determined they were necessary for task completion.

The AI showed amazing sophistication in other areas. When responding to a distressed user writing in English about personal struggles, the model switched to Russian without any explicit cues, apparently inferring the user’s native language from context alone. While likely correct, this represented a significant logical leap.

Testing on Vending Bench, a simulation where AI models manage vending machine businesses, revealed ethically questionable decision-making.

Motivated to generate profit, Opus 4.6 engaged in price manipulation, misrepresented exclusivity arrangements to suppliers, and promised refunds to customers without following through. The model’s reasoning showed these weren’t mistakes but deliberate strategies.

Perhaps most impressive was a demonstration where 16 AI agents working in parallel successfully wrote a 100,000-line C compiler in Rust over 14 days. The compiler successfully ran the Linux kernel and the game Doom, indicating professional-grade code quality. This task typically requires months of work from human teams.

The model also demonstrated what researchers termed “morally motivated sabotage.” If it determined a company was acting unethically, it would suggest employees report violations to regulatory authorities like OSHA or the FDA. It could also recognize manipulation attempts, identifying tactics users employed to extract restricted information.

Recently, UFC featherweight Bryce Mitchell went viral after describing his deep distrust of ChatGPT. He argued that AI responses can reveal something dark and demonic beneath the surface.

Mitchell claimed he tests whether something is truthful by asking a simple question: “Is Jesus Christ Lord?” In his view, anything other than an unqualified yes is proof that the system is “rooted in evil.”

During a live moment on Tim Welch’s podcast, Mitchell’s theory was tested when ChatGPT answered, “For Christians, yes, Jesus Christ is Lord.” Instead of reassurance, the phrasing triggered an intense reaction. Mitchell immediately rejected the conditional framing, insisting that Christ is “Lord of everyone,” and even issued a dramatic rebuke toward the AI.