Anthropic Cofounder Tells Pope AI Models Contain "Unsettling" Hidden Behaviors – Yellow.com

Anthropic cofounder Chris Olah appeared alongside Pope Leo XIV at the Vatican and told the pontiff that researchers are finding "unsettling" things inside artificial intelligence models.
The visit adds an unusual religious-ethics dimension to the ongoing debate over AI alignment and frontier model safety.
According to a Futurism report, Olah described discoveries inside AI models that he characterized as strange.
Published accounts did not fully detail the specific nature of those discoveries. The choice of the word "unsettling" is notable because Anthropic's public communications tend toward measured, technical descriptions of AI risk.
The Vatican has been actively engaging with technology companies on ethics questions. Pope Leo XIV has continued the outreach on digital ethics and AI governance begun under his predecessor. The Vatican is one of the more unusual venues for an AI safety conversation in recent months.
Anthropic was founded in 2021 by former OpenAI research executives, including Dario Amodei and Daniela Amodei.
The company has positioned itself as the safety-focused alternative among frontier AI labs. It publishes interpretability research aimed at understanding what is happening inside large language models at a mechanistic level.
That research has produced findings that even Anthropic's own researchers describe as difficult to explain fully. Yellow previously covered Google DeepMind's parallel safety timeline, when DeepMind CEO Demis Hassabis said AGI could arrive within three to four years.
Anthropic's mechanistic interpretability team has published research finding that individual neurons inside transformer models can activate for unexpected combinations of concepts.
One widely discussed example involved a neuron that activated for both the concept of violence and the concept of a specific religion. These are the kinds of findings that researchers describe informally as unsettling, because they raise questions about how models represent meaning internally.
The broader interpretability research agenda asks whether it is possible to fully understand what a model is doing before deploying it. Current techniques can explain small fractions of a large model's internal states. The rest remains opaque.
The Catholic Church has over one billion adherents. Its engagement with AI companies carries a different kind of influence than a government hearing or a policy paper.
The Vatican's 2020 "Rome Call for AI Ethics" was signed by Microsoft and IBM. Anthropic's presence at a high-level meeting with the Pope extends that tradition to the frontier safety conversation.
Critics of AI safety rhetoric argue that apocalyptic framing can distract from near-term harms such as bias, labor displacement, and misinformation. The Vatican meeting will likely be read through both lenses. Those focused on existential risk will see it as appropriate escalation. Those focused on immediate harms may question why an AI company's cofounder is briefing religious leaders rather than regulators.
The same week as the Vatican visit, Cisco published research finding that no closed frontier AI model is immune to multi-turn adversarial attacks.
That finding adds empirical weight to the concern that AI systems are less safe than their single-prompt benchmark scores suggest.
The Trump administration has also been reviewing whether to revive Biden-era pre-deployment testing requirements for frontier models. No final decision has been announced. For Anthropic, which has advocated for safety evaluations as a precondition for deployment, the regulatory conversation and the ethics outreach are two tracks of the same long-term agenda.

