OpenAI–Anthropic Safety Study Shows Limits of Self-Policing

OpenAI and Anthropic exposed vulnerabilities in each other’s models even as governments move to build independent evaluators
OpenAI and Anthropic this week released parallel reports describing a cross-lab safety exercise in which each company ran its internal alignment and misalignment tests against the other’s public models. Both say the joint evaluation shows progress on accountability in AI.

Yet the exercise also shows the limits of self-policing. The two firms are locked in a fierce rivalry, with Anthropic cutting off OpenAI’s access to Claude just weeks earlier and both lobbying heavily to shape government oversight. The reports arrive as regulators in the United States and the United Kingdom build independent capacity to test AI systems. OpenAI and Anthropic argue that mutual checks help “surface gaps that might otherwise be missed.” Critics counter that the companies still design the evaluations themselves.
Mukundan Sivaraj
Mukundan covers the AI startup ecosystem for AIM Media House. Reach out to him at mukundan.sivaraj@aimmediahouse.com.
