OpenAI–Anthropic Safety Study Shows Limits of Self-Policing

- By Mukundan Sivaraj
- Published on

OpenAI and Anthropic exposed vulnerabilities in each other’s models while governments move to build independent evaluators

OpenAI and Anthropic this week released parallel reports describing a cross-lab safety exercise in which each company ran internal alignment and misalignment tests against the other’s public models.

Both companies say the joint evaluation shows progress on accountability in AI. Yet the exercise also shows the limits of self-policing: the two firms are locked in a fierce rivalry, with Anthropic cutting off OpenAI’s access to Claude just weeks earlier, and both lobby heavily to shape government oversight.

The reports arrive as regulators in the United States and the United Kingdom build independent capacity to test AI systems. OpenAI and Anthropic argue that mutual checks help “surface gaps that might otherwise be missed.” Critics counter that the companies still design the evaluations themselves.
