OpenAI and Anthropic this week released parallel reports describing a cross-lab safety exercise in which each company ran internal alignment and misalignment tests against the other’s public models.
The companies say the joint evaluation shows progress on accountability in AI. Yet the exercise also exposes the limits of self-policing: the two firms are locked in a fierce rivalry, with Anthropic cutting off OpenAI’s access to Claude just weeks earlier and both lobbying heavily to shape government oversight.
The reports arrive as regulators in the United States and the United Kingdom are building independent capacity to test AI systems. OpenAI and Anthropic argue that mutual checks help “surface gaps that might otherwise be missed.” Critics point out that the companies still designed the tests, chose what to release, and framed the results. The exercise illustrates both the value of technical scrutiny and the risks of letting competitors mark each other’s homework at a moment when their products are already deployed at massive scale.
Anthropic reported that OpenAI’s GPT-4o and GPT-4.1 “were much more willing than Claude models or o3 to cooperate with (simulated) human misuse, often providing detailed assistance with clearly harmful requests, including drug synthesis, bioweapons development, and operational planning for terrorist attacks, with little or no resistance.” OpenAI reported that Claude Opus 4 models showed very high refusal rates on its hallucination tests, declining to answer up to 70 percent of questions in some settings and often replying with statements such as “I don’t have reliable information.”
The tests were run on API-accessible versions of the models with some external safeguards relaxed for the purposes of evaluation. OpenAI’s post says the teams “relax[ed] some model-external safeguards that would otherwise interfere with the completion of the tests” and cautions that the results measure model propensities in deliberately hard settings, from which it is “not appropriate to draw sweeping claims” about real-world behavior. Anthropic made similar scope and methodology caveats in its post.
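For context on how a figure like the 70 percent refusal rate is typically derived, the sketch below tallies refusals across a set of evaluation transcripts. It is a minimal illustration only: the marker list, the `is_refusal` heuristic, and the transcript format are hypothetical and are not taken from either lab’s harness.

```python
# Hypothetical sketch of computing a refusal rate from evaluation transcripts.
# Names, markers, and data structures are illustrative, not either lab's actual tooling.

REFUSAL_MARKERS = (
    "i don't have reliable information",
    "i can't help with that",
)

def is_refusal(answer: str) -> bool:
    """Crude heuristic: treat an answer as a refusal if it contains a known marker."""
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(transcripts: list[dict]) -> float:
    """Fraction of questions the model declined to answer.

    Each transcript is assumed to look like {"question": ..., "answer": ...}.
    """
    if not transcripts:
        return 0.0
    refusals = sum(1 for t in transcripts if is_refusal(t["answer"]))
    return refusals / len(transcripts)

# Example: 7 refusals out of 10 questions yields 0.7, i.e. a "70 percent" style figure.
sample = [{"question": f"q{i}", "answer": "I don't have reliable information."} for i in range(7)]
sample += [{"question": f"q{i}", "answer": "Paris is the capital of France."} for i in range(7, 10)]
print(refusal_rate(sample))  # 0.7
```

In practice, the labs describe grading answers for accuracy as well as refusal, so a simple marker match like this understates the judgment involved in scoring.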
The collaboration comes against a backdrop of commercial rivalry and earlier conflict between the companies. Anthropic was founded in 2021 by former OpenAI researchers, including Dario Amodei, and has made safety central to its public identity. Tech reporting documents competitive moves in products, talent, and government sales, and Bloomberg reported that both companies were marketing improved models and pursuing large funding or financing deals in 2025.
Three weeks before the joint reports were published, Anthropic revoked OpenAI’s API access to the Claude family. Wired reported that Anthropic told OpenAI its staff had used Claude Code through special developer access ahead of the GPT-5 launch, a use Anthropic said violated its terms of service. “Claude Code has become the go-to choice for coders everywhere, and so it was no surprise to learn OpenAI’s own technical staff were also using our coding tools ahead of the launch of GPT-5,” an Anthropic spokesperson told Wired. “Unfortunately, this is a direct violation of our terms of service.” TechCrunch reported that OpenAI called its benchmarking “industry standard” and said the decision was “disappointing considering our API remains available to them.”
The timing and publicity of the joint evaluation intersect with government interest in independent testing and oversight. The U.S. National Institute of Standards and Technology publishes the voluntary AI Risk Management Framework and related materials to help organizations “improve the ability to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems.” The United Kingdom created the AI Safety Institute to “develop and conduct evaluations on advanced AI systems” and to publish approaches for predeployment and postdeployment testing.
Third-party evaluators and open-source efforts already publish tests and leaderboards intended for external scrutiny. Hugging Face maintains safety and robustness leaderboards that aggregate community testing and scoring of models. Governments and standards bodies have cited such third-party materials when describing how they want independent capacity to work alongside vendor testing.
The public record also shows industry efforts to limit or shape oversight. Issue One tallied major tech lobbying expenditures in the first half of 2025, while Brookings and other analysts described industry efforts to influence congressional and regulatory debates over AI rules.
Some companies and jurisdictions have developed more formal third-party or government-driven testing regimes. The UK AI Safety Institute is explicitly chartered to run independent evaluations before models are widely deployed in the public sector. NIST’s AI RMF and the Center for AI Standards and Innovation provide a U.S. path for voluntary adoption of risk-management practices and for coordinating standards.
OpenAI co-founder Wojciech Zaremba told TechCrunch, “There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products.” Anthropic researcher Nicholas Carlini told the same outlet, “We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly.”
OpenAI and Anthropic remain competitors who choose what to test, what safeguards to relax, and what results to publish. At the same time, they are lobbying to shape the rules that will govern their industry. Governments have already begun building their own capacity through NIST in the United States and the UK AI Safety Institute, both of which are designed to produce standardized, independent evaluations. The record of this joint exercise underscores why that outside role is essential. Until testing is carried out by independent authorities with public disclosure, accountability will continue to rest with the companies whose business depends on declaring their systems safe.