OpenAI–Anthropic Safety Study Shows Limits of Self-Policing

- By Mukundan Sivaraj
- Published on

OpenAI and Anthropic exposed vulnerabilities in each other’s models while governments move to build independent evaluators

OpenAI and Anthropic this week released parallel reports describing a cross-lab safety exercise in which each company ran internal alignment and misalignment tests against the other’s public models.

Both companies say the joint evaluation shows progress on accountability in AI. Yet the exercise also shows the limits of self-policing: the two firms are locked in a fierce rivalry, with Anthropic cutting off OpenAI’s access to Claude just weeks earlier, and both lobby heavily to shape government oversight.

The reports arrive as regulators in the United States and the United Kingdom build independent capacity to test AI systems. OpenAI and Anthropic argue that mutual checks help “surface gaps that might otherwise be missed.” Critics counter that the companies still design the evaluations themselves.
