“We can outperform OpenAI’s deep research and every leading model’s deep research quality,” former Twitter CEO Parag Agrawal told TBPN as he announced Parallel’s new Deep Research API. Parallel aims to beat the giant general-purpose models at a capability they are actively trying to perfect themselves. For now, however, the claim is premature.
His company’s launch just attracted $30 million from backers including Khosla Ventures, Index Ventures, and First Round Capital. The appeal is obvious: deep research has quickly become one of the most valuable applications of AI, pushing the technology toward the kind of multi-step reasoning that businesses, scientists, and investors actually rely on.
The argument makes sense on the surface. Agrawal explained the idea: “We’re building Parallel to build infrastructure for AIs using the web. The web was built for humans. Two years ago, I started thinking AI… is going to be the primary user at a massive scale.”
By designing search, indexing, and ranking for machines instead of browsers, Parallel can feed more reliable factual signals into language models. That is an engineering win the company can point to legitimately.
The launch materials and Agrawal’s posts do make concrete claims backed by numbers. The company says its Deep Research API outscored humans and leading models on hard web-research tests, and Agrawal has repeated those figures on LinkedIn and in press coverage. The company’s use of OpenAI’s BrowseComp benchmark, which measures how well agents find hard-to-locate web information, is relevant here. BrowseComp is designed specifically to test the multi-step, persistent browsing behavior that ordinary chatbots do not always handle well.
Parallel is also not the only company trying to brand itself around deep research. In recent weeks, Thomson Reuters rolled out its own Deep Research tool aimed at transforming legal search and case analysis, while Manus announced a “Wide Research” product that deploys swarms of agents (hundreds at once) to crawl and synthesize web sources.
The problem is the breadth of the claim. Agrawal did not say “in our tests” or “on selected problems.” His claim was absolute. And that is the claim that needs third-party verification.
“Parallel exceeds the accuracy of humans working for 2 hours for just $0.10 per task.”
The company’s numbers look impressive, but they are self-reported and derived from a narrow set of tests and conditions. They point to specific benchmark runs and to demos showing better accuracy and citation behavior than other systems in those runs. When a startup claims superiority on a hard, newly defined task, the responsible next steps are to publish precise datasets, make evaluation code public, and invite neutral researchers or industry labs to reproduce the runs.
Third-party vetting isn’t unheard of in this space. Cognition’s “Devin” had its coding abilities checked on SWE-bench Verified, where patches are independently executed and scored by the benchmark maintainers; it reported a 13.86% solve rate, a result reflected on the public leaderboard. In healthcare, Abridge’s ambient AI scribe has a peer-reviewed evaluation from the University of Kansas Medical Center published this year, reporting significant improvements in documentation workflow and reduced after-hours work.
There is also another practical reason for caution. The big model makers have been building their own deep research modes, and they also have an abundance of resources. Google’s Gemini now advertises a Deep Research capability that is explicitly designed to explore live web content and synthesize it into reports. That is the same area Parallel says it excels in. Google has deep control over search, and the massive compute and context windows that matter in these tasks. OpenAI has its own browsing benchmarks and specialized research agents too.
While OpenAI’s latest model has been the subject of mixed user reaction even as it posts strong internal numbers on many benchmarks, that messiness does not settle Parallel’s claim either way.
Parallel can’t be dismissed outright. Building a data retrieval system optimized for agents is an underexplored design point, and Agrawal’s statements suggest the company has healthy operating instincts: “At every price point, you have got to be the best. And for someone who has no price sensitivity, you have got to be the absolute best.”
The rest of the field — independent benchmarkers, researchers, and customers — will need to publicly confirm that Parallel’s Deep Research API consistently beats the likes of OpenAI and Google. Until then, Parallel is measured only against its own internal metrics and the public numbers from rivals.