Cerebras Built a Chip the Size of a Dinner Plate

We built a chip the size of a dinner plate while everybody else was building chips the size of a postage stamp.

Silicon hasn’t fundamentally changed shape in decades. Most AI chips are still bound by the limits of traditional packaging, designed to fit inside server racks and built to scale incrementally. Cerebras Systems broke that convention by building a chip the size of a dinner plate.

The move was as functional as it was radical. Cerebras’ Wafer Scale Engine, with 850,000 cores and 2.6 trillion transistors, keeps data on-chip and minimizes memory bottlenecks, delivering a performance advantage the company claims is roughly 50 times faster than Nvidia GPUs for inference, the essential workload of AI deployment. Independent benchmarks show that Cerebras’ WSE-3 outperforms Nvidia’s latest H100 and B200 GPUs in performance per watt and memory scalability.

“We built a chip the size of a dinner plate while everybody else was building chips the size of a postage stamp,” said Andrew Feldman, co-founder and CEO of Cerebras. The decision was not merely symbolic. It reflected an anticipation of AI’s future, which Feldman describes as increasingly shaped by agentic workloads: reasoning, decision-making, and long-context processing that traditional GPUs struggle to handle without added latency.

Feldman argues that real-time inference has become the defining constraint of AI deployment. GPUs, he noted, often introduce delays during reasoning-heavy tasks. Cerebras, by contrast, has designed its architecture around low latency and high throughput. The company’s platform is tuned for what Feldman refers to as the “beauty of agentic work”—models that do more than autocomplete text, instead reasoning across data to deliver actionable outcomes.

One demonstration of this design came with Cerebras’ recent deployment of Alibaba’s Qwen3-235B model on its Inference Cloud. The company expanded the model’s context window from 32,000 to 131,000 tokens, reduced response times from over a minute to under two seconds, and slashed costs. Cerebras offers inference at $0.60 per million input tokens and $1.20 per million output tokens, less than one-tenth the cost of comparable closed systems. Independent sources reported that the system can stream approximately 1,500 tokens per second, bringing high-latency workloads down to near real-time speeds.
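As a rough back-of-the-envelope check of those figures, the sketch below applies the quoted per-token prices and streaming rate to a hypothetical reasoning request. The workload size (a 20,000-token prompt and a 3,000-token response) is an assumption chosen for illustration, not a number from Cerebras.

```python
# Back-of-the-envelope cost and latency using the rates quoted in the article.
# The workload size below is a hypothetical example, not a Cerebras figure.

INPUT_PRICE_PER_M = 0.60       # USD per million input tokens (quoted)
OUTPUT_PRICE_PER_M = 1.20      # USD per million output tokens (quoted)
STREAM_TOKENS_PER_SEC = 1500   # approximate streaming rate (quoted)

# Hypothetical agentic request: long context in, moderate reasoning out.
input_tokens = 20_000
output_tokens = 3_000

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
     + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
generation_seconds = output_tokens / STREAM_TOKENS_PER_SEC

print(f"Estimated cost: ${cost:.4f}")                            # ~$0.0156
print(f"Estimated generation time: {generation_seconds:.1f} s")  # ~2.0 s
```

At the quoted rates, even a long-context request costs fractions of a cent and streams its answer in roughly two seconds, which is the latency profile the article describes.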

Alibaba’s Qwen3-235B, a mixture-of-experts architecture, delivers frontier-level benchmark scores on par with models like Claude 4 and Gemini 2.5. The performance of this model, coupled with Cerebras’ high-throughput infrastructure, strengthens the company’s value proposition in inference-heavy deployments.

Cerebras has long positioned itself as a counter to Nvidia’s software lock-in. CUDA has become a standard in many machine learning workflows, but it restricts cross-platform deployment. Feldman noted that most ML practitioners work in PyTorch, and Cerebras invested in building a compiler that translates PyTorch workloads to its hardware. “We’ve been able to work around [Nvidia lock-in] and deliver a seamless experience,” Feldman said.
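For context, the kind of code in question is ordinary PyTorch like the generic training step below. The article says Cerebras’ compiler targets workloads written against this standard API; the snippet shows no Cerebras-specific calls, and how that compiler is actually invoked is not described in the article.

```python
import torch
import torch.nn as nn

# A generic PyTorch model and one training step, written against the standard
# PyTorch API rather than any vendor-specific (e.g., CUDA) interface.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 512)          # dummy batch of 32 examples
y = torch.randint(0, 10, (32,))   # dummy class labels

logits = model(x)
loss = loss_fn(logits, y)
loss.backward()
optimizer.step()
```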

This approach is resonating with partners. The company has secured deployments with Notion, Docker, Hugging Face, and DataRobot. Hugging Face’s SmolAgents and DataRobot’s Syftr platform run on Cerebras infrastructure, and the Docker integration enables developers to launch multi-agent stacks in seconds. Babak Pahlavan, CEO of NinjaTech AI, called the system transformative for reasoning-intensive workloads, saying it enables math, coding, logic, and research “10–20 times faster than flagship proprietary models, at just a fraction of the cost.”

Even so, Cerebras’ business model is still maturing. More than 80% of its revenue comes from a single customer: G42, the UAE-based AI firm. Feldman acknowledged the concentration but clarified that Cerebras is part of a broader cloud platform built for G42, which serves many clients. “G42 itself is nine different companies. So it is a cloud for many customers at once,” he said.

The company is expanding globally, with offices in Toronto and Bangalore and commercial traction in Europe, Japan, Korea, and Singapore. It does not sell in China. Manufacturing has quadrupled over the past year, and Cerebras is one of the few AI system providers that designs, packages, and builds computers entirely within the United States.

Cerebras confidentially filed for an IPO in September 2024. SEC filings show the company posted $136 million in revenue and a $66.6 million net loss in the first half of 2024, compared to $8.7 million in revenue and a $77.8 million loss a year prior. Analysts expected an IPO in late 2024, later pushed to 2025 pending CFIUS review of its UAE ties. That review has since been cleared. “There were some hurdles. Those are in the rearview mirror,” Feldman said.

Feldman noted that while the company remains unprofitable, the market appetite for infrastructure-focused AI companies is growing. “There’s a yawning appetite for AI companies and particularly for alternatives to Nvidia.” He cited the positive market reception to CoreWeave’s IPO as validation.

A growing part of Cerebras’ strategy revolves around AI sovereignty. Countries seeking localized AI infrastructure for linguistic, regulatory, or geopolitical reasons are increasingly pursuing alternatives to the U.S.-centric cloud platforms. Feldman said this shift echoes historical trends around power access and data center geography. “To have AI in Arabic or in Spanish or in French that is as rich as the AI that’s in English—that’s a very reasonable desire.”

He and Eric Schmidt shared a keynote in Paris where they discussed how future AI hubs will mirror the map of electricity availability, predicting that AI development will follow power infrastructure rather than cloud reach.

Feldman sees this shift as reinforcing Cerebras’ distributed infrastructure strategy. “You build an app and need a shopping cart—you call Stripe. You need a chatbot—you call Cerebras.”

Unlike past tech waves where consumer apps led adoption, Feldman said enterprise demand is now driving AI innovation. “Enterprise is leading, and consumers are lagging. AI will be embedded into how we order a burrito or get a question answered by HR.”

Cerebras is built for both general-purpose and highly specialized workloads. While HR and finance models can use shared logic, fields like genomics and aerospace demand bespoke approaches. “You have to think really hard about what is similar across a large organization. But in drug design or aeronautics, specialization is key.”

That logic underpins Cerebras’ long-term roadmap. The company made its architectural bet before transformers were mainstream and has since become an ideal fit for transformer workloads. “When you build chips that require a year and a half of planning, you have to get those bets right. When you build chips before transformers existed… that’s when you get something right.”
