d-Matrix Raises $275 Million to Solve AI’s Most Expensive Problem

"We've spent the last six years building the solution. A fundamentally new architecture that enables AI to operate everywhere, all the time"​

The AI boom has a big secret. Training is cheap compared to deployment. Building a large language model costs millions. Running it at scale costs billions. As enterprises deploy AI models into production, inference has become the dominant expense in AI infrastructure. 

Santa Clara startup d-Matrix has raised $275 million to tackle this problem with a fundamentally different chip architecture. The Series C funding, co-led by BullhoundCapital, Triatomic Capital, and Temasek, Singapore’s state-owned investment company, values the company at $2 billion and brings total capital raised to $450 million.

The insight behind d-Matrix is simple. The problem isn’t computing power; it’s the distance data travels. Traditional graphics processing units, from Nvidia and others, separate processing from memory. Data must constantly shuttle between the two, creating a bottleneck that wastes power and adds latency. This “memory wall” is why inference is so expensive.
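To see why, consider a rough back-of-envelope sketch. The model size and bandwidth figures below are illustrative assumptions, not d-Matrix or Nvidia specifications; the point is only that, during autoregressive decoding, tokens per second is capped by how fast weights can be read, not by raw compute.

```python
# Rough sketch of the "memory wall": for a single decode stream, every generated
# token requires streaming the model's weights past the compute units, so memory
# bandwidth, not raw FLOPS, sets the ceiling on tokens per second.
# All numbers below are illustrative assumptions, not vendor specifications.

params = 70e9                # assumed model size (e.g. a 70B-parameter LLM)
bytes_per_param = 2          # assumed 16-bit weights
weight_bytes = params * bytes_per_param          # ~140 GB of weights read per token

hbm_bandwidth = 3.0e12       # assumed off-chip (HBM-class) bandwidth, bytes/s
sram_bandwidth = 100e12      # assumed aggregate on-chip SRAM bandwidth, bytes/s

for name, bw in [("off-chip DRAM/HBM", hbm_bandwidth), ("on-chip SRAM", sram_bandwidth)]:
    tokens_per_s = bw / weight_bytes             # bandwidth-bound rate, single stream
    print(f"{name}: ~{tokens_per_s:.0f} tokens/s ceiling for one decode stream")
```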

d-Matrix’s solution is called Digital In-Memory Compute, or DIMC. Instead of separating memory from processing, the company embeds processing components directly into memory itself. The result is Corsair, an inference accelerator that the company claims delivers 10 times the performance, 3 times lower cost, and up to 5 times better energy efficiency than GPU-based systems.

Corsair takes the form of a PCIe card that plugs into standard data center servers. Inside are two custom chips, each containing 1 gigabyte of SRAM, the type of ultra-fast memory typically used only in processor caches. But d-Matrix has repurposed much of this SRAM to perform vector-matrix multiplications, the mathematical operations at the heart of AI inference.​
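For readers unfamiliar with the operation in question, the sketch below is a generic vector-matrix multiply of the kind that dominates transformer inference. It is plain NumPy on a CPU, not d-Matrix’s in-memory implementation, and the dimensions are arbitrary.

```python
import numpy as np

# Generic illustration of the vector-matrix multiply at the heart of LLM inference:
# a 1 x d activation vector multiplied by a d x k weight matrix. On a GPU the weight
# matrix is fetched from off-chip memory for each pass; in d-Matrix's DIMC design the
# multiply is claimed to happen where the weights already sit, in SRAM.

d, k = 4096, 4096                                # assumed hidden dimensions of one layer
x = np.random.randn(d).astype(np.float32)        # activation vector for one token
W = np.random.randn(d, k).astype(np.float32)     # layer weights (the data that must not move)

y = x @ W                                        # one of many such multiplies per token
print(y.shape)                                   # (4096,)
```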

The chips also contain additional components: a RISC-V control core that orchestrates operations and SIMD cores that handle parallel calculations. Together, these deliver extraordinary throughput. Corsair can execute 9,600 trillion calculations per second when using the MXINT4 data format, a compressed representation that uses less memory than standard formats.
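The article does not spell out the MXINT4 layout, so the sketch below shows only the general idea behind block-scaled low-precision formats of this kind: a shared scale per small block of weights plus one 4-bit integer per weight, roughly quartering memory footprint and traffic versus 16-bit weights. The block size and rounding scheme are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of block-scaled 4-bit integer quantization, the general idea behind
# compressed formats like MXINT4. The exact MXINT4 layout is an assumption here, not a spec.

def quantize_block(w: np.ndarray):
    scale = max(np.max(np.abs(w)) / 7.0, 1e-8)   # one shared scale per block; 4-bit ints span -8..7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_block(q: np.ndarray, scale: float):
    return q.astype(np.float32) * scale

w = np.random.randn(32).astype(np.float32)       # one block of 32 weights (block size assumed)
q, s = quantize_block(w)
w_hat = dequantize_block(q, s)
print("max reconstruction error:", np.max(np.abs(w - w_hat)))   # bounded by the block's scale
```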

CEO Sid Sheth emphasized the foundational philosophy: “From day one, d-Matrix has been uniquely focused on inference. We predicted that when trained models needed to run continuously at scale, the infrastructure wouldn’t be ready. We’ve spent the last six years building the solution: a fundamentally new architecture that enables AI to operate everywhere, all the time.”​

The Infrastructure

But Corsair alone isn’t the story. d-Matrix has built an entire ecosystem for inference at scale. JetStream is a custom network interface card that links multiple Corsair servers into clusters, achieving latencies as low as 2 microseconds between accelerators, far faster than off-the-shelf networking. Aviator is the software stack that automates deployment and monitoring of AI models on the platform.​
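A hedged bit of arithmetic shows why microsecond-scale interconnect latency matters once a model is sharded across servers. The hop count per token and the latency of conventional networking below are assumptions for illustration; only the 2-microsecond JetStream figure comes from the article.

```python
# If generating one token requires crossing the network several times (assumed hop
# count below), the per-hop latency multiplies directly into the per-token budget.
# Hop count and "conventional networking" latency are assumptions for illustration.

hops_per_token = 100                  # assumed cross-accelerator exchanges per generated token
jetstream_latency = 2e-6              # 2 microseconds per hop (figure cited in the article)
conventional_latency = 20e-6          # assumed latency for off-the-shelf networking

for name, lat in [("JetStream", jetstream_latency), ("conventional NIC", conventional_latency)]:
    overhead_ms = hops_per_token * lat * 1e3
    print(f"{name}: ~{overhead_ms:.2f} ms of network time per token")
```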

Together, these components allow customers to run AI models with up to 100 billion parameters entirely in ultra-fast SRAM within a single server rack. For enterprises, this means unprecedented density. What might require ten traditional data centers can now fit in one, addressing both economic and sustainability concerns.​
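A quick sanity check on the rack-scale claim, with the cards-per-server and servers-per-rack figures below as assumptions (only the 2 GB of SRAM per Corsair card follows from the article):

```python
# Back-of-envelope check on the "100 billion parameters in SRAM per rack" claim.
# Card SRAM (2 chips x 1 GB) comes from the article; the rack configuration below
# is an assumption for illustration only.

params = 100e9
bits_per_param = 4                               # assumed 4-bit (MXINT4-class) weights
model_gb = params * bits_per_param / 8 / 1e9     # ~50 GB of weights

sram_per_card_gb = 2                             # 2 chips x 1 GB SRAM (from the article)
cards_per_server = 8                             # assumption
servers_per_rack = 8                             # assumption
rack_sram_gb = sram_per_card_gb * cards_per_server * servers_per_rack

print(f"model: ~{model_gb:.0f} GB, rack SRAM: {rack_sram_gb} GB")
print("fits:", model_gb < rack_sram_gb)          # leaves headroom for activations and KV cache
```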

Real-world performance numbers are striking. On a Llama 70B model, Corsair can generate 30,000 tokens per second at just 2 milliseconds per token, meeting the low-latency requirements of applications like real-time voice interactions and AI agents.
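Those two figures also imply a level of concurrency, by simple arithmetic on the article’s numbers:

```python
# If each stream produces a token every 2 ms (500 tokens/s per stream) and the system
# as a whole emits 30,000 tokens/s, roughly 60 streams are being served in parallel.
# This is arithmetic on the quoted figures, not a d-Matrix benchmark detail.

tokens_per_s_total = 30_000
latency_per_token_s = 2e-3

tokens_per_s_per_stream = 1 / latency_per_token_s            # 500 tokens/s per stream
concurrent_streams = tokens_per_s_total / tokens_per_s_per_stream
print(f"~{concurrent_streams:.0f} concurrent streams")        # ~60
```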

The investors behind this round, BullhoundCapital, Triatomic Capital, and Temasek, aren’t general-purpose venture firms betting on hype. They’re deep-tech specialists who recognized what d-Matrix understood years before the industry caught up: inference would define AI’s economics.

Per Roman, founder of BullhoundCapital, articulated this vision: “As the AI industry’s focus shifts from training to large-scale inference, the winners will be those who anticipated this transition early and built for it. d-Matrix stands out not only for its technical depth but for its clear strategic vision.”​

Jeff Huber at Triatomic Capital added: “AI inference is becoming the dominant cost in production AI systems, and d-Matrix has cracked the code on delivering both performance and sustainable economics at scale.”​

Michael Stewart from M12 (Microsoft’s Venture Fund), which also participated in the round, emphasized d-Matrix’s market position: “d-Matrix is the first AI chip startup to address contemporary unit economics in LLM inference… with differentiated elements in the in-memory product architecture that will sustain the TCO benefits with leading latency and throughput.”​

What’s Next?

d-Matrix isn’t resting on Corsair’s success. The company has already announced Raptor, its next-generation accelerator launching in 2026. Raptor will stack RAM directly atop compute modules in a 3D configuration, dramatically reducing the distance data must travel. It will also upgrade from 6-nanometer to 4-nanometer manufacturing technology, bringing additional speed and efficiency gains.​

The company claims Raptor will deliver 10 times better memory bandwidth and 10 times greater energy efficiency than HBM4-based designs, an improvement that, if delivered, would reshape data center economics.

This funding round signals something profound. The market has finally acknowledged what d-Matrix saw six years ago. Inference isn’t a secondary problem. It’s the primary constraint on AI’s future. And the companies that solve it won’t just be successful, they’ll be essential.​

With $450 million in total funding, over 250 employees, and a clear roadmap to the next generation of chip architecture, d-Matrix is positioning itself as the infrastructure layer for the inference economy. The $2 billion valuation may yet look conservative next to the trillions of dollars that inference workloads are projected to consume over the next decade.

For data centers, enterprises, and hyperscalers struggling with AI’s runaway costs, d-Matrix isn’t just another chip startup. It’s the answer to how we run AI at scale without bankrupting ourselves.

