Cartesia Raises $100 Million to Transform Real-Time Voice AI with Sonic-3

"Experience voice AI that feels truly human"

Silicon Valley startup Cartesia has secured a $100 million funding round from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Co-founded by Stanford AI Lab alumni Karan Goel and Albert Gu, the company is launching Sonic-3, a real-time conversational voice AI model.

Sonic-3 combines expressiveness, speed, and multilingual support. It captures the full emotional range of human speech, including laughter, tone variation, and subtle emotional shifts, which makes conversations feel authentic and engaging.

It is also fast, with a model latency of 90 milliseconds and a total end-to-end response time of 190 milliseconds, placing it among the fastest real-time voice AI systems available. The model supports 42 languages, allowing enterprises to deploy natural-sounding voice applications across diverse markets.

Unlike most voice AI solutions, which rely on Transformer architectures, Sonic-3 is built on State Space Models (SSMs). Traditional Transformer-based models process a conversation by re-reviewing all preceding dialogue to predict each next word, much like replaying the entire conversation at every step. This approach introduces latency and inefficiency.

SSMs, pioneered by Cartesia’s founders at Stanford (with innovations like S4 and Mamba), function more like human memory. They retain an ongoing understanding of the topic and conversational vibe without replaying everything from scratch for each response. This enables Sonic-3 to generate speech that is both natural and fast.
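The contrast between carrying a running memory and replaying the whole conversation can be sketched with a toy linear state space recurrence. This is an illustration only: the `ToySSM` class, its parameters, and the `replay_output` helper are hypothetical names invented for this sketch, and none of it reflects how Sonic-3, S4, or Mamba are actually implemented. The replay function is also not a Transformer; it simply stands in for any approach whose work grows with the length of the history.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySSM:
    """Toy diagonal linear state space model (hypothetical, for illustration)."""
    a: List[float]  # per-channel decay applied to the previous state
    b: List[float]  # per-channel weight applied to the new input
    c: List[float]  # per-channel weight used to read out the output
    state: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.state:
            self.state = [0.0] * len(self.a)

    def step(self, x: float) -> float:
        # h_t = a * h_{t-1} + b * x_t, then y_t = sum(c * h_t).
        # The cost of one step depends on the state size, not on history length.
        self.state = [ai * hi + bi * x
                      for ai, bi, hi in zip(self.a, self.b, self.state)]
        return sum(ci * hi for ci, hi in zip(self.c, self.state))


def replay_output(a: List[float], b: List[float], c: List[float],
                  history: List[float]) -> float:
    """Compute the latest output by revisiting every past input.

    Mathematically equal to the recurrence above, but the work grows
    with the length of the history.
    """
    t = len(history) - 1
    return sum(ci * (ai ** (t - k)) * bi * x
               for k, x in enumerate(history)
               for ai, bi, ci in zip(a, b, c))


a, b, c = [0.9, 0.5], [1.0, 1.0], [0.5, 0.5]
inputs = [1.0, 0.0, 2.0, -1.0]

model = ToySSM(a, b, c)
streamed = [model.step(x) for x in inputs]  # fixed-size state, one step per input
replayed = [replay_output(a, b, c, inputs[: i + 1]) for i in range(len(inputs))]

print(streamed)
print(replayed)
```

Both lists agree up to floating-point rounding, but the streaming version keeps only a two-number state between steps, while the replay version rereads the entire input history for every new output.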

“If you’re qualified and we can’t make your voice AI better than what you’re using now, I’ll donate $5K to your chosen charity,” said Goel.

Thousands of companies, including ServiceNow, Cresta, and Decagon, trust Sonic to power millions of voice interactions monthly. Cartesia’s platform enables enterprises to build voice agents capable of complex tasks such as customer support, scheduling, and even lighthearted pranks, all with human-like expressiveness.

To encourage adoption, Cartesia offers free trials and demos, as well as an 11-page guide on cloning voices and creating AI agents in under 10 minutes. Additionally, new users receive $100 in free credits to experiment with voice AI applications.

The $100 million raise highlights growing investor confidence in Cartesia’s technology and business potential. With capital from Silicon Valley titans like Kleiner Perkins and NVIDIA, Cartesia plans to expand its engineering team, scale product development, and extend its global reach.

Sachin Mohan
Sachin is a Senior Content Writer at AIM Media House. He is a tech enthusiast and holds a very keen interest in emerging technologies and how they fare in the current market. He can be reached at sachin.mohan@aimmediahouse.com