“APIs Have Been More Sticky,” Says Murf AI’s CEO

Murf argues that infrastructure reliability, not model commoditization, will determine who wins voice AI inside global enterprises

When Ankur Edkie and his co-founders started Murf AI in 2020, the generative-AI wave had not yet arrived. “We’ve been at it for five years,” he told AIM Media House. “This sort of precedes the Gen AI wave.” The original bet: voice was an underdeveloped modality. Editing, modifying, and scaling speech remained far behind what was possible with images or video. “It felt like the most untapped modality at that point,” he said.

That gap has now become Murf’s central opportunity. The company has grown from a creative-tool platform into a full voice-infrastructure provider. Its newest product, Falcon, is part of a broader effort to make enterprise-grade voice agents viable at scale. Murf claims the API delivers 55-millisecond model latency, 130-millisecond global time-to-first-audio, and support for 35 languages, positioning it as one of the fastest text-to-speech systems in production environments.

Edkie sees this shift as the natural evolution of Murf’s early work. The company began with voiceovers and dubbing across creator and education markets but quickly saw pull from enterprise teams. “From the first product itself, we had about 350 of Forbes 2000 companies being our customers,” he said. “A lot of them bought the second one as well because we could easily expand.”

Today, 60% of Murf’s revenue comes from the United States, but its infrastructure ambitions are global.

A Multilingual Foundation

Murf’s enterprise push rests on two technical layers: multilingual depth and low-latency infrastructure.

Falcon’s multilingual capability extends across 35+ languages. Murf also built what it calls “MultiNative Technology,” enabling models to handle mixed-language speech. This kind of code-mixing remains a difficult problem for most speech systems. Edkie views it as one of the core differentiators. “Anything beyond English, there’s just so much work that needs to be done,” he said. “The diversity that we have, the amount of Tamil and English you’re going to mix and Hindi in the mix, it’s just not easy to get there.”

The technical challenge extends beyond language coverage. Latency determines whether a voice agent feels usable. Falcon’s 55-millisecond model latency is achieved by avoiding large language model backbones and instead tailoring a smaller architecture. “Most text-to-speech systems today start with an LLM backbone. We actively avoided that to go smaller,” Edkie said. He noted that Murf will “drop it further by 20 milliseconds” in the coming weeks.

The company is preparing to offer “provisioned concurrency,” giving enterprises guaranteed P99 latencies under 130 milliseconds. “There’s no guarantee today from most providers,” he said. “That’s a promise we want to hold ourselves accountable to.”

This latency target is aimed at enterprises pursuing real-time use cases: outbound sales, hospitality reservations, interactive learning, and customer-support deflection. Murf says Falcon can support up to 10,000 concurrent calls at the same latency level.

The shift reflects where the market is moving. Google Cloud and Microsoft Azure offer mature TTS systems with global hosting, but neither provides consistent sub-150-millisecond voice generation at scale. OpenAI’s newer audio models integrate tightly with GPT-based systems, but they are not optimized for low-latency telephony or call-center infrastructure. Deepgram has expanded into speech generation, but its multilingual coverage is narrower.

Murf is positioning itself as the infra-level alternative built specifically for voice agents, not general multimodal AI.

Data Residency, On-Prem Options, and Enterprise Stickiness

A large share of Murf’s enterprise appeal comes from deployment flexibility. Falcon supports data residency across 11 global regions, including India, the U.S., the EU, and the Middle East. Murf also offers a full on-premise version, allowing customers to deploy TTS models inside their own VPC or private data center, so “audio and text never leave your environment,” according to company documentation.

This matters in sectors like banking, government, and healthcare, where voice data may be classified, regulated, or legally restricted from crossing borders. Edkie said regulatory-heavy customers often come through partners and systems integrators. “We haven’t gone too deep into healthcare, but BFSI we are interested in,” he said. For these customers, Murf’s approach is to support agents end-to-end. “There’s a lot of expertise that goes into a very stable agent design,” he said.

Enterprise stickiness is a recurring theme. While API-based businesses are often seen as interchangeable, Edkie argues the opposite. “APIs have been more sticky in general because the trust that needs to be built with an API is high,” he said. “Once the price works and the solution actually works, it’s very hard to go back and convince management that you want to look for another solution.”

Murf’s roadmap continues to expand toward a full-stack agent platform. The company is developing an end-to-end speech-to-speech system and a semi-self-service agent builder that enterprises can use alongside Murf’s engineering teams. “Every infra provider needs to have some app development,” Edkie said. “We don’t expect it to be completely self-service because we’re catering to enterprises.”

Over the next three to six months, the roadmap includes deeper reductions in latency, improvements to Falcon’s naturalness, and a unified data-residency solution for companies running agents across multiple geographies. “It’s important that there’s a unified solution,” Edkie said. “We’re excited about launching something there.”


Mukundan Sivaraj
Mukundan covers enterprise AI and the AI startup ecosystem for AIM Media House. Reach out to him at mukundan.sivaraj@aimmediahouse.com or Signal at mukundan.42.
