Inception, an artificial intelligence startup founded by Stanford professor Stefano Ermon, has raised $50 million in seed funding to develop a new generation of large language models built on diffusion technology.
The round was led by Menlo Ventures with participation from Microsoft’s M12, NVentures (Nvidia’s venture arm), Snowflake Ventures, Databricks Investment, and Mayfield. Angel investors Andrew Ng and Andrej Karpathy also joined the round. Inception’s goal is to create diffusion-based models for text, code, and voice generation, offering faster and more efficient performance than existing systems.
Ermon, known for his research on probabilistic modeling and generative systems, is taking diffusion models beyond image generation into the domains of text, code, and voice. Inception claims that its models can produce over 1,000 tokens per second, offering significantly faster generation than conventional systems. “These diffusion-based LLMs are much faster and much more efficient than what everybody else is building today,” Ermon said.
The company plans to use the capital to expand its research team, develop infrastructure, and bring its models to enterprise-scale deployment.
Diffusion Models: A New Architecture for Text and Code
Unlike traditional autoregressive models, which generate text one token at a time, diffusion models start with random noise and iteratively refine the entire sequence until coherent output emerges. Because every position is updated in parallel at each denoising step, the number of sequential model passes depends on the step count rather than the output length, which is what enables higher throughput and lower latency.
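The contrast can be sketched in a toy example. This is not Inception's actual sampling procedure; the target sequence, the step schedule, and the "model call" counters are invented purely to illustrate why parallel refinement needs fewer sequential passes than token-by-token decoding:

```python
import random

random.seed(0)

# Stand-in for a "coherent" output the model converges toward (illustrative only).
TARGET = ["the", "cat", "sat", "on", "the", "mat"]

def autoregressive_generate():
    """Sequential decoding: one model call per token, each waiting on the last."""
    out, calls = [], 0
    for tok in TARGET:
        calls += 1  # one forward pass per generated token
        out.append(tok)
    return out, calls

def diffusion_generate(steps=3):
    """Iterative denoising: every position is refined in parallel at each step."""
    seq, calls = ["<noise>"] * len(TARGET), 0
    for step in range(1, steps + 1):
        calls += 1  # one forward pass refines the whole sequence at once
        keep = step / steps  # fraction of positions resolved by this step
        seq = [TARGET[i] if random.random() < keep else seq[i]
               for i in range(len(TARGET))]
    return seq, calls

ar_out, ar_calls = autoregressive_generate()
df_out, df_calls = diffusion_generate()
assert ar_out == df_out == TARGET
print(ar_calls, df_calls)  # 6 sequential passes vs. 3 parallel ones
```

The sequential pass count is the point: the autoregressive loop scales with output length, while the denoising loop scales with the (fixed) number of steps.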
Ermon explained, “We’ve been benchmarked at over 1,000 tokens per second, which is way higher than anything that’s possible using existing autoregressive technologies, because our system is built to be parallel and really fast.”
Inception’s first model, Mercury, demonstrates this approach in practice. It has been integrated into developer tools, including ProxyAI, Buildglare, and Kilo Code, which use it for code completion, text assistance, and conversational interfaces. Mercury can generate structured text, automate repetitive coding tasks, and support natural-language interactions with low latency.
The company’s research focus is on optimizing the inference stage of large model deployment, where diffusion’s parallel structure offers measurable speed and cost benefits. This technical direction places Inception in contrast with most major LLM providers, which continue to refine autoregressive pipelines rather than rethinking the architecture.
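The arithmetic behind the inference advantage is simple to state. The numbers below are hypothetical, chosen only to show the shape of the calculation, and do not reflect Mercury's actual pass latency or step count:

```python
def tokens_per_second(seq_len, passes, pass_ms):
    """Throughput when `passes` sequential forward passes yield `seq_len` tokens."""
    return seq_len / (passes * pass_ms / 1000)

# Hypothetical figures purely for illustration:
# autoregressive decoding needs one pass per token...
ar = tokens_per_second(seq_len=256, passes=256, pass_ms=20)  # ~50 tok/s
# ...while a diffusion sampler needs a fixed number of denoising
# steps regardless of sequence length.
df = tokens_per_second(seq_len=256, passes=8, pass_ms=20)    # ~1,600 tok/s
```

Under these toy assumptions the diffusion path is 32x faster simply because it makes 8 sequential passes instead of 256, which is the structural property the company is optimizing around.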
Enterprise Opportunity and Technical Ambition
Inception is targeting enterprise and developer markets that require fast, reliable, and scalable model outputs for production use. Its technology is designed to serve applications in software development, customer interaction, and data documentation, where real-time response and accuracy are critical.
The newly raised funding will be used to scale the Mercury model, invest in research and compute infrastructure, and expand commercial partnerships. By building a diffusion-based foundation for code and text, Inception aims to establish an alternative standard for high-performance generative systems.
Ermon said, “It’s a completely different approach where there is a lot of innovation that can still be brought to the table.” The company is focusing on making the system production-ready for enterprise workloads rather than releasing consumer-facing tools.
Inception’s roadmap includes growing its engineering teams, refining Mercury’s latency and accuracy, and extending its model suite to other domains. The startup is aligning itself at the intersection of academic research and enterprise AI infrastructure, aiming to deliver faster and more efficient generative systems for professional environments.