Diffusion Models Enter the Large Language Arena as Inception Labs Unveils Mercury
- By Anshika Mathews
- Published on

For years, large language models (LLMs) have operated within a well-defined paradigm: autoregression. Each token is generated sequentially, one at a time, creating a fundamental bottleneck in speed and efficiency, and driving up inference costs and latency as AI-generated text grows more complex. Now Inception Labs, a startup co-founded by Stanford professor Stefano Ermon and his colleagues Volodymyr Kuleshov and Aditya Grover, is introducing a different approach: diffusion large language models (dLLMs). Their first commercial-scale product, Mercury, aims to disrupt the status quo by offering significantly faster and more efficient text generation.

The Diffusion Model Shift

Traditional LLMs, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Haiku, generate text autoregressively, predicting each new token from the ones before it. Diffusion models take a different route: they begin with a rough, noise-like draft of the whole output and refine it over a series of denoising steps, updating many tokens in parallel rather than one at a time.
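The contrast can be sketched with a toy example. Nothing here reflects Mercury’s actual architecture: the “model” below is a stand-in that already knows the target tokens, so the sketch only shows the difference in decoding schedules — one token per pass for autoregression versus a batch of positions per denoising pass for diffusion-style generation.

```python
import random

# Toy contrast between autoregressive and diffusion-style decoding.
# The "model" is faked with a fixed target sentence; a real dLLM would
# predict tokens with a neural network at each step.
TARGET = ["diffusion", "models", "refine", "text", "in", "parallel"]
MASK = "<mask>"

def autoregressive_decode(length):
    # One token per pass, left to right: `length` sequential model calls.
    out = []
    for i in range(length):
        out.append(TARGET[i])
    return out

def diffusion_decode(length, steps=3, seed=0):
    # Start fully masked; each pass unmasks a batch of positions at once,
    # so the whole sequence is refined in only `steps` parallel passes.
    rng = random.Random(seed)
    seq = [MASK] * length
    per_pass = -(-length // steps)  # ceil(length / steps)
    for _ in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        for i in rng.sample(masked, min(per_pass, len(masked))):
            seq[i] = TARGET[i]
    return seq

print(autoregressive_decode(6))  # 6 sequential passes
print(diffusion_decode(6))       # 3 parallel refinement passes
```

Both routes produce the same six tokens, but the diffusion-style schedule needs only three refinement passes instead of six sequential ones — the source of the speed claims behind dLLMs.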
If Mercury’s claims hold up in real-world applications, it may not be long before diffusion-based language models become a core part of AI development.
