Cerebras Redefines Speed with Fastest Inference for Next Agentic AI

- By Anshika Mathews
- Published on

By running the largest models at instant speed, Cerebras enables real-time responses from the world's leading open frontier model.

Cerebras Systems, an AI hardware innovator, has taken a major leap in the AI inference landscape, delivering unprecedented performance on Meta's Llama 3.1-405B model. Using its third-generation Wafer Scale Engine (WSE-3), Cerebras has achieved 969 tokens per second, setting a new benchmark for AI inference speed.

Llama 3.1 405B is now running on Cerebras!
- 969 tokens/s, frontier AI now runs at instant speed
- 12x faster than GPT-4o, 18x Claude, 12x fastest GPU cloud
- 128K context length, 16-bit weights
- Industry's fastest time-to-first-token @ 240ms
pic.twitter.com/zWJdV4zNPU
— Cerebras (@CerebrasSystems) November 18, 2024

Breaking the GPU Ceiling

Traditional GPUs have long been the backbone of AI infrastructure, but they weren't designed for the demands of frontier-scale inference.
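To put the cited figures in perspective, a rough sketch of what 969 tokens/s and a 240 ms time-to-first-token imply for end-to-end response time (the function name and the 500-token example are illustrative assumptions, not anything Cerebras publishes):

```python
# Back-of-envelope latency model using the figures from the announcement:
# 240 ms time-to-first-token (TTFT) and 969 tokens/s decode throughput.

def response_time_s(n_tokens: int,
                    ttft_s: float = 0.240,
                    tokens_per_s: float = 969.0) -> float:
    """Estimate end-to-end seconds to stream n_tokens of output:
    fixed TTFT plus per-token decode time."""
    return ttft_s + n_tokens / tokens_per_s

# A 500-token answer would stream in well under a second at these rates.
print(round(response_time_s(500), 2))
```

At these rates even a long, multi-hundred-token completion finishes in under a second, which is the sense in which the company describes frontier models running "at instant speed."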
