Higgsfield AI Thinks Generative Video Still Lacks One Critical Thing

While platforms like Runway, Pika Labs, and OpenAI focus on visual fidelity, Higgsfield has prioritized the grammar of film: the motion, perspective, and spatial composition that shape a story.

“We kept hearing the same thing from creators: AI video looks better, but it doesn’t feel like cinema,” said Alex Mashrabov, founder of Higgsfield AI and former head of AI at Snap. “There’s no intention behind the camera.”

That critique became the foundation for Higgsfield AI, a generative video company focused on bringing cinematic language into AI video, not by enhancing visual fidelity, but by giving creators direct control over how the camera moves through a scene.

Founded by Mashrabov, a pioneer in the AI video space, Higgsfield recently raised $15 million in seed funding, entering a market flush with venture capital: Luma raised $45 million, while companies like Runway and Pika Labs now command valuations exceeding $500 million. Yet despite the excitement, Mashrabov is clear-eyed about who is actually using the technology: “Most of the adoption I think of the video AI comes from professionals today 100%.”

Higgsfield’s technology stems from lessons learned during the launch of Diffuse, a viral app Mashrabov previously developed that lets users create personalized AI clips. While Diffuse found traction, it also revealed the creative limits of short-form, gag-driven content. The Higgsfield team shifted their focus to storytelling, specifically serialized short dramas for TikTok, YouTube Shorts, and other mobile-first platforms.

At the heart of Higgsfield’s offering is a control engine that lets users craft complex camera movements (dolly-ins, crash zooms, overhead sweeps, and body-mounted shots) using nothing more than a single image and a text prompt. These kinds of movements traditionally demand professional rigs and crews. Now, they’re accessible through presets.

The idea is not just to produce good-looking frames but to make AI video feel intentional and cinematic. Higgsfield is tackling one of the most common criticisms of AI-generated content: that it lacks structure, rhythm, and authorship.

“We’re not just solving style—we’re solving structure,” said Yerzat Dulat, Higgsfield’s Chief Research Officer. The platform directly addresses character and scene consistency over time, still a persistent challenge in generative video tools. Venture investor Murat Abdrakhmanov noted that his rule of thumb as an experienced angel investor is to invest in people, not products. So, as much as Higgsfield’s technology advances AI video generation and content creation, getting to know its founder was just as important.

Higgsfield DoP I2V-01-preview

The company’s proprietary model, Higgsfield DoP I2V-01-preview, is an Image-to-Video (I2V) architecture that blends diffusion models with reinforcement learning. Unlike traditional systems that simply denoise static frames, this model is trained to understand and direct motion, lighting, lensing, and spatial composition: the essential components of cinematography.

By introducing reinforcement learning after diffusion, the model learns to inject coherence, intentionality, and expressive movement into scenes. This approach draws from how RL has been used to give large language models reasoning and planning capabilities.
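
Higgsfield has not published details of this training pipeline, but the general pattern of applying reinforcement learning on top of a pretrained diffusion model can be sketched at toy scale. The snippet below is a minimal, hypothetical illustration: TinyDenoiser, camera_motion_reward, and the data are placeholders invented for this example, and the REINFORCE-style update simply shows how a reward signal (here, a crude smoothness proxy) can steer sampled trajectories after diffusion pretraining, rather than reproducing Higgsfield's actual method.

```python
# Hypothetical sketch: reward-based fine-tuning applied after diffusion
# pretraining. The model, reward, and data are toy placeholders.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Toy stand-in for a pretrained image-to-video diffusion denoiser."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x, t):
        # Broadcast the timestep as an extra conditioning feature.
        t_feat = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_feat], dim=-1))

def camera_motion_reward(frames):
    """Placeholder reward: favors smooth frame-to-frame change, a crude proxy
    for coherent camera motion; a real system would use a learned critic."""
    diffs = frames[:, 1:] - frames[:, :-1]
    return -diffs.pow(2).mean(dim=(1, 2))  # higher = smoother

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(100):
    # Sample a short "video" latent trajectory by iterative denoising (toy loop).
    x = torch.randn(8, 64)  # batch of noisy latents
    frames, log_probs = [], []
    for t in torch.linspace(1.0, 0.0, steps=6):
        mean = model(x, t.view(1, 1))
        dist = torch.distributions.Normal(mean, 0.1)
        x = dist.sample()
        log_probs.append(dist.log_prob(x).sum(dim=-1))
        frames.append(x)
    frames = torch.stack(frames, dim=1)  # (batch, time, dim)

    # REINFORCE-style update: push the denoiser toward trajectories the
    # reward scores highly, instead of optimizing denoising likelihood alone.
    reward = camera_motion_reward(frames).detach()
    advantage = reward - reward.mean()
    loss = -(torch.stack(log_probs, dim=1).sum(dim=1) * advantage).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```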

Built on AMD Instinct™ MI300X with TensorWave

Higgsfield built and tested its model in partnership with TensorWave, deploying on AMD Instinct™ MI300X GPUs. Using TensorWave’s AMD-based infrastructure and pre-configured PyTorch and ROCm™ environments, the team ran inference workloads without custom setup—allowing them to evaluate model performance and stability under real-world conditions.
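
Pre-configured environments matter here because the ROCm build of PyTorch exposes AMD GPUs through the familiar torch.cuda interface, so CUDA-style inference code runs unchanged on MI300X. The sketch below illustrates that point with a placeholder convolutional network; it is not Higgsfield's DoP model or TensorWave's actual deployment code, and the input sizes are arbitrary.

```python
# Minimal sketch: verifying a ROCm-backed PyTorch install and running a small
# inference pass on an AMD Instinct GPU. The network is a placeholder.
import torch

# On a ROCm build, torch.version.hip is set and torch.cuda reports the AMD GPU.
print("HIP/ROCm version:", torch.version.hip)
print("Device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu")

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder network standing in for an image-to-video model's denoiser.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 3, padding=1),
).to(device).eval()

# A single "conditioning image" batch; real workloads would stream latents.
image = torch.randn(1, 3, 512, 512, device=device)

with torch.inference_mode():
    out = model(image)

print("Output shape:", tuple(out.shape))
```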

Filmmaker and creative technologist Jason Zada, known for Take This Lollipop and brand work with Intel and Lexus, produced a short demo titled Night Out using Higgsfield’s platform. The video features stylized neon visuals and fluid, high-impact camera motion—all generated within Higgsfield’s interface.

“Tools like the Snorricam, which traditionally require complex rigging and choreography, are now accessible with a click,” Zada said. “These shots are notoriously difficult to pull off, and seeing them as presets opens up a level of visual storytelling that’s both freeing and inspiring.”

John Gaeta, the Academy Award–winning visual effects supervisor behind The Matrix and founder of escape.ai, praised Higgsfield’s system for pushing creators closer to having “total creative control over the camera and the scene.” Gaeta’s platform escape.ai focuses on films created with AI, game engines, and other emerging tools.


