In AI development, a performance breakthrough often depends less on algorithms than on the code that connects them to the hardware. The specialized “kernel” programs that run on GPUs can determine whether a model operates at full speed or wastes costly computing resources. Writing that code is slow, labor-intensive, and typically done by a small pool of engineers at large tech companies: a constraint that has shaped the economics of AI infrastructure for years.
When Waleed Atallah talks about GPU programming, he frames it as a bottleneck that has shaped, and limited, AI’s infrastructure for years. “GPUs are the workhorses behind modern AI. But writing code for them remains an archaic, manual, and expensive process,” he wrote in announcing his company’s recent seed round. “Performance still depends on a small group of highly specialized engineers, and the best code is often locked away inside hyperscalers.”
Atallah is co-founder and CEO of Mako, a New York–based startup developing AI systems that write and optimize the low-level “kernel” code that allows software to run efficiently on GPUs. The company has raised $8.5 million in seed funding led by M13, with participation from Neo, Flybridge, K5 Global, Irregular Expressions, and others. Mako has also signed partnerships with AMD and Tenstorrent, signaling early alignment with chipmakers that are competing for share in the AI hardware market.
Automating a Specialized Skill
In AI model deployment, a GPU kernel acts as the bridge between a software algorithm and the underlying hardware. Writing these kernels is a highly technical task, often requiring deep knowledge of languages like CUDA and a detailed understanding of GPU architecture. Salaries for skilled kernel engineers can exceed $1 million annually, and most work at large companies such as NVIDIA, Meta, or Google.
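To make the idea concrete: a kernel is the small program each GPU thread runs, with the hardware launching thousands of such threads in parallel, one per output element. The sketch below is a hypothetical illustration of that execution model in plain Python, not real GPU code and not anything from Mako's stack.

```python
# Illustration only: emulating the GPU "one thread per element" model
# in plain Python. On a real GPU, the kernel body would be compiled
# code (e.g. CUDA C++) executed by thousands of hardware threads.

def saxpy_kernel(thread_id, a, x, y, out):
    """What a single GPU thread does: compute one output element."""
    out[thread_id] = a * x[thread_id] + y[thread_id]

def launch(kernel, n_threads, *args):
    """Stand-in for a GPU kernel launch: one invocation per thread id.
    A real GPU runs these concurrently; Python loops sequentially."""
    for tid in range(n_threads):
        kernel(tid, *args)

n = 8
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
launch(saxpy_kernel, n, 2.0, x, y, out)
print(out)  # -> [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```

The hard part of real kernel engineering is not this arithmetic but mapping it efficiently onto memory hierarchies and thread scheduling, which is where the specialized expertise comes in.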
Mako’s platform is designed to replicate and accelerate that expertise. Its flagship tools, MakoGenerate and MakoOptimize, use AI to produce optimized kernels in under a minute, then continuously tune them for speed, efficiency, and cost. According to the company, customers have seen up to three times faster inference performance, infrastructure cost reductions of up to 80%, and kernel speeds up to ten times faster than the output of torch.compile.
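One way to see what such tuning buys, as a toy analogy rather than a description of Mako's method, is kernel fusion: combining several passes over the data into one, cutting memory traffic. The plain-Python sketch below shows the shape of the rewrite that hand-tuned kernels and compilers like torch.compile perform on the GPU.

```python
# Toy analogy in plain Python (not GPU code): fusing two elementwise
# passes into one. On a GPU, the unfused version writes an entire
# intermediate array to memory; the fused version avoids it, and
# memory traffic is often the dominant cost of a kernel.

data = [float(i) for i in range(10)]

# Unfused: two passes over the data, with intermediate storage.
scaled = [v * 2.0 for v in data]      # pass 1: scale
shifted = [v + 1.0 for v in scaled]   # pass 2: shift

# Fused: a single pass, no intermediate list.
fused = [v * 2.0 + 1.0 for v in data]

assert fused == shifted  # same result, roughly half the memory traffic
```

Real kernels involve many more such trade-offs (tiling, thread-block sizing, instruction scheduling), which is why automating the search over them is valuable.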
“Our mission is to make peak GPU performance universally accessible through intelligent, automated code generation,” Atallah told The Information. “Instead of requiring teams to master CUDA or hunt for rare kernel engineers, we’re building an AI system that writes and continuously tunes the low-level GPU code for you.”
The approach is hardware-agnostic, supporting NVIDIA, AMD, and Tenstorrent GPUs, with plug-and-play deployment in either cloud or on-premises environments. This is significant in a sector where hardware changes can force months of redevelopment.
Positioning in a Shifting AI Hardware Landscape
The seed funding will allow Mako to expand its engineering team, broaden hardware support, and deepen its chip vendor partnerships. The company’s early alignment with AMD and Tenstorrent suggests a strategy of enabling performance parity and portability across competing GPU architectures.
Atallah emphasized that Mako’s technology is not intended to displace NVIDIA’s dominance, but to enhance the overall ecosystem. “We can improve the performance of existing kernels. We can also enable new software to run on NVIDIA GPUs… and all sorts of different GPUs as well,” he said.
The startup’s positioning reflects a challenge in the AI hardware market: without robust software tooling, new architectures struggle to gain traction. “Without a proper software stack and without somebody to write all of that low-level code that maps the algorithms to the hardware, your chips are basically expensive paperweights,” Atallah told The Information.
Competitors in this space range from in-house teams at major chipmakers to companies developing compilers and performance engineering tools. Mako’s differentiator is its focus on automated, AI-driven code generation and optimization, eliminating the need for engineers to manually port and fine-tune kernels for each hardware release.
The company’s long-term vision aligns with a shift in software development. “AI infrastructure is entering a new era, one where agents write the infrastructure itself,” Atallah wrote. He envisions a future where compilers, profilers, and debuggers are abstracted by intelligent systems that understand both performance constraints and hardware nuances.
With commercial availability already in place, Mako is pursuing customers in AI, graphics, simulation, and scientific computing, segments where GPU efficiency directly impacts operational costs.