Uber Has an AI Productivity Boom and an ROI Crisis at the Same Time

By Mukundan Sivaraj | May 29, 2026 | 5 min read

Enterprises can now measure AI-generated activity in extraordinary detail. Proving that it improves products, margins, or customer value is much harder.

Major hyperscalers are projected to spend more than 600 billion dollars on AI infrastructure in 2026. Very few enterprises can say what they're actually getting in return.

Uber's executives are describing two separate realities. AI is measurably increasing employee throughput. The company cannot measure whether that translates into faster products, better features, or stronger margins.

This gap is becoming a strategic problem because organizations can now observe acceleration in employee activity far more easily than they can measure proportional organizational returns.

Confidence at the Employee Level

Uber executives believe AI improves productivity, and the company's internal usage data supports that belief.

CEO Dara Khosrowshahi frames the shift plainly: "We're seeing uptake of these tools, whether it's our legal team or marketing team or developers. We think it's creating kind of employees with superpowers."

Uber can see that roughly 10% of code changes are generated by autonomous agents. Code generation and experimentation are accelerating across teams. The company reportedly exhausted portions of its Claude Code budget early in 2026. Teams are consuming AI aggressively across departments.

Khosrowshahi tied this directly to hiring strategy: "If every person at this company can increase their throughput by 20%, 30%, 50%, 100%, then I think metering headcount growth and leaning in on AI investment is going to be well worth it."

In May, Uber slowed hiring growth and redirected the money toward AI investment. The bet is that higher throughput per person will offset slower headcount growth.
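Khosrowshahi's throughput figures translate into headcount through simple arithmetic. The sketch below assumes output scales linearly with per-person throughput, a strong simplification that the rest of this article questions: the staff needed to hold output constant is the baseline headcount divided by (1 + gain). The function name and the 1,000-person baseline are hypothetical, chosen only for illustration.

```python
def equivalent_headcount(baseline_headcount: float, throughput_gain: float) -> float:
    """Headcount needed to match baseline output after a per-person throughput gain.

    Hypothetical, illustrative model: assumes output = headcount * per-person
    throughput, with no coordination or integration bottlenecks.
    """
    if throughput_gain <= -1:
        raise ValueError("throughput_gain must be greater than -100%")
    return baseline_headcount / (1 + throughput_gain)


# Gains quoted by Khosrowshahi, applied to a hypothetical 1,000-person team.
for gain in (0.20, 0.30, 0.50, 1.00):
    print(f"{gain:.0%} gain -> {equivalent_headcount(1000, gain):.0f} people")
```

Under this model, a 20% gain lets roughly 833 people match the output of 1,000, and a 100% gain halves the requirement. The linearity assumption is the weak point: it presumes faster individual work reaches shipped output, which is exactly the link Macdonald says is not yet measurable.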

Uber isn't isolated in this pattern. 79% of organizations report productivity gains from AI at the individual level. Confidence in individual productivity gains is widespread. Measuring organizational impact is harder.

Where Measurement Breaks Down

Uber President and COO Andrew Macdonald identified the real problem in May.

The company cannot draw a clear line between higher AI usage and customer-facing results. "That link is not there yet, right?" he said. "I think maybe implicitly there is more that is getting shipped, but it's very hard to draw a line between one of those stats and, 'Okay, now we're actually producing 25% more useful consumer features.'"

He continued: "If you're not actually able to draw a direct line to how much useful features and functionality you're shipping to your users, that trade becomes harder to justify."

Macdonald accepts that AI works. His specific constraint is that the company has no system to connect higher token consumption to measurable business outcomes. Code generation, token consumption, and internal experimentation are all increasing. What's missing is the mechanism for measuring whether any of it matters to customers.

Uber's gap points to a deeper problem: enterprises still struggle to measure which work actually creates value.

When software development relied on "lines of code" as a productivity metric, engineers discovered a problem that still matters. More code doesn't mean better software. Some of the most valuable engineering work, including refactoring and simplification, may reduce code rather than add to it. The metric optimized for the wrong thing.

AI is creating the same measurement trap. Enterprises can count code generated. They can count tokens consumed. They can measure GPU utilization. But counting activity is not the same as counting value.

The real problem is that organizations have no way to measure whether faster task completion actually matters: whether faster code changes improve products, or whether local optimization creates friction further up the organizational chain.

AI accelerates task completion, but organizations are built around workflows, approvals, dependencies, and coordination. When individual throughput increases, bottlenecks often move upward. Execution speeds up, but integration becomes the constraint. Individual work accelerates while organizational synchronization lags.

Friction accumulates at the handoff layer. That's where the constraint sits: between teams, systems, and approvals.
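The idea that bottlenecks move upward is the familiar constraint-based view of a pipeline: end-to-end throughput is capped by the slowest stage. A minimal sketch, with hypothetical stage names and weekly capacities chosen only for illustration (none of these figures come from Uber):

```python
def pipeline_throughput(stage_capacity: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and the end-to-end throughput it allows.

    In a serial pipeline every unit of work passes through every stage,
    so the system moves no faster than its lowest-capacity stage.
    """
    bottleneck = min(stage_capacity, key=stage_capacity.get)
    return bottleneck, stage_capacity[bottleneck]


# Hypothetical weekly capacities (changes per week) for one team's pipeline.
before_ai = {"write_code": 40, "review": 50, "integrate": 45, "approve": 60}
# Double code-writing capacity while leaving the downstream stages unchanged.
after_ai = {**before_ai, "write_code": 80}

print(pipeline_throughput(before_ai))  # ('write_code', 40)
print(pipeline_throughput(after_ai))   # ('integrate', 45)
```

In this toy model, doubling individual execution raises delivered output by only 12.5% (from 40 to 45), and the constraint shifts to integration, which mirrors the handoff-layer friction described above.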

The scale of this gap is now visible in enterprise data.

Only 29% of organizations report significant ROI from generative AI, despite 79% reporting productivity gains. That 50-point gap reflects the difference between measurable productivity gains and measurable business outcomes. Only 21% of S&P 500 companies can cite any measurable AI benefit, and 95% of AI pilots deliver no measurable P&L impact.

54% of C-suite executives believe AI adoption is "tearing their company apart," even as they report high confidence in the technology itself. Organizations are becoming more efficient at execution while still struggling to coordinate and measure that execution at scale.

When Activity Metrics Become a Risk Signal

Capital markets are beginning to price the measurement gap as risk.

Debt markets in particular are treating the distance between AI activity and measurable returns as a risk factor. Citi Research found that companies classified as AI "adopters" without proven ROI face a higher cost of capital than companies with demonstrated returns.

Enterprises' own evaluation criteria are shifting in the same direction, and that shift is now measurable.

Enterprises are increasingly reluctant to accept productivity as proof of value, and recent research from The Futurum Group documents the shift. Direct financial impact, combining revenue growth and profitability, has nearly doubled to 21.7% of primary ROI metrics, while the share for productivity gains as the leading success metric fell from 23.8% to 18.0%. Time-saved metrics are becoming less persuasive as enterprises move toward ROI-based evaluation.

AI as an Operating Cost Category

AI spending is increasingly being evaluated as a capital allocation decision. Enterprise generative AI spending jumped from 11.5 billion dollars in 2024 to 37 billion dollars in 2025, a 3.2x increase, during the same period in which per-token costs fell by a factor of 1,000. Unit costs improved by three orders of magnitude, yet total spending more than tripled, so usage volume must have grown far faster than prices fell. Scale and continuous operation, not per-token price, are now the dominant drivers of enterprise AI costs.
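The spending and price figures together imply how much usage grew. A back-of-the-envelope sketch, assuming total spend equals token volume times average per-token price and that both figures cover the same scope and period (an assumption the underlying sources may not strictly satisfy):

```python
def implied_volume_growth(spend_before: float, spend_after: float,
                          price_drop_factor: float) -> float:
    """Growth in usage volume implied by changes in spend and unit price.

    Assumes spend = volume * unit_price, so
    volume_after / volume_before = (spend_after / spend_before) * price_drop_factor.
    """
    spend_ratio = spend_after / spend_before
    return spend_ratio * price_drop_factor


# Figures from the text: $11.5B (2024) -> $37B (2025); per-token cost down ~1,000x.
growth = implied_volume_growth(11.5e9, 37e9, 1_000)
print(f"Implied usage growth: ~{growth:,.0f}x")
```

Under those assumptions, usage grew on the order of 3,200-fold, which is why volume rather than unit price now drives the bill.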

Inference has become a recurring operating expense. Every interaction with an LLM burns GPU cycles, and persistent agent workflows and automated inference pipelines can consume tokens continuously, even when no human is actively requesting a response. CFOs used to treat AI as a pure innovation budget. Now it competes with headcount, infrastructure, and product development for capital.

Enterprises using AI gateways for cost governance report 40–60% reductions in inference costs. If governance alone can remove that much spend, a comparable share of ungoverned inference spend was avoidable. It also suggests that, at least for cost control, governance systems now matter more than model capability.

Macdonald framed the operational choice directly: "We're going to have to start talking about token consumption and the associated cost versus headcount."
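Macdonald's framing can be expressed as a simple break-even check: on pure cost terms, AI spend is justified only if it costs less than the hiring it lets the company avoid. A minimal sketch in which the class name and every input are hypothetical; the model deliberately sidesteps the measurement problem, because "avoided hires" is itself an estimate of organizational output.

```python
from dataclasses import dataclass


@dataclass
class TokenVsHeadcount:
    """Hypothetical annual inputs for comparing AI spend with avoided hiring."""
    annual_token_spend: float    # total inference and tooling spend, in dollars
    avoided_hires: float         # roles not filled because of higher throughput
    loaded_cost_per_hire: float  # salary plus overhead per role, in dollars

    def break_even_spend(self) -> float:
        """Largest token spend that avoided hiring alone would justify."""
        return self.avoided_hires * self.loaded_cost_per_hire

    def net_savings(self) -> float:
        """Positive if avoided headcount cost exceeds token spend."""
        return self.break_even_spend() - self.annual_token_spend


scenario = TokenVsHeadcount(annual_token_spend=8_000_000,
                            avoided_hires=40,
                            loaded_cost_per_hire=250_000)
print(scenario.break_even_spend())  # 10000000
print(scenario.net_savings())       # 2000000
```

The subtraction is trivial; the hard input is `avoided_hires`, which assumes throughput gains reach shipped output. That is exactly the link Macdonald says is not there yet.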

The discussion increasingly centers on capital allocation, operating costs, and governance. Whether Uber's decision to slow hiring while accelerating AI spending works depends on building measurement and governance systems fast enough to prove the trade is worth it.

Other major enterprises are making similar choices, and the pattern is spreading because the math is forcing it. If inference is an operating expense, if token consumption is outpacing productivity, and if organizations can't connect spending to outcomes, then AI spending must be governed like any other capital category. That requires measurement, which in turn requires deciding what actually matters.

Why Uber Matters Specifically

Uber operates at a scale that makes it a useful case study. The company facilitates over 40 million trips per day across 70+ countries and 15,000 cities. Engineering is core to the business, parts of the platform are already AI-native, and its leadership is unusually transparent about constraints.

Most enterprises facing the same measurement gap stay silent. Uber is discussing the problem publicly, and that transparency reveals what is actually happening inside technology organizations right now.

The Real Question

Uber's measurement problem runs through the organizational structure. The company can instrument employee activity in extraordinary detail. Code generated, tokens consumed, experiments run, commits pushed. These metrics are precise and real. What Uber lacks is a system for measuring organizational causality. Does faster local execution translate into better products, stronger margins, or faster feature velocity?

The core issue is that AI systems are advancing faster than the management systems designed to evaluate them. Enterprises can now generate code faster than they can determine whether it matters.

Uber is rare in discussing this publicly, but silence does not make the gap go away. Citi's research suggests it already shows up in the cost of capital.

Organizations that can connect AI spending to measurable operational outcomes will have a structural advantage. Others may continue scaling inference costs without clear evidence of proportional returns.

The bottleneck in enterprise AI has shifted from model capability to measurement.

Key Takeaways

Enterprises can measure AI productivity gains but struggle to quantify their impact on products and margins.
Uber reports significant increases in employee throughput from AI tools across departments.
Executives acknowledge a gap between productivity improvements and measurable organizational returns.
Major hyperscalers are projected to spend more than 600 billion dollars on AI infrastructure in 2026, while few enterprises can say what they are getting in return.
CEO Dara Khosrowshahi describes AI tools as giving employees "superpowers," with code generation and experimentation accelerating across teams.