Why a Microsoft AI Architect Focused on Trust, Not Hype?
On a flight home from a research workshop, Sumit Gulwani learned an uncomfortable lesson about expertise.
The woman sitting next to him was struggling with an Excel spreadsheet. The problem seemed simple on the surface: names written as FirstName LastName needed to be reformatted as LastName, FirstName. When she learned that Gulwani had a PhD in computer science and worked at Microsoft Research, she smiled and asked if he could help.
Gulwani didn’t know the programming model beneath Excel. He had to excuse himself, embarrassed not just by the situation, but by what it revealed. Later, back home, he searched Excel help forums and found thousands of users struggling with the same kinds of tasks. What stood out wasn’t only the volume of frustration but how people tried to solve it. They posted a few examples of what they wanted, hoping an expert could infer the right logic.
That behavior led to a simple question: What if Excel itself could infer intent from examples, the way humans do?
That question became Flash Fill.
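The transformation that stumped those forum posters is a few lines of code for a programmer, which is precisely the expertise gap at issue. A minimal Python sketch of the logic they needed (the function name is ours, for illustration):

```python
def reformat_name(full_name: str) -> str:
    """Turn 'FirstName LastName' into 'LastName, FirstName'."""
    # Split on the first space only, so multi-word last names stay intact.
    first, last = full_name.split(" ", 1)
    return f"{last}, {first}"

print(reformat_name("Sumit Gulwani"))  # prints: Gulwani, Sumit
```

Flash Fill's point was that users should never have to write even this much: typing a single example output, such as "Gulwani, Sumit", should be enough for the system to infer the rule.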
From the start, the idea was shaped less by ambition and more by constraint. When Gulwani proposed Flash Fill to the Excel product team, they laid down two non-negotiables. It had to work in a fraction of a second, or it would break the interactive experience. And it had to work with just one example most of the time, or users would lose trust if the system inferred the wrong thing.
Those constraints forced Gulwani to rethink program synthesis from first principles. He narrowed the scope to common text transformations, grounded the work in real user scenarios, and redesigned the system around efficiency and ranking by generating multiple plausible interpretations of intent and selecting the one most likely to match what the user meant.
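Those two ideas, generating many plausible programs and ranking them, can be illustrated with a toy programming-by-example sketch. To be clear, this is not Microsoft's PROSE algorithm: the miniature DSL (programs as sequences of input tokens and literal characters) and the simplicity-based ranking are invented here purely for illustration.

```python
def run(program, text):
    """Execute a program: ('tok', i) copies the i-th whitespace-separated
    token of the input; ('lit', c) emits the literal character c."""
    tokens = text.split()
    out = []
    for kind, value in program:
        if kind == "lit":
            out.append(value)
        else:
            if value >= len(tokens):
                return None  # program does not apply to this input
            out.append(tokens[value])
    return "".join(out)

def synthesize(inp, out):
    """Enumerate every program in the toy DSL consistent with one
    input/output example."""
    tokens = inp.split()
    results = []

    def extend(pos, program):
        if pos == len(out):
            results.append(list(program))
            return
        rest = out[pos:]
        # Interpretation 1: the next chunk of output copies an input token.
        for i, tok in enumerate(tokens):
            if rest.startswith(tok):
                program.append(("tok", i))
                extend(pos + len(tok), program)
                program.pop()
        # Interpretation 2: the next character is a constant literal.
        program.append(("lit", rest[0]))
        extend(pos + 1, program)
        program.pop()

    extend(0, [])
    return results

def rank(programs):
    """Prefer programs with fewer literals, then fewer pieces overall:
    a crude stand-in for ranking by likely generalization."""
    def cost(p):
        literals = sum(1 for kind, _ in p if kind == "lit")
        return (literals, len(p))
    return sorted(programs, key=cost)

# One example is enough to pick a best-guess program...
programs = synthesize("Sumit Gulwani", "Gulwani, Sumit")
best = rank(programs)[0]
# ...which then generalizes to fresh inputs.
print(run(best, "Ada Lovelace"))  # prints: Lovelace, Ada
```

The real system works over a far richer string-transformation language and represents the enormous candidate set compactly instead of enumerating it, but the shape of the problem is the same: many programs are consistent with one example, and a ranker must pick the one the user most plausibly meant.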
“It wasn’t about showing intelligence,” he says. “It was about staying out of the user’s way.”
When Flash Fill shipped in Excel in 2013, it became one of the first AI features to reach true mass-market scale. It met people where they were, inferred intent from minimal input, and preserved flow.
From a Feature to a Flywheel
Flash Fill could have remained a single success. Instead, it became a turning point.
For Gulwani, the shift from isolated innovation to sustained impact was personal. His father, a civil engineer, had always asked one question about his work: How does this help people? When Flash Fill shipped, the answer finally felt tangible.
“When my father said, ‘Now I understand your research,’ that stayed with me,” Gulwani says.
Growing up, he had watched blueprints turn into buildings. Ideas mattered, but only if they could be built, used, and maintained. That mindset pushed him beyond publishing research toward owning systems end to end.
The organizational inflection point came in 2015, when Gulwani emailed Satya Nadella. At the time, he was still an individual contributor. Nadella’s response didn’t promise funding or headcount. Instead, it offered a framework.
“Don’t ask me for funding,” Nadella told him. “Make a VC-style pitch to business owners. If they fund you, they’re putting skin in the game.”
That advice changed how Gulwani worked. It gave him a repeatable path for turning research into products with real ownership. Using that model, he built what became the PROSE team, a sustained research-to-product engine inside Microsoft.
Over time, the ideas behind Flash Fill spread across the stack. Program synthesis powered table extraction from semi-structured sources like PDFs, logs, and webpages, shipping through Power Query connectors that saw heavy adoption during COVID. It drove IntelliCode suggestions in Visual Studio, delivering millions of accepted code-edit recommendations every day. What began as a research insight evolved into a pipeline that now underpins Copilots across Microsoft.
How AI and Microsoft Changed
When Gulwani compares the early days of program synthesis to today’s generative AI era, he points to three shifts.
The first is how intent is expressed. Early systems required precise specifications. Flash Fill showed that examples could substitute for formal logical specifications, albeit within narrow domains. Today, natural language is the primary interface. Users express goals loosely, revise them midstream, and expect systems to reason across interaction history rather than isolated commands.
“Correctness is no longer pointwise,” Gulwani explains. “It’s contextual. It’s temporal.”
In many workflows, language becomes the maintained artifact, while generated code or actions become the execution layer. What matters is not whether a single output is perfect, but whether the system adapts as intent evolves.
The second shift is scope. Earlier AI focused on tightly defined, high-reliability tasks. Generative AI now operates in open-ended domains such as debugging, learning, analysis, and creative work, where neither the user nor the system fully specifies the outcome upfront. AI behaves less like a tool and more like a collaborator.
The third shift is product evolution. Earlier AI features were largely static after shipping. Today, evaluation defines the product. Benchmarks, telemetry, and real usage continuously shape behavior.
“I often say that if our benchmarks show more than a 50% success rate, we’re probably not being ambitious enough,” Gulwani says. “Once benchmarks saturate, progress slows.”
Enterprise AI Learns to Move
Enterprise adoption has also changed. A decade ago, internal AI deployment was negligible and customer-facing AI was cautious and selective.
Today, leaders are mandating AI usage, allocating budgets, and accepting early imperfections because the productivity upside is material. Internal deployment has become a learning mechanism by revealing failure modes, clarifying where AI adds value, and shaping which workflows deserve deeper investment.
At the same time, enterprises are building new customer-facing experiences with AI. As foundational models have mostly removed the need to build bespoke, task-specific models, the advantage has shifted from raw model-building toward product design, workflow understanding, and feedback loops.
“AI is no longer an add-on,” Gulwani says. “It’s becoming central to how customers explore, decide, and interact.”
The organizations that perform well are those that use their domain knowledge and usage data to refine systems continuously.
Designing AI Inside the Flow of Work
Inside Microsoft, embedding AI means keeping it inside the flow of work.
In Excel, AI lives in the grid. In Visual Studio, it lives in the editor. Systems reason, act, validate, and self-correct without forcing users into separate interfaces. Cross-application intelligence allows context to flow across documents, spreadsheets, emails, and meetings.
As AI generates more, the human role shifts. People spend less time producing and more time judging, steering, and refining.
“We have to design for judgment,” Gulwani says. “Not just output.”
Enterprise work is collaborative by nature. As AI participates in shared workflows, it must respect context, handoffs, and accountability. The goal is not isolated productivity gains, but better coordination and shared understanding.
For years, Gulwani believed strong data would speak for itself. That belief shifted in 2018, when he attended a storytelling workshop at Microsoft Research.
“I walked in skeptical,” he says. “I walked out realizing how incomplete my thinking had been.”
Storytelling, he learned, is not persuasion but sense-making. Over time, he began relying on three kinds of stories: stories of customer pain to guide problem selection, stories of transformation to align teams, and stories that help people understand how to work responsibly with AI.
“In AI,” Gulwani says, “creativity comes from how we ask questions. Responsibility comes from knowing when to question the answers.”
What stands out to Gulwani is how embedded AI has become in daily enterprise work.
The hardest problems are not about generating intelligence, but grounding it correctly and handling edge cases reliably. Execution costs continue to fall. What matters more are ideas, judgment, and the ability to learn quickly from real use.
“We finally have raw intelligence on tap,” Gulwani says. “The challenge is building systems and organizations that know how to use it well.”
Key Takeaways
- Flash Fill emerged from a real user failure, shaping an AI philosophy focused on trust, speed, and constraint rather than showmanship.
- The system was designed to infer intent from minimal input, proving AI works best when embedded directly into everyday workflows.
- That mindset scaled beyond a single feature, influencing how Microsoft builds AI products that prioritize reliability, context, and real-world use over hype.