“Our agents know when they don’t have enough data,” says PagerDuty’s David Williams

SVP of Product David Williams discusses how AI, orchestration, and consumption pricing are reshaping enterprise reliability

“The more our customers use PagerDuty, the smarter it gets.”

PagerDuty built its business around operational urgency. Its platform became the coordination layer for incident response, the system that mobilizes engineers when digital infrastructure fails.

That core function is now evolving into a control system for reliability. PagerDuty’s new AI Agent Suite marks a shift from reacting to incidents to anticipating them, and from automating human workflows to orchestrating intelligent systems.

David Williams, PagerDuty’s Senior Vice President of Product, told AIM Media House that the change is the result of a long arc. “We’ve been building machine-learning-based capabilities for over fourteen years,” he says. “You see the culmination of that work in our AIOps product, where we correlate signals and filter out noise from event data to identify familiar patterns.”

That early work, which focused on reducing alert storms and detecting recurring failure signatures, set the stage for the agentic architecture now embedded across the Operations Cloud. The transition began in 2024 with PagerDuty Advance, a generative-AI assistant that could summarize incidents and draft updates during response efforts. This year, the company extended that framework with a network of specialized AI agents that handle discrete operational roles and interact through a central orchestrator.

Williams describes the approach as one of “specialized agents and orchestration.” Rather than a single large model attempting to manage every aspect of incident response, PagerDuty now distributes reasoning across smaller, purpose-built systems that collaborate.

One of those systems, the Scribe Agent, automates a routine but time-consuming task in operations: documentation during live incidents. Scribe listens to discussions across Zoom or Slack, distills the conversation, and generates summaries for both responders and stakeholders. Those records then feed into the Insights Agent, which correlates conversational context with operational data to reconstruct the full sequence of events. The SRE Agent supports diagnosis and remediation, while the Shift Agent manages on-call schedules and escalation policies.

Overseeing them all is the AI Assistant, which routes queries to the right agent, whether that’s a request for on-call coverage or a question about the root cause of an outage. “The AI assistant orchestrates which agent is needed to act,” Williams explains. “It’s orchestration among specialists, not a single black box trying to do everything.”
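
To make the routing idea concrete, here is a minimal sketch of an orchestrator that hands a query to one of several specialists. It is an illustration only, not PagerDuty’s implementation: the keyword lists, the `route` function, and the "unrouted" fallback are all assumptions, and a production orchestrator would more likely classify intent with a model than match keywords.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    """A hypothetical specialist, identified by the phrases it responds to."""
    name: str
    keywords: List[str]


# Agent names follow the article; the keyword lists are invented for illustration.
AGENTS = [
    Agent("shift", ["on-call", "schedule", "escalation"]),
    Agent("sre", ["root cause", "diagnose", "remediate"]),
    Agent("scribe", ["summary", "summarize", "notes"]),
    Agent("insights", ["timeline", "trend", "sequence of events"]),
]


def route(query: str) -> str:
    """Return the name of the agent whose keywords best match the query."""
    text = query.lower()
    scores = {a.name: sum(k in text for k in a.keywords) for a in AGENTS}
    best = max(scores, key=scores.get)
    # If nothing matches, decline to route rather than guess.
    return best if scores[best] > 0 else "unrouted"


if __name__ == "__main__":
    print(route("Who is on-call for payments tonight?"))
    print(route("What was the root cause of the outage?"))
```

The two example queries mirror the ones in the paragraph above: a request for on-call coverage and a question about root cause.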

Building Guardrails for Autonomous Operations

PagerDuty has been deliberate in defining how far these agents can act autonomously. Each is designed to assess whether it has sufficient data before making a suggestion. “Our agents know when they don’t have enough data,” Williams says. “If confidence is low, they explicitly say so instead of producing answers that sound certain but aren’t.”
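
The behavior Williams describes can be sketched as a simple gate in front of an agent’s answer. This is a hypothetical example: the `Suggestion` type, the `respond` function, and both thresholds are assumptions chosen for illustration, and the article does not say how PagerDuty actually measures confidence.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    text: str            # the agent's proposed answer
    confidence: float    # assumed score between 0.0 and 1.0
    evidence_count: int  # number of supporting signals the agent found


def respond(s: Suggestion, min_confidence: float = 0.7, min_evidence: int = 3) -> str:
    """Return the suggestion only when it is well supported; otherwise say so."""
    if s.evidence_count < min_evidence or s.confidence < min_confidence:
        return (
            "Not enough data to make a reliable suggestion "
            f"(confidence {s.confidence:.2f}, {s.evidence_count} supporting signals)."
        )
    return s.text


if __name__ == "__main__":
    print(respond(Suggestion("Roll back the latest deploy of checkout-api.", 0.91, 6)))
    print(respond(Suggestion("Possibly a DNS issue.", 0.35, 1)))
```

The point of the design is that the low-confidence path produces an explicit statement of uncertainty instead of a plausible-sounding answer.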

PagerDuty also uses partners such as Arize AI to monitor for model drift and relies on Amazon Bedrock to select the most suitable foundation model for a given task. Continuous summarization pipelines let the system process hours-long incidents without retaining full transcripts, dynamically compressing context as events unfold.
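
One way to picture continuous summarization is a rolling context that folds older messages into a running summary and keeps only the most recent ones verbatim. The sketch below is an assumption about the general technique, not a description of PagerDuty’s pipeline: `RollingContext` and `naive_summarize` are hypothetical names, and the stand-in summarizer simply truncates text where a real system would call a language model.

```python
from typing import Callable, List


def naive_summarize(previous: str, chunk: List[str], limit: int = 200) -> str:
    """Stand-in summarizer: merge old summary and new messages, keep the tail.

    A real pipeline would call a language model here instead of truncating.
    """
    merged = " ".join(([previous] if previous else []) + chunk)
    return merged[-limit:]


class RollingContext:
    """Keeps a bounded summary plus a short buffer of recent messages."""

    def __init__(self, summarize: Callable[[str, List[str]], str] = naive_summarize,
                 max_buffer: int = 5) -> None:
        self.summary = ""
        self.buffer: List[str] = []
        self.summarize = summarize
        self.max_buffer = max_buffer

    def add(self, message: str) -> None:
        self.buffer.append(message)
        # Once the buffer fills, compress it into the summary and discard the raw text.
        if len(self.buffer) >= self.max_buffer:
            self.summary = self.summarize(self.summary, self.buffer)
            self.buffer = []

    def context(self) -> str:
        """The text an agent would see: summary first, then recent messages."""
        return "\n".join(part for part in [self.summary] + self.buffer if part)


if __name__ == "__main__":
    ctx = RollingContext(max_buffer=3)
    for i in range(10):
        ctx.add(f"msg {i}")
    print(ctx.context())
```

Because raw messages are dropped after each fold, memory use stays roughly constant however long the incident runs, at the cost of whatever detail the summarizer discards.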

The company’s broader goal is to move from faster response toward prevention. As customers resolve more incidents through the platform, PagerDuty’s models accumulate operational intelligence: symptoms, root causes, contributing factors, and resolutions. Over time, that knowledge can reach developers in their own environments, warning them when new code resembles patterns that once caused outages. “We want to prevent incidents from happening in the first place,” Williams says. “When developers can see historical incident data directly in their tools, they can make changes that stop recurrences before they happen.”

Humans remain central to this process. “Our goal isn’t to remove humans, but to make their work faster and less painful,” Williams adds. By identifying the failing services and the teams responsible, PagerDuty can improve time to resolution while limiting the number of people disrupted by an alert. In large enterprises, that precision can reduce the number of responders engaged during an incident.

Rewriting the Economics of Operations

Behind the product evolution is a shift in PagerDuty’s business model. Like many enterprise SaaS companies, PagerDuty historically charged per seat, an arrangement that made sense when human users performed most of the work. But as AI agents automate detection, diagnosis, and remediation, that equation changes.

“You’ll increasingly see a consumption-based pricing model become the primary way that we capture revenue from the value we deliver in a very directly measurable way for our customers,” Williams says.

As intelligence becomes the engine of productivity, value is no longer tied to headcount.

PagerDuty’s evolution is taking place in a competitive field. ServiceNow, Cisco’s Splunk, BigPanda, Moogsoft, and a new generation of AI-native startups are all racing toward a similar goal: self-healing systems that detect, diagnose, and fix problems automatically. Most focus on observability or correlation. PagerDuty’s differentiation lies in its embedded role within existing workflows and its long-running dataset of real-world operational events, a contextual edge that cannot be replicated overnight.

Williams also points to the company’s internal structure as part of the story. The four agents were not built by a single centralized AI team but by different product groups working on top of a shared internal platform that handles data pipelines, guardrails, and orchestration logic. Williams says that distributed model allows the company to scale AI development without fragmenting standards or safety controls.

By supporting open protocols such as the Model Context Protocol (MCP) and Agent2Agent (A2A), PagerDuty is aligning with a wider shift across enterprise software, from closed orchestration to interoperable intelligence. The test ahead will be whether its agents can act as dependably as the responders they support. “The more our customers use PagerDuty, the smarter it gets,” Williams says.

Mukundan Sivaraj
Mukundan covers the AI startup ecosystem for AIM Media House. Reach out to him at mukundan.sivaraj@aimmediahouse.com or Signal at mukundan.42.