Agentic Observability: The Third Eye

Agentic systems promise speed, adaptability, and autonomy, but they also introduce a new kind of opacity. Decisions emerge from networks of interacting agents, each reasoning, delegating, and executing in ways that even their creators cannot always anticipate. For enterprises now deploying these systems, the challenge has shifted from building them to understanding them. That tension was the focus of a session at MachineCon New York 2025 titled “Agentic Observability: The Third Eye”, presented by Wil Pong, Vice President of Product at Fiddler AI.

The New Complexity of Agentic Systems

Pong framed the next couple of years as a turning point. “We have not seen this kind of transformational change since probably the cloud,” he said, explaining that the real shift lies in how requests play out inside these systems. What seems simple on the surface can branch into a web of reasoning, delegation, and coordination. Agents pick up tasks, hand them off, and sometimes revisit them through reflection cycles. A single query can expand into multiple paths, each adding to the overall complexity. “We’re starting to see different players, agent-to-agent interactions, multi-turn architectures, and that actually makes things exponentially more complex,” he said.

Why Traditional Monitoring Falls Short

This level of interaction raises the bar for enterprises. Systems that once followed predictable workflows now behave more like evolving networks. They generate results through chains of actions that are difficult to trace end-to-end. That is why observability becomes essential. It gives teams the ability to see how agents are working together, where breakdowns occur, and how performance connects to cost and reliability.
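End-to-end visibility of this kind is usually built on tracing. A minimal sketch, assuming a shared trace keyed by a request ID in which each agent records its steps (the class and field names here are illustrative, not any particular vendor's API):

```python
import time

class Trace:
    """Collects the steps every agent takes while serving one request."""
    def __init__(self, request_id):
        self.request_id = request_id
        self.spans = []

    def span(self, agent, action, status="ok"):
        # Each entry records who did what, whether it succeeded, and when.
        self.spans.append({
            "agent": agent, "action": action,
            "status": status, "ts": time.time(),
        })

trace = Trace("req-42")
trace.span("supervisor", "decompose request")
trace.span("flight-agent", "search flights")
trace.span("hotel-agent", "search hotels", status="error")  # breakdown surfaces here

# Querying the trace shows where the chain of actions broke down.
failures = [s for s in trace.spans if s["status"] == "error"]
```

Because every agent writes into the same trace, a failure deep in the chain can be located without stitching together separate dashboards.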

Consider why this matters in a travel-planning scenario. If a customer requests a trip from San Francisco to New York, a traditionally programmed workflow would move through a fixed sequence of steps. In an agentic system, a supervisory agent coordinates others: one searches flights, another checks hotels, and another aligns schedules. These agents sometimes run in parallel, sometimes in sequence, and sometimes in conflict. Humans, accustomed to comparing browser tabs and reconciling results, can resolve these differences instinctively. Agents, however, need monitoring to ensure they stay aligned and consistent. Pong emphasized that agentic systems combine machine learning models, large language models, and application tool calls into a complex orchestration. “All these things together, that’s what makes an agentic system work,” he said, “and that is where we need to evolve the tools we use to observe all that’s happening.”
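The supervisor pattern described above can be sketched in a few lines. This is a toy illustration, not Fiddler's or anyone's actual implementation: the sub-agent functions are stand-ins for what would be LLM calls with tool use.

```python
import concurrent.futures

# Hypothetical sub-agents; in a real system each wraps an LLM plus tools.
def search_flights(origin, dest):
    return {"agent": "flights", "option": f"{origin}->{dest} nonstop"}

def search_hotels(city):
    return {"agent": "hotels", "option": f"3-night stay in {city}"}

def align_schedule(results):
    # Supervisor step: reconcile what the sub-agents returned.
    return {"plan": [r["option"] for r in results]}

def supervisor(origin, dest):
    # Run the flight and hotel agents in parallel, then reconcile.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        flights = pool.submit(search_flights, origin, dest)
        hotels = pool.submit(search_hotels, dest)
        results = [flights.result(), hotels.result()]
    return align_schedule(results)

itinerary = supervisor("San Francisco", "New York")
```

Even this toy version shows why tracing matters: the two sub-agents run concurrently, so their results can disagree in ways only the reconciliation step (or an observer) would notice.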

The challenge is that traditional observability tools are insufficient. DevOps platforms track latency, server calls, and error rates. Machine learning monitoring tools focus on model performance, data drift, or hallucinations. Agentic systems require both views combined in one place, yet most teams today still operate across silos, stitching together dashboards in what Pong plainly described as a swivel-chair fashion that makes root-cause analysis difficult.

Reflection Loops: When AI Questions Itself

Reflection loops add another layer of complexity. Agents can pause, review their own outputs, and decide whether the results make sense. This is unlike traditional software, which executes instructions without question. “There’s actually a capability for your software to look at its own output and say, does this make sense? Is this what I expected it to do?” Pong explained. Reflection helps catch errors and incomplete tasks before they reach users, improving reliability. The tradeoff is cost: each cycle of self-checking consumes additional compute, which quickly adds up. Observability has to capture these loops, showing when they are beneficial and when they become inefficient.
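A reflection loop and its cost tradeoff can be sketched concretely. In this minimal illustration, the hypothetical `generate()` and `critique()` functions stand in for LLM calls; the point is that every self-check is itself a billable call that observability should count.

```python
def generate(task, feedback=None):
    # Stand-in for an LLM call; a real agent would revise using feedback.
    return f"draft for {task}" + (" (revised)" if feedback else "")

def critique(output):
    # Stand-in for a self-check; returns None when the output looks acceptable.
    return None if "(revised)" in output else "incomplete answer"

def run_with_reflection(task, max_loops=3):
    calls = 0
    feedback = None
    for _ in range(max_loops):
        output = generate(task, feedback)
        calls += 1
        feedback = critique(output)
        calls += 1  # the self-check is another model call
        if feedback is None:
            return output, calls  # the call count is what observability surfaces
    return output, calls

result, cost = run_with_reflection("book trip")
```

One revision pass here doubles the number of model calls, which is exactly the kind of hidden expense Pong argues teams need to see.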

Enterprises also need safeguards that operate in real time. Post-hoc monitoring is too slow when systems face risks like toxic prompts, jailbreak attempts, or fabricated answers. Filtering must happen at the point of interaction. Pong noted that in regulated industries, allowing unsafe or flawed outputs through is not an option. “The ability to do that in real time in a guardrail between the input and output of your agentic system really matters.”
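The guardrail Pong describes sits between the user and the agent, screening both directions before anything is returned. A minimal sketch, assuming a simple blocklist stands in for real toxicity and jailbreak classifiers:

```python
# Illustrative patterns only; production guardrails use trained classifiers.
BLOCKED_PATTERNS = ["ignore previous instructions", "toxic-term"]

def violates_policy(text):
    lowered = text.lower()
    return any(p in lowered for p in BLOCKED_PATTERNS)

def guarded_agent(prompt, agent_fn):
    # Screen the input before the agent runs...
    if violates_policy(prompt):
        return "Request blocked by input guardrail."
    output = agent_fn(prompt)
    # ...and screen the output before it reaches the user.
    if violates_policy(output):
        return "Response withheld by output guardrail."
    return output
```

Because the check wraps the agent call itself, unsafe content is stopped at the point of interaction rather than flagged in a report after the fact.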

Moving Observability Upstream

The long-term value of observability lies in analysis across time. Pong compared it to managing a workforce. Just as managers study teams to identify inefficiencies and improve processes, organizations can review logs of agent sessions to uncover anomalies and optimizations. Over time, patterns appear that highlight both problems and opportunities. In some cases, agents even find more efficient ways of working than the designers originally intended.

The financial implications cannot be ignored. Agentic systems consume resources differently from traditional applications. A model may produce the right answer but do so at unsustainable cost. Pong raised a practical concern: “Is my app technically working, but it’s crazy expensive because I’m calling OpenAI 8,000 times per session? I need to know that.” Without detailed observability, organizations risk deploying systems that drain budgets while appearing successful. Enterprises need tools that provide transparency into both effectiveness and efficiency, ensuring costs do not spiral beyond the value delivered.
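Per-session cost accounting is straightforward to sketch. The price below is an illustrative assumption, not a real rate, but it shows how quickly 8,000 calls per session turns into a number a budget owner needs to see.

```python
class SessionMeter:
    """Counts model calls in one session and converts them to dollars."""
    def __init__(self, price_per_call=0.002):  # assumed flat rate per call
        self.calls = 0
        self.price_per_call = price_per_call

    def record(self):
        self.calls += 1

    def cost(self):
        return self.calls * self.price_per_call

meter = SessionMeter()
for _ in range(8000):  # the 8,000-calls-per-session scenario Pong describes
    meter.record()
session_cost = meter.cost()
```

At the assumed rate, a single such session costs $16 — the kind of figure that stays invisible if monitoring only reports whether the answer was correct.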

Fiddler AI’s own platform demonstrates how observability is changing. The company started with model-level monitoring in areas such as fraud detection and underwriting. Over time, it has expanded into the agentic space, capturing reasoning loops, cross-agent interactions, and quality of execution. Its evaluators measure toxicity, hallucination, and faithfulness. Deployment flexibility across cloud or on-premises environments ensures sensitive data can remain secure while still benefiting from real-time monitoring and analysis.

Pong stressed that observability must move upstream in the development process. Leaving it for post-deployment analysis is too late. Enterprises need visibility while they are still designing, refining prompts, and selecting providers. By embedding observability early, they can shape performance before launch, avoid unnecessary costs, and reduce risks. This shift positions observability as an active part of building reliable and efficient systems.

The Future of Enterprise AI Monitoring

Agentic systems behave less like fixed code and more like evolving networks. They adapt, they coordinate, and they sometimes surprise their creators. This adaptability is powerful, but it also makes outcomes harder to predict. Without observability, scaling such systems becomes risky. With observability embedded from the start, organizations gain the visibility, context, and control they need to deploy them responsibly. That is why observability, described in this session as the “third eye,” is now a central requirement for the next generation of enterprise AI.

Mansi Mistri
Mansi Mistri is a Content Writer who enjoys breaking down complex topics into simple, readable stories. She is curious about how ideas move through people, platforms, and everyday conversations. You can reach out to her at mansi.mistri@aimmediahouse.com.
14 Nov. 2025