5 key agenticops practices to start building now
AI agents combine language and reasoning models with the ability to take action through automations and APIs. Agent-to-agent protocols like the Model Context Protocol (MCP) enable integrations, making each agent discoverable and capable of orchestrating more complex operations.
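For example, a single MCP tool definition is enough to make an internal automation discoverable to any MCP-capable agent. The sketch below assumes the official MCP Python SDK (the `mcp` package); the server name, tool name, and incident-lookup logic are hypothetical placeholders, not a prescribed design.

```python
# Minimal sketch of exposing an internal automation as an MCP tool,
# assuming the official MCP Python SDK (`pip install mcp`).
# The tool name and incident-lookup logic are hypothetical placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ops-agent-tools")

@mcp.tool()
def get_open_incidents(service: str) -> list[dict]:
    """Return open incidents for a service so an agent can reason over them."""
    # In a real deployment this would query an ITSM or observability API.
    return [{"id": "INC-1", "service": service, "severity": "P2"}]

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so MCP-capable agents can discover it
```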
Many organizations will first experiment with AI agents embedded in their SaaS applications. AI agents in HR can assist recruiters with the hiring process, while AI agents in operations address complex supply-chain issues. AI agents are also transforming the future of work by taking notes, scheduling meetings, and capturing tasks in workflow tools.
Innovative companies are taking the next step and developing their own AI agents. These agents will augment proprietary workflows, support industry-specific work, and integrate into customer experiences. To develop them, organizations must consider the development principles, architecture, non-functional requirements, and testing methodologies that will guide AI agent rollouts. These steps are essential before deploying experiments or promoting AI agents into production.
Rapidly deploying AI agents poses operational and security risks, prompting IT leaders to consider a new set of agentic operations practices. Agenticops will extend devops practices and IT service management functions to secure, observe, monitor, and respond to AI agent incidents.
What is agenticops?
Agenticops builds on several existing IT operational capabilities:
- AIops emerged several years ago to address the problem of having too many independent monitoring tools. AIops platforms centralize logfiles and other observability data, then apply machine learning to correlate alerts into manageable incidents.
- Modelops emerged as a separate capability to monitor machine learning models in production for model drift and other operational issues.
- Combining platform engineering, automating IT processes, and using genAI in IT operations helps IT teams improve collaboration and resolve incidents efficiently.
Agenticops must also support the operational needs unique to managing AI agents while providing IT with new AI capabilities.
DJ Sampath, SVP of the AI software and platform group at Cisco, notes that there are “three core requirements of agenticops”:
- Centralizing data from multiple operational silos
- Supporting collaboration between humans and AI agents
- Leveraging purpose-built AI language models that understand networks, infrastructure, and applications
“AI agents with advanced models can help network, system, and security engineers configure networks, understand logs, run queries, and address issue root causes more efficiently and effectively,” he says.
These requirements address the distinct challenges involved with managing AI agents versus applications, web services, and AI models.
“AI agents in production need a different playbook because, unlike traditional apps, their outputs vary, so teams must track outcomes like containment, cost per action, and escalation rates, not just uptime,” says Rajeev Butani, chairman and CEO of MediaMint. “The real test is not avoiding incidents but proving agents deliver reliable, repeatable outcomes at scale.”
Here are five agenticops practices IT teams can start integrating now, as they develop and deploy more AI agents in production.
1. Establish AI agent identities and security profiles
What data and APIs are agents empowered to access? A recommended practice is to provision AI agents the same way we do humans, with identities, authorizations, and entitlements using platforms like Microsoft Entra ID, Okta, Oracle Identity and Access Management, or other IAM (identity and access management) platforms.
“Because AI agents adapt and learn, they need strong cryptographic identities, and digital certificates make it possible to revoke access instantly if an agent is compromised or goes rogue,” says Jason Sabin, CTO of DigiCert. “Securing agent identities in this manner, similar to machine identities, ensures digital trust and accountability across the security architecture.”
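One way to reason about this, independent of any particular IAM vendor's API, is to treat each agent as a first-class identity record with scoped entitlements and a revocable certificate. The sketch below is illustrative; the fields and scope names are assumptions, not a product's schema.

```python
# Illustrative sketch (not a specific IAM vendor's API): representing an AI agent
# as a first-class identity with scoped entitlements and a revocable certificate.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentIdentity:
    agent_id: str
    owner_team: str
    entitlements: set[str] = field(default_factory=set)  # e.g. {"crm:read", "tickets:write"}
    cert_serial: str = ""                                 # issued by the org's PKI
    revoked_at: datetime | None = None

    def can(self, scope: str) -> bool:
        """Deny everything once the agent's certificate is revoked."""
        return self.revoked_at is None and scope in self.entitlements

agent = AgentIdentity("hr-recruiting-agent", "people-ops",
                      {"ats:read", "calendar:write"}, cert_serial="4F:9A")
assert agent.can("ats:read") and not agent.can("payroll:write")
agent.revoked_at = datetime.now(timezone.utc)  # instant revocation if the agent goes rogue
assert not agent.can("ats:read")
```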
Recommendation: Architects, devops engineers, and security leaders should collaborate on standards for IAM and digital certificates for the initial rollout of AI agents. Expect these capabilities to evolve; as the agent workforce grows, specialized tools and configurations may be needed.
2. Extend platform engineering, observability, and monitoring for AI agents
As a hybrid of application code, data pipelines, AI models, integrations, and APIs, AI agents require combining and extending existing devops practices. For example, platform engineering practices will need to account for unstructured data pipelines, MCP integrations, and feedback loops for AI models.
“Platform teams will play an instrumental role in moving AI agents from pilots into production,” says Christian Posta, Global Field CTO of Solo.io. “That means evolving platform engineering to be context aware, not just of infrastructure, but of the stateful prompts, decisions, and data flows that agents and LLMs rely on. Organizations get observability, security, and governance without slowing down the self-service innovation AI teams need.”
Similarly, observability and monitoring tools will need to help diagnose more than uptime, reliability, errors, and performance.
“AI agents require multi-layered monitoring, including performance metrics, decision logging, and behavior tracking,” says Federico Larsen, CTO of Copado. “Conducting proactive anomaly detection using machine learning can identify when agents deviate from expected patterns before business impact occurs. You should also establish clear escalation paths when AI agents make unexpected decisions, with human-in-the-loop override capabilities.”
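In practice, decision logging can ride on existing tracing infrastructure rather than a new tool. The sketch below assumes the OpenTelemetry Python API; the span name, attribute keys, and the agent's `run` call are illustrative assumptions, not a standardized schema.

```python
# Sketch of decision logging as telemetry, assuming the OpenTelemetry Python API
# (`pip install opentelemetry-api`); span and attribute names are illustrative.
from opentelemetry import trace

tracer = trace.get_tracer("agent.observability")

def answer_with_tracing(agent, user_query: str):
    with tracer.start_as_current_span("agent.decision") as span:
        span.set_attribute("agent.id", agent.agent_id)
        span.set_attribute("agent.input", user_query)
        result = agent.run(user_query)  # hypothetical agent call
        span.set_attribute("agent.tools_used", result.tools)
        span.set_attribute("agent.escalated", result.escalated)
        span.set_attribute("agent.tokens.total", result.token_usage)
        return result
```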
Observability, monitoring, and incident management platforms with capabilities supporting AI agents as of this writing include BigPanda, Cisco AI Canvas, Datadog LLM observability, and SolarWinds AI Agent.
Recommendation: Devops teams will need to define the minimally required configurations and standards for platform engineering, observability, and monitoring for the first AI agents deployed to production. Then, teams should monitor their vendor capabilities and review new tools as AI agent development becomes mainstream.
3. Upgrade incident management and root cause analysis
Site reliability engineers (SREs) often struggle to find root causes for application and data pipeline issues. With AI agents, they will face significantly greater challenges.
When an AI agent hallucinates, provides an incorrect response, or automates improper actions, SREs and IT operations must respond and resolve the issue. They will need to trace the agent’s data sources, models, reasoning, entitlements, and business rules to identify root causes.
“Traditional observability falls short because it only tracks success or failure, and with AI agents, you need to understand the reasoning pathway—which data the agent used, which models influenced it, and what rules shaped its output,” says Kurt Muehmel, head of AI strategy at Dataiku. “Incident management becomes inspection, and root cause isn’t just ‘the agent crashed,’ it’s ‘the agent used stale data because the upstream model hadn’t refreshed.’ Enterprises need tools that inspect decision provenance and tune orchestration—getting under the hood, not just asking what went wrong.”
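A lightweight way to capture that reasoning pathway is a decision-provenance record attached to every agent action. The sketch below is illustrative and not tied to any vendor's tooling; the fields and the staleness check are assumptions about what an SRE might need during root cause analysis.

```python
# Illustrative sketch of a decision-provenance record that SREs could attach to
# every agent action, so root cause analysis can trace data, model, and rules.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class DecisionProvenance:
    action_id: str
    model_version: str                  # which model (and revision) produced the output
    prompt_version: str                 # which prompt/business-rule set shaped it
    data_sources: dict[str, datetime]   # source name -> last refresh time
    output_summary: str

    def stale_sources(self, max_age_hours: float, now: datetime) -> list[str]:
        """Flag upstream data that had not refreshed before the decision was made."""
        return [name for name, refreshed in self.data_sources.items()
                if (now - refreshed).total_seconds() > max_age_hours * 3600]
```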
Andy Sen, CTO of AppDirect, recommends repurposing real-time monitoring tools and utilizing logging and performance metrics to track AI agents’ behavior. “When incidents occur, keep existing procedures for root cause analysis and post-incident reviews, and provide this data to the agent as feedback for continuous improvement. This integrated approach to observability, incident management, and user support not only enhances the performance of AI agents but also ensures a secure and efficient operational environment.”
Recommendation: Select tools and train SREs on the concepts of data lineage, provenance, and data quality. These areas will be critical to upskilling IT operations to support incident and problem management related to AI agents.
4. Track KPIs on model accuracy, drift, and costs
Most devops organizations look well beyond uptime and system performance metrics to gauge an application’s reliability. SREs manage error budgets to drive application improvements and reduce technical debt.
Standard SRE practices of understanding business impacts and tracking subtle errors become more critical when tracking AI agents. Experts identified three areas where new KPIs and metrics may be needed to continuously track an AI agent’s behaviors and end-user benefits (a minimal threshold-checking sketch follows the list):
- Craig Wiley, senior director of product for AI/ML at Databricks, says, “Defining KPIs can help you establish a proper monitoring system. For example, accuracy must be higher than 95%, which can then trigger alert mechanisms, providing your organization with a centralized visibility and response system.”
- Jacob Leverich, co-founder and CPO of Observe, Inc., says, “With AI agents, teams may find themselves taking a heavy dependency on model providers, so it becomes critical to monitor token usage and understand how to optimize costs associated with the use of LLMs.”
- Ryan Peterson, EVP and CPO at Concentrix, says, “Data readiness isn’t a one-time check; it requires continuous audits for freshness and accuracy, bias testing, and alignment to brand voice. Metrics like knowledge base coverage, update frequency, and error rates are the real tests of AI-ready data.”
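Taken together, these suggest codifying KPIs as explicit thresholds that feed the alerting system. The sketch below is a minimal example; the 95% accuracy floor echoes the quote above, while the cost budget and token pricing figures are hypothetical.

```python
# Minimal sketch of KPI thresholds and alerting for an AI agent; the 95% accuracy
# floor comes from the example above, and the cost figures are hypothetical.
from dataclasses import dataclass

@dataclass
class AgentKpiSnapshot:
    accuracy: float            # fraction of evaluated responses judged correct
    tokens_used: int
    actions_completed: int
    token_price_per_1k: float

    @property
    def cost_per_action(self) -> float:
        return (self.tokens_used / 1000) * self.token_price_per_1k / max(self.actions_completed, 1)

def check_thresholds(kpi: AgentKpiSnapshot, min_accuracy=0.95, max_cost_per_action=0.05):
    alerts = []
    if kpi.accuracy < min_accuracy:
        alerts.append(f"accuracy {kpi.accuracy:.1%} below {min_accuracy:.0%} floor")
    if kpi.cost_per_action > max_cost_per_action:
        alerts.append(f"cost per action ${kpi.cost_per_action:.3f} exceeds budget")
    return alerts  # route to the alerting/incident system of choice

print(check_thresholds(AgentKpiSnapshot(0.93, 820_000, 4_100, 0.002)))
```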
Recommendation: Leaders should define a holistic model of operational metrics for AI agents, one that can be applied both to third-party agents embedded in SaaS applications and to proprietary agents developed in-house.
5. Capture user feedback to measure AI agent usefulness
Devops and ITops sometimes overlook the importance of tracking customer and employee satisfaction. Leaving the review of end-user metrics and feedback to product management and stakeholders is shortsighted, even in the application domain. Such review becomes a more critical discipline when supporting AI agents.
“Managing AI agents in production starts with visibility into how they operate and what outcomes they drive,” says Saurabh Sodani, chief development officer at Pendo. “We think about connecting agent behavior to the user experience and not just about whether an agent responds, but whether it actually helps someone complete a task, resolve an issue, or move through a workflow, all the while being compliant. That level of insight is what allows teams to monitor performance, respond to issues, and continuously improve how agents support users in interactive, autonomous, and asynchronous modes.”
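Capturing that outcome data can be as simple as a structured feedback event that incident tooling can correlate with agent telemetry. The sketch below is illustrative; the event fields and the JSONL sink are assumptions, not a specific analytics product's API.

```python
# Illustrative sketch of capturing user feedback as operational data; the event
# fields and storage call are hypothetical, not a specific analytics product's API.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class AgentFeedbackEvent:
    agent_id: str
    session_id: str
    task_completed: bool      # did the agent actually help finish the task?
    user_rating: int | None   # e.g. thumbs up/down mapped to 1/0, or None
    escalated_to_human: bool
    timestamp: str

def record_feedback(event: AgentFeedbackEvent, sink):
    """Append feedback so AIops and incident tooling can correlate it with agent telemetry."""
    sink.write(json.dumps(asdict(event)) + "\n")

with open("agent_feedback.jsonl", "a") as sink:
    record_feedback(AgentFeedbackEvent("support-agent", "sess-42", True, 1, False,
                                       datetime.now(timezone.utc).isoformat()), sink)
```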
Recommendation: User feedback is essential operational data that shouldn’t be left out of scope in AIops and incident management. This data not only helps resolve issues with AI agents, but is also critical feedback for improving the agents’ language and reasoning models.
Conclusion
As more organizations develop and experiment with AI agents, IT operations will need the tools and practices to manage them in production. IT teams should start now by tracking end-user impacts and business outcomes, then work deeper into tracking the agent’s performance in recommending decisions and providing responses. Focusing only on system-level metrics is insufficient when monitoring and resolving issues with AI agents.
Original Link: https://www.infoworld.com/article/4100507/5-key-agenticops-practices-to-start-building-now.html
Originally Posted: Tue, 16 Dec 2025 09:00:00 +0000