AI agents and IT ops: Cowboy chaos rides again

News | January 22, 2026 | Artifice Prime

In traditional IT ops culture, sysadmin “cowboys” would SSH into production boxes, wrangle systems with a series of ad hoc, unrepeatable changes, and then ride off into the sunset. Enterprises have spent more than a decade recovering from cowboy chaos with tools such as configuration management, immutable infrastructure, CI/CD, and strict access controls. But now the cowboy has ridden back into town—in the form of agentic AI.

Agentic AI promises sysadmins fewer manual tickets and on-call fires to fight. Indeed, it’s nice to think that you can hand the reins to a large language model (LLM), prompting it to, say, log into a server to fix a broken app at 3 a.m. or update an aging stack while the humans are at lunch. The problem is that an LLM is, by definition, non-deterministic: Given the exact same prompt at different times, it can produce a different set of packages, configs, and deployment steps to perform the same task—even if a particular day’s run worked fine. That would hurtle enterprises back to the proverbial O.K. Corral, which is decidedly not OK.

I know first-hand that burning tokens is addictive. This weekend, while troubleshooting a problem on one of my servers, I’ll admit I got weak, installed Claude Code, and used it to help me debug some systemd timer problems. I also used it to troubleshoot a container issue and to validate an application with Google. It’s easy to become reliant on these tools for problems on our systems, but we have to be careful how far we take it.

Even in these relatively early days of agentic AI, sysadmins know it’s not a best practice to set an LLM loose on production systems without guardrails. But it can happen. Organizations get short-handed, people get pressured to move faster, and desperation sets in. Once you become reliant on an AI assistant, it’s very difficult to let go.

Rely on agentic AI for non-deterministic tasks

What sysadmins need to start thinking about is how to balance their use of AI between deterministic and non-deterministic tasks—using AI for the non-deterministic work, then forcing everything important back into the deterministic world.

Non-deterministic work is the exploratory, ambiguous, “figure it out” side of engineering—searching the Internet, reconciling docs, experimenting with different config patterns, sketching out playbooks or Dockerfiles. Deterministic work is what actually runs your business safely and predictably at scale—the scripts, container images, and pipelines that behave the same way every time across tens, hundreds, or thousands of systems.

Retrieval-augmented generation (RAG), agent frameworks, and tool‑calling models all exist to reconnect a drifting, probabilistic model to grounded, deterministic data and systems. Whether the model is hitting a vector database, an API, a ticketing system, or a calculator, the protocol should be the same: Let the LLM reason in a fuzzy space, then anchor its output in something that behaves predictably when executed. Enterprises that blur that boundary—by letting the probabilistic part touch production directly—are inviting cowboy-level chaos.
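As a concrete sketch of that boundary, imagine gating everything the model proposes through a registry of vetted, repeatable actions. This is illustrative only: ask_llm is a hypothetical stand-in for whatever model client you use, and the action names and commands are invented for the example.

```python
import subprocess

def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to whatever LLM client you use."""
    raise NotImplementedError("wire up your model client here")

# The deterministic side: every action is a fixed, reviewed, repeatable command.
ALLOWED_ACTIONS = {
    "restart_app": ["systemctl", "restart", "myapp.service"],
    "rebuild_image": ["podman", "build", "-t", "myapp:latest", "."],
}

def handle_incident(description: str) -> None:
    # The model reasons in fuzzy space: it can only *name* an action.
    proposal = ask_llm(
        f"Given this incident, answer with exactly one action name from "
        f"{sorted(ALLOWED_ACTIONS)}: {description}"
    ).strip()
    # The anchor: anything outside the registry is rejected, and what runs
    # is the same command every time, however the model phrased its answer.
    if proposal not in ALLOWED_ACTIONS:
        raise ValueError(f"Model proposed an unvetted action: {proposal!r}")
    subprocess.run(ALLOWED_ACTIONS[proposal], check=True)
```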

What to build (and not to build) with agentic AI

The right pattern is not “AI builds the environment,” but “AI helps design and codify the artifact that builds the environment.” For infrastructure and platforms, that artifact might be a configuration management playbook that can install and harden a complex, multi‑tier application across different footprints, or it might be a Dockerfile, Containerfile, or image blueprint that can be committed to Git, reviewed, tested, versioned, and perfectly reconstructed weeks or months later.
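A minimal sketch of that workflow in Python: the model’s only output is a file on a review branch, never a live change. Here draft_artifact is a hypothetical stand-in for a model call, and the branch and file names are invented for the example.

```python
import subprocess
from pathlib import Path

def draft_artifact(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that drafts a Dockerfile."""
    raise NotImplementedError("wire up your model client here")

def propose_dockerfile(requirements: str, repo: Path) -> None:
    # The model drafts the artifact; it never builds or deploys anything.
    content = draft_artifact(f"Write a Dockerfile that: {requirements}")
    (repo / "Dockerfile").write_text(content)
    # The draft lands on a branch so humans can review, test, and version it.
    git = ["git", "-C", str(repo)]
    subprocess.run(git + ["checkout", "-b", "ai/dockerfile-draft"], check=True)
    subprocess.run(git + ["add", "Dockerfile"], check=True)
    subprocess.run(
        git + ["commit", "-m", "Draft Dockerfile (AI-proposed, needs review)"],
        check=True,
    )
```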

What you don’t want is an LLM building servers or containers directly, with no intermediate, reviewable definition. A container image born from a chat prompt and later promoted into production is a time bomb: when it is time to patch or migrate, there is no deterministic recipe to rebuild it. The same is true for upgrades. Using an agent to improvise an in-place migration on a one-off box might feel heroic in the moment, but it guarantees that the system will drift away from everything else in your environment.

The outcomes of installs and upgrades can be different each time, even with the exact same model, but it gets a lot worse if you upgrade or switch models. If you’re supporting infrastructure for five, 10, or 20 years, you will be upgrading models. It’s hard to even imagine what the world of generative AI will look like in 10 years, but I’m sure Gemini 3 and Claude Opus 4.5 will not be around then.

The dangers of AI agents increase with complexity

Enterprise “applications” are no longer single servers. Today they are constellations of systems—web front ends, application tiers, databases, caches, message brokers, and more—often deployed in multiple copies across multiple deployment models. Even with only a handful of service types and three basic footprints (packages on a traditional server, image‑based hosts, and containers), the combinations expand into dozens of permutations before anyone has written a line of business logic. That complexity makes it even more tempting to ask an agent to “just handle it”—and even more dangerous when it does.

In cloud-native shops, Kubernetes only amplifies this pattern. A “simple” application might span multiple namespaces, deployments, stateful sets, ingress controllers, operators, and external managed services, all stitched together through YAML and Custom Resource Definitions (CRDs). The only sane way to run that at scale is to treat the cluster as a declarative system: GitOps, immutable images, and YAML stored outside the cluster under version control. In that world, the job of an agentic AI is not to hot-patch running pods or the Kubernetes YAML behind them; it is to help humans design and test the manifests, Helm charts, and pipelines that are saved in Git.
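One way to keep an agent’s Kubernetes output inside that loop is to validate its drafts deterministically before a human merges them. A minimal sketch, assuming kubectl is configured against a non-production cluster; the manifest path is illustrative:

```python
import subprocess

def validate_manifest(path: str) -> bool:
    """Server-side dry run: the manifest is validated, nothing is applied."""
    result = subprocess.run(
        ["kubectl", "apply", "--dry-run=server", "-f", path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"Rejected AI-drafted manifest:\n{result.stderr}")
        return False
    return True

# A CI job might gate the merge on this check, for example:
# if not validate_manifest("manifests/myapp-deployment.yaml"):
#     sys.exit(1)
```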

Modern practices like rebuilding servers instead of patching them in place, using golden images, and enforcing Git-driven workflows have left some organizations very well prepared for agentic AI. Those teams can safely let models propose changes to playbooks, image definitions, or pipelines because the blast radius is constrained and every change is mediated by deterministic automation. The organizations at risk are the ones that tolerate special-case snowflake systems and one-off dev boxes that no one quite knows how to rebuild. The environments that still allow senior sysadmins and developers to SSH into servers are exactly the environments where “just let the agent try” will be most tempting—and most catastrophic.

The quiet infrastructure advantage

The organizations that will survive the agent hype cycle are the ones that already live in a deterministic world. Their operating model assumes you do not poke at production by hand; you declare the desired state, you automate the path to get there, and you repeat it across thousands of systems without drama. In that kind of environment, AI shows up at build time and design time, not as an ungoverned runtime actor improvising in front of customers.

The real prize is not another shiny “AI agent for your infrastructure” banner. It is an opinionated stack that refuses to let AI touch production except through artifacts that can be tested, audited, and rerun on demand. That stack quietly protects enterprises from their own worst impulses: from the desperate developer at a startup who is tempted to give an LLM shell access, to the overworked sysadmin staring down a terrifying upgrade window on a legacy box. In that world, AI does what it does best—explore, analyze, propose—while the underlying platform does what it must: keep the cowboys, human or machine, out of production!

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

Original Link: https://www.infoworld.com/article/4119285/ai-agents-and-it-ops-cowboy-chaos-rides-again.html
Originally Posted: Thu, 22 Jan 2026 09:00:00 +0000


Artifice Prime

Artifice Prime is an AI enthusiast with over 25 years of experience as a Linux sysadmin. They have an interest in artificial intelligence, its use as a tool to further humankind, and its impact on society.
