AI doesn’t think like a human. Stop talking to it as if it does
Autonomous agents take the first part of their names very seriously and don’t necessarily do what their humans tell them to do — or not to do.
But the situation is more complicated than that. Generative AI (genAI) and agentic systems operate quite differently from other systems — including older AI systems — and from humans. That means how tech users and decision-makers phrase instructions, and where those instructions are placed, can make a major difference in outcomes.
AI systems have already developed quite a history of disregarding instructions and overriding guardrails. (I’ll spare you, for now, my usual admonition that the lack of trustworthiness of today’s genAI and agentic systems is a dealbreaker that means they simply should not be used.)
But this month saw two powerful examples of hyperscalers — AWS and Meta — getting burned by how they communicated with these complicated AI systems.
The first involved a December incident affecting AWS, where an engineer didn’t know his own privileges and therefore didn’t know — literally — what his agentic system was capable of doing. The agent deleted and then recreated a key AWS environment.
AWS declined to say just what the system had asked and what the engineer said when approving the request.
The Meta mess
The Meta case is even more frightening because the perpetrator/victim was not some nameless AWS engineer, but the director of AI Safety and Alignment at Meta Superintelligence Labs, Summer Yue.
As Yue described the incident in a posting on X, “Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to run to my Mac mini like I was defusing a bomb.”
Yue may have only begun working for Meta last July, but she had held senior AI roles for years, including a stint as VP of research at Scale AI and five years in senior research positions at Google. She was no novice.
When someone in the discussion group asked how it happened, her posted reply said: “Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different.”
Yue said she had instructed the system to “check this inbox and suggest what you would archive or delete. Don’t take action until I tell you to.” She added that “this has been working well for my toy inbox, but my real inbox was huge and triggered compaction. During the compaction, it lost my original instruction.”
As various readers in that forum noted, Yue tried begging the agent to stop deleting her emails (she told the system “Stop don’t do anything”) rather than giving a machine-friendly order such as /stop or /kill. She was only able to halt the system once she got to her desktop computer; her attempts from her phone didn’t work.
One commenter suggested the problem was relying on a prompt, which agents do not always follow, especially when there is a long list of prompts. “The real fix is architectural. Write critical instructions to files the agent re-reads every cycle, not inline instructions that vanish when the context window fills up.”
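The commenter's file-based approach can be sketched in a few lines. This is a minimal illustration, not any vendor's actual agent framework; the file name and rule format are hypothetical.

```python
# Minimal sketch (hypothetical names) of the commenter's suggestion:
# keep critical rules in a file the agent re-reads on every cycle, so
# they survive context compaction, which can silently drop instructions
# that live only in the conversation history.

from pathlib import Path

RULES_FILE = Path("agent_rules.txt")  # hypothetical location

def load_rules() -> list[str]:
    """Re-read the guardrail rules from disk; never rely on chat context."""
    return [line.strip() for line in RULES_FILE.read_text().splitlines() if line.strip()]

def build_prompt(task: str) -> str:
    """Prepend the persistent rules to every cycle's prompt."""
    rules = load_rules()
    return "\n".join(rules + [f"Task: {task}"])

# One agent "cycle": the rules are reloaded each time, so even if the
# conversation history is compacted away, the constraints are re-asserted.
RULES_FILE.write_text("Do not delete anything.\nConfirm before acting.\n")
prompt = build_prompt("Triage the inbox")
```

The point is that the rules live outside the model's context window entirely, so no amount of compaction can erase them — though, as noted later in this piece, the model can still read a rule and ignore it.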
Lessons learned?
There are many lessons to unpack from the monstrous Meta mishap. First, don’t rush to extrapolate from what an agent does with a small test area or even a sandboxed trial performed with air-gapped machines. Once it’s released into the wild of a global environment, lessons learned from limited exposure might not apply. Tests show what an agent can do, not necessarily what it will do when unleashed.
Even ordinary communications with an agent can be problematic. When an agent asks for permission to perform a function, avoid assuming any common sense or shared understanding of reasonableness.
In the AWS situation, the company said the engineer’s first mistake was not understanding his own system privileges — and therefore what capabilities and access he’d given to the agent. That suggests a good procedure: create an account with minimal access, and log into that low-privilege account when creating the agent.
That won’t guarantee that the agent will obey its instructions, but at least it will limit how much damage it can do if/when it goes rogue.
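The same least-privilege idea can also be enforced in code, not just at the account level. Here is a hedged sketch (all names hypothetical) in which every agent-requested action passes through a gate that rejects anything outside an explicit allowlist, so a rogue request fails closed rather than executing.

```python
# Hypothetical sketch: gate every agent action through an explicit
# allowlist, so anything not granted up front fails closed.

ALLOWED_ACTIONS = {"read_email", "suggest_archive"}  # deliberately minimal

class PermissionDenied(Exception):
    """Raised when the agent requests an action it was never granted."""

def execute(action: str, payload: str) -> str:
    """Run an agent-requested action only if it is on the allowlist."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionDenied(f"agent may not perform: {action}")
    return f"{action} ok: {payload}"

result = execute("read_email", "msg-1")   # permitted
try:
    execute("delete_email", "msg-1")      # never granted, fails closed
    blocked = False
except PermissionDenied:
    blocked = True
```

Unlike a prompt, the allowlist is a hard boundary: the model cannot talk its way past an exception.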
I asked Claude — who better to know how to talk with a large language model (LLM) than an LLM? — for tips on talking with agents. “Rather than implying constraints, state them directly. Instead of ‘keep it appropriate,’ say, ‘Do not include any violence, profanity, or adult content.’ The more precise the boundary, the easier it is to follow consistently.”
Even better, Claude suggested telling an LLM “both what to do and what not to do. For example: ‘Write only about the topic I provide. Do not go off-topic, add unsolicited advice, or mention competing products.’”
Claude also acknowledged its own systems can forget instructions. “For long conversations or complex system prompts, restating the most important guardrails near the end or in a summary helps them stay active in Claude’s attention.” In other words, treat LLMs as if you’re talking with a 2-year-old.
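That restating tip can be mechanized rather than left to memory. The sketch below is illustrative only — the message format loosely mirrors common chat-API conventions, and every name in it is hypothetical — but it shows the idea: re-append the critical guardrails at the end of the conversation before every model call.

```python
# Hypothetical sketch of the "restate guardrails near the end" tip:
# before each model call, append the critical rules as the final message,
# where they are least likely to be lost in a long conversation.

GUARDRAILS = "Do not take action until explicitly told to."

def with_guardrails(messages: list[dict]) -> list[dict]:
    """Return a copy of the conversation with the guardrails restated last."""
    return messages + [{"role": "system", "content": GUARDRAILS}]

history = [
    {"role": "user", "content": "Check this inbox."},
    {"role": "assistant", "content": "Here is what I would archive..."},
]
to_send = with_guardrails(history)  # guardrails now sit at the very end
```

Because the wrapper builds a fresh copy each call, the guardrails stay at the end no matter how long the history grows.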
The real world is different
Part of the problem involves the nature of autonomous agents. Enterprises are not used to them, and they assume agents are safely cocooned inside walled-off sandboxes during proof-of-concept (POC) testing — just like 99% of the trials they’ve run for decades.
But agentic AI doesn’t work that way. For those agents to deliver the massive efficiencies and flexibility that hyperscaler salespeople promise, they need to be dispatched into the wild, touching lots of live systems and interacting with other agents.
That forces an impossible choice: keeping the agents secure means they can’t deliver the purported benefits. A wise executive would say, “So be it. The risk of letting these agents loose is way too high. Cancel all genAI and agentic POCs.”
But wise executives also like to keep their jobs, which usually means efficiency and cost cutting will beat security and risk every single time.
Joshua Woodruff, CEO of MassiveScale.AI, said the Meta situation offers a good peek into the IT mindset for many agentic trials.
“That’s how most people think about AI safety right now,” he said. “They write an instruction and assume it’s a control. It’s not. It’s a suggestion the model can forget when things get busy. Look at what the agent actually did from a security perspective. It performed well on low-value tasks. It earned trust. It got promoted to access sensitive data. Then it caused damage. That’s the exact behavioral pattern every security team is trained to watch for in humans.
“You have to use those architectural constraints and put the instructions in one of the memory artifacts. That way, it can’t compact it and the rule will have a better chance of surviving. Just remember that the agent can still read the rule and ignore it. Think of it as a policy manual, not a locked door.”
One ongoing issue is that there is a rash of human terms being used to describe these systems — they “think” and use a “reasoning model” — even though users should know that none of these systems do any actual thinking or reasoning, Woodruff said. “It’s just math.”
But that anthropomorphization is dangerous; it allows people to treat and interact with these systems as if they’re human. The next thing you know, an experienced manager at Meta is shouting at her system to please stop.
Treating an autonomous agent as if it’s a person gives a whole new meaning to someone “acting very Meta.”
Original Link:https://www.computerworld.com/article/4138071/ai-doesnt-think-like-a-human-stop-talking-to-it-as-if-it-does.html
Originally Posted: Fri, 27 Feb 2026 07:00:00 +0000