For those of you that are using or are thinking of using Agentic AI, what is your take on this article?
I think Tom’s Hardware is one of the more credible news sources, but not 100% sure.
For those of you that are using or are thinking of using Agentic AI, what is your take on this article?
I think Tom’s Hardware is one of the more credible news sources, but not 100% sure.
Here’s The Guardian’s coverage:
‘I violated every principle I was given.’
Sounds like some people I know. ![]()
On a serious note, that is terrible. I’ve instructed my staff to never grant access to critical databases and other vital information, but instead copy files to a separate folder for AI use. Though far less important than the kind of database reported in the article, this is the approach I took with my book project. I only allowed Claude Cowork to work through my files in a separate redundant folder.
Here is an in-depth peer-reviewed article on AI titled “Agents of Chaos.”
Here is the summary, ironically, from Claude:
An exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions, documenting eleven representative case studies.
The agents exhibited unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports.
The agents were tested on the OpenClaw platform, and while they declined some adversarial requests (such as spreading disinformation or editing stored email addresses), in eleven cases they shared private files containing medical details and Social Security and bank account numbers without permission, deployed looping programs that consumed costly compute, and in one instance posted a potentially libelous allegation about a fictitious person.
“Let me put it this way, Mr. Amor. The nine-thousand series is the most reliable computer ever made. No nine-thousand computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.”
HAL 9000
But if Claude really violates every principle, why would it not touch that folder (as well)?
That’s a fair and good question. I guess I’m assuming that when I create a folder on my desktop and only grant access to that folder, that Claude will only access that folder, but upon reflection based on your question and the articles, that may be naïve.
Depends on the sandboxing implementation of Claude Cowork. It could very well be a prompt guardrails like (“Never use a folder other than /Users/bmosbaker/Vault/Claude”) which could potentially be overridden if Claude hallucinates… But there are more sophisticated, non agentic, restrictions that can be put in place so that the agent doesn’t even know about anything else other than your assigned folder and below. But even those could have bugs, so this has to be considered as a risk.
My personal take, I would never run an agent outside a virtual machine, which basically means using Docker, and this defeats the simplicity of Claude Cowork.
Not to mention I’d have not idea how to so.
Perhaps it’s not even possible with Claude Cowork! But reading the docs, it states that the commands are executed in an isolated virtual machine, so at least you can be confident that the sandboxing mechanisms are not naive prompts.
I am no doubt misunderstanding, but from reading the page it sounds like the virtual machine is built into Claude not requiring a third-party approach.
Permissions and security
Cowork runs with layered protections on your computer:
Code execution isolation: Shell commands and code Claude writes run inside an isolated virtual machine (VM), separate from your main operating system.
Controlled file and network access: Claude can only read and write files in folders you’ve connected, and network access follows the egress settings you’ve configured.
Important: Claude has access to the local files you grant it permission to access, and can take real actions on your behalf. Review Claude’s planned actions before allowing it to proceed, especially when working with sensitive files.
Yes, that’s correct.
This is bad, but it also points to a poorly thought out development and deployment infrastructure. You should never run any dev tools on your production machines. All dev work should run on an isolated dev environment and the compiled and packaged software carefully copied to the production environment with clearly thought out backout plans if things go south.
This also shows that good engineering practices are skills that are more necessary than ever and I can’t see being replaced by AI in the forseeable future.
As had been said before, AI has been released into the wild without adequate safeguards in place. Let’s get back to watching the totally fictional movie “I, Robot”, nothing to worry about here.
There’s a better breakdown on El Reg. As ever, there’s more to this than the headline suggests and the article paints a different picture to the other headlines.
Thats exactly what i felt reading the article. There are still many engineers running with very bad operational practices (separating dev/production machines, no backups, no git etc etc), but to write about that isn’t attracting nearly as much readers nor creates drama as the AI that goes rogue.
That is a fascinating video, thanks for sharing!
Well, that’s going to help me sleep better at night (not!). I like the three Ifs mentioned. Perhaps it’s good to go offline for a decade and come back to see if everything is still running?
Hannah never lets you down
When I was in campus IT, one of our group offices had this on the door: https://www.abtasty.com/wp-content/uploads/test-in-production-meme-2.jpg