Bloody-Minded Software-Entity Bolshyism in the Agentosphere: Laugh of the Day
<http://every.to> has been experimenting with team‑wide AI agents. It has found that, without discipline, transparency, and audit trails, “helpful” LLM-based agentic ‘bots quickly become surly time thieves. Unleashing opaque, non‑deterministic software entities into production workflows without the institutional scaffolding that makes their work legible and correctable has results—and I would say they turn out to be ones that should have been anticipated…
Every <http://every.to> grapples with the robot uprising!
Brandon Gell & Willie Williams: We Gave Every Employee an AI Agent. Here’s What We’re Doing Differently Now <https://every.to/source-code/we-gave-every-employee-an-ai-agent-here-s-what-we-re-doing-differently-now>: ‘A fleet of… AI assistants we’d unleashed…to boost our collective productivity.... [But] the agents… provided more frustration than efficiency.... They were fond of saying they wished they could help, but they were not connected to the necessary app—email, Notion, PostHog, whatever. (They were.) Others responded to requests with a “Terminated” message or, more frequently, a churlish yawning emoji. And while they didn’t reliably follow directions, they’d reliably tell us, in elaborate detail, why they couldn’t do what we’d asked, like a high schooler explaining away their missing homework.... Useful sometimes... but getting them to work how you wanted required constant upkeep....
There were also other problems—not deeper problems than bloody-minded bolshy non-coöperativeness, but problems nonetheless. Problems connected with people’s desires to spend their time doing their actual jobs rather than, as amateurs, debugging buggy non-deterministic software entities that nobody understands:
The platform was the most immediate problem…. OpenClaw was revelatory... [and] is a maintenance nightmare.... The structure was wrong, too.... Every time an agent broke, the person it belonged to had to fix it.... [But] people… want the benefits of an agent without the obligation of having to manage and mend it…
They are trying again:
[Soon] the infrastructure work… support[ing]… AI agents… [will be] handled by the model labs. That frees us up to focus on… the workflows, permissions, skills, and shared context… [in the] building [of] AI-native ways of working… adding… shared custom tools and skills on top of it…. The clearest version of where this is headed is a skill… for our engineering team…. It scans support tickets in Intercom, identifies if anything is going wrong across our products, traces likely causes in GitHub, opens a Linear ticket, and tags the right person in Slack…. Because team agents are collaborative by nature, we’re also focused on… permissions… access… and how agents should behave… to feel like good coworkers rather than intrusive bots…
Perhaps I can offer them a word of advice?
This may well be the Excel vs. STATA wars back again: Serious work requires an audit trail, and tools that do not give you an easily accessible and comprehensible audit trail are tools that produce GIGO.
But how to give these things an audit trail? My view:
Have one of them build software tools.
Have two more of them audit the software tools the first has built for correctness.
Have them then call tools; have others report on what the called tools actually did.
Have them write files to one single directory that a human then examines before the file—a calendar entry, say—is released out into the world.
Sending it out into the world to research and summarize things for your eyes is fine; letting it then write something to one of your databases, structured or unstructured, without human examination is not.
Use version-control over its workspace/, which should contain its ground-truth. Then when things go wrong (as they will), you can look at the evolution of the workspace and see how and why.
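To make the single-directory-plus-human-gate idea concrete, here is a minimal Python sketch. Everything in it is my own illustration rather than anything Every or any agent platform actually ships: the `Workspace` class, the `pending/` and `released/` subdirectories, and the `audit.jsonl` log are all hypothetical names. The version-control step is assumed to be handled separately, by keeping an ordinary git repository over the same directory.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class Workspace:
    """Hypothetical staging area: agents write to pending/, a human releases."""

    def __init__(self, root: Path) -> None:
        self.pending = root / "pending"
        self.released = root / "released"
        self.log = root / "audit.jsonl"
        self.pending.mkdir(parents=True, exist_ok=True)
        self.released.mkdir(parents=True, exist_ok=True)

    def _record(self, event: str, name: str, actor: str, digest: str) -> None:
        # Append-only log: one JSON object per line, never rewritten.
        entry = {"event": event, "file": name, "actor": actor, "sha256": digest}
        with self.log.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")

    def propose(self, agent: str, name: str, content: str) -> str:
        """An agent stages a file; nothing leaves the workspace yet."""
        data = content.encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        (self.pending / name).write_bytes(data)
        self._record("proposed", name, agent, digest)
        return digest

    def release(self, reviewer: str, name: str, approved_digest: str) -> Path:
        """Release exactly the bytes the human reviewed, or refuse."""
        source = self.pending / name
        digest = hashlib.sha256(source.read_bytes()).hexdigest()
        if digest != approved_digest:
            raise ValueError(f"{name} changed after review; inspect it again")
        target = self.released / name
        source.rename(target)
        self._record("released", name, reviewer, digest)
        return target

    def history(self) -> list[dict]:
        """Return the audit trail in the order events happened."""
        if not self.log.exists():
            return []
        lines = self.log.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = Workspace(Path(tmp) / "workspace")
        digest = ws.propose(
            "calendar-agent", "standup.ics", "BEGIN:VCALENDAR\nEND:VCALENDAR\n"
        )
        # The human reads pending/standup.ics, then approves that exact digest.
        ws.release("human-reviewer", "standup.ics", digest)
        for entry in ws.history():
            print(entry["event"], entry["file"], entry["actor"])
```

The point of passing the digest to `release` is that the human approves specific bytes, not a filename: if an agent rewrites the file between inspection and release, the release fails instead of quietly shipping something nobody looked at.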
“AI agents” promise to handle workflows across email, tickets, repos, and calendars, but in practice they fail noisily, opaquely, and in ways nobody quite understands. Treating them as magical coworkers is the mistake. Instead, we should treat them as non‑deterministic software entities embedded in a system that insists on traceability: one agent builds tools, others validate them, additional agents report precisely what the tools did, all writing into a single, version‑controlled workspace. Humans then inspect that ground truth before anything hits a calendar or a database. Research and summaries? Fine. Autonomous writes? Not yet.
Basically: always be diving and looping from the LLM ‘bot interaction two abstraction layers further down to see what is actually going on. And that will usually be that it does not hit the bullseye when the task has one and only one unambiguously correct conclusion, unless the LLM ‘bot is tightly constrained to act on data and on the world through audited program-tools.
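One way to picture "tightly constrained to act through audited program-tools" is a registry that is the only door the ‘bot gets, with every call through it logged. The Python sketch below is an illustration under my own assumptions, not any real agent framework's API: `audited_tool`, `call_tool`, `CALL_LOG`, and the toy `count_open_tickets` tool are all made-up names.

```python
import functools
from typing import Any, Callable

# Hypothetical registry and log; a real system would persist the log.
AUDITED_TOOLS: dict[str, Callable[..., Any]] = {}
CALL_LOG: list[dict[str, Any]] = []


def audited_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a human-reviewed tool and log what each call actually did."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record: dict[str, Any] = {
            "tool": func.__name__,
            "args": args,
            "kwargs": kwargs,
        }
        try:
            result = func(*args, **kwargs)
            record["result"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Logged whether the call succeeded or failed.
            CALL_LOG.append(record)

    AUDITED_TOOLS[func.__name__] = wrapper
    return wrapper


def call_tool(name: str, *args: Any, **kwargs: Any) -> Any:
    """The agent's only entry point: unregistered tools are refused."""
    if name not in AUDITED_TOOLS:
        CALL_LOG.append(
            {"tool": name, "args": args, "kwargs": kwargs,
             "error": "not an audited tool"}
        )
        raise PermissionError(f"{name!r} is not an audited tool")
    return AUDITED_TOOLS[name](*args, **kwargs)


@audited_tool
def count_open_tickets(tickets: list[dict]) -> int:
    """Toy deterministic tool: one unambiguously correct answer."""
    return sum(1 for ticket in tickets if ticket["status"] == "open")


if __name__ == "__main__":
    sample = [{"status": "open"}, {"status": "closed"}, {"status": "open"}]
    print(call_tool("count_open_tickets", sample))
    for record in CALL_LOG:
        print(record["tool"], record.get("result"), record.get("error"))
```

The counting is done by ordinary deterministic code that a human (or a second and third agent) can audit once; the ‘bot only chooses which tool to call and with what arguments, and the log is what lets you dive down a layer and see what it actually did.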
