Wetware-Hardware Centaurs, Not Digital Gods: Wednesday MAMLMs

Faster GPUs won’t conjure a world model out of thin air; we’re scaling mimicry, not understanding. That is my guess as to why the MAMLM frontier is spiky, with breathtaking benchmarks, baffling failures, real consequences…



Models ace PhD quizzes, then misclick a checkout button and hallucinate a blazer and tie. If these systems are about to become “better than humans at ~everything,” why can’t they keep time on a roasting turkey?

Without being built from the studs up around a world model—durable representations of time, causality, and goals—frontier MAMLM systems are high‑speed supersummarizers of the human record and hypercompetent counters at scale, yet brittle in embodied contexts, interfaces, and long‑horizon tasks. Calling this unevenness a “jagged frontier” misidentifies it, except to the extent that the label leads to permanent acceptance of centaur workflows in which humans supply judgment and guardrails.

Or so I guess.



I really do not grok things like this.

It seems obvious to me that “AI” as currently constituted—Modern Advanced Machine-Learning Models, MAMLMs, relying on scaling laws and bitter lessons—will not be “better than humans at ~everything” just with faster chips and properly-tuned GPUs and software stacks. Without world models, next‑token engines merely (merely!) draw on and summarize the real ASI, the Human Collective Mind Anthology Super‑Intelligence, and excel only where answers are clear or where counting at truly massive scale suffices: fast mimics—useful, but narrow.

That seems obvious to me. But not to Helen Toner.

And so Helen Toner stands across a vast gulf, among the serried ranks of AI-optimist singularitarians who believe that we are building our cognitive betters, and that we face the “peak horse” problem that the steam engine, the internal-combustion engine, and the electric motor brought to the equine. And yet she is not on Team Artificial Super-Intelligence by 2030, not at all:

Helen Toner: Taking Jaggedness Seriously <https://helentoner.substack.com/p/taking-jaggedness-seriously>: ‘Why we should expect AI capabilities to keep being extremely uneven, and why that matters…. Two things are true…. AI models keep getting better and better… [and] AI models keep sucking, and the things they keep sucking at are kind of confusing…. Who here saw Project Vend from Anthropic? This was so good. For anyone who isn’t familiar, basically a little, not really a vending machine, it’s a little fridge with a point of sale on top…. the fact that you can run this and it actually had a store and it sold things and it priced things… seems like… I’m not going to say any given acronym.

But also, it kept sucking at a confusing combination of things. It was offered $100 for a set of drinks that cost $15; it said it would consider it, but didn’t really want it. It was receiving payments via Venmo, but it just made up what its Venmo address was sometimes and so couldn’t actually receive the money. Anthropic employees started asking it for tungsten cubes and then it decided that it was going to sell metal objects but didn’t look up how much they cost, so it was selling at a loss. And then it went into total meltdown at one point and was having a pretty existential crisis, and told one Anthropic user that it would be waiting for the Anthropic employee by the store, wearing a navy blue blazer and a red tie.

So, that’s confusing. If you’re not confused by this set of data points, then I claim you should be…


And yet I do not see this as confusing at all.

What you have here is a function: prompts → next-word-continuations. (OK, next token.) This function answers the question: “What did the Typical Internet S***poster think was the next word in a stream-of-text situation like this one?” And as the prompt evolves recursively, the TIS it is copying from changes. And so all of a sudden we have not the same TIS who was manning the cash-register, but a different TIS standing at the vending machine wearing a navy-blue blazer and a red tie, waiting for a date.
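The mechanism is easy to sketch. Here is a toy illustration in Python—not any real model, just a cartoon of the argument above: a next-token function that conditions only on the most recent stretch of the prompt, so that as generation recurses, the context window drifts and the “typical poster” being imitated silently switches. All the names and the tiny vocabulary lists are invented for illustration.

```python
import random

# Toy "corpora": two styles of Typical Internet S***poster the
# function might be imitating (invented for illustration).
CORPUS_STYLES = {
    "shopkeeper": ["price", "sale", "checkout", "receipt"],
    "date":       ["blazer", "tie", "waiting", "meet"],
}

def next_token(prompt_tokens, window=3):
    """Pick a continuation by matching the recent window to whichever
    poster style it most resembles—no world model, only resemblance."""
    recent = prompt_tokens[-window:]
    style = max(
        CORPUS_STYLES,
        key=lambda s: sum(tok in CORPUS_STYLES[s] for tok in recent),
    )
    return random.choice(CORPUS_STYLES[style])

def generate(prompt_tokens, n=5):
    """Recursive generation: each output token becomes part of the
    prompt, so the 'persona' can drift mid-stream."""
    tokens = list(prompt_tokens)
    for _ in range(n):
        tokens.append(next_token(tokens))
    return tokens
```

If the recent window happens to fill up with “blazer”, “tie”, “waiting”, the function starts continuing in the date-poster style even though the conversation began at a cash register—the cartoon version of Project Vend’s existential-crisis vending machine.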
