Notes on the End of the $5 Uber Era of "AI": MONDAY MAMLMs

Hyperscalers are slamming on the brakes, metering “intelligence,” and discovering that someone still has to pay the datacenter bill. Anthropic chases profits, OpenAI fights for its life, Microsoft sells shovels, and everyone tries to figure out what—if anything—could justify an agentic-pace GPU burn. The datacenters are running flat‑out, the subsidies are ending, and the secret sauce may lie less in magic model weights than in the harness wrapped around them.

And Google sends me an email:

Updates to your Google One Premium plan
Hi Brad, We wanted to let you know about changes to your Google One Premium plan. These updates include changes to usage limits included with your subscription and your plan name. What’s changing starting today, Jun 8, 2026:
Updated plan name: Your new plan is now called Google AI Plus, and you’ll still enjoy the same 2 TB of storage and other premium benefits.
Usage limits in the Gemini app: For the Gemini app, we’re introducing compute-based usage limits that factor in the complexity of your prompt, the features you use, and the length of your chat. Your limit refreshes every 5 hours until you reach your weekly limit. As an AI Plus subscriber, you’ll enjoy a 2x higher usage limit than non-subscribers.
AI credits: The product-based usage limit model is also rolling out to other products, starting with Flow and Antigravity. While 200 AI credits will no longer be included as a benefit in your base plan each month, the new usage limit model we are introducing should allow you to maintain the same experience you are used to. To learn more about how to use AI credits, please visit our Help Center…

And so Google becomes the latest of the hyperscalers to decide that it cannot afford to let me burn tokens running its Modern Advanced Machine-Learning Models (its MAMLMs) for well under the marginal cost of the electricity and the depreciation it incurs to supply them. And do note that this is not an announcement of a future change in the pricing model. This is a change announced and effective a week ago. Or perhaps Google has decided that its data centers are now working flat out: serving its own customers, serving Apple’s customers (as Apple’s Private Cloud Compute turns out to be Google-managed, running on NVIDIA chips in Google data centers), or being rented out to others who hope to someday make money from “AI”. With data centers running flat out, charging less than the price the market will bear does not build reputation and attract customers. Rather, it turns customers off, as congestion and contention produce low-quality service.
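To make the mechanism in Google's email concrete, here is a minimal sketch of what a compute-based meter with a 5-hour refresh and a weekly cap might look like. Only the 5-hour window, the existence of a weekly limit, and the 2x subscriber multiplier come from the email. The budgets, the feature weights, the cost formula, and every name in the code (`prompt_cost`, `UsageMeter`, `try_spend`) are hypothetical, chosen purely for illustration, and the sketch assumes the 2x multiplier applies to both the window and the weekly limit. It does not model the weekly reset.

```python
from dataclasses import dataclass

# From the email: the limit refreshes every 5 hours, and subscribers get 2x.
WINDOW_HOURS = 5.0
SUBSCRIBER_MULTIPLIER = 2.0

# Hypothetical numbers: the email gives no budgets, weights, or formula.
BASE_WINDOW_BUDGET = 100.0    # assumed compute units per 5-hour window
BASE_WEEKLY_BUDGET = 1000.0   # assumed compute units per week
FEATURE_WEIGHT = {"chat": 1.0, "image": 5.0, "video": 25.0}  # assumed


def prompt_cost(complexity: float, feature: str, chat_turns: int) -> float:
    """Assumed cost: feature weight times complexity, growing with chat length."""
    return FEATURE_WEIGHT[feature] * complexity * (1.0 + 0.1 * chat_turns)


@dataclass
class UsageMeter:
    subscriber: bool = False
    window_start: float = 0.0   # hours since the start of the week
    window_used: float = 0.0
    week_used: float = 0.0

    def _scale(self) -> float:
        # Assumption: the 2x multiplier scales both the window and weekly caps.
        return SUBSCRIBER_MULTIPLIER if self.subscriber else 1.0

    def try_spend(self, now_hours: float, cost: float) -> bool:
        """Charge `cost` if both limits allow it; return whether it was allowed."""
        # Open a fresh window once 5 hours have passed since the last one began.
        if now_hours - self.window_start >= WINDOW_HOURS:
            self.window_start = now_hours
            self.window_used = 0.0
        window_cap = BASE_WINDOW_BUDGET * self._scale()
        weekly_cap = BASE_WEEKLY_BUDGET * self._scale()
        if self.window_used + cost > window_cap or self.week_used + cost > weekly_cap:
            return False
        self.window_used += cost
        self.week_used += cost
        return True
```

Under these assumed numbers, a non-subscriber who spends a full window budget every 5 hours hits the weekly cap after ten windows, which is the sense in which the limit "refreshes every 5 hours until you reach your weekly limit."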

And Google is one of the last to do this:

Mike Taylor: How Microsoft Is Building for a World of Metered Intelligence <https://every.to/also-true-for-humans/how-microsoft-is-building-for-a-world-of-metered-intelligence>: ‘In my Uber… I fondly recalled… when you could get anywhere in San Francisco for $5. Those days are long gone…. Uber needed to show a path to profitability ahead of its 2019 IPO…. The “$5 Uber era” of LLMs is over now, too.
AI labs[’] subsidizing subscriptions to the tune of thousands of dollars… can’t continue forever…. Microsoft… switch[ed]… to token-based billing… [and] some users said their bills jumped from $39 to over $3,000 per month…. Intelligence… available on tap… [will be] constrained by how many coins you can put in the meter… [unless] the RTX Spark, a new laptop Microsoft designed for AI workloads with Nvidia…. able to run a medium-sized 128-billion-parameter model locally (frontier models are in the trillions of parameters)… without paying a penny for tokens….
Budget-conscious coders… a year or two behind the frontier… [with] a smaller model…. Microsoft… the place where you can use any model, agent, or harness…. Microsoft’s research lab… released a set of new (cheaper) smaller models…. A smart model to train a dumb one, a process called distillation…. Tech Insider reported that half of announced U.S. data center capacity for 2026 has been cancelled or delayed….
San Francisco allowed for a stark contrast between the token maxxing AI-pilled engineers and the relentlessly pragmatic enterprise leaders who are trying to get this technology to work…. Richly compensated AI researchers spending hundreds of billions of free tokens per month aren’t living in the same world as a junior IT consultant for an enterprise healthcare company in Seattle…. As I rode my Uber back to the airport at the end of the conference, I read… Uber had capped its engineers’ token budget at a sensible $1,500 per month… about 10 percent of a typical Uber engineer’s salary…

As I understand the state of play right now, it is this: All datacenters are running flat-out. All customers are frantically trying to slim down their token bills, running only the models needed to do the job and not models that try to overthink it, and running only the jobs that help pay bills or that genuinely build capabilities to do useful things in the future. Some customers are frantically trying to figure out how to light up their own dark silicon to run things much more cheaply (and securely) on their own devices.

Among those last would, indeed, be me.

At a higher level:

Anthropic is desperately trying to become profitable so that it can actually IPO, and thinks that demand for Claude Code and CoWork and hopefully Design is strong enough that it can get its prices high enough for that to happen.
OpenAI believes it is facing an existential threat and is trying to catch up to Anthropic while also preserving its position as the accidental consumer chatbot company, and to use that position as a funnel to try to make money.
Microsoft is spreading out all kinds of bets on all kinds of things—in particular its joint bet with NVIDIA that it can sell hardware with an accompanying software harness to thoroughly commoditize the useful things that LLMs do.

And what about Apple, Amazon, Google, Oracle and the other really big boys? And the little boys? What are they doing?

I do not have a very good sense of what they are doing this month. And it will almost surely be different next month.

I think they have decided that they have good enough models that they no longer need to fear losing their current platform monopoly profits as long as they can keep their data center build-outs ahead of natural language interface demands for their services. Hence they too are shifting to “can we avoid losing money on these things?” mode rather than “we have to give away services to keep people stickily attached to our platforms” mode. Thus we are going to see whether the game is worth the candle. Perhaps it will not be. It seems clear to me that good enough natural-language interfaces will be run not in datacenters but on device. And it may be that datacenter-run “agentic” workloads emerge that are worth their electricity, depreciation, and capital costs. But perhaps no such workloads will emerge besides vibe-coding.

Here, however, I get stuck. There is one key question connected to the sudden emergence of Anthropic and its sudden acquisition of real revenue that I need to understand if I am to have even a ghost of a chance of gaining a possibly correct view of all this. The question is this: What is the recipe for Anthropic’s current secret vibe-coding Claude Code sauce?

There are really two separable questions here:

Are Opus/Sonnet themselves unusually strong models?
Is the key difference the symbolic-ish harness wrapped around the models?

And I do not have answers to either of them.


If reading this gets you Value Above Replacement, then become a free subscriber to this newsletter. And forward it! And if your VAR from this newsletter is in the three digits or more each year, please become a paid subscriber! I am trying to make you readers—and myself—smarter. Please tell me if I succeed, or how I fail…

##notes-on-the-end-of-the-5-uber-era-of-ai-monday-mamlms
##monday-mamlms
##subturingbradbot
##macro-outlook
##mamlms
#notes-on-the-end-of-the-5-uber-era-of-ai
#metered-intelligence
#gpu-scarcity
#ai-business-models
#anthropic-strategy
#openai-strategy
#model-vs-harness
#harness-engineering
