Inference Is Unlikely to Ever Be a Low Marginal Cost Operational Node, & the Other Reasons Why the Anthropic and OpenAI IPOs Ought to Fail

Digital Gods, real costs: why a rational world would see the doom of the foundation‑model-builder IPO, because the AI labs are highly unlikely to ever get profits, let alone hyperprofits. Inference never becomes sufficiently cheap, AI-entity judgment stays bad, and durable quasi-rents flow to NVIDIA & company—not to the model‑makers…

I have no idea whether OpenAI or Anthropic or both will launch an IPO this year, and I have no idea what the results of it will be.

But it is clear to me that, if either one does, it ought to fail.

That is clear to me in a way that it was not clear to me back in the day that the Google or the Facebook or the Microsoft IPOs were unsound. I thought all three of those were very risky, yes. But, even though the valuations seemed very high to me, I did see a possible path to durable hyperprofitability for each.

I do not see such a path for either Anthropic or OpenAI. That has now crystalized for me. And it is reading Paolo Perrone that has done it, and that has led me to the conclusion in the title.

From Paolo Perrone I get four things:

(1) “Inference” is very unlikely to ever become a low marginal-cost node in the system:

Paolo Perrone: Why is Inference Slow and Expensive? <https://theaiengineer.substack.com/p/why-is-inference-slow-and-expensive>: ‘Your inference bill…. Memory bandwidth…. KV cache reads…. GPU idle time…. The electricity bill for running all that idle silicon…. The industry’s largest AI lab spent $8.67 billion on inference in the first three quarters of 2025, nearly double their revenue… [and] lose[s] money on $200/month Pro…. The memory wall doesn’t care how big your model is. It scales down with you. The cost spiral is real…. Don't believe the 'inference is getting cheap' headline. It’s half true. API prices have collapsed since 2022. But “cheaper than 2022” is not the same as “cheap.”… The pricing you see on API dashboards is subsidized by venture capital. Providers are selling below cost to capture the market. When the subsidies end, the prices go up…

(2) Language models now have sufficient verbal fluency. What they do not have is judgment as to which pieces of the human information corpus that has made up their training data are knowledge as opposed to simply s***posting. Hence anything they produce that is not for the immediate assessment by a skeptical human for whom it is part of their information diet requires IMMENSE “babysitting” not to run off the rails:

Paolo Perrone: Why AI Agents Keep Failing in Production <https://theaiengineer.substack.com/p/why-ai-agents-keep-failing-in-production>: ‘You watched the demo…. Everyone in the room was impressed. Three months later, your engineering team has burned $500K…. The agent works fine on Tuesdays when the API is responsive, the user says exactly what the system prompt expects, and nothing in the database has changed. It falls apart everywhere else….
Three failure patterns account for most of it: Dumb RAG (bad context management), Brittle Connectors (broken tool integrations), and the Compounding Error problem (mistakes that multiply across steps)…. The math is brutal: an agent with 85% accuracy per step only completes a 10-step workflow successfully 20% of the time. Every added step makes it worse…. A deterministic system fails loudly. A non-deterministic agent fails quietly, confidently, and often in ways your tests never anticipated…. No one anticipated the agent interpreting “clear the cache” as “wipe the drive.” That’s not bad luck. That’s what non-deterministic systems do…
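
The compounding-error arithmetic in that passage is easy to check. Here is a minimal sketch, assuming (as the quoted figure appears to) that each step succeeds or fails independently of the others; the function name is mine, not Perrone's:

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step succeeds, ASSUMING steps fail independently."""
    return per_step_accuracy ** steps


# Per-step accuracy of 85% is the figure from the quoted passage;
# the step counts other than 10 are illustrative.
for steps in (1, 5, 10, 20):
    rate = workflow_success_rate(0.85, steps)
    print(f"{steps:>2} steps at 85% per step: {rate:.1%}")
```

Ten steps at 85% each come out a little under 20%, which matches the quoted number, and doubling the workflow to twenty steps leaves only a few percent of runs intact. Real agents need not fail independently, so treat this as the shape of the problem rather than a forecast.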

(3) That is not going to change. Hence belief that these systems are on a trajectory to become human or superhuman in the next decade or so is simply crazy. What works are applications that are modestly right-sized in their expectations:

Paolo Perrone: Why AI Agents Keep Failing in Production <https://theaiengineer.substack.com/p/why-ai-agents-keep-failing-in-production>: ‘Agents delivering production value in 2026… [have:] Bounded scope. The agent handles one domain, with a defined tool set, and explicitly refuses tasks outside that boundary…. Observable behavior. Every tool call logged. Every decision point traceable…. Human gates on irreversible actions. Read: autonomous. Write: autonomous with logging. Irreversible: human approval required…. The demo worked because it was built for the happy path…
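
The three-tier rule in that passage (read, write, irreversible) can be sketched as a small gate in front of an agent's tool calls. This is a rough illustration only: the tool names, the `TOOL_RISK` registry, and the `authorize` function are all hypothetical, and a production system would need real approval workflows and durable audit logs.

```python
import logging
from enum import Enum

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent_gate")


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"


# Hypothetical registry: the bounded scope is exactly the tools listed here.
TOOL_RISK = {
    "search_docs": Risk.READ,
    "update_ticket": Risk.WRITE,
    "delete_records": Risk.IRREVERSIBLE,
}


def authorize(tool: str, human_approved: bool = False) -> bool:
    """Return True if the agent may run `tool` right now."""
    risk = TOOL_RISK.get(tool)
    if risk is None:
        # Bounded scope: anything outside the registry is refused outright.
        log.info("refused %s: outside the defined tool set", tool)
        return False
    # Observable behavior: every tool call leaves a log line.
    log.info("tool=%s risk=%s human_approved=%s", tool, risk.value, human_approved)
    if risk is Risk.IRREVERSIBLE:
        # Human gate: irreversible actions need explicit sign-off.
        return human_approved
    return True  # reads and writes run autonomously, with the log line above
```

Under these assumptions, `authorize("delete_records")` returns `False` until a human passes `human_approved=True`, and an unregistered tool such as `"format_disk"` is always refused.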

(4) Plus ones in which the model itself generates bare verbal fluency, and in which the substance is tied down by the ground truth of a scrubbed, trusted, organized data store:

Paolo Perrone: What Is RAG (Retrieval-Augmented Generation)? <https://theaiengineer.substack.com/p/what-is-rag-retrieval-augmented-generation>: ‘AI chatbots hallucinate 3% to 27% of the time, even in setups designed to prevent it…. The retrieval matters more than the generation. What makes or breaks it is the retrieval: how you chunk your docs, how you search them, and whether the right context actually lands in the prompt…. [A] natural language question… converted into a vector…compared against all the document chunks in your vector store (your indexed knowledge base)…. The retrieved chunks get assembled alongside the original question into a single prompt…. A well-built RAG system can tell the user which documents the answer came from, with links. This is the open-book exam equivalent of showing your work…. RAG… change[s] the model’s information diet: from “everything I memorized during training” to “the specific documents that are actually relevant to this question, right now”…
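
The retrieve-then-assemble loop described there can be sketched in a few lines. This is a toy under loud assumptions: word counts stand in for a learned embedding model, a Python list stands in for the vector store, and the chunks and file names are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'vector': lowercase word counts. Real systems use a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical (source, chunk) pairs standing in for an indexed knowledge base.
CHUNKS = [
    ("syllabus.txt", "The midterm exam is held in week eight and covers lectures one through twelve."),
    ("logistics.txt", "Office hours are on Wednesday afternoon in the economics building."),
    ("grading.txt", "The final exam counts for forty percent of the course grade."),
]


def retrieve(question: str, k: int = 2) -> list:
    """Rank every chunk against the question and keep the top k."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c[1])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Assemble retrieved chunks, tagged with their sources, plus the question."""
    context = "\n".join(f"[{src}] {text}" for src, text in retrieve(question, k))
    return (
        "Answer using only the sources below and cite them.\n"
        f"{context}\n\nQuestion: {question}"
    )


print(build_prompt("When are office hours?"))
```

Even in the toy, everything that matters happens before the model is called: which chunks exist, how they were cut, and whether the right one ranks first. Keeping the source label on each chunk is what lets the final answer show its work.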

What is the big—and bitter—lesson this time? It is this: data cleaning is most of it, as it so often is. And when cleaning the data properly is not most of it, finding the right data is.

Personally, even before Paolo had crystalized this for me, I had been pushed hard toward similar conclusions by my attempts to build my SubTuringBradBot <https://web.telegram.org/k/#@SubTuringBradBot>.

As a free‑ranging artificial Brad, it is charming but unacceptable. As a tightly leashed question-and-answer “catechism” engine answering questions about the syllabus, the course logistics, or opinions and judgments I have settled on, it is acceptable. But it is still distressingly bad at providing crisp answers where there is one that is uniquely correct, whether the correctness is one of conceptual clarity or quantitative magnitude.

What we have is an oddly talented but unreliable group of les idiots savants, assistants who must never be left alone with the gradebook, the syllabus, the data analysis, or the nuclear launch codes. That gap between verbal fluency and judgment proved over and over again to be my central problem in building this thing now burbling away at acceptable quality in the corner of my dining room. What works is not trusting the model to think and reason and invent answers as we let the stochastic dice fly. What works is forcing it into the humble role of indexer and pattern‑matcher over a vetted set of question‑and‑answer pairs. If the answer is in the catechism, you let it quote or adapt; if it is not, the system should admit ignorance or escalate to the wetware.
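
The quote-or-escalate rule can be sketched as follows. To be clear about assumptions: this is not the actual SubTuringBradBot code, the question-and-answer pairs are invented, and plain string similarity from the standard library's `difflib` stands in for whatever matching a real system would use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical vetted question-and-answer pairs (the "catechism").
CATECHISM = {
    "when is the midterm": "The midterm is in week eight.",
    "where are office hours held": "Office hours are in the economics building.",
}

ESCALATE = "I do not know. Forwarding this question to a human."


def normalize(question: str) -> str:
    """Lowercase and strip punctuation so near-identical phrasings compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", question.lower()))


def answer(question: str, threshold: float = 0.8) -> str:
    """Quote the vetted answer if a known question matches closely; else escalate."""
    q = normalize(question)
    best_known, best_score = None, 0.0
    for known in CATECHISM:
        score = SequenceMatcher(None, q, known).ratio()
        if score > best_score:
            best_known, best_score = known, score
    if best_known is not None and best_score >= threshold:
        return CATECHISM[best_known]
    # Admit ignorance rather than let the model improvise an answer.
    return ESCALATE


print(answer("When is the midterm?"))
print(answer("What are the nuclear launch codes?"))
```

The design choice is the threshold: set it high and the bot says "I do not know" a great deal, set it low and it starts answering questions nobody vetted. The `0.8` here is arbitrary.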

And there is another thing that makes the behavior of SubTuringBradBot unacceptable, and that explains why I have been disappointed whenever I have given it the strongest possible modern frontier model and let it off the leash. Even when its answers are right by the standards of RLHF, they are wrong in a subtle and infuriating way: they gesture at knowledge and arguments while hollowing out the logical and argumentative core. They are, to borrow Harry Frankfurt’s vocabulary, bullshit: plausible-seeming textual performance. The more I care about the question, the less I am willing to let SubTuringBradBot try to answer it.

That inference is not and will never be sufficiently cheap tells us that we are not looking at a familiar software story, one in which you do a big up‑front engineering push and then harvest enormous quasi‑rents because the marginal cost of serving the N+1‑st user is close to zero. We are looking, instead, at a capital‑intensive, energy‑intensive, bandwidth‑intensive, human nursemaiding-intensive industry in which the marginal cost of the N+1‑st user is stubbornly positive. Superprofits may flow to whoever owns the fabs, the data centers, and the power plants: the NVIDIAs, the TSMCs, the hyperscalers, and the utilities. Not to the model-builders.
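
The difference between the two stories shows up in simple arithmetic. A minimal sketch, with every number hypothetical and chosen only to show the shape of the curves: with a fixed up-front cost and a constant marginal cost per user, operating margin climbs toward one minus the ratio of marginal cost to price, and no amount of scale lifts it past that ceiling.

```python
def operating_margin(users: int, price: float, marginal_cost: float, fixed_cost: float) -> float:
    """(revenue - total cost) / revenue, with a constant marginal cost per user."""
    revenue = users * price
    total_cost = fixed_cost + users * marginal_cost
    return (revenue - total_cost) / revenue


def margin_ceiling(price: float, marginal_cost: float) -> float:
    """Limit of operating_margin as users grows without bound."""
    return 1 - marginal_cost / price


# All figures below are hypothetical, not estimates for any real company.
PRICE = 20.0       # revenue per user
FIXED_COST = 1e9   # up-front engineering push

for label, mc in (("near-zero marginal cost", 0.5), ("stubbornly positive marginal cost", 14.0)):
    print(f"{label}: ceiling {margin_ceiling(PRICE, mc):.0%}")
    for users in (10**8, 10**9, 10**10):
        margin = operating_margin(users, PRICE, mc, FIXED_COST)
        print(f"  {users:>14,} users -> margin {margin:.1%}")
```

In the first case scale does nearly all the work and the margin heads toward the high nineties. In the second, growth only spreads the fixed cost thinner while the per-user cost keeps eating the same share of every dollar.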

Hence the importance, as a motivator for what we have seen to date, of the religio-theological faith that a Digital God can be built and that the Oracular pronouncements of that Digital God can then be sold for real money. Any reasonable possibility of that might have justified paying the price for the build-out on the part of those who did not have platform monopoly profits to protect from disruption. But the action on the actual token-production frontier is where Paolo says it is: systems that are bounded, supervised, observably logged, and tightly leashed to trustworthy data via RAG‑like architectures, and that still need a lot of plumbing and babysitting from quite expensive software engineers. The agentic fairy tale (“give the model tools and let it run your workflows”) keeps running into the same brick walls: non‑determinism, compounding error, brittle connectors, data quality, governance. Enterprises discover that, at the margin, this is not a replacement for their staff but an add‑on that itself has to be managed, monitored, and audited. That is a useful product. It is not a license to mint hyperprofits. Not for them. Not for me with SubTuringBradBot.

At the same time, any price umbrella that might have preserved some margin is being kicked away from below. It is true that inference costs per unit of useful output are indeed falling. But they are falling for everyone. Open‑weight models are quite close in quality. And the gap is closing: distillation and quantization can do amazing things. When the key differentiator in getting results becomes not the unique edge of a unique model, but rather the harness and the data quality painfully and expensively maintained, the datacenter-based token-serving core model itself will slide toward being a commodity input. And that is when companies’ and consumers’ lighting up the dark silicon of their own devices will become an even cheaper option.

Add to that the capital structure and burn, resulting in what seems from reports and leaks to be post-training and inference costs for both OpenAI and Anthropic that are not just high but scale roughly linearly with revenue. When you are burning through 70 percent of revenue even after having already climbed to multi‑billion‑dollar annual sales, you do not have a straightforward path from here to “profit machine.” You have, instead, a treadmill.

And the recent behavior of investor-insiders is, I think, consistent with this picture.

Both Anthropic and OpenAI have moved, in the past couple of years, to lock down their secondary markets and keep a tight grip on who holds their equity and at what book valuation. Employees and early investors who thought they were holding liquid, pre‑IPO lottery tickets have discovered that the exit is blocked unless and until the company organizes a tightly controlled tender offer—or finally rings the bell on a public exchange. That is not what you do if you believe you are sitting on the next compounding, cash‑gushing franchise that can effortlessly buy back early shareholders over time from operating cash flow. It is what you do if you need to keep the story going, the paper valuations high, and the cap table tidy long enough to distribute the hot potato to the broad public.

Put these threads together, and the conclusion, I think, is grim for anyone who imagines that merely being “a leading foundation‑model lab” entitles you to super‑normal profits a decade from now. The technology is powerful but unreliable. It must be boxed into narrow, supervised uses. The economics of inference are unforgiving. The competitive environment is crowded, with open models and vertically integrated hyperscalers eroding any pricing power that an independent lab might hope to have.

What is the plausible, well‑specified path by which Anthropic or OpenAI grow into the kind of durable, high‑margin franchises that would justify the valuations their private rounds have implied?

There is none visible.

There is always “and then a miracle appears”: some qualitatively new product or institutional arrangement that we do not yet see, and that somehow evades both competition and regulation. Digital God. And, indeed, betting on that is not investing; it is eschatology.

But treat Anthropic and OpenAI not as prophecies but as businesses, and the numbers stop adding up. Inference remains capital‑, energy‑, and bandwidth‑intensive; models remain non‑deterministic, brittle, and in need of constant babysitting; and open‑weight competitors are “good enough” for many uses. That combination pushes the core model toward commodity status: an industry that looks, financially, more like a thin‑margin utility than a software cash machine.

Thus for existing investors, especially those who came in at nosebleed prices, the only realistic way to “win” is to sell out to Ms. Market while she is still willing to dream: to get an IPO done soon, distribute their positions into public hands, and hope that the day when the economics of inference and the limits of judgment finally knock on the door comes after the lockup expires, rather than before.

If reading this gets you Value Above Replacement, then become a free subscriber to this newsletter. And forward it! And if your VAR from this newsletter is in the three digits or more each year, please become a paid subscriber! I am trying to make you readers—and myself—smarter. Please tell me if I succeed, or how I fail…

##inference-is-unlikely-to-ever-be-a-low-marginal-cost-operational-node–the-other-reasons-why-the-anthropic-and-openai-ipos-ought-to-fail
##subturingbradbot
##macro-outlook
#anthropic
#openai
#inference-costs
#digital-god
#foundation-models
#human-in-the-loop
#ai-bubble-ipo-window
#stochastic-parrots
