Is the Day of the Data Center About to Be Over?

Marco Arment’s Setup as the Canary in the Coal Mine—or, Rather, as the 50 Mac Mini Server Farm Vastly More Efficient than the NVIDIA-Powered Cloud-Bound Hyperscalers. Or, why John Giannandrea’s stewardship of Apple’s AI strategy may have meant that Apple has already won the AI-software race…




John Giannandrea ran search and “AI” at Google, then jumped to Apple in 2018 to lead machine learning and “AI strategy.” That move mattered: Apple had lots of silicon and privacy rhetoric, but no coherent AI leadership. He then decided to push three intertwined bets:

  1. On‑device intelligence as the default: instead of “ship your life to the hyperscaler and rent it back token by token,” his vision was that your phone, your laptop, your watch do as much inference as possible locally, on NPUs Apple controls.

  2. Tight integration of model and hardware, with silicon paying off most when the software stack is shaped around its excellences and where the chip design springs from what the software is going to be asked to do most.

  3. Privacy‑preserving AI as a branding wedge, with high-quality local models plus smart caching keeping your information from being captured and deployed to your detriment.

John Giannandrea was deeply suspicious of chasing the largest-possible GPT LLMs, spending fortunes on their training, and then deploying them as behemoths expensive in silicon, heat, and power for inference as well. Giannandrea was the anti-Sam Altman. His vision was of software as a thick middle layer in which most economically relevant inference happens on M‑series chips scattered across living rooms and backpacks, with hyperscale training and a much thinner layer of cloud inference above it.

He was hired to (a) give Apple a coherent “AI” story, and (b) fix Siri. Apple’s marketing arm had always been overpromising what Siri would soon be able to do:

  • in 2011 it would soon be your digital butler: a conversational assistant that could understand natural language and remember context;

  • in 2018 it soon would be able to control your home, answer questions, manage media, and act as a hub for an ecosystem of skills;

  • in 2024 Siri would—within the year—become a true unified conversational agent, able to understand open‑ended natural language, remember rich context across apps, and orchestrate complex actions on your behalf with LLM reasoning, cross‑app understanding, and natural follow‑ups.

Apple could not and did not deliver.

Back at the end of 2022, ChatGPT had blown the doors off.

Thereafter, all Wall Street and the pundits could care about was the amazing software technology-demonstration project they could see: a flashy, cloud‑scale, natural-language, almost-useful virtual assistant. Pleasing Wall Street and pleasing pundits required that Apple be in the GPT-LLM game at the frontier. So when the “we can do this in less than a year from June 2024” bet failed, either you revise the strategy or you replace the strategist. Apple did both: it started leaning harder on cloud partnerships, and it reassigned Siri and other “must ship soon” pieces to people seen as more execution‑driven and able to ship come hell or high water, with control and supervision of Apple “AI” going to Mike Rockwell, Craig Federighi, and company.

(Of course, nobody else has managed to deliver either. For example, last year Nilay Patel, Joanna Stern, and John Gruber made great fun of Alexa Plus:

00:22:54 Alexa, they claim that a million people have Alexa Plus.
00:22:57 Does one person in this room have Alexa Plus?
00:22:59 We can’t see.
00:23:01 No, the wheezing side does not count.
00:23:03 You have to actually say yes.
00:23:06 You have to say it and shout it.
00:23:07 You do?
00:23:07 Really?
00:23:08 Is it any good?
00:23:09 Get that person up on stage.
00:23:17 He’s the one person with Alexa Plus.
00:23:19 Better than Alexa is a low bar.
00:23:23 Is that Panos Panay?
00:23:24 Do you work for Amazon?
00:23:26 Okay.
00:23:29 Thank you, Mr. Bezos.
00:23:30 So there’s like a million people out there having whatever experience that is with Alexa Plus… <https://podsearch.david-smith.org/episodes/7742>


and so on.

And do note that I am told that Microsoft has just changed its terms of service to state that Copilot is “for entertainment purposes only”.

To be fair, there are people who claim that OpenClaw—tagline: “the AI that actually does things”—<https://openclaw.ai/> is getting close, at least in the large-but-limited domains of programming, scheduling, editing, summarizing, and, since SEO and chasing ad revenue have poisoned Google, searching.)



But now let us start over:

There is a semi-celebrity tech influencer <https://marco.org/2026/04/01/letter-to-john-ternus>, restaurateur <https://www.instagram.com/thealbatrossob/>, auto enthusiast <https://carbuzz.com/rivian-launches-first-ad-campaign/>, podcaster <https://atp.fm/>, podcaster spouse <https://www.tiffanyarment.com/>, and programmer <https://overcast.fm/> named Marco Arment <https://marco.org/>. As a programmer, Marco is the kind of programmer whom Steve Jobs would have called “a pirate”, a word that for Jobs had a strongly positive valence, as in “it’s better to be a pirate than to join the navy.”

Marco’s first big project was as one of the two people building the original Tumblr <https://www.tumblr.com/>. His second was Instapaper <https://www.instapaper.com/>. Both demonstrated that a single person who actually touches the codebase every day can carry an astonishing amount of weight if they’re ruthless about scope and opinionated about what the product is for. And now, well, go to the landing page of pretty much any podcast and you will see Overcast listed among the handful of players on offer.

As a programmer, these days Marco is the podcast player Overcast, and the podcast player Overcast is Marco. Overcast is perhaps fifth, perhaps eighth, among podcast players in terms of use. It is quite probably second or third in use by people vociferous enough to have and express opinions about podcast players. I guess that Overcast has monthly active users numbering in the low seven figures.

Two years ago Apple Podcasts began to deploy transcripts for every podcast it served. And Marco’s reaction was “oh crap!” He saw a world in which serving transcripts was a table-stakes feature: a podcast player that did not offer transcripts was unambiguously worse than one that did, and worse by a large margin. But transcripts are serious work, and serious work in the write-once-serve-everyone mode, with your per-unit fully-amortized cost inversely proportional to your user base. With only 1/30 the scale of Apple Podcasts, and with transcript generation requiring serious “AI” work, how could Marco possibly offer transcripts as a feature and still make any money at all from Overcast?
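The economics being worried about here fit in one line: the cost of producing a transcript is fixed, so the per-user cost falls inversely with audience size. A toy illustration in Python, with all numbers invented purely to show the shape of the curve:

```python
# Toy illustration (all numbers invented): a transcript costs the same to
# produce whether one person or a million people read it, so the fully
# amortized per-user cost scales as 1/users. A player with 30x the audience
# pays 1/30th as much per user for the identical table-stakes feature.

def per_user_cost(fixed_cost_per_episode: float, users: int) -> float:
    """Fixed production cost spread across the whole user base."""
    return fixed_cost_per_episode / users

small = per_user_cost(1.00, 100_000)    # hypothetical small player
big = per_user_cost(1.00, 3_000_000)    # a 30x-larger incumbent

print(f"small player: ${small:.7f}/user   incumbent: ${big:.7f}/user")
print(f"scale disadvantage: {small / big:.0f}x")
```

Nothing about the feature differs between the two players; only the denominator does, which is exactly why a 1/30-scale competitor cannot simply buy the same cloud inference the incumbent does.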

As a podcaster, Marco is one of the triumvirate that is the Flagship Podcast of People Who Wanted to Make an Auto Enthusiast Podcast But Found They & Their Audience Were Much More Lively When the Topic Was Tech: the Accidental Tech Podcast, ATP <https://overcast.fm/>. And on episode 683 of ATP <https://atp.fm/683>, he tells the story of what he did:


It starts:

I just had no idea how I could possibly ever match [Apple Podcast transcripts. However,] last summer, with the beta of iOS 26, Apple launched a new speech transcription API … opened up the iOS speech model…. It ran on device, it was very optimized, and very fast, and very lightweight….

OpenAI’s Whisper… was a game-changer in transcription models because the accuracy was so much higher than what had come before…. The problem… is that it’s a really big model… really slow… great for one person using on their computer or some very specialized app uses…. But… for… OverCast for… many podcasts, that wasn’t going to scale….

OpenAI has a transcription API…. [But] just cost-wise, you would be talking about hundreds or thousands of dollars per day at the scale that Overcast would need those to be. And Overcast is not going to take on a thousands-of-dollars-a-day API cost…. That would require me to raise the price of premium higher than most people would be willing to pay….

This new API … for Apple’s on-device transcription…. I ran some tests. It just blew me away…. If I ran a few jobs in parallel I could have one M4 Mac transcribe audio at… 200 minutes transcribed for every real-time minute that has passed…


To cut to the chase:

Marco Arment uses Apple’s lightweight on-device transcription models and a server farm of 50 Mac Minis (total acquisition cost $30,000, or $6,000/year amortized) drawing less than 2,000W of power ($3,000/year; about two microwaves’ worth) in a Long Island data center to generate transcriptions in near real time for every podcast any one of his users subscribes to. And if you want a transcript that his server farm has not yet gotten around to? Press a button on your iPhone, and it will generate one on-device in ten minutes or so. Add in non-power colocation fees, and you get that he is doing all this for something like $10,000/year, plus his programmer time and skills.

Figure that each one of Marco’s Mac Minis transcribes roughly 300,000 minutes of audio a day. Had he outsourced the job to OpenAI, which charges $0.006/minute for using its cloud through the Whisper API, the work done by each Mac Mini would cost him about $1,800/day. The work done by fifty Mac Minis going flat-out for a year would cost $1,800 x 50 x 350: call it $30 million/year. Three thousand times the cost. And even charging that price, OpenAI is not making a profit these days.
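The arithmetic is easy to check in code. All inputs below are the article’s own figures, except the $0.17/kWh electricity price, which is my assumption, chosen only because it makes the stated ~$3,000/year power bill come out:

```python
# Back-of-the-envelope: Mac Mini farm vs. cloud transcription API.
MINUTES_PER_DAY = 24 * 60          # 1,440 real-time minutes in a day
SPEEDUP = 200                      # minutes transcribed per real-time minute, per Mac
MACS = 50
WHISPER_PER_MIN = 0.006            # OpenAI Whisper API price, $/audio-minute

minutes_per_mac_per_day = SPEEDUP * MINUTES_PER_DAY              # 288,000 ~ 300,000
api_cost_per_mac_per_day = minutes_per_mac_per_day * WHISPER_PER_MIN   # ~ $1,728/day
api_cost_per_year = api_cost_per_mac_per_day * MACS * 350        # ~ $30.2 million

# The farm itself:
amortized = 30_000 / 5             # $30,000 of Mac Minis over 5 years -> $6,000/year
kwh_per_year = 2 * 24 * 365        # 2 kW continuous draw -> 17,520 kWh/year
power = kwh_per_year * 0.17        # ~ $3,000/year at an ASSUMED $0.17/kWh

print(f"minutes/Mac/day: {minutes_per_mac_per_day:,}")
print(f"cloud API cost/year: ${api_cost_per_year:,.0f}")
print(f"farm cost/year: ${amortized + power:,.0f} plus colo fees")
```

Run it and the gap holds: roughly $30 million a year of metered cloud inference against roughly $9,000 a year of hardware amortization and electricity, before colocation fees.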

No, I cannot believe the difference between $10,000/year for a Mac Mini server farm and $30,000,000/year to do it with NVIDIA chips in the cloud. I must have made a big mistake somewhere: slipped a decimal or two or three. I cannot see where I did.

But even if I have:

This is not a rounding error. This is not a niche edge case. This is the reality of AI inference running on the silicon of John Ternus’s people using the software of John Giannandrea’s. And this reality is not consistent with any belief that the Day of the Data Center is dawning.
