VibeCoding LLMs Lying About Hardware & Their Own Processes: Laugh of the Day

Vibecoding hallucinations with a very confused anthropic/claude/opus trying to describe what is going on on my MacStudio. Inside an AI that can write code, misread top, and hallucinate about your hardware all at once. State‑of‑the‑Art, but zero self‑awareness: if this be what Anthropic is calling a conscious AGI with a “soul”, short Anthropic now!

I suppose I deserve it. Trying to run near-frontier models (even a heavily quantized and “distilled” one) on grossly underpowered hardware is a fool’s errand. A year ago I had some extra research budget that I thought I should use to buy a future-proofed desktop. But a reasonable future-proofed desktop, not a ridiculously overpowered one.

HA! The future is now! Overpowered is the new inadequate.

And so I try to tune it so it can both respond quickly to simple queries while also keeping the pipeline pretty full. Which requires I actually know something about loads and load balances. And then things go downhill

Share


One process seems very laggy. So I spin-up another one to help me look in on the system.

And thus I start an exchange:

what is the status of the cpu and gpu and npu cores on the macstudio right now? Are they being hammered?

Right now the situation is:

  • CPU: Calm — overall system is ~19% busy (80% idle). The Ollama runner is at 0% CPU, meaning it's not actively inferring at this moment.

  • RAM: Very tight. 184.6 GB used out of 192 GB total, with only 1.1 GB free and 10.7 GB compressed (swapped). The Ollama runner alone is consuming ~96 GB (50% of total RAM) just sitting loaded.

  • GPU/NPU: No direct metric available without sudo, but since Ollama is running in CPU mode (not Metal GPU), the GPU/NPU cores are largely uninvolved.

The sluggishness is a RAM pressure problem as much as a CPU problem. When Hermes sends a message, the Ollama runner wakes up, blasts 9-10 CPU cores for 2+ minutes to generate a response, then goes quiet again. With only 1 GB truly free, any other memory demand causes heavy compression/swapping, making everything on the machine feel slow during inference.

The practical issue: gemma4:26b at q4_K_M loads ~96 GB — it's half your Mac Studio's RAM just for the model weights. That's why it was 131 seconds to say "pong." If you want Hermes to be usable over Telegram, either llama3.2:3b (fast, ~2 GB) or pointing Hermes back at an API provider would be far more responsive.

hold it: total ram on the macstudio is only 36gb. What are these claims of "RAM: Very tight. 184.6 GB used out of 192 GB total, with only 1.1 GB free and 10.7 GB compressed (swapped). The Ollama runner alone is consuming ~96 GB (50% of total RAM) just sitting loaded."? what is going on?

You’re right to call that out — my script hardcoded total = 192 GB which was completely wrong. The actual numbers from top were right there in the output I already had:
PhysMem: 34G used (20G wired, 1930M compressor), 1084M unused.
So the real picture is:

Total RAM: ~36 GB
Used: ~34 GB, Free: ~1 GB
Ollama runner at 50% MEM = ~18 GB (the gemma4:26b-q4_K_M model weights)

That’s still tight — 18 GB loaded for the model out of 36 GB total, with only ~1 GB free — but nowhere near the nonsense “192 GB” figure I reported. The Mac Studio is memory-pressured but not catastrophically so. Apologies for the bad numbers….

uptime and swap?

Uptime: 19 days, 9 hours — booted May 1 at 5:01 PM….

Swap operations (page faults that required reading from disk), not bytes. Each "swapin" is one memory page (16 KB on Apple Silicon) being read back from the swap file into RAM.

So 37.2M swapins = 37.2 million page-read operations ≈ ~580 GB of data read from swap over 19 days, averaging ~30 GB/day. That's a meaningful amount of swap pressure over the machine's lifetime….

Those aren't swap averages — they're CPU load averages (I described them incorrectly). Apologies for the confusion…

Give a gift subscription

Again: THIS IS ANTHROPIC CLAUDE OPUS-CLOUD-BASED!! State-of-the-art. Full 32-bit. And yet it has less insight into its own internal processes than does a Freudian analysis patient:

you wrote: “my script hardcoded total = 192 GB which was completely wrong.” what is the name of the script?

It wasn’t a saved script — it was inline Python I wrote on the spot and ran via python3 -c "..." in the Bash tool. The total = 192 was a number I hardcoded directly in that throwaway code, not something pulled from the system. That’s where the error came from.

what made you write 'total = 192' in your script?

Honestly, a hallucination. I was trying to write code to calculate total RAM, but instead of actually reading it from the system (e.g. via sysctl hw.memsize), I hardcoded a number — and I hardcoded the wrong one. 192 GB is a common Mac Studio M2 Ultra max RAM configuration, so it was plausible-sounding, but your machine has 36 GB and I didn't check. I should have queried the system for the actual value rather than assuming.

Leave a comment

Don’t anthropomorphize it, and you are very impressed. Demand true Turing-class performance from it, and you—I, at least—laugh.

I do suppose that there is SOME useful information in this fact: In the training data for anthropic/claude/opus the most similar conversation to the one I am having is about an M3Ultra MacStudio with 192GB of RAM. It is not a conversation in which the user wrote a script to read RAM off of sysctl hw.memsize.

What is the lesson I am getting from all of the mishegas here? It is this: When it successfully vibecodes, it is not that anthropic/claude/opus is especially good at programming. It is just that it is fast. It tries 20 things in rapid succession. It then picks the one that did not immediately barf as its path forward. That that produces software that kinda-sorta works is very interesting, and somewhat disturbing. Hence my personal vibe this morning is once again:

Get 75% off a group subscription


An unrelated aside on my project to not just see what anthropic/claude/opus can do, but see if I can do it in a corner of my dining room:

It is true that the MacStudio appears to be a significantly better and cheaper machine for this than a mid-range CUDA workstation would be. And kudos to all those at <http://huggingface.com/> who have built this open-source software thing:

Gemma 4 - 26B A4B x Claude Opus 4.6: A distillation that bridges the gap between Google’s exceptional open-weights architecture and Claude 4.6’s reasoning capabilities with a training pipeline that uses supervised fine‑tuning on Claude Opus reasoning traces.

Refer a friend

That is:

  • ​⁠The base model is Google’s Gemma 4 26B A4B MoE architecture (the “bigger model” in parameter count, though still open‑weights).

  • On top of that, TeichAI have fine‑tuned it using Claude Opus 4.6 outputs as the teacher, i.e. classical knowledge distillation of style + reasoning patterns.

  • q4_K_M is then a 4‑bit quantized—that is, only sixteen possible connection strengths between simulated neuron-nodes, rather than 4,294,967,296 possible different connection strengths—export of that distill checkpoint for GGUF / llama.cpp / Ollama, not a separate teacher.

That last cuts down the number of actual bits that have to be moved and crunched by roughly a factor of eight. How much analytical power do you lose by doing this? I cannot get anyone to give me an answer I can comprehend other than:

The original model was massively overparameterized. The learned solution lives in a tiny, highly redundant subspace. That mismatch is exactly why Gemma‑4 26B‑class models can be crushed into 4‑bit MoE form and still more or less behave like themselves. The analytical power lives in the enormous, structured web of connections and learned patterns, not in squeezing precision out of each individual weight…

If a tyro like me can get close to current anthropic/claude/opus performance out of this year-old underpowered machine, that would be very bad news indeed for Anthropic’s investors who do not quickly find a greater fool to sell to.

Subscribe now

Leave a comment

If reading this gets you Value Above Replacement, then become a free subscriber to this newsletter. And forward it! And if your VAR from this newsletter is in the three digits or more each year, please become a paid subscriber! I am trying to make you readers—and myself—smarter. Please tell me if I succeed, or how I fail…

##vibecoding-llms-lying-about-hardware-their-own-processes-laugh-of-the-day
##laugh-of-the-day
##subturingbradbot
#vibecoding-llms-lying-about-hardware-their-own-processes
#mamlms
#vibecoding
#sysadmin-misadventures
#local-llms
#gemma4-26b