Burning 2.1M Tokens Version of Misadventures in Vibe-Programming: LAUGH OF THE DAY
A 26-billion-parameter mixture-of-experts LLM loses its mind, uselessly burning three War & Peaces’ worth of tokens as it searches for memories it never formed…
As I continue to cosplay as a SysAdmin, we have:
⏺ Bash(python3 -c “ import json…)
  ⎿ {“aborted”: false, “externalAbort”: false, … +79 lines (ctrl+o to expand)
⏺ Root cause is clear. Two distinct problems:
1. terminalError: “non_deliverable_terminal_turn” + assistantTexts: []: The model completed successfully (3,468 output tokens, no timeout) but produced no final text message — only tool calls. OpenClaw has nothing to deliver, so it reports failure. The agent finishes inserting into the DB via tool calls and then goes silent instead of writing a closing summary.
2. 61 tool calls consuming 2.1 million accumulated input tokens: The prompt says “Recall the formatted Q&A pair produced by the previous run”. Since each run is sessionTarget: isolated, the agent has no memory of prior runs — so it frantically calls memory_search, sessions_list, and sessions_history over and over trying to find the previous output. That’s where the 2.1M tokens go.
The fixes needed…
That is three War & Peaces’ worth of words. The 40 transformer blocks and 26 billion parameters of Ollama/gemma4:26b-a4b-it-q4_K_M frantically run search script after search script, each of which scans the memory and session log files over and over again, in actions containing enough verbal symbols to take us three times from the evening party of Anna Pavlovna Scherer in summer 1805 to Graf Pyotr Kirillovich and Grafinya Natasha Ilyinichna Bezukhov at home in 1820.
That “frantically” in anthropic/claude4.6’s description of what Ollama/gemma4:26b-a4b-it-q4_K_M was doing is a nice touch.
And all because I had asked a newly spun-up task to recall what it had done only an hour before.
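For the curious, here is a rough sketch of how 61 tool calls can add up to roughly 2.1M accumulated input tokens. It assumes the usual agent loop, in which every tool call re-sends the whole conversation so far, so the context grows a little on each step. The report only gives the two totals (61 calls, 2.1M tokens); the starting context size and per-call growth below are hypothetical numbers I picked for illustration, not values from the log.

```python
def cumulative_input_tokens(base_context: int, growth_per_call: int, calls: int) -> int:
    """Sum the input tokens when every call re-sends the full context so far."""
    total = 0
    context = base_context
    for _ in range(calls):
        total += context             # the whole context is sent again on this call
        context += growth_per_call   # the tool call and its result get appended
    return total


# Figures reported in the diagnosis above.
REPORTED_CALLS = 61
REPORTED_TOTAL = 2_100_000

# Average context size per call implied by those two figures.
average_per_call = REPORTED_TOTAL / REPORTED_CALLS

# Hypothetical inputs (assumptions, not from the log): a 4,000-token starting
# context that grows by 1,000 tokens with each tool call.
example_total = cumulative_input_tokens(base_context=4_000, growth_per_call=1_000, calls=61)

print(f"average per call: {average_per_call:,.0f} tokens")
print(f"hypothetical total: {example_total:,} tokens")
```

Under those made-up inputs, the total lands within a few percent of the reported figure. The point is only that a modest context, re-sent 61 times while it keeps growing, reaches the millions without any single call being large.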
