MAMLMs Still Epic Fail Open‑Book, Closed‑World, Finite‑List, Obvious Ground Truth Tasks
I count this as a failure nine successive times, on a question where there really is a unique, well‑defined answer and where the machine has every chance to uncover it: the Hedge Knight Ashford Meadow line‑up. This avoids the usual escape hatches about “the data might be ambiguous” or “this is a hard open question.” We get not one isolated “hallucination” but rather a hallucination cascade…
As Noah Smith warns everyone: if you don’t like posts about “AI”—then stay off the internet for the next, I don’t know, five years or so. That applies here as much as anywhere else.
BE WARNED!!
Today’s MAMLM post is triggered by the extremely sharp John Quiggin’s:
John Quiggin; LLMs Reaching the Tipping Point <https://johnquigginblog.substack.com/p/llms-reaching-the-tipping-point>: ‘[Before] Anthropic’s Claude Code late last year… there was a fair bit of disillusionment about “vibe coding” and… [its] buggy output… as well as… general disaffection… with… AI slop…. But lately, the tone has changed radically…. Even [the] previously sceptical… have concluded… Claude represent[s] the future…. OpenAI Deep Research… [has had] mixed results until recently. But now I’m perceiving the same kind of change… lengthy interactions … as if I were talking to an intelligent and exceptionally well-informed human… keeps track… seems less prone to fabrications… [and] this kind of interaction is fun…. DR is moving from being an enthusiastic but unreliable research assistant to something more like a well-read junior co-author. The ideas are still mine, but I can rely on DR to provide discussion and critique as well as routine stuff like literature summaries…
I confess I have not been doing enough coding this winter to gain a sense of any sea-change. But I will say that I am not seeing it in other realms—specifically, in ChatGPT and its ilk as an alternative to standard search engines. The advantages are (a) a natural-language interface, (b) a system that has not (yet) been turned fully up the wazoo to sell ads, plus (c) the SEO spammers have not yet descended to—I really do not like the vibe of “en***tify”—full commodification cannibalization. Yet I found it could not do the job when, yesterday, I asked it for a list of the fighters-for-good in Episode 5 of HBO’s “A Knight of the Seven Kingdoms”. That ought to have been well within its capabilities. And yet:
Who were the seven on Duncan’s side in the Trial of Seven?
You are asking about the Trial of Seven at Ashford Meadow in The Hedge Knight, the first of the Dunk & Egg novellas set in the world of A Song of Ice and Fire.
At that trial, Ser Duncan the Tall stands accused by Prince Aerion Targaryen after striking him to defend a puppeteer girl. Under Westerosi custom, a Trial of Seven is invoked — seven champions on each side.
Here are the seven who fought for Ser Duncan (“Dunk”): The Seven for Duncan: 1. Ser Duncan the Tall…. 2. Prince Baelor Targaryen (“Baelor Breakspear”)…. 3. Prince Maekar Targaryen…. 4. Ser Lyonel Baratheon (“The Laughing Storm”)…. 5. Ser Humfrey Hardyng…. 6. Ser Robyn Rhysling…. 7. Ser Raymun Fossoway (“The Red Apple Fossoway”)…
The trial ends in Dunk’s side prevailing, but at terrible cost…
And then it blathers on in its wordy and obsequious way. But even I know that (3) is wrong—that Maekar fights on the other side. So I wipe the memory and try again, asking about some of the less-conspicuous characters who I am pretty sure are fighters for the right here. But I dictate rather than type, and ChatGPT misinterprets one of my “commas” as a word rather than a typographical mark. And so we are off to the races:
Why do Beesbury, Comma, Harding, and Rysling fight for Duncan the Tall in the Trial of Seven?
You are asking about one of the more revealing episodes in the political sociology of Westeros: the Trial of Seven at Ashford Meadow in The Hedge Knight, the first Dunk & Egg novella. Duncan the Tall is, at that moment, a nobody—an anointed hedge knight of dubious pedigree who has just publicly accused a Targaryen prince of brutal wrongdoing. Yet four experienced knights—Beesbury, Cafferen, Hardyng, and Ryswell—choose to stand beside him. Why? Let us take them one by one, and then draw out the larger pattern…. 1. Ser Lyonel Baratheon (“the Laughing Storm”) — Not in Your List but Central. Before turning to your four, note that the first and most consequential supporter is Ser Lyonel Baratheon. His presence signals that this is not merely a vagabond’s quarrel. Once a major lord stands for Dunk, others can follow without appearing mad. The political economy shifts immediately: Dunk is no longer isolated….
2. Ser Humfrey Hardyng…. 3. Ser Humfrey Beesbury…. 4. Ser Robyn Ryswell….
5. Ser Humfrey Cafferen (You wrote “Comma”): Cafferen joins somewhat later and under persuasion. His motivation appears to combine: • Disgust at Aerion’s excesses. • Peer pressure from other honorable knights. • The gravitational pull of Lyonel Baratheon’s participation. Once a powerful regional lord publicly stakes his honor on Dunk’s cause, the reputational risk shifts dramatically. Cafferen may not be an idealist—but he is not willing to be counted among those who let a prince maim innocents without resistance…
And it, again, blathers on. And this time I know that it is Robyn Rhysling, not Ryswell. And that there is no such character as Ser Humfrey Cafferen in the story. There appears to be one and only one reference to a “Sir Humfrey Cafferen” in all the scope of Google search: <https://www.reddit.com/r/IronThroneRP/comments/9idga5/the_stag_and_the_lion_iv_the_tourney_of_storms_end/>.
By this time, I am curious and amused as to what it will say next:
So who are the seven who fight for Duncan again?
The Seven Champions for Duncan the Tall: (Trial of Seven at Ashford Meadow, c. 209 AC — The Hedge Knight): Here are the seven who stand on Duncan’s side: 1. Duncan the Tall…. 2. Prince Baelor Targaryen (“Baelor Breakspear”)…. 3. Ser Lyonel Baratheon: The formidable Stormlander lord. Once he joins, the coalition acquires real weight. 4. Ser Humfrey Hardyng. 5. Ser Robyn Ryswell. 6. Ser Willem Wylde. 7. Ser Humfrey Beesbury.
A Clarification: Earlier confusion often arises because some knights hesitate or shift before the final lineup. But the canonical seven on Duncan’s side are the above. Notably absent from this list is Ser Humfrey Cafferen—he does not ultimately fight in the final seven…
And it blathers along. But:
Still “Ryswell” rather than “Rhysling”.
WTF is Ser Willem Wylde, and where does that come from?
Continuing to dig the hole with “Cafferen”.
(A Wiki of Ice & Fire <https://awoiaf.westeros.org/index.php/Willem_Wylde> tells me that Willem Wylde fights on the side of the baddies here.)
I know that challenging these things never works out well. But why not?
But in the earlier conversation, you said that Prince Maekar fought on Duncan’s side as well.
You are right to press. That earlier statement was incorrect. Prince Maekar Targaryen does not fight for Duncan. He fights against him. Let us set the record straight cleanly and definitively.
The Seven for Duncan: From The Hedge Knight:
Duncan the Tall
Prince Baelor Targaryen
Ser Lyonel Baratheon
Ser Robyn Ryswell
Ser Humfrey Hardyng
Ser Humfrey Beesbury
Ser Willem Wylde….
Why the Confusion Happens: Maekar is central to the tragedy of the trial. He fights on Aerion’s side, but during the melee he strikes a crushing blow intended for Duncan that instead lands on his own brother, Baelor Breakspear. That blow kills Baelor days later. Thus Maekar is:
Opposed to Dunk in combat,
Yet indirectly responsible for the death of the very prince who defended Dunk.
This tragic irony is the hinge on which the Targaryen succession turns. Baelor’s death removes the clear heir and sets in motion the chain of events that ultimately places Maekar himself on the throne years later.
So to be unequivocal: Maekar fought against Duncan. If you would like, I can now lay out the full political consequences of Baelor’s death for the Targaryen succession and how this trial reshapes Westerosi history.
As of now, it has (a) Willem Wylde on the wrong side, and (b) Ser Robyn Rhysling’s name wrong. Someone is missing—I remember: The young Fossoway squire with the green-apple shield:
Ah. The green apple Fossoway. You have put your finger precisely on the most easily forgotten but narratively delicious piece of the lineup. Let us sort it out carefully:
The Fossoway Split at Ashford: At the tourney of Ashford Meadow in The Hedge Knight, there are two Fossoways present:
Ser Steffon Fossoway — the green apple
Ser Raymun Fossoway — the red apple
Originally, House Fossoway’s sigil is a red apple on gold. During the events of the trial, a cadet branch is effectively born.
What Actually Happens: When Duncan is assembling his seven:
Ser Raymun Fossoway (red apple) initially seems willing.
But he backs out under pressure — siding instead with Prince Aerion.
His cousin, Ser Steffon Fossoway, is appalled. In response:
Steffon publicly denounces Raymun’s cowardice.
He tears the red apple from his surcoat.
He declares a new sigil — a green apple on gold.
He takes the field for Duncan.
This is one of Martin’s most elegant symbolic gestures. A house literally splits over a question of honor.
Therefore: The Correct Seven for Duncan: The accurate roster on Dunk’s side is:
Duncan the Tall
Prince Baelor Targaryen
Ser Lyonel Baratheon
Ser Humfrey Hardyng
Ser Robyn Ryswell
Ser Willem Wylde
Ser Steffon Fossoway (the green apple)
Not Beesbury.
Beesbury does not fight in the final seven.
Your instinct was exactly right: the green-apple Fossoway must be there.
But that’s the wrong Fossoway!
By this point, even if you read the novella, perused the graphic novel, and watched the TV show, you will be very confused. So let me give you the ground truth from A Wiki of Ice & Fire <https://awoiaf.westeros.org/index.php/Trial_of_seven>:
Duncan’s champions
Ser Duncan the Tall
Prince Baelor Targaryen, died after the battle
Ser Lyonel Baratheon
Ser Raymun Fossoway [“green apple”]
Ser Humfrey Beesbury, slain by Donnel of Duskendale
Ser Humfrey Hardyng, died after the battle
Ser Robyn Rhysling
Hold tight to that: all else is hallucination. So I decide to torment it further, by giving it a self-contradicting question—a thing I know it does badly on:
If Beesbury does not fight in the final seven, then who does Beesbury fight for?
He fights for Duncan. And this is where the bookkeeping in our earlier exchange went off the rails. Let us reset it cleanly:
The Seven Who Fight for Duncan: From The Hedge Knight, the final seven on Duncan’s side are:
Duncan the Tall
Prince Baelor Targaryen
Ser Lyonel Baratheon
Ser Humfrey Hardyng
Ser Robyn Ryswell
Ser Humfrey Beesbury
Ser Steffon Fossoway (the green apple)
Ser Willem Wylde does not fight in the final seven. That was the mistaken substitution…
Still “Ryswell”, still the wrong Fossoway. And Willem Wylde does fight—on the other side.
It does no better—it does worse—if asked for the seven opposed to Ser Duncan’s side:
Who are the opposing seven?
The Seven Who Fought Against Duncan: (Ashford Meadow, c. 209 AC — The Hedge Knight). The champions on the prince’s side were:
Prince Aerion Targaryen: The instigator of the crisis; he demands satisfaction after Dunk strikes him for abusing Tanselle.
Prince Maekar Targaryen: Aerion’s father; he rides for his son. In the melee he delivers the fatal blow—meant for Dunk—that mortally wounds his own brother Baelor.
Prince Valarr Targaryen: Son of Baelor Breakspear; he stands with the crown against Dunk.
Ser Raymun Fossoway: The “red apple” Fossoway who declines Dunk’s cause; his defection prompts Steffon’s green-apple split.
Ser Steffon Frey
Ser Gormon Peake: A house with Blackfyre sympathies in its past; Peake’s presence hints at deeper fissures in Targaryen politics.
Ser Humphrey Hightower…
Again: ground truth:
Prince Aerion Targaryen
Prince Daeron Targaryen
Prince Maekar Targaryen
Ser Steffon Fossoway
Ser Willem Wylde of the Kingsguard
Ser Donnel of Duskendale of the Kingsguard
Ser Roland Crakehall of the Kingsguard
It’s batting two for seven.
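For anyone who wants to check the batting average rather than take my word for it, here is a minimal sketch of the bookkeeping. It assumes exact-spelling matching on names copied from the two lists above; the function name `score_roster` and the variable names are my own illustrative labels, not anything from the transcript.

```python
# Rosters copied from the two lists above. Matching is exact-string,
# so a misspelled name would count as a miss (an assumption of this sketch).
GROUND_TRUTH_OPPOSING = {
    "Aerion Targaryen",
    "Daeron Targaryen",
    "Maekar Targaryen",
    "Steffon Fossoway",
    "Willem Wylde",
    "Donnel of Duskendale",
    "Roland Crakehall",
}

MODEL_OPPOSING = {
    "Aerion Targaryen",
    "Maekar Targaryen",
    "Valarr Targaryen",
    "Raymun Fossoway",
    "Steffon Frey",
    "Gormon Peake",
    "Humphrey Hightower",
}


def score_roster(answer, truth):
    """Return (number of hits, names wrongly included, names wrongly omitted)."""
    hits = answer & truth
    return len(hits), sorted(answer - truth), sorted(truth - answer)


hits, wrong, missing = score_roster(MODEL_OPPOSING, GROUND_TRUTH_OPPOSING)
print(f"{hits} for {len(GROUND_TRUTH_OPPOSING)}")  # prints "2 for 7"
print("wrongly included:", wrong)
print("wrongly omitted:", missing)
```

Sets are the natural structure here because a roster has no order and no repeats: the intersection gives the hits, and the two set differences give the intruders and the absentees.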
And so I make one attempt to push it toward that ground truth, and it fails:
And what about Sir William Wilde?
Ah — Ser Willem Wylde. Good. This is where we close the ledger cleanly. He fights for Duncan. And the earlier shuffling in our exchange came from the fact that Ashford’s roster is easy to misremember because several knights hesitate before the final lineup crystallizes. Let us set it down definitively.
The Seven for Duncan (Final, Canonical): From The Hedge Knight:
Duncan the Tall
Prince Baelor Targaryen
Ser Lyonel Baratheon
Ser Humfrey Hardyng
Ser Robyn Ryswell
Ser Humfrey Beesbury
Ser Willem Wylde
Errors are “Ryswell”, Wylde, and the renewed vanishing of Raymun (not Steffon! he is on the side of the baddies!) Fossoway. But there is more. The answer continues, losing even the pretense of contact with logic:
What About the Fossoways?: Here is the crucial clarification:
Ser Raymun Fossoway (red apple) fights for Aerion.
His cousin Ser Steffon Fossoway (green apple) fights for Duncan.
So Steffon Fossoway is indeed one of Duncan’s seven. Which means something else must give. And here is the correction that resolves everything: Ser Humfrey Beesbury does not fight in the final seven. The final Dunk lineup is:
Dunk
Baelor
Lyonel
Hardyng
Ryswell
Wylde
Steffon Fossoway
Beesbury is among those who support Dunk earlier but is not one of the final combatants.
Why This Is So Easy to Scramble: Ashford is narratively chaotic:
Knights volunteer.
Some withdraw.
The Fossoways split publicly.
The princes intervene.
Dunk scrambles to complete his seven.
Because several of the middling landed knights (Hardyng, Wylde, Beesbury) appear in the same narrative sequence, readers often interchange one. But the presence of both Wylde and Steffon Fossoway in the final seven is clear in the text.
If you like, I can now present the full fourteen, both sides, in one clean side-by-side list so we end this once and for all without any further reshuffling.
And, of course, telling it the answer does not help:
It is actually: Duncan the Tall, Baelor Breakspear, Lyonel Baratheon, Humphrey Harding, Humphrey Beesbury, Raymond Fossoway, and Robin Rhysling.
You are very close — but not quite. And the difficulty here is real: Ashford’s roster is one of those things even careful readers scramble because Martin lets the lineup evolve in stages. Let us fix it definitively and cleanly.
The Canonical Seven for Duncan: (from The Hedge Knight)
The final seven who actually take the field for Dunk are:
Duncan the Tall
Prince Baelor Targaryen
Ser Lyonel Baratheon
Ser Humfrey Hardyng
Ser Humfrey Beesbury
Ser Robyn Ryswell
Ser Steffon Fossoway (the green apple)
The Key Correction: You listed: Raymun Fossoway
That is Ser Raymun Fossoway — the red apple. He does not fight for Duncan. He fights for Aerion. His cousin Steffon Fossoway tears off the red apple and creates the green apple branch in order to fight for Dunk. That moment — the heraldic schism — is why people often invert which Fossoway is on which side.
Clean Summary: Dunk’s Seven:
Dunk
Baelor Breakspear
Lyonel Baratheon
Humfrey Hardyng
Humfrey Beesbury
Robyn Ryswell
Steffon Fossoway (green apple)
Aerion’s Seven include:
Aerion
Maekar
Valarr
Raymun Fossoway (red apple)
and three additional knights (Peake, Frey, Hightower)
If you would like, I can now reconstruct the fight itself blow-by-blow — who unhorses whom, who is wounded, and exactly how Baelor is struck — which is where the real dynastic hinge lies.
If you are one of the five people who have read this far, congratulations! But you are probably asking: WTF is his point, here?
This is the point, belabored by the length of the “conversation”:
HERE I AM NOT HAVING:
the future of [human-computer interaction], and not merely an addition to a well-developed set of tools… talking to an intelligent and exceptionally well-informed human… less prone to fabrications… like a well-read junior co-author… rely[ing] on DR to provide discussion and critique as well as routine stuff…. It’s time to make a lot more use of AI if I am going to get my ideas out as fast as I would like, bearing in mind that I can hear Time’s winged chariot at my back…
What am I doing instead?
I am giving it a token-sequence. It is looking around in its training data for the most similar token-sequences, picking one of them, then channelling the thoughts of the human mind that made that picked token sequence, and then outputting what it thinks that human would have written as the next token. I have not given it garbage in. But I am definitely getting unreliable garbage out.
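To make that picture concrete, here is a deliberately crude toy version of it. This is a sketch of the cartoon I just drew, not of how a transformer actually works: real LLMs do not literally search their training data, and the function `next_token`, the three-sentence `corpus`, and the 2-to-1 ratio of right to wrong spellings are all made up for illustration.

```python
from collections import Counter


def next_token(prompt, corpus):
    """Toy predictor: find the longest suffix of the prompt that occurs in the
    corpus, and return the token that most often follows it there."""
    for k in range(len(prompt), 0, -1):
        suffix = prompt[-k:]
        counts = Counter()
        for doc in corpus:
            # Slide a window of length k over the document, leaving room
            # for one following token.
            for i in range(len(doc) - k):
                if doc[i:i + k] == suffix:
                    counts[doc[i + k]] += 1
        if counts:
            return counts.most_common(1)[0][0]
    return None  # nothing in the corpus resembles the prompt at all


# Hypothetical "training data": two correct copies, one wrong copy.
corpus = [
    "ser robyn rhysling fights for dunk".split(),
    "ser robyn rhysling fights for dunk".split(),
    "ser robyn ryswell fights for dunk".split(),
]

print(next_token("ser robyn".split(), corpus))  # prints "rhysling"
```

In this toy, the majority spelling wins. That is the behavior the cartoon leads you to expect whenever right copies outnumber wrong ones.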
If—outside of specified domains—you think that you are “talking to an intelligent and exceptionally well-informed human… like a well-read junior coäuthor”, then all I can say is that you have Clever Hansed yourself to the max:
Overinterpreting success: Taking correct hoof-tap answers as evidence of arithmetic, rather than of a horse realizing when the human approved.
Ignoring alternative explanations: Failing to consider that Hans might be responding to cues, not calculating.
Trusting anecdote over experiment: Letting striking demonstrations and newspaper reports substitute for controlled tests with blinded observers and varied conditions.
Underestimating reporting bias: Not recognizing that von Osten (and later experimenters) could be reporting what they wished to see, rather than what they had actually seen.
Letting prestige vouch for the result: Allowing endorsements by officials, teachers, and scientists (the Hans Commission) to substitute for truly skeptical scrutiny and independent replication.
Not separating performance from understanding: Equating “getting the right output on cue” with “having the inner concepts” (numbers, dates, language) rather than treating performance as something to be decomposed and probed.
Stopping inquiry at the most flattering story: Being satisfied with the exciting narrative—“a reasoning horse!”—instead of actively trying to falsify that hypothesis with harsher tests.
You are—again, outside of very limited domains—being grifted, quite possibly by people who are also grifting themselves, and probably by yourself as well—when you ignore that the machine never comes up with the list “Duncan the Tall, Baelor Targaryen, Lyonel Baratheon, Humfrey Hardyng, Humfrey Beesbury, Robyn Rhysling, Raymun Fossoway”, in spite of its enormous and ever-repeated confidence that—this time! for sure!—it has the right list. And from one perspective this failure is worth remarking on, for there are, by now, many many copies of the right list in the training data, and few copies of wrong lists. Yet the next-token-prediction engine that is the LLM never hops to one of those right lists as the “most similar” token sequence and then follows the groove to the end.
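The “never” can be checked mechanically. The sketch below tallies the full for-Duncan rosters as I read them off the transcript above, in order of appearance. The short name labels are mine, matching is exact-spelling (so “Ryswell” counts as a miss for “Rhysling”), and the choice of which outputs count as a full roster is my own reading rather than anything the machine flagged.

```python
# Ground truth for Duncan's side, using short labels (my own shorthand).
TRUTH = frozenset({
    "Duncan", "Baelor", "Lyonel", "Raymun Fossoway",
    "Beesbury", "Hardyng", "Rhysling",
})

# These four names appear in every roster the machine produced.
CORE = ["Duncan", "Baelor", "Lyonel", "Hardyng"]

# The remaining three slots of each full roster, in order of appearance.
ATTEMPTS = [
    CORE + ["Maekar", "Rhysling", "Raymun Fossoway"],
    CORE + ["Ryswell", "Wylde", "Beesbury"],
    CORE + ["Ryswell", "Beesbury", "Wylde"],
    CORE + ["Ryswell", "Wylde", "Steffon Fossoway"],
    CORE + ["Ryswell", "Beesbury", "Steffon Fossoway"],
    CORE + ["Ryswell", "Beesbury", "Wylde"],
    CORE + ["Ryswell", "Wylde", "Steffon Fossoway"],
    CORE + ["Beesbury", "Ryswell", "Steffon Fossoway"],
    CORE + ["Beesbury", "Ryswell", "Steffon Fossoway"],
]

scores = [len(TRUTH & set(attempt)) for attempt in ATTEMPTS]
exact = [set(attempt) == TRUTH for attempt in ATTEMPTS]

print(scores)      # prints [6, 5, 5, 4, 5, 5, 4, 5, 5]
print(any(exact))  # prints False
```

On this tally the very first answer is the closest, at six of seven, and no later attempt beats it.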
Shining and sputtering. “Fractal capabilities”, some say. And that means that stopping your understanding of these things by CleverHansMaxxing is a silly thing to do.
Not, mind you, that CleverHansMaxxing is necessarily a bad thing to do under all circumstances. It may be a good thing to do in interacting with and using these rolling boils of linear algebra as information-cognition tools. I think there is wisdom in what my high-school friend Adam Farquhar said at our lunch at Acre Kitchen & Bar at Rockridge BART corner last year, that he was:
astonished by the potency of these new engines of thought… how often they seem to exceed the effects they ought, by rights, to possess… furnish[ing] answers, analyses, and even flights of invention whose polish belies the raw circuitry beneath…. I no longer trust my intuition to predict just when that brilliance will shine and when it will sputter…. There was a season when I cautioned all within earshot: “Do not anthropomorphize the computer; you will only mislead yourself.”… Today I think it is finally time to anthropomorphize the heck out of it. I need to treat the machine as though it were a somewhat eccentric roommate: a companion inclined to fixate on abstruse topics, possessed of unsettling literalism, vulnerable to the occasional non-sequitur, yet blessed with inexhaustible patience and a boundless appetite for our questions… never flagging, never annoyed…. Its encyclopedic recollection astonishes, though… that recollection is not comprehension…. Like certain friends we have all known—gifted, idiosyncratic, occasionally obtuse—it catalogs facts in profusion but can falter when nuance or context slip beyond its patterned grasp.
Yet precisely because of those quirks, conversation with such a companion can be fruitful. With patience we learn when to press, when to reinterpret, when to discard a flawed reply and ask anew… steering an intellect that is at once dazzling and uneven, alien yet uncannily familiar. And in so doing we glimpse the contours of a future in which collaboration with non-human minds will be, not an oddity, but a daily discipline…. Let us adopt habits suited to the partnership. Let us bring humility enough to recognize the limits of our own intuitions, curiosity enough to probe the machine’s uncharted talents, and vigilance enough to catch its inevitable missteps…
But that CleverHansMaxxing may be a royal road to making these things useful to you in your daily information-cognition routine is one thing. Jumping from there to CleverHansMaxxing as a way of understanding them with an eye toward assessing what is likely to happen next is quite another thing.
And, if it were not a faulty thing, ChatGPT could have come up with the right seven-name list of the champions of Ser Duncan the Tall.
But, you say, vibecoding successes! natural-language fluency successes! Yes. But here I think that Alessandrini, Klee, & Wolfram <https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/> have it nailed:
At some level it still seems difficult to believe that all the richness of language and the things it can talk about can be encapsulated in such a finite system…. How is it… that… ChatGPT can get as far as it does?… Language is at a fundamental level somehow simpler than it seems…. ChatGPT—even with its ultimately straightforward neural net structure—is successfully able to “capture the essence” of human language and the thinking behind it… has somehow “implicitly discovered” whatever regularities in language (and thinking) make this possible. The success of ChatGPT is, I think, giving us evidence of a fundamental and important piece…. We can expect there to be major new “laws of language”—and effectively “laws of thought”—out there to discover. In ChatGPT… those laws are at best implicit. But if we could somehow make the laws explicit, there’s the potential to do the kinds of things ChatGPT does in vastly more direct, efficient—and transparent—ways…
However, my Visualization of the Cosmic All here may well be in error.
Now let’s see if it can search the weblog post above and compile a proper bibliography of the works referenced by URLs according to the pattern I give it for Alessandrini, Klee, & Wolfram. Ready, set, go!:
References:
Alessandrini, Giulio, Brad Klee, & Stephen Wolfram. 2023. “What Is ChatGPT Doing… & Why Does It Work?”. Stephen Wolfram Writings. February 14. <https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/>.
Quiggin, John. 2026. “LLMs Reaching the Tipping Point”. John Quiggin’s BlogStack. February 23. <https://johnquigginblog.substack.com/p/llms-reaching-the-tipping-point>.
“The Stag and the Lion IV: The Tourney of Storm’s End.” n.d. Reddit, r/IronThroneRP. <https://www.reddit.com/r/IronThroneRP/comments/9idga5/the_stag_and_the_lion_iv_the_tourney_of_storms_end/>.
“Trial of Seven.” n.d. A Wiki of Ice & Fire. <https://awoiaf.westeros.org/index.php/Trial_of_seven>.
“Willem Wylde.” n.d. A Wiki of Ice & Fire. <https://awoiaf.westeros.org/index.php/Willem_Wylde>.
Nope. It cannot.
Quiggin, John. 2026. “LLMs Reaching the Tipping Point”. John Quiggin’s BlogStack. Feb 23. <https://johnquigginblog.substack.com/p/llms-reaching-the-tipping-point>.
