CROSSPOST: COSMA SHALIZI: Aware of All Internet Traditions: Large Language Models as Information Retrieval & Synthesis

The real superintelligence is the House of Intellect we built since -3000; LLMs simply give us a dangerously seductive key to its front door, for LLMs are merely (“merely”!) high-speed front‑ends to five millennia of human formula, habit, and tradition. Once you see them as such, both their miracles and their dangers come more… into focus.

Thus this is, I think, very very good:

Cosma Shalizi draws on Jacques Barzun and his concept of The House of Intellect for what I have been calling The Real ASI—the Anthology Super-Intelligence of the Collective Human Mind since -3000.
For it was back then that the coming of writing and calculating made us truly a species-wide time- and space-binding single anthology intelligence

Share DeLong's Grasping Reality: Economy in the 2000s & Before

<https://bactra.org/research/2026-03-10.html>

Aware of All Internet Traditions: Large Language Models as Information Retrieval and Synthesis

Cosma Rohilla Shalizi (CMU + SFI)
10 March 2026, Cultural AI Workshop

In conclusion

GenAI is information retrieval and synthesis
GenAI generates formulaically
Human culture is also pretty formulaic, and that is OK
Formulas are traditional
∴ GenAI is mechanized/prosthetic tradition

Give a gift subscription

1. GenAI is information retrieval

With the right tools + access, we can quantify the influence of each training document on every response

From Grosse et al. (2023):

(= Arthur C. Clarke, 2010: Odyssey Two [1984])

Undergrads can grasp the math, the real work of Grosse et al. (2023) was doing it efficiently and at scale in the mess of real data

& synthesis

GenAI doesn’t just repeat its training corpus
- Of course there’s stuff it memorizes (Carlini et al. 2020)
- But it can’t memorize everything and it produces stuff it could never have memorized

etc., etc., for several more pages.

Get 75% off a group subscription

2. GenAI generates formulaically

GenAI can generate because it has learned formulas
- Also tropes, stereotypes, templates, conventions, genres, all sorts of recurring patterns in the symbol-stream

This is all wrong about me
This is 100% right about a lot of people like me

I didn’t say “tokenizer”
Most R tokenizers do use strsplit(), but not this one

Refer a friend

3. Much of human culture is formulaic

Formulas for oral epic

Formulas for scientific papers

Formulas for story plots

4. Formulas are traditions

Following a tradition means not having to think for oneself
- This has real evolutionary advantages (Simon 1990)
Following a tradition means not having to think for oneself: Barzun (1959)

Intellect is the capitalized and communal form of live intelligence; it is intelligence stored up and made into habits of discipline, signs and symbols of meaning, chains of reasoning and spurs to emotion — a shorthand and a wireless by which the mind can skip connectives, recognize ability, and communicate truth. Intellect is at once a body of common knowledge and the channels through which the right particle of it can be brought to bear quickly, without the effort of redemonstration, on the matter in hand.
Intellect is community property and can be handed down. We all know what we mean by an intellectual tradition, localized here or there; but we do not speak of a “tradition of intelligence,” for intelligence sprouts where it will…. And though Intellect neither implies nor precludes intelligence, two of its uses are — to make up for the lack of intelligence and to amplify the force of it by giving it quick recognition and apt embodiment.
For intelligence wherever found is an individual and private possession; it dies with the owner unless he embodies it in more or less lasting form. Intellect is on the contrary a product of social effort and an acquirement…. Intellect is an institution; it stands up as it were by itself, apart from the possessors of intelligence, even though they alone could rebuild it if it should be destroyed….
The distinction becomes unmistakable if one thinks of the alphabet — a product of successive acts of intelligence which, when completed, turned into one of the indispensable furnishings of the House of Intellect.

Humans internalize traditions by immersion and practice
- Bottle-washing, canvas-stretching, etc. etc.
- We have (probably) evolved to do this (Herrmann et al. 2007)
Human transmission of tradition is always selective
- There is always too much to pass on, only what continues to be relevant makes it (Hodgson 1974; Morin 2016)

Not “geniuses in a data-center” (intelligence), but an all-access pass to the House of Intellect
Not “geniuses in a data-center” (intelligence), but an all-access pass to the House of Intellect
Or at least the part of that House available online
- “I am aware of all Internet traditions”
- Sucks to be a small and/or unwritten language / off-line / secretive / old / genuinely novel
Or at least the external forms of online traditions, without the inner structures / human significance
- Sometimes this might be fine?
- This might be all some traditions have (Boyer 1990)
- This might be a shambles (Vafa et al. 2025)
Consequences for transmission: ??!?!?

“What has concluded that we might conclude in regard to it?”

GenAI is not original, creative, problem-solving intelligence
It is mechanized intellect, prosthetic access to the external formulas of many but not all traditions
This is incredible, and perhaps a disaster

I’m glad you asked that!

Making this sound like Capital

Fetishization: LLM users think they are interacting with an artificial intelligence, but really it is a social relation (to all the authors of the source texts)
Intelligence:Intellect :: Labor:Capital :: Living labor:Dead labor
It is no accident, comrades, that Barzun wrote “Intellect is the capitalized … form of live intelligence”
Working in citations to Vygotsky (1986), Vygotsky (1978), Luria (1976) left as an exercise for the reader

Patterns & Transformers

Transformers are higher-order Markov chains, but still finite-order Markov chains (Zekri et al. 2024)
∴ there are many patterns they cannot learn exactly (Chomsky 1956)
But they can and do learn approximations, often short-cuts which work very badly out-of-distribution (Liu et al. 2023; Zhang et al. 2024)

Share DeLong's Grasping Reality: Economy in the 2000s & Before

References

Barzun, Jacques. 1959. The House of Intellect. New York: Harper.
Boyer, Pascal. 1990. Tradition as Truth & Communication: A Cognitive Description of Traditional Discourse. Cambridge, England: Cambridge University Press. https://doi.org/10.1017/CBO9780511521058.
Carlini, Nicholas, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, & al. 2020. “Extracting Training Data from Large Language Models.” arxiv:2012.07805. http://arxiv.org/abs/2012.07805.
Chomsky, Noam. 1956. “Three Models for the Description of Language.” IRE Transactions on Information Theory 2:113–24. https://doi.org/10.1109/TIT.1956.1056813.
Grosse, Roger, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, & al. 2023. “Studying Large Language Model Generalization with Influence Functions.” E-print, arxiv:2308.03296. http://arxiv.org/abs/2308.03296.
Harris, Zellig. 1988. Language and Information. New York: Columbia University Press.
———. 2002. “The Structure of Science Information.” Journal of Biomedical Informatics 35:215–21. https://doi.org/10.1016/S1532-0464(03)00011-X.
Herrmann, Esther, Josep Call, María Victoria Hernàndez-Lloreda, Brian Hare, & Michael Tomasello. 2007. “Humans Have Evolved Specialized Skills of Social Cognition: The Cultural Intelligence Hypothesis.” Science 317:1360–6. https://doi.org/10.1126/science.1146282.
Hodgson, Marshall G. S. 1974. The Venture of Islam: Conscience & History in a World Civilization. Chicago: University of Chicago Press.
Liu, Bingbin, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, & Cyril Zhang. 2023. “Transformers Learn Shortcuts to Automata.” In The Eleventh International Conference on Learning Representations [ICLR 2023]. https://arxiv.org/abs/2210.10749.
Lord, Albert B. 1960. The Singer of Tales. Cambridge, Massachusetts: Harvard University Press.
Luria, A. R. 1976. Cognitive Development: Its Cultural & Social Foundations. Cambridge, Massachusetts: Harvard University Press.
Morin, Olivier. 2016. How Traditions Live & Die. Oxford: Oxford University Press.
Propp, Vladimir. 1968. The Morphology of the Folktale. Second. Austin: University of Texas Press. https://doi.org/10.7560/783911.
Simon, Herbert A. 1990. “A Mechanism for Social Selection & Successful Altruism.” Science 250:1665–8. https://doi.org/10.1126/science.2270480.
Vafa, Keyon, Peter G. Chang, Ashesh Rambachan, & Sendhil Mullainathan. 2025. “What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models.” http://arxiv.org/abs/2507.06952.
Vygotsky, L. S. 1978. Mind in Society: The Development of Higher Psychological Processes. Cambridge, Massachusetts: Harvard University Press.
———. 1986. Thought & Language. Cambridge, Massachusetts: MIT Press.
Zekri, Oussama, Ambroise Odonnat, Abdelhakim Benechehab, Linus Bleistein, Nicolas Boullé, & Ievgen Redko. 2024. “Large Language Models as Markov Chains.” E-print, arxiv:2410.02724. http://arxiv.org/abs/2410.02724.
Zhang, Dylan, Curt Tigges, Zory Zhang, Stella Biderman, & Talia Ringer Maxim Raginsky. 2024. “Transformer-Based Models Are Not yet Perfect at Learning to Emulate Structural Recursion.” E-print, arxiv:2401.12947. https://arxiv.org/abs/2401.12947.

2026 03 10 Shalizi Slides Aware Of All Internet Traditions

3.11MB ∙ PDF file

Download

Give a gift subscription

And I do have a number of things that I would have added, or spent more time stressing. Eight, in fact:

Track‑switching pseudo‑thought: As a MAMLM generates text, it is constantly re‑evaluating which region of its training space is “closest” to the conversation so far and jumping between them, with the apparent stream of “reasoning” consisting of a series of track switches between different human conversations, the logic of each of which they are transitorily pantomiming by mimicking the language.
Tradition as prosthetic cognition & LLMs as more than a single tradition: powerful, powerful ways of avoiding thinking everything through from scratch. MAMLMs let you plug into multiple traditions at once, retrieve their characteristic moves on demand, and remix them without ever fully internalizing them…
Giant-selecting vs. shoulder-standing: often the hardest cognitive problem is choosing which prior thinker, tradition, or model to adopt in the first place, and here MAMLMs have a structural edge given their ability to do Clever Hansing millions of times per second…
Intelligence as anthology navigation: The real superintelligence is the five‑millennia anthology of written human thought to which MAMLMs are front‑ends, and powerful front-ends to the most relevant and valuable slice of the archive—if you give them the bread-crumb trail to figure that out…
Stochastic parrotage most of the way down: that is much of human “original” cognition, and,nembarrassingly, MAMLMs do the same thing mechanically and at scale…
Mechanized intellect vs. living intelligence: These systems are best understood as mechanized intellect (stored, formulaic, externalized patterns of reasoning) rather than living intelligence (situated, goal‑directed problem‑solving). They automate access to the frozen patterns of past thought, not the inner experience or purposes that produced those patterns.
Kernel smoothing mirabile dictu: If you can get this much apparent “understanding” out of something that is, at base, clever kernel smoothing over past text, that tells you as much about the regularity and redundancy of human culture as it does about the cleverness of the engineering.
Fetishism, not sentience, the real risk: treating mechanized intellect as autonomous intelligence obscures the underlying social relation: mediated access to, and re‑packaging of, the labor and thought of past human authors, with all the power, ownership, and distributional questions that entails.

If reading this gets you Value Above Replacement, then become a free subscriber to this newsletter. And forward it! And if your VAR from this newsletter is in the three digits or more each year, please become a paid subscriber! I am trying to make you readers—and myself—smarter. Please tell me if I succeed, or how I fail…

##crosspost-cosma-shalizi-aware-of-all-internet-traditions-large-language-models-as-information-retrieval-synthesis
##subturingbradbot
##crosspost
#aware-of-all-internet-traditions
#large-language-models-as-information-retrieval-synthesis
#cosma-shalizi
#aware-of-all-internet-traditions-large-language-models-as-information-retrieval-synthesis
#house-of-intellect
#anthology-super-intelligence
#asi
#stochastic-parrots
#cultural-ai

James Bradford DeLong

CROSSPOST: COSMA SHALIZI: Aware of All Internet Traditions: Large Language Models as Information Retrieval & Synthesis

Aware of All Internet Traditions: Large Language Models as Information Retrieval and Synthesis

In conclusion

1. GenAI is information retrieval

& synthesis

2. GenAI generates formulaically

3. Much of human culture is formulaic

4. Formulas are traditions

5. GenAI is the mechanization of tradition

“What has concluded that we might conclude in regard to it?”

I’m glad you asked that!

Making this sound like Capital

Patterns & Transformers

References