ScratchPad: 2024-05-27 Mo: The Doubtful Utility for General or Even Specific Search of GPT LLMs...
A scratchpad…
MAMLMs: The very sharp Henry Farrell has a nice take on Google’s current headlong charge forward into the unsurveyed and unscouted ground of using GPT LLMs as a natural-language interface to search. I share his belief that such an interface will flop when applied to low-quality data but might prove very useful in summarizing the central tendencies of high-quality data.
But, then, this gives me considerable pause:
I asked ChatGPT4o for a Chicago Manual citation for the paper at the url <https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.12.4.151>. Its reply? This: “Samuelson, Paul A. 1998. ‘The Historical Background of the Communist Manifesto’. Journal of Economic Perspectives 12 (4): 151-174. Accessed May 25, 2024. <https://web.archive.org/web/*/https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.12.4.151>. Please let me know if you need any further assistance!” The “pp. 151-174” is right. The “12 (4):” is right. For some reason or other “(Fall)” is omitted. The “Journal of Economic Perspectives” is right. The “‘The Historical Background of the Communist Manifesto’” is right. The “1998” is right. But the “Samuelson, Paul A.” is wrong: the author is “Boyer, George R.”
You can understand how a GPT LLM might do this: Paul A. Samuelson published a lot of economics papers and appears in a vastly larger share of the training corpus than does George R. Boyer. But that means that even restricting the training corpus and the RAG context to the highest-quality data possible is not going to fix things:
Henry Farrell: ‘Large language models are engines of summarization…potentially valuable where [(a)] you… want… summarizations, (b) the data… is fairly high quality, (c) you are prepared to tolerate… slop, and (d) you aren’t too worried if… outputs… drift towards [the] central… training corpus…. The “certain amount of slop” could end up being a real problem… even when the slop isn’t that much worse in absolute terms… than Google Original Flavor. You can tolerate slop as long as people don’t directly attribute it to you…. The outliers and soon-forthcoming New-Googlebombs that we’re focusing on right now are probably less consequential than the models’ convergence on central cultural tendencies…’ <https://www.programmablemutter.com/p/google-ai-fails-the-taste-test>