Nvidia’s Moment
Up through the beginning of Nvidia’s fiscal year 2023, revenue had grown 2.5-fold in two years, driven by the crypto bubble: chips Nvidia had designed for graphics turned out to be the best processors on which to run costly, dissipative, and socially destructive crypto mining.
Then came the crypto winter. Not only did crypto firms stop buying Nvidia-designed chips, they began ditching the chips they already had and selling them back into the graphics market. It was turning out that the crypto bubble had not boosted demand for Nvidia-designed chips so much as pulled it forward in time, by perhaps a year and a half.
By late in its fiscal year 2023, Nvidia looked to be in real trouble, never mind that its chip designers were best-in-class and that its CUDA framework was clearly the best way for humans to figure out how to get the stone to actually think.
And then came Midjourney and ChatGPT…
The big questions now are three:
How fast can (or maybe will, given a lot of short-run monopoly power here) Nvidia (and TSMC) ramp up production of Nvidia-designed chips to meet demand?
Can competitors that lack CUDA nevertheless be good enough to provide an alternative for the people pushing modern ML forward?
How large will the modern ML market segment turn out to be?
I do not have any answers to these questions. And, as Friedrich von Hayek pointed out long ago, they are not the kinds of questions that anyone has enough information to answer in advance. We have to let the market grope its way forward to solutions.
But we do know the bull case for Nvidia and its ecosystem. Here is Ben Thompson summarizing Nvidia CEO Jensen Huang from last May:
Ben Thompson: Nvidia Earnings, Jensen Huang’s Defense, Training Versus Inference: ‘“The CPU[’s] ability to scale in performance has ended and we need a new approach. Accelerated computing is full stack….” This is correct…. [To] run one job across multiple GPUs… means abstracting complexity up into the software…. “It’s a data-center scale problem….” This is the hardware manifestation of the previous point…. “Accelerated computing is… domain-specific… computational biology, and… computational fluid dynamics are fundamentally different….” This is the software manifestation of the same point: writing parallelizable software is very difficult, and… Nvidia has already done a lot of the work…. “After three decades, we realize now that we’re at a tipping point….” You need a virtuous cycle between developers and end users, and it’s a bit of a chicken-and-egg problem to get started. Huang argues Nvidia has already accomplished that, and I agree with him that this is a big deal….
Huang then gave the example of how much it would cost to train a large language model with CPUs… $10 million dollars for 960 CPU servers consuming 11 GWh for 1 large language model… $400k… [2 GPU servers] 0.13 GWh…. The question, though, is just how many of those single-threaded loads that run on CPUs today can actually be parallelized…. Everyone is annoyed at their Nvidia dependency when it comes to GPUs…. Huang’s argument is that given the importance of software in effectively leveraging those GPUs, it is worth the cost to be a part of the same ecosystem as everyone else, because you will benefit from more improvements instead of having to make them all yourself….
Huang’s arguments are good ones, which is why I thought it was worth laying them all out; the question, though, is just how many loads look like LLM training?… Nvidia is very fortunate that the generative AI moment happened now, when its chips are clearly superior…. The number of CUDA developers and applications built on Nvidia frameworks are exploding…. There are real skills and models and frameworks being developed now that, I suspect, are more differentiated than… the operating system for Sun Microsystems was…. Long-term differentiation in tech is always rooted in software and ecosystems, even if long-term value-capture has often been rooted in hardware; what makes Nvidia so compelling is that they, like Apple, combine both…
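Taking Huang’s figures at face value, the implied advantage is straightforward arithmetic; the only numbers below are the ones Thompson quotes above:

$$
\frac{\$10{,}000{,}000}{\$400{,}000} = 25\times \;\text{(cost)}
\qquad
\frac{11\ \text{GWh}}{0.13\ \text{GWh}} \approx 85\times \;\text{(energy)}
$$

Twenty-five times cheaper and roughly eighty-five times less energy per trained model: that is the bull case in two numbers. Thompson’s caveat still stands, though: those ratios apply only to workloads that can actually be parallelized.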
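And to make concrete what Thompson means by “writing parallelizable software,” here is a minimal, purely illustrative CUDA sketch of the programming model the whole ecosystem is built on. Nothing in it comes from Thompson or Huang, and the names (vector_add, the array sizes) are my own; the point is simply that you express the work as thousands of independent threads and let the framework and hardware scale it across the GPU:

```cuda
// Illustrative only: the simplest possible CUDA program, adding two vectors
// with one GPU thread per element. Real workloads rarely decompose this cleanly,
// which is Huang's point about Nvidia having "already done a lot of the work"
// in domain-specific libraries.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index for this thread
    if (i < n) c[i] = a[i] + b[i];                  // one element per thread, all in parallel
}

int main() {
    const int n = 1 << 20;                          // ~1 million elements
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);                   // unified memory visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads; // enough blocks to cover every element
    vector_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();                        // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);                    // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The value of the CUDA ecosystem lies less in hand-written kernels like this one than in the libraries layered on top of it (cuBLAS, cuDNN, and the rest), which is exactly the “already done a lot of the work” point, and the moat that competitors without CUDA have to cross.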