The Best Things I Have Found on the Unreasonable Effectiveness of Neural Networks
It may simply be that I am in a very unusual position with respect to what I know & what I don't, but I found these three videos from Welch Labs to be incredibly enlightening about the unreasonable effectiveness of neural-network models…
Folding space: how back-propagation and ReLUs can actually learn to fit pieces of the world. Intuition is actually possible! Geometrically, neural nets work not by magic but by folding planes into shapes, again and again, at huge scale, in extraordinarily high numbers of dimensions. Adding up innumerable such shapes composes simple bends into complex functions, and back-propagation finds the functions that fit faster than it has any right to.
But our geometrical intuition is limited. Our low-dimensional brains misread the danger that a model might get “stuck” in a bad local minimum of the loss function, not knowing which way to move to get to a better result. In large numbers of dimensions, when you have hundreds of thousands of parameters or more to adjust, what emerges looks to us low-dimensional visualizers like “wormholes.” As gradient descent proceeds, the proper shift in the slice you see reveals nearby, better valleys in the loss function down which the model can move.
Your mileage may, and probably will, vary. But these visual intuitions click for me. And I can at least believe, even if not see, how things change when our vector spaces shift from three-dimensional to million-dimensional ones, in which almost all vectors chosen at random are very close to being at right angles to each other.
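To make that last claim concrete, here is a minimal sketch of my own (not from the videos; the dimensions and sample counts are arbitrary choices to keep it fast): draw pairs of random vectors and measure the angle between them. As the dimension grows, the typical angle concentrates ever more tightly around 90 degrees.

```python
# Minimal sketch (mine, not from the videos): the angle between two random
# vectors concentrates near 90 degrees as the dimension grows, because the
# typical cosine similarity shrinks like 1/sqrt(d).
import numpy as np

rng = np.random.default_rng(0)

for d in (3, 1_000, 100_000):
    # 100 pairs of standard-normal vectors in dimension d.
    a = rng.standard_normal((100, d))
    b = rng.standard_normal((100, d))
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    angles = np.degrees(np.arccos(cos))
    print(f"d = {d:>7,}: mean angle {angles.mean():6.2f} deg, "
          f"std {angles.std():5.2f} deg")

# At d = 3 the angles scatter widely around 90 degrees; at d = 100,000 they
# all sit within a fraction of a degree of a right angle.
```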
From Welch Labs: <https://www.youtube.com/@WelchLabsVideo/videos>. The first of these three “How Models Learn” videos (which was the third to be made) is the one I found most illuminating.
As I understand its major points:
The idea is to classify which locations are in Belgium and which are in Holland.
You do this by constructing a 3-D surface, in which the x-axis is longitude, the y-axis is latitude, and the z-axis is your confidence that the location is in Holland.
Anything you can do with an n-layer neural network you could do with a (much larger) single-hidden-layer network.
Each node can be thought of as (a) taking a plane, (b) making a fold in it, (c) bending the fold up to a greater or lesser degree, and then (d) shifting the resulting shape up or down.
Your final shape is the sum of all of these bent, folded, and shifted planes (see the sketch just after this list).
Stacking such layers composes many folds into a very rich piecewise-linear geometry.
And so you can see how, with enough nodes, extraordinary flexibility is possible even with a single hidden layer.
But existence ≠ trainability: even with 100,000 neurons in one hidden layer, gradient descent may fail to find a good fit, or the network may simply need far too many nodes to be practical.
Optimization beats existence: the universal-approximation property of a single hidden layer doesn’t get you to where the rubber meets the road.
Depth of the network compounds expressivity: repeatedly folding, scaling, and combining surfaces yields far more complex tilings of input space than a single wide layer.
Back-propagation has geometry: gradients shift fold lines and surface heights; learning is moving joints and planes to reduce loss.
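Here is the sketch promised above: a minimal numpy toy of my own (not the Welch Labs code). The “border” between Belgium and Holland is an invented zig-zag on the unit square, the network is one hidden layer of ReLU units, and plain gradient descent with hand-written back-propagation does the fitting. Each hidden unit is relu(w1·x + w2·y + b): a plane folded along a line, bent up on one side; the output is the shifted sum of those folded planes.

```python
# Minimal toy sketch (mine, not the Welch Labs code): one hidden layer of
# ReLU "folds" fit to an invented "is this point in Holland?" labeling.
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: points in the unit square, labeled 1 ("Holland") above a
# zig-zag border, 0 ("Belgium") below it.
X = rng.uniform(0.0, 1.0, size=(2000, 2))        # columns: longitude, latitude
border = 0.5 + 0.15 * np.sin(6.0 * X[:, 0])
y = (X[:, 1] > border).astype(float)

H = 32                                           # number of folded planes
W1 = rng.normal(0.0, 1.0, size=(2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 1.0, size=(H, 1)); b2 = np.zeros(1)

def forward(X):
    pre = X @ W1 + b1            # each column: a tilted plane over the map
    hid = np.maximum(pre, 0.0)   # the fold: flat on one side, a ramp on the other
    out = hid @ W2 + b2          # bend up or down, shift, and add
    return pre, hid, out[:, 0]

lr = 1.0
for step in range(5000):
    pre, hid, out = forward(X)
    p = 1.0 / (1.0 + np.exp(-out))        # squash the surface into a confidence
    grad_out = (p - y)[:, None] / len(X)  # cross-entropy gradient at the output
    # Back-propagation: gradients shift surface heights (W2, b2) and fold
    # lines (W1, b1) so the summed surface better matches the labels.
    gW2 = hid.T @ grad_out
    gb2 = grad_out.sum(axis=0)
    grad_pre = (grad_out @ W2.T) * (pre > 0)   # gradient flows only through the bent side
    gW1 = X.T @ grad_pre
    gb1 = grad_pre.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, _, out = forward(X)
accuracy = ((1.0 / (1.0 + np.exp(-out)) > 0.5) == (y > 0.5)).mean()
print(f"training accuracy with {H} folded planes: {accuracy:.3f}")
```

The exact accuracy will vary with the seed and the number of hidden units; the point is only that the gradient updates move fold lines and heights until the summed surface tracks the border.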
And here is the video:
<https://www.youtube.com/watch?v=qx7hirqgfuU&list=FLupRdJE0AjQUa3Ab-fo88NQ&index=13>
<https://www.youtube.com/watch?v=VkHfRKewkWw&t=1s>
From the second video, the one just above, I learned less that I could grasp and visualize, but still a lot. Notably:
Backpropagation is the workhorse of modern AI: a simple, scalable rule that updates millions to billions of parameters efficiently.
With two inputs (latitude/longitude), neurons become planes; the model learns which plane sits “on top” per region.
Simple linear models can’t carve intricate borders; they need depth and activation functions to capture complex partitions (see the sketch after this list).
The Belgium–Netherlands enclave map illustrates why naive linear boundaries fail and why architecture matters.
History lesson: early skepticism vastly underestimated how far this mathematically modest method could scale with data and compute.
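And here is a matching toy sketch of my own (again, not from the video) of why a single linear boundary cannot cope with an enclave: label points 1 inside a disc and 0 outside, fit plain logistic regression by gradient descent, and watch accuracy stay near chance no matter how long it trains.

```python
# Minimal sketch (mine, not from the video): a single linear decision
# boundary cannot carve out an enclave. Points inside a disc (a stand-in
# for an enclave) are labeled 1, points outside 0.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(4000, 2))
y = (np.linalg.norm(X, axis=1) < 0.8).astype(float)   # ~50% of points are "inside"

w = np.zeros(2); b = 0.0
for step in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # a single tilted plane, squashed
    g = (p - y) / len(X)                     # cross-entropy gradient
    w -= 1.0 * (X.T @ g)
    b -= 1.0 * g.sum()

acc = ((p > 0.5) == (y > 0.5)).mean()
print(f"linear model accuracy on the enclave toy problem: {acc:.3f}")
# Expect roughly 0.5-0.6: the best a single plane can do is guess one side.
# The small ReLU network sketched earlier handles exactly this kind of border.
```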
<https://www.youtube.com/watch?v=NrO20Jb-hy0>
And the third video above (the first to be made) is still very much worth watching. Specifically:
3-D visualization is useful for intuition at small scale, but math (gradients, chain rule) is what actually handles the dimensionality.
Gradient descent is the core learning rule, but our usual “downhill on a landscape” picture misleads for huge models.
Loss landscapes for LLMs are effectively astronomically high-dimensional, and parameters are tightly coupled; you need the gradient, which provides a local “compass” showing how the loss-function value varies in all dimensions at once.
There is a powerful “wormhole” effect when we try to visualize this with our limited 3-D brains: after a step in the full-dimensional space, a better region can appear to materialize out of nowhere in our low-dimensional view.
Thus local minima aren’t the showstopper they were once feared to be; in very high dimensions, getting stuck in every direction at once is unlikely (see the back-of-the-envelope sketch after this list).
Backprop + gradient descent turns out to be a much more broadly applicable engine than a reasonable person would believe possible ex ante.
The Bitter Lesson: effective learning emerges from simple rules applied at scale, not from clever analytical tricks.
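The back-of-the-envelope sketch promised above: a deliberate caricature of my own, not anything from the video. Model the curvature of the loss near a flat spot as d independent random numbers, one per direction. You are only truly stuck if every one of them is positive, and that probability collapses like 1/2^d.

```python
# Back-of-the-envelope caricature (mine, not from the video): treat the
# curvature at a flat spot as d independent random numbers, one per
# direction. "Stuck" means every curvature is positive -- no downhill
# escape route in any direction.
import numpy as np

rng = np.random.default_rng(2)

for d in (2, 4, 8, 12):
    curvatures = rng.standard_normal((200_000, d))     # 200,000 random flat spots
    stuck = np.all(curvatures > 0.0, axis=1).mean()    # fraction with no way down
    print(f"d = {d:2d}: fraction stuck in every direction = {stuck:.5f} "
          f"(1/2^d = {2.0 ** -d:.5f})")

# Real loss landscapes have millions of coupled directions, not a dozen
# independent ones, but the moral survives: a flat spot that traps you in
# *every* direction becomes vanishingly rare as dimension grows.
```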
Neural networks aren’t so mysterious once you translate them into geometry. Each ReLU neural-network unit “folds” a plane; layers compose folds into intricate, piecewise-linear shapes that carve real-world boundaries. Yes, a single hidden layer can approximate anything. But trainability matters, and gradient descent finds good solutions vastly more reliably when depth compounds expressivity. Welch Labs’ visuals make this concrete:
Belgium–Netherlands enclaves show how layered folds succeed. Start with a map: latitude, longitude, and a z-axis for the degree of confidence that the location is in Holland. Add neural-network ReLUs and watch each neuron fold space. And so simple bend-fold-shift-and-add arithmetic, neural-network unit by neural-network unit, turns into true alchemy.
And the unreasonable effectiveness of neural-network models at scale suddenly appears less unreasonable.