AI Systems Still Confound Researchers

By BEN BRUBAKER in Quanta: An open secret about artificial intelligence systems like ChatGPT is that they all have an unsettling quirk: Not even the researchers who build them fully understand how they work. These large language models, or LLMs, are special computer programs based on mathematical structures called neural networks. Although neural networks are now ubiquitous in scientific research and daily life, and researchers have studied them for over half a century, their inner workings remain mysterious. How is that possible?

It’s not that the underlying mathematics is especially complicated. The simplest neural networks, called feed-forward networks, are organized as enormous webs of interconnected “neurons” — really just copies of the same simple mathematical function — arranged in layers. The outputs of one layer become the inputs for the next layer in the hierarchy. A set of numbers called the network’s “parameters” quantifies the strength of connections between neurons. The networks used in LLMs, known as transformers, have a slightly more complicated structure and can have hundreds of billions of parameters.
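To make this concrete, here is a minimal sketch (not from the article) of a feed-forward network in Python: a few layers of identical simple functions, with randomly chosen weight matrices standing in for the parameters that quantify connection strengths.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(width, depth, rng):
    """One random weight matrix per layer: `depth` layers, `width` neurons each."""
    return [rng.normal(size=(width, width)) for _ in range(depth)]

def forward(params, x):
    """Feed the input through the layers; each layer's output is the next layer's input."""
    for W in params:
        # Every "neuron" is the same simple function: a weighted sum,
        # followed by a basic nonlinearity (here, ReLU).
        x = np.maximum(0, W @ x)
    return x

params = init_params(width=4, depth=3, rng=rng)
output = forward(params, np.ones(4))
```

A real transformer adds attention layers and other machinery on top of this skeleton, but the basic picture — layers of simple functions, connected by learned parameters — is the same.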

To build a neural network, researchers first specify its size and layout, then set all its parameters to random values. That simple setup means newborn neural networks generate outputs that bear no meaningful relationship to their inputs. Researchers’ difficulty understanding their behavior starts with the training process through which networks learn to produce useful outputs. During training, researchers feed a network a mountain of data, along with a criterion for evaluating different possible outputs. Every time the network sees a new input, it spits out an output and then tweaks its parameters toward values that will produce a better output.
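The training recipe can be illustrated with a deliberately tiny model — a single “neuron” with one parameter, rather than a full network. This is an editor’s sketch, not the article’s example: the model sees one input at a time, compares its output to the desired one, and nudges its parameter toward a better answer.

```python
import numpy as np

rng = np.random.default_rng(1)

true_w = 2.5          # the relationship hidden in the training data
w = rng.normal()      # start from a random parameter value

for _ in range(500):
    x = rng.normal()          # a new training example
    target = true_w * x       # the output we'd like the model to produce
    output = w * x            # what the model actually produces
    error = output - target   # the evaluation criterion: how far off was it?
    w -= 0.05 * error * x     # tweak the parameter toward a better output
```

After a few hundred examples, `w` ends up very close to the hidden value 2.5. What makes real networks hard to understand is that the same loop runs over billions of parameters at once, with no single parameter playing an interpretable role.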

This strategy is extremely simple — it’s analogous to descending a mountain by repeatedly taking small steps in the direction where the downhill slope is steepest. Try this on an actual hiking trip, and chances are you’ll quickly fall into a crevasse. But when neural networks use this approach, navigating landscapes with many billions of dimensions, it works far better than it has any right to. Without a clear picture of the topography of this vast landscape, it’s hard to understand why the network ends up with a particular set of parameters.
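The mountain-descent analogy corresponds to gradient descent. Here is a minimal sketch, on a simple two-dimensional bowl-shaped loss rather than a real network’s billion-dimensional landscape: repeatedly take a small step in the direction where the downhill slope is steepest.

```python
def loss(p):
    """A bowl-shaped 'landscape' with its lowest point at (3, -1)."""
    x, y = p
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

def grad(p):
    """The slope of the landscape at point p."""
    x, y = p
    return (2 * (x - 3.0), 2 * (y + 1.0))

p = (0.0, 0.0)    # start from an arbitrary point
step = 0.1        # take small steps

for _ in range(200):
    gx, gy = grad(p)
    # Step in the steepest downhill direction.
    p = (p[0] - step * gx, p[1] - step * gy)
```

On a smooth bowl like this one, the procedure reliably finds the bottom. On the jagged, billion-dimensional landscapes that real networks navigate, there is no such guarantee — which is part of why its success is so surprising.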

The difficulty in understanding the training process is one problem, but the behavior of trained neural networks can be just as confusing. In principle, it’s easy to follow all the simple mathematical operations that collectively generate the network’s outputs. But in large networks, it’s hard to turn all that math into a qualitative explanation of what’s responsible for any given output, and researchers have had little success in pinpointing the role of individual neurons. That’s yet another reason it’s hard to make sense of the behavior of LLMs.

What’s New and Noteworthy

Researchers have taken many different approaches to studying the inner workings of neural networks. While some seek clues directly in real AI systems, others take a step back and look at the underlying mathematics, proving rigorous theorems about how networks must behave.

Many of these mathematical investigations focus on a neural network’s layout. Specifying the layout of a simple feed-forward network is just a matter of defining two numbers: the network’s “depth” (the number of layers of neurons) and its “width” (the number of neurons in each layer). It wasn’t until a few years ago that researchers pinned down the trade-off between depth and width even in these simple networks. More recently, researchers have continued this line of work by empirically studying the role of depth and width in large language models.
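For a feed-forward network whose layers all share one width, those two numbers fully determine how many parameters the network has. The sketch below assumes the simplest case — square weight matrices between equal-width layers — to show how depth and width trade off differently.

```python
def num_parameters(depth, width):
    """Weights for `depth` layers, each a width-by-width matrix of connections."""
    return depth * width * width

# Doubling the depth doubles the parameter count;
# doubling the width quadruples it.
```

That asymmetry is one reason the depth-versus-width trade-off is a nontrivial theoretical question: the two knobs grow the network in structurally different ways.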

An even more basic consideration than a network’s layout is its overall size. The past decade of progress has dramatically illustrated that increasing the number of parameters in a neural network almost always improves its performance. But this observation is hard to reconcile with the theoretical framework traditionally used by statisticians, which predicts that past a certain point, more parameters should be detrimental. Two years ago, researchers took a step toward resolving this tension by proving that extra parameters can help make neural network outputs less sensitive to small changes in their inputs.

Researchers have also studied how neural networks learn during the training process. One line of work has established a precise mathematical correspondence between neural networks and seemingly unrelated machine learning techniques that were popular in the early 2000s. The results excited many researchers because those older techniques are easier to analyze. But so far, the connection holds only for specific types of neural networks — a comprehensive theory remains elusive.

Abstract theoretical analysis can help clarify the intrinsic capabilities of neural networks, but it does have limits. It’s harder to prove theorems about more complex networks like the ones underlying large language models, and these models are trained on enormous and messy data sets, which are also difficult to characterize. But that hasn’t deterred researchers from trying.