How Do Machines ‘Grok’ Data?

Anil Ananthaswamy in Quanta: In January 2022, researchers at OpenAI, the company behind ChatGPT, reported that neural networks, when accidentally allowed to munch on data for much longer than usual, developed unique ways of solving problems. Typically, when engineers build machine learning models out of neural networks (composed of units of computation called artificial neurons), they tend to stop the training at a certain point, before the network enters what is called the overfitting regime. This is when the network basically begins memorizing its training data and often won’t generalize to new, unseen information. But when the OpenAI team accidentally trained a small network way beyond this point, it seemed to develop an understanding of the problem that went beyond simply memorizing: it could suddenly ace any test data.

The researchers named the phenomenon “grokking,” a term coined by science-fiction author Robert A. Heinlein to mean understanding something “so thoroughly that the observer becomes a part of the process being observed.”

The overtrained neural network, designed to perform certain mathematical operations, had learned the general structure of the numbers and internalized the result. It had grokked and become the solution.

“This [was] very exciting and thought provoking,” said Mikhail Belkin of the University of California, San Diego, who studies the theoretical and empirical properties of neural networks. “It spurred a lot of follow-up work.”

Indeed, others have replicated the results and even reverse-engineered them. The most recent papers not only clarified what these neural networks are doing when they grok but also provided a new lens through which to examine their innards. “The grokking setup is like a good model organism for understanding lots of different aspects of deep learning,” said Eric Michaud of the Massachusetts Institute of Technology.

Peering inside this organism is at times quite revealing. “Not only can you find beautiful structure, but that beautiful structure is important for understanding what’s going on internally,” said Neel Nanda, now at Google DeepMind in London.

