This AI Learns Continuously From New Experiences—Without Forgetting Its Past

Shelly Fan at Singularity Hub: Our brains are constantly learning. That new sandwich deli rocks. That gas station? Better avoid it in the future.

Memories like these physically rewire connections in the brain region that supports new learning. During sleep, the previous day’s memories are shuttled to other parts of the brain for long-term storage, freeing up brain cells for new experiences the next day. In other words, the brain can continuously soak up our everyday lives without losing access to memories of what came before.

AI, not so much. GPT-4 and other large language and multimodal models, which have taken the world by storm, are built using deep learning, a family of algorithms that loosely mimic the brain. The problem? “Deep learning systems with standard algorithms slowly lose the ability to learn,” Dr. Shibhansh Dohare at the University of Alberta recently told Nature.

The reason lies in how these systems are set up and trained. Deep learning relies on networks of artificial neurons that are arranged in layers and connected to each other. Feeding data into the algorithms (say, reams of online resources like blogs, news articles, and YouTube and Reddit comments) changes the strength of these connections, so that the AI eventually “learns” patterns in the data and uses these patterns to churn out eloquent responses.

But these systems are basically brains frozen in time. Tackling a new task sometimes requires a whole new round of training and learning, which erases what came before and costs millions of dollars. For ChatGPT and other AI tools, this means they become increasingly outdated over time.

This week, Dohare and colleagues found a way to solve the problem. The key is to selectively reset a small number of little-used artificial neurons as learning goes on, without substantially changing the rest of the network, a bit like what happens in the brain as we sleep.
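To make selective resetting a little more concrete, here is a minimal Python sketch of one way it could work. It is an illustration under stated assumptions, not the authors’ code: it assumes a single hidden layer stored as plain lists, scores each hidden neuron by its average absolute activation times the total size of its outgoing weights, and reinitializes the lowest-scoring fraction. The function names, the 0.1 initialization scale, and the reset fraction are all hypothetical choices, and the published method includes further details this sketch leaves out.

```python
import random

def unit_utility(activations, outgoing):
    """Score hidden units: mean |activation| times summed |outgoing weights|.

    activations: per-example hidden activations, shape [batch][hidden]
    outgoing: weights from each hidden unit to the outputs, shape [hidden][outputs]
    (Hypothetical scoring rule, chosen for illustration.)
    """
    scores = []
    for i, out_weights in enumerate(outgoing):
        mean_act = sum(abs(row[i]) for row in activations) / len(activations)
        scores.append(mean_act * sum(abs(w) for w in out_weights))
    return scores

def reset_low_utility_units(incoming, outgoing, utilities, fraction, rng, init_scale=0.1):
    """Reinitialize the lowest-utility fraction of hidden units in place.

    incoming: weights from the inputs into each hidden unit, shape [hidden][inputs]
    Returns the sorted indices of the units that were reset; all other units
    are left untouched.
    """
    n_reset = int(len(utilities) * fraction)
    ranked = sorted(range(len(utilities)), key=lambda i: utilities[i])
    chosen = sorted(ranked[:n_reset])
    for i in chosen:
        # Fresh random incoming weights give the unit a new chance to learn.
        incoming[i] = [rng.gauss(0.0, init_scale) for _ in incoming[i]]
        # Zeroed outgoing weights mean the reset unit does not yet affect the outputs.
        outgoing[i] = [0.0] * len(outgoing[i])
    return chosen

# Usage: 4 hidden units, 3 inputs, 1 output (all numbers made up).
rng = random.Random(0)
incoming = [[rng.gauss(0.0, 0.1) for _ in range(3)] for _ in range(4)]
outgoing = [[0.5], [0.1], [1.0], [0.4]]
activations = [[1.0, 0.0, 2.0, 0.5],
               [1.0, 0.2, 2.0, 0.5]]

utilities = unit_utility(activations, outgoing)
reset = reset_low_utility_units(incoming, outgoing, utilities, fraction=0.25, rng=rng)
print("utilities:", [round(u, 3) for u in utilities])
print("reset units:", reset)
```

Here the second hidden unit barely fires and barely influences the output, so it is the one that gets reset. Zeroing its outgoing weights is a design choice that leaves the network’s current predictions unchanged at the moment of the reset, in the spirit of not substantially changing the rest of the network.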

When tested on continual visual learning, in which the AI tackles one task after another (say, differentiating cats from houses, then telling apart stop signs and school buses), deep learning algorithms equipped with selective resetting easily maintained high accuracy over 5,000 different tasks. Standard algorithms, in contrast, rapidly deteriorated, their accuracy eventually dropping to about that of a coin toss.

Called continual backpropagation, the strategy is “among the first of a large and fast-growing set of methods” to deal with the continuous learning problem, wrote Drs. Clare Lyle and Razvan Pascanu at Google DeepMind, who were not involved in the study.

Machine Mind

Deep learning is one of the most popular ways to train AI. Inspired by the brain, these algorithms have layers of artificial neurons that connect to form artificial neural networks.

As an algorithm learns, some connections strengthen, while others dwindle. This process, called plasticity, mimics how the brain learns. It lets training optimize an artificial neural network so that it can deliver the best answer to a problem.
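A toy example shows connections strengthening and dwindling. The sketch below assumes a single artificial neuron with two input connections, trained by plain stochastic gradient descent on squared error. The data are made up so that only the first input predicts the target (the target is always twice the first input), so training should push the first connection toward 2 and shrink the second toward 0. Real deep networks stack many such neurons in layers; this is an illustration, not how any particular model is trained.

```python
def train_neuron(data, weights, lr=0.05, epochs=500):
    """Stochastic gradient descent on squared error for one linear neuron."""
    w = list(weights)
    for _ in range(epochs):
        for x, target in data:
            prediction = sum(wi * xi for wi, xi in zip(w, x))
            error = target - prediction
            # Each connection moves in proportion to its input and to the error.
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
    return w

# Made-up data: target = 2 * first input; the second input carries no information.
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), 0.0), ((1.0, 1.0), 2.0), ((2.0, 1.0), 4.0)]
start = [0.5, 0.5]
learned = train_neuron(data, start)
print("before:", start)
print("after: ", learned)
```

Both connections start equally strong; training strengthens the useful one and lets the useless one fade away, which is the kind of rewiring the paragraph above calls plasticity.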

But deep learning algorithms aren’t as flexible as the brain. Once trained, their weights (the strengths of their connections) are stuck. Learning a new task reconfigures the weights in an existing network, and in the process, the AI “forgets” previous experiences. That’s usually not a problem for typical uses like recognizing images or processing language (with the caveat that such models can’t adapt to new data on the fly). But it’s highly problematic when training and using more sophisticated algorithms, for example, those that learn from and respond to their environments like humans do.

Using a classic gaming example, “a neural network can be trained to obtain a perfect score on the video game Pong, but training the same network to then play Space Invaders will cause its performance on Pong to drop considerably,” wrote Lyle and Pascanu.

Computer scientists have been battling this problem, aptly called catastrophic forgetting, for years. An easy solution is to wipe the slate clean and retrain an AI from scratch on a combination of old and new data. Although this restores the AI’s abilities, the nuclear option throws away everything the network had already learned, which must then be learned all over again. And while the strategy is doable for smaller AI models, it isn’t practical for huge ones, such as those that power large language models.
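The Pong-then-Space-Invaders effect, and the retrain-on-everything fix, can be reproduced in miniature. The Python sketch below is a deliberately tiny stand-in, not a deep network: a two-weight linear model learns made-up task A, then made-up task B, and its error on A climbs back up because training on B rewrites a weight the two tasks share. Retraining from scratch on A and B together then finds weights that handle both. The tasks, learning rate, and epoch count are illustrative assumptions.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    """Mean squared error of weights w on a list of (input, target) pairs."""
    return sum((y - predict(w, x)) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.1, epochs=500):
    """Plain stochastic gradient descent; returns new weights."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

# Two made-up tasks that share both weights. Weights (0, 2) solve both at once.
task_a = [((1.0, 1.0), 2.0)]
task_b = [((1.0, 0.0), 0.0)]

# Sequential training: task A first, then task B only.
w = train([0.0, 0.0], task_a)
loss_a_before = mse(w, task_a)
w = train(w, task_b)
loss_a_after = mse(w, task_a)

# "Nuclear option": start over and train on old and new data together.
w_joint = train([0.0, 0.0], task_a + task_b)

print(f"task A error after learning A:       {loss_a_before:.4f}")
print(f"task A error after then learning B:  {loss_a_after:.4f}")
print(f"joint retraining, errors on A and B: {mse(w_joint, task_a):.4f}, {mse(w_joint, task_b):.4f}")
```

In the sequential run, task A’s error ends near 1.0 after training on B, while joint retraining drives both errors to roughly zero. At this size retraining is instant; the catch described above is that the same move becomes impractical for models with vastly more weights.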
