Date: 2025-11-17
AI researchers say the newest wave of memory upgrades is pushing machine recall a little closer to the way the human brain learns and adapts.
A push to fix a long-standing limitation in artificial intelligence is driving the shift. Today’s large language models (LLMs) can write code, summarize complex material and parse long documents. Still, they typically learn in a single training run and rarely update without risking “catastrophic forgetting,” a term researchers use to describe the loss of earlier knowledge when new training is added. Google Research outlined one of the latest attempts to address that problem last week with a training framework called Nested Learning.
“If deployed carefully, it could lead to genuinely personalized models, ones that learn with you instead of being frozen in time,” Gabe Goodhart, Chief Architect of AI Open Innovation at IBM, said in an interview with IBM Think.
Researchers generally group continual-learning techniques into three broad strategies. Some rely on replay, where a model revisits or regenerates earlier data. Others use regularization, which limits how much important weights can shift as a model learns new tasks. A third set uses dynamic architectures, adding extra capacity when new skills are introduced. One of the best-known regularization methods is DeepMind’s elastic weight consolidation, introduced in a 2017 PNAS paper, which slows down updates to parameters that the system has previously identified as critical.
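The regularization idea can be made concrete with a small sketch. The Python below is an illustration only, not DeepMind's code: it applies the quadratic penalty from the elastic weight consolidation paper, (lam / 2) * sum of F_i * (theta_i - anchor_i)^2, to a toy two-parameter problem. The quadratic "new task" loss, the importance values in `fisher`, and the function names are all assumptions chosen for the example.

```python
# Toy illustration of an EWC-style penalty (assumed setup, not DeepMind's code).
# Parameters that mattered for the old task (high importance) are pulled back
# toward their old values while the model trains on a new task.

def ewc_penalty(params, anchor, fisher, lam):
    """Quadratic penalty: (lam / 2) * sum_i F_i * (theta_i - anchor_i)^2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher)
    )

def ewc_penalty_grad(params, anchor, fisher, lam):
    """Gradient of the penalty with respect to each parameter."""
    return [lam * f * (p - a) for p, a, f in zip(params, anchor, fisher)]

def train_new_task(anchor, fisher, target, lam, lr=0.05, steps=2000):
    """Gradient descent on a hypothetical new-task loss plus the EWC penalty.

    The new-task loss is assumed to be 0.5 * sum_i (theta_i - target_i)^2.
    """
    params = list(anchor)  # start from the weights learned on the old task
    for _ in range(steps):
        task_grad = [p - t for p, t in zip(params, target)]
        pen_grad = ewc_penalty_grad(params, anchor, fisher, lam)
        params = [p - lr * (g + h) for p, g, h in zip(params, task_grad, pen_grad)]
    return params

anchor = [1.0, 1.0]   # weights after the old task
fisher = [10.0, 0.0]  # assumed importance: first weight critical, second unused
target = [0.0, 0.0]   # where the new task alone would pull both weights

result = train_new_task(anchor, fisher, target, lam=1.0)
print(result)
```

In this toy setup the first weight, marked as important, settles close to its old value, while the second, marked as unimportant, moves all the way to the new task's optimum. That is the trade-off the penalty is meant to express.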
Google’s Nested Learning method takes a different direction. The authors describe a neural network as a set of nested optimization problems, each operating at its own update frequency. According to the paper, fast-changing components handle immediate context, slower ones capture more stable patterns, and deep layers remain mostly fixed. The company refers to this design as a “continuum memory system,” arguing that architecture and optimization should be viewed as linked rather than separate.
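The notion of components updating at different frequencies can be sketched in a few lines of Python. This is not Google's implementation and does not reproduce anything from the paper. It is a toy schedule, with hypothetical names such as `ParamGroup` and `period`, in which each group accumulates gradients and applies them only every `period` steps.

```python
# Toy multi-timescale update schedule (illustrative assumption, not the
# Nested Learning algorithm): each group applies its averaged gradient
# only once every `period` steps.
from dataclasses import dataclass

@dataclass
class ParamGroup:
    name: str
    period: int            # hypothetical knob: steps between updates
    value: float = 0.0     # a single scalar stands in for a block of weights
    grad_sum: float = 0.0  # gradients accumulated since the last update
    updates: int = 0       # how many times this group has been updated

def step(groups, t, grad, lr=0.1):
    """Accumulate `grad` in every group; update groups whose period divides t."""
    for g in groups:
        g.grad_sum += grad
        if t % g.period == 0:
            g.value -= lr * g.grad_sum / g.period  # apply the averaged gradient
            g.grad_sum = 0.0
            g.updates += 1

groups = [
    ParamGroup("fast", period=1),    # tracks immediate context
    ParamGroup("medium", period=4),
    ParamGroup("slow", period=16),   # changes rarely, holds stable patterns
]

for t in range(1, 33):
    step(groups, t, grad=1.0)

for g in groups:
    print(g.name, g.updates, round(g.value, 3))
```

Over the same 32 steps the fast group is updated every step and the slow group only twice, so the slow group drifts far less. The sketch captures only that scheduling intuition, not the nested optimization described in the paper.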
Goodhart said the framing aligns with how human learning typically works. “Human learning happens at a much slower pace, with much greater repetition,” he said. “Deep learning went broad and fast because that is what the funding model rewarded.”
To demonstrate the idea of distributing learning across different timescales, Google built a prototype architecture called “Hope,” which is based on the Titans family of long-term memory modules. According to the company’s published results, Hope incorporates multiple layers of in-context learning and adds continuum memory blocks that adjust at different rates.
In evaluations described in the Google paper, Hope achieved lower perplexity and higher accuracy than a standard transformer and several recurrent baselines on language modeling and common-sense reasoning benchmarks. Google also reported that Hope and Titans outperformed models such as Mamba 2 and TTT on long-context, “needle-in-a-haystack” retrieval tasks.
The company’s results do not resolve the problem of forgetting, Goodhart said. Still, they suggest that distributing updates across several timescales can improve long-term performance and reduce interference with earlier learning.
Goodhart said the approach still faces practical limits. “Modern AI relies on static weights to build trust,” he said. “If it worked yesterday, it should work today. A continuously learning model could behave differently for different users, which raises security and consistency issues.”
“Humans learn slowly and repetitively,” Goodhart said. “This could be a step toward models that do the same.”
