Pioneering reinforcement learning researcher contemplates AI's future

14 March 2025

Author

Sascha Brodsky

Tech Reporter, Editorial Lead

IBM

As artificial intelligence increasingly shapes our world, one of its founding fathers warns against hype and fear.

Andrew Barto, recently honored with the Turing Award, computing's highest honor, has spent decades developing reinforcement learning—the technology now powering everything from game-playing AI champions to drug discovery systems and the reasoning capabilities behind today's large language models. In an interview with IBM Think, Barto offers a measured assessment of AI's progress, potential and limitations that cuts through techno-optimism and doomsday scenarios.

Reinforcement learning, the computational approach to learning from interaction that Barto helped develop, has become ubiquitous in today's AI landscape. While many associate it with headline-grabbing achievements like defeating world champions at complex games, Barto sees its most meaningful applications in more practical domains.

"It's already being used in a number of places, a lot in robotics," he explains. "There are great possibilities for robots using reinforcement learning to enable them to do very detailed, helpful movements that could assist people at home or people with disabilities."

From game-playing to life-saving

Barto highlights medical applications where reinforcement learning optimizes treatment protocols over extended periods—precisely the kind of sequential decision-making problems where the technology excels.

"One of the features of reinforcement learning is that it can deal with sequential decision problems where a number of decisions are made over time, and in each case, the state of the system depends on the previous decision," he says. This ability to handle delayed rewards—consequences that only materialize after a sequence of actions—represents a fundamental challenge that reinforcement learning algorithms address.
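The delayed-reward problem Barto describes can be made concrete with a minimal tabular Q-learning sketch. The toy chain environment below is illustrative, not from the interview: reward arrives only at the final state, yet the learning update propagates value back through every earlier decision.

```python
import numpy as np

# Toy chain MDP: states 0..3, actions 0=left, 1=right.
# A reward of 1.0 is paid only on reaching state 3 -- a delayed
# reward that Q-learning propagates back to earlier decisions
# via its bootstrapped update.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while True:
        a = int(rng.integers(n_actions))       # explore uniformly
        s_next, r, done = step(s, a)
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])  # temporal-difference update
        if done:
            break
        s = s_next

# The greedy policy now prefers "right" in every non-terminal state,
# even though only the final transition ever pays a reward.
print(Q.argmax(axis=1)[:3])
```

After training, the value of the delayed reward has flowed backward: state 0's "right" action is worth roughly gamma squared times the terminal reward, despite never being directly rewarded.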

Matt Riemer, a Deep Learning Research Engineer at the IBM AI Foundations Lab, points to even more recent applications.

"Researchers have successfully applied reinforcement learning-based approaches to the problem of drug discovery where they are just starting to see some very promising results," he told Think in an interview. "It has also recently had success for important problems such as optimizing and automating the process of water treatment."

Behind the impressive abilities of today's chatbots lies reinforcement learning. Riemer explains: "With the recent success of LLMs, we have seen high-profile use cases of RL improving their capabilities." The first major application was called RLHF—reinforcement learning from human feedback—which helps these systems produce responses that better match what people want.
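At the heart of RLHF is a reward model fit to human preference pairs, commonly with the Bradley-Terry loss: minimize -log sigmoid(r(chosen) - r(rejected)). The sketch below is a deliberately tiny illustration with a linear reward model over made-up feature vectors, not any production RLHF pipeline.

```python
import numpy as np

# Hypothetical toy: "responses" are 2-d feature vectors, and a linear
# reward model r(x) = w . x is fit to preference pairs (chosen, rejected)
# with the Bradley-Terry loss used in RLHF:
#   loss = -log sigmoid(r(chosen) - r(rejected))
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])          # hidden "human preference"
pairs = []
for _ in range(200):
    a, b = rng.normal(size=2), rng.normal(size=2)
    chosen, rejected = (a, b) if true_w @ a > true_w @ b else (b, a)
    pairs.append((chosen, rejected))

w = np.zeros(2)
lr = 0.1
for _ in range(100):
    for chosen, rejected in pairs:
        margin = w @ chosen - w @ rejected
        p = 1.0 / (1.0 + np.exp(-margin))       # P(chosen preferred)
        grad = (p - 1.0) * (chosen - rejected)  # gradient of the loss in w
        w -= lr * grad

# The learned reward model now ranks pairs the way the "human" did;
# RLHF then optimizes the language model's policy against this signal.
accuracy = np.mean([w @ c > w @ r for c, r in pairs])
```

In a real system the reward model is a neural network scoring full text responses, but the training objective has this same pairwise form.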

While large language models have captured public attention with their ability to generate human-like text, their development owes much to reinforcement learning. As Riemer explains, "More recently, we have seen RL emerge as the most prominent approach for training so-called 'thinking' models that learn a chain of thought process that improves the reasoning capabilities of LLMs."

Math problems make ideal training grounds for these systems. "For problems like mathematical reasoning, it is easy to construct verifiable rewards, i.e., 'did the agent answer the problem correctly or not?'" explains Riemer. These clear right-or-wrong answers create what he calls a "pseudo simulation environment" where AI can learn through repeated practice.
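A verifiable reward of this kind can be very simple in code. The sketch below is illustrative (the answer-parsing convention is an assumption, not a described system): the reward is 1.0 when the model's final answer matches the known solution, and 0.0 otherwise.

```python
from fractions import Fraction

# Sketch of a "verifiable reward" for math problems: the reward is
# simply whether the final answer matches the known solution.
# The parsing here is deliberately minimal and illustrative.
def math_reward(model_output: str, correct_answer: str) -> float:
    """Return 1.0 if the final answer is right, else 0.0."""
    try:
        # assume the answer is the last whitespace-separated token
        candidate = model_output.strip().split()[-1]
        return 1.0 if Fraction(candidate) == Fraction(correct_answer) else 0.0
    except (ValueError, ZeroDivisionError, IndexError):
        return 0.0  # unparseable output earns no reward

print(math_reward("The answer is 3/6", "1/2"))  # 1.0: equivalent fractions
print(math_reward("I think it's 7", "8"))       # 0.0
```

Because the check is exact and automatic, the model can attempt a problem thousands of times and learn from the binary outcome, which is what makes math a convenient "pseudo simulation environment."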

The impact of reinforcement learning extends beyond academic research or specialized applications. Its influence is increasingly felt in technologies that interact with everyday users. "This is once again probably just the beginning as we are likely to see RL play an even more prominent role as the field starts to develop 'AI agents' that interact with web browsers and other tools to better assist users," Riemer predicts.

Barto maintains the cautious optimism of a scientist who has witnessed numerous technological hype cycles. He acknowledges the challenge when asked about AI safety and alignment: ensuring AI systems act in accordance with human values.


"The alignment problem is a non-trivial problem," he says. "One would hope that an RL system can direct an AI to incorporate the values of humans who are using the system. So, hopefully, that can happen. I don't have a prescription for it."

Looking for inspiration about AI's rewards, Barto turns to our brains. "Our reward functions come from mechanisms that evolved over millions of years," he explains. Unlike simple computer rewards, human motivation emerges from complex evolutionary pressures that kept our ancestors alive and reproducing.

This evolutionary perspective informs his thinking about multi-criterion reinforcement learning, where systems respond to several reward signals rather than just one—potentially mirroring how different parts of the human brain process various forms of feedback.

"I think multi-criterion reinforcement learning is something that is really quite important," Barto notes. "Instead of having one reward function, there can be several, and … different parts of the brain, for example, probably received different signals."
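One simple way to work with several reward signals is linear scalarization: each criterion produces its own reward stream, and a weight vector trades them off into a single training signal. This is only one approach to multi-criterion RL, and the criterion names below are invented for illustration.

```python
# Linear scalarization of several reward criteria into one signal.
# (One simple approach to multi-criterion RL; names are illustrative.)
def scalarize(rewards: dict, weights: dict) -> float:
    """Combine per-criterion rewards into a single training signal."""
    return sum(weights[name] * r for name, r in rewards.items())

# e.g. a home-assistance robot balancing task progress against
# energy use and jerky, unsafe motion:
rewards = {"task_progress": 1.0, "energy": -0.2, "smoothness": -0.5}
weights = {"task_progress": 1.0, "energy": 0.5, "smoothness": 2.0}
print(scalarize(rewards, weights))  # approximately -0.1
```

The interesting research questions start where scalarization breaks down: when criteria conflict and no fixed weighting captures the right behavior, which is closer to the picture Barto sketches of brain regions receiving genuinely different signals.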


Bridging simulations and reality

Reinforcement learning excels in video games and simulations but struggles in the real world. The problem? These systems learn by exploring different actions—a strength in virtual environments but a major risk in reality. "Exploration is both the biggest selling point of RL and its biggest limiting factor for real-world use," explains Riemer, highlighting why both researchers see this transition as a critical challenge.

"In the real world, outside of simulation, exploration can lead to the agent doing unpredictable things, which are a major concern for AI safety," Riemer explains. "Also, even for use cases where we can tolerate exploration, there is an issue with the sample efficiency of RL. It often feels like it needs to explore much more than a human would in the same situation."
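The exploration Riemer describes often takes a very literal form in RL algorithms. The classic epsilon-greedy rule below makes the tradeoff explicit: with probability epsilon the agent takes a random (and, in the real world, potentially unpredictable) action; otherwise it exploits what it already knows.

```python
import random

# Epsilon-greedy action selection: with probability epsilon the agent
# picks a random, exploratory action; otherwise it exploits its current
# value estimates. Shrinking epsilon is one crude way to limit
# real-world exploration risk.
def epsilon_greedy(q_values, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                         # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit

rng = random.Random(0)
q = [0.1, 0.9, 0.3]
picks = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]
# With epsilon = 0.1, the greedy action (index 1) dominates,
# but the agent still occasionally tries the others.
print(picks.count(1) / len(picks))
```

In a simulator, those occasional random actions are free; on a physical robot, each one could be a fall, which is exactly the tension both researchers point to.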

Barto notes similar challenges: "It's going to take much longer because simulations can run much, much faster than physical experience in the world." He adds, "If it's a robot, it learns through trial and error, and if an error leads to a fall or something that damages the machine, then that's the problem."

This cautious approach to real-world deployment stems from both practical and safety considerations. Barto emphasizes the need for careful specification of reward functions "so that the system doesn't come up with something that is really unexpected and possibly problematic."

The challenge extends beyond mere implementation. As Riemer points out, reinforcement learning systems must also adapt to changing environments: "Continual RL studies the question of how RL agents can adapt to the changing nature of real-world environments, i.e., when the world is different than it was before during pre-training or when training in a simulator."

This adaptability presents what Riemer calls "the classic problem of the 'stability-plasticity dilemma' where the agent must decide how to prioritize performance on its new experiences and performance on its old experiences." This balancing act between retaining prior knowledge while adapting to new conditions represents an ongoing challenge in the field.
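A common mechanism for managing this tradeoff is a bounded replay buffer: training batches mix recent transitions (plasticity) with older ones (stability), and the buffer's capacity controls how fast old experience is forgotten. The sketch below is purely illustrative; real continual-RL methods are considerably more involved.

```python
import random
from collections import deque

# A bounded replay buffer as a minimal stability-plasticity knob:
# training batches blend recent transitions (plasticity) with older
# ones (stability), and capacity bounds how long old experience lives.
class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest items fall out

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer),
                          min(batch_size, len(self.buffer)))

rng = random.Random(0)
buf = ReplayBuffer(capacity=100)
for t in range(250):           # the environment keeps changing; only
    buf.add(("obs", t))        # the last 100 transitions are retained
batch = buf.sample(8, rng)
# Every sampled transition comes from the recent window of timesteps.
```

Making the capacity too small forgets useful old behavior; making it too large slows adaptation to the changed world, which is the dilemma in miniature.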

Despite these obstacles, researchers are finding promising solutions by combining reinforcement learning with other AI approaches. Riemer sees particular promise in the integration with large language models: "What RL was really lacking was an ability to understand the world enough so that it can structure its exploration more logically. We are starting to see evidence that LLMs can be used as a strong foundation of world knowledge to build RL training on top of, which is very exciting from the perspective of enabling real-world use cases for RL."

The integration between reinforcement learning and other AI techniques is evolving rapidly. "The major trend we are seeing is the way that other methods can help RL build a representation of the world that it can use to explore more efficiently," Riemer says. "For example, in language domains, RL has become a very effective tool used on top of pre-trained LLMs."

This complementary relationship works both ways—reinforcement learning enhances language models, while language models provide reinforcement learning systems with better representations of the world. "We are starting to see similar things for use cases like robotics or building AI agents where RL is becoming more effective when combined with the knowledge incorporated in VLMs that also have vision capabilities," Riemer explains.

When the conversation turns to artificial general intelligence (AGI)—systems with human-like cognitive abilities across domains—Barto expresses skepticism about both its likelihood and desirability as a research goal.

"I don't see the utility of making human-level intelligence a goal," he states candidly. "The goal of trying to understand how human intelligence works is different than trying to create machines that are at a human level."

One particularly intriguing frontier Barto identifies is multi-agent reinforcement learning—systems where multiple learning agents interact, potentially with different objectives. This approach not only has implications for AI development but might also illuminate how our own brains function.

"The hypothesis that neurons are reinforcement learning agents, and that the brain is a society of interacting agents that could have different goals among themselves" remains an "unusual hypothesis," he acknowledges, but one with potential implications for neuroscience.

For Barto, the most valuable contributions of reinforcement learning may not be in creating human-like intelligence but in solving specific problems that improve human lives—a legacy perhaps more meaningful than the Turing Award itself.
