IBM’s AI learns to navigate around a virtual home using common sense

You know a shirt belongs in a wardrobe. I know a shirt belongs in a wardrobe. Does an AI know that?

Typically not.

But it can learn by interacting with the world around it. We wanted to boost this technique, known as Reinforcement Learning, by injecting common sense into an AI model — and helping it to learn faster.

In a recent paper, “Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines,” presented at the 2021 AAAI Conference on Artificial Intelligence, we describe an AI that trades off “exploration” of the world against “exploitation” of its action strategy to maximize rewards. In Reinforcement Learning, an AI gets a reward, such as a bag of gold behind a locked door in a video game, every time it reaches a specific desirable state.
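The tradeoff between exploration and exploitation can be illustrated with an epsilon-greedy rule, a standard textbook technique that is not necessarily the one used in the paper. In this minimal sketch, the function `choose_action`, the `q_values` table, and the `epsilon` parameter are illustrative assumptions.

```python
import random

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a dict mapping action -> estimated value.

    With probability epsilon the agent explores (uniform random action);
    otherwise it exploits the action with the highest current estimate.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)           # explore
    return max(actions, key=q_values.get)    # exploit

# Hypothetical value estimates for three text commands.
q_values = {"go west": 0.2, "open box": 0.7, "go north": 0.1}
greedy = choose_action(q_values, epsilon=0.0)  # epsilon 0 means pure exploitation
```

A larger `epsilon` makes the agent try more unfamiliar actions; a smaller one makes it stick to what already looks best.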

We have greatly improved this exploration vs. exploitation tradeoff using additional commonsense knowledge — in the form of crowdsourced text. Our work could lead to better mapping and navigation applications, and to a new generation of interactive assistive agents able to reason like humans.

Doing it like a game

Reinforcement Learning has been an important focus area in the AI community, but many of its problems remain intractable and hard to scale. To tackle this, we opted for a game-like application of Natural Language Processing (NLP), inspired by games such as Dungeons & Dragons and Zork.

In the app, a player or an AI interacts with the world mainly through text. The “world” sends messages to the agent, and the agent sends back actions in the form of commands or sentences. For example, the environment can send an observation such as: “You are in a room. There are two doors leading north and west. You see a box.” In response to this, the AI agent can issue a command, such as: “Go west.”
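The exchange of observations and commands described above can be sketched as a tiny text environment. This is a hypothetical toy class written for illustration, not the TextWorld API; the names `ToyTextWorld`, `observe`, and `step`, and the reward of 1 for reaching the kitchen, are all assumptions.

```python
class ToyTextWorld:
    """Hypothetical two-room world that talks to the agent in plain text."""

    def __init__(self):
        self.room = "hall"

    def observe(self):
        """Describe the current room as a text observation."""
        if self.room == "hall":
            return ("You are in a room. There are two doors leading "
                    "north and west. You see a box.")
        return "You are in the kitchen. There is a door leading east."

    def step(self, command):
        """Apply a text command and return (observation, reward, done)."""
        if self.room == "hall" and command == "go west":
            self.room = "kitchen"
            return self.observe(), 1, True   # reaching the kitchen ends the game
        return self.observe(), 0, False      # any other command changes nothing

env = ToyTextWorld()
first_observation = env.observe()
observation, reward, done = env.step("go west")
```

The agent sees only the returned strings and the reward, so everything it knows about the world has to be read out of text.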

One popular text-based app among AI researchers is Microsoft’s TextWorld. It involves an AI agent in a home-like setting. The AI moves around by learning how to navigate between rooms and interact with objects found in a home, using only text-based interactions. Most Reinforcement Learning-based AIs find such navigation challenging when the problem size increases beyond a few rooms and objects.

We decided to “upgrade” the AI’s abilities by introducing it to external commonsense knowledge about the world. We wanted the system to “know” that apples, say, are stored in the fridge, while a knife is stored in a drawer, and shirts? In the wardrobe. Once the AI had this data, we wanted to measure how much its efficiency and performance in solving tasks improved.

To do so, we developed an extension of the environment where such knowledge would be crucial: the TextWorld Commonsense (TWC) environment. It builds on the TextWorld system to generate problems and worlds that can only be solved efficiently with commonsense knowledge. Just as measuring a student’s learning outcomes and the impact of external course material requires a well-designed test, it’s vital to create a testing environment that measures all aspects of the AI’s learning.

Getting an AI to tidy up

Once the TWC was built, we decided to focus on the task of cleaning up the house. We placed the Reinforcement Learning AI in a virtual world with multiple objects and containers in different rooms. The AI’s job was to pick up misplaced objects and place them in the correct containers, navigating between rooms if necessary. Common sense was crucial for the machine to complete such a seemingly simple task. The AI had no other way of knowing where a specific object type belonged, for example that a green shirt belongs in the wardrobe. Without such commonsense knowledge, the agent would have to try object-container combinations at random, which quickly becomes intractable as the number of objects and containers grows.
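To make the cost of guessing concrete, here is a minimal sketch that counts how many "put" attempts a tiny tidy-up game needs with and without prior knowledge. The objects, containers, and the assumption that the commonsense guess is always right are hypothetical simplifications, not results from the TWC environment.

```python
# Hypothetical ground truth for a tiny tidy-up game.
correct_place = {"green shirt": "wardrobe", "apple": "fridge", "knife": "drawer"}
containers = ["drawer", "fridge", "wardrobe", "shelf"]

def attempts_needed(guess_order):
    """Count put-actions until every object lands in its correct container.

    guess_order maps each object to the order in which containers are tried.
    """
    total = 0
    for obj, target in correct_place.items():
        total += guess_order[obj].index(target) + 1
    return total

# Without prior knowledge: every object is tried against the same fixed order.
blind = {obj: containers for obj in correct_place}
# With (idealized) commonsense: the most plausible container is tried first.
informed = {obj: [place] + [c for c in containers if c != place]
            for obj, place in correct_place.items()}

blind_attempts = attempts_needed(blind)        # 3 + 2 + 1 = 6 attempts
informed_attempts = attempts_needed(informed)  # one attempt per object = 3
```

In the worst case the blind agent needs one attempt per object-container pair, so the gap widens as rooms, objects, and containers are added.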

We then ran a human annotation task to measure human performance on this new environment. Specifically, we set up an interactive interface where human players could play games in the TWC, and logged their actions for problems of different sizes. Annotators played more than 100 games, allowing us to establish a baseline of human performance across different task difficulties: easy, medium, and hard.

Next, we assessed the performance of different Reinforcement Learning AIs on these problems. We evaluated the existing state-of-the-art agents that relied solely on the textual information from the game. And we compared their performance to that of an AI that had access to additional commonsense knowledge from external knowledge sources such as ConceptNet.
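One simple way an agent might use such external knowledge is to rank its candidate commands by whether a knowledge triple supports them. The sketch below uses a few hand-written triples in the style of ConceptNet's `AtLocation` relation; it does not query ConceptNet, and the helper names `commonsense_score` and `rank_commands` are assumptions made for illustration rather than the agents evaluated in the paper.

```python
# Hand-written, ConceptNet-style triples (subject, relation, object).
TRIPLES = [
    ("shirt", "AtLocation", "wardrobe"),
    ("apple", "AtLocation", "fridge"),
    ("knife", "AtLocation", "drawer"),
]

def commonsense_score(command):
    """Return 1.0 if the command puts an object where a triple says it belongs."""
    words = command.lower().split()
    for subject, relation, location in TRIPLES:
        if relation == "AtLocation" and subject in words and location in words:
            return 1.0
    return 0.0

def rank_commands(commands):
    """Sort candidate commands so commonsense-supported ones come first."""
    return sorted(commands, key=commonsense_score, reverse=True)

candidates = [
    "put green shirt in fridge",
    "put green shirt in wardrobe",
    "put green shirt in drawer",
]
ranked = rank_commands(candidates)
```

A text-only agent would treat all three candidates as equally plausible, while the ranked list puts the wardrobe command first.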

This allowed us to determine just how valuable commonsense knowledge really is for an AI. It turned out to be very valuable: in the “easy” setting, agents endowed with commonsense were able to halve the number of steps required to solve problems compared to agents that only had access to the game text. There were also significant score increases across all the difficulty levels with the use of commonsense knowledge. Crucially, these results applied to both in-distribution and out-of-distribution problems, showing that in addition to improving the efficiency of the AI agents, commonsense knowledge also helps them generalize to unseen scenarios, much like humans.

How the AI cleans up the kitchen.

Dealing with noise

While adding common sense to a machine might be important, it’s not always easy.

Most repositories of external commonsense knowledge, like ConceptNet, are crowdsourced through the web. The information can be incomplete or duplicated, and often comes from dubious sources. Sorting through millions of concepts and the relations that connect them is incredibly tricky, and the NLP community is actively working on this problem.

Then there is the challenge of scale. Given the limits on compute power, and the need to balance desired outcomes against the energy and time spent on training algorithms, it is crucial to improve approximation methods and scalability.

Finally, there is a challenge unique to text-based games like TextWorld: matching the human baseline. As our human annotation task showed, humans are able to solve these kinds of tasks quite naturally, and their performance exceeds that of even the best Reinforcement Learning AI. This gap in performance between the very best AIs and humans is a key area for improvement, and we are trying to reduce it by using more information from the game.

To encourage the research community to build on top of the TextWorld Commonsense (TWC) platform that we created, we have open-sourced the platform and all the code, available on GitHub.

Dr. Mrinmaya Sachan, assistant professor of Machine Learning and Natural Language Processing at ETH Zurich, also contributed to this article.




Research Staff Member, IBM Research

Kartik Talamadupula

Research Scientist, IBM Research

Pavan Kapanipathi

Research Staff Member, AI, IBM Research
