In the past few years, advances in artificial intelligence have captured the public imagination and led to widespread acceptance of AI-infused assistants. But this accelerating pace of innovation comes with increased uncertainty about where the technology is headed and how it will impact society.

One of the clearest areas of agreement, however, is that advancing the ability of computers to interact with us in a more natural way is critical for the AI-human relationship to reach its fullest potential. We spoke to 30 of AI’s most knowledgeable scientists and thought leaders about the future of the technology, and most agreed that advances in human-computer interaction (HCI) will be both dependent on AI and essential to progressing its applications.

The consensus is that within three to five years, advances in AI will make the conversational capabilities of computers vastly more sophisticated, paving the way for a sea change in computing. And the key lies in helping machines master one critical element for effective conversation—context.

For people and machines to work together, they need to be able to interact in a much more natural way, and conversation is our go-to way of exchanging information.

— Murray Campbell, IBM Distinguished Researcher and architect of Deep Blue

HCI shifting to conversation, but users expect more

While the move toward conversation might seem like a natural progression of HCI, AI thought leaders point out that talking to machines actually represents a tectonic shift in computing. It marks the first significant departure from the command-based, on-screen interaction we’ve used since the dawn of the modern computing age.

This shift, of course, has already begun. AI-powered assistants on our phones, and more recently in the home, allow us to interact with them conversationally through voice. And AI-infused chatbots let us ask a wide array of questions and receive answers via typed text. But user frustration levels with AI conversational agents are beginning to rise.

“Chatbots were super-hot and now not-quite-as-much,” says Shivon Zilis, partner at AI-focused venture capital firm Bloomberg Beta. “They're seeing some early successes in a few narrow applications like customer support and smart appliances, but people are getting frustrated because they have overly high expectations.”

The conversational interface is not only one of the first places that people are noticing AI, but will also be heavily dependent on AI to make it work.

— Kevin Kelly, co-founder of Wired and author of The Inevitable

The source of such high expectations? Significant advances in machine learning have allowed conversational systems to better recognize speech and transform text into speech—two key elements in natural language processing (NLP). As a result, conversational agents can respond with human-like quickness via voice and text, leading users to wrongly assume these agents are also capable of unbound, back-and-forth exchanges. “Unfortunately I can't yet really have a dialogue with Siri, for example. I can ask her, ‘what is the weather today?’ But I can't then ask, ‘should I wear rain boots’ and get a proper response,” explains Satinder Singh, Director of the Artificial Intelligence Lab at the University of Michigan.

Nor can conversational agents yet meet user expectations related to sensing and responding with emotion. “People identify with and personify computers, and even more so, computer agents,” says David Konopnicki, an IBM Research Manager who studies affective computing. “Even when people know that they are having a conversation with a computer, it’s surprising to see that they not only appreciate that the computer has empathy— they expect it.”

These limitations exist because computers have not yet made the great strides in natural language understanding and dialogue that they’ve achieved in NLP. Without this, most computer responses are painstakingly scripted by engineers using if-then rules. “It’s really difficult to anticipate every way a conversation may go, and if you leave out some critical paths, then you end up with the system saying, ‘I don’t understand’,” says IBM Distinguished Researcher Murray Campbell, who was one of the architects of IBM’s AI chess master, Deep Blue.

Language is a very tough nut to crack because it allows us in a succinct way, without using a whole lot of symbols, to say an extraordinarily diverse set of things.

— Vijay Saraswat, IBM Chief Scientist for Compliance Solutions

Context to the rescue

How do we get beyond “I don’t understand” as a response to unexpected triggers? The key is to better embed a sense of context in conversational systems. Context involves many interconnected layers of accumulated knowledge that humans acquire and apply in conversation with little effort, but that computers cannot yet amass: the purpose of a conversation; where the person you’re speaking with has likely just been, or where they are now; applicable lessons from previous interactions with people who had the same purpose; general information about the world as it relates to this purpose; what has been said earlier in the current conversation and in past conversations with this person; and so much more.

“How do you really carry context throughout a dialogue? This is the biggest challenge,” says Michigan’s Singh. “A lot of how you understand what I’m saying depends on what I said maybe five sentences ago, or fifteen sentences ago. You’re building up a state of the conversation.”
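The state-building Singh describes can be illustrated with a deliberately tiny sketch (hypothetical code, not any real assistant's architecture): facts established in earlier turns accumulate in a dialogue state, and a later follow-up question only becomes answerable against that accumulated context.

```python
# Illustrative sketch only: a dialogue state that accumulates context
# from earlier turns so a later follow-up question can be resolved.
class DialogueState:
    def __init__(self):
        self.facts = {}  # context built up over the conversation

    def observe(self, key, value):
        self.facts[key] = value

    def answer(self, question):
        # "Should I wear rain boots?" only makes sense in light of
        # what was established several turns earlier.
        if "rain boots" in question:
            if self.facts.get("weather") == "rain":
                return "Yes, it's raining today."
            if "weather" in self.facts:
                return "No, the forecast looks dry."
        return "I don't understand."  # no usable context accumulated

state = DialogueState()
state.observe("weather", "rain")  # from an earlier "what is the weather?" turn
print(state.answer("should I wear rain boots?"))
# -> Yes, it's raining today.
```

Without the earlier `observe` call, the same question falls straight back to “I don’t understand”—which is exactly the failure mode context is meant to eliminate.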

It’s not only about getting context from you, but from others, from the environment, from history.

— Michael Karasick, IBM Vice President of Cognitive Computing

Providing computers with context is not a simple exercise—it’s a complex problem for which computer scientists are experimenting with many different potential solutions. They’re looking particularly closely at machine learning algorithms and, more specifically, at training them using two techniques—supervised learning and reinforcement learning—that have been leveraged to teach AI systems to perform many other tasks.

“With reinforcement learning, we would say the system is somehow going to do trial and error learning,” says Singh. “It’s going to try to say things it hasn’t said before, because maybe it helps the dialogue move along faster and more effectively. It’ll try to say things differently. The process helps it learn a mapping to know what to say next, given the context and the overall task goal.”
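Singh’s trial-and-error idea can be sketched in miniature (a hypothetical toy, not a production dialogue system): an epsilon-greedy bandit that learns which of two candidate replies moves the dialogue along, using a simulated reward. A real system would reward signals like task completion or user satisfaction rather than this hard-coded simulation.

```python
import random

# Toy sketch of trial-and-error (reinforcement) learning for dialogue:
# an epsilon-greedy bandit learning which reply works best.
class ReplyBandit:
    def __init__(self, replies, epsilon=0.1):
        self.replies = replies
        self.epsilon = epsilon
        self.value = {r: 0.0 for r in replies}  # estimated reward per reply
        self.count = {r: 0 for r in replies}

    def choose(self):
        if random.random() < self.epsilon:  # explore: "try to say things differently"
            return random.choice(self.replies)
        return max(self.replies, key=lambda r: self.value[r])  # exploit

    def learn(self, reply, reward):
        self.count[reply] += 1
        # incremental average of observed rewards
        self.value[reply] += (reward - self.value[reply]) / self.count[reply]

random.seed(0)
bandit = ReplyBandit(["Sorry, I don't understand.",
                      "Do you mean the weather forecast?"])
for _ in range(500):
    reply = bandit.choose()
    # simulated user: the clarifying question usually helps, the non-answer never does
    reward = 1.0 if reply.startswith("Do you mean") and random.random() < 0.8 else 0.0
    bandit.learn(reply, reward)

best = max(bandit.replies, key=lambda r: bandit.value[r])
print(best)  # the clarifying question should win out
```

The key property is that the system was never told which reply is better; it discovered it by trying both and observing what “helps the dialogue move along.”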

Supervised learning involves teaching computers through examples. “Say a company that makes computers has a service help line,” Singh explains. “In principle, I could record gazillions of customer-agent dialogues from a call center. In a supervised learning approach, I could look at, ‘when the dialog was in a certain context, what did the call center agent say?’ The system could then learn to imitate that call center agent.”
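Singh’s call-center example can likewise be reduced to a minimal sketch (hypothetical data and a frequency count standing in for a learned model): given logged (context, agent reply) pairs, imitate whatever agents most often said in that context.

```python
from collections import Counter, defaultdict

# Toy sketch of the supervised, learn-by-example approach: imitate
# call-center agents from logged (context, agent_reply) pairs.
# "Context" here is just the customer's last utterance; a real system
# would use richer dialogue state and a trained model, not counts.
logs = [
    ("my laptop won't turn on", "Is the battery charged and plugged in?"),
    ("my laptop won't turn on", "Is the battery charged and plugged in?"),
    ("my laptop won't turn on", "Have you tried holding the power button?"),
    ("screen is flickering",    "Can you update your display driver?"),
]

replies = defaultdict(Counter)
for context, agent_reply in logs:
    replies[context][agent_reply] += 1  # tally what agents said per context

def imitate(context):
    # respond the way human agents most often responded in this context
    if context in replies:
        return replies[context].most_common(1)[0][0]
    return "I don't understand."

print(imitate("my laptop won't turn on"))
# -> Is the battery charged and plugged in?
```

The contrast with the reinforcement approach is the training signal: here the system copies recorded human behavior, rather than discovering good replies through its own trial and error.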

Yoshua Bengio, deep learning pioneer and University of Montreal professor, anticipates that within the next five years we’ll see systems that can both understand natural language and do a good job of generating it. According to Mark Sagar, CEO of AI startup Soul Machines and winner of two Academy Awards for his human-like animation work, AI systems could very soon have reasonably natural conversations in certain domains, grounded in context.

The continued development of Internet of Things capabilities will also help supply AI systems with more context, as sensors send contextual data about people and objects to AI systems. Singh explains, “As these things talk to each other, they accumulate a sense of context, and as that context becomes available for a dialogue-like interaction, dialogs will become more useful and better. And there'll be a virtuous cycle. As they become more useful and better, we'll engage with them more.”

A future of powerful AI assistants

In the short term, conversation grounded in broader context will give rise to personal assistants with more robust utility—both in our work and personal lives. The AI experts see health, legal, education and even AI research as the best early fits for workplace assistants.

“If you can use AI to read 400,000 research papers automatically, organize the knowledge and then combine your intuition with machine learning, you can sharpen the research field—instead of fanning out for a solution, you fan in. This is what I believe is going to be really game changing for research in the future,” says IBM Cognitive Solutions Research Manager Costas Bekas.

And in our personal lives, AI assistants will provide guidance in personal decision-making and, later, even physical support.

“We’re making all these information-based decisions every day, whether it is on the job or personal,” says IBM Chief Science Officer of Cognitive Computing Guru Banavar. “So, for example, if I want to think about which school to send my kids to, that’s a complex decision that I’d love to get help with from AI.”

“I also think a lot about the eldercare scenario,” says IBM’s Campbell. “As people get older, they’ll have specific physical and cognitive limitations. Robotic and AI systems can greatly improve their quality of life.”

Our interactions will be way more conversational, much more multi-modal. Apps will be able to pick up on our gestures, our facial expressions, our emotions, what is being said in our voice.

— Gabi Zijderveld, Chief Marketing Officer, Affectiva

A future of pervasive AI

The ability of humans to interact with AI systems effectively through verbal, contextual conversation will allow scientists to embed the technology into objects around us, whether or not they have a screen.

“Imagine artificial intelligence and cognitive power in an avatar, an object in your hand, a robot, or even in the walls of an operating room, conference room or spacecraft,” explains IBM Chief Watson Scientist Grady Booch. “If I’m a psychologist and I want to detect who is in a room, who is looking at whom, who is in a clique with one another, or talking to one another, I’d have a cognitive assistant in the walls.”

One of the major groundbreakers is going to be our ability to truly converse with artificial intelligence embedded in the fabric around us—this will be far bigger than people realize right now.

— Arvind Krishna, Senior Vice President of Hybrid Cloud and Director of IBM Research

“Think about ten years from now, when I can walk into the house and basically just talk and control all kinds of machines and devices,” says Michigan’s Singh. “And I can tell my washing machine to just run at 8:00 pm, or ask ‘when did my daughter come home? When did she leave? Did she seem happy? Who was she with when she went out?’”

What’s clear to the experts as they look to both the near and distant future of a contextual and omnipresent AI is that the relationship between humans and computers will remain interactive and collaborative.

“The AI systems that we’re developing are going to have strengths and weaknesses,” says IBM’s Campbell. “And people have strengths and weaknesses. So the biggest impact will come from figuring out the best way to have people and computers working well together.”
