Talking to AI – Where natural language falls short


User interfaces to artificial intelligence (AI) applications are increasingly using natural language. But while today’s chatbots and AI agents can recognize natural language, like English or Spanish, they still cannot engage with humans in natural conversation. For example, virtual agents often give away the punchline of their own jokes.

Today’s Chatbot
1 Human: Tell me a joke.
2 Agent: Why can’t you trust atoms? They make up everything.

Systems today generally use a simple two-turn command paradigm: the human’s turn followed by the chatbot’s turn. If the user attempts something that requires more than a single response, like a question-answer joke, the chatbot crams multiple turns into one. While this paradigm may suffice for a simple voice control interface (e.g., “Play Lana Del Rey”, “Tune radio to 88.5 FM” or “Find directions to the nearest gas station”), it cannot support a conversational interface, which often requires more than two turns.

Natural conversation
1 Human: Tell me a joke.
2 Agent: Why can’t you trust atoms?
3 Human: They’re too small?
4 Agent: Nope.
5 Human: I don’t know.
6 Agent: They make up everything.
7 Human: haha
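The transcript above can be approximated with a small state machine. This is a minimal Python sketch, not how Watson Assistant implements it: the names Joke and JokeAgent, the "joke" keyword trigger and the set of give-up phrases are all hypothetical. The point it illustrates is that the agent remembers the pending punchline between turns instead of delivering setup and punchline in a single reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Joke:
    setup: str
    punchline: str


class JokeAgent:
    """Hypothetical multi-turn joke teller that remembers the owed punchline
    across turns instead of cramming setup and punchline into one reply."""

    # Hypothetical phrases that signal the user is giving up on guessing.
    GIVE_UP = {"i don't know", "i give up", "no idea", "tell me"}

    def __init__(self, joke: Joke) -> None:
        self.joke = joke
        self.pending: Optional[Joke] = None  # joke whose punchline is still owed

    @staticmethod
    def _normalize(utterance: str) -> str:
        # Lowercase, unify curly apostrophes, drop trailing punctuation.
        return utterance.strip().lower().replace("\u2019", "'").rstrip("?.!")

    def respond(self, utterance: str) -> str:
        text = self._normalize(utterance)
        if self.pending is None:
            if "joke" in text:
                self.pending = self.joke
                return self.joke.setup  # setup only; punchline is held back
            return "Want to hear a joke?"
        if text in self.GIVE_UP:
            punchline = self.pending.punchline
            self.pending = None  # sequence complete
            return punchline
        if text == self._normalize(self.pending.punchline):
            self.pending = None
            return "You got it!"
        return "Nope."  # wrong guess; keep waiting for another turn
```

Because the punchline is held in `pending`, the user can guess (turn 3) or give up (turn 5) before the agent finishes the joke.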

A conversational interface requires the agent to remember the current context across turns. Not only does this enable AI agents to tell jokes or handle complex requests, it also lets users keep their turns minimal by building on that context (e.g., “what about Muse?” or “how about 106.7?”). In addition, remembering context is necessary for performing a variety of conversation management actions, such as repairing ordinary troubles in hearing or understanding (e.g., “say again,” “what do you mean?” or “what do you mean by playlist?”). It’s not a natural conversation if the user must speak in complete sentences or if the agent cannot clarify what it just said.
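To make context carry-over concrete, here is a minimal Python sketch that remembers the last intent and the last agent utterance, so elliptical follow-ups like “what about Muse?” and repairs like “say again” work. The class ContextualAgent, its prefix matching and its reply wording are hypothetical illustrations; a real assistant would recognize intents and entities with a trained model rather than string prefixes.

```python
from typing import Optional


class ContextualAgent:
    """Hypothetical sketch: keeps the last intent and the last agent utterance
    so users can build on context instead of speaking in complete sentences."""

    def __init__(self) -> None:
        self.last_intent: Optional[str] = None  # e.g. "play" or "tune"
        self.last_reply: Optional[str] = None   # needed to repeat on request

    def _say(self, reply: str) -> str:
        self.last_reply = reply
        return reply

    def respond(self, utterance: str) -> str:
        text = utterance.strip().rstrip("?.!")
        lowered = text.lower()
        # Repair: repeat the prior utterance without changing the context.
        if lowered in {"say again", "what did you say"}:
            return self.last_reply or "I haven't said anything yet."
        # Full commands set the intent that later follow-ups can reuse.
        if lowered.startswith("play "):
            self.last_intent = "play"
            return self._say(f"Playing {text[len('play '):]}.")
        if lowered.startswith("tune radio to "):
            self.last_intent = "tune"
            return self._say(f"Tuning to {text[len('tune radio to '):]}.")
        # Elliptical follow-up: reuse the remembered intent with a new value.
        for prefix in ("what about ", "how about "):
            if lowered.startswith(prefix) and self.last_intent:
                value = text[len(prefix):]
                verb = "Playing" if self.last_intent == "play" else "Tuning to"
                return self._say(f"{verb} {value}.")
        return self._say("Sorry, I didn't get that.")
```

The same follow-up (“how about …”) is interpreted differently depending on which intent was active, which a stateless two-turn command system cannot do.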

Watson Assistant (previously known as Watson Conversation) has supported multi-turn, context-persistent conversation since the beginning. Now, our team at IBM Research–Almaden is building the next generation of conversational AI interfaces by applying conversation science to user experience (UX) design. Today’s UX design deals largely with visual interfaces, such as desktop, web, or mobile. But patterns of visual interaction are of limited use for conversational interfaces, in which the user experience consists primarily of sequences of utterances. What is needed instead is formal knowledge of human conversation. So we are adapting patterns of natural human conversation, as documented in the field of Conversation Analysis, into conversational UX patterns.


Bob speaking about conversational UX at IBM Research-Almaden.

Conversation analysts have been documenting naturally occurring forms of human talk-in-interaction for over 50 years. However, they write primarily for a social science audience, so the Conversation Analysis literature is not easily accessible to UX designers. To bridge this disciplinary gap, we are collaborating across our respective disciplines, sociology and UX design, to adapt this sociological literature for the purposes of UX design.

We have created a Natural Conversation Framework for conversational UX design. It provides a library of generic conversational UX patterns that are independent of any particular technology platform and that are inspired by natural human conversation patterns documented in the Conversation Analysis literature. The framework consists of four main components: 1) an Interaction Model, 2) Conversation Navigation, 3) Common Activities and 4) Sequence Metrics.

The interaction model of natural conversation consists of expandable action pairs. For example, when one person extends an invitation, the acceptance or declination of that invitation may come in the next turn, or it may not. The invitation-acceptance/declination pair may be expanded in the middle (e.g., “What time?”, “Where to?”, “Who else is going?”), before the invitation (e.g., “Are you busy tonight?”) or after the acceptance/declination (e.g., “Great!”, “Maybe next time.”). So a conversational system must support action pairs that different users can expand as needed as they probe for details or seek clarification.
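One way to represent an expandable action pair as a data structure is sketched below in Python. It is an illustrative assumption, not the framework’s actual representation: ActionPair and its fields are hypothetical. Pre- and insert-expansions are modeled as nested pairs, so they can themselves be expanded, while post-expansions are single closing turns such as “Great!”. Example utterances not quoted in the text above are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActionPair:
    """Hypothetical expandable action pair: a first part (e.g. an invitation)
    and an optional second part (acceptance/declination), plus expansions."""
    first: str
    second: Optional[str] = None  # may not come in the next turn, or at all
    pre: List["ActionPair"] = field(default_factory=list)     # before the first part
    insert: List["ActionPair"] = field(default_factory=list)  # between the two parts
    post: List[str] = field(default_factory=list)             # after the second part

    def turns(self) -> List[str]:
        """Flatten the pair, recursively, into the order the turns occur."""
        out: List[str] = []
        for pair in self.pre:
            out += pair.turns()
        out.append(self.first)
        for pair in self.insert:
            out += pair.turns()
        if self.second is not None:
            out.append(self.second)
        out += self.post
        return out


invitation = ActionPair(
    first="Want to see a movie tonight?",
    second="Sure!",
    pre=[ActionPair("Are you busy tonight?", "No, why?")],
    insert=[ActionPair("What time?", "Seven."),
            ActionPair("Who else is going?", "Just us.")],
    post=["Great!"],
)
```

Here `invitation.turns()` yields the nine turns in conversational order, from “Are you busy tonight?” to “Great!”. A bare `ActionPair("Want to come?")` with no expansions is still a valid, unanswered pair.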

Different types of user interfaces, such as command line, graphical and mobile, require different navigation methods, whether changing directories, dragging icons, or tapping the Home button. But how should users navigate a conversational interface? In the Natural Conversation Framework, users can rely on six actions that they can perform at any time to get around the application: What can you do? (Capability Check), What did you say? (Repeat), What do you mean? (Paraphrase), Okay/Thanks (Sequence Close), Never mind (Sequence Abort) and Goodbye (Conversation Close). Any conversational system should be able to talk about its own capabilities, repeat its prior utterance, paraphrase its prior utterance, move on to a new topic, recognize an attempt to abort a failed sequence and end the current conversation.
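The six navigation actions can be checked before any domain-specific intent, which is what makes them available at any time. The Python sketch below assumes simple phrase matching in place of a trained intent classifier; NavAction, TRIGGERS, classify and Navigator are hypothetical names, and the reply wording is illustrative only, not Watson Assistant’s API or the framework’s actual responses.

```python
from enum import Enum
from typing import Optional


class NavAction(Enum):
    """The six navigation actions named in the Natural Conversation Framework."""
    CAPABILITY_CHECK = "capability check"
    REPEAT = "repeat"
    PARAPHRASE = "paraphrase"
    SEQUENCE_CLOSE = "sequence close"
    SEQUENCE_ABORT = "sequence abort"
    CONVERSATION_CLOSE = "conversation close"


# Hypothetical trigger phrases; a production system would use a trained classifier.
TRIGGERS = {
    NavAction.CAPABILITY_CHECK: ("what can you do",),
    NavAction.REPEAT: ("what did you say", "say again"),
    NavAction.PARAPHRASE: ("what do you mean",),
    NavAction.SEQUENCE_CLOSE: ("okay", "ok", "thanks", "thank you"),
    NavAction.SEQUENCE_ABORT: ("never mind", "nevermind"),
    NavAction.CONVERSATION_CLOSE: ("goodbye", "bye"),
}


def classify(utterance: str) -> Optional[NavAction]:
    """Return the navigation action an utterance performs, or None if it is a
    domain request that the application's own intents should handle."""
    text = utterance.strip().lower().rstrip("?.!")
    for action, phrases in TRIGGERS.items():
        # Match the whole utterance or a phrase followed by more words, so
        # "what do you mean by playlist?" still counts as a paraphrase request.
        if any(text == p or text.startswith(p + " ") for p in phrases):
            return action
    return None


class Navigator:
    """Answers navigation actions from remembered context, at any point."""

    def __init__(self, capabilities: str) -> None:
        self.capabilities = capabilities
        self.last_reply = ""       # used by Repeat
        self.last_paraphrase = ""  # used by Paraphrase
        self.in_sequence = False   # is an activity currently under way?

    def say(self, reply: str, paraphrase: str) -> str:
        """Record an agent utterance together with a simpler rewording of it."""
        self.last_reply, self.last_paraphrase = reply, paraphrase
        self.in_sequence = True
        return reply

    def navigate(self, utterance: str) -> Optional[str]:
        action = classify(utterance)
        if action is NavAction.CAPABILITY_CHECK:
            return self.capabilities
        if action is NavAction.REPEAT:
            return self.last_reply
        if action is NavAction.PARAPHRASE:
            return self.last_paraphrase
        if action is NavAction.SEQUENCE_CLOSE:
            self.in_sequence = False  # ready to move on to a new topic
            return "Anything else?"
        if action is NavAction.SEQUENCE_ABORT:
            self.in_sequence = False  # drop the failed sequence
            return "Okay, let's drop that."
        if action is NavAction.CONVERSATION_CLOSE:
            self.in_sequence = False
            return "Goodbye!"
        return None  # not navigation; hand off to domain intents
```

Returning `None` for anything that is not a navigation action keeps these six actions layered on top of whatever activities the application supports.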

The framework also contains reusable UX patterns for Common Activities, such as conversation opening and closing, inquiring, order placing, troubleshooting, quizzing, storytelling and more. Finally, the framework provides a novel set of Sequence Metrics which enable analytics about the effectiveness and efficiency of the conversation itself.

We have applied the Natural Conversation Framework to over 20 conversational agents to date. United Airlines recently demonstrated the technology in a virtual travel agent application at the THINK Conference in Las Vegas. The Natural Conversation Framework is implemented using Watson Assistant.

You can read much more about this emergent field at the Conversational UX Design project. And you can join me, Bob, for my tutorial in Las Vegas at HCI International in July. We would love to have you join the conversation.
