User interfaces to artificial intelligence (AI) applications are increasingly using natural language. But while today’s chatbots and AI agents can recognize natural language, like English or Spanish, they still cannot engage with humans in natural conversation. For example, virtual agents often give away the punchline of their own jokes.
1 Human: Tell me a joke.
2 Agent: Why can’t you trust atoms? They make up everything.
Systems today generally use a two-turn, simple command paradigm: the human’s turn followed by the chatbot’s turn. If the user attempts something that requires more than a single response, like a question-answer joke, the chatbot will cram multiple turns into one. While this paradigm may suffice for a simple voice control interface (e.g., “Play Lana Del Rey”, “Tune radio to 88.5 FM” or “Find directions to the nearest gas station”), it cannot support a conversational interface, which often requires more than two turns.
1 Human: Tell me a joke.
2 Agent: Why can’t you trust atoms?
3 Human: They’re too small?
4 Agent: Nope.
5 Human: I don’t know.
6 Agent: They make up everything.
7 Human: haha
A conversational interface requires the agent to remember the current context across turns. Not only does this enable AI agents to tell jokes or handle complex requests, it also lets users keep their turns minimal by building on that context (e.g., “what about Muse?” or “how about 106.7?”). In addition, remembering context is necessary for performing a variety of conversation management actions, such as repairing normal troubles in hearing or understanding (e.g., “say again,” “what do you mean?” or “what do you mean by playlist?”). It’s not a natural conversation if the user must speak in complete sentences or if the agent cannot clarify what it just said.
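To make the role of context concrete, here is a minimal sketch in Python of an agent that remembers where it is in the joke and what it last said. It is an illustration only, not how Watson Assistant works internally: the class `JokeAgent`, its fields, and the small set of "give up" phrases are all assumptions made for this example.

```python
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, drop trailing punctuation."""
    return text.strip().lower().replace("’", "'").rstrip("?.!")


@dataclass
class JokeAgent:
    """Hypothetical toy agent that keeps context between turns."""
    setup: str = "Why can't you trust atoms?"
    punchline: str = "They make up everything."
    pending_punchline: bool = False  # context: are we in the middle of a joke?
    last_utterance: str = ""         # context: what the agent said last

    def respond(self, user_text: str) -> str:
        text = normalize(user_text)
        if text == "say again":
            # Repair: repeat the prior utterance without changing the joke state.
            return self.last_utterance or "I haven't said anything yet."
        if text == "tell me a joke":
            self.pending_punchline = True
            reply = self.setup
        elif self.pending_punchline:
            if text in {"i don't know", "i give up", "why"}:
                # The user gives up, so the punchline is finally delivered.
                self.pending_punchline = False
                reply = self.punchline
            else:
                # Any other reply is treated as a wrong guess.
                reply = "Nope."
        else:
            reply = "Okay."
        self.last_utterance = reply
        return reply


agent = JokeAgent()
for turn in ["Tell me a joke.", "They’re too small?", "say again", "I don’t know."]:
    print("Human:", turn)
    print("Agent:", agent.respond(turn))
```

Because the punchline is held back until the user gives up, the joke unfolds over several turns, and "say again" can be answered at any point because the last utterance is stored.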
Watson Assistant (previously known as Watson Conversation) has supported multi-turn, context-persistent conversation since the beginning. Now, our team at IBM Research–Almaden is building the next generation of conversational AI interfaces by applying conversation science to user experience (UX) design. Today’s UX design deals largely with visual interfaces, such as desktop, web, or mobile. But patterns of visual interaction are not so useful for conversational interfaces wherein the user experience consists primarily of sequences of utterances. Instead, formal knowledge of human conversation is needed. So we are applying patterns of natural human conversation, as specified by the field of Conversation Analysis, to conversational UX patterns.
Conversation analysts have been documenting naturally occurring forms of human talk-in-interaction for over 50 years. However, they write primarily for a social science audience. Consequently, the conversation analysis literature is not easily accessible to UX designers. To bridge this disciplinary gap, we are collaborating across our respective disciplines, sociology and UX design, to adapt this sociological literature for the purposes of UX design.
We have created a Natural Conversation Framework for conversational UX design. It provides a library of generic conversational UX patterns that are independent of any particular technology platform and that are inspired by natural human conversation patterns documented in the Conversation Analysis literature. The framework consists of four main components: 1) an Interaction Model, 2) Conversation Navigation, 3) Common Activities and 4) Sequence Metrics.
The interaction model of natural conversation consists of expandable action pairs. For example, when one extends an invitation, the acceptance or declination of that invitation may come in the next turn, or it may not. The action pair, invitation-acceptance/declination, may be expanded in the middle (e.g., “What time?”, “Where to?”, “Who else is going?”), before the invitation (e.g., “Are you busy tonight?”) or after the acceptance/declination (e.g., “Great!”, “Maybe next time.”). So the conversational system must support action pairs that can be expanded as needed by different users, as they probe for details or seek clarification.
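One way to picture an expandable action pair is as a small data structure with a base pair plus optional expansions before, inside, and after it. The Python sketch below is only an illustration under that assumption: the class `ActionPair`, its field names, and the invitation wording and answers in the example are invented here, not taken from the framework itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ActionPair:
    """Hypothetical model of a base action pair with three expansion slots."""
    first: str                                   # first part, e.g. the invitation
    second: Optional[str] = None                 # second part, once it is produced
    pre: List[Tuple[str, str]] = field(default_factory=list)      # before the first part
    inserts: List[Tuple[str, str]] = field(default_factory=list)  # between the two parts
    post: List[str] = field(default_factory=list)                 # after the second part

    def is_complete(self) -> bool:
        """The pair is complete once the second part has been given."""
        return self.second is not None

    def turns(self) -> List[str]:
        """Flatten the sequence into the order in which the turns are spoken."""
        out: List[str] = []
        for question, answer in self.pre:
            out += [question, answer]
        out.append(self.first)
        for question, answer in self.inserts:
            out += [question, answer]
        if self.second is not None:
            out.append(self.second)
        out += self.post
        return out


# Example wording below is made up for illustration.
invite = ActionPair(first="Want to see a movie?")
invite.pre.append(("Are you busy tonight?", "No, why?"))
invite.inserts.append(("What time?", "Seven."))
print(invite.is_complete())   # the acceptance has not come yet
invite.second = "Sure."
invite.post.append("Great!")
for turn in invite.turns():
    print(turn)
```

The same two-part sequence can stay compact for one user and grow for another, which is the property the interaction model asks a conversational system to support.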
Different types of user interfaces (e.g., command line, graphical, mobile) require different navigation methods, whether it is changing directories, dragging icons, or tapping the Home button. But how should users navigate a conversational interface? In the Natural Conversation Framework, users can rely on six actions that they can perform at any time to get around the application: What can you do? (Capability Check), What did you say? (Repeat), What do you mean? (Paraphrase), Okay/Thanks (Sequence Close), Never mind (Sequence Abort) or Goodbye (Conversation Close). Any conversational system should be able to talk about its own capabilities, repeat its prior utterance, paraphrase its prior utterance, move on to a new topic, recognize an attempt to abort a failed sequence and end the current conversation.
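The six navigation actions can be sketched as a simple lookup from what the user says to the action it triggers. The Python below is a rough illustration, not the framework's implementation: the names `NavAction`, `PHRASES`, and `detect_navigation` are hypothetical, and a real system would use intent recognition rather than exact phrase matching.

```python
from enum import Enum
from typing import Optional


class NavAction(Enum):
    """The six navigation actions named in the text."""
    CAPABILITY_CHECK = "Capability Check"
    REPEAT = "Repeat"
    PARAPHRASE = "Paraphrase"
    SEQUENCE_CLOSE = "Sequence Close"
    SEQUENCE_ABORT = "Sequence Abort"
    CONVERSATION_CLOSE = "Conversation Close"


# Exact-match phrases, for illustration only.
PHRASES = {
    "what can you do": NavAction.CAPABILITY_CHECK,
    "what did you say": NavAction.REPEAT,
    "what do you mean": NavAction.PARAPHRASE,
    "okay": NavAction.SEQUENCE_CLOSE,
    "thanks": NavAction.SEQUENCE_CLOSE,
    "never mind": NavAction.SEQUENCE_ABORT,
    "goodbye": NavAction.CONVERSATION_CLOSE,
}


def detect_navigation(utterance: str) -> Optional[NavAction]:
    """Return the navigation action for an utterance, or None if it is not one."""
    key = utterance.strip().lower().rstrip("?.!")
    return PHRASES.get(key)


for said in ["What did you say?", "Never mind.", "Play Lana Del Rey"]:
    action = detect_navigation(said)
    print(said, "->", action.value if action else "not a navigation action")
```

Checking every user turn against these actions before any task-specific handling is what makes them available at any time, regardless of the activity in progress.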
The framework also contains reusable UX patterns for Common Activities, such as conversation opening and closing, inquiring, order placing, troubleshooting, quizzing, storytelling and more. Finally, the framework provides a novel set of Sequence Metrics which enable analytics about the effectiveness and efficiency of the conversation itself.
We have applied the Natural Conversation Framework to over 20 conversational agents to date. United Airlines recently demonstrated the technology at the THINK Conference in Las Vegas, in a virtual travel agent application. The Natural Conversation Framework is implemented using Watson Assistant.
You can read much more about this emergent field at the Conversational UX Design project. And you can join me, Bob, for my tutorial in Las Vegas at HCI International in July. We would love to have you join the conversation.