Overcoming Challenges in Building Enterprise AI Assistants


Advances in Artificial Intelligence (AI) have made it easy to ask AI assistants, whether on our phones or home devices, simple questions such as “What’s the weather like today?” In enterprise settings, however, the goal is to develop end-to-end dialog systems that automate some interactions and hand the dialog to a human when necessary. This objective becomes much more challenging in group chat settings or in the case of out-of-context conversations.

Our team of researchers from IBM Research AI and the University of Michigan, an AI Horizons Network partner, published the papers “A Large-Scale Corpus for Conversation Disentanglement” and “Learning End-to-End Goal-Oriented Dialog with Maximal User Task Success and Minimal Human Agent Use” at the Association for Computational Linguistics conference (ACL 2019). This work addresses two main challenges in existing state-of-the-art AI assistants by:

  1. developing robust data-driven methods for conversation disentanglement, and
  2. proposing an end-to-end trainable method for neural goal-oriented dialog systems, which handles new user behaviors at deployment by transferring the dialog to a human agent intelligently.

Challenge #1: Conversation Disentanglement

When a group of people communicates in a common channel, multiple conversations occur concurrently. These entangled conversations need to be separated before an AI assistant can consume them. Although conversation disentanglement is an important problem, it has been understudied because of the lack of public, annotated datasets.

With this work, we introduce the first large-scale manually annotated dataset of 77,563 messages with reply-structure graphs that disentangle conversations and define internal conversation structure. The dataset contains 74,963 messages from the #Ubuntu Internet Relay Chat (IRC) channel, and 2,600 messages from the #Linux IRC channel. This corpus is 16 times larger than all previously released datasets combined, and the first to include context and adjudicated annotations.

The following is an example from the data, with annotations marked by edges and colors:

Figure 1: #Ubuntu IRC log sample, earliest message first. Curved lines are our graph annotations of reply structure, which define two conversations shown with blue solid edges and green dashed edges.

In this example, each message can be represented by a node; an edge indicates that one message is a response to the other. Each connected component corresponds to a disentangled conversation.
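The node-and-edge view above maps directly onto a connected-components computation. The following is a minimal sketch, not the annotation tooling or the model from the paper: it assumes messages are numbered from 0 in log order and that reply edges are given as hypothetical `(reply, parent)` index pairs.

```python
from collections import defaultdict

def disentangle(num_messages, reply_edges):
    """Group message indices into conversations (connected components).

    reply_edges is an assumed input format: (reply_index, parent_index) pairs.
    """
    parent = list(range(num_messages))  # union-find forest, one node per message

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for reply, replied_to in reply_edges:
        root_a, root_b = find(reply), find(replied_to)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(list)
    for msg in range(num_messages):
        groups[find(msg)].append(msg)
    return sorted(groups.values())

# Toy log: messages 1 and 3 reply to message 0; message 4 replies to message 2.
conversations = disentangle(5, [(1, 0), (3, 0), (4, 2)])
print(conversations)  # [[0, 1, 3], [2, 4]]
```

A message with no edges ends up as its own single-message conversation, which matches the connected-component definition.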

The log sample above shows behavior that is common in single-channel, multi-party conversation. For example, BurgerMann receives multiple responses from multiple people for one message. We also see two of the users, delire and Seveas, simultaneously participating in two conversations.

Annotating the #Linux data enables comparison with Elsner and Charniak (2008) [1], while the #Ubuntu channel, which contains over 34 million messages, makes this an interesting large-scale resource for dialogue research. We performed the first empirical analysis of Lowe et al.’s (2015 [2], 2017 [3]) widely used, heuristically disentangled conversations on the #Ubuntu channel. One of the key findings is that only 20 percent of the conversations their method produces are true prefixes of conversations.
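To make the "true prefix" measurement concrete, here is a small sketch of how such a fraction could be computed. The definition used here is an assumption for illustration (a candidate counts as a true prefix if its messages, in order, match the start of some annotated conversation), and the function names are hypothetical rather than taken from the paper.

```python
def is_true_prefix(candidate, gold_conversations):
    """True if candidate equals the first len(candidate) messages of a gold conversation."""
    if not candidate:
        return False  # an empty extraction is not counted as a prefix
    return any(gold[:len(candidate)] == candidate for gold in gold_conversations)

def prefix_fraction(candidates, gold_conversations):
    """Fraction of heuristically extracted conversations that are true prefixes."""
    if not candidates:
        return 0.0
    hits = sum(is_true_prefix(c, gold_conversations) for c in candidates)
    return hits / len(candidates)

# Toy data: message ids in log order, not real corpus content.
gold = [[0, 1, 3], [2, 4]]
extracted = [[0, 1], [2, 4], [0, 3], [1, 3]]
print(prefix_fraction(extracted, gold))  # 0.5
```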

The models we developed have already enabled new directions in dialogue research, providing disentangled conversations for DSTC 7 track 1 (Gunasekara et al., 2019 [4]; Yoshino et al., 2018 [5]) and for the currently running DSTC 8 track 2. This work fills a key gap that has limited research, providing a new opportunity for understanding synchronous multi-party conversations online.

Challenge #2: An End-to-End Dialog System That Handles New User Behavior at Deployment with Human-Agent Intervention

Neural end-to-end goal-oriented dialog systems show promise to reduce the workload of human agents for customer service, as well as reduce wait time for users. However, their inability to handle new user behavior has limited their usage for practical applications. In this work, we propose a new end-to-end trainable method for neural goal-oriented dialog systems, which handles new user behaviors by intelligently transferring the dialog to a human agent.

During deployment, the dialog system can automatically identify a new user behavior that it might fail on and transfer the task to a human agent, so that the user’s task is still completed. At the same time, the dialog system learns from the human agent’s response so that it can handle that new user behavior in the future. Our method also allows designers of AI assistants to choose the trade-off between maximizing their users’ task success and minimizing the workload on human agents. The proposed method has three goals:

  1. maximize the user’s task success by transferring to human agents,
  2. minimize the load on the human agents by transferring to them only when it is essential, and
  3. learn online from the human agent’s responses to reduce human agents’ load further.

We evaluate our proposed method on a modified-bAbI dialog task that simulates the scenario of new user behaviors occurring at test time.

Figure 2: Modified-bAbI dialog task. A user (in green) chats with a dialog system (in blue) to book a table at a restaurant.

Figure 2 shows a dialog sample from the modified-bAbI dialog tasks, an extension of the original-bAbI dialog tasks (Bordes et al., 2017) [6]. We modify the original-bAbI dialog tasks by removing and replacing certain user behaviors in the training and validation data. The test set is left untouched. This simulates a scenario in which new user behaviors that were not seen during training arise at test (deployment) time, and hence allows us to test our proposed method. It also mimics real-world data collection via crowdsourcing, in the sense that certain user behavior is missing from the training data.
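The split construction can be pictured with a simplified sketch. It is not the authors' preprocessing: it assumes each dialog is a dictionary with a hypothetical `behaviors` list, and it drops whole dialogs that contain a held-out behavior, whereas the actual tasks remove and replace the behaviors themselves.

```python
def drop_held_out(dialogs, held_out):
    """Keep only dialogs that contain none of the held-out user behaviors."""
    return [d for d in dialogs if not (set(d["behaviors"]) & held_out)]

def make_modified_splits(train, valid, test, held_out):
    """Filter training and validation data; leave the test set untouched."""
    return drop_held_out(train, held_out), drop_held_out(valid, held_out), list(test)

# Toy dialogs; the behavior labels are invented for illustration.
train = [
    {"id": "t1", "behaviors": ["greet", "request"]},
    {"id": "t2", "behaviors": ["greet", "change_mind"]},
]
valid = [{"id": "v1", "behaviors": ["change_mind"]}]
test = [{"id": "x1", "behaviors": ["change_mind"]}]

new_train, new_valid, new_test = make_modified_splits(train, valid, test, {"change_mind"})
print([d["id"] for d in new_train])  # ['t1']
print(len(new_valid), len(new_test))  # 0 1
```

After filtering, the held-out behavior appears only in the test set, so a model trained on the new splits meets it for the first time at test time.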

Figure 3: Left: A single-layer version of memN2N [6] (our baseline model, M). Right: Our proposed method.

Our proposed method is shown in Figure 3 (right). Consider a neural dialog model (M) trained for a goal-oriented dialog task, and a human agent (H) who is trained for the same task. Both (M) and (H) can take the dialog as input and produce a response to the user utterance (u). A neural classifier (C) takes the dialog state vector (s) from the model (M) as input and decides whether to use the model (M) to respond to the user or to transfer the dialog to the human agent (H), who then provides the response.
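The routing between (M) and (H) can be sketched in a few lines. This is an illustrative stand-in under stated assumptions, not the method from the paper: the classifier here is a fixed linear scorer with a sigmoid, the `threshold` knob is a hypothetical way to express the success-versus-workload trade-off, and `encode_state`, `model_reply`, and `human_reply` are placeholder callables supplied by the caller.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class TransferClassifier:
    """Stand-in for classifier C: scores the dialog state vector s."""
    weights: List[float]
    bias: float = 0.0
    threshold: float = 0.5  # assumed knob: lower values transfer to the human more often

    def transfer_probability(self, state):
        score = sum(w * s for w, s in zip(self.weights, state)) + self.bias
        return 1.0 / (1.0 + math.exp(-score))

    def should_transfer(self, state):
        return self.transfer_probability(state) >= self.threshold

def respond(dialog, encode_state, model_reply, human_reply, classifier, feedback_log):
    """Return (responder, reply), using H only when C asks for a transfer."""
    state = encode_state(dialog)
    if classifier.should_transfer(state):
        reply = human_reply(dialog)
        feedback_log.append((list(dialog), reply))  # kept for a later online update of M
        return "H", reply
    return "M", model_reply(dialog)

# Toy components: the state is 1.0 when the last utterance looks unfamiliar.
encode = lambda dialog: [1.0 if "actually" in dialog[-1] else -1.0]
model = lambda dialog: "api_call italian paris"
human = lambda dialog: "Sure, let me update your request."
clf = TransferClassifier(weights=[4.0])
log = []

print(respond(["book a table"], encode, model, human, clf, log)[0])            # M
print(respond(["actually, change that"], encode, model, human, clf, log)[0])   # H
print(len(log))  # 1
```

Logging the human agent's reply alongside the dialog is what would let the model learn the new behavior later; how that update is performed is left out of this sketch.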

Our proposed method provides a new framework for learning and training goal-oriented dialog systems for the real world. It maximizes the user success rate while using human agents as little as possible, calling on them instead of the dialog model only in cases where the model might fail. Our evaluation on the modified-bAbI dialog task shows that the method is effective in achieving the desired goals, and that it lets the designer set the trade-off between the desired user task success and human agent workload. We believe this opens up a new and promising research direction that could soon spark an increase in the use of end-to-end goal-oriented dialog systems in the real world.

The TACL-accepted paper, “Learning End-to-End Goal-Oriented Dialog with Maximal User Task Success and Minimal Human Agent Use,” will be part of Oral Presentations 2 on Monday, July 29 at 14:50.

To learn more about the paper “A Large-Scale Corpus for Conversation Disentanglement,” come visit us at ACL 2019 Poster session 5A: Dialogue and Interactive Systems (ARSENALE) on Tuesday, July 30, 13:50–15:30.


[1] Elsner, Micha, and Eugene Charniak. “You talking to me? a corpus and algorithm for conversation disentanglement.” Proceedings of ACL-08: HLT. 2008.

[2] Lowe, Ryan, et al. “The ubuntu dialogue corpus: A large dataset for research in unstructured multi-turn dialogue systems.” arXiv preprint arXiv:1506.08909 (2015).

[3] Lowe, Ryan Thomas, et al. “Training end-to-end dialogue systems with the ubuntu dialogue corpus.” Dialogue & Discourse 8.1 (2017): 31-65.

[4] Gunasekara, Chulaka, et al. “DSTC7 Task 1: Noetic End-to-End Response Selection.” (2019).

[5] Yoshino, Koichiro, et al. “Dialog System Technology Challenge 7.” arXiv preprint arXiv:1901.03461 (2019).

[6] Bordes, Antoine, Y-Lan Boureau, and Jason Weston. “Learning end-to-end goal-oriented dialog.” arXiv preprint arXiv:1605.07683 (2016).



Research Software Engineer - Deep Learning for Dialog

Chulaka Gunasekara

Research Staff Member - Implicit Learning for Dialog, IBM Research

Luis Lastras

Distinguished Research Staff Member and Senior Manager, IBM Research
