Four Papers Advance Computational Argumentation in IBM’s Project Debater


At an event in February 2019 moderated by Intelligence Squared U.S.’s John Donvan (left), AI system IBM Project Debater (center) and world champion debater, Harish Natarajan (right), debated the topic ‘We Should Subsidize Preschool.’ (credit: Visually Attractive for IBM)

IBM Project Debater is the first artificial intelligence (AI) system that can debate humans on complex topics. Following years of development, Project Debater successfully engaged in a live debate with human debate champion Harish Natarajan on February 11, 2019, at the IBM Think conference in San Francisco (watch a replay). Mastering language is a great challenge for AI, and Project Debater is an important milestone on this journey.

This significant research effort has resulted in over 30 scientific papers and many datasets. The latest work on computational argumentation from our group is being presented at the ACL 2019 conference. Three papers will be presented at the main conference [1][2][3] and one more at the co-located Argument Mining Workshop [4]. Project Debater is based on three core innovations: data-driven speech writing and delivery, listening comprehension, and modeling human dilemmas. In the following, we discuss these innovations and how our latest work contributes to each of them.

Data-driven speech writing and delivery

In the debate replay, you can hear Project Debater presenting claims in favor of subsidizing preschools, backing them up with pieces of evidence, citing authoritative figures, and presenting supporting studies and data. The system also uses humor and summarizes the main themes — all to make compelling and interesting speeches.

Speech generation starts with digesting a corpus of newspaper articles containing nearly 10 billion sentences. Out of this massive body of text, the system needs to pinpoint a couple dozen arguments that are related to the debate topic [5][6][7].

Next, it identifies arguments that support its side in the debate [8], selects the most convincing ones, and arranges them into a well-structured speech. Finally, it aims to deliver speeches in a clear and purposeful manner to keep audience attention [9]. Two of our ACL papers contribute to this part of the project.

The work described in “Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network” [1] aims to identify which of the detected arguments are more convincing. To this end, we created a dataset, IBM-EviConv, of evidence sentence pairs, each with a binary label indicating the more convincing one. An important feature of this dataset is that it emphasizes pair homogeneity: for example, the compared sentences do not differ in language style or length. Neutralizing trivial factors such as length makes the dataset more challenging and offers learning algorithms a better chance to reveal deeper convincingness signals.

To identify the more convincing arguments, we propose our version of a Siamese Network, which outperforms state-of-the-art methods. The proposed network is trained over pairs of arguments, but assigns scores for single arguments at inference time. Thus, it can identify the more convincing arguments with a single pass over the list of detected arguments.
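To make the "train on pairs, score singles" idea concrete, here is a minimal sketch. It is not the network from the paper: a shared linear scorer stands in for the neural encoder, the pairwise logistic loss is an assumed training objective, and the two features are made up for illustration.

```python
import math
import random

def score(weights, features):
    """Shared scorer: the same weights are applied to either argument of a pair."""
    return sum(w * x for w, x in zip(weights, features))

def train_siamese(pairs, dim, epochs=200, lr=0.1, seed=0):
    """Train on (features_a, features_b, label) triples; label is 1 if a is more convincing."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    for _ in range(epochs):
        for feats_a, feats_b, label in pairs:
            # Both branches share `weights`; only the score difference enters the loss.
            diff = score(weights, feats_a) - score(weights, feats_b)
            prob_a = 1.0 / (1.0 + math.exp(-diff))
            grad_scale = prob_a - label  # gradient of the logistic loss w.r.t. diff
            for i in range(dim):
                weights[i] -= lr * grad_scale * (feats_a[i] - feats_b[i])
    return weights

def rank(weights, candidates):
    """Inference: score each argument once on its own, then sort by score."""
    return sorted(candidates, key=lambda item: score(weights, item[1]), reverse=True)

# Hypothetical features: [cites_a_study, names_an_expert]
training_pairs = [
    ([1.0, 0.0], [0.0, 0.0], 1),
    ([0.0, 0.0], [0.0, 1.0], 0),
    ([1.0, 1.0], [0.0, 1.0], 1),
    ([0.0, 0.0], [1.0, 0.0], 0),
]
weights = train_siamese(training_pairs, dim=2)
candidates = [("plain", [0.0, 0.0]), ("study", [1.0, 0.0]), ("both", [1.0, 1.0])]
ranked = rank(weights, candidates)
```

Because each candidate gets its own score, ranking n arguments needs n scorer calls rather than a comparison of every pair.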

In the second work, “From Surrogacy to Adoption; From Bitcoin to Cryptocurrency: Debate Topic Expansion” [2], we introduce the task of debate topic expansion: finding related topics that can enrich our arguments and strengthen our case when debating a given topic. For example, when discussing the pros and cons of the presidential system, it is natural to contrast it with those of the parliamentary system. When debating alternative medicine, we may discuss specific examples, such as homeopathy and naturopathy. Conversely, when discussing bitcoins, we can speak more broadly on cryptocurrency. Debate topic expansion can enhance the coverage of existing argument mining methods by matching relevant arguments that do not mention the given topic explicitly.

We propose a three-step method for finding such expansions. First, we extract expansion candidates from large corpora using a predefined set of patterns.  Next, we apply a set of filters to the extraction results. Finally, we employ supervised classification to identify good expansions amongst the remaining candidates. Our best-performing method combines traditional feature-based classification, for which we introduce a novel set of features, and a deep neural network, which is trained by distant supervision.
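The three steps can be sketched as a small pipeline. Everything specific here is an assumption for illustration: the patterns, the stopword filter, and especially the final step, where a simple frequency threshold stands in for the supervised classifier described above.

```python
import re
from collections import Counter

# Hypothetical patterns; "{topic}" is replaced by the debate topic.
PATTERN_TEMPLATES = [
    r"\b{topic} and (\w+)",
    r"\b(\w+) such as {topic}\b",
    r"\b{topic} versus (\w+)",
]
STOPWORDS = {"then", "the", "other", "it", "they"}

def extract_candidates(topic, sentences):
    """Step 1: count every word captured by a pattern around the topic."""
    counts = Counter()
    for template in PATTERN_TEMPLATES:
        pattern = re.compile(template.format(topic=re.escape(topic.lower())))
        for sentence in sentences:
            counts.update(pattern.findall(sentence.lower()))
    return counts

def filter_candidates(topic, counts):
    """Step 2: drop stopwords and the topic itself."""
    return {c: n for c, n in counts.items() if c not in STOPWORDS and c != topic.lower()}

def classify(candidate, count, min_count=2):
    """Step 3 stand-in: a frequency threshold in place of a trained classifier."""
    return count >= min_count

def expand_topic(topic, sentences):
    counts = filter_candidates(topic, extract_candidates(topic, sentences))
    return sorted(c for c, n in counts.items() if classify(c, n))

corpus = [
    "Many investors compare bitcoin and ethereum.",
    "Cryptocurrencies such as bitcoin are volatile.",
    "He bought bitcoin and then left.",
    "Some prefer bitcoin versus ethereum.",
    "Regulators watch cryptocurrencies such as bitcoin.",
    "Bitcoin and gold both rose.",
]
expansions = expand_topic("bitcoin", corpus)
```

On this toy corpus, "then" is removed by the filter and "gold" falls below the threshold, leaving "cryptocurrencies" and "ethereum" as expansions.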

Listening comprehension

In addition to presenting one side of the debate, engaging in a competitive debate further requires Project Debater to effectively rebut arguments raised by the human opponent. The system must listen to an argumentative speech in real-time, understand the main arguments, and produce persuasive counter-arguments.

The nature of the argumentation domain and the characteristics of competitive debates make the understanding of such spoken content challenging. Expressed ideas often span multiple, non-consecutive sentences and many arguments are alluded to rather than explicitly stated. Further difficulty stems from the requirement to identify and rebut the most important parts of a speech that is several minutes long. This contrasts with today’s conversational agents, which aim at understanding a single functional command from short inputs.

In “Towards Effective Rebuttal: Listening Comprehension Using Corpus-Wide Claim Mining” [4], we describe a listening comprehension approach that is based on anticipating the opponent’s arguments. It starts by automatically mining, in advance, a set of claims that are both relevant to the discussion and support the opponent’s side. This set is then matched against the actual live speech. This paper is accompanied by a large-scale, high-quality dataset containing hundreds of speeches recorded by expert human debaters discussing various controversial topics, along with an annotation layer specifying the mined claims mentioned in each speech.
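As a rough illustration of the matching step, the sketch below checks each pre-mined claim against the sentences of a transcribed speech. The word-overlap measure, the stopword list, and the threshold are assumptions chosen for simplicity, not the matching method from the paper.

```python
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "are", "for", "how", "is", "should", "the", "to", "too"}

def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def coverage(claim, sentence):
    """Fraction of the claim's content words that also appear in the sentence."""
    claim_words = content_words(claim)
    if not claim_words:
        return 0.0
    return len(claim_words & content_words(sentence)) / len(claim_words)

def match_claims(mined_claims, speech_sentences, threshold=0.6):
    """Return the mined claims that some speech sentence covers well enough."""
    matched = []
    for claim in mined_claims:
        best = max((coverage(claim, s) for s in speech_sentences), default=0.0)
        if best >= threshold:
            matched.append(claim)
    return matched

mined_claims = [
    "Preschool subsidies are too expensive",
    "Parents should decide how to educate children",
]
speech = [
    "I think subsidies for preschool are simply too expensive for the state.",
    "There are better ways to spend the money.",
]
mentioned = match_claims(mined_claims, speech)
```

Here only the first claim is matched, so a rebuttal would be prepared for the cost argument and not for the parental-choice argument. Word overlap cannot detect claims that are alluded to rather than stated, which is exactly the difficulty described above.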

Modeling human dilemmas

When we started consulting with professional debaters for the project, we realized that the kinds of arguments they use tend to be different from those retrieved by the system. This was to be expected, as our system can cite studies and experts that human debaters are usually unaware of. Since our ultimate motivation for the system is not to beat humans, but to assist them, this difference in argumentation strategy is aligned with the goal of augmenting human ability.

Yet, it was interesting to try to analyze how humans debate. We observed that it is common for human debaters to use principled or commonplace arguments, that is, arguments that can be made in the context of a variety of topics. This can be done when one is not very familiar with the details of the topic and would rather debate the underlying principle. Alternatively, when one feels that an underlying principle is generally accepted, one can build a persuasive argument by starting from this principle and then showing how it applies to the debate topic.

While this observation was apparent to the debaters we consulted, they could not point to a manual of such commonplace arguments. Indeed, in “Argument Invention from First Principles” [3] we find it rather challenging to formalize this intuition. Still, we show that it is possible to come up with a small set of non-trivial principled arguments such that, for most topics, a relevant principled argument can be found in this set. Moreover, we present a classification algorithm that finds such arguments automatically with high precision.

These principled arguments are a complementary source to arguments mined from newspaper articles. Moreover, when one identifies a commonplace principle relevant to a debate, one can use it to generate more than just arguments. In fact, with Project Debater we use this knowledge to select a text for framing the topic, for adding relevant quotes, and even for including some humor. For example, one of the classes revolves around the attitude towards new technology. When the system identifies that the topic of the debate is relevant to this class, it may support a negative stance by using arguments such as “The fact is that nobody knows how reliable this new technology is, if at all,” as well as quotes such as this one from Aldous Huxley: “Technological progress has merely provided us with more efficient means for going backwards.”
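One way to picture this is as a lookup from a principle class to reusable speech material. The sketch below is only an illustration: the `PrincipleClass` structure and the trigger words are hypothetical, and keyword matching stands in for the classification algorithm. The argument and the quote are the ones cited above.

```python
from dataclasses import dataclass, field

@dataclass
class PrincipleClass:
    name: str
    trigger_words: set  # hypothetical cue words, not taken from the paper
    arguments: list = field(default_factory=list)
    quotes: list = field(default_factory=list)

NEW_TECHNOLOGY = PrincipleClass(
    name="attitude towards new technology",
    trigger_words={"technology", "autonomous", "robots", "drones"},
    arguments=[
        "The fact is that nobody knows how reliable this new technology is, if at all."
    ],
    quotes=[
        "Technological progress has merely provided us with more efficient "
        "means for going backwards. (Aldous Huxley)"
    ],
)

def relevant_classes(topic, classes):
    """Keyword stand-in for the classifier: a class is relevant if any cue word appears."""
    words = set(topic.lower().split())
    return [c for c in classes if words & c.trigger_words]

def speech_material(topic, classes):
    """Collect the arguments and quotes of every class relevant to the topic."""
    material = []
    for principle in relevant_classes(topic, classes):
        material.extend(principle.arguments)
        material.extend(principle.quotes)
    return material

tech_material = speech_material("We should ban autonomous cars", [NEW_TECHNOLOGY])
other_material = speech_material("We should subsidize preschool", [NEW_TECHNOLOGY])
```

A topic about autonomous cars picks up both the argument and the quote, while the preschool topic matches no cue word and gets nothing from this class.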

Paper Presentation Schedule

Main conference ([1],[2],[3]): July 29th, session 2D (Hall 2, 13:50-15:30).

Argument Mining Workshop ([4]):  August 1st, 14:00-15:30.


[1] Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network. ACL, 2019.  Martin Gleize, Eyal Shnarch, Leshem Choshen, Lena Dankin, Guy Moshkowich, Ranit Aharonov and Noam Slonim

[2] From Surrogacy to Adoption; From Bitcoin to Cryptocurrency: Debate Topic Expansion. ACL, 2019.  Roy Bar-Haim, Dalia Krieger, Orith Toledo-Ronen, Lilach Edelstein, Yonatan Bilu, Alon Halfon, Yoav Katz, Amir Menczel, Ranit Aharonov and Noam Slonim

[3] Argument Invention from First Principles. ACL, 2019.  Yonatan Bilu, Ariel Gera, Daniel Hershcovich, Benjamin Sznajder, Dan Lahav, Guy Moshkowich, Anael Malet, Assaf Gavron and Noam Slonim

[4] Towards Effective Rebuttal: Listening Comprehension Using Corpus-Wide Claim Mining. 6th Workshop on Argument Mining, 2019. Tamar Lavee, Matan Orbach, Lili Kotlerman, Yoav Kantor, Shai Gretz, Lena Dankin, Shachar Mirkin, Michal Jacovi, Yonatan Bilu, Ranit Aharonov and Noam Slonim

[5] Context Dependent Claim Detection. COLING, 2014. Ran Levy, Yonatan Bilu, Daniel Hershcovich, Ehud Aharoni, and Noam Slonim

[6] Show Me Your Evidence – an Automatic Method for Context Dependent Evidence Detection. EMNLP, 2015. Ruty Rinott, Lena Dankin, Carlos Alzate Perez, Mitesh M. Khapra, Ehud Aharoni, and Noam Slonim

[7] Will it blend? Blending Weak and Strong Labeled Data in a Neural Network for Argumentation Mining. ACL, 2018. Eyal Shnarch, Carlos Alzate, Lena Dankin, Martin Gleize, Yufang Hou, Leshem Choshen, Ranit Aharonov, and Noam Slonim

[8] Stance Classification of Context-Dependent Claims. EACL, 2017. Roy Bar-Haim, Indrajit Bhattacharya, Francesco Dinuzzo, Amrita Saha and Noam Slonim

[9] Word Emphasis Prediction for Expressive Text to Speech. Interspeech, 2018. Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev, David Konopnicki
