With HEIDL, Humans and Machines Can Work Together to Cut Through Legalese

Despite legions of capable legal teams, contract reviews create bottlenecks in everything from sales to supply chains. This important work has the potential to be made less tedious and more efficient with the help of natural language processing (NLP). Building on more than a decade of NLP research at IBM Research-Almaden, our team set out to develop a tool that can help machines understand legalese, and make AI both explainable and adaptable for lawyers and laypeople alike.

At the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), we will present “HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop,” demonstrating a model that makes it easier and much faster for people to review the effectiveness of natural language labels generated by a deep learning model trained on human-labeled data.

This collaboration between IBM Research AI and the University of Michigan explores an alternative to other human-in-the-loop paradigms such as Active Learning that bring expert humans into the loop twice: first to label the training data, then to analyze and improve the machine-learned model. This is done by surfacing semantic linguistic structures in an intuitive, drag-and-drop, browser-based interface that simplifies the interpretation and updating of linguistic expressions by subject matter experts.

In this case, the data was labeled by IBM attorneys, who combed through 20,000 sentences in nearly 150 contracts to annotate phrases related to key clauses like termination, communication and payment. This relatively sparse data was used to train an NLP model surfaced in a new human-in-the-loop machine learning prototype we call HEIDL, short for Human-in-the-loop linguistic Expressions wIth Deep Learning.

HEIDL ranks machine-generated linguistic expressions by precision and recall. Precision is a measure of how well an expression identifies relevant clauses in the contract without false positives. Conversely, recall is a measure of how effective an expression is at capturing relevant clauses without false negatives. In other words, an expression with high precision rarely flags an irrelevant clause, while an expression with high recall rarely misses a relevant one. HEIDL users can limit the set of acceptable expressions by setting minimum precision and recall thresholds, and can then immediately observe the effects of such alterations.
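To make the two metrics and the threshold filter concrete, here is a minimal Python sketch. It is not HEIDL's code: the class name, the function name, the ranking order, and all the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionStats:
    """Match counts for one candidate expression (hypothetical numbers)."""
    name: str
    true_positives: int   # relevant clauses the expression matched
    false_positives: int  # irrelevant clauses the expression matched
    false_negatives: int  # relevant clauses the expression missed

    @property
    def precision(self) -> float:
        matched = self.true_positives + self.false_positives
        return self.true_positives / matched if matched else 0.0

    @property
    def recall(self) -> float:
        relevant = self.true_positives + self.false_negatives
        return self.true_positives / relevant if relevant else 0.0


def filter_expressions(expressions, min_precision, min_recall):
    """Keep expressions that meet both thresholds, best precision first."""
    kept = [
        e for e in expressions
        if e.precision >= min_precision and e.recall >= min_recall
    ]
    return sorted(kept, key=lambda e: (e.precision, e.recall), reverse=True)


# Illustrative counts only; they are not results from the paper.
candidates = [
    ExpressionStats("verb + manner", 40, 5, 10),
    ExpressionStats("verb only", 48, 60, 2),
    ExpressionStats("verb + manner + passive", 40, 0, 10),
]

kept = filter_expressions(candidates, min_precision=0.8, min_recall=0.5)
for e in kept:
    print(f"{e.name}: precision={e.precision:.2f}, recall={e.recall:.2f}")
```

Raising either threshold shrinks the list of kept expressions, which is the kind of immediate feedback the interface is described as giving.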

The ultimate goal is to gain an in-depth understanding of legal contracts automatically. One step is to take contracts, which are usually written in complex language, and automatically assign one or more useful labels to each sentence. For example, without having to read the entire contract, a reviewer might want to see all the sentences that relate to a termination clause. Such clauses may be expressed using a variety of legal language, so each matching sentence is assigned the label ‘termination’.

Lawyers need to have trust in machine-generated labels. Therefore, it is equally important that users understand the reasoning behind each linguistic expression, and what effect altering the expression would have on which clauses would be identified. Our demonstration at ACL takes as an example the task of identifying contract clauses related to communication between parties. Using HEIDL, users can dig deeper into the linguistic expressions for identifying communication clauses and explore how modifying the components of each expression impacts precision and recall.

Verbs such as notify, inform, contact or respond may signal the presence of a communication clause in a legal contract. However, any kind of simple keyword matching inevitably leads to a large number of false positives. A robust linguistic expression might, for example, identify instances of such verbs in the context of the agreed-upon manner of communication between parties, such as immediately, electronically and [in] writing. Users can explore the components of this expression and immediately observe the effect of ignoring the verb context, or adding/omitting any of the relevant terms.
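The difference between keyword matching and a context-aware expression can be sketched in a few lines of Python. This is a rough surface-level approximation, not HEIDL's representation, which works on semantic linguistic structures. The function names and the token window are assumptions, and the sketch does not handle inflected forms such as "notified".

```python
import re

# Word lists taken from the examples in the text.
COMMUNICATION_VERBS = {"notify", "inform", "contact", "respond"}
MANNER_TERMS = {"immediately", "electronically", "writing"}


def tokens(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def keyword_match(sentence):
    """Naive rule: any communication verb anywhere in the sentence."""
    return any(t in COMMUNICATION_VERBS for t in tokens(sentence))


def expression_match(sentence, window=6):
    """Stricter rule: a communication verb with a manner term nearby.

    `window` (an assumed parameter) is the number of tokens inspected
    on each side of the verb.
    """
    toks = tokens(sentence)
    for i, tok in enumerate(toks):
        if tok in COMMUNICATION_VERBS:
            context = toks[max(0, i - window): i + window + 1]
            if any(c in MANNER_TERMS for c in context):
                return True
    return False


clause = "The Supplier shall notify the Buyer in writing of any delay."
noise = "Please contact our sales team for a brochure."

for s in (clause, noise):
    print(keyword_match(s), expression_match(s), s)
```

Both sentences pass the keyword test, but only the first also satisfies the manner condition, which is how the added context removes a false positive.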

Even a simple further exploration highlights the prodigious value of explainability and the pitfalls of trusting a model blindly, or optimizing for what seems to be a good metric. The machine-generated model suggests taking the above linguistic expression and further limiting it to sentences where the verb is in the passive voice, as this improves the precision to 100 percent and does not affect recall. A black-box model would unquestionably choose to use this condition. However, a person interacting with HEIDL can interpret this suggestion and realize that while this may be the case for the provided training data, there is no reason why, in general, communication clauses must necessarily be in the passive voice. In this case, taking the machine’s suggestion would result in a less generalizable model that is overfitted to the training data that happened to be provided. Without this visibility into how and why a particular condition affects a linguistic expression, it is likely that the ‘passive voice’ condition would be included in a machine-constructed algorithm. And in an opaque deep learning model without explainability, a human might never realize it.
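The overfitting risk described above can be reproduced on toy data. The following Python sketch uses a handful of invented, hand-labeled examples (none of them come from the IBM contracts) in which every true communication clause in the training set happens to be passive, while the held-out set also contains active-voice clauses. All names are hypothetical.

```python
def example(has_verb, has_manner, passive, is_communication):
    """Build one toy labeled sentence as a dict of boolean features."""
    return {
        "has_verb": has_verb,
        "has_manner": has_manner,
        "passive": passive,
        "is_communication": is_communication,
    }


def base_rule(ex):
    """Communication verb in the context of a manner term."""
    return ex["has_verb"] and ex["has_manner"]


def passive_rule(ex):
    """Base rule plus the machine-suggested passive-voice condition."""
    return base_rule(ex) and ex["passive"]


def evaluate(rule, examples):
    """Return (precision, recall) of a rule on labeled examples."""
    tp = sum(1 for ex in examples if rule(ex) and ex["is_communication"])
    fp = sum(1 for ex in examples if rule(ex) and not ex["is_communication"])
    fn = sum(1 for ex in examples if not rule(ex) and ex["is_communication"])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented training data: all true communication clauses are passive.
train = [
    example(True, True, True, True),
    example(True, True, True, True),
    example(True, True, False, False),  # active-voice false positive
    example(True, False, False, False),
]

# Invented held-out data: some communication clauses are active.
held_out = [
    example(True, True, True, True),
    example(True, True, False, True),
    example(True, True, False, True),
    example(True, False, False, False),
]

for name, rule in (("base", base_rule), ("passive", passive_rule)):
    print(name, "train:", evaluate(rule, train),
          "held-out:", evaluate(rule, held_out))
```

On the training examples the passive condition looks like a free improvement, but on the held-out examples it silently drops the active-voice clauses. A reviewer who can see the condition can reject it before that happens.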

With HEIDL, we did not just seek to make the AI explainable. We also aimed to make AI adaptable based on human input. In testing with HEIDL, our data scientists could identify an average of seven rules that automatically labeled nearly 150 contracts with high precision and recall in about 30 minutes. It would have taken them a week or more to build similarly effective linguistic expressions manually for that many contracts.

This approach makes deep learning-generated linguistic expressions more transparent and explainable, so that experts — in this case attorneys and others who review contracts — and engineers can easily interpret and refine the expressions. The resulting symbiotic models surpass the performance of human or machine algorithms alone, and the transparency of the generated expressions means the models will be robust, generalizable, and trustworthy.

Presentation Schedule

Demo Session 2 (BASILICA): Tuesday July 30, 13:50–15:30

Principal Research Staff Member
