We must check for racial bias in our machine learning models

By | 4 minute read | February 10, 2022


As a data scientist for IBM Consulting, I’ve been fortunate enough to work on several projects to fulfill the various needs of IBM clients. Over my time at IBM, I have seen technology applied to various use cases that I would have never originally considered possible, which is why I was thrilled to steward the implementation of artificial intelligence to address one of the most insidious societal issues we face today, racial injustice.

As the Black Lives Matter movement spread across the country and the world in 2020, I immediately wondered if my ability to solve problems for clients could be applied to major societal issues. With this idea in mind, I looked for opportunities to join the fight for racial equality and found an internal IBM community working on projects that were to be released through the Call for Code for Racial Justice.

Adding my two cents to TakeTwo

Numerous projects were being incubated within IBM, but I found myself drawn to one in particular that was looking at both implicit and explicit bias. That project was TakeTwo, which became one of the seven projects that were released as external open source projects just over a year ago. The TakeTwo project uses natural language understanding to help detect and eliminate racial bias, both overt and subtle, in written content. By detecting words and phrases that can be seen as racially biased, TakeTwo helps content creators proactively mitigate potential bias as they write. It lets content creators check what they have written before publishing it, currently through an online text editor. Think of it as a Grammarly for spotting potentially racist language. TakeTwo is designed to leverage directories of inclusive terms compiled by trusted sources like the Inclusive Naming Initiative.
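To make the term-directory idea concrete, here is a minimal Python sketch of a lookup-based checker. It is not TakeTwo's actual implementation, which relies on natural language understanding. The `FLAGGED_TERMS` mapping and the `flag_terms` function are hypothetical names, and the three entries are only illustrative examples of the kind of replacements that inclusive-language guides suggest.

```python
import re

# Hypothetical, illustrative directory: flagged term -> suggested alternative.
# A real directory would come from a curated source, not a hard-coded dict.
FLAGGED_TERMS = {
    "whitelist": "allowlist",
    "blacklist": "denylist",
    "master branch": "main branch",
}


def flag_terms(text, directory=FLAGGED_TERMS):
    """Return (start, end, matched_text, suggestion) tuples sorted by position."""
    findings = []
    for term, suggestion in directory.items():
        # Whole-word, case-insensitive match for each directory entry.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            findings.append((match.start(), match.end(), match.group(0), suggestion))
    return sorted(findings)


if __name__ == "__main__":
    sample = "Add the host to the whitelist, then check the Blacklist."
    for start, end, found, suggestion in flag_terms(sample):
        print(f"{start}-{end}: '{found}' -> consider '{suggestion}'")
```

A plain lookup like this only catches exact terms. Subtle or context-dependent bias is where a trained language model becomes necessary.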

Not only did TakeTwo allow me to apply my expertise to improve the project, but it also afforded me the opportunity to look inward at some of the implicit racial biases that I may have held but had previously been unaware of. Working on TakeTwo was a great way to work on a mission that matters for the world, while also providing a chance for self-reflection.


The Data Challenge

While working on TakeTwo, it became abundantly clear that although the solution aims to detect bias by fielding and evaluating massive amounts of data, the data itself can hold implicit bias. By leveraging artificial intelligence and open source technologies like Python, FastAPI, JavaScript, and CouchDB, the TakeTwo solution can continue to evaluate the data it ingests and better detect when bias exists within it. For example, a word or phrase that is acceptable in the United States may not be acceptable in Japan, so we need to be cognizant of this to the best of our ability and have our solution function accordingly. As someone who is passionate about data science, I know firsthand that our model is only as good as the data we feed it. On that note, one thing I’ve learnt from working on this project is that we need better data sets to train the machine learning (ML) models that underpin these systems. Kaggle datasets have been a great starting point for us, but if we want to expand the project to take on racism wherever it exists, we’ll need more diverse data.
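The point above about regional differences can be sketched as a per-locale lookup. This is an assumption about how such a rule could be represented, not a description of how TakeTwo stores its data. `GLOBAL_TERMS`, `LOCALE_TERMS`, `terms_for_locale`, and `is_flagged` are hypothetical names, and every entry is a placeholder rather than real data.

```python
# Hypothetical per-locale directories; all entries are placeholders, not real data.
GLOBAL_TERMS = {"placeholder-term-a"}  # flagged in every locale
LOCALE_TERMS = {
    "en-US": {"placeholder-term-b"},  # flagged only for United States content
    "ja-JP": {"placeholder-term-c"},  # flagged only for Japanese content
}


def terms_for_locale(locale):
    """Combine the globally flagged terms with those specific to `locale`."""
    return GLOBAL_TERMS | LOCALE_TERMS.get(locale, set())


def is_flagged(term, locale):
    """Case-insensitive check of a single term against one locale's directory."""
    return term.lower() in terms_for_locale(locale)


if __name__ == "__main__":
    for locale in ("en-US", "ja-JP"):
        print(locale, is_flagged("placeholder-term-c", locale))
```

Keeping the global and locale-specific sets separate means an unknown locale still falls back to the terms flagged everywhere.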

On a related note, the skills needed on projects like these go way beyond just data science. Particularly for this project, it was important to leverage linguistics experts who can help define some of the cultural nuances that exist in language that a system like TakeTwo either needs to codify or ignore. Only by working cross-discipline can we get to a workable solution.

Be part of the future

The value that ML and AI bring to enhancing solutions like TakeTwo is inspiring. From hiring employees to approving bank loans, ML and AI are permeating the way we interact with one another, and they can help ensure we remove as much racial bias as possible from business decision-making. As technologists, we have a distinct responsibility to produce models that are honest, unbiased, and perform at the highest level possible so that we can trust their output.

You can check out how we at IBM are building better AI here. TakeTwo continues to make strides in developing and strengthening its efficacy, and you can contribute to making this open source project better. Check out TakeTwo and all of the other Call for Code for Racial Justice Projects today!


This post is part of a series during Black History Month covering the relationship between artificial intelligence and social justice.