Experian’s Business Information Services (BIS) unit spent years building a valuable and unique database of corporate relationships to better serve customers. The unit created this database of corporate hierarchies by combining entity-matching technology with people who evaluated the matches and updated them based on their own research.

This process was time-consuming and limited the number of company hierarchies the team could evaluate and match. Because of the intense manual effort required, the team maintained only a subset of corporate hierarchies out of the full universe of companies available to it. And because so much human involvement was needed, the team was limited in how often it could refresh the hierarchies.

The IBM Data Science Elite team had a simple mission: apply AI to learn what Experian has done over the years building corporate hierarchies and then apply that to the full universe of companies that they traditionally couldn’t evaluate. The goal was to increase the number of corporate hierarchies and increase the frequency of corporate hierarchy matching.

The results? AI and machine learning are now helping Experian solve the problem of building and maintaining business families and corporate linkages, with a potential 500 percent increase in coverage and 80 percent reduction in cost.

Our team began with a discovery workshop to understand the problem. We dug into the data and held an open discussion of the current process and what Experian would like it to be. The workshop brought together business experts from Experian, stakeholders who understood the value of a new solution and a data scientist from the IBM Data Science Elite team who could help develop a new approach. After the workshop, we put together a plan to train new machine learning models on Experian’s existing, validated hierarchies. Each hierarchy is a company with thousands of subcompanies, matched using years of Experian expertise, intellectual property and software.

We then documented our approach, defined some agile sprints and moved to project kickoff.

Next we needed a platform – something with key open source data science and machine learning libraries that would meet Experian’s strict guidelines on encryption and key management. It would have to scale with the horsepower needed for such a complex problem. We especially needed to use GPUs, and we needed something that was quick to get started.

With that in mind, we spun up a Watson Studio environment for modeling, Watson Machine Learning for model deployment and scoring, Object Storage for data storage and Key Protect for data encryption and security. All components were spun up on IBM Cloud for the project.

We started with the data. We uploaded several extracts which included Experian’s base data files. Those files contained corporate hierarchies and relevant features such as address, city, website and many others. This added up to millions of rows of data. The team had to perform several cycles of data sampling, understanding, preparation and definition, working very closely with business stakeholders.

In the next sprint, the team performed modeling, which included feature engineering, blocking and evaluation of several machine learning techniques, including binary classification algorithms, logistic regression, neural networks and recurrent neural networks (RNN).
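To make the blocking step concrete, here is a minimal sketch in plain Python. Blocking limits pairwise comparison to records that share a cheap key, so a model only has to score a small fraction of all possible pairs. The key used here (city plus the first three letters of the name), the field names and the sample records are all assumptions for illustration, not the team's actual scheme.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    """Hypothetical blocking key: city plus the first three letters of the name."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (record["city"].strip().lower(), name[:3])


def candidate_pairs(records):
    """Return sorted pairs of record ids that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        # Only records inside the same block are ever compared.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Made-up records for illustration only.
records = [
    {"id": 1, "name": "Acme Holdings Inc.", "city": "Austin"},
    {"id": 2, "name": "ACME Logistics", "city": "austin"},
    {"id": 3, "name": "Acme Retail", "city": "Boston"},
    {"id": 4, "name": "Zenith Corp", "city": "Austin"},
]

pairs = candidate_pairs(records)
print(pairs)
```

With four records there are six possible pairs, but only the records that share a key are passed on for scoring. A coarser key keeps more true matches at the price of more comparisons, so the key is usually tuned against known hierarchies.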

The team determined that an RNN used as a binary classifier achieved the best results, with 95 percent accuracy. Matching the hierarchies previously took years of application and manual work. With the new RNN model, the team found more matches than the existing process did, with very good accuracy. In the final sprint, the team deployed, validated and scored additional hierarchies using the IBM Watson Machine Learning deployment service.
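As a rough illustration of how a binary match classifier can be scored, the sketch below computes accuracy alongside precision and recall from predicted match probabilities. The labels, probabilities and the 0.5 threshold are invented for the example and are not Experian's results.

```python
def evaluate(labels, probabilities, threshold=0.5):
    """Score binary match predictions against known labels (1 = match)."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Invented labels and model outputs for illustration only.
labels = [1, 0, 0, 1, 0, 0, 0, 1]
probabilities = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.05, 0.8]

metrics = evaluate(labels, probabilities)
print(metrics)
```

Reporting precision and recall next to accuracy is a common choice in entity matching, because non-matching pairs usually far outnumber matching ones and accuracy alone can look high even when many true matches are missed.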

In a few months, with the goal of scaling AI to impact all corporate hierarchies in BIS, the team had validated an approach to a new, innovative AI system for corporate hierarchy matching. One aspect of the project was to estimate the computational needs and system design for a full entity matching system. We estimated that to launch a full entity matching system with the current data and a 4-way ensemble of RNNs, Experian would initially have to train hundreds of models. This would require access to a great amount of GPU processing, and we would need to build several components that would have to interact.
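One simple way to combine a 4-way ensemble is to average the match probabilities of the four members and apply a threshold. The sketch below assumes that combination rule, which the article does not specify; the four stand-in functions take the place of trained RNNs, and the feature names are hypothetical.

```python
def ensemble_probability(models, pair_features):
    """Average the match probabilities returned by each ensemble member."""
    scores = [model(pair_features) for model in models]
    return sum(scores) / len(scores)


def is_match(models, pair_features, threshold=0.5):
    """Declare a match when the averaged probability reaches the threshold."""
    return ensemble_probability(models, pair_features) >= threshold


# Stand-in scoring functions; a real system would call four trained models here.
models = [
    lambda f: 0.9 * f["name_similarity"],
    lambda f: 0.8 * f["name_similarity"] + 0.2 * f["same_city"],
    lambda f: f["name_similarity"] ** 2,
    lambda f: 0.5 * f["name_similarity"] + 0.5 * f["same_city"],
]

# Hypothetical features for one candidate pair of companies.
features = {"name_similarity": 0.5, "same_city": 1.0}

probability = ensemble_probability(models, features)
print(probability, is_match(models, features))
```

Averaging means every candidate pair is scored four times, which is part of why an ensemble raises the training and GPU requirements described above.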

We sketched a workflow for the entity matching system that we proposed to be run on IBM Cloud.

This was just the start. In a short time, the team developed 16 notebooks for data preparation, blocking, modeling and predictions. The language of choice was Python, with a heavy reliance on libraries including Pandas, NumPy and Keras with a TensorFlow backend.

The work set the BIS team on a path to free itself from a manual process by using AI.

Schedule a one-on-one consultation with experts who have worked with thousands of clients to build winning data, analytics and AI strategies. Visit ibm.com/analytics.
