Experian’s Business Information Services (BIS) unit spent years building a valuable and unique database of corporate relationships to better serve customers. The unit built this database of corporate hierarchies by combining entity-matching technology with people who evaluated the matches and updated them based on their own research.

This process was time-consuming and limited the number of company hierarchies the team could evaluate and match. Because of the intense manual effort required, the team maintained only a subset of corporate hierarchies out of the full universe of companies available to it. The amount of human involvement also limited how often the hierarchies could be refreshed.

The IBM Data Science Elite team had a simple mission: apply AI to learn what Experian has done over the years building corporate hierarchies and then apply that to the full universe of companies that they traditionally couldn’t evaluate. The goal was to increase the number of corporate hierarchies and increase the frequency of corporate hierarchy matching.

The results? AI and machine learning are now helping Experian solve the problem of building and maintaining business families and corporate linkages, with a potential 500 percent increase in coverage and an 80 percent reduction in cost.

Our team began with a discovery workshop to understand the problem. We dug into the data and held an open discussion of the current process and what Experian would like it to be. The workshop brought together business experts from Experian, stakeholders who understood the value of a new solution and a data scientist from the IBM Data Science Elite team who could help develop a new approach. After the workshop, we put together a plan to train new machine learning models using Experian’s existing, validated hierarchies. Each hierarchy is a company with thousands of sub-companies, matched using years of Experian expertise, intellectual property and software.

We then documented our approach, defined some agile sprints and moved to project kickoff.

Next we needed a platform – something with key open source data science and machine learning libraries that would meet Experian’s strict guidelines on encryption and key management. It would have to scale with the horsepower needed for such a complex problem. We especially needed to use GPUs, and we needed something that was quick to get started.

With that in mind, we spun up Watson Studio for our modeling, Watson Machine Learning for model deployment and scoring, Object Storage for data storage, and Key Protect for data encryption and security. All components were spun up on IBM Cloud for the project.

We started with the data. We uploaded several extracts that included Experian’s base data files. Those files contained corporate hierarchies and relevant features such as address, city, website and many others. This added up to millions of rows of data. The team had to perform several cycles of data sampling, understanding, preparation and definition, working very closely with business stakeholders.

In the next sprint, the team performed modeling, which included feature engineering, blocking and the evaluation of several binary classification techniques, including logistic regression, neural networks and recurrent neural networks (RNNs).
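Blocking, in entity matching, means comparing only records that share a cheap key instead of comparing every record with every other record. The following is a minimal sketch of that idea, not the project's implementation: the record fields, the sample companies and the `blocking_key` rule (city plus the first three letters of the name) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking key: city plus the first three letters of the name."""
    name = record["name"].strip().lower()
    city = record["city"].strip().lower()
    return (city, name[:3])


def candidate_pairs(records):
    """Group records by blocking key and pair up records only within a block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Made-up sample records, for illustration only.
records = [
    {"id": 1, "name": "Acme Holdings", "city": "Dallas"},
    {"id": 2, "name": "ACME Logistics", "city": "dallas"},
    {"id": 3, "name": "Acme Retail", "city": "Austin"},
    {"id": 4, "name": "Zenith Corp", "city": "Dallas"},
]

pairs = candidate_pairs(records)
print(pairs)  # [(1, 2)]
```

With four records, comparing everything against everything would produce six pairs. This key leaves one candidate pair for a classifier to score, which is what makes blocking useful at millions of rows. A key that is too strict will also drop true matches (record 3 here), so in practice the key is a trade-off.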

The team determined that an RNN used as a binary classifier achieved the best results, with 95 percent accuracy. Matching the hierarchies had previously taken years of application and manual work. The new RNN model found more matches than the existing process, with very good accuracy. In the final sprint, the team deployed the model, validated it and scored additional hierarchies using the IBM Watson Machine Learning deployment service.

In a few months, with the goal of scaling AI to impact all corporate hierarchies in BIS, the team had validated an approach to a new, innovative AI system for corporate hierarchy matching. One aspect of the project was to estimate the computational needs and system design for a full entity matching system. We estimated that to launch a full entity matching system with the current data and a 4-way ensemble of RNNs, Experian would initially have to train hundreds of models. This would require access to a great amount of GPU processing, and we would need to build several components that would have to interact.
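A 4-way ensemble means four models score each candidate pair and their outputs are combined into one decision. The article does not say how the outputs were combined. The sketch below assumes the simplest option, averaging the four match probabilities and applying a threshold, and the function names, the 0.5 threshold and the sample scores are illustrative only.

```python
def ensemble_probability(member_probs):
    """Average the match probabilities produced by the ensemble members."""
    return sum(member_probs) / len(member_probs)


def is_match(member_probs, threshold=0.5):
    """Declare a match when the averaged probability reaches the threshold."""
    return ensemble_probability(member_probs) >= threshold


# Made-up scores from four hypothetical models for two candidate pairs.
scores = {
    "pair_a": [0.875, 0.75, 1.0, 0.625],
    "pair_b": [0.25, 0.125, 0.5, 0.0],
}

decisions = {pair: is_match(probs) for pair, probs in scores.items()}
print(ensemble_probability(scores["pair_a"]))  # 0.8125
print(decisions)  # {'pair_a': True, 'pair_b': False}
```

Averaging tends to smooth out the errors of any single model, at the cost of training and serving four models per matching task. That cost is part of why the estimate above runs to hundreds of models and substantial GPU capacity.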

We sketched a workflow for the entity matching system, which we proposed running on IBM Cloud.

This was just the start. In a short time, the team developed 16 notebooks for data preparation, blocking, modeling and predictions. The language of choice was Python, with a heavy reliance on libraries including Pandas, NumPy and Keras with a TensorFlow backend.

The work set the BIS team on a path to freeing itself from a manual process by using AI.

Schedule a one-on-one consultation with experts who have worked with thousands of clients to build winning data, analytics and AI strategies. Visit ibm.com/analytics.
