Delivering effective analytics with limited or no ground truth
September 24, 2020 | Written by: Bob Patten
Insurance companies are continually subjected to questionable claims, whether they involve outright fraud, waste, or abuse. In the U.S. alone, insurance fraud costs insurance companies USD 32 billion in P&C and USD 84 billion in health care per year. Each carrier processes tens or even hundreds of thousands of claims, yet fraudulent claims are only a small fraction of the total. The result is highly imbalanced datasets with very few fraud examples, which makes fraud detection especially hard.
In addition, new schemes are constantly emerging, and no ground truth is available for them until well after a scheme has succeeded. This leaves insurance companies at a disadvantage.
Anti-fraud AI and machine learning processes
In typical AI and machine learning (ML) processes, the detection model is trained on a set of labeled data in which each claim is annotated as fraudulent or not. Because the labeled data is highly imbalanced, it requires different techniques than a “simple AI/ML model”. The IBM Financial Crimes Insight for Claims Fraud 6.5.1 product specializes in helping insurance companies find fraud, waste, and abuse. In this article I discuss a couple of the analytical techniques that we use to cope with the lack of ground truth that characterizes this data.
One technique we employ is to use clustering models to analyze the data across a variety of dimensions. First, the claims are segmented by claim and/or party characteristics into populations that are expected to behave differently. Next, a cluster model is created for each segment to identify “micro-clusters” within the population based on actual behavioral patterns in the claims data. Finally, known outcomes such as referrals and investigations are overlaid onto these micro-clusters to provide additional insight into claims that lack definitive outcomes. The final result is a set of features and feature values that help compensate for the lack of ground truth on specific claims.
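The segment, cluster, and overlay steps described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the FCI implementation: the claim records, the two behavioral features, the choice of k-means with k=2, and the function names are all hypothetical, and a real pipeline would scale the features before clustering.

```python
from collections import defaultdict
from math import dist

# Hypothetical records: (claim_id, segment, (amount_k, days_to_report), referred).
# referred is True/False when an outcome is known, None when the claim was never reviewed.
CLAIMS = [
    ("c1", "auto", (2.0, 1.0), False),
    ("c2", "auto", (2.5, 2.0), False),
    ("c3", "auto", (2.2, 1.5), None),
    ("c4", "auto", (9.0, 30.0), True),
    ("c5", "auto", (9.5, 28.0), True),
    ("c6", "auto", (8.8, 31.0), None),
    ("c7", "home", (20.0, 3.0), False),
    ("c8", "home", (21.0, 4.0), None),
    ("c9", "home", (60.0, 45.0), True),
    ("c10", "home", (58.0, 40.0), None),
]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def cluster_referral_features(claims, k=2):
    """Return {claim_id: (segment, cluster, referral_rate)} for every claim."""
    by_segment = defaultdict(list)
    for claim in claims:
        by_segment[claim[1]].append(claim)  # step 1: segment the population

    features = {}
    for segment, members in by_segment.items():
        labels = kmeans([c[2] for c in members], k)  # step 2: cluster per segment
        known = defaultdict(list)
        for claim, label in zip(members, labels):
            if claim[3] is not None:
                known[label].append(claim[3])  # step 3: overlay known outcomes
        for claim, label in zip(members, labels):
            outcomes = known.get(label)
            rate = sum(outcomes) / len(outcomes) if outcomes else None
            features[claim[0]] = (segment, label, rate)
    return features


if __name__ == "__main__":
    for claim_id, feature in cluster_referral_features(CLAIMS).items():
        print(claim_id, feature)
```

In this toy data, a claim that was never reviewed inherits the referral rate of the micro-cluster it falls into, and that rate can then be passed to a downstream model as one more feature.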
Auto-encoding is another technique featured in the Financial Crimes Insight (FCI) solution. This technique actually takes advantage of the imbalanced nature of the data. An auto-encoder is a type of artificial neural network that, in this case, is trained on valid claims. These can be limited to claims that were investigated and determined to be valid, or they can optionally include claims that were never investigated, which greatly expands the amount of training data available.
A systematic approach to fraud detection
Without going into great detail, auto-encoding is a process by which the data is mathematically simplified (compressed into a smaller representation) and then reconstructed back into its (nearly) original state. Because the model was trained on valid claims, it reconstructs valid claims well, whereas a fraudulent claim will typically produce a high reconstruction error. That error provides a clear fraud signal to the downstream ensemble models that perform the final fraud assessment.
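The compress-then-reconstruct idea can be illustrated with a deliberately tiny stand-in. The sketch below is not a neural network and not the FCI model: it is a one-unit linear auto-encoder with tied weights, fitted by power iteration rather than backpropagation, which makes it equivalent to one-component PCA. The feature names, the sample claims, and the threshold rule (twice the largest error seen on the training claims) are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical valid claims: (labor_hours, parts_cost, invoice_total), arbitrary units.
VALID_CLAIMS = [
    (1.0, 2.1, 3.0),
    (2.0, 3.9, 6.1),
    (3.0, 6.0, 9.0),
    (4.0, 8.1, 11.9),
    (5.0, 10.0, 15.1),
    (6.0, 11.9, 18.0),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class LinearAutoencoder:
    """One-unit linear auto-encoder with tied weights (equivalent to 1-component PCA)."""

    def fit(self, rows, iterations=50):
        n = len(rows)
        self.mean = [sum(col) / n for col in zip(*rows)]
        centered = [self._center(r) for r in rows]
        w = [1.0] * len(self.mean)
        # Power iteration: repeatedly apply X^T X to w and renormalize.
        for _ in range(iterations):
            codes = [dot(r, w) for r in centered]
            w = [sum(c * r[j] for c, r in zip(codes, centered)) for j in range(len(w))]
            norm = sqrt(dot(w, w))
            w = [v / norm for v in w]
        self.weights = w
        return self

    def _center(self, row):
        return [x - m for x, m in zip(row, self.mean)]

    def reconstruction_error(self, row):
        centered = self._center(row)
        code = dot(centered, self.weights)          # encode: three numbers -> one
        rebuilt = [code * w for w in self.weights]  # decode: one number -> three
        return sum((x - y) ** 2 for x, y in zip(centered, rebuilt))


def fit_with_threshold(rows, margin=2.0):
    """Fit on valid claims; threshold = margin * worst training error (an assumption)."""
    model = LinearAutoencoder().fit(rows)
    threshold = margin * max(model.reconstruction_error(r) for r in rows)
    return model, threshold


if __name__ == "__main__":
    model, threshold = fit_with_threshold(VALID_CLAIMS)
    for claim in [(3.5, 7.0, 10.5), (2.0, 4.0, 15.0)]:
        error = model.reconstruction_error(claim)
        print(claim, "suspicious" if error > threshold else "looks valid")
```

The second sample claim has an invoice total far out of line with its labor and parts, so it cannot be rebuilt from the single compressed value and its reconstruction error exceeds the threshold. A production auto-encoder would use nonlinear layers and many more features, but the signal passed downstream is the same kind of reconstruction error.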
These techniques are part of a systematic approach to fraud detection that combines multiple supervised and unsupervised learning methods. It leverages features created using traditional fraud indicators from the data, multiple deterministic techniques and enhanced statistical methods.
In addition to detecting known patterns, this combined FCI approach allows the system to potentially discover emergent fraudulent patterns that are not yet well established. Detecting these patterns early in the claim lifecycle allows us to expedite the suspicious behavior alert to the appropriate insurance investigator or analyst.
Preventing insurance fraud webinar replays on demand
The earlier in the lifecycle that a valid alert can be raised, the better the insurance company's chance of stopping the attempt and mitigating losses. For more about the latest analytic techniques to combat claims fraud, check out the IBM webinar replays on demand: “COVID-19: Responding to the threat of fraudulent claims” and “How can the Insurance Fraud Industry stay ahead of Insurance Fraud in the new normal?”
IBM Financial Crimes Insight for Claims Fraud is part of the IBM RegTech regulatory compliance solutions that are designed to help financial institutions better meet their regulatory monitoring, reporting, compliance and risk management needs. Learn more about IBM Financial Crimes Insight for Claims Fraud.

Sr Cert ITS; Detection Architect, Watson Financial Crimes Insight IBM Cloud and Cognitive Software