
Claims data quality critical for analyzing insurance fraud


As the old saying goes, “garbage in, garbage out.” Applying this mindset to data quality is critical when building fraud risk detection systems. While it may seem like a no-brainer, the characteristics of typical insurance data make basic data hygiene a prerequisite before any analytics can be executed.

A critical step in analyzing insurance claims data

Forms for the submission of insurance claims often follow the CMS-1500 or UB-04 templates for individual-practice and institution-based filings, respectively. Although it is 2020, many insurance claims are still initially submitted on paper forms, from which the information must be manually entered into the system. Existing claim processing systems can be configured with rule sets, but the manual data entry process often results in duplicate records. Even an automated solution such as “OCR’ing the claim” duplicates data across various systems. This duplication causes several unfortunate characteristics in the claims data.

One consequence of this type of data entry is reduced accuracy in the submitted data. Transcribing hand-written forms, even with OCR, is error-prone. The manual entry process is compounded by the fact that there is often no lookup or resolution at entry time to check whether the information already exists in the system. This leads to the creation of similar but separate unique records. Unfortunately, it also lets fraud “get lost in the crowd”: as the sheer volume of claims grows, suspicious behavior becomes harder to notice. Fraudsters are aware of this phenomenon and often use it to their advantage. The Financial Crimes Insight for Claims Fraud product specializes in helping insurance companies find fraud, waste, and abuse. This article explores the product’s lessons learned about a critical step in the analysis of insurance claims data.

Out-of-the-box data hygiene operations

Financial Crimes Insight for Claims Fraud, or FCII, performs three out-of-the-box data hygiene operations: data cleanliness, data de-duplication, and data resolution.

  • Data cleanliness

FCII treats the first data hygiene operation, data cleanliness, as not only ensuring the data is valid and standardized, but also helping to mitigate potential data errors. In cleaning the data, FCII guides the process of correcting missing values and making sure a given dataset is complete enough to yield analytic insights. For instance, it is typical to validate an address by employing various natural language parsing and geocoding techniques and comparing the results to commercially available data sources for that location. This public data may include metadata, images, and geopolitical data, among others. FCII goes further by validating the quality of the source columns that will be used by the analytics downstream in the pipeline. If FCII detects sparsity, or a lack of data, within a specific dimension, it will not generate the corresponding features that rely on that data. This accelerates model training and scoring and reduces the negative impact that very sparse datasets can have.
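The sparsity gate described above can be sketched in a few lines. This is a hypothetical illustration, not FCII’s actual implementation: the column names and the 30% fill-rate threshold are made up for the example.

```python
import pandas as pd

SPARSITY_THRESHOLD = 0.30  # require at least 30% populated values (illustrative)

def usable_columns(df: pd.DataFrame, threshold: float = SPARSITY_THRESHOLD) -> list[str]:
    """Return the columns dense enough to support feature generation."""
    fill_rates = df.notna().mean()  # fraction of non-null values per column
    return fill_rates[fill_rates >= threshold].index.tolist()

claims = pd.DataFrame({
    "provider_id": ["P1", "P2", "P3", "P4"],
    "diagnosis":   ["A01", None, "B02", "C03"],
    "referral_id": [None, None, None, "R9"],   # only 25% populated → too sparse
})

print(usable_columns(claims))  # ['provider_id', 'diagnosis']
```

Features built on `referral_id` would be skipped here, which is the point: training on a dimension that is 75% empty tends to add noise rather than signal.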

  • Data de-duplication

The second data hygiene step, data de-duplication, addresses the re-entry of the same data. Common examples of duplicated data are names and addresses. If the name and address of a patient are re-entered or re-created each time a claim is processed, there may be tens of records that refer to the same individual. For a medical provider, this duplication can easily inflate the workload by an order of magnitude or more: instead of processing the expected 20 million individual records, the provider may be processing 200 million or more. Many of these duplicates carry the same identifying information: name, rank, and serial number, as it were. This is common in the insurance domain and directly affects the performance of analytics applied to the data. FCII automatically performs de-duplication on common data types such as names, addresses, and identifiers, providing immediate model performance improvements.
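A minimal sketch of de-duplication on normalized name and address keys, in the spirit of the step above. Production systems apply far richer standardization; the `normalize` helper here collapses only the trivial re-entry variants and is purely illustrative.

```python
import pandas as pd

def normalize(s: pd.Series) -> pd.Series:
    """Lowercase, trim, and collapse whitespace so trivial re-entry
    variants ('John  Doe ' vs 'john doe') map to one key."""
    return s.str.lower().str.strip().str.replace(r"\s+", " ", regex=True)

claims = pd.DataFrame({
    "claim_id": [101, 102, 103],
    "name":     ["John Doe", "john  doe", "Jane Roe"],
    "address":  ["12 Main St", "12 main st ", "4 Oak Ave"],
})

# Build a composite key, then keep one record per distinct patient.
claims["patient_key"] = normalize(claims["name"]) + "|" + normalize(claims["address"])
patients = claims.drop_duplicates(subset="patient_key")

print(len(claims), "claims ->", len(patients), "distinct patients")  # 3 claims -> 2 distinct patients
```

Note that the claims themselves are kept; only the patient reference is collapsed, so downstream analytics count individuals once instead of once per claim.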

  • Data resolution

FCII’s third data hygiene step targets the resolution of various data types. Common fields processed in this approach include name and address. Consider a scenario in which one claim is filled out as “Robert Smith”, while another form reads “Rob Smythe”. At first these may appear to be different patients, but if the address is the same, and perhaps the SSN and driver’s license too, we can conclude that these two records likely refer to the same individual. While the actual entity resolution algorithms require far more sophisticated logic, this ability to identify “who-is-who” matches is critical in fraud detection. In addition to names and addresses, this resolution approach also extends to vehicles, vessels, property, or any other insured element; it is critical to understand the ‘who’ or ‘what’ that is being insured. FCII analytics use both the unique reference and the resolved entity in all downstream analytics.
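The “Robert Smith” / “Rob Smythe” intuition can be shown as a toy matcher: a fuzzy name score combined with an exact match on a strong identifier. The 0.7 threshold and the scoring are hypothetical; real entity resolution engines use much richer logic than this sketch.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1] between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_person(rec_a: dict, rec_b: dict, name_threshold: float = 0.7) -> bool:
    """Resolve two claim records to one entity when the names are close
    AND a strong identifier (address or SSN) matches exactly."""
    names_close = name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    strong_id = (rec_a["address"] == rec_b["address"]) or (rec_a["ssn"] == rec_b["ssn"])
    return names_close and strong_id

a = {"name": "Robert Smith", "address": "12 Main St", "ssn": "123-45-6789"}
b = {"name": "Rob Smythe",   "address": "12 Main St", "ssn": "123-45-6789"}
print(same_person(a, b))  # True
```

Requiring the strong identifier keeps a close-but-coincidental name match (two unrelated John Smiths) from collapsing into one entity, which is why the fuzzy score alone is not enough.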

While no product can completely remove data quality issues, there is a set of data readiness steps that is especially critical for any analysis of fraud, waste, or abuse. These steps matter for insurance companies and for any financial services organization. FCII provides these, and other out-of-the-box analytics, without the need for lengthy data preparation or analysis before achieving ROI.

IBM Financial Crimes Insight for Claims Fraud is part of the IBM RegTech regulatory compliance solutions that are designed to help financial institutions better meet their regulatory monitoring, reporting, compliance and risk management needs. Learn more about IBM Financial Crimes Insight for Claims Fraud.

This blog is co-authored by Eliza Salkeld – Data Scientist, IBM Cloud and Cognitive Software

Learn more about IBM Claims Fraud solutions at FCI for Claims Fraud  

Sr Cert ITS; Detection Architect, Watson Financial Crimes Insight IBM Cloud and Cognitive Software
