Microsegmentation solutions for healthcare insurers

Using SPSS Statistics Base and decision trees

The healthcare industry is experiencing dramatic change that is resulting in greater access to valuable member data. This article discusses many of the technological changes in the industry, including a member-focused shift toward customization of health offerings to drive increased member satisfaction and retention. Review two segmentation-based case studies, with tips on the development process, data collection, segmentation, and deployment. Examine the sheer volume of health data and how these traditional segmentation approaches remain relevant as solutions for managing big healthcare data.


Kimberly Chulis (kim@coreanalytics.com), CEO and Co-founder, Core Analytics, LLC

Kimberly Chulis is one of the original founders of Core Analytics, LLC. With over 18 years of professional advanced analytics experience, she has demonstrated analytic expertise on projects at companies across several industries, including WellPoint, HCSC, UHG, Great West, Accenture, Ogilvy, Microsoft, Sprint/Nextel, Commonwealth Edison, TXU, Eloyalty, SPSS, Allstate, Cendant, and others in the financial, telecommunications, healthcare, energy, nonprofit, retail, and educational sectors. Kimberly has conducted PhD research at Purdue University's Health and Human Services Consumer Behavior program, and has a master's degree in economics with a focus on health economics and econometrics from the University of Illinois at Chicago.

26 June 2012


Technological advancements and healthcare trends

The healthcare industry is characterized by rapid change and technological expansion. This transformation is marked by mergers and acquisitions, market domination by a handful of firms, and a potentially huge structural shift as new US federal health reform laws are rolled out. Along with the impact of new technological breakthroughs, healthcare providers are required to adopt electronic medical record systems that will enable more informed and efficient patient care from an integrated patient portal. There is also the emergence of health insurance exchanges (see Resources). These will function as markets providing a selection of individual insurance plans that follow standards complying with the new reforms. The exchanges will also serve as clearinghouses for patient auto-assignment and the transfer of patient information if the US insurance mandate comes into effect in 2014 (see Resources).

Primary aspects of the healthcare industry have changed significantly over the past 15 years: the United States has seen an increase in managed care, a consolidation of markets for insurance and services, and most recently the dramatic reforms of the Patient Protection and Affordable Care Act and the Health Care and Education Reconciliation Act. If the new health legislation remains in place, the demand for health insurance (by mandate) will increase dramatically in 2014. In the individual market, health insurance companies, hospitals, and providers have become much more member focused, introducing dynamic tailored benefits that match individual preferences. The healthcare players are striving for increased consumer-focused service, with an emphasis on wellness and prevention that is quickly becoming standard across plans. Another emerging trend is a shift from the traditional fee-for-service model toward an outcome-based performance payment system that restructures provider and hospital incentives and monetization strategy. Health insurance companies react to increasing competition with targeted product development and marketing efforts. To do so, these firms leverage internal customer data to identify the primary demand drivers of their customer base.

The arenas of telemedicine and mobile health (mHealth) in particular will see rapid expansion in 2012, facilitating the exchange of information, diagnosis, treatment, and monitoring through phones and mobile devices to millions of individuals living in remote or rural areas with a shortage of doctors and hospitals. These changes fundamentally affect the markets for health delivery and have implications for ongoing healthcare industry restructuring and associated policy change. This anticipated explosion of mHealth delivery will result in widespread reliance on distributed file systems (for example, Apache Hadoop) to store vast amounts of personal health data, images, video, GPS data, and chat logs for streamlined indexing and processing. Access to all of this rich, personal data, including sensors, health monitoring readings, and auto-alerts when readings go over thresholds set by physicians, in a real-time shared scenario creates exciting opportunities for traditional healthcare analytics to scale up to meet the big healthcare data challenge.

The case studies described below are classical database marketing approaches to segmentation in healthcare and across other industries. Interestingly, these traditional approaches are also highly appropriate solutions for big data analytics. MapReduce is essentially a simplified programming model for parallel processing of extremely large data sets across a cluster, with options for partitioning the data among nodes. The main difference when dealing with multiple terabytes of data lies in the first data-processing step, which involves storing the data and applying machine learning to filter and refine the data set for analysis. The next step is familiar: applying traditional segmentation (or predictive analytics) approaches similar to those described below to arrive at even richer, actionable microsegments.
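
To make the map, group, and reduce steps concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and does not run on a cluster; the member records, field names, and function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical member records; field names are illustrative only.
members = [
    {"member_id": 1, "plan": "HMO", "claims": 3},
    {"member_id": 2, "plan": "PPO", "claims": 7},
    {"member_id": 3, "plan": "HMO", "claims": 1},
]

def map_phase(record):
    """Emit (key, value) pairs: the plan type and that member's claim count."""
    yield record["plan"], record["claims"]

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)

def run(records):
    """Chain the three steps over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

totals = run(members)  # claim totals keyed by plan type
```

In a real deployment the framework distributes the map and reduce calls across nodes; the per-key totals that come out are the kind of refined, aggregated data set that a traditional segmentation can then consume.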

Member-focused plans

Health insurance companies use analytic solutions and econometric analysis to uncover these demand drivers at an individual or segment level and identify differences in utility maximization in each segment to use in both product/plan design and marketing efforts. Understanding how demand drivers differ from segment to segment allows insurance companies to tailor products and services to meet demand and retain loyalty. These features might be regularly scheduled diagnostic exam reminders, discounts to health clubs for plan members, monthly heart-healthy recipes sent to member email addresses, appointment reminders and GPS directions on mobile phones, mobile reminders for prescription drug schedule updates, health apps to track glucose and blood pressure readings, and automatic emails to doctors with monitoring information.

Consumer preference for a tailored and personal health plan to manage individual and family wellness is highly varied. One insured person may value a plan that offers wellness and disease-management programs, access to a 24-hour nurse line, free participation in tobacco cessation programs; another may find high value in having access to a website to query pill-splitting and generic equivalents. Targeted options at a microsegment level are likely to reduce skyrocketing healthcare costs through a variety of mechanisms, simultaneously promoting healthy behaviors through better knowledge distribution and access. Health insurers need a repeatable way to leverage all of the disparate member data available to support initiatives involving data-driven intelligent member marketing, product design, risk assessment, program design, actuarial pricing, and provider and hospital relationship management. The following sections introduce two case studies that detail an actionable segmentation solution.

Case study 1: Member segmentation to drive retention

An actionable member segmentation effort starts with refinement of the program objective, collection of enterprise-wide business knowledge, and discovery of raw data. The various (and often disparate) data sources are then assembled into an integrated analysis file using a tool like IBM SPSS Statistics Base, and data exploration is shaped by business insights. A Chi-squared Automatic Interaction Detector (CHAID)-based segmentation, performed in the Decision Tree module in SPSS Statistics Base, then produces specific profiled segments available for immediate retention targeting and tracking measurement. This case study outlines the detailed process.

Business objective

Customer retention is a critical component of profitability for any firm. Historically in the United States, company-sponsored health plans cover many individuals and their families. Although this trend is changing and an estimated 85 million Americans are currently uninsured, the Affordable Care Act will mandate that everyone carry a minimum level of insurance or face tax penalties for noncompliance. Regardless of the nature of insurance, those covered individually will have the opportunity to switch plans at will, and those on company-sponsored policies are allowed a short annual open enrollment period during which they can change plans or elected benefits. Health insurers want to reduce the likelihood of members leaving for other plans and design targeted strategies to retain customers during these critical windows and throughout the year.

Phase 1

The first step in designing member segmentation to drive retention strategies is to compile a comprehensive list of relevant segmentation dimensions. This initial list is often the result of an enterprise sequence of individual interviews with internal stakeholders in various levels and departments across the organization.

Designing a standard interview questionnaire that collects information about the individual's role; department; business objectives; and customer data generated, used, and stored by the group is a best practice. The survey should have detailed questions that collect information during this data discovery around format and where the data resides, in what type of warehouse or tool, and who can provide access to this data.

Business questions should be included to understand the dimensions of interest to each interviewee: which dimensions are most relevant to the nature of their responsibility and interaction with the member? Candidate dimensions include age, product, gender, price, tenure, preferred provider organization (PPO) versus health maintenance organization (HMO) plans, presence of children and their age, chronic conditions, participation in wellness programs, prescription drug coverage, company-sponsored versus individual versus state-sponsored plan, children under 25 years of age on plan, patients 65 years of age or older, high claims, high utilization, and so on. Collect and aggregate all of these, and review the information in workshops with all stakeholders to eliminate duplicates and arrive at a collective, qualitative, hierarchical list of attributes.

This process will uncover enterprise business knowledge and biases and provide a starting point for validation of these qualitative dimensions in Phase 2. It's also important in terms of success of future adoption and deployment of the solution to create one that aligns with business objectives and expectations in such a way that all stakeholders have provided personal validation throughout the various stages of segmentation design. This phase introduces invaluable insider business knowledge that one can use for segment insight and treatment strategy development.

Phase 2

Begin by requesting and compiling raw member-level data identified throughout the organization in Phase 1, and then integrate the data with a data-mining and aggregation tool. IBM® SPSS® Statistics Base was used to perform the analysis in question. In this example, suppose the following data sources are available:

  • Member and group identifiers, including each plan change, transaction, associated dates, dependent changes/additions, plan choice, group type, region, date of birth, gender, Standard Industrial Classification code, and zip code
  • Dental, vision, and drug plan participation
  • Disease management and wellness program participation data
  • Trigger-based indicators to capture plan changes, such as a student rolling off the policy, a new birth, marriage, or divorce
  • Census data appended to the zip code on address file (percentage ethnicity, urban/rural, average home value, and so on)
  • Customer satisfaction survey response data
  • New additional variables for analysis derived through aggregation, data manipulation, date subtraction, and rules logic for proxies for trigger events based on the member- and group-level data above (age, number of dependents at plan inception and each cancellation, current number of dependents, first product HMO or PPO, last product HMO or PPO, member tenure, change in the number of dependents during tenure, total number of plan changes, and members associated with the plan)
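
The derived variables in the last bullet can be sketched in a few lines of Python. The case study performed this step in SPSS Statistics Base; the version below is only an illustration of the date subtraction and aggregation logic, and the record layout, field names, and function names are assumptions rather than the actual file structure.

```python
from datetime import date

def years_between(start, end):
    """Whole years elapsed from start to end (simple date subtraction)."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years

def derive_member_variables(member, events, as_of):
    """Build analysis variables from one member record and its plan events.

    events: dicts with 'date', 'product', and 'dependents' keys
    (a hypothetical layout for the plan-change transaction history).
    """
    ordered = sorted(events, key=lambda e: e["date"])
    first, last = ordered[0], ordered[-1]
    return {
        "age": years_between(member["date_of_birth"], as_of),
        "tenure_years": years_between(first["date"], as_of),
        "first_product": first["product"],
        "last_product": last["product"],
        "dependents_at_inception": first["dependents"],
        "current_dependents": last["dependents"],
        "dependent_change": last["dependents"] - first["dependents"],
        # Every recorded event after plan inception counts as one plan change.
        "plan_changes": len(ordered) - 1,
    }

# Illustrative data only.
member = {"date_of_birth": date(1980, 3, 15)}
events = [
    {"date": date(2005, 1, 1), "product": "HMO", "dependents": 0},
    {"date": date(2008, 7, 1), "product": "PPO", "dependents": 2},
    {"date": date(2011, 1, 1), "product": "PPO", "dependents": 3},
]
derived = derive_member_variables(member, events, as_of=date(2012, 6, 26))
```

Running the same function over every member produces one row of derived variables per member, ready to be joined to the other sources in the analysis file.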

Phase 3

After the file assembly from Phase 2 is complete and basic data quality checking processes and validation of univariate counts and premodeling analyses have been performed, it's time to begin segmentation. For this example, I selected a CHAID approach. A sequence of iterative exhaustive CHAID programs can be run until the final version is identified.

CHAID is an algorithm used to build differentiation models based on a classification system. The analysis subdivides the sample into a series of subgroups that share similar characteristics toward a response and maximize ability to predict values of the response variable. The output is a tree, the branches of which are the predictor variables that split the sample into discriminating groups. CHAID is often used in segmentation, typically in the direct marketing industry to identify the type of responders who have reacted to a specific campaign. This article demonstrates a process by which a CHAID segmentation is performed using the Classification Tree Module in SPSS Statistics Base.
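
The core idea behind each CHAID split, choosing the predictor most strongly associated with the response, can be illustrated with a plain chi-square calculation. The Python sketch below is a deliberate simplification and not the SPSS implementation: it omits category merging, significance testing with adjusted p-values, and recursive tree growth, and it compares raw statistics, which is only reasonable here because both example predictors have the same number of categories. The data and field names are hypothetical.

```python
from collections import Counter

def chi_square(rows, predictor, response):
    """Pearson chi-square statistic for a predictor-by-response table."""
    observed = Counter((r[predictor], r[response]) for r in rows)
    row_totals = Counter(r[predictor] for r in rows)
    col_totals = Counter(r[response] for r in rows)
    n = len(rows)
    stat = 0.0
    for level in row_totals:
        for outcome in col_totals:
            expected = row_totals[level] * col_totals[outcome] / n
            stat += (observed[(level, outcome)] - expected) ** 2 / expected
    return stat

def best_split(rows, predictors, response):
    """Pick the predictor with the largest chi-square statistic."""
    return max(predictors, key=lambda p: chi_square(rows, p, response))

# Illustrative data: churn follows plan type exactly and ignores gender.
rows = [
    {"plan": "HMO", "gender": "F", "churned": 1},
    {"plan": "HMO", "gender": "M", "churned": 1},
    {"plan": "HMO", "gender": "F", "churned": 1},
    {"plan": "HMO", "gender": "M", "churned": 1},
    {"plan": "PPO", "gender": "F", "churned": 0},
    {"plan": "PPO", "gender": "M", "churned": 0},
    {"plan": "PPO", "gender": "F", "churned": 0},
    {"plan": "PPO", "gender": "M", "churned": 0},
]
first_split = best_split(rows, ["plan", "gender"], "churned")
```

With this toy data the first split falls on plan type, because gender carries no information about churn. A full CHAID run repeats the selection within each resulting subgroup until no significant split remains.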

Phase 4

When the final CHAID segmentation is determined, each terminal node of the classification tree represents a member segment. In the generic tree sample shown in Figure 1, there are seven distinct terminal nodes.

Figure 1. Generic classification tree
Image showing a generic classification tree

Each of these segments can be profiled, given a name, and assigned treatment strategies designed to increase member satisfaction with the plan as well as the perceived utility members receive from their health insurer. The expected result is lower member churn and plan transfer and higher customer retention with the health insurance plan. This outcome can then be tested for efficacy and tweaked as needed as the segment-based retention efforts are operationalized.
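
Once the tree is final, profiling amounts to assigning each member to a terminal node and summarizing the node. The Python sketch below assumes a tiny hypothetical two-level tree; the split variables, cut points, segment names, and the retained flag are invented for illustration and do not come from the case study.

```python
from collections import defaultdict

def assign_segment(member):
    """Walk a small, hypothetical tree and return the terminal node name."""
    if member["product"] == "HMO":
        if member["tenure_years"] < 3:
            return "HMO, short tenure"
        return "HMO, long tenure"
    if member["dependents"] > 0:
        return "PPO, with dependents"
    return "PPO, no dependents"

def profile_segments(members):
    """Return a mapping of segment name to (member count, share retained)."""
    counts = defaultdict(int)
    retained = defaultdict(int)
    for m in members:
        segment = assign_segment(m)
        counts[segment] += 1
        retained[segment] += m["retained"]
    return {s: (counts[s], retained[s] / counts[s]) for s in counts}

# Illustrative data only.
members = [
    {"product": "HMO", "tenure_years": 1, "dependents": 0, "retained": 0},
    {"product": "HMO", "tenure_years": 2, "dependents": 1, "retained": 1},
    {"product": "HMO", "tenure_years": 6, "dependents": 2, "retained": 1},
    {"product": "PPO", "tenure_years": 4, "dependents": 0, "retained": 1},
]
profiles = profile_segments(members)
```

Tracking the same per-segment retention figure before and after a treatment strategy is rolled out gives a simple way to test whether the strategy is working.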

Case study 2: Member advocacy program design

Member advocacy programs rely on segmentation throughout all stages of the program. A CHAID module in SPSS Statistics Base is first used for initial program design to understand the composition of member issues and to identify key acute categories and associated volume. A second segmentation is then developed for capacity and resource planning, where acute categories identified in the first stage are mapped to the appropriate advocate categories. The resulting overlay of segmentation schemes allows for real-time optimization and management of member-advocate mapping throughout the crisis management cycle.

Business objective

There is an ongoing trend in the health insurance industry for health plans to provide customized support to members to have them better understand and navigate their plans, particularly during acute instances, where health care consumers are unable to effectively optimize their plan use. A Member Advocacy Program (MAP) is designed to increase customer satisfaction and attract and retain members who find a personal advisor during critical health service demand periods to be a value-add. This advocate acts as a plan adviser familiar with the specific aspects of the plan and nuances of the insured person's family and health profile and helps him or her navigate health plans to the client's best advantage during short-term critical periods.

Phase 1

The first phase of a MAP is to identify those members who qualify for participation in the program. Generally, this is done by plan type, and other parameters are used to limit the initial pilot to a manageable roll-out size. For example, a health insurer might specify that only those on specific company-sponsored plans and their spouses are included, that no one under 18 years of age is included, and that the insured person must be a medical member enrolled on or after a certain date. For Health Insurance Portability and Accountability Act (HIPAA) compliance purposes, each member (not the primary insured) is contacted about his or her own medical maintenance.

This process results in a specific list of filters. In much the same way the data integration was performed for analysis above, a member file is assembled that contains all of these elements, and a final list of eligible members is identified. This step was completed with SPSS Statistics Base to compile a complete list of eligible members for program participation.
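
The filter list translates naturally into a single eligibility test applied to every row of the member file. The Python sketch below mirrors the example filters from Phase 1; the plan codes, cutoff date, relationship values, and field names are placeholders, since the actual values depend on the insurer's pilot design.

```python
from datetime import date

# Hypothetical pilot parameters; real values come from the program design.
ELIGIBLE_PLANS = {"EMPLOYER_A", "EMPLOYER_B"}
ENROLL_CUTOFF = date(2011, 1, 1)
MIN_AGE = 18

def is_eligible(member):
    """Apply every pilot filter; a member must pass all of them."""
    return (
        member["plan"] in ELIGIBLE_PLANS
        and member["relationship"] in {"subscriber", "spouse"}
        and member["age"] >= MIN_AGE
        and member["enrolled_on"] >= ENROLL_CUTOFF
        and member["medical_member"]
    )

# Illustrative data only.
member_file = [
    {"id": 1, "plan": "EMPLOYER_A", "relationship": "subscriber", "age": 42,
     "enrolled_on": date(2011, 3, 1), "medical_member": True},
    {"id": 2, "plan": "EMPLOYER_A", "relationship": "child", "age": 16,
     "enrolled_on": date(2011, 3, 1), "medical_member": True},
    {"id": 3, "plan": "INDIVIDUAL", "relationship": "subscriber", "age": 35,
     "enrolled_on": date(2011, 6, 1), "medical_member": True},
    {"id": 4, "plan": "EMPLOYER_B", "relationship": "spouse", "age": 39,
     "enrolled_on": date(2010, 5, 1), "medical_member": True},
]
eligible_ids = [m["id"] for m in member_file if is_eligible(m)]
```

Keeping the parameters as named constants makes it straightforward to widen the pilot later without touching the filter logic.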

Phase 2

The next stage involves determination of the types of member advocates needed to support the program and member designation. This phase is based on the premise that within any given insured population, a given number of categories are classified as acute instances that require plan advocate intervention and support. This phase involves identifying within the data what is classified as an acute instance, flagging these instances with 1, and including a balanced sample of members who have not had acute instances. This flag variable becomes the dependent variable for CHAID segmentation. The resulting segments and segment profiles will describe the distinct types of acute instances and provide insight into the volume and nature of the issues, making it possible to accurately identify the skills and specialties needed by member advocates to support the acute member segments. (An alternative to the CHAID approach is a cluster-based segmentation; in this case, only acute cases are needed for modeling, and SPSS Statistics Base supports both of these options.)

To complete this process, three additional data sources are appended to the member file created in Phase 1. Of specific interest are call data (how often members place inbound calls to the call center) and claims data (total number of claims and total dollar amount of claims). A period for analysis should be selected, and looking at this on an annual basis is a good starting point. SPSS Statistics Base is used to aggregate the claims and call data and append this information onto the primary member file. A group of indicators is used to group members into different acute need propensity categories. These are identified by determining averages for the population and examining distributions for each. Thresholds are set to flag the members who rank highest in these categories, with the intention that, through outreach by an advocate, a flagged member will reduce calls to the call center and reduce utilization through expert navigation of his or her plan, thereby lowering costs for both the member and the health insurance company. This model can then score the entire eligible member base on a daily basis and send auto-alerts when member data indicates that a member is entering an acute period. The indicators are:

  • High claims (dollars). This can be the annual total, claims not covered, claims submitted, and so on.
  • High claims (volume). This is an aggregate of total claims filed annually by the member.
  • High calls to the call center. This is the total count of inbound calls by the member.

Profiling the pilot member base and understanding the distribution of those who fall into one, two, or three of the above categories can establish a hierarchy of needs. By including International Classification of Diseases, 9th revision (ICD-9) codes, it is possible to specify areas of expertise for member advocates assigned to each member case.
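
The flagging and profiling steps can be sketched as follows. The text says only that thresholds are set from population averages and distributions, so the specific rule used here, the population mean plus one standard deviation, is an assumption made for illustration, as are the field names and the sample figures.

```python
from collections import Counter
from statistics import mean, pstdev

INDICATORS = ("claim_dollars", "claim_count", "call_count")

def thresholds(members, k=1.0):
    """Cutoff per indicator: mean plus k standard deviations (assumed rule)."""
    cutoffs = {}
    for name in INDICATORS:
        values = [m[name] for m in members]
        cutoffs[name] = mean(values) + k * pstdev(values)
    return cutoffs

def acute_flags(member, cutoffs):
    """Return a 0/1 flag per indicator for one member."""
    return {name: int(member[name] > cutoffs[name]) for name in INDICATORS}

def flag_distribution(members, cutoffs):
    """Count members by how many of the three categories they fall into."""
    return Counter(sum(acute_flags(m, cutoffs).values()) for m in members)

# Illustrative annual aggregates only.
members = [
    {"claim_dollars": 1000, "claim_count": 2, "call_count": 1},
    {"claim_dollars": 1200, "claim_count": 3, "call_count": 0},
    {"claim_dollars": 900, "claim_count": 2, "call_count": 1},
    {"claim_dollars": 1100, "claim_count": 3, "call_count": 2},
    {"claim_dollars": 25000, "claim_count": 40, "call_count": 15},
]
cutoffs = thresholds(members)
distribution = flag_distribution(members, cutoffs)
```

The distribution shows how many members fall into none, one, two, or all three categories, which is the starting point for the hierarchy of needs. Rerunning the flags against refreshed daily aggregates is one way the auto-alert scoring could work.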

Phase 3

In general, three types of member advocates will be available for assignment: a claims and benefits expert, a clinical expert, and an educator role. As a member moves through different phases of an acute instance, the advocates assigned to the member work as a coordinated team to support the member through the critical period. Claims data and call data will be used to score customers for not only instances of need but also where in the advocacy cycle they are at any point to pinpoint the proper advocate intervention.


Segmentation is critical to understanding the drivers of member behavior and how best to shape the nature of how individuals interact with and derive the most utility from their health plans. The microsegmentation approaches described are in industry-wide use today to help identify segments of members for a multitude of business purposes. Technological advances in the industry will result in further decentralization of member data available to support analytics initiatives. These new and more readily accessible sources of healthcare data will facilitate communication among all of the players, resulting in lower healthcare costs and improved health and wellness outcomes.


