Optimal segmentation approach and application
Clustering vs. classification trees
The term segmentation has become ubiquitous, yet it takes on so many different meanings depending on context that it often results in confusion. It's not uncommon for corporations to have several segmentation efforts going on simultaneously across different departments. Most professionals agree that segmentation serves as a catchall term for the general partitioning of a whole into subsets of similar units. Yet beyond that, the topic can be a subject of intense debate.
Suppose that at any given time within one organization, the following efforts for segmentation are going on simultaneously:
- Research and development (R&D) is developing a customer segmentation to better understand consumer preferences and purchase behaviors to drive tailored product enhancements. R&D might also develop a product segmentation to understand product similarities and types of products that are usually purchased together.
- Finance has identified segments of customers and prospects to aid in revenue forecasting. Data in this case might be profitability, cost of acquisition, lifetime value, demographics, retention and advertising costs, and more.
- Market research segmentation forms the basis for service and quality perception to drive brand strategy and advertising efforts. Market researchers traditionally perform segmentation with survey instruments and customer feedback data.
- Marketing has yet another segmentation to understand who is responding to various marketing channel campaigns to refine targeting and improve campaign response. Marketing analytics folks often draw on raw customer purchase behavior and demographic data as a basis for segmentation.
This kind of scenario is quite common: an enterprise lacks a universal segmentation strategy, and disparate (and often contradictory) segmentations are developed cross-departmentally and used in distinct ways. This practice is prevalent across many industries in which segmentation is used. To offer a limited snapshot of how various industries approach segmentation, consider the following applications:
- Insurance firms use segmentation to identify risk pools and set pricing and premium levels.
- The electricity industry uses a bottom-up approach to load forecasting, performing segment-level forecasts that aggregate to total demand.
- The automobile industry uses segmentation to understand target market preferences around design and features.
- Banks segment credit card prospects for direct mail campaigns.
- Biologists use segmentation to mean something vastly different: separating animal types into categories based on body structure and growth zones.
- Pharmaceutical firms deploy segmentations to maximize the life cycles of product innovation.
- The field of image processing (which includes facial recognition) is one of the most complex areas, relying on sophisticated application of parametric, region-growing, and edge-detection segmentation algorithms.

Irrespective of the industry, presumably all enterprises would benefit from moving to a more consolidated and aligned corporate segmentation strategy.
The distinctions listed above detail the various approaches and objectives of segmentation projects. Market researchers and marketing analytics professionals typically approach the process with radically different objectives, input data, and methodology. Let’s explore the standard approaches to marketing segmentation a bit further.
The first step in any segmentation effort is to understand the objective and motivation for the study. Who is asking for the segmentation? What will it be used for? Why is it necessary? What information about consumers is needed that is not already available? Who will use the output? What data is available to support the segmentation? How will the segmentation be actionable and deployed? How will the success of the project be measured? The answers to all of these questions help identify the most appropriate technique, data, and algorithm needed to address the problem. We'll look at a specific use case in a later section and outline two viable options, as well as discuss the similarities and differences between customer segmentation and predictive modeling (see Related topics for a link to more information).
Data inputs and standard segmentation approaches
Data is a critical input to any segmentation effort. As a rule, as long as the data source can be accurately associated with an individual- or household-level ID, more data is preferable to less. The list of available data is nearly infinite, but here are a few key data categories:
- Survey data can be collected from customers or consumers from the general population around product and price preferences, channel sales, customer experience satisfaction, and improvement recommendations.
- Transactional data is traditionally stored in relational databases and includes purchases, returns, discounts, method of payment, date and time of purchase, and items purchased together in a retail setting. In a financial setting, this information translates to deposits; withdrawals; products like checking, savings, and mortgages; and details around each product. In an energy setting, this information includes usage, outages, payments, deposits, installation of smart meters, and more.
- Behavioral data includes web browsing behavior, store navigation, eye tracking, voice recognition, search, mobile usage and device information, geolocation, frequency, and volume of in- and out-bound interactions with the brand. Social media interaction such as "liking," retweeting, and following also falls into this rich category of data.
- Demographic data can be collected directly from consumers or purchased from demographic data providers that offer between 300 and more than 900 variables at the individual, household, and zip code level. Many of these third-party appended datasets are derived from U.S. census response data.
- Other data categories include call center, chat, information-seeking, price comparison, reviews, participation in peripheral programs and communities, and product information.
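As a purely illustrative sketch (all customer IDs and field names here are hypothetical), data from these categories is typically merged on an individual- or household-level ID into one feature record per customer before any segmentation runs:

```python
# Hypothetical example: assembling individual-level features from several
# ID-keyed data sources (field names are illustrative, not a real schema).
transactions = {
    "C001": {"purchases": 12, "returns": 1},
    "C002": {"purchases": 3, "returns": 0},
}
demographics = {
    "C001": {"age_band": "35-44", "zip": "02139"},
    "C002": {"age_band": "25-34", "zip": "10001"},
}

def build_feature_table(*sources):
    """Merge any number of ID-keyed sources into one record per customer."""
    table = {}
    for source in sources:
        for cust_id, fields in source.items():
            table.setdefault(cust_id, {}).update(fields)
    return table

features = build_feature_table(transactions, demographics)
print(features["C001"])  # combined transactional + demographic record
```

In practice this step is done in a database or a dataframe library, but the principle is the same: the richer the joined record, the more dimensions the segmentation can exploit.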
After the preliminary business objective refinement and data discovery are complete, it's time to consider possible segmentation approaches. You can select from a variety of traditional approaches, and each has its own benefits and limitations. For instance, many of the clustering options result in same-size clusters; although this can be desirable from a deployment standpoint, forcing same-size clusters can also dilute the strength of similarity measures within a cluster.
There are three basic choices when determining the best segmentation approach. Figure 1 shows three general methods: non-quantitative, interdependent, and dependent.
Figure 1. Three basic segmentation choices
The first option is a qualitative, or non-quantitative approach involving contrasting dimensions derived through interviews with business stakeholders and focus groups to gather anecdotal information. These dimensions reflect experiential data around consumer behavior and are used to assign subjective segments for targeted treatment strategy. Although directionally useful, these non-quantitative approaches tend to be less robust than the other two data-driven segmentation categories—interdependent and dependent.
Interdependence refers to a subset of multivariate segmentation techniques that group consumers based on similar characteristics. Cluster analysis is one popular type of interdependent segmentation in which all dataset inputs are simultaneously considered, and there is no splitting of dependent and independent variables. Integral to the clustering process are iterative mapping and graphing of segments to visualize relationships and cluster variation spatially until the final best fit is identified.
Dependence refers to pattern analysis approaches, such as Kohonen networks, rule induction, chi-square automatic interaction detection (CHAID), C5.0, Iterative Dichotomiser 3 (ID3), and classification and regression trees (CART), that are typically selected for the identification of key market segments. Most of these algorithms, as well as machine learning approaches such as neural networks, produce tree-type output that is useful because it delivers a visual, graphical representation of segments, aiding validation and explanation to nontechnical stakeholders. One key difference is that these models require a dependent variable, whereas no dependent variable is designated in interdependence models. The dependent variable is usually a 0-1 flag-type variable that matches the objective of the segmentation (that is, churn to identify segments of customers most likely to defect, high value for customers likely to exceed a desired threshold of spending, or high risk for groups of customers likely to default on credit card payments or loans). In addition to the resulting groupings in tree format, these dependence models generate associated probability and propensity metrics in their output. For this reason, extensive industry debate exists around the semantics of segmentation with dependence approaches.
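To make the dependence idea concrete, here is a minimal, hypothetical Python sketch (the field names and churn values are invented): a 0-1 churn flag serves as the dependent variable, and each segment formed by a splitting variable carries a propensity metric alongside its membership:

```python
# Invented toy data: a 0-1 churn flag is the dependent variable.
customers = [
    {"id": 1, "tenure": "short", "churn": 1},
    {"id": 2, "tenure": "short", "churn": 1},
    {"id": 3, "tenure": "short", "churn": 0},
    {"id": 4, "tenure": "long",  "churn": 0},
    {"id": 5, "tenure": "long",  "churn": 0},
    {"id": 6, "tenure": "long",  "churn": 1},
]

def segment_propensity(rows, split_var, target):
    """Group rows by one splitting variable and attach the target rate,
    i.e., the propensity metric a dependence model reports per segment."""
    segments = {}
    for row in rows:
        segments.setdefault(row[split_var], []).append(row[target])
    return {seg: sum(vals) / len(vals) for seg, vals in segments.items()}

# Short-tenure segment: 2/3 churn propensity; long-tenure: 1/3.
print(segment_propensity(customers, "tenure", "churn"))
```

A real tree algorithm such as CHAID or CART chooses the splitting variables automatically and recursively; the sketch only shows why each resulting segment comes with a probability attached.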
Proponents of this method stress that the main output of dependence segmentations is groupings of similar customers that can be further profiled and have tailored treatment strategies applied to reduce churn, encourage increased spending behavior, or introduce risk-intervention strategies prior to impending default. Critics of this approach argue that the resulting model is actually a predictive model rather than a segmentation model because of the probability prediction output. The distinction may lie in the use of the model. Segmentation is classifying customer bases into distinct groups based on multidimensional data and is used to suggest an actionable roadmap to design relevant marketing, product, and customer service strategies at a segment level that will drive desired business outcomes. Predictive modeling is forecasting a specific consumer behavior at the individual level. If that seems a logical definition, it follows that use of the output should determine the segmentation versus predictive model designation.
The final preparation step prior to embarking on a segmentation effort is to select the most appropriate software for the job. Numerous open source and commercial vendors offer the gamut of classification and clustering algorithms. Some, such as the free RapidMiner, offer decision trees, support vector machines (SVMs), and two types of neural networks. Others, such as IBM, have an array of options, including IBM® SPSS® Advanced Statistics (see Related topics), which includes Kohonen, Two-step, and K-Means, and the Decision Tree Module, which offers four tree-growing algorithms: CHAID, Exhaustive CHAID, CART, and QUEST (an unbiased binary tree algorithm). IBM Unica has the Affinium Model, which offers a cross-sell module providing CHAID, CART, and neural nets. The IBM Intelligent Miner® data-mining suite provides an extensive list of algorithms with the ability to benchmark and compare multiple algorithms to facilitate final algorithm selection. The Related topics section provides detailed information on many of the statistical packages that support segmentation approaches.
Types of clusters and classification approaches
Hierarchical and nonhierarchical (disjoint) clustering are both limited to analyzing numerical variables unless a distance matrix that accepts both character and numeric inputs is supplied. Hierarchical clusters do not overlap, although one cluster can be a fully contained subset of another. Disjoint clusters also do not overlap: each customer can belong to only one cluster. In contrast, overlapping clusters are unconstrained versions that can be adjusted to allow various degrees of overlap. Fuzzy clusters can fall into any of these three categories and are differentiated by assigned probabilities of membership in each cluster. K-Means algorithms can be run many times to produce a specific number of disjoint, flat clusters. A softer technique, called Normal mixtures, uses probability estimates through iterative classification to assign the probability of group inclusion. Single linkage is a hierarchical clustering technique that merges, at each step, the two clusters with the smallest minimum pair-wise distance; complete linkage merges the two clusters whose union has the smallest diameter. One clustering approach that performs well across the board in Milligan's seminal 1981 publication on the topic is the Average Link (group average) approach, which combines single- and complete-linkage characteristics (see Related topics for a link to more information). Ward's Minimum Variance method also performs well. Other methods are available, such as factor analysis, which is often used in the first stage of clustering for variable reduction, and latent class algorithms, a structural equation modeling approach that uses probability modeling to maximize overall fit and find groups in multivariate categorical data.
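The three linkage criteria described above can be sketched in a few lines of Python. This uses 1-D points purely for brevity; a real application would operate on a multivariate distance matrix:

```python
# Toy 1-D illustration of the single, complete, and average (group-average)
# linkage criteria used by hierarchical clustering.
def single_linkage(a, b):
    """Cluster distance = smallest pair-wise distance between members."""
    return min(abs(x - y) for x in a for y in b)

def complete_linkage(a, b):
    """Cluster distance = largest pair-wise distance (diameter-driven)."""
    return max(abs(x - y) for x in a for y in b)

def average_linkage(a, b):
    """Group average: mean of all pair-wise distances between clusters."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

c1, c2 = [1.0, 2.0], [5.0, 9.0]
print(single_linkage(c1, c2), complete_linkage(c1, c2), average_linkage(c1, c2))
# → 3.0 8.0 5.5
```

At each agglomerative step, the pair of clusters with the smallest criterion value is merged; swapping the criterion is what distinguishes single, complete, and average linkage.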
In terms of classification approaches, CHAID is a decision tree that uses adjusted significance testing to detect interaction among variables and determine multi-way splits. Its advantages are that it has easy-to-understand, interpretable output; it is an industry-standard approach in direct marketing; and it can easily handle both categorical and numerical inputs. CHAID doesn't perform well on small datasets and is usually associated with the initial stages of data exploration in regression and predictive modeling efforts. CART (see Related topics) is actually an umbrella term for both classification and regression trees, which differ primarily in their node-splitting criteria. ID3 (see Related topics) is an approach that chooses splits to minimize entropy in the resulting nodes.
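The entropy measure that ID3 minimizes can be sketched as follows. This is a minimal illustration of the split criterion only, not a full tree-growing implementation:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, children):
    """Entropy reduction from splitting `parent` into `children` subsets;
    ID3 greedily picks the split with the largest gain."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = [1, 1, 0, 0]
print(information_gain(parent, [[1, 1], [0, 0]]))  # perfect split → 1.0 bit
```

A split that leaves each child node with a single class drives child entropy to zero, so the gain equals the parent's entropy; impure splits score lower and are passed over.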
Use case of segmentation
Business Scenario: A health insurance company is interested in segmenting its customer base to determine the best segments of customers for an outreach campaign encouraging participation in online wellness programs. The expectation is that as members assume a greater role in the self-management of care, claims will decrease, health outcomes will improve, and member satisfaction and retention will increase.
The health insurance company collects data on type of plan, demographics, claims, participation in wellness and disease-management programs, detailed information on outbound and inbound calls, chat and emails, website logins and information-seeking sessions, prescription drug data, and other individual-level variables. What are possible segmentation approaches to address this business case?
As is true with most cases of applied analytics, the process involves a blend of art and science.
To some extent, approach selection comes down to a question of analyst
preference, availability of software and associated algorithms, and
familiarity with validation and assessing success criteria of the output.
In this use case, either an interdependent (no dependent variable)
clustering or dependent (classification) approach could be applied.
Remember that the latter requires a dependent variable: If the data
supports identifying members who already participate in an online
wellness program, or in an offline program associated with the desired
success metrics, this group can be flagged with
WellFlag=1, and all others with
WellFlag=0. This binary flag can be further refined if a
demographic variable is available that indicates a computer user or if the
member record includes an email address, both of which serve as a proxy
for propensity to have and use computers. Because the data inputs are both
character and numeric, CHAID is a flexible classification approach that
will neatly divide the members by the categorical and numeric data into
segments and allow for more detailed profiling to aid in wellness
subprogram and website design (based on medical needs, risk for disease,
and targeted needs).
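As a hypothetical illustration of the flagging step described above (the member records and field names are invented, not the insurer's actual schema):

```python
# Invented member records; real data would come from program enrollment
# tables and the member contact file.
members = [
    {"id": "M1", "online_wellness": True,  "offline_wellness": False, "email": "a@x.com"},
    {"id": "M2", "online_wellness": False, "offline_wellness": True,  "email": None},
    {"id": "M3", "online_wellness": False, "offline_wellness": False, "email": "c@x.com"},
]

def well_flag(member):
    """1 if the member already participates in any wellness program, else 0."""
    return 1 if (member["online_wellness"] or member["offline_wellness"]) else 0

def computer_proxy(member):
    """Refinement described above: an email on file as a proxy for computer use."""
    return member["email"] is not None

for m in members:
    m["WellFlag"] = well_flag(m)
print([(m["id"], m["WellFlag"], computer_proxy(m)) for m in members])
```

The resulting WellFlag column then serves as the dependent variable for a CHAID-style tree over the categorical and numeric member attributes.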
This segmentation could also be approached with a clustering technique, where Average Linkage or K-Means is applied using the numeric values and categorical variables are handled through "distance" measures for inclusion in the model. The actual algorithm selection depends on the desired output. If distinct clusters are necessary (that is, one member should participate in only one type of wellness program), then non-fuzzy options such as K-Means and Normal mixtures can be selected. If overlapping clusters are more suitable, factor rotation and fuzzy clusters are prescribed. The best-fit algorithm is selected by first preparing the dataset and transforming categorical values appropriately, then running the datasets through the various candidate approaches and reviewing the graphical output to see the relative size and groupings of the clusters. These graphs allow for comparison and selection of ideal clusters: Those with the best-separated and most compact clusters represent the best fit.
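The "compact and well-separated" comparison can also be scored numerically. The ratio below is a simplified 1-D stand-in for that graphical review (an assumption-laden sketch, not a full silhouette computation):

```python
# Toy scoring of candidate clusterings: separation of cluster centers
# relative to average within-cluster spread. Higher is better.
def mean(xs):
    return sum(xs) / len(xs)

def compactness(cluster):
    """Average absolute deviation of members from the cluster center."""
    m = mean(cluster)
    return mean([abs(x - m) for x in cluster])

def separation_score(clusters):
    """Smallest between-center distance divided by mean within-cluster spread."""
    centers = [mean(c) for c in clusters]
    between = min(abs(a - b) for i, a in enumerate(centers)
                  for b in centers[i + 1:])
    within = mean([compactness(c) for c in clusters])
    return between / within if within else float("inf")

tight = [[1.0, 1.2], [9.0, 9.4]]   # compact, far apart
loose = [[1.0, 5.0], [6.0, 9.5]]   # spread out, nearly touching
print(separation_score(tight) > separation_score(loose))  # → True
```

Whichever candidate algorithm produces the highest-scoring (or visually cleanest) partition on the prepared data would be the one carried forward.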
Segmentation and big data implications
Big data is a term that applies to the petabytes of social, mobile, web, text, and sensor data generated and stored at the individual level. This data is typically stored in unstructured databases and tools such as IBM InfoSphere® BigInsights™, which sits on the Apache Hadoop platform and facilitates large-scale analytics by business analysts instead of machine learning experts. These new technologies enable access to huge and previously untapped data sources as well as nimble filtering and MapReduce functions, thereby adding value with the inclusion of representation of unstructured data such as images, videos, and text-based opinions into the traditional dataset.
The classical segmentation algorithms described in this article remain relevant in a big data environment: The approach and selection criteria remain the same. The difference lies primarily in the preprocessing and integration of unstructured data and promises to result in richer and more actionable segmentation outcomes. Companies that assemble a technology stack to access big data are able to tap into what would otherwise remain an unwieldy and mostly inaccessible reservoir of information. Many of the open source solutions designed to manage big data are based on segmentation and filtering principles similar to the algorithms described here. Instead of analyzing the data in its entirety, however, it becomes possible to scoop up filtered samples of big data and apply traditional segmentation to gain insights into new digital channel behavior. Enterprises that are able to tie in these new, unstructured data sources and integrate them fully into a multidimensional analysis will be several steps closer to the ultimate 360-degree view of the customer and all of the competitive benefits associated with deep consumer insights.
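The "scoop up filtered samples" idea can be sketched as a single-pass filter plus reservoir sample. This is a generic illustration with invented record fields, not tied to any particular big data product:

```python
import random

def filtered_sample(stream, predicate, k, seed=0):
    """Reservoir-sample k records that pass the filter, in one pass over
    a stream too large to hold in memory."""
    rng = random.Random(seed)
    sample = []
    seen = 0
    for record in stream:
        if not predicate(record):
            continue
        seen += 1
        if len(sample) < k:
            sample.append(record)
        else:
            j = rng.randrange(seen)
            if j < k:
                sample[j] = record
    return sample

# Simulated large stream of interaction records (fields are illustrative).
stream = ({"id": i, "channel": "web" if i % 3 else "store"} for i in range(10_000))
sample = filtered_sample(stream, lambda r: r["channel"] == "web", k=100)
print(len(sample))  # 100 web-channel records ready for traditional clustering
```

The sampled subset can then be fed to any of the classical clustering or classification algorithms described earlier, exactly as with a conventionally sized dataset.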
Related topics

- Segmentation takes on a different meaning among biologists than in industry.
- In the field of image processing, segmentation is the process of partitioning a digital image into sets of pixels for easier analysis.
- For more information about segmentation and predictive modeling, see the white paper, Customer Segmentation and Predictive Modeling: It's Not an Either/Or Decision, by Mike McGuirk.
- Learn more about the segmentation approaches discussed in this article.
- Learn more about K-Means clustering.
- Read about the Average Link approach, pioneered in the research of Milligan and Cooper.
- Wikipedia provides a good explanation of latent class algorithms.
- CART is a catch-all term used for decision tree learning, which also includes ID3.
- Google Research has done a great deal of work on MapReduce, an architecture for simplified data processing in large clusters.
- Rapid Miner offers several software options for segmentation, including SVMs.
- Learn more about SPSS Advanced Statistics.
- Evaluate IBM products in the way that suits you best.