In a world where technology companies go to battle every day against new competitors to maintain and grow revenue, a services offering can provide an effective means of grabbing a larger share of the customer's wallet. According to a recently released report by Global Industry Analysts Inc., the global IT services market is expected to reach a whopping US$1.2 trillion by the year 2015. Although this global number seems astronomical, it doesn't seem completely outside the realm of possibility given that companies and consumers alike are constantly seeking ways to simplify all things digital, including computing, transmitting, protecting, retrieving, and storing that which exists in the ether.
Cloud computing, or the use of computing resources (hardware and software) delivered via a network, certainly represents a key driver of this trend. Forrester estimates that the market for cloud computing will reach US$61 billion by the end of 2012. Consumer-driven companies like Apple and Amazon have ramped up their cloud offerings and are finding creative new ways to bundle service contracts with their products that few could have imagined just a few years ago. Meanwhile, within the business-to-business (B2B) space, large technology outsourcers such as IBM and Capgemini help businesses focus on their core operations by offering fully outsourced technology solutions from cradle to grave.
This trend bodes well for those professionals who possess the knowledge and skills to help organizations use big data to develop stronger relationships with their services customers while they predict which customers are most likely to leave for greener pastures. Within the technology sector, keeping customers over time is a challenge, but not one without substantial rewards. In the book Leading on the Edge of Chaos, the authors estimate that a 5 percent increase in customer retention can yield an incremental surge in profits of 25 to 125 percent: not a bad return on investment to tie your analytical project to! Many pundits agree that US$1 of marketing budget spent on customer retention yields better economic results than spending that same US$1 on customer acquisition. In his book The Loyalty Effect, Frederick Reichheld highlights several residual effects of customers who stick around, including:
- Increased referral rates to bring in other customers;
- Less sensitivity to prices;
- Lower cost to serve; and
- Reduction in initial processing costs.
Retention modeling, also known as churn or attrition modeling, serves as the primary means of identifying which customers are about to leave, understanding the key drivers of retention, and focusing tactical interventions on customers who might be persuaded to stay. This type of model uses data to find similarities and differences between two groups: in this case, customers who canceled versus those who stayed. Marketing and sales teams typically bake the outputs of these models into operational processes to strategically target at-risk customers with sales and marketing initiatives that help retain profitable accounts. Retention models can support customer and loyalty programs regardless of which department runs them. Within this article, I focus more on ongoing maintenance contracts than on one-time implementation services.
Define the objective
For most technology service providers, defining the objective is simple: Continue servicing customers if possible, regardless of technological changes. The more challenging aspect of beginning a retention modeling project comes when you attempt to establish the definition of a customer who leaves, or an attritor. Important factors to consider include the number of months after the contract ends or is canceled, how to handle customers who return after a certain amount of time, and protocols for nonvoluntary attrition. Do not include customers whose service contracts were canceled because of nonpayment issues or similar types of circumstances in the analysis. Instead, focus on voluntary terminations that are based on a customer decision that the organization can potentially affect, and code this "event" as a binary variable (1, 0) within the data set. Finally, the most important aspect of defining the objective lies in the analyst's ability to understand how to use analysis outputs to further business goals. Gathering these requirements in the beginning stages of the project and clearly setting expectations save valuable analysis and implementation time.
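The attrition definition described above can be sketched in code. This is a minimal, hypothetical example in Python with pandas; the column names (`cancel_reason`, `months_since_end`) and the 12-month window are illustrative assumptions, not a prescribed schema:

```python
import pandas as pd

# Hypothetical contract data; column names and values are illustrative.
contracts = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "cancel_reason": [None, "voluntary", "nonpayment", "voluntary"],
    "months_since_end": [0, 4, 7, 14],
})

# Exclude involuntary terminations (e.g., nonpayment) from the analysis.
analysis = contracts[contracts["cancel_reason"] != "nonpayment"].copy()

# Code the attrition event as a binary target: 1 if the customer
# voluntarily canceled within an assumed 12-month observation window, else 0.
analysis["attrited"] = (
    (analysis["cancel_reason"] == "voluntary")
    & (analysis["months_since_end"] <= 12)
).astype(int)
```

The key design decision is made up front, in data preparation, so that the model learns only from terminations the organization could potentially have affected.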
Prepare the data
Similar to any other data-mining or modeling effort, data must be extracted from source systems and collated into a single data set. Rarely is the data sitting in a single table waiting for your analysis; more often, it lies in disparate operational systems across the organization. The goal is to bring together all of this data and transform it into meaningful, useful information. Keep in mind that this task often takes upwards of 50 percent of the total model building time.
For this type of model, systems that track details about the service contracts, inquiries placed against those contracts, and characteristics of the customers represent critical targets for data-gathering efforts. Table 1 shows examples of primary subject areas.
Table 1. Primary subject areas
| Primary subject area | Examples |
| --- | --- |
| Accounts receivable | Payment problems, credit issues, method of payment |
| Customer demographics/firmographics | Geography, industry, size, revenues, tenure |
| Contract specifications | Change history, coverage, billing frequency, start date, end date, sales channel |
| Product or products involved | Product type |
| Service history | Call volume, call response, time to problem resolution |
| Account contracts | Count of other contracts, coverage for other contracts, start dates, end dates |
| Account service | For all contracts for this account, call volumes, call responses, time to problem resolutions |
| Orders | Parts order history and type |
| Account orders | For all contracts, parts order history and type |
The above list of subject areas and examples is by no means comprehensive; rather, it is a small sampling of the types of data typically used for retention models. Although the general rule that more data yields a more robust model certainly applies here, deadlines often limit the time available for data gathering, so prioritizing which data to pursue based on accessibility and impact becomes an important skill that analysts learn over time. Finally, obtaining at least two years of data ensures, in most cases, a reasonable time period from which to build the model and capture a few cycles, as most technology service contracts are annual. There are exceptions to this rule of thumb, especially within the technology industry, because of the dynamic nature of product offerings.
As the data is collected, varying structures can pose a significant challenge, because the goal is to create a single data set from which to build the model. Ideally, this data set contains one row per customer, with many columns (sometimes hundreds by the time the data set is final). You can use IBM® SPSS® Statistics to run the match-merging processes necessary to build the modeling data set, including one-to-many merges, which are often tricky to perform correctly. Within a B2B context, it might make sense to create individual rows for subsidiary companies, depending on the level at which you will use the model outputs. A similar situation can apply within the consumer world, where marketing or sales teams might want to target individuals within a household. The example in Figure 1 illustrates at a high level what a B2B final data set looks like.
Figure 1. Example data set
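To make the one-to-many merge concrete, here is a small sketch in Python with pandas, used here as a stand-in for the SPSS Statistics match-merge procedures; the tables and column names are invented for illustration. The contract table is aggregated to one row per customer before merging, which avoids duplicating customer rows:

```python
import pandas as pd

# Illustrative source tables: one customer can hold many contracts.
contracts = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "contract_value": [1200, 800, 1500],
})
demographics = pd.DataFrame({
    "customer_id": [1, 2],
    "industry": ["retail", "finance"],
})

# Collapse the many-rows-per-customer side to one row per customer first.
contract_summary = (
    contracts.groupby("customer_id")
    .agg(contract_count=("contract_value", "size"),
         total_value=("contract_value", "sum"))
    .reset_index()
)

# Now a simple one-to-one merge yields the one-row-per-customer data set.
modeling_set = demographics.merge(contract_summary, on="customer_id", how="left")
```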
Next, individual variables most often require transformation or manipulation so that statistical procedures can fully highlight relevant trends and anomalies. You might use special coding such as binning to fit individual variables into the required format of one row per customer. Examine both continuous and categorical variables in detail for outliers by using frequency distributions and histograms. Flooring or capping outliers, based on sound business judgment, represents an important step for each variable. This critical work typically falls under the heading of exploratory data analysis and helps provide an early view into which variables can yield the best results when placed in the model.
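As a sketch of the capping and binning steps, here is a toy example in Python with pandas; the variable (call volume), the 99th-percentile cap, and the bin edges are all assumptions you would replace with business-driven choices:

```python
import numpy as np
import pandas as pd

# Hypothetical call-volume variable with one extreme outlier.
calls = pd.Series([0, 1, 2, 3, 4, 5, 250])

# Cap outliers at the 99th percentile (an illustrative threshold).
cap = calls.quantile(0.99)
calls_capped = calls.clip(upper=cap)

# Bin the capped variable into ordered categories for modeling.
bins = pd.cut(calls_capped, bins=[-1, 1, 3, np.inf],
              labels=["low", "medium", "high"])
```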
Before modeling, running a clustering analysis can highlight different customer groups that might exist within the data and prove useful later on in the process. Based on the value of data attributes, clustering techniques maximize both the similarity of customers within the same cluster and the dissimilarity of customers between different clusters. Many statistical software packages offer this function, and IBM SPSS Modeler provides three different procedures that you can use to cluster groups of customers: K-means cluster analysis, hierarchical cluster analysis, and two-step cluster analysis. If significant differences in your customer base exist, creating retention models for each group of customers might yield the best results. If you do so, use your business knowledge to reduce the number of variables that enter the cluster analysis, as the presence of a huge number of variables can generate unwieldy or unusable results.
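To illustrate the clustering idea outside of SPSS Modeler, here is a minimal K-means sketch using scikit-learn; the two variables (tenure and call volume) and their values are invented, and standardization is included so that neither variable dominates the distance measure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented data: two obvious customer groups on (tenure, call volume).
X = np.array([
    [1, 40], [2, 35], [1, 45],    # short-tenure, high-call customers
    [8, 3], [9, 5], [10, 2],      # long-tenure, low-call customers
], dtype=float)

# Standardize so both variables contribute comparably to the distances.
X_scaled = StandardScaler().fit_transform(X)

# Two clusters, matching the two groups we expect in this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
```

If the resulting groups differ meaningfully, each could receive its own retention model, as suggested above.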
Several well-documented statistical approaches exist for building a retention model, so it is best to match your approach to the application. However, logistic regression tends to be the default and is a time-tested method in this space. After you select the appropriate samples for training and validation, you must iterate several key steps within the model-building process, including:
- Select variables;
- Validate model results; and
- Check business process to confirm (or not) result theories.
Several useful tests for assessing model adequacy and fit are available for logistic regression models within SPSS Modeler, including measures similar to the coefficient of determination in ordinary least squares regression, a generalized goodness-of-fit test (Hosmer-Lemeshow), and the ability to develop tables that show the proportion of cases under analysis that are classified correctly.
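The train-validate-classify loop can be sketched with scikit-learn as a stand-in for the SPSS Modeler workflow; the churn data here is synthetic (generated from an assumed rule that short tenure and heavy call volume raise churn odds), so the coefficients and accuracy carry no business meaning:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data: churn probability rises with call volume, falls with tenure.
rng = np.random.default_rng(42)
n = 400
tenure = rng.uniform(0, 10, n)
calls = rng.poisson(5, n).astype(float)
logit = 3.0 - 1.0 * tenure + 0.4 * calls      # assumed generating rule
p = 1 / (1 + np.exp(-logit))
y = rng.binomial(1, p)                        # 1 = attrited, 0 = stayed
X = np.column_stack([tenure, calls])

# Hold out a validation sample, as the model-building steps require.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression().fit(X_train, y_train)

# Classification table: rows = actual outcome, columns = predicted outcome.
table = confusion_matrix(y_val, model.predict(X_val))
accuracy = np.trace(table) / table.sum()
```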
Always employ intuition as a final check: Look at the variables that are displayed in the model and the composition of the predicted leaver customer segments. Do they resemble your image of a customer who might be about to discontinue his or her service and defect to a competitor? Would the marketing or sales teams agree? By identifying and communicating the key drivers of retention to business stakeholders, you build credibility in the model while potentially offering valuable information that can be used elsewhere in the organization.
Working closely with key stakeholders to ensure the appropriate use and interpretation of the model is critical to success. After model outputs are used in loyalty and retention programs, tracking ongoing results reveals how well the model works in real life. Include test and control groups in tracking to help users understand how the model is affecting program outcomes, which simultaneously generates valuable information to use to improve it: the virtuous cycle of test, learn, and refine at work.
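The test-and-control comparison reduces to simple arithmetic once tracking data is in hand. The counts below are invented for illustration:

```python
# Hypothetical tracking results: among customers the model flagged,
# the test group received the retention program; the control did not.
test_retained, test_total = 420, 500
control_retained, control_total = 380, 500

test_rate = test_retained / test_total            # 0.84
control_rate = control_retained / control_total   # 0.76

# The difference estimates the program's effect on flagged customers.
lift = test_rate - control_rate
```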
Seasonality and competitor activity can affect the model performance, so try to learn from new information and experience. Valued nuggets of customer insight exist within the process of gathering information about why customers are defecting, and the data can provide early signals if a significant issue lies ahead. Also, changes in the customer base itself can be monitored, as they can affect model performance. The cluster analysis mentioned earlier might yield key insights that can prove invaluable when combined with profiles of the groups identified. Given the fluid nature of this business, all levels of the organization appreciate early signals.
Finally, you must refresh retention models regularly. Waiting longer than six to nine months to refresh a model results in one that is stale and does not predict with the same accuracy that it did in the beginning stages of the project. Also, score customers at least monthly, if not weekly, to capture the most recent history. Although daily scoring might be ideal, constraints within source systems often don't support such frequent updates. However, automated scoring processes can help ensure that users of the outputs have fresh results. Export these scores into key databases and transactional systems that contain customer information, such as IBM PredictiveInsight (formerly Unica PredictiveInsight).
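A recurring scoring run can be as simple as the sketch below: apply the fitted model to the current customer base and export the scores for downstream systems. The tiny training table, column names, and output file name are all assumptions for illustration:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy fitted model: in practice this would be the refreshed retention model.
train = pd.DataFrame({"tenure": [1, 2, 8, 9], "churned": [1, 1, 0, 0]})
model = LogisticRegression().fit(train[["tenure"]], train["churned"])

# Score the current customer base (e.g., as a monthly batch job).
current = pd.DataFrame({"customer_id": [11, 12], "tenure": [1.5, 8.5]})
current["churn_score"] = model.predict_proba(current[["tenure"]])[:, 1]

# Export scores for loading into customer databases and marketing systems.
current.to_csv("churn_scores.csv", index=False)
```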
Retention models continue to increase in popularity across all industries, especially within the technology market. Use the tips you learned here to ensure that your model rests on a solid foundation of business knowledge and can deliver optimal value. Pair a solid set of retention models with automated customer touch processes, and you have a killer combination that can take your organization to new levels of profitability. Good luck!
- Global Industry Analysts report: Learn more about the global IT services market.
- 10 Cloud Predictions for 2012: Read Forrester predictions about the cloud market in Holger Kisker's blog.
- Leading on the Edge of Chaos (Prentice Hall, 2002): Check out a great resource for understanding customer loyalty in this book by Emmett C. Murphy and Mark A. Murphy.
- The Loyalty Effect: Read a brief summary of the book by Frederick F. Reichheld and Thomas Teal.
- Introduction to IBM SPSS Modeler and Data Mining, Modeling: Explore the training courses offered by IBM.
- IBM SPSS Statistics Statistical Procedures Companion, Chapter 16 Cluster Analysis: Learn more about how to use SPSS Statistics for cluster analysis projects.
- More SPSS content on developerWorks: Explore more articles related to SPSS.
Get products and technologies
- IBM SPSS Statistics: Learn more about this integrated family of products that addresses the analytical process, from planning to data collection to analysis, reporting, and deployment.
- SPSS Modeler: Learn more about this product and how to use it for intensive data mining.
- IBM PredictiveInsight: Access information about this product that contains powerful predictive modeling features designed for use by marketing business users.