Retention modeling for technology service contracts

Which of my customers is most likely to leave?

Retaining customers represents a tremendous challenge within the ever-changing and dynamic technology space. Rapidly evolving technology offerings that are combined with constantly shrinking IT budgets create the perfect storm for customer churn. Explore tips and tricks for building a model that predicts which customers are most likely not to renew their service contracts. Review critical steps, regardless of the particular statistical method, and important considerations about the implementation strategies of the model.


Vincent A. Stuntebeck, Senior Manager Analytics, Avaya

Vincent Stuntebeck provides consulting services within the marketing and business analytics space and has over 13 years of experience generating customer insights that impact the business. He has led analytical teams and projects for Fortune 500 companies within industries as diverse as Telecommunications, Government, Financial, Insurance, Energy, and Technology. Vincent has an MBA from the University of Georgia and a master’s degree in mass communications from Georgia State University. Vincent is passionate about working with business teams to fully leverage the power of analytics and has recently ventured into the emerging world of social media analytics. He can be reached at 

22 January 2013


In a world where technology companies go to battle every day against new competitors to maintain and grow revenue, a services offering can provide an effective means of grabbing a larger share of the customer's wallet. According to a recently released report by Global Industry Analysts Inc., the global IT services market is expected to reach a whopping US$1.2 trillion by the year 2015. Although this global number seems astronomical, it doesn't seem completely outside the realm of possibility given that companies and consumers alike are constantly seeking ways to simplify all things digital, including computing, transmitting, protecting, retrieving, and storing that which exists in the ether.

Cloud computing, or the use of computing resources (hardware and software) delivered via a network, certainly represents a key driver of this trend. Forrester estimates the market for cloud computing to be US$61 billion by the end of 2012. Consumer-driven companies like Apple and Amazon have ramped up their cloud offerings and are finding new and creative ways to bundle service contracts with their products that few could have imagined just a few years ago. Meanwhile, within the business-to-business (B2B) space, large technology outsourcers such as IBM and Capgemini help businesses focus on their core operations by offering fully outsourced technology solutions from cradle to grave.

This trend bodes well for those professionals who possess the knowledge and skills to help organizations use big data to develop stronger relationships with their services customers while they predict which customers are most likely to leave for greener pastures. Within the technology sector, keeping customers over time is a challenge, but not one without substantial rewards. In the book, Leading on the Edge of Chaos, the authors estimate that a 5 percent increase in customer retention can yield an incremental surge in profits of 25 to 125 percent—not a bad return on investment to tie your analytical project to! Many pundits agree that US$1 of marketing budget that is spent on customer retention yields better economic results than spending that same US$1 on customer acquisition. In his book, The Loyalty Effect, Frederick Reichheld highlights several residual effects of customers who stick around, including:

  • Increased referral rates to bring in other customers;
  • Less sensitivity to prices;
  • Lower cost to serve; and
  • Reduction in initial processing costs.

Retention modeling, also known as churn or attrition modeling, serves as the primary means of identifying which customers are about to leave, understanding key drivers for increasing retention, and helping to focus tactical interventions on customers who might be persuaded to stay. This type of model uses data to find similarities and differences between two groups: in this case, customers who canceled and customers who stayed. Marketing and sales teams typically bake the outputs of these models into operational processes to target at-risk customers with sales and marketing initiatives that help retain profitable accounts. Retention models can help customer and loyalty programs, regardless of the department. Within this article, I focus more on ongoing maintenance contracts than on one-time implementation services.

Define the objective

For most technology service providers, defining the objective is simple: Continue servicing customers if possible, regardless of technological changes. The more challenging aspect of beginning a retention modeling project comes when you attempt to establish the definition of a customer who leaves, or an attritor. Important factors to consider include the number of months after the contract ends or is canceled, how to handle customers who return after a certain amount of time, and protocols for nonvoluntary attrition. Do not include customers whose service contracts were canceled because of nonpayment issues or similar types of circumstances in the analysis. Instead, focus on voluntary terminations that are based on a customer decision that the organization can potentially affect, and code this "event" as a binary variable (1, 0) within the data set. Finally, the most important aspect of defining the objective lies in the analyst's ability to understand how to use analysis outputs to further business goals. Gathering these requirements in the beginning stages of the project and clearly setting expectations save valuable analysis and implementation time.
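
The coding rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`end_date`, `renewed_on`, `cancel_reason`), the reason codes, and the 90-day grace window are all hypothetical and would come from your own business definition of an attritor.

```python
from datetime import date

# Hypothetical reason codes that mark nonvoluntary attrition.
NONVOLUNTARY_REASONS = {"nonpayment", "credit_hold"}
# Assumed grace window: a customer who renews within it is not an attritor.
GRACE_DAYS = 90

def label_contract(contract, as_of):
    """Return 1 (voluntary attritor), 0 (retained), or None (leave out)."""
    if contract.get("cancel_reason") in NONVOLUNTARY_REASONS:
        return None  # nonvoluntary attrition is excluded from the analysis
    end = contract["end_date"]
    renewed_on = contract.get("renewed_on")
    if renewed_on is not None and (renewed_on - end).days <= GRACE_DAYS:
        return 0  # renewed in time
    if (as_of - end).days > GRACE_DAYS:
        return 1  # grace window elapsed without a timely renewal
    return None  # outcome not yet observable

contracts = [
    {"id": "A", "end_date": date(2012, 1, 31), "renewed_on": date(2012, 2, 15)},
    {"id": "B", "end_date": date(2012, 1, 31), "cancel_reason": "switched_vendor"},
    {"id": "C", "end_date": date(2012, 3, 31), "cancel_reason": "nonpayment"},
    {"id": "D", "end_date": date(2012, 12, 1)},
]
as_of = date(2012, 12, 31)
labels = {c["id"]: label_contract(c, as_of) for c in contracts}
print(labels)
```

In this sketch, a customer who returns after the grace window is still coded as an attritor; whether that is right for your organization is exactly the kind of decision to settle with stakeholders up front.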

Prepare the data

Similar to any other data-mining or modeling effort, data must be extracted from source systems and collated into a single data set. Rarely is the data sitting in a single table waiting for your analysis; more often, it lies in disparate operational systems across the organization. The goal is to bring together all of this data and transform it into meaningful, useful information. Keep in mind that this task often takes upwards of 50 percent of the total model building time.

For this type of model, systems tracking details about the service contracts themselves, inquiries that are placed on the contracts themselves, and characteristics that pertain to the customers represent critical areas to target data-gathering efforts. Table 1 shows an example of primary subject areas.

Table 1. Primary subject areas
Primary subject area                   Examples
Accounts receivables                   Payment problems, credit issues, method of payment
Customer demographics/firmographics    Geography, industry, size, revenues, tenure
Contract specifications                Change history, coverage, billing frequency, start date, end date, sales channel
Product or products involved           Product type
Service history                        Call volume, call response, time to problem resolution
Account contract                       Count of other contracts, coverage for other contracts, start dates, end dates
Account service                        For all contracts for this account, call volumes, call responses, time to problem resolutions
Orders                                 Parts order history and type
Account orders                         For all contracts, parts order history and type

The above list of subject areas and examples is by no means comprehensive but rather a small sampling of the types of data typically used for retention models. Although the general rule that more data equals a more robust model certainly applies here, the reality of deadlines often limits the amount of time available to data gathering, so prioritizing which data to go after based on accessibility and impact becomes an important skill that analysts learn over time. Finally, obtaining at least two years of data can, at least in most cases, ensure that you have a reasonable time period from which to build the model and capture a few cycles, as most technology service contracts are annual in nature. There are exceptions to this rule of thumb, especially within the technology industry, because of the dynamic nature of product offerings.

As the data is collected, varying structures can pose a significant challenge, as the goal is to create a single data set from which to build the model. Ideally, this data set contains one row per customer with many columns, sometimes hundreds once the data set is final. Building it requires one-to-many merges, which are often tricky to perform correctly; you can use IBM® SPSS® Statistics to run the necessary match-merge processes. Within a B2B context, it might make sense to create individual rows for subsidiary companies, depending on the level at which you will use the model outputs. A similar situation can apply within the consumer world, as the marketing or sales teams might want to target individuals within a household. The example in Figure 1 illustrates at a high level what a B2B final data set looks like.

Figure 1. Example data set
Screen capture image showing an example data set
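
To make the one-row-per-customer idea concrete, here is a minimal Python sketch of a one-to-many roll-up. It is not the SPSS Statistics procedure; the record layouts and field names (`customer_id`, `hours_to_resolve`, and so on) are invented for illustration. The many service-call rows are aggregated into summary columns before being attached to the single customer row.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems.
customers = [
    {"customer_id": "C1", "industry": "Energy"},
    {"customer_id": "C2", "industry": "Insurance"},
    {"customer_id": "C3", "industry": "Government"},
]
service_calls = [
    {"customer_id": "C1", "hours_to_resolve": 4.0},
    {"customer_id": "C1", "hours_to_resolve": 10.0},
    {"customer_id": "C2", "hours_to_resolve": 2.0},
]

def build_modeling_rows(customers, service_calls):
    """Aggregate the 'many' side first, then attach it to one row per customer."""
    hours_by_customer = defaultdict(list)
    for call in service_calls:
        hours_by_customer[call["customer_id"]].append(call["hours_to_resolve"])
    rows = []
    for customer in customers:
        hours = hours_by_customer.get(customer["customer_id"], [])
        row = dict(customer)  # copy so the source record is left untouched
        row["call_count"] = len(hours)
        # Customers without calls keep a missing value rather than a zero.
        row["avg_hours_to_resolve"] = sum(hours) / len(hours) if hours else None
        rows.append(row)
    return rows

rows = build_modeling_rows(customers, service_calls)
for row in rows:
    print(row)
```

Aggregating before joining is what keeps the row count equal to the customer count; joining first would duplicate customers with multiple calls.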


Next, individual variables most often require transformation or manipulation to allow statistical procedures to fully highlight relevant trends and anomalies. You might use special coding such as binning to fit individual variables into the required format of one row per customer. Examine both continuous and categorical variables in detail for outliers using frequency distributions and histograms. Flooring or capping outliers, based on sound business judgment, represents an important step for each variable. This critical work typically falls under the heading of exploratory data analysis and helps to provide an early view into which variables can yield the best results when placed in the model.
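
As a rough illustration of capping, binning, and frequency distributions, consider the Python sketch below. The variables, the cap of 20 calls, and the tenure bin edges are assumptions chosen for the example; in practice they come from business judgment about each variable.

```python
from collections import Counter

def cap(values, floor, ceiling):
    """Floor and cap each value so outliers cannot dominate the model."""
    return [min(max(v, floor), ceiling) for v in values]

def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # at or above the last edge

# Hypothetical monthly call volumes with one obvious outlier.
monthly_calls = [0, 2, 3, 1, 4, 250, 2]
capped_calls = cap(monthly_calls, floor=0, ceiling=20)

# Hypothetical customer tenure in months, binned into three groups.
tenure_months = [5, 12, 40, 18, 7, 60, 30]
tenure_labels = ["under 1 year", "1 to 3 years", "over 3 years"]
tenure_bins = [bin_value(t, [12, 36], tenure_labels) for t in tenure_months]

# A frequency distribution is a quick first look at a categorical variable.
frequencies = Counter(tenure_bins)
print(capped_calls)
print(frequencies)
```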


Before modeling, running a clustering analysis can highlight different customer groups that might exist within the data and prove useful later on in the process. Based on the value of data attributes, clustering techniques maximize both the similarity of customers within the same cluster and the dissimilarity of customers between the different clusters. Many statistical software packages offer this function, and IBM SPSS Modeler provides three different procedures that you can use to cluster groups of customers, including K-Means cluster, hierarchical cluster analysis, and two-step cluster analysis. If significant differences in your customer base exist, creating retention models for each group of customers might yield the best results. If you do so, use your business knowledge to reduce the number of variables that enter the cluster analysis, as the presence of a huge number of variables can generate unwieldy or unusable results.
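
For readers who want to see the mechanics, the following is a bare-bones K-Means sketch in plain Python, not the SPSS Modeler procedure. The two features, the sample points, and the choice of starting centroids are assumptions for illustration; real inputs should be put on comparable scales first so that no single variable dominates the distance calculation.

```python
import math

def kmeans(points, centroids, max_iterations=50):
    """Plain K-Means: assign to nearest centroid, recompute means, repeat."""
    centroids = list(centroids)
    assignments = []
    for _ in range(max_iterations):
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return assignments, centroids

# Hypothetical customers described by (support calls, contract value),
# both already rescaled to similar ranges.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 9.0), (8.5, 9.5), (9.0, 8.8)]
assignments, centroids = kmeans(points, centroids=[points[0], points[3]])
print(assignments)
print(centroids)
```

Each resulting cluster can then be profiled and, if the groups differ enough, modeled separately.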

Several well-documented statistical approaches exist for building a retention model, so it is best to match your approach to the application. However, logistic regression tends to be the default and is a time-tested method in this space. After you select the appropriate samples for training and validation, you must iterate several key steps within the model-building process, including:

  • Select variables;
  • Validate model results; and
  • Check business process to confirm (or not) result theories.
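
As a sketch of what fitting and validating such a model involves, the Python below trains a logistic regression by simple gradient descent on a tiny made-up training sample and then scores a separate validation sample. The feature names, data values, learning rate, and epoch count are all assumptions for illustration; a statistical package would normally do the fitting and report standard errors as well.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, row):
    """weights[0] is the intercept; the rest align with the columns of row."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))

def log_loss(weights, X, y):
    total = 0.0
    for row, label in zip(X, y):
        p = min(max(predict_proba(weights, row), 1e-12), 1 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(X)

def fit_logistic(X, y, learning_rate=0.5, epochs=5000):
    """Batch gradient descent on the average log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for row, label in zip(X, y):
            error = predict_proba(weights, row) - label
            grads[0] += error
            for j, value in enumerate(row):
                grads[j + 1] += error * value
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grads)]
    return weights

# Hypothetical columns: [support calls (scaled 0-1), tenure (scaled 0-1)].
X_train = [[0.1, 0.8], [0.2, 0.9], [0.3, 0.6], [0.4, 0.7],
           [0.6, 0.4], [0.7, 0.2], [0.8, 0.3], [0.9, 0.1]]
y_train = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = voluntary attritor
X_valid = [[0.85, 0.15], [0.15, 0.85], [0.75, 0.30], [0.25, 0.75]]
y_valid = [1, 0, 1, 0]

weights = fit_logistic(X_train, y_train)
valid_hits = sum(
    (predict_proba(weights, row) >= 0.5) == bool(label)
    for row, label in zip(X_valid, y_valid)
)
print("weights:", weights)
print("validation accuracy:", valid_hits / len(y_valid))
```

Variable selection would wrap this fit in a loop that adds or drops columns and compares validation results each time.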

Several useful tests for assessing model adequacy and fit are available for logistic regression models within SPSS Modeler, including measures similar to the "coefficient of determination" in ordinary least squares regression, a generalized test (Hosmer-Lemeshow) for determining model fit, and the ability to develop tables that show the proportion of cases under analysis that are classified correctly.
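
To show what these diagnostics compute, here is a hedged Python sketch. It uses McFadden's pseudo R-squared as one example of a measure similar to the coefficient of determination (packages may report other variants), the Hosmer-Lemeshow statistic without its p-value, and a classification table at an assumed 0.5 cutoff. The outcomes and predicted probabilities are made up for illustration.

```python
import math

def classification_table(y, p, threshold=0.5):
    """Count correct and incorrect classifications at the given cutoff."""
    table = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, prob in zip(y, p):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1:
            table["tp" if label == 1 else "fp"] += 1
        else:
            table["fn" if label == 1 else "tn"] += 1
    return table

def mcfadden_r2(y, p):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base = sum(y) / len(y)
    ll_model = sum(math.log(q if label == 1 else 1 - q) for label, q in zip(y, p))
    ll_null = sum(math.log(base if label == 1 else 1 - base) for label in y)
    return 1 - ll_model / ll_null

def hosmer_lemeshow(y, p, groups=10):
    """Sum over risk-sorted groups of (observed - expected)^2 / variance."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(prob for prob, _ in chunk)
        observed = sum(label for _, label in chunk)
        variance = expected * (1 - expected / len(chunk))
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic  # compare with a chi-square on (groups - 2) df

# Hypothetical actual outcomes and predicted attrition probabilities.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

table = classification_table(y, p)
accuracy = (table["tp"] + table["tn"]) / len(y)
r2 = mcfadden_r2(y, p)
hl = hosmer_lemeshow(y, p, groups=2)
print(table, accuracy, r2, hl)
```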

Always employ intuition as a final check: Look at the variables that are displayed in the model and the composition of the predicted leaver customer segments. Do they resemble your image of a customer who might be about to discontinue his or her service and defect to a competitor? Would the marketing or sales teams agree? By identifying and communicating the key drivers of retention to business stakeholders you build credibility in the model while potentially offering valuable information to stakeholders that can be used elsewhere in the organization.


Working closely with key stakeholders to ensure the appropriate use and interpretation of the model is critical to success. After model outputs are used in loyalty and retention programs, tracking ongoing results reveals how well the model works in real life. Include test and control groups in tracking to help users understand how the model is affecting program outcomes, which simultaneously generates valuable information for improving it: the virtuous cycle of test, learn, and refine at work.
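
One simple way to read test and control results is to compare retention rates and check whether the gap is larger than chance would explain. The Python sketch below uses a standard two-proportion z-test; the group sizes and retention counts are invented purely for illustration.

```python
import math

def two_proportion_z(retained_a, total_a, retained_b, total_b):
    """z statistic for the difference between two retention rates."""
    rate_a = retained_a / total_a
    rate_b = retained_b / total_b
    pooled = (retained_a + retained_b) / (total_a + total_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (rate_a - rate_b) / std_error

# Hypothetical campaign: at-risk customers who received the retention
# offer (test) versus similar customers who did not (control).
test_retained, test_total = 850, 1000
control_retained, control_total = 800, 1000

lift = test_retained / test_total - control_retained / control_total
z = two_proportion_z(test_retained, test_total, control_retained, control_total)
print("retention lift:", lift)
print("z statistic:", z)  # |z| above about 1.96 suggests a real difference
```

The control group must be drawn from the same model-scored population as the test group; otherwise the comparison measures targeting rather than the program.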

Seasonality and competitor activity can affect the model performance, so try to learn from new information and experience. Valued nuggets of customer insight exist within the process of gathering information about why customers are defecting, and the data can provide early signals if a significant issue lies ahead. Also, changes in the customer base itself can be monitored, as they can affect model performance. The cluster analysis mentioned earlier might yield key insights that can prove invaluable when combined with profiles of the groups identified. Given the fluid nature of this business, all levels of the organization appreciate early signals.

Finally, you must refresh retention models regularly. Waiting longer than six to nine months to refresh a model results in one that is stale and no longer predicts as accurately as it did in the beginning stages of the project. Also, customers can be scored at least monthly, if not weekly, to capture the most recent history. Although daily scoring might be ideal, constraints within source systems often don't support such frequent updates. However, automated scoring processes can help ensure that users of the outputs have fresh results. Export these scores into key databases and transactional systems that contain customer information, such as IBM PredictiveInsight (formerly Unica PredictiveInsight).
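
An automated scoring job can be quite small. The Python sketch below applies stored logistic regression coefficients to fresh customer data, ranks customers by attrition risk, and flags the model as stale once it is older than an assumed 180-day limit (the low end of the six-to-nine-month guidance). The coefficient values, variable names, and dates are hypothetical.

```python
import math
from datetime import date

# Hypothetical stored model: coefficients from an earlier fit.
MODEL = {
    "intercept": -1.0,
    "coefficients": {"open_tickets": 0.4, "tenure_years": -0.3},
    "fitted_on": date(2012, 6, 1),
}
MAX_MODEL_AGE_DAYS = 180  # assumed refresh threshold

def score(model, customer):
    """Predicted probability of attrition for one customer record."""
    z = model["intercept"] + sum(
        coef * customer[name] for name, coef in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def model_is_stale(model, today):
    return (today - model["fitted_on"]).days > MAX_MODEL_AGE_DAYS

def score_batch(model, customers):
    """Return (customer_id, score) pairs, highest risk first."""
    scored = [(c["customer_id"], score(model, c)) for c in customers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

customers = [
    {"customer_id": "B", "open_tickets": 0, "tenure_years": 6},
    {"customer_id": "A", "open_tickets": 5, "tenure_years": 1},
]
ranked = score_batch(MODEL, customers)
stale = model_is_stale(MODEL, today=date(2013, 1, 22))
print(ranked)
print("refresh needed:", stale)
```

The ranked list is what would be exported to downstream customer systems on each scoring run.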


Retention models continue to increase in popularity across all industries, especially within the technology market. Use the tips you learned here to ensure that your model rests on a solid foundation of business knowledge and can deliver optimal value. Pair a solid set of retention models with automated customer touch processes, and you have a killer combination that can take your organization to new levels of profitability. Good luck!


