Predictive analytics and its statistical underpinnings have been used in science and academia for decades. Dig just a bit below the surface, and you find similar analytical processes in economics too. Until recently, commercial use of predictive analytics was limited to pharmaceutical trials (scientific research) and marketing projects for huge companies.
With the introduction of faster processing and easier-to-use software, commercial use of predictive analytics is spreading to departments beyond the traditional ones. Massive increases in data volumes, from both structured transactional systems and unstructured sources, mean that the need for data mining and predictive analytics is growing just as quickly.
Customer segmentation is an area of predictive analytics that is useful for any organization. Knowing who your customers are is self-evidently important. Customer relationship management (CRM) systems perform the tasks of monitoring activities, coordinating resources, and generally keeping your organization on track with its sales processes. Moving beyond tracking and transactions to gain insight is the realm of predictive analytics. As processing and software advance, predictive analytics is increasingly finding its way into more CRM-related activities. Foundationally, segmenting and grouping similar customers allow for many other analytical processes. Few more advanced processes will work efficiently until you logically divide your customers into understandable groupings. (As a side note, some of those more advanced processes may include predicting reorder rates, seasonality by customer type, customer life cycle management, targeted marketing, and cross sell or up-sell initiatives.)
Just a few years ago, the idea of a company using millions of transactions in its history files and dozens of customer characteristics to create statistically based predictions about future business was limited to large and very large companies. Changes in the technology landscape have since brought predictive analytical processes to a price point at which medium-sized and small companies can analyze their data and deploy predictive models. In the future, as more unstructured data enters the business, the concepts of big data will work their way into these medium-sized and small companies just as they are working their way into large companies now.
Predictive analytics is actionable information
Predictive analytics is the use of statistical analysis over data. The output is insight into the data set as well as predictions or guidance about future activities. Realistically, it is statistics beyond the one statistic most people are familiar with: the average or mean. Where the rubber meets the road, so to speak, is how to deploy the insight gained (descriptive statistics) in a way that is understandable to the average businessperson and actionable within business processes.
Predictive analytics is not reporting
Reporting on historical data can take many forms and provide the basis for making inferences on future activities. However, show the same report to other people, and different interpretations result. Predictive analytics takes the human intuition factor out—or at least provides a solid foundation of fact for the salesperson or marketer to make a more informed decision about future events.
That said, neither more education nor more experience is the answer. It sounds obvious, but data volumes are increasing, with no sign that less data will be collected in the future. Analysts face the same ever-growing problem of keeping up with data volumes and with the increasing variety of incoming data. Hence the need for automated analytical processes.
Customer clusters: the heart of segmentation
As with any predictive analytics process, several methodologies and algorithms are available for segmenting your customers. Some of the more common categories include support vector machines, clustering algorithms of various flavors, and neural networks. There are many other places to learn about which technique is best in different situations, and many practitioners have their own preferences. The point of this article is not to judge the mathematical merits of the techniques but rather to guide you on how to integrate the results of the segmentation modeling processes into line-of-business (LOB) applications.
IBM® SPSS® Statistics includes several of the above-mentioned statistical techniques. Figure 1 shows some of the available options, built into SPSS Statistics as menu commands.
Figure 1. The standard classification submenu in SPSS Statistics
Figure 2 shows that SPSS Statistics has even organized some of the processes into a special menu that includes customer segmentation as a quick click.
Figure 2. The pop-up menu for the Direct Marketing option
The outputs of these analytical processes vary. All yield tables that define each cluster, and these tables present mean values for all the relevant input variables. The tables also present the range of values for each variable in each cluster, information that is vital to programmatically integrating the clustering models into LOB applications. However, by themselves, the tables and numbers are difficult even for technical people to interpret. They are virtually unintelligible to the average businessperson.
Fortunately, most model output also includes graphs, and some output includes a decision tree. I highlight both types of graphical output, because they form the first-pass basis for understanding what the model is telling you.
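To make the idea of a cluster definition table concrete, here is a minimal sketch that computes the mean and range of each variable per cluster from already-clustered rows. It is not SPSS Statistics output; the function name `cluster_profile`, the field names, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def cluster_profile(rows, variables):
    """Summarize each cluster by the mean, min, and max of each variable."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cluster"]].append(row)

    profile = {}
    for cluster, members in grouped.items():
        profile[cluster] = {
            var: {
                "mean": mean(m[var] for m in members),
                "min": min(m[var] for m in members),
                "max": max(m[var] for m in members),
            }
            for var in variables
        }
    return profile


# Hypothetical customers that a clustering step has already labeled.
ROWS = [
    {"cluster": 1, "orders_per_year": 4, "annual_revenue": 12000},
    {"cluster": 1, "orders_per_year": 6, "annual_revenue": 18000},
    {"cluster": 2, "orders_per_year": 30, "annual_revenue": 230000},
    {"cluster": 2, "orders_per_year": 20, "annual_revenue": 170000},
]

profile = cluster_profile(ROWS, ["orders_per_year", "annual_revenue"])
for cluster, summary in sorted(profile.items()):
    print(cluster, summary)
```

The min and max values are the piece an integration would most likely reuse, because they can be turned into membership tests in application code.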
Clustering graphs quickly show the grouping of customers. Obviously, you are limited in the number of variables that can be put on a single graph. Often, you need to see several graphs using different combinations of variables to get a good sense of how the customers are being divided.
Most of the clustering algorithms normalize the variables. If you are not familiar with the concept of normalizing variables, Wikipedia offers more information (see Resources). Please keep this information in mind when interpreting the graphs and especially when explaining the models to business people.
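As a small illustration of what normalizing means, the sketch below rescales two variables to z-scores (mean 0, standard deviation 1). This is one common normalization scheme and an assumption on my part; a given clustering procedure may use a different one. The variable names and values are made up.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical customer attributes on very different scales.
annual_revenue = [12000.0, 55000.0, 230000.0, 41000.0]
orders_per_year = [4.0, 12.0, 30.0, 10.0]

revenue_z = z_scores(annual_revenue)
orders_z = z_scores(orders_per_year)

for raw, z in zip(annual_revenue, revenue_z):
    print(f"{raw:>10.0f} -> {z:+.2f}")
```

After rescaling, revenue in the hundreds of thousands and order counts in the tens contribute on a comparable footing, which is why axis values on a cluster graph may not match the units business people expect.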
The other graphical output I mention is the decision tree—a chart of logic used to get from the general to the specific by using tests at each branching point to arrive eventually at an end node. In this case, the end node is membership in a specific segment.
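A decision tree of this kind translates almost directly into nested tests. The sketch below is a hand-written example with hypothetical fields, thresholds, and segment names; a real tree would come from the modeling output.

```python
def assign_segment(customer):
    """Walk a small hand-written decision tree to an end node (a segment)."""
    # First branching point: overall size of the account.
    if customer["annual_revenue"] >= 100_000:
        # Second branching point for large accounts: ordering frequency.
        if customer["orders_per_year"] >= 24:
            return "key account"
        return "large occasional"
    # Second branching point for smaller accounts: tenure.
    if customer["years_as_customer"] < 1:
        return "new"
    return "small steady"


example = {"annual_revenue": 150_000, "orders_per_year": 30, "years_as_customer": 5}
print(assign_segment(example))
```

Each `return` corresponds to an end node, and each `if` corresponds to a test at a branching point.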
Business uses of segmentation models
Your business people probably already have some type of segmentation model. There may even be several. It's natural for people to try to group "things" into categories, as doing so makes future interpretation easier.
The problem for most businesses is that there are probably several segmentation models at work. Each is applied by different departments and different people within departments, and each is used differently now and over time. It makes for a real mess.
By leading a segmentation-modeling project, you can get agreement across the organization. You will prevent interdepartmental conflict and be able to code in a single segmentation model, built on a solid foundation, for everyone to use. In my experience, two main departments in most commercial organizations use customer segmentation: marketing and sales.
Most marketers love to group customers. The very word marketing implies that there is a market or group of potential customers. However, there's an old adage: "Half of all marketing dollars are wasted. Unfortunately, no one knows which half." More effective targeting reduces costs for marketing while increasing the impact. Having a consistent segmentation model is a good step toward reducing waste.
A customer segmentation model works for marketing in several ways. Among generic uses are focusing a campaign on customers most likely to respond, not targeting customers with irrelevant campaigns, tailoring products to specific segments, and penetrating new markets. Here, the effect is to reduce spending, get more orders for dollars spent, and not waste money on guesses.
Sales departments also use segmentation models. Their use may be more informal and dispersed among individual salespeople, but the effects are dramatic.
A good segmentation model helps salespeople cross-sell products. Order history of customers in the same segment can be used to cross-sell to other, similar customers. After all, similar customers stand a better chance of ordering similar products.
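The cross-sell idea can be sketched in a few lines: look at what other customers in the same segment have ordered and suggest the most common products the customer has not yet bought. The function name `cross_sell_candidates` and the sample data are hypothetical, and a production recommendation would likely weigh more than simple counts.

```python
from collections import Counter


def cross_sell_candidates(customer_id, segments, orders, top_n=3):
    """Suggest products bought by same-segment peers but not by this customer."""
    segment = segments[customer_id]
    already_bought = orders.get(customer_id, set())
    counts = Counter()
    for other_id, other_segment in segments.items():
        if other_id != customer_id and other_segment == segment:
            counts.update(orders.get(other_id, set()) - already_bought)
    return [product for product, _ in counts.most_common(top_n)]


# Hypothetical segment memberships and order histories.
SEGMENTS = {"C1": "A", "C2": "A", "C3": "A", "C4": "B"}
ORDERS = {
    "C1": {"paper"},
    "C2": {"paper", "toner"},
    "C3": {"toner", "binders"},
    "C4": {"pallets"},
}

print(cross_sell_candidates("C1", SEGMENTS, ORDERS))
```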
Some segmentation models break down a customer's life cycle. Salespeople are able to spot signs that customers are about to depart, and sales processes should be designed to retain customers longer and preserve customer relationships. Likewise, new customers' ordering patterns can be brought to mimic longer-term customers more quickly.
Interacting with the segmentation model
After the segmentation model is constructed, there are several levels to how people and business processes interact with it.
The presentation level
Even with the best of modeling math and phenomenally good data preparation, your business users need to be "sold" on the model. To that end, you need to present the modeling processes and the final results in a meeting.
I mentioned earlier that SPSS Statistics presents charts and graphs of the segmentation models: You will need these. Actually, you will probably need several, and they will have to be simple. Throw in a decision tree diagram or at least an example of one branch of a larger tree. These graphics will aid in communicating the model.
Don't minimize the backup information, though. You may not know who they are, but some of your business people will remember their statistics. These people will want to see the methodology you used to create the segmentation model. They may not understand it all, but they will want to see the substantiation. (In the real world, you will have had a committee advising you that included key business people. Nevertheless, others need to buy in to the quality of the model before adapting business processes to the results.)
At this level, SPSS Statistics can be your friend, and many documentation options are available to provide the substantiation users seek. You will also want to include a section on the mathematics you used. People may not question it, but it's good to head off the questions with information they can read.
The spreadsheet level
During the adoption process, demonstrate how your model works in a spreadsheet. Before your business people can become truly comfortable with your segmentation models and the predictions made from them, the model will have to be deployed to a spreadsheet.
Large companies can sometimes bypass this step, because LOB people are required to follow top-down procedures by edict. That is just the nature of large company bureaucracy. However, I have found it useful to deploy a spreadsheet interface to the model to enhance my own comfort level and the comfort level of executives. At mid-sized companies, you will need the spreadsheet models to give that same comfort level to business people in their familiar spreadsheets before you can deploy the model to the enterprise resource planning (ERP) and CRM applications.
The easiest way to execute this step is to start with a spreadsheet of prospective or current customers. Each prospect or customer represents a line, and the values in the columns are the characteristics that the model needs to classify. SPSS Statistics scores each prospect in the spreadsheet against the model, returning a segment membership for each prospect.
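The row-by-row scoring described above can be sketched as follows. This is not how SPSS Statistics scores data; it is a stand-in that assigns each row to the nearest of a few cluster centers. The centers, the per-variable scales, the column names, and the sample sheet are all assumptions for illustration.

```python
import csv
import io
import math

# Hypothetical cluster centers, as might be read off a model's output table.
CENTERS = {
    "key account": {"annual_revenue": 200000.0, "orders_per_year": 30.0},
    "small steady": {"annual_revenue": 30000.0, "orders_per_year": 8.0},
}
# Hypothetical per-variable scales so that no single variable dominates.
SCALES = {"annual_revenue": 100000.0, "orders_per_year": 10.0}

# A tiny spreadsheet export: one customer per line, one characteristic per column.
SHEET = """customer,annual_revenue,orders_per_year
Acme,180000,26
Bolt,25000,6
"""


def score_row(row):
    """Return the name of the cluster center closest to this row."""
    def distance(center):
        return math.sqrt(
            sum(((float(row[k]) - center[k]) / SCALES[k]) ** 2 for k in SCALES)
        )
    return min(CENTERS, key=lambda name: distance(CENTERS[name]))


def score_sheet(text):
    """Add a 'segment' column to every row of a CSV-formatted sheet."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row, segment=score_row(row)) for row in reader]


scored = score_sheet(SHEET)
for row in scored:
    print(row["customer"], row["segment"])
```

Returning the segment as one more column keeps the result in a form business people can sort and filter in the spreadsheet they already use.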
The ERP and CRM integration level
The holy grail of deployment for any predictive system is to integrate it into business processes. For most companies, this means placing the insight and specific predictions into the ERP and CRM applications.
Customer segmentation models that have been through the previous two levels of evaluation are ready to go for integration. They have buy-in from business people; they have been validated in the spreadsheet level. The models can then be programmatically integrated into customer information screens, order entry screens, and CRM systems and used to create cross-sell recommendations, among other uses.
Depending on how you intend to use the model, there are several ways for applications to interact with the segmentation model. First, the model can be directly queried. If you have approached the model using the concept of a decision tree, you can create complex but quick queries that have the criteria for segmentation membership within the query. Just imagine a single query that looks at a customer or prospect, takes a few fields in the database for that customer or prospect, and then performs what is essentially a large IF statement to output which segment or cluster that customer fits into.
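A query of this shape can be sketched with a SQL CASE expression, shown here against an in-memory SQLite database so the example runs on its own. The table layout, column names, thresholds, and segment names are hypothetical; your ERP or CRM database and its SQL dialect will differ.

```python
import sqlite3

# The CASE expression plays the role of the "large IF statement".
SEGMENT_QUERY = """
SELECT customer_id,
       CASE
           WHEN annual_revenue >= 100000 AND orders_per_year >= 24 THEN 'key account'
           WHEN annual_revenue >= 100000 THEN 'large occasional'
           WHEN years_as_customer < 1 THEN 'new'
           ELSE 'small steady'
       END AS segment
FROM customers
WHERE customer_id = ?
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id TEXT PRIMARY KEY, annual_revenue REAL, "
    "orders_per_year INTEGER, years_as_customer REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("C1", 150000, 30, 5.0),
        ("C2", 150000, 3, 2.0),
        ("C3", 20000, 3, 0.5),
        ("C4", 20000, 3, 4.0),
    ],
)


def segment_for(customer_id):
    """Return the segment for one customer, or None if the id is unknown."""
    row = conn.execute(SEGMENT_QUERY, (customer_id,)).fetchone()
    return row[1] if row else None


print(segment_for("C1"))
```

Because the membership criteria live inside the query, the application needs no separate scoring service, but the query must be rewritten whenever the model is rebuilt.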
Another way would be to query against the model as the spreadsheet level above did. IBM SPSS Modeler supports live queries against the existing model. Send SPSS Modeler the relevant database values for a customer or prospect, and it sends back a classification.
A third way would be to classify all customers and prospects in a batch process, and then place those results as a column within a customer data file within the ERP or CRM application. This may seem a less elegant method; after all, it is not real time and doesn't involve queries and programs to be integrated. In reality, people not familiar with querying predictive models can use the data file for other purposes. The segmentation classes become just another field in the database that can be accessed and queried for purposes other than those originally intended, fostering experimentation and further use.
The downside is that there is delay. Segmentation membership may change before you can update the database file. New prospects need to be scored by you or someone else with access to SPSS Statistics. However you choose to integrate the segmentation model, though, you will have successfully woven predictive analytics into the fabric of your company's operations.
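The batch approach described above can be sketched as a job that scores every customer and writes the result into an ordinary column. The `classify` function is a hypothetical stand-in for the real model's scoring step, and the SQLite table is only there to make the example self-contained.

```python
import sqlite3


def classify(annual_revenue, orders_per_year):
    """Hypothetical stand-in for the real model's scoring step."""
    if annual_revenue >= 100000:
        return "key account" if orders_per_year >= 24 else "large occasional"
    return "small steady"


def batch_score(conn):
    """Score every customer and store the segment in the customers table."""
    rows = conn.execute(
        "SELECT customer_id, annual_revenue, orders_per_year FROM customers"
    ).fetchall()
    updates = [(classify(rev, orders), cid) for cid, rev, orders in rows]
    conn.executemany(
        "UPDATE customers SET segment = ? WHERE customer_id = ?", updates
    )
    conn.commit()
    return len(updates)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id TEXT PRIMARY KEY, annual_revenue REAL, "
    "orders_per_year INTEGER, segment TEXT)"
)
conn.executemany(
    "INSERT INTO customers (customer_id, annual_revenue, orders_per_year) "
    "VALUES (?, ?, ?)",
    [("C1", 150000, 30), ("C2", 150000, 3), ("C3", 20000, 3)],
)

scored_count = batch_score(conn)

# Any other query can now treat the segment like an ordinary field.
by_segment = dict(
    conn.execute("SELECT segment, COUNT(*) FROM customers GROUP BY segment").fetchall()
)
print(scored_count, by_segment)
```

Rerunning the job on a schedule is what keeps the delay bounded; until the next run, new or changed customers carry a missing or stale segment value.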
IBM tools get to the result
I have already mentioned SPSS Statistics, which is the starting point for analysis. Using this tool requires that you review your college statistics materials. Segmentation is, fortunately, an area of statistics that is fairly easy to grasp.
When moving into production using some of the integration levels discussed in the previous section, you need SPSS Modeler running on a server to be able to query against your segmentation model. SPSS Modeler is the key product for real-time classification of new prospects and customers. It is the server application that responds to variable input, and then returns a classification in real time. (See Resources.)
Big data in the future
Most companies (especially small and mid-sized organizations) don't collect much data that would fall under the category of big data today other than marketing organizations tracking website navigation. That is changing fast, though. New data elements are coming in that definitely fall under big data.
In the future, retailers may track individual customers through stores. Manufacturers may track product usage at customer sites. These and other types of data come into the data center as unstructured or semi-structured data, and there is too much to analyze manually. These characteristics certainly sound like big data.
I bring up the concept of big data not to scare anyone away: You can experiment with big data now using IBM InfoSphere® BigInsights™. Using InfoSphere BigInsights Basic Edition, either as a no-charge download or on IBM SmartCloud™, is a great way of exploring big data using current data elements, and then integrating future data additions.
Call it foreshadowing: The skills and techniques you learn today in predictive analytics have a place in a big data future.
Most types of businesses need to segment their customers to bring a uniform understanding across different departments. Success stories abound in retail, distribution, health care, government, and every type of company that conducts transactions on the Internet. To make an impact on business processes, the segmentation must be placed into the business processes so that business users can act on the information.
Customer segmentation analysis involves commonly understood statistics and is easily discussed with business people. It's a great first step into the realm of predictive analytics.
Resources
- Read Wikipedia's entry on normalizing variables.
- Learn more about big data and its role in the enterprise.
- Visit IBM developerWorks Business analytics for more analytic technical resources for developers.
- Visit IBM developerWorks Industries for all the latest industry-specific technical resources for developers.
- Follow developerWorks on Twitter.
- Watch developerWorks on-demand demos ranging from product installation and setup demos for beginners to advanced functionality for experienced developers.
- Learn more about SPSS Statistics.
- Learn more about SPSS Modeler and how to use it for intensive data mining.
- Learn more about SPSS Modeler Server.
Get products and technologies
- InfoSphere BigInsights Basic Edition is an integrated, tested and pre-configured, no-charge download for anyone who wants to experiment with Hadoop. You can also use this product on the cloud.
- Evaluate more IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement service-oriented architecture efficiently.
- Get involved in the developerWorks community. Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.