Predict prospect-to-customer conversion with analysis of surveys and SPSS Statistics

Turn survey results into usable predictors with SPSS Statistics

Surveys of customers and prospects are becoming more common as web-based tools allow for quick deployment. As this information floods into the enterprise, it is often not organized or merged with other survey efforts. Most marketing and sales departments glance at the results, cherry-pick those customers who bother to write comments, and then ignore the rest. IBM® SPSS® Statistics comes from a background of survey analysis, but most business managers and analysts do not have that background. Those people can use the Direct Marketing menu of SPSS Statistics to develop a predictive model for prospects who are more likely to purchase products. In this article, explore the best practices to create a statistically valid sample, how the predictive algorithm in SPSS Statistics works, and how to apply the predictive model to ongoing surveys.


David Gillman, Director, Services, Data Sooner

David Gillman has worked in the areas of business intelligence, data mining, and predictive analytics for 20 years. His educational background is in applied math, optimization, and statistical analysis, with particular emphasis on applications to commercial activities. He has hands-on experience in improving business operations through applied analytics in the distribution, manufacturing, retail, and hospitality industries with companies of various sizes.

20 January 2014


Marketing departments love surveys. After all, they collect information at little to no cost. Even if the department sends a little gift to respondents, the information that is gathered is much more valuable than the price paid to the person who fills out the survey.

At least that's the justification given.

The reality is that most surveys are done because people in marketing feel they need to or they budget for it. Don't get me wrong: There is often great value in the information gathered. The problem is that most companies don't do much to unlock the value of the information in the survey.

Fortunately, IBM® SPSS® Statistics comes from the background of social sciences, where survey analysis is commonplace. Many built-in features make analyzing survey results easy for all technical levels.

A survey of surveying in marketing

Surveys come in many forms, from the old-fashioned, mailed survey to the questions asked in the middle of webinars, with many methods in between. In their many forms and uses, though, surveys are rarely used to predict prospect conversion. The examples in this article demonstrate how to use SPSS Statistics to analyze the surveys that your organization collects.

Smart marketing departments ask similar questions over time to monitor how responses change over that period. Plan to store the results in a single database where questions, their answers, and time stamps can be retrieved easily.
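
As a rough sketch of what such a single store might look like, the Python below keeps one row per respondent, question, and answer in a SQLite table. The table name, column names, and sample rows are my own assumptions for illustration; they do not come from any SPSS Statistics feature or from the surveys in this article.

```python
import sqlite3

# Hypothetical schema: one row per respondent, question, and answer.
SCHEMA = """
CREATE TABLE IF NOT EXISTS survey_answers (
    survey_id     TEXT NOT NULL,  -- which survey the answer came from
    respondent_id TEXT NOT NULL,  -- who answered
    question_id   TEXT NOT NULL,  -- for example 'Q1'
    answer        TEXT NOT NULL,  -- stored as text; cast when analyzing
    answered_at   TEXT NOT NULL   -- ISO 8601 time stamp
)
"""

def open_store(path=":memory:"):
    """Open (or create) the answer store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def add_answer(conn, survey_id, respondent_id, question_id, answer, answered_at):
    """Insert a single answer; parameters are bound, not string-formatted."""
    conn.execute(
        "INSERT INTO survey_answers VALUES (?, ?, ?, ?, ?)",
        (survey_id, respondent_id, question_id, str(answer), answered_at),
    )

def answers_for_question(conn, question_id):
    """Return (answered_at, answer) pairs for one question, oldest first."""
    cur = conn.execute(
        "SELECT answered_at, answer FROM survey_answers "
        "WHERE question_id = ? ORDER BY answered_at",
        (question_id,),
    )
    return cur.fetchall()

# Made-up sample data from two differently sourced surveys.
conn = open_store()
add_answer(conn, "web-2013", "r1", "Q1", 4, "2013-06-01T10:00:00")
add_answer(conn, "mail-2012", "r2", "Q1", 3, "2012-11-15T09:30:00")
add_answer(conn, "web-2013", "r1", "Q2", 5, "2013-06-01T10:00:00")
print(answers_for_question(conn, "Q1"))
```

Keeping one answer per row means that a new survey with new questions adds rows rather than columns, which makes it easier to pull the same question out of many surveys later.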

Unfortunately, results from surveys are often stored in different formats in different locations. It is not unusual to have some surveys that are stored as text files that come from a web server. Other results might be stored in database files that come from web services firms. Still others might be entered into spreadsheets and need to be unified to unlock the full value of their results. The disparate storage formats add another level of complexity for merging results that is beyond this article.

Once merged, though, SPSS Statistics offers simple first steps to insightful analyses. Most people feel comfortable and better informed knowing the mean, median, and variance of the answers for each question. SPSS Statistics can certainly do that. Treating each completed survey as a case and each question as a variable, simply click Analyze > Descriptive Statistics as shown in Figure 1. The options show the basic statistics of the answers for that survey question.

Figure 1. Using SPSS Statistics to see basic statistics of the survey question responses

One trick for extracting insight from surveys is to track the changes in the mean and other basic statistics over time. For questions that are the same from survey to survey, enter their basic statistics as derived from SPSS Statistics into a simple spreadsheet, and then graph that as a trend chart.
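
If you prefer to script the trend rather than retype numbers into a spreadsheet, the same idea can be sketched in a few lines of Python. The survey waves and the answers below are invented for illustration, and the variance shown is the sample variance from Python's standard statistics module; check which variance your own reports use before you compare figures.

```python
from statistics import mean, median, variance

# Hypothetical answers to the same question (1-5 scale) in three survey waves.
q1_by_wave = {
    "2013-Q1": [3, 4, 2, 5, 3],
    "2013-Q2": [4, 4, 3, 5, 4],
    "2013-Q3": [4, 5, 4, 5, 4],
}

def summarize(answers):
    """Mean, median, and sample variance for one question in one wave."""
    return {
        "mean": mean(answers),
        "median": median(answers),
        "variance": variance(answers),
    }

# One summary per wave, in the order the waves were entered.
trend = {wave: summarize(answers) for wave, answers in q1_by_wave.items()}
for wave, stats in trend.items():
    print(wave, stats)
```

Each printed row is one point on the trend chart: plot the mean per wave to see whether the answers drift over time.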

The basic statistics work great for providing a glimpse into the prospects and customers, but you are limited both by how the question is phrased and how the reader interprets it. Free-form comment boxes provide survey respondents the means to express themselves fully. Sadly, most companies look over the comment boxes once, cherry-pick for comments that say they want a salesperson to contact them, and then discard the rest. Many potential insights are left unanalyzed in these comment boxes. Full analysis by using machine learning is done through text analytics programs such as IBM SPSS Text Analytics for Surveys, but that type of analysis is best done with numerous survey responses. In this article, I perform basic analysis by using the standard types of responses that are found in common surveys. I don't cover text analytics.

The Direct Marketing menu for simplicity

SPSS Statistics is a powerful statistical analysis package. Its many algorithms would suffice for almost any commercial business analysis need, but most people view statistics as difficult. I suppose that is why the Direct Marketing menu was created in SPSS Statistics many versions ago and is maintained to the current version.

The Direct Marketing menu is shown in Figure 2. You can see that it describes each option by the result, not by a statistical term. The menu is designed for business people to begin doing basic statistical analysis.

Figure 2. Direct Marketing menu in SPSS Statistics

The options aren't quite "wizard" like, but they do simplify the menus of the underlying algorithms. The screens use terms closer to what a statistical novice might understand.

This article uses the Generate profiles of my contacts who responded to an offer menu option, as highlighted in Figure 2.

Gathering the data

Unless your company is large, survey responses probably fit into a spreadsheet. The spreadsheet also provides an easy-to-use platform for combining responses from multiple surveys and sharing that information with others to obtain validation and buy-in. As your products and services change over time, so too does your customer and prospect universe, so be careful when you combine older surveys with newer ones.

Get buy-in from line-of-business managers in the beginning so they accept the results and predictions of the model. A great starter project is to marry a recent survey of prospects with a survey of customers in which the same questions are asked. If the surveys come from two different initiatives or departments, so much the better for involving more people.

Finding and assembling the data to use might be your most challenging task. Most companies store their survey results by project or survey and do not have an aggregated database of questions and answers. Using spreadsheets to combine survey results is invariably the easiest path.

Figure 3 shows typical spreadsheets of survey responses with a column added to the end of the question columns to indicate whether the survey came from a buyer (customer). The values that are used are Y for yes and N for no.

Figure 3. Spreadsheets of aggregated survey responses

The algorithm that the Generate profiles of my contacts who responded to an offer wizard uses requires one field, and only one field, to be the target field. That's why the buyer column was added to the survey responses. All other columns are treated as variables.
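
The spreadsheet step of adding the buyer column and stacking the two lists can also be sketched in Python. The function name, the question names, and the answers here are hypothetical; only the Buyer column with its Y and N values comes from the spreadsheets described above.

```python
def tag_and_merge(customer_rows, prospect_rows, target="Buyer"):
    """Return one list of cases with a Y/N target column added to each row."""
    merged = [{**row, target: "Y"} for row in customer_rows]
    merged += [{**row, target: "N"} for row in prospect_rows]
    return merged

# Hypothetical answers on a 1-5 scale; the question names are illustrative.
customers = [{"Q1": 4, "Q2": 5}, {"Q1": 3, "Q2": 5}]
prospects = [{"Q1": 2, "Q2": 3}, {"Q1": 4, "Q2": 1}]

cases = tag_and_merge(customers, prospects)
for case in cases:
    print(case)
```

The original rows are copied rather than changed, so the source survey lists stay intact if you need to rerun the merge.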

Algorithm basics

As mentioned earlier, the algorithm for the Generate profiles of my contacts who responded to an offer wizard needs two types of fields: variables and a single target field. For surveys, each question that offers a preformatted answer (such as circling a value or giving a letter grade) is a variable.

Many businesspeople assume that the only cases wanted in the data are surveys from customers, but that's not true. For a predictive model to work, the data needs a mix of customers and noncustomers. The algorithm can then sense the differences. If there were only customers, the algorithm might not find differences between customers and noncustomers because it would not see what a noncustomer's responses looked like. The ratio of customers to noncustomers depends on the range of the values of responses for each question. For the sake of simplicity, having a 50-50 mix of customers and noncustomers works well for most analyses.
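
One simple way to reach a 50-50 mix is to randomly drop cases from the larger group until both groups are the same size. The Python below is a minimal sketch of that idea under my own assumptions (the function name, the fixed seed, and the sample data are all illustrative). Keep in mind that downsampling throws data away, so it suits situations where one group is much larger than the other.

```python
import random

def balance_cases(cases, target="Buyer", positive="Y", seed=0):
    """Downsample the larger group so both groups have the same size."""
    rng = random.Random(seed)  # fixed seed keeps the sample repeatable
    buyers = [c for c in cases if c[target] == positive]
    others = [c for c in cases if c[target] != positive]
    n = min(len(buyers), len(others))
    balanced = rng.sample(buyers, n) + rng.sample(others, n)
    rng.shuffle(balanced)  # avoid all customers sitting at the top
    return balanced

# Hypothetical data: 3 customers and 7 noncustomers.
cases = [{"Q2": 5, "Buyer": "Y"} for _ in range(3)]
cases += [{"Q2": 2, "Buyer": "N"} for _ in range(7)]

balanced = balance_cases(cases)
print(len(balanced), sum(c["Buyer"] == "Y" for c in balanced))
```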

The modeling procedure yields rules by predicting which responses indicate a customer over a prospect. Back-applying those rules to the prospects gives a simple indicator of prospects who are more likely to purchase. The rules that the Generate profiles of my contacts who responded to an offer wizard generates are simple and easy to understand. They are also easy to apply against future surveys in the form of simple filters on reports that result in a list of all respondents who match the rules.

Predictive analytics purists might scoff at the inelegance of this method. In practice, it has the twin winning characteristics of being easy to understand for the average businessperson and working directly against data (surveys) that are already being collected. These reasons go a long way toward getting businesspeople to accept and use the predictions to guide their calls, emails, and appointments to convert prospects to customers.

The predictive analytics purists can now correctly point out that more precise and sophisticated algorithms are available, even within SPSS Statistics. Decision trees and neural networks are two prime examples, although they have their drawbacks in a business setting.

Decision tree algorithms require much iterative work by a trained analyst to achieve a useful result, a process that often includes pruning the tree. Many runs are made over the training data, and each is checked against a validation set. The modeler tweaks the parameters to achieve better results as the model is being developed. This experimentation is great for a statistician, but most business analysts get lost in the mechanics of evaluating and improving models.

Other algorithms, such as neural networks, can generate a slightly better predictive model. The cost in time and training to execute these algorithms is well beyond the knowledge and resources of the average organization.

Data fields and quality

The Generate profiles of my contacts who responded to an offer wizard uses an algorithm that is sensitive to dependent variables. Dependent variables are pairs or sets of variables in which the answers change together. Most businesspeople understand the concept of correlation: when one thing changes, another thing also changes. In customer surveys, if a customer answers that he is not satisfied with the product, the customer is not likely to recommend the product to others.

The how-to examples that I present in this article are simple and do not highlight this problem, but in the real world, different questions that are asked in a survey are often similar. The answers that are given are then related. When you construct the predictive model off the survey data, do not include the dependent variables. One option beyond the scope of this article is to combine the answers of similar questions into a single variable.
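
A quick way to spot candidate sets of dependent variables before you build the model is to compute the correlation between every pair of questions and flag the pairs that move together. The sketch below uses a plain Pearson correlation, which is only a rough check for ordinal survey scales. The 0.8 cutoff, the question names, and the answers are assumptions that I made up for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numeric answers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column cannot be correlated with anything
    return cov / (sx * sy)

def related_pairs(columns, threshold=0.8):
    """List question pairs whose absolute correlation meets the threshold."""
    flagged = []
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged

# Hypothetical answers, one list per question, one position per respondent.
answers = {
    "Q1_satisfaction": [5, 4, 2, 1, 3, 5],
    "Q2_recommend": [5, 4, 1, 1, 3, 4],
    "Q3_company_size": [2, 5, 3, 4, 1, 3],
}

flagged = related_pairs(answers)
print(flagged)
```

In this made-up data, only the satisfaction and recommendation questions are flagged, which mirrors the example in the text. For each flagged pair, keep one question as a predictor and leave the other out.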

A technique to try after you identify dependent variable sets is to run the Generate profiles of my contacts who responded to an offer wizard repeatedly, changing the dependent variables one at a time. See which model appears to work best, and then use that one going forward to predict prospect behavior.

Example 1

In Figure 3, you saw two lists of survey responses in spreadsheet form. In both, the buyer column was added with a value of either Y or N, indicating whether the respondent was a customer. This field is the target for the algorithm.

Now, I merge these two lists and bring them into the SPSS Statistics data view. Here is the process that I used:

  1. Click Direct Marketing > Choose Technique, as highlighted in Figure 4.
    Figure 4. Accessing the Direct Marketing menu
  2. In the Direct Marketing window, click Generate profiles of my contacts who responded to an offer, and then click Continue.
  3. In the Prospect Profiles window, which is shown in Figure 5, notice that all of the columns in the spreadsheet appear in the Field list.
    Figure 5. Beginning the Generate profiles of my contacts who responded to an offer analysis
  4. Select the target field—Buyer—as the Response Field.
  5. Enter Y in the Positive response value field.

    Figure 6 shows the completed target field values.

    Figure 6. The completed Response field
  6. Select variables from the Fields list to use for predicting. Move them from the Fields list to the Create Profiles with list.

    This example uses a subset of the questions. Start with all the nondependent variables in the data set as the first pass. The resulting rules show which variables are not pertinent.

  7. With the Fields tab now complete, click the Settings tab.
  8. Specify a minimum group size.

    This value depends on the number of cases in the survey data. Too small a value, and the analysis has too many groups to consider. Too large a value, and the analysis misses meaningful positive response groups.

  9. Use the check box and target response rate box to set a minimum threshold of positive responses in each predicted group.

    Again, if set too high, the analysis ignores groups that have a not-positive-enough response rate. Too low, and the analysis presents too many groups, many of which are not meaningful.

    The Settings tab with the relevant values for this data set is shown in Figure 7.

    Figure 7. The completed Settings tab
  10. Click Run.

    Results of the analysis are displayed in the Output window, as shown in Figure 8.

    Figure 8. Output window of the analysis

Because of the values on the Settings tab, the results are color-coded green for above the target response rate and red for below the target response rate. Obviously, look to the green rules. Use the rules in the Description field for predictions.
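
To build intuition for how the minimum group size and the target response rate interact, here is a deliberately simplified Python sketch that groups cases by the answer to a single question. It is not the algorithm that SPSS Statistics runs behind the wizard, which can combine several fields into one rule; the function name, default values, and sample cases are my own assumptions.

```python
from collections import defaultdict

def profile_by_answer(cases, question, target="Buyer", positive="Y",
                      min_group_size=3, target_rate=0.5):
    """Response rate per answer value, skipping groups that are too small."""
    counts = defaultdict(lambda: [0, 0])  # answer -> [positives, total]
    for case in cases:
        group = counts[case[question]]
        group[1] += 1
        if case[target] == positive:
            group[0] += 1

    report = []
    for answer, (positives, total) in counts.items():
        if total < min_group_size:
            continue  # too few cases to trust the rate
        rate = positives / total
        report.append({
            "rule": f"{question} = {answer}",
            "size": total,
            "response_rate": rate,
            "meets_target": rate >= target_rate,  # "green" versus "red"
        })
    report.sort(key=lambda g: g["response_rate"], reverse=True)
    return report

# Hypothetical cases: four answered 5, three answered 3, two answered 1.
cases = (
    [{"Q2": 5, "Buyer": "Y"}] * 3 + [{"Q2": 5, "Buyer": "N"}]
    + [{"Q2": 3, "Buyer": "Y"}] + [{"Q2": 3, "Buyer": "N"}] * 2
    + [{"Q2": 1, "Buyer": "N"}] * 2
)

report = profile_by_answer(cases, "Q2")
for group in report:
    print(group)
```

Raising min_group_size removes small groups from the report, and raising target_rate turns more of the remaining groups "red", which is the trade-off that steps 8 and 9 describe.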

In this example, only one rule predicts a customer over a prospect. If Q2 (think question 2 on the survey) is answered with a 5, then that is a good predictor. In a real-world situation, take this value, look at surveys in which the prospect answered 5, and target those people for offers or continued sales efforts. In this simple way, the rule is used to modify business processes.
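
Outside of SPSS Statistics, applying such a rule is nothing more than a filter. As a minimal sketch, assuming that prospect responses are held as Python dictionaries with a hypothetical respondent field and numeric answers:

```python
def matches_rule(response, question="Q2", answer=5):
    """True when the response gives the rule's answer to the rule's question."""
    return response.get(question) == answer

# Hypothetical prospect responses.
prospects = [
    {"respondent": "p1", "Q2": 5},
    {"respondent": "p2", "Q2": 3},
    {"respondent": "p3", "Q2": 5},
]

targets = [p["respondent"] for p in prospects if matches_rule(p)]
print(targets)  # ['p1', 'p3']
```

If your answers arrive as text (for example, from a CSV export), compare against "5" instead of 5, or convert the column first.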

Advanced statisticians are quick to point out that there might be selection bias in the question. Simply comparing two groups and then back-applying the rule to the same data set is not the correct methodology. This analysis assumes that the population characteristics of customers are the same as or reasonably similar to those of the prospects. The inference is that a prospect who answered 5 is more similar to a customer. Carefully consider your questions to make sure that they are not anachronistic questions, that is, questions that customers answer differently than the prospects because they are customers.
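
One modest safeguard, which is not part of the wizard and is my own suggestion, is to hold back part of the cases, derive the rule on the rest, and then check whether the rule's response rate holds up on the cases that were held back. The sketch below assumes a 30 percent holdout, a fixed random seed, and made-up cases.

```python
import random

def split_cases(cases, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the cases and split it into training and holdout."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def response_rate(cases, question, answer, target="Buyer", positive="Y"):
    """Share of positive cases among those that match the rule, or None."""
    matched = [c for c in cases if c[question] == answer]
    if not matched:
        return None  # the rule never fired in this subset
    return sum(c[target] == positive for c in matched) / len(matched)

# Hypothetical cases: the rule "Q2 = 5" is right for 6 of 8 matching cases.
cases = (
    [{"Q2": 5, "Buyer": "Y"} for _ in range(6)]
    + [{"Q2": 5, "Buyer": "N"} for _ in range(2)]
    + [{"Q2": 3, "Buyer": "Y"} for _ in range(2)]
    + [{"Q2": 3, "Buyer": "N"} for _ in range(10)]
)

train, holdout = split_cases(cases)
print("training rate:", response_rate(train, "Q2", 5))
print("holdout rate: ", response_rate(holdout, "Q2", 5))
```

A holdout rate that is far below the training rate suggests that the rule fits quirks of the sample rather than a real difference. This check does nothing about anachronistic questions, which still need your judgment.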

This predictive modeling technique does work in the real world. It is easy for businesspeople to understand the rules that are generated and the questions and variables that are used in analysis. To help others understand, the graph that appears by default illustrates the response rate decline as the rules are applied to the data, as shown in Figure 9.

Figure 9. Graph of the response rate for the rules

Use caution when you put too many questions into the algorithm. Using the same data, Figure 10 shows the rules when all questions and variables are included in the analysis. In some situations, the multiple case rules are useful, but they are more difficult to explain to businesspeople. Even though the response rate is better, it is more difficult for people to apply the rules as mental IF-THEN-ELSE statements. Rules like those in Figure 10 are good in a big data environment, though, where the application is done automatically over incoming survey responses.

Figure 10. More complicated rule set by using more variables

Example 2

Merging surveys over time is another useful technique. Many times, the same or similar questions are asked in unrelated surveys. Provided the response types are similar, combine the response cases, and select only those questions that are persistent over time.

Note: A side area of analysis is to study how the responses change over time, but that is a topic for another how-to article that involves other algorithms in SPSS Statistics.

Figure 11 shows three different surveys in which the questions asked are slightly different. To distinguish the differing questions, the surveys are labeled with a letter and number.

Figure 11. Surveys in three different spreadsheets

Figure 12 shows the combined variables, where each question is its own column.

Figure 12. Combined surveys, with questions coded to differentiate

For this analysis, I use only questions that are identical over all the surveys.
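
Finding the questions that every survey shares is a set intersection. The Python sketch below shows one way to do it, assuming that each survey is a list of rows, that every row carries a Buyer value, and that identical questions use identical column names. The survey names, question codes, and answers are invented and do not reproduce the spreadsheets in the figures.

```python
def common_questions(surveys, target="Buyer"):
    """Question columns that appear in every row of every survey."""
    rows = [row for survey in surveys.values() for row in survey]
    if not rows:
        return []
    shared = set(rows[0])
    for row in rows[1:]:
        shared &= set(row)
    shared.discard(target)  # the target is not a predictor
    return sorted(shared)

def combine_surveys(surveys, target="Buyer"):
    """Stack all surveys, keeping only shared questions plus the target."""
    keep = common_questions(surveys, target)
    combined = []
    for name, survey in surveys.items():
        for row in survey:
            case = {"Survey": name}  # remember where the case came from
            case.update({q: row[q] for q in keep})
            case[target] = row[target]
            combined.append(case)
    return combined

# Hypothetical surveys; A3, B3, and C4 are questions unique to one survey.
surveys = {
    "A": [{"Q1": 4, "Q2": 5, "A3": 2, "Buyer": "Y"}],
    "B": [{"Q1": 3, "Q2": 2, "B3": 1, "Buyer": "N"}],
    "C": [{"Q1": 5, "Q2": 4, "C4": 3, "Buyer": "Y"}],
}

print(common_questions(surveys))
for case in combine_surveys(surveys):
    print(case)
```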

Figure 13 shows the prospect profiles for the Generate profiles of my contacts who responded to an offer wizard for this example.

Figure 13. The Prospect Profiles tab

As in Example 1, the Buyer field is the Response Field with a positive value of Y. Only those variables that represent identical questions over time are used in the Create Profiles with box. The Settings tab is completed with threshold and group size values, as shown in Figure 14.

Figure 14. The Settings tab

Click Run to fire off the algorithm. Figure 15 shows the results as displayed in the SPSS Statistics output window. As you can see, the algorithm shows two groups that meet the response rate minimum that is specified on the Settings tab.

Figure 15. Output for Example 2

The first rule is the best and involves only one question. That rule is easy to implement and communicate. The second rule is more complicated because it involves looking at the responses to two questions, but when you view the rules together, Q7 is a pivotal question. One might reasonably interpret and communicate to businesspeople that the higher the answer on Q7, the better the prospect, especially if he answers yes to Q9.

Big data

The rules that are generated in the earlier examples are typical for survey analysis. They can be built easily into selection criteria for queries and reports. The same rules can be built into big data queries. IBM InfoSphere® BigInsights™ is a great place to do this.

The simplest situation is a big data process that continually scans survey responses from different environments as they come into InfoSphere BigInsights storage. When set up to store the results in real time, InfoSphere BigInsights can either alert someone when the conditions are met or fire off an offer, email, or next web page to the person who fills out the survey.
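
The shape of that process can be sketched in plain Python without any big data platform. The code below does not use InfoSphere BigInsights or any of its APIs; it only illustrates the pattern of checking each incoming response against a set of named rules and calling an action when one matches. The function name, the respondent field, and the incoming responses are hypothetical.

```python
def scan_responses(responses, rules, on_match):
    """Check each incoming response against every rule and report matches."""
    matches = 0
    for response in responses:
        for name, rule in rules.items():
            if rule(response):
                on_match(name, response)  # alert, email, offer, and so on
                matches += 1
    return matches

# The only rule taken from this article is Example 1's "Q2 answered with a 5".
rules = {"Q2 = 5": lambda r: r.get("Q2") == 5}

# A stand-in for responses that arrive one at a time.
incoming = iter([
    {"respondent": "p1", "Q2": 5},
    {"respondent": "p2", "Q2": 2},
    {"respondent": "p3", "Q2": 5},
])

alerts = []
count = scan_responses(
    incoming, rules, lambda name, r: alerts.append((name, r["respondent"]))
)
print(count, alerts)
```

Because the rules are held as data, a new rule from a later analysis is one more entry in the dictionary rather than a change to the scanning loop.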

This scenario might be the way that your company starts to use big data for real-world applications. It is directly tied to sales and customer relations and is easily understood by management. It ties SPSS Statistics to big data in a direct way that increases sales and personalization in marketing.


The SPSS Statistics Direct Marketing menu is great for business analysts who are new to SPSS Statistics but still want to do meaningful work right away. Applying the Generate profiles of my contacts who responded to an offer option to existing survey data is easy when you understand a few caveats and restrictions. Using existing surveys, the predictive rules that are generated are easy to understand, communicate, and apply to business processes—a quick win for the analyst.


