Classifying Telecommunications Customers (Multinomial Logistic Regression)
Logistic regression is a statistical technique for classifying records based on values of input fields. It is analogous to linear regression but takes a categorical target field instead of a numeric one.
For example, suppose a telecommunications provider has segmented its customer base by service usage patterns, categorizing the customers into four groups. If demographic data can be used to predict group membership, you can customize offers for individual prospective customers.
This example uses the stream named telco_custcat.str, which references the data file named telco.sav. These files are available from the Demos directory of any IBM® SPSS® Modeler installation. The Demos directory can be accessed from the IBM SPSS Modeler program group on the Windows Start menu. The telco_custcat.str file is in the streams directory.
The example focuses on using demographic data to predict usage patterns. The target field custcat has four possible values that correspond to the four customer groups, as follows:
Value | Label |
---|---|
1 | Basic Service |
2 | E-Service |
3 | Plus Service |
4 | Total Service |
Because the target has more than two categories, a multinomial model is used. For a target with two distinct categories, such as yes/no, true/false, or churn/don't churn, a binomial model could be created instead. See the topic Telecommunications Churn (Binomial Logistic Regression) for more information.
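To make the distinction concrete, the following is a minimal sketch, outside of IBM SPSS Modeler, of how a multinomial logistic model can turn one linear score per category into category probabilities. The function names and the score values are hypothetical and chosen only for illustration; they are not taken from telco.sav or from the model that the stream builds. The sketch also shows that with only two categories the same calculation reduces to the familiar binomial (sigmoid) form.

```python
import math

# Labels copied from the custcat value table above.
CUSTCAT_LABELS = {
    1: "Basic Service",
    2: "E-Service",
    3: "Plus Service",
    4: "Total Service",
}


def category_probabilities(scores):
    """Convert one linear score per category into probabilities (softmax)."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {cat: math.exp(s - top) for cat, s in scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}


def predict_category(scores):
    """Return the category with the highest predicted probability."""
    probs = category_probabilities(scores)
    return max(probs, key=probs.get)


def binomial_probability(score):
    """Two-category special case: the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical scores for one customer (assumed values, not model output).
example_scores = {1: 0.2, 2: -0.4, 3: 1.1, 4: 0.0}

probs = category_probabilities(example_scores)
best = predict_category(example_scores)
print(CUSTCAT_LABELS[best], round(probs[best], 3))
```

In a fitted model, each score would be a weighted sum of the demographic input fields, with one set of weights per category. Fixing the score of one category at zero, as in the two-category check `category_probabilities({0: 0.0, 1: z})[1] == binomial_probability(z)`, is one common way to see why the binomial model is a special case of the multinomial one.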