This type of statistical analysis (also known as logit model) is often used for predictive analytics and modeling, and extends to applications in machine learning. In this analytics approach, the dependent variable is finite or categorical: either A or B (binary regression) or a range of finite options A, B, C or D (multinomial regression). It is used in statistical software to understand the relationship between the dependent variable and one or more independent variables by estimating probabilities using a logistic regression equation.
This type of analysis can help you predict the likelihood of an event happening or a choice being made. For example, you may want to know the likelihood of a visitor choosing an offer made on your website — or not (dependent variable). Your analysis can look at known characteristics of visitors, such as sites they came from, repeat visits to your site, behavior on your site (independent variables). Logistic regression models help you determine a probability of what type of visitors are likely to accept the offer — or not. As a result, you can make better decisions about promoting your offer or make decisions about the offer itself.
Machine learning uses statistical concepts to enable machines (computers) to “learn” without explicit programming. A logistic approach fits best when the task that the machine is learning is based on two values, or a binary classification. Using the example above, your computer could use this type of analysis to make determinations about promoting your offer and take actions all by itself. And, as more data is provided, it could learn how to do this better over time.
Some types of predictive models that use logistic analysis:
Predictive models built using this approach can make a positive difference in your business or organization. Because these models help you understand relationships and predict outcomes, you can act to improve decision-making. For example, a manufacturer’s analytics team can use logistic regression analysis as part of a statistics software package to discover a probability between part failures in machines and the length of time those parts are held in inventory. With the information it receives from this analysis, the team can decide to adjust delivery schedules or installation times to eliminate future failures.
In medicine, this analytics approach can be used to predict the likelihood of disease or illness for a given population, which means that preventative care can be put in place. Businesses can use this approach to uncover patterns that lead to higher employee retention or create more profitable products by analyzing buyer behavior. In the business world, this type of analysis is applied by data scientists whose goal is clear: to analyze and interpret complex digital data.
Certainly multinomial analysis can help when you are examining a range of categorical outcomes: A, B, C or D. But binary analysis — yes or no, present or absent — is more often used. Although the outcomes are constrained, the possibilities are not. Binary logistic regression can be used to examine everything from baseball statistics to landslide susceptibility to handwriting analysis.
This approach to analytics also proves useful for a range of statistical concepts and applications:
The use of statistical analysis software delivers great value for approaches such as logistic regression analysis, multivariate analysis, neural networks, decision trees and linear regression. But remember: hardware and cloud-computing solutions should also be considered if you need to accommodate large data sets either on premises, in the cloud or in a hybrid cloud configuration.
When is this approach most effective or ineffective?
While binary logistic regression is more often used and discussed, it can be helpful to consider when each type is most effective.
Multinomial can be used to classify subjects into groups based on a categorical range of variables to predict behavior. For example, you can conduct a survey in which participants are asked to select one of several competing products as their favorite. You can create profiles of people who are most likely to be interested in your product, and plan your advertising strategy accordingly.
Binary is most useful when you want to model the event probability for a categorical response variable with two outcomes. A loan officer wants to know whether the next customer is likely to default — or not default — on a loan. Binary analysis can help assess the risk of extending credit to a particular customer.
It is also helpful to understand when this type of analysis might be ineffective, according to Classroom – The Disadvantages of Logistic Regression (link resides outside ibm.com). Here are some hazards to watch out for:
You could perform this analytics approach in Microsoft Excel, but for nearly all applications, including conditional logistic regression, multiple logistic regression and multivariate logistic regression, using either open source (logistic regression R) or commercial (logistic regression SPSS) software packages is recommended to analyze data and apply techniques more efficiently. You can perform the analysis in Excel or use statistical software packages such as IBM SPSS® Statistics that greatly simplify the process of using logistic regression equations, logistic regression models and logistic regression formulas.
Comparison to linear regression
When to use linear or logistic analysis is a common query. Basically, linear regression analysis is more effectively applied when the dependent variable is open-ended or continuous — astronomical distances or temperatures, for example. Use the logistic approach when the dependent variable is limited to a range of values or categorical — A or B...or A, B, C or D.
Binary logistic regression can help bankers assess credit risk. Imagine that you are a loan officer at a bank and you want to identify characteristics of people who are likely to default on loans. Then you want to use those characteristics to identify good and bad credit risks. You have data on 850 customers. The first 700 are customers who have already received loans. See how you can use a random sample of these 700 customers to create a logistic regression model and classify the 150 remaining customers as good or bad risks.
Multinomial analysis is valuable to those who want to profile the consumers of packaged goods. Imagine that your consumer packaged goods company is looking to improve the marketing of breakfast products. You have polls of 880 people, noting their age, gender, marital status and whether or not they lead an active lifestyle. Each participant then tasted three breakfast foods and was asked which one they liked best. See how a multinomial approach can help determine marketing profiles for each breakfast option.
First Tennessee Bank boosted profitability with IBM SPSS software and achieved increases of up to 600 percent in cross-sale campaigns. Leaders at this regional bank in the US wanted to approach the right customers with the right products and services. There is no shortage of data to help, but it was a challenge to bridge the gap from having data to taking action. First Tennessee is using predictive analytics and logistic analytics techniques within an analytics solution to gain greater insight into all of its data. As a result, decision-making is improved to optimize customer interactions. (1 MB)
Reach more accurate conclusions when analyzing complex relationships using univariate and multivariate modeling techniques.
Drive return on investment with a drag-and-drop data science tool.
Predict categorical outcomes and apply a wide range of nonlinear regression procedures.
Build and train AI and machine-learning models, prepare and analyze data — all in a flexible, hybrid cloud environment.
Get a smart, simple way to mine and explore all your unstructured data with cognitive exploration, powerful text analytics and machine-learning capabilities.