What is linear regression?
Generate predictions using an easily interpreted mathematical formula

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

This form of analysis estimates the coefficients of a linear equation involving one or more independent variables that best predict the value of the dependent variable. Linear regression fits a straight line or surface that minimizes the discrepancies between predicted and actual output values. Simple linear regression calculators use a “least squares” method, which minimizes the sum of the squared discrepancies, to find the best-fit line for a set of paired data. You then estimate the value of Y (dependent variable) from X (independent variable).
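
To make the least-squares idea concrete, here is a minimal sketch in Python for the one-predictor case. The function names `fit_line` and `predict` and the sample data are illustrative assumptions, not part of any particular tool. The slope and intercept come from the standard closed-form least-squares solution.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate the dependent variable for a given independent value."""
    return intercept + slope * x


# Hypothetical paired data: xs is the independent variable, ys the dependent one.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(f"best-fit line: y = {intercept:.2f} + {slope:.2f} * x")
print(f"estimate at x = 6: {predict(6, slope, intercept):.2f}")
```

Once the line is fitted, generating a prediction is a single multiplication and addition, which is a large part of why the resulting formula is easy to interpret.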

Generate predictions more easily

You can perform linear regression in Microsoft Excel or use statistical software packages such as IBM SPSS® Statistics, which greatly simplify working with linear-regression equations, models and formulas. SPSS Statistics supports techniques such as simple linear regression and multiple linear regression.

You can perform linear regression in a variety of programs and environments, including:

  • R.
  • MATLAB.
  • Python, including the scikit-learn (sklearn) library.
  • Excel.
Why linear regression is important

Linear-regression models are relatively simple and provide an easy-to-interpret mathematical formula that can generate predictions. Linear regression can be applied to various areas in business and academic study.

You’ll find linear regression used in fields ranging from the biological, behavioral, environmental and social sciences to business. Linear-regression models have become a proven way to forecast outcomes scientifically and reliably. Because linear regression is a long-established statistical procedure, the properties of linear-regression models are well understood, and the models can be trained very quickly.

A proven way to scientifically and reliably predict the future

Business and organizational leaders can make better decisions by using linear regression techniques. Organizations collect masses of data, and linear regression helps them base decisions on that data instead of relying only on experience and intuition. You can take large amounts of raw data and transform them into actionable information.

You can also use linear regression to provide better insights by uncovering patterns and relationships that your business colleagues might have previously seen and thought they already understood. For example, performing an analysis of sales and purchase data can help you uncover specific purchasing patterns on particular days or at certain times. Insights gathered from regression analysis can help business leaders anticipate times when their company’s products will be in high demand.

Key assumptions of effective linear regression

Statistics, plots and assumptions to consider for success with linear-regression analysis:

  • For each variable: Consider the number of valid cases, mean and standard deviation. 
  • For each model: Consider regression coefficients, correlation matrix, part and partial correlations, multiple R, R², adjusted R², change in R², standard error of the estimate, analysis-of-variance table, predicted values and residuals. Also, consider 95% confidence intervals for each regression coefficient, variance-covariance matrix, variance inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis, Cook and leverage values), DfBeta, DfFit, prediction intervals and case-wise diagnostic information.
  • Plots: Consider scatterplots, partial plots, histograms and normal probability plots.
  • Data: Dependent and independent variables should be quantitative. Categorical variables, such as religion, major field of study or region of residence, need to be recoded to binary (dummy) variables or other types of contrast variables.  
  • Other assumptions: For each value of the independent variable, the distribution of the dependent variable must be normal. The variance of the distribution of the dependent variable should be constant for all values of the independent variable. The relationship between the dependent variable and each independent variable should be linear and all observations should be independent.
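
Several of the per-model statistics listed above can be computed directly from the actual and predicted values. The following is a minimal Python sketch, assuming you already have predictions from a fitted model. The function name `fit_statistics`, its arguments and the sample numbers are illustrative assumptions, and statistical packages report many more diagnostics than this.

```python
import math


def fit_statistics(actual, predicted, n_predictors):
    """Return R squared, adjusted R squared and the standard error of the estimate."""
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    dof = n - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")
    mean_y = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in actual)                # total variation
    if ss_tot == 0:
        raise ValueError("the dependent variable has no variation")
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / dof
    std_error = math.sqrt(ss_res / dof)
    return {"r2": r2, "adjusted_r2": adjusted_r2, "std_error": std_error}


# Hypothetical data with predictions from the line y = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]
stats = fit_statistics(ys, predicted, n_predictors=1)
print(stats)
```

Adjusted R² penalizes additional predictors, so it is the more useful of the two when you compare models with different numbers of independent variables.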
Make sure your data meets linear-regression assumptions

Before you attempt to perform linear regression, you need to make sure that your data can be analyzed using this procedure. Your data must satisfy certain assumptions.

Check your data against these assumptions:

  1. The variables should be measured at a continuous level. Examples of continuous variables are time, sales, weight and test scores.
  2. There should be a linear relationship between the dependent and independent variables. A scatterplot is a quick way to check.
  3. The observations should be independent of each other (that is, no observation should depend on another).
  4. Your data should have no significant outliers.
  5. The data should show homoscedasticity, meaning the variance of the data around the best-fit regression line stays similar along the whole line.
  6. The residuals (errors) of the best-fit regression line should follow an approximately normal distribution.
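
Some of these checks can be roughed out numerically from a model's residuals. The sketch below is a set of quick screening helpers, not formal statistical tests. The function names, the outlier threshold and the sample residuals are all illustrative assumptions. The Durbin-Watson statistic is the standard formula; values near 2 suggest little first-order correlation between successive residuals.

```python
import statistics


def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def flag_outliers(residuals, threshold=2.0):
    """Indices of residuals more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)
    return [i for i, e in enumerate(residuals) if abs(e - mean) > threshold * sd]


def spread_ratio(fitted, residuals):
    """Mean squared residual in the upper half of fitted values divided by the lower half.

    A ratio far from 1 hints that the spread is not constant (heteroscedasticity).
    """
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = [residuals[i] ** 2 for i in order[:half]]
    high = [residuals[i] ** 2 for i in order[-half:]]
    return statistics.mean(high) / statistics.mean(low)


# Hypothetical residuals from some fitted model; the last one is deliberately extreme.
fitted = list(range(1, 11))
residuals = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.0, 0.1, -0.1, 4.0]

dw = durbin_watson(residuals)
outliers = flag_outliers(residuals)
ratio = spread_ratio(fitted, residuals)
print(f"Durbin-Watson: {dw:.2f}, outlier indices: {outliers}, spread ratio: {ratio:.1f}")
```

Numbers like these complement, rather than replace, the scatterplots and normal probability plots mentioned earlier; a plot often reveals a problem that a single summary value hides.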
Examples of linear-regression success
Evaluating trends and sales estimates

You can use linear-regression analysis to try to predict a salesperson’s total yearly sales (the dependent variable) from independent variables such as age, education and years of experience.
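
A model with several independent variables like this is a multiple linear regression. Here is a minimal Python sketch that solves the least-squares normal equations with Gaussian elimination. The function names and every number in the data set are made up for illustration: the sales figures are generated from a known formula (50 + 0.5 × age + 4 × education + 6 × experience) so that you can see the fit recover those coefficients. Real sales data would be noisy and would not fit exactly.

```python
def fit_multiple(rows, ys):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(X, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or there are too few observations")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict_row(coeffs, row):
    """Estimate the dependent variable for one set of predictor values."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Made-up salespeople: (age, years of education, years of experience).
people = [(25, 12, 2), (30, 16, 5), (35, 14, 10), (40, 16, 12),
          (45, 12, 20), (50, 18, 22), (28, 18, 1)]
# Made-up yearly sales (in thousands), generated from the formula in the lead-in.
sales = [50 + 0.5 * a + 4 * e + 6 * x for a, e, x in people]

coeffs = fit_multiple(people, sales)
estimate = predict_row(coeffs, (33, 16, 8))
print([round(c, 3) for c in coeffs], round(estimate, 2))
```

Each coefficient reads as the expected change in yearly sales for a one-unit change in that variable while the others are held fixed, which is what makes the fitted formula easy to interpret.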

Analyze pricing elasticity

Changes in pricing often impact consumer behavior — and linear regression can help you analyze how. For instance, if the price of a particular product keeps changing, you can use regression analysis to see whether consumption drops as the price increases. What if consumption does not drop significantly as the price increases? At what price point do buyers stop purchasing the product? This information would be very helpful for leaders in a retail business.
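
One common way to put a number on this, offered here as an assumption rather than the only approach, is to regress the logarithm of quantity on the logarithm of price. In that log-log model the slope is the price elasticity: the percentage change in quantity for a 1% change in price. The prices and quantities below are synthetic, generated from quantity = 1000 × price^(-1.5), so the sketch should recover an elasticity of about -1.5.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic observations: each price and the quantity sold at that price.
prices = [1.0, 2.0, 4.0, 8.0]
quantities = [1000.0 * p ** -1.5 for p in prices]

# Fit log(quantity) = log_scale + elasticity * log(price).
log_p = [math.log(p) for p in prices]
log_q = [math.log(q) for q in quantities]
elasticity, log_scale = fit_line(log_p, log_q)

print(f"estimated elasticity: {elasticity:.2f}")
print(f"estimated quantity at price 1: {math.exp(log_scale):.0f}")
```

An elasticity below -1 means demand is elastic, so consumption falls proportionally faster than the price rises. A value between -1 and 0 would indicate the case raised above, where consumption does not drop significantly as the price increases.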

Assess risk in an insurance company

Linear regression techniques can be used to analyze risk. For example, an insurance company might have limited resources with which to investigate homeowners’ insurance claims; with linear regression, the company’s team can build a model for estimating claims costs. The analysis could help company leaders make important business decisions about what risks to take.

Sports analysis

Linear regression isn’t always about business. It’s also important in sports. For instance, you might wonder if the number of games won by a basketball team in a season is related to the average number of points the team scores per game. A scatterplot indicates that these variables are linearly related. The number of games won and the average number of points scored by the opponent are also linearly related. These variables have a negative relationship. As the number of games won increases, the average number of points scored by the opponent decreases. With linear regression, you can model the relationship of these variables. A good model can be used to predict how many games teams will win.
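
The direction and strength of relationships like these can be summarized with the Pearson correlation coefficient before you fit a model. The sketch below uses a made-up six-team data set; the team numbers and the function name `pearson_r` are illustrative assumptions, not real league statistics.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up season summaries for six teams.
points_for = [98, 102, 105, 108, 111, 115]       # average points scored per game
points_against = [112, 110, 107, 106, 103, 100]  # average points scored by opponents
wins = [30, 35, 41, 44, 50, 56]                  # games won in the season

r_for = pearson_r(points_for, wins)
r_against = pearson_r(points_against, wins)
print(f"points scored vs. wins:   r = {r_for:.3f}")
print(f"points allowed vs. wins:  r = {r_against:.3f}")
```

A coefficient close to +1 or -1 indicates a strong linear relationship, and its sign matches the sign of the slope of the fitted regression line: positive for points scored, negative for points allowed.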
