What is linear regression

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

This form of analysis estimates the coefficients of the linear equation, involving one or more independent variables that best predict the value of the dependent variable. Linear regression fits a straight line or surface that minimizes the discrepancies between predicted and actual output values. There are simple linear regression calculators that use a “least squares” method to discover the best fit line for a set of paired data. You then estimate the value of X (dependent variable) from Y (independent variable).

An example of linear regression scatterplot graph

Generate predictions more easily

You can perform linear regression in Excel or use statistical software packages such as IBM© SPSS© Statistics that greatly simplify the process of using linear regression equations, linear regression models, linear regression formula. IBM SPSS Statistics can be leveraged in techniques such as simple linear regression and multiple linear regression.

You can perform the linear regression method in a variety of programs and environments, including:

  • R linear regression
  • MATLAB linear regression
  • Sklearn linear regression
  • Linear regression Python
  • Excel linear regression

Why linear regression is important

Linear regression models are relatively simple and provide an easy-to-interpret mathematical formula that can generate generate predictions. Linear regression can be applied to various areas in business and academic study. 

You’ll find that linear regression is used in everything from biological, behavioral, environmental and social sciences to business. Linear regression models have become a proven way to scientifically and reliably predict the future. Because linear regression is a long-established statistical procedure, the properties of linear regression models are well understood and can be trained very quickly.

A proven way to scientifically and reliably predict the future

Business and organizational leaders can make better decisions utilizing linear regression techniques. Organizations collect masses of data, and linear regression helps them use that data to better manage reality–instead of relying on experience and intuition. You can take large amounts of raw data and transform it into actionable information.

You can also use linear regression to provide better insights by uncovering patterns and relationships that your business colleagues might have previously seen and thought that they already understood. For example, performing an analysis of sales and purchase data can help you uncover specific purchasing patterns on particular days or at certain times. Insights gathered from regression analysis can help business leaders anticipate times when their company’s products will be in high demand.

→ Learn more about linear regression at the IBM Knowledge Center

Key assumptions of effective linear regression

Assumptions to be considered for success with linear regression analysis:

  • For each variable: Number of valid cases, mean and standard deviation. 
  • For each model: Regression coefficients, correlation matrix, part and partial correlations, multiple R, R2, adjusted R2, change in R2, standard error of the estimate, analysis-of-variance table, predicted values and residuals. Also, 95%-confidence intervals for each regression coefficient, variance-covariance matrix, variance inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis, Cook, and leverage values), DfBeta, DfFit, prediction intervals and case wise diagnostic information. 
  • Plots: Scatterplots, partial plots, histograms and normal probability plots.
  • Data: The dependent and independent variables should be quantitative. Categorical variables, such as religion, major field of study, or region of residence, need to be recoded to binary (dummy) variables or other types of contrast variables. 
  • Other assumptions: For each value of the independent variable, the distribution of the dependent variable must be normal. The variance of the distribution of the dependent variable should be constant for all values of the independent variable. The relationship between the dependent variable and each independent variable should be linear and all observations should be independent.

Make sure your data meets linear regression assumptions

Before you attempt to perform linear regression, you need to make sure that your data can be analyzed using this procedure. Your data must pass through certain required assumptions.

Here’show you can check for these assumptions:

  1. The variables should be measured at a continuous level. Examples of continuous variables are time, sales, weight and test scores. 
  2. Use a scatterplot to find out quickly if there is a linear relationship between those two variables.
  3. The observations should be independent of each other (there should be no dependency).
  4. Your data should have no significant outliers. 
  5. Check for homoscedasticity–a statistical concept in which the variances along the best fitted linear regression line remains similar all through that line. 
  6. The residuals (errors) of the best fitted regression line follows normal distribution.

→ Use this hands-on tutorial to learn more about linear regression data assumptions

Evaluating trends and sales estimates

You can use linear regression to make estimates, identify trends and create forecasts in a business environment. For example, you can forecast your company’s future sales by conducting a linear analysis on the sales data. You can also linear regression analysis to try to predict a salesperson’s total yearly sales (the dependent variable) from independent variables such as age, education, and years of experience.

Analyze pricing elasticity

Changes in pricing often impact consumer behavior–and linear regression can help you analyze how. For instance, if the price of a particular product keeps changing, you can use regression analysis to see whether consumption drops as the price increases. What if consumption does not drop significantly as the price increases? At what price point do buyers stop purchasing the product? This information would be very helpful for leaders in a retail business.

Assess risk in an insurance company

Linear regression techniques can be used to analyze risk. For example, an insurance company might have limited resources with which to investigate homeowners’ insurance claims; with linear regression, the company’s team can build a model for estimating claims costs. The analysis could help company leaders make important business decisions about what risks to take.

Sports analysis

Linear regression isn’t always about business. It’s also important in sports. For instance, you might wonder if the number of games won by a basketball team in a season is related to the average number of points the team scores per game.A scatterplot indicates that these variables are linearly related. The number of games won,and the average number of points scored by the opponent, are also linearly related. These variables have a negative relationship. As the number of games won increases, the average number of points scored by the opponent decreases. With linear regression, you can model the relationship of these variables. A good model can be used to predict how many games teams will win.

Linear regression products

IBM SPSS Statistics software

Propel research and analysis with this fast, powerful solution

IBM SPSS Statistics Grad Pack and Faculty Packs

Students, teachers and researchers get affordable access to predictive analytics software

IBM Cognos Statistics

This proven, self-service analytics solution lets you mix and match your data and create compelling visualizations