**Published:** 24 May 2024

**Contributors:** Joshua Noble, Eda Kavlakoglu

Autocorrelation is a core tool for analyzing and modeling time series data. It’s widely used in econometrics, signal processing and demand prediction.

Autocorrelation, or serial correlation, analyzes time series data to look for correlations between values at different points in the series. This key method of analysis measures how a value correlates with itself. Instead of calculating the correlation coefficient between two different variables, such as *X1* and *X2*, we calculate the degree of correlation of a variable with itself at different time steps throughout the data set.

One of the primary assumptions when building a linear regression model is that the errors in predicting the dependent variable are independent. With time series data, however, you'll often find errors that are time dependent: the dependency in the errors arises from a temporal component. Error terms that are correlated over time are called autocorrelated errors. These errors cause issues for some of the more common ways of fitting a linear regression, such as ordinary least squares. The way to address them is to regress the dependent variable on itself at the time lags identified by an autocorrelation test. A 'lag' is simply a previous value of the dependent variable. If you have monthly data and want to predict the upcoming month, you might use the values of the previous two months as inputs. This means you are regressing the current value on the previous two lags.
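As a minimal sketch of regressing a series on its own lags, the following NumPy snippet fits a current monthly value on its two previous values with ordinary least squares. The simulated series and variable names here are illustrative assumptions, not data from the article:

```python
import numpy as np

# Hypothetical monthly series; in practice this would be your own data.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=120))  # 10 years of simulated monthly values

# Regress the current value on its two previous lags with OLS.
Y = y[2:]                              # current values
X = np.column_stack([np.ones(len(Y)),  # intercept
                     y[1:-1],          # lag 1 (previous month)
                     y[:-2]])          # lag 2 (two months back)
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
fitted = X @ coef
```

The same idea extends to any number of lags; the autocorrelation tests described below help decide which lags are worth including.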

In the same way that correlation measures a linear relationship between two variables, autocorrelation measures the linear relationship between lagged values of a time series. When data have a trend, the autocorrelations for small lags tend to be large and positive because observations nearby in time are also nearby in value. So the autocorrelation function, often called the ACF, of a trended time series tends to have positive values that slowly decrease as the lags increase.
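To make the definition concrete, here is a sketch of the standard sample ACF estimator, computed by hand on a simulated trended series (the helper name `autocorr` and the data are assumptions for illustration):

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation at lag k: lagged covariance over overall variance."""
    y = np.asarray(y, dtype=float)
    y_dem = y - y.mean()
    if k == 0:
        return 1.0
    return float(np.sum(y_dem[k:] * y_dem[:-k]) / np.sum(y_dem ** 2))

rng = np.random.default_rng(1)
# Trended series: a linear trend plus noise.
trend = 0.5 * np.arange(200) + rng.normal(size=200)

# For trended data, short lags show large positive autocorrelation
# that decays slowly as the lag grows.
short_lag = autocorr(trend, 1)
long_lag = autocorr(trend, 10)
```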

When data have seasonal fluctuations or patterns, the autocorrelations will be larger for the seasonal lags (at multiples of the seasonal period) than for other lags. When data are both trended and seasonal, you see a combination of these effects. Time series that show no autocorrelation are truly random processes and are called white noise. The ACF is a coefficient of correlation between two values in a time series.

There are a few key ways to test for autocorrelation:

You can compute the residuals and plot the residual at time *t*, usually written as *e_{t}*, against *t*. Clusters of residuals on one side of the zero line may indicate where significant autocorrelation exists.

Running a Durbin-Watson test can help identify whether a time series contains autocorrelation. To do this in R, create a linear regression that regresses the dependent variable on time, then pass that model to the Durbin-Watson test. To do this in Python, you can pass the residuals from a fitted linear regression model to the test.
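A minimal Python sketch of this workflow, assuming statsmodels is available and using a simulated trend-plus-noise series in place of real data: fit a regression on time, then pass the residuals to `durbin_watson`.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Simulated data: a linear trend plus independent noise.
rng = np.random.default_rng(2)
t = np.arange(200, dtype=float)
y = 0.3 * t + rng.normal(size=200)

# Regress the dependent variable on time and compute residuals.
X = np.column_stack([np.ones_like(t), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# The statistic ranges from 0 to 4: values near 2 suggest no autocorrelation,
# values toward 0 positive autocorrelation, values toward 4 negative.
dw = durbin_watson(residuals)
```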

Another option is to use the Ljung-Box test and pass the values of the time series directly to it. The Ljung-Box test has the null hypothesis that the residuals are independently distributed and the alternative hypothesis that the residuals are not independently distributed and exhibit autocorrelation. In practice, this means that p-values smaller than 0.05 indicate that autocorrelation exists in the time series. Both Python and R libraries provide methods to run this test.

The most common option is to use a correlogram, a visualization generated from the correlations between specific lags in the time series. The correlogram plots the correlation coefficient at each lag, showing how strongly values separated by that lag correlate; a pattern in the results is an indication of autocorrelation.

Non-random data have at least one significant lag. When the data are not random, it’s a good indication that you need to use a time series analysis or incorporate lags into a regression analysis to model the data appropriately.

There are fundamental features of a time series that can be identified through autocorrelation.

- Stationarity

- Trends

- Seasonality

A stationary time series has statistical properties that are constant over time. This means that statistics such as the mean, variance and autocorrelation don't change over time. Most statistical forecasting methods, including ARMA and ARIMA, are based on the assumption that the time series can be made approximately stationary through one or more transformations. A stationary series is comparatively easy to predict because you can simply predict that the statistical properties will be about the same in the future as they were in the past. Stationarity means that the time series does not have a trend, has a constant variance, a constant autocorrelation pattern, and no seasonal pattern. The ACF declines to near zero rapidly for a stationary time series. In contrast, the ACF drops slowly for a non-stationary time series.

A key feature of time series data is whether a trend is present in the data. For instance, the prices of basic staples in a grocery store over the last 50 years would exhibit a trend because inflation would drive those prices higher. Predicting data that contains trends can be difficult because the trend obscures the other patterns in the data. If the data has a stable trend line to which it reverts consistently, it may be trend-stationary, in which case the trend can be removed by fitting a trend line and subtracting it from the data before fitting a model. If the data isn't trend-stationary, then it might be difference-stationary, in which case the trend can be removed by differencing. The simplest form of differencing is to subtract the previous value from each value, giving a measure of how much change is present in the time series. For instance, if *Y_{t}* is the value of the time series at time *t*, the first difference is *Y_{t} − Y_{t−1}*.
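First differencing is a one-liner with NumPy's `diff`; the sketch below applies it to a simulated random walk, a classic difference-stationary series (the data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
noise = rng.normal(size=200)
random_walk = np.cumsum(noise)   # difference-stationary: a random walk

# First difference: Y'_t = Y_t - Y_{t-1}; np.diff computes exactly this.
diffed = np.diff(random_walk)
# Differencing a random walk recovers the underlying white-noise steps,
# turning a non-stationary series into a stationary one.
```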

Seasonality is when a time series contains seasonal fluctuations or patterns. We would expect ice cream sales to be higher in the summer months and lower in the winter months, while ski sales might reliably spike in late autumn and dip in early summer. Seasonality can occur at different time intervals, such as days, weeks or months. The key for time series analysis is to understand how seasonality affects the series, allowing us to produce better forecasts for the future. When seasonal patterns are present, the ACF values show stronger positive autocorrelation at lags that are multiples of the seasonal frequency than at other lags.
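The seasonal spike in the ACF is easy to see on simulated data. This sketch builds a monthly-style series with a period-12 cycle plus noise and checks the autocorrelation at seasonal and off-season lags (the helper and data are illustrative assumptions):

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation at lag k."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(y[k:] * y[:-k]) / np.sum(y ** 2))

rng = np.random.default_rng(6)
t = np.arange(240)
# 20 "years" of monthly-style data with a period-12 seasonal cycle plus noise.
seasonal = np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=240)

# The ACF spikes at multiples of the seasonal period (12, 24, ...),
# is near zero at a quarter-cycle lag (3), and negative at a half cycle (6).
peak = autocorr(seasonal, 12)
half_cycle = autocorr(seasonal, 6)
off_season = autocorr(seasonal, 3)
```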


The partial autocorrelation function, often called the PACF, is similar to the ACF except that it displays only the correlation between two observations that the shorter lags between those observations do not explain. An ACF plot shows the relationship between *y_{t}* and *y_{t−k}* for different values of the lag *k*, including both direct effects and effects passed through the intermediate lags. The PACF, by contrast, shows the relationship between *y_{t}* and *y_{t−k}* after removing the effects of those shorter lags.

In practice, the ACF helps assess the properties of a time series, while the PACF is more useful during the specification process for an autoregressive model. Data scientists and analysts use partial autocorrelation plots when specifying regression models for time series data, such as autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) models.
