Date: 2024-05-24

Autocorrelation, or serial correlation, measures how strongly the values in a time series are correlated with earlier values of the same series. Instead of calculating the correlation coefficient between two different variables, such as X1 and X2, we calculate the correlation between a variable and lagged copies of itself at different time steps throughout the data set.

When building a linear regression model, one of the primary assumptions is that the errors in predicting the dependent variable are independent. When working with time series data, however, you will often find errors that are time dependent; that is, the dependency in the errors appears because of a temporal component. Error terms that are correlated over time are called autocorrelated errors. They cause problems for common ways of fitting a linear regression, such as ordinary least squares: the coefficient estimates can still be unbiased, but the usual standard errors, confidence intervals, and significance tests become unreliable. One way to address this is to regress the dependent variable on its own past values, using the time lags identified by an autocorrelation test. A 'lag' is simply a previous value of the dependent variable. If you have monthly data and want to predict the upcoming month, you might use the values of the previous two months as inputs. This means that you are regressing the current value on the previous two lags.
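A minimal sketch of the lag regression just described, assuming NumPy is available. The simulated "monthly" series and the helper names `lag_matrix` and `fit_lag_regression` are hypothetical, made up for illustration. Each value y_t is regressed on y_{t-1} and y_{t-2} by ordinary least squares, which amounts to fitting an AR(2) model.

```python
import numpy as np

def lag_matrix(y, p):
    """Return (X, target): X holds an intercept column plus lags 1..p of y,
    target holds the matching current values y_t for t = p..n-1."""
    n = len(y)
    cols = [np.ones(n - p)]                  # intercept
    for k in range(1, p + 1):
        cols.append(y[p - k : n - k])        # y_{t-k}, aligned with y_t
    return np.column_stack(cols), y[p:]

def fit_lag_regression(y, p=2):
    """Regress y_t on y_{t-1}, ..., y_{t-p} by ordinary least squares."""
    X, target = lag_matrix(np.asarray(y, dtype=float), p)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef  # [intercept, phi_1, ..., phi_p]

# Hypothetical series in which each month depends on the previous two months.
rng = np.random.default_rng(0)
n = 600
y = np.zeros(n)
for t in range(2, n):
    y[t] = 5.0 + 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

coef = fit_lag_regression(y, p=2)
print(coef)  # estimates should land near [about 7.1 * 0.7, 0.6, -0.3], i.e. intercept near 5
```

The estimated lag coefficients should come out close to the 0.6 and -0.3 used to generate the data; with real data you would choose the number of lags from the autocorrelation tests discussed below.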

Just as correlation measures the linear relationship between two variables, autocorrelation measures the linear relationship between lagged values of a time series. When data have a trend, the autocorrelations for small lags tend to be large and positive because observations nearby in time are also nearby in value. So the autocorrelation function, often called the ACF, of a trended time series tends to have positive values that slowly decrease as the lags increase.
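To see the trend effect numerically, here is a small sketch, assuming NumPy, that computes the usual sample ACF, r_k = Σ(y_t − ȳ)(y_{t−k} − ȳ) / Σ(y_t − ȳ)², for a simulated linear trend plus noise. `sample_acf` is a hypothetical helper written for this example; statsmodels' `statsmodels.tsa.stattools.acf` offers a library version of the same calculation.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag:
    r_k = sum_{t=k}^{n-1} (y_t - ybar)(y_{t-k} - ybar) / sum_t (y_t - ybar)^2
    """
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[: len(d) - k]) / denom for k in range(max_lag + 1)])

# Hypothetical data: a linear trend plus noise, so nearby observations are close in value.
rng = np.random.default_rng(1)
t = np.arange(200)
trended = 0.5 * t + rng.normal(scale=2.0, size=t.size)

acf = sample_acf(trended, max_lag=10)
print(np.round(acf, 3))  # large positive values that shrink slowly as the lag grows
```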

When data have seasonal fluctuations or patterns, the autocorrelations will be larger at the seasonal lags (multiples of the seasonal period) than at other lags. When data are both trended and seasonal, you see a combination of these effects. A time series that shows no autocorrelation at any nonzero lag is called white noise: its values are uncorrelated with one another, so past values carry no linear information about future ones. The ACF gives, for each lag k, the correlation coefficient between the series and the same series shifted by k time steps.
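The seasonal and white-noise cases can be checked the same way. This sketch assumes NumPy and uses simulated monthly data with a 12-step sine-wave season; the data and the `sample_acf` helper are hypothetical illustrations, not taken from a real data set.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[: len(d) - k]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(2)
t = np.arange(240)                                   # 20 "years" of monthly data
seasonal = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(size=t.size)
white_noise = rng.normal(size=t.size)

acf_seasonal = sample_acf(seasonal, max_lag=24)
acf_noise = sample_acf(white_noise, max_lag=24)

# The seasonal series peaks at lags 12 and 24 (multiples of the period)
# and is strongly negative at lag 6 (half a period).
print(np.round(acf_seasonal[[6, 12, 24]], 2))
# White noise stays close to zero at every nonzero lag.
print(np.round(np.abs(acf_noise[1:]).max(), 2))
```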

There are a few key ways to test for autocorrelation:

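You can fit the model, compute the residuals, and plot each residual at time t, usually written e_t, against t. Runs of consecutive residuals that stay on one side of the zero line suggest positive autocorrelation, while residuals that flip sign at almost every step suggest negative autocorrelation. The plot is an informal check; the tests below quantify whether the autocorrelation is significant.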
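A sketch of this residual check, assuming NumPy. The simulated data and the `count_runs` helper are hypothetical; counting runs of same-sign residuals is only a numeric stand-in for eyeballing the plot, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
t = np.arange(n)

# Hypothetical data: a straight-line trend whose errors follow an AR(1) process,
# e_t = 0.9 * e_{t-1} + noise, i.e. positively autocorrelated errors.
errors = np.zeros(n)
for i in range(1, n):
    errors[i] = 0.9 * errors[i - 1] + rng.normal()
y = 3.0 + 0.2 * t + errors

# Fit y = b0 + b1 * t by least squares (polyfit returns the slope first).
b1, b0 = np.polyfit(t, y, deg=1)
resid = y - (b0 + b1 * t)

def count_runs(e):
    """Number of runs of consecutive residuals with the same sign."""
    signs = np.sign(e)
    return 1 + int(np.sum(signs[1:] != signs[:-1]))

# Independent residuals switch sides of zero about every other step
# (roughly n / 2 runs); autocorrelated ones form long one-sided clusters.
print(count_runs(resid), "runs out of", n, "residuals")
```

To draw the plot itself, pass `t` and `resid` to a plotting library, for example matplotlib's `plt.plot(t, resid)` together with `plt.axhline(0)` for the zero line.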

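Running a Durbin-Watson test can help identify whether the residuals of a regression contain first-order autocorrelation. The statistic ranges from 0 to 4: values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. To do this in R, fit a linear regression of the dependent variable on time and then pass that model to a function that calculates the Durbin-Watson statistic. To do this in Python, you can pass the residuals from a fitted linear regression model to the test.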
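For reference, statsmodels exposes the statistic as `statsmodels.stats.stattools.durbin_watson(resid)`, and in R `lmtest::dwtest()` accepts a fitted `lm` model. The hand-rolled sketch below, assuming only NumPy, shows what is being computed: d = Σ(e_t − e_{t−1})² / Σe_t². The helper names and simulated data are hypothetical.

```python
import numpy as np

def durbin_watson_stat(resid):
    """d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2"""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def residuals_on_time(y):
    """Regress y on time t = 0..n-1 and return the residuals."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    return y - (intercept + slope * t)

rng = np.random.default_rng(4)
n = 200
t = np.arange(n)

# Hypothetical errors: independent versus AR(1) with coefficient 0.8.
iid_errors = rng.normal(size=n)
ar_errors = np.zeros(n)
for i in range(1, n):
    ar_errors[i] = 0.8 * ar_errors[i - 1] + rng.normal()

d_iid = durbin_watson_stat(residuals_on_time(1.0 + 0.5 * t + iid_errors))
d_ar = durbin_watson_stat(residuals_on_time(1.0 + 0.5 * t + ar_errors))
print(round(d_iid, 2), round(d_ar, 2))  # near 2 versus well below 2
```

A useful rule of thumb is d ≈ 2(1 − r_1), where r_1 is the lag-1 autocorrelation of the residuals, which is why strongly positively autocorrelated errors push d toward 0.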

Another option is to use a Ljung-Box test and pass the values of the time series (or the residuals of a model) directly to the test. The Ljung-Box test has the null hypothesis that the values are independently distributed and the alternative hypothesis that they are not independently distributed and exhibit autocorrelation. In practice, this means that a p-value smaller than 0.05 indicates, at the 5% significance level, that autocorrelation exists in the time series. Both Python and R libraries provide methods to run this test.
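In Python the library route is `statsmodels.stats.diagnostic.acorr_ljungbox`, and in R it is `Box.test(x, type = "Ljung-Box")` from the base stats package. To show what those functions compute, here is a hand-rolled sketch assuming NumPy and SciPy: Q = n(n + 2) Σ_{k=1}^{h} r_k² / (n − k), compared against a chi-squared distribution with h degrees of freedom. The choice h = 10, the `ljung_box` helper, and the simulated series are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(y, h=10):
    """Ljung-Box Q statistic over lags 1..h and its chi-squared p-value (df = h)."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    n = len(d)
    denom = np.sum(d * d)
    r = np.array([np.sum(d[k:] * d[: n - k]) / denom for k in range(1, h + 1)])
    q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, h + 1)))
    return q, chi2.sf(q, df=h)

rng = np.random.default_rng(5)
n = 300
noise = rng.normal(size=n)
ar = np.zeros(n)
for i in range(1, n):
    ar[i] = 0.7 * ar[i - 1] + noise[i]   # hypothetical AR(1) series

q_noise, p_noise = ljung_box(noise)
q_ar, p_ar = ljung_box(ar)
print(f"white noise: Q={q_noise:.1f}, p={p_noise:.3f}")
print(f"AR(1):       Q={q_ar:.1f}, p={p_ar:.2e}")  # p far below 0.05
```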

The most common option is a correlogram, a plot of the autocorrelation coefficient at each lag of the time series. Bars that extend beyond the confidence bounds, commonly drawn at roughly ±1.96/√n for a series of length n, indicate significant autocorrelation at those lags, and a clear pattern across lags (a slow decay or regular seasonal spikes) is further evidence of autocorrelation. An example plot is shown below: