Share this post:
Consider the problem of predicting the evolution of stock prices over time, as illustrated in Figure 1. This is, in general, impossible to do perfectly. One could imagine, however, that the prices are a reflection of some underlying state of the market that evolves in a fashion that is not directly observable. Clearly, there is also noise entering both the evolution of the hidden state and in the observations. If the relationships between the hidden state and the observations are linear, and the noise is additive, this model is called a linear dynamical system comprising a ‘state equation’ and an ‘observation equation’. While the state equation provides predictions of the hidden state at the next time-step, the observation equation relates the hidden state with the observation at the same time-step. The task, where one is to combine the estimate of the hidden state with observations made at the time of prediction using this model, is known as ‘filtering’.
Figure 1: Daily values of Dow Jones Industrial Average, a stock-market index, starting in 1885, together with predictions using AR(2), the spectral filter of Hazan et al. (NIPS 2017), and the trivial last-value prediction.
For more than half a century, the dominant approach in filtering is based on the work of Rudolf Emil Kálmán. There, one guesses what the system could be, and using observed measurements, possibly corrupted by noise, gauges one’s confidence in the guess, and produces estimates of its outputs based on a combination of the guess and recent observations. Despite Kalman filtering being a staple of undergraduate courses across Engineering and Statistics, there has been little or no understanding as to how to perform or circumvent the initial guess, such that one can guarantee the performance of the system relative to the use of the best possible guess.
Working with Mark Kozdoba and Shie Mannor at the Technion, the Israel Institute of Technology, we have asked what conditions make it possible to model the predictions of Kalman filtering as a regression of a few past observations. We have shown that when one sees the Kalman filter as an infinite sum of autoregressive terms, the dependence on the terms from the past is decaying exponentially, whenever the linear dynamical system is observable and process noise is non-degenerate. Therefore, Kalman filter can be approximated by regression on a few recent observations.
This makes it possible to circumvent the initial guess in Kalman filtering. In particular, our paper presents an on-line algorithm for estimating the output of a linear dynamical system considering only a few most recent observations and the respective autoregressive terms within an on-line convex optimisation framework. The algorithm is imminently practical: its per-update run-time is linear in the number of observations used (the regression depth). The above-mentioned decay results make it possible to prove the first-ever regret bounds relative to Kalman filters, that is, relative to the use of Kalman filtering with the best initial guess in hindsight.
Previous work on the subject, notably that of Hazan, Singh, and Zhang (NIPS 2017), also utilised a framework of regret bounds to provide guarantees on the performance of convexifications, and showed that the sum of errors of the forecast thus produced is close to the sum of errors of the best model from within a certain class in hindsight. Nevertheless, the class of algorithms considered did not include the Kalman filter, the most natural choice. In computational experiments on an instance proposed by Hazan et al., as well as stock market price evolution considered by other authors, our present work reduces the error considerably.
Surprisingly, we also note that having some process noise is essential for the exponential decay. As illustrated in Figure 2, while the error in the forecast increases with the variance of the process noise, the quality of approximation of the best possible Kalman filter by a small number of auto-regressive terms also increases with the variance of the process noise. This is because without process noise, it may happen that the forecast depends on all of the past uniformly, which makes forecasting more difficult.
Figure 2: The ratio of the errors of Kalman filter and its AR(2) approximation on a benchmark used by Hazan et al. (NIPS 2017), indicated by colours as a function of process and observation noise, on the vertical and horizontal axes, resp. Origin is the top-left corner. For details, see the paper.
In summary, our paper presents a forecasting method, which is applicable to arbitrary sequences and comes with a regret bound competing against a class of methods, which includes Kalman filters. The paper, “On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters,” will be presented at the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), and the code is available as open source.