Date: 2024-10-04

Forecasting is an important task in time series analysis because it lets a data scientist identify patterns in historical data with machine learning and then generate predictions about the future. Deep learning for forecasting is an active topic in artificial intelligence that is beginning to show promise against benchmarks set by more traditional statistical methods such as ARIMA. Foundation models for time series data resemble other forms of generative AI: they are trained on large-scale time series datasets and can output either deterministic or probabilistic forecasts. A time series foundation model can create forecasts for a dataset it was never trained on, similar to how a large language model (LLM) can output text for a task it was not specifically trained on.

Several foundation models have been built for time series forecasting, such as Moirai, TimeGPT-1 and TimesFM, though they differ in how they handle uncertainty and some primarily produce point (deterministic) forecasts. Lag-Llama is a general-purpose, open source foundation model for probabilistic time series forecasting on univariate datasets, and it uses a transformer architecture. The paper introducing it is titled Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting, by Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, and others.

To unpack this a little, a probabilistic forecast is one that generates a probability distribution of values for each forecast step rather than just a single value. This is helpful because it indicates how certain the model is about its predictions. A wide distribution indicates low certainty, while a narrow range of values indicates that the model is fairly certain. Purely deterministic forecasts give a single value with no measure of uncertainty, which is a problem when we need to decide how much confidence to place in a forecast.
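As a rough illustration of the idea, the sketch below treats a probabilistic forecast for one time step as a set of sampled values and reduces it to a median and an 80% interval. Everything here is an assumption made for the example: the helper name `summarize_samples`, the Gaussian samples and the chosen quantiles are hypothetical and are not Lag-Llama's output format.

```python
import random
import statistics


def summarize_samples(samples, lower=0.1, upper=0.9):
    """Reduce sampled forecast values for one time step to a median and an interval."""
    ordered = sorted(samples)

    def quantile(q):
        # Simple index-based quantile; adequate for illustration only.
        idx = round(q * (len(ordered) - 1))
        return ordered[idx]

    lo, hi = quantile(lower), quantile(upper)
    return {
        "median": statistics.median(ordered),
        "lower": lo,
        "upper": hi,
        "width": hi - lo,  # a wider interval means less certainty
    }


# Hypothetical samples for two forecast steps with the same center
# but different spread (synthetic data, not model output).
rng = random.Random(0)
confident_step = [rng.gauss(20.0, 1.0) for _ in range(1000)]
uncertain_step = [rng.gauss(20.0, 5.0) for _ in range(1000)]

confident = summarize_samples(confident_step)
uncertain = summarize_samples(uncertain_step)

print("confident step:", confident)
print("uncertain step:", uncertain)
```

Both steps have roughly the same median, so a deterministic forecast would report nearly identical values for them. Only the interval width shows that the second prediction is much less certain.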

A univariate forecast is one that doesn't use external covariates. In the Granite Tiny Time Mixer tutorial, we used the air temperature, wind speed and wind direction as covariate terms to forecast air pollution readings. In a univariate time series, only one variable is observed over time. Lag-Llama uses lag features, which are previous readings from the same time series, as covariates. In this way, it is conceptually similar to ARIMA models, which also predict a value from earlier values of the series.
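To make the idea of lag features concrete, here is a minimal sketch that turns a univariate series into rows of lagged values paired with the value to predict. The function name `make_lag_features`, the sample readings and the lag indices 1, 2 and 3 are illustrative assumptions; they are not the lag set or the preprocessing that Lag-Llama itself uses.

```python
def make_lag_features(series, lags):
    """Build (features, targets) from a univariate series.

    For each time index t with enough history, the feature row holds
    series[t - lag] for every lag, and the target is series[t].
    """
    max_lag = max(lags)
    features, targets = [], []
    for t in range(max_lag, len(series)):
        features.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return features, targets


# Hypothetical readings from a single variable.
readings = [10, 12, 13, 15, 14, 16, 18]
features, targets = make_lag_features(readings, lags=[1, 2, 3])

for row, target in zip(features, targets):
    print(f"lags {row} -> target {target}")
```

The first three readings produce no rows because they lack the full history that the largest lag requires. No second variable is involved: every feature is an earlier value of the same series.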

In this tutorial, we'll use the Lag-Llama model and see how it performs on two forecasting tasks. The first is zero-shot forecasting, in which the model is not trained on the data it's trying to predict. It's an interesting test of how well the model can detect and respond to the patterns present in the time series. After that, we'll fine-tune the model to see whether there's a performance boost. On the free tier of watsonx™, fine-tuning might take up to an hour, but the model can be saved after that step for later use. The paid tier of watsonx provides access to a GPU that decreases the fine-tuning time to around 10 minutes. With that in mind, let's get started.