3 October 2024
Forecasting is an important task in time series analysis because it allows a data scientist to identify patterns with machine learning and then generate predictions about the future. Deep learning for forecasting is an exciting topic in artificial intelligence that is beginning to show promise when compared to benchmarks from more traditional statistical methods such as ARIMA. Like other forms of generative AI, foundation models for time series are trained on large-scale time series datasets and can output either deterministic or probabilistic forecasts. A time series foundation model can create forecasts without task-specific training, similar to how a large language model (LLM) can output text without being trained for one specific task.
Foundation models have been built for time series forecasting, such as Moirai, TimeGPT-1 and TimesFM, but these are all deterministic models. Lag-Llama is a general-purpose, open source foundation model for probabilistic time series forecasting on univariate datasets that uses a transformer architecture. The paper introducing it is titled "Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting" by Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia and others.
To unpack this a little, a probabilistic forecast generates a probability distribution of values for each forecast step rather than just a single value. This approach indicates how certain the model is about its predictions: a wide distribution signals low certainty, while a narrower range of values signals that the model is fairly confident. Purely deterministic forecasts carry no such measure of uncertainty, which can be problematic when we need to know how much confidence to place in them.
A univariate forecast is one that doesn't use covariates. In the Granite Tiny Time Mixer tutorial, we used air temperature, wind speed and wind direction as covariates to forecast air pollution readings. A univariate time series, by contrast, contains only one variable changing over time. Lag-Llama uses lag features, which are previous readings from the time series itself, as covariates. In this way, it is conceptually similar to ARIMA models.
In this tutorial, we'll use the Lag-Llama model and see how it performs on two different forecasting tasks. First, we'll try zero-shot forecasting, where the model is not trained on the data it's trying to predict. It's an interesting test of how well our model can detect and respond to the patterns present in the time series. After that, we'll fine-tune the model to see whether there's a performance boost. On the free tier of watsonx™, fine-tuning might take up to an hour, but the model can be saved after that step for later use. The paid tier of watsonx provides access to a GPU, which decreases the fine-tuning time to around 10 minutes. With that in mind, let's get started.
In this step, we’ll guide you through creating an IBM account to access Jupyter Notebooks.
1. Log in to watsonx.ai™ using your IBM® Cloud® account.
2. Click + to create a new project.
a. Select Create an empty project.
b. Enter a project name in the Name field.
c. Create a Cloud Object Storage instance for storing your project assets, if one doesn't already exist.
d. Select Create.
3a. Create a Jupyter Notebook.
a. Select the Assets tab in your project environment.
b. Click New asset.
c. Select the Working with models option in the left panel.
d. Click Working with data and models using Python and R notebooks.
e. Enter a name for your notebook in the Name field. Choose Runtime 23.1 on Python (2 vCPU 8 GB RAM) to define the configuration.
f. Select Create.
3b. Upload a Jupyter Notebook.
a. Select the Assets tab in your project environment.
b. Click New asset.
c. Select the Working with models option in the left panel.
d. Click Working with data and models using Python and R notebooks.
e. Click Local File in the left-hand tab.
f. Download the notebook from GitHub.
g. Enter a name for your notebook in the Name field. Choose Runtime 23.1 on Python (2 vCPU 8 GB RAM) to define the configuration.
h. Select Create.
To use the Lag-Llama model, we'll clone the open source GitHub repository and install its dependencies on our watsonx.ai instance.
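Here's a minimal sketch of those commands, assuming a notebook environment where shell commands can be run with the ! and % prefixes; the repository URL is the one published by the Lag-Llama team:

```python
# Clone the Lag-Llama repository and install its dependencies
!git clone https://github.com/time-series-foundation-models/lag-llama.git
%cd lag-llama
!pip install -r requirements.txt --quiet
```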
Next, we need to download the pretrained model weights from the Hugging Face repository where they're stored. To do this, we use the Hugging Face command-line interface (CLI) to download the trained Lag-Llama checkpoint:
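For example, the following downloads the checkpoint file into the current working directory (the repository ID matches the checkpoint published by the Lag-Llama team):

```python
# Download the pretrained checkpoint from the Hugging Face Hub
!huggingface-cli download time-series-foundation-models/Lag-Llama lag-llama.ckpt --local-dir .
```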
Now we have the weights for a pretrained model that we can use for zero-shot forecasting and fine-tuning.
Next, we need to import the libraries required to work with Lag-Llama. The library that the Lag-Llama team built is based on GluonTS, a PyTorch-based library for working with time series data and forecasting models.
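The imports might look something like the following; the lag_llama module path comes from the repository we just cloned:

```python
from itertools import islice

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
import torch
from gluonts.dataset.pandas import PandasDataset
from gluonts.evaluation import Evaluator, make_evaluation_predictions

from lag_llama.gluon.estimator import LagLlamaEstimator
```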
Next, we load the data from the GitHub repository. This time series contains the daily minimum temperatures in Melbourne, Australia. Because the dataset is univariate, each row consists simply of a date and a temperature reading in Celsius.
The series does have two missing dates, which we fill in by resampling the dataset to a daily frequency and interpolating between values so that there are no gaps in the time series.
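As a sketch, assuming a commonly used public mirror of this dataset (substitute the path from the tutorial's repository if you're following along there):

```python
# Load the daily minimum temperatures series (date, temperature in Celsius)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"
df = pd.read_csv(url, parse_dates=["Date"], index_col="Date")

# Resample to a strict daily frequency, then interpolate the missing dates
df = df.resample("D").mean().interpolate(method="linear")
```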
With a complete time series, we can split our pandas DataFrame into train, validation and test datasets.
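One way to do this, keeping the splits chronological (the exact boundaries below are illustrative):

```python
# Train on the earliest years, validate on 1989, test on 1990
train_df = df[:"1988-12-31"]
val_df = df["1989-01-01":"1989-12-31"]
test_df = df["1990-01-01":]

# Wrap the training and validation splits as GluonTS datasets;
# we keep test_df as a DataFrame for building evaluation windows later
train_dataset = PandasDataset(train_df, target="Temp", freq="D")
val_dataset = PandasDataset(val_df, target="Temp", freq="D")
```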
Now we're ready to make predictions with our dataset.
We'll create some configuration settings to use with our model. The prediction_length is the number of time steps each prediction should contain; because our data is daily, we'll predict the next week of temperatures in each forecast. The context_length sets how far back into the past the model should look for lagged correlations. We don't want this window to be too wide or too narrow, and the optimal value will differ for each dataset.
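For instance (the context_length and num_samples values here are illustrative choices, not fixed requirements):

```python
prediction_length = 7  # forecast one week of daily readings
context_length = 32    # how many past days the model sees; tune per dataset
num_samples = 100      # samples drawn per timestep for the probabilistic forecast
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```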
Now we create the forecaster. This consists of two key steps. First, we create a LagLlamaEstimator that uses the architecture parameters copied from the downloaded Lag-Llama checkpoint. Second, we create a predictor using the create_predictor() method of the estimator. This allows us to pass a context_length-sized window of data to the predictor and get forecasts back.
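A sketch of both steps, following the pattern used in the Lag-Llama demo notebook:

```python
# Read the checkpoint to recover the architecture hyperparameters it was trained with
ckpt = torch.load("lag-llama.ckpt", map_location=device)
estimator_args = ckpt["hyper_parameters"]["model_kwargs"]

estimator = LagLlamaEstimator(
    ckpt_path="lag-llama.ckpt",
    prediction_length=prediction_length,
    context_length=context_length,
    # Architecture parameters copied from the pretrained checkpoint:
    input_size=estimator_args["input_size"],
    n_layer=estimator_args["n_layer"],
    n_embd_per_head=estimator_args["n_embd_per_head"],
    n_head=estimator_args["n_head"],
    scaling=estimator_args["scaling"],
    time_feat=estimator_args["time_feat"],
    num_parallel_samples=num_samples,
    device=device,
)

# Build a predictor that takes a context_length-sized window and returns forecasts
lightning_module = estimator.create_lightning_module()
transformation = estimator.create_transformation()
predictor = estimator.create_predictor(transformation, lightning_module)
```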
Now we're ready to create forecasts for one week out.
In this step, we'll ask the model to create 9 forecasts starting at 9 different months in the test dataset. We can use the make_evaluation_predictions function from the gluonts.evaluation library to generate our forecasts.
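One way to set this up is to build 9 windows of the series, each ending about a month apart in the test year, so that make_evaluation_predictions holds out and forecasts the final week of each window. The window dates below are chosen to match the forecast periods shown later:

```python
# Nine evaluation windows, each ending 30 days after the previous one,
# so every window yields one week-long forecast
window_ends = pd.date_range("1990-03-02", periods=9, freq="30D")
eval_dataset = PandasDataset(
    {str(end.date()): df[:end] for end in window_ends},
    target="Temp",
    freq="D",
)

# Each forecast covers the last prediction_length days of its window
forecast_it, ts_it = make_evaluation_predictions(
    dataset=eval_dataset,
    predictor=predictor,
    num_samples=num_samples,
)
forecasts = list(forecast_it)
tss = list(ts_it)
```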
To evaluate the forecasts, we use an Evaluator object, also from the gluonts.evaluation library. It generates a range of accuracy statistics for our predictions; we'll use the mean absolute percentage error (MAPE).
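A minimal evaluation sketch; note that GluonTS reports MAPE as a fraction rather than a percentage:

```python
# Compare each forecast against the held-out values
evaluator = Evaluator()
agg_metrics, ts_metrics = evaluator(iter(tss), iter(forecasts))

# agg_metrics holds aggregate statistics; ts_metrics holds per-window metrics
print(agg_metrics["MAPE"])
```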
Once we have the evaluations for each prediction, we can graph each prediction:
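A sketch of the plotting loop, in the style of the Lag-Llama demo notebook (the grid layout and formatting choices here are illustrative):

```python
# Plot the nine forecasts in a 3x3 grid: observed data in blue, forecast in green
plt.figure(figsize=(20, 15))
date_formatter = mdates.DateFormatter("%b %d")

for idx, (forecast, ts) in enumerate(zip(forecasts, tss)):
    ax = plt.subplot(3, 3, idx + 1)
    # Show the four weeks of observed data leading up to each forecast
    plt.plot(ts[-4 * prediction_length :].to_timestamp(), label="target")
    forecast.plot(color="g")  # mean plus 50% and 90% prediction intervals
    plt.xticks(rotation=60)
    ax.xaxis.set_major_formatter(date_formatter)
    ax.set_title(forecast.item_id)

plt.gcf().tight_layout()
plt.legend()
plt.show()
```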
The generated chart shows each of our 9 zero-shot forecasts (in green) and the time series data that leads up to each one (the blue line). For each forecast, the green line is the mean forecast, the dark green band marks the 50% prediction interval and the lighter green band marks the 90% prediction interval. This is the advantage of a probabilistic model: it shows us how certain it is at each step in the forecast.
We can see that some of the predictions are quite accurate, such as the 5.13% error for the period beginning 1990-02-24, while others are much less so, such as the 62.33% error for the period beginning 1990-08-23.
Now we'll fine-tune our model using the training and validation splits that we set aside previously. The Lag-Llama authors offer recommendations for fine-tuning:
We recommend tuning two important hyperparameters for each dataset that you finetune on: the context length (suggested values: 32, 64, 128, 256, 512, 1024) and the learning rate (suggested values: 0.01, 0.05, 0.001, 0.005, 0.0001, 0.0005). We also highly recommend using a validation split of your dataset to early stop your model, with an early stopping patience of 50 epochs.
For our dataset, a context_length of 64 and a learning rate of 0.0005 work well enough, though other values might perform better.
The first step is to create a new LagLlamaEstimator:
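A sketch of that estimator, reusing the architecture parameters read from the checkpoint earlier (the max_epochs value is illustrative; see the early-stopping advice above):

```python
finetune_estimator = LagLlamaEstimator(
    ckpt_path="lag-llama.ckpt",
    prediction_length=prediction_length,
    context_length=64,  # chosen from the authors' suggested values
    lr=0.0005,          # chosen from the authors' suggested values
    # Architecture parameters copied from the pretrained checkpoint, as before:
    input_size=estimator_args["input_size"],
    n_layer=estimator_args["n_layer"],
    n_embd_per_head=estimator_args["n_embd_per_head"],
    n_head=estimator_args["n_head"],
    scaling=estimator_args["scaling"],
    time_feat=estimator_args["time_feat"],
    num_parallel_samples=num_samples,
    device=device,
    trainer_kwargs={"max_epochs": 50},
)
```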
Now we can retrain using the new estimator:
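For example:

```python
# Fine-tune on the training split, monitoring loss on the validation split
finetuned_predictor = finetune_estimator.train(
    train_dataset,
    validation_data=val_dataset,
    cache_data=True,
)
```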
This fine-tuning step can take up to an hour on a CPU; with access to a GPU, it should complete in about 10 minutes. Once fine-tuning is complete, we'll have a new model for generating forecasts.
Optional step:
If you have storage associated with your watsonx instance, you can upload the fine-tuned model there. If an access token doesn't already exist, you can create one on the Manage tab of the Projects page by going to Access Control. Then, copy the token string into the token field of the access_project_or_space method.
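Here's a minimal sketch using the ibm-watson-studio-lib helper that watsonx.ai notebooks provide; the token string and the checkpoint path are placeholders (PyTorch Lightning writes checkpoints under lightning_logs/ by default, so adjust the path to your actual file):

```python
from ibm_watson_studio_lib import access_project_or_space

# Authenticate with the access token created under Manage > Access Control
wslib = access_project_or_space({"token": "YOUR_ACCESS_TOKEN"})

# Save the fine-tuned checkpoint as a project asset (paths are placeholders)
with open("lightning_logs/version_0/checkpoints/finetuned.ckpt", "rb") as f:
    wslib.save_data("lag-llama-finetuned.ckpt", f.read(), overwrite=True)
```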
We'll use the same approach to forecasting with our fine-tuned model as we did with the zero-shot estimator, creating forecasts for one week out at 9 different dates in our test dataset.
Now we can reuse the same evaluation approach from the zero-shot step, this time with the fine-tuned predictor:
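```python
# Forecast the same nine windows, now with the fine-tuned predictor
forecast_it, ts_it = make_evaluation_predictions(
    dataset=eval_dataset,
    predictor=finetuned_predictor,
    num_samples=num_samples,
)
finetuned_forecasts = list(forecast_it)
finetuned_tss = list(ts_it)
```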
The same graphing approach lets us compare the accuracy of the forecasts:
This returns the following graph, where we can see how the fine-tuned model performs on the same weeks:
We can see that some of the predictions are still less accurate than we might want, for instance a ~35% error for the week beginning 1990-06-24. Overall, though, the least accurate forecasts from the fine-tuned model are better than those from the zero-shot model.
Now let's see where fine-tuning has improved the performance of our model. We'll subtract the fine-tuned forecast's mean absolute percentage error (MAPE) from the zero-shot MAPE to get the improvement from fine-tuning:
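A sketch of that comparison, assuming ts_metrics still holds the per-window metrics from the zero-shot evaluation (GluonTS MAPE values are fractions, so we multiply by 100 to get percentages):

```python
# Per-window MAPE for the fine-tuned model
finetuned_agg, finetuned_metrics = Evaluator()(
    iter(finetuned_tss), iter(finetuned_forecasts)
)

# Improvement = zero-shot MAPE minus fine-tuned MAPE, in percentage points
comparison = pd.DataFrame({
    "Date": finetuned_metrics["forecast_start"],
    "Finetuned MAPE": finetuned_metrics["MAPE"] * 100,
    "Zeroshot MAPE": ts_metrics["MAPE"] * 100,
})
comparison["Finetuning Improvement"] = (
    comparison["Zeroshot MAPE"] - comparison["Finetuned MAPE"]
)
print(comparison.to_string(index=False))
```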
This will print out the following:
Date          Finetuned MAPE    Zeroshot MAPE    Finetuning Improvement
1990-02-24         14.074690         5.132133                 -8.942557
1990-03-26         12.999803        23.437066                 10.437263
1990-04-25         15.111790        18.150030                  3.038239
1990-05-25         33.572088        34.048983                  0.476895
1990-06-24         34.988431        60.099336                 25.110906
1990-07-24         18.796240        18.077844                 -0.718396
1990-08-23         33.861474        62.327957                 28.466484
1990-09-22         29.453261        51.933639                 22.480379
1990-10-22         18.961399        17.984154                 -0.977245
We can see that in some forecasts the zero-shot forecast does slightly better, but in most of the forecasts the fine-tuned model performs roughly the same or significantly better.
In this tutorial, you learned about foundation models for forecasting, an area of research in artificial intelligence with important applications in healthcare, environmental monitoring, anomaly detection and many other fields and tasks. You used Lag-Llama, a foundation model trained on time series data and built specifically for forecasting. You used the base model to perform zero-shot forecasting on a daily temperature dataset. You then fine-tuned Lag-Llama with training data, generated forecasts from the fine-tuned model and compared its performance to the zero-shot model.
You can learn more about Lag-Llama at the Lag-Llama GitHub repository.