6 June 2025
In this tutorial, we will explore time series forecasting using an IBM Granite® Time Series Foundation Model (TSFM) to predict retail sales. In addition to forecasting, we will cover key machine learning techniques such as few-shot fine-tuning, an efficient strategy that uses only a portion of the designated training data. The sales data for this use case comes from the M5 datasets in the M-Competitions repository, provided as part of the long-running M-competition series that encourages research and development of forecasting methods. The aim of this tutorial is to forecast future sales aggregated by state, while showcasing how to use a pretrained TSFM for multivariate forecasting. In our analysis, we'll also use features available within the open source Granite Time Series Foundation Models toolkit, granite-tsfm.
In conjunction with the pretrained model, we'll use the granite-tsfm library, which provides utilities for working with Time Series Foundation Models (TSFMs). Here we retrieve and install the latest version of the Python library. The currently supported Python versions are 3.9, 3.10, 3.11 and 3.12.
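A minimal install step might look like the following sketch, assuming the package is published on PyPI under the name granite-tsfm (it can also be installed directly from its GitHub repository):

# Install the granite-tsfm toolkit from a notebook cell.
# The package name is an assumption; follow the official installation
# instructions if your environment differs.
%pip install granite-tsfm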
In addition to standard packages for data science, we'll use functionality from the tsfm_public Python module that the granite-tsfm library provides.
To subset our data between training and test data, we'll use utility functions provided by the toolkit.
For fine-tuning, the toolkit uses a custom dataset type called ForecastDFDataset.
To interact with the model, we'll use the TinyTimeMixerForPrediction model class.
In addition to the model class, we'll import the TimeSeriesForecastingPipeline for generating forecasts and the plot_predictions utility for visualizing them.
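Putting these pieces together, the imports might look like the following sketch. The module paths are assumptions based on the public granite-tsfm examples and should be verified against your installed version:

# Core data science packages
import pandas as pd
import numpy as np

# Hugging Face Trainer API, used later for fine-tuning
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

# granite-tsfm (tsfm_public) utilities -- import paths assumed from the
# public examples; adjust if your version exposes them differently.
from tsfm_public import (
    ForecastDFDataset,
    TimeSeriesForecastingPipeline,
    TimeSeriesPreprocessor,
    TinyTimeMixerForPrediction,
)
from tsfm_public.toolkit.visualization import plot_predictions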
Granite Time Series Models are sized with defined context lengths and forecast lengths. The context length defines how many historical data points the model looks back over when making predictions. The forecast length specifies how many data points in the future to predict. Here, we define both values as variables so they can be reused throughout the notebook.
Next, we declare the Granite Time Series Foundation Model, including the specific revision that we are targeting. In this time series analysis example, we will be working with daily data, so we choose a model suited to that resolution: 90 days of historical data to forecast the next 30 days. If a different data frequency or forecast horizon is needed, a different model revision should be selected.
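As a sketch, defining the window lengths and pointing to the model might look like this. The repository name follows the published ibm-granite/granite-timeseries-ttm-r2 naming; the revision string is a placeholder to be replaced with the daily 90/30 revision listed on the model card:

# Context and forecast lengths for a daily-resolution model
context_length = 90    # days of history the model looks back over
forecast_length = 30   # days ahead the model predicts

# Model repository and revision (the revision string is a placeholder --
# substitute the daily 90/30 revision from the model card).
TTM_MODEL_PATH = "ibm-granite/granite-timeseries-ttm-r2"
TTM_MODEL_REVISION = "<daily-90-30-revision>"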
As noted earlier, this notebook uses the M5 datasets from the official M-Competitions repository. You can read more about the competition and the dataset here.
Following initial data analysis, we observed that the original time series data includes hierarchy and product information. To prepare the data for this forecasting experiment, we will aggregate the sales by state into three separate time series. We'll use standard pandas operations to perform this aggregation.
Let's make sure we have access to the dataset files before proceeding.
Following a typical data science workflow, we parse the resulting CSV file into a pandas DataFrame.
The next step for our time series analysis is to organize the columns in our sales data with the naming conventions required for input to our model.
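For illustration, the aggregation and renaming might be done with pandas as in the following sketch. The file names follow the public M5 release (sales_train_evaluation.csv and calendar.csv); the paths and the final column names (date, sales_CA, sales_TX, sales_WI) are choices made here, not the tutorial's exact code:

# Sketch of aggregating M5 unit sales by state.
sales = pd.read_csv("m5/sales_train_evaluation.csv")
calendar = pd.read_csv("m5/calendar.csv")

# The sales file is wide: one row per item, one column per day (d_1, d_2, ...).
day_cols = [c for c in sales.columns if c.startswith("d_")]

# Sum unit sales over all items within each state, then transpose so that
# each state becomes one column of a daily time series.
by_state = sales.groupby("state_id")[day_cols].sum().T
by_state.index.name = "d"
by_state = by_state.reset_index()

# Attach real calendar dates and rename the state columns.
df = by_state.merge(calendar[["d", "date"]], on="d")
df["date"] = pd.to_datetime(df["date"])
df = df.drop(columns="d").rename(
    columns={"CA": "sales_CA", "TX": "sales_TX", "WI": "sales_WI"}
)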
In preparation for creating a TimeSeriesPreprocessor, we identify the role that each column plays in the dataset.
By specifying a list of target columns and a list of exogenous (control) columns, we tell the model which series to forecast and which series provide additional context.
These column designations are important for the fine-tuning workflow, which permits multivariate forecasting by using the exogenous variables.
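For illustration, the column roles might be declared as follows. The target names match the aggregated series above; the exogenous column names are placeholders for whichever calendar features (for example, SNAP flags) are carried through from the M5 data:

# Column roles (names are illustrative and should match your prepared frame).
timestamp_column = "date"
id_columns = []  # one aggregated series per state, so no id columns are needed
target_columns = ["sales_CA", "sales_TX", "sales_WI"]

# Exogenous columns known in advance (placeholders, e.g. M5 SNAP flags).
control_columns = ["snap_CA", "snap_TX", "snap_WI"]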
Next, we set up a TimeSeriesPreprocessor, passing the column designations along with the context and prediction lengths.
The preprocessor also standardizes (scales) each series, which helps the model train effectively. We train the scaling algorithm using only the training portion of the data, so that no information from the test period leaks into preprocessing.
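A sketch of the preprocessor setup follows, using parameter names as exposed by TimeSeriesPreprocessor in the public granite-tsfm examples (treat the exact arguments as assumptions to verify against your installed version). Here train_df refers to the training split and df to the full prepared frame:

# Configure the preprocessor with the column roles and window sizes.
tsp = TimeSeriesPreprocessor(
    timestamp_column=timestamp_column,
    id_columns=id_columns,
    target_columns=target_columns,
    control_columns=control_columns,
    context_length=context_length,
    prediction_length=forecast_length,
    scaling=True,  # standardize each series
)

# Fit the scalers on the training portion only to avoid test-set leakage,
# then apply the learned scaling to the full frame.
tsp.train(train_df)
scaled_df = tsp.preprocess(df)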
For fine-tuning, we use the same data splits we defined earlier, but now we can include the extra columns as multivariate input.
We split the time series data into training, validation and test sets by using the split boundaries defined earlier.
Next, we construct three datasets of the custom ForecastDFDataset type, one for each of the data splits.
In comparison to a plain pandas DataFrame, a ForecastDFDataset is a torch-style dataset that yields windowed samples, pairing each context window of past values with the future values (and exogenous features) that the model should learn to predict.
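As a sketch, the three datasets might be constructed as follows, assuming train_df, valid_df and test_df are the data splits; the constructor arguments mirror the column roles and window sizes defined above and should be checked against your granite-tsfm version:

def make_dataset(split_df):
    # Each ForecastDFDataset yields (past window, future window) samples.
    return ForecastDFDataset(
        split_df,
        timestamp_column=timestamp_column,
        id_columns=id_columns,
        target_columns=target_columns,
        control_columns=control_columns,
        context_length=context_length,
        prediction_length=forecast_length,
    )

train_dataset = make_dataset(tsp.preprocess(train_df))
valid_dataset = make_dataset(tsp.preprocess(valid_df))
test_dataset = make_dataset(tsp.preprocess(test_df))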
Before beginning fine-tuning, we further subset the training and validation datasets to implement a few-shot fine-tuning strategy, which is more efficient than fine-tuning on the full training set. Here, we sample the torch datasets produced in the preceding steps, reducing each to 20% of its original length.
(2601, 520, 1398, 279)
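One simple way to implement this subsetting is sketched below using torch.utils.data.Subset; the helper name, the 20% fraction and the random seed are our own choices here:

import numpy as np
from torch.utils.data import Subset

def fewshot_subset(dataset, fraction=0.20, seed=42):
    # Randomly keep `fraction` of the windows in the dataset.
    rng = np.random.default_rng(seed)
    n_keep = int(len(dataset) * fraction)
    indices = rng.choice(len(dataset), size=n_keep, replace=False)
    return Subset(dataset, indices.tolist())

fewshot_train_dataset = fewshot_subset(train_dataset)
fewshot_valid_dataset = fewshot_subset(valid_dataset)

# Dataset lengths before and after subsetting (as shown above):
len(train_dataset), len(fewshot_train_dataset), len(valid_dataset), len(fewshot_valid_dataset)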
Now we will focus on fine-tuning the pretrained model. The TinyTimeMixer architecture allows for fast fine-tuning, even on a CPU. In the following image from the IBM Research paper, the workflow for fine-tuning is illustrated in part (a).
In part (a) of the diagram, the TTM architecture is depicted with four main components:
1. The TTM backbone
2. The TTM decoder
3. The forecast head
4. The exogenous mixer (optional)
The TTM decoder (2) and the forecast head (3) make up the TTM head. This head is typically retrained during fine-tuning, which is more efficient than fine-tuning the larger backbone component.
In contrast to the fine-tuning workflow, the pretraining workflow (also shown in part (a)) doesn't permit multivariate input, so it cannot leverage exogenous variables in forecasting. The model is initially pretrained with univariate input, without considering interactions between variables, as indicated by the channel-independent methods in both the backbone and decoder components for the pretraining workflow.
In the fine-tuning workflow, multivariate input is permitted and leveraged for its potential interactions with the target variable, by enabling channel mixing in the decoder and the optional exogenous mixer component.
You can read more about the steps for fine-tuning in the Granite docs.
First, we use the from_pretrained method of TinyTimeMixerForPrediction to load the pretrained weights, configuring the model for our channel layout.
To make these columns available to the model, we pass the indices of the prediction (target) channels and of the exogenous channels when loading it.
Note that we also enable channel mixing in the decoder by setting decoder_mode to "mix_channel".
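A sketch of the model loading step follows. The argument names are taken from the TinyTimeMixer configuration as used in the public granite-tsfm exogenous examples and should be verified against your installed version; the model path and revision placeholder were defined earlier:

# Load the pretrained TTM, configured for multivariate input with exogenous channels.
finetune_model = TinyTimeMixerForPrediction.from_pretrained(
    TTM_MODEL_PATH,
    revision=TTM_MODEL_REVISION,
    num_input_channels=tsp.num_input_channels,            # targets plus exogenous channels
    prediction_channel_indices=tsp.prediction_channel_indices,
    exogenous_channel_indices=tsp.exogenous_channel_indices,
    decoder_mode="mix_channel",                            # enable channel mixing in the decoder
)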
After loading the model, we see a message reminding us to fine-tune the model before using it on a forecasting task.
Oftentimes during fine-tuning, we freeze the backbone component of the model, leaving those pretrained weights unchanged and tuning only the parameters in the decoder. This reduces the overall number of parameters being tuned and preserves what the backbone learned during pretraining.
However, in this time series analysis we found that performance was better when the backbone remained unfrozen; for other datasets, freezing the backbone might be preferable. We have disabled the backbone-freezing code but left it intact as an example of what might be needed for other datasets.
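For reference, freezing the backbone typically amounts to setting requires_grad to False on its parameters. A minimal sketch, left commented out because this tutorial keeps the backbone trainable:

# Disabled in this tutorial: freeze the pretrained backbone so that only the
# decoder and forecast head are updated during fine-tuning.
# for param in finetune_model.backbone.parameters():
#     param.requires_grad = False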
We'll use the Trainer API from Hugging Face for fine-tuning. To set up our fine-tuning, we need to specify values for several hyperparameters. In the following code we set values for the learning rate, the number of training epochs, the batch size and the early-stopping patience.
OPTIMAL SUGGESTED LEARNING RATE = 0.000298364724028334
Here we train the model on the historical training data by using the hyperparameters that we previously set. We create the Trainer with the model, the training arguments, the few-shot training and validation datasets, and an early-stopping callback, and then call its train method.
Using learning rate = 0.000298364724028334
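A sketch of the Trainer setup under those hyperparameters follows. The output directory, epoch count, batch size and patience are illustrative values; the learning rate is the suggested value reported above:

learning_rate = 0.000298364724028334  # suggested learning rate reported above

training_args = TrainingArguments(
    output_dir="./ttm_finetuned",      # illustrative output directory
    learning_rate=learning_rate,
    num_train_epochs=50,               # illustrative upper bound; early stopping ends training sooner
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    eval_strategy="epoch",             # "evaluation_strategy" in older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    report_to="none",
)

trainer = Trainer(
    model=finetune_model,
    args=training_args,
    train_dataset=fewshot_train_dataset,
    eval_dataset=fewshot_valid_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=10)],
)

trainer.train()

# After training, the held-out windows can be scored with trainer.evaluate(test_dataset).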
{'eval_loss': 0.3611494302749634,
'eval_runtime': 15.3951,
'eval_samples_per_second': 90.613,
'eval_steps_per_second': 0.715,
'epoch': 37.0}
We'll leverage the TimeSeriesForecastingPipeline from the granite-tsfm toolkit to generate forecasts with the fine-tuned model.
To assess whether the model produces accurate predictions for our time series analysis, we'll evaluate the fine-tuned model on the original test data split.
Then, we'll calculate error metrics on the resulting predictions to quantify forecast accuracy.
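A sketch of the inference and evaluation step follows, assuming the pipeline arguments used in the public granite-tsfm examples, and assuming the pipeline returns, for each target, a column of observed future windows alongside a matching "<target>_prediction" column of forecast windows. Root mean squared error (RMSE) is used here as one example metric:

# Build a forecasting pipeline around the fine-tuned model and preprocessor so
# that outputs are rescaled back to the original sales units.
pipeline = TimeSeriesForecastingPipeline(
    finetune_model,
    feature_extractor=tsp,
    explode_forecasts=False,
    inverse_scale_outputs=True,
    device="cpu",
)

# Run the pipeline over the test split (a DataFrame covering the test period).
predictions_df = pipeline(test_df)

# Example metric: RMSE for one target series. Column names are assumptions;
# each row is assumed to hold an array of forecast_length values.
y_true = np.vstack(predictions_df["sales_CA"].to_numpy())
y_pred = np.vstack(predictions_df["sales_CA_prediction"].to_numpy())
rmse_ca = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"RMSE (CA): {rmse_ca:.2f}")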
Finally, we use the plot_predictions function from the granite-tsfm toolkit to visualize the forecasts against the observed values.
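As a final sketch, the visualization call might look like the following. The argument names are assumptions based on the public plot_predictions utility and should be checked against your installed version:

# Plot forecasts against observed values for one target channel.
plot_predictions(
    input_df=test_df,
    predictions_df=predictions_df,
    freq="d",
    timestamp_column=timestamp_column,
    channel="sales_CA",  # which target series to plot (assumed column name)
)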
In this tutorial, we performed time series forecasting for a sales data use case. Our time series analysis demonstrates forecasting methods using a Granite TSFM, a compact pretrained foundation model that has been shown to outperform other machine learning approaches, including ARIMA (AutoRegressive Integrated Moving Average) and LSTM (Long Short-Term Memory) networks. While our use case focuses on sales forecasting, the Granite TSFM models can aid data scientists by forecasting values for numerous variables while remaining robust to outliers and seasonal variations in the data. Examples of such variables include weather, stock prices, flu cases and other temporal data. Time series predictions like these enable real-world, data-driven decision making.