In order to be generally applicable to many time series datasets, the pretrained
Granite-TimeSeries model is trained in a channel-independent way. Under channel
independence, the model learns a common dynamics model across all channels
without considering interactions. In many cases, however, a time series dataset
carries important information about how different channels interact with each
other. In order to capture these dependencies, we need to fine-tune the model and
enable channel mixing in the decoder. The fine-tuning process requires the following steps:
1. Load and prepare data using the time series preprocessor. Since we are
   fine-tuning the model, we have more options to include things like
   cross-channel interactions, exogenous features, and categorical values.
2. Load the model and freeze the backbone.
3. Create a Hugging Face Trainer and train the model.
4. Evaluate the model.
For the example code snippets below, we assume that a dataset is available with
a timestamp column (“date”) and three value columns (“value1”, “value2”,
“value3”). The columns “value1” and “value2” will serve as targets of the
forecasting exercise, while the “value3” column will serve as a control column.
A control column is a type of exogenous column whose values are assumed to be
known into the future.
Please make sure your Python environment is properly set up. The code snippets
below are illustrative examples of how various components work with our
Granite-TimeSeries models, and they require the installation of the
Granite TSFM library. Please see the setup instructions.
Pandas DataFrames are required by the preprocessing and forecasting pipeline
components. We simply read the CSV file into a pandas DataFrame, making sure to
parse the timestamp column properly.
import pandas as pd

# Read the raw data, parsing the timestamp column as datetimes
data = pd.read_csv(
    dataset_path,
    parse_dates=["date"],
)
In the zero-shot case, we can only handle datasets with id columns and one or
more target columns; advanced use cases involving exogenous or categorical
features are not yet supported. In the fine-tuning case, we are free to specify
additional columns, such as control_columns. We first specify the parameters of
the preprocessor, enabling scaling (normalization) of the data. Then we use the
preprocessor to extract train, validation, and test datasets from the original
data, as sketched below.
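The snippet below is a sketch of this step. It assumes the TimeSeriesPreprocessor and get_datasets helpers from the tsfm_public toolkit; the context length, prediction length, and split fractions are illustrative values that should be matched to your data and to the TTM variant you load.

from tsfm_public import TimeSeriesPreprocessor, get_datasets

timestamp_column = "date"
target_columns = ["value1", "value2"]  # channels to forecast
control_columns = ["value3"]           # exogenous channel known into the future

tsp = TimeSeriesPreprocessor(
    timestamp_column=timestamp_column,
    target_columns=target_columns,
    control_columns=control_columns,
    context_length=512,        # history window expected by the model (assumed value)
    prediction_length=96,      # forecast horizon (assumed value)
    scaling=True,              # normalize each channel
)

# Split the dataframe into train/validation/test datasets for the Trainer;
# the fractional split here is an assumed example configuration.
split_config = {"train": 0.7, "test": 0.2}  # remainder used for validation
train_dataset, valid_dataset, test_dataset = get_datasets(tsp, data, split_config)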
Here we load the model using from_pretrained, but pass some extra configuration
arguments. Note the presence of “decoder_mode” and the other index-related
arguments. The TimeSeriesPreprocessor provides some helper functions to make
passing these parameters easier.
from tsfm_public import TinyTimeMixerForPrediction

model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm-granite/granite-timeseries-ttm-r2",
    num_input_channels=tsp.num_input_channels,
    prediction_channel_indices=tsp.prediction_channel_indices,
    exogenous_channel_indices=tsp.exogenous_channel_indices,
    decoder_mode="mix_channel",  # enable channel mixing in the decoder
)

# Freeze the backbone so that only the decoder and head are fine-tuned
for param in model.backbone.parameters():
    param.requires_grad = False
The key here is the “mix_channel” decoder mode, which enables the decoder to
consider inter-channel dependencies.
Creating a Hugging Face trainer and training the model
We create a Hugging Face Trainer, passing the train and validation datasets that
were created earlier. The first step is to define the training arguments:
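As a sketch, the training arguments might look like the following. The output directory, learning rate, batch size, and number of epochs are assumed values that you will want to tune; depending on your transformers version, the evaluation-strategy argument may be named evaluation_strategy instead of eval_strategy.

from transformers import TrainingArguments

finetune_args = TrainingArguments(
    output_dir="./checkpoints/ttm_finetuned",  # assumed output location
    learning_rate=1e-4,
    num_train_epochs=50,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    eval_strategy="epoch",           # evaluate on the validation set each epoch
    save_strategy="epoch",
    logging_strategy="epoch",
    load_best_model_at_end=True,     # needed so early stopping restores the best checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    report_to="none",
)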
Many of the arguments above, e.g., learning rate and batch size, are
dataset-specific and will need to be adjusted for your use case. Next we define
the callbacks. We add early stopping, which stops training when no improvement
is seen, and a tracking callback, which reports training time statistics.
from transformers import EarlyStoppingCallback
from tsfm_public import TrackingCallback

early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience=10,    # number of epochs with no improvement after which to stop
    early_stopping_threshold=0.0,  # minimum improvement required to count as improvement
)
tracking_callback = TrackingCallback()
Next we create the optimizer and the learning rate scheduler. We use AdamW as
the optimizer and the one-cycle learning rate policy, which adapts the learning
rate quickly.
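A sketch of wiring these pieces into the Trainer is shown below; the max_lr value and the way the number of scheduler steps is derived from the dataset size and batch size are assumptions to adapt to your setup.

import math
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR
from transformers import Trainer

optimizer = AdamW(model.parameters(), lr=1e-4)
scheduler = OneCycleLR(
    optimizer,
    max_lr=1e-4,  # assumed peak learning rate
    epochs=int(finetune_args.num_train_epochs),
    steps_per_epoch=math.ceil(len(train_dataset) / finetune_args.per_device_train_batch_size),
)

finetune_trainer = Trainer(
    model=model,
    args=finetune_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    callbacks=[early_stopping_callback, tracking_callback],
    optimizers=(optimizer, scheduler),
)

finetune_trainer.train()

Passing the optimizer and scheduler through the optimizers argument keeps the Trainer from creating its own defaults.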
We make use of the trainer to evaluate on the test dataset created earlier. In
the output of the command below, “eval_loss” refers to the mean squared error on
the normalized test data.
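For example, assuming the Trainer and test dataset created above:

# "eval_loss" in the returned dictionary is the MSE on the normalized test data
results = finetune_trainer.evaluate(test_dataset)
print(results)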