9 May 2025
Time series forecasting plays a critical role in domains like finance, energy, healthcare and supply chain management, where understanding how things evolve over time is essential. In particular, multivariate time series forecasting allows us to model the interactions between multiple variables as they change over time.
This modeling is helpful when performing estimations with exogenous variables (like weather) that can influence a target variable (like energy demand). In this context, weather is an independent variable that can have an impact on our target variable. In contrast, energy demand represents an endogenous variable, dependent on its own earlier values in the time series and potentially other external variables such as the weather.
This tutorial focuses on using the sktime library, a unified framework for machine learning with time series data. Sktime aims to simplify the workflow for time series problems by providing consistent, composable tools for tasks such as forecasting, classification and transformation. Its design is inspired by the scikit-learn API, making it approachable for people familiar with Python's broader ML ecosystem.
Sktime also allows users to create pipelines tailored to time series data by combining dedicated transformers and classifiers. Forecasting pipelines allow for the sequential application of preprocessing steps and forecasting models, facilitating a streamlined workflow for time series forecasting tasks.
While this tutorial solely focuses on comparing univariate and multivariate forecasting models, it's worth noting that sktime also supports hierarchical forecasting. This feature enables the application of different forecasting models at various levels of data aggregation, and models can be fine-tuned through their hyperparameters.
Sktime also supports time series classification, regression and clustering, and lets users switch between models without changing how the data is prepared or the code is run. This flexibility makes it much easier to experiment and build better models faster.
In a univariate time series, we track a single variable over time (for example, daily temperature). In contrast, a multivariate time series involves multiple interdependent variables tracked over the same time interval (for example, temperature, humidity and energy consumption recorded hourly). Multivariate modeling can help capture the underlying dynamics between these variables, leading to more accurate forecasts.
To demonstrate multivariate forecasting with sktime, we can use the hourly energy demand dataset. This dataset contains four years of hourly electricity consumption and generation data in Spain (2015–2018). Because many factors influence energy demand, such as weather conditions and time of day, this dataset provides an ideal foundation for multivariate modeling.
While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook.
Log in to watsonx.ai by using your IBM Cloud® account.
Create a watsonx.ai project.
Create a Jupyter Notebook.
This step opens a Jupyter Notebook environment where you can copy the code from this tutorial. Alternatively, you can download the notebook to your local system and upload it to your watsonx.ai project as an asset. The notebook is also available on GitHub.
We need a few libraries and modules for this tutorial. Import the following ones; if any of them are not installed, you can install them with pip or another package manager.
We're using pandas for loading and preparing the data and matplotlib for visualization.
For time series forecasting, we're using the sktime library, along with statistical tests from statsmodels.
As described earlier, this tutorial uses the Hourly energy demand dataset containing four years of electrical consumption and energy generation data gathered in Spain in 2015–2018 aggregated by hour. It is a modified version of the Hourly energy demand generation and weather dataset.
You can find more details about the dataset, including metadata, in the preceding links. You can also use this tutorial as a template to explore other well-known datasets with sktime.
For simplicity, the tabular dataset was prepared to have no missing values and to remove irrelevant columns.
Let's examine the first few rows of the dataset. We can see the time column showing a timestamp for each hour. Other columns show numeric data types for energy generation from different sources, weather forecast details and actual energy usage, termed total load actual.
In a univariate forecasting scenario, we'll use only the total load actual column, which records the hourly energy demand. To use the data with sktime, we set the time column as the index of the DataFrame.
For our forecasting problem, we need to split the data into two sets, using the first as historical data. We'll provide the historical data to the model and ask it to predict future values from this data.
To test the accuracy of our predictions, we also need to compare these predictions against ground truth values. To make this comparison, we use a second subset of our dataset as the ground truth and we compare the predicted values to the actual values in this ground truth subset.
There are more than 35,000 rows in our dataset, but we need only a subset of this data for our forecasting example. To subset the data, we specify how many rows to treat as historical data.
The second dataset that we need is the evaluation, or ground truth, dataset. Following the historical subset, we take the next 512 rows of the dataset as our ground truth values.
In choosing the sizes of these subsets, keep in mind that the historical subset determines how much context the model sees, while the ground truth subset determines how far into the future we evaluate the forecast.
(35064, 18)
(512, 18)
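A minimal sketch of this split on stand-in data (the variable names here, such as `df_hist` and `df_eval`, are hypothetical; the notebook's actual names aren't shown in this article):

```python
import numpy as np
import pandas as pd

# Stand-in for the real dataset: 35064 hourly rows, 18 numeric columns.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=35064, freq="h")
df = pd.DataFrame(rng.normal(size=(35064, 18)), index=idx)

n_eval = 512                  # size of the ground-truth window
df_hist = df.iloc[:-n_eval]   # historical data given to the model
df_eval = df.iloc[-n_eval:]   # held-out ground truth for evaluation

print(df.shape)       # (35064, 18)
print(df_eval.shape)  # (512, 18)
```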
Let's examine the data further with this data visualization, which plots the hourly timestamps against our target column total load actual.
Though we see differing energy demand across days in our dataset, we can also observe patterns in the data within a day, representing the differing energy demand throughout the day. Our data shows that energy demand is typically lower at the beginning and end of a day, with a brief drop at midday.
We can hypothesize that the midday drop might reflect lunchtime behavior in Spain. Patterns that repeat within a fixed period in a time series are referred to as seasonality. In our case, the data exhibits a pattern of daily seasonality, observed by the consistent peaks and valleys at regular intervals within a day.
For this experiment, we want to create forecasts for our target variable total load actual.
Initially, we're using a naive method to create baseline forecasts for our data. Naive forecasting methods are based on the assumption that the best predictor of a future value is the current value.1
For our use case of hourly energy demand, we can think of the simplest implementation of naive forecasting to mean that the predicted energy demand for the next hour should be equal to the previous hour's energy demand. In practice, simple extensions of this idea, such as adding a drift term, can sometimes improve naive forecasts.
We can also improve naive forecasts by taking advantage of seasonal patterns observed in our data. In the case of our hourly energy demand data, we can see patterns in the time series within a one day period, indicating a seasonal periodicity of a day.
Instead of forecasting the next hour's energy demand to be similar to the previous hour's energy demand, we can forecast the next hour's energy demand to be similar to the energy demand for that same hour on the previous day.
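This seasonal naive rule, predicting each hour with the value observed 24 hours earlier, can be expressed in one line of pandas. The series below is a synthetic stand-in for the demand data:

```python
import numpy as np
import pandas as pd

# Synthetic hourly series with a perfect 24-hour cycle (stand-in data).
idx = pd.date_range("2018-01-01", periods=96, freq="h")
y = pd.Series(np.sin(np.arange(96) * 2 * np.pi / 24), index=idx)

# Seasonal naive "forecast": each value is the observation 24 hours earlier.
y_snaive = y.shift(24)

# For a perfectly daily-periodic signal, the rule reproduces the series.
print(np.allclose(y_snaive.iloc[24:], y.iloc[24:]))  # True
```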
To do this, we need to first specify the forecaster to use, in this case the NaiveForecaster. The strategy parameter is set to "last", which predicts future values by repeating the last observed values.
We must specify a seasonal periodicity of sp=24, corresponding to the 24 hours that make up the daily seasonal pattern in our data.
Together, these two settings implement our seasonal naive strategy described previously, where the value of energy demand at a specific hour is predicted by using the energy demand at that hour on the previous day.
After the forecaster is initialized with the parameters as described previously, we call the fit method with our historical data to train the forecaster.
Next, we call the predict method with a forecasting horizon to generate predictions for the future time points that we want to forecast.
For an initial evaluation of our model's performance, we can use a data visualization. Here, we plot the predictions (shown in gold), along with the historical data (shown in blue) for past values. We can also see the green line, our ground truth data.
The gold line predicts hourly energy demand based on patterns in the historical data. We can observe the forecast pattern of our naive seasonal forecaster here. It looks like it repeated the last 24 hours of the input data across the forecast horizon.
In evaluating the green line against the gold line, we can see instances of overlap where the model's predictions are accurate. The naive forecaster has gotten very close on some of the data points.
We can also calculate evaluation metrics to quantify the accuracy of the model. We're using the Mean Absolute Percentage Error (MAPE) as our metric. A lower percentage is better and our result, at less than 10%, is a good indication of the performance of the model for this use case.
As shown in our example, naive forecasters can often perform surprisingly well. For this reason, they are often used for baseline predictions as a first step before more advanced modeling methods.
7.991385858936119
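MAPE can be computed with sktime's MeanAbsolutePercentageError metric, or by hand in a few lines of NumPy, as in this sketch with made-up numbers (not the tutorial's values):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, expressed in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Errors of 10%, 10% and 0% average to about 6.67%.
print(mape([100, 200, 400], [110, 180, 400]))
```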
While our seasonal naive forecaster performed reasonably well on the data, we can likely improve our forecast by using a more sophisticated model. Beyond using only one variable for prediction, we can take advantage of the other columns in the dataset to provide more input for forecasting. In this case, information about weather and the amount of energy generated by various sources might contain signals that affect the target variable total load actual.
We shouldn't assume that all the columns in our dataset are valuable in forecasting our target variable. In forecasting, we want to provide columns to our model only if we know that they show some alignment with our target variable. We must explore the data further to decide which variables to include.
For initial data exploration, we can look at correlations between our variables. The following correlation plot simply examines the relationship between each of our variables on an x and y axis. It can examine only two variables at once and it doesn't consider the temporal component (timestamp) of our data, but these correlations can be a useful first step of exploratory data analysis (EDA).
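A correlation plot of this kind can be produced with pandas and matplotlib. The following sketch uses a small stand-in frame with deliberately correlated columns (column names are borrowed from the dataset for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
base = rng.normal(size=500)
df = pd.DataFrame({
    "total load actual": base,
    "total load forecast": base + rng.normal(scale=0.1, size=500),
    "generation solar": rng.normal(size=500),
})

corr = df.corr()  # pairwise Pearson correlations

fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="PiYG")
ax.set_xticks(range(len(corr)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im)
fig.tight_layout()
```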
In interpreting this correlation plot, the strongest correlations are shown in purple and pink. If we look at the row for our target variable total load actual, we can identify which features are most strongly correlated with it.
Based on this data, we can probably assume that the column total load forecast, the grid operator's own forecast of demand, is closely aligned with our target variable and shouldn't be used as a model input.
Including it might bias the model, making it good at predicting these exact values but bad at generalizing to other unseen values. It would be a bit like studying for a test by using the answer key for the upcoming test.
Other columns, shown in purple, that exhibit correlations with our target variable include several of the energy generation columns. We carry these forward as candidates for feature selection.
We need to split the data again for our multivariate forecast because our existing data split included only our target variable. Now, we want to include the additional data columns in our training and testing datasets.
(512, 18)
Some forecasting models require that the time series data exhibit certain properties in order to apply the model correctly. The VAR model that we use later in this tutorial requires stationary time series.
A time series is stationary if it doesn't exhibit seasonality or trend. Ideally, a stationary time series should look like white noise, without any overall trend or patterns repeating at stable intervals. We've already observed that our time series demonstrates daily seasonality, so we likely must transform the data to fit the requirements of the VAR model.
First, we perform a statistical test, the augmented Dickey-Fuller (ADF) test, to check each column in our data for stationarity.
If a column doesn't show stationarity based on the ADF test, we can apply a transformation called differencing to remove the seasonality in our data.
Differencing works by calculating the difference between points in a time series and using those differences rather than the absolute values of the time series. In the case of our data with daily seasonality, the differencing transformer is set to use a 24-hour lag.
The preceding plot shows the time series with its daily seasonality. Next, we set up the ADF test for stationarity and the transformer that applies differencing, which should make the time series stationary.
Following testing and transformation, we can observe that all the features in the dataset now exhibit stationarity. We can also visualize the differenced time series for our target variable. This version of the data looks somewhat like white noise, with no periodic patterns or obvious trends.
generation biomass, stationary: True
generation fossil brown coal/lignite, stationary: True
generation fossil gas, stationary: True
generation fossil hard coal, stationary: True
generation fossil oil, stationary: True
generation hydro pumped storage consumption, stationary: True
generation hydro run-of-river and poundage, stationary: True
generation hydro water reservoir, stationary: True
generation nuclear, stationary: True
generation other, stationary: True
generation other renewable, stationary: True
generation solar, stationary: True
generation waste, stationary: True
generation wind onshore, stationary: True
forecast solar day ahead, stationary: True
forecast wind onshore day ahead, stationary: True
total load forecast, stationary: True
total load actual, stationary: True
Given all the columns in our dataset, we need a way to select the ones that can provide the most predictive input to our model. We use a procedure called Granger causality testing to examine the features in our data and to measure the strength of their relationship to our target variable.
While we should be cautious of promises to discern causality between variables, Granger causality examines whether one time series is predictive of another, given a stated time lag. We test each of our features (except the target itself) for Granger causality with respect to total load actual.
For results with a p-value < 0.05, we reject the null hypothesis and treat the feature as predictive of our target variable.
['generation fossil oil', 'generation hydro water reservoir']
After statistical testing, we've selected two features from the dataset to pass to our model: generation fossil oil and generation hydro water reservoir.
Using this method, we were able to further narrow the list of variables from correlation testing and ultimately select these variables as input to our multivariate forecasting model.
After testing several models on this dataset, we're settling on a VAR, or vector autoregression, model for multivariate forecasting. We've chosen this model because it's fairly fast to fit to our data, and it performs better than both our univariate forecaster and several other multivariate forecasting models we tested.
A vector autoregression model can represent the interactions between different time series and forecast values for any of the examined time series.1 While the model can produce accurate forecasts, one of its shortcomings is that it doesn't provide a hypothesis for why the system of time series behaves the way it does. However, for our purposes, the VAR model is sufficient.
As mentioned previously, one of the requirements for this model is that the time series be stationary, a requirement already met by testing and transforming the data in step 5.3.
In order to create our forecasts, we first create the VAR forecaster.
Next, we pass our data to the forecaster's fit method and then call the predict method to generate forecasts over our forecasting horizon.
Let's evaluate our model results by using the same evaluation metrics as our univariate modeling. In order to compare against our ground truth data, we first need to remove the differencing transformation that was applied.
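Undoing a 24-lag difference amounts to adding back the level from 24 steps earlier (sktime's Differencer can invert its own transformation; the underlying arithmetic looks like this sketch):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
y = pd.Series(rng.normal(size=100).cumsum())  # original series

y_diff = y.diff(24)  # 24-lag differencing (first 24 values become NaN)

# Invert: each difference plus the level 24 steps earlier restores the value.
y_restored = y_diff + y.shift(24)
print(np.allclose(y_restored.iloc[24:], y.iloc[24:]))  # True
```

Over a forecast horizon longer than 24 steps, the inversion proceeds recursively, feeding each newly reconstructed level back in.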
Then, we can plot the predictions, shown in gold, along with the historical data (shown in blue) against the ground truth data (shown in green). You might notice that the forecast initially tracks close to the actual data at the start of the forecast horizon, drifting further from the actual values towards the end of the forecast horizon.
This behavior has been observed in multivariate forecasts: they are often very accurate in the near term but become less reliable further into the forecast horizon.2
The MAPE for this forecast, at ~7.20, is slightly better than the ~7.99 of our univariate forecast.
7.195281205188215
The plot, along with our calculated MAPE, demonstrates that our multivariate forecast is more accurate than our univariate forecast. Given these results, it seems likely that the selected columns passed as input to our forecaster have improved the forecast.
In this tutorial, we used the capabilities of sktime to build and compare univariate and multivariate forecasts of hourly energy demand in Spain.
[1] Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on April 9, 2025.
[2] Salehi, S., Kavgic, M., Bonakdari, H., & Begnoche, L. (2024). Comparative study of univariate and multivariate strategy for short-term forecasting of heat demand density: Exploring single and hybrid deep learning models. Energy and AI, 16, 100343, ISSN 2666-5468. https://www.sciencedirect.com/science/article/pii/S2666546824000090?via%3Dihub