Forecasting requirements

To use univariate or multivariate forecasts, the time dimension, historical data, forecast data, and version dimension (if you want to use one) must meet certain requirements. For multivariate forecasts, there are also requirements for forecast variables.

Some requirements vary between univariate and multivariate forecasts, as noted in the following sections.

Time dimension

Forecasting requires the time dimension to be placed on the column. The algorithm requires the time dimension members to be organized in order of increasing time, where each member of a level represents a consistent unit of time. Members within the dimension definition that do not conform to this requirement, for example a calculated year-to-date member, can be hidden. You can also choose to ignore specific time periods, in which case the forecast is based on linearly interpolated values for those periods. Forecasting also supports hierarchical time dimensions and nested time dimensions, such as year, quarter, and month dimensions. Views whose column dimensions do not represent time can be forecasted, but the process might fail or the results might be unreliable.
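To illustrate what ignoring a period means for the input series, the following minimal sketch replaces ignored periods with values that are linearly interpolated from the nearest kept neighbors. The function name `interpolate_ignored` and the list-based representation are assumptions made for illustration; this is not the product's internal implementation.

```python
def interpolate_ignored(values, ignored):
    """Return a copy of values with ignored indexes linearly interpolated.

    values:  list of floats ordered by increasing time.
    ignored: set of indexes to ignore (hypothetical representation).
    Ignored runs at the very start or end are left unchanged, because
    they have no kept neighbor on both sides.
    """
    result = list(values)
    kept = [i for i in range(len(values)) if i not in ignored]
    # Walk over each pair of consecutive kept points and fill the gap between them.
    for left, right in zip(kept, kept[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            result[i] = values[left] + weight * (values[right] - values[left])
    return result


history = [100.0, 110.0, 500.0, 130.0, 140.0]
print(interpolate_ignored(history, {2}))  # [100.0, 110.0, 120.0, 130.0, 140.0]
```

In this example the outlier at index 2 is ignored, so the series that would be used as the basis for prediction passes in a straight line from 110 to 130.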

Note: Forecasting fails when members of the dimensions that indicate time (the columns) have multiple parents, because the correct order of those members is not known. To work around this limitation, create an alternative hierarchy for those dimensions that excludes the extra parents.
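One way to spot the problem before forecasting is to look for leaf members that roll up into more than one parent. The sketch below assumes that you have exported the hierarchy as a list of (parent, child) pairs; the function name and the sample member names are hypothetical.

```python
from collections import defaultdict


def leaves_with_multiple_parents(edges):
    """Return the leaf members that have more than one parent.

    edges: iterable of (parent, child) pairs from one hierarchy
           (an assumed export format, not a product API).
    """
    parents_of = defaultdict(set)
    all_parents = set()
    for parent, child in edges:
        parents_of[child].add(parent)
        all_parents.add(parent)
    # A leaf is a member that never appears as a parent.
    return sorted(
        child
        for child, parents in parents_of.items()
        if child not in all_parents and len(parents) > 1
    )


month_edges = [
    ("Year", "Q1"),
    ("Q1", "Jan"), ("Q1", "Feb"), ("Q1", "Mar"),
    ("YTD Mar", "Jan"), ("YTD Mar", "Feb"), ("YTD Mar", "Mar"),
]
print(leaves_with_multiple_parents(month_edges))  # ['Feb', 'Jan', 'Mar']
```

Any member that this check reports is a candidate for exclusion from the alternative hierarchy that you use for forecasting.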

One or more time dimensions can be used to forecast against, and they must all appear on the column of the view. In either case, the leaves must result in contiguous, evenly spaced periods of time. The following screen capture displays an example of a single time dimension of years:

When you use multiple time dimensions in the column, the product of the members must result in contiguous, evenly spaced periods of time, with coarser-grained elements preceding finer-grained ones. In the following case, nesting years and then months results in continuous time from 1995 at a monthly granularity.
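The ordering rule can be checked with a small sketch. It builds the product of year and month members in column order and tests whether each period is exactly one month after the previous one. The helper names and the integer encoding of members are assumptions for illustration only.

```python
from itertools import product


def is_contiguous_monthly(periods):
    """True if each (year, month) pair is exactly one month after the previous."""
    ordinals = [year * 12 + (month - 1) for year, month in periods]
    return all(b - a == 1 for a, b in zip(ordinals, ordinals[1:]))


years = [1995, 1996, 1997]
months = list(range(1, 13))

# Coarser dimension (years) outside, finer dimension (months) inside.
years_then_months = list(product(years, months))

# Finer dimension outside: every month is repeated for each year.
months_then_years = [(year, month) for month in months for year in years]

print(is_contiguous_monthly(years_then_months))  # True
print(is_contiguous_monthly(months_then_years))  # False
```

A gap in the coarser dimension, such as a missing year, fails the same check because the first month of the following year no longer comes directly after the last month of the preceding one.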

For time (that is, columns), only the selected dimensions, their hierarchies, and the order in which they are placed matter to time series forecasting. Other aspects, such as the level of expansion or hidden and kept columns, have no meaning. IBM® Planning Analytics Workspace also sources historical data as far back along the time series as possible, and sources data from only the leaf members along the time series. One consequence is that leaves with multiple parents along the time series result in non-contiguous time, so take special care in these cases. If possible, consider adding a dedicated hierarchy for time series forecasting to support the required time configuration.

It is not enough to edit the view because the service might need to traverse the source cube to get more historical details.

In this example, the time series is validly defined by nesting the Years and Month dimensions as columns.

The Years dimension is valid:

However, the Month dimension is invalid:

First, the leaf members of the Month dimension, for example Jan, have multiple summaries or parents.

Second, the Month dimension does not contain a single continuous time series, because the month members are repeated.

A possible correction is to create a hierarchy that includes only the leaf months:

Correct time series are displayed next.

Historical data

The quality and amount of historical data that is used for forecasting must meet certain requirements for the forecast to be effective. Planning Analytics Workspace reaches back as far as possible along the time dimensions that are configured on the column of the exploration view from which the forecast is started. Depending on the design of the dimension and the nature of the data, the forecast might not be optimal. Some rules to keep in mind:

Univariate forecasts: The historical data must span at least twice the period that is being forecasted. For example, if you have 3 years of historical data, disregard any values that are forecasted more than 1.5 years ahead. If the data is seasonal, the history must also contain at least two full seasons. For example, if the data cycles yearly and the leaf time members are at a month level of granularity, the history must contain at least 24 months of data so that the feature can automatically detect the cycle.
Multivariate forecasts: At least 16 historical data points are required for the forecast to be successful. Seasonality does not apply to multivariate forecasts.
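The rules above translate into simple arithmetic on the number of leaf time periods. The following sketch is a pre-flight check under that assumption; the function name, parameters, and messages are hypothetical and are not part of the product.

```python
def check_history(history_points, horizon_points, multivariate=False, season_length=None):
    """Return a list of rule violations; an empty list means the rules are met.

    history_points: number of historical leaf periods with data.
    horizon_points: number of leaf periods to forecast.
    season_length:  periods per seasonal cycle (for example, 12 for monthly
                    data that cycles yearly), or None if there is no seasonality.
    """
    problems = []
    if multivariate:
        # Only the 16-point rule is stated for multivariate forecasts.
        if history_points < 16:
            problems.append("multivariate forecasts need at least 16 historical data points")
        return problems
    if history_points < 2 * horizon_points:
        problems.append("history must span at least twice the forecast period")
    if season_length is not None and history_points < 2 * season_length:
        problems.append("seasonal data needs at least two full seasons of history")
    return problems


print(check_history(36, 18))                    # []
print(check_history(36, 24))                    # one violation: horizon too long
print(check_history(18, 6, season_length=12))   # one violation: fewer than 24 months
print(check_history(15, 4, multivariate=True))  # one violation: fewer than 16 points
```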

Forecast data cells

The forecasting feature forecasts into all members that remain along the time series after your chosen start date. It does not create members in a time dimension, so the members must already exist, and they must also represent contiguous time. If any calculated cells exist along this series, the forecast fails because those cells cannot be written to.
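As a rough illustration, the check below lists the existing members that a forecast would write into and flags any that are calculated. The function name, the member naming scheme, and the choice to treat the start member as inclusive are all assumptions for this sketch.

```python
def forecast_targets(time_leaves, start_member, calculated=frozenset()):
    """Return (targets, blocked) for a forecast that begins at start_member.

    time_leaves:  existing leaf time members in increasing time order.
    start_member: first member to forecast into (inclusive, by assumption).
    calculated:   members whose cells are calculated and cannot be written to.
    """
    if start_member not in time_leaves:
        # Forecasting does not create members, so the start must already exist.
        raise ValueError(f"{start_member!r} is not an existing time member")
    targets = time_leaves[time_leaves.index(start_member):]
    blocked = [member for member in targets if member in calculated]
    return targets, blocked


leaves = ["2024-10", "2024-11", "2024-12", "2025-01", "2025-02"]
targets, blocked = forecast_targets(leaves, "2024-12", calculated={"2025-02"})
print(targets)  # ['2024-12', '2025-01', '2025-02']
print(blocked)  # ['2025-02']
```

A non-empty `blocked` list corresponds to the failure case that is described above.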

Version dimension

The version dimension, if you choose to use one, must appear in the context area of the exploration view. If the version dimension is placed in any other position on the view, it is not available for selection as the target version on the Advanced tab when you set up your forecast.

If you want to see the values of all version members at once, create an independent view for this purpose.

Zero suppression

Be aware of these considerations when using zero suppression in conjunction with forecasting.

Zero suppression on columns: Although zero-suppressed columns cannot be seen in the cube viewer, they might be required to store resulting forecast values. If every nested column represents contiguous time segments (that is, years, quarters, months), you should not need to suppress zeros on columns when forecasting.
Zero suppression on rows: If you configured the Advanced tab to specify where you want to save the predicted values, do not use zero suppression on rows. Zero suppression can hide empty rows in the target location where you want to save the predicted values, which causes an error because there is nowhere to write the predicted values.

Consolidated values

Although consolidated members are supported in the context area of the view from which forecasting is initiated, the resulting data spreading can consume significant resources. By default, forecasting uses proportional data spreading. If you want to spread data using a different algorithm, you can change this after forecasting.
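The idea behind proportional spreading is that each leaf receives a share of the consolidated value in proportion to its existing value. The sketch below shows that arithmetic only. The function name is hypothetical, and the equal split that is used when all leaves are zero is an assumption of this sketch, not documented product behavior.

```python
def spread_proportionally(new_total, leaf_values):
    """Distribute new_total across leaves in proportion to their current values.

    leaf_values: dict that maps leaf member names to their current values.
    """
    if not leaf_values:
        raise ValueError("at least one leaf is required")
    current_total = sum(leaf_values.values())
    if current_total == 0:
        # Assumption: with no existing proportions, fall back to an equal split.
        share = new_total / len(leaf_values)
        return {name: share for name in leaf_values}
    return {
        name: new_total * value / current_total
        for name, value in leaf_values.items()
    }


print(spread_proportionally(200.0, {"East": 30.0, "West": 70.0}))
# {'East': 60.0, 'West': 140.0}
```

Because every leaf under the consolidation is written to, the cost grows with the number of leaves, which is why forecasting on consolidated members can consume significant resources.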

Variables for multivariate forecasts

Variables for multivariate forecasts must reside in a cube that is in the same database as the view from which the forecast is initiated. That cube must also use the same time dimension as the view.

Test the setup with preview

An important last step in the setup is to preview the results of at least one time series to ensure that the results make sense. If they don’t, one or more of the requirements is likely not satisfied. The next section describes how to preview.