Operator BATS


The BATS operator is a forecasting operator for long-term forecasting of regular time series with complex seasonality. It uses the BATS algorithm, a generalization of the Holt-Winters algorithm that features a Box-Cox transform of the incoming data, an ARMA error model, trend, and complex seasonality. The Box-Cox transformation is a power transformation that stabilizes variance and brings the data closer to a normal distribution, so that it better meets the assumptions of the statistical calculations. The Box-Cox transformation can be applied only to positive input data. If the input data is expected to contain zeros or negative values, you must disable the Box-Cox transformation by setting the useBoxCox parameter to false. The BATS operator is a univariate operator that accepts time series in the following formats:

  • a scalar timeseries as float64
  • a vector timeseries as list<float64>
If a vector timeseries is used as input, the operator keeps an independent model for each dimension of the vector. The operator also supports one-dimensional vectors, which are logically equivalent to a scalar time series. If the maxDimension parameter is set, the dimension of the vectors can increase during runtime up to that maximum, but it cannot decrease. When a new dimension is detected in the time series, the operator instantiates a new model for the additional dimension and trains it with the input data. While the model for the new dimension is being trained, the forecasted vectors keep the previous number of dimensions.
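
The vector case can be sketched as follows. This is a minimal illustration and not a tested application: the file name readings.csv, the attribute name readings, and the limit of 8 dimensions are assumptions chosen for the example. The output attribute uses the default name forecastedTimeseriesStep, so no name-mapping parameter is set.

```spl
use com.ibm.streams.timeseries.modeling::BATS;

composite BATSVectorSketch {
graph
    // hypothetical source: every tuple carries one reading per channel
    stream <list<float64> readings> Readings = FileSource() {
        param
            file: "readings.csv";
    }

    // the forecast has the same type as the input time series (list<float64>)
    stream <list<float64> forecastedTimeseriesStep, list<float64> readings> Forecasts = BATS (Readings) {
        param
            inputTimeSeries: readings;
            initSamples: 120;
            maxDimension: 8;      // let the vector grow to at most 8 dimensions
    }

    () as VectorSink = FileSink (Forecasts) {
        param
            file: "vectorForecasts.txt";
            format: txt;
    }
}
```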

Calculation of confidence intervals

The BATS operator can calculate a confidence interval for a given confidence level. The confidence interval consists of a lower and an upper bound for the forecasted value, such that the actual value lies between these bounds with the probability given by the confidence level.
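
As a sketch of how a confidence interval might be requested for a scalar time series, the following invocation relies on the default output attribute names forecastedTimeseriesStep and confidenceInterval. The file name demand.csv, the attribute name demand, and the chosen values for initSamples and confidenceLevel are assumptions made for illustration.

```spl
use com.ibm.streams.timeseries::TSTypes;
use com.ibm.streams.timeseries.modeling::BATS;

composite BATSConfidenceSketch {
graph
    // hypothetical scalar input
    stream <float64 demand> Demand = FileSource() {
        param
            file: "demand.csv";
    }

    // TSTypes.ConfidenceInterval is tuple<float64 lower, float64 upper>
    stream <float64 forecastedTimeseriesStep,
            TSTypes.ConfidenceInterval confidenceInterval,
            float64 demand> Forecasts = BATS (Demand) {
        param
            inputTimeSeries: demand;
            initSamples: 96;
            confidenceLevel: 0.9;   // must be greater than 0 and less than 1
    }

    () as IntervalSink = FileSink (Forecasts) {
        param
            file: "demandForecasts.txt";
            format: txt;
    }
}
```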

Partitioned forecasting

The operator can be used in partitioned mode. In this case, the input stream must contain a single attribute that identifies the partition. This attribute must be specified by the partitionBy parameter. The operator then effectively creates a univariate scalar or vector model for every observed value of the partitionBy attribute. If the input timeseries is a vector, the number of vector dimensions can differ between partitions. While the model for a partition is being trained with input data, the operator does not produce tuples for that particular partitionBy value.
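
A partitioned invocation might look like the following sketch. It assumes that partitionBy takes the partitioning attribute directly, in the same way that inputTimeSeries takes an attribute in the example further below. The stream, file, and attribute names are illustrative.

```spl
use com.ibm.streams.timeseries.modeling::BATS;

composite BATSPartitionedSketch {
graph
    // hypothetical input: one pressure reading per tuple, tagged with its sensor
    stream <float64 pressure, rstring sensorName> SensorData = FileSource() {
        param
            file: "sensorData.csv";
    }

    // one independent model per observed sensorName value
    stream <float64 forecastedTimeseriesStep, float64 pressure, rstring sensorName> Forecasts = BATS (SensorData) {
        param
            inputTimeSeries: pressure;
            initSamples: 120;
            partitionBy: sensorName;   // assumed to be an attribute, not a string
    }

    () as PartitionSink = FileSink (Forecasts) {
        param
            file: "partitionedForecasts.txt";
            format: txt;
    }
}
```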

Assignment of results

For every result of the BATS operator, you must specify an attribute of the right type in the output stream. The attributes have default names, which can be changed by operator parameters.

For example, if you want to calculate the time series forecast at a particular point in the future, you can provide an attribute with the name forecastedTimeseriesStep in the output stream. Alternatively, you can specify a different attribute by using the forecastedTimeseriesStep parameter.

The BATS operator can forecast a time series value, timestamp, and confidence interval at a single point in the future, or it can create values, timestamps, and confidence intervals for all forecasts up to a single point in the future. In the second case, the results are always lists, in which element 0 contains the forecast for one step ahead.

In particular, the operator can create the following results:

  • A forecasted time series value at a single point in the future. The output attribute denoted by the forecastedTimeseriesStep parameter must have the same type as the input time series. For example, if the input time series has the type float64, the output attribute must also be of type float64.
  • A timestamp at a single point in the future. With the inputTimestamp parameter you must specify the attribute of the input stream that represents the timestamp in the input data. The output attribute denoted by the forecastedTimestamp parameter must have the same type as the timestamp in the input data.
  • The confidence interval for a single forecast at a point in the future. The output attribute denoted by the confidenceInterval parameter must have the type tuple<float64 lower,float64 upper> if the input timeseries is a scalar, and it must be a list of the same tuple type if the input time series is a vector. The SPL type com.ibm.streams.timeseries::TSTypes.ConfidenceInterval defines this tuple type.
  • A list of all forecasted values up to a point in the future. The result is a list of single float64 if the input time series is float64 or a list of vectors if the input time series is a vector. A list of vectors must be defined as list<list<float64>>. The result is assigned to the attribute denoted by the forecastedAllTimeseriesSteps parameter.
  • A list of timestamps for all forecasts up to a point in the future. The result attribute must have the type list<uint64> or list<timestamp>, depending on the type of the input timestamp attribute. The result list is assigned to the attribute denoted by the forecastedAllTimestamps parameter.
  • A list of confidence intervals for all forecasts up to a point in the future. The result is a list of tuple<float64 lower,float64 upper> if the input time series is float64 or a list of vectors of confidence intervals if the input time series is a vector. A list of vectors of confidence intervals must be defined as list<list<tuple<float64 lower,float64 upper>>> or list<list<TSTypes.ConfidenceInterval>>. The result is assigned to the attribute denoted by the allConfidenceIntervals parameter.
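
The list-valued results for a vector input nest one level deeper than in the scalar case. The following sketch declares the three "all steps" results under their default attribute names and then picks element 0, the one-step-ahead forecast, in a downstream Functor. The file name, the attribute names readings, timestampMicros, and nextStep, the Functor itself, and the parameter values are assumptions for illustration.

```spl
use com.ibm.streams.timeseries::TSTypes;
use com.ibm.streams.timeseries.modeling::BATS;

composite BATSAllStepsSketch {
graph
    // hypothetical vector input with a uint64 timestamp
    stream <list<float64> readings, uint64 timestampMicros> Readings = FileSource() {
        param
            file: "readings.csv";
    }

    stream <list<list<float64>> forecastedAllTimeseriesSteps,
            list<list<TSTypes.ConfidenceInterval>> allConfidenceIntervals,
            list<uint64> forecastedAllTimestamps,
            list<float64> readings, uint64 timestampMicros> Forecasts = BATS (Readings) {
        param
            inputTimeSeries: readings;
            inputTimestamp: timestampMicros;
            initSamples: 120;
            stepAhead: 3;      // forecasts for 1, 2, and 3 steps ahead
    }

    // element 0 of the result list is the forecast for one step ahead
    stream <list<float64> nextStep> NextStep = Functor (Forecasts) {
        param
            filter: size(forecastedAllTimeseriesSteps) > 0;   // guard the index access
        output
            NextStep: nextStep = forecastedAllTimeseriesSteps[0];
    }

    () as NextStepSink = FileSink (NextStep) {
        param
            file: "nextStep.txt";
            format: txt;
    }
}
```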

Behavior in a consistent region: The operator cannot be the start of a consistent region. If it is, an error occurs when you compile your streams processing application.

Checkpointing in an autonomous region: The operator can be configured for periodic checkpointing. If it is configured with operator-driven checkpointing, the SPL compiler issues an error when you compile your streams processing application.

Exceptions: The BATS operator throws an exception and terminates in the following cases:
  • The value that is specified for the stepAhead, initSamples, or confidenceLevel parameter is invalid.
  • The specified output attributes or the default attributes, if present, have the wrong types.

Error handling

The operator discards the current model and then re-initializes and re-trains the algorithm, or the algorithm for a single partition, in the following cases:
  • the number of dimensions of the input timeseries vector decreases
  • the maxDimension parameter is not used (timeseries vector dimension increase is disabled) and the dimension of the input timeseries vector increases for a partition
  • the maxDimension parameter is used and the number of dimensions of the input timeseries vector exceeds the maxDimension value

Timestamps are not predicted, either for a single forecasting step or for all forecasting steps, if the timestamps specified by the inputTimestamp parameter are not regularly spaced. If the timeseries is detected as irregular, the output attributes denoted by the forecastedTimestamp or forecastedAllTimestamps parameter, or the default attributes for those results, receive no assignments. The detection of irregularity is logged as a warning in the PE log.
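
Timestamp forecasting for a single step can be sketched as follows, here with an input timestamp of the SPL type timestamp. The attribute names reading and ts, the file name, and the parameter values are assumptions for illustration. The forecastedTimestamp attribute uses its default name and has the same type as the input timestamp.

```spl
use com.ibm.streams.timeseries.modeling::BATS;

composite BATSTimestampSketch {
graph
    // hypothetical input with regularly spaced timestamps
    stream <float64 reading, timestamp ts> Readings = FileSource() {
        param
            file: "timedReadings.csv";
    }

    stream <float64 forecastedTimeseriesStep,
            timestamp forecastedTimestamp,
            float64 reading, timestamp ts> Forecasts = BATS (Readings) {
        param
            inputTimeSeries: reading;
            inputTimestamp: ts;    // required when a timestamp is forecasted
            initSamples: 120;
            stepAhead: 5;          // forecast value and timestamp 5 steps ahead
    }

    () as TimestampSink = FileSink (Forecasts) {
        param
            file: "timedForecasts.txt";
            format: txt;
    }
}
```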

Example

The following example demonstrates how to use the BATS operator.


use com.ibm.streams.timeseries::TSTypes;
use com.ibm.streams.timeseries.modeling::BATS;

composite BATSMain {
graph
    // sensor data with pressure, timestamp in microseconds and a sensor name
    stream <float64 pressure, uint64 timestampMicros, rstring sensorName> SensorData = FileSource() {
        param
            file: "sensorData.csv";
    }

    stream <list<float64> predictedPressures,                      // predicted pressure for 1 and 2 steps ahead
            list<TSTypes.ConfidenceInterval> confidenceIntervals,  // confidence intervals for the predictions
            list<uint64> predictedTimestamps,                      // timestamps for 1 and 2 steps ahead
            float64 pressure, uint64 timestampMicros, rstring sensorName> Forecasts = BATS (SensorData) {

        param
            inputTimeSeries: pressure;
            inputTimestamp: timestampMicros;
            initSamples: 120;                 // use 120 samples to initialize the model
            stepAhead: 2;                     // predict 2 steps into the future
            confidenceLevel: 0.95;
            // output attribute names:
            forecastedAllTimeseriesSteps: "predictedPressures";
            allConfidenceIntervals: "confidenceIntervals";
            forecastedAllTimestamps: "predictedTimestamps";
            // pressure, timestampMicros, and sensorName are automatically assigned from input attributes
    }

    () as DataSink = FileSink (Forecasts) {
        param
            file: "forecastedPressures.txt";
            format: txt;
    }
}

Summary

Ports
This operator has 1 input port and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 14 parameters.

Required: initSamples, inputTimeSeries

Optional: allConfidenceIntervals, confidenceInterval, confidenceLevel, forecastedAllTimeseriesSteps, forecastedAllTimestamps, forecastedTimeseriesStep, forecastedTimestamp, inputTimestamp, maxDimension, partitionBy, stepAhead, useBoxCox

Metrics
This operator does not report any metrics.

Properties

Implementation
Java

Input Ports

Ports (0)

This port consumes data for training and scoring against the model. The inputTimeSeries parameter specifies the attribute on this port that contains the time series data. The accepted data types are float64 and list<float64>.

Properties

Output Ports

Assignments
Java operators do not support output assignments.
Ports (0)

This port submits a tuple that contains the forecasted value for the time series and, optionally, other results such as a confidence interval for the forecast. You use parameters to specify the output attributes to which the results are assigned. All output attributes that can be assigned from input attributes are assigned from the input tuple.

Properties

Parameters

This operator supports 14 parameters.

Required: initSamples, inputTimeSeries

Optional: allConfidenceIntervals, confidenceInterval, confidenceLevel, forecastedAllTimeseriesSteps, forecastedAllTimestamps, forecastedTimeseriesStep, forecastedTimestamp, inputTimestamp, maxDimension, partitionBy, stepAhead, useBoxCox

allConfidenceIntervals

Output attribute name for all confidence intervals for all forecasts up to a given point in the future. The result is a list of tuple<float64 lower,float64 upper> if the input time series is float64 or a list of vectors of confidence intervals if the input time series is a vector. A list of vectors of confidence intervals must be defined as list<list<tuple<float64 lower,float64 upper>>> or list<list<TSTypes.ConfidenceInterval>>. The default attribute name is allConfidenceIntervals.

Properties
confidenceInterval

Output attribute name for the confidence intervals of the forecasted step at a single point in the future. The output attribute denoted by this parameter must have the type tuple<float64 lower,float64 upper> if the input timeseries is a scalar, and a list of the same tuple type if the input time series is a vector. The SPL type com.ibm.streams.timeseries::TSTypes.ConfidenceInterval defines this tuple type. The default attribute name is confidenceInterval.

Properties
confidenceLevel

The confidence level that will be used for generating confidence intervals for the forecasted time series. Valid values are greater than 0 and less than 1. The default value is 0.8.

Properties
forecastedAllTimeseriesSteps

Output attribute name for the results of forecasting all values up to the given horizon. The type of this attribute must be a list that takes the forecasted values. The element type of the list must be the same as the type of the input time series. For example, if the input time series has the type float64, this output attribute must have the type list<float64>. The default attribute name is forecastedAllTimeseriesSteps.

Properties
forecastedAllTimestamps

Output attribute name for the result of extrapolating the timestamp of all forecasting steps up to a given horizon. The type of this attribute must be a list that takes the timestamps. The element type of the list must be the same as the type of the input timestamp denoted by the inputTimestamp parameter. The default attribute name is forecastedAllTimestamps. When you use this parameter or when the output stream has the default attribute forecastedAllTimestamps, you must also specify the timestamp in the input stream by using the inputTimestamp parameter.

Properties
forecastedTimeseriesStep

Output attribute name for the result of one forecasting step at a given horizon. The attribute in the output stream must have the same type as the input time series. For example, if the input time series has the type float64, the output attribute must also be of type float64. The default attribute name is forecastedTimeseriesStep.

Properties
forecastedTimestamp

Output attribute name for the result of extrapolating the timestamp for a forecasting step at a given horizon. The output attribute denoted by this parameter must have the same type as the timestamp in the input data. The default attribute name is forecastedTimestamp. When you use this parameter or when the output stream has the default attribute forecastedTimestamp, you must also specify the timestamp in the input stream by using the inputTimestamp parameter.

Properties
initSamples

Specifies the number of samples that are used to initialize the algorithm. The minimum value of this parameter is 8, which is needed to correctly estimate the seasonality of the data. The operator does not produce output tuples until the specified number of tuples minus one has been processed, so it drops (initSamples - 1) tuples during the initialization phase. The last tuple of the initialization samples triggers the initialization of the algorithm, which is essentially a parameter estimation. For this tuple, the operator produces an output tuple after the model parameter estimation is done.

Properties
inputTimeSeries

This mandatory parameter is an attribute expression, which specifies the name of the attribute that contains the time series data in the input tuple. The supported data types are float64 and list<float64>.

Properties
inputTimestamp

Specifies the attribute in the input stream that contains the timestamp values. The supported data types are uint64 and timestamp. If the data type is uint64, the operator does not make any assumptions about the unit of the timestamp. You need to specify the timestamp attribute only if you want to forecast the timestamp at the forecasting horizon or all timestamps up to the forecasting horizon.

Properties
maxDimension

This is an optional parameter of type int32, which triggers the time series expansion mode for vector timeseries. The parameter specifies the maximum number of time series that can be included in the process in real time. When the parameter is not specified, the number of time series (as estimated from the first tuple) is assumed to be constant per partition of the operator. If the number of dimensions exceeds the maximum or is not constant, the model for the partition is re-initialized from input data. For scalar timeseries, this parameter is silently ignored.

Properties
partitionBy

Specifies the attribute that contains the key values that are associated with the time series values on the input tuple for partitioned forecasting.

Properties
stepAhead

Specifies the forecast horizon. The default value is 1. The parameter value specifies the future time series value to be predicted. If the value is 1, then the algorithm predicts the next time series value. If the value is 2, then the algorithm predicts the time series value two steps ahead.

Properties
useBoxCox

Specifies whether the Box-Cox transformation is used in the BATS algorithm. If true, the Box-Cox parameters are estimated during initialization and the Box-Cox transformation is applied. Otherwise, the Box-Cox transformation is not applied. Set this parameter to false if the data can contain zeros or negative values, because the transformation and its inverse can be applied only to positive values. The default value for this parameter is true.
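
As an illustration, a series that can cross zero, such as a temperature in degrees Celsius, would be forecast with the transformation disabled. The attribute name temperatureCelsius, the file names, and the initSamples value in this sketch are assumptions.

```spl
use com.ibm.streams.timeseries.modeling::BATS;

composite BATSNoBoxCoxSketch {
graph
    // hypothetical input that can contain zeros and negative values
    stream <float64 temperatureCelsius> Temperatures = FileSource() {
        param
            file: "temperatures.csv";
    }

    stream <float64 forecastedTimeseriesStep, float64 temperatureCelsius> Forecasts = BATS (Temperatures) {
        param
            inputTimeSeries: temperatureCelsius;
            initSamples: 120;
            useBoxCox: false;   // Box-Cox requires strictly positive data
    }

    () as TemperatureSink = FileSink (Forecasts) {
        param
            file: "temperatureForecasts.txt";
            format: txt;
    }
}
```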

Properties

Libraries

Operator class library
Library Path: ../../../impl/lib/timeseries-tk.jar, ../../../impl/lib/commons-math-2.1.jar, ../../../impl/lib/commons-math3-3.4.1.jar, ../../../impl/lib/WatFore-0.6.1.jar