Operator GAMLearner

The GAMLearner operator applies the generalized additive model (GAM) algorithm to categorical or continuous time series data.
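
For orientation, the textbook form of a generalized additive model can be written as follows. This is a general sketch, not a statement about the toolkit's internal parameterization; the symbols (\beta_0, f_j, x_j, p) are illustrative and do not appear in the operator's interface.

```latex
% Textbook GAM: the prediction is an intercept plus a sum of
% one-dimensional functions, one per covariate.
\[
  \hat{y} \;=\; \beta_0 + \sum_{j=1}^{p} f_j(x_j)
\]
% For a continuous covariate x_j, f_j is typically a smooth function
% such as a spline; for a categorical covariate, f_j assigns one
% effect per category.
```

In this reading, the covariates x_j correspond to the input attributes that are mapped to the model through the spl2pmml parameter.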

If the input is observed data, the GAMLearner operator adapts the model parameters and outputs the filtered data. Otherwise, the operator applies the existing model to score the data.

The GAMLearner operator can enhance the generalized additive model (GAM) that is specified in the Predictive Model Markup Language (PMML) file by using the observed data. The operator can also score the data by using the PMML model.

The GAMLearner operator can prevent overfitting or numerical instability of the model by applying penalization and regularization techniques. A model is overfitted if it cannot learn and generalize the trends in the input data based on the observed data. The values predicted by an overfitted model are highly inaccurate and usually exaggerate minor fluctuations in the data. Numerical instability can occur when the learning problem is ill-posed, for example, due to collinearity in the input variables.

Note: If the model is applied only to scoring input data (not to updating the model parameters), use the GAMScorer operator. The GAMScorer operator requires less memory and computation time.

Specify the mapping between the attributes of the input tuples and the covariates in the operator parameters. Each model has its own modelID (an unsigned 64-bit integer). The operator can manage several generalized additive models at the same time; input tuples are assigned to the correct model by using the modelID value.
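
As a minimal sketch of how several models might be managed by one operator instance, the following composite registers two PMML files under different modelID values. It follows the structure of the example in the Examples section. The stream names, data file names, and PMML file names are hypothetical. Passing the modelID attributes in port order is an assumption, and the algorithm parameter is omitted as in the Examples section.

```spl
use com.ibm.streams.timeseries.modeling::GAMLearner;

composite MultiModelSketch
{
	type Input = tuple<uint64 modelID, rstring condition, float64 x1, float64 x2, float64 y>;
	graph
		// Tuples to be scored (input port 0); the file name is hypothetical.
		stream<Input> Score = FileSource()
		{
			param
				file : "score_data.csv";
				format : csv;
		}

		// Tuples that carry observed values for learning (input port 1);
		// the file name is hypothetical.
		stream<Input> Learn = FileSource()
		{
			param
				file : "learn_data.csv";
				format : csv;
		}

		(stream<Input, tuple<float64 y_hat>> Scored;
			stream<Input, tuple<float64 y_hat>> Filtered) = GAMLearner(Score; Learn)
		{
			param
				// One PMML file per model; the modelID attribute of each
				// tuple selects the model. File names are hypothetical.
				modelID2file : { 1ul : "model_a.xml", 2ul : "model_b.xml" };
				spl2pmml : { "condition" : "Type", "x1" : "X1", "x2" : "X2" };
				// Assumption: one modelID attribute per input port, in port order.
				modelID : Score.modelID, Learn.modelID;
				observedValue : Learn.y;
			output
				Scored : y_hat = predictedValue();
				Filtered : y_hat = predictedValue();
		}

		() as Sink = Custom(Filtered)
		{
			logic
				onTuple Filtered : println(Filtered);
		}
}
```

A tuple whose modelID value is not a key of the modelID2file map causes an exception, as listed in the Exceptions section.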

The GAMLearner operator supports an optional control port and an optional monitor output port. Signals that are sent to the control port can import a GAM model from a PMML file or request the current model, which the operator then exports in PMML format on the monitor output port.

The GAMLearner operator is a univariate operator.

Behavior in a consistent region

  • The GAMLearner operator is not supported in a consistent region. A warning occurs when you compile your streams processing application.
  • The operator cannot be the start of a consistent region. An error occurs when you compile your streams processing application.

Exceptions

The GAMLearner operator throws an exception in the following cases:

  • The attributes that are provided in the spl2pmml map do not exist in the PMML file.
  • The attributes that are provided in the spl2pmml map do not exist in the input tuple.
  • The covariates defined in the PMML file do not exist in the spl2pmml map.
  • The PMML file does not exist or is malformed.
  • The GAMLearner operator receives a tuple with a modelID attribute value that does not exist in the modelID2file parameter.
  • A Load signal is sent to the control port and the modelID and pmmlModel parameters are not specified.
  • A Load signal is sent to the control port and an invalid PMML model is specified. For example, the file that describes the model uses a wrong format, the file specifies parameters that are not defined, or the file does not contain coefficient values for the parameters of the model.

Examples

The following example uses the GAMLearner operator to predict a target time series. For example, it can be used to predict electricity consumption by using data about the price and weather.


use com.ibm.streams.timeseries.modeling::GAMLearner;
composite Main
{
	type Input = tuple<uint64 modelID, rstring condition, float64 x1, float64 x2,
		float64 x3, float64 x4, float64 y, float64 filtered_y>;
	graph
		stream<Input> In1 = FileSource()
		{
			param
				file : "test_data_3.dat";
				format :csv;
		}
		
		stream<Input> In0 = FileSource()
		{
			param
				file: "test_data.csv";
				initDelay: 1f;
		}
		
		(stream<In1, tuple<float64 y_hat>> Out0;
			stream<In1, tuple<float64 y_hat>> Out1) = GAMLearner(In0; In1)
		{
			param
				modelID2file: { 1ul : "train_pmml.xml" };
				spl2pmml: {
					"condition" : "Type",
					"x1" : "X1",
					"x2" : "X2",
					"x3" : "X3",
					"x4" : "X4"
				};
				modelID: In1.modelID,In0.modelID;
				observedValue: In1.y; // the observed value is read from input port 1
			output
				Out0:y_hat=predictedValue();
				Out1:y_hat=predictedValue();
		}
		
		() as Sink1 = Custom(Out1)
		{
			logic
				state: mutable int32 n = 0;
				onTuple Out1: 
				{
					if( (++n % 500) == 1)
						println(Out1);
				}
		}
	}

Summary

Ports
This operator has 3 input ports and 3 output ports.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 17 parameters.

Required: algorithm, modelID, modelID2file, observedValue, spl2pmml

Optional: controlSignal, forgettingFactor, forgettingFactorLowerBound, lambda, learningRate, penalizer, penalizerStrength, pmmlModel, regularizer, regularizerDamper, regularizerRefresh, regularizerStrength

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Never - Operator never provides a single threaded execution context.

Input Ports

Ports (0)

This port consumes data for scoring using the current generalized additive model (GAM). The score is submitted to output port 0. The spl2pmml parameter specifies the mapping between input attributes on this port and the covariates in the GAM.

Ports (1)

This port consumes data for updating the generalized additive model (GAM) that is specified in the PMML file. After the model is updated, the input value is also used for filtering the input, and the result is submitted to output port 1. The observedValue parameter specifies the name of the attribute on this port that contains the observed value that is used for updating the model. The accepted data type is float64.

Ports (2)

This port accepts control signals to control the behavior of the operator. The controlSignal parameter specifies the name of the attribute on this port that contains the control signal type. The pmmlModel parameter specifies the name of the PMML model being loaded. The modelID parameter specifies the name of the input attribute that contains the model ID.

Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes.
Output Functions
GAMFunctions
<any T> T AsIs(T v)

The default function for output attributes. By default, this function assigns the output attribute to the value of the input attribute with the same name.

float64 predictedValue()

Returns the predicted time series value.

GAMModel
rstring getModel()

Returns the PMML model of the GAM.

<any T> T AsIs(T v)

The default function for output attributes. By default, this function assigns the output attribute to the value of the input attribute with the same name.

Ports (0)

This port submits a tuple that contains the score for the time series data that is received on input port 0. This port submits a tuple whenever time series data is scored against the model. Custom output functions are used to specify the value of the output tuple attributes. The output tuple attributes whose assignments are not specified are assigned from input attributes.

Ports (1)

This port submits a tuple that contains the filtered input time series. This port submits a tuple whenever a time series tuple is filtered. Custom output functions are used to specify the value of the output tuple attributes. The output tuple attributes whose assignments are not specified are assigned from input attributes.

Ports (2)

This port submits a tuple that contains the current PMML model. This port submits a tuple each time a Monitor signal is consumed on the input control port. The getModel() output function is assigned to the output attribute that holds the PMML model. The expected data type of the attribute is rstring.

Parameters

algorithm

Specifies the iterative learning algorithm to be used to update the model. The supported algorithms are:

  • PRLS - Penalized Recursive Least Squares
  • PRLS-FF - Penalized Recursive Least Squares with Forgetting Factors
  • PRLS-AFF - Penalized Recursive Least Squares with Adaptive Forgetting Factors

In addition, the same algorithms can be used with fixed transfer functions by using the following values:

  • PRLS-T
  • PRLS-T-FF
  • PRLS-T-AFF

The default value is PRLS.

controlSignal

Specifies the name of the attribute on the control port that holds the control signal. The supported control signals are: TSSignal.Monitor, TSSignal.Load, TSSignal.Suspend, and TSSignal.Resume.

forgettingFactor

Specifies the forgetting factor, which gives less weight to older time series values during learning. The value must be strictly greater than 0 and less than or equal to 1. The smaller the forgetting factor, the more weight the operator puts on recent time series values. The default value is EPSILON.
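
The effect of a forgetting factor can be illustrated with the standard exponentially weighted least-squares objective used in recursive least squares. This is a textbook sketch; the exact objective that the operator minimizes is not stated here, and the symbols (\omega, \beta, x_i, y_i) are illustrative.

```latex
% Exponentially weighted least squares at time t.
% omega plays the role of the forgetting factor, 0 < omega <= 1.
\[
  J_t(\beta) \;=\; \sum_{i=1}^{t} \omega^{\,t-i}\,\bigl(y_i - x_i^{\top}\beta\bigr)^2
\]
% With omega = 1 every observation has equal weight.
% With omega < 1 an observation that is k steps old has weight omega^k,
% so smaller values of omega discount older data faster.
```

For example, with \omega = 0.9 an observation that is 10 steps old carries a weight of about 0.35, while with \omega = 0.99 it still carries a weight of about 0.90.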

forgettingFactorLowerBound

Specifies the lower bound of the forgetting factor, which gives less weight to older time series values during learning. The value must be strictly greater than 0 and smaller than both 1 and the value of the forgettingFactor parameter. The default value is EPSILON.

lambda

DEPRECATED: Use the penalizerStrength parameter instead.

learningRate

Specifies the value of the learning rate for adaptive forgetting factors. The default value is EPSILON.

modelID

Specifies the name of the input attribute that contains the model ID. The parameter value is a uint64 data type. If both input ports 0 and 1 are used, the modelID value must exist in both ports.

modelID2file

Specifies the names of the PMML files that contain the generalized additive models that are associated with different modelID parameter values. This parameter value is a map<uint64, rstring> data type.

observedValue

Specifies the name of the input attribute that contains the observed value, which is to be used for updating the model. The observed value must be present in input port 1; otherwise a compile-time error occurs.

penalizer

Specifies the structure of a penalization matrix for preventing overfitting of the generalized additive model. The strength of the penalizer is determined by the penalizerStrength parameter. The penalizer parameter supports the following options:

  • diagonal - the GAMLearner operator uses a diagonal penalization matrix, where the entries on the diagonal are given by the value of the penalizerStrength parameter.
  • full - the GAMLearner operator uses a matrix that penalizes deviations of the model smoothing splines from linear functions.
  • fullDiagonal - the GAMLearner operator applies a matrix that is the sum of the penalizer matrices that are used in the full and the diagonal options.

The default value is full. The value must not be set to full or fullDiagonal if the algorithm parameter is set to PRLS-T, PRLS-T-FF, or PRLS-T-AFF.
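
The three options can be read against a generic penalized least-squares objective. The following is a sketch under that assumption and not the operator's documented formulation; the symbols (\beta, P, \lambda, P_{\text{full}}) are illustrative, with \lambda standing for the value of the penalizerStrength parameter.

```latex
% Generic penalized least squares: squared error plus a quadratic
% penalty on the coefficient vector beta.
\[
  J(\beta) \;=\; \sum_{i} \bigl(y_i - x_i^{\top}\beta\bigr)^2 \;+\; \beta^{\top} P\, \beta
\]
% Penalization matrix P for each option:
\[
  P_{\text{diagonal}} = \lambda I, \qquad
  P_{\text{fullDiagonal}} = P_{\text{full}} + \lambda I
\]
% P_full penalizes deviations of the smoothing splines from linear
% functions; its entries are not specified in this reference.
```

Under this reading, a larger \lambda shrinks the coefficients more strongly toward zero.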

penalizerStrength

Specifies the strength of the penalizer parameter. It must only be set if the penalizer parameter is equal to diagonal or fullDiagonal. Its value must be strictly greater than zero. The default value is EPSILON.

pmmlModel

Specifies the name of the PMML model. If this parameter is not specified, the pmmlModel attribute is used by default. If neither the default attribute nor the pmmlModel parameter is provided, the operator throws an exception.

regularizer

Specifies the type of regularization to use to solve an ill-posed problem and to prevent numerical instability of the model. The following options are supported for this parameter:

  • none: The GAMLearner operator does not apply any regularization technique.
  • initial: The GAMLearner operator applies the regularization technique when creating the model.
  • periodic: The GAMLearner operator applies the regularization technique periodically, where the period is defined by the value that is assigned to the regularizerRefresh parameter.
  • adaptive: The GAMLearner operator applies the regularization technique after processing each input time series tuple. The strength of the regularization technique is defined by the regularizerStrength parameter.
  • LevenbergMarquardt: The GAMLearner operator applies the Levenberg-Marquardt algorithm on the input time series.

The default value is set to none.

regularizerDamper

Specifies the strength of the damping parameter in the Levenberg-Marquardt algorithm. The value must be a positive float64 value. This parameter must only be defined if the regularizer parameter is set to LevenbergMarquardt. The default value is EPSILON.

regularizerRefresh

Specifies the number of input tuples after which the GAMLearner operator applies the regularization technique. The default value is set to 1u. This parameter must only be defined if the regularizer parameter is set to periodic.

regularizerStrength

Defines the strength of the regularizer. This value must be a positive float64 value. This parameter must only be defined if the regularizer parameter is not set to none. The default value is EPSILON.

spl2pmml

Specifies a mapping between attributes in the input tuples and the covariates in the generalized additive model. If two input ports are provided, then the attributes in the map must exist in both the input ports. If the parameter value is an empty map, then the generalized additive model does not have any covariates and therefore represents a constant function.

Code Templates

GAMLearner

stream<${schema}> ${outputStream} = GAMLearner(${inputStream}) 
{
	param
		modelID2file:		${modelID2fileExpression};
		spl2pmml:		${spl2pmmlExpression};
		modelID:		${modelIDExpression};
		observedValue: 		${observedValueExpression};
		algorithm: 		${algorithmLiteral};
	output
		${outputStream}: ${outputExpression};
}
      

Libraries

tsaModelingLibrary
Library Name: modeling
Library Path: ../../../impl/lib
Include Path: ../../../impl/include
libxml2
Library Name: xml2 -lz -lm
Library Path: /usr/lib64
Include Path: /usr/include/libxml2