Operator PSAX

The PSAX operator provides a symbolic representation of real-valued time series data. Converting time series data into symbol strings lets you analyze the data with algorithms that were not designed for time series, such as text analytics algorithms. The operator supports several types of analysis, including pattern recognition, anomaly detection, indexing, and clustering.

The PSAX operator provides three algorithms for analyzing the data:
  • PAA - The Piecewise Aggregate Approximation (PAA) algorithm is used to compress a time series into a sequence of levels.
  • SAX - The Symbolic Aggregate Approximation (SAX) algorithm is used to convert each individual data point in the time series into a symbol.
  • PSAX - This algorithm is a combination of the PAA and SAX algorithms. The input time series is first compressed into a sequence of levels. Then, each level is converted into a symbol.
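The relationship between the three algorithms can be sketched in Python. This is a minimal illustration, not the operator's C++ implementation. It assumes that PAA reduces a window to its mean (the usual definition of PAA) and that SAX maps a value to one of numSymbols equal-width ranges between lowerBound and upperBound, which is consistent with the sample symbol map for this operator. The published SAX algorithm uses Gaussian breakpoints instead. The function names paa, sax, and psax are hypothetical.

```python
# Illustrative sketch only; assumes mean-based PAA and equal-width symbol ranges.
from statistics import fmean

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def paa(window):
    """Compress a window of values into one level (its mean)."""
    return fmean(window)


def sax(value, lower, upper, num_symbols=32, alphabet=ALPHABET):
    """Map a single value to a symbol using equal-width ranges over [lower, upper)."""
    width = (upper - lower) / num_symbols
    level = int((value - lower) // width)
    level = min(max(level, 0), num_symbols - 1)  # keep the level in range for this sketch
    return alphabet[level]


def psax(window, lower, upper, num_symbols=32, alphabet=ALPHABET):
    """Apply PAA first, then convert the resulting level with SAX."""
    return sax(paa(window), lower, upper, num_symbols, alphabet)


window = [2.0, 4.0, 3.0]
print(paa(window))                          # 3.0
print([sax(v, 0.0, 20.0) for v in window])  # ['D', 'G', 'E']
print(psax(window, 0.0, 20.0))              # E
```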

Symbol Map

When the operator initializes, it prints a symbol map to its console. Use this map as a reference to determine which symbol is assigned to a given input value. For example, suppose that the SAX algorithm is used, the numSymbols parameter is set to 32, and the lowerBound and upperBound parameters are set to 0 and 20, respectively. The symbol map then begins as follows:


Symbol map for key: "default"
--------------------
symbol_level,symbol,range_begin(inclusive),range_end(exclusive)
0,A,0,0.625
1,B,0.625,1.25
2,C,1.25,1.875
3,D,1.875,2.5
4,E,2.5,3.125
...
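The sample rows above can be reproduced with a short sketch, assuming that the operator divides [lowerBound, upperBound) into numSymbols equal-width ranges and assigns alphabet characters in order. The symbol_map function is hypothetical. With lowerBound 0, upperBound 20, and numSymbols 32, each range is 20 / 32 = 0.625 wide.

```python
# Hypothetical reconstruction of the printed symbol map (equal-width ranges assumed).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def symbol_map(lower, upper, num_symbols=32, alphabet=ALPHABET):
    """Return (symbol_level, symbol, range_begin, range_end) rows."""
    width = (upper - lower) / num_symbols
    rows = []
    for level in range(num_symbols):
        begin = lower + level * width        # inclusive
        end = lower + (level + 1) * width    # exclusive
        rows.append((level, alphabet[level], begin, end))
    return rows


print("symbol_level,symbol,range_begin(inclusive),range_end(exclusive)")
for row in symbol_map(0.0, 20.0)[:5]:
    # Format floats compactly so 0.0 prints as 0, matching the sample output.
    print(",".join(f"{x:g}" if isinstance(x, float) else str(x) for x in row))
```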

Behavior in a consistent region

  • The operator cannot be the start of a consistent region. If it is placed at the start of one, an error occurs when you compile your Streams application.

Exceptions

The PSAX operator throws an exception in the following cases:

  • The upperBound and lowerBound parameters are not specified for the SAX and PSAX algorithms.
  • The value that is specified by the upperBound parameter is less than the value specified by the lowerBound parameter.
  • The length of the alphabet string is less than the value specified for the numSymbols parameter. If the numSymbols parameter is not defined, then an exception is thrown if the length of the alphabet string is less than 32 (the default value for numSymbols).
  • The window clause is not specified when using either the PAA or PSAX algorithm.
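These checks can be summarized as a small validation routine. This is a hypothetical Python sketch that mirrors the SPL parameter names. The operator's actual exception types and error messages are not documented here, so PSAXConfigError and the message strings are illustrative. The value 128 is the length of the default alphabet, as stated in the numSymbols parameter description.

```python
# Hypothetical sketch of the PSAX configuration checks (not the real operator).
DEFAULT_NUM_SYMBOLS = 32
DEFAULT_ALPHABET_LEN = 128  # length of the documented default alphabet


class PSAXConfigError(ValueError):
    """Illustrative error type; the operator's real exception is not modeled."""


def check_params(algorithm="PSAX", lower_bound=None, upper_bound=None,
                 alphabet=None, num_symbols=None, has_window=False):
    """Raise PSAXConfigError for each invalid configuration in the list above."""
    if num_symbols is None:
        num_symbols = DEFAULT_NUM_SYMBOLS
    alphabet_len = DEFAULT_ALPHABET_LEN if alphabet is None else len(alphabet)

    if algorithm in ("SAX", "PSAX") and (lower_bound is None or upper_bound is None):
        raise PSAXConfigError("lowerBound and upperBound are required for SAX and PSAX")
    if lower_bound is not None and upper_bound is not None and upper_bound < lower_bound:
        raise PSAXConfigError("upperBound is less than lowerBound")
    if alphabet_len < num_symbols:
        raise PSAXConfigError("alphabet is shorter than numSymbols")
    if algorithm in ("PAA", "PSAX") and not has_window:
        raise PSAXConfigError("a window clause is required for PAA and PSAX")


def config_error(**kwargs):
    """Return the error message for a configuration, or None if it is valid."""
    try:
        check_params(**kwargs)
    except PSAXConfigError as err:
        return str(err)
    return None


print(config_error(algorithm="SAX", lower_bound=0.0, upper_bound=20.0))   # None
print(config_error(algorithm="PSAX", lower_bound=0.0, upper_bound=20.0))  # window error
```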

Windowing

The operator supports a tumbling window with any eviction policy (count, delta, or time-based). When the window is full, the PSAX operator applies the specified algorithm to the tuples in the window.
  • For the PAA and PSAX algorithms, the operator aggregates the values in the window and outputs the approximated time series and timestamp values through the getPAAValue() and getApproximatedTimestamp() output functions. For the PSAX algorithm, the operator then uses the aggregated time series value to determine the symbol, which can be output with the getSymbol() output function.
  • For the SAX algorithm, the operator determines a symbol for each time series value in the window. The getSymbol() output function returns a list of symbols, one for each time series value in the window.
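A count-based tumbling window can be sketched as follows. The sketch shows the shape of the output for each algorithm: one float64 per window for PAA, one symbol per window for PSAX, and a list of symbols per window for SAX. The helper names are hypothetical, the symbol mapping assumes equal-width ranges over [0, 20) with 32 symbols, and the trailing partial window is simply left unprocessed in this sketch.

```python
# Hypothetical sketch of per-window output shapes with a count-based tumbling window.
from statistics import fmean

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_symbol(value, lower, upper, num_symbols=32):
    width = (upper - lower) / num_symbols
    level = min(max(int((value - lower) // width), 0), num_symbols - 1)
    return ALPHABET[level]


def tumbling(values, n):
    """Yield consecutive, non-overlapping windows of n values."""
    window = []
    for v in values:
        window.append(v)
        if len(window) == n:  # window is full: process it, then evict every tuple
            yield window
            window = []


def run(values, n, algorithm, lower=0.0, upper=20.0):
    out = []
    for w in tumbling(values, n):
        if algorithm == "PAA":
            out.append(fmean(w))                                 # like getPAAValue()
        elif algorithm == "PSAX":
            out.append(to_symbol(fmean(w), lower, upper))        # one symbol per window
        elif algorithm == "SAX":
            out.append([to_symbol(v, lower, upper) for v in w])  # list of symbols per window
    return out


data = [2.0, 4.0, 3.0, 10.0, 12.0, 14.0, 1.0]  # last value stays in a partial window
print(run(data, 3, "PAA"))   # [3.0, 12.0]
print(run(data, 3, "PSAX"))  # ['E', 'T']
print(run(data, 3, "SAX"))   # [['D', 'G', 'E'], ['Q', 'T', 'W']]
```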

Example


namespace application ;

use com.ibm.streams.timeseries.analysis::PSAX ;

composite PSAXTest
{
  graph
    (stream<uint64 time, float64 data> SrcStream) as Src = Beacon()
    {
      param
        iterations : 100u;
      output
        SrcStream : time = IterationCount(), data = random() * 20f ;
    }
  
    (stream<uint64 time, rstring symbol, float64 origData>
      PSAXStream) as PSAXOp = PSAX(SrcStream as inPort0Alias)
    {
      param
        inputTimeSeries : data ;
        inputTimestamp : time;
        algorithm : SAX ;
        lowerBound : 0f ;
        upperBound : 20f ;
      output
        PSAXStream : 
          time = getApproximatedTimestamp(), 
          symbol = getSymbol(), 
          origData = data ;
    }

    () as SinkOp = Custom(PSAXStream)
    {
      logic
        onTuple PSAXStream:
        {
          println(PSAXStream);
        }
    }
}

Summary

Ports
This operator has 1 input port and 1 output port.
Windowing
This operator optionally accepts a windowing configuration.
Parameters
This operator supports 9 parameters.

Required: inputTimeSeries

Optional: algorithm, alphabet, inputTimestamp, lowerBound, numSymbols, partitionBy, upperBound, validate

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Always - Operator always provides a single-threaded execution context.

Input Ports

Ports (0)

Consumes time series data to be processed by the specified algorithm. The inputTimeSeries parameter specifies the name of the attribute on this port that contains the time series data. The accepted data type is float64.

Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes.
Output Functions
AggregateApproximationFunctions
<any T> T getSymbol()

This function returns the symbol(s) for the time series value(s). This output function is only supported when using either the SAX or PSAX algorithm. When using the PSAX algorithm, or the SAX algorithm without a window, the return type is rstring. When using the SAX algorithm with a window, the return type is list<rstring>.

<any T> T getSymbolLevel()

This function returns the symbol level(s) for the time series value(s). This output function is only supported when using either the SAX or PSAX algorithm. When using the PSAX algorithm, or the SAX algorithm without a window, the return type is rstring. When using the SAX algorithm with a window, the return type is list<rstring>.

float64 getPAAValue()

This function returns the PAA value. This output function is only supported when using either the PAA or PSAX algorithm.

<any T> T getApproximatedTimestamp()

This function returns the approximated timestamp. This output function is only supported when the inputTimestamp parameter is specified. When using the SAX algorithm, this output function will simply return the original input timestamp(s). When using the PAA or PSAX algorithms, this output function will calculate a timestamp value based on the input timestamps in the window. If using the SAX algorithm with windowing, the return type for this output function will be list<uint64>. Otherwise, the return type for this function is uint64.

<any T> T AsIs(T v)

The default function for output attributes. By default, this function assigns the output attribute to the value of the input attribute with the same name.

Ports (0)

Submits a tuple containing the results of the specified algorithm. Custom output functions are used to specify the submitted data.

  • For the PAA algorithm, supported output functions include getPAAValue() and getApproximatedTimestamp().
  • For the SAX algorithm, supported output functions include getSymbol(), getSymbolLevel(), and getApproximatedTimestamp().
  • For the PSAX algorithm, supported output functions include getPAAValue(), getSymbol(), getSymbolLevel(), and getApproximatedTimestamp().

Output attributes without an explicit assignment are assigned from the input attributes with the same name.

Parameters

algorithm
Specifies the algorithm that the PSAX operator uses. The following algorithms are supported:
  • PAA: Converts the real-valued input time series into a sequence of levels of type float64. This algorithm returns one value per window.
  • PSAX: Converts the real-valued input time series to discrete values that are represented by symbols. This algorithm uses the PAA algorithm internally and returns one symbol per window of the input time series. For example, if the input window is [117.0, 118.0, 116.0], the PSAX algorithm returns a single symbol, which is determined by the PAA value of the window and the values of the numSymbols, lowerBound, and upperBound parameters.
  • SAX: Converts the real-valued input time series to discrete values that are represented by symbols. If no window is specified, the algorithm returns one symbol for each input time series value. If a window is specified, the algorithm returns one symbol for each input time series value in the window. For example, if the input window is [117.0, 118.0, 116.0], this algorithm returns three symbols based on the values of the numSymbols, lowerBound, and upperBound parameters.
The default value for this parameter is PSAX.
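As a worked example of the [117.0, 118.0, 116.0] window, assume hypothetical bounds lowerBound = 100 and upperBound = 132 with numSymbols = 32, so that each symbol covers a range of width 1.0. Under the same assumptions as the other sketches (PAA is the window mean, symbols come from equal-width ranges), PAA produces 117.0, PSAX produces a single symbol, and SAX produces three.

```python
# Worked example with hypothetical bounds; not output captured from the operator.
from statistics import fmean

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
LOWER, UPPER, NUM_SYMBOLS = 100.0, 132.0, 32  # hypothetical: each range is 1.0 wide


def symbol(value):
    width = (UPPER - LOWER) / NUM_SYMBOLS
    level = min(max(int((value - LOWER) // width), 0), NUM_SYMBOLS - 1)
    return ALPHABET[level]


window = [117.0, 118.0, 116.0]
paa_value = fmean(window)
print(paa_value)                    # PAA:  117.0
print(symbol(paa_value))            # PSAX: R
print([symbol(v) for v in window])  # SAX:  ['R', 'S', 'Q']
```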

alphabet

Specifies the alphabet that should be used when outputting the symbols. The default alphabet used is: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890ΓΔΛΞΠΣΦΨΩαβγδεζηθλμξπσφωÂÊÎÔÛâêîôûÄËÏÖÜäëïöüÀÈÌÒÙàèìòùĀĒāēĪīŌōŪūÅ".

inputTimeSeries

Specifies the name of the attribute that contains the time series data in the input stream.

inputTimestamp

Specifies the name of the attribute in the input stream that contains the timestamp values. The supported data type is uint64.

lowerBound

Specifies the minimum value of the time series data. This parameter is required when the algorithm is SAX or PSAX.

numSymbols

Specifies the number of symbols that the SAX and PSAX algorithms can use. The default alphabet contains 128 symbols, so the maximum number of symbols that can be specified when using the default alphabet is 128. If more symbols are required, define a custom alphabet by using the alphabet parameter. The default value for this parameter is 32u.

partitionBy

Specifies the name of the attribute that contains the key values that are associated with the time series values in the input tuple.
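The documentation does not describe how partitionBy interacts with windows. A plausible reading, suggested by the per-key header in the printed symbol map (Symbol map for key: "default"), is that each key value keeps its own state. The following hypothetical sketch assumes an independent count-based tumbling window per key and computes one PAA value (the window mean) whenever a key's window fills.

```python
# Hypothetical sketch: independent tumbling windows per partition key.
from collections import defaultdict
from statistics import fmean


def paa_by_key(tuples, n):
    """tuples: iterable of (key, value) pairs.

    Returns (key, PAA value) pairs in the order in which each key's window fills.
    """
    windows = defaultdict(list)
    out = []
    for key, value in tuples:
        w = windows[key]
        w.append(value)
        if len(w) == n:  # this key's window is full
            out.append((key, fmean(w)))
            w.clear()    # tumbling: evict all tuples for this key only
    return out


stream = [("s1", 1.0), ("s2", 10.0), ("s1", 3.0), ("s2", 20.0), ("s1", 5.0)]
print(paa_by_key(stream, 2))  # [('s1', 2.0), ('s2', 15.0)]
```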

upperBound

Specifies the maximum value of the time series data. This parameter is required when the algorithm is SAX or PSAX.

validate

Determines how the input time series data should be validated when using either the SAX or PSAX algorithm. There are three valid values: fast, permissive, strict.

  • When the parameter value is strict, an exception will be thrown if the input time series value falls outside of the range defined by the lowerBound and upperBound parameter values. An error message will be logged and the operator will terminate.
  • When the parameter value is permissive, a warning is logged if the input time series value falls outside of the range defined by the lowerBound and upperBound parameter values. The out-of-range value is then bound to either the lowerBound or the upperBound value. For example, assume the lowerBound value is -10 with a corresponding symbol of "A". If an input time series value of -15 arrives, it is treated as if its value were -10, so its corresponding symbol is also "A".
  • When the parameter value is fast, no errors or warnings will be logged if the value falls outside of the range defined by the lowerBound and upperBound parameter values. The input time series value will be treated in the same way as described for the permissive case.
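The three validation modes can be sketched as follows. The function is hypothetical. It assumes equal-width ranges, treats values in [lowerBound, upperBound] as in range, and maps a value equal to upperBound to the last symbol, because the documentation does not say how that edge case is handled. The strict branch raises an exception, the permissive branch logs a warning and clamps, and the fast branch clamps silently.

```python
# Hypothetical sketch of the strict, permissive, and fast validation modes.
import logging

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def symbol_for(value, lower, upper, num_symbols=32, validate="fast"):
    """Map value to a symbol, handling out-of-range input per the validate mode."""
    if not lower <= value <= upper:
        if validate == "strict":
            # strict: reject the value outright
            raise ValueError(f"value {value} is outside [{lower}, {upper}]")
        if validate == "permissive":
            # permissive: warn, then fall through to clamping
            logging.warning("value %s is outside [%s, %s]; clamping", value, lower, upper)
        # permissive and fast: bind the value to the nearest bound
        value = min(max(value, lower), upper)
    width = (upper - lower) / num_symbols
    # value == upper would land one past the last range; map it to the last symbol
    level = min(int((value - lower) // width), num_symbols - 1)
    return ALPHABET[level]


print(symbol_for(-15.0, -10.0, 10.0, validate="permissive"))  # A, after a warning
print(symbol_for(-15.0, -10.0, 10.0, validate="fast"))        # A, silently
print(symbol_for(25.0, -10.0, 10.0, validate="fast"))         # f (last of 32 symbols)
```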

Code Templates

PSAX using SAX

stream<uint64 time, rstring symbol> PSAXStream = PSAX(${inputStream1}) 
{
	param
		inputTimeSeries: ${inputTimeseriesAttribute};
		inputTimestamp: ${inputTimestampAttribute};
		algorithm: SAX;
		lowerBound: ${lowerBoundValue};
		upperBound: ${upperBoundValue};
	output
		PSAXStream: time = getApproximatedTimestamp(), symbol = getSymbol();
}
      

PSAX using PSAX

stream<uint64 time, float64 paaValue, rstring symbol> PSAXStream = PSAX(${inputStream1}) 
{
	window 
		${inputStream1} : ${windowMode};
	param
		inputTimeSeries: ${inputTimeseriesAttribute};
		inputTimestamp: ${inputTimestampAttribute};
		algorithm: PSAX;
		lowerBound: ${lowerBoundValue};
		upperBound: ${upperBoundValue};
	output
		PSAXStream: time = getApproximatedTimestamp(), paaValue = getPAAValue(), symbol = getSymbol();
}
      

PSAX using PAA

stream<uint64 time, float64 paaValue> PSAXStream = PSAX(${inputStream1}) 
{
	window 
		${inputStream1} : ${windowMode};
	param
		inputTimeSeries: ${inputTimeseriesAttribute};
		inputTimestamp: ${inputTimestampAttribute};
		algorithm: PAA;
	output
		PSAXStream: time = getApproximatedTimestamp(), paaValue = getPAAValue();
}
      

Libraries

No description for library.
Library Name: tsatapi
Library Path: ../../../impl/lib
Include Path: ../../../impl/include