IBM Streams 4.2.1

Operator Distribution

The Distribution operator calculates the quartile distribution for an input time series.

A quartile distribution divides the sorted values of the time series into four groups of equal size. The first quartile, or lower quartile, is the value below which 25% of the values lie. The second quartile, or median, divides the distribution into two equal halves and is treated as the average value. The third quartile, or upper quartile, has 75% of the values below it and 25% of the values above it. An outlier is a value that lies outside the range that is defined by the quartile distribution.
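
As a rough illustration of these definitions, the following Python sketch computes the three quartiles of a small sample with the standard library. The sample values and the use of statistics.quantiles with the inclusive method are assumptions made for this example. The interpolation rule that the Distribution operator applies is not stated in this topic and might differ.

```python
import statistics

# Hypothetical sample, shown in sorted order for readability.
sample = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]

# With n=4, quantiles() returns the three cut points that split the
# sample into four groups: first quartile, median, third quartile.
first_quartile, median, third_quartile = statistics.quantiles(
    sample, n=4, method="inclusive"
)

print(first_quartile, median, third_quartile)  # 30.0 50.0 70.0
```

With nine evenly spaced values, each quartile falls exactly on a sample value, so no interpolation is involved in this case.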

The format of the quartile distribution is [n1, m1, ...], a flat list of value and count pairs in which n1 is an input time series value and m1 is the number of occurrences of n1 in the time series. For example, if the input time series is 20, 90, 90, 100, 20, 90, the Distribution operator calculates the quartile distribution as [20, 2, 90, 3, 100, 1].
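
The value and count layout can be reproduced outside of Streams. The following Python sketch is only an illustration of the format, not the operator's implementation; the function name value_count_list is hypothetical.

```python
from collections import Counter


def value_count_list(series):
    """Flatten a series into [n1, m1, n2, m2, ...], ordered by value."""
    counts = Counter(series)
    flat = []
    for value in sorted(counts):
        # Each distinct value is followed by its number of occurrences.
        flat.extend([value, counts[value]])
    return flat


print(value_count_list([20, 90, 90, 100, 20, 90]))  # [20, 2, 90, 3, 100, 1]
```

The operator returns the list as list<float64>, so both the values and the counts arrive as floating-point numbers in SPL.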

You can use the values that the Distribution operator calculates to classify data according to how far each value lies from the average (median) value.
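
One possible way to perform such a classification is sketched below in Python. The band labels, the function name quartile_band, and the rule that a value equal to a quartile belongs to the upper band are all assumptions for this example. The quartile values are hypothetical stand-ins for values read from the operator output.

```python
import bisect

BANDS = [
    "below first quartile",
    "first quartile up to median",
    "median up to third quartile",
    "third quartile or above",
]


def quartile_band(value, first_quartile, median, third_quartile):
    """Return the label of the quartile band that contains value."""
    cuts = [first_quartile, median, third_quartile]
    # bisect_right counts how many cut points are <= value (0 to 3).
    return BANDS[bisect.bisect_right(cuts, value)]


# Hypothetical quartiles: 30.0, 50.0, 70.0.
for reading in (25.0, 55.0, 90.0):
    print(reading, "->", quartile_band(reading, 30.0, 50.0, 70.0))
```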

Behavior in a consistent region

  • The operator is not supported in a consistent region. A warning occurs when you compile your streams processing application.
  • The operator cannot be the start of a consistent region. An error occurs when you compile your streams processing application.

Exceptions

The operator has no exceptions. All errors and warnings appear in the processing element log file.

Examples

use com.ibm.streams.timeseries.analysis::Distribution;

composite Main {
  graph
    // Read (data, index) pairs from a CSV file.
    stream<float64 data, int32 index> Sample1 = FileSource() {
      param
        file   : "tsList0.dat";
        format : csv;
    }

    // Calculate the distribution over the last 11 values of each partition.
    stream<float64 median, float64 firstQuartile, list<float64> distribution> Out1 = Distribution(Sample1) {
      window
        Sample1 : sliding, count(11), partitioned;
      param
        inputTimeSeries : Sample1.data;
        partitionBy     : index;
        minValue        : 0;
        maxValue        : 2200;
      output
        Out1 : median        = distMedian(),
               firstQuartile = firstQuartile(),
               distribution  = distribution();
    }

    // Write the results to a CSV file.
    () as writer = FileSink(Out1) {
      param
        file   : "MedianResults.dat";
        format : csv;
    }
}

Summary

Ports
This operator has 1 input port and 1 output port.
Windowing
This operator requires a windowing configuration.
Parameters
This operator supports 4 parameters.

Required: inputTimeSeries

Optional: maxValue, minValue, partitionBy

Metrics
This operator reports 3 metrics.

Properties

Implementation
C++
Threading
WindowEvictionBound - Operator provides a single threaded execution context only if a time-based window eviction policy is not used.

Input Ports

Ports (0)

This port consumes time series data for calculating the quartile distribution. The inputTimeSeries parameter specifies the name of the attribute on this port that contains the time series data. The accepted data type is float64.

Windowing

The Distribution operator supports a sliding, partitioned window with a count-based eviction policy and a count-based trigger policy of 1. Each time a tuple is inserted into the window, the operator recalculates the quartile distribution and outputs it together with the median and outlier values. For partitioned windows, the operator maintains one window per partition and calculates the distribution only from the input time series values that belong to that partition, as identified by the partitionBy parameter.
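
The per-partition windowing behavior can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the operator's implementation: the window size of 3, the partition keys, the input values, and the on_tuple function are hypothetical, and only the median is recalculated to keep the example short.

```python
import statistics
from collections import defaultdict, deque

WINDOW_SIZE = 3  # hypothetical count-based eviction size

# One bounded window per partition key. A deque with maxlen drops the
# oldest value automatically when a new value arrives at a full window.
windows = defaultdict(lambda: deque(maxlen=WINDOW_SIZE))


def on_tuple(key, value):
    """Insert value into its partition's window and return that window's median."""
    window = windows[key]
    window.append(value)
    # A trigger policy of 1 means a result is produced for every tuple.
    return statistics.median(window)


# Hypothetical (partition key, value) pairs in arrival order.
arrivals = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("a", 5.0), ("a", 7.0)]
medians = [on_tuple(key, value) for key, value in arrivals]
print(medians)  # [1.0, 10.0, 2.0, 3.0, 5.0]
```

The last result is 5.0 because the fourth value for partition "a" evicts 1.0, which leaves 3.0, 5.0, and 7.0 in that window. Values for partition "b" never affect the results for partition "a".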

Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes.
Output Functions
DistributionFct
<any T> T AsIs(T v)

Default function

float64 distMedian()

This function returns the median value that is calculated so far.

float64 firstQuartile()

This function returns the first quartile value of the current distribution.

float64 smallestNonOutlier()

This function returns the smallest non-outlier of the current distribution.

float64 largestNonOutlier()

This function returns the largest non-outlier of the current distribution.

float64 thirdQuartile()

This function returns the third quartile value of the current distribution.

list<float64> distribution()

This function returns a list<float64> value that contains the calculated distribution. If the input is partitioned, this function returns the distribution that is calculated so far from the input time series values that belong to the same partition. The format of the distribution is [n1, m1, ...], where n1 is an input time series value and m1 is the number of occurrences of n1. For example, if an input time series arrives in the order 16, 17, 16, the calculated distribution is [16, 2, 17, 1], which is sorted in ascending order of value.
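
A consumer of the distribution() result typically needs to split the flat list back into value and count pairs. The following Python sketch shows one way to do that; the function name decode_distribution is hypothetical, and the input list mirrors the example above as floating-point values.

```python
def decode_distribution(flat):
    """Turn [n1, m1, n2, m2, ...] into [(n1, m1), (n2, m2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of entries")
    # Values sit at even positions and their counts at the odd positions after them.
    return [(flat[i], int(flat[i + 1])) for i in range(0, len(flat), 2)]


pairs = decode_distribution([16.0, 2.0, 17.0, 1.0])
print(pairs)  # [(16.0, 2), (17.0, 1)]

# The counts add up to the number of values in the distribution.
total = sum(count for _, count in pairs)
print(total)  # 3
```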

Ports (0)

Parameters

This operator supports 4 parameters.

Required: inputTimeSeries

Optional: maxValue, minValue, partitionBy

inputTimeSeries

Specifies the name of the attribute that contains the time series data in the input tuple. The supported types are float64, for a single value, and list<float64>.

maxValue

Specifies the maximum value that can be expected as input. The default value is 500. Any input time series value that is greater than the maximum value is ignored, and a warning message is written to the log.

minValue

Specifies the minimum value that can be expected as input. The default value is 0. Any input time series value that is less than the minimum value is ignored, and a warning message is written to the log.

partitionBy

Specifies the name of the attribute that contains the key values that are associated with the time series values in the input tuple.

Code Templates

Sort
stream<${schema}> ${streamName} = Sort(${inputStream}) {
  window
    ${inputStream}: ${windowMode};
  param
    sortBy : ${sortExpression};
}

Metrics

nCurrentPartitions - Gauge

The number of partitions in the current sliding window.

nMaxValuesIgnored - Counter

The number of values that are ignored in the input time series because the values are greater than the value that is specified in the maxValue parameter.

nMinValuesIgnored - Counter

The number of values that are ignored in the input time series because the values are less than the value that is specified in the minValue parameter.
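
The relationship between the minValue and maxValue parameters and the two counters can be sketched as follows in Python. This is an illustrative model only: the function name filter_in_range and the input values are hypothetical, the bounds use the documented defaults of 0 and 500, and the dictionary keys reuse the metric names for readability.

```python
MIN_VALUE = 0.0    # documented default of the minValue parameter
MAX_VALUE = 500.0  # documented default of the maxValue parameter


def filter_in_range(values, min_value=MIN_VALUE, max_value=MAX_VALUE):
    """Keep values inside [min_value, max_value] and count the ignored ones."""
    kept = []
    ignored = {"nMinValuesIgnored": 0, "nMaxValuesIgnored": 0}
    for value in values:
        if value < min_value:
            ignored["nMinValuesIgnored"] += 1
        elif value > max_value:
            ignored["nMaxValuesIgnored"] += 1
        else:
            # Values equal to either bound are kept.
            kept.append(value)
    return kept, ignored


kept, counters = filter_in_range([-5.0, 0.0, 120.0, 500.0, 730.0, 999.0])
print(kept)      # [0.0, 120.0, 500.0]
print(counters)  # {'nMinValuesIgnored': 1, 'nMaxValuesIgnored': 2}
```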

Libraries

No description for library.
Library Name: tsatapi
Library Path: ../../../impl/lib
Include Path: ../../../impl/include