Operator Compress

The Compress operator is used to compress data in a blob and generate blob output.

Checkpointed data

When the Compress operator is checkpointed, logic state variables (if present) are saved in the checkpoint.

Behavior in a consistent region

The Compress operator can be used in a consistent region. It cannot be used as the start of the region. In a consistent region, a Compress operator stores its state when a checkpoint is taken. When the region is reset, the operator restores the state from the checkpoint.

In a consistent region, it is recommended that each sequence creates a block of data that is decompressed as a unit. For example, write the output of each sequence to a separate file. The drain action is to complete generation of tuples for all data received, including any end-of-stream data that the compression algorithm requires. At receipt of the first tuple in any sequence, a new compressed stream is generated, including any start-of-stream data that the compression algorithm requires.

The generation of a single compressed stream from multiple consistent region sequences is not supported.
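
As a rough sketch of the placement rules above, the following composite puts a Compress operator inside a consistent region that starts at a Beacon operator. The composite name, the periodic trigger, and the 5-second period are illustrative assumptions, not values taken from this page.

composite ConsistentMain {
  graph
    // Assumption: applications that use consistent regions include a JobControlPlane operator.
    () as JCP = JobControlPlane() {}

    // The region starts here, not at the Compress operator.
    @consistent(trigger = periodic, period = 5.0)
    stream<rstring a, int32 b> A = Beacon() {
      param iterations : 100;
    }
    stream<blob b> B = Format (A) {
      param format : txt;
      output B     : b = Output();
    }
    // Compress sits downstream of the start operator, inside the region.
    stream<blob b> C = Compress (B) {
      param compression : gzip;
    }
}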

Checkpointing behavior in an autonomous region

When the Compress operator is in an autonomous region and is configured with the config checkpoint : periodic(T) clause, a background thread in the SPL runtime checkpoints the operator every T seconds. This periodic checkpointing is asynchronous to tuple processing. Upon restart, the operator restores its state from the last checkpoint.
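
For illustration, a periodic checkpoint could be requested on a Compress invocation roughly as follows. The stream names B and C mirror the example in the Examples section. The 30-second period and the restartable setting are assumptions chosen for this sketch.

    stream<blob b> C = Compress (B) {
      param  compression : gzip;
      config checkpoint  : periodic(30.0); // arbitrary period: checkpoint every 30 seconds
             restartable : true;           // assumption: the operator is restarted after a failure
    }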

When the Compress operator is in an autonomous region and is configured with the config checkpoint : operatorDriven clause, no checkpoint is taken at run time. Upon restart, the operator returns to its initial state.

Such checkpointing behavior is subject to change in the future.

Examples

This example uses the Compress operator.


composite Main {
  graph
    stream<rstring a, int32 b> A = Beacon() {
      param iterations : 100;
    }
    stream<blob b> B = Format (A) {
      param format : txt;
      output B     : b = Output();
    }
    stream<blob b> C = Compress (B) {
      param compression : gzip;
      // compressionInput defaults to 'b', as there is only 1 input attribute
    }
    // Write it to a file
    () as Nul = FileSink (C) {
      param file   : "out";
            format : block;
    }
}

// This example is equivalent to the following SPL program:
composite Main2 {
  graph
    stream<rstring a, int32 b> A = Beacon() {
      param iterations : 100;
    }
    // Write it to a file
    () as Nul = FileSink (A) {
      param file        : "out";
            format      : txt;
            compression : gzip;
    }
}

Summary

Ports
This operator has 1 input port and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 3 parameters.

Required: compression

Optional: compressionInput, flushOnPunct

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Always - Operator always provides a single-threaded execution context.

Input Ports

Ports (0)

The Compress operator is configurable with a single input port, which ingests tuples that contain data to be compressed.

Output Ports

Assignments
This operator does not allow assignments to output attributes.
Ports (0)

The Compress operator is configurable with a single output port, which produces tuples that contain compressed data.

Parameters

This operator supports 3 parameters.

Required: compression

Optional: compressionInput, flushOnPunct

compression

Specifies the compression algorithm that the operator uses to compress the input data, for example gzip.

compressionInput

Specifies the data to be compressed. If this parameter is not specified, the input stream must consist of a single blob attribute.
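
The following sketch shows a case where compressionInput is needed because the input stream has more than one attribute. The stream names, attribute names, and sample values are hypothetical, and the use of convertToBlob to build the blob is an assumption made only to keep the sketch self-contained.

    // Hypothetical input stream with two attributes; only 'payload' is compressed.
    stream<rstring name, blob payload> Records = Beacon() {
      param  iterations : 10;
      output Records : name = "sample", payload = convertToBlob("some text to compress");
    }
    stream<blob b> Compressed = Compress (Records) {
      param compression      : gzip;
            compressionInput : payload; // names the blob to compress, since there are two input attributes
    }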

flushOnPunct

Specifies when the compression is completed.

If the parameter value is true, the compression is completed when a window punctuation is received. The operator generates any remaining compressed data, followed by a window punctuation. Any subsequent input tuple resets the compression to its initial state.

If the parameter is not specified or the value is false, the compression is completed when a final punctuation is received.
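
As a sketch of the flushOnPunct : true case, the following fragment completes the compression at each window punctuation and writes each result to its own file. It assumes that stream B carries window punctuation between batches. The sink name, the file name pattern, and the FileSink closeMode setting are choices made for this illustration.

    // Assumption: stream B carries window punctuation between logical batches.
    stream<blob b> C = Compress (B) {
      param compression  : gzip;
            flushOnPunct : true; // complete the compression at each window punctuation
    }
    // Close the current file at each punctuation so that every file can be decompressed on its own.
    () as Sink = FileSink (C) {
      param file      : "out_{id}.gz";
            format    : block;
            closeMode : punct;
    }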

Code Templates

Compress

stream<blob> ${outputStream} = Compress(${inputStream})   {
            param
                compression : ${algorithm};
                compressionInput : ${inputAttribute};
        }
      

Libraries

spl-std-tk-lib
Library Name: streams-stdtk-runtime
Include Path: ../../../impl/include