Filtering large data sets
In this example, the stream processing application needs to filter the stock transaction data for IBM® transaction records. You use the Filter operator to extract relevant information from potentially large volumes of data. As shown, the input for the Filter operator is all the transactions; the output is only the IBM transactions.

In SPL terminology, a tuple is a unit of data for an operator. In this example, each transaction record is a tuple. A stream is a sequence of tuples. In this example, "All transactions" is an input stream, and "Only IBM transactions" is an output stream. An operator transforms an input stream into an output stream. In this example, the operator filters the data by processing each tuple from the input stream and submitting the tuple to the output stream only if it is an IBM transaction.
Suppose that each transaction record (that is, a tuple) contains the following four fields. Each field in a record corresponds to an attribute in a tuple.
| Field | Type | Name | Description |
|---|---|---|---|
| 1 | string | ticker | ticker name |
| 2 | string | date | transaction date |
| 3 | string | time | transaction time |
| 4 | decimal | price | trading price |
In SPL code, the transaction record can be represented by the following TransactionRecord type:
type
    TransactionRecord = rstring ticker,
                        rstring date,
                        rstring time,
                        decimal64 price;
Here, rstring is a sequence of raw bytes that supports string processing when the character encoding is known, and decimal64 is an IEEE 754 decimal 64-bit floating-point number.
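For readers who find it easier to think in a general-purpose language, the record layout can be modeled roughly in Python. This is only an illustrative stand-in, not SPL: the field names come from the table above, while the use of str for rstring, Decimal for decimal64, and the sample values are assumptions made for the sketch.

```python
from decimal import Decimal
from typing import NamedTuple


class TransactionRecord(NamedTuple):
    """Rough Python stand-in for the SPL TransactionRecord tuple type."""
    ticker: str     # rstring in SPL (raw bytes); modeled as str for readability
    date: str       # rstring in SPL
    time: str       # rstring in SPL
    price: Decimal  # decimal64 in SPL; Decimal only approximates it


# Hypothetical sample tuple; the date and time formats are made up.
record = TransactionRecord("IBM", "2024-01-02", "09:30:00", Decimal("161.50"))
print(record.ticker, record.price)
```

Each attribute of the SPL tuple maps to one named field, so record.ticker plays the role that the ticker attribute plays in SPL expressions.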
The following SPL code invokes the Filter operator. To read the code, you say that the output stream is produced by operating on the input stream. In this case, you say that IBMTransactions is produced by filtering AllTransactions.
stream<TransactionRecord> IBMTransactions = Filter(AllTransactions) {
    param
        filter : ticker == "IBM";
}
In general, the Filter operator receives tuples from an input stream and submits a tuple to the output stream only if the tuple satisfies the criteria that are specified by the filter parameter.
In this example, the Filter operator performs the following steps:
1. Receives a tuple from the input stream (AllTransactions).
2. If the value of the ticker attribute is IBM, submits the tuple to the output stream (IBMTransactions).
3. Repeats steps 1 and 2 until all the tuples from the input stream are processed.
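The steps above can be sketched as a small Python generator. This is a conceptual model of the behavior, not how the Filter operator is implemented; the function name filter_ibm and the sample records (including the XYZ ticker) are hypothetical.

```python
from decimal import Decimal
from typing import Iterable, Iterator, NamedTuple


class TransactionRecord(NamedTuple):
    ticker: str
    date: str
    time: str
    price: Decimal


def filter_ibm(all_transactions: Iterable[TransactionRecord]) -> Iterator[TransactionRecord]:
    """Model of the Filter invocation: pass through only IBM tuples."""
    for record in all_transactions:   # receive a tuple from the input stream
        if record.ticker == "IBM":    # evaluate the filter expression
            yield record              # submit the tuple to the output stream
    # The loop ends once the input has no more tuples.


# Hypothetical input stream with made-up values.
all_transactions = [
    TransactionRecord("IBM", "2024-01-02", "09:30:00", Decimal("161.50")),
    TransactionRecord("XYZ", "2024-01-02", "09:30:01", Decimal("10.00")),
    TransactionRecord("IBM", "2024-01-02", "09:30:02", Decimal("161.55")),
]

ibm_transactions = list(filter_ibm(all_transactions))
for record in ibm_transactions:
    print(record)
```

Because the generator handles one tuple at a time, it never needs the whole input in memory, which mirrors why a filter of this kind suits large volumes of data.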
The Filter operator requires that the type of the output stream is the same as the type of the input stream. The type of the output stream is specified by the tupleType in the "stream<tupleType> OutputStream = Filter(InputStream)" declaration. In this example, the type of the output and input streams is TransactionRecord.
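One way to see why the two stream types match is that a filter only selects tuples and never changes them. The generic Python sketch below illustrates that idea under stated assumptions: filter_stream, the type variable T, and the dictionary records are all hypothetical names chosen for the example, and nothing here is part of SPL.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")  # one type variable shared by the input and the output


def filter_stream(input_stream: Iterable[T],
                  predicate: Callable[[T], bool]) -> Iterator[T]:
    """Yield each input tuple unchanged when the predicate holds for it."""
    for tup in input_stream:
        if predicate(tup):
            yield tup


# Hypothetical records with made-up values.
records = [
    {"ticker": "IBM", "price": "161.50"},
    {"ticker": "XYZ", "price": "10.00"},
]
selected = list(filter_stream(records, lambda r: r["ticker"] == "IBM"))
print(selected)
```

The signature uses the same T on both sides, which is the Python analogue of declaring the output stream with the input stream's tuple type.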