Dynamic filter expressions in SPL
GabrielaJS 270004FR6S Visits (2692)
A new feature of InfoSphere Streams 3.0 allows dynamic filter expressions for applications that use
A recurring application sharing scenario is when different consuming applications are interested in processing different subsets of the exported stream. Prior to Streams 3.0, the importing application would receive all tuples available in the exported stream. This would result in waste of network resources, as the whole stream was transmitted but a filtering operation on the consuming application side would immediately discard many tuples. In a scenario where there are many different consumers, transferring the full stream multiple times wastes a significant amount of resources.
To reduce network transfers, developers can take advantage of dynamic filter expressions in Import operators. During application instantiation, the filtering expression is effectively shipped to the Export operator side. During runtime, the Export operator evaluates the filtering expression to decide which tuples should be transmitted to the consuming application.
The following figures show some SPL code using this new feature. All examples use a stream of type Schema declared as "int64 streamSubset, rstring stringSubset, uint32 random".
The segment below shows an Export operator that exports the stream produced by a Custom operator, which consumes a stream produced by a FileSource operator. In this example, the Custom operator forwards downstream all incoming tuples without doing any specific transformation. In reality, developers may substitute this operator with any arbitrary SPL topology. The invocation of the Export operator does not need to change from prior versions of Streams.
The file "sample.dat" has the following 10 lines:
In the Import side, one must now use the filter parameter, as in the example below. This instance of the Import operator receives only tuples where the streamSubset attribute has value 1 and the stringSubset attribute has value “streams”. This filtered, imported stream is processed by a Custom operator, which then sends the output directly to a FileSink. As in the example above, the Custom operator just illustrates a sample topology. The filter parameter in Import allows the construction of more complex expressions, similar to the subscription parameter.
The figure below shows the Streams instance graph when running the applications above. To illustrate the power of the dynamic filter expressions, we also run two other applications. The applications are similar to the importer application above, but use the following filtering expressions:stringSubset == "sources" || stringSubset == "sinks"
streamSubset == 2 && stringSubset == "operator"
As highlighted by the red rectangle, the Custom operator of one of the importing applications (right side) receives only 2 out of the 10 tuples submitted to the Export operator (10 lines in ‘sample.dat’). The filtering for the “streams” keyword allows 4 tuples to be transmitted and the filtering for “sources” or “sinks” keywords allows 3 tuples to be transmitted. The total number of tuples transmitted by the Export operator using filtering is 9, while a configuration without filtering would transfer 30 tuples.
Summary: When the application consumes only a subset of the tuples of an exported stream, use the filter parameter of the Import operator to save network resources.