IBM InfoSphere Streams Version 4.1.0

Developing another simple streams processing application

In addition to filtering large data sets, you can reduce the volume of data by reducing the size of the tuples from the input stream.

About this task

For example, if a source file has tuples that contain many attributes, and downstream operators use only a few of them, you can create a new stream that contains only the attributes that are needed. In the MyFilterApp example, the tuple type contains only four attributes (ticker, date, time, price), as shown in the SmallDataType type. Suppose instead that the tuple type contained many more attributes, such as the 30 attributes of the BigDataType type that is shown in the following example.

   type
      BigDataType = rstring ticker, rstring date, rstring time, int32 gmtOffset,
                    rstring ttype, rstring exCntrbID, decimal64 price,
                    decimal64 volume, decimal64 vwap, rstring buyerID,
                    decimal64 bidprice, decimal64 bidsize, int32 numbuyers,
                    rstring sellerID, decimal64 askprice, decimal64 asksize,
                    int32 numsellers, rstring qualifiers, int32 seqno,
                    rstring exchtime, decimal64 blockTrd, decimal64 floorTrd,
                    decimal64 PEratio, decimal64 yield, decimal64 newprice,
                    decimal64 newvol, int32 newseqno, decimal64 bidimpvol,
                    decimal64 askimpcol, decimal64 impvol;
      SmallDataType = rstring ticker, rstring date, rstring time, decimal64 price;

In this example, you create a streams processing application that reads data from a source file, reduces the size of the tuples, and then writes the reduced data to an output file.

Procedure

  1. Reading data from a source file
  2. Reducing the size of tuples
  3. Writing data to an output file
  4. Combining the operators into a graph
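
The four steps can be sketched together as a single composite. The following is a minimal sketch under stated assumptions, not the definitive application: the composite name MyReduceApp, the stream names BigData and SmallData, and the file names BigTradesData.csv and SmallTradesData.csv are hypothetical and chosen only for illustration. The sketch relies on the Functor operator forwarding each input attribute to the output attribute that has the same name and type, so no explicit output clause is written.

```spl
// Hypothetical composite name; not taken from the product samples.
composite MyReduceApp {
   type
      // Type definitions copied from the example above.
      BigDataType = rstring ticker, rstring date, rstring time, int32 gmtOffset,
                    rstring ttype, rstring exCntrbID, decimal64 price,
                    decimal64 volume, decimal64 vwap, rstring buyerID,
                    decimal64 bidprice, decimal64 bidsize, int32 numbuyers,
                    rstring sellerID, decimal64 askprice, decimal64 asksize,
                    int32 numsellers, rstring qualifiers, int32 seqno,
                    rstring exchtime, decimal64 blockTrd, decimal64 floorTrd,
                    decimal64 PEratio, decimal64 yield, decimal64 newprice,
                    decimal64 newvol, int32 newseqno, decimal64 bidimpvol,
                    decimal64 askimpcol, decimal64 impvol;
      SmallDataType = rstring ticker, rstring date, rstring time, decimal64 price;

   graph
      // 1. Read 30-attribute tuples from a CSV file (file name is an assumption).
      stream<BigDataType> BigData = FileSource() {
         param
            file   : "BigTradesData.csv";
            format : csv;
      }

      // 2. Reduce the tuple size. The output type has only four attributes;
      //    each one is filled from the input attribute with the same name and type.
      stream<SmallDataType> SmallData = Functor(BigData) {
      }

      // 3. Write the reduced tuples to an output file (file name is an assumption).
      () as Sink = FileSink(SmallData) {
         param
            file   : "SmallTradesData.csv";
            format : csv;
      }
}
```

In this sketch, the graph clause is what combines the three operators: each operator consumes the stream that the previous operator produces, so the downstream FileSink operator handles only 4 of the original 30 attributes.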