Defining custom operators
Operators are one of the basic building blocks of IBM® InfoSphere® DataStage® parallel jobs. You can define your own custom operators by extending the library of available operators.
Operators underlie the stages in a parallel job. A single stage might correspond to one operator or to several, depending on the properties you have set and on whether you have chosen to partition, collect, or sort data on the input link to the stage. At compilation, InfoSphere DataStage evaluates your job design and sometimes removes operators that it judges to be superfluous, or inserts other operators that the logic of the job requires.
Many predefined operators are available; they are described in Operators. You can also define your own operators, which you can then incorporate into parallel job stages by using a Custom stage type.
You define new operators by deriving your own classes from the parallel job C++ class library. You can derive three types of classes:
- General operators
  You create operators to execute custom application logic as part of an InfoSphere DataStage job. Your operators can execute in parallel or execute sequentially.
- Partitioner operators
  You create partitioner operators to implement specific partitioning algorithms. The derived partitioners can then be used by parallel operators.
- Collector operators
  You create collector operators to implement specific collection methods. A collection method defines how the partitions of a data set are combined for input to a sequential operator.
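The relationship between the three kinds of derived classes can be illustrated with a minimal, self-contained C++ sketch. It does not use the parallel job C++ class library. `Record`, `OperatorBase`, `PartitionerBase`, `CollectorBase`, `runPipeline`, and the three derived classes are hypothetical stand-ins invented for this illustration. The real base classes and the virtual functions you override are defined by the class library and are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: a record is modeled as a plain string.
using Record = std::string;

// Hypothetical base class for a general operator (custom per-record logic).
class OperatorBase {
public:
    virtual ~OperatorBase() = default;
    virtual Record process(const Record& in) const = 0;
};

// Hypothetical base class for a partitioner (chooses a partition per record).
class PartitionerBase {
public:
    virtual ~PartitionerBase() = default;
    virtual std::size_t partitionFor(const Record& rec,
                                     std::size_t numPartitions) const = 0;
};

// Hypothetical base class for a collector (combines partitions into one stream).
class CollectorBase {
public:
    virtual ~CollectorBase() = default;
    virtual std::vector<Record> collect(
        const std::vector<std::vector<Record>>& partitions) const = 0;
};

// General operator: converts ASCII lowercase letters to uppercase.
class UppercaseOperator : public OperatorBase {
public:
    Record process(const Record& in) const override {
        Record out = in;
        for (char& c : out) {
            if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
        }
        return out;
    }
};

// Partitioner: assigns a record to a partition by its length modulo the count.
class LengthPartitioner : public PartitionerBase {
public:
    std::size_t partitionFor(const Record& rec,
                             std::size_t numPartitions) const override {
        assert(numPartitions > 0);
        return rec.size() % numPartitions;
    }
};

// Collector: reads partition 0 completely, then partition 1, and so on.
class OrderedCollector : public CollectorBase {
public:
    std::vector<Record> collect(
        const std::vector<std::vector<Record>>& partitions) const override {
        std::vector<Record> out;
        for (const auto& p : partitions) {
            out.insert(out.end(), p.begin(), p.end());
        }
        return out;
    }
};

// Toy driver: partition the input, apply the operator to every record in
// each partition, then collect the partitions into a single output.
std::vector<Record> runPipeline(const std::vector<Record>& input,
                                std::size_t numPartitions,
                                const PartitionerBase& partitioner,
                                const OperatorBase& op,
                                const CollectorBase& collector) {
    std::vector<std::vector<Record>> partitions(numPartitions);
    for (const auto& rec : input) {
        partitions[partitioner.partitionFor(rec, numPartitions)].push_back(rec);
    }
    for (auto& p : partitions) {
        for (auto& rec : p) rec = op.process(rec);
    }
    return collector.collect(partitions);
}
```

In this sketch, the partitioner decides where each record goes, the general operator runs the custom logic on each partition, and the collector determines the order in which partitions are merged for a sequential consumer. The toy driver processes partitions one after another in a loop; it only models the data flow and does not execute anything in parallel.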