Limitations and risks
When you use combinable operators, you must accept the following limitations and risks.
- Operators that are combined have to run on the same nodes with the same degree of parallelism. This makes them unsuitable for some applications.
- Operators that are combined cannot form a multi-process pipeline, which can make combinable operators slower than non-combinable operators in cases where pipeline parallelism confers a performance advantage.
- It is the responsibility of the programmer to make sure that no records are lost by the overwriting of output from a previous write, and that the input is not reused. If there is more than one output per input, the other outputs must be queued for later use using requestWriteOutputRecord().
- A combinable operator must not use any static or global variables, because if two such operators are combined into the same process, they will share the same global variables.
- When writing a combinable operator, do not use runLocally(). Instead of runLocally(), a combinable operator implements the function processInputRecord() to consume input records, one by one.
- When writing a combinable operator, do not use getRecord(). A combinable operator must never call getRecord(); instead, the framework calls getRecord() on behalf of the combinable operator and then calls processInputRecord() for each record.
- After calling putRecord() or transferAndPutRecord(), a combinable operator must not do anything to disturb the output record before returning control to the framework. The safest thing to do after calling putRecord() or transferAndPutRecord() is to return to the framework.