Single Processor and Multi-Processor Systems

By default, compiling an IBM® InfoSphere® DataStage® job produces a job in which all adjacent active stages run in a single process. This makes good sense on a single-processor system. On a multiprocessor system, it is better to run each active stage in a separate process so that the processes can be distributed among the available processors and run in parallel. The enhancements to server jobs at Release 6 of InfoSphere DataStage make it possible to stipulate at design time that jobs should be compiled in this way. There are two ways of doing this: explicitly, by inserting IPC stages between connected active stages, or implicitly, by turning on interprocess row buffering, either for the whole project or for an individual job.

The IPC facility can also be used to produce multiple processes where passive stages are directly connected. This means that an operation reading from one data source and writing to another can be divided into a reading process and a writing process able to take advantage of multiprocessor systems.
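The split into a reading process and a writing process can be pictured with a small, generic sketch. This is an analogy only, not how InfoSphere DataStage implements IPC stages: it uses Python's standard multiprocessing module, and the names read_rows, write_rows, run_pipeline, and buffer_size are hypothetical. A bounded queue stands in for the interprocess row buffer, and upper-casing a row stands in for writing it to a target. The sketch assumes a platform where the "fork" start method is available; elsewhere, call run_pipeline only from under a main guard.

```python
import multiprocessing as mp

SENTINEL = None  # marks the end of the data on a queue


def read_rows(source, buffer):
    """Reading process: push each source row onto the shared buffer."""
    for row in source:
        buffer.put(row)  # blocks while the bounded buffer is full
    buffer.put(SENTINEL)


def write_rows(buffer, results):
    """Writing process: drain the buffer and 'write' each row."""
    while True:
        row = buffer.get()
        if row is SENTINEL:
            break
        results.put(row.upper())  # stand-in for writing to a target
    results.put(SENTINEL)


def run_pipeline(source, buffer_size=4):
    """Run the reader and the writer as two separate OS processes."""
    # Assumption: prefer "fork" where it exists so the sketch also works
    # when called from unguarded top-level code.
    methods = mp.get_all_start_methods()
    ctx = mp.get_context("fork" if "fork" in methods else None)

    buffer = ctx.Queue(maxsize=buffer_size)  # the row buffer between processes
    results = ctx.Queue()                    # carries written rows back here

    reader = ctx.Process(target=read_rows, args=(list(source), buffer))
    writer = ctx.Process(target=write_rows, args=(buffer, results))
    reader.start()
    writer.start()

    # Drain the results before joining so neither child blocks on a full pipe.
    written = []
    while True:
        row = results.get()
        if row is SENTINEL:
            break
        written.append(row)

    reader.join()
    writer.join()
    return written


if __name__ == "__main__":
    print(run_pipeline(["alpha", "beta", "gamma"]))
```

Because the reader and the writer are separate processes, the operating system is free to schedule them on different processors. The bounded buffer keeps the reader from running arbitrarily far ahead of the writer.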

The following diagrams illustrate the possible behavior for active stages:
Figure 1. Default behavior
Shows a job which runs in a single process
Figure 2. Implicit forcing of multiple processes via interprocess row buffering
Shows the same job using interprocess row buffering to force multiple processes
Figure 3. Using IPC stages to force multiple processes
Shows the same job using InterProcess stages to force multiple processes
The following diagrams illustrate the possible behavior for passive stages:
Figure 4. Default behavior, invisible Transformer stage inserted at compile time
Shows a job which runs in a single process
Figure 5. Using an IPC stage to force multiple processes, with invisible Transformer stages inserted at compile time
Shows the same job using an InterProcess stage to force multiple processes