Optimizing parallelism

The degree of parallelism of a parallel job is determined by the number of nodes you define when you configure the parallel engine. There are various ways in which you can optimize parallelism.

Parallelism should be optimized for your hardware rather than simply maximized. Increasing parallelism distributes your workload, but it also adds overhead because the number of processes increases. Increased parallelism can actually hurt performance once you exceed the capacity of your hardware. You must therefore weigh the gains of added parallelism against the potential losses in processing efficiency.
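
The node count itself comes from the parallel engine configuration file, the file named by the APT_CONFIG_FILE environment variable. As a minimal sketch, the following configuration defines two logical nodes, so a job run against it executes with a degree of parallelism of two; the node names, host name, and directory paths are placeholders, not recommendations:

    {
        node "node1" {
            fastname "etl_host"
            pools ""
            resource disk "/ds/data1" {pools ""}
            resource scratchdisk "/ds/scratch1" {pools ""}
        }
        node "node2" {
            fastname "etl_host"
            pools ""
            resource disk "/ds/data2" {pools ""}
            resource scratchdisk "/ds/scratch2" {pools ""}
        }
    }

Each additional node entry adds one more partition, and one more set of stage processes, to every parallel stage in the job. Because both nodes here name the same fastname, this shape is typical of an SMP; cluster and MPP configurations differ chiefly in their fastname entries, as shown below.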

The hardware that makes up your system naturally influences the degree of parallelism you can establish.

SMP systems allow you to scale up the number of CPUs and to run your parallel application against more memory. In general, an SMP system can support multiple logical nodes. Some SMP systems allow scalability of disk I/O. "Configuration Options for an SMP" discusses these considerations.

In a cluster or MPP environment, you can use the multiple CPUs and their associated memory and disk resources in concert to tackle a single computing problem. In general, you have one logical node per CPU on an MPP system. "Configuration Options for an MPP System" describes these issues.
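
A configuration file for a four-machine cluster or MPP system, by contrast, typically defines one logical node per physical node, each with its own fastname and its own locally attached storage. The host names and paths in this sketch are again illustrative:

    {
        node "node1" {
            fastname "mpp_host1"
            pools ""
            resource disk "/local/data" {pools ""}
            resource scratchdisk "/local/scratch" {pools ""}
        }
        node "node2" {
            fastname "mpp_host2"
            pools ""
            resource disk "/local/data" {pools ""}
            resource scratchdisk "/local/scratch" {pools ""}
        }
        node "node3" {
            fastname "mpp_host3"
            pools ""
            resource disk "/local/data" {pools ""}
            resource scratchdisk "/local/scratch" {pools ""}
        }
        node "node4" {
            fastname "mpp_host4"
            pools ""
            resource disk "/local/data" {pools ""}
            resource scratchdisk "/local/scratch" {pools ""}
        }
    }

Because each fastname resolves to a different machine, data moving between these nodes travels over the network rather than through shared memory.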

The properties of your system's hardware also determine configuration. For example, applications with large memory requirements, such as sort operations, are best assigned to machines with a lot of memory. Applications that access an RDBMS must run on its server nodes, and stages that use other proprietary software, such as SAS, must run on nodes with licenses for that software.
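
Such placement constraints are commonly expressed with node pools. In the following sketch, only node2, standing in for a hypothetical machine that holds the SAS license, belongs to a pool named sas; the pool name, host names, and paths are examples only:

    {
        node "node1" {
            fastname "etl_host"
            pools ""
            resource disk "/ds/data1" {pools ""}
            resource scratchdisk "/ds/scratch1" {pools ""}
        }
        node "node2" {
            fastname "sas_host"
            pools "" "sas"
            resource disk "/ds/data2" {pools ""}
            resource scratchdisk "/ds/scratch2" {pools ""}
        }
    }

A stage constrained to the sas pool then runs only on node2, while unconstrained stages run on every node in the default pool (the pool named by the empty string "").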

Here are some additional factors that affect the optimal degree of parallelism:

  • CPU-intensive applications, which typically perform multiple CPU-demanding operations on each record, benefit from the greatest possible parallelism, up to the capacity supported by your system.
  • Parallel jobs with large memory requirements can benefit from parallelism if they act on data that has been partitioned and if the required memory is also divided among partitions.
  • Applications that are disk- or I/O-intensive, such as those that extract data from and load data into RDBMSs, benefit from configurations in which the number of logical nodes equals the number of disk spindles being accessed. For example, if a table is fragmented 16 ways inside a database or if a data set is spread across 16 disk drives, set up a node pool consisting of 16 processing nodes (the sketch after this list shows the pattern).
  • For some jobs, especially those that are disk-intensive, you must sometimes configure your system to prevent the RDBMS from having to redistribute load data or to repartition the data from an extract operation.
  • The speed of communication among stages should be optimized by your configuration. For example, jobs whose stages exchange large amounts of data should be assigned to nodes where stages communicate by either shared memory (in an SMP environment) or a high-speed link (in an MPP environment). The relative placement of jobs whose stages share small amounts of data is less important.
  • For SMPs, you might want to leave some processors for the operating system, especially if your application has many stages in a job. See "Configuration Options for an SMP".
  • In an MPP environment, parallelization can be expected to improve the performance of CPU-limited, memory-limited, or disk I/O-limited applications. See "Configuration Options for an MPP System".
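
For the 16-spindle example mentioned in the list above, each of the 16 node entries would carry a shared pool name in addition to the default pool. Only the first two entries are shown here; the remaining 14 follow the same pattern, and the pool name, host name, and paths are illustrative:

    {
        node "node1" {
            fastname "db_host"
            pools "" "extract16"
            resource disk "/frag/disk01" {pools ""}
            resource scratchdisk "/scratch01" {pools ""}
        }
        node "node2" {
            fastname "db_host"
            pools "" "extract16"
            resource disk "/frag/disk02" {pools ""}
            resource scratchdisk "/scratch02" {pools ""}
        }
    }

Constraining the extract stage to the extract16 pool gives one logical node per spindle, so each partition reads from its own drive.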

The more nearly equal the partitioning of data, the better the overall performance of a job run in parallel. For example, when hash partitioning, try to ensure that the resulting partitions are evenly populated; this is referred to as minimizing skew. If one of 16 partitions receives half the records, the whole job runs at roughly the speed of that one overloaded partition. Experience is the best teacher: start with smaller data sets and try different parallelizations while scaling up the data set sizes to collect performance statistics.
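
To make skew concrete, the short Python sketch below, which is illustrative rather than part of the product, counts how many records a simple hash partitioner would send to each of 16 partitions and reports the ratio of the fullest partition to a perfectly even share (1.0 means no skew). The key columns and record counts are invented for the example:

    # Illustrative only: estimate hash-partition skew for a candidate key column.
    from collections import Counter

    def hash_partition_counts(keys, num_partitions):
        """Count the records that hash partitioning sends to each partition."""
        counts = Counter(hash(k) % num_partitions for k in keys)
        return [counts.get(p, 0) for p in range(num_partitions)]

    def skew_ratio(counts):
        """Ratio of the fullest partition to an even share; 1.0 means no skew."""
        ideal = sum(counts) / len(counts)
        return max(counts) / ideal if ideal else 0.0

    # A low-cardinality key (few distinct values) concentrates records in a few
    # partitions, while a high-cardinality key spreads them almost evenly.
    low_card = ["US"] * 900 + ["CA"] * 80 + ["MX"] * 20
    high_card = [f"cust{i:05d}" for i in range(1000)]

    for name, keys in [("country code", low_card), ("customer id", high_card)]:
        counts = hash_partition_counts(keys, 16)
        print(f"{name}: skew ratio = {skew_ratio(counts):.2f}")

The same measurement applies to real runs: compare per-partition record counts from your job's performance statistics against an even split, and switch to a higher-cardinality or composite hash key when one partition dominates.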