Parallel processing topologies

In a parallel processing topology, the workload for each job is distributed across several processors.

In IBM® InfoSphere® DataStage®, you design and run jobs to process data. Normally, a job extracts data from one or more data sources, transforms the data, and loads it into one or more new locations.

In a parallel processing topology, the workload for each job is distributed across several processors on one or more computers, called compute nodes. Within InfoSphere DataStage, you modify a configuration file to define multiple processing nodes. These nodes work concurrently to complete each job quickly and efficiently. A conductor node computer orchestrates the work.
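The following Python sketch illustrates what a configuration file that defines multiple processing nodes might look like. It is a minimal example under stated assumptions: the node/fastname/pools/resource layout follows the general syntax of the parallel engine configuration file and should be checked against the product documentation for your release, and the function name `render_config`, the host name, and the directory paths are hypothetical.

```python
def render_config(nodes):
    """Render configuration-file text for a list of processing nodes.

    Each entry in `nodes` is a tuple of
    (node_name, host_name, disk_dir, scratch_dir).
    All values used in the example below are hypothetical.
    """
    lines = ["{"]
    for name, host, disk, scratch in nodes:
        lines += [
            f'    node "{name}"',
            "    {",
            f'        fastname "{host}"',       # computer that runs this node
            '        pools ""',                  # default node pool
            f'        resource disk "{disk}" {{pools ""}}',
            f'        resource scratchdisk "{scratch}" {{pools ""}}',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


# Two processing nodes defined on the same (hypothetical) computer.
config_text = render_config([
    ("node1", "etl-host-1", "/data/ds/disk1", "/data/ds/scratch1"),
    ("node2", "etl-host-1", "/data/ds/disk2", "/data/ds/scratch2"),
])
print(config_text)
```

Giving every node the same host name, as in this example, corresponds to several processing nodes on one computer. Giving each node a different host name corresponds to nodes that are spread across several computers.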

Parallel processing environments are categorized as symmetric multiprocessing (SMP) or massively parallel processing (MPP) systems.

Symmetric multiprocessing (SMP) systems

In a symmetric multiprocessing (SMP) environment, multiple processors share other hardware resources. In the following diagram, multiple processors share the same memory and disk space and run under a single operating system.

Figure 1. Symmetric multiprocessing (SMP) system
This image shows four processors that share common disk and memory resources.

The workload for a parallel job is distributed across the processors in the system. The actual speed at which the job completes might be limited by the shared resources in the system. To scale the system, you can increase the number of processors, add memory, or increase storage. The scalability strategy that you implement depends on how your job is limited within your current system.
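As a rough illustration of how a workload can be distributed, the following Python sketch deals rows out to a fixed number of partitions in round-robin order, so that each processor would receive a near-equal share. It is a simplified model, not a description of how the product distributes data internally, and the function name `round_robin_partition` is hypothetical.

```python
def round_robin_partition(rows, partition_count):
    """Deal rows out to partitions in turn, like dealing cards.

    Returns a list of `partition_count` lists. Row i goes to
    partition i % partition_count, so partition sizes differ by at most one.
    """
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    partitions = [[] for _ in range(partition_count)]
    for index, row in enumerate(rows):
        partitions[index % partition_count].append(row)
    return partitions


# Ten rows spread across the four processors of the SMP system in Figure 1.
parts = round_robin_partition(list(range(10)), 4)
print(parts)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Even with an even split like this, the partitions still read from and write to the same memory and disk, which is why the shared resources can become the limiting factor.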

Massively parallel processing (MPP) systems

In a massively parallel processing (MPP) system, many computers are physically housed in the same chassis, as shown in the following diagram:

Figure 2. Massively parallel processing (MPP) system
This image shows 16 computers in an MPP system. Each computer has four processors and disk and memory resources.

An MPP system can also be physically dispersed, as shown in the following diagram:

Figure 3. MPP system
This image shows 16 nodes that are connected by a network. Each node includes four processors and dedicated disk and memory resources.

In an MPP environment, performance is improved because each computer has its own processors, memory, and disk, so these resources do not have to be shared among physical computers. To scale the system, you can add computers and associated memory and disk resources.

In an MPP system, a file system is commonly shared across the network. In this configuration, program files can be shared instead of installed on individual nodes in the system.