Multi-node setup

A multi-node installation requires between three and five dedicated Secure Service Container (SSC) LPARs, which must be organized in an LPAR group (cluster). A multi-node installation is recommended if your accelerator requires more IFLs than a single drawer of your IBM Z® or IBM LinuxONE hardware can provide. Tests have shown that distributing large workloads across several LPARs leads to better performance. It is also a good idea to start with a multi-node setup if you expect your workload to grow considerably.

The head node is the controlling node. Externally, it handles all communication with networks outside of the cluster: it is paired with one or more Db2® subsystems, it connects to the management network defined in the HMC activation profile, and it optionally connects to GDPS® servers. Internally, the head node communicates with the data nodes.

The data nodes communicate mostly with the head node and with each other, and this traffic rarely leaves the internal cluster network. Data nodes also require a connection to a management network, but this connection is used only initially, when software is transferred from the head node, and later for the collection of trace data.

Confined head-node deployment

A confined head-node setup is the only way to deploy a multi-node accelerator. In a confined head-node setup, the head node and one data node share one LPAR. How many additional data-node LPARs you need depends on the number of drawers in the system.

Important:
  • The confined head-node setup is the only deployment option for a multi-node accelerator. Other deployment options, such as the cross-drawer setup supported by previous product versions, are no longer supported.
  • You can run shared IFL workloads on the data LPARs of a confined head-node setup. However, this is not recommended for performance reasons. If you do run shared workloads, make sure that they are small. Ideally, an accelerator cluster uses the entire resources of an IBM® z16® system.
  • The distribution of MLNs across drawers and LPARs is determined by PR/SM. It is not possible to influence this process manually. The example figures in this section show optimal distributions, which might not be achieved in reality.
  • LPAR, CPU, and memory placement play an important role in the performance of a multi-drawer setup. Any LPAR, CPU, or memory that needs to work with another LPAR, CPU, or memory in a different drawer might show a certain degree of performance degradation due to cross-drawer communication. The extent of this degradation depends on the workload mix.
  • Each node, that is, the head node and each data node, requires the same amount of memory (RAM).

For 2 drawers:

For systems with two drawers, four LPARs are recommended, that is, two LPARs per drawer. See Figure 1.

If you want to increase the system capacity, you can add IFLs to the existing drawers until you reach 50 IFLs per drawer. Performance scales very well in this case. The same is true for adding two more drawers.
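As a quick planning aid, the following sketch computes how many IFLs can still be added to each drawer before the 50-IFLs-per-drawer limit mentioned above is reached. The function name, drawer labels, and counts are illustrative examples, not part of the product.

```python
# Illustrative only: IFL headroom per drawer, based on the
# 50-IFLs-per-drawer figure given in this section.
MAX_IFLS_PER_DRAWER = 50

def ifl_headroom(ifls_per_drawer):
    """Return how many IFLs each drawer can still take before it is full."""
    return {drawer: MAX_IFLS_PER_DRAWER - used
            for drawer, used in ifls_per_drawer.items()}

# Example: a two-drawer system with 40 IFLs in each drawer
print(ifl_headroom({"drawer_1": 40, "drawer_2": 40}))
# -> {'drawer_1': 10, 'drawer_2': 10}
```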

Figure 1. Confined head-node setup with two drawers

For 3 drawers:

For a system with three drawers, install three LPARs, that is, one LPAR per drawer. See Figure 2.

If you ran four LPARs on a system with only three physical drawers, or used a system with an unbalanced number of IFLs and an unbalanced amount of memory, the overhead caused by the resulting cross-drawer memory access could be so significant that a three-drawer system would be barely faster than a two-drawer system. This is why three LPARs are recommended for a three-drawer system (one LPAR per drawer).

Figure 2. Confined head-node setup with three drawers

For 4 drawers:

For IBM z16 systems with four drawers, four LPARs are recommended, that is, one LPAR per drawer. See Figure 3.

Figure 3. Confined head-node setup with four drawers
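The drawer-to-LPAR recommendations above can be condensed into a small lookup. The following sketch is illustrative only; the helper name is made up, and the mapping simply restates the recommendations from this section.

```python
# Recommended number of SSC LPARs per drawer count, as described above:
# two drawers -> four LPARs (two per drawer),
# three or four drawers -> one LPAR per drawer.
RECOMMENDED_LPARS = {2: 4, 3: 3, 4: 4}

def recommended_lpars(drawers):
    """Return the recommended number of SSC LPARs for a given drawer count."""
    if drawers not in RECOMMENDED_LPARS:
        raise ValueError(f"no recommendation documented for {drawers} drawer(s)")
    return RECOMMENDED_LPARS[drawers]

print(recommended_lpars(2))  # -> 4 (two LPARs per drawer)
print(recommended_lpars(4))  # -> 4 (one LPAR per drawer)
```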

The term MLN ("multiple logical nodes") refers to a database limit that defines the maximum number of logical nodes per processing node. In this case, the head node LPAR runs one logical node for the catalog and up to three logical nodes for data processing, and each data LPAR runs two or four logical nodes for data processing.
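The resulting node counts can be illustrated with a small worked example. This is only a sketch of the arithmetic described above; the function name and parameter values are illustrative assumptions, and the actual distribution is determined by the product and PR/SM.

```python
# Illustrative only: counts the logical nodes (MLNs) in a confined
# head-node cluster, based on the figures given in this section.
def count_mlns(data_lpars, head_data_mlns=3, mlns_per_data_lpar=4):
    """Return (catalog_nodes, data_mlns) for the whole cluster.

    data_lpars         -- number of additional data-node LPARs
    head_data_mlns     -- data MLNs in the head node LPAR (up to 3)
    mlns_per_data_lpar -- data MLNs per data LPAR (2 or 4)
    """
    catalog_nodes = 1  # the catalog node runs in the head node LPAR
    data_mlns = head_data_mlns + data_lpars * mlns_per_data_lpar
    return catalog_nodes, data_mlns

# Example: a four-drawer setup with three additional data-node LPARs
print(count_mlns(data_lpars=3))  # -> (1, 15)
```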

The IFLs on the head node should be dedicated IFLs. The IFLs of the data nodes, on the other hand, can be shared IFLs. Usually, sharing has no great impact on performance. This allows you, for example, to share the IFLs of a production data node with less important LPARs, such as another single-node accelerator for development and testing purposes.

Background: IBM Z hardware distinguishes between dedicated and shared IFLs. If an IFL is dedicated to an LPAR, it belongs to that LPAR alone, and the PR/SM Hypervisor no longer treats it as a potential resource for other LPARs. If an IFL is not dedicated, it is shared, and thus belongs to the shared processor pool. The PR/SM Hypervisor dynamically assigns IFLs from the shared pool to LPARs, depending on the LPAR weights (priorities) and the resource requirements.
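To illustrate how weights translate into shares of the shared pool, here is a simplified calculation. It is a sketch only: real PR/SM dispatching also considers capping and actual demand, and the LPAR names and weight values are made-up examples.

```python
# Simplified sketch of how PR/SM weights translate into guaranteed shares
# of the shared IFL pool. Not the exact PR/SM algorithm.
def shared_pool_shares(weights, shared_ifls):
    """Return each LPAR's weight-proportional share of the shared IFLs.

    weights     -- dict of LPAR name -> processing weight
    shared_ifls -- number of IFLs in the shared pool
    """
    total = sum(weights.values())
    return {lpar: shared_ifls * w / total for lpar, w in weights.items()}

# Example: accelerator data LPARs with a much higher weight than a small
# development/test LPAR, as recommended in this section
weights = {"ACCEL_DATA1": 500, "ACCEL_DATA2": 500, "DEV_TEST": 50}
print(shared_pool_shares(weights, shared_ifls=40))
# -> roughly 19 IFLs for each data LPAR and about 1.9 IFLs for DEV_TEST
```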

It is important that every LPAR runs in a single drawer only, so that cross-drawer traffic is avoided and the available machine resources are used as efficiently as possible.

Important:
  • In general, changing the number of LPARs requires a complete reload of the accelerator and a fresh installation. For example, if you start with a three-drawer system with one LPAR per drawer (as recommended) and then want to add a fourth drawer, create a new setup to reinstall the cluster with four drawers.
  • To extend a two-drawer system to a four-drawer system, you do not need to reinstall the entire system. In this case, you can migrate normally: instead of two LPARs sharing a drawer, you end up with one LPAR per drawer.
  • The recommendation is to use dedicated IFLs only. However, if you want to use a confined head-node setup with shared IFL workloads, make sure that the LPARs that run the shared workloads have a significantly lower priority (weight) than the LPARs of the confined head-node cluster. This way, the impact on performance is kept to a minimum.