Switch configuration on a RoCE network (AIX)

Switch failover capability is a high availability feature that is provided by configuring Link Aggregation Control Protocol (LACP) on the switch.

Before you begin

This configuration procedure is specific to switches in environments with AIX® systems and a remote direct memory access (RDMA) over Converged Ethernet (RoCE) network. Switch failover capability is a high availability feature that is provided by configuring Link Aggregation Control Protocol (LACP) on the switches.


Ensure you have created your Db2® pureScale® Feature installation plan. Your installation plan helps ensure that your system meets the prerequisites and that you have performed the preinstallation tasks.
Ensure you have read about supported network topologies for Db2 pureScale environments in the topic Network topology configuration support for Db2 pureScale environments.
Power on the switch and connect a serial cable or Ethernet cable to the switch.

About this task

A RoCE network switch must support:

Link Aggregation Control Protocol (LACP) for switch failover configuration
Global Pause flow control (IEEE 802.3x)
Optional: a local loopback IP address on the switch that can be pinged from IP addresses on the same IP subnet. For details and restrictions, refer to the technote Restrictions of automated adapter liveliness test.
The VLAN ID must be the same across all the switches that are used in a given cluster.
Note: Currently, Db2 pureScale supports only RoCE v1.
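
The switch requirements listed above can be captured as a simple pre-deployment check. The following Python sketch is illustrative only: the `SwitchProfile` class, its field names, and the `check_switches` function are hypothetical and are not part of Db2 or of any switch vendor's tooling. It assumes that you record each switch's capabilities and VLAN ID by hand.

```python
from dataclasses import dataclass


@dataclass
class SwitchProfile:
    """Hypothetical, hand-recorded description of one switch."""
    name: str
    supports_lacp: bool
    supports_global_pause: bool  # IEEE 802.3x
    vlan_id: int


def check_switches(switches):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for sw in switches:
        if not sw.supports_lacp:
            problems.append(f"{sw.name}: LACP is not supported")
        if not sw.supports_global_pause:
            problems.append(f"{sw.name}: Global Pause (IEEE 802.3x) is not supported")
    # The VLAN ID must be identical on every switch in the cluster.
    vlan_ids = {sw.vlan_id for sw in switches}
    if len(vlan_ids) > 1:
        problems.append(f"VLAN IDs differ across switches: {sorted(vlan_ids)}")
    return problems


if __name__ == "__main__":
    cluster = [
        SwitchProfile("switch1", True, True, 100),
        SwitchProfile("switch2", True, True, 200),
    ]
    for problem in check_switches(cluster):
        print(problem)
```

The optional loopback IP address is left out of this sketch because it is not a hard requirement.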

The procedure describes how to configure two switches to support switch failover. Switch failover capability improves the resiliency, or fault tolerance, of a network.

To create a Db2 pureScale environment with multiple switches, you must have multiple cluster interconnects on the cluster caching facility (CF) servers and configure switch failover on the switches.

Restrictions

Administrative access is required on the switches.

Procedure

Connect the two switches together.
- Designate two or more ports on each switch to be used as inter-switch links (ISLs) and then connect them physically.
- Aggregate all ISLs by using Link Aggregation Control Protocol (LACP).
- All ISL ports on both switches must be set up as active.
Disable the Converged Enhanced Ethernet (CEE) feature.
Enable Global Pause flow control (IEEE 802.3x).
- This involves configuration at the switch level only.
- For a BNT switch with firmware level 6.5.2 or higher, Global Pause is enabled by setting both flow control send and receive to 'on' for all Db2-related ports, including the ISL ports.
Perform one of the following two Spanning Tree Protocol (STP) configurations.
1. Disable STP
  - This reduces the overall configuration complexity and is suitable for a dedicated private pureScale network where the pureScale switch is used only by pureScale hosts.
2. Enable STP
  - This is recommended for a shared pureScale network where non-pureScale traffic might go through the pureScale switches. Enabling STP can prevent the accidental creation of network loops. The actual commands vary with the switch brand and model. For a BNT switch, the following settings are required:
    - Mark all non-ISL ports (including unused) as edge ports
    - Enable BPDU guard on all edge ports
    - Enable root guard on all edge ports
Starting from V11.1.4.4, the following step (assigning pingable IP addresses to the switches) is no longer required, because the adapter port liveliness test has been enhanced and automated. Some restrictions apply; refer to technote#0733765 for details.

However, this step is mandatory for environments that are listed in the Answer section of the technote Restrictions of automated adapter liveliness test, where the enhanced and simplified adapter liveliness tests cannot be used. The step is still recommended for environments that are not listed in the technote, because it can catch network failures in a multi-tiered switch setup.

If the switches are used for a RoCE network with IP support, assign an IP address that can be pinged on each switch. These IP addresses, which are assigned to the switch as IP interfaces, must reside in the same IP subnet as the IP addresses that are used for the hosts on the RoCE network. If host IP addresses that are connected to the same switch are in different IP subnets, each of those IP subnets must have a corresponding IP address assigned to the switch that the hosts are directly connected to. When the host IP addresses are set up, you can ping the switch IP addresses from the hosts.

For example, if the IP address 10.1.1.1 (with netmask 255.255.255.0) is assigned to a host's en1 network interface, the IP address 10.1.2.1 is assigned to the host's en2 network interface, and each interface is connected to a different switch, then IP address 10.1.1.24 can be assigned to switch 1 and IP address 10.1.2.23 can be assigned to switch 2.

As a second example, if 10.1.1.1 and 10.1.3.1 (with netmask 255.255.255.0) are assigned to a host's en1 and en3 and are connected to the same switch, and 10.1.2.1 and 10.1.4.1 are assigned to the host's en2 and en4 and are connected to a different switch, then both 10.1.1.24 and 10.1.3.24 would be assigned to switch 1, and both 10.1.2.23 and 10.1.4.23 would be assigned to switch 2.
Repeat the preceding steps on all switches in the cluster.
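
The subnet rule for switch IP addresses in this procedure (every host subnet that is connected to a switch needs a matching IP address on that switch) can be checked before you configure anything. The following Python sketch uses only the standard `ipaddress` module. The function name `missing_switch_subnets` is hypothetical, and the sketch assumes that all addresses share the netmask 255.255.255.0, as in the examples above.

```python
import ipaddress


def missing_switch_subnets(host_ips, switch_ips, netmask="255.255.255.0"):
    """Return the host subnets that have no IP address on the switch.

    host_ips:   IP addresses of host interfaces cabled to this switch.
    switch_ips: IP interfaces assigned to the switch itself.
    """
    def subnet(ip):
        # ip_interface accepts "address/netmask" and exposes the network.
        return ipaddress.ip_interface(f"{ip}/{netmask}").network

    host_subnets = {subnet(ip) for ip in host_ips}
    switch_subnets = {subnet(ip) for ip in switch_ips}
    return sorted(host_subnets - switch_subnets)


if __name__ == "__main__":
    # Switch 1 from the second example: en1 and en3 of the host are attached.
    print(missing_switch_subnets(["10.1.1.1", "10.1.3.1"],
                                 ["10.1.1.24", "10.1.3.24"]))
    # Same hosts, but the switch is missing an address in 10.1.3.0/24.
    print(missing_switch_subnets(["10.1.1.1", "10.1.3.1"],
                                 ["10.1.1.24"]))
```

An empty result means that every host subnet on that switch is covered. The check does not verify that the addresses can actually be pinged, which still has to be confirmed from the hosts.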

Example
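
Because the actual switch commands vary with the brand and model, the following is not a switch configuration. It is a minimal Python sketch that checks a planned configuration against the rules in the procedure: two or more ISL ports per switch, all ISL ports LACP active, CEE disabled, flow control send and receive on, and, when STP is enabled, edge port, BPDU guard, and root guard settings on all non-ISL ports. The `Port` and `SwitchPlan` classes and the `validate` function are hypothetical names. The sketch assumes that every modeled port is Db2-related and applies the STP settings that the procedure lists for BNT switches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    """Hypothetical model of one switch port; not tied to any vendor CLI."""
    name: str
    is_isl: bool = False
    lacp_active: bool = False
    flow_control_send: bool = True
    flow_control_receive: bool = True
    edge: bool = False
    bpdu_guard: bool = False
    root_guard: bool = False


@dataclass
class SwitchPlan:
    """Hypothetical planned settings for one switch."""
    name: str
    cee_enabled: bool = False
    stp_enabled: bool = False
    ports: List[Port] = field(default_factory=list)


def validate(plan: SwitchPlan) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    isl_ports = [p for p in plan.ports if p.is_isl]
    if len(isl_ports) < 2:
        problems.append(f"{plan.name}: needs two or more ISL ports")
    for p in isl_ports:
        if not p.lacp_active:
            problems.append(f"{plan.name}/{p.name}: ISL port must be LACP active")
    if plan.cee_enabled:
        problems.append(f"{plan.name}: CEE must be disabled")
    for p in plan.ports:
        if not (p.flow_control_send and p.flow_control_receive):
            problems.append(f"{plan.name}/{p.name}: flow control send and receive must both be on")
    if plan.stp_enabled:
        # With STP enabled, every non-ISL port (including unused ones) is an edge port.
        for p in plan.ports:
            if not p.is_isl and not (p.edge and p.bpdu_guard and p.root_guard):
                problems.append(f"{plan.name}/{p.name}: non-ISL port needs edge, BPDU guard, and root guard")
    return problems


if __name__ == "__main__":
    plan = SwitchPlan(
        name="switch1",
        stp_enabled=True,
        ports=[
            Port("isl1", is_isl=True, lacp_active=True),
            Port("isl2", is_isl=True, lacp_active=False),
            Port("host1", edge=True, bpdu_guard=True, root_guard=True),
        ],
    )
    for problem in validate(plan):
        print(problem)
```

Run the same check for each switch in the cluster. Keeping the rules in one function makes it easier to adjust them if your switch model has different requirements.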

What to do next

Configure the network settings of hosts that you plan to include in the Db2 pureScale environment.