This document outlines the features and procedure required to configure switches for RoCE.
Before you begin
This configuration procedure is specific to switches in environments with AIX® systems and a remote direct memory access (RDMA) over Converged Ethernet (RoCE) network. Switch failover capability is a high availability feature that is provided by configuring Link Aggregation Control Protocol (LACP) on the switches.
Before you begin:
- Ensure that you have created your Db2® pureScale® Feature installation plan. Your installation plan helps ensure that your system meets the prerequisites and that you have performed the preinstallation tasks.
- Ensure that you have read about supported network topologies for Db2 pureScale environments in Network topology configuration support for Db2 pureScale environments.
- Power on the switch and connect a serial cable or Ethernet cable to the switch.
About this task
An RoCE network switch must support:
The procedure details the steps for configuring two switches to support switch failover. Switch failover capability improves the resiliency, or fault tolerance, of a network.
To create a Db2 pureScale environment with multiple switches, you must have multiple cluster interconnects on CF servers and configure switch failover on the switches.
Restrictions
- Administrative access is required on the switches.
Procedure
-
Connect the two switches together.
- Designate two or more ports on each switch to be used as inter-switch links (ISLs) and then connect them physically.
- Aggregate all ISLs by using Link Aggregation Control Protocol (LACP).
- Set up all ISL ports on both switches as active.
-
Disable the Converged Enhanced Ethernet (CEE) feature.
-
Enable Global Pause flow control (IEEE 802.3x) or Priority-based Flow Control (PFC) (IEEE 802.1Qbb). Use only one of these flow control methods, and configure the same method on both the network switches and the adapters. Consult your network switch documentation for the correct configuration, because the configuration varies by manufacturer.
Note: For a BNT switch with firmware level 6.5.2 and higher, Global Pause is enabled by setting both flow control send and receive to 'on' for all Db2 related ports, including the ISL ports.
-
Perform one of the following two Spanning Tree Protocol (STP) configurations.
- Disable STP:
- This option reduces the overall configuration complexity and is suitable for a dedicated private pureScale network where use of the pureScale switch is restricted to pureScale hosts.
- Enable STP:
- This option is recommended for a shared pureScale network where non-pureScale traffic might go through the pureScale switches. Enabling STP can prevent accidental creation of network loops. The actual commands vary with the switch brand and model. For a BNT switch, the following settings are required:
- Mark all non-ISL ports (including unused) as edge ports.
- Enable BPDU guard on all edge ports.
- Enable root guard on all edge ports.
-
This step is mandatory for environments that are listed in the Answer section of the technote "Restrictions of automated adapter liveliness test", where the enhanced and simplified adapter liveliness tests cannot be used. It is still recommended for environments that are not listed in the technote, because it can detect network failures in a multi-tiered switch setup.
If the switches are used for an RoCE network with IP support, assign an IP address that can be pinged to each switch. These IP addresses, which are assigned to the switch as IP interfaces, must be in the same IP subnet as the IP addresses that are used for the hosts on the RoCE network. If host IP addresses that are connected to the same switch are in different IP subnets, each of those IP subnets must have a corresponding IP address assigned to the switch that the hosts are directly connected to. After the host IP addresses are set up, you can ping the switch IP addresses from the hosts.
For example, if the IP address 10.1.1.1 (with netmask 255.255.255.0) is assigned to a host's en1 network interface, the IP address 10.1.2.1 is assigned to the host's en2 network interface, and each interface is connected to a different switch, then IP address 10.1.1.24 can be assigned to switch 1 and IP address 10.1.2.23 can be assigned to switch 2.
As a second example, if 10.1.1.1 and 10.1.3.1 (with netmask 255.255.255.0) are assigned to a host's en1 and en3, which are connected to the same switch, and 10.1.2.1 and 10.1.4.1 are assigned to the host's en2 and en4, which are connected to a different switch, then both 10.1.1.24 and 10.1.3.24 would be assigned to switch 1, and both 10.1.2.23 and 10.1.4.23 would be assigned to switch 2.
-
Repeat the preceding steps on all switches in the cluster.
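Because the switch commands differ by manufacturer, it can help to record the intended settings for each switch and check them against the rules in this procedure before you configure anything. The following Python sketch is illustrative only. The `SwitchPlan` record, its field names, and the mode labels `global_pause` and `pfc` are hypothetical and do not correspond to any switch vendor's API or CLI. The sketch assumes that you enter the planned settings by hand, and it checks only what the procedure states: at least two ISL ports per switch, all ISL ports set to LACP active, CEE disabled, and a single flow control method used on every switch.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two flow control methods named in the procedure.
FLOW_CONTROL_MODES = {"global_pause", "pfc"}


@dataclass
class SwitchPlan:
    """Hand-recorded intended settings for one switch (hypothetical structure)."""
    name: str
    isl_lacp_modes: dict = field(default_factory=dict)  # ISL port -> "active" or "passive"
    cee_enabled: bool = False
    flow_control: str = "global_pause"


def check_plans(plans):
    """Return a list of problems; an empty list means no rule is violated."""
    problems = []
    for plan in plans:
        if len(plan.isl_lacp_modes) < 2:
            problems.append(f"{plan.name}: fewer than two ISL ports designated")
        for port, mode in sorted(plan.isl_lacp_modes.items()):
            if mode != "active":
                problems.append(f"{plan.name}: ISL port {port} is not LACP active")
        if plan.cee_enabled:
            problems.append(f"{plan.name}: CEE must be disabled")
        if plan.flow_control not in FLOW_CONTROL_MODES:
            problems.append(f"{plan.name}: unknown flow control mode {plan.flow_control!r}")
    # Every switch must use the same flow control method.
    modes = {plan.flow_control for plan in plans}
    if len(modes) > 1:
        problems.append("switches use different flow control methods: " + ", ".join(sorted(modes)))
    return problems


if __name__ == "__main__":
    plans = [
        SwitchPlan("switch1", {"17": "active", "18": "active"}),
        SwitchPlan("switch2", {"17": "active", "18": "passive"}, flow_control="pfc"),
    ]
    for problem in check_plans(plans):
        print(problem)
```

The check does not cover the adapters or the STP choice. It is a planning aid and does not replace verifying the running configuration on the switches themselves.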
Examples
Note: Steps 2 through 4 of the procedure (disabling CEE, enabling flow control, and configuring STP) are also required in single-switch RoCE configurations.
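The addressing rule for switch IP interfaces (every host subnet that is connected to a switch needs a switch IP address in that subnet) can be checked before you assign addresses. The following Python sketch uses the standard `ipaddress` module. The function name `switch_ips_cover_hosts` is hypothetical, and the sketch assumes that you supply the host interface addresses with their netmasks and the planned switch addresses yourself.

```python
import ipaddress


def switch_ips_cover_hosts(host_ifaces, switch_ips):
    """Map each host subnet to whether some switch IP address lies in it.

    host_ifaces: "address/netmask" strings for the host interfaces
                 that connect to one switch.
    switch_ips:  IP address strings planned for that switch.
    """
    switch_addrs = [ipaddress.ip_address(ip) for ip in switch_ips]
    coverage = {}
    for iface in host_ifaces:
        # ip_interface accepts "10.1.1.1/255.255.255.0" and derives the subnet.
        network = ipaddress.ip_interface(iface).network
        coverage[str(network)] = any(addr in network for addr in switch_addrs)
    return coverage


if __name__ == "__main__":
    # Second example from the procedure: en1 and en3 connect to switch 1.
    result = switch_ips_cover_hosts(
        ["10.1.1.1/255.255.255.0", "10.1.3.1/255.255.255.0"],
        ["10.1.1.24", "10.1.3.24"],
    )
    print(result)
```

A subnet that maps to `False` has no switch IP interface, so hosts in that subnet would have no switch address to ping.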
What to do next
Configure the network settings of hosts that you plan to include in the Db2 pureScale environment.