Configuring the network settings of hosts for a Db2 pureScale environment on a RoCE network (Linux)

As described in the network topology tables and diagrams, configure the communication adapter ports in pairs, so that the devices with the same device ID (for example, roce0) are on the same subnet.

Before you begin

Complete the following tasks:
  • Create your Db2 pureScale Feature installation plan. The installation plan helps ensure that your system meets the prerequisites and that you have performed the preinstallation tasks.
  • Read about the supported network topologies for Db2 pureScale environments in Network topology configuration support for Db2 pureScale environments.

Administrative access is required on all Db2 member and CF hosts.

Attention: On newer operating system releases, namely SUSE Linux Enterprise Server (SLES) 15 and later and Red Hat Enterprise Linux (RHEL) 8 and 9, the rdma.service has been removed, and udev rules manage the RDMA hardware instead. This removal affects the commands and steps in this procedure that involve the RDMA service.

About this task

To configure the network settings of hosts, install the OpenFabrics Enterprise Distribution (OFED) packages on SUSE Linux® or the InfiniBand Support package on Red Hat Linux, and configure IP addresses on the hosts. Cluster caching facilities (CFs) and members support multiple communication adapter ports to help Db2 pureScale environments scale and to help with high availability. Only one communication adapter port is required for each CF or member, but using more is recommended to increase bandwidth, add redundancy, and allow the use of multiple switches.
Note: These steps must be run on all hosts planned for the future Db2 pureScale environment.

Procedure

  1. Log in as root.
  2. Configure the appropriate software to support RDMA over the desired network.
    • On SLES, the OpenFabrics Enterprise Distribution (OFED) package is already bundled within the RDMA package in the SLES 12 service packs. For the required packages, refer to the Installation prerequisites for Db2 pureScale Feature (Intel Linux) page.
    • On RHEL, configure RDMA as follows:
      1. Run a group installation of the "InfiniBand Support" package to install the required RoCE software:
        yum groupinstall "InfiniBand Support"
  3. Edit the Direct Access Transport (DAT) configuration file to have a line for each of the communication adapter ports (the dat.conf file is not required for RHEL 8.x and higher).
    The /etc/dat.conf file must only contain entries for the adapters that are in the local host. The sample /etc/dat.conf file that is installed by default typically contains irrelevant entries. To avoid unnecessary processing of the file, make the following changes:
    • Move all the Db2 pureScale cluster-related adapter entries to the top of the file.
    • Comment out or remove the irrelevant entries from the file.

    On SLES, the DAT configuration file is located at /etc/dat.conf.

    On RHEL, the DAT configuration file is located at /etc/rdma/dat.conf. This file is updated by the group installation of the packages in the previous step.

    Ensure that the file has the following format:
    <interface adapter name> u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "<network interface> 0" " "
    • The <interface adapter name> string cannot be more than 19 characters long.
    • The <network interface> name is the Ethernet adapter name.
    The following is an example of the configuration file on a CF host or member that uses four communication adapter ports:
    ofa-v2-roe0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth0 0" ""
    ofa-v2-roe1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth1 0" ""
    ofa-v2-roe2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
    ofa-v2-roe3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""
    Note: If you are receiving DAT_INTERNAL_ERR communication errors, it is likely that the system attempted to communicate with an adapter interface that is not set up correctly in the Direct Access Transport (DAT) configuration file for the adapter port.
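The format rules in this step lend themselves to a quick automated check before the cluster is brought up. The following is a minimal sketch, assuming Bash is available on the host; check_dat_conf is a hypothetical helper name, not a Db2 or OFED tool, and the pattern it uses is taken only from the format line and examples shown in this step. It reports adapter names longer than 19 characters and entries that do not follow the expected field layout.

```shell
# check_dat_conf is a hypothetical helper, not a Db2 or OFED tool.
# Usage: check_dat_conf /etc/dat.conf   (or /etc/rdma/dat.conf on RHEL)
check_dat_conf() {
    local file="$1" status=0 lineno=0 line name
    # Expected layout, taken from the format line in this step.
    local re='^[^[:space:]]+[[:space:]]+u2\.0[[:space:]]+nonthreadsafe[[:space:]]+default[[:space:]]+libdaplofa\.so\.2[[:space:]]+dapl\.2\.0[[:space:]]+"[^"[:space:]]+ 0"[[:space:]]+" *"[[:space:]]*$'
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        # Strip leading whitespace, then skip blank and commented-out lines.
        line="${line#"${line%%[![:space:]]*}"}"
        case "$line" in
            ''|'#'*) continue ;;
        esac
        # The adapter name is the first whitespace-delimited field.
        name=${line%%[[:space:]]*}
        if [ "${#name}" -gt 19 ]; then
            echo "line $lineno: adapter name '$name' is longer than 19 characters"
            status=1
        fi
        if ! [[ $line =~ $re ]]; then
            echo "line $lineno: entry does not match the expected format"
            status=1
        fi
    done < "$file"
    return "$status"
}
```

The function returns 0 when every active entry passes both checks, so it can be used in a condition such as check_dat_conf /etc/dat.conf && echo "dat.conf looks consistent". It does not verify that the named Ethernet interfaces exist on the host.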
  4. Optional: Install the "infiniband-diags" package for diagnostics utilities (such as ibstat and ibstatus).
    To install the "infiniband-diags" package, run the following command:
    yum install infiniband-diags
    On RHEL, or on an operating system earlier than SLES 15, the RDMA service must be restarted before you run the diagnostics tools. Run the following commands as root:
    systemctl enable rdma.service
    systemctl restart rdma.service
    On SLES 15 or later, reboot the machine by running the following command:
    reboot
  5. Verify that Global Pause (IEEE 802.3x) flow control is enabled in the Ethernet adapter driver (10GbE or other supported speed). This step applies only to the mlx4 driver. If the adapter driver is mlx5, as used by ConnectX-4 and ConnectX-5 networking cards, no additional configuration is required.
    For example, to verify the setting in the Mellanox ConnectX-2 10GbE adapter driver, check that the priority bit masks "pfctx" and "pfcrx" in the MLX4_EN module are both set to "0":
    HostM0 # cat /sys/module/mlx4_en/parameters/pfctx
    0
    
    HostM0 # cat /sys/module/mlx4_en/parameters/pfcrx
    0
    If either priority bit mask is set to any other value, set both to 0 by running the commands for your operating system:
    For SUSE:
    echo "options mlx4_en pfctx=0 pfcrx=0" >> /etc/modprobe.conf.local
    systemctl restart rdma.service
    For RHEL:
    echo "options mlx4_en pfctx=0 pfcrx=0" >> /etc/modprobe.d/modprobe.conf
    systemctl restart rdma.service
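The two cat commands in this step can be wrapped in a small check that is convenient to run on every host. The following is a minimal sketch; check_global_pause is a hypothetical helper name, and the optional directory argument exists only so that the function can be pointed at a different location. By default it reads the mlx4_en parameter files shown in this step.

```shell
# check_global_pause is a hypothetical helper that confirms the pfctx and
# pfcrx priority bit masks are both 0.
# The optional argument overrides the parameters directory.
check_global_pause() {
    local dir="${1:-/sys/module/mlx4_en/parameters}" param value status=0
    for param in pfctx pfcrx; do
        if [ ! -r "$dir/$param" ]; then
            # Expected on hosts that use the mlx5 driver, which needs no change.
            echo "$param: not readable under $dir (mlx4_en may not be loaded)"
            status=1
            continue
        fi
        value=$(cat "$dir/$param")
        if [ "$value" = "0" ]; then
            echo "$param=0 (ok)"
        else
            echo "$param=$value (expected 0)"
            status=1
        fi
    done
    return "$status"
}
```

A nonzero return status means that at least one bit mask is not 0 or could not be read. On a host that uses the mlx5 driver the parameter files are absent, so a failure there does not indicate a configuration problem.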
  6. Edit the network configuration files to configure a static IP address for each communication adapter port.
    The following file listings show the network adapter configuration for the CF hosts, cf1 and cf2, and the member hosts, member1, member2, member3, and member4. Edit the network configuration files on each host so that the first communication adapter port that is listed on each host is on the same subnet as the other hosts.

    If you are configuring multiple communication adapter ports on the CFs, ensure that the device names that represent the same sequence of physical adapter and port number on the two CFs are connected to the same switch and are on the same IP subnet. For example, port 1 of the first RoCE card on the primary CF must be connected to the same switch as port 1 of the first RoCE card on the secondary CF, and both ports must be on the same IP subnet, for instance 192.168.1.0. Likewise, port 2 of /dev/roce1 on the primary CF must be connected to the same switch as port 2 of /dev/roce1 on the secondary CF, and both ports must be on the same IP subnet, for instance 192.168.4.0. The same applies to member hosts if you are configuring multiple communication adapter ports on them.
    ssh cf1 cat /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.1.227'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    ssh cf1 cat /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.3.227'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    ssh cf1 cat /etc/sysconfig/network-scripts/ifcfg-eth2
    DEVICE=eth2
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.2.227'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    ssh cf1 cat /etc/sysconfig/network-scripts/ifcfg-eth3
    DEVICE=eth3
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.4.227'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    
    
    ssh cf2 cat /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.1.228'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    ssh cf2 cat /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.3.228'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    ssh cf2 cat /etc/sysconfig/network-scripts/ifcfg-eth2
    DEVICE=eth2
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.2.228'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    ssh cf2 cat /etc/sysconfig/network-scripts/ifcfg-eth3
    DEVICE=eth3
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.4.228'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    
    
    ssh member1 cat /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.1.225'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    
    ssh member2 cat /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.1.226'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    
    ssh member3 cat /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.1.229'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    
    ssh member4 cat /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    HWADDR=00:02:C9:10:F7:26
    TYPE=Ethernet
    IPADDR='192.168.1.230'
    NETMASK='255.255.255.0'
    MTU=''
    NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    NETWORK=''
    REMOTE_IPADDR=''
    STARTMODE='auto'
    USERCONTROL='no'
    Note:
    • For simplicity, the IP addresses in the previous example use the 255.255.255.0 subnetwork mask (NETMASK) so that the third and fourth IP segments can match the numbers of the interface devices and the host name. With this subnetwork mask, the IP addresses for CFs are formatted like 10.222.interface-id-device-number.CF-hostname-suffix, and the IP addresses for members are formatted like 10.222.interface-id-device-number.10member-hostname-suffix.
    • The first communication adapter port on each CF host is on the same subnet as the members.
    • Each communication adapter port on a CF is on a distinct subnet.
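The subnet rules in this step come down to comparing network addresses, which are obtained by combining each IP address with its netmask. The following is a minimal sketch of that arithmetic in Bash; subnet_of and same_subnet are hypothetical helper names introduced only for illustration, and the sketch assumes dotted-decimal IPv4 values without leading zeros, as in the listings above.

```shell
# subnet_of is a hypothetical helper: it prints the network address for an
# IPv4 address and netmask by ANDing the two, octet by octet.
subnet_of() {
    local ip="$1" mask="$2" i1 i2 i3 i4 m1 m2 m3 m4
    IFS=. read -r i1 i2 i3 i4 <<< "$ip"
    IFS=. read -r m1 m2 m3 m4 <<< "$mask"
    echo "$((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
}

# same_subnet is a hypothetical helper: it succeeds when two addresses fall
# in the same subnet under the given netmask.
same_subnet() {
    [ "$(subnet_of "$1" "$3")" = "$(subnet_of "$2" "$3")" ]
}
```

With the sample addresses, same_subnet 192.168.1.227 192.168.1.228 255.255.255.0 succeeds, which matches the requirement that eth0 on cf1 and eth0 on cf2 share a subnet. The same call with 192.168.1.227 and 192.168.3.227 fails, which matches the requirement that each port on a CF is on a distinct subnet.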
  7. Set up the IP interfaces on the switch. For more information, see Setting up the IP interfaces on the switch on a RoCE network (Linux).
  8. For all switches in the cluster, disable the Converged Enhanced Ethernet (CEE) feature and ensure that Global Pause (IEEE 802.3x) is enabled. For a BNT switch with firmware level 6.8.2 and higher, port flow control must also be enabled for Global Pause. For instructions, refer to the switch manual.
  9. Set up netmon.cf on each host. For more information, see Setting up the netmon.cf file on a RoCE network (Linux).
  10. Update the /etc/hosts file on each host in the planned Db2 pureScale environment so that the file includes the IP addresses of all the communication adapter ports for all hosts in the planned environment.

    The /etc/hosts file must have this format: <IP_Address> <fully_qualified_name> <short_name>. All hosts in the cluster must have the same /etc/hosts format.

    For example, in a planned Db2 pureScale environment with multiple communication adapter ports on the CFs and four members, the /etc/hosts configuration file might resemble the following file:

    192.168.1.227 cf1-eth1.torolab.ibm.com cf1-eth1
    192.168.3.227 cf1-eth2.torolab.ibm.com cf1-eth2
    192.168.2.227 cf1-eth3.torolab.ibm.com cf1-eth3
    192.168.4.227 cf1-eth4.torolab.ibm.com cf1-eth4
    192.168.1.228 cf2-eth1.torolab.ibm.com cf2-eth1
    192.168.3.228 cf2-eth2.torolab.ibm.com cf2-eth2
    192.168.2.228 cf2-eth3.torolab.ibm.com cf2-eth3
    192.168.4.228 cf2-eth4.torolab.ibm.com cf2-eth4
    192.168.1.225 member0-eth1.torolab.ibm.com member0-eth1
    192.168.2.225 member0-eth2.torolab.ibm.com member0-eth2
    192.168.1.226 member1-eth1.torolab.ibm.com member1-eth1
    192.168.2.226 member1-eth2.torolab.ibm.com member1-eth2
    192.168.1.229 member2-eth1.torolab.ibm.com member2-eth1
    192.168.2.229 member2-eth2.torolab.ibm.com member2-eth2
    192.168.1.230 member3-eth1.torolab.ibm.com member3-eth1
    192.168.2.230 member3-eth2.torolab.ibm.com member3-eth2
    Note:
    • In a four-member environment that uses only one communication adapter port for each CF and member, the file would look similar to the previous example but contain only the first IP address of each CF or member.
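Because every host must use the same /etc/hosts format, a short check of the three-field layout can catch entries that were typed inconsistently. The following is a minimal sketch; check_cluster_hosts is a hypothetical helper name, and the address prefix argument is an assumption introduced so that unrelated entries, such as the localhost line, are ignored.

```shell
# check_cluster_hosts is a hypothetical helper: it checks that every entry
# whose address starts with the given prefix has exactly three fields,
# <IP_Address> <fully_qualified_name> <short_name>.
# Usage: check_cluster_hosts /etc/hosts 192.168.
check_cluster_hosts() {
    local file="$1" prefix="$2" ip fqdn short extra status=0
    while read -r ip fqdn short extra || [ -n "$ip" ]; do
        # Only inspect entries for the cluster interconnect addresses.
        case "$ip" in
            "$prefix"*) ;;
            *) continue ;;
        esac
        if [ -z "$short" ] || [ -n "$extra" ]; then
            echo "$ip: expected exactly three fields"
            status=1
        fi
    done < "$file"
    return "$status"
}
```

The function returns 0 when all matching entries have three fields. It does not check that the names resolve or that every adapter port is listed, so it complements rather than replaces a manual comparison of the file across hosts.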
  11. Restart the service for the RoCE subsystem.
    On RHEL, or on an operating system earlier than SLES 15, run the following command to restart the service:
    systemctl restart rdma.service
    On SLES 15 or later, reboot the host instead by running the following command:
    reboot

What to do next

Modify the kernel parameters of hosts that you plan to include in the Db2 pureScale environment.