Configuring the network settings of hosts for a Db2 pureScale environment on a RoCE network (Linux)
As described in the network topology tables and diagrams, configure the communication adapter ports in pairs, so that the devices with the same device ID (for example, roce0) are on the same subnet.
Before you begin
- Ensure you have created your Db2 pureScale Feature installation plan. Your installation plan helps ensure that your system meets the prerequisites and that you have performed the preinstallation tasks.
- Ensure you have read about supported network topologies for Db2 pureScale environments in Network topology configuration support for Db2 pureScale environments.
Administrative access is required on all Db2 member and CF hosts.
About this task
… udev rules to manage the RDMA hardware.
Procedure
- Log in as root.
- Configure the appropriate software to support RDMA over the desired network.
- The OpenFabrics Enterprise Distribution (OFED) package is bundled with the RDMA package in SLES 12 service packs. For the required packages, see Installation prerequisites for Db2 pureScale Feature (Intel Linux).
- RDMA configuration for RHEL systems:
- Run a group installation of the "InfiniBand Support" package to install the required RoCE
software:
yum groupinstall "InfiniBand Support"
- Edit the Direct Access Transport (DAT) configuration file so that it has one line for each communication adapter port. (The dat.conf file is not required on RHEL 8.x and later.) The DAT configuration file must contain entries only for the adapters in the local host. The sample file that is installed by default typically contains irrelevant entries. To avoid unnecessary processing of the file, make the following changes:
- Move all the Db2 pureScale cluster-related adapter entries to the top of the file.
- Comment out or remove the irrelevant entries from the file.
On SLES, the DAT configuration file is located at /etc/dat.conf.
On RHEL, the DAT configuration file is located at /etc/rdma/dat.conf. This file is updated by the group installation of the packages in the previous step.
Ensure that the file has the following format:
<interface adapter name> u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "<network interface> 0" ""
- The <interface adapter name> string cannot be more than 19 characters long.
- The <network interface> name is the Ethernet adapter name.
For example:
ofa-v2-roe0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth0 0" ""
ofa-v2-roe1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth1 0" ""
ofa-v2-roe2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-roe3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""
Note: If you receive DAT_INTERNAL_ERR communication errors, the system most likely attempted to communicate with an adapter interface that is not set up correctly in the DAT configuration file for that adapter port.
- Optional: Install the "infiniband-diags" package, which provides diagnostic utilities such as ibstat and ibstatus, by running the following command:
yum install infiniband-diags
On RHEL, or on SLES releases earlier than SLES 15, restart the RDMA service before you run the diagnostic tools. Run the following commands as root:
systemctl enable rdma.service
systemctl restart rdma.service
On SLES 15 or later, reboot the machine instead by running the following command:
reboot
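Because every DAT entry follows the same fixed pattern, the entries can be generated and checked with a small script. The following is a minimal Bash sketch, not an IBM-supplied tool: the helper names dat_conf_lines and dat_conf_check are hypothetical, and the check assumes that the local network interfaces are listed under /sys/class/net. The generator only prints candidate lines, so you can review them and move them to the top of the DAT configuration file yourself.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers, not part of Db2): generate and check
# DAT configuration entries in the format described above.

# dat_conf_lines NAME:INTERFACE...
# Prints one dat.conf line per NAME:INTERFACE pair, e.g. ofa-v2-roe0:eth0.
dat_conf_lines() {
  local pair name iface
  for pair in "$@"; do
    name=${pair%%:*}
    iface=${pair#*:}
    # The <interface adapter name> string cannot exceed 19 characters.
    if (( ${#name} > 19 )); then
      echo "error: adapter name '$name' is longer than 19 characters" >&2
      return 1
    fi
    printf '%s u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "%s 0" ""\n' \
      "$name" "$iface"
  done
}

# dat_conf_check FILE [NETDIR]
# Reports uncommented entries whose network interface does not exist on this
# host. NETDIR defaults to /sys/class/net (an assumption about where the
# local interfaces are listed).
dat_conf_check() {
  local file=$1 netdir=${2:-/sys/class/net} name iface rc=0
  while read -r name iface; do
    if [[ -z $iface || ! -e $netdir/$iface ]]; then
      echo "$name: interface '$iface' not found on this host"
      rc=1
    fi
  done < <(awk -F'"' '!/^[ \t]*#/ && NF >= 3 {
             split($1, a, " "); split($2, b, " "); print a[1], b[1] }' "$file")
  return $rc
}
```

For example, dat_conf_lines ofa-v2-roe0:eth0 ofa-v2-roe1:eth1 prints the first two entries of the sample above, and dat_conf_check /etc/rdma/dat.conf lists entries that point at interfaces missing on the local host, which is one possible cause of DAT_INTERNAL_ERR errors.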
- Verify that Global Pause (IEEE 802.3x) flow control is enabled in the Ethernet adapter driver (10GbE or other supported speed). This check applies only to the mlx4 driver. If the adapter uses the mlx5 driver, as ConnectX-4 and ConnectX-5 network cards do, no additional configuration is required. For example, for the Mellanox ConnectX-2 10GbE adapter driver, the priority bit masks "pfctx" and "pfcrx" of the mlx4_en module must both be set to 0:
HostM0 # cat /sys/module/mlx4_en/parameters/pfctx
0
HostM0 # cat /sys/module/mlx4_en/parameters/pfcrx
0
If either priority bit mask is set to any other value, set both to 0 by running the commands for your distribution:
For SUSE:
echo "options mlx4_en pfctx=0 pfcrx=0" >> /etc/modprobe.conf.local
systemctl restart rdma.service
For RHEL:
echo "options mlx4_en pfctx=0 pfcrx=0" >> /etc/modprobe.d/modprobe.conf
systemctl restart rdma.service
- Edit the network configuration files to configure a static IP address for
each communication adapter port. The following file listings show the network adapter configuration for the CF hosts cf1 and cf2 and the member hosts member1, member2, member3, and member4. Edit the network configuration files on each host so that the first communication adapter port that is listed on each host is on the same subnet as the first port on the other hosts. If you configure multiple communication adapter ports on the CFs, ensure that the device names that represent the same physical adapter and port number on the two CFs are connected to the same switch and are on the same IP subnet. For example, port 1 of the first RoCE card on the primary CF must be connected to the same switch as port 1 of the first RoCE card on the secondary CF, and both ports must be on the same IP subnet, for instance 192.168.1.0. Similarly, port 2 of /dev/roce1 on the primary CF must be connected to the same switch as port 2 of /dev/roce1 on the secondary CF, and both ports must be on IP subnet 192.168.4.0. The same rules apply to member hosts if you configure multiple communication adapter ports on them.
ssh cf1 cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.1.227'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh cf1 cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.3.227'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh cf1 cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.2.227'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh cf1 cat /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.4.227'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh cf2 cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.1.228'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh cf2 cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.3.228'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh cf2 cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.2.228'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh cf2 cat /etc/sysconfig/network-scripts/ifcfg-eth3
DEVICE=eth3
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.4.228'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh member1 cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.1.225'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh member2 cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.1.226'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh member3 cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.1.229'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ssh member4 cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=00:02:C9:10:F7:26
TYPE=Ethernet
IPADDR='192.168.1.230'
NETMASK='255.255.255.0'
MTU=''
NAME='Mellanox MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
Note:
- For simplicity, the IP addresses in the previous example use the 255.255.255.0 subnet mask (NETMASK), so that the third octet identifies the subnet of each communication adapter port and the fourth octet identifies the host.
- The first communication adapter port on each CF host is on the same subnet as the members.
- Each communication adapter port on a CF is on a distinct subnet.
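To confirm that paired ports on different hosts really are on the same subnet, you can compute each interface's network address (IPADDR bitwise AND NETMASK) from its ifcfg file and compare the results. The following Bash sketch uses hypothetical helper names and assumes one KEY=VALUE pair per line, with optional single or double quotes around the value, as in standard ifcfg files.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): compute the network address of an
# interface from an ifcfg-style file so that paired ports can be compared.

# ifcfg_value FILE KEY: print the first value of KEY with quotes removed.
ifcfg_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1 | tr -d "'\""
}

# ip_network IP MASK: bitwise AND of each octet, printed in dotted form.
ip_network() {
  local -a ip mask
  IFS=. read -r -a ip <<< "$1"
  IFS=. read -r -a mask <<< "$2"
  echo "$(( ip[0] & mask[0] )).$(( ip[1] & mask[1] )).$(( ip[2] & mask[2] )).$(( ip[3] & mask[3] ))"
}

# ifcfg_network FILE: network address of the interface described by FILE.
ifcfg_network() {
  ip_network "$(ifcfg_value "$1" IPADDR)" "$(ifcfg_value "$1" NETMASK)"
}

# same_subnet FILE1 FILE2: succeed if both interfaces are on the same subnet.
same_subnet() {
  [[ $(ifcfg_network "$1") == "$(ifcfg_network "$2")" ]]
}
```

For example, after copying ifcfg-eth0 from cf1 and cf2 into local files (hypothetical names cf1-ifcfg-eth0 and cf2-ifcfg-eth0), same_subnet cf1-ifcfg-eth0 cf2-ifcfg-eth0 succeeds only if both first ports resolve to the same network, 192.168.1.0 in the listings above.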
- Set up the IP interfaces on the switch. For more information, see Setting up the IP interfaces on the switch on a RoCE network (Linux).
- For all switches in the cluster, disable the Converged Enhanced Ethernet (CEE) feature and ensure that Global Pause (IEEE 802.3x) is enabled. For a BNT switch with firmware level 6.8.2 or higher, port flow control must also be enabled for Global Pause. For instructions, see the switch documentation.
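Global Pause must be in effect on the host adapters as well as on the switches. On mlx4-based adapters, the host-side check of the pfctx and pfcrx priority bit masks can be scripted. The following Bash sketch uses a hypothetical helper name, check_mlx4_pfc; it reads the two parameters from the sysfs directory shown in the earlier Global Pause step and reports any value other than 0. It only reads values and does not change any settings.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): check that the mlx4_en priority bit
# masks pfctx and pfcrx are both 0, as required for Global Pause on mlx4
# adapters. Adapters that use the mlx5 driver need no such check.
# Usage: check_mlx4_pfc [PARAMETER_DIR]
check_mlx4_pfc() {
  local dir=${1:-/sys/module/mlx4_en/parameters}
  local param value rc=0
  if [[ ! -d $dir ]]; then
    echo "no mlx4_en parameters in $dir (module not loaded?)" >&2
    return 2
  fi
  for param in pfctx pfcrx; do
    if [[ ! -r $dir/$param ]]; then
      echo "$param: not readable in $dir"
      rc=1
      continue
    fi
    value=$(<"$dir/$param")
    if [[ $value != 0 ]]; then
      echo "$param=$value (expected 0)"
      rc=1
    fi
  done
  return $rc
}
```

Running check_mlx4_pfc on each member and CF host with no arguments prints nothing and succeeds when both masks are 0; any output names the parameter that still needs the modprobe option described earlier.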
- Set up netmon.cf on each host. For more information, see Setting up the netmon.cf file on a RoCE network (Linux).
- Update the /etc/hosts file on each host so that, for every host in the planned Db2 pureScale environment, the file includes the IP addresses of all of that host's communication adapter ports.
The /etc/hosts file must have this format: <IP_Address> <fully_qualified_name> <short_name>. All hosts in the cluster must have the same /etc/hosts format.
For example, in a planned Db2 pureScale environment with multiple communication adapter ports on the CFs and four members, the /etc/hosts configuration file might resemble the following file:
192.168.1.227 cf1-eth1.torolab.ibm.com cf1-eth1
192.168.3.227 cf1-eth2.torolab.ibm.com cf1-eth2
192.168.2.227 cf1-eth3.torolab.ibm.com cf1-eth3
192.168.4.227 cf1-eth4.torolab.ibm.com cf1-eth4
192.168.1.228 cf2-eth1.torolab.ibm.com cf2-eth1
192.168.3.228 cf2-eth2.torolab.ibm.com cf2-eth2
192.168.2.228 cf2-eth3.torolab.ibm.com cf2-eth3
192.168.4.228 cf2-eth4.torolab.ibm.com cf2-eth4
192.168.1.225 member0-eth1.torolab.ibm.com member0-eth1
192.168.2.225 member0-eth2.torolab.ibm.com member0-eth2
192.168.1.226 member1-eth1.torolab.ibm.com member1-eth1
192.168.2.226 member1-eth2.torolab.ibm.com member1-eth2
192.168.1.229 member2-eth1.torolab.ibm.com member2-eth1
192.168.2.229 member2-eth2.torolab.ibm.com member2-eth2
192.168.1.230 member3-eth1.torolab.ibm.com member3-eth1
192.168.2.230 member3-eth2.torolab.ibm.com member3-eth2
Note: In a four-member environment that uses only one communication adapter port for each CF and member, the file would look similar to the previous example but would contain only the first IP address of each CF and member.
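Because every host must use the same <IP_Address> <fully_qualified_name> <short_name> layout, a quick format check can catch typos before the installation. The following Bash sketch uses a hypothetical helper name and assumes that the short name equals the first label of the fully qualified name, as in the example above. An optional IP prefix restricts the check to the cluster entries so that unrelated lines, such as the loopback entry, are skipped.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): report hosts-file entries that do not
# follow "<IP_Address> <fully_qualified_name> <short_name>".
# Assumption: the short name equals the first label of the fully qualified
# name, as in the example above.
# Usage: check_hosts_format [FILE] [IP_PREFIX]
# Only entries whose IP address starts with IP_PREFIX are checked.
check_hosts_format() {
  local file=${1:-/etc/hosts} prefix=${2:-}
  local line ip fqdn short extra n=0 bad=0
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
    line=${line%%#*}                                 # drop comments
    read -r ip fqdn short extra <<< "$line"
    [[ -n $ip && $ip == "$prefix"* ]] || continue    # skip blank and other entries
    if [[ -z $short || -n $extra || $fqdn != *.* || ${fqdn%%.*} != "$short" ]]; then
      echo "line $n: unexpected format: $line"
      bad=1
    fi
  done < "$file"
  return $bad
}
```

For example, check_hosts_format /etc/hosts 192.168. checks only the 192.168.x.x entries from the example and prints one line for each entry that is missing a field or whose short name does not match its fully qualified name.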
- Restart the service for the RoCE subsystem. On RHEL, or on SLES releases earlier than SLES 15, run the following command to restart the service:
systemctl restart rdma.service
On SLES 15 or later, reboot the machine instead by running the following command:
reboot
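Because the restart action depends on the operating system, the choice can be derived from /etc/os-release. The following Bash sketch (hypothetical helper name rdma_restart_command) prints, rather than runs, the documented command: reboot on SLES 15 or later, and systemctl restart rdma.service on any other system. It assumes the standard ID and VERSION_ID fields of /etc/os-release, with ID set to sles on SLES and rhel on RHEL.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): print the RoCE restart command that
# this procedure documents for the current operating system.
# Every system other than SLES 15 or later is treated like RHEL.
# Usage: rdma_restart_command [OS_RELEASE_FILE]
rdma_restart_command() {
  local file=${1:-/etc/os-release} id version major
  id=$(sed -n 's/^ID=//p' "$file" | tr -d '"')
  version=$(sed -n 's/^VERSION_ID=//p' "$file" | tr -d '"')
  major=${version%%.*}                 # "15.4" -> "15"
  if [[ $id == sles && ${major:-0} -ge 15 ]]; then
    echo "reboot"
  else
    echo "systemctl restart rdma.service"
  fi
}
```

Printing the command instead of executing it is deliberate: a reboot is disruptive, so an administrator can review the output before running it as root.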
What to do next
Modify the kernel parameters of hosts that you plan to include in the Db2 pureScale environment.