Configuring the network settings of hosts for a Db2 pureScale environment on an InfiniBand network (Linux)

As described in the network topology tables and diagrams, configure the communication adapter ports in pairs, so that the devices with the same device ID (for example, ib0) are on the same subnet.

Before you begin

Important: Starting from version 11.5.5, support for InfiniBand (IB) adapters as the high-speed communication network between members and CFs in Db2 pureScale is deprecated on all supported platforms and will be removed in a future release. Use a Remote Direct Memory Access over Converged Ethernet (RoCE) network as the replacement.
Ensure that you completed the following tasks:
  • You created your Db2 pureScale Feature installation plan. The installation plan helps ensure that your system meets the prerequisites and that you performed the preinstallation tasks.
  • You read about the supported network topologies for Db2 pureScale environments in Network topology configuration support for Db2 pureScale environments.

Administrative access is required on all Db2 member and CF hosts.

About this task

To configure the network settings of hosts, install the OpenFabrics Enterprise Distribution (OFED) packages and configure IP addresses on the hosts. Cluster caching facilities (CFs) and members support multiple communication adapter ports to help Db2 pureScale environments scale and to help with high availability. Each CF or member requires only one communication adapter port, but using more ports is recommended to increase bandwidth, add redundancy, and allow the use of multiple switches.
Note: These steps must be executed on all hosts planned for the future Db2 pureScale environment.

Procedure

  1. Log in as root.
  2. Configure OpenFabrics Enterprise Distribution (OFED) software.
    • On SLES 12, the OFED packages are already bundled within the RDMA package in the service packs. For the required packages, refer to the Installation prerequisites for Db2 pureScale Feature (Intel Linux) page.
    • OFED configuration details for RHEL systems.
      On RHEL, the required InfiniBand software is available as the "InfiniBand Support" package group. Run the following command as root to install it:
      yum groupinstall "InfiniBand Support"
      Note: The yum command requires a repository to be created first, either from Red Hat Network (RHN) or from the DVD ISO images. After the repository is set up, the yum command knows where to find the target packages. Registering with RHN is the recommended mechanism to access the latest kernel updates and fixes. Set up the repository on every RHEL system.
      If the repository cannot be set up with RHN, it can be set up by using the ISO images that come with the RHEL DVD media. This procedure is required only on a system that cannot be registered with RHN. The following example shows how to set up the repository by using the RHEL ISO image.
      1. Copy the file RHEL5.7-20100922.1-Server-x86_64-DVD1.iso from the DVD to a temporary directory on the target system, for example /tmp/iso.
        # cd /tmp/iso
        # ls -rlt
        total 3354472
        -rw-r--r-- 1 root root 3431618560 Jan 10 20:13 RHEL5.7-20100922.1-Server-x86_64-DVD1.iso
      2. Mount the ISO image.
        # mkdir -p /mnt/iso
        # mount -o loop /tmp/iso/RHEL5.7-20100922.1-Server-x86_64-DVD1.iso /mnt/iso/
      3. Confirm that the mounted image contains the repository metadata.
        # cd /mnt/iso/repodata/
        # ls -rlt
        total 76180
        -rw-r--r-- 1 root root 8032315 Jan 17 12:59 primary.xml.gz
        -rw-r--r-- 1 root root 51522840 Jan 17 12:59 other.xml.gz
        -rw-r--r-- 1 root root 18346363 Jan 17 12:59 filelists.xml.gz
        -rw-r--r-- 1 root root 951 Jan 17 12:59 repomd.xml
      4. Define a local repository for the ISO image by creating the file /etc/yum.repos.d/my.repo.
        # cat /etc/yum.repos.d/my.repo
        [my.repo]
        name=Redhat LTC
        baseurl=file:///mnt/iso
        gpgcheck=0
        enabled=1
      5. The local repository now points to /mnt/iso as its source.
      6. Run the yum groupinstall command to install the required packages.
        Sample output for a successful installation:
        [root@coralxib42 ~]# yum groupinstall 'Infiniband Support'
        Loaded plugins: product-id, refresh-packagekit, rhnplugin, subscription-manager
        Updating Red Hat repositories.
        4/4
        Setting up Group Process
        Resolving Dependencies
        --> Running transaction check
        ---> Package dapl.x86_64 0:2.0.25-5.2.el6 will be installed
        ---> Package ibsim.x86_64 0:0.5-4.el6 will be installed
        ---> Package ibutils.x86_64 0:1.5.4-3.el6 will be installed
        --> Processing Dependency: libosmcomp.so.3(OSMCOMP_2.3)(64bit) for package: ibutils-1.5.4-3.el6.x86_64
        --> Processing Dependency: libosmvendor.so.3(OSMVENDOR_2.0)(64bit) for package: ibutils-1.5.4-3.el6.x86_64
        --> Processing Dependency: libopensm.so.2(OPENSM_1.5)(64bit) for package: ibutils-1.5.4-3.el6.x86_64
        --> Processing Dependency: tk for package: ibutils-1.5.4-3.el6.x86_64
        --> Processing Dependency: libosmcomp.so.3()(64bit) for package: ibutils-1.5.4-3.el6.x86_64
        --> Processing Dependency: libosmvendor.so.3()(64bit) for package: ibutils-1.5.4-3.el6.x86_64
        --> Processing Dependency: libopensm.so.2()(64bit) for package: ibutils-1.5.4-3.el6.x86_64
        --> Processing Dependency: libibdmcom.so.1()(64bit) for package: ibutils-1.5.4-3.el6.x86_64
        ---> Package libcxgb3.x86_64 0:1.3.0-1.el6 will be installed
        ---> Package libibcm.x86_64 0:1.0.5-2.el6 will be installed
        ---> Package libibmad.x86_64 0:1.3.4-1.el6 will be installed
        ---> Package libibumad.x86_64 0:1.3.4-1.el6 will be installed
        ---> Package libibverbs.x86_64 0:1.1.4-4.el6 will be installed
        ---> Package libibverbs-utils.x86_64 0:1.1.4-4.el6 will be installed
        ---> Package libipathverbs.x86_64 0:1.2-2.el6 will be installed
        ---> Package libmlx4.x86_64 0:1.0.1-8.el6 will be installed
        ---> Package libmthca.x86_64 0:1.0.5-7.el6 will be installed
        ---> Package libnes.x86_64 0:1.1.1-1.el6 will be installed
        ---> Package librdmacm.x86_64 0:1.0.10-2.el6 will be installed
        ---> Package librdmacm-utils.x86_64 0:1.0.10-2.el6 will be installed
        ---> Package rdma.noarch 0:1.0-9.el6 will be installed
        ---> Package rds-tools.x86_64 0:2.0.4-3.el6 will be installed
        --> Running transaction check
        ---> Package ibutils-libs.x86_64 0:1.5.4-3.el6 will be installed
        ---> Package opensm-libs.x86_64 0:3.3.5-1.el6 will be installed
        ---> Package tk.x86_64 1:8.5.7-5.el6 will be installed
        --> Finished Dependency Resolution
        
        Dependencies Resolved
        
        ====================================================================================
         Package           Arch       Version             Repository                    Size
        ====================================================================================
        Installing:
         dapl              x86_64     2.0.25-5.2.el6      rhel-x86_64-server-6         143 k
         ibsim             x86_64     0.5-4.el6           rhel-x86_64-server-6          55 k
         ibutils           x86_64     1.5.4-3.el6         rhel-x86_64-server-6         1.0 M
         libcxgb3          x86_64     1.3.0-1.el6         rhel-x86_64-server-6          16 k
         libibcm           x86_64     1.0.5-2.el6         rhel-x86_64-server-6          19 k
         libibmad          x86_64     1.3.4-1.el6         rhel-x86_64-server-6          52 k
         libibumad         x86_64     1.3.4-1.el6         rhel-x86_64-server-6          55 k
         libibverbs        x86_64     1.1.4-4.el6         rhel-x86_64-server-6          44 k
         libibverbs-utils  x86_64     1.1.4-4.el6         rhel-x86_64-server-6          34 k
         libipathverbs     x86_64     1.2-2.el6           rhel-x86_64-server-6          13 k
         libmlx4           x86_64     1.0.1-8.el6         rhel-x86_64-server-6          27 k
         libmthca          x86_64     1.0.5-7.el6         rhel-x86_64-server-6          33 k
         libnes            x86_64     1.1.1-1.el6         rhel-x86_64-server-6          15 k
         librdmacm         x86_64     1.0.10-2.el6        rhel-x86_64-server-6          22 k
         librdmacm-utils   x86_64     1.0.10-2.el6        rhel-x86_64-server-6          27 k
         rdma              noarch     1.0-9.el6           rhel-x86_64-server-6          16 k
         rds-tools         x86_64     2.0.4-3.el6         rhel-x86_64-server-6          55 k
        Installing for dependencies:
         ibutils-libs      x86_64     1.5.4-3.el6         rhel-x86_64-server-6         924 k
         opensm-libs       x86_64     3.3.5-1.el6         rhel-x86_64-server-6          53 k
         tk                x86_64     1:8.5.7-5.el6       rhel-x86_64-server-6         1.4 M
        
        Transaction Summary
        =====================================================================================
        Install      20 Package(s)
        
        Total download size: 4.0 M
        Installed size: 0
        Is this ok [y/N]:
  3. DAT configuration file details for SLES and RHEL systems:
    • On SLES, edit the Direct Access Transport (DAT) configuration file, /etc/dat.conf, to have a line for each of the communication adapter ports.
    • On RHEL, the DAT configuration file is located in /etc/rdma/dat.conf and is updated by the group installation of the "InfiniBand Support" package.

      Ensure that the file has the following format:

      <interface adapter name> u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "<network interface> 0" ""
    • The <interface adapter name> string cannot be more than 19 characters long.
    • The <network interface> name is the name of the network interface for the adapter port (for example, ib0).
    The following example shows the file for two 2-port communication adapters, that is, four ports in total.
    cat /etc/dat.conf
    ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
    ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
    ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib2 0" ""
    ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib3 0" ""
    
    Note: If you are receiving DAT_INTERNAL_ERR communication errors, it is likely that the system attempted to communicate with an adapter interface that is not set up correctly in the Direct Access Transport (DAT) configuration file for the adapter port.
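The format rules in this step can be checked mechanically before you start the instance. The following is a minimal sketch, not a Db2 or OFED tool: check_dat_conf is a hypothetical helper name, and the check assumes that every entry uses the exact provider fields shown in the format line above.

```shell
# check_dat_conf: hypothetical helper (not part of Db2 or OFED).
# Reads DAT configuration lines on standard input and reports entries whose
# adapter name is longer than 19 characters or whose provider fields differ
# from the format shown in this step. Returns non-zero if any entry fails.
check_dat_conf() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comment lines and blank lines
        {
            if (length($1) > 19) {
                printf "line %d: adapter name %s is longer than 19 characters\n", NR, $1
                bad = 1
            }
            if ($2 != "u2.0" || $3 != "nonthreadsafe" || $4 != "default" ||
                $5 != "libdaplofa.so.2" || $6 != "dapl.2.0") {
                printf "line %d: unexpected provider fields\n", NR
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}
```

For example, run check_dat_conf < /etc/dat.conf on SLES or check_dat_conf < /etc/rdma/dat.conf on RHEL. No output and a zero return code mean that every entry passed both checks.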
  4. Edit the network configuration files to configure a static IP address for each communication adapter port interface.
    The following file listings show the network adapter configuration for the CFs (hosts cf1 and cf2) and the members (member1, member2, member3, and member4). Edit the network configuration files on each host so that the first communication adapter port listed on each host is on the same subnet as the first port on the other hosts. If you configure multiple communication adapter ports on the CFs, pair the additional ports so that each DEVICE on the secondary CF is on the same subnetwork as the DEVICE with the same ID on the primary CF.
    The network configuration files are located in /etc/sysconfig/network on SLES and in /etc/sysconfig/network-scripts on RHEL. The following example is for SLES.
    ssh cf1 cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO='static'
    IPADDR='10.222.0.1'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    ssh cf1 cat /etc/sysconfig/network/ifcfg-ib1
    DEVICE=ib1
    BOOTPROTO='static'
    IPADDR='10.222.1.1'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    ssh cf1 cat /etc/sysconfig/network/ifcfg-ib2
    DEVICE=ib2
    BOOTPROTO='static'
    IPADDR='10.222.2.1'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    ssh cf1 cat /etc/sysconfig/network/ifcfg-ib3
    DEVICE=ib3
    BOOTPROTO='static'
    IPADDR='10.222.3.1'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    ssh cf2 cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO='static'
    IPADDR='10.222.0.2'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    ssh cf2 cat /etc/sysconfig/network/ifcfg-ib1
    DEVICE=ib1
    BOOTPROTO='static'
    IPADDR='10.222.1.2'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    ssh cf2 cat /etc/sysconfig/network/ifcfg-ib2
    DEVICE=ib2
    BOOTPROTO='static'
    IPADDR='10.222.2.2'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    ssh cf2 cat /etc/sysconfig/network/ifcfg-ib3
    DEVICE=ib3
    BOOTPROTO='static'
    IPADDR='10.222.3.2'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    ssh member1 cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO='static'
    IPADDR='10.222.0.101'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    
    ssh member2 cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO='static'
    IPADDR='10.222.0.102'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    
    ssh member3 cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO='static'
    IPADDR='10.222.0.103'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    
    ssh member4 cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO='static'
    IPADDR='10.222.0.104'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    Note:
    • For simplicity, the IP addresses in the previous example use the 255.255.255.0 subnetwork mask (NETMASK) so that the third and fourth IP segments can match the interface device number and the hostname suffix. With this scheme, CF addresses have the form 10.222.<interface device number>.<CF hostname suffix>, and member addresses have the form 10.222.<interface device number>.10<member hostname suffix>. For example, ib1 on cf2 is 10.222.1.2, and ib0 on member3 is 10.222.0.103.
    • The first communication adapter port on each CF host is on the same subnet as the members.
    • Each communication adapter port on a CF or member is on a distinct subnet.
    • Communication adapter ports with the same interface DEVICE name on the primary and secondary CFs share the same subnet.
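The addressing scheme in the notes above is regular enough to generate the SLES interface files from two numbers. The following sketch assumes the 10.222.x.y layout and the 255.255.255.0 mask of the example; ifcfg_for is a hypothetical helper name, and the output still needs to be reviewed before it is written to /etc/sysconfig/network.

```shell
# ifcfg_for DEVICE_NUMBER HOST_OCTET
# Hypothetical helper: prints SLES-style ifcfg settings for ib<DEVICE_NUMBER>
# using the example scheme 10.222.<DEVICE_NUMBER>.<HOST_OCTET>.
# HOST_OCTET is the CF suffix (1, 2) or 10<member suffix> (101, 102, ...).
ifcfg_for() {
    dev_num=$1
    host_octet=$2
    printf "DEVICE=ib%s\n" "$dev_num"
    printf "BOOTPROTO='static'\n"
    printf "IPADDR='10.222.%s.%s'\n" "$dev_num" "$host_octet"
    printf "NETMASK='255.255.255.0'\n"
    printf "STARTMODE='onboot'\n"
    printf "WIRELESS='no'\n"
}
```

For example, ifcfg_for 1 2 prints the same settings that are listed above for ib1 on cf2, and ifcfg_for 0 103 prints the settings for ib0 on member3.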
  5. If you configure multiple communication adapter ports on members, assign each adapter interface device the same IP subnet that is used by the interface with the same device ID on the other hosts, so that matching devices are on the same IP subnets.
    cat /etc/sysconfig/network/ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO='static'
    IPADDR='10.1.1.161'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    
    cat /etc/sysconfig/network/ifcfg-ib1
    DEVICE=ib1
    BOOTPROTO='static'
    IPADDR='10.1.2.161'
    NETMASK='255.255.255.0'
    STARTMODE='onboot'
    WIRELESS='no'
    All members must be on an IP subnet used by the CF adapter interfaces. The resulting IP subnets are:
    • The 10.1.1 subnet has the ib0 device from all members and all CFs.
    • The 10.1.2 subnet has the ib1 device from all members and all CFs.
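Whether two interfaces share a subnet follows from a bitwise AND of each address with the netmask. The sketch below illustrates that arithmetic in plain shell; network_of and same_subnet are hypothetical helper names, and the sketch handles only dotted-decimal IPv4 addresses and masks.

```shell
# network_of IP NETMASK
# Hypothetical helper: prints the network address of a dotted-decimal IPv4
# address by ANDing each octet with the matching octet of the netmask.
network_of() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both values on dots: $1-$4 are the IP octets, $5-$8 the mask octets.
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# same_subnet IP_A IP_B NETMASK
# Succeeds when both addresses are in the same network for the given mask.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}
```

With the addresses from this step, same_subnet 10.1.1.161 10.1.2.161 255.255.255.0 fails because ib0 and ib1 on one host are on distinct subnets, while two ib0 addresses in 10.1.1 on different hosts succeed.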
  6. For BladeCenter deployments only, enable the subnet manager service (OpenSM) on all hosts in the Db2 pureScale environment.
    To enable the subnet manager service, run the following commands on each host to start the service and to have it start after a reboot:
    chkconfig opensmd on
    service opensmd start
  7. Update the /etc/hosts file on each host in the planned Db2 pureScale environment so that the file includes the IP addresses of all the communication adapter ports for all hosts in the planned environment.

    The /etc/hosts file must have this format: <IP_Address> <fully_qualified_name> <short_name>. All hosts in the cluster must have the same /etc/hosts format.

    For example, in a planned Db2 pureScale environment with multiple communication adapter ports on the CFs and four members, the /etc/hosts configuration file might resemble the following file:

     10.222.0.1       cf1-ib0.example.com cf1-ib0
     10.222.1.1       cf1-ib1.example.com cf1-ib1
     10.222.2.1       cf1-ib2.example.com cf1-ib2
     10.222.3.1       cf1-ib3.example.com cf1-ib3
     10.222.0.2       cf2-ib0.example.com cf2-ib0
     10.222.1.2       cf2-ib1.example.com cf2-ib1
     10.222.2.2       cf2-ib2.example.com cf2-ib2
     10.222.3.2       cf2-ib3.example.com cf2-ib3
     10.222.0.101     member1-ib0.example.com member1-ib0
     10.222.1.101     member1-ib1.example.com member1-ib1
     10.222.0.102     member2-ib0.example.com member2-ib0
     10.222.1.102     member2-ib1.example.com member2-ib1
     10.222.0.103     member3-ib0.example.com member3-ib0
     10.222.1.103     member3-ib1.example.com member3-ib1
     10.222.0.104     member4-ib0.example.com member4-ib0
     10.222.1.104     member4-ib1.example.com member4-ib1
    
    Note:
    • In a four-member environment that uses a single communication adapter port for each CF and member, the file would look similar to the previous example, but contain only the first IP address of each CF and member.
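Because every host must use the same three-field format, a quick consistency check on the cluster entries can catch mistakes before installation. The following is a sketch under the assumption that each short name is the first label of its fully qualified name, as in the example; check_hosts_format is a hypothetical helper name.

```shell
# check_hosts_format: hypothetical helper (not part of Db2).
# Reads /etc/hosts-style lines on standard input and verifies the format
# <IP_Address> <fully_qualified_name> <short_name>, where the fully
# qualified name must start with the short name followed by a dot.
check_hosts_format() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comment lines and blank lines
        NF != 3 {
            printf "line %d: expected 3 fields, found %d\n", NR, NF
            bad = 1
            next
        }
        index($2, $3 ".") != 1 {
            printf "line %d: short name %s does not match %s\n", NR, $3, $2
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

Entries such as the localhost line do not follow this format, so pass only the cluster entries to the check, for example with grep -- '-ib' /etc/hosts | check_hosts_format.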
  8. Restart the service for the InfiniBand subsystem.
    systemctl restart rdma.service
  9. Verify the InfiniBand subsystem.
    1. Verify that the ports are active and the links are up.
      Use the ibstat -v command or the ibstatus command to list the state of the adapters. This check applies to the ports and interfaces that were previously identified in /etc/dat.conf.
      ibstatus
      Infiniband device 'mlx4_0' port 1 status:
              default gid:    fe80:0000:0000:0000:0002:c903:0007:eafb
              base lid:        0x2
              sm lid:          0x1
              state:          4: ACTIVE
              phys state:      5: LinkUp
              rate:            20 Gb/sec (4X DDR)
      
      Infiniband device 'mlx4_0' port 2 status:
              default gid:    fe80:0000:0000:0000:0002:c903:0007:eafc
              base lid:        0x3
              sm lid:          0x1
              state:          4: ACTIVE
              phys state:      5: LinkUp
              rate:            20 Gb/sec (4X DDR)
      Note: Port 1 in the example output of the ibstatus command on Linux® corresponds to port 0 in the dat.conf file:
      ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
      Verify that the state field value is ACTIVE and the phys state field reports that the link is up (LinkUp).
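On hosts with several ports, the two fields can be checked with a small filter instead of by eye. The following sketch assumes the ibstatus output layout shown in the example above; check_ib_ports is a hypothetical helper name.

```shell
# check_ib_ports: hypothetical helper (not part of Db2 or OFED).
# Reads ibstatus output on standard input and reports every port whose
# state is not ACTIVE or whose physical state is not LinkUp.
# Returns non-zero if any port fails either check.
check_ib_ports() {
    awk '
        /^Infiniband device/ { dev = $3 " port " $5; next }
        /^[ \t]*state:/ {
            if ($0 !~ /ACTIVE/) { print dev ": state is not ACTIVE"; bad = 1 }
        }
        /^[ \t]*phys state:/ {
            if ($0 !~ /LinkUp/) { print dev ": link is not up"; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

For example, run ibstatus | check_ib_ports on each host. No output and a zero return code mean that all listed ports are active with the link up.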
    2. Ensure the destination IP is resolvable.
      For example, enter the following:
      # ip -resolve neigh
      coralxib44-ib3 dev ib3 lladdr 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:0e:9d:5e REACHABLE
      coralxib42.torolab.ibm.com dev bond0 lladdr 00:1a:64:c9:d1:e8 REACHABLE
      coralxib42-ib0 dev ib0 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:07:ea:5f REACHABLE
      coralxib44-ib0 dev ib0 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:07:eb:13 REACHABLE
      9.26.120.1 dev bond0 lladdr 00:00:0c:07:ac:01 REACHABLE
      coralxib43.torolab.ibm.com dev bond0 lladdr 00:1a:64:c9:cc:d4 REACHABLE
      coralxib44-ib2 dev ib2 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:0e:9d:5d REACHABLE
      coralxib44.torolab.ibm.com dev bond0 lladdr 00:1a:64:c9:d5:24 REACHABLE
      coralxib44-ib1 dev ib1 lladdr 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:07:eb:14 REACHABLE
      coralxib43-ib0 dev ib0 lladdr 80:14:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:07:ea:07 REACHABLE
      
      # arp -an
      ? (10.1.4.144) at 80:00:00:49:fe:80:00:00:00 [infiniband] on ib3
      ? (9.26.120.241) at 00:1a:64:c9:d1:e8 [ether] on bond0
      ? (10.1.1.142) at 80:00:00:48:fe:80:00:00:00 [infiniband] on ib0
      ? (10.1.1.144) at 80:00:00:48:fe:80:00:00:00 [infiniband] on ib0
      ? (9.26.120.1) at 00:00:0c:07:ac:01 [ether] on bond0
      ? (9.26.120.103) at 00:1a:64:c9:cc:d4 [ether] on bond0
      ? (10.1.2.144) at 80:00:00:48:fe:80:00:00:00 [infiniband] on ib2
      ? (9.26.120.104) at 00:1a:64:c9:d5:24 [ether] on bond0
      ? (10.1.3.144) at 80:00:00:49:fe:80:00:00:00 [infiniband] on ib1
      ? (10.1.1.143) at 80:14:00:48:fe:80:00:00:00 [infiniband] on ib0
      

What to do next

Modify the kernel parameters of hosts that you plan to include in the Db2 pureScale environment.