Setting up the netmon.cf file on a RoCE network (AIX®)

On a remote direct memory access (RDMA) over Converged Ethernet (RoCE) network, one or more IP addresses that can be pinged must be manually set up in the netmon.cf configuration file. The netmon.cf file is required by Reliable Scalable Cluster Technology (RSCT) to monitor the network and ensure that the interfaces can be pinged.

Before you begin

Attention: The procedures documented in this task are no longer required, as the adapter port liveliness test is now enhanced and automated. Some restrictions apply. For more information, see technote #0733765.
The examples in this topic are based on the figure at the end of this topic, Two CFs and four members connect to two switches.

Procedure

To set up the netmon.cf configuration file:

  1. Log in to the host as root.
  2. Retrieve the cluster manager domain name.
    /home/instname/sqllib/bin/db2cluster -cm -list -domain
  3. Stop the domain.
    /home/instname/sqllib/bin/db2cluster -cm -stop -domain domainname -force 
  4. Determine which IP address should be entered into the members' netmon.cf configuration file.
    On the member host, to check the communication adapter ports and the associated destination IP subnets, display the routing table by running the netstat command:
    netstat -rn
    For example:
    [root@host1:/]# netstat -rn
    Routing tables
    Destination        Gateway           Flags   Refs     Use  If   Exp  Groups
    
    Route tree for Protocol Family 2 (Internet):
    default            9.68.71.1         UG        3   1404365 en1      -      -
    9.68.71.0          9.68.71.248       UHSb      0         0 en1      -      -   =>
    9.68.71/24         9.68.71.248       U         4   1392490 en1      -      -
    9.68.71.248        127.0.0.1         UGHS      0    246670 lo0      -      -
    9.68.71.255        9.68.71.248       UHSb      0         1 en1      -      -
    127/8              127.0.0.1         U         6   1090098 lo0      -      -
    192.168.30.0       192.168.30.50     UHSb      0         0 en0      -      -   =>
    192.168.30/24      192.168.30.50     U        14  12047588 en0      -      -
    192.168.30.50      127.0.0.1         UGHS      0    760433 lo0      -      -
    192.168.30.255     192.168.30.50     UHSb      0         1 en0      -      -
    192.168.100.0      192.168.100.50    UHSb      0         0 en2      -      -   =>
    192.168.100/24     192.168.100.50    U         2       801 en2      -      -   =>
    192.168.100/24     192.168.200.50    U         2         0 en3      -      -
    192.168.100.2      192.168.100.2     UGH       0        16 en2      -      -
    192.168.100.50     127.0.0.1         UGHS      0    246673 lo0      -      -
    192.168.100.51     192.168.100.51    UGHA      0    179994 en2      -      -   =>
    192.168.100.51     192.168.100.51    UGHA      1    179952 en3      -      -
    192.168.100.60     192.168.100.60    UGHA      1    180109 en3      -      -   =>
    192.168.100.60     192.168.100.60    UGHA      0    180086 en2      -      -
    192.168.100.61     192.168.100.61    UGHA      1    858177 en3      -      -   =>
    192.168.100.61     192.168.100.61    UGHA      1    858179 en2      -      -
    192.168.100.255    192.168.100.50    UHSb      0         1 en2      -      -
    192.168.200.0      192.168.200.50    UHSb      0         0 en3      -      -   =>
    192.168.200/24     192.168.200.50    U         2       801 en3      -      -   =>
    192.168.200/24     192.168.100.50    U         2         0 en2      -      -
    192.168.200.2      192.168.200.2     UGH       0        19 en3      -      -
    192.168.200.50     127.0.0.1         UGHS      0    246588 lo0      -      -
    192.168.200.51     192.168.200.51    UGHA      0    180600 en2      -      -   =>
    192.168.200.51     192.168.200.51    UGHA      1    180587 en3      -      -
    192.168.200.60     192.168.200.60    UGHA      1    180387 en3      -      -   =>
    192.168.200.60     192.168.200.60    UGHA      0    180412 en2      -      -
    192.168.200.61     192.168.200.61    UGHA      1    858413 en3      -      -   =>
    192.168.200.61     192.168.200.61    UGHA      1    858344 en2      -      -
    192.168.200.255    192.168.200.50    UHSb      0         1 en3      -      -
    
    Route tree for Protocol Family 24 (Internet v6):
    ::1%1              ::1%1             UH        1     43168 lo0      -      -
    [root@host1:/]#
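
The subnet-to-interface pairs can be picked out of the netstat -rn output with a small filter. The following is a minimal sketch, not part of the documented procedure: the function name list_subnets is hypothetical, and it assumes the column layout shown in the example above, where the interface name (If) is the sixth field and network routes are written as prefix/length.

```shell
# list_subnets: read `netstat -rn` output on stdin and print
# "<interface> <subnet>" for each network route (destination written as
# prefix/length), skipping routes on the loopback interface lo0.
# Assumption: the interface name is field 6, as in the example output.
list_subnets() {
    awk '$1 ~ /^[0-9.]+\/[0-9]+$/ && $6 != "lo0" { print $6, $1 }' | sort -u
}
```

It could be used as netstat -rn | list_subnets. Host routes, broadcast routes, and the default route are ignored, so only the subnets that each adapter port reaches are listed.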
    
  5. Using the IP subnets from the previous step, identify the IP interfaces in the same subnets that were created on switch 1 and switch 2, the switches that the current host connects to. (The IP interfaces should already have been created as part of the RoCE network configuration steps; for details, see Setting up the IP interfaces on the switch on a RoCE network (Linux).) In this example, assuming that the IP interfaces on switch 1 have the IP addresses 192.168.1.2 and 192.168.2.2, and the IP interfaces on switch 2 have the IP addresses 192.168.1.5 and 192.168.2.5, the following entries are added to the members' configuration file, /var/ct/cfg/netmon.cf:
    Member0 (host3)
    !REQD eth0 192.168.1.2
    !REQD eth1 192.168.2.5
    
    Member2 (host5)
    !REQD eth0 192.168.1.5
    !REQD eth1 192.168.2.2
    where:
    • token1 - !REQD indicates a required entity
    • token2 - eth0 and eth1 are the RoCE adapter interface names on the local host
    • token3 - 192.168.1.2, 192.168.2.5, 192.168.1.5, and 192.168.2.2 are the external pingable IP addresses assigned to the interfaces created on the switches
    The following is an example of what the full configuration file /var/ct/cfg/netmon.cf looks like for members:
    Member0 (host3)
    !IBQPORTONLY !ALL
    !REQD eth2 9.26.92.1
    !REQD eth0 192.168.1.2
    !REQD eth1 192.168.2.5
    !REQD eth0 192.168.1.5
    !REQD eth1 192.168.2.2
    
    Member2 (host5)
    !IBQPORTONLY !ALL
    !REQD eth2 9.26.92.1
    !REQD eth0 192.168.1.2
    !REQD eth1 192.168.2.5
    !REQD eth0 192.168.1.5
    !REQD eth1 192.168.2.2
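
Because every !REQD entry has the same three-token shape, the lines for one interface can be generated instead of typed by hand. This is only an illustrative sketch: reqd_entries is a hypothetical helper name, and it writes to standard output so that the result can be reviewed before anything is placed in /var/ct/cfg/netmon.cf.

```shell
# reqd_entries: print one "!REQD <interface> <address>" line per address.
# Usage: reqd_entries <interface> <address> [<address> ...]
reqd_entries() {
    iface=$1
    shift
    for addr in "$@"; do
        printf '!REQD %s %s\n' "$iface" "$addr"
    done
}
```

For example, reqd_entries eth0 192.168.1.2 192.168.1.5 prints the two eth0 lines that appear in the member files above.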
  6. Determine which IP addresses should be entered into the netmon.cf configuration file on the cluster caching facilities (CFs).
    To check the communication adapter port and the associated destination IP subnet, enter:
    netstat -rn
    For example:
    [root@host1:/]# netstat -rn
    Routing tables
    Destination        Gateway           Flags   Refs     Use  If   Exp  Groups
    
    Route tree for Protocol Family 2 (Internet):
    default            9.68.71.1         UG        3   1404365 en1      -      -
    9.68.71.0          9.68.71.248       UHSb      0         0 en1      -      -   =>
    9.68.71/24         9.68.71.248       U         4   1392490 en1      -      -
    9.68.71.248        127.0.0.1         UGHS      0    246670 lo0      -      -
    9.68.71.255        9.68.71.248       UHSb      0         1 en1      -      -
    127/8              127.0.0.1         U         6   1090098 lo0      -      -
    192.168.30.0       192.168.30.50     UHSb      0         0 en0      -      -   =>
    192.168.30/24      192.168.30.50     U        14  12047588 en0      -      -
    192.168.30.50      127.0.0.1         UGHS      0    760433 lo0      -      -
    192.168.30.255     192.168.30.50     UHSb      0         1 en0      -      -
    192.168.100.0      192.168.100.50    UHSb      0         0 en2      -      -   =>
    192.168.100/24     192.168.100.50    U         2       801 en2      -      -   =>
    192.168.100/24     192.168.200.50    U         2         0 en3      -      -
    192.168.100.2      192.168.100.2     UGH       0        16 en2      -      -
    192.168.100.50     127.0.0.1         UGHS      0    246673 lo0      -      -
    192.168.100.51     192.168.100.51    UGHA      0    179994 en2      -      -   =>
    192.168.100.51     192.168.100.51    UGHA      1    179952 en3      -      -
    192.168.100.60     192.168.100.60    UGHA      1    180109 en3      -      -   =>
    192.168.100.60     192.168.100.60    UGHA      0    180086 en2      -      -
    192.168.100.61     192.168.100.61    UGHA      1    858177 en3      -      -   =>
    192.168.100.61     192.168.100.61    UGHA      1    858179 en2      -      -
    192.168.100.255    192.168.100.50    UHSb      0         1 en2      -      -
    192.168.200.0      192.168.200.50    UHSb      0         0 en3      -      -   =>
    192.168.200/24     192.168.200.50    U         2       801 en3      -      -   =>
    192.168.200/24     192.168.100.50    U         2         0 en2      -      -
    192.168.200.2      192.168.200.2     UGH       0        19 en3      -      -
    192.168.200.50     127.0.0.1         UGHS      0    246588 lo0      -      -
    192.168.200.51     192.168.200.51    UGHA      0    180600 en2      -      -   =>
    192.168.200.51     192.168.200.51    UGHA      1    180587 en3      -      -
    192.168.200.60     192.168.200.60    UGHA      1    180387 en3      -      -   =>
    192.168.200.60     192.168.200.60    UGHA      0    180412 en2      -      -
    192.168.200.61     192.168.200.61    UGHA      1    858413 en3      -      -   =>
    192.168.200.61     192.168.200.61    UGHA      1    858344 en2      -      -
    192.168.200.255    192.168.200.50    UHSb      0         1 en3      -      -
    
    Route tree for Protocol Family 24 (Internet v6):
    ::1%1              ::1%1             UH        1     43168 lo0      -      -
    [root@host1:/]#
    

    All four IP addresses created on the switches (which together cover all four IP subnets) must be entered into this host's netmon.cf configuration file.

    Repeat this step for the secondary CF host in the cluster.
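
A quick way to confirm that no switch address was left out of a host's file is to compare the !REQD entries against the expected list. The sketch below assumes the three-token !REQD format used in this topic; missing_addresses is a hypothetical helper name, and the addresses passed to it are whatever was configured on the switches.

```shell
# missing_addresses: read netmon.cf content on stdin and print every address
# given as an argument that does not appear as the third token of a !REQD
# line. Prints nothing when all of the addresses are present.
missing_addresses() {
    listed=$(awk '$1 == "!REQD" && NF >= 3 { print $3 }')
    for addr in "$@"; do
        # -F: fixed string, -x: whole-line match, so 192.168.1.2 does not
        # match 192.168.1.25
        printf '%s\n' "$listed" | grep -Fqx -e "$addr" || printf '%s\n' "$addr"
    done
}
```

It could be run as missing_addresses 192.168.1.2 192.168.1.5 192.168.2.2 192.168.2.5 < /var/ct/cfg/netmon.cf, using the example addresses from this topic; empty output means that every listed address has an entry.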

  7. Restart the domain.
    /home/instname/sqllib/bin/db2cluster -cm -start -domain domainname
  8. Verify all adapters are stable by running the lssrc command:
    lssrc -ls cthats
    The output is similar to the following:
    [root@coralm234 ~]# lssrc -ls cthats
    Subsystem         Group            PID     Status
     cthats           cthats           31938   active
    Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
    CG1            [ 0] 3     3     S    192.168.1.234   192.168.1.234
    CG1            [ 0] eth0             0x46d837fd      0x46d83801
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 560419 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 537974 ICMP 0 Dropped: 0
    NIM's PID: 31985
    CG2            [ 1] 4     4     S    9.26.93.226     9.26.93.227
    CG2            [ 1] eth2             0x56d837fc      0x56d83802
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 515550 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 615159 ICMP 0 Dropped: 0
    NIM's PID: 31988
    CG3            [ 2] 3     3     S    192.168.3.234   192.168.3.234
    CG3            [ 2] eth1             0x46d837fe      0x46d83802
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 493188 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 537949 ICMP 0 Dropped: 0
    NIM's PID: 31991
    CG4            [ 3] 2     2     S    192.168.2.234   192.168.2.234
    CG4            [ 3] eth6             0x46d83800      0x46d83803
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 470746 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 537992 ICMP 0 Dropped: 0
    NIM's PID: 31994
    CG5            [ 4] 2     2     S    192.168.4.234   192.168.4.234
    CG5            [ 4] eth7             0x46d837ff      0x46d83804
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 470750 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 538001 ICMP 0 Dropped: 0
    NIM's PID: 31997
      2 locally connected Clients with PIDs:
     rmcd( 32162) hagsd( 32035)
      Dead Man Switch Enabled:
         reset interval = 1 seconds
         trip  interval = 67 seconds
         Watchdog module in use: softdog
      Client Heartbeating Enabled. Period: 6 secs. Timeout: 13 secs.
      Configuration Instance = 1322793087
      Daemon employs no security
      Segments pinned: Text Data Stack.
      Text segment size: 650 KB. Static data segment size: 1475 KB.
      Dynamic data segment size: 2810. Number of outstanding malloc: 1165
      User time 32 sec. System time 26 sec.
      Number of page faults: 0. Process swapped out 0 times.
      Number of nodes up: 4. Number of nodes down: 0.
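
Scanning the St column by eye is error-prone when there are many communication groups, so the check can be scripted. This is a minimal sketch that assumes the output layout shown above and that S in the St column denotes a stable group, as this step implies; unstable_groups is a hypothetical helper name.

```shell
# unstable_groups: read `lssrc -ls cthats` output on stdin and print
# "<network name> <state>" for each group whose St column is not S.
# Prints nothing when every group reports S.
unstable_groups() {
    awk '{
        sub(/\[ *[0-9]+\]/, "")   # drop the Indx column, for example "[ 0]"
        # Group status lines carry numeric Defd and Mbrs counts after the name
        if ($2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 != "S")
            print $1, $4
    }'
}
```

It could be used as lssrc -ls cthats | unstable_groups. For the example output above, nothing is printed, because CG1 through CG5 all report S.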
    Figure 1. Example of environment with two CFs and four members connected to two switches.
    The two CFs and four members connect to two switches.