Setting up the netmon.cf file on an RoCE network (AIX®)
On a remote direct memory access (RDMA) over Converged Ethernet (RoCE) network, one or more IP addresses that can be pinged must be manually set up in the netmon.cf configuration file. The netmon.cf file is required by Reliable Scalable Cluster Technology (RSCT) to monitor the network and ensure that the interfaces can be pinged.
Before you begin
Attention: The procedures documented in this task are no longer required, because the adapter port liveliness test is now enhanced and automated. Some restrictions apply. For more information, see technote #0733765.
The examples in this topic are based on the figure at the end of this topic, Two CFs and four members connect to two switches.
Procedure
To set up the netmon.cf configuration file:
- Log in to the host as root.
- Retrieve the cluster manager domain name.
/home/instname/sqllib/bin/db2cluster -cm -list -domain
- Stop the domain.
/home/instname/sqllib/bin/db2cluster -cm -stop -domain domainname -force
- Determine which IP address should be entered into the members' netmon.cf configuration file. On the member host, to check the communication adapter ports and the associated destination IP subnet, run the netstat command:
netstat -rn
For example:
[root@host1:/]# netstat -rn
Routing tables
Destination        Gateway           Flags   Refs       Use  If   Exp  Groups

Route tree for Protocol Family 2 (Internet):
default            9.68.71.1         UG         3   1404365  en1    -    -
9.68.71.0          9.68.71.248       UHSb       0         0  en1    -    -   =>
9.68.71/24         9.68.71.248       U          4   1392490  en1    -    -
9.68.71.248        127.0.0.1         UGHS       0    246670  lo0    -    -
9.68.71.255        9.68.71.248       UHSb       0         1  en1    -    -
127/8              127.0.0.1         U          6   1090098  lo0    -    -
192.168.30.0       192.168.30.50     UHSb       0         0  en0    -    -   =>
192.168.30/24      192.168.30.50     U         14  12047588  en0    -    -
192.168.30.50      127.0.0.1         UGHS       0    760433  lo0    -    -
192.168.30.255     192.168.30.50     UHSb       0         1  en0    -    -
192.168.100.0      192.168.100.50    UHSb       0         0  en2    -    -   =>
192.168.100/24     192.168.100.50    U          2       801  en2    -    -   =>
192.168.100/24     192.168.200.50    U          2         0  en3    -    -
192.168.100.2      192.168.100.2     UGH        0        16  en2    -    -
192.168.100.50     127.0.0.1         UGHS       0    246673  lo0    -    -
192.168.100.51     192.168.100.51    UGHA       0    179994  en2    -    -   =>
192.168.100.51     192.168.100.51    UGHA       1    179952  en3    -    -
192.168.100.60     192.168.100.60    UGHA       1    180109  en3    -    -   =>
192.168.100.60     192.168.100.60    UGHA       0    180086  en2    -    -
192.168.100.61     192.168.100.61    UGHA       1    858177  en3    -    -   =>
192.168.100.61     192.168.100.61    UGHA       1    858179  en2    -    -
192.168.100.255    192.168.100.50    UHSb       0         1  en2    -    -
192.168.200.0      192.168.200.50    UHSb       0         0  en3    -    -   =>
192.168.200/24     192.168.200.50    U          2       801  en3    -    -   =>
192.168.200/24     192.168.100.50    U          2         0  en2    -    -
192.168.200.2      192.168.200.2     UGH        0        19  en3    -    -
192.168.200.50     127.0.0.1         UGHS       0    246588  lo0    -    -
192.168.200.51     192.168.200.51    UGHA       0    180600  en2    -    -   =>
192.168.200.51     192.168.200.51    UGHA       1    180587  en3    -    -
192.168.200.60     192.168.200.60    UGHA       1    180387  en3    -    -   =>
192.168.200.60     192.168.200.60    UGHA       0    180412  en2    -    -
192.168.200.61     192.168.200.61    UGHA       1    858413  en3    -    -   =>
192.168.200.61     192.168.200.61    UGHA       1    858344  en2    -    -
192.168.200.255    192.168.200.50    UHSb       0         1  en3    -    -

Route tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1             UH         1     43168  lo0    -    -
[root@host1:/]#
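The subnet-to-interface pairs can be picked out of that output mechanically. The following is a minimal sketch, not part of Db2 or RSCT: the function name list_subnets is hypothetical, and it assumes the AIX column layout shown in the example (Destination in column 1, If in column 6, network routes written in prefix form such as 192.168.100/24, adapter names of the form enN).

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -rn` output on stdin and print
# "interface subnet" pairs for network routes on enN adapters.
# Assumes the AIX layout from the example: Destination is field 1, If is field 6.
list_subnets() {
    awk '$1 ~ /^[0-9.]+\/[0-9]+$/ && $6 ~ /^en[0-9]+$/ { print $6, $1 }' | sort -u
}

# Sample rows copied from the example output (whitespace collapsed).
sample='192.168.100.0 192.168.100.50 UHSb 0 0 en2 - - =>
192.168.100/24 192.168.100.50 U 2 801 en2 - - =>
192.168.100/24 192.168.200.50 U 2 0 en3 - -
127/8 127.0.0.1 U 6 1090098 lo0 - -
192.168.200/24 192.168.200.50 U 2 801 en3 - -'

printf '%s\n' "$sample" | list_subnets
```

On a live host the same function could be fed directly, for example with netstat -rn | list_subnets. Loopback routes are skipped because only enN interfaces are matched.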
- For each IP subnet, use the IP interfaces that were created on switch 1 and switch 2 (the switches that the current host connects to) on that same subnet. (The IP interfaces should already have been created as part of the RoCE network configuration steps; for details, see Setting up the IP interfaces on the switch on a RoCE network (Linux).) In this example, assuming that the IP interfaces on switch 1 have IP addresses 192.168.1.2 and 192.168.2.2, and that those on switch 2 have IP addresses 192.168.1.5 and 192.168.2.5, these entries are added to the members' configuration file /var/ct/cfg/netmon.cf:
Member0 (host3)
!REQD eth0 192.168.1.2
!REQD eth1 192.168.2.5
Member2 (host5)
!REQD eth0 192.168.1.5
!REQD eth1 192.168.2.2
where:
- token1 - !REQD is a required entity
- token2 - eth0 and eth1 are the RoCE adapter interface names on the local host
- token3 - 192.168.1.2, 192.168.2.5, 192.168.1.5, and 192.168.2.2 are the external pingable IP addresses assigned to the interface created on the switches
The following is an example of what the full configuration file /var/ct/cfg/netmon.cf looks like for members:
Member0 (host3)
!IBQPORTONLY !ALL
!REQD eth2 9.26.92.1
!REQD eth0 192.168.1.2
!REQD eth1 192.168.2.5
!REQD eth0 192.168.1.5
!REQD eth1 192.168.2.2
Member2 (host5)
!IBQPORTONLY !ALL
!REQD eth2 9.26.92.1
!REQD eth0 192.168.1.2
!REQD eth1 192.168.2.5
!REQD eth0 192.168.1.5
!REQD eth1 192.168.2.2
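As a quick sanity check on a file like this, the !REQD entries can be counted per interface to confirm that every adapter has at least one ping target. This is a minimal sketch under the assumption that each entry has exactly the three tokens described above; the function name reqd_targets is hypothetical and is not an RSCT tool.

```shell
#!/bin/sh
# Hypothetical helper: read netmon.cf contents on stdin and print
# "interface count" for every interface that has !REQD entries.
# Assumes the three-token form: !REQD <interface> <target address>.
reqd_targets() {
    awk '$1 == "!REQD" && NF == 3 { n[$2]++ } END { for (i in n) print i, n[i] }' | sort
}

# Contents taken from the member example in this topic.
sample='!IBQPORTONLY !ALL
!REQD eth2 9.26.92.1
!REQD eth0 192.168.1.2
!REQD eth1 192.168.2.5
!REQD eth0 192.168.1.5
!REQD eth1 192.168.2.2'

printf '%s\n' "$sample" | reqd_targets
```

Against the real file, the call would be reqd_targets < /var/ct/cfg/netmon.cf. An interface that is missing from the result has no ping target configured.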
- Determine which IP address should be entered into the cluster caching facilities (CFs) netmon.cf configuration file. To check the communication adapter port and the associated destination IP subnet, enter:
netstat -rn
For example:
[root@host1:/]# netstat -rn
Routing tables
Destination        Gateway           Flags   Refs       Use  If   Exp  Groups

Route tree for Protocol Family 2 (Internet):
default            9.68.71.1         UG         3   1404365  en1    -    -
9.68.71.0          9.68.71.248       UHSb       0         0  en1    -    -   =>
9.68.71/24         9.68.71.248       U          4   1392490  en1    -    -
9.68.71.248        127.0.0.1         UGHS       0    246670  lo0    -    -
9.68.71.255        9.68.71.248       UHSb       0         1  en1    -    -
127/8              127.0.0.1         U          6   1090098  lo0    -    -
192.168.30.0       192.168.30.50     UHSb       0         0  en0    -    -   =>
192.168.30/24      192.168.30.50     U         14  12047588  en0    -    -
192.168.30.50      127.0.0.1         UGHS       0    760433  lo0    -    -
192.168.30.255     192.168.30.50     UHSb       0         1  en0    -    -
192.168.100.0      192.168.100.50    UHSb       0         0  en2    -    -   =>
192.168.100/24     192.168.100.50    U          2       801  en2    -    -   =>
192.168.100/24     192.168.200.50    U          2         0  en3    -    -
192.168.100.2      192.168.100.2     UGH        0        16  en2    -    -
192.168.100.50     127.0.0.1         UGHS       0    246673  lo0    -    -
192.168.100.51     192.168.100.51    UGHA       0    179994  en2    -    -   =>
192.168.100.51     192.168.100.51    UGHA       1    179952  en3    -    -
192.168.100.60     192.168.100.60    UGHA       1    180109  en3    -    -   =>
192.168.100.60     192.168.100.60    UGHA       0    180086  en2    -    -
192.168.100.61     192.168.100.61    UGHA       1    858177  en3    -    -   =>
192.168.100.61     192.168.100.61    UGHA       1    858179  en2    -    -
192.168.100.255    192.168.100.50    UHSb       0         1  en2    -    -
192.168.200.0      192.168.200.50    UHSb       0         0  en3    -    -   =>
192.168.200/24     192.168.200.50    U          2       801  en3    -    -   =>
192.168.200/24     192.168.100.50    U          2         0  en2    -    -
192.168.200.2      192.168.200.2     UGH        0        19  en3    -    -
192.168.200.50     127.0.0.1         UGHS       0    246588  lo0    -    -
192.168.200.51     192.168.200.51    UGHA       0    180600  en2    -    -   =>
192.168.200.51     192.168.200.51    UGHA       1    180587  en3    -    -
192.168.200.60     192.168.200.60    UGHA       1    180387  en3    -    -   =>
192.168.200.60     192.168.200.60    UGHA       0    180412  en2    -    -
192.168.200.61     192.168.200.61    UGHA       1    858413  en3    -    -   =>
192.168.200.61     192.168.200.61    UGHA       1    858344  en2    -    -
192.168.200.255    192.168.200.50    UHSb       0         1  en3    -    -

Route tree for Protocol Family 24 (Internet v6):
::1%1              ::1%1             UH         1     43168  lo0    -    -
[root@host1:/]#
All four IP addresses created on the switch (which covers all four IP subnets) must be entered into this host's netmon.cf configuration file.
Repeat this step for the secondary CF host in the cluster.
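Because a CF host needs all four switch addresses, it can help to check for any that are missing before restarting the domain. The sketch below is illustrative only: missing_targets is a hypothetical function, the sample file contents are deliberately incomplete, and the interface names in the sample are assumptions rather than values from a real CF host.

```shell
#!/bin/sh
# Hypothetical helper, not part of Db2 or RSCT.
# $1: netmon.cf contents; remaining arguments: addresses that must
# appear as the target (third token) of a !REQD entry.
# Prints every required address that has no matching entry.
missing_targets() {
    contents=$1
    shift
    for ip in "$@"; do
        printf '%s\n' "$contents" |
            awk -v ip="$ip" '$1 == "!REQD" && $3 == ip { found = 1 } END { exit !found }' ||
            printf '%s\n' "$ip"
    done
}

# Illustrative, deliberately incomplete contents (interface names are assumptions).
sample='!REQD eth0 192.168.1.2
!REQD eth1 192.168.2.5
!REQD eth0 192.168.1.5'

# The four switch addresses assumed in the member example of this topic.
missing_targets "$sample" 192.168.1.2 192.168.2.2 192.168.1.5 192.168.2.5
```

For the real file, the first argument could be supplied as "$(cat /var/ct/cfg/netmon.cf)". Empty output means that every listed address is present.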
- Restart the domain.
/home/instname/sqllib/bin/db2cluster -cm -start -domain domainname
- Verify all adapters are stable by running the lssrc command:
lssrc -ls cthats
The output is similar to the following:
[root@coralm234 ~]# lssrc -ls cthats
Subsystem         Group            PID     Status
 cthats           cthats           31938   active
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
CG1            [ 0]    3     3  S    192.168.1.234   192.168.1.234
CG1            [ 0] eth0             0x46d837fd      0x46d83801
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 560419 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 537974 ICMP 0 Dropped: 0
NIM's PID: 31985
CG2            [ 1]    4     4  S    9.26.93.226     9.26.93.227
CG2            [ 1] eth2             0x56d837fc      0x56d83802
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 515550 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 615159 ICMP 0 Dropped: 0
NIM's PID: 31988
CG3            [ 2]    3     3  S    192.168.3.234   192.168.3.234
CG3            [ 2] eth1             0x46d837fe      0x46d83802
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 493188 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 537949 ICMP 0 Dropped: 0
NIM's PID: 31991
CG4            [ 3]    2     2  S    192.168.2.234   192.168.2.234
CG4            [ 3] eth6             0x46d83800      0x46d83803
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 470746 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 537992 ICMP 0 Dropped: 0
NIM's PID: 31994
CG5            [ 4]    2     2  S    192.168.4.234   192.168.4.234
CG5            [ 4] eth7             0x46d837ff      0x46d83804
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 470750 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 538001 ICMP 0 Dropped: 0
NIM's PID: 31997
  2 locally connected Clients with PIDs:
rmcd( 32162) hagsd( 32035)
  Dead Man Switch Enabled:
     reset interval = 1 seconds
     trip  interval = 67 seconds
     Watchdog module in use: softdog
  Client Heartbeating Enabled. Period: 6 secs. Timeout: 13 secs.
  Configuration Instance = 1322793087
  Daemon employs no security
  Segments pinned: Text Data Stack.
  Text segment size: 650 KB. Static data segment size: 1475 KB.
  Dynamic data segment size: 2810. Number of outstanding malloc: 1165
  User time 32 sec. System time 26 sec.
  Number of page faults: 0. Process swapped out 0 times.
  Number of nodes up: 4. Number of nodes down: 0.
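The St column in that output carries the state of each communication group, and every group in the example shows S. A minimal sketch for pulling out just the group name and state follows. The function name group_states is hypothetical, the parsing assumes the row layout shown in the example, and the reading of S as "stable" is taken from the step text rather than verified against every RSCT level.

```shell
#!/bin/sh
# Hypothetical helper: read `lssrc -ls cthats` output on stdin and print
# "group state" for each communication group summary row.
# Assumes rows of the form: CGn [ i] <defd> <mbrs> <St> <adapter ID> <group ID>.
group_states() {
    awk '{ gsub(/\[ +/, "[") }   # normalize "[ 0]" to "[0]" so the index is one field
         $2 ~ /^\[[0-9]+\]$/ && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { print $1, $5 }'
}

# Rows copied from the example output (whitespace collapsed).
sample='Network Name Indx Defd Mbrs St Adapter ID Group ID
CG1 [ 0] 3 3 S 192.168.1.234 192.168.1.234
CG1 [ 0] eth0 0x46d837fd 0x46d83801
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
CG2 [ 1] 4 4 S 9.26.93.226 9.26.93.227
CG2 [ 1] eth2 0x56d837fc 0x56d83802'

printf '%s\n' "$sample" | group_states
```

On a live host, lssrc -ls cthats | group_states lists every group, and any row whose second column is not S points to a group that is worth a closer look.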
Figure 1. Example of environment with two CFs and four members connected to two switches.