Setting up the netmon.cf file on a RoCE network (Linux)
On a remote direct memory access (RDMA) over Converged
Ethernet (RoCE) network, one or more pingable IP addresses must be manually set up in the
netmon.cf configuration file. The
netmon.cf file is required by Reliable Scalable Cluster Technology (RSCT) to
monitor the network and determine whether the interfaces are pingable.
The procedures documented on this page are no longer required because the adapter port
liveliness test has been enhanced and automated. Some restrictions apply; refer to
technote#0733765 for details.
Before you begin
Procedure
To set up the netmon.cf configuration file:
- Log in to the host as root.
- Retrieve the cluster manager domain name.
/home/instname/sqllib/bin/db2cluster -cm -list -domain
- Stop the domain.
/home/instname/sqllib/bin/db2cluster -cm -stop -domain domainname -force
- Determine which IP address should be entered into the members'
netmon.cf configuration file. On the member host, to check the communication adapter ports and the associated destination IP subnet, run the route command.
For example, based on the figure at the end of this topic:
/sbin/route | grep -v link-local
The last column (with column name "Iface") lists the adapters on the current host. Choose the adapter that corresponds to the target communication adapter port. In this example, "eth0" and "eth1" are the target RoCE adapters. The corresponding entries in the first column show the target IP subnets to be used in the next step. In this case, the IP subnets are "192.168.1.0" and "192.168.2.0".

Member 0
[root@host3]# route | grep -v link-local
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
192.168.2.0     *               255.255.255.0   U     0      0        0 eth1
9.26.92.0       *               255.255.254.0   U     0      0        0 eth2
default         9.26.92.1       0.0.0.0         UG    0      0        0 eth2

Member 2
[root@host5]# route | grep -v link-local
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
192.168.2.0     *               255.255.255.0   U     0      0        0 eth1
9.26.92.0       *               255.255.254.0   U     0      0        0 eth2
default         9.26.92.1       0.0.0.0         UG    0      0        0 eth2
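The selection described in this step can be sketched as a small filter over the route output. This is only an illustration: roce_subnets is a hypothetical helper name, and it assumes the eight-column route layout shown in the example and that you already know which interfaces are the RoCE adapters.

```shell
# Sketch: print "<iface> <subnet>" for the chosen RoCE adapters.
# Reads `route` output on stdin; roce_subnets is a hypothetical helper.
roce_subnets() {
    # Arguments: the interface names to keep, for example eth0 eth1
    wanted=" $* "
    awk -v wanted="$wanted" '
        # Route rows have 8 columns: the first is Destination, the last is Iface.
        NF == 8 && $1 != "Destination" && $1 != "default" &&
        index(wanted, " " $8 " ") { print $8, $1 }
    '
}

# Usage on a member host (assumes eth0 and eth1 are the RoCE adapters):
#   /sbin/route | grep -v link-local | roce_subnets eth0 eth1
```

With the Member 0 output from the example, the filter would report eth0 with 192.168.1.0 and eth1 with 192.168.2.0, which are the subnets carried into the next step.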
- For each IP subnet identified in the previous step, use the IP interfaces that were created in the same subnet on switch 1 and switch 2, the switches that the current host connects to. (The IP interfaces should already exist as part of the RoCE network configuration steps; for details, see Setting up the IP interfaces on the switch on a RoCE network (Linux).)
In this example, assuming that the IP interfaces on switch 1 have the IP addresses 192.168.1.2 and
192.168.2.2, and the IP interfaces on switch 2 have the IP addresses 192.168.1.5 and 192.168.2.5,
these entries are added to the members' configuration file, /var/ct/cfg/netmon.cf:

Member0 (host3)
!REQD eth0 192.168.1.2
!REQD eth1 192.168.2.5

Member2 (host5)
!REQD eth0 192.168.1.5
!REQD eth1 192.168.2.2

where:
- token1 - !REQD is a required entity
- token2 - eth0 and eth1 are the RoCE adapter interface names on the local host
- token3 - 192.168.1.2, 192.168.2.5, 192.168.1.5, and 192.168.2.2 are the external pingable IP addresses assigned to the interfaces created on the switches
The following is an example of what the full configuration file /var/ct/cfg/netmon.cf looks like for members:

Member0 (host3)
!IBQPORTONLY !ALL
!REQD eth2 9.26.92.1
!REQD eth0 192.168.1.2
!REQD eth1 192.168.2.5
!REQD eth0 192.168.1.5
!REQD eth1 192.168.2.2

Member2 (host5)
!IBQPORTONLY !ALL
!REQD eth2 9.26.92.1
!REQD eth0 192.168.1.2
!REQD eth1 192.168.2.5
!REQD eth0 192.168.1.5
!REQD eth1 192.168.2.2
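Before restarting the domain, it can help to confirm that every !REQD line follows the three-token layout described above. The following is a minimal sketch, not an RSCT tool: check_netmon is a hypothetical helper, and it only checks the layout used on this page (an interface name followed by a dotted IPv4 address). It ignores other lines, such as !IBQPORTONLY !ALL.

```shell
# Sketch: report !REQD lines that do not have exactly three tokens
# (!REQD, interface name, IPv4 address). check_netmon is hypothetical.
# Reads netmon.cf content on stdin; exits non-zero if any line is malformed.
check_netmon() {
    awk '
        BEGIN { bad = 0 }
        $1 == "!REQD" {
            ok = (NF == 3) && ($3 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
            if (!ok) {
                printf "line %d: malformed !REQD entry: %s\n", NR, $0
                bad = 1
            }
        }
        END { exit bad }
    '
}

# Usage:
#   check_netmon < /var/ct/cfg/netmon.cf && echo "layout looks consistent"
```

The check is deliberately narrow. It does not verify that the named interface exists or that the address answers a ping.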
- Determine which IP address should be entered into the cluster caching facilities (CFs)
netmon.cf configuration file. To check the communication adapter port and the associated destination IP subnet, enter:
/sbin/route | grep -v link-local
The last column (Iface) indicates the adapter interface name. In this case, eth0, eth1, eth2, and eth3 are the only communication adapter port interfaces on this host. Four IP subnets are relevant to this host. For example:

Host1> $ /sbin/route | grep -v link-local
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.4.0     *               255.255.255.0   U     0      0        0 eth3
192.168.3.0     *               255.255.255.0   U     0      0        0 eth1
192.168.2.0     *               255.255.255.0   U     0      0        0 eth2
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
9.26.92.0       *               255.255.252.0   U     0      0        0 eth2
default         rsb-v94-hsrp.to 0.0.0.0         UG    0      0        0 eth2

All four IP addresses created on the switch (which cover all four IP subnets) must be entered into this host's netmon.cf configuration file. For example:

!IBQPORTONLY !ALL
!REQD eth2 9.26.92.1
!REQD eth0 192.168.1.2
!REQD eth1 192.168.3.2
!REQD eth7 192.168.2.2
!REQD eth6 192.168.4.2
Repeat this step for the secondary CF host in the cluster.
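Because every relevant subnet needs an entry, a quick cross-check between the subnets seen in the route output and the targets in netmon.cf can catch an omission. This is a rough sketch under stated assumptions: uncovered_subnets is a hypothetical helper, and it treats each subnet as a /24 (matching the 255.255.255.0 Genmask in the example) by comparing the first three octets only.

```shell
# Sketch: print each /24 subnet that has no !REQD target in the given file.
# uncovered_subnets is a hypothetical helper; /24 subnets are assumed.
uncovered_subnets() {
    # $1: path to a netmon.cf file; remaining arguments: subnets such as 192.168.1.0
    file=$1
    shift
    for subnet in "$@"; do
        prefix="${subnet%.*}."    # 192.168.1.0 becomes 192.168.1.
        if ! awk -v p="$prefix" '
                $1 == "!REQD" && index($3, p) == 1 { found = 1 }
                END { exit !found }
            ' "$file"; then
            echo "$subnet"
        fi
    done
}

# Usage on a CF host:
#   uncovered_subnets /var/ct/cfg/netmon.cf 192.168.1.0 192.168.2.0 192.168.3.0 192.168.4.0
```

No output means that each listed subnet has at least one !REQD target. The check matches on subnet only and does not compare interface names.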
- Restart the domain.
/home/instname/sqllib/bin/db2cluster -cm -start -domain domainname
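The stop, edit, and restart sequence of this procedure can be sketched as one wrapper. Treat it as an illustration only: update_netmon and run are hypothetical helper names, the backup copy to netmon.cf.bak is an added precaution that the procedure does not mention, and the sketch defaults to a dry run that prints each command instead of executing it.

```shell
# Sketch: stop the domain, install a prepared netmon.cf, restart the domain.
# update_netmon and run are hypothetical helpers. DRY_RUN=1 (the default)
# only prints the commands; set DRY_RUN=0 as root to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_netmon() {
    # $1: instance owner (instname), $2: domain name, $3: prepared netmon.cf
    db2cluster="/home/$1/sqllib/bin/db2cluster"
    run "$db2cluster" -cm -stop -domain "$2" -force || return 1
    # Added precaution (not part of the documented steps): keep a backup.
    if [ -f /var/ct/cfg/netmon.cf ]; then
        run cp -p /var/ct/cfg/netmon.cf /var/ct/cfg/netmon.cf.bak || return 1
    fi
    run cp "$3" /var/ct/cfg/netmon.cf || return 1
    run "$db2cluster" -cm -start -domain "$2"
}

# Usage (dry run):
#   update_netmon instname domainname /tmp/netmon.cf.new
```

The domain name passed as the second argument is the one reported by the db2cluster -cm -list -domain command earlier in the procedure.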
- Verify all adapters are stable by running the lssrc command:
lssrc -ls cthats
The output is similar to the following:

[root@coralm234 ~]# lssrc -ls cthats
Subsystem         Group            PID     Status
 cthats           cthats           31938   active
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
CG1            [ 0]    3     3  S    192.168.1.234   192.168.1.234
CG1            [ 0] eth0             0x46d837fd      0x46d83801
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 560419 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 537974 ICMP 0 Dropped: 0
NIM's PID: 31985
CG2            [ 1]    4     4  S    9.26.93.226     9.26.93.227
CG2            [ 1] eth2             0x56d837fc      0x56d83802
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 515550 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 615159 ICMP 0 Dropped: 0
NIM's PID: 31988
CG3            [ 2]    3     3  S    192.168.3.234   192.168.3.234
CG3            [ 2] eth1             0x46d837fe      0x46d83802
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 493188 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 537949 ICMP 0 Dropped: 0
NIM's PID: 31991
CG4            [ 3]    2     2  S    192.168.2.234   192.168.2.234
CG4            [ 3] eth6             0x46d83800      0x46d83803
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 470746 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 537992 ICMP 0 Dropped: 0
NIM's PID: 31994
CG5            [ 4]    2     2  S    192.168.4.234   192.168.4.234
CG5            [ 4] eth7             0x46d837ff      0x46d83804
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
Ping Grace Period Interval = 60.000 secs.
Missed HBs: Total: 0 Current group: 0
Packets sent    : 470750 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 538001 ICMP 0 Dropped: 0
NIM's PID: 31997
  2 locally connected Clients with PIDs:
rmcd( 32162) hagsd( 32035)
  Dead Man Switch Enabled:
     reset interval = 1 seconds
     trip interval = 67 seconds
     Watchdog module in use: softdog
  Client Heartbeating Enabled. Period: 6 secs. Timeout: 13 secs.
  Configuration Instance = 1322793087
  Daemon employs no security
  Segments pinned: Text Data Stack.
  Text segment size: 650 KB. Static data segment size: 1475 KB.
  Dynamic data segment size: 2810. Number of outstanding malloc: 1165
  User time 32 sec. System time 26 sec.
  Number of page faults: 0. Process swapped out 0 times.
  Number of nodes up: 4. Number of nodes down: 0.
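Scanning this output by eye is error-prone when there are many networks, so the check can be sketched as a filter. The sketch rests on assumptions drawn from the sample output only: network names start with CG, the summary row for each network carries the St column as its third value after the index, and S in that column denotes a stable group. unstable_networks is a hypothetical helper name.

```shell
# Sketch: list networks whose St column is not "S" in `lssrc -ls cthats`
# output read on stdin. unstable_networks is a hypothetical helper.
unstable_networks() {
    awk '
        /^CG[0-9]+[ \t]+\[/ {
            name = $1
            sub(/\[ *[0-9]+\]/, "")    # drop the "[ n]" index so fields line up
            # Summary row is now: name, Defd, Mbrs, St, ...
            # The adapter row has an interface name in field 2 and is skipped.
            if ($2 ~ /^[0-9]+$/ && $4 != "S") print name, $4
        }
    '
}

# Usage:
#   lssrc -ls cthats | unstable_networks
```

An empty result for the sample output would be consistent with all five networks showing S. The final line of the lssrc output, which reports the number of nodes up and down, is still worth reading directly.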
Figure 1. Two CFs and four members connect to two switches.