IBM Support

How to tune node_timeout and network_fdt

How To


Summary

This technote provides sample steps for tuning node_timeout (HEARTBEAT_FREQUENCY) and network_fdt (NETWORK_FAILURE_DETECTION_TIME) in PowerHA SystemMirror for AIX.

Objective

The CAA parameter node_timeout corresponds to the PowerHA parameter HEARTBEAT_FREQUENCY. It is the time (in milliseconds in CAA) that the other nodes wait without receiving any heartbeat from nodeX before they mark nodeX as DOWN. The CAA parameter network_fdt corresponds to the PowerHA parameter NETWORK_FAILURE_DETECTION_TIME. It is the time that CAA waits before it sends a network failure notification to PowerHA. Depending on the conditions of the customer's environment, it is sometimes necessary to tune these two parameters.
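The name mapping and unit difference between PowerHA and CAA can be summarized as a small shell helper. This is a hypothetical sketch, not a PowerHA or CAA command: the function name powerha_to_caa is made up for illustration, and it only converts values; it does not read or change any cluster setting. The millisecond unit for CAA comes from note 1 under Additional Information.

```shell
#!/bin/sh
# Hypothetical helper (not part of PowerHA or CAA): map a PowerHA cluster
# attribute to the CAA tunable named in this technote and convert the value
# from seconds (PowerHA) to milliseconds (CAA).
powerha_to_caa() {
    attr=$1      # PowerHA attribute name, e.g. HEARTBEAT_FREQUENCY
    seconds=$2   # value as passed to clmgr, in seconds
    case "$attr" in
        HEARTBEAT_FREQUENCY)            tunable=node_timeout ;;
        NETWORK_FAILURE_DETECTION_TIME) tunable=network_fdt ;;
        *) echo "unknown attribute: $attr" >&2; return 1 ;;
    esac
    # CAA expresses both tunables in milliseconds.
    echo "$tunable=$((seconds * 1000))"
}

powerha_to_caa HEARTBEAT_FREQUENCY 120             # prints node_timeout=120000
powerha_to_caa NETWORK_FAILURE_DETECTION_TIME 60   # prints network_fdt=60000
```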

Steps

1. Check the current settings:
  # clctrl -tune -L network_fdt -L node_timeout

2. Tune the node_timeout value, for example, to 120 seconds:
  # clmgr modify cluster HEARTBEAT_FREQUENCY=120
 
3. Sync cluster:
  # clmgr sync cluster

4. Tune the network_fdt value, for example, to 60 seconds (this value must be at least 10 seconds smaller than the node_timeout value):
  # clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=60

5. Sync cluster:
  # clmgr sync cluster

6. Check the current settings:
  # clctrl -tune -L network_fdt -L node_timeout
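The steps above can be sketched as a shell function that checks the 10-second rule first and then calls the same clctrl and clmgr commands in the same order. The function names tune_timeouts and run and the DRY_RUN variable are hypothetical and not part of PowerHA. The sketch assumes it runs as root on a cluster node where clmgr and clctrl are in the PATH, takes both values in seconds, and always performs the intermediate sync in step 3.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 1-6 (not a PowerHA tool).
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

tune_timeouts() {
    node_timeout=$1   # target HEARTBEAT_FREQUENCY, in seconds
    network_fdt=$2    # target NETWORK_FAILURE_DETECTION_TIME, in seconds

    # network_fdt must be at least 10 seconds smaller than node_timeout.
    if [ $((node_timeout - network_fdt)) -lt 10 ]; then
        echo "network_fdt ($network_fdt) must be at least 10 seconds less than node_timeout ($node_timeout)" >&2
        return 1
    fi

    # Stop at the first failing command so a failed modify is never synced.
    run clctrl -tune -L network_fdt -L node_timeout || return 1                        # step 1
    run clmgr modify cluster HEARTBEAT_FREQUENCY="$node_timeout" || return 1           # step 2
    run clmgr sync cluster || return 1                                                 # step 3
    run clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME="$network_fdt" || return 1 # step 4
    run clmgr sync cluster || return 1                                                 # step 5
    run clctrl -tune -L network_fdt -L node_timeout                                    # step 6
}
```

For example, `DRY_RUN=1 tune_timeouts 120 60` only prints the six commands from steps 1 to 6 with the sample values, while `tune_timeouts 30 25` is rejected before any command runs because 25 is not at least 10 seconds less than 30.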

Additional Information

1. Both parameters are expressed in milliseconds in CAA and in seconds in PowerHA. For example, HEARTBEAT_FREQUENCY=120 in PowerHA corresponds to node_timeout=120000 in CAA.
2. Changes made with the "clmgr modify cluster" command take effect only after the cluster is synchronized ("clmgr sync cluster").
3. Parameter network_fdt (NETWORK_FAILURE_DETECTION_TIME) must be at least 10 seconds smaller than parameter node_timeout (HEARTBEAT_FREQUENCY). Therefore, when both parameters need to be increased, tune node_timeout first and then network_fdt. If the currently effective node_timeout value is already at least 10 seconds larger than the target network_fdt value, step 3 (sync cluster) can be skipped. Otherwise, step 4 can fail with an error such as the following, which prevents the parameter from being modified:
# clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=60
ERROR: The Given Network Failure Detection TimeOut is not at least 10 seconds
       less than the Node Failure Detection Time(Heartbeat frequency)30 seconds. The
       Network Failure Detection TimeOut value can be set within the range 5 to 20
       seconds based on the current Node Failure Detection Time value.
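The decision in note 3 about whether step 3 can be skipped can be expressed as a small check. This is a sketch under stated assumptions: the function names are hypothetical, all values are in seconds, and the lower bound of 5 seconds comes from the sample error message above rather than from a documented limit.

```shell
#!/bin/sh
# Hypothetical pre-check for note 3 (not a PowerHA tool). Values in seconds.
# ASSUMPTION: the 5-second minimum is taken from the sample error message.
MIN_NETWORK_FDT=5

max_network_fdt() {
    # network_fdt must be at least 10 seconds less than node_timeout ($1).
    echo $(($1 - 10))
}

can_skip_first_sync() {
    current_node_timeout=$1   # node_timeout currently in effect
    target_network_fdt=$2     # network_fdt value planned for step 4
    # Step 4 is validated against the node_timeout already in effect, so the
    # first sync is unnecessary only if that value already allows the target.
    [ "$target_network_fdt" -le "$(max_network_fdt "$current_node_timeout")" ]
}

echo "allowed range: $MIN_NETWORK_FDT-$(max_network_fdt 30)"   # prints allowed range: 5-20
if can_skip_first_sync 30 60; then echo "skip step 3"; else echo "run step 3 first"; fi
# prints "run step 3 first": with node_timeout=30, network_fdt=60 is rejected
```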

Document Location

Worldwide


Document Information

Modified date:
17 May 2019

UID

ibm10884696