ITM Nuggets: How to improve the fail-over time of your HUB TEMS (FTO)
MarkLeftwich
As usual, I like to blog about areas of ITM that I cover when working with you, either through PMRs or directly on customer sites. Today's topic is speeding up the fail-over time of your HUB TEMS.
I hear many of the same requests when it comes to FTO (fail-over):
"When our primary HUB stops responding and we need to fail over to the backup HUB, is there any way to make this happen quicker?"
The answer is yes: there are ways to get a quicker fail-over and have the backup TEMS take on the primary responsibilities sooner.
Improving FTO Failover Time
Several areas can be tuned to improve the FTO fail-over time between two HUB TEMS. I will go over each parameter that needs to be changed in turn and explain what it does to your environment.
First up.... The parameters you need to add and change:
The default for this variable is 2. Changing the value to 1 means only one reconnect attempt is made before the backup assumes the primary HUB role, which speeds up failover slightly.
TIME SAVED = It can save up to a few minutes in failover time.
This parameter dictates how often each HUB checks whether its peer is still around and responding when no data has been received for a period of time. It's a bit like a friend picking up the phone to call his buddy and check that everything is OK. When he doesn't get a response, it's time to jump into action.
There are two parts to this check. If either fails then the peer HUB is marked as disconnected. The default value is 120 seconds.
1. Assuming the default value, if no data has been received from the peer HUB for 60 seconds (1/2 the specified value), then a request called a "ping" is sent to the peer HUB. If the ping request cannot successfully be sent, then the peer is marked as disconnected.
2. If the ping request is successfully sent, then the peer HUB is expected to return the ping request within 2 check intervals. Assuming the default value, this means that the request must be returned within 4 minutes. The peer HUB is marked as disconnected if the request is not returned.
TIME SAVED = Using a check interval of 30 can remove another 45 seconds from the failover time.
Note: If your network is a little on the slow side and you are seeing false positives, with your TEMS failing over when it should not, increase this value.
Note: I would not recommend going any lower than 30 for this parameter, as you will increase the likelihood of falsely marking the peer HUB as disconnected. A smaller check interval leaves less time for the peer HUB to return the ping request, which can lead to a false disconnect state.
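To make the timing of the two-part check concrete, here is a minimal sketch of the arithmetic described above. It only models the two rules stated in this post (ping after half the interval with no data, reply expected within two intervals); the function name `ping_timings` is made up for illustration and is not part of ITM.

```python
def ping_timings(check_interval_s):
    """Return (idle seconds before a ping is sent, seconds allowed for the reply).

    Hypothetical helper: models only the two rules described in the post.
    """
    idle_before_ping = check_interval_s / 2  # ping after half the interval with no data
    reply_deadline = check_interval_s * 2    # reply expected within 2 check intervals
    return idle_before_ping, reply_deadline


default_idle, default_deadline = ping_timings(120)  # default check interval
tuned_idle, tuned_deadline = ping_timings(30)       # lowest value recommended here

print(f"interval 120: ping after {default_idle:.0f}s idle, reply due within {default_deadline:.0f}s")
print(f"interval 30:  ping after {tuned_idle:.0f}s idle, reply due within {tuned_deadline:.0f}s")
print(f"ping is sent {default_idle - tuned_idle:.0f}s sooner")
```

With an interval of 30 the ping goes out 45 seconds earlier than with the default, which matches the saving quoted above. It also shows why going lower is risky: the peer then has only 60 seconds, rather than 4 minutes, to return the ping.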
These variables can reduce the number of interfaces that are checked when a connection issue occurs between the two HUBs. They restrict the HUB to using only the interfaces specified in the GLB_SITE.TXT file to communicate with the peer HUB.
Again, you are deviating from the standard configuration with these parameters, so only change them if you have a strong business need to do so!
Second......Where to make the changes:
You need to add the parameters to the KBBENV file on both HUB TEMS.
For UNIX / Linux
You need to add the parameters to the TEMS .conf file
Remember: because you are changing the running configuration of both TEMS, you will need to restart both TEMS for the changes to take effect.
Before you make any changes to a component, BACK UP that component. These changes should not have any adverse effect on your TEMS, but that does not mean you shouldn't take a backup to be safe. If you do see any odd behaviour due to ITM or the environment, at least you can restore the previous configuration quickly.