amanabe

Failure Detection Time(FDC) of TSA(RSCT)

2012-02-06T10:26:17Z
Hello.
It seems that the logic for the Failure Detection Time (FDC) of TSA (RSCT) has changed recently, so I would like to confirm my understanding.

As far as I know, in former releases of TSA the FDC could be calculated as "HeartBeatPeriod * Sensitivity * 2", and this could be checked against the "trip interval" value in the output of the "lssrc -ls cthats" command.
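The rule of thumb above can be written as a small function. This is only a sketch: the function name is hypothetical, and it encodes the formula exactly as stated in this post, not RSCT's internal logic.

```python
def former_failure_detection_time(hb_interval_s: float, sensitivity: int) -> float:
    """Failure detection time in seconds, using the pre-grace-period rule of
    thumb quoted in this thread: HeartBeatPeriod * Sensitivity * 2.
    (Hypothetical helper name; not an RSCT API.)"""
    if hb_interval_s <= 0 or sensitivity <= 0:
        raise ValueError("heartbeat interval and sensitivity must be positive")
    return hb_interval_s * sensitivity * 2


# Values from the "lssrc -ls cthats" output quoted below:
# HB Interval = 1.000 secs, Sensitivity = 4 missed beats.
print(former_failure_detection_time(1.0, 4))  # 8.0
```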

However, the "trip interval" value has changed in the latest release, as shown below.

Example:
>HB Interval = 1.000 secs. Sensitivity = 4 missed beats
>Missed HBs: Total: 0 Current group: 0
>Packets sent : 576112 ICMP 0 Errors: 0 No mbuf: 0
>Packets received: 749092 ICMP 0 Dropped: 0
>NIM's PID: 2527
> 2 locally connected Clients with PIDs:
> rmcd( 1945) hagsd( 2545)
> Dead Man Switch Enabled:
> reset interval = 1 seconds
> trip interval = 16 seconds
> Watchdog module in use: vmwatchdog
> Client Heartbeating Enabled. Period: 8 secs. Timeout: 16 secs.
>
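To compare these fields side by side, a short Python sketch can extract them from the output above. The regular expressions are assumptions written against this sample only; other RSCT levels may format these lines differently, and the names `parse_cthats` and `PATTERNS` are hypothetical.

```python
import re

# Excerpt of the "lssrc -ls cthats" output quoted above (leading ">" removed).
SAMPLE = """\
HB Interval = 1.000 secs. Sensitivity = 4 missed beats
Dead Man Switch Enabled:
reset interval = 1 seconds
trip interval = 16 seconds
Client Heartbeating Enabled. Period: 8 secs. Timeout: 16 secs.
"""

# Assumed patterns, matched against the sample lines only.
PATTERNS = {
    "hb_interval": r"HB Interval = ([\d.]+) secs",
    "sensitivity": r"Sensitivity = (\d+) missed beats",
    "trip_interval": r"trip interval = (\d+) seconds",
    "client_hb_period": r"Client Heartbeating Enabled\. Period: (\d+) secs",
    "client_hb_timeout": r"Timeout: (\d+) secs",
}


def parse_cthats(text: str) -> dict:
    """Return each matched field as a float; fields not found are omitted."""
    values = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            values[key] = float(match.group(1))
    return values


v = parse_cthats(SAMPLE)
former = v["hb_interval"] * v["sensitivity"] * 2  # former rule of thumb
print(former, v["client_hb_period"], v["trip_interval"], v["client_hb_timeout"])
# 8.0 8.0 16.0 16.0
```

With these sample values the former rule of thumb gives 8 seconds, the same as the Client Heartbeating Period, while the trip interval and the client Timeout are both 16 seconds, which is the doubling this post asks about.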

It seems that the Client Heartbeating Period is equal to the FDC, and that the Timeout is twice the Period.

Does this mean that detecting a heartbeat failure now takes twice as long as in the former release?

Is the Client Heartbeating Period always the same as the FDC?

Thanks
Atsushi Manabe
  • amanabe

    Re: Failure Detection Time(FDC) of TSA(RSCT)

    2012-02-07T10:28:30Z
    Correction: "FDC" in my post should read "FDR" (Failure Detection Rate).
  • sedgewick_de

    Re: Failure Detection Time(FDC) of TSA(RSCT)

    2012-04-26T09:08:38Z
    Manabe-san,

    it seems I missed this post; my apologies for that.

    TSA/RSCT has been equipped with an additional grace period. When heartbeating fails more times than the sensitivity allows, TSA/RSCT sends an ICMP ping to its neighbor and waits for a response for the time specified as the grace period. If a response is received, the node is not considered dead, only slow to respond to heartbeat packets.
    This grace period handling is what extends the time before a node is considered dead. The default grace period is -1, which means that HATS (the Topology Services component of TSA/RSCT) computes the period from the heartbeat sensitivity and frequency. Any positive number is taken as the grace period in seconds, and the value 0 deactivates the grace ICMP ping. In that last case, the former formula for computing the FDR still holds.
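    The three grace period settings described above can be summarized in a small sketch. The function names are hypothetical; the formula HATS uses for the automatic (-1) case is not given in this thread, so the caller must supply that value; and treating the grace wait as simply added to the former HeartBeatPeriod * Sensitivity * 2 value is an assumption, not documented RSCT behavior.

```python
from typing import Optional


def resolve_grace_period(setting: int, auto_value_s: Optional[float] = None) -> Optional[float]:
    """Map a grace period setting to seconds (hypothetical helper):
      -1 -> computed by HATS from heartbeat sensitivity and frequency
            (formula not stated here, so the caller passes auto_value_s)
       0 -> grace ICMP ping disabled, returns None
      >0 -> grace period in seconds
    """
    if setting == 0:
        return None
    if setting > 0:
        return float(setting)
    if setting == -1:
        if auto_value_s is None:
            raise ValueError("setting -1 needs the HATS-computed value as auto_value_s")
        return auto_value_s
    raise ValueError(f"unsupported grace period setting: {setting}")


def worst_case_detection_s(hb_interval_s: float, sensitivity: int,
                           grace_setting: int,
                           auto_value_s: Optional[float] = None) -> float:
    """Rough upper bound on failure detection time in seconds.

    ASSUMPTION: the grace wait adds to the former value. With the grace
    ping disabled (0) this reduces to the former formula, as stated above.
    """
    base = hb_interval_s * sensitivity * 2
    grace = resolve_grace_period(grace_setting, auto_value_s)
    return base if grace is None else base + grace


print(worst_case_detection_s(1.0, 4, 0))  # 8.0  (grace ping disabled)
print(worst_case_detection_s(1.0, 4, 5))  # 13.0 (hypothetical 5 s grace period)
```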

    kind regards,
    Markus
  • amanabe

    Re: Failure Detection Time(FDC) of TSA(RSCT)

    2012-04-27T03:47:59Z
    Markus-san

    Thank you for your answer!!

    Regards, Atsushi Manabe