LSF_DYNAMIC_HOST_TIMEOUT

Defines how long LSF waits before automatically removing unavailable dynamic hosts from the cluster.

Syntax

  • LSF_DYNAMIC_HOST_TIMEOUT=time_hours
  • LSF_DYNAMIC_HOST_TIMEOUT=time_minutesm|M
  • LSF_DYNAMIC_HOST_TIMEOUT="EXPIRY[time_minutesm|M] THRESHOLD[number] INTERVAL[time_minutesm|M]" (available as of Fix Pack 14)

Description

To improve performance in very large clusters:
  • Prior to Fix Pack 14, disable this feature and remove unwanted hosts from the hostcache file manually.
  • Starting in Fix Pack 14, define the LSF_DYNAMIC_HOST_TIMEOUT parameter using the LSF_DYNAMIC_HOST_TIMEOUT="EXPIRY[time_minutesm|M] THRESHOLD[number] INTERVAL[time_minutesm|M]" syntax. Specify additional THRESHOLD and INTERVAL values based on the number of dynamic hosts in your cluster.

Use this parameter to specify how long LSF waits before the management host automatically removes unavailable dynamic hosts. Starting in Fix Pack 14, you can mark these unavailable hosts as expired by using the lsadmin expire command; LSF then removes these hosts according to the LSF_DYNAMIC_HOST_TIMEOUT setting. Each time LSF removes a dynamic host, mbatchd automatically reconfigures itself.

Valid values

Prior to Fix Pack 14:
  • The timeout value must be greater than or equal to ten minutes.
  • Values below ten minutes are set to the minimum allowed value of ten minutes; values above 100 hours are set to the maximum allowed value of 100 hours.
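
The limits above can be illustrated with a minimal Python sketch. It is not LSF code: the function name effective_timeout_minutes is hypothetical, and the sketch assumes the value is a plain number of hours or a number of minutes followed by m or M, as shown under Syntax.

```python
MIN_MINUTES = 10          # documented minimum: ten minutes
MAX_MINUTES = 100 * 60    # documented maximum: 100 hours


def effective_timeout_minutes(raw: str) -> int:
    """Return the timeout in minutes after applying the documented limits.

    Hypothetical helper for illustration only; it mirrors the rule stated
    in this section, not the LSF implementation.
    """
    value = raw.strip()
    if value[-1:] in ("m", "M"):
        minutes = int(value[:-1])      # explicit minutes, for example 30m
    else:
        minutes = int(value) * 60      # default unit is hours
    # Out-of-range values are set to the nearest allowed limit.
    return max(MIN_MINUTES, min(minutes, MAX_MINUTES))
```

For example, under these assumptions a setting of 5m is treated as 10 minutes, and a setting of 200 is treated as 100 hours (6000 minutes).
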
As of Fix Pack 14, specify a timeout value using any of these options, or a combination, surrounded by double quotation marks:
"EXPIRY[time_hours] or EXPIRY[time_minutesm|M]"
The amount of time, in hours or minutes, that a host is kept as unavailable. After this time, it is marked as expired and cleaned from the hostcache only, but kept in MLIM and mbatchd memory. Expired hosts start to wait for cleanup if the THRESHOLD and INTERVAL options are specified.

The default unit for the expiry time is hours (for example, EXPIRY[60] indicates 60 hours). To specify the time in minutes, add m or M after the value (for example, EXPIRY[60m] indicates 60 minutes). Valid values are any number between 0 and 2147483647.

"THRESHOLD[number]"
When the number of expired dynamic hosts reaches this threshold, MLIM will remove them from memory, and the mbatchd daemon is reconfigured once, when it gets host information after hosts are removed. Valid values are any number between 1 and 2147483647.
"INTERVAL[time_hours] or INTERVAL[time_minutesm|M]"
How often, in hours or minutes, MLIM attempts to clean up expired dynamic hosts from its memory. The mbatchd daemon is reconfigured once, when it gets host information after hosts are removed.

The default unit for this interval is hours (for example, INTERVAL[20] indicates every 20 hours). To specify the time in minutes, add m or M after the value (for example, INTERVAL[20m] indicates every 20 minutes). Valid values are any number between 0 and 2147483647.
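
As a rough illustration of the Fix Pack 14 options described above, the following Python sketch parses a setting string and normalizes the time values to minutes. It is only a sketch under stated assumptions: the function parse_dynamic_host_timeout and the result keys are hypothetical, options are assumed to be separated by whitespace, and the range checks simply restate the valid values listed in this section.

```python
import re

INT_MAX = 2147483647  # documented upper bound for all three options

# One option looks like NAME[digits] with an optional m or M unit suffix.
_OPTION = re.compile(r"(EXPIRY|THRESHOLD|INTERVAL)\[(\d+)([mM]?)\]")


def parse_dynamic_host_timeout(value: str) -> dict:
    """Parse a setting such as "EXPIRY[15m] THRESHOLD[100] INTERVAL[1]".

    Hypothetical helper for illustration; returns times in minutes
    and the threshold as a host count.
    """
    parsed = {}
    for token in value.strip().strip('"').split():
        match = _OPTION.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized option: {token!r}")
        name, digits, unit = match.groups()
        number = int(digits)
        if number > INT_MAX:
            raise ValueError(f"{name} value is above {INT_MAX}")
        if name == "THRESHOLD":
            if unit:
                raise ValueError("THRESHOLD takes a host count, not a time")
            if number < 1:
                raise ValueError("THRESHOLD must be at least 1")
            parsed["threshold"] = number
        else:
            # Default unit is hours; m or M selects minutes.
            minutes = number if unit else number * 60
            parsed[name.lower() + "_minutes"] = minutes
    return parsed
```

With these assumptions, parsing "EXPIRY[15m] THRESHOLD[100] INTERVAL[1]" yields an expiry of 15 minutes, a threshold of 100 hosts, and an interval of 60 minutes.
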

Examples

Remove a dynamic host from the cluster if it has been unavailable for 15 hours:
LSF_DYNAMIC_HOST_TIMEOUT=15
As of Fix Pack 14, the equivalent syntax is:
LSF_DYNAMIC_HOST_TIMEOUT="EXPIRY[15] THRESHOLD[1]"
As of Fix Pack 14, when the cluster reaches 100 expired dynamic hosts, remove those hosts from the cluster:
LSF_DYNAMIC_HOST_TIMEOUT="THRESHOLD[100]"
As of Fix Pack 14, remove expired dynamic hosts from the cluster every 15 minutes:
LSF_DYNAMIC_HOST_TIMEOUT="INTERVAL[15m]"
As of Fix Pack 14, keep dynamic hosts in unavailable status for 15 minutes, then mark them as expired. When the number of expired dynamic hosts reaches 100, remove them from the cluster:
LSF_DYNAMIC_HOST_TIMEOUT="EXPIRY[15m] THRESHOLD[100]"
As of Fix Pack 14, keep dynamic hosts in unavailable status for 15 minutes, then mark them as expired. Remove these expired dynamic hosts from the cluster every hour:
LSF_DYNAMIC_HOST_TIMEOUT="EXPIRY[15m] INTERVAL[1]"
As of Fix Pack 14, check the number of expired dynamic hosts every 15 minutes, and if the number of hosts reaches 100, remove them from the cluster:
LSF_DYNAMIC_HOST_TIMEOUT="THRESHOLD[100] INTERVAL[15m]"
As of Fix Pack 14, keep dynamic hosts in unavailable status for 15 minutes, then mark them as expired. Check the number of expired dynamic hosts every 30 minutes, and if the number of hosts reaches 100, remove them from the cluster:
LSF_DYNAMIC_HOST_TIMEOUT="EXPIRY[15m] THRESHOLD[100] INTERVAL[30m]"
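
The way THRESHOLD and INTERVAL combine in the examples above can be summarized in a small Python sketch. This is one reading of those examples, not a description of how MLIM is implemented: the function cleanup_due and its parameters are hypothetical, and a setting with neither THRESHOLD nor INTERVAL is deliberately rejected because the examples do not describe it.

```python
from typing import Optional


def cleanup_due(expired_hosts: int,
                threshold: Optional[int],
                interval_elapsed: Optional[bool]) -> bool:
    """Decide whether expired dynamic hosts would be removed now.

    threshold:        THRESHOLD value, or None when it is not configured.
    interval_elapsed: None when INTERVAL is not configured; otherwise
                      True only at the moment a periodic check is due.
    """
    if threshold is None and interval_elapsed is None:
        raise ValueError("case without THRESHOLD or INTERVAL is not modeled")
    if interval_elapsed is False:
        # INTERVAL is configured but the next check is not due yet.
        return False
    if threshold is not None:
        # THRESHOLD alone, or THRESHOLD checked at an INTERVAL tick.
        return expired_hosts >= threshold
    # INTERVAL alone: each periodic check removes any expired hosts.
    return expired_hosts > 0
```

Under this reading, "THRESHOLD[100] INTERVAL[15m]" with 150 expired hosts leads to removal only when the 15-minute check comes due, while "THRESHOLD[100]" alone leads to removal as soon as the count reaches 100.
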

EGO parameter

EGO_DYNAMIC_HOST_TIMEOUT

Default

Not defined. Unavailable hosts are never removed from the cluster.