Controlling and monitoring host power state management
The following commands allow for control and monitoring of host power state management.
badmin hpower
The badmin hpower subcommand manually switches idle hosts (individual hosts, host groups, compute units, and host partitions) into power saving state or working state. The syntax is:
badmin hpower suspend | resume [-C comments] host_name […]
Options:
- suspend
- Puts the host in energy saving state. badmin hpower suspend calls the script defined by POWER_SUSPEND_CMD in the PowerPolicy, and tags the host so that it cannot be resumed by the PowerPolicy.
- resume
- Puts the host in working state. The host can enter power save status when CYCLE_TIME is reached. If the host should not enter power save status, use the badmin hclose command to block the host from the power policy.
- -C
- Adds a comment describing the specified power management action. Comments are displayed by badmin hist and badmin hhist.
- host_name
- Specify one or more host names, host groups, compute units, or host partitions. All specified hosts are switched to energy saving state or working state. If a host is not in a state that allows switching, an error message is displayed, one line per host.
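When badmin hpower is driven from a script, it helps to build the argument list explicitly so that comments containing spaces are passed as a single -C value. The following is a minimal sketch that assembles the command according to the syntax above; the helper name hpower_argv is hypothetical, and actually running the command (for example with subprocess.run) is left commented out because it requires an LSF cluster and administrator privileges.

```python
import shlex
from typing import List, Optional, Sequence


def hpower_argv(action: str, hosts: Sequence[str], comment: Optional[str] = None) -> List[str]:
    """Build the argv for: badmin hpower suspend | resume [-C comments] host_name ...

    hpower_argv is a hypothetical helper, not part of LSF.
    """
    if action not in ("suspend", "resume"):
        raise ValueError(f"action must be 'suspend' or 'resume', not {action!r}")
    if not hosts:
        raise ValueError("at least one host, host group, compute unit, or host partition is required")
    argv = ["badmin", "hpower", action]
    if comment is not None:
        argv += ["-C", comment]  # one argv element, even if the comment contains spaces
    argv += list(hosts)
    return argv


if __name__ == "__main__":
    argv = hpower_argv("suspend", ["host1", "hostGroup1"], comment="maintenance window")
    print(shlex.join(argv))  # badmin hpower suspend -C 'maintenance window' host1 hostGroup1
    # To run it on a real cluster (needs `import subprocess` and LSF admin rights):
    # subprocess.run(argv, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with the -C comment.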
badmin hist and badmin hhist
Use badmin hist and badmin hhist to retrieve the historical information about the power state changes of hosts.
All power-related events are logged, both for badmin hpower commands and for actions triggered automatically by a configured PowerPolicy.
Power State Action | Performed by | Success/Fail | Logged Events |
---|---|---|---|
Suspend | By badmin hpower | On Success | Host <host_name> suspend request from administrator <cluster_admin_name>. Host <host_name> suspend request done. Host <host_name> suspend. |
Suspend | By badmin hpower | On Failure | Host <host_name> suspend request from administrator <cluster_admin_name>. Host <host_name> suspend request failed. Host <host_name> power unknown. |
Suspend | By PowerPolicy | On Success | Host <host_name> suspend request from power policy <policy_name>. Host <host_name> suspend request done. Host <host_name> suspend. |
Suspend | By PowerPolicy | On Failure | Host <host_name> suspend request from power policy <policy_name>. Host <host_name> suspend request failed. Host <host_name> power unknown. |
Resume | By badmin hpower | On Success | Host <host_name> resume request from administrator <cluster_admin_name>. Host <host_name> resume request done. Host <host_name> on. |
Resume | By badmin hpower | On Failure | Host <host_name> resume request from administrator <cluster_admin_name>. Host <host_name> resume request exit. Host <host_name> power unknown. |
Resume | By PowerPolicy | On Success | Host <host_name> resume request from power policy <policy_name>. Host <host_name> resume request done. Host <host_name> on. |
Resume | By PowerPolicy | On Failure | Host <host_name> resume request from power policy <policy_name>. Host <host_name> resume request exit. Host <host_name> power unknown. |
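The logged messages follow a fixed pattern: a request line naming who asked for the action, then an outcome line ending in done on success, or failed (suspend) / exit (resume) on failure. A minimal sketch for classifying such lines, for example when scanning badmin hhist output, is shown below. It assumes the message texts exactly as listed in the table; the surrounding output format (timestamps and so on) is not specified here, so the patterns use a search that tolerates a leading prefix, and angle brackets around names are treated as optional. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PowerRequest(NamedTuple):
    host: str
    action: str     # "suspend" or "resume"
    source: str     # "administrator" or "power policy"
    requester: str  # administrator name or policy name


# Request line, e.g. "Host <host1> suspend request from power policy <policy_night>."
# Angle brackets around names are optional in this pattern.
_REQUEST = re.compile(
    r"Host <?(?P<host>[^\s<>]+)>? (?P<action>suspend|resume) request from "
    r"(?P<source>administrator|power policy) <?(?P<requester>[^\s<>]+?)>?\.?$"
)

# Outcome line: "done" on success; "failed" (suspend) or "exit" (resume) on failure.
_OUTCOME = re.compile(
    r"Host <?(?P<host>[^\s<>]+)>? (?P<action>suspend|resume) request "
    r"(?P<result>done|failed|exit)\.?$"
)


def parse_request(line: str) -> Optional[PowerRequest]:
    """Return who requested a power action, or None if the line is not a request."""
    m = _REQUEST.search(line.strip())
    if m is None:
        return None
    return PowerRequest(m["host"], m["action"], m["source"], m["requester"])


def request_succeeded(line: str) -> Optional[bool]:
    """True for 'done', False for 'failed' or 'exit', None for any other line."""
    m = _OUTCOME.search(line.strip())
    if m is None:
        return None
    return m["result"] == "done"


if __name__ == "__main__":
    print(parse_request("Host <host1> suspend request from power policy <policy_night>."))
    print(request_succeeded("Host <host1> resume request exit."))  # False
```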
bhosts
Use bhosts -l to display the power state of hosts. bhosts shows a host's power state only when a PowerPolicy is enabled in lsb.resources. If the host power status becomes unknown (for example, because a power operation failed), the power state is shown as a dash (“-”).
Final power states:
- on
- The host power state is on. Note that power state “on” does not imply that the batch host status is “ok”; the batch status depends on whether the management host can connect to lim and sbatchd on the host.
- suspend
- The host has been suspended, either by a PowerPolicy or manually with badmin hpower.
Intermediate power states:
The following states are displayed after mbatchd has sent a power operation request but the operation command has not yet returned. When the command returns, LSF assumes the operation is done and updates the intermediate state.
- restarting
- The host is being reset because a resume operation failed.
- resuming
- The host is being resumed from standby state. The resume was triggered by a policy, a job, or the cluster administrator.
- suspending
- The host is being suspended. The suspend was triggered by a policy or the cluster administrator.
Final host state under administrator control:
- closed_Power
- The host has been put into power saving (suspend) state by the cluster administrator.
Final host state under policy control:
- ok_Power
- A transitional state displayed while the host waits for sbatchd to resume. Lets mbatchd know that the host may be considered for scheduling, but it cannot immediately be used for jobs.
Example bhosts:
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host1 closed - 4 0 0 0 0 0
host2 ok_Power - 4 0 0 0 0
host3 unavail - 4 0 0 0 0 0
Example bhosts -w:
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host1 closed_Power - 4 0 0 0 0 0
host2 ok_Power - 4 0 0 0 0
host3 unavail - 4 0 0 0 0 0
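As the two examples show, the default bhosts output can truncate the STATUS column (host1 appears as closed instead of closed_Power), so bhosts -w is the safer source when a script needs the full power-related status. The sketch below maps each host to its STATUS and picks out hosts whose status ends in _Power. It assumes whitespace-separated columns with HOST_NAME and STATUS first, as in the sample copied from the example above; only those two columns are read, so rows with missing trailing values (like host2) are still handled. The function names are hypothetical, and in practice the text would come from running bhosts -w.

```python
from typing import Dict, List

# Sample copied from the "Example bhosts -w" output above.
SAMPLE = """\
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host1 closed_Power - 4 0 0 0 0 0
host2 ok_Power - 4 0 0 0 0
host3 unavail - 4 0 0 0 0 0
"""


def host_status(bhosts_output: str) -> Dict[str, str]:
    """Map HOST_NAME to STATUS, reading only the first two columns of each row."""
    statuses = {}
    for line in bhosts_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            statuses[fields[0]] = fields[1]
    return statuses


def power_managed_hosts(statuses: Dict[str, str]) -> List[str]:
    """Hosts whose status carries the _Power suffix (closed_Power, ok_Power)."""
    return sorted(host for host, status in statuses.items() if status.endswith("_Power"))


if __name__ == "__main__":
    print(power_managed_hosts(host_status(SAMPLE)))  # ['host1', 'host2']
```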
Example bhosts -l:
HOST host1
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
closed_Power 1.00 - 4 4 4 0 0 - -
CURRENT LOAD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem slots
Total 0.0 0.0 0.0 0% 0.0 0 0 0 31G 31G 12G 0
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 4096M -
LOAD THRESHOLD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
POWER STATUS: ok
IDLE TIME: 2m 12s
CYCLE TIME REMAINING: 3m 1s
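The power-related fields at the end of bhosts -l output (POWER STATUS, IDLE TIME, CYCLE TIME REMAINING) are simple "KEY: value" lines. The following sketch extracts them and converts durations such as 2m 12s to seconds. Only minutes and seconds appear in the example above; accepting d and h suffixes as well is an assumption of this sketch, and the function names are hypothetical.

```python
import re
from typing import Dict

# Excerpt copied from the "Example bhosts -l" output above.
SAMPLE = """\
POWER STATUS: ok
IDLE TIME: 2m 12s
CYCLE TIME REMAINING: 3m 1s
"""

_POWER_KEYS = ("POWER STATUS", "IDLE TIME", "CYCLE TIME REMAINING")
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}  # d and h are assumed units


def power_fields(bhosts_l_output: str) -> Dict[str, str]:
    """Collect the power-related 'KEY: value' lines from bhosts -l output."""
    fields = {}
    for line in bhosts_l_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in _POWER_KEYS:
            fields[key.strip()] = value.strip()
    return fields


def duration_seconds(text: str) -> int:
    """Convert a duration like '2m 12s' into a number of seconds."""
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in re.findall(r"(\d+)([dhms])", text))


if __name__ == "__main__":
    fields = power_fields(SAMPLE)
    print(fields["POWER STATUS"])                            # ok
    print(duration_seconds(fields["IDLE TIME"]))             # 132
    print(duration_seconds(fields["CYCLE TIME REMAINING"]))  # 181
```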
bjobs
When a job switches a host from energy saving state to working state (that is, the job has been dispatched and is waiting for the host to resume), the job's state is not shown as pending. Instead, it is displayed as provisioning (PROV). For example:
bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
204 root PROV normal host2 host1 sleep 9999 Jun 5 15:24
The PROV state shows that the job has been dispatched to a suspended host and that the host is being resumed. The job remains in PROV state until LSF dispatches the job.
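To find jobs that are waiting on host provisioning, a script can filter bjobs output on the STAT column. The sketch below reads only the first three columns (JOBID, USER, STAT), because later columns such as JOB_NAME ("sleep 9999") and SUBMIT_TIME contain spaces. It assumes the column layout of the example above; the function name is hypothetical.

```python
from typing import List

# Sample copied from the bjobs example above.
SAMPLE = """\
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
204 root PROV normal host2 host1 sleep 9999 Jun 5 15:24
"""


def provisioning_jobs(bjobs_output: str) -> List[str]:
    """Return the JOBIDs whose STAT column is PROV."""
    jobs = []
    for line in bjobs_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(maxsplit=3)  # JOBID, USER, STAT, rest of the line
        if len(fields) >= 3 and fields[2] == "PROV":
            jobs.append(fields[0])
    return jobs


if __name__ == "__main__":
    print(provisioning_jobs(SAMPLE))  # ['204']
```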
When a job requires a host that is in energy saving state or powered off, and LSF is switching the host to working state, bjobs -l appends the following event:
Mon Nov 5 16:40:47: Will start on 2 Hosts <host1> <host2>. Waiting for machine provisioning;
The message indicates which hosts are being provisioned and how many slots are requested.
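The provisioning event has a regular shape, so the host names and the leading number can be pulled out with a regular expression. This is a minimal sketch that assumes the exact wording of the example line above ("Will start on N Hosts <...> ... Waiting for machine provisioning"); the function name is hypothetical, and the number is returned as-is (the text above describes it as the requested slot count).

```python
import re
from typing import List, Optional, Tuple

# "Will start on <N> Hosts <h1> <h2>. Waiting for machine provisioning"
_PROVISIONING = re.compile(
    r"Will start on (\d+) Hosts? ((?:<[^>]+>\s*)+)\. Waiting for machine provisioning"
)


def provisioning_hosts(line: str) -> Optional[Tuple[int, List[str]]]:
    """Return (number, host names) from a provisioning event, or None if absent."""
    m = _PROVISIONING.search(line)
    if m is None:
        return None
    hosts = re.findall(r"<([^>]+)>", m.group(2))
    return int(m.group(1)), hosts


if __name__ == "__main__":
    event = ("Mon Nov 5 16:40:47: Will start on 2 Hosts <host1> <host2>. "
             "Waiting for machine provisioning;")
    print(provisioning_hosts(event))  # (2, ['host1', 'host2'])
```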
bhist
When a job is dispatched to a standby host and LSF triggers provisioning to resume the host to working state, two events are saved to lsb.events and lsb.streams. For example:
Tue Nov 19 01:29:20: Host is being provisioned for job. Waiting for host <xxxx> to power on;
Tue Nov 19 01:30:06: Host provisioning is done;
bresources
Use bresources -p to show the configured energy aware scheduling policies. For example:
bresources -p
Begin PowerPolicy
NAME = policy_night
HOSTS = hostGroup1 host3
TIME_WINDOW= 23:59-5:00
MIN_IDLE_TIME= 1800
CYCLE_TIME= 60
APPLIED = Yes
End PowerPolicy
Begin PowerPolicy
NAME = policy_other
HOSTS = all
TIME_WINDOW= all
APPLIED = Yes
End PowerPolicy
In this example, “policy_night” is defined only for hostGroup1 and host3 and applies between 23:59 and 5:00. In contrast, “policy_other” covers all other hosts not included in the “policy_night” power policy (except the management host and management candidate hosts) and is in effect at all hours.
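A window such as 23:59-5:00 wraps past midnight, which is easy to get wrong when checking whether a policy is currently active. The sketch below parses the Begin PowerPolicy / End PowerPolicy blocks shown above into dictionaries and tests a time of day against a TIME_WINDOW value. It handles only a single HH:MM-HH:MM window or all; LSF time window syntax supports more forms than this, and treating the start as inclusive and the end as exclusive is an assumption. The function names are hypothetical.

```python
from datetime import time
from typing import Dict, List, Optional

# Sample copied from the bresources -p output above.
SAMPLE = """\
Begin PowerPolicy
NAME = policy_night
HOSTS = hostGroup1 host3
TIME_WINDOW= 23:59-5:00
MIN_IDLE_TIME= 1800
CYCLE_TIME= 60
APPLIED = Yes
End PowerPolicy
Begin PowerPolicy
NAME = policy_other
HOSTS = all
TIME_WINDOW= all
APPLIED = Yes
End PowerPolicy
"""


def parse_power_policies(text: str) -> List[Dict[str, str]]:
    """Turn each Begin/End PowerPolicy block into a KEY -> value dict."""
    policies: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Begin PowerPolicy":
            current = {}
        elif line == "End PowerPolicy":
            if current is not None:
                policies.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return policies


def _hhmm(text: str) -> time:
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_time_window(window: str, now: time) -> bool:
    """True if `now` is inside 'all' or a single 'HH:MM-HH:MM' window.

    A start later than the end (e.g. 23:59-5:00) wraps past midnight.
    """
    if window == "all":
        return True
    start_text, end_text = window.split("-")
    start, end = _hhmm(start_text), _hhmm(end_text)
    if start <= end:
        return start <= now < end
    return now >= start or now < end


if __name__ == "__main__":
    for policy in parse_power_policies(SAMPLE):
        print(policy["NAME"], in_time_window(policy["TIME_WINDOW"], time(1, 0)))
```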