Power Failure Handling - IBM i in a Hosted Environment

Troubleshooting


Problem

This document describes restrictions that apply when handling power failure events in a hosted environment.

Resolving The Problem

For IBM i environments, standard procedures for handling power failures are documented in the Backup and Recovery reference material at the following URL:

http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/topic/rzahr/rzahrovrvwco.htm

In basic partitioned environments where the physical resources are assigned to the partitions, the following procedures are generally recommended:

1. Determine how long your UPS can support your system and all of the necessary peripheral devices. For example, assume the UPS can support everything for 40 minutes.
2. Set the QUPSDLYTIM system value to a time less than the UPS depletion time. If this timer expires, the partition goes down very quickly, but slightly more gracefully than it would on a complete loss of power. Continuing the example, you might set the QUPSDLYTIM system value to 30 minutes (the system value itself is specified in seconds, so 1800).
3. Set up a power handling message queue and program to monitor for utility power messages (message CPF1816, message CPF1817, and so on).

a. Once a power failure message is received, the program sleeps for some period of time (for example, 15 minutes).

b. If utility power is still out at that point, send a break message telling users that utility power has failed and that a system shutdown will be initiated if power is not restored within X minutes (for example, 5 minutes). If the power has been restored, resume standard power monitoring.

c. If utility power is still out at the next checkpoint (for example, 20 minutes after the initial power failure message), initiate an orderly shutdown of applications and user subsystems. If the power has been restored, send a break message telling users that power is restored, and then resume standard power monitoring.

d. If utility power is still out at the next checkpoint (for example, 25 minutes after the initial power failure message), initiate a partition shutdown. If the power has been restored, restart user subsystems and applications, and then resume standard power monitoring.

e. If utility power is still out when the QUPSDLYTIM value expires, the system cannot rely on power being available for much longer and will be shut down very quickly. Most subsystems and applications should be cleanly ended by this time. If the power is restored before QUPSDLYTIM expires, the partition shutdown will already be in progress, and the partition will need to be activated once the power off completes.
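
The checkpoint sequence in steps a through e can be sketched as a small simulation. This is only an illustration of the timing logic, assuming the example values used above (15, 20, 25, and 30 minutes); the names CHECKPOINTS and actions_taken are hypothetical, and a real power handling program on IBM i would be written against the power handling message queue rather than in Python.

```python
from typing import List, Optional, Tuple

# Example checkpoints from the steps above, in minutes after the first
# power failure message (CPF1816). The action labels are illustrative only.
CHECKPOINTS: List[Tuple[int, str]] = [
    (15, "warn users that shutdown is pending"),
    (20, "end applications and user subsystems"),
    (25, "initiate partition shutdown"),
    (30, "QUPSDLYTIM expires; system goes down quickly"),
]


def actions_taken(restored_at: Optional[int]) -> List[str]:
    """Return the actions performed before utility power comes back.

    restored_at is the minute at which power returns, or None if it never
    returns. A checkpoint acts only if power is still out when it is reached.
    """
    return [
        action
        for minute, action in CHECKPOINTS
        if restored_at is None or minute < restored_at
    ]


if __name__ == "__main__":
    for restored in (10, 22, None):
        print(restored, actions_taken(restored))
```

If power returns at minute 22 in this sketch, only the user warning and the application shutdown have happened, so the program would restart nothing at partition level and would simply notify users and resume monitoring.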

In hosted partition environments, added complexities must be taken into account. This is especially true with regard to I/O resources, such as virtualized disk and device-backed Active Memory Sharing (AMS). If the power handling in the server partition is not factored in, the I/O resources can become unavailable before the client partition programs and timers reach completion. This would leave the client partitions polling for resources, in a state such as A6000255 (waiting for disk to reappear).

Because all partitions receive the utility power loss signal (message CPF1816 for IBM i, and EPOW events for AIX and Linux) at the same time, all the timings must be adjusted accordingly.

In i-hosting-i environments, the serving IBM i partition should have its QUPSDLYTIM set according to the calculated UPS depletion time, as was described above in Step 2.

In VIOS-based environments, there is a function for setting the shutdown timer, as well. More details are available later in this document.

Once the server-side timer is set, the client partition timers need to be adjusted down, so that the clients can complete their processing before the server shuts down. For our example, this might look like the following:
- Server QUPSDLYTIM (or VIOS power failure timer) is set to 30 minutes.
- Client QUPSDLYTIM is set to 28 minutes.
- Client power handling program sleeps for 13 minutes, and notifies users if the power is still out.
- Client power handling program initiates application/subsystem shutdown at 18 minutes.
- Client power handling program initiates partition shutdown at 23 minutes.
- Client QUPSDLYTIM expires at 28 minutes if the partition shutdown is not yet complete.
- Server QUPSDLYTIM expires at 30 minutes, and shuts the server down prior to completely losing AC power.
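
The client timings in the list above follow from the server timer by simple subtraction, which can be sketched as follows. The 2-minute margin and the 15/10/5-minute lead times are assumptions read off the worked example, not values required by IBM i or VIOS, and the function name client_timings is hypothetical.

```python
from typing import Dict, Sequence


def client_timings(
    server_timer_min: int,
    margin_min: int = 2,
    lead_times_min: Sequence[int] = (15, 10, 5),
) -> Dict[str, int]:
    """Derive client checkpoints (minutes after power loss) from the server timer.

    margin_min is how much earlier the client timer expires than the server
    timer. lead_times_min gives how long before the client timer expiry the
    program notifies users, ends applications, and shuts the partition down.
    """
    notify_lead, end_apps_lead, shutdown_lead = lead_times_min
    client_timer = server_timer_min - margin_min
    timings = {
        "notify_users": client_timer - notify_lead,
        "end_applications": client_timer - end_apps_lead,
        "shutdown_partition": client_timer - shutdown_lead,
        "client_qupsdlytim": client_timer,
        "server_timer": server_timer_min,
    }
    # Every checkpoint must be positive and strictly later than the previous one.
    ordered = list(timings.values())
    if ordered[0] <= 0 or ordered != sorted(set(ordered)):
        raise ValueError("timings do not fit inside the server timer")
    return timings


if __name__ == "__main__":
    print(client_timings(30))
```

With a short server timer the default lead times no longer fit and the function raises ValueError, so shorter lead times have to be supplied, for example client_timings(10, lead_times_min=(6, 4, 2)).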

AIX and VIOS-based environments currently have a limitation with regard to power handling: AIX and VIOS utility power failure handling is limited to 10 minutes or less. APAR IV09188 will address this for future releases of AIX and VIOS. Until this change is delivered, client partitions in VIOS-based environments should adjust all timers so that shutdown completes within 10 minutes. If needed before general availability, contact AIX or VIOS support to request an ifix. IBM i hosted environments can currently set QUPSDLYTIM to any time value on the server side and have that value honored.

Once the AIX/VIOS change is delivered, the following command (run as root/sudo in AIX, or under oem_setup_env in VIOS) will set the shutdown timer:

chitab "powerfail::powerfail:/etc/rc.powerfail -t 30 >/dev/console 2>&1"

This example sets the timer to 30 minutes.
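
If the timer value is calculated by a script, the command line can be generated from it. The following is a minimal sketch that only builds the command string shown above for a given number of minutes; the function name powerfail_chitab_command is hypothetical, and the sketch does not run the command or check whether the AIX/VIOS level honors values above 10 minutes.

```python
def powerfail_chitab_command(minutes: int) -> str:
    """Build the chitab command line that sets the powerfail timer.

    Only the value after -t varies; the rest of the inittab entry is copied
    from the command shown in this document.
    """
    if not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("timer must be a positive whole number of minutes")
    entry = f"powerfail::powerfail:/etc/rc.powerfail -t {minutes} >/dev/console 2>&1"
    return f'chitab "{entry}"'


if __name__ == "__main__":
    print(powerfail_chitab_command(30))
```

On AIX, the resulting entry can be displayed afterward with lsitab powerfail.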

Historical Number

612966323

Document Information

Modified date:
19 February 2022

UID

nas8N1011322