Troubleshooting
Problem
This document includes restrictions for handling power failure events in a hosted environment.
Resolving The Problem
For IBM i environments, there have been standard procedures for handling power failures, as documented in the Backup and Recovery reference material at the following URL:
http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/topic/rzahr/rzahrovrvwco.htm
In basic partitioned environments where the physical resources are assigned to the partitions, the following procedures are generally recommended:
| 1. | Determine how long your UPS can support your system and all of the necessary peripheral devices. For example, let's say the UPS can support everything for 40 minutes. |
| 2. | Set the QUPSDLYTIM system value to a time less than the UPS depletion time. If this timer expires, the partition will go down very quickly, but slightly more gracefully than losing power completely. Continuing our example, you might set the QUPSDLYTIM system value to 30 minutes. |
| 3. | Set up a power handling message queue and a program to monitor for Utility Power messages (message CPF1816, message CPF1817, and so on). The program should proceed as follows: |
|  | a. After a power failure message is received, the program sleeps for a set period of time (for example, 15 minutes). |
|  | b. If utility power is still out at that point, send a break message telling users that utility power has failed and that a system shutdown will be initiated if power is not restored within X minutes (for example, 5 minutes). If the power is restored, resume standard power monitoring. |
|  | c. If utility power is still out at that point (for example, 20 minutes after the initial power failure message), initiate an orderly shutdown of applications and user subsystems. If the power is restored, send a break message to users that power is restored, and then resume standard power monitoring. |
|  | d. If utility power is still out at the next checkpoint (for example, 25 minutes after the initial power failure message), initiate a partition shutdown. If the power is restored, restart user subsystems and applications, and then resume standard power monitoring. |
|  | e. If utility power is still out when the QUPSDLYTIM value expires, the system cannot rely on power being available for much longer and will be shut down very quickly. Most subsystems and applications should be cleanly ended by this time. If the power is restored before QUPSDLYTIM expires, the system shutdown will already be in progress, and the partition must be activated after the power-off completes. |
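The checkpoints in Step 3 can be sketched as a small lookup table. The following Python is only an illustration of the decision logic, using the example values from the steps above (checkpoints at 15, 20, and 25 minutes, QUPSDLYTIM at 30 minutes, UPS depletion at 40 minutes). The names CHECKPOINTS, timings_are_ordered, and action_at are hypothetical; a real power handling program on IBM i would monitor the message queue directly rather than run this code.

```python
# Illustrative only: example timings taken from the steps above, in minutes
# after the initial power failure message (CPF1816).
UPS_RUNTIME_MIN = 40   # Step 1: how long the UPS can carry the load
QUPSDLYTIM_MIN = 30    # Step 2: timer set below the UPS depletion time

# (minute, action if power is still out, action if power has been restored)
CHECKPOINTS = [
    (15, "warn users that shutdown will begin in 5 minutes",
         "resume standard power monitoring"),
    (20, "end applications and user subsystems",
         "tell users power is restored; resume standard power monitoring"),
    (25, "initiate partition shutdown",
         "restart subsystems and applications; resume standard power monitoring"),
]


def timings_are_ordered(checkpoints, qupsdlytim, ups_runtime):
    """Return True if every checkpoint precedes QUPSDLYTIM, which precedes UPS depletion."""
    times = [minute for minute, _, _ in checkpoints] + [qupsdlytim, ups_runtime]
    return all(earlier < later for earlier, later in zip(times, times[1:]))


def action_at(minute, power_restored, checkpoints=CHECKPOINTS):
    """Return the action for a checkpoint minute, or None if no checkpoint falls there."""
    for checkpoint_minute, if_still_out, if_restored in checkpoints:
        if checkpoint_minute == minute:
            return if_restored if power_restored else if_still_out
    return None


if __name__ == "__main__":
    print(timings_are_ordered(CHECKPOINTS, QUPSDLYTIM_MIN, UPS_RUNTIME_MIN))
    print(action_at(20, power_restored=False))
```

Checking the ordering before deployment helps catch a plan in which a checkpoint would fire after QUPSDLYTIM has already expired.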
Because all partitions will get the Utility Power Loss signal (message CPF1816 for IBM i, and EPOW events for AIX and Linux) at the same time, all the timings must be adjusted appropriately.
In i-hosting-i environments, the serving IBM i partition should have its QUPSDLYTIM set according to the calculated UPS depletion time, as was described above in Step 2.
In VIOS-based environments, there is a function for setting the shutdown timer, as well. More details are available later in this document.
Once the server-side timer is set, the client partition timers need to be set lower, so that the clients can complete their processing before the server shuts down. For our example, this might look like the following:
| o | Server QUPSDLYTIM (or VIOS power failure timer) is set to 30 minutes. |
| o | Client QUPSDLYTIM is set to 28 minutes. |
| o | Client power handling program sleeps for 13 minutes, and notifies users if the power is still out. |
| o | Client power handling program initiates application/subsystem shutdown at 18 minutes. |
| o | Client power handling program initiates partition shutdown at 23 minutes. |
| o | Client QUPSDLYTIM expires at 28 minutes if the partition shutdown is not yet complete. |
| o | Server QUPSDLYTIM expires at 30 minutes, and shuts the server down before AC power is completely lost. |
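The client timings in this example are each 2 minutes earlier than the corresponding server-side timings from Step 3. As a minimal sketch of that arithmetic, the following Python derives a client schedule from a server schedule. The function name client_schedule, the dictionary keys, and the fixed 2-minute lead are assumptions inferred from the example values, not a documented rule.

```python
def client_schedule(server_schedule, lead_minutes=2):
    """Shift every server-side timing earlier by lead_minutes for a client partition.

    server_schedule maps a step name to minutes after the power failure message.
    The 2-minute default mirrors the example above and is an assumption; choose a
    lead long enough for the client to finish before the server goes down.
    """
    if lead_minutes <= 0:
        raise ValueError("lead_minutes must be positive")
    client = {step: minutes - lead_minutes for step, minutes in server_schedule.items()}
    if min(client.values()) <= 0:
        raise ValueError("lead_minutes leaves no time before the first client step")
    return client


# Server-side example values from this document, in minutes.
server = {
    "notify_users": 15,
    "end_applications": 20,
    "partition_shutdown": 25,
    "QUPSDLYTIM": 30,
}

if __name__ == "__main__":
    for step, minutes in client_schedule(server).items():
        print(f"{step}: {minutes} minutes")
```

The sketch works in minutes, as the example does; convert the result to whatever unit the system value expects on your release before applying it.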
Once the AIX/VIOS change is delivered, the following command (run as root or with sudo in AIX, or under oem_setup_env in VIOS) will set the shutdown timer:
chitab "powerfail::powerfail:/etc/rc.powerfail -t 30 >/dev/console 2>&1"
This example sets the timer to 30 minutes.
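If the timer value is calculated by a script, the command text can be generated rather than typed by hand. The following Python is a small sketch that only builds the command string shown above for a given number of minutes; it does not run anything. The helper name powerfail_chitab_command is hypothetical, and the inittab entry format is copied from the example in this document.

```python
def powerfail_chitab_command(minutes):
    """Build the chitab command text from this document for a given timer value.

    Only the number after -t changes; the rest of the inittab entry is copied
    verbatim from the example above.
    """
    if isinstance(minutes, bool) or not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    entry = f"powerfail::powerfail:/etc/rc.powerfail -t {minutes} >/dev/console 2>&1"
    return f'chitab "{entry}"'


if __name__ == "__main__":
    print(powerfail_chitab_command(30))
```

Review the generated text before running it as root, because it replaces the existing powerfail entry in the inittab.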
Historical Number
612966323
Document Information
Modified date:
19 February 2022
UID
nas8N1011322