Update: there's an efix available for the HMC. For further details see this post.
I received this notice today, and it is very important if you're running HMC V7R7.3.0 or SDMC V6R7.3.0. The gist is that you could lose your partition profile data, so if you're already running these versions of the HMC or SDMC, you must back up your profile data and wait for a fix. If you're on an older release of the HMC or SDMC, don't upgrade to these versions until a fix is released.
I've included some links to the commands referred to in this document.
Abstract: HMC / SDMC Save Corruption Exposure
Systems Affected: All 7042s
Communicable to Clients: Yes
IBM has learned that HMCs running V7R7.3.0 or SDMCs running V6R7.3.0 could potentially be exposed to corruption of the save area (where partition profile data is stored).
Symptoms include loss of profiles and/or recovery state due to a checksum failure against the profiles in the save area. In addition, shared processor pool names can be affected (processor pool number and configuration are not lost), system profiles can be lost, the virtual Ethernet MAC address base may change (causing the next partition activation to fail or to use different virtual Ethernet MAC addresses), and the default profile may be lost for all or some of the partitions.
Partitions will continue to run, but reactivation via profile will fail if the profile is missing or corrupted. All mobility operations and some DLPAR operations will fail if a partition has missing or corrupted profiles.
Environments using HMCs or SDMCs to control multiple managed systems have the greatest exposure. Triggers for exposure include any of the following operations performed in parallel to any managed system: Live Partition Mobility (LPM), Dynamic LPAR (DLPAR), profile changes, partition activation, rebuild of the managed system, rebooting with multiple servers attached, disconnecting or reconnecting a server, hibernate or resume, or establishing a new RMC connection.
Recommended Service Actions:
There is no real work-around other than limiting the configurations to a single HMC managing a single managed system.
Customers who have not yet upgraded or installed HMC 7.7.3 should delay the upgrade/install if at all possible until a fix is available.
Customers who have not yet installed and deployed SDMC V6R7.3.0 should avoid discovering production servers until a fix is available.
Customers that have HMC 7.7.3 or SDMC V6R7.3.0 deployed should:
- Immediately do a profile backup operation for all managed servers:
bkprofdata -m <managed system name> -f <filename>
- Minimize the risk of encountering the problem by using only a single HMC or SDMC to
manage a single server via the following options:
* Power off the dual HMC/SDMC or remove the connection from any dual HMC/SDMC.
* Use one HMC per server (remove/add connections as needed).
* A single HMC/SDMC managing multiple servers might be done relatively safely if
the operations listed under triggers above are NOT done to two different servers in parallel.
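The profile backup step above can be scripted when an HMC manages several servers. The following is only a sketch, not part of the IBM notice: it assumes the HMC command line is reachable over SSH from a workstation, and the `hscroot@hmc.example.com` login, the `HMC_RUN` variable, the `backup_all_profiles` function name, and the backup file naming scheme are all illustrative assumptions. It uses `lssyscfg -r sys -F name` to list the managed systems and then runs `bkprofdata` once per system.

```shell
#!/bin/sh
# Sketch: back up partition profile data for every managed system on one HMC.
# HMC_RUN is the command prefix used to reach the HMC CLI. The default user
# and host below are placeholders (assumptions), not values from the notice.
HMC_RUN=${HMC_RUN:-"ssh -n hscroot@hmc.example.com"}

backup_all_profiles() {
    stamp=$1    # suffix for the backup file names, for example a date
    # lssyscfg -r sys -F name prints one managed system name per line.
    $HMC_RUN lssyscfg -r sys -F name | while IFS= read -r sys; do
        [ -n "$sys" ] || continue          # skip blank lines
        file="${sys}_${stamp}.bak"         # hypothetical naming scheme
        echo "Backing up $sys to $file"
        # Assumes system names contain no whitespace; names with spaces
        # would need extra quoting for the remote shell.
        $HMC_RUN bkprofdata -m "$sys" -f "$file" || echo "FAILED: $sys" >&2
    done
}
```

A possible invocation is `backup_all_profiles "$(date +%Y%m%d)"`. The systems are processed one at a time, which keeps the script from issuing operations against two servers in parallel.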
NOTE: Recovery will be easiest with a valid backup of the profile data, so it is extremely important to back up profile data prior to an HMC upgrade and after any configuration changes to the save area. If a profile data backup exists, this problem can be rectified by restoring using:
rstprofdata -m <managedsysname> -l 3 -f <backupfilename>
In addition to user backups, profile backups can be extracted from the previous save upgrade data (DVD or disk), from backup console data (if available), or from pedbg.
If a good backup does not exist, call your HMC/SDMC support to determine if recovery is possible.
A fix to prevent this from occurring is due out by the end of July, but the PTF will not fix an already corrupted save area. A follow-up notification will be sent as soon as it is available.
STG Client Care