A fix is available
APAR status
Closed as program error.
Error description
Potential failures while applying dynamic MCLs. Various hard abends (HTT001, PRG004, PRG006) are possible when a concurrent microcode change that adds or removes available facilities is made to a machine while z/VM is running.

If you apply MCLs concurrently to a CEC containing a z/VM system, there is a potential for a z/VM hard abend. This APAR (VM66125) has been created to address a problem internal to z/VM that occurs while the architecture domains are recalculated during dynamic changes. The problem has existed in prior releases of z/VM (V6.2 and V6.3) and has only now been identified internally.

When a concurrent MCL is applied and it changes the capability of the CEC, the operating systems are notified of the changes as appropriate. Not all MCLs result in a change of capabilities. A change in capability causes z/VM to do processing for architecture virtualization, and a hard abend can occur during this processing due to timing. The failure is more prevalent in an SSI environment, but it is theoretically possible in a non-SSI environment as well. In an SSI cluster, the abend can occur on any member of the cluster, not just those on the CEC where the MCL is being applied.

USERS AFFECTED: All z/VM users utilizing dynamic MCL changes to hardware. All hardware models are affected.

PROBLEM RESOLUTION: Apply the APAR before dynamically adding MCLs.
Local fix
Shut down z/VM systems before applying the MCL.
Problem summary
****************************************************************
* USERS AFFECTED: All z/VM users utilizing dynamic MCL changes *
*                 to hardware. All hardware models affected.   *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
****************************************************************
* RECOMMENDATION: APPLY PTF                                    *
****************************************************************
ABENDHTT001 ABENDPRG004 ABENDPRG006

These system abends are possible on z/VM systems when a dynamic upgrade (or downgrade) of the microcode on the underlying machine is done. The problem is in the code that runs when the machine notifies z/VM that the microcode has changed and z/VM determines that the facilities available on the machine have changed due to the new microcode.

The code that can fail changes the system virtual architecture control blocks, called ARDBKs, for each relocation domain that includes the upgraded machine. Some of this work of changing the ARDs is done asynchronously via stacked console function mode (CFM) tasks. These tasks switch guests from their old virtual architecture level description (ARD) to the new, upgraded one.

The problem is that the old ARD can be deleted before all of those stacked CFM tasks have completed. The remaining stacked CFM tasks then attempt to access ARDs that have already been returned to free storage. This is timing related and therefore does not always occur.
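The ordering problem described above can be illustrated with a small simulation. This is a sketch only, not z/VM code: the `Ard` class, `task_queue`, and `switch_guest` are hypothetical names, and a Python flag stands in for a control block being returned to free storage.

```python
from collections import deque

class Ard:
    """Hypothetical stand-in for an ARDBK (not the real control block layout)."""
    def __init__(self, name):
        self.name = name
        self.freed = False  # models "returned to free storage"

def switch_guest(guest, old_ard, new_ard):
    """Deferred task: move one guest from the old ARD to the new one."""
    if old_ard.freed:
        # Assumption: this models a stacked task touching released storage.
        raise RuntimeError(f"{guest}: old ARD {old_ard.name} already freed")
    return new_ard.name

old, new = Ard("ARD-old"), Ard("ARD-new")
# Three deferred tasks are stacked, one per guest.
task_queue = deque((g, old, new) for g in ["GUEST1", "GUEST2", "GUEST3"])

results = []
# One task runs, then the old ARD is deleted while two tasks are still stacked.
results.append(switch_guest(*task_queue.popleft()))
old.freed = True

failures = []
while task_queue:
    guest, o, n = task_queue.popleft()
    try:
        results.append(switch_guest(guest, o, n))
    except RuntimeError:
        failures.append(guest)
```

Whether any task fails depends entirely on when the deletion happens relative to the queue draining, which is why the abend is intermittent.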
Problem conclusion
The fix consists of three parts:

1. Change HCPRDMLU to ensure the same ARD is not processed twice.

2. Add a counter of pending VMDBK connections to the header of the ARDBK. This is incremented whenever a CFM call is stacked to HCPRDMAV and decremented when HCPRDMAV executes. Any place that deletes an ARDBK is changed to verify this is zero before getting rid of the ARDBK.

3. In case of two interleaved upgrades occurring together, add a pointer in the VMDBK to the latest ARD that is to be connected. This new field is checked by the code in HCPCPU that is adding a new virtual processor to the configuration. If it is non-zero, there is a pending connection outstanding, and the new VMDBK should be connected to the new ARD rather than the old one that will be going away when all the calls to HCPRDMAV are unstacked.

This interleaving of two microcode upgrades can happen when a Single System Image (SSI) cluster has two members on the machine that is being upgraded. They are both notified of the microcode change, and both start the ARD recalculation process together. This configuration makes it more likely for this problem to occur.
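Parts 2 and 3 of the fix can be sketched roughly as follows. This is an illustrative model, not the CP implementation: all class, field, and function names are hypothetical. Two behaviors are assumptions that the APAR text does not spell out: the counter is kept on the ARD that the stacked task still references, and a delete attempt with a non-zero counter is simply refused.

```python
from collections import deque

class Ard:
    """Hypothetical model of an ARDBK."""
    def __init__(self, name):
        self.name = name
        self.pending = 0      # part 2: stacked connections not yet executed
        self.deleted = False

class Guest:
    """Hypothetical model of a VMDBK."""
    def __init__(self, name, ard):
        self.name = name
        self.ard = ard
        self.latest_ard = None  # part 3: target of a pending connection, if any

stacked = deque()  # models the stacked deferred calls

def stack_connect(guest, old_ard, new_ard):
    """Stack a deferred connection and count it (assumed: on the old ARD)."""
    old_ard.pending += 1
    guest.latest_ard = new_ard
    stacked.append((guest, old_ard, new_ard))

def run_one():
    """Execute one stacked connection and release its count."""
    guest, old_ard, new_ard = stacked.popleft()
    guest.ard = new_ard
    if guest.latest_ard is new_ard:
        guest.latest_ard = None  # no newer connection is pending
    old_ard.pending -= 1

def try_delete(ard):
    """Delete only when no stacked task still refers to this ARD."""
    if ard.pending != 0:
        return False  # assumed behavior: refuse, caller retries later
    ard.deleted = True
    return True

def ard_for_new_vcpu(guest):
    """A new virtual processor uses the pending ARD when one is outstanding."""
    return guest.latest_ard or guest.ard

old, new = Ard("old"), Ard("new")
g = Guest("GUEST1", old)
stack_connect(g, old, new)
deleted_early = try_delete(old)        # refused while a task is stacked
vcpu_ard = ard_for_new_vcpu(g).name    # picks the new ARD, not the old one
run_one()
deleted_late = try_delete(old)         # succeeds once the queue has drained
```

The counter turns the deletion into a checked operation, so the timing of the delete no longer matters, and the extra pointer keeps newly added virtual processors off an ARD that is about to go away.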
Temporary fix
*********
* HIPER *
*********
FOR RELEASE VM/ESA CP/ESA R640 :
PREREQ: VM65986 VM65396
CO-REQ: NONE
IF-REQ: NONE
Comments
APAR Information
APAR number
VM66125
Reported component name
VM CP
Reported component ID
568411202
Reported release
640
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2018-02-02
Closed date
2018-03-19
Last modified date
2018-12-14
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UM35287
Modules/Macros
HCPARD HCPARDBK HCPCPU HCPPLM HCPRDM HCPVMDBK
Fix information
Fixed component name
VM CP
Fixed component ID
568411202
Applicable component levels
R640 PSY UM35287
UP18/03/19 P 1802
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
14 December 2018