IBM Support

VM66125: ABENDHTT001 CAUSED BY DYNAMIC MICROCODE CHANGE ABENDPRG004 ABENDPRG006

A fix is available

APAR status

  • Closed as program error.

Error description

  • Failures are possible while dynamic MCLs are being applied.
    Various hard abends (HTT001, PRG004, PRG006) can occur when a
    concurrent microcode change that adds or removes available
    facilities is made to a machine while z/VM is running.
    .
    If you apply MCLs concurrently to a CEC that hosts a z/VM
    system, z/VM can take a hard abend. This APAR (VM66125) was
    created to address an internal z/VM problem in the
    recalculation of architecture domains during dynamic changes.
    The problem has been present since prior releases of z/VM
    (V6.2 and V6.3) and has only now been identified internally.
    .
    When a concurrent MCL is applied and it changes the
    capabilities of the CEC, the operating systems running on the
    CEC are notified of the changes as appropriate. Not all MCLs
    change the capabilities. A change in capabilities causes z/VM
    to perform architecture virtualization processing, and a hard
    abend can occur during this processing because of a timing
    window. The failure is more likely in an SSI environment, but
    it can, in theory, also occur in a non-SSI environment. In an
    SSI cluster, the abend can occur on any member of the cluster,
    not just on the members running on the CEC where the MCL is
    being applied.
    .
    USERS AFFECTED: all z/VM users who apply dynamic MCL changes
    to hardware. All hardware models are affected.
    .
    PROBLEM RESOLUTION: apply this APAR before dynamically
    applying MCLs.
    .
    

Local fix

  • Shut down z/VM systems before applying MCLs.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All z/VM users utilizing dynamic MCL changes *
    *                 to hardware.  All hardware models affected.  *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    ****************************************************************
    * RECOMMENDATION: APPLY PTF                                    *
    ****************************************************************
    ABENDHTT001 ABENDPRG004 ABENDPRG006
    These system abends are possible on z/VM systems when a
    dynamic upgrade (or downgrade) of the microcode on the
    underlying machine is done.
    
    The problem is in the code that runs when the machine notifies
    z/VM that the microcode has changed and z/VM determines that
    the facilities available on the machine have changed as a
    result.  The failing code updates the system virtual
    architecture control blocks (ARDBKs) for each relocation
    domain that includes the upgraded machine.

    Part of this ARD update is done asynchronously through stacked
    console function mode (CFM) tasks.  These tasks switch guests
    from their old virtual architecture level description (ARD)
    to the new, upgraded one.  The problem is that the old ARD can
    be deleted before all of the stacked CFM tasks have completed.
    The remaining stacked CFM tasks then access ARDs that have
    already been returned to free storage.  Because the failure is
    timing related, it does not always occur.
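To make the timing window concrete, here is a minimal Python model of the sequence described above. It is an illustration under stated assumptions, not CP code: `Ard`, `Guest`, `switch_guest`, and `run_upgrade` are hypothetical names, a `deque` stands in for the stacked CFM tasks, and a `freed` flag stands in for an ARD being returned to free storage. The hypothetical `free_after` parameter chooses the point at which the old ARD is deleted, which is the timing the text above says varies.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass(eq=False)
class Ard:
    """Hypothetical stand-in for an ARDBK (virtual architecture description)."""
    level: int
    freed: bool = False  # models "returned to free storage"


@dataclass(eq=False)
class Guest:
    """Hypothetical stand-in for a guest VMDBK."""
    name: str
    ard: Ard


def switch_guest(guest: Guest, old: Ard, new: Ard) -> None:
    """One stacked CFM task: reads the old ARD, then reconnects the guest."""
    if old.freed:
        # In CP this would be a reference to freed storage (an abend);
        # the model raises an exception instead.
        raise RuntimeError(f"{guest.name}: ARD level {old.level} already freed")
    guest.ard = new


def run_upgrade(guests: List[Guest], old: Ard, new: Ard, free_after: int) -> List[str]:
    """Run the stacked tasks in order, deleting `old` once `free_after` of
    them have run. Returns the names of guests whose task hit the freed ARD."""
    tasks = deque(guests)  # tasks stacked for later, asynchronous execution
    failures: List[str] = []
    done = 0
    while tasks:
        if done == free_after:
            old.freed = True  # the premature deletion (the bug)
        guest = tasks.popleft()
        try:
            switch_guest(guest, old, new)
        except RuntimeError:
            failures.append(guest.name)
        done += 1
    return failures
```

With three guests and `free_after=1`, the first guest switches successfully and the other two hit the freed ARD, so the call returns `["G1", "G2"]`. With `free_after=3`, the deletion point falls after every task has run and nothing fails, which mirrors why the abend does not occur every time.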
    

Problem conclusion

  • The fix consists of three parts:
    1. Change HCPRDMLU to ensure that the same ARD is not
       processed twice.
    2. Add a counter of pending VMDBK connections to the header
       of the ARDBK.  The counter is incremented whenever a CFM
       call to HCPRDMAV is stacked and decremented when HCPRDMAV
       executes.  Every path that deletes an ARDBK now verifies
       that the counter is zero before releasing the ARDBK.
    3. To handle two interleaved upgrades, add a pointer in the
       VMDBK to the latest ARD that is to be connected.  The code
       in HCPCPU that adds a new virtual processor to the
       configuration checks this field.  If it is non-zero, a
       connection is pending, and the new VMDBK is connected to
       the new ARD rather than to the old one, which will go away
       once all the calls to HCPRDMAV are unstacked.
       Two microcode upgrades can interleave this way when a
       Single System Image (SSI) cluster has two members on the
       machine being upgraded.  Both members are notified of the
       microcode change and both start the ARD recalculation
       process together, which makes this problem more likely to
       occur.
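The three parts of the fix can be sketched with the same kind of Python model. This is an illustrative assumption, not the actual change to the CP modules listed under Modules/Macros: `Ard.pending` plays the role of the new pending-connection counter in the ARDBK header, `Guest.pending_ard` plays the role of the new VMDBK pointer, `run_next` plays the role of one unstacked HCPRDMAV call, and `unique_ards` stands in for the HCPRDMLU duplicate check. All class, field, and function names here are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(eq=False)
class Ard:
    """Hypothetical stand-in for an ARDBK."""
    level: int
    pending: int = 0               # part 2: stacked connections not yet run
    delete_requested: bool = False
    freed: bool = False


@dataclass(eq=False)
class Guest:
    """Hypothetical stand-in for a VMDBK."""
    name: str
    ard: Ard
    pending_ard: Optional[Ard] = None  # part 3: latest ARD awaiting connection


class RelocationDomain:
    """Hypothetical model of one relocation domain's stacked CFM work."""

    def __init__(self) -> None:
        self.cfm_queue: deque = deque()

    def stack_switch(self, guest: Guest, old: Ard, new: Ard) -> None:
        # Part 2: count the connection at the moment the call is stacked.
        old.pending += 1
        guest.pending_ard = new
        self.cfm_queue.append((guest, old, new))

    def request_delete(self, ard: Ard) -> None:
        ard.delete_requested = True
        self._free_if_idle(ard)

    def _free_if_idle(self, ard: Ard) -> None:
        # Part 2: release the ARD only when no stacked call still needs it.
        if ard.delete_requested and ard.pending == 0:
            ard.freed = True

    def run_next(self) -> None:
        # One unstacked call; the counter drops when the call executes.
        guest, old, new = self.cfm_queue.popleft()
        if old.freed:  # guard: should never trigger with the counter in place
            raise RuntimeError("stacked task touched a freed ARD")
        old.pending -= 1
        guest.ard = new
        if guest.pending_ard is new:
            guest.pending_ard = None
        self._free_if_idle(old)

    def add_virtual_cpu(self, guest: Guest) -> Ard:
        # Part 3: return the ARD a new virtual CPU should connect to,
        # preferring a pending (new) ARD over the outgoing one.
        return guest.pending_ard if guest.pending_ard is not None else guest.ard


def unique_ards(ards: Iterable[Ard]) -> List[Ard]:
    # Part 1: make sure the same ARD is not processed twice.
    seen = set()
    result: List[Ard] = []
    for ard in ards:
        if id(ard) not in seen:
            seen.add(id(ard))
            result.append(ard)
    return result
```

In the sequence exercised below, deletion of the old ARD is requested while two switch tasks are still stacked. The ARD stays allocated until the last of those tasks runs, and a virtual CPU added in the meantime is pointed at the new ARD rather than the outgoing one.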
    

Temporary fix

  • *********
    * HIPER *
    *********
    FOR RELEASE VM/ESA CP/ESA R640 :
    PREREQ: VM65986 VM65396
    CO-REQ: NONE
    IF-REQ: NONE
    

APAR Information

  • APAR number

    VM66125

  • Reported component name

    VM CP

  • Reported component ID

    568411202

  • Reported release

    640

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-02-02

  • Closed date

    2018-03-19

  • Last modified date

    2018-12-14

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UM35287

Modules/Macros

  • HCPARD   HCPARDBK HCPCPU   HCPPLM   HCPRDM   HCPVMDBK
    

Fix information

  • Fixed component name

    VM CP

  • Fixed component ID

    568411202

Applicable component levels

  • R640 PSY UM35287

       UP18/03/19 P 1802

Fix is available

  • Select the PTF appropriate for your component level. Distribution on physical media is not available in all countries.


Document Information

Modified date:
14 December 2018