IBM Support

OA56231: STRUCTURE REBUILD HANG FOLLOWING LOSSCONN EVENT UNDER MSGBASED PROTOCOL

A fix is available

APAR status

  • Closed as program error.

Error description

  • Following a CF failure, structure rebuilds hung with connector
    responses outstanding, as shown by D XCF,STR output, when
    CFLCRMGMT was in use. Exposure to this issue requires a loss
    of CF connectivity (IXC528I) and the loss of a system within a
    short time of each other. The issue is also timing dependent.
    
    ANALYSIS:
    Following loss of CF connectivity, a CF lossconn event will
    drive structure rebuild activity.
    
    A problem was found where the CFRM message-based manager system
    may not free EMB control blocks following a LOSSCONN event
    signal.
    
    KNOWN IMPACT:
    Structure rebuilds hang.
    
    Issuing SETXCF STOP,MSGBASED followed by
    SETXCF START,MSGBASED addresses the rebuild hangs.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users in a Parallel Sysplex environment                      *
    * with the optional XCF function CFLCRMGMT                     *
    * enabled.                                                     *
    *                                                              *
    * SYSPLEXDS                                                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * Due to timing or lack of XCF signaling                       *
    * connectivity, coupling facility (CF)                         *
    * structure connector termination (for                         *
    * example, disconnect) before the system                       *
    * is removed from the sysplex may cause                        *
    * the XCF MSGBASED manager system to                           *
    * retain residual control blocks if the                        *
    * system is removed from the sysplex and                       *
    * the structure is no longer allocated.                        *
    * The residual control blocks may cause                        *
    * CF LOSSCONN RECOVERY MANAGEMENT to                           *
    * pause, which will cause any XCF                              *
    * REALLOCATE process to hang.                                  *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Fix should be installed via rolling IPL.                     *
    * Full functionality of the fix will not                       *
    * be available until it is installed on                        *
    * every system in the sysplex. The fix for                     *
    * this APAR need only be installed on the                      *
    * system assigned the role of XCF CFRM                         *
    * MESSAGE-BASED MANAGER. However, when                         *
    * that system is removed from the sysplex,                     *
    * the role is reassigned to an active                          *
    * system in the sysplex. An IPL is                             *
    * required to activate this fix on each                        *
    * system of the SYSPLEX; however, a                            *
    * rolling IPL is sufficient to accomplish                      *
    * the activation.                                              *
    *                                                              *
    * Before the fix is installed on every                         *
    * system in the sysplex, system commands                       *
    * SETXCF STOP,MSGBASED and SETXCF                              *
    * START,MSGBASED can be used on a system                       *
    * with the fix installed to make the fix                       *
    * effective. The role of the XCF CFRM                          *
    * MESSAGE-BASED manager system is assumed                      *
    * by the system that successfully executes                     *
    * the SETXCF START,MSGBASED operator                           *
    * command.                                                     *
    ****************************************************************
    The XCF MSGBASED manager system incorrectly relies on a CF
    structure connector termination event signal to clean up
    event control blocks for a structure. Due to timing or lack
    of XCF signaling connectivity, such an event signal may not
    be delivered to the manager system when another system is
    removed from the sysplex. If the structure is not allocated
    during system termination, the manager system may not clean
    up residual control blocks. The residual control blocks cause
    the manager system to process CF LOSSCONN RECOVERY MANAGEMENT
    as if a system is hung, which causes CF LOSSCONN RECOVERY
    MANAGEMENT to pause. When CF LOSSCONN RECOVERY MANAGEMENT is
    paused, REALLOCATE processing will hang.
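
The recommendation above depends on which system currently holds the XCF CFRM message-based manager role. The following Python sketch is only an illustration of that reasoning and not an IBM tool: the `System` class, the function names, and the system names SYSA, SYSB, and SYSC are all hypothetical. It assumes the fix counts as installed on a system once the PTF has been applied and the system has been IPLed.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    has_fix: bool  # assumed: True once the PTF is applied and the system IPLed


def fix_is_effective(systems, manager_name):
    """The fix acts only while the current manager system is running it."""
    manager = next(s for s in systems if s.name == manager_name)
    return manager.has_fix


def fix_is_durable(systems):
    """After a full rolling IPL, any system that inherits the role has the fix."""
    return all(s.has_fix for s in systems)


def takeover_candidates(systems):
    """Systems on which SETXCF STOP,MSGBASED followed by SETXCF START,MSGBASED
    could be issued so that a system with the fix assumes the manager role."""
    return [s.name for s in systems if s.has_fix]


# Hypothetical three-system sysplex part-way through a rolling IPL.
sysplex = [System("SYSA", False), System("SYSB", True), System("SYSC", False)]

print(fix_is_effective(sysplex, "SYSA"))  # False
print(takeover_candidates(sysplex))       # ['SYSB']
print(fix_is_effective(sysplex, "SYSB"))  # True
print(fix_is_durable(sysplex))            # False
```

In this sketch the fix is effective as soon as a fixed system holds the role, but it only stays effective across later role reassignments once every system has the fix, which is why the rolling IPL is recommended.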
    

Problem conclusion

  • Corrected XCF system termination processing to clean up
    residual XCF MSGBASED manager control blocks for CF structures
    that are not allocated.
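
The failure mode and the correction described above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the XCF implementation: the `ToyManager` class, its methods, and the `cleanup_on_system_removal` switch are hypothetical, the real control block layout is not shown in this APAR, and the condition that the structure is no longer allocated is left out for brevity.

```python
class ToyManager:
    """Toy model: one event block per (structure, system) pair.
    Hypothetical; not the real XCF data structures."""

    def __init__(self, cleanup_on_system_removal):
        self.cleanup_on_system_removal = cleanup_on_system_removal
        self.active_systems = set()
        self.event_blocks = set()  # {(structure, system)}

    def connect(self, structure, system):
        self.active_systems.add(system)
        self.event_blocks.add((structure, system))

    def connector_terminated(self, structure, system):
        # Normal path: the termination event signal arrives, the block is freed.
        self.event_blocks.discard((structure, system))

    def system_removed(self, system):
        self.active_systems.discard(system)
        if self.cleanup_on_system_removal:
            # Modeled correction: do not depend on the signal having arrived.
            self.event_blocks = {b for b in self.event_blocks if b[1] != system}

    def lossconn_recovery_can_proceed(self):
        # Recovery waits on every system that still owns an event block.
        # A block owned by a departed system can never be answered.
        return all(system in self.active_systems
                   for _, system in self.event_blocks)


def run(cleanup_on_system_removal):
    mgr = ToyManager(cleanup_on_system_removal)
    mgr.connect("STR1", "SYSA")
    mgr.connect("STR1", "SYSB")
    # SYSB leaves the sysplex; its termination signal never reaches the manager.
    mgr.system_removed("SYSB")
    return mgr.lossconn_recovery_can_proceed()


print(run(cleanup_on_system_removal=False))  # False: recovery pauses
print(run(cleanup_on_system_removal=True))   # True: recovery proceeds
```

Without the cleanup, the residual block for SYSB looks like a response that is still outstanding, so the modeled recovery waits indefinitely, in the same way the manager system treats the departed system as hung.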
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    OA56231

  • Reported component name

    XCF

  • Reported component ID

    5752SCXCF

  • Reported release

    7B0

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2018-10-02

  • Closed date

    2019-10-10

  • Last modified date

    2019-11-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UJ01071 UJ01072 UJ01073

Modules/Macros

  • IXCL2RSR
    

Fix information

  • Fixed component name

    XCF

  • Fixed component ID

    5752SCXCF

Applicable component levels

  • R7A0 PSY UJ01073

       UP19/10/30 P F910

  • R7B0 PSY UJ01071

       UP19/10/30 P F910

  • R7C0 PSY UJ01072

       UP19/10/30 P F910

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
01 November 2019