IBM Support

OA47993: IXC615I FOR SYSGRS AND THE SYSTEM IS REMOVED WHEN THIS WAS REALLY THE VICTIM OF ANOTHER SYSTEM IN THE SYSPLEX

A fix is available


APAR status

  • Closed as program error.

Error description

  • IXC633I GROUP SYSGRS MEMBER xxx JOB GRS ASID 0007 628
    CONFIRMED IMPAIRED AT xx/xx/xxxx xx:xx:xx.xxxxxx ID: 1.2
    .
    IXC636I GROUP SYSGRS MEMBER H019 JOB GRS ASID 0007
         IMPAIRED, IMPACTING CRITICAL FUNCTION GLOBAL ENQ PROCESSING
    .
    *IXC635E SYSTEM xxxx HAS IMPAIRED XCF GROUP MEMBERS
    .
    IXC615I GROUP SYSGRS MEMBER xxxx JOB GRS ASID 0007
            SFM TERMINATING SYSTEM TO RELIEVE IMPAIRMENT CONDITION
    .
    
    .
    The above messages are seen when GRS's status exit informs XCF
    that it is impaired. It makes this determination after examining
    the ISGQDR and ISGWDR tasks in the GRS address space. In some
    cases, these tasks are not running because of another system in
    the SYSPLEX, not because of THIS system. The erroneous
    assumption causes XCF to partition this system out of the
    SYSPLEX while the system that is really causing the problem
    remains, so SYSPLEX SYMPATHY continues.
    .
    In the case noted in the field, another system was capped at a
    very low value by mistake. The low capping slowed down signal
    processing, particularly for the SYSGRS group. However, that
    system was still able to update its status, so SFM did not take
    action against it. On another system, GRS needed to process the
    LIST DRAIN: the LISTLOCK was obtained and signals were then sent
    to all the other systems, including the low capped system. When
    that system failed to respond in a timely manner, the system
    initiating the LIST DRAIN appeared to be impaired and action was
    taken against it. This system was really just waiting for a
    response from the low weighted system, but the wait caused GRS
    to inform XCF that it appeared to be impaired, and XCF removed
    this system from the plex.
    .
    When the system is waiting for the LIST DRAIN, GRS should NOT
    consider itself impaired. It is still possible that this system
    is the problem; however, GRS cannot make that determination
    definitively and therefore should not make any determination at
    all. The problem here is that a perfectly healthy system, the
    victim of another system, was removed in an attempt to relieve
    the impairment. That action provided no relief and was actually
    detrimental.
    .
    VERIFICATION STEPS:
    -------------------
    1) If a SADMP is taken of the system that issued the IXC633I
       message for SYSGRS, then, check to see if the following two
       bits are on:
       SST_HoldForListDrain
       SST_ListLockOwned
       The SST is located via CVT+1B0?+10?+204?
                     (cvt --> gvt --> gvtx --> sst)
        .
    2) Check to see if there are signals outstanding to other
       systems via "IP XESDATA CONNECTION STR(ISGLOCK) DETAIL".
       This output includes a timestamp for "pending" signals to
       other systems. Compare it with the time the IXC633I was
       issued to see whether a response is slow in arriving.
     .
    3) Even after this system was removed via SFM, problems persist
       and the root cause is determined to be another system in the
       SYSPLEX.
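
The pointer chain in step 1 is written in IPCS address notation, where each "?" is commonly read as "fetch the 31-bit address stored at this location". As a rough illustration of that arithmetic only, the Python sketch below walks the same chain (CVT+1B0, then +10, then +204) over a made-up storage image. The names follow_chain and make_reader and all sample addresses are hypothetical, not part of IPCS or of this APAR. The bit positions of SST_HoldForListDrain and SST_ListLockOwned are not given in this record, so the sketch stops at locating the SST.

```python
from typing import Callable, Dict, Iterable

ADDR31_MASK = 0x7FFFFFFF  # a 31-bit address ignores the high-order bit of the word


def follow_chain(read_word: Callable[[int], int], base: int,
                 offsets: Iterable[int]) -> int:
    """For each offset: add it to the current address, then load the pointer there."""
    addr = base
    for off in offsets:
        addr = read_word(addr + off) & ADDR31_MASK
    return addr


def make_reader(image: Dict[int, bytes]) -> Callable[[int], int]:
    """Build a reader over a hypothetical {address: 4-byte big-endian word} image."""
    def read_word(addr: int) -> int:
        return int.from_bytes(image[addr], "big")
    return read_word


# Hypothetical sample storage, purely for illustration (not real dump contents).
CVT, GVT, GVTX, SST = 0x00FD0000, 0x01A00000, 0x01A10000, 0x01A20000
image = {
    CVT + 0x1B0: GVT.to_bytes(4, "big"),
    GVT + 0x10: (GVTX | 0x80000000).to_bytes(4, "big"),  # high bit set on purpose
    GVTX + 0x204: SST.to_bytes(4, "big"),
}

# cvt --> gvt --> gvtx --> sst
sst_addr = follow_chain(make_reader(image), CVT, [0x1B0, 0x10, 0x204])
print(hex(sst_addr))
```

With the sample image above, the result is the made-up SST address; against a real dump the same chain would normally be resolved by IPCS itself rather than by a script.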
    .
    If all three of the above match, then this is your problem. This
    APAR prevents GRS from assuming that it is impaired in this
    situation. GRS still cannot determine which system is causing
    the issue; however, making no decision is better than making the
    wrong decision. This will prevent a perfectly healthy system from
    being removed from the SYSPLEX.
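
The three verification conditions can be combined into one check, sketched below in Python. This is an illustration under stated assumptions, not an IBM tool: the timestamp layout is assumed to follow the mm/dd/yyyy hh:mm:ss.ffffff pattern suggested by the IXC633I text, the "pending" timestamp from XESDATA is assumed to have been transcribed into that same layout by hand, and the threshold for "slow" is a hypothetical value that the analyst chooses. All function and parameter names are invented for the example.

```python
from datetime import datetime, timedelta

# Assumed layout, modelled on the IXC633I text; adjust to the real output.
STAMP_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def pending_age(pending_since: str, impaired_at: str) -> timedelta:
    """How long the signal had been pending when IXC633I was issued."""
    start = datetime.strptime(pending_since, STAMP_FORMAT)
    end = datetime.strptime(impaired_at, STAMP_FORMAT)
    return end - start


def matches_this_apar(hold_for_list_drain: bool, list_lock_owned: bool,
                      pending_since: str, impaired_at: str,
                      slow_threshold: timedelta,
                      problems_persist_after_removal: bool) -> bool:
    """True only when all three verification steps agree."""
    step1 = hold_for_list_drain and list_lock_owned            # both SST bits on
    step2 = pending_age(pending_since, impaired_at) >= slow_threshold
    step3 = problems_persist_after_removal                     # root cause elsewhere
    return step1 and step2 and step3


# Made-up timestamps and a hypothetical 60-second threshold.
age = pending_age("06/01/2015 10:15:02.000000", "06/01/2015 10:17:32.500000")
print(age.total_seconds())  # 150.5
print(matches_this_apar(True, True,
                        "06/01/2015 10:15:02.000000",
                        "06/01/2015 10:17:32.500000",
                        timedelta(seconds=60), True))  # True
```

Requiring all three conditions mirrors the wording above: a single match, such as a slow signal on its own, is not treated as evidence of this problem.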
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of HBB7780 and above in            *
    *                 GRS Star Mode                                *
    ****************************************************************
    * PROBLEM DESCRIPTION: ISGXSTAX did not recognize the          *
    *                      SST_HoldForListDrain flag as an         *
    *                      indication that the delay is due to     *
    *                      the list lock as opposed to the system  *
    *                      being sick. As a result of this, it     *
    *                      was possible for a healthy system       *
    *                      to declare itself impaired when it      *
    *                      was waiting for another (sick)          *
    *                      system.                                 *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    See Problem Description
    

Problem conclusion

  • ISGXSTAX now takes SST_HoldForListDrain into account when
    ISGWDR appears stalled. If SST_HoldForListDrain is on then
    ISGXSTAX does not declare itself impaired.
     KEYWORDS: GRSSTAR/K
    
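The change in behavior can be pictured with a small Python model. It is a sketch of the decision described above, not the ISGXSTAX code: the GrsSnapshot type and both functions are hypothetical, and the way the ISGQDR and ISGWDR checks are combined (a simple "or") is an assumption, since this record only states that SST_HoldForListDrain is now considered when ISGWDR appears stalled.

```python
from dataclasses import dataclass


@dataclass
class GrsSnapshot:
    """Hypothetical view of the state the status exit examines."""
    isgqdr_stalled: bool
    isgwdr_stalled: bool
    hold_for_list_drain: bool  # stands in for SST_HoldForListDrain


def impaired_before_fix(s: GrsSnapshot) -> bool:
    # Assumed old behavior: any stalled task is reported as an impairment.
    return s.isgqdr_stalled or s.isgwdr_stalled


def impaired_after_fix(s: GrsSnapshot) -> bool:
    # Assumed handling of ISGQDR: unchanged by this APAR.
    if s.isgqdr_stalled:
        return True
    # ISGWDR stalled while holding for the list drain: the delay may be caused
    # by another system, so no impairment is declared.
    if s.isgwdr_stalled and s.hold_for_list_drain:
        return False
    return s.isgwdr_stalled


# The field scenario: ISGWDR waits on a slow system during LIST DRAIN.
victim = GrsSnapshot(isgqdr_stalled=False, isgwdr_stalled=True,
                     hold_for_list_drain=True)
print(impaired_before_fix(victim), impaired_after_fix(victim))  # True False
```

In this model a stalled ISGWDR without the list drain hold is still reported as impaired, which matches the statement that the fix only suppresses the report when SST_HoldForListDrain is on.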

Temporary fix

Comments

APAR Information

  • APAR number

    OA47993

  • Reported component name

    CROSS SYS.EXT.S

  • Reported component ID

    5752SCIXL

  • Reported release

    790

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2015-06-02

  • Closed date

    2015-08-12

  • Last modified date

    2015-09-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UA78606 UA78607 UA78608

Modules/Macros

  • ISGXSTAX
    

Fix information

  • Fixed component name

    GRS

  • Fixed component ID

    5752SCSDS

Applicable component levels

  • R7A0 PSY UA78606

       UP15/08/26 P F508

  • R780 PSY UA78607

       UP15/08/26 P F508

  • R790 PSY UA78608

       UP15/08/26 P F508

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
01 September 2015