IBM Support

PK52215: ADD ADDITIONAL DIAGNOSTICS TO HELP IDENTIFY GLOBAL ONLINE CHANGE PROBLEMS.

A fix is available

APAR status

  • Closed as new function.

Error description

  • 1. Global Online change diagnostic information is very limited.
    This is especially true when the error is during COMMIT3 and
    some IMSs have completed online change and do not have an online
    change control block (MWA).
    2. A TERMINATE OLC issued after the OLCSTAT is updated cannot
    terminate the OLC. The completion code returned to the user is
    not specific about the situation or the actions the user needs
    to take.
    The purpose of this APAR is to:
    - Add diagnostic information to help us identify the problems
    better.
    - Add additional messages or completion codes or return codes
    that will inform the user of the actions to take in case of
    errors during the online change process.
    - Add an indication if IMS is in COMMIT3.
    - Add the COMMIT3 status to QUERY MEMBER.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: IMS V10 global online change users.          *
    ****************************************************************
    * PROBLEM DESCRIPTION: A global online change that times out   *
    *                      during commit phase 3 doesn't show      *
    *                      enough information to diagnose          *
    *                      what happened or what action to         *
    *                      take next.                              *
    ****************************************************************
    * RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
    ****************************************************************
      A global online change was attempted.  The commit appeared to
      hang.  The dump of the IMS showed that the commit master
      sent commit phase 3 to the other IMS but never got a response
      back.  The other IMS is no longer in an online change state,
      which implies commit phase 3 worked on the other IMS.
      The command timeout value shown in the dump is 20 seconds.
      The IMS command master uses this same timeout value when
      sending the commit phase 3 process step to RM to coordinate
      with the other IMSs.  It is possible in this situation that
      the response didn't get back to RM within the timeout value,
      so that RM notified the IMS commit master of an error and
      the commit master quit, leaving itself in an online change
      state.  The IMS commit master keeps the online change
      control block (MWA) and skips commit phase 3 locally
      if it detects an error in the commit phase 3 step sent to the
      other IMSs.  This situation could be resolved with another
      COMMIT command and a longer command timeout value.
      There weren't enough diagnostics in the dump to tell exactly
      what the problem was.  This APAR adds more diagnostics to
      online change.
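
The timeout scenario above can be illustrated with a small simulation. This is only a sketch, not IMS code: the function names, the outcome record, and the 60 and 120 second values are assumptions made for illustration. Only the 20 second timeout comes from the dump described above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CommitOutcome:
    completed: bool   # True once the phase 3 responses arrived in time
    attempts: int     # number of COMMIT commands issued
    timeout_s: int    # command timeout used on the last attempt


def commit_phase3_step(response_delay_s: float, timeout_s: int) -> bool:
    """Hypothetical stand-in for the RM process step: it succeeds only if
    the other IMSs' responses reach RM within the command timeout."""
    return response_delay_s <= timeout_s


def commit_with_longer_timeouts(
    response_delay_s: float,
    timeouts_s: Sequence[int] = (20, 60, 120),  # 20 is from the dump; 60 and 120 are made up
) -> CommitOutcome:
    """Reissue the COMMIT with each timeout in turn until phase 3 completes."""
    if not timeouts_s:
        raise ValueError("at least one timeout value is required")
    attempts = 0
    for timeout_s in timeouts_s:
        attempts += 1
        if commit_phase3_step(response_delay_s, timeout_s):
            return CommitOutcome(True, attempts, timeout_s)
    # Every attempt timed out: the master is still in an online change state.
    return CommitOutcome(False, attempts, timeouts_s[-1])
```

With a simulated response delay of 35 seconds, the first COMMIT (20 seconds) times out and the second (60 seconds) completes, which mirrors the recovery suggested above.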
    
      Additional problem:
      If an IMS fails online change prepare because it couldn't
      get an MWA, it returns a completion code of blanks instead of
      60, which means that storage could not be obtained with
      GETMAIN.  IMS could also abend, because it sets the bad
      completion code into the MWA control block that was never
      obtained.
    
      Additional problem:
      If an IMS commit master fails during commit phase 3 because
      the responses from the other IMSs didn't make it back to RM
      before they timed out, and a TERMINATE OLC command is
      routed to one of those other IMSs that are no longer in
      an online change state, the online change can be neither
      committed nor terminated, so they are stuck in an online
      change state.
    

Problem conclusion

Temporary fix

Comments

  • POSTREQ PK77200
    
      COMMAND REFERENCE Volume 1 SC18-9700-00
        INITIATE OLC Command Return and Reason Code table
        Add return code X'00000010' reason code X'0000410D' with
        the following meaning:
        Online change prepare has already been done.  Another
        prepare command is not allowed.   The two commands
        that are allowed when IMS is in this state are
        INITIATE OLC PHASE(COMMIT) or TERMINATE OLC.
    
        Add return code X'00000010' reason code X'0000410E' with
        the following meaning:
        Online change prepare has not been done.  An online
        change prepare command must complete successfully
        before a commit command is attempted.
    
        Add return code X'00000010' reason code X'0000410F' with
        the following meaning:
        The online change is already committed, so it cannot be
        terminated.  However, some kind of error occurred that
        prevented the commit from fully completing.  For example,
        one of the IMSs participating in the online change
        was unable to complete commit phase 2 or commit phase
        3, or an IMS response to RM timed out and the
        commit master could not determine whether the online
        change commit succeeded on the other IMSs.
        Another commit command is required to complete the
        online change commit.
    
        Add return code X'00000014' reason code X'0000510C' with
        the following meaning:
        Another RM process step is in progress.  If this reason
        code is returned after a previous COMMIT command timed
        out, try the COMMIT command again.
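
The return and reason codes listed above could be kept in a small lookup table by automation that inspects command responses. The following is a minimal sketch; the table name and the explain helper are hypothetical, and the message texts are shortened paraphrases of the meanings given above.

```python
from typing import Dict, Tuple

# (return code, reason code) -> shortened meaning, paraphrased from the text above
OLC_REASONS: Dict[Tuple[int, int], str] = {
    (0x00000010, 0x0000410D):
        "Prepare already done; only INITIATE OLC PHASE(COMMIT) or TERMINATE OLC is allowed.",
    (0x00000010, 0x0000410E):
        "Prepare not done; a prepare must complete successfully before a commit.",
    (0x00000010, 0x0000410F):
        "Already committed, so it cannot be terminated; issue another commit command.",
    (0x00000014, 0x0000510C):
        "Another RM process step is in progress; after a COMMIT timeout, try COMMIT again.",
}


def explain(return_code: int, reason_code: int) -> str:
    """Format a return/reason code pair with its meaning, if it is in the table."""
    prefix = f"RC X'{return_code:08X}' RSN X'{reason_code:08X}'"
    text = OLC_REASONS.get((return_code, reason_code))
    if text is None:
        return f"{prefix}: not one of the codes added by this APAR"
    return f"{prefix}: {text}"
```

A caller would pass the numeric return and reason codes taken from the command response; how those are extracted from the response is outside the scope of this sketch.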
    
        INIT OLC error handling
        Add a new state to the list:
        * One IMS in commit phase 3 failed state
          If an INITIATE OLC PHASE(COMMIT) command fails during
          commit phase 3 because the COMMIT command master did not
          receive OK responses from the other IMSs, commit phase 3
          fails and the master exits with an error, leaving itself
          in an online change state.
          Another INITIATE OLC PHASE(COMMIT) command routed to
          the master, perhaps with a longer command timeout
          value, is needed to complete the commit phase 3
          cleanup of online change information.
    
        QUERY MEMBER
        Status for QUERY MEMBER command
        Add the following new status:
        Status     Scope      Meaning
        OLCCMT3I   LCL or GBL Online change commit phase 3 is in
                              progress.  An INITIATE OLC
                              PHASE(COMMIT) command is entered.
                              Online change commit phase 3 is in
                              progress either locally for this IMS
                              or globally for all the IMSs in the
                              IMSplex.  The online change is
                              committed, but commit phase 3 is
                              needed to clean up online change
                              information on all the IMSs.
    
        OLCCMT3C   GBL        Online change commit phase 3
                              completed.  An
                              INITIATE OLC PHASE(COMMIT) command is
                              entered.  Online change commit phase 3
                              is completed globally on the other
                              IMSs except for the master.   The
                              COMMIT master still needs to perform
                              commit phase 3 locally.  The online
                              change is committed, but commit phase
                              3 is still needed to clean up the
                              online change information on all the
                              IMSs.
    
        OLCCMT3F   GBL        Online change commit phase 3 failed.
                              An INITIATE OLC PHASE(COMMIT) command
                              is entered.  Online change commit
                              phase 3 failed globally on the other
                              IMSs.  The master skips attempting to
                              perform commit phase 3 locally and
                              exits with an error, leaving itself
                              in a global online change state.
                              The other IMSs may or may not have
                              actually completed commit phase 3.
                              Issue another COMMIT command to the
                              previous COMMIT command master to
                              complete the online change.
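
As an illustration of how the new statuses might be consumed, the sketch below maps each status to a short description and flags the one case where the text above calls for another COMMIT. The names are hypothetical, and the input is assumed to be a list of status strings already extracted from QUERY MEMBER output.

```python
from typing import Dict, Iterable, List

# Shortened descriptions, paraphrased from the status table above
OLC_COMMIT3_STATUS: Dict[str, str] = {
    "OLCCMT3I": "commit phase 3 in progress (locally or globally)",
    "OLCCMT3C": "commit phase 3 completed on the other IMSs; "
                "the master still has to perform it locally",
    "OLCCMT3F": "commit phase 3 failed globally; "
                "the master is still in a global online change state",
}


def describe(statuses: Iterable[str]) -> List[str]:
    """Describe the commit phase 3 statuses found in the input, ignoring others."""
    return [f"{s}: {OLC_COMMIT3_STATUS[s]}" for s in statuses if s in OLC_COMMIT3_STATUS]


def recommit_needed(statuses: Iterable[str]) -> bool:
    """OLCCMT3F is the status for which another COMMIT to the master is advised."""
    return "OLCCMT3F" in set(statuses)
```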
    
    
      DFSCMDRR is changed to add new online change
      reason codes 410D (olc prepare already done),
      410E (olc prepare not done),
      410F (olc already committed).
    
      DFSMWA is changed to add bits indicating commit phase 3
      in progress globally, commit phase 3 complete globally,
      and commit phase 3 in progress locally.
      It uses an available word to contain a non-zero
      completion code received from another IMS.
      It uses an available 8 bytes to save the online
      change phase eyecatcher.
      It uses an available word to record the number of IMSs
      participating in the online change.
    
      DFSIQ060 is changed to put out new global status
      OLCCMT3I (commit phase 3 in progress globally),
      OLCCMT3C (commit phase 3 completed globally),
      OLCCMT3F (commit phase 3 failed globally),
      and new local status
      OLCCMT3I (commit phase 3 in progress locally).
    
      DFSOLC00 is changed to set bits indicating commit phase 3
      in progress globally, commit phase 3 complete globally,
      commit phase 3 in progress locally, commit phase 3 complete
      locally in the MWA.
      Logic is added to allow commit to proceed even if the OLCSTAT
      wasn't locked, provided that commit phase 3 failed previously,
      so that commit phase 3 can be retried.
      It also saves the online change phase as an 8-byte
      eyecatcher in the MWA (PREPARE, COMMIT1, COMMIT2, COMMIT3,
      QSCMBR, CPYMBR, CPYEND, CMTMBR1, CMTMB2).
      It also saves the RM process return code, reason code,
      and a non-zero completion code from the last IMS that
      detected an error in the MWA.
      It also saves the command return code and reason
      code to be returned to the user for INIT OLC PHASE(PREPARE),
      INIT OLC PHASE(COMMIT), and TERMINATE OLC in the MWA.
      TERMINATE OLC is changed to delete the MWA, if the terminate
      failed because at least 1 other IMS was already in an online
      change committed state.
      The reason code 4110 (IRSN_NOOLC) means that the
      command is not applicable to the online change state, which
      is not very helpful.  Most of the code that issues this
      reason code is changed to issue the new, more helpful
      reason codes 410D (OLC prepare already done),
      410E (OLC prepare not done), and 410F (OLC already committed).
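
The 8-byte phase eyecatcher mentioned above can be pictured as follows when reading a dump. This is a sketch under assumptions: the text only says the field is 8 bytes, so the blank padding and the EBCDIC code page (cp037) are guesses, and the helper names are hypothetical. The phase names are copied as listed above.

```python
# Phase names exactly as listed in the text above
OLC_PHASES = ("PREPARE", "COMMIT1", "COMMIT2", "COMMIT3",
              "QSCMBR", "CPYMBR", "CPYEND", "CMTMBR1", "CMTMB2")


def encode_eyecatcher(phase: str) -> bytes:
    """Pad a phase name to 8 characters with blanks and encode it as EBCDIC.
    Blank padding and code page cp037 are assumptions, not taken from the APAR."""
    if phase not in OLC_PHASES:
        raise ValueError(f"unknown online change phase: {phase!r}")
    return phase.ljust(8).encode("cp037")


def decode_eyecatcher(raw: bytes) -> str:
    """Turn an 8-byte eyecatcher from a dump back into a readable phase name."""
    if len(raw) != 8:
        raise ValueError("the eyecatcher field is 8 bytes long")
    return raw.decode("cp037").rstrip()
```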
    
      DFSOLC20 is changed to not set the completion code in the MWA
      if the failure was in getting the MWA storage itself.  It is
      set in the command reason code field instead, so that the
      completion code 60 (ICC_GETMAIN) is returned in the
      command output instead of blanks.
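
The corrected behavior can be sketched as below. This is not the DFSOLC20 logic itself: the Mwa class, the prepare function, and the allocator callback are hypothetical names, and only the value 60 (ICC_GETMAIN) comes from the text above.

```python
from typing import Callable, Optional, Tuple

ICC_GETMAIN = 60  # completion code meaning the storage could not be obtained


class Mwa:
    """Hypothetical stand-in for the online change control block."""

    def __init__(self) -> None:
        self.completion_code = 0


def prepare(get_mwa: Callable[[], Optional[Mwa]]) -> Tuple[int, Optional[Mwa]]:
    """Return (completion code for the command output, MWA or None)."""
    mwa = get_mwa()
    if mwa is None:
        # Do not store anything into the MWA here: it was never obtained.
        # Report the failure through the command output instead.
        return ICC_GETMAIN, None
    return 0, mwa
```

Calling prepare with an allocator that returns None models the storage failure: the caller sees 60 rather than blanks, and nothing is written through a missing control block.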
    
      DFSORCT0 is changed to add reason codes 410D, 410E, 410F and
      to change the 4110 reason code text from
      "not in online change state"
      to
      "not applicable to online change state".
      In some cases, the reason code text "not in an online
      change state" appears when IMSs are in an online
      change state, so this is confusing.
      It is really trying to say that the command does
      not apply to the online change state of the IMS,
      which may also include not being in an online change
      state.
      Reason code 410D means that online change prepare has
      already been done.
      Reason code 410E means that online change prepare has
      not been done, so the commit command is not applicable.
      Reason code 410F means that the online change is
      already committed and cannot be terminated.  At least
      one IMS wasn't able to perform commit phase 2 or 3
      and is still in an online change state.  Another
      COMMIT command is needed to complete the online change.
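
The three new reason codes correspond to three command and state combinations described above. The sketch below models only those combinations; the state and function names are hypothetical, and every other combination returns None rather than guessing at what IMS does.

```python
from enum import Enum
from typing import Optional


class OlcState(Enum):
    NOT_PREPARED = "not prepared"
    PREPARED = "prepared"
    COMMITTED = "committed"


def new_reason_code(command: str, state: OlcState) -> Optional[int]:
    """Return the new reason code for the cases named in the text, else None."""
    if command == "PREPARE" and state is OlcState.PREPARED:
        return 0x410D  # online change prepare has already been done
    if command == "COMMIT" and state is OlcState.NOT_PREPARED:
        return 0x410E  # online change prepare has not been done
    if command == "TERMINATE" and state is OlcState.COMMITTED:
        return 0x410F  # already committed, so it cannot be terminated
    return None  # not modeled in this sketch
```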
    **** PE08/12/09 FIX IN ERROR. SEE APAR PK77200 FOR DESCRIPTION
    

APAR Information

  • APAR number

    PK52215

  • Reported component name

    IMS V10

  • Reported component ID

    5635A0100

  • Reported release

    010

  • Status

    CLOSED UR1

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2007-09-04

  • Closed date

    2008-02-21

  • Last modified date

    2009-01-15

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UK33912

Modules/Macros

  • DFSCMDRR DFSIQ060 DFSMWA   DFSOLC00 DFSOLC20
    DFSORCT0
    

Publications Referenced
SC18970000    

Fix information

  • Fixed component name

    IMS V10

  • Fixed component ID

    5635A0100

Applicable component levels

  • R010 PSY UK33912

       UP08/02/29 P F802

Document Information

Modified date:
15 January 2009