A fix is available
APAR status
Closed as new function.
Error description
1. Global online change diagnostic information is very limited. This is especially true when the error occurs during COMMIT3 and some IMSs have completed online change and no longer have an online change control block (MWA).
2. A TERMINATE OLC issued after the OLCSTAT is updated cannot terminate the OLC. The completion code returned to the user is not specific about the situation or about the actions the user needs to take.

The purpose of this APAR is to:
- Add diagnostic information to help identify the problems better.
- Add messages, completion codes, or return codes that inform the user of the actions to take in case of errors during the online change process.
- Add an indication if IMS is in COMMIT3.
- Add the COMMIT3 status to QUERY MEMBER.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: IMS V10 global online change users.          *
****************************************************************
* PROBLEM DESCRIPTION: A global online change that times out   *
*                      during commit phase 3 doesn't show      *
*                      enough information to diagnose          *
*                      what happened or what action to         *
*                      take next.                              *
****************************************************************
* RECOMMENDATION: INSTALL CORRECTIVE SERVICE FOR APAR/PTF      *
****************************************************************
A global online change was attempted. The commit appeared to hang. The dump of the IMS showed that the commit master sent commit phase 3 to the other IMS but never got a response back. The other IMS is no longer in an online change state, which implies that commit phase 3 worked on the other IMS.

The command timeout value shown in the dump is 20 seconds. The IMS command master uses this same timeout value when sending the commit phase 3 process step to RM to coordinate with the other IMSs. It is possible in this situation that the response didn't get back to RM within the timeout value, so that RM notified the IMS commit master of an error and the commit master quit, leaving itself in an online change state. If the IMS commit master detects an error in the commit phase 3 step sent to the other IMSs, it keeps the online change control block (MWA) and skips doing commit phase 3 locally. This situation could be resolved with another COMMIT command and a longer command timeout value.

There weren't enough diagnostics in the dump to tell exactly what the problem was. This APAR adds more diagnostics to online change.

Additional problem: If an IMS fails online change prepare because it couldn't get an MWA, it returns a completion code of blanks instead of 60, which means that it couldn't GETMAIN storage. IMS could also abend, because it stores the error completion code in the MWA control block, which was never obtained.
Additional problem: If an IMS commit master fails during commit phase 3 because the responses from the other IMSs didn't make it back to RM before they timed out, and a TERMINATE OLC command is then routed to one of those other IMSs that are no longer in an online change state, the online change can be neither committed nor terminated. The IMSs that still hold online change information are stuck in an online change state.
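The commit phase 3 timeout scenario described above can be sketched as a small model. This is an illustrative simulation only, not IMS code: the `Member` and `CommitMaster` classes, the `response_delay` field, and the idea of a fixed per-member delay are all assumptions made for the example. Only the 20-second timeout, the MWA being kept on failure, and the retry with a longer timeout come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    """Hypothetical stand-in for one non-master IMS in the IMSplex."""
    name: str
    response_delay: float  # seconds until its phase 3 reply reaches RM (assumed)
    in_olc: bool = True    # still in an online change state?


@dataclass
class CommitMaster:
    """Hypothetical stand-in for the IMS that masters the COMMIT command."""
    name: str
    others: List[Member]
    has_mwa: bool = True                 # online change control block still held
    status: Optional[str] = "OLCCMT3I"   # commit phase 3 in progress

    def commit_phase3(self, timeout: float) -> bool:
        all_replied = True
        for member in self.others:
            # The other IMS performs commit phase 3 whether or not its
            # reply gets back to RM within the command timeout.
            member.in_olc = False
            if member.response_delay > timeout:
                all_replied = False
        if not all_replied:
            # The master skips local phase 3 and keeps its MWA.
            self.status = "OLCCMT3F"
            return False
        # All replies arrived: the master finishes phase 3 locally.
        self.has_mwa = False
        self.status = None
        return True


other = Member("IMS2", response_delay=25.0)
master = CommitMaster("IMS1", [other])
print(master.commit_phase3(timeout=20.0), master.status, other.in_olc)
print(master.commit_phase3(timeout=60.0), master.status, master.has_mwa)
```

The first call models the hang seen in the dump: the other IMS has left the online change state, but the master still holds its MWA. The second call models the suggested recovery, another COMMIT with a longer command timeout.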
Problem conclusion
Temporary fix
Comments
POSTREQ PK77200

COMMAND REFERENCE Volume 1 SC18-9700-00

INITIATE OLC Command Return and Reason Code table

Add return code X'00000010' reason code X'0000410D' with the following meaning:
Online change prepare has already been done. Another prepare command is not allowed. The two commands that are allowed when IMS is in this state are INITIATE OLC PHASE(COMMIT) or TERMINATE OLC.

Add return code X'00000010' reason code X'0000410E' with the following meaning:
Online change prepare has not been done. An online change prepare command must complete successfully before a commit command is attempted.

Add return code X'00000010' reason code X'0000410F' with the following meaning:
The online change is already committed, so it cannot be terminated. However, some kind of error occurred that prevented the commit from fully completing. For example, one of the IMSs participating in the online change was unable to complete commit phase 2 or commit phase 3, or an IMS response to RM timed out and the commit master could not determine whether the online change commit succeeded on the other IMSs. Another commit command is required to complete the online change commit.

Add return code X'00000014' reason code X'0000510C' with the following meaning:
Another RM process step is in progress. If this reason code is returned after a previous COMMIT command timed out, try the COMMIT command again.

INIT OLC error handling

Add a new state to the list:
* One IMS in commit phase 3 failed state
If an INITIATE OLC PHASE(COMMIT) command fails during commit phase 3 because the COMMIT command master did not receive OK responses from the other IMSs, commit phase 3 fails and the master exits with an error, leaving itself in an online change state. Another INITIATE OLC PHASE(COMMIT) command routed to the master, perhaps with a longer command timeout value, is needed to complete the commit phase 3 cleanup of online change information.
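For automation that inspects command responses, the four return and reason code pairs added above could be kept in a lookup table such as the following. This is a minimal sketch: the `CodeInfo` type, the `NEW_OLC_CODES` name, and the `explain` helper are hypothetical, and the text is paraphrased from the meanings listed above rather than taken from any IMS interface.

```python
from typing import NamedTuple


class CodeInfo(NamedTuple):
    meaning: str
    next_step: str


# Keys are (return code, reason code) pairs as documented in this APAR.
NEW_OLC_CODES = {
    (0x00000010, 0x0000410D): CodeInfo(
        "Online change prepare has already been done.",
        "Issue INITIATE OLC PHASE(COMMIT) or TERMINATE OLC.",
    ),
    (0x00000010, 0x0000410E): CodeInfo(
        "Online change prepare has not been done.",
        "Complete a prepare command successfully before attempting commit.",
    ),
    (0x00000010, 0x0000410F): CodeInfo(
        "The online change is already committed and cannot be terminated.",
        "Issue another commit command to complete the online change commit.",
    ),
    (0x00000014, 0x0000510C): CodeInfo(
        "Another RM process step is in progress.",
        "If a previous COMMIT command timed out, try the COMMIT command again.",
    ),
}


def explain(return_code: int, reason_code: int) -> str:
    """Return a one-line explanation, or a fallback for codes not listed here."""
    info = NEW_OLC_CODES.get((return_code, reason_code))
    if info is None:
        return "Not one of the codes added by this APAR."
    return f"{info.meaning} {info.next_step}"


print(explain(0x10, 0x410F))
```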
QUERY MEMBER

Status for QUERY MEMBER command

Add the following new status:

Status    Scope       Meaning
OLCCMT3I  LCL or GBL  Online change commit phase 3 is in progress. An
                      INITIATE OLC PHASE(COMMIT) command is entered.
                      Online change commit phase 3 is in progress either
                      locally for this IMS or globally for all the IMSs
                      in the IMSplex. The online change is committed, but
                      commit phase 3 is needed to clean up online change
                      information on all the IMSs.
OLCCMT3C  GBL         Online change commit phase 3 completed. An
                      INITIATE OLC PHASE(COMMIT) command is entered.
                      Online change commit phase 3 is completed globally
                      on the other IMSs except for the master. The COMMIT
                      master still needs to perform commit phase 3
                      locally. The online change is committed, but commit
                      phase 3 is still needed to clean up the online
                      change information on all the IMSs.
OLCCMT3F  GBL         Online change commit phase 3 failed. An
                      INITIATE OLC PHASE(COMMIT) command is entered.
                      Online change commit phase 3 failed globally on the
                      other IMSs. The master skips attempting to perform
                      commit phase 3 locally and exits with an error,
                      leaving itself in a global online change state. The
                      other IMSs may or may not have actually completed
                      commit phase 3. Issue another COMMIT command to the
                      previous COMMIT command master to complete the
                      online change.

DFSCMDRR is changed to add new online change reason codes 410D (OLC prepare already done), 410E (OLC prepare not done), and 410F (OLC already committed).

DFSMWA is changed to add bits indicating commit phase 3 in progress globally, commit phase 3 complete globally, and commit phase 3 in progress locally. It uses an available word to contain a non-zero completion code received from another IMS. It uses an available 8 bytes to save the online change phase eyecatcher. It uses an available word to record the number of IMSs participating in the online change.
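The three QUERY MEMBER statuses listed above lend themselves to a simple check for whether another COMMIT command is needed. The sketch below assumes the status names have already been extracted from the command output into a list of strings; it does not parse real QUERY MEMBER output, and the `COMMIT3_STATUS` table and `commit_retry_needed` function are hypothetical names.

```python
# Scope and a short summary for each status added by this APAR.
COMMIT3_STATUS = {
    "OLCCMT3I": ({"LCL", "GBL"}, "commit phase 3 is in progress"),
    "OLCCMT3C": ({"GBL"}, "commit phase 3 completed on the other IMSs; "
                          "the master still has to run it locally"),
    "OLCCMT3F": ({"GBL"}, "commit phase 3 failed globally; "
                          "the master is still in an online change state"),
}


def commit_retry_needed(statuses):
    """True if the statuses show a failed commit phase 3.

    In that case another COMMIT command should be issued to the previous
    COMMIT command master, as described for OLCCMT3F above.
    """
    return "OLCCMT3F" in statuses


def describe(status):
    """Return 'STATUS (scopes): summary', or a note for unrelated statuses."""
    entry = COMMIT3_STATUS.get(status)
    if entry is None:
        return f"{status}: not a commit phase 3 status"
    scopes, summary = entry
    return f"{status} ({' or '.join(sorted(scopes))}): {summary}"


print(describe("OLCCMT3F"))
print(commit_retry_needed(["OLCCMT3F"]))
```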
DFSIQ060 is changed to put out new global status OLCCMT3I (commit phase 3 in progress globally), OLCCMT3C (commit phase 3 completed globally), OLCCMT3F (commit phase 3 failed globally), and new local status OLCCMT3I (commit phase 3 in progress locally).

DFSOLC00 is changed to set bits in the MWA indicating commit phase 3 in progress globally, commit phase 3 complete globally, commit phase 3 in progress locally, and commit phase 3 complete locally. Logic is added to allow commit to proceed even if the OLCSTAT wasn't locked, provided that commit phase 3 failed previously, so that commit phase 3 can be retried. It also saves the online change phase as an 8-byte eyecatcher in the MWA (PREPARE, COMMIT1, COMMIT2, COMMIT3, QSCMBR, CPYMBR, CPYEND, CMTMBR1, CMTMB2). It also saves in the MWA the RM process return code, reason code, and a non-zero completion code from the last IMS that detected an error. It also saves in the MWA the command return code and reason code to be returned to the user for INIT OLC PHASE(PREPARE), INIT OLC PHASE(COMMIT), and TERMINATE OLC. TERMINATE OLC is changed to delete the MWA if the terminate failed because at least one other IMS was already in an online change committed state. The reason code 4110 (IRSN_NOOLC) means that the command is not applicable to the online change state, which is not very helpful. Most of the code that issues this reason code is changed to issue the new, more helpful reason codes 410D (OLC prepare already done), 410E (OLC prepare not done), and 410F (OLC already committed).

DFSOLC20 is changed to not set the completion code in the MWA if the failure was in getting the MWA storage itself. The completion code is set in the command reason code field instead, so that the completion code 60 (ICC_GETMAIN) is returned in the command output instead of blanks.

DFSORCT0 is changed to add reason codes 410D, 410E, and 410F and to change the 4110 reason code text from "not in online change state" to "not applicable to online change state".
In some cases, the reason code text "not in an online change state" appears when IMSs are in an online change state, so this is confusing. It is really trying to say that the command does not apply to the online change state of the IMS, which may also include not being in an online change state. Reason code 410D means that online change prepare has already been done. Reason code 410E means that online change prepare has not been done, so the commit command is not applicable. Reason code 410F means that the online change is already committed and cannot be terminated. At least one IMS wasn't able to perform commit phase 2 or 3 and is still in an online change state. Another COMMIT command is needed to complete the online change.

**** PE08/12/09 FIX IN ERROR. SEE APAR PK77200 FOR DESCRIPTION
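The move from the generic reason code 4110 to the more specific 410D, 410E, and 410F described in the comments above can be pictured as a choice based on the command and the current online change state. The following is a rough sketch of that idea, not the DFSOLC00 logic: the `OlcState` enum, the command strings, the set of accepted combinations, and the use of 0 for "accepted" are assumptions, as is the fallback to 4110 for combinations the text does not mention.

```python
from enum import Enum


class OlcState(Enum):
    NOT_IN_OLC = "not in online change"
    PREPARED = "prepare done"
    COMMITTED = "committed, commit phase 2 or 3 incomplete on some IMS"


# Combinations the text calls out with a specific new reason code.
_SPECIFIC = {
    ("PREPARE", OlcState.PREPARED): 0x410D,    # prepare already done
    ("COMMIT", OlcState.NOT_IN_OLC): 0x410E,   # prepare not done
    ("TERMINATE", OlcState.COMMITTED): 0x410F, # already committed
}

# Combinations assumed to be accepted (reason code 0 in this sketch).
_ALLOWED = {
    ("PREPARE", OlcState.NOT_IN_OLC),
    ("COMMIT", OlcState.PREPARED),
    ("COMMIT", OlcState.COMMITTED),   # retry of a failed commit
    ("TERMINATE", OlcState.PREPARED),
}


def pick_reason_code(command: str, state: OlcState) -> int:
    """Return 0 if accepted, a specific code if known, else generic 4110."""
    key = (command, state)
    if key in _ALLOWED:
        return 0
    return _SPECIFIC.get(key, 0x4110)


print(hex(pick_reason_code("TERMINATE", OlcState.COMMITTED)))
```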
APAR Information
APAR number
PK52215
Reported component name
IMS V10
Reported component ID
5635A0100
Reported release
010
Status
CLOSED UR1
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2007-09-04
Closed date
2008-02-21
Last modified date
2009-01-15
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK33912
Modules/Macros
DFSCMDRR DFSIQ060 DFSMWA DFSOLC00 DFSOLC20 DFSORCT0
| SC18970000 |
Fix information
Fixed component name
IMS V10
Fixed component ID
5635A0100
Applicable component levels
R010 PSY UK33912
UP08/02/29 P F802
Document Information
Modified date:
15 January 2009