A fix is available
APAR status
Closed as program error.
Error description
The SCI on the remote system (the system that had the zombie member in its list) was suspended for about 3 seconds. During that gap, the zombie member registered, entered a command, and deregistered. When the remote SCI resumed, it processed the registration messages sent by the other SCI, but it did not receive the deregister notification (which uses the XCF group notification) because XCF threw the notification away. SCI needs to send a deregistration notification to authorized IMSplex members and to the remote SCIs rather than depend on the XCF group notification.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* All IMS 14 SCI users.                                        *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* SCI misses a remote deregistration due to XCF not driving    *
* the notify exit for the event.                               *
****************************************************************
* RECOMMENDATION:                                              *
* INSTALL CORRECTIVE SERVICE FOR APAR/PTF                      *
****************************************************************
SCI depends on the XCF group exit to notify authorized members that other authorized members have deregistered (both normal and abnormal). However, XCF will occasionally throw a notification away if XCF determines it doesn't need to be sent. For example, if a member JOINs and LEAVEs the XCF group while another member of the XCF group is suspended, XCF may not drive the group exit for the LEAVE event since it never drove the group exit for the JOIN event.
SCI sends a message to the other SCIs when an authorized member registers with SCI, but it depends on XCF to notify the other SCIs that an authorized member deregisters (SCI also depends on XCF to drive the group exit to notify other authorized members of the deregister event). If XCF does not drive the group exit for a LEAVE event, SCI does not know that the member deregistered. It will continue to retain the member in its routing tables, continue to route messages and requests that are destined for that member, and return the member (if appropriate) in the output for CSLSCQRY services.
Note that the missing XCF group notifications have only been reported for remote authorized members, not for local authorized members.
PI45509 and PI46287 added code that enables SCI to tolerate this condition. With the two APARs applied, SCI will detect and resolve differences in member lists between SCIs in the same IMSplex. However, those APARs did not fix the base problem that led to the condition in the reported case.
This APAR resolves that problem.
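The failure sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class RemoteSci, its methods, and the member name MBR1 are hypothetical and are not IMS or XCF interfaces. The model assumes that SCI-to-SCI messages are queued while the remote SCI is suspended, and that an XCF LEAVE event occurring during the suspension is never delivered.

```python
class RemoteSci:
    """Hypothetical model of a remote SCI and its routing table."""

    def __init__(self):
        self.routing_table = set()
        self.suspended = False
        self.pending = []  # SCI-to-SCI messages queued while suspended

    def receive_sci_message(self, kind, member):
        # Modeling assumption: messages sent by another SCI are queued
        # during a suspension and processed after resume.
        if self.suspended:
            self.pending.append((kind, member))
        else:
            self._apply(kind, member)

    def xcf_leave_event(self, member):
        # Modeling assumption: a LEAVE that happens while this SCI is
        # suspended is thrown away and never drives the group exit.
        if self.suspended:
            return False
        self._apply("deregister", member)
        return True

    def resume(self):
        self.suspended = False
        for kind, member in self.pending:
            self._apply(kind, member)
        self.pending.clear()

    def _apply(self, kind, member):
        if kind == "register":
            self.routing_table.add(member)
        elif kind == "deregister":
            self.routing_table.discard(member)


def run_scenario(send_deregister_message):
    """Member registers and deregisters while the remote SCI is suspended."""
    remote = RemoteSci()
    remote.suspended = True
    remote.receive_sci_message("register", "MBR1")
    remote.xcf_leave_event("MBR1")  # discarded in this model
    if send_deregister_message:
        # Secondary path: an explicit SCI-to-SCI deregistration message.
        remote.receive_sci_message("deregister", "MBR1")
    remote.resume()
    return remote.routing_table
```

In this model, run_scenario(False) leaves MBR1 in the routing table, which corresponds to the zombie member in the reported case, while run_scenario(True) leaves the table empty.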
Problem conclusion
This APAR adds a secondary path to send notifications of authorized member deregistration to remote IMSplex members. This includes both the processing of the deregistration by the remote SCIs and notifying remote authorized members of the deregistration event. The primary path for the remote deregistration will continue to use the XCF group exit to send the deregistration event notification to remote members.
SCI notifies other SCIs in the IMSplex about a non-authorized member deregistration by sending a message over XCF to the other SCIs in the IMSplex. Each SCI notifies all local members about the deregistration.
With this APAR applied, SCI will do the following when a local authorized member deregisters:
- Hold up processing the deregistration until SCI's XCF group exit is driven or approximately four seconds have elapsed.
- Send a message over XCF to the other SCIs, informing them of the deregistration.
When this message is received, the receiving SCI will either:
- Ignore the message if its XCF group exit has been driven for this deregistration.
- Process the deregistration if its XCF group exit has not been driven within approximately four seconds.
Additional RAS items in this APAR:
- The request timeout interval is changed from 10 to 5 seconds. This will decrease SCI shutdown time and bring it in line with the other CSL address spaces.
- The ASID of the registering member is now passed to the remote SCIs. This provides additional diagnostic data on the remote members of the IMSplex.
- The correct AWE function is now passed to CSLSRGS0 for remote deregistration events. This will not change what is passed in the member notify exits, but it will ensure the correct function is traced and set in the Member History Table.
- The RFML (ReFresh Member List) trace code (x'82') is now decoded when formatted in a dump.
- The Member History Table entry formatting now includes the registration time. If the table has no entries, a message is now printed instead of an empty entry.
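The handling of a received deregistration message can be sketched as a bounded wait. This is a simplified model, not SCI code: the function handle_dereg_message and its parameters are hypothetical, and the four waits of about one second each are an assumption taken from the approximately four second limit stated in this APAR.

```python
def handle_dereg_message(group_exit_driven, max_waits=4, wait=lambda: None):
    """Decide what a receiving SCI does with a deregistration message.

    group_exit_driven: callable that returns True once the XCF group exit
        has been driven for this deregistration (hypothetical hook).
    max_waits: number of retry intervals; four intervals of about one
        second each approximate the four second limit.
    wait: callable invoked once per retry interval; a real caller would
        sleep or reschedule here.
    Returns "ignore" when the primary (XCF) path already handled the
    event, or "process" when the secondary path has to handle it.
    """
    for _ in range(max_waits):
        if group_exit_driven():
            return "ignore"
        wait()
    # Final check after the last interval has elapsed.
    return "ignore" if group_exit_driven() else "process"
```

For example, handle_dereg_message(lambda: False) returns "process" after four waits, and handle_dereg_message(lambda: True) returns "ignore" without waiting. Passing the wait step in as a callable keeps the decision logic separate from the timer mechanism.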
Macros Changed
==============
CSLSANCH
  Change request timeout from 10 seconds to 5 seconds. This decreases SCI shutdown time and brings it in line with the other CSL address spaces.
CSLSAWXI
  Add a byte that indicates the source of the AWE.
CSLSCMBR
  Add a flag that indicates CSLSXGP0 was driven.
CSLSCPRG
  Add the mapping for the CSLSSR40 parameter list.
CSLSRMBR
  Add a flag that indicates CSLSXGP0 was driven.
CSLSTRC
  Add trace codes for RMBR disposition in CSLSHRX0 and enqueue to CSLSXCM0.
CSLSXCMG
  Add ASID to the registration message.

Modules Changed
===============
CSLSDR20
  Obtain a cell and use it for a new parameter list that is passed to CSLSSR40.
CSLSFM00
  Print registration time with the Member History Table entry. Print message instead of empty entry when table is empty.
CSLSFM1F
  Recompile to pick up CMBR and RMBR changes.
CSLSHRX0
  Set a flag that indicates the disposition of the RMBR block. Trace the module flag byte and three bytes of RMBR_CSWORD.
CSLSRGS0
  If this is for a remote deregistration, ensure the correct function is set. Send a deregistration message for authorized members in addition to non-authorized members. If SCI's XCF group exit has not been driven, drive the notify exits of all members. Pass ASID in registration message. Pull ASID from message for remote registrations.
CSLSRM20
  Obtain a cell and use it for a new parameter list that is passed to CSLSSR40.
CSLSSR40
  Write a trace entry at the start of the module. Set the AWXI source before enqueueing the AWE.
CSLSTFM0
  Add decode information for trace code x'82'.
CSLSTOT0
  Set the AWXI source before enqueueing the AWE.
CSLSXCM0
  Add a function that processes deregistration AWEs. The new function does the following:
  - Look up the member in the token hash table. If not found, release the AWE.
  - If the member is non-authorized, enqueue the AWE to CSLSRGS0.
  - If the member is authorized and SCI's XCF group exit has been driven or the AWE has been given to BPETIMER four times, either release the AWE (remote member) or enqueue the AWE to CSLSRGS0 (local member). Otherwise, give the AWE to the BPETIMER services to wait one second and reenqueue the AWE to CSLSXCM0.
CSLSXGP0
  If this is a deregistration event and it is for a local member, set the 'CSLSXGP0 driven' flag in the CMBR. Set the AWXI source before enqueueing the AWE for remote members.
CSLSXMG0
  For authorized member deregistration messages, enqueue the AWE to CSLSXCM0. Consolidate the AWE enqueue error paths.

The following publication updates are made by this APAR:

IMS Version 14: Exit Routines SC19-4217-00
In Chapter 14, CSL SCI Notify Client exit routine, in the note at the end of the description at the start of this section, add the following paragraph:

In rare cases, authorized members may receive two notifications of an authorized member deregistration. If XCF notifications are delayed for some reason, SCI has a secondary path for authorized member deregistration that may be used. If the XCF notification then occurs, a second notification may be sent.
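Because an authorized member may now receive two notifications for the same deregistration, a client may want its handling of the event to be idempotent. The following is a minimal sketch of that idea only. The class MemberTracker and its methods are hypothetical and do not represent the Notify Client exit routine interface or its parameter list.

```python
class MemberTracker:
    """Hypothetical client-side bookkeeping of active IMSplex members."""

    def __init__(self):
        self.active = set()
        self.cleanups = 0  # counts how often cleanup work actually ran

    def on_register(self, member):
        self.active.add(member)

    def on_deregister(self, member):
        """Handle a deregistration notification.

        Returns True if cleanup ran, or False if the member was already
        gone, as happens when a second notification arrives for the
        same deregistration.
        """
        if member not in self.active:
            return False
        self.active.remove(member)
        self.cleanups += 1
        return True
```

With this approach, a second call to on_deregister for the same member returns False and performs no work, so a delayed XCF notification arriving after the secondary path has no further effect on the client's state.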
Temporary fix
Comments
APAR Information
APAR number
PI59893
Reported component name
IMS V14
Reported component ID
5635A0500
Reported release
400
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2016-03-29
Closed date
2017-01-23
Last modified date
2017-02-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
PI74814 UI44138
Modules/Macros
CSLSXCM0 CSLSRGS0 CSLSTOT0 CSLSFM1F CSLSSR40 CSLSXGP0 CSLSTFM0 CSLSXMG0 CSLSTRC CSLSHRX0 CSLSFM00 CSLSDR20 CSLSRM20 CSLSPLR0
Fix information
Fixed component name
IMS V14
Fixed component ID
5635A0500
Applicable component levels
R400 PSY UI44138
UP17/01/27 P F701
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
01 December 2023