A fix is available
APAR status
Closed as program error.
Error description
When running in the SMAPI server environment, the SMAPI worker servers (VSMWORK1, VSMWORK2, VSMWORK3, and so on) frequently ABEND with a protection exception. For example, VSMWORK2 ABENDs with:

23:27:18 DMSITP144T Protection exception occurred at 8114B4EC in routine RXDMSSIT while UFDBUSY = 01; re-IPL CMS
23:27:18 * MSG FROM VSMWORK2: DMSDIE3550I All APPC/VM and IUCV paths have been severed.
23:27:18 HCPMFS057I VSMWORK2 not receiving; disconnected
23:27:18 DMSWSP314W Automatic re-IPL by CP due to disabled wait; PSW 000A0000 80F3BAE4

After one VSMWORKn machine ABENDs, all other VSMWORKn machines ABEND the same way within a few seconds of each other, until none are working properly. The ABENDs occur about every three days and hit all VSMWORKn machines at nearly the same time.

At the time of the ABEND, SFS server VMSERVS (for filepool VMSYS), which is running with 64M of virtual storage, runs out of storage and processes storage reclamation. The protection exception occurs in DMSJCM (SFS Cache Update) when the code is handling Invalidate CNRs. When the SFS server ran out of storage, it sent Invalidate CNRs to the CMS client (VSMWORKn) to process. During that processing in DMSJCM, field DCHFSTIU (count of FSTs in use) in the current hyperblock becomes corrupted with a very large number (X'FFFFFFFF'). The ABEND does not occur at this time, however. It occurs the next time the SFS server runs out of storage and sends out the Invalidate CNRs. The corrupted value in field DCHFSTIU causes the code to attempt to write into the CMS nucleus, which causes the protection exception.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All SFS and SMAPI users.                     *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
****************************************************************
* RECOMMENDATION: APPLY PTF                                    *
****************************************************************

CMS clients who have an SFS directory accessed may experience a Protection Exception under certain conditions.

The CMS client maintains a cache (in the form of FSTs) representing the files in the directory that is built when the directory is accessed. As changes are made to files in the directory, the file pool server maintains a record of the update and will send the update when the client interacts with the file pool. If the client is idle, the list of changes can build up in the file pool server, consuming virtual storage. If eventually the file pool server runs out of storage, it will start a reclamation process, part of which is to send "Invalidate" records to all idle clients who have a directory accessed.

It is the processing of these "Invalidates" that is defective and can lead to an overwrite of the CMS nucleus and the Protection Exception.
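The build-up and reclamation described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not the file pool server's actual logic: the class name, the `capacity` threshold, and every method name are hypothetical, and real storage accounting is far more involved.

```python
class FilePoolServerModel:
    """Toy model (hypothetical names): queued updates for idle clients."""

    def __init__(self, capacity):
        self.capacity = capacity       # assumed limit on queued update records
        self.pending = {}              # client name -> queued update records
        self.sent_invalidates = []     # clients that were sent an Invalidate

    def access_directory(self, client):
        self.pending[client] = []

    def used(self):
        return sum(len(queue) for queue in self.pending.values())

    def record_change(self, change):
        # Every client with the directory accessed gets the update queued.
        for queue in self.pending.values():
            queue.append(change)
        if self.used() > self.capacity:
            self.reclaim()

    def client_request(self, client):
        # An active client picks up its queued updates, freeing the storage.
        updates, self.pending[client] = self.pending[client], []
        return updates

    def reclaim(self):
        # Out of storage: drop the queues and tell idle clients to revalidate.
        for client, queue in self.pending.items():
            if queue:
                queue.clear()
                self.sent_invalidates.append(client)


server = FilePoolServerModel(capacity=4)
server.access_directory("VSMWORK1")
server.access_directory("VSMWORK2")
for change in ("change 1", "change 2", "change 3"):
    server.record_change(change)   # both clients stay idle throughout
```

In this model the third change pushes the queued total past the assumed limit, so both idle clients receive an Invalidate at the same moment. That is consistent with all VSMWORKn machines failing within seconds of each other.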
Problem conclusion
DMSJCM is responsible for processing the "Invalidate" records. The "Invalidate" records represent the metadata for all FSTs that are currently in the directory. DMSJCM's job is to reconcile this data against the backlevel FST data that the CMS client currently has. It does so by first locating each HyperBlock for the accessed directory. Each HyperBlock contains an array of FSTs that can fit in one 4K page, but there can be empty slots where files have been erased. DMSJCM steps through the FSTs in each HyperBlock to see if the FST needs updating.

Before this fix, DMSJCM used the counter DCHFSTIU, which indicates how many active FSTs there are in a HyperBlock, to decide how many entries in that HyperBlock to process. It did not consider that there could be empty slots. This led to marking an empty slot for subsequent removal, and ultimately DCHFSTIU became negative (X'FFFFFFFF'). On the next invocation of DMSJCM for Invalidate processing, the bad DCHFSTIU counter led to an overlay of a bit in X'FFFFFFFF' locations, eventually spilling into the area occupied by the CMS nucleus.

This APAR corrects the logic in DMSJCM to skip over empty slots in the HyperBlocks and only reconcile valid FST entries.
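The effect of counting slots instead of valid entries can be illustrated with a simplified model. The sketch below is Python, not the DMSJCM source: the `HyperBlock` class, the function names, and the file names are hypothetical, and the exact sequence by which the real counter went below zero is an assumption made for illustration (here, a later ordinary erase of an FST that the buggy scan never reached).

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit counter such as DCHFSTIU


class HyperBlock:
    """Hypothetical model: a fixed array of FST slots plus an in-use count."""

    def __init__(self, slots):
        self.slots = list(slots)   # file name, or None for an empty slot
        self.in_use = sum(name is not None for name in self.slots)


def erase(block, name):
    # Ordinary erase path: empty the slot and decrement the counter.
    block.slots[block.slots.index(name)] = None
    block.in_use = (block.in_use - 1) & MASK32


def reconcile_buggy(block, current_files):
    # Assumed defect: scan only the first `in_use` slots and treat an
    # empty slot like a stale FST that must be removed.
    doomed = [i for i in range(block.in_use)
              if block.slots[i] not in current_files]
    for i in doomed:
        block.slots[i] = None
        block.in_use = (block.in_use - 1) & MASK32


def reconcile_fixed(block, current_files):
    # Corrected idea: visit every slot, skip empty ones, and only
    # remove (and count) FSTs that really exist.
    for i, name in enumerate(block.slots):
        if name is None:
            continue
        if name not in current_files:
            block.slots[i] = None
            block.in_use -= 1


# One empty slot ahead of one valid FST; the file still exists on the server.
buggy = HyperBlock([None, "PROFILE EXEC"])
reconcile_buggy(buggy, {"PROFILE EXEC"})   # empty slot "removed": count is 0
erase(buggy, "PROFILE EXEC")               # count wraps to X'FFFFFFFF'

fixed = HyperBlock([None, "PROFILE EXEC"])
reconcile_fixed(fixed, {"PROFILE EXEC"})   # count stays 1
erase(fixed, "PROFILE EXEC")               # count drops to 0
```

In the buggy path the counter ends far larger than the number of slots in the block, so a later scan bounded by it would run well past the end of the 4K page. In the fixed path the counter always matches the number of occupied slots.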
Temporary fix
Comments
APAR Information
APAR number
VM65802
Reported component name
VM CMS
Reported component ID
568411201
Reported release
630
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2016-03-10
Closed date
2017-01-09
Last modified date
2017-04-28
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UM35006 UM35007
Modules/Macros
DMSJCM
Fix information
Fixed component name
VM CMS
Fixed component ID
568411201
Applicable component levels
Fix is available
Document Information
Modified date:
28 April 2017