A fix is available
APAR status
Closed as program error.
Error description
The CNMCSSIR task is stuck, which prevents messages from being sent to the NETLOG.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of Tivoli NetView for z/OS V6R1.
****************************************************************
* PROBLEM DESCRIPTION: Receiving of all unsolicited MVS messages
* by NetView could stop for one or both of the following reasons:
* 1. If any MVS address space or task is canceled or otherwise
*    abnormally terminated during WTO processing.
* 2. If the rate of WTOs being issued is very high.
* Also, when MVSPARM.Msg.Automation is set to No, occasionally
* the CNMCSSIR task's CPU time jumps to very high usage for a
* brief time and then goes back to zero.
****************************************************************
* RECOMMENDATION:
****************************************************************

The explanation for the first problem above is as follows:

When NetView code running in its SSI exit gets control because of a WTO, DOM or command, it first allocates a number of segments from the Canzlog data space (using a Compare and Swap instruction) and then starts filling them in. Part of the data filled in is the eye-catcher, the number of segments in the current block, and the number of segments in the previous block.

The code under task CNMCSSIR in the NetView address space scans for Canzlog entries to pick up, recognizing the eye-catcher and moving forward the number of segments indicated. However, if the unit of work issuing the WTO, DOM or command abnormally terminates after the block is reserved but before any data is written to it, the code under CNMCSSIR has no way of knowing how many segments are in the current block, so it cannot continue to the next one. As a result, processing of Canzlog records does not continue.
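The stall described above can be illustrated with a small model. This is only a sketch under stated assumptions, not NetView code: the eye-catcher value `CZ`, the 8-byte segment size, the header layout, and the function names `write_block` and `scan` are all hypothetical. It shows a scanner that advances by the segment count in each block header and stops at the first block whose header was never written.

```python
EYE = b"CZ"   # hypothetical eye-catcher, not the real Canzlog value
SEG = 8       # hypothetical segment size in bytes

def write_block(space, seg_index, nsegs, prev_nsegs):
    """Fill in the header of the block that starts at seg_index."""
    off = seg_index * SEG
    space[off:off + 2] = EYE
    space[off + 2] = nsegs        # segments in this block
    space[off + 3] = prev_nsegs   # segments in the previous block

def scan(space, seg_index, limit):
    """Walk forward block by block; return (block starts seen, segment where the scan stopped)."""
    seen = []
    while seg_index < limit:
        off = seg_index * SEG
        if bytes(space[off:off + 2]) != EYE:
            break                 # no header: block length unknown, cannot advance
        seen.append(seg_index)
        seg_index += space[off + 2]
    return seen, seg_index

space = bytearray(16 * SEG)
write_block(space, 0, 2, 0)   # block A: segments 0-1, fully written
# Block B: segments 2-4 were reserved, but the writer terminated before filling them in.
write_block(space, 5, 1, 3)   # block C: segment 5, written by another unit of work
seen, stopped_at = scan(space, 0, limit=6)
print(seen, stopped_at)
```

In this model block C is valid, but the scanner never reaches it because it cannot step over the zeroed block B.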
The explanation for the second problem above is as follows:

When code running under the DSILOGMT task in the NetView address space runs on its maintenance timer, it zeroes out a few 8 megabyte plots in the Canzlog data space ahead of where new messages are being written. It skips the next plot after the current segment, assuming that the rate of WTOs would not be high enough to overtake where the zeroing is taking place. However, if the code under DSILOGMT is suspended long enough and the rate of WTOs is high enough, the code could zero out messages that were just written. This could show up as a very large area of zeroes or several smaller areas of zeroes. This also causes CNMCSSIR processing to stall, as described above.

Also, when MVSPARM.Msg.Automation is set to No, the CNMCSSIR task wakes up only to process commands across the SSI. Very many messages could be issued while it is idle. So, when it does wake up, it scans every entry in Canzlog from where it left off up to the current entry. This can be seen as very high CPU usage for that time frame.
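The zeroing race can be pictured with a short deterministic model. It is a sketch only: the plot size of 4 segments, the choice of two plots to clear, and the names `plots_to_zero` and `zero_plots` are assumptions made for illustration and are not taken from DSILOGMT. The maintenance step picks its target plots, is then "suspended" while a burst of writes passes those plots, and finally clears them using the stale decision.

```python
PLOT = 4   # hypothetical plot size in segments (the APAR describes 8 megabyte plots)

def plots_to_zero(current_segment, count=2):
    """Skip the plot right after the current one, then select `count` plots to clear."""
    current_plot = current_segment // PLOT
    first = current_plot + 2
    return list(range(first, first + count))

def zero_plots(space, plots):
    """Clear every segment in the given plots."""
    for p in plots:
        for s in range(p * PLOT, (p + 1) * PLOT):
            space[s] = 0

space = [0] * (8 * PLOT)
writer = 1                        # segment the writers have reached
targets = plots_to_zero(writer)   # maintenance task decides what to clear
# The maintenance task is suspended here while a burst of messages arrives.
for s in range(writer, 11):
    space[s] = 1                  # 1 marks a segment that holds message data
writer = 11
zero_plots(space, targets)        # task resumes and clears using the stale decision
lost = [s for s in range(1, writer) if space[s] == 0]
print(targets, lost)
```

Segments 8 through 10 were written during the burst and then cleared, which corresponds to the areas of zeroes described above.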
Problem conclusion
For the first problem, the code to reserve and initialize a block from the Canzlog data space is being changed to use a PLO (Perform Locked Operation) instruction instead of a CS (Compare and Swap) instruction. This allows the code to reserve the block and fill in the information that the CNMCSSIR task needs to continue, all in the same locked process. Therefore, it is not possible to reserve a block and have it contain zeroes. Since there were 3 places in NetView code to reserve and initialize a Canzlog block, a new module, named DSI4LGBL, is being created to provide a common place for this function. The 3 places in NetView code are being changed to call this new module.

For the second problem, module DSI4LCM1 is being changed to no longer zero plots ahead of the current segment. This is possible because of the use of the PLO instruction described above. In addition, it will recognize when writing has surpassed where it is currently looking, and adjust the plot map of the active Canzlog data space accordingly.

Also, to keep CNMCSSIR caught up to the latest activity in Canzlog even when MVSPARM.Msg.Automation is set to No, a new ECB is being defined in the DSIGIT (which takes the place of an internal one that CNMCSSIR had been using). This ECB will be posted by DSILOGMT when it wakes up every 11 seconds to perform Canzlog maintenance.
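The idea behind the first change can be sketched as follows. This is an illustrative model, not the DSI4LGBL implementation: the class `BlockLog`, its fields, and the use of a Python `threading.Lock` as a stand-in for the atomicity that PLO provides are all assumptions. The point it shows is that when the reservation and the header write happen in one locked step, a scanner can always step over a block, even if the block body is never written.

```python
import threading

class BlockLog:
    """Toy log in which reserving a block and writing its header are one locked step."""

    def __init__(self, nsegs):
        self.headers = [None] * nsegs   # per segment: (nsegs, prev_nsegs) or None
        self.next_free = 0
        self.last_len = 0
        self._lock = threading.Lock()   # stand-in for the atomicity PLO provides

    def reserve(self, nsegs):
        """Reserve nsegs segments and record the block header atomically."""
        with self._lock:
            start = self.next_free
            self.headers[start] = (nsegs, self.last_len)
            self.next_free = start + nsegs
            self.last_len = nsegs
            return start

    def scan(self):
        """Return the start segments of all blocks reachable from segment 0."""
        found, seg = [], 0
        while seg < self.next_free and self.headers[seg] is not None:
            found.append(seg)
            seg += self.headers[seg][0]
        return found

log = BlockLog(16)
a = log.reserve(2)
b = log.reserve(3)   # imagine this writer terminates right after reserve()
c = log.reserve(1)
print(log.scan())
```

Because every reserved block carries its segment count from the moment it exists, the scan reaches the block at segment 5 even though the body of the block at segment 2 was never filled in. The sketch omits capacity checks for brevity.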
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
OA40854
Reported component name
NETVIEW FOR Z/O
Reported component ID
5697NV600
Reported release
10B
Status
CLOSED PER
PE
No
HIPER
Yes
Special Attention
No
Submitted date
2012-11-16
Closed date
2013-02-22
Last modified date
2013-04-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UA68123
Modules/Macros
CNMCSSIR CNMCSSIW DSI4LA2Z DSI4LCM1 DSI4LGBL DSI4LSIT DSI4LWLM DSI4LZLO HNV610BJ
Fix information
Fixed component name
NETVIEW FOR Z/O
Fixed component ID
5697NV600
Applicable component levels
R10B PSY UA68123 UP13/03/08 P F303
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 April 2013