
ZWSTECHNOTE : Must gather ( MUSTGATHER / DATAGATHER ) collection for z Workload Scheduler ( zWS ) cases

Question & Answer


Question

What documentation should be sent to zWS support to help debug and troubleshoot a problem as quickly as possible?

Cause

Documentation is needed to analyze and determine the cause of various types of problems.
For a general discussion of the different types of problems that can occur, and how to collect the correct set
of documentation for each, refer to the zWS Diagnosis Guide and Reference, chapter 3, "Problem analysis procedures".
For the 10.1 release, this is available at https://www.ibm.com/docs/en/SSGSPN_10.1.0/eqqd1mst.pdf

Answer

One of the major causes of delays in resolving zWS issues is not having a complete set of documentation available for zWS support to review, especially if the problem is not easily re-creatable.
Some proactive configuration can make it more likely that a complete set of documentation can be collected when needed.
PROACTIVE steps to make sure key documentation is not lost and to prevent some common problems:
 
(A) Make sure EQQMLOGs are not overwritten (DASD) or lost (SYSOUT not archived). If EQQMLOG is written to
a single DASD file (not using SWITCHMLOG, which uses 2 files), then a step must be added before the first step
in the task (for example, EXEC PGM=EQQMAJOR) to copy the EQQMLOG that was previously in use, for example
to a GDG, or to a SYSOUT that is then archived.
 
(B) Keep sufficient TRACKLOG (EQQTROUT) data. The EQQTROUT file can be kept for as long as needed because
it is MODed onto (appended to) each time. EQQTROUT can also be copied to a GDG. For example, if you want to know why
a job did not track to completion one month ago, you must have the TRACKLOG information stored for at least one month.

(C) Make sure valid SYSMDUMPs are allocated for each task and for key batch jobs (CP EXTEND, LT EXTEND); see section "DUMPS"
below.

(D) Have JCL ready and tested to back up all the controller files (see section "APARTAPE" below).

(E) If a task will not shut down, take a dump before cancelling it.

(F) If a performance issue is suspected, take several dumps or run an appropriate diagnosis tool (STROBE, INTUNE, OMEGAMON, RMF).

(G) Make sure JTARC is allocated very large (it can be as large as a full DASD volume, but not MULTI-VOLUME).

(H) In the CP EXTEND JCL, make sure that the work file EQQDIN is large enough. This file needs to be larger to handle additional
days of current plan, for example over a long holiday.

(I) If you have an automation/autoperations product, make sure it is set up to handle key messages such as EQQZ045E (subtask
failed) so that a procedure to copy documentation can be invoked.

(J) Make sure that there is a //SYSOUT DD SYSOUT=* in the controller/server JCL. For more details on the correct SYSOUT definition, see this technote: https://www.ibm.com/support/pages/node/6195353

(K) Tersed files must be sent or attached in binary (BIN) mode with an extension of .trs so that they are automatically untersed
when received.
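
The copy step described in (A) might look like the following minimal sketch. It assumes a controller started task whose EQQMLOG is a single, already existing DASD data set, on an SMS-managed system (so no GDG model DSCB is needed). All names shown (procedure ZWSC, data sets ZWS.CNTL.MLOG and ZWS.CNTL.MLOG.HIST) and the LIMIT, UNIT and SPACE values are hypothetical; adapt them to your naming conventions and JCL standards. The IDCAMS step is run once (with your site's JOB statement) to define the GDG base; the IEBGENER step becomes the new first step of the procedure, ahead of the existing EXEC PGM=EQQMAJOR step.

```jcl
//* ONE-TIME STEP: DEFINE THE GDG BASE (HYPOTHETICAL NAME AND LIMIT)
//DEFGDG   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE GENERATIONDATAGROUP -
         (NAME(ZWS.CNTL.MLOG.HIST) -
          LIMIT(30) -
          SCRATCH NOEMPTY)
/*
//*
//* CONTROLLER PROCEDURE (HYPOTHETICAL NAME ZWSC)
//ZWSC     PROC
//* NEW FIRST STEP: SAVE THE EQQMLOG FROM THE PREVIOUS RUN AS A NEW
//* GDG GENERATION BEFORE THE CONTROLLER OPENS AND OVERWRITES IT.
//* NO DCB IS CODED ON SYSUT2, SO IEBGENER USES THE RECORD FORMAT
//* AND RECORD LENGTH OF SYSUT1.
//SAVEMLOG EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT1   DD DISP=SHR,DSN=ZWS.CNTL.MLOG
//SYSUT2   DD DSN=ZWS.CNTL.MLOG.HIST(+1),
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE)
//* THE EXISTING EXEC PGM=EQQMAJOR STEP FOLLOWS UNCHANGED; ITS
//* EQQMLOG DD STILL POINTS TO ZWS.CNTL.MLOG.
```

A similar IEBGENER copy to a GDG can be used to keep the EQQTROUT data described in (B).
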
When a problem occurs
As soon as a problem has been identified, it is best to collect a complete set of documentation (see section "APARTAPE" below).
For any problem, the EQQMLOG of every involved task (controller, tracker, server, etc.) is almost
always useful. You may add the name of the EQQMLOG file(s) to the list of files backed up by the attached JCL (see section "APARTAPE"). If EQQMLOG is written to SYSOUT, the SDSF "XDC" command can be used to copy the data into a regular DASD file.
For problems that occur when using the zWS dialog, screen displays showing the steps leading up to the problem are always helpful (see section "SCREEN PRINTS" below).

Likewise, if a dump is taken, either written to the SYSMDUMP allocated to a task or as a result of a SLIP command that was set up, the dump should always be sent in (see section "DUMPS" below).
See section "Specific features" below for additional technotes that help with data collection for specific features of zWS.

If you have questions about sending documentation to a case, review INFO APAR II13819 (HOW TO SEND CUSTOMER DOCUMENTATION FOR CASES): https://www.ibm.com/support/pages/apar/II13819

APARTAPE

For a complete set of zWS documentation, a set of JCL that you can modify to match your data set naming conventions and JCL standards is attached below as ZWS.APARTAPE.JCL.TXT.

zws.apartape.jcl.txt
twsz.apartape.jcl.txt

Note that this JCL is preferable to, and more complete than, the "APAR TAPE" skeleton EQQAPARS JCL used by the dialog option 9.9 ("APAR TAPE") function. Check the "NOTES" in the JCL. There are also sections for files that are ONLY required for specific types of problems (for instance E2E), which you can remove if desired. However, having TOO MUCH documentation is never a problem, whereas
having too little documentation may cause a delay while waiting for the problem to occur again.

Specific features
(1) MIGRATION TO A NEW RELEASE of zWS 
https://www.ibm.com/support/pages/node/213015  ZWSTECHNOTE :  MIGRATION TO NEW RELEASE - BEST PRACTICES (  zWS  z Workload Scheduler ) 
There is a section at the end of this technote: "Documentation to collect if the migration is NOT successful".
(2) RESTAPI
https://www.ibm.com/support/pages/node/6557054  ZWSTECHNOTE : Documentation required for RESTAPI related cases
(3) HISTORICAL REPORTING (DB2)
https://www.ibm.com/support/pages/node/293951   ZWSTECHNOTE : Tracing to help resolve issues with DB2 historical reporting batch archive job
(4) ZCENTRIC / DYNAMIC AGENT / SHADOW JOBS
https://www.ibm.com/support/pages/node/6195353  ZWSTECHNOTE : Setting up a DIAGNOSE trace in z Workload Scheduler  

(5) DWC on z/OS
https://www.ibm.com/support/pages/node/7085607  ZWSTECHNOTE : DWCZOS : Documentation to provide for issues involving DWC running on z/OS
(6) AGENTS OR DWC RUNNING ON A NON-Z/OS OPERATING SYSTEM
The following datagather technote may be used 
(7) WAPL
A WAPL trace can be added to the SYSIN of a WAPL job:
  OPTIONS MSGLEVEL(5) TRACE(3) 
For example:                        
//SYSIN    DD *                                                    
OPTIONS MSGLEVEL(5) TRACE(3)       
LOADDEF * DATA(-)                                                  
LIST AD ADID(APPL1)                                                
/*
(8) PIF ISSUES
Get a PIF trace of the failing scenario. The PIF trace is set up by including the parameter PIFTRACE(40) in the SERVOPTS statement
of the zWS server task, and requires that this server task have a SYSOUT DD (//SYSOUT DD SYSOUT=*) in addition to the EQQMLOG DD. A CEEOPTS DD can also be added to ensure that MSGFILE is set to SYSOUT. See this technote for more information about the CEEOPTS MSGFILE: https://www.ibm.com/support/pages/node/6195353
After the PIF trace has been obtained, remove the PIFTRACE(40) parameter from SERVOPTS and restart the server task.
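
For the PIF trace, the server procedure changes might look like the sketch below. Only the added DD statements are shown; the EXEC statement and the existing DD statements of your server step stay as they are. The CEEOPTS data set and member name (ZWS.PARMLIB(SRVCEE)) are hypothetical; the member is assumed to contain the single Language Environment run-time option MSGFILE(SYSOUT).

```jcl
//* DD STATEMENTS TO ADD TO THE EXISTING SERVER STEP
//SYSOUT   DD SYSOUT=*
//* LE RUN-TIME OPTIONS; HYPOTHETICAL MEMBER CONTAINING:
//*   MSGFILE(SYSOUT)
//CEEOPTS  DD DISP=SHR,DSN=ZWS.PARMLIB(SRVCEE)
//*
//* IN THE SERVER PARAMETER MEMBER, ADD PIFTRACE(40) TO THE EXISTING
//* SERVOPTS STATEMENT, LEAVING ITS OTHER KEYWORDS UNCHANGED.
```
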
(9) ISPF DIALOG ISSUES
https://www.ibm.com/support/pages/node/659451 ZWSTECHNOTE : CAPTURE A DUMP of an ABEND or a TRACE in the z Workload Scheduler ISPF dialog

(10) E2E (FAULT-TOLERANT AGENTS, FTA)
A tar -cvzf of the WRKDIR should be provided if the zFS file containing the WRKDIR is not included via the JCL
attached above. Also provide the controller and E2E server EQQMLOGs and a tersed copy of the EQQTWSIN and EQQTWSOU files.
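
As an illustration, the E2E files could be collected in one batch job along the lines of this sketch: BPXBATCH runs the tar command shown above, and AMATERSE tersing (PARM=SPACK) produces the tersed copies, named with a TRS suffix so they can be sent as .trs files in binary mode. The WRKDIR path /var/TWS/inst, the archive name, and all data set names are hypothetical; add your site's JOB statement.

```jcl
//* STEP 1: ARCHIVE THE E2E WORK DIRECTORY (HYPOTHETICAL PATHS)
//TARWRK   EXEC PGM=BPXBATCH,
//  PARM='SH tar -cvzf /tmp/zws_wrkdir_archive /var/TWS/inst'
//STDOUT   DD SYSOUT=*
//STDERR   DD SYSOUT=*
//* STEP 2: TERSE EQQTWSIN (HYPOTHETICAL DATA SET NAMES)
//TERSEIN  EXEC PGM=AMATERSE,PARM=SPACK
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DISP=SHR,DSN=ZWS.CNTL.TWSIN
//SYSUT2   DD DSN=USERID.ZWS.TWSIN.TRS,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,50),RLSE)
//* STEP 3: TERSE EQQTWSOU (HYPOTHETICAL DATA SET NAMES)
//TERSEOU  EXEC PGM=AMATERSE,PARM=SPACK
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DISP=SHR,DSN=ZWS.CNTL.TWSOU
//SYSUT2   DD DSN=USERID.ZWS.TWSOU.TRS,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,50),RLSE)
```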

The output of the last CP EXTEND, REPLAN or Symphony Renew job should be included as this will
show the current TOPOLOGY and also may include some important error messages.

(11) RESTART AND CLEANUP (RCL)
For "restart and cleanup" issues, the controller and datastore EQQMLOGs should be provided.
If the problem involves a specific job (for example, the joblog cannot be parsed, or the restart of the job appears to have been done incorrectly), provide all the joblogs for that job: the ORIGINAL run of the job plus any RESTARTs of the job (these usually contain the EQQCLEAN proc). Sometimes it will be necessary to include the PKI, SKI, SDF and UDF files as well; these files are included in the JCL attached above.
(12) RECENT MAINTENANCE APPLIED
If any PTFs or APAR fixes have been applied recently and could be related to the problem, please list them,
or provide an SMPE report showing the zWS maintenance applied. A report like this can be run against
the SMPE target zone:
//XREF     EXEC PGM=GIMSMP,REGION=0M                        
//SMPCSI   DD DSN=SERVICE.ZWS.V10R1.SMPE.CSI,DISP=SHR       
//SMPCNTL  DD *                                             
  SET      BOUNDARY (TZWSA1)                                
                  OPTIONS(Z038OPT) .                        
   LIST SYSMODS FORFMID(HWSZA10,JWSZA12,JWSZA13,JWSZA1B)    
        NOSUP  XREF.                                        
/*                                                          

This example is for the 10.1 release but can be modified (FORFMID list) for other releases.
Additional sections referenced above:

SCREEN PRINTS:

If you need to gather some screen displays to help clarify a zWS problem, first issue the command
PANELID on the command line so that the panel name is displayed. This helps identify exactly which panel is in use when several panels have similar contents. Also, while formats like JPEG images and Word documents can be sent via email or attached to the case, they cannot be copied into the case text,
so the best method of sending a screen display is to CUT and PASTE the screen
contents into a simple TXT document (for example, with Notepad). This makes it easy to copy the screen contents
into the case text if needed.
Please also include a description of the sequence of actions and inputs, matching the panel flow shown in the gathered screen prints.

DUMPS:

If an abend occurs in a zWS task or batch job, it is always a good idea to have SYSMDUMP
allocated in the JCL as a DASD data set with RECFM=FBS, LRECL=4160, BLKSIZE=4160 and SPACE=(CYL,(30,30)) or larger.
Do not allocate SYSABEND or SYSUDUMP, as these produce formatted dumps, which are not useful
for zWS debugging purposes. If you have dump management software, it may be necessary to
add a special DD statement to the JCL to get a normal SYSMDUMP instead of a formatted dump (see technote:
https://www.ibm.com/support/pages/node/389979 ( ZWSTECHNOTE : z Workload Scheduler ( zWS ) ABENDS ARE NOT CREATING a SYSMDUMP ))
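
A one-time allocation of a SYSMDUMP data set with these attributes could be done with IEFBR14, as in this sketch. The data set name ZWS.CNTL.SYSMDUMP and the UNIT value are hypothetical; add your site's JOB statement, and increase SPACE if the data set is to hold several dumps.

```jcl
//* ONE-TIME ALLOCATION OF A SYSMDUMP DATA SET (HYPOTHETICAL NAME)
//ALLOCDMP EXEC PGM=IEFBR14
//DUMPDS   DD DSN=ZWS.CNTL.SYSMDUMP,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(30,30)),
//            DCB=(RECFM=FBS,LRECL=4160,BLKSIZE=4160)
```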

If you need to get a dump of the controller while it is still running (for example in the case of a hang or
loop) the following procedure may be used:

DUMP COMM=(reason for taking dump)
R xx,JOBNAME=(ZZZZ),CONT
R xx,DSPNAME=('ZZZZ'.*),CONT
R xx,SDATA=(COUPLE,ALLNUC,LPA,LSQA,PSA,RGN,SQA,TRT,CSA,GRSQ,XESDATA,WLM),END
where:
xx is the reply ID of the outstanding DUMP prompt.
ZZZZ is the job name of the controller.

Note that the dataspaces are currently only needed if the problem involves dynamic critical path,
but it does not hurt to include them even if critical path is not being used. The same dump commands
without the DSPNAME parameter may be used for any other zWS task or batch job.

Because some problems cause MULTIPLE dumps to be taken (and in almost all cases the first dump
is the most important one for analysis), the SYSMDUMP DD may be specified with DISP=MOD to allow
multiple dumps to be written to the same dump data set (this of course requires a larger SPACE
allocation for the SYSMDUMP data set).
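
In the task or batch job JCL, such a DD statement might look like this sketch, which assumes the dump data set already exists with the attributes described in this section; the data set name is hypothetical.

```jcl
//* SYSMDUMP DD FOR A ZWS TASK OR BATCH JOB (HYPOTHETICAL DATA SET NAME)
//* DISP=MOD ALLOWS MULTIPLE DUMPS TO BE WRITTEN TO THE SAME DATA SET
//SYSMDUMP DD DISP=MOD,DSN=ZWS.CNTL.SYSMDUMP
```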
 
When a SLIP dump is needed, see:
https://www.ibm.com/support/pages/node/7085863  ZWSTECHNOTE : Obtaining SLIP dumps for z Workload Scheduler abends or messages issued to the z/OS SYSLOG      

If you received an abend but did not capture a dump, the dump symptoms recorded in the
JOBLOG may still be useful.
 


Document Information

Modified date:
01 December 2023

UID

swg21973165