Troubleshooting data for performance problems in CICS TS

CICS® MustGather for performance problems

Performance problems tend to fall into one of two common categories:

Poor response time - tasks fail to start running at all or take a long time to complete. Both symptoms contribute to your perception that CICS is running slowly.
Increased CPU time - either the end user paying the bill is complaining that CPU costs have gone up, or someone has noticed that they are using more CPU than before.

A defect in CICS or another related product could be the cause of your problem.

Performance problems can also result from your system being badly tuned or operating near the limits of its capacity. In this case, you will probably notice that the problem is worst at peak system load times, typically at mid-morning and mid-afternoon.

If you find that performance degradation is not dependent on system loading, but happens sometimes when the system is lightly loaded, a poorly designed transaction could be the cause.

CICS Services: IBM® support is responsible for helping you when there is a defect in an IBM product. If you have a new application that has never worked successfully, or if IBM support determines that your problem is outside the scope of what they do, you can contact your IBM sales representative. The sales team can initiate a call with IBM Lab Services, who can help you tune CICS or make changes to your applications to reduce resource demand. They can also help you determine if you have insufficient or poorly configured hardware resources.

Before you gather any documentation

Answer the following performance questions to help support determine the nature of the problem (the more detailed and precise your answers, the better):

What is the problem (CPU increase or response time increase, specific transaction or all transactions...)?
How much of an increase in CPU or response time are you seeing?
What is the scope and business impact (is it an overall slowdown, are all transactions affected, or are only a few transactions affected, perhaps a single application)?
Does the problem occur at a specific time in the day (peak hours, intermittently, or continuously)?
Did something change when the problem started? If so, what changed?
- Application programs
- Workload (increase in number of transactions)
- CICS configuration (JCL, SIT, or resource definitions)
- Other software like MVS™, Db2®, IMS, or OEM vendor software
- Maintenance applied to CICS, MVS, VTAM®, LE, OEM vendor software, or other products that interface with CICS
- Hardware (processor, DASD, LPAR configuration, network, new NCP or I/O configuration)

IBM Support encourages you to review your performance reports and is very interested in your analysis. Screen shots or output from tooling are welcome.

Gather the following diagnostic information before contacting the CICS support team to troubleshoot your performance problems.

Required data:

Complete CICS job log (that includes at minimum JESMSGLG, MSGUSR, and CEEMSG) back to the startup of the CICS region. System Management Facilities (SMF) writes message IEF374I to the JESYSMSG log for the job at step termination. This message contains overall CPU time (TCB) and SRB accumulated time for the address space. Other helpful information like error messages might also be in the CICS job log.
MVS system dump of your CICS region taken during the time of the problem. If possible, also provide an MVS system dump of the same region during a time of equivalent workload when the problem is not occurring.
SMF110 records from all LPARs involved in the problem. If possible, the SMF110 records should span from about 15 minutes before the problem started to about 15 minutes after the problem ended. If that is not feasible, send about 1 hour of SMF110 records starting from about 15 minutes before the problem started.
- Monitoring data - SMF type 110 subtype 0001 records (Performance and Exception data) show resource usage by individual transactions. There are several utilities available for you to process CICS monitoring data.
- Statistics data - SMF type 110 subtype 0002 - 0005 records show system-wide resource usage. There are several utilities that you can use to process CICS statistics.
Printed detail summary reports of software records written to SYS1.LOGREC. To print these records, use the EREP service aid. See Step 9: Generating Detail Reports for Software Records in the EREP User's Guide for more information.
If the performance problem is confined to one LPAR, MVS system log (SYSLOG) containing messages for the LPAR leading up to and including the time when the problem is happening (or the entire day).
If the problem spans multiple LPARs, the merged sysplex-wide system message log, also known as the operations log (OPERLOG), from the time leading up to and including the time when the problem is happening (or the entire day).
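The SMF110 collection window described in the list above can be worked out with a few lines of date arithmetic. The following is only a sketch of that calculation: the function name `smf110_window` and the convention of passing no end time to select the 1-hour fallback are assumptions made for illustration, not part of any CICS or SMF utility.

```python
from datetime import datetime, timedelta

# Margins taken from the guidance above: about 15 minutes on either side
# of the problem, or about 1 hour in total when the full span is not feasible.
MARGIN = timedelta(minutes=15)
FALLBACK_SPAN = timedelta(hours=1)


def smf110_window(problem_start, problem_end=None):
    """Return (start, end) of the SMF110 interval to extract.

    Hypothetical helper: pass problem_end=None to select the fallback of
    about 1 hour of records starting 15 minutes before the problem began.
    """
    window_start = problem_start - MARGIN
    if problem_end is None:
        return window_start, window_start + FALLBACK_SPAN
    if problem_end < problem_start:
        raise ValueError("problem_end must not be earlier than problem_start")
    return window_start, problem_end + MARGIN


if __name__ == "__main__":
    # Illustrative times only.
    start, end = smf110_window(datetime(2024, 1, 15, 10, 0),
                               datetime(2024, 1, 15, 10, 40))
    print("Extract SMF110 records from", start, "to", end)
```

The resulting start and end times can then be supplied to whichever SMF dump or reporting utility you use to select the records.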

Optional data:

CPU - CPU Activity report that includes SMF type 70 subtype 1 (CPU activity) records leading up to and including the time of increased CPU consumption (a few 15-minute intervals before and during the problem). This report will show you if CPU capping is occurring due to limits that have been set.
RMF Monitor III VSAM data sets covering the time leading up to and including the time when the problem is occurring. See Sending data sets to a different system for instructions on how to use the CLIST ERBV2S, which is supplied with RMF, to unload the VSAM data sets to a sequential data set for transport.
If the problem is short-lived and predictable, output from the STAT transaction (DFH0STAT) for the CICS address space. Run STAT just before the problem occurs and after the problem ends.

Tip: Review the logs and performance data generated during the period of time you are seeing the performance problem. Compare this data with data from the same CICS region during a time of equivalent workload when the problem is not occurring.
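When you make this comparison, it helps to state the change as a percentage, because support asks how much of an increase in CPU or response time you are seeing. The sketch below assumes you have already extracted per-interval averages with your own reporting tool. The metric names and the sample values are purely illustrative and do not come from any real system.

```python
def percent_change(baseline, problem):
    """Percentage change from the baseline value to the problem value."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (problem - baseline) / baseline * 100.0


def compare_intervals(baseline, problem):
    """Compare two dicts of metric name -> average value.

    Only metrics present in both intervals are compared.
    """
    return {
        name: percent_change(baseline[name], problem[name])
        for name in baseline
        if name in problem
    }


if __name__ == "__main__":
    # Hypothetical averages for the same region at equivalent workload.
    good_period = {"avg_cpu_seconds": 0.5, "avg_response_seconds": 2.0}
    bad_period = {"avg_cpu_seconds": 0.75, "avg_response_seconds": 5.0}
    for metric, change in compare_intervals(good_period, bad_period).items():
        print(f"{metric}: {change:+.1f}%")
```

Reporting the baseline value, the problem value, and the percentage together gives support a precise picture of the degradation.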