Diagnosing a performance problem
In this example of a performance problem, learn how to use the available information for diagnosis.
Information about the diagnostic tools is covered in other Performance Guidance topics.
The problem: average response time considerably impacted when the workload increased
In this example, a workload was sending requests to IBM® z/OS® Connect, which were then routed to CICS®. Initially, the workload was seen to scale well. However, when the workload increased from 300 to 400 clients, and then from 400 to 500 clients, although the TPS remained acceptable, the average response time was considerably affected. Investigation was required to identify the cause of the performance impact at the larger workloads.
- A single IBM z/OS Connect server was running with an IPIC connection configured with 100 sessions.
- The LPAR had 2 GCPs and 2 zIIPs, all dedicated.
- The workload started with 100 clients, and increased over time to 500 clients in increments of 100.
- Each client sent a 100-byte JSON request and received a 32K response.
- The think time was configured to be 200 ms in the workload simulator. This value represented the delay between a simulated client receiving a response and sending its next request to IBM z/OS Connect. The think time was configured to meet the required TPS.
- As the workload increased, the average response time was expected to gradually increase if queuing was required at various points in the workflow.
Workload (# clients) | Transactions Per Second (TPS) | Average response time |
---|---|---|
100 | ~500 | 1.06 ms |
200 | ~1000 | 1.13 ms |
300 | ~1500 | 1.38 ms |
400 | ~1960 | 3.32 ms |
500 | ~2420 | 5.03 ms |
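The measured throughput can be sanity-checked against the workload model above. In a closed-loop simulator, each client completes one request every (think time + response time), so TPS is roughly clients / (think + response). A minimal sketch using the 200 ms think time and the response times from the table:

```python
def expected_tps(clients, think_ms, response_ms):
    """Closed-loop throughput estimate: each client completes one
    request every (think_ms + response_ms) milliseconds."""
    return clients * 1000.0 / (think_ms + response_ms)

# Values from the table above; results track the measured
# ~500, ~1500, and ~2420 TPS closely.
for clients, resp in [(100, 1.06), (300, 1.38), (500, 5.03)]:
    print(f"{clients} clients -> ~{expected_tps(clients, 200, resp):.0f} TPS")
```

This confirms the TPS figures are consistent with the think time, and shows why TPS stayed acceptable even as response time degraded: the 200 ms think time dominates the cycle.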
While running with a workload of 500 clients, the following investigation was performed:
1. The Java™ heap size and usage were checked by using Java Health Center.
The heap increased to just over 500 MB when the workload increased from 400 to 500 clients. The heap was still within the maximum (`Xmx`) value of 1 GB, so was considered OK.

- The proportion of time spent in garbage collection pauses was less than 2%.
- The proportion of time spent unpaused was greater than 98%.
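Health Center reports the pause proportion directly, but the check itself is just a ratio over the monitoring interval. A sketch with made-up pause samples (the pause values here are illustrative, not from this run):

```python
def gc_pause_percentage(pause_times_ms, interval_ms):
    """Percentage of a monitoring interval spent in GC pauses."""
    return 100.0 * sum(pause_times_ms) / interval_ms

# Hypothetical sample: 12 GC pauses observed over a 60-second interval.
pauses = [45, 60, 38, 52, 70, 41, 55, 62, 48, 39, 66, 50]
pct = gc_pause_percentage(pauses, 60_000)
print(f"GC pauses: {pct:.2f}% of interval (target: < 2%)")
```

A result under 2% rules out garbage collection as the cause of the response-time increase, which is why the investigation moved on to CPU usage.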

2. CPU usage was checked from the SMF 70 - 79 records.

For the smaller workloads, the reports showed:

- The zIIPs handled most of the work (`APPL % -> IIP`).
- The GCPs did not need to handle much work (`APPL % -> CP`).
- Only a small percentage of zIIP-eligible work had to be run on a GCP (`APPL % -> IIPCP`).
- 91.1% of work ran without any delays waiting for a processor to be available.
- The workload was running well given the resources available.

With 500 clients, the reports showed:

- The zIIPs were handling most of the work (`APPL % -> IIP`).
- However, the GCPs were needed to handle much more work than before because the two zIIPs were unable to cope with the workload (`APPL % -> CP`).
- The percentage of zIIP-eligible work that had to be run on a GCP rather than a zIIP (`APPL % -> IIPCP`) had increased significantly.
- Only 23.4% of work ran without any delays waiting for a processor to be available.
- Over 76% of the work was delayed waiting for a processor to be available. This is likely to be the root cause of the significant increase in average response time.
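One way to reason about these figures is to compare zIIP-eligible CPU demand against installed zIIP capacity: `APPL %` is expressed relative to a single processor, so n zIIPs provide n × 100% of capacity, and as demand approaches that ceiling, zIIP-eligible work spills to GCPs (`IIPCP`) and queueing delays climb. A sketch with an illustrative demand value (185% is an assumption, not a figure from the report above):

```python
def ziip_headroom(appl_iip_pct, n_ziips):
    """APPL % is relative to one processor, so n zIIPs
    give n * 100% capacity. Returns (headroom %, utilization)."""
    capacity = n_ziips * 100.0
    return capacity - appl_iip_pct, appl_iip_pct / capacity

# Illustrative: 185% of one processor's worth of zIIP-eligible demand.
for n in (2, 3):
    headroom, util = ziip_headroom(185.0, n)
    print(f"{n} zIIPs: {util:.0%} utilized, {headroom:.0f}% headroom")
```

At that demand level, two zIIPs would run above 90% busy, which is the regime where processor-delay percentages like the 76% observed here appear.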

Proposed solution
An extra zIIP was added to the LPAR, and the 500-client workload was rerun.

Workload (# clients) | GCPs and zIIPs | Transactions Per Second (TPS) | Average response time |
---|---|---|---|
500 | 2 GCPs, 2 zIIPs | ~2420 | 5.03 ms |
500 | 2 GCPs, 3 zIIPs | ~2480 | 1.63 ms |
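The size of the improvement is consistent with basic queueing behavior: near saturation, a small capacity increase sharply reduces the probability that work has to wait. A sketch using the M/M/c queueing probability (Erlang C) with an illustrative offered load of 1.85 processors' worth of work (an assumption, matching the headroom example earlier, not a measured value):

```python
from math import factorial

def erlang_c(servers, offered_load):
    """M/M/c probability that an arriving job must queue
    for a processor (Erlang C formula)."""
    rho = offered_load / servers
    if rho >= 1:
        return 1.0
    top = offered_load ** servers / factorial(servers)
    below = sum(offered_load ** k / factorial(k) for k in range(servers))
    return top / (top + (1 - rho) * below)

for n in (2, 3):
    print(f"{n} zIIPs: P(wait) = {erlang_c(n, 1.85):.0%}")
```

Under these assumed numbers, the probability of queueing drops from roughly 89% with two processors to under 40% with three, mirroring the drop in average response time from 5.03 ms to 1.63 ms.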

The CPU Activity report (see Figure 8) showed the three zIIPs were now able to process most of the work without any delays or processor contention.

Conclusion
When running with only two zIIPs, the larger workloads created processor contention, which was resolved by adding an extra zIIP.