Diagnosing a performance problem

In this example of a performance problem, learn how to use the available information for diagnosis.

Information about the diagnostic tools is covered in other Performance Guidance topics.

The problem: average response time increased considerably when the workload increased

In this example, a workload was sending requests to IBM® z/OS® Connect, which were then routed to CICS®. Initially, the workload was seen to scale well. However, when the workload increased from 300 to 400 clients, and then from 400 to 500 clients, although the TPS was acceptable, the average response time increased considerably. Investigation was required to identify the cause of the performance impact for the larger workloads.

The first step in diagnosis of the problem is to understand the environment and the workload:
  • A single IBM z/OS Connect server was running with an IPIC connection configured with 100 sessions.
  • The LPAR had 2 GCPs and 2 zIIPs, all dedicated.
  • The workload started with 100 clients, and increased over time to 500 clients in increments of 100.
  • Each client sent a 100-byte JSON request and received a 32 KB response.
  • The think time was configured to be 200 ms in the workload simulator. This value represented the delay between a simulated client receiving a response and sending its next request to IBM z/OS Connect. The think time was configured to meet the required TPS.
  • As the workload increased, the average response time was expected to gradually increase if queuing was required at various points in the workflow.
Up to 300 clients, the average response time was reasonably steady. Why did it increase significantly when the workload increased from 300 to 400 clients, and again from 400 to 500 clients? Which resource was under contention? The average response times were recorded as follows:
Table 1. Recorded average response times
Workload (# clients) | Transactions Per Second (TPS) | Average response time
100                  | ~500                          | 1.06 ms
200                  | ~1000                         | 1.13 ms
300                  | ~1500                         | 1.38 ms
400                  | ~1960                         | 3.32 ms
500                  | ~2420                         | 5.03 ms
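These figures are consistent with the closed-loop relationship TPS ≈ clients / (think time + response time). The following sketch, with the class and variable names invented purely for illustration, reproduces the recorded TPS values from the 200 ms think time and the measured response times in Table 1:

```java
// Illustrative sketch only: names and structure are invented for this example.
public class ThroughputCheck {
    public static void main(String[] args) {
        double thinkTimeSec = 0.200;                        // simulator think time (200 ms)
        int[] clients        = {100, 200, 300, 400, 500};
        double[] responseSec = {0.00106, 0.00113, 0.00138, 0.00332, 0.00503};

        // Closed-loop model: each client cycles through think time + response time,
        // so the expected TPS is roughly clients / (thinkTime + responseTime).
        for (int i = 0; i < clients.length; i++) {
            double expectedTps = clients[i] / (thinkTimeSec + responseSec[i]);
            System.out.printf("%3d clients -> expected ~%.0f TPS%n", clients[i], expectedTps);
        }
    }
}
```

Because the 200 ms think time dominates each request cycle, TPS continues to scale almost linearly with the number of clients even as the response time degrades, which is why the problem shows up in the average response time rather than in the TPS.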

While running with a workload of 500 clients, the following investigation was performed:

1. Java™ checked by using Java Health Center

The heap increased to just over 500 MB when the workload increased from 400 to 500 clients. The Java heap was still within the maximum (-Xmx) value of 1 GB, so it was considered acceptable.

Figure 1. Java heap size and pause times
The Java heap and garbage collection metrics were checked by using Java Health Center, and the key metrics looked healthy:
  • The proportion of time spent in garbage collection pauses was less than 2%.
  • The proportion of time spent unpaused was greater than 98%.
Figure 2. Garbage collection summary
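Health Center was used for the checks above. As an illustration only, a similar snapshot of the same two metrics (heap usage against -Xmx, and the proportion of run time spent in garbage collection pauses) can be approximated from inside the JVM with the standard java.lang.management MXBeans. This is a minimal sketch with an invented class name, not the Health Center mechanism:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative sketch: approximate heap usage and GC pause proportion via MXBeans.
public class GcHealthCheck {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("Heap used: %d MB of max %d MB%n",
                heap.getUsed() >> 20, heap.getMax() >> 20);

        // Collection time is cumulative since JVM start; -1 means "undefined".
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        double pausePct = 100.0 * gcMillis / uptimeMillis;
        System.out.printf("Time in GC pauses: %.2f%%, unpaused: %.2f%%%n",
                pausePct, 100.0 - pausePct);
    }
}
```

Because the MXBean counters are cumulative from JVM start, a live check would normally compute the same ratios over a sampling interval rather than over the whole uptime.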

2. CPU usage checked from the SMF 70 - 79 records

Using the SMF 70 - 79 records and the report class defined for IBM z/OS Connect, the Workload Activity and CPU Activity reports were compared for the workload running with 300 clients and with 500 clients. The Workload Activity report for the IBM z/OS Connect server when running with 300 clients showed the following statistics (see Figure 3):
  • zIIPs handled most of the work (APPL % -> IIP).
  • GCPs did not need to handle much work (APPL % -> CP).
  • Only a small percentage of zIIP-eligible work had to run on a GCP (APPL % -> IIPCP).
Figure 3. SMF 72 Workload Activity for Report Class ZOSCONN when running with 300 clients
The CPU Activity report for the IBM z/OS Connect server running with 300 clients showed the following statistics (see Figure 4):
  • 91.1% of work ran without any delays waiting for a processor to be available.
  • The workload was running well given the resources available.
Figure 4. SMF 70 CPU Activity Report
This activity was compared with the Workload Activity and CPU Activity for the 500 client workload. The Workload Activity report for the IBM z/OS Connect server when running with 500 clients showed the following statistics (see Figure 5):
  • The zIIPs were handling most of the work (APPL % -> IIP).
  • However, the GCPs had to handle much more work than before, because the two zIIPs were unable to cope with the workload (APPL % -> CP).
  • The percentage of zIIP-eligible work that had to run on a GCP rather than a zIIP had increased significantly (APPL % -> IIPCP); see the sketch after Figure 5.
Figure 5. SMF 72 Workload Activity for Report Class ZOSCONN when running with 500 clients
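The proportion of zIIP-eligible work that spilled to the GCPs can be derived directly from the APPL% fields: APPL% IIPCP is the GCP time consumed by zIIP-eligible work, so the spill percentage is IIPCP / (IIP + IIPCP). The following sketch shows the arithmetic with invented placeholder values rather than the actual figures from Figure 3 and Figure 5:

```java
// Illustrative sketch only: the APPL% values below are invented placeholders.
public class ZiipSpillCheck {
    // APPL% IIP   = zIIP time used by the report class (percentage of one engine).
    // APPL% IIPCP = GCP time used by zIIP-eligible work that could not run on a zIIP.
    static double spillPercent(double applPctIip, double applPctIipcp) {
        return 100.0 * applPctIipcp / (applPctIip + applPctIipcp);
    }

    public static void main(String[] args) {
        System.out.printf("300 clients (placeholder values): %.1f%% of zIIP-eligible work ran on GCPs%n",
                spillPercent(120.0, 2.0));
        System.out.printf("500 clients (placeholder values): %.1f%% of zIIP-eligible work ran on GCPs%n",
                spillPercent(190.0, 40.0));
    }
}
```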
The CPU Activity report for the IBM z/OS Connect server when running with 500 clients showed the following statistics (see Figure 6):
  • Only 23.4% of work ran without any delays waiting for a processor to be available.
  • Over 76% of the work was delayed waiting for a processor to be available.

This processor contention was likely the root cause of the significant increase in average response time.

Figure 6. SMF 70 CPU Activity showing dispatcher delays
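Before adding hardware, it is worth confirming that zIIP-eligible demand really exceeds the installed zIIP capacity. Each engine corresponds to 100% of APPL%, so two dedicated zIIPs can absorb at most 200% of zIIP-eligible demand; anything beyond that either spills to the GCPs or queues for a processor. A minimal sketch of that check, again with invented placeholder values:

```java
// Illustrative sketch only: APPL% values are invented placeholders, not report figures.
public class ZiipCapacityCheck {
    static void report(int ziipEngines, double applPctIip, double applPctIipcp) {
        double demand = applPctIip + applPctIipcp;   // total zIIP-eligible demand
        double capacity = ziipEngines * 100.0;       // each engine = 100% of APPL%
        System.out.printf("%d zIIPs: demand %.0f%% vs capacity %.0f%% -> %s%n",
                ziipEngines, demand, capacity,
                demand > capacity ? "undersized: expect spill and queuing" : "sufficient");
    }

    public static void main(String[] args) {
        report(2, 190.0, 40.0);   // placeholder 500 client figures with 2 zIIPs
        report(3, 225.0, 5.0);    // placeholder figures after adding a third zIIP
    }
}
```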

Proposed solution

For this 500 client workload, an extra zIIP engine was added and the results were observed. Running with the extra zIIP, the average response time was reduced from 5.03 ms to 1.63 ms.
Table 2. Results for 500 client workload with an extra zIIP engine
Workload (# clients) | GCPs and zIIPs  | Transactions Per Second (TPS) | Average response time
500                  | 2 GCPs, 2 zIIPs | ~2420                         | 5.03 ms
500                  | 2 GCPs, 3 zIIPs | ~2480                         | 1.63 ms
The Workload Activity report (see Figure 7) showed that the three zIIPs were now handling most of the workload, with only a small amount of zIIP-eligible work needing to run on a GCP.

Figure 7. SMF 72 Workload Activity for Report Class ZOSCONN

The CPU Activity report (see Figure 8) showed the three zIIPs were now able to process most of the work without any delays or processor contention.

Figure 8. SMF 70 CPU Activity

Conclusion

When running with only two zIIPs, the larger workloads created processor contention, which was resolved by adding an extra zIIP.