Optimizing CICS TG on z/OS performance using OMEGAMON XE for CICS
ChrisWalker
A few years ago I wrote a lengthy article about how the CICS TG monitoring agent, delivered as part of the OMEGAMON XE for CICS product, opens up the black box of CICS Transaction Gateway (CICS TG) on z/OS, giving system administrators the information required to tune their CICS TG resources in order to optimize throughput and minimize bottlenecks within this key middleware component. As time moves on, URLs change, and the original article seems to have disappeared into the ether. The details within it, however, are still very much valid, so I thought it was time to revisit the subject and remind ourselves of what can be achieved.
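Firstly, let's clarify what resources we are considering when tuning our CICS TG in these scenarios. As you may be aware, a Gateway daemon contains two pools of threads: connection manager threads, one of which is needed for each connected client, and worker threads, one of which is allocated for as long as a work request to CICS is being made.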
Many customers use the default Gateway daemon threading settings, or make arbitrary changes without understanding the effect on throughput. OMEGAMON XE for CICS, through its graphical portal, provides an easy way to view the current allocation of threads. Additionally, thresholds and situation evaluation can be applied to alert users to resource issues. Ultimately, this lets the administrator learn of resource issues or bottlenecks within their CICS TG and make informed decisions to optimize performance.
In the example scenarios that follow, I have configured the thread pools within four Gateway daemons slightly differently. We will see how the behavior of the environment changes given the same workload applied in each case. We'll also look at possible changes that can be made if the behavior seen is not what is desired.
Scenario 1 – Connection Timeout Set to 0
I configured this CICS TG to immediately reject any client connection request if there are no free connection manager threads available within the pool. If a connection request is rejected, the client application will need to resubmit the request if it wants to continue the transaction. We might want to do this if we have a very fast network connection between the client applications and CICS TG. In many cases though, this configuration will result in an increase in network traffic due to resubmitted requests, which we often want to avoid. Figure 1 shows this behavior highlighted within the portal client.
The field showing the number of times the connection timeout has been hit is highlighted. It also shows that, given the currently allocated resources, this CICS TG is running at capacity. If this is an issue, we can try to increase the number of connection manager threads in the pool, or we can increase the connection timeout limit. The latter gives clients additional time to wait for a connection manager thread to become available before being rejected by the Gateway daemon.
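To make the effect of the connection timeout concrete, here is a minimal sketch in Python. It is a toy model, not CICS TG code: the class `ConnectionManagerPool`, its `connect_timeout_s` attribute and its `timeouts_hit` counter are hypothetical names chosen for illustration, and the pool is simply a semaphore with a fixed number of permits.

```python
import threading


class ConnectionManagerPool:
    """Toy model of a fixed-size connection manager pool (not CICS TG code)."""

    def __init__(self, size, connect_timeout_s):
        self._free = threading.Semaphore(size)
        self.connect_timeout_s = connect_timeout_s
        self.timeouts_hit = 0  # hypothetical counter of rejected connections

    def connect(self):
        # With a timeout of 0 the caller does not wait at all:
        # the request is rejected unless a thread is free right now.
        if self._free.acquire(timeout=self.connect_timeout_s):
            return True
        self.timeouts_hit += 1
        return False

    def disconnect(self):
        self._free.release()


# Two threads in the pool, timeout 0: the third client is rejected at once.
pool = ConnectionManagerPool(size=2, connect_timeout_s=0)
results = [pool.connect() for _ in range(3)]
print(results, pool.timeouts_hit)      # [True, True, False] 1

# With a longer timeout, a client can wait for a thread to be freed.
threading.Timer(0.05, pool.disconnect).start()  # a client leaves shortly
pool.connect_timeout_s = 5.0
late_result = pool.connect()
print(late_result, pool.timeouts_hit)  # True 1
```

The second half shows the trade-off described above: the waiting client is served instead of being rejected, at the cost of being held up until a thread becomes free.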
Scenario 2 – Worker Timeout Set Too Low
As with the connection timeout in the previous scenario, there is a timeout attribute that indicates how long a request may wait for a worker thread to be allocated. This timeout is useful because a worker thread remains allocated while a work request to CICS is being made. If all allocated threads are taking a long time to complete, it could be useful to reject clients that have waited too long, to prevent further bottleneck issues. This does mean that clients may have to resubmit their work requests (and, of course, increase network traffic). Figure 2 shows how the portal reports this behavior.
This time it is the worker timeout field that has been highlighted. If this number is increasing rapidly and is therefore seen as a problem, we can remedy it either by increasing the number of worker threads (which will increase the number of requests CICS is required to handle) or by increasing the timeout and allowing more waiting time for clients.
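The two remedies can be compared with a small deterministic simulation. This is a sketch under simplifying assumptions, not a model of the real Gateway daemon: time advances in whole ticks, every request in a burst arrives at tick 0 and takes the same number of ticks to complete, and the function name `run_burst` and its parameters are invented for this example.

```python
from collections import deque


def run_burst(burst, workers, service_ticks, worker_timeout_ticks, ticks=50):
    """Return (completed, timed_out) for a burst of requests arriving at tick 0."""
    waiting = deque([0] * burst)  # arrival tick of each queued request
    busy = []                     # remaining service ticks per busy worker
    completed = timed_out = 0
    for now in range(ticks):
        # Advance in-flight work and free any workers that have finished.
        busy = [left - 1 for left in busy]
        completed += busy.count(0)
        busy = [left for left in busy if left > 0]
        # Hand waiting requests to free workers, oldest first.
        while waiting and len(busy) < workers:
            waiting.popleft()
            busy.append(service_ticks)
        # Reject anything that has now waited for the full timeout.
        while waiting and now - waiting[0] >= worker_timeout_ticks:
            waiting.popleft()
            timed_out += 1
    return completed, timed_out


# Timeout too low: four of the six requests are rejected while waiting.
print(run_burst(burst=6, workers=2, service_ticks=4, worker_timeout_ticks=2))   # (2, 4)
# Remedy 1: a longer timeout lets the same two workers drain the queue.
print(run_burst(burst=6, workers=2, service_ticks=4, worker_timeout_ticks=10))  # (6, 0)
# Remedy 2: more workers, so nothing waits (but CICS sees six requests at once).
print(run_burst(burst=6, workers=6, service_ticks=4, worker_timeout_ticks=2))   # (6, 0)
```

Note that the longer timeout only helps here because the burst is short. If requests keep arriving faster than the workers can complete them, waiting longer merely delays the rejections.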
Scenario 3 – Not Enough Worker Threads
Some customers may decide to alter the ratio of connection manager to worker threads from 1:1. You may choose to do this to allow a higher number of clients to connect to a CICS TG and remain connected, knowing that not all will be requesting work from CICS (and therefore requiring a worker thread) at precisely the same time. It may be, though, that the number of workers is insufficient at times to handle the incoming workload, requiring clients to wait. Figure 3 shows this behavior as reported on the portal.
On the bar graphs we can see that a number of connected clients are waiting for a worker thread to become available. Whilst for short periods this may be fine, over an extended period it demonstrates that this CICS TG has become a bottleneck, and remedial action, such as increasing the size of the worker thread pool, may be required.
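As a rough aid when deciding how large the worker pool should be, the arithmetic can be sketched as follows. This is not something taken from the product: it applies Little's Law from queueing theory (average requests in progress = arrival rate x average time each request holds a worker), and the function names and all the figures are hypothetical.

```python
import math


def workers_in_use(requests_per_second, mean_response_seconds):
    """Little's Law: average concurrency = arrival rate x time in service."""
    return requests_per_second * mean_response_seconds


def clients_waiting(active_requests, worker_threads):
    """Requests that cannot be given a worker thread right now."""
    return max(0, active_requests - worker_threads)


# Hypothetical figures, for illustration only.
average_busy = workers_in_use(requests_per_second=40, mean_response_seconds=0.25)
print(math.ceil(average_busy))                                 # 10

# 30 clients request work at the same moment against pools of 10 and 40 workers.
print(clients_waiting(active_requests=30, worker_threads=10))  # 20
print(clients_waiting(active_requests=30, worker_threads=40))  # 0
```

The first figure is only an average. A pool sized exactly to it will still leave clients waiting whenever demand peaks above the average, which is the situation the bar graphs reveal.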
Scenario 4 – Unlimited Resources
You may be thinking that perhaps we should not place artificial limits on the CICS TG's resources and instead configure it to acquire whatever it needs to get the workload processed. This is possible, and may help avoid some of the issues seen in the previous scenarios, but it does come with a big warning, as Figure 4 will help to demonstrate.
If your workload is fairly consistent, you should see no change in behavior within an “unrestricted” CICS TG. Should there be a spike or dramatic increase in the workload, however, the Gateway daemon will begin to increase the size of the thread pools to cope with the demand. Thread creation is expensive relative to normal CICS TG processing, and once created a thread remains alive until the CICS TG is restarted (that is, the pool remains expanded to the maximum size even if the workload drops back to 'normal' levels). This may mean the CICS TG begins to take more CPU time away from other processes and, in the worst case, may cause the CICS TG to be stopped due to excessive resource consumption. It may also cause issues within connected CICS regions. The solution is to define the correct limits on CICS TG resources for the typical workload seen in your environment.
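The "grows on demand, never shrinks" behavior is easy to picture with a few lines of Python. This is only a sketch of the idea described above: `pool_sizes` is a hypothetical helper, the demand figures are invented, and a real pool grows thread by thread rather than jumping straight to the demand level.

```python
def pool_sizes(demand_per_interval, max_threads=None):
    """Pool size after each interval when threads are never released."""
    size = 0
    sizes = []
    for demand in demand_per_interval:
        # An unlimited pool (max_threads=None) grows to whatever is asked for.
        wanted = demand if max_threads is None else min(demand, max_threads)
        # Threads stay alive once created, so the size never goes down.
        size = max(size, wanted)
        sizes.append(size)
    return sizes


# Invented demand figures with a single spike in the fourth interval.
demand = [20, 25, 22, 400, 30, 21]
print(pool_sizes(demand))                  # [20, 25, 25, 400, 400, 400]
print(pool_sizes(demand, max_threads=50))  # [20, 25, 25, 50, 50, 50]
```

In the unlimited case a single spike leaves 400 threads alive for the rest of the run, while the limited pool stops at 50 and the excess demand has to wait or be rejected, as in the earlier scenarios.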
As the last scenario demonstrated, you need to choose the resource allocation sensibly to ensure your CICS TG instances are not unwittingly causing a bottleneck within your transaction processing environment. OMEGAMON XE for CICS opens up the black box of CICS TG throughput by visualizing the current performance, allowing you to make informed configuration and tuning decisions based on your infrastructure, capacity and application design. As throughput levels may vary at different times of the day, the historical data collection feature of OMEGAMON allows you to take snapshots throughout the day to gain a clear idea of the limits you should accommodate with your configurations.
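Once you have a day's worth of snapshots, turning them into a candidate limit is simple arithmetic. The sketch below assumes the samples have already been copied out by hand as a list of "worker threads in use" values. The numbers are made up for illustration, and `nearest_rank_percentile` is a hypothetical helper, not an OMEGAMON function.

```python
def nearest_rank_percentile(samples, percent):
    """Smallest sample that at least `percent` % of the samples do not exceed."""
    ordered = sorted(samples)
    # Ceiling division using integers only: rank = ceil(percent * n / 100).
    rank = -(-percent * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]


# Made-up snapshots of worker threads in use, taken through one day.
in_use = [12, 15, 40, 38, 22, 18, 55, 14]

print(max(in_use))                          # 55
print(nearest_rank_percentile(in_use, 50))  # 18
print(nearest_rank_percentile(in_use, 95))  # 55
```

Sizing the pool to the peak avoids waiting at the busiest time of day, while the median shows how much of that capacity sits idle the rest of the time. Which figure to lean on is a judgment call for your own environment.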
For more details on how OMEGAMON XE for CICS can assist with performance monitoring of CICS TG on z/OS see the Information Center. For more information on CICS TG configuration see the following section on Performance.