Analyzing response time

The key indicator of performance is user response time. When users start complaining about the response time that they see, you know that a problem in your system or the network is causing delays.

Resource contention in the host affects a user’s response time. These resources include the processor, processor storage, I/O subsystem, communication with service machines, and network and line delays. Network and line delays may be a significant part of total end-user response time, but there is no way to measure them directly within VM.

z/VM provides system-wide transaction rates and response times, as collected by the VM scheduler. z/VM also provides response times by user class and for individual users. You can also approximate the command cycle time (response time plus think time) with this formula:

    command cycle = active users / total transaction rate

where the total transaction rate is the sum of the trivial, nontrivial, and QuickDsp transaction rates. If the rate is expressed in transactions per second, the command cycle is in seconds.
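A minimal Python sketch of the formula above. The function name and the assumption that rates are in transactions per second are illustrative only; in practice the three rates would come from whatever monitor report you use.

```python
def command_cycle(active_users: float, trivial_rate: float,
                  nontrivial_rate: float, quickdsp_rate: float) -> float:
    """Approximate response-plus-think-time cycle, in seconds.

    Assumes (hypothetically) that all rates are transactions per second.
    """
    # Total transaction rate = trivial + nontrivial + QuickDsp rates
    total_rate = trivial_rate + nontrivial_rate + quickdsp_rate
    if total_rate <= 0:
        raise ValueError("total transaction rate must be positive")
    return active_users / total_rate

# Hypothetical sample: 300 active users, 40 + 8 + 2 = 50 transactions/s
cycle = command_cycle(300, 40.0, 8.0, 2.0)
print(f"command cycle ~ {cycle:.1f} s")  # prints "command cycle ~ 6.0 s"
```

Because the result blends response time and think time, it is only a rough cross-check, not a substitute for the measures discussed below.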

However, none of the readily available measures of user response time, including internal transaction response time, reflects the response time seen at the user’s terminal, and none of them is consistent. For example, the classification of transactions as trivial or nontrivial constantly changes and varies by workload: a transaction that is trivial in one workload may be nontrivial in another. The response times recorded by VMPRF also do not reflect dependencies on service machines.

The best indicators and most reliable measures of end-user response time are the lengths of the dispatch and eligible lists and the number of users in SVM wait on the VTAM/VSCS servers.

A virtual machine usually experiences delays when it is in the dispatch list with a transaction in progress and is waiting for resources. The only resources that can delay a user in the eligible list are storage and paging. Most systems perform best when tuned to run with no users in the eligible list. Users can also wait for resources while in the dormant list. The only explicitly measured delay there is SVM wait (waiting for a service machine), but users in the dormant list can also have transactions in progress and be waiting on I/O or DASD paging.
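The rules in the paragraph above can be expressed as a small tally over sampled user states. This is a hypothetical sketch: the list names and wait-reason labels are invented for illustration and are not z/VM monitor field names. It rejects combinations the text says cannot occur (for example, an SVM wait in the eligible list) and reports how many users are in the eligible list, which a well-tuned system keeps at zero.

```python
from collections import Counter

# Wait reasons the text names as possible in each list. Labels are
# hypothetical; None means "any resource" (the dispatch list).
ALLOWED_REASONS = {
    "dispatch": None,
    "eligible": {"storage", "paging"},
    "dormant": {"svm", "io", "paging"},  # "paging" here means DASD paging
}

def tally_delays(samples):
    """Count users per (list, wait reason) from hypothetical samples."""
    counts = Counter()
    for list_name, reason in samples:
        allowed = ALLOWED_REASONS[list_name]
        if allowed is not None and reason not in allowed:
            raise ValueError(f"{reason!r} wait is not expected in the {list_name} list")
        counts[(list_name, reason)] += 1
    return counts

def eligible_users(counts):
    """Users in the eligible list; most systems run best when this is 0."""
    return sum(n for (list_name, _), n in counts.items() if list_name == "eligible")

samples = [
    ("dispatch", "io"), ("dispatch", "io"), ("dispatch", "processor"),
    ("eligible", "storage"), ("dormant", "svm"),
]
counts = tally_delays(samples)
print(counts[("dispatch", "io")], eligible_users(counts))  # prints "2 1"
```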

The performance characteristics of an operating system depend on factors such as the choice of hardware, the total number of users on the system, the functions being performed by the system, and the way the system parameters are set up.

System performance is affected primarily by the interrelationship of the following factors:

Overloaded user DASD
Underutilization of minidisk cache or controller cache
The speed and number of paging devices
The amount of auxiliary storage made available
Real storage size
Real processor speed
Characteristics of the workload being processed in each virtual machine
The number of virtual machines logged on concurrently

Possible causes of delay are:

Waiting to start or excessive input queue time
Waiting for the processor while in storage
Excessive demand paging
Waiting for I/O to complete (including delays due to operational constraints)
Queue contention for CMS or batch work

You do not have to investigate these delays in any particular order. The best way to find a starting point is to look at how many users are waiting for each system resource and to investigate the largest source of delay first. Consider using this set of priorities:

I/O
Processor storage (central and expanded)
Processor
Communications with service machines

Perform the analysis for the system as a whole, for individual service machines in the critical path for response time, and for whatever user classes are important to you.

A well-tuned system should have few storage, paging, or I/O constraints. Suppose the processor is only 40% utilized and spends the other 60% of its time in wait state. You might conclude that the system does not have enough load to drive the processor to 100% utilization. That conclusion can be far from reality, because constraints elsewhere in the system might prevent users from running on the processor: most users in the dispatch list might be waiting for I/O or paging, and some users might be in the eligible list waiting for storage.

Therefore, analyze the reasons for the wait states and determine how much of the time the system is truly idle, that is, in wait state with no users waiting in the dispatch list or eligible list.
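To separate true idle time from constrained wait time, you can classify each sampling interval as busy, waiting with users queued, or idle with nobody waiting. The sketch below assumes a hypothetical per-interval sample with three fields; the field names are illustrative and do not correspond to a specific monitor record.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """One hypothetical sampling interval (field names are assumptions)."""
    cpu_busy: bool          # processor was running work
    dispatch_waiting: int   # dispatch-list users waiting (I/O, paging, ...)
    eligible: int           # users in the eligible list

def wait_breakdown(samples):
    """Return (busy, constrained_wait, true_idle) as fractions of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    busy = sum(s.cpu_busy for s in samples)
    # Truly idle: in wait state with nobody waiting in either list
    idle = sum(1 for s in samples
               if not s.cpu_busy and s.dispatch_waiting == 0 and s.eligible == 0)
    constrained = n - busy - idle
    return busy / n, constrained / n, idle / n

# Hypothetical: 40% busy, 60% wait state, but only one interval truly idle
samples = ([Sample(True, 0, 0)] * 4 + [Sample(False, 3, 1)] * 5
           + [Sample(False, 0, 0)])
print(wait_breakdown(samples))  # (0.4, 0.5, 0.1)
```

In this made-up example the processor is in wait state 60% of the time, yet only 10% of the time is truly idle; the remaining 50% points to I/O, paging, or storage constraints rather than a lack of load.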