The key to finding and fixing I/O-related performance problems
is DASD response time: the length of time it takes to complete an
I/O operation. Response time can have a dramatic effect on
performance, particularly with online and interactive subsystems,
such as CMS.
The following figure illustrates how DASD response time is defined.
Figure 1. DASD response-time components
DASD response time is the elapsed time from the DIAGNOSE
instruction that requests the I/O, through the start subchannel
(SSCH) instruction, to the completion of the data transfer, which is
indicated by a channel end/device end (CE/DE) interrupt. It includes
any queue time plus the actual I/O operation. Service time is
the elapsed time from the successful SSCH instruction to the
completion of the data transfer. It includes seek time, any
rotational delays, and data transfer time. Response time equals
service time plus queue time.
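As a rough illustration of this arithmetic, the following Python sketch adds up the time components for a single I/O. The class name, field names, and sample values are hypothetical, and splitting service time into pending, disconnect, and connect time is an assumption based on the components that this section describes.

```python
from dataclasses import dataclass


@dataclass
class DasdIoTimes:
    """Time components of one I/O, in milliseconds (hypothetical layout)."""
    queue_ms: float
    pending_ms: float
    disconnect_ms: float
    connect_ms: float

    @property
    def service_ms(self) -> float:
        # Assumed breakdown of service time: from the successful SSCH
        # to the completion of the data transfer.
        return self.pending_ms + self.disconnect_ms + self.connect_ms

    @property
    def response_ms(self) -> float:
        # Response time = queue time + service time.
        return self.queue_ms + self.service_ms


# Illustrative values only, not measurements.
sample = DasdIoTimes(queue_ms=2.0, pending_ms=0.5,
                     disconnect_ms=6.0, connect_ms=1.5)
print(f"service={sample.service_ms:.1f} ms, "
      f"response={sample.response_ms:.1f} ms")
```

With these sample values, service time is 8.0 ms and response time is 10.0 ms.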
The above figure shows these DASD response-time components:
Queue wait time
This is the time that I/O operations spend queued inside VM while
waiting for a previous I/O to the same device to complete; in other
words, the time spent waiting on the device. Delays in the other
service components can cause the queue component to increase, or
the queue time can result from the skewed (uneven) arrival of I/Os
from a particular application.
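Both effects can be seen in a simplified model. The sketch below assumes a single device that handles one I/O at a time in arrival order; the function name, arrival times, and service times are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mean_queue_wait(arrivals_ms, service_ms):
    """Average wait for the device in a FIFO, one-I/O-at-a-time model."""
    device_free_at = 0.0
    total_wait = 0.0
    for arrival in sorted(arrivals_ms):
        # An I/O starts when it arrives or when the device frees up,
        # whichever is later.
        start = max(arrival, device_free_at)
        total_wait += start - arrival
        device_free_at = start + service_ms
    return total_wait / len(arrivals_ms)


SERVICE_MS = 8.0  # hypothetical service time per I/O

# The same number of I/Os, arriving in two different patterns.
even = [0.0, 10.0, 20.0, 30.0]   # one I/O every 10 ms
skewed = [0.0, 0.0, 0.0, 30.0]   # three I/Os arrive together

print(mean_queue_wait(even, SERVICE_MS))    # evenly spaced arrivals
print(mean_queue_wait(skewed, SERVICE_MS))  # skewed arrivals
print(mean_queue_wait(even, 12.0))          # longer service time
```

In this model the evenly spaced I/Os never wait, the skewed arrivals wait 6.0 ms on average, and raising the service time to 12 ms makes even the evenly spaced I/Os wait 3.0 ms on average.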
Pending time
This is the time from the start of the I/O until the DASD
receives it. Pending time reflects channel path usage: a high
pending time indicates that the channel or logical control unit is
busy. Pending time can be caused by busy channels or controllers,
or by a device that is busy with I/O from another system.
If a device is behind a cache controller (non-3990), pending
time can also be caused by cache staging (the device is busy during
the staging operation). When using nonenhanced dual copy (3990),
the device is busy while writing the data to the primary volume and
to the duplex volume if fast copy was not selected.
Disconnect time
Disconnect time includes:
The time for a seek operation.
Latency, always assumed to be half a revolution of the
device.
Rotational position sensing (RPS) reconnect delay, the time for
the set sector operation to reconnect to the channel path.
This time depends on whether the internal path, the control unit,
or the channel path is busy. If any element in the path is busy,
the I/O is delayed by one revolution of the device.
For a 3990 Model 3 cache controller, if the record is not in
cache, the time spent waiting either for staging of the previous
I/O to the device to complete or for one of the four lower
interfaces to become available after transferring, staging, or
destaging data for other devices.
When a device cannot reconnect to the host to transfer data
because all paths are busy, it must wait for another
revolution.
Using cache control units reduces or eliminates disconnect time.
Disconnect time is used as a measurement of cache
effectiveness.
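For a non-cached access, the seek, latency, and RPS components above can be combined into a back-of-the-envelope estimate. The following Python sketch is only an illustration: the function names, the rotation speed, and the seek time are hypothetical, and it ignores cache staging delays.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full revolution of the device, in milliseconds."""
    return 60_000.0 / rpm


def disconnect_estimate_ms(seek_ms: float, rpm: float,
                           rps_misses: int = 0) -> float:
    """Seek time + half a revolution of latency + one revolution per RPS miss."""
    rev = revolution_ms(rpm)
    return seek_ms + 0.5 * rev + rps_misses * rev


RPM = 3600.0    # hypothetical rotation speed
SEEK_MS = 5.0   # hypothetical seek time

no_miss = disconnect_estimate_ms(SEEK_MS, RPM)
one_miss = disconnect_estimate_ms(SEEK_MS, RPM, rps_misses=1)
print(f"no RPS miss:  {no_miss:.1f} ms")
print(f"one RPS miss: {one_miss:.1f} ms")
```

With these assumed values, one revolution takes about 16.7 ms, so a single missed reconnect raises the estimate from roughly 13.3 ms to 30.0 ms.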
Connect time
This is the time actually spent transferring data between the
channel and the DASD or between the channel and the cache. Connect
time can also include the time taken by a search operation between
the channel and the DASD or the channel and the cache (usually done
to search a directory for the location of a program module so that
it can be loaded into storage).
A high connect time indicates that you are using large blocks to
transfer the data. This can be a problem if you mix small blocks
and large blocks. The small blocks may have to wait on the larger
ones to complete, thus causing a delay.
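The effect of mixing block sizes can be estimated with simple arithmetic. The sketch below assumes that connect time is only the data transfer time at a fixed rate; the function name, transfer rate, and block sizes are hypothetical, and search time and protocol overhead are ignored.

```python
def transfer_ms(block_bytes: int, rate_bytes_per_s: float) -> float:
    """Data transfer time for one block, in milliseconds."""
    return block_bytes * 1000.0 / rate_bytes_per_s


RATE = 4_000_000.0   # hypothetical transfer rate: 4 MB per second
SMALL_BLOCK = 4_000  # hypothetical small block, in bytes
LARGE_BLOCK = 64_000 # hypothetical large block, in bytes

small_alone = transfer_ms(SMALL_BLOCK, RATE)
# If a large block is already transferring, the small block has to
# wait for it to finish before its own transfer can start.
small_behind_large = transfer_ms(LARGE_BLOCK, RATE) + small_alone

print(f"small block alone:        {small_alone:.1f} ms")
print(f"small block behind large: {small_behind_large:.1f} ms")
```

Under these assumptions the small block completes in 1.0 ms on its own but takes 17.0 ms when it has to wait behind the large block.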