Server Utilization is Too High
- Problem description:
- Delays are occurring because of excessive queueing on the server.
An SFS server machine is a performance resource in its own right. As such, excessive queueing for its services occurs when its utilization gets too high: the higher the server utilization, the more significant the queueing delays. As a practical guide, there is no significant problem if total server utilization is below 50%.
Server utilization consists of that server's use of a processor plus all the time it spends waiting for serializing events that prevent it from using a processor. An example of such a serializing event is handling a page fault that occurs in the server.
There are four components to SFS server utilization: CPU utilization, page fault resolution time, checkpoint time, and QSAM time. Performance Toolkit for z/VM shows total server utilization and how much each of these four components contributes to that total. Therefore, if server utilization is identified as a contributing problem (that is, it exceeds 50%), you can use this breakdown to find the main cause.
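The breakdown described above can be sketched as a small calculation. This is a minimal illustration, not a Performance Toolkit interface; the component names and percentage values below are hypothetical examples of figures you might read from a Performance Toolkit report.

```python
# Hypothetical component utilizations (percent of one processor),
# of the kind reported by Performance Toolkit for z/VM.
components = {
    "CPU": 34.0,
    "page fault resolution": 12.0,
    "checkpoint": 8.0,
    "QSAM": 4.0,
}

# Total server utilization is the sum of the four components.
total = sum(components.values())

# The largest component points to the main cause.
dominant = max(components, key=components.get)

print(f"Total server utilization: {total:.1f}%")
if total > 50.0:
    print(f"Exceeds the 50% guideline; largest contributor: {dominant}")
```

With the sample figures above, the total is 58%, which exceeds the 50% guideline, and CPU usage is the largest contributor, so the CPU-related corrective actions below would apply first.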
- Possible corrective actions:
- If most of the server utilization is because of CPU usage, this usually indicates a heavy server load. The SFS server can run on only one processor at a time, so a server CPU utilization of 100% would mean the server is consuming all of one processor in the processing complex. Well before that point, you should stop enrolling additional users in this server's file pool. If the server's CPU utilization is already excessively high, you may even wish to transfer some users from this file pool to another, less heavily used file pool. See z/VM: CMS File Pool Planning, Administration, and Operation for instructions on how to do this.
- If page fault resolution time predominates, treat this as a server paging problem. See Too Much Server Paging for suggestions.
- If checkpoint utilization predominates, consider actions that would reduce checkpoint time. See Not Enough Control Minidisk Buffers and Too Many Catalog Buffers for suggestions.
- If QSAM time is a major contributor to server utilization, the likely cause is that the server is doing control data backups to tape or minidisk. If so, you can remedy the situation by scheduling these backups for times of low file pool usage. Alternatively, you can direct these backups to another SFS file pool. This resolves the problem because SFS then does the control backups using asynchronous requests to the backup file pool rather than synchronous (and therefore serializing) QSAM I/O requests to tape or minidisk.