Monitoring performance and capacity

This topic explains how to monitor z/VM functions such as paging and spooling space. It also describes the run-time characteristics and symptoms you should watch for to avoid system problems.

Overview of performance monitoring

To be successful in monitoring and tuning your z/VM system, you first need to decide what the phrase good performance means for your installation. What is considered good performance varies from one installation to another. For example, for some installations response time for commands is the primary indicator of performance, and good performance might mean sub-second response times. You need to evaluate your own workload, decide what the best indicators of performance are for your workload, and decide what values for those indicators constitute good performance in your environment.

Tuning a z/VM system is really nothing more than matching your hardware's capabilities to your workload's characteristics and to your definition of good performance. Through routine monitoring, you can keep track of the characteristics of your workload, the utilization of your hardware, and, most important, whether your definition of good performance is being met.

As you work at tuning your z/VM system, keep in mind that the changes and adjustments you make serve only to get the best performance from the physical resources you have. Sometimes there is nothing more you can do for performance, due to the limitations of your hardware; when all else fails, you need to consider purchasing additional hardware.

Because performance ultimately depends on the hardware available, it is important that you, the performance analyst, keep an accurate inventory of the physical resources present on your machine. In particular, as you consider tuning your z/VM system, know the answers to these questions about the hardware available to run your workload (the sample queries after this list show one way to gather this information):
  • How many real CPUs are there, and how fast are they?
  • How much real storage is available?
  • How well is the system equipped for paging?
    • How much total paging space is there?
    • How many paging volumes are there?
    • How are the paging volumes distributed among disk controllers?
    • How fast are the paging volumes and their disk controllers?
    • How many channel paths (CHPIDs) are there from the processor to each disk controller used in paging?
    • For each volume used for paging, are there other kinds of data on the volume besides paging space?
  • How well is the system equipped for spooling? The paging questions apply to the spooling configuration, too.
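One way to gather much of this inventory is with CP QUERY commands. The following invocations are a sketch only; the privilege classes required and the exact responses depend on your installation:

    query processors       Displays the real processors and their status
    query storage          Displays the size of real storage
    query alloc page       Displays how much paging space is defined and how much is in use
    query alloc spool      Displays how much spooling space is defined and how much is in use
    query chpids           Displays the status of the channel paths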

Tuning is the art of optimizing some performance measure of a workload within the constraints imposed by the hardware available. Because the hardware always imposes some type of constraint, tuning often becomes a balancing act, trading off the needs of the whole system against the needs of specific virtual machines. Sometimes providing more resources to one virtual machine is detrimental to other virtual machines or to the system as a whole. For instance, you might not want virtual machines used for testing programs to get as large a share of system resources as production virtual machines, because your objective is to give production work maximum throughput. Performance monitoring can help you understand which changes meet your performance objectives.
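For example, the CP SET SHARE command (not otherwise described in this topic) is one way to bias resources toward production work. The user IDs and values below are hypothetical and illustrative only:

    set share PRODSRV relative 500     Gives a production server five times the default relative share of 100
    set share TESTUSR relative 50      Gives a test virtual machine half the default relative share

Share settings influence scheduling only when virtual machines compete for constrained resources.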

You can use the INDICATE command (class B, C, and E users) to obtain an informal snapshot of system load, scheduler dispatcher queues, and I/O. The QUERY SHARE command (class A and E) allows you to view the share of system resources a virtual machine has. Both of these commands provide only a snapshot, while the MONITOR command collects large amounts of performance measurement data for later systematic and comprehensive analysis.
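For instance, assuming you hold the appropriate privilege class, you might issue commands such as these from the console (PRODSRV is a hypothetical user ID):

    indicate load          Displays overall processor, storage, and paging load
    indicate queues        Lists the virtual machines in the dispatch and eligible lists
    indicate i/o           Lists the virtual machines currently in I/O wait
    query share PRODSRV    Displays the SHARE setting for one virtual machine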

The MONITOR command (class A and E users) starts and stops the emission of data relevant to specific system events. The command also starts and stops the collection, and periodic emission, of sample data descriptive of system performance.

Monitor domains divide the emitted data into topic areas: behavior of I/O, behavior of the scheduler, configuration of the z/VM system, settings of the monitor itself, and so on. Data the monitor emits under a certain domain includes both event information (such as a user logoff) and sample information (collected performance measurement data, usually counters).
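As a sketch of how these pieces fit together, the following sequence enables sample data for a few domains and starts collection. The interval and domains are examples only, and your installation's monitor setup (the monitor saved segment and a retrieval application such as MONWRITE or Performance Toolkit for VM) must already be in place:

    monitor sample interval 60 seconds    Sets how often sample data is collected
    monitor sample enable processor       Enables the processor domain for sample data
    monitor sample enable storage         Enables the storage domain for sample data
    monitor sample enable user all        Enables the user domain for all virtual machines
    monitor start                         Starts monitor data collection

MONITOR STOP ends collection, and QUERY MONITOR displays the current monitor settings.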

Monitoring Linux virtual servers with Performance Toolkit for VM

z/VM provides analysis tools, such as Performance Toolkit for VM, that help you analyze the data you collect with the MONITOR command. In addition to analyzing z/VM performance data, Performance Toolkit for VM can process Linux® performance data, provided you have the proper support software. To process Linux performance data, you have these choices:
  • Install a commercial Linux on a system that contains a mainframe performance monitoring product.
  • Use a Linux distribution based on a 2.6 or later kernel, such as SLES 11, which has built-in support that allows Performance Toolkit for VM to monitor Linux virtual servers.
  • For additional performance data from Linux, use the Linux performance gatherer, rmfpms, and configure Performance Toolkit for VM to access this data.
    Note: Performance Toolkit for VM can process Linux performance data obtained from the Resource Management Facility (RMF) Linux performance gatherer, rmfpms. Performance Toolkit for VM support for rmfpms has been stabilized and might cease to function as the underlying Linux systems evolve. Support for the Linux rmfpms agent has been withdrawn and no new copies of it are available for installation. If you have rmfpms installed on an existing Linux image, it should continue to run on that image unsupported. There is no guarantee a current rmfpms installation will run on future Linux image installations.

Overview of the z/VM scheduler

This topic gives a rudimentary introduction to the z/VM scheduler so you can understand system responses to commands such as INDICATE.

Note: CP's virtual processor management has been improved so that no user stays in the eligible list more than an instant before being added to the dispatch list.
The z/VM scheduler controls the dispatching of virtual machines by managing three scheduling lists:
  • The dormant list contains a list of virtual machines that are idle or waiting for completion of a long event, such as a tape read.
  • The eligible list contains a list of virtual machines waiting for resources. Each virtual machine is classified according to its processing characteristics (known as its transaction class). A scheduler transaction is the amount of time the virtual machine remains able to run. A virtual machine that runs short transactions consumes little processor resource between waits, while one that runs long transactions consumes a large amount of processor resource between waits. The transaction classes are:
    • E0. Virtual machines in this class do not wait in the eligible list, but move to the dispatch list immediately.
    • E1. Virtual machines in this class are those doing short transactions, such as interactive users.
    • E2. Virtual machines in this class do medium-length transactions.
    • E3. Virtual machines in this class do long transactions.

    The scheduler assesses each virtual machine for its need for available resources. For the scheduler, resources include processors, central storage, and paging capacity. When a virtual machine is waiting for resources, it is waiting for the z/VM scheduler to decide that there is enough processor capacity, storage capacity, and paging capacity to add this virtual machine to the set of virtual machines on the dispatch list.

    The scheduler calculates the eligible priority of a virtual machine when it enters the eligible list. This priority is called the deadline priority: the time by which the virtual machine should enter the dispatch list. The relative priorities of virtual machines are designed to slow down virtual machines that require resources that are in heavy demand, to grant virtual machines their shares of available system resources, and to control the amount of resource each transaction class receives. For instance, though E2 and E3 virtual machines wait longer in the eligible list, they receive longer elapsed time slices in the dispatch list, which allows efficient use of system resources and the rapid completion of interactive work.

  • The dispatch list contains a list of virtual machines that can run or whose wait state is expected to be short (for instance, waiting for a page-in). When a virtual machine enters the dispatch list, it retains its transaction class; E0 virtual machines become Q0, E1 become Q1, and so forth. Also, each virtual machine gets a dispatch priority, which is also a deadline: the time of day by which the virtual machine should complete its next dispatch time slice under ideal conditions. The earlier that deadline, the closer the virtual machine is to being dispatched.

    Because dispatching priorities are dynamically calculated, the sequence of the virtual machines in the dispatch list varies according to the changing operating characteristics of the virtual machines.

The scheduler controls the cycling of virtual machines through the three lists. When a virtual machine logs on, it is placed in the dormant list and moved to the eligible list only when it has work to do. When entering the eligible list, the virtual machine is assigned its deadline priority based on its share, resource requirement, and contention for system resources. When resources become available, the scheduler moves virtual machines from the eligible list to the dispatch list and assigns them dispatch priorities. As virtual machines consume CPU time in the dispatch list, they are examined and reassigned priority as their dispatch time slices end. When a virtual machine has consumed a given amount of processing or storage resource, becomes idle, or is preempted in favor of other virtual machines in the eligible list, it moves back to the eligible list (if it can still be dispatched) or to the dormant list (if it can no longer be dispatched).
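To see where virtual machines currently stand in these lists, you can issue the INDICATE QUEUES command described earlier, for example:

    indicate queues exp

Each virtual machine in the dispatch and eligible lists appears with its class (such as Q1 or Q3); the EXP option, where available to you, adds detail such as each virtual machine's deadline priority.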

Related information:

For a complete description of the scheduler, see Virtual Machine Scheduling and Dispatching in z/VM: Performance.