Comparison of different z/VM environments

This topic compares the additional z/VM® environments that were used to run the combined workload set.

The combined workload set was processed within different z/VM environments:

  • A z/VM 6.3 installation using EDEV-SCSI based paging devices
    • with the database BI workload also using SCSI based devices
    • with the file system I/O workload using ECKD based devices
  • A z/VM 6.3 installation using ECKD based paging devices
    • with the database BI workload using SCSI based devices
    • with the file system I/O workload also using ECKD based devices
  • A z/VM 6.2 installation using EDEV-SCSI based paging devices
    • Except for the z/VM release, this environment was identical to the z/VM 6.3 EDEV-SCSI environment

Figure 1 shows the various z/VM configurations for each of the z/VM environments that were used for running the combined workload set.

Figure 1. Executions of the combined workload set within different z/VM environments

Observations:

Some attempted test executions failed. A failure was assumed when any of the workloads did not successfully perform its complete test execution.

The reasons for failures were different:

  • In the z/VM ECKD™ paging environment, all executions with 64 GiB z/VM real memory failed, because the high level variant of the page-cached file system I/O workload did not complete in time. One possible reason is that the file system I/O workload also used ECKD devices on the same storage system and received error conditions on these devices while z/VM produced high paging rates.
  • In the z/VM 6.2 EDEV-SCSI paging environment, the transactional WAS workload encountered so many HTTP communication failures that an internal threshold was exceeded, which terminated the workload. In some cases, the database BI workload also encountered communication errors that caused the workload to fail.

When a workload failed, we did not consider the test execution results or the other metrics from z/VM and the remaining workloads any further. The failed workload freed z/VM resources for use by the other workloads, which makes a comparison with successful executions of the combined workload set impossible.
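The failure rule and the exclusion of failed executions can be sketched as follows. This is a minimal illustration only: the record layout, the workload keys, and the function names are hypothetical and are not taken from the test harness used in the study.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    """One run of the combined workload set (hypothetical record layout)."""
    environment: str   # label of the z/VM environment
    config: str        # label of the z/VM memory/CPU configuration
    completed: dict    # workload name -> True if it ran to completion


def is_failed(execution):
    """A run counts as failed when any workload did not complete."""
    return not all(execution.completed.values())


def comparable_runs(executions):
    """Keep only runs in which every workload completed."""
    return [e for e in executions if not is_failed(e)]


# Made-up example records, for illustration only.
runs = [
    Execution("env A", "cfg 1", {"was": True, "db_bi": True, "fs_io": True}),
    Execution("env B", "cfg 1", {"was": True, "db_bi": True, "fs_io": False}),
]
kept = comparable_runs(runs)
```

Dropping the whole run, rather than only the failed workload, mirrors the reasoning above: the surviving workloads ran with more resources than intended, so their numbers are not comparable either.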

Conclusions:

  • Environment z/VM 6.3: EDEV-SCSI paging

    In this environment, z/VM paging and the database BI workload shared parts of the I/O subsystem and the storage server. In spite of this resource overlap, we did not observe strong degradations of either the z/VM paging bandwidth or the database BI workload I/O activity. We also did not observe any complete workload failure in this environment, although an increasing number of recoverable communication errors was observed for the transactional WAS workload as z/VM real memory became more constrained.

  • Environment z/VM 6.3: ECKD paging

    In this environment, z/VM paging and the file system I/O workload shared parts of the I/O subsystem and the storage server. This resource overlap caused the high level variant of the page-cached file system I/O workload to fail in the most constrained 64 GiB z/VM real memory configurations. Furthermore, in less memory constrained configurations where the file system I/O workload did not fail, the partial sharing of I/O resources between z/VM paging and other I/O activities might have prevented z/VM from achieving its full page I/O bandwidth.

  • Environment z/VM 6.2: EDEV-SCSI paging

    Most of the failure situations we observed occurred in the z/VM 6.2 environment. Apparently, even a moderate overcommitment of z/VM real memory and real CPUs caused more severe constraints within the workload virtual systems in that environment, so that workload failures occurred earlier.

Analysis of overall workload results

Figure 2 shows the relative throughput of the overall workload for the various z/VM environments that were analyzed while the combined workload set was executed in various z/VM configurations.

Figure 2. Overall workload relative throughput

Note: The results of the z/VM 6.3: ECKD paging environment are given relative to the reference execution of the z/VM 6.3: EDEV-SCSI paging environment, because in the absence of z/VM paging activity these two environments were assumed to behave identically.

Observations:

When 15 or fewer real CPUs were configured for z/VM 6.3, the overall workload relative throughput exhibited similar values for the environment using EDEV-SCSI paging devices and for the environment using ECKD based paging devices. However, when 20 real CPUs were configured for z/VM, the overall throughput degraded faster for the ECKD configuration than for the EDEV-SCSI configuration. Furthermore, when 64 GiB z/VM real memory was configured in the z/VM 6.3: ECKD paging case, all test executions failed. Likewise, some test executions in the z/VM 6.2: EDEV-SCSI environment failed. For details, see Figure 1.

The z/VM 6.2 environment produced lower throughput values overall. Furthermore, when z/VM resources were not constrained (CID 512/25 on the far left side in Figure 2), the overall throughput was lower than that achieved in the unconstrained z/VM 6.3 environment using EDEV-SCSI paging.

Conclusions:

First of all, note that the overall workload relative throughput is a rather coarse metric, because it treats all workload variants with the same weight, which levels out the distinct characteristics of the individual workloads. That notwithstanding, the results presented in Figure 2 give a surprisingly consistent picture.
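To illustrate why an equally weighted metric is coarse, the following sketch computes an overall relative throughput as the unweighted arithmetic mean of per-workload throughput ratios against a reference execution. The use of an arithmetic mean is an assumption made for this illustration (the exact formula is not stated in this topic), and the function name, workload keys, and numbers are all hypothetical.

```python
def relative_throughput(measured, reference):
    """Unweighted mean of per-workload throughput ratios (assumed formula).

    Both arguments map a workload name to its throughput, in whatever unit
    that workload reports; the ratio makes the units cancel out.
    """
    ratios = [measured[name] / reference[name] for name in reference]
    return sum(ratios) / len(ratios)


# Made-up numbers, for illustration only.
reference = {"was": 1000.0, "db_bi": 50.0, "fs_io": 200.0}
measured = {"was": 900.0, "db_bi": 50.0, "fs_io": 100.0}

overall = relative_throughput(measured, reference)
```

With these made-up values, one workload losing half of its throughput and another losing a tenth average out to an overall value of roughly 0.8, which hides how differently the individual workloads were affected.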

The z/VM 6.3 environment using EDEV-SCSI paging consistently exhibited very high overall throughput values. Even in the most memory constrained configurations with only 64 GiB real memory, no workload failures occurred. This makes this environment all in all the best performing and most reliable one for processing the combined workload set.

With only moderate z/VM real memory constraints, the z/VM 6.3 environment using ECKD paging devices performed similarly to the one using EDEV-SCSI paging devices. However, the failures that occurred when z/VM was configured with 64 GiB real memory, and the earlier throughput drop when z/VM was configured with 20 real CPUs, were weak points of this environment.

The z/VM 6.2 environment using EDEV-SCSI paging consistently exhibited comparatively low overall throughput values. One contributing factor might have been that the traditional z/VM scheduler and the memory subsystem in use in the z/VM 6.2 environment were less capable than their completely redesigned z/VM 6.3 counterparts.

Analysis of z/VM paging

Figure 3 shows the average page read rates from the paging devices for the various z/VM environments that were analyzed while processing the combined workload set in various z/VM configurations.

Figure 3. z/VM average page read rates from paging devices

Observations:

Within the z/VM 6.3 environments, we observed average page read rates that progressively increased as z/VM real memory became more constrained. Within the z/VM 6.2 environment, the page read rates increased much more steeply than within any z/VM 6.3 environment and reached very high values. Again, missing data points indicate a failing test execution (see Figure 1).

Conclusions:

Except for the z/VM 6.2 environment, the growth of the z/VM average page read rate seems reasonable for addressing the increased pressure on z/VM real memory. Also in these cases, the paging I/O rates (not shown in Figure 3) were well within the capabilities of the paging devices.

As explained in the workload analysis, the overall workload relative throughput values obtained in the z/VM 6.2 environment were below those achieved in the z/VM 6.3 environment. Thus, the very high average page read rates observed in the z/VM 6.2 environment did not result in higher virtual memory availability. In addition, the use of expanded storage (as recommended for z/VM 6.2) did not help to attenuate the z/VM real memory constraints to the extent achieved by z/VM 6.3 environments. Apparently, the traditional paging subsystem of the z/VM 6.2 environment already reached its limits when moderate z/VM real memory constraints existed.

Analysis of z/VM CPU load

Figure 4 shows the z/VM user CPU load and the CP attributed to guest CPU load, summed up for all virtual systems. Recall that the CP attributed to guest CPU load is part of the user CPU load.

Figure 4. z/VM overall user CPU load and CP attributed to guest CPU load

Observations:

As stated before, missing data points indicate a failing test execution (see Figure 1).

When 128 GiB or more real memory was configured for z/VM, the combined user CPU load stayed very close to the number of real CPUs configured for z/VM. With 96 GiB or less real memory configured for z/VM, the CPU load values diverged for the different z/VM environments.

When higher numbers of real CPUs were configured for z/VM, the reduction of z/VM real memory had a more significant effect on the user CPU load. In some cases (marked with a gray arrow in Figure 4), notable real CPU idle states (real CPU wait times) of 0.5 CPUs or more were observed. Only when no real CPU idle times occur does a reduction of the user CPU load correspond directly to an increase of the system CPU load.

On the other hand, when real CPU idle times occur in a CPU constrained situation, they indicate the fraction of real CPUs that are waiting for prerequisite activities, such as page-in operations, to complete before they can process work pending on virtual CPUs. In other words, in a CPU constrained situation real CPU idle times indicate that there is excess CPU power available in the environment that cannot be used because other parts of the environment (such as real memory) are over-committed.
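The relationship between user CPU load, system CPU load, and real CPU idle time can be sketched as simple arithmetic. The sketch assumes that the capacity of the real CPUs splits into exactly these three parts, which is what the reasoning above implies; the function name, the dictionary keys, and the sample values are hypothetical. Only the 0.5 CPU threshold for notable idle time is taken from the text.

```python
NOTABLE_IDLE_CPUS = 0.5  # threshold for "notable" idle time quoted in the text


def cpu_breakdown(real_cpus, user_load, system_load):
    """Split real CPU capacity into user, system, and idle parts.

    All values are expressed in units of CPUs. Assumes that
    real_cpus = user_load + system_load + idle.
    """
    idle = max(real_cpus - user_load - system_load, 0.0)
    return {
        "user": user_load,
        "system": system_load,
        "idle": idle,
        # Idle time in a CPU constrained run hints at another bottleneck,
        # for example real CPUs waiting for page-in operations.
        "notable_idle": idle >= NOTABLE_IDLE_CPUS,
    }


# Made-up values, for illustration only.
memory_bound = cpu_breakdown(real_cpus=20, user_load=17.5, system_load=1.2)
cpu_bound = cpu_breakdown(real_cpus=10, user_load=9.6, system_load=0.4)
```

In the first made-up case, about 1.3 CPUs are idle, so the drop in user CPU load cannot be explained by system CPU load alone. In the second case there is practically no idle time, so any reduction of the user CPU load would show up as system CPU load.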

The z/VM 6.3 environments were least affected by z/VM real memory constraints when the number of z/VM real CPUs was kept constant. At the same time, the CP attributed to guest CPU load was almost unaffected by z/VM real memory constraints, except for the z/VM 6.2 environment when 20 real CPUs were configured for z/VM.

Figure 5 shows the z/VM system CPU load.

Figure 5. z/VM overall system CPU load

Observations:

The z/VM overall system CPU load progressively increased as the available z/VM real memory became more constrained. Except for the z/VM 6.2 environment, the overall system CPU load stayed below 9 % of the real CPU capacity. The system CPU load behaved very similarly to the page read rates (see Figure 3). Again, missing data points indicate a failing test execution (see Figure 1).

Conclusions for user and system CPU load:

For paging subsystems using EDEV-SCSI based paging devices, an increase of the system CPU load is to be expected, because SCSI I/O implies real CPU activity for data transfer operations. In contrast, paging subsystems using ECKD based devices are relieved of many I/O related activities through the use of system assist processors (SAPs). Nevertheless, we also observed an increase of the system CPU load for the z/VM 6.3 environment using ECKD based paging devices as z/VM real memory became more constrained.

A significant portion of the system CPU load is assumed to result from the higher paging activity caused by the lack of z/VM real memory. Especially in the z/VM environments with only 64 GiB real memory (less than half the real memory used during the reference execution), the virtual memory access latency caused by the lack of real memory and the resulting z/VM paging activity is a key parameter for the overall performance. The faster the paging subsystem, the better the whole system performs. Workloads whose memory use pattern causes z/VM to keep larger fractions of their virtual memory resident have an advantage in this strongly memory constrained situation. Likewise, workloads that can process alternate execution paths when z/VM notifies them about a page fault have an advantage (see Interactions between virtual memory consumers and providers).

Overall, the system overhead resulting from constraints of z/VM real memory and/or real CPUs was very well within reasonable limits, except for the z/VM 6.2 environment. Even in the most memory constrained z/VM 6.3 configurations, more than 94 % of the available real CPU capacity was used for user workload processing.