Summary

This paper describes how IBM Cognos BI was deployed and tuned in a virtualized environment on Linux on System z, consisting of an IBM z196 model 2817-M66 with 10 CPUs and 70 GiB of central storage, connected to an IBM System Storage® DS8800 Model 951.

This study analyzes how a workload mix in a virtualized environment can be managed to ensure a certain response time for a particular workload - in this study, a Cognos BI workload - independent of the load level of other competing workloads in a CPU-constrained environment. Clearly, this can only be achieved at the expense of the less important workloads. The level of this expense is an important parameter that needs to be considered when defining any resource management rules.

The most important parameter for influencing the CPU resources assigned to a z/VM® guest is the relative CPU share. Two methods of resource management are considered.

  1. Defining the relative shares manually (static method)

    This provides a baseline and shows what happens when competing workloads run with no prioritization. For this setup, each guest gets all the virtual CPUs its workload would require. Each virtual CPU then contributes a value of 100 to the relative share of the guest. This method is referred to as the fair-share setup in the following.

    z/VM guests with the largest number of virtual CPUs receive the highest total relative share.

    As expected, the big winner was the large DayTrader application guest with 6 virtual CPUs and a high CPU load level. The Cognos BI server showed a significant throughput degradation and a moderate degradation in response times, because its guest ends up with a lower total relative share under this method.
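    The fair-share setup can be sketched as z/VM user directory entries. This is a sketch only: the guest names, passwords, and memory sizes are hypothetical, and the 4 virtual CPUs for the Cognos BI guest are assumed for illustration; only the DayTrader guest's 6 virtual CPUs come from the study.

    ```
    * Fair-share setup: each virtual CPU contributes 100 points
    * to the guest's relative share
    *
    * DayTrader guest, 6 virtual CPUs (from the study)
    USER DAYTRD1 XXXXXXXX 8G 8G G
      MACHINE ESA 6
      SHARE RELATIVE 600
    *
    * Cognos BI guest, 4 virtual CPUs (assumed for illustration)
    USER COGNOS1 XXXXXXXX 8G 8G G
      MACHINE ESA 4
      SHARE RELATIVE 400
    ```

    With these definitions, the DayTrader guest already holds the larger share of the total (600 versus 400), which is why it dominates once the system is CPU constrained.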

  2. Using VMRM to manage CPU resources (dynamic method)
    • First, the relevant VMRM management parameters were determined. Because z/VM monitoring is the basis for VMRM resource management, the CP monitor sample interval determines how often VMRM receives updated CP monitor sample records and can adjust its management parameters accordingly. The default interval of 60 seconds can be too long if a faster reaction to changing load levels, and hence a faster adjustment of the management parameters, is desired. In that case, the recommendation is to reduce the CP monitor sample interval to 30 or 15 seconds. An interval of six seconds was found to be too short, and the overhead increased.
    • The most important VMRM parameter for this method is the CPU goal parameter.
      • With the most extreme CPU goal settings of 100 for the Cognos BI workload and 1 for the competing DayTrader workloads (one-workload-gets-all), it is possible to keep the load level and response times of the Cognos BI workload almost stable, independent of a constant parallel DayTrader workload at different load levels or an injected workload peak. However, this configuration carries the risk that the competing workloads suffer a severe reduction of assigned CPU resources. In the worst case, application timeouts can even occur when the system does not have sufficient CPU resources.
      • A more moderate configuration was a CPU goal setting of 70 for the Cognos BI workload and 25 for the competing DayTrader workloads. This reserves a certain capacity for the competing workloads once the system's CPU capacity is exhausted. It comes at the expense of the Cognos BI workload, but as long as the Cognos BI response times remain in an acceptable range, that might be tolerable. In addition, this configuration remains stable against a further increase of the load level of the DayTrader workloads.
    • In summary, the recommended approach is to adjust the VMRM goals for the workload groups iteratively, for example starting with 70/25 and moving to 80/20, until the desired service times for the preferred workload are reached.

      VMRM starts regulating the relative shares from the initial value defined in the z/VM user directory entry for each guest. To shorten the time VMRM needs to reach the relative share targets implied by the CPU velocity goals, it can be useful to define lower initial shares in the user directory for the less prioritized workloads. Note that VMRM does not change the relative shares back when the system becomes unconstrained again.
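The moderate 70/25 configuration could be expressed in a VMRM configuration file along these lines. This is a sketch: the workload, goal, and user names are hypothetical, and the IMPORTANCE values are an assumption chosen to rank the Cognos BI group above the DayTrader group; only the goal values 70 and 25 come from the study.

```
* VMRM configuration sketch: CPU velocity goals for the
* Cognos BI group and the competing DayTrader group
GOAL COGGOAL VELOCITY CPU 70
GOAL DTGOAL VELOCITY CPU 25
WORKLOAD COGWL USER COGNOS1
WORKLOAD DTWL USER DAYTRD1 DAYTRD2
MANAGE COGWL GOAL COGGOAL IMPORTANCE 5
MANAGE DTWL GOAL DTGOAL IMPORTANCE 1
```

The shorter CP monitor sample interval discussed above would be set separately, for example with the CP command MONITOR SAMPLE INTERVAL 30 SECONDS.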

In an environment with groups of guests that have cooperating components, or that run clustered, using VMRM offers additional advantages over a static relative share setup:

  • Additional guests can simply be added to or removed from a workload group without the need to review the relative shares manually. For example, when a new low-priority guest is added in a non-VMRM environment managed with relative shares only, the relative shares of all guests have to be recalculated; otherwise, the CPU portion of the low-priority guests as a group increases.
  • When only a few of the high-priority guests are active but many of the low-priority guests are, the low-priority guests might get a larger portion of the CPU capacity than intended, because the sum of their relative shares is higher.
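The first point amounts to a one-line change in the VMRM configuration file, sketched here with hypothetical workload and guest names:

```
* Before: two low-priority DayTrader guests in the group
WORKLOAD DTWL USER DAYTRD1 DAYTRD2
*
* After: a third guest joins the group; the shares of the
* other guests need no manual recalculation
WORKLOAD DTWL USER DAYTRD1 DAYTRD2 DAYTRD3
```

In a share-only setup, the same change would require revisiting the SHARE RELATIVE value of every guest to keep the intended proportions.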

When using the z/VM Resource Manager, there are some important things to consider:

  • The CPU goal definitions relate to wait times, not to CPU usage. A CPU goal of 80 therefore does not mean that the workload gets 80% of the CPU capacity; it might be less.
  • VMRM is compatible with the ILMT (IBM License Metric Tool).
  • VMRM is NOT compatible with CPU pooling.
  • VMRM is only effective when the system is CPU constrained. As long as free CPU capacity is available, all guests can get the CPU resources they require.
  • This study is restricted to the VMRM capability to manage the CPU resources. Other VMRM management capabilities are not considered.

VMRM can be a very useful tool for managing CPU resources in an environment with two or more competing workloads where a single workload has a higher priority than the others. It can ensure constant throughput levels and response times independent of the load level of the competing workloads, and it can simplify the z/VM setup. In particular, its ability to handle environment changes, whether in the guest setup or in workload levels, makes it very attractive.