Service classes, report classes, and CPU activity

Learn how to use the SMF 72 records to observe the workload activity of a service class that is managed by z/OS® Workload Manager (WLM), a report class for an IBM® z/OS Connect server, and a report class for a specific API. You will also learn how the SMF 70 records are used to observe the CPU activity for an LPAR.

Using SMF 72 records to analyze workload activity in service classes and report classes

z/OS Workload Manager (WLM) is a component of z/OS that monitors a sysplex and determines how much resource is given to each item of work in the sysplex to meet the goals that you have defined for it.

Everything that is managed by WLM has a service class, and can optionally have a report class. Service classes relate to the process of managing the work, and are used to tell the z/OS system what work is more important and what work is less important. Report classes relate to the process of collecting data about the work for later report generation.

In WLM, you can assign a service class and a report class for an IBM z/OS Connect server. The service class and report class for each item can be arranged as you choose. For example, two IBM z/OS Connect servers might be defined with the same service class, and could share that service class with other products such as CICS® or IMS so that they have the same performance goals and the same level of access to system resources. However, they can each have different report classes, so that their usage of system resources is reported separately. For the best results, place work of the same type, with the same goals and importance, into the same service class wherever possible, but use as many different report classes as you need to achieve the reporting granularity that you require.

Each service class is associated with a performance goal, which specifies the target toward which WLM manages the work in the service class. For example, the goal could specify an average response time for APIs in the service class. When you create a goal for your service class you also assign an importance level, which applies in case problems occur in meeting the goal. The importance level tells WLM how important it is to meet the goal relative to meeting the goals set for other work in the sysplex.

Each service class is associated with an Importance level that specifies how important it is to your business that this workload meets its goal. The Importance level defines how work is treated by the system and which service class should receive resources in order to achieve its goal. The Importance level can be one of the following values:
  • Highest (1)
  • High (2)
  • Medium (3)
  • Low (4)
  • Lowest (5)
  • Disc (Discretionary) - A discretionary goal means “do the best that you can,” and usually applies to batch jobs. Note that for discretionary workloads, zIIP-eligible work cannot be run on GCPs. Do not use a discretionary goal for IBM z/OS Connect.

The absolute value is meaningless; it is the relative value that matters. Importance is ignored when you are meeting your goals. Discretionary work implicitly has no importance, while SYSTEM and SYSSTC work is treated as the most important because of its top dispatching priorities of 255 (0xFF) and 254 (0xFE) respectively.

Set the right importance level for IBM z/OS Connect workloads.
  • When prioritizing API provider workloads, where the IBM z/OS Connect server and the system of record (SOR) are managed by the same z/OS WLM, you should set the SOR to a higher importance level than the IBM z/OS Connect server so as not to overwhelm the SOR with requests.
  • When prioritizing API requester workloads where the IBM z/OS Connect server and the calling application (CICS, IMS or z/OS application) are managed by the same z/OS WLM, you should set the IBM z/OS Connect server to a higher importance level than the calling application so as not to flood the IBM z/OS Connect server with requests.

To see what service class a particular job is running in, issue the SDSF DA command. The following example shows that the IBM z/OS Connect server ZOSCONN is running in service class STC. This job is running in Period 1 (SP 1). For IBM z/OS Connect servers, using a service class with multiple periods is not recommended because it can reduce throughput. Multiple periods should be considered only for batch jobs. The panel also shows the Report Class for this job (ZOSCONN), and that the MEMLIMIT set in the JCL for the server is 8 GB.

Figure 1. Checking the service class

For a specific service class, for example, STC, you can view the SMF 72 records produced by RMF which show the CPU activity for all products classified in this service class and managed by z/OS WLM. In this example, all IBM z/OS Connect servers and CICS regions are classified within this service class.

Figure 2. Workload activity for service class STC
The report extract in Figure 2 shows a wealth of useful information. For example,
  • The interval time. Here it is set to 1 minute (line 2429). The data is averaged over the 1 minute period that started at 00.02.15. An interval of 1 minute is useful when diagnosing a problem, but it also generates a large amount of data. For a production environment, the interval is typically larger, for example, between 10 minutes and an hour, depending on how much SMF data you want to collect.
  • The date and time the policy was last activated (line 2432). If you changed the z/OS WLM configuration, for example, added a Report Class, you need to install and activate it so that it appears in the reports.
  • The name of the service class in this example is STC, and is configured in z/OS WLM to have importance level 2 (line 2434). This means that all jobs in service class STC run with the same level of importance and access to CPU.
  • The jobs in this service class used 31.22% of the GCPs (line 2447) and 126.75% of the zIIPs (line 2449). Of the GCP usage, 0.27% was zIIP-eligible work that had to run on a GCP because the zIIPs were busy (line 2448).
  • This service class goal is configured with an execution velocity of 50% (line 2457). For this interval, this goal was exceeded on this system (MV22) running with an execution velocity of 87.4% (line 2462).
  • The Performance Index is 0.6 (line 2462). A value less than one indicates that the service class is regularly exceeding its goals. If the value is 1, the service class is exactly meeting its goals. A value greater than 1 indicates the service class is not meeting its goals. Note that the performance index is only available for Service Class data, not Report Class data, and applies to all jobs in that service class.
  • Tasks in this service class spent an average of 0.9% of the time waiting for zIIPs (line 2462).
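
The Performance Index in the list above can be checked by hand. For an execution velocity goal, the index is the goal velocity divided by the achieved velocity. The following Python sketch reproduces the value in Figure 2 from the two velocities quoted above. The function names are hypothetical and are not part of RMF or WLM.

```python
def velocity_performance_index(goal_velocity: float, achieved_velocity: float) -> float:
    """Performance index for an execution velocity goal: goal / achieved."""
    if goal_velocity <= 0 or achieved_velocity <= 0:
        raise ValueError("velocities must be positive percentages")
    return goal_velocity / achieved_velocity


def interpret(pi: float) -> str:
    """Classify a performance index, rounded to one decimal as in the report."""
    pi = round(pi, 1)
    if pi < 1.0:
        return "exceeding goal"
    if pi > 1.0:
        return "missing goal"
    return "meeting goal"


# Values quoted for service class STC: goal 50%, achieved 87.4%.
pi = velocity_performance_index(50.0, 87.4)
print(f"PI = {pi:.1f}: {interpret(pi)}")  # PI = 0.6: exceeding goal
```

Because the index is a ratio, the same goal with an achieved velocity of 40% would give 1.25, which signals a missed goal.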

Given the information above, the CPU usage for this workload is acceptable. However, this information should be regularly monitored as workloads vary over time. Optionally you can classify an individual IBM z/OS Connect server in WLM by creating a Report Class. This is useful to see how well the server is performing and how much CPU is being used by the server.

Figure 3. Workload Activity for Report Class ZOSCONN
In the RMF extract in Figure 3, the IBM z/OS Connect server workload activity was classified for Report Class ZOSCONN.
  • The amount of processing run on zIIPs was 31.12% (line 38248).
  • The amount of processing run on GCPs was 9% (line 38246).
  • 8.41% was zIIP-eligible work that had to run on a GCP because no zIIPs were available (line 38247). Because this accounts for most of the processing that ran on GCPs, it is worth considering adding another zIIP.
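
To put the Figure 3 numbers in proportion, the following Python sketch expresses the zIIP overflow as two ratios: the share of zIIP-eligible work that ran on a GCP, and the share of GCP time that was zIIP-eligible. It assumes, as the values above suggest, that the IIPCP value is a subset of the CP value. The function name is hypothetical.

```python
def ziip_overflow(cp_pct: float, iipcp_pct: float, iip_pct: float) -> dict:
    """Summarize zIIP overflow from APPL% values (CP, IIPCP, IIP)."""
    eligible = iip_pct + iipcp_pct  # all zIIP-eligible work, wherever it ran
    if eligible <= 0 or cp_pct <= 0:
        raise ValueError("CP and zIIP-eligible usage must be positive")
    return {
        # Share of zIIP-eligible work that overflowed to a GCP.
        "eligible_on_gcp_pct": 100.0 * iipcp_pct / eligible,
        # Share of GCP time spent on work that could have run on a zIIP.
        "gcp_time_eligible_pct": 100.0 * iipcp_pct / cp_pct,
    }


# Values quoted for report class ZOSCONN: CP 9%, IIPCP 8.41%, IIP 31.12%.
result = ziip_overflow(cp_pct=9.0, iipcp_pct=8.41, iip_pct=31.12)
for name, value in result.items():
    print(f"{name}: {value:.1f}%")
```

With these inputs, roughly a fifth of the zIIP-eligible work overflowed, and that overflow made up more than 90% of the GCP time used by the server.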
You can also classify in z/OS WLM an individual IBM z/OS Connect API, service, API requester, or a group of these resources by creating one or more Report Classes. This is useful to determine the CPU usage of, for example, an API.
Figure 4. Workload Activity for Report Class CAR100
The RMF extract in Figure 4 shows the following information for an API called "car100":
  • The API was called 23789 times (ENDED value on line 740665)
  • The number of calls per second (TPS) was 396 (END/S value on line 740666)
  • The amount of zIIP used was 90.5% (IIP value on line 740674)
  • The amount of GCP used was 6.79%, of which 6.74% was eligible to run on a zIIP had a zIIP been available to run the request (CP and IIPCP values on lines 740672 and 740673).
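
The values in Figure 4 can be combined into per-call figures. The following Python sketch derives the call rate, the approximate CPU time per call, and the zIIP-eligible share of the CPU time. It rests on two assumptions that are not stated for this figure: the reporting interval is 60 seconds, and the CP and IIP values are percentages of a single engine. The function name is hypothetical.

```python
def api_cpu_summary(ended: int, interval_s: float,
                    cp_pct: float, iipcp_pct: float, iip_pct: float) -> dict:
    """Derive per-call metrics from workload activity values for one report class."""
    if ended <= 0 or interval_s <= 0:
        raise ValueError("ended and interval_s must be positive")
    total_pct = cp_pct + iip_pct  # GCP plus zIIP usage
    cpu_seconds = total_pct / 100.0 * interval_s  # assumes % of one engine
    return {
        "calls_per_second": ended / interval_s,
        "cpu_ms_per_call": 1000.0 * cpu_seconds / ended,
        "ziip_eligible_pct": 100.0 * (iip_pct + iipcp_pct) / total_pct,
    }


# Values quoted for report class CAR100; the 60 s interval is an assumption.
summary = api_cpu_summary(ended=23789, interval_s=60.0,
                          cp_pct=6.79, iipcp_pct=6.74, iip_pct=90.5)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the call rate works out to about 396 per second, which matches the END/S value, and each call costs about 2.45 ms of CPU.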
For examples of how to classify your workloads, APIs, and API requesters, see Measuring API workloads with WLM.

SMF 70 records, CPU Activity

The SMF 70 records provide useful information on the CPU activity for the machine. Subtype 1 contains measurement data including that for general processors (GCPs) and special purpose processors (zIIPs). Subtype 2 records contain measurement data for cryptographic coprocessors and accelerators. The following screen capture was taken while running a heavy workload through IBM z/OS Connect that was then calling CICS transactions.

Figure 5. SMF 70 record showing CPU activity for a heavy workload
The SMF 70 record in Figure 5 contains information that is useful when monitoring the performance of workloads. For example,
  • More work is running on the three zIIP engines than the GCPs. IBM z/OS Connect is up to 99% zIIP eligible, which accounts for the zIIP usage. To confirm this, look at the service classes your IBM z/OS Connect servers are running in and observe the CPU usage, and optionally the report classes for your server.
  • The three GCPs and three zIIPs are all online 100% of the time ("TIME % - ONLINE" column).
  • These processors are all dedicated to this particular LPAR, and not shared with other LPARs ("LOG PROC SHARE %" column).
  • HiperDispatch mode is enabled, which means that if the workload is low, some processors can be temporarily taken offline. This would be observed in the "Parked" column. For this workload, all processors were busy, and none were parked.
  • SMT is not enabled for either GCPs or zIIPs (see Simultaneous Multithreading (SMT) for zIIPs in GCPs and zIIPs).
  • The MSU (million service units per hour) value for this mainframe is 10017. This is useful for CPU capacity planning as you move your IBM z/OS Connect workloads from pre-production LPARs to production LPARs that might be on different hardware.
The RMF CPU activity report also shows whether work is queuing waiting for processors to be available.
Figure 6. RMF extract showing CPU activity
There are several observations to be made. For example,
  • 91.1% of the dispatched work was performed without waiting for a processor to become available (line 32). This is the ideal scenario in that there was no contention for a processor. Note that N is the number of GCPs and zIIPs available (line 49).
  • About 8% of the dispatched work was delayed in getting access to a processor. For example, for 36 pieces of work in this interval, 3% were delayed waiting for a processor (line 40).
  • The sum of the N + 1 ... N + 150 percentages in the DISTRIBUTION OF IN-READY WORK UNIT QUEUE is the percentage of time when at least one task could not be dispatched. A value higher than 60% could indicate contention for CPU.
  • The NUMBER OF WORK UNITS values (lines 48-51) indicate that most of the work was carried out on zIIP engines.
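
The contention check described above is a simple sum over the queue distribution. The following Python sketch adds up every bucket beyond "<= N" and compares the total with the 60% guideline. The bucket labels and all percentages except 91.1% are hypothetical sample values chosen for illustration; they are not taken from Figure 6.

```python
CONTENTION_THRESHOLD_PCT = 60.0  # guideline quoted in the text above


def queued_pct(distribution: dict) -> float:
    """Sum the percentages of all buckets beyond '<= N' (work had to queue)."""
    return sum(pct for bucket, pct in distribution.items() if bucket != "<= N")


def indicates_contention(distribution: dict) -> bool:
    """True when the queued percentage exceeds the guideline threshold."""
    return queued_pct(distribution) > CONTENTION_THRESHOLD_PCT


# Hypothetical distribution; only the '<= N' value comes from the text.
sample = {
    "<= N": 91.1,
    "= N + 1": 3.0,
    "= N + 2": 2.4,
    "= N + 3": 1.6,
    "<= N + 5": 1.2,
    "<= N + 10": 0.7,
}

print(f"queued: {queued_pct(sample):.1f}%")
print("possible CPU contention" if indicates_contention(sample) else "no sign of contention")
```

For this sample the queued share is 8.9%, well below the guideline, which is consistent with the observation that most work was dispatched without waiting.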