Using workload service assurance to monitor z/OS critical jobs

How to monitor jobs that are critical to the customer's business and that must complete by their deadlines.

This scenario shows how an operator can monitor the jobs that are critical to the customer's business and that must complete by their deadlines.

Overview

The operator uses the Dynamic Workload Console to meet a Service Level Agreement (SLA) that requires a DB2 database to be up and running each day by 3 p.m., after its backup.

The operator must be informed when critical jobs risk missing their deadlines, so that appropriate action can be taken if needed. While the plan is running, the operator expects the scheduler to dynamically control the network of submitted jobs and to detect when a predecessor of a critical job is late, long running, or has ended with an error.
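The three conditions named above (late, long running, ended with an error) can be illustrated with a minimal Python sketch. This is not the scheduler's actual algorithm: the `Job` fields, the status strings, and the simple thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Job:
    # All fields are hypothetical; the product's internal data model is not shown in this scenario.
    name: str
    status: str                  # assumed values: "waiting", "running", "complete", "error"
    latest_start: datetime       # latest start that still lets the critical job meet its deadline
    estimated_minutes: float     # planned duration
    started_at: Optional[datetime] = None


def hot_list_reason(job: Job, now: datetime) -> Optional[str]:
    """Return why a predecessor would deserve attention, or None if it looks healthy."""
    if job.status == "error":
        return "ended in error"
    if job.status == "waiting" and now > job.latest_start:
        return "late"
    if job.status == "running" and job.started_at is not None:
        elapsed_minutes = (now - job.started_at).total_seconds() / 60
        if elapsed_minutes > job.estimated_minutes:
            return "long running"
    return None
```

In this sketch, a job that is still waiting after its latest start time is reported as late, which mirrors the DBMAINT situation described later in the scenario.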

Roles

The scheduling administrator and the operator are involved in this scenario:
IBM Workload Scheduler for z/OS scheduling administrator
When planning the operations, the administrator defines:
  • Scheduled time, duration, and deadline times.
  • Critical jobs.
IBM Workload Scheduler operator
Controls the submitted workload by using the Critical Jobs and Hot List views.

Setting up the environment

Complete the following tasks when planning your operations:
  1. Mark your critical jobs in the z/OS database. Set DBSTART and DBPRINT as critical jobs, using a job network with the following structure:


  2. Run a daily planning job. The daily planning process calculates the critical paths in your job network, using the deadline, scheduled time, and duration settings.
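To make the idea of a critical path concrete, the following Python sketch works backward from a critical job's deadline, derives the latest start time of each predecessor, and then follows the predecessors with the least slack. It is only an illustration under stated assumptions, not the daily planning algorithm itself. The network, the durations, the planned start times, and the `OTHERJOB` predecessor are hypothetical; only the DBSTART and DBMAINT names and the 3 p.m. deadline come from this scenario.

```python
from datetime import datetime, timedelta

# Hypothetical network: job name -> (duration in minutes, planned start, predecessors).
NETWORK = {
    "DBSTART":  (20, datetime(2024, 1, 1, 14, 30), ["DBMAINT", "OTHERJOB"]),
    "DBMAINT":  (60, datetime(2024, 1, 1, 13, 30), []),
    "OTHERJOB": (15, datetime(2024, 1, 1, 13, 0), []),
}


def latest_starts(network, critical_job, deadline):
    """Walk backward from the critical job and return the latest start time of each job."""
    latest = {}

    def visit(name, must_end_by):
        duration, _planned, predecessors = network[name]
        start = must_end_by - timedelta(minutes=duration)
        # Keep the tightest (earliest) constraint if a job is reached more than once.
        if name not in latest or start < latest[name]:
            latest[name] = start
            for pred in predecessors:
                visit(pred, start)

    visit(critical_job, deadline)
    return latest


def critical_path(network, critical_job, deadline):
    """Return (path, slack), where path follows the predecessors with the least slack."""
    latest = latest_starts(network, critical_job, deadline)
    slack = {name: latest[name] - network[name][1] for name in latest}
    path = [critical_job]
    while network[path[-1]][2]:
        path.append(min(network[path[-1]][2], key=lambda pred: slack[pred]))
    return list(reversed(path)), slack


path, slack = critical_path(NETWORK, "DBSTART", datetime(2024, 1, 1, 15, 0))
print(path)
print(slack["DBMAINT"])
```

With these assumed values, DBMAINT has less slack than the other predecessor, so it is the one placed on the critical path of DBSTART.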

Running the scenario

After you update your current plan, you can monitor your critical workload by using the Critical Path and Hot List views:
  1. In the navigation bar, click Monitoring & Reporting > Workload Monitoring > Monitor Workload.
  2. In the Engine drop-down list, select the check box for each engine where the task must run.
  3. In the Object Type drop-down list, select Critical job.
  4. Click Edit.
  5. In the General Filter panel, specify DB* as the Job Name and set a Risk Level other than None as the filter criteria, so that the task returns only the critical jobs that risk missing their deadlines.
  6. Click Save to complete the task, leaving the default values in the remaining panels.
  7. Run the task.
  8. Select the DBSTART job and click Critical Path to view the path of the DBSTART predecessors with the least slack time. The Critical Path view does not show any cause for the delay, because no problems occurred for any of the DBSTART predecessors in the critical path. Return to the Monitor Workload task output.
  9. Click Hot List or the Potential Risk hyperlink to get a list of any critical job predecessor that is late, has been running for too long, or has ended with an error. The returned Hot List shows DBMAINT as a late job. It is scheduled to run on the CPU2 workstation.
    1. Click the CPU2 hyperlink.
    2. After verifying that CPU2 is offline, activate the workstation. The DBMAINT job starts to run.
  10. Refresh the Monitor Workload task output. It shows that the Risk Level for the DBSTART job is now No Risk.
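The effect of the filter used in this procedure (job name DB*, risk level other than no risk) can be sketched in Python as follows. This is a simplified illustration, not a Dynamic Workload Console API: the `jobs_at_risk` function, the dictionary layout, the `PAYROLL` job, and the risk level assumed for DBPRINT are all hypothetical.

```python
from fnmatch import fnmatchcase


def jobs_at_risk(jobs, name_pattern="DB*"):
    """Return the names that match the wildcard pattern and whose risk level is not 'No Risk'."""
    return [
        name
        for name, risk in jobs.items()
        if fnmatchcase(name, name_pattern) and risk != "No Risk"
    ]


# Hypothetical snapshot before CPU2 is activated: DBSTART is at risk because DBMAINT is late.
before = {"DBSTART": "Potential Risk", "DBPRINT": "No Risk", "PAYROLL": "Potential Risk"}

# Hypothetical snapshot after CPU2 is activated and DBMAINT has started.
after = dict(before, DBSTART="No Risk")

print(jobs_at_risk(before))
print(jobs_at_risk(after))
```

In this sketch, PAYROLL is excluded because its name does not match DB*, even though it is at risk, and DBSTART drops out of the result once its risk level returns to No Risk.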