Cluster jobs

When managing an IBM® i cluster, you need to know about cluster job structures and how they are organized on the system.

Cluster resource services jobs

Cluster resource services consists of a set of multi-threaded jobs. Critical cluster resource services jobs are system jobs and run under the QSYS user profile. Several work management-related functions, such as ending a job (ENDJOB), are not allowed on system jobs. This means that users cannot inadvertently end one of these cluster system jobs and cause problems in the cluster and high availability environment. When clustering is active on a system, the following jobs run as system jobs:
  • Cluster control job consists of one job that is named QCSTCTL.
  • Cluster resource group manager consists of one job that is named QCSTCRGM.
    Note: The QCSTCTL and QCSTCRGM jobs are cluster-critical jobs. That is, both jobs must be running for the node to be active in the cluster.
  • Each cluster resource group has one job per cluster resource group object. The job name is the same as the cluster resource group name.
  • Cluster administrative domain jobs consist of a single system job running on every node in the cluster. The name of the system job is the name of the cluster administrative domain.
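The naming rules in the list above can be sketched as a small model. This is only an illustration in Python, not an IBM i API: the helper functions are hypothetical, and the names CRGDATA1 and ADMDMN1 in the usage note are made-up examples. Only QCSTCTL and QCSTCRGM come from the text.

```python
# Illustrative model only: these helpers are hypothetical and do not call
# any IBM i API. They restate the job naming rules described in the text.

# The two cluster-critical system jobs named in the text.
CRITICAL_JOBS = ("QCSTCTL", "QCSTCRGM")


def expected_system_jobs(crg_names, admin_domain_names):
    """Return the system job names expected while clustering is active.

    The result holds the two critical jobs, one job per cluster resource
    group (named after the CRG), and one job per cluster administrative
    domain (named after the domain).
    """
    jobs = list(CRITICAL_JOBS)
    jobs.extend(crg_names)
    jobs.extend(admin_domain_names)
    return jobs


def missing_critical_jobs(running_job_names):
    """Return the critical jobs that are absent from the running job names."""
    running = set(running_job_names)
    return [job for job in CRITICAL_JOBS if job not in running]
```

For example, calling expected_system_jobs(["CRGDATA1"], ["ADMDMN1"]) yields the two critical job names followed by CRGDATA1 and ADMDMN1. A non-empty result from missing_critical_jobs would indicate, under this model, that the node cannot be active in the cluster.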

Some work management actions do end these cluster system jobs. When that happens, clustering ends on the node and failover occurs, based on how that node is defined in the cluster resource group (CRG). See the topic, Example: Failover outage events, for a complete list of system-related events that cause failovers.

You can use the Change Cluster Recovery (CHGCLURCY) command to restart a cluster resource group job that ended, without having to end and restart clustering on the node.

Several other less critical cluster-related jobs are part of the QSYSWRK subsystem. Ending the QSYSWRK subsystem ends these jobs without causing a failover; however, doing so can cause cluster problems that might require a recovery action. Some of these jobs run under the QSYS user profile.

Most cluster resource group APIs result in a separate job being submitted that uses the user profile specified when the API was invoked. The exit program defined in the cluster resource group is called in the submitted job. By default, the jobs are submitted to the QBATCH job queue. Because this job queue is generally used for production batch jobs, work already queued there can delay or prevent completion of the exit programs. To allow the APIs to run effectively, create a separate user profile, job description, and job queue for use by cluster resource groups. Specify the new user profile for all cluster resource groups that you create. The same program is processed on all nodes within the recovery domain that is defined for the cluster resource group.

A separate batch job is also submitted for a cluster administrative domain when a cluster resource group API is called. This job calls the IBM-supplied QCSTADEXTP program and runs under the QCLUSTER user profile, using the QCSTJOBD job description.
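The attributes of the two kinds of submitted jobs described above can be summarized in a small model. This is a sketch in Python for illustration only and does not submit anything on IBM i. The SubmittedJob class and both functions are hypothetical, as are the names CRGUSER, CRGEXIT, and CRGJOBQ in the usage note. Treating the job queue as a direct parameter is a simplifying assumption.

```python
# Illustrative model only: nothing here calls an IBM i API.
from dataclasses import dataclass
from typing import Optional

DEFAULT_JOB_QUEUE = "QBATCH"  # default job queue named in the text


@dataclass(frozen=True)
class SubmittedJob:
    """Attributes of a batch job submitted by a cluster resource group API."""
    user_profile: str
    program: str
    # None means the text does not state a value for this attribute.
    job_queue: Optional[str] = None
    job_description: Optional[str] = None


def crg_exit_job(crg_user_profile, exit_program, job_queue=DEFAULT_JOB_QUEUE):
    """Model the job that calls a cluster resource group exit program.

    The job runs under the user profile given for the cluster resource
    group and goes to QBATCH unless a dedicated job queue is supplied.
    """
    return SubmittedJob(
        user_profile=crg_user_profile,
        program=exit_program,
        job_queue=job_queue,
    )


def admin_domain_job():
    """Model the job submitted for a cluster administrative domain."""
    return SubmittedJob(
        user_profile="QCLUSTER",
        program="QCSTADEXTP",
        job_description="QCSTJOBD",
    )
```

For example, crg_exit_job("CRGUSER", "CRGEXIT") models the default case, where the exit program competes with production batch work in QBATCH. Passing job_queue="CRGJOBQ" models the recommended setup with a dedicated job queue.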