Troubleshooting
Problem
This document explains why IBM i jobs are held, rather than ended, when they exceed their CPU time or temporary storage limits.
Resolving The Problem
The Maximum CPU Time and Maximum Temporary Storage that a job can use are defined in the class object named by the routing entry in the subsystem description.
o When a job reaches the maximum processing unit time, it is ended with message CPC1218.
o When a job reaches the maximum temporary storage allowed, it is ended with message CPC1217.
In some cases, the jobs would be successful if allowed a bit more CPU time or storage. The jobs should be held rather than ended and the system operator should be notified.
The change to hold the jobs instead of ending the jobs was introduced in 7.1 by PTF SI42845 for APAR SE45779.
The class object defines the processing attributes for a job. The routing entry in the subsystem description determines which class object is used when a job is initiated. Two of the processing attributes within the class object are Maximum processing unit time (CPUTIME) and Maximum temporary storage allowed (MAXTMPSTG), both of which default to *NOMAX. Before this PTF, if values were entered for these parameters, the job was ended as soon as one of the limits was hit. The cause text of the resulting message (CPC1218 or CPC1217) tells you whether the job ended abnormally because the maximum CPU time was consumed or because the maximum temporary storage limit was exceeded.
The system cannot know whether a job is close to completing its work at the moment a limit is hit. Given a little more CPU time or temporary storage, the job might be able to run to completion. Because it is difficult to predict the upper CPU or temporary storage limits that a job requires, and because the job was ended when these limits were hit, many customers simply left these values at their default setting.
The PTF named above changes this behavior so that jobs are no longer ended when they exceed their maximum processing unit time or their maximum temporary storage limit. Instead, the jobs are held. When the system holds a job for one of these reasons, it sends a message to the QSYSOPR message queue:
o CPI112D – Job held by the system, CPUTIME limit exceeded
o CPI112E – Job held by the system, MAXTMPSTG limit exceeded
This change allows the system operator to determine whether the jobs should be ended or if they should be allowed to continue to run to completion.
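As a rough illustration of how an operator script might act on these messages, the sketch below maps each message ID to the limit that was exceeded. The message IDs and parameter names come from the text above; the dictionary and function names are hypothetical, and nothing here reads the QSYSOPR message queue.

```python
# Message IDs and the limits they refer to are taken from the text above.
# HELD_JOB_MESSAGES and limit_for_message are hypothetical names.
HELD_JOB_MESSAGES = {
    "CPI112D": "CPUTIME",    # job held, CPU time limit exceeded
    "CPI112E": "MAXTMPSTG",  # job held, temporary storage limit exceeded
}


def limit_for_message(message_id):
    """Return the limit named by a hold message, or None for any other ID."""
    return HELD_JOB_MESSAGES.get(message_id.strip().upper())
```

Any other message ID, including the older CPC1218 and CPC1217 completion messages, returns None, so a caller can ignore messages that do not describe a held job.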
If you want a held job to continue running, you must first raise the limit that was met and then use the Release Job (RLSJOB) command. You cannot release a job that is still above its limit. To allow these values to be changed, the Change Job command and the Change Job APIs have been enhanced.
The Change Job (CHGJOB) command has been enhanced with two new parameters:
o Maximum CPU time (CPUTIME): The maximum CPU time parameter specifies the maximum processing unit time (in milliseconds) that the job can use. If the maximum time is exceeded, the job is held.
o Maximum temporary storage (MAXTMPSTG): The maximum temporary storage parameter specifies the maximum amount of temporary auxiliary storage (in megabytes) that the job can use. This temporary storage is used for storage required by the program itself and by implicitly created internal system objects used to support the job. (It does not include storage for objects in the QTEMP library.) If the maximum temporary storage is exceeded, the job is held.
The Change Job APIs have been enhanced with two new keys:
o Maximum processing unit time allowed, in milliseconds (1302)
o Maximum temporary storage allowed, in megabytes (1305)
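The raise-then-release sequence described above can be sketched as a small helper that builds the two CL command strings. This is only an illustration: the helper name, the job identifier, and the limit values are hypothetical, and the strings are returned rather than run on a system.

```python
def raise_and_release(number, user, name, cputime_ms=None, maxtmpstg_mb=None):
    """Return the CL commands that raise a limit and then release the job.

    The job number is passed as a string so leading zeros are preserved.
    CPUTIME is in milliseconds and MAXTMPSTG in megabytes, as stated above.
    """
    if cputime_ms is None and maxtmpstg_mb is None:
        raise ValueError("raise at least one limit before releasing the job")

    job = f"{number}/{user}/{name}"  # qualified job name: number/user/name
    params = []
    if cputime_ms is not None:
        params.append(f"CPUTIME({cputime_ms})")
    if maxtmpstg_mb is not None:
        params.append(f"MAXTMPSTG({maxtmpstg_mb})")

    # The limit must be changed first; the release comes second.
    return [f"CHGJOB JOB({job}) " + " ".join(params), f"RLSJOB JOB({job})"]
```

For example, `raise_and_release("123456", "QUSER", "NIGHTLY", cputime_ms=600000)` yields a CHGJOB command followed by an RLSJOB command for the same job. The ordering matters because the release is rejected while the job is still above its limit.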
This PTF makes it easier to protect your system from a runaway job that consumes more CPU or uses more temporary storage than expected. By setting these limits higher than any job should legitimately need, you can protect the system from the potentially negative effects of a runaway job. Because the job is held rather than ended, the limits do not need to be set perfectly. If either limit is hit, you can increase the limit with the Change Job command or API and then release the job so that it continues to run. If the new upper limit is met, the system will once again hold the job.
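The hold, raise, release, and hold-again cycle can be pictured with a toy model. This is a minimal sketch of the behavior described in this document, not a system interface: the class and method names are hypothetical, a single generic resource stands in for CPU time or temporary storage, and treating "reaching" the limit as the hold point is an assumption.

```python
class Job:
    """Toy model of a job with one resource limit (hypothetical, not a system API)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.held = False

    def consume(self, amount):
        """Use more of the resource; hold the job once usage reaches the limit."""
        if self.held:
            raise RuntimeError("a held job cannot run")
        self.used += amount
        if self.used >= self.limit:  # assumption: reaching the limit triggers the hold
            self.held = True

    def change_limit(self, new_limit):
        """Stand-in for raising the limit with the Change Job command or API."""
        self.limit = new_limit

    def release(self):
        """Release only succeeds when the limit is above current usage."""
        if self.used >= self.limit:
            return False
        self.held = False
        return True
```

In this model, a job created with a limit of 100 that consumes 120 is held and cannot be released until the limit is raised above 120. If it later reaches the new limit, it is held again.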
With the change introduced with this PTF, you should start to move away from the default *NOMAX values and set appropriate limits. Particularly with the temporary storage limit, you can prevent a system outage by setting an upper limit on the class object for the maximum temporary storage that a job can use (be sure to keep that limit lower than the amount of storage available on the system). With the new behavior of the job being held when the limit is hit, you have the capability to assess and determine the best action for the job.
Historical Number
622866445
Document Information
Modified date:
02 October 2024
UID
nas8N1011183