You can recover a corrupted plan on a master
domain manager using the latest archived plan. However, this is possible
only if you completed some configuration steps before the file corruption
occurred.
The following procedure recovers a corrupted plan using ResetPlan. The
plan is recovered from the latest archived plan, and any events logged throughout the day are
written to a new event message file. The last archived Symphony file is copied into the current
Symphony file, and then JnextPlan is run to apply the events
from the evtlog.msg file.
Restriction: Before you can perform the recovery procedure, you must have completed
some configuration steps before the file corruption occurred.
Note:
- Some IBM® Workload Scheduler events
that were triggered before the recovery procedure was applied might be triggered again after the
recovery procedure has completed. This limitation concerns events that are not managed through
a message queue, for example, UNTIL, DEADLINE, and MAXDUR.
- Jobs in the USERJOBS job stream are not subject to resource control. As a
result, you must check and adjust the affected resources manually.
- Prompts in the recovered plan might have a prompt number different from the
prompt number in the original plan. To prevent mismatches, prompt reply events are not
recovered.
- Complete the following configuration steps so that you
can use the recovery procedure in the future if it becomes necessary:
- In the localopts file, add the
following attribute and value: bm log events = ON.
- Optionally, customize the path where IBM Workload Scheduler creates
the evtlog.msg event file by setting the bm
log events path property in the localopts file.
If you do not modify this setting, the evtlog.msg event
file is created in the following default location: <TWA_INST_DIR>/TWS.
- Stop and start all IBM Workload Scheduler processes
or run JnextPlan to create the evtlog.msg file.
- If necessary, you can configure the maximum size of
both the evtlog.msg and Intercom.msg event
files as follows:
evtsize -c evtlog.msg 500000000
evtsize -c Intercom.msg 550000000
Note: The default size
of these event files is 10 MB. When the maximum size is reached, events
are no longer logged to these files, and the recovery procedure
cannot recover those events or any that follow. In addition, the following
BATCHMAN warning is logged to
<TWA_INST_DIR>/TWS/stdlist/traces/YYYYMMDD_TWSMERGE.log:
13:11:51 18.10.2012|BATCHMAN:+ WARNING:Error writing in evtlog:
AWSDEC003I End of file on events file.
13:11:51 18.10.2012|BATCHMAN:*
13:11:51 18.10.2012|BATCHMAN:* AWSBHT160E The EvtLog message file is full,
events will not be logged until a new Symphony is produced. Recovery with
event reapply is no more possible until that time.
13:11:51 18.10.2012|BATCHMAN:*
If you encounter this problem, increase the size of the
evtlog.msg and Intercom.msg event files.
Consider that for 80,000 jobs and a Symphony file of size 40 MB, the
evtlog.msg file is approximately 70 MB in size.
Important: The Intercom.msg maximum
size should always be set to a value greater than the maximum size
of evtlog.msg.
In the <TWA_INST_DIR>/TWS/stdlist/traces/YYYYMMDD_TWSMERGE.log trace
file, the BATCHMAN process logs an informational
line containing the expected size of the evtlog.msg queue.
For example:
19:02:06 14.10.2012|BATCHMAN:INFO:0.25 MB of events to log during this
batchman run
If Intercom.msg reaches the maximum size during the recovery procedure,
batchman stops.
- If the file system where evtlog.msg resides
runs out of space, a BATCHMAN warning is logged
to <TWA_INST_DIR>/TWS/stdlist/traces/YYYYMMDD_TWSMERGE.log as
follows:
13:10:36 16.10.2012|BATCHMAN:+ WARNING:Error writing in evtlog:
AWSDEC002E An internal error has occurred.
The following UNIX system error occurred on an events file:
"No space left on device" at line = 3517.
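
Because a full file system stops event logging, it can help to check the free space before you raise the maximum sizes with evtsize. The following is a minimal sketch, not part of IBM Workload Scheduler: the function names free_kb and has_room_for are hypothetical, and the sketch only assumes a POSIX shell with the standard df and awk utilities.

```shell
#!/bin/sh
# Hypothetical helper (not an IBM Workload Scheduler command):
# print the space available, in KB, on the file system that holds DIR.
# Usage: free_kb DIR
free_kb() {
    # df -P prints one POSIX-format line per file system;
    # the fourth field of that line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed (exit status 0) when the file system that
# holds DIR has at least BYTES free, for example the maximum size that
# you plan to set for evtlog.msg.
# Usage: has_room_for DIR BYTES
has_room_for() {
    dir=$1
    bytes=$2
    # Round the requested size up to whole kilobytes.
    needed_kb=$(( (bytes + 1023) / 1024 ))
    [ "$(free_kb "$dir")" -ge "$needed_kb" ]
}
```

For example, `has_room_for /opt/ibm/TWA/TWS 500000000 || echo "not enough space for evtlog.msg"` checks the directory used in the UNIX example of this topic against the evtlog.msg size shown above. Adjust the directory if you set the bm log events path property.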
- Complete the recovery procedure:
- Ensure that the IBM Workload Scheduler processes
are stopped. To stop them, run conman stop.
- Run the planman showinfo command and save the information
that it returns. You need this information later to run JnextPlan.
- Run ResetPlan. The corrupted Symphony
file is archived in the schedlog folder.
- Copy the second-to-last Symphony file archived in the schedlog folder
over the current Symphony file. Do not use the most recent archive, which is the
corrupted file. For example, on UNIX, run the following command:
cp -p /opt/ibm/TWA/TWS/schedlog/MYYYYMMDDhhmm /opt/ibm/TWA/TWS/Symphony
- Run JnextPlan as follows using the
information retrieved when you ran planman showinfo:
JnextPlan -from MM/DD/YYYY hhmm TZ Timezone -for hhhmm
where:
- -from
- Production plan start time of last extension.
- -for
- Production plan time extension.
When running this procedure, consider that the run number,
that is, the total number of times the plan was generated, is automatically
increased by one.
Job stream instances that have already completed successfully at the time
this procedure is run are not included in the recovered plan.
After you have performed the recovery procedure, the workstation limit is set to 0
and the evtlog.msg queue is cleared with each successive run of JnextPlan.
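
The step that copies the second-to-last archived Symphony file is easy to get wrong by hand, so the selection can be sketched as a small shell helper. This is an illustration only: second_latest_archive is a hypothetical function name, and the sketch assumes that archived files follow the MYYYYMMDDhhmm naming shown in the cp example, so that sorting by name matches sorting by time.

```shell
#!/bin/sh
# Hypothetical helper (not an IBM Workload Scheduler command):
# print the path of the second most recent archived Symphony file
# in a schedlog directory. Prints nothing if fewer than two exist.
# Assumption: archives are named MYYYYMMDDhhmm, so name order is time order.
# Usage: second_latest_archive SCHEDLOG_DIR
second_latest_archive() {
    schedlog_dir=$1
    # List the archives, sort newest name first, and keep the second entry.
    # After ResetPlan, the first entry is the corrupted file, so it is skipped.
    ls -1 "$schedlog_dir"/M[0-9]* 2>/dev/null | sort -r | sed -n '2p'
}
```

Run the helper after ResetPlan and confirm that its output is not empty and names the archive you expect before passing it to cp -p as the source file.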