Troubleshooting
Problem
Resolving The Problem
Note: A job in Rollback might not show these statistics from WRKACTJOB if the rollback is being done by a second thread. The job will show END status, and you must look at the commit definition to see that a rollback is in progress. You should use the WRKCMTDFN command to view the status of the commit definition.
Ending a rollback will compromise the integrity of your database. It does not leave files "damaged"; however, it does leave them in a partial state that is unusable. Actions must be taken to use the files again. There are important considerations to understand about ending a rollback before you end it. It is highly recommended that you read this document in its entirety and understand all aspects before ending the rollback.
Aspects to consider when ending the rollback:
| o o |
It is important to identify the files involved in the rollback before ending the rollback. File level changes (ALTER TABLE) cannot be ended and the Rollback must complete |
| o | The files involved in the rollback will all be marked partial and will be unusable as soon as the cancel is done. Action must be taken and steps are documented below explaining how to allow the files to be usable again. |
| o | Other jobs trying to use the files (after the rollback is ended) will fail when accessing the file. |
| o | Ending the rollback will stop the system from rolling back changes; however, this will still require all the record locks to be released (which may take some time). |
After the rollback itself has been ended, there is still a step of releasing the record locks that can take a considerable amount of time. This step might be a few minutes, or a couple of hours. The releasing lock step is a subset of the rollback process and cannot be ended. The unlock step needs to complete. Monitoring the progress of this step is documented below. If you cannot wait for the locks to release, you may IPL the system to release them faster. This will be a normal IPL, as long as the rollback has been ended and you see the unlock step reach at least 1 percent complete.
To check the status of the rollback and lock status, do the following
1. Issue the WRKCMTDFN command, and enter the fully qualified job name.
2. Select Option 5 to Display the status.
3. Press F6 to Display resource status.
4. Page down one and select Journal.
5. Press F11 to see Rollback status.
6. Press F11 again to see Lock status.
Note: There is a step between the rollback of the entries and the unlocking of records. This step has to do with indexes. If there are a lot of indexes over the files involved in the transaction and there were a lot of record deletes, this step will take longer. The rollback will show 100%; however, the unlock will remain at 0% for some time. To estimate how long the unlock will take, you should start your estimate after it has moved to 1%.
Note: After the unlock step starts, a save while active should be able to run with out failures.
To find the files involved in the Rollback
1. Issue the WRKCMTDFN command, and enter the fully qualified job name.
2. Select Option 5 to Display the status.
3. Press F6 to Display resource status.
4. Select Option 1 by record to see the files involved.
5. Only files with a nonzero amount are affected.
Note: Not all files may be listed if the transaction is an NTS XA transaction. You can tell if it is an NTS XA by the commitment definition name; it will be *TNSOBJ for NTS XA. For NTS XA, you will need to go through the Journal Receiver to find the files involved by using DSPJRNl. Looking through many journal entries will be time consuming.
Actions you need to take when the Rollback is cancelled
One of the following options must be performed. You must fully understand what the job was doing and what changes it was making to your database to determine what option to take. Option 1 is IBMs recommendation to take.1. The best option is to delete the files involved and restore from a known state. It is recommended that you attach a new Journal Receiver before you delete the files involved in the rollback. APYJRNCHG can be used after the files are restored to get the files to a more current point in time. While the APYJRNCHG is processing, files involved cannot be used. Assistance with detailed questions of using APYJRNCHG are not supported under your normal SWMA; however, this can be covered under a consult line basis. There is not a way to estimate how long the APYJRNCHG will take, and you must decide whether it is a time saver to end the rollback/restore and use APYJRNCHG or to allow the rollback to finish. General steps to take are as follows:
a. CHGJRN JRN(LIB/JRN) JRNRCV(*GEN) Note: Message in job log will note what the first sequence number is. You must know that sequence number to use APYJRNCHG. Sequence number is not reset. First sequence number is 6436076. <-- note seq #
b. Delete files.
c. Restore saved version of files.
d. Apply changes to files:
o APYJRNCHG JRN(LIB/JRN) FILE((LIB/PF1) (LIB/PF2) (LIB/PF3) (LIB/PF4)) RCVRNG(*LASTSAVE) FROMENTLRG(*LASTSAVE)
o TOENTLRG(6436076) CMTBDY(*YES) OUTPUT(*OUTFILE) OUTFILE(LIB/APYCNLOUTF)
Note: Replace the LIB/JRN and LIB/PF1 with correct journal and file names. You will also need to replace the sequence number that you obtained from Step a.
Using the APYJRNCHG as noted in Step d will only apply changes for complete transactions within the range. Other jobs work will be preserved; however, changes for the cancelled transaction will not be applied, nor will changes for other transactions that are not complete before the ending sequence number specified.2. Issue CHGJRNOBJ OBJ((LIB/FILE)) ATR(*PTLTNS) PTLTNS(*ALWUSE) replacing the LIB/FILE with all files involved. This will allow the files to be used as is, in the partial state. Keep in mind this does not verify the data, It just allows the file to be used in the partial state it is in. SQL or a program could be used to ensure the file is accurate or make changes to the data that is in the file if data manipulation is needed to finish removing the partial transactions. You need to know what data is needed to be changed and cannot tell what rows are partial. This requires an exclusive seize, but typically can be run when the unlock phase is still running.
CHGJRNOBJ will issue the following journal entry in the journal.
F MO Allow use with partial transaction entry for each file being removed from partial state
Vlog 0D001509 (vl0D001509) will also be issued for every object being removed from partial state
Note: APYJRNCHG/RMVJRNCHG will not be possible if this option is done.
Example of ending a rollback
Example 1: JOB1 was updating product information and inventory. It had updated 100 million products before the job was ended and started to rollback the changes. The rollback was ended 2 hours later. The file is now put into this partial and unusable state. Some of the products have been updated, and some of the changes have been removed. This could affect inventory if files are left and used in this state.
Considerations: This might be a case where cancelling the rollback was not a good choice because things are left in a questionable state. Getting files to a consistent state using DLTF/RSTOBJ/APYJRNCHG will require exclusive use of the files and may take longer than if the original rollback was just allowed to complete.
Example 2: JOB2 was deleting all orders from the previous year. It had deleted 100 million records; however, it had not completed. The job was ended, and the changes started to rollback (inserting the rows back in the file). The rollback was ended 1 hour later. In this example, the fact that all of the old history data did not get deleted is very likely OK. The delete can be reattempted either with commitment control set to *NONE, or by deleting one week worth of data and then issuing a commit.
If you have decided to continue on with ending the rollback, do the following
Terminating the rollback will place the following journal entries in the journal:1. If the job is already in Rollback you can skip this option. If you have not done this already, issue the ENDJOB *IMMED command. This will cause the job to go into rollback. 2. Issue the WRKCMTDFN command, and enter the fully qualified job name. 3. Press F23 to see more options. You will see END ROLLBACK Option 20.
If you do not see END ROLLBACK as an option, it is for one of the two reasons listed below:
o The customer has created QGPL/QTNNOENDRB *DTAARA to prevent anyone from ending a rollback. The data area will have to be deleted to enable Option 20.
o You do not have *ALLOBJ special authority. You will need this authority. It does count if your group profile has *ALLOBJ authority.
Select Option 20 by the commitment definition to end the rollback. You will receive a warning message asking 'Are you sure? Answer Yes. This is asking you to consider the first item in this document, data integrity.4. This will end the rollback, and the job will move to the unlock phase. VLOG 0F0-0FE01 will be issued which indicates the rollback was ended and moving into the unlock phase. 5. After the locks are released, perform one of the options listed above under Actions you need to take after the Rollback is cancelled.
- C CN End Rollback entry for the terminated rollback
- F C1 End Rollback entry for each file that is marked as partial
To avoid this from happening again
1. You should discuss with your programmer why this commitment control job took so long and why it had to be cancelled. Consider breaking the SQL up into smaller chunks, with more COMMITs in between. For example, use WHERE RRN(MYFILE) BETWEEN N and Z, incrementing the variables. Change program compile option default to not use commitment control - CHGCMDDFT CMD(CRTSQLRPGI) NEWDFT('COMMIT(*NONE)') .
2. Consider decreasing the maximum number of locks per transaction. The current default is 500 million. This means each transaction per journal can have up to 500 million locks. You can use the QAQQINI file to set the COMMITMENT_CONTROL_LOCK_LIMIT parameter to a smaller number. This would issue message SQL0904 with type 5 to the job or CPF5079 . For application jobs or batch jobs, this may force the job to end and start to rollback the changes.
SQL0904 - Resource limit exceeded.
-- Type 5 indicates that the commit lock limit was exceeded.
CPF5079 - Commitment control resource limit exceeded for this job.
3. Use System Limits to monitor for jobs that hold a large number of locks.
Example query:
select
last_change_timestamp,
JOB_NAME,
SBS_NAME,
JOB_STATUS,
AJSTATUS,
CURRENT_VALUE,
MAXVAL,
Sizing_Name
From QSYS2/syslimits
Where limit_ID = '16200'
Order by last_change_timestamp desc
Refer to the following URL for more information:
https://www.ibm.com/docs/en/i/7.5?topic=services-system-healthExample of a trigger program for SYSLIMITS can be found here:
https://www.ibm.com/docs/en/i/7.5?topic=services-syslimtbl-table
Historical Number
502344686
Was this topic helpful?
Document Information
Modified date:
17 January 2025
UID
nas8N1013413