In version 4.5 we added the ability to better monitor BRMS backups by using a backup item exit program (WRKCTLGBRM, Opt 8) instead of a control group entry exit (WRKCTLGBRM, Opt 2, *EXIT). This often surfaces errors that were previously ignored, so a backup that used to be reported as complete is now reported as failed.
A BRMS control group ends in one of three ways:
- Success. Completion message BRM1049 is sent. All entries were processed, including the *EXIT with ENDFSFLASH *NORMAL.
- With Errors. Escape message BRM10A1 is sent, but the control group did process all the entries, including the *EXIT with ENDFSFLASH *NORMAL.
- Abnormal. Escape message BRM1820 is sent. The control group did not run to completion, so the *EXIT with ENDFSFLASH *NORMAL was not called.
Thus, a control group using entry *EXIT ENDFSFLASH *NORMAL will indicate to the FSFC Toolkit that the backups finished as long as the control group ran to completion. As usual, it is imperative that the customer analyze their BRMS reports to verify the validity of their backups.
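The three outcomes above can be summarized as a small lookup table. This is only an illustrative sketch in Python, not toolkit code: the names `ControlGroupOutcome`, `OUTCOMES`, and `toolkit_sees_backup_finished` are hypothetical, and only the message IDs and their meanings come from the list above.

```python
from typing import NamedTuple


class ControlGroupOutcome(NamedTuple):
    """Hypothetical record describing how a BRMS control group ended."""
    description: str
    all_entries_processed: bool
    final_exit_called: bool  # did the *EXIT with ENDFSFLASH *NORMAL run?


# Message IDs and meanings are taken from the list above.
OUTCOMES = {
    "BRM1049": ControlGroupOutcome("Success", True, True),
    "BRM10A1": ControlGroupOutcome("With Errors", True, True),
    "BRM1820": ControlGroupOutcome("Abnormal", False, False),
}


def toolkit_sees_backup_finished(message_id: str) -> bool:
    """With a final *EXIT ENDFSFLASH *NORMAL entry, the toolkit is told the
    backup finished whenever that final exit was reached."""
    return OUTCOMES[message_id].final_exit_called
```

Note that BRM10A1 ("With Errors") still reaches the final *EXIT, which is why this method reports such backups as finished.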
However, when using the backup item exit program QZBRMSEXIT, BRMS calls the exit program for each entry and again at the end of the control group. In trg.log (or VIEWLOG on the target), the message "Completion Status = E | N | F (QZBRMSEXIT)" shows the status that BRMS sent to our program. The status is one of:
- N = Completed Normally
- E = Completed with Errors
- F = Failed
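As a rough illustration, the status codes could be pulled out of log text with a short Python script. This is a sketch under assumptions: only the message text "Completion Status = X (QZBRMSEXIT)" comes from this document, while the surrounding line layout of trg.log, the sample lines in the usage note, and the function name `completion_statuses` are hypothetical.

```python
import re
from typing import List

# Status codes and meanings as listed above.
STATUS_MEANINGS = {
    "N": "Completed Normally",
    "E": "Completed with Errors",
    "F": "Failed",
}

# Matches the message text quoted above; anything else on the line is ignored.
_STATUS_RE = re.compile(r"Completion Status = ([NEF]) \(QZBRMSEXIT\)")


def completion_statuses(log_text: str) -> List[str]:
    """Return the meaning of every completion status found, in log order."""
    return [STATUS_MEANINGS[m.group(1)] for m in _STATUS_RE.finditer(log_text)]
```

For example, passing text that contains one line with "Completion Status = N (QZBRMSEXIT)" followed by one with "Completion Status = E (QZBRMSEXIT)" yields the two meanings in that order.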
Thus, the toolkit is now able to pass on the backup status of the control group entry which BRMS reports. Since the FSFC Toolkit is an automation tool, not a backup application, it is not equipped to determine what is an acceptable error that should be ignored, how to recover from it, or whether to send a notification. The general philosophy is "If the backup application (BRMS) says it's an error or a failure, we'll pass that on to the operator, and the operator must take action".
As a backup application, BRMS uses the operating system commands and APIs to perform backups. Similar to the toolkit philosophy, BRMS is not equipped to determine which operating system errors are acceptable and can be ignored. In general, BRMS passes failures on to the operator to take action. Since each customer's environment and expectations are unique, customers need to take ownership of the implementation of their system backup strategy. If a backup completes with errors, customers are encouraged to understand the errors and investigate whether objects need to be omitted or their process needs to change. Customers should strive to achieve successful backups.
If there is an escape message (even if handled) in the backup joblog, BRMS will notify the toolkit that the backup failed, even if the control group finishes with success (i.e. BRM1049). To resolve this situation, the escape messages must be removed from the joblog; this should be done by the application issuing the escape message.
In version 4.5 with a build date of Apr. 9th, 2021 or later, the environment variable QZ_IGNORE_BRMS_ERROR was introduced to ignore all BRMS errors (details are documented separately). While using it is not recommended without first analyzing the backup logs for a root cause, it makes the toolkit behave as if ENDFSFLASH *NORMAL were called from the final control group *EXIT, while retaining the added benefit of the status messages that QZBRMSEXIT provides throughout control group processing.
In 4.6 we have decided to tackle this problem differently. There are two options.
First, the CSEDTA has a new parameter "Control group error behavior" where the user can decide whether to *IGNORE or *NOTIFY when the toolkit encounters a BRMS error. This is equivalent to using the aforementioned environment variable, except that it will not ignore abnormal control group errors, only the "completed with errors" status.
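The difference between the 4.5 environment variable and the 4.6 parameter can be sketched as a decision function. This is an illustration only, not the toolkit's implementation: the function name, its arguments, and the return labels "continue" and "notify" are hypothetical, and it assumes that the F status corresponds to the abnormal control group errors that the parameter never ignores.

```python
def toolkit_reaction(status: str, error_behavior: str,
                     ignore_all_env: bool = False) -> str:
    """Return "continue" or "notify" (hypothetical labels).

    status         -- "N", "E" or "F" as reported to QZBRMSEXIT
    error_behavior -- "*IGNORE" or "*NOTIFY" (the 4.6 CSEDTA parameter)
    ignore_all_env -- True if QZ_IGNORE_BRMS_ERROR is in effect (4.5)
    """
    if status == "N":
        return "continue"
    if ignore_all_env:
        # The 4.5 environment variable ignores all BRMS errors.
        return "continue"
    if status == "E" and error_behavior == "*IGNORE":
        # The 4.6 parameter only ignores "completed with errors".
        return "continue"
    # Assumption: F maps to an abnormal end, which is never ignored here.
    return "notify"
```

Under these assumptions, `toolkit_reaction("F", "*IGNORE")` still results in a notification, whereas the environment variable would have suppressed it.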
Second, a new exit point *FAILNFY was created. It is called when ENDFSFLASH *FAILNFY is run, which may be triggered manually, from a *EXIT, or from QZBRMSEXIT. At this point the user can examine the backups and then force a *NORMAL or *FAILBKU by creating a data area QZRDHASM/CVTNORMAL or CVTFAILBKU, respectively.
Document Information
Modified date: 07 September 2022
UID: ibm16371290