Recovery Concept for the Automation Manager

To ensure that the automation manager remains available as the automation decision server, the primary automation manager (PAM) must be backed up by additional automation manager address spaces called secondary automation managers (SAMs).

For sysplexwide and single-system automation, the continuous availability of the automation manager is of paramount importance.

A secondary automation manager can take over the function whenever the primary automation manager fails.

Therefore, it is recommended that you have at least one secondary automation manager running. For sysplexwide automation, the SAM should run on a different system from the PAM. It is important, however, that all automation managers (PAM and SAMs) run on systems that are in the same time zone.

To enable software or hardware maintenance in the sysplex, SA z/OS supports a command to force the takeover of the primary automation manager.

A takeover is possible only when the following requirement is met:
  • All the automation manager instances must have access to a shared external medium (DASD) where the following is stored:
    • The configuration data (result of the ACF and AMC build process).
    • The schedule overrides VSAM file.
    • The configuration information data set. This is a small file in which the automation manager stores the parameters that it uses to initialize the next time it is started WARM or HOT.
    • The takeover file.
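
The shared-access requirement above can be modeled as a simple set check. The following is only an illustrative sketch in Python, not part of SA z/OS: the function names, the instance names, and the way access is represented are all assumptions made for the example.

```python
# Illustrative model only: the item names mirror the list above, but the
# functions and the data layout are hypothetical, not an SA z/OS interface.
REQUIRED_SHARED_DATA = frozenset({
    "configuration data",
    "schedule overrides file",
    "configuration information data set",
    "takeover file",
})


def missing_shared_data(access_by_instance):
    """Map each instance that lacks access to the sorted list of items it cannot reach."""
    return {
        name: sorted(REQUIRED_SHARED_DATA - set(accessible))
        for name, accessible in access_by_instance.items()
        if not REQUIRED_SHARED_DATA <= set(accessible)
    }


def takeover_possible(access_by_instance):
    """A takeover is modeled as possible only if no instance is missing any item."""
    return not missing_shared_data(access_by_instance)
```

In this model, a single instance without access to the takeover file is enough to make takeover_possible return False, and missing_shared_data names the instance and the missing item.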
SA z/OS follows the concept of a floating backup because:
  • The currently active automation manager has no awareness of the existence (and location) of possible backup instances.
  • The location of the backup instances can change during normal processing without any interruption for the active automation manager.
  • There is no communication between the primary automation manager and its backup instances during normal operation. The only exception is a planned takeover, when the SAM that is to become the new PAM informs the current PAM of that fact.

This has the advantage that, in normal operation, processing is not affected by changes in the backup structure.
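
The floating backup idea can be pictured with a small Python model. This is a conceptual sketch under stated assumptions, not the SA z/OS implementation: the class names, the dictionary standing in for the shared medium, and the resource and system names are all hypothetical. The point it illustrates is that the primary keeps no list of backups, and a backup needs only the shared data to take over.

```python
class SharedMedium:
    """Hypothetical stand-in for the data kept on shared DASD."""

    def __init__(self):
        self.state = {}


class AutomationManager:
    """Hypothetical model of one automation manager instance."""

    def __init__(self, name, system, shared):
        self.name = name
        self.system = system
        self.shared = shared   # the only link between instances
        self.role = "SAM"      # every instance starts as a backup
        self.state = {}

    def become_primary(self):
        # Load whatever the previous primary persisted, then take the role.
        self.state = dict(self.shared.state)
        self.role = "PAM"

    def record(self, resource, status):
        # Only the primary updates state; it writes through to the shared
        # medium and never contacts any backup instance.
        if self.role != "PAM":
            raise RuntimeError("only the primary records state")
        self.state[resource] = status
        self.shared.state[resource] = status


shared = SharedMedium()
primary = AutomationManager("AM1", "SYS1", shared)
primary.become_primary()
primary.record("RESOURCE_A", "AVAILABLE")

# A backup can appear on any system at any time without the primary noticing.
backup = AutomationManager("AM2", "SYS2", shared)

# Simulated failure of the primary: the backup takes over from shared data.
primary.role = "FAILED"
backup.become_primary()
```

Because the primary holds no reference to its backups, adding, moving, or removing a backup in this model requires no change to the primary.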

Depending on the number of resources, the takeover time from a primary to a secondary automation manager is in the range of one to two minutes.