Failed resource actions
Failed resource actions arise when the Pacemaker component of an RDQM high availability configuration encounters some problem with a resource on one of the nodes in an HA group.
The RDQM HA solution uses Pacemaker for monitoring and managing resources (see RDQM high availability). If Pacemaker encounters an error performing an operation on a resource on a node, it records this information using a failed resource action. Some failed resource actions prevent the resource from running and must be cleared before Pacemaker can restart the resource.
You can use the rdqmstatus -m qmname command to see whether there are any failed resource actions that are stopping a queue manager from starting on one or more nodes.
You can then use the rdqmstatus -m qmname -a command to view the details of the failed resource actions that are associated with a queue manager. After reviewing them, use the rdqmclean command to clear these failed resource actions and so free up any restricted resources. You must also resolve the problems that caused the failed resource actions in the first place, otherwise they are likely to recur.
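The inspection step can be scripted. The following is a minimal Python sketch, not part of the product: the helper names status_commands and show_failed_actions and the queue manager name QM1 are hypothetical, and it uses only the two rdqmstatus forms mentioned in this topic. It assumes that it runs on an RDQM node under a user who is authorized to run rdqmstatus.

```python
import subprocess


def status_commands(qmname):
    """Build the two rdqmstatus invocations described in this topic."""
    if not qmname:
        raise ValueError("a queue manager name is required")
    summary = ["rdqmstatus", "-m", qmname]        # summary status for the queue manager
    details = ["rdqmstatus", "-m", qmname, "-a"]  # adds failed resource action details
    return summary, details


def show_failed_actions(qmname):
    """Run both commands and return their combined standard output.

    Raises subprocess.CalledProcessError if either command fails.
    """
    outputs = []
    for cmd in status_commands(qmname):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return "\n".join(outputs)
```

Calling show_failed_actions("QM1") on an RDQM node returns the summary followed by the detailed view. Building the argument lists separately from running them keeps the commands easy to log or review before anything is executed.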
Failed resource actions can be associated with the following types of resource:
- Queue manager
- Floating IP
- RDQM control
- Filesystem
- DR replication (DRBD)
- HA replication (DRBD)
Each type of resource can be subject to the following types of failure:
- Soft: Soft failures are transient, and Pacemaker continues to try to recover the resource until it times out or is otherwise stopped.
- Hard: A hard error requires administrative intervention. Hard errors block the resource from running on a particular node.
- Fatal: A fatal error requires administrative intervention. Fatal errors block the resource from running on any node.
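The effect of each failure type on where a resource can run can be summarized in a few lines of code. This is an illustrative model of the descriptions above only, not Pacemaker's implementation; the names FailureType, blocked_nodes, and needs_admin, and the node names in the usage note, are hypothetical.

```python
from enum import Enum


class FailureType(Enum):
    SOFT = "soft"
    HARD = "hard"
    FATAL = "fatal"


def blocked_nodes(failure_type, failed_node, ha_nodes):
    """Return the set of nodes on which the resource is blocked from running."""
    if failed_node not in ha_nodes:
        raise ValueError("failed_node must be a member of the HA group")
    if failure_type is FailureType.SOFT:
        return set()            # transient: Pacemaker keeps trying to recover
    if failure_type is FailureType.HARD:
        return {failed_node}    # blocked on the node where the action failed
    return set(ha_nodes)        # fatal: blocked on every node


def needs_admin(failure_type):
    """Hard and fatal failures require administrative intervention."""
    return failure_type is not FailureType.SOFT
```

For example, with an HA group of node1, node2, and node3, a hard failure on node2 blocks the resource on node2 only, whereas a fatal failure blocks it on all three nodes.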
See Viewing RDQM and HA group status for examples of status output that includes failed resource actions.
You can use the rdqmclean command to clear all failed resource actions associated with a specified queue manager, or all failed resource actions in the RDQM HA configuration.
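The two scopes of rdqmclean can be sketched as a small command builder. This is a hedged Python example: the function name rdqmclean_command and the queue manager name QM1 are hypothetical, and the -m qmname and -a options are assumed from the rdqmclean command reference, so check that reference for your MQ version before relying on them.

```python
import subprocess


def rdqmclean_command(qmname=None, all_actions=False):
    """Build an rdqmclean invocation for exactly one scope.

    qmname      -- clear failed resource actions for this queue manager
                   (assumed option: -m qmname)
    all_actions -- clear every failed resource action in the RDQM HA
                   configuration (assumed option: -a)
    """
    if bool(qmname) == bool(all_actions):
        raise ValueError("specify either qmname or all_actions, not both or neither")
    if qmname:
        return ["rdqmclean", "-m", qmname]
    return ["rdqmclean", "-a"]


def clear_failed_actions(qmname=None, all_actions=False):
    """Run rdqmclean on an RDQM node; raises CalledProcessError on failure."""
    subprocess.run(rdqmclean_command(qmname, all_actions), check=True)
```

Requiring the caller to choose a scope explicitly avoids clearing every failed resource action in the configuration by accident when a queue manager name is omitted.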