mmaddcallback command

Registers a user-defined command that GPFS™ will execute when certain events occur.

Synopsis

mmaddcallback CallbackIdentifier --command CommandPathname
              --event Event[,Event...] [--priority Value]
              [--async | --sync [--timeout Seconds] [--onerror Action]]
              [-N {Node[,Node...] | NodeFile | NodeClass}]
              [--parms ParameterString ...]

or

mmaddcallback {-S Filename | --spec-file Filename} 

Availability

Available on all IBM Spectrum Scale™ editions.

Description

Use the mmaddcallback command to register a user-defined command that GPFS executes when certain events occur.

The callback mechanism is intended to provide notifications when node and cluster events occur. Invoking complex or long-running commands, or commands that involve GPFS files, may cause unexpected and undesired results, including loss of file system availability. This is particularly true when the --sync option is specified.

Note: For documentation about local events (callbacks) and variables for IBM Spectrum Scale RAID, see the separate publication IBM Spectrum Scale RAID: Administration.

Parameters

CallbackIdentifier
Specifies a user-defined unique name that identifies the callback. It can be up to 255 characters long. It cannot contain special characters (for example, a colon, semicolon, blank, tab, or comma) and it cannot start with the letters gpfs or mm (which are reserved for GPFS internally defined callbacks).
--command CommandPathname
Specifies the full path name of the executable to run when the event occurs. On Windows, CommandPathname must be a Korn shell script because it will be invoked in the Cygwin ksh environment.

The executable called by the callback facility must be installed on all nodes on which the callback can be triggered. Place the executable in a local file system (not in a GPFS file system) so that it is accessible even when the GPFS file system is unavailable.
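
As an illustration only, the following is a minimal sketch of such an executable (the path and the log file location are assumptions, not product defaults). It appends whatever parameters GPFS passes on the command line to a log file on a local file system:

    #!/bin/ksh
    # Hypothetical callback executable. GPFS passes the expanded --parms
    # tokens as positional arguments; this script just logs them.
    LOG=/var/log/gpfs-callbacks.log     # keep the log on a local file system
    print "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG"
    exit 0    # with --sync, a nonzero exit code triggers --onerror handling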

--event Event[,Event...]
Specifies a list of events that trigger the callback. The value defines when the callback is invoked. There are two kinds of events: global events and local events. A global event triggers a callback on all nodes in the cluster, such as a nodeLeave event, which informs all nodes in the cluster that a node has failed. A local event triggers a callback only on the node on which the event occurred, such as mounting a file system on one of the nodes.

Table 1 lists the supported global events and their parameters.

Table 2 lists the supported local events and their parameters.

Local events for IBM Spectrum Scale RAID are documented in IBM Spectrum Scale RAID: Administration.

--priority Value
Specifies a floating point number that controls the order in which callbacks for a given event are run. Callbacks with a smaller numerical value are run before callbacks with a larger numerical value. Callbacks that do not have an assigned priority are run last. If two callbacks have the same priority, the order in which they are run is undetermined.
--async | --sync [--timeout Seconds] [--onerror Action]
Specifies whether GPFS will wait for the user program to complete and, if so, for how long. The default is --async (GPFS invokes the command asynchronously). With --sync, the optional --timeout Seconds value specifies how long GPFS will wait for the command to complete. --onerror Action specifies one of the following actions that GPFS is to take if the callback command returns a nonzero error code:
continue
GPFS ignores the result from executing the user-provided command. This is the default.
quorumLoss
The node executing the user-provided command will voluntarily resign as, or refrain from taking over as, cluster manager. This action is valid only in conjunction with the tiebreakerCheck event.
shutdown
GPFS will be shut down on the node executing the user-provided command.
-N {Node[,Node...] | NodeFile | NodeClass}
Defines the set of nodes on which the callback is invoked. For global events, the callback is invoked only on the specified set of nodes. For local events, the callback is invoked only if the node on which the event occurred is one of the nodes specified by the -N option. The default is -N all. For general information on how to specify node names, see Specifying nodes as input to GPFS commands.

This command does not support a NodeClass of mount.

--parms ParameterString ...
Specifies parameters to be passed to the executable specified with the --command parameter. The --parms parameter can be specified multiple times.

When the callback is invoked, the combined parameter string is tokenized on white-space boundaries. Constructs of the form %name and %name.qualifier are assumed to be GPFS variables and are replaced with their appropriate values at the time of the event. If a variable does not have a value in the context of a particular event, the string UNDEFINED is returned instead.
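
For example, suppose a callback is registered with the following parameter strings (the values shown are purely illustrative):

    --parms "%eventName %fsName" --parms "%myNode.shortName"

For a mount event of a file system named gpfs1 on a node whose short name is node01, the executable would be invoked with the three arguments mount, gpfs1, and node01.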

GPFS recognizes the following variables:
%blockLimit
Specifies the current hard quota limit in KB.
%blockQuota
Specifies the current soft quota limit in KB.
%blockUsage
Specifies the current usage in KB for quota-related events.
%ccrObjectName
Specifies the name of the modified object.
%ccrObjectValue
Specifies the value of the modified object.
%ccrObjectVersion
Specifies the version of the modified object.
%clusterManager[.qualifier]
Specifies the current cluster manager node.
%clusterName
Specifies the name of the cluster where this callback was triggered.
%ckDataLen
Specifies the length of data involved in a checksum mismatch.
%ckErrorCountClient
Specifies the cumulative number of errors for the client side in a checksum mismatch.
%ckErrorCountNSD
Specifies the cumulative number of errors for the NSD side in a checksum mismatch.
%ckErrorCountServer
Specifies the cumulative number of errors for the server side in a checksum mismatch.
%ckNSD
Specifies the NSD involved.
%ckOtherNode
Specifies the IP address of the other node in an NSD checksum event.
%ckReason
Specifies the reason string indicating why a checksum mismatch callback was invoked.
%ckReportingInterval
Specifies the error-reporting interval in effect at the time of a checksum mismatch.
%ckRole
Specifies the role (client or server) of a GPFS node.
%ckStartSector
Specifies the starting sector of a checksum mismatch.
%daName
Specifies the name of the declustered array involved.
%daRemainingRedundancy
Specifies the remaining fault tolerance in a declustered array.
%diskName
Specifies a disk or a comma-separated list of disk names for which this callback is triggered.
%downNodes[.qualifier]
Specifies a comma-separated list of nodes that are currently down. Only nodes local to the given cluster are listed. Nodes which are in a remote cluster but have temporarily joined the cluster are not included.
%eventName
Specifies the name of the event that triggered this callback.
%eventNode[.qualifier]
Specifies a node or comma-separated list of nodes on which this callback is triggered. Note that the list may include nodes which are not local to the given cluster, but have temporarily joined the cluster to mount a file system provided by the local cluster. Those remote nodes could leave the cluster if there is a node failure or if the file systems are unmounted.
%filesLimit
Specifies the current hard quota limit for the number of files.
%filesQuota
Specifies the current soft quota limit for the number of files.
%filesUsage
Specifies the current number of files for quota-related events.
%filesetName
Specifies the name of a fileset for which the callback is being executed.
%filesetSize
Specifies the size of the fileset.
%fsErr
Specifies the file system structure error code.
%fsName
Specifies the file system name for file system events.
%hardLimit
Specifies the hard limit for the block.
%homeServer
Specifies the name of the home server.
%inodeLimit
Specifies the hard limit of the inode.
%inodeQuota
Specifies the soft limit of the inode.
%inodeUsage
Specifies the total number of files in the fileset.
%myNode[.qualifier]
Specifies the node where the callback script is invoked.
%nodeName
Specifies the node name to which the request is sent.
%nodeNames
Specifies a space-separated list of node names to which the request is sent.
%pcacheEvent
Specifies the pcache related events.
%pdFru
Specifies the FRU (field replaceable unit) number of the pdisk.
%pdLocation
The physical location code of a pdisk.
%pdName
The name of the pdisk involved.
%pdPath
The block device path of the pdisk.
%pdPriority
The replacement priority of the pdisk.
%pdState
The state of the pdisk involved.
%pdWwn
The worldwide name of the pdisk.
%prepopAlreadyCachedFiles
Specifies the number of files that are already cached. These files are not read into the cache because the data is the same in the cache and at home.
%prepopCompletedReads
Specifies the number of reads executed during a prefetch operation.
%prepopData
Specifies the total data read from the home as part of a prefetch operation.
%prepopFailedReads
Specifies the number of files for which prefetch failed. Messages are logged to indicate the failure, but they do not identify the files that could not be read.
%quorumNodes[.qualifier]
Specifies a comma-separated list of quorum nodes.
%quotaEventType
Specifies either the blockQuotaExceeded event or the inodeQuotaExceeded event. Both events indicate that a soft quota limit has been exceeded.
%quotaID
Specifies the numerical ID of the quota owner (UID, GID, or fileset ID).
%quotaOwnerName
Specifies the name of the quota owner (user name, group name, or fileset name).
%quotaType
Specifies the type of quota for quota-related events. Possible values are USR, GRP, or FILESET.
%reason
Specifies the reason for triggering the event. For the preUnmount and unmount events, the possible values are normal and forced. For the preShutdown and shutdown events, the possible values are normal and abnormal. For all other events, the value is UNDEFINED.
%requestType
Specifies the type of request to send to the target nodes.
%rgCount
The number of recovery groups involved.
%rgErr
A code from a recovery group, where 0 indicates no error.
%rgName
The name of the recovery group involved.
%rgReason
The reason string indicating why a recovery group callback was invoked.
%senseDataFormatted
Sense data for the specific file system structure error in a formatted string output.
%senseDataHex
Sense data for the specific file system structure error in big-endian hexadecimal output.
%snapshotID
Specifies the identifier of the new snapshot.
%snapshotName
Specifies the name of the new snapshot.
%softLimit
Specifies the soft limit of the block.
%storagePool
Specifies the storage pool name for space-related events.
%upNodes[.qualifier]
Specifies a comma-separated list of nodes that are currently up. Only nodes local to the given cluster are listed. Nodes which are in a remote cluster but have temporarily joined the cluster are not included.
%userName
Specifies the user name.
%waiterLength
Specifies the length of the waiter in seconds.

Variables recognized by IBM Spectrum Scale RAID are documented in IBM Spectrum Scale RAID: Administration.

Variables that represent node identifiers accept an optional qualifier that can be used to specify how the nodes are to be identified. When specifying one of these optional qualifiers, separate it from the variable with a period, as shown here:
variable.qualifier
The value for qualifier can be one of the following:
ip
Specifies that GPFS should use the nodes' IP addresses.
name
Specifies that GPFS should use fully-qualified node names. This is the default.
shortName
Specifies that GPFS should strip the domain part of the node names.
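
For example, the following registration (the script path is an assumption) passes both the short names and the IP addresses of the nodes that triggered a nodeLeave event:

    mmaddcallback nodeDownAlert --command /usr/local/bin/nodeDownAlert.sh \
      --event nodeLeave --parms "%eventNode.shortName %eventNode.ip"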

Events and supported parameters

Table 1. Global events and supported parameters
Global event Supported parameters
afmFilesetExpired
Triggered when the contents of a fileset expire either as a result of the fileset being disconnected for the expiration timeout value or when the fileset is marked as expired using the AFM administration commands.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetUnexpired
Triggered when the contents of a fileset become unexpired either as a result of the reconnection to home or when the fileset is marked as unexpired using the AFM administration commands.
%fsName %filesetName %pcacheEvent %homeServer %reason
nodeJoin
Triggered when one or more nodes join the cluster.
%eventNode
nodeLeave
Triggered when one or more nodes leave the cluster.
%eventNode
quorumReached
Triggered when a quorum has been established in the GPFS cluster. This event is triggered only on the cluster manager, not on all the nodes in the cluster.
%quorumNodes
quorumLoss
Triggered when quorum has been lost in the GPFS cluster.
N/A
quorumNodeJoin
Triggered when one or more quorum nodes join the cluster.
%eventNode
quorumNodeLeave
Triggered when one or more quorum nodes leave the cluster.
%eventNode
clusterManagerTakeOver
Triggered when a new cluster manager node is elected. This happens when a cluster first starts up or when the current cluster manager fails or resigns and a new node takes over as cluster manager.
N/A
Table 2. Local events and supported parameters
Local event Supported parameters
afmCmdRequeued
Triggered during replication when messages are requeued because of errors. These messages are retried after 15 minutes.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetUnmounted
Triggered when the fileset is moved to the Unmounted state because the NFS server is not reachable or, for the native GPFS protocol, the remote cluster mount is not available.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmHomeConnected
Triggered when a gateway node connects to the afmTarget of the fileset that it is serving. This event is local on gateway nodes.

%fsName %filesetName %pcacheEvent %homeServer %reason
afmHomeDisconnected
Triggered when a gateway node gets disconnected from the afmTarget of the fileset that it is serving. This event is local on gateway nodes.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmManualResyncComplete
Triggered when a manual resync is completed.
%fsName %filesetName %reason
afmPrepopEnd
Triggered when all the files specified by a prefetch operation have been completed. This event is local on a gateway node.
%fsName %filesetName %prepopCompletedReads %prepopFailedReads %prepopAlreadyCachedFiles %prepopData
afmQueueDropped
Triggered when replication encounters an issue that cannot be corrected. After the queue is dropped, the next recovery action attempts to fix the error and resume replication.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmRecoveryFail
Triggered when recovery fails. The recovery action is retried after 300 seconds. If recovery keeps failing, the fileset is moved to a resync state if the fileset mode allows it.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmRecoveryStart
Triggered when AFM recovery starts. This event is local on gateway nodes.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmRecoveryEnd
Triggered when AFM recovery ends. This event is local on gateway nodes.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmRPOMiss
Triggered when a Recovery Point Objective (RPO) is missed on DR primary filesets; the RPO manager keeps retrying the snapshot. This event occurs when there is a lot of data to replicate before the RPO snapshot can be taken, or when an error, such as a deadlock, causes recovery to keep failing.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetCreate
Triggered when an AFM fileset is created.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetLink
Triggered when an AFM fileset is linked.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetChange
Triggered when an AFM fileset is changed. If a fileset is renamed, the new name is part of %reason.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetUnlink
Triggered when an AFM fileset is unlinked.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetDelete
Triggered when an AFM fileset is deleted.
%fsName %filesetName %pcacheEvent %homeServer %reason
ccrFileChange
Triggered when a CCR fput operation takes place.
%ccrObjectName %ccrObjectVersion
ccrVarChange
Triggered when a CCR vput operation takes place.
%ccrObjectName %ccrObjectValue %ccrObjectVersion
daRebuildFailed
The daRebuildFailed callback is generated when the spare space in a declustered array has been exhausted, and vdisk tracks involving damaged pdisks can no longer be rebuilt. The occurrence of this event indicates that fault tolerance in the declustered array has become degraded and that disk maintenance should be performed immediately. The daRemainingRedundancy parameter indicates how much fault tolerance remains in the declustered array.
%myNode %rgName %daName %daRemainingRedundancy
deadlockDetected
Triggered when a node detects a potential deadlock. If the exit code of the registered callback for this event is 1, debug data will not be collected.

See the /usr/lpp/mmfs/samples/deadlockdetected.sample file for an example of using the deadlockDetected event.

%eventName %myNode %waiterLength
deadlockOverload
Triggered when an overload event occurs. The event is local to the node detecting the overload condition.
%eventName %nodeName
diskFailure
Triggered on the file system manager when the status of a disk in a file system changes to down.
%eventName %diskName %fsName
filesetLimitExceeded
Triggered when the file system manager detects that a fileset quota has been exceeded. This is a variation of softQuotaExceeded that applies only to fileset quotas. It exists only for compatibility (and may be deleted in a future version); therefore, using softQuotaExceeded is recommended instead.
%filesetName %fsName %filesetSize %softLimit %hardLimit %inodeUsage %inodeQuota %inodeLimit %quotaEventType
fsstruct
Triggered when the file system manager detects a file system structure (FS Struct) error.
%fsName %fsErr %senseDataFormatted %senseDataHex
lowDiskSpace
Triggered when the file system manager detects that disk space usage has reached the high occupancy threshold that is specified in the current policy rule. The event is generated every two minutes until the condition no longer exists. For more information, see Using thresholds to migrate data between pools.
%storagePool %fsName
noDiskSpace
Triggered when the file system encounters a disk or storage pool that has run out of space, or an inode space that has run out of inodes. An inode space can be an entire file system or an independent fileset. Use the noSpaceEventInterval configuration attribute of the mmchconfig command to control the time interval between two noDiskSpace events. The default value is 120 seconds. For a sketch of a handler that branches on %reason, see the example that follows Table 2.

When a storage pool runs out of disk space, %reason is "diskspace", %storagePool is the name of the pool that ran out of disk space, and %filesetName is "UNDEFINED".

When a fileset runs out of inode space, %reason is "inodespace", %filesetName is the name of the independent fileset that owns the affected inode space, and %storagePool is "UNDEFINED".

%storagePool %fsName %reason %filesetName
nsdCksumMismatch
The nsdCksumMismatch callback is generated whenever transmission of vdisk data by the NSD network layer fails to verify the data checksum. This can indicate problems in the network between the GPFS client node and a recovery group server. The first error between a given client and server generates the callback; subsequent callbacks are generated for each ckReportingInterval occurrence.
%myNode %ckRole %ckOtherNode %ckNSD %ckReason %ckStartSector %ckDataLen %ckErrorCountClient %ckErrorCountServer %ckErrorCountNSD %ckReportingInterval
pdFailed
The pdFailed callback is generated whenever a pdisk in a recovery group is marked as dead, missing, failed, or readonly.
%myNode %rgName %daName %pdName %pdLocation %pdFru %pdWwn %pdState
pdPathDown
The pdPathDown callback is generated whenever one of the block device paths to a pdisk disappears or becomes inoperative. The occurrence of this event can indicate connectivity problems with the JBOD array in which the pdisk resides.
%myNode %rgName %daName %pdName %pdLocation %pdFru %pdWwn %pdPath
pdReplacePdisk
The pdReplacePdisk callback is generated whenever a pdisk is marked for replacement according to the replace threshold setting of the declustered array in which it resides.
%myNode %rgName %daName %pdName %pdLocation %pdFru %pdWwn %pdState %pdPriority
pdRecovered
The pdRecovered callback is generated whenever a missing pdisk is rediscovered.

The following parameters are available to this callback: %myNode, %rgName, %daName, %pdName, %pdLocation, %pdFru, and %pdWwn.

%myNode %rgName %daName %pdName %pdLocation %pdFru %pdWwn
preMount, preUnmount, mount, unmount
These events are triggered when a file system is about to be mounted or unmounted or has been mounted or unmounted successfully. These events are generated for explicit mount and unmount commands, for a remount after GPFS recovery, and for a forced unmount when GPFS panics and shuts down.
%fsName %reason
preRGRelinquish
The preRGRelinquish callback is invoked on a recovery group server prior to relinquishing service of recovery groups. The rgName parameter may be passed into the callback as the keyword value _ALL_, indicating that the recovery group server is about to relinquish service for all recovery groups it is serving; the rgCount parameter will be equal to the number of recovery groups being relinquished. Additionally, the callback will be invoked with the rgName of each individual recovery group and an rgCount of 1 whenever the server relinquishes serving recovery group rgName.
%myNode %rgName %rgErr %rgCount %rgReason
preRGTakeover
The preRGTakeover callback is invoked on a recovery group server prior to attempting to open and serve recovery groups. The rgName parameter may be passed into the callback as the keyword value _ALL_, indicating that the recovery group server is about to open multiple recovery groups; this is typically at server startup, and the parameter rgCount will be equal to the number of recovery groups being processed. Additionally, the callback will be invoked with the rgName of each individual recovery group and an rgCount of 1 whenever the server checks to determine whether it should open and serve recovery group rgName.
%myNode %rgName %rgErr %rgCount %rgReason
preShutdown
Triggered when GPFS detects a failure and is about to shut down.
%reason
preStartup
Triggered after the GPFS daemon completes its internal initialization and joins the cluster, but before the node runs recovery for any file systems that were already mounted, and before the node starts accepting user-initiated sessions.
N/A
postRGRelinquish
The postRGRelinquish callback is invoked on a recovery group server after it has relinquished serving recovery groups. If multiple recovery groups have been relinquished, the callback will be invoked with rgName keyword _ALL_ and an rgCount equal to the total number of involved recovery groups. The callback will also be triggered for each individual recovery group.
%myNode %rgName %rgErr %rgCount %rgReason
postRGTakeover
The postRGTakeover callback is invoked on a recovery group server after it has checked, attempted, or begun to serve a recovery group. If multiple recovery groups have been taken over, the callback will be invoked with rgName keyword _ALL_ and an rgCount equal to the total number of involved recovery groups. The callback will also be triggered for each individual recovery group.
%myNode %rgName %rgErr %rgCount %rgReason
rgOpenFailed
The rgOpenFailed callback will be invoked on a recovery group server when it fails to open a recovery group that it is attempting to serve. This may be due to loss of connectivity to some or all of the disks in the recovery group; the rgReason string will indicate why the recovery group could not be opened.
%myNode %rgName %rgErr %rgReason
rgPanic
The rgPanic callback will be invoked on a recovery group server when it is no longer able to continue serving a recovery group. This may be due to loss of connectivity to some or all of the disks in the recovery group; the rgReason string will indicate why the recovery group can no longer be served.
%myNode %rgName %rgErr %rgReason
sendRequestToNodes
Triggered when a node sends a request for collecting expel-related debug data.

For this event, the %requestType is requestExpelData.

%eventName %requestType %nodeNames
shutdown
Triggered when GPFS completes the shutdown.
%reason
snapshotCreated
Triggered after a snapshot is created and is run before the file system is resumed. This event helps correlate the timing of DMAPI events with the creation of a snapshot. GPFS must wait for snapshotCreated to exit before it resumes the file system, so the ordering of DMAPI events and snapshot creation is known.

The %filesetName is the name of the fileset whose snapshot was created. For file system level snapshots that affect all filesets, %filesetName is set to global.

%snapshotID %snapshotName %fsName %filesetName
softQuotaExceeded
Triggered when the file system manager detects that a soft quota limit (for either files or blocks) has been exceeded. This event is triggered only on the file system manager node. Because any manager node can take over that role, the callback must be handled on all manager nodes.
%fsName %filesetName %quotaID %quotaType %quotaOwnerName %blockUsage %blockQuota %blockLimit %filesUsage %filesQuota %filesLimit
startup
Triggered after a successful GPFS startup, before the node is ready for user-initiated sessions. After this event is triggered, GPFS proceeds to finish starting up, including mounting all file systems that are defined to mount on startup.
N/A
tiebreakerCheck
Triggered when the cluster manager detects a lease timeout on a quorum node before GPFS runs the algorithm that decides if the node will remain in the cluster. This event is generated only in configurations that use tiebreaker disks.
Note: Before you add or delete the tiebreakerCheck event, you must stop the GPFS daemon on all the nodes in the cluster.
N/A
traceConfigChanged
Triggered when GPFS tracing configuration is changed.
N/A
usageUnderSoftQuota
Triggered when the file system manager detects that quota usage has dropped below soft limits and grace time is reset.
%fsName %filesetName %quotaID %quotaType %quotaOwnerName %blockUsage %blockQuota %blockLimit %filesUsage %filesQuota %filesLimit
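
The following is a sketch of a noDiskSpace handler that branches on %reason as described for that event above; the script path and logging actions are assumptions. It presumes the callback was registered with --parms "%eventName %fsName %reason %storagePool %filesetName":

    #!/bin/ksh
    # Hypothetical noDiskSpace handler. The positional arguments arrive
    # in the order given on --parms: eventName fsName reason pool fileset.
    event=$1; fs=$2; reason=$3; pool=$4; fileset=$5
    case "$reason" in
    diskspace)   # a storage pool ran out of space; %filesetName is UNDEFINED
        logger "GPFS noDiskSpace: pool $pool in file system $fs is full" ;;
    inodespace)  # an inode space ran out of inodes; %storagePool is UNDEFINED
        logger "GPFS noDiskSpace: fileset $fileset in $fs is out of inodes" ;;
    esac
    exit 0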

Options

-S Filename | --spec-file Filename
Specifies a file with multiple callback definitions, one per line. The first token on each line must be the callback identifier.
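
For example, a specification file might look like the following; the identifiers and paths are illustrative, as is the assumption that each line uses the same option syntax as the command line after the identifier:

    startupLog --command=/tmp/myScript --event startup
    NFSexport --command /usr/local/bin/NFSexport --event mount,preUnmount --parms "%eventName %fsName"

Such a file could then be registered with:

    mmaddcallback -S /tmp/callbacks.spec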

Exit status

0
Successful completion.
nonzero
A failure has occurred.

Security

You must have root authority to run the mmaddcallback command.

The node on which the command is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more information, see Requirements for administering a GPFS file system.

Examples

  1. To register command /tmp/myScript to run after GPFS startup, issue this command:
    mmaddcallback test1 --command=/tmp/myScript --event startup
    The system displays information similar to:
    mmaddcallback: Propagating the cluster configuration data to all
      affected nodes.  This is an asynchronous process.
  2. To register a callback on the NFS servers to export or to unexport a particular file system after it has been mounted or before it has been unmounted, issue this command:
    mmaddcallback NFSexport --command /usr/local/bin/NFSexport \
      --event mount,preUnmount -N nfsserver1,nfsserver2 --parms "%eventName %fsName"
    The system displays information similar to:
    mmaddcallback: 6027-1371 Propagating the cluster configuration data to all
      affected nodes.  This is an asynchronous process.
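  3. To register a synchronous callback for the tiebreakerCheck event so that a node voluntarily resigns as, or refrains from taking over as, cluster manager when the callback returns a nonzero exit code, a registration such as the following could be used (the script path and timeout value are illustrative; note that GPFS must be stopped on all nodes before a tiebreakerCheck callback is added or deleted):
    mmaddcallback tiebreakerHealth --command /usr/local/bin/checkTiebreaker.sh \
      --event tiebreakerCheck --sync --timeout 10 --onerror quorumLoss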

Location

/usr/lpp/mmfs/bin