mmaddcallback command

Registers a user-defined command that GPFS™ will execute when certain events occur.

Synopsis

mmaddcallback CallbackIdentifier --command CommandPathname
              --event Event[,Event...] [--priority Value]
              [--async | --sync [--timeout Seconds] [--onerror Action]]
              [-N {Node[,Node...] | NodeFile | NodeClass}]
              [--parms ParameterString ...]

or

mmaddcallback {-S Filename | --spec-file Filename} 

Availability

Available on all IBM Spectrum Scale™ editions.

Description

Use the mmaddcallback command to register a user-defined command that GPFS executes when certain events occur.

The callback mechanism is intended to provide notifications when node and cluster events occur. Invoking complex or long-running commands, or commands that involve GPFS files, may cause unexpected and undesired results, including loss of file system availability. This is particularly true when the --sync option is specified.

Note: For documentation about local events (callbacks) and variables for IBM Spectrum Scale RAID, see the separate publication IBM Spectrum Scale RAID: Administration.

Parameters

CallbackIdentifier
Specifies a user-defined unique name that identifies the callback. It can be up to 255 characters long. It cannot contain special characters (for example, a colon, semicolon, blank, tab, or comma) and it cannot start with the letters gpfs or mm (which are reserved for GPFS internally defined callbacks).
--command CommandPathname
Specifies the full path name of the executable to run when the event occurs. On Windows, CommandPathname must be a Korn shell script because it will be invoked in the Cygwin ksh environment.

The executable called by the callback facility must be installed on all nodes on which the callback can be triggered. Place the executable in a local file system (not in a GPFS file system) so that it is accessible even when the GPFS file system is unavailable.
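
As an illustration only, the following is a minimal sketch of such an executable (the path and the log file location are assumptions, not product defaults). It appends whatever parameters GPFS passes on the command line to a log file on a local file system:

    #!/bin/ksh
    # Hypothetical callback executable. GPFS passes the expanded --parms
    # tokens as positional arguments; this script just logs them.
    LOG=/var/log/gpfs-callbacks.log     # keep the log on a local file system
    print "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG"
    exit 0    # with --sync, a nonzero exit code triggers --onerror handling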

--event Event[,Event...]
Specifies a list of events that trigger the callback. The value defines when the callback is invoked. There are two kinds of events: global events and local events. A global event triggers a callback on all nodes in the cluster, such as a nodeLeave event, which informs all nodes in the cluster that a node has failed. A local event triggers a callback only on the node on which the event occurred, such as mounting a file system on one of the nodes.

Table 1 lists the supported global events and their parameters.

Table 2 lists the supported local events and their parameters.

Local events for IBM Spectrum Scale RAID are documented in IBM Spectrum Scale RAID: Administration.

--priority Value
Specifies a floating point number that controls the order in which callbacks for a given event are run. Callbacks with a smaller numerical value are run before callbacks with a larger numerical value. Callbacks that do not have an assigned priority are run last. If two callbacks have the same priority, the order in which they are run is undetermined.
--async | --sync [--timeout Seconds] [--onerror Action]
Specifies whether GPFS will wait for the user program to complete and, if so, for how long. The default is --async (GPFS invokes the command asynchronously). With --sync, the optional --timeout Seconds value specifies how long GPFS will wait for the command to complete. --onerror Action specifies one of the following actions that GPFS is to take if the callback command returns a nonzero error code:
continue
GPFS ignores the result from executing the user-provided command. This is the default.
quorumLoss
The node executing the user-provided command will voluntarily resign as, or refrain from taking over as, cluster manager. This action is valid only in conjunction with the tiebreakerCheck event.
shutdown
GPFS will be shut down on the node executing the user-provided command.
-N {Node[,Node...] | NodeFile | NodeClass}
Defines the set of nodes on which the callback is invoked. For global events, the callback is invoked only on the specified set of nodes. For local events, the callback is invoked only if the node on which the event occurred is one of the nodes specified by the -N option. The default is -N all. For general information on how to specify node names, see Specifying nodes as input to GPFS commands.

This command does not support a NodeClass of mount.

--parms ParameterString ...
Specifies parameters to be passed to the executable specified with the --command parameter. The --parms parameter can be specified multiple times.

When the callback is invoked, the combined parameter string is tokenized on white-space boundaries. Constructs of the form %name and %name.qualifier are assumed to be GPFS variables and are replaced with their appropriate values at the time of the event. If a variable does not have a value in the context of a particular event, the string UNDEFINED is returned instead.
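
For example, suppose a callback is registered with the following parameter strings (the values shown are purely illustrative):

    --parms "%eventName %fsName" --parms "%myNode.shortName"

For a mount event of a file system named gpfs1 on a node whose short name is node01, the executable would be invoked with the three arguments mount, gpfs1, and node01.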

GPFS recognizes the following variables:
%blockLimit
Specifies the current hard quota limit in KB.
%blockQuota
Specifies the current soft quota limit in KB.
%blockUsage
Specifies the current usage in KB for quota-related events.
%ccrObjectName
Specifies the name of the modified object.
%ccrObjectValue
Specifies the value of the modified object.
%ccrObjectVersion
Specifies the version of the modified object.
%clusterManager[.qualifier]
Specifies the current cluster manager node.
%clusterName
Specifies the name of the cluster where this callback was triggered.
%ckDataLen
Specifies the length of data involved in a checksum mismatch.
%ckErrorCountClient
Specifies the cumulative number of errors for the client side in a checksum mismatch.
%ckErrorCountNSD
Specifies the cumulative number of errors for the NSD side in a checksum mismatch.
%ckErrorCountServer
Specifies the cumulative number of errors for the server side in a checksum mismatch.
%ckNSD
Specifies the NSD involved.
%ckOtherNode
Specifies the IP address of the other node in an NSD checksum event.
%ckReason
Specifies the reason string indicating why a checksum mismatch callback was invoked.
%ckReportingInterval
Specifies the error-reporting interval in effect at the time of a checksum mismatch.
%ckRole
Specifies the role (client or server) of a GPFS node.
%ckStartSector
Specifies the starting sector of a checksum mismatch.
%daName
Specifies the name of the declustered array involved.
%daRemainingRedundancy
Specifies the remaining fault tolerance in a declustered array.
%diskName
Specifies a disk or a comma-separated list of disk names for which this callback is triggered.
%downNodes[.qualifier]
Specifies a comma-separated list of nodes that are currently down. Only nodes local to the given cluster are listed. Nodes which are in a remote cluster but have temporarily joined the cluster are not included.
%eventName
Specifies the name of the event that triggered this callback.
%eventNode[.qualifier]
Specifies a node or comma-separated list of nodes on which this callback is triggered. Note that the list may include nodes which are not local to the given cluster, but have temporarily joined the cluster to mount a file system provided by the local cluster. Those remote nodes could leave the cluster if there is a node failure or if the file systems are unmounted.
%filesLimit
Specifies the current hard quota limit for the number of files.
%filesQuota
Specifies the current soft quota limit for the number of files.
%filesUsage
Specifies the current number of files for quota-related events.
%filesetName
Specifies the name of a fileset for which the callback is being executed.
%filesetSize
Specifies the size of the fileset.
%fsErr
Specifies the file system structure error code.
%fsName
Specifies the file system name for file system events.
%hardLimit
Specifies the hard limit for the block.
%homeServer
Specifies the name of the home server.
%inodeLimit
Specifies the hard limit of the inode.
%inodeQuota
Specifies the soft limit of the inode.
%inodeUsage
Specifies the total number of files in the fileset.
%myNode[.qualifier]
Specifies the node where the callback script is invoked.
%nodeName
Specifies the node name to which the request is sent.
%nodeNames
Specifies a space-separated list of node names to which the request is sent.
%pcacheEvent
Specifies the pcache related events.
%pdFru
Specifies the FRU (field replaceable unit) number of the pdisk.
%pdLocation
The physical location code of a pdisk.
%pdName
The name of the pdisk involved.
%pdPath
The block device path of the pdisk.
%pdPriority
The replacement priority of the pdisk.
%pdState
The state of the pdisk involved.
%pdWwn
The worldwide name of the pdisk.
%prepopAlreadyCachedFiles
Specifies the number of files that are already cached. These files are not read into the cache because the data is the same in the cache and at home.
%prepopCompletedReads
Specifies the number of reads executed during a prefetch operation.
%prepopData
Specifies the total data read from the home as part of a prefetch operation.
%prepopFailedReads
Specifies the number of files for which prefetch failed. Messages are logged to indicate the failure, but they do not identify the files that could not be read.
%quorumNodes[.qualifier]
Specifies a comma-separated list of quorum nodes.
%quotaEventType
Specifies either the blockQuotaExceeded event or the inodeQuotaExceeded event. Both events indicate that a soft quota limit has been exceeded.
%quotaID
Specifies the numerical ID of the quota owner (UID, GID, or fileset ID).
%quotaOwnerName
Specifies the name of the quota owner (user name, group name, or fileset name).
%quotaType
Specifies the type of quota for quota-related events. Possible values are USR, GRP, or FILESET.
%reason
Specifies the reason for triggering the event. For the preUnmount and unmount events, the possible values are normal and forced. For the preShutdown and shutdown events, the possible values are normal and abnormal. For all other events, the value is UNDEFINED.
%requestType
Specifies the type of request to send to the target nodes.
%rgCount
The number of recovery groups involved.
%rgErr
A code from a recovery group, where 0 indicates no error.
%rgName
The name of the recovery group involved.
%rgReason
The reason string indicating why a recovery group callback was invoked.
%senseDataFormatted
Sense data for the specific file system structure error in a formatted string output.
%senseDataHex
Sense data for the specific file system structure error in big-endian hexadecimal output.
%snapshotID
Specifies the identifier of the new snapshot.
%snapshotName
Specifies the name of the new snapshot.
%softLimit
Specifies the soft limit of the block.
%storagePool
Specifies the storage pool name for space-related events.
%upNodes[.qualifier]
Specifies a comma-separated list of nodes that are currently up. Only nodes local to the given cluster are listed. Nodes which are in a remote cluster but have temporarily joined the cluster are not included.
%userName
Specifies the user name.
%waiterLength
Specifies the length of the waiter in seconds.

Variables recognized by IBM Spectrum Scale RAID are documented in IBM Spectrum Scale RAID: Administration.

Variables that represent node identifiers accept an optional qualifier that can be used to specify how the nodes are to be identified. When specifying one of these optional qualifiers, separate it from the variable with a period, as shown here:
variable.qualifier
The value for qualifier can be one of the following:
ip
Specifies that GPFS should use the nodes' IP addresses.
name
Specifies that GPFS should use fully-qualified node names. This is the default.
shortName
Specifies that GPFS should strip the domain part of the node names.
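
For example, the following registration (the script path is an assumption) passes both the short names and the IP addresses of the nodes that triggered a nodeLeave event:

    mmaddcallback nodeDownAlert --command /usr/local/bin/nodeDownAlert.sh \
      --event nodeLeave --parms "%eventNode.shortName %eventNode.ip"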

Events and supported parameters

Table 1. Global events and supported parameters
Global event Supported parameters
afmFilesetExpired
Triggered when the contents of a fileset expire either as a result of the fileset being disconnected for the expiration timeout value or when the fileset is marked as expired using the AFM administration commands.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetUnexpired
Triggered when the contents of a fileset become unexpired either as a result of the reconnection to home or when the fileset is marked as unexpired using the AFM administration commands.
%fsName %filesetName %pcacheEvent %homeServer %reason
nodeJoin
Triggered when one or more nodes join the cluster.
%eventNode
nodeLeave
Triggered when one or more nodes leave the cluster.
%eventNode
quorumReached
Triggered when a quorum has been established in the GPFS cluster. This event is triggered only on the cluster manager, not on all the nodes in the cluster.
%quorumNodes
quorumLoss
Triggered when quorum has been lost in the GPFS cluster.
N/A
quorumNodeJoin
Triggered when one or more quorum nodes join the cluster.
%eventNode
quorumNodeLeave
Triggered when one or more quorum nodes leave the cluster.
%eventNode
clusterManagerTakeOver
Triggered when a new cluster manager node is elected. This happens when a cluster first starts up or when the current cluster manager fails or resigns and a new node takes over as cluster manager.
N/A
Table 2. Local events and supported parameters
Local event Supported parameters
afmCmdRequeued
Triggered during replication when messages are requeued because of errors. These messages are retried after 15 minutes.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetUnmounted
Triggered when the fileset is moved to the Unmounted state because the NFS server is not reachable or, for the native GPFS protocol, the remote cluster mount is not available.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmHomeConnected
Triggered when a gateway node connects to the afmTarget of the fileset that it is serving. This event is local on gateway nodes.

%fsName %filesetName %pcacheEvent %homeServer %reason
afmHomeDisconnected
Triggered when a gateway node gets disconnected from the afmTarget of the fileset that it is serving. This event is local on gateway nodes.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmManualResyncComplete
Triggered when a manual resync is completed.
%fsName %filesetName %reason
afmPrepopEnd
Triggered when all the files specified by a prefetch operation have been completed. This event is local on a gateway node.
%fsName %filesetName %prepopCompletedReads %prepopFailedReads %prepopAlreadyCachedFiles %prepopData
afmQueueDropped
Triggered when replication encounters an issue that cannot be corrected. After the queue is dropped, the next recovery action attempts to fix the error and resume replication.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmRecoveryFail
Triggered when recovery fails. The recovery action is retried after 300 seconds. If recovery keeps failing, the fileset is moved to a resync state if the fileset mode allows it.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmRecoveryStart
Triggered when AFM recovery starts. This event is local on gateway nodes.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmRecoveryEnd
Triggered when AFM recovery ends. This event is local on gateway nodes.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmRPOMiss
Triggered when a Recovery Point Objective (RPO) is missed on DR primary filesets; the RPO manager keeps retrying the snapshot. This event occurs when there is a lot of data to replicate before the RPO snapshot can be taken, or when an error, such as a deadlock, causes recovery to keep failing.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetCreate
Triggered when an AFM fileset is created.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetLink
Triggered when an AFM fileset is linked.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetChange
Triggered when an AFM fileset is changed. If a fileset is renamed, the new name is part of %reason.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetUnlink
Triggered when an AFM fileset is unlinked.
%fsName %filesetName %pcacheEvent %homeServer %reason
afmFilesetDelete
Triggered when an AFM fileset is deleted.
%fsName %filesetName %pcacheEvent %homeServer %reason
ccrFileChange
Triggered when a CCR fput operation takes place.
%ccrObjectName %ccrObjectVersion
ccrVarChange
Triggered when a CCR vput operation takes place.
%ccrObjectName %ccrObjectValue %ccrObjectVersion
daRebuildFailed
The daRebuildFailed callback is generated when the spare space in a declustered array has been exhausted, and vdisk tracks involving damaged pdisks can no longer be rebuilt. The occurrence of this event indicates that fault tolerance in the declustered array has become degraded and that disk maintenance should be performed immediately. The daRemainingRedundancy parameter indicates how much fault tolerance remains in the declustered array.
%myNode %rgName %daName %daRemainingRedundancy
deadlockDetected
Triggered when a node detects a potential deadlock. If the exit code of the registered callback for this event is 1, debug data will not be collected.

See the /usr/lpp/mmfs/samples/deadlockdetected.sample file for an example of using the deadlockDetected event.

%eventName %myNode %waiterLength
deadlockOverload
Triggered when an overload event occurs. The event is local to the node detecting the overload condition.
%eventName %nodeName
diskFailure
Triggered on the file system manager when the status of a disk in a file system changes to down.
%eventName %diskName %fsName
filesetLimitExceeded
Triggered when the file system manager detects that a fileset quota has been exceeded. This is a variation of softQuotaExceeded that applies only to fileset quotas. It exists only for compatibility (and may be deleted in a future version); therefore, using softQuotaExceeded is recommended instead.
%filesetName %fsName %filesetSize %softLimit %hardLimit %inodeUsage %inodeQuota %inodeLimit %quotaEventType
fsstruct
Triggered when the file system manager detects a file system structure (FS Struct) error.
%fsName %fsErr %senseDataFormatted %senseDataHex
lowDiskSpace
Triggered when the file system manager detects that disk space usage has reached the high occupancy threshold that is specified in the current policy rule. The event is generated every two minutes until the condition no longer exists. For more information, see Using thresholds to migrate data between pools.
%storagePool %fsName
noDiskSpace
Triggered when the file system encounters a disk or storage pool that has run out of space, or an inode space that has run out of inodes. An inode space can be an entire file system or an independent fileset. Use the noSpaceEventInterval configuration attribute of the mmchconfig command to control the time interval between two noDiskSpace events. The default value is 120 seconds. For a sketch of a handler that branches on %reason, see the example that follows Table 2.

When a storage pool runs out of disk space, %reason is "diskspace", %storagePool is the name of the pool that ran out of disk space, and %filesetName is "UNDEFINED".

When a fileset runs out of inode space, %reason is "inodespace", %filesetName is the name of the independent fileset that owns the affected inode space, and %storagePool is "UNDEFINED".

%storagePool %fsName %reason %filesetName
nsdCksumMismatch
The nsdCksumMismatch callback is generated whenever transmission of vdisk data by the NSD network layer fails to verify the data checksum. This can indicate problems in the network between the GPFS client node and a recovery group server. The first error between a given client and server generates the callback; subsequent callbacks are generated for each ckReportingInterval occurrence.
%myNode %ckRole %ckOtherNode %ckNSD %ckReason %ckStartSector %ckDataLen %ckErrorCountClient %ckErrorCountServer %ckErrorCountNSD %ckReportingInterval
pdFailed
The pdFailed callback is generated whenever a pdisk in a recovery group is marked as dead, missing, failed, or readonly.
%myNode %rgName %daName %pdName %pdLocation %pdFru %pdWwn %pdState
pdPathDown
The pdPathDown callback is generated whenever one of the block device paths to a pdisk disappears or becomes inoperative. The occurrence of this event can indicate connectivity problems with the JBOD array in which the pdisk resides.
%myNode %rgName %daName %pdName %pdLocation %pdFru %pdWwn %pdPath
pdReplacePdisk
The pdReplacePdisk callback is generated whenever a pdisk is marked for replacement according to the replace threshold setting of the declustered array in which it resides.
%myNode %rgName %daName %pdName %pdLocation %pdFru %pdWwn %pdState %pdPriority
pdRecovered
The pdRecovered callback is generated whenever a missing pdisk is rediscovered.

The following parameters are available to this callback: %myNode, %rgName, %daName, %pdName, %pdLocation, %pdFru, and %pdWwn.

%myNode %rgName %daName %pdName %pdLocation %pdFru %pdWwn
preMount, preUnmount, mount, unmount
These events are triggered when a file system is about to be mounted or unmounted or has been mounted or unmounted successfully. These events are generated for explicit mount and unmount commands, for a remount after GPFS recovery, and for a forced unmount when GPFS panics and shuts down.
%fsName %reason
preRGRelinquish
The preRGRelinquish callback is invoked on a recovery group server prior to relinquishing service of recovery groups. The rgName parameter may be passed into the callback as the keyword value _ALL_, indicating that the recovery group server is about to relinquish service for all recovery groups it is serving; the rgCount parameter will be equal to the number of recovery groups being relinquished. Additionally, the callback will be invoked with the rgName of each individual recovery group and an rgCount of 1 whenever the server relinquishes serving recovery group rgName.
%myNode %rgName %rgErr %rgCount %rgReason
preRGTakeover
The preRGTakeover callback is invoked on a recovery group server prior to attempting to open and serve recovery groups. The rgName parameter may be passed into the callback as the keyword value _ALL_, indicating that the recovery group server is about to open multiple recovery groups; this is typically at server startup, and the parameter rgCount will be equal to the number of recovery groups being processed. Additionally, the callback will be invoked with the rgName of each individual recovery group and an rgCount of 1 whenever the server checks to determine whether it should open and serve recovery group rgName.
%myNode %rgName %rgErr %rgCount %rgReason
preShutdown
Triggered when GPFS detects a failure and is about to shut down.
%reason
preStartup
Triggered after the GPFS daemon completes its internal initialization and joins the cluster, but before the node runs recovery for any file systems that were already mounted, and before the node starts accepting user-initiated sessions.
N/A
postRGRelinquish
The postRGRelinquish callback is invoked on a recovery group server after it has relinquished serving recovery groups. If multiple recovery groups have been relinquished, the callback will be invoked with rgName keyword _ALL_ and an rgCount equal to the total number of involved recovery groups. The callback will also be triggered for each individual recovery group.
%myNode %rgName %rgErr %rgCount %rgReason
postRGTakeover
The postRGTakeover callback is invoked on a recovery group server after it has checked, attempted, or begun to serve a recovery group. If multiple recovery groups have been taken over, the callback will be invoked with rgName keyword _ALL_ and an rgCount equal to the total number of involved recovery groups. The callback will also be triggered for each individual recovery group.
%myNode %rgName %rgErr %rgCount %rgReason
rgOpenFailed
The rgOpenFailed callback will be invoked on a recovery group server when it fails to open a recovery group that it is attempting to serve. This may be due to loss of connectivity to some or all of the disks in the recovery group; the rgReason string will indicate why the recovery group could not be opened.
%myNode %rgName %rgErr %rgReason
rgPanic
The rgPanic callback will be invoked on a recovery group server when it is no longer able to continue serving a recovery group. This may be due to loss of connectivity to some or all of the disks in the recovery group; the rgReason string will indicate why the recovery group can no longer be served.
%myNode %rgName %rgErr %rgReason
sendRequestToNodes
Triggered when a node sends a request for collecting expel-related debug data.

For this event, the %requestType is requestExpelData.

%eventName %requestType %nodeNames
shutdown
Triggered when GPFS completes the shutdown.
%reason
snapshotCreated
Triggered after a snapshot is created and is run before the file system is resumed. This event helps correlate the timing of DMAPI events with the creation of a snapshot. GPFS must wait for snapshotCreated to exit before it resumes the file system, so the ordering of DMAPI events and snapshot creation is known.

The %filesetName is the name of the fileset whose snapshot was created. For file system level snapshots that affect all filesets, %filesetName is set to global.

%snapshotID %snapshotName %fsName %filesetName
softQuotaExceeded
Triggered when the file system manager detects that a soft quota limit (for either files or blocks) has been exceeded. This event is triggered only on the file system manager node. Because any manager node can take over that role, the callback must be handled on all manager nodes.
%fsName %filesetName %quotaID %quotaType %quotaOwnerName %blockUsage %blockQuota %blockLimit %filesUsage %filesQuota %filesLimit
startup
Triggered after a successful GPFS startup, before the node is ready for user-initiated sessions. After this event is triggered, GPFS proceeds to finish starting up, including mounting all file systems that are defined to mount on startup.
N/A
tiebreakerCheck
Triggered when the cluster manager detects a lease timeout on a quorum node before GPFS runs the algorithm that decides if the node will remain in the cluster. This event is generated only in configurations that use tiebreaker disks.
Note: Before you add or delete the tiebreakerCheck event, you must stop the GPFS daemon on all the nodes in the cluster.
N/A
traceConfigChanged
Triggered when GPFS tracing configuration is changed.
N/A
usageUnderSoftQuota
Triggered when the file system manager detects that quota usage has dropped below soft limits and grace time is reset.
%fsName %filesetName %quotaID %quotaType %quotaOwnerName %blockUsage %blockQuota %blockLimit %filesUsage %filesQuota %filesLimit
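
The following is a sketch of a noDiskSpace handler that branches on %reason as described for that event above; the script path and logging actions are assumptions. It presumes the callback was registered with --parms "%eventName %fsName %reason %storagePool %filesetName":

    #!/bin/ksh
    # Hypothetical noDiskSpace handler. The positional arguments arrive
    # in the order given on --parms: eventName fsName reason pool fileset.
    event=$1; fs=$2; reason=$3; pool=$4; fileset=$5
    case "$reason" in
    diskspace)   # a storage pool ran out of space; %filesetName is UNDEFINED
        logger "GPFS noDiskSpace: pool $pool in file system $fs is full" ;;
    inodespace)  # an inode space ran out of inodes; %storagePool is UNDEFINED
        logger "GPFS noDiskSpace: fileset $fileset in $fs is out of inodes" ;;
    esac
    exit 0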

Options

-S Filename | --spec-file Filename
Specifies a file with multiple callback definitions, one per line. The first token on each line must be the callback identifier.
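
For example, a specification file might look like the following; the identifiers and paths are illustrative, as is the assumption that each line uses the same option syntax as the command line after the identifier:

    startupLog --command=/tmp/myScript --event startup
    NFSexport --command /usr/local/bin/NFSexport --event mount,preUnmount --parms "%eventName %fsName"

Such a file could then be registered with:

    mmaddcallback -S /tmp/callbacks.spec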

Exit status

0
Successful completion.
nonzero
A failure has occurred.

Security

You must have root authority to run the mmaddcallback command.

The node on which the command is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more information, see Requirements for administering a GPFS file system.

Examples

  1. To register command /tmp/myScript to run after GPFS startup, issue this command:
    mmaddcallback test1 --command=/tmp/myScript --event startup
    The system displays information similar to:
    mmaddcallback: Propagating the cluster configuration data to all
      affected nodes.  This is an asynchronous process.
  2. To register a callback on the NFS servers to export or to unexport a particular file system after it has been mounted or before it has been unmounted, issue this command:
    mmaddcallback NFSexport --command /usr/local/bin/NFSexport \
      --event mount,preUnmount -N nfsserver1,nfsserver2 --parms "%eventName %fsName"
    The system displays information similar to:
    mmaddcallback: 6027-1371 Propagating the cluster configuration data to all
      affected nodes.  This is an asynchronous process.
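  3. To register a synchronous callback for the tiebreakerCheck event so that a node voluntarily resigns as, or refrains from taking over as, cluster manager when the callback returns a nonzero exit code, a registration such as the following could be used (the script path and timeout value are illustrative; note that GPFS must be stopped on all nodes before a tiebreakerCheck callback is added or deleted):
    mmaddcallback tiebreakerHealth --command /usr/local/bin/checkTiebreaker.sh \
      --event tiebreakerCheck --sync --timeout 10 --onerror quorumLoss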

Location

/usr/lpp/mmfs/bin