mmchconfig command

Changes GPFS configuration parameters.

Synopsis

mmchconfig Attribute=value[,Attribute=value...] [-i | -I]
             [-N {Node[,Node...] | NodeFile | NodeClass}] 

Availability

Available on all IBM Spectrum Scale editions.

Description

Use the mmchconfig command to change the GPFS configuration attributes on a single node, a set of nodes, or globally for the entire cluster.

Results

The configuration is updated on the specified nodes.

Parameters

-I
Specifies that the changes take effect immediately, but do not persist when GPFS is restarted. This option is valid only for the following attributes:
  • appendShipEnabled
  • deadlockBreakupDelay
  • deadlockDataCollectionDailyLimit
  • deadlockDataCollectionMinInterval
  • deadlockDetectionThreshold
  • deadlockDetectionThresholdForShortWaiters
  • deadlockOverloadThreshold
  • dioSmallSeqWriteBatching
  • diskReadExclusionList
  • dmapiMountEvent
  • dmapiMountTimeout
  • dmapiSessionFailureTimeout
  • expelDataCollectionDailyLimit
  • expelDataCollectionMinInterval
  • fastestPolicyCmpThreshold
  • fastestPolicyMaxValidPeriod
  • fastestPolicyMinDiffPercent
  • fastestPolicyNumReadSamples
  • fileHeatLossPercent
  • fileHeatPeriodMinutes
  • ignorePrefetchLUNCount
  • ignoreReplicationForQuota
  • ignoreReplicationOnStatfs
  • linuxStatfsUnits
  • lrocData
  • lrocDataMaxFileSize
  • lrocDataStubFileSize
  • lrocDirectories
  • lrocEnableStoringClearText
  • lrocInodes
  • logRecoveryThreadsPerLog
  • logOpenParallelism
  • logRecoveryParallelism
  • maxMBpS
  • nfsPrefetchStrategy
  • nsdBufSpace
  • nsdCksumTraditional
  • nsdDumpBuffersOnCksumError
  • nsdInlineWriteMax
  • nsdMultiQueue
  • pagepool
  • panicOnIOHang
  • pitWorkerThreadsPerNode
  • proactiveReconnect
  • readReplicaPolicy
  • readReplicaRuleEnabled
  • seqDiscardThreshold
  • syncbuffsperiteration
  • systemLogLevel
  • unmountOnDiskFail
  • verbsRdmaRoCEToS
  • worker1Threads (only when adjusting value down)
  • writebehindThreshold
-i
Specifies that the changes take effect immediately and are permanent. This option is valid only for the following attributes:
  • afmNFSVersion
  • afmObjKeyExpiration
  • afmSyncNFSV4ACL
  • appendShipEnabled
  • cesSharedRoot
  • cnfsGrace
  • cnfsMountdPort
  • cnfsNFSDprocs
  • cnfsReboot
  • cnfsSharedRoot
  • cnfsVersions
  • commandAudit
  • confirmShutdownIfHarmful
  • dataDiskWaitTimeForRecovery
  • dataStructureDump
  • deadlockBreakupDelay
  • deadlockDataCollectionDailyLimit
  • deadlockDataCollectionMinInterval
  • deadlockDetectionThreshold
  • deadlockDetectionThresholdForShortWaiters
  • deadlockOverloadThreshold
  • debugDataControl
  • dioSmallSeqWriteBatching
  • disableInodeUpdateOnFDatasync
  • diskReadExclusionList
  • dmapiMountEvent
  • dmapiMountTimeout
  • dmapiSessionFailureTimeout
  • expelDataCollectionDailyLimit
  • expelDataCollectionMinInterval
  • fastestPolicyCmpThreshold
  • fastestPolicyMaxValidPeriod
  • fastestPolicyMinDiffPercent
  • fastestPolicyNumReadSamples
  • fileHeatLossPercent
  • fileHeatPeriodMinutes
  • ignorePrefetchLUNCount
  • ignoreReplicationForQuota
  • ignoreReplicationOnStatfs
  • linuxStatfsUnits
  • lrocData
  • lrocDataMaxFileSize
  • lrocDataStubFileSize
  • lrocDirectories
  • lrocEnableStoringClearText
  • lrocInodes
  • logRecoveryThreadsPerLog
  • logOpenParallelism
  • logRecoveryParallelism
  • maxDownDisksForRecovery
  • maxFailedNodesForRecovery
  • maxMBpS
  • metadataDiskWaitTimeForRecovery
  • minDiskWaitTimeForRecovery
  • mmfsLogTimeStampISO8601
  • mmhealthHDFSBinPath
  • mmhealthHDFSCoreSiteFile
  • mmhealthHDFSEnvFile
  • mmhealthHDFSGpfsBinPath
  • mmhealthHDFSPidPath
  • mmhealthHDFSWorkersFile
  • mmhealthNFSCollectDebug
  • mmhealthEnclosureFanLowerLimit
  • mmhealthEnclosureFanUpperLimit
  • mmhealthNFSFailoverWhenUnresponsive
  • mmhealthNFSPreventStartWhenFsMissing
  • mmhealthNFSRpcBindRestart
  • mmhealthPtfUpdatesMonitorEnabled
  • mmhealthUseSharedLib
  • nfsPrefetchStrategy
  • nsdBufSpace
  • nsdCksumTraditional
  • nsdDumpBuffersOnCksumError
  • nsdInlineWriteMax
  • nsdMultiQueue
  • pagepool
  • panicOnIOHang
  • pitWorkerThreadsPerNode
  • proactiveReconnect
  • readReplicaPolicy
  • readReplicaRuleEnabled
  • restripeOnDiskFailure
  • sdrNotifyAuthEnabled
  • seqDiscardThreshold
  • sudoUser
  • syncbuffsperiteration
  • systemLogLevel
  • tscCmdAllowRemoteConnections
  • unmountOnDiskFail
  • verbsRdmaRoCEToS
  • worker1Threads (only when adjusting value down)
  • writebehindThreshold
Note: For more information on HDFS, see CES HDFS support.

The mmhealth* attributes in the preceding list are used by mmsysmonitor.

-N {Node[,Node...] | NodeFile | NodeClass}
Specifies the set of nodes to which the configuration changes apply. The default is -N all.

For information on how to specify node names, see Specifying nodes as input to GPFS commands.

To see a complete list of the attributes for which the -N flag is valid, see the table "Configuration attributes on the mmchconfig command" in Changing the GPFS cluster configuration data.

This command does not support a NodeClass of mount.
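
For example, to set the pagepool attribute to 4 GiB on only the NSD server nodes, with the change taking effect immediately and persisting across restarts (the node class and value shown are illustrative):
# mmchconfig pagepool=4G -N nsdNodes -i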

Attribute=value
Specifies the name of the attribute to be changed and its associated value. More than one attribute and value pair can be specified.
The mmchconfig command supports the following special values:
DEFAULT
Restores the GPFS default setting for an attribute.
DELETE
Removes the specified attribute from the GPFS configuration file.

If -N is not specified (or -N all), the DELETE value is equivalent to the DEFAULT value.

If -N is specified along with one of the immediate options (that is, -i or -I), the immediate options are ignored.
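
For example, to restore the default setting of an attribute such as pagepool on all nodes (pagepool is used here only as an illustration):
# mmchconfig pagepool=DEFAULT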

This command accepts the following attributes:

adminMode
Specifies whether all nodes in the cluster are used for issuing GPFS administration commands or just a subset of the nodes. Valid values are:
allToAll
Indicates that all nodes in the cluster are used for running GPFS administration commands and that all nodes are able to execute remote commands on any other node in the cluster without the need of a password.
central
Indicates that only a subset of the nodes is used for running GPFS commands and that only those nodes are able to execute remote commands on the rest of the nodes in the cluster without the need of a password.

For more information, see Requirements for administering a GPFS file system.

afmAsyncDelay
Specifies (in seconds) the amount of time by which write operations are delayed (because write operations are asynchronous with respect to remote clusters). For write-intensive applications that keep writing to the same set of files, this delay is helpful because it replaces multiple writes to the home cluster with a single write containing the latest data. However, setting a very high value weakens the consistency of data on the remote cluster.

This configuration parameter is applicable only for writer caches (SW, IW, and primary), where data from cache is pushed to home.

Valid values are between 1 and 2147483647. The default is 15.
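
For example, to batch more writes together by raising the delay to 60 seconds (an illustrative value):
# mmchconfig afmAsyncDelay=60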

afmAsyncOpWaitTimeout
Specifies the time (in seconds) that AFM or AFM DR waits for the completion of any inflight asynchronous operation that is synchronizing with the home or primary cluster. After the timeout, AFM or AFM DR cancels the operation and tries synchronization again when the home or primary cluster is available.

The default value is 300. The range of valid values is 5 - 2147483647.

afmDirLookupRefreshInterval
Controls the frequency of data revalidations that are triggered by such lookup operations as ls or stat (specified in seconds). When a lookup operation is done on a directory, if the specified amount of time has passed, AFM sends a message to the home cluster to find out whether the metadata of that directory has been modified since the last time it was checked. If the time interval has not passed, AFM does not check the home cluster for updates to the metadata.

Valid values are 0 through 2147483647. The default is 60. In situations where home cluster data changes frequently, a value of 0 is recommended.
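
For example, to force revalidation on every directory lookup when home data changes frequently:
# mmchconfig afmDirLookupRefreshInterval=0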

afmDirOpenRefreshInterval
Controls the frequency of data revalidations that are triggered by such I/O operations as read or write (specified in seconds). After a directory has been cached, open requests resulting from I/O operations on that object are directed to the cached directory until the specified amount of time has passed. Once the specified amount of time has passed, the open request gets directed to a gateway node rather than to the cached directory.

Valid values are between 0 and 2147483647. The default is 60. Setting a lower value guarantees a higher level of consistency.

afmDisconnectTimeout
Specifies the waiting period, in seconds, to detect the status of the home cluster. If the home cluster is inaccessible, the metadata server (MDS) changes the state to 'disconnected'.
afmEnableNFSSec
If enabled at the cache/primary, exported paths from the home/secondary with Kerberos-enabled security levels (krb5, krb5i, krb5p) are mounted at the cache/primary in increasing order of security level: sys, krb5, krb5i, krb5p. For example, if the security level of an exported path is krb5i, AFM or AFM DR at the cache tries to mount with level sys, followed by krb5, and finally mounts with security level krb5i. If disabled at the cache/primary, exported paths from the home or secondary are mounted with security level sys at the cache or primary. You must configure KDC clients on all the gateway nodes at the cache or primary before enabling this parameter.

Valid values are yes and no. The default value is no.

afmExpirationTimeout
Is used with afmDisconnectTimeout (which can be set only through mmchconfig) to control how long a network outage between the cache and home clusters can continue before the data in the cache is considered out of sync with home. After afmDisconnectTimeout expires, cached data remains available until afmExpirationTimeout expires, at which point the cached data is considered expired and cannot be read until a reconnect occurs.

Valid values are 0 through 2147483647. The default is disabled.
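
For example, to declare a disconnect after 60 seconds and expire the cached data 300 seconds later (illustrative values):
# mmchconfig afmDisconnectTimeout=60,afmExpirationTimeout=300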

afmFileLookupRefreshInterval
Controls the frequency of data revalidations that are triggered by such lookup operations as ls or stat (specified in seconds). When a lookup operation is done on a file, if the specified amount of time has passed, AFM sends a message to the home cluster to find out whether the metadata of the file has been modified since the last time it was checked. If the time interval has not passed, AFM does not check the home cluster for updates to the metadata.

Valid values are 0 through 2147483647. The default is 30. In situations where home cluster data changes frequently, a value of 0 is recommended.

afmFileOpenRefreshInterval
Controls the frequency of data revalidations that are triggered by such I/O operations as read or write (specified in seconds). After a file has been cached, open requests resulting from I/O operations on that object are directed to the cached file until the specified amount of time has passed. Once the specified amount of time has passed, the open request gets directed to a gateway node rather than to the cached file.

Valid values are 0 through 2147483647. The default is 30. Setting a lower value guarantees a higher level of consistency.

afmHardMemThreshold

Sets a limit to the maximum amount of memory that AFM can use on each gateway node to record changes to the file system. After this limit is reached, the fileset goes into a 'dropped' state.

Exceeding the limit and the fileset going into a 'dropped' state due to accumulated pending requests might occur if:
  • the cache cluster is disconnected for an extended period of time.
  • the connection with the home cluster is on a low bandwidth.
afmHashVersion

Specifies the version of the hashing algorithm that is used to assign AFM and AFM DR filesets across gateway nodes, thus running as few recoveries as possible. This minimizes the impact of gateway nodes joining or leaving the active cluster.

Valid values are 1, 2, 4, or 5. The default value is 2.

afmMaxParallelRecoveries
Specifies the number of filesets per gateway node on which event recovery is run. The default value is 0. When the value is 0, event recovery is run on all filesets of the gateway node.
afmNFSVersion
Enables AFM NFSv4 support.

The default value is 3 for compatibility with earlier versions. Allowed values are 3, 4.1, and 4.2 for the Knfs protocol, and 3 and 4.1 for the NFS protocol.

For more information, see AFM Network File System version 4 support.
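
For example, to enable NFS version 4.1 for AFM transfers, with the change taking effect immediately and permanently:
# mmchconfig afmNFSVersion=4.1 -i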

afmNumReadThreads
Defines the number of threads that can be used on each participating gateway node during parallel read. The default value of this parameter is 1; that is, one reader thread will be active on every gateway node for each big read operation qualifying for splitting per the parallel read threshold value. The valid range of values is 1 to 64.
afmNumWriteThreads
Defines the number of threads that can be used on each participating gateway node during parallel write. The default value of this parameter is 1; that is, one writer thread will be active on every gateway node for each big write operation qualifying for splitting per the parallel write threshold value. Valid values can range from 1 to 64.
afmObjKeyExpiration

Specifies the Cloud Object Storage (COS) key expiration timeout value (in seconds). If an expiration timeout is set for the access keys and secret keys at the Cloud Object Server, you can set an expiration on the cache side so that AFM reloads the access key and secret key values into memory after the defined timeout. The keys are first loaded into memory when the AFM fileset is accessed.

After the expiration timeout passes, AFM reloads the access and secret keys into memory and uses them for subsequent communication. You must update the access key and secret key after they expire, before the next communication with the server starts.

The afmObjKeyExpiration parameter is set at the cluster level. The valid values are 0 to 2147483647. The default is 36000.

To set the object key expiration timeout in seconds, issue the following command:
# mmchconfig afmObjKeyExpiration=1800 -i
afmParallelMounts

When this parameter is enabled, the primary gateway node of a fileset at a cache cluster attempts to mount the exported path from multiple NFS servers that are defined in the mapping. Then, this primary gateway node sends unique messages through each NFS mount to improve performance by transferring data in parallel.

Before enabling this parameter, define the mapping between the primary gateway node and NFS servers by issuing the mmafmconfig command.

afmParallelReadChunkSize
Defines the minimum chunk size of the read that needs to be distributed among the gateway nodes during parallel reads. Values are interpreted in terms of bytes. The default value of this parameter is 128 MiB, and the valid range of values is 0 to 2147483647. It can be changed cluster wide with the mmchconfig command. It can be set at fileset level using mmcrfileset or mmchfileset commands.
afmParallelReadThreshold
Defines the threshold beyond which parallel reads become effective. Reads are split into chunks when file size exceeds this threshold value. Values are interpreted in terms of MiB. The default value is 1024 MiB. The valid range of values is 0 to 2147483647. It can be changed cluster wide with the mmchconfig command. It can be set at fileset level using mmcrfileset or mmchfileset commands.
afmParallelWriteChunkSize
Defines the minimum chunk size of the write that needs to be distributed among the gateway nodes during parallel writes. Values are interpreted in terms of bytes. The default value of this parameter is 128 MiB, and the valid range of values is 0 to 2147483647. It can be changed cluster wide with the mmchconfig command. It can be set at fileset level using mmcrfileset or mmchfileset commands.
afmParallelWriteThreshold
Defines the threshold beyond which parallel writes become effective. Writes are split into chunks when file size exceeds this threshold value. Values are interpreted in terms of MiB. The default value of this parameter is 1024 MiB, and the valid range of values is 0 to 2147483647. It can be changed cluster wide with the mmchconfig command. It can be set at fileset level using mmcrfileset or mmchfileset commands.
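
For example, to split writes for files larger than 2 GiB into 256 MiB chunks (illustrative values; the threshold is in MiB and the chunk size is in bytes):
# mmchconfig afmParallelWriteThreshold=2048,afmParallelWriteChunkSize=268435456
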
afmReadSparseThreshold
Specifies the size in MB for files in cache beyond which sparseness is maintained. For all files below the specified threshold, sparseness is not maintained.
afmRefreshAsync
Modifies the cache data refresh operation to asynchronous mode. Cache data refresh operation in asynchronous mode improves performance of applications that query data. Upon readdir or lookup request, a revalidation request for files or directories is queued as an asynchronous request to the gateway but the last known synchronized state of the cache data is returned to the applications. Cache data is refreshed after revalidation with home is complete. Revalidation time depends on the network availability and its bandwidth.

Valid values are yes and no. The default value is no, which means that cache data is validated with home synchronously. Specify yes if you want the cache data refresh operation to be in asynchronous mode.
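
For example, to switch the cache data refresh operation to asynchronous mode:
# mmchconfig afmRefreshAsync=yes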

afmRevalOpWaitTimeout
Specifies the time that AFM waits for a revalidation response from the home cluster. Revalidation checks whether any changes at home (data and metadata) need to be updated to the cache cluster. Revalidation is performed when applications trigger operations such as lookup or open at the cache. If revalidation is not completed within this time, AFM cancels the operation and returns the data that is available at the cache to the application.

The default value is 180. The range of valid values is 5 - 2147483647.

afmRPO
Specifies the recovery point objective (RPO) interval for an AFM DR fileset. This attribute is disabled by default. You can specify a value with the suffix M for minutes, H for hours, or W for weeks. For example, for 12 hours specify 12H. If you do not add a suffix, the value is assumed to be in minutes. The range of valid values is 720 minutes - 2147483647 minutes.

To disable afmRPO, you can set the parameter value to afmRPO=disable. For example, mmchfileset fs1 afmFileset -p afmRPO=disable.

afmSecondaryRW
Specifies whether the secondary is read-write.
yes
Specifies that the secondary is read-write.
no
Specifies that the secondary is not read-write.
afmShowHomeSnapshot
Controls the visibility of the home snapshot directory in cache. For this to be visible in cache, this variable has to be set to yes, and the snapshot directory name in the cache and home cannot be the same.
yes
Specifies that the home snapshot link directory is visible.
no
Specifies that the home snapshot link directory is not visible.

For more information about the snapshot, see Peer snapshot -psnap.

afmSyncNFSV4ACL
Enables the migration of NFSv4 ACLs from third-party file systems. When this parameter is enabled, data from a third-party storage is migrated to IBM Spectrum Scale.

The default value is 0. Allowed values are 0 and 1.

For more information, see AFM Network File System version 4 support.

afmSyncOpWaitTimeout
Specifies the time that AFM or AFM DR waits for the completion of any inflight synchronous operation that is synchronizing with the home or primary cluster. When an application performs a synchronous operation at the cache or secondary, AFM or AFM DR tries to get a response from the home or primary cluster. If the home or primary cluster is not responding, the application might become unresponsive. If the operation does not complete in this timeout interval, AFM or AFM DR cancels the operation.

The default value is 180. The range of valid values is 5 - 2147483647.

appendShipEnabled
Improves the overall write performance of workloads that involve appending small chunks of data from multiple nodes concurrently to a shared file. The appended data might be of full block size or less. Valid values are yes and no. The default value is no.
This attribute must be enabled for the following locations:
  • On the home cluster
  • On the remote cluster if appends to the shared file occur from the remote cluster nodes
Note: All nodes in the home and remote cluster that are mounting the file system must be Linux® nodes and must run on IBM Spectrum Scale 5.1.0.0 or later.
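
For example, to enable append shipping immediately and permanently:
# mmchconfig appendShipEnabled=yes -i
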
atimeDeferredSeconds
Controls the update behavior of atime when the relatime option is enabled. The default value is 86400 seconds (24 hours). A value of 0 effectively disables relatime and causes the behavior to be the same as the atime setting.

For more information, see Mount options specific to IBM Spectrum Scale.

autoBuildGPL={yes | no | mmbuildgplOptions}
Causes IBM Spectrum Scale to detect when the GPFS portability layer (GPL) needs to be rebuilt and to rebuild it automatically. A rebuild is triggered if the GPFS kernel module is missing or if a new level of IBM Spectrum Scale is installed. The mmbuildgpl command is called to do the rebuild. For the rebuild to be successful, the requirements of the mmbuildgpl command must be met; in particular, the build tools and kernel headers must be present on each node. This attribute takes effect when the GPFS daemon is restarted. For more information, see the topics mmbuildgpl command and Using the mmbuildgpl command to build the GPFS portability layer on Linux nodes.
Note: This parameter does not apply to the AIX® and Windows environments.
yes
Causes the GPL to be rebuilt when necessary.
no
Takes no action when the GPL needs to be rebuilt. This is the default value.
mmbuildgplOptions
Causes the GPL to be rebuilt when necessary and causes mmbuildgpl to be called with the indicated options. This value is a hyphen-separated list of options in any order:
quiet
Causes mmbuildgpl to be called with the --quiet parameter.
verbose
Causes mmbuildgpl to be called with the -v option.
Note that yes, no, and mmbuildgplOptions are mutually exclusive, and that mmbuildgplOptions implies yes. You cannot specify both yes and mmbuildgplOptions. You can specify both quiet and verbose on the command line by separating them with a hyphen, as in autoBuildGPL=quiet-verbose. See Table 1.

The -N flag is valid for this attribute.

Table 1. Values assigned to autoBuildGPL and their effects

Value assigned to autoBuildGPL    Command and options that are invoked when the GPL needs to be rebuilt
autoBuildGPL=no                   N/A
autoBuildGPL=yes                  mmbuildgpl
autoBuildGPL=quiet                mmbuildgpl --quiet
autoBuildGPL=verbose              mmbuildgpl -v
autoBuildGPL=quiet-verbose        mmbuildgpl --quiet -v
autoBuildGPL=verbose-quiet        mmbuildgpl --quiet -v
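
For example, to rebuild the GPL automatically and pass both options to mmbuildgpl:
# mmchconfig autoBuildGPL=quiet-verbose
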
autoload
Starts GPFS automatically whenever the nodes are rebooted. Valid values are yes or no.

The -N flag is valid for this attribute.

automountDir
Specifies the directory to be used by the Linux automounter for GPFS file systems that are being mounted automatically. The default directory is /gpfs/automountdir. This parameter does not apply to AIX and Windows environments.
backgroundSpaceReclaimThreshold
Specifies the percentage of reclaimable blocks that must occur in an allocation space for devices that are capable of space reclaim, such as NVMe and thin-provisioned disks, to trigger a background space reclaim. The default value is 0, which indicates that background space reclaim is disabled. You can enable it by setting it to a value larger than 0 but less than or equal to 100. A lower value causes background space reclaim to occur more frequently.
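
For example, to trigger background space reclaim when 20 percent of an allocation space is reclaimable (an illustrative threshold):
# mmchconfig backgroundSpaceReclaimThreshold=20
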
cesSharedRoot
Specifies a directory in a GPFS file system to be used by the Cluster Export Services (CES) subsystem. For the CES shared root, the recommended value is a dedicated file system, but it is not enforced. The CES shared root can also be a part of an existing GPFS file system. In any case, cesSharedRoot must reside on GPFS and must be available when it is configured through mmchconfig.

GPFS must be down on all CES nodes in the cluster when changing the cesSharedRoot attribute.

cifsBypassTraversalChecking
Controls the GPFS behavior while performing access checks for directories. GPFS grants SEARCH access when the following conditions are met:
  • The object is a directory
  • The parameter value is yes
  • The calling process is a Samba process

When these conditions are met, GPFS grants SEARCH access regardless of the mode or ACL.

cipherList
Sets the security mode for the cluster. The security mode determines the level of the security that the cluster provides for communications between nodes in the cluster and also for communications with other clusters. There are three security modes:
EMPTY
The sending node and the receiving node do not authenticate each other, do not encrypt transmitted data, and do not check data integrity.
AUTHONLY
The sending and receiving nodes authenticate each other, but they do not encrypt transmitted data and do not check data integrity. This mode is the default in IBM Spectrum Scale 4.2 or later.
Cipher
The sending and receiving nodes authenticate each other, encrypt transmitted data, and check data integrity. To set this mode, you must specify the name of a supported cipher, such as AES128-GCM-SHA256.
Note: Although the mmfsd daemon accepts the new cipherList immediately after mmchconfig is issued, it uses the new cipher only for new TCP/TLS connections to other nodes. Existing mmfsd daemon connections keep the prior cipherList settings. For the new cipherList to take complete effect immediately, restart GPFS on all nodes in a rolling fashion, one node at a time, to prevent a cluster outage. When a cipher other than AUTHONLY or EMPTY is in effect, significant performance degradation can occur, because the transmitted data is encrypted and its integrity is verified.
For more information about the security mode and supported ciphers, see Security mode.
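
For example, to enable encryption and integrity checking of transmitted data with a supported cipher:
# mmchconfig cipherList=AES128-GCM-SHA256
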
cnfsGrace
Specifies the number of seconds a CNFS node denies new client requests after a node failover or failback, to allow clients with existing locks to reclaim them without the possibility of another client being granted conflicting access. For NFSv3, only new lock requests are denied. For NFSv4, new lock, read, and write requests are rejected. The cnfsGrace value also determines the time period for the server lease.

Valid values are 10 - 600. The default is 90 seconds. While a short grace period is good for fast server failover, it comes at the cost of increased load on the server for lease renewal.

GPFS must be down on all CNFS nodes in the cluster when changing the cnfsGrace attribute.

cnfsMountdPort
Specifies the port number to be used for rpc.mountd. See the IBM Spectrum Scale: Administration Guide for restrictions and additional information.
cnfsNFSDprocs
Specifies the number of nfsd kernel threads. The default is 32.
cnfsReboot
Specifies whether the node reboots when CNFS monitoring detects an unrecoverable problem that can be handled only by node failover.

Valid values are yes or no. The default is yes, which is the recommended setting. If node reboot is not desired for other reasons, note that clients that were communicating with the failing node are likely to get errors or hang. CNFS failover is guaranteed only with cnfsReboot enabled.

The -N flag is valid for this attribute.

cnfsSharedRoot
Specifies a directory in a GPFS file system to be used by the clustered NFS subsystem.

GPFS must be down on all CNFS nodes in the cluster when changing the cnfsSharedRoot attribute.

See the IBM Spectrum Scale: Administration Guide for restrictions and additional information.

cnfsVersions
Specifies a comma-separated list of protocol versions that CNFS should start and monitor.

The default is 3,4.

GPFS must be down on all CNFS nodes in the cluster when changing the cnfsVersions attribute.

See the IBM Spectrum Scale: Administration Guide for additional information.

commandAudit
Controls the logging of audit messages for GPFS commands that change the configuration of the cluster. This attribute is not supported on Windows operating systems. For more information, see Audit messages for cluster configuration changes.
on
Starts audit messages. Messages go to syslog and the GPFS log.
syslogOnly
Starts audit messages. Messages go to syslog only. This value is the default.
off
Stops audit messages.

The -N flag is valid for this attribute.
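
For example, to send audit messages to both syslog and the GPFS log, with the change taking effect immediately and permanently:
# mmchconfig commandAudit=on -i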

confirmShutdownIfHarmful={yes|no}
Specifies whether the mmshutdown command checks that shutting down the listed nodes will cause a loss of function in the cluster. For more information, see mmshutdown command.
The default value is yes.
dataDiskCacheProtectionMethod
The dataDiskCacheProtectionMethod parameter defines the cache protection method for disks that are used for the GPFS file system. The valid values for this parameter are 0, 1, and 2.

The default value is 0, which indicates that the disks are power protected and that, when a down disk is started, only the standard GPFS log recovery is required. If the value of this parameter is 1, the disks are power protected with no disk cache, and GPFS behaves the same as with the default value. If the value of this parameter is 2, when a node stops functioning, files that have data in the disk cache must be recovered to a consistent state when the disk is started.

This parameter impacts only disks in the FPO storage pool. If the physical disk-write cache is enabled, the value of this parameter must be set to 2. Otherwise, maintain the default.

dataDiskWaitTimeForRecovery
Specifies a period, in seconds, during which the recovery of dataOnly disks is suspended to give the disk subsystem a chance to correct itself. This parameter is taken into account when the affected disks belong to a single failure group. If more than one failure group is affected, the delay is based on the value of minDiskWaitTimeForRecovery.

Valid values are 0 - 3600 seconds. The default is 3600. If restripeOnDiskFailure is no, dataDiskWaitTimeForRecovery has no effect.

dataStructureDump
Specifies a path for storing dumps. You can specify a directory or a symbolic link. The default is to store dumps in /tmp/mmfs. This attribute takes effect immediately whether or not -i is specified.
It is a good idea to create a directory or a symbolic link for problem determination information. Do not put it in a GPFS file system, because it might not be available if GPFS fails. When a problem occurs, GPFS can write 200 MiB or more of problem determination data into the directory. Copy and delete the files promptly so that you do not get a NOSPACE error if another failure occurs.
Important: Before you change the value of dataStructureDump, stop the GPFS trace. Otherwise you will lose GPFS trace data. Restart the GPFS trace afterward. For more information, see Generating GPFS trace reports.

The -N flag is valid for this attribute.

deadlockBreakupDelay
Specifies how long to wait after a deadlock is detected before attempting to break up the deadlock. Enough time must be provided to allow the debug data collection to complete.

The default is 0, which means that the automated deadlock breakup is disabled. A positive value enables the automated deadlock breakup. If automated deadlock breakup is to be enabled, a delay of 300 seconds or longer is recommended.

deadlockDataCollectionDailyLimit
Specifies the maximum number of times that debug data can be collected each day.

The default is 3. If the value is 0, then no debug data is collected when a potential deadlock is detected.

deadlockDataCollectionMinInterval
Specifies the minimum interval between two consecutive collections of debug data.

The default is 3600 seconds.

deadlockDetectionThreshold
Specifies the initial deadlock detection threshold. The effective deadlock detection threshold adjusts itself over time. A suspected deadlock is detected when a waiter waits longer than the effective deadlock detection threshold.

The default is 300 seconds. If the value is 0, then automated deadlock detection is disabled.

deadlockDetectionThresholdForShortWaiters
Specifies the deadlock detection threshold for short waiters. The default value is 60 seconds. Do not set a large value, because short waiters are supposed to complete and disappear quickly.
deadlockOverloadThreshold
Specifies the threshold for detecting a cluster overload condition. If the overload index on a node exceeds the deadlockOverloadThreshold, then the effective deadlockDetectionThreshold is raised. The overload index is calculated heuristically and is based mainly on the I/O completion times.

The default is 1. If the value is 0, then overload detection is disabled.

debugDataControl
Controls the amount of debug data that is collected. This attribute takes effect immediately whether or not -i is specified. The -N flag is valid for this attribute.
none
No debug data is collected.
light
The minimum amount of debug data that is most important for debugging issues is collected. This is the default value.
medium
More debug data is collected.
heavy
The maximum amount of debug data is collected, targeting internal test systems.
verbose
Needed only for troubleshooting special cases and can result in large dumps.
The following table provides more information about these settings:
Table 2. Settings for debugDataControl

Setting                            Collect dump data           Collect trace information     Collect a short sample of
                                                               (if tracing is already        trace information (1)
                                                               running)
none                               No                          No                            No
light (default)                    Yes                         Yes                           No
medium                             Yes, more dump data         Yes                           No
heavy (for internal test teams)    Yes, even more dump data    Yes                           Yes
verbose (for developers)           Yes, all dump data          Yes                           Yes

(1) If trace is not running, turn tracing on, let it run for 20 seconds, and then turn trace off.
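
For example, to collect the maximum amount of debug data on a single node (the node name is hypothetical):
# mmchconfig debugDataControl=heavy -N node1
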
defaultHelperNodes
Specifies a default set of nodes that can be used by commands that are able to distribute work to multiple nodes. To specify values for this parameter, follow the rules that are described for the -N option in the topic Specifying nodes as input to GPFS commands.

To override this setting when you use such commands, explicitly specify the helper nodes with the -N option of the command that you issue.

The following commands can use the nodes that this parameter provides: mmadddisk, mmapplypolicy, mmbackup, mmchdisk, mmcheckquota, mmdefragfs, mmdeldisk, mmdelsnapshot, mmfileid, mmfsck, mmimgbackup, mmimgrestore, mmrestorefs, mmrestripefs, and mmrpldisk.

When the command runs, it lists the NodeClass values.
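
For example, to make the NSD server nodes the default helper nodes (the node class shown is illustrative):
# mmchconfig defaultHelperNodes=nsdNodes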

defaultMountDir
Specifies the default parent directory for GPFS file systems. The default value is /gpfs. If an explicit mount directory is not provided with the mmcrfs, mmchfs, or mmremotefs command, the default mount point is set to DefaultMountDir/DeviceName.

dioSmallSeqWriteBatching={yes | no}
Controls whether GPFS enables a performance optimization that allows multiple Direct I/O (DIO) Asynchronous Input/Output (AIO) write requests to be handled as buffered I/O and be batched together into larger write operations. When enabled, GPFS tries to combine multiple small sequential asynchronous Direct I/O writes when committing the writes to storage. Valid values are yes or no. The default value is no.

When dioSmallSeqWriteBatching is set to yes, GPFS holds small (up to 64 KiB) AIO/DIO write requests for a few microseconds to allow the held requests to be combined with additional contiguous writes that might occur.

disableInodeUpdateOnFDatasync
Controls the inode update on fdatasync for mtime and atime updates. Valid values are yes or no.

When disableInodeUpdateOnFDatasync is set to yes, the inode object is not updated on disk for mtime and atime updates on fdatasync() calls. File size updates are always synced to the disk.

When disableInodeUpdateOnFDatasync is set to no, the inode object is updated with the current mtime on fdatasync() calls. This is the default.

diskReadExclusionList
Specifies the list of NSD names that should be excluded from data block reads. It is used for resolving data block replica mismatches. Separate the NSD names with a semicolon (;) and enclose the list in quotes. For example:
diskReadExclusionList="gpfs1nsd;gpfs2nsd;gpfs3nsd"
To disable this option, use:
diskReadExclusionList=""
You can also use:
diskReadExclusionList=DEFAULT
For more information, see Replica mismatches.
dmapiDataEventRetry
Controls how GPFS handles data events that are enabled again immediately after the event is handled by the DMAPI application. Valid values are as follows:
-1
Specifies that GPFS always regenerates the event as long as it is enabled. This value should be used only when the DMAPI application recalls and migrates the same file in parallel by many processes at the same time.
0
Specifies to never regenerate the event. Do not use this value if a file might be migrated and recalled at the same time.
RetryCount
Specifies the number of times the data event should be retried. The default is 2.

For further information regarding DMAPI for GPFS, see GPFS-specific DMAPI events.

dmapiEventTimeout
Controls the blocking of NFS file operation threads while they wait in the kernel for the handling of a DMAPI synchronous event. The parameter value is the maximum time, in milliseconds, that the thread blocks. When this time expires, the file operation returns ENOTREADY, and the event continues asynchronously. The NFS server is expected to repeatedly retry the operation, which eventually finds the response of the original event and continues. This mechanism applies only to read, write, and truncate event types, and only when such events come from NFS server threads. The value 0 indicates an immediate timeout (fully asynchronous event). A value greater than or equal to 86400000 (24 hours) is considered infinity (no timeout, fully synchronous event). The default value is 86400000.

For the parameter change to take effect, restart the GPFS daemon on the nodes that are specified in the -N option. If the -N option is not used, restart the GPFS daemon on all nodes.

For further information about DMAPI for GPFS, see GPFS-specific DMAPI events.

The -N flag is valid for this attribute.

dmapiMountEvent
Controls the generation of the mount, preunmount, and unmount events. Valid values are:
all
mount, preunmount, and unmount events are generated on each node. This is the default behavior.
SessionNode
mount, preunmount, and unmount events are generated on each node and are delivered to the session node, but the session node does not deliver the event to the DMAPI application unless the event originated on the session node itself.
LocalNode
mount, preunmount, and unmount events are generated only if the node is a session node.

For further information regarding DMAPI for GPFS, see GPFS-specific DMAPI events.

dmapiMountTimeout
Controls the blocking of mount operations, waiting for a disposition for the mount event to be set. This timeout is activated, at most once on each node, by the first external mount of a file system that has DMAPI enabled, and only if there has never before been a mount disposition. Any mount operation on this node that starts while the timeout period is active waits for the mount disposition. The parameter value is the maximum time, in seconds, that the mount operation waits for a disposition. When this time expires and there is still no disposition for the mount event, the mount operation fails, returning the EIO error. The timeout value is given in full seconds. The value 0 indicates immediate timeout (immediate failure of the mount operation). A value greater than or equal to 86400 (which is 24 hours) is considered infinity (no timeout, indefinite blocking until there is a disposition). The default value is 60.

The -N flag is valid for this attribute.

For further information regarding DMAPI for GPFS, see GPFS-specific DMAPI events.

dmapiSessionFailureTimeout
Controls the blocking of file operation threads, while in the kernel, waiting for the handling of a DMAPI synchronous event that is enqueued on a session that has experienced a failure. The parameter value is the maximum time, in seconds, the thread waits for the recovery of the failed session. When this time expires and the session has not yet recovered, the event is canceled and the file operation fails, returning the EIO error. The timeout value is given in full seconds. The value 0 indicates immediate timeout (immediate failure of the file operation). A value greater than or equal to 86400 (which is 24 hours) is considered infinity (no timeout, indefinite blocking until the session recovers). The default value is 0.

For further information regarding DMAPI for GPFS, see GPFS-specific DMAPI events.

The -N flag is valid for this attribute.

enableIPv6
Controls whether the GPFS daemons communicate through the IPv6 network. The following values are valid:
no
Specifies that the GPFS daemons do not communicate through the IPv6 network. This is the default value.
Note: If any of the node interfaces that are specified for the mmcrcluster command resolves to an IPv6 address, the mmcrcluster command automatically enables the new cluster for IPv6 and sets the enableIPv6 attribute to yes. For more information, see Enabling a cluster for IPv6.
yes
Specifies that the GPFS daemons communicate through the IPv6 network. yes requires that the daemon be down on all nodes.
prepare
After the command completes, the daemons can be recycled on all nodes at a time chosen by the user (before proceeding to run the command with commit specified).
commit
Verifies that all currently active daemons have received the new value, allowing the user to add IPv6 nodes to the cluster.
Note:

To use IPv6 addresses for GPFS, the operating system must be properly configured as IPv6 enabled, and IPv6 addresses must be configured on all the nodes within the cluster.

encryptionKeyCacheExpiration
Specifies the refresh interval, in seconds, of the file system encryption key cache that is used internally by the mmfsd daemon. The default value of this parameter is 900 seconds. The refresh operation of the encryption key cache requires the remote key server to be accessible and functional. A restart of the GPFS mmfsd daemon is required for any change in the value of this parameter to become effective. For more information, see Encryption keys.

A value of 0 indicates that the encryption key cache does not expire and it is not periodically refreshed.

A value of 60 (seconds) or greater indicates that the encryption key cache expires after the specified amount of time. The encryption key cache is refreshed automatically by the mmfsd daemon from the remote key server. No administrative action is required.

Note:

Changing the value of the encryptionKeyCacheExpiration requires cluster services (mmfsd) to be restarted on each node in order to take effect.

Though a value of 0 means that the encryption key cache does not expire and is not refreshed periodically, connectivity to the remote key server is required from all nodes that access an encrypted file system. Regardless of the value of encryptionKeyCacheExpiration, access to the remote key server is required in the following scenarios:
  • When you are unmounting or mounting an encrypted file system.
  • When you are using the mmchpolicy command to install a new policy for the file system, even if the key does not change.
  • When a KMIP client is registered or deregistered by using the mmkeyserv client register or mmkeyserv client deregister command.
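
For example, to expire and refresh the encryption key cache every hour (an illustrative value; remember that mmfsd must be restarted on each node for the change to take effect):
# mmchconfig encryptionKeyCacheExpiration=3600
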
enforceFilesetQuotaOnRoot
Controls whether fileset quotas should be enforced for the root user the same way as for any other users. Valid values are yes or no. The default is no.
expelDataCollectionDailyLimit
Specifies the maximum number of times that debug data associated with expelling nodes can be collected in a 24-hour period. Sometimes exceptions are made to help capture the most relevant debug data.

The default is 3. If the value is 0, then no expel-related debug data is collected.

expelDataCollectionMinInterval
Specifies the minimum interval, in seconds, between two consecutive expel-related data collection attempts on the same node.

The default is 3600 seconds.

failureDetectionTime
Indicates to GPFS the amount of time it takes to detect that a node has failed.

GPFS must be down on all the nodes when changing the failureDetectionTime attribute.

fastestPolicyCmpThreshold
Indicates the disk comparison count threshold, above which GPFS forces selection of this disk as the preferred disk to read and update its current speed.

Valid values are >= 3. The default is 50. In a system with SSD and regular disks, the value of the fastestPolicyCmpThreshold parameter can be set to a greater number to let GPFS refresh the speed statistics for slower disks less frequently.

fastestPolicyMaxValidPeriod
Indicates the time period after which the disk's current evaluation is considered invalid (even if its comparison count has exceeded the threshold) and GPFS prefers to read this disk in the next selection to update its latest speed evaluation.

Valid values are >= 1 in seconds. The default is 600 (10 minutes).

fastestPolicyMinDiffPercent
A percentage value indicating how GPFS selects the fastest between two disks. For example, if you use the default fastestPolicyMinDiffPercent value of 50, GPFS selects a disk as faster only if it is 50% faster than the other. Otherwise, the disks remain in the existing read order.

Valid values are 0 - 100 in percentage points. The default is 50.

fastestPolicyNumReadSamples
Controls how many read samples are taken to evaluate the disk's recent speed.

Valid values are 3 - 100. The default is 5.

fileHeatLossPercent
The file heat attribute of a file increases in value when the file is accessed but decreases in value over time if the file is not accessed. The fileHeatLossPercent attribute specifies the percent of file access heat that an unaccessed file loses at the end of each tracking period. The valid range is 0 - 100. The default value is 10, which indicates that an unaccessed file loses 10 percent of its file access heat at the end of each tracking period. The tracking period is set by fileHeatPeriodMinutes. For more information, see File heat: Tracking file access temperature.

This attribute does not take effect until the GPFS daemon is stopped and restarted.

fileHeatPeriodMinutes
A nonzero value enables file heat tracking and specifies the frequency with which the file heat attribute is updated. A value of 0 disables file access heat tracking. The default value is 0. For more information, see File heat: Tracking file access temperature.

This attribute does not take effect until the GPFS daemon is stopped and restarted.
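
For example, to enable file heat tracking with a one-day tracking period during which an unaccessed file loses 10 percent of its heat (the change takes effect after the GPFS daemon is restarted):
# mmchconfig fileHeatPeriodMinutes=1440,fileHeatLossPercent=10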

FIPS1402mode
Controls whether GPFS uses a FIPS-140-2-compliant encryption module for encrypted communications between nodes and for file encryption. Valid values are yes or no. The default value is no.
When it is enabled, FIPS 140-2 mode applies only to the following two features of IBM Spectrum Scale:
  • Encryption and decryption of file data when it is transmitted between nodes in the current cluster or between a node in the current cluster and a node in another cluster. To enable this feature, issue the following command:
    mmchconfig cipherList=SupportedCipher
    where SupportedCipher is a cipher that is supported by IBM Spectrum Scale, such as AES128-GCM-SHA256. For more information, see Security mode.

  • Encryption of file data as it is written to storage media and decryption of file data as it is read from storage media.
    Note: For performance reasons, do not enable FIPS 140-2 mode unless all the nodes in the cluster are running FIPS-certified kernels in FIPS mode. This note applies only to encryption of file data as it is written to storage media and decryption of file data as it is read from storage media. This note does not apply to encryption and decryption of file data when it is transmitted between nodes.
FIPS 140-2 mode does not apply to other components of IBM Spectrum Scale that use encryption, such as object encryption.

frequentLeaveCountThreshold
Specifies the number of times a node exits the cluster within the last frequentLeaveTimespanMinutes before autorecovery ignores the next exit of that node. If the exit count of a node within the last frequentLeaveTimespanMinutes is greater than frequentLeaveCountThreshold, autorecovery ignores the corresponding node exit.

The valid values are 0 - 10. The default is 0, which means that autorecovery always handles the exit of a node, no matter how frequently the node exits.

If restripeOnDiskFailure is no, frequentLeaveCountThreshold has no effect.

frequentLeaveTimespanMinutes
Specifies the time span that is used to calculate the exit frequency of a node. If the exit count of a node within the last frequentLeaveTimespanMinutes is greater than the frequentLeaveCountThreshold, autorecovery ignores the corresponding node exit.

The valid values are 1 - 1440. The default is 60.

If restripeOnDiskFailure is no, frequentLeaveTimespanMinutes has no effect.

ignorePrefetchLUNCount
The GPFS client node calculates the number of sequential access prefetch and write-behind threads to run concurrently for each file system by using the count of the number of LUNs in the file system and the value of maxMBpS. However, if the LUNs being used are composed of multiple physical disks, this calculation can underestimate the amount of I/O that can be done concurrently.

Setting the ignorePrefetchLUNCount parameter to yes excludes the LUN count from this calculation and uses the maxMBpS value to dynamically determine the number of threads to schedule, up to the prefetchThreads value.

This parameter impacts only the GPFS client node. The GPFS NSD server does not include this parameter in the calculation.

The valid values for this parameter are yes and no. The default value is no, which is appropriate for traditional LUNs where one LUN maps to a single disk or an n+mP array. Set the value of this parameter to yes when the LUNs presented to GPFS are made up of large numbers of physical disks.

The -N flag is valid for this attribute.

ignoreReplicationForQuota
Specifies whether the quota commands ignore data replication factor. Valid values are yes or no. The default value is no.

The ignoreReplicationForQuota parameter hides the data replication factor for both the input and the output of quota commands. This parameter adjusts the values of quota commands according to the data replication factor. For example, if the data replication factor is 2, then for every block in a file, two blocks are effectively used. For a file of 1 MB, 2 MB is allocated internally, but for the end user the file should consume only 1 MB of quota. Without the ignoreReplicationForQuota attribute, quota management reports the file size as 2 MB.

Similarly, quota command inputs are also adjusted with the replication factor. For example, a 1 GB quota limits the overall sum of file sizes to 1 GB, even though internally, the data block usage is 2 GB because of the replication factor.

In a multi-cluster environment, the ignoreReplicationForQuota parameter must be configured on the home cluster that is the cluster which owns the file system. The configuration for this parameter is managed by the quota manager of the home cluster.

ignoreReplicationOnStatfs
Specifies whether df command output on GPFS file system ignores data replication factor. Valid values are yes or no. The default value is no.

The ignoreReplicationOnStatfs parameter ignores the replication factor and reports only the actual file size when the df command is used. For example, if the data replication factor is 2, then for every block in a file, two blocks are effectively used. For a file of 1 MB, 2 MB is allocated internally, but for the end user the file size is 1 MB. Without the ignoreReplicationOnStatfs attribute, the df command reports the file size as 2 MB.

In a multi-cluster environment, the ignoreReplicationOnStatfs parameter must be configured on the cluster from where the command is issued.

linuxStatfsUnits={posix | subblock | fullblock}
Controls the values that are returned by the Linux functions statfs and statvfs for f_bsize, f_frsize, f_blocks, and f_bfree:
Table 3. Values returned by statvfs or statfs for different settings of linuxStatfsUnits

linuxStatfsUnits    f_bsize          f_frsize         f_blocks              f_bfree
posix               Block size       Subblock size    Units of subblocks    Units of subblocks
subblock            Subblock size    Subblock size    Units of subblocks    Units of subblocks
fullblock           Block size       Block size       Units of blocks       Units of blocks
posix
Returns the correct values as they are specified by POSIX for statvfs. This setting might break Linux applications that are written for an earlier version of the statfs function and that incorrectly assume that file system capacity (f_blocks) and free space (f_bfree) are reported in units given by f_bsize rather than f_frsize.
subblock
Returns values that result in correct disk space requirement calculations but that do not break earlier Linux applications.
fullblock
Returns the same values as do versions of IBM Spectrum Scale that are earlier than 5.0.3. This is the default value.
Note:
  • The posix value is preferable for applications that use the POSIX-compliant statvfs function.
  • Linux applications that were built with the earlier Linux statfs function might depend on the behavior that is provided by the fullblock option.
For more information, see the description of the -b Blocksize option in the topic mmcrfs command.

The -N flag is valid for this attribute.

lrocData
Controls whether user data is populated into the local read-only cache. Other configuration options can be used to select the data that is eligible for the local read-only cache. When using more than one such configuration option, data that matches any of the specified criteria is eligible to be saved.
Valid values are yes or no. The default value is yes.
If lrocData is set to yes, by default the data that was not already in the cache when accessed by a user is subsequently saved to the local read-only cache. The default behavior can be overridden using the lrocDataMaxFileSize and lrocDataStubFileSize configuration options to save all data from small files or all data from the initial portion of large files.
lrocDataMaxFileSize
Limits the data that can be saved in the local read-only cache to only the data from small files.
A value of -1 indicates that all data is eligible to be saved. A value of 0 indicates that small files are not to be saved. A positive value indicates the maximum size of a file to be considered for the local read-only cache. For example, a value of 32768 indicates that files with 32 KB of data or less are eligible to be saved in the local read-only cache. The default value is -1.
lrocDataStubFileSize
Limits the data that can be saved in the local read-only cache to only the data from the first portion of all files.
A value of -1 indicates that all file data is eligible to be saved. A value of 0 indicates that stub data is not eligible to be saved. A positive value indicates that the initial portion of each file that is eligible is to be saved. For example, a value of 32768 indicates that the first 32 KB of data from each file is eligible to be saved in the local read-only cache. The default value is -1.
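
For example, to limit the local read-only cache to data from files of 32 KB or less, with the change taking effect immediately and permanently (an illustrative limit):
# mmchconfig lrocDataMaxFileSize=32768 -i
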
lrocDirectories
Controls whether directory blocks are populated into the local read-only cache. This option also controls caching of other file system metadata, such as indirect blocks, symbolic links, and extended attribute overflow blocks.
Valid values are yes or no. The default value is yes.
lrocEnableStoringClearText
Controls whether encrypted file data can be read into a local read-only cache (LROC) device. Valid values are yes and no. The default value is no.
If the value is yes, encrypted files can benefit from the performance improvements that are provided by an LROC device. However, be aware that IBM Spectrum Scale holds encrypted file data in memory as cleartext. Because LROC storage is non-volatile, an attacker can capture the cleartext by removing the LROC device from the system and reading the contents at some other location.
Warning: You must take steps to protect the cleartext while it is in LROC device storage. One method is to install an LROC device that internally encrypts data that is written into it and decrypts data that is read from it. However, be aware that a device of this type voids the IBM Spectrum Scale secure deletion guarantee, because IBM Spectrum Scale does not manage the encryption key for the device.
lrocInodes
Controls whether inodes from open files are populated into the local read-only cache; the cache contains the full inode, including all disk pointers, extended attributes, and data.
Valid values are yes or no. The default value is yes.

logRecoveryThreadsPerLog
Controls the number of threads that are available for recovering a single log file. The default value is 8 and the valid range is 1 - 64. Setting a higher value expedites the recovery of single log files that are being replayed. The improvement in processing speed depends on the log file size and user workload.
logOpenParallelism
Controls the number of log files that can be opened in parallel during a log recovery. The default value is 8 and the valid range is 1 - 256. Setting a higher value improves the recovery speed of the file system when it is mounted on multiple nodes.
logRecoveryParallelism
Controls the number of log files that can be recovered concurrently. The default value is 1 and the valid range is 1 - 64. Setting a higher value expedites the recovery of the file system when multiple nodes fail at the same time.
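As a sketch, all three log recovery attributes can be raised in one command; the values shown are illustrative, not recommendations:
mmchconfig logRecoveryThreadsPerLog=16,logOpenParallelism=16,logRecoveryParallelism=4 -i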
maxActiveIallocSegs
Specifies the number of active inode allocation segments that are maintained on the specified nodes. The valid range is 1 - 64. A value greater than 1 can significantly improve performance in the following scenario:
  1. A single node has created a large number of files in multiple directories.
  2. Processes and threads on multiple nodes are now concurrently attempting to delete or unlink files in those directories.
Values greater than 8 might not provide any improvement in performance.

A value greater than 1 is supported only on file systems that are created at or upgraded to file system format version 5.0.2 or later (file system format number 20.00 or later).

The default value is 8 on file systems that are created at file system format version 5.0.2 or later. The default value is 1 on earlier file system format versions and on file systems that are upgraded to file system format version 5.0.2.

A change in the value of this attribute is not effective until after the file system is remounted.

If the value of this attribute has not been changed since the file system was created, the mmlsconfig command lists the value as -1. But the actual value is 1 or 8, depending on the level of the file system:
  • File systems created at version 5.0.2 or later: 8
  • File systems created at an earlier version: 1

The -N flag is valid for this attribute.
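For example, to raise the value on a hypothetical node class named fileCreators; the change takes effect after the file system is remounted:
mmchconfig maxActiveIallocSegs=16 -N fileCreators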

maxblocksize
Changes the maximum file system block size. Valid block sizes are 64 KiB, 128 KiB, 256 KiB, 512 KiB, 1 MiB, 2 MiB, 4 MiB, 8 MiB, and 16 MiB. Specify this value with the character K or M; for example, use 8M to specify a block size of 8 MiB. When you create a new cluster, maxblocksize is set to DEFAULT (4 MiB). For more information, see mmcrfs command.

File systems with block sizes larger than the specified value cannot be created or mounted unless maxblocksize is increased.

GPFS must be down on all the nodes in the cluster when you change the maxblocksize attribute.

Note: When you migrate a cluster from an earlier version to 5.0.0 or later, the value of maxblocksize stays the same. However, if maxblocksize was set to DEFAULT in the earlier version of the cluster, then migrating it to 5.0.0 or later sets it explicitly to 1 MiB, which was the default value in earlier versions. To change maxblocksize to the default value after migrating to 5.0.0 or later, set maxblocksize=DEFAULT (4 MiB).
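Because GPFS must be down on all nodes for this change, a minimal sketch of the procedure is:
mmshutdown -a
mmchconfig maxblocksize=8M
mmstartup -a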
maxBufferDescs
Valid values are from 512 to 10,000,000.

If it is not set explicitly, maxBufferDescs is set to 10 * maxFilesToCache, up to a limit of the pagepool size divided by 16 KB. Each buffer descriptor caches up to one file system block of data for a file. When caching small files, the value does not need to be more than a small multiple of maxFilesToCache, because only OpenFile objects can cache data blocks. When an application needs to cache large files, tune maxBufferDescs to ensure that there are enough descriptors to cache them.

For example, if 10,000 buffer descriptors are configured and the file system block size is 1 MiB, there are not enough buffer descriptors to cache a 20 GiB file. To cache a 20 GiB file, increase maxBufferDescs to at least 20,480 (20 GiB / 1 MiB = 20,480).
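Continuing that example, a command along the following lines raises the limit; the node class name largeFileNodes is a placeholder:
mmchconfig maxBufferDescs=20480 -N largeFileNodes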

The -N flag is valid for this attribute.
Note: When LROC is configured on the node, additional buffer descriptors may be required to reference data that is stored in LROC. For more information, see Local read-only cache.
maxDownDisksForRecovery
Specifies the maximum number of disks that might experience a failure and still be subject to an automatic recovery attempt. If this value is exceeded, no automatic recovery actions take place.

Valid values are in the range 0 - 300. The default is 16. If restripeOnDiskFailure is no, maxDownDisksForRecovery has no effect.

maxFailedNodesForRecovery
Specifies the maximum number of nodes that might be unavailable before automatic disk recovery actions are canceled.

Valid values are in the range 0 - 300. The default is 3. If restripeOnDiskFailure is no, maxFailedNodesForRecovery has no effect.

maxFcntlRangesPerFile
Specifies the number of fcntl locks that are allowed per file. The default is 200. The minimum value is 10 and the maximum value is 200000.

maxFilesToCache
Specifies the number of inodes to cache for open files or files that are recently closed. This parameter does not limit the number of files that can remain concurrently open on the node.

Storing the inode of a file in cache permits faster re-access to the file. The default is 4000, but increasing this number might improve throughput for workloads with high file reuse. However, increasing this number excessively might cause paging at the file system manager node. The value should be large enough to handle the number of concurrently open files plus allow caching of recently used files.
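For example, to raise the inode cache on a placeholder node class named protocolNodes; the new value takes effect after GPFS is restarted on those nodes:
mmchconfig maxFilesToCache=10000 -N protocolNodes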

The -N flag is valid for this attribute.

maxMBpS
Specifies an estimate of how many megabytes of data can be transferred per second into or out of a single node. The default is 2048 MiB per second. The value is used in calculating the amount of I/O that can be done to effectively prefetch data for readers and write-behind data from writers. By lowering this value, you can artificially limit how much I/O one node can put on all of the disk servers.

The -N flag is valid for this attribute.

maxMissedPingTimeout
See the minMissedPingTimeout parameter.

maxReceiverThreads
Controls the maximum number of receiver threads that handle incoming TCP packets. The actual number of receiver threads that are configured is limited by the number of logical CPUs on the node: if the number of logical CPUs is less than maxReceiverThreads, the number of threads that handle the packets is set to the number of logical CPUs.

The default value of maxReceiverThreads is 32. The valid range is 1 - 128.

This parameter should be increased on large clusters to limit the number of sockets that are handled by each receive thread. For clusters with 2048 nodes or more, set the parameter to the number of logical CPUs.

The -N flag is valid for this attribute.

maxStatCache
Specifies the number of inodes to keep in the stat cache. The stat cache maintains only enough inode information to perform a query on the files kept in the stat cache as would be needed by the 'ls' command. The valid range for maxStatCache is 0 - 100,000,000.
The default value of the maxStatCache parameter depends on the value of maxFilesToCache parameter in certain scenarios. The following table provides examples of such scenarios:
Table 4. Default value of maxStatCache parameter
  • No changes to the default values of maxFilesToCache and maxStatCache: maxFilesToCache is 4000 and maxStatCache is 1000. These default values are applied.
  • maxFilesToCache is set to a value less than 2500: maxStatCache defaults to 4 * maxFilesToCache. Example: if maxFilesToCache is 2000, then maxStatCache is 8000.
  • maxFilesToCache is set to 2500 or higher: maxStatCache defaults to 10000. When maxStatCache is calculated by the daemon, the value is capped at 10000.
If you do not accept either of the default values, set maxStatCache to an appropriate size based on the number of nodes in the cluster, the number of token managers in the cluster, the size of the Local Read-Only Cache (LROC) if one is configured, and any other relevant factors.
Note: In versions of IBM Spectrum Scale earlier than 5.0.2, the stat cache is not effective on the Linux platform unless the LROC is configured. In versions earlier than 5.0.2, follow these guidelines:
  • If LROC is not enabled on the node, set maxStatCache to 0.
  • If LROC is enabled on the node, accept a default value of maxStatCache or set maxStatCache to an appropriate size as described in the previous paragraphs.
This pre-5.0.2 restriction applies to all the versions and distributions of Linux that IBM Spectrum Scale supports.

The -N flag is valid for this attribute.

maxTcpConnsPerNodeConn
Controls the maximum number of TCP connections that the GPFS mmfsd daemon establishes to another node. Valid values are 1 - 8; the default is 2. For any given pair of nodes, the number of established TCP connections is the minimum of the maxTcpConnsPerNodeConn values that are defined on the two nodes. For further information on configuring this parameter, see Recommendations for tuning maxTcpConnsPerNodeConn parameter.

The -N flag is valid for this attribute.

Note: In IBM Spectrum Scale 5.1.1 or later, multiple TCP connections may be used to communicate with another remote GPFS daemon. Although a single destination IP address is always used, multiple physical links may be used for those connections, given the appropriate configuration, for example, an 802.3ad bonded connection with a layer 3+4 xmit_hash_policy.
metadataDiskWaitTimeForRecovery
Specifies a period, in seconds, during which the recovery of metadata disks is suspended to give the disk subsystem a chance to correct itself. This parameter is taken into account when the affected disks belong to a single failure group. If more than one failure group is affected, the delay is based on the value of minDiskWaitTimeForRecovery.

Valid values are 0 - 3600 seconds. The default is 2400. If restripeOnDiskFailure is no, metadataDiskWaitTimeForRecovery has no effect.

minDiskWaitTimeForRecovery
Specifies a period, in seconds, during which the recovery of disks is suspended to give the disk subsystem a chance to correct itself. This parameter is taken into account when more than one failure group is affected. If the affected disks belong to a single failure group, the delay is based on the values of dataDiskWaitTimeForRecovery and metadataDiskWaitTimeForRecovery.

Valid values are 0 - 3600 seconds. The default is 1800. If restripeOnDiskFailure is no, minDiskWaitTimeForRecovery has no effect.

minIndBlkDescs
Specifies the total number of indirect blocks in cache, where the disk addresses of data blocks or indirect blocks of files are stored. Each indirect block descriptor caches one indirect block. Caching these indirect blocks enables faster retrieval of the disk location of the data to be read or written, thus improving I/O performance.

The default value for minIndBlkDescs is 5000. The effective value is calculated as the maximum of minIndBlkDescs and maxFilesToCache.

For example:
  • If the user does not set a value for minIndBlkDescs, its effective value is MAX (5000, maxFilesToCache).
  • If the user sets a value for minIndBlkDescs, its effective value is MAX (minIndBlkDescs, maxFilesToCache).

The minIndBlkDescs value must be large enough to handle the number of concurrently open files, or to traverse many data blocks in large files, if maxFilesToCache is set to a small value.

The -N flag is valid for this attribute.

minMissedPingTimeout
The minMissedPingTimeout and maxMissedPingTimeout parameters set limits on the calculation of missedPingTimeout (MPT). The MPT is the length of time during which pings that are sent from the Cluster Manager (CM) to a node that has not renewed its lease are allowed to fail. The default MPT value is 5 seconds less than leaseRecoveryWait. The CM waits MPT seconds after the lease has expired before declaring a node out of the cluster. The values of minMissedPingTimeout and maxMissedPingTimeout are in seconds; the default values are 3 and 60 respectively. If these values are changed, only GPFS on the quorum nodes that elect the CM must be recycled for the change to take effect.

This parameter can be used to ride out a central network switch failure or other network glitches that last longer than leaseRecoveryWait. This might prevent false node-down conditions, but it extends the time for node recovery to finish and might block other nodes from progressing if the failing node holds the tokens for many shared files.

As is the case with leaseRecoveryWait, a node is usually expelled from the cluster if there is a problem with the network or if the node runs out of resources, such as memory for paging. For example, if an application that is running on a node is paging the machine too much or overrunning network capacity, GPFS might not get the chance to contact the Cluster Manager node to renew the lease within the timeout period.

The default value of this parameter is 3. A valid value is any number in the range 1 - 300.
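As an illustrative sketch, both limits can be set in one command; GPFS must then be recycled on the quorum nodes for the change to take effect:
mmchconfig minMissedPingTimeout=10,maxMissedPingTimeout=90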

mmapRangeLock
Specifies POSIX or non-POSIX mmap byte-range semantics. Valid values are yes or no (yes is the default). A value of yes indicates POSIX byte-range semantics apply to mmap operations. A value of no indicates non-POSIX mmap byte-range semantics apply to mmap operations.
If using InterProcedural Analysis (IPA), turn off this option:
mmchconfig mmapRangeLock=no -i
This allows more lenient intranode locking, but imposes internode whole-file range tokens on files that use mmap while writing.
mmfsLogTimeStampISO8601
Controls the time stamp format for GPFS log entries. Specify yes to use the ISO 8601 time stamp format for log entries or no to use the earlier time stamp format. The default value is yes. You can specify the log time stamp format for the entire cluster or for individual nodes. You can have different log time stamp formats on different nodes of the cluster. For more information, see Time stamp in GPFS log entries.

The -N flag is valid for this attribute. This attribute takes effect immediately, whether or not -i is specified.

mmhealthHDFSBinPath
Specifies the path to the HDFS binary directory.
mmhealthHDFSCoreSiteFile
Specifies the file that contains information about the HDFS core site.
mmhealthHDFSEnvFile
Specifies the file that contains information about the HDFS environment.
mmhealthHDFSGpfsBinPath
Specifies the directory that contains the mmhadoopctl executable. By default, the location is /usr/lpp/mmfs/bin.
mmhealthHDFSPidPath
Specifies the path to the PID file of HDFS.
mmhealthHDFSWorkersFile
Specifies the file that contains information about HDFS workers.

mmhealthNFSCollectDebug
yes
Allows the system to collect the NFS debug data in the /var/adm/ras directory when an nfs_not_active failure event occurs.
no
Prevents the system from collecting the NFS debug data in the /var/adm/ras directory when an nfs_not_active failure event occurs.
mmhealthNFSFailoverWhenUnresponsive
yes
Performs a CES IP failover when the NFS server is unresponsive.
no
Raises a WARNING event, nfs_unresponsive, when the NFS server is unresponsive.
mmhealthNFSPreventStartWhenFsMissing
yes
Prevents the NFS from starting up after a reboot, or after the mmstartup command is run. The NFS startup is prevented only when none of the required export file systems are available.
no
Allows NFS to start up after a reboot, or after the mmstartup command is run, even when none of the required export file systems are available.
mmhealthPtfUpdatesMonitorEnabled
yes
Enables PTF update monitoring by using the call home function.
no
Disables PTF update monitoring. This is the default value.
mmhealthNFSRpcBindRestart
yes
Raises an event rpcbind_unresponsive, and restarts the rpcbind service when a call to rpcinfo -p does not yield a return code of 0.
no
Raises an mmhealth WARNING event rpcbind_warn.
mmhealthUseSharedLib
yes
Uses an internal library to establish communication between sysmonitor daemons.
no
Uses a slower and more resilient communication channel.
nfsPrefetchStrategy
With the nfsPrefetchStrategy parameter, GPFS optimizes prefetching for NFS file-style access patterns. This parameter defines a window, in number of blocks around the current position, that is treated as fuzzy-sequential access. Increasing the value of this parameter can improve performance when large files are read sequentially, because kernel scheduling can cause some read requests that reach GPFS to be non-sequential. If the file system block size is smaller than the read request sizes, increasing the value of this parameter provides a bigger window of blocks. The default value is 0. A valid value is any number in the range 0 - 10.

Setting the value of nfsPrefetchStrategy to 1 or greater can improve the sequential read performance when large files are accessed by using NFS and the file system block size is smaller than the NFS transfer block size.
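For example, assuming the NFS-serving nodes are grouped in a node class named cesNodes, a sketch that widens the fuzzy-sequential window is:
mmchconfig nfsPrefetchStrategy=2 -i -N cesNodes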

nistCompliance
Controls whether GPFS operates in the NIST 800-131A mode. (This applies to security transport only, not to encryption, as encryption always uses NIST-compliant mechanisms.)
Valid values are:
off
Specifies that there is no compliance to NIST standards. For clusters that are operating below the GPFS 4.1 level, this is the default. For clusters at the version 5.1 level or higher, setting nistCompliance to off is not allowed.
SP800-131A
Specifies that security transport is to follow the NIST SP800-131A recommendations. For clusters at the GPFS 4.1 level or higher, this is the default.
Note: In a remote cluster setup, all clusters must have the same nistCompliance value.
noSpaceEventInterval
Specifies the minimum time interval between two noDiskSpace events of a file system for which a registered callback script is called. The default value is 120 seconds. If this value is set to zero, the noDiskSpace event is generated every time the condition occurs. The noDiskSpace event is generated only when a callback script is registered for this event with the mmaddcallback command.
nsdBufSpace
This option specifies the percentage of the page pool that is reserved for the network transfer of NSD requests. Valid values are within the range of 10 to 70. The default value is 30. On IBM Spectrum Scale RAID recovery group NSD servers, this value should be decreased to its minimum of 10, since vdisk-based NSDs are served directly from the RAID buffer pool (as governed by nsdRAIDBufferPoolSizePct). On all other NSD servers, increasing either this value or the amount of page pool, or both, could improve NSD server performance. On NSD client-only nodes, this parameter is ignored. For more information about IBM Spectrum Scale RAID, see IBM Spectrum Scale RAID: Administration.

The -N flag is valid for this attribute.

nsdCksumTraditional
This attribute enables checksum data-integrity checking between a traditional NSD client node and its NSD server. Valid values are yes and no. The default value is no. (Traditional in this context means that the NSD client and server are configured with IBM Spectrum Scale rather than with IBM Spectrum Scale RAID. The latter is a component of IBM® Elastic Storage Server (ESS) and of IBM GPFS Storage Server (GSS).)

The checksum procedure detects any corruption by the network of the data in the NSD RPCs that are exchanged between the NSD client and the server. A checksum error triggers a request to retransmit the message.

When this attribute is enabled on a client node, the client indicates in each of its requests to the server that it is using checksums. The server uses checksums only in response to client requests in which the indicator is set. A client node that accesses a file system that belongs to another cluster can use checksums in the same way.

You can change the value of this attribute for an entire cluster without shutting down the mmfsd daemon, or for one or more nodes without restarting the nodes.
Note:
  • Enabling this feature can result in significant I/O performance degradation and a considerable increase in CPU usage.
  • To enable checksums for a subset of the nodes in a cluster, issue a command like the following one:
    mmchconfig nsdCksumTraditional=yes -i -N <subset-of-nodes>

The -N flag is valid for this attribute.

nsdDumpBuffersOnCksumError
This attribute enables the dumping of the data buffer to a file when a checksum error occurs. Valid values are yes and no. The default value is no. The location of the dump file is set by the dataStructureDump attribute.

You can change the value of the nsdDumpBuffersOnCksumError attribute for a cluster without shutting down the mmfsd daemon, or for one or more nodes without restarting the nodes.

The -N flag is valid for this attribute.

nsdInlineWriteMax
The nsdInlineWriteMax parameter specifies the maximum transaction size that can be sent as embedded data in an NSD-write RPC. The value of this parameter is ignored if verbs send is enabled by using the verbsRdmaSend parameter.

In most cases, the NSD-write RPC exchange performs the following steps:

  1. An RPC is sent from the client to the server to request a write.
  2. A GetData RPC is sent back from the server to the client to request the data.
Note: For data smaller than nsdInlineWriteMax, GPFS sends that amount of write data directly without the GetData RPC from the server to the client.

The default value of this parameter is 1024. A valid value is any number in the range 0 - 8M.
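For example, to allow writes of up to 32 KiB to be sent inline with the NSD-write RPC (an illustrative value, not a recommendation):
mmchconfig nsdInlineWriteMax=32768 -i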

nsdMaxWorkerThreads
The nsdMaxWorkerThreads parameter sets the maximum number of NSD threads that can be involved in NSD I/O operations on an NSD server to the storage system to which the server is connected. On 64-bit architectures, the combined value of workerThreads, prefetchThreads, and nsdMaxWorkerThreads must be less than 8192. The minimum value of nsdMaxWorkerThreads is 8 and the default value is 512. The default value works for most use cases.

Scale this value with the number of NSDs that the server is connected to, not with the number of NSD clients in the cluster. When the NSDs to which the server does I/O are built from high-performance storage devices, such as IBM FlashSystem® or ESS, a larger nsdMaxWorkerThreads value can yield better performance.

nsdMinWorkerThreads
The nsdMinWorkerThreads parameter sets a lower bound on the number of active NSD I/O threads on an NSD server node that executes I/O operations against NSDs. The value represents the minimum number of NSD I/O threads that are required to execute the I/O operations.

The default value of this parameter is 16 and the minimum value is 1. For limits on setting the number of NSD threads, see the description of nsdMaxWorkerThreads.

nsdMultiQueue
The nsdMultiQueue parameter sets the number of queues. The default value of this parameter is 256. A valid value is any number in the range 2 - 512.
nsdRAIDBufferPoolSizePct
This option specifies the percentage of the page pool that is used for the IBM Spectrum Scale RAID vdisk buffer pool. Valid values are within the range of 10 to 90. The default is 50 when IBM Spectrum Scale RAID is configured on the node in question; 0 when it is not. For more information about IBM Spectrum Scale RAID, see IBM Spectrum Scale RAID: Administration.

The -N flag is valid for this attribute.

nsdRAIDTracks
This option specifies the number of tracks in the IBM Spectrum Scale RAID buffer pool, or 0 if this node does not have an IBM Spectrum Scale RAID vdisk buffer pool. This setting controls whether IBM Spectrum Scale RAID services are configured. For more information about IBM Spectrum Scale RAID, see IBM Spectrum Scale RAID: Administration.

Valid values are 0, or any value of 256 or greater.

The -N flag is valid for this attribute.

nsdServerWaitTimeForMount
When mounting a file system whose disks depend on NSD servers, this option specifies the number of seconds to wait for those servers to come up. The decision to wait is controlled by the criteria managed by the nsdServerWaitTimeWindowOnMount option.

Valid values are 0 - 1200 seconds. The default is 300. A value of zero indicates that no waiting is done. The interval for checking is 10 seconds. If nsdServerWaitTimeForMount is 0, nsdServerWaitTimeWindowOnMount has no effect.

The mount thread also waits when the daemon delays for safe recovery. The wait for NSD servers to come up, which this option covers, occurs after the recovery wait expires and the mount thread is allowed to proceed.

The -N flag is valid for this attribute.

nsdServerWaitTimeWindowOnMount
Specifies a window of time (in seconds) during which a mount can wait for NSD servers as described for the nsdServerWaitTimeForMount option. The window begins when quorum is established (at cluster startup or subsequently), or at the last known failure times of the NSD servers required to perform the mount.

Valid values are 1 - 1200 seconds. The default is 600. If nsdServerWaitTimeForMount is 0, nsdServerWaitTimeWindowOnMount has no effect.

The -N flag is valid for this attribute.

When a node rejoins the cluster after being removed for any reason, the node resets all the failure time values that it knows about. Therefore, when a node rejoins the cluster it believes that the NSD servers have not failed. From the perspective of a node, old failures are no longer relevant.

GPFS checks the cluster formation criteria first. If that check falls outside the window, GPFS then checks for NSD server fail times being within the window.

numaMemoryInterleave
In a Linux NUMA environment, the default memory policy is to allocate memory from the local NUMA node of the CPU from which the allocation request was made. This parameter is used to change to an interleave memory policy for GPFS by starting GPFS with numactl --interleave=all.

Valid values are yes and no. The default is no.

If you run IBM Spectrum Scale on a system with multiple NUMA nodes, set this attribute to yes and restart GPFS.

Before using this parameter, ensure that the Linux numactl package has been installed.

pagepool
Changes the size of the cache on each node. The default value is either one-third of the physical memory on the node or 1 GiB, whichever is smaller. This applies to new installations only; on upgrades the existing default value is kept.

The maximum GPFS page pool size depends on the value of the pagepoolMaxPhysMemPct parameter and the amount of physical memory on the node. You can specify this value with the suffix K, M, or G, for example, 128M.

If a node in the cluster is an NSD server or is configured for IBM Spectrum Scale RAID, this parameter takes effect on the node only when the GPFS daemon is restarted, even if the -i or -I option is specified.
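For example, a sketch that grows the page pool on a placeholder node class named nsdServers; as noted above, on NSD servers the change still requires a daemon restart:
mmchconfig pagepool=8G -i -N nsdServers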

The -N flag is valid for this attribute.

pagepoolMaxPhysMemPct
Percentage of physical memory that can be assigned to the page pool. Valid values are 10 - 90 percent. The default is 75 percent (with the exception of Windows, where the default is 50 percent).

The -N flag is valid for this attribute.

panicOnIOHang={yes | no}
Controls whether the GPFS daemon panics the node kernel when a local I/O request is pending in the kernel for more than five minutes. This attribute applies only to disks that the node is directly attached to.
yes
Causes the GPFS daemon to panic the node kernel.
no
Takes no action. This is the default value.
This attribute is not supported in the Microsoft Windows environment.
Note: With the diskIOHang event of the mmaddcallback command, you can add notification and data collection scripts to isolate the reason for a long I/O wait.

The -N flag is valid for this attribute.

pitWorkerThreadsPerNode
Controls the maximum number of threads to be involved in parallel processing on each node that is serving as a Parallel Inode Traversal (PIT) worker.

By default, when a command that uses the PIT engine is run, the file system manager asks all nodes in the local cluster to serve as PIT workers; however, you can specify an exact set of nodes to serve as PIT workers by using the -N option of a PIT command. Note that the current file system manager node is a mandatory participant, even if it is not in the list of nodes that you specify. On each participating node, up to pitWorkerThreadsPerNode threads can be involved in parallel processing. The range of accepted values is 0 to 8192. The default value varies within the 2 - 16 range, depending on the file system configuration. If a file system contains vdisk-based NSD disks, the default value varies within the 8 - 64 range.

prefetchPct
GPFS uses the prefetchPct parameter as a guideline to limit the page pool space that is to be used for prefetch and write-behind buffers for active sequential streams. The default value of the prefetchPct parameter is 20% of the pagepool value. If the workload is sequential with very little caching of small files or random IO, increase the value of this parameter to 60% of the pagepool value, so that each stream can have more buffers that are cached for prefetch and write-behind operations.

The default value of this parameter is 20. A valid value is any number in the range 0 - 60.

prefetchThreads
Controls the maximum number of threads that are dedicated to prefetching data for files that are read sequentially, or to handle sequential write-behind.

Functions in the GPFS daemon dynamically determine the actual degree of parallelism for prefetching data. The default value is 72. The minimum value is 2. The maximum value of prefetchThreads plus worker1Threads plus nsdMaxWorkerThreads is 8192 on all 64-bit platforms.

The -N flag is valid for this attribute.

proactiveReconnect={yes | no}
When enabled, this attribute causes nodes to proactively close problematic TCP connections with other nodes and to reestablish new connections in their place. The default value of the proactiveReconnect parameter is no when the minimum release level of a cluster is less than 5.0.4, and yes when the minimum release level is at least 5.0.4.
yes

Each node monitors the state of TCP connections with other nodes in the cluster and proactively closes and reestablishes a connection whenever the TCP congestion state and TCP retransmission timeout (RTO) (similar to the information that is displayed by the ss -i command in Linux) indicate that data might not be flowing.

In certain environments that are prone to short-term network outages, this feature can prevent nodes from being expelled from a cluster when TCP connections go into error states that are caused by packet loss in the network or in the adapter. If a TCP connection is successfully reestablished and operates normally, nodes on either side of the connection are not expelled.

no
Disables proactive closing and reestablishing of problematic TCP connections between nodes.

For both the yes and no settings, a message is written to the mmfs.log file whenever a TCP connection reaches an error state.

This attribute is supported only on Linux.

The -N flag is valid for this attribute.

profile
Specifies a predefined profile of attributes to be applied. System-defined profiles are located in /usr/lpp/mmfs/profiles/. All the configuration attributes listed under a cluster stanza are changed as a result of this command. The following system-defined profile names are accepted:
  • gpfsProtocolDefaults
  • gpfsProtocolRandomIO

A user's profiles must be installed in /var/mmfs/etc/. The profile file specifies GPFS configuration parameters with values different than the documented defaults. A user-defined profile must not begin with the string 'gpfs' and must have the .profile suffix.

User-defined profiles consist of the following stanzas:

%cluster:
[CommaSeparatedNodesOrNodeClasses:]ClusterConfigurationAttribute=Value
...

File system attributes and values are ignored.

A sample file can be found in /usr/lpp/mmfs/samples/sample.profile. See the mmchconfig command for a detailed description of the different configuration parameters. User-defined profiles should be used only by experienced administrators. When in doubt, use the mmchconfig command instead.
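As a sketch, a user-defined profile installed as /var/mmfs/etc/custom.profile (the file name and attribute values are illustrative) might contain:
%cluster:
maxFilesToCache=10000
pagepool=4G
It could then be applied with a command like mmchconfig profile=custom.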

readReplicaPolicy
Specifies the location from which replicas are read. By default, GPFS reads the first replica, whether or not there is a replica on the local disk. When readReplicaPolicy=local is specified, replicas are read from the local disk if the local disk has the data; for performance reasons, this is the recommended setting for FPO environments. An NSD server on the same subnet as the client is also considered to be local. When the subnets configuration parameter is set, all configured subnets are taken into account, not merely the daemon IP address.
When readReplicaPolicy=fastest is specified, replicas are read from the disk that is considered the fastest, based on the read I/O statistics of the disk. You can tune the way the system determines the fastest disk by using the following parameters:
  • fastestPolicyNumReadSamples
  • fastestPolicyCmpThreshold
  • fastestPolicyMaxValidPeriod
  • fastestPolicyMinDiffPercent

In a system with SSD and regular disks, the value of fastestPolicyCmpThreshold can be set to a greater number to let GPFS refresh the speed statistics for the slower disks less frequently. The default value is maintained for all other configurations. The default value of this parameter is default. The valid values are default, local, and fastest.

To return this attribute to the default setting, specify readReplicaPolicy=DEFAULT -i.

readReplicaRuleEnabled
Specifies whether the gpfs.readReplicaRule extended attribute or the diskReadExclusionList configuration option is evaluated during data block reads. The valid values are yes and no. The default value is no. For more information, see Replica mismatches.
release=LATEST
Increases the minimum release level of a cluster to the latest version of IBM Spectrum Scale that is supported by all the nodes of the cluster. For example, if the minimum release level of a cluster is 5.0.1.3 but IBM Spectrum Scale 5.0.2.2 is installed on all the nodes, then release=LATEST increases the minimum release level of the cluster to 5.0.2.0.

The effect of increasing the minimum release level is to enable the features that are installed with the version of IBM Spectrum Scale that the new minimum release level specifies. To return to the preceding example, increasing the minimum release level to 5.0.2.0 enables the features that are installed with IBM Spectrum Scale 5.0.2.0.

Issuing mmchconfig with the release=LATEST parameter is one of the final steps in upgrading the nodes of a cluster to a later version of IBM Spectrum Scale. For more information, see Completing the upgrade to a new level of IBM Spectrum Scale.

Before you use this parameter, consider any possible unintended consequences. For more information, see Minimum release level of a cluster.

To process this parameter, the mmchconfig command must access each node in the cluster to determine the version of IBM Spectrum Scale that is installed. If the command cannot access one or more of the nodes, it displays an error message and terminates. You must correct the communication problem and issue the command again. Repeat this process until the command verifies the information for all the nodes and ends successfully.

This parameter causes the mmchconfig command to fail with an error message if the cipherList configuration attribute of the cluster is not set to AUTHONLY or higher. For more information, see Completing the upgrade to a new level of IBM Spectrum Scale.

To display the minimum release level, issue the following command:
mmlsconfig minReleaseLevel
Note: If the Cluster Configuration Repository (CCR) is not enabled, you cannot run the mmchconfig release=LATEST command. Use the mmchcluster --ccr-enable command to enable CCR in the cluster.
restripeOnDiskFailure
Specifies whether GPFS attempts to recover automatically from certain common disk failure situations.

When a disk experiences a failure and becomes unavailable, the recovery procedure first attempts to restart the disk; if this fails, the disk is suspended and its data is moved to other disks. Similarly, when a node joins the cluster, all disks for which the node is responsible are checked and an attempt is made to restart any that are in a down state.

Whether a file system is subject to a recovery attempt is determined by its maximum replication values. If the mmlsfs -M or -R value is greater than one, then the recovery code is executed. The recovery actions are asynchronous and GPFS continues its processing while the recovery attempts take place. The results of the recovery actions and any errors that are encountered are recorded in the /var/adm/ras/autorecovery.log.<timestamp> log.

For more information about GPFS disk fail auto recovery, see Auto recovery in IBM Spectrum Scale: Big Data and Analytics Guide.

rpcPerfNumberDayIntervals
Controls the number of days that aggregated RPC data is saved. Every day the previous 24 hours of one-hour RPC data is aggregated into a one-day interval.

The default value for rpcPerfNumberDayIntervals is 30, which allows the previous 30 days of one-day intervals to be displayed. To conserve memory, fewer intervals can be configured to reduce the number of recent one-day intervals that can be displayed. The values that are allowed for rpcPerfNumberDayIntervals are in the range 4 - 60.

rpcPerfNumberHourIntervals
Controls the number of hours that aggregated RPC data is saved. Every hour the previous 60 minutes of 1-minute RPC data is aggregated into a one-hour interval.

The default value for rpcPerfNumberHourIntervals is 24, which allows the previous day's worth of one-hour intervals to be displayed. To conserve memory, fewer intervals can be configured to reduce the number of recent one-hour intervals that can be displayed. The values that are allowed for rpcPerfNumberHourIntervals are 4, 6, 8, 12, or 24.

rpcPerfNumberMinuteIntervals
Controls the number of minutes that aggregated RPC data is saved. Every minute the previous 60 seconds of 1-second RPC data is aggregated into a 1-minute interval.

The default value for rpcPerfNumberMinuteIntervals is 60, which allows the previous hour's worth of 1-minute intervals to be displayed. To conserve memory, fewer intervals can be configured to reduce the number of recent 1-minute intervals that can be displayed. The values that are allowed for rpcPerfNumberMinuteIntervals are 4, 5, 6, 10, 12, 15, 20, 30, or 60.

rpcPerfNumberSecondIntervals
Controls the number of seconds that aggregated RPC data is saved. Every second RPC data is aggregated into a 1-second interval.

The default value for rpcPerfNumberSecondIntervals is 60, which allows the previous minute's worth of 1-second intervals to be displayed. To conserve memory, fewer intervals can be configured to reduce the number of recent 1-second intervals that can be displayed. The values that are allowed for rpcPerfNumberSecondIntervals are 4, 5, 6, 10, 12, 15, 20, 30, or 60.

rpcPerfRawExecBufferSize
Specifies the number of bytes to allocate for the buffer that is used to store raw RPC execution statistics. For each RPC received by a node, 16 bytes of associated data is saved in this buffer when the RPC completes. This circular buffer must be large enough to hold 1 second's worth of raw execution statistics.

The default value for rpcPerfRawExecBufferSize is 10 MiB, which produces 655360 entries. The data in this buffer is processed every second. It is a good idea to set the buffer size 10% to 20% larger than what is needed to hold 1 second's worth of data.

rpcPerfRawStatBufferSize
Specifies the number of bytes to allocate for the buffer that is used to store raw RPC performance statistics. For each RPC sent to another node, 56 bytes of associated data is saved in this buffer when the reply is received. This circular buffer must be large enough to hold 1 second's worth of raw performance statistics.

The default value for rpcPerfRawStatBufferSize is 30 MiB, which produces 561737 entries. The data in this buffer is processed every second. It is a good idea to set the buffer size 10% to 20% larger than what is needed to hold one second's worth of data.

sdrNotifyAuthEnabled
Specifies whether to authenticate the notify RPCs that are related to deadlock detection and amelioration, node overload, and node expel. Possible values are yes or no. The value yes forces such notify RPCs to be authenticated.

For new clusters with a minimum release level of 5.1.1 or later, the default value of this parameter is yes. For existing clusters, the default value of this parameter is no and the system administrator can change it to yes, if the cluster uses CCR and the minimum release level is at least 5.1.1.

The scope of the sdrNotifyAuthEnabled parameter is local to a cluster. However, when set to yes, it can impact the notify RPCs that are sent to remote clusters. The behavior of the notify RPCs varies based on the value of the sdrNotifyAuthEnabled parameter in different configurations. For example:
  1. If sdrNotifyAuthEnabled is set to different values in remote clusters, then the notify RPCs that are sent between the remote clusters fail and an error message is logged in the mmfs.log file.
  2. The sender of the notify RPC has sdrNotifyAuthEnabled set to yes and the receiver has it set to no. In this case, the sender returns an error and logs an error in the message log because it cannot establish a secure connection, but the receiver services the notify RPC request.
  3. The sender of the notify RPC has sdrNotifyAuthEnabled set to no and the receiver has it set to yes. In this case, the sender does not return an error, but the receiver does not service the RPC and logs an error message in the message log.
seqDiscardThreshold
With the seqDiscardThreshold parameter, GPFS detects a sequential read or write access pattern and specifies what is to be done with the page pool buffer after it is consumed or flushed by write-behind threads. This is the highest-performing option when a very large file is read or written sequentially. The default value is 1 MiB, which means that if a file that is greater than 1 MiB is read sequentially, GPFS does not keep the data in cache after consumption. In some instances, large files are reread by multiple processes, such as in data analytics. You can sometimes improve the performance of these applications by increasing the value of the seqDiscardThreshold parameter so that it is larger than the sets of files that must be cached. If the value of the seqDiscardThreshold parameter is increased, GPFS attempts to keep as much data in cache as possible for the files that are below the threshold.

The value of seqDiscardThreshold is a file size in bytes. The default is 1 MiB. Increase this value if you want to cache files that are sequentially read or written and are larger than 1 MiB. Ensure that there are enough buffer descriptors to cache the file data. For more information about buffer descriptors, see the maxBufferDescs parameter.
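For example, to keep sequentially accessed files of up to 16 MiB in cache (the value is in bytes and is illustrative):
mmchconfig seqDiscardThreshold=16777216 -i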

sharedTmpDir
Specifies a default global work directory where the mmapplypolicy command or the mmbackup command can store the temporary files that it generates during its processing. The command uses this directory when no global work directory was specified on the command line with the -g option. The directory must be in a file system that meets the requirements of the -g option. For more information, see mmapplypolicy command.
Note: The mmapplypolicy command or the mmbackup command uses this directory regardless of the format version of the target file system. That is, to take advantage of this attribute, you do not need to upgrade your file system to file system format version 5.0.1 or later (file system format number 19.01 or greater).
sidAutoMapRangeLength
Controls the length of the reserved range for Windows SID to UNIX ID mapping. See Identity management on Windows in the IBM Spectrum Scale: Administration Guide for additional information.
sidAutoMapRangeStart
Specifies the start of the reserved range for Windows SID to UNIX ID mapping. See Identity management on Windows in the IBM Spectrum Scale: Administration Guide for additional information.
subnets
Specifies subnets that are used to communicate between nodes in a GPFS cluster or a remote GPFS cluster.
The subnets option must use the following format:
subnets="Subnet[/ClusterName[;ClusterName...][ Subnet[/ClusterName[;ClusterName...]...]"
where:
Subnet
Is a subnet specification such as 192.168.2.0.
ClusterName
Can be either a cluster name or a shell-style regular expression, which is used to match cluster names, such as:
CL[23].kgn.ibm.com
Matches CL2.kgn.ibm.com and CL3.kgn.ibm.com.
CL[0-7].kgn.ibm.com
Matches CL0.kgn.ibm.com, CL1.kgn.ibm.com, ... CL7.kgn.ibm.com.
CL*.ibm.com
Matches any cluster name that starts with CL and ends with .ibm.com.
CL?.kgn.ibm.com
Matches any cluster name that starts with CL, is followed by any one character, and then ends with .kgn.ibm.com.

The order in which you specify the subnets determines the order in which GPFS uses these subnets to establish connections to the nodes within the cluster. GPFS follows the network settings of the operating system for a specified subnet address, including the network mask. For example, if you specify subnets="192.168.2.0" and a 23-bit mask is configured, then the subnet spans IP addresses 192.168.2.0 - 192.168.3.255. In contrast, with a 25-bit mask, the subnet spans IP addresses 192.168.2.0 - 192.168.2.127.
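For example, a sketch that prefers a private 192.168.2.0 network shared with two remote clusters, followed by a cluster-local 10.0.0.0 network:
mmchconfig subnets="192.168.2.0/CL2.kgn.ibm.com;CL3.kgn.ibm.com 10.0.0.0"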

GPFS does not impose limits on the number of bits in the subnet mask.

This feature cannot be used to establish fault tolerance or automatic failover. If the interface corresponding to an IP address in the list is down, GPFS does not use the next one on the list.

When you use subnets, both the interface corresponding to the daemon address and the interface that matches the subnet settings must be operational.

For more information about subnets, see Using remote access with multiple network definitions.

Specifying a cluster name or a cluster name pattern for each subnet is needed only when a private network is shared across clusters. If the use of a private network is confined within the local cluster, then you do not need to specify the cluster name in the subnet specification.

Limitation and fix: Although there is no upper limit to the number of subnets that can be specified in the subnets option, a limit does exist as to the number of subnets that are listed in the subnets option that a given node can be a part of. That limit is seven for nodes that do not have a fix that increases the limit and 64 for nodes that do have the fix. For example, the 7-subnet limit precludes the effective use of more than seven network interfaces on a node if each interface belongs to a distinct subnet that is listed in the subnets option.

The fix that increases the limit to 64 is included as part of the following APARs: IJ06771 for Version 4.1.1, IJ06770 for 4.2.3, and IJ06762 for 5.0.1.

If a node exceeds the limit, then some or all of its network interfaces that belong to the subnets in the subnets option might not be used in communicating with other nodes, with the primary GPFS daemon interface being used instead.

sudoUser={UserName | DELETE}
Specifies a non-root admin user ID to be used when sudo wrappers are enabled and a root-level background process calls an administration command directly instead of through sudo. The GPFS daemon that processes the administration command specifies this non-root user ID instead of the root ID when it needs to run internal commands on other nodes. For more information, see Root-level processes that call administration commands directly.
UserName
Enables this feature and specifies the non-root admin user ID.
DELETE
Disables this feature, as in the following example:
mmchconfig sudoUser=DELETE
syncBuffsPerIteration
This parameter is used to expedite buffer flush and the rename operations that are done by MapReduce jobs.

The default value is 100. Set it to 1 for GPFS FPO clusters that run Big Data applications; keep the default value in all other cases.

syncSambaMetadataOps
Enables or disables the syncing of metadata operations that are issued by the SMB server.

If set to yes, fsync() is used after each metadata operation to provide reasonable failover behavior on node failure. This ensures that the node taking over can see the metadata changes. Enabling syncSambaMetadataOps can affect performance due to more sync operations.

If set to no, the additional sync overhead is avoided at the potential risk of losing metadata updates after a failure.

systemLogLevel
Specifies the minimum severity level for messages that are sent to the system log. The severity levels from highest to lowest priority are: alert, critical, error, warning, notice, configuration, informational, detail, and debug. The value that is specified for this attribute can be any severity level, or the value none can be specified so no messages are sent to the system log. The default value is notice.

GPFS generates some critical log messages that are always sent to the system logging service. This attribute only affects messages originating in the GPFS daemon (mmfsd). Log messages originating in some administrative commands are only stored in the GPFS log file.

This attribute is only valid for Linux nodes.

tiebreakerDisks
Controls whether GPFS uses the node-quorum-with-tiebreaker algorithm in place of the regular node-based quorum algorithm. See the IBM Spectrum Scale: Concepts, Planning, and Installation Guide. To enable this feature, specify the names of 1 - 3 disks. Separate the NSD names with a semicolon (;) and enclose the list in quotes. The disks do not have to belong to any particular file system, but must be directly accessible from the quorum nodes. For example:
tiebreakerDisks="gpfs1nsd;gpfs2nsd;gpfs3nsd" 
To disable this feature, use:
tiebreakerDisks=no
When you change the tiebreaker disks, be aware of the following requirements:
  • In a CCR-based cluster, if the disks that are specified in the tiebreakerDisks parameter belong to any file system, the GPFS daemon does not need to be up on all of the nodes in the cluster. However, any file system that contains the disks must be available to run the mmlsfs command.
  • In a traditional server-based (non-CCR) configuration repository cluster, the GPFS daemon must be shut down on all the nodes of the cluster.
Note: When you add or delete a tiebreakerCheck event, IBM Spectrum Scale must be down on all the nodes of the cluster. For more information, see mmaddcallback command and mmdelcallback command.
tscCmdAllowRemoteConnections
Specifies whether the ts* commands in /usr/lpp/mmfs/bin (which are used by the mm* commands) are allowed to use the remote TCP/IP connections when communicating with the local or other mmfsd daemons.

Valid values are yes and no. For new clusters with a minimum release level of 5.1.3 or later, the default value of this parameter is no. Otherwise, the default value of this parameter is yes. The system administrator can change it to no when the minimum release level is changed to 5.1.3 or higher. The recommended value for the tscCmdAllowRemoteConnections parameter is no.

yes
Allows ts* commands to use the remote TCP/IP connections when communicating with the local or other mmfsd daemons.
no
Forces the ts* commands to communicate with the local mmfsd daemon only over a UNIX domain socket (UDS), and to communicate with mmfsd daemons running on other nodes over the RPC communication framework used by IBM Spectrum Scale.

The scope of the tscCmdAllowRemoteConnections parameter is local to a cluster and it has the same value on all the nodes in the cluster. However, when the parameter value is set to no, it can impact the communication between the ts* commands and the mmfsd daemons in remote clusters that service the requests. Set the tscCmdAllowRemoteConnections parameter value to yes in the home cluster if any remote cluster is running a version of IBM Spectrum Scale older than 5.1.3. This value is useful to avoid the Operation not permitted errors from mm* commands invoked on the remote cluster’s nodes that are redirected to mmfsd daemons running on nodes in the home cluster.

tscCmdPortRange=Min-Max
Specifies the range of port numbers to be used for extra TCP/IP ports that some administration commands need for their processing. Defining a port range makes it easier for you to set firewall rules that allow incoming traffic on only those ports. For more information, see IBM Spectrum Scale port usage.

If you used the spectrumscale installation toolkit to install a version of IBM Spectrum Scale that is earlier than version 5.0.0, then this attribute is initialized to 60000-61000. Otherwise, this attribute is initially undefined and the port numbers are dynamically assigned from the range of ephemeral ports that are provided by the operating system.
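For example, to pin these commands to a fixed range, such as the range that the installation toolkit used before version 5.0.0:
mmchconfig tscCmdPortRange=60000-61000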

uidDomain
Specifies the UID domain name for the cluster.

GPFS must be down on all the nodes when changing the uidDomain attribute.

See the IBM white paper UID Mapping for GPFS in a Multi-cluster Environment (https://www.ibm.com/docs/en/spectrum-scale?topic=STXKQY/uid_gpfs.pdf).

unmountOnDiskFail={yes | no | meta}
Controls how the GPFS daemon responds when it detects a disk failure:
yes
The GPFS daemon force-unmounts the file system that contains the failed disk. Other file systems on the local node and all the nodes in the cluster continue to function normally, unless they have a dependency on the failed disk.
Note: The local node can remount the file system when the disk problem is resolved.
Use this setting in the following situations:
  • The cluster contains SAN-attached disks in large multinode configurations and does not use replication.
  • A node in the cluster hosts descOnly disks. Other nodes in the cluster should not set this value. For more information, see Data Mirroring and Replication.
no
This setting is the default. The GPFS daemon marks the disk as failed, notifies all nodes that use the disk that it has failed, and continues to function without the failed disk as long as it can. If the number of failure groups with a failed disk is the same as or greater than the metadata replication factor, the daemon panics the file system. Note that the metadata replication factor that is used for this check is the current metadata replication setting or the actual replication factor in effect for log files. If all failure groups contain failed disks, then the daemon panics the file system instead of marking the disk down.
Note: When the disk problem is resolved, issue the mmchdisk <file system> start command to make the disk active again.

This setting is appropriate when the node is using metadata-and-data replication, because the cluster can work from the replica until the failed disk is active again.

meta
This setting has the same effect as no, except that the GPFS daemon panics the file system only if it cannot access any replica of the metadata. In addition, even if all failure groups contain failed data disks, the daemon still marks the data disks down instead of panicking the file system, as it does with the no option.

However, subsequent data flush attempts on failed data disks might still result in panicking the file system.

Note that the intention of this setting is to allow users to list all directories and read some files, even if some or all data disks are not available for read.

Important: Set the attribute to meta for FPO deployment or when the metadata replication factor and the data replication factor are greater than one.

The -N flag is valid for this attribute.

usePersistentReserve
Specifies whether to enable or disable Persistent Reserve (PR) on the disks. Valid values are yes or no (no is the default). GPFS must be stopped on all nodes when setting this attribute.
To enable PR and to obtain recovery performance improvements, your cluster requires a specific environment:
  • All disks must be PR-capable.
  • On AIX, all disks must be hdisks; on Linux, they must be generic (/dev/sd*) or DM-MP (/dev/dm-*) disks.
  • If the disks have defined NSD servers, all NSD server nodes must be running the same operating system (AIX or Linux).
  • If the disks are SAN-attached to all nodes, all nodes in the cluster must be running the same operating system (AIX or Linux).

For more information, see Reduced recovery time by using Persistent Reserve.
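
For example, after GPFS is stopped on all nodes, the following command enables Persistent Reserve for the cluster:
# mmchconfig usePersistentReserve=yes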

verbsGPUDirectStorage
Specifies whether to enable the GPUDirect Storage (GDS) feature. When GDS is enabled, file data can be fetched directly from an NSD server's page pool into the GPU buffer of IBM Spectrum Scale clients by RDMA to improve the read performance. The minimum release level of the cluster must be 5.1.2 or later to enable this feature.

Valid values are yes and no; the default value is no. For more information about GDS support and its software and hardware prerequisites, see GPUDirect Storage support for IBM Spectrum Scale.
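
For example, on a cluster whose minimum release level is 5.1.2 or later, the following command enables GDS:
# mmchconfig verbsGPUDirectStorage=yes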

verbsHungRDMATimeout
Specifies the number of seconds that IBM Spectrum Scale waits before waking up a thread that is waiting for a response to an RDMA request. The default value is 30 seconds. The valid range is 15 - 8640000 seconds. This feature cleans up long-waiting threads that wait for the completion of RDMA read, write, and send work requests that might never receive a response.

A sample RDMA read log entry is shown:

2020-07-28_19:06:49.972-0400: [I] RDMA hang break: wake up thread 28097 (waiting 33.82 sec for RDMA read on index 0 cookie 1)
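
For example, the following command raises the timeout to 60 seconds (the value shown is illustrative):
# mmchconfig verbsHungRDMATimeout=60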

verbsNumaAffinity
Specifies whether to set RDMA thread affinity to the CPUs of the NUMA node to which the RDMA device is attached or to the CPUs to which the mmfsd daemon has affinity. It also determines whether an NSD server does a NUMA-aware RDMA adapter selection in RDMA-based communication with NSD clients.

Valid values are enable and disable. When the value is enable, the RDMA thread affinity is set to the CPUs of the NUMA nodes to which the RDMA device is attached. In addition, the NSD server chooses the RDMA adapter in the same NUMA node as the server pagepool memory when performing RDMA-based communication with NSD clients. The default value is enable.

When the value is disable, the RDMA thread affinity is set to the CPUs to which the mmfsd daemon has affinity.

Refer to the following information before you enable verbsNumaAffinity in the cluster:
  • Enabling verbsNumaAffinity is suitable only for systems that have more than one NUMA node and RDMA adapters that are connected to two or more NUMA nodes.
  • Before using the verbsNumaAffinity parameter, ensure that the Linux numactl package is installed.
  • The verbsRdma option must be enabled and valid verbsPorts must be defined.
  • The verbsNumaAffinity option is supported only on Linux platforms where the minimum release level of the IBM Spectrum Scale cluster is 5.1.2 or later.

The -N flag is valid for this attribute.
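
For example, a command of the following form enables NUMA affinity on the NSD server nodes (the node class name nsdNodes is illustrative):
# mmchconfig verbsNumaAffinity=enable -N nsdNodes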

verbsPorts
Specifies the addresses for RDMA transfers between an NSD client and server, where an address can be either of the following identifiers:
  • InfiniBand device name, port number, and fabric number
  • Network interface name and fabric number
You must enable verbsRdma to enable verbsPorts. If you want to specify a network interface name and a fabric number, you must enable verbsRdmaCm.
The format for verbsPorts is as follows:
verbsPorts="{ibAddress | niAddress}[ {ibAddress | niAddress}...]"
where:
ibAddress
Is an InfiniBand address with the following format:
Device[/Port[/Fabric]]
where:
Device
Is the HCA device name.
Port
Is a one-based port number, such as 1 or 2. If you do not specify a port, the default value of 1 is used.
Fabric
Is the number of an InfiniBand (IB) fabric (IB subnet on a switch). If you do not specify a fabric number, the default value of 0 is used.
niAddress
Is a network interface address with the following format:
Interface[/Fabric]
where:
Interface
Is a network interface name.
Fabric
Is the number of an InfiniBand (IB) fabric (IB subnet on a switch). If you do not specify a fabric number, the default value of 0 is used.

For this attribute to take effect, you must restart the GPFS daemon on the nodes on which the value of verbsPorts changed.

The -N flag is valid for this attribute.

The following examples might be helpful:
  • The following assignment creates two RDMA connections between an NSD client and server that use both ports of a dual-ported adapter with fabric number 7 on port 1 and fabric number 8 on port 2:
    verbsPorts="mlx4_0/1/7 mlx4_0/2/8"
  • The following assignment, without the fabric number, creates two RDMA connections between an NSD client and server that use both ports of a dual-ported adapter with the fabric number defaulting to 0:
    verbsPorts="mthca0/1 mthca0/2"
  • The following assignment creates two RDMA connections between an NSD client and server that use network interface names. The first connection is network interface ib3 with a default fabric number of 0. The second connection is network interface ib2 with a fabric number of 7:
    verbsPorts="ib3 ib2/7"
  • The following assignment creates four RDMA connections that include both InfiniBand addresses and network interface addresses:
    verbsPorts="ib3 ib2/7 mlx4_0/1/7 mlx4_0/2/8"
verbsPortsWaitTimeout
Specifies the number of seconds that the GPFS daemon startup service waits for the RDMA ports on a node to become active. The default value is 60 seconds. If the timeout expires, the startup service takes the action that is specified by the attribute verbsRdmaFailBackTCPIfNotAvailable. This attribute applies only to the RDMA ports that are specified by the verbsPorts attribute. For more information, see Suboptimal performance due to VERBS RDMA being inactive.
Note: To monitor the state of the RDMA ports, the GPFS startup service requires the following commands to be installed on the node:
  • /usr/sbin/ibstat
  • /usr/bin/ibdev2netdev
  • /usr/bin/netstat

The -N flag is valid for this attribute. This attribute takes effect when IBM Spectrum Scale is restarted.
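
For example, the following command doubles the default wait time on all nodes (the value shown is illustrative):
# mmchconfig verbsPortsWaitTimeout=120 -N all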

verbsRdma
Enables or disables InfiniBand RDMA using the Verbs API for data transfers between an NSD client and NSD server. Valid values are enable or disable.

The -N flag is valid for this attribute.
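
For example, a command of the following form enables RDMA on a set of nodes (the node class name rdmaNodes is illustrative):
# mmchconfig verbsRdma=enable -N rdmaNodes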

verbsRdmaCm
Enables or disables the RDMA Connection Manager (RDMA CM or RDMA_CM) using the RDMA_CM API for establishing connections between an NSD client and NSD server. Valid values are enable or disable. You must enable verbsRdma to enable verbsRdmaCm.

If RDMA CM is enabled for a node, the node can establish RDMA connections only to other nodes that also have verbsRdmaCm enabled. RDMA CM enablement requires IPoIB (IP over InfiniBand) with an active IP address for each port. Although IPv6 must be enabled, the GPFS implementation of RDMA CM does not currently support IPv6 addresses, so an IPv4 address must be used.

If verbsRdmaCm is not enabled when verbsRdma is enabled, the older method of RDMA connection prevails.

The -N flag is valid for this attribute.
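
For example, on nodes where verbsRdma is already enabled, a command of the following form enables RDMA CM (the node class name rdmaNodes is illustrative):
# mmchconfig verbsRdmaCm=enable -N rdmaNodes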

verbsRdmaFailBackTCPIfNotAvailable={yes | no}
Specifies the action for the GPFS startup service to take if the timeout that is specified by the verbsPortsWaitTimeout attribute expires. This attribute applies only to the RDMA ports that are specified by the verbsPorts attribute.
Note: To monitor the state of the RDMA ports, the GPFS startup service requires the following commands to be installed on the node:
  • /usr/sbin/ibstat
  • /usr/bin/ibdev2netdev
  • /usr/bin/netstat
yes
The GPFS startup service configures communication for the GPFS daemon based on the number of active RDMA ports:
  • If some of the RDMA ports are active, the GPFS daemon is configured to use the active RDMA ports for RDMA transfers.
  • If none of the RDMA ports are active, the GPFS daemon is configured to use the TCP/IP connections of the node for RDMA transfers.
no
The startup service exits, even if some of the RDMA ports are active. Correct the problem and restart IBM Spectrum Scale by running the mmstartup command on the node.

The -N flag is valid for this attribute. This attribute takes effect when IBM Spectrum Scale is restarted.
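
For example, the following command configures the startup service on all nodes to fall back to TCP/IP when no RDMA port is active:
# mmchconfig verbsRdmaFailBackTCPIfNotAvailable=yes -N all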

verbsRdmaPkey
Specifies an InfiniBand partition key for a connection between the specified node and an InfiniBand server that is included in an InfiniBand partition. This parameter is valid only if verbsRdmaCm is set to disable.

Only one partition key is supported per IBM Spectrum Scale cluster.

The -N flag is valid for this attribute.
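
For example, a command of the following form sets a partition key on a set of nodes (both the key value 0x8002 and the node class name ibNodes are illustrative):
# mmchconfig verbsRdmaPkey=0x8002 -N ibNodes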

verbsRdmaRoCEToS
Specifies the Type of Service (ToS) value for clusters using RDMA over Converged Ethernet (RoCE). Acceptable values for this parameter are 0, 8, 16, and 24. The default value is -1.

If the user-specified value is neither the default nor an acceptable value, the script exits with an error message to indicate that no change was made. However, a RoCE cluster continues to operate with an internally set ToS value of 0 even if the mmchconfig command fails. Different ToS values can be set for different nodes or groups of nodes.

The -N flag is valid for this attribute.
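
For example, a command of the following form sets a ToS value of 24 for a group of RoCE nodes (the node class name roceNodes is illustrative):
# mmchconfig verbsRdmaRoCEToS=24 -N roceNodes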

The verbsPorts parameter can use an IP netmask or subnet to specify network interfaces to use for RDMA CM. However, this format is allowed only when verbsRdmaCm is enabled; otherwise these entries are ignored. In general, this format allows the use of VLANs and multiple IP interfaces per IB device.

verbsRdmaSend
Enables or disables the use of InfiniBand RDMA send and receive rather than TCP for most GPFS daemon-to-daemon communication. Valid values are yes or no. The default value is no. The verbsRdma option must be enabled and valid verbsPorts must be defined before verbsRdmaSend can be enabled.

When the attribute is set to no, only data transfers between an NSD server and an NSD client are eligible for RDMA. When the attribute is set to yes, the GPFS daemon uses InfiniBand RDMA connections for daemon-to-daemon communications only with nodes that are at IBM Spectrum® Scale 5.0.0 or later.

If verbsRdmaSend is enabled, then the value set for the nsdInlineWriteMax parameter is ignored. In such cases, you can set the maximum transaction size that can be sent as embedded data in an NSD-write RPC by using the verbsRecvBufferSize parameter.

The -N flag is valid for this attribute.
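
For example, on nodes where verbsRdma is enabled and verbsPorts is defined, the following command enables RDMA send and receive for daemon-to-daemon communication:
# mmchconfig verbsRdmaSend=yes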

verbsRecvBufferCount
Defines the number of RDMA recv buffers created for each RDMA connection that is enabled for RDMA send when verbsRdmaSend is enabled. The default value is 128.

The -N flag is valid for this attribute.

verbsRecvBufferSize
Defines the size, in bytes, of the RDMA send and recv buffers that are used for RDMA connections that are enabled for RDMA send when verbsRdmaSend is enabled. This parameter also specifies the maximum transaction size that can be sent as embedded data in an NSD-write RPC. The default value is 4096.

The -N flag is valid for this attribute.
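
For example, a command of the following form raises both buffer settings on a set of nodes (the values and the node class name nsdNodes are illustrative):
# mmchconfig verbsRecvBufferCount=256,verbsRecvBufferSize=8192 -N nsdNodes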

workerThreads
Controls an integrated group of variables that tune file system performance. Use this variable to tune file systems in environments with high sequential or random read/write workloads or small-file activity. For new installations of the product, this variable is preferred over worker1Threads and prefetchThreads.

The default value is 48. If protocols are installed, then the default value is 512. The valid range is 1 - 8192. However, the maximum value of workerThreads plus prefetchThreads plus nsdMaxWorkerThreads is 8192. The -N flag is valid with this variable.

This variable controls both internal and external variables. The internal variables include maximum settings for concurrent file operations, for concurrent threads that flush dirty data and metadata, and for concurrent threads that prefetch data and metadata. You can further adjust the external variables with the mmchconfig command:
  • logBufferCount
  • prefetchThreads
  • worker3Threads
The prefetchThreads parameter is described in this help topic. See the Tuning Parameters article in the IBM Spectrum Scale wiki in developerWorks® for descriptions of the logBufferCount and worker3Threads parameters.
Important: After you set workerThreads to a non-default value, avoid setting worker1Threads. If you do, at first only worker1Threads is changed. But when IBM Spectrum Scale is restarted, all corresponding variables are automatically tuned according to the value of worker1Threads, instead of workerThreads.
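
For example, the following command sets workerThreads for the entire cluster (the value shown is illustrative):
# mmchconfig workerThreads=512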
worker1Threads
For some categories of file I/O, this variable controls the maximum number of concurrent file I/O operations. You can increase this value to increase the I/O performance of the file system. However, increasing this variable beyond some point might begin to degrade file system performance.
Important: After you set workerThreads to a non-default value, avoid setting worker1Threads. If you do, at first only worker1Threads is changed. But when IBM Spectrum Scale is restarted, all corresponding variables are automatically tuned according to the value of worker1Threads, instead of workerThreads.

This attribute primarily affects random read or write requests that cannot be prefetched, and small-file activity. The default value is 48. The minimum value is 1. The maximum value of prefetchThreads plus worker1Threads plus nsdMaxWorkerThreads is 8192 on all 64-bit platforms.

The -N flag is valid for this attribute.

writebehindThreshold
The writebehindThreshold parameter specifies the point at which GPFS starts flushing new data out of the page pool for a file that is being written sequentially. Until the file size reaches this threshold, no write-behind is started, even for blocks that are already full. The default value of this parameter is 512 KiB.

Increasing this value defers write-behind for new, larger files, which can be useful when a workload creates temporary files that are smaller than the value of writebehindThreshold and that are deleted before they are flushed from cache. However, if the value is too large, there might be too many dirty buffers for the sync thread to flush at the next sync interval, causing a surge in disk I/O. Keeping the value small ensures a smooth flow of dirty data to disk.
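
For example, the following command doubles the threshold and makes the change take effect immediately without persisting across a restart (the value, given in bytes, is illustrative):
# mmchconfig writebehindThreshold=1048576 -I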

Note: If you set new values for afmParallelReadChunkSize, afmParallelReadThreshold, afmParallelWriteChunkSize, and afmParallelWriteThreshold, you do not need to relink filesets for the new values to take effect.

Exit status

0
Successful completion.
nonzero
A failure has occurred.

Security

You must have root authority to run the mmchconfig command.

The node on which the command is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more information, see Requirements for administering a GPFS file system.

Examples

  1. To change the maximum file system block size that is allowed to 8 MiB, issue the following command:
    # mmchconfig maxblocksize=8M
    A sample output is as follows:
    Verifying GPFS is stopped on all nodes ...
    mmchconfig: Command successfully completed
    mmchconfig: Propagating the cluster configuration data to all
      affected nodes.  This is an asynchronous process.
  2. The following example shows the use of the DELETE value.
    A sample output of the mmlsconfig pagepool command appears as shown.
    [root@node-11 ~]# mmlsconfig pagepool
    pagepool 3G  
    pagepool 4G [node-11d]
    pagepool 5G [node-12d]
    pagepool 6G [node-13d]
    pagepool 7G [linuxNodes]
    

    The nodes node-11, node-12, and node-13 are Linux nodes, and the pagepool=7G configuration applies to all of them.

    To remove the pagepool=7G configuration for Linux nodes, issue the following command.
    # mmchconfig pagepool=DELETE -N linuxNodes
    The sample output appears as shown:
    mmchconfig: Command successfully completed
    mmchconfig: Propagating the cluster configuration data to all
      affected nodes.  This is an asynchronous process.
    To confirm the change, issue the following command:
    [root@node-11 ~]# mmlsconfig pagepool
    The output appears as shown.
    
    pagepool 3G  
    pagepool 4G [node-11d]
    pagepool 5G [node-12d]
    pagepool 6G [node-13d]

    The configuration change in this example takes effect when the GPFS daemon is restarted, even if the command was issued with an immediate option. The DELETE operation ignores the immediate options -i and -I when they are issued together with the -N option.


See also

Location

/usr/lpp/mmfs/bin