gpfs.snap command
Creates an informational system snapshot at a single point in time. This system snapshot consists of information such as cluster configuration, disk configuration, network configuration, network status, GPFS logs, dumps, and traces.
Synopsis
gpfs.snap [-d OutputDirectory] [-m | -z]
[-a | -N {Node[,Node...] | NodeFile | NodeClass}]
[--check-space | --no-check-space | --check-space-only]
[--cloud-gateway {NONE | BASIC | FULL}] [--full-collection] [--deadlock [--quick] |
--limit-large-files {YYYY:MM:DD:HH:MM | NumberOfDaysBack | latest}]
[--component-only Component | --hadoop]
[--exclude-aix-disk-attr] [--exclude-aix-lvm] [--exclude-merge-logs]
[--exclude-net] [--gather-logs] [--mmdf] [--performance] [--prefix]
[--protocol ProtocolType[,ProtocolType,...]] [--timeout Seconds]
[--purge-files KeepNumberOfDaysBack] [--qos {NONE | FULL}]
[--pmr {xxxxx.yyy.zzz | TSxxxxxxxxx}] [--sosreport] [--vmcore]
[--vmcore-path Path] [--upload-snap]
Availability
Available on all IBM Storage Scale editions.
Description
Use the gpfs.snap command as the main tool to gather data when a GPFS problem is encountered, such as a hung file system, a hung GPFS command, or a daemon assert.
The gpfs.snap command gathers information (for example, GPFS internal dumps, traces, and kernel thread dumps) to solve a GPFS problem.
- By default, large debug files are gathered as a delta collection, which means that they are collected only when new files exist since the previous run of gpfs.snap. To override this default behavior, use either the --limit-large-files or the --full-collection option, as shown in the sketch after these notes.
- This utility program is a service tool and options might change dynamically. The tool impacts performance and occupies disk space when it runs.
- If the IBM Storage Scale native REST API feature is enabled, the administration daemon runs as a non-root service on all nodes. When you run the gpfs.snap command with the native REST API protocols, and the snap collection needs to place files into a directory specified by the -d option, ensure that one of the following conditions is met:
- The directory permissions are open to allow write access.
- The directory explicitly allows access for the scaleapiadmd user ID.
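The following invocations are illustrative sketches only (the output directory and the number of days back are hypothetical values chosen for the sketch); they show how the delta-collection default can be overridden:
gpfs.snap --full-collection
gpfs.snap --limit-large-files 2 -d /tmp/snap_outdir
The first command collects all large debug files instead of only the files that are new since the previous run; the second limits the large debug files that are collected based on a number of days back (2 in this sketch) and writes the snapshot to the specified directory.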
Parameters
- -d OutputDirectory
- Specifies the output directory where the snapshot information is stored. You cannot specify a directory that is located in a GPFS file system that is managed by the same cluster that you are running the gpfs.snap command against. The default output directory is /tmp/gpfs.snapOut.
- -m
- Specifying this option is equivalent to specifying --exclude-merge-logs with -N.
- -z
- Collects gpfs.snap data only from the node on which the command is invoked. No master data is collected.
- -a
- Directs gpfs.snap to collect data from all nodes in the cluster. This value is the default.
- -N {Node[,Node ...] | NodeFile | NodeClass}
- Specifies the nodes from which to collect gpfs.snap data. This option supports all defined node classes. For more information about how to specify node names, see Specifying nodes as input to GPFS commands.
- --check-space
- Specifies that space checking is performed before data is collected.
- --no-check-space
- Specifies that no space checking is performed. This value is the default.
- --check-space-only
- Specifies that only space checking is performed. No data is collected.
- --cloud-gateway {NONE | BASIC | FULL}
- When this option is set to NONE, no transparent cloud tiering data is collected. With the BASIC option, when the transparent cloud tiering service is enabled, the snap collects information such as logs, traces, and Java™ cores, along with minimal system and IBM Storage Scale cluster information that is specific to transparent cloud tiering. No customer-sensitive information is collected. Note: The default behavior of the gpfs.snap command includes basic transparent cloud tiering information in addition to the GPFS information.
- --full-collection
- Specifies that all large debug files are collected instead of the default behavior that collects only new files since the previous run of gpfs.snap.
- --deadlock
- Collects only the minimum amount of data necessary to debug a deadlock problem. Part of the data that is collected is the output of the mmfsadm dump all command. This option ignores all other options except for -a, -N, -d, and --prefix.
- --quick
- Collects less data when specified along with the --deadlock option. The output includes mmfsadm dump most, mmfsadm dump kthreads, and 10 seconds of trace in addition to the usual gpfs.snap output.
- --limit-large-files {YYYY:MM:DD:HH:MM | NumberOfDaysBack | latest}
- Specifies a time limit to reduce the number of large files collected.
- --component-only Component
- Allows you to select a single IBM Storage Scale component for which data is collected. This option omits the collection of the common data, so it is useful only when the common data was already collected and the root cause was identified to be in the selected component. This option results in much smaller output and much faster snap generation. Note: In IBM Storage Scale release 5.1.3, only the Hadoop component is supported; it specifies that only Hadoop data is to be collected.
- --exclude-aix-disk-attr
- Specifies that data about AIX® disk attributes is not collected. Collecting data about AIX disk attributes on an AIX node that has a large number of disks might be very time-consuming, so using this option might help improve performance.
- --exclude-aix-lvm
- Specifies that data about the AIX Logical Volume Manager (LVM) is not collected.
- --exclude-merge-logs
- Specifies that merge logs and waiters are not collected.
- --exclude-net
- Specifies that network-related information is not collected.
- --gather-logs
- Gathers, merges, and chronologically sorts all of the mmfs.log files. The results are stored in the directory that is specified with the -d option.
- --mmdf
- Specifies that mmdf output is collected.
- --performance
- Specifies that performance data is to be gathered. It is recommended to issue the gpfs.snap command with the -a option on all nodes or on the pmcollector node that has an ACTIVE THRESHOLD MONITOR role. Starting from IBM Storage Scale 5.1.0, the performance data package also includes the performance monitoring report from the top metric for the last 24 hours. Note: The performance script can take up to 30 minutes to run. Therefore, the script is not included when all other types of protocol information are gathered by default. Specifying this option is the only way to turn on the gathering of performance data.
- --prefix
- Specifies that the prefix name gpfs.snap is added to the tar file.
- --protocol ProtocolType[,ProtocolType,...]
- Specifies the type or types of protocol information to be gathered. By default, whenever any
protocol is enabled on a file system, information is gathered for all types of protocol information
(except for performance data; see the --performance option). However, when
the --protocol option is specified, the automatic gathering of all
protocol information is turned off, and only the specified type of protocol information is gathered.
The following values for ProtocolType are accepted:
- smb
- nfs
- s3
- authentication
- ces
- core
- none
- --timeout Seconds
- Specifies the timeout value, in seconds, for all commands.
- --purge-files KeepNumberOfDaysBack
- Specifies that large debug files are deleted from the cluster nodes based on the KeepNumberOfDaysBack value. If 0 is specified, then all of the large debug files are deleted. If a value greater than 0 is specified, then large debug files that are older than the number of days specified are deleted. For example, if the value 2 is specified, then the previous two days of large debug files are retained.
- --qos {NONE | FULL}
- If the optional --qos option is not specified with the command, the default behavior is a FULL collection of QoS FFDC data.
- --hadoop
- Specifies that Hadoop data is to be gathered.
- --pmr {xxxxx.yyy.zzz | TSxxxxxxxxx}
- Specifies either the dot-delimited PMR descriptor, where x, y, and z are digits and y might additionally be a letter, or a Salesforce case descriptor, where each x is a digit.
- --sosreport
- Specifies the option to run the sos report utility for collecting system information.
- --vmcore
- Specifies the option to collect vmcore data.
- --vmcore-path Path
- Specifies the local file system path where vmcore is saved. The default is /var/crash.
- --upload-snap
- After completion of a successful snap, uploads the snap automatically through call home. Use the mmcallhome status list command to check the upload progress.
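The following one-line invocations are illustrative sketches only (the protocol list, the timeout value, and the number of days back are hypothetical values chosen for the sketch); each combines parameters that are described above:
gpfs.snap --check-space-only
gpfs.snap --protocol smb,nfs --timeout 300
gpfs.snap --purge-files 2
The first command performs only space checking and collects no data, the second gathers only SMB and NFS protocol information with a 300-second command timeout, and the third deletes large debug files that are older than two days from the cluster nodes.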
Use the -z option to generate a non-master snapshot. This option is useful if there are many nodes on which to take a snapshot, and only one master snapshot is needed. For a GPFS problem within a large cluster (hundreds or thousands of nodes), one strategy might call for a single master snapshot (one invocation of gpfs.snap with no options), and multiple non-master snapshots (multiple invocations of gpfs.snap with the -z option).
Use the -N option to obtain gpfs.snap data from multiple nodes in the cluster. When the -N option is used, the gpfs.snap command takes non-master snapshots of all the nodes that are specified with this option and a master snapshot of the node on which it was issued.
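As an illustrative sketch of that strategy, the master snapshot and the non-master snapshots are separate invocations:
gpfs.snap
gpfs.snap -z
The first command, run once on one node with no options, produces the master snapshot; the second command, run on each additional node of interest, produces a non-master snapshot for that node only.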
Exit status
- 0
- Successful completion.
- nonzero
- A failure has occurred.
Security
You must have root authority to run the gpfs.snap command.
The node on which the command is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more information, see Requirements for administering a GPFS file system.
Examples
- To collect gpfs.snap on all nodes with the default data, issue the following
command:
c34f2n03:# gpfs.snap
gpfs.snap started at Fri Mar 22 13:16:12 EDT 2019.
Gathering common data...
Gathering Linux specific data...
Gathering extended network data...
Gathering local callhome data...
Gathering local sysmon data...
Gathering trace reports and internal dumps...
Gathering Transparent Cloud Tiering data at level BASIC...
gpfs.snap: The Transparent Cloud Tiering snap file was not located on the node c34f2n03.gpfs.net
Gathering cluster wide sysmon data...
Gathering cluster wide callhome data...
gpfs.snap: Spawning remote gpfs.snap calls. Master is c34f2n03. This may take a while.
Copying file /tmp/gpfs.snapOut/27348/gpfs.snap.c13c1apv7_0322131503.out.tar.gz from c13c1apv7.gpfs.net ...
gpfs.snap.c13c1apv7_0322131503.out.tar.gz              100% 7732KB   7.6MB/s   00:00
Successfully copied file /tmp/gpfs.snapOut/27348/gpfs.snap.c13c1apv7_0322131503.out.tar.gz from c13c1apv7.gpfs.net.
Gathering cluster wide protocol data...
Gathering waiters from all nodes...
Gathering mmfs.logs from all nodes. This may take a while...
All data has been collected. Processing collected data. This may take a while...
Packaging master node data...
Writing * to file /tmp/gpfs.snapOut/27348/collect/gpfs.snap.c34f2n03_master_0322131612.out.tar.gz
Packaging all data.
Writing . to file /tmp/gpfs.snapOut/27348/all.0322131612.tar
gpfs.snap completed at Fri Mar 22 13:17:55 EDT 2019
###############################################################################
Send file /tmp/gpfs.snapOut/27348/all.0322131612.tar to IBM Service
Examine previous messages to determine additional required data.
###############################################################################
After this command has completed, send the tar file that is named in the final "Send file" message (/tmp/gpfs.snapOut/27348/all.0322131612.tar in this example) to IBM® service.
- The following example collects gpfs.snap data on specific nodes and provides
an output directory:
./gpfs.snap -N c34f2n03.gpfs.net,c13c1apv7.gpfs.net -d /tmp/snap_outdir
gpfs.snap started at Fri Mar 22 15:54:35 EDT 2019.
Gathering common data...
Gathering Linux specific data...
Gathering extended network data...
Gathering local callhome data...
Gathering local sysmon data...
Gathering trace reports and internal dumps...
Gathering Transparent Cloud Tiering data at level BASIC...
gpfs.snap: The Transparent Cloud Tiering snap file was not located on the node c34f2n03.gpfs.net
Gathering cluster wide sysmon data...
Gathering cluster wide callhome data...
gpfs.snap: Spawning remote gpfs.snap calls. Master is c34f2n03. This may take a while.
Copying file /tmp/snap_outdir/gpfs.snap.c13c1apv7_0322155324.out.tar.gz from c13c1apv7.gpfs.net ...
gpfs.snap.c13c1apv7_0322155324.out.tar.gz              100% 7720KB   7.5MB/s   00:00
Successfully copied file /tmp/snap_outdir/gpfs.snap.c13c1apv7_0322155324.out.tar.gz from c13c1apv7.gpfs.net.
Gathering cluster wide protocol data...
Gathering waiters from all nodes...
Gathering mmfs.logs from all nodes. This may take a while...
All data has been collected. Processing collected data. This may take a while...
Packaging master node data...
Writing * to file /tmp/snap_outdir/collect/gpfs.snap.c34f2n03_master_0322155435.out.tar.gz
Packaging all data.
Writing . to file /tmp/snap_outdir/all.0322155435.tar
gpfs.snap completed at Fri Mar 22 15:56:11 EDT 2019
###############################################################################
Send file /tmp/snap_outdir/all.0322155435.tar to IBM Service
Examine previous messages to determine additional required data.
###############################################################################
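- The following invocation is an illustrative sketch only (output omitted; it reuses the node names and output directory from the previous example). It collects just the minimum data that is needed to debug a deadlock from two specific nodes:
./gpfs.snap --deadlock -N c34f2n03.gpfs.net,c13c1apv7.gpfs.net -d /tmp/snap_outdir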
Location
/usr/lpp/mmfs/bin