gnrhealthcheck script

Checks the general health of an ESS configuration.

Synopsis


gnrhealthcheck [--topology] [--enclosure] [--rg] [--pdisk] [--perf-dd]
               [--ipr] [--nvme-ctrl] [--ssd-endurance-percentage] [--local]

Availability

Available on all IBM Storage Scale editions.

Description

The gnrhealthcheck script checks the general health of an ESS configuration.

Parameters

--topology
Checks the operating system topology. Runs mmgetpdisktopology and topsummary to look for cabling and path issues.
--enclosure
Checks enclosures. Runs mmlsenclosure to look for failures.
--rg
Checks recovery groups. Runs mmlsrecoverygroup to check whether all recovery groups are active and whether the active server is the primary server. Also checks for any recovery groups that need service.
--pdisk
Checks pdisks. Runs mmlspdisk to check that each pdisk has two paths.
--perf-dd
Checks basic disk performance. Runs a dd read of 1 GB from each potential IBM Storage Scale RAID disk drive and reports basic performance statistics. Reads are done six disks at a time. These statistics are meaningful only if the system is idle when the check runs. Available on Linux® only.
--ipr
Checks IBM® Power® RAID array status. Runs iprconfig to check whether the local RAID adapter is running in "Optimized" or "Degraded" mode. The ESS NVR pdisks are created on a RAID 10 array on this adapter. If one of the drives has failed, performance is affected and the drive should be replaced.
--nvme-ctrl
Checks NVMe controllers. Runs mmlsnvmestatus to find the status of the NVMe controllers.
--ssd-endurance-percentage
Checks SSD endurance. Runs mmlspdisk to check for SSDs that have used more than 90% of their rated endurance.
--local
Runs tests only on the invoking node.

By default, the script runs all checks except --perf-dd, on all NSD server nodes.
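
The --perf-dd measurement is essentially a timed sequential dd read per drive. The following sketch shows the idea only; a scratch file stands in for a real drive device so the sketch can run without raw-device access, whereas the real check reads 1 GB from each drive, six drives at a time.

```shell
# Rough sketch of the kind of sequential read that --perf-dd times per
# drive. /tmp/fake_drive.img is a scratch file standing in for a real
# drive device; the real check reads 1 GB per drive, six at a time.
disk=/tmp/fake_drive.img
dd if=/dev/zero of="$disk" bs=1M count=8 2>/dev/null    # stand-in "drive"
# Time a sequential read and keep dd's summary line (throughput figures):
dd if="$disk" of=/dev/null bs=1M 2>&1 | tail -n 1 > /tmp/perfdd.stats
cat /tmp/perfdd.stats
```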

Exit status

0
No problems were found.
1
Problems were found and information was displayed.
Note: Output is written to standard output by default. The amount of output can be large, so it is recommended that you pipe it to a file.
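
A caller can branch on the exit status directly. A minimal sketch follows; the stub function stands in for the real gnrhealthcheck script so the example is self-contained.

```shell
# Hedged sketch of acting on the documented exit status (0 = clean,
# 1 = problems found). The stub function below stands in for the real
# gnrhealthcheck script so the example is runnable anywhere.
gnrhealthcheck() { echo "Topology checks successful."; return 0; }   # stub

if gnrhealthcheck --local > /tmp/gnrhealthcheck.out 2>&1; then
    echo "healthcheck: no problems found"
else
    echo "healthcheck: problems found, review /tmp/gnrhealthcheck.out"
fi
```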

Security

You must have root authority to run the gnrhealthcheck script.

The node on which the script is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more details, see the following IBM Storage Scale RAID: Administration topic: Requirements for administering IBM Storage Scale RAID.
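
The passwordless-remote-shell requirement can be spot-checked with a trivial remote command. A hedged sketch, where "peernode" is a placeholder host name and a real check would loop over every node in the cluster:

```shell
# Hedged spot-check of the passwordless-remote-shell requirement.
# "peernode" is a placeholder host name. BatchMode forbids password
# prompts, so a node that would prompt reports FAILED instead of hanging.
nodes="peernode"
: > /tmp/rsh_check.out
for node in $nodes; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
        echo "$node: remote shell OK" | tee -a /tmp/rsh_check.out
    else
        echo "$node: remote shell FAILED" | tee -a /tmp/rsh_check.out
    fi
done
```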

Examples

  1. In this example, all checks are successful.
    To run a health check on the local server nodes and place output in /tmp/gnrhealthcheck.out, issue the following command:
       gnrhealthcheck --local | tee /tmp/gnrhealthcheck.out
    The system displays information similar to this:
    ################################################################
    # Beginning topology checks.
    ################################################################
    Topology checks successful.
    
    ################################################################
    # Beginning enclosure checks.
    ################################################################
    Enclosure checks successful.
    
    ################################################################
    # Beginning recovery group checks.
    ################################################################
    Recovery group checks successful.
    
    ################################################################
    # Beginning pdisk checks.
    ################################################################
    Pdisk group checks successful.
    
    ################################################################
    # Beginning IBM Power RAID checks.
    ################################################################
    IBM Power RAID checks successful.
    
    ################################################################
    # Beginning the NVMe Controller checks.
    ################################################################
    The NVMe Controller checks are successful.
    
    ################################################################
    # Beginning SSD endurance checks.
    ################################################################
    The SSD endurance checks are successful.
  2. In this example, several issues need to be investigated.
    To run a health check on the local server nodes and place output in /tmp/gnrhealthcheck.out, issue the following command:
    gnrhealthcheck --local | tee /tmp/gnrhealthcheck.out
    The system displays information similar to this:
    
    ################################################################
    # Beginning topology checks.
    ################################################################
    Found topology problems on node c45f01n01-ib0.gpfs.net
    
    DCS3700 enclosures found: 0123456789AB SV11812206 SV12616296 SV13306129
    Enclosure 0123456789AB (number 1):
    Enclosure 0123456789AB ESM A sg244[0379][scsi8 port 4] ESM B sg4[0379][scsi7 port 4]
    Enclosure 0123456789AB Drawer 1 ESM sg244 12 disks diskset "19968" ESM sg4 12 disks diskset "19968"
    Enclosure 0123456789AB Drawer 2 ESM sg244 12 disks diskset "11294" ESM sg4 12 disks diskset "11294"
    Enclosure 0123456789AB Drawer 3 ESM sg244 12 disks diskset "60155" ESM sg4 12 disks diskset "60155"
    Enclosure 0123456789AB Drawer 4 ESM sg244 12 disks diskset "03345" ESM sg4 12 disks diskset "03345"
    Enclosure 0123456789AB Drawer 5 ESM sg244 11 disks diskset "33625" ESM sg4 11 disks diskset "33625"
    Enclosure 0123456789AB sees 59 disks
    
    Enclosure SV12616296 (number 2):
    Enclosure SV12616296 ESM A sg63[0379][scsi7 port 3] ESM B sg3[0379][scsi9 port 4]
    Enclosure SV12616296 Drawer 1 ESM sg63 11 disks diskset "51519" ESM sg3 11 disks diskset "51519"
    Enclosure SV12616296 Drawer 2 ESM sg63 12 disks diskset "36246" ESM sg3 12 disks diskset "36246"
    Enclosure SV12616296 Drawer 3 ESM sg63 12 disks diskset "53750" ESM sg3 12 disks diskset "53750"
    Enclosure SV12616296 Drawer 4 ESM sg63 12 disks diskset "07471" ESM sg3 12 disks diskset "07471"
    Enclosure SV12616296 Drawer 5 ESM sg63 11 disks diskset "16033" ESM sg3 11 disks diskset "16033"
    Enclosure SV12616296 sees 58 disks
    
    Enclosure SV11812206 (number 3):
    Enclosure SV11812206 ESM A sg66[0379][scsi9 port 3] ESM B sg6[0379][scsi8 port 3]
    Enclosure SV11812206 Drawer 1 ESM sg66 11 disks diskset "23334" ESM sg6 11 disks diskset "23334"
    Enclosure SV11812206 Drawer 2 ESM sg66 12 disks diskset "16332" ESM sg6 12 disks diskset "16332"
    Enclosure SV11812206 Drawer 3 ESM sg66 12 disks diskset "52806" ESM sg6 12 disks diskset "52806"
    Enclosure SV11812206 Drawer 4 ESM sg66 12 disks diskset "28492" ESM sg6 12 disks diskset "28492"
    Enclosure SV11812206 Drawer 5 ESM sg66 11 disks diskset "24964" ESM sg6 11 disks diskset "24964"
    Enclosure SV11812206 sees 58 disks
    
    Enclosure SV13306129 (number 4):
    Enclosure SV13306129 ESM A sg64[0379][scsi8 port 2] ESM B sg353[0379][scsi7 port 2]
    Enclosure SV13306129 Drawer 1 ESM sg64 11 disks diskset "47887" ESM sg353 11 disks diskset "47887"
    Enclosure SV13306129 Drawer 2 ESM sg64 12 disks diskset "53906" ESM sg353 12 disks diskset "53906"
    Enclosure SV13306129 Drawer 3 ESM sg64 12 disks diskset "35322" ESM sg353 12 disks diskset "35322"
    Enclosure SV13306129 Drawer 4 ESM sg64 12 disks diskset "37055" ESM sg353 12 disks diskset "37055"
    Enclosure SV13306129 Drawer 5 ESM sg64 11 disks diskset "16025" ESM sg353 11 disks diskset "16025"
    Enclosure SV13306129 sees 58 disks
    
    DCS3700 configuration: 4 enclosures, 1 SSD, 7 empty slots, 233 disks total
    Location 0123456789AB-5-12 appears empty but should have an SSD
    Location SV12616296-1-3 appears empty but should have an SSD
    Location SV12616296-5-12 appears empty but should have an SSD
    Location SV11812206-1-3 appears empty but should have an SSD
    Location SV11812206-5-12 appears empty but should have an SSD
    
    scsi7[07.00.00.00] 0000:11:00.0 [P2 SV13306129 ESM B (sg353)] [P3 SV12616296 ESM A (sg63)] [P4 0123456789AB ESM B (sg4)]
    scsi8[07.00.00.00] 0000:8b:00.0 [P2 SV13306129 ESM A (sg64)] [P3 SV11812206 ESM B (sg6)] [P4 0123456789AB ESM A (sg244)]
    scsi9[07.00.00.00] 0000:90:00.0 [P3 SV11812206 ESM A (sg66)] [P4 SV12616296 ESM B (sg3)]
    
    ################################################################
    # Beginning enclosure checks.
    ################################################################
    Enclosure checks successful.
    
    ################################################################
    # Beginning recovery group checks.
    ################################################################
    Found recovery group BB1RGR, primary server is not the active server.
    
    ################################################################
    # Beginning pdisk checks.
    ################################################################
    Found recovery group BB1RGL pdisk e4d5s06 has 0 paths.
    
    ################################################################
    # Beginning IBM Power RAID checks.
    ################################################################
    IBM Power RAID Array is running in degraded mode.  
    
    Name   PCI/SCSI Location          Description               Status 
    ------ -------------------------  ------------------------- -----------------        
           0007:90:00.0/0:            PCI-E SAS RAID Adapter    Operational        
           0007:90:00.0/0:0:1:0       Advanced Function Disk    Failed        
           0007:90:00.0/0:0:2:0       Advanced Function Disk    Active sda    
           0007:90:00.0/0:2:0:0       RAID 10  Disk Array       Degraded        
           0007:90:00.0/0:0:0:0       RAID 10  Array Member     Active        
           0007:90:00.0/0:0:3:0       RAID 10  Array Member     Failed        
           0007:90:00.0/0:0:4:0       Enclosure                 Active        
           0007:90:00.0/0:0:6:0       Enclosure                 Active        
           0007:90:00.0/0:0:7:0       Enclosure                 Active
    
    ################################################################
    # Beginning the NVMe Controller checks.
    ################################################################
    The NVMe Controller checks are successful.
    
    ################################################################
    # Beginning SSD endurance checks.
    ################################################################
    The SSD endurance checks are successful.
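
When the saved report is large, a quick keyword filter can surface the problem lines shown in example 2. A hedged sketch follows; the here-document stands in for a real saved report, and the keyword list is an assumption for illustration, not something the script itself defines.

```shell
# Hedged post-processing sketch: filter a saved report for the kinds of
# problem indicators shown in example 2. The sample lines below stand in
# for a real /tmp/gnrhealthcheck.out.
cat > /tmp/gnrhealthcheck.out <<'EOF'
Enclosure checks successful.
Found recovery group BB1RGL pdisk e4d5s06 has 0 paths.
IBM Power RAID Array is running in degraded mode.
EOF
grep -iE 'found|failed|degraded' /tmp/gnrhealthcheck.out > /tmp/gnr_problems.out
cat /tmp/gnr_problems.out
```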

See also

See also the following Elastic Storage Server: Problem Determination Guide topic:
  • Checking the health of an ESS configuration: a sample scenario

Location

/usr/lpp/mmfs/samples/vdisk