Health checking provides methods to check for errors and
the overall health of the fabric.
Before setting up health checking, obtain the Fast Fabric
Toolset Users Guide for reference.
Health checking is done at several different times, and the method
for interpreting results varies depending on what you are trying to
accomplish. The most generic health checking command available is
the all_analysis command. The health-checking tools that underlie
the all_analysis command are described in the Fast Fabric Toolset
Users Guide.
You can also target specific devices and ports with these commands.
This information is also documented in the Fast Fabric Toolset
Users Guide.
Note: These commands must be run on each fabric management server
that has a master subnet manager running on it.
The health checking commands should be run at various times as
described in the following list:
- During installation or reconfiguration, to verify that there are
no errors in the fabric and that the configuration is as expected.
Repeatedly run the /sbin/all_analysis -b command
until the configuration is correct.
- After an installation or repair is verified, to save a baseline
health check for future comparisons. Repairs that lead to serial
number changes on field replaceable units (FRUs), movement of cables,
or switch firmware and software updates constitute configuration changes.
After any such change, run the /sbin/all_analysis -b command again.
- Periodically, to monitor the fabric. To do so, run the
/sbin/all_analysis command. For details, see Setting up periodic
fabric health checking.
Note: The LinkDown counter in the IBM® GX/GX+
host channel adapters (HCAs) is reset as soon as the link shuts down.
This action is part of the recovery procedure. While this action is
not optimal, the LinkDown counter for the connected switch port provides
an accurate count of the number of LinkDown actions for the link.
- To check link error counters without comparing against the baseline
for configuration changes, use the /sbin/all_analysis -e command.
- During debugging, to query the fabric. This query can be helpful
when debugging performance problems. To save the history during
debugging, use the /sbin/all_analysis -s command.
- During repair verification to identify errors or inadvertent changes
by comparing the latest health check results to the baseline health
check results.
- To save history during queries: /sbin/all_analysis -s
- If the configuration is changed (with new part serial
numbers), a new baseline is required. Use the /sbin/all_analysis
-b command.
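The mapping from situation to command in the list above can be summarized in a small dry-run helper. This is only a sketch: the function name health_check_cmd and the situation names (baseline, monitor, errors, debug) are hypothetical and are not part of the Fast Fabric Toolset. The helper prints the command instead of running it; only the /sbin/all_analysis options named above (-b, -e, -s) are used.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the all_analysis invocation that the
# list above associates with each situation. Nothing is executed.
# The situation names are illustrative only.
ALL_ANALYSIS=/sbin/all_analysis

health_check_cmd() {
    case "$1" in
        baseline) echo "$ALL_ANALYSIS -b" ;;  # installation, reconfiguration, or verified repair
        monitor)  echo "$ALL_ANALYSIS" ;;     # periodic monitoring of the fabric
        errors)   echo "$ALL_ANALYSIS -e" ;;  # link error counters, no baseline comparison
        debug)    echo "$ALL_ANALYSIS -s" ;;  # save history while debugging
        *)        echo "unknown situation: $1" >&2; return 1 ;;
    esac
}
```

For example, health_check_cmd baseline prints /sbin/all_analysis -b. Remember that the actual command must be run on each fabric management server that has a master subnet manager running on it.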
The following files are important setup files for Fast Fabric
Health Check. Details about how to set them up are found in the Fast
Fabric Toolset Users Guide.
Note: These files must be changed on each fabric management server.
- To see the basic setup file: /etc/sysconfig/fastfabric.conf
- For a list of switch chassis: /etc/sysconfig/iba/chassis
- For a list of switch chassis running embedded SM: /etc/sysconfig/iba/esm_chassis
- For a list of ports on the fabric management server: /etc/sysconfig/iba/ports
(entries use the format “hca:port” and are space delimited)
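As an illustration of the “hca:port” format, the following sketch splits a space-delimited list of entries into its parts. The function name split_ports and the sample values are hypothetical; the authoritative description of the file contents is in the Fast Fabric Toolset Users Guide.

```shell
#!/bin/sh
# Hypothetical helper: split a space-delimited list of "hca:port" entries
# and print one "hca=... port=..." line per entry.
split_ports() {
    # $1 is intentionally unquoted so the shell splits it on whitespace.
    for entry in $1; do
        hca=${entry%%:*}   # text before the first colon
        port=${entry#*:}   # text after the first colon
        echo "hca=$hca port=$port"
    done
}
```

Calling split_ports "1:1 1:2" with these made-up sample values prints one line for each of the two entries, which can help when checking that the file lists the ports you expect.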