| Scenario A: Switch-LED or stopped fan indicates
the problem |
Where to look: - Physical InfiniBand switch-LEDs
or fan
- Fabric management server, graphical interface
|
| Scenario B: Subnet manager log message indicates
the problem |
Where to look: - Fabric management server, file /var/log/messages
- Cluster systems management (CSM) server, file /var/log/csm/errorlog/CSM
MS hostname
|
| Scenario C: Switch log message indicates the
problem |
Where to look: - On switch (through logShow)
- CSM server, file /var/log/csm/errorlog/CSM
MS hostname
|
| Scenario D: Fast fabric health check indicates
the problem |
Where to look: - Fabric management server
- file /var/opt/iba/analysis/latest/chassis*.diff
- file /var/opt/iba/analysis/latest/chassis*.errors
- file /var/opt/iba/analysis/latest/fabric*.errors
|
| Scenario E: One or more switches are missing
from the configuration |
Where to look: - Fabric management server, file /var/log/messages
- CSM server, file /var/log/csm/errorlog/CSM/MS
hostname
- Fast fabric, dir /var/opt/iba/analysis/latest files:
hostsm*.diff, hostsm*.errors, esm*.diff, esm*.errors
|
| Scenario F: Fast fabric health check tool problems
|
Where to look: - Fast fabric, file /var/opt/iba/analysis/latest/*.stderr
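The file locations listed for scenarios D, E, and F can be scanned in one pass. The following is a minimal sketch, not part of the original procedure: the function name list_health_findings is hypothetical, and it assumes that a non-empty .diff, .errors, or .stderr file is worth inspecting and that GNU find is available on the fabric management server.

```shell
# Hypothetical helper: list non-empty health-check result files.
# Assumption: an empty result file means nothing was reported.
list_health_findings() {
    # Default to the analysis directory named in the table above.
    dir="${1:-/var/opt/iba/analysis/latest}"
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.diff' -o -name '*.errors' -o -name '*.stderr' \) \
        -size +0 -print | sort
}
```

Calling list_health_findings with no argument scans /var/opt/iba/analysis/latest; pass another directory to scan an older analysis run.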
|
| Scenario G: Host channel adapter hardware failure
|
Where to look: - Host channel adapter LED status
- Service focal point serviceable event that calls out the host channel
adapter.
|
| Scenario H: Host channel adapter cannot ping
|
Where to look: - Users cannot ping this host channel adapter.
Data to collect and record: - Go to Collecting data from the fabric management server for 7874-024, 7874-040, 7874-120, and 7874-240 switches, and perform that
procedure.
- If collecting data from the cluster systems management (CSM) server,
perform the following steps:
- On the CSM server, create a data collection directory. In this directory,
you will store node data in files whose names begin with ibdata.customer.timestamp.
Where you see file in the following substeps, use
that directory and file name prefix, for example, dir/ibdata.IBM.20080508-1034.
- dsh -av "netstat -i | grep ib" > file.netstat
- For AIX® nodes:
- dsh -av "ibstat -n" > file.ibstat
- dsh -av "lscfg -vp" > file.lscfg
- For Linux nodes:
- dsh -av "ibv_devinfo -v" > file.devinfo
- dsh -av "lspci" > file.lspci
- If collecting data from the node directly, perform the following
steps:
- On the node, create a data collection directory. In this directory,
you will store the node data in files whose names begin with ibdata.customer.timestamp.
Where you see file in the following substeps, use
that directory and file name prefix, for example, dir/ibdata.IBM.20080508-1034.
- netstat -i | grep ib > file.netstat
- For AIX nodes:
- ibstat -n > file.ibstat
- lscfg -vp > file.lscfg
- For Linux nodes:
- ibv_devinfo -v > file.devinfo
- lspci > file.lspci
- Send all files from that directory to your IBM service representative.
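The direct-node steps above can be sketched as a small script. This is an illustration under stated assumptions, not the documented procedure itself: the function names ibdata_prefix and collect_node_data and the CUSTOMER and OUTDIR variables are hypothetical, and the timestamp format is inferred from the example name ibdata.IBM.20080508-1034.

```shell
#!/bin/sh
# Placeholder defaults (assumptions, not values from the procedure).
CUSTOMER="${CUSTOMER:-IBM}"
OUTDIR="${OUTDIR:-./ibdata}"

# Build the prefix dir/ibdata.customer.timestamp (timestamp as YYYYMMDD-HHMM).
ibdata_prefix() {
    printf '%s/ibdata.%s.%s\n' "$1" "$2" "$(date +%Y%m%d-%H%M)"
}

# Run the collection commands for the local operating system.
collect_node_data() {
    mkdir -p "$OUTDIR" || return 1
    file=$(ibdata_prefix "$OUTDIR" "$CUSTOMER")
    netstat -i | grep ib > "$file.netstat"
    case "$(uname -s)" in
        AIX)
            ibstat -n > "$file.ibstat"
            lscfg -vp > "$file.lscfg"
            ;;
        Linux)
            ibv_devinfo -v > "$file.devinfo"
            lspci > "$file.lspci"
            ;;
    esac
    # Print the prefix so the caller knows which files to send.
    echo "$file"
}
```

Run collect_node_data on the node after sourcing the script. For collection from the CSM server, the same commands would be wrapped in dsh -av "..." as shown in the CSM steps.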
|
| Scenario I: General debug with no specific problem
|
Where to look: No specific symptom
Data to collect and record: Collect the same data
as for scenario H.
|