POWER7 information

Collecting data for InfiniBand switch errors for 7874-024, 7874-040, 7874-120, and 7874-240 switches

For InfiniBand host channel adapter, switch, or management errors, you need to determine what data to collect.

Because of the complexity of the hardware and software configurations, determine which case most closely matches your situation. If you cannot determine which scenario applies to your situation, contact your IBM® service representative.

Table 1. Failure scenarios and data collection
Failure scenario Location of data and data to collect
Scenario A: Switch-LED or stopped fan indicates the problem
Where to look:
  • Physical InfiniBand switch-LEDs or fan
  • Fabric management server, graphical interface
Data to collect and record:
  1. Record the status and the location of the problem.
  2. Go to Collecting data from the fabric management server for 7874-024, 7874-040, 7874-120, and 7874-240 switches, and perform that procedure.
Scenario B: Subnet manager log message indicates the problem
Where to look:
  • Fabric management server, file /var/log/messages
  • Cluster systems management (CSM) server, file /var/log/csm/errorlog/CSM MS hostname
Scenario C: Switch log message indicates the problem
Where to look:
  • On the switch (through the logShow command)
  • CSM server, file /var/log/csm/errorlog/CSM MS hostname
Scenario D: Fast fabric health check indicates the problem
Where to look:
  • Fabric management server
    • file /var/opt/iba/analysis/latest/chassis*.diff
    • file /var/opt/iba/analysis/latest/chassis*.errors
    • file /var/opt/iba/analysis/latest/fabric*.errors
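As a rough aid for Scenario D, the following sketch lists which of the health check files named above exist and are non-empty. It assumes that a non-empty file is worth reviewing; the function name list_health_findings is hypothetical and is not part of Fast Fabric.

```shell
#!/bin/sh
# List non-empty health check result files in an analysis directory.
# list_health_findings is a hypothetical helper name, not a Fast Fabric command.
# The default directory is the one named in the table; pass another to override.
list_health_findings() {
    dir="${1:-/var/opt/iba/analysis/latest}"
    for f in "$dir"/chassis*.diff "$dir"/chassis*.errors "$dir"/fabric*.errors; do
        # -s is true only when the file exists and its size is greater than zero,
        # so unmatched patterns and empty files are skipped.
        if [ -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Run list_health_findings with no argument on the fabric management server, or pass a different analysis directory as the first argument.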
Scenario E: One or more switches are missing from the configuration
Where to look:
  • Fabric management server, file /var/log/messages
  • CSM server, file /var/log/csm/errorlog/CSM MS hostname
  • Fast fabric, directory /var/opt/iba/analysis/latest, files: hostsm*.diff, hostsm*.errors, esm*.diff, esm*.errors
Scenario F: Fast fabric health check tool problems
Where to look:
  • Fast fabric, file /var/opt/iba/analysis/latest/*.stderr
Data to collect and record:
  1. Collect all health check history data. Go to Collecting data for Fast Fabric Health Check for 7874-024, 7874-040, 7874-120, and 7874-240 switches, and perform that procedure.
  2. From the Fabric management server, collect the data from the file /var/log/messages.
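The following sketch gathers the *.stderr files and the system log into one directory so that they can be sent together. It does not replace the health check history procedure referenced in step 1. The function name collect_healthcheck_logs and the destination directory are hypothetical; reading /var/log/messages typically requires root authority.

```shell
#!/bin/sh
# Copy the health check *.stderr files and the system log into one directory.
# collect_healthcheck_logs is a hypothetical helper name.
collect_healthcheck_logs() {
    dest="$1"                                        # destination directory (your choice)
    analysis="${2:-/var/opt/iba/analysis/latest}"    # health check output directory
    messages="${3:-/var/log/messages}"               # fabric management server log
    mkdir -p "$dest" || return 1
    for f in "$analysis"/*.stderr "$messages"; do
        # Skip unmatched patterns and files that cannot be read.
        if [ -r "$f" ]; then
            cp -p "$f" "$dest"/ || return 1
        fi
    done
}
```

For example, collect_healthcheck_logs /tmp/ffdata copies the files from the default locations into /tmp/ffdata.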
Scenario G: Host channel adapter hardware failure
Where to look:
  • Host channel adapter LED status
  • Service focal point serviceable event that calls out the host channel adapter
Data to collect and record:
  1. Perform normal data collection of an iqyylog and a dump (if applicable) from the management console.
  2. If the LEDs indicate a link problem, go to Collecting data from the fabric management server for 7874-024, 7874-040, 7874-120, and 7874-240 switches, and perform that procedure.
Scenario H: Host channel adapter cannot ping
Where to look:
  • Users cannot ping this host channel adapter.
Data to collect and record:
  1. Go to Collecting data from the fabric management server for 7874-024, 7874-040, 7874-120, and 7874-240 switches, and perform that procedure.
  2. If collecting data from the cluster systems management (CSM) server, perform the following steps:
    1. On the CSM server, create a data collection directory. In this directory, store the node data in files whose names begin with ibdata.customer.timestamp. Where file appears in the following substeps, substitute that directory and file name prefix, for example, dir/ibdata.IBM.20080508-1034.
    2. dsh -av "netstat -i | grep ib" > file.netstat
    3. For AIX® nodes:
      • dsh -av "ibstat -n" > file.ibstat
      • dsh -av "lscfg -vp" > file.lscfg
    4. For Linux nodes:
      • dsh -av "ibv_devinfo -v" > file.devinfo
      • dsh -av "lspci" > file.lspci
  3. If collecting data from the node directly, perform the following steps:
    1. On the node, create a data collection directory. In this directory, store the node data in files whose names begin with ibdata.customer.timestamp. Where file appears in the following substeps, substitute that directory and file name prefix, for example, dir/ibdata.IBM.20080508-1034.
    2. netstat -i | grep ib > file.netstat
    3. For AIX nodes:
      • ibstat -n > file.ibstat
      • lscfg -vp > file.lscfg
    4. For Linux nodes:
      • ibv_devinfo -v > file.devinfo
      • lspci > file.lspci
    5. Send all files from that directory to your IBM service representative.
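The substeps for collecting data from the node directly can be sketched as a single shell function. This is an illustration under assumptions, not an IBM-supplied tool: the names collect_node_ib_data and save are hypothetical, the operating system is chosen from the output of uname -s, and the output of a command that is missing or fails is captured in the file instead of stopping the collection.

```shell
#!/bin/sh
# Sketch of the "collect data from the node directly" substeps.
# collect_node_ib_data and save are hypothetical helper names.

# save OUTFILE CMD [ARGS...]: store stdout and stderr of CMD in OUTFILE.
# A missing or failing command does not stop the collection.
save() {
    out="$1"; shift
    "$@" > "$out" 2>&1 || true
}

collect_node_ib_data() {
    outdir="$1"                      # data collection directory
    customer="${2:-IBM}"             # customer name used in the file prefix
    stamp=$(date +%Y%m%d-%H%M)       # same shape as the example 20080508-1034
    file="$outdir/ibdata.$customer.$stamp"
    mkdir -p "$outdir" || return 1

    # Substep 2: interface statistics for the ib interfaces.
    { netstat -i 2>/dev/null | grep ib; } > "$file.netstat" || true

    # Substeps 3 and 4: choose the commands by operating system.
    case "$(uname -s)" in
        AIX)
            save "$file.ibstat" ibstat -n
            save "$file.lscfg" lscfg -vp
            ;;
        Linux)
            save "$file.devinfo" ibv_devinfo -v
            save "$file.lspci" lspci
            ;;
    esac

    # Print the file prefix so that the caller can locate the results.
    printf '%s\n' "$file"
}
```

For example, collect_node_ib_data /tmp/ibdata IBM writes the files under /tmp/ibdata and prints their common prefix. Send the resulting files as described in substep 5.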
Scenario I: General debug with no specific problem
Where to look:
  • No specific symptom
Data to collect and record:
  1. Collect the same data as for scenario H.




Last updated: Tue, February 11, 2014