Platform diagnostics (ppc64-diag)
Platform diagnostics report firmware events, provide an automated response mechanism to urgent events, and provide event notifications to system administrators and service frameworks.
Utility | PowerVM® partition on any level of Power® processor |
---|---|
rtas_errd | All current versions of the supported distributions |
opal_errd | Not applicable |
opal-elog-parse | Not applicable |
opal-dump-parse | Not applicable |
diag_encl | All current versions of the supported distributions |
encl_led | All current versions of the supported distributions |
usysident | All current versions of the supported distributions |
usysattn | All current versions of the supported distributions |
ppc64-diag Error Log Analyzer (ELA) | All current versions of the supported distributions |
diag_nvme | All current versions of the supported distributions |
For Linux distributions currently supported on Power systems, see Linux on Power overview.
Platform diagnostics for systems using PowerVM virtualization
The platform diagnostics rtas_errd daemon logs platform events that are detected by firmware to servicelog. Platform events are also known as RTAS events. The rtas_errd daemon might also take further action in response to certain types of events, such as failures of fans or power supplies. It is configured to start automatically when Linux boots.
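Because the rtas_errd daemon is expected to be running after boot, a quick check can confirm that it is. The following is a minimal sketch, assuming that the pgrep command is available on the system. The helper name report_daemon is hypothetical and is not part of the ppc64-diag package.

```shell
#!/bin/sh
# report_daemon is a hypothetical helper, not part of ppc64-diag.
# Assumption: pgrep is installed; -x matches the exact process name.
report_daemon() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1 is running"
    else
        echo "$1 is not running"
    fi
}

report_daemon rtas_errd
```

If the daemon is reported as not running, the service manager of your distribution is the usual place to look next.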
Platform diagnostics commands and the rtas_errd daemon are provided by the ppc64-diag package. The commands that are typically included are:
- explain_syslog
- Read a file (or stdin) that is in the format that is produced by the syslogd daemon, and print an explanation for each line that matches a message in the /etc/ppc64-diag/message_catalog message catalog. The explanations include probable cause and recommended action. If run with the -M flag, the command reads from the /var/log/messages file. For example:
explain_syslog -M
- syslog_to_svclog
- Read a file (or stdin) that is in the format that is produced by the syslogd daemon, and log an event to the servicelog database for each line that matches a message in the /etc/ppc64-diag/message_catalog message catalog. It is not automatically started when Linux boots. If run in the background with the -M flag, it continuously monitors the /var/log/messages file. For example:
syslog_to_svclog -M &
- usysident
- Use this utility to operate device identification indicators, or to view and modify system identification indicators. This utility was previously in the powerpc-utils package, and now resides in the ppc64-diag package as of SUSE Linux Enterprise Server 11 SP3.
- usysattn
- If you run the usysattn utility without arguments, the system prints a list of all of the attention indicators on the system along with their current status (on or off). This utility was previously in the powerpc-utils package, and now resides in the ppc64-diag package as of SUSE Linux Enterprise Server 11 SP3.
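Because explain_syslog can read from stdin, it can be given only part of a log instead of the whole /var/log/messages file. The following is a minimal sketch under that assumption. The helper name explain_tail is hypothetical, and the sketch assumes that explain_syslog reads stdin when no input file option is given.

```shell
#!/bin/sh
# explain_tail is a hypothetical helper, not part of ppc64-diag.
# Assumption: explain_syslog reads syslog-format lines from stdin when
# no input file option is given.
explain_tail() {
    count=$1
    logfile=$2
    # Pass only the last $count lines of the log to explain_syslog.
    tail -n "$count" "$logfile" | explain_syslog
}

# Usage (requires the ppc64-diag package):
#   explain_tail 200 /var/log/messages
```

Limiting the input this way keeps the output focused on recent messages. The same pipeline shape would apply to syslog_to_svclog, which reads the same input format.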
Enclosure diagnostics (diag_encl)
As of SUSE Linux Enterprise Server 11 SP3, you can use additional options to diagnose problems on the 5888 PCIe storage enclosure. The diag_encl utility is contained in the ppc64-diag package.
The diag_encl utility can be run as part of a Linux CRON job (recommended), or run independently. For more information about setting up a CRON job that includes the diag_encl utility, see Connecting and configuring the disk drive enclosure in a system running Linux (http://www.ibm.com/support/knowledgecenter/POWER7/p7ham/scsidiskdriveenclosurelinux.htm).
Run the following command to access enclosure diagnostics as part of a CRON job:
/usr/sbin/diag_encl -scl
The diag_encl utility accepts the following options and arguments:
- -h: Print this help message.
- -s: Generate serviceable events for any failures and write events to the service log.
- -c: Compare with previous status and report only new failures.
- -l: Turn on fault LEDs for serviceable events.
- -v: Verbose output.
- -V: Print the version of the command and exit.
- -f: For testing, read SCSI enclosure services (SES) data from path.pg2 and VPD from path.vpd.
- <scsi_enclosure>: The SCSI generic (sg) device on which to operate, such as sg7. If you do not specify a device, all such devices are diagnosed.
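To show how the diag_encl -scl command might be scheduled, the following sketch prints a line in the five-field format of a user crontab. The helper name diag_encl_cron_line and both schedules are assumptions for illustration only. A system-wide /etc/crontab file needs an additional user field that is not shown here.

```shell
#!/bin/sh
# diag_encl_cron_line is a hypothetical helper, not part of ppc64-diag.
# It prints a user-crontab line: five schedule fields, then the command.
# The default schedule (top of every hour) is an assumption.
diag_encl_cron_line() {
    schedule=${1:-"0 * * * *"}
    printf '%s /usr/sbin/diag_encl -scl\n' "$schedule"
}

# Example: run the enclosure diagnostics every day at 02:15.
diag_encl_cron_line "15 2 * * *"
```

The printed line can be added with crontab -e for the root user. Because the -c option reports only new failures, repeated runs should not generate duplicate events for a failure that was already reported.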
For more information, see the 5888 PCIe storage enclosure topic (http://www.ibm.com/support/knowledgecenter/POWER7/p7ham/p7ham_5888_kickoff.htm).
NVMe diagnostics (diag_nvme)
The ppc64-diag package contains the diag_nvme utility. It is recommended to run the diag_nvme utility as part of a Linux CRON job. However, you can also run the diag_nvme utility independently. After ppc64-diag is installed, a CRON job is automatically created. This CRON job runs the diag_nvme utility daily for all NVMe devices that are detected on the system. The CRON job file can be found at the following location:
/etc/cron.daily/run_diag_nvme
From the list of detected events, you can select the events that must be reported to the servicelog database by editing the /etc/ppc64-diag/diag_nvme.config configuration file. By default, reporting of all detected events is enabled.
The diag_nvme utility accepts the following options and arguments:
- -h or --help: Prints a help message and exits.
- -d or --dump: Dumps SMART data from the specified NVMe device to the file at the specified path.
- -f or --file: Only used for testing. Uses SMART data from the specified file instead of an NVMe device.
- nvme_devices: The NVMe device (or devices) that must be diagnosed, such as nvme0. If the NVMe device name is not specified, all the NVMe devices that are detected in the system are diagnosed.
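The -d and -f options can be combined into a simple test cycle: dump SMART data from a device, then run the diagnostics against the dump. The following is a sketch only. The helper name nvme_dump_and_replay is hypothetical, and the argument order (the file after the option, followed by one device name, including with -f) is an assumption that should be checked against the diag_nvme man page on your system.

```shell
#!/bin/sh
# nvme_dump_and_replay is a hypothetical helper, not part of ppc64-diag.
# Assumptions: -d takes the output file followed by one device name, and
# -f takes the dump file followed by the same device name.
nvme_dump_and_replay() {
    device=$1
    dumpfile=$2
    # Replay from the dump only if the dump step succeeded.
    diag_nvme -d "$dumpfile" "$device" &&
        diag_nvme -f "$dumpfile" "$device"
}

# Usage (requires the ppc64-diag package and an NVMe device):
#   nvme_dump_and_replay nvme0 /tmp/nvme0_smart.dump
```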
Light path diagnostics
Light path diagnostics is a system of light-emitting diodes (LEDs) on various external and internal components of the server. When an error occurs, LEDs are lit throughout the server. Use the following utilities to gather information about light path diagnostics:
- usysident
- Use this utility to view and turn on or off the indicators that identify devices on Power systems. This utility was previously in the powerpc-utils package, and now resides in the ppc64-diag package as of SUSE Linux Enterprise Server 11 SP3.
- usysattn
- If you run the usysattn utility without arguments, the system prints a list of all of the attention indicators on the system along with their current status (on or off). This utility was previously in the powerpc-utils package, and now resides in the ppc64-diag package as of SUSE Linux Enterprise Server 11 SP3.
Example: Locating a faulty Ethernet card
- The service log notifier alerts the light path diagnostics subsystem, lp_diag, that the Ethernet card is not functioning. Typically, the lp_diag utility runs automatically through a script that is registered when the ppc64-diag package is installed.
- The lp_diag utility enables an indicator LED.
- You notice that one of the LEDs on your system is lit and not flashing. You run the usysattn utility from the command line to get the location code of the LED indicator.
- To gather more information about the card, you run the lscfg utility.
- You replace the faulty card, and use the log_repair_action utility to reset the LED.
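The manual steps in this example can be sketched as shell helpers. Both function names are hypothetical. The lscfg -vp options, the log_repair_action -l (location code) and -p (procedure) options, and the five lines of context around the match are assumptions, so check the man pages on your system before you rely on them.

```shell
#!/bin/sh
# Hypothetical helpers; neither function is part of ppc64-diag.

# Show the lscfg entry around a location code reported by usysattn.
# Assumptions: lscfg accepts -vp, and 5 lines of context are enough.
show_card_info() {
    lscfg -vp | grep -F -B 5 -A 5 -- "$1"
}

# Record the repair so that the indicator LED can be reset.
# Assumption: -l is the location code and -p describes the procedure.
close_repair() {
    log_repair_action -l "$1" -p "$2"
}

# Usage, with a location code taken from the usysattn output:
#   usysattn
#   show_card_info "<location_code>"
#   close_repair "<location_code>" "Replaced Ethernet card"
```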
For more information, see the Light path diagnostics topic.
The commands that are provided by this package, and their features and usage, might vary by distribution and release. Consult the man pages on your system for the most accurate description of their features and usage. For more information about how to list and display the man pages for commands that are provided by this package, see Displaying package man pages.
For more information about the ppc64-diag package, see ppc64 Platform Diagnostics.