Troubleshooting
Problem
Symptom
Cause
An FPGA reset is performed when the checksum of the SRAM contained in the FPGA does not match that of the source logic program stored in flash memory. A mismatch is caused by a transient soft-error in the chip. A soft-error is fully recoverable and is not a hardware error in the chip. Therefore, no hardware is replaced for this condition.
SRAM contains the programming for the FPGA chip. For the PCIe3 cable adapter and the EMXG PCIe 6-slot fanout module, the FPGA is used only for monitoring and control functions, and it also carries the PCIe reference clock needed by the fanout module. No customer data is handled or affected by the function of the FPGA.
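As a rough illustration of the check described above, the following Python sketch compares a checksum of a configuration image against a copy in which a single bit has flipped. The CRC-32 algorithm, the image contents, and every function name here are assumptions made for illustration only; this document does not state which checksum the FPGA actually uses.

```python
import zlib


def checksums_match(flash_image: bytes, sram_contents: bytes) -> bool:
    """Return True when both images have the same CRC-32 (a stand-in checksum)."""
    return zlib.crc32(flash_image) == zlib.crc32(sram_contents)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a soft-error."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical stand-in for the logic program stored in flash memory.
flash_image = bytes(range(256)) * 4
# Simulate a transient single-bit upset in the SRAM copy.
sram_contents = flip_bit(flash_image, 1234)

if not checksums_match(flash_image, sram_contents):
    # Reloading from the flash source restores the SRAM contents,
    # which is why the condition is recoverable without replacing hardware.
    sram_contents = flash_image
```

The sketch only models why a single flipped bit is enough to produce a mismatch and why reloading from the flash source clears it.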
Environment
Resolving The Problem
A temporary workaround to prevent the FPGA reset is available. This workaround must be performed after each server platform IPL, and also after concurrent maintenance in which a component of the EMX0 expansion drawer is replaced. The following sections describe how to use the Hardware Management Console (HMC) Enhanced User Interface, the Advanced System Management Interface (ASMI), and the HMC restricted shell command line to enable the workaround.
This procedure turns off the ability of the FPGA to reset. It does not prevent the soft-errors that trigger the reset. With resets disabled, there is a slight risk that a soft-error is reported as a hardware failure, and such a failure cannot be traced back to a soft-error.
FW920.20 and Newer Workaround by using the HMC Enhanced User Interface
- Under Resources -> All Systems, select the server to enable the workaround on.
- In the navigator, find the serviceability section and click Serviceability.
- In the Serviceability menu, under View and Collect, select Manage Dumps.
- On the Manage Dumps dialog, verify the server at the top, then select Action -> Initiate Resource Dump. Do not attempt to use any other options from this menu.
- Enter "xmsvc -DISABLECCSER" in the resource selector field.
- Click the OK button to disable FPGA resets for the server. If the system indicates that the dump request was successfully initiated, no further action is needed.
FW920.20 and Newer Workaround by using the Advanced System Management Interface (ASMI)
- Log in to ASMI as admin or greater authority.
- In the navigator, expand System Service Aids, then select Resource Dump.
- Enter "xmsvc -DISABLECCSER" in the resource selector field.
- Click the Initiate Resource Dump button. If the request is successful, a confirmation message is displayed.
***** THIS COMPLETES THE PROCEDURE FOR THE ASMI INTERFACE *****
FW920.20 and Newer Workaround by using HMC Restricted Shell
- Log in to HMC Restricted Shell via SSH, PuTTY, or from Restricted Shell on the HMC GUI with admin level authority.
- Use the command "lssyscfg -r sys -Fname,serial_num" to identify the system name in the first field of output.
- Run the following "startdump" command, replacing {managed server} with the server name identified previously:
- startdump -m {managed server} -t resource -r "xmsvc -DISABLECCSER"
- If the command completes successfully, no further action is required. FPGA resets are now disabled for the selected server.
***** THIS COMPLETES THE PROCEDURE FOR THE HMC RESTRICTED SHELL INTERFACE *****
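Because the workaround must be repeated after each platform IPL, some administrators may prefer to generate the restricted shell command from a workstation script. The following Python sketch parses output in the form produced by the "lssyscfg" command above and builds the matching "startdump" command string for each system. It does not connect to the HMC; how the command is delivered is left to the reader. The function names and the sample server names are hypothetical, and the parser assumes each output line has the form name,serial_num.

```python
import shlex
from typing import List


def parse_system_names(lssyscfg_output: str) -> List[str]:
    """Extract system names from 'lssyscfg -r sys -Fname,serial_num' output.

    Assumption: each non-empty line looks like 'name,serial_num'.
    """
    names = []
    for line in lssyscfg_output.splitlines():
        line = line.strip()
        if "," not in line:
            continue  # skip blank or unexpected lines
        # The serial number is the last field; everything before it is the name.
        name, _, _serial = line.rpartition(",")
        names.append(name)
    return names


def build_startdump_command(managed_server: str) -> str:
    """Build the startdump command that disables FPGA resets for one server."""
    # shlex.quote leaves simple names unchanged and quotes names with spaces.
    return (
        f"startdump -m {shlex.quote(managed_server)} "
        '-t resource -r "xmsvc -DISABLECCSER"'
    )


# Made-up sample output for illustration only.
sample_output = "Server-9009-22A-SN1234567,1234567\nServer-9040-MR9-SN7654321,7654321\n"
commands = [build_startdump_command(n) for n in parse_system_names(sample_output)]
```

Each entry in commands is a string that can be pasted into the restricted shell; review the list before running anything, because the command changes FPGA reset behavior for the named server.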
Document Location
Worldwide
Document Information
Modified date:
07 December 2021
UID
ibm16114076



