Troubleshooting
Problem
SEU error messages in sysmgr.log
Symptom
Example of a single-bit SEU error:
2013-12-05 18:37:06.358233 EST Error: DAC [dac hwid=2080 sn="1232F58250345" SPA=8 Parent=1796 Position=1][fpga-index=3] on SPU [spu hwid=1796 sn="06Y5616" SPA=8 Parent=1012 Position=9 spuName= spu0809] reported - SEU single bit error:
Example of a multi-bit SEU error:
2013-11-17 00:59:38.007859 PST Error: DAC [dac hwid=1590 sn="YF10JB2AS772" SPA=1 Parent=1588 Position=2][fpga-index=1] on SPU [spu hwid=1588 sn="Y012BG2BK073" SPA=1 Parent=1002 Position=13 spuName= spu0113 DesignatedSpu] reported - SEU multi bit error:
Cause
Single-event upsets (SEUs) occur when particle or ray-based radiation changes the contents of the FPGA programming logic.
Environment
N1000-1, N2000-1, N100-1, N3001
Diagnosing The Problem
To diagnose this issue, first search all sysmgr logs for the SEU string by using the grep command:
grep -i SEU /nz/kit/log/sysmgr/*.log
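When many messages are present, it can help to condense the grep output into a count per day, SPU, and error type. The following is a minimal sketch, not a supported tool: the function name `seu_summary` is hypothetical, and the sed pattern assumes the log lines follow the format of the examples in the Symptom section (a leading date, a `spuName=` field, and the text `SEU single bit error` or `SEU multi bit error`).

```shell
# Hypothetical helper: count SEU messages per day, SPU, and error type.
# Assumes log lines look like the examples in the Symptom section.
# Usage: seu_summary /nz/kit/log/sysmgr/*.log
seu_summary() {
    # Same search as the grep above; -h drops the file name prefix.
    grep -ih SEU "$@" |
    # Keep only the date, the SPU name, and the error type (single or multi).
    sed -nE 's/^([0-9-]+) .*spuName= *([A-Za-z0-9]+).*SEU (single|multi) bit error.*/\1 \2 \3/p' |
    # Count identical (date, SPU, type) rows and print the count last.
    sort | uniq -c | awk '{print $2, $3, $4, $1}'
}
```

Each output line has the form `date spuName type count`. Lines that do not match the assumed message format are silently dropped, so compare the totals against the raw grep output if the numbers look low.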
Resolving The Problem
Work with support to fail over the affected SPU and schedule an FPGA replacement.
Fail the affected SPU if more than two SEU error messages have appeared within 24 hours, or if a clear trend has been established.
Fail the affected SPU if a single multi-bit SEU error is detected in the logs.
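The two criteria above can be checked roughly from the logs. This is only a sketch under stated assumptions: the function name `seu_fail_candidates` is hypothetical, the message format is assumed to match the examples in the Symptom section, and the 24-hour window is approximated by the calendar date in the log timestamp. A burst that spans midnight can therefore be missed, and a "clear trend" is not evaluated at all, so treat the output as a starting point for the discussion with support rather than a decision.

```shell
# Hypothetical helper: list SPUs that appear to meet the failover criteria.
# Assumption: "24 hours" is approximated by one calendar date in the log.
# Usage: seu_fail_candidates /nz/kit/log/sysmgr/*.log
seu_fail_candidates() {
    grep -ih SEU "$@" |
    # Reduce each message to: date, SPU name, error type (single or multi).
    sed -nE 's/^([0-9-]+) .*spuName= *([A-Za-z0-9]+).*SEU (single|multi) bit error.*/\1 \2 \3/p' |
    awk '
        # Any multi-bit error flags the SPU immediately.
        $3 == "multi"  { flag[$2] = "multi-bit error" }
        # Count single-bit errors per (date, SPU) pair.
        $3 == "single" { n[$1 " " $2]++ }
        END {
            for (k in n) if (n[k] > 2) {
                split(k, p, " ")
                if (!(p[2] in flag))
                    flag[p[2]] = "more than two single-bit errors on " p[1]
            }
            for (s in flag) print s ": " flag[s]
        }' | sort
}
```

An SPU with only one or two single-bit messages on a given date is not listed. If an SPU exceeds the single-bit threshold on several dates, only one of those dates is reported.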
Document Information
Modified date:
17 October 2019
UID
swg21680171