IBM Support

All PCI Error detected a critical state on Group2

Troubleshooting


Problem

A false-positive alert is generated by the combination of the IMM and uEFI firmware levels installed on the nodes. The error is benign and does not affect any physical hardware functionality. The error normally occurs during a reboot of the node, but it has also been seen when the IMM is booted independently of the node.

Symptom

The error/event appears in the GUI, and in the CLI "lslog" output it is similar to:
Sensor All PCI Error detected a critical state on Group2: Fault Status asserted.
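
When the log is long, it can help to pick out only the entries that match this message. The following is a minimal Python sketch, assuming the "lslog" output has already been captured as text by some other means. The function name matching_events is hypothetical, and only the message wording shown above is taken from this document; the rest of the "lslog" line layout is not assumed.

```python
import re

# Message wording taken from the sample entry above; the group name
# (for example "Group2") is matched loosely in case it differs.
PCI_ALERT = re.compile(
    r"All PCI Error detected a critical state on \w+: Fault Status asserted"
)

def matching_events(lslog_text):
    """Return the lines of captured lslog text that contain the alert message."""
    return [
        line.strip()
        for line in lslog_text.splitlines()
        if PCI_ALERT.search(line)
    ]
```

An empty result means the captured text contains no entry with this wording.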

Environment


This issue exists on IBM SONAS and IBM Storwize V7000 Unified System V1.5 and V1.6.

Resolving The Problem


This error/event can safely be cleared by right-clicking the event and selecting "Mark as Read" from the GUI.

NOTE: If the error recurs after being cleared, the following procedure can be used as a longer-term solution.
1. From the GUI (Monitoring > System Details > Status), right-click the 'device assert' event and mark it as fixed/read.
2. Clear the IMM event log and reboot the IMM of each management node, using the steps below. This does not impact the node or clients.

01. Log into mgmt001

02. Connect to the IMM on mgmt002
[mgmt001]# telnet mgmt002st001imm [USERID / PASSW0RD, 0=zero]

03. Clear the IMM Log
system> clearlog
Log cleared successfully!!

04. Reboot the IMM (this does not affect user operations)
system> resetsp
Submitting reset request.. Reset done.

NOTE: This may lock up your current session and you may need to open a new one.

05. From mgmt001 ping mgmt002's IMM
[mgmt001]# ping mgmt002st001imm

06. Verify that you can log in to the IMM and that the log is cleared (you should see only a few recent events, even when running readlog multiple times).
[mgmt001]# telnet mgmt002st001imm [USERID / PASSW0RD, 0=zero]

system> readlog
1 I 04/07/2016 21:30:18.83 Management Controller SN# 06YAVH6 reset was initiated by user USERID.
2 I 04/07/2016 21:29:46.212 The Platform Event Log on system SN# 06YAVH6 cleared by user USERID.
3 I 04/07/2016 21:29:44.54 The Audit Event Log on system SN# 06YAVH6 cleared by user USERID.
system> readlog
1 I 04/07/2016 21:30:18.83 Management Controller SN# 06YAVH6 reset was initiated by user USERID.
2 I 04/07/2016 21:29:46.212 The Platform Event Log on system SN# 06YAVH6 cleared by user USERID.
3 I 04/07/2016 21:29:44.54 The Audit Event Log on system SN# 06YAVH6 cleared by user USERID.

07. Log into mgmt002

08. Connect to the IMM on mgmt001
[mgmt002]# telnet mgmt001st001imm [USERID / PASSW0RD, 0=zero]

09. Clear the IMM Log
system> clearlog
Log cleared successfully!!

10. Reboot the IMM (this does not affect user operations)
system> resetsp
Submitting reset request.. Reset done.

NOTE: This may lock up your current session and you may need to open a new one.

11. From mgmt002 ping mgmt001's IMM
[mgmt002]# ping mgmt001st001imm

12. Verify that you can log in to the IMM and that the log is cleared (you should see only a few recent events, even when running readlog multiple times).
[mgmt002]# telnet mgmt001st001imm [USERID / PASSW0RD, 0=zero]

system> readlog
1 I 04/07/2016 21:30:18.83 Management Controller SN# 06YAVH6 reset was initiated by user USERID.
2 I 04/07/2016 21:29:46.212 The Platform Event Log on system SN# 06YAVH6 cleared by user USERID.
3 I 04/07/2016 21:29:44.54 The Audit Event Log on system SN# 06YAVH6 cleared by user USERID.
system> readlog
1 I 04/07/2016 21:30:18.83 Management Controller SN# 06YAVH6 reset was initiated by user USERID.
2 I 04/07/2016 21:29:46.212 The Platform Event Log on system SN# 06YAVH6 cleared by user USERID.
3 I 04/07/2016 21:29:44.54 The Audit Event Log on system SN# 06YAVH6 cleared by user USERID.
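
Steps 06 and 12 are visual checks of the readlog output. The same check can be sketched in Python, assuming the output of a single readlog run has been captured as text. The names parse_readlog and log_looks_cleared are hypothetical, the entry layout is inferred only from the sample output above, and the max_entries threshold is an assumption rather than a documented value.

```python
import re

# One readlog entry as it appears in the sample output above:
#   <index> <severity> <MM/DD/YYYY> <HH:MM:SS.fraction> <message>
ENTRY = re.compile(
    r"^(\d+)\s+([A-Z])\s+(\d{2}/\d{2}/\d{4})\s+(\d{2}:\d{2}:\d{2}\.\d+)\s+(.+)$"
)

def parse_readlog(text):
    """Return (index, severity, date, time, message) tuples for entry lines."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:  # prompt lines such as "system> readlog" do not match
            index, severity, date, time, message = match.groups()
            entries.append((int(index), severity, date, time, message))
    return entries

def log_looks_cleared(text, max_entries=5):
    """Heuristic: only a few entries, one of which records the log being cleared.

    max_entries is an assumed threshold for "a few recent events".
    """
    entries = parse_readlog(text)
    cleared = any("cleared by user" in entry[4] for entry in entries)
    return cleared and len(entries) <= max_entries
```

A result of False is only a prompt to inspect the log by hand; it does not by itself indicate a hardware fault.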


Document Information

Modified date:
17 June 2018

UID

ssg1S1009955