Troubleshooting
Problem
A recently refreshed appliance incorrectly reports a failed RAID drive when none has failed.
Symptom
The console shows a RAID drive error that appears to come from a recently refreshed appliance.
Here is an example of the event payload:
Nov 27 06:03:53 XXX.XXX.XXX.XXX Monitoring]: [ERROR] [NOT:0500000100] (host IP XXX.XXX.XXX.XXX) Disk Failure - Hardware Monitoring has determined that a disk is in failed state - (Number of Failed Disks: 1) Slot Number: 2 - Failed;
This is the typical disk failure error received on the console.
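When comparing alerts against the appliances on the subnet, it can help to pull the reporting host IP, the failed-disk count, and the slot number out of the event text. The following is a minimal sketch that assumes the payload keeps the shape of the example above; `parse_disk_failure` is a hypothetical helper name and is not part of QRadar.

```shell
# Hypothetical helper (not a QRadar tool): extract key fields from a
# disk-failure event line shaped like the example payload.
parse_disk_failure() {
  local line="$1"
  local host count slot
  # Text between "(host IP " and the next closing parenthesis.
  host=$(printf '%s\n' "$line" | sed -n 's/.*(host IP \([^)]*\)).*/\1/p')
  # Number that follows "Number of Failed Disks: ".
  count=$(printf '%s\n' "$line" | sed -n 's/.*Number of Failed Disks: \([0-9][0-9]*\).*/\1/p')
  # Slot number that is reported as Failed.
  slot=$(printf '%s\n' "$line" | sed -n 's/.*Slot Number: \([0-9][0-9]*\) - Failed.*/\1/p')
  printf 'host=%s failed_disks=%s slot=%s\n' "$host" "$count" "$slot"
}
```

For example, `parse_disk_failure "$(grep 'Disk Failure' events.log | tail -n 1)"` prints the fields of the most recent matching event, where `events.log` is a placeholder for wherever the event text was saved.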
Cause
The machine sending the alert is the older M4 or M5 appliance that was removed from the deployment. It still has the same IP address assigned, is still running, and is advertising its RAID drive problem throughout the subnet.
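One way to confirm that two machines are answering for the same IP address is to look for more than one MAC address in ARP replies. The sketch below is an assumption-laden illustration, not a documented QRadar procedure: it assumes the iputils `arping` output format, in which each reply line carries the responder's MAC address in square brackets, and `check_duplicate_ip` is a hypothetical helper name.

```shell
# Hypothetical helper: read arping-style output on stdin and report how many
# distinct MAC addresses answered. Assumes reply lines look like
#   Unicast reply from <IP> [AA:BB:CC:DD:EE:FF]  0.7ms
check_duplicate_ip() {
  local macs count
  # Extract the bracketed MAC from each reply line, normalize case, de-duplicate.
  macs=$(sed -n 's/.*\[\([0-9A-Fa-f:]\{17\}\)].*/\1/p' | tr 'a-f' 'A-F' | sort -u)
  # Count non-empty lines; grep exits non-zero when the count is 0.
  count=$(printf '%s\n' "$macs" | grep -c .) || true
  if [ "$count" -gt 1 ]; then
    echo "DUPLICATE: $count MAC addresses answered"
    return 1
  fi
  echo "OK: $count MAC address(es) answered"
}
```

A possible invocation from another host on the same subnet is `arping -c 3 -I eth0 <appliance IP> | check_duplicate_ip`, where the interface name `eth0` is an assumption about the local system.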
Environment
The environment is usually a recent refresh or replacement of an appliance in which the new appliance uses the same IP address as the original appliance.
Diagnosing The Problem
Use one of the three usual diagnostic methods shown below.
Run a DSA and review the results:
https://www.ibm.com/support/pages/node/221063#DSA
Or
Collect the IMM or XCC Service Data logs:
IMM Service Data can be collected by logging in to the IMM web interface, going to Service and Support --> Download Service Data, and then clicking Download Now.
Or
XCC Service Data can be collected by logging in to the XCC web interface, looking under the Quick Actions section of the home page, and clicking the Service drop-down link.
Then select Download Service Data.
When the information is gathered, you should be prompted to save the Service data file to your machine.
Or
Run the following command-line utility command to check the appliance's RAID status:
/opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -a0
Results:
When the new appliance is checked, no failing RAID drives are found.
When the old appliance is checked, the failing RAID drive is found.
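To make the comparison between the two appliances repeatable, the per-drive state can be filtered down to the slots that report a failure. This sketch is an illustration only: it uses the MegaCli `-PDList -aALL` physical-drive listing rather than the `-ShowSummary` command above, assumes that listing contains `Slot Number:` and `Firmware state:` lines for each drive, and `failed_slots` is a hypothetical helper name.

```shell
# Hypothetical helper: read MegaCli physical-drive listing text on stdin and
# print the slot number of every drive whose firmware state contains "Failed".
failed_slots() {
  awk -F': *' '
    /^Slot Number:/    { slot = $2 }                     # remember the current slot
    /^Firmware state:/ { if ($2 ~ /Failed/) print slot } # report it if failed
  '
}
```

Run it on each appliance in turn, for example `/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | failed_slots`. Empty output on the new appliance and a slot number on the old one matches the results described above.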
Resolving The Problem
Take at least one of the following three actions on the old appliance:
Replace the failed drive.
Reinstall the OS on the old appliance with a non-QRadar build.
Power off the old appliance to stop the alerts.
Most sites power off the old appliance and then reinstall the OS with a non-QRadar build so that the hardware can be reused as they see fit.
Document Location
Worldwide
Product Synonym
Qradar
Document Information
Modified date:
08 September 2022
UID
ibm16616339