Troubleshooting
Problem
The Oracle Grid Control agent (gcstartup) causes SAS drive failures on IBM BladeCenter and System x servers. Within two minutes of the agent starting, a mirrored SAS drive is marked as failed on the LSI controller, and the RAID mirror must be recreated. Related messages appear in the syslog and in the agent's Perl trace file.
Resolving The Problem
Source
RETAIN tip: H191461
Symptom
The Oracle Grid Control process (gcstartup) causes SAS drive failures on IBM BladeCenter and System x servers. Once the process is started, a mirrored SAS drive is marked as failed on the LSI controller, and the mirror must be recreated.
Within two minutes of starting gcstartup (/etc/init.d/gcstartup start), the SAS RAID mirror drive fails.
When the agent is started, it corrupts the RAID disk mirroring.
Messages of the following form appear in the syslog:
    Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1
    Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: PhysDisk is now failed
    Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1
    Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: PhysDisk is now failed, out of sync
    Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
    Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: volume is now degraded, enabled
Errors are also written to the file $ORACLE_HOME/sysman/sysman/log/emagent_perl.trc. The disks then report a fault, and the RAID controller splits the mirror.
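The syslog check above can be automated. The following is a minimal Python 3 sketch, assuming the classic syslog line layout shown in the sample messages; the names find_raid_failures and scan_file are hypothetical, and /var/log/messages is assumed as the default log location on Red Hat systems. It reports only the "PhysDisk is now failed" and "volume is now degraded" lines. Because the system Python on Red Hat Enterprise Linux 4 predates Python 3, it may be easier to copy the log to another host and scan it there.

```python
import re
from typing import Iterable, List, Tuple

# Matches mptbase (Fusion-MPT driver) kernel lines in classic syslog layout, e.g.
# "Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: PhysDisk is now failed"
MPT_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"mptbase:\s+(?P<ioc>ioc\d+):\s+(?P<msg>.*)$"
)

# Message texts that indicate a member disk failed or the volume lost redundancy.
FAILURE_MARKERS = ("PhysDisk is now failed", "volume is now degraded")


def find_raid_failures(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, ioc, message) for each mptbase failure/degrade line."""
    hits = []
    for line in lines:
        m = MPT_LINE.match(line.strip())
        if m and any(marker in m.group("msg") for marker in FAILURE_MARKERS):
            hits.append((m.group("stamp"), m.group("ioc"), m.group("msg")))
    return hits


def scan_file(path: str = "/var/log/messages") -> List[Tuple[str, str, str]]:
    # errors="replace" keeps a stray undecodable byte from aborting the scan.
    with open(path, errors="replace") as fh:
        return find_raid_failures(fh)
```

For example, scan_file("/var/log/messages") returns one tuple per failure line; an empty list means no mirror failure was logged in that file.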
Affected configurations
The system may be any of the following IBM servers:
- BladeCenter HS12, type 8014, any model
- BladeCenter HS12, type 8028, any model
- BladeCenter HS21 XM, type 1915, any model
- BladeCenter HS21 XM, type 7995, any model
- BladeCenter HS21, type 1885, any model
- BladeCenter HS21, type 8853, any model
- System x3200 M2, type 4367, any model
- System x3200 M2, type 4368, any model
- System x3200, type 4362, any model
- System x3200, type 4363, any model
- System x3250 M2, type 4190, any model
- System x3250 M2, type 4191, any model
- System x3250 M2, type 4194, any model
- System x3250, type 4364, any model
- System x3250, type 4365, any model
- System x3250, type 4366, any model
- System x3350, type 4192, any model
- System x3350, type 4193, any model
This tip is not option specific.
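To check a server against the list above, the machine type can be matched programmatically. This is a minimal sketch under a stated assumption: that the product name (for example as returned by dmidecode -s system-product-name, where that option is available) embeds the machine type and model in the form "-[TTTTMMM]-". The product strings and model suffixes used below are illustrative, and the names machine_type and affected_server are hypothetical.

```python
import re
from typing import Optional

# Machine types from the "Affected configurations" list (any model).
AFFECTED_TYPES = {
    "8014": "BladeCenter HS12", "8028": "BladeCenter HS12",
    "1915": "BladeCenter HS21 XM", "7995": "BladeCenter HS21 XM",
    "1885": "BladeCenter HS21", "8853": "BladeCenter HS21",
    "4367": "System x3200 M2", "4368": "System x3200 M2",
    "4362": "System x3200", "4363": "System x3200",
    "4190": "System x3250 M2", "4191": "System x3250 M2", "4194": "System x3250 M2",
    "4364": "System x3250", "4365": "System x3250", "4366": "System x3250",
    "4192": "System x3350", "4193": "System x3350",
}

# Assumed format: 4-digit machine type followed by a 3-character model inside
# square brackets, e.g. "IBM System x3250 M2 -[4190AC1]-". Adjust if yours differ.
MTM = re.compile(r"\[(\d{4})[0-9A-Z]{3}\]")


def machine_type(product_name: str) -> Optional[str]:
    """Extract the 4-digit machine type, or None if the pattern is absent."""
    m = MTM.search(product_name)
    return m.group(1) if m else None


def affected_server(product_name: str) -> Optional[str]:
    """Return the affected server family for this product name, or None."""
    mt = machine_type(product_name)
    return AFFECTED_TYPES.get(mt) if mt else None
```

Because the tip applies to any model, only the machine type is compared; the model suffix is matched but ignored.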
The system is configured with at least one of the following:
- Red Hat Enterprise Linux 4, any update
Note: This does not imply that the network operating system will work under all combinations of hardware and software.
Please see the compatibility page for more information:
Solution
- Update the firmware for the on-board LSI 1064e controller to version 1.20 or higher.
- This issue is caused by the Oracle Grid Control agent. Oracle provides a patch for it:
  - One-off backport: apply Patch 5713547 to AGENT_HOME.
The Oracle reference is Bug 5713547 - STORAGE_REPORT_METRICS.PL IS CORRUPTING RAID DISK MIRRORING.
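Both remedies can be verified with a small check. The sketch below assumes that the controller firmware version is available as a dotted numeric string (for example "01.20.00.00"; on systems using the mptbase driver, /proc/mpt/ioc0/info may report it, but that path is an assumption), and that the output of opatch lsinventory for the agent home lists applied patches on lines beginning with "Patch". The helper names parse_firmware, firmware_ok and patch_applied are hypothetical; confirm both formats on your own system before relying on the result.

```python
import re
from typing import Tuple

MIN_FIRMWARE: Tuple[int, int] = (1, 20)  # "version 1.20 or higher" per the Solution
ORACLE_PATCH = "5713547"                 # Oracle Bug/Patch 5713547


def parse_firmware(version: str) -> Tuple[int, ...]:
    """Turn '1.20', '01.20.00.00' or '1.18.00.00' into an integer tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def firmware_ok(version: str, minimum: Tuple[int, int] = MIN_FIRMWARE) -> bool:
    # Compare only as many components as the minimum specifies (major.minor);
    # int() drops leading zeros, so "01.20" and "1.20" compare equal.
    return parse_firmware(version)[: len(minimum)] >= minimum


def patch_applied(lsinventory_output: str, patch: str = ORACLE_PATCH) -> bool:
    """True if a line starting with 'Patch' mentions the patch number as a whole word."""
    word = re.compile(r"\b" + re.escape(patch) + r"\b")
    return any(
        line.strip().startswith("Patch") and word.search(line)
        for line in lsinventory_output.splitlines()
    )
```

A host that fails both checks still needs either the firmware update or the Oracle patch.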
Workaround
Do not run the Oracle Grid Control agent.
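One way to apply the workaround is to stop the agent and keep it from starting at boot. This sketch rests on two assumptions not confirmed by this tip: that the init script accepts the conventional SysV "stop" argument (only "start" is shown above), and that gcstartup is registered with chkconfig. The name apply_workaround is hypothetical; it only prints the commands unless dry_run is set to False.

```python
import subprocess
from typing import List

# Assumed commands: SysV-style "stop" for the init script, and chkconfig to
# disable the service at boot. Verify both on the affected host first.
WORKAROUND_COMMANDS: List[List[str]] = [
    ["/etc/init.d/gcstartup", "stop"],
    ["/sbin/chkconfig", "gcstartup", "off"],
]


def apply_workaround(dry_run: bool = True) -> List[List[str]]:
    """Print each command; execute it only when dry_run is False."""
    for cmd in WORKAROUND_COMMANDS:
        print(("would run: " if dry_run else "running: ") + " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(cmd, check=True)
    return WORKAROUND_COMMANDS
```

Defaulting to a dry run keeps the script safe to execute on a host where the assumptions have not yet been checked.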
Additional information
There is a known issue with Storage Array Metric Collections.
This issue is specific to the LSI 1064 controller at all firmware levels. It is currently unknown whether the issue affects operating system levels other than Red Hat Enterprise Linux 4. The issue does not affect other operating systems, such as Microsoft Windows 2000 and 2003.
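Since the issue is documented only for the LSI 1064 controller under Red Hat Enterprise Linux 4, a quick check can tell whether a host falls within that scope. This is a minimal sketch assuming the text of /etc/redhat-release (for example "Red Hat Enterprise Linux AS release 4 (Nahant Update 5)") and the output of lspci are passed in as strings, and that lspci names the controller with a "SAS1064" prefix (such as SAS1064E or SAS1064ET). The function names are hypothetical.

```python
import re


def is_rhel4(release_text: str) -> bool:
    """True for RHEL 4 release strings such as AS, ES or WS 'release 4'."""
    return re.search(r"Red Hat Enterprise Linux \w+ release 4\b", release_text) is not None


def has_lsi1064(lspci_text: str) -> bool:
    """Assumes lspci names the controller 'SAS1064', 'SAS1064E' or 'SAS1064ET'."""
    return re.search(r"\bSAS1064", lspci_text) is not None


def matches_known_scope(release_text: str, lspci_text: str) -> bool:
    # Both conditions reported in this tip must hold.
    return is_rhel4(release_text) and has_lsi1064(lspci_text)
```

A False result does not prove a host is safe, because the behavior on other operating system levels is unknown.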
Document Location
Worldwide
Document Information
Modified date: 29 January 2019
UID: ibm1MIGR-5072036