IBM Support

LSI mirror RAID failure on Red Hat Enterprise Linux 4 - Servers

Troubleshooting


Problem

The Oracle Grid Control agent process (gcstartup) causes SAS drive failures on IBM BladeCenter and System x servers. Within two minutes of starting gcstartup (/etc/init.d/gcstartup start), a mirrored SAS drive is marked as failed on the LSI controller, the RAID mirror is broken, and the mirror must be recreated. Related messages appear in the syslog, and errors are logged under $ORACLE_HOME/sysman/sysman/log/.

Resolving The Problem

Source

RETAIN tip: H191461

Symptom

The Oracle Grid Control agent process (gcstartup) causes SAS drive failures on IBM BladeCenter and System x servers: when the agent is started, it corrupts the RAID disk mirroring.

Within two minutes of starting gcstartup (/etc/init.d/gcstartup start), a mirrored SAS drive is marked as failed on the LSI controller, and the mirror must be recreated.

Messages similar to the following appear in the syslog:

  Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1
  Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: PhysDisk is now failed
  Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1
  Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: PhysDisk is now failed, out of sync
  Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
  Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: volume is now degraded, enabled

Errors are also logged in the file $ORACLE_HOME/sysman/sysman/log/emagent_perl.trc. The disks then report a fault, and the RAID controller breaks the mirror.
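The syslog messages above follow a fixed pattern, so a system can be checked for this symptom mechanically. The following Python sketch is an illustration only, not an IBM or Oracle tool: the function names find_raid_failures and failures_after_start are hypothetical, the line format is assumed to match the sample messages exactly, and because syslog timestamps omit the year, the caller must supply it.

```python
import re
from datetime import datetime, timedelta

# Assumed line format, taken from the sample messages:
# "Mar 26 14:05:33 tggsoa01 kernel: mptbase: ioc0: PhysDisk is now failed"
MPT_LINE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"kernel: mptbase: (?P<ioc>ioc\d+): (?P<msg>.+)$"
)

# Message fragments that indicate a failed physical disk or a degraded mirror.
FAILURE_MARKERS = ("PhysDisk is now failed", "volume is now degraded")


def find_raid_failures(lines, year):
    """Yield (datetime, host, ioc, message) for mptbase failure/degrade lines.

    Syslog omits the year, so the caller passes it in explicitly.
    """
    for line in lines:
        m = MPT_LINE.match(line.strip())
        if m and any(marker in m.group("msg") for marker in FAILURE_MARKERS):
            when = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            yield when, m.group("host"), m.group("ioc"), m.group("msg")


def failures_after_start(lines, year, agent_start, window=timedelta(minutes=2)):
    """Return failure events logged within `window` after the agent was started."""
    return [
        event
        for event in find_raid_failures(lines, year)
        if agent_start <= event[0] <= agent_start + window
    ]
```

For example, the lines of /var/log/messages (the default syslog file on Red Hat Enterprise Linux 4) could be passed to failures_after_start together with the time gcstartup was started; a non-empty result is consistent with this problem, though it does not by itself prove the cause.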

Affected configurations

The system may be any of the following IBM servers:

  • BladeCenter HS12, type 8014, any model
  • BladeCenter HS12, type 8028, any model
  • BladeCenter HS21 XM, type 1915, any model
  • BladeCenter HS21 XM, type 7995, any model
  • BladeCenter HS21, type 1885, any model
  • BladeCenter HS21, type 8853, any model
  • System x3200 M2, type 4367, any model
  • System x3200 M2, type 4368, any model
  • System x3200, type 4362, any model
  • System x3200, type 4363, any model
  • System x3250 M2, type 4190, any model
  • System x3250 M2, type 4191, any model
  • System x3250 M2, type 4194, any model
  • System x3250, type 4364, any model
  • System x3250, type 4365, any model
  • System x3250, type 4366, any model
  • System x3350, type 4192, any model
  • System x3350, type 4193, any model

This tip is not option specific.

The system is configured with at least one of the following:

  • Red Hat Enterprise Linux 4, any update

Note: This does not imply that the network operating system will work under all combinations of hardware and software.

Please see the compatibility page for more information.

Solution

  1. Update the firmware for the on-board LSI 1064e controller to version 1.20 or higher.
  2. Patch the Oracle Grid Control agent, which is the cause of this issue. Oracle provides a one-off backport for it:
       • Apply Patch 5713547 to AGENT_HOME.

The Oracle reference is Bug 5713547 - STORAGE_REPORT_METRICS.PL IS CORRUPTING RAID DISK MIRRORING.
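Step 1 of the solution depends on comparing the installed controller firmware level against 1.20. This tip does not say how to read the firmware level, so the Python sketch below assumes you already have it as a dotted version string (for example, from the controller BIOS or a management utility) and that each dot-separated component should be compared as an integer. The function names parse_version and firmware_meets_minimum are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '1.20' or '1.24.00.00' into a tuple of ints.

    Assumes every component is purely numeric; anything else raises ValueError.
    """
    return tuple(int(part) for part in text.strip().split("."))


def firmware_meets_minimum(installed, minimum="1.20"):
    """Return True if `installed` is at or above `minimum`, component by component."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '1.20' and '1.20.00.00' compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.3" would sort above "1.20", whereas component-wise it is the older level.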

Workaround

Do not run the Oracle Grid Control agent.

Additional information

There is a known issue with Storage Array Metric Collections.

This issue is specific to the LSI 1064 controller, at all firmware levels. It is currently unknown whether the issue affects Linux levels other than Red Hat Enterprise Linux 4. The issue does not affect other operating systems, such as Microsoft Windows 2000 and 2003.

Document Location

Worldwide

Operating System

BladeCenter:Red Hat Enterprise Linux 4

System x:Red Hat Enterprise Linux 4


Document Information

Modified date:
29 January 2019

UID

ibm1MIGR-5072036