IBM Support

Multipath and SG_DD Command hangs - System x3650 M4

Troubleshooting


Problem

While trying to set up multipath, the x3650 M4 hangs with the ServeRAID M5110e and 512-byte block size (BS) drives. The issue occurs at startup, when multipath is enabled and no redundant path is established: the system hangs while trying to initialize multipath. RAID logs may show "Controller encountered a fatal error and was reset". The sg_dd command with the bs=4096 parameter triggers the same internal read that multipath does and eventually hangs the system.

Resolving The Problem

Source

RETAIN tip: H212297

Symptom

While trying to set up multipath, the x3650 M4 hangs with the ServeRAID M5110e and 512-byte block size (BS) drives. The issue occurs at startup, when multipath is enabled and no redundant path is established.

The system hangs while trying to initialize multipath.

RAID logs may show "Controller encountered a fatal error and was reset".

The sg_dd command with the bs=4096 parameter triggers the same internal read that multipath does and eventually hangs the system.

 

The dmesg command shows the following:

[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
megasas: [ 0]waiting for 1 commands to complete
megasas: [ 5]waiting for 1 commands to complete
megasas: [10]waiting for 1 commands to complete
megasas: [15]waiting for 1 commands to complete
megasas: [20]waiting for 1 commands to complete
megasas: [25]waiting for 1 commands to complete
megasas: [30]waiting for 1 commands to complete
megasas: [35]waiting for 1 commands to complete
megasas: [40]waiting for 1 commands to complete
megasas: [45]waiting for 1 commands to complete
megasas: [50]waiting for 1 commands to complete
megasas: [55]waiting for 1 commands to complete
megasas: [60]waiting for 1 commands to complete
megasas: [65]waiting for 1 commands to complete
megasas: [70]waiting for 1 commands to complete
megasas: [75]waiting for 1 commands to complete
megasas: [80]waiting for 1 commands to complete
megasas: [85]waiting for 1 commands to complete

megaraid_sas: pending commands remain after waiting, will reset adapter.

megaraid_sas: resetting fusion adapter.

megasas: Waiting for firmware to come to ready state

INFO: task sg_dd:2601 blocked for more than 120 seconds.
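
To check a saved dmesg dump for this signature, the log lines above can be matched with a few regular expressions. The following is a minimal sketch, not an IBM tool: the function names and the summary keys are hypothetical, and the patterns are taken only from the excerpt above, so other driver versions may word these messages differently.

```python
import re

# Patterns copied from the log excerpt above. Whitespace is matched loosely
# because the spacing around the bracketed counter varies.
WAIT_RE = re.compile(
    r"megasas:\s*\[\s*(\d+)\]\s*waiting for (\d+) commands to complete"
)
RESET_RE = re.compile(r"megaraid_sas: resetting fusion adapter")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def scan_dmesg(text):
    """Summarise the hang signature found in a dmesg dump (hypothetical helper)."""
    counters = [int(m.group(1)) for m in WAIT_RE.finditer(text)]
    hung_tasks = [m.group(1) for m in HUNG_RE.finditer(text)]
    return {
        # Highest bracketed counter seen in the "waiting for N commands" lines.
        "last_wait_counter": max(counters) if counters else None,
        "adapter_reset": bool(RESET_RE.search(text)),
        "hung_tasks": hung_tasks,
    }


def matches_signature(summary):
    """True when commands stalled, the adapter was reset, and a task hung."""
    return (
        summary["last_wait_counter"] is not None
        and summary["adapter_reset"]
        and bool(summary["hung_tasks"])
    )
```

One way to use it is to save the output of dmesg to a file, read the file into a string, and pass that string to scan_dmesg. A match only indicates that the log resembles this tip; it does not confirm the root cause.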

This issue was seen on the following operating systems:

  • RHEL6.3 GA
  • RHEL6.4 GA
  • RHEL6.5 RC
  • SLES11 SP3 GMC2

Affected configurations

The system may be any of the following IBM servers:

  • System x3650 M4, type 7915, any model

The system is configured with one or more of the following IBM Options:

  • ServeRAID M5110 SAS/SATA Controller Card, Option part number 81Y4481, any replacement part number (CRU)
  • ServeRAID M5110 SAS/SATA Controller for IBM System x (CTO), any replacement part number
  • ServeRAID M5110e SAS/SATA Controller for IBM System x, onboard, any embedded

This tip is not software specific.

The ServeRAID device driver for the RAID controller is affected.

The ServeRAID firmware for the RAID controller is affected.

The following system firmware level(s) are affected: ServeRAID Firmware for Onboard and External RAID Card

The system has the symptom described above.

Workaround

This issue can be avoided by blocklisting the IBM ServeRAID M5110e in multipath.conf, as shown in the Red Hat example below. Note that the section keyword in multipath.conf itself is spelled blacklist.


  blacklist {
      device {
          vendor "IBM"
          product "ServeRAID M5110e"
      }
  }
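
Before restarting, it can be useful to confirm that the workaround entry is actually present in /etc/multipath.conf. The sketch below is a simplified reader for that file, written for illustration only: the function names are hypothetical, it handles only '#' comments and brace-delimited sections, and it compares vendor and product as exact strings, whereas multipath itself treats these values as regular expressions.

```python
import re

# A token is a double-quoted string, a single brace, or a bare word.
TOKEN_RE = re.compile(r'"[^"]*"|[{}]|[^\s{}"]+')


def strip_comment(line):
    """Drop a trailing '#' comment, ignoring '#' inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            return line[:i]
    return line


def blacklisted_devices(conf_text):
    """Return (vendor, product) pairs from device blocks inside blacklist."""
    tokens = []
    for line in conf_text.splitlines():
        tokens.extend(TOKEN_RE.findall(strip_comment(line)))

    found = []
    stack = []      # names of the sections that are currently open
    current = None  # attributes of the blacklist device block being read
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "{":
            i += 1  # stray brace without a section name: skip it
        elif tok == "}":
            if stack == ["blacklist", "device"]:
                found.append((current.get("vendor"), current.get("product")))
                current = None
            if stack:
                stack.pop()
            i += 1
        elif nxt == "{":
            stack.append(tok)  # section opener such as: blacklist {
            if stack == ["blacklist", "device"]:
                current = {}
            i += 2
        elif nxt is None or nxt == "}":
            i += 1  # keyword without a value: nothing to record
        else:
            if current is not None:
                current[tok] = nxt.strip('"')
            i += 2  # keyword and value
    return found


def is_controller_blacklisted(conf_text, vendor="IBM", product="ServeRAID M5110e"):
    """Check for the exact vendor/product pair used in the workaround."""
    return (vendor, product) in blacklisted_devices(conf_text)
```

Because the helper looks for a section named blacklist, a stanza typed with any other section name is reported as missing, which is a cheap way to catch a misspelled keyword.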

Additional information

This issue occurs because the multipath command inquires about a redundant path that is not yet established. The firmware cannot find the path and does not return results to the driver.

When the local IBM ServeRAID is blocklisted, multipath ignores the local Host Bus Adapters (HBAs) and searches only for paths established outside of the server.

The multipath inquiry and the sg_dd command with the bs=4096 parameter hit the same limitation in the drive's physical capability and the ServeRAID firmware capability, none of which is supported until the third quarter of 2014.

The issue can be recreated in either of the following ways:

sg_dd command

      sg_dd blk_sgio=1 if=/dev/sda of=/dev/null bs=4096 count=1

Multipath

  1. Create a RAID 1 array on the M5110e.

  2. Install RHEL 6.4 64-bit as a UEFI install.

  3. Install multipath with rpm -ivh device-mapper-multipath.

  4. Run the following command to enable multipath with the default configuration file and start multipathd:

      mpathconf --enable --with_multipathd y --find_multipaths

  5. Restart the system. It starts normally at this point.

  6. Edit /etc/multipath.conf. In the 'defaults' section, add the following:

      polling_interval 10
      path_grouping_policy multibus
      path_checker readsector0
      failback immediate
      rr_min_io_rq 100
      rr_weight priorities

  7. Restart the service:

      service multipathd restart

  8. Restart the system.
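
To make step 6 concrete, the settings listed there can be rendered as a complete section in multipath.conf syntax. This is only an illustrative sketch: render_section and STEP6_SETTINGS are hypothetical names, and the output shows just the added settings, not any entries that the existing defaults section already contains.

```python
# Settings added in step 6 of the recreate procedure, in the order listed.
STEP6_SETTINGS = [
    ("polling_interval", "10"),
    ("path_grouping_policy", "multibus"),
    ("path_checker", "readsector0"),
    ("failback", "immediate"),
    ("rr_min_io_rq", "100"),
    ("rr_weight", "priorities"),
]


def render_section(name, settings, indent="    "):
    """Render keyword/value pairs as a brace-delimited multipath.conf section."""
    lines = [name + " {"]
    lines += [f"{indent}{key} {value}" for key, value in settings]
    lines.append("}")
    return "\n".join(lines)


print(render_section("defaults", STEP6_SETTINGS))
```

The printed text is meant for comparison against the edited file, not for pasting over an existing defaults section.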

Document Location

Worldwide

Operating System

System x:SUSE Linux Enterprise Server 11

System x Hardware Options:SUSE Linux Enterprise Server 11

System x:SUSE Linux Enterprise Server 11 x86-64

System x Hardware Options:SUSE Linux Enterprise Server 11 x86-64

System x:Red Hat Enterprise Linux 6

System x:Red Hat Enterprise Linux 6 x86-64

System x Hardware Options:Red Hat Enterprise Linux 6

System x Hardware Options:Red Hat Enterprise Linux 6 x86-64

Lenovo x86 servers:Red Hat Enterprise Linux 6

Lenovo x86 servers:Red Hat Enterprise Linux 6 x86-64

Lenovo x86 servers:SUSE Linux Enterprise Server 11

Lenovo x86 servers:SUSE Linux Enterprise Server 11 x86-64

Document Information

Modified date:
30 January 2019

UID

ibm1MIGR-5094908