Settings for Linux hosts

To ensure path recovery in failover scenarios, certain Device Mapper Multipath (DMMP) settings and udev rules for the attachment of Linux® hosts to the system are recommended. These settings are valid for IBM® System x, all Intel or AMD-based servers, and Power® platforms.

You must restart your host after you complete the following two steps:
  • Editing the multipath settings in /etc/multipath.conf
  • Editing the udev rules for SCSI command timeout

For each Linux distribution and each release within a distribution, refer to the default settings under [/usr/share/doc/device-mapper-multipath.*] for Red Hat® and [/usr/share/doc/packages/multipath-tools] for Novell SuSE. Ensure that the entries that you add to multipath.conf match the format and syntax for your Linux distribution. Use only the multipath.conf file from your own distribution and release. Do not copy the multipath.conf file from one distribution or release to another.

For some operating system levels, polling_interval must be located under the defaults section instead of under the device settings. If polling_interval is present in the device section, comment it out by prefixing the line with a # character.

For example:
Under Device Section
# 		polling_interval 30

Under Defaults Section
defaults {
		user_friendly_names yes
		polling_interval  30
}
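The check described above can be automated. The following is a minimal Python sketch, not a complete multipath.conf parser: it assumes that each section opens with a line such as `device {`, that each closing brace is on its own line, and that # only ever starts a comment. The function name and the sample text are illustrative and are not part of any multipath tool.

```python
import re

SAMPLE = """defaults {
    user_friendly_names yes
    polling_interval 30
}
devices {
    device {
        vendor "IBM"
        product "2145"
        polling_interval 30
    }
}
"""

def find_device_polling_interval(conf_text):
    """Return 1-based line numbers of uncommented polling_interval
    entries that sit inside a device { ... } section."""
    hits = []
    open_sections = []  # names of the sections currently open, outermost first
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments (simplification)
        if not line:
            continue
        opener = re.match(r"^(\w+)\s*\{$", line)
        if opener:
            open_sections.append(opener.group(1))
        elif line == "}":
            if open_sections:
                open_sections.pop()
        elif line.split()[0] == "polling_interval" and "device" in open_sections:
            hits.append(lineno)
    return hits

print(find_device_polling_interval(SAMPLE))
```

Any line number that is reported points to a polling_interval entry that should be commented out in the device section and set under defaults instead.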

Multipath settings for specific Linux distributions and releases

Edit /etc/multipath.conf with the following parameters and confirm the changes by entering:
multipathd -k
multipathd> show config
Red Hat Linux versions 5.x, 6.0, and 6.1
     vendor "IBM"
     product "2145"
     path_grouping_policy "group_by_prio"
     path_selector "round-robin 0"
     prio_callout "/sbin/mpath_prio_alua /dev/%n" #Used by Red Hat 5.x
     prio "alua" # Used by Red Hat 6.0 and 6.1
     path_checker "tur"
     failback "immediate"
     no_path_retry 5
     rr_weight uniform
     rr_min_io 1000
     dev_loss_tmo 120	
Red Hat Linux versions 6.2 and higher
     vendor "IBM"
     product "2145"
     path_grouping_policy "group_by_prio"
     path_selector "round-robin 0" # Used by Red Hat 6.2
     prio "alua"
     path_checker "tur"
     failback "immediate"
     no_path_retry 5
     rr_weight uniform
     rr_min_io_rq "1"
     dev_loss_tmo 120	
Red Hat Linux versions 7.x and 8.x
     vendor "IBM"
     product "2145"
     path_grouping_policy "group_by_prio"
     path_selector "service-time 0" # Used by Red Hat 7.x
     prio "alua"
     path_checker "tur"
     failback "immediate"
     no_path_retry 5
     rr_weight uniform
     rr_min_io_rq "1"
     dev_loss_tmo 120	
SUSE Linux versions 10.x, 11.0, and 11 SP1
     vendor "IBM"
     product "2145"
     path_grouping_policy "group_by_prio"
     path_selector "round-robin 0"
     prio "alua"
     path_checker "tur"
     failback "immediate"
     no_path_retry 5
     rr_weight uniform
     rr_min_io 1000
     dev_loss_tmo 120	
SUSE Linux version 11 SP2
     vendor "IBM"
     product "2145"
     path_grouping_policy "group_by_prio"
     path_selector "round-robin 0" # Used by SLES 11 SP2
     prio "alua"
     path_checker "tur"
     failback "immediate"
     no_path_retry 5
     rr_weight uniform
     rr_min_io_rq "1"
     dev_loss_tmo 120	
SUSE Linux versions 11 SP3+
     vendor "IBM"
     product "2145"
     path_grouping_policy "group_by_prio"
     path_selector "service-time 0" # Used by SLES 11 SP3+
     prio "alua"
     path_checker "tur"
     failback "immediate"
     no_path_retry 5
     rr_weight uniform
     rr_min_io_rq "1"
     dev_loss_tmo 120	
SUSE Linux versions 12+
     vendor "IBM"
     product "2145"
     path_grouping_policy "group_by_prio"
     path_selector "service-time 0" 
     prio "alua"
     path_checker "tur"
     failback "immediate"
     retain_attached_hw_handler "yes"
     no_path_retry 5 # or no_path_retry "fail"
     fast_io_fail_tmo 5
     rr_min_io 1000
     rr_min_io_rq 1
     rr_weight "uniform"	
Ubuntu
     vendor "IBM"
     product "2145"
     path_grouping_policy "group_by_prio"
     path_selector "service-time 0" 
     prio "alua"
     path_checker "tur"
     failback "immediate"
     no_path_retry 5 # or no_path_retry "fail"
     retain_attached_hw_handler "yes"
     fast_io_fail_tmo 5
     rr_min_io 1000
     rr_min_io_rq 1
     rr_weight "uniform"	
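
The parameter lists above omit the enclosing braces. In multipath.conf, per-device settings normally sit in a device block nested inside a devices block. The following Python sketch renders one of the lists in that shape. The helper name is hypothetical, the values are copied from the Red Hat 7.x and 8.x list, and the output should still be compared against the default file shipped with your distribution before use.

```python
def render_device_stanza(settings):
    """Render a mapping of multipath parameters as a
    devices { device { ... } } block."""
    lines = ["devices {", "    device {"]
    for key, value in settings.items():
        lines.append(f"        {key} {value}")
    lines += ["    }", "}"]
    return "\n".join(lines)

# Values copied from the Red Hat 7.x and 8.x list; quotes are kept as shown there.
rhel7_settings = {
    "vendor": '"IBM"',
    "product": '"2145"',
    "path_grouping_policy": '"group_by_prio"',
    "path_selector": '"service-time 0"',
    "prio": '"alua"',
    "path_checker": '"tur"',
    "failback": '"immediate"',
    "no_path_retry": "5",
    "rr_weight": "uniform",
    "rr_min_io_rq": '"1"',
    "dev_loss_tmo": "120",
}

print(render_device_stanza(rhel7_settings))
```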

DM-MPIO for dev_loss_tmo

The dev_loss_tmo setting controls how long the SCSI layer waits, after a problem is detected on an FC port, before it removes the port from the system. If dev_loss_tmo is set to infinity, the SCSI layer can wait up to 2147483647 seconds (about 68 years). The default value is determined by the OS.

All Linux hosts should have a dev_loss_tmo setting. The value, in seconds, is how long the host waits before the device and its paths are pruned. The suggested duration is 120-150 seconds, but longer durations are also supported.

Take care not to set the value too low. Pruned paths must be rediscovered, and if the value is too low, rediscovery might require a manual rescan later. If the inquiry timeout is set correctly, the host should be able to re-add the paths when the SVC nodes are restored.

If the inquiry timeout is too short, such as 20 seconds, the inquiry might time out before the paths are ready.
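
To see which value is currently in effect, the dev_loss_tmo attribute of each FC remote port can be read from sysfs. The following Python sketch assumes that the FC transport exposes the ports under /sys/class/fc_remote_ports, which is the usual location on Linux hosts with FC HBAs. The function names and the 120-second threshold parameter are illustrative, with the threshold taken from the suggested range above.

```python
from pathlib import Path

def read_dev_loss_tmo(base="/sys/class/fc_remote_ports"):
    """Map each FC remote port name to its dev_loss_tmo value in seconds."""
    base_path = Path(base)
    if not base_path.is_dir():
        return {}  # no FC transport on this host, or a different sysfs layout
    values = {}
    for attr in sorted(base_path.glob("rport-*/dev_loss_tmo")):
        values[attr.parent.name] = int(attr.read_text().strip())
    return values

def below_minimum(values, minimum=120):
    """Return the ports whose dev_loss_tmo is lower than the suggested minimum."""
    return {port: tmo for port, tmo in values.items() if tmo < minimum}
```

Calling below_minimum(read_dev_loss_tmo()) on a host returns the remote ports that would be pruned sooner than the suggested 120 seconds. An empty result means that either every port meets the minimum or no FC remote ports were found.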

Multipathing driver

If paths are lost and are not automatically restored, you can recover them manually with the following process.

  • If you are using Linux dm-multipath as the multipathing software on the host, and have NPIV disabled on the SVC, it may be necessary to rescan ports after each node restores its paths.
  • If you have NPIV disabled on the SVC, additional configuration is required. During the upgrade, an SVC with NPIV disabled will shut down the ports for an extended period of time, which may cause Linux to remove the ports.
  • This setting may be applied to a running system; however, it must also be applied to the GRUB configuration to ensure that it persists across reboots.
  • It is possible that, even with this setting, an SVC upgrade may keep the ports down longer than the timeout setting allows. It may be necessary to rescan the ports once the SVC node has restored operation.
  • Use the multipath command to check path status once the SVC node has completed the upgrade, before updating the next node:
    # multipath -ll /dev/mapper/mpatha
    mpatha (360050768028211d8b000000000000061) dm-11 IBM,2145
    size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
    |-+- policy='service-time 0' prio=50 status=enabled
    | |- 4:0:1:0  sder    129:48  failed faulty running
    | `- 5:0:1:0  sdfb    129:208 failed faulty running
    `-+- policy='service-time 0' prio=10 status=active
      |- 4:0:0:0  sdem    128:224 active ready running
      `- 5:0:0:0  sdew    129:128 active ready running

In place of mpatha in the example above, choose a device name that matches a device used by the SVC that is being upgraded. The failed and faulty statuses indicate that the paths are still down for Linux. You can also run the multipath -ll command without a device name to list all devices, and then scan the output for failed paths, confirming that the failed paths are on devices that are associated with the SVC upgrade.
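
Scanning the output for failed paths can be scripted. The following Python sketch parses text in the layout of the multipath -ll example above and lists every path that is reported as failed or faulty. The regular expression and the function name are assumptions based on that one example, so output from other multipath-tools versions might need a different pattern.

```python
import re

# host:channel:target:lun, device node, major:minor, then three state words
PATH_LINE = re.compile(
    r"(\d+:\d+:\d+:\d+)\s+(\S+)\s+\d+:\d+\s+(\S+)\s+(\S+)\s+(\S+)"
)

SAMPLE = """mpatha (360050768028211d8b000000000000061) dm-11 IBM,2145
size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=enabled
| |- 4:0:1:0  sder    129:48  failed faulty running
| `- 5:0:1:0  sdfb    129:208 failed faulty running
`-+- policy='service-time 0' prio=10 status=active
  |- 4:0:0:0  sdem    128:224 active ready running
  `- 5:0:0:0  sdew    129:128 active ready running
"""

def failed_paths(multipath_output):
    """Return (map_name, hctl, device) for each failed or faulty path."""
    failed = []
    current_map = None
    for line in multipath_output.splitlines():
        match = PATH_LINE.search(line)
        if match:
            hctl, device, dm_state, checker_state, _online = match.groups()
            if dm_state == "failed" or checker_state == "faulty":
                failed.append((current_map, hctl, device))
        elif " dm-" in line:
            # Map header line, for example: mpatha (<wwid>) dm-11 IBM,2145
            current_map = line.split()[0]
    return failed

for map_name, hctl, device in failed_paths(SAMPLE):
    print(map_name, hctl, device)
```

An empty result after a node upgrade suggests that all paths have returned and that the next node can be updated.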

To rescan all SCSI targets and ports on Red Hat Enterprise Linux or SUSE Linux Enterprise Server, use the rescan-scsi-bus.sh command, which is part of the sg3_utils package.

If your Linux distribution does not include the rescan-scsi-bus.sh command, use the SCSI-rescan command to rescan all the SCSI targets.
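
The choice between the two commands can be scripted. The following Python sketch looks for rescan-scsi-bus.sh on the PATH first and falls back to the alternative command. It assumes that the fallback is installed under the lowercase name scsi-rescan, which is how several distributions ship it. Verify the name on your host. The function names are illustrative, and the rescan itself normally requires root privileges.

```python
import shutil
import subprocess

# Preferred command first. The lowercase fallback name is an assumption.
CANDIDATES = ("rescan-scsi-bus.sh", "scsi-rescan")

def pick_rescan_command(which=shutil.which):
    """Return the first available rescan command as an argument list, or None."""
    for name in CANDIDATES:
        path = which(name)
        if path:
            return [path]
    return None

def rescan(which=shutil.which, run=subprocess.run):
    """Run the available rescan command and raise if none is installed."""
    command = pick_rescan_command(which)
    if command is None:
        raise FileNotFoundError("no SCSI rescan command found on PATH")
    return run(command, check=True)
```

Passing the lookup and run functions as parameters keeps the selection logic testable on machines where neither command is installed.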