IBM Support

MPIO Paths Don't Automatically Recover

Question & Answer


Question

Why do failed fscsi paths not automatically recover?

Cause

A SAN or fabric event has caused one or more fscsi paths to MPIO disks to fail.

Answer

Prerequisites for Automatic Recovery of Failed MPIO Paths
Automatic recovery of fscsi failed paths to an MPIO hdisk device occurs when the following prerequisites are met:
1. For an open hdisk device, the hdisk’s hcheck_interval value must be greater than zero.
Setting hcheck_interval=0 disables all path health checking.
The hcheck_interval value can be viewed by using the lsattr command for the disk or disks in question:
lsattr -El hdisk# -a hcheck_interval
2. For a closed hdisk device, the dk_closed_path_recovery tunable must be set to “1” (enabled). This setting allows automatic recovery of failed disk paths even when the associated disk is closed and health checks cannot be performed.
The following command can be used to verify the tunable setting:
ioo -o dk_closed_path_recovery
With dk_closed_path_recovery=0, recovery does not occur for closed disks. The fscsi path status reflects the last known state from the time the disk was closed.
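The two prerequisites above can be sketched as a small shell check. This is an illustration only, not an IBM-provided tool: the function names hcheck_value, closed_recovery_value, and recovery_prereq are hypothetical, and the parsing assumes that lsattr prints the attribute value in the second column and that ioo prints the tunable in the form "name = value".

```shell
# Hypothetical helpers; names and output parsing are assumptions, not part of AIX.

# Reads `lsattr -El hdisk# -a hcheck_interval` output on stdin and prints
# the attribute value (assumed to be the second column).
hcheck_value() {
    awk '$1 == "hcheck_interval" { print $2 }'
}

# Reads `ioo -o dk_closed_path_recovery` output on stdin (assumed to look
# like "dk_closed_path_recovery = 1") and prints the value.
closed_recovery_value() {
    awk -F'=' '$1 ~ /dk_closed_path_recovery/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Prints whether the prerequisite for the given disk state is met.
# $1 = "open" or "closed", $2 = hcheck_interval, $3 = dk_closed_path_recovery
recovery_prereq() {
    case "$1" in
        open)
            if [ "$2" -gt 0 ] 2>/dev/null; then echo "met"
            else echo "not met: hcheck_interval=$2"; fi ;;
        closed)
            if [ "$3" = "1" ]; then echo "met"
            else echo "not met: dk_closed_path_recovery=$3"; fi ;;
        *)
            echo "unknown disk state: $1" ;;
    esac
}

# Example use on an AIX host (hdisk2 is a placeholder):
#   h=$(lsattr -El hdisk2 -a hcheck_interval | hcheck_value)
#   c=$(ioo -o dk_closed_path_recovery | closed_recovery_value)
#   recovery_prereq closed "$h" "$c"
```

The open case depends only on hcheck_interval and the closed case only on dk_closed_path_recovery, which mirrors the way the two prerequisites are stated.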
 

Understanding Open and Closed Disks in the context of AIX MPIO

In AIX, the terms "Open" and "Closed" describe the state of an MPIO storage device (hdisk), based on whether it is actively in use by an application, process, or system operation for I/O.

1. Open Disk
An open disk is a storage device actively engaged in I/O operations by an application, process, or system task. For example:
A database application reading from or writing to the disk keeps it open.
A physical volume (hdisk) that is part of an 'active' (varied on) volume group is also considered open.
While an MPIO disk is open, the lsmpio command displays the extended fscsi path status for each path, as defined in the AIX lsmpio man page. The "Clo" status, which indicates a closed fscsi path, does not appear for an open disk.
# lspv
hdisk0          00xxxxxxxxxxxxxx                    rootvg          active

# lsmpio -l hdisk0
name    path_id  status   path_status  parent  connection
hdisk0  0        Enabled  Sel,Opt      fscsi0  50050768102xxxxx,x000000000000
hdisk0  1        Enabled  Non          fscsi0  50050768101xxxxx,x000000000000
2. Closed Disk

A disk is considered "closed" when it is not involved in any active I/O operations. For example:
No applications or processes are accessing the disk.
The disk is not part of an 'active' volume group.
All MPIO disk paths show the extended "Clo" status.
# lsmpio -l hdisk2
name    path_id  status   path_status  parent  connection
hdisk2  0        Enabled  Clo          fscsi0  50050768102xxxxx,x000000000000
hdisk2  1        Enabled  Clo          fscsi0  50050768101xxxxx,x000000000000
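The open and closed cases shown above can be told apart from the lsmpio output itself. The following is a minimal sketch, assuming the column layout shown in the examples (path_status in the fourth column) and device names that begin with "hdisk". The function name disk_state is hypothetical.

```shell
# Hypothetical helper: reads `lsmpio -l hdisk#` output on stdin and prints
# "closed" if every path carries the "Clo" path status, "open" if at least
# one path does not, or "unknown" if no path lines were found.
disk_state() {
    awk '
        $1 !~ /^hdisk/ || NF < 4 { next }       # skip header and blank lines
        {
            total++
            # path_status may be a comma-separated list such as "Sel,Opt"
            if ($4 ~ /(^|,)Clo(,|$)/) closed++
        }
        END {
            if (total == 0)           print "unknown"
            else if (closed == total) print "closed"
            else                      print "open"
        }'
}

# Example use on an AIX host (hdisk2 is a placeholder):
#   lsmpio -l hdisk2 | disk_state
```

The result indicates which prerequisite applies to the disk: hcheck_interval for an open disk, or dk_closed_path_recovery for a closed one.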
Recommendations:
  • The connectivity or communication error that caused the initial fscsi path failure must be corrected first. If connectivity on the fscsi path has not been restored, the PCM auto-recovery mechanisms cannot bring the fscsi path back online.
  • For an open hdisk device, ensure hcheck_interval is set to ‘60’, as recommended in the IBM AIX MPIO Best Practices and Considerations.
  • For a closed hdisk device, consider enabling dk_closed_path_recovery for automatic recovery of Failed fscsi paths. For more information about the dk_closed_path_recovery tunable, see AIX MPIO Closed Path Recovery Feature.
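As a rough illustration of the recommendations above, the following sketch prints, but does not run, a candidate command for a given disk. The function name recommend_cmd is hypothetical. The sketch assumes that hcheck_interval is changed with the usual chdev -l name -a attribute=value form and that the tunable accepts the usual ioo -o name=value form. Verify both against the documentation for your AIX level before running anything.

```shell
# Hypothetical helper: prints (does not execute) a command that would apply
# the recommendation for one disk, so it can be reviewed first.
# $1 = hdisk name, $2 = disk state ("open" or "closed")
recommend_cmd() {
    case "$2" in
        open)
            # Recommended health check interval for open disks
            echo "chdev -l $1 -a hcheck_interval=60" ;;
        closed)
            # System-wide tunable, not a per-disk attribute (assumed syntax)
            echo "ioo -o dk_closed_path_recovery=1" ;;
        *)
            echo "# $1: unknown state '$2', no change suggested" ;;
    esac
}

# Example: review the suggested commands for two placeholder disks
recommend_cmd hdisk0 open
recommend_cmd hdisk2 closed
```

On a disk that is in use, chdev may not be able to change the attribute immediately. In that case the change may need to be deferred, for example with the chdev -P flag, which updates the device configuration database without changing the running device.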


Document Information

Modified date:
19 May 2025

UID

isg3T1024485