IBM Support

AIX NPIV Client Boot Hanging with Error Code 554 due to max_xfer_size

Troubleshooting


Problem

If the max_xfer_size attribute of the Virtual Fibre Channel client adapter is set too high, the next boot of the AIX NPIV client can hang with LED 554.

Symptom

The AIX NPIV client boot hangs with LED 554.

Cause

This boot error occurs when max_xfer_size on the client's virtual Fibre Channel adapter is set higher than max_xfer_size on the physical Fibre Channel adapter of the VIOS.

Environment

AIX & NPIV

Diagnosing The Problem

To confirm that the boot failure is caused by a max_xfer_size value that is too high on the AIX client, perform a debug boot of the AIX client.

To enable boot debug, follow the instructions at this link:

http://www-01.ibm.com/support/docview.wss?uid=isg3T1019037

Then look for the following message in the boot output: "open of /dev/fscsi0 returned 61"

0 > boot -s verbose -s debug

Elapsed time since release of system processors: 1608539 mins 51 secs
-------------------------------------------------------------------------------
Welcome to AIX.
boot image timestamp: 12:57:03 08/25/2017
The current time and date: 10:47:47 06/04/2018
processor count: 10; memory size: 4096MB; kernel size: 38214731
boot device: /vdevice/vfc-client@30000003/disk@5005076802101490,0000000000000000
....
exec(/bin/sh,-c,/usr/lib/methods/cfgefscsi -1 -l fscsi0){4784538,3932556}
exec(/usr/lib/methods/cfgefscsi,-1,-l,fscsi0){4784538,3932556}
----------------
Completed method for: fscsi0, Elapsed time = 0
Return code = 61
*** no stdout ****
***** stderr *****
MS 4784538 3932556 /usr/lib/methods/cfgefscsi -1 -l fscsi0
M4 4784538 Parallel mode = 1
M4 4784538 Get CuDv for fscsi0
M4 4784538 Get device PdDv, uniquetype=driver/vionpiv/efscsi
M4 4784538 Get parent CuDv, name=fcs0
M4 4784538 ..make_dvc_available()
M4 4784538 ..get_dvdr_name()
M4 4784538 driver: efscsidd
M4 4784538 Major number: 17
M4 4784538 Minor number: 1
M4 4784538 ..mk_sp_file()
M4 4784538 Calling build_dds()
M4 4784538 ..get_attr_list()
M4 4784538 Get PdAts for 'uniquetype = driver/vionpiv/efscsi'
M4 4784538 Get CuAts for 'name = fscsi0'
M4 4784538 ..get_attr_list()
M4 4784538 Get PdAts for 'uniquetype = adapter/vdevice/IBM,vfc-client'
M4 4784538 Get CuAts for 'name = fcs0'
M4 4784538 Attr cancel_to found
M4 4784538 Attr tg_cancel_to found
M4 4784538 Attr nport_log_to not found
M4 4784538 Attr pdisc_to not found
M4 4784538 Attr prlog_to not found
M4 4784538 Attr reset_to not found
M4 4784538 Attr gen_req_to not found
M4 4784538 Attr rft_id_to not found
M4 4784538 Attr aopen_to not found
M4 4784538 Attr linkup_to not found
M4 4784538 Attr update_vport_to not found
M4 4784538 Attr starve_dly not found
M4 4784538 Attr fc_class not found
M4 4784538 Attr sw_fc_class found
M4 4784538 Attr nm_fc_class not found
M4 4784538 Attr fc_err_recov found
M4 4784538 Attr dyntrk found
M4 4784538 Attr prli_delay not found
M4 4784538 Attr cmd_delay not found
M4 4784538 Attr gid_pn_delay not found
M4 4784538 Attr num_cmd_elems found
M4 4784538 Attr intr_priority found
M4 4784538 Attr retry_count not found
M4 4784538 Attr relogin not found
M4 4784538 Attr max_events not found
M4 4784538 ..handle_vpd()
M4 4784538 Setting CuDv to AVAILABLE
M4 4784538 ..get_attr_list()
M4 4784538 Get PdAts for 'uniquetype = driver/vionpiv/efscsi'
M4 4784538 Get CuAts for 'name = fscsi0'
M0 4784538 cfgefscsi.c 1085 open of /dev/fscsi0 returned 61
M0 4784538 cfgcommon.c 355 define_children returned rc=47
M4 4784538 Exit code = 61

Method error (/usr/lib/methods/cfgefscsi -1 -l fscsi0 ):
0514-061 Cannot find a child device.
----------------
+ /usr/lib/methods/showled 0x511 DEV CFG 1 END
exec(/usr/lib/methods/showled,0x511,DEV CFG 1 END){3932560,3867002}
showled + + bootinfo -b
exec(/usr/sbin/bootinfo,-b){3932562,3867002}
exec(/usr/lib/boot/bin/bootinfo_chrp,-b){3932562,3867002}
dvc=
+ [ ! ]
+ bootinfo -U
exec(/usr/sbin/bootinfo,-U){3932564,3867002}
+ [ 0 -eq 1 ]
+ loopled 0x554 UNKNOWN BOOTDISK
exec(/usr/lib/methods/showled,0x554,UNKNOWN BOOTDISK){3932566,3867002}
showled
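
If the console output of the debug boot has been captured to a file, the message can be located with grep instead of reading through the whole trace. The following is only a minimal sketch, not an IBM-supplied tool: the function name check_boot_log and the log file are hypothetical, and matching any fscsiN instance rather than only fscsi0 is an assumption.

```shell
# Report whether a captured boot-debug log contains the cfgefscsi open failure.
# check_boot_log is a hypothetical helper; the log file is whatever your
# console capture produced.
check_boot_log() {
    log_file="$1"
    # Assumption: the same message can appear for any fscsiN instance.
    pattern='open of /dev/fscsi[0-9][0-9]* returned 61'
    # Print matching lines with their line numbers, if any.
    if grep -n "$pattern" "$log_file"; then
        echo "boot failure signature found"
        return 0
    fi
    echo "boot failure signature not found"
    return 1
}
```

For example, "check_boot_log console.log" prints each matching line with its line number, followed by a one-line verdict.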

Resolving The Problem

Boot the AIX client partition in maintenance mode, then change max_xfer_size on the client to a value equal to or lower than the one configured on the VIOS physical adapter.

To check the value in effect on the VIOS, first identify the vfchost adapter that serves the client. In this example, it is vfchost5:

$ lsmap -vadapter vfchost5 -npiv

-----------------------------------------------------------------------------

Name          Physloc                            ClntID ClntName       ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost5      U9117.MMB.10014EP-V2-C25                3 LPAR_NAME      AIX


Status:LOGGED_IN
FC name:fcs0 FC loc code:U78C0.001.DQD9443-P2-C1-T1
Ports logged in:11
Flags:a<LOGGED_IN,STRIP_MERGE>
VFC client name:fcs1 VFC client DRC:U9117.MMB.10014EP-V3-C3


Then, on the VIOS, check the setting of the physical adapter (fcs0 in this example, from the "FC name" field) in both the ODM and the kernel:

# lsattr -El fcs0
....
max_xfer_size 0x200000 Maximum Transfer Size True

# echo "efcs -d fcs0" | kdb | grep max_xfer_size
int max_xfer_size = 0x200000


NOTE: If the two values do not match, the ODM setting has been changed but the VIOS has not been rebooted since, so the kernel is still using the previous value.
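
The ODM and kernel comparison can also be scripted. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the parsing assumes the lsattr and kdb output formats shown above. The extraction helpers read command output on standard input, so they can be tried against saved output on any system.

```shell
# Extract max_xfer_size from "lsattr -El fcsN" output read on stdin.
odm_xfer_size() {
    awk '$1 == "max_xfer_size" { print $2; exit }'
}

# Extract max_xfer_size from kdb "efcs -d fcsN" output read on stdin.
# Assumption: the value is the last field on the line, as in the sample above.
kernel_xfer_size() {
    awk '/max_xfer_size/ { print $NF; exit }'
}

# Succeed if the two values are numerically equal.
# printf '%d' converts hex strings such as 0x200000 to decimal.
xfer_sizes_match() {
    [ "$(printf '%d' "$1")" -eq "$(printf '%d' "$2")" ]
}

# Possible use on the VIOS (not executed here):
#   odm=$(lsattr -El fcs0 | odm_xfer_size)
#   krn=$(echo "efcs -d fcs0" | kdb | kernel_xfer_size)
#   xfer_sizes_match "$odm" "$krn" || echo "ODM and kernel differ: VIOS reboot pending"
```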

With the AIX client in maintenance mode, use the chdev command to set max_xfer_size in the client ODM to this value (or a lower one), then perform a normal boot.
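
As an illustration of the chdev step, the sketch below checks that the chosen client value does not exceed the VIOS value, then prints the command rather than running it. The helper name print_client_chdev, the adapter name fcs0, and the use of the -P flag (which records the change in the ODM only) are assumptions for this example. Confirm the adapter name on your client before running the printed command.

```shell
# Print (do not execute) a chdev command for the AIX client, after checking
# that the requested value is not higher than the VIOS physical adapter value.
# print_client_chdev is a hypothetical helper; fcs0 and 0x200000 are examples.
print_client_chdev() {
    adapter="$1"     # client virtual FC adapter, for example fcs0
    wanted="$2"      # value to set on the client, for example 0x200000
    vios_value="$3"  # value in effect on the VIOS physical adapter
    if [ "$(printf '%d' "$wanted")" -gt "$(printf '%d' "$vios_value")" ]; then
        echo "refusing: $wanted is higher than VIOS value $vios_value" >&2
        return 1
    fi
    # Assumption: -P is used so that only the ODM is updated and the new
    # value takes effect at the next boot.
    echo "chdev -l $adapter -a max_xfer_size=$wanted -P"
}
```

With the values from this document, "print_client_chdev fcs0 0x200000 0x200000" prints a command that sets the client to the same value as the VIOS.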

Another way to address this is to change max_xfer_size on the physical HBA to match the client, but this change requires a VIOS reboot. This option is riskier if any of the client LPARs does not have fully operational multipathing, because that LPAR depends on the paths served by this VIOS.

Product: AIX Enterprise Edition. Platform: AIX. Version: 6.1, 7.1, 7.2.

Document Information

Modified date:
06 December 2019

UID

isg3T1027885