VIOS_VFC_HOST with rc = 0x00000034

Troubleshooting

Problem

VIOS_VFC_HOST with rc = 0x00000034 is logged in the VIOS error log.

In some cases, the client partition can fail to discover SAN storage.

Symptom


LABEL:		VIOS_VFC_HOST
IDENTIFIER:	95A6D9B9

Date/Time:       Tue Apr  9 17:19:57 2019
Sequence Number: 643665
Machine Id:      00C0E0584C00
Node Id:         <VIOS>
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   vfchost112

Description
Virtual FC Host Adapter detected an error

Probable Causes
Virtual FC Host Adapter Driver is an Undefined State

Failure Causes
Virtual FC Host Adapter Driver is an Undefined State

	Recommended Actions
	Remove Virtual FC Host Adapter Instance, then Configure the same instance

Detail Data
ERNUM
0000 0089                                                                        [....                            ]
ABSTRACT
fp_ioctl() failed for SCIOLSTART
AREA
File System
BUILD INFO
BLD: 1802 12-13:56:38 p2018_07A1
LOCATION
Filename:npiv_utils.c Function:npiv_port_sciolst Line:3369
DATA
rc = 0x00000034	 failure_type = 0x00	 fail_reason_code = 0x00
 fail_reason_exp = 0x00	 einval_arg = 0x00	 login->SCSI_ID = 0x00000000000A0FA1

Cause

1) Based on the Detail Data, SCIOLSTART failed with ESTALE (0x00000034), indicating a stale login on the SAN. This error is commonly due to an issue on the SAN (outside the VIOS), such as a cable move, a bad SFP, or a faulty switch port or cable.

2) The same error is also seen when the switch name server returns the initiator WWPN in the list of targets, causing the initiator to try to log in to itself (which fails with ESTALE). To confirm this, check whether the login->SCSI_ID value in the errpt entry matches the scsi_id of the physical port to which the vfchost is mapped. In the example above, suppose vfchost112 is mapped to physical port fcs4. If fcs4 has SCSI_ID 0x0A0Fxx, it matches this condition. To find the physical port SCSI_ID, first find the child device of fcs4:
$ lsdev -dev fcs4 -child
(Usually this will be fscsi4)
$ lsdev -dev fscsi4 -attr scsi_id
Leading zeros can be dropped, so the output might look like: 0xa0f00
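
The comparison described above can be sketched as a small shell helper. This is only an illustration: the function name same_fc_prefix is hypothetical, and it assumes that "0x0A0Fxx" means the two IDs agree in everything except the last byte. It is plain shell arithmetic, so it can be run on any workstation rather than in the padmin shell.

```shell
# Hypothetical helper: succeed when two Fibre Channel IDs differ only in
# their last byte, that is, they share the same 0x0A0Fxx-style prefix.
# Assumption: this is what "matches" means in the text above.
same_fc_prefix() {
    login_id=$(( $1 ))   # login->SCSI_ID from the errpt Detail Data
    port_id=$(( $2 ))    # scsi_id attribute of the fscsi device
    [ $(( login_id >> 8 )) -eq $(( port_id >> 8 )) ]
}

# Values taken from the sample error and the sample lsdev output.
if same_fc_prefix 0x00000000000A0FA1 0xa0f00; then
    echo "match: the initiator may be logging in to itself"
else
    echo "no match: look for a SAN-side cause"
fi
```

Because both values are evaluated as numbers, differences in leading zeros or letter case do not affect the result.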

Environment

Any VIOS

Diagnosing The Problem

In the sample error, ESTALE (0x00000034) was returned when the VIOS tried to log in to port 0A0FA1. This is the login->SCSI_ID value in the Detail Data, and it identifies the remote port that the login failed against.
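
As a rough sketch of how these two values can be pulled out of a saved errpt entry, the following shell fragment extracts a named field from the DATA lines. The helper name detail_field is hypothetical, and the sample text is copied from the entry above with the tabs replaced by spaces.

```shell
# Hypothetical helper: print the value of one "name = 0x..." field from
# errpt Detail Data read on standard input.
detail_field() {
    sed -n "s/.*$1 = \(0x[0-9A-Fa-f]*\).*/\1/p"
}

# DATA lines from the sample error (whitespace simplified).
data='rc = 0x00000034  failure_type = 0x00  fail_reason_code = 0x00
 fail_reason_exp = 0x00  einval_arg = 0x00  login->SCSI_ID = 0x00000000000A0FA1'

rc=$(printf '%s\n' "$data" | detail_field 'rc')
scsi_id=$(printf '%s\n' "$data" | detail_field 'login->SCSI_ID')

# 0x34 is 52 decimal, the value this document identifies as ESTALE.
echo "rc as decimal: $(( $rc ))"
# Print the remote port as a six-digit hexadecimal value.
printf 'remote port: %06X\n' "$(( $scsi_id ))"
```

The six-digit form of the remote port is usually the most convenient one to hand to the SAN administrator.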

Resolving The Problem

1) Identify the remote port in question (the SCSI_ID value). Then engage your SAN administrator to review the SAN logs for that port around the time the error was logged on the VIOS.

2) Where the error is triggered by the initiator attempting to log in to itself, the solution is to set the hidden attribute sw_prli_rjt=yes on the physical VIOS ports. It is advisable to set this on all ports, even if the errors are not seen on all of them:

As padmin:

$ chdev -dev fcs4 -perm -attr sw_prli_rjt=yes

Because -perm updates only the device database, the VIOS needs to be rebooted to activate the change. Alternatively, a VIOS reboot can be avoided if all the mappings to the physical adapter can be set to Defined state (this assumes the clients have sufficient redundant paths). Continuing with the example of fcs4:

- Make a list of all vfchosts mapped to fcs4:

$ lsmap -npiv -all

- Set each vfchost, and then fscsi4 and its child devices, to Defined state (client paths through fcs4 will fail):

$ rmdev -dev vfchostX -ucfg
$ rmdev -dev fscsi4 -ucfg -recursive

- Set the attribute and make the devices Available again:

$ chdev -dev fcs4 -attr sw_prli_rjt=yes

$ cfgdev

Verify that client paths have recovered before repeating on the remaining adapters.
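
The "make a list" step can be narrowed to a single physical port with a small filter. The sketch below is an assumption-laden illustration: the function name vfchosts_on_port is hypothetical, the input is assumed to be one "vfchost:fcs" pair per line (for example, as produced by lsmap -all -npiv -field name fc -fmt : if your VIOS level formats it that way), and awk may not be available in the restricted padmin shell, so the output might need to be saved and filtered elsewhere.

```shell
# Hypothetical filter: read "vfchost:fcs" pairs on standard input and print
# the vfchost names that are mapped to the physical port given as $1.
vfchosts_on_port() {
    awk -F: -v port="$1" '$2 == port { print $1 }'
}

# Sample input. Only vfchost112 and fcs4 come from this document;
# the other names are made up for illustration.
printf '%s\n' 'vfchost110:fcs0' 'vfchost111:fcs4' 'vfchost112:fcs4' |
    vfchosts_on_port fcs4
```

Each name printed is a candidate for the rmdev -dev vfchostX -ucfg step.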

Related Information

Certain Fibre Adapters in IBM Power Systems are Incorrectly Registered as Target…

Document Location

Worldwide

