A fix is available
APAR status
Closed as program error.
Error description
When a virtual machine is configured to exploit one or more simulated VSwitch NIC devices for network connectivity and also uses a dedicated real QDIO device in a multiple vCPU virtual configuration, the system may terminate with an SXA004 ABEND. A real dedicated QDIO device can be an OSA-Express, HiperSockets, or FCP device. A VAP002 abend is also a symptom of this problem. Other symptoms include a slow or unresponsive system. If a SNAPDUMP is taken, many calls to HCPVAI+9D2 may be present.
Local fix
N/A
Problem summary
****************************************************************
* USERS AFFECTED: All customers using a z/VM VSwitch for guest *
*                 network connectivity in addition to a        *
*                 dedicated QDIO capable device such as an OSA *
*                 Express, HiperSockets, or an FCP Device.     *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
****************************************************************
* RECOMMENDATION: APPLY PTF                                    *
****************************************************************
z/VM may terminate with an SXA004 ABEND, forcing a system IPL to recover, when a virtual machine is using both a VSwitch to provide network connectivity and real dedicated QDIO Devices such as OSA-Express, HiperSockets or FCP.

QDIO devices use a special form of I/O Interruption called an Adapter Interruption (AI) to present completion events for pending networking or SCSI operations. The main difference between a traditional I/O Interruption and an AI is that completion events for multiple devices can be presented with just a single interruption.

The problem is in the logic that merges completion events for multiple devices into a single I/O Interruption, specifically the merging of simulated AIs for a VSwitch device with real hardware generated AIs for a dedicated QDIO Device.

The following virtual machine configuration is exposed to this issue:
1. Configured and using multiple virtual processors (MP CONFIG)
2. One or more virtual NICs coupled to one or multiple VSwitches
3. One or more dedicated FCP, HiperSockets or OSA-Express Devices
4. Both VSwitch and real QDIO Devices must be transferring data
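The four exposure conditions listed above can be expressed as a simple predicate. The following is only an illustrative sketch: `GuestConfig`, its field names, and `is_exposed` are hypothetical and do not correspond to any z/VM command, control block, or API.

```python
from dataclasses import dataclass


@dataclass
class GuestConfig:
    """Hypothetical summary of a guest's configuration (not a z/VM structure)."""
    virtual_cpus: int              # number of virtual processors in use
    vswitch_nics: int              # virtual NICs coupled to any VSwitch
    dedicated_qdio_devices: int    # dedicated FCP, HiperSockets or OSA-Express devices
    vswitch_traffic_active: bool   # VSwitch NICs are transferring data
    dedicated_traffic_active: bool # dedicated QDIO devices are transferring data


def is_exposed(cfg: GuestConfig) -> bool:
    """Return True only when all four conditions from the APAR text hold."""
    return (
        cfg.virtual_cpus > 1
        and cfg.vswitch_nics >= 1
        and cfg.dedicated_qdio_devices >= 1
        and cfg.vswitch_traffic_active
        and cfg.dedicated_traffic_active
    )
```

A guest with a single virtual processor, or one that uses only a VSwitch or only dedicated devices, does not satisfy the predicate.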
Problem conclusion
For a real dedicated QDIO Device, the hardware has the ability to reflect an Adapter Interruption (AI) directly to an operating system running in a virtual machine without z/VM involvement. For a simulated VSwitch Device, z/VM is responsible for generating a simulated AI to report a completion event. Therefore, when running a configuration using both real and simulated devices, a hardware generated AI can inform the operating system of completion events for both real and simulated devices.

To generate a simulated AI, z/VM uses a special AI Virtual Device Block (AIF VDEV) to serialize the presentation of a simulated AI to a virtual machine. The AIF VDEV is only required for a simulated AI, since a real device can pass the AI directly to a virtual machine.

The problem is directly related to how frequently z/VM needs to acquire the AIF VDEV Lock to determine whether it can merge a simulated AI with an already pending real AI. During moderate to high bandwidth data transfers on all QDIO devices, a large number of independent completion event tasks can be queued waiting for an exclusive AIF VDEV Lock. Each task needs the lock to determine whether it can architecturally merge its completion event. If the virtual processor that z/VM selected to present the interruption is slow in processing for any reason while it holds the AIF VDEV Lock, the queue of pending tasks can grow so long that it exhausts all available memory in the System Execution Space, causing the system to ABEND.

Additional logic is added to optimize the merging of multiple completion events into a single event. This is accomplished by eliminating the need to acquire the AIF VDEV Lock when a previous completion event task is already pending to do the work. This ensures that no more than a few tasks are ever waiting for the lock at any point in time.
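The general idea behind the fix, skipping the queueing step when a task is already pending to do the work, can be sketched with a "pending" flag. This is a minimal model of the coalescing technique only, not z/VM's implementation: the class `CoalescingNotifier`, its methods, and the device names are hypothetical, `queued_tasks` merely stands in for tasks waiting on the AIF VDEV Lock, and the small `threading.Lock` guarding the flag is an assumption made for this sketch.

```python
import threading
from collections import deque


class CoalescingNotifier:
    """Hypothetical model of completion-event coalescing (not z/VM code)."""

    def __init__(self):
        self._flag_lock = threading.Lock()  # short-lived guard for the state below
        self._task_pending = False          # True while a presentation task is queued
        self._events = set()                # devices with unreported completion events
        self.queued_tasks = deque()         # stand-in for tasks waiting on the big lock

    def post_completion(self, device):
        """Record a completion event; queue a task only if none is pending.

        Returns True if a new task was queued, False if the event was
        merged into work that an already pending task will perform.
        """
        with self._flag_lock:
            self._events.add(device)
            if self._task_pending:
                return False
            self._task_pending = True
        self.queued_tasks.append(self._present)
        return True

    def _present(self):
        """Drain all recorded events and present them as one merged event."""
        with self._flag_lock:
            merged = sorted(self._events)
            self._events.clear()
            self._task_pending = False
        return merged

    def run_one(self):
        """Run the oldest queued presentation task and return its result."""
        return self.queued_tasks.popleft()()
```

In this model, posting any number of completion events before the presentation task runs leaves exactly one task queued, so the queue length stays bounded regardless of how slowly the presenting processor runs.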
Temporary fix
FOR RELEASE VM/ESA CP/ESA R710 :
PREREQ: VM66302 VM66426
CO-REQ: NONE
IF-REQ: NONE
FOR RELEASE VM/ESA CP/ESA R720 :
PREREQ: VM66426
CO-REQ: NONE
IF-REQ: NONE
Comments
APAR Information
APAR number
VM66487
Reported component name
VM CP
Reported component ID
568411202
Reported release
710
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-01-08
Closed date
2021-01-12
Last modified date
2023-08-28
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UM35807 UM35808
Modules/Macros
HCPVAI HCPVIS HCPVQA
Fix information
Fixed component name
VM CP
Fixed component ID
568411202
Applicable component levels
Fix is available
Document Information
Modified date:
29 August 2023