IBM Support

Resolving zEDC pending conditions by using PCI system services

Performing disruptive action using PCI system services

Resolving zEDC pending conditions

 
 

Introduction

 

IBM® zEnterprise Data Compression (zEDC) is a combination of the z/OS® V2.1 zEDC capability and the hardware feature zEDC Express (FC 0420), which became available with the IBM zEnterprise EC12 (zEC12) and the IBM zEnterprise BC12 (zBC12) and is also available on later models except the IBM z15™. zEDC provides compression acceleration that enables customer programs to benefit from hardware-based data compression and decompression. It is designed for high-performance, low-latency compression that reduces processor usage, optimizes the performance of compression-related tasks, improves disk usage, and optimizes cross-platform exchange of data.

 

The zEDC Express feature is a native PCIe adapter on the IBM Z® server, which is managed as two or four Resource Groups. Occasionally, Licensed Internal Code (LIC) updates can require an entire Resource Group and each of its managed adapters to be toggled offline and then online to complete the LIC updates. For environments where reliable compression services are critical to the delivery of service to customers and applications, it is recommended that you have documented procedures to handle the updates for each Resource Group while also maintaining compression services from redundant zEDC adapters in other Resource Groups.

 

This document provides a holistic procedure for doing this. The examples and screen captures were taken from an IBM z13® with zEDC adapters that are configured to two Resource Groups, but they also apply to the IBM z14®, which supports four Resource Groups. The referenced z/OS commands are available on all supported z/OS releases.

Preparing for a PCI System Services LIC update

 

When you prepare for a PCI system services LIC update, typically because of a pending condition from an MCL installation by your IBM support personnel, it is good practice to check for all pending conditions that need to be resolved by using the Manage PCI System Services task.

 

Using your CPC's Support Element (SE), select the CPC > Change Management > View Internal Code Changes Summary task as displayed in the following figure.

image-20240910094129-1

The following figure displays the pending conditions that need to be resolved.

image-20240910094210-2

To see more details about the PCI System Services pending condition, you can click the PCI System Services hyperlink in the View Internal Code Changes Summary window, or select the CPC > Change Management > Manage PCI System Services task as displayed in the following figure.

image-20240910094243-3

The following figure displays the Manage PCI System Services window.

image-20240910094313-4

Use the Select Action pull-down menu and select Update as displayed in the following figure.

image-20240910094358-5

Any FIDs that are online will potentially be disrupted if you proceed. A dialog that warns of disruptive operations is displayed, as shown in the following figure.

image-20240910094431-6

You need to take these FIDs offline before proceeding.

Performing the disruptive action

 

Before you perform the disruptive action, turn the system over to the z/OS operator to configure offline any resources for the target Resource Group.

z/OS operator configures the resources offline

Use the D PCIE z/OS command to display the PCIE resources assigned to any partition listed in the FIDs Online table in the PCIE Resource Group Update task.

image-20240910094601-7

zEDC PFIDs are listed with a Device Type Name of Hardware Accelerator. For any of these zEDC PFIDs, a state of ALLC indicates that the zEDC device is allocated to, or in use by, the z/OS partition and should be configured offline.
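As a rough illustration of that check, the following Python sketch scans captured D PCIE output for Hardware Accelerator rows in a given state. It assumes the row layout matches the D PCIE,PFID example shown later in this document (PFID, device type name, status, then further columns). The function name `pfids_in_state` and the second sample row are hypothetical and are not part of any IBM tooling.

```python
import re

# Assumed row layout: hex PFID, device type name, then a status token
# separated from the name by at least two blanks.
ROW = re.compile(r"^(?P<pfid>[0-9A-F]{4,8})\s+(?P<name>.+?)\s{2,}(?P<status>[A-Z]+)\b")

def pfids_in_state(display_output, status="ALLC", device_type="Hardware Accelerator"):
    """Return the PFIDs of the given device type that are in the given state."""
    matches = []
    for line in display_output.splitlines():
        row = ROW.match(line.strip())
        if row and row["name"] == device_type and row["status"] == status:
            matches.append(row["pfid"])
    return matches

# The first data row is taken from this document; the second is made up.
SAMPLE = """\
PFID DEVICE TYPE NAME           STATUS  ASID  JOBNAME   PCHID  VFN
0030 Hardware Accelerator       ALLC    0012  FPGHWAM   05D0   0001
0031 Hardware Accelerator       STNBY
"""

if __name__ == "__main__":
    print(pfids_in_state(SAMPLE))  # PFIDs that still need to be configured offline
```

Calling the same helper with status="STNBY" gives a quick way to confirm which PFIDs are already in standby.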

To configure PFIDs offline, use the CONFIGURE PFID z/OS command, which can be abbreviated as CF PFID. For example, to configure PFID 30 offline use the following command.

 

 

CF PFID(030),OFF,FORCE 

 

 

Note: FORCE is required; without it, the PFID will not be taken offline.

 

 

 

 

CF PFID(030),OFF

IEE148I PFID(30) NOT RECONFIGURED - PCI FUNCTION CURRENTLY IN USE

IEE712I CONFIG PROCESSING COMPLETE

 

CF PFID(030),OFF,FORCE

IEE505I PFID(30),OFFLINE

IEE712I CONFIG PROCESSING COMPLETE

 

 

Repeat the command for all zEDC PFIDs in the affected Resource Group.

 

 

CF PFID(031),OFF,FORCE

IEE505I PFID(31),OFFLINE

IEE712I CONFIG PROCESSING COMPLETE
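If many PFIDs or partitions are involved, the command strings can be generated rather than typed. The following Python sketch only builds the text of the CF PFID commands in the form used in this document; it does not issue them. The helper name `cf_pfid_commands` is hypothetical, and the three-digit hexadecimal operand format is an assumption copied from the examples above.

```python
def cf_pfid_commands(pfids, online=False, force=True):
    """Build one CF PFID command string per PFID (PFIDs given as hex strings)."""
    commands = []
    for pfid in pfids:
        # Normalize to three hex digits, as in CF PFID(030) above (assumed format).
        operand = f"PFID({int(pfid, 16):03X})"
        if online:
            commands.append(f"CF {operand},ON")
        else:
            suffix = ",FORCE" if force else ""
            commands.append(f"CF {operand},OFF{suffix}")
    return commands

if __name__ == "__main__":
    for command in cf_pfid_commands(["0030", "0031"]):
        print(command)
```

Passing online=True produces the matching CF PFID(nnn),ON commands for the step that brings the resources back after the update.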

 

 

Verify the PFID states by using the z/OS D PCIE command to confirm that they are now in STNBY mode, as displayed in the following figure.

image-20240910094736-8

Verify that the PFID is physically stopped by observing its PCHID status on the SE, as displayed in the following figure.

image-20240910094807-9

Now that the resources are configured off to all z/OS partitions and are in standby mode, you can perform the disruptive action.

Resource Group update 

 

From the PCI Resource Group Update window, select the Permit disruption of operations for FIDs which remain in the Online state option, then click OK to proceed:

image-20240910094903-10

The LIC update is started. During this time, the PCI Support Partition LIC is updated and any pending zEDC MCLs are installed as displayed in the following figures.

image-20240910094956-11

image-20240910095010-12

After the update completes, the PCHID remains in Standby state as displayed in the following figure.

image-20240910095042-13

Verify that the pending conditions are resolved, as displayed in the following figure.

image-20240910095129-14

z/OS operator configures the resources online

 

Next, the z/OS operator configures the resources online, as shown in the following example.

 

 

CF PFID(030),ON

IEE504I PFID(30),ONLINE

IEE712I CONFIG PROCESSING COMPLETE

 

CF PFID(031),ON

IEE504I PFID(31),ONLINE

IEE712I CONFIG PROCESSING COMPLETE

 

 

Verify that the PFID status is back to ALLC mode by using the z/OS D PCIE command again, as displayed in the following figure.

image-20240910095210-15

Confirm that the PFID is physically online by using the SE PCHID display, as shown in the following figure.

image-20240910095240-16

 Another way to confirm that the new LIC level is active is to display its code level and click Advanced Facilities as displayed in the following figures.

image-20240910095326-17

The z/OS D PCIE,PFID command provides device-specific information that can change as a result of a LIC update, as shown in the following example.

 

 

D PCIE,PFID(30) 

RESPONSE=SC90 

IQP024I 14.17.15 DISPLAY PCIE 465 

PCIE 0011 ACTIVE 

PFID DEVICE TYPE NAME           STATUS  ASID  JOBNAME   PCHID  VFN 

0030 Hardware Accelerator       ALLC    0012  FPGHWAM   05D0   0001 

CLIENT ASIDS: NONE 

Application Description: zEDC Express 

Device State: Ready 

Adapter Info - Relid: 000000 Arch Level: 03  Build Date: 07/22/2013 Build Count: 00
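To see what changed across a LIC update, the Adapter Info line can be captured before and after and then compared. The following Python sketch is one possible way to do that, assuming the line keeps the "label: value" layout shown above. The function names and the "after" sample line are hypothetical; the changed build date is a made-up value for illustration only.

```python
import re

def parse_adapter_info(display_output):
    """Extract the label/value pairs from the 'Adapter Info' line, if present."""
    for line in display_output.splitlines():
        if line.strip().startswith("Adapter Info"):
            fields = line.split("-", 1)[1]
            pairs = re.findall(r"([A-Za-z ]+?):\s*(\S+)", fields)
            return {label.strip(): value for label, value in pairs}
    return {}

def changed_fields(before, after):
    """Return {label: (old, new)} for every label whose value differs."""
    return {k: (before.get(k), after.get(k))
            for k in sorted(set(before) | set(after))
            if before.get(k) != after.get(k)}

# BEFORE is the line from this document; AFTER is a made-up example.
BEFORE = "Adapter Info - Relid: 000000 Arch Level: 03  Build Date: 07/22/2013 Build Count: 00"
AFTER = "Adapter Info - Relid: 000000 Arch Level: 03  Build Date: 01/15/2021 Build Count: 00"

if __name__ == "__main__":
    print(changed_fields(parse_adapter_info(BEFORE), parse_adapter_info(AFTER)))
```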

 

 

Disrupting zEDC service when PFIDs are not taken offline  

 

Note: The examples and messages in this section were captured from different z/OS partitions, with different PFIDs, than the previous examples.

 

When zEDC PFIDs are not gracefully taken offline before a Resource Group LIC update starts, the z/OS partition issues the following messages when the LIC update happens.

 

 

TPN 13:37:24.79  FPG003I DEVICE RECOVERY INITIATED FOR PFID 00000008

                 REASON: EXCESSIVE DEVICE BUSY CONDITION                

                        16 REQUESTS ARE PENDING COMPLETION              

                 ACTION TAKEN: HARDWARE DUMP REQUESTED                   

TPN 13:37:32.77  FPG004I DEVICE RECOVERY UNSUCCESSFUL FOR PFID 00000008

                 REASON: ASYNCHRONOUS RECOVERY HAS NOT COMPLETED             

                 PREVIOUS ACTION: HARDWARE DUMP REQUESTED                    

                 NEW ACTION: REALLOCATE DEVICE                                

TPN 13:38:14.36  FPG004I DEVICE RECOVERY UNSUCCESSFUL FOR PFID 00000008

                 REASON: DEVICE ALLOCATION FAILED                                       

                 PREVIOUS ACTION: REALLOCATE DEVICE                                     

                 NEW ACTION: INTERVENTION REQUIRED         

                             

TPN 13:38:14.36  FPG002E INTERVENTION REQUIRED FOR PFID 00000008, RECOVERY ACTIONS HAVE

                  FAILED

 

 

When the PFID is restored, the following message is displayed.

 

 

TPN 13:39:56.98  IQP034I PCIE FUNCTION 00000006 AVAILABLE FOR CONFIGURATION

                 PCIE DEVICE TYPE NAME = (Hardware Accelerator    )

 

 

For cases where no PFIDs remain available during the LIC update because redundant PFIDs were not defined or allocated, the following messages are displayed:

 

 

JH0 13:37:25.12  FPG003I DEVICE RECOVERY INITIATED FOR PFID 0000000F

                 REASON: EXCESSIVE DEVICE BUSY CONDITION                

                         1 REQUESTS ARE PENDING COMPLETION              

                 ACTION TAKEN: HARDWARE DUMP REQUESTED                   

JH0 13:37:27.38  FPG003I DEVICE RECOVERY INITIATED FOR PFID 00000001

                 REASON: QUEUED REQUESTS ARE NOT STARTING               

                         1 REQUESTS ARE PENDING COMPLETION              

                 ACTION TAKEN: HARDWARE DUMP REQUESTED                  

                 DIAG INFO: 0000C800 92C90002 80010034 00000000 00000000 

JH0 13:37:33.42  FPG004I DEVICE RECOVERY UNSUCCESSFUL FOR PFID 0000000F

                 REASON: ASYNCHRONOUS RECOVERY HAS NOT COMPLETED            

                 PREVIOUS ACTION: HARDWARE DUMP REQUESTED                   

                 NEW ACTION: REALLOCATE DEVICE                               

JH0 13:37:35.42  FPG004I DEVICE RECOVERY UNSUCCESSFUL FOR PFID 00000001

                 REASON: ASYNCHRONOUS RECOVERY HAS NOT COMPLETED           

                 PREVIOUS ACTION: HARDWARE DUMP REQUESTED                  

                 NEW ACTION: REALLOCATE DEVICE

JH0 13:38:48.41  FPG004I DEVICE RECOVERY UNSUCCESSFUL FOR PFID 0000000F

                 REASON: DEVICE ALLOCATION FAILED                                       

                 PREVIOUS ACTION: REALLOCATE DEVICE                                     

                 NEW ACTION: INTERVENTION REQUIRED                                      

 

JH0 13:38:48.41  FPG002E INTERVENTION REQUIRED FOR PFID 0000000F, RECOVERY ACTIONS HAVE 

                  FAILED                                                                  

JH0 13:38:45.91  IEA045I AN SVC DUMP HAS STARTED AT TIME=13.38.45 DATE=12/02/2020

                 FOR ASIDS(0015,0014)                                                 

                 ERROR ID = SEQ16988 CPU00 ASID0015 TIME13.38.44.9                    

                 QUIESCE = YES                                                         

JH0 13:38:48.93  IEA794I SVC DUMP HAS CAPTURED: 598                              

                 DUMPID=002 REQUESTED BY JOB (FPGHWAM )                          

                 DUMP TITLE=COMPON=IQP,COMPID=SCIQP,ISSUER=IQPMIPCE,MODULE=IQPPR 

                            ALL+0EC0,ABEND=S0111,REASON=0D0C0531                 

                 DUMP CAPTURED USING OPTIMIZE=YES                                

 

 

JH0 13:39:51.99  FPG004I DEVICE RECOVERY UNSUCCESSFUL FOR PFID 00000001            

                 REASON: DEVICE ALLOCATION FAILED                                      

                 PREVIOUS ACTION: REALLOCATE DEVICE                                    

                 NEW ACTION: INTERVENTION REQUIRED   

                                  

JH0 13:39:51.99  FPG002E INTERVENTION REQUIRED FOR PFID 00000001, RECOVERY ACTIONS HAVE

                  FAILED                                                               

 

 

 

One potential impact is that compression of SMF records being written fails when the last PFID is taken away. The following message is displayed.

 

JH0 13:39:57.00  IFA730E COMPRESSION FAILED FOR SMF                         

                  FOR IFASMF.SMF70T79.JH0                                            

 

 

zFS file system compression failure is another potential impact when the last PFID is taken away. The following message is displayed.

 

 

JH0 13:38:11.38 *IOEZ00605I Task asid=02C7 tcb=007BEE88 appears to be delayed

JH0 13:38:11.38 *IOEZ00524I zFS has a potentially hanging thread caused by: asid=x003C

                 tcb=007B66C8, asid=x0013 tcb=00000000.

JH0 13:38:11.38  IOEZ00340E Potential zFS hang detected. Taking informational dump...  

 

JH0 13:38:11.40  IEA045I AN SVC DUMP HAS STARTED AT TIME=13.38.11 DATE=12/02/2020 

                 FOR ASIDS(0013,003C)                                                

                 QUIESCE = YES                                                        

JH0 13:38:15.74  IEA794I SVC DUMP HAS CAPTURED:

                 DUMPID=001 REQUESTED BY JOB (OMVS    )

                 DUMP TITLE=zFS abend 02C3 EA580341 Dec  2 18:38:38 -- HANG DETE

                            CTED

                 DUMP CAPTURED USING OPTIMIZE=YES

 

 

These types of errors, while most often transient, might have an impact on in-flight z/OS compression tasks, causing data access failures, slow response, or even hangs.

 

PFID restored after disruption

 

When the PFID is restored, both zFS and SMF can resume using the zEDC services for new hardware compression requests.

 

 

JH0 13:39:58.50  FPG005I DEVICE RECOVERY SUCCESSFUL FOR PFID 00000003    

 

JH0 13:40:00.01  FPG005I DEVICE RECOVERY SUCCESSFUL FOR PFID 00000007    

 

JH0 13:40:00.41  IFA731I COMPRESSION ACTIVE FOR SMF 775

                 FOR IFASMF.SMF70T79.JH0
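The FPG messages shown in this section follow a recognizable progression per PFID, which lends itself to simple monitoring. The following Python sketch reads console lines and keeps the most recent recovery state for each PFID. It is a minimal illustration based only on the message IDs and layouts that appear in this document; the state labels and the function name `latest_recovery_state` are hypothetical.

```python
import re

# Message IDs and their meaning, taken from the message text in this document.
STATES = {
    "FPG003I": "recovery initiated",
    "FPG004I": "recovery unsuccessful",
    "FPG002E": "intervention required",
    "FPG005I": "recovery successful",
}

# A message ID followed later on the same line by 'PFID' and eight hex digits.
MESSAGE = re.compile(r"\b(FPG\d{3}[IE])\b.*?\bPFID\s+([0-9A-F]{8})")

def latest_recovery_state(console_lines):
    """Return {pfid: state} using the last matching message seen for each PFID."""
    state = {}
    for line in console_lines:
        found = MESSAGE.search(line)
        if found and found.group(1) in STATES:
            state[found.group(2)] = STATES[found.group(1)]
    return state

# Sample lines copied from the examples in this document.
SAMPLE_LOG = [
    "TPN 13:37:24.79  FPG003I DEVICE RECOVERY INITIATED FOR PFID 00000008",
    "TPN 13:37:32.77  FPG004I DEVICE RECOVERY UNSUCCESSFUL FOR PFID 00000008",
    "TPN 13:38:14.36  FPG002E INTERVENTION REQUIRED FOR PFID 00000008, RECOVERY ACTIONS HAVE",
    "JH0 13:39:58.50  FPG005I DEVICE RECOVERY SUCCESSFUL FOR PFID 00000003",
]

if __name__ == "__main__":
    for pfid, state in latest_recovery_state(SAMPLE_LOG).items():
        print(pfid, state)
```

Any PFID left in the "intervention required" state after the update window would be a candidate for operator follow-up.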

 

 

 

Conclusion

The procedures provided in this document can alleviate concerns when you are planning for MCLs that might require additional toggling actions for zEDC. Depending on your installation's preferences, it might help to design automated steps that make the update process less tedious and decrease the time that is needed for deconfiguring and reconfiguring the affected resources in multiple partitions. Following these steps allows you to toggle the PFIDs on zEDC Express without disrupting your zEDC services.

 

             

About this document

This document builds upon the work done by John Troy and Roger Sower in their IBM Service roles by adding a more holistic procedure that combines an SE task and z/OS operations. The authors intend to update this document in the future. IBM publications may be updated to reflect the information contained herein, superseding this document with changes from newer generations of servers and technology.

Other references

IBM Redbooks: Reduce Storage Occupancy and Increase Operations Efficiency with IBM zEnterprise Data Compression

 

Feedback 

Send comments or suggestions for changes to ggobehi@us.ibm.com.

 

Trademarks

A full list of U.S. trademarks owned by IBM may be found at:

https://www.ibm.com/legal/copytrade

 

Acknowledgements 

The authors want to thank the many contributors and reviewers of this document. Special thanks to Christine Axnix and Otto Wohlmuth from IBM, as well as Frank S K Leung.




Document Information

More support for:
IBM Z

Document number:
7167982

Modified date:
21 November 2024

UID

ibm17167982
