Steps for diagnosing and resolving a zFS hang

About this task

Perform the following steps when a hang condition is suspected.

Procedure

Continually monitor for the following messages:
IOEZ00524I

zFS has a potentially hanging thread that is caused by: UserList, where: UserList is a list of address space IDs and TCB addresses causing the hang.

IOEZ00547I

zFS has a potentially hanging XCF request on systems: Systemnames, where: Systemnames is the list of system names.

To start investigating, if in a sysplex file sharing environment check for message IOEZ00547I (hanging XCF request), which can indicate an XCF issue. If you see this message:
1. Check the status of XCF on each system in the sysplex.
2. Check for any outstanding message that might need a response to determine whether a system is leaving the sysplex or not (for example, IXC402D). The wait for a response to the message might appear to be a zFS hang.
If there is no apparent problem with XCF, continue diagnosis and resolution of the hang by looking for the following messages in syslog or on the operator console. Check each system in the sysplex if applicable.
IOEZ00604I or IOEZ00660I

The delay is outside of zFS. zFS called the identified system service and is waiting for a response. Investigate the identified system service. The problem is likely not with zFS.

IOEZ00605I

The delay is either in zFS or in a system service that zFS did not specifically identify in message IOEZ00604I. zFS cannot determine whether there is a hang, a slowdown, or some other system problem. To take action, look for other symptoms. For example, if you see messages about components that are using a significant amount of auxiliary storage, resolve the auxiliary storage shortage. If the message persists, continue to the next step.
Enter the MODIFY ZFS,QUERY,THREADS command to determine whether any zFS threads are hanging and why.
The type and amount of information that is displayed as a result of this command is for internal use and can vary between releases or service levels. For an example, see Figure 1.
Enter the DISPLAY A,ZFS command to determine the zFS ASID.
Enter MODIFY ZFS,QUERY,THREADS at one to two-minute intervals for six minutes.
Check the output for any user tasks (tasks that do not show the zFS ASID) that are repeatedly in the same state during the time you requested MODIFY ZFS,QUERY,THREADS. If there is a hang, the task that is hanging persists unchanged over the course of this time span. If the information is different each time, there is no hang.
If message IOEZ00581E is highlighted in white on the console, there are or recently were quiesced zFS aggregates. Verify that no zFS aggregates are in the QUIESCED state by checking their status using the zfsadm lsaggr, zfsadm aggrinfo -long, or zfsadm fsinfo command. For example, quiesced aggregates are displayed as follows:
```
DCESVPI:/home/susvpi/> zfsadm lsaggr
IOEZ00106I A total of 1 aggregates are attached
SUSVPI.HIGHRISK.TEST                            DCESVPI   R/W QUIESCE
DCESVPI:/home/susvpi/> zfsadm aggrinfo
IOEZ00370I A total of 1 aggregates are attached.
SUSVPI.HIGHRISK.TEST (R/W COMP QUIESCED): 35582 K free out of total 36000 
DCESVPI:/home/susvpi/>  
```
or
```
DCESVPI:/home/susvpi/> zfsadm aggrinfo susvpi.highrisk.test1.zfs -long
SUSVPI.HIGHRISK.TEST1.ZFS (R/W COMP QUIESCED): 50333 K free out of total 72000
version 1.4
auditfid 00000000 00000000 0000
6289 free 8k blocks; 21 free 1K fragments
720 K log file; 40 K filesystem table
16 K bitmap file
Quiesced by job SUSVPI5 on system DCESVPI on Tue Jan 3 13:36:37 2013   
```
This example shows how to determine which aggregates are quiesced with the owner information.
```
> ./zfsadm fsinfo -select Q
PLEX.DCEIMGNJ.FS4                            DCEIMGNJ RW,RS,Q
PLEX.DCEIMGNK.FS6                            DCEIMGNK RW,RS,Q

Legend: RW=Read-write,Q=Quiesced,RS=Mounted RWSHARE
```
If the hang condition prevents you from issuing shell commands, you can also issue the MODIFY ZFS,QUERY,FILE,ALL command to determine whether any file systems are quiesced. As Figure 1 shows, a quiesced file system is identified by a "Q" in the flg column.

Resolve the QUIESCED state before continuing to the next step. The hang condition message can remain on the console for up to a minute after the aggregate is unquiesced.

Message IOEZ00581E appears on the zFS owning systems that contain at least one zFS aggregate that is quiesced. There is a delay between the time that the aggregate is quiesced and the time that the message appears. Typically, this time delay is about 30 seconds. You can control this time delay by using the IOEFSPRM QUIESCE_MESSAGE_DELAY option. This option allows you to specify that the delay should be longer than 30 seconds before the IOEZ00581E message is first displayed. When there are no quiesced zFS aggregates on the system, this message is removed from the console.

There is also a delay between the time that the last aggregate is unquiesced and the time that the message is removed from the console. This message is handled by a thread that wakes up every 30 seconds and checks for any quiesced aggregates that are owned by this system. It is possible for an aggregate to be quiesced and unquiesced in the 30-second sleep window of the thread and not produce a quiesce message. This message remains if one aggregate is unquiesced and another is quiesced within the 30-second sleep window.

Check whether any user tasks are hung, focusing on the tasks that are identified by message IOEZ00524I or message IOEZ00660I. User tasks do not have the same address space identifier (ASID) as the zFS address space. One or more threads consistently at the same location might indicate a hang (for example, Recov, TCB, ASID Stack, Routine, State). The threads in the zFS address space with the zFS ASID (for example, xcf_server) are typically waiting for work. It is typical for the routine these threads are waiting in to have the same name as the entry routine. For an example, see Figure 1.

If successive iterations of the MODIFY ZFS,QUERY,THREADS command show that the STK/Recov, TCB, ASID, Routine, and State for a thread are constant, it is probable that this thread is hung.

Figure 1. Example of how to check whether user tasks are hung

zFS and z/OS UNIX Tasks                                   
            -----------------------                                   
STK/Recov     TCB     ASID   Stack      Routine               State   
----------  --------  ----  ----------  --------              --------
48338F0000  005CABE8  005A  48338F0700  ZFSRDWR               OSIWAIT 
48000AF8F0                                                            
       since  Oct 14 04:15:57 2014  Current DSA: 48338F2D38           
       wait code location offset=0ACA rtn=allocate_pages              
  wait for resource=7BCC6330 0                                        
       resource description=VNOPS user file cache page reclaim wait   
    ReadLock held for 4823FDBF50 state=2 0                            
       lock description=Vnode-cache access lock                       
  Operation counted for OEVFS=7E7EC190  VOLP=4826660200               
fs=PLEX.ZFS.SMALL1                                                    
                                                                      
48338E8000  005CA1D0  00B8  48338E8810  ZFSCREAT              WAITLOCK
48000B0640                                                            
       since  Oct 14 04:15:57 2014  Current DSA: 48338EB5C8           
       wait code location offset=3D74 rtn=epit4_Allocate             
  lock=48203E30F0 state=80000048000D6AA1 owner=(48000D6AA0 00B85CA830)                                                              
       lock description=ANODETB status area lock                     
    ReadLock held for 4833F0DE50 state=A 0                           
       lock description=Vnode-cache access lock                      
    ReadLock held for 4833F0DEC0 state=8 0                           
       lock description=Vnode lock                                   
    ReadLock held for 482060CC20 state=7 7A94FEF0                    
       lock description=Vnode lock                                   
    ReadLock held for 482606BA00 state=4 0                           
       lock description=Anode fileset handle lock                    
    ReadLock held for 48203E30E0 state=4 0                           
       lock description=ANODETB main update lock                     
    Resource 4833F0DE40 1A held                                      
       resource description=STKC held token by local user task       
    Resource 4826661800 17 held                                      
       resource description=ANODE maximum transactions started for a 
    Resource 4830D68580 2F held                                       
       resource description=Transaction in progress                   
  Operation counted for OEVFS=7AB8DA20  VOLP=4826661A00               
fs=ZFSAGGR.BIGZFS.DHH.FS1.EXTATTR                                     
                                                                      
48338E0000  005C12F8  0084  48338E0700  ZFSRDWR               WAITLOCK
48000B1390                                                            
       since  Oct 14 04:15:57 2014  Current DSA: 48338E23C8           
       wait code location offset=4940 rtn=stkc_getTokenLocked         
  lock=4823F8CFD0 state=5 owner=(2 read holders)                      
       lock description=Vnode-cache access lock                       
  Operation counted for OEVFS=7AB8D1E0  VOLP=4826663200               
fs=ZFSAGGR.BIGZFS.DHH.FS6.EXTATTR                                     
                                                                      
48338D8000  005CAD80  0079  48338D8700  ZFSRDWR               OSIWAIT 
48000B20E0                                                            
       since  Oct 14 04:15:57 2014  Current DSA: 48338DAE38           
       wait code location offset=0ACA rtn=allocate_pages              
  wait for resource=7BCC6330 0                                        
       resource description=VNOPS user file cache page reclaim wait   
    ReadLock held for 4823F49F10 state=A 0                            
       lock description=Vnode-cache access lock                       
  Operation counted for OEVFS=7AB8D1E0  VOLP=4826663200               
fs=ZFSAGGR.BIGZFS.DHH.FS6.EXTATTR                                     
                                                                      
48338D0000  005CAA50  00B7  48338D0810  ZFSCREAT              RUNNING 
48000B2E30                                                            
       since  Oct 14 04:15:57 2014                                    
    ReadLock held for 7E5C2670 state=2 0                              
       lock description=Cache Services hashtable resize lock          
    Resource 4823FF4820 1A held                                       
       resource description=STKC held token by local user task        
    Resource 4826661E00 17 held                                       
       resource description=ANODE maximum transactions started for a  
    Resource 4831569A80 2F held                                       
       resource description=Transaction in progress                   
  Operation counted for OEVFS=7AB8D810  VOLP=4826662000              
fs=ZFSAGGR.BIGZFS.DHH.FS2.EXTATTR                                    
                                                                     
48338C8000  005CABE8  00A6  48338C8700  ZFSRDWR               OSIWAIT
48000B3B80                                                           
       since  Oct 14 04:15:57 2014  Current DSA: 48338CAD38          
       wait code location offset=0ACA rtn=allocate_pages             
  wait for resource=7BCC6330 0                                       
       resource description=VNOPS user file cache page reclaim wait  
    ReadLock held for 4835B3ABD0 state=6 0                           
       lock description=Vnode-cache access lock                      
  Operation counted for OEVFS=7E7EC190  VOLP=4826660200              
fs=PLEX.ZFS.SMALL1                                                   

  7F37B000  005D5528  0044    7F37C000  openclose_task        RUNNING 
       since  Oct 14 03:43:35 2014                                    
                                                                      
  7F3B4000  005F81D0  0044    7F3B5000  CNMAIN                WAITING 
       since  Oct 14 02:58:01 2014                                    

  7BC45000  005C19C0  0044    7BC46000  comm_daemon           RUNNING 
4800004290                                                            
       since  Oct 14 04:15:57 2014

IBM® Support must have dumps of zFS, OMVS and the OMVS data spaces and also possibly the user address space identified on any preceding IOEZ00605 for problem resolution. Obtain and save SYSLOG and dumps of zFS, OMVS and the OMVS data spaces , and the user ASID using JOBNAME=(OMVS,ZFS,user_jobname),DSPNAME=('OMVS'.*) in your reply to the DUMP command. If you are running in a sysplex and zFS is running on other systems in the sysplex, dump all the systems in the sysplex where zFS is running, dumping zFS, OMVS and OMVS data spaces. The following is an example of the DUMP command:
```
DUMP COMM=(zfs hang)
R x,JOBNAME=(OMVS,ZFS),SDATA=(RGN,LPA,SQA,LSQA,PSA,CSA,GRSQ,TRT,SUM,COUPLE),
JOBNAME=(OMVS,ZFS,user_jobname)
DSPNAME=('OMVS'.*),END
```
Do not specify the job name ZFS if zFS is running inside the OMVS address space.

You must capture dumps for IBM Support before taking any recovery actions (HANGBREAK, CANCEL, ABORT).
If you know which user task is hung (for example, returned in IOEZ00524I or determined to be hung after review of the output from repeated MODIFY ZFS,QUERY,THREADS,OLDEST commands), consider entering the CANCEL or STOP command to clear that task from the system.
Finally, if the previous steps do not clear the hang, issue the MODIFY ZFS,ABORT command to initiate a zFS internal restart.
An internal restart causes the zFS kernel (IOEFSKN) to end and then restart, under control of the zFS controller task (IOEFSCM). The zFS address space does not end and the z/OS® UNIX mount tree is preserved. During the internal restart, requests that are already in the zFS address space fail and new requests are suspended. File systems owned by zFS on the system that is doing the internal restart become temporarily unowned. These file systems are taken over by other zFS systems (or by the zFS system doing the internal restart when it completes the internal restart). When the internal restart is complete, the suspended new requests resume.

If you question the hang condition or if the MODIFY ZFS,ABORT command does not resolve the situation, contact IBM Support and provide all the dumps and SYSLOG information.