Steps for diagnosing and resolving a zFS hang

About this task

Perform the following steps when a hang condition is suspected.

Procedure

  1. Continually monitor for the following messages:
    IOEZ00524I
    zFS has a potentially hanging thread that is caused by: UserList, where: UserList is a list of address space IDs and TCB addresses causing the hang.
    IOEZ00547I
    zFS has a potentially hanging XCF request on systems: Systemnames, where: Systemnames is the list of system names.

    To start investigating, if in a sysplex file sharing environment check for message IOEZ00547I (hanging XCF request), which can indicate an XCF issue. If you see this message:

    1. Check the status of XCF on each system in the sysplex.
    2. Check for any outstanding message that might need a response to determine whether a system is leaving the sysplex or not (for example, IXC402D). The wait for a response to the message might appear to be a zFS hang.
    If there is no apparent problem with XCF, continue diagnosis and resolution of the hang by looking for the following messages in syslog or on the operator console. Check each system in the sysplex if applicable.
    IOEZ00604I or IOEZ00660I
    The delay is outside of zFS. zFS called the identified system service and is waiting for a response. Investigate the identified system service. The problem is likely not with zFS.
    IOEZ00605I
    The delay is either in zFS or in a system service that zFS did not specifically identify in message IOEZ00604I. zFS cannot determine whether there is a hang, a slowdown, or some other system problem. To take action, look for other symptoms. For example, if you see messages about components that are using a significant amount of auxiliary storage, resolve the auxiliary storage shortage. If the message persists, continue to the next step.
  2. Enter the MODIFY ZFS,QUERY,THREADS command to determine whether any zFS threads are hanging and why.

    The type and amount of information that is displayed as a result of this command is for internal use and can vary between releases or service levels. For an example, see Figure 1.

  3. Enter the DISPLAY A,ZFS command to determine the zFS ASID.
  4. Enter MODIFY ZFS,QUERY,THREADS at one to two-minute intervals for six minutes.
  5. Check the output for any user tasks (tasks that do not show the zFS ASID) that are repeatedly in the same state during the time you requested MODIFY ZFS,QUERY,THREADS. If there is a hang, the task that is hanging persists unchanged over the course of this time span. If the information is different each time, there is no hang.
  6. If message IOEZ00581E is highlighted in white on the console, there are or recently were quiesced zFS aggregates. Verify that no zFS aggregates are in the QUIESCED state by checking their status using the zfsadm lsaggr, zfsadm aggrinfo -long, or Start of changezfsadm fsinfoEnd of change command. For example, quiesced aggregates are displayed as follows:
    DCESVPI:/home/susvpi/> zfsadm lsaggr
    IOEZ00106I A total of 1 aggregates are attached
    SUSVPI.HIGHRISK.TEST                            DCESVPI   R/W QUIESCE
    DCESVPI:/home/susvpi/> zfsadm aggrinfo
    IOEZ00370I A total of 1 aggregates are attached.
    SUSVPI.HIGHRISK.TEST (R/W COMP QUIESCED): 35582 K free out of total 36000 
    DCESVPI:/home/susvpi/>  
    or
    DCESVPI:/home/susvpi/> zfsadm aggrinfo susvpi.highrisk.test1.zfs -long
    SUSVPI.HIGHRISK.TEST1.ZFS (R/W COMP QUIESCED): 50333 K free out of total 72000
    version 1.4
    auditfid 00000000 00000000 0000
    6289 free 8k blocks; 21 free 1K fragments
    720 K log file; 40 K filesystem table
    16 K bitmap file
    Quiesced by job SUSVPI5 on system DCESVPI on Tue Jan 3 13:36:37 2013   
    Start of changeThis example shows how to determine which aggregates are quiesced with the owner information.End of changeStart of change
    > ./zfsadm fsinfo -select Q
    PLEX.DCEIMGNJ.FS4                            DCEIMGNJ RW,RS,Q
    PLEX.DCEIMGNK.FS6                            DCEIMGNK RW,RS,Q
    
    Legend: RW=Read-write,Q=Quiesced,RS=Mounted RWSHARE
    End of change

    If the hang condition prevents you from issuing shell commands, you can also issue the MODIFY ZFS,QUERY,FILE,ALL command to determine whether any file systems are quiesced. As Figure 1 shows, a quiesced file system is identified by a "Q" in the flg column.

    Resolve the QUIESCED state before continuing to the next step. The hang condition message can remain on the console for up to a minute after the aggregate is unquiesced.

    Message IOEZ00581E appears on the zFS owning systems that contain at least one zFS aggregate that is quiesced. There is a delay between the time that the aggregate is quiesced and the time that the message appears. Typically, this time delay is about 30 seconds. You can control this time delay by using the IOEFSPRM QUIESCE_MESSAGE_DELAY option. This option allows you to specify that the delay should be longer than 30 seconds before the IOEZ00581E message is first displayed. When there are no quiesced zFS aggregates on the system, this message is removed from the console.

    There is also a delay between the time that the last aggregate is unquiesced and the time that the message is removed from the console. This message is handled by a thread that wakes up every 30 seconds and checks for any quiesced aggregates that are owned by this system. It is possible for an aggregate to be quiesced and unquiesced in the 30-second sleep window of the thread and not produce a quiesce message. This message remains if one aggregate is unquiesced and another is quiesced within the 30-second sleep window.

  7. Check whether any user tasks are hung, focusing on the tasks that are identified by message IOEZ00524I or message IOEZ00660I. User tasks do not have the same address space identifier (ASID) as the zFS address space. One or more threads consistently at the same location might indicate a hang (for example, Recov, TCB, ASID Stack, Routine, State). The threads in the zFS address space with the zFS ASID (for example, xcf_server) are typically waiting for work. It is typical for the routine these threads are waiting in to have the same name as the entry routine. For an example, see Figure 1.

    If successive iterations of the MODIFY ZFS,QUERY,THREADS command show that the STK/Recov, TCB, ASID, Routine, and State for a thread are constant, it is probable that this thread is hung.

    Start of change
    Figure 1. Example of how to check whether user tasks are hung
    End of change Start of change
    zFS and z/OS UNIX Tasks                                   
                -----------------------                                   
    STK/Recov     TCB     ASID   Stack      Routine               State   
    ----------  --------  ----  ----------  --------              --------
    48338F0000  005CABE8  005A  48338F0700  ZFSRDWR               OSIWAIT 
    48000AF8F0                                                            
           since  Oct 14 04:15:57 2014  Current DSA: 48338F2D38           
           wait code location offset=0ACA rtn=allocate_pages              
      wait for resource=7BCC6330 0                                        
           resource description=VNOPS user file cache page reclaim wait   
        ReadLock held for 4823FDBF50 state=2 0                            
           lock description=Vnode-cache access lock                       
      Operation counted for OEVFS=7E7EC190  VOLP=4826660200               
    fs=PLEX.ZFS.SMALL1                                                    
                                                                          
    48338E8000  005CA1D0  00B8  48338E8810  ZFSCREAT              WAITLOCK
    48000B0640                                                            
           since  Oct 14 04:15:57 2014  Current DSA: 48338EB5C8           
           wait code location offset=3D74 rtn=epit4_Allocate             
      lock=48203E30F0 state=80000048000D6AA1 owner=(48000D6AA0 00B85CA830)                                                              
           lock description=ANODETB status area lock                     
        ReadLock held for 4833F0DE50 state=A 0                           
           lock description=Vnode-cache access lock                      
        ReadLock held for 4833F0DEC0 state=8 0                           
           lock description=Vnode lock                                   
        ReadLock held for 482060CC20 state=7 7A94FEF0                    
           lock description=Vnode lock                                   
        ReadLock held for 482606BA00 state=4 0                           
           lock description=Anode fileset handle lock                    
        ReadLock held for 48203E30E0 state=4 0                           
           lock description=ANODETB main update lock                     
        Resource 4833F0DE40 1A held                                      
           resource description=STKC held token by local user task       
        Resource 4826661800 17 held                                      
           resource description=ANODE maximum transactions started for a 
        Resource 4830D68580 2F held                                       
           resource description=Transaction in progress                   
      Operation counted for OEVFS=7AB8DA20  VOLP=4826661A00               
    fs=ZFSAGGR.BIGZFS.DHH.FS1.EXTATTR                                     
                                                                          
    48338E0000  005C12F8  0084  48338E0700  ZFSRDWR               WAITLOCK
    48000B1390                                                            
           since  Oct 14 04:15:57 2014  Current DSA: 48338E23C8           
           wait code location offset=4940 rtn=stkc_getTokenLocked         
      lock=4823F8CFD0 state=5 owner=(2 read holders)                      
           lock description=Vnode-cache access lock                       
      Operation counted for OEVFS=7AB8D1E0  VOLP=4826663200               
    fs=ZFSAGGR.BIGZFS.DHH.FS6.EXTATTR                                     
                                                                          
    48338D8000  005CAD80  0079  48338D8700  ZFSRDWR               OSIWAIT 
    48000B20E0                                                            
           since  Oct 14 04:15:57 2014  Current DSA: 48338DAE38           
           wait code location offset=0ACA rtn=allocate_pages              
      wait for resource=7BCC6330 0                                        
           resource description=VNOPS user file cache page reclaim wait   
        ReadLock held for 4823F49F10 state=A 0                            
           lock description=Vnode-cache access lock                       
      Operation counted for OEVFS=7AB8D1E0  VOLP=4826663200               
    fs=ZFSAGGR.BIGZFS.DHH.FS6.EXTATTR                                     
                                                                          
    48338D0000  005CAA50  00B7  48338D0810  ZFSCREAT              RUNNING 
    48000B2E30                                                            
           since  Oct 14 04:15:57 2014                                    
        ReadLock held for 7E5C2670 state=2 0                              
           lock description=Cache Services hashtable resize lock          
        Resource 4823FF4820 1A held                                       
           resource description=STKC held token by local user task        
        Resource 4826661E00 17 held                                       
           resource description=ANODE maximum transactions started for a  
        Resource 4831569A80 2F held                                       
           resource description=Transaction in progress                   
      Operation counted for OEVFS=7AB8D810  VOLP=4826662000              
    fs=ZFSAGGR.BIGZFS.DHH.FS2.EXTATTR                                    
                                                                         
    48338C8000  005CABE8  00A6  48338C8700  ZFSRDWR               OSIWAIT
    48000B3B80                                                           
           since  Oct 14 04:15:57 2014  Current DSA: 48338CAD38          
           wait code location offset=0ACA rtn=allocate_pages             
      wait for resource=7BCC6330 0                                       
           resource description=VNOPS user file cache page reclaim wait  
        ReadLock held for 4835B3ABD0 state=6 0                           
           lock description=Vnode-cache access lock                      
      Operation counted for OEVFS=7E7EC190  VOLP=4826660200              
    fs=PLEX.ZFS.SMALL1                                                   
    
      7F37B000  005D5528  0044    7F37C000  openclose_task        RUNNING 
           since  Oct 14 03:43:35 2014                                    
                                                                          
      7F3B4000  005F81D0  0044    7F3B5000  CNMAIN                WAITING 
           since  Oct 14 02:58:01 2014                                    
    
      7BC45000  005C19C0  0044    7BC46000  comm_daemon           RUNNING 
    4800004290                                                            
           since  Oct 14 04:15:57 2014                                    
    End of change
  8. IBM® Support must have dumps of zFS, OMVS and the OMVS data spaces and also possibly the user address space identified on any preceding IOEZ00605 for problem resolution. Obtain and save SYSLOG and dumps of zFS, OMVS and the OMVS data spaces , and the user ASID using JOBNAME=(OMVS,ZFS,user_jobname),DSPNAME=('OMVS'.*) in your reply to the DUMP command. If you are running in a sysplex and zFS is running on other systems in the sysplex, dump all the systems in the sysplex where zFS is running, dumping zFS, OMVS and OMVS data spaces. The following is an example of the DUMP command:
    DUMP COMM=(zfs hang)
    R x,JOBNAME=(OMVS,ZFS),SDATA=(RGN,LPA,SQA,LSQA,PSA,CSA,GRSQ,TRT,SUM,COUPLE),
    JOBNAME=(OMVS,ZFS,user_jobname)
    DSPNAME=('OMVS'.*),END

    Start of changeDo not specify the job name ZFS if zFS is running inside the OMVS address space.End of change

    You must capture dumps for IBM Support before taking any recovery actions (HANGBREAK, CANCEL, ABORT).

  9. If you know which user task is hung (for example, returned in IOEZ00524I or determined to be hung after review of the output from repeated MODIFY ZFS,QUERY,THREADS,OLDEST commands), consider entering the CANCEL or STOP command to clear that task from the system.
  10. Finally, if the previous steps do not clear the hang, issue the MODIFY ZFS,ABORT command to initiate a zFS internal restart.

    An internal restart causes the zFS kernel (IOEFSKN) to end and then restart, under control of the zFS controller task (IOEFSCM). The zFS address space does not end and the z/OS® UNIX mount tree is preserved. During the internal restart, requests that are already in the zFS address space fail and new requests are suspended. File systems owned by zFS on the system that is doing the internal restart become temporarily unowned. These file systems are taken over by other zFS systems (or by the zFS system doing the internal restart when it completes the internal restart). When the internal restart is complete, the suspended new requests resume.

    If you question the hang condition or if the MODIFY ZFS,ABORT command does not resolve the situation, contact IBM Support and provide all the dumps and SYSLOG information.