Steps for diagnosing and resolving a zFS hang
About this task
Perform the following steps when a hang condition is suspected.
Procedure
- Continually monitor for the following messages:
- IOEZ00524I
- zFS has a potentially hanging thread caused by: UserList, where UserList is a list of address space IDs and TCB addresses that are causing the hang.
- IOEZ00547I
- zFS has a potentially hanging XCF request on systems: Systemnames, where Systemnames is the list of system names.
To start investigating, if you are in a sysplex file-sharing environment, check for message IOEZ00547I (hanging XCF request), which can indicate an XCF issue. If you see this message:
- Check the status of XCF on each system in the sysplex.
- Check for any outstanding message that might need a response to determine whether a system is leaving the sysplex or not (for example, IXC402D). The wait for a response to the message might appear to be a zFS hang.
If there is no apparent problem with XCF, continue diagnosing the hang by looking for the following messages in the syslog or on the operator console. Check each system in the sysplex, if applicable.
- IOEZ00604I or IOEZ00660I
- The delay is outside of zFS. zFS called the identified system service and is waiting for a response. Investigate the identified system service. The problem is likely not with zFS.
- IOEZ00605I
- The delay is either in zFS or in a system service that zFS did not specifically identify in message IOEZ00604I. zFS cannot determine whether there is a hang, a slowdown, or some other system problem. To take action, look for other symptoms. For example, if you see messages about components that are using a significant amount of auxiliary storage, resolve the auxiliary storage shortage. If the message persists, continue to the next step.
- Enter the MODIFY ZFS,QUERY,THREADS command to determine
whether any zFS threads are hanging and why.
The type and amount of information that is displayed as a result of this command is for internal use and can vary between releases or service levels. For an example, see Figure 1.
- Enter the DISPLAY A,ZFS command to determine the zFS ASID.
- Enter MODIFY ZFS,QUERY,THREADS at one- to two-minute intervals for about six minutes.
- Check the output for any user tasks (tasks that do not show the zFS ASID) that remain in the same state across the repeated MODIFY ZFS,QUERY,THREADS commands. If there is a hang, the hanging task persists unchanged over this time span. If the information differs each time, there is no hang.
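The comparison described above can be automated if you capture each MODIFY ZFS,QUERY,THREADS response as text. The following sketch is illustrative only: the column layout it assumes follows the example in Figure 1, but the real output is for internal use and can vary between releases and service levels, so the regular expression may need adjusting for your system.

```python
import re

# Compare successive MODIFY ZFS,QUERY,THREADS snapshots and report tasks
# whose (ASID, routine, state) never change. The 6-column task-line layout
# (STK/Recov, TCB, ASID, Stack, Routine, State) is assumed from Figure 1.
TASK_LINE = re.compile(
    r"^([0-9A-F]{8,16})\s+([0-9A-F]{8})\s+([0-9A-F]{4})\s+"
    r"([0-9A-F]{8,16})\s+(\S+)\s+(\S+)\s*$"
)

def parse_tasks(snapshot: str) -> dict:
    """Map TCB address -> (ASID, routine, state) for each task line."""
    tasks = {}
    for line in snapshot.splitlines():
        m = TASK_LINE.match(line.strip())
        if m:
            stk, tcb, asid, stack, routine, state = m.groups()
            tasks[tcb] = (asid, routine, state)
    return tasks

def unchanged_tasks(snapshots: list) -> list:
    """Return TCBs whose ASID/routine/state are identical in every snapshot."""
    parsed = [parse_tasks(s) for s in snapshots]
    first = parsed[0]
    return [tcb for tcb, info in first.items()
            if all(p.get(tcb) == info for p in parsed[1:])]
```

A TCB reported by unchanged_tasks across all intervals is a candidate hung task; one that disappears or changes state is making progress.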
- If message IOEZ00581E is highlighted in white on the console,
there are or recently were quiesced zFS aggregates. Verify that no
zFS aggregates are in the QUIESCED state by checking their status
using the zfsadm lsaggr, zfsadm aggrinfo -long, or zfsadm fsinfo command.
For example, quiesced aggregates are displayed as follows:
DCESVPI:/home/susvpi/> zfsadm lsaggr
IOEZ00106I A total of 1 aggregates are attached
SUSVPI.HIGHRISK.TEST DCESVPI R/W QUIESCE
DCESVPI:/home/susvpi/> zfsadm aggrinfo
IOEZ00370I A total of 1 aggregates are attached.
SUSVPI.HIGHRISK.TEST (R/W COMP QUIESCED): 35582 K free out of total 36000
DCESVPI:/home/susvpi/>
This example shows how to determine which aggregates are quiesced, along with the owner information:
DCESVPI:/home/susvpi/> zfsadm aggrinfo susvpi.highrisk.test1.zfs -long
SUSVPI.HIGHRISK.TEST1.ZFS (R/W COMP QUIESCED): 50333 K free out of total 72000
version 1.4
auditfid 00000000 00000000 0000
6289 free 8k blocks; 21 free 1K fragments
720 K log file; 40 K filesystem table
16 K bitmap file
Quiesced by job SUSVPI5 on system DCESVPI on Tue Jan 3 13:36:37 2013
> ./zfsadm fsinfo -select Q
PLEX.DCEIMGNJ.FS4 DCEIMGNJ RW,RS,Q
PLEX.DCEIMGNK.FS6 DCEIMGNK RW,RS,Q
Legend: RW=Read-write,Q=Quiesced,RS=Mounted RWSHARE
If the hang condition prevents you from issuing shell commands, you can also issue the MODIFY ZFS,QUERY,FILE,ALL command to determine whether any file systems are quiesced. As Figure 1 shows, a quiesced file system is identified by a "Q" in the flg column.
Resolve the QUIESCED state before continuing to the next step. The hang condition message can remain on the console for up to a minute after the aggregate is unquiesced.
Message IOEZ00581E appears on the zFS owning systems that contain at least one zFS aggregate that is quiesced. There is a delay between the time that the aggregate is quiesced and the time that the message appears. Typically, this time delay is about 30 seconds. You can control this time delay by using the IOEFSPRM QUIESCE_MESSAGE_DELAY option. This option allows you to specify that the delay should be longer than 30 seconds before the IOEZ00581E message is first displayed. When there are no quiesced zFS aggregates on the system, this message is removed from the console.
There is also a delay between the time that the last aggregate is unquiesced and the time that the message is removed from the console. This message is handled by a thread that wakes up every 30 seconds and checks for any quiesced aggregates that are owned by this system. It is possible for an aggregate to be quiesced and unquiesced in the 30-second sleep window of the thread and not produce a quiesce message. This message remains if one aggregate is unquiesced and another is quiesced within the 30-second sleep window.
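If the default 30-second delay produces IOEZ00581E messages for quiesces that are part of normal operation (for example, backups), a longer delay can be configured. As a sketch, assuming the usual IOEFSPRM option=value syntax, a 60-second delay before the message is first displayed might be specified as:

```
quiesce_message_delay=60
```

Check the IOEFSPRM reference for your release for the exact option name, value range, and default before using it.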
- Check whether any user tasks are hung, focusing on the tasks that are identified by message IOEZ00524I or message IOEZ00660I.
User tasks do not have the same address space identifier (ASID) as the zFS address space. One or more threads that stay at the same location (that is, with the same Recov, TCB, ASID, Stack, Routine, and State values) might indicate a hang. The threads in the zFS address space with the zFS ASID (for example, xcf_server) are typically waiting for work. It is typical for the routine that these threads are waiting in to have the same name as the entry routine. For an example, see Figure 1.
If successive iterations of the MODIFY ZFS,QUERY,THREADS command show that the STK/Recov, TCB, ASID, Routine, and State for a thread are constant, it is probable that this thread is hung.
Figure 1. Example of how to check whether user tasks are hung

zFS and z/OS UNIX Tasks
-----------------------
STK/Recov  TCB      ASID Stack      Routine  State
---------- -------- ---- ---------- -------- --------
48338F0000 005CABE8 005A 48338F0700 ZFSRDWR  OSIWAIT
48000AF8F0 since Oct 14 04:15:57 2014
  Current DSA: 48338F2D38
  wait code location offset=0ACA rtn=allocate_pages
  wait for resource=7BCC6330 0
    resource description=VNOPS user file cache page reclaim wait
  ReadLock held for 4823FDBF50 state=2 0
    lock description=Vnode-cache access lock
  Operation counted for OEVFS=7E7EC190 VOLP=4826660200 fs=PLEX.ZFS.SMALL1

48338E8000 005CA1D0 00B8 48338E8810 ZFSCREAT WAITLOCK
48000B0640 since Oct 14 04:15:57 2014
  Current DSA: 48338EB5C8
  wait code location offset=3D74 rtn=epit4_Allocate
  lock=48203E30F0 state=80000048000D6AA1 owner=(48000D6AA0 00B85CA830)
    lock description=ANODETB status area lock
  ReadLock held for 4833F0DE50 state=A 0
    lock description=Vnode-cache access lock
  ReadLock held for 4833F0DEC0 state=8 0
    lock description=Vnode lock
  ReadLock held for 482060CC20 state=7 7A94FEF0
    lock description=Vnode lock
  ReadLock held for 482606BA00 state=4 0
    lock description=Anode fileset handle lock
  ReadLock held for 48203E30E0 state=4 0
    lock description=ANODETB main update lock
  Resource 4833F0DE40 1A held
    resource description=STKC held token by local user task
  Resource 4826661800 17 held
    resource description=ANODE maximum transactions started for a
  Resource 4830D68580 2F held
    resource description=Transaction in progress
  Operation counted for OEVFS=7AB8DA20 VOLP=4826661A00 fs=ZFSAGGR.BIGZFS.DHH.FS1.EXTATTR

48338E0000 005C12F8 0084 48338E0700 ZFSRDWR  WAITLOCK
48000B1390 since Oct 14 04:15:57 2014
  Current DSA: 48338E23C8
  wait code location offset=4940 rtn=stkc_getTokenLocked
  lock=4823F8CFD0 state=5 owner=(2 read holders)
    lock description=Vnode-cache access lock
  Operation counted for OEVFS=7AB8D1E0 VOLP=4826663200 fs=ZFSAGGR.BIGZFS.DHH.FS6.EXTATTR

48338D8000 005CAD80 0079 48338D8700 ZFSRDWR  OSIWAIT
48000B20E0 since Oct 14 04:15:57 2014
  Current DSA: 48338DAE38
  wait code location offset=0ACA rtn=allocate_pages
  wait for resource=7BCC6330 0
    resource description=VNOPS user file cache page reclaim wait
  ReadLock held for 4823F49F10 state=A 0
    lock description=Vnode-cache access lock
  Operation counted for OEVFS=7AB8D1E0 VOLP=4826663200 fs=ZFSAGGR.BIGZFS.DHH.FS6.EXTATTR

48338D0000 005CAA50 00B7 48338D0810 ZFSCREAT RUNNING
48000B2E30 since Oct 14 04:15:57 2014
  ReadLock held for 7E5C2670 state=2 0
    lock description=Cache Services hashtable resize lock
  Resource 4823FF4820 1A held
    resource description=STKC held token by local user task
  Resource 4826661E00 17 held
    resource description=ANODE maximum transactions started for a
  Resource 4831569A80 2F held
    resource description=Transaction in progress
  Operation counted for OEVFS=7AB8D810 VOLP=4826662000 fs=ZFSAGGR.BIGZFS.DHH.FS2.EXTATTR

48338C8000 005CABE8 00A6 48338C8700 ZFSRDWR  OSIWAIT
48000B3B80 since Oct 14 04:15:57 2014
  Current DSA: 48338CAD38
  wait code location offset=0ACA rtn=allocate_pages
  wait for resource=7BCC6330 0
    resource description=VNOPS user file cache page reclaim wait
  ReadLock held for 4835B3ABD0 state=6 0
    lock description=Vnode-cache access lock
  Operation counted for OEVFS=7E7EC190 VOLP=4826660200 fs=PLEX.ZFS.SMALL1

7F37B000   005D5528 0044 7F37C000   openclose_task RUNNING
since Oct 14 03:43:35 2014
7F3B4000   005F81D0 0044 7F3B5000   CNMAIN   WAITING
since Oct 14 02:58:01 2014
7BC45000   005C19C0 0044 7BC46000   comm_daemon RUNNING
4800004290 since Oct 14 04:15:57 2014
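To narrow output like Figure 1 down to the tasks worth investigating, you can filter for wait-state tasks whose ASID differs from the zFS ASID (obtained with DISPLAY A,ZFS). This sketch is illustrative only: the column layout and the OSIWAIT/WAITLOCK state names are taken from the Figure 1 example, and the real output format can vary between releases.

```python
import re

# From one MODIFY ZFS,QUERY,THREADS snapshot, list tasks that are in a
# wait state and do not run in the zFS address space. Field layout
# (STK/Recov, TCB, ASID, Stack, Routine, State) is assumed from Figure 1.
TASK_LINE = re.compile(
    r"^[0-9A-F]{8,16}\s+([0-9A-F]{8})\s+([0-9A-F]{4})\s+"
    r"[0-9A-F]{8,16}\s+(\S+)\s+(OSIWAIT|WAITLOCK)\s*$"
)

def waiting_user_tasks(snapshot: str, zfs_asid: str) -> list:
    """Return (TCB, ASID, routine) for waiting tasks outside the zFS ASID."""
    hits = []
    for line in snapshot.splitlines():
        m = TASK_LINE.match(line.strip())
        if m and m.group(2) != zfs_asid:
            hits.append((m.group(1), m.group(2), m.group(3)))
    return hits
```

Any task this filter reports in every snapshot over the six-minute observation window is a candidate for the dump and recovery steps that follow.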
- IBM® Support needs dumps of zFS, OMVS, and the OMVS data spaces, and possibly also of the user address space identified in any preceding IOEZ00605I message, for problem resolution. Obtain and save SYSLOG and dumps of zFS, OMVS, the OMVS data spaces, and the user ASID by specifying JOBNAME=(OMVS,ZFS,user_jobname),DSPNAME=('OMVS'.*) in your reply to the DUMP command. If you are running in a sysplex and zFS is running on other systems in the sysplex, dump all the systems in the sysplex where zFS is running, including zFS, OMVS, and the OMVS data spaces. The following is an example of the DUMP command:
DUMP COMM=(zfs hang)
R x,JOBNAME=(OMVS,ZFS,user_jobname),
SDATA=(RGN,LPA,SQA,LSQA,PSA,CSA,GRSQ,TRT,SUM,COUPLE),
DSPNAME=('OMVS'.*),END
Do not specify the job name ZFS if zFS is running inside the OMVS address space.
You must capture dumps for IBM Support before taking any recovery actions (HANGBREAK, CANCEL, ABORT).
- If you know which user task is hung (for example, returned in IOEZ00524I or determined to be hung after review of the output from repeated MODIFY ZFS,QUERY,THREADS,OLDEST commands), consider entering the CANCEL or STOP command to clear that task from the system.
- Finally, if the previous steps do not
clear the hang, issue the MODIFY ZFS,ABORT command to initiate a zFS
internal restart.
An internal restart causes the zFS kernel (IOEFSKN) to end and then restart, under control of the zFS controller task (IOEFSCM). The zFS address space does not end and the z/OS® UNIX mount tree is preserved. During the internal restart, requests that are already in the zFS address space fail and new requests are suspended. File systems owned by zFS on the system that is doing the internal restart become temporarily unowned. These file systems are taken over by other zFS systems (or by the zFS system doing the internal restart when it completes the internal restart). When the internal restart is complete, the suspended new requests resume.
If you question the hang condition or if the MODIFY ZFS,ABORT command does not resolve the situation, contact IBM Support and provide all the dumps and SYSLOG information.