A fix is available
APAR status
Closed as program error.
Error description
Various assertion abends,restarts in DFSKERN IOEN00154A Assertion Failed:oldcs == FH_LOCKED Line:383 File:./../../../../src/file/osi/fasthash.c. CORR: 0 IOEN00154A Assertion Failed:!(oldcount.drefcount & FH_PEEK) Line:959 File:./../../../../src/file/osi/fasthash.c. IOEN00154A Assertion Failed:(oldcount.drefcount & FH_LOCKED) != 0 Line:958 File:./../../../../src/file/osi/fasthash.c. CORR: 0 IOEN00407I Signal 3 received, dump in progress. 258 This is a timing issue caused by multiple concurrent access to the same oenode and serialization fails Verification: Extract DFSKERN trace from the dump, review for symptoms similar to the following. Noting the concurrent access, locking and refcount manipulation across various concurrent threads in DFSKERN Ex. See below, but I suspect that thread (001 locks this when doing the kill But then the waiter (00F) gets to run BEFORE thread (001 has completed the release and turned off the FH_LOCKED bit *** Thread assignment: thread=001 asid=0112 tcb=009C2BE0 . . *** Thread assignment: Thread=00F asid=0112 tcb=009C0AD0 . . (00F 20:56:45.166188) do_wait: waiting lock=2DE478F8 state=A0000000 6D777000 elem=6D777000 lockOwnerTCB=009BDCF0 ret=0000030C rtn=MVS_lock_contention . . *Holder posts waiters to run (001 20:56:45.490861) post_waiter: code=0 lock=7F548E28 elem=6D737628 ret=0000007A rtn=post_sleepers (001 20:56:45.490863) post_waiter: code=0 lock=7F548E28 elem=6D777870 ret=0000007A rtn=post_sleepers *releasing and kill. NOTE HOB on it item = FH_LOCKED (001 20:56:45.490865) oefs_relVnode avoid hang oep=2DE47878 old refcnt=80000005 new refct=x00000001 (001 20:56:45.490865) fh_item_kill: item=2DE47878 oldcount=00000001 00000000 newcount=80000001 00000000 count=80000001 failed=0 (001 20:56:45.490865) fh_item_kill: itemp=2DE47878 tablep=196E7D58 refcount=80000000 (001 20:56:45.490866) kill_item: itemp=2DE47878 longterm=00000000 count=80000000 tablep=196E7D58 cost=1 (001 20:56:45.490867) osi_Free: a1=x3DF2AF38 s1=x30 a2=x3DF2AF40 s2=x20 total=270849955 off=x000000E4 subr=osi_Free *thread 00F waiting(with others) for *lock=2DE47890 which is item->lock for oep 2DE47878 *held by thread 001 (00F 20:56:45.490872) do_wait: waiting lock=2DE47890 state=A0000000 6DC28790 elem=6DC28790 lockOwnerTCB=009C2BE0 ret=0000030C rtn=MVS_lock_contention (00C 20:56:45.490877) do_wait: waiting lock=2DE47890 state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C2BE0 ret=0000030C rtn=MVS_lock_contention (00D 20:56:45.490895) do_wait: waiting lock=2DE47890 state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C2BE0 ret=0000030C rtn=MVS_lock_contention (00E 20:56:45.490898) do_wait: waiting lock=2DE47890 state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C2BE0 ret=0000030C rtn=MVS_lock_contention . . *threads posted and run (001 20:56:45.499372) post_waiter: code=0 lock=2DE47890 elem=6DC28790 ret=000018A0 rtn=MVS_unlock_contention (001 20:56:45.499375) post_waiter: code=0 lock=2DE47890 elem=6D8539D8 ret=000018A0 rtn=MVS_unlock_contention (001 20:56:45.499380) post_waiter: code=0 lock=2DE47890 elem=6DC0C960 ret=000018A0 rtn=MVS_unlock_contention (001 20:56:45.499381) post_waiter: code=0 lock=2DE47890 elem=6DF318F8 ret=000018A0 rtn=MVS_unlock_contention (001 20:56:45.499385) post_waiter: code=0 lock=7F548E28 elem=6D777618 ret=0000007A rtn=post_sleepers *refcount now zero but VERY close to the assert taken by thread 00F (001 20:56:45.499389) oefs_relVnode: vp=2DE47A60 oep=2DE47878 old refcnt=80000005 new refct=x00000000 prg=1 last_hold=1.333613 subr=vn_rele (001 20:56:45.499406) <<<-SMB-sess=36D8E6D0 refct=00000006 csp=348BF040 com=x32 uid=x012B tid=x0001 err=(0,0) VECT size 58 (001 20:56:45.499409) ss_writev: ---> sending 62 bytes in 3 buffers to client sptr=348BF040 s=730 ip=140.170.185. 89..65283 *assert indicates HOB on in refcount.... timing issue between threads 001 and 00F (00F 20:56:45.499449) dfs_assert: assert text len=38 <(oldcount.drefcount & ~FH_LOCKED) != 0> (00C 20:56:45.499458) do_wait: waiting lock=2DE47890 state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C0AD0 ret=0000030C rtn=MVS_lock_contention (00D 20:56:45.499462) do_wait: waiting lock=2DE47890 state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C0AD0 ret=0000030C rtn=MVS_lock_contention (00E 20:56:45.499466) do_wait: waiting lock=2DE47890 state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C0AD0 ret=0000030C rtn=MVS_lock_contention . . (00F 20:56:45.499449) dfs_assert: assert text len=38 <(oldcount.drefcount & ~FH_LOCKED) != 0> (00C 20:56:45.499458) do_wait: waiting lock=2DE47890 state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C0AD0 ret=0000030C rtn=MVS_lock_contention (00D 20:56:45.499462) do_wait: waiting lock=2DE47890 state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C0AD0 ret=0000030C rtn=MVS_lock_contention (00E 20:56:45.499466) do_wait: waiting lock=2DE47890 state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C0AD0 ret=0000030C rtn=MVS_lock_contention *** Thread assignment: thread=014 asid=0112 tcb=009C3AD0 (014 20:56:45.499473) fh_item_obtain: search complete, item=00000000 tablep=196E7D58 searched=3 (014 20:56:45.499475) fh_item_obtain: reusing itemp=2DE47878 tablep=196E7D58 cost=0 searchcnt=4 (014 20:56:45.499475) do_wait: waiting lock=2DE47890 state=A0000000 6D777618 elem=6D777618 lockOwnerTCB=009C0AD0 ret=0000030C rtn=MVS_lock_contention (00F 20:56:45.500210) signal_handler_kern: Caught signal 3
Local fix
na
Problem summary
**************************************************************** * USERS AFFECTED: All users of the z/OS Distributed File * * Service SMB Server. * **************************************************************** * PROBLEM DESCRIPTION: Under heavy stress conditions, the * * following asserts may be encountered: * * * * IOEN00154A Assertion Failed: * * (oldcount.drefcount & FH_LOCKED) != 0 * * or * * IOEN00154A Assertion Failed: * * !(oldcount.drefcount & FH_PEEK) * **************************************************************** * RECOMMENDATION: APPLY PTF * **************************************************************** A timing window is possible under stress conditions in which an oenode hang could be encountered. The code would detect the potential hang and attempt to break the hang by adjusting the oenode refcount. Adjusting the refcount under certain conditions could cause the asserts.
Problem conclusion
Modified the logic that adjusts the refcount when a potential hang is encountered and added code to dump the address space the first time this is encountered. The dump can be used to further diagnose the potential oenode hang condition. The dump will be titled "DFSKERN OENODE hang break detected". The SMB server will continue to run. CMVC defect 18236
Temporary fix
********* * HIPER * *********
Comments
APAR Information
APAR number
OA39745
Reported component name
DFS FILE SERVIC
Reported component ID
569694200
Reported release
3C0
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2012-06-11
Closed date
2012-12-11
Last modified date
2016-12-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UA67467 UA67468
Modules/Macros
IOEDFSKN
Fix information
Fixed component name
DFS FILE SERVIC
Fixed component ID
569694200
Applicable component levels
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
[{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"SG19M","label":"APARs - z\/OS environment"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"3C0","Edition":"","Line of Business":{"code":"","label":""}},{"Business Unit":{"code":null,"label":null},"Product":{"code":"SG19O","label":"APARs - MVS environment"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"3C0","Edition":"","Line of Business":{"code":"","label":""}}]
Document Information
Modified date:
08 December 2016