OA39745: IOEN00154A OLDCS == FH_LOCKED OLDCOUNT.DREFCOUNT & ~FH_LOCKED !(OLDCOUNT.DREFCOUNT & FH

A fix is available

APAR status

Closed as program error.

Error description

Various assertion abends,restarts in DFSKERN

 IOEN00154A Assertion Failed:oldcs == FH_LOCKED Line:383
 File:./../../../../src/file/osi/fasthash.c.
 CORR: 0
 IOEN00154A Assertion Failed:!(oldcount.drefcount &
 FH_PEEK) Line:959
 File:./../../../../src/file/osi/fasthash.c.


IOEN00154A Assertion Failed:(oldcount.drefcount &
FH_LOCKED) != 0
Line:958 File:./../../../../src/file/osi/fasthash.c.
CORR: 0
IOEN00407I Signal 3  received, dump in progress. 258


This is a timing issue caused by multiple concurrent access to
the same oenode and serialization fails


Verification:
Extract DFSKERN trace from the dump, review for symptoms similar
to the following. Noting the concurrent access, locking and
refcount manipulation across various concurrent threads in
DFSKERN

Ex.
See below, but I suspect that thread (001 locks this when doing
the kill
But then the waiter (00F) gets to run BEFORE thread (001 has
completed the release and turned off the FH_LOCKED bit

*** Thread assignment: thread=001 asid=0112 tcb=009C2BE0
.
.
*** Thread assignment: Thread=00F asid=0112 tcb=009C0AD0
.
.

(00F 20:56:45.166188) do_wait: waiting lock=2DE478F8
state=A0000000 6D777000 elem=6D777000 lockOwnerTCB=009BDCF0
ret=0000030C rtn=MVS_lock_contention
.
.
*Holder posts waiters to run
(001 20:56:45.490861) post_waiter: code=0 lock=7F548E28
elem=6D737628 ret=0000007A rtn=post_sleepers
(001 20:56:45.490863) post_waiter: code=0 lock=7F548E28
elem=6D777870 ret=0000007A rtn=post_sleepers
*releasing and kill. NOTE HOB on it item = FH_LOCKED
(001 20:56:45.490865) oefs_relVnode avoid hang oep=2DE47878 old
refcnt=80000005 new refct=x00000001
(001 20:56:45.490865) fh_item_kill: item=2DE47878
oldcount=00000001 00000000 newcount=80000001 00000000
count=80000001 failed=0
(001 20:56:45.490865) fh_item_kill: itemp=2DE47878
tablep=196E7D58 refcount=80000000
(001 20:56:45.490866) kill_item: itemp=2DE47878
longterm=00000000 count=80000000 tablep=196E7D58 cost=1
(001 20:56:45.490867) osi_Free: a1=x3DF2AF38 s1=x30 a2=x3DF2AF40
s2=x20 total=270849955 off=x000000E4 subr=osi_Free

*thread 00F waiting(with others) for
*lock=2DE47890 which is item->lock for oep 2DE47878
*held by thread 001
(00F 20:56:45.490872) do_wait: waiting lock=2DE47890
state=A0000000 6DC28790 elem=6DC28790 lockOwnerTCB=009C2BE0
ret=0000030C rtn=MVS_lock_contention
(00C 20:56:45.490877) do_wait: waiting lock=2DE47890
state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C2BE0
ret=0000030C rtn=MVS_lock_contention
(00D 20:56:45.490895) do_wait: waiting lock=2DE47890
state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C2BE0
ret=0000030C rtn=MVS_lock_contention
(00E 20:56:45.490898) do_wait: waiting lock=2DE47890
state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C2BE0
ret=0000030C rtn=MVS_lock_contention
.
.
*threads posted and run
(001 20:56:45.499372) post_waiter: code=0 lock=2DE47890
elem=6DC28790 ret=000018A0 rtn=MVS_unlock_contention
(001 20:56:45.499375) post_waiter: code=0 lock=2DE47890
elem=6D8539D8 ret=000018A0 rtn=MVS_unlock_contention
(001 20:56:45.499380) post_waiter: code=0 lock=2DE47890
elem=6DC0C960 ret=000018A0 rtn=MVS_unlock_contention
(001 20:56:45.499381) post_waiter: code=0 lock=2DE47890
elem=6DF318F8 ret=000018A0 rtn=MVS_unlock_contention
(001 20:56:45.499385) post_waiter: code=0 lock=7F548E28
elem=6D777618 ret=0000007A rtn=post_sleepers
*refcount now zero but VERY close to the assert taken by thread
00F
(001 20:56:45.499389) oefs_relVnode: vp=2DE47A60 oep=2DE47878
old refcnt=80000005 new refct=x00000000 prg=1 last_hold=1.333613
subr=vn_rele
(001 20:56:45.499406) <<<-SMB-sess=36D8E6D0 refct=00000006
csp=348BF040 com=x32 uid=x012B tid=x0001 err=(0,0) VECT size 58
(001 20:56:45.499409) ss_writev: ---> sending 62 bytes in 3
buffers to client sptr=348BF040 s=730 ip=140.170.185. 89..65283
*assert indicates HOB on in refcount.... timing issue
between threads 001 and 00F
(00F 20:56:45.499449) dfs_assert: assert text len=38
<(oldcount.drefcount & ~FH_LOCKED) != 0>
(00C 20:56:45.499458) do_wait: waiting lock=2DE47890
state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C0AD0
ret=0000030C
rtn=MVS_lock_contention
(00D 20:56:45.499462) do_wait: waiting lock=2DE47890
state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C0AD0
ret=0000030C
rtn=MVS_lock_contention
(00E 20:56:45.499466) do_wait: waiting lock=2DE47890
state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C0AD0
ret=0000030C
rtn=MVS_lock_contention
.
.
(00F 20:56:45.499449) dfs_assert: assert text len=38
<(oldcount.drefcount & ~FH_LOCKED) != 0>
(00C 20:56:45.499458) do_wait: waiting lock=2DE47890
state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C0AD0
ret=0000030C
rtn=MVS_lock_contention
(00D 20:56:45.499462) do_wait: waiting lock=2DE47890
state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C0AD0
ret=0000030C
rtn=MVS_lock_contention
(00E 20:56:45.499466) do_wait: waiting lock=2DE47890
state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C0AD0
ret=0000030C
rtn=MVS_lock_contention
*** Thread assignment: thread=014 asid=0112 tcb=009C3AD0
(014 20:56:45.499473) fh_item_obtain: search complete,
item=00000000 tablep=196E7D58 searched=3
(014 20:56:45.499475) fh_item_obtain: reusing itemp=2DE47878
tablep=196E7D58 cost=0 searchcnt=4
(014 20:56:45.499475) do_wait: waiting lock=2DE47890
state=A0000000 6D777618 elem=6D777618 lockOwnerTCB=009C0AD0
ret=0000030C
rtn=MVS_lock_contention
(00F 20:56:45.500210) signal_handler_kern: Caught signal 3

Local fix

```
na
```

Problem summary

****************************************************************
* USERS AFFECTED: All users of the z/OS Distributed File       *
*                 Service SMB Server.                          *
****************************************************************
* PROBLEM DESCRIPTION: Under heavy stress conditions, the      *
*                      following asserts may be encountered:   *
*                                                              *
*                      IOEN00154A Assertion Failed:            *
*                      (oldcount.drefcount &  FH_LOCKED) != 0  *
*                      or                                      *
*                      IOEN00154A Assertion Failed:            *
*                      !(oldcount.drefcount & FH_PEEK)         *
****************************************************************
* RECOMMENDATION: APPLY PTF                                    *
****************************************************************
A timing window is possible under stress conditions in which an
oenode hang could be encountered.  The code would detect the
potential hang and attempt to break the hang by adjusting the
oenode refcount. Adjusting the refcount under certain conditions
could cause the asserts.

Problem conclusion

Modified the logic that adjusts the refcount when a potential
hang is encountered and added code to dump the address
space the first time this is encountered.  The dump can be used
to further diagnose the potential oenode hang condition. The
dump will be titled "DFSKERN OENODE hang break detected". The
SMB server will continue to run.

CMVC defect 18236

Temporary fix

```
*********
* HIPER *
*********
```

Comments

APAR Information

APAR number
OA39745
Reported component name
DFS FILE SERVIC
Reported component ID
569694200
Reported release
3C0
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2012-06-11
Closed date
2012-12-11
Last modified date
2016-12-08

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

UA67467 UA67468

Modules/Macros

```
IOEDFSKN
```

Fix information

Fixed component name
DFS FILE SERVIC
Fixed component ID
569694200

Applicable component levels

R3C0 PSY UA67467
UP12/12/27 P F212 Ž
R3D0 PSY UA67468
UP12/12/27 P F212 Ž

Fix is available

Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

[{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"SG19M","label":"APARs - z\/OS environment"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"3C0","Edition":"","Line of Business":{"code":"","label":""}},{"Business Unit":{"code":null,"label":null},"Product":{"code":"SG19O","label":"APARs - MVS environment"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"3C0","Edition":"","Line of Business":{"code":"","label":""}}]

Document Information

Modified date:
08 December 2016

Tips

OA39745: IOEN00154A OLDCS == FH_LOCKED OLDCOUNT.DREFCOUNT & ~FH_LOCKED !(OLDCOUNT.DREFCOUNT & FH_PEEK) IOEN00407I

A fix is available

Subscribe

APAR status

Closed as program error.

Error description

Local fix

Problem summary

Problem conclusion

Temporary fix

Comments

APAR Information

APAR number

Reported component name

Reported component ID

Reported release

Status

PE

HIPER

Special Attention

Submitted date

Closed date

Last modified date

APAR is sysrouted FROM one or more of the following:

APAR is sysrouted TO one or more of the following:

Modules/Macros

Fix information

Fixed component name

Fixed component ID

Applicable component levels

R3C0 PSY UA67467

R3D0 PSY UA67468

Fix is available

Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Share your feedback

Need support?