IBM Support

OA39745: IOEN00154A OLDCS == FH_LOCKED OLDCOUNT.DREFCOUNT & ~FH_LOCKED !(OLDCOUNT.DREFCOUNT & FH_PEEK) IOEN00407I

A fix is available

Subscribe

You can track all active APARs for this component.

 

APAR status

  • Closed as program error.

Error description

  • Various assertion abends,restarts in DFSKERN
    
     IOEN00154A Assertion Failed:oldcs == FH_LOCKED Line:383
     File:./../../../../src/file/osi/fasthash.c.
     CORR: 0
     IOEN00154A Assertion Failed:!(oldcount.drefcount &
     FH_PEEK) Line:959
     File:./../../../../src/file/osi/fasthash.c.
    
    
    IOEN00154A Assertion Failed:(oldcount.drefcount &
    FH_LOCKED) != 0
    Line:958 File:./../../../../src/file/osi/fasthash.c.
    CORR: 0
    IOEN00407I Signal 3  received, dump in progress. 258
    
    
    This is a timing issue caused by multiple concurrent access to
    the same oenode and serialization fails
    
    
    Verification:
    Extract DFSKERN trace from the dump, review for symptoms similar
    to the following. Noting the concurrent access, locking and
    refcount manipulation across various concurrent threads in
    DFSKERN
    
    Ex.
    See below, but I suspect that thread (001 locks this when doing
    the kill
    But then the waiter (00F) gets to run BEFORE thread (001 has
    completed the release and turned off the FH_LOCKED bit
    
    *** Thread assignment: thread=001 asid=0112 tcb=009C2BE0
    .
    .
    *** Thread assignment: Thread=00F asid=0112 tcb=009C0AD0
    .
    .
    
    (00F 20:56:45.166188) do_wait: waiting lock=2DE478F8
    state=A0000000 6D777000 elem=6D777000 lockOwnerTCB=009BDCF0
    ret=0000030C rtn=MVS_lock_contention
    .
    .
    *Holder posts waiters to run
    (001 20:56:45.490861) post_waiter: code=0 lock=7F548E28
    elem=6D737628 ret=0000007A rtn=post_sleepers
    (001 20:56:45.490863) post_waiter: code=0 lock=7F548E28
    elem=6D777870 ret=0000007A rtn=post_sleepers
    *releasing and kill. NOTE HOB on it item = FH_LOCKED
    (001 20:56:45.490865) oefs_relVnode avoid hang oep=2DE47878 old
    refcnt=80000005 new refct=x00000001
    (001 20:56:45.490865) fh_item_kill: item=2DE47878
    oldcount=00000001 00000000 newcount=80000001 00000000
    count=80000001 failed=0
    (001 20:56:45.490865) fh_item_kill: itemp=2DE47878
    tablep=196E7D58 refcount=80000000
    (001 20:56:45.490866) kill_item: itemp=2DE47878
    longterm=00000000 count=80000000 tablep=196E7D58 cost=1
    (001 20:56:45.490867) osi_Free: a1=x3DF2AF38 s1=x30 a2=x3DF2AF40
    s2=x20 total=270849955 off=x000000E4 subr=osi_Free
    
    *thread 00F waiting(with others) for
    *lock=2DE47890 which is item->lock for oep 2DE47878
    *held by thread 001
    (00F 20:56:45.490872) do_wait: waiting lock=2DE47890
    state=A0000000 6DC28790 elem=6DC28790 lockOwnerTCB=009C2BE0
    ret=0000030C rtn=MVS_lock_contention
    (00C 20:56:45.490877) do_wait: waiting lock=2DE47890
    state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C2BE0
    ret=0000030C rtn=MVS_lock_contention
    (00D 20:56:45.490895) do_wait: waiting lock=2DE47890
    state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C2BE0
    ret=0000030C rtn=MVS_lock_contention
    (00E 20:56:45.490898) do_wait: waiting lock=2DE47890
    state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C2BE0
    ret=0000030C rtn=MVS_lock_contention
    .
    .
    *threads posted and run
    (001 20:56:45.499372) post_waiter: code=0 lock=2DE47890
    elem=6DC28790 ret=000018A0 rtn=MVS_unlock_contention
    (001 20:56:45.499375) post_waiter: code=0 lock=2DE47890
    elem=6D8539D8 ret=000018A0 rtn=MVS_unlock_contention
    (001 20:56:45.499380) post_waiter: code=0 lock=2DE47890
    elem=6DC0C960 ret=000018A0 rtn=MVS_unlock_contention
    (001 20:56:45.499381) post_waiter: code=0 lock=2DE47890
    elem=6DF318F8 ret=000018A0 rtn=MVS_unlock_contention
    (001 20:56:45.499385) post_waiter: code=0 lock=7F548E28
    elem=6D777618 ret=0000007A rtn=post_sleepers
    *refcount now zero but VERY close to the assert taken by thread
    00F
    (001 20:56:45.499389) oefs_relVnode: vp=2DE47A60 oep=2DE47878
    old refcnt=80000005 new refct=x00000000 prg=1 last_hold=1.333613
    subr=vn_rele
    (001 20:56:45.499406) <<<-SMB-sess=36D8E6D0 refct=00000006
    csp=348BF040 com=x32 uid=x012B tid=x0001 err=(0,0) VECT size 58
    (001 20:56:45.499409) ss_writev: ---> sending 62 bytes in 3
    buffers to client sptr=348BF040 s=730 ip=140.170.185. 89..65283
    *assert indicates HOB on in refcount.... timing issue
    between threads 001 and 00F
    (00F 20:56:45.499449) dfs_assert: assert text len=38
    <(oldcount.drefcount & ~FH_LOCKED) != 0>
    (00C 20:56:45.499458) do_wait: waiting lock=2DE47890
    state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C0AD0
    ret=0000030C
    rtn=MVS_lock_contention
    (00D 20:56:45.499462) do_wait: waiting lock=2DE47890
    state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C0AD0
    ret=0000030C
    rtn=MVS_lock_contention
    (00E 20:56:45.499466) do_wait: waiting lock=2DE47890
    state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C0AD0
    ret=0000030C
    rtn=MVS_lock_contention
    .
    .
    (00F 20:56:45.499449) dfs_assert: assert text len=38
    <(oldcount.drefcount & ~FH_LOCKED) != 0>
    (00C 20:56:45.499458) do_wait: waiting lock=2DE47890
    state=A0000000 6D8539D8 elem=6D8539D8 lockOwnerTCB=009C0AD0
    ret=0000030C
    rtn=MVS_lock_contention
    (00D 20:56:45.499462) do_wait: waiting lock=2DE47890
    state=A0000000 6DC0C960 elem=6DC0C960 lockOwnerTCB=009C0AD0
    ret=0000030C
    rtn=MVS_lock_contention
    (00E 20:56:45.499466) do_wait: waiting lock=2DE47890
    state=A0000000 6DF318F8 elem=6DF318F8 lockOwnerTCB=009C0AD0
    ret=0000030C
    rtn=MVS_lock_contention
    *** Thread assignment: thread=014 asid=0112 tcb=009C3AD0
    (014 20:56:45.499473) fh_item_obtain: search complete,
    item=00000000 tablep=196E7D58 searched=3
    (014 20:56:45.499475) fh_item_obtain: reusing itemp=2DE47878
    tablep=196E7D58 cost=0 searchcnt=4
    (014 20:56:45.499475) do_wait: waiting lock=2DE47890
    state=A0000000 6D777618 elem=6D777618 lockOwnerTCB=009C0AD0
    ret=0000030C
    rtn=MVS_lock_contention
    (00F 20:56:45.500210) signal_handler_kern: Caught signal 3
    

Local fix

  • na
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of the z/OS Distributed File       *
    *                 Service SMB Server.                          *
    ****************************************************************
    * PROBLEM DESCRIPTION: Under heavy stress conditions, the      *
    *                      following asserts may be encountered:   *
    *                                                              *
    *                      IOEN00154A Assertion Failed:            *
    *                      (oldcount.drefcount &  FH_LOCKED) != 0  *
    *                      or                                      *
    *                      IOEN00154A Assertion Failed:            *
    *                      !(oldcount.drefcount & FH_PEEK)         *
    ****************************************************************
    * RECOMMENDATION: APPLY PTF                                    *
    ****************************************************************
    A timing window is possible under stress conditions in which an
    oenode hang could be encountered.  The code would detect the
    potential hang and attempt to break the hang by adjusting the
    oenode refcount. Adjusting the refcount under certain conditions
    could cause the asserts.
    

Problem conclusion

  • Modified the logic that adjusts the refcount when a potential
    hang is encountered and added code to dump the address
    space the first time this is encountered.  The dump can be used
    to further diagnose the potential oenode hang condition. The
    dump will be titled "DFSKERN OENODE hang break detected". The
    SMB server will continue to run.
    
    CMVC defect 18236
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    OA39745

  • Reported component name

    DFS FILE SERVIC

  • Reported component ID

    569694200

  • Reported release

    3C0

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2012-06-11

  • Closed date

    2012-12-11

  • Last modified date

    2016-12-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UA67467 UA67468

Modules/Macros

  • IOEDFSKN
    

Fix information

  • Fixed component name

    DFS FILE SERVIC

  • Fixed component ID

    569694200

Applicable component levels

  • R3C0 PSY UA67467

       UP12/12/27 P F212 Ž

  • R3D0 PSY UA67468

       UP12/12/27 P F212 Ž

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

[{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"SG19M","label":"APARs - z\/OS environment"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"3C0","Edition":"","Line of Business":{"code":"","label":""}},{"Business Unit":{"code":null,"label":null},"Product":{"code":"SG19O","label":"APARs - MVS environment"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"3C0","Edition":"","Line of Business":{"code":"","label":""}}]

Document Information

Modified date:
08 December 2016