IBM Support

IV89732: LOGASSERTFAILED: !"MORE THAN 22 MINUTES SEARCHING FOR A FREE BUF

 

APAR status

  • Closed as program error.

Error description

  • A LogAssertFailed assertion is raised when the pagepool is
    too small. The following messages appear in the MMFS logs:
    
    [W] The pagepool size may be too small.  Try increasing
    the pagepool size or
    adjusting pagepool usage with, for example, nsdBufSpace,
    nsdRAIDBufferPoolSizePct,
    or verbsSendBufferMemoryMB).
    [X] logAssertFailed: !"More than 22 minutes searching for
    a free buffer in the pagepool"
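
A minimal sketch of how a log excerpt could be checked for the two messages quoted above. The function name scan_mmfs_log and the idea of collapsing whitespace before matching are assumptions made for illustration; the patterns are copied from the messages in this APAR and are not an official log format specification.

```python
import re

# Patterns taken from the two messages quoted in this APAR.
# Matching them with regular expressions is an illustrative assumption.
WARN_RE = re.compile(r"\[W\] The pagepool size may be too small")
ASSERT_RE = re.compile(
    r'\[X\] logAssertFailed: !"More than (\d+) minutes searching for '
    r'a free buffer in the pagepool"'
)


def scan_mmfs_log(text):
    """Return (warning_count, minutes_list) for a chunk of log text.

    Whitespace is collapsed first so that messages wrapped across
    several lines (as in the excerpt above) still match.
    """
    flat = " ".join(text.split())
    warnings = len(WARN_RE.findall(flat))
    minutes = [int(m) for m in ASSERT_RE.findall(flat)]
    return warnings, minutes
```

A nonzero warning count together with a non-empty minutes list corresponds to the symptom described in this APAR.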
    

Local fix

Problem summary

  • The current DIO over RDMA code allocates memory for NSD
    operations using the following logic:
     1) When RDMA is enabled:
     1.1) Use the encryption temp buffer for regular DIO.
          Call newBuffer(SO_NO_LOG_WRITES) if encryption
          temp buffer initialization fails.
     1.2) Use the heap for mmap DIO page out.
     2) When RDMA is disabled:
     2.1) Call newBuffer(SO_NO_LOG_WRITES) for regular
          DIO.
     2.2) Use the heap for mmap DIO page out.
    
    The newBuffer call is risky because the SO_NO_LOG_WRITES
    flag means it cannot steal a dirty buffer. If the periodic
    sync and DioHandlerThread block on each other, a deadlock
    can result. (The clean thread and Per steal may break the
    interlock, but under some conditions they are completely
    inactive.)
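
The pre-fix decision logic described above can be sketched as a small function. This is only an illustration of the listed cases, not the product code: the names Source, choose_source_before_fix, and the boolean parameters are hypothetical.

```python
from enum import Enum


class Source(Enum):
    """Where the memory for an NSD operation comes from (illustrative)."""
    ENCRYPTION_TEMP_BUFFER = "encryption temp buffer"
    NEW_BUFFER = "newBuffer(SO_NO_LOG_WRITES)"
    HEAP = "heap"


def choose_source_before_fix(rdma_enabled, mmap_page_out, enc_buf_init_ok=True):
    """Mirror cases 1.1, 1.2, 2.1 and 2.2 of the problem summary."""
    if mmap_page_out:
        # Cases 1.2 and 2.2: mmap DIO page out always uses the heap.
        return Source.HEAP
    if rdma_enabled and enc_buf_init_ok:
        # Case 1.1: regular DIO with RDMA uses the encryption temp buffer.
        return Source.ENCRYPTION_TEMP_BUFFER
    # Case 1.1 fallback and case 2.1: the pagepool path that cannot
    # steal dirty buffers, which is the path this APAR calls risky.
    return Source.NEW_BUFFER
```

With RDMA enabled, the only way to reach the newBuffer path in this sketch is a failed encryption temp buffer initialization.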
    

Problem conclusion

  • The allocation logic is changed as follows:
     1) When RDMA is enabled:
     1.1) Use the encryption temp buffer for regular DIO.
          Use the heap if encryption temp buffer initialization
          fails.
     1.2) Use the heap for mmap DIO page out.
     2) When RDMA is disabled:
     2.1) Call newBuffer(SO_NO_LOG_WRITES) for regular DIO.
     2.2) Use the heap for mmap DIO page out.
     Note: The behavior when RDMA is disabled is the same as the
     pre-4.1 behavior. Alternatively, we can keep using the
     encryption temp buffer. The heap is preferred at this time,
     because the encryption temp buffer consumes some space from
     the pagepool and its allocations can still block if the
     pool is depleted.
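
As a rough sketch, the logic after the fix can be written as the function below. The names (choose_source_after_fix and the string constants) are hypothetical and exist only to illustrate the four cases listed in this section.

```python
# Illustrative labels for the three memory sources named in this APAR.
ENCRYPTION_TEMP_BUFFER = "encryption temp buffer"
NEW_BUFFER = "newBuffer(SO_NO_LOG_WRITES)"
HEAP = "heap"


def choose_source_after_fix(rdma_enabled, mmap_page_out, enc_buf_init_ok=True):
    """Mirror cases 1.1, 1.2, 2.1 and 2.2 of the problem conclusion."""
    if mmap_page_out:
        # Cases 1.2 and 2.2: mmap DIO page out uses the heap.
        return HEAP
    if rdma_enabled:
        # Case 1.1: fall back to the heap when encryption temp buffer
        # initialization fails.
        return ENCRYPTION_TEMP_BUFFER if enc_buf_init_ok else HEAP
    # Case 2.1: RDMA disabled keeps the pre-4.1 behavior.
    return NEW_BUFFER
```

In this sketch, the newBuffer path is reachable only when RDMA is disabled, which matches the note above about pre-4.1 behavior.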
    

Temporary fix

Comments

APAR Information

  • APAR number

    IV89732

  • Reported component name

    SPECTRUM SCALE

  • Reported component ID

    5725Q01AP

  • Reported release

    411

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-10-06

  • Closed date

    2016-10-06

  • Last modified date

    2019-04-30

  • APAR is sysrouted FROM one or more of the following:

    IV83567

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPECTRUM SCALE

  • Fixed component ID

    5725Q01AP

Applicable component levels

  • R411 PSY U884675

       19/04/30 I 1000


Document Information

Modified date:
30 April 2019