IBM Support

PI61701: EXCESSIVE MEMORY USAGE IN C++ DFS READER PROCESS DUE TO LARGE NUMBER OF MALLOC ARENAS IN REDHAT LINUX - LEADS TO OOM ERRORS

APAR status

  • Closed as program error.

Error description

  • Over time, BigSQL operations may begin to fail due to excessive
    memory usage by the DFS C++ read/write process.  Errors include
    SQL5199 reason code 2 (unable to allocate more memory for the
    FMP memory set).
    
    The excessive memory usage is caused by the large number of
    malloc memory arenas created under RedHat's per-thread arena
    strategy. A malloc arena is created for every thread within the
    process, up to a maximum of 8 times the number of cores.
    Additional overhead comes from the default mmap threshold,
    which grows to the size of the largest block ever freed (not
    specific to RedHat).  Any allocation below this threshold is
    retained for reuse after it is freed, which both contributes to
    fragmentation and prevents the application from returning
    memory to the operating system (via free()).
    
    The memory footprint is accounted for in the DFSRW_PRIVATE
    memory consumer of the BIGSQL DB2 instance memory controller.
    This can be viewed by running "db2pd -dbptnmem" on a given
    worker node and observing the current usage for the
    DFSRW_PRIVATE consumer.
    
    The db2diag.log should show excessive sizes for the
    DFSRW_PRIVATE consumer at the time of the errors, e.g.:
    
    $ grep DFSRW_PRIVATE <db2diag.log>
    DFSRW_PRIVATE - Current size : 80008000 KB, HWM : 80008000 KB, Cached : 0 KB
    
    DFSRW_PRIVATE is normally no higher than roughly 50% of the
    Instance Memory limit.
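
The db2diag.log check described above can be scripted. The following is a minimal shell sketch, not an IBM-supplied tool: it extracts the "Current size" value from a DFSRW_PRIVATE line and tests it against the rough 50% guideline. The function names `dfsrw_current_kb` and `dfsrw_over_half` and the `limit_kb` value are hypothetical; substitute the actual instance memory limit of the system being checked.

```shell
#!/bin/sh
# Print the "Current size" value (in KB) of every DFSRW_PRIVATE line on stdin.
dfsrw_current_kb() {
    sed -n 's/.*DFSRW_PRIVATE - Current size : \([0-9][0-9]*\) KB.*/\1/p'
}

# $1 = current size in KB, $2 = instance memory limit in KB.
# Succeeds when the current size exceeds 50% of the limit.
dfsrw_over_half() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Sample line taken from the db2diag.log excerpt in this APAR.
sample='DFSRW_PRIVATE - Current size : 80008000 KB, HWM : 80008000 KB,'
current=$(printf '%s\n' "$sample" | dfsrw_current_kb)

limit_kb=100000000   # hypothetical instance memory limit, in KB

if dfsrw_over_half "$current" "$limit_kb"; then
    echo "DFSRW_PRIVATE at ${current} KB exceeds 50% of ${limit_kb} KB"
else
    echo "DFSRW_PRIVATE at ${current} KB is within 50% of ${limit_kb} KB"
fi
```

In practice the sample line would be replaced by the output of the grep command shown above, piped into `dfsrw_current_kb`.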
    
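To make the arena arithmetic in the description concrete, the sketch below estimates the default upper bound on malloc arenas for a host as 8 times the core count (the figure given above) and prints it next to the cap of 16 used in the local fix. The helper name `default_arena_cap` is hypothetical, and the script only illustrates the arithmetic; it does not query glibc itself.

```shell
#!/bin/sh
# Default upper bound on malloc arenas as described in this APAR:
# 8 x number of cores. default_arena_cap is a hypothetical helper name.
default_arena_cap() {
    echo $(( $1 * 8 ))
}

cores=$(nproc)                      # core count reported by coreutils
cap=$(default_arena_cap "$cores")

echo "cores=${cores} default_arena_cap=${cap} local_fix_cap=16"
```

On a 32-core worker node, for example, the default bound works out to 256 arenas, compared with 16 once MALLOC_ARENA_MAX is set as in the local fix.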

Local fix

  • Cap the number of malloc arenas at 16 and set the mmap
    threshold to 1 MB by adding the following to
    <bigsql home>/sqllib/userprofile:
    <snip>
        HIVE_HOME=$BIGSQL_DIST_HOME/hive
        export HIVE_HOME
    
        MALLOC_ARENA_MAX=16
        export MALLOC_ARENA_MAX
        MALLOC_MMAP_THRESHOLD_=1048576
        export MALLOC_MMAP_THRESHOLD_
    
        DB2ENVLIST="LD_LIBRARY_PATH DB2LIBPATH BIGSQL_DIST_HOME BIGSQL_HOME HADOOP_HOME EGO_CONFDIR"
        DB2ENVLIST="${DB2ENVLIST} HADOOP_CONF_DIR HIVE_HOME"
        DB2ENVLIST="${DB2ENVLIST} LIBHDFS_OPTS DB2_EXT_TABLE_READER HADOOP_MAPRED_HOME"
        DB2ENVLIST="${DB2ENVLIST} SQOOP_HOME BIGSQL_DIST_LIB BIGSQL_DIST_VAR BIGSQL_AUX_JARS_PATH"
        DB2ENVLIST="${DB2ENVLIST} GSK_STRICTCHECK_CBCPADBYTES"
        DB2ENVLIST="${DB2ENVLIST} DB2_BIGSQL_LIBPATH DB2_BIGSQL_CLASSPATH"
        DB2ENVLIST="${DB2ENVLIST} METASTORE_PORT HCAT_PID_DIR HCAT_LOG_DIR HCAT_CONF_DIR DBROOT"
        DB2ENVLIST="${DB2ENVLIST} MALLOC_ARENA_MAX"
        DB2ENVLIST="${DB2ENVLIST} MALLOC_MMAP_THRESHOLD_"
    
    The new lines in the snippet above are:
        MALLOC_ARENA_MAX=16
        export MALLOC_ARENA_MAX
        MALLOC_MMAP_THRESHOLD_=1048576
        export MALLOC_MMAP_THRESHOLD_
    
    and
        DB2ENVLIST="${DB2ENVLIST} MALLOC_ARENA_MAX"
        DB2ENVLIST="${DB2ENVLIST} MALLOC_MMAP_THRESHOLD_"
    
    After adding these lines, bigsql must be recycled twice so that
    the userprofile is propagated to all nodes, i.e.:
        bigsql stop; bigsql start; bigsql stop; bigsql start
    
    It is strongly advised to upgrade glibc to a level containing a
    fix for the Linux "cyclic malloc arena selection bug".  This bug
    results in very imbalanced memory usage across the arenas,
    which causes inefficient and excessive memory usage as well as
    performance degradation.
    The fix is contained in the following glibc levels:
    RHEL 6 : glibc 2.12-1.192 ( bug ID 1264189 )
    https://bugzilla.redhat.com/show_bug.cgi?id=1264189
    RHEL 7 : glibc 2.17-157 ( bug ID 1276753 )
    https://bugzilla.redhat.com/show_bug.cgi?id=1276753
    as well as in glibc 2.23
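
To check an installed glibc level against the minimum levels listed above, a version comparison along the following lines may help. This is a sketch under stated assumptions: `version_ge` is a hypothetical helper, it relies on GNU `sort -V`, whose ordering approximates but is not identical to RPM's own version comparison, and the `installed` value is a made-up sample rather than output from a real system.

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2
# under GNU "sort -V" ordering (an approximation of RPM ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

required="2.17-157"    # RHEL 7 level named in this APAR
installed="2.17-106"   # hypothetical sample value

if version_ge "$installed" "$required"; then
    echo "glibc ${installed} contains the arena selection fix"
else
    echo "glibc ${installed} is older than ${required}; upgrade advised"
fi
```

On a RHEL system the installed level can be read with `rpm -q glibc` and substituted for the sample value.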
    
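After the two restarts, one way to confirm that both variables actually reached a running process is to inspect its environment under /proc (Linux only). The sketch below is an assumption-laden illustration, not part of the APAR: `env_file_has_var` is a hypothetical helper, and the example reads `/proc/self/environ`; replace `self` with the PID of the process of interest.

```shell
#!/bin/sh
# Succeeds if the NUL-separated environment file $1 defines variable $2.
env_file_has_var() {
    tr '\000' '\n' < "$1" | grep "^$2=" > /dev/null
}

env_file=/proc/self/environ   # replace "self" with a target PID

if [ -r "$env_file" ]; then
    for name in MALLOC_ARENA_MAX MALLOC_MMAP_THRESHOLD_; do
        if env_file_has_var "$env_file" "$name"; then
            echo "${name} is set"
        else
            echo "${name} is NOT set"
        fi
    done
fi
```

If either variable is reported as not set for the target process, the userprofile change has probably not been propagated to that node yet.
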

Problem summary

  • See Description
    

Problem conclusion

  • Problem is fixed in Version 4.2.0.0 December 2016 Refresh (DB2
    level s161128, Bigsql-dist version 5.78.5.310)
    

Temporary fix

  • See Local fix
    

Comments

APAR Information

  • APAR number

    PI61701

  • Reported component name

    INFO BIGINSIGHT

  • Reported component ID

    5725C0900

  • Reported release

    400

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-04-29

  • Closed date

    2017-05-12

  • Last modified date

    2017-05-31

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • DFSRW
    

Fix information

  • Fixed component name

    INFO BIGINSIGHT

  • Fixed component ID

    5725C0900

Applicable component levels

  • R410 PSN

       UP

  • R420 PSN

       UP

Document Information

Modified date:
25 August 2020