IBM Support

Configuring NameNode Heap Size

Question & Answer


Question

How do you configure the heap size for a NameNode?

Answer

NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The following table provides recommendations for NameNode heap size configuration. These settings should work for typical Hadoop clusters in which the number of blocks is very close to the number of files (the average ratio of blocks per file is generally 1.1 to 1.2). Some clusters might require further tuning of these settings. Also, it is generally better to err on the side of a larger total Java heap.
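Whether a cluster matches the "typical" blocks-per-file ratio described above can be checked with simple arithmetic. The following is a minimal sketch; the function name `blocks_per_file` is hypothetical, and both counts are assumed to be supplied by you from whatever cluster report you already use.

```shell
# Hypothetical helper: print the average blocks-per-file ratio to two decimals.
# Both counts are passed in by the caller; nothing is read from the cluster.
blocks_per_file() {
  awk -v files="$1" -v blocks="$2" 'BEGIN {
    # Refuse to divide by zero or a negative file count.
    if (files <= 0) { exit 1 }
    printf "%.2f\n", blocks / files
  }'
}

# Example call with made-up counts: 10 million files, 12 million blocks.
blocks_per_file 10000000 12000000
```

A result well above 1.2 suggests that the cluster is not typical in the sense used here, so the recommendations might need further tuning.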

Table 1.11. NameNode Heap Size Settings

Number of Files (Millions) | Total Java Heap (-Xmx and -Xms) | Young Generation Size (-XX:NewSize and -XX:MaxNewSize)
< 1                        | 1024m                           | 128m
1-5                        | 3072m                           | 512m
5-10                       | 5376m                           | 768m
10-20                      | 9984m                           | 1280m
20-30                      | 14848m                          | 2048m
30-40                      | 19456m                          | 2560m
40-50                      | 24320m                          | 3072m
50-70                      | 33536m                          | 4352m
70-100                     | 47872m                          | 6144m
100-125                    | 59648m                          | 7680m
125-150                    | 71424m                          | 8960m
150-200                    | 94976m                          | 8960m
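The table lookup can be sketched as a small shell function. This is an illustration only: the name `nn_heap_settings` is hypothetical, the input is assumed to be a whole number of millions of files, and a value that falls exactly on a boundary (for example 5) is assigned to the lower row, which is an assumption because the table does not say which row a boundary belongs to.

```shell
# Hypothetical helper: print "<total heap> <young generation size>" for a
# file count given as a whole number of millions, using Table 1.11.
# Assumption: a boundary value such as 5 or 10 maps to the lower row.
nn_heap_settings() {
  files_m=$1
  if   [ "$files_m" -lt 1 ];   then echo "1024m 128m"
  elif [ "$files_m" -le 5 ];   then echo "3072m 512m"
  elif [ "$files_m" -le 10 ];  then echo "5376m 768m"
  elif [ "$files_m" -le 20 ];  then echo "9984m 1280m"
  elif [ "$files_m" -le 30 ];  then echo "14848m 2048m"
  elif [ "$files_m" -le 40 ];  then echo "19456m 2560m"
  elif [ "$files_m" -le 50 ];  then echo "24320m 3072m"
  elif [ "$files_m" -le 70 ];  then echo "33536m 4352m"
  elif [ "$files_m" -le 100 ]; then echo "47872m 6144m"
  elif [ "$files_m" -le 125 ]; then echo "59648m 7680m"
  elif [ "$files_m" -le 150 ]; then echo "71424m 8960m"
  elif [ "$files_m" -le 200 ]; then echo "94976m 8960m"
  else
    # The table gives no recommendation beyond 200 million files.
    echo "no recommendation above 200 million files" >&2
    return 1
  fi
}

# Example call: a cluster with roughly 15 million files.
nn_heap_settings 15
```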


You should also set -XX:PermSize to 128m and -XX:MaxPermSize to 256m.

The following are the recommended settings for HADOOP_NAMENODE_OPTS in the hadoop-env.sh file (replace the ##### placeholder for -XX:NewSize, -XX:MaxNewSize, -Xms, and -Xmx with the recommended values from the table):

-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=##### -XX:MaxNewSize=##### -Xms##### -Xmx##### -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}
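As an illustration, the placeholders might be filled in as follows for a cluster in the 10-20 million files row. This is a sketch, not a definitive configuration: the variable names `NN_HEAP` and `NN_NEWSIZE` are hypothetical conveniences, and the values should be replaced with the row that matches your cluster.

```shell
# Hypothetical variables holding the values from the 10-20 million files row.
NN_HEAP=9984m       # used for both -Xms and -Xmx
NN_NEWSIZE=1280m    # used for both -XX:NewSize and -XX:MaxNewSize

# Same options as the recommended line above, with the ##### placeholders
# replaced. Each trailing backslash joins the next line into one string.
export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC \
-XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log \
-XX:NewSize=${NN_NEWSIZE} -XX:MaxNewSize=${NN_NEWSIZE} \
-Xms${NN_HEAP} -Xmx${NN_HEAP} \
-XX:PermSize=128m -XX:MaxPermSize=256m \
-Xloggc:/var/log/hadoop/$USER/gc.log-$(date +'%Y%m%d%H%M') \
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps \
-Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT \
${HADOOP_NAMENODE_OPTS}"
```

Keeping the two sizes in variables means that each value is written once, so -Xms and -Xmx (and likewise -XX:NewSize and -XX:MaxNewSize) cannot drift apart when the settings are changed later.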

If the cluster uses a Secondary NameNode, you should also set HADOOP_SECONDARYNAMENODE_OPTS to the same value as HADOOP_NAMENODE_OPTS in the hadoop-env.sh file:

HADOOP_SECONDARYNAMENODE_OPTS=$HADOOP_NAMENODE_OPTS

Product: IBM Db2 Big SQL. Platform: Linux. Versions: 4.1.0, 4.2.0.

Document Information

Modified date:
18 July 2020

UID

swg21987530