General HBase tuning

Tuning HBase can improve performance and balance memory usage.

Updating environment variables (hbase-env.sh)

Depending on how much memory is available on the cluster nodes, you can use environment variables to tune the memory that is available to the HBase master server and the HBase region servers. You can also configure the garbage collector. As part of the HBase tuning process, consider the MapReduce workload and the memory that is allocated to the MapReduce JVMs.

The environment variables that help you control performance in HBase are in the file /etc/hbase/conf/hbase-env.sh.
Note: Do not edit this configuration file manually. Modify properties in the Ambari dashboard.
To change any of the variables listed here, click HBase in the Ambari dashboard and then click the Configs > Advanced tabs. Expand the Advanced hbase-env section, or search for each variable in the Filter field.

After any changes to the variables, save the configuration changes and restart the HBase service.

Master and region server memory
Each region server hosts a set of regions, and each region holds all of the data in one key range.

The HBASE_HEAPSIZE value is the maximum amount of heap to use, in MB. The default is 1000, which is small for an HBase system that is used regularly in your cluster. For good performance, give HBase as much memory as you can without causing the node to swap. The example in the following steps uses a value of 8000, but tune the size based on your environment and workloads.

To increase the JVM heap size for the HBase master server and the region servers, complete the following steps:
  1. Expand the Advanced hbase-env section.
  2. In the hbase-env template field, find the current reference to HBASE_HEAPSIZE and modify the value:
    export HBASE_HEAPSIZE=8000
    If the line begins with a hash sign (#), remove it to uncomment the line.
  3. Increase the JVM heap size for the region servers. In the same hbase-env template field, scroll to find the HBASE_REGIONSERVER_OPTS variable, and update the value:
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms8g -Xmx8g"
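
Before you commit to heap values, it can help to check that they leave room for everything else on the node, because a node that swaps loses the benefit of the larger heap. The following is a minimal sketch, not part of the product: the function name heaps_fit and the example node sizes are hypothetical, and it assumes a node that runs both a master and a region server. A JVM also uses memory outside the heap, so treat the result as optimistic.

```shell
# Return 0 when the planned heaps fit in physical memory, 1 otherwise.
# All arguments are in MB. other_mb is an assumed estimate for everything
# else on the node (operating system, MapReduce JVMs, and so on).
heaps_fit() {
  total_mb=$1
  master_mb=$2
  regionserver_mb=$3
  other_mb=$4
  needed_mb=$(( master_mb + regionserver_mb + other_mb ))
  [ "$needed_mb" -le "$total_mb" ]
}

# Hypothetical 32 GB node: 8000 MB master heap, 8192 MB region server heap,
# and 12288 MB kept for the operating system and MapReduce JVMs.
if heaps_fit 32768 8000 8192 12288; then
  echo "heap plan fits in memory"
else
  echo "heap plan risks swapping"
fi
```

If the check fails, lower the heap values or the memory that is given to other workloads before you save the configuration.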
Garbage collection
HBase uses the JVM garbage collection subsystem, which reduces some memory management issues. Garbage collection is an automated system that handles both the allocation and reclamation of memory for Java objects.
For a JVM with less than 4 GB of heap memory, use the gencon policy (-Xgcpolicy:gencon). A suggested best practice is the following setting:
-Xms3000m -Xmx3000m -Xgcpolicy:gencon 

The -Xms<size> option sets the initial size of the heap. The -Xmx<size> option sets the maximum size of the heap.

For a JVM with more than 4 GB of heap memory, use the balanced policy (-Xgcpolicy:balanced). With this policy, you do not need to set anything beyond the initial size and the maximum size of the heap.
-Xms8192m -Xmx8192m -Xgcpolicy:balanced
You can set the garbage collection options in the HBASE_OPTS variable. Search for the garbage collection settings and then update them with the following string:
export HBASE_OPTS="$HBASE_OPTS -Xgcthreads2 -Xgcpolicy:gencon -Xalwaysclassgc" 
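
The heap-size rule for choosing a policy can be sketched as a small shell function. This is an illustration only: the function name gc_opts_for_heap_mb is hypothetical, and because the text does not say which policy to use at exactly 4 GB, the sketch assumes that 4096 MB and above use the balanced policy. The -Xgcpolicy option is specific to the IBM JVM.

```shell
# Print suggested heap and GC options for a heap size given in MB.
# Assumption: a heap of exactly 4096 MB is treated like a larger heap.
gc_opts_for_heap_mb() {
  heap_mb=$1
  if [ "$heap_mb" -lt 4096 ]; then
    policy=gencon
  else
    policy=balanced
  fi
  printf '%s\n' "-Xms${heap_mb}m -Xmx${heap_mb}m -Xgcpolicy:${policy}"
}

gc_opts_for_heap_mb 3000   # prints -Xms3000m -Xmx3000m -Xgcpolicy:gencon
gc_opts_for_heap_mb 8192   # prints -Xms8192m -Xmx8192m -Xgcpolicy:balanced
```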

Updating configuration values (hbase-site.xml)

HBase site-specific customizations are in the file /etc/hbase/conf/hbase-site.xml.
Note: Do not edit this configuration file manually. Modify properties in the Ambari dashboard.
To change these values, click HBase in the Ambari dashboard and then click the Configs > Settings tabs. Expand the Advanced hbase-site section, or search for each variable in the Filter field.
hbase.regionserver.handler.count

This parameter defines the number of threads that are kept open to answer incoming requests to user tables. The default value is 30.

A rule of thumb is to keep the value low when the payload for each request is large, and to keep the value high when the payload is small. Increase hbase.regionserver.handler.count to a value that is approximately the number of CPUs on the region servers. Go to the Settings tab, and find the Number of Handlers per RegionServer field. Move the horizontal bar to the value that you chose, for example 64.
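
To find a starting value, you can count the CPUs on a region server node. The following is a minimal sketch that assumes a Linux host where getconf or nproc is available; the function name cpu_count is hypothetical, and the printed line is only a reminder of which property to set in Ambari.

```shell
# Count the online CPUs on this host. Run it on a region server node.
cpu_count() {
  getconf _NPROCESSORS_ONLN 2>/dev/null || nproc
}

echo "hbase.regionserver.handler.count = $(cpu_count)"
```

Adjust the result up or down according to the payload rule of thumb before you enter it in the Number of Handlers per RegionServer field.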

hbase.hregion.max.filesize
This parameter is the maximum HStoreFile size, in bytes. The default value is 10737418240 (10 GB). Big SQL determines the number of mappers based on the region size, with one mapper for each region. Decrease the region size to get more regions, and therefore more mappers. Go to the Settings tab, and find the Maximum Region File Size field. Move the horizontal bar to a value between 10 GB and 11 GB.
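
Because there is one mapper for each region, the region size gives a rough idea of how many mappers a table produces. The following sketch divides a table size by the maximum region file size. The function names and the 100 GB table are hypothetical, and the result is only an estimate, because the real region count depends on how regions split and on any pre-splitting.

```shell
# Ceiling division for positive integers.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Rough estimate of regions (and therefore mappers) for a table.
# Both arguments are in bytes.
estimated_regions() {
  table_bytes=$1
  max_filesize=$2
  ceil_div "$table_bytes" "$max_filesize"
}

# Hypothetical 100 GB table.
estimated_regions 107374182400 10737418240   # default 10 GB regions: prints 10
estimated_regions 107374182400 5368709120    # 5 GB regions: prints 20
```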
hbase.client.write.buffer
This parameter is the size of the HTable client write buffer in bytes. The default value is 2097152 (2 MB).
A bigger buffer takes more memory on both the client and the server side, but it reduces the number of remote procedure calls that are made. Increase the hbase.client.write.buffer value. To change the value, click HBase and then click the Configs > Advanced tabs. Expand the Custom hbase-site section, or search for the variable in the Filter field.

hbase.client.write.buffer = 8388608
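
The effect of the buffer size on remote procedure calls can be estimated with simple arithmetic. The following sketch counts how many times a client buffer fills while writing a given volume of data. The function name buffer_flushes and the 1 GB write volume are hypothetical, and the model is a simplification: it ignores how rows are spread across region servers.

```shell
# Estimate how many buffer flushes are needed to write total_bytes
# when each flush sends at most buffer_bytes.
buffer_flushes() {
  total_bytes=$1
  buffer_bytes=$2
  echo $(( (total_bytes + buffer_bytes - 1) / buffer_bytes ))
}

# Hypothetical 1 GB of writes.
buffer_flushes 1073741824 2097152   # default 2 MB buffer: prints 512
buffer_flushes 1073741824 8388608   # 8 MB buffer: prints 128
```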
hbase.client.scanner.caching
This parameter is the number of rows that are fetched when calling next on a scanner, if it is not served from memory. The default value is 100.
A higher caching value enables faster scanners, but it uses more memory, and some calls to next can take longer when the cache is empty. Increase the scanner cache size to improve the performance of large reads. To change the value, click HBase and then click the Configs > Advanced tabs. Expand the Custom hbase-site section, or search for the variable in the Filter field.

hbase.client.scanner.caching = 10000
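
The same kind of estimate shows why a larger scanner cache helps large reads. The following sketch counts how many fetches a scan needs when each fetch returns at most the configured number of rows. The function name scanner_fetches and the one-million-row scan are hypothetical, and the model ignores other limits on a fetch, such as result size.

```shell
# Estimate how many fetches a scan of total_rows needs when each fetch
# returns at most caching rows.
scanner_fetches() {
  total_rows=$1
  caching=$2
  echo $(( (total_rows + caching - 1) / caching ))
}

# Hypothetical scan of 1,000,000 rows.
scanner_fetches 1000000 100     # default caching: prints 10000
scanner_fetches 1000000 10000   # tuned caching: prints 100
```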