Tuning for YCSB/HBase

HBase configuration
  1. Change hbase-site.xml through Ambari if you use HortonWorks or IBM® BigInsights®. If you use open source HBase, you can modify $HBASE_HOME/conf/hbase-site.xml directly.
    Table 1. HBase Configuration Tuning
    Configuration | Default value | Recommended value | Comments
    Java™ Heap | N/A | Refer to the Memory tuning section. | HBase Master server heap size; HBase Region Server heap size.
    hbase.regionserver.handler.count | 30 | 60 |
    zookeeper.session.timeout | N/A | 180000 |
    hbase.hregion.max.filesize | 10737418240 | 10737418240 | Check the default value. If it is not 10 GB, change it to 10 GB.
    hbase.hstore.blockingStoreFiles | 10 | 50 |
    hbase.hstore.compaction.max | 10 | 10 |
    hbase.hstore.compaction.max.size | Long.MAX_VALUE | Variable | If you see a lot of compaction, you can set this to 1 GB to exclude HFiles larger than 1 GB from compaction.
    hbase.hregion.majorcompaction | 604800000 | 0 | Turn off major compaction while running the benchmark to ensure that the results are stable. In production, this should not be changed.
    hbase.hstore.compactionThreshold | 3 | 3 |
    hbase.hstore.compaction.max | 10 | 3 |
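As a rough aid for applying Table 1, the following Python sketch compares the properties in an hbase-site.xml file against a subset of the recommended values. The helper names (read_hbase_site, diff_against_recommended) and the sample XML are illustrative assumptions and are not part of HBase or Ambari. hbase.hstore.compaction.max and hbase.hstore.compaction.max.size are left out because Table 1 does not give a single fixed value for them.

```python
import xml.etree.ElementTree as ET

# Subset of the recommended values from Table 1, kept as strings
# because hbase-site.xml stores every value as text.
RECOMMENDED = {
    "hbase.regionserver.handler.count": "60",
    "zookeeper.session.timeout": "180000",
    "hbase.hregion.max.filesize": "10737418240",
    "hbase.hstore.blockingStoreFiles": "50",
    "hbase.hregion.majorcompaction": "0",
    "hbase.hstore.compactionThreshold": "3",
}


def read_hbase_site(xml_text):
    """Parse the <property> entries of an hbase-site.xml document into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_against_recommended(props):
    """Return {name: (current, recommended)} for settings that are missing or differ."""
    return {
        name: (props.get(name), wanted)
        for name, wanted in RECOMMENDED.items()
        if props.get(name) != wanted
    }


# Hypothetical sample input; a real file would be read from
# $HBASE_HOME/conf/hbase-site.xml or exported from Ambari.
SAMPLE_XML = """<configuration>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
  </property>
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    diff = diff_against_recommended(read_hbase_site(SAMPLE_XML))
    for name, (current, wanted) in diff.items():
        print(f"{name}: current={current} recommended={wanted}")
```

A property reported with current=None is not set in the file, so HBase falls back to its built-in default for it.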
Table 2. IBM Storage® Scale Tuning
Configuration | Default value | Recommended value | Comments
pagepool | 1 GB | 30% of physical memory |
Note: The recommendation of 30% of physical memory applies only when the cluster runs HBase/YCSB alone. In production, you need to account for the memory needs of other workloads. If you run Map/Reduce or Hive jobs on the same cluster, you need to trade off performance among these workloads. If you allocate more memory to pagepool for HBase, less memory is left for Map/Reduce jobs, which degrades their performance.
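The 30% rule can be worked out per node with a few lines of Python. This is a minimal sketch: the function names are hypothetical, the memory probe relies on Linux-specific os.sysconf names, and the percentage is a parameter so that it can be lowered when other workloads share the node.

```python
import os


def pagepool_bytes(physical_bytes, percent=30):
    """Return the pagepool size as a percentage of physical memory (integer bytes)."""
    return physical_bytes * percent // 100


def as_mebibytes(size_bytes):
    """Format a byte count as whole mebibytes with an 'M' suffix, for example '30720M'."""
    return f"{size_bytes // (1024 * 1024)}M"


def physical_memory_bytes():
    """Read the total physical memory of this node (Linux-specific sysconf names)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    total = physical_memory_bytes()
    print("physical memory :", as_mebibytes(total))
    print("pagepool at 30% :", as_mebibytes(pagepool_bytes(total)))
```

The resulting value could then be applied with the Storage Scale mmchconfig command; check the product documentation for the exact syntax and size suffixes that your release accepts.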
Table 3. YCSB Configuration Tuning
Configuration | Default value | Recommended value | Comments
writebuffersize | 12 MB | 12 MB |
clientbuffering | False | True | For the benchmark, keep this the same as the setting that you use to run YCSB over native HDFS.
recordcount | 1000 | 1000000 |
operationcount | N/A | N/A | Depends on the number of operations you want to benchmark, for example, 20M operations.
threads | N/A | Variable | Depends on the number of threads you want to benchmark.
requestdistribution | zipfian | Not changed |
recordsize | 100*10 | Not changed | YCSB for HBase uses 100 bytes per field and 10 fields per record.
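To make the Table 3 numbers concrete, the following Python sketch estimates the raw payload that a load phase writes and renders the settings as YCSB -p key=value command-line arguments. The helper names are hypothetical, the 20M operation count is only the example from the table, and 12 MB is assumed to mean 12 × 1024 × 1024 bytes. The estimate covers field data only and ignores row keys and HBase storage overhead.

```python
def raw_payload_bytes(recordcount, fieldcount=10, fieldlength=100):
    """Field data written by the load phase: records x fields x bytes per field."""
    return recordcount * fieldcount * fieldlength


def ycsb_property_args(properties):
    """Turn a dict of YCSB properties into a flat list of '-p key=value' arguments."""
    args = []
    for key, value in properties.items():
        args.extend(["-p", f"{key}={value}"])
    return args


# Values taken from Table 3; operationcount uses the 20M example.
TABLE_3 = {
    "recordcount": 1000000,
    "operationcount": 20000000,
    "clientbuffering": "true",
    "writebuffersize": 12 * 1024 * 1024,  # assumption: 12 MB expressed in bytes
}

if __name__ == "__main__":
    print("raw payload bytes:", raw_payload_bytes(TABLE_3["recordcount"]))
    print(" ".join(ycsb_property_args(TABLE_3)))
```

With the recommended recordcount of 1000000, the raw field data comes to 1000000 × 10 × 100 = 1,000,000,000 bytes, or roughly 1 GB before replication.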
Important: When you create the HBase table before running YCSB, you need to pre-split it. For example, pre-split the table into 100 partitions for approximately 10 HBase Region servers. If you have more than 10 HBase Region servers, increase the number of pre-split partitions.

If you do not pre-split the table, all requests are handled by only a few HBase Region servers, which limits YCSB performance.
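One way to produce the split points is sketched below in Python. It rests on several assumptions that are not stated above: the partition count scales linearly at 10 per Region server (extrapolated from the 100-for-10 example), YCSB row keys begin with the prefix "user" followed by digits, and the table and column family names ("usertable", "family") are placeholders. The function names are hypothetical; the output is an HBase shell create statement with a SPLITS list.

```python
def presplit_regions(region_servers, regions_per_server=10):
    """Number of partitions, assuming about 10 per Region server."""
    return max(1, region_servers * regions_per_server)


def split_keys(regions, prefix="user", low=1000, high=9999):
    """Evenly spaced split points; N regions need N - 1 split keys."""
    return [f"{prefix}{low + i * (high - low) // regions}" for i in range(1, regions)]


def hbase_shell_create(table, family, keys):
    """Render an HBase shell 'create' statement that pre-splits the table."""
    quoted = ", ".join(f"'{key}'" for key in keys)
    return f"create '{table}', '{family}', SPLITS => [{quoted}]"


if __name__ == "__main__":
    keys = split_keys(presplit_regions(region_servers=10))
    print(len(keys) + 1, "regions")
    print(hbase_shell_create("usertable", "family", keys))
```

Because every generated key has the same prefix and the same number of digits, the keys sort in the same order as the numbers they were built from, which is what HBase requires of split points.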