Running YCSB/HBase

This section lists the steps to run YCSB/HBase.

  1. Download the YCSB 0.14.0 release.
  2. Extract the YCSB package into $YCSB_HOME.
  3. Remove all the libraries shipped by YCSB:
    rm -rf $YCSB_HOME/hbase10-binding/lib/*
  4. Copy the libraries from your Hadoop distribution:
    The following is for Hortonworks HDP 2.6:
    cp /usr/hdp/2.6.0.3-8/hbase/lib/* $YCSB_HOME/hbase10-binding/lib/
    cp /usr/hdp/2.6.0.3-8/hadoop/*.jar $YCSB_HOME/hbase10-binding/lib/
    
  5. Create the following scripts:
    The script for loading:
    vim ycsb_workload_load.sh 
    #!/bin/bash
    set -x
    
    # Usage: <script> <thread-number> <result sub dir>
    
    # Write both stdout and stderr to the result file.
    ${YCSB_HOME}/bin/ycsb load hbase10 -P ${YCSB_HOME}/workloads/workloada \
    -p columnfamily=family -p recordcount=${RECORD_COUNT} -s -threads $1 \
    -p measurementtype=timeseries -p timeseries.granularity=2000 > \
    ${YCSB_RESULT_HOME}/$2/workload_load.output.thread-$1-.`date "+%y%m%d_%H%M%S"` 2>&1
    

    In the previous script, RECORD_COUNT is the number of records to load into HBase; set it to 1M (1000000). Choose <thread-number> to suit your test run.

    To change writebuffersize and clientbuffering, add -p writebuffersize=<value> -p clientbuffering=<value> to the previous YCSB command.

    The script for YCSB workloads A/B/C/D/E/F (workload A is shown):
    vim ycsb_workload_a.sh
    #!/bin/bash
    # <script> <workloadA-recordcount> <thread-number> <result sub dir> 
    
    ${YCSB_HOME}/bin/ycsb run hbase10 -P ${YCSB_HOME}/workloads/workloada \
    -p columnfamily=family -p operationcount=${OPERATION_COUNT} \
    -p recordcount=$1 -s -threads $2 -p measurementtype=timeseries \
    -p timeseries.granularity=2000 2>&1 | \
    tee ${YCSB_RESULT_HOME}/$3/workload_a.output.thread-$2.`date "+%y%m%d_%H%M%S"`
    

Similarly, create the other scripts for workloads B/C/D/E/F.
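One way to avoid writing the remaining five scripts by hand is to derive them from ycsb_workload_a.sh. The following is a minimal sketch, not part of the original procedure. It assumes that the only workload-specific parts of the script are the strings "workloada" (the workload file) and "workload_a" (the output file prefix). The function name generate_workload_scripts is hypothetical.

```shell
#!/bin/bash
# Sketch: derive ycsb_workload_{b..f}.sh from ycsb_workload_a.sh.
# Assumption: only "workloada" and "workload_a" differ between workloads.

generate_workload_scripts() {
    local template=$1
    local dir w
    dir=$(dirname "$template")

    for w in b c d e f; do
        # Rewrite the workload file name and the output file prefix.
        sed -e "s/workloada/workload${w}/g" \
            -e "s/workload_a/workload_${w}/g" \
            "$template" > "${dir}/ycsb_workload_${w}.sh" || return 1
        chmod +x "${dir}/ycsb_workload_${w}.sh"
    done
}

# Generate the scripts only if the template exists in the current directory.
if [ -f ycsb_workload_a.sh ]; then
    generate_workload_scripts ycsb_workload_a.sh
fi
```

Review each generated script before running it, since the substitution is purely textual.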

Run the ycsb_workload_load.sh script first. After it has loaded the data into HBase, run ycsb_workload_a.sh or any of the other workload scripts.
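The run order can be captured in a small driver. The following is a sketch under stated assumptions rather than a required part of the procedure: the function name run_benchmark and the SCRIPT_DIR variable are hypothetical, and the sketch creates the result subdirectory because the scripts write into ${YCSB_RESULT_HOME}/<result sub dir> without creating it.

```shell
#!/bin/bash
# Sketch: run the load script once, then workload A, stopping on failure.
# run_benchmark and SCRIPT_DIR are names invented for this example; the two
# script names and the environment variables come from the scripts above.

run_benchmark() {
    local threads=$1 subdir=$2
    local script_dir=${SCRIPT_DIR:-.}
    local v

    # The scripts read these variables from the environment (export them);
    # this loop only checks that each one is non-empty in this shell.
    for v in YCSB_HOME YCSB_RESULT_HOME RECORD_COUNT OPERATION_COUNT; do
        if [ -z "${!v}" ]; then
            echo "error: $v is not set" >&2
            return 1
        fi
    done

    # Create the directory that both scripts write their output into.
    mkdir -p "${YCSB_RESULT_HOME}/${subdir}" || return 1

    # Load first, then run the workload with the same record count.
    "${script_dir}/ycsb_workload_load.sh" "$threads" "$subdir" || return 1
    "${script_dir}/ycsb_workload_a.sh" "$RECORD_COUNT" "$threads" "$subdir"
}
```

For example, after exporting YCSB_HOME, YCSB_RESULT_HOME, RECORD_COUNT, and OPERATION_COUNT, a call such as run_benchmark 16 test1 would load the data with 16 threads and then run workload A, writing both outputs under ${YCSB_RESULT_HOME}/test1. The thread count and directory name here are illustrative only.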