Apache HBase
Apache HBase is an open-source, distributed, column-oriented store. HBase is the Hadoop database and provides Bigtable-like capabilities on top of Hadoop and HDFS.
Before you begin
Ensure that the MapReduce framework in IBM® Spectrum Symphony is set to use Hadoop or Cloudera's Distribution including Hadoop (CDH). For the supported versions, see Supported distributed file systems for MapReduce or YARN integration. To run HBase with Cloudera, download the required packages from the Cloudera web site. For the supported versions of HBase that the MapReduce framework in IBM Spectrum Symphony has been qualified with, see Supported third-party applications for MapReduce.
About this task
Follow these steps to use HBase within the MapReduce framework in IBM Spectrum Symphony:
Procedure
Example of using the HBase RowCounter tool
For example, if you have created a table named testhbase1, you can use RowCounter to count the rows in the table testhbase1 within the MapReduce framework in IBM Spectrum Symphony, like this:
# HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` mrsh jar /root/hbase/lib/hbase-server-0.98.4-hadoop2.jar rowcounter 'testhbase1'
You are using Hadoop API with 2.4.x version.
... ...
2014-10-31 18:11:17,803 INFO [main] internal.MRJobSubmitter: Connected to JobTracker(SSM)
2014-10-31 18:11:17,827 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-10-31 18:11:18,083 INFO [main] Configuration.deprecation: mapred.output.key.comparator.class is deprecated. Instead, use mapreduce.job.output.key.comparator.class
2014-10-31 18:11:18,091 WARN [main] internal.MRJobSubmitter: !Not using C++ framework, because comparator class < org.apache.hadoop.hbase.io.ImmutableBytesWritable > for map output key class < org.apache.hadoop.hbase.io.ImmutableBytesWritable$Comparator > is not supported in C++!
2014-10-31 18:11:20,879 INFO [main] internal.MRJobSubmitter: Job <rowcounter_testhbase1> submitted, job id <703>
2014-10-31 18:11:20,879 INFO [main] internal.MRJobSubmitter: Job will not verify intermediate data integrity using checksum.
2014-10-31 18:11:20,882 INFO [main] mapreduce.Job: Running job: job_ssm_0703
2014-10-31 18:14:07,940 INFO [main] mapreduce.Job: Job job_ssm_0703 running in uber mode : false
2014-10-31 18:14:07,944 INFO [main] mapreduce.Job: map 0% reduce 0%
2014-10-31 18:14:14,950 INFO [main] mapreduce.Job: map 100% reduce 100%
2014-10-31 18:14:14,951 INFO [main] mapreduce.Job: Job job_ssm_0703 completed successfully
2014-10-31 18:14:16,181 INFO [main] mapreduce.Job: Counters: 20
    Map-Reduce Framework
        Map input records=4
        Map output records=0
        Input split bytes=40
        GC time elapsed (ms)=67
    File System Counters
        GPFS: Number of bytes read=0
        GPFS: Number of bytes written=0
        GPFS: Number of large read operations=0
        GPFS: Number of read operations=0
        GPFS: Number of write operations=0
    HBase Counters
        BYTES_IN_REMOTE_RESULTS=128
        BYTES_IN_RESULTS=128
        MILLIS_BETWEEN_NEXTS=476
        NOT_SERVING_REGION_EXCEPTION=0
        NUM_SCANNER_RESTARTS=0
        REGIONS_SCANNED=1
        REMOTE_RPC_CALLS=3
        REMOTE_RPC_RETRIES=0
        RPC_CALLS=3
        RPC_RETRIES=0
    org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
        ROWS=4
This example shows that the table testhbase1 contains 4 rows, as reported by the ROWS=4 counter.
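As a minimal sketch of how such a table might be prepared and cross-checked, the following HBase shell session creates a table like testhbase1 and verifies the row count with the shell's built-in count command. The column family name cf, the row keys, and the values are assumptions for illustration and are not taken from this example; this requires a running HBase installation with the hbase command on the PATH.

```
# hbase shell
hbase> # Create a table with one column family (cf is a hypothetical name)
hbase> create 'testhbase1', 'cf'
hbase> # Insert four rows with distinct row keys
hbase> put 'testhbase1', 'row1', 'cf:a', 'value1'
hbase> put 'testhbase1', 'row2', 'cf:a', 'value2'
hbase> put 'testhbase1', 'row3', 'cf:a', 'value3'
hbase> put 'testhbase1', 'row4', 'cf:a', 'value4'
hbase> # Count rows client-side; this should agree with the RowCounter job
hbase> count 'testhbase1'
```

Unlike the shell's count command, which scans the table from a single client, RowCounter distributes the scan as a MapReduce job, so it is the more practical check for large tables.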