Apache Hive is a data warehouse system that summarizes
data, facilitates ad-hoc queries, and analyzes large data sets stored
in Hadoop-compatible file systems. It provides a mechanism to project
structure on this data and to query data using a SQL-like language
called HiveQL. This language also allows you to plug in custom mappers
and reducers when it is inconvenient or inefficient to express logic
in HiveQL.
About this task
Follow these steps to run Hive applications with IBM Spectrum Symphony.
Procedure
- Install Hive.
- Download Hive. The MapReduce framework in IBM Spectrum Symphony is
qualified with Hive 0.13.1.
- Extract the Hive package.
$ tar -zxvf hive-0.13.1.tar.gz -C /opt/
- Configure Hive.
- Set the environment variables for running Hive within
the MapReduce framework in IBM Spectrum Symphony.
Include HIVE_HOME, PATH, HADOOP_HOME,
and HBASE_HOME.
For example:
export HIVE_HOME=/bi211/hive
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_HOME=/root/hadoop
#export HBASE_HOME=/root/hbase
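Before starting Hive, it can help to confirm that the variables are actually set in the current shell. The following is a minimal sketch, not part of Hive or IBM Spectrum Symphony: check_hive_env is a hypothetical helper name, and the list of variables to check is passed in by the caller.

```shell
# Sketch only: check_hive_env is a hypothetical helper, not a Hive or
# IBM Spectrum Symphony command. It reports each named variable that is
# unset or empty and returns non-zero if any are missing.
check_hive_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_hive_env HIVE_HOME HADOOP_HOME PMR_BINDIR prints one line per unset variable. PMR_BINDIR is included here only because the later steps reference it.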
- Edit the $HIVE_HOME/bin/hive file by adding the line
HADOOP=$PMR_BINDIR/mrsh
between lines 202 and 205. For example:
if [ "$hadoop_major_ver" -lt "1" -a "$hadoop_minor_ver$hadoop_patch_ver" -lt "201" ]; then
echo "Hive requires Hadoop 2.4.x (x >= 1)."
echo "'hadoop version' returned:"
echo `$HADOOP version`
exit 6
fi
HADOOP=$PMR_BINDIR/mrsh #* add this one line *#
# HBase detection. Need bin/hbase and a conf dir for building classpath entries.
# Start with BigTop defaults for HBASE_HOME and HBASE_CONF_DIR.
HBASE_HOME=${HBASE_HOME:-"/usr/lib/hbase"}
HBASE_CONF_DIR=${HBASE_CONF_DIR:-"/etc/hbase/conf"}
if [[ ! -d $HBASE_CONF_DIR ]] ; then
# not explicitly set, nor in BigTop location. Try looking in HBASE_HOME.
HBASE_CONF_DIR="$HBASE_HOME/conf"
fi
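Line numbers can shift between Hive builds, so the edit above can also be located by its surrounding text. The sketch below assumes the script contains the "# HBase detection" comment shown in the example and inserts the override directly before it. add_mrsh_override is a hypothetical helper name, and the backup suffix .orig is an arbitrary choice.

```shell
# Sketch only: add_mrsh_override is a hypothetical helper.
# Assumes the target script contains a line starting with "# HBase detection".
add_mrsh_override() {
    script=$1
    # Nothing to do if the override line is already present.
    if grep -q '^HADOOP=\$PMR_BINDIR/mrsh' "$script"; then
        return 0
    fi
    # Keep a copy of the unmodified script.
    cp "$script" "$script.orig"
    # Print the override once, just before the HBase detection comment.
    awk '
        /^# HBase detection/ && !done { print "HADOOP=$PMR_BINDIR/mrsh"; done = 1 }
        { print }
    ' "$script.orig" > "$script"
    # Succeed only if the line was actually inserted.
    grep -q '^HADOOP=\$PMR_BINDIR/mrsh' "$script"
}
```

Calling add_mrsh_override "$HIVE_HOME/bin/hive" a second time leaves the file unchanged. If the anchor comment is not found, the function returns non-zero and the file content is the same as the backup.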
- Create or modify the hive-site.xml file, located at $HIVE_HOME/conf,
with the hadoop.bin.path entry to set the path to the mrsh utility
in your installation. For example:
<configuration>
<property>
<name>hadoop.bin.path</name>
<value>$PMR_BINDIR/mrsh</value>
</property>
</configuration>
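If no hive-site.xml exists yet, the file can be generated from the shell. The following sketch makes two assumptions that are not stated in this procedure: that PMR_BINDIR is set in the current shell, and that writing the resolved path of mrsh (instead of the literal string $PMR_BINDIR) is acceptable in your installation. write_hive_site is a hypothetical helper name.

```shell
# Sketch only: write_hive_site is a hypothetical helper.
# Assumption: the resolved mrsh path may be written in place of $PMR_BINDIR.
write_hive_site() {
    conf_dir=$1
    target="$conf_dir/hive-site.xml"
    if [ -z "${PMR_BINDIR:-}" ]; then
        echo "PMR_BINDIR is not set"
        return 1
    fi
    # Never overwrite an existing configuration; merge by hand instead.
    if [ -e "$target" ]; then
        echo "exists, edit by hand: $target"
        return 1
    fi
    cat > "$target" <<EOF
<configuration>
  <property>
    <name>hadoop.bin.path</name>
    <value>${PMR_BINDIR}/mrsh</value>
  </property>
</configuration>
EOF
}
```

The function refuses to touch an existing file because a hive-site.xml that is already in place may carry other properties, which should be kept when the hadoop.bin.path entry is added.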
- Run the Hive script from the command line interface. For example,
run the following commands to create a Hive table, load some data,
and select certain records from it:
hive
Use USER_CLASSPATH instead of HADOOP_CLASSPATH to set the user's classpath.
hive> create table pokes(foo INT, bar STRING);
hive> show tables;
hive> LOAD DATA LOCAL INPATH 'hive-0.7.1/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
hive> SELECT a.foo FROM pokes a WHERE a.foo > 490;
Hive cannot use subcommands that mrsh does not support (for example, the fs command used with Hadoop).
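The interactive statements in this step can also be collected into a script file. The sketch below only writes that file. write_pokes_script is a hypothetical helper name, and the data file path and output file name are supplied by the caller. Running the result with the Hive CLI option -f is assumed to behave the same way under mrsh as the interactive session, which this procedure does not confirm.

```shell
# Sketch only: write_pokes_script is a hypothetical helper.
# $1 = local data file to load, $2 = HiveQL script file to create.
write_pokes_script() {
    data_file=$1
    out_file=$2
    # Same four statements as the interactive session, one per line.
    cat > "$out_file" <<EOF
CREATE TABLE pokes (foo INT, bar STRING);
SHOW TABLES;
LOAD DATA LOCAL INPATH '$data_file' OVERWRITE INTO TABLE pokes;
SELECT a.foo FROM pokes a WHERE a.foo > 490;
EOF
}
```

A possible use is write_pokes_script hive-0.7.1/examples/files/kv1.txt pokes.hql followed by hive -f pokes.hql, where pokes.hql is an arbitrary file name.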