MapR
The MapReduce framework in IBM® Spectrum Symphony can work with MapR, an enterprise distribution of Apache Hadoop.
About this task
Procedure
- Download and install MapR. The MapReduce framework in IBM Spectrum Symphony is qualified with MapR version 3.0.2.
  MapR is available for download at: http://package.mapr.com/releases/v3.0.2/redhat/
  Ensure that the file system is installed under the MapR_HOME folder and that IBM Spectrum Symphony can access MapR_HOME.
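The access requirement in this step can be checked before continuing. The following is a minimal sketch, not an IBM or MapR tool: check_mapr_home is a hypothetical helper name, and the two JAR paths are taken from the CLASSPATH lines used later in this procedure.

```shell
# Sketch: confirm that a MapR installation directory is usable.
# check_mapr_home is a hypothetical helper, not a Symphony or MapR command.
check_mapr_home() {
    mapr_home=$1
    # The directory must exist and be readable and searchable by the current user.
    if [ ! -d "$mapr_home" ] || [ ! -r "$mapr_home" ] || [ ! -x "$mapr_home" ]; then
        echo "MapR_HOME is missing or not accessible: $mapr_home" >&2
        return 1
    fi
    # These two JAR files are the ones added to CLASSPATH later in this procedure.
    for jar in lib/maprfs-1.0.3-mapr-3.0.2.jar \
               hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar; do
        if [ ! -r "$mapr_home/$jar" ]; then
            echo "Missing or unreadable: $mapr_home/$jar" >&2
            return 1
        fi
    done
    echo "MapR_HOME looks usable: $mapr_home"
}
```

For example, run check_mapr_home /opt/mapr as the user that runs IBM Spectrum Symphony on each MapReduce host; a nonzero exit status points to a missing directory or JAR file.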
- As cluster administrator, shut down the IBM Spectrum Symphony cluster.
    soamcontrol app disable all
    egosh service stop all
    egosh ego shutdown all
- Add the following configuration to the core-site.xml file, located under the $PMR_HOME/conf directory:
    <property>
      <name>fs.default.name</name>
      <value>maprfs://cldbHost:7222/</value>
    </property>
    <property>
      <name>fs.maprfs.impl</name>
      <value>com.mapr.fs.MapRFileSystem</value>
    </property>
  where:
    cldbHost is the host on which MapR's Container Location Database (CLDB) (mapr-cldb) is installed.
    7222 is the default port used for communication with the CLDB; change it according to your environment.
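Because cldbHost and the port vary by environment, it can help to generate the two properties rather than edit them by hand. The sketch below assumes nothing beyond the values named in this step; render_maprfs_properties is a hypothetical helper name.

```shell
# Sketch: print the two core-site.xml properties for a given CLDB host and port.
# render_maprfs_properties is a hypothetical helper name.
render_maprfs_properties() {
    cldb_host=$1
    cldb_port=${2:-7222}   # 7222 is the default CLDB port named in this step
    cat <<EOF
<property>
  <name>fs.default.name</name>
  <value>maprfs://${cldb_host}:${cldb_port}/</value>
</property>
<property>
  <name>fs.maprfs.impl</name>
  <value>com.mapr.fs.MapRFileSystem</value>
</property>
EOF
}
```

The output is meant to be pasted inside the configuration element of $PMR_HOME/conf/core-site.xml; the function only prints text and does not modify any file.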
- Edit pmr-env.sh, located under $PMR_HOME/conf, to set HADOOP_VERSION to the HDFS version.
    export HADOOP_VERSION=20_204
- Edit the mrsh job submission script (at $SOAM_HOME/mapreduce/7.3.1/linux2.6-glibc2.3-x86_64/bin/) on each MapReduce host to add the following code:
  - Locate this line:
      if [ $DEBUG_PORT ];then
  - Before this line, add the following code:
      CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
      CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH
    The file should have the following code:
      if [ "$JAVA_HEAP_MAX" ]; then
        JAVA_HEAP_MAX_DEFAULT=${JAVA_HEAP_MAX}
      fi
      CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
      CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH
      if [ $DEBUG_PORT ];then
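Since the mrsh edit must be repeated on each MapReduce host, it may be worth scripting. The following is a sketch under stated assumptions: add_mapr_classpath is a hypothetical helper, it inserts before the first matching line only, and it writes the patched script to standard output instead of changing the file in place.

```shell
# Sketch: insert the two CLASSPATH lines before the first line that contains
# the DEBUG_PORT test. add_mapr_classpath is a hypothetical helper; it prints
# the patched script and leaves the input file untouched.
add_mapr_classpath() {
    script=$1
    mapr_home=$2
    awk -v home="$mapr_home" '
        # Fire once, on the first line containing: if [ $DEBUG_PORT ];then
        !inserted && index($0, "if [ $DEBUG_PORT ];then") {
            printf "CLASSPATH=\"%s/lib/maprfs-1.0.3-mapr-3.0.2.jar\":$CLASSPATH\n", home
            printf "CLASSPATH=\"%s/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar\":$CLASSPATH\n", home
            inserted = 1
        }
        # Every input line is passed through unchanged.
        { print }
    ' "$script"
}
```

A cautious way to use it is to redirect the output to a new file, compare it with the original by using diff, and replace mrsh only after the difference shows exactly the two added lines.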
- Edit the RunMapReduceService.sh script (found in the $SOAM_HOME/mapreduce/7.3.1/linux2.6-glibc2.3-x86_64/etc/ directory):
  - Locate this line:
      loginfo "Trying to start a service instance."
  - Replace all the code under this line with the following code:
      CLASSPATH=$SYMPHONY_SDK_JARFILE:$PMR_APP_DEP_JARFILES:$WORK_DIR/classpath:$SOAM_HOME/mapreduce/conf:${USEDCONF}
      CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
      CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH
      loginfo "JVM start cmd: $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS"
      loginfo "Start to run JVM ..."
      $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS >> ${filepath};
    The file should have the following code (in this example, MapR_HOME is /opt/mapr):
      loginfo "Trying to start a service instance."
      CLASSPATH=$SYMPHONY_SDK_JARFILE:$PMR_APP_DEP_JARFILES:$WORK_DIR/classpath:$SOAM_HOME/mapreduce/conf:${USEDCONF}
      CLASSPATH="/opt/mapr/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
      CLASSPATH="/opt/mapr/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH
      loginfo "JVM start cmd: $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS"
      loginfo "Start to run JVM ..."
      $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS >> ${filepath};
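Replacing everything under a marker line is easy to get wrong by hand, so one possible approach is to rebuild the file from two parts. In this sketch, replace_after_marker is a hypothetical helper and the replacement code is assumed to be saved in a separate file; the result goes to standard output so that the original script stays intact.

```shell
# Sketch: keep everything up to and including the marker line, drop the rest,
# then append replacement code read from a second file.
# replace_after_marker is a hypothetical helper name.
replace_after_marker() {
    script=$1
    replacement=$2
    marker='loginfo "Trying to start a service instance."'
    # Refuse to produce output if the marker line is absent.
    if ! grep -qF "$marker" "$script"; then
        echo "marker not found in $script" >&2
        return 1
    fi
    # sed prints each line and quits right after printing the first marker line.
    sed '/loginfo "Trying to start a service instance\."/q' "$script"
    cat "$replacement"
}
```

Back up RunMapReduceService.sh before replacing it with the generated output, and check that the new file keeps its execute permission.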
- Start the cluster.
    egosh ego start
- To verify your setup, copy some files to the /user/root/input folder on the MapR file system.
- Run the following job:
    mrsh jar $PMR_HOME/7.3.1/linux2.6-glibc2.3-x86_64/samples/hadoop-examples-0.20.204.0.jar wordcount /user/root/input /user/root/output
- Check the result.
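One way to check the result is to count the words in the same input files locally and compare the counts with the job output. The sketch below assumes that the sample job splits words on whitespace and writes one word, a tab, and a count per line, as the stock Hadoop wordcount example does; local_wordcount is a hypothetical helper name.

```shell
# Sketch: count words locally, one "word<TAB>count" pair per line, sorted by word.
# local_wordcount is a hypothetical helper; it reads the named files, or
# standard input when no file is given.
local_wordcount() {
    cat "$@" \
        | tr -s '[:space:]' '\n' \
        | sed '/^$/d' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Run it against local copies of the files that were placed in /user/root/input, then compare its output with the contents of /user/root/output. If the job used more than one reducer, the output is spread over several part files, so sort the combined job output before comparing.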