MapR

The MapReduce framework in IBM® Spectrum Symphony can work with MapR, an enterprise distribution of Apache Hadoop.

About this task

Follow these steps to configure MapR as a distributed file system for MapReduce:

Procedure

Download and install MapR. The MapReduce framework in IBM Spectrum Symphony is qualified with MapR version 3.0.2.

MapR is available for download at: http://package.mapr.com/releases/v3.0.2/redhat/

Ensure that the file system is installed under the MapR_HOME folder (the MapR installation directory, for example /opt/mapr) and that IBM Spectrum Symphony can access MapR_HOME.
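
A quick pre-flight check can confirm that the two MapR JAR files used later in this procedure are readable by the account that runs IBM Spectrum Symphony. The following is a minimal sketch, not part of the product: the function name check_mapr_jars is hypothetical, and the JAR paths are the ones this topic lists for MapR 3.0.2.

```shell
# Hypothetical helper: verify that the MapR client JARs used in this topic
# are readable under the given MapR installation directory (MapR_HOME).
check_mapr_jars() {
  mapr_home=$1
  status=0
  for jar in \
    "lib/maprfs-1.0.3-mapr-3.0.2.jar" \
    "hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar"
  do
    if [ -r "$mapr_home/$jar" ]; then
      echo "OK      $mapr_home/$jar"
    else
      echo "MISSING $mapr_home/$jar"
      status=1
    fi
  done
  # Non-zero return means at least one JAR could not be read.
  return $status
}
```

For example, run check_mapr_jars /opt/mapr on each MapReduce host, assuming /opt/mapr is your MapR_HOME.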

As the cluster administrator, shut down the IBM Spectrum Symphony cluster:

soamcontrol app disable all
egosh service stop all
egosh ego shutdown all

Add the following configuration to the core-site.xml file, located under the $PMR_HOME/conf directory:
```
<property>
  <name>fs.default.name</name>
  <value>maprfs://cldbHost:7222/</value>
</property>
<property>
  <name>fs.maprfs.impl</name>
  <value>com.mapr.fs.MapRFileSystem</value>
</property>
```
where:
- cldbHost is the host on which MapR's Container Location Database (CLDB) (mapr-cldb) is installed.
- 7222 is the default port used for communication with the CLDB; change it according to your environment.
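
After editing core-site.xml, it can help to read the values back and confirm that they are what you expect. The sketch below is a line-oriented helper, not a real XML parser: the function name get_hadoop_property is hypothetical, and it assumes each <name> and <value> element sits on its own line, as in the configuration shown above.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style XML configuration file.
# Usage: get_hadoop_property FILE PROPERTY_NAME
get_hadoop_property() {
  awk -v name="$2" '
    # Remember that the requested <name> line has been seen.
    index($0, "<name>" name "</name>") { found = 1; next }
    # Print the text between <value> and </value> on the next value line.
    found && match($0, /<value>.*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}
```

For example, get_hadoop_property "$PMR_HOME/conf/core-site.xml" fs.default.name should print the maprfs:// URI that you configured.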

Edit the pmr-env.sh file, located under $PMR_HOME/conf, to set HADOOP_VERSION to the HDFS version:
```
export HADOOP_VERSION=20_204
```

Edit the mrsh job submission script (at $SOAM_HOME/mapreduce/7.3.1/linux2.6-glibc2.3-x86_64/bin/) on each MapReduce host:

Locate the line if [ $DEBUG_PORT ];then.

Before this line, add the following code, replacing MapR_HOME with the path to your MapR installation:

CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH

The file should then contain the following code:

if [ "$JAVA_HEAP_MAX" ]; then
  JAVA_HEAP_MAX_DEFAULT=${JAVA_HEAP_MAX}
fi

CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH

if [ $DEBUG_PORT ];then
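
Because the same edit has to be made on every MapReduce host, it may be convenient to script it. The following is a minimal sketch under stated assumptions: the function name add_mapr_classpath is hypothetical, the marker line is assumed to appear exactly as shown above, and a .bak copy of the script is kept next to the original.

```shell
# Hypothetical helper: insert the two MapR CLASSPATH lines before the
# DEBUG_PORT test in mrsh. Does nothing if the maprfs JAR is already
# referenced, so it is safe to run more than once.
# Usage: add_mapr_classpath /path/to/mrsh /path/to/MapR_HOME
add_mapr_classpath() {
  script=$1
  mapr_home=$2
  marker='if [ $DEBUG_PORT ];then'

  if grep -qF 'maprfs-1.0.3-mapr-3.0.2.jar' "$script"; then
    echo "already patched: $script"
    return 0
  fi
  if ! grep -qF "$marker" "$script"; then
    echo "marker line not found in $script" >&2
    return 1
  fi

  # Keep a backup, then rewrite the script with the lines inserted
  # before the first occurrence of the marker.
  cp -p "$script" "$script.bak"
  awk -v marker="$marker" -v home="$mapr_home" '
    !done && index($0, marker) {
      print "CLASSPATH=\"" home "/lib/maprfs-1.0.3-mapr-3.0.2.jar\":$CLASSPATH"
      print "CLASSPATH=\"" home "/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar\":$CLASSPATH"
      print ""
      done = 1
    }
    { print }
  ' "$script.bak" > "$script"
}
```

Review the patched file against the expected code shown above before restarting the cluster.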

Edit the RunMapReduceService.sh script (found in the $SOAM_HOME/mapreduce/7.3.1/linux2.6-glibc2.3-x86_64/etc/ directory):

Locate this line:

loginfo "Trying to start a service instance."

Replace all the code under this line with the following code:

CLASSPATH=$SYMPHONY_SDK_JARFILE:$PMR_APP_DEP_JARFILES:$WORK_DIR/classpath:$SOAM_HOME/mapreduce/conf:${USEDCONF}
CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH

loginfo "JVM start cmd: $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS"
loginfo "Start to run JVM ..."
$JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS >> ${filepath};

The file should then contain the following code (in this example, MapR_HOME is /opt/mapr):

loginfo "Trying to start a service instance."

CLASSPATH=$SYMPHONY_SDK_JARFILE:$PMR_APP_DEP_JARFILES:$WORK_DIR/classpath:$SOAM_HOME/mapreduce/conf:${USEDCONF}
CLASSPATH="/opt/mapr/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
CLASSPATH="/opt/mapr/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH

loginfo "JVM start cmd: $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS"
loginfo "Start to run JVM ..."
$JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS >> ${filepath};
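
Before restarting the cluster, a simple sanity check on each edited script can catch a broken edit early. The sketch below is only an illustration: the function name verify_edited_script is hypothetical, and it assumes the scripts are Bourne-compatible (use bash -n in place of sh -n if they rely on bash-specific syntax).

```shell
# Hypothetical helper: confirm that an edited script still parses and that
# it references both MapR JAR files. "sh -n" only checks syntax; it does
# not execute the script.
verify_edited_script() {
  script=$1
  sh -n "$script" || { echo "syntax error in $script" >&2; return 1; }
  for jar in maprfs-1.0.3-mapr-3.0.2.jar zookeeper-3.3.6.jar; do
    grep -qF "$jar" "$script" || {
      echo "$jar not referenced in $script" >&2
      return 1
    }
  done
  echo "looks good: $script"
}
```

For example, run it against both mrsh and RunMapReduceService.sh on each MapReduce host.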

Start the cluster.
```
egosh ego start
```
To verify your setup, copy some files to the /user/root/input folder on the MapR file system.

Run the following job:

mrsh jar $PMR_HOME/7.3.1/linux2.6-glibc2.3-x86_64/samples/hadoop-examples-0.20.204.0.jar wordcount /user/root/input /user/root/output

Check the result in the /user/root/output folder on the MapR file system.
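
The verification steps above could be scripted roughly as follows. This is a sketch under assumptions that do not come from this topic: the MapR hadoop client and mrsh are on PATH, the first argument is a local directory of text files, and the helper names run and verify_mapr_setup are hypothetical. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: stage input files, run the sample job, show results.
# Usage: verify_mapr_setup /path/to/local/text/files
verify_mapr_setup() {
  local_dir=$1
  examples_jar="$PMR_HOME/7.3.1/linux2.6-glibc2.3-x86_64/samples/hadoop-examples-0.20.204.0.jar"

  # Copy local sample files into the input folder on the MapR file system.
  run hadoop fs -mkdir /user/root/input || return 1
  for f in "$local_dir"/*; do
    run hadoop fs -put "$f" /user/root/input || return 1
  done

  # Run the sample wordcount job from this topic.
  run mrsh jar "$examples_jar" wordcount /user/root/input /user/root/output || return 1

  # List the output folder and print the word counts.
  run hadoop fs -ls /user/root/output || return 1
  run hadoop fs -cat '/user/root/output/part-*'
}
```

In dry-run mode the function simply lists the commands it would issue, which makes it easy to review them before running anything against the cluster.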