MapR

The MapReduce framework in IBM® Spectrum Symphony can work with MapR, an enterprise distribution of Apache Hadoop.

About this task

Follow these steps to configure MapR as a distributed file system for MapReduce:

Procedure

  1. Download and install MapR. The MapReduce framework in IBM Spectrum Symphony is qualified with MapR version 3.0.2.

    MapR is available for download at: http://package.mapr.com/releases/v3.0.2/redhat/

    Ensure that the file system is installed under the MapR_HOME directory and that IBM Spectrum Symphony can access MapR_HOME.

  2. As cluster administrator, shut down the IBM Spectrum Symphony cluster.
    soamcontrol app disable all
    egosh service stop all
    egosh ego shutdown all
  3. Add the following configuration to the core-site.xml file, located under the $PMR_HOME/conf directory:
    <property>
      <name>fs.default.name</name>
      <value>maprfs://cldbHost:7222/</value>
    </property>
    <property>
      <name>fs.maprfs.impl</name>
      <value>com.mapr.fs.MapRFileSystem</value>
    </property>
    where:
    • cldbHost is the host on which MapR's Container Location Database (CLDB) (mapr-cldb) is installed.
    • 7222 is the default port used for communication with the CLDB; change it according to your environment.
  4. Edit pmr-env.sh, located under $PMR_HOME/conf, to set HADOOP_VERSION to the HDFS version:
    export HADOOP_VERSION=20_204
    
  5. On each MapReduce host, edit the mrsh job submission script (in the $SOAM_HOME/mapreduce/7.3.1/linux2.6-glibc2.3-x86_64/bin/ directory):
    1. Locate the line if [ $DEBUG_PORT ];then.
    2. Before this line, add the following code:
      CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
      CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH
      After the edit, this part of the file should contain the following code:
      if [ "$JAVA_HEAP_MAX" ]; then
        JAVA_HEAP_MAX_DEFAULT=${JAVA_HEAP_MAX}
      fi
      
      CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
      CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH
      
      if [ $DEBUG_PORT ];then
  6. Edit the RunMapReduceService.sh script (found in the $SOAM_HOME/mapreduce/7.3.1/linux2.6-glibc2.3-x86_64/etc/ directory):
    1. Locate this line:
      loginfo "Trying to start a service instance."
    2. Replace all the code under this line with the following code:
      CLASSPATH=$SYMPHONY_SDK_JARFILE:$PMR_APP_DEP_JARFILES:$WORK_DIR/classpath:$SOAM_HOME/mapreduce/conf:${USEDCONF}
      CLASSPATH="MapR_HOME/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
      CLASSPATH="MapR_HOME/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH
      
      loginfo "JVM start cmd: $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS"
      loginfo "Start to run JVM ..."
      $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS >> ${filepath};
      After the edit, this part of the file should contain the following code (this example assumes that MapR_HOME is /opt/mapr):
      loginfo "Trying to start a service instance."
      
      CLASSPATH=$SYMPHONY_SDK_JARFILE:$PMR_APP_DEP_JARFILES:$WORK_DIR/classpath:$SOAM_HOME/mapreduce/conf:${USEDCONF}
      CLASSPATH="/opt/mapr/lib/maprfs-1.0.3-mapr-3.0.2.jar":$CLASSPATH
      CLASSPATH="/opt/mapr/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar":$CLASSPATH
      
      loginfo "JVM start cmd: $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS"
      loginfo "Start to run JVM ..."
      $JAVA_HOME/bin/java ${DEBUG} $JVM_OPTIONS -classpath "$CLASSPATH" -Duser.dir=$WORK_DIR $SOAM_SERVICE_MAINCLASS >> ${filepath};
      
  7. Start the cluster.
    egosh ego start
  8. To verify your setup, copy some files to the /user/root/input folder on the MapR file system.
  9. Run the following job:
    mrsh jar $PMR_HOME/7.3.1/linux2.6-glibc2.3-x86_64/samples/hadoop-examples-0.20.204.0.jar wordcount /user/root/input /user/root/output
  10. Check the result in the /user/root/output folder on the MapR file system.
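
Before restarting the cluster in step 7, it can help to confirm that the files the edited scripts depend on are actually in place. The following is a minimal sketch of such a preflight check, not part of IBM Spectrum Symphony or MapR: the function names check_mapr_jars and check_core_site are hypothetical, and the sketch assumes only the JAR paths and property names given in the steps above.

```shell
#!/bin/sh
# Hypothetical preflight helpers (illustrative only; not shipped with
# IBM Spectrum Symphony or MapR).

# check_mapr_jars MAPR_HOME
# Succeeds only if both JAR files that steps 5 and 6 add to CLASSPATH
# exist under MAPR_HOME and are readable by the current user.
check_mapr_jars() {
    mapr_home=$1
    status=0
    for jar in \
        "$mapr_home/lib/maprfs-1.0.3-mapr-3.0.2.jar" \
        "$mapr_home/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.6.jar"
    do
        if [ ! -r "$jar" ]; then
            echo "missing or unreadable: $jar" >&2
            status=1
        fi
    done
    return $status
}

# check_core_site FILE
# Succeeds only if FILE names both properties added in step 3.
# This is a plain text search; it does not validate the XML or the values.
check_core_site() {
    core_site=$1
    grep -qF '<name>fs.default.name</name>' "$core_site" &&
    grep -qF '<name>fs.maprfs.impl</name>' "$core_site"
}
```

For example, with MapR installed under /opt/mapr, running check_mapr_jars /opt/mapr && check_core_site "$PMR_HOME/conf/core-site.xml" on each MapReduce host returns a zero exit status only when both checks pass.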