Apache Hadoop

Apache Hadoop is a distributed computing platform that primarily consists of the Hadoop Distributed File System (HDFS) and an implementation of the MapReduce programming paradigm.

About this task

The MapReduce framework in IBM® Spectrum Symphony supports Hadoop 2.7.2 APIs.

If your application was built against an older Hadoop version, complete the following steps to configure the MapReduce framework in IBM Spectrum Symphony to work with that older Hadoop API version:

Procedure

  1. Add the HADOOP_VERSION environment variable to the application profile and set it to the value that identifies your Hadoop version.
    For example, for Hadoop 2.4.x, specify <env name="HADOOP_VERSION">2_4_x</env> as follows:
    <osType fileNamePattern="task_%taskId%" 
    logDirectory="${SOAM_HOME}/mapreduce/logs/tasklogs" 
    name="all" 
    startCmd="${PMR_HOME}/${PMR_VERSION}/${EGO_MACHINE_TYPE}/etc/RunPlatformMapReduceService.sh" 
    subDirectoryPattern="PlatformMapReduce/%sessionId%" 
    workDir="${PMR_HOME}/work">
        <env name="PMR_HOME">${SOAM_HOME}/mapreduce</env>
        <env name="PMR_VERSION">7.3.2</env>
        <env name="SUB_WORK_DIR">${log4cxx_autoindex}</env>
        <env name="JAVA_HOME">/opt/java/jre1.6.0_25_64bits/</env>
        <env name="HADOOP_VERSION">2_4_x</env>
    </osType>
    
  2. Re-register the application profile so that the change takes effect:

    $ soamreg application_profile

  3. Set the HADOOP_VERSION environment variable on the submission side in the $PMR_HOME/conf/pmr-env.sh file. Use the value that matches the HDFS version included in your Hadoop distribution. For example:
    export HADOOP_VERSION=2_4_x

    Input and output for Hadoop HDFS 2.4.x can now be specified from the mrsh utility.
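
The procedure sets HADOOP_VERSION in two places: the application profile and the pmr-env.sh file. The following is a minimal sketch of a consistency check between the two, not a utility that ships with IBM Spectrum Symphony. The function names are hypothetical, the caller supplies both file paths, and the sketch assumes that the env element sits on a single line, as in the example in step 1.

```shell
#!/bin/sh
# Sketch only: these helper functions are hypothetical, not part of the product.

# Print the HADOOP_VERSION value found in an application profile (XML).
# Assumes the <env> element is written on one line.
profile_hadoop_version() {
    sed -n 's|.*<env name="HADOOP_VERSION">\([^<]*\)</env>.*|\1|p' "$1" | head -n 1
}

# Print the HADOOP_VERSION value exported in a pmr-env.sh style file.
# The last export wins; surrounding quotes are stripped.
env_hadoop_version() {
    sed -n 's|^[[:space:]]*export[[:space:]]\{1,\}HADOOP_VERSION=\([^[:space:]#]*\).*|\1|p' "$1" \
        | tail -n 1 | tr -d "\"'"
}

# Return 0 if both files name the same non-empty version, 1 otherwise.
check_hadoop_version() {
    profile_value=$(profile_hadoop_version "$1")
    env_value=$(env_hadoop_version "$2")
    if [ -n "$profile_value" ] && [ "$profile_value" = "$env_value" ]; then
        echo "OK: HADOOP_VERSION=$profile_value in both files"
        return 0
    fi
    echo "MISMATCH: profile='$profile_value' pmr-env.sh='$env_value'" >&2
    return 1
}
```

After sourcing these functions, a call such as check_hadoop_version application_profile "$PMR_HOME/conf/pmr-env.sh" reports whether the two settings agree, where application_profile is the profile file that you registered in step 2.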