Apache Hadoop is a distributed computing
platform that primarily consists of the Hadoop Distributed File System
(HDFS) and an implementation of the MapReduce programming paradigm.
About this task
The MapReduce framework in IBM® Spectrum Symphony supports Hadoop 2.7.2 APIs.
If your application is built with an older Hadoop version, complete the following steps so that the MapReduce framework in IBM Spectrum Symphony works with that older Hadoop API version:
Procedure
- Add the HADOOP_VERSION environment variable to the application profile and set it to the value that represents your Hadoop version.
For example, for Hadoop 2.4.x, specify <env name="HADOOP_VERSION">2_4_x</env> as follows:
<osType fileNamePattern="task_%taskId%"
logDirectory="${SOAM_HOME}/mapreduce/logs/tasklogs"
name="all" startCmd="${PMR_HOME}/${PMR_VERSION}/${EGO_MACHINE_TYPE}/etc/RunPlatformMapReduceService.sh"
subDirectoryPattern="PlatformMapReduce/%sessionId%"
workDir="${PMR_HOME}/work">
<env name="PMR_HOME">${SOAM_HOME}/mapreduce</env>
<env name="PMR_VERSION">7.3.2</env>
<env name="SUB_WORK_DIR">${log4cxx_autoindex}</env>
<env name="JAVA_HOME">/opt/java/jre1.6.0_25_64bits/</env>
<env name="HADOOP_VERSION">2_4_x</env>
</osType>
- Re-register the application profile so that the change
takes effect:
$ soamreg application_profile
- Set the HADOOP_VERSION environment variable on the submission side
by editing the $PMR_HOME/conf/pmr-env.sh file.
Ensure that you use the value that applies to the HDFS version included
in your Hadoop distribution. For example:
export HADOOP_VERSION=2_4_x
Input and output for Hadoop HDFS 2.4.x can now be specified from the mrsh utility.
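As a rough check on the procedure above, the following shell sketch compares the HADOOP_VERSION value in an application profile with the value exported in pmr-env.sh. It assumes that both files should carry the same value, as they do in the 2_4_x example. The function names and the file paths passed to them are hypothetical and are not part of IBM Spectrum Symphony.

```shell
#!/bin/sh
# Sketch only: these helper functions are illustrative, not product commands.

# Print the value of <env name="HADOOP_VERSION">...</env> from a profile file.
profile_hadoop_version() {
    sed -n 's/.*<env name="HADOOP_VERSION">\([^<]*\)<\/env>.*/\1/p' "$1" | head -n 1
}

# Print the value from a line such as: export HADOOP_VERSION=2_4_x
# If the variable is exported more than once, the last export is used.
env_hadoop_version() {
    sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}HADOOP_VERSION=\(.*\)$/\1/p' "$1" \
        | tail -n 1 | tr -d "\"'"
}

# Compare the two values.
# Returns 0 on a match, 1 on a mismatch, 2 if either value is missing.
check_hadoop_version() {
    p=$(profile_hadoop_version "$1")
    e=$(env_hadoop_version "$2")
    if [ -z "$p" ] || [ -z "$e" ]; then
        echo "HADOOP_VERSION missing (profile='$p', pmr-env.sh='$e')"
        return 2
    fi
    if [ "$p" = "$e" ]; then
        echo "OK: HADOOP_VERSION=$p"
        return 0
    fi
    echo "MISMATCH: profile=$p pmr-env.sh=$e"
    return 1
}
```

For example, after the functions are loaded into the current shell, check_hadoop_version application_profile "$PMR_HOME/conf/pmr-env.sh" reports whether the two settings agree, where application_profile stands for the path to your own profile file.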