Deploying a MapReduce application
You have compiled your MapReduce application and are ready to run and test the application within your IBM® Spectrum Symphony DE cluster.
About this task
You can run and test your MapReduce application in either of two modes:
- In the standalone (local) mode, where the entire MapReduce flow runs in a single Java™ process. In this mode, all setup, map, reduce, and cleanup tasks associated with a job are executed one by one in a single process on the local host, making debugging easier if a job fails.
- In the pseudo-distributed mode, where each MapReduce daemon runs in a separate Java process on a single-host cluster.
Procedure
- Deploy MapReduce application in standalone mode.
In the standalone (local) mode, the entire MapReduce flow runs in a single Java process (one JVM) on the local host: all setup, map, reduce, and cleanup tasks for a job are executed one after another. Testing in this mode makes a failed job easier to debug.
When running applications, you can read and write data in two ways:
- Read input data from the local disk and write output files to the local disk.
- Read input data from HDFS and write output files to HDFS.
- Deploy application in standalone mode on local disk.
To run an application in the standalone mode using input data from the local disk and writing output to the local disk, follow these steps:
- Source the environment in the installation directory (/opt/ibm/platformsymphonyde/de732 by default). For example:
- (csh) source /opt/ibm/platformsymphonyde/de732/cshrc.platform
- (bsh) $ . /opt/ibm/platformsymphonyde/de732/profile.platform
- Ensure that the MapReduce application is enabled. Enter:
$ soamview app application_name
- Submit a MapReduce job with the mrsh utility, using the following syntax:
mrsh jar jarfile [classname] [-options] [args]
where:
- jarfile specifies the file name of the application packaged as a jar file that includes the MapReduce code.
- (Optional) classname specifies the class to be invoked. If the class is not specified, the class specified by the jar manifest is run.
- (Optional) -options specify settings for a job using, among others, the -D option. To submit jobs in the standalone mode, set one of the following parameters to local, depending on your Hadoop version:
- 0.21: mapreduce.jobtracker.address
- 2.7.2: mapred.job.tracker
- (Optional) args specify arguments for the class.
For example, to submit the WordCount sample job, enter:
- Hadoop version 0.21
mrsh jar $SOAM_HOME/mapreduce/version/os_type/samples/hadoop-mapred-examples-0.21.0.jar -Dmapreduce.jobtracker.address=local wordcount input output
- Hadoop version 2.7.2
mrsh jar $SOAM_HOME/mapreduce/version/os_type/samples/hadoop-0.20.2-examples.jar -Dmapred.job.tracker=local wordcount input output
The output for the WordCount job should now be available on the local disk.
If you must debug your application, see Debug MapReduce application in standalone mode.
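The choice of -D parameter in the steps above can be sketched as a small shell helper. This is only an illustration: the function names tracker_property and standalone_cmd are hypothetical, the jar file name is shortened, and the script prints the mrsh command line instead of submitting anything.

```shell
#!/bin/sh
# Sketch: pick the standalone-mode property for a Hadoop version and
# print the resulting mrsh command line (dry run; nothing is submitted).
# Function names are illustrative, not part of the product.

# Map a Hadoop version to the property listed in the steps above.
tracker_property() {
    case "$1" in
        0.21)  echo "mapreduce.jobtracker.address" ;;
        2.7.2) echo "mapred.job.tracker" ;;
        *)     echo "unsupported Hadoop version: $1" >&2; return 1 ;;
    esac
}

# Print the mrsh command for a standalone WordCount run.
# Arguments: hadoop_version jarfile input_dir output_dir
standalone_cmd() {
    prop=$(tracker_property "$1") || return 1
    echo "mrsh jar $2 -D${prop}=local wordcount $3 $4"
}

standalone_cmd 0.21 hadoop-mapred-examples-0.21.0.jar input output
```

Printing the command first makes it easy to confirm that the property matches your Hadoop version before the job is actually submitted.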
- Deploy application in standalone mode with HDFS.
To run a job in the standalone mode against HDFS, first copy your input files from the local disk to HDFS. When the job is submitted, it reads its input files from an HDFS directory and writes its output files to an HDFS directory.
- Source the environment in the installation directory (/opt/ibm/platformsymphonyde/deversion by default). For example:
- (csh) source /opt/ibm/platformsymphonyde/de732/cshrc.platform
- (bsh) $ . /opt/ibm/platformsymphonyde/de732/profile.platform
- Ensure that the MapReduce application is enabled. Enter:
$ soamview app application_name
- Submit a MapReduce job with the mrsh utility, using the following syntax:
mrsh jar jarfile [classname] [-options] [args]
where:
- jarfile specifies the file name of the application packaged as a jar file that includes the MapReduce code.
- (Optional) classname specifies the class to be invoked. If the class is not specified, the class specified by the jar manifest is run.
- (Optional) -options specify settings for a job using, among others, the -D option. To submit jobs in the standalone mode, set one of the following parameters to local, depending on your Hadoop version:
- 0.21: mapreduce.jobtracker.address
- 2.7.2: mapred.job.tracker
- (Optional) args specify arguments for the class.
For example, to submit the WordCount sample job, enter:
- Hadoop version 0.21
mrsh jar $SOAM_HOME/mapreduce/version/os_type/samples/hadoop-mapred-examples-0.21.0.jar -Dmapreduce.jobtracker.address=local wordcount hdfs://hadoopsys:9000/input hdfs://hadoopsys:9000/output-pmr
- Hadoop version 2.7.2
mrsh jar $SOAM_HOME/mapreduce/version/os_type/samples/hadoop-0.20.2-examples.jar -Dmapred.job.tracker=local wordcount hdfs://hadoopsys:9000/input hdfs://hadoopsys:9000/output-pmr
where:
- hadoopsys is the address of the HDFS namenode.
- 9000 is the HDFS port.
- /input is the directory on HDFS from which the WordCount job reads text input files.
- /output-pmr is the output directory under which the WordCount job creates the output file.
The output for the WordCount job should now be available in the HDFS output directory.
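The HDFS-backed run can be sketched as a dry run that composes the URIs and prints the commands in order. This assumes the standard Hadoop `hadoop fs -put` command is available on your path for the copy step, which the procedure above does not specify; the function names hdfs_uri and hdfs_standalone_plan are hypothetical.

```shell
#!/bin/sh
# Sketch: compose the HDFS URIs described above and print (not run)
# the copy and submit commands. Assumes the Hadoop "hadoop fs" CLI is
# used for the copy step; function names are illustrative only.

# Compose an HDFS URI from namenode host, port, and absolute path.
hdfs_uri() {
    echo "hdfs://$1:$2$3"
}

# Print the commands for a standalone run that reads from and writes to HDFS.
# Arguments: namenode port local_input_dir jarfile
hdfs_standalone_plan() {
    src_uri=$(hdfs_uri "$1" "$2" /input)
    dst_uri=$(hdfs_uri "$1" "$2" /output-pmr)
    echo "hadoop fs -put $3 $src_uri"
    echo "mrsh jar $4 -Dmapred.job.tracker=local wordcount $src_uri $dst_uri"
}

hdfs_standalone_plan hadoopsys 9000 ./input hadoop-0.20.2-examples.jar
```

Replace hadoopsys and 9000 with your own namenode address and port, and swap the -D parameter if your Hadoop version uses mapreduce.jobtracker.address.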
- Deploy MapReduce application in pseudo-distributed mode.
In the pseudo-distributed mode, each MapReduce daemon runs in its own Java process on a single host, with separate JVMs for the Mapper and Reducer processes.
- Source the environment in the installation directory (/opt/ibm/platformsymphonyde/de732 by default). For example:
- (csh) source /opt/ibm/platformsymphonyde/de732/cshrc.platform
- (bsh) $ . /opt/ibm/platformsymphonyde/de732/profile.platform
- Ensure that the MapReduce application is enabled. Enter:
$ soamview app application_name
- Submit a MapReduce job with the mrsh utility, using the following syntax:
mrsh jar jarfile [classname] [-options] [args]
where:
- jarfile specifies the file name of the application packaged as a jar file that includes the MapReduce code.
- (Optional) classname specifies the class to be invoked. If the class is not specified, the class specified by the jar manifest is run.
- (Optional) -options specify settings for a job using, among others, the -D option.
- (Optional) args specify arguments for the class.
For example, to submit the WordCount sample job using Hadoop 0.20.2, enter:
mrsh jar $SOAM_HOME/mapreduce/version/os_type/samples/hadoop-0.20.2-examples.jar wordcount hdfs://hadoopsys:9000/input hdfs://hadoopsys:9000/output-pmr
where:
- hadoopsys is the address of the HDFS namenode.
- 9000 is the HDFS port.
- /input is the directory on HDFS from which the WordCount job reads text input files.
- /output-pmr is the output directory under which the WordCount job creates the output file.
The output for the WordCount job should now be available in the HDFS output directory.
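The pseudo-distributed steps differ from the standalone ones only in that no -D...=local override is passed. A minimal preflight sketch, assuming the default installation directory shown above; the function names profile_path and pseudo_cmd and the INSTALL_DIR variable are hypothetical, and the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: locate the environment file for a shell family and print the
# pseudo-distributed submit command (dry run). Names are illustrative.

# Print the environment file path for "csh" or "bsh" under an install
# directory; return non-zero if the family is unknown or the file is missing.
profile_path() {
    case "$1" in
        csh) f="$2/cshrc.platform" ;;
        bsh) f="$2/profile.platform" ;;
        *)   return 2 ;;
    esac
    [ -f "$f" ] || return 1
    echo "$f"
}

# Print the submit command; unlike standalone mode, no -D override is added.
# Arguments: jarfile input_uri output_uri
pseudo_cmd() {
    echo "mrsh jar $1 wordcount $2 $3"
}

INSTALL_DIR=${INSTALL_DIR:-/opt/ibm/platformsymphonyde/de732}
if profile=$(profile_path bsh "$INSTALL_DIR"); then
    echo ". $profile"
else
    echo "environment file not found under $INSTALL_DIR" >&2
fi
pseudo_cmd hadoop-0.20.2-examples.jar \
    hdfs://hadoopsys:9000/input hdfs://hadoopsys:9000/output-pmr
```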