mrsh

Submits MapReduce jobs. The MapReduce framework is supported on IBM® Spectrum Symphony Advanced Edition on Linux®. You can also run MapReduce jobs in an IBM Spectrum Symphony Developer Edition environment, and you can submit MapReduce jobs from IBM Spectrum Symphony Developer Edition to IBM Spectrum Symphony Advanced Edition.

Synopsis

mrsh subcommand [options]

mrsh -h

mrsh -V

mrsh version

mrsh subcommand -h

Description

Use the mrsh command to submit MapReduce jobs.

-h: Outputs command usage and exits.
-V: Prints the IBM Spectrum Symphony-MapReduce framework version to stderr and exits.
version: Prints the IBM Spectrum Symphony-MapReduce framework API version to stderr and exits.
subcommand -h: Outputs subcommand usage and exits. The mrsh command supports the jar, pipes, and cleanup subcommands. Refer to the subcommand synopsis for details.
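
Because -V writes the version to stderr rather than stdout, a script that wants to capture it has to redirect stderr. The following is a minimal sketch of that idea, not an official procedure; the variable name framework_version and the fallback message are illustrative assumptions, and the guard only exists so the snippet behaves sensibly on a host where mrsh is not installed.

```shell
# Capture the framework version, which mrsh -V prints to stderr.
# framework_version and the fallback text are hypothetical names for illustration.
if command -v mrsh >/dev/null 2>&1; then
    # 2>&1 folds stderr into the captured output.
    framework_version=$(mrsh -V 2>&1)
else
    framework_version="mrsh not found on PATH"
fi

echo "$framework_version"
```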

Subcommand synopsis

jar jar_file [classname] [-options option] [args...]

pipes [-options]

cleanup [-u user_name] -x password

jar `jar_file` [`classname`] [-options `option`] [-args …]

Submits MapReduce jobs in the IBM Spectrum Symphony-MapReduce framework with optional class, option, and argument specifications.

jar jar_file

Runs the MapReduce job packaged as a JAR file, using the specified JAR file name.

classname

Specifies the class to be invoked. If the class is not specified, the class that is specified by the JAR manifest is run.

-options

Specifies MapReduce job configuration options. Supported options are as follows:

-conf configuration_file: Specifies an application configuration file.
-Dname=value: Specifies a job configuration property as a name-value pair. For example, to set the number of threads used to execute a MapReduce job, use the pmr.reduce.multithread.num property: to use three threads in your job submission, specify -Dpmr.reduce.multithread.num=3.
Note: For IBM Spectrum Symphony Developer Edition, if you run MapReduce jobs to show the execution profile (by specifying pmr.performance.instrument.level=value_greater_than_0), you will see an error similar to "The required file /directory_path/datasource.xml does not exist." This message is a warning: the datasource.xml file is required by the reporting framework, which IBM Spectrum Symphony Developer Edition does not support. You can safely ignore this message. It is not displayed if you run the MapReduce job using IBM Spectrum Symphony.
-fs local|namenode:port: Specifies the file system's address. Specify either local or the actual name address of the NameNode (not its URL). The port is the file system port, such as the HDFS port.
-jt local|namenode:port: Specifies the job tracker's address. Specify either local or the actual name address of the NameNode (not its URL). The port is the job tracker port, such as the HDFS port.
-files comma_separated_list: Specifies a list of files to be copied to the MapReduce cluster. Separate file names with a comma.
-libjars comma_separated_list: Specifies a list of JAR files to be included in the classpath. Separate file names with a comma.
-archives comma_separated_list: Specifies a list of archives to be unarchived on the compute hosts. Separate archive names with a comma.

-args

Specifies the arguments to be passed to the MapReduce job.
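
As an illustration of how the pieces of the jar subcommand fit together, the sketch below assembles a command line in the order given by the synopsis (JAR file, class name, options, then job arguments) and prints it instead of submitting it. The JAR name, class name, file names, and directories are all hypothetical, and the sketch assumes that job arguments are passed positionally, as in the [args...] form of the synopsis.

```shell
# Dry-run sketch: assemble an mrsh jar command line and print it.
# The JAR file, class name, file list, and directories are hypothetical.
jar_file="wordcount.jar"
class_name="com.example.WordCount"
input_dir="/user/demo/input"
output_dir="/user/demo/output"

# Order follows the synopsis: jar_file, classname, options, then job arguments.
set -- mrsh jar "$jar_file" "$class_name" \
    -Dpmr.reduce.multithread.num=3 \
    -files "lookup.txt,stopwords.txt" \
    "$input_dir" "$output_dir"

jar_cmd="$*"
echo "$jar_cmd"

# To submit for real on a host where mrsh is installed, run: "$@"
```

Printing the command first makes it easy to check option spelling and ordering before anything is sent to the cluster.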

pipes [-input `path`] | [-output `path`] | [-jar `jar_file`] | [-inputformat `class`] | [-map `class`] | [-partitioner `class`] | [-reduce `class`] | [-writer `class`] | [-program `executable`] | [-reduces `number`] | [-lazyOutput true|false]

Submits MapReduce jobs in the IBM Spectrum Symphony-MapReduce framework with pipe (redirection) specifications. Specify any of the supported pipe options to pass information for MapReduce jobs.

pipes [-input path]: Runs the MapReduce job using the specified input directory.

pipes [-output path]: Runs the MapReduce job, writing results to the specified output directory.

pipes [-jar jar_file]: Runs the MapReduce job packaged as a JAR file, using the specified JAR file name.

pipes [-inputformat class]: Runs the MapReduce job using the specified input format class.

pipes [-map class]: Runs the MapReduce job using the specified Java map class.

pipes [-partitioner class]: Runs the MapReduce job using the specified Java partitioner class.

pipes [-reduce class]: Runs the MapReduce job using the specified Java reduce class.

pipes [-writer class]: Runs the MapReduce job using the specified Java record writer.

pipes [-program executable]: Runs the MapReduce job using the specified executable URI.

pipes [-reduces number]: Runs the MapReduce job using the specified number of reduces.

pipes [-lazyOutput true|false]: Controls whether the MapReduce job uses lazy output. Specify mrsh pipes -lazyOutput true to enable lazy output, or mrsh pipes -lazyOutput false to disable it. The default is no lazy output.
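
The following dry-run sketch shows one way several pipes options might be combined in a single submission. It assumes that the options can be given together on one command line, as the description above suggests; the directories and the executable location are hypothetical, and the command is printed rather than run.

```shell
# Dry-run sketch: assemble an mrsh pipes command line and print it.
# The directories and the executable URI are hypothetical.
input_dir="/user/demo/input"
output_dir="/user/demo/output"
program_uri="/apps/bin/wordcount-pipes"

set -- mrsh pipes \
    -input "$input_dir" \
    -output "$output_dir" \
    -program "$program_uri" \
    -reduces 2 \
    -lazyOutput true

pipes_cmd="$*"
echo "$pipes_cmd"

# To submit for real on a host where mrsh is installed, run: "$@"
```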

cleanup [-u `user_name`] [-x `password`]

Removes intermediate data for aborted or terminated MapReduce jobs. Intermediate data consists of the files that map tasks generate on the local disk and that are used as input for the reduce tasks.

-u user_name: Specifies the name of the user to connect to IBM Spectrum Symphony for this command. If you are already logged on to IBM Spectrum Symphony using the soamlogon command, the user name specified here overrides the user name entered in the soamlogon command, for this command only.
-x password: Specifies the user password to connect to IBM Spectrum Symphony for this command. If you are already logged on to IBM Spectrum Symphony using the soamlogon command, the password specified here overrides the password entered in the soamlogon command, for this command only.
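
A cleanup invocation with explicit credentials could be assembled as in the sketch below. The user name, the MRSH_PASSWORD environment variable, and the placeholder password are hypothetical and are not part of the product; the sketch prints a masked form of the command so that the password does not end up in terminal output or logs.

```shell
# Dry-run sketch: assemble an mrsh cleanup command with explicit credentials.
# user_name, MRSH_PASSWORD, and the placeholder default are hypothetical.
user_name="mruser"
password="${MRSH_PASSWORD:-changeme}"

# Quoting keeps a password that contains spaces as a single argument.
set -- mrsh cleanup -u "$user_name" -x "$password"

# Print a masked version instead of the real command line.
masked_cmd="mrsh cleanup -u $user_name -x ********"
echo "$masked_cmd"

# To run for real on a host where mrsh is installed, run: "$@"
```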