Run parameters

Map/Reduce streaming is run by using the mapreduce jar command. When you run the command, you must also specify the path to the Map/Reduce streaming jar. Each streaming invocation must specify at least one mapper job. Following is the syntax for the mapreducestreaming.jar command:
mapreduce jar /nz/export/ae/products/netezza/mapreduce/current/mapreducestreaming.jar
-db <db>
-input <table_name> <key_column> <value_column>
-output <output_table> <key_column> <value_column>
-mapper <mapper_cmd>
-mapper_out_key_size <size>
-mapper_out_value_size <size>
-reducer <reducer_cmd>
-reducer_out_key_size <size>
-reducer_out_value_size <size>
-file <file>
The following table describes the required parameters; an example invocation is shown after the table.

Table 1. Required parameters for the mapreduce jar command

db
Specifies the name of the database that contains the input data.

input <table_name> <key_column> <value_column>
Specifies the name of the table that contains the input data and the names of the columns where the key and value data is stored.

output <table_name> <key_column> <value_column>
Specifies the name of the table where the output data is stored, followed by the names of the key and value columns.

mapper
Specifies the command that executes the map step on the SPU.

mapper_out_key_size
Specifies the size (number of characters) of the output key column that is created after the map step.

mapper_out_value_size
Specifies the size (number of characters) of the output value column that is created after the map step.
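For example, a map-only job could be run with an invocation like the following. The database, table, column, and script names are illustrative, the key and value sizes are arbitrary, and the example assumes that a Python interpreter is available to run the mapper command:

mapreduce jar /nz/export/ae/products/netezza/mapreduce/current/mapreducestreaming.jar
-db testdb
-input mr_input key_col value_col
-output mr_output key_col value_col
-mapper "python mapper.py"
-mapper_out_key_size 100
-mapper_out_value_size 1000
-file mapper.py

The file parameter, described later in this section, makes mapper.py available to the SPUs at run time.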
The combine and reduce steps are optional. They can be defined by specifying the following parameters.

Table 2. Parameters for the optional combine and reduce steps

combiner
Specifies the command that executes the combine step on the SPU.

combiner_out_key_size
Specifies the size (number of characters) of the output key column that is created after the combine step.

combiner_out_value_size
Specifies the size (number of characters) of the output value column that is created after the combine step.

reducer
Specifies the command that executes the reduce step on the SPU.

reducer_out_key_size
Specifies the size (number of characters) of the output key column that is created after the reduce step.

reducer_out_value_size
Specifies the size (number of characters) of the output value column that is created after the reduce step.

You must use the file parameter (shown in the streaming command syntax) to specify each file that is required to run the mapper, combiner, or reducer commands, such as the script files themselves. Multiple file parameters are allowed. All specified files are copied to a temporary directory that is accessible by the SPUs.
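Continuing the earlier example, an optional reduce step could be added by appending parameters such as the following; the script name and sizes are again illustrative:

-reducer "python reducer.py"
-reducer_out_key_size 100
-reducer_out_value_size 1000
-file reducer.py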

Within the streaming program, one line of input or output consists of a key entry and a value entry separated by a tab character. However, you can define other separators to distinguish the key from the value by specifying the parameters in the following table; a sketch of a streaming mapper that follows the default convention is shown after the table.

Table 3. Alternate separators

Input separators: mapper_input_separator, combiner_input_separator, reducer_input_separator
Tab character by default, or set to any chosen symbol.

Output separators: mapper_output_separator, combiner_output_separator, reducer_output_separator
Tab character by default, or set to any chosen symbol.
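For illustration, the following is a minimal sketch of a streaming mapper written in Python. It assumes the default tab separators on both input and output; its word-count style logic is purely an example and is not prescribed by the streaming interface.

#!/usr/bin/env python
# mapper.py: minimal streaming mapper sketch (illustrative only).
# Each input line is "<key>\t<value>"; each output line must also be
# "<key>\t<value>", using the default tab separator.
import sys

for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    # Split the incoming record into key and value on the first tab.
    key, _sep, value = line.partition("\t")
    # Illustrative transformation: emit one output record per word in the
    # value, keyed by the word, with a count of 1 as the new value.
    for word in value.split():
        sys.stdout.write(word + "\t1\n")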

Parameters to the mapper, combiner, or reducer are passed using environment variables on the SPUs. To pass parameters, use the cmdenv option. For example, enter the following to set the environment variable “NAME” on the SPU to contain the value ADAM:

-cmdenv “NAME=ADAM”

The variable can be subsequently read in a program or script with commands specific to the programming language being used.
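For example, in a Python script the value set above could be retrieved as follows (a sketch that assumes the -cmdenv example shown):

import os

# Read the value passed with -cmdenv "NAME=ADAM"; an empty string is used
# if the variable is not set on the SPU.
name = os.environ.get("NAME", "")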

Note: It is possible to mix streaming functionality with standard Java mapper, reducer, and combiner classes. Instead of defining the mapper/combiner/reducer parameters, you specify the mapperClass/combinerClass/reducerClass parameters with the appropriate class names. All Java classes must be included in jar archives and passed to the mapreduce_streaming executable through the libJar parameter.
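For example, a hypothetical mixed invocation might replace the mapper parameter with a Java class while keeping a streaming reducer; the class and jar names below are illustrative only:

-mapperClass com.example.WordCountMapper
-libJar /home/user/wordcount.jar
-reducer "python reducer.py"
-file reducer.py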