pmr-site.xml reference

The pmr-site.xml configuration file applies to MapReduce workloads, which are available only with IBM® Spectrum Symphony Advanced Edition. To enable the MapReduce framework, you must have an Advanced Edition entitlement key.

The pmr-site.xml file adopts an XML format similar to Hadoop's core-site.xml file (used to define core properties) and mapred-site.xml file (used to define MapReduce properties).
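
As a rough illustration of that shared format, a minimal pmr-site.xml file might look like the following sketch. The <configuration> root element is an assumption carried over from Hadoop's configuration files, and the description text is illustrative; the property name and value are taken from the example in this topic.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the <configuration> root element is assumed from the layout of Hadoop's core-site.xml. -->
<configuration>
  <property>
    <name>mapreduce.map.log.level</name>
    <value>INFO</value>
    <description>Log level for map tasks.</description>
  </property>
</configuration>
```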

The properties that are defined in the pmr-site.xml file, such as the default MapReduce application and the logon user for job submission, are specific to the MapReduce framework in IBM Spectrum Symphony. You can, however, use this file to add or adjust some Hadoop parameters, such as the log level for map and reduce tasks.

Important: Any Hadoop parameter that is defined in the pmr-site.xml file takes precedence over the corresponding parameter defined in Hadoop configuration files (such as mapred-site.xml).
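
The precedence rule can be pictured with the following sketch, which places the same parameter in both files. The values WARN and DEBUG are hypothetical and chosen only to make the difference visible.

```xml
<!-- Hypothetical setting in Hadoop's mapred-site.xml -->
<property>
  <name>mapreduce.map.log.level</name>
  <value>WARN</value>
</property>

<!-- Same parameter in pmr-site.xml; under the precedence rule, this value (DEBUG) is the one that applies. -->
<property>
  <name>mapreduce.map.log.level</name>
  <value>DEBUG</value>
</property>
```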

Use the pmr-site.xml file to define Hadoop parameters only if you did not set either of the following environment variables, which import settings from your Hadoop installation to IBM Spectrum Symphony:

HADOOP_HOME, set before installing IBM Spectrum Symphony, as described in Installing and configuring IBM Spectrum Symphony.
PMR_EXTERNAL_CONFIG_PATH, set after installing IBM Spectrum Symphony, as described in Working with the MapReduce framework and IBM Spectrum Symphony.

The properties in the pmr-site.xml file apply to all jobs submitted from the local host. To specify settings for a single job, use the -D option from the mrsh utility or the MapReduce console on the cluster management console during job submission.

Location

This configuration file is installed with IBM Spectrum Symphony in the %SOAM_HOME%/mapreduce/conf directory.

Properties

The properties that you can define in the pmr-site.xml file are of two types:

IBM Spectrum Symphony properties.
Hadoop properties that are supported by IBM Spectrum Symphony.

For a list of properties that are supported in the pmr-site.xml file, see Configuration files in MapReduce framework.
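
As a brief sketch, the two property types can sit side by side in the same file. The classification in the comments follows the introduction to this topic, which names the default MapReduce application as specific to IBM Spectrum Symphony and the task log level as a Hadoop parameter. The values are taken from the example in this topic.

```xml
<!-- IBM Spectrum Symphony property: the default MapReduce application -->
<property>
  <name>mapreduce.application.name</name>
  <value>MapReduce731.</value>
</property>

<!-- Hadoop property supported by IBM Spectrum Symphony: the log level for map tasks -->
<property>
  <name>mapreduce.map.log.level</name>
  <value>INFO</value>
</property>
```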

Example

<property>
  <name>mapreduce.application.name</name>
  <value>MapReduce731.</value>
  <description>The MapReduce application name</description>
</property>
<property>
  <name>mapreduce.job.login.user</name>
  <value>User1</value>
  <description>The user who submits jobs</description>
</property>
<property>
  <name>mapreduce.job.login.password</name>
  <value>User1</value>
  <description>The password of the user who submits jobs</description>
</property>
<property>
  <name>mapreduce.map.log.level</name>
  <value>INFO</value>
</property>
<property>
  <name>mapreduce.reduce.log.level</name>
  <value>INFO</value>
  <description>Log level can be ERROR, WARN, INFO, DEBUG, or ALL.
  </description>
</property>
<property>
  <name>mapreduce.setup.log.level</name>
  <value>INFO</value>
</property>
<property>
  <name>mapreduce.cleanup.log.level</name>
  <value>INFO</value>
</property>
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>5</value>
  <description>The number of concurrent threads that copy the output of map tasks
 from the mapper machines.</description>
</property>
<property>
  <name>mapreduce.job.intermediatedata.checksum</name>
  <value>true</value>
  <description>The job uses a CRC32 checksum to check intermediate data for 
accidental errors during the shuffle stage.</description>
</property>
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>10</value>
  <description>The number of streams to merge at once while sorting files. This
 determines the number of open file handles.</description>
</property>
<property>
  <name>mapreduce.reduce.merge.inmem.threshold</name>
  <value>1000</value>
  <description>The threshold, expressed as a number of files, for the in-memory
 merge process. The Reducer waits for intermediate files provided by Mappers 
and stores them in local memory for the merge operation. When the number of 
stored files reaches this threshold, or the total file size exceeds the memory 
size threshold, the Reducer starts the merge operation.
  </description>
</property>
<property>
  <name>mapreduce.reduce.shuffle.merge.percent</name>
  <value>0.66</value>
  <description>The usage threshold at which an in-memory merge will be initiated, 
expressed as a percentage of the total memory allocated to storing in-memory map 
outputs, as defined by mapreduce.reduce.shuffle.input.buffer.percent.
  </description>
</property>
<property>
  <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
  <value>0.70</value>
  <description>The percentage of memory to be allocated from the maximum heap size 
to storing map outputs during the shuffle.
  </description>
</property>
<property>
  <name>mapreduce.reduce.input.buffer.percent</name>
  <value>0.0</value>
  <description>The percentage of memory, relative to the maximum heap size, to 
retain map outputs during the reduce. When the shuffle is concluded, any 
remaining map outputs in memory must consume less than this threshold 
before the reduce can begin.
  </description>
</property>
<property>
  <name>pmr.debug.task.keepfiles.pattern</name>
  <value>heapdump.*|javacore.*|Snap.*|core.*</value>
  <description>Regular expression for file names. Files in the task working 
directory (${PMR_HOME}/work/AppName/ServiceIndex/) whose names match this 
expression are saved under 
${PMR_HOME}/work/AppName/save/sessionID/taskID.
  </description>
</property>
<!--property>
  <name>pmr.debug.job.keep.failedtask.files</name>
  <value>false</value>
  <description>true: Indicates that, if there are failed tasks in this job, 
job-related working directories are kept and not cleaned up.
false: Indicates that job-related working directories are not kept 
and are cleaned up after the job completes.
  </description>
</property-->
<!--property>
  <name>pmr.debug.job.keep.failedtask.interdata</name>
  <value>true</value>
  <description>true: Indicates that, if there are failed tasks in this 
MapReduce job, the intermediate data directories for map task output are 
kept and not cleaned up.
false: Indicates that the intermediate data directories for map task 
output are not kept and are cleaned up after the MapReduce job completes.
  </description>
</property-->
<property>
  <name>pmr.reduce.multithread.num</name>
  <value>1</value>
  <description>The number of concurrent threads created to execute the 
MapReduce job. The default value is 1, which means that no additional 
threads are created. A value of 2 means that two threads execute the job. 
This setting is mandatory.
  </description>
</property>
<!--property>
  <name>pmr.reduce.multithread.sample.min</name>
  <value>10</value>
  <description>The number of sample keys that must be collected. The sample 
keys determine how to partition and create corresponding threads to execute 
MapReduce jobs. The default value is 10. This setting is optional.
  </description>
</property-->
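
The two shuffle percentages in the example combine multiplicatively: mapreduce.reduce.shuffle.merge.percent is applied to the buffer sized by mapreduce.reduce.shuffle.input.buffer.percent. As a worked illustration, assuming a hypothetical maximum heap size of 1024 MB (a figure not taken from this topic), the example values of 0.70 and 0.66 would give roughly:

```latex
\begin{align*}
\text{shuffle buffer} &= 0.70 \times 1024\ \text{MB} = 716.8\ \text{MB} \\
\text{in-memory merge trigger} &= 0.66 \times 716.8\ \text{MB} \approx 473.1\ \text{MB}
\end{align*}
```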