pmr-site.xml reference
The pmr-site.xml configuration file applies to MapReduce workloads, which are available only with IBM® Spectrum Symphony Advanced Edition. To enable the MapReduce framework, you must have an Advanced Edition entitlement key.
The pmr-site.xml file adopts an XML format similar to Hadoop's core-site.xml file (used to define core properties) and mapred-site.xml file (used to define MapReduce properties).
The properties that are defined in the pmr-site.xml file, such as the default MapReduce application and the logon user for job submission, are specific to the MapReduce framework in IBM Spectrum Symphony. You can, however, use this file to add or adjust some Hadoop parameters, such as the log level for map and reduce tasks.
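As a rough illustration of the layout described above, the following sketch places one IBM Spectrum Symphony property and one Hadoop parameter in the same file. The enclosing configuration root element and the XML declaration are assumptions taken from the Hadoop site-file convention, not confirmed for pmr-site.xml, and the DEBUG value is chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- Assumption: the root element follows the Hadoop core-site.xml convention. -->
<configuration>
  <!-- Property specific to the MapReduce framework in IBM Spectrum Symphony -->
  <property>
    <name>mapreduce.job.login.user</name>
    <value>User1</value>
    <description>The user who submits jobs</description>
  </property>
  <!-- Hadoop parameter adjusted through the same file (illustrative value) -->
  <property>
    <name>mapreduce.map.log.level</name>
    <value>DEBUG</value>
  </property>
</configuration>
```

Each setting is a property element with a name, a value, and an optional description, which is the same shape used by the entries in the Example section.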
Set the following environment variables at the indicated stage:
- HADOOP_HOME, before installing IBM Spectrum Symphony, as described in Installing and configuring IBM Spectrum Symphony.
- PMR_EXTERNAL_CONFIG_PATH, after installing IBM Spectrum Symphony, as described in Working with the MapReduce framework and IBM Spectrum Symphony.
The properties in the pmr-site.xml file apply to all jobs submitted from the local host. To specify settings for a single job, use the -D option from the mrsh utility or the MapReduce console on the cluster management console during job submission.
Location
This configuration file is installed with IBM Spectrum Symphony in the %SOAM_HOME%/mapreduce/conf directory.
Properties
The pmr-site.xml file can contain two types of properties:
- IBM Spectrum Symphony properties.
- Hadoop properties that are supported by IBM Spectrum Symphony.
For a list of properties that are supported in the pmr-site.xml file, see Configuration files in MapReduce framework.
Example
<property>
<name>mapreduce.application.name</name>
<value>MapReduce731.</value>
<description>The MapReduce application name</description>
</property>
<property>
<name>mapreduce.job.login.user</name>
<value>User1</value>
<description>The user who submits jobs</description>
</property>
<property>
<name>mapreduce.job.login.password</name>
<value>User1</value>
<description>The password of the user who submits jobs</description>
</property>
<property>
<name>mapreduce.map.log.level</name>
<value>INFO</value>
</property>
<property>
<name>mapreduce.reduce.log.level</name>
<value>INFO</value>
<description>Log level can be ERROR, WARN, INFO, DEBUG, or ALL.
</description>
</property>
<property>
<name>mapreduce.setup.log.level</name>
<value>INFO</value>
</property>
<property>
<name>mapreduce.cleanup.log.level</name>
<value>INFO</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>5</value>
<description>The number of concurrent threads used to copy the output of map
tasks from the mapper hosts.</description>
</property>
<property>
<name>mapreduce.job.intermediatedata.checksum</name>
<value>true</value>
<description>The job uses a CRC32 checksum to check intermediate data for
accidental errors during the shuffle stage.</description>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>10</value>
<description>The number of streams to merge at once while sorting files. This
determines the number of open file handles.</description>
</property>
<property>
<name>mapreduce.reduce.merge.inmem.threshold</name>
<value>1000</value>
<description>The threshold, expressed as a number of files, for the in-memory
merge process. The Reducer holds the intermediate files that it receives from
the Mappers in local memory until they are merged. When the number of stored
files reaches this threshold, or the total file size exceeds the memory size
threshold, the Reducer starts the merge operation.
</description>
</property>
<property>
<name>mapreduce.reduce.shuffle.merge.percent</name>
<value>0.66</value>
<description>The usage threshold at which an in-memory merge will be initiated,
expressed as a percentage of the total memory allocated to storing in-memory map
outputs, as defined by mapreduce.reduce.shuffle.input.buffer.percent.
</description>
</property>
<property>
<name>mapreduce.reduce.shuffle.input.buffer.percent</name>
<value>0.70</value>
<description>The percentage of memory to be allocated from the maximum heap size
to storing map outputs during the shuffle.
</description>
</property>
<property>
<name>mapreduce.reduce.input.buffer.percent</name>
<value>0.0</value>
<description>The percentage of memory, relative to the maximum heap size, to
retain map outputs during the reduce. When the shuffle is concluded, any
remaining map outputs in memory must consume less than this threshold
before the reduce can begin.
</description>
</property>
<property>
<name>pmr.debug.task.keepfiles.pattern</name>
<value>heapdump.*|javacore.*|Snap.*|core.*</value>
<description>Regular expression for file names. Files under the task
working directory (${PMR_HOME}/work/AppName/ServiceIndex/) whose names
match this expression are saved under
${PMR_HOME}/work/AppName/save/sessionID/taskID.
</description>
</property>
<!--property>
<name>pmr.debug.job.keep.failedtask.files</name>
<value>false</value>
<description>true: Indicates that, if there are failed tasks in this job,
job-related working directories remain and are not cleaned up.
false: Indicates that job-related working directories do not remain
and are cleaned up after the job completes.
</description>
</property-->
<!--property>
<name>pmr.debug.job.keep.failedtask.interdata</name>
<value>true</value>
<description>true: Indicates that if there are failed tasks in this
MapReduce job, map task output intermediate data directories will remain
and not be cleaned up.
false: Indicates that map task output intermediate data directories
should not remain and that directories will be cleaned up after the
MapReduce job completes.
</description>
</property-->
<property>
<name>pmr.reduce.multithread.num</name>
<value>1</value>
<description>The number of concurrent threads to be created to execute
the MapReduce job. The default value is 1, meaning no new threads will be
created. A value of 2 means two threads will be used to execute the job.
This setting is mandatory.
</description>
</property>
<!--property>
<name>pmr.reduce.multithread.sample.min</name>
<value>10</value>
<description>Number of sample keys required to be collected. The sample
keys determine how to partition and create corresponding threads to execute
MapReduce jobs. The default value is 10. This setting is optional.
</description>
</property-->
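Several properties in the example are wrapped in XML comments (the element opens with <!--property> and closes with </property-->), so they have no effect as written. As a minimal sketch, assuming that such a property takes effect once the comment markers are removed, the following fragment keeps job-related working directories when a job has failed tasks. The value true is chosen here for illustration.

```xml
<!-- Comment markers removed so that the property is read (assumption). -->
<property>
  <name>pmr.debug.job.keep.failedtask.files</name>
  <value>true</value>
  <description>Keep job-related working directories when tasks fail.</description>
</property>
```

Retained directories are no longer cleaned up automatically, so a setting like this is probably best limited to debugging sessions.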