Configuring circular buffering for map tasks

Map tasks can hold intermediate data temporarily in a circular buffer instead of the default dual buffers. To select circular buffering, set the pmr.map.output.buffer.type parameter to circular.

About this task

You can configure circular buffering from the mrsh utility or in the pmr-site.xml configuration file.

Procedure

  • Configure circular buffering from the mrsh utility:
    1. To enable the use of a circular buffer from the command line, add the pmr.map.output.buffer.type option to your job submission command and set its value to circular:
      $ mrsh jar jarfile [classname] -Dpmr.map.output.buffer.type=circular [args]
    2. To specify the maximum size of the buffer in megabytes, add the io.sort.mb option to your job submission command and set its value:
      $ mrsh jar jarfile [classname] -Dio.sort.mb=360 [args]
    3. To specify the fraction of the maximum buffer size at which the buffer contents begin to spill to disk, add the io.sort.spill.percent option to your job submission command and set its value as a decimal fraction (for example, 0.6 for 60%):
      $ mrsh jar jarfile [classname] -Dio.sort.spill.percent=0.6 [args]
  • Configure circular buffering in a configuration file:
    Note: Options set on the mrsh command line override options set in the configuration file.
    1. Open the pmr-site.xml configuration file at $PMR_HOME/conf.
    2. Add the pmr.map.output.buffer.type property. For example:
      <property>
        <name>pmr.map.output.buffer.type</name>
        <value>circular</value>
      </property>
      
    3. Save the file.
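
Example

The three command-line options in the procedure can plausibly be combined in one job submission. The following sketch assumes that mrsh accepts several -D options in a single command, as the individual examples suggest. The jar file name myjob.jar, the class name com.example.MyJob, and the helper functions are hypothetical and serve only as illustration. The script prints the command rather than running it, and works out the buffer fill level at which spilling would begin for the example values 360 and 0.6.

```shell
#!/bin/sh
# Sketch only: values, file names, and function names are illustrative assumptions.

BUFFER_MB=360          # value for io.sort.mb (maximum buffer size, in MB)
SPILL_FRACTION=0.6     # value for io.sort.spill.percent (fraction of the buffer)

# Compute the buffer fill level, in MB, at which spilling to disk would begin.
spill_threshold_mb() {
    awk -v mb="$1" -v frac="$2" 'BEGIN { printf "%.0f\n", mb * frac }'
}

# Print (do not run) a job submission command that sets all three options.
# $1 = jar file, $2 = class name (both hypothetical in this example).
build_mrsh_command() {
    echo "mrsh jar $1 $2 -Dpmr.map.output.buffer.type=circular -Dio.sort.mb=$BUFFER_MB -Dio.sort.spill.percent=$SPILL_FRACTION"
}

echo "Spill begins at about $(spill_threshold_mb "$BUFFER_MB" "$SPILL_FRACTION") MB"
build_mrsh_command myjob.jar com.example.MyJob
```

With these values, the buffer would begin spilling once about 216 MB of its 360 MB are in use. Raising the fraction delays spilling but leaves less headroom for map output that arrives while a spill is in progress.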