mrss.xml reference

The mrss.xml configuration file applies to MapReduce workloads, which are available only with IBM® Spectrum Symphony Advanced Edition. An entitlement key is required to enable the MapReduce framework.

The MapReduce framework in IBM Spectrum Symphony provides a data transfer daemon for the shuffle phase of a MapReduce job. This shuffle service daemon runs as an IBM Spectrum Symphony service on each host in the cluster. It optimizes local memory and disk usage to speed up the shuffling of map task output to local and remote reduce tasks.

Configure the environment for the shuffle service daemon by using the mrss.xml file.

Location

This file is installed with IBM Spectrum Symphony at $EGO_ESRVDIR/esc/conf/services/.

Environment Variables

PMR_MRSS_SHUFFLE_CLIENT_PORT

The port that the shuffle service uses. By default, this port is BASEPORT+10. If you use the default base port of 7869, the shuffle service port is 7879.
Note: The shuffle service port is listed in mrss.xml and in $SOAM_HOME/mapreduce/conf/pmr-env.sh. If you change the port, ensure that you update the port number in both configuration files.

Default: 7879

PMR_MRSS_SHUFFLE_DATA_WRITE_PORT

The port that the shuffle service uses for data writes. By default, this port is BASEPORT+11. If you use the default base port of 7869, the shuffle service port for data writes is 7880.
Note: This port is listed in mrss.xml and in $SOAM_HOME/mapreduce/conf/pmr-env.sh. If you change the port, ensure that you update the port number in both configuration files.

Default: 7880
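
As an illustration of the BASEPORT+10 and BASEPORT+11 rule described above, the following fragment sketches the two port entries for a hypothetical base port of 17869 (17869 + 10 = 17879 and 17869 + 11 = 17880). The base port value is an assumption chosen only for this example. The entries belong inside the <ego:ActivitySpecification> element of mrss.xml, and the same port numbers would also need to be set in $SOAM_HOME/mapreduce/conf/pmr-env.sh.

```xml
<!-- Assumption: the cluster base port is 17869 (hypothetical value). -->
<!-- Shuffle service port: base port + 10 -->
<ego:EnvironmentVariable name="PMR_MRSS_SHUFFLE_CLIENT_PORT">17879</ego:EnvironmentVariable>
<!-- Shuffle service data-write port: base port + 11 -->
<ego:EnvironmentVariable name="PMR_MRSS_SHUFFLE_DATA_WRITE_PORT">17880</ego:EnvironmentVariable>
```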

PMR_MRSS_WORKING_THREADS_NUMBER

The number of working threads for data-copy requests.

Default: 20

PMR_MRSS_DATA_WRITE_WORKING_THREADS_NUMBER

The number of working threads for data-write requests.

Default: 24

PMR_MRSS_TASK_LOG_DIR

The log directory for map and reduce tasks. The shuffle service checks this directory every PMR_MRSS_TASK_LOG_CLEAN_INTERVAL minutes. If a subdirectory was created more than PMR_MRSS_TASK_DIRECTORY_DELETE_INTERVAL hours ago, the shuffle service deletes the subdirectory.

Default: ${PMR_HOME}/logs

PMR_MRSS_TASK_LOG_CLEAN_INTERVAL

The interval, in minutes, at which the shuffle service checks and cleans the log directory for a map or reduce task.

Default: 30

PMR_MRSS_TASK_DIRECTORY_DELETE_INTERVAL

The age, in hours, after which the shuffle service deletes the log directory for a map or reduce task.

Default: 48
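
The following fragment is a minimal sketch of how the three log settings work together. The values are illustrative assumptions, not recommendations: the shuffle service checks a hypothetical log directory, /var/log/pmr/tasks, every 60 minutes and deletes task subdirectories that are more than 24 hours old. The entries belong inside the <ego:ActivitySpecification> element of mrss.xml.

```xml
<!-- Assumption: /var/log/pmr/tasks is a hypothetical directory used for this example. -->
<ego:EnvironmentVariable name="PMR_MRSS_TASK_LOG_DIR">/var/log/pmr/tasks</ego:EnvironmentVariable>
<!-- Check the log directory every 60 minutes. -->
<ego:EnvironmentVariable name="PMR_MRSS_TASK_LOG_CLEAN_INTERVAL">60</ego:EnvironmentVariable>
<!-- Delete task subdirectories that are more than 24 hours old. -->
<ego:EnvironmentVariable name="PMR_MRSS_TASK_DIRECTORY_DELETE_INTERVAL">24</ego:EnvironmentVariable>
```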

PMR_MRSS_CHUNK_SIZE

The size, in KB, of each data chunk that is copied to the reducer during the shuffle phase.

Default: 64

PMR_MRSS_CACHE_PATH

A location on the local disk in which to store the map file of the input split. This setting is part of the configuration for cache-aware scheduling, which enables a job to get its input split from the cache.

Default: ${PMR_HOME}/work/datacache

PMR_MRSS_INPUTCACHE_MAX_MEMSIZE_MB

The maximum memory limit of the input split cache (in MB). If the total size of the memory cache does not exceed the configured size, the cache files are mapped to system memory and used as an in-memory cache. If the total size of the memory cache exceeds the configured size, the cache files are not mapped to system memory but are instead used as an on-disk cache. This setting is part of the configuration for cache-aware scheduling, which enables a job to get its input split from the cache.

Default: 2

PMR_MRSS_INPUTCACHE_CLEAN_INTERVAL

The duration (in seconds) that a split can remain cached without being accessed by any job of any application. When a cached split exceeds this duration, the oldest data input split (and its local disk file) is deleted from the MapReduce shuffle service cache. This setting is part of the configuration for cache-aware scheduling, which enables a job to get its input split from the cache.

Default: 3600
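
The three cache settings above can be sketched together as follows. All values are assumptions chosen for illustration: a hypothetical cache directory, /data/pmr/datacache, a 4096 MB limit for the in-memory cache, and removal of splits that have not been accessed for 7200 seconds (2 hours). The entries belong inside the <ego:ActivitySpecification> element of mrss.xml.

```xml
<!-- Assumption: /data/pmr/datacache is a hypothetical local directory. -->
<ego:EnvironmentVariable name="PMR_MRSS_CACHE_PATH">/data/pmr/datacache</ego:EnvironmentVariable>
<!-- Up to 4096 MB of cache files are mapped to system memory; beyond that, the cache is used on disk. -->
<ego:EnvironmentVariable name="PMR_MRSS_INPUTCACHE_MAX_MEMSIZE_MB">4096</ego:EnvironmentVariable>
<!-- Splits that are not accessed for 7200 seconds become eligible for deletion. -->
<ego:EnvironmentVariable name="PMR_MRSS_INPUTCACHE_CLEAN_INTERVAL">7200</ego:EnvironmentVariable>
```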

Example of mrss.xml file

  <sc:ActivityDescription>
    ...
    <ego:ActivitySpecification>
      ...
      <ego:EnvironmentVariable name="PMR_MRSS_SHUFFLE_CLIENT_PORT">35010</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_SHUFFLE_DATA_WRITE_PORT">35011</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_WORKING_THREADS_NUMBER">20</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_DATA_WRITE_WORKING_THREADS_NUMBER">24</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_TASK_LOG_DIR">${PMR_HOME}/logs</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_TASK_LOG_CLEAN_INTERVAL">30</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_TASK_DIRECTORY_DELETE_INTERVAL">48</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_CHUNK_SIZE">64</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_CACHE_PATH">${PMR_HOME}/work/datacache</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_INPUTCACHE_MAX_MEMSIZE_MB">2048</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_INPUTCACHE_CLEAN_INTERVAL">3600</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_PRINCIPALNAME">testuser/iMapReduce@EXAMPLE.COM</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_KEYTAB">/dev/sym_mr/kernel/conf/abcuser.keytab</ego:EnvironmentVariable>
      <ego:EnvironmentVariable name="PMR_MRSS_PRINCIPALNAME">/usr/bin</ego:EnvironmentVariable>
      ...
    </ego:ActivitySpecification>
  </sc:ActivityDescription>