Adding MapReduce to your IBM Spectrum Symphony cluster

MapReduce is not installed by default; to install it together with IBM® Spectrum Symphony, you must set the INSTALL_MAPREDUCE installation environment variable at installation time. If you did not install MapReduce at that time, you can add it to your IBM Spectrum Symphony cluster manually now.

Before you begin

  • To install MapReduce on your IBM Spectrum Symphony cluster, you require the same installer package (.bin file) that you used to install IBM Spectrum Symphony. The installer detects and skips the IBM Spectrum Symphony packages that are already installed, and installs only the MapReduce packages (soammrcore-version.architecture.rpm and soammrmgmt-version.noarch.rpm).
  • After installing the MapReduce binaries, you create consumers and resource groups for the MapReduce configuration. To create resource groups, you must be a cluster administrator or have the Resource Groups Manage permission.
  • Use the same installation environment as with IBM Spectrum Symphony:
    • If you installed IBM Spectrum Symphony to a local environment, then install MapReduce on every host.
    • If you installed IBM Spectrum Symphony to a shared environment, then install MapReduce to the shared primary host, and on no other hosts. Every host in the cluster shares this installation and should therefore have access to the primary host.

Procedure

  1. Log in to the host by using the installation user operating system account (cluster administrator, root or sudo to root permission) used to install IBM Spectrum Symphony, and source the environment:
    • If you are running Linux® by using bash, run:
      . $EGO_TOP/profile.platform
    • If you are running Linux by using csh, run:
      source $EGO_TOP/cshrc.platform

      where $EGO_TOP is the path to your installation directory (default is /opt/ibm/spectrumcomputing).

    Important: To install MapReduce to a shared environment, you install only once on the primary host, so log on to that primary host and source the environment.

    To install MapReduce to a local environment, log on to each host (primary host, other management hosts, and each compute host), and source the environment on each host.
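
As a minimal sketch of this step for sh or bash users, the following function checks that the profile file exists before sourcing it. The function name source_symphony_env is a hypothetical helper, not a product command; csh users would source cshrc.platform instead, as described above.

```shell
#!/bin/sh
# Hypothetical helper: source the Symphony environment for sh or bash.
# The installation directory is taken from the first argument, then from
# EGO_TOP, and finally falls back to the documented default location.
source_symphony_env() {
    ego_top="${1:-${EGO_TOP:-/opt/ibm/spectrumcomputing}}"

    if [ -f "$ego_top/profile.platform" ]; then
        # Load the environment into the current shell.
        . "$ego_top/profile.platform"
    else
        echo "profile.platform not found under $ego_top" >&2
        return 1
    fi
}

# Example call (uses EGO_TOP or the default directory):
# source_symphony_env
```

In a local environment, the same call would be repeated on each host on which you install MapReduce.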

  2. Define cluster properties by setting the following environment variables. If you do not set the optional environment variables, the default values are used. If you have previously set these environment variables (for example, during IBM Spectrum Symphony installation), setting them now will override the previous settings.
    To define cluster properties as environment variables, run the following commands corresponding to your shell:
    • sh, ksh, or bash:
      export VARIABLE_NAME=value
      
    • csh or tcsh:
      setenv VARIABLE_NAME value
    To define cluster properties in a file, create a simple text file /tmp/install.config and enter each variable on a new line, in this format:
    VARIABLE_NAME=value
    Any variables set in the environment override the same variables set in the configuration file.
    • INSTALL_MAPREDUCE: Specify Y to indicate that you want to install MapReduce. For example:
      export INSTALL_MAPREDUCE=Y
    • IBM_SPECTRUM_SYMPHONY_LICENSE_ACCEPT: Mandatory when you install without user interaction (that is, with the --quiet option), to indicate that you accept the license agreement before installing:
      export IBM_SPECTRUM_SYMPHONY_LICENSE_ACCEPT=Y
    • JAVA_HOME: Specify the installation directory of Oracle or IBM Java™. For example:
      export JAVA_HOME=/usr/java/latest
      Note: Your Java installation location is required for enabling the MapReduce framework. You can enable this framework by setting the JAVA_HOME environment variable before you install IBM Spectrum Symphony.

      If you install IBM Spectrum Symphony without setting the JAVA_HOME environment variable, you can do so later by defining the JAVA_HOME in the $SOAM_HOME/mapreduce/conf/pmr-env.sh file.

    • HADOOP_HOME: Specify the installation directory of the Hadoop distribution. For example:
      export HADOOP_HOME=/opt/hadoop-version
    • HADOOP_VERSION: Specify the version of the Hadoop distribution. For example:
      export HADOOP_VERSION=version
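
The file-based approach in this step can be sketched as follows. The function name write_install_config is a hypothetical helper, and the values in the example call are the placeholders used in this topic; substitute your own paths and version. The sketch writes one VARIABLE_NAME=value pair per line, as the step describes.

```shell
#!/bin/sh
# Hypothetical helper: write cluster properties to a configuration file,
# one VARIABLE_NAME=value pair per line.
write_install_config() {
    config_file="$1"      # for example, /tmp/install.config
    java_home="$2"        # Java installation directory
    hadoop_home="$3"      # Hadoop installation directory
    hadoop_version="$4"   # Hadoop version

    {
        printf 'INSTALL_MAPREDUCE=Y\n'
        printf 'JAVA_HOME=%s\n' "$java_home"
        printf 'HADOOP_HOME=%s\n' "$hadoop_home"
        printf 'HADOOP_VERSION=%s\n' "$hadoop_version"
    } > "$config_file"
}

# Example call (placeholder values from this topic):
# write_install_config /tmp/install.config /usr/java/latest /opt/hadoop-version version
```

Any of these variables that is also exported in the shell takes precedence over the value in the file.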
  3. On Ubuntu, install the RPM utility by running apt-get install rpm.
  4. Run the IBM Spectrum Symphony installer package. Use the same .bin file that you used to install IBM Spectrum Symphony. The installer detects and skips the IBM Spectrum Symphony packages that are already installed, and installs only the MapReduce packages (soammrcore-version.architecture.rpm and soammrmgmt-version.noarch.rpm).
    • If you installed IBM Spectrum Symphony with default settings, run:
      sym-version_architecture.bin
      If you are using the evaluation version, run:
      symeval-version_architecture.bin
      Note: Alternative method: If you must use .rpm files instead of .bin, extract the MapReduce .rpm files, and then install them. On management hosts, there are two .rpm files to install, and the order is important.
      For example, first extract the .rpm files from the sym-version_architecture.bin package by running:
      sym-version_architecture.bin --extract extract_directory

      where extract_directory specifies the directory to extract .rpm files.

      Next, install each MapReduce .rpm file:
      • For management hosts:
        rpm -ivh soammrcore-version.architecture.rpm
        rpm -ivh soammrmgmt-version.noarch.rpm
      • For compute hosts:
        rpm -ivh soammrcore-version.architecture.rpm
      Additional note for Ubuntu: To install the .rpm files on Ubuntu, you must also specify the --ignorearch parameter. For example:
      • For management hosts:
        rpm -ivh --ignorearch soammrcore-version.architecture.rpm
        rpm -ivh --ignorearch soammrmgmt-version.noarch.rpm
      • For compute hosts:
        rpm -ivh --ignorearch soammrcore-version.architecture.rpm
    • If you installed IBM Spectrum Symphony to a location other than the default, run:
      sym-version_architecture.bin --prefix install_location --dbpath dbpath_location
      If you are using the evaluation version, run:
      symeval-version_architecture.bin --prefix install_location --dbpath dbpath_location
      where:
      • --prefix install_location specifies the absolute path to the installation directory. The --prefix parameter is optional. If you install without the --prefix option, IBM Spectrum Symphony is installed in its default directory: /opt/ibm/spectrumcomputing on Linux. Ensure that the path is set to a clean directory.

        Use the same --prefix path as used for your IBM Spectrum Symphony installation.

      • --dbpath dbpath_location sets the RPM database to a directory different from the default /var/lib/rpm. The --dbpath parameter is optional.

        Use the same --dbpath path as used for your IBM Spectrum Symphony installation.

      Note: Alternative method: If you must use .rpm files instead of .bin, extract the MapReduce .rpm files, and then install them. On management hosts, there are two .rpm files to install, and the order is important.
      For example, first extract the .rpm files from the sym-version_architecture.bin package by running:
      sym-version_architecture.bin --extract extract_directory

      where extract_directory specifies the directory to extract .rpm files.

      Next, install each MapReduce .rpm file:
      • For management hosts:
        rpm -ivh --prefix install_location --dbpath dbpath_location soammrcore-version.architecture.rpm
        rpm -ivh --prefix install_location --dbpath dbpath_location soammrmgmt-version.noarch.rpm
      • For compute hosts:
        rpm -ivh --prefix install_location --dbpath dbpath_location soammrcore-version.architecture.rpm
      Additional note for Ubuntu: To install the .rpm files on Ubuntu, you must also specify the --ignorearch parameter. For example:
      • For management hosts:
        rpm -ivh --ignorearch --prefix install_location --dbpath dbpath_location soammrcore-version.architecture.rpm
        rpm -ivh --ignorearch --prefix install_location --dbpath dbpath_location soammrmgmt-version.noarch.rpm
      • For compute hosts:
        rpm -ivh --ignorearch --prefix install_location --dbpath dbpath_location soammrcore-version.architecture.rpm
    • If you installed IBM Spectrum Symphony without user interaction, first ensure that the IBM_SPECTRUM_SYMPHONY_LICENSE_ACCEPT environment variable is set to Y or y, and then run:
      sym-version_architecture.bin --quiet
      If you are using the evaluation version, run:
      symeval-version_architecture.bin --quiet

      where --quiet suppresses prompts during installation.
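
For the alternative .rpm method in this step, the installation order can be sketched as a dry run that only prints the rpm commands for one host. The function name print_mapreduce_rpm_commands is a hypothetical helper, and the file-name glob is an assumption that stands in for the version and architecture parts of the package names. Nothing is installed by this sketch.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the rpm commands for one host,
# in the required order: soammrcore first, then soammrmgmt.
# Arguments: role (management|compute), directory with the extracted .rpm
#            files, optional extra rpm options as one quoted string.
print_mapreduce_rpm_commands() {
    role="$1"
    extract_dir="$2"
    extra_opts="${3:-}"

    case "$role" in
        management) packages="soammrcore soammrmgmt" ;;
        compute)    packages="soammrcore" ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac

    for package in $packages; do
        # The glob matches the version and architecture in the file name.
        for rpm_file in "$extract_dir"/"$package"-*.rpm; do
            [ -e "$rpm_file" ] || continue
            echo "rpm -ivh${extra_opts:+ $extra_opts} $rpm_file"
        done
    done
}

# Example calls (extract_directory is the placeholder from this step):
# print_mapreduce_rpm_commands management extract_directory
# print_mapreduce_rpm_commands compute extract_directory "--ignorearch"
```

For a non-default installation, the third argument could carry the --prefix and --dbpath options as one quoted string; review the printed commands before running them.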

  5. After MapReduce installation is complete on all applicable hosts, complete configuration steps on the primary host:
    1. Log on to the primary host and source the environment:
      • If you are running Linux by using bash, run:
        . $EGO_TOP/profile.platform
      • If you are running Linux by using csh, run:
        source $EGO_TOP/cshrc.platform
      where $EGO_TOP is the path to your installation directory (default is /opt/ibm/spectrumcomputing).
    2. If you configured a shared directory for IBM Spectrum Symphony failover, specify the shared directory:
      egoconfig mghost shared_location
    3. Specify the host name and port number of the HDFS NameNode by setting the DFS_GUI_HOSTNAME and DFS_GUI_PORT parameters in the $EGO_CONFDIR/wsm.conf file. For example:
      DFS_GUI_HOSTNAME=namenodehost.mydomain.com
      DFS_GUI_PORT=50070
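
If you prefer to script this edit, one possible sketch follows. The function name set_conf_param is a hypothetical helper; it assumes that wsm.conf holds plain KEY=value lines as in the example above, that the file ends with a newline, and that the value contains no | or & characters.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=value in a plain key=value file.
# An existing KEY line is replaced; otherwise the pair is appended.
set_conf_param() {
    conf_file="$1"
    key="$2"
    value="$3"

    if grep -q "^${key}=" "$conf_file"; then
        # Edit in place and keep a .bak copy of the previous file.
        sed -i.bak "s|^${key}=.*|${key}=${value}|" "$conf_file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$conf_file"
    fi
}

# Example calls (host name and port are the sample values from this step):
# set_conf_param "$EGO_CONFDIR/wsm.conf" DFS_GUI_HOSTNAME namenodehost.mydomain.com
# set_conf_param "$EGO_CONFDIR/wsm.conf" DFS_GUI_PORT 50070
```
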
    4. Create resource groups:

      Note: This step creates the non-default resource groups that MapReduce requires. The predefined resource groups, such as ManagementHosts and ComputeHosts, already exist; you use them in the next step.

      1. Log on to the cluster management console as the cluster administrator (such as Admin).
      2. Select Resources > Resource Planning (Slot) > Resource Groups.
      3. Click Global Actions > Create a Resource Group and create these groups:
        • MapReduceInternalResourceGroup
        • NameNodeRG
        • SecondaryNodeRG
        • DataNodeRG
        For detailed steps, see Creating resource groups.
    5. Create consumers, and associate them with the resource groups:
      1. Locate the following section in the $EGO_CONFDIR/../../gui/conf/pmcconf/pmc_conf_ego.xml file, and remove the ;MapReduceInternalResourceGroup;DataNodeRG;NameNodeRG;SecondaryNodeRG; string from the <Value> ... </Value> element:
        <Parameter>
                <Name>hideInternalResourceGroup</Name>
                <!-- This parameter allows the user to define which resource group should be hidden on consumer properties page. -->
                <!-- By default, not define this parameter, it will hide InternalResourceGroup, MapReduceInternalResourceGroup, DataNodeRG, -->
                <!-- NameNodeRG and SecondaryNodeRG internal resource groups. Resource group names must be separated by a semicolon (";"). -->
                <Value>InternalResourceGroup;MapReduceInternalResourceGroup;DataNodeRG;NameNodeRG;SecondaryNodeRG;</Value>
        </Parameter>

        so that the <Value> ... </Value> element in that section reads as follows:

        <Value>InternalResourceGroup</Value>
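
The same edit could be scripted, for example with sed. The sketch below is an assumption-laden illustration: the function name show_mapreduce_resource_groups is hypothetical, and the substitution only takes effect if the <Value> element still contains exactly the default string shown above. A .bak copy of the original file is kept.

```shell
#!/bin/sh
# Hypothetical helper: shorten the hideInternalResourceGroup value so that
# only InternalResourceGroup stays hidden on the consumer properties page.
show_mapreduce_resource_groups() {
    xml_file="$1"
    old='<Value>InternalResourceGroup;MapReduceInternalResourceGroup;DataNodeRG;NameNodeRG;SecondaryNodeRG;</Value>'
    new='<Value>InternalResourceGroup</Value>'

    # Replace the default value in place; the original is saved as .bak.
    sed -i.bak "s|${old}|${new}|" "$xml_file"
}

# Example call:
# show_mapreduce_resource_groups "$EGO_CONFDIR/../../gui/conf/pmcconf/pmc_conf_ego.xml"
```

If the value in your file differs from the default, the file is left unchanged and the edit must be made by hand.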
      2. Stop and restart the WEBGUI service:
        egosh service stop WEBGUI
        egosh service start WEBGUI
      3. Create consumers for MapReduce: from the cluster management console, select Resources > Consumers, locate and click on the tree level for which you would like to add a consumer, and then click Global Actions > Create a Consumer.
        When you create the consumer, you specify a resource group for it. Create these consumers and associate them with your resource groups, as follows:
        Consumer                                    Resource group
        /MapReduceConsumer                          ManagementHosts and ComputeHosts
        /MapReduceConsumer/MapReduce732             ManagementHosts and ComputeHosts
        /ComputeServices                            MapReduceInternalResourceGroup
        /ComputeServices/MapreduceComputeServices   MapReduceInternalResourceGroup
        /HDFS                                       DataNodeRG, SecondaryNodeRG, and NameNodeRG
        /HDFS/NameNodeConsumer                      NameNodeRG
        /HDFS/SecondaryNodeConsumer                 SecondaryNodeRG
        /HDFS/DataNodeConsumer                      DataNodeRG

        For detailed steps, see Creating a consumer and Modifying consumer properties.

    6. Register the MRSS (MapReduce EGO shuffle service) under the /ComputeServices/MapreduceComputeServices consumer:
      1. From the cluster management console, select System & Services > EGO Services > Service Profiles.
      2. In the consumer tree, navigate to the /ComputeServices/MapreduceComputeServices consumer.
      3. Click Global Actions > Register a new service.
      4. Register the MRSS by clicking Import > Browse, navigate to the $EGO_TOP/.install/sym_advanced_edition/mrss.xml file, and click Register.
    7. Register the MapReduce application:
      soamreg $EGO_TOP/.install/sym_advanced_edition/MapReduce7.3.2.xml
    8. Enable data loading and data purging for MapReduce by copying the following files from the $EGO_TOP/.install/sym_advanced_edition/ directory to the $EGO_CONFDIR/../../perf/conf/ directory as follows:
      cp $EGO_TOP/.install/sym_advanced_edition/plc_pmr.xml $EGO_CONFDIR/../../perf/conf/plc
      cp $EGO_TOP/.install/sym_advanced_edition/purger_pmr.xml $EGO_CONFDIR/../../perf/conf/purger
      cp $EGO_TOP/.install/sym_advanced_edition/pmrresourcemetrics.xml $EGO_CONFDIR/../../perf/conf/dataloader
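
The three copy commands above can also be wrapped in a small function that stops at the first missing source file. This is a sketch only: the function name copy_mapreduce_perf_files is hypothetical, and it assumes that plc, purger, and dataloader already exist as directories under the perf/conf directory.

```shell
#!/bin/sh
# Hypothetical helper: copy the three MapReduce files into the plc, purger,
# and dataloader subdirectories, stopping at the first missing file.
copy_mapreduce_perf_files() {
    src_dir="$1"     # for example, $EGO_TOP/.install/sym_advanced_edition
    perf_conf="$2"   # for example, $EGO_CONFDIR/../../perf/conf

    # Each entry is source_file:target_subdirectory.
    for pair in plc_pmr.xml:plc purger_pmr.xml:purger pmrresourcemetrics.xml:dataloader; do
        file="${pair%%:*}"
        target="${pair##*:}"
        if [ ! -f "$src_dir/$file" ]; then
            echo "missing: $src_dir/$file" >&2
            return 1
        fi
        cp "$src_dir/$file" "$perf_conf/$target" || return 1
    done
}

# Example call:
# copy_mapreduce_perf_files "$EGO_TOP/.install/sym_advanced_edition" "$EGO_CONFDIR/../../perf/conf"
```
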
  6. Restart the cluster:
    egosh service stop all
    egosh ego shutdown
    egosh ego start