Adding MapReduce to your IBM Spectrum Symphony cluster
MapReduce is not installed by default; to install it together with IBM® Spectrum Symphony, you must set the INSTALL_MAPREDUCE installation environment variable at installation time. If you did not install MapReduce then, you can add it manually to your IBM Spectrum Symphony cluster now.
Before you begin
- To install MapReduce on your IBM Spectrum Symphony cluster, you require the same installer package (.bin file) used to install IBM Spectrum Symphony. The installer detects and skips the IBM Spectrum Symphony packages that you already have installed, and installs only the MapReduce packages (soammrcore-version.architecture.rpm and soammrmgmt-version.noarch.rpm).
- After installing the MapReduce binaries, you create consumers and resource groups for the MapReduce configuration. To create resource groups, you must be a cluster administrator or have the Resource Groups Manage permission.
- Use the same installation environment as with IBM Spectrum Symphony:
- If you installed IBM Spectrum Symphony to a local environment, then install MapReduce on every host.
- If you installed IBM Spectrum Symphony to a shared environment, then install MapReduce to the shared primary host, and on no other hosts. Every host in the cluster shares this installation, and therefore, should have access to this primary host.
Procedure
- Log in to the host by using the installation user operating system account (cluster administrator, root, or sudo to root permission) used to install IBM Spectrum Symphony, and source the environment:
- If you are running Linux® by using bash, run:
. $EGO_TOP/profile.platform
- If you are running Linux by using csh, run:
source $EGO_TOP/cshrc.platform
where $EGO_TOP is the path to your installation directory (default is /opt/ibm/spectrumcomputing).
Important: To install MapReduce to a shared environment, you install only once, on the primary host, so log on to that primary host and source the environment. To install MapReduce to a local environment, log on to each host (primary host, other management hosts, and each compute host), and source the environment on each host.
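The environment-sourcing step can be wrapped in a small check so that a wrong path fails early. The following is a minimal sketch for sh-compatible shells only; the function name source_symphony_env is hypothetical (not part of IBM Spectrum Symphony), and it assumes that EGO_TOP is either already set or equal to the default directory named above.

```shell
# Hypothetical helper: source the Symphony environment in an sh-compatible shell.
# Assumes EGO_TOP is the installation directory; falls back to the documented
# default (/opt/ibm/spectrumcomputing) when EGO_TOP is unset.
source_symphony_env() {
    ego_top="${EGO_TOP:-/opt/ibm/spectrumcomputing}"
    profile="$ego_top/profile.platform"
    if [ ! -r "$profile" ]; then
        echo "Cannot read $profile; check EGO_TOP" >&2
        return 1
    fi
    # Source into the current shell so the exported variables persist.
    . "$profile"
}
```

In a local installation you would call this once on each host; in a shared installation, only on the primary host.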
- Define cluster properties by setting the following environment variables. If you do not set the optional environment variables, the default values are used. If you previously set these environment variables (for example, during IBM Spectrum Symphony installation), setting them now overrides the previous settings.
To define cluster properties as environment variables, run the following commands corresponding to your shell:
- sh, ksh, or bash:
export VARIABLE_NAME=value
- csh or tcsh:
setenv VARIABLE_NAME value
To define cluster properties in a file, create a simple text file /tmp/install.config and enter each variable on a new line, in this format:
VARIABLE_NAME=value
Any variables set in the environment overwrite the same variables set in the configuration file.
Options and descriptions:
- INSTALL_MAPREDUCE
Specify Y to indicate that you want to install MapReduce. For example:
export INSTALL_MAPREDUCE=Y
- IBM_SPECTRUM_SYMPHONY_LICENSE_ACCEPT
Mandatory when you install without user interaction (that is, with the --quiet option), to indicate that you accept the license agreement before installing:
export IBM_SPECTRUM_SYMPHONY_LICENSE_ACCEPT=Y
- JAVA_HOME
Specify the installation directory of Oracle or IBM Java™. For example:
export JAVA_HOME=/usr/java/latest
Note: Your Java installation location is required for enabling the MapReduce framework. You can enable this framework by setting the JAVA_HOME environment variable before you install IBM Spectrum Symphony. If you install IBM Spectrum Symphony without setting the JAVA_HOME environment variable, you can do so later by defining JAVA_HOME in the $SOAM_HOME/mapreduce/conf/pmr-env.sh file.
- HADOOP_HOME
Specify the installation directory of the Hadoop distribution. For example:
export HADOOP_HOME=/opt/hadoop-version
- HADOOP_VERSION
Specify the version of the Hadoop distribution. For example:
export HADOOP_VERSION=version
- For an Ubuntu installation, install the RPM utility by running apt-get install rpm.
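If you prefer the configuration file over exported variables, the file can be generated rather than typed by hand. The sketch below assumes only what this step states: one VARIABLE_NAME=value pair per line. The function name write_install_config is hypothetical, and the JAVA_HOME value is the example value from this step, not a recommendation.

```shell
# Hypothetical helper: write VARIABLE_NAME=value pairs, one per line, to a file.
# Usage: write_install_config FILE NAME=value [NAME=value ...]
write_install_config() {
    config_file="$1"
    shift
    # Start from an empty file so stale settings do not survive.
    : > "$config_file" || return 1
    for pair in "$@"; do
        case "$pair" in
            [A-Z_]*=?*) echo "$pair" >> "$config_file" ;;
            *) echo "Malformed entry: $pair" >&2; return 1 ;;
        esac
    done
}

# Example: the path is the one named in this step.
write_install_config /tmp/install.config \
    INSTALL_MAPREDUCE=Y \
    JAVA_HOME=/usr/java/latest
```

Remember that any of these variables that are also set in the environment take precedence over the file.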
- Run the IBM Spectrum Symphony installer package. Use the same .bin file that you used to install IBM Spectrum Symphony. The installer detects and skips the IBM Spectrum Symphony packages that you already have installed, and installs only the MapReduce packages (soammrcore-version.architecture.rpm and soammrmgmt-version.noarch.rpm).
- If you installed IBM Spectrum Symphony with default settings, run:
sym-version_architecture.bin
If you are using the evaluation version, run:
symeval-version_architecture.bin
Note: Alternative method: If you must use .rpm files instead of .bin, extract the MapReduce .rpm files, and then install them. On management hosts, there are two .rpm files to install, and the order is important.
For example, first extract the .rpm files from the sym-version_architecture.bin package by running:
sym-version_architecture.bin --extract extract_directory
where extract_directory specifies the directory to extract .rpm files.
Next, install each MapReduce .rpm file:
- For management hosts:
rpm -ivh soammrcore-version.architecture.rpm
rpm -ivh soammrmgmt-version.noarch.rpm
- For compute hosts:
rpm -ivh soammrcore-version.architecture.rpm
Additional note for Ubuntu: To install the .rpm files on Ubuntu, you must also specify the --ignorearch parameter. For example:
- For management hosts:
rpm -ivh --ignorearch soammrcore-version.architecture.rpm
rpm -ivh --ignorearch soammrmgmt-version.noarch.rpm
- For compute hosts:
rpm -ivh --ignorearch soammrcore-version.architecture.rpm
- If you installed IBM Spectrum Symphony to a location other than the default, run:
sym-version_architecture.bin --prefix install_location --dbpath dbpath_location
If you are using the evaluation version, run:
symeval-version_architecture.bin --prefix install_location --dbpath dbpath_location
where:
- --prefix install_location specifies the absolute path to the installation directory. The --prefix parameter is optional. If you install without the --prefix option, IBM Spectrum Symphony is installed in its default directory: /opt/ibm/spectrumcomputing on Linux. Ensure that the path is set to a clean directory.
Use the same --prefix path as used for your IBM Spectrum Symphony installation.
- --dbpath dbpath_location sets the RPM database to a directory different from the default /var/lib/rpm. The --dbpath parameter is optional.
Use the same --dbpath path as used for your IBM Spectrum Symphony installation.
Note: Alternative method: If you must use .rpm files instead of .bin, extract the MapReduce .rpm files, and then install them. On management hosts, there are two .rpm files to install, and the order is important.
For example, first extract the .rpm files from the sym-version_architecture.bin package by running:
sym-version_architecture.bin --extract extract_directory
where extract_directory specifies the directory to extract .rpm files.
Next, install each MapReduce .rpm file:
- For management hosts:
rpm -ivh --prefix install_location --dbpath dbpath_location soammrcore-version.architecture.rpm
rpm -ivh --prefix install_location --dbpath dbpath_location soammrmgmt-version.noarch.rpm
- For compute hosts:
rpm -ivh --prefix install_location --dbpath dbpath_location soammrcore-version.architecture.rpm
Additional note for Ubuntu: To install the .rpm files on Ubuntu, you must also specify the --ignorearch parameter. For example:
- For management hosts:
rpm -ivh --ignorearch --prefix install_location --dbpath dbpath_location soammrcore-version.architecture.rpm
rpm -ivh --ignorearch --prefix install_location --dbpath dbpath_location soammrmgmt-version.noarch.rpm
- For compute hosts:
rpm -ivh --ignorearch --prefix install_location --dbpath dbpath_location soammrcore-version.architecture.rpm
- If you installed IBM Spectrum Symphony without user interaction:
Remember: Before installing with the --quiet option, ensure that you have set the IBM_SPECTRUM_SYMPHONY_LICENSE_ACCEPT environment variable to Y or y.
Run:
sym-version_architecture.bin --quiet
If you are using the evaluation version, run:
symeval-version_architecture.bin --quiet
where --quiet suppresses prompts during installation.
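The three installer variants above differ only in their options, so a wrapper can assemble the command line and refuse a quiet installation when the license variable is not set. This is a sketch under stated assumptions: build_install_cmd is a hypothetical name, the installer path is whatever .bin file you used originally, and combining --prefix and --dbpath with --quiet in one invocation is an assumption (the text above shows them separately). The function only prints the command; it does not run the installer.

```shell
# Hypothetical helper: print the installer command line for a given setup.
# Usage: build_install_cmd INSTALLER [PREFIX] [DBPATH] [quiet]
# Pass "" for PREFIX or DBPATH to omit that option.
build_install_cmd() {
    installer="$1"; prefix="${2:-}"; dbpath="${3:-}"; quiet="${4:-}"
    cmd="$installer"
    if [ -n "$prefix" ]; then cmd="$cmd --prefix $prefix"; fi
    if [ -n "$dbpath" ]; then cmd="$cmd --dbpath $dbpath"; fi
    if [ "$quiet" = "quiet" ]; then
        # A quiet installation requires the license variable to be Y or y.
        case "${IBM_SPECTRUM_SYMPHONY_LICENSE_ACCEPT:-}" in
            Y|y) cmd="$cmd --quiet" ;;
            *) echo "Set IBM_SPECTRUM_SYMPHONY_LICENSE_ACCEPT=Y before using --quiet" >&2
               return 1 ;;
        esac
    fi
    echo "$cmd"
}
```

Printing the command first lets you confirm that the --prefix and --dbpath values match your original IBM Spectrum Symphony installation before you run it.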
- After MapReduce installation is complete on all applicable hosts, complete the configuration steps on the primary host:
- Log on to the primary host and source the environment:
- If you are running Linux by using bash, run:
. $EGO_TOP/profile.platform
- If you are running Linux by using csh, run:
source $EGO_TOP/cshrc.platform
- If you configured a shared directory for IBM Spectrum Symphony failover, specify the shared directory:
egoconfig mghost shared_location
- Specify the host name and port number for the HDFS NameNode in the $EGO_CONFDIR/wsm.conf file by editing the file to specify the DFS_GUI_HOSTNAME and DFS_GUI_PORT parameters. For example:
DFS_GUI_HOSTNAME=namenodehost.mydomain.com
DFS_GUI_PORT=50070
- Create resource groups:
This step is to create non-default resource groups required for MapReduce. Note also that there are predefined resource groups, such as the ManagementHosts and ComputeHosts resource groups, which you will use in the next step.
- Log on to the cluster management console as the cluster administrator (such as Admin).
- Select Resources > Resource Planning (Slot) > Resource Groups.
- Click Global Actions > Create a Resource Group and create these groups:
MapReduceInternalResourceGroup
NameNodeRG
SecondaryNodeRG
DataNodeRG
- Create consumers, and associate them with the resource groups:
- Locate the following section in the $EGO_CONFDIR/../../gui/conf/pmcconf/pmc_conf_ego.xml file, and remove the ;MapReduceInternalResourceGroup;DataNodeRG;NameNodeRG;SecondaryNodeRG; text from the <Value> ... </Value> element:
<Parameter>
  <Name>hideInternalResourceGroup</Name>
  <!-- This parameter allows the user to define which resource group should be hidden on consumer properties page. -->
  <!-- By default, not define this parameter, it will hide InternalResourceGroup, MapReduceInternalResourceGroup, DataNodeRG, -->
  <!-- NameNodeRG and SecondaryNodeRG internal resource groups. Resource group names must be separated by a semicolon (";"). -->
  <Value>InternalResourceGroup;MapReduceInternalResourceGroup;DataNodeRG;NameNodeRG;SecondaryNodeRG;</Value>
</Parameter>
so that the <Value> ... </Value> element in that section is as follows:
<Value>InternalResourceGroup</Value>
- Stop and restart the WEBGUI service:
egosh service stop WEBGUI
egosh service start WEBGUI
- Create consumers for MapReduce: from the cluster management console, select Resources > Consumers, locate and click the tree level to which you want to add a consumer, and then click Global Actions > Create a Consumer. When you create the consumer, you specify a resource group for it. Create these consumers and associate them with your resource groups, as follows:
Consumer                                      Resource group
/MapReduceConsumer                            ManagementHosts and ComputeHosts
/MapReduceConsumer/MapReduce732               ManagementHosts and ComputeHosts
/ComputeServices                              MapReduceInternalResourceGroup
/ComputeServices/MapreduceComputeServices     MapReduceInternalResourceGroup
/HDFS                                         DataNodeRG, SecondaryNodeRG, and NameNodeRG
/HDFS/NameNodeConsumer                        NameNodeRG
/HDFS/SecondaryNodeConsumer                   SecondaryNodeRG
/HDFS/DataNodeConsumer                        DataNodeRG
For detailed steps, see Creating a consumer and Modifying consumer properties.
- Register the MRSS (MapReduce EGO shuffle service) under the /ComputeServices/MapreduceComputeServices consumer:
- From the cluster management console, select System & Services > EGO Services > Service Profiles.
- In the consumer tree, navigate to the /ComputeServices/MapreduceComputeServices consumer.
- Click Global Actions > Register a new service.
- Register the MRSS: click Import > Browse, navigate to the $EGO_TOP/.install/sym_advanced_edition/mrss.xml file, and click Register.
- Register the MapReduce application:
soamreg $EGO_TOP/.install/sym_advanced_edition/MapReduce7.3.2.xml
- Enable data loading and data purging for MapReduce by copying the following files from the $EGO_TOP/.install/sym_advanced_edition/ directory to the $EGO_CONFDIR/../../perf/conf/ directory as follows:
cp $EGO_TOP/.install/sym_advanced_edition/plc_pmr.xml $EGO_CONFDIR/../../perf/conf/plc
cp $EGO_TOP/.install/sym_advanced_edition/purger_pmr.xml $EGO_CONFDIR/../../perf/conf/purger
cp $EGO_TOP/.install/sym_advanced_edition/pmrresourcemetrics.xml $EGO_CONFDIR/../../perf/conf/dataloader
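The file-based parts of this configuration step (the wsm.conf parameters and the three performance configuration copies) lend themselves to scripting. The sketch below is illustrative only: set_conf_param and copy_pmr_configs are hypothetical helper names, and it assumes that wsm.conf holds plain KEY=value lines as in the example above and that the plc, purger, and dataloader targets already exist as directories.

```shell
# Hypothetical helper: set KEY=value in a plain KEY=value file such as wsm.conf.
# Any existing assignment of KEY is replaced; all other lines are kept.
set_conf_param() {
    file="$1"; key="$2"; value="$3"
    [ -f "$file" ] || { echo "No such file: $file" >&2; return 1; }
    tmp="$file.tmp.$$"
    # grep -v exits non-zero when it selects no lines, which is not an error here.
    grep -v "^$key=" "$file" > "$tmp" || true
    echo "$key=$value" >> "$tmp"
    # Write back in place so the file keeps its ownership and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Hypothetical helper: copy the three MapReduce files named in this step.
# SRC is the sym_advanced_edition directory; DST is the perf/conf directory.
copy_pmr_configs() {
    src="$1"; dst="$2"
    cp "$src/plc_pmr.xml" "$dst/plc" &&
    cp "$src/purger_pmr.xml" "$dst/purger" &&
    cp "$src/pmrresourcemetrics.xml" "$dst/dataloader"
}

# Example calls, using the values shown in this step:
# set_conf_param "$EGO_CONFDIR/wsm.conf" DFS_GUI_HOSTNAME namenodehost.mydomain.com
# set_conf_param "$EGO_CONFDIR/wsm.conf" DFS_GUI_PORT 50070
# copy_pmr_configs "$EGO_TOP/.install/sym_advanced_edition" "$EGO_CONFDIR/../../perf/conf"
```

Replacing an existing assignment instead of appending a second one makes the helper safe to run more than once.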
- Restart the cluster:
egosh service stop all
egosh ego shutdown
egosh ego start
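Because the restart consists of three commands that must run in order, it can help to stop at the first failure instead of continuing. The following is a minimal sketch; restart_cluster and the DRY_RUN variable are hypothetical names introduced for illustration, and the egosh commands are exactly the three shown above. It assumes an sh-compatible shell with the environment already sourced.

```shell
# Hypothetical helper: run the restart sequence in order, stopping on failure.
# Set DRY_RUN=1 to print the commands without running them.
restart_cluster() {
    for cmd in "egosh service stop all" "egosh ego shutdown" "egosh ego start"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # $cmd is intentionally unquoted so that it splits into words.
            $cmd || { echo "Failed: $cmd" >&2; return 1; }
        fi
    done
}
```

Running it once with DRY_RUN=1 shows the exact sequence before anything is stopped.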