Hortonworks Data Platform MapReduce and YARN
Follow these steps to integrate the IBM® Spectrum Symphony MapReduce and YARN engine with Hortonworks Data Platform (HDP) 2.3.
Before you begin
- HDP 2.3 is installed and running.
- IBM Spectrum Symphony is installed and running.
Procedure
- Shut down the IBM Spectrum Symphony cluster.
- Stop all HDP components (services) from the Ambari console.
- On all management hosts, replace the ExecutionUser with the HDP YARN service user, which is yarn by default, in the following files:
$EGO_CONFDIR/ConsumerTrees.xml:
<Consumer ConsumerName="NodeManagerConsumer">
  <ConsumerProperties>
    <ExecutionUser>yarn</ExecutionUser>
  </ConsumerProperties>
</Consumer>
$EGO_ESRVDIR/esc/conf/services/SymphonyYARN.xml:
<ego:ExecutionUser>yarn</ego:ExecutionUser>
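To verify the change on each management host, you can check both files for the updated user; this quick check assumes the default file locations shown above:
grep -n ExecutionUser $EGO_CONFDIR/ConsumerTrees.xml $EGO_ESRVDIR/esc/conf/services/SymphonyYARN.xml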
- Replace the WEBGUI service port with a free port number; the default port 8443 may already be occupied by the Ambari server if both happen to run on the same host. For example, in $EGO_CONFDIR/../../gui/conf/server_gui.xml, replace port number 8443:
<httpEndpoint host="*" httpPort="-1" httpsPort="8443" id="defaultHttpEndpoint"/>
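For example, the edited entry might look like the following; the value 18443 is only an illustrative choice, so use any port that is free on the WEBGUI host:
<httpEndpoint host="*" httpPort="-1" httpsPort="18443" id="defaultHttpEndpoint"/>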
- Log on to each host as root and perform the following steps:
- Set the HDP_HOME environment variable to point to the
HDP installation directory. For example:
export HDP_HOME=/usr/hdp/2.3.0.0-2041/
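If the installed HDP version directory differs on your hosts, you can list the available versions before setting the variable; the directory name 2.3.0.0-2041 above is only an example:
# ls -d /usr/hdp/2.3.*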
- Source the IBM Spectrum Symphony environment:
# source $EGO_TOP/profile.platform
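After sourcing the profile, environment variables used later in this procedure, such as EGO_TOP and PMR_HOME, should be defined; assuming the profile sets both, a quick check is:
# echo $EGO_TOP $PMR_HOME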
- Run jar_integration.sh on each management host and compute host where IBM Spectrum Symphony is installed. For example:
# $EGO_TOP/soam/mapreduce/integration/IBM_HDP_2.3/jar_integration.sh
If the YARN service user name is a name other than yarn, you must specify the user name as a parameter when you run the integration script:
# $EGO_TOP/soam/mapreduce/integration/IBM_HDP_2.3/jar_integration.sh -yarn yarn service user
where yarn service user maps to the user in step 3. The default user name is yarn.
The jar_integration.sh script performs the following actions:
- Creates IBM-pmr-hadoop-*.jar files and replaces the following jar files under the HDP installation directory. The original jar files are backed up as *.HDP.ORIG.
- hadoop-annotations-2.6.0.2.3.0.0-2041.jar
- hadoop-mapreduce-client-core-2.6.0.2.3.0.0-2041.jar
- Adds pmr-site.xml to the following Hadoop configuration directories:
- /etc/hadoop/conf
- /usr/hdp/current/oozie-server/conf/
- Creates IBM-yarn-hadoop-*.jar and SymYarn.jar files, and replaces the following jar files under the HDP installation directory. The original jar files are backed up as *.HDP.ORIG.
- yarn-common-2.6.0.2.3.0.0-2041.jar
- yarn-server-common-2.6.0.2.3.0.0-2041.jar
- yarn-server-nodemanager-2.6.0.2.3.0.0-2041.jar
- yarn-server-resourcemanager-2.6.0.2.3.0.0-2041.jar
- Creates a default multi-dimensional resource plan, and consumers and resource groups for YARN.
- Replaces HDP YARN launcher scripts.
- Ensures that OS libraries are up-to-date.
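After the script completes, you can spot-check the results; the find path below assumes HDP_HOME is still set from the earlier step, and the pmr-site.xml locations are the configuration directories listed above:
# find $HDP_HOME -name "*.HDP.ORIG"
# ls -l /etc/hadoop/conf/pmr-site.xml /usr/hdp/current/oozie-server/conf/pmr-site.xml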
- Enable IBM Spectrum Symphony MapReduce to run on a Kerberos-enabled HDP 2.3 cluster. (This step applies only to Kerberos-enabled HDP 2.3 clusters.)
- Ensure all hosts have access to the keytab file containing the principal used to access the HDP cluster.
- Run the Kerberos integration script enableKerberosPMR4HDP.sh:
# $PMR_HOME/integration/IBM_HDP_2.3/enableKerberosPMR4HDP.sh --appname MapReduce7.3.2 --principal nameNodeConsumer/clusterName@IBM.COM --keytab /etc/conf/keytab.dummy --kinitdir /usr/bin
where:
- appname
- Specifies the name of the MapReduce application.
- principal
- Specifies the Kerberos principal used to submit MapReduce sessions.
- keytab
- Specifies the location of the keytab for the principal.
- kinitdir
- Specifies the location of the kinit binary. By default, the location is the /usr/bin/ directory.
Note: The enableKerberosPMR4HDP.sh script overwrites $EGO_CONFDIR/sec_ego_kerberos.conf.
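To confirm that the principal and keytab are usable from a cluster host, for example when troubleshooting Kerberos authentication failures, you can request a ticket manually; this uses the example principal and keytab path from the command above:
# /usr/bin/kinit -kt /etc/conf/keytab.dummy nameNodeConsumer/clusterName@IBM.COM
# klist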
- Start the IBM Spectrum Symphony cluster.
- Start all services from the Ambari console.
- Submit MapReduce jobs using either the hadoop or mrsh command. For example:
mrsh jar $PMR_HOME/7.3.2/linux2.6-glibc2.3-x86_64/samples/hadoop-mapreduce-client-jobclient-2.4.1-tests.jar sleep -mt 1 -rt 1 -m 1000 -r 1
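An equivalent submission with the hadoop command would look like the following; this assumes the hadoop client on the host picks up the pmr-site.xml configuration added by the integration script:
hadoop jar $PMR_HOME/7.3.2/linux2.6-glibc2.3-x86_64/samples/hadoop-mapreduce-client-jobclient-2.4.1-tests.jar sleep -mt 1 -rt 1 -m 1000 -r 1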
- Submit YARN jobs using either the yarn or symyarn command. For example:
yarn jar /usr/hdp/2.3.0.0-2041/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.6.0.2.3.0.0-2041.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar /usr/hdp/2.3.0.0-2041/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.6.0.2.3.0.0-2041.jar --shell_command date --num_containers 1
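To verify that the distributed shell application was accepted and completed, you can list applications with the standard YARN client; the exact output columns depend on your Hadoop version:
yarn application -list -appStates ALL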