Hortonworks Data Platform MapReduce and YARN

Follow these steps to integrate the IBM® Spectrum Symphony MapReduce and YARN engine with Hortonworks Data Platform (HDP) 2.3.

Before you begin

HDP 2.3 is installed and running.
IBM Spectrum Symphony is installed and running.

Procedure

Shut down the IBM Spectrum Symphony cluster.
Stop all HDP components (services) from the Ambari console.

On all management hosts, set the ExecutionUser to the HDP YARN service user (yarn by default) in the following files:

$EGO_CONFDIR/ConsumerTrees.xml:

<Consumer ConsumerName="NodeManagerConsumer">
    <ConsumerProperties>
        <ExecutionUser>yarn</ExecutionUser>
    </ConsumerProperties>
</Consumer>

$EGO_ESRVDIR/esc/conf/services/SymphonyYARN.xml:

<ego:ExecutionUser>yarn</ego:ExecutionUser>
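The two edits above can be scripted. The following is a minimal sketch, not an IBM-supplied tool: the function names set_consumer_execution_user and set_service_execution_user are hypothetical, and the sketch assumes that each ExecutionUser element sits on a single line and that the NodeManagerConsumer block contains no nested Consumer elements before its ExecutionUser. Each function keeps a .bak copy of the file it changes.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with IBM Spectrum Symphony).
# Assumption: each ExecutionUser element is on one line, as in the snippets above.

# Set ExecutionUser only inside the NodeManagerConsumer block of ConsumerTrees.xml.
set_consumer_execution_user() {
    file=$1
    user=$2
    cp "$file" "$file.bak" || return 1
    # Restrict the substitution to the lines between the opening
    # NodeManagerConsumer tag and the next closing </Consumer> tag.
    sed '/<Consumer ConsumerName="NodeManagerConsumer">/,/<\/Consumer>/ s|<ExecutionUser>[^<]*</ExecutionUser>|<ExecutionUser>'"$user"'</ExecutionUser>|' \
        "$file.bak" > "$file"
}

# Set ego:ExecutionUser in SymphonyYARN.xml.
set_service_execution_user() {
    file=$1
    user=$2
    cp "$file" "$file.bak" || return 1
    sed 's|<ego:ExecutionUser>[^<]*</ego:ExecutionUser>|<ego:ExecutionUser>'"$user"'</ego:ExecutionUser>|' \
        "$file.bak" > "$file"
}

# Example usage on a management host, after sourcing the Symphony environment:
#   set_consumer_execution_user "$EGO_CONFDIR/ConsumerTrees.xml" yarn
#   set_service_execution_user "$EGO_ESRVDIR/esc/conf/services/SymphonyYARN.xml" yarn
```

Review the changed files against their .bak copies before you continue, because other consumers in ConsumerTrees.xml must keep their own execution users.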

Change the WEBGUI service port to a free port number. The default port, 8443, might already be in use by the Ambari server if both run on the same host. For example, in $EGO_CONFDIR/../../gui/conf/server_gui.xml, replace port number 8443:
```
<httpEndpoint host="*" httpPort="-1" httpsPort="8443" id="defaultHttpEndpoint"/>
```
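As a rough sketch of how the port change could be scripted, the helpers below read and rewrite the httpsPort attribute. The function names get_https_port and set_https_port are hypothetical, the port 8444 in the usage comment is only an arbitrary example, and the sketch assumes that the httpEndpoint element is on one line as shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with IBM Spectrum Symphony).
# Assumption: the httpEndpoint element and its httpsPort attribute are on one line.

# Print the current httpsPort value from server_gui.xml.
get_https_port() {
    sed -n 's/.*<httpEndpoint[^>]*httpsPort="\([0-9-]*\)".*/\1/p' "$1"
}

# Rewrite httpsPort on the httpEndpoint line, keeping a .bak copy of the file.
set_https_port() {
    file=$1
    port=$2
    cp "$file" "$file.bak" || return 1
    sed '/<httpEndpoint/ s/httpsPort="[^"]*"/httpsPort="'"$port"'"/' "$file.bak" > "$file"
}

# Example usage (8444 is an arbitrary example; pick a port that is free on your host):
#   get_https_port "$EGO_CONFDIR/../../gui/conf/server_gui.xml"
#   set_https_port "$EGO_CONFDIR/../../gui/conf/server_gui.xml" 8444
```

Only the httpsPort attribute is changed; the httpPort attribute on the same line is left as it is.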
Log on to each host as root and perform the following steps:
1. Set the HDP_HOME environment variable to point to the HDP installation directory. For example:
  
  export HDP_HOME=/usr/hdp/2.3.0.0-2041/
2. Source the IBM Spectrum Symphony environment:
  
  # source $EGO_TOP/profile.platform
3. Run jar_integration.sh on each management host and compute host where IBM Spectrum Symphony is installed. For example:
  
  # $EGO_TOP/soam/mapreduce/integration/IBM_HDP_2.3/jar_integration.sh
  
  If the YARN service user name is not yarn, specify the user name as a parameter when you run the integration script:
  
  # $EGO_TOP/soam/mapreduce/integration/IBM_HDP_2.3/jar_integration.sh -yarn yarn service user
  
  where yarn service user is the YARN service user that you set as the ExecutionUser in step 3. The default user name is yarn.
  The jar_integration.sh script performs the following actions:
  1. Creates IBM-pmr-hadoop-*.jar files and replaces the following jar files under the HDP installation directory. The original jar files are backed up as *.HDP.ORIG.
    
    hadoop-annotations-2.6.0.2.3.0.0-2041.jar
    
    hadoop-mapreduce-client-core-2.6.0.2.3.0.0-2041.jar
  2. Adds pmr-site.xml to the following Hadoop configuration directories:
    
    /etc/hadoop/conf
    
    /usr/hdp/current/oozie-server/conf/
  3. Creates IBM-yarn-hadoop-*.jar and SymYarn.jar files, and replaces the following jar files under the HDP installation directory. The original jar files are backed up as *.HDP.ORIG.
    
    yarn-common-2.6.0.2.3.0.0-2041.jar
    
    yarn-server-common-2.6.0.2.3.0.0-2041.jar
    
    yarn-server-nodemanager-2.6.0.2.3.0.0-2041.jar
    
    yarn-server-resourcemanager-2.6.0.2.3.0.0-2041.jar
  4. Creates a default multi-dimensional resource plan, and consumers and resource groups for YARN.
  5. Replaces HDP YARN launcher scripts.
  6. Ensures that OS libraries are up-to-date.
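One way to spot-check the result on a host is to look for the artifacts that the script is described as leaving behind: the *.HDP.ORIG backups and the pmr-site.xml files. The following is a minimal sketch; list_hdp_backups and has_pmr_site are hypothetical helper names, and the check says nothing about whether the replacement jar files themselves are correct.

```shell
#!/bin/sh
# Hypothetical helpers for a quick post-integration check.

# List the original jar files that were backed up as *.HDP.ORIG under a directory.
list_hdp_backups() {
    find "$1" -type f -name '*.HDP.ORIG' | sort
}

# Succeed if pmr-site.xml exists in the given Hadoop configuration directory.
has_pmr_site() {
    [ -f "$1/pmr-site.xml" ]
}

# Example usage, with HDP_HOME set as in the step above:
#   list_hdp_backups "$HDP_HOME"
#   has_pmr_site /etc/hadoop/conf && echo "pmr-site.xml found"
#   has_pmr_site /usr/hdp/current/oozie-server/conf && echo "pmr-site.xml found"
```

An empty list from list_hdp_backups suggests that the script did not run against that directory.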
Enable IBM Spectrum Symphony MapReduce to run on a Kerberos-enabled HDP 2.3 cluster. (Skip this step if the cluster is not Kerberos-enabled.)
1. Ensure all hosts have access to the keytab file containing the principal used to access the HDP cluster.
2. Run the Kerberos integration script enableKerberosPMR4HDP.sh:
  
  # $PMR_HOME/integration/IBM_HDP_2.3/enableKerberosPMR4HDP.sh --appname MapReduce7.3.2 --principal nameNodeConsumer/clusterName@IBM.COM --keytab /etc/conf/keytab.dummy --kinitdir /usr/bin
  
  where:
  
  appname
  
  Specifies the name of the MapReduce application.
  
  principal
  
  Specifies the Kerberos principal used to submit MapReduce sessions.
  
  keytab
  
  Specifies the location of the keytab for the principal.
  
  kinitdir
  
  Specifies the location of the kinit binary. By default, the location is the /usr/bin/ directory.
  
  Note: The enableKerberosPMR4HDP.sh script overwrites $EGO_CONFDIR/sec_ego_kerberos.conf.
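Before you run the Kerberos integration script, it can help to confirm on each host that the keytab is readable and that kinit exists in the directory you plan to pass as --kinitdir. The following is a minimal sketch of such a pre-flight check; check_kerberos_inputs is a hypothetical helper, not part of the product, and it does not validate the principal or the keytab contents.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the inputs of enableKerberosPMR4HDP.sh.
# Usage: check_kerberos_inputs KEYTAB [KINITDIR]   (KINITDIR defaults to /usr/bin)
check_kerberos_inputs() {
    keytab=$1
    kinitdir=${2:-/usr/bin}
    if [ ! -r "$keytab" ]; then
        echo "keytab not readable: $keytab" >&2
        return 1
    fi
    if [ ! -x "$kinitdir/kinit" ]; then
        echo "kinit not found in: $kinitdir" >&2
        return 1
    fi
    return 0
}

# Example usage, with the values from the command above:
#   check_kerberos_inputs /etc/conf/keytab.dummy /usr/bin
# Because the integration script overwrites sec_ego_kerberos.conf, you may also
# want to keep a copy first:
#   cp "$EGO_CONFDIR/sec_ego_kerberos.conf" "$EGO_CONFDIR/sec_ego_kerberos.conf.bak"
```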
Start the IBM Spectrum Symphony cluster.
Start all services from the Ambari console.

Submit MapReduce jobs using either the hadoop or mrsh command. For example:

mrsh jar $PMR_HOME/7.3.2/linux2.6-glibc2.3-x86_64/samples/hadoop-mapreduce-client-jobclient-2.4.1-tests.jar sleep -mt 1 -rt 1 -m 1000 -r 1

Submit YARN jobs using either the yarn or symyarn command. For example:

yarn jar /usr/hdp/2.3.0.0-2041/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.6.0.2.2.0.0-2041.jar org.apache.hadoop.yarn.applications.distributedshell.Client --jar /usr/hdp/2.3.0.0-2041/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.6.0.2.2.0.0-2041.jar --shell_command date --num_containers 1