Digital Business Start trial offer for IBM Power Enterprise Systems with Hortonworks Data Platform

A quick start guide

The Digital Business Start for IBM® Power® Enterprise Servers with Hortonworks Data Platform (HDP) trial offer is tailored for customers who already have IBM Power System E850C, E870C, E880C, E850, E870, or E880 servers (based on IBM POWER8® processor-based technology) in their environment and who want to enable inactive cores and memory to quickly deploy an HDP cluster through the Ambari interface. Installing and running a standard example workload (in this case, TeraSort) verifies the HDP cluster environment.

This quick start guide is designed for POWER8 administrators who already have a working knowledge of IBM POWER® processor-based servers, but not necessarily of configuring and installing Linux® in their environments. Taking advantage of this free trial offer allows experimentation with one of the latest open source operating systems and analytics platforms available today: Hortonworks Data Platform running on Red Hat Enterprise Linux 7.2 (little endian).

IBM Power Virtualization Center (IBM PowerVC) is an advanced virtualization and cloud management offering for IBM Power Systems™ servers based on OpenStack technology. IBM PowerVC greatly reduces virtual machine management effort with simple installation instructions and base templates for both compute and storage resources, enabling fast deployment of completely installed and configured virtual machines (VMs). PowerVC can be a great enhancement to any environment, especially environments that require deployment of many VMs. Although PowerVC is a useful tool, it is not required for the configurations in this offering.

Customer use cases

The following section contains sample use case scenarios that can aid in deciding where to start and what to deploy.

Customer use case 1

IBM Power user who is proficient with PowerVC and IBM PowerVM®, and who has Linux images (RHEL 7.2 LE) imported into PowerVC:

  1. Contact your local IBM sales representative to request the free trial resource enablement codes.
  2. Deploy a Linux VM using one of the configurations from Table 2.
  3. Install the prerequisites.
  4. Install Ambari and deploy the HDP cluster (refer to the sections, Install and configure the Ambari server and Install and configure HDP).
  5. Run the TeraSort workload.

Customer use case 2

IBM Power user who is proficient with PowerVC and PowerVM, but who may not be familiar with Linux and does not currently have Linux images imported:

  1. Contact your local IBM sales representative to request the free trial resource enablement codes.
  2. Create a logical partition (LPAR) using one of the configurations from Table 2.
  3. Install the Linux OS: RHEL 7.2 LE.
  4. Import the Linux image into PowerVC.
  5. Deploy a Linux VM LPAR using one of the configurations from Table 2.
  6. Install the prerequisites.
  7. Install Ambari and deploy the HDP cluster (refer to the sections, Install and configure the Ambari server and Install and configure HDP).
  8. Run the TeraSort workload.

Customer use case 3

IBM Power user who uses PowerVM, with or without Linux experience:

  1. Contact your local IBM sales representative to request the free trial resource enablement codes.
  2. Create an LPAR with one of the configurations from Table 2.
  3. Install the Linux OS: RHEL 7.2 LE.
  4. Install the prerequisites.
  5. Install Ambari and deploy the HDP cluster (refer to the sections, Install and configure the Ambari server and Install and configure HDP).
  6. Run the TeraSort workload.
Figure 1. Process flowchart
Table 1. Use cases and approximate time to complete use cases

Use case     PowerVM   PowerVC   PowerVC and Linux image   Estimated time to install and be up and running
Use case 1   Yes       Yes       Yes                       90-120 min
Use case 2   Yes       Yes       No                        3-5 hr
Use case 3   Yes       No        No                        3-5 hr

Note: Time estimates vary heavily with experience.

1. Get started

This section explains the basic prerequisites for this program. It describes the hardware requirements, software licenses, and the process for enabling Trial Capacity on Demand (CoD) resources.

Trial Capacity on Demand

Trial CoD is provided when you make a request for trial capacity enablement, typically through the CoD project office.

For this special offer, the process has been simplified to extend the typical 30-day trial period to up to three months. If you are interested in taking advantage of this free trial offer, contact your local IBM representative or IBM Business Partner and request that an HDP Proof of Concept Authorization form be submitted on your behalf. Depending on your role, you can download the respective form.

A standard trial capacity request grants eight processor core activations and 64 GB of memory (provided those resources are available). You can also make an exception request for all processors, all memory, or both to be activated. An exception request can typically be made only once over the life of the machine. These CoD requests are typically good for 30 days. With this offer, however, extensions are possible for up to three months in increments of 30 days each. Talk to your sales representative for more information. After the request is made, it might take up to three business days to process. The code will be sent to the email address you provide and will also be posted to the CoD website.

When you receive the CoD code, you need to activate it by following the steps outlined in the following PDF: IBM Capacity on Demand. Note that a Hardware Management Console (HMC) is required to administer the CoD offering. Additional requirements and assumptions are listed in the next section.

After you initialize the CoD code, the trial period is available for 30 power-on days. The trial period only advances while the server is powered on. Any time remaining from a previously installed CoD trial license will not be added to the 30 days of the new trial offer. For this special offer, up to three months of usage will be issued in 30-day trial code entries.

System requirements and assumptions

Basic system requirements and Integrated Facility for Linux (IFL) configurations are listed in this section.

  • An HMC is required to activate the trial CoD license code and for creating Virtual I/O Server (VIOS) instances.
  • IBM PowerVM is required for creating VIOS instances.
  • Hardware:
    • Power E850, E870, E880, E850C, E870C, or E880C servers
    • Network access for VM
    • If you're using IBM PowerVC, you'll need to use virtual storage and a virtual network for all VMs
Table 2. Core memory activation IFL configurations with suggested disk sizes for RHEL 7.2 LE

Configuration   Node cluster                    CPU (per node)   VCPU (per node)   Memory (GB, per node)   Storage (FC-NPIV) for OS and HDFS (per node)   Comments
HDP-small       4 nodes: 1 master, 3 workers    1                1                 8                       100 GB                                         Ambari installed on the master node
HDP-medium      4 nodes: 1 master, 3 workers    2                2                 16                      100 GB                                         Ambari installed on the master node
HDP-large       4 nodes: 1 master, 3 workers    4                4                 32                      500 GB                                         Ambari installed on the master node

PowerVM – creating a Linux VM

PowerVM is required for VIOS, which in turn is required for Virtual I/O LPARs. Virtual I/O LPARs, also known as virtual machines (VMs), should be created through the HMC following standard procedures. It is assumed that the user is familiar with VM provisioning through PowerVM on POWER hardware.

If PowerVC is being used in your environment and has a Linux image imported to it, deploy a VM with the desired resources listed in Table 2 and go to the section, Prepare system for HDP installation.

For more information about configuring a PowerVM VM, refer to the IBM Redbooks® publications for PowerVM.

2. Install Red Hat Enterprise Linux 7.2 LE

This section describes how to install Red Hat Enterprise Linux 7.2 LE.

Obtain a Red Hat Enterprise Linux Server license (required)

If needed, you can get a free 60-day evaluation license by following the directions at this link: http://ibm.biz/BdswD4. Make sure to request a license for the optional and supplementary repository paths as well as the base installation repository paths that come with your Red Hat Network (RHN) subscription.

Note: For Red Hat Enterprise Linux 7.x, some of the prerequisites for installing PowerVC and cloud-init have moved from the Red Hat Enterprise Linux OS install media to an optional software channel that can be accessed by using your RHN subscription.

Register and subscribe the system with RHN:
https://access.redhat.com/documentation/en-US/Red_Hat_Subscription_Management/1/html/RHSM/registering-cmd.html
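
A minimal sketch of the registration and repository enablement follows, assuming a standard RHN subscription; the repository IDs shown are typical for RHEL 7 on POWER little endian but can differ by subscription type, so confirm them with subscription-manager repos --list.

# subscription-manager register --username <rhn_user> --password <rhn_password>
# subscription-manager attach --auto
# subscription-manager repos --enable=rhel-7-for-power-le-rpms \
    --enable=rhel-7-for-power-le-optional-rpms \
    --enable=rhel-7-for-power-le-supplementary-rpms
# yum repolist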

Install Red Hat Enterprise Linux 7.2 LE

After your Red Hat subscription is enabled, download the RHEL 7.2 LE image to an existing Network Installation Management (NIM) server, a DVD, or a flash drive, and use the SMS menu to point to the install device to begin installation.

For detailed Red Hat Enterprise Linux installation instructions, see the documentation found on the Red Hat Customer Portal, under Getting Started.

Install the required additional packages for RHEL

After installation, bring the system onto the network and install Reliable Scalable Cluster Technology (RSCT) so that it can be managed properly from the HMC and PowerVC.

  • Verify that the network is configured with external access available.

    If the network is not configured, use nmtui to open the NetworkManager text interface and edit the connection. Then run service network restart to restart the network service.

  • Verify the RSCT packages.

    RSCT packages might be required for Linux VMs to work properly with some HMCs. Failure to install the RSCT packages may result in the HMC reporting No RMC Connection for that VM. If RMC cannot connect to the HMC, PowerVC will not be able to import the VM or its RHEL 7.2 LE image.

    A prerequisite for the RSCT packages is ksh.

    You can find the RSCT packages at: http://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/redhat/hmcmanaged/rhel7.html

    Download the following LE rpm packages and install them:

    • SRC 3.2.2
    • RSCT basic
    • RSCT core
    • RSCT utilities

  • Install ksh and the RSCT file sets using yum.
    # yum install ksh -y
    # yum --nogpgcheck localinstall src-3.2.2.0-16265.ppc64le.rpm
    # yum --nogpgcheck localinstall rsct.basic-3.2.2.0-16265.ppc64le.rpm \
      rsct.core-3.2.2.0-16265.ppc64le.rpm rsct.core.utils-3.2.2.0-16265.ppc64le.rpm

Disable SELinux

HDP and PowerVC require SELinux to be either in permissive mode or disabled.

To set SELinux to permissive mode (access attempts that violate the SELinux policy are logged, but not prevented), set SELINUX=permissive in the /etc/selinux/config file. To disable SELinux entirely, as done by the following command, set SELINUX=disabled.

        sed -i "s/^SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config
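
Changing /etc/selinux/config takes effect only after a reboot. As a quick check, and to switch the running system to permissive mode immediately, the standard SELinux utilities can be used (a minimal sketch):

# setenforce 0     (switch the running system to permissive mode)
# getenforce       (prints Enforcing, Permissive, or Disabled)
# sestatus         (shows the current mode and the mode configured in /etc/selinux/config)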

3. Install PowerVC and manage the Linux VM

IBM PowerVC is an advanced virtualization and cloud management tool that runs on its own host without consuming many resources. PowerVC can be installed on a server with just two vCPUs and 10 GB of memory.

If you already use PowerVC in an IBM AIX® or Linux environment, then you can skip this section and deploy a Linux VM using the templates that most closely match the resources listed in Table 2.

If you have PowerVC managing your non-Linux VM environment, then you may want to move to the section, Prepare the RHEL 7.2 LE VM for capture by installing cloud-init.

If you have never used PowerVC and would like to give it a try, continue to the section, Install PowerVC.

To expand your knowledge of PowerVC, or to answer specific questions not covered here, visit the IBM PowerVC – Virtualization Center.

For more detailed step-by-step guidance covering all PowerVC Linux tasks, see Chapter 4 of IBM PowerVC Version 1.3.2 Introduction and Configuration Redbook.

Install PowerVC

Appendix A in the Digital Business Start trial offer for IBM Power Enterprise Systems using MongoDB article provides detailed steps for installing PowerVC on RHEL 7.2 LE.

Table 3. PowerVC resource requirements (by number of managed VMs)

Resource             Minimum   Up to 400   401-1000   1001-2000   2001-3000   3001-5000
Processor capacity   1         2           4          8           8           12
Virtual CPUs         2         2           4          8           8           12
Memory (GB)          10        10          12         20          28          44
Swap space (GB)      10        10          12         20          28          44
Disk space (GB)      40        43          60         80          100         140

For more information about PowerVC resource requirements, see PowerVC hardware and software requirements in the IBM Knowledge Center.

Start using PowerVC

After PowerVC is installed, open a web browser to start using PowerVC to manage your system environment.

https://<ipaddress or hostname of PowerVC server>

The initial login ID is root, with the root password of the server on which PowerVC is installed.

Discover system components

See Chapter 4 of the IBM Redbooks, IBM PowerVC Version 1.3.2 Introduction and Configuration, for step-by-step instructions on discovering and configuring all the Linux environment resources.

Most of the configuration information is imported as you perform the discovery process for each resource. Follow the directions in the configuration guide, refer to the sections on adding the host (HMC), storage, and network, and then perform the following steps in this article to prepare to import the VM with the selected Linux OS installed on it (the VM can be discovered under the Hosts tab).

Prepare the RHEL 7.2 LE VM for capture by installing cloud-init

There are a few prerequisites that must be met before a VM can be imported, or have its image captured.

Prerequisites:

  • Ensure that the VIOS on which the VM runs is managed by IBM PowerVC.
  • Ensure that the VM uses virtual network and storage; the network and storage devices are provided by the VIOS.
  • If using multipath on PowerVM, you must configure Linux for multipath I/O (MPIO) on the root device.
  • The RSCT packages must be installed; otherwise, the Linux VM cannot be managed by PowerVC.

RHEL 7.2 LE (host) VM preparation:

  • Add the Extra Packages for Enterprise Linux (EPEL) yum repository.
    # wget http://dl.fedoraproject.org/pub/epel/7/ppc64/e/epel-release-7-9.noarch.rpm
    # yum --nogpgcheck localinstall epel-release-7*.rpm
  • If cloud-init already exists on the VM, uninstall it.
    # rpm -qa | grep cloud
    # rpm -e <full package name>
  • Install the cloud-init dependencies.
    # yum install python python-boto policycoreutils-python python-jsonpatch python-prettytable
    # yum install python-cheetah

    If the installation of python-cheetah fails with a dependency issue, download and install python-pygments from the RHEL 7.2 LE supplemental DVD/repo.

    If the libwebp-0.3.0-5.ael7b.ppc64le.rpm file fails to install, you can use the RHEL 7.2 LE installation DVD.

  • Copy the cloud-init rpm from the PowerVC server to the RHEL 7.2 LE VM and install it.
    # scp root@<PowerVC system name/IP>:/opt/ibm/powervc/images/cloud-init/rhel/cloud-init-0.7.4-8.el7.noarch.rpm .
    # rpm -Uvh cloud-init-0.7.4-8.el7.noarch.rpm

  • Modify the variables in the cloud.cfg file as shown below:
    # vi /etc/cloud/cloud.cfg
    disable_root: 0
    ssh_pwauth: 1
    ssh_deletekeys: 1

    Then add the following two lines.

    disable_ec2_metadata: True
    datasource_list: ['ConfigDrive']

Note: If you want to change the name of the systems after deployment, remove the - update_hostname entry from /etc/cloud/cloud.cfg. If you do not remove it, cloud-init resets the host name to the originally deployed value when the system restarts.
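
For reference, the relevant part of /etc/cloud/cloud.cfg then looks roughly like the following; the module list is shortened here for illustration and the exact entries can differ between cloud-init levels.

cloud_init_modules:
 - migrator
 - bootcmd
 - write-files
 - set_hostname
#- update_hostname     (removed so that cloud-init does not reset the host name on restart)
 - update_etc_hosts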

Import the RHEL 7.2 LE VM and capture the image

Complete the following post cloud-init installation preparations:

  • Verify that SELinux is disabled or set to the permissive mode.
  • Verify that the NetworkManager service is installed and running.
            # service NetworkManager status
  • Install net-tools if not already installed.
            # yum install net-tools
  • Edit all the /etc/sysconfig/network-scripts/ifcfg-eth* files to update or add the NM_CONTROLLED=no setting.
  • Remove the MAC address so it will not be propagated on future clones.

    Edit the /etc/sysconfig/network-scripts/ifcfg-eth* files to delete the HWADDR="xx" line.

  • The VM must be powered off before starting the capture.

The VM must be managed before the boot LUN can be imported, or captured, as an image.

In the PowerVC GUI, click Hosts. Select the appropriate host server and click Manage Existing Virtual Machines.

Then, click Select Specific Virtual Machines or All and click Manage. If you select specific virtual machines, select the VM to be managed before clicking Manage.

As soon as the VM has been managed, it is ready to have its image imported.

  1. Click the Virtual Machines tab on the left window pane.
  2. Select the VM with the RHEL 7.2 LE image.
  3. Click Capture.

Create a PowerVC template

In this step you'll create a template using the number of processors and the memory allotment you selected from Table 2.

See section 4.13, “Compute template setup” in the IBM Redbook, IBM PowerVC Version 1.3.2 Introduction and Configuration if you need instructions for creating the template.

Deploy the VM

After you've created the template, you'll use it to deploy a VM using the Linux image you captured in the previous step.

For step-by-step instructions, see section 4.15.7 of the IBM Redbook, IBM PowerVC Version 1.3.2 Introduction and Configuration.

Post installation considerations for RHEL 7.2 LE

  • Depending on which parameters are selected before cloning, the newly deployed VM may boot with the same host name as the original clone. It will not be accessible from the network because it has a new MAC address. To change the host name after boot, run these commands:
    # hostnamectl set-hostname <new VM name>
    # hostnamectl status
    # reboot
  • Edit /etc/sysconfig/network-scripts/ifcfg-eth0 with the new IP address and network parameters, as shown below, and then restart the network.
    NAME="eth0"
    ONBOOT=yes
    NETBOOT=yes
    UUID="70ae700d-2ad0-42ab-a58f-782bc4dbbeec"
    IPV6INIT=yes
    BOOTPROTO=static
    IPADDR="<ip>"
    NETMASK="<Netmask>"
    GATEWAY="<GW>"
    TYPE=Ethernet
    NM_CONTROLLED=no

    # systemctl restart network
  • Make sure that you register the new RHEL 7.2 LE VM with Red Hat to enable the licenses. Verify that the repositories exist in redhat.repo before going to the HDP installation section.

See the section, Install Red Hat Enterprise Linux 7.2 LE, for additional directions.

4. Prepare system for HDP installation

To deploy the Hadoop instance, prepare the deployment environment according to the following prerequisites. Unless noted otherwise, run the commands listed in the following subsections on each node in the cluster.

Set up password-less SSH

# ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub <node name>   (repeat for every node of the cluster)

After the Ambari server node is identified (as per table 2), password-less Secure Shell (SSH) must be set from the Ambari server to all the nodes in the cluster (including itself), as the Ambari server will automatically install the agents on all the nodes.

Add the SSH public key to the authorized keys file and change the permission on all the nodes.

#cat id_rsa.pub >> authorized_keys 
#chmod 700 ~/.ssh
#chmod 600 ~/.ssh/authorized_keys

Check that password-less SSH works from the Ambari node to all four nodes of the cluster.
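
As an illustration, the key distribution and a quick verification can be scripted from the Ambari node; the host names below are placeholders only.

for node in hdpmaster.example.com hdpworker1.example.com hdpworker2.example.com hdpworker3.example.com
do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@$node
    ssh root@$node hostname     # should return the node name without prompting for a password
done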

Enable Network Time Protocol (NTP) on cluster nodes

To install NTP packages, run:

#yum install -y ntp

To set the NTP service to start automatically on boot, run:

# systemctl enable ntpd

To start the NTP service, run:

# systemctl start ntpd
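
To confirm that the ntpd daemon is running and synchronizing with its time servers, you can check it with the standard NTP utilities:

# systemctl status ntpd
# ntpq -p     (lists the configured time servers with their reachability and offset)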

Verify host name resolution

Edit the /etc/hosts file on all the nodes of the cluster to add the fully qualified domain name (FQDN) entry of each node of the cluster.

# vi /etc/hosts

Add the fully qualified DNS name of all nodes in the cluster.

1.1.1.4 abc.fully.qualified.domain.name
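
For example, a four-node cluster (one master and three workers) with placeholder IP addresses and host names would have entries similar to the following:

192.168.10.11   hdpmaster.example.com    hdpmaster
192.168.10.12   hdpworker1.example.com   hdpworker1
192.168.10.13   hdpworker2.example.com   hdpworker2
192.168.10.14   hdpworker3.example.com   hdpworker3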

Set the host name on all nodes

To set the host name, run:

#hostname abc.fully.qualified.domain.name

To check whether the host name is set properly, run:

#hostname -f

Edit the network configuration file

Make the following edits in the network configuration file.

#vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=<abc.fully.qualified.domain.name>

Configure the IP tables

Ambari talks to the cluster nodes on different ports. An easy way to set up and allow this communication is by disabling the firewall:

#systemctl disable firewalld
#systemctl stop firewalld
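
If disabling the firewall is not acceptable in your environment, an alternative sketch is to leave firewalld running and open the individual ports instead. Only the Ambari web port from this guide is shown below; the full list of ports used by the HDP services is in the Hortonworks documentation.

# firewall-cmd --permanent --add-port=8080/tcp     (Ambari web interface)
# firewall-cmd --reload
# firewall-cmd --list-ports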

Disable network manager

Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 file as shown below.

NM_CONTROLLED=no
ONBOOT=yes

Disable SELinux and set umask

To disable SELinux, edit the /etc/selinux/config file using the following command:

#sed -i "s/^SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config

To set the umask, edit /etc/profile.d/mask.sh as shown below:

#!/bin/bash
if [ -n "$BASH_VERSION" -o -n "$KSH_VERSION" -o -n "$ZSH_VERSION" ]; then
  # for bash, pdksh, and zsh, set the umask
  umask 0022
fi

Set the file descriptor limit

The recommended value for the maximum number of file descriptors is 10000. Edit nofile.conf as shown below and verify the change with the ulimit command.

#vi /etc/security/limits.d/nofile.conf
# HDP needs nofile to be large.
#	ulimit -Sn; ulimit -Hn; ulimit -a
#
*	soft	nofile	10000
*	hard	nofile	10000

#ulimit -a | grep files

Install JSch packages

The JSch packages are required for the Oozie services.

If the following two packages are not found in the repository, download them and install them on all nodes.

yum --nogpgcheck localinstall  jzlib-1.1.1-6.ael7b.noarch.rpm
yum --nogpgcheck localinstall  jsch-0.1.50-5.ael7b.noarch.rpm

Install OpenJDK

Install OpenJDK 1.8.

#yum install java-1.8.0-openjdk-devel.ppc64le

This installs the following three packages.

  • java-1.8.0-openjdk-1.8.0.65-3.b17.el7.ppc64le
  • java-1.8.0-openjdk-devel-1.8.0.65-3.b17.el7.ppc64le
  • java-1.8.0-openjdk-headless-1.8.0.65-3.b17.el7.ppc64le

Edit the JVM configuration file

Storm services might fail with JDK 1.8, so add the following line to the JVM configuration file on all the nodes.

# vi /usr/lib/jvm/java-1.8.0-openjdk/jre/lib/ppc64le/jvm.cfg
-client IGNORE

Verify python requirement

python-libs-2.7.5-34 is a requirement for the Ambari Metrics Collector. If the installed python-libs level is higher or lower than this, downgrade or upgrade to this level.

# rpm -qa | grep ^python-libs
# yum downgrade python-libs-2.7.5-34.el7.ppc64le    (if the installed level is higher)
# yum update python-libs-2.7.5-34.el7.ppc64le       (if the installed level is lower)

Reboot all the nodes after preparing the system for HDP installation.

5. Install and configure the Ambari server

Select one of the four nodes to act as the Ambari server, preferably the name node (refer to Table 2).

Ambari 2.5.0 repository

OS         Format      URL
RHEL 7.x   Base URL    http://public-repo-1.hortonworks.com/ambari/centos7-ppc/2.x/updates/2.5.0.0
RHEL 7.x   Repo file   http://public-repo-1.hortonworks.com/ambari/centos7-ppc/2.x/updates/2.5.0.0/ambari.repo

Copy the Ambari repository file to /etc/yum.repos.d/.

wget -nv http://public-repo-1.hortonworks.com/ambari/centos7-ppc/2.x/updates/2.5.0.0/ambari.repo \
  -O /etc/yum.repos.d/ambari.repo

Install the Ambari server

Install the Ambari server on one of the nodes (preferably on the master node).

# yum -y install ambari-server

Set up the Ambari server

Use the following instructions to complete the Ambari server setup. Screen captures are included in this section to make the steps clearer.

  • Accept the default (n) at the Customize user account for ambari-server daemon prompt to proceed as root. If you want to run as a user other than root, provide the user name and password.
  • Select option [1] Custom JDK. JDK v1.8 should be installed on all nodes as a prerequisite.

    JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

  • Type n and press Enter at the advanced database configuration prompt to use the default database.
Figure 2. Command line interface for Ambari server setup
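
Ambari also supports a non-interactive setup. As a rough equivalent of the interactive choices above (run as root, custom JDK, default database), something like the following should work, assuming the OpenJDK path shown earlier; verify the options with ambari-server setup --help for your Ambari level.

# ambari-server setup -s -j /usr/lib/jvm/java-1.8.0-openjdk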

Start the Ambari server

To start the Ambari server:

# ambari-server start

To check the Ambari server processes:

#ambari-server status

To stop the Ambari server:

#ambari-server stop

Workaround to start the Falcon service

With HDP version 2.6.0.0, the Falcon services might not start after HDP deployment. You can follow these steps to configure the Ambari server to use the Berkeley DB driver as a workaround for this problem. These steps can be performed even after HDP deployment.

# wget -O je-5.0.73.jar http://search.maven.org/remotecontent?filepath=com/sleepycat/je/5.0.73/je-5.0.73.jar
# mv je-5.0.73.jar /usr/share/je-5.0.73.jar
# ambari-server setup --jdbc-db=bdb --jdbc-driver=/usr/share/je-5.0.73.jar
# ambari-server restart

6. Install and configure HDP

Use the Ambari web interface for HDP cluster deployment. This section provides steps to configure the HDP cluster, along with screen captures.

The web URL for the Ambari server is: http://<hostname.fqdn.com>:8080

The default user name and password for the Ambari server are admin/admin.

It is assumed here that connectivity to the Ambari server exists from your browser. Otherwise, Secure Shell (SSH) tunneling or a virtual private network (VPN) might be required to open the Ambari web interface.
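
For example, if only SSH access to the cluster is available from your workstation, a local port forward can carry the Ambari web traffic through the tunnel (the host name is a placeholder):

ssh -L 8080:localhost:8080 root@<ambari-server-host>

You can then open http://localhost:8080 in your local browser.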

Launch the install wizard

Figure 3. Ambari welcome page

Get started

Figure 4. HDP deployment start page

On the Get Started page, enter a name for your new cluster and click Next.

Select version

Use HDP 2.6.0.0 and change the HDP-2.6 and HDP-UTILS URLs for Red Hat 7 to the following specified URLs.

  • HDP 2.6: http://private-repo-1.hortonworks.com/HDP/centos7-ppc/2.x/updates/2.6.0.0-598
  • HDP-UTILS: http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/ppc64le
Figure 5. HDP version selection page

Install option

Enter the FQDN host names of all the nodes in the cluster. Also, enter the SSH private key of the Ambari node (obtain it with cat ~/.ssh/id_rsa).

Figure 6. HDP deployment options and target selection

Confirm hosts

You need to validate all the hosts as a prerequisite to deploy the HDP cluster.

Figure 7. Host prerequisite validation

Choose services

The services shown in the following figure are selected by default; you can continue with these selections.

Note: The SmartSense service is selected by default but will not start after deployment because the Customer Account Name and SmartSense ID values are mandatory to start this service.

Figure 8. HDP service selection

You can ignore the limited functionality warning for the SmartSense service and click Proceed Anyway.

Assign masters

Retain the default assignment for these services.

Figure 9. Assign master services

Assign slaves and clients

Default assignments can be used here as shown in the following figure.

Figure 10. Assign slaves and clients

Customize services

You can click each service displayed with a red flag and set the password.

The HDFS and YARN tabs are populated with NameNode and DataNode directories. One of the directories starts with /home; either remove it or replace it with another directory that does not start with /home.

Figure 11. Service customization

Click Next. A Configurations page is displayed with suggestions based on the cluster configuration.

Figure 12. Configuration suggestions

Make the necessary changes as suggested by the Ambari GUI under the service tabs (for example, Ambari Metrics -> Configuration) from the dashboard.

You can use any existing database or proceed with the Derby database. Click Proceed Anyway in the warning message.

Figure 13. Derby database warning

Review and deploy

Review the deployment selections and start deployment.

Figure 14. Review the deployment selections

7. Validate HDP cluster configuration

After successfully starting all the cluster services, validate the new cluster configuration by using one or both of the sample benchmarking tools, spark-pi and TeraSort.

spark-pi

Refer to the following document for the steps to validate Spark:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_spark-quickstart/content/ch_validating-spark-quickstart.html

su hdfs
cd /usr/hdp/current/spark-client
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster \
        --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 \
        lib/spark-examples*.jar 10

TeraSort

TeraSort is one of the well-known benchmarking tools for Hadoop, and it comes with the HDP installation. The benchmark measures the time taken to sort a given amount of randomly distributed data on a given computer system. It is commonly used to measure MapReduce performance.

TeraSort consists of three MapReduce applications; running all three of the following applications completes the full TeraSort exercise on the system.

  • teragen: Generates the data to be used for the workload.
  • terasort: Samples the input data and uses MapReduce to sort the data into a total order.
  • teravalidate: Validates the output data of the TeraSort.

For more information on how to use TeraSort, refer to:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

Log in to your name node server and perform the following steps to run TeraSort.

su hdfs
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.ppc64le 
export HADOOP_HOME=/usr/hdp/2.6.0.0-598/hadoop 
export HADOOP_MR_DIR=/usr/hdp/2.6.0.0-598/hadoop-mapreduce
export SPARK_HOME=/usr/hdp/current/spark2
export HADOOP_CONF_DIR=/etc/hadoop/conf/

Teragen generates around 10 GB of data with the parameter SIZE=100000000 (100 million rows of 100 bytes each).

NUM_MAPS=128
BLOCK_SIZE=536870912
IN_DIR=in_dir
OUT_DIR=out_dir
NUM_REDUCES=128
SIZE=100000000
#hadoop jar $HADOOP_MR_DIR/hadoop-mapreduce-examples-2*.jar teragen \
          -Ddfs.block.size=$BLOCK_SIZE -Dmapred.map.tasks=$NUM_MAPS ${SIZE} ${IN_DIR}

You can monitor MapReduce jobs using the YARN resource manager web application at: http://<resource manager ip>:8088

You can find the resource manager webapp address in the Ambari GUI under Yarn -> Configs -> Advanced -> Advanced yarn-site.
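
You can also monitor the jobs from the command line with the YARN CLI on any cluster node; the application ID is taken from the list output.

yarn application -list                        (running applications and their application IDs)
yarn application -status <application_id>    (progress and tracking URL of one application)
yarn logs -applicationId <application_id>    (aggregated logs after the application finishes)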

#hadoop jar $HADOOP_MR_DIR/hadoop-mapreduce-examples-2*.jar terasort \
          -Ddfs.block.size=$BLOCK_SIZE -Dmapred.reduce.tasks=$NUM_REDUCES ${IN_DIR} ${OUT_DIR}
# hadoop jar $HADOOP_MR_DIR/hadoop-mapreduce-examples-2*.jar teravalidate ${OUT_DIR} \
          /hadoop/teravalidate-output
hadoop fs -rm -r -skipTrash ${IN_DIR}     (delete the directory used by teragen)
hadoop fs -rm -r -skipTrash ${OUT_DIR}    (delete the directory used by terasort)
jps    (shows the Java processes running on the system)

There are a few tuning suggestions provided specifically for TeraSort. You can change these parameter values from the Ambari GUI based on the cluster configuration; a command-line sketch follows this list.

  • Increase the YARN memory per node from the Ambari console, leaving behind the minimum memory required for the OS. With more YARN memory, more YARN containers can be scheduled.
  • Decrease the MapReduce reduce Java heap size based on the configuration; this helps to run more reducers.
  • Set the parallel garbage collection thread count to 4. Parallel garbage collection is recommended for TeraSort.
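
As a rough illustration of the second and third suggestions, the heap size and garbage collection settings can also be passed per job on the TeraSort command line through standard MapReduce properties; the heap values below are placeholders and should be derived from your container sizes in Ambari.

hadoop jar $HADOOP_MR_DIR/hadoop-mapreduce-examples-2*.jar terasort \
    -Dmapreduce.map.java.opts="-Xmx1024m -XX:ParallelGCThreads=4" \
    -Dmapreduce.reduce.java.opts="-Xmx2048m -XX:ParallelGCThreads=4" \
    -Dmapred.reduce.tasks=$NUM_REDUCES ${IN_DIR} ${OUT_DIR}

The per-node YARN memory from the first suggestion is a cluster-wide setting (yarn.nodemanager.resource.memory-mb) and is changed in the Ambari GUI rather than per job.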

Acknowledgments

We would like to thank the following people for their guidance and help in publishing this article:

  • Maria R Ward: Power software and solutions test architect
  • Kurtis Ruby: Big Data, HPC and Linux on Power consultant, IBM Systems
  • Richard Scheller: Integrated Software Systems Test team, PowerKVM verification specialist
  • Russell Sloan: Integrated Software Systems Test team, software test professional
