Quick start guide to Apache Bigtop v1.1 on IBM SoftLayer OpenPOWER with Ubuntu 14.04

This article describes how to install the Apache Bigtop v1.1.0 bundle, including Apache Hadoop and Spark, on an IBM® SoftLayer® POWER8® bare metal server running Ubuntu 14.04. The bundled installation script also installs an Apache Zeppelin notebook, which you use to run an initial benchmark suite.


Amir Sanjar (asanjar@us.ibm.com), Big Data Solution Developer, IBM

Amir Sanjar has many years of experience in big data software and solution development at companies including IBM and Canonical. He holds several patents in the areas of enterprise solution automation, wireless, and cell technology. Currently at IBM, he leads IBM Power enablement for the big data ecosystem and ISVs.



Donna Ball (dball@us.ibm.com), ISST Open Source Offerings Test Lead, IBM

Donna has been with IBM for over 15 years, the majority of her time spent as a Test Lead on various IBM System programs such as FAStT, RSSM, and FLEX iLAB. Two years ago, she moved to lead the IBM PowerKVM Customer Test Environment project, researching and implementing analytic and cloud solutions with Linux on Power architecture. She is currently the Integrated Software Systems Test Open Source Offerings Test Lead.



Bill Phu (billyphu@ca.ibm.com), Software Performance Analyst, IBM

Bill started as an IBM DB2® kernel developer and, after 8 years, moved to the System and Software Performance group, where he tests the performance of a wide range of products, including Hadoop and Spark on SoftLayer, and creates price and performance reports.



Maria Ward (mrward@us.ibm.com), ISST Power SW Solutions Test Architect, IBM

Maria Ward is the Power Software and Solutions Test Architect for the IBM Systems Client Advocacy and System Assurance organization. Her current focus is on IBM Power solutions and integrating OpenPOWER and open source technologies, such as Hadoop and Spark, into the IBM Power Systems test strategy to help provide a green path to deploying customer solutions on IBM Power servers.



01 June 2016

Also available in Chinese and Japanese.

1. Introduction

The IBM POWER8 architecture, the latest offering from IBM SoftLayer, is an ideal vehicle for trying out Apache Bigtop solutions. In under 45 minutes from receiving the welcome package for your new IBM POWER8 processor-based bare metal server, you can have Hadoop, Spark, and many other software packages installed, configured, and ready to run prepackaged tutorials in a Zeppelin notebook, all done automatically by the install_bigtop.sh script. The script does not require a SoftLayer bare metal server; it works on Ubuntu 14.04 systems in general.

Outline of steps to install Apache Bigtop v1.1 on POWER8 with Ubuntu 14.04

  1. Client places an order. Refer to the following URLs for more details.

Get started now! Use PROMO code FREEPOWER8 for up to $2,238 in credits towards a POWER8 system in SoftLayer starting the first of every month.

  2. Client receives a welcome package, including web portal access with a user ID and password.
  3. Client accesses the SoftLayer POWER8 server and prepares it for installation.
  4. Client downloads and runs the install_bigtop.sh script, which downloads, installs, and configures Apache Hadoop, Spark, Zeppelin, and other required packages for Linux on Power.
  5. Client logs in to Apache Zeppelin and runs the preconfigured benchmark.
  6. Client uploads the sample Stock Workload and runs it for a quick performance test.

Current IBM SoftLayer OpenPOWER optimized POWER8 offerings are bundled in four convenient packages. Refer to the following figure.

Figure 1. SoftLayer POWER8 bare metal server options (as of May 2016)

For more information and current pricing for SoftLayer POWER8 bare metal server configurations, visit: SoftLayer POWER8 bare metal servers.

Minimum system requirements for Bigtop

  • Four cores
  • 32 GB of memory
  • 50 GB virtual Small Computer System Interface (SCSI) disk
  • Ubuntu 14.04.3 LTS, little endian (ppc64el)
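Before you install, you can compare a system against these minimums with a short script. The following is a minimal sketch, not part of the Bigtop scripts: meets_bigtop_minimums, local_cores, local_mem_gb, and local_disk_gb are hypothetical helper names, and the sketch assumes a Linux system with lscpu, /proc/meminfo, and df. It uses the free space in the file system that holds your home directory as a stand-in for disk size.

```shell
#!/usr/bin/env bash
# Sketch only: compare measured values with the Bigtop minimums listed above.
# All function names here are hypothetical helpers, not part of the Bigtop scripts.

# Arguments: physical cores, memory in GB, free disk in GB.
# Prints one message per unmet minimum; returns 0 only if all minimums are met.
meets_bigtop_minimums() {
  local cores=$1 mem_gb=$2 disk_gb=$3 ok=0
  if [ "$cores" -lt 4 ]; then echo "need at least 4 cores, found $cores"; ok=1; fi
  if [ "$mem_gb" -lt 32 ]; then echo "need at least 32 GB of memory, found $mem_gb GB"; ok=1; fi
  if [ "$disk_gb" -lt 50 ]; then echo "need at least 50 GB of disk, found $disk_gb GB"; ok=1; fi
  return "$ok"
}

# lscpu --parse=CORE prints one line per logical CPU with its core number;
# SMT threads share a core number, so counting unique values approximates physical cores.
local_cores() { lscpu --parse=CORE | grep -v '^#' | sort -u | wc -l; }

# MemTotal is in kB and excludes memory reserved by firmware and the kernel,
# so it reads somewhat below the installed amount; the value is rounded to whole GB.
local_mem_gb() { awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1048576 }' /proc/meminfo; }

# Available space (in whole GB) on the file system that holds the home directory.
local_disk_gb() { df -Pk "$HOME" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'; }
```

For example, run meets_bigtop_minimums "$(local_cores)" "$(local_mem_gb)" "$(local_disk_gb)". Because MemTotal excludes reserved memory, a system with exactly 32 GB installed may be reported as 31 GB.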

2. Getting started on SoftLayer

This section explains how you can access the SoftLayer POWER8 bare metal server.

Accessing the SoftLayer POWER8 bare metal server

You will receive a welcome email from SoftLayer with a link to the SoftLayer Control Portal along with the login information.

See the Getting Registered tutorial for details on first-time access and on fully configuring a SoftLayer system.

Follow the steps outlined in the SoftLayer Tutorial videos to accomplish the following tasks:

  • Log in (first-time access).
  • Quickly navigate through the menus to configure the IBM SoftLayer POWER8 server.
  • Understand where to access online tools, including how to re-image the system.
  • Open a remote virtual private network (VPN) connection and access the server using Secure Shell (SSH).

For more information about SoftLayer setup and configuration, refer to: knowledgelayer.softlayer.com.


3. Preparing the system for Bigtop installation

The install_bigtop.sh script automatically installs and configures Hadoop, Spark, and Zeppelin (the notebook used to run the benchmark) in less than 45 minutes.

To run the install_bigtop.sh script, you must have superuser (sudo) privileges.

It is highly recommended that you run the install_bigtop.sh script on a freshly installed Ubuntu system. If the system has already been used, first run the cleanup.sh script, which is available with the rest of the scripts at: https://github.com/ibmsoe/bigtop/

Note: The install_bigtop.sh script fails if the ~/bigtop/source directory already exists.
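The note above and the non-root installation user described below can be turned into a quick pre-flight check that you run as the installation user just before install_bigtop.sh. This is a sketch under assumptions: check_not_root, check_no_source_dir, and bigtop_preflight are hypothetical names, and the check covers only the conditions stated in this section.

```shell
#!/usr/bin/env bash
# Sketch only: pre-flight checks before running install_bigtop.sh.
# These helper names are hypothetical, not scripts from the repository.

# The installation is meant to run as an ordinary user with sudo rights, not as root.
check_not_root() {
  if [ "$(id -u)" -eq 0 ]; then
    echo "run as a non-root user with sudo privileges" >&2
    return 1
  fi
}

# A leftover source directory from an earlier run makes install_bigtop.sh fail.
# Argument: directory that must not exist (defaults to ~/bigtop/source).
check_no_source_dir() {
  local src_dir=${1:-$HOME/bigtop/source}
  if [ -e "$src_dir" ]; then
    echo "$src_dir already exists; run cleanup.sh or remove it first" >&2
    return 1
  fi
}

# Runs both checks; succeeds only if both pass.
bigtop_preflight() {
  check_not_root && check_no_source_dir "$@"
}
```

For example, run bigtop_preflight && ./install_bigtop.sh from the bigtop directory so that the installation starts only when both checks pass.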

  1. Create a non-root user for the Bigtop installation (for example, bigtop_user) by performing the following steps:
    1. Add the new user.
    2. Set the new user's password.
    3. Log in as the new user.
    4. Change to the user's home directory.
    5. Install git in preparation for downloading from GitHub.
    $ sudo useradd bigtop_user -U -G sudo -m
    $ sudo passwd bigtop_user     (enter the new password when prompted)
    $ su bigtop_user
    $ cd ~
    $ sudo apt-get install git
  2. Download the install_bigtop.sh script from github.com.

    For first-time download, use the git clone command.

    git clone https://github.com/ibmsoe/bigtop

    For subsequent updates, run the git pull command from inside the bigtop directory.

    The clone places the following scripts, along with other files, in a directory called bigtop:

    • cleanup.sh
    • install_bigtop.sh
    • restart-bigtop.sh
    • hadoopTest.sh
    • status.sh
    • sparkTest.sh

    Or, you can use the wget command for individual file downloads. For example, to download the install_bigtop.sh file, run the following command:

    wget https://raw.githubusercontent.com/ibmsoe/bigtop/master/install_bigtop.sh
  3. Change to the bigtop directory and list its contents.
    $ cd bigtop/
    ~/bigtop$ ls
    install_bigtop.sh  restart-bigtop.sh  status.sh              Stock_workload.json
    cleanup.sh          LICENSE.md         source             stockprices.csv.gz-aa
    hadoopTest.sh       README.md          sparkTest.sh       stockprices.csv.gz-ab
  4. If the server has both private and public IP addresses, verify that /etc/hosts maps the host name to the private IP address.
  5. Zeppelin must be opened using the private IP address of the SoftLayer server, which requires an open SoftLayer VPN connection. The private IP address is listed along with the public IP address on the Device Details page for the SoftLayer bare metal server.
    Figure 2. Device Details page for SoftLayer POWER8 bare metal server
  6. Increase the default maximum number of open files available.

    Example: Set the file limit to 1000000.

    $ sudo vi /etc/security/limits.conf

    Add the following lines (they apply to all users at their next login):

    *     soft    nofile          1000000 
    *     hard    nofile          1000000

    Log out and then log in for the new ulimit value to take effect.
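Steps 4 and 6 can also be scripted. The following minimal sketch assumes the standard whitespace-separated formats of /etc/hosts and /etc/security/limits.conf. hosts_maps_name_to_ip and set_nofile_limit are hypothetical helpers that take the file path as an argument, so you can try them on a copy first.

```shell
#!/usr/bin/env bash
# Sketch only: helpers for the /etc/hosts check and the open-file limit.
# Both function names are hypothetical.

# Succeeds if HOSTS_FILE has a line whose address is IP and whose
# host name or one of whose aliases is NAME.
hosts_maps_name_to_ip() {
  local hosts_file=$1 ip=$2 name=$3
  awk -v ip="$ip" -v name="$name" '
    $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }' "$hosts_file"
}

# Appends soft and hard "nofile" limits for all users ("*") to LIMITS_FILE,
# skipping any type that already has an entry, so repeated runs add nothing.
set_nofile_limit() {
  local limits_file=$1 value=${2:-1000000} type
  for type in soft hard; do
    if ! awk -v t="$type" '$1 == "*" && $2 == t && $3 == "nofile" { f = 1 } END { exit !f }' "$limits_file"; then
      printf '*     %s    nofile          %s\n' "$type" "$value" >> "$limits_file"
    fi
  done
}
```

For example, hosts_maps_name_to_ip /etc/hosts <private IP> "$(hostname)" succeeds when the mapping is in place. Because limits.conf is owned by root, you could apply the limit with sudo bash -c "$(declare -f set_nofile_limit); set_nofile_limit /etc/security/limits.conf". After you log out and back in, ulimit -n should report 1000000.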


4. Installing Bigtop Hadoop

Perform the following steps to download and install Apache Hadoop, Spark, and Zeppelin packages using the install_bigtop.sh script.

  1. Run the install_bigtop.sh script.

    For example: ./install_bigtop.sh

    Note: During installation, you can ignore error messages reporting that Hadoop, YARN, or other processes failed to start. The packages are installed before the system is configured to run them; the script starts the configured services at the end of the installation.

    The install_bigtop.sh script performs the following tasks:

    • Installs all dependencies (including OpenJDK 8)
    • Downloads and installs the latest Apache Bigtop Hadoop 2.7.1 Debian packages, including:
      • Hadoop v2.7.1
      • bigtop-groovy v2.4.4
      • Jsvc v1.0.15
      • Tomcat v6.0.36
      • ZooKeeper v3.4.6
      • Scala v2.10.4
    • Configures the environment for Hadoop
    • Formats the Hadoop Distributed File System (HDFS)
    • Downloads and installs Apache Bigtop Spark 1.5.1
    • Downloads and installs Zeppelin v0.5.6
    • Starts all configured services on a single node
  2. After the install_bigtop.sh script finishes, use the status.sh script to verify that all services are up and running.

    Example: $ ./status.sh

    Figure 3. Status of Bigtop processes

    Note: The Spark Thrift server and the Hadoop ZooKeeper Failover Controller (ZKFC) are not needed for this benchmark.
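In addition to status.sh, you can check from the shell that the web interfaces accept connections. This sketch assumes common default ports (50070 for the HDFS NameNode web UI in Hadoop 2.x, 8088 for the YARN ResourceManager web UI, and 8080 for Zeppelin, as used later in this article); your configuration may differ. port_open and check_bigtop_ports are hypothetical names, and the sketch relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a few Bigtop web UIs accept TCP connections.
# The ports are assumed defaults; adjust them to match your configuration.

# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# bash -c receives the host as $0 and the port as $1.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Prints one line per service in the form "name port up|down".
check_bigtop_ports() {
  local host=${1:-localhost} entry name port
  for entry in namenode-ui:50070 resourcemanager-ui:8088 zeppelin:8080; do
    name=${entry%%:*}
    port=${entry##*:}
    if port_open "$host" "$port"; then
      echo "$name $port up"
    else
      echo "$name $port down"
    fi
  done
}
```

For example, check_bigtop_ports <private IP> prints one up or down line for each of the three services.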


5. Installing and running the Hadoop test script

Run the hadoopTest.sh script to verify that Hadoop is working properly.

Example: $ ./hadoopTest.sh
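This article does not describe what hadoopTest.sh does internally. As an independent, minimal sketch of a similar smoke test, the following function writes a small file to HDFS, reads it back, compares the contents, and cleans up. It assumes that the hdfs command installed by Bigtop is on your PATH and that your user may create directories under /tmp in HDFS; hdfs_roundtrip and the HDFS_CMD override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: HDFS write/read round trip. Not the contents of hadoopTest.sh.
# Assumes the hdfs client is on PATH; HDFS_CMD (hypothetical) overrides it.
hdfs_roundtrip() {
  local hdfs=${HDFS_CMD:-hdfs}
  local dir="/tmp/bigtop_smoke_$$"   # hypothetical scratch directory in HDFS
  local local_file rc
  if ! command -v "$hdfs" >/dev/null 2>&1; then
    echo "HDFS client '$hdfs' not found" >&2
    return 2
  fi
  local_file=$(mktemp)
  echo "hello bigtop" > "$local_file"
  # Each step runs only if the previous one succeeded.
  "$hdfs" dfs -mkdir -p "$dir" &&
    "$hdfs" dfs -put -f "$local_file" "$dir/hello.txt" &&
    [ "$("$hdfs" dfs -cat "$dir/hello.txt")" = "hello bigtop" ]
  rc=$?
  # Remove the HDFS and local copies whatever the outcome.
  "$hdfs" dfs -rm -r -f "$dir" >/dev/null 2>&1
  rm -f "$local_file"
  return "$rc"
}
```

A return code of 0 means the file made the round trip intact; 2 means no HDFS client was found.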


6. Installing and running the Spark test script

Run the sparkTest.sh script.

Example: $ ./sparkTest.sh

Verify that the output includes an estimate of pi, for example:

$ sudo ./sparkTest.sh
Pi is roughly 3.1427
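The "Pi is roughly" line matches the output of the SparkPi example that ships with Spark, which estimates pi by random sampling, so the digits vary from run to run. Assuming sparkTest.sh prints such a line, the following sketch checks that the estimate is close to pi; check_pi_estimate and its default tolerance of 0.05 are illustrative choices, not part of the test script.

```shell
#!/usr/bin/env bash
# Sketch only: read test output on stdin, find the "Pi is roughly <value>" line,
# and succeed if the value is within TOLERANCE (default 0.05) of pi.
check_pi_estimate() {
  local tolerance=${1:-0.05}
  awk -v tol="$tolerance" '
    /Pi is roughly/ {
      diff = $NF - 3.14159265   # last field holds the estimate
      if (diff < 0) diff = -diff
      found = 1
      ok = (diff <= tol)
    }
    END { exit !(found && ok) }'
}
```

For example: sudo ./sparkTest.sh 2>&1 | check_pi_estimate && echo "Spark OK"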

7. Zeppelin Tutorial

Apache Zeppelin is a web-based notebook that comes with a default tutorial, written in Scala, to help you get started creating your own notebooks. Perform the following steps to run the Zeppelin Tutorial.

  1. Log in to Zeppelin from your browser.

    (For example: http://<Private IP Addr>:8080)

    Figure 4. Zeppelin welcome page
  2. Run the tutorial benchmark by clicking Zeppelin Tutorial.

    Note: If the Interpreter binding section is open with interpreters highlighted in blue (see Figure 5), click Save.

    Figure 5. Interpreters loaded as part of the default interpreter group
  3. Then, click the Run all paragraphs icon to run the tutorial. Click OK when prompted.
    Figure 6. Zeppelin UI showing the “Run all paragraphs” icon
  4. Scroll down the page to view the results.
    Figure 7. Zeppelin Tutorial graphic results



8. Zeppelin sample stock intraday workload

If you do not already have it, download the Stock_workload.json notebook file:
wget https://github.com/ibmsoe/bigtop/raw/master/Stock_workload.json

Note: The JSON file must be on the computer whose web browser you use to open the Zeppelin notebook.

  1. Go to the Zeppelin welcome page.
  2. Select Import note.
  3. Click the Choose a JSON here panel.
  4. Browse to select the Stock_workload.json file.
  5. Run it by selecting the newly added Stock_workload.json notebook on the welcome page.
Figure 8. Stock Intraday Workload Tutorial on Zeppelin

Before starting the sample Stock workload JSON file, you can use the Interpreter tab to optimize the Spark properties according to your system configuration.

To optimize the Spark interpreter settings:

  1. Click the Interpreter tab.
  2. Click edit (refer to Figure 9).
  3. Adjust the Spark properties to suit your system. In particular, consider:
    • spark.cores.max: Leave the value empty to use all available processors.
    • spark.executor.memory: 16 GB minimum (entered as 16g).
    • zeppelin.spark.maxResult: The default is 1000; use at least 40000 for good results.
  4. Save the changes.
  5. Return to the Stock_workload notebook by selecting it from the Notebook drop-down list.

Note: A suggested rule of thumb is to set spark.cores.max to (number of cores - 2) * SMT level.

Example: The best results for an IBM POWER8 processor-based SoftLayer bare metal server of type C812L-S with eight cores, SMT set to 4, and 64 GB of memory were obtained when:

  • spark.cores.max was set to 24, that is, (8 - 2) * 4 = 24.
  • spark.executor.memory was set to 100 GB.
  • zeppelin.spark.maxResult was set to 16000000.

As another example, the best results for an IBM POWER8 processor-based SoftLayer bare metal server of type C812L-L with 10 cores and 512 GB of memory were obtained when:

  • spark.cores.max was set to 64.
  • spark.executor.memory was set to 100 GB.
  • zeppelin.spark.maxResult was set to 16000000.
Figure 9. Sample of Spark interpreter settings to maximize a POWER8 C812L-S configuration
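The rule of thumb above is simple arithmetic: spark.cores.max = (cores - 2) * SMT level. The following sketch computes it; suggest_spark_cores_max is a hypothetical helper. The article does not state the SMT mode of the C812L-L system, but its value of 64 matches the same formula for 10 cores at SMT 8; that is an inference, not a documented setting.

```shell
#!/usr/bin/env bash
# Sketch only: spark.cores.max = (physical cores - 2) * SMT threads per core.
# Arguments: physical core count, SMT level (threads per core).
suggest_spark_cores_max() {
  local cores=$1 smt=$2
  if [ "$cores" -le 2 ]; then
    echo "need more than 2 cores" >&2
    return 1
  fi
  echo $(( (cores - 2) * smt ))
}
```

On Linux on Power, the ppc64_cpu --smt command (from the powerpc-utils package) reports the current SMT mode. For example, suggest_spark_cores_max 8 4 prints 24.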

Note: The first time you open the Stock_workload notebook and click Run all paragraphs (see Figure 6), all workloads under Data Ingestion report failures; the notebook appears to start the workloads before the data has finished loading. Subsequent runs also report the data ingestion step as failing. This is a known bug: the step fails because the stockprices.csv data has already been copied into the /user/zeppelin/ directory. You can either remove stockprices.csv from that directory or ignore the error.
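The workaround in the note above can be scripted. This sketch assumes that /user/zeppelin/ is a directory in HDFS (HDFS home directories conventionally live under /user), that the hdfs client is on your PATH, and that your user may delete files there; you might need to run it as the zeppelin or hdfs user instead. remove_stale_stock_data and the HDFS_CMD override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: delete the previously ingested stock data before re-running the notebook.
# Assumes /user/zeppelin/ lives in HDFS; HDFS_CMD (hypothetical) overrides the client.
remove_stale_stock_data() {
  local hdfs=${HDFS_CMD:-hdfs}
  local path=${1:-/user/zeppelin/stockprices.csv}
  if ! command -v "$hdfs" >/dev/null 2>&1; then
    echo "HDFS client '$hdfs' not found" >&2
    return 2
  fi
  # -test -e succeeds only if the path exists, so a clean system is left alone.
  if "$hdfs" dfs -test -e "$path"; then
    # -r also handles the case where the path is a directory.
    "$hdfs" dfs -rm -r "$path"
  else
    echo "$path not present; nothing to remove"
  fi
}
```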

You can run the same Zeppelin workload on comparable x86 environments and see for yourself the benefits that POWER8 processor-based Linux on Power servers bring to running Spark.

