- 1. Introduction
- 2. Getting started on SoftLayer
- 3. Preparing the system for Bigtop installation
- 4. Installing Bigtop Hadoop
- 5. Installing and running the Hadoop test script
- 6. Installing and running the Spark test script
- 7. Zeppelin Tutorial
- 8. Zeppelin sample stock intraday workload
- Downloadable resources
Quick start guide to Apache Bigtop v1.1 on IBM SoftLayer OpenPOWER with Ubuntu 14.04
The IBM POWER8 architecture, the latest offering from IBM SoftLayer, is a perfect vehicle for trying out Apache Bigtop solutions. In under 45 minutes from receiving the welcome package for your new bare metal IBM POWER8 processor-based server, you can have Hadoop and Spark, along with many other software packages, installed, configured, and ready to run prepackaged tutorials on a Zeppelin notebook, all done automatically by the install_bigtop.sh script. SoftLayer bare metal servers are not required to use the install_bigtop.sh script; it works on any system running Ubuntu 14.04.
Outline of steps to install Apache Bigtop v1.1 on POWER8 with Ubuntu 14.04
- Client places an order. Refer to the following URLs for more details.
“Get started now! Use PROMO code FREEPOWER8 for up to $2,238 in credits towards a POWER8 system in SoftLayer starting the first of every month.”
- Client receives a welcome package, including web portal access with a user ID and password.
- Client accesses the SoftLayer POWER8 server and prepares for installation.
- Client downloads the install_bigtop.sh script that will download, configure, and install the Apache Hadoop, Spark, Zeppelin, and other needed packages for Linux on Power.
- Client logs in to Apache Zeppelin and runs the preconfigured benchmark.
- Client uploads the sample stock workload and runs it for a quick performance test.
Current IBM SoftLayer OpenPOWER optimized POWER8 offerings are bundled in four convenient packages. Refer to the following figure.
Figure 1. SoftLayer POWER8 bare metal server options (as of May 2016)
For more information and current pricing about SoftLayer POWER8 bare metal server configurations, visit: SoftLayer POWER8 bare metal servers.
Minimum system requirements for Bigtop
- Four cores
- 32 GB of memory
- 50 GB virtual Small Computer System Interface (SCSI) disk
- Ubuntu Linux 14.04.3, little endian
2. Getting started on SoftLayer
This section explains how you can access the SoftLayer POWER8 bare metal server.
Accessing the SoftLayer POWER8 bare metal server
You will receive a welcome email from SoftLayer with a link to the SoftLayer Control Portal along with the login information.
View the Getting Registered tutorial for detailed information about first-time access to fully configure a SoftLayer system.
Follow the steps outlined in the SoftLayer Tutorial videos to accomplish the following tasks:
- Log in (first-time access).
- Quickly navigate through the menus to configure the IBM SoftLayer POWER8 server.
- Understand where to access online tools, including how to re-image the system.
- Open a remote virtual private network (VPN) and access the virtual system using Secure Shell (SSH).
For more information about SoftLayer setup and configuration, refer to: knowledgelayer.softlayer.com.
3. Preparing the system for Bigtop installation
The install_bigtop.sh script can completely install and configure Hadoop, Spark, and Zeppelin (a notebook tool used here to run the benchmarks) automatically, in less than 45 minutes.
In order to run the install_bigtop.sh script, you must have super user privileges.
It is highly recommended that the install_bigtop.sh script be run on a freshly installed Ubuntu system. If the system has already been in use, first run the cleanup.sh script, located with the rest of the packages at: https://github.com/ibmsoe/bigtop/
Note: The install_bigtop.sh script will fail to install if the ~/bigtop/source directory exists.
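The note above can be turned into a quick pre-flight check. This is a minimal sketch (the function name is illustrative); it only tests for the ~/bigtop/source directory that the installer chokes on:

```shell
# Pre-flight check before running install_bigtop.sh: the installer fails
# if a previous download left ~/bigtop/source behind.
check_clean_install() {
  # $1 = home directory to inspect
  if [ -d "$1/bigtop/source" ]; then
    echo "found $1/bigtop/source; run cleanup.sh before reinstalling"
    return 1
  fi
  echo "no previous Bigtop source found; safe to install"
}

check_clean_install "$HOME"
```

If the check reports a leftover source directory, run the cleanup.sh script before retrying the installation.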
- Install Bigtop as a non-root user (for example, bigtop_user) by performing the following steps:
- Add the new user.
- Set the new user password.
- Log in as the new user.
- Change to the user's home directory.
- Install git in preparation for downloading from github.com.
$ useradd bigtop_user -U -G sudo -m
$ passwd bigtop_user   (enter password)
$ su bigtop_user
$ cd ~
$ sudo apt-get install git
- Download the install_bigtop.sh script from github.com.
For a first-time download, use the git clone command:
$ git clone https://github.com/ibmsoe/bigtop
For subsequent updates, use the git pull command from within the bigtop directory.
Cloning downloads the following scripts to a directory called bigtop. Alternatively, you can use the wget command to download individual files, such as install_bigtop.sh, one at a time.
- Change your directory to bigtop.
$ cd bigtop/
~/bigtop$ ls
install_bigtop.sh  restart-bigtop.sh  status.sh     Stock_workload.json
cleanup.sh         LICENSE.md         source        stockprices.csv.gz-aa
hadoopTest.sh      README.md          sparkTest.sh  stockprices.csv.gz-ab
- Verify that /etc/hosts has the host name associated with the private IP if using both private and public IPs.
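As a quick sanity check for the /etc/hosts entry, something like the following sketch can be used (the function name is illustrative; the file path and host name are parameters so you can point it at any hosts file):

```shell
# Minimal sketch: report whether a host name appears in a hosts file.
# $1 = hosts file, $2 = host name to look for
check_host_entry() {
  if grep -qw "$2" "$1"; then
    echo "OK: $2 found in $1"
  else
    echo "MISSING: $2 not in $1"
  fi
}

# Example usage: verify the local host name is mapped (ideally to the private IP)
check_host_entry /etc/hosts "$(hostname)"
```

If the entry is missing, add a line mapping the private IP to the host name before continuing.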
- Zeppelin must be opened using the private IP address of the SoftLayer server. This requires that a VPN Portal be open. The private IP is listed along with the public IP on the Device Details page for the SoftLayer bare metal server.
Figure 2. Device Details page for SoftLayer POWER8 bare metal server
- Increase the default maximum number of open files available.
Example: Set the file limit to 1000000.
$ sudo vi /etc/security/limits.conf
Add the following lines (this affects all users at next login):
* soft nofile 1000000
* hard nofile 1000000
Log out and then log in for the new ulimit value to take effect.
4. Installing Bigtop Hadoop
Perform the following steps to download and install Apache Hadoop, Spark, and Zeppelin packages using the install_bigtop.sh script.
- Run the install_bigtop.sh script.
For example: ./install_bigtop.sh
Note: Ignore the error messages during the installation process pertaining to Hadoop, YARN, or other processes not starting. These packages are being installed; however, the system is not yet configured to allow them to start running at this time.
The install_bigtop.sh script performs the following tasks:
- Installs all dependencies (Java Open JDK1.8)
- Downloads and installs the latest Apache Bigtop Hadoop 2.7.1 Debian packages, including:
- Hadoop v 2.7.1
- Bigtop-groovy v2.4.4
- Jsvc v1.0.15
- Tomcat v6.0.36
- ZooKeeper v3.4.6
- Scala v2.10.4
- Configures the environment for Hadoop
- Formats the Hadoop Distributed File System (HDFS)
- Downloads and installs Apache Bigtop Spark 1.5.1
- Downloads and installs Zeppelin v0.5.6
- Starts all configured services on a single node
- After the install_bigtop.sh script finishes installing, verify that everything is up and running as expected using the status.sh script.
Example: $ ./status.sh
Figure 3. Status of Bigtop processes
Note: Spark Thrift server and Hadoop ZKFC are not needed for this benchmark.
5. Installing and running the Hadoop test script
Run the hadoopTest.sh script to verify that Hadoop is working properly.
6. Installing and running the Spark test script
Run the sparkTest.sh script.
Verify the results.
$ sudo ./sparkTest.sh
Pi is roughly 3.1427
7. Zeppelin Tutorial
Apache Zeppelin is a versatile web-based UI notebook that installs with a default tutorial, written in Scala, to get you started on your way to creating your own notebook scripts. Perform the following steps to run the Zeppelin Tutorial.
- Log in to Zeppelin from your browser.
(For example: http://<Private IP Addr>:8080)
Figure 4. Zeppelin welcome page
- Run the tutorial benchmark by clicking Zeppelin Tutorial.
Note: If the Interpreter binding section is open with interpreters highlighted in blue (refer to Figure 5), click Save.
Figure 5. Interpreters loaded as part of the default interpreter group
- Then, click the Run all paragraphs icon to run the tutorial. Click OK when prompted.
Figure 6. Zeppelin UI showing the “Run all paragraphs” icon
- Scroll down the page to view the results.
Figure 7. Zeppelin Tutorial graphic results
8. Zeppelin sample stock intraday workload
If needed, download the new Zeppelin JSON file:
Note: The JSON file needs to be on the system viewing the Zeppelin notebook.
- Go to the Zeppelin welcome page.
- Select Import note.
- Click the Choose a JSON here panel.
- Browse to select the Stock_workload.json file.
- Run it by selecting the newly added Stock_workload.json notebook on the welcome page.
Figure 8. Stock Intraday Workload Tutorial on Zeppelin
Before starting the sample Stock workload JSON file, you can use the Interpreter tab to optimize the Spark properties according to your system configuration.
To optimize the Spark interpreter settings:
- Click the Interpreter tab.
- Click edit (refer to Figure 9).
- Change the values to maximize the Spark properties for your system. Specifically, consider:
- spark.cores.max: Leave the value empty to use all available processors.
- spark.executor.memory: 16 GB minimum.
- zeppelin.spark.maxResults: 40,000 at a minimum for good results (the default is 1000).
- Save the changes.
- Return to the Stock_workload notebook by selecting it from the Notebook drop-down list.
Note: The following rule of thumb has been suggested.
Example: Best results for an IBM POWER8 processor-based SoftLayer bare metal server of type C812L-S with eight cores, SMT set to 4, and 64 GB of memory were obtained when:
- spark.cores.max was set to 24, that is:
(8 - 2) * 4 = 24
- spark.executor.memory was set to 100 GB.
- zeppelin.spark.maxResults was set to 16000000.
Similarly, best results for an IBM POWER8 processor-based SoftLayer bare metal server of type C812L-L with 10 cores and 512 GB of memory were obtained when:
- spark.cores.max was set to 64
- spark.executor.memory was set to 100 GB
- zeppelin.spark.maxResults was set to 16000000
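The C812L-S sizing above can be expressed as simple shell arithmetic. This is only a sketch of the suggested rule of thumb; reserving 2 cores for the OS follows the worked example, not an official formula:

```shell
# spark.cores.max rule of thumb from the C812L-S example:
# (physical cores - cores reserved for the OS) * SMT threads per core
cores=8          # C812L-S physical cores
reserved=2       # cores left for the OS and other services
smt=4            # SMT setting used in the example
echo $(( (cores - reserved) * smt ))   # prints 24
```

Plug in your own core count and SMT setting to derive a starting value, then tune from there.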
Figure 9. Sample of Spark interpreter settings to maximize a POWER8 C812L-S configuration
Note: The first time you start the sample Stock_workload JSON file and click Run all paragraphs (see Figure 6), the script appears to load data and then start the actual workloads too soon, so the workloads under Data Ingestion are reported as failing. Subsequent runs also report the data ingestion as failing. This is a known bug: the ingestion fails because the stockprices.csv data has already been copied into the /user/zeppelin/ directory. You can either remove stockprices.csv from that directory or ignore the error.
You can try this same Zeppelin workload on a comparable x86 environment and see for yourself the benefits that a POWER8 processor-based Linux on Power server brings to running Spark.