Qlik Sense integrated with Hortonworks Data Platform (HDP) running on IBM Power Systems
Steps for discovering and visualizing data in HDP on IBM Power Systems using Qlik Sense
Qlik Sense is a business intelligence (BI) solution that helps in discovering the insights that query-based BI tools can simply miss. You can search and explore across all your data, instantly pivoting your analysis when new ideas surface. Qlik Sense gives total flexibility with a cloud-ready data analytics platform that supports the full spectrum of BI use cases. Qlik Sense supports accessing data in Apache Hadoop environments. Validation testing was performed to verify Qlik Sense’s ability to integrate with and visualize data specifically to Hortonworks Data Platform (HDP) on IBM® POWER8®. This article provides an overview of the validation tests that were completed.
The key objectives for the validation testing of Qlik Sense were to:
- Configure Qlik Sense to connect to HDP 2.6 running on an IBM POWER8 processor-based server.
- Extract and visualize sample data from the Hadoop Distributed File System (HDFS) of HDP running on a POWER8 processor-based server.
This section lists the high-level components included in the test environment.
- Qlik Sense Desktop 11.11.1 for a Microsoft Windows 7 PC
- Hortonworks ODBC Driver for Apache Hive (v184.108.40.2066 64 bit)
- A notebook running Windows 7
Hortonworks Data Platform
- Hortonworks Data Platform (HDP) version 2.6
- Red Hat Enterprise Linux (RHEL) 7.2
- Minimum resources: Eight virtual processors, 24 GB memory, 50 GB disk space
- IBM PowerKVM™
- IBM POWER8 processor-based server
The deployment architecture is quite simple. Qlik Sense and the Hortonworks ODBC driver were installed and run on a Windows 7 notebook. HDP was installed and run on a POWER8 server. Qlik Sense and the ODBC driver were configured to connect to HDP. Data in HDP was accessed and visualized by Qlik Sense. Tests were run in a single-node HDP environment and a multi-node HDP cluster.
Installation and configuration
The section covers installation and configuration of a HDP cluster and Qlick Sense software.
Installing and configuring a HDP cluster
Here are the high-level steps to install and configure the HDP cluster:
- Follow the installation guide for HDP on IBM Power Systems™ to install and configure the HDP cluster. Refer to the Related topics section for more information.
- Log in to the Ambari server and ensure that all the services are running.
- Monitor and manage the HDP cluster, Hadoop, and related services through Ambari.
Setting up test data and Hive tables
Download the MovieLens and driver test data, create Hive tables and copy the data to the Hive tables in HDFS.
- Download the MovieLens data set from MovieLens Dataset (see the citation in Related topics).
- Create tables in Hive.
hive> create database movielens; OK Time taken: 3.063 seconds hive> use movielens; OK Time taken: 0.261 seconds hive> > CREATE EXTERNAL TABLE ratings ( > userid INT, > movieid INT, > rating INT, > tstamp STRING > ) ROW FORMAT DELIMITED > FIELDS TERMINATED BY '#' > STORED AS TEXTFILE > LOCATION '/dataset/movielens/ratings'; OK Time taken: 0.578 seconds hive> > CREATE EXTERNAL TABLE movies ( > movieid INT, > title STRING, > genres ARRAY<STRING> > ) ROW FORMAT DELIMITED > FIELDS TERMINATED BY '#' > COLLECTION ITEMS TERMINATED BY "|" > STORED AS TEXTFILE > LOCATION '/dataset/movielens/movies'; OK Time taken: 0.429 seconds hive> > CREATE EXTERNAL TABLE users ( > userid INT, > gender STRING, > age INT, > occupation INT, > zipcode STRING > ) ROW FORMAT DELIMITED > FIELDS TERMINATED BY '#' > STORED AS TEXTFILE > LOCATION '/dataset/movielens/users'; OK Time taken: 0.291 seconds hive> > CREATE EXTERNAL TABLE occupations ( > id INT, > occupation STRING > ) ROW FORMAT DELIMITED > FIELDS TERMINATED BY '#' > STORED AS TEXTFILE > LOCATION '/dataset/movielens/occupations'; OK Time taken: 0.336 seconds hive>
- Load the test data into tables.
hadoop fs -put ratings.t /dataset/movielens/ratings hadoop fs -put movies.t /dataset/movielens/movies hadoop fs -put users.t /dataset/movielens/users hadoop fs -put occupations.t /dataset/movielens/occupations
- Verify the data in HDFS.
[root@hdp264build ml-1m]# hadoop fs -ls /dataset/movielens Found 4 items drwxr-xr-x - hive hdfs 0 2018-05-30 05:19 /dataset/movielens/movies drwxr-xr-x - hive hdfs 0 2018-05-30 05:19 /dataset/movielens/occupations drwxr-xr-x - hive hdfs 0 2018-05-30 05:19 /dataset/movielens/ratings drwxr-xr-x - hive hdfs 0 2018-05-30 05:19 /dataset/movielens/users [root@hdp264build ml-1m] #
- Verify the data in Hive tables.
hive> show databases; OK default movielens Time taken: 2.559 seconds, Fetched: 2 row(s) hive> show tables in movielens; OK movies occupations ratings users Time taken: 0.371 seconds, Fetched: 4 row(s) hive>
Installing and configuring Hortonworks ODBC driver
Here are the steps to install and configure the ODBC driver:
- Download the Hortonworks ODBC driver on Microsoft Windows 7 (see Related topics for the download website).
- Install and configure the ODBC driver. Follow the instructions in the tutorial and guide listed in the Related topics section.
Installing and configuring Qlik Sense
Here are the steps to install and configure Qlik Sense:
- Go to the Qlik Sense download page (see Related topics) to download Qlik Sense Desktop Edition on Windows 7.
- Follow the instructions to install it on Windows 7.
- Follow the instructions in the Qlik Sense Quick Installation Guide (see Related topics). Some of the steps are described in the following configuration section.
Connecting HDP to Qlik Sense
Qlik Sense uses the following methods for fetching data from HIVE2 running on HDP. In the test, the first method (A) was used for ingesting data from Hive.
- Method A: Data loaded to Qlik Sense In-Memory Associative Data Store
- Method B: Qlik Sense Hybrid Solution – Qlik Sense Direct Discovery on top of Hadoop
Here are the steps to configure the connection between HDP and Qlik Sense.
- Launch ODBC Administrator from Windows and add a data source for Hortonworks Hive as
shown in Figure 1.
Figure 1. Hortonworks Hive ODBC Driver setup screens
- On Windows 7, launch the Qlik Sense application. Create a new app by clicking
Create new app. Enter a name for the app and click
Create. Open the newly created app and click Create new connection -> ODBC. Refer to Figure 2 and Figure 3.
Figure 2. Create a new app
Figure 3. Qlik Sense - ODBC new connection screen
- Connect to the HIVE2 server running on HDP 2.6 instance running on the IBM POWER8
processor-based server as shown in Figure 3. Select the ODBC data source added from the
ODBC Administrator in the previous step. Provide the Hive user name and password (use the
Hive DB password and not the Hive UNIX user password). The connection to HIVE2 must
succeed to continue.
Note: If you have created the Hive DB and tables using the Hive user account, then use hive as the user username and the Hive DB password.
- Click Load data at the upper-right side to load the data from the Hive table to the
Qlik Sense application memory, as shown in Figures 4.
Figure 4. Loading data to the Qlik Sense application memory
From the drop-down menu at the upper-left side (as shown in figure 5) select Data load editor and then click Add data.
Select the required data connection as, in this case ODBC, from the left side menu (as shown in Figure 6).
Figure 5. Add data to the new Qlik Sense app
Figure 6. Add data to the new Qlik Sense App (continued)
Select the tables that needs to be added to the new app and click Add data. Refer to Figure 7. After the data is loaded it will be available for analysis. Figure 8 shows data loaded in the Qlik Sense app.
Figure 7. Add data to the new Qlik Sense app (continued)
Figure 8. Tables in the new Qlik Sense App
Visualization and analysis in Qlik Sense
Here are the steps to visualize and analyze data using Qlik Sense:
- Connect the tables (circles) shown in Figure 8 by dragging toward each other to create
an association. After the tables are connected, a new window opens (Figure 9) that enables
you to select the fields for association. Associated data will look as in Figure 10.
Figure 9. Data association in Qlik Sense
Figure 10. Data associated in Qlik Sense
- You can perform analysis and visualization on the data fetched from the Hive DB. Note that
the data is now in memory and analysis is done on the data in memory. Figure 11 and Figure
12 show example visualizations within the Qlik Sense dashboard.
Figure 11. Qlik Sense visualization example 1
Figure 12. Qlik Sense visualization example 2
- Hortonworks Data Platform: Apache Ambari Installation for IBM Power Systems
- Hortonworks ODBC Driver for Apache Hive v2.1.12 download web page
- Hortonworks ODBC installation and configuration guide
- Business discovery and visualizing your data using Qlik Sense
- Qlik Sense website
- Qlik Sense Personal Edition download web page
- MovieLens Dataset
- ISV solution ecosystem for Hortonworks on IBM Power Systems
- MovieLens data set citation:
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872