Analytics for Apache Hadoop service is essentially a single data node BigInsights (v220.127.116.11) cluster with Data Scientist package therefore has all the value-adds like BigSQL,BigSheets, BigR,TextAnalytics built-in.You can start analyzing your data right away using analytics capabilities of Analytics for Apache Hadoop capabilities like BigSheets or TextAnalytics after loading the data. Lets not consider BigInsights as replacement for a relational database management system (DBMS) or a traditional data warehouse. It isn't optimized for interactive queries over tabular data structures, online analytical processing (OLAP), or online transaction processing (OLTP) applications. Rather, it's a platform that can augment your existing analytic infrastructure, enabling you to filter through high volumes of raw data and combine the results with structured data stored in your DBMS or warehouse, if desired.Another potential deployment approach involves using BigInsights as a query-ready archive for a data warehouse. With this approach, frequently accessed data can be maintained in the warehouse while “cold” or outdated information can be offloaded to BigInsights.
IBM dashDB ( yet another service on Bluemix cloud platform ),is a data warehousing and analytics solution. Use dashDB to store relational data, including special types such as geospatial data. Then analyze that data with SQL or advanced built-in analytics like predictive analytics and data mining, analytics with R, and geospatial analytics. dashDB is built to connect easily to all of your services and applications.
In this article , we will learn how to load data from dashDB a data-warehousing service to a hadoop table in Analytics for Apache Hadoop service with BigSQL
- First lets get started by launching BigSQL from your Analytics for Apache Hadoop service. Once you have provisioned the Analytics for Apache Hadoop service, click the Launch button to see the BigInsights Home page.
Once you see the BigInsights Home page , you shall see launch pads for BigSheets , TextAnalytics and Big SQL. For now click Launch from the Big SQL frame.
The first time you launch Big SQL, the page which gets loaded is the Monitor Database though which you shall inspect various parameters of the BigSQL data-base through the user interface.
Explore Database lets you check the table definition and statistics by running Analyze on the chosen table. Before you get there , you will need to chose the Database from the Drop Down by selecting the Host:Port Instance value of BIGSQL. Now it will prompt for credentials ,enter the User ID :biblumix and then enter the Password with the one you find in the Analytics for Apache Hadoop service Dashboard.
Click on SQL Editor from the Left Pane to launch the SQL editor which lets you can create tables and run any DML queries.
Now you are ready to run queries to create hadoop or hbase table and load it with data from a dashDB database.
For demo , let me pull data from GOSALES.BRANCH from the dashDB service. You shall need to get the JDBC connection URL and the credentials from the dashDB service from the dashDB Console under Connect tab.
For reference, I am attaching the query here:
CREATE HADOOP TABLE if not exists hadoop_dash (BRANCH_CODE INTEGER NOT NULL, ADDRESS1 VARCHAR(120) NOT NULL, WAREHOUSE_BRANCH_CODE INTEGER NOT NULL) ;
LOAD HADOOP USING JDBC CONNECTION URL 'jdbc:db2://awh-yp-small02.services.dal.bluemix.net:50000/BLUDB' WITH PARAMETERS (user='dashxxxx',password='xxxxxxxxx') FROM SQL QUERY 'SELECT BRANCH_CODE,ADDRESS1,WAREHOUSE_BRANCH_CODE FROM GOSALES.BRANCH WHERE $CONDITIONS' SPLIT COLUMN BRANCH_CODE INTO TABLE hadoop_dash WITH LOAD PROPERTIES ('num.map.tasks' = '1', 'max.rejected.records'='100'); SELECT * FROM hadoop_dash;
- Back in the Explore Database tab, you shall notice that the table which was created above is listed under Hadoop Tables. You may run Analyze on the table to collect the table statistics.
To use SSL connection in the LOAD statement, use the sample syntax given below:
LOAD HADOOP USING JDBC CONNECTION URL 'jdbc:db2://awh-yp-small02.services.dal.bluemix.net:50001/BLUDB' WITH PARAMETERS (user='dashxxxxx',password='xxxxxxx',sslConnection='true') FROM SQL QUERY 'SELECT BRANCH_CODE,ADDRESS1,WAREHOUSE_BRANCH_CODE FROM GOSALES.BRANCH WHERE $CONDITIONS' SPLIT COLUMN BRANCH_CODE INTO TABLE hadoop_dashssl WITH LOAD PROPERTIES ('num.map.tasks' = '1', 'max.rejected.records'='100');