IBM InfoSphere BigInsights Version 3.0

Importing data from and exporting data to DB2 by using Sqoop

You can use Sqoop to move data between the InfoSphere® BigInsights™ distributed file system and relational database management systems.

Before you begin

Ensure that your JDBC driver is compatible with the version of the database to which you are connecting. Most JDBC drivers are backward compatible, so if you are connecting to databases of different versions, you can use the higher version of the JDBC driver. If you are using an Oracle JDBC driver, Sqoop requires version 11g r2 or later.
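If you are unsure which driver version you have, many drivers can report their own version. The following sketch assumes the IBM DB2 JDBC driver file is named db2jcc4.jar; the file name and path are examples only and might differ in your installation:

    java -cp /path/to/db2jcc4.jar com.ibm.db2.jcc.DB2Jcc -version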

About this task

Sqoop is a set of high-performance open source connectors that can be customized for your specific external connections. Sqoop also offers specific connector modules that are designed for different product types. Large amounts of data can be imported from various relational database sources into an InfoSphere BigInsights cluster by using Sqoop. For more information about using Sqoop with relational databases, see the Sqoop User Guide.
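Before you run import or export jobs, you can verify that Sqoop can reach the database. The following sketch uses the sqoop list-tables command; the host name, port, database name, and credentials are placeholders that you replace with your own values:

    $SQOOP_HOME/bin/sqoop list-tables --connect jdbc:db2://db2.my.com:50000/testdb --username db2user --password db2pwd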

Procedure

  1. Copy the JDBC driver JAR files for any relational database that you will use with Sqoop to the $SQOOP_HOME/lib directory on the node where the Big SQL server is installed. This node is usually the namenode. The JAR files will be automatically propagated to other nodes as needed at run time.
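    For example, assuming that the DB2 JDBC driver files are named db2jcc4.jar and db2jcc_license_cu.jar and reside in /opt/ibm/db2/java (the names and path are examples only; your installation might differ), you might copy them with commands like the following:
    cp /opt/ibm/db2/java/db2jcc4.jar $SQOOP_HOME/lib/
    cp /opt/ibm/db2/java/db2jcc_license_cu.jar $SQOOP_HOME/lib/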
  2. To import data into the cluster, start a data import MapReduce job. When you use Sqoop to import data from a database for which a built-in connector exists, you do not need to specify the --driver option. When you use the generic Sqoop connector to import from a database, you must specify the --driver option; a sketch that uses the generic connector follows the first example. The following example shows a data exchange with a DB2® database by using the built-in DB2 connector:
    $SQOOP_HOME/bin/sqoop import --connect jdbc:db2://db2.my.com:50000/testdb --username db2user --password db2pwd --table db2tbl \
            --split-by tbl_primarykey --target-dir sqoopimports
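    If you use the generic Sqoop connector instead, the same import might look like the following sketch, where the --driver option names the DB2 JDBC driver class and the connection values are the same placeholders as above:
    $SQOOP_HOME/bin/sqoop import --connect jdbc:db2://db2.my.com:50000/testdb --driver com.ibm.db2.jcc.DB2Driver \
            --username db2user --password db2pwd --table db2tbl --split-by tbl_primarykey --target-dir sqoopimports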
  3. To export data from the InfoSphere BigInsights cluster to relational database tables, start a data export job. The following example shows a data exchange with a DB2 database:
    $SQOOP_HOME/bin/sqoop export --connect jdbc:db2://db2.my.com:50000/myDB --username db2user --password db2pwd --table db2tbl \
            --export-dir /sqoop/dataFile.csv
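    You can also make the field delimiter of the exported file explicit with the --input-fields-terminated-by option. The following sketch assumes comma-separated fields, as the .csv file name suggests, and otherwise uses the same placeholder values as above:
    $SQOOP_HOME/bin/sqoop export --connect jdbc:db2://db2.my.com:50000/myDB --username db2user --password db2pwd --table db2tbl \
            --export-dir /sqoop/dataFile.csv --input-fields-terminated-by ','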

What to do next

After the data is in the distributed file system, you can analyze it to find the answers to your business questions.
  • Use BigSheets to load your data into a workbook, create sheets to filter and combine your data for analysis, and apply visualizations for consumable results.
  • Use Big R to perform comprehensive data analysis in an R environment on structured and unstructured big data.
  • Use Big SQL to apply your SQL skills, SQL-based business intelligence applications, and query or reporting tools to the data on your cluster.
  • Use Jaql to create and run queries to read, manipulate, and write data in your local environment or on a cluster.
  • Use Text Analytics to extract structured information from unstructured and semi-structured text.

