Extending Db2 Big SQL

You can extend the capability of Db2® Big SQL by adding certain JAR files.

About this task

You can use Java™ packages that are available from the web, or Java packages that you create, to extend the functions of Db2 Big SQL. You extend Db2 Big SQL by adding JAR files to the appropriate paths. Db2 Big SQL includes a custom SerDe (Serializer/Deserializer) extension for processing JSON data, which transforms a JSON record into a form that Db2 Big SQL can process. To use this SerDe, specify the ROW FORMAT SERDE option in the CREATE TABLE statement. For example:
CREATE EXTERNAL HADOOP TABLE bigsql.example_json (
  col1 int,
  col2 date
) ROW FORMAT SERDE 'com.ibm.biginsights.bigsql.serde.JSONSerDe';
In addition to the provided JSONSerDe, you can create your own custom SerDe or download one from the web.
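The JSONSerDe expects one JSON record per line, with keys that match the table's column names. The following is a minimal sketch of a data file for the bigsql.example_json table shown above; the file name and local path are illustrative:

```shell
# Write two sample records whose keys match the table columns
# (col1 int, col2 date). One JSON object per line; the file name
# and local path are illustrative.
cat > /tmp/example_json_data.json <<'EOF'
{"col1": 1, "col2": "2024-01-15"}
{"col1": 2, "col2": "2024-02-20"}
EOF

# Sanity-check that every line is a valid JSON object.
python3 -c 'import json,sys
for line in sys.stdin:
    json.loads(line)
print("valid JSON records")' < /tmp/example_json_data.json
```

On a real cluster, you would copy such a file into the table's HDFS location (for example, with hdfs dfs -copyFromLocal) and then query the table through the SerDe.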

When you add a JAR file to Db2 Big SQL, the JAR file is available to the Db2 Big SQL JVM and the MapReduce JVMs.


Procedure

  1. Stop the Db2 Big SQL server.
  2. Depending on the purpose of the JAR file, complete the following steps:
    If the JAR file contains a custom SerDe or a user-defined function (UDF):
    1. Create or download the appropriate SerDe JAR file or a user-defined function (UDF) JAR file.
    2. If they do not already exist, create directories under biginsights/bigsql to contain SerDe and UDF JAR files:
      hdfs dfs -mkdir /biginsights/bigsql/serde
      hdfs dfs -mkdir /biginsights/bigsql/udf
    3. Make sure that the security permissions for the distributed file system (DFS) /biginsights/bigsql directory allow the appropriate write access. If required, complete the following steps to change the permissions:
      1. From a Linux® terminal window, log in as the root user and type the password when prompted:
        su - root
      2. Switch to the hdfs user account:
        su - hdfs
      3. Run the following commands to change the directory permissions:
        hdfs dfs -chmod 777 /biginsights/bigsql/serde
        hdfs dfs -chmod 777 /biginsights/bigsql/udf
      4. Verify the permissions on the directory:
        hdfs dfs -ls /biginsights
      5. Exit the hdfs account and then the root account by typing exit in each account shell.
    4. Copy the JAR file from your local file system to the appropriate DFS directory. JAR files for user applications, such as custom SerDes or UDFs, are stored in the following DFS paths:
      hdfs dfs -copyFromLocal /yourLocaldir/hive-json-serde-0.2.jar /biginsights/bigsql/serde
      hdfs dfs -copyFromLocal /yourLocaldir/my_UDFfile.jar /biginsights/bigsql/udf
      Only a member of the hadoop group can put JAR files into these locations.
    5. Set the $HADOOP_HOME environment variable to the appropriate Hadoop client directory so that you can run the required stored procedure. If you accepted the installation defaults, the directory is /opt/cloudera/parcels/CDH/lib/hadoop. From the bash shell, run the following command:
      export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
    6. Restart the Db2 Big SQL server. Launch your Db2 Big SQL query execution tool, such as JSQSH or the Db2 Big SQL console, and connect to the Db2 Big SQL database as the bigsql user.
    7. Call the Db2 Big SQL stored procedure that completes the SerDe registration.
      The stored procedure copies the JAR file into the appropriate installation directories on your cluster and adds it to your class path.
    If the JAR file contains a JDBC driver for a source database that you want to use in the Db2 Big SQL LOAD USING statement with the WITH PARAMETERS option:
    1. Copy the JAR file to /opt/cloudera/parcels/CDH/lib/sqoop/lib.
    2. Restart the Db2 Big SQL server.
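Taken together, the file-staging portion of the SerDe/UDF procedure above can be sketched as the following shell session. This is a local simulation for illustration only: it uses a temporary directory in place of the DFS root and a placeholder JAR file, with the real hdfs dfs equivalents shown in comments. Directory names and the HADOOP_HOME default follow the procedure above.

```shell
# Local stand-in for the DFS root; on a real cluster, run the
# `hdfs dfs` commands shown in the comments as the hdfs user instead.
DFS_ROOT=/tmp/biginsights_demo

# Step 2: create the serde and udf directories if they do not exist.
#   Real: hdfs dfs -mkdir /biginsights/bigsql/serde
#   Real: hdfs dfs -mkdir /biginsights/bigsql/udf
mkdir -p "$DFS_ROOT/biginsights/bigsql/serde" "$DFS_ROOT/biginsights/bigsql/udf"

# Step 3: open up write access so JAR files can be copied in.
#   Real: hdfs dfs -chmod 777 /biginsights/bigsql/serde (and .../udf)
chmod 777 "$DFS_ROOT/biginsights/bigsql/serde" "$DFS_ROOT/biginsights/bigsql/udf"

# Step 4: copy the SerDe JAR into place (placeholder JAR for illustration).
#   Real: hdfs dfs -copyFromLocal /yourLocaldir/hive-json-serde-0.2.jar /biginsights/bigsql/serde
touch /tmp/hive-json-serde-0.2.jar
cp /tmp/hive-json-serde-0.2.jar "$DFS_ROOT/biginsights/bigsql/serde/"

# Step 5: point HADOOP_HOME at the Hadoop client directory (documented default).
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop

ls "$DFS_ROOT/biginsights/bigsql/serde"
```

After the restart in step 6, the stored-procedure call in step 7 registers the staged JAR with Db2 Big SQL.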