WebHDFS

You can create Db2 Big SQL tables in a location that is specified as a WebHDFS uniform resource identifier (URI). Such tables store data in a separate HDFS instance, which is typically remote to the Db2 Big SQL cluster and managed separately.

About this task

WebHDFS provides the same functionality as HDFS, but over a REST interface, which eliminates wire compatibility issues between Db2 Big SQL and the remote HDFS instance and enables Db2 Big SQL to access data from most versions of HDFS. This flexibility comes at the cost of some performance. The feature is intended to enable Db2 Big SQL to reach out to remote clusters on any Hadoop platform, provided that the WebHDFS protocol is supported. After it is set up, a WebHDFS Db2 Big SQL table works just like any other table.

Before you can use a WebHDFS URI as the location for a Db2 Big SQL table, you must ensure that the remote HDFS instance is accessible. You can connect to a WebHDFS instance through HTTP or HTTPS. If you connect through HTTPS, you must also register the remote cluster's SSL certificate in the local JVM trust store.
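One common way to register the certificate is with the JDK keytool utility. The following is a sketch only: the host name, port, file name, alias, and trust store path are placeholders, and the default trust store password (changeit) is assumed; adjust all of these for your environment.

```shell
# Hypothetical example: retrieve the remote cluster's certificate and import
# it into the JVM trust store used by Db2 Big SQL. All names below are
# placeholders; substitute your own host, port, alias, and trust store.
echo | openssl s_client -connect namenode.acme.com:50470 \
  | openssl x509 > remote-cluster.crt

keytool -importcert -noprompt \
  -alias remote-webhdfs \
  -file remote-cluster.crt \
  -keystore "$JAVA_HOME/jre/lib/security/cacerts" \
  -storepass changeit
```

After the import, restart Db2 Big SQL so that the JVM picks up the updated trust store.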

Limitation: Db2 Big SQL cannot connect to WebHDFS through Knox.

Accessing WebHDFS through Hadoop

About this task

You can use the Hadoop or HDFS shell commands with a WebHDFS URI to retrieve file and folder information. The Hadoop shell commands (-ls, -mkdir, -put, -get, -cp, -mv, -cat, and others) work with WebHDFS on your Hadoop cluster. The format of the WebHDFS URI is webhdfs://<HOSTNAME>:<HTTP_PORT>/<PATH>. For example: webhdfs://namenode.acme.com:50070/path/to/table/test_webhdfs. If SSL is enabled on the remote WebHDFS instance, the format of the URI becomes swebhdfs://<HOSTNAME>:<HTTPS_PORT>/<PATH>.
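For example, the commands below exercise a few of these operations against a remote WebHDFS instance. This is a sketch only: the host name and paths reuse the placeholder values from the URI example above, and a configured Hadoop client with network access to the remote cluster is assumed.

```shell
# Illustrative commands; namenode.acme.com and the paths are placeholders.

# List the contents of a directory on the remote HDFS instance:
hadoop fs -ls webhdfs://namenode.acme.com:50070/path/to/table

# Copy a local file into the remote table location:
hadoop fs -put staff.csv webhdfs://namenode.acme.com:50070/path/to/table/staff.csv

# Read a file back over the same URI:
hadoop fs -cat webhdfs://namenode.acme.com:50070/path/to/table/staff.csv
```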

Creating a Db2 Big SQL table in WebHDFS

About this task

You can create a Db2 Big SQL table by specifying a location that is a WebHDFS URI. For example:
CREATE EXTERNAL HADOOP TABLE country (
  SALESCOUNTRYCODE INT,
  COUNTRY VARCHAR(40),
  ISOTHREELETTERCODE VARCHAR(3),
  ISOTWOLETTERCODE VARCHAR(2),
  ISOTHREEDIGITCODE VARCHAR(3),
  CURRENCYNAME VARCHAR(50),
  EUROINUSESINCE TIMESTAMP
) LOCATION 'webhdfs://namenode.acme.com:50070/path/to/table/country';
The table can be a partitioned table. For example:
CREATE HADOOP TABLE staff (
  ID SMALLINT,
  NAME VARCHAR(9),
  DEPT SMALLINT,
  YEARS SMALLINT,
  SALARY DECIMAL(7,2),
  COMM DECIMAL(7,2)
) PARTITIONED BY (JOB VARCHAR(5))
  LOCATION 'webhdfs://namenode.acme.com:50070/path/to/table/staff';
You can also load data from a WebHDFS source file. For example:
LOAD HADOOP USING FILE URL 'webhdfs://namenode.acme.com:50070/path/to/table/staff.csv'
  WITH SOURCE PROPERTIES (
    'field.indexes'='1,2,3,5,6,7,4',
    'replace.with.null'='NULL',
    'replace.string.with.null'='NULL')
  INTO TABLE staff;