Db2 Big SQL tables in S3 Object Storage
Db2 Big SQL tables can be created in a location that is specified as an S3a Object Storage URI. Such tables store data in the Object Storage service, which is typically remote to the Db2 Big SQL cluster and managed separately by, for example, IBM®'s Cloud Object Storage - S3 API service that is available through IBM Bluemix®.
About this task
Simple Storage Service (S3) is an object store interface protocol that was created by Amazon. S3a is the Hadoop component that implements the S3 protocol, providing an interface through which Hadoop services, such as Db2 Big SQL, can consume S3-hosted data.
Before you can use an S3a uniform resource identifier (URI) as the location of a Hive or Db2 Big SQL table, you must add the connection information for the S3a Object Storage service to the core-site.xml file. Obtain the authentication and connection property values from your Object Storage service, then open the HDFS service core-site.xml file from the Ambari web console and add the properties in Table 1 for the S3a file system configuration. After you add these properties, restart the HDFS, MapReduce, YARN, Hive, and Db2 Big SQL services.
Table 1. S3a file system configuration properties

Property name | Property value
---|---
fs.s3a.impl | org.apache.hadoop.fs.s3a.S3AFileSystem
fs.s3a.endpoint | The endpoint from which the S3 service is provided. IBM's Cloud Object Storage – S3 API is available from many regions. For example, the San Jose region's endpoint is s3-api.sjc-us-geo.objectstorage.service.networklayer.com.
fs.s3a.secret.key | The secret key that is required by the S3 endpoint.
fs.s3a.access.key | The access key that is required by the S3 endpoint.
When you use IBM Cloud Object Storage (COS), generate HMAC credentials, which provide the access_key_id and secret_access_key values that are needed for the S3 connection. You can create HMAC credentials through the IBM Cloud COS UI when you create a service credential; make sure that {"HMAC":true} is specified in the parameters field.
Use the value of access_key_id for fs.s3a.access.key and the value of secret_access_key for fs.s3a.secret.key in the core-site.xml file.
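Assembled in core-site.xml, the entries look like the following sketch. The endpoint shown is the San Jose example from Table 1, and the key values are placeholders; substitute the values from your own Object Storage service.

```xml
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-api.sjc-us-geo.objectstorage.service.networklayer.com</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```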
- The S3a component supports only one endpoint definition.
- Adding these fs.s3a.* properties to the core-site.xml file makes the data that is stored in the Object Storage service accessible to anyone with access to the Hadoop cluster. The data does not have an owner, group, or permissions set. Db2 Big SQL can use GRANT statements to restrict access to a table, but the data remains accessible to everyone through Hadoop shell commands. Note that Db2 Big SQL impersonation does not apply to these tables. If you prefer not to expose this information, see Enabling Db2 Big SQL to use a credential keystore file.
Accessing S3a buckets and files through Hadoop
About this task
The following examples list the contents of a bucket named s3atables, using the Bluemix San Jose S3 endpoint.
# aws s3 ls s3atables --recursive --endpoint-url=https://s3-api.sjc-us-geo.objectstorage.softlayer.net
2016-07-27 14:51:16 922 country/country.csv
2016-07-27 18:11:22 367 staff/job=Clerk/l1469234296227-6-r-00000
2016-07-27 18:03:29 286 staff/job=Mgr/l1469234296227-6-r-00003
2016-07-27 18:13:50 362 staff/job=Sales/l1469234296227-6-r-00003
# hdfs dfs -ls s3a://s3atables/
Found 1 items
drwxrwxrwx - 0 2016-07-27 14:39 s3a://s3atables/country
# hdfs dfs -ls s3a://s3atables/country
Found 1 items
-rw-rw-rw- 1 922 2016-07-27 14:51 s3a://s3atables/country/country.csv
By using an S3a client, you can create buckets and add files in the S3 Object Storage service. For example, the partitioned staff directory can be listed recursively:
# hdfs dfs -ls -R s3a://s3atables/staff
drwxrwxrwx - 0 2016-07-27 18:10 s3a://s3atables/staff/job=Clerk
-rw-rw-rw- 1 367 2016-07-27 18:11 s3a://s3atables/staff/job=Clerk/l1469234296227-6-r-00000
drwxrwxrwx - 0 2016-07-27 17:57 s3a://s3atables/staff/job=Mgr
-rw-rw-rw- 1 286 2016-07-27 18:03 s3a://s3atables/staff/job=Mgr/l1469234296227-6-r-00003
drwxrwxrwx - 0 2016-07-27 18:08 s3a://s3atables/staff/job=Sales
-rw-rw-rw- 1 362 2016-07-27 18:13 s3a://s3atables/staff/job=Sales/l1469234296227-6-r-00003
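As an illustration of how such a bucket might have been populated, the AWS CLI is one S3 client that can create a bucket and copy a local file into it. The bucket, file, and endpoint names below simply mirror the earlier listings; your values will differ.

```
# aws s3 mb s3://s3atables --endpoint-url=https://s3-api.sjc-us-geo.objectstorage.softlayer.net
# aws s3 cp country.csv s3://s3atables/country/country.csv --endpoint-url=https://s3-api.sjc-us-geo.objectstorage.softlayer.net
```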
Creating a Db2 Big SQL table with S3a Object Storage location
About this task
The following statements create an external table over the country.csv file that already exists in the bucket, create a partitioned staff table at an S3a location, and then load data into that table from a second S3a location.
CREATE EXTERNAL HADOOP TABLE country (
SALESCOUNTRYCODE INT,
COUNTRY VARCHAR(40),
ISOTHREELETTERCODE VARCHAR(3),
ISOTWOLETTERCODE VARCHAR(2),
ISOTHREEDIGITCODE VARCHAR(3),
CURRENCYNAME VARCHAR(50),
EUROINUSESINCE TIMESTAMP
) LOCATION 's3a://s3atables/country';
CREATE HADOOP TABLE staff (
ID SMALLINT,
NAME VARCHAR(9),
DEPT SMALLINT,
YEARS SMALLINT,
SALARY DECIMAL(7,2),
COMM DECIMAL(7,2)
) PARTITIONED BY (JOB VARCHAR(5))
LOCATION 's3a://s3atables/staff';
LOAD HADOOP USING FILE URL 's3a://data-files/staff.csv'
WITH SOURCE PROPERTIES (
'field.indexes'='1,2,3,5,6,7,4',
'replace.with.null'='NULL',
'replace.string.with.null'='NULL')
INTO TABLE staff;
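After the load completes, the table can be queried like any other Db2 Big SQL table, even though its data resides in Object Storage. A minimal verification query, restricting on the partitioning column, might look like this:

```sql
-- Rows for this predicate are read only from the job=Mgr partition directory
SELECT name, salary
FROM staff
WHERE job = 'Mgr';
```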