Analyzing data stored on S3 or S3-compatible object store services (Db2 Big SQL)
You can use Db2 Big SQL to access data that is stored on private or public cloud object stores that are accessible through an S3-compatible interface, and then run SQL analytics on that data.
Before you begin
You must have a data file that is stored in a bucket on an object store service, such as IBM Cloud Object Storage.
Configuring Db2 Big SQL to connect to an S3 or S3-compatible object store service
The Db2 Big SQL instance must be configured to access data that is stored on an S3-compatible object store service. This configuration is done when an IBM® Software Hub administrator provisions the Db2 Big SQL instance.
When the provisioning process reaches the stage for connecting to a remote data source, ask the administrator to complete the following steps.
- Select the Configure object store checkbox.
- In the Object store service endpoint box, enter an endpoint for the S3-compatible object store service that you want to connect to. For example, s3.eu-gb.cloud-object-storage.appdomain.cloud.
- In the Object store access ID box, enter an access key for the S3-compatible object store service.
- In the Object store service secret box, enter the matching secret key.
- If you want to restrict access to a single bucket, select the Specify the object store bucket name checkbox, and provide the bucket name. If the checkbox is not selected, Db2 Big SQL can access any bucket that is accessible with the provided endpoint and credentials.
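These settings correspond to the standard Hadoop S3A configuration properties. The following mapping is for illustration only; the property names come from Apache Hadoop, and the actual configuration is managed for you by the provisioning process:
fs.s3a.endpoint = s3.eu-gb.cloud-object-storage.appdomain.cloud
fs.s3a.access.key = <access key>
fs.s3a.secret.key = <secret key>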

Identifying the JDBC URL to connect to Db2 Big SQL
Complete the following steps to identify the JDBC URL that is needed to connect to Db2 Big SQL.
- As a user who was granted access to the Db2 Big SQL instance, log in to the IBM Software Hub web interface.
- Open the Instances page and locate the Db2 Big SQL instance that was configured to access the object store.
Tip: If you do not see the Db2 Big SQL instance in the table of instances, you probably have not yet been granted access by the IBM Software Hub administrator or by the user who created the Db2 Big SQL instance.
- Click the Db2 Big SQL instance name. The Db2 Big SQL instance details page opens.
- In the Access information section, check that the JDBC URL field shows the External host name • Non-SSL parameters. If you do not see External host name • Non-SSL, click the Edit icon and update the parameters.
Figure 2. Db2 Big SQL instance page
- Copy the JDBC URL. The URL is needed in the next section; an example of its format follows these steps.
- Log out of the IBM Software Hub web interface.
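The copied URL is a standard Db2 JDBC URL. As an illustrative sketch only (the host name and port are placeholders, and BIGSQL is the typical Db2 Big SQL database name; always use the exact value that you copied from the instance details page), a non-SSL URL has the following form:
jdbc:db2://<external host name>:<port>/BIGSQL
For example: jdbc:db2://bigsql.example.com:32051/BIGSQL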
Creating a table from data that is stored on an object store service
From a JDBC interactive client, such as a Jupyter notebook, you can create a table from data that is stored in a bucket on an object store service that the Db2 Big SQL instance is connected to. Connect to Db2 Big SQL by using the JDBC URL that you identified in the previous section.
In the following example, the statement expects a bucket s3bucket with a directory tpcds/customer to exist and to contain the data files for the customer table that is being created. The data files are expected to be in a delimited text format that uses the | character as the field separator.
CREATE EXTERNAL HADOOP TABLE customer
(
  c_customer_sk int not null,
  c_customer_id varchar(16) not null,
  c_current_cdemo_sk int,
  c_current_hdemo_sk int,
  c_current_addr_sk int,
  c_first_shipto_date_sk int,
  c_first_sales_date_sk int,
  c_salutation varchar(10),
  c_first_name varchar(20),
  c_last_name varchar(30),
  c_preferred_cust_flag varchar(1),
  c_birth_day bigint,
  c_birth_month bigint,
  c_birth_year bigint,
  c_birth_country varchar(20),
  c_login varchar(13),
  c_email_address varchar(50),
  c_last_review_date int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3a://s3bucket/tpcds/customer'
TBLPROPERTIES('serialization.null.format'='');
The LOCATION clause must always specify a directory that contains the files for the table that is being created, and all of the files must have the same structure. If you have files with different structures and you want to create a different table for each structure, the files must be in separate directories.
You can now query the table by using SQL. For example:
[localhost] [user1] 1> SELECT c_birth_country, count (c_customer_sk) AS count
[localhost] [user1] 2> FROM customer
[localhost] [user1] 3> GROUP BY c_birth_country
[localhost] [user1] 4> HAVING count > 500
[localhost] [user1] 5> ORDER BY count DESC;
+-----------------+-------+
| C_BIRTH_COUNTRY | COUNT |
+-----------------+-------+
| [NULL] | 3439 |
| VANUATU | 532 |
| BERMUDA | 516 |
| LESOTHO | 506 |
| MAURITIUS | 506 |
| LIBERIA | 501 |
+-----------------+-------+
6 rows in results (first row: 2.836s; total: 2.838s)
[localhost] [user1] 1>
If no data files exist in the path that is specified in the LOCATION clause, the table creation succeeds, but the table is empty. You can use Db2 Big SQL to insert data into the table.
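For example, the following statement appends a single row to the table; the column values are illustrative only. Db2 Big SQL writes the inserted rows as data files in the directory that is specified in the LOCATION clause.
INSERT INTO customer (c_customer_sk, c_customer_id)
VALUES (1, 'AAAAAAAABAAAAAAA');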
Inferring the table structure from a remote file
You can use the CREATE TABLE (HADOOP) statement to infer the table structure from a data file in a supported file format.
When you use the LIKE clause on the CREATE TABLE (HADOOP) statement, you can specify that the table columns are to have names and data types that are most similar to the columns in a file at the specified URL. If the given URL identifies a directory, a file is selected at random from within that directory. Use the STORED AS clause to specify the file format, which must be one of ORC, PARQUET, PARQUETFILE, TEXTFILE, or JSONFILE.
By default, the new table is stored separately from the directory that is identified by the specified URL. However, you can use the LOCATION clause to store the table in the same directory. The file formats of the source and target tables should be the same.
CREATE EXTERNAL HADOOP TABLE customer
LIKE 's3a://monsoon/customer/customer.parquet'
STORED AS PARQUET
LOCATION 's3a://monsoon/customer';
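After you create the table, you can verify the inferred column names and data types by querying the Db2 catalog. A minimal check, assuming that the table was created in your current schema:
SELECT colname, typename, length
FROM syscat.columns
WHERE tabname = 'CUSTOMER'
ORDER BY colno;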
For more information, see CREATE TABLE (HADOOP) statement.