Preparing to provision a Db2 Big SQL instance

To query data, Db2 Big SQL instances must be connected to an existing remote big data storage system.

The remote big data storage system can be:

- A Hadoop cluster on Cloudera Data Platform (CDP) Version 7.1.9
- Object storage. Db2 Big SQL supports:
  - IBM Cloud Object Storage
  - Amazon Web Services object storage
  - IBM Storage Scale object storage
  - Red Hat® OpenShift® Data Foundation

Object store requirements

To connect Db2 Big SQL to an object store, you must meet the following requirements.

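The object store must be one of the following. Except for Microsoft Azure Data Lake Storage Gen2, which Hadoop accesses through its ABFS connector, these are Hadoop S3A-compatible object stores: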
- Amazon Web Services S3
- IBM Cloud Object Storage
- IBM Storage Scale Object Storage
- IBM Storage Ceph®/Red Hat Ceph Storage
  Note: If you use version 7 of Ceph, the minimum supported level is 7.0z2.
- Microsoft Azure Data Lake Storage Gen2

The credentials that Db2 Big SQL uses must permit read and write access to the storage that it interacts with.
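In Hadoop terms, an S3A-compatible store is one that the Hadoop S3A connector can reach with an endpoint plus an access key and secret key pair, and that key pair is the credential that needs read and write access. The Python sketch below renders the standard Hadoop S3A client properties (fs.s3a.endpoint, fs.s3a.access.key, fs.s3a.secret.key, fs.s3a.path.style.access) in the core-site.xml layout. It is only an illustration of what such a connection involves and assumes nothing about how Db2 Big SQL itself collects these values; the helper names s3a_properties and to_core_site_xml are hypothetical.

```python
import xml.etree.ElementTree as ET


def s3a_properties(endpoint: str, access_key: str, secret_key: str,
                   path_style: bool = True) -> dict[str, str]:
    """Return standard Hadoop S3A client properties for one object store.

    Hypothetical helper. Path-style access is often required by
    S3-compatible stores other than AWS; AWS S3 normally also works with
    virtual-host style requests (path_style=False).
    """
    if not endpoint:
        raise ValueError("endpoint is required")
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,   # credential: needs read and write access
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.path.style.access": "true" if path_style else "false",
    }


def to_core_site_xml(props: dict[str, str]) -> str:
    """Render properties as <configuration><property><name/><value/>... XML."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, to_core_site_xml(s3a_properties("https://s3.example.com", "AK", "SK")) returns a configuration element with one property entry per key. Real secret keys are better kept out of plain-text files; Hadoop also supports credential providers for that purpose.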

Remote Hadoop cluster requirements

To connect Db2 Big SQL to a remote Hadoop cluster, you must meet the following requirements.

The Hadoop cluster must run CDP 7.1.9 on x86-64 hardware.
Because Db2 Big SQL connects directly to the individual HDFS DataNodes, every DataNode must be accessible from the Db2 Big SQL service, and the associated ports must be open on each DataNode. The following components must also be accessible:
- Cloudera Manager server
- HDFS NameNodes
- ZooKeeper
- Hive metastore
- Ranger (if configured)
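Before provisioning, it can help to confirm, from the network where Db2 Big SQL will run, that each of these endpoints accepts TCP connections. The Python sketch below attempts a connection to each host and port and reports the ones that fail. The port numbers are assumed upstream defaults (for example 8020 for NameNode RPC and 9083 for the Hive metastore); a given cluster may use different ports, especially for DataNodes on secured or upgraded clusters. The helper names port_open and check_endpoints, and any host names passed to them, are hypothetical.

```python
import socket

# Assumed default ports; confirm each one against the cluster's actual configuration.
DEFAULT_PORTS = {
    "cloudera-manager": 7180,  # Cloudera Manager HTTP (7183 when TLS is enabled)
    "namenode": 8020,          # HDFS NameNode RPC
    "datanode": 9866,          # HDFS DataNode data transfer (Hadoop 3 default)
    "zookeeper": 2181,         # ZooKeeper client port
    "hive-metastore": 9083,    # Hive metastore Thrift service
    "ranger": 6080,            # Ranger Admin HTTP (6182 when TLS is enabled)
}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection attempt to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_endpoints(endpoints: list[tuple[str, str, int]],
                    timeout: float = 3.0) -> list[str]:
    """Return a 'role host:port' entry for every endpoint that is NOT reachable."""
    failures = []
    for role, host, port in endpoints:
        if not port_open(host, port, timeout):
            failures.append(f"{role} {host}:{port}")
    return failures
```

For example, check_endpoints([("namenode", "nn1.example.com", 8020), ("hive-metastore", "hms.example.com", 9083)]) returns an empty list when both endpoints are reachable. The list should contain one entry per DataNode, because Db2 Big SQL reads from each of them directly. A successful connection shows only that a port is open, not that the service behind it is correctly configured.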