Db2 Big SQL architecture

IBM® Db2® Big SQL is an advanced SQL engine that is optimized for analytics in big data environments. IBM Db2 Big SQL combines open source big data technologies with powerful Db2 SQL processing to support efficient interactive and batch analytics.

A Db2 Big SQL instance that is deployed on Cloud Pak for Data can be configured in one of three ways: as a Hadoop side-car, as a pure object store instance, or as a hybrid deployment that combines both.

When Db2 Big SQL is configured as a Hadoop side-car instance, the Db2 Big SQL engine runs on Cloud Pak for Data, but the metadata and data are stored in the remote Hadoop cluster’s Hive metastore and on its HDFS, respectively (see Figure 1). With this configuration, you can offload your resource-intensive BI workloads from the Hadoop cluster to Cloud Pak for Data while keeping your data, metadata, and security policies where they are today. Db2 Big SQL reads the data from HDFS over the network in parallel. If you create a table by using Db2 Big SQL, a Hive table is created in the Hive metastore of the Hadoop cluster, as shown in the example after Figure 1.

Figure 1. Db2 Big SQL configured as a Hadoop side-car instance
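For example, a table can be created directly from Db2 Big SQL. A minimal sketch, assuming a side-car configuration; the table and column names are illustrative:

    -- Create a table from Db2 Big SQL. In a Hadoop side-car
    -- configuration, the table definition is registered in the remote
    -- cluster's Hive metastore and the data files are written to HDFS.
    CREATE HADOOP TABLE sales (
        order_id   INT,
        amount     DECIMAL(10,2),
        order_date DATE
    )
    STORED AS PARQUET;

    -- The table is now visible to Hive on the Hadoop cluster and is
    -- queryable from Db2 Big SQL.
    SELECT order_id, amount
    FROM sales
    WHERE order_date >= '2023-01-01';

Because the metastore lives on the Hadoop cluster, the same table remains usable by Hive and other engines on that cluster.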

When Db2 Big SQL is configured as a pure object store instance, the Db2 Big SQL engine runs on Cloud Pak for Data, the metadata is stored in a Hive metastore that is associated with the Db2 Big SQL instance, and the data is stored on an object storage service (see Figure 2). The object storage service can be offered by a public cloud provider (such as IBM Cloud or Amazon Web Services), or it can be implemented as a storage cluster (for example, Red Hat® OpenShift® Data Foundation) in the same data center as the Cloud Pak for Data compute cluster.

Important: Starting with Db2 Big SQL 7.2.5, the Hive metastore backend database is the BIGSQL database. The data is stored in the HIVE schema and is accessed by a dedicated system user named hive. Treat the HIVE schema as an internal schema; its data must not be modified by any locally running workload.
Figure 2. Db2 Big SQL configured as a pure object store instance
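For example, a table can be created with its data placed in an object storage bucket. A minimal sketch, assuming that the object storage connection and credentials are already configured for the instance; the bucket and table names are illustrative:

    -- Create a table whose data files live in an S3-compatible bucket.
    -- The table definition goes into the Hive metastore that is local
    -- to the Db2 Big SQL instance; only the data resides in object
    -- storage.
    CREATE HADOOP TABLE web_events (
        event_id   BIGINT,
        event_type VARCHAR(64),
        event_ts   TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3a://analytics-bucket/web_events';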

If the remote Hadoop cluster is configured to access an object store, you can also deploy Db2 Big SQL in a hybrid configuration that accesses tables on both HDFS and the object store, and even combines them in a single query, as shown in the following example.
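A sketch of such a query, reusing the illustrative tables from the earlier examples:

    -- Join a table stored on the Hadoop cluster's HDFS (sales) with a
    -- table stored in the object store bucket (web_events).
    SELECT s.order_id, s.amount, e.event_type
    FROM sales s
    JOIN web_events e
      ON s.order_id = e.event_id;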

Db2 Big SQL head and worker nodes

A Db2 Big SQL service instance consists of two kinds of pods: c-bigsql-<instance_id>-db2u-0 and c-bigsql-<instance_id>-db2u-x, where x is a value of 1 or greater. The term head node in Db2 Big SQL refers to the c-bigsql-<instance_id>-db2u-0 pod that runs on your Red Hat OpenShift cluster. The term worker node refers to the worker service component that runs on each c-bigsql-<instance_id>-db2u-x pod. You can allocate multiple worker nodes, which are effectively multiple c-bigsql-<instance_id>-db2u-x pods, to a Db2 Big SQL service instance.
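Because Db2 Big SQL is built on the Db2 massively parallel processing architecture, each head and worker node corresponds to a logical Db2 partition, with the head node acting as the coordinator. One way to inspect the node layout from SQL is the Db2 DB_PARTITIONS table function; a sketch (the exact columns can vary by Db2 release):

    -- List the logical partitions of the instance. Partition 0 runs on
    -- the head node pod (c-bigsql-<instance_id>-db2u-0); the remaining
    -- partitions run on the worker pods.
    SELECT PARTITION_NUMBER, HOST_NAME
    FROM TABLE(SYSPROC.DB_PARTITIONS()) AS T;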

Do not confuse Db2 Big SQL nodes with OpenShift compute nodes, which are the physical nodes that compose the OpenShift cluster. For more information about cluster compute nodes, see Architecture for Cloud Pak for Data.