Db2 Big SQL architecture
IBM® Db2® Big SQL is an advanced SQL engine that is optimized for advanced analytics in big data environments. IBM Db2 Big SQL combines open source big data technologies with powerful Db2 SQL processing to drive efficient, interactive, and batch analytics use cases.
A Db2 Big SQL instance that is deployed on Cloud Pak for Data can be configured in one of three ways: Hadoop side-car, pure object store, or a hybrid deployment that is configured for both.
When Db2 Big SQL is configured as a Hadoop side-car instance, the Db2 Big SQL engine runs on Cloud Pak for Data, but the metadata and data are stored on the remote Hadoop cluster’s Hive metastore and the HDFS, respectively (see Figure 1). Because of this, you can offload your resource-intensive BI workloads from the Hadoop cluster to Cloud Pak for Data while keeping your data, metadata, and security policies where they are now. Db2 Big SQL reads the data from the HDFS over the network in parallel. If you create a table by using Db2 Big SQL, a Hive table is created in the Hive metastore of the Hadoop cluster.
When Db2 Big SQL is configured as a pure object store instance, the Db2 Big SQL engine runs on Cloud Pak for Data, the metadata is stored in a Hive metastore that is associated with the Db2 Big SQL instance, and the data is stored on an object storage service (see Figure 2). The object storage service can be offered by a public cloud provider (such as IBM or Amazon), or it can be implemented as a storage cluster (for example, Red Hat® OpenShift® Data Foundation) in the same data center as the Cloud Pak for Data compute cluster.
hive
.
Consider the HIVE schema as an internal schema; its data must not be modified by any locally running
workload.If the remote Hadoop cluster is configured to access an object store, you can also deploy Db2 Big SQL in a hybrid configuration to access tables on both HDFS and object store.
A Db2 Big
SQL service instance has two main
nodes: c-bigsql-<instance_id>-db2u-0
and
c-bigsql-<instance_id>-db2u-x
, where
x is a value of 1 or greater. The term head node in Db2 Big
SQL corresponds to the
c-bigsql-<instance_id>-db2u-0
pod that runs on your Red Hat
OpenShift cluster. The term worker node in
Db2 Big
SQL refers to the worker service
component that runs on each
c-bigsql-<instance_id>-db2u-x
pod. You can
allocate multiple worker nodes, which are effectively multiple
c-bigsql-<instance_id>-db2u-x
pods, to a
Db2 Big
SQL service instance.
You must not confuse Db2 Big SQL nodes with OpenShift compute nodes, which are the physical nodes that compose the OpenShift cluster. For more information about cluster compute nodes, see Architecture for Cloud Pak for Data.