Preparing to install Db2 Big SQL
Use this information to plan and prepare to install the Db2® Big SQL service.
For information about the minimum system requirements for the Db2 Big SQL service, see System requirements.
Even though Db2 Big SQL data is stored on HDFS or object stores, Db2 Big SQL requires persistent volumes to store its SQL catalog, configuration files, log files, and temporary files created during SQL processing.
Db2 Big SQL supports dynamic storage provisioning. The cluster administrator provides a storage class name, which is used to provision storage for all nodes of the service (the head node and each worker node).
For more information about storage requirements for Cloud Pak for Data, see Storage considerations.
Required access on persistent storage
The storage class that you use for Db2 Big SQL must support the
ReadWriteMany access mode.
No specific file system access needs to be granted. The Db2 Big SQL service manages the persistent volume.
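As an illustration, a persistent volume claim that requests the ReadWriteMany access mode might look like the following sketch; the claim name and the storage class name are examples only, not values that Db2 Big SQL requires:

```yaml
# Illustrative PVC only; "example-bigsql-pvc" and "managed-nfs-storage"
# are placeholder names. The Db2 Big SQL service creates its own claims.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-bigsql-pvc
spec:
  accessModes:
    - ReadWriteMany        # the access mode the storage class must support
  resources:
    requests:
      storage: 50Gi
  storageClassName: managed-nfs-storage
```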
Remote Hadoop cluster or public or private object store
Db2 Big SQL connects to an existing remote big data storage system. The system can be an object store, or a Hadoop cluster on Cloudera Data Platform (CDP) 7.1.6.
You provide the connection information when you provision Db2 Big SQL.
- The HDFS NameNode and Hive metastore services must be accessible from the Db2 Big SQL service. Specifically, the services must be
bound to an IP address that is accessible outside the Hadoop cluster.
To make the HDFS NameNode accessible, do the following steps:
- In Cloudera Manager, navigate to the HDFS service.
- Click the Configuration tab.
- Locate HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml and click Add.
- Create the property dfs.namenode.rpc-bind-host and set its value to 0.0.0.0.
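The safety valve entry above results in a property equivalent to the following in hdfs-site.xml, which makes the NameNode RPC service listen on all interfaces:

```xml
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
```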
- Data nodes on the remote Hadoop cluster must be accessible to Db2 Big SQL, and the associated ports must be open on
each data node to the Db2 Big SQL service. The
following components must also be accessible:
- Cloudera Manager server
- Hive server(s)
- Hive metastore
- Ranger (if configured)
- In HDFS, the /user/db2inst1 directory must be created. An HDFS administrator must run the following commands to create the directory, make the db2inst1 user the owner, and allow the db2inst1 user to write to it:
  hdfs dfs -mkdir /user/db2inst1
  hdfs dfs -chmod 755 /user/db2inst1
  hdfs dfs -chown -R db2inst1 /user/db2inst1
- If automatic syncing of the Db2 Big SQL catalog and the Hive metastore is enabled, an additional subdirectory is required. Run the following commands:
  hdfs dfs -mkdir /user/db2inst1/sync
  hdfs dfs -chmod 777 /user/db2inst1/sync
  hdfs dfs -chown -R db2inst1:hdfs /user/db2inst1
  hdfs dfs -chown -R db2inst1:hdfs /user/db2inst1/sync
- The object store must be an S3-compatible object store, such as Amazon Web Services S3, IBM Cloud Object Storage, or Red Hat® OpenShift® Container Storage.
- The credentials must permit read and write access on the bucket(s) that Db2 Big SQL interacts with.
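Before provisioning, a quick reachability check from a host on the Db2 Big SQL network can save troubleshooting time. The sketch below assumes `nc` is available; the hostnames are placeholders, and 8020 (NameNode RPC) and 9083 (Hive metastore) are common defaults that may differ on your cluster:

```shell
# check_port HOST PORT: succeeds if a TCP connection to HOST:PORT
# can be opened within 5 seconds.
check_port() {
  nc -z -w 5 "$1" "$2"
}

# Example usage (replace the placeholder hosts with your cluster's values):
#   check_port namenode.example.com 8020 && echo "NameNode reachable"
#   check_port metastore.example.com 9083 && echo "Hive metastore reachable"
```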
Giving Db2 Big SQL access to the Hive warehouse directory
For Db2 Big SQL to be able to read and write data on HDFS, the db2inst1 user must be granted access privileges on the Hive warehouse directory. The specific privileges that are required depend on whether Db2 Big SQL impersonation is enabled. For example, to grant the db2inst1 user read, write, and execute access on the managed and external Hive warehouse directories:
hdfs dfs -setfacl -R -m user:db2inst1:rwx /warehouse/tablespace/managed/hive
hdfs dfs -setfacl -R -m default:user:db2inst1:rwx /warehouse/tablespace/managed/hive
hdfs dfs -setfacl -R -m user:db2inst1:rwx /warehouse/tablespace/external/hive
hdfs dfs -setfacl -R -m default:user:db2inst1:rwx /warehouse/tablespace/external/hive
To grant access on a single database directory only (the sales.db directory in this example):
hdfs dfs -setfacl -R -m user:db2inst1:rwx /warehouse/tablespace/external/hive/sales.db
hdfs dfs -setfacl -R -m default:user:db2inst1:rwx /warehouse/tablespace/external/hive/sales.db
If Db2 Big SQL is to have access to tables outside of the warehouse (that is, tables created with the LOCATION clause), the required permissions can be granted either through a policy defined in the Ranger HDFS plugin, or by setting file access control lists (FACLs) similar to those shown above on the specified directories.
Updating the key distribution center (KDC)
If you are connecting to a Hadoop cluster on which Kerberos security is enabled, you must update your KDC for Db2 Big SQL.
- Edit the /var/kerberos/krb5kdc/kadm5.acl file. Change the asterisk (*) permissions to e*. For example:
  cat /var/kerberos/krb5kdc/kadm5.acl
  */admin@IBM.COM e*
- Restart the KDC by running the following commands:
  sudo service krb5kdc restart
  sudo service kadmin restart
Db2 Big SQL obtains a new Kerberos ticket approximately every 12 hours. Therefore, the ticket lifetime defined in the Kerberos configuration (KDC, krb5) must be set to at least 16 hours.
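For example, on an MIT KDC the default ticket lifetime can be raised in the [libdefaults] section of /etc/krb5.conf; 16h is shown here to satisfy the stated minimum. Note that the realm's max_life settings (in kdc.conf and on each principal) also cap the effective lifetime:

```ini
[libdefaults]
    ticket_lifetime = 16h
```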
If you are using Active Directory Kerberos or would prefer to manually pre-create the principal and keytab required for Db2 Big SQL, see Enabling Active Directory as the Kerberos server.
Ensure that the DNS for the OpenShift cluster is configured so that the Kerberos server and Hive metastore hosts in the Hadoop cluster have valid forward and reverse DNS entries. This configuration means that a DNS entry exists that maps the hostname (of the KDC and the Hive metastores) to an IP address, and a reverse entry exists for that IP address that maps it back to the original hostnames.
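A simple way to verify both directions of resolution is sketched below; it assumes `getent` and `awk` are available (nslookup or dig work equally well), and the hostname in the usage example is a placeholder for your KDC or Hive metastore host:

```shell
# check_dns HOST: succeeds if HOST resolves to an IP address and that
# IP address resolves back to a name containing HOST.
check_dns() {
  ip=$(getent hosts "$1" | awk '{print $1; exit}') || return 1
  [ -n "$ip" ] || return 1
  # Reverse entry must map the IP back to the original hostname.
  getent hosts "$ip" | grep -q "$1"
}

# Example usage (placeholder hostname):
#   check_dns kdc.example.com && echo "forward and reverse DNS OK"
```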
If you are deploying on Red Hat OpenShift Kubernetes Service (ROKS), the kernel semaphore settings are too low for Db2 Big SQL to perform properly, and you must update the settings. For more information, see Provisioning Db2 Big SQL on ROKS.
Security Context Constraint requirements
Db2 Big SQL requires a specific Red Hat OpenShift Security Context Constraint (SCC) to be applied to the identity that runs the Db2 Big SQL pods. The SCC is created and applied by the installation and deployment process.
Db2 Big SQL uses the SCC capabilities in Db2. For details about the permissions that the SCC grants, see Security context constraints in Db2.