Kerberos authentication in Cloud Pak for Data

Kerberos is a network authentication protocol that uses strong cryptography to authenticate and authorize client/server applications. Cloud Pak for Data supports Kerberos in multiple services and in multiple connections to remote data sources.

Kerberos authentication in services

The following services support connecting to data with Kerberos authentication.

Analytics Engine Powered by Apache Spark

Access a remote Hadoop cluster with Kerberos authentication from an Analytics Engine Powered by Apache Spark instance to run Spark jobs. See Running a Spark job on a secure Hadoop cluster in Accessing a remote Hadoop cluster.
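As a rough illustration, a job submission against a Kerberized cluster carries the principal and keytab through standard Spark configuration properties. The sketch below builds such a payload; the field names of the request body are hypothetical and not the exact Analytics Engine API schema, so check the job-submission documentation for the real format. Only `spark.kerberos.principal` and `spark.kerberos.keytab` are standard Spark properties.

```python
import json

def build_spark_job_payload(app_path, principal, keytab_path):
    """Assemble a hypothetical job request that passes Kerberos
    credentials to Spark via its standard configuration properties."""
    return {
        "application_details": {
            "application": app_path,
            "conf": {
                # Standard Spark properties for keytab-based Kerberos login.
                "spark.kerberos.principal": principal,
                "spark.kerberos.keytab": keytab_path,
            },
        }
    }

payload = build_spark_job_payload(
    "/myapp/wordcount.py",
    "sparkuser@EXAMPLE.COM",
    "/myapp/sparkuser.keytab",
)
print(json.dumps(payload, indent=2))
```

The keytab must be readable from the path given, for example by mounting it into the instance storage before the job runs.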

Db2® Big SQL

Db2 Big SQL connects to an existing remote big data storage system. One option is a Hadoop cluster on Cloudera Data Platform on which Kerberos security is enabled. See Preparing to install Db2 Big SQL.

When you provision the Db2 Big SQL service, you can set up Db2 Big SQL to automate the creation of principals and keytabs when Kerberos security is enabled on a Hadoop cluster. If Active Directory (AD) is used for Kerberos, you upload a custom keytab file. For information, see the procedure Setting up a connection from Db2 Big SQL to a remote data source.

Alternatively, you can connect Db2 Big SQL to a Cloudera Hadoop cluster that is secured by Kerberos when Cloudera Manager does not manage the Kerberos configuration. In this case, you must update the Db2 Big SQL secret after you provision a Db2 Big SQL instance. See Connecting Db2 Big SQL to a Hadoop cluster with a manually managed Kerberos configuration.
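When the Kerberos configuration is managed manually, the keytab and krb5.conf material ends up in a Kubernetes secret, whose `data` values must be base64-encoded. A minimal sketch of preparing such a manifest is below; the secret name and data keys are hypothetical, so use the names documented for your Db2 Big SQL instance.

```python
import base64

def secret_manifest(name, files):
    """Build a Kubernetes Secret manifest, base64-encoding each file's
    bytes as the Kubernetes API requires for the `data` field."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(content).decode("ascii")
            for key, content in files.items()
        },
    }

manifest = secret_manifest(
    "bigsql-kerberos-config",  # hypothetical secret name
    {
        # Raw keytab bytes, as exported from your KDC.
        "krb5.keytab": b"\x05\x02...",
        "krb5.conf": b"[libdefaults]\n default_realm = EXAMPLE.COM\n",
    },
)
```

The resulting manifest can be applied with `oc apply` or `kubectl apply` to update the secret after the instance is provisioned.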

Execution Engine for Apache Hadoop

For the edge node software requirements for clusters with Kerberos security enabled, and for delegation token endpoints, see Installing Execution Engine for Apache Hadoop on Apache Hadoop clusters.

If the Apache Hadoop cluster is enabled for Kerberos, you can use delegation tokens for authentication when you access specific Hadoop services, such as HDFS, Hive, or HMS. For information, see Using delegation token endpoints.
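The general delegation-token pattern, as defined by the standard WebHDFS REST API, is two steps: obtain a token while authenticated with Kerberos (SPNEGO), then pass the token on later calls via the `delegation` query parameter so those calls need no Kerberos ticket. The sketch below only constructs the URLs for the two steps; the host and paths are placeholders.

```python
from urllib.parse import urlencode

# Placeholder WebHDFS base URL; substitute your cluster's endpoint.
BASE = "https://hadoop-edge.example.com:9871/webhdfs/v1"

def get_token_url(renewer):
    """URL for requesting a delegation token (call with SPNEGO auth)."""
    return f"{BASE}/?{urlencode({'op': 'GETDELEGATIONTOKEN', 'renewer': renewer})}"

def list_status_url(path, token):
    """URL for a follow-up call authenticated by the delegation token."""
    return f"{BASE}{path}?{urlencode({'op': 'LISTSTATUS', 'delegation': token})}"

print(get_token_url("hdfs"))
print(list_status_url("/user/demo", "token-from-step-1"))
```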

Execution Engine for Apache Hadoop is a Knox Gateway service that is enabled on the Hadoop cluster. You can use the gateway URLs and endpoints of exposed services, such as Livy, JEG, or WebHDFS, to connect to the Hadoop services (HDFS, Spark, or Hive).
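Knox gateway URLs typically follow the shape `https://<gateway-host>:8443/gateway/<topology>/<service-path>`. The sketch below illustrates that shape; the gateway host, the `default` topology name, and the Livy and JEG paths are placeholders, so take the actual URLs from your Execution Engine for Apache Hadoop registration.

```python
# Placeholder gateway base; substitute your gateway host and topology.
GATEWAY = "https://knox-gateway.example.com:8443/gateway/default"

# Illustrative service paths behind the gateway. The WebHDFS path follows
# the standard Knox convention; the Livy and JEG paths are examples only.
SERVICE_PATHS = {
    "webhdfs": "/webhdfs/v1",    # HDFS REST access
    "livy": "/livy/v1/batches",  # Spark job submission through Livy
    "jeg": "/jeg/kernels",       # Jupyter Enterprise Gateway kernels
}

def service_url(service):
    """Return the full gateway URL for an exposed service."""
    return GATEWAY + SERVICE_PATHS[service]

print(service_url("webhdfs"))
```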

Watson™ Knowledge Catalog

You can use platform connections that use Kerberos authentication in metadata import.

In legacy metadata import, automated discovery, and legacy quality tasks, you can use non-platform Hive connections with Kerberos. For details, see Configuring Hive with Kerberos for quality tasks (Watson Knowledge Catalog).
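A Hive connection to a Kerberos-secured HiveServer2 is usually expressed as a JDBC URL that carries the Hive service principal. The sketch below builds such a URL; the host, port, and realm are placeholders, and `_HOST` is a standard token that the driver expands to the server's hostname.

```python
def hive_jdbc_url(host, port, database, service_principal):
    """Build a Hive JDBC URL with the Kerberos service principal,
    the usual form for connecting to a Kerberized HiveServer2."""
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        f";principal={service_principal}"
    )

url = hive_jdbc_url("hive.example.com", 10000, "default",
                    "hive/_HOST@EXAMPLE.COM")
print(url)
# jdbc:hive2://hive.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM
```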

Watson OpenScale

Watson OpenScale supports monitoring batch models by using Spark jobs in Kerberos-enabled Spark engines.

Kerberos authentication in platform connections

The following connections to remote data sources are supported in a Kerberos-enabled environment: