Kerberos authentication in Cloud Pak for Data

Kerberos is a network authentication protocol that uses strong secret-key cryptography to authenticate client/server applications over an insecure network. Cloud Pak for Data supports Kerberos in multiple services and in multiple connections to remote data sources.

Kerberos authentication in services

The following services support connecting to data with Kerberos authentication.

Data Virtualization
Data Virtualization supports connections to the following data sources with Kerberos authentication:
  • Apache Hive
  • Cloudera Impala
  • Apache Spark SQL
For more information, see Enabling Kerberos authentication in Data Virtualization.
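When you connect to one of these sources with Kerberos, the connection details typically include the service principal of the data source. The following sketch assembles a Kerberos-enabled Hive JDBC URL; the host, port, and principal values are illustrative assumptions, not Cloud Pak for Data defaults.

```python
# Sketch: assemble a Kerberos-enabled Hive JDBC URL.
# The host, port, and principal below are illustrative assumptions.

def hive_kerberos_jdbc_url(host: str, port: int, principal: str) -> str:
    """Return a Hive JDBC URL that requests Kerberos (SASL) authentication."""
    return f"jdbc:hive2://{host}:{port}/default;principal={principal}"

url = hive_kerberos_jdbc_url(
    "hive.example.com", 10000, "hive/hive.example.com@EXAMPLE.COM"
)
print(url)
```

The `principal=` property in the URL is what tells the Hive JDBC driver to negotiate Kerberos instead of username/password authentication.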

DataStage

DataStage supports Kerberos authentication in many of its connections. For more information, see the individual connections at Supported data sources in DataStage.
Db2 Big SQL

Db2 Big SQL connects to an existing remote big data storage system. One option is a Hadoop cluster on Cloudera Data Platform on which Kerberos security is enabled. See Preparing to install Db2 Big SQL.

When you provision the Db2 Big SQL service, you can set up Db2 Big SQL to automate the creation of principals and keytabs when Kerberos security is enabled on a Hadoop cluster. If Active Directory (AD) is used for Kerberos, you upload a custom keytab file. For information, see the procedure Setting up a connection from Db2 Big SQL to a remote data source.
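If you must supply a custom keytab file (for example, in the Active Directory case), the keytab is typically created with a Kerberos tool such as MIT `ktutil`. The fragment below is illustrative only; the principal name, key version number, encryption type, and output path are assumptions that must match your KDC configuration.

```
# Illustrative only: create a keytab for a service principal with MIT ktutil.
ktutil
addent -password -p bigsql/host.example.com@EXAMPLE.COM -k 1 -e aes256-cts-hmac-sha1-96
wkt /tmp/bigsql.keytab
quit
```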

Alternatively, you can connect Db2 Big SQL to a Cloudera Hadoop cluster that is secured by Kerberos when Cloudera Manager does not manage the Kerberos configuration. In this case, you must update the Db2 Big SQL secret after you provision a Db2 Big SQL instance. See Connecting Db2 Big SQL to a Hadoop cluster with a manually managed Kerberos configuration.

Execution Engine for Apache Hadoop

For edge node software requirements for clusters with Kerberos security enabled, and for delegation token endpoints, see Installing Execution Engine for Apache Hadoop on Apache Hadoop clusters.

If the Apache Hadoop cluster is enabled for Kerberos, you can use delegation tokens for authentication when you access specific Hadoop services, such as HDFS, Hive, or HMS. For more information, see Using delegation token endpoints.
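As a sketch of how a delegation token is presented to a Hadoop service over HTTP, the helper below attaches a token to a WebHDFS-style request URL through the standard `delegation` query parameter. The base URL and token value are illustrative assumptions.

```python
# Sketch: append a Hadoop delegation token to a WebHDFS request URL.
# The base URL and token value are illustrative assumptions.
from urllib.parse import urlencode

def with_delegation_token(base_url: str, op: str, token: str) -> str:
    """Build a WebHDFS URL that authenticates with a delegation token."""
    query = urlencode({"op": op, "delegation": token})
    return f"{base_url}?{query}"

url = with_delegation_token(
    "https://gateway.example.com/webhdfs/v1/tmp/data.csv",
    "OPEN",
    "token-placeholder",
)
print(url)
```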

Execution Engine for Apache Hadoop is a Knox Gateway service that is enabled on the Hadoop cluster. You can use the gateway URLs and the endpoints of exposed services, such as Livy, JEG, or WebHDFS, to connect to the Hadoop services (HDFS, Spark, or Hive).

MANTA Automated Data Lineage

You can use platform connections that use Kerberos authentication in metadata import.

IBM Knowledge Catalog

You can use platform connections that use Kerberos authentication in metadata import.

Watson OpenScale

Watson OpenScale supports monitoring batch models by using Spark jobs in Kerberos-enabled Spark engines.

Kerberos authentication in platform connections

Certain connections support Kerberos SSO. With Kerberos SSO, the user does not enter a username, password, user principal name, or keytab file for the data source. Instead, the Cloud Pak for Data login credentials are used to authenticate the user. As a prerequisite, an administrator must configure Kerberos SSO. For more information, see Enabling platform connections to use Kerberos SSO authentication. The user who creates the connection must use personal credentials.
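Kerberos SSO presumes a working Kerberos client configuration in the environment. A minimal krb5.conf along the following lines is typical; the realm and KDC host names are placeholders, not Cloud Pak for Data defaults.

```
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
```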

The following connections to remote data sources are supported in a Kerberos-enabled environment: