Preventive Service Planning
Abstract
The following known issues, limitations, and workarounds apply when you run automated discovery or quick scan on Hive or HDFS connections in Watson Knowledge Catalog for IBM Cloud Pak for Data.
Content
Existing connections in upgrade installations
If you have existing Hive Kerberos connections, a project administrator must complete several steps before and after upgrading the Watson Knowledge Catalog service in IBM Cloud Pak for Data.
Before upgrading
As a project administrator, back up the driver configuration for Kerberos authentication. Log in to the conductor pod (usually the is-en-conductor-0 pod) and create a backup copy of the /opt/IBM/InformationServer/ASBNode/lib/java/JDBCDriverLogin.conf file.
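For example, the backup can be pulled out of the pod with the OpenShift CLI. This is a minimal sketch, assuming the pod is named is-en-conductor-0 and runs in a namespace called zen; adjust both for your deployment:

```sh
# Copy the Kerberos JAAS driver configuration out of the conductor pod.
# Pod name and namespace are assumptions; verify yours with "oc get pods".
oc cp zen/is-en-conductor-0:/opt/IBM/InformationServer/ASBNode/lib/java/JDBCDriverLogin.conf \
  ./JDBCDriverLogin.conf.bak
```

Copying the file off the pod (rather than duplicating it inside the pod) matters here, because the pod's file system can be replaced when the service is upgraded.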
After upgrading
As a project administrator, complete these steps:
- Log in to the conductor pod (usually the is-en-conductor-0 pod) and replace the /opt/IBM/InformationServer/ASBNode/lib/java/JDBCDriverLogin.conf file with the backup copy that you created before the upgrade (a command sketch follows this list).
- For connections that you want to use for automated discovery, complete steps 8 through 12 of the procedure described in Configuring Hive with Kerberos for quality tasks.
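A matching restore sketch, under the same assumptions about pod name and namespace as the backup example above:

```sh
# Copy the saved configuration back into the upgraded conductor pod.
oc cp ./JDBCDriverLogin.conf.bak \
  zen/is-en-conductor-0:/opt/IBM/InformationServer/ASBNode/lib/java/JDBCDriverLogin.conf
```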
New connections
| Data source | Connection type | Authentication type | Quick scan | Automated discovery |
|---|---|---|---|---|
| HDFS | Third party: Apache HDFS | Without additional authentication mechanism | Not supported | Platform connection |
| HDFS | Third party: Apache HDFS | With Knox gateway | Not supported | Platform connection |
| HDFS | Third party: Apache HDFS | With Kerberos | Not supported | Connection created through metadata import with additional configuration. Metadata import must be enabled, and the user setting up the connection must have the Access advanced governance capability and Metadata import permissions. Connector to use: Apache > File connector - HDFS. For details, see Metadata import connectors. |
| Hive | Third party: Apache Hive | Without additional authentication mechanism | Platform connection | Platform connection |
| Hive | Third party: Apache Hive | With Knox gateway | Platform connection | Platform connection |
| Hive | Third party: Apache Hive | With Kerberos | Connection created through metadata import with additional configuration. Metadata import must be enabled, and the user setting up the connection must have the Access advanced governance capability and Metadata import permissions. Connector to use: IBM > JDBC connector. For details about the additional configuration steps, see Configuring Hive with Kerberos for quick scan. | Connection created through metadata import with additional configuration. Metadata import must be enabled, and the user setting up the connection must have the Access advanced governance capability and Metadata import permissions. Connector to use: IBM > JDBC connector. For details, see Configuring Hive with Kerberos for quality tasks. |
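For orientation, a JDBC connection URL for a Kerberos-secured HiveServer2 typically carries the Hive service principal in the connection string. The host, port, database, and realm below are placeholder values for illustration, not values from this technote:

```
# Hypothetical Hive JDBC URL with a Kerberos service principal.
jdbc:hive2://hive-host.example.com:10000/default;principal=hive/hive-host.example.com@EXAMPLE.COM
```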
Considerations when setting up data discovery jobs for Hive or HDFS connections
- Connections created through metadata import are automatically shown in the connection selection list when you create a new discovery job.
- When you create a discovery job for a platform-level Hive connection, you might need to add the connection twice.
- When you set up an automated discovery job for an HDFS connection, browsing the discovery root might not work. As a workaround, select a different connection, then switch back to the original HDFS connection; browsing then works, and you can select the folder to analyze.
[{"Line of Business":{"code":"LOB10","label":"Data and AI"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSHGYS","label":"IBM Cloud Pak for Data"},"ARM Category":[{"code":"a8m50000000ClVnAAK","label":"Organize->Discovery"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"3.5.0"}]
Document Information
Modified date:
20 November 2020
UID
ibm16366639