Configuring Hive connections with Kerberos authentication for lineage imports (IBM Knowledge Catalog)

To import lineage for assets from Hive connections with Kerberos authentication, complete several configuration steps.

Lineage is an optional feature of MANTA Automated Data Lineage for IBM Cloud Pak for Data. You must explicitly enable it before you can import lineage metadata.

Hive connections are not validated when you configure a metadata import that gets lineage from such a connection. For standard Hive data sources, or for Hive data sources with Kerberos that are connected through Knox, a MANTA Automated Data Lineage for IBM Cloud Pak for Data connection is created with default values because IBM Knowledge Catalog does not have access to the actual configuration.

To enable lineage imports from such connections, add the configuration files to your Cloud Pak for Data cluster and configure Kerberos authentication in MANTA Automated Data Lineage for IBM Cloud Pak for Data:

  1. Get the following configuration files from the Hive server:

    • For authentication method Keytab: the Kerberos keytab file and the Kerberos configuration file (krb5.conf)
    • For authentication method JAAS: the JAAS login configuration file
  2. Copy these files to the manta-admin-gui pod:

    • For authentication method Keytab:

      oc cp <keytab_file> <manta-admin-gui-pod-id>:/conf/keytabs/wkc.keytab
      oc cp <krb5_config_file> <manta-admin-gui-pod-id>:/etc/krb5.conf
      
    • For authentication method JAAS:

      oc cp <jaas_login_config_file> <manta-admin-gui-pod-id>:<path_to_jaas_login_config_file>
      
  3. Open the MANTA Automated Data Lineage for IBM Cloud Pak for Data Admin UI:

    https://<CPD-HOSTNAME>/manta-admin-gui/
    
  4. Go to Connections > Databases > Hive and select the connection for which you want to configure Kerberos authentication.

    In the MANTA Automated Data Lineage for IBM Cloud Pak for Data Admin UI, connections are listed by their connection IDs instead of their display names. A connection that is used for lineage import to IBM Knowledge Catalog has an ID in the format wkcconnectionID_lineage. You can find the IBM Knowledge Catalog connection ID in the URL that is shown when you open the corresponding connection asset.

  5. Check the settings and change them as required:

    • The HiveServer2 driver class name is set to org.apache.hive.jdbc.HiveDriver by default. Change it as necessary to match the actual class name.
    • For authentication mode Kerberos, the default authentication method is Keytab. If necessary, you can change it to JAAS.
  6. Update the following settings based on the selected authentication method:

    • For authentication method Keytab: the path to the keytab file and the path to the krb5 configuration file in the manta-admin-gui pod in the Cloud Pak for Data cluster
    • For authentication method JAAS: the path to the JAAS login configuration file in the manta-admin-gui pod in the Cloud Pak for Data cluster
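
The copy commands in step 2 can be sketched as a small dry-run script. This is a minimal sketch, not part of the product: the function names, the pod name manta-admin-gui-0, and the local file names are hypothetical placeholders. The functions only print the oc commands so that you can review them before you run anything. The sketch assumes that your current oc project is the one where the manta-admin-gui pod runs, that the paths contain no whitespace, and, for the verification command, that ls is available in the pod image.

```shell
# Dry-run helpers: each function prints oc commands instead of running them.
# All function names and example values are hypothetical placeholders.

# Keytab method: copy the keytab and krb5 configuration files to the
# target paths that are used in step 2.
# Arguments: <pod> <local_keytab_file> <local_krb5_config_file>
keytab_copy_commands() {
  printf 'oc cp %s %s:/conf/keytabs/wkc.keytab\n' "$2" "$1"
  printf 'oc cp %s %s:/etc/krb5.conf\n' "$3" "$1"
}

# JAAS method: copy the JAAS login configuration file to a path of your choice.
# Arguments: <pod> <local_jaas_file> <target_path_in_pod>
jaas_copy_command() {
  printf 'oc cp %s %s:%s\n' "$2" "$1" "$3"
}

# Optional check that the Keytab files arrived (assumes ls exists in the pod).
# Arguments: <pod>
keytab_verify_command() {
  printf 'oc exec %s -- ls -l /conf/keytabs/wkc.keytab /etc/krb5.conf\n' "$1"
}

# Example with placeholder values; review the printed commands before running them.
keytab_copy_commands manta-admin-gui-0 ./hive.keytab ./krb5.conf
keytab_verify_command manta-admin-gui-0
```

For the Keytab method, the target paths printed by the sketch (/conf/keytabs/wkc.keytab and /etc/krb5.conf) are the paths to enter in step 6. For the JAAS method, enter the same target path that you passed to the copy command.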
