Hive via Execution Engine for Hadoop connection

You can create a connection asset for Hive via Execution Engine for Hadoop.

Use the Hive via Execution Engine for Hadoop connection to connect to tables in a Hive warehouse on the Hadoop cluster.

Prerequisites

  • Your administrator must register the Hadoop cluster from the Hadoop Execution Engine panel. Ask your administrator for the URL.
  • You must create an environment runtime definition for Hadoop in your project.
  • You need the SSL certificate that is associated with the connection URL.
  • You need the SSL certificate for the Hive server if the Hive server is SSL-enabled.
  • Download the HiveJDBC41.jar file from the Cloudera website:
    1. Select the latest version of the Hive JDBC Driver.
    2. Click GET IT NOW, and then download and extract the hive_jdbc_#.#.#.####.zip file.
    3. Extract the ClouderaHiveJDBC41-#.#.#.####.zip file. The HiveJDBC41.jar file is in the extracted contents.
    4. Upload the file to Cloud Pak for Data. For the procedure and the permissions that are required to upload the JAR file, see [Importing JDBC drivers]({{ site.data.keyword.swdocs }}/hub/admin/jdbc-drivers.html){: external} in the {{ site.data.keyword.swhub }} documentation. To confirm that the JAR file contains the driver class before you upload it, see the sketch after this list.

       Important: By default, uploading JDBC driver files is disabled and users cannot view the list of JDBC drivers in the web client. An administrator must first [enable users to upload or view JDBC drivers]({{ site.data.keyword.swdocs }}/hub/admin/post-install-enable-jdbc-upload.html){: external}; see the {{ site.data.keyword.swhub }} documentation.
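
Before you upload the JAR file, you can optionally confirm that it loads the Cloudera driver class. The following is a minimal sketch, which assumes the driver class name com.cloudera.hive.jdbc41.HS2Driver; verify the class name against the documentation for the driver version that you downloaded.

```java
// Minimal check that HiveJDBC41.jar is present and that the driver class loads.
// The class name is an assumption based on the Cloudera JDBC 4.1 driver;
// confirm it in the driver's installation guide for your version.
public class DriverCheck {
    public static void main(String[] args) throws Exception {
        Class<?> driver = Class.forName("com.cloudera.hive.jdbc41.HS2Driver");
        System.out.println("Loaded driver class: " + driver.getName());
    }
}
```

Compile the class and run it with HiveJDBC41.jar on the classpath, for example: java -cp HiveJDBC41.jar:. DriverCheck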

Supported encryption

  • SSL Certificate
  • This connection supports connecting to a Hadoop environment that is secured by Kerberos.

Prerequisites for Kerberos authentication

If you plan to use Kerberos authentication, complete the following requirements:

  • Configure the data source for Kerberos authentication. Optional: This connection supports Kerberos SSO with user impersonation, which requires additional configuration.
  • Confirm that the service with which you plan to use the connection supports Kerberos. For more information, see Kerberos authentication in Cloud Pak for Data. (A sketch of a keytab-based Kerberos login follows this list.)
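
To illustrate the first requirement, a client that authenticates to a Kerberos-secured cluster typically logs in with a principal and a keytab before it opens any connections. The following is a minimal sketch that uses the Hadoop UserGroupInformation API; the principal name and keytab path are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Hedged sketch of a keytab-based Kerberos login with the Hadoop client API.
// The principal and keytab path are placeholders for illustration only.
public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "hiveuser@EXAMPLE.COM",                   // hypothetical principal
                "/etc/security/keytabs/hiveuser.keytab"); // hypothetical keytab path
        System.out.println("Logged in as: " + UserGroupInformation.getCurrentUser());
    }
}
```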

Credentials

Platform login credentials

Create a Hive via Execution Engine for Hadoop connection to the Hive warehouse on the Hadoop cluster

  1. From your project, on the Assets tab, click New asset > Connect to a data source.

  2. Select Hive via Execution Engine for Hadoop.

  3. Enter a name, a description, and the connection information.

  4. Select your platform login credentials.

    Note: Other users who use the connection must supply their own Cloud Pak for Data credentials.
  5. In the Jar uris drop-down list, upload the HiveJDBC41.jar file if it is not already there, and then select it.

  6. In the SSL Certificate field, enter the SSL certificate for the connection URL (the value that is labeled URL under the registration details in Administration > Configuration and settings > Hadoop Execution Engine). If the Hive server is SSL-enabled, append the certificate for the Hive server as well. To test these certificates locally, see the sketches that follow this procedure.
    Example with two certificates:

    -----BEGIN CERTIFICATE-----
    certificate from the connection URL
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    certificate from the Hive server
    -----END CERTIFICATE-----
    
  7. Enter the URL for accessing the Hadoop Integration Service.

    Important: The Hadoop Integration Service URL must be the same as the URL in the Hadoop Registration Details. The administrator can confirm the URL from Administration > Configuration and settings > Hadoop Execution Engine.
  8. Click Create.
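
For context, the connection asset that you create encapsulates a JDBC connection to HiveServer2. The following is a minimal sketch of such a connection over SSL with the Cloudera driver; the host, port, truststore path, and credentials are placeholders, and the URL property names (AuthMech, SSL, SSLTrustStore, SSLTrustStorePwd) should be verified against the Cloudera JDBC driver guide for your driver version.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged sketch: open a JDBC connection to HiveServer2 over SSL and list tables.
// All connection details are placeholders. The truststore is assumed to contain
// the two certificates from step 6 (see the truststore sketch that follows).
public class HiveConnect {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://hive-host.example.com:10000/default;"
                + "AuthMech=3;SSL=1;"
                + "SSLTrustStore=/path/to/truststore.jks;"
                + "SSLTrustStorePwd=changeit";
        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```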

Next step: Add data assets from the connection
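
If you want to test the certificates from step 6 outside the web client (for example, with the JDBC sketch earlier in this topic), you can load the same concatenated PEM block into a local truststore. The following is a hedged sketch; the file names and the truststore password are placeholders.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.util.Collection;

// Hedged sketch: read a PEM file that contains both certificates (the
// connection URL certificate followed by the Hive server certificate)
// and store them in a JKS truststore. File names and the password are
// placeholders for illustration.
public class BuildTrustStore {
    public static void main(String[] args) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        KeyStore ks = KeyStore.getInstance("JKS");
        ks.load(null, null); // initialize an empty truststore
        try (FileInputStream in = new FileInputStream("certs.pem")) {
            Collection<? extends Certificate> certs = cf.generateCertificates(in);
            int i = 0;
            for (Certificate cert : certs) {
                ks.setCertificateEntry("cert-" + i++, cert);
            }
        }
        try (FileOutputStream out = new FileOutputStream("truststore.jks")) {
            ks.store(out, "changeit".toCharArray());
        }
    }
}
```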

Federal Information Processing Standards (FIPS) compliance

The Hive via Execution Engine for Hadoop connection cannot be created in a FIPS environment.

Restrictions

  • This feature is not supported on Hortonworks 3.x clusters.
  • For Data Refinery, you can use this connection only as a source. You cannot use this connection as a target connection or as a target connected data asset. For a workaround, see Refining data that is stored in tables in a Hive warehouse.
  • For SPSS Modeler, you can use this connection only to import data. You cannot export data to this connection or to a Hive via Execution Engine for Hadoop connected data asset.

Known issues

Troubleshooting Hadoop environments

Parent topic: Supported connections