Hive via Execution Engine for Hadoop connection
You can create a connection asset for Hive via Execution Engine for Hadoop.
Use the Hive via Execution Engine for Hadoop connection to connect to tables in a Hive warehouse on the Hadoop cluster.
Prerequisites
- Your administrator must register the Hadoop cluster from the Systems integration panel. Ask your administrator for the URL.
- You must create an environment runtime definition for Hadoop in your project.
- SSL certificate associated with the connection URL.
- SSL certificate for the Hive server if the Hive server is SSL-enabled.
- Download the HiveJDBC41.jar file from the Cloudera website:
  - Select the latest version of the Hive JDBC Driver.
  - Click GET IT NOW, and then download and extract the hive_jdbc_#.#.#.####.zip file.
  - Extract the ClouderaHiveJDBC41-#.#.#.####.zip file. The HiveJDBC41.jar file is in the extracted contents.
  - Upload the file to Cloud Pak for Data. See Importing JDBC drivers for the procedure and the required permissions to upload the JAR file to Cloud Pak for Data.
Supported encryption
- SSL Certificate
- This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Credentials
Platform login credentials
Create a Hive via Execution Engine for Hadoop connection to the Hive warehouse on the Hadoop cluster
1. From your project, on the Assets tab, click New asset > Connection.
2. Select Hive via Execution Engine for Hadoop.
3. Enter a name, a description, and the connection information.
4. Select your platform login credentials.
   Note: For other users to use the connection, they must supply their own Cloud Pak for Data credentials.
5. In the Jar uris drop-down list, upload the HiveJDBC41.jar file if it is not already there, and then select it.
6. In the SSL Certificate field, enter the SSL certificate for the connection URL (the URL labeled URL), which is found under the registration details in Administration > Platform configuration > Systems integration. If the Hive server is SSL-enabled, also enter the certificate for the Hive server.
   Example with two certificates:
   -----BEGIN CERTIFICATE-----
   certificate from the connection URL
   -----END CERTIFICATE-----
   -----BEGIN CERTIFICATE-----
   certificate from the Hive server
   -----END CERTIFICATE-----
7. Enter the URL for accessing the Hadoop Integration Service.
   Important: The Hadoop Integration Service URL must be the same as the URL in the Hadoop registration details. The administrator can confirm the URL from Administration > Platform configuration > Systems integration.
8. Click Create.
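Before or after you create the connection, you can optionally confirm outside of Cloud Pak for Data that the driver JAR, the JDBC URL, and the SSL certificates work together. The following is a minimal sketch, not part of the product procedure: it assumes the jaydebeapi Python package, placeholder host, port, schema, truststore, and credentials, and the Cloudera driver class name com.cloudera.hive.jdbc41.HS2Driver. The class name and the SSL connection properties depend on your driver version, so confirm them in the Cloudera driver documentation.

```python
# Minimal sketch: test the Cloudera Hive JDBC driver and SSL setup from Python.
# Host, port, schema, truststore path, and credentials are placeholders.
import jaydebeapi

conn = jaydebeapi.connect(
    "com.cloudera.hive.jdbc41.HS2Driver",          # driver class; depends on driver version
    "jdbc:hive2://hive-host.example.com:10000/default;"
    "SSL=1;SSLTrustStore=/path/to/truststore.jks;SSLTrustStorePwd=changeit",
    {"user": "your-user", "password": "your-password"},
    jars=["/path/to/HiveJDBC41.jar"],              # the JAR downloaded from Cloudera
)

cursor = conn.cursor()
cursor.execute("SHOW TABLES")
print(cursor.fetchall())
cursor.close()
conn.close()
```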
Next step: Add data assets from the connection
Where you can use this connection
You can use a Hive via Execution Engine for Hadoop connection in the following workspaces and tools:
Projects
- Data Refinery (Watson Studio or Watson Knowledge Catalog). For instructions, see Refining data stored in tables in a Hive warehouse.
- Notebooks (Watson Studio). Click Read data on the Code snippets pane to get the connection credentials and load the data into a data structure. See Load data from data source connections. An illustrative sketch of such code follows this list.
- SPSS Modeler (SPSS Modeler service)
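The code that the Read data snippet generates is specific to your project and connection. As a rough illustration only, the following sketch loads a Hive table into a pandas DataFrame over a DBAPI connection, such as the jaydebeapi connection shown earlier; the table name is a placeholder.

```python
# Rough sketch: load a Hive table into a pandas DataFrame.
# "conn" is a DBAPI connection, such as the jaydebeapi connection shown earlier;
# the table name is a placeholder.
import pandas as pd

cursor = conn.cursor()
cursor.execute("SELECT * FROM default.sample_table LIMIT 100")
columns = [desc[0] for desc in cursor.description]
df = pd.DataFrame(cursor.fetchall(), columns=columns)
cursor.close()
```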
Catalogs
- Platform assets catalog
- Other catalogs (Watson Knowledge Catalog)
Federal Information Processing Standards (FIPS) compliance
The Hive via Execution Engine for Hadoop connection cannot be created in a FIPS environment.
Restrictions
- This feature is not supported on Hortonworks 3.x clusters.
- For Data Refinery, you can use this connection only as a source. You cannot use this connection as a target connection or as a target connected data asset. For a workaround, see Refining data stored in tables in a Hive warehouse.
- For SPSS Modeler, you can use this connection only to import data. You cannot export data to this connection or to a Hive via Execution Engine for Hadoop connected data asset.
Known issues
Troubleshooting Hadoop environments
Parent topic: Supported connections