Impala via Execution Engine for Hadoop connection
You can create a connection asset for Impala via Execution Engine for Hadoop.
Use the Impala via Execution Engine for Hadoop connection to connect to data that is stored in Impala tables on the Hadoop cluster.
Prerequisites
- Your administrator must register the Hadoop cluster from the Systems integration panel. Ask your administrator for the URL.
- LDAP must be enabled as part of the Impala authentication.
- You must create an environment runtime definition for Hadoop in your project.
- Optional: SSL certificate for the Impala daemon if the daemon is SSL-enabled.
- Download the ImpalaJDBC41.jar file from the Cloudera website:
  1. Select the latest version of the Impala JDBC Connector.
  2. Click GET IT NOW, and then download and extract the ClouderaImpala_JDBC-#.#.##.####.zip file.
  3. Extract the ClouderaImpalaJDBC41-#.#.##.####.zip file. The ImpalaJDBC41.jar file is in the extracted contents.
Supported encryption
- SSL Certificate (optional)
- This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Credentials
Username and password
For Credentials and Certificates, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.
Create an Impala via Execution Engine for Hadoop connection to the Hadoop cluster
1. From your project, on the Assets tab, click New asset > Connection.
2. Select Impala via Execution Engine for Hadoop.
3. Enter a name and description.
4. Enter the connection details:
   - Hostname or IP address: The hostname or IP address where the Impala daemon is available.
   - Port: The Impala daemon's port on the Hadoop cluster.
   - Port is SSL-enabled: Select this option if the Impala daemon is SSL-enabled.
   - SSL certificate: If the Impala daemon is SSL-enabled, provide its SSL certificate.
   - URL: The connection URL. It is listed under the registration details in Administration > Configuration and settings > Hadoop Execution Engine.
5. In the JDBC driver files drop-down list, upload the ImpalaJDBC41.jar file if it is not already there, and then select it. See Importing JDBC drivers for the procedure and the required permissions to upload the JAR file to Cloud Pak for Data.
   Important: Starting in version 4.8.4, uploading JDBC drivers is disabled by default. Starting in version 4.8.5, users in new installations cannot view the list of JDBC drivers in the web client by default. An administrator must enable users to upload or view JDBC drivers.
6. For Credentials, select Personal and enter the user's LDAP user ID and password.
   Note: The connection to Impala via Execution Engine for Hadoop must be a personal connection. The connection cannot be shared. Other users must enter their own credentials when they access this connection.
7. Click Create.
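The connection details above correspond to the fields of a Cloudera Impala JDBC connection URL. As a rough illustration only (the hostname and port here are placeholders, not product defaults; AuthMech=3 is the Cloudera Impala JDBC driver's setting for username/password authentication, which matches the LDAP credentials this connection requires, and SSL=1 enables TLS for an SSL-enabled daemon), a sketch of how such a URL might be assembled:

```java
// Sketch: assembling a Cloudera Impala JDBC URL from connection details
// like those entered in the steps above. Host and port are placeholders.
public class ImpalaUrlSketch {

    // AuthMech=3 selects username/password (LDAP) authentication;
    // SSL=1 enables TLS when the Impala daemon is SSL-enabled.
    static String buildUrl(String host, int port, boolean sslEnabled) {
        StringBuilder url = new StringBuilder("jdbc:impala://")
                .append(host).append(':').append(port)
                .append(";AuthMech=3");
        if (sslEnabled) {
            url.append(";SSL=1");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder host and the common Impala daemon port 21050.
        System.out.println(buildUrl("impala.example.com", 21050, true));
    }
}
```

In practice the credentials would be supplied separately (for example, as the user and password arguments to DriverManager.getConnection) rather than embedded in the URL; the platform handles this for you when you save the personal credentials with the connection.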
Next step: Add data assets from the connection
Where you can use this connection
You can use an Impala via Execution Engine for Hadoop connection in the following workspaces and tools:
Projects
- Data Refinery (Watson Studio or IBM Knowledge Catalog). For instructions, see Refining data stored in tables in Impala.
- SPSS Modeler (SPSS Modeler service)
Catalogs
- Platform assets catalog
- Other catalogs (IBM Knowledge Catalog)
Federal Information Processing Standards (FIPS) compliance
The Impala via Execution Engine for Hadoop connection cannot be created in a FIPS environment.
Restrictions
- For Data Refinery, you can use this connection only as a source. You cannot use this connection as a target connection or as a target connected data asset. For a workaround, see Refining data stored in tables in Impala.
- For SPSS Modeler, you can use this connection only to import data. You cannot export data to this connection or to an Impala via Execution Engine for Hadoop connected data asset.
Known issues
Troubleshooting Hadoop environments
Parent topic: Supported connections