Cloudera Impala connection

To access your data in Cloudera Impala, create a connection asset for it.

Cloudera Impala lets you run SQL queries directly against Apache Hadoop data that is stored in HDFS or HBase.

Supported versions

Cloudera Impala 1.3+

Prerequisite for Kerberos authentication

To use Kerberos authentication, the data source must be configured for Kerberos, and the service in which you plan to use the connection must support Kerberos. For more information, see Enabling platform connections to use Kerberos authentication.

Create a connection to Cloudera Impala

To create the connection asset, you need these connection details:

  • Database name
  • Hostname or IP address
  • Port number
  • Username and password
  • SSL certificate (if required by the database server)

For Credentials and Certificates, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.

Authentication method

You can choose either Kerberos credentials or Username and password.
For Kerberos credentials, you must complete the prerequisite for Kerberos authentication, and you need the following connection details:

  • Service principal name (SPN) that is configured for the data source
  • User principal name to connect to the Kerberized data source
  • The keytab file for the user principal name that is used to authenticate to the Key Distribution Center (KDC)
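The details that each authentication method requires can be checked before you fill in the connection form. The following is a minimal Python sketch of such a check and is not part of the platform. The function `missing_details`, the field names (`database`, `host`, `service_principal_name`, and so on), and the example values are all hypothetical; the platform's actual property names might differ.

```python
# Hypothetical field names; the platform's actual property names might differ.
COMMON_FIELDS = ("database", "host", "port")
AUTH_FIELDS = {
    "username_password": ("username", "password"),
    "kerberos": ("service_principal_name", "user_principal_name", "keytab_file"),
}


def missing_details(details, auth_method):
    """Return the names of required connection details that are absent or empty."""
    if auth_method not in AUTH_FIELDS:
        raise ValueError(f"Unknown authentication method: {auth_method!r}")
    required = COMMON_FIELDS + AUTH_FIELDS[auth_method]
    return [name for name in required if not details.get(name)]


example = {
    "database": "default",
    "host": "impala.example.com",   # placeholder hostname
    "port": 21050,                  # placeholder; use the port your server listens on
    "user_principal_name": "analyst@EXAMPLE.COM",
}
print(missing_details(example, "kerberos"))
# prints ['service_principal_name', 'keytab_file']
```

In this example the Kerberos method is selected but only the user principal name is supplied, so the sketch reports the service principal name and the keytab file as missing.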

Choose the method for creating a connection based on where you are in the platform:

In a project
Click Assets > New asset > Data access tools > Connection. See Adding a connection to a project.
In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.
In a deployment space
Click Add to space > Connection. See Adding connections to a deployment space.
In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use Cloudera Impala connections in the following workspaces and tools:

Projects

  • Data Refinery (Watson Studio or Watson Knowledge Catalog)
  • DataStage (DataStage service). See Connecting to a data source in DataStage.
  • Metadata enrichment (Watson Knowledge Catalog)
  • Metadata import (Watson Knowledge Catalog)
  • SPSS Modeler (SPSS Modeler service)

Catalogs

  • Platform assets catalog
  • Other catalogs (Watson Knowledge Catalog)

Watson Query service

You can connect to this data source from Watson Query.

Cloudera Impala setup

Cloudera Impala installation

Restriction

You can use this connection only for source data. You cannot write data to the data source or export data with this connection.

Running SQL statements

To ensure that your SQL statements run correctly, refer to the Impala SQL Language Reference for the correct syntax.
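Because this connection can be used only for source data, it can help to screen statements before you submit them. The following Python sketch is a rough heuristic, not a SQL parser and not a platform feature. The function `looks_read_only` and the keyword set are assumptions made for illustration; consult the Impala SQL Language Reference for the authoritative syntax.

```python
import re

# Assumption: statements that begin with one of these keywords only read data.
READ_ONLY_KEYWORDS = {"SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN"}


def looks_read_only(statement):
    """Heuristic: return True if the first keyword suggests a read-only statement."""
    for line in statement.splitlines():
        line = line.strip()
        # Skip blank lines and "--" line comments before the first keyword.
        if not line or line.startswith("--"):
            continue
        match = re.match(r"[A-Za-z]+", line)
        return bool(match) and match.group(0).upper() in READ_ONLY_KEYWORDS
    return False


print(looks_read_only("SELECT name FROM customers LIMIT 10"))  # prints True
print(looks_read_only("INSERT INTO customers VALUES (1)"))     # prints False
```

The check looks only at the first keyword, so it cannot detect a write that is hidden later in a statement. Treat it as a convenience and not as a safeguard.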

Learn more

Cloudera Impala documentation
