Apache Hive connection

To access your data in Apache Hive, create a connection asset for it.

Apache Hive is data warehouse software, built on top of Apache Hadoop, that provides data query and analysis.

Supported versions

Apache Hive 1.0.x, 1.1.x, 1.2.x, 2.0.x, 2.1.x, 3.0.x, 3.1.x

Prerequisites for Kerberos authentication

If you plan to use Kerberos authentication, first complete the Kerberos configuration requirements for your environment.

Create a connection to Apache Hive

To create the connection asset, you need the following connection details:

  • Database name (optional): If you do not enter a database name, you must enter the catalog name, schema name, and table name in the properties for SQL queries.
  • Hostname or IP address
  • Port number
  • HTTP path (optional): The path of the endpoint (for example, gateway, default, or hive) if the server is configured for the HTTP transport mode.
  • If required by the database server, the SSL certificate
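As an illustration, the connection details above map onto a standard Hive JDBC URL. The following is a hedged sketch, not the platform's internal logic; the hostname, port, database, and HTTP path values are placeholders, and the property names follow common Hive JDBC driver conventions:

```python
# Sketch: assemble a Hive JDBC URL from the connection details listed above.
# All values are placeholders; substitute your own host, port, database,
# and HTTP path.

def build_hive_jdbc_url(host, port, database="", http_path=None, use_ssl=False):
    """Build a JDBC URL for Apache Hive from individual connection details."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    params = []
    if http_path:
        # HTTP transport mode: name the endpoint path (e.g. gateway, default, hive)
        params.append("transportMode=http")
        params.append(f"httpPath={http_path}")
    if use_ssl:
        # SSL certificate handling is configured separately in the driver
        params.append("ssl=true")
    if params:
        url += ";" + ";".join(params)
    return url

print(build_hive_jdbc_url("hive.example.com", 10001, "sales",
                          http_path="cliservice", use_ssl=True))
# jdbc:hive2://hive.example.com:10001/sales;transportMode=http;httpPath=cliservice;ssl=true
```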

Authentication method

Username and password or Kerberos credentials
Available Kerberos selections depend on whether you select Personal or Shared credentials.

Credentials

The credentials setting determines the available authentication methods.
If you select Shared (default), you can use either username and password authentication or Kerberos authentication (without SSO). For more information, see Prerequisites for Kerberos authentication. For Kerberos, you need the following connection details:

  • Service principal name (SPN) that is configured for the data source
  • User principal name to connect to the Kerberized data source
  • The keytab file for the user principal name that is used to authenticate to the Key Distribution Center (KDC)

If you select Personal, you can enter your username and password for the server manually, use secrets from a vault, or use Kerberos authentication. For more information, see Prerequisites for Kerberos authentication. You have two choices for Kerberos:

  • Kerberos (without SSO). For Kerberos without SSO, you need the following connection details:
    • Service principal name (SPN) that is configured for the data source
    • User principal name to connect to the Kerberized data source
    • The keytab file for the user principal name that is used to authenticate to the Key Distribution Center (KDC)
  • Kerberos SSO. Select Kerberos SSO and enter the Service principal name (SPN) that is configured for the data source.
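To see how the Kerberos details fit together, here is a hedged sketch of a Kerberized Hive JDBC URL. The service principal name (SPN) is carried in the URL; the user principal and keytab are handled by the Kerberos client when it authenticates to the KDC, not by the URL itself. All names below are placeholders:

```python
# Sketch: a Kerberized Hive JDBC URL carries the service principal name (SPN)
# in the "principal" property. The SPN below is a placeholder; the user
# principal and keytab file are used by the Kerberos client to obtain a
# ticket from the Key Distribution Center (KDC) before connecting.

def build_kerberos_jdbc_url(host, port, database, service_principal):
    """Append the Hive service principal to a JDBC URL for Kerberos auth."""
    return f"jdbc:hive2://{host}:{port}/{database};principal={service_principal}"

print(build_kerberos_jdbc_url("hive.example.com", 10000, "default",
                              "hive/hive.example.com@EXAMPLE.COM"))
# jdbc:hive2://hive.example.com:10000/default;principal=hive/hive.example.com@EXAMPLE.COM
```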

ZooKeeper discovery (optional)

Select Use ZooKeeper discovery to ensure continued access to the connection in case the Apache Hive server that you log in to fails.

Prerequisites for ZooKeeper discovery:

  • ZooKeeper must be configured in your Hadoop cluster.
  • The Hive service in the Hadoop cluster must be configured for ZooKeeper, along with the ZooKeeper namespace.
  • Alternative servers must be available for failover.

Enter the ZooKeeper namespace and a comma-separated list of alternative servers in this format:
hostname1:port-number1,hostname2:port-number2,hostname3:port-number3
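As a hedged sketch of how these values are typically combined, a ZooKeeper-discovery JDBC URL lists the comma-separated alternative servers as the ZooKeeper quorum, along with the namespace. The hostnames and namespace below are placeholders:

```python
# Sketch: with ZooKeeper discovery, the JDBC URL lists the ZooKeeper quorum
# (the comma-separated alternative servers described above) instead of a
# single Hive server, plus the ZooKeeper namespace. All names are placeholders.

def build_zookeeper_jdbc_url(servers, namespace):
    """Build a Hive JDBC URL that discovers HiveServer2 through ZooKeeper.

    servers   -- list of "hostname:port-number" strings for the quorum
    namespace -- ZooKeeper namespace that the Hive service registers under
    """
    quorum = ",".join(servers)
    return (f"jdbc:hive2://{quorum}/;serviceDiscoveryMode=zooKeeper;"
            f"zooKeeperNamespace={namespace}")

print(build_zookeeper_jdbc_url(
    ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"],
    "hiveserver2"))
```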

For Credentials and Certificates, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.

Federal Information Processing Standards (FIPS) compliance

This connection is FIPS-compliant and can be used on a FIPS-enabled cluster.

Apache Hive setup

Apache Hive installation and configuration

Restriction

For all services except DataStage, you can use this connection only for source data. You cannot write to data or export data with this connection. In DataStage, you can use this connection as a target if you select Use DataStage properties in the connector's properties.

Running SQL statements

To ensure that your SQL statements run correctly, see SQL Standard Based Hive Authorization in the Apache Hive documentation for the correct syntax.
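When no database name is set on the connection, table references in SQL statements must be fully qualified, as noted in the connection details above. The following is a small illustrative sketch (the schema, table, and column names are placeholders):

```python
# Sketch: compose a SELECT statement with a schema-qualified table name,
# as required when the connection has no default database. The names used
# here (sales, orders, order_id, total) are placeholders.

def qualified_select(schema, table, columns="*", limit=None):
    """Compose a SELECT statement with a schema-qualified table name."""
    sql = f"SELECT {columns} FROM {schema}.{table}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql

print(qualified_select("sales", "orders", columns="order_id, total", limit=10))
# SELECT order_id, total FROM sales.orders LIMIT 10
```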

Learn more