Apache Hive connection
To access your data in Apache Hive, create a connection asset for it.
Apache Hive is a data warehouse software project that provides data query and analysis and is built on top of Apache Hadoop.
Supported versions
- Amazon Elastic MapReduce 2.1.4+
- Apache Hadoop Hive
- Cloudera CDH3 update 4+
- Hortonworks 1.3+
- MapR 1.2+
- Pivotal HD Enterprise 2.0.1
Create a connection to Apache Hive
To create the connection asset, you need these connection details:
- Database name
- Hostname or IP address
- Port number
- HTTP path (Optional): The path of the endpoint, such as gateway, default, or hive. Specify it only if the server is configured for HTTP transport mode.
- Username and password
- SSL certificate (if required by the database server)
For Credentials and Certificates, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.
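To show how these connection details relate to one another, the following minimal sketch assembles a HiveServer2-style JDBC URL from them. It is an illustration only and not necessarily what the platform generates internally. The function name `build_hive_jdbc_url`, the host `hive.example.com`, the port `10001`, and the HTTP path `cliservice` are assumptions chosen for the example. The username and password are intentionally left out of the URL because they are normally supplied separately.

```python
def build_hive_jdbc_url(host, port, database, http_path=None, use_ssl=False):
    """Assemble a HiveServer2 JDBC URL from the connection details.

    Hypothetical helper for illustration; credentials are passed separately.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    params = []
    if http_path:
        # HTTP transport mode needs both the mode flag and the endpoint path.
        params.append("transportMode=http")
        params.append(f"httpPath={http_path}")
    if use_ssl:
        # Request an SSL-encrypted connection to the server.
        params.append("ssl=true")
    if params:
        # Session parameters follow the database name, separated by semicolons.
        url += ";" + ";".join(params)
    return url


# Illustrative values only; replace them with your own connection details.
print(build_hive_jdbc_url("hive.example.com", 10001, "default",
                          http_path="cliservice", use_ssl=True))
```

With these illustrative values, the sketch prints `jdbc:hive2://hive.example.com:10001/default;transportMode=http;httpPath=cliservice;ssl=true`. If you omit the HTTP path and SSL, only the hostname, port, and database name appear in the URL.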
Choose the method for creating a connection based on where you are in the platform:
In a project
Click Add to project > Connection. See Adding a connection to a project.
In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.
In a deployment space
Click Add to space > Connection. See Adding connections to a deployment space.
In the Platform assets catalog
Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Where you can use this connection
You can use the Apache Hive connection in the following workspaces and tools:
Analytics projects
- Data Refinery (Watson Studio or Watson Knowledge Catalog)
- DataStage (DataStage service)
- Metadata import (Watson Knowledge Catalog)
- Notebooks (Watson Studio). Use the insert-to-code function to get the connection credentials and load the data into a data structure. See Load data from data source connections.
- SPSS Modeler (SPSS Modeler service)
Catalogs
- Platform assets catalog
- Other catalogs (Watson Knowledge Catalog)
Data Virtualization service
You can connect to this data source from Data Virtualization.
Apache Hive setup
See Apache Hive installation and configuration.
Restrictions
- For Data Refinery, you can use this connection only as a source. You cannot use this connection as a target connection or as a target connected data asset.
- For SPSS Modeler, you can use this connection only to import data. You cannot export data to this connection or to an Apache Hive connected data asset.
Running SQL statements
To ensure that your SQL statements run correctly, refer to the Apache Hive documentation for the correct syntax.
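As one example of a syntax detail that differs from some other SQL dialects, HiveQL quotes identifiers with backticks. The following minimal sketch builds a read-only SELECT statement with quoted identifiers. The helper names `quote_hive_identifier` and `build_select`, the table `web_logs`, and its columns are hypothetical. The sketch assumes that the server supports backtick-quoted identifiers and that a doubled backtick represents a literal backtick, which depends on the Hive version and configuration, so confirm the behavior in the Apache Hive documentation.

```python
def quote_hive_identifier(name):
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_select(table, columns, limit=10):
    """Build a read-only HiveQL SELECT statement with quoted identifiers."""
    cols = ", ".join(quote_hive_identifier(c) for c in columns)
    # int() rejects a non-numeric limit before it reaches the statement text.
    return f"SELECT {cols} FROM {quote_hive_identifier(table)} LIMIT {int(limit)}"


# Hypothetical table and column names for illustration.
print(build_select("web_logs", ["user id", "date"]))
```

With these hypothetical names, the sketch prints ``SELECT `user id`, `date` FROM `web_logs` LIMIT 10``. A read-only statement is used here because some tools, such as Data Refinery, can use this connection only as a source.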