Apache HDFS connection
To access your data in Apache HDFS, create a connection asset for it.
Apache Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. Apache HDFS was formerly Hortonworks HDFS.
Create a connection to Apache HDFS
To create the connection asset, you need these connection details:
- WebHDFS URL to access HDFS
- Hive Database
- Hive Host: Hostname or IP Address of Apache Hive server
- Hive Port number and HTTP Path
- Hive User and Password
- Username and password
- SSL certificate (if required by the database server)
For Credentials and Certificates, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.
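As an illustration of the WebHDFS URL in the list above, here is a minimal Python sketch that builds a WebHDFS REST request URL from a base URL, an HDFS path, and an operation. The host name, port, path, and user are hypothetical placeholders, and the helper `build_webhdfs_url` is not part of the platform. It assumes that the base URL ends with the standard `/webhdfs/v1` prefix and that the cluster accepts the `user.name` query parameter, which applies only when Hadoop security is off.

```python
from urllib.parse import quote, urlencode


def build_webhdfs_url(base_url, hdfs_path, op, user=None):
    """Return a WebHDFS REST URL that runs `op` on `hdfs_path`.

    base_url is assumed to end with the /webhdfs/v1 prefix.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be an absolute HDFS path")
    params = {"op": op}
    if user:
        # Simple (non-Kerberos) authentication only; an assumption here.
        params["user.name"] = user
    return f"{base_url.rstrip('/')}{quote(hdfs_path)}?{urlencode(params)}"


# Hypothetical values for illustration only.
url = build_webhdfs_url(
    "https://namenode.example.com:9871/webhdfs/v1",
    "/data/sales",
    "LISTSTATUS",
    user="hdfs_user",
)
print(url)
```

Opening a URL of this shape in a browser or HTTP client is one way to check that the WebHDFS endpoint is reachable before you enter it in the connection form.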
Choose the method for creating a connection based on where you are in the platform:
In a project
Click Add to project > Connection. See Adding a connection to a project.
In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.
In a deployment space
Click Add to space > Connection. See Adding connections to a deployment space.
In the Platform assets catalog
Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Where you can use this connection
You can use Apache HDFS connections in the following workspaces and tools:
Analytics projects
- Data Refinery (Watson Studio or Watson Knowledge Catalog)
- DataStage (DataStage)
- Metadata import (Watson Knowledge Catalog)
- Notebooks (Watson Studio). Use the insert-to-code function to get the connection credentials and load the data into a data structure. See Load data from data source connections.
- SPSS Modeler (SPSS Modeler service)
Catalogs
- Platform assets catalog
- Other catalogs (Watson Knowledge Catalog)
Apache HDFS setup
Install and set up a Hadoop cluster
Supported file types
The Apache HDFS connection supports these file types: Avro, CSV, Delimited text, Excel, JSON, ORC, Parquet, SAS, SAV, SHP, and XML.
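To check a file name against this list before you add it as a data asset, a small helper along the following lines might be useful. This is a sketch only: the mapping from file extensions to types is an assumption based on common naming conventions, not something the connection defines, and `supported_file_type` is a hypothetical name.

```python
from pathlib import PurePosixPath

# Assumed extension-to-type mapping; adjust it to match your own files.
SUPPORTED_EXTENSIONS = {
    ".avro": "Avro",
    ".csv": "CSV",
    ".txt": "Delimited text",
    ".xls": "Excel",
    ".xlsx": "Excel",
    ".json": "JSON",
    ".orc": "ORC",
    ".parquet": "Parquet",
    ".sas7bdat": "SAS",
    ".sav": "SAV",
    ".shp": "SHP",
    ".xml": "XML",
}


def supported_file_type(hdfs_path):
    """Return the supported type name for hdfs_path, or None if unrecognized."""
    suffix = PurePosixPath(hdfs_path).suffix.lower()
    return SUPPORTED_EXTENSIONS.get(suffix)


print(supported_file_type("/data/sales/orders.parquet"))
print(supported_file_type("/data/sales/README.md"))
```

The lookup relies on the file extension only. A file with a missing or unusual extension can still be in a supported format, so treat a `None` result as "unknown" rather than "unsupported".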
Learn more
Parent topic: Supported connections