Apache HDFS connection

To access your data in Apache HDFS, create a connection asset for it.

Apache Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. Apache HDFS was formerly Hortonworks HDFS.

Create a connection to Apache HDFS

To create the connection asset, you need these connection details:

For Credentials and Certificates, you can use secrets if a vault is configured for the platform and the service supports vaults. For information, see Using secrets from vaults in connections.

Choose the method for creating a connection based on where you are in the platform:

In a project: Click Add to project > Connection. See Adding a connection to a project.

In a catalog: Click Add to catalog > Connection. See Adding a connection asset to a catalog.

In a deployment space: Click Add to space > Connection. See Adding connections to a deployment space.

In the Platform assets catalog: Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use Apache HDFS connections in the following workspaces and tools:

Analytics projects

Catalogs

Apache HDFS setup

Install and set up a Hadoop cluster

Supported file types

The Apache HDFS connection supports these file types: Avro, CSV, Delimited text, Excel, JSON, ORC, Parquet, SAS, SAV, SHP, and XML.
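
Before you add data assets, it can help to check which files in an HDFS directory are likely to match a supported type. The following is a minimal sketch, not part of the product: the extension-to-type mapping and the `guess_file_type` helper are assumptions for illustration, and the connection itself may identify file types differently.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping from file extension to the file types listed above.
# The extensions are assumptions chosen for illustration only.
SUPPORTED_EXTENSIONS = {
    ".avro": "Avro",
    ".csv": "CSV",
    ".txt": "Delimited text",
    ".tsv": "Delimited text",
    ".xls": "Excel",
    ".xlsx": "Excel",
    ".json": "JSON",
    ".orc": "ORC",
    ".parquet": "Parquet",
    ".sas7bdat": "SAS",
    ".sav": "SAV",
    ".shp": "SHP",
    ".xml": "XML",
}


def guess_file_type(hdfs_path: str) -> Optional[str]:
    """Return the supported file type suggested by the path's extension, or None."""
    # HDFS paths use forward slashes, so PurePosixPath parses them on any OS.
    suffix = PurePosixPath(hdfs_path).suffix.lower()
    return SUPPORTED_EXTENSIONS.get(suffix)


if __name__ == "__main__":
    # Example paths are made up for demonstration.
    for path in ["/data/sales/2023.PARQUET", "/data/archive.tar.gz"]:
        print(path, "->", guess_file_type(path))
```

A path whose extension is not in the mapping returns `None`, which you can treat as a prompt to confirm the file format manually.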

Learn more

Apache HDFS Users Guide

Parent topic: Supported connections