HDFS via Execution Engine for Hadoop connection

You can create a connection asset for HDFS via Execution Engine for Hadoop.

Use the HDFS via Execution Engine for Hadoop connection to access data in a Hadoop Distributed File System (HDFS) in a Hadoop cluster.

Prerequisites

Supported encryption

  • SSL Certificate
  • This connection supports connecting to a Hadoop environment that is secured by Kerberos.

Prerequisites for Kerberos authentication

If you plan to use Kerberos authentication, complete the following requirements:

  • Configure the data source for Kerberos authentication. Optional: This connection supports Kerberos SSO with user impersonation, which requires additional configuration.
  • Confirm that the service that you plan to use with the connection supports Kerberos. For more information, see Kerberos authentication in Cloud Pak for Data.

Credentials

Platform login credentials

Create an HDFS via Execution Engine for Hadoop connection to the Hadoop cluster

  1. From your project, on the Assets tab, click New asset > Connect to a data source.

  2. Select HDFS via Execution Engine for Hadoop.

  3. Enter a name, a description, and the connection information.

  4. Select your platform login credentials.

    Note: For other users to use the connection, they must supply their own Cloud Pak for Data credentials.
  5. Enter the WebHDFS URL for accessing HDFS.

    Important: The WebHDFS URL must match the URL shown on the Hadoop Registration Details page for your target Hadoop system, with /webhdfs/v1 appended. For example: <URL>/webhdfs/v1. The administrator can confirm the URL from Administration > Configuration and settings > Hadoop Execution Engine.
  6. In the SSL Certificate field, enter the SSL certificate for the connection URL (the URL labeled URL), which is found under the registration details in Administration > Configuration and settings > Hadoop Execution Engine.

  7. Click Create.
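The WebHDFS URL in step 5 always ends with the /webhdfs/v1 path segment. As a quick sanity check before you save the connection, you could verify the URL's shape with a small script; the validate_webhdfs_url helper below is a hypothetical sketch, not part of the product:

```python
from urllib.parse import urlparse

def validate_webhdfs_url(url: str) -> bool:
    """Return True if the URL looks like a WebHDFS endpoint,
    e.g. https://host:port/webhdfs/v1 (hypothetical helper)."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and parsed.path.rstrip("/").endswith("/webhdfs/v1")
    )

# The host name below is a placeholder, not a real registration URL.
print(validate_webhdfs_url("https://hadoop.example.com:8443/gateway/webhdfs/v1"))  # True
print(validate_webhdfs_url("https://hadoop.example.com:8443"))                     # False
```

This catches the most common mistake, which is pasting the base registration URL without the /webhdfs/v1 suffix.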

Next step: Add data assets from the connection

Federal Information Processing Standards (FIPS) compliance

This connection can be used on a FIPS-enabled cluster (FIPS tolerant); however, it is not FIPS-compliant.

Restriction

In SPSS Modeler, you can export data to this connection or to an HDFS via Execution Engine for Hadoop connected data asset only if the data asset already exists. You cannot create a new data asset.

Supported file types

The HDFS via Execution Engine for Hadoop connection supports these file types: CSV, Delimited text, JSON, and Parquet.

Known issues

Troubleshooting Hadoop environments

Learn more

Administering Apache Hadoop clusters

Parent topic: Supported connections