HDFS via Execution Engine for Hadoop connection

You can create a connection asset for HDFS via Execution Engine for Hadoop.

Use the HDFS via Execution Engine for Hadoop connection to access data in a Hadoop Distributed File System (HDFS) in a Hadoop cluster.

Prerequisites

Supported encryption

  • SSL certificate
  • Kerberos: you can connect to a Hadoop environment that is secured by Kerberos.

Credentials

Platform login credentials

Create an HDFS via Execution Engine for Hadoop connection to the Hadoop cluster

  1. From your project, on the Assets tab, click New asset > Connection.

  2. Select HDFS via Execution Engine for Hadoop.

  3. Enter a name, a description, and the connection information.

  4. Select your platform login credentials.

    Note: Other users who use the connection must supply their own Cloud Pak for Data credentials.
  5. Enter the WebHDFS URL for accessing HDFS.

    Important: The WebHDFS URL must match the URL that is shown on the Hadoop registration details page for your target Hadoop system. For example: `/webhdfs/v1`. The administrator can confirm the URL from Administration > Platform configuration > Systems integration.
  6. In the SSL Certificate field, enter the SSL certificate for the connection URL (the value that is labeled URL), which is found under the registration details in Administration > Platform configuration > Systems integration. A connectivity check that uses both the WebHDFS URL and this certificate is sketched after these steps.

  7. Click Create.
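
Before you rely on the new connection, you can optionally confirm that the WebHDFS URL and SSL certificate work together. The following minimal Python sketch assumes a hypothetical host name and a certificate that is saved to a local file; substitute the values from your registration details. On a cluster that is secured by Kerberos, an unauthenticated call returns HTTP 401, which still confirms that the URL and certificate are valid.

```python
# Minimal connectivity check, using hypothetical placeholder values.
# Replace the URL and certificate path with the values from
# Administration > Platform configuration > Systems integration.
import requests

webhdfs_url = "https://hadoop.example.com:9871/webhdfs/v1"  # hypothetical host
ca_cert = "/path/to/registration-cert.pem"                  # SSL certificate saved locally

# LISTSTATUS is a standard WebHDFS REST operation. A 200 response proves the
# endpoint works end to end; on a Kerberos-secured cluster, a 401 response
# still confirms that the URL and certificate are correct.
response = requests.get(f"{webhdfs_url}/?op=LISTSTATUS", verify=ca_cert, timeout=30)
print(response.status_code, response.reason)
```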

Next step: Add data assets from the connection

Where you can use this connection

You can use the HDFS via Execution Engine for Hadoop connection in the following workspaces and tools:

Projects

  • Data Refinery (Watson Studio or Watson Knowledge Catalog). For instructions, see Refining HDFS data.

  • Notebooks (Watson Studio). Click Read data on the Code snippets pane to get the connection credentials and load the data into a data structure. See Load data from data source connections. A sketch of this kind of read is shown after this list.

  • SPSS Modeler (SPSS Modeler service)
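
For context, the following Python sketch approximates what a generated Read data snippet accomplishes: it fetches a file from HDFS over the WebHDFS REST API and loads it into a pandas DataFrame. The host, certificate path, and file path are hypothetical, and the code that the Code snippets pane generates for you may differ.

```python
# Hedged sketch of a notebook read, using hypothetical placeholder values.
import io

import pandas as pd
import requests

webhdfs_url = "https://hadoop.example.com:9871/webhdfs/v1"  # hypothetical host
hdfs_path = "/user/demo/sales.csv"                          # hypothetical file in HDFS

# OPEN is the standard WebHDFS read operation; requests follows the
# redirect to a datanode automatically.
resp = requests.get(f"{webhdfs_url}{hdfs_path}?op=OPEN",
                    verify="/path/to/registration-cert.pem")
resp.raise_for_status()

df = pd.read_csv(io.BytesIO(resp.content))  # load the bytes into a DataFrame
print(df.head())
```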

Catalogs

  • Platform assets catalog
  • Other catalogs (Watson Knowledge Catalog)

Federal Information Processing Standards (FIPS) compliance

The HDFS via Execution Engine for Hadoop connection is compliant with FIPS. However, SSL certificates that you paste into the SSL certificate field are not supported in FIPS mode. As a workaround, add the certificate to the OpenShift secret named connection-ca-certs. See Using a CA certificate to connect to internal servers from the platform for the procedure.

Restriction

For SPSS Modeler, you can export data to this connection or to an HDFS via Execution Engine for Hadoop connected data asset only if the target data already exists. You cannot create a new data asset.

Supported file types

The HDFS via Execution Engine for Hadoop connection supports these file types: CSV, delimited text, JSON, and Parquet.
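
As a rough illustration, each supported file type maps to a standard pandas reader once the file contents are available locally (for example, after a WebHDFS read as sketched earlier). The file names and the JSON Lines layout below are assumptions for the example.

```python
# Sketch: one pandas reader per supported file type; file names are hypothetical.
import pandas as pd

df_csv = pd.read_csv("data.csv")                 # CSV
df_delim = pd.read_csv("data.tsv", sep="\t")     # delimited text (tab-separated here)
df_json = pd.read_json("data.json", lines=True)  # JSON, assuming one record per line
df_parquet = pd.read_parquet("data.parquet")     # Parquet (needs pyarrow or fastparquet)
```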

Known issues

Troubleshooting Hadoop environments

Learn more

Administering Apache Hadoop clusters

Parent topic: Supported connections