HDFS via Execution Engine for Hadoop connection
You can create a connection asset for HDFS via Execution Engine for Hadoop.
Use the HDFS via Execution Engine for Hadoop connection to access data in a Hadoop Distributed File System (HDFS) in a Hadoop cluster.
Prerequisites
- Your administrator must register the Hadoop cluster from the Hadoop Execution Engine panel. Ask your administrator for the URL.
- You must create an environment runtime definition for Hadoop in your project.
- You need the SSL certificate that is associated with the connection URL.
Supported encryption
- SSL Certificate
- This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Prerequisites for Kerberos authentication
If you plan to use Kerberos authentication, complete the following requirements:
- Configure the data source for Kerberos authentication. Optional: This connection supports Kerberos SSO with user impersonation, which requires additional configuration.
- Confirm that the service with which you plan to use the connection supports Kerberos. For more information, see Kerberos authentication in Cloud Pak for Data. A minimal sketch for verifying Kerberos access follows this list.
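If you want to check Kerberos access to WebHDFS before you create the connection, the following is a minimal sketch, assuming Python with the third-party requests and requests-kerberos packages and a valid ticket obtained with kinit; the host name and certificate path are placeholders, not values from this product.

```python
# Minimal sketch: verify Kerberos (SPNEGO) access to WebHDFS.
# Assumes a valid Kerberos ticket (kinit) and the requests-kerberos package.
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

WEBHDFS_URL = "https://hadoop.example.com:50470/webhdfs/v1"  # placeholder
CERT_PATH = "/path/to/ssl_cert.pem"                          # placeholder

# LISTSTATUS on a directory is a cheap authentication check.
resp = requests.get(
    f"{WEBHDFS_URL}/user?op=LISTSTATUS",
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
    verify=CERT_PATH,
)
resp.raise_for_status()
print(resp.json()["FileStatuses"]["FileStatus"])
```

If the call returns a directory listing, the cluster accepts your Kerberos ticket for WebHDFS.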
Credentials
Platform login credentials
Create an HDFS via Execution Engine for Hadoop connection to the Hadoop cluster
1. From your project, on the Assets tab, click New asset > Connect to a data source.
2. Select HDFS via Execution Engine for Hadoop.
3. Enter a name, a description, and the connection information.
4. Select your platform login credentials.
   Note: For other users to use the connection, they must supply their own Cloud Pak for Data credentials.
5. Enter the WebHDFS URL for accessing HDFS.
   Important: The WebHDFS URL must match the URL on the Hadoop registration details page for your target Hadoop system. For example: <URL>/webhdfs/v1. The administrator can confirm the URL from Administration > Configuration and settings > Hadoop Execution Engine.
6. In the SSL Certificate field, enter the SSL certificate for the connection URL (the URL labeled URL), which is found under the registration details in Administration > Configuration and settings > Hadoop Execution Engine. You can verify the URL and certificate with the sketch that follows these steps.
7. Click Create.
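The following connectivity check is a minimal sketch, assuming Python with the requests package and a gateway (such as Apache Knox) that accepts basic authentication in front of WebHDFS; the URL, credentials, and certificate path are placeholders. On a plain, unsecured WebHDFS endpoint you would instead pass user.name as a query parameter.

```python
# Minimal sketch: connectivity check for the registered WebHDFS URL.
import requests

WEBHDFS_URL = "https://hadoop.example.com:8443/gateway/cpd/webhdfs/v1"  # placeholder
CERT_PATH = "/path/to/ssl_cert.pem"  # certificate from the registration details

# GETFILESTATUS on the HDFS root is a cheap end-to-end test.
resp = requests.get(
    f"{WEBHDFS_URL}/?op=GETFILESTATUS",
    auth=("your_user", "your_password"),  # placeholder credentials
    verify=CERT_PATH,
)
resp.raise_for_status()
print(resp.json()["FileStatus"])
```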
Next step: Add data assets from the connection
Federal Information Processing Standards (FIPS) compliance
This connection can be used on a FIPS-enabled cluster (FIPS tolerant); however, it is not FIPS-compliant.
Restriction
For SPSS Modeler, you can export data to this connection or to an HDFS via Execution Engine for Hadoop connected data asset only if the data asset already exists. You cannot create a new data asset.
Supported file types
The HDFS via Execution Engine for Hadoop connection supports these file types: CSV, Delimited text, JSON, and Parquet.
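As an illustration only, this sketch reads one of the supported file types (CSV) over the WebHDFS REST API with Python, pandas, and requests; the file path, URL, credentials, and certificate path are placeholders.

```python
# Minimal sketch: read a CSV file from HDFS through WebHDFS.
import io

import pandas as pd
import requests

WEBHDFS_URL = "https://hadoop.example.com:8443/gateway/cpd/webhdfs/v1"  # placeholder

# OPEN redirects to a datanode; requests follows the redirect automatically.
resp = requests.get(
    f"{WEBHDFS_URL}/user/demo/sales.csv?op=OPEN",
    auth=("your_user", "your_password"),  # placeholder credentials
    verify="/path/to/ssl_cert.pem",       # placeholder certificate
)
resp.raise_for_status()
df = pd.read_csv(io.BytesIO(resp.content))
print(df.head())
```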
Learn more
- Known issues
- Administering Apache Hadoop clusters
Parent topic: Supported connections