HDFS via Execution Engine for Hadoop connection
You can create a connection asset for HDFS via Execution Engine for Hadoop.
Use the HDFS via Execution Engine for Hadoop connection to access data in a Hadoop Distributed File System (HDFS) in a Hadoop cluster.
Prerequisites
- Your administrator must register the Hadoop cluster from the Systems integration panel. Ask your administrator for the URL.
- You must create an environment runtime definition for Hadoop in your project.
- The SSL certificate that is associated with the connection URL.
Supported encryption
- SSL Certificate
- This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Credentials
Platform login credentials
Create an HDFS via Execution Engine for Hadoop connection to the Hadoop cluster
1. From your project, on the Assets tab, click New asset > Connection.
2. Select HDFS via Execution Engine for Hadoop.
3. Enter a name, a description, and the connection information.
4. Select your platform login credentials.
   Note: Other users who want to use the connection must supply their own Cloud Pak for Data credentials.
5. Enter the WebHDFS URL for accessing HDFS.
   Important: The WebHDFS URL must contain the same URL that appears on the Hadoop Registration Details page for your target Hadoop system, followed by the WebHDFS path, for example `/webhdfs/v1`. The administrator can confirm the URL from **Administration > Platform configuration > Systems integration**.
6. In the SSL Certificate field, enter the SSL certificate for the connection URL (the URL labeled URL), which is found under the registration details in **Administration > Platform configuration > Systems integration**.
7. Click Create.
Next step: Add data assets from the connection
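Because the connection uses the standard WebHDFS REST API, you can sanity-check the WebHDFS URL outside the platform before creating the connection. A minimal sketch, assuming a hypothetical host `hadoop.example.com` and a locally saved certificate file `cluster-ca.pem` (both are placeholders, not values from your cluster):

```python
# Sketch: compose a WebHDFS REST URL. Host, port, and certificate file
# name below are placeholder assumptions, not values from your cluster.
def webhdfs_url(base: str, path: str, op: str) -> str:
    """Build a WebHDFS URL such as .../webhdfs/v1/tmp?op=LISTSTATUS."""
    return f"{base.rstrip('/')}/webhdfs/v1/{path.lstrip('/')}?op={op}"

url = webhdfs_url("https://hadoop.example.com:9871", "/tmp", "LISTSTATUS")
print(url)  # https://hadoop.example.com:9871/webhdfs/v1/tmp?op=LISTSTATUS

# Against a reachable cluster, you could verify the endpoint with the
# SSL certificate from the procedure (file name is an assumption):
# import ssl, urllib.request
# ctx = ssl.create_default_context(cafile="cluster-ca.pem")
# with urllib.request.urlopen(url, context=ctx) as resp:
#     print(resp.status)
```

A `LISTSTATUS` response that returns without an SSL error confirms that the URL and certificate pair up correctly.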
Where you can use this connection
You can use the HDFS via Execution Engine for Hadoop connection in the following workspaces and tools:
Projects
-
Data Refinery (Watson Studio or Watson Knowledge Catalog). For instructions, see Refining HDFS data.
-
Notebooks (Watson Studio). Click Read data on the Code snippets pane to get the connection credentials and load the data into a data structure. See Load data from data source connections.
-
SPSS Modeler (SPSS Modeler service)
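In a notebook, the generated code snippet handles the connection credentials for you; conceptually, a CSV file arrives over WebHDFS as text that you parse into a data structure. A hedged, standard-library-only sketch, with an inlined payload standing in for the body of a real WebHDFS `?op=OPEN` response (the field names are invented for illustration):

```python
import csv
import io

# Stand-in for the body of a WebHDFS OPEN response on a CSV file.
payload = "id,name\n1,alice\n2,bob\n"

# Parse the text into a list of dicts, one per row.
rows = list(csv.DictReader(io.StringIO(payload)))
print(rows)  # [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob'}]
```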
Catalogs
- Platform assets catalog
- Other catalogs (Watson Knowledge Catalog)
Federal Information Processing Standards (FIPS) compliance
The HDFS via Execution Engine for Hadoop connection is compliant with FIPS. However, SSL certificates that you paste into the SSL certificate field are not supported. As a workaround, you can add the certificate to the OpenShift secret named `connection-ca-certs`. See Using a CA certificate to connect to internal servers from the platform for the procedure.
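Under stated assumptions (a local certificate file named `cluster-ca.pem`, a key named `ca-certificates.crt` inside the secret, and cluster-admin access with `oc`), updating that secret might look like the following sketch; follow the linked procedure for the exact supported steps:

```shell
# Sketch only: append a CA certificate to the connection-ca-certs secret.
# The certificate file name and the key name inside the secret are
# assumptions for illustration; consult the linked procedure.
oc extract secret/connection-ca-certs --to=./certs --confirm
cat cluster-ca.pem >> ./certs/ca-certificates.crt
oc set data secret/connection-ca-certs \
    --from-file=ca-certificates.crt=./certs/ca-certificates.crt
```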
Restriction
For SPSS Modeler, you can export data to this connection, or to an HDFS via Execution Engine for Hadoop connected data asset, only if the data already exists. You cannot create a new data asset.
Supported file types
The HDFS via Execution Engine for Hadoop connection supports these file types: CSV, Delimited text, JSON, and Parquet.
Known issues
Learn more
Administering Apache Hadoop clusters
Parent topic: Supported connections