Defining a connection

You can access the local file system on the engine tier or access a Hadoop Distributed File System (HDFS) by using the WebHDFS API or the HttpFS API.

About this task

To access HDFS, you must define a connection that specifies the server host name, server port number, user name, and password. Instead of the host name and port number, you can specify the connection URL.
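To illustrate how a host name and port relate to a connection URL, the following Python sketch builds a WebHDFS REST URL from either form. The `/webhdfs/v1` path prefix comes from the WebHDFS REST API. The function name `webhdfs_url`, its parameters, the host `namenode.example.com`, the file path, and the assumption that a connection URL has the form `scheme://host:port` are all hypothetical and are not taken from the File connector.

```python
from typing import Optional
from urllib.parse import quote


def webhdfs_url(path: str, op: str, host: Optional[str] = None,
                port: Optional[int] = None, base_url: Optional[str] = None,
                use_ssl: bool = False) -> str:
    """Build a WebHDFS REST URL from host and port, or from a connection URL."""
    if base_url is None:
        if host is None or port is None:
            raise ValueError("specify either base_url or both host and port")
        scheme = "https" if use_ssl else "http"
        # Assumed URL shape: scheme://host:port
        base_url = f"{scheme}://{host}:{port}"
    # WebHDFS and HttpFS both serve files under the /webhdfs/v1 prefix.
    return f"{base_url.rstrip('/')}/webhdfs/v1/{quote(path.lstrip('/'))}?op={op}"


# Hypothetical host and file path, for illustration only.
print(webhdfs_url("/user/dsadm/data.csv", "OPEN",
                  host="namenode.example.com", port=50070))
```

Passing `base_url="https://namenode.example.com:50470"` instead of `host` and `port` yields the same kind of URL, which mirrors the choice between the two ways of defining the connection.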

You can use Kerberos authentication to connect to HDFS through the WebHDFS API or the HttpFS API. If you use Kerberos authentication, you must specify the Kerberos principal in the User name property and the password in the Password property. Optionally, you can include the realm of the principal in the User name property, for example, principal@realm. Instead of the password, you can use a Kerberos keytab file.
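The principal@realm form can be pictured with a short Python sketch that separates the two parts of a user name value. The helper `split_principal` and the sample values are hypothetical. The sketch only illustrates the notation and does not reflect how the File connector parses the property. It also ignores escaped `@` characters, which Kerberos principal names may contain.

```python
from typing import Optional, Tuple


def split_principal(user_name: str) -> Tuple[str, Optional[str]]:
    """Split 'principal@realm' into (principal, realm); realm is None if omitted."""
    principal, sep, realm = user_name.rpartition("@")
    if not sep:
        # No '@' present: the whole value is the principal, and the
        # default realm from krb5.conf would apply.
        return user_name, None
    return principal, realm


# Hypothetical values, for illustration only.
print(split_principal("dsadm@EXAMPLE.COM"))
print(split_principal("dsadm"))
```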

When Kerberos authentication is used, the File connector uses the krb5.conf file. Specify the Kerberos Key Distribution Center (KDC) host name and the default realm in the krb5.conf file. For more information about the krb5.conf file, see the Kerberos documentation.
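As a rough guide to the two settings named above, the following Python sketch renders a minimal krb5.conf that contains only a default realm and one KDC host. The `[libdefaults]` and `[realms]` sections and the `default_realm` and `kdc` keys are standard krb5.conf syntax. The function `render_krb5_conf`, the realm `EXAMPLE.COM`, and the host `kdc.example.com` are placeholders. A real file usually holds more settings, so treat this as a starting point rather than a complete configuration.

```python
def render_krb5_conf(default_realm: str, kdc_host: str) -> str:
    """Return the text of a minimal krb5.conf with one realm and one KDC."""
    return (
        "[libdefaults]\n"
        f"    default_realm = {default_realm}\n"
        "\n"
        "[realms]\n"
        f"    {default_realm} = {{\n"
        f"        kdc = {kdc_host}\n"
        "    }\n"
    )


# Placeholder realm and KDC host, for illustration only.
print(render_krb5_conf("EXAMPLE.COM", "kdc.example.com"))
```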

Procedure

Configure a connection either to the local file system or to HDFS through the WebHDFS API or the HttpFS API.
Option: Access the local file system
Procedure:
  1. On the Properties page, in the Connection section, set the File system property to Local.
  2. Click OK.

Option: Access a file system on HDFS by using the WebHDFS API or the HttpFS API
Procedure:
  1. On the Properties page, in the Connection section, set the File system property to WebHDFS or HttpFS.
  2. Specify the host name, the user name, and the password for the user.
  3. Specify a port number. If a port number is not specified, the connector uses one of the following port numbers:
    • If the Use SSL (HTTPS) property is set to No, the connector uses port 50070 (WebHDFS) or 14000 (HttpFS).
    • If the Use SSL (HTTPS) property is set to Yes, the connector uses port 50470 (WebHDFS) or 14443 (HttpFS).
  4. Click OK.
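
The default port rules in step 3 can be summarized as a lookup keyed by the File system and Use SSL (HTTPS) settings. The following Python sketch uses the port numbers listed above. The names `DEFAULT_PORTS` and `resolve_port` are hypothetical and only model the documented behavior. They are not part of the connector.

```python
from typing import Optional

# Default ports from the procedure, keyed by (File system, Use SSL).
DEFAULT_PORTS = {
    ("WebHDFS", False): 50070,
    ("WebHDFS", True): 50470,
    ("HttpFS", False): 14000,
    ("HttpFS", True): 14443,
}


def resolve_port(file_system: str, use_ssl: bool,
                 port: Optional[int] = None) -> int:
    """Return the explicit port if one is given, otherwise the default."""
    if port is not None:
        return port
    try:
        return DEFAULT_PORTS[(file_system, use_ssl)]
    except KeyError:
        raise ValueError(f"no default port for file system {file_system!r}")


print(resolve_port("WebHDFS", use_ssl=False))
print(resolve_port("HttpFS", use_ssl=True))
print(resolve_port("HttpFS", use_ssl=True, port=9443))
```

An explicitly specified port always takes precedence. The table applies only when the port is left empty.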