File connector data connection prerequisites and parameters

You can use the File connector in the InfoSphere® Information Analyzer thin client to import metadata from HDFS data sets. You can use the File connector in InfoSphere Metadata Asset Manager to import metadata from engine tier computers. You must meet the following prerequisites and configure the parameters when you create or edit a data connection in the thin client or InfoSphere Metadata Asset Manager.

Data connection parameters for File Connector - Engine Tier

Specify values for the following parameters when you create a data connection to the engine tier computer in InfoSphere Metadata Asset Manager.
Name
Specify the name of the data connection.
Description
Specify a description of the data connection.

For more information about the File connector, see Importing File connector metadata.

Prerequisites for File Connector - HDFS

  • If you use Kerberos or SSL encryption to access HDFS, see Defining a connection.
  • If you do not have metadata about files and folders in HDFS, specify column metadata and metadata about how a file is formatted. Use one of the metadata formatting options.
    You can import metadata that is specified in one of the following ways:
    • As the first row of the file. The following formats are supported: column_name:datatype and column_name:datatype(length).
    • In an .osh schema file that is in the same folder and is named file.osh or folder.osh, where file is the name of a file in the folder and folder is the name of the folder. For example, if fileA.txt is in the sample directory, metadata can be specified in the fileA.txt.osh or sample.osh files.
  • To use SSL encryption when you use the WebHDFS API or HttpFS API to communicate with HDFS, you might need to import the server public certificate into your truststore and specify values for truststore parameters. See Configuring the truststore.
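
The two ways of supplying metadata that are described above can be illustrated with a minimal Python sketch. It is not part of the product. The function names parse_header_row and osh_schema_candidates are hypothetical, the comma delimiter between columns is an assumption (the supported delimiters are not stated here), and the data type names in the usage lines are placeholders.

```python
import re
from pathlib import PurePosixPath

# Matches "column_name:datatype" or "column_name:datatype(length)".
_FIELD = re.compile(
    r"^\s*(?P<name>[^:\s]+)\s*:\s*(?P<type>[A-Za-z_]\w*)"
    r"\s*(?:\(\s*(?P<length>\d+)\s*\))?\s*$"
)


def parse_header_row(row, delimiter=","):
    """Split a first-row header into (name, datatype, length) tuples.

    The comma delimiter is an assumption made for this sketch.
    """
    columns = []
    for field in row.split(delimiter):
        match = _FIELD.match(field)
        if match is None:
            raise ValueError(
                f"not in column_name:datatype[(length)] form: {field!r}"
            )
        length = match.group("length")
        columns.append(
            (
                match.group("name"),
                match.group("type"),
                int(length) if length is not None else None,
            )
        )
    return columns


def osh_schema_candidates(file_path):
    """Return the file.osh and folder.osh names for a file in HDFS."""
    path = PurePosixPath(file_path)
    folder = path.parent
    return [
        str(folder / f"{path.name}.osh"),    # for example, fileA.txt.osh
        str(folder / f"{folder.name}.osh"),  # for example, sample.osh
    ]


# Usage with placeholder type names and a hypothetical path.
columns = parse_header_row("id:int32,name:string(20)")
candidates = osh_schema_candidates("/data/sample/fileA.txt")
```

For the file /data/sample/fileA.txt, the second function returns the names fileA.txt.osh and sample.osh in the same folder, which matches the naming example in the list above.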

Data connection parameters for File Connector - HDFS

Specify values for the following parameters when you create a data connection to HDFS.
Name
Specify the name of the data connection.
Description
Specify a description of the data connection.
Choose file system
Select the file system to import metadata from, either WebHDFS or HttpFS.
Host
If you do not specify a custom URL, you must specify the name of the host that provides a REST HTTP gateway that supports the HDFS file system operations. The host is on the name node in WebHDFS or on either the name node or the edge node in HttpFS.
Port
Specify the port to connect to. If you do not specify a port number, the connector uses one of the following port numbers:
  • If you do not select Use SSL (HTTPS), the connector uses 50070 for WebHDFS or 14000 for HttpFS.
  • If you select Use SSL (HTTPS), the connector uses 50470 for WebHDFS or 14443 for HttpFS.
Use SSL (HTTPS)
Select to connect over HTTPS by using Secure Sockets Layer (SSL) encryption.
Use Kerberos
Select to use Kerberos authentication.
Service principal
Specify the Kerberos service principal name (SPN) that you want to use for the host. Use this property if the realm of the host is different from the realm of the user. This property is used for authenticating across all of your domains. When you specify the service principal for the web application server, you must specify the fully qualified domain name (FQDN) principal of the WLE server with the realm, for example, HTTP/testmach.austin.ibm.com. This parameter is optional.
Use keytab
Select to authenticate with a Kerberos keytab file instead of a password.
Keytab
Specify the name and the path of the keytab file on the engine tier computer.
Custom URL
If you prefer to use a custom URL instead of one that is generated from the values that you specify for Use SSL (HTTPS), Host, and Port, specify the base URL for the server. The URL can use either http or https.
User name
Required. Specify the name of a user who can connect to the HDFS system.
Password
If you did not select Use keytab, specify the password for the specified user.
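
The way that the HDFS parameters depend on one another can be summarized in a minimal Python sketch. It is an illustration only, not the connector's implementation. The class HdfsConnection and its field and method names are hypothetical, the scheme://host:port form of the generated URL is an assumption, and the check that a keytab path accompanies Use keytab is also an assumption. The default port numbers are the ones listed for the Port parameter.

```python
from dataclasses import dataclass
from typing import Optional

# Default ports from the Port parameter, keyed by (file system, Use SSL).
DEFAULT_PORTS = {
    ("WebHDFS", False): 50070,
    ("WebHDFS", True): 50470,
    ("HttpFS", False): 14000,
    ("HttpFS", True): 14443,
}


@dataclass
class HdfsConnection:
    """Hypothetical holder for the HDFS data connection parameters."""

    name: str
    user_name: str
    file_system: str = "WebHDFS"
    host: Optional[str] = None
    port: Optional[int] = None
    use_ssl: bool = False
    use_keytab: bool = False
    keytab: Optional[str] = None
    custom_url: Optional[str] = None
    password: Optional[str] = None

    def effective_port(self) -> int:
        """Return the given port, or the default for the file system."""
        if self.port is not None:
            return self.port
        return DEFAULT_PORTS[(self.file_system, self.use_ssl)]

    def base_url(self) -> str:
        """Prefer the custom URL; otherwise build one (assumed form)."""
        if self.custom_url:
            return self.custom_url
        scheme = "https" if self.use_ssl else "http"
        return f"{scheme}://{self.host}:{self.effective_port()}"

    def problems(self) -> list:
        """List the parameter rules that the current values break."""
        issues = []
        if self.file_system not in ("WebHDFS", "HttpFS"):
            issues.append("Choose file system must be WebHDFS or HttpFS")
        if not self.user_name:
            issues.append("User name is required")
        if not self.custom_url and not self.host:
            issues.append("Host is required when no custom URL is given")
        if self.use_keytab and not self.keytab:
            # Assumption: a keytab path is needed when Use keytab is set.
            issues.append("Keytab is expected when Use keytab is selected")
        if not self.use_keytab and not self.password:
            issues.append("Password is required unless Use keytab is selected")
        return issues


# Usage with hypothetical host, user, and password values.
conn = HdfsConnection(
    name="hdfs-dev",
    user_name="hdfsuser",
    host="namenode.example.com",
    password="secret",
)
url = conn.base_url()  # built from Use SSL (HTTPS), Host, and the default port
```

Keeping the rules in one place makes the precedence visible: a custom URL overrides the Host, Port, and Use SSL (HTTPS) values, and the password is needed only when Use keytab is not selected.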