Creating the configuration file for the Big Data File stage
To establish connections to distributed file systems, such as a Hadoop Distributed File System (HDFS) in InfoSphere® BigInsights®, you can create an ishdfs.config configuration file that contains the required CLASSPATH information.
To create the ishdfs.config file, you must have IBM® InfoSphere DataStage® administrator-level access. The Big Data File stage must also be installed on a Linux® or AIX® operating system.
The ishdfs.config file sets the CLASSPATH parameter to the Java™ libraries and file system folders that are required to import metadata from the HDFS and to move data to the HDFS.
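Assuming that the file holds a single `CLASSPATH=` line whose entries are separated by colons (the path separator on Linux and AIX), the following Python sketch assembles such a line. The line format, the directory paths, and the helper names `build_classpath_line` and `write_ishdfs_config` are illustrative assumptions, not part of the product; confirm the exact format and location of ishdfs.config for your installation. An entry that ends in `/*` uses the Java class path wildcard, which matches every .jar file in that directory.

```python
from pathlib import Path


def build_classpath_line(entries):
    """Join CLASSPATH entries with ':' (the path separator on Linux and AIX)."""
    cleaned = [e.strip() for e in entries if e.strip()]
    if not cleaned:
        raise ValueError("at least one CLASSPATH entry is required")
    return "CLASSPATH=" + ":".join(cleaned)


def write_ishdfs_config(config_path, entries):
    """Write the single (assumed) CLASSPATH line to config_path and return it."""
    line = build_classpath_line(entries)
    Path(config_path).write_text(line + "\n")
    return line


if __name__ == "__main__":
    # Hypothetical HDFS client locations; substitute the paths on your engine host.
    entries = [
        "/opt/hdfs-client/lib/*",  # Java wildcard: every .jar file in lib/
        "/opt/hdfs-client/conf",   # client configuration folder (core-site.xml, hdfs-site.xml)
    ]
    print(build_classpath_line(entries))
    # prints: CLASSPATH=/opt/hdfs-client/lib/*:/opt/hdfs-client/conf
```

Keeping line assembly separate from file writing lets you review the generated line before you save it with administrator access.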
If the HDFS and the InfoSphere Information Server engine are not installed on the same computer, you can copy the HDFS client library and client configuration files to the computer where the InfoSphere Information Server engine is installed, or you can make them available from the remote HDFS system, for example by mounting the directories that contain them. Whichever method you use, the HDFS client .jar files and configuration file directories must be accessible to the InfoSphere Information Server engine.
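Because every client .jar file and configuration directory must be reachable from the engine, it can help to check the entries on the engine host before you run a job. This Python sketch, under the same assumed `CLASSPATH=` line format, reports the entries that are missing or unreadable; the function name `missing_classpath_entries` and the example paths are hypothetical.

```python
import glob
import os


def missing_classpath_entries(classpath_line):
    """Return the entries of a 'CLASSPATH=a:b:c' line that this host cannot read."""
    key, sep, value = classpath_line.strip().partition("=")
    if not sep or key.strip() != "CLASSPATH":
        raise ValueError("expected a line of the form CLASSPATH=...")
    missing = []
    for entry in (e.strip() for e in value.split(":")):
        if not entry:
            continue  # skip empty segments, such as one left by a trailing ':'
        if entry.endswith("/*"):
            # Java wildcard entry: the directory must hold at least one .jar file.
            ok = bool(glob.glob(os.path.join(entry[:-2], "*.jar")))
        else:
            # Plain entry: a .jar file or a configuration directory that must be readable.
            ok = os.access(entry, os.R_OK)
        if not ok:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    line = "CLASSPATH=/opt/hdfs-client/lib/*:/opt/hdfs-client/conf"  # hypothetical paths
    for entry in missing_classpath_entries(line):
        print("not accessible from the engine host:", entry)
```

Run the check as the same operating system user that runs the engine, because read permissions can differ between users.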
When the Big Data File stage is configured to use the HDFS API to communicate with the HDFS, it uses the CLASSPATH variable that is defined in the ishdfs.config configuration file, if that file is available. This CLASSPATH value overrides any other setting of the CLASSPATH variable for the Big Data File stage.
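The precedence rule can be modeled in a few lines. The following Python sketch only illustrates the rule as stated and is not the stage's actual implementation: it returns the CLASSPATH value from the configuration file when the file exists and defines one, and otherwise falls back to the CLASSPATH environment variable, which stands in here for any other CLASSPATH setting. The function name `effective_classpath` and its parsing logic are assumptions.

```python
import os


def effective_classpath(config_path, environ=None):
    """Illustrative model: a CLASSPATH line in ishdfs.config overrides any other setting."""
    env = os.environ if environ is None else environ
    if os.path.isfile(config_path):
        with open(config_path) as f:
            for raw in f:
                key, sep, value = raw.strip().partition("=")
                if sep and key.strip() == "CLASSPATH":
                    return value.strip()  # the configuration file wins
    # No file, or no CLASSPATH line in it: fall back to the other setting,
    # represented here by the CLASSPATH environment variable.
    return env.get("CLASSPATH", "")


if __name__ == "__main__":
    # Hypothetical location; use the path of ishdfs.config on your engine host.
    print(effective_classpath("ishdfs.config"))
```

Passing `environ` explicitly makes the fallback easy to exercise without changing the real process environment.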