Configure the Apache Hadoop integration

Deploy Apache Hadoop and prepare LSF for the integration package

Before you begin

The Apache Hadoop integration supports the following platforms:
  • x86_64
  • ppc64
  • ppc64le

Procedure

  1. Download and deploy Apache Hadoop.

    The latest version is available from the official Apache Hadoop site: http://hadoop.apache.org/releases

    This integration supports Hadoop Versions 1 and 2 and is tested with open source Hadoop Versions 1.2.1 and 2.7.2; however, it should also work with other versions and other Hadoop distributions.

    Note: You do not need to configure Apache Hadoop after installation. The Hadoop connector scripts automatically configure Apache Hadoop for you.
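
    For illustration, a typical deployment might look like the following. The version number and installation prefix are examples only; substitute the release and location that your site uses.

```shell
# Version and install prefix are placeholders; adjust them for your site
HADOOP_VERSION=2.7.2
INSTALL_PREFIX=/opt

# Download the release tarball from the Apache archive
wget "https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"

# Extract it to a location that every LSF server host can access
tar -xzf "hadoop-${HADOOP_VERSION}.tar.gz" -C "${INSTALL_PREFIX}"
```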
  2. Set the $HADOOP_HOME environment variable to the file path of the Hadoop installation directory.

    If you do not set the $HADOOP_HOME environment variable before using the integration, you must run the lsfhadoop.sh connector script with the --hadoop-dir option to specify the file path to the Hadoop installation directory.
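
    For example, if Hadoop is extracted under /opt/hadoop-2.7.2 (an example path), you can set the variable in your shell profile:

```shell
# Point HADOOP_HOME at the top of the Hadoop installation (path is an example)
export HADOOP_HOME=/opt/hadoop-2.7.2

# Verify the setting
echo "$HADOOP_HOME"
```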

  3. Set the $JAVA_HOME environment variable to the file path of the Java runtime installation directory.

    If you do not set the $JAVA_HOME environment variable before using the integration, you must run the lsfhadoop.sh connector script with the --java-home option to specify the file path to the Java runtime installation directory.

    Note: Specifying the --java-home option overrides the value of the $JAVA_HOME environment variable on the command line. For more details, refer to Run a Hadoop application on LSF.
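
    For example, with a Java runtime installed under /usr/lib/jvm/java-8-openjdk (an example path):

```shell
# Point JAVA_HOME at the Java runtime installation (path is an example)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk

# Verify the setting
echo "$JAVA_HOME"
```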

What to do next

  • You must ensure that the $HADOOP_HOME and $JAVA_HOME directories are accessible to each LSF server host.
  • You can override the $HADOOP_HOME and $JAVA_HOME environment variables at job submission time by using the --hadoop-dir and --java-home options with lsfhadoop.sh. For more details, refer to Run a Hadoop application on LSF.
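
  For example, a submission-time override might look like the following sketch. The paths are examples only, and the full submission syntax (including the Hadoop command that follows the options) is described in Run a Hadoop application on LSF.

```shell
# Override both directories for this job only (paths are examples)
lsfhadoop.sh --hadoop-dir /opt/hadoop-2.7.2 --java-home /usr/lib/jvm/java-8-openjdk ...
```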