Installing or upgrading the data movement feature on Hadoop

If you plan to run the data movement functionality from Hadoop, you must first install the IBM® Fast Data Movement package on the Hadoop node from which data movement is to be run.

The installation and upgrade procedures for the data movement feature are identical.

About this task

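The installation package is available in the Db2® Warehouse container at the following path: /services/bludownload/FDM/fast-data-movement-v1.0.2.tar.gz. In the procedure below, version in the file name stands for the package version, for example v1.0.2.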
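If your Db2 Warehouse container runs under Docker, you can copy the package out with docker cp and then move it to the Hadoop node with scp. The following sketch assumes such a Docker-based setup. fetch_fdm_package is a hypothetical helper, and the container name, host name, and destination directory in the example are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Db2 Warehouse runs in a Docker container and that
# the Hadoop node is reachable over SSH. All names below are placeholders.
set -euo pipefail

# Package path inside the Db2 Warehouse container (from this topic).
FDM_PKG=/services/bludownload/FDM/fast-data-movement-v1.0.2.tar.gz

# Hypothetical helper: copy the package out of the container, then to Hadoop.
fetch_fdm_package() {
  local container="$1" hadoop_host="$2" dest_dir="$3"
  local pkg_name
  pkg_name=$(basename "$FDM_PKG")
  # docker cp CONTAINER:SRC_PATH DEST_PATH copies a file out of a container.
  docker cp "${container}:${FDM_PKG}" "./${pkg_name}"
  # Transfer the archive to a directory on the Hadoop node.
  scp "./${pkg_name}" "${hadoop_host}:${dest_dir}/"
}

# Example (placeholder values):
# fetch_fdm_package Db2wh hadoop-node.example.com /tmp
```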

Note: If Kerberos authentication is enabled on your Hadoop cluster, you must run kinit before the installation to obtain a Kerberos ticket.
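A minimal sketch of the Kerberos precheck, assuming the MIT Kerberos client tools are installed. klist -s produces no output and exits with status 0 only when the credentials cache can be read and has not expired, so kinit runs only when a ticket is actually needed. ensure_kerberos_ticket is a hypothetical helper, and hdfs@EXAMPLE.COM is a placeholder principal.

```shell
#!/usr/bin/env bash
# Sketch only: assumes MIT Kerberos client tools (klist, kinit) are installed.
set -euo pipefail

# Hypothetical helper: request a ticket only if no valid one is cached.
ensure_kerberos_ticket() {
  local principal="$1"
  # "klist -s" prints nothing; it exits 0 if the credentials cache can be
  # read and is not expired, and non-zero otherwise.
  if klist -s 2>/dev/null; then
    echo "Kerberos ticket already present"
    return 0
  fi
  # No usable ticket: kinit prompts for the principal's password.
  kinit "$principal"
}

# Example (placeholder principal):
# ensure_kerberos_ticket hdfs@EXAMPLE.COM
```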

Procedure

  1. Download the fast-data-movement-version.tar.gz package from the Db2 Warehouse container and save it in a location accessible to your Hadoop node.
  2. Change to the directory where the fast-data-movement-version.tar.gz file is located.
  3. Decompress the .tar.gz package:
    gunzip fast-data-movement-version.tar.gz
  4. Extract the .tar package:
    tar -xvf fast-data-movement-version.tar
  5. As root, run the installation script:
    ./fast_data_movement_install.sh
    
    The following installation parameters are available:
    --silent
    Displays no prompts for user input; all defaults are used. If the script cannot proceed without additional information, the installation exits with an error. When you use --silent, you must specify both the --hive_conf and --hive_lib options.
    --datamove_path <datamove_path>
    Specifies a user-defined path for the data movement utility files. The files are placed in <datamove_path>.
    --hdfs_user <hdfs_user>
    Specifies the name of a user that has permissions on HDFS.
    --hive_conf <hive_configuration_path>
    Specifies the full path of the Hive configuration directory. If omitted, the default paths are checked.
    --hive_lib <hive_library_path>
    Specifies the full path of the Hive library directory. If omitted, the default paths are checked.
    Note:

    If the Hive or BigSQL /lib and /conf directories are not in their default locations, you must specify them.

    When you specify Hive paths, you must always provide both the --hive_conf and --hive_lib options.

    The fdm.sh script later uses these paths when you run data movement from Hadoop. Table 1 lists the default paths for each service provider. If the installation script cannot locate the required files two times in a row, the installation fails.
    Table 1. Default paths for Hadoop files
    Service provider              /lib                                          /conf
    BigInsights 3 (Hive)          /opt/ibm/biginsights/hive/lib/                /opt/ibm/biginsights/hive/conf/
    BigInsights 3 (BigSQL)        /home/bigsql/sqllib/java/                     /opt/ibm/biginsights/hive/conf/
    BigInsights 4 (Hive)          /usr/iop/current/hive-client/lib/             /usr/ibmpacks/bigsql/4.0/hive/conf/
    BigInsights 4 (BigSQL)        /home/bigsql/sqllib/java/                     /usr/ibmpacks/bigsql/4.0/hive/conf/
    Hortonworks                   /usr/hdp/current/hive-client/lib/             /usr/hdp/current/hive-client/conf/
    Hortonworks 2.3.2 and above   /usr/hdp/current/atlas-server/hook/hive/:     /usr/hdp/current/hive-client/conf/
                                  /usr/hdp/current/hive-client/lib/
    Cloudera 4                    /usr/lib/hive/lib/:                           /etc/hive/conf/
                                  /usr/share/cmf/cloudera-navigator-server/libs/cdh4/
    Cloudera 5                    /opt/cloudera/parcels/CDH/lib/hive/lib/       /etc/hive/conf/
    Cloudera QuickStart           /usr/lib/hive/lib/                            /etc/hive/conf/
    A /lib entry that ends with a colon continues on the next line; for that row, the /lib value consists of both directories, separated by a colon.
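The following sketch strings steps 2 to 5 together for a silent installation. extract_fdm_package, find_hive_paths, run_silent_install, and the HIVE_DEFAULTS list are hypothetical helpers. HIVE_DEFAULTS copies only the single-directory Hive rows of Table 1 (the BigSQL rows and the colon-separated /lib entries are left out), and the sketch does not reproduce whatever search logic fast_data_movement_install.sh itself applies.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the installer's own
# default-path search is not reproduced here.
set -euo pipefail

# Steps 3 and 4: decompress, then unpack (tar -xzvf would do both at once).
extract_fdm_package() {
  local archive="$1"          # e.g. fast-data-movement-v1.0.2.tar.gz
  gunzip "$archive"           # replaces the .tar.gz with the .tar file
  tar -xvf "${archive%.gz}"
}

# Hive rows of Table 1 as "lib conf" pairs, in table order: BigInsights 3,
# BigInsights 4, Hortonworks, Cloudera 5, Cloudera QuickStart.
HIVE_DEFAULTS=(
  "/opt/ibm/biginsights/hive/lib/ /opt/ibm/biginsights/hive/conf/"
  "/usr/iop/current/hive-client/lib/ /usr/ibmpacks/bigsql/4.0/hive/conf/"
  "/usr/hdp/current/hive-client/lib/ /usr/hdp/current/hive-client/conf/"
  "/opt/cloudera/parcels/CDH/lib/hive/lib/ /etc/hive/conf/"
  "/usr/lib/hive/lib/ /etc/hive/conf/"
)

# Print "<lib> <conf>" for the first pair whose directories both exist.
# The optional argument prefixes every path (useful for testing).
find_hive_paths() {
  local root="${1:-}" pair lib conf
  for pair in "${HIVE_DEFAULTS[@]}"; do
    read -r lib conf <<<"$pair"
    if [ -d "${root}${lib}" ] && [ -d "${root}${conf}" ]; then
      echo "$lib $conf"
      return 0
    fi
  done
  return 1
}

# Step 5 with --silent: both Hive options must be given together.
# Run as root from the directory that contains the extracted installer.
run_silent_install() {
  local paths lib conf
  if ! paths=$(find_hive_paths); then
    echo "No default Hive paths found; pass --hive_lib and --hive_conf yourself" >&2
    return 1
  fi
  read -r lib conf <<<"$paths"
  ./fast_data_movement_install.sh --silent --hive_lib "$lib" --hive_conf "$conf"
}
```

Because --silent requires both Hive options, run_silent_install always passes --hive_lib and --hive_conf as a pair, and it stops with an error instead of guessing when none of the listed directories exist.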

Results

The data movement files are now available on your Hadoop nodes.

Note: The installation logs are saved under the installation location. For a default installation, the log directory is /fastDataMovement/var/log/fastDataMovement_install. If you specify an alternate location with the --datamove_path option, the logs are saved in the <datamove_path>/var/log/fastDataMovement_install directory.
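To find the installation logs after an installation or upgrade, the two cases in the note can be folded into one small helper. This is a sketch: fdm_install_log_dir is a hypothetical name, and treating /fastDataMovement as the default installation directory is an inference from the default log path given in the note.

```shell
#!/usr/bin/env bash
# Sketch only: the default base /fastDataMovement is inferred from the
# default log path given in this topic.
set -euo pipefail

# Hypothetical helper: print the installation log directory. The optional
# argument is the value that was passed to --datamove_path, if any.
fdm_install_log_dir() {
  local base="${1:-/fastDataMovement}"
  # Strip one trailing slash so "/opt/fdm/" and "/opt/fdm" behave the same.
  echo "${base%/}/var/log/fastDataMovement_install"
}

# Example: list the logs, newest first (/opt/fdm is a placeholder).
# ls -lt "$(fdm_install_log_dir)"
# ls -lt "$(fdm_install_log_dir /opt/fdm)"
```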