If you plan to run the data movement functionality from Hadoop, you must first install
the IBM® Fast Data Movement package on the Hadoop node from which data
movement is to be run.
The installation and upgrade procedures for the data movement feature are identical.
About this task
The installation package is available in the Db2® Warehouse
container at the following path:
/services/bludownload/FDM/fast-data-movement-v1.0.2.tar.gz
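If the Db2 Warehouse container is managed with Docker, one way to get the package out of it is `docker cp`. The following is a minimal sketch under that assumption. The function name `copy_fdm_package` and the container name `Db2wh` in the usage line are hypothetical; substitute the container name that `docker ps` reports on your Db2 Warehouse host.

```shell
# Hypothetical helper: copy the Fast Data Movement package out of the
# Db2 Warehouse container. Assumes the container is managed by Docker.
FDM_PACKAGE_PATH=/services/bludownload/FDM/fast-data-movement-v1.0.2.tar.gz

copy_fdm_package() {
    local container="$1"       # Docker container name (check with `docker ps`)
    local dest_dir="${2:-.}"   # local destination directory, default: current dir
    mkdir -p "$dest_dir" || return 1
    # `docker cp CONTAINER:SRC_PATH DEST_PATH` copies a file out of a container;
    # a trailing slash on DEST_PATH places the file inside that directory.
    docker cp "$container:$FDM_PACKAGE_PATH" "$dest_dir/"
}

# Example: copy_fdm_package Db2wh /tmp/fdm
```

After copying, move the file to the Hadoop node by whatever means your site uses (for example, scp).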
Note: If Kerberos authentication is enabled on Hadoop, you must run kinit to obtain a Kerberos ticket
before you start the installation.
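The Kerberos prerequisite can be checked before running the installer. This is a minimal sketch assuming MIT Kerberos client tools, where `klist -s` produces no output and exits with a non-zero status when the credentials cache holds no valid ticket. The function name `ensure_kerberos_ticket` and the principal `hdfs@EXAMPLE.COM` are hypothetical placeholders; use the principal that your Hadoop administrator assigned.

```shell
# Hypothetical helper: run kinit only if no valid Kerberos ticket is cached.
# Assumes MIT Kerberos tools: `klist -s` is silent and returns non-zero
# when there is no valid (unexpired) ticket in the credentials cache.
ensure_kerberos_ticket() {
    local principal="$1"   # for example hdfs@EXAMPLE.COM (placeholder)
    if klist -s; then
        echo "Valid Kerberos ticket found."
        return 0
    fi
    echo "No valid ticket; running kinit for $principal."
    # kinit prompts for the password (or uses a keytab if configured).
    kinit "$principal"
}

# Example: ensure_kerberos_ticket hdfs@EXAMPLE.COM
```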
Procedure
- Download the fast-data-movement-version.tar.gz
package (where version is the package version, for example v1.0.2) from the Db2 Warehouse container, and save it in a
location that is accessible to your Hadoop node.
- Change to the directory where the
fast-data-movement-version.tar.gz file is
located.
- Run the following command to decompress the .tar.gz package:
gunzip fast-data-movement-version.tar.gz
- Extract the .tar package:
tar -xvf fast-data-movement-version.tar
- As root, run the installation script:
./fast_data_movement_install.sh
The following installation parameters are available:
- --silent
  Runs without prompting for user input and uses all default values. If the script cannot proceed without
  additional information, the installation exits with an error. With the --silent option, you must specify both
  the --hive_conf and --hive_lib options.
- --datamove_path <datamove_path>
  Specifies a user-defined path for the data movement utility files. The files are
  placed in <datamove_path>.
- --hdfs_user <hdfs_user>
  Specifies the name of a user that has permissions on HDFS.
- --hive_conf <hive_configuration_path>
  Specifies the full path of the Hive configuration directory. If omitted, the default paths are
  checked.
- --hive_lib <hive_library_path>
  Specifies the full path of the Hive library directory. If omitted, the default paths are
  checked.
Note:
If the Hive or BigSQL /lib and /conf
directories are not in their default locations, you must specify their paths.
When you specify Hive paths, you must always provide both the --hive_conf and
--hive_lib options.
These paths are then used when you run the
fdm.sh script to run data
movement from Hadoop. If the installation script fails to locate the required files twice in a row, the installation
process fails. The following table lists the default paths for each service provider.
Table 1. Default paths for Hadoop files

| Service provider | Service | /lib | /conf |
|---|---|---|---|
| BigInsights 3 | Hive | /opt/ibm/biginsights/hive/lib/ | /opt/ibm/biginsights/hive/conf/ |
| BigInsights 3 | BigSQL | /home/bigsql/sqllib/java/ | /opt/ibm/biginsights/hive/conf/ |
| BigInsights 4 | Hive | /usr/iop/current/hive-client/lib/ | /usr/ibmpacks/bigsql/4.0/hive/conf/ |
| BigInsights 4 | BigSQL | /home/bigsql/sqllib/java/ | /usr/ibmpacks/bigsql/4.0/hive/conf/ |
| Hortonworks | | /usr/hdp/current/hive-client/lib/ | /usr/hdp/current/hive-client/conf/ |
| Hortonworks 2.3.2 and above | | /usr/hdp/current/atlas-server/hook/hive/:/usr/hdp/current/hive-client/lib/ | /usr/hdp/current/hive-client/conf/ |
| Cloudera 4 | | /usr/lib/hive/lib/:/usr/share/cmf/cloudera-navigator-server/libs/cdh4/ | /etc/hive/conf/ |
| Cloudera 5 | | /opt/cloudera/parcels/CDH/lib/hive/lib/ | /etc/hive/conf/ |
| Cloudera QuickStart | | /usr/lib/hive/lib/ | /etc/hive/conf/ |
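The extraction steps and the Table 1 defaults can be combined into a small helper script. This is a sketch only: the function names `extract_fdm_package` and `fdm_default_paths` and the provider keys (such as `hortonworks232` for "Hortonworks 2.3.2 and above") are hypothetical. The paths are copied from Table 1 as-is; entries that list two /lib directories are kept colon-separated, and whether the installer accepts such a value for --hive_lib is not stated here, so the usage example uses a single-path provider.

```shell
# Decompress and unpack the package, exactly as in the documented steps:
#   gunzip fast-data-movement-<version>.tar.gz
#   tar -xvf fast-data-movement-<version>.tar
extract_fdm_package() {
    local archive="$1"                # e.g. fast-data-movement-v1.0.2.tar.gz
    gunzip "$archive" || return 1     # replaces the archive with ${archive%.gz}
    tar -xvf "${archive%.gz}"
}

# Print "<lib> <conf>" for a provider (and, for BigInsights, a service)
# using the values from Table 1. Provider keys are hypothetical.
fdm_default_paths() {
    case "$1/${2:-}" in
        biginsights3/hive)     echo "/opt/ibm/biginsights/hive/lib/ /opt/ibm/biginsights/hive/conf/" ;;
        biginsights3/bigsql)   echo "/home/bigsql/sqllib/java/ /opt/ibm/biginsights/hive/conf/" ;;
        biginsights4/hive)     echo "/usr/iop/current/hive-client/lib/ /usr/ibmpacks/bigsql/4.0/hive/conf/" ;;
        biginsights4/bigsql)   echo "/home/bigsql/sqllib/java/ /usr/ibmpacks/bigsql/4.0/hive/conf/" ;;
        hortonworks/*)         echo "/usr/hdp/current/hive-client/lib/ /usr/hdp/current/hive-client/conf/" ;;
        hortonworks232/*)      echo "/usr/hdp/current/atlas-server/hook/hive/:/usr/hdp/current/hive-client/lib/ /usr/hdp/current/hive-client/conf/" ;;
        cloudera4/*)           echo "/usr/lib/hive/lib/:/usr/share/cmf/cloudera-navigator-server/libs/cdh4/ /etc/hive/conf/" ;;
        cloudera5/*)           echo "/opt/cloudera/parcels/CDH/lib/hive/lib/ /etc/hive/conf/" ;;
        cloudera-quickstart/*) echo "/usr/lib/hive/lib/ /etc/hive/conf/" ;;
        *) echo "unknown provider/service: $1/${2:-}" >&2; return 1 ;;
    esac
}

# Example (as root, in the directory that holds the package):
#   extract_fdm_package fast-data-movement-v1.0.2.tar.gz
#   set -- $(fdm_default_paths hortonworks)
#   ./fast_data_movement_install.sh --silent --hive_lib "$1" --hive_conf "$2"
```

Passing both --hive_lib and --hive_conf together follows the rule that Hive paths must always be given as a pair, which is also required with --silent.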
Results
The data movement files are now available on your Hadoop nodes.
Note: The installation logs are saved in the installation location. For a default installation, the log
directory is /fastDataMovement/var/log/fastDataMovement_install. If you specify an alternative
location with the --datamove_path option, the logs are saved in the
/<datamove_path>/var/log/fastDataMovement_install directory.
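To locate the logs from a script, the rule above can be expressed as a small function. This is a minimal sketch that assumes <datamove_path> is an absolute path; the function name `fdm_install_log_dir` is hypothetical.

```shell
# Hypothetical helper: print the installation log directory.
# With no argument, the default installation location is assumed;
# otherwise the argument is the value that was passed to --datamove_path.
fdm_install_log_dir() {
    local datamove_path="${1:-}"
    if [ -z "$datamove_path" ]; then
        echo "/fastDataMovement/var/log/fastDataMovement_install"
    else
        # Drop one trailing slash so "/opt/fdm/" and "/opt/fdm" give the same result.
        echo "${datamove_path%/}/var/log/fastDataMovement_install"
    fi
}

# Example: ls "$(fdm_install_log_dir /opt/fdm)"
```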