Configuring IBM Aspera as a data transfer tool

IBM Aspera is a data transfer tool that makes efficient, policy-based use of network bandwidth in high-latency networks.

About this task

Aspera can transfer data between data sources and the staging area. Using Aspera to transfer data between the staging area and the execution hosts is not supported; in common configurations, the network that connects them is fast enough that data transfer speed is not a performance concern.

The data transfer nodes (I/O nodes) are Aspera clients, which initiate all file transfers. The external data repositories (the data source and data destination hosts) are Aspera servers. Aspera uses SSH public keys for non-interactive authentication. Refer to the Aspera documentation for information about how to generate and configure SSH keys.

LSF data manager can work with any data transfer tool that supports a non-interactive command-line interface. The data transfer tool is configured by the FILE_TRANSFER_CMD parameter in the lsf.datamanager file. The value of this parameter must be a single executable command; you cannot pass arguments to the command by adding them to the parameter value. The transfer command runs under the account of the user who submitted the job.

For more information, see Data transfer job script interface.
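Because FILE_TRANSFER_CMD accepts only a single executable, any fixed options have to live inside a wrapper script. The following is a minimal sketch of that pattern, not a supported script: it checks that the SSH key is readable before it calls ascp and forwards whatever arguments it receives unchanged. The names ASCP_BIN, ASCP_KEY, and run_transfer are hypothetical and exist only for this illustration; the default paths assume the ascp location and key file used elsewhere in this topic.

```shell
#!/bin/sh
# Hypothetical wrapper sketch. ASCP_BIN and ASCP_KEY are illustrative
# overrides, not LSF or Aspera settings.
ASCP_BIN=${ASCP_BIN:-/usr/bin/ascp}
ASCP_KEY=${ASCP_KEY:-$HOME/.ssh/id_rsa}

run_transfer() {
    # Fail early with a clear message if the private key cannot be read.
    if [ ! -r "$ASCP_KEY" ]; then
        echo "cannot read SSH key: $ASCP_KEY" >&2
        return 1
    fi
    # Fixed options live here; the caller's arguments pass through as-is.
    "$ASCP_BIN" -i "$ASCP_KEY" "$@"
}

# Run a transfer only when arguments are supplied.
if [ "$#" -gt 0 ]; then
    run_transfer "$@"
fi
```

Keeping the key path in one variable makes it easier to point the wrapper at a different key without touching the data manager configuration.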

The following steps show how to set up a simple integration for data manager file transfer that uses IBM Aspera:

Procedure

  1. Write a transfer script (LSF_SERVERDIR/ascp_wrap.sh) so that the Aspera ascp command can find the SSH credentials. Make the script executable (chmod 755 ascp_wrap.sh).
    #!/bin/sh
    /usr/bin/ascp -i "$HOME/.ssh/id_rsa" "$@"
  2. Configure data manager to run the ascp command through the wrapper script.
    Data manager doesn’t expand environment variables or lsf.datamanager configuration parameters, so you must explicitly specify the full path to ascp_wrap.sh. Edit LSF_ENVDIR/lsf.datamanager and add the parameter FILE_TRANSFER_CMD to the Parameters section.
    Begin Parameters
    ADMINS = lsfadmin
    STAGING_AREA = /var/lib/staging
    CACHE_INPUT_GRACE_PERIOD = 1440
    CACHE_OUTPUT_GRACE_PERIOD = 180
    CACHE_PERMISSIONS = user
    QUERY_NTHREADS = 4
    REMOTE_CACHE_REFRESH_INTERVAL = 15
    FILE_TRANSFER_CMD = /usr/share/lsf/9.1/linux2.6-glibc2.3-x86_64/etc/ascp_wrap.sh
    End Parameters
    
  3. Reconfigure the data manager daemon to apply the change.
    bdata admin reconfig
  4. Use the bdata showconf command to confirm that the configuration change took effect.
    bdata showconf
    LSF data management configuration at Tue Feb  3 10:34:09 2015
            ADMINS = lsfadmin
            CACHE_INPUT_GRACE_PERIOD = 1440 (minutes)
            CACHE_OUTPUT_GRACE_PERIOD = 180 (minutes)
            CACHE_PERMISSIONS = user
            FILE_PROCESSING_NTHREADS = 0
            FILE_TRANSFER_CMD = /usr/share/lsf/9.1/linux2.6-glibc2.3-x86_64/etc/ascp_wrap.sh
            LSB_TIME_DMD = 0
            LSF_DATA_HOSTS = hostA
            LSF_DATA_PORT = 61729
            LSF_LOGDIR = /usr/share/lsf/log
            LSF_LOG_MASK = LOG_WARNING
            QUERY_NTHREADS = 4
            REMOTE_CACHE_REFRESH_INTERVAL = 15 (seconds)
            STAGING_AREA = /var/lib/staging
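
The constraints in this procedure (a full path, no arguments, no environment variables, an executable file) can be checked before you reconfigure the daemon. The following is a rough sketch under those assumptions. The check_transfer_cmd function is a hypothetical helper, not an LSF command, and it reads the parameter with simple text matching rather than with any LSF parser.

```shell
#!/bin/sh
# Sketch: sanity-check the FILE_TRANSFER_CMD value in an lsf.datamanager file.
# check_transfer_cmd is a hypothetical helper, not part of LSF.
check_transfer_cmd() {
    conf_file=$1
    # Take the last FILE_TRANSFER_CMD assignment and trim trailing whitespace.
    value=$(sed -n 's/^[[:space:]]*FILE_TRANSFER_CMD[[:space:]]*=[[:space:]]*//p' "$conf_file" \
        | tail -n 1 | sed 's/[[:space:]]*$//')
    if [ -z "$value" ]; then
        echo "FILE_TRANSFER_CMD is not set"
        return 1
    fi
    case $value in
        # A space inside the value means arguments were added.
        *' '*) echo "arguments are not supported: $value"; return 1 ;;
        # Data manager does not expand variables in the value.
        *'$'*) echo "variables are not expanded: $value"; return 1 ;;
        /*) ;;
        *) echo "not a full path: $value"; return 1 ;;
    esac
    if [ ! -x "$value" ]; then
        echo "not executable: $value"
        return 1
    fi
    echo "ok: $value"
}
```

For example, calling check_transfer_cmd with the path to your lsf.datamanager file prints a line that starts with "ok:" when the value passes all four checks, and a short reason otherwise.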