Setting up Connect:Direct Node on S3 object store providers

By default, Connect:Direct for Unix uses the AWS S3 object store to transfer data between nodes installed on an EC2 instance or on-premises nodes, using pre-defined AWS S3 buckets. The following sections describe the tasks required to activate a Connect:Direct node on AWS S3.

By default, cloud support for Connect:Direct is not enabled. To enable cloud support, create and manage all related AWS services and complete the following tasks:
  • Prerequisites to activate Connect:Direct Unix on AWS
  • Installing Connect:Direct Unix node on cloud
  • Configuring Connect:Direct node for S3
IBM® Connect:Direct® for UNIX can also be configured to extend support to other S3 object store providers, such as Minio, Dell EMC ECS, and Red Hat Ceph, for public and on-premises cloud-based operations. To extend support to other S3 object store providers, declare the following parameters in Initparms when installing Connect:Direct for Unix:
  • name
  • s3.endPointUrl
  • s3.endPointPort
  • s3.profilePath
  • s3.profileName
For more information, see Configuring the Connect:Direct Node for S3.

Prerequisites to set up Connect:Direct Unix on AWS

Before you configure Connect:Direct node definitions necessary for using Connect:Direct for UNIX, you must complete the following tasks:
  1. Set up AWS accounts and credentials
  2. Select and create an AMI instance (Red Hat or SUSE)
  3. Create IAM users and roles
  4. Create a security group and add the Connect:Direct-specific port numbers to it
  5. Set up S3 bucket management

    For more information see, https://aws.amazon.com/account.

    Installing Connect:Direct Unix node on Cloud

    No specific configuration is required to install a Connect:Direct for Unix node on an EC2 instance. For information about installing Connect:Direct for UNIX, see Installing Connect:Direct for UNIX.

    If you are upgrading from an earlier release of Connect:Direct for UNIX, note that:
    • CD Unix, the Linux platform, and the JRE are now included in the base installation
    • Initparms to be included during the S3 plugin configuration are updated during the upgrade process
    Configuration considerations
    • All values used to change the default S3 IO Exit behavior should be provided through s3. variables, either by declaring them as defaults in the initparm.cfg file or by declaring specific values in sysopts.
    • An s3. variable is searched for first in sysopts. If no value is retrieved from sysopts, the default value declared in the initparm.cfg file is used.
    • A parameter declared in sysopts overrides the corresponding initparm.cfg parameter value.
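    The precedence rule above can be sketched as follows; the function and dictionary names are illustrative only, not part of the product:

    ```python
    def resolve_s3_param(name, sysopts, initparm_defaults):
        """Return the effective value of an s3. variable.

        A value declared in sysopts takes precedence; otherwise the
        default declared in the initparm.cfg file is used.
        """
        if name in sysopts:
            return sysopts[name]
        return initparm_defaults.get(name)

    # sysopts overrides the initparm.cfg default
    sysopts = {"s3.endPointPort": "9020"}
    defaults = {"s3.endPointPort": "9000", "s3.endPointSecure": "YES"}

    print(resolve_s3_param("s3.endPointPort", sysopts, defaults))    # sysopts value wins
    print(resolve_s3_param("s3.endPointSecure", sysopts, defaults))  # falls back to the default
    ```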

    Configuring the Connect:Direct Node for S3

  1. S3 configuration is added to the Initparms during installation:
    # S3 IO Exit parameters
    file.ioexit:\
     :name=s3:\
     :library=/cdunix/ndm/lib/libcdjnibridge.so:\
     :home.dir=/cdunix/ndm/ioexit-plugins/s3:\
     :options=-Djava.class.path=/cdunix/ndm/ioexit-plugins/s3/cd-s3-ioexit.jar com.aricent.ibm.mft.connectdirect.s3ioexit.S3IOExitFactory:
    • name – S3 object store plugin name
      • To declare a new S3 provider, a new entry with a new scheme must be created such as:
        # S3 IO Exit parameters
        file.ioexit:\
         :name=new:\... 
      • To define another S3 object store provider, declare the provider name in Initparms as a separate entry.
        Note: AWS is defined as the default S3 provider in Initparms.
        # AWS S3 provider
        file.ioexit:\
         :name=s3:\
        # Minio S3 provider
        file.ioexit:\
         :name=m3:\
    • library – Identifies the full path to libcdjnibridge.so shared library.
    • home.dir – Identifies the full path to the S3 home directory.
    • options – Identifies the JVM properties to use, class path and main class (S3IOExitFactory) to invoke.

      Default values are set in the Initparms using the :options field. Default values for the following parameters can be declared using the -D syntax in the :options= field.

      Note: Parameter declaration names are case sensitive.
    • s3.endPointUrl – New endpoint URL.
      Default: None
      Example: s3.endPointUrl=10.120.133.151
    • s3.endPointPort – Endpoint port, defined as an integer.
      Default: None
      Example: s3.endPointPort=9020
    • s3.endPointSecure – The exit generates an HTTP or HTTPS URI depending on this parameter.
      Default: YES
      Example: s3.endPointSecure=NO
    • s3.profilePath – Path of the credentials file used to retrieve profile entries. The value can be enclosed in quotes.
      Default: None
      Example: s3.profilePath='/home/some user/s3io/credentials'
    • s3.profileName – Entry name in the credentials file.
      Default: None
      Example: s3.profileName=new

      [new]
      aws_access_key_id = 2L0LF3NEQYYBQNV2P7NI
      aws_secret_access_key = YdovcT2yRgQAVuHliN1ns0s67E26vO8G
    • s3.virtualHostedUri – Sets the URI style used to access a bucket. Set the parameter to:
      • YES to request a virtual-hosted-style URI
      • NO to request a path-style URI
      Default: virtual-hosted style when the scheme name is s3; path style for any other scheme name.
      See https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
      Note 1: Virtual-hosted style is effectively active only if the endpoint is a DNS name.
      Note 2: The scheme name s3 can refer to an S3 provider other than AWS S3.
      Note 3: Some S3 providers do not support virtual-hosted URIs.
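    The credentials file referenced by s3.profilePath uses INI-style [profile] sections, as in the s3.profileName example above. A minimal sketch of reading such an entry with Python's configparser (the file content and key values here are illustrative placeholders, not real credentials):

    ```python
    import configparser

    # Illustrative credentials file content in the INI format shown above
    credentials_text = """
    [new]
    aws_access_key_id = EXAMPLEACCESSKEYID
    aws_secret_access_key = exampleSecretAccessKey
    """

    def load_profile(text, profile_name):
        """Return (access_key, secret_key) for the named profile entry."""
        parser = configparser.ConfigParser()
        parser.read_string(text)
        section = parser[profile_name]
        return section["aws_access_key_id"], section["aws_secret_access_key"]

    access_key, secret_key = load_profile(credentials_text, "new")
    print(access_key)
    ```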
     

    Example declaration

    # NEW IO Exit parameters
    file.ioexit:\
    :name=new:\
    :library=/opt/cd43/ndm/lib/libcdjnibridge.so:\
    :home.dir=/opt/cd43/ndm/ioexit-plugins/s3:\
    :options=-Xmx640m -Ds3.profileName=newentry
    -Ds3.endPointUrl=10.120.133.151
    -Ds3.endPointPort=9020
    -Ds3.profilePath='/home/some user/s3io/credentials'
    -Djava.class.path=/opt/cd43/ndm/ioexit-plugins/s3/cd-s3-ioexit.jar com.aricent.ibm.mft.connectdirect.s3ioexit.S3IOExitFactory:

    The following parameters are used for tuning or diagnostics in rare situations:

    
    s3ioexit.objectSize        # Maximum object size; defaults to 5 TB. Similar to the
                                 ulimit feature, it restricts the file size that can be
                                 uploaded.
    s3ioexit.dwldRange         # S3 read buffer size. Defaults to 5 MB.
    s3ioexit.partSize          # The part size is calculated dynamically, so this
                                 parameter should be left unset (or used for testing only).
    s3ioexit.trace_level       # Allows more detailed trace logs than supported by CDU.
    cdioexit_trace_destination # Directs tracing to SMGR.TRC or an external trace log file;
                                 needed for detailed S3 traces that would overflow SMGR.TRC.
    s3.executorMaxPool         # Maximum number of threads for part uploads.
                                 Default: 10, Max: 20
    s3.executorMaxRetries      # Number of retries when an upload thread fails its
                                 allocation (when a memory exception occurs).
                                 Default: 10, Min: 1, Max: 20
    Note: To enable multipart upload, set the ckpt parameter value to 0. Checkpoint restart can be explicitly configured within a copy step through the ckpt parameter. If it is not configured in the copy step, it can be configured in the Initparms through the ckpt.interval parameter.
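    As a hedged illustration of the ckpt.interval approach mentioned above, a default checkpoint interval could be declared in the initparm.cfg file; the value shown is an arbitrary example:

    ```
    # initparm.cfg (illustrative): default checkpoint interval for copy steps
    ckpt.interval=2M
    ```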
  2. Alternatively, S3 configuration can also be added as values in sysopts using the :variable=value: syntax.
    Note: Parameter declaration names are not case sensitive.
    Sysopts=":s3.profileName=newentry:s3.endPointUrl=10.120.133.151:s3.endPointPort=9020:s3.profilePath='/home/some user/s3io/credentials'"
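    A sketch of how the :variable=value: sysopts string breaks down into individual parameters; the parser below is illustrative only, and assumes values contain no ':' characters:

    ```python
    def parse_sysopts(sysopts):
        """Split a ':variable=value:' string into a dict.

        Each item is split on the first '=' only; leading and trailing
        ':' delimiters are stripped.
        """
        result = {}
        for item in sysopts.strip(":").split(":"):
            if not item:
                continue
            name, _, value = item.partition("=")
            result[name] = value
        return result

    opts = parse_sysopts(
        ":s3.profileName=newentry:s3.endPointUrl=10.120.133.151:s3.endPointPort=9020:"
    )
    print(opts["s3.endPointPort"])
    ```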
  3. AWS Credentials Management -

    With the introduction of AWS cloud support in Connect:Direct Unix, users need to manage AWS credentials.

    A simple method of associating credentials with a Connect:Direct user is to use the AWS CLI configure command, which places the credentials in ~/.aws/credentials, for example, /home/ec2-user/.aws/credentials.

    The AWS credentials are only required to access S3 during a pnode or snode copy step. During the copy step, the user is impersonated by the S3 IO Exit, and the user's home directory is used to access the AWS credentials.

  4. Functional User Authorities Configuration -
    The user authorities file, userfile.cfg, has been updated to support restricted S3 upload and download directories. When defined, only the S3 bucket specified may be used to send files (upload) or receive files (download).
    cd-cloud-user:\
     :admin.auth=n:\
     :pstmt.copy.ulimit=n:\
     :pstmt.upload=y:\
     :pstmt.upload_dir=s3://uploadBucket:\
     :pstmt.download=y:\
     :pstmt.download_dir=s3://downloadBucket:\
     :pstmt.run_dir=:\
     :pstmt.submit_dir=:\
     :name=:\
     :phone=:\
     :descrip=:

    S3 file transfers are limited to user file data and are not supported by Run Task/Job or other areas that specify a filename; for example, run_dir and submit_dir in the above example can only refer to standard file system locations.
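    The restriction described above amounts to a prefix check on the S3 URI. A minimal illustrative sketch; the function name and logic are hypothetical, not the product's implementation:

    ```python
    def is_upload_allowed(dest_uri, upload_dir):
        """Return True if dest_uri falls under the configured pstmt.upload_dir.

        An empty upload_dir (unset in userfile.cfg) means no restriction;
        otherwise the destination must start with the configured bucket prefix.
        """
        if not upload_dir:
            return True
        return dest_uri.startswith(upload_dir.rstrip("/") + "/")

    print(is_upload_allowed("s3://uploadBucket/data.bin", "s3://uploadBucket"))  # True
    print(is_upload_allowed("s3://otherBucket/data.bin", "s3://uploadBucket"))   # False
    ```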