Setting up Connect:Direct Node on S3 object store providers

By default, Connect:Direct for UNIX uses the AWS S3 object store to transfer data between nodes installed on an EC2 instance or on-premises nodes, using pre-defined AWS S3 buckets. The following sections describe the tasks required to activate a Connect:Direct node on AWS S3.

By default, cloud support for Connect:Direct is not enabled. To enable cloud support, create and manage all related AWS services and complete the following tasks:
  • Pre-requisites to activate Connect:Direct Unix on AWS
  • Installing Connect:Direct Unix node on cloud
  • Configuring Connect:Direct node for S3
IBM® Connect:Direct® for UNIX can also be configured to support other S3 object store providers, such as MinIO, Dell EMC ECS, and Red Hat Ceph, to execute public and on-premises cloud-based operations. To extend support to other S3 object store providers, the following parameters must be declared in Initparms when installing Connect:Direct for UNIX:
  • name
  • s3.endPointUrl
  • s3.endPointPort
  • s3.profilePath
  • s3.profileName
For more information, see the section Configuring the Connect:Direct Node for S3.

Pre-requisites to set up Connect:Direct Unix on AWS

Before you configure Connect:Direct node definitions necessary for using Connect:Direct for UNIX, you must complete the following tasks:
  1. Set up AWS accounts and credentials
  2. Select and create an AMI instance (Red Hat or SUSE)
  3. Create IAM users and roles
  4. Create a security group and add the port numbers that are specific to Connect:Direct
  5. Set up and manage the S3 buckets

For more information, see https://aws.amazon.com/account.

Installing Connect:Direct Unix node on Cloud

No specific configuration is required to install a Connect:Direct for UNIX node on an EC2 instance. For information about installing Connect:Direct for UNIX, see Installing Connect:Direct for UNIX.

If you are upgrading a Connect:Direct for UNIX node from an older release, note that:
  • CD Unix, Linux platform, and JRE are now included in the base installation
  • The Initparms required for the S3 plugin configuration are updated during the upgrade process

Configuration considerations

  • All values used to change the default S3 IO Exit behavior should be provided through s3. variables, either by declaring them as defaults in the initparm.cfg file or by declaring specific values in sysopts.
  • An s3. variable is searched first in sysopts. If no value is retrieved from sysopts, the default value declared in the initparm.cfg file is used.
  • A parameter declared in sysopts overrides the initparm.cfg file parameter value
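
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the precedence rule, not the S3 IO Exit's own code; the function name resolve_s3_parameter and the sample values are hypothetical.

```python
from typing import Mapping, Optional


def resolve_s3_parameter(
    name: str,
    sysopts: Mapping[str, str],
    initparm_defaults: Mapping[str, str],
) -> Optional[str]:
    """Return the sysopts value if one is declared, else the initparm.cfg default."""
    if name in sysopts:
        # A value declared in sysopts always wins.
        return sysopts[name]
    # Otherwise fall back to the default declared in initparm.cfg (may be absent).
    return initparm_defaults.get(name)


# Hypothetical sample values, for illustration only.
initparm_defaults = {"s3.profileName": "default", "s3.sseS3": "NO"}
sysopts = {"s3.sseS3": "YES"}

print(resolve_s3_parameter("s3.sseS3", sysopts, initparm_defaults))        # YES
print(resolve_s3_parameter("s3.profileName", sysopts, initparm_defaults))  # default
```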

Configuring the Connect:Direct Node for S3

  1. S3 configuration is added to the Initparms during installation:
    # S3 IO Exit parameters
    file.ioexit:\
     :name=s3:\
     :library=/cdunix/ndm/lib/libcdjnibridge.so:\
     :home.dir=/cdunix/ndm/ioexit-plugins/s3:\
     :options=-Djava.class.path=/cdunix/ndm/ioexit-plugins/s3/cd-s3-ioexit.jar com.aricent.ibm.mft.connectdirect.s3ioexit.S3IOExitFactory:
    The entry contains the following fields:
    • name – S3 object store plugin name
      • To declare a new S3 provider, a new entry with a new scheme must be created such as:
        # S3 IO Exit parameters
        file.ioexit:\
         :name=new:\... 
      • To define another S3 object store provider, declare the provider name in Initparms as a separate entry.
        Note: AWS is defined as the default S3 provider in Initparms.
        # AWS S3 provider
        file.ioexit:\
         :name=s3:\
        # MinIO S3 provider
        file.ioexit:\
         :name=m3:\
    • library – Identifies the full path to the libcdjnibridge.so shared library.
    • home.dir – Identifies the full path to the S3 home directory.
    • options – Identifies the JVM properties to use, class path and main class (S3IOExitFactory) to invoke.

      Default values for the following parameters can be declared in the Initparms using the -D syntax in the :options= field.

      Note: Parameter declaration names are case sensitive.
    • s3.endPointUrl – New endpoint URL.
      Default: None
      Example: s3.endPointUrl=10.120.133.151
    • s3.endPointPort – Endpoint port, defined as an integer.
      Default: None
      Example: s3.endPointPort=9020
    • s3.endPointSecure – The exit generates an HTTP or HTTPS URI depending on this parameter.
      Default: YES
      Example: s3.endPointSecure=NO
    • s3.profilePath – Path and name of the credentials file used to retrieve profile entries. The value can be enclosed in quotes.
      Default: None
      Example: s3.profilePath='/home/some user/s3io/credentials'
    • s3.profileName – Entry name in the credentials file.
      Default: None
      Example: s3.profileName=new
      Note: The credentials file should have one entry for the new profile.
        [profile new]
        aws_access_key_id = anaccesskey
        aws_secret_access_key = asecretaccesskey
    • s3.virtualHostedUri – Sets the URI style used to access a bucket, either virtual-hosted-style or path-style. Set the parameter to:
      • YES to request a virtual-hosted-style URI
      • NO to request a path-style URI
      Default: Virtual-hosted style when the scheme name is S3; path style for any other scheme name.
      See https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
      Note: Virtual-hosted style is effectively active only if the endpoint is a DNS name.
      Note: The scheme name S3 can refer to an S3 provider other than AWS S3.
      Note: Some S3 providers do not support virtual-hosted URIs.
      Example: s3.virtualHostedUri=YES
    • s3.sseS3 – Amazon S3 default encryption enables users to set the default encryption behavior for an S3 bucket. You can set default encryption on a bucket so that all new objects are encrypted when they are stored in the bucket. The objects are encrypted using server-side encryption with either Amazon S3-managed keys (SSE-S3) or Customer Master Keys (CMKs) stored in AWS Key Management Service (AWS KMS).
      Set the parameter to YES to request server-side encryption.
      Default: NO
      For more information, see Bucket Encryption
      Example: s3.sseS3=YES
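
To show how the endpoint parameters combine, the following Python sketch builds an object URL from an endpoint, port, secure flag and URI style. It follows the general S3 convention (bucket in the host name for virtual-hosted style, bucket in the path for path style) and is not taken from the S3 IO Exit itself. The function name build_object_url, the bucket name and the object key are hypothetical, and object keys are not URL-encoded here.

```python
from typing import Optional


def build_object_url(
    endpoint: str,
    bucket: str,
    key: str,
    port: Optional[int] = None,
    secure: bool = True,
    virtual_hosted: bool = False,
) -> str:
    """Build an S3-style object URL (illustrative only, no URL encoding)."""
    scheme = "https" if secure else "http"
    # Virtual-hosted style puts the bucket in the host name, which is why
    # it only makes sense when the endpoint is a DNS name.
    host = f"{bucket}.{endpoint}" if virtual_hosted else endpoint
    authority = f"{host}:{port}" if port is not None else host
    # Path style puts the bucket in the first path segment instead.
    path = f"/{key}" if virtual_hosted else f"/{bucket}/{key}"
    return f"{scheme}://{authority}{path}"


# Path style against an IP endpoint, as in the parameter examples above.
print(build_object_url("10.120.133.151", "mybucket", "data/file.txt",
                       port=9020, secure=False))
# Virtual-hosted style against a DNS endpoint.
print(build_object_url("s3.amazonaws.com", "mybucket", "data/file.txt",
                       virtual_hosted=True))
```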

    Example declaration

    # NEW IO Exit parameters
    file.ioexit:\
     :name=new:\
     :library=/cdunix/ndm/lib/libcdjnibridge.so:\
     :home.dir=/cdunix/ndm/ioexit-plugins/s3:\
     :options=-Xmx640m -Ds3.profileName=newentry -Ds3.endPointUrl=10.120.133.151 -Ds3.endPointPort=9020 -Ds3.profilePath='/home/some user/s3io/credentials' -Djava.class.path=/cdunix/ndm/ioexit-plugins/s3/cd-s3-ioexit.jar com.aricent.ibm.mft.connectdirect.s3ioexit.S3IOExitFactory:
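
When several providers are declared, it can help to generate such entries rather than type them by hand. The Python sketch below renders an entry with the same layout as the example declaration. The helper names render_ioexit_entry and quote_if_needed are hypothetical, and the quoting rule (single quotes around values that contain white space) is an assumption based on the s3.profilePath example.

```python
from typing import Mapping, Sequence


def quote_if_needed(value: str) -> str:
    """Wrap a value in single quotes when it contains white space (assumed rule)."""
    return f"'{value}'" if any(ch.isspace() for ch in value) else value


def render_ioexit_entry(
    name: str,
    library: str,
    home_dir: str,
    jvm_flags: Sequence[str],
    s3_defaults: Mapping[str, str],
    class_path: str,
    main_class: str,
) -> str:
    """Render a file.ioexit entry; the options value is kept on one line."""
    d_flags = [f"-D{key}={quote_if_needed(val)}" for key, val in s3_defaults.items()]
    options = " ".join(
        [*jvm_flags, *d_flags, f"-Djava.class.path={class_path}", main_class]
    )
    return "\n".join([
        "file.ioexit:\\",
        f" :name={name}:\\",
        f" :library={library}:\\",
        f" :home.dir={home_dir}:\\",
        f" :options={options}:",
    ])


entry = render_ioexit_entry(
    name="new",
    library="/cdunix/ndm/lib/libcdjnibridge.so",
    home_dir="/cdunix/ndm/ioexit-plugins/s3",
    jvm_flags=["-Xmx640m"],
    s3_defaults={
        "s3.profileName": "newentry",
        "s3.endPointUrl": "10.120.133.151",
        "s3.endPointPort": "9020",
        "s3.profilePath": "/home/some user/s3io/credentials",
    },
    class_path="/cdunix/ndm/ioexit-plugins/s3/cd-s3-ioexit.jar",
    main_class="com.aricent.ibm.mft.connectdirect.s3ioexit.S3IOExitFactory",
)
print(entry)
```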

    The following parameters are used for tuning or diagnostics in rare situations:

    s3ioexit.objectSize        # Maximum object size. Default: 5TB. Similar to the
                               ulimit feature, it restricts the size of files that
                               can be uploaded.
    s3ioexit.dwldRange         # S3 read buffer size. Default: 5MB.
    s3ioexit.partSize          # The part size is calculated dynamically, so this
                               parameter should be removed (or used for testing only).
    s3ioexit.trace_level       # Allows more detailed trace logs than supported by CDU.
    cdioexit_trace_destination # Directs trace output to SMGR.TRC or to an external trace
                               log file; needed for detailed S3 traces that would
                               overflow SMGR.TRC.
    s3.executorMaxPool         # Maximum number of threads for parts upload.
                               Default: 10, Max: 20
    s3.executorMaxRetries      # Number of retries when an upload thread fails its
                               allocation (when a memory exception occurs).
                               Default: 10, Min: 1, Max: 20
    Note: To enable Multipart upload, set ckpt parameter value to 0. Checkpoint restart can be explicitly configured within a copy step through the ckpt parameter. If it is not configured in the copy step, it can be configured in the Initparms through the ckpt.interval parameter.
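
This document does not describe how the exit calculates the part size. As a rough illustration of why a dynamic calculation is needed, the Python sketch below picks a part size from the object size. It assumes the commonly documented Amazon S3 multipart limits of at most 10,000 parts and a 5 MiB minimum part size, together with the 5TB maximum object size mentioned above. The function name choose_part_size and the rounding to whole MiB are hypothetical choices.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB                   # assumed S3 minimum part size
MAX_PARTS = 10_000                        # assumed S3 maximum number of parts
MAX_OBJECT_SIZE = 5 * 1024 * 1024 * MIB   # 5 TiB, per s3ioexit.objectSize default


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def choose_part_size(object_size: int) -> int:
    """Pick the smallest part size (whole MiB) that fits within MAX_PARTS."""
    if object_size < 0 or object_size > MAX_OBJECT_SIZE:
        raise ValueError("object size outside the supported range")
    # Smallest part size, in bytes, that keeps the part count within the limit.
    needed = ceil_div(object_size, MAX_PARTS)
    # Round up to a whole number of MiB, then enforce the minimum part size.
    needed = ceil_div(needed, MIB) * MIB
    return max(MIN_PART_SIZE, needed)


for size in (1024 * MIB, MAX_OBJECT_SIZE):
    part = choose_part_size(size)
    print(size, part // MIB, ceil_div(size, part))
```

Small objects stay at the minimum part size, while very large objects get larger parts so that the part count stays within the limit.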
  2. Alternatively, S3 configuration can also be added as values in sysopts using the :variable=value: syntax.

    S3 parameters set in sysopts override the default values, or the values set in the initparm.cfg file, for the entry referenced in the copy step statement, that is, s3:// for an s3 entry and ascheme:// for an ascheme entry.

    Using sysopts with S3 parameters provides flexibility when managing more than one profile or more than one S3 provider. The following examples assume that an s3 entry in the initparm.cfg file is the default one and that the copy step uses s3:// as the scheme name.
    • sysopts=":s3.profileName=anotherprofile:" requests the use of this specific profile.
    • sysopts=":s3.sseS3=YES:" requests server-side encryption for this S3 object.
    • sysopts=":s3.profileName=newentry:s3.endPointUrl=10.120.133.151:s3.endPointPort=9020:s3.profilePath='/home/some user/s3io/credentials':" points to another S3 provider, although the default is AWS.
    Note: In sysopts, parameter names are not case sensitive.
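
The :variable=value: syntax can be illustrated with a small Python parser. This is a sketch for reading the examples above, not the parser that Connect:Direct uses; the function name parse_sysopts is hypothetical, and the handling of quoted values is an assumption based on the s3.profilePath example.

```python
import re
from typing import Dict

# A name, "=", then either a single-quoted value or a run of non-colon characters.
_PAIR = re.compile(r"(?:^|:)([^:=]+)=('[^']*'|[^:]*)")


def parse_sysopts(sysopts: str) -> Dict[str, str]:
    """Split a ':name=value:name=value:' string into a dictionary."""
    result: Dict[str, str] = {}
    for match in _PAIR.finditer(sysopts):
        name = match.group(1).strip()
        value = match.group(2).strip()
        # Drop the surrounding single quotes of a quoted value.
        if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
            value = value[1:-1]
        result[name] = value
    return result


example = (
    ":s3.profileName=newentry:s3.endPointUrl=10.120.133.151:"
    "s3.endPointPort=9020:s3.profilePath='/home/some user/s3io/credentials':"
)
for name, value in parse_sysopts(example).items():
    print(name, "=", value)
```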
  3. AWS Credentials Management

    With AWS cloud support extended to Connect:Direct for UNIX, the user is required to manage AWS credentials.

    A simple method of associating credentials with a Connect:Direct user is to use the AWS CLI configure command (aws configure), which places the credentials in ~/.aws/credentials, for example, /home/ec2-user/.aws/credentials.

    The AWS credentials are required only to access S3 during a pnode or snode copy step. During the copy step, the S3 IO Exit impersonates the user, and the user's home directory is used to access the AWS credentials.

    On AWS, with the default s3 entry in the initparm.cfg file (that is, with no extra parameters), Connect:Direct for UNIX uses the following default credential provider chain:
    • Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
    • Java system properties: aws.accessKeyId and aws.secretKey
    • The default credential profiles file path: ~/.aws/credentials
    • Amazon ECS container credentials: loaded from Amazon ECS when the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set
    • Instance profile credentials: used on EC2 instances and delivered through the Amazon EC2 metadata service. Instance profile credentials are used only when AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set
    • Web Identity Token credentials from the environment or container
      Note: The default credential providers chain implementation described above may not apply to other S3 providers.
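
The idea of a provider chain, in which each source is tried in order and the first one that yields credentials wins, can be sketched in Python. This simplified illustration covers only the environment variables and the credentials file from the list above; it is not the chain implementation used by the product. The function name find_credentials is hypothetical.

```python
import configparser
import os
from typing import Mapping, Optional, Tuple


def find_credentials(
    env: Mapping[str, str],
    credentials_file: str,
    profile: str = "default",
) -> Optional[Tuple[str, str, str]]:
    """Return (source, access_key_id, secret_key) from the first source that has both."""
    # 1. Environment variables.
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        return ("environment", key_id, secret)

    # 2. Credentials file; a missing file is silently skipped by read().
    parser = configparser.ConfigParser()
    parser.read(credentials_file)
    if parser.has_section(profile):
        section = parser[profile]
        if "aws_access_key_id" in section and "aws_secret_access_key" in section:
            return ("profile", section["aws_access_key_id"],
                    section["aws_secret_access_key"])

    # No source in this simplified chain produced credentials.
    return None


found = find_credentials(os.environ, os.path.expanduser("~/.aws/credentials"))
# Print only where the credentials came from, never the secret itself.
print(found[0] if found else "no credentials found")
```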
  4. Functional User Authorities Configuration
    The user authorities file, userfile.cfg, has been updated to support restricted S3 upload and download directories. When these directories are defined, only the specified S3 bucket can be used to send files from (upload) or to receive files into (download).
    cd-cloud-user:\
     :admin.auth=n:\
     :pstmt.copy.ulimit=n:\
     :pstmt.upload=y:\
     :pstmt.upload_dir=s3://uploadBucket:\
     :pstmt.download=y:\
     :pstmt.download_dir=s3://downloadBucket:\
     :pstmt.run_dir=:\
     :pstmt.submit_dir=:\
     :name=:\
     :phone=:\
     :descrip=:

    S3 file transfers are limited to user file data. They are not supported by Run Task, Run Job, or other areas that specify a file name. For example, run_dir and submit_dir in the above example can refer only to standard file system locations.
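
The effect of pstmt.upload_dir and pstmt.download_dir can be pictured as a prefix check on the S3 path. The Python sketch below is an illustration only. How Connect:Direct actually enforces the restriction is not described here, the function name is_within_allowed_dir is hypothetical, and treating an empty value as "no restriction" is an assumption.

```python
def is_within_allowed_dir(target: str, allowed_dir: str) -> bool:
    """Return True when target is the allowed location or lies beneath it."""
    if not allowed_dir:
        # Assumption: an empty setting means that no restriction applies.
        return True
    allowed = allowed_dir.rstrip("/")
    # Compare on a path boundary so that s3://uploadBucketOther does not
    # match a restriction to s3://uploadBucket.
    return target == allowed or target.startswith(allowed + "/")


print(is_within_allowed_dir("s3://uploadBucket/reports/a.csv", "s3://uploadBucket"))
print(is_within_allowed_dir("s3://uploadBucketOther/a.csv", "s3://uploadBucket"))
```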