Setting up Connect:Direct Node on S3 object store providers
By default, Connect:Direct for Unix uses the AWS S3 object store to transfer data between nodes installed on an EC2 instance or on-premises nodes, using predefined AWS S3 buckets. The following sections describe the tasks required to activate a Connect:Direct node on AWS S3.
- Pre-requisites to activate Connect:Direct Unix on AWS
- Installing Connect:Direct Unix node on cloud
- Configuring Connect:Direct node for S3
Pre-requisites to set up Connect:Direct Unix on AWS
- Set up AWS accounts and credentials
- Select and create an AMI instance, RedHat or SuSE
- Create IAM user/roles
- Create a security group. Add the port numbers that are specific to Connect:Direct to the security group
- S3 Bucket Management
For more information, see https://aws.amazon.com/account.
Installing Connect:Direct Unix node on Cloud
No specific configuration is required to install a Connect:Direct for UNIX node on an EC2 instance. For installation instructions, see Installing Connect:Direct for UNIX.
If you are upgrading from an earlier release of Connect:Direct for UNIX, note that:
- CD Unix, Linux platform, and JRE are now included in the base installation
- Initparms to be included during the S3 plugin configuration are updated during the upgrade process
Configuration considerations
- All values used to change the default S3 IO Exit behavior should be provided through s3. variables, either by declaring them as defaults in the initparm.cfg file or by declaring specific values in sysopts.
- An s3. variable is searched for first in sysopts. If no value is retrieved from sysopts, the default value declared in the initparm.cfg file is used.
- A parameter declared in sysopts overrides the initparm.cfg file parameter value.
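The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the precedence rule, not the S3 IO Exit's actual code. The function name resolve_s3_param and the sample values are hypothetical.

```python
def resolve_s3_param(name, sysopts, initparm_defaults):
    """Return the effective value of an s3.* variable.

    sysopts is consulted first; the initparm.cfg default is used only
    when sysopts carries no value for the variable.
    """
    if name in sysopts:
        return sysopts[name]
    return initparm_defaults.get(name)  # None when no default is declared


# Hypothetical values, for illustration only.
initparm_defaults = {"s3.endPointUrl": "10.120.133.151", "s3.endPointPort": "9020"}
sysopts = {"s3.endPointPort": "9021"}

effective_port = resolve_s3_param("s3.endPointPort", sysopts, initparm_defaults)
effective_url = resolve_s3_param("s3.endPointUrl", sysopts, initparm_defaults)
```

Here the port comes from sysopts, while the endpoint URL falls back to the default declared for initparm.cfg.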
Configuring the Connect:Direct Node for S3
- S3 configuration is added to the Initparms during installation.
# S3 IO Exit parameters
file.ioexit:\
 :name=s3:\
 :library=/cdunix/ndm/lib/libcdjnibridge.so:\
 :home.dir=/cdunix/ndm/ioexit-plugins/s3:\
 :options=-Djava.class.path=/cdunix/ndm/ioexit-plugins/s3/cd-s3-ioexit.jar com.aricent.ibm.mft.connectdirect.s3ioexit.S3IOExitFactory:
- name – S3 object store plugin name
- To declare a new S3 provider, a new entry with a new scheme must be created such as:
# S3 IO Exit parameters
file.ioexit:\
 :name=new:\...
- To define another S3 object store provider, declare the provider name in Initparms as a separate entry.
Note: AWS is defined as the default S3 provider in Initparms.
#AWS S3 provider
file.ioexit:\
 :name=s3:\
#Minio S3 provider
file.ioexit:\
 :name=m3:\
- library – Identifies the full path to libcdjnibridge.so shared library.
- home.dir – Identifies the full path to the S3 home directory.
- options – Identifies the JVM properties to use, the class path, and the main class (S3IOExitFactory) to invoke.
Default values are set in the Initparms using the :options field. Default values for the following parameters can be declared using the -D syntax in the :options= field.
Note: Parameter declaration names are case sensitive.
Parameter, description, and example:
- s3.endPointUrl – New endpoint URL. Default: None
  Example: s3.endPointUrl=10.120.133.151
- s3.endPointPort – Defines the endpoint port as an integer. Default: None
  Example: s3.endPointPort=9020
- s3.endPointSecure – The exit generates an HTTP or HTTPS URI depending on this parameter. Default: YES
  Example: s3.endPointSecure=NO
- s3.profilePath – Name of the credentials file used to retrieve profile entries. Can be enclosed in quotes. Default: None
  Example: s3.profilePath='/home/some user/s3io/credentials'
- s3.profileName – Entry name in the credentials file. Default: None
  Example: s3.profileName=new
  The matching entry in the credentials file:
  [new]
  aws_access_key_id = 2L0LF3NEQYYBQNV2P7NI
  aws_secret_access_key = YdovcT2yRgQAVuHliN1ns0s67E26vO8G
- s3.virtualHostedUri – Sets the URI style used to access a bucket to virtual-hosted-style or path-style URLs. Set the parameter to:
  - YES to request a virtual-hosted-style URI
  - NO to request a path-style URI
  Default values:
  - Scheme name is S3: virtual-hosted style
  - Other scheme name: path style
  See https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
  Note 1: Virtual-hosted style is effectively active only if the endpoint is a DNS name.
  Note 2: The scheme name S3 can refer to an S3 provider other than AWS S3.
  Note 3: Some S3 providers do not support virtual-hosted URIs.
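To make the difference between the two URI styles concrete, the following Python sketch builds an object URL from an endpoint, port, secure flag, and URI style. It illustrates the general S3 addressing convention and is not the exit's own implementation. The function names, the bucket mybucket, and the key data/file.txt are hypothetical. Falling back to path style for an IP-literal endpoint is one assumed reading of note 1.

```python
import ipaddress


def is_ip_address(endpoint):
    """Return True when the endpoint is an IP literal rather than a DNS name."""
    try:
        ipaddress.ip_address(endpoint)
        return True
    except ValueError:
        return False


def build_object_url(endpoint, bucket, key, port=None, secure=True, virtual_hosted=False):
    """Build an object URL in virtual-hosted or path style."""
    scheme = "https" if secure else "http"
    # Assumption: virtual-hosted style only makes sense for DNS names.
    use_virtual = virtual_hosted and not is_ip_address(endpoint)
    host = f"{bucket}.{endpoint}" if use_virtual else endpoint
    if port is not None:
        host = f"{host}:{port}"
    path = f"/{key}" if use_virtual else f"/{bucket}/{key}"
    return f"{scheme}://{host}{path}"


virtual_url = build_object_url("s3.amazonaws.com", "mybucket", "data/file.txt", virtual_hosted=True)
path_url = build_object_url("10.120.133.151", "mybucket", "data/file.txt", port=9020, secure=False)
```

In virtual-hosted style the bucket name becomes part of the host name. In path style it is the first segment of the path.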
Example declaration
# NEW IO Exit parameters
file.ioexit:\
 :name=new:\
 :library=/opt/cd43/ndm/lib/libcdjnibridge.so:\
 :home.dir=/opt/cd43/ndm/ioexit-plugins/s3:\
 :options=-Xmx640m -Ds3.profileName=newentry -Ds3.endPointUrl=10.120.133.151 -Ds3.endPointPort=9020 -Ds3.profilePath='/home/some user/s3io/credentials' -Djava.class.path=/opt/cd43/ndm/ioexit-plugins/s3/cd-s3-ioexit.jar com.aricent.ibm.mft.connectdirect.s3ioexit.S3IOExitFactory:
The following parameters are used for tuning or diagnostics in rare situations:
s3ioexit.objectSize # Default max object size is 5TB. Similar to the ulimit feature, it restricts the size of a file that can be uploaded.
s3ioexit.dwldRange # Defaults to 5MB. Sets the S3 read buffer size.
s3ioexit.partSize # Since the part size is calculated dynamically, this should be removed (or used for test only).
s3ioexit.trace_level # Allows more detailed trace logs than supported by CDU.
cdioexit_trace_destination # Directs trace output to SMGR.TRC or to an external trace log file. Needed for detailed S3 traces that would overflow SMGR.TRC.
s3.executorMaxPool # Maximum number of threads for parts upload. Default: 10, Max: 20
s3.executorMaxRetries # Number of retries when an upload thread fails its allocation (when a memory exception occurs). Default: 10, Min: 1, Max: 20
Note: To enable Multipart upload, set the ckpt parameter value to 0. Checkpoint restart can be explicitly configured within a copy step through the ckpt parameter. If it is not configured in the copy step, it can be configured in the Initparms through the ckpt.interval parameter.
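As background on why a multipart part size is usually derived from the object size, the following Python sketch picks the smallest part size that keeps an upload within Amazon S3's published multipart limits (at most 10,000 parts, parts of at least 5 MiB, objects up to 5 TiB). This is an assumption-laden illustration and not the calculation the S3 IO Exit performs. The function name choose_part_size is hypothetical.

```python
MIN_PART_SIZE = 5 * 1024**2    # 5 MiB, smallest part S3 accepts (except the last part)
MAX_PARTS = 10_000             # S3 limit on parts per multipart upload
MAX_OBJECT_SIZE = 5 * 1024**4  # 5 TiB, S3 limit on object size


def choose_part_size(object_size):
    """Return a part size in bytes that fits object_size into MAX_PARTS parts."""
    if object_size > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 5 TiB S3 limit")
    # Ceiling division without floating point.
    needed = -(-object_size // MAX_PARTS)
    return max(MIN_PART_SIZE, needed)


small = choose_part_size(100 * 1024**2)  # 100 MiB object
large = choose_part_size(5 * 1024**4)    # 5 TiB object
```

Small objects stay at the 5 MiB minimum, while very large objects need proportionally larger parts to stay under the part-count limit.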
- Alternatively, S3 configuration can also be added as values in sysopts using the :variable=value: syntax.
Note: Parameter declaration names are not case sensitive.
Sysopts=":s3.profileName=newentry:s3.endPointUrl=10.120.133.151:s3.endPointPort=9020:s3.profilePath='/home/some user/s3io/credentials'"
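The :variable=value: syntax can be read as a colon-delimited list of name and value pairs, where a quoted value may contain spaces. The Python sketch below parses such a string into a dictionary. It is a simplified illustration, not the parser Connect:Direct uses. The function name parse_sysopts is hypothetical, and lowercasing the names is an assumed way to model the case-insensitivity noted above.

```python
import re

# A name, an equals sign, then either a single-quoted value or text up to the next colon.
_PAIR = re.compile(r"([^:=\s]+)=('[^']*'|[^:]*)")


def parse_sysopts(sysopts):
    """Parse a :variable=value: string into a dict keyed by lowercase name."""
    params = {}
    for name, value in _PAIR.findall(sysopts):
        params[name.lower()] = value.strip().strip("'")
    return params


example = (
    ":s3.profileName=newentry:s3.endPointUrl=10.120.133.151"
    ":s3.endPointPort=9020:s3.profilePath='/home/some user/s3io/credentials':"
)
parsed = parse_sysopts(example)
```

The quoted profile path keeps its embedded space, and every name can be looked up regardless of how it was capitalized in sysopts.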
- AWS Credentials Management
With the introduction of AWS cloud support in Connect:Direct Unix, users need to manage AWS credentials.
A simple method of associating credentials with a Connect:Direct user is to use the AWS CLI configure command (aws configure), which places the credentials in ~/.aws/credentials, for example, /home/ec2-user/.aws/credentials.
The AWS credentials are required only to access S3 during a pnode or snode copy step. During the copy step, the S3 IO Exit impersonates the user, and the user's home directory is used to access the AWS credentials.
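The credentials file is an INI-style file with one section per profile. As a rough illustration of how a profile entry might be looked up, the Python sketch below reads one entry with the standard configparser module. This is not how the S3 IO Exit reads the file. The function name load_profile is hypothetical, and the key values are placeholders written to a temporary file.

```python
import configparser
import tempfile
from pathlib import Path


def load_profile(credentials_path, profile_name="default"):
    """Return (access key id, secret access key) for one profile entry."""
    parser = configparser.ConfigParser()
    if not parser.read(credentials_path):
        raise FileNotFoundError(str(credentials_path))
    if not parser.has_section(profile_name):
        raise KeyError(f"profile {profile_name!r} not found in {credentials_path}")
    entry = parser[profile_name]
    return entry["aws_access_key_id"], entry["aws_secret_access_key"]


# Demonstration with a throwaway file and placeholder keys (not real credentials).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "credentials"
    path.write_text(
        "[new]\n"
        "aws_access_key_id = EXAMPLEKEYID\n"
        "aws_secret_access_key = EXAMPLESECRET\n"
    )
    access_key, secret_key = load_profile(path, "new")
```

For a real user, the path would typically be Path.home() / ".aws" / "credentials", which is where aws configure writes the file.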
- Functional User Authorities Configuration
The user authorities file, userfile.cfg, has been updated to support restricted S3 upload and download directories. When they are defined, only the specified S3 buckets may be used to send files from (upload) or receive files to (download).
cd-cloud-user:\
 :admin.auth=n:\
 :pstmt.copy.ulimit=n:\
 :pstmt.upload=y:\
 :pstmt.upload_dir=s3://uploadBucket:\
 :pstmt.download=y:\
 :pstmt.download_dir=s3://downloadBucket:\
 :pstmt.run_dir=:\
 :pstmt.submit_dir=:\
 :name=:\
 :phone=:\
 :descrip=:
S3 file transfers are limited to user file data and are not supported by Run Task/Job or other areas that specify a filename. For example, run_dir and submit_dir in the above example can refer only to standard file system locations.
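The effect of a restricted upload or download directory can be pictured as a prefix check on the s3:// location. The Python sketch below shows one plausible form of that check. The matching semantics (exact bucket or key prefix, case-sensitive, empty value meaning no restriction) are assumptions made for illustration and are not taken from Connect:Direct. The function name is_within_s3_dir is hypothetical.

```python
def is_within_s3_dir(object_url, allowed_dir):
    """Return True when object_url lies inside allowed_dir (both s3:// locations).

    Assumption: an empty allowed_dir means no restriction is configured.
    """
    if not allowed_dir:
        return True
    base = allowed_dir.rstrip("/")
    # Require a "/" boundary so s3://uploadBucketOther does not match s3://uploadBucket.
    return object_url == base or object_url.startswith(base + "/")


allowed = is_within_s3_dir("s3://uploadBucket/in/file.txt", "s3://uploadBucket")
other_bucket = is_within_s3_dir("s3://uploadBucketOther/file.txt", "s3://uploadBucket")
```

Checking for the "/" boundary avoids treating a bucket whose name merely starts with the permitted bucket name as a match.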