IBM Support

How to create a DataStage Remote Engine for Public Cloud - IBM DataStage-aaS Anywhere

How To


Summary

Setting up a Remote Engine (IBM DataStage-aaS Anywhere) for IBM Cloud

Objective

Deploy a DataStage Remote Engine on a Kubernetes cluster or as a container. This is a new offering, called 'IBM DataStage-aaS Anywhere', for IBM Public Cloud users.

Useful links:

https://dataplatform.cloud.ibm.com/docs/content/dstage/dsnav/topics/datastage-environments.html?audience=wdp&context=cpdaas#topic_hdt_v5y_spb__remoteeng

Environment

IBM Cloud with IBM DataStage-aaS Anywhere

Steps

The steps are as simple as documented in the link above. This document mainly shows them in more detail, with examples, so keep that link handy.
1. You need a Red Hat Enterprise Linux (RHEL) system with Podman and jq installed to set up the Remote Engine.
2. Open a terminal session (for example, with PuTTY) to the Red Hat Linux system where you are going to set up the Remote Engine, and check that Podman and jq are installed:
[root@c31391v1 ~]#  podman --version
podman version 4.6.1
[root@c31391v1 tmp]# sudo yum install jq
Updating Subscription Management repositories.
Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)                                                                                                 88 kB/s | 4.1 kB     00:00
Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)                                                                                             105 kB/s | 4.5 kB     00:00
Package jq-1.6-7.el8.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
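To confirm in one go that every tool this walkthrough relies on is available, you could use a small bash function like the sketch below. check_prereqs is a hypothetical helper, not part of the IBM DataStage scripts; the example checks podman and jq from step 1, plus git and openssl, which the commands that follow also use.

```shell
#!/usr/bin/env bash
# check_prereqs: hypothetical helper, not part of the IBM DataStage scripts.
# Returns 0 when every named command is on PATH; otherwise reports each
# missing command on stderr and returns 1.
check_prereqs() {
  local cmd rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check the tools used in this walkthrough.
# check_prereqs podman jq git openssl && echo "all prerequisites found"
```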

Clone the IBM DataStage GitHub repository, which contains the Remote Engine scripts:
[root@c31391v1 ~]# git clone https://github.com/IBM/DataStage.git
Cloning into 'DataStage'...
remote: Enumerating objects: 589, done.
remote: Counting objects: 100% (194/194), done.
remote: Compressing objects: 100% (120/120), done.
remote: Total 589 (delta 88), reused 114 (delta 74), pack-reused 395
Receiving objects: 100% (589/589), 41.57 MiB | 26.92 MiB/s, done.
Resolving deltas: 100% (307/307), done.

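If the repository is already cloned, run git pull inside it to get the latest version of the scripts:
[root@c31391v1 DataStage]# pwd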
/root/DataStage
[root@c31391v1 DataStage]# git pull
Already up to date.
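Because the repository is cloned once and then refreshed with git pull, both cases can be wrapped in one function. This is a sketch assuming bash and git; sync_repo is a hypothetical helper, and its default directory ($HOME/DataStage) matches the /root/DataStage clone shown above when run as root.

```shell
#!/usr/bin/env bash
# sync_repo: hypothetical helper, not part of the IBM DataStage scripts.
# Clones the repository on the first run and pulls updates on later runs.
#   $1 = target directory (default: $HOME/DataStage)
#   $2 = repository URL   (default: the IBM DataStage repository cloned above)
sync_repo() {
  local dir="${1:-$HOME/DataStage}"
  local url="${2:-https://github.com/IBM/DataStage.git}"
  if [ -d "$dir/.git" ]; then
    # Existing clone: fetch and merge the latest changes.
    git -C "$dir" pull
  else
    # No clone yet: create one.
    git clone "$url" "$dir"
  fi
}

# Example:
# sync_repo    # clones into $HOME/DataStage, or pulls if that clone exists
```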
Generate an encryption key and initialization vector (IV) for the Remote Engine:
[root@c31391v1 DataStage]# openssl enc -aes-256-cbc -k secret -P -md sha1
salt=AADA90FC37AE1CA7
key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
iv =xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Save the "key" and "iv" values from this output; you will pass them to the Remote Engine later as the encryption key and IV.
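Rather than copying the key and IV by hand, you could capture them straight into environment variables. This is a sketch assuming bash and the openssl output format shown above (a "key=" line and an "iv =" line); extract_field and generate_encryption_params are hypothetical helpers, and the variable names ENCRYPTION_KEY and ENCRYPTION_IV match the ones passed to dsengine.sh later in this document. The passphrase "secret" is taken from the command above; substitute your own.

```shell
#!/usr/bin/env bash
# extract_field: hypothetical helper, not part of the IBM DataStage scripts.
# Reads `openssl enc -P` output on stdin and prints the value of the
# "name=value" line whose name is $1 (tolerates "iv =" with a space).
extract_field() {
  sed -n "s/^$1[[:space:]]*=[[:space:]]*//p"
}

# generate_encryption_params: hypothetical wrapper that runs the openssl
# command from this step and exports the key and IV under the names used
# by the dsengine.sh command later on.
generate_encryption_params() {
  local out
  out=$(openssl enc -aes-256-cbc -k secret -P -md sha1) || return 1
  ENCRYPTION_KEY=$(printf '%s\n' "$out" | extract_field key)
  ENCRYPTION_IV=$(printf '%s\n' "$out" | extract_field iv)
  export ENCRYPTION_KEY ENCRYPTION_IV
}

# Example:
# generate_encryption_params && echo "key and IV exported"
```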
3. In IBM Cloud, go to IBM Cloud Pak for Data -> Access (IAM) in the left pane. This opens a new page with API keys. For the internal cloud, use https://test.cloud.ibm.com/iam/apikeys.
 Select 'API keys' in the left pane and click 'Create'.
After the API key is created, download the JSON output from the window that appears. The JSON file contains your IBM Cloud API key, which you will use later.
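Once the JSON file is downloaded, jq (installed in step 1) can pull the key out of it. This sketch assumes the downloaded file stores the key under an "apikey" field, so check your file first; read_apikey and the file path in the example are hypothetical.

```shell
#!/usr/bin/env bash
# read_apikey: hypothetical helper, not part of the IBM DataStage scripts.
# Prints the "apikey" field of the JSON file given as $1.
#   -r prints the raw string (no JSON quotes)
#   -e makes jq exit non-zero when the field is missing or null
read_apikey() {
  jq -er '.apikey' "$1"
}

# Example (use the path where you saved the downloaded file):
# export IBMCLOUD_APIKEY="$(read_apikey ~/Downloads/apikey.json)"
```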
4. You also need an IBM Cloud Container Registry API key, which is used to download the images needed to run the Remote Engine. Currently there is no way to generate this key yourself, so it must be requested through IBM Cloud Support: https://cloud.ibm.com/unifiedsupport.
5. Get the ID of the IBM Cloud project that you want to configure the Remote Engine for.
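The project ID in the dsengine.sh output later in this document (8c5c8500-a451-43a7-b8ac-c7b36715229d) is a UUID. Assuming every project ID has that shape, the hypothetical helper below pulls the first UUID out of whatever you copied, either the bare ID or a URL that contains it, and prints nothing otherwise, so a bad paste is caught before you export it as PROJECT_ID.

```shell
#!/usr/bin/env bash
# extract_project_id: hypothetical helper, not part of the IBM DataStage scripts.
# Assumes project IDs are UUIDs (8-4-4-4-12 hex digits). Prints the first
# UUID found in $1, or nothing if there is none.
extract_project_id() {
  printf '%s\n' "$1" |
    grep -oiE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' |
    head -n 1
}

# Example:
# export PROJECT_ID="$(extract_project_id '<pasted project ID or URL>')"
# [ -n "$PROJECT_ID" ] || echo "no project ID found" >&2
```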
6. If you are on Staging and using a free account, you must request allowlisting so that you can set up a Remote Engine for the free account:
7. With all the prerequisites in place, you are ready to create a Remote Engine.
There are two ways to generate the command to create a Remote Engine.
 - From IBM Cloud: go to Project -> Manage -> Run -> Remote and click 'installation script'.
Here, we create the Remote Engine as a 'Container' using Podman, but you can also deploy it on a Kubernetes cluster (see https://github.com/IBM/DataStage/tree/3f2285f6047f300c8f0ad13b0772407ab261d8fa/RemoteEngine).
The installation script option generates the commands to start, stop, restart, and clean the Remote Engine, which you can run on your Red Hat Linux system.
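 - From the shell
Instead of generating the command from the project, you can type it directly, following the README file (link provided above). Export the values gathered in the previous steps as the environment variables IBMCLOUD_APIKEY, ENCRYPTION_KEY, ENCRYPTION_IV, IBMCLOUD_CONTAINER_REGISTRY_APIKEY, and PROJECT_ID, then run dsengine.sh from the RemoteEngine/docker directory of the cloned repository: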
[root@c31391v1 docker]# pwd
/root/DataStage/RemoteEngine/docker


[root@c31391v1 docker]# ./dsengine.sh start -n 'MLGProjRemoteEngine' -a "$IBMCLOUD_APIKEY" -e "$ENCRYPTION_KEY" -i "$ENCRYPTION_IV" -p "$IBMCLOUD_CONTAINER_REGISTRY_APIKEY" --project-id "$PROJECT_ID" --home ys1dev

IBM DataStage Remote Engine 0.0.1

DATASTAGE_HOME=ys1dev
GATEWAY_URL=https://api.dataplatform.dev.cloud.ibm.com
PROJECT_ID=8c5c8500-a451-43a7-b8ac-c7b36715229d
REMOTE_ENGINE_PREFIX=MLGProjRemoteEngine
DOCKER_REGISTRY=icr.io/datastage
CONTAINER_MEMORY=4g
CONTAINER_CPUS=2
DOCKER_VOLUMES_DIR=/tmp/docker/volumes

Creating init scripts ...
Checking for existing container 'MLGProjRemoteEngine_runtime'
Existing container MLGProjRemoteEngine_runtime found in a stopped state
Starting container 'MLGProjRemoteEngine_runtime' ...
MLGProjRemoteEngine_runtime
  waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 5 seconds
  waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 10 seconds
  waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 15 seconds
  waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 20 seconds
  waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 25 seconds
  waiting for MLGProjRemoteEngine_runtime to start ... time elapsed: 30 seconds
{"service_name":"ds-px-runtime","status":"ok","timestamp":"2024-03-06T05:57:10.487Z","version":"1.0.1667","px_version":"develop-11.7.1-2024.02.20.21.32.41","connector_version":"refs/heads/cpd/4.8.3-11.7.1-SCAPI_VERSION-10-"}
Started container MLGProjRemoteEngine_runtime in 30 seconds

Runtime Environment 'Remote Engine MLGProjRemoteEngine' is available, and can be used to run DataStage flows
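If you start the engine from the shell often, a wrapper can refuse to run when one of the variables is empty, instead of passing a blank value to dsengine.sh. The sketch below is hypothetical: require_vars and start_remote_engine are not part of the DataStage scripts, it assumes bash (it uses ${!name} indirect expansion), and it passes only the flags from the command above. In this example run, --home ys1dev corresponds to the development gateway (GATEWAY_URL=https://api.dataplatform.dev.cloud.ibm.com), so check the README for the value that matches your environment.

```shell
#!/usr/bin/env bash
# require_vars: hypothetical helper, not part of the IBM DataStage scripts.
# Returns 1, naming each offender on stderr, if any listed variable is unset or empty.
require_vars() {
  local name rc=0
  for name in "$@"; do
    if [ -z "${!name}" ]; then
      echo "not set: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

# start_remote_engine: hypothetical wrapper around the dsengine.sh command above.
#   $1 = engine name, $2 = value for --home (ys1dev in the example above)
# Run it from the RemoteEngine/docker directory of the cloned repository.
start_remote_engine() {
  require_vars IBMCLOUD_APIKEY ENCRYPTION_KEY ENCRYPTION_IV \
    IBMCLOUD_CONTAINER_REGISTRY_APIKEY PROJECT_ID || return 1
  ./dsengine.sh start -n "$1" -a "$IBMCLOUD_APIKEY" -e "$ENCRYPTION_KEY" \
    -i "$ENCRYPTION_IV" -p "$IBMCLOUD_CONTAINER_REGISTRY_APIKEY" \
    --project-id "$PROJECT_ID" --home "$2"
}

# Example:
# start_remote_engine 'MLGProjRemoteEngine' ys1dev
```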

You can now see the container running, and you can exec into it as shown below. It looks much like a regular Cloud Pak for Data (CP4D) instance, except that there is only one conductor node.
[root@c31391v1 DataStage]# podman ps
CONTAINER ID  IMAGE                                                                                                   COMMAND               CREATED     STATUS       PORTS       NAMES
79eaa3cd3606  icr.io/datastage/ds-px-runtime@sha256:072a38d760c5d88a11c6333998c9ddae5ed3f8e98915a93e0197b349f1e1f01e  -c /px-storage/in...  5 days ago  Up 22 hours              MLGProjRemoteEngine_runtime
[root@c31391v1 DataStage]#  podman exec -it 79eaa3cd3606 bash
bash-4.4# ls -alrt
total 8
drwxr-xr-x   2 root   root      6 Jun 21  2021 srv
lrwxrwxrwx   1 root   root      8 Jun 21  2021 sbin -> usr/sbin
drwxr-xr-x   2 root   root      6 Jun 21  2021 mnt
drwxr-xr-x   2 root   root      6 Jun 21  2021 media
lrwxrwxrwx   1 root   root      9 Jun 21  2021 lib64 -> usr/lib64
lrwxrwxrwx   1 root   root      7 Jun 21  2021 lib -> usr/lib
dr-xr-xr-x   2 root   root      6 Jun 21  2021 boot
lrwxrwxrwx   1 root   root      7 Jun 21  2021 bin -> usr/bin
drwx------   2 root   root      6 Feb 12 20:10 lost+found
drwxr-xr-x   1 root   root    109 Feb 12 20:11 usr
drwxr-xr-x   1 root   root     52 Feb 12 20:11 var
drwxr-xr-x   2 root   root     25 Feb 21 05:33 licenses
drwxr-xr-x   1 root   root     20 Feb 21 05:36 home
drwxr-xr-x   1 root   root     17 Feb 21 05:36 opt
drwxr-xr-x   1 root   root     28 Feb 21 05:36 debug
lrwxrwxrwx   1 root   root     33 Feb 21 05:37 output -> /opt/ibm/wlp/output/defaultServer
lrwxrwxrwx   1 root   root     38 Feb 21 05:37 config -> /opt/ibm/wlp/usr/servers/defaultServer
-rw-r--r--   1 root   root    692 Feb 21 13:14 ibm.com_IBM_DataStage_as_a_Service_Anywhere-1.0.0.swidtag
drwxr-xr-x   1 root   root   4096 Feb 21 13:16 etc
dr-xr-x---   1 root   root     20 Feb 21 13:16 root
drwxrwxr-x   8 root   root    106 Feb 28 18:27 ds-storage
dr-xr-xr-x   1 root   root     99 Mar  1 17:13 ..
dr-xr-xr-x   1 root   root     99 Mar  1 17:13 .
drwxr-xr-x   1 root   root     27 Mar  1 17:13 run
drwxrwxr-x  11 root   root    184 Mar  1 17:14 px-storage
dr-xr-xr-x 213 root   root      0 Mar  5 23:56 proc
dr-xr-xr-x  13 root   root      0 Mar  5 23:56 sys
drwxr-xr-x   5 root   root    340 Mar  5 23:56 dev
drwxrwxrwx   1 dsuser dsuser  139 Mar  5 23:56 logs
drwxrwxrwt   1 root   root    172 Mar  6 21:40 tmp
bash-4.4# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 Mar05 ?        00:00:00 /bin/bash -c /px-storage/init-volume.sh;/opt/ibm/startup.sh
root          42       1  0 Mar05 ?        00:00:00 /bin/sh /opt/ibm/startup.sh
root          48      42  0 Mar05 ?        00:00:00 /bin/sh /opt/ibm/initScripts/startcontainer.sh
root          51      48  0 Mar05 ?        00:00:00 /bin/sh ./startup.sh
root          52      48  0 Mar05 ?        00:00:03 /usr/bin/coreutils --coreutils-prog-shebang=tail /usr/bin/tail -f /dev/null
root         132       1  0 Mar05 ?        00:01:02 /opt/ibm/PXService/Server/DSWLM/../../jdk/bin/java -Xmx1024m -classpath /opt/ibm/PX
root         225       1  0 Mar05 ?        00:06:44 /opt/java/bin/java -javaagent:/opt/ibm/wlp/bin/tools/ws-javaagent.jar -Djava.awt.he
root         343      51  0 Mar05 ?        00:00:03 /usr/bin/coreutils --coreutils-prog-shebang=tail /usr/bin/tail -f /dev/null
root       17723       0  0 21:54 pts/0    00:00:00 bash
root       17730   17723  0 21:55 pts/0    00:00:00 ps -ef
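Instead of copying the container ID from podman ps, you can address the container by name. In the output above, an engine started with -n 'MLGProjRemoteEngine' runs in a container named MLGProjRemoteEngine_runtime; the hypothetical helpers below assume that naming pattern holds and wrap the standard podman container inspect and podman exec commands.

```shell
#!/usr/bin/env bash
# runtime_container_name: hypothetical helper, not part of the IBM DataStage scripts.
# Assumes the runtime container is named "<engine name>_runtime", as in the output above.
runtime_container_name() {
  printf '%s_runtime\n' "$1"
}

# engine_status: prints the Podman state of the engine's container (for example "running").
engine_status() {
  podman container inspect --format '{{.State.Status}}' "$(runtime_container_name "$1")"
}

# engine_shell: opens an interactive bash shell in the engine's container by name.
engine_shell() {
  podman exec -it "$(runtime_container_name "$1")" bash
}

# Example:
# engine_status MLGProjRemoteEngine    # prints "running" while the engine is up
# engine_shell MLGProjRemoteEngine
```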
8. Go to the project and select the Remote Engine as the default engine to run jobs on.
9. Test the connection by running a simple job.
Your Remote Engine is now ready to use.
You can create a Remote Engine on a Kubernetes cluster using similar steps, but first check the Kubernetes prerequisites in the README linked above.

Document Location

Worldwide


Document Information

Modified date:
12 February 2025

UID

ibm17129979