How To
Summary
Setting up a Remote Engine (IBM DataStage-aaS Anywhere) for IBM Cloud
Objective
Deploy a DataStage Remote Engine on a Kubernetes cluster or as a container. This is a new offering, called 'IBM DataStage-aaS Anywhere', for IBM public cloud users.
Useful links:
https://dataplatform.cloud.ibm.com/docs/content/dstage/dsnav/topics/datastage-environments.html?audience=wdp&context=cpdaas#topic_hdt_v5y_spb__remoteeng
Environment
IBM Cloud with IBM DataStage-aaS Anywhere
Steps
The steps are as simple as documented in the Remote Engine README on GitHub.
This document shows the same steps in more detail, with examples, so keep the README handy.
1. You need a Red Hat Enterprise Linux (RHEL) system with Podman and jq installed to set up the Remote Engine.
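Before continuing, you can confirm that both tools from step 1 are on the PATH. This is a minimal sketch; the function name `check_prereqs` is hypothetical and is not part of the IBM DataStage scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every required command that is missing from PATH.
# Returns 0 when all named commands are found, 1 otherwise.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs podman jq && echo "prerequisites OK"` prints the confirmation only when both commands are installed.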
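2. Open a terminal session (for example, with PuTTY) to the RHEL system where you are going to set up the Remote Engine. Check the Podman version, install jq, clone the IBM DataStage GitHub repository, and generate an encryption key and IV: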
[root@c31391v1 ~]# podman --version
podman version 4.6.1
[root@c31391v1 tmp]# sudo yum install jq
Updating Subscription Management repositories.
Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs) 88 kB/s | 4.1 kB 00:00
Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs) 105 kB/s | 4.5 kB 00:00
Package jq-1.6-7.el8.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
[root@c31391v1 ~]# git clone https://github.com/IBM/DataStage.git
Cloning into 'DataStage'...
remote: Enumerating objects: 589, done.
remote: Counting objects: 100% (194/194), done.
remote: Compressing objects: 100% (120/120), done.
remote: Total 589 (delta 88), reused 114 (delta 74), pack-reused 395
Receiving objects: 100% (589/589), 41.57 MiB | 26.92 MiB/s, done.
Resolving deltas: 100% (307/307), done.
[root@c31391v1 DataStage]# pwd
/root/DataStage
[root@c31391v1 DataStage]# git pull
Already up to date.
[root@c31391v1 DataStage]# openssl enc -aes-256-cbc -k secret -P -md sha1
salt=AADA90FC37AE1CA7
key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
iv =xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Save the "key" and "iv" values from the above output; you will pass them to the Remote Engine later as the encryption key and IV.
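Rather than copying the values by hand, you can capture them in the ENCRYPTION_KEY and ENCRYPTION_IV variables that the dsengine.sh command in step 7 reads. A minimal sketch, assuming bash and the `openssl` output format shown above:

```shell
#!/usr/bin/env bash
# Generate the key material with the same openssl command shown above.
# Note: "-k secret" uses the literal passphrase "secret", as in the example.
openssl_out=$(openssl enc -aes-256-cbc -k secret -P -md sha1)

# openssl prints "salt=...", "key=..." and "iv =..." lines: split each line
# on "=", select the key and iv lines, and strip stray spaces from the value.
ENCRYPTION_KEY=$(printf '%s\n' "$openssl_out" | awk -F= '$1 ~ /^key/ {gsub(/ /, "", $2); print $2}')
ENCRYPTION_IV=$(printf '%s\n' "$openssl_out" | awk -F= '$1 ~ /^iv/ {gsub(/ /, "", $2); print $2}')

# Export them for the dsengine.sh command used later in this document.
export ENCRYPTION_KEY ENCRYPTION_IV
```

Each run uses a new random salt and therefore produces a different key and IV, so run it once and reuse the saved values.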
3. In IBM Cloud, go to IBM Cloud Pak for Data -> Access (IAM) in the left pane. This opens a new page for managing API keys. For the internal cloud, use https://test.cloud.ibm.com/iam/apikeys.
Select 'API keys' in the left pane and click 'Create'.


After creating the API key, download the JSON file from the confirmation window. The file contains your IBM Cloud API key, which you will use later.
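jq (a prerequisite from step 1) can load the key from the downloaded file into the IBMCLOUD_APIKEY variable used by the dsengine.sh command later. This sketch assumes the file stores the key in a top-level `apikey` field and is saved as `apikey.json`; both are assumptions, so check your own file. The function name `read_apikey` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "apikey" field of a downloaded API key file.
# jq -e makes the call fail if the field is missing or null.
read_apikey() {
  jq -er '.apikey' "$1"
}
```

For example: `IBMCLOUD_APIKEY=$(read_apikey apikey.json) && export IBMCLOUD_APIKEY`.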
4. You also need an IBM Cloud Container Registry API key, which is used to pull the images needed to run the Remote Engine. Currently, you cannot generate this key yourself; request it through IBM Cloud Support: https://cloud.ibm.com/unifiedsupport.
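Once you receive the registry key, you can optionally check it by logging in to `icr.io`, the registry host shown as DOCKER_REGISTRY in the start output in step 7. The sketch assumes the key is held in IBMCLOUD_CONTAINER_REGISTRY_APIKEY (the variable name used in the dsengine.sh command) and uses `iamapikey` as the user name, the convention IBM Cloud Container Registry uses for API-key logins. The function name `login_icr` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: log in to IBM Cloud Container Registry with an API key.
# The key is passed on stdin so it does not appear in the process list.
login_icr() {
  local apikey="${1:?usage: login_icr <registry-apikey>}"
  printf '%s' "$apikey" | podman login icr.io -u iamapikey --password-stdin
}
```

For example: `login_icr "$IBMCLOUD_CONTAINER_REGISTRY_APIKEY"`. A successful login indicates that the key can pull from the registry.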
5. Get the ID of the IBM Cloud project for which you want to configure the Remote Engine.
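The project ID is a UUID, like the PROJECT_ID value in the start output in step 7. A minimal sketch that checks the copied value before you use it, assuming every project ID follows that 8-4-4-4-12 hexadecimal layout; the function name `is_project_id` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the argument looks like a UUID
# (8-4-4-4-12 hexadecimal digits), the format of the project ID in this example.
is_project_id() {
  [[ "$1" =~ ^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$ ]]
}
```

For example: `is_project_id "$PROJECT_ID" && export PROJECT_ID`.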

6. If you are on Staging and using a free account, you must request allowlisting before you can set up a Remote Engine for that free account:
7. With all the prerequisites in place, you are ready to create a Remote Engine.
There are two ways to generate the command to create a Remote Engine.
- From IBM Cloud: go to Project -> Manage -> Run -> Remote (as shown below) and click 'installation script':

Here, we create the Remote Engine as a container using Podman, but you can also deploy it on a Kubernetes cluster (see https://github.com/IBM/DataStage/tree/3f2285f6047f300c8f0ad13b0772407ab261d8fa/RemoteEngine).


This generates the commands to start, stop, restart, and clean up the Remote Engine, which you can run on your local Red Hat system.

- From the shell
Instead of generating the command from the project, you can type it directly, following the README file (link provided above).
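If you type the command yourself, a small wrapper can refuse to run while any of the variables it expects is still empty. This is a hypothetical sketch (`require_vars` and `start_remote_engine` are not part of the DataStage scripts) that passes the same flags as the start command shown below and must be run from the RemoteEngine/docker directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any named environment variable is empty or unset.
require_vars() {
  local name rc=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "environment variable $name is not set" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Hypothetical wrapper: check the inputs, then call dsengine.sh with the same
# flags as the example command. Arguments: engine name prefix, --home value.
start_remote_engine() {
  local engine_name="$1" home="$2"
  require_vars IBMCLOUD_APIKEY ENCRYPTION_KEY ENCRYPTION_IV \
    IBMCLOUD_CONTAINER_REGISTRY_APIKEY PROJECT_ID || return 1
  ./dsengine.sh start -n "$engine_name" -a "$IBMCLOUD_APIKEY" \
    -e "$ENCRYPTION_KEY" -i "$ENCRYPTION_IV" \
    -p "$IBMCLOUD_CONTAINER_REGISTRY_APIKEY" \
    --project-id "$PROJECT_ID" --home "$home"
}
```

For example, from /root/DataStage/RemoteEngine/docker: `start_remote_engine MLGProjRemoteEngine ys1dev`.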
[root@c31391v1 docker]# pwd
/root/DataStage/RemoteEngine/docker
[root@c31391v1 docker]# ./dsengine.sh start -n 'MLGProjRemoteEngine' -a "$IBMCLOUD_APIKEY" -e "$ENCRYPTION_KEY" -i "$ENCRYPTION_IV" -p "$IBMCLOUD_CONTAINER_REGISTRY_APIKEY" --project-id "$PROJECT_ID" --home ys1dev
IBM DataStage Remote Engine 0.0.1
DATASTAGE_HOME=ys1dev
GATEWAY_URL=https://api.dataplatform.dev.cloud.ibm.com
PROJECT_ID=8c5c8500-a451-43a7-b8ac-c7b36715229d
REMOTE_ENGINE_PREFIX=MLGProjRemoteEngine
DOCKER_REGISTRY=icr.io/datastage
CONTAINER_MEMORY=4g
CONTAINER_CPUS=2
DOCKER_VOLUMES_DIR=/tmp/docker/volumes
Creating init scripts ...
Checking for existing container 'MLGProjRemoteEngine_runtime'
Existing container MLGProjRemoteEngine_runtime found in a stopped state
Starting container 'MLGProjRemoteEngine_runtime' ...
MLGProjRemoteEngine_runtime
waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 5 seconds
waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 10 seconds
waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 15 seconds
waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 20 seconds
waiting for MLGProjRemoteEngine_runtime to start... time elapsed: 25 seconds
waiting for MLGProjRemoteEngine_runtime to start ... time elapsed: 30 seconds
{"service_name":"ds-px-runtime","status":"ok","timestamp":"2024-03-06T05:57:10.487Z","version":"1.0.1667","px_version":"develop-11.7.1-2024.02.20.21.32.41","connector_version":"refs/heads/cpd/4.8.3-11.7.1-SCAPI_VERSION-10-"}
Started container MLGProjRemoteEngine_runtime in 30 seconds
Runtime Environment 'Remote Engine MLGProjRemoteEngine' is available, and can be used to run DataStage flows
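The JSON line printed during startup reports the runtime's service name, status, and versions. If you capture such a line, jq can check its `status` field. A minimal sketch; the function name `runtime_status_ok` is hypothetical, and this document does not show a way to request that status on demand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a runtime status JSON document reports "ok".
# jq -e sets the exit status from the boolean result of the comparison.
runtime_status_ok() {
  printf '%s' "$1" | jq -e '.status == "ok"' >/dev/null
}
```

For example, passing the JSON line from the output above as the argument returns success because its `status` is "ok".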
You can now see the container running, and you can open a shell in it as shown below. It looks much like a regular CP4D instance, except that there is only one conductor node.
[root@c31391v1 DataStage]# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
79eaa3cd3606 icr.io/datastage/ds-px-runtime@sha256:072a38d760c5d88a11c6333998c9ddae5ed3f8e98915a93e0197b349f1e1f01e -c /px-storage/in... 5 days ago Up 22 hours MLGProjRemoteEngine_runtime
[root@c31391v1 DataStage]# podman exec -it 79eaa3cd3606 bash
bash-4.4# ls -alrt
total 8
drwxr-xr-x 2 root root 6 Jun 21 2021 srv
lrwxrwxrwx 1 root root 8 Jun 21 2021 sbin -> usr/sbin
drwxr-xr-x 2 root root 6 Jun 21 2021 mnt
drwxr-xr-x 2 root root 6 Jun 21 2021 media
lrwxrwxrwx 1 root root 9 Jun 21 2021 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 7 Jun 21 2021 lib -> usr/lib
dr-xr-xr-x 2 root root 6 Jun 21 2021 boot
lrwxrwxrwx 1 root root 7 Jun 21 2021 bin -> usr/bin
drwx------ 2 root root 6 Feb 12 20:10 lost+found
drwxr-xr-x 1 root root 109 Feb 12 20:11 usr
drwxr-xr-x 1 root root 52 Feb 12 20:11 var
drwxr-xr-x 2 root root 25 Feb 21 05:33 licenses
drwxr-xr-x 1 root root 20 Feb 21 05:36 home
drwxr-xr-x 1 root root 17 Feb 21 05:36 opt
drwxr-xr-x 1 root root 28 Feb 21 05:36 debug
lrwxrwxrwx 1 root root 33 Feb 21 05:37 output -> /opt/ibm/wlp/output/defaultServer
lrwxrwxrwx 1 root root 38 Feb 21 05:37 config -> /opt/ibm/wlp/usr/servers/defaultServer
-rw-r--r-- 1 root root 692 Feb 21 13:14 ibm.com_IBM_DataStage_as_a_Service_Anywhere-1.0.0.swidtag
drwxr-xr-x 1 root root 4096 Feb 21 13:16 etc
dr-xr-x--- 1 root root 20 Feb 21 13:16 root
drwxrwxr-x 8 root root 106 Feb 28 18:27 ds-storage
dr-xr-xr-x 1 root root 99 Mar 1 17:13 ..
dr-xr-xr-x 1 root root 99 Mar 1 17:13 .
drwxr-xr-x 1 root root 27 Mar 1 17:13 run
drwxrwxr-x 11 root root 184 Mar 1 17:14 px-storage
dr-xr-xr-x 213 root root 0 Mar 5 23:56 proc
dr-xr-xr-x 13 root root 0 Mar 5 23:56 sys
drwxr-xr-x 5 root root 340 Mar 5 23:56 dev
drwxrwxrwx 1 dsuser dsuser 139 Mar 5 23:56 logs
drwxrwxrwt 1 root root 172 Mar 6 21:40 tmp
bash-4.4# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Mar05 ? 00:00:00 /bin/bash -c /px-storage/init-volume.sh;/opt/ibm/startup.sh
root 42 1 0 Mar05 ? 00:00:00 /bin/sh /opt/ibm/startup.sh
root 48 42 0 Mar05 ? 00:00:00 /bin/sh /opt/ibm/initScripts/startcontainer.sh
root 51 48 0 Mar05 ? 00:00:00 /bin/sh ./startup.sh
root 52 48 0 Mar05 ? 00:00:03 /usr/bin/coreutils --coreutils-prog-shebang=tail /usr/bin/tail -f /dev/null
root 132 1 0 Mar05 ? 00:01:02 /opt/ibm/PXService/Server/DSWLM/../../jdk/bin/java -Xmx1024m -classpath /opt/ibm/PX
root 225 1 0 Mar05 ? 00:06:44 /opt/java/bin/java -javaagent:/opt/ibm/wlp/bin/tools/ws-javaagent.jar -Djava.awt.he
root 343 51 0 Mar05 ? 00:00:03 /usr/bin/coreutils --coreutils-prog-shebang=tail /usr/bin/tail -f /dev/null
root 17723 0 0 21:54 pts/0 00:00:00 bash
root 17730 17723 0 21:55 pts/0 00:00:00 ps -ef
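Instead of copying the container ID from the `podman ps` output, you can look it up by name. The hypothetical helper below assumes the `<prefix>_runtime` naming seen in this example (MLGProjRemoteEngine_runtime) and uses the name filter of `podman ps`, which accepts a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ID of the running runtime container for an
# engine prefix, assuming the "<prefix>_runtime" name seen in this example.
runtime_container_id() {
  local prefix="$1"
  podman ps --filter "name=^${prefix}_runtime\$" --format '{{.ID}}'
}
```

For example: `podman exec -it "$(runtime_container_id MLGProjRemoteEngine)" bash`. Because `podman exec` also accepts a container name, `podman exec -it MLGProjRemoteEngine_runtime bash` works as well.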
8. Go to the project and select the Remote Engine as the default engine for running jobs:

9. Test the connection by running a simple job:

Your Remote Engine is now ready to use.
You can deploy a Remote Engine on a Kubernetes cluster using similar steps, but first check the Kubernetes prerequisites in the README (link provided above).
Document Location
Worldwide
Document Information
Modified date:
12 February 2025
UID
ibm17129979