Managing a remote engine for DataStage Anywhere

DataStage® Anywhere supports maintenance, updates, and other data considerations with remote runtime engines.

Maintenance

Manage your data plane through the remote engine. For more information on managing a remote engine, see DataStage Remote Engine using Docker. To update your remote engine with the automated scripts, download the container image to your internal registry and deploy it. The scripts document the available controls, including creating, running, cleaning, and upgrading a remote engine.

Scaling

You can add or remove remote engines to scale deployments throughout the month. There is no deployment limit, but you are charged for the maximum number of VPCs deployed each month, whether or not they are used.

Disaster recovery

Deploy additional remote engines to support disaster recovery.

Data observability

You can put an observability solution in place within your container management platform. Databand is integrated with DataStage Anywhere and can monitor DataStage pipelines.

Storage

The DataStage operator mounts default storage to the remote engine's Kubernetes pods. To add storage with persistent volumes, see Setting up an NFS mount in DataStage.

Enabling alternative Cloud Object Storage location for remote engine logs

By default, job run logs for the remote engine are pushed to the default bucket in IBM Cloud® Object Storage (COS). You can enable an alternative COS location for storing the job run logs.

To stop pushing job run logs to the default IBM Cloud Object Storage bucket for the Kubernetes deployment, use the following command:

kubectl -n <namespace> set env deployment/<instance-name>-ibm-datastage-px-runtime DISABLE_REMOTE_LOG_PUSH=true
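
To confirm that the variable was applied, one option is to list the environment of the deployment. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The function name log_push_disabled is hypothetical, and the namespace and instance name are passed as arguments.

```shell
# Hypothetical helper: succeeds when DISABLE_REMOTE_LOG_PUSH=true is set
# on the px-runtime deployment of the given instance.
log_push_disabled() {
  namespace="$1"
  instance="$2"
  # `kubectl set env --list` prints the environment variables defined on the resource.
  kubectl -n "$namespace" set env \
    "deployment/${instance}-ibm-datastage-px-runtime" --list |
    grep -q '^DISABLE_REMOTE_LOG_PUSH=true$'
}

# Example call (replace the placeholders with your own values):
# log_push_disabled <namespace> <instance-name> && echo "log push is disabled"
```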

To enable pushing logs to the alternative COS location for the Kubernetes deployment, use the following command, which creates a secret that contains the new COS location:

kubectl -n <namespace> create secret generic datastage-log-cos-location \
--from-literal=CUSTOM_S3_BUCKET_NAME=<bucket-name> \
--from-literal=CUSTOM_S3_REGION=<region> \
--from-literal=CUSTOM_S3_ENDPOINT=<endpoint> \
--from-literal=CUSTOM_S3_ACCESS_KEY=<access-key> \
--from-literal=CUSTOM_S3_SECRET_KEY=<secret-key>

This command triggers a pod restart. If you ran the disable command first, you must restart the pod manually.
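
The manual restart can be sketched as a rolling restart of the runtime deployment. This is an illustration only: it assumes that the deployment named in the disable command is the one whose pods need restarting, and the function name restart_px_runtime and the 300-second timeout are hypothetical choices.

```shell
# Hypothetical helper: restart the px-runtime pods and wait for the rollout.
restart_px_runtime() {
  namespace="$1"
  instance="$2"
  deployment="deployment/${instance}-ibm-datastage-px-runtime"
  # Trigger a rolling restart of the deployment's pods.
  kubectl -n "$namespace" rollout restart "$deployment" || return 1
  # Block until the new pods are ready or the timeout expires.
  kubectl -n "$namespace" rollout status "$deployment" --timeout=300s
}

# Example call (replace the placeholders with your own values):
# restart_px_runtime <namespace> <instance-name>
```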

Importing / exporting assets

To avoid job environment configuration issues, use the DataStage-specific import and export function.

Create a new project and update the project settings to bind it to the remote engine.
Use cpdctl dsjob to export assets from the original project.
Use cpdctl dsjob to import assets to the new project that is bound to the remote engine.

Setting proxy information

Container deployments support proxy information. To set the proxy information for a remote engine for DataStage Anywhere, set the following environment variable in the container:

REMOTE_HTTPS_PROXY=http://username:password@host:port

Proxy information is not available for Kubernetes deployments.

Custom resource name limitation

When you set up a custom resource (CR), the CR name must have fewer than 28 characters.
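
A quick pre-check of a candidate name can catch this before the CR is created. The following is a minimal sketch; the function name check_cr_name is hypothetical, and it checks only the length limit stated above, not any other naming rules.

```shell
# Hypothetical helper: succeeds only when the CR name has fewer than 28 characters.
check_cr_name() {
  name="$1"
  if [ "${#name}" -lt 28 ]; then
    return 0
  fi
  echo "CR name '$name' has ${#name} characters; it must have fewer than 28" >&2
  return 1
}

# Example call (replace the placeholder with your own value):
# check_cr_name <cr-name> && echo "name length is acceptable"
```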

Configuring parameters on a job or project level

You can manually override parameters on a job or project level by using the cpdctl command in the remote engine pod.

On a job level, set the environment variable APT_PARAM_VALUE_FILE to the path of a parameter file that contains the values to override. For example:

APT_PARAM_VALUE_FILE=/ds-storage/param.txt

On a project level, provide the overrides in the following file in the project directory: /ds-storage/PXRuntime/Projects/<projectId>/.local_jpfile