Known issues and limitations for DataStage

The following known issues and limitations apply to DataStage.

Known issues

General

Stages

Connectors

Google Cloud Storage: Flows that use Iceberg table format fail with "Invalid bucket name error.."
Sequential File connector: Parquet file format fails with Use DataStage properties selected
Microsoft Azure Databricks connector: TIME data type is not supported
DataStage jobs that use secrets from vault may fail when https_proxy environment variable is configured on the PXRuntime instance.
Amazon Redshift Connector fails in ELT mode
Db2 SSL connection fails due to enforced hostname validation
DataStage flows with Apache Impala connector hang when using DataStage properties authentication

Pipelines

Limitations

Limitations for general areas
Limitations for connectors

Known issues

Known issues for general areas:

Link icons display incorrectly

Applies to: 5.2.0

When you load a flow, link icons might display incorrectly if outdated icons have been cached. Workaround: Clear your browser cache.

Parameter sets are created in the Root directory

Applies to: 5.2.0

When you create a parameter set, the parameter set is created in the Root directory, instead of the location you selected with Select folder.

The last two parameters in PROJDEF cannot be edited

Applies to: 5.2.0

The last two parameters in PROJDEF cannot be edited because of the Save and Cancel buttons.

Test connection does not work for local parameters

Applies to: 5.2.0

The Test connection feature does not work for connections with properties parameterized as local parameters instead of parameter sets.

Flows display status Not Compiled when status should be Unknown

Applies to: 5.2.0

After an upgrade, flows display the status Not compiled instead of Unknown, so flows that were compiled before the upgrade might not appear as compiled. Workaround: Recompile the flow to correct the status.

Match designer sorts records on each page, not across all records

Applies to: 5.2.0

Result records in the Match designer are sorted on a per-page basis, not across all records. Workaround: Set the default page size to a number larger than the number of records.

Jobs may fail with compute pod connection failure after backup and restore

Applies to: 5.2.0

After backup and restore, jobs may fail to run due to connection failure with the compute pods. Workaround: Restart the compute pods.

oc -n ${PROJECT_CPD_INST_OPERANDS} delete pod -l app.kubernetes.io/component=px-compute

Migrated jobs fail if they contain passwords encrypted with iisenc

Applies to: 5.2.0

Passwords encrypted with iisenc are not supported in migrated jobs. Workaround: Change passwords to cleartext.

Previously queued jobs are running again when a backup is restored

Applies to: 5.2.0

When you create a backup and there are jobs in the queue, those jobs are restored and restarted once the backup is recovered. Workaround: You can use the following command to change the default 48-hour threshold.

# change the queued job recovery threshold to 24 hours
oc set env deploy <instance-name>-ibm-datastage-px-runtime DS_QUEUED_JOB_RECOVERY_HRS=24

Schema file not generated when APT_WRITE_SCHEMA is set to FALSE

Applies to: 5.2.0

When the environment variable APT_WRITE_SCHEMA is set to FALSE, schema files (.fs/.ds) are not generated. These schema files are required by cpdctl to view datasets or filesets. Without the schema file, users might not be able to view the dataset or fileset contents using cpdctl.

Workaround: Set APT_WRITE_SCHEMA=TRUE to ensure schema files are generated.

DataStage backup fails with an error in an environment upgraded from 5.1.3 to 5.2.1

Applies to: 5.2.1

Backup fails with the following error:

missing properties \'addon-name\', \'owner\': aux-meta validation

Workaround: Delete the ConfigMaps and let them regenerate:

Delete the existing ConfigMaps:

oc delete cm datastage-maint-aux-ckpt-cm -n ${PROJECT_CPD_INST_OPERANDS}

oc delete cm datastage-maint-aux-br-cm -n ${PROJECT_CPD_INST_OPERANDS}

Wait for the ConfigMaps to regenerate:

oc get cm -n ${PROJECT_CPD_INST_OPERANDS} | egrep "datastage-maint-aux-ckpt-cm|datastage-maint-aux-br-cm"
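The check above can be made easier to read with a small filter. The following is a minimal sketch, not part of the product: the function name missing_configmaps is hypothetical, and it assumes that the ConfigMap name is the first column of the oc get cm output.

```shell
# Reads `oc get cm` output on stdin and prints each expected ConfigMap
# that is not listed yet. Prints nothing when both have regenerated.
# missing_configmaps is a hypothetical helper name.
missing_configmaps() {
  local listing name
  listing="$(cat)"
  for name in datastage-maint-aux-ckpt-cm datastage-maint-aux-br-cm; do
    # Assumption: the ConfigMap name is the first column of each row.
    if ! printf '%s\n' "$listing" | awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, pipe the listing into the function: oc get cm -n ${PROJECT_CPD_INST_OPERANDS} | missing_configmaps. Repeat until the output is empty.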

Installing DataStage Enterprise Plus with ArgoCD fails with the OutOfSync state

Applies to: 5.4.0

When you use ArgoCD to install datastage-ent-plus together with components that depend on datastage-ent, some of the DataStage applications that ArgoCD generates are shown as OutOfSync because the datastage-ent and datastage-ent-plus components cannot coexist.

Workaround: Install datastage-ent-plus first, and then install the other components. Ignore the OutOfSync status if DataStage was installed successfully.

Known issues for stages:

Java Integration stage custom properties might be lost if you reselect the JAR file

Applies to: 5.2.0

Custom properties in the Java Integration stage might be lost if you reselect the JAR file.

Checksum values may change after migration to a new platform when NLS mapping is used

Applies to: 5.2.0

When you migrate from traditional DataStage on platforms like AIX and Windows to modern DataStage on Linux, values generated by the Checksum stage may change, particularly if NLS mapping is in use.

Flows containing the REST or Data Service stages fail to run

Applies to: 5.2.0

Due to security fixes, flows containing the REST or Data Service stages may throw errors regarding the JAR files and fail to run. Workaround: Recompile the flows.

XML Output stage imported from .isx file fails to run if set to write output to a file

Applies to: 5.2.0

If a flow imported from an ISX file contains an XML Output stage with Write output to a file selected, the flow will fail to run. Workaround: Deselect Write output to a file and add a Sequential file connector after the XML Output stage. Use the Sequential file connector to write your output into a file.

Transformer stage fails to compile with large amounts of generated code

Applies to: 5.2.0

Flows with large amounts of generated and nested code in the Transformer stage fail to compile due to resource limits. Workaround: Increase the PX-runtime resource limits.

Known issues for connectors:

See also Known issues for Common core services for the issues that affect connectors that are also used for other services in Cloud Pak for Data.

Limited support for watsonx.data™ Presto catalog types

Applies to: 5.2.0

The IBM watsonx.data Presto connector supports only Iceberg catalog, and IBM Cloud Object Storage and Amazon S3 bucket types, when writing in DataStage.

Google Cloud Storage: Flows that use Iceberg table format fail with "Invalid bucket name error.."

Applies to: 5.2.0

If your DataStage flow includes data from a Google Cloud Storage connection and you select the Iceberg table format, the flow will fail unless you specify the full path to the bucket in the Endpoint folder property. For example, bucket-name/path-to-table-location. If you also specify the bucket name in the Google Cloud Storage connection form or in the Google Cloud Storage connector properties, you must use the same path that you specify in the Endpoint folder property.

Sequential File connector: Parquet file format does not work when Use DataStage properties is selected

Applies to: 5.2.0

The Sequential File connector fails to run with Parquet file format when Use DataStage properties is selected. If you deselect Use DataStage properties, the file format automatically switches from Parquet to CSV.

Workaround: Deselect Use DataStage properties and reselect Parquet under File format.

Microsoft Azure Databricks connector: TIME data type is not supported

Applies to: 5.2.0

The TIME data type is supported except when you use the following extensions:

timezone
microsecond + timezone

DataStage jobs that use secrets from vault may fail when https_proxy environment variable is configured on the PXRuntime instance.

Applies to: 5.2.0

When the https_proxy environment variable is set, the REST API call that retrieves secrets from the vault is also routed through the proxy server.

Workaround: Set the no_proxy environment variable to include the following value:

.cluster.local,.svc,datastage-ibm-datastage-ds-nginx

See the following example:

# Set no_proxy env in px-runtime and px-compute
# Append '.cluster.local,.svc,datastage-ibm-datastage-ds-nginx' to the current value if it is already set
oc set env deploy/ds-px-default-ibm-datastage-px-runtime no_proxy=".cluster.local,.svc,datastage-ibm-datastage-ds-nginx"
oc set env sts/ds-px-default-ibm-datastage-px-compute no_proxy=".cluster.local,.svc,datastage-ibm-datastage-ds-nginx"
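If no_proxy already has a value, the required entries need to be appended rather than overwritten. The following sketch shows one way to build the merged value without duplicating entries. The function name merge_no_proxy is hypothetical, and the entry list is taken from the comment in the example above.

```shell
# Prints the given no_proxy value with any missing required entries appended.
# merge_no_proxy is a hypothetical helper name, not a product command.
merge_no_proxy() {
  local current required merged entry old_ifs
  current="$1"
  required=".cluster.local,.svc,datastage-ibm-datastage-ds-nginx"
  merged="$current"
  old_ifs="$IFS"
  IFS=','
  for entry in $required; do
    # Compare against comma-delimited entries so that partial matches do not count.
    case ",$merged," in
      *",$entry,"*) ;;
      *) merged="${merged:+$merged,}$entry" ;;
    esac
  done
  IFS="$old_ifs"
  printf '%s\n' "$merged"
}
```

The printed value can then be passed to oc set env as the new no_proxy setting, for example no_proxy="$(merge_no_proxy "$current_value")", where current_value holds whatever is set today.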

Amazon Redshift Connector fails in ELT mode

Applies to: 5.2.0

When running a flow using the Amazon Redshift Connector in ELT mode, the job fails with the error message: Digital envelope routines unsupported for ELT. This issue is specific to ELT mode execution.

Workaround: Run the flow in ETL mode.

Db2 SSL connection fails due to enforced hostname validation

Applies to: 5.3.1 and later

A driver update for all Db2 SCAPI flavors enables SSL hostname validation by default. Previously, hostname validation was not enforced. As a result, existing Db2 SSL connections that use certificates with a hostname that does not match the hostname defined in the connection will fail.

Workaround: You can fix your connection in one of the following ways:

Update the SSL certificate so that its hostname matches the hostname specified in the connection. This option is recommended for security purposes.
Select Skip certificate hostname validation to use your connection without validating SSL certificate hostname.

After you update the connection, recompile any affected DataStage flows.

DataStage flows with Apache Impala connector hang when using DataStage properties authentication

Applies to: 5.3.1 and later

DataStage flows might hang and time out when the Apache Impala connector is configured with the Use DataStage properties property and the Username and password authentication method. The driver update from 5.1.4.000121 to 6.0.0.1279 changed the default authentication from none to LDAP username and password, which causes an authentication mismatch.

Workaround: Set the following environment variable either at the DataStage flow level or in the project runtime environment settings. This workaround disables authentication. Using LDAP or Kerberos is recommended for secure authentication.

$CC_HIVE_CONNECTION_ATTRIBUTES_TO_OVERRIDE="AuthenticationMethod=none"

If, after recompiling, DataStage flows with the Use DataStage properties property are still failing, complete the following steps:

In the Apache Impala connection settings, change authentication method from Username and password to a different method.
Re-select the Username and password option.
Save and recompile your flow.

Known issues for DataStage in Pipelines

These known issues are DataStage-specific. For known issues in Pipelines not listed here, see Known issues and limitations for Orchestration Pipelines.

Environment variables without double quotes are treated as string literals after migration

Applies to: 5.2

In traditional DataStage, parameters and variables enclosed in single quotes are replaced before being passed to bash. In modern DataStage, expressions are passed directly to bash and executed as a bash script. Environment variables surrounded by single quotes will be treated as string literals. Workaround: Replace single quotes around environment variables with double quotes. See the following examples.

# In the following line of code, single quotes surround the sed expression that contains the environment variable ${e1_loop1_Counter}.

sed -n '${e1_loop1_Counter}p' ${STAGING_PATH}/attachmentdocID.txt > ${STAGING_PATH}/${GETEACHID_FILE}

# With double quotes added:

sed -n "${e1_loop1_Counter}p" ${STAGING_PATH}/attachmentdocID.txt > ${STAGING_PATH}/${GETEACHID_FILE}

# Single quotes
grep -i '^#MASCARA_ARCHIVO_ENTRADA#.#EXTENSION_ARCHIVO#$' 
# Double quotes
grep -i "^#MASCARA_ARCHIVO_ENTRADA#.#EXTENSION_ARCHIVO#$"
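The difference between the two quoting styles can be reproduced outside of a pipeline. The following is an illustrative sketch only: the counter value and the sample file are made up for the demonstration.

```shell
# Hypothetical counter value and sample file, for illustration only.
e1_loop1_Counter=2
sample_file="$(mktemp)"
printf 'first\nsecond\nthird\n' > "$sample_file"

# Single quotes: bash passes the text through unchanged, so the sed
# script would be the literal string ${e1_loop1_Counter}p.
single_quoted='${e1_loop1_Counter}p'

# Double quotes: bash expands the variable first, so the sed script is 2p.
double_quoted="${e1_loop1_Counter}p"

# With the expanded script, sed prints only the second line of the file.
selected_line="$(sed -n "$double_quoted" "$sample_file")"

echo "single-quoted script: $single_quoted"
echo "double-quoted script: $double_quoted"
echo "line selected by sed: $selected_line"
rm -f "$sample_file"
```

Only the double-quoted form gives sed a valid line address, which is why the migrated expressions need double quotes.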

A storage volume connection must be created in a project to support use of files in that storage volume's mount path

Applies to: 5.2.0

If a file in a storage volume is being accessed by mount path, the flow will fail if a storage volume connection has not been made for that volume. If the flow is exported to another project, the flow will fail if a storage volume connection has not been made in that project. Workaround: Create a storage volume connection.

Non-ASCII characters are not recognized in several Orchestration Pipelines fields

Applies to: 5.2.0

In the following fields in Orchestration Pipelines, non-ASCII characters cannot be used:

Pipeline/job parameter name
User variable name
Environment variable name
Output variable name
Email address

The number of nodes is limited by configuration size

Applies to: 5.2.0

The number of standard nodes across all active pipelines can be no more than 500 for a SMALL configuration, 750 for a MEDIUM, and 1000 for a LARGE.
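As a planning aid, the limits above can be encoded in a small check. This is a sketch under stated assumptions: the function names are hypothetical, and the node count is something you supply yourself because this page does not describe a command that reports it.

```shell
# Prints the standard-node limit for a configuration size.
# node_limit_for_size is a hypothetical helper name.
node_limit_for_size() {
  case "$1" in
    SMALL) echo 500 ;;
    MEDIUM) echo 750 ;;
    LARGE) echo 1000 ;;
    *) echo "unknown configuration size: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the node count ($2) is within the limit for the size ($1).
within_node_limit() {
  local limit
  limit="$(node_limit_for_size "$1")" || return 1
  [ "$2" -le "$limit" ]
}
```

For example, within_node_limit MEDIUM 800 fails because 800 exceeds the MEDIUM limit of 750.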

Unsupported functions

Applies to: 5.2.0

Unsupported functions return "1" or "unsupported."

Migration adds an extra node to a loop or exception handler

Applies to: 5.2.0

When outside nodes are accessed inside a loop or exception handler, migration adds an extra Set user variables node.

ISX import sets up a "Continue pipeline on error" option on all Run nodes

Applies to: 5.2.0

By default, nodes are set to Continue pipeline on error. When Automatically handle activities that fail is selected, all migrated nodes will be set to Fail on pipeline error unless a condition is defined on a link for when an error is thrown.

ISX file must contain all parallel and sequence jobs with all their dependencies

Applies to: 5.2.0

During migration, all parallel and sequence jobs along with all their dependencies need to be included in the ISX file. If one sequence job depends on another sequence job, but the dependent sequence job is not included in the migrated ISX file, migration marks the dependent job as Run DataStage job node instead of Run Pipeline job node. Also, migration creates extra nodes for any missing parameters.

Jobs migrate with the default Node cache settings

Applies to: 5.2.0

If the option Add checkpoints so sequence is restartable on failure is set on a sequence job level, the job migrates with the Enable caching for specific nodes in node properties panel caching method. In the cache usage section, the migration also sets the Use cache when all selected conditions are met as a default option with both Retrying from a previous failed run and Pipeline version is unchanged from previous run set. At a node level, Create data cache at this node is selected. If Do not enable checkpoint is selected at the node level, Create data cache at this node option at that node is not selected during migration. For more information on node caches, see Manage default settings.

Migration generates user-variable to share routine output across different sub-pipelines

Applies to: 5.2.0

To share routine output across different sub-pipelines, the Routine Activity node migrates as a Run Bash script node plus a Set user variables node. The Run Bash script node generates a placeholder script that you can update.

Migration adds extra nodes when a flow contains references to the missing parameters

Applies to: 5.2.0

If a flow contains references to missing parameters or parameter sets, migration creates an additional Bash script node before the node that references the missing parameters, and displays an unbound_reference_warning message during flow import. The name of the inserted node starts with node. The Bash script node generates a placeholder script and adds the missing parameters as the node’s output. The node with the missing parameter reference is then updated to call the output of the Bash script node.

Optimized runner job fails on first run in a new project

Applies to: 5.2.0

When an optimized runner job is triggered from the Jobs page in the UI within a newly created project, the initial run may fail. This issue occurs only on the first execution in a new project environment.

Workaround: Re-run the job. Subsequent runs are expected to complete successfully.

Limitations

Limitations for general areas:

Special characters in column names are not supported

Applies to: 5.2.0

Special characters such as ' in column names are not supported. The column name must start with a letter or underscore _ character. The column name can contain only alphanumeric and underscore ASCII or unicode characters.
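Column names can be checked against these rules before a flow is built. The following is a simplified sketch: the function name is hypothetical, and it accepts ASCII letters only, which is narrower than the rule above because Unicode letters are also allowed.

```shell
# Succeeds when the name starts with a letter or underscore and contains
# only letters, digits, and underscores.
# Assumption: ASCII only. Unicode letters, which are also allowed, are rejected here.
# is_valid_column_name is a hypothetical helper name.
is_valid_column_name() {
  case "$1" in
    ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, is_valid_column_name "_order_id" succeeds, while a name that starts with a digit or contains a quote character fails.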

Structured query access to metadata is not supported

Applies to: 5.2.0

Structured query access to DataStage metadata is not supported.

Workaround: You can export the project as a .zip file, unzip the file, and use a text tool to search. You can also check the project into a git repository and use GitHub search.
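The text-search workaround might look like the following sketch. It assumes that the export has already been unzipped into a directory and that the exported files are plain text. The function name and any file names are hypothetical.

```shell
# Prints the files under an unzipped project export that contain the text.
# $1 is the directory, $2 is the text to find. -F matches the text literally.
# search_export is a hypothetical helper name.
search_export() {
  grep -rlF -- "$2" "$1"
}
```

For example, after running unzip -q on the exported .zip file with -d export_dir, search_export export_dir "customer_load" lists every exported file that mentions customer_load.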

Match designer does not support weight contribution

Applies to: 5.2.0

Weight contribution for weight comparison is not supported for the Match designer.

Project import and export does not retain the data in file sets and data sets

Applies to: 5.2.0

Project-level imports and exports do not package file set and data set data into the .zip file, so flows that use data sets and file sets will fail to run after export. Workaround: Rerun the jobs that create those data sets and file sets to reestablish those objects.

File sets for data over 500 M are not exported

Applies to: 5.2.0

If the size of the actual files in a file set is more than 500 M, no data will be stored in the exported zip file.

Function libraries do not support the const char* return type

Applies to: 5.2.0

User-defined functions with the const char* return type are not supported.

Status updates are delayed in completed job instances

Applies to: 5.2.0

When multiple instances of the same job are run, some instances continue to display a "Running" status for 8-10 minutes after they have completed. Workaround: Use the dsjob jobrunclean command and specify a job name and run-id to delete an active job.

Node pool constraint is not supported

Applies to: 5.2.0

The node pool constraint property is not supported.

Reading FIFOs on persistent volumes across pods causes stages to hang

Applies to: 5.2.0

Reading FIFOs on persistent volumes across pods is not supported and causes the stage reading the FIFO to hang. Workaround: Constrain the job to a single pod by setting APT_WLM_COMPUTE_PODS=1.

Unassigned environment variables and parameter sets are not migrated

Applies to: 5.2.0

Environment variables and parameter sets that have not been assigned a value will be skipped during export. When jobs are migrated, they contain only those environment variables and parameter sets that have been assigned a value for that job.

Migrated jobs fail reading FIFOs on a mounted volume and might hang

Applies to: 5.2.0

Using FIFOs on mounted volumes for data transfer between conductor and player pods is not supported. FIFOs created in mounted volumes may behave unpredictably across container boundaries or nodes. If the pods are on different nodes, there is no shared kernel state.
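To find jobs that might be affected, it can help to list any FIFOs that already exist under a mount path. The following is a minimal sketch. The function name is hypothetical, and the directory to scan is whatever mount path your jobs use.

```shell
# Lists named pipes (FIFOs) under a directory so that the jobs that
# depend on them can be reviewed.
# list_fifos is a hypothetical helper name.
list_fifos() {
  find "$1" -type p
}
```

An empty result means that no FIFOs currently exist under the directory. It does not rule out jobs that create FIFOs at run time.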

Message handler is not included when exporting a project via the UI

Applies to: 5.3.1

When exporting a project using the Export Project option in the UI, the message handler is not included in the exported package. This is a functional limitation, not an error condition. The message handler is included when downloading a flow with dependencies instead.

px‑runtime cannot run mkfifo on Persistent Volume when enableScratchDiskPV is set to true

Applies to: 5.3.1

When enableScratchDiskPV is set to true, the px‑runtime uses the Persistent Volume (PV) for scratch space.

However, the mkfifo command is not supported on the PV. If mkfifo is needed, px‑runtime automatically falls back to using /tmp on the pod.

All other scratch‑space functions continue to use the PV as expected.

Limitations for connectors:

Apache Hive and Apache Impala connectors: Kerberos SSO authentication does not work

Applies to: 5.2.0

Kerberos SSO authentication does not work for the following connectors when used in the DataStage service:

Apache Hive when you select Use DataStage properties in the Output tab for a source node or the Input tab for a target node.
Apache Impala

Complex flows using Google BigQuery cannot be executed in ELT mode with Link as view

Applies to: 5.2.0

Complex flows with Link as view set as their materialization policy may fail to run in ELT mode due to the nested view limitations of Google BigQuery.

Recompile flows created with personal credentials by a different user

Applies to: 5.2.0

If you want to run a flow that was created by a different user and the flow includes data from a connection that was created with personal credentials, you need to recompile the flow and enter your own personal credentials for the connection.

Only one data asset can be created with the Sequential file connector

Applies to: 5.2.0

When you select Create data asset in the Sequential file connector, a single data asset is created even if multiple file names are provided. Only the first file becomes a data asset.

Previewing data and using the asset browser to browse metadata do not work for these connections:

Applies to: 5.2.0

Apache Cassandra for DataStage®
Apache HBase
IBM MQ

"Test connection" does not work for these connections:

Applies to: 5.2.0

Apache Cassandra for DataStage

Applies to: 5.2.1

Teradata source in ODBC connector: test connection and data preview are not supported.

File-based connectors only support the ISO-8601 format for timestamp with time zone support

Applies to: 5.2.0

File-based connectors support only the ISO-8601 format when reading or writing a column of type timestamp with time zone.

The format of the timestamp is: yyyy MMM dd HH:mm:ss.SSS zzz

Oracle stored procedures do not accept Date/Time/Timestamp literals in CALL statements

Applies to: 5.3.1

When running a flow that calls an Oracle stored procedure with Date, Time, or Timestamp input parameters, the job fails if literal values or named parameters (@column_name) are used in the CALL statement. These data types must be passed by using parameter binding.


-- Correct syntax
CALL my_procedure(?, ?, ?)

-- Incorrect syntax
CALL my_procedure(@date_column, @time_column, @timestamp_column)

Azure Data Lake Storage does not work with the current IBM watsonx.data credential provider

Applies to: 5.3.1

When you run a flow that uses the Azure Data Lake Storage connector, authentication might fail because the current version of the IBM watsonx.data credential provider is not compatible with the client used by the connector. The connector does not accept the SAS tokens returned by the current IBM watsonx.data credential provider.

Workaround: Configure the IBM watsonx.data credential provider to return a bearer token, which is accepted by the existing Azure Data Lake Storage client.