Known issues and limitations for DataStage
The following known issues and limitations apply to DataStage.
- Known issues
- General
- Link icons display incorrectly
- Parameter sets are created in the Root directory
- The last two parameters in PROJDEF cannot be edited
- Test connection does not work for local parameters
- Flows display status Not Compiled when status should be Unknown
- Match designer sorts records on each page, not across all records
- Jobs may fail after backup and restore with compute pod connection failure
- Migrated jobs that contain passwords encrypted with iisenc fail
- Previously queued jobs are running again when a backup is restored
- Schema file not generated when APT_WRITE_SCHEMA is set to FALSE
- DataStage backup fails with an error on an environment upgraded from version 5.1.3 to 5.2.1
- Installing DataStage Enterprise Plus with ArgoCD fails with the OutOfSync state
- Stages
- Java Integration stage custom properties might be lost if you reselect the JAR file
- Checksum values may change upon migration when NLS mapping is used
- Flows containing the REST or Data Service stages fail to run
- XML Output stage imported from .isx file fails to run if set to write output to a file
- Transformer stage fails to compile with large amounts of generated code
- Connectors
- Google Cloud Storage: Flows that use Iceberg table format fail with "Invalid bucket name error.."
- Sequential File connector: Parquet file format fails with Use DataStage properties selected
- Microsoft Azure Databricks connector: TIME data type is not supported
- DataStage jobs that use secrets from vault may fail when https_proxy environment variable is configured on the PXRuntime instance.
- Amazon Redshift Connector fails in ELT mode
- Db2 SSL connection fails due to enforced hostname validation
- DataStage flows with Apache Impala connector hang when using DataStage properties authentication
- Pipelines
- Environment variables with single quotes instead of double quotes are treated as string literals after migration
- Flows using storage volume files by mount path fail without a storage volume connection
- Non-ASCII characters are not recognized in several Orchestration Pipelines fields
- Number of nodes is limited by configuration size
- Unsupported functions
- Migration adds an extra node to a loop or exception handler
- ISX import sets up a "Continue pipeline on error" option on all Run nodes
- ISX file must contain all parallel and sequence jobs with all their dependencies
- Jobs migrate with the default Node cache settings
- Migration generates user-variable to share routine output across different sub-pipelines
- Migration adds extra nodes when a flow contains references to the missing parameters
- Optimized runner job fails on first run in new project
- Limitations
Known issues
- Known issues for general areas:
- Link icons display incorrectly
Applies to: 5.2.0
When you load a flow, link icons might display incorrectly if outdated icons have been cached. Workaround: Clear your browser cache.
- Parameter sets are created in the Root directory
Applies to: 5.2.0
When you create a parameter set, the parameter set is created in the Root directory, instead of the location you selected with Select folder.
- The last two parameters in PROJDEF cannot be edited
Applies to: 5.2.0
The last two parameters in PROJDEF cannot be edited because of the Save and Cancel buttons.
- Test connection does not work for local parameters
Applies to: 5.2.0
The Test connection feature does not work for connections with properties parameterized as local parameters instead of parameter sets.
- Flows display status Not Compiled when status should be Unknown
Applies to: 5.2.0
After an upgrade, flows display the status Not compiled instead of Unknown, so flows that were compiled before the upgrade might not appear as compiled. Workaround: Recompile the flow to correct the status.
- Match designer sorts records on each page, not across all records
Applies to: 5.2.0
Result records in the Match designer are sorted on a per-page basis, not across all records. Workaround: Set the default page size to a number larger than the number of records.
- Jobs may fail with compute pod connection failure after backup and restore
Applies to: 5.2.0
After backup and restore, jobs may fail to run due to connection failure with the compute pods. Workaround: Restart the compute pods:
oc -n ${PROJECT_CPD_INST_OPERANDS} delete pod -l app.kubernetes.io/component=px-compute
- Migrated jobs fail if they contain passwords encrypted with iisenc
Applies to: 5.2.0
Passwords encrypted with iisenc are not supported in migrated jobs. Workaround: Change passwords to cleartext.
- Previously queued jobs are running again when a backup is restored
Applies to: 5.2.0
If jobs are in the queue when you create a backup, those jobs are restored and restarted when the backup is restored. Workaround: Use the following command to change the default 48-hour threshold for recovering queued jobs.
# change the queued job recovery threshold to 24 hours
oc set env deploy <instance-name>-ibm-datastage-px-runtime DS_QUEUED_JOB_RECOVERY_HRS=24
- Schema file not generated when APT_WRITE_SCHEMA is set to FALSE
Applies to: 5.2.0
When the environment variable APT_WRITE_SCHEMA is set to FALSE, schema files (.fs/.ds) are not generated. These schema files are required by cpdctl to view datasets or filesets. Without the schema file, users might not be able to view the dataset or fileset contents using cpdctl. Workaround: Set APT_WRITE_SCHEMA=TRUE to ensure schema files are generated.
- DataStage backup fails with an error on an environment upgraded from version 5.1.3 to 5.2.1
Applies to: 5.2.1
Backup fails with the following error:
missing properties \'addon-name\', \'owner\': aux-meta validation
Workaround: Delete the ConfigMaps and let them regenerate:
- Delete the existing ConfigMaps:
oc delete cm datastage-maint-aux-ckpt-cm -n ${PROJECT_CPD_INST_OPERANDS}
oc delete cm datastage-maint-aux-br-cm -n ${PROJECT_CPD_INST_OPERANDS}
- Wait for the ConfigMaps to regenerate:
oc get cm -n ${PROJECT_CPD_INST_OPERANDS} | egrep "datastage-maint-aux-ckpt-cm|datastage-maint-aux-br-cm"
- Installing DataStage Enterprise Plus with ArgoCD fails with the OutOfSync state
Applies to: 5.4.0
- Known issues for stages:
- Java Integration stage custom properties might be lost if you reselect the JAR file
Applies to: 5.2.0
Custom properties in the Java Integration stage might be lost if you reselect the JAR file.
- Checksum values may change after migration to a new platform when NLS mapping is used
Applies to: 5.2.0
When you migrate from traditional DataStage on platforms like AIX and Windows to modern DataStage on Linux, values generated by the Checksum stage may change, particularly if NLS mapping is in use.
- Flows containing the REST or Data Service stages fail to run
Applies to: 5.2.0
Due to security fixes, flows containing the REST or Data Service stages may throw errors regarding the JAR files and fail to run. Workaround: Recompile the flows.
- XML Output stage imported from .isx file fails to run if set to write output to a file
Applies to: 5.2.0
If a flow imported from an ISX file contains an XML Output stage with Write output to a file selected, the flow will fail to run. Workaround: Deselect Write output to a file and add a Sequential file connector after the XML Output stage. Use the Sequential file connector to write your output into a file.
- Transformer stage fails to compile with large amounts of generated code
Applies to: 5.2.0
Flows with large amounts of generated and nested code in the Transformer stage fail to compile due to resource limits. Workaround: Increase the PX-runtime resource limits.
- Known issues for connectors:
See also Known issues for Common core services for the issues that affect connectors that are also used for other services in Cloud Pak for Data.
- Limited support for watsonx.data™ Presto catalog types
Applies to: 5.2.0
When writing data in DataStage, the IBM watsonx.data Presto connector supports only the Iceberg catalog type and only the IBM Cloud Object Storage and Amazon S3 bucket types.
- Google Cloud Storage: Flows that use Iceberg table format fail with "Invalid bucket name error.."
Applies to: 5.2.0
If your DataStage flow includes data from a Google Cloud Storage connection and you select the Iceberg table format, the flow will fail unless you specify the full path to the bucket in the Endpoint folder property. For example, bucket-name/path-to-table-location. If you also specify the bucket name in the Google Cloud Storage connection form or in the Google Cloud Storage connector properties, you must use the same path that you specify in the Endpoint folder property.
- Sequential File connector: Parquet file format does not work when Use DataStage properties is selected
Applies to: 5.2.0
The Sequential File connector fails to run with Parquet file format when Use DataStage properties is selected. If you deselect Use DataStage properties, the file format automatically switches from Parquet to CSV.
Workaround: Deselect Use DataStage properties and reselect Parquet under File format.
- Known issues for DataStage in Pipelines
These known issues are DataStage-specific. For known issues in Pipelines not listed here, see Known issues and limitations for Orchestration Pipelines.
Limitations
- Limitations for general areas:
- Special characters in column names are not supported
Applies to: 5.2.0
Special characters such as ' in column names are not supported. The column name must start with a letter or underscore (_) character. The column name can contain only alphanumeric and underscore ASCII or Unicode characters.
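The naming rule above can be checked before columns are defined in a flow. The following is a minimal Python sketch, assuming that "letter" and "alphanumeric" correspond to Python's Unicode-aware `\w` character class, which may not match the product's definition exactly. The function name `is_supported_column_name` is hypothetical and is not part of any DataStage API.

```python
import re

# Assumption: the first character is a letter or underscore, and the remaining
# characters are letters, digits, or underscores as matched by Python's \w.
_COLUMN_NAME = re.compile(r"[^\W\d]\w*")


def is_supported_column_name(name: str) -> bool:
    """Return True if name satisfies the column naming rule described above."""
    return _COLUMN_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["customer_id", "_tmp", "o'brien", "1st_col"]:
        print(candidate, is_supported_column_name(candidate))
```

A check like this could be run over the column list of a source table to flag names that need to be renamed or aliased.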
- Structured query access to metadata is not supported
Applies to: 5.2.0
Structured query access to DataStage metadata is not supported.
Workaround: You can export the project as a .zip file, unzip the file, and use a text tool to search. You can also check the project into a Git repository and use GitHub search.
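The export-and-search workaround can also be scripted. The following is a rough Python sketch that searches every member of an exported project .zip file for a text string without unpacking it to disk. The internal layout of the export file is not described here, so the sketch assumes only that the members are text. The function name `search_export` is hypothetical.

```python
import zipfile
from typing import Iterator, Tuple


def search_export(zip_path, needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (member name, line number, line) for each line containing needle."""
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            # Assumption: exported assets are text; undecodable bytes are replaced.
            text = archive.read(member).decode("utf-8", errors="replace")
            for line_number, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    yield member, line_number, line.strip()


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        for hit in search_export(sys.argv[1], sys.argv[2]):
            print("%s:%d: %s" % hit)
```

Saved under a name of your choice, the script takes the path to the exported .zip file and the search string as its two arguments.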
- Match designer does not support weight contribution
Applies to: 5.2.0
Weight contribution for weight comparison is not supported for the Match designer.
- Project import and export does not retain the data in file sets and data sets
Applies to: 5.2.0
Project-level imports and exports do not package file set and data set data into the .zip file, so flows that use data sets and file sets will fail to run after export. Workaround: Rerun the jobs that create those data sets and file sets to reestablish those objects.
- File sets for data over 500 M are not exported
Applies to: 5.2.0
If the size of the actual files in a file set is more than 500 M, no data will be stored in the exported zip file.
- Function libraries do not support the const char* return type
Applies to: 5.2.0
User-defined functions with the const char* return type are not supported.
- Status updates are delayed in completed job instances
Applies to: 5.2.0
When multiple instances of the same job are run, some instances continue to display a "Running" status for 8-10 minutes after they have completed. Workaround: Use the dsjob jobrunclean command and specify a job name and run-id to delete an active job.
- Node pool constraint is not supported
Applies to: 5.2.0
The node pool constraint property is not supported.
- Reading FIFOs on persistent volumes across pods causes stages to hang
Applies to: 5.2.0
Reading FIFOs on persistent volumes across pods is not supported and causes the stage reading the FIFO to hang. Workaround: Constrain the job to a single pod by setting APT_WLM_COMPUTE_PODS=1.
- Unassigned environment variables and parameter sets are not migrated
Applies to: 5.2.0
Environment variables and parameter sets that have not been assigned a value will be skipped during export. When jobs are migrated, they contain only those environment variables and parameter sets that have been assigned a value for that job.
- Migrated jobs fail reading FIFOs on a mounted volume and might hang
Applies to: 5.2.0
Using FIFOs on mounted volumes for data transfer between conductor and player pods is not supported. FIFOs created in mounted volumes may behave unpredictably across container boundaries or nodes. If the pods are on different nodes, there is no shared kernel state.
- Message handler is not included when exporting a project via the UI
- px-runtime cannot run mkfifo on Persistent Volume when enableScratchDiskPV is set to true
Applies to: 5.3.1
When enableScratchDiskPV is set to true, the px-runtime uses the Persistent Volume (PV) for scratch space. However, the mkfifo command is not supported on the PV. If mkfifo is needed, px-runtime automatically falls back to using /tmp on the pod. All other scratch-space functions continue to use the PV as expected.
- Limitations for connectors:
- Apache Hive and Apache Impala connectors: Kerberos SSO authentication does not work
Applies to: 5.2.0
Kerberos SSO authentication does not work for the following connectors when used in the DataStage service:
- Apache Hive when you select Use DataStage properties in the Output tab for a source node or the Input tab for a target node.
- Apache Impala
- Complex flows using Google BigQuery cannot be executed in ELT mode with Link as view
Applies to: 5.2.0
Complex flows with Link as view set as their materialization policy may fail to run in ELT mode due to the nested view limitations of Google BigQuery.
- Recompile flows created with personal credentials by a different user
Applies to: 5.2.0
If you want to run a flow that was created by a different user and the flow includes data from a connection that was created with personal credentials, you need to recompile the flow and enter your own personal credentials for the connection.
- Only one data asset can be created with the Sequential file connector
Applies to: 5.2.0
When you select Create data asset in the Sequential file connector, a single data asset is created even if multiple file names are provided. Only the first file becomes a data asset.
- Previewing data and using the asset browser to browse metadata do not work for these connections:
Applies to: 5.2.0
- Apache Cassandra for DataStage®
- Apache HBase
- IBM MQ
- "Test connection" does not work for these connections:
Applies to: 5.2.0
- Apache Cassandra for DataStage
Applies to: 5.2.1
- Teradata source in ODBC connector: test connection and data preview are not supported.
- File-based connectors only support the ISO-8601 format for timestamp with time zone support
Applies to: 5.2.0
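To illustrate the ISO-8601 limitation above, the following Python sketch formats a time zone-aware timestamp as an ISO-8601 string before it is written to a file. The exact ISO-8601 variants that the file-based connectors accept are not specified here, so the output form (extended format with a numeric UTC offset) is an assumption. The function name `to_iso8601` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(ts: datetime) -> str:
    """Format an aware datetime in ISO-8601 extended form with a UTC offset."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must carry time zone information")
    return ts.isoformat(timespec="seconds")


if __name__ == "__main__":
    ts = datetime(2024, 3, 15, 10, 30, 0, tzinfo=timezone(timedelta(hours=2)))
    print(to_iso8601(ts))  # 2024-03-15T10:30:00+02:00
```

Rejecting naive timestamps up front keeps values without an offset out of the file.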
- Oracle stored procedures do not accept Date/Time/Timestamp literals in CALL statements
- Azure Data Lake Storage does not work with the current IBM watsonx.data credential provider