Known issues and limitations for DataStage

The following known issues and limitations apply to DataStage.

Known issues
  • General
  • Stages
  • Connectors
  • Pipelines
Limitations

Known issues

Known issues for general areas:

Flows using the HBase client fail to run

Applies to: 4.8.5 and later

The HBase client has been removed from PX-runtime, so flows that use the client fail to run. Workaround: You can install the HBase client with the following steps:
  1. Log into the cluster with oc.
  2. Get the HBase client from https://hbase.apache.org/downloads.html. Be sure to follow the verification instructions.
    wget https://dlcdn.apache.org/hbase/2.5.8/hbase-2.5.8-bin.tar.gz
  3. Extract the compressed file.
    tar -zxvf hbase-2.5.8-bin.tar.gz
  4. If another HBase client was previously added manually to an instance, clean up that client. Skip this step if no client was previously added.
    #!/bin/bash
    
    # Update to the desired instance.
    INSTANCE=ds-px-default
    
    POD=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1 |head -n 1)
    echo "Cleaning up HbaseClient for instance ${INSTANCE} via Pod $POD"
    
    # rm client files
    oc exec ${POD} -- rm -fr /px-storage/HbaseClient
    
    # rm sym link
    oc exec ${POD} -- rm -fr /opt/ibm/PXService/HbaseClient
  5. Populate the engine with the contents of the extracted client's lib directory.
    #!/bin/bash
    
    # Update to the desired instance.
    INSTANCE=ds-px-default
    
    PODS=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1)
    POD=$(echo "${PODS}" |head -n 1)
    
    SOURCE=<absolute path to the extracted client>/hbase-2.5.8/lib
    
    oc cp ${SOURCE} ${POD}:/px-storage/HbaseClient/
    
    oc delete pod ${PODS}
    
  6. After the pods restart, verify that the folder /opt/ibm/PXService/HbaseClient contains the .jar files.
    #!/bin/bash
    # Update to the desired instance.
    INSTANCE=ds-px-default
    POD=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1 |head -n 1)
    oc exec ${POD} -- ls /opt/ibm/PXService/HbaseClient |head -n 5

Flows display status Not Compiled when status should be Unknown

Applies to: 4.8.4 and later

After an upgrade, flows display the status Not compiled instead of Unknown, so flows that were compiled before the upgrade might not appear as compiled. Workaround: The status is corrected when the flow is recompiled.
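
If the cpdctl dsjob command-line plug-in is configured, recompiling can also be scripted. The following is a minimal sketch only: the project and flow names are placeholders, and the exact subcommand and flag names should be confirmed for your release with cpdctl dsjob compile --help.
#!/bin/bash
# Recompile a single flow in a project (names are placeholders).
cpdctl dsjob compile --project <project-name> --name <flow-name>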

Match designer sorts records on each page, not across all records

Applies to: 4.8.4 and later

Result records in the Match designer are sorted on a per-page basis, not across all records. Workaround: Set the default page size to a number larger than the number of records.

Jobs may fail with compute pod connection failure after backup and restore

Applies to: 4.8.0 and later

After backup and restore, jobs may fail to run due to connection failure with the compute pods. Workaround: Restart the compute pods.
oc -n ${PROJECT_CPD_INST_OPERANDS} delete pod -l app.kubernetes.io/component=px-compute

Migrated jobs fail if they contain passwords encrypted with iisenc

Applies to: 4.8.0 and later

Passwords encrypted with iisenc are not supported in migrated jobs. Workaround: Change passwords to cleartext.

Data asset displays created time instead of last modified time

Applies to: 4.8.0 and later

Data assets created with the Sequential file connector might not update the Last modified field when their metadata is modified. Note that the value of Last modified represents the time when the metadata of the asset was last modified, not the last modified time of the physical file.

Five or more PXRuntime instances cannot be created without increasing operator memory

Applies to: 4.8.0 and later

To create five or more PXRuntime instances, you must update the cluster service version (CSV) to increase the memory limit of the operator pod. Workaround: Get the DataStage CSV in the operator namespace.
oc -n ${PROJECT_CPD_INST_OPERATORS} get csv | grep ibm-cpd-datastage-operator
Patch the CSV to increase operator pod memory to 2Gi.
oc -n ${PROJECT_CPD_INST_OPERATORS} patch csv <datastage-csv-name> --type='json' -p='[{"path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "2Gi", "op": "replace"}]'
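
After the patch is applied, you can confirm the new limit. This verification step is a sketch that reuses the same CSV name placeholder as above.
#!/bin/bash
# Print the operator container's memory limit from the patched CSV.
oc -n ${PROJECT_CPD_INST_OPERATORS} get csv <datastage-csv-name> \
  -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'
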
Known issues for stages:

Flows containing the REST or Data Service stages fail to run

Applies to: 4.8.2 and later

Due to security fixes, flows containing the REST or Data Service stages may throw errors regarding the JAR files and fail to run. Workaround: Recompile the flows.

XML Output stage imported from .isx file fails to run if set to write output to a file

Applies to: 4.8.0 and later

If a flow imported from an ISX file contains an XML Output stage with Write output to a file selected, the flow will fail to run. Workaround: Deselect Write output to a file and add a Sequential file connector after the XML Output stage. Use the Sequential file connector to write your output into a file.

Transformer stage fails to compile with large amounts of generated code

Applies to: 4.8.0 and later

Flows with large amounts of generated and nested code in the Transformer stage fail to compile due to resource limits. Workaround: Increase the PX-runtime resource limits.
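
One way to increase the limits is to scale up the PXRuntime instance. The following is a minimal sketch that assumes the default instance name ds-px-default and that your PXRuntime custom resource exposes a scaleConfig setting; verify the available sizes and field names for your release before applying it.
#!/bin/bash
# Scale the PXRuntime instance to a larger preset size
# (assumes the default instance name and that spec.scaleConfig is available).
oc -n ${PROJECT_CPD_INST_OPERANDS} patch pxruntime ds-px-default \
  --type merge -p '{"spec": {"scaleConfig": "medium"}}'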

Known issues for connectors:

See also Known issues for Common core services for the issues that affect connectors that are used for other services in Cloud Pak for Data.

IBM MQ connection: "SSL connection" option is not working

Applies to: 4.8.5 and later

In the Create connection form, under the Certificates (optional) section, there is a selection for SSL connection. This option does not work. Instead, if you want to connect to IBM MQ with a secure connection, follow instructions at Configuring SSL for the IBM MQ connection.

Dremio Cloud: Connections fail in parallel execution mode

Applies to: 4.8.3 and later

If you connect to a Dremio Cloud instance and you run a DataStage flow or job in parallel execution mode, the flow will fail.

Amazon S3: Flows that use Delta Lake table format fail with "Internal error occurred: Assertion error..."

Applies to: 4.8.4

Fixed in: 4.8.5

If your DataStage flow includes data from an Amazon S3 connection and you select the Delta Lake table format, the flow will fail unless you specify the full path to the bucket in the Endpoint folder property. For example, bucket-name/path-to-table-location. If you also specify the bucket name in the Amazon S3 connection form or in the Amazon S3 connector properties, you must use the same path that you specify in the Endpoint folder property.

Dremio connection fails with "java.lang.NoClassDefFoundError"

Applies to: 4.8.3

Fixed in: 4.8.4

When you run a DataStage flow with data from a Dremio connection, the flow will fail with the error message: java.lang.NoClassDefFoundError: org.apache.arrow.flight.sql.impl.FlightSql$SqlInfo.

Table format properties are ignored

Applies to: 4.8.3

4.8.4 and later: The Amazon S3 and Apache HDFS connectors support the Table format property.

4.8.4: The Microsoft Azure Data Lake Storage connector does not include the Table format property.

These connectors include the Table format property in the Stage tab:
  • Amazon S3
  • Apache HDFS
  • Microsoft Azure Data Lake Storage

The Table format property (Deltalake, Flat file, or Iceberg) and the related Table properties are not supported in DataStage. Any values that you enter for these properties are ignored.

Apache HDFS: Selecting a value for the Table format property hides the File name field

Applies to: 4.8.3

Fixed in: 4.8.4

If you select a Table format property (Deltalake, Flat file, or Iceberg), the File name field is hidden. The Table format property cannot be deselected.

Workaround: Delete the connector from the canvas and re-enter the values.

Db2 for i: Flow fails in Parallel execution mode if a Select statement is specified

Applies to: 4.8.3

Fixed in: 4.8.4

The Db2 for i connector cannot run in Parallel execution mode if a Select statement is specified and the SQL contains an alias on the table name reference.

Workaround: Either remove the alias or set the execution mode to Sequential.

Presto connector: Flow fails in Parallel execution mode

Applies to: 4.8.1

Fixed in: 4.8.2

If your flow uses a Presto connector for read operations that include partitioned data (Parallel execution mode), the flow will fail.

Workaround: Change the execution mode to Sequential.

Previewing data without a schema is not supported for MySQL and MariaDB

Applies to: 4.8.0

Fixed in: 4.8.1

If you do not provide a schema, you will be unable to preview data in MySQL and MariaDB.

ODBC MongoDB data source: SCRAM-SHA-256 authentication method is not supported

Applies to: 4.8.0 and 4.8.1

Fixed in: 4.8.2

If you create an ODBC connection for a MongoDB data source that uses the SCRAM-SHA-256 authentication method, the job will fail.

Workaround: Change the server-side authentication to SCRAM-SHA-1. Alternatively, use the MongoDB connection or the Generic JDBC connection.

Known issues for DataStage in Pipelines

These known issues are DataStage-specific. For known issues in Pipelines not listed here, see Known issues and limitations for Watson Pipelines. For DataStage-specific limitations, see Migrating and constructing pipeline flows for DataStage.

A storage volume connection must be created in a project to support use of files in that storage volume's mount path

Applies to: 4.8.0 and later

A flow that accesses a file in a storage volume by mount path fails if a storage volume connection has not been created for that volume. If the flow is exported to another project, it also fails unless a storage volume connection exists in that project. Workaround: Create a storage volume connection.

Existing jobs containing Run Bash script may break

Applies to: 4.8.0 and later

Because of a fix made in 4.7.1, trailing \n characters are no longer removed from the output of Run Bash script. Jobs created in 4.7.0 may break as a result. Workaround: See Known issues and limitations for Watson Pipelines.
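
If a downstream condition compares the script output against an exact string, the retained newline can make the comparison fail. The generic bash sketch below illustrates the difference and one way to strip the trailing newline explicitly; the variable names and values are illustrative only.
#!/bin/bash
# Illustration only: a trailing newline kept in the output makes an exact
# string comparison fail unless it is stripped first.
raw_output=$'build-123\n'                 # output value with the trailing \n retained
trimmed_output="${raw_output%$'\n'}"      # strip one trailing newline explicitly

[ "${raw_output}" = "build-123" ] && echo "raw matches" || echo "raw does not match"
[ "${trimmed_output}" = "build-123" ] && echo "trimmed matches" || echo "trimmed does not match"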

Redundant environments created on import

Applies to: 4.8.0 and later

Migration adds environments both as local parameters and as environments with a $ sign on the Run DataStage job or Run Pipeline job node. Workaround: Remove the redundant environments.

Custom job run names are not supported in Pipelines

Applies to: 4.8.0

Fixed in: 4.8.1

Specific pipeline job runs cannot be given names because DSJobInvocationID is not supported in pipeline jobs.

Non-ASCII characters are not recognized in several Watson Pipeline fields

Applies to: 4.8.0 and later

In the following fields in Watson Pipeline, non-ASCII characters cannot be used:
  • Pipeline/job parameter name
  • User variable name
  • Environment variable name
  • Output variable name
  • Email address

Number of nodes is limited by configuration size

Applies to: 4.8.0 and later

The number of standard nodes across all active pipelines can be no more than 500 for a SMALL configuration, 750 for a MEDIUM configuration, and 1000 for a LARGE configuration.

Limitations

Limitations for general areas:

Structured query access to metadata is not supported

Applies to: 4.8.5 and later

Structured query access to DataStage metadata is not supported.

Workaround: You can export the project as a .zip file, unzip the file, and use a text tool to search. You can also check the project into a git repository and use GitHub search.
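
For example, a quick text search over an exported project might look like the following sketch; the archive name and search term are placeholders.
#!/bin/bash
# Unpack an exported project and search its contents for a string
# (archive name and search term are placeholders).
unzip -q MyProject.zip -d MyProject
grep -rl 'CUSTOMER_TABLE' MyProject/
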
Match designer does not support weight contribution

Applies to: 4.8.5 and later

Weight contribution for weight comparison is not supported for the Match designer.

Project import and export does not retain the data in file sets and data sets

Applies to: 4.8.0 and later

Project-level imports and exports do not package the data from file sets and data sets into the .zip file, so flows that use data sets and file sets will fail to run after the exported project is imported. Workaround: Rerun the jobs that create those data sets and file sets to reestablish those objects.

File sets for data over 500 MB are not exported

Applies to: 4.8.0 and later

If the size of the actual files in a file set is more than 500 MB, no data will be stored in the exported .zip file.

Function libraries do not support the const char* return type

Applies to: 4.8.0 and later

User-defined functions with the const char* return type are not supported.

Status updates are delayed in completed job instances

Applies to: 4.8.0 and later

When multiple instances of the same job are run, some instances continue to display a "Running" status for 8-10 minutes after they have completed. Workaround: Use the dsjob jobrunclean command and specify a job name and run-id to delete an active job.
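
The command below is a minimal sketch of that workaround using the cpdctl dsjob plug-in. The project name, job name, and run ID are placeholders, and the exact flag names should be confirmed with cpdctl dsjob jobrunclean --help for your release.
#!/bin/bash
# Remove a stuck run for a specific job (names and IDs are placeholders).
cpdctl dsjob jobrunclean --project <project-name> --name <job-name> --run-id <run-id>
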
Node pool constraint is not supported

Applies to: 4.8.0 and later

The node pool constraint property is not supported.

Reading FIFOs on persistent volumes across pods causes stages to hang

Applies to: 4.8.0 and later

Reading FIFOs on persistent volumes across pods is not supported and causes the stage reading the FIFO to hang. Workaround: Constrain the job to a single pod by setting APT_WLM_COMPUTE_PODS=1.

Unassigned environment variables and parameter sets are not migrated

Applies to: 4.8.0 and later

Environment variables and parameter sets that have not been assigned a value will be skipped during export. When jobs are migrated, they contain only those environment variables and parameter sets that have been assigned a value for that job.

Limitations for connectors:

Complex flows using Google BigQuery cannot be executed in ELT mode with Link as view

Applies to: 4.8.5 and later

Complex flows with Link as view set as their materialization policy may fail to run in ELT mode due to the nested view limitations of Google BigQuery.

Recompile flows created with personal credentials by a different user

Applies to: 4.8.0 and later

If you want to run a flow that was created by a different user and the flow includes data from a connection that was created with personal credentials, you need to recompile the flow and enter your own personal credentials for the connection.

Only one data asset can be created with the Sequential file connector

Applies to: 4.8.0 and later

When you select Create data asset in the Sequential file connector, a single data asset is created even if multiple file names are provided. Only the first file becomes a data asset.

Previewing data and using the asset browser to browse metadata do not work for these connections:

Applies to: 4.8.0 and later

  • Apache Cassandra (optimized)
  • Apache HBase
  • IBM MQ
"Test connection" does not work for these connections:

Applies to: 4.8.0 and later

  • Apache Cassandra (optimized)