Known issues and limitations for DataStage
The following known issues and limitations apply to DataStage.
- Known issues
  - General
    - Flows using the HBase client fail to run
    - Flows display status Not Compiled when status should be Unknown
    - Match designer sorts records on each page, not across all records
    - Jobs may fail with compute pod connection failure after backup and restore
    - Migrated jobs fail if they contain passwords encrypted with iisenc
    - Data asset displays created time instead of last modified time
    - Five or more PXRuntime instances cannot be created without increasing operator memory
  - Stages
    - Flows containing the REST or Data Service stages fail to run
    - XML Output stage imported from .isx file fails to run if set to write output to a file
    - Transformer stage fails to compile with large amounts of generated code
  - Connectors
    - IBM® MQ connection: "SSL connection" option is not working
    - Dremio Cloud: Connections fail in parallel execution mode
    - Amazon S3: Flows that use Delta Lake table format fail with "Internal error occurred: Assertion error..."
    - Dremio connection fails with "java.lang.NoClassDefFoundError"
    - Table format properties are ignored
    - Apache HDFS: Selecting a value for the Table format property hides the File name field
    - Db2® for i: Flow fails in Parallel execution mode if a Select statement is specified
    - Presto connector: Flow fails in Parallel execution mode
    - Previewing data without a schema is not supported for MySQL and MariaDB
    - ODBC MongoDB data source: SCRAM-SHA-256 authentication method is not supported
  - Pipelines
    - Flows using storage volume files by mount path fail without a storage volume connection
    - Existing jobs containing Run Bash script may break
    - Redundant environments created on import
    - Custom job run names are not supported in Pipelines
    - Non-ASCII characters are not recognized in several Watson Pipeline fields
    - Number of nodes is limited by configuration size
- Limitations
Known issues
- Known issues for general areas:
-
- Flows using the HBase client fail to run
-
Applies to: 4.8.5 and later
The HBase client has been removed from PX-runtime, so flows that use the client will fail to run. Workaround: You can install the HBase client with the following steps:
- Log in to the cluster with oc.
- Get the HBase client from https://hbase.apache.org/downloads.html. Be sure to follow the verification instructions.
wget https://dlcdn.apache.org/hbase/2.5.8/hbase-2.5.8-bin.tar.gz
- Extract the compressed file.
tar -zxvf hbase-2.5.8-bin.tar.gz
- If another HBase client was previously manually added to an instance, clean up that client. Skip this step if a client has not been previously added.
#!/bin/bash
# Update to the desired instance.
INSTANCE=ds-px-default
POD=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1 |head -n 1)
echo "Cleaning up HbaseClient for instance ${INSTANCE} via Pod $POD"
# Remove the client files.
oc exec ${POD} -- rm -fr /px-storage/HbaseClient
# Remove the symbolic link.
oc exec ${POD} -- rm -fr /opt/ibm/PXService/HbaseClient
- Populate the engine with the contents of the extracted client's /lib directory.
#!/bin/bash
# Update to the desired instance.
INSTANCE=ds-px-default
PODS=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1)
POD=$(echo "${PODS}" |head -n 1)
SOURCE=<absolute path to the extracted client>/hbase-2.5.8/lib
oc cp ${SOURCE} ${POD}:px-storage/HbaseClient/
oc delete pod ${PODS}
- Restart the pods and check that the folder /opt/ibm/PXService/HbaseClient contains the .jar files.
#!/bin/bash
# Update to the desired instance.
INSTANCE=ds-px-default
POD=$(oc get pods |grep ${INSTANCE} |cut -d" " -f1 |head -n 1)
oc exec ${POD} -- ls /opt/ibm/PXService/HbaseClient |head -n 5
- Flows display status Not Compiled when status should be Unknown
-
Applies to: 4.8.4 and later
After an upgrade, flows display the status Not compiled instead of Unknown, so flows that were compiled before the upgrade might not appear as compiled. Workaround: The status is corrected when the flow is recompiled.
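If you prefer the command line, recompiling through the dsjob interface might look like the following sketch; the project and flow names are placeholders, and the compile subcommand and its flags should be verified against the dsjob version that ships with your release.
# Placeholder names; confirm that your dsjob version provides a compile subcommand with these flags.
cpdctl dsjob compile --project MyProject --name my_flow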
- Match designer sorts records on each page, not across all records
-
Applies to: 4.8.4 and later
Result records in the Match designer are sorted on a per-page basis, not across all records. Workaround: Set the default page size to a number larger than the number of records.
- Jobs may fail with compute pod connection failure after backup and restore
-
Applies to: 4.8.0 and later
After backup and restore, jobs may fail to run due to connection failure with the compute pods. Workaround: Restart the compute pods.
oc -n ${PROJECT_CPD_INST_OPERANDS} delete pod -l app.kubernetes.io/component=px-compute
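The deleted pods are re-created by their controller; to check that they are back and running, you can reuse the same label selector.
# Uses the same label selector as the delete command above.
oc -n ${PROJECT_CPD_INST_OPERANDS} get pods -l app.kubernetes.io/component=px-compute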
- Migrated jobs fail if they contain passwords encrypted with iisenc
-
Applies to: 4.8.0 and later
Passwords encrypted with iisenc are not supported in migrated jobs. Workaround: Change passwords to cleartext.
- Data asset displays created time instead of last modified time
-
Applies to: 4.8.0 and later
Data assets created with the Sequential file connector may not update the Last modified field when their metadata is modified. Note that the value of Last modified represents the time when the metadata of the asset was last modified, not the last modified time of the physical file.
- Five or more PXRuntime instances cannot be created without increasing operator memory
-
Applies to: 4.8.0 and later
To create five or more PXRuntime instances, you must update the CSV to increase the memory limit of the operator pod. Workaround: Get the DataStage cluster service version (CSV) in the operator namespace.
oc -n ${PROJECT_CPD_INST_OPERATORS} get csv | grep ibm-cpd-datastage-operator
Then patch the CSV to increase the operator pod memory to 2Gi.
oc -n ${PROJECT_CPD_INST_OPERATORS} patch csv <datastage-csv-name> --type='json' -p='[{"path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "2Gi", "op": "replace"}]'
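To confirm that the new limit is in place, you can read back the field that the patch modified; the jsonpath below mirrors the path used in the patch command.
# Reads the memory limit that the patch command above modified.
oc -n ${PROJECT_CPD_INST_OPERATORS} get csv <datastage-csv-name> -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory}'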
- Known issues for stages:
-
- Flows containing the REST or Data Service stages fail to run
-
Applies to: 4.8.2 and later
Due to security fixes, flows containing the REST or Data Service stages may throw errors regarding the JAR files and fail to run. Workaround: Recompile the flows.
- XML Output stage imported from .isx file fails to run if set to write output to a file
-
Applies to: 4.8.0 and later
If a flow imported from an ISX file contains an XML Output stage with Write output to a file selected, the flow will fail to run. Workaround: Deselect Write output to a file and add a Sequential file connector after the XML Output stage. Use the Sequential file connector to write your output into a file.
- Transformer stage fails to compile with large amounts of generated code
-
Applies to: 4.8.0 and later
Flows with large amounts of generated and nested code in the Transformer stage fail to compile due to resource limits. Workaround: Increase the PX-runtime resource limits.
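One way to raise the limits is to adjust the PXRuntime custom resource for the affected instance. The following is only a sketch under assumptions: it presumes the instance is named ds-px-default and that your PXRuntime version accepts a scaleConfig setting (for example, medium); check the schema of the installed custom resource before applying anything like this.
# Assumption: the instance is ds-px-default and the CR supports spec.scaleConfig.
oc -n ${PROJECT_CPD_INST_OPERANDS} patch pxruntime ds-px-default --type merge -p '{"spec": {"scaleConfig": "medium"}}'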
- Known issues for connectors:
-
See also Known issues for Common core services for the issues that affect connectors that are used for other services in Cloud Pak for Data.
- IBM MQ connection: "SSL connection" option is not working
-
Applies to: 4.8.5 and later
In the Create connection form, under the Certificates (optional) section, there is a selection for SSL connection. This option does not work. Instead, if you want to connect to IBM MQ with a secure connection, follow instructions at Configuring SSL for the IBM MQ connection.
- Dremio Cloud: Connections fail in parallel execution mode
-
Applies to: 4.8.3 and later
If you connect to a Dremio Cloud instance and you run a DataStage flow or job in parallel execution mode, the flow will fail.
- Amazon S3: Flows that use Delta Lake table format fail with "Internal error occurred: Assertion error..."
-
Applies to: 4.8.4
Fixed in: 4.8.5
If your DataStage flow includes data from an Amazon S3 connection and you select the Delta Lake table format, the flow will fail unless you specify the full path to the bucket in the Endpoint folder property. For example, bucket-name/path-to-table-location. If you also specify the bucket name in the Amazon S3 connection form or in the Amazon S3 connector properties, you must use the same path that you specify in the Endpoint folder property.
- Dremio connection fails with "java.lang.NoClassDefFoundError"
-
Applies to: 4.8.3
Fixed in: 4.8.4
When you run a DataStage flow with data from a Dremio connection, the flow will fail with the error message:
java.lang.NoClassDefFoundError: org.apache.arrow.flight.sql.impl.FlightSql$SqlInfo
- Table format properties are ignored
Applies to: 4.8.3
4.8.4 and later: The Amazon S3 and Apache HDFS connectors support the Table format property.
4.8.4: The Microsoft Azure Data Lake Storage connector does not include the Table format property.
These connectors include the Table format property in the Stage tab:
- Amazon S3
- Apache HDFS
- Microsoft Azure Data Lake Storage
The Table format property (Deltalake, Flat file, or Iceberg) and the related Table properties are not supported in DataStage. Any values that you enter for these properties are ignored.
- Apache HDFS: Selecting a value for the Table format property hides the File name field
-
Applies to: 4.8.3
Fixed in: 4.8.4
If you select a Table format property (Deltalake, Flat file, or Iceberg), the File name field is hidden. The Table format property cannot be deselected.
Workaround: Delete the connector from the canvas and re-enter the values.
- Db2 for i: Flow fails in Parallel execution mode if a Select statement is specified
-
Applies to: 4.8.3
Fixed in: 4.8.4
The Db2 for i connector cannot run in Parallel execution mode if a Select statement is specified and the SQL contains an alias on the table name reference.
Workaround: Either remove the alias or set the execution mode to Sequential.
- Presto connector: Flow fails in Parallel execution mode
-
Applies to: 4.8.1
Fixed in: 4.8.2
If your flow uses a Presto connector for read operations that include partitioned data (Parallel execution mode), the flow will fail.
Workaround: Change the execution mode to Sequential.
- Previewing data without a schema is not supported for MySQL and MariaDB
-
Applies to: 4.8.0
Fixed in: 4.8.1
If you do not provide a schema, you will be unable to preview data in MySQL and MariaDB.
- ODBC MongoDB data source: SCRAM-SHA-256 authentication method is not supported
-
Applies to: 4.8.0 and 4.8.1
Fixed in: 4.8.2
If you create an ODBC connection for a MongoDB data source that uses the SCRAM-SHA-256 authentication method (AM), the job will fail.
Workaround: Change the server-side authentication to SCRAM-SHA-1. Alternatively, use the MongoDB connection or the Generic JDBC connection.
- Known issues for DataStage in Pipelines
-
These known issues are DataStage-specific. For known issues in Pipelines not listed here, see Known issues and limitations for Watson Pipelines. For DataStage-specific limitations, see Migrating and constructing pipeline flows for DataStage.
Limitations
- Limitations for general areas:
-
- Structured query access to metadata is not supported
Applies to: 4.8.5 and later
Structured query access to DataStage metadata is not supported.
Workaround: You can export the project as a .zip file, unzip the file, and use a text tool to search. You can also check the project into a git repository and use GitHub search.
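For example, a plain-text search over an exported project archive might look like this; the archive name and search string are examples only.
# The archive name and search string are placeholders.
unzip MyProject.zip -d MyProject
grep -r -l "CUSTOMER_ID" MyProject/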
- Match designer does not support weight contribution
-
Applies to: 4.8.5 and later
Weight contribution for weight comparison is not supported for the Match designer.
- Project import and export does not retain the data in file sets and data sets
Applies to: 4.8.0 and later
Project-level imports and exports do not package file set and data set data into the .zip file, so flows that use data sets and file sets will fail to run in the imported project. Workaround: Rerun the jobs that create those data sets and file sets to reestablish those objects.
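If you drive jobs from the command line, rerunning such a job through the dsjob interface might look like the following; the project and job names are placeholders, and the flags should be checked against your installed dsjob version.
# Placeholder project and job names; confirm flag names against your dsjob version.
cpdctl dsjob run --project MyProject --name create_customer_dataset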
- File sets for data over 500 MB are not exported
Applies to: 4.8.0 and later
If the size of the actual files in a file set is more than 500 MB, no data will be stored in the exported .zip file.
- Function libraries do not support the const char* return type
Applies to: 4.8.0 and later
User-defined functions with the const char* return type are not supported.
- Status updates are delayed in completed job instances
Applies to: 4.8.0 and later
When multiple instances of the same job are run, some instances continue to display a "Running" status for 8-10 minutes after they have completed. Workaround: Use the dsjob jobrunclean command and specify a job name and run-id to delete an active job.
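A hypothetical invocation is shown below; the project name, job name, and run ID are placeholders, and the exact flag names should be confirmed against your installed dsjob version.
# Placeholder values; confirm flag names against your dsjob version.
cpdctl dsjob jobrunclean --project MyProject --name my_job --run-id 1a2b3c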
- Node pool constraint is not supported
Applies to: 4.8.0 and later
The node pool constraint property is not supported.
- Reading FIFOs on persistent volumes across pods causes stages to hang
Applies to: 4.8.0 and later
Reading FIFOs on persistent volumes across pods is not supported and causes the stage reading the FIFO to hang. Workaround: Constrain the job to a single pod by setting APT_WLM_COMPUTE_PODS=1.
- Unassigned environment variables and parameter sets are not migrated
Applies to: 4.8.0 and later
Environment variables and parameter sets that have not been assigned a value will be skipped during export. When jobs are migrated, they contain only those environment variables and parameter sets that have been assigned a value for that job.
- Limitations for connectors:
-
- Complex flows using Google BigQuery cannot be executed in ELT mode with Link as view
-
Applies to: 4.8.5 and later
Complex flows with Link as view set as their materialization policy may fail to run in ELT mode due to the nested view limitations of Google BigQuery.
- Recompile flows created with personal credentials by a different user
-
Applies to: 4.8.0 and later
If you want to run a flow that was created by a different user and the flow includes data from a connection that was created with personal credentials, you need to recompile the flow and enter your own personal credentials for the connection.
- Only one data asset can be created with the Sequential file connector
-
Applies to: 4.8.0 and later
When you select Create data asset in the Sequential file connector, a single data asset is created even if multiple file names are provided. Only the first file becomes a data asset.
- Previewing data and using the asset browser to browse metadata do not work for these connections:
-
Applies to: 4.8.0 and later
- Apache Cassandra (optimized)
- Apache HBase
- IBM MQ
- "Test connection" does not work for these connections:
-
Applies to: 4.8.0 and later
- Apache Cassandra (optimized)