Known issues and limitations
The following known issues and limitations apply to IBM® watsonx.data.
watsonx.data Developer edition
watsonx.data on IBM Software Hub
Known issues: Access
- Presto workloads and zen core authentication failure
- Data access denied due to credential mismatch
- User is still visible in the Access control page of an engine after removing the user from the CPD platform
- User cannot access tables created in the COS bucket due to a missing metadata error
- RBAC management for buckets with non-compliant names
- Access is denied when querying an external database
- Assigning Grant or Revoke privilege
- Only table creator has DROP access in Apache Hive (API)
- User-provided certificates are not supported by watsonx.data
- Unable to edit storage policies and associated data objects
- Presto workloads and zen core authentication failure
-
Applies to: 2.0.0
- Data access denied due to credential mismatch
-
Applies to: 1.1.0 and later
- The user is still visible in the Access control page of an engine after removing the user from the CPD platform
-
Applies to: 1.1.3 and later
- User cannot access tables that are created in the COS bucket due to a missing metadata error
-
Applies to: 1.1.2 and later
- RBAC management for buckets with non-compliant names
-
Applies to: 2.0.0 and later
Users who attempt to manage role-based access control (RBAC) for buckets whose names do not match the regular expression ^[a-zA-Z0-9-_.]+$ encounter an error. Bucket names must consist only of lowercase and uppercase letters, numbers, hyphens, underscores, and periods. Any other character prevents RBAC management functions from working as expected.
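A client-side check can flag non-compliant bucket names before RBAC operations are attempted. The following is a minimal sketch that applies the regular expression quoted in this issue; the helper name is_rbac_compatible_bucket_name is hypothetical and is not part of any watsonx.data API.

```python
import re

# Pattern quoted in this known issue. The helper name below is illustrative only.
BUCKET_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9-_.]+$")


def is_rbac_compatible_bucket_name(name: str) -> bool:
    """Return True when the whole bucket name matches the documented pattern."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return BUCKET_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("sales-bucket_01", "sales.bucket", "sales bucket", "sales/bucket"):
        print(candidate, is_rbac_compatible_bucket_name(candidate))
```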
- Access is denied when querying an external database
-
Applies to: 1.1.0 and later
When a user who has the User role and only Create access is added to an external database, the user cannot run a select query on a table that they created. Although the user can connect to the Presto (Java) engine and create tables and schemas, they cannot query the table. The system displays an Access Denied message:
Query 20230608_132213_00042_wpmk2 failed: Access Denied: Cannot select from columns [id] in table or view tab_appiduser_01
Workaround: Provide select privilege for the table that the user created.
- Assigning Grant or Revoke privilege
-
Applies to: 1.1.0 and later
Assigning Grant or Revoke privilege to a user through access policy does not work as expected in the following scenario:
- User_A adds a bucket and a Hive catalog (for example, useracat02).
- User_A creates a schema and a table.
- User_B and User_C are assigned User roles to the catalog.
- User_A adds an allow grant policy for User_B.
- User_B connects to the catalog and runs grant select to User_C.
presto:default> grant select on useracat02.schema_test_01.tab_1 to "6ff74bf7-b71b-42f2-88d9-a98fdbaed304";
- When User_C connects to the catalog and runs the select command on the table, the command fails with an access denied message.
presto:default> select * from useracat02.schema_test_01.tab_1;
Query 20230612_073938_00132_hthnz failed: Access Denied: Cannot select from columns [name, id, salary, age] in table or view tab_1
- Only table creator has DROP access in Apache Hive (API)
-
Applies to: 1.1.0 and later
Only the creator of a table can drop the table that is created in the Apache Hive catalog. Other users cannot drop the table even if they have explicit DROP access to the table. They get the Access Denied message.
- User-provided certificates are not supported by watsonx.data
-
Applies to: 1.1.0 and later
User-provided certificates are not supported in watsonx.data when adding database connections or object store buckets, or when using the ibm-lh utility.
- Unable to edit storage policies and associated data objects
- Applies to: 2.1.3
After a policy is created for a storage, you cannot edit the policy and the associated data objects.
Workaround: Perform a hard refresh after creating the policy before attempting any edits.
Known issues: Catalog, schema, and tables
- Hive catalog does not support .csv format for create table int type column
- Time data type support in Hive and Iceberg
- Special characters and mixed case impacting data synchronization
- Accessing Hive and Iceberg tables in the same glue metastore catalog
- Table names with multiple dots
- Timeout error when creating schema in AFM bucket
- Creating a schema without a location
- Creating schema location with path
- Unable to view created schema
- Unique names for schema and bucket
- Hive external column names with uppercase full-width letters cannot be recognized when file-column-names-read-as-lower-case is set to true
- Default hive catalog returns empty results
- Schema, Tables, and Columns displayed in Query workspace and Virtualization windows show different case formats compared to MDS or remote datasource
- Failed to load schema for Db2 on Cloud during data ingestion
- Schema, Tables, and Columns show inconsistent case formats between Query Workspace, Virtualization, and remote data sources
- Hive catalog does not support CSV format for create table int type column
-
Applies to: 2.0.3 and later
- Time data type support in Hive and Iceberg
-
Applies to: 2.0.0 and later
- Special characters and mixed cases impacting data synchronization
-
Applies to: 2.0.0 and later
- Accessing Hive and Iceberg tables in the same glue metastore catalog
-
Applies to: 1.1.0 and later
- Table names with multiple dots
-
Applies to: 1.1.0 and later
- Timeout error when creating schema in AFM bucket
-
Applies to: 1.1.1 and later
- Creating a schema without a location
-
Applies to: 1.1.0 and later
When you create a schema without a location, it is not listed in the schema list of any catalog.
For example, if you create a schema without specifying the location of the bucket, the schema is created in HMS and not in the bucket. When you try to create a new schema with the same name, it fails and responds that the schema already exists.
Workaround: Specify the location of the bucket when creating a schema.
- Creating schema location with path
-
Applies to: 1.1.0 and later
Use one of the following location options when creating a schema:
- Location pointing to a bucket/subpath without a trailing /.
- Location pointing to a bucket/subpath with a trailing / (recommended for better structuring).
Note: Though you can use a location pointing to a bucket only with or without a trailing /, it might lead to failure. Therefore, it is recommended to use a subpath. The subpath must be the same as the schema name. For example, to create a schema test1, specify the location as location='s3a://iceberg-bucket/test1'
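The recommendation that the subpath match the schema name can be applied mechanically. The following is a minimal sketch, assuming an s3a location as in the example in this issue; the function name schema_location is hypothetical.

```python
def schema_location(bucket: str, schema_name: str, trailing_slash: bool = True) -> str:
    """Build an s3a location whose subpath equals the schema name.

    Hypothetical helper: it only formats a string and does not contact
    watsonx.data or the object store.
    """
    if not bucket or not schema_name:
        raise ValueError("bucket and schema_name must be non-empty")
    location = f"s3a://{bucket.strip('/')}/{schema_name.strip('/')}"
    # The trailing "/" form is the one this issue recommends for structuring.
    return location + "/" if trailing_slash else location


if __name__ == "__main__":
    print(schema_location("iceberg-bucket", "test1", trailing_slash=False))
    print(schema_location("iceberg-bucket", "test1"))
```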
- Unable to view created schema
-
Applies to: 1.1.0 and later
When a user who has the User role and only Create access is added to an external database, the user cannot see the schemas that they created. Although the user can create schemas, they cannot view them. The following is the system response:
presto:default> show schemas;
 Schema
--------
(0 rows)
Workaround: Provide select privilege for the schema that the user created.
- Unique names for schema and storage
-
Applies to: 1.1.0 and later
A schema and a storage cannot be created with the same name.
For example, if you create a schema that is named “sales” in one catalog, the same name cannot be used for another schema in another catalog. Similarly, if you register a storage with the name “salesbucket”, another storage with the same name cannot be registered, even if the storage is located in a different object store.
Workaround: Use unique names when creating schemas and storages.
- Hive external column names with uppercase full-width letters cannot be recognized when file-column-names-read-as-lower-case is set to true
- Applies to: 2.1.1
When the Presto worker catalog property file-column-names-read-as-lower-case is set to true, field names in ASCII uppercase letters are converted to ASCII lowercase. As a result, data under column names that contain uppercase full-width characters is not recognized and appears as "null".
- Default hive catalog returns empty results
-
Applies to: 2.2.2 and later
The default Hive catalog in the target cluster returns empty results when running a SELECT query, despite schemas and tables being restored correctly.
- Schema, Tables, and Columns displayed in Query workspace and Virtualization windows show different case formats compared to MDS or remote datasource
-
Applies to: 2.3.0 and later
When using GFS in watsonx.data, schema, table, and column names displayed in Query workspace differ in case formatting from those shown in the Virtualization window and the remote datasource.
- Virtualization window: Displays schema and tables in the same case as the remote datasource.
- Query workspace (GFS catalog): Displays all schema, tables, and columns in lowercase when case sensitivity is disabled.
- Failed to load schema for Db2 on Cloud during data ingestion
-
Applies to: 2.3.0 and later
Fixed in: 2.3.1
When ingesting data from Db2 on Cloud using Data Manager in Software Hub 5.3.0, the schema fails to load even if the user has Admin/Owner/Writer permissions. The error displayed is:
An error occurred while loading the schemas for database: db2cloud.
Workaround: Use an ad-hoc (temporary) connection to register the Db2 database and ingest data.
- Schema, Tables, and Columns show inconsistent case formats between Query Workspace, Virtualization, and remote data sources
-
Applies to: 2.3.0 and later
When using GFS in watsonx.data, schema, table, and column names may appear in different case formats across components:
- Virtualization window: Displays schema and tables in the same case as the remote data source.
- Query Workspace (GFS catalog): Displays all schema, tables, and columns in lowercase when case sensitivity is disabled.
Additional Details:
- Presto C++ engine does not support mixed-case schema, table, or column names. It can only read objects if they are created using the default case conventions of the respective data source.
- If case sensitivity is enabled, all functionalities operate as expected.
- If case sensitivity remains disabled, queries will fail for objects created in mixed case or non-default case formats.
Default case conventions for federated tables:
- MSSQL, PostgreSQL, MySQL: Use lowercase names for schemas and tables.
- Db2, Netezza, Oracle, Snowflake: Use uppercase names for schemas and tables.
- Query Workspace will still display names in lowercase, but queries should reference objects using the correct default case for optimization.
Workaround:
- Enable case sensitivity in GFS for consistent behavior across components.
- If case sensitivity cannot be enabled:
- Create schemas, tables, and columns using the default case of the underlying data source (uppercase for Db2, Oracle, etc.; lowercase for MySQL, PostgreSQL, etc.).
- Avoid mixed-case object names, as they are not supported by Presto C++.
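When case sensitivity cannot be enabled, the default case conventions listed above can be encoded in a small lookup so that object names are generated consistently. The following is a minimal sketch; the dictionary keys and the function name to_default_case are hypothetical labels, not watsonx.data identifiers.

```python
# Default case conventions for federated tables, as listed in this issue.
# The lowercase keys are illustrative labels chosen for this sketch.
DEFAULT_CASE = {
    "mssql": "lower",
    "postgresql": "lower",
    "mysql": "lower",
    "db2": "upper",
    "netezza": "upper",
    "oracle": "upper",
    "snowflake": "upper",
}


def to_default_case(data_source: str, identifier: str) -> str:
    """Return the identifier in the default case of the given data source."""
    convention = DEFAULT_CASE.get(data_source.strip().lower())
    if convention is None:
        raise KeyError(f"no default case convention recorded for {data_source!r}")
    return identifier.upper() if convention == "upper" else identifier.lower()


if __name__ == "__main__":
    print(to_default_case("Db2", "Sales_Orders"))
    print(to_default_case("MySQL", "Sales_Orders"))
```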
Known issues: Data source and Storage
- Server concurrency limit reached error in flight server
- Using ID as a column name in Cassandra CREATE TABLE
- Custom database crashes due to invalid parameters or properties
- EXISTS clause on Apache Phoenix tables generates Exception while executing query error
- Known issues for Apache Derby connector (Arrow-based connector)
- Test connection for arrow connectors fails in FIPS-enabled clusters
- Apache Kafka test connection fails in FIPS-enabled clusters without SCRAM-SHA-512
- AWS S3 buckets added as Custom S3 storage will fail when running queries
- Server concurrency limit reached error in flight server
-
Applies to: 2.1.0 and later
- Using ID as a column name in Cassandra CREATE TABLE
-
Applies to: 2.0.0 and later
In Cassandra, you cannot create a table with a column named ID while using a Cassandra connector through Presto. This is because ID is a reserved keyword for the Cassandra driver that is used by Presto, which automatically generates a UUID for each row. Attempting to create a table with a column named ID results in an error message that indicates a duplicate column declaration, as follows:
Duplicate column 'id' declaration for table 'tm_lakehouse_engine_ks.testtable12'
Workaround: Avoid using ID as a column name when creating Cassandra tables through Presto.
- Custom database crashes due to invalid parameters or properties
-
Applies to: 1.1.4 and later
Creating a custom database with invalid parameters or properties can lead to the database crashing due to memory issues or other internal errors. The custom database becomes unavailable or unusable and other functions within the platform might also be impacted.
Workaround: You can directly modify the database record to remove or correct the problematic parameters or properties. In IBM Cloud Pak for Data (CPD), using the Patch API to update the database configuration can trigger a restart of the engine process to resolve the issue once the property or parameter is removed. The problematic parameters or properties can be removed by using the Patch API for customization and passing the property name in remove_engine_properties.
- Log in to the Postgres database by using the following commands:
oc exec -it ibm-lh-postgres-edb-1 -- bash
psql -U postgres -l
psql -U postgres ibm_lh_repo
\dt
- Remove the database record that contains the wrong parameter from metadata_properties. Example:
delete from metadata_properties where type = 'jvm-worker' and service_id='presto-01';
- Restart the engine by using the Patch API.
- EXISTS clause on Apache Phoenix tables generates Exception while executing query error
-
Applies to: 2.1.1 and later
Queries involving the EXISTS clause on Apache Phoenix tables may fail unexpectedly, even when the referenced column is valid. This occurs due to limitations in Apache Phoenix's interpretation of the EXISTS clause, particularly in cases with ambiguous or misaligned query structures.
Workaround: To address this limitation, apply one of the following strategies:
- Establish a clear relationship between the subquery and the main query. Introduce a filter condition within the subquery to create a meaningful relationship between the subquery and the main query, for example, where department_id_bigint IS NOT NULL in the subquery. See the following example:
SELECT DISTINCT t1.first_name_varchar, t2.performance_rating_real, t1.team_head_varchar
FROM phoenix.tm_lh_engine.employee t1, phoenix.tm_lh_engine.departments t2
WHERE EXISTS (
  SELECT 1
  FROM phoenix.tm_lh_engine.departments
  WHERE department_id_bigint IS NOT NULL
)
- Establish a clear relationship between the tables involved by explicitly joining the tables in the subquery. This ensures the subquery is contextually relevant and resolves the execution issue, for example, where t3.department_id_bigint = t2.department_id_bigint in the subquery. See the following example:
SELECT DISTINCT t1.first_name_varchar, t2.performance_rating_real, t1.team_head_varchar
FROM phoenix.tm_lh_engine.employee t1, phoenix.tm_lh_engine.departments t2
WHERE EXISTS (
  SELECT 1
  FROM phoenix.tm_lh_engine.departments t3
  WHERE t3.department_id_bigint = t2.department_id_bigint
)
- Known issues for Apache Derby connector (Arrow-based connector)
- Applies to: 2.2.0 and later
The error generated in the following scenarios is documented in the Derby documentation (see SQL error messages and exceptions), and the error code is 42Z71.
- XML data type is not supported in both Apache Derby and Presto.
- The SHOW COLUMNS query represents XML type columns as VARCHAR.
- When performing a SELECT query, users must explicitly specify the columns to be selected. Using SELECT * fails if the table contains columns with XML data types.
- Test connection for arrow connectors fails in FIPS-enabled clusters
- Applies to: 2.2.1 and later
Test connection for arrow connectors may fail when deployed in FIPS-enabled clusters due to cryptographic restrictions. This affects connectors such as Greenplum, MariaDB, and Salesforce, which rely on underlying datasources or libraries incompatible with FIPS mode during connection validation.
- Apache Kafka test connection fails in FIPS-enabled clusters
- Applies to: 2.2.1 and later
For Apache Kafka, test connection may fail unless the SASL_MECHANISM is explicitly set to "SCRAM-SHA-512". This mechanism is compatible with FIPS requirements and should be used to ensure successful connection testing in FIPS-enabled environments.
- AWS S3 buckets added as Custom S3 storage will fail when running queries
-
AWS S3 buckets that are added to watsonx.data using the Custom S3 Storage option will encounter failures when running queries.
Known issues: Engine
- Invalid file associations in Presto resource group through UI and engine restart issues
- Presto pod restarts with invalid API patch
- Associating HANA with Presto (Java) engine gives an error
- Columns of type numeric array are left out while starting DatabaseMetadata.getColumns()
- The default deployment type for Presto (C++) engine is Small size
- Issue with prestodb module in ibm-lh-client
- Presto (Java) does not recognize the path as a directory
- Presto (Java) JDBC driver returns incorrect values of date and timestamp
- Gluten engine not supported on Power architecture
- Invalid file associations in Presto resource group through UI and engine restart issues
-
Applies to: 2.0.2 and later
- Presto pod restarts with invalid API patch
-
Applies to: 2.0.0
- Associating HANA with Presto (Java) engine gives an error
-
Applies to: 1.1.1 and later
- Columns of type numeric array are left out while starting DatabaseMetadata.getColumns()
-
Applies to: 1.1.0 and later
When you call DatabaseMetadata.getColumns() by using the Presto (Java) JDBC driver, columns of type numeric array are left out.
- The default deployment type for Presto (C++) engine is Small size
-
Applies to: 2.1.0 and later
When a Presto (C++) engine is created with the Custom configuration mode, the deployed engine is always in small size.
Workaround: Customize the Presto (C++) resource configurations. Refer to Customizing components.
- Issue with prestodb module in ibm-lh-client
-
Applies to: 1.1.0 and later
Due to an issue with the prestodb module in ibm-lh-client, you must complete the following steps to connect to a Python client when using ibm-lh-client:
- Start the sandbox container for the registered Presto engine.
ibm-lh-client/bin/dev-sandbox --engine=demo-b
- In the bash prompt, install the prestodb module.
export HOME=/tmp
pip3 install SQLAlchemy 'pyhive[presto]' presto-python-client
- Presto (Java) does not recognize the path as a directory
-
Applies to: 1.1.0 and later
When you create a new table with a Presto (Java) Hive connector that uses an S3 folder from an external location, Presto (Java) does not recognize the path as a directory and an error might occur.
For example: When creating a customer table in the target directory DBCERT/tbint in a bucket that is called dqmdbcertpq by using the IBM Cloud UX and Aspera S3 console, the following error is encountered: External location must be a directory.
CREATE TABLE "hive-beta"."dbcert"."tbint" (
  RNUM int,
  CBINT bigint
) WITH (
  format='PARQUET',
  external_location = 's3a://dqmdbcertpq/DBCERT/tbint'
);
Query 20230509_113537_00355_cn58z failed: External location must be a directory
Objects in a file system are stored as objects and their path. The object and path must have associated metadata. If the path is not associated with the metadata, Presto (Java) fails to recognize the object and responds that the path is not a directory.
- Presto (Java) JDBC driver returns incorrect values of date and timestamp
-
Applies to: 1.1.0 and later
Date and timestamp string literals before 1900-01-01 return incorrect values.
- Gluten engine not supported on Power architecture
-
Applies to: 2.3.0 and later
Gluten engine is not supported on Power (ppc64le) architecture in watsonx.data. Attempting to create Spark applications or start the Spark history server with Gluten engine on Power clusters results in failures.
Known issues: Ingestion
- Files with different schemas result in null values
- Inconsistent CSV and Parquet file ingestion behavior
- Incorrect recognition of Gregorian dates in Presto with Hive Parquet tables
- Token expired error using Spark ingestion through web console
- Ingestion not supported using external Spark
- Ingestion fails if CSV file contains bad record
- No columns to parse from file error
- Special characters in target table names can cause ingestion failures
- The command CREATE TABLE AS SELECT (CTAS) fails in Db2® when attempted on ingested data tables in Iceberg
- Unsupported special characters in schema and table creation through Ingestion UI
- Ingestion fails for Amazon S3 IAM role type storage
- Files with different schemas result in null values
-
Applies to: 2.0.2 and later
- Inconsistent CSV and Parquet file ingestion behavior
-
Applies to: 2.0.0 and later
- Incorrect recognition of Gregorian dates in Presto with Hive Parquet tables
-
Applies to: 2.0.0 and later
- Token expired error while using Spark ingestion through web console
-
Applies to: 1.1.2 and later
- Ingestion is not supported by using external Spark
- Though an external Spark engine (Fully or Self managed) can be added to watsonx.data, ingestion job is not possible using the Spark engine. Ingestion is supported only by a Colocated Spark engine.
- Ingestion fails if CSV file contains bad record
-
Applies to: 1.1.0 and later
The ibm-lh tool does not support skipping bad records in CSV files when the number of fields in a mismatched record is greater than the number in the table definition.
- No columns to parse from file error
-
Applies to: 1.1.0 and later
When you try to ingest a folder from AWS S3 by using the ibm-lh tool, the following error might be encountered if the folder contains empty (0-byte) files:
No columns to parse from file
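This issue does not list a workaround. One possible precaution, stated here as an assumption rather than a documented fix, is to identify zero-byte objects in the folder before ingestion. The following minimal sketch filters a listing whose entries carry Key and Size fields, which is the shape of the Contents entries in an S3 ListObjectsV2 response; the function names are hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def empty_object_keys(objects: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the keys of zero-byte objects in an S3-style listing."""
    return [str(obj["Key"]) for obj in objects if int(obj.get("Size", 0)) == 0]


def non_empty_object_keys(objects: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the keys of objects that contain at least one byte."""
    return [str(obj["Key"]) for obj in objects if int(obj.get("Size", 0)) > 0]


if __name__ == "__main__":
    # Illustrative listing; in practice the entries would come from the object store.
    listing = [
        {"Key": "ingest/part-0001.csv", "Size": 2048},
        {"Key": "ingest/_SUCCESS", "Size": 0},
    ]
    print(empty_object_keys(listing))
    print(non_empty_object_keys(listing))
```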
- Special characters in target table names can cause ingestion failures
-
Ingestion through the web console fails if the target table name contains special characters.
- The command CREATE TABLE AS SELECT (CTAS) fails in Db2 when attempted on ingested data tables in Iceberg
-
Applies to: 1.1.0 and later
Some versions of Db2 require an explicit length specification for VARCHAR columns. This requirement causes failure of the command CREATE TABLE AS SELECT (CTAS) in Db2 when attempted on ingested data tables in Iceberg.
Workaround: Change the SQL statement to cast each VARCHAR column to a VARCHAR with an explicit length, for example VARCHAR(20).
Example:
create table "db2"."testgaissue"."testga" as (
  select
    cast(checkingstatus as varchar(100)) as checkingstatus,
    loanduration,
    cast(credithistory as varchar(100)) as credithistory,
    cast(loanpurpose as varchar(100)) as loanpurpose,
    loanamount,
    cast(existingsavings as varchar(100)) as existingsavings,
    cast(employmentduration as varchar(100)) as employmentduration,
    installmentpercent,
    cast(sex as varchar(100)) as sex,
    cast(othersonloan as varchar(100)) as othersonloan,
    currentresidenceduration,
    cast(ownsproperty as varchar(100)) as ownsproperty,
    age,
    cast(installmentplans as varchar(100)) as installmentplans,
    cast(housing as varchar(100)) as housing,
    existingcreditscount,
    cast(job as varchar(100)) as job,
    dependents,
    cast(telephone as varchar(100)) as telephone,
    cast(foreignworker as varchar(100)) as foreignworker,
    cast(risk as varchar(100)) as risk
  from "iceberg_data"."project"."testga"
);
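For wide tables, writing the casts by hand is error-prone. The following minimal sketch generates a CTAS statement of the same shape as the example in this issue. The function name build_ctas and its parameters are hypothetical, the caller must supply which columns are VARCHAR, and identifiers are inserted as given without any quoting or escaping.

```python
from typing import Iterable, Set


def build_ctas(target: str, source: str, columns: Iterable[str],
               varchar_columns: Set[str], length: int = 100) -> str:
    """Build a CTAS statement that casts VARCHAR columns to an explicit length.

    Hypothetical helper: identifiers are used verbatim, so pass already
    quoted names and trusted input only.
    """
    select_items = []
    for column in columns:
        if column in varchar_columns:
            select_items.append(f"cast({column} as varchar({length})) as {column}")
        else:
            select_items.append(column)
    select_list = ",\n    ".join(select_items)
    return f"create table {target} as (\n  select\n    {select_list}\n  from {source}\n);"


if __name__ == "__main__":
    print(build_ctas(
        '"db2"."testgaissue"."testga"',
        '"iceberg_data"."project"."testga"',
        ["checkingstatus", "loanduration", "risk"],
        {"checkingstatus", "risk"},
    ))
```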
- Unsupported special characters in schema and table creation through Ingestion UI
- Applies to: 2.2.1 and later
The following special characters are not supported when creating schemas and tables through the Ingestion UI: % and +
These restrictions are enforced due to limitations in underlying storage engines such as Hive, Delta, and Hudi. While the Data Manager page may allow a broader set of special characters (for example, !, @, #, &, _, -, =, +, ], }, <, and >), the ingestion flow enforces stricter validation to ensure compatibility across services.
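Names can be screened for the two unsupported characters before they are submitted through the Ingestion UI. The following is a minimal sketch; the function name unsupported_ingestion_chars is hypothetical, and it checks only the % and + characters named in this issue, not any other rule the UI may apply.

```python
# Characters that this issue lists as unsupported in the Ingestion UI.
UNSUPPORTED_INGESTION_CHARS = frozenset("%+")


def unsupported_ingestion_chars(name: str) -> list:
    """Return the unsupported characters found in a schema or table name."""
    return sorted(set(name) & UNSUPPORTED_INGESTION_CHARS)


if __name__ == "__main__":
    for candidate in ("sales_2024", "sales+2024", "growth%"):
        print(candidate, unsupported_ingestion_chars(candidate))
```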
- Ingestion fails for Amazon S3 IAM role type storage
- Applies to: 2.2.1 and later
Both lite ingestion and native Spark ingestion fail when they target Amazon S3 storage configured with IAM role type authentication.
Known issues: Installation and upgrade
- watsonx.data installation might fail with reason OOMKilled.
- Milvus API fails when upgrading from 2.0.0 to 2.0.1 or higher versions
- Installation of watsonx.data is stuck in In Progress state in the CPD 4.8.5 version
- Mixed case functionality might fail when upgrading from 1.1.4 to 2.0.0 or 2.0.1
- Upgrading standalone watsonx.data might fail
- Installation path directory with space
- watsonx.data installation might fail with reason OOMKilled.
-
Applies to: 2.0.3 and later
When you try to install watsonx.data, the installation might fail with WxdAddon status InProgress, and the ibm-lakehouse-controller-manager pod keeps restarting. The ibm-lakehouse-controller-manager pod status shows the following:
State: Waiting
  Reason: CrashLoopBackOff
Last State: Terminated
  Reason: OOMKilled
  Exit Code: 137
The reason OOMKilled means that your container memory limit is reached and the container is restarted.
Workaround: Increase the pod memory limit for csv/ibm-lakehouse-operator by completing the following steps:
- Run the following command to obtain the name of the cluster service version (csv) in the operators project:
export PROJECT_CPD_INST_OPERATORS=<cpd_operators_project>
export CLUSTER_SERVICE_VERSION_NAME=$(oc get csv -o name -n $PROJECT_CPD_INST_OPERATORS | grep ibm-lakehouse-operator)
- Run the following command to confirm that the csv name is returned by step 1:
echo $CLUSTER_SERVICE_VERSION_NAME
clusterserviceversion.operators.coreos.com/ibm-lakehouse-operator.v4.0.0
- Run the following command to increase the operator memory limit from 1G to 2G by patching the csv:
oc patch $CLUSTER_SERVICE_VERSION_NAME --type json -n $PROJECT_CPD_INST_OPERATORS -p '[ { "op": "replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "2G" } ]'
- Run the following command to confirm the new memory limit:
echo "Memory Limit: $(oc get $CLUSTER_SERVICE_VERSION_NAME -n $PROJECT_CPD_INST_OPERATORS --output jsonpath={.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources.limits.memory})"
Memory Limit: 2G
The pod becomes stable after increasing the limit and completes the pending install and upgrade tasks.
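The JSON patch passed to oc patch in the workaround is easy to mistype on one line. The following minimal sketch builds the same payload programmatically so that it can be inspected or reused in automation. The function name memory_limit_patch is hypothetical; the path and the 2G value are taken from the command in this issue.

```python
import json

# JSON pointer used by the oc patch command in this issue.
MEMORY_LIMIT_PATH = (
    "/spec/install/spec/deployments/0/spec/template/spec/"
    "containers/0/resources/limits/memory"
)


def memory_limit_patch(value: str = "2G") -> str:
    """Return the JSON patch document that replaces the operator memory limit."""
    return json.dumps([{"op": "replace", "path": MEMORY_LIMIT_PATH, "value": value}])


if __name__ == "__main__":
    # The printed string is what would be supplied to the -p option of oc patch.
    print(memory_limit_patch())
```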
- Milvus API fails when upgrading from 2.0.0 to 2.0.1 or higher versions
-
Applies to: 2.0.0 only
When upgrading watsonx.data 2.0.0 version to 2.0.1 or higher versions, Milvus API fails with the following error:
“exception”: “Authmode values not exist in the table”
or
“exception”: “ADLS Authmode doesn’t match or not a supported auth method”
Workaround: To resolve the issue, you must remove and re-add the existing ADLS storage configuration in the watsonx.data instance.
- Installation of watsonx.data is stuck in In Progress state in the CPD 4.8.5 version
-
Applies to: 1.1.4 only
When attempting to install watsonx.data (wxdaddon) on a 4.8.5 deployment, the installation process stalls and the wxdaddon Custom Resource (CR) remains in In Progress state with a MODULE FAILURE error message.
Workaround: Contact IBM support if you encounter the MODULE FAILURE error.
- Mixed case functionality might fail when upgrading from 1.1.4 to 2.0.0 or 2.0.1
-
Applies to: 1.1.4
Upgrading from Cloud Pak for Data (CPD) version 1.1.4 to 2.0.0 or the developer package version 1.1.4 to 2.0.0 or 2.0.1 does not update the parameter enable-mixed-case-support to true, which might affect the schemas and tables in mixed case.
Workaround: To be able to access all the schemas and tables, including those with mixed-case names, follow these instructions:
- For CPD upgrades: Run the customization API patch. After the upgrade, you must execute a customization API patch with the global parameter "enable-mixed-case-support": "true".
- For Developer Package upgrades:
- Log in to the ibm-lh-presto pod within your environment.
- Locate and edit the /opt/presto/etc/custom-config.properties file.
- Set the property enable-mixed-case-support in the file to true.
enable-mixed-case-support=true
- Run the following command to restart the Presto service.
/opt/presto/bin/launcher_restart_handler.sh restart
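The properties-file edit in the developer package steps can be scripted. The following is a minimal sketch that sets a key in the text of a properties file, replacing an existing assignment or appending one if the key is absent. The function name set_property is hypothetical, and reading or writing the actual file inside the pod is left to the caller.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with key set to value (replaced or appended)."""
    lines = text.splitlines()
    updated = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key=value assignments.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[index] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = "some.other.property=1\nenable-mixed-case-support=false\n"
    print(set_property(original, "enable-mixed-case-support", "true"), end="")
```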
- Upgrading standalone watsonx.data might fail
-
Applies to: Standalone versions 1.0.0, 1.0.1, 1.0.2, 1.0.3
When you try to upgrade a standalone watsonx.data to a later version (5.0.x/2.0.x), the upgrade might fail.
Workaround: To upgrade without failure, you must do the following:
- Upgrade the standalone 1.0.x version of watsonx.data to the 1.1.0/4.8.0 version of watsonx.data. See Upgrading watsonx.data from version 1.0.x to 1.1.x.
- Upgrade the watsonx.data 1.1.0/4.8.0 version to a later version of watsonx.data by following Upgrading watsonx.data from version 1.0.x or 1.1.x to 2.0.x.
- Installation path directory with space
-
Applies to: 1.1.0 and later
When you run the setup.sh script for the watsonx.data Developer version, if the installation path has a directory that contains spaces, an error might occur.
For example, if the installation path is:
/Users/john/documents/userdata/Hybrid Data Management/Lakehouse/ibm/lh-dev/bin
the error might be similar to the following message:
./setup.sh: line 19: /Users/john/documents/userdata/Hybrid: no such file or directory
Workaround: Install the watsonx.data developer version into a directory whose path contains no spaces.
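An installation path can be checked for spaces before setup.sh is run. The following is a minimal sketch that reports which path components contain a space; the function name components_with_spaces is hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def components_with_spaces(install_path: str) -> list:
    """Return the path components that contain at least one space."""
    return [part for part in PurePosixPath(install_path).parts if " " in part]


if __name__ == "__main__":
    path = "/Users/john/documents/userdata/Hybrid Data Management/Lakehouse/ibm/lh-dev/bin"
    offending = components_with_spaces(path)
    if offending:
        print("Choose another directory; components with spaces:", offending)
```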
Known issues: Milvus
- User role with CreateCollection L3 policy fails to create collection in Milvus
- Milvus unresponsive to queries
- Inaccurate row count after deletions in Milvus
- Milvus service cannot be deleted using the delete icon from the user interface (UI) of watsonx.data developer edition in 1.1.3 version
- Milvus API error with non-premium watsonx.data accounts
- User role with CreateCollection L3 policy fails to create collection in Milvus
-
Applies to: 2.0.1 and later
Users with the User role can fail to create collections in Milvus with pymilvus when using the ORM Connection and MilvusClient Connection methods.
Workaround: Follow these instructions:
- ORM Connection: The user requires both DescribeCollection and CreateCollection privileges granted in the L3 policy page. You must select all collections in a database while granting the DescribeCollection privilege in the L3 policy through the web console.
- MilvusClient Connection: Only the CreateCollection privilege is necessary in the L3 policy page. However, the first attempt to create a collection will fail.
- Run the create_collection function once.
- Run the create_collection function again. This allows the policies to synchronize, and the collection creation succeeds.
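The run-then-rerun workaround for the MilvusClient Connection method can be wrapped in a small retry helper. The following is a minimal sketch that retries any zero-argument callable; the function name call_with_retry, the attempt count, and the delay are assumptions for illustration, and the helper is not part of pymilvus.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(operation: Callable[[], T], attempts: int = 2,
                    delay_seconds: float = 1.0) -> T:
    """Call operation, retrying after a failure, up to the given number of attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # In real use, narrow this to the client's exception type.
            last_error = error
            if attempt < attempts:
                # Give the policies time to synchronize before trying again.
                time.sleep(delay_seconds)
    assert last_error is not None
    raise last_error
```

In use, the create_collection call would be passed as a callable, for example wrapped in a lambda, so that the second attempt runs only if the first one raises.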
- Milvus unresponsive to queries
-
Applies to: 1.1.3 and later
Milvus may not respond to queries when attempting to load collections or partitions that exceed available memory capacity. This occurs because all search and query operations within Milvus are executed in memory, requiring the entire collection or partition to be loaded before querying.
Workaround:
- Consider the memory limitations of your Milvus deployment and avoid loading excessively large collections or partitions.
- If Milvus becomes unresponsive to queries, employ the appropriate Milvus API to unload or release some collections from memory. An example using Python SDK: collection.release().
- Inaccurate row count after deletions in Milvus
-
Applies to: 1.1.3 and later
The collection.num_entities property might not reflect the actual number of rows in a Milvus collection after deletion operations. This property provides an estimate and may not account for deleted entities.
To get an accurate count of rows, execute a count(*) query on the collection. This provides an accurate count even after deletions.
Pymilvus syntax:
collection = pymilvus.Collection(...)
collection.query(expr='', output_fields=['count(*)'])
- Milvus service cannot be deleted using the delete icon from the user interface (UI) of watsonx.data developer edition in 1.1.3 version
-
Applies to: 1.1.3
- Milvus API error with non-premium watsonx.data accounts
- Applies to: 2.2.1 and later
When using a non-premium watsonx.data account, attempting to access Milvus collections through the AMS API endpoint GET /v1/access/unstructured_objects?objectType=<collection-name>&objectId=default.collection may result in the following error:
[ERROR][handler]: RPC error: [search], <MilvusException: (code=65535, message=Get "/v1/access/unstructured_objects?objectType=milvus_collection&objectId=default.docling_extracted_collection_2": unsupported protocol scheme "")>.
Workaround: Modify the kafka configuration as follows:
common:
  security:
    rowfilter:
      enabled: false
Known issues: Presto (C++)
- Attempting to query Query History and Monitoring Management (QHMM) related tables using Presto (C++) engines might encounter errors
- Attempting to read Parquet v2 tables through Presto (C++) results in an error
- Presto (C++) fails to query an external partitioned table
- Presto (C++) fails to query gosales db
- Unable to access catalog when it is associated with only Presto (C++)
- Schema evolution scenarios fail in Presto (C++)
- Query failures during Presto (C++) engine deployment due to QHMM catalog update
- Presto (C++) fails to parse CHAR(N) data type
- Presto (C++) cannot read Iceberg tables with equality deletes
- Attempting to query Query History and Monitoring Management (QHMM) related tables using Presto (C++) engines might encounter errors
-
Applies to: 2.0.0 and later
When you attempt to query QHMM related tables using Presto (C++) engines, you might encounter errors due to unsupported file formats. Presto (C++) supports only the Parquet v1 format. You cannot use Presto (C++) to query data or tables in other formats.
- Attempting to read Parquet v2 tables through Presto (C++) results in an error
-
Applies to: 2.0.0 and later
When you attempt to read Parquet v2 tables that were created through Data manager in watsonx.data by using Presto (C++), the following error occurs:
Error in ZlibDecompressionStream::Next
- Presto (C++) fails to query an external partitioned table
-
Applies to: 2.0.0 and later
When you query an external table with CHAR data type columns, the query fails to run. This issue occurs because Presto (C++) does not support CHAR data types.
Workaround: Change the CHAR data type column to the VARCHAR data type.
- Presto (C++) fails to query gosales db
Applies to: 2.1.0 and later
Queries against the gosales db that use Presto (C++) return null values.
Workaround: Use the customization API to set the Presto worker catalog property file-column-names-read-as-lower-case=true.
- Unable to access catalog when it is associated with only Presto (C++)
-
Applies to: 2.1.0 and later
When the user adds a catalog and associates it only with Presto (C++), the user cannot access the catalog. The system displays a Catalog does not exist error message, and the catalog does not appear in the list when the show catalogs query is executed.
Workaround: Associate the catalog with both Presto (C++) and Presto (Java).
- Schema evolution scenarios fail in Presto (C++)
-
Applies to: 2.1.0 and later
When you drop or add table columns, queries might fail. For example, queries on the table fail after the following sequence of statements:
create table ice.s3.tessch.t12 (age int, name varchar(25), place varchar(25))
insert into ice.s3.tessch.t12 values (35, 'ken', 'paris')
alter table ice.s3.tessch.t12 drop column age
select * from ice.s3.tessch.t12
alter table ice.s3.tessch.t8 add column place varchar(25)
Workaround: For PARQUET, run the following command in the session:
set session <catalog-name>.parquet_use_column_names=true;
Note: Replace <catalog-name> with the actual catalog being used.
Or set hive.parquet.use-column-names=true in catalog properties. For ORC, set hive.orc.use-column-names=true in catalog properties. For more information, see Catalog.
- Query failures during Presto (C++) engine deployment due to QHMM catalog update
-
Applies to: 2.2.1 and later
Provisioning a new Presto (C++) engine triggers a silent restart of the default Presto (Java) engine, which is associated with the QHMM catalog. This restart:
- Interrupts queries against remote data sources (IBM Cloud Object Storage, Amazon S3, IBM Db2),
- Causes execution errors during deployment,
- And fails to show any restart status in the UI, leading to unexpected workload disruption.
- Presto (C++) fails to parse CHAR(N) data type
-
Applies to: 2.3.0 and later
Queries executed through Presto (C++) fail when they encounter CHAR(N) columns in relational databases. This results in the error:
Failed to parse type [char(N)]. syntax error, unexpected NUMBER.
The issue occurs because Presto (C++) does not support CHAR(N) as a first-class type. Velox, the execution engine, lacks native CHAR(N) handling.
Workaround: Enable the system property kCharNToVarcharImplicitCast.
- Presto (C++) cannot read Iceberg tables with equality deletes
-
Applies to: 2.3.1
Presto (C++) engine cannot read Iceberg tables on which equality deletes have been performed by external engines. While Iceberg supports two types of delete operations (positional deletes and equality deletes), Presto (C++) currently lacks full support for equality delete operations.
Known issues: Presto (Java)
- UPDATE with subquery fails with "Not yet implemented" error in Presto
- Presto (Java) does not support creation of schema and table in the Delta Lake
- Newly created Presto (Java) engine is not restored without HADR bucket association
- UPDATE with subquery fails with "Not yet implemented" error in Presto
- Applies to: 2.2.0 and later
When executing an UPDATE statement that uses a subquery to update data from one table into another, the query fails with the following error:
java.lang.UnsupportedOperationException: not yet implemented: expression translator for com.facebook.presto.sql.tree.SubqueryExpression
Workaround: Use a MERGE INTO statement instead of UPDATE with a subquery. Refer to the following examples for more information:
UPDATE query:
UPDATE "iceberg_data"."iceberg_schema"."employees"
SET salary = salary + (
  SELECT ( d.salary_correction )
  FROM "iceberg_data"."iceberg_schema"."employees" e, "iceberg_data"."iceberg_schema"."department_bonus" d
  WHERE d.department_id = e.department_id
);
Equivalent MERGE query:
merge into employees e
using department_bonus d
on d.department_id = e.department_id
when matched THEN update set salary = salary + d.salary_correction;
- Presto (Java) does not support creation of schema and table in the Delta Lake
-
Applies to: 2.0.0 and later
- Newly created Presto (Java) engine is not restored without HADR bucket association
- Applies to: 2.2.2 and later
The system fails to restore a newly created Presto (Java) engine from the source cluster to the target cluster during HADR operations if the engine is not associated with HADR bucket storage. This results in the engine being absent from the target cluster post-restoration.
Workaround: Associate the Presto (Java) engine with the HADR backup bucket before initiating the restore.
Known issues: Query Optimizer
- Calculation error for OPT_SORTHEAP in Query Optimizer
- Query Optimizer is not upgraded during service upgrades
- Viewing logs for the table statistics jobs after upgrading watsonx.data gives an error
- Calculation error for OPT_SORTHEAP in Query Optimizer
-
Applies to: 2.0.0 and later
Due to a calculation error in the configuration setting of Query Optimizer for the value of OPT_SORTHEAP, the performance of Query Optimizer might be affected.
Workaround: To resolve the calculation error for OPT_SORTHEAP in Query Optimizer, complete the following steps to update the configuration from OPT_SORTHEAP= <initial_value> to OPT_SORTHEAP <initial_value>/20.
- Set up the PROJECT_CPD_INSTANCE environment variable pointing to the namespace where watsonx.data is installed.
export PROJECT_CPD_INSTANCE=<wxd_namespace>
- Edit the value of OPT_SORTHEAP to OPT_SORTHEAP <initial_value>/20 by running the following command.
oc edit db2uinstance lakehouse-oaas -n $PROJECT_CPD_INSTANCE
- Wait for some time for the STATE of lakehouse-oaas to change to Ready. To check the state, run the following command.
watch "oc get db2uinstance -n $PROJECT_CPD_INSTANCE"
- Query Optimizer is not upgraded during service upgrades
- Applies to: 2.1.0 and later
During a watsonx.data service upgrade, the Query Optimizer component may not automatically upgrade. The apply-olm command might not discover the Query Optimizer component for upgrade.
Workaround: Run the following command to upgrade Query Optimizer explicitly:
cpd-cli manage apply-olm --components=wxd_query_optimizer --release=5.1.1 --cpd_operator_ns=cpd-operator --license_acceptance=true
- Viewing logs for the table statistics jobs after upgrading watsonx.data gives an error
- Applies to: 2.2.2 only
After upgrading or patching a cluster with a newer watsonx.data build, previously collected table statistics jobs are still visible in the Statistics tab. However, attempting to view logs from these jobs results in the following error because the associated logs were deleted during the upgrade process, while the job entries were retained:
Error: Failed to find any logs for task ID: <Internal_task_ID>.
Known issues: OpenTelemetry
- OpenTelemetry configuration is not retained after upgrade
- Applies to: 2.2.0 and later
If OpenTelemetry was enabled in watsonx.data 2.1.1, the configuration is not preserved after upgrading to version 2.2.0 or later. Telemetry data collection will be disabled until the settings are manually reconfigured.
Workaround:
Manually reconfigure the OpenTelemetry settings after upgrading to watsonx.data 2.2.0 or later. For detailed steps, refer to Enabling OpenTelemetry for services in watsonx.data.
Known issues: Semantic automation for data enrichment
- Asset profile errors when publishing enriched assets from Cloud Pak for Data projects
- Business terms remain after the semantic automation layer integration is deleted from IBM watsonx.data
- SAL registration and re-registration failures in watsonx.data due to API key handling and missing governance permissions
- Asset profile errors when publishing enriched assets from Cloud Pak for Data projects
-
Applies to: 2.0.2 and later
- Business terms remain after the semantic automation layer integration is deleted from IBM watsonx.data
- Applies to: 2.1.0 and later
- SAL registration and re-registration failures in watsonx.data due to API key handling and missing governance permissions
- Applies to: 2.2.1 and later
If you are trying to register SAL in the watsonx.data console, you might run into two common issues:
- If you enter an incorrect ZenApiKey during your first registration attempt, the system blocks all future attempts, even if you correct the key, because it detects a conflicting backend record.
- If you encounter a permission error while deleting a SAL registration, your account lacks the Governance Artifacts Admin role, which is required to complete the deletion.
Workaround: To resolve this issue, assign the Governance Artifacts Admin role to your account. With this role, you can register, delete, and re-register SAL integrations without errors. Complete the following steps to assign the role:
- Log in to IBM Software Hub.
- From the navigation menu, select Access control.
- Under the Roles tab, click New role.
- Enter a name and a description for the role.
- Click Next.
- Select Governance artifacts > Administer governance artifacts.
- Click Next.
- Review the summary and click Create.
- From the navigation menu, select Access control.
- Under the Users tab, click the Display name of the user in the table.
- Click Assign Roles.
- In the Assign Roles window, select the role you created and click Assign 1 role.
Known issues: Spark
- ALTER TABLE operation fails in Spark job submission
- Database names containing hyphens or spaces cannot be queried by the Spark engine in a Python notebook, even when the appropriate Spark access control extension has been added.
- Spark job failure due to expired ADLS signature during Write/Delete/Update operation
- Spark application submission and Ingestion Jobs fail on FIPS enabled clusters
- Compatibility issue: Spark fails to read Iceberg tables written by Presto with Parquet V2
- Partitioned data access fails through Data Access Service (DAS) with Spark on S3-compatible storages
- Query server remains in "Active" state with invalid credentials if no catalog is associated
- Spark application fails to run when filename contains spaces or special characters
- Spark license type not updated after applying Software Hub (SWH) entitlement
- Spark license type not updated after reapplying Software Hub (SWH) entitlement
- Spark 4.0 fails to execute SQL queries in ANSI mode with provided configuration
- TPC-DS Query 54 fails with ArithmeticException in ANSI mode on Spark 4.0
- Spark 3.4 performance regression for TPC-DS queries with Red Hat JDK
- Restrictions on Spark write operations for Unity Catalog API tables in the Hive catalog
- Spark data ingestion fails when using MRAP buckets in watsonx.data 2.3.1
- Spark jobs fail to run due to excessive log files
- Databand-enabled Spark jobs fail on Spark 4.0 in watsonx.data
- ADLS Gen2 with service principal authentication not supported for Spark engine home bucket
- Failure details not populated for failed Spark applications
- Spark jobs on SOD cluster fail with "Deployment not found" error
- Row-level deletion fails in Presto for tables created by Spark
- Spark Applications Stuck in STARTING State on Dataplane Engines with Object Storage
- ALTER TABLE operation fails in Spark job submission
-
Applies to: 2.0.1 and later
- Database names containing hyphens or spaces cannot be queried by the Spark engine in a Python notebook, even when the appropriate Spark access control extension has been added.
- Applies to: 2.1.0 and later
- Spark job failure due to expired ADLS signature during Write/Delete/Update operation
- Applies to: 2.0.3 and later
- Spark application submission and Ingestion Jobs fail on FIPS enabled clusters
- Applies to: 2.2 and later
Spark is not FIPS compliant, which causes Spark application submission and ingestion jobs to fail.
Workaround: Submit the Spark application in the FIPS enabled cluster with the following payload.
"conf": {
  "spark.driver.extraJavaOptions": "-Dcom.redhat.fips=false",
  "spark.executor.extraJavaOptions": "-Dcom.redhat.fips=false"
},
"env": {
  "SPARK_MASTER_OPTS": "-Dcom.redhat.fips=false",
  "SPARK_WORKER_OPTS": "-Dcom.redhat.fips=false"
}
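If you build submission payloads in code, the FIPS settings from the workaround can be merged in before submission. The following is a minimal Python sketch under stated assumptions: `add_fips_workaround` is a hypothetical helper, and the dictionary you pass in is whichever object of your payload holds the `conf` and `env` sections. Where that object sits in the full request body is not specified here.

```python
FIPS_OFF = "-Dcom.redhat.fips=false"


def add_fips_workaround(payload):
    """Return a copy of payload with the FIPS workaround settings set.

    Existing values for these four keys are overwritten; other keys
    in "conf" and "env" are preserved.
    """
    updated = dict(payload)
    conf = dict(updated.get("conf", {}))
    conf["spark.driver.extraJavaOptions"] = FIPS_OFF
    conf["spark.executor.extraJavaOptions"] = FIPS_OFF
    env = dict(updated.get("env", {}))
    env["SPARK_MASTER_OPTS"] = FIPS_OFF
    env["SPARK_WORKER_OPTS"] = FIPS_OFF
    updated["conf"] = conf
    updated["env"] = env
    return updated
```

Note that the sketch replaces any Java options already set under these keys. If your application needs other JVM flags, append the FIPS flag to the existing value instead.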
- Compatibility issue: Spark fails to read Iceberg tables written by Presto with Parquet V2
- Applies to: 2.2.1 and later
Spark fails to read data inserted into Iceberg tables by Presto when Presto is explicitly configured to use the Parquet V2 writer. This issue occurs because Spark does not support vectorized reads for certain Parquet V2 encodings, such as DELTA_BINARY_PACKED. A typical error message is:
UnsupportedOperationException: Cannot support vectorized reads for column [CustomerID] optional int32 CustomerID = 1 with encoding DELTA_BINARY_PACKED. Disable vectorized reads to read this table/file at org.apache.iceberg.arrow.vectorized.parquet.VectorizedPageIterator.initDataReader(VectorizedPageIterator.java:98).
Workaround: If you encounter this error while reading a table, especially one created using earlier versions of watsonx.data, set the following Spark configuration:
.config("spark.sql.iceberg.vectorization.enabled", "false")
- Partitioned data access fails through Data Access Service (DAS) with Spark on S3-compatible storages
- Applies to: 2.2.1 and later
When you use Apache Spark to access partitioned data through DAS on S3-compatible storages, you may encounter the following error:
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden;
Workaround: Add the following properties to your Spark job:
"spark.hadoop.fs.s3a.bucket.<bucket_name>.endpoint":"xxx",
"spark.hadoop.fs.s3a.bucket.<bucket_name>.access.key":"xxx",
"spark.hadoop.fs.s3a.bucket.<bucket_name>.secret.key":"xxx",
"spark.hadoop.fs.s3a.bucket.<bucket_name>.aws.credentials.provider":"org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
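The four per-bucket properties follow one naming pattern, so they can be generated for each bucket rather than typed by hand. The following is a minimal Python sketch: `s3a_bucket_properties` is a hypothetical helper name, and the argument values in any call are placeholders for your own endpoint and credentials.

```python
def s3a_bucket_properties(bucket_name, endpoint, access_key, secret_key):
    """Build the per-bucket S3A properties listed in the workaround."""
    prefix = f"spark.hadoop.fs.s3a.bucket.{bucket_name}"
    return {
        f"{prefix}.endpoint": endpoint,
        f"{prefix}.access.key": access_key,
        f"{prefix}.secret.key": secret_key,
        f"{prefix}.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
    }
```

The returned dictionary can be merged into the configuration section of the Spark job. Avoid hardcoding the secret key in source files; read it from a secure store instead.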
- Query server remains in "Active" state with invalid credentials if no catalog is associated
-
Applies to: 2.2.2 and later
The query server in the Spark engine does not use credentials when no catalog is associated with it. As a result, the query server remains in the "Active" state and continues to work even if the credentials are invalid.
- Spark application fails to run when filename contains spaces or special characters
- Applies to: 2.2.1 and later
If you upload a Python (.py) file with spaces or special characters in its filename (for example, wordcount (1).py), the Spark job fails to execute. The system does not handle such filenames, resulting in the following error during job submission:
/opt/ibm/entrypoint/start-spark-job-wrapper.sh: eval: line 293: syntax error near unexpected token ('
/opt/ibm/entrypoint/start-spark-job-wrapper.sh: eval: line 293: spark-submit --master spark://spark-master-headless-b778988f-24ff-49c2-aa05-a56f3c204f0b:7077 s3a://sparkqa-donotdelete-pr-7aqi2frntm5vlz/spark_jobs/uploads/8f472c67-23aa-4f94-8ed6-c9f2dbe13e20/application/wordcount (1).py '/opt/ibm/spark/examples/src/main/resources/people.txt''
Workaround: Rename the Python application file to remove spaces and special characters before uploading. For example, rename wordcount (1).py to wordcount_1.py.
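The rename can be automated before upload. The following is a minimal Python sketch: `safe_application_filename` is a hypothetical helper, and the set of characters treated as safe (letters, digits, underscore, hyphen) is an assumption rather than a documented rule.

```python
import re
from pathlib import PurePosixPath


def safe_application_filename(filename):
    """Replace spaces and special characters in the file name stem."""
    path = PurePosixPath(filename)
    # Collapse each run of unsafe characters into one underscore,
    # then drop underscores left at either end of the stem.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem).strip("_")
    return f"{stem}{path.suffix}"
```

For example, `safe_application_filename("wordcount (1).py")` returns `wordcount_1.py`, which matches the rename suggested in the workaround.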
- Spark license type not updated after applying Software Hub (SWH) entitlement
- Applies to: 2.3.0
Fixed in: 2.3.1
When the SWH entitlement is applied using the cpd-cli apply entitlement command, Spark does not automatically update its wxdLicenseType field. This can lead to inconsistencies in license recognition within the system.
Workaround: After applying the entitlement using the cpd-cli apply entitlement command, manually patch the Spark CR using the following command:
oc patch AnalyticsEngine analyticsengine-sample -n <instance_namespace> --type merge --patch '{"spec": {"wxdLicenseType":"<license_type>"}}'
- Spark license type not updated after reapplying Software Hub (SWH) entitlement
- Applies to: 2.3.1 and later
When the SWH entitlement is removed and then reapplied using the cpd-cli apply entitlement command, Spark does not automatically update its wxdLicenseType field. This can lead to inconsistencies in license recognition within the system.
Workaround: After reapplying the entitlement using the cpd-cli apply entitlement command, restart the lakehouse operator pod so that the changes take effect immediately. Without a restart, the system automatically reconciles the changes after 30 minutes.
To restart the lakehouse operator pod, use the following command:
oc delete pod -n <cpd_operator_namespace> -l app.kubernetes.io/name=ibm-lakehouse
- Spark 4.0 fails to execute SQL queries in ANSI mode with provided configuration
- Applies to: 2.3.0 and later
When running SQL queries on Spark 4.0 with ANSI mode enabled (spark.sql.ansi.enabled=true), queries fail with ExtendedAnalysisException due to strict type enforcement in ANSI mode. This issue occurs even when using configurations that work on Spark 3.5.
Workaround: Use non-ANSI mode by setting the parameter:
"spark.sql.ansi.enabled": "false"
- TPC-DS Query 54 fails with ArithmeticException in ANSI mode on Spark 4.0
- Applies to: 2.2.1
When running TPC-DS benchmark query Q54 on Spark 4.0 in ANSI mode, the query fails with an ArithmeticException. This issue occurs on watsonx.data version 2.2.1 clusters and affects workloads using Spark 4.0 with ANSI mode enabled.
Workaround: Use non-ANSI mode by setting the parameter:
"spark.sql.ansi.enabled": "false"
- Spark 3.4 performance regression for TPC-DS queries with Red Hat JDK
- Applies to: 2.3.0 and later
Spark 3.4 with the patched runtime (spark-hb-wxd-cpd-miniforge-runtimes-v34: sha256:80a0e477089eae0eb94375fd985979655db4f459862182a1d4bb4deeb6876aff) shows significant performance regressions for TPC-DS Scala queries Q2–Q4 when using Red Hat JDK. These queries are 2×–4× slower compared to watsonx.data 2.2.1, while Spark 3.5 shows expected performance improvements.
Workaround: Use IBM Semeru Java runtime instead of Red Hat JDK for Spark 3.4 workloads. Testing with Semeru Java shows significant performance improvements over the baseline:
Spark 3.4 with Semeru Java (watsonx.data 2.3.0):
Configuration:
- Driver: 5×20 cores
- Executor: 4×16 cores
- Executor count: 14
Results (averaged over 3 runs):
- Q1: 309.3 seconds (~13.2% improvement over baseline)
- Q2: 531.7 seconds (~16.9% improvement over baseline)
- Q3: 550.3 seconds (~4.5% improvement over baseline)
- Q4: 528.9 seconds (~6.8% improvement over baseline)
- Configure your Spark application to use Semeru Java runtime instead of Red Hat JDK
- Update the runtime image reference in your Spark configuration
- Resubmit your Spark jobs with the updated configuration
- Restrictions on Spark write operations for Unity Catalog API tables in the Hive catalog
-
Applies to: 2.3.0 and later
Spark cannot fully write to Unity Catalog tables and schemas created through the Unity Catalog API when they use the Hive catalog. Currently, Spark does not allow you to run INSERT operations on these tables.
- Spark data ingestion fails when using MRAP buckets in watsonx.data 2.3.1
-
Applies to: 2.3.1 and later
Data ingestion jobs fail when Spark reads from or writes to an MRAP bucket. Spark returns the following error:
Access Points usage is required but not configured for the bucket
Workaround: Update the Spark engine configuration:
Change from: spark.hadoop.fs.s3a.accesspoint.required=true
Change to: spark.hadoop.fs.s3a.<bucket.bucketname.mrap>.accesspoint.required=true
- Spark jobs fail to run due to excessive log files
-
Applies to: 2.2.0 and later
All jobs running on Spark engines stop and fail to run, entering a retry loop. The spark-worker and spark-master pods cycle between the Init and Terminating states and never reach the Running state.
Workaround: Apply a CronJob for periodic cleanup to prevent recurrence:
Note: This CronJob deletes logs older than seven days.
- Get the spark-hb-nginx image:
oc get deploy spark-hb-nginx -o=jsonpath='{$.spec.template.spec.containers[:1].image}'
- Apply the following CronJob, replacing <spark-hb-nginx> with the image retrieved in the previous step:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: spark-platform-logs-cleanup
spec:
  schedule: "0 0 */2 * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: demo-clean
            image: <spark-hb-nginx>
            command: ["/bin/sh","-c"]
            args:
            - |
              echo "===== Spark jobs cleanup started ====="
              TARGET_PATH="/mnt/asset_file_api/projects"
              echo "Listing directories older than 7 days (DRY RUN):"
              find ${TARGET_PATH} \
                -type d \
                -regextype posix-extended \
                -regex ".*/assets/spark_jobs/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89aAbB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89aAbB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}" \
                ! -path "*/assets/spark_jobs/uploads/*" \
                -mtime +7 \
                -print
              echo "Deleting directories..."
              find ${TARGET_PATH} \
                -type d \
                -regextype posix-extended \
                -regex ".*/assets/spark_jobs/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89aAbB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89aAbB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}" \
                ! -path "*/assets/spark_jobs/uploads/*" \
                -mtime +7 \
                -exec rm -rf {} +
              echo "===== Cleanup finished ====="
            volumeMounts:
            - name: file-api-pv
              mountPath: /mnt/asset_file_api
          restartPolicy: OnFailure
          volumes:
          - name: file-api-pv
            persistentVolumeClaim:
              claimName: file-api-claim
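Before applying the CronJob, it can help to check which directory paths its -regex expression selects. The following is a minimal Python sketch that mirrors the path pattern and the uploads exclusion only. It does not model the -mtime +7 age filter, and `is_cleanup_candidate` is a hypothetical helper name.

```python
import re

# One UUID segment, copied from the CronJob's -regex expression.
UUID = (r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}"
        r"-[89aAbB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}")
JOB_DIR = re.compile(rf".*/assets/spark_jobs/{UUID}/{UUID}")


def is_cleanup_candidate(path):
    """Return True if the path matches the CronJob's directory pattern."""
    # find -regex matches against the whole path, so use fullmatch.
    if JOB_DIR.fullmatch(path) is None:
        return False
    # Mirror the '! -path "*/assets/spark_jobs/uploads/*"' exclusion.
    return "/assets/spark_jobs/uploads/" not in path
```

Only directories nested exactly two UUID levels below assets/spark_jobs match, so uploaded application files and the per-job parent directories are left in place.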
- Databand-enabled Spark jobs fail on Spark 4.0 in watsonx.data
-
Applies to: 2.2.0 and later
Databand-enabled Spark jobs fail when using Spark 4.0. The failure occurs during job submission due to incompatibilities between Databand Spark instrumentation and Spark 4.0 APIs.
- ADLS Gen2 with service principal authentication not supported for Spark engine home bucket
-
Applies to: 2.3.2 and later
ADLS Gen2 buckets configured with service principal authentication mode are not supported as engine home bucket for Spark engines.
- Failure details not populated for failed Spark applications
-
Applies to: 2.3.2 and later
Failure details are not populated in the Spark engine applications table for certain failed Spark applications. When expanding a failed application row in the UI, the failure details section remains empty.
Workaround: For applications that started running before failing, you can download the application logs to investigate the failure cause.
- In the Spark engine applications table, check if the failed application has a value in the Started on column
- If the Started on column has a timestamp, it indicates the application started running before it failed
- For such applications, you can download the application logs from the overflow menu of that application's row in the table to investigate the failure
- Spark jobs on SOD cluster fail with "Deployment not found" error
-
Applies to: 2.3.0 and later
Spark jobs submitted to a SOD (Software on Demand) cluster using the wxd-spark engine fail with a deployment error.
Workaround: Create the wxd-norbac-sa service account in the dataplane namespace:
oc create serviceaccount wxd-norbac-sa -n ${DATAPLANE_NAMESPACE}
- Row-level deletion fails in Presto for tables created by Spark
-
Applies to: 2.2.1 and later
When you use a Presto engine to delete rows from an Iceberg table that was created by Spark, the operation fails with the following error:
This connector only supports delete where one or more partitions are deleted entirely. Configure write.delete.mode table property to allow row level deletions.
Workaround: Use Spark SQL to perform row-level delete operations on tables created by Spark. Spark supports row-level deletions in both copy-on-write and merge-on-read modes, while Presto supports them only in merge-on-read mode.
- Spark Applications Stuck in STARTING State on Dataplane Engines with Object Storage
-
Applies to: 2.4.0 and later
Spark applications submitted to dataplane engines that use Object Storage (COS) as the home bucket may become stuck in the STARTING state and fail to transition to RUNNING or COMPLETED.
Known issues: SQL queries
- Insert and Select with Hive and Presto (C++) on FIPS enabled clusters
- String literal interpretation in Presto
- Presto (Java) queries with many columns and size exceeding default limit
- An unexpected error in parquet metadata reading occurs when running the queries on partitioned data
- DROP TABLE command on an Iceberg table does not remove folder and files from object storage
- Insert and Select with Hive and Presto (C++) on FIPS enabled clusters
-
Applies to: 2.0.0 and later
- String literal interpretation in Presto
-
Applies to: 1.1.0 and later
- Presto (Java) queries with many columns and size exceeding default limit
-
Applies to: 1.1.0 and later
- An unexpected error in parquet metadata reading occurs when running the queries on partitioned data
-
Applies to: 1.1.1
- DROP TABLE command on an Iceberg table does not remove folder and files from object storage
-
Applies to: 1.1.0 and later
Known issues: Web console
- Missing data validation for Amazon S3 storage endpoints
-
Applies to: 1.1.0 and later
Known issues: Others
- Ingesting data to default or externally added buckets by using watsonx.data Presto connector might fail
- Data that is imported from watsonx.data bucket fails
- Query history page displays an error when a catalog or schema is not specified in a query
- The Schema load, create table, and insert queries are not working in the Developer edition
- Querying QHMM query_history_view fails with Invalid format: "" error
- Materialized table creation in the Query workspace succeeds but fails in the Spark notebook using the same permissions
- Trust store access issue on fresh or upgraded CPD clusters
- SQL views cannot be queried across engines (Spark/Presto)
- Stats sync job remains stuck during execution
- ScaleConfig changes not reflected until operator restart
- Salesforce connection fails in watsonx.data when deployed in separate namespaces
- Query execution fails temporarily after updating storage or database credentials in watsonx.data developer edition
- Postgres primary missing after power outage in watsonx.data
- watsonx.data UI inaccessible after Software hub reinstallation
- Hive metastore customizations lost after upgrade to watsonx.data version 2.1.x due to transition to MDS
- Milvus and Presto edit details page shows internal server error initially after creation
- Steward users unable to perform ingestion despite having required permissions
- Data protection not enforced in watsonx.data for assets added through Presto connector
- No API key validation when configuring Unified Lineage in watsonx.data
- ACL pods remain after ACL bucket deletion
- Presto and Milvus public hostnames change after upgrade to watsonx.data 2.3.1
- watsonx.data instance not upgraded after service upgrade
- S3 buckets with MRAP are not supported on Power architecture
- The watsonx.data upgrade fails on step 2
- User group limit for /idp/users_groups API
- Ingesting data to default or externally added buckets by using watsonx.data Presto connector might fail
-
Applies to: 2.1.0 only
Ingesting data into default buckets or externally added buckets by using the watsonx.data Presto connector with the data policies defined in watsonx.data might fail.
Workaround: Use File connectors to ingest data into storage associated in watsonx.data.
For example, you can use Generic S3 connector to ingest data into any S3 compliant storage in watsonx.data, like Amazon S3 storage or IBM Cloud Object Storage.
- Data that is imported from watsonx.data bucket fails
-
Applies to: 1.1.0 and later
When you run the import script (import-bucket-data.sh) to restore the watsonx.data bucket data, the system displays an error that the container ibm-lh-lakehouse-minio is not found.
+ oc exec -t ibm-lh-lakehouse-minio-7db7c6788f-g4r8 -n cpd-instance -- bash -c 'mc alias set ibm-lh http://ibm-lh-lakehouse-minio-svc.cpd-instance.svc.cluster.local:9000 <access-key> <secret_key> \
--config-dir=/tmp/.mc \
--insecure && mc alias set ibm-lh_backup https://s3.us-west-2.amazonaws.com/ <access-key> <secret_key> \
--config-dir=/tmp/.mc --insecure'
error: unable to upgrade connection: container not found ("ibm-lh-lakehouse-minio")
Workaround: Delete the running MinIO pod by using the following command and rerun the import script.
oc delete -n $CPD_NAMESPACE $(oc get rs -o name -n $CPD_NAMESPACE | grep "ibm-lh-lakehouse-minio")
- Query history page displays an error when a catalog or schema is not specified in a query
- Applies to: 2.1.0 and later
When running a query in the Query workspace with a catalog or schema selected from the drop-down, the query executes successfully. However, an error message stating "Catalog or Schema must be specified when session catalog or schema is not set" may appear when retrieving the query details from the query history page.
Workaround: To prevent this issue, it is recommended to explicitly specify the catalog and schema within the query itself. This ensures that the query not only executes successfully but also that its details can be accurately retrieved from the query history page.
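If queries are generated by a script, a small helper can make sure that every table reference carries its catalog and schema. The following is a minimal Python sketch: `qualified_table_name` is a hypothetical helper, and the catalog, schema, and table names in the usage line are illustrative only.

```python
def qualified_table_name(catalog, schema, table):
    """Return a double-quoted, fully qualified table name."""
    parts = (catalog, schema, table)
    # Double any embedded quote so that each identifier stays valid.
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


query = "SELECT * FROM " + qualified_table_name(
    "iceberg_data", "iceberg_schema", "employees")
```

With the names shown, `query` holds `SELECT * FROM "iceberg_data"."iceberg_schema"."employees"`, which does not depend on the catalog or schema selected in the drop-down.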
- The Schema load, create table, and insert queries are not working in the Developer edition
Applies to: 2.1.2 and later
Users cannot list schemas, create or insert tables, or run any other queries from both the Data Manager and Query workspace. This issue occurs intermittently in the Developer edition.
Workaround: If you have the installation privileges (or can contact a privileged user), complete the following steps:
- Run the following commands:
podman exec -it ibm-lh-postgres sh
export PGPASSWORD=$POSTGRES_PASSWORD
- Run the following commands to log in to PostgreSQL:
psql -h ibm-lh-postgres-svc -p 5432 -U $POSTGRES_USER -w -d ibm_lh_repo
select pg_terminate_backend(pid) from pg_stat_activity where datname='iceberg_catalog';
DROP DATABASE "iceberg_catalog";
create database iceberg_catalog;
exit
- Run the following commands to stop and restart the MDS services:
cd /root/ibm-lh-dev/bin
./stop_service ibm-lh-mds-rest
./stop_service ibm-lh-mds-thrift
./start_service ibm-lh-mds-rest
./start_service ibm-lh-mds-thrift
- Querying QHMM query_history_view fails with Invalid format: "" error
- Applies to: 2.1.1 and later
Querying the QHMM query_history_view fails with an Invalid format: "" error when end_date is a null (empty) string. This error occurs when from_iso8601_timestamp(end) is called in the view.
Workaround: Use the query_history table instead.
- Materialized table creation in the Query workspace succeeds but fails in the Spark notebook using the same permissions
- Applies to: 2.1.3 and later
When attempting to create a materialized table using a SQL query in the Query workspace, the operation is successful. The user has read access to the bucket and appropriate access policies (insert, update, select, delete) for the default Iceberg catalog. However, when the same SQL statement is executed within a Spark notebook that uses the watsonx.data Spark template, it generates the following error:
the action is not allowed.
Workaround: Define the L3 policy for the iceberg-bucket storage in the Create access control policy page.
- Trust store access issue on fresh or upgraded CPD clusters
- Applies to: 2.2 and later
On a fresh or upgraded CPD cluster, the system may log the exception java.security.KeyManagementException: problem accessing trust store, which can disrupt key functionalities such as executing CREATE SCHEMA commands and synchronizing remote data sources.
Workaround: Use the following commands to restart the MDS pod after logging in with administrative access using oc.
kubectl get pods | grep mds # get mds pods
ibm-lh-lakehouse-mds-rest-7894ccc9c-np9dw # find the rest pod
kubectl delete pod ibm-lh-lakehouse-mds-rest-7894ccc9c-np9dw
- SQL views cannot be queried across engines (Spark/Presto)
- Applies to: 2.1.0 and later
SQL views created by an engine with Hive iceberg catalog are recognised by other engines, but cannot be queried across engines, as one engine cannot understand the SQL dialect of another engine.
- Stats sync job remains stuck during execution
- Applies to: 2.2.1 and later
Stats sync job may remain stuck during execution due to unknown conditions. When this occurs, users can check the logs to view the job status in the optimizer or Db2. If the job status is NOTRECEIVED, NOTRUN, or UNKNOWN, users must manually force delete the job.
After the stuck job is deleted:
- If there are jobs currently in the Queued list, the first one will automatically move to Active and begin execution.
- If no jobs are queued, users can manually submit a new job.
Status Definitions:
- NOTRECEIVED: The system did not receive a call for the given task ID.
- NOTRUN: An error prevented the scheduler from invoking the task’s procedure.
- UNKNOWN: The task began execution, but the scheduler failed to record the outcome due to an unexpected condition.
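The decision rule described above can be sketched in Python as follows. This is only an illustration of the documented logic: the function names are hypothetical, and the status value is assumed to have been read manually from the optimizer or Db2 logs.

```python
# Statuses that, per the description above, mean the job must be force deleted.
STUCK_STATUSES = {"NOTRECEIVED", "NOTRUN", "UNKNOWN"}


def needs_force_delete(status: str) -> bool:
    """Return True when the logged job status indicates a stuck job."""
    return status.strip().upper() in STUCK_STATUSES


def next_step_after_delete(queued_jobs: int) -> str:
    """Describe what happens once the stuck job is deleted."""
    if queued_jobs > 0:
        return "first queued job moves to Active automatically"
    return "submit a new job manually"


print(needs_force_delete("NOTRUN"))
print(next_step_after_delete(queued_jobs=0))
```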
- ScaleConfig changes not reflected until operator restart
- Applies to: 2.2.1 and later
Changing the T-shirt size in the watsonx.data CRD updates the ScaleConfig, but the change is not reflected in watsonx.data progress until the operator is restarted.
- Salesforce connection fails in watsonx.data when deployed in separate namespaces
- Applies to: 2.2.1 and later
Users may experience connection failures when trying to connect to Salesforce in watsonx.data, even if the credentials and settings are correct. This issue occurs when IBM watsonx.data and Common Core Services (CCS), such as Watson Studio, are deployed in different namespaces (for example, watsonx in one namespace and CCS in another like zen or cpd-instance).
The system requires both watsonx.data and CCS to be deployed in the same namespace for the Flight server (which handles data connections) to function properly. If they are deployed separately, users may see unclear error messages such as Invalid hostname or credentials or Service is temporarily unavailable. Additionally, the system currently does not allow users to disable SSL or upload their own security certificates.
Workaround: Make sure that both watsonx.data and CCS are installed in the same namespace during deployment.
- Query execution fails temporarily after updating storage or database credentials in watsonx.data developer edition
- Applies to: 2.2.1 and later
After updating expired credentials for a storage or database resource associated with a Presto engine in watsonx.data developer edition, query execution in the Query workspace fails for approximately 30 to 40 seconds. After this delay, queries execute successfully without further issues.
- Postgres primary missing after power outage in watsonx.data
-
Applies to: 2.2.0 and later
After a data center power outage, users running watsonx.data on Software Hub may encounter a failure in the ibm-lh-postgres-edb cluster where no primary instance is available. In one reported case, one pod (ibm-lh-postgres-edb-3) was stuck in a switchover state, displaying the message “This is an old primary instance, waiting for the switchover to finish,” while kubectl cnpg status indicated “Primary instance not found.” This caused the entire cluster to become unresponsive, making watsonx.data unusable.
Workaround: To restore the cluster, complete the following steps:
- Identify a Healthy Replica: Run the following command to check the status of the Postgres pods:
oc get pods | grep postgres
Look for a pod in 1/1 Running state. This indicates a healthy replica.
- Promote the Healthy Replica: Run the cnp tool to promote the healthy replica to primary. For example, if ibm-lh-postgres-edb-1 is healthy:
kubectl cnp promote ibm-lh-postgres-edb ibm-lh-postgres-edb-1
- Force Delete the Stuck Primary (if promotion is blocked): If the promotion fails due to lingering artifacts from the old primary pod, run the following commands to force delete the problematic pod and its PVC.
Note: Ensure your replicas are in 1/1 Running state before proceeding.
kubectl delete pod ibm-lh-postgres-edb-3 --grace-period=0 --force
kubectl delete pvc ibm-lh-postgres-edb-3
- Verify Cluster Recovery: After cleanup, the new primary pod and PVC should come up. Run the following command to confirm the cluster status:
kubectl cnp status ibm-lh-postgres-edb
- watsonx.data UI inaccessible after Software Hub reinstallation
- Applies to: 2.2.1 and later
After uninstalling and reinstalling Software Hub on a cluster, the user may be unable to access the watsonx.data UI. This issue occurs because the wxdengine Custom Resource (CR) from the previous installation remains in the cluster and causes reconciliation failures during reinstallation. This typically affects users who uninstall watsonx.data when an engine (such as Presto (C++)) never finished provisioning.
Workaround: Use one of the following methods.
- Method one: Before uninstalling watsonx.data, delete the engine that failed to finish provisioning directly from the UI.
- Method two: After uninstalling watsonx.data, run the following command to remove any leftover wxdengine resources:
oc delete wxdengine <engineName>
- Hive metastore customizations lost after upgrade to watsonx.data version 2.1.x due to transition to MDS
-
Applies to: 2.1.0 and later
Upgrading from watsonx.data version 2.0.x to 2.1.x causes the system to lose Hive Metastore (HMS) resource customizations, such as hive_resources_limits_memory. This happens because version 2.1.x replaces HMS with Metadata Service (MDS), which does not automatically inherit HMS settings.
Workaround: To restore the previous configuration, manually set the equivalent MDS resource properties after the upgrade:
- hive_resources_limits_cpu maps to mds_thrift_resources_limits_cpu
- hive_resources_limits_memory maps to mds_thrift_resources_limits_memory
- hive_resources_requests_cpu maps to mds_thrift_resources_requests_cpu
- hive_resources_requests_memory maps to mds_thrift_resources_requests_memory
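The mapping above can be applied mechanically. The following Python sketch translates a set of saved HMS customizations into the equivalent MDS property names. Only the four documented properties are mapped; the function name and the sample values are hypothetical.

```python
# The four documented HMS-to-MDS property mappings.
HMS_TO_MDS = {
    "hive_resources_limits_cpu": "mds_thrift_resources_limits_cpu",
    "hive_resources_limits_memory": "mds_thrift_resources_limits_memory",
    "hive_resources_requests_cpu": "mds_thrift_resources_requests_cpu",
    "hive_resources_requests_memory": "mds_thrift_resources_requests_memory",
}


def to_mds_properties(hms_settings: dict) -> dict:
    """Translate documented HMS resource settings to their MDS equivalents.

    Properties without a documented mapping are skipped rather than guessed.
    """
    return {
        HMS_TO_MDS[name]: value
        for name, value in hms_settings.items()
        if name in HMS_TO_MDS
    }


# Hypothetical values saved before the upgrade.
old_settings = {"hive_resources_limits_memory": "4Gi", "hive_resources_limits_cpu": "2"}
print(to_mds_properties(old_settings))
```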
- Milvus and Presto edit details page shows internal server error initially after creation
-
Applies to: 2.2.1 and later
You might encounter a 500 Internal Server Error when trying to edit the description on the Milvus and Presto engine details page shortly after creating the engine. This issue typically occurs during the first few attempts because the system delays policy propagation due to caching.
- Steward users unable to perform ingestion despite having required permissions
-
Applies to: 2.2.1 and later
Fixed in: 2.3.1
Steward users with appropriate permissions (including metastore admin rights and write access to Iceberg catalog buckets) are unable to perform ingestion operations.
- Data protection not enforced in watsonx.data for assets added through Presto connector
-
Applies to: 2.2.1 and later
When a table from watsonx.data is added to an IBM Knowledge Catalog (IKC) governed catalog using the generic Presto connector, DPS access‑deny (and masking) rules are enforced in Software Hub but are not enforced in the watsonx.data console.
The same table, if added using the IBM watsonx.data Presto connector, is enforced in both Software Hub and watsonx.data.
Workaround: Use regular or watsonx.data connector to obtain assets from watsonx.data side.
- No API key validation when configuring Unified Lineage in watsonx.data
-
Applies to: 2.2.2 and later
Currently, there is no API key validation when configuring Unified Lineage in watsonx.data. If the API key is invalid, expired, or incorrectly updated, lineage ingestion fails without any indication, and no lineage graphs are generated. The interface does not display any error message when the API key is invalid.
Workaround:
- Double-check the API key used when configuring Unified Lineage with watsonx.data.
- If no lineage graphs are generated, test the API key to ensure it is valid and can generate a bearer token.
- ACL pods remain after ACL bucket deletion
-
Applies to: 2.3.1 and later
When a user initiates ACL bucket deletion, the bucket is sometimes deleted before the ACL service pods are removed. If the user tries to add another ACL bucket from the UI (Add storage tearsheet), they will not be able to designate the bucket as an ACL storage since the service pods are still running.
Workaround: Run the following command to disable the ACL component.
oc patch wxd/lakehouse \
  --type=merge \
  -n ${PROJECT_CPD_INST_OPERANDS} \
  -p '{ "spec": { "enable_ACLv2": false } }'
- Presto and Milvus public hostnames change after upgrade to watsonx.data 2.3.1
- Applies to: 2.3.1
The upgrade to watsonx.data 2.3.1 changes the public hostnames for Presto and Milvus, which breaks existing connections and bookmarked URLs to these services.
Workaround: To restore the previous hostnames, you can customize the network configuration for your engines. For detailed instructions, see Customizing network configuration.
Example command to customize the Presto (C++) engine hostname:
oc patch wxdengine/lakehouse-prestissimo126 \
  -n <operand_ns> \
  --type=merge \
  -p '{ "spec": { "networking": { "custom_hosts": { "presto": "ibm-lh-lakehouse-prestissimo126-presto-svc-cpd-instance.apps.perf7.ibm.prestodb.dev" } } } }'
- watsonx.data instance not upgraded after service upgrade
-
Applies to: 2.2.1 and later
After upgrading the watsonx.data service, the watsonx.data instance may not be automatically upgraded along with the service upgrade. The instance continues to display the previous version in the Software Hub, even though the underlying Custom Resources (CRs) have been successfully updated to the new version.
Workaround: Restart the zen-watcher pod to refresh the watsonx.data instance and make it accessible.
- S3 buckets with MRAP are not supported on Power architecture
-
Applies to: 2.3.1 and later
When using watsonx.data on Power (ppc64le) architecture, S3 storage buckets configured with Multi-Region Access Point (MRAP) are not supported.
- The watsonx.data upgrade fails on step 2
- Applies to: Upgrades to 2.3.1 and later
Fixed in: 2.3.1 patch 3
When you upgrade watsonx.data to Version 2.3.1 or a 2.3.1 patch, the install-components command fails with the following error:
[ERROR] The command timed out waiting for the custom resources associated with the watsonx_data component to be ready on the host ...
- Diagnosing the problem:
- Check the status of the watsonx.data custom resource (WxdAddon wxdaddon):
oc get WxdAddon wxdaddon -n ${PROJECT_CPD_INST_OPERANDS}
- Look for the following information in the status section:
progress: 60%
progressMessage: 3/7 - Successfully performed blue-green switch
upgradeStatus: 2/7 - Staging an upgraded blue-green presto/prestissimo
upgradeStep: "2"
If you see the preceding information, proceed to Resolving the problem.
- Resolving the problem:
To resolve the problem:
- Delete the ibm-lh-tls-certificate certificate:
oc delete certificate.cert-manager.io/ibm-lh-tls-certificate \
  -n ${PROJECT_CPD_INST_OPERANDS}
- Delete the ibm-lh-tls-secret secret:
oc delete secret/ibm-lh-tls-secret \
  -n ${PROJECT_CPD_INST_OPERANDS}
- Wait for the certificate and secret to be recreated:
oc get secret,cert \
  -n ${PROJECT_CPD_INST_OPERANDS} \
  | grep ibm-lh-tls
The command should return a response with the following format:
secret/ibm-lh-tls-secret kubernetes.io/tls 7 9m19s
certificate.cert-manager.io/ibm-lh-tls-certificate True ibm-lh-tls-secret 9m20s
- Set the WXD_OPERATOR_POD to the name of the watsonx.data operator pod:
export WXD_OPERATOR_POD=$(oc get po -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep "ibm-lakehouse-controller-manager-")
- Delete the operator pod:
oc delete ${WXD_OPERATOR_POD} -n ${PROJECT_CPD_INST_OPERANDS}
The upgrade will complete after the operator restarts.
- User group limit for /idp/users_groups API
-
Applies to: 2.4.0 and later
The /idp/users_groups API supports a maximum of 100 user groups. Using more than 100 user groups can cause timeout issues.
Limitations: Access
- Spark does not support IKC data governance policies
- User access control is not supported for fully managed and self managed Spark engines
- Add external MinIO bucket to allowlist to establish connection from air-gapped watsonx.data cluster.
- Spark does not support IKC data governance policies
-
The Spark engine uses the built-in data policies for data governance and does not support IKC data access policies.
- User access control is not supported for fully managed and self managed Spark engines
-
Applies to: 1.1.2 and later
The Access control tab is not supported for fully-managed or self-managed Spark engine. Administrators cannot carry out the access control operations for fully managed or self managed Spark engines.
- Add MinIO bucket to allowlist to establish connection with watsonx.data
-
Applies to: 1.1.2 and later
Limitations: Catalog, schema, and tables
- Unsupported special characters in schema and table creation
- Unsupported special characters in storage location creation
- Use valid schema, table, and column names to ensure query reliability
- Cross catalog schema creation anomaly in Presto
- Creating schemas in the root path of Ceph Object Storage gives an error
- Hive does not support json data that starts with array
- Hive catalog table creation by using external_location fails due to wrong placement of file
- Table creation fails if the column names differ only by spaces
- Unable to delete data from columns with special characters in their names
- Unsupported special characters in schema, table, and column creation
-
Applies to: 2.1.0 and later
The following special characters are not supported while creating schemas and tables:
Schemas (Hive and Iceberg): {, [, (, and ).
Tables (Hive): {, [, (, and ). (Creating tables within a schema whose name starts with the special character `@` results in an error.)
Tables (Iceberg): {, [, (, ), $, and @.
- Additional unsupported special characters from 2.1.1
-
The following special characters are not supported while creating schemas and tables:
Schemas (Hive and Iceberg): /, ^, +, ?, *, and $.
Tables (Hive): /, ^, +, ?, *, and $.
Tables (Iceberg): /, ^, +, ?, and *.
It is recommended to not use special characters such as question mark (?), hyphen (-), asterisk (*) or delimiter characters like \r, \n, and \t in table, column, and schema names. Though these special characters are supported and tables, columns, and schemas can be created, using them might cause issues when running the INSERT command or applying access policies for the same.
To ensure a seamless experience, follow the list below:
- Schema names can contain letters, numbers, or one of !, #, &, ], }, <, >, =, %, and @.
- Table names can contain letters, numbers, or one of !, #, &, ], }, <, >, =, and ;.
- Column names can contain letters, numbers, or one of !, #, &, [, ], <, >, _, :, and @.
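The recommended character lists above can be checked before a name is used. The following Python sketch reports the characters in a name that fall outside the recommended list for its kind. It follows the lists exactly as written and assumes that "letters" and "numbers" mean ASCII letters and digits. The function name is hypothetical. A flagged character is only a prompt for review, not proof that watsonx.data rejects the name, and the version-specific lists of unsupported characters in this section still apply.

```python
import string

# ASCII letters and digits (assumption: "letters" and "numbers" mean ASCII).
ALNUM = set(string.ascii_letters + string.digits)

# Extra characters taken verbatim from the recommended lists above.
RECOMMENDED_EXTRAS = {
    "schema": set("!#&]}<>=%@"),
    "table": set("!#&]}<>=;"),
    "column": set("!#&[]<>_:@"),
}


def chars_outside_recommended(kind: str, name: str) -> list[str]:
    """Return the sorted characters of `name` that are not on the recommended list."""
    allowed = ALNUM | RECOMMENDED_EXTRAS[kind]
    return sorted({ch for ch in name if ch not in allowed})


print(chars_outside_recommended("table", "my-table?"))
print(chars_outside_recommended("column", "unit_price"))
```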
- Additional unsupported special characters in table creation from 2.1.3
- The following special characters are not supported while creating tables:
}, ", and '.
- Additional unsupported special characters in schemas creation from 2.3.0
- The following special characters are not supported while creating schemas:
", \, :, ;, and '.
- Unsupported special characters in storage location creation
- Applies to: 2.1.3 and later
The following special characters are not supported while creating storage location:
$, ^, +, ?, *, {, [, (, }, @, ", and '.
- Additional unsupported special characters in storage location creation from 2.3.0
- The following special characters are not supported while creating storage location:
\, ), :, ;, and >.
- Use valid schema, table, and column names to ensure query reliability
- Applies to: 2.2.1 and later
Avoid using leading or trailing spaces in schema, table, or column names when creating tables in the Query workspace. While the creation may succeed, these extra spaces can lead to issues during querying or interaction. To ensure smooth and reliable operation, always use clean names without extra spaces.
- Cross catalog schema creation anomaly in Presto
-
Applies to: 1.1.0 and later
An anomaly exists in schema creation for Hive and Iceberg catalogs managed by Presto. When using a common Hive Metastore Service for multiple catalogs (for example, an Iceberg catalog and a Hive catalog, or two Iceberg or Hive catalogs), creating a schema in one catalog might create it in the wrong catalog. This occurs if the location specified during schema creation belongs to a different catalog than intended.
Workaround: Always explicitly provide the correct storage path associated with the target catalog when using CREATE SCHEMA statements in Presto. This ensures the schema is created in the desired location.
- Creating schemas in the root path of Ceph Object Storage gives an error
-
Applies to: 1.1.0 and later
Due to a bug in IBM Storage Ceph 5/6 and Red Hat Ceph Storage 4/5/6, if you create a schema in the root path of a Ceph Object Storage in watsonx.data, the following error message is displayed.
Executing query failed with error: com.facebook.presto.spi.PrestoException: Failed to create schema. Check the credentials, permissions and storage path for the bucket. Make sure that the bucket is registered with wxd and retry.
Solution: You can upgrade your IBM Storage Ceph and Red Hat Ceph Storage to 7.0z1 and 7.1 versions respectively.
Workaround: If you are still using the older IBM Storage Ceph 5/6 and Red Hat Ceph Storage 4/5/6 versions, you must do the following:
When you create a schema in Ceph Object Storage, a pseudo-directory must be created prior to creating schema in watsonx.data.
Run the following commands to use the s5cmd S3 client to create a pseudo-directory and insert an empty file into it:
touch a
s5cmd --endpoint-url s3.ceph.example.com cp a s3://watsonx/mycatalog/myschema/
The copy command puts an empty file in the /mycatalog/myschema pseudo-directory.
Use the newly created pseudo-directory as the path for creating schema in watsonx.data web console.
- Hive does not support json data that starts with array
-
Applies to: 1.1.0 and later
Hive does not support json data that starts with array.
- Hive catalog table creation by using external_location fails due to wrong placement of file
-
Applies to: 1.1.0 and later
Hive catalog table creation by using external_location fails when the file is placed in the root of the bucket.
- Table creation fails if the column names differ only by spaces
-
Applies to: 1.1.0 and later
When you create a table from a data file by using the watsonx.data web console, the column names must be unique. Due to this limitation, if a CSV data file has column names that differ only by spaces (for example, Cash Flow per Share and CashFlowPerShare), these columns are considered to have the same name and table creation fails.
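A CSV header can be checked for such collisions before the file is uploaded. The following Python sketch groups column names that become identical once whitespace is removed. Comparing case-insensitively is an assumption, made because the two names in the example above also differ in letter case. The function name is hypothetical.

```python
from collections import defaultdict


def colliding_columns(headers: list[str]) -> list[list[str]]:
    """Group header names that are identical after removing whitespace.

    Case is ignored as well (assumption based on the documented example).
    """
    groups = defaultdict(list)
    for name in headers:
        key = "".join(name.split()).casefold()
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]


headers = ["Cash Flow per Share", "CashFlowPerShare", "Revenue"]
print(colliding_columns(headers))
```

Renaming one column in each reported group before import avoids the table creation failure.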
- Unable to delete data from columns with special characters in their names
-
Applies to: 2.1.1 and later
Data cannot be deleted from columns that have special characters in their names, because special characters are not supported in column names within the WHERE clause.
Limitations: Database and storage
- Transactions not supported in unlogged Informix databases
- LDAP authentication is not supported for Teradata connector
- Netezza® Performance Server INSERT statement limitation
- Unsupported Db2 operations
- Handling Null Values in Elasticsearch
- Loading Nested JSON with Elasticsearch
- Db2 does not support CREATE VIEW statement for a table from another catalog
- Netezza Performance Server does not support CREATE VIEW statement for a table from another catalog
- Failure to filter DATE, TIME, TIMESTAMP, and VARBINARY columns using WHERE clause in MongoDB connector
- Transactions not supported in unlogged Informix databases
-
Applies to: 1.1.4 and later
In watsonx.data, when attempting to execute queries with transactional implications on unlogged Informix databases, queries will fail. This is because unlogged Informix databases, by design, do not support transactions.
- LDAP authentication is not supported for Teradata connector
-
Applies to: 1.1.0 and later
The watsonx.data Teradata connector does not currently support LDAP (Lightweight Directory Access Protocol) for user authentication.
- Netezza Performance Server INSERT statement limitation
-
Applies to: 1.1.0 and later
Netezza Performance Server currently does not support inserting multiple rows directly into a table using the VALUES clause. This functionality is limited to single-row insertions. Refer to the official Netezza Performance Server documentation for details on the INSERT statement.
The following example using VALUES for multiple rows is not supported:
INSERT INTO EMPLOYEE VALUES (3,'Roy',45,'IT','CityB'),(2,'Joe',45,'IT','CityC');
Workaround: Use a subquery with SELECT and UNION ALL to construct a temporary result set and insert it into the target table.
INSERT INTO EMPLOYEE SELECT * FROM(SELECT 4,'Steve',35,'FIN','CityC' UNION ALL SELECT 5,'Paul',37,'OP','CityA') As temp;
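When the rows come from a program, the workaround statement can be generated instead of written by hand. The following Python sketch builds the INSERT ... SELECT ... UNION ALL form from a list of rows. The function names are hypothetical, and the literal quoting covers only numbers, strings, and NULL. It is an illustration of the statement shape, not a substitute for parameterized queries where the client supports them.

```python
def sql_literal(value) -> str:
    """Render a Python value as a SQL literal (numbers, strings, NULL only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def multi_row_insert(table: str, rows: list[tuple]) -> str:
    """Build one INSERT that adds several rows through SELECT ... UNION ALL."""
    if not rows:
        raise ValueError("at least one row is required")
    selects = [
        "SELECT " + ",".join(sql_literal(v) for v in row) for row in rows
    ]
    return f"INSERT INTO {table} SELECT * FROM ({' UNION ALL '.join(selects)}) AS temp;"


rows = [(4, "Steve", 35, "FIN", "CityC"), (5, "Paul", 37, "OP", "CityA")]
print(multi_row_insert("EMPLOYEE", rows))
```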
- Unsupported Db2 operations
-
Applies to: 1.1.0 and later
watsonx.data currently does not support the ALTER TABLE DROP COLUMN operation for Db2 column-organized tables.
Note: By default, Db2 instances create tables in column-organized format.
watsonx.data does not support creating row-organized tables in Db2.
- Handling Null Values in Elasticsearch
-
Applies to: 1.1.0 and later
Elasticsearch connector requires explicit definition of index mappings for fields to handle null values when loading data.
- Loading Nested JSON with Elasticsearch
-
Applies to: 1.1.0 and later
Elasticsearch connector requires users to explicitly specify nested JSON structures as arrays of type ROW for proper loading and querying. To process such structures, use the UNNEST operation.
- Db2 does not support CREATE VIEW statement for a table from another catalog
-
Applies to: 1.1.0 and later
For Db2, you can create the view for a table only if that table is in the same catalog and the same schema.
- Netezza Performance Server does not support CREATE VIEW statement for a table from another catalog
-
Applies to: 1.1.0 and later
For Netezza Performance Server, you can create the view for a table only if that table is in the same catalog and the same schema.
- Failure to filter DATE, TIME, TIMESTAMP, and VARBINARY columns using WHERE clause in MongoDB connector
- Applies to: 2.3.0 and later
When using the MongoDB connector in Presto, queries with a WHERE clause fail to return records when filtering on columns of the following data types:
- DATE
- TIME
- TIMESTAMP
- VARBINARY
ROW FILTER rely on the underlying WHERE clause for evaluation.
Limitations: Engine
- Back up your data to prevent data loss
- Presto (Java) needs precision of the DECIMAL column to be within a valid range
- Unable to create views in Presto
- HMS and Presto (Java) log level from Default Error level to Debug level
- External Spark engines are not migrated during upgrade
- Presto Federation read-only operations limitation
- Back up your data to prevent data loss while working with VS Code development environment - Spark Labs
- Applies to: 2.0.0 and later
As Spark labs are ephemeral in nature, you must back up the data stored periodically to prevent potential data loss during upgrades or a Spark master crash.
- Presto (Java) needs precision of the DECIMAL column to be within a valid range
- Applies to: 1.1.0 and later
Presto (Java) needs precision of the DECIMAL column in the PostgreSQL table creation statement to be within a valid range.
- Unable to create views in Presto
-
Applies to: 1.1.0 and later
Presto (Java) describes a view in a mapped database as a TABLE rather than a VIEW. This is apparent to a JDBC program that connects to the Presto (Java) engine.
- HMS and Presto (Java) log level from Default Error level to Debug level
-
Applies to: 1.1.0 and later
watsonx.data console does not support changing HMS and Presto (Java) log level from Default Error level to Debug level.
Workaround:
- Run the following curl command to change the log level inside the HMS pods:
curl -k -X POST 'https://localhost:8281/v1/hms/loglevel' -H 'Content-Type: application/json' -d '{"log-level": "DEBUG"}'
- Run the following curl command to change the log level inside the Presto pods:
curl --location 'https://<host>:8481/v1/lh_engine/change_configuration' \
  -k --header 'secret: $LH_INSTANCE_SECRET' \
  --header 'Content-Type: application/json' \
  --data '{ "type":"loglevel", "value":"info", "restart":true }'
- External Spark engines are not migrated during upgrade
-
Applies to: 2.3.1
External Spark engines that are created in watsonx.data 2.3.0 are not available after you upgrade to watsonx.data 2.3.1.
External Spark engine support is removed in watsonx.data 2.3.1. During the upgrade from 2.3.0 to 2.3.1, external Spark engines are not migrated and are not returned by the Spark Engine APIs. As a result, external Spark engines are not displayed in the console.
In fresh installations of watsonx.data 2.3.1, the external Spark engine option is not available.
- Presto Federation read-only operations limitation
-
Applies to: 2.4.0 and later
Presto Federation supports only read-only operations. Write operations and DDL statements are not available.
Presto Federation supports the following read-only operations:
- SELECT - Query data from federated data sources
- DESCRIBE - Display column information for tables
- SHOW - Display metadata (catalogs, schemas, tables, columns)
Other operations such as INSERT, UPDATE, DELETE, and DDL statements are not available through Presto Federation.
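A client can screen statements before sending them to Presto Federation. The following Python sketch checks only the leading keyword against the three documented read-only operations. It is a deliberately simple approximation: the function name is hypothetical, and statements that begin with a comment or a WITH clause are reported as unsupported even if Presto Federation might accept them.

```python
# Leading keywords of the documented read-only operations.
READ_ONLY_KEYWORDS = {"SELECT", "DESCRIBE", "SHOW"}


def is_federation_supported(statement: str) -> bool:
    """Return True when the statement starts with a documented read-only keyword."""
    words = statement.strip().split(None, 1)
    return bool(words) and words[0].upper() in READ_ONLY_KEYWORDS


print(is_federation_supported("select * from orders"))
print(is_federation_supported("INSERT INTO orders VALUES (1)"))
```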
Limitations: Presto (C++)
- Limitations - Presto (C++)
- Metastore admins and metastore viewers are unable to view the schema and table details
- Limitations - Presto (C++)
-
Applies to: 2.0.0 and later
- Presto (C++) engine currently does not support database catalogs.
- Parquet is the only supported file format.
- Hive connector is supported.
- Default Iceberg table has read only support for Parquet v1 format.
- TPC-H/TPC-DS queries are supported.
- DELETE FROM and CALL SQL statements are not supported.
- START, COMMIT, and ROLLBACK transactions are not supported.
- Data types:
- CHAR, TIME, and TIME WITH TIMEZONE are not supported. These data types are subsumed by VARCHAR, TIMESTAMP, and TIMESTAMP WITH TIMEZONE.
- IPADDRESS, IPPREFIX, UUID, kHYPERLOGLOG, P4HYPERLOGLOG, QDIGEST, and TDIGEST are not supported.
- VARCHAR supports only a limited length. Varchar(n) with a maximum length bound is not supported.
- TIME and TIME WITH TIMEZONE are supported in community development.
- TIMESTAMP columns in Parquet files cannot be read.
- Scalar functions:
- IPFunctions, QDigest, HyperLogLog, and Geospatial internationalization are not supported.
- Aggregate functions:
- QDigest, Classification metrics, and Differential entropy are not supported.
- S3 and S3 compatible file systems (both read and write) are supported.
- Metastore admins and metastore viewers are unable to view the schema and table details
- Applies to: 2.1.2 and later
A user with the metastore admin and metastore viewer privileges cannot view the schema and table details in the Query workspace and Data manager unless a view policy is defined for schemas and tables.
Limitations: SQL queries
- Incomplete information on column length in SHOW COLUMNS output
- Timestamp with timezone handling limitation in CREATE/ALTER TABLE
- Alter column is not supported for Hive and Iceberg catalogs
- Incomplete information on column length in SHOW COLUMNS output
-
Applies to: 1.1.0 and later
The SHOW COLUMNS query in Presto currently provides information about columns including name, data type, additional details (extra), and comments. This issue highlights that the existing functionality lacks details about the length of character-based data types (CHAR and VARCHAR). While some connectors return the actual length defined during table creation, others might provide a default value or no information at all.
To address this limitation, three new columns have been added to the SHOW COLUMNS output:
- Scale: Applicable to DECIMAL data type, indicating the number of digits after the decimal point.
- Precision: Applicable to numerical data types, specifying the total number of digits. (Default: 10)
- Length: Intended for CHAR and VARCHAR data types, representing the maximum number of characters allowed.
Current Limitations:
- The reported length in the "Length" column might not always reflect the actual size defined in the table schema due to connector limitations.
- Connectors that don't provide length information will display a default value or null depending upon the connector.
- Timestamp with timezone handling limitation in CREATE/ALTER TABLE
-
Applies to: 1.1.4 and later
Presto (Java) previously had a limitation where CREATE TABLE and ALTER TABLE statements incorrectly treated timestamps with timezones as simple timestamps. Since these are distinct data types, this could lead to errors. To address this issue, the functionality of mapping timestamps with timezones has been disabled.
Workaround: You need to modify CREATE TABLE and ALTER TABLE statements to use plain timestamps (without timezone information).
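For scripts that generate DDL, the rewrite described in the workaround can be sketched in Python as a plain text substitution. The function name is hypothetical. The pattern does not parse SQL, so it would also change matching text inside string literals or comments, and dropping the time zone changes what the column stores, so review each rewritten statement.

```python
import re

# Matches "TIMESTAMP WITH TIME ZONE" in any letter case and spacing.
_TIMESTAMP_TZ = re.compile(r"\btimestamp\s+with\s+time\s+zone\b", re.IGNORECASE)


def use_plain_timestamps(ddl: str) -> str:
    """Replace TIMESTAMP WITH TIME ZONE column types with plain TIMESTAMP."""
    return _TIMESTAMP_TZ.sub("TIMESTAMP", ddl)


ddl = "CREATE TABLE t (id INT, created TIMESTAMP WITH TIME ZONE)"
print(use_plain_timestamps(ddl))
```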
- Alter column is not supported for Hive and Iceberg catalogs
-
Applies to: 1.1.0 and later
ALTER TABLE operations that change a column's type to an incompatible type (for example, from STRING to MAP) are not supported for Hive and Iceberg catalogs.
Limitations: Others
- Unable to disable the QHMM feature
- Spark ingestion currently does not support special characters like quotation marks, back ticks, and parentheses for partitioned table column names.
- Cut or Copy icons are still enabled even after the action is performed
- No space left on device error occurs
- Using the S3 Select Pushdown option
- The default information_schema view of a catalog lists schemas and tables from other catalogs
- Issue with uppercase Turkish character İ in Oracle database using WE8ISO8859P9 character set (ORA-00911 Error)
- HDFS bucket addition is not supported via CPDCTL
- Absence of column NDV stats in Iceberg tables leads to suboptimal query plans
- Limitation of querying role-related information_schema table for tpcds or tpch connectors
- Directory creation behavior for external tables
- Unable to disable the QHMM feature
-
Applies to: 2.1
Administrators are unable to disable the QHMM feature when the Data Pruning feature is enabled in QHMM. The following error is encountered when you try to disable the QHMM feature while Data Pruning is enabled:
{"errors":null,"exception":"QHMM should be enabled to set prune configurations","message_code":"Bad Request","status_code":400}
Workaround: Disable the QHMM pruning configuration from the Query monitoring page before proceeding to disable QHMM.
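Automation that disables QHMM can recognize this specific failure and point the operator to the workaround. The following is a minimal sketch, assuming the response body is the JSON document quoted in this entry; the function name is_prune_conflict is hypothetical, and how the body is obtained is left to the caller.

```python
import json

# Exception text copied from the error quoted in this entry.
PRUNE_ERROR = "QHMM should be enabled to set prune configurations"


def is_prune_conflict(response_body: str) -> bool:
    """Return True if the body is the 400 error caused by enabled pruning.

    Hypothetical helper for illustration only; it inspects the
    "status_code" and "exception" fields of the JSON error.
    """
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status_code") == 400 and payload.get("exception") == PRUNE_ERROR


if __name__ == "__main__":
    body = (
        '{"errors":null,"exception":"QHMM should be enabled to set prune '
        'configurations","message_code":"Bad Request","status_code":400}'
    )
    if is_prune_conflict(body):
        print("Disable the QHMM pruning configuration first, then disable QHMM.")
```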
- Spark ingestion currently does not support special characters like quotation marks, back ticks, and parentheses for partitioned table column names.
-
Applies to: 2.0.1 only
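Partition column names can be checked before an ingestion job is submitted. The following is a minimal sketch that covers only the characters named in this limitation (quotation marks, backticks, and parentheses); the limitation is worded as "like", so other characters might also be affected. The function name unsupported_partition_columns is hypothetical, and treating both single and double quotation marks as unsupported is an assumption.

```python
# Characters named in the limitation. Treating both single and double
# quotation marks as unsupported is an assumption.
UNSUPPORTED_CHARS = set("\"'`()")


def unsupported_partition_columns(columns):
    """Return the column names that contain any character in UNSUPPORTED_CHARS.

    Hypothetical helper for illustration only; the input order is preserved.
    """
    return [name for name in columns if UNSUPPORTED_CHARS & set(name)]


if __name__ == "__main__":
    print(unsupported_partition_columns(["region", "sales(usd)", "`day`"]))
```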
- Cut or Copy icons are still enabled even after the action is performed
-
Applies to: 1.1.0 and later
When you select text in the Query workspace, the Cut and Copy icons are enabled. The icons remain enabled after you perform the action, although they should be disabled when no text is selected after the action is completed.
Workaround: In the Advanced preferences option of the Firefox browser, set the following Clipboard settings to true:
- dom.event.clipboardevents.enabled
- dom.event.asyncClipboard.clipboardItem
- dom.event.asyncClipboard.readText
- dom.event.testing.asyncClipboard
- No space left on device error occurs
-
Applies to: 1.1.0 and later
When you run queries on Presto (Java) while caching is enabled, a No space left on device error is displayed.
Workaround: To resolve this error, go to the cache directory and delete all entries in it.
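The cleanup step can be scripted. The following is a minimal sketch, assuming it runs on the host or container that holds the cache directory and that the caller supplies the path; this entry does not state the location of the cache directory, so none is hard-coded. The function name clear_cache_dir is hypothetical.

```python
import shutil
from pathlib import Path


def clear_cache_dir(cache_dir: str) -> int:
    """Delete every entry inside cache_dir and return how many were removed.

    Hypothetical helper for illustration only. The directory itself is kept;
    subdirectories are removed recursively, and files and symbolic links are
    unlinked.
    """
    removed = 0
    for entry in Path(cache_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```

The function deletes data permanently, so confirm the path before calling it.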
- Using the S3 Select Pushdown option
-
Applies to: 1.1.0 and later
The S3 Select Pushdown option allows you to filter the data at the source and retrieve just the subset of data that you need. In watsonx.data, this option is disabled by default. You can enable the S3 Select Pushdown option (s3_select_pushdown_enabled) by using the API. Currently, the S3 Select Pushdown option is supported only on IBM Storage Ceph and Amazon Web Services (AWS).
- You can select only non-database catalogs
-
Applies to: 1.1.2 and later
When integrating with IBM Knowledge Catalog, you can select only non-database catalogs. Database catalogs are not supported in watsonx.data.
For more information, see Integrating with IBM Knowledge Catalog.
- The default information_schema view of a catalog lists schemas and tables from other catalogs
- Applies to: 2.1.0 and later
If a user has more than one catalog, the default information_schema view displays the schemas and tables from other catalogs as well, regardless of the catalogs associated with the engine.
- Issue with uppercase Turkish character İ in Oracle database using WE8ISO8859P9 character set (ORA-00911 Error)
- HDFS bucket addition is not supported via CPDCTL
- Applies to: 2.1.1 and later
Adding HDFS buckets is currently not supported by the cpdctl wx-data plugin.
- Absence of column NDV stats in Iceberg tables leads to suboptimal query plans
- Applies to: 2.1.1 and later
In the current implementation, for Iceberg tables within Presto (Java) and Presto (C++), the column NDV (Number of Distinct Values) statistics are not used when available in MDS. NDVs are important to generating optimal query plans. Without them, there can be significant performance degradation.
Workaround: For non-partitioned tables, use SET SESSION <iceberg_catalog>.hive_statistics_merge_strategy='USE_NULLS_FRACTION_AND_NDV';
Note: This workaround does not apply to partitioned tables.
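When the session property is set from a script, the statement can be built from the catalog name. The following is a minimal sketch; the function name ndv_session_statement and the example catalog name iceberg_data are hypothetical, and the identifier check is a conservative assumption rather than a documented Presto rule. The returned text omits the trailing semicolon so that it can be passed to a client that expects a single statement.

```python
import re

# Conservative pattern for an unquoted identifier (an assumption, not a
# documented Presto rule).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def ndv_session_statement(iceberg_catalog: str) -> str:
    """Build the SET SESSION statement from the workaround for one catalog.

    Hypothetical helper for illustration only; raises ValueError if the
    catalog name is not a simple identifier.
    """
    if not _IDENTIFIER.fullmatch(iceberg_catalog):
        raise ValueError(f"unexpected catalog name: {iceberg_catalog!r}")
    return (
        f"SET SESSION {iceberg_catalog}.hive_statistics_merge_strategy"
        "='USE_NULLS_FRACTION_AND_NDV'"
    )


if __name__ == "__main__":
    print(ndv_session_statement("iceberg_data"))
```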
- Limitation of querying role-related information_schema table for tpcds or tpch connectors
- Applies to: 2.2.1 and later
Users encounter an error when querying role-related information_schema tables for tpcds or tpch connectors. This behavior is intentional and expected for these connectors in Presto, as tpcds and tpch are benchmarking connectors that do not support role-based security features.
Workaround: To prevent errors, avoid querying role-related information_schema tables (such as applicable_roles, enabled_roles, and roles) for tpcds or tpch connectors.
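Tools that enumerate information_schema tables across catalogs can skip the affected combinations. The following is a minimal sketch, assuming the catalogs are named after their connectors (tpcds and tpch) and that the three tables listed in the workaround are the ones to avoid; the function name is_unsupported_role_query is hypothetical.

```python
# Assumption: the benchmarking catalogs carry the same names as their connectors.
BENCHMARK_CATALOGS = {"tpcds", "tpch"}

# Role-related tables listed in the workaround.
ROLE_TABLES = {"applicable_roles", "enabled_roles", "roles"}


def is_unsupported_role_query(catalog: str, schema: str, table: str) -> bool:
    """Return True if the table is a role-related information_schema table
    in a tpcds or tpch catalog.

    Hypothetical helper for illustration only; names are compared without
    regard to letter case.
    """
    return (
        catalog.lower() in BENCHMARK_CATALOGS
        and schema.lower() == "information_schema"
        and table.lower() in ROLE_TABLES
    )


if __name__ == "__main__":
    print(is_unsupported_role_query("tpch", "information_schema", "roles"))
```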
- Directory creation behavior for external tables
- Applies to: 2.3.0 and later
MDS no longer creates directories automatically during CREATE SCHEMA or CREATE TABLE operations for most storage types. Users must ensure that the directory specified in the external_location property already exists before running a CREATE TABLE query.