Known issues and limitations
Known issues: Access
- User is still visible in the Access control page of an engine after removing the user from the CPD platform.
- User cannot access tables created in the COS bucket due to a missing metadata error
- RBAC management for buckets with non-compliant names
- Access is denied when querying an external database
- Assigning Grant or Revoke privilege
- Only table creator has DROP access in Apache Hive (API)
- User-provided certificates are not supported by watsonx.data
- Bucket admin or creator cannot update the bucket and catalog role
- Users with Reader or User role granted Implicit permission by default
- User is still visible in the Access control page of an engine after removing the user from the CPD platform.
- Applies to: 1.1.3 and later
- User cannot access tables that are created in the COS bucket due to a missing metadata error
- Applies to: 1.1.2 and later
- RBAC management for buckets with non-compliant names
- Applies to: 1.1.0 and later
Users attempting to manage role-based access control (RBAC) for buckets with names that deviate from the regular expression (^[a-zA-Z0-9-_]+$) will encounter an error. Bucket names must consist of only lowercase and uppercase letters, numbers, underscores, and hyphens. Any character outside this set prevents RBAC management from working as expected.
- Access is denied when querying an external database
- Applies to: 1.1.0 and later
When a user with the User role and only Create access is added to an external database, they cannot run a SELECT query on a table they have created. Though the user can connect to the Presto engine and create tables and schemas, they cannot query the table. The system displays an Access Denied message:
Query 20230608_132213_00042_wpmk2 failed: Access Denied: Cannot select from columns [id] in table or view tab_appiduser_01
Workaround: Provide SELECT privilege for the table that the user created.
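For example, an administrator or the table owner could grant the privilege with a statement like the following sketch (the catalog, schema, and grantee names are placeholders; only the table name comes from the error above):
-- Illustrative sketch: grant SELECT on the table that the user created
GRANT SELECT ON externaldb.appschema.tab_appiduser_01 TO "appiduser_01";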
- Assigning Grant or Revoke privilege
- Applies to: 1.1.0 and later
Assigning Grant or Revoke privilege to a user through access policy does not work as expected in the following scenarios:
- User_A adds a bucket and a Hive catalog (for example, useracat02).
- User_A creates a schema and a table.
- User_B and User_C are assigned the User role for the catalog.
- User_A adds an allow grant policy for User_B.
- User_B connects to the catalog and runs grant select to User_C:
presto:default> grant select on useracat02.schema_test_01.tab_1 to "6ff74bf7-b71b-42f2-88d9-a98fdbaed304";
- When User_C connects to the catalog and runs a select command on the table, the command fails with an access denied message:
presto:default> select * from useracat02.schema_test_01.tab_1;
Query 20230612_073938_00132_hthnz failed: Access Denied: Cannot select from columns [name, id, salary, age] in table or view tab_1
- Only table creator has DROP access in Apache Hive (API)
- Applies to: 1.1.0 and later
Only the creator of a table can drop a table that is created in the Apache Hive catalog. Other users cannot drop the table even if they have explicit DROP access to the table. They get the Access Denied message.
- User-provided certificates are not supported by watsonx.data
- Applies to: 1.1.0 and later
User-provided certificates are not supported in watsonx.data when adding database connections or object store buckets, or when using the ibm-lh utility.
- Bucket admin or creator cannot update the bucket and catalog role
- Applies to: 1.1.1
- Users with Reader or User role granted Implicit permission by default
- Applies to: 1.1.0
Known issues: Catalog, schema, and tables
- Catalog property customization not applied to new Hive buckets
- Timeout error when creating schema in AFM bucket
- Creating a schema without a location
- Creating schema location with path
- Enabling Amazon S3 bucket makes catalogs inactive for 15 minutes
- Unable to view created schema
- Unique names for schema and bucket
- Creating schema for target table
- Catalog property customization not applied to new Hive buckets
- Applies to: 1.1.3 and later
- Timeout error when creating schema in AFM bucket
- Applies to: 1.1.1 and later
- Creating a schema without a location
- Applies to: 1.1.0 and later
When you create a schema without a location, it is not listed in the schema list of any catalog.
For example, if you create a schema without specifying the location of the bucket, the schema is created in HMS and not in the bucket. When you try to create a new schema with the same name, it fails and responds that the schema already exists.
Workaround: Specify the location of the bucket when creating a schema.
- Creating schema location with path
- Applies to: 1.1.0 and later
Use one of the following location options when creating a schema:
- Location pointing to a bucket/subpath without a trailing /.
- Location pointing to a bucket/subpath with a trailing / – Recommended for better structuring.
Note: Though you can use a location pointing to a bucket only, with or without a trailing /, it might lead to failure. Therefore, it is recommended to use a subpath. The subpath must be the same as the schema name. For example, to create a schema test1, specify the location as location='s3a://iceberg-bucket/test1'.
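A minimal Presto sketch of such a statement, assuming an Iceberg catalog named iceberg_data (the catalog name is an assumption; the bucket and schema names come from the example above):
-- Illustrative sketch: the schema name test1 matches the bucket subpath
CREATE SCHEMA iceberg_data.test1 WITH (location = 's3a://iceberg-bucket/test1');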
- Enabling Amazon S3 bucket makes catalogs inactive for 15 minutes
- Applies to: 1.1.0 and later
After enabling the Amazon S3 bucket type, you need to wait for 15 minutes before you can use the catalogs. You might get the following error:
Failed to create external path s3a://bucket/schema name for database database name. This may result in access not being allowed if the StorageBasedAuthorizationProvider is enabled: null
Workaround: If the error persists, restart the HMS pod:
oc get po -n cpd-instance |grep ibm-lh-lakehouse-hive-metastore |awk '{print $1}' |xargs oc delete po -n cpd-instance
- Unable to view created schema
- Applies to: 1.1.0 and later
When a user with the User role and only Create access is added to an external database, they cannot see the schemas that they created. Though the user can create schemas, they cannot view them. The following is the system response:
presto:default> show schemas;
 Schema
--------
(0 rows)
Workaround: Provide SELECT privilege for the schema that the user created.
- Unique names for schema and bucket
- Applies to: 1.1.0 and later
A schema and a bucket cannot be created with the same name.
For example, if you create a schema that is named “sales” in one catalog, the same name cannot be used for another schema in another catalog. Similarly, if you register a bucket with the name “salesbucket”, another bucket with the same name cannot be registered, even if the bucket is located in a different object store.
Workaround: Use unique names when creating schemas and buckets.
- Creating schema for target table
- Applies to: 1.1.0 and later
Fixed in: 1.1.3
Create a schema for the target table if the schema does not exist.
Known issues: Database and connectors
- Presto engine crashes when databases that are not supported are configured through the custom database feature in watsonx.data
- SAP HANA connector is not accessible in FIPS enabled cluster
- Presto engine crashes when databases that are not supported are configured through the custom database feature in watsonx.data
- Applies to: 1.1.0 and later
Fixed in: 1.1.4
When databases that are not supported by Presto are configured through the custom database feature in watsonx.data, the Presto engine crashes. This leads to failure of the readiness/liveness checks and subsequently the pod enters CrashLoopBackOff.
This failure is fixed by handling the No factory for connector exception. With the fix, the engine does not fail and continues to function with the existing connectors. Also, the unsupported connector that is added is not available in the engine. However, the connector is listed in the watsonx.data UI.
- SAP HANA connector is not accessible in FIPS enabled cluster
- Applies to: 1.1.3
Fixed in: 1.1.4
The SAP HANA connector relies on a Bring Your Own JAR (BYOJ) process, where users upload their own JDBC driver (the ngdbc-2.17.12.jar file) to connect to SAP HANA databases. As part of the security process, uploaded JAR files are scanned with the ClamAV antivirus. However, ClamAV scanning currently fails in environments that are configured with Federal Information Processing Standard (FIPS) mode enabled. Users in FIPS-enabled environments are unable to establish connections to SAP HANA databases through the SAP HANA connector.
Known issues: Engine
- Associating SAP HANA with Presto engine gives an error
- Columns of type numeric array are left out while starting DatabaseMetadata.getColumns()
- The default deployment type is singlenode
- Issue with prestodb module in ibm-lh-client
- Presto does not recognize the path as a directory
- Presto JDBC driver returns incorrect values of date and timestamp
- Some Presto engines are not visible in the web console after upgrading to 1.1.2
- Adding a CA certificate for a second engine with the cert-mgmt utility fails
- Running CTAS statements with a large source table might fail or restart the Presto server
- A persistent java.lang.NullPointerException error occurs
- Associating SAP HANA with Presto engine gives an error
- Applies to: 1.1.1 and later
- Columns of type numeric array are left out while starting DatabaseMetadata.getColumns()
- Applies to: 1.1.0 and later
When you start DatabaseMetadata.getColumns() using the Presto JDBC driver, columns of type numeric array are left out.
- The default deployment type is singlenode
- Applies to: 1.1.0 and later
When an engine is created, regardless of the configuration mode that is selected, the deployed engine is always singlenode and small size.
Workaround: To customize the engine to multinode and different resource configurations, refer to Specifying additional customization for watsonx.data.
- Issue with prestodb module in ibm-lh-client
- Applies to: 1.1.0 and later
Due to an issue with the prestodb module in ibm-lh-client, you must complete the following steps to connect to a Python client when using ibm-lh-client:
- Start the sandbox container for the registered Presto engine:
ibm-lh-client/bin/dev-sandbox --engine=demo-b
- In the bash prompt, install the prestodb module:
export HOME=/tmp
pip3 install SQLAlchemy 'pyhive[presto]' presto-python-client
- Presto does not recognize the path as a directory
- Applies to: 1.1.0 and later
When you create a new table with a Presto Hive connector that uses an S3 folder from an external location, Presto does not recognize the path as a directory and an error might occur.
For example, when creating a customer table in the target directory DBCERT/tbint in a bucket that is called dqmdbcertpq by using the IBM Cloud UX and Aspera S3 console, the following error is encountered: External location must be a directory.
CREATE TABLE "hive-beta"."dbcert"."tbint" ( RNUM int , CBINT bigint ) WITH ( format='PARQUET', external_location = 's3a://dqmdbcertpq/DBCERT/tbint' );
Query 20230509_113537_00355_cn58z failed: External location must be a directory
Objects in a file system are stored as objects along with their paths. The object and path must have associated metadata. If the path is not associated with the metadata, Presto fails to recognize the object and responds that the path is not a directory.
- Presto JDBC driver returns incorrect values of date and timestamp
- Applies to: 1.1.0 and later
Date and timestamp string literals before 1900-01-01 return incorrect values.
- Some Presto engines are not visible in the web console after upgrading to 1.1.2
- Applies to: 1.1.2
- Adding a CA certificate for a second engine with the cert-mgmt utility fails
- Applies to: 1.1.0 and later
- Running CTAS statements with a large source table might fail or restart the Presto server
- Applies to: 1.1.0
- A persistent java.lang.NullPointerException error occurs
- Applies to: 1.1.0 and later
Known issues: Ingestion
- Unable to submit Spark application with external Spark engine where the application script or binary resides in an object storage bucket
- Has header checkbox cannot be cleared when ingesting a CSV file
- Token expired error using Spark ingestion through web console
- Ingestion not supported using external Spark
- Ingestion fails if CSV file contains bad record
- No columns to parse from file error
- Spark ingestion through UI is not possible without Presto engine permission
- Special characters in target table names can cause ingestion failures
- The command CREATE TABLE AS SELECT (CTAS) fails in Db2® when attempted on ingested data tables in Iceberg
- Spark CLI ingestion does not load data when using --create-if-not-exist
- Parquet files that are ingested through the web console generate null values in the target table
- Incorrect result with timestamp columns after ingesting data from CSV files
- Spark CLI ingestion removes existing data when using --create-if-not-exist
- The staging folder is not dropped when ingestion is interrupted
- Unable to submit Spark application with external Spark engine where the application script or binary resides in an object storage bucket
- Applies to: 1.1.4 and later
- Has header checkbox cannot be cleared when ingesting a CSV file
- Applies to: 1.1.1 and later
- Token expired error while using Spark ingestion through web console
- Applies to: 1.1.2 and later
- Ingestion is not supported by using external Spark
- Though an external Spark engine (fully managed or self-managed) can be added to watsonx.data, an ingestion job is not possible by using that Spark engine. Ingestion is supported only by a colocated Spark engine.
- Ingestion fails if CSV file contains bad record
- Applies to: 1.1.0 and later
The ibm-lh tool does not support skipping maximum bad records for CSV files if the mismatch field is greater than the table definition.
- No columns to parse from file error
- Applies to: 1.1.0 and later
When you try to ingest a folder from AWS S3 by using the ibm-lh tool, the following error might be encountered if there are 0-sized empty files in the folder:
No columns to parse from file
- Spark ingestion through UI is not possible without Presto engine permission
- Applies to: 1.1.0 and later
- Special characters in target table names can cause ingestion failures
- Ingestion fails if a target table name contains special characters when ingesting through the web console.
- The command CREATE TABLE AS SELECT (CTAS) fails in Db2 when attempted on ingested data tables in Iceberg
- Applies to: 1.1.0 and later
Some versions of Db2 require an explicit length specification for VARCHAR columns. This requirement causes the command CREATE TABLE AS SELECT (CTAS) to fail in Db2 when attempted on ingested data tables in Iceberg.
Workaround: Change the SQL statement from VARCHAR to VARCHAR(20).
Example:
create table "db2"."testgaissue"."testga" as ( select cast(checkingstatus as varchar(100)) as checkingstatus, loanduration, cast(credithistory as varchar(100)) as credithistory, cast(loanpurpose as varchar(100)) as loanpurpose, loanamount, cast(existingsavings as varchar(100)) as existingsavings, cast(employmentduration as varchar(100)) as employmentduration, installmentpercent, cast(sex as varchar(100)) as sex, cast(othersonloan as varchar(100)) as othersonloan, currentresidenceduration, cast(ownsproperty as varchar(100)) as ownsproperty, age, cast(installmentplans as varchar(100)) as installmentplans, cast(housing as varchar(100)) as housing, existingcreditscount, cast(job as varchar(100)) as job, dependents, cast(telephone as varchar(100)) as telephone, cast(foreignworker as varchar(100)) as foreignworker, cast(risk as varchar(100)) as risk from "iceberg_data"."project"."testga" );
- Spark CLI ingestion does not load data when using --create-if-not-exist
- Applies to: 1.1.2
- Parquet files that are ingested through the web console generate null values in the target table
- Applies to: 1.1.0 and later
- Incorrect result with timestamp columns after ingesting data from CSV files
- Applies to: 1.1.1
- Spark CLI ingestion removes existing data when using --create-if-not-exist
- Applies to: 1.1.1
- The staging folder is not dropped when ingestion is interrupted
- Applies to: 1.1.0
Fixed in: 1.1.1
When you run an ingestion job by using a staging folder, the staging folder is dropped when ingestion completes successfully or when ingestion fails due to an exception error.
However, the staging folder is not dropped if ingestion is interrupted or forcefully terminated by pressing Ctrl+C.
Workaround: Delete the staging folder manually.
Known issues: Installation and upgrade
- Installation path directory with space
- Upgrade to watsonx.data 1.1.1 fails when there is only one presto worker pod
- Installation path directory with space
- Applies to: 1.1.0 and later
When you run the setup.sh script for the watsonx.data Developer version, if the installation path has a directory that contains spaces, an error might occur.
For example, if the installation path is:
/Users/john/documents/userdata/Hybrid Data Management/Lakehouse/ibm/lh-dev/bin
the error might be similar to the following message:
./setup.sh: line 19: /Users/john/documents/userdata/Hybrid: no such file or directory
Workaround: Install the watsonx.data Developer version into a directory path that contains no spaces.
- Upgrade to watsonx.data 1.1.1 fails when there is only one presto worker pod
- Applies to: 1.1.1
Known issues: Milvus
- Milvus unresponsive to queries
- Inaccurate row count after deletions in Milvus
- Potential data loss during batch insert of large data collection in Milvus
- Milvus service cannot be deleted using the delete icon from the user interface (UI) of watsonx.data developer edition in 1.1.3 version
- Milvus collections missing after upgrading watsonx.data developer edition from 1.1.3 to 1.1.4
- Milvus unresponsive to queries
- Applies to: 1.1.3 and later
Milvus may not respond to queries when attempting to load collections or partitions that exceed available memory capacity. This occurs because all search and query operations within Milvus are executed in memory, requiring the entire collection or partition to be loaded before querying.
Workaround:
- Consider the memory limitations of your Milvus deployment and avoid loading excessively large collections or partitions.
- If Milvus becomes unresponsive to queries, employ the appropriate Milvus API to unload or release some collections from memory. An example using the Python SDK: collection.release().
- Inaccurate row count after deletions in Milvus
- Applies to: 1.1.3 and later
The collection.num_entities property might not reflect the actual number of rows in a Milvus collection after deletion operations. This property provides an estimate and may not account for deleted entities.
To get an accurate count of rows, execute a count(*) query on the collection. This provides an accurate count even after deletions.
Pymilvus syntax:
collection = pymilvus.Collection(...)
collection.query(expr='', output_fields=['count(*)'])
- Potential data loss during batch insert of large data collection in Milvus
- Applies to: 1.1.3 and later
Potential data loss may occur when inserting a large dataset (5 million vectors) through the Milvus batch insert API with a single final flush. A subset of rows might be missing from the ingested data.
Workaround:
- Flush the collection manually every 500,000 rows.
- Use the bulk insert API for data ingestion; see Insert Entities from Files. This is the recommended way to ingest large data sets.
- Milvus service cannot be deleted using the delete icon from the user interface (UI) of watsonx.data developer edition in 1.1.3 version
- Applies to: 1.1.3
- Milvus collections missing after upgrading watsonx.data developer edition from 1.1.3 to 1.1.4
- Applies to: 1.1.3
Fixed in: 1.1.4
When you upgrade watsonx.data developer edition from 1.1.3 to 1.1.4, previously created Milvus collections in 1.1.3 may become unavailable or inaccessible.
Workaround: Complete the following steps before loading data into the Milvus service in 1.1.3:
- Edit the ibm-lh-dev/etc/ibm-lh-etcd.conf file and add the following line:
mnt_dir=/etcd
- Restart the Milvus and etcd services.
Known issues: SQL queries
- Presto queries with many columns and size exceeding default limit
- An unexpected error in parquet metadata reading occurs when running the queries on partitioned data
- Unrestricted access to SQL statements in worksheets
- DROP TABLE command on an Iceberg table does not remove folder and files from object storage
- Unable to query SSL enabled Db2 instance with certificate uploaded in a FIPS cluster
- Trailing spaces in WHERE clause values
clause values - Saving worksheets in the SQL editor fails after upgrading to watsonx.data 1.1.1
- Presto queries with many columns and size exceeding default limit
- Applies to: 1.1.0 and later
- An unexpected error in parquet metadata reading occurs when running the queries on partitioned data
- Applies to: 1.1.1
- Unrestricted access to SQL statements in worksheets
- Applies to: 1.1.0 and later
- DROP TABLE command on an Iceberg table does not remove folder and files from object storage
- Applies to: 1.1.0 and later
- Unable to query SSL enabled Db2 instance with certificate uploaded in a FIPS cluster
- Applies to: 1.1.3
Fixed in: 1.1.4
- Trailing spaces in WHERE clause values
- Applies to: 1.1.3 and later
Fixed in: 1.1.4
- Saving worksheets in the SQL editor fails after upgrading to watsonx.data 1.1.1
- Applies to: 1.1.1
Known issues: Web console
- Unique character handling in upload file feature
- Applies to: 1.1.0 and later
- Test connection with SSL enabled is not supported
- Applies to: 1.1.0 and later
Known issues: Others
- Data that is imported from watsonx.data bucket fails
- A java.lang.UnsupportedOperationException error occurs when selecting a partition table from S3 bucket
- Integrating watsonx.data with IBM Knowledge Catalog is not supported in version 1.1.1
- Data that is imported from watsonx.data bucket fails
- Applies to: 1.1.0 and later
When you run the import script (import-bucket-data.sh) to restore the watsonx.data bucket data, the system displays an error that the container ibm-lh-lakehouse-minio is not found:
+ oc exec -t ibm-lh-lakehouse-minio-7db7c6788f-g4r8 -n cpd-instance -- bash -c 'mc alias set ibm-lh http://ibm-lh-lakehouse-minio-svc.cpd-instance.svc.cluster.local:9000 <access-key> <secret_key> \ --config-dir=/tmp/.mc \ --insecure && mc alias set ibm-lh_backup https://s3.us-west-2.amazonaws.com/ <access-key> <secret_key> \ --config-dir=/tmp/.mc --insecure'
error: unable to upgrade connection: container not found ("ibm-lh-lakehouse-minio")
Workaround: Delete the running MinIO pod by using the following command and rerun the import script:
oc delete -n $CPD_NAMESPACE $(oc get rs -o name -n $CPD_NAMESPACE | grep "ibm-lh-lakehouse-minio")
- A java.lang.UnsupportedOperationException error occurs when selecting a partition table from S3 bucket
- Applies to: 1.1.1 and later
- Integrating watsonx.data with IBM Knowledge Catalog is not supported in version 1.1.1
- Applies to: 1.1.1
Fixed in: 1.1.2
Workaround: Upgrade to watsonx.data 1.1.2 version to integrate IBM Knowledge Catalog with watsonx.data.
Limitations: Access
- User access control is not supported for fully managed and self managed Spark engines
- Add external MinIO bucket to allowlist to establish connection from air-gapped watsonx.data cluster.
- User access control is not supported for fully managed and self managed Spark engines
- Applies to: 1.1.2 and later
The Access control tab is not supported for fully managed or self-managed Spark engines. Administrators cannot carry out access control operations for fully managed or self-managed Spark engines.
- Add MinIO bucket to allowlist to establish connection with watsonx.data
- Applies to: 1.1.2 and later
Limitations: Catalog, schema, and tables
- Cross catalog schema creation anomaly in Presto
- Creating schemas in the root path of Ceph Object Storage gives an error
- Hive does not support json data that starts with an array
- Hive catalog table creation by using external_location fails due to wrong placement of file
- Table creation fails if the column names differ only by spaces
- Using special characters in schema, table, or column names
- Cross catalog schema creation anomaly in Presto
- Applies to: 1.1.0 and later
An anomaly exists in schema creation for Hive and Iceberg catalogs managed by Presto. When using a common Hive Metastore Service for multiple catalogs (for example, an Iceberg catalog and a Hive catalog, or two Iceberg or Hive catalogs), creating a schema in one catalog might create it in the wrong catalog. This occurs if the location specified during schema creation belongs to a different catalog than intended.
Workaround: You must always explicitly provide the correct storage path associated with the target catalog when using CREATE SCHEMA statements in Presto. This ensures the schema is created in the desired location.
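For instance, a minimal sketch that names each target catalog and its own storage path explicitly (the catalog, schema, and bucket names are assumptions):
-- Illustrative sketch: each schema is created with the storage path of its own catalog
CREATE SCHEMA iceberg_catalog.sales WITH (location = 's3a://iceberg-bucket/sales');
CREATE SCHEMA hive_catalog.sales_hive WITH (location = 's3a://hive-bucket/sales_hive');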
- Creating schemas in the root path of Ceph Object Storage gives an error
- Applies to: 1.1.0 and later
Due to a bug in IBM Storage Ceph 5/6 and Red Hat Ceph Storage 4/5/6, if you create a schema in the root path of a Ceph Object Storage bucket in watsonx.data, the following error message is displayed:
Executing query failed with error: com.facebook.presto.spi.PrestoException: Failed to create schema. Check the credentials, permissions and storage path for the bucket. Make sure that the bucket is registered with wxd and retry.
Solution: Upgrade IBM Storage Ceph and Red Hat Ceph Storage to versions 7.0z1 and 7.1 respectively.
Workaround: If you are still using the older IBM Storage Ceph 5/6 and Red Hat Ceph Storage 4/5/6 versions, you must do the following:
When you create a schema in Ceph Object Storage, a pseudo-directory must be created prior to creating the schema in watsonx.data.
Run the following commands to use the s5cmd S3 client to create a pseudo-directory and insert an empty file into it:
touch a
s5cmd --endpoint-url s3.ceph.example.com cp a s3://watsonx/mycatalog/myschema/
The copy command puts an empty file in the /mycatalog/myschema pseudo-directory.
Use the newly created pseudo-directory as the path when creating the schema in the watsonx.data web console.
- Hive does not support json data that starts with an array
- Applies to: 1.1.0 and later
Hive does not support json data that starts with an array.
- Hive catalog table creation by using external_location fails due to wrong placement of file
- Applies to: 1.1.0 and later
Hive catalog table creation by using external_location fails when the file is placed in the root of the bucket.
- Table creation fails if the column names differ only by spaces
- Applies to: 1.1.0 and later
When you create a table from a data file by using the watsonx.data web console, the column names must be unique. Due to this limitation, if a CSV data file has column names that differ only by spaces, for example, Cash Flow per Share and CashFlowPerShare, then these columns are considered to have the same names and table creation fails.
- Using special characters in schema, table, or column names
- Applies to: 1.1.0 and later
It is recommended not to use special characters such as a question mark (?) or an asterisk (*) in table, column, and schema names. Though these special characters are supported and tables, columns, and schemas can be created, using these special characters might cause issues when running the INSERT command.
Limitations: Database and connectors
- Redshift connector case sensitivity
- Transactions not supported in unlogged Informix databases
- LDAP authentication is not supported for Teradata connector
- Netezza® Performance Server INSERT statement limitation
- Unsupported Db2 operations
- Handling Null Values in Elasticsearch
- Loading Nested JSON with Elasticsearch
- Db2 does not support CREATE VIEW statement for a table from another catalog
- Netezza Performance Server does not support CREATE VIEW statement for a table from another catalog
- Redshift connector case sensitivity
- Applies to: 1.1.4 and later
The Redshift connector may not handle mixed-case database, table, and column names if the Redshift cluster configuration enable_case_sensitive_identifier is set to false (the default). When this configuration is false, Redshift treats all identifiers as lowercase.
When the Redshift cluster configuration enable_case_sensitive_identifier is set to true, mixed-case identifiers work.
- Transactions not supported in unlogged Informix databases
- Applies to: 1.1.4 and later
In watsonx.data, when attempting to execute queries with transactional implications on unlogged Informix databases, queries will fail. This is because unlogged Informix databases, by design, do not support transactions.
- LDAP authentication is not supported for Teradata connector
- Applies to: 1.1.0 and later
The watsonx.data Teradata connector does not currently support LDAP (Lightweight Directory Access Protocol) for user authentication.
- Netezza Performance Server INSERT statement limitation
- Applies to: 1.1.0 and later
Netezza Performance Server currently does not support inserting multiple rows directly into a table by using the VALUES clause. This functionality is limited to single-row insertions. Refer to the official Netezza Performance Server documentation for details on the INSERT statement.
The following example using VALUES for multiple rows is not supported:
INSERT INTO EMPLOYEE VALUES (3,'Roy',45,'IT','CityB'),(2,'Joe',45,'IT','CityC');
Workaround: Use a subquery with SELECT and UNION ALL to construct a temporary result set and insert it into the target table:
INSERT INTO EMPLOYEE SELECT * FROM(SELECT 4,'Steve',35,'FIN','CityC' UNION ALL SELECT 5,'Paul',37,'OP','CityA') As temp;
- Unsupported Db2 operations
- Applies to: 1.1.0 and later
watsonx.data currently does not support the ALTER TABLE DROP COLUMN operation for Db2 column-organized tables.
Note: By default, Db2 instances create tables in column-organized format.
watsonx.data does not support creating row-organized tables in Db2.
- Handling Null Values in Elasticsearch
- Applies to: 1.1.0 and later
The Elasticsearch connector requires explicit definition of index mappings for fields to handle null values when loading data.
- Loading Nested JSON with Elasticsearch
- Applies to: 1.1.0 and later
The Elasticsearch connector requires users to explicitly specify nested JSON structures as arrays of type ROW for proper loading and querying. To process such structures, use the UNNEST operation, as shown in the following sketch.
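A minimal Presto sketch of this pattern, assuming an Elasticsearch catalog named elasticsearch with an index customers whose nested field addresses is mapped as an array of ROW(city, zip); all identifiers here are assumptions:
-- Illustrative sketch: the nested field "addresses" is treated as array(row(city varchar, zip varchar))
SELECT c.name, a.city, a.zip
FROM elasticsearch.default.customers AS c
CROSS JOIN UNNEST(c.addresses) AS a (city, zip);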
- Db2 does not support CREATE VIEW statement for a table from another catalog
- Applies to: 1.1.0 and later
For Db2, you can create the view for a table only if that table is in the same catalog and the same schema.
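For example, the following sketch creates a view in the same catalog and schema as its base table (all identifiers are placeholders):
-- Illustrative sketch: the view and its base table share the catalog "db2" and the schema "sales"
CREATE VIEW db2.sales.v_orders AS SELECT order_id, total FROM db2.sales.orders;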
- Netezza Performance Server does not support CREATE VIEW statement for a table from another catalog
- Applies to: 1.1.0 and later
For Netezza Performance Server, you can create the view for a table only if that table is in the same catalog and the same schema.
Limitations: Engine
- Presto needs precision of the DECIMAL column to be within a valid range
- Unable to create views in Presto
- HMS and Presto log level from Default Error level to Debug level
- Presto fails to restrict NULL values on a column with NOT NULL constraint
- Presto REST API with BigInt data
- Presto needs precision of the DECIMAL column to be within a valid range
- Applies to: 1.1.0 and later
Presto needs precision of the DECIMAL column in the PostgreSQL table creation statement to be within a valid range.
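For example, when defining the PostgreSQL table, give the DECIMAL column an explicit precision and scale within a range Presto can map (Presto supports DECIMAL precision up to 38); the table and column names in this sketch are illustrative:
-- Illustrative sketch: an explicit, in-range precision and scale for the DECIMAL column
CREATE TABLE sales_summary (order_id integer, total_cost decimal(31, 8));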
- Unable to create views in Presto
- Applies to: 1.1.0 and later
Presto describes a view in a mapped database as a TABLE rather than a VIEW. This is apparent to JDBC programs connecting to the Presto engine.
- HMS and Presto log level from Default Error level to Debug level
- Applies to: 1.1.0 and later
The watsonx.data console does not support changing the HMS and Presto log level from the default Error level to the Debug level.
Workaround:
- Run the following curl command to change the log level inside the HMS pods:
curl -k -X POST 'https://localhost:8281/v1/hms/loglevel' -H 'Content-Type: application/json' -d '{"log-level": "DEBUG"}'
- Run the following curl command to change the log level inside the Presto pods:
curl --location 'https://<host>:8481/v1/lh_engine/change_configuration' \ -k --header 'secret: $LH_INSTANCE_SECRET' \ --header 'Content-Type: application/json' \ --data '{ "type":"loglevel", "value":"info", "restart":true }'
- Presto fails to restrict NULL values on a column with NOT NULL constraint
- Applies to: 1.1.0 and later
Fixed in: 1.1.1
When you define a table with columns that have a NOT NULL constraint, the Presto engine fails to restrict the NULL values in columns that are defined with a NOT NULL constraint. Allowing NULL values leads to data inconsistency, resulting in read failure when executing queries.
- Presto REST API with BigInt data
- Applies to: 1.1.0
Fixed in: 1.1.1
The Query workspace uses the Presto REST API to submit queries to Presto and retrieve the results. If the results consist of BigInt data, the last 3 digits are truncated to 000.
Workaround: If the results consist of BigInt data, cast the BigInt column to varchar to preserve the precision of the result.
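For example, a query of the following shape (table and column names are illustrative) returns the BigInt value without truncation:
-- Illustrative sketch: cast the BigInt column to varchar so all digits survive in the result
SELECT cast(order_id as varchar) AS order_id FROM iceberg_data.sales.orders;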
Limitations: Ingestion
- Delimiters supported for ingestion through UI
- Applies to: 1.1.0 and later
Fixed in: 1.1.1
When you create a table from a data file by using the watsonx.data web console, use a comma (,) for the delimiter. Comma (,) is the only supported delimiter for ingestion through the UI.
Limitations: SQL queries
- Timestamp with timezone handling limitation in CREATE/ALTER TABLE
- Alter column is not supported for Hive and Iceberg catalogs
- Dropping incompatible column types in ALTER TABLE
- Case sensitivity of column names in queries
- Timestamp with timezone handling limitation in CREATE/ALTER TABLE
- Applies to: 1.1.4 and later
Presto previously had a limitation where CREATE TABLE and ALTER TABLE statements incorrectly treated timestamps with timezones as simple timestamps. Since these are distinct data types, this could lead to errors. To address this issue, the functionality of mapping timestamps with timezones has been disabled.
Workaround: Modify CREATE TABLE and ALTER TABLE statements to use plain timestamps (without timezone information).
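For example, a column that would otherwise be declared as timestamp with time zone is declared as a plain timestamp in the following sketch (the catalog, table, and column names are illustrative):
-- Illustrative sketch: use a plain timestamp instead of timestamp with time zone
CREATE TABLE iceberg_data.logs.event_log (event_id bigint, created_at timestamp);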
- Alter column is not supported for Hive and Iceberg catalogs
- Applies to: 1.1.0 and later
ALTER TABLE operations that change a column's type to an incompatible type (for example, from STRING to MAP) are not supported for Hive and Iceberg catalogs.
- Dropping incompatible column types in ALTER TABLE
- Applies to: 1.1.0 and later
Using ALTER TABLE to drop a column fails if the remaining columns have data types incompatible with the dropped column. This behavior applies to both Hive and Iceberg tables.
Workaround:
- Set the hive.metastore.disallow.incompatible.col.type.changes configuration property to false in the Hive Metastore (HMS).
- Restart the HMS.
- Case sensitivity of column names in queries
- Applies to: 1.1.0 and later
Limitations: Others
- IBM Knowledge Catalog integration does not support row-level filtering
- Cut or Copy icons are still enabled even after the action is performed
- No space left on device error occurs
- Using the S3 Select Pushdown option
- Tables that are not in the IBM Knowledge Catalog might be inaccessible when an integration is active
- IBM Knowledge Catalog integration does not support row-level filtering
- Applies to: 1.1.2 and later
After IBM Knowledge Catalog integration, data-masking rules are enforced in watsonx.data. However, row-filtering rules are not applied, which can cause the rows to remain visible and accessible.
- Cut or Copy icons are still enabled even after the action is performed
- Applies to: 1.1.0 and later
When you select text in the Query workspace, the Cut and Copy icons are enabled. The Cut and Copy icons remain enabled after the actions are performed, but they must be disabled when no text is selected after the action is completed.
Workaround: The Clipboard settings in the Advanced preferences option of the Firefox browser must be set to true. The following are the Clipboard settings:
dom.event.clipboardevents.enabled
dom.event.asyncClipboard.clipboardItem
dom.event.asyncClipboard.readText
dom.event.testing.asyncClipboard
- No space left on device error occurs
- Applies to: 1.1.0 and later
When you run queries on Presto while caching is enabled, a No space left on device error is displayed.
Workaround: To resolve this error, log in to the cache directory and delete all entries in it.
- Using the S3 Select Pushdown option
- Applies to: 1.1.0 and later
The S3 Select Pushdown option allows you to filter the data at the source and retrieve just the subset of data that you need. In watsonx.data, this option is disabled by default. You can enable the S3 Select Pushdown option (s3_select_pushdown_enabled) by using the API. Currently, the S3 Select Pushdown option is supported only on IBM Storage Ceph and Amazon Web Services (AWS).
- You can select only non-database catalogs
- Applies to: 1.1.2 and later
When integrating with IBM Knowledge Catalog, you can select only non-database catalogs. Database catalogs are not supported in watsonx.data.
For more information, see Integrating with IBM Knowledge Catalog.
- Tables that are not in the IBM Knowledge Catalog might be inaccessible when an integration is active
- Applies to: 1.1.2 and later
Fixed in: 1.1.3
With an IBM Knowledge Catalog integration active, tables that are not in a governed catalog remain inaccessible. For more information, see the integration prerequisites in Integrating with IBM Knowledge Catalog.