Known issues and limitations

The following known issues and limitations apply to IBM® watsonx.data.

Known issues: Access

User is still visible in the Access control page of an engine after removing the user from the CPD platform.

Applies to: 1.1.3 and later

User cannot access tables that are created in the COS bucket due to a missing metadata error

Applies to: 1.1.2 and later

Sometimes, the data that is created from the COS data source by using the Iceberg catalog becomes inaccessible due to a missing metadata error.

(Unable to execute HTTP request: No X509TrustManager implementation available)

Users cannot create any additional schemas and tables in COS.

Workaround: To recover all missing tables from your existing COS Iceberg catalog, complete the following steps.
  1. Deactivate the COS bucket.
  2. Dissociate the COS catalog from the Presto engine.
  3. Associate the COS catalog with the Presto engine.
  4. Select Sync all objects from the sync catalog dialog box and click Sync.
RBAC management for buckets with non-compliant names

Applies to: 1.1.0 and later

Users attempting to manage role-based access control (RBAC) for buckets with names that deviate from the following regular expression (^[a-zA-Z0-9-_]+$) encounter an error. Bucket names must consist only of lowercase and uppercase letters, numbers, underscores, and hyphens. Any characters outside this set prevent RBAC management from working as expected.

Access is denied when querying an external database

Applies to: 1.1.0 and later

When a user with the User role and Create access (the user has only Create access) is added to an external database, they cannot run a SELECT query on the table they have created. Though the user can connect to the Presto engine and create tables and schemas, they cannot query the table. The system displays an Access Denied message.

Query 20230608_132213_00042_wpmk2 failed: Access Denied: Cannot select from columns [id] in table or view tab_appiduser_01
Workaround: Grant the SELECT privilege on the table that the user created.
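For example, the privilege can be granted from the Presto CLI as shown in the following minimal sketch; the catalog, schema, and user ID are placeholders, and tab_appiduser_01 is the table name from the error message above:
presto:default> grant select on mycatalog.myschema.tab_appiduser_01 to "target_user_id";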
Assigning Grant or Revoke privilege

Applies to: 1.1.0 and later

Assigning Grant or Revoke privilege to a user through access policy does not work as expected in the following scenarios:

  • User_A adds a bucket and a Hive catalog (for example, useracat02).
  • User_A creates a schema and a table.
  • User_B and User_C are assigned User roles to the catalog.
  • User_A adds an allow grant policy for User_B.
  • User_B connects to the catalog and runs GRANT SELECT for User_C.
    presto:default> grant select on useracat02.schema_test_01.tab_1 to "6ff74bf7-b71b-42f2-88d9-a98fdbaed304";
  • When User_C connects to the catalog and runs a SELECT command on the table, the command fails with an Access Denied message.
    presto:default> select * from useracat02.schema_test_01.tab_1;
    Query 20230612_073938_00132_hthnz failed: Access Denied: Cannot select from columns [name, id, salary, age] in table or view tab_1
Only table creator has DROP access in Apache Hive (API)

Applies to: 1.1.0 and later

Only the creator of a table can drop the table that is created in the Apache Hive catalog. Other users cannot drop the table even if they have an explicit DROP access to the table. They get the Access Denied message.

User-provided certificates are not supported by watsonx.data

Applies to: 1.1.0 and later

User-provided certificates are not supported in watsonx.data when you add database connections or object store buckets, or when you use the ibm-lh utility.

Bucket admin or creator cannot update the bucket and catalog role

Applies to: 1.1.1

Fixed in: 1.1.2

Bucket admin or creator cannot update the bucket and catalog role.

Workaround: If you are a non-admin user, ask an administrator to remove you and add you again with the expected role. If you have admin permission, change your own role.

Users with Reader or User role granted Implicit permission by default

Applies to: 1.1.0

Fixed in: 1.1.1

For wxd-system bucket and wxd_system_data catalog, users with Reader or User role are granted Implicit permission by default.

Workaround: Change the role of users from Reader or User to Admin, and then change it back to Reader or User. After switching the permission type from Implicit to Explicit and granting proper data policy permissions, those users can access the system bucket/catalog. It is also applicable for user groups.

Known issues: Catalog, schema, and tables

Catalog property customization not applied to new Hive buckets

Applies to: 1.1.3 and later

Adding new Hive buckets does not trigger a Presto pod restart, resulting in operator customizations not being applied to the new buckets.

Operator customizations made to existing catalogs are not applied to newly added Hive buckets. When a new Hive catalog is added, existing catalogs may also lose previously applied customizations.

Workaround: You must restart the Presto pods by following the steps:
  1. Set the environment variable PROJECT_CPD_INSTANCE to the namespace where watsonx.data is installed:
    export PROJECT_CPD_INSTANCE=<instance-namespace>
  2. Restart the Presto pod associated with the impacted engine:
    oc delete pod -l engineName=<engine-name>
Timeout error when creating schema in AFM bucket

Applies to: 1.1.1 and later

When you try to create a schema in an Active file management (AFM) bucket present in the Spectrum Scale system, you might encounter a timeout error.

Creating a schema without a location

Applies to: 1.1.0 and later

When you create a schema without a location, it is not listed in the schema list of any catalog.

For example, if you create a schema without specifying the location of the bucket, the schema is created in HMS and not in the bucket. When you try to create a new schema with the same name, it fails and responds that the schema already exists.

Workaround: Specify the location of the bucket when creating a schema.
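For example, the following minimal sketch creates a schema with an explicit bucket location; the catalog name iceberg_data, the schema name sales, and the bucket name iceberg-bucket are placeholders:
CREATE SCHEMA iceberg_data.sales WITH (location = 's3a://iceberg-bucket/sales');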

Creating schema location with path

Applies to: 1.1.0 and later

Use one of the following location options when creating a schema:
  • Location pointing to a bucket/subpath without a trailing /.
  • Location pointing to a bucket/subpath with a trailing / – Recommended for better structuring.
Note: Though you can use a location that points only to a bucket, with or without a trailing /, it might lead to failure. Therefore, it is recommended to use a subpath. The subpath must be the same as the schema name. For example, to create a schema test1, specify the location as location='s3a://iceberg-bucket/test1'.
Enabling Amazon S3 bucket makes catalogs inactive for 15 minutes

Applies to: 1.1.0 and later

After enabling Amazon S3 bucket type, you need to wait for 15 minutes to use the catalogs. You might get the following error:
Failed to create external path s3a://bucket/<schema name> for database <database name>.
This may result in access not being allowed if the StorageBasedAuthorizationProvider is enabled: null
Workaround: If the error persists, restart the HMS pod.
oc get po -n cpd-instance |grep ibm-lh-lakehouse-hive-metastore |awk '{print $1}' |xargs oc delete po -n cpd-instance
Unable to view created schema

Applies to: 1.1.0 and later

When a user with the User role and Create access (the user has only Create access) is added to an external database, they cannot see the schemas that they created. Though the user can create schemas, they cannot view them. The following is the system response:

presto:default> show schemas;
 Schema 
--------
(0 rows)

Workaround: Grant the select privilege on the schema that the user created.

Unique names for schema and bucket

Applies to: 1.1.0 and later

A schema and a bucket cannot be created with the same name.

For example, if you create a schema that is named “sales” in one catalog, the same name cannot be used for another schema in another catalog. Similarly, if you register a bucket with the name “salesbucket”, another bucket with the same name cannot be registered, even if the bucket is located in a different object store.

Workaround: Use unique names when creating schemas and buckets.

Creating schema for target table

Applies to: 1.1.0 and later

Fixed in: 1.1.3

Create a schema for the target table if the schema does not exist.
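For example, the following minimal sketch creates the target schema before ingestion only if it does not already exist; the catalog, schema, and bucket names are placeholders:
CREATE SCHEMA IF NOT EXISTS iceberg_data.ingest_target WITH (location = 's3a://iceberg-bucket/ingest_target');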

Known issues: Database and connectors

Presto engine crashes when databases that are not supported are configured through the custom database feature in watsonx.data

Applies to: 1.1.0 and later

Fixed in: 1.1.4

When databases that are not supported by Presto are configured through the custom database feature in watsonx.data, the Presto engine crashes. This leads to failure of the readiness/liveness checks and subsequently to a pod CrashLoopBackOff.

This failure is fixed by handling the No factory for connector exception. With the fix, the engine does not fail and continues to function with the existing connector. Also, the unsupported connector that is added is not available in the engine. However, the connector is listed in the watsonx.data UI.

SAP HANA connector is not accessible in FIPS enabled cluster

Applies to: 1.1.3

Fixed in: 1.1.4

The SAP HANA connector relies on a Bring Your Own JAR (BYOJ) process, where users upload their own JDBC driver (the ngdbc-2.17.12.jar file) to connect to SAP HANA databases. As part of the security process, uploaded JAR files are scanned with the ClamAV antivirus.

However, ClamAV scanning currently fails in environments that are configured with Federal Information Processing Standard (FIPS) mode enabled. Users in FIPS-enabled environments are unable to establish connections to SAP HANA databases through the SAP HANA connector.

Known issues: Engine

Associating SAP HANA with Presto engine gives an error

Applies to: 1.1.1 and later

You might get an error message when associating SAP HANA with the Presto engine. However, the association is successful and queries can be run. There is no workaround for this error message currently.

Columns of type numeric array are left out when calling DatabaseMetadata.getColumns()

Applies to: 1.1.0 and later

When you call DatabaseMetadata.getColumns() by using the Presto JDBC driver, columns of type numeric array are left out.

The default deployment type is singlenode

Applies to: 1.1.0 and later

When an engine is created, regardless of the configuration mode that is selected, the deployed engine is always singlenode and small size.

Workaround: To customize the engine to multinode and different resource configurations, refer to Specifying additional customization for watsonx.data.

Issue with prestodb module in ibm-lh-client

Applies to: 1.1.0 and later

Due to an issue with the prestodb module in ibm-lh-client, you must complete the following steps to connect to a Python client when using ibm-lh-client:
  1. Start the sandbox container for the registered Presto engine.
    ibm-lh-client/bin/dev-sandbox --engine=demo-b
  2. In the bash prompt, install the prestodb module.
    export HOME=/tmp
    pip3 install SQLAlchemy 'pyhive[presto]' presto-python-client
Presto does not recognize the path as a directory

Applies to: 1.1.0 and later

When you create a new table with a Presto Hive connector that uses an S3 folder from an external location, Presto does not recognize the path as a directory and an error might occur.

For example: When creating a customer table in the target directory DBCERT/tbint in a bucket that is called dqmdbcertpq by using the IBM Cloud UX and Aspera S3 console, the following error is encountered: External location must be a directory.
CREATE TABLE "hive-beta"."dbcert"."tbint" (
RNUM int , CBINT bigint
) WITH (
format='PARQUET', external_location = 's3a://dqmdbcertpq/DBCERT/tbint'
);
 Query 20230509_113537_00355_cn58z failed: External location must be a directory

Objects in a file system are stored as objects along with their path. The object and the path must have associated metadata. If the path is not associated with the metadata, Presto fails to recognize the object and responds that the path is not a directory.

Presto JDBC driver returns incorrect values of date and timestamp

Applies to: 1.1.0 and later

Date and timestamp string literals before 1900-01-01 return incorrect values.

Some Presto engines are not visible in the web console after upgrading to 1.1.2

Applies to: 1.1.2

Fixed in: 1.1.3

After you upgrade watsonx.data to 1.1.2, some of the Presto engines are not visible in the web console. However, engine functions and client connections work as expected.

Workaround: If you come across this issue, follow these steps to manually update the engine table entries in Postgres from ahana_presto to Presto.
  1. Run the following command to find the postgres-upgrade-to-112-job pod.

    export PROJECT_CPD_INST_OPERANDS=<INSTANCE_NAMESPACE>
    export UPGRADE_JOB=$(oc get po -o name -n $PROJECT_CPD_INST_OPERANDS | grep "ibm-lh-postgres-upgrade-to-112-job")
  2. Run the following commands to update engine type to Presto.

    oc debug $UPGRADE_JOB -n $PROJECT_CPD_INST_OPERANDS
    
    . /mnt/infra/ibm-lh-config/env.properties
    . /mnt/infra/ibm-lh-secrets/env.properties
    export PGPASSWORD=$POSTGRES_PASSWORD
    
    psql -h ibm-lh-postgres-edb-rw -p 5432 -U $POSTGRES_USER -w -d ibm_lh_repo
    UPDATE engine SET type='Presto' WHERE type='ahana_presto';
Adding a CA certificate for a second engine with the cert-mgmt utility fails

Applies to: 1.1.0 and later

Fixed in: 1.1.2

When a user tries to add a CA certificate for a second Presto engine by running the ./cert-mgmt command, a permission denied error occurs as follows:
mv: replace '/mnt/infra/tls/cabundle.crt', overriding mode 0644 (rw-r--r--)? yes
mv: inter-device move failed: '/tmp/cabundle.crt' to '/mnt/infra/tls/cabundle.crt'; unable to remove target: Permission denied

The inter-device move error occurs when the container file system is located on a device or disk partition that is different from the one that contains the user's localstorage/ directory in the watsonx.data client installation.

Workaround: Press Enter to accept the overwrite of the target file.

Running CTAS statements with a large source table might fail or restart the Presto server

Applies to: 1.1.0

Fixed in: 1.1.1
When you run CTAS statements with a large source table, the statement might fail or the Presto server might restart. For example,
create table mycatalog.myschema.targettable as select
    mycolumn,
    ....
from mycatalog.myschema.sourcetable
with data
;

Workaround: Increase the ephemeral storage for the Presto pod to match the data size of the source table. Scale up the number of Presto worker replicas so that the sum of their ephemeral storage matches the data size of the source table.

A persistent java.lang.NullPointerException error occurs

Applies to: 1.1.0 and later

Fixed in: 1.1.1

When you run complex and concurrent SQL query workloads, a persistent java.lang.NullPointerException, 500: Internal Server Error error occurs in one of the Presto workers as follows:
2023-08-30T22:12:20.741Z        ERROR   remote-task-callback-3095       com.facebook.presto.execution.StageExecutionStateMachine        Stage execution 20230830_221206_31201_z3xuz.14.0 failed
com.facebook.presto.spi.PrestoException: Expected response code from https://172-17-151-136.fd034d239041414591df37bc2533573e.pod.cluster.local:8480/v1/task/20230830_221206_31201_z3xuz.14.0.5?summarize to be 200, but was 500: Internal Server Error
java.lang.NullPointerException
        at io.airlift.units.Duration.millisPerTimeUnit(Duration.java:237)
        at io.airlift.units.Duration.getValue(Duration.java:94)
        at io.airlift.units.Duration.convertTo(Duration.java:109)
        ....
Workaround: Shut down and restart wxdEngine to restart all worker nodes.
  1. Run the following command to get the Presto engine name that you want to shut down:
    oc get wxdengine --namespace ${PROJECT_CPD_INST_OPERANDS}
  2. Run the following command to shut down wxdEngine:
    oc patch wxdEngine <engine_instance> \
    --namespace ${PROJECT_CPD_INST_OPERANDS} \
    --patch '{"spec":{"shutdown":"true"}}' \
    --type=merge

    Alternatively, run the following command to force shutdown:

    oc patch  wxdEngine <engine_instance>  \
    --namespace ${PROJECT_CPD_INST_OPERANDS} \
    --patch '{"spec":{"shutdown":"force"}}' \
    --type=merge
  3. Run the following command to restart wxdEngine:
    oc patch wxdEngine <engine_instance> \
    --namespace ${PROJECT_CPD_INST_OPERANDS} \
    --patch '{"spec":{"shutdown":"false"}}' \
    --type=merge

Known issues: Ingestion

Unable to submit Spark application with external Spark engine where the application script or binary resides in an object storage bucket

Applies to: 1.1.4 and later

If you are using an external Spark engine (IBM Cloud or co-located), you might not be able to submit a Spark application when the application script or binary is located in an object storage bucket. You might see the following error in your application logs:
ConfigurationParseException: Configuration parse exception: Access KEY is empty

Workaround: You must set the object storage connection parameters for your Spark application as default Spark configuration in the Analytics Engine (Spark) instance registered as your external Spark engine.

Example:
spark.hadoop.fs.s3a.bucket.<s3_bucket_name>.endpoint = <cos_endpoint>
spark.hadoop.fs.s3a.bucket.<s3_bucket_name>.access.key = <s3 bucket HMAC access key>
spark.hadoop.fs.s3a.bucket.<s3_bucket_name>.secret.key =  <s3 bucket  HMAC secret key>

For co-located Analytics Engine instance:

  1. Log in to your cluster.
  2. From the navigation pane, go to Services and click Instances. Click Spark instance.
  3. On the Configuration tab, click Edit on the Default Spark configuration card, set the connection properties, and save the configuration.

For IBM Cloud Analytics Engine instance:

  1. Log in to https://cloud.ibm.com/resources with the appropriate user credentials.
  2. Expand Analytics section and click on your Analytics Engine service instance.
  3. On the Configuration tab, click Edit on the Default Spark configuration card, set the connection properties, and save the configuration.
Has header checkbox cannot be cleared when ingesting a CSV file

Applies to: 1.1.1 and later

Fixed in: 1.1.3

When you try to ingest a CSV file with or without a header, you cannot clear the Has header checkbox.

Workaround: Use CLI ingestion to ingest CSV files.

Token expired error while using Spark ingestion through web console

Applies to: 1.1.2 and later

When you ingest by using Spark through the web console, the ingestion job continues running and finishes successfully even if you encounter the following false error message:

Spark ingestion error. Job status command authorization token expired or does not have sufficient permission.
Ingestion is not supported by using an external Spark engine

Applies to: 1.1.2 and later

Though an external Spark engine (fully managed or self-managed) can be added to watsonx.data, ingestion jobs cannot use that Spark engine. Ingestion is supported only by a colocated Spark engine.

Ingestion fails if CSV file contains bad record

Applies to: 1.1.0 and later

The ibm-lh tool does not support skipping a maximum number of bad records for CSV files if the mismatched field count is greater than the table definition.

No columns to parse from file error

Applies to: 1.1.0 and later

When you try to ingest a folder from AWS S3 by using the ibm-lh tool, you might encounter the following error if the folder contains empty files of size 0:

No columns to parse from file

Workaround: First, list the folders inside the bucket by using the aws s3 ls command. If empty files of size 0 are listed, copy the files to another folder by using the aws s3 cp command.

Spark ingestion through UI is not possible without Presto engine permission

Applies to: 1.1.0 and later

You cannot ingest data through the UI by using the Spark engine unless you have an admin role for the default Presto engine.

Special characters in target table names can cause ingestion failures

Applies to: 1.1.0 and later

Ingestion through the web console fails if a target table name contains special characters.

Workaround: Ingest the data through the CLI.
The command CREATE TABLE AS SELECT (CTAS) fails in Db2 when attempted on ingested data tables in Iceberg

Applies to: 1.1.0 and later

Some versions of Db2 require an explicit length specification for VARCHAR columns. This requirement causes failure of the command CREATE TABLE AS SELECT (CTAS) in Db2 when attempted on ingested data tables in Iceberg.

Workaround: Change the SQL statement to specify an explicit length for VARCHAR, for example VARCHAR(100).

Example:
create table "db2"."testgaissue"."testga" as (
  select
    cast(checkingstatus as varchar(100)) as checkingstatus,
    loanduration,
    cast(credithistory as varchar(100)) as credithistory,
    cast(loanpurpose as varchar(100)) as loanpurpose,
    loanamount,
    cast(existingsavings as varchar(100)) as existingsavings,
    cast(employmentduration as varchar(100)) as employmentduration,
    installmentpercent,
    cast(sex as varchar(100)) as sex,
    cast(othersonloan as varchar(100)) as othersonloan,
    currentresidenceduration,
    cast(ownsproperty as varchar(100)) as ownsproperty,
    age,
    cast(installmentplans as varchar(100)) as installmentplans,
    cast(housing as varchar(100)) as housing,
    existingcreditscount,
    cast(job as varchar(100)) as job,
    dependents,
    cast(telephone as varchar(100)) as telephone,
    cast(foreignworker as varchar(100)) as foreignworker,
    cast(risk as varchar(100)) as risk
  from
    "iceberg_data"."project"."testga"
);
Spark CLI ingestion does not load data when using --create-if-not-exist

Applies to: 1.1.2

Fixed in: 1.1.3

When you pass the --create-if-not-exist parameter during a Spark CLI ingestion with a preexisting target table, no data will be loaded into the table, even though the ingest tool may report a successful ingestion.
Workaround: Do not pass the --create-if-not-exist parameter when the target table already exists. Alternatively, ingest by using Spark through the web console.
Parquet files that are ingested through the web console generate null values in the target table

Applies to: 1.1.0 and later

Fixed in: 1.1.2

When you ingest parquet data files that contain mixed data types within a column (for example, int, string, and other types), the columns display null values in the target table.

Workaround: You can use the command line ibm-lh tool utility for ingesting parquet data files that contain mixed data types within a column.

Incorrect result with timestamp columns after ingesting data from CSV files

Applies to: 1.1.1

Fixed in: 1.1.2

While ingesting CSV files that contain timestamps with nanosecond precision, Presto incorrectly translates them to the target table because it supports only microsecond accuracy.
Workaround: Use Spark ingestion from the web console.
  1. Register the Spark engine in CPD, see Registering an engine.

  2. Create a target table in Presto with type timestamp columns converted to type varchar (see the sketch after these steps).

  3. Ingest data through the web console using the registered Spark engine, see Ingesting data by using Spark.
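The following is a minimal sketch of step 2; it assumes a hypothetical target table with a single timestamp column named event_ts, which is declared as varchar so that the nanosecond values are preserved as strings (all names are placeholders):
CREATE TABLE iceberg_data.myschema.events_target (
    id bigint,
    event_ts varchar
);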

Spark CLI ingestion removes existing data when using --create-if-not-exist

Applies to: 1.1.1

Fixed in: 1.1.2

When a user passes in the --create-if-not-exist parameter during Spark CLI ingestion with a preexisting target table, it drops the existing target table and creates a new table with the source data.
Workaround: To preserve existing data within the target table, you must not pass in the --create-if-not-exist parameter when the target table already exists.
The staging folder is not dropped when ingestion is interrupted

Applies to: 1.1.0

Fixed in: 1.1.1

When you run an ingestion job by using a staging folder, the staging folder is dropped when ingestion completes successfully or when ingestion fails due to an exception error.

However, the staging folder is not dropped if ingestion is interrupted or forcefully terminated, for example by pressing Ctrl+C.

Workaround: Delete the staging folder manually.

Known issues: Installation and upgrade

Installation path directory with space

Applies to: 1.1.0 and later

When you run the setup.sh script for the watsonx.data Developer version, if the installation path has a directory that contains spaces, an error might occur.

For example, if the installation path is:
/Users/john/documents/userdata/Hybrid Data Management/Lakehouse/ibm/lh-dev/bin
the error might be similar to the following message:
./setup.sh: line 19: /Users/john/documents/userdata/Hybrid: no such file or directory

Workaround: Install the watsonx.data Developer version into a directory path that contains no spaces.

Upgrade to watsonx.data 1.1.1 fails when there is only one Presto worker pod

Applies to: 1.1.1

Fixed in: 1.1.2

Upgrade to watsonx.data 1.1.1 fails when there is only one Presto worker pod. Running the following command shows that a worker with only one replica has a <none> value for UPDATED REPLICAS.
oc get $(oc get sts -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep worker) -n ${PROJECT_CPD_INST_OPERANDS} -o custom-columns='NAME:metadata.name,REPLICAS:spec.replicas,UPDATED REPLICAS:status.updatedReplicas'

NAME                                            REPLICAS   UPDATED REPLICAS
ibm-lh-lakehouse-presto-01-prestissimo-worker   0          <none>
ibm-lh-lakehouse-presto-01-presto-worker        1          <none>      <---------
Workaround:
  1. Scale the worker statefulsets that show <none> for UPDATED REPLICAS down to 0 replicas.
    oc scale sts ibm-lh-lakehouse-presto-01-presto-worker  -n ${PROJECT_CPD_INST_OPERANDS} --replicas=0
  2. After the pod is deleted, scale the worker statefulsets back to 1 replica.
    oc scale sts ibm-lh-lakehouse-presto-01-presto-worker  -n ${PROJECT_CPD_INST_OPERANDS} --replicas=1
  3. When the pod appears again, verify that UPDATED REPLICAS is no longer <none>.
    oc get $(oc get sts -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep worker) -n ${PROJECT_CPD_INST_OPERANDS} -o custom-columns='NAME:metadata.name,REPLICAS:spec.replicas,UPDATED REPLICAS:status.updatedReplicas'
    NAME                                            REPLICAS   UPDATED REPLICAS
    ibm-lh-lakehouse-presto-01-prestissimo-worker   0          <none>
    ibm-lh-lakehouse-presto-01-presto-worker        1         1

Known issues: Milvus

Milvus unresponsive to queries

Applies to: 1.1.3 and later

Milvus may not respond to queries when attempting to load collections or partitions that exceed available memory capacity. This occurs because all search and query operations within Milvus are executed in memory, requiring the entire collection or partition to be loaded before querying.

Workaround:
  • Consider the memory limitations of your Milvus deployment and avoid loading excessively large collections or partitions.
  • If Milvus becomes unresponsive to queries, employ the appropriate Milvus API to unload or release some collections from memory. An example using Python SDK: collection.release().
Inaccurate row count after deletions in Milvus

Applies to: 1.1.3 and later

The collection.num_entities property might not reflect the actual number of rows in a Milvus collection after deletion operations. This property provides an estimate and may not account for deleted entities.

To get an accurate count of rows, execute a count(*) query on the collection. This provides an accurate count even after deletions.

Pymilvus syntax:
collection = pymilvus.Collection(...)
collection.query(expr='', output_fields=['count(*)'])
Potential data loss during batch insert of large data collection in Milvus

Applies to: 1.1.3 and later

Potential data loss may occur when you insert a large dataset (for example, 5 million vectors) through the Milvus batch insert API with a single final flush. A subset of rows might be missing from the ingested data.

Workaround:
  • Flush the collection manually every 500,000 rows.
  • Use the bulk insert API for data ingestion, see Insert Entities from Files. This is the recommended way to ingest large data sets.
Milvus service cannot be deleted by using the delete icon from the user interface (UI) of watsonx.data developer edition version 1.1.3

Applies to: 1.1.3

Milvus collections missing after upgrading watsonx.data developer edition from 1.1.3 to 1.1.4

Applies to: 1.1.3

Fixed in: 1.1.4

When you upgrade watsonx.data developer edition from 1.1.3 to 1.1.4, previously created Milvus collections in 1.1.3 may become unavailable or inaccessible.

Workaround: Do the following workaround before loading data into the Milvus service in 1.1.3:
  1. Edit the ibm-lh-dev/etc/ibm-lh-etcd.conf file and add the following line:
    mnt_dir=/etcd
  2. Restart the Milvus and etcd services.

Known issues: SQL queries

Presto queries with many columns and size exceeding default limit

Applies to: 1.1.0 and later

Presto queries involving multiple tables with a large number of columns (for example, 1000 columns per table or more) in the SELECT clause might encounter performance issues across all deployment environments.

The iterative optimizer times out when max_reorder_joins is set to 5 or higher (the default timeout is 3 minutes) and gives the following error:
The optimizer exhausted the time limit of 180000 ms
For queries exceeding the default max-task-update-size limit (16MB in Presto), you might observe a TaskUpdate size exceeding this limit error (the specific value of limit depends on the actual query).
Workaround:
  • You can improve query performance by temporarily disabling the reorder_joins rule using the following session property:
    set session reorder_joins = false;
  • Increase the max-task-update-size value in the config.properties file if the issue involves a TaskUpdate size exceeding the limit error and restart Presto.

    Example:

    experimental.internal-communication.max-task-update-size=64MB
An unexpected error in parquet metadata reading occurs when running the queries on partitioned data

Applies to: 1.1.1

When you run queries on partitioned data, you get the error Unexpected error in parquet metadata reading after cache miss.

Workaround: Disable the metastore versioned caching and header and footer caching. See Enhancing the query performance through caching for more information.

Unrestricted access to SQL statements in worksheets

Applies to: 1.1.0 and later

SQL statements within worksheets can be shared with all users who have access to the instance. These statements could be viewed, edited, or deleted by any of these users.

DROP TABLE command on an Iceberg table does not remove folder and files from object storage

Applies to: 1.1.0 and later

DROP TABLE command on an Iceberg table removes the table metadata from the metastore, but the data is not removed from the object store.

Unable to query SSL enabled Db2 instance with certificate uploaded in a FIPS cluster

Applies to: 1.1.3

Fixed in: 1.1.4

Query attempts that connect to an SSL-enabled Db2 instance with an uploaded certificate fail in Federal Information Processing Standard (FIPS) environments.

For further information, contact the IBM support team.

Trailing spaces in WHERE clause values

Applies to: 1.1.3 and later

Fixed in: 1.1.4

SQL statements submitted through the workspace editor might encounter unexpected behavior when a WHERE clause contains a value with trailing spaces.

Workaround: To ensure consistent behavior, construct WHERE clauses that do not depend on the presence or number of trailing spaces in the compared value.
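For example, the following minimal sketch normalizes the column value with rtrim so that the comparison does not depend on trailing spaces; the table and column names are placeholders:
SELECT * FROM mycatalog.myschema.customers WHERE rtrim(city) = 'CityA';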

Saving worksheets in the SQL editor fails after upgrading to watsonx.data 1.1.1

Applies to: 1.1.1

Fixed in: 1.1.2

After upgrading to watsonx.data 1.1.1, saving worksheets in the SQL editor fails with an error Error: pq created_on.
Workaround:
  1. Find the postgres-upgrade-to-111-job pod.
    export PROJECT_CPD_INST_OPERANDS=<INSTANCE_NAMESPACE>
    export UPGRADE_JOB=$(oc get po -o name -n $PROJECT_CPD_INST_OPERANDS | grep "ibm-lh-postgres-upgrade-to-111-job")
  2. Run the following commands.
    oc debug $UPGRADE_JOB -n $PROJECT_CPD_INST_OPERANDS
    
    . /mnt/infra/ibm-lh-config/env.properties
    . /mnt/infra/ibm-lh-secrets/env.properties
    export PGPASSWORD=$POSTGRES_PASSWORD
    
    psql -h ibm-lh-postgres-edb-rw -p 5432 -U $POSTGRES_USER -w -d ibm_lh_repo
    
    ALTER TABLE stored_queries ALTER COLUMN created_on drop default;
    ALTER TABLE stored_queries ALTER COLUMN created_on TYPE bigint USING EXTRACT(EPOCH FROM created_on);
    ALTER TABLE stored_queries ALTER COLUMN created_on set DEFAULT EXTRACT(EPOCH FROM now());

Known issues: Web console

Unique character handling in upload file feature

Applies to: 1.1.0 and later

Creating tables by using Create table from file in watsonx.data fails for uploaded files that contain unsupported unique characters.
Workaround: Use one of the following approaches to overcome this issue:
  • Manually enter the SQL commands containing the unique characters through the SQL editor.
  • Modify the file to remove or replace the unsupported unique characters before upload.
Test connection with SSL enabled is not supported

Applies to: 1.1.0 and later

When an SSL connection is enabled for a data source, testing the connection is not supported through the web console.

Known issues: Others

Importing watsonx.data bucket data fails

Applies to: 1.1.0 and later

When you run the import script (import-bucket-data.sh) to restore the watsonx.data bucket data, the system displays an error that the container ibm-lh-lakehouse-minio is not found.
+ oc exec -t ibm-lh-lakehouse-minio-7db7c6788f-g4r8 -n cpd-instance -- bash -c 'mc alias set ibm-lh http://ibm-lh-lakehouse-minio-svc.cpd-instance.svc.cluster.local:9000 <access-key> <secret_key> \
--config-dir=/tmp/.mc \
--insecure && mc alias set ibm-lh_backup https://s3.us-west-2.amazonaws.com/ <access-key> <secret_key> \
--config-dir=/tmp/.mc --insecure'
error: unable to upgrade connection: container not found ("ibm-lh-lakehouse-minio")
Workaround: Delete the running MinIO pod by using the following command and rerun the import script.
oc delete -n $CPD_NAMESPACE $(oc get rs -o name -n $CPD_NAMESPACE | grep "ibm-lh-lakehouse-minio")
A java.lang.UnsupportedOperationException error occurs during selecting a partition table from S3 bucket

Applies to: 1.1.1 and later

Fixed in: 1.1.3

When you try to select a partition table from S3 bucket, you get java.lang.UnsupportedOperationException error.

Workaround: Disable the metastore versioned caching and header and footer caching. See Enhancing the query performance through caching for more information.

Integrating watsonx.data with IBM Knowledge Catalog is not supported in version 1.1.1

Applies to: 1.1.1

Fixed in: 1.1.2

Workaround: Upgrade to watsonx.data 1.1.2 version to integrate IBM Knowledge Catalog with watsonx.data.

Limitations: Access

User access control is not supported for fully managed and self-managed Spark engines

Applies to: 1.1.2 and later

The Access control tab is not supported for fully managed or self-managed Spark engines. Administrators cannot carry out access control operations for fully managed or self-managed Spark engines.

Add MinIO bucket to allowlist to establish connection with watsonx.data

Applies to: 1.1.2 and later

To establish a connection with watsonx.data, you must add the MinIO bucket URL to the allowlist on your air-gapped cluster.

Limitations: Catalog, schema, and tables

Cross catalog schema creation anomaly in Presto

Applies to: 1.1.0 and later

An anomaly exists in schema creation for Hive and Iceberg catalogs managed by Presto. When you use a common Hive Metastore Service for multiple catalogs (for example, an Iceberg catalog and a Hive catalog, or two Iceberg or Hive catalogs), creating a schema in one catalog might create it in the wrong catalog. This occurs if the location specified during schema creation belongs to a different catalog than intended.

Workaround: You must always explicitly provide the correct storage path associated with the target catalog when using CREATE SCHEMA statements in Presto. This ensures the schema is created in the desired location.

Creating schemas in the root path of Ceph Object Storage gives an error

Applies to: 1.1.0 and later

Due to a bug in IBM Storage Ceph 5/6 and Red Hat Ceph Storage 4/5/6, if you create a schema in the root path of a Ceph Object Storage bucket in watsonx.data, the following error message is displayed.
Executing query failed with error: com.facebook.presto.spi.PrestoException: Failed to create schema. Check the credentials, permissions and storage path for the bucket. Make sure that the bucket is registered with wxd and retry.

Solution: Upgrade your IBM Storage Ceph and Red Hat Ceph Storage to versions 7.0z1 and 7.1 respectively.

Workaround: If you are still using the older IBM Storage Ceph 5/6 and Red Hat Ceph Storage 4/5/6 versions, you must do the following:

When you create a schema in Ceph Object Storage, a pseudo-directory must be created before you create the schema in watsonx.data.

Run the following command to use the s5cmd S3 client to create a pseudo-directory and insert an empty file into it:

touch a
s5cmd --endpoint-url s3.ceph.example.com cp a s3://watsonx/mycatalog/myschema/

The copy command puts an empty file in the /mycatalog/myschema pseudo-directory.

Use the newly created pseudo-directory as the path when you create the schema in the watsonx.data web console.

Hive does not support JSON data that starts with an array

Applies to: 1.1.0 and later

Hive does not support JSON data that starts with an array.

Hive catalog table creation by using external_location fails due to wrong placement of file

Applies to: 1.1.0 and later

Hive catalog table creation by using external_location fails when the file is placed in the root of the bucket.

Table creation fails if the column names differ only by spaces

Applies to: 1.1.0 and later

When you create a table from a data file by using the watsonx.data web console, the column names must be unique. Due to this limitation, if a CSV data file has column names that differ only by "spaces" for example, Cash Flow per Share and CashFlowPerShare, then these columns are considered to have the same names and table creation fails.

Using special characters in schema, table, or column names

Applies to: 1.1.0 and later

It is recommended not to use special characters, such as the question mark (?) or the asterisk (*), in table, column, and schema names. Though these special characters are supported and tables, columns, and schemas can be created with them, using these special characters might cause issues when you run the INSERT command.

Limitations: Database and connectors

Redshift connector case sensitivity

Applies to: 1.1.4 and later

The Redshift connector may not handle mixed-case database, table, and column names if the Redshift cluster configuration enable_case_sensitive_identifier is set to false (default). When this configuration is false, Redshift treats all identifiers as lowercase.

When the Redshift cluster configuration enable_case_sensitive_identifier is set to true, mixed-case identifiers work as expected.

Transactions not supported in unlogged Informix databases

Applies to: 1.1.4 and later

In watsonx.data, queries with transactional implications fail when they run against unlogged Informix databases. This is because unlogged Informix databases, by design, do not support transactions.

LDAP authentication is not supported for Teradata connector

Applies to: 1.1.0 and later

The watsonx.data Teradata connector does not currently support LDAP (Lightweight Directory Access Protocol) for user authentication.

Netezza Performance Server INSERT statement limitation

Applies to: 1.1.0 and later

Netezza Performance Server currently does not support inserting multiple rows directly into a table by using the VALUES clause. This functionality is limited to single-row insertions. Refer to the official Netezza Performance Server documentation for details on the INSERT statement.

The following example using VALUES for multiple rows is not supported:
INSERT INTO EMPLOYEE VALUES (3,'Roy',45,'IT','CityB'),(2,'Joe',45,'IT','CityC');
Workaround: Use a subquery with SELECT and UNION ALL to construct a temporary result set and insert it into the target table.
INSERT INTO EMPLOYEE SELECT * FROM(SELECT 4,'Steve',35,'FIN','CityC' UNION ALL SELECT 5,'Paul',37,'OP','CityA') As temp;
Unsupported Db2 operations

Applies to: 1.1.0 and later

watsonx.data currently does not support the ALTER TABLE DROP COLUMN operation for Db2 column-organized tables.

Note: By default, Db2 instances create tables in column-organized format.

watsonx.data does not support creating row-organized tables in Db2.

Handling Null Values in Elasticsearch

Applies to: 1.1.0 and later

The Elasticsearch connector requires explicit definition of index mappings for fields to handle null values when loading data.

Loading Nested JSON with Elasticsearch

Applies to: 1.1.0 and later

The Elasticsearch connector requires users to explicitly specify nested JSON structures as arrays of type ROW for proper loading and querying. To process such structures, use the UNNEST operation.

Db2 does not support CREATE VIEW statement for a table from another catalog

Applies to: 1.1.0 and later

For Db2, you can create the view for a table only if that table is in the same catalog and the same schema.

Netezza Performance Server does not support CREATE VIEW statement for a table from another catalog

Applies to: 1.1.0 and later

For Netezza Performance Server, you can create the view for a table only if that table is in the same catalog and the same schema.

Limitations: Engine

Presto requires the precision of the DECIMAL column to be within a valid range

Applies to: 1.1.0 and later

Presto requires the precision of the DECIMAL column in the PostgreSQL table creation statement to be within a valid range.

Unable to create views in Presto

Applies to: 1.1.0 and later

Presto describes a view in a mapped database as a TABLE rather than a VIEW. This is apparent to JDBC programs that connect to the Presto engine.

Changing the HMS and Presto log level from the default Error level to the Debug level is not supported

Applies to: 1.1.0 and later

The watsonx.data console does not support changing the HMS and Presto log level from the default Error level to the Debug level.

Workaround:
  • Run the following curl command to change the log level inside the HMS pods:
    curl -k -X POST 'https://localhost:8281/v1/hms/loglevel' -H 'Content-Type: application/json' -d '{"log-level": "DEBUG"}'
  • Run the following curl command to change the log level inside the Presto pods:
    curl --location 'https://<host>:8481/v1/lh_engine/change_configuration' \
    -k --header "secret: $LH_INSTANCE_SECRET" \
    --header 'Content-Type: application/json' \
    --data '{
        "type":"loglevel",
        "value":"info",
        "restart":true
    }'
Presto fails to restrict NULL values on a column with NOT NULL constraint

Applies to: 1.1.0 and later

Fixed in: 1.1.1

When you define a table with columns that have a NOT NULL constraint, the Presto engine fails to restrict the NULL values in columns that are defined with a NOT NULL constraint. Allowing NULL values leads to data inconsistency, resulting in read failure when executing queries.

Presto REST API with BigInt data

Applies to: 1.1.0

Fixed in: 1.1.1

The Query workspace uses the Presto REST API for submitting queries to Presto and getting results. If the results contain BigInt data, the last 3 digits are truncated to 000.

Workaround: If the results contain BigInt data, you must cast the BigInt column to varchar to preserve the precision of the result.
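For example, the following minimal sketch casts a BigInt column to varchar in the query so that the full value is preserved in the REST response; the table and column names are placeholders:
SELECT CAST(order_id AS varchar) AS order_id FROM mycatalog.myschema.orders;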

Limitations: Ingestion

Delimiters supported for ingestion through UI

Applies to: 1.1.0 and later

Fixed in: 1.1.1

When you create a table from a data file by using the watsonx.data web console, use a comma (,) as the delimiter. The comma (,) is the only supported delimiter for ingestion through the UI.

Limitations: SQL queries

Timestamp with timezone handling limitation in CREATE/ALTER TABLE

Applies to: 1.1.4 and later

Presto previously had a limitation where CREATE TABLE and ALTER TABLE statements incorrectly treated timestamps with timezones as simple timestamps. Since these are distinct data types, this could lead to errors. To address this issue, the functionality of mapping timestamps with timezones has been disabled.

Workaround: Modify CREATE TABLE and ALTER TABLE statements to use plain timestamps (without time zone information).
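For example, the following minimal sketch declares a plain timestamp column instead of a timestamp with time zone; the table and column names are placeholders:
CREATE TABLE mycatalog.myschema.events (
    id bigint,
    created_at timestamp
);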

Alter column is not supported for Hive and Iceberg catalogs

Applies to: 1.1.0 and later

ALTER TABLE operations that change a column's type to an incompatible type (for example, from STRING to MAP) are not supported for Hive and Iceberg catalogs.

Dropping incompatible column types in ALTER TABLE

Applies to: 1.1.0 and later

Using ALTER TABLE to drop a column fails if the remaining columns have data types incompatible with the dropped column. This behavior applies to both Hive and Iceberg tables.

Workaround:
  1. Set hive.metastore.disallow.incompatible.col.type.changes configuration property to false in the Hive Metastore (HMS).
  2. Restart the HMS.
Case sensitivity of column names in queries

Applies to: 1.1.0 and later

Queries referencing column names are case-insensitive. The results will display columns using the exact casing provided in the query, regardless of the actual casing in the database.
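For example, if a table stores a column as deptid, the following hypothetical query still succeeds, and the result header displays the casing that is used in the query (DeptId); the table name is a placeholder:
SELECT DeptId FROM mycatalog.myschema.employees;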

Limitations: Others

IBM Knowledge Catalog integration does not support row-level filtering

Applies to: 1.1.2 and later

After IBM Knowledge Catalog integration, data-masking rules are enforced in watsonx.data. However, row-filtering rules are not applied, which can cause rows that should be filtered to remain visible and accessible.

Cut or Copy icons are still enabled even after the action is performed

Applies to: 1.1.0 and later

When you select text in the Query workspace, the Cut and Copy icons are enabled. The icons remain enabled after you perform the actions, even though they should be disabled when no text is selected after the action is completed.

Workaround: The following clipboard settings in the Advanced preferences of the Firefox browser must be set to true:
  • dom.event.clipboardevents.enabled
  • dom.event.asyncClipboard.clipboardItem
  • dom.event.asyncClipboard.readText
  • dom.event.testing.asyncClipboard
No space left on device error occurs

Applies to: 1.1.0 and later

When you run queries on Presto while caching is enabled, a No space left on device error is displayed.

Workaround: To resolve this error, go to the cache directory and delete all entries in it.

Using the S3 Select Pushdown option

Applies to: 1.1.0 and later

The S3 Select Pushdown option allows you to filter the data at the source and retrieve just the subset of data that you need. In watsonx.data, this option is disabled by default. You can enable the S3 Select Pushdown option (s3_select_pushdown_enabled) by using the API. Currently, the S3 Select Pushdown option is supported only on IBM Storage Ceph and Amazon Web Services (AWS).

You can select only non-database catalogs

Applies to: 1.1.2 and later

When integrating with IBM Knowledge Catalog, you can select only non-database catalogs. Database catalogs are not supported in watsonx.data.

For more information, see Integrating with IBM Knowledge Catalog.

Tables that are not in the IBM Knowledge Catalog might be inaccessible when an integration is active

Applies to: 1.1.2 and later

Fixed in: 1.1.3

With an IBM Knowledge Catalog integration active, the tables that are not in a governed catalog remain inaccessible. For more information, see the integration prerequisites Integrating with IBM Knowledge Catalog.