Known issues and limitations

Platform known issues and limitations are listed in the IBM Software Hub 5.2 documentation.

See the main list of IBM Software Hub known issues in the IBM Software Hub documentation and the known issues for each of the services that are installed on your system.

The following are known issues specific to watsonx.data:

Spark license type not updated after reapplying Software Hub (SWH) entitlement

When the SWH entitlement is removed and then reapplied by using the cpd-cli apply entitlement command, the Spark service does not automatically update its licenseType field. This can lead to inconsistencies in license recognition within the system.

Workaround: After reapplying the entitlement using the cpd-cli apply entitlement command, manually patch the Spark CR using the following command:

oc patch AnalyticsEngine analyticsengine-sample -n <instance_namespace> --type merge --patch '{"spec": {"wxdLicenseType":"<license_type>"}}'
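
To confirm that the patch took effect, you can read the field back from the custom resource. The following is a minimal check that assumes the same CR name and instance namespace that are used in the patch command:

    # Verify that the Spark CR now reports the expected license type
    oc get AnalyticsEngine analyticsengine-sample -n <instance_namespace> -o jsonpath='{.spec.wxdLicenseType}'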

Postgres primary missing after power outage in watsonx.data

Applies to: 2.2.0 and later

After a data center power outage, users running watsonx.data on Software Hub may encounter a failure in the ibm-lh-postgres-edb cluster where no primary instance is available. In one reported case, one pod (ibm-lh-postgres-edb-3) was stuck in a switchover state, displaying the message “This is an old primary instance, waiting for the switchover to finish,” while kubectl cnpg status indicated “Primary instance not found.” This caused the entire cluster to become unresponsive, making watsonx.data unusable.

Workaround: To restore the cluster, complete the following steps:

  1. Identify a healthy replica: Run the following command to check the status of the Postgres pods:

    oc get pods | grep postgres
    

    Look for a pod in 1/1 Running state. This indicates a healthy replica.

  2. Promote the healthy replica: Run the cnp tool to promote the healthy replica to primary. For example, if ibm-lh-postgres-edb-1 is healthy:

    kubectl cnp promote ibm-lh-postgres-edb ibm-lh-postgres-edb-1
    
  3. Force delete the stuck primary (if promotion is blocked): If the promotion fails due to lingering artifacts from the old primary pod, run the following commands to force delete the problematic pod and its PVC.

    Note: Ensure that your replicas are in the 1/1 Running state before proceeding.

    kubectl delete pod ibm-lh-postgres-edb-3 --grace-period=0 --force
    kubectl delete pvc ibm-lh-postgres-edb-3
    
  4. Verify cluster recovery: After cleanup, the new primary pod and PVC should come up. Run the following command to confirm the cluster status:

    kubectl cnp status ibm-lh-postgres-edb
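
    You can also watch the pods until the replacement pod and the new primary reach the 1/1 Running state, and read the current primary from the cluster resource. The following commands are a supplementary check; they assume that the EDB Cluster resource is named ibm-lh-postgres-edb (as in the cnp commands), that the operator exposes the currentPrimary status field, and that you substitute your instance namespace:

    # Watch the Postgres pods until all of them are 1/1 Running
    oc get pods -n <instance_namespace> -w | grep postgres
    # Read which instance the operator currently considers the primary
    oc get cluster ibm-lh-postgres-edb -n <instance_namespace> -o jsonpath='{.status.currentPrimary}'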
    

Unable to create schema under catalogs of type blackhole using Create schema action

Applies to: 2.2.0 and later. Fixed in: 2.2.1

Users are unable to create a schema under catalogs of type blackhole by using the Create schema action in the Query workspace.

Workaround: Enter the create schema query directly in the Query workspace for the blackhole catalog.

watsonx.data Milvus fails to run a UDI flow

Applies to: 2.2.0 and later. Fixed in: 2.2.1

watsonx.data Milvus fails to run a UDI flow created using the default datasift collection.

Workaround: Use a new datasift collection in place of the default collection.

Document Library (DL) does not have sparse embedding model information

Applies to: 2.2.0 and later

DL does not provide information about the embedding model used for creating sparse embeddings.

Running concurrency tests against the retrieval service generates TEXT_TO_SQL_EXECUTION_ERROR

Applies to: 2.2.0 and later

The system generates TEXT_TO_SQL_EXECUTION_ERROR when running concurrency tests against the retrieval service with a concurrency number greater than 2, accompanied by the following error details: [{"code":"too_many_requests","message":"SAL0148E: Too many requests. Please wait and retry later."}]

Workaround: Complete the following steps:

  1. Run the following command to set max_parallelism_compute_embeddings to a higher value in the semantic-automation service. The default value is 2.

    oc set env -n cpd1-instance deployment/semantic-automation max_parallelism_compute_embeddings=<value>
    
  2. Set number_of_workers_for_embeddings, which controls the number of workers in the semantic-automation service, to the same value as max_parallelism_compute_embeddings.

  3. Provide more CPU and memory to the semantic-embedding pod to align with the increased max_parallelism_compute_embeddings and number_of_workers_for_embeddings values.
    Alternatively, add more replicas for the semantic-embedding service. See the example that follows these steps.
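
For example, to raise both settings to 4 and give the embedding workload more room, the sequence might look like the following sketch. The value 4, the resource limits, and the replica count are illustrative, the deployment name semantic-embedding is an assumption based on the service name, and setting number_of_workers_for_embeddings as an environment variable in the same way as step 1 is also an assumption; adjust the names and the namespace to match your environment:

    # Raise both parallelism settings together on the semantic-automation deployment (illustrative value)
    oc set env -n cpd1-instance deployment/semantic-automation max_parallelism_compute_embeddings=4 number_of_workers_for_embeddings=4
    # Give the embedding workload more CPU and memory (illustrative limits)
    oc set resources -n cpd1-instance deployment/semantic-embedding --limits=cpu=2,memory=4Gi
    # Or scale out the embedding service instead (illustrative replica count)
    oc scale -n cpd1-instance deployment/semantic-embedding --replicas=2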

Row filtering based on user groups will not be applied when ACLs are ingested from data sources that contain user groups

Applies to: 2.2.0 and later

Issue with ACL ingestion before catalog mapping

Applies to: 2.2.0 and later

If ACL ingestion happens before the user maps the ACL catalogs, the ACLs are ingested into the iceberg_data catalog.

Workaround: To change the storage to another catalog, complete the following steps:

  1. Add the ACL catalog through the UI.
  2. Delete the existing ACL schema from the iceberg_data catalog.
  3. Restart the CPG pod.
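
    A restart can be done by deleting the CPG pod so that its controller re-creates it. The following sketch assumes that the pod name contains cpg and that you substitute your instance namespace; adjust the filter to match the actual pod name in your environment:

    # Find the CPG pod, then delete it so that it is re-created
    oc get pods -n <instance_namespace> | grep cpg
    oc delete pod <cpg_pod_name> -n <instance_namespace>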

The system occasionally misses information during entity extraction from the document

Applies to: 2.2.0 and later

Some information might be missing when entities are extracted from a document. This issue is intermittent and is not always observed. When 10 or more data assets are imported, extraction might miss 1-2 documents.

ETL process does not support currency unit normalization for monetary entities

Applies to: 2.2.0 and later

The ETL process does not currently support normalizing currency units for monetary entities. This means that if invoices contain amounts in various currencies, the data is extracted as is, without any currency conversion or normalization.

Flat schema for line items may not accurately represent the hierarchical structure of the data

Applies to: 2.2.0 and later

Line items in data assets are extracted using a flat schema, which may not accurately represent the hierarchical structure of the data. Following is an example of the original and the extracted invoice.

[Images: known issue Flat 1 and known issue Flat 2, showing the original invoice and the extracted invoice]

Separate Iceberg tables are created in the unstructured ingestion flow instead of appending the new data

Applies to: 2.2.0 and later

In the unstructured ingestion flow, separate Iceberg tables are created instead of appending the new data to the existing table in the same document library and Milvus collection. This might result in incorrect responses from the Prompt Lab. This will be fixed in the GA release.

Chat with Prompt Lab returns inappropriate responses

Applies to: 2.2.0 and later

When the retrieval results are empty, Chat with Prompt Lab may sometimes return inappropriate responses. This will be fixed in the GA release.

Limitations of the Amount and Date columns in the VARCHAR schema

Applies to: 2.2.0 and later

Currently, only the following date formats are accepted:

%b %d, %Y, %m-%d-%Y, %y-%m-%d, %d.%m.%Y, %m.%d.%Y, %d-%M-%Y, %Y-%m-%d

For Amount type columns, the following rules apply:

  • The value must contain at least one numerical character.
  • The value can have a maximum of one decimal point.
  • The value must adhere to the DOUBLE data type constraints.
  • If more than one value is extracted inside a single cell, it is treated as a single value.
  • Negative numbers are treated as positive.

The system generates incorrect results when multiple document libraries exist in a project

Applies to: 2.2.0 and later

Incorrect results are noticed intermittently when multiple document libraries exist in a project. Therefore, it is recommended to have one document library per project.

Limitations during Text-to-SQL conversion

Applies to: 2.2.0 and later

The following limitations are observed with Text-to-SQL conversion:

  • Semantic matching of the schema - The LLM does not correctly match columns with semantic similarity.
  • Wrong SQL dialect is used for date queries.

Limitation for the SQL execution inside the retrieval service

Applies to: 2.2.0 and later

The SQL execution inside the retrieval service fails when multiple presto connections, catalogs, or schemas are used across multiple document sets in a Document Library.

ACL-enabled data enrichment may fail during ingestion

Applies to: 2.2.0 and later

With ACL enabled, data enrichment may fail during ingestion with the following warning message: computing vectors for asset failed with error. This will be fixed in the upcoming release.

Limitations in customizable schema support in Milvus

Applies to: 2.2.0 and later

Milvus currently imposes the following limitations on customizable schema support:

  • Each collection must include at least one vector field.
  • A primary key field is required.

Retrieval service behavior: Milvus hybrid search across all vector fields

When multiple vector fields are defined in a collection, the retrieval service performs a Milvus hybrid search across all of them, regardless of their individual relevance to the query.

Ensuring data governance with the document_id field

A document_id field is required to ensure proper data governance and to enable correlation between vector and SQL-based data.

Pipeline extracts incorrect values

Applies to: 2.2.0 and later

The pipeline extracts malformed or irrelevant data, such as including “thru 2025” in the Year field or pulling random text into document_title and name fields.

System merges multiple entities into one column

Applies to: 2.2.0 and later

The system combines distinct entities like GSTIN and State into a single column, reducing data granularity and usability.

Pipeline assigns incorrect data types

Applies to: 2.2.0 and later

The pipeline stores numeric and date fields (for example, deal_size, year) as VARCHAR instead of INT or DATE, which impacts query performance and validation.

System fails to transform values appropriately

Applies to: 2.2.0 and later

The system does not convert shorthand values (for example, 30K to 30000) or standardize currency formats (for example, $39M), affecting data consistency.

Extraction process handles null values inconsistently

Applies to: 2.2.0 and later

The extraction process inserts the string “None” instead of actual null values, which breaks SQL logic and affects data integrity.

SQL generation logic produces faulty queries due to improper casting and schema confusion

Applies to: 2.2.0 and later

The SQL generation process fails when it uses incorrect casting methods (for example, CAST(REPLACE, ',') without handling symbols like $), and struggles with duplicate columns, resulting in invalid or unusable queries.

Spark allows application submission only with locally uploaded data assets

Applies to: 2.2.1 and later

You can create data assets either by uploading files from a local source or by importing files from connections within the project. However, Spark applications are compatible only with data assets that are uploaded locally. If a Spark job attempts to use a data asset that is imported from a connection, the job fails to run.