Known issues and limitations

The following known issues and limitations apply to watsonx.data as a Service with gen AI experience.

watsonx.ai Studio issues

You might encounter some of these issues when getting started with and using watsonx.ai Studio:

Duplicating a notebook doesn't create a unique name in the new projects UI

When you duplicate a notebook in the new projects UI, the duplicate notebook is not created with a unique name.

Failure to export a notebook to HTML in the Jupyter Notebook editor

When you are working with a Jupyter Notebook created in a tool other than watsonx.ai Studio, you might not be able to export the notebook to HTML. This issue occurs when the cell output is exposed.

Workaround

  1. In the Jupyter Notebook UI, go to Edit and click Edit Notebook Metadata.

  2. Remove the following metadata:

    "widgets": {
       "state": {},
       "version": "1.1.2"
    }
    
  3. Click Edit.

  4. Save the notebook.

Connection to notebook kernel is taking longer than expected after running a code cell

If you try to reconnect to the kernel and immediately run a code cell (or if the kernel reconnection happened during code execution), the notebook doesn't reconnect to the kernel and no output is displayed for the code cell. You need to manually reconnect to the kernel by clicking Kernel > Reconnect. When the kernel is ready, you can try running the code cell again.

Using the predefined sqlContext object in multiple notebooks causes an error

You might receive an Apache Spark error if you use the predefined sqlContext object in multiple notebooks. Create a new sqlContext object for each notebook. See this Stack Overflow explanation.
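
As a minimal sketch of the workaround, assuming the notebook's preconfigured SparkContext is available as sc and that an example.csv file exists in the working directory:

    from pyspark.sql import SQLContext

    # Create a notebook-local SQLContext instead of reusing the shared, predefined sqlContext.
    my_sql_context = SQLContext(sc)

    # Use the notebook-local context for subsequent operations (the file path is illustrative only).
    df = my_sql_context.read.csv("example.csv", header=True)
    df.show()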

Connection failed message

If your kernel stops, your notebook is no longer automatically saved. To save it, click File > Save manually; a Notebook saved message appears in the kernel information area, which is shown before the Spark version. If you get a message that the kernel failed, click Kernel > Reconnect to reconnect your notebook to the kernel. If nothing restarts the kernel and you can't save the notebook, download it to save your changes by clicking File > Download as > Notebook (.ipynb), and then create a new notebook based on your downloaded notebook file.

Can't connect to notebook kernel

If you try to run a notebook and you see the message Connecting to Kernel, followed by Connection failed. Reconnecting and finally by a connection failed error message, the reason might be that your firewall is blocking the notebook from running.

If watsonx.ai Studio is installed behind a firewall, you must add the WebSocket connection wss://dataplatform.cloud.ibm.com to the firewall settings. Enabling this WebSocket connection is required when you're using notebooks and RStudio.

Insufficient resources available error when opening or editing a notebook

If you see the following message when opening or editing a notebook, the environment runtime associated with your notebook has resource issues:

Insufficient resources available
A runtime instance with the requested configuration can't be started at this time because the required hardware resources aren't available.
Try again later or adjust the requested sizes.

To find the cause, try checking the status page for IBM Cloud incidents affecting watsonx.ai Studio. Additionally, you can open a support case at the IBM Cloud Support portal.

Files that are uploaded through the watsonx.ai Studio UI are not validated or scanned for potentially malicious content

Files that you upload through the watsonx.ai Studio UI are not validated or scanned for potentially malicious content. It is strongly recommended that you run security software, such as an antivirus application, on all files before you upload them to ensure the security of your content.

watsonx.ai Runtime issues

You might encounter some of these issues when working with watsonx.ai Runtime.

Region requirements

You can only associate a watsonx.ai Runtime service instance with your project when the watsonx.ai Runtime service instance and the watsonx.ai Studio instance are located in the same region.

Accessing links if you create a service instance while associating a service with a project

While you are associating a watsonx.ai Runtime service with a project, you have the option of creating a new service instance. If you choose to create a new service, the links on the service page might not work. To access the service terms, APIs, and documentation, right-click the links to open them in new windows.

Orchestration Pipelines issues

These issues pertain to Orchestration Pipelines.

Asset browser does not always reflect the total count for each asset type

When selecting an asset from the asset browser, such as choosing a source for a Copy node, some asset types list the total number of assets of that type that are available, but notebooks do not.

Cannot delete pipeline versions

Currently, you cannot delete saved versions of pipelines that you no longer need. All versions are deleted when the pipeline is deleted.

Cache appears enabled but is not enabled

If the Copy assets Pipelines node's Copy mode is set to Overwrite, cache is displayed as enabled but remains disabled.

Pipelines cannot save some SQL statements

Pipelines cannot be saved when SQL statements that contain parentheses are passed in a script or user variable.

To resolve this issue, replace all instances of parentheses with their ASCII codes (replace ( with #40 and ) with #41), and convert the codes back to parentheses when you set the statement as a user variable.

For example, the statement select CAST(col1 as VARCHAR(30)) from dbo.table in a Run Bash script node causes an error. Instead, use the statement select CAST#40col1 as VARCHAR#4030#41#41 from dbo.table and convert the codes back when setting it as a user variable.
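
As a rough sketch of this substitution (the helper functions are illustrative and not part of Orchestration Pipelines), the encoding and the reverse replacement are plain string operations:

    # Illustrative helpers: encode parentheses before saving the SQL in the script,
    # then convert the codes back when the statement is set as a user variable.
    def encode_parens(sql):
        return sql.replace("(", "#40").replace(")", "#41")

    def decode_parens(sql):
        return sql.replace("#40", "(").replace("#41", ")")

    statement = "select CAST(col1 as VARCHAR(30)) from dbo.table"
    encoded = encode_parens(statement)
    # encoded == "select CAST#40col1 as VARCHAR#4030#41#41 from dbo.table"
    assert decode_parens(encoded) == statement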

Orchestration Pipelines abort when the limit for annotations is reached

Pipeline expressions require annotations, which are subject to the Kubernetes limit for annotations. If you reach this limit, your pipeline aborts without displaying logs.

Orchestration Pipelines limitations

These limitations apply to Orchestration Pipelines.

Single pipeline limits

These limitations apply to a single pipeline, regardless of configuration.

  • Any single pipeline cannot contain more than 120 standard nodes
  • Any pipeline with a loop cannot contain more than 600 nodes across all iterations (for example, 60 iterations with 10 nodes each)

Input and output size limits

Input and output values, which include pipeline parameters, user variables, and generic node inputs and outputs, cannot exceed 10 KB of data.
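
As a rough sketch, you can verify that a value stays under this limit before passing it to a pipeline (the payload and check shown are illustrative, not part of the product):

    import json

    MAX_BYTES = 10 * 1024  # 10 KB limit on parameters, user variables, and node inputs/outputs

    value = {"rows": ["example"] * 100}  # illustrative payload
    size = len(json.dumps(value).encode("utf-8"))
    if size > MAX_BYTES:
        raise ValueError(f"Value is {size} bytes, which exceeds the 10 KB input/output limit")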

Bash scripts throw errors with curl commands

The Bash scripts in your pipelines might cause errors if you implement curl commands in them. To prevent this issue, set your curl commands as parameters. To save a pipeline that causes an error when saving, try exporting the pipeline as an .isx file and importing it into a new project.

Cloud Object Storage issues

Note: This applies only to deployments on IBM Cloud.

These issues apply to working with Cloud Object Storage.

Viewers might experience temporary limitations

When the IBM Cloud Object Storage configuration is out of date, project collaborators with the Viewer role might see this message:

Limited actions
The project storage is out of date. Some actions might not work for you until a project collaborator with the Admin or Editor role opens this project. To view the collaborators, go to the Manage page.

When a collaborator with the Editor or Admin role opens the project, the IBM Cloud Object Storage configuration is automatically updated.

watsonx.data intelligence

You might encounter some of the following issues when working with watsonx.data intelligence.

Catalog asset search doesn't support special characters

If search keywords contain any of the following special characters, the search filter doesn't return the most accurate results.

Special characters:

. + - && || ! ( ) { } [ ] ^ " ~ * ? : \

Workaround: To obtain the most accurate results, search only for the keyword after the special character. For example, instead of AUTO_DV1.SF_CUSTOMER, search for SF_CUSTOMER.

Predefined governance artifacts might not be available

If you don't see any predefined classifications or data classes, reinitialize your tenant by using the following API call:

curl -X POST "https://api.dataplatform.cloud.ibm.com/v3/glossary_terms/admin/initialize_content" -H "Authorization: Bearer $BEARER_TOKEN" -k

Assets are blocked if evaluation fails

The following restrictions apply to data assets in a catalog with policies enforced: File-based data assets that have a header can't have duplicate column names, a period (.), or single quotation mark (') in a column name.

If evaluation fails, the asset is blocked to all users except the asset owner. All other users see an error message that the data asset cannot be viewed because evaluation failed and the asset is blocked.

DataStage lineage job can produce no lineage

If you create a DataStage project export through the API, the GET /v2/asset_exports/{export_id} API endpoint might not finish and returns a PENDING status. As a result, the DataStage lineage job appears to complete but produces no lineage.

Workaround: Manually upload the project export as an asset and add the .zip file to the metadata import job to be processed.

Information about documents that couldn't be analyzed is not recorded

When you run an unstructured data curation analysis flow in a Spark runtime environment, information about which documents couldn't be analyzed is not recorded. The number of ingested and analyzed documents doesn't match, but there is no information about which files weren't analyzed and why.

When you run the flow in a Python runtime environment, error information is captured as expected.

In flows where a single document class is selected, classification and extraction might not work

In a flow where only one document class is provided for processing, the documents might not be properly processed.

In the Unstructured Data Integration flow, the Classification operator or the Extract operator might fail to classify the documents or extract any entities, respectively.

In unstructured data curation, the analysis flow might properly classify the documents. However, when you run the processing flow, the metrics might show that the documents were skipped for extraction or no entities were extracted.

Workaround: Manually update the generated flow:

  1. Replace the Classification operator with an Extract operator, and select all document classes.
  2. Remove any additional Extract operators that appear later in the flow.

watsonx.data known issues and limitations

You might encounter some of the following issues when working with watsonx.data.

Retrieval search limitation with ReAct strategy in GPT-OSS deployments

Retrieval search fails if the user has only vector details in the DL and selects the GPT-OSS model.

Inconsistent behavior when disabling GenAI ACLs

When GenAI ACLs are disabled through the UI, row-level filtering in Presto might continue to function without errors until the associated ACL bucket is removed from watsonx.data. This behavior is incidental and should not be considered reliable.

The system occasionally misses information during entity extraction from documents

Some information might be missing when entities are extracted from documents. This issue is intermittent and is not always observed. When 10 or more data assets are imported, the extraction might miss 1-2 documents.

The ETL process does not support currency unit normalization for monetary entities

The ETL process does not currently support normalizing currency units for monetary entities. If invoices contain amounts in various currencies, the data is extracted as-is without any currency conversion or normalization.

Limitations during Text-to-SQL conversion

The following limitations are observed with Text-to-SQL conversion:

  • Semantic matching of the schema - The LLM does not correctly match columns with semantic similarity.
  • The wrong SQL dialect is used for date queries.
  • Only VARCHAR columns are supported; operations on other data types result in non-executable SQL.
  • If a single field contains multiple separate values, the entire field content is treated as a single unified value.

Limitations for the granite retrieval service model

The following limitations have been observed in the use of the granite retrieval service model:

  • The system does not cast columns to DOUBLE or DATE even if their names suggest they store double or date values when the granite retrieval service model is used.
  • When the user query includes explicit instructions to perform operations such as summing, averaging, or filtering based on comparisons (for example, whether a value is greater than or less than another value), the system generates non-executable SQL statements.

Limitations for the llama retrieval service model

The following limitations have been observed in the use of the llama retrieval service model:

  • When casting date values, only the following date formats are accepted:
    %b %d, %Y, %m-%d-%Y, %y-%m-%d, %d.%m.%Y, %m.%d.%Y, %d-%M-%Y, %Y-%m-%d, %d/%m/%Y, %m/%d/%Y

    If the value is not in a supported format, the default date value 2000-01-01 is assigned.
  • If the value is not a valid double, the system does not cast it to DOUBLE and assigns NULL instead.
  • When the system reuses an already cast column in another clause (such as ORDER BY or GROUP BY), it applies the cast and regex replacement again. This causes errors because the regex accepts only VARCHAR parameters, but the column's data type has already changed due to the initial cast, leading to SQL execution failures.

Row filtering based on user groups will not be applied when ACLs are ingested from data sources that contain user groups.

Limitations in customizable schema support in Milvus

Milvus currently imposes the following limitations on customizable schema support, as shown in the sketch after this list:

  • Each collection must include at least one vector field.
  • A primary key field is required.
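
As a minimal sketch that satisfies both requirements (the field names, dimension, and use of the pymilvus client are illustrative assumptions, not product requirements):

    from pymilvus import CollectionSchema, FieldSchema, DataType

    # A primary key field is required.
    doc_id = FieldSchema(name="document_id", dtype=DataType.VARCHAR, max_length=128, is_primary=True)

    # Each collection must include at least one vector field.
    embedding = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384)

    schema = CollectionSchema(fields=[doc_id, embedding], description="Example collection schema")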

Retrieval service behavior: Milvus hybrid search across all vector fields

When multiple vector fields are defined in a collection, Retrieval Service performs a Milvus hybrid search across all of them, regardless of their individual relevance to the query.

Ensuring data governance with the document_id field

A document_id field is required to ensure proper data governance and to enable correlation between vector and SQL-based data.

System fails to transform values appropriately

The system does not convert shorthand values (for example, 30K to 30000) or standardize currency formats (for example, $39M), affecting data consistency.