Known issues and limitations
The following known issues and limitations apply to watsonx.data as a Service with gen AI experience.
- Regional limitations
- watsonx.ai Studio issues
- watsonx.ai Runtime issues
- Orchestration Pipelines issues
- Orchestration Pipelines limitations
- Cloud Object Storage issues
- watsonx.data intelligence
- watsonx.data known issues and limitations
watsonx.ai Studio issues
You might encounter some of these issues when getting started with and using watsonx.ai Studio:
Duplicating a notebook doesn't create a unique name in the new projects UI
When you duplicate a notebook in the new projects UI, the duplicate notebook is not created with a unique name.
Failure to export a notebook to HTML in the Jupyter Notebook editor
When you are working with a Jupyter Notebook created in a tool other than watsonx.ai Studio, you might not be able to export the notebook to HTML. This issue occurs when the cell output is exposed.
Workaround:
- In the Jupyter Notebook UI, go to Edit and click Edit Notebook Metadata.
- Remove the following metadata: "widgets": { "state": {}, "version": "1.1.2" }
- Click Edit.
- Save the notebook.
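If you prefer to strip the metadata programmatically rather than through the UI, the following is a minimal sketch that uses the nbformat library; the notebook path is a placeholder and the script is an illustration of the workaround, not part of watsonx.ai Studio.

```python
# Minimal sketch: remove the "widgets" metadata that blocks HTML export.
# Assumes the nbformat package is installed; "my_notebook.ipynb" is a placeholder path.
import nbformat

path = "my_notebook.ipynb"
nb = nbformat.read(path, as_version=4)

# Remove the notebook-level widgets state if it is present.
nb.metadata.pop("widgets", None)

# Some notebooks also carry widget state on individual cells; clear that too.
for cell in nb.cells:
    cell.get("metadata", {}).pop("widgets", None)

nbformat.write(nb, path)
```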
Connection to notebook kernel is taking longer than expected after running a code cell
If you try to reconnect to the kernel and immediately run a code cell (or if the kernel reconnection happened during code execution), the notebook doesn't reconnect to the kernel and no output is displayed for the code cell. You need to manually reconnect to the kernel by clicking Kernel > Reconnect. When the kernel is ready, you can try running the code cell again.
Using the predefined sqlContext object in multiple notebooks causes an error
You might receive an Apache Spark error if you use the predefined sqlContext object in multiple notebooks. Create a new sqlContext object for each notebook. See this Stack Overflow explanation.
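As a sketch of the suggested fix, you can build a notebook-local SQLContext from the SparkContext instead of reusing the predefined object. The snippet below assumes that sc is the SparkContext that the notebook runtime provides and that sample.json is a placeholder path.

```python
# Sketch: create a notebook-local sqlContext instead of sharing the predefined one.
# Assumes "sc" is the SparkContext already provided by the notebook runtime.
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
df = sqlContext.read.json("sample.json")  # placeholder path, for illustration only
```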
Connection failed message
If your kernel stops, your notebook is no longer automatically saved. To save it manually, click File > Save; a Notebook saved message appears in the kernel information area, before the Spark version. If you get a message that the kernel failed, click Kernel > Reconnect to reconnect your notebook to the kernel. If nothing restarts the kernel and you can't save the notebook, you can download it to preserve your changes by clicking File > Download as > Notebook (.ipynb), and then create a new notebook based on the downloaded notebook file.
Hyperlinks to notebook sections don't work in preview mode
If your notebook contains sections that you link to, for example from an introductory section at the top of the notebook, the links to these sections do not work if the notebook was opened in view-only mode in Firefox. However, if you open the notebook in edit mode, these links work.
Can't connect to notebook kernel
If you try to run a notebook and you see the message Connecting to Kernel, followed by Connection failed. Reconnecting, and finally by a connection failed error message, the reason might be that your firewall is blocking the notebook from running.
If watsonx.ai Studio is installed behind a firewall, you must add the WebSocket connection wss://dataplatform.cloud.ibm.com to the firewall settings. Enabling this WebSocket connection is required when you're using notebooks and RStudio.
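If you want a quick way to confirm that the host behind the wss:// endpoint is reachable from your network, the hedged sketch below opens a plain TLS connection to dataplatform.cloud.ibm.com on port 443. It only checks that the firewall allows outbound traffic to the host; it does not perform the WebSocket handshake itself.

```python
# Sketch: verify that the firewall allows outbound TLS traffic to the host
# used by the notebook WebSocket connection. This confirms basic network
# reachability only; it does not test the WebSocket upgrade.
import socket
import ssl

HOST = "dataplatform.cloud.ibm.com"
PORT = 443

context = ssl.create_default_context()
try:
    with socket.create_connection((HOST, PORT), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=HOST) as tls:
            print(f"Reachable: {HOST}:{PORT} ({tls.version()})")
except OSError as err:
    print(f"Blocked or unreachable: {err}")
```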
Files that are uploaded through the watsonx.ai Studio UI are not validated or scanned for potentially malicious content
Files that you upload through the watsonx.ai Studio UI are not validated or scanned for potentially malicious content. It is strongly recommended that you run security software, such as an antivirus application, on all files before you upload them to ensure the security of your content.
watsonx.ai Runtime issues
You might encounter some of these issues when working with watsonx.ai Runtime.
Region requirements
You can only associate a watsonx.ai Runtime service instance with your project when the watsonx.ai Runtime service instance and the watsonx.ai Studio instance are located in the same region.
Accessing links if you create a service instance while associating a service with a project
While you are associating a watsonx.ai Runtime service to a project, you have the option of creating a new service instance. If you choose to create a new service, the links on the service page might not work. To access the service terms, APIs, and documentation, right-click the links to open them in new windows.
Orchestration Pipelines issues
These issues pertain to Orchestration Pipelines.
Asset browser does not always reflect the count for total numbers of asset type
When you select an asset from the asset browser, such as a source for a Copy node, some asset types list the total number of assets available, but notebooks do not.
Cannot delete pipeline versions
Currently, you cannot delete saved versions of pipelines that you no longer need. All versions are deleted when the pipeline is deleted.
Cache appears enabled but is not enabled
If the Copy assets Pipelines node's Copy mode is set to Overwrite, cache is displayed as enabled but remains disabled.
Pipelines cannot save some SQL statements
Pipelines cannot save when SQL statements with parentheses are passed in a script or user variable.
To resolve this issue, replace all instances of parentheses with their respective ASCII code (( with #40 and ) with #41) and replace the code when you set it as a user variable.
For example, the statement select CAST(col1 as VARCHAR(30)) from dbo.table in a Run Bash script node causes an error. Instead, use the statement select CAST#40col1 as VARCHAR#4030#41#41 from dbo.table and replace the instances when setting it as a user variable.
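As an illustration of this workaround, the small helper below swaps parentheses for the #40 and #41 codes before the statement is stored, and restores them when the value is read back. The function names are placeholders for your own script; they are not part of the Orchestration Pipelines API.

```python
# Sketch: encode parentheses as #40/#41 before saving an SQL statement in a
# script or user variable, and decode them again where the statement is used.
# Helper names are placeholders; they are not part of Orchestration Pipelines.
def encode_parentheses(sql: str) -> str:
    return sql.replace("(", "#40").replace(")", "#41")

def decode_parentheses(sql: str) -> str:
    return sql.replace("#40", "(").replace("#41", ")")

statement = "select CAST(col1 as VARCHAR(30)) from dbo.table"
stored = encode_parentheses(statement)  # safe to place in a user variable
print(stored)                            # select CAST#40col1 as VARCHAR#4030#41#41 from dbo.table
print(decode_parentheses(stored))        # original statement restored
```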
Orchestration Pipelines abort when limit for annotations is reached
Pipeline expressions require annotations, which have a limit due to the limit for annotations in Kubernetes. If you reach this limit, your pipeline will abort without displaying logs.
Orchestration Pipelines limitations
These limitations apply to Orchestration Pipelines.
Single pipeline limits
These limitations apply to a single pipeline, regardless of configuration.
- A single pipeline cannot contain more than 120 standard nodes
- A pipeline with a loop cannot contain more than 600 nodes across all iterations (for example, 60 iterations with 10 nodes each)
Input and output size limits
Input and output values, which include pipeline parameters, user variables, and generic node inputs and outputs, cannot exceed 10 KB of data.
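A minimal way to check a value against this limit before wiring it into a pipeline is sketched below. The 10 KB figure comes from the limitation above; the validation itself, including the choice of JSON and UTF-8 serialization, is an assumption for illustration and not a Pipelines feature.

```python
# Sketch: check that a pipeline parameter or user variable stays under the
# 10 KB input/output limit. The serialization choice (JSON, UTF-8) is an
# assumption; adjust it to match how your value is actually stored.
import json

LIMIT_BYTES = 10 * 1024

def fits_pipeline_limit(value) -> bool:
    size = len(json.dumps(value).encode("utf-8"))
    return size <= LIMIT_BYTES

print(fits_pipeline_limit({"query": "select * from sales"}))  # True for small payloads
```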
Bash scripts throw errors with curl commands
The Bash scripts in your pipelines might cause errors if you implement curl commands in them. To prevent this issue, set your curl commands as parameters. To save a pipeline that causes an error when saving, try exporting the pipeline as an .isx file and importing it into a new project.
Cloud Object Storage issues
These issues apply to working with Cloud Object Storage.
Viewers might experience temporary limitations
When the IBM Cloud Object Storage configuration is out of date, project collaborators with the Viewer role might see this message:
Limited actions
The project storage is out of date. Some actions might not work for you until a project collaborator with the Admin or Editor role opens this project. To view the collaborators, go to the Manage page.
When a collaborator with the Editor or Admin role opens the project, the IBM Cloud Object Storage configuration is automatically updated.
watsonx.data intelligence
You might encounter some of the following issues when working with watsonx.data intelligence.
Catalog asset search doesn't support special characters
If search keywords contain any of the following special characters, the search filter doesn't return the most accurate results.
Search keywords:
. + - && || ! ( ) { } [ ] ^ " ~ * ? : \
Workaround: To obtain the most accurate results, search only for the keyword after the special character. For example, instead of AUTO_DV1.SF_CUSTOMER, search for SF_CUSTOMER.
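If you prepare search keywords programmatically, the workaround can be scripted along the lines below: the helper keeps only the portion of the keyword after the last special character, as the workaround describes (for example, AUTO_DV1.SF_CUSTOMER becomes SF_CUSTOMER). The helper is illustrative and is not part of the catalog search API.

```python
# Sketch: reduce a keyword to the part after the last special character so
# that catalog asset search returns more accurate results.
import re

# Characters from the list above; && and || reduce to single & and | characters.
SPECIAL_CHARS = '.+-&|!(){}[]^"~*?:\\'

def simplify_keyword(keyword: str) -> str:
    parts = [p for p in re.split("[" + re.escape(SPECIAL_CHARS) + "]+", keyword) if p]
    return parts[-1] if parts else keyword

print(simplify_keyword("AUTO_DV1.SF_CUSTOMER"))  # SF_CUSTOMER
```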
Predefined governance artifacts might not be available
If you don't see any predefined classifications or data classes, reinitialize your tenant by using the following API call:
curl -X POST "https://api.dataplatform.cloud.ibm.com/v3/glossary_terms/admin/initialize_content" -H "Authorization: Bearer $BEARER_TOKEN" -k
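If you prefer to make the same call from Python instead of curl, a rough equivalent is sketched below. It assumes that the bearer token is available in the BEARER_TOKEN environment variable, mirroring the curl example.

```python
# Sketch: Python equivalent of the curl call above. Assumes the bearer token
# is exported as the BEARER_TOKEN environment variable.
import os
import requests

url = "https://api.dataplatform.cloud.ibm.com/v3/glossary_terms/admin/initialize_content"
headers = {"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"}

# verify=False mirrors the -k flag in the curl example; drop it to enforce
# TLS certificate verification.
response = requests.post(url, headers=headers, verify=False)
response.raise_for_status()
print(response.status_code)
```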
Assets are blocked if evaluation fails
The following restrictions apply to data assets in a catalog with policies enforced: File-based data assets that have a header can't have duplicate column names, a period (.), or single quotation mark (') in a column name.
If evaluation fails, the asset is blocked to all users except the asset owner. All other users see an error message that the data asset cannot be viewed because evaluation failed and the asset is blocked.
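If your file-based assets risk failing evaluation because of duplicate column names or disallowed characters, a pre-upload cleanup along these lines may help. This is a hedged sketch that uses pandas; the renaming rules are assumptions you can adapt, not catalog requirements beyond the restrictions listed above.

```python
# Sketch: clean up column headers before uploading a file-based asset to a
# governed catalog, so evaluation is not blocked by duplicate names or by
# periods / single quotation marks in column names.
import pandas as pd

def sanitize_columns(df: pd.DataFrame) -> pd.DataFrame:
    seen = {}
    new_cols = []
    for col in df.columns:
        clean = col.replace(".", "_").replace("'", "")   # drop disallowed characters
        count = seen.get(clean, 0)
        seen[clean] = count + 1
        new_cols.append(clean if count == 0 else f"{clean}_{count}")  # de-duplicate
    df = df.copy()
    df.columns = new_cols
    return df

df = pd.DataFrame([[1, 2, 3]], columns=["cust.id", "cust.id", "owner's name"])
print(sanitize_columns(df).columns.tolist())  # ['cust_id', 'cust_id_1', 'owners name']
```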
DataStage lineage job can produce no lineage
If you create a DataStage project export through the API, the GET /v2/asset_exports/{export_id} API endpoint might not finish and returns a PENDING status. As a result, the DataStage lineage job appears to complete but produces no lineage.
Workaround: Manually upload the project export as an asset and add the .zip file to the metadata import job to be processed.
Information about documents that couldn't be analyzed is not recorded
When you run an unstructured data curation analysis flow in a Spark runtime environment, information about which documents couldn't be analyzed is not recorded. The number of ingested and analyzed documents doesn't match, but there is no information about which files weren't analyzed and why.
When you run the flow in a Python runtime environment, error information is captured as expected.
In flows where a single document class is selected, classification and extraction might not work
In a flow where only one document class is provided for processing, the documents might not be properly processed.
In the Unstructured Data Integration flow, the Classification operator or the Extract operator might fail to classify the documents or to extract any entities, respectively.
In unstructured data curation, the analysis flow might properly classify the documents. However, when you run the processing flow, the metrics might show that the documents were skipped for extraction or no entities were extracted.
Workaround: Manually update the generated flow:
- Replace the Classification operator with an Extract operator, and select all document classes.
- Remove any additional Extract operators that appear later in the flow.
watsonx.data known issues and limitations
You might encounter some of the following issues when working with watsonx.data.
Retrieval search limitation with ReAct strategy in GPT-OSS deployments
Retrieval search fails if the user has only vector details in the DL and selects the GPT-OSS model.
Inconsistent behavior when disabling GenAI ACLs
When GenAI ACLs are disabled through the UI, row-level filtering in Presto might continue to function without errors until the associated ACL bucket is removed from watsonx.data. This behavior is incidental and should not be considered reliable.
The system occasionally misses information during entity extraction from the document
Some information might be missing when entities are extracted from a document. This issue is intermittent and is not always observed. When 10 or more data assets are imported, the extraction might miss 1 or 2 documents.
The ETL process does not support currency unit normalization for monetary entities
The ETL process does not currently support normalizing currency units for monetary entities. This means that if invoices contain amounts in various currencies, the data is extracted as-is without any currency conversion or normalization.
Limitations during Text-to-SQL conversion
The following limitations are observed with Text to SQL conversion:
- Semantic matching of the schema: the LLM does not correctly match columns with semantic similarity.
- The wrong dialect might be used for date queries.
- Only VARCHAR columns are supported; other data types result in non-executable SQL when operations are requested.
- If a single field contains multiple separate values, the entire field content is treated as a single unified value.
Limitations for the granite retrieval service model
The following limitations have been observed in the use of the granite retrieval service model:
- When the granite retrieval service model is used, the system does not cast columns to DOUBLE or DATE, even if their names suggest that they store double or date values.
- When the user query includes explicit instructions to perform operations like summing, averaging, or filtering based on comparisons (for example, whether a value is greater than or less than another value), the system generates non-executable SQL statements.
Limitations for the llama retrieval service model
The following limitations have been observed in the use of the llama retrieval service model:
- When casting date values, only the following date formats are accepted: %b %d, %Y; %m-%d-%Y; %y-%m-%d; %d.%m.%Y; %m.%d.%Y; %d-%M-%Y; %Y-%m-%d; %d/%m/%Y; %m/%d/%Y. If a value is not in a supported format, the default date value 2000-01-01 is assigned. (A sketch of normalizing dates into a supported format follows this list.)
- If a value is not a valid double, the system does not cast it to DOUBLE and assigns NULL.
- When the system reuses an already cast column in another clause (such as ORDER BY or GROUP BY), it applies the cast and regex replacement again. This causes errors because the regex accepts only VARCHAR parameters, but the column's data type has already changed due to the initial cast, leading to SQL execution failures.
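One way to avoid the 2000-01-01 fallback is to normalize date strings into one of the supported formats before they reach the retrieval service. The sketch below converts a few common inputs to %Y-%m-%d; it is an assumption about pre-processing on your side, not part of the retrieval service, and the list of candidate input formats is illustrative.

```python
# Sketch: normalize date strings to the supported %Y-%m-%d format before
# ingestion, so values are not replaced with the 2000-01-01 default.
# The list of candidate input formats is an illustrative assumption.
from datetime import datetime
from typing import Optional

CANDIDATE_FORMATS = ["%d %B %Y", "%B %d %Y", "%Y/%m/%d", "%d-%m-%Y"]

def normalize_date(value: str) -> Optional[str]:
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # leave unsupported values for manual review

print(normalize_date("17 March 2024"))  # 2024-03-17
```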
Row-level filtering is not applied for ACLs ingested with user groups
Row filtering based on user groups will not be applied when ACLs are ingested from data sources that contain user groups.
Limitations in customizable schema support in Milvus
Milvus currently imposes the following limitations on customizable schema support:
- Each collection must include at least one vector field.
- A primary key field is required.
Retrieval service behavior: Milvus hybrid search across all vector fields
When multiple vector fields are defined in a collection, Retrieval Service performs a Milvus hybrid search across all of them regardless of their individual relevance to the query.
Ensuring data governance with the document_id field
A document_id field is required to ensure proper data governance and to enable correlation between vector and SQL-based data.
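The hedged sketch below shows one way to define a Milvus collection schema that satisfies both points: a required primary key, at least one vector field, and a document_id field for governance and correlation with SQL-based data. It assumes the pymilvus client, and the field names and vector dimension are illustrative choices rather than required values.

```python
# Sketch: a Milvus collection schema with a required primary key, at least one
# vector field, and a document_id field for governance and correlation with
# SQL-based data. Field names and the vector dimension are illustrative.
from pymilvus import CollectionSchema, FieldSchema, DataType

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="document_id", dtype=DataType.VARCHAR, max_length=256),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=8192),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384),
]

schema = CollectionSchema(fields=fields, description="Illustrative schema sketch")
```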
System fails to transform values appropriately
The system does not convert shorthand values (for example, 30K to 30000) or standardize currency formats (for example, $39M), affecting data consistency.
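If your downstream processing needs consistent numeric values, a post-extraction normalization step along these lines is one option. The suffix and currency handling below are assumptions for illustration; no exchange-rate conversion is attempted, in line with the limitation above.

```python
# Sketch: post-process extracted monetary values so shorthand suffixes and
# currency symbols are normalized. Suffix and currency handling is
# illustrative; no exchange-rate conversion is performed.
import re

SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def normalize_amount(raw: str):
    match = re.fullmatch(r"\s*([$€£]?)\s*([\d,.]+)\s*([KMB]?)\s*", raw.upper())
    if not match:
        return None  # leave unrecognized values untouched
    currency, number, suffix = match.groups()
    value = float(number.replace(",", "")) * SUFFIXES.get(suffix, 1)
    return {"currency": currency or None, "value": value}

print(normalize_amount("30K"))   # {'currency': None, 'value': 30000.0}
print(normalize_amount("$39M"))  # {'currency': '$', 'value': 39000000.0}
```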