Known issues and limitations for Watson OpenScale
The following list contains the limitations and known issues for IBM Watson OpenScale.
Limitations
-
When you enable batch processing, Watson OpenScale has the following limitations:
- Only support for structured data
- Only support for production environments
- Supported combinations of environments:
- Remote Spark Cluster + Non-kerberized Hive
- Remote Spark Cluster + Kerberized Hive
- Remote Spark Cluster + Db2
- IAE + Non-kerberized Hive
- IAE + Db2
- IAE + Kerberized Hive
- During an evaluation request, you might see an error on the Model Summary window that states, “Evaluation for Quality/Drift monitor didn’t finish within 900 seconds.” Although you see the error, the actual monitor evaluation runs to completion. If you encounter this error, navigate back to the Insights dashboard, check whether a quality or drift score is visible on the deployment tile, and then return to the Model Summary window.
- You must create a new volume and not use the default volume when you use Analytics Engine Powered by Apache Spark to prepare your deployment environment.
- You must install dependencies by using Python 3.9.x or higher and upload them to the mount path when you use Analytics Engine Powered by Apache Spark to prepare your deployment environment.
- In your Hive table, if there is a column, whether it is a feature or not, that is named rawPrediction, configuring and evaluating the drift monitor fails.
- If a column named probability is in your Hive table and it is not configured with the modeling-role probability, configuring and evaluating the drift monitor fails.
- PySpark ML, the framework that is used to build the drift detection model, does not support Boolean fields when the drift model is trained. The training table must have any boolean columns represented as string.
- If the drift detection model was generated by running the configuration notebook against a Hadoop cluster (Cluster A) that is different from the Hadoop cluster (Cluster B) that is used for monitoring, evaluating the drift monitor fails. To correct this problem, you must perform the following steps (a command-line sketch follows this list):
- Download the drift archive by using the notebook.
- Extract the contents of the drift archive to a folder.
- In a text editor, open the ddm_properties.json file.
- Look for the drift_model_path property. This property contains the path where the drift model is stored in HDFS on this cluster.
- Download the folder in the drift_model_path to your local workstation.
- Copy this folder to an HDFS location /new/path in your production cluster.
- Update the drift_model_path property in the ddm_properties.json file. The new property looks like the following sample: hdfs://production_cluster_host:port/new/path
- Compress the contents of the drift archive folder as a tar.gz file. Do not compress the folder itself, only the contents. All the files must be present at the top level and not inside a folder in the archive.
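A minimal command-line sketch of these steps follows. The local folder names, source HDFS path, host, and port are hypothetical placeholders; substitute the values from your own ddm_properties.json file and clusters:
# On Cluster A: download the folder that drift_model_path points to (hypothetical source path shown)
hdfs dfs -get /user/drift/drift_model ./drift_model
# On Cluster B (production): upload the folder to the new HDFS location
hdfs dfs -put ./drift_model /new/path
# Edit ddm_properties.json so that drift_model_path points to the new location, for example:
#   "drift_model_path": "hdfs://production_cluster_host:port/new/path"
# Re-create the archive from the folder contents only, so that all files sit at the top level of the archive
tar -czf drift_archive.tar.gz -C ./extracted_drift_archive .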
-
When you configure settings for SHAP global explanations, Watson OpenScale has the following limitations:
- The sample size that you use to configure explanations can affect the number of explanations that Watson OpenScale can generate during specific time periods. If you attempt to generate multiple explanations for large sample sizes, Watson OpenScale might fail to process your transactions.
- If you configure explanations for multiple Watson OpenScale subscriptions, you must specify the default values for the sample size and number of perturbations settings when your deployment contains 20 features or fewer.
-
When you want to specify a model endpoint to configure a Watson Machine Learning batch deployment, Watson OpenScale has the following limitations:
- You must create a remote Watson Machine Learning provider.
- You can't use the Watson Machine Learning deployment space that contains your online deployment to add a batch deployment.
-
When you configure batch subscriptions, if the partition column name is changed for an existing table, Watson OpenScale does not validate whether that column exists in the table. You must verify that the partition column name that you specify is also in the table. If the partition column isn't in the table, your monitor evaluations might fail or run incorrectly.
-
When you configure batch subscriptions in Watson OpenScale version 4.5, if you change the partition column name after you create a new table, Watson OpenScale does not add the new partition column.
-
If you have a feature name that contains a period (.), the Chart Builder interface isn't displayed. Instead, you see a Failed to get data error message.
-
Watson OpenScale does not support custom metric endpoints that are deployed on remote Cloud Pak for Data clusters. You can use a custom notebook to specify the token_info endpoint to generate a token that you can use to add a custom metric endpoint.
-
Support for the zLinux platform has the following limitations:
- Only the scikit-learn and XGBoost frameworks and Python functions are supported for IBM Watson Machine Learning.
- Datamart databases must be created with a larger page size than the default value to work well with wide datasets, as shown in the following example:
CREATE DB {db name} PAGESIZE {PAGESIZE integer} (8192 for 8K or more)
- Hive tables that are created with the ORC format are not supported while monitoring batch subscriptions with IBM Analytics Engine and Hive.
-
Watson OpenScale does not support models where the data type of the model prediction is binary. You must change such models so that the data type of their prediction is a string or integer data type.
-
Drift is supported for structured data only.
-
Although classification models support both data and accuracy drift, regression models support only data drift.
-
Drift is not supported for Python functions.
-
If the training of the drift detection model doesn't meet the quality standards, then model drift detection is disabled. If model drift detection is disabled, a drop in model accuracy can't be detected. In this scenario, no action is required.
-
Support for the XGBoost framework has the following limitations for classification problems: For binary classification, Watson OpenScale supports the binary:logistic logistic regression function with an output as a probability of True. For multiclass classification, Watson OpenScale supports the multi:softprob function, where the result contains the predicted probability of each data point belonging to each class.
-
Fairness and drift metrics are not supported for unstructured (image or text) data types.
-
Having an equals sign (=) in the column name of a dataset causes an issue with explainability and generates the following error message: Error: An error occurred while computing feature importance. Do not use an equals sign (=) in a column name; it is not supported.
-
The maximum character limit for a service instance name is 41 characters.
-
Support for scikit-learn 0.20 is deprecated with IBM Watson OpenScale version 4.0.2. When you upgrade to IBM Watson OpenScale version 4.0.2, if your existing drift detection model uses scikit-learn 0.20, the drift detection model stops working. If you configured your IBM Watson OpenScale instance to detect both drift in accuracy and drift in data, the detection of drift in accuracy does not work. If you configured your IBM Watson OpenScale instance to detect only drift in accuracy, the drift detection monitor does not work because drift is not measured for payload data. Additionally, the model monitor evaluation page does not show a configured drift monitor and you cannot view any past drift metrics. To avoid these limitations, you must retrain your drift detection model and reconfigure the drift detection monitor.
-
If you are connecting to a Db2 database to import test data for model evaluations, you must specify uppercase column names in the input schema to correspond with the case-sensitive names in the database.
-
If you upload test data for your preproduction model evaluations that exceeds the default maximum data size of 10485760 bytes for the payload-logging-service-api pod, your upload might cause an error. To avoid this error, you must set the value for the -Dservice.defaults.import.max_csv_line_length option in the ADDITIONAL_JVM_OPTIONS environment variable to a larger size that fits your data set.
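The following sketch shows one possible way to apply this setting with oc set env. The deployment name and project are hypothetical placeholders, and an operator-managed installation might require a different, supported mechanism for setting environment variables:
# Hypothetical sketch: raise the maximum CSV line length to 20 MB (20971520 bytes) for the payload logging API.
# Replace <payload-logging-api-deployment> and <project-name> with the names from your installation.
oc set env deployment/<payload-logging-api-deployment> \
  ADDITIONAL_JVM_OPTIONS="-Dservice.defaults.import.max_csv_line_length=20971520" \
  -n <project-name>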
-
For proper processing of payload analytics, Watson OpenScale does not support column names with double quotation marks (") in the payload. This affects both scoring payload and feedback data in CSV and JSON formats.
-
Explainability is not supported for SPSS multiclass models that return only the winning class probability.
-
For IBM Watson Machine Learning, scoring input for image classification models that is sent for payload logging cannot exceed 1 MB. To avoid timeout issues, images must not exceed 100 x 100 x 3 pixels and must be sent sequentially so that the explanation for the second image is requested only after the first one is completed.
-
The Amazon SageMaker BlazingText algorithm input payload format is not supported in the current release of Watson OpenScale.
-
The use of Oracle compatibility mode causes problems for Watson OpenScale. You might receive an error, such as "Drift archive could not be uploaded for service instance" if you attempt to use any supported databases with Oracle compatibility mode activated. To use Watson OpenScale with your database, you must disable compatibility mode.
-
Scoring payloads for a model must fit within the maximum width that is allowed for the table that payload logging creates in the datamart database, with some buffer for the internal-use columns that IBM Watson OpenScale itself adds. In addition to the width limit, there is a hard-coded limit of 1012 features. Because many models have features of mixed types, the following sample configurations can be used for planning purposes:
- For int64 or float64 or strings of length 64 or less, count as 64.
- For strings from 65 to 2048, count as 2048.
- For strings from 2048 to 32 K, count as 32 K.
- The total length of all features should be no more than ~900 K.
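As a hypothetical illustration of this sizing, a model with 10 numeric features, 5 strings of at most 2048 characters, and 2 strings of up to 32 K can be estimated as follows:
# Hypothetical feature mix: 10 numeric, 5 medium strings (up to 2048), 2 long strings (up to 32 K)
echo $(( 10*64 + 5*2048 + 2*32768 ))   # 76416, well under the ~900 K guideline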
Known issues
Watson OpenScale has the following known issues:
- Watson OpenScale etcd pod fails to start
- Watson OpenScale instance upgrade from version 4.0.x with OCS storage fails
- Attempt to create new service instance fails after upgrade
- Fairness evaluation of Remote Spark model fails
- Watson OpenScale instance does not display correct status after shutdown
- Watson OpenScale etcd cluster pods fail to come up in healthy state on AWS cluster with Portworx storage
- Watson OpenScale instance fails to open
Watson OpenScale etcd pod fails to start
Watson OpenScale etcd pods might fail to start due to corrupted data. This issue can occur intermittently after you restore a Watson OpenScale instance from a backup instance. To fix this issue, you must complete the following steps:
-
Log in to Red Hat OpenShift Container Platform with the following command:
oc login <OpenShift_URL>:<port>
-
Scale down the etcd StatefulSet with the following commands:
instanceProjectName='cpd-instance'
instanceCRName='aiopenscale'
oc scale sts ${instanceCRName}-ibm-aios-etcd -n ${instanceProjectName} --replicas=0
If you did not install Cloud Pak for Data in the cpd-instance project or use aiopenscale as the name of the Watson OpenScale custom resource, specify accurate values in the instanceProjectName and instanceCRName fields.
-
Delete the persistent volume claim (PVC) that is bound to the etcd pod that failed, as shown in the following example:
oc patch pvc data-${instanceCRName}-ibm-aios-etcd-1 -n ${instanceProjectName} --type=merge -p '{"metadata": {"finalizers":null}}'
oc delete pvc data-${instanceCRName}-ibm-aios-etcd-1 -n ${instanceProjectName}
-
Scale up the etcd StatefulSet with the following command:
oc scale sts ${instanceCRName}-ibm-aios-etcd -n ${instanceProjectName} --replicas=3
This command replicates data from the other healthy etcd pods to the pod that failed.
-
Check the status of the etcd pods with the following command:
oc get pod -l component=aios-etcd -n ${instanceProjectName}
Watson OpenScale instance fails to open
Your attempt to open a Watson OpenScale instance might fail due to the following error:
Error loading public access token key
To fix this issue, you must complete the following steps:
-
Log in to Red Hat OpenShift Container Platform with the following command:
oc login <OpenShift_URL>:<port>
-
Run the following commands to force the Watson OpenScale operator to reconcile the Watson OpenScale custom resource:
instanceProjectName='cpd-instance'
instanceCRName='aiopenscale'
oc patch WOService ${instanceCRName} -n ${instanceProjectName} --type merge --patch '{"spec": {"ignoreForMaintenance": true}}'
oc patch WOService ${instanceCRName} -n ${instanceProjectName} --type merge --patch '{"spec": {"ignoreForMaintenance": false}}'
If you did not install Cloud Pak for Data in the cpd-instance project or use aiopenscale as the name of the Watson OpenScale custom resource, specify accurate values in the instanceProjectName and instanceCRName fields.
-
Check the status of the reconciliation with the following command:
oc get WOService ${instanceCRName} -n ${instanceProjectName} -o jsonpath='{.status.wosStatus} {"\n"}'
The status of the custom resource changes to Completed when the reconciliation finishes successfully.
Watson OpenScale instance upgrade from version 4.0.x with OCS storage fails
When you upgrade Watson OpenScale to version 4.5.x or 4.6.0 from a 4.0.x version that uses Red Hat OpenShift Container Storage (OCS) classes, the periodic reconciliation of the Watson OpenScale custom resource might fail. This failure might occur because multiple storage classes are used during the upgrade for the block and file storage types. Watson OpenScale version 4.0.x installations support only a single storage class for any storage type. To fix this issue, you must complete the following steps:
-
Log in to Red Hat OpenShift Container Platform with the following command:
oc login <OpenShift_URL>:<port>
-
Use the patch command to change the Watson OpenScale custom resource configuration and use a single storage class for the block and file storage parameters:
instanceProjectName='cpd-instance'
instanceCRName='aiopenscale'
oc patch WOService ${instanceCRName} -n ${instanceProjectName} --type merge --patch '{"spec": {"blockStorageClass": "ocs-storagecluster-cephfs"}}'
If you did not install Cloud Pak for Data in the cpd-instance project or use aiopenscale as the name of the Watson OpenScale custom resource, specify accurate values in the instanceProjectName and instanceCRName fields.
-
Check the status of the reconciliation with the following command:
oc get WOService ${instanceCRName} -n ${instanceProjectName} -o jsonpath='{.status.wosStatus} {"\n"}'
The status of the custom resource changes to Completed when the reconciliation finishes successfully.
Attempt to create new service instance fails after upgrade
When you upgrade Watson OpenScale from version 3.5.x to version 4.6.x, your attempt to create a new service instance might fail with the ez_addon_provision_failed error. To fix this issue, you must complete the following steps:
-
Log in to Red Hat OpenShift Container Platform with the following command:
oc login <OpenShift_URL>:<port>
-
Delete the Watson OpenScale version 3.5.x ConfigMap object with the following command:
instanceProjectName='cpd-instance'
oc delete cm ibm-addon-config-aios -n $instanceProjectName
-
Restart the platform zen-watcher pod with the following command:
oc delete pod -l component=zen-watcher -n $instanceProjectName
-
Check the status of the pod with the following command:
oc get pod -l component=zen-watcher -n $instanceProjectName
Fairness evaluation of Remote Spark model fails
When you run a fairness evaluation in Watson OpenScale on a model that uses Remote Spark version 3.1 or earlier, the evaluation might fail due to the following error:
Detected implicit cartesian product for INNER join between logical plans....
This error occurs when the Remote Spark environment misidentifies the join operation as a cartesian product operation. You can fix this issue by upgrading to Spark version 3.2 or later.
Watson OpenScale instance does not display correct status after shutdown
After you shut down a Watson OpenScale instance, the status of the instance displays as pending or failed on the Cloud Pak for Data service instance details page. The shutdown is still successful, and you can use the following command to verify that the instance is not running:
instanceProjectName='cpd-instance'
instanceCRName='aiopenscale'
oc get WOService ${instanceCRName} -n ${instanceProjectName} -o jsonpath='{.status.wosStatus} {"\n"}'
If you did not install Cloud Pak for Data in the cpd-instance project or use aiopenscale as the name of the Watson OpenScale custom resource, specify accurate values in the instanceProjectName and instanceCRName fields.
The status of the instance displays as shutdown to confirm that the shutdown finished successfully.
Applies to: 4.6.0
Fixed in: 4.6.1
Watson OpenScale etcd cluster pods fail to come up in healthy state on AWS cluster with Portworx storage
If you install Watson OpenScale on AWS with Portworx storage, you might encounter a failure where the microservice pods wait for the Watson OpenScale etcd cluster to come up and become healthy. This failure can occur when the persistent volumes that back the Watson OpenScale etcd cluster are slow to run read/write operations, which causes the etcd cluster to fail. To resolve this issue, increase the health check livenessProbe timeout settings in the etcd StatefulSet podSpecTemplate.
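The field names for podSpecTemplate overrides vary by release. As a minimal illustration only, the following patch raises the probe timeout directly on the StatefulSet; it assumes the default custom resource name (aiopenscale), project (cpd-instance), and container layout, and the operator might revert the change during its next reconciliation:
# Hypothetical sketch: raise the etcd liveness probe timeout to 30 seconds on the StatefulSet pod template.
oc patch sts aiopenscale-ibm-aios-etcd -n cpd-instance --type json \
  -p '[{"op":"replace","path":"/spec/template/spec/containers/0/livenessProbe/timeoutSeconds","value":30}]'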
Note: The Watson OpenScale installation uses persistent volumes that require RWO and RWX access modes. The Watson OpenScale etcd cluster requires a persistent volume with the RWO access mode. Occasionally, on certain AWS clusters that use the default portworx-shared-gp3 storage class for the Watson OpenScale installation, the etcd cluster fails to initialize itself to a healthy state. The problem is caused by the slower persistent volume that the default storage class provides. To resolve this issue, install Watson OpenScale with a different storage class configured for the RWO access mode. For example:
cat <<EOF | oc apply -f -
apiVersion: wos.cpd.ibm.com/v1
kind: WOService
metadata:
  name: aiopenscale              # This is the recommended name, but you can change it
  namespace: project-name        # Replace with the project where you will install Watson OpenScale
spec:
  scaleConfig: small | medium    # The default value is `small`, but you can scale up to `medium`
  license:
    accept: true
    license: Standard | Enterprise    # Specify the license you purchased
  version: 4.0.6
  type: service
  storageClass: portworx-shared-gp3
  rwoStorageClass: portworx-metastoredb-sc
EOF
If you encounter errors that indicate that scoring against Azure Machine Learning Service cannot be reached, such as an HTTP status code of 403, check your enterprise security policies and ensure that the scoring URL is re-categorized with the appropriate tools, as needed, so that Watson OpenScale can access the scoring endpoints.
Parent topic: Evaluating AI models with Watson OpenScale