Resetting status of ongoing Watson OpenScale model evaluations
After you upgrade a Watson OpenScale instance, you can reset the status of ongoing model evaluations that aren't running properly.
Before you begin
You must get an API key. For more information, see Generating an API authorization token.
About this task
When you complete a non-disruptive monthly upgrade of a Watson OpenScale instance, some of its features, such as scheduled or on-demand model evaluations, might not function properly. When the upgrade finishes, you can use the following steps to reset the status of ongoing model evaluations:
Procedure
- Log in to Red Hat OpenShift Container Platform with the following command:
    oc login <OpenShift_URL>:<port>
- Scale down the Watson OpenScale microservices with the following commands:
    instanceProjectName='cpd-instance'
    instanceCRName='aiopenscale'
    oc scale deployment -n "${instanceProjectName}" -l "component in (aios-bias,aios-bkpi,aios-drift,aios-explainability,aios-fast,aios-feedback,aios-ml,aios-mrm,aios-notification,aios-scheduling)" --replicas=0
  If you did not install Cloud Pak for Data in the cpd-instance project, or if you did not use aiopenscale as the name of the Watson OpenScale custom resource, set instanceProjectName and instanceCRName to the correct values.
- Log in to the operator pod with the following commands:
    operatorProjectName='cpd-operator'
    OPERATOR_POD_NAME=$(oc get pods -n "${operatorProjectName}" | grep wos | awk '{print $1}')
    oc exec --tty --stdin "${OPERATOR_POD_NAME}" -n "${operatorProjectName}" -- /bin/bash
  If you did not install the Watson OpenScale operator in the cpd-operator project, set operatorProjectName to the correct value.
- In the operator pod, set the environment variables that are required to reset the evaluation status with the following commands:
    instanceProjectName='cpd-instance'
    instanceCRName='aiopenscale'
    export ETCD_ENDPOINTS=https://${instanceCRName}-ibm-aios-etcd.${instanceProjectName}.svc.cluster.local:2379
    export ETCD_USER=root
    export ETCD_PASSWORD=$(kubectl get secret "${instanceCRName}-ibm-aios-etcd-secrets" -n "${instanceProjectName}" -o jsonpath='{.data.etcd-root-password}' | base64 -d)
    export ETCD_CACERT_BASE64=$(kubectl get secret internal-tls -n "${instanceProjectName}" -o jsonpath='{.data.ca\.crt}')
    export AIOS_GATEWAY_URL=https://${instanceCRName}-ibm-aios-nginx-internal.${instanceProjectName}
    export AIOS_SERVICE_CREDENTIALS=<api_token>
  Replace <api_token> with your API authorization token. If you did not use the default project and custom resource names, set instanceProjectName and instanceCRName to the same values that you used when you scaled down the microservices.
- Navigate to the files folder in the operator pod with the following command:
    cd roles/service/files
- Start the reset procedure by specifying the required arguments, as shown in the following example:
    RESET_TIMESTAMP='2022-05-30T00:00:00.000Z'
    DATA_MART_IDS='00000000-0000-0000-0000-000000000000,00000000-0000-0000-0000-1655797537073567'
    ./wos_restore.sh -t "${RESET_TIMESTAMP}" --delta 30 -i "${DATA_MART_IDS}" -p
  RESET_TIMESTAMP is the ISO 8601 timestamp from which the status of ongoing evaluations is reset. You must specify the timestamp in the YYYY-MM-DDTHH:MM:SS.sssZ format.
  DATA_MART_IDS is a comma-separated list of the identifiers of the Watson OpenScale data marts to restore. Each data mart identifier is the Watson OpenScale service instance identifier with the 00000000-0000-0000-0000- prefix. The default Watson OpenScale service instance uses the fixed data mart ID 00000000-0000-0000-0000-000000000000.
  You can use the following command to view a list of Watson OpenScale service instance identifiers:
    curl -s -k -H "Authorization: Bearer ${TOKEN}" "https://internal-nginx-svc.${instanceProjectName}.svc:12443/zen-data/v3/service_instances?addon_type=aios&fetch_all_instances=true" | jq -r '.service_instances[] | [.id, .display_name] | @tsv'
  Note: Before you run this command, you must generate an API authorization token and export it as the TOKEN environment variable. For details, see Generating an API authorization token.
  The command displays the ID and name of each Watson OpenScale service instance, as shown in the following example:
    1655797537073567    inst2
    1655691348195375    openscale-defaultinstance
- Run the exit command to exit the operator pod.
- After the restoration finishes, restart the aios-redis and aios-configuration service pods with the following commands:
    oc delete pod -n "${instanceProjectName}" -l app.kubernetes.io/component=aios-redis
    oc delete pod -n "${instanceProjectName}" -l app.kubernetes.io/component=aios-configuration
- Force the Watson OpenScale operator to reconcile the Watson OpenScale instance with the following command:
    oc patch WOService "${instanceCRName}" -n "${instanceProjectName}" --type merge --patch '{"spec": {"forceReconcile": "'$(date +%s)'"}}'
- Check the status of the Watson OpenScale custom resource reconciliation with the following command:
    oc get WOService "${instanceCRName}" -n "${instanceProjectName}" -o jsonpath='{.status.wosStatus} {"\n"}'
  When the reconciliation finishes successfully, the status of the custom resource changes to Completed.
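For the step that scales down the Watson OpenScale microservices, you might want to confirm that the scale-down took effect before you log in to the operator pod. The following minimal sketch assumes that you pipe in the output of oc get deployment with the --no-headers flag and that the second column is READY (for example, 0/0). The function name all_scaled_down is hypothetical and is not part of Watson OpenScale; pods can also take a short time to terminate after the deployment reports 0/0.

```shell
# Hypothetical helper (not part of Watson OpenScale): reads the output of
#   oc get deployment -n <project> -l "<selector>" --no-headers
# on stdin and succeeds only if at least one deployment is listed and every
# listed deployment reports 0/0 in the READY column (assumed to be column 2).
all_scaled_down() {
  awk '
    NF == 0 { next }                       # ignore blank lines
    { seen = 1; if ($2 != "0/0") bad = 1 } # any other READY value fails
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Example usage, after the scale-down command:
#   oc get deployment -n "${instanceProjectName}" --no-headers \
#     -l "component in (aios-bias,aios-bkpi,aios-drift,aios-explainability,aios-fast,aios-feedback,aios-ml,aios-mrm,aios-notification,aios-scheduling)" \
#     | all_scaled_down && echo "All selected deployments are scaled down"
```

An empty listing is treated as a failure on purpose, because it more likely means that the label selector or project name is wrong than that the scale-down succeeded.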
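In the step that logs in to the operator pod, the command that sets OPERATOR_POD_NAME prints every pod name that contains wos, so it can return more than one name if several pods match (for example, while an old pod is still terminating). The following minimal sketch picks only the first matching pod that is Running. It assumes the default oc get pods column layout (NAME, READY, STATUS, ...), and the function name pick_wos_pod is hypothetical.

```shell
# Hypothetical helper: reads `oc get pods --no-headers` output on stdin and
# prints the name of the first pod whose name contains "wos" and whose
# STATUS column (assumed to be column 3) is Running. Prints nothing otherwise.
pick_wos_pod() {
  awk '$1 ~ /wos/ && $3 == "Running" { print $1; exit }'
}

# Example usage:
#   OPERATOR_POD_NAME=$(oc get pods -n "${operatorProjectName}" --no-headers | pick_wos_pod)
#   [ -n "${OPERATOR_POD_NAME}" ] || echo "No running pod matched 'wos'" >&2
```

The pod names in any sample input you test with are illustrative only; check the actual operator pod name in your cluster.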
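If one of the environment variables from the step inside the operator pod is empty (for example, because a kubectl get secret command failed) or still holds the <api_token> placeholder, the restore script might fail later in a way that is harder to diagnose. The following minimal pre-flight check is a hypothetical POSIX shell function, not something that ships with the operator.

```shell
# Hypothetical pre-flight check: for each variable name given as an argument,
# report it if it is unset, empty, or still looks like a <placeholder>.
# Returns 0 only if every variable has a real-looking value.
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named $name
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    else
      case $value in
        '<'*'>') echo "Placeholder value still set: $name" >&2; missing=1 ;;
      esac
    fi
  done
  return "$missing"
}

# Example usage, inside the operator pod after the export commands:
#   require_env ETCD_ENDPOINTS ETCD_USER ETCD_PASSWORD ETCD_CACERT_BASE64 \
#     AIOS_GATEWAY_URL AIOS_SERVICE_CREDENTIALS && echo "All required variables are set"
```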
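The reset procedure requires RESET_TIMESTAMP in the YYYY-MM-DDTHH:MM:SS.sssZ format. The following minimal sketch checks only the shape of a value (it does not reject impossible dates such as month 13) and shows one way to produce a UTC timestamp in that format with date. The function name is hypothetical.

```shell
# Hypothetical helper: succeeds only if $1 has the exact shape
# YYYY-MM-DDTHH:MM:SS.sssZ, with a digit in every numeric position.
# It checks the shape only, not calendar validity.
is_valid_reset_timestamp() {
  case $1 in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9]Z)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# The current UTC time in the same format, with milliseconds fixed at .000:
#   date -u +%Y-%m-%dT%H:%M:%S.000Z
#
# Example usage before running wos_restore.sh:
#   is_valid_reset_timestamp "${RESET_TIMESTAMP}" || echo "Bad RESET_TIMESTAMP format" >&2
```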
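Each data mart ID is the service instance ID with the 00000000-0000-0000-0000- prefix, and the default instance uses the fixed ID 00000000-0000-0000-0000-000000000000. Based on that rule, a DATA_MART_IDS value can be assembled from the IDs that the service instance listing command returns. This is a minimal sketch; the function name and the convention of passing the word default for the default instance are hypothetical.

```shell
# Hypothetical helper: builds a comma-separated DATA_MART_IDS value.
# The argument "default" (a convention of this sketch only) maps to the fixed
# data mart ID of the default instance; every other argument is treated as a
# service instance ID and gets the 00000000-0000-0000-0000- prefix.
build_data_mart_ids() {
  prefix='00000000-0000-0000-0000-'
  ids=''
  for arg in "$@"; do
    if [ "$arg" = default ]; then
      id="${prefix}000000000000"
    else
      id="${prefix}${arg}"
    fi
    ids="${ids:+${ids},}${id}"   # add a comma only between entries
  done
  printf '%s\n' "$ids"
}

# Example usage, reproducing the DATA_MART_IDS value from the reset example:
#   DATA_MART_IDS=$(build_data_mart_ids default 1655797537073567)
```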
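Instead of rerunning the reconciliation status command by hand, you could poll it until it prints Completed. The following minimal sketch takes the status command as arguments so that it can be tested without a cluster. The function names are hypothetical, and the attempt limit and interval in the usage example are arbitrary assumptions, not documented values.

```shell
# Hypothetical polling helper.
# Usage: wait_for_completed MAX_ATTEMPTS INTERVAL_SECONDS command [args...]
# Runs the command up to MAX_ATTEMPTS times, sleeping INTERVAL_SECONDS between
# attempts, and returns 0 as soon as its output (whitespace removed) is
# "Completed". Returns 1 if the limit is reached first.
wait_for_completed() {
  max_attempts=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    current_status=$("$@" 2>/dev/null | tr -d '[:space:]')
    echo "Attempt ${attempt}: status=${current_status:-<empty>}" >&2
    if [ "$current_status" = Completed ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$interval"
    fi
  done
  return 1
}

# Example usage (60 attempts, 30 seconds apart; both values are arbitrary):
#   get_wos_status() {
#     oc get WOService "${instanceCRName}" -n "${instanceProjectName}" -o jsonpath='{.status.wosStatus}'
#   }
#   wait_for_completed 60 30 get_wos_status
```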