Resetting status of ongoing Watson OpenScale model evaluations

After you upgrade a Watson OpenScale instance, you can reset the status of ongoing model evaluations that aren't running properly.

Before you begin

You must get an API key. For more information, see Generating an API authorization token.

About this task

When you complete a non-disruptive monthly upgrade of a Watson OpenScale instance, some of its features, such as scheduled or on-demand model evaluations, might not function properly. When the upgrade finishes, you can use the following steps to reset the status of ongoing model evaluations:

Procedure

  1. Log in to Red Hat OpenShift Container Platform with the following command:
    oc login <OpenShift_URL>:<port>
  2. Scale down the Watson OpenScale microservices with the following commands:
    instanceProjectName='cpd-instance'
    instanceCRName='aiopenscale'
    
    oc scale deployment -n ${instanceProjectName} -l "component in (aios-bias,aios-bkpi,aios-drift,aios-explainability,aios-fast,aios-feedback,aios-ml,aios-mrm,aios-notification,aios-scheduling)" --replicas=0
    

    If you did not install Cloud Pak for Data in the cpd-instance project, or did not use aiopenscale as the name of the Watson OpenScale custom resource, specify the correct values for the instanceProjectName and instanceCRName variables.

  3. Log in to the operator pod with the following command:
    operatorProjectName='cpd-operator'
    OPERATOR_POD_NAME=$(oc get pods -n ${operatorProjectName} | grep wos | awk '{print $1}')
    oc exec --tty --stdin ${OPERATOR_POD_NAME} -n ${operatorProjectName} -- /bin/bash
    

    If you did not install the Watson OpenScale operator in the cpd-operator project, specify the correct value for the operatorProjectName variable.

  4. Set the environment variables that are required to reset the evaluation status with the following commands:
    instanceProjectName='cpd-instance'
    instanceCRName='aiopenscale'
    
    export ETCD_ENDPOINTS=https://${instanceCRName}-ibm-aios-etcd.${instanceProjectName}.svc.cluster.local:2379
    export ETCD_USER=root
    export ETCD_PASSWORD=`kubectl get secret ${instanceCRName}-ibm-aios-etcd-secrets -n ${instanceProjectName} -o jsonpath='{.data.etcd-root-password}' | base64 -d`
    
    export ETCD_CACERT_BASE64=`kubectl get secret internal-tls -n ${instanceProjectName} -o jsonpath='{.data.ca\.crt}'`
    
    export AIOS_GATEWAY_URL=https://${instanceCRName}-ibm-aios-nginx-internal.${instanceProjectName}
    export AIOS_SERVICE_CREDENTIALS=<api_token>
    
    
  5. Navigate to the files folder in the operator by running the following command:
    cd roles/service/files
    
  6. Start the reset procedure by specifying the required arguments as shown in the following example:
    RESET_TIMESTAMP='2022-05-30T00:00:00.000Z'
    DATA_MART_IDS='00000000-0000-0000-0000-000000000000,00000000-0000-0000-0000-1655797537073567'
    ./wos_restore.sh -t ${RESET_TIMESTAMP} --delta 30 -i ${DATA_MART_IDS} -p
    

    The RESET_TIMESTAMP variable is the ISO 8601 timestamp from which the status of ongoing evaluations is reset. Specify the timestamp in the YYYY-MM-DDTHH:MM:SS.sssZ format.

    The DATA_MART_IDS variable is a comma-separated list of the identifiers of the Watson OpenScale data marts to restore. A data mart identifier is the Watson OpenScale service instance identifier with the 00000000-0000-0000-0000- prefix. The default Watson OpenScale service instance has the fixed data mart ID 00000000-0000-0000-0000-000000000000.

    You can use the following command to view a list of Watson OpenScale service instance identifiers:

    curl -s -k -H "Authorization: Bearer ${TOKEN}" "https://internal-nginx-svc.${instanceProjectName}.svc:12443/zen-data/v3/service_instances?addon_type=aios&fetch_all_instances=true" | jq -r '.service_instances[] | [.id, .display_name] | @tsv'
    
    Note: To run this command, you must generate a token and export it as the ${TOKEN} environment variable. For details, see Generating an API authorization token.
    The command displays the Watson OpenScale service instance IDs and names as pairs, as shown in the following example:
    1655797537073567	inst2
    1655691348195375	openscale-defaultinstance
    
  7. Run the exit command to exit the operator pod.
  8. After the restoration finishes, restart the aios-redis and aios-configuration service pods with the following commands:
    oc delete pod -n ${instanceProjectName} -l app.kubernetes.io/component=aios-redis
    oc delete pod -n ${instanceProjectName} -l app.kubernetes.io/component=aios-configuration
    
  9. Force the Watson OpenScale operator to reconcile the Watson OpenScale instance with the following command:
    oc patch WOService ${instanceCRName} -n ${instanceProjectName} --type merge --patch '{"spec": {"forceReconcile": "'$(date +%s)'"}}'
    
  10. Check the status of the Watson OpenScale custom resource reconciliation with the following command:
    oc get WOService ${instanceCRName} -n ${instanceProjectName} -o jsonpath='{.status.wosStatus} {"\n"}'
    

    The status of the custom resource changes to Completed when the reconciliation finishes successfully.
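
The data mart ID rule and the timestamp format from step 6 can be checked before you run wos_restore.sh. The following is a minimal sketch, not part of the product: the function names is_valid_reset_timestamp and build_data_mart_ids and the two constants are hypothetical helpers introduced here for illustration. It assumes that every non-default service instance ID is turned into a data mart ID by adding the 00000000-0000-0000-0000- prefix, as described in step 6, and that the default instance's fixed ID is added separately.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with wos_restore.sh.

DATA_MART_PREFIX='00000000-0000-0000-0000-'
DEFAULT_DATA_MART_ID='00000000-0000-0000-0000-000000000000'

# Succeeds only if the argument matches the YYYY-MM-DDTHH:MM:SS.sssZ layout.
# This checks the shape of the string, not whether the date itself is valid.
is_valid_reset_timestamp() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}Z$'
}

# Prints a comma-separated list of data mart IDs for the given
# non-default service instance IDs by prepending the prefix to each one.
build_data_mart_ids() {
  _list=''
  for _id in "$@"; do
    _list="${_list:+${_list},}${DATA_MART_PREFIX}${_id}"
  done
  printf '%s\n' "${_list}"
}

# Example with the values used in step 6.
RESET_TIMESTAMP='2022-05-30T00:00:00.000Z'
if ! is_valid_reset_timestamp "${RESET_TIMESTAMP}"; then
  echo "RESET_TIMESTAMP must use the YYYY-MM-DDTHH:MM:SS.sssZ format" >&2
fi

# Default instance (fixed ID) plus the instance named inst2 in the sample output.
DATA_MART_IDS="${DEFAULT_DATA_MART_ID},$(build_data_mart_ids 1655797537073567)"
echo "${DATA_MART_IDS}"
```

With the sample instance ID 1655797537073567, the script prints the same DATA_MART_IDS value that step 6 uses. Pass additional instance IDs as extra arguments to build_data_mart_ids if you need to reset more than one non-default instance.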