Backing up and restoring the Watson Machine Learning Accelerator service
Use this information to back up or restore the IBM Watson® Machine Learning Accelerator service.
- Online backup and restore of Watson Machine Learning Accelerator
- Offline backup and restore of Watson Machine Learning Accelerator
- After restoring Watson Machine Learning Accelerator
Online backup and restore of Watson Machine Learning Accelerator
Online backup
To complete an online backup, see Cloud Pak for Data online backup and restore.
Online restore
After you restore the Watson Machine Learning Accelerator service using the Cloud Pak for Data restore process, you must run an additional script to restore owner references to all Watson Machine Learning Accelerator resources.
Before you begin:
Before performing an online restore, make sure that the Watson Machine Learning Accelerator namespace is deleted.
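To confirm that the namespace is gone before you start the restore, you can run a quick check such as the following (a minimal sketch; wmla-namespace is a placeholder for the project that the service was installed in). The command should fail with a NotFound error if the namespace was deleted:
  oc get namespace wmla-namespace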
- Log in to your OpenShift cluster as a project administrator:
  oc login OpenShift_URL:port
- Switch to the Watson Machine Learning Accelerator namespace:
  oc project wmla-namespace
- To restore owner references to the Watson Machine Learning Accelerator resources, run the following script, depending on your version of Watson Machine Learning Accelerator. An optional spot check that you can run after the script completes follows it.
#!/bin/bash
wmla_name=`oc get wmla -o name|awk -F/ '{print $NF}'`
wmla_uid=`oc get wmla $wmla_name -o jsonpath='{.metadata.uid}'`
user_pvc=`oc get wmla $wmla_name -o jsonpath={.spec.usePreCreatedPvcs}`
for r in \
  certificate.cert-manager.io/wmla-ca-crt \
  certificate.cert-manager.io/wmla-internal-keys \
  certificate.cert-manager.io/wmla-nginx-keys \
  certificate.cert-manager.io/wmla-internal-keys-ecdsa \
  certificate.cert-manager.io/wmla-nginx-keys-ecdsa \
  certificate.cert-manager.io/wmla-worker-keys \
  configmap/cpd-wmla-br-cm \
  configmap/cpd-wmla-ckpt-cm \
  configmap/cpd-wmla-qu-cm \
  configmap/cpd-wmla-add-on-br-cm \
  configmap/wmla-edi-lbd-nginx \
  configmap/wmla-gpu-types \
  configmap/wmla-install-info-cm \
  configmap/wmla-watchdog-conf \
  configmap/wmla-wml-accelerator-instance-cm \
  configmap/wmla-dlpd-bootstrap \
  configmap/wmla-edi \
  configmap/wmla-edi-dlim \
  configmap/wmla-edi-imd-nginx \
  configmap/wmla-edi-isd \
  configmap/wmla-edi-isd-ingress \
  configmap/wmla-grafana-configmap \
  configmap/wmla-grafana-ini \
  configmap/wmla-grafana-providers \
  configmap/wmla-infoservice \
  configmap/wmla-jupyter-hub-config \
  configmap/wmla-logstash-conf \
  configmap/wmla-mongodb-shells \
  configmap/wmla-msd \
  configmap/wmla-mss \
  configmap/wmla-nginx-conf \
  configmap/wmla-nginx-grafana-sidecar-conf \
  configmap/wmla-nginx-sidecar-conf \
  configmap/wmla-prometheus \
  configmap/wmla-version-info \
  configmap/wmlaconfigmap \
  deployment.apps/wmla-auth-rest \
  deployment.apps/wmla-conda \
  deployment.apps/wmla-dlpd \
  deployment.apps/wmla-edi-imd \
  deployment.apps/wmla-edi-lbd \
  deployment.apps/wmla-grafana \
  deployment.apps/wmla-gui \
  deployment.apps/wmla-infoservice \
  deployment.apps/wmla-ingress \
  deployment.apps/wmla-jupyter-gateway \
  deployment.apps/wmla-jupyter-hub \
  deployment.apps/wmla-jupyter-proxy \
  deployment.apps/wmla-logstash \
  deployment.apps/wmla-msd \
  deployment.apps/wmla-mss \
  deployment.apps/wmla-prometheus \
  deployment.apps/wmla-watchdog \
  horizontalpodautoscaler.autoscaling/wmla-auth-rest-hpa \
  horizontalpodautoscaler.autoscaling/wmla-dlpd-hpa \
  horizontalpodautoscaler.autoscaling/wmla-edi-lbd-hpa \
  horizontalpodautoscaler.autoscaling/wmla-gui-hpa \
  horizontalpodautoscaler.autoscaling/wmla-ingress-hpa \
  horizontalpodautoscaler.autoscaling/wmla-watchdog-hpa \
  ingress.networking.k8s.io/wmla-jupyter-ingress \
  issuer.cert-manager.io/wmla-ca \
  issuer.cert-manager.io/wmla-root-issuer \
  networkpolicy.networking.k8s.io/wmla-dlpd-netpol \
  networkpolicy.networking.k8s.io/wmla-edi-imd-network-policy \
  networkpolicy.networking.k8s.io/wmla-edi-isd-network-policy \
  networkpolicy.networking.k8s.io/wmla-infoservice-netpol \
  networkpolicy.networking.k8s.io/wmla-ingress-network-policy \
  networkpolicy.networking.k8s.io/wmla-logstash-network-policy \
  networkpolicy.networking.k8s.io/wmla-msd-netpol \
  networkpolicy.networking.k8s.io/wmla-namespace-network-policy \
  persistentvolumeclaim/wmla-conda \
  persistentvolumeclaim/wmla-cws-share \
  persistentvolumeclaim/wmla-edi \
  persistentvolumeclaim/wmla-infoservice \
  persistentvolumeclaim/wmla-logging \
  persistentvolumeclaim/wmla-mygpfs \
  persistentvolumeclaim/wmla-grafana \
  persistentvolumeclaim/wmla-prometheus \
  poddisruptionbudget.policy/wmla-jupyter-hub-pdb \
  poddisruptionbudget.policy/wmla-jupyter-proxy-pdb \
  role.rbac.authorization.k8s.io/wmla-core-role \
  role.rbac.authorization.k8s.io/wmla-edi \
  role.rbac.authorization.k8s.io/wmla-msd-mss \
  role.rbac.authorization.k8s.io/wmla-notebook-role \
  role.rbac.authorization.k8s.io/wmla-role \
  rolebinding.rbac.authorization.k8s.io/wmla-core-rb \
  rolebinding.rbac.authorization.k8s.io/wmla-edi \
  rolebinding.rbac.authorization.k8s.io/wmla-msd-mss \
  rolebinding.rbac.authorization.k8s.io/wmla-notebook-rb \
  rolebinding.rbac.authorization.k8s.io/wmla-rb \
  route.route.openshift.io/wmla-console \
  route.route.openshift.io/wmla-grafana \
  route.route.openshift.io/wmla-inference \
  route.route.openshift.io/wmla-jupyter-notebook \
  secret/wmla-dlpd-conf \
  secret/wmla-eg-secret \
  secret/wmla-grafana-secret \
  secret/wmla-jupyter-hub-secret \
  secret/wmla-mongodb-secret \
  secret/wmla-prometheus-htpasswd \
  service/wmla-auth-rest \
  service/wmla-dlpd \
  service/wmla-edi \
  service/wmla-edi-admin \
  service/wmla-etcd \
  service/wmla-grafana \
  service/wmla-gui \
  service/wmla-inference \
  service/wmla-infoservice \
  service/wmla-ingress \
  service/wmla-jupyter-enterprise-gateway \
  service/wmla-jupyter-hub \
  service/wmla-jupyter-proxy-api \
  service/wmla-jupyter-proxy-public \
  service/wmla-logstash-service \
  service/wmla-mongodb \
  service/wmla-msd \
  service/wmla-mss \
  service/wmla-prometheus \
  serviceaccount/wmla-core-sa \
  serviceaccount/wmla-msd-mss \
  serviceaccount/wmla-norbac \
  serviceaccount/wmla-notebook-sa \
  serviceaccount/wmla-sa \
  statefulset.apps/wmla-etcd \
  statefulset.apps/wmla-mongodb \
  zenextension/zen-wmla-frontdoor-extension \
  zenextension/zen-wmla-edi-frontdoor-extension \
  wmla-add-on.spectrumcomputing.ibm.com/wmla; do
  oc get $r >& /dev/null
  if [ $? == "0" ]; then
    #skip patch user pvc
    if [ x$user_pvc == 'xtrue' ];then
      resourcetype=`echo $r|awk -F'/' '{print $1}'`
      if [ x$resourcetype == 'xpersistentvolumeclaim' ];then
        echo "skip user defined PVC $r"
        continue
      fi
    fi
    echo "Patch ownerReferences for $r"
    oc patch $r --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla\",\"name\":\"$wmla_name\",\"uid\":\"$wmla_uid\"}]}}"
  fi
done

#update ownerReferences for wmla resource plans
wmla_rps=`oc get rp -o name`
ns_rp=`oc get rp platform -o jsonpath={.spec.parent}`
wmla_fix_rp=`oc get rp platform -o jsonpath={.spec.children[0].name}`
cpd_fix_rp=`oc get rp platform -o jsonpath={.spec.children[1].name}`
for r in $wmla_rps; do
  #skip scheduler created resource plans
  rp_name=`echo $r|awk -F'/' '{print $2}'`
  if [ x$rp_name == "xplatform" -o x$rp_name == "x$ns_rp" -o x$rp_name == "x$wmla_fix_rp" -o x$rp_name == "x$cpd_fix_rp" ];then
    echo "skip resource plan $rp_name"
    continue
  fi
  echo "Patch ownerReferences for $r"
  oc patch $r --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla\",\"name\":\"$wmla_name\",\"uid\":\"$wmla_uid\"}]}}"
done

#update ownerReferences for deploy/isd and isd/service
isds=`oc get deploy -o name|grep wmla-edi-isd`
imd_uid=`oc get deploy wmla-edi-imd -o jsonpath='{.metadata.uid}'`
for r in $isds; do
  echo "Patch ownerReferences for $r"
  oc patch $r --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"apps/v1\",\"blockOwnerDeletion\":true,\"controller\":true,\"kind\":\"Deployment\",\"name\":\"wmla-edi-imd\",\"uid\":\"$imd_uid\"}]}}"
done
isd_services=`oc get services -o name|grep wmla-edi-isd`
for r in $isd_services; do
  isd_name=`echo $r|awk -F/ '{print $NF}'`
  isd_uid=`oc get deploy $isd_name -o jsonpath='{.metadata.uid}'`
  echo "Patch ownerReferences for $r"
  oc patch $r --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"apps/v1\",\"blockOwnerDeletion\":true,\"controller\":true,\"kind\":\"Deployment\",\"name\":\"$isd_name\",\"uid\":\"$isd_uid\"}]}}"
done

#update ownerReferences for wmla-add-on cm
wmla_add_on_name=`oc get wmla-add-on -o name|awk -F/ '{print $NF}'`
if [ x$wmla_add_on_name != x ];then
  wmla_add_on_uid=`oc get wmla-add-on $wmla_add_on_name -o jsonpath='{.metadata.uid}'`
  oc patch configmap/cpd-wmla-add-on-br-cm --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla-add-on\",\"name\":\"$wmla_add_on_name\",\"uid\":\"$wmla_add_on_uid\"}]}}"
  wmla_instance_cm=`oc get cm -o name|grep wml-accelerator-instance-cm`
  oc patch $wmla_instance_cm --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla-add-on\",\"name\":\"$wmla_add_on_name\",\"uid\":\"$wmla_add_on_uid\"}]}}"
  wmla_connection_cm=`oc get cm -o name|grep wml-accelerator-connection-info-extension`
  if [[ 'x' != x"$wmla_connection_cm" ]];then
    oc patch $wmla_connection_cm --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla-add-on\",\"name\":\"$wmla_add_on_name\",\"uid\":\"$wmla_add_on_uid\"}]}}"
  fi
  wmla_zen_extension=`oc get zenextension -o name|grep wml-accelerator-zen-extension|awk '{print $1}'`
  if [[ 'x' != x"$wmla_zen_extension" ]];then
    oc patch $wmla_zen_extension --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla-add-on\",\"name\":\"$wmla_add_on_name\",\"uid\":\"$wmla_add_on_uid\"}]}}"
  fi
fi

#remove unused sa docker config secret
sa_secrets=`oc get secret --field-selector type=kubernetes.io/dockercfg -o name|grep 'secret/wmla'`
for s in $sa_secrets;do
  owner=`oc get $s -o jsonpath='{.metadata.ownerReferences}' 2> /dev/null`
  if [ x$owner == 'x' ];then
    echo "remove $s"
    oc delete $s
  fi
done
Offline backup and restore of Watson Machine Learning Accelerator
Offline backup
Before you complete an offline backup of the Watson Machine Learning Accelerator service using the standard backup process, you must stop all running workloads.
- Stop all running workloads (an optional check that no workload pods remain follows this list):
  - As a Watson Machine Learning Accelerator project administrator, stop all running jobs. See Stopping an application.
  - Stop all running deployed models. Use the WML Accelerator command line interface to stop each running model. See Stop an inference service.
- Back up the Watson Machine Learning Accelerator service by using the standard backup process. See Cloud Pak for Data offline backup and restore (OADP utility).
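The following is an optional, minimal check that you can run before you take the backup. It assumes that your Watson Machine Learning Accelerator project is named wmla-namespace; review the output and confirm that no training job or inference service pods are still running:
  oc get pods -n wmla-namespace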
Offline restore
After you restore the Watson Machine Learning Accelerator service using the Cloud Pak for Data restore process, you must run an additional script to restore owner references to all Watson Machine Learning Accelerator resources.
Before you begin:
Before performing an offline restore, make sure that the Watson Machine Learning Accelerator namespace is deleted.
- Log in to your OpenShift cluster as a project administrator:
  oc login OpenShift_URL:port
- Switch to the Watson Machine Learning Accelerator namespace:
  oc project wmla-namespace
- To restore owner references to the Watson Machine Learning Accelerator resources, run the following script, depending on your version of Watson Machine Learning Accelerator. An optional spot check that you can run after the script completes follows it.
#!/bin/bash
wmla_name=`oc get wmla -o name|awk -F/ '{print $NF}'`
wmla_uid=`oc get wmla $wmla_name -o jsonpath='{.metadata.uid}'`
user_pvc=`oc get wmla $wmla_name -o jsonpath={.spec.usePreCreatedPvcs}`
for r in \
  certificate.cert-manager.io/wmla-ca-crt \
  certificate.cert-manager.io/wmla-internal-keys \
  certificate.cert-manager.io/wmla-nginx-keys \
  certificate.cert-manager.io/wmla-internal-keys-ecdsa \
  certificate.cert-manager.io/wmla-nginx-keys-ecdsa \
  certificate.cert-manager.io/wmla-worker-keys \
  configmap/cpd-wmla-br-cm \
  configmap/cpd-wmla-ckpt-cm \
  configmap/cpd-wmla-qu-cm \
  configmap/cpd-wmla-add-on-br-cm \
  configmap/wmla-edi-lbd-nginx \
  configmap/wmla-gpu-types \
  configmap/wmla-install-info-cm \
  configmap/wmla-watchdog-conf \
  configmap/wmla-wml-accelerator-instance-cm \
  configmap/wmla-dlpd-bootstrap \
  configmap/wmla-edi \
  configmap/wmla-edi-dlim \
  configmap/wmla-edi-imd-nginx \
  configmap/wmla-edi-isd \
  configmap/wmla-edi-isd-ingress \
  configmap/wmla-grafana-configmap \
  configmap/wmla-grafana-ini \
  configmap/wmla-grafana-providers \
  configmap/wmla-infoservice \
  configmap/wmla-jupyter-hub-config \
  configmap/wmla-logstash-conf \
  configmap/wmla-mongodb-shells \
  configmap/wmla-msd \
  configmap/wmla-mss \
  configmap/wmla-nginx-conf \
  configmap/wmla-nginx-grafana-sidecar-conf \
  configmap/wmla-nginx-sidecar-conf \
  configmap/wmla-prometheus \
  configmap/wmla-version-info \
  configmap/wmlaconfigmap \
  deployment.apps/wmla-auth-rest \
  deployment.apps/wmla-conda \
  deployment.apps/wmla-dlpd \
  deployment.apps/wmla-edi-imd \
  deployment.apps/wmla-edi-lbd \
  deployment.apps/wmla-grafana \
  deployment.apps/wmla-gui \
  deployment.apps/wmla-infoservice \
  deployment.apps/wmla-ingress \
  deployment.apps/wmla-jupyter-gateway \
  deployment.apps/wmla-jupyter-hub \
  deployment.apps/wmla-jupyter-proxy \
  deployment.apps/wmla-logstash \
  deployment.apps/wmla-msd \
  deployment.apps/wmla-mss \
  deployment.apps/wmla-prometheus \
  deployment.apps/wmla-watchdog \
  horizontalpodautoscaler.autoscaling/wmla-auth-rest-hpa \
  horizontalpodautoscaler.autoscaling/wmla-dlpd-hpa \
  horizontalpodautoscaler.autoscaling/wmla-edi-lbd-hpa \
  horizontalpodautoscaler.autoscaling/wmla-gui-hpa \
  horizontalpodautoscaler.autoscaling/wmla-ingress-hpa \
  horizontalpodautoscaler.autoscaling/wmla-watchdog-hpa \
  ingress.networking.k8s.io/wmla-jupyter-ingress \
  issuer.cert-manager.io/wmla-ca \
  issuer.cert-manager.io/wmla-root-issuer \
  networkpolicy.networking.k8s.io/wmla-dlpd-netpol \
  networkpolicy.networking.k8s.io/wmla-edi-imd-network-policy \
  networkpolicy.networking.k8s.io/wmla-edi-isd-network-policy \
  networkpolicy.networking.k8s.io/wmla-infoservice-netpol \
  networkpolicy.networking.k8s.io/wmla-ingress-network-policy \
  networkpolicy.networking.k8s.io/wmla-logstash-network-policy \
  networkpolicy.networking.k8s.io/wmla-msd-netpol \
  networkpolicy.networking.k8s.io/wmla-namespace-network-policy \
  persistentvolumeclaim/wmla-conda \
  persistentvolumeclaim/wmla-cws-share \
  persistentvolumeclaim/wmla-edi \
  persistentvolumeclaim/wmla-infoservice \
  persistentvolumeclaim/wmla-logging \
  persistentvolumeclaim/wmla-mygpfs \
  persistentvolumeclaim/wmla-grafana \
  persistentvolumeclaim/wmla-prometheus \
  poddisruptionbudget.policy/wmla-jupyter-hub-pdb \
  poddisruptionbudget.policy/wmla-jupyter-proxy-pdb \
  role.rbac.authorization.k8s.io/wmla-core-role \
  role.rbac.authorization.k8s.io/wmla-edi \
  role.rbac.authorization.k8s.io/wmla-msd-mss \
  role.rbac.authorization.k8s.io/wmla-notebook-role \
  role.rbac.authorization.k8s.io/wmla-role \
  rolebinding.rbac.authorization.k8s.io/wmla-core-rb \
  rolebinding.rbac.authorization.k8s.io/wmla-edi \
  rolebinding.rbac.authorization.k8s.io/wmla-msd-mss \
  rolebinding.rbac.authorization.k8s.io/wmla-notebook-rb \
  rolebinding.rbac.authorization.k8s.io/wmla-rb \
  route.route.openshift.io/wmla-console \
  route.route.openshift.io/wmla-grafana \
  route.route.openshift.io/wmla-inference \
  route.route.openshift.io/wmla-jupyter-notebook \
  secret/wmla-dlpd-conf \
  secret/wmla-eg-secret \
  secret/wmla-grafana-secret \
  secret/wmla-jupyter-hub-secret \
  secret/wmla-mongodb-secret \
  secret/wmla-prometheus-htpasswd \
  service/wmla-auth-rest \
  service/wmla-dlpd \
  service/wmla-edi \
  service/wmla-edi-admin \
  service/wmla-etcd \
  service/wmla-grafana \
  service/wmla-gui \
  service/wmla-inference \
  service/wmla-infoservice \
  service/wmla-ingress \
  service/wmla-jupyter-enterprise-gateway \
  service/wmla-jupyter-hub \
  service/wmla-jupyter-proxy-api \
  service/wmla-jupyter-proxy-public \
  service/wmla-logstash-service \
  service/wmla-mongodb \
  service/wmla-msd \
  service/wmla-mss \
  service/wmla-prometheus \
  serviceaccount/wmla-core-sa \
  serviceaccount/wmla-msd-mss \
  serviceaccount/wmla-norbac \
  serviceaccount/wmla-notebook-sa \
  serviceaccount/wmla-sa \
  statefulset.apps/wmla-etcd \
  statefulset.apps/wmla-mongodb \
  zenextension/zen-wmla-frontdoor-extension \
  zenextension/zen-wmla-edi-frontdoor-extension \
  wmla-add-on.spectrumcomputing.ibm.com/wmla; do
  oc get $r >& /dev/null
  if [ $? == "0" ]; then
    #skip patch user pvc
    if [ x$user_pvc == 'xtrue' ];then
      resourcetype=`echo $r|awk -F'/' '{print $1}'`
      if [ x$resourcetype == 'xpersistentvolumeclaim' ];then
        echo "skip user defined PVC $r"
        continue
      fi
    fi
    echo "Patch ownerReferences for $r"
    oc patch $r --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla\",\"name\":\"$wmla_name\",\"uid\":\"$wmla_uid\"}]}}"
  fi
done

#update ownerReferences for wmla resource plans
wmla_rps=`oc get rp -o name`
ns_rp=`oc get rp platform -o jsonpath={.spec.parent}`
wmla_fix_rp=`oc get rp platform -o jsonpath={.spec.children[0].name}`
cpd_fix_rp=`oc get rp platform -o jsonpath={.spec.children[1].name}`
for r in $wmla_rps; do
  #skip scheduler created resource plans
  rp_name=`echo $r|awk -F'/' '{print $2}'`
  if [ x$rp_name == "xplatform" -o x$rp_name == "x$ns_rp" -o x$rp_name == "x$wmla_fix_rp" -o x$rp_name == "x$cpd_fix_rp" ];then
    echo "skip resource plan $rp_name"
    continue
  fi
  echo "Patch ownerReferences for $r"
  oc patch $r --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla\",\"name\":\"$wmla_name\",\"uid\":\"$wmla_uid\"}]}}"
done

#update ownerReferences for deploy/isd and isd/service
isds=`oc get deploy -o name|grep wmla-edi-isd`
imd_uid=`oc get deploy wmla-edi-imd -o jsonpath='{.metadata.uid}'`
for r in $isds; do
  echo "Patch ownerReferences for $r"
  oc patch $r --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"apps/v1\",\"blockOwnerDeletion\":true,\"controller\":true,\"kind\":\"Deployment\",\"name\":\"wmla-edi-imd\",\"uid\":\"$imd_uid\"}]}}"
done
isd_services=`oc get services -o name|grep wmla-edi-isd`
for r in $isd_services; do
  isd_name=`echo $r|awk -F/ '{print $NF}'`
  isd_uid=`oc get deploy $isd_name -o jsonpath='{.metadata.uid}'`
  echo "Patch ownerReferences for $r"
  oc patch $r --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"apps/v1\",\"blockOwnerDeletion\":true,\"controller\":true,\"kind\":\"Deployment\",\"name\":\"$isd_name\",\"uid\":\"$isd_uid\"}]}}"
done

#update ownerReferences for wmla-add-on cm
wmla_add_on_name=`oc get wmla-add-on -o name|awk -F/ '{print $NF}'`
if [ x$wmla_add_on_name != x ];then
  wmla_add_on_uid=`oc get wmla-add-on $wmla_add_on_name -o jsonpath='{.metadata.uid}'`
  oc patch configmap/cpd-wmla-add-on-br-cm --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla-add-on\",\"name\":\"$wmla_add_on_name\",\"uid\":\"$wmla_add_on_uid\"}]}}"
  wmla_instance_cm=`oc get cm -o name|grep wml-accelerator-instance-cm`
  oc patch $wmla_instance_cm --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla-add-on\",\"name\":\"$wmla_add_on_name\",\"uid\":\"$wmla_add_on_uid\"}]}}"
  wmla_connection_cm=`oc get cm -o name|grep wml-accelerator-connection-info-extension`
  if [[ 'x' != x"$wmla_connection_cm" ]];then
    oc patch $wmla_connection_cm --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla-add-on\",\"name\":\"$wmla_add_on_name\",\"uid\":\"$wmla_add_on_uid\"}]}}"
  fi
  wmla_zen_extension=`oc get zenextension -o name|grep wml-accelerator-zen-extension|awk '{print $1}'`
  if [[ 'x' != x"$wmla_zen_extension" ]];then
    oc patch $wmla_zen_extension --type merge -p "{\"metadata\":{\"ownerReferences\":[{\"apiVersion\":\"spectrumcomputing.ibm.com/v1\",\"kind\":\"Wmla-add-on\",\"name\":\"$wmla_add_on_name\",\"uid\":\"$wmla_add_on_uid\"}]}}"
  fi
fi

#remove unused sa docker config secret
sa_secrets=`oc get secret --field-selector type=kubernetes.io/dockercfg -o name|grep 'secret/wmla'`
for s in $sa_secrets;do
  owner=`oc get $s -o jsonpath='{.metadata.ownerReferences}' 2> /dev/null`
  if [ x$owner == 'x' ];then
    echo "remove $s"
    oc delete $s
  fi
done
After restoring Watson Machine Learning Accelerator
After restoring Watson Machine Learning Accelerator, make sure to address the following known issues.
Known issue:
After restoring Watson Machine Learning Accelerator, a known issue exists where the wmla-etcd cluster is unhealthy and fails to get the status of one of its endpoints.
- Run the following command to check the status of the wmla-etcd cluster:
  oc exec -it wmla-etcd-0 -- bash -c "ETCDCTL_API=3 etcdctl --cacert=/etc/pki/etcd/ca.crt --cert=/etc/pki/etcd/tls.crt --key=/etc/pki/etcd/tls.key --insecure-skip-tls-verify endpoint status --cluster"
  The following error is displayed if the cluster is unhealthy:
  Defaulted container "etcd" out of: etcd, init-data-dir (init)
  {"level":"warn","ts":"2023-06-19T12:01:20.476Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://wmla-etcd-2.wmla-etcd:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup wmla-etcd-2.wmla-etcd on 172.30.0.10:53: no such host\""}
  Failed to get the status of endpoint https://wmla-etcd-2.wmla-etcd:2379 (context deadline exceeded)
  https://wmla-etcd-1.wmla-etcd:2379, 968d327db883b4b4, 3.3.27, 9.0 MB, true, 2715, 2044
  https://wmla-etcd-0.wmla-etcd:2379, f5b85a4577d2c8db, 3.3.27, 9.1 MB, false, 2715, 2044
- Modify the wmla-etcd statefulset so that the failed pod can be prepared for maintenance:
  oc edit statefulset wmla-etcd
  - Remove the livenessProbe and readinessProbe by deleting the following lines:
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 60
      periodSeconds: 30
      successThreshold: 1
      tcpSocket:
        port: 2379
      timeoutSeconds: 1
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 20
      successThreshold: 1
      tcpSocket:
        port: 2379
      timeoutSeconds: 1
  - Modify the container command as follows:
    containers:
    - command:
      - /bin/sh
      - -c
      - |
        PEERS="wmla-etcd-0=https://wmla-etcd-0.wmla-etcd:2380,wmla-etcd-1=https://wmla-etcd-1.wmla-etcd:2380,wmla-etcd-2=https://wmla-etcd-2.wmla-etcd:2380"
        ETCD_INITIAL_CLUSTER_STATE="new"
        if [ "$WMLA_ETCD_FAILURE_NODE" == "$HOSTNAME" -a ! -f /var/run/etcd/${HOSTNAME}.etcd/_recovered ]; then
          rm -rf /var/run/etcd/${HOSTNAME}.etcd
          echo "Restore ${HOSTNAME} in maintenance ..."
          ETCD_INITIAL_CLUSTER_STATE="existing"
          sleep 5
          mkdir -p /var/run/etcd/${HOSTNAME}.etcd
          touch /var/run/etcd/${HOSTNAME}.etcd/_recovered
        fi
        exec etcd --name ${HOSTNAME} \
          --listen-peer-urls https://0.0.0.0:2380 \
          --listen-client-urls https://0.0.0.0:2379 \
          --advertise-client-urls https://${HOSTNAME}.wmla-etcd:2379 \
          --initial-advertise-peer-urls https://${HOSTNAME}:2380 \
          --initial-cluster-token wmla-etcd-cluster \
          --initial-cluster ${PEERS} \
          --initial-cluster-state ${ETCD_INITIAL_CLUSTER_STATE} \
          --data-dir /var/run/etcd/${HOSTNAME}.etcd \
          --cert-file=/etc/pki/etcd/tls.crt \
          --key-file=/etc/pki/etcd/tls.key \
          --trusted-ca-file=/etc/pki/etcd/ca.crt \
          --client-cert-auth \
          --peer-cert-file=/etc/pki/etcd/tls.crt \
          --peer-key-file=/etc/pki/etcd/tls.key \
          --peer-trusted-ca-file=/etc/pki/etcd/ca.crt \
          --peer-client-cert-auth \
          --quota-backend-bytes=8589934592 \
          --cipher-suites TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- Set the WMLA_ETCD_FAILURE_NODE environment variable to wmla-etcd-2, the pod that failed in this example:
  oc set env statefulset/wmla-etcd WMLA_ETCD_FAILURE_NODE=wmla-etcd-2
- Restart the failed pod:
  oc delete pod wmla-etcd-2
- Run the following command to check the status of the wmla-etcd cluster again:
  oc exec -it wmla-etcd-0 -- bash -c "ETCDCTL_API=3 etcdctl --cacert=/etc/pki/etcd/ca.crt --cert=/etc/pki/etcd/tls.crt --key=/etc/pki/etcd/tls.key --insecure-skip-tls-verify endpoint status --cluster"
  The following output is displayed for a healthy cluster:
  Defaulted container "etcd" out of: etcd, init-data-dir (init)
  https://wmla-etcd-1.wmla-etcd:2379, 968d327db883b4b4, 3.3.27, 1.4 MB, true, 2815, 2094
  https://wmla-etcd-2.wmla-etcd:2379, cc26316c8c459e22, 3.3.27, 2.7 MB, false, 2815, 2094
  https://wmla-etcd-0.wmla-etcd:2379, f5b85a4577d2c8db, 3.3.27, 1.4 MB, false, 2815, 2094
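In addition to endpoint status, the etcdctl client also provides an endpoint health subcommand. As an optional extra check, you can run it with the same certificates that are used in the status command above; each member of a healthy cluster should be reported as healthy:
  oc exec -it wmla-etcd-0 -- bash -c "ETCDCTL_API=3 etcdctl --cacert=/etc/pki/etcd/ca.crt --cert=/etc/pki/etcd/tls.crt --key=/etc/pki/etcd/tls.key --insecure-skip-tls-verify endpoint health --cluster"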
Known issue:
After restoring Watson Machine Learning Accelerator, the wmla-mongodb-1 or wmla-mongodb-2 pod may fail to start.
If this issue has occurred, complete the following steps to start the pods. Depending on the status of the cluster, this procedure may take several minutes to complete.
- Scale down the MongoDB service to replica number 1:
  oc scale --replicas=1 sts wmla-mongodb -n <wmla_instance_namespace>
- Wait for the MongoDB pods to scale down and stabilize.
- Remove the wmla-mongodb-1 and wmla-mongodb-2 PVCs. Do not delete the wmla-mongodb-0 PVC.
  oc delete pvc data-wmla-mongodb-1 data-wmla-mongodb-2 -n <wmla_instance_namespace>
- Scale up the MongoDB service to replica number 3:
  oc scale --replicas=3 sts wmla-mongodb -n <wmla_instance_namespace>
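After you scale the service back up, you can optionally confirm that all three MongoDB pods start. The following minimal check lists the MongoDB pods; wmla-mongodb-0, wmla-mongodb-1, and wmla-mongodb-2 should all eventually report a Running status with all of their containers ready:
  oc get pods -n <wmla_instance_namespace> | grep wmla-mongodb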