
API Connect Management Subsystem Restore Stuck with Message: Bringing up restored cluster

Troubleshooting


Problem

After restoring the Management database using a backup Custom Resource (CR), the restore process may fail to complete, with the status stuck at:
Bringing up restored cluster
In such cases, the recovery pod remains in a Pending state, preventing further progress. Investigation with kubectl describe pod often reveals issues such as missing PersistentVolumeClaims (PVCs), in particular the database WAL (-wal) claim.
Once the PVC issue is resolved, the restore must be re-attempted after cleaning up the artifacts left by the previous failed restore.

Diagnosing The Problem

1. Check Restore Status
kubectl -n <management namespace> get ManagementRestore --sort-by=.metadata.creationTimestamp

Example output:
NAME                                                                  STATUS     MESSAGE                                            BACKUP   CLUSTER                  PITR   AGE
managementrestore.management.apiconnect.ibm.com/mgmt-restore-02       Running    Bringing up restored cluster                                subin-apimanagement-mgmt        5d
2. Check Operator Logs for Restore Status
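The API Connect operator pod name can be found by listing the pods in the operator namespace; in a typical installation the pod name begins with ibm-apiconnect, though this prefix can vary:
kubectl get pods -n <api connect operator namespace> | grep -i apiconnect

Then search its logs for restore-related messages: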
kubectl logs pod/<ibm-apiconnect_operator_pod_name> -n <api connect operator namespace> | grep -i ManagementRestore

Example log entry (reformatted for readability):
{
    "level":"info",
    "ts":"2025-09-07T00:00:00.000Z",
    "logger":"controllers.ManagementCluster.Reconcile.reconcileManagementRestore.op-utils:CheckServicesReady",
    "msg":"EDB Cluster not ready",
    "mgmt-instance":"cp4i-subin/subin-apimanagement-mgmt",
    "Status.Phase":"Setting up primary"
}
3. Check EDB Operator Logs
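The EDB operator pod can be located in the same way (the postgresql-operator-controller-manager prefix is typical, but may differ by installation):
kubectl get pods -n <edb operator namespace> | grep -i postgresql-operator

Then review its logs for messages about the management database cluster: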
kubectl logs pod/<edb_operator_pod_name> -n <edb operator namespace> | grep -i <management db cluster name>

Example: the EDB operator pod name could be similar to postgresql-operator-controller-manager-1-22-5-5cb8f94c6-kpc8x (container: manager).

Example log entry (reformatted for readability):

{
    "level":"info",
    "ts":"2025-09-22T00:00:00Z",
    "msg":"The current primary instance is fenced or is still recovering from it, we won't trigger a switchover",
    "controller":"cluster",
    "controllerGroup":"postgresql.k8s.enterprisedb.io",
    "controllerKind":"Cluster",
    "Cluster":
    {
        "name":"subin-apiman-9816b16e-9816b16e-db",
        "namespace":"cp4i-subin"
    },
    "namespace":"cp4i-subin",
    "name":"subin-apiman-9816b16e-9816b16e-db",
    "reconcileID":"9479b8e2-d271-469e-9a79-27d195105424"
}
4. Check PVC Status
kubectl get pvc/subin-apiman-9816b16e-9816b16e-db-1 -o yaml -n <management namespace>
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    ....
    k8s.enterprisedb.io/pvcStatus: initializing
....
  name: subin-apiman-9816b16e-9816b16e-db-1
  namespace: cp4i-subin
  ownerReferences:
  - apiVersion: postgresql.k8s.enterprisedb.io/v1
    controller: true
    kind: Cluster
    name: subin-apiman-9816b16e-9816b16e-db
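As a cross-check for the scheduling failure shown in the next step, list the database PVCs in the Management namespace and confirm whether the corresponding -wal claim exists (the grep pattern below is illustrative; adjust it to your cluster name):
kubectl get pvc -n <management namespace> | grep db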
5. Check Recovery Pod Status
kubectl get pods -n <management namespace> -o wide | grep recovery
NAME                                                              READY   STATUS      RESTARTS      AGE     IP              NODE                                       NOMINATED NODE   READINESS GATES
subin-apiman-9816b16e-9816b16e-db-1-full-recovery-q9v57             0/1     Pending     0             3d7h    <none>          <none>                                     <none>           <none>
6. Review Pod Events
kubectl describe pod/subin-apiman-9816b16e-9816b16e-db-1-full-recovery-q9v57 -n <management namespace>
Events:
  Type     Reason            Age                      From               Message
  ----     ------            ----                     ----               -------
  Warning  FailedScheduling  6m21s (x4374 over 3d7h)  default-scheduler  0/14 nodes are available: persistentvolumeclaim "subin-apiman-9816b16e-9816b16e-db-1-wal" not found. preemption: 0/14 nodes are available: 14 Preemption is not helpful for scheduling..

Resolving The Problem

To unblock the restore process, run the following commands in the Management subsystem namespace (add -n <management namespace> if your current context is not already set to it):

1. Delete Restore Artifacts

kubectl get mgmtr
kubectl delete mgmtr <name of the restore CR>

kubectl get cm | grep oplock
kubectl delete cm <name of the mgmt oplock ConfigMap>

2. Delete Stuck Recovery Job

kubectl get jobs | grep recovery
kubectl delete job <recovery job name>
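
Before re-running the restore, confirm that the stuck recovery pod has been removed:
kubectl get pods -n <management namespace> | grep recovery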

3. Re-run the Restore

Follow the same steps used previously to initiate the restore; an illustrative sketch is shown below.
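
As an illustration only: a restore is typically re-initiated by re-applying the same ManagementRestore CR used before and watching its status. The apiVersion and the backupName field below are assumptions; take the exact values from your original restore definition or from the product documentation for your release.

apiVersion: management.apiconnect.ibm.com/v1beta1   # assumed version; confirm for your release
kind: ManagementRestore
metadata:
  name: mgmt-restore-03                              # any new, unused name
  namespace: <management namespace>
spec:
  backupName: <name of the ManagementBackup CR to restore from>   # assumed field name

Apply the CR and monitor its progress:
kubectl apply -f <restore CR file>.yaml
kubectl -n <management namespace> get mgmtr -w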
 

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB77","label":"Automation Platform"},"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSMNED","label":"IBM API Connect"},"ARM Category":[{"code":"a8mKe000000CaZaIAK","label":"API Connect-\u003EAPIC Platform - Mgmt DB"}],"ARM Case Number":"TS020341970","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"10.0.7;10.0.8"}]

Document Information

Modified date:
23 September 2025

UID

ibm17245844