Troubleshooting stanza-create job for S3 backup configuration

Troubleshoot the configuration of S3 backups when the stanza-create job fails.

When you configure Management subsystem backups for S3 providers, the operator runs a stanza-create job. This job creates the stanza (the Postgres cluster configuration) on the upstream S3 server, which is used for backup and archive procedures. The job also brings up the pod that runs it.

Check the status of the stanza-create job:
kubectl get jobs -n <namespace> | grep stanza
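
If the job exists but has not completed, describing it shows its events and the failure condition covered below. A minimal sketch, using the example job name that appears later in this topic:

kubectl describe job -n <namespace> m1-3bfe12ac-postgres-stanza-create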

On failure:

  • The stanza-create pods fail, and the job completion status is 0/1:
    kubectl get jobs | grep stanza
     m1-3bfe12ac-postgres-stanza-create          0/1           32m        32m
  • Listing all pods brought up by the stanza-create job shows multiple pods, because the job retries the pod up to its backoffLimit. With the default limit, 7 pods are left in the Error state:
    kubectl get pods | grep stanza
     m1-3bfe12ac-postgres-stanza-create-726z4                    0/1     Error       0          11m
     m1-3bfe12ac-postgres-stanza-create-9vq4n                    0/1     Error       0          22m
     m1-3bfe12ac-postgres-stanza-create-gbtgb                    0/1     Error       0          30m
     m1-3bfe12ac-postgres-stanza-create-gl5g5                    0/1     Error       0          17m
     m1-3bfe12ac-postgres-stanza-create-q8qhw                    0/1     Error       0          26m
     m1-3bfe12ac-postgres-stanza-create-t84zs                    0/1     Error       0          8m54s
     m1-3bfe12ac-postgres-stanza-create-z8tfs                    0/1     Error       0          12m

    Note: In some cases, Kubernetes cleans up these pods and their logs are lost. A way to collect the logs from all of the job's pods in one command is sketched after this list.

  • The pod log errors show the reason for the failure. For example, when the configured hostname is not valid:
    kubectl logs m1-3bfe12ac-postgres-stanza-create-gbtgb
    
    time="2021-04-13T20:01:19Z" level=info msg="pgo-backrest starts"
    time="2021-04-13T20:01:19Z" level=info msg="debug flag set to false"
    time="2021-04-13T20:01:19Z" level=info msg="backrest stanza-create command 
    requested"
    time="2021-04-13T20:01:19Z" level=info msg="s3 flag enabled for backrest command"
    time="2021-04-13T20:01:19Z" level=info msg="command to execute is [pgbackrest 
    stanza-create  --db-host=192.1.2.3 --db-path=/pgdata/m1-3bfe12ac-postgres -- 
    repo1-type=s3]"
    time="2021-04-13T20:01:19Z" level=info msg="command is pgbackrest stanza-create  -- 
    db-host=192.1.2.3 --db-path=/pgdata/m1-3bfe12ac-postgres --repo1-type=s3 "
    time="2021-04-13T20:05:21Z" level=error msg="command terminated with exit code 49"
    time="2021-04-13T20:05:21Z" level=info msg="output=[]"
  • Job status is set to type: Failed and status: "True", with reason: BackoffLimitExceeded (a one-line check for this condition is sketched after this list):
     kubectl get job m1-3bfe12ac-postgres-stanza-create -o yaml
    
    status:
      conditions:
      - lastProbeTime: "2021-04-13T20:28:03Z"
        lastTransitionTime: "2021-04-13T20:28:03Z"
        message: Job has reached the specified backoff limit
        reason: BackoffLimitExceeded
        status: "True"
        type: Failed
      failed: 7
      startTime: "2021-04-13T20:01:18Z"
  • If the pod logs are lost, you can run pgbackrest directly inside the backrest-shared-repo pod (a non-interactive equivalent is sketched after this list):
    1. Obtain the backrest-shared-repo pod name:
      kubectl get pods -n <namespace> | grep backrest-shared-repo

      For example:

      kubectl get pods | grep backrest-shared-repo
      m1-eb8edc18-postgres-backrest-shared-repo-98dd46cc6-twl95   1/1     Running 
    2. Exec into the pod:
      kubectl exec -it <backrest-shared-repo-pod> -- bash
    3. Run:
      pgbackrest info --output json --repo1-type s3

      For example, for invalid hostname:

      ERROR: [049]: unable to get address for 'test1-v10-backup.s3.exampleaws.com': [-2] Name or service not known

      The exact ERROR: will vary depending on the misconfiguration. Common errors:

      • Invalid access key
      • Invalid access key secret
      • Invalid bucket region
      • Invalid bucket or folder path
  • To fix the incorrect settings, see Reconfiguring or adding backup settings after installation of the management subsystem.
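
The following sketches expand on the diagnostics referenced in the list above.

As noted in the list, Kubernetes can clean up the failed stanza-create pods before you read their logs. Because the pods created by a job carry a job-name label, you can collect the logs from every retry attempt in one command. A sketch, assuming the example job name from this topic and a kubectl recent enough to support the --prefix flag:

kubectl logs -n <namespace> -l job-name=m1-3bfe12ac-postgres-stanza-create --tail=-1 --prefix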
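
Rather than reading the full job YAML, you can query just the failure condition with a JSONPath expression; for the failure shown above this prints BackoffLimitExceeded. A sketch using the same example job name:

kubectl get job -n <namespace> m1-3bfe12ac-postgres-stanza-create -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}'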
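
The pgbackrest check from the numbered steps can also be run without an interactive shell, which makes it easier to capture the output. A sketch, assuming the example backrest-shared-repo pod name shown earlier:

kubectl exec -n <namespace> m1-eb8edc18-postgres-backrest-shared-repo-98dd46cc6-twl95 -- pgbackrest info --output json --repo1-type s3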