Troubleshooting stanza-create job for S3 backup configuration
Troubleshoot the configuration of S3 backups when the stanza-create job fails.
When you configure Management subsystem backups for S3 providers, the operator runs a stanza-create job. This job creates the stanza (Postgres cluster configuration) on the upstream S3 server, which is used for backup and archive procedures. The job also brings up the necessary pod.
Check the status of the stanza-create job:

kubectl get jobs -n <namespace> | grep stanza
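For reference, a stanza-create job that completed successfully reports 1/1 in the COMPLETIONS column. The following output is illustrative only; the job name, duration, and age depend on your installation.

kubectl get jobs -n <namespace> | grep stanza
m1-3bfe12ac-postgres-stanza-create   1/1           20s        32m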
On failure:
- stanza-create pods fail, and the job status is 0/1:

  kubectl get jobs | grep stanza
  m1-3bfe12ac-postgres-stanza-create   0/1           32m        32m
- Listing all pods brought up by the stanza-create job shows multiple pods, because the job retries up to its backoffLimit. By default, there will be 7 pods in an error state:

  kubectl get pods | grep stanza
  m1-3bfe12ac-postgres-stanza-create-726z4   0/1   Error   0     11m
  m1-3bfe12ac-postgres-stanza-create-9vq4n   0/1   Error   0     22m
  m1-3bfe12ac-postgres-stanza-create-gbtgb   0/1   Error   0     30m
  m1-3bfe12ac-postgres-stanza-create-gl5g5   0/1   Error   0     17m
  m1-3bfe12ac-postgres-stanza-create-q8qhw   0/1   Error   0     26m
  m1-3bfe12ac-postgres-stanza-create-t84zs   0/1   Error   0     8m54s
  m1-3bfe12ac-postgres-stanza-create-z8tfs   0/1   Error   0     12m
  Note: In some cases, the pods are cleaned up by Kubernetes and their logs are lost. A sketch for capturing the logs before they are removed follows this list.
- Pod log errors show the reason for the failure. For example, when a hostname is not valid:

  kubectl logs m1-3bfe12ac-postgres-stanza-create-gbtgb
  time="2021-04-13T20:01:19Z" level=info msg="pgo-backrest starts"
  time="2021-04-13T20:01:19Z" level=info msg="debug flag set to false"
  time="2021-04-13T20:01:19Z" level=info msg="backrest stanza-create command requested"
  time="2021-04-13T20:01:19Z" level=info msg="s3 flag enabled for backrest command"
  time="2021-04-13T20:01:19Z" level=info msg="command to execute is [pgbackrest stanza-create --db-host=192.1.2.3 --db-path=/pgdata/m1-3bfe12ac-postgres --repo1-type=s3]"
  time="2021-04-13T20:01:19Z" level=info msg="command is pgbackrest stanza-create --db-host=192.1.2.3 --db-path=/pgdata/m1-3bfe12ac-postgres --repo1-type=s3 "
  time="2021-04-13T20:05:21Z" level=error msg="command terminated with exit code 49"
  time="2021-04-13T20:05:21Z" level=info msg="output=[]"
- Job status is set to type: Failed and status: "True", with reason: BackoffLimitExceeded:

  kubectl get job m1-3bfe12ac-postgres-stanza-create -o yaml

  status:
    conditions:
    - lastProbeTime: "2021-04-13T20:28:03Z"
      lastTransitionTime: "2021-04-13T20:28:03Z"
      message: Job has reached the specified backoff limit
      reason: BackoffLimitExceeded
      status: "True"
      type: Failed
    failed: 7
    startTime: "2021-04-13T20:01:18Z"
- If the pod logs are lost, you can execute a command inside the backrest-shared-repo pod:
  - Obtain the backrest-shared-repo pod name:

    kubectl get pods -n <namespace> | grep backrest-shared-repo

    For example:

    kubectl get pods | grep backrest-shared-repo
    m1-eb8edc18-postgres-backrest-shared-repo-98dd46cc6-twl95   1/1     Running

  - Exec into the pod:

    kubectl exec -it <backrest-shared-repo-pod> -- bash
  - Run:

    pgbackrest info --output json --repo1-type s3

    For example, for an invalid hostname:

    ERROR: [049]: unable to get address for 'test1-v10-backup.s3.exampleaws.com': [-2] Name or service not known
    The exact ERROR: varies depending on the misconfiguration. Common errors:
    - Invalid access key
    - Invalid access key secret
    - Invalid bucket region
    - Invalid bucket or folder path
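Because the failed pods can be garbage-collected before you look at them (see the note earlier in this list), it can help to capture their logs and the job's failure condition as soon as the problem is noticed. The following is a minimal sketch that uses only standard kubectl; <namespace> is a placeholder and the job name is the example value used in this topic.

#!/bin/bash
# Capture diagnostics from a failed stanza-create job before its pods are cleaned up.
NAMESPACE="<namespace>"
JOB="m1-3bfe12ac-postgres-stanza-create"

# Save the log of every pod that the job created.
for pod in $(kubectl get pods -n "$NAMESPACE" -o name | grep "$JOB"); do
  echo "==== $pod ===="
  kubectl logs -n "$NAMESPACE" "$pod"
done > stanza-create-pod-logs.txt

# Record why the job was marked Failed (for example, BackoffLimitExceeded).
kubectl get job "$JOB" -n "$NAMESPACE" \
  -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}{"\n"}'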
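When the error points at the S3 endpoint itself, as in the hostname example above, you can confirm name resolution from inside the backrest-shared-repo pod before changing any settings. This is a sketch that assumes getent is available in the container image and uses the example hostname from the error above; substitute the endpoint configured for your backups.

# Run inside the backrest-shared-repo pod (kubectl exec -it <backrest-shared-repo-pod> -- bash).
# Check that the configured S3 endpoint resolves from the pod's network.
getent hosts test1-v10-backup.s3.exampleaws.com

# Re-check the stanza after the configuration is corrected; a correctly
# configured stanza reports a status of "ok" in the text output.
pgbackrest info --repo1-type s3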
- To fix the incorrect settings, see Reconfiguring or adding backup settings after installation of the management subsystem.