Troubleshooting Failed Jobs
Job has failed and has been redelivered
When an application is deployed on Kubernetes/OpenShift, an Optimization Server Worker may fail to complete its execution without any explicit error appearing in the container logs. In the Job list widget, the job is marked as failed with the following message:
java.lang.RuntimeException: Job had failed and has been redelivered, then abandoned
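This is usually caused by a worker pod that tries to allocate more memory than the Kubernetes configuration allows. When a container exceeds its memory limit, Kubernetes kills it, which is why no error appears in the container logs. To solve the problem, raise the memory limit in the worker service Helm chart to a more appropriate value, keeping the JVM maximum heap (-Xmx) below the pod memory limit: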
spec:
  containers:
    - env:
        - name: JAVA_TOOL_OPTIONS # Configures the JVM memory
          value: -Xmx4000m -Xms500m -XX:+CrashOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/carhartt-mso-checker-worker-heap-dump.hprof
      ...
      resources: # Configures the Kubernetes Pod resources
        limits:
          memory: 4256Mi
        requests:
          cpu: 100m
          memory: 1000Mi
...
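In the configuration above, the heap maximum (-Xmx4000m, that is 4000 MiB) sits 256 MiB below the pod limit (4256Mi). That gap has to cover non-heap memory such as metaspace, thread stacks and any native allocations. The following Python sketch shows one way to check that gap when adjusting the values. The helper names xmx_bytes and headroom_mib are hypothetical and not part of the product, and how much headroom a given worker actually needs is an assumption to validate against its real memory usage.

```python
import re

# JVM size suffixes are binary multiples: k = KiB, m = MiB, g = GiB.
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def xmx_bytes(java_tool_options: str) -> int:
    """Return the -Xmx value found in a JAVA_TOOL_OPTIONS string, in bytes."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_tool_options)
    if match is None:
        raise ValueError("no -Xmx flag found")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2).lower()]


def headroom_mib(java_tool_options: str, limit_mib: int) -> float:
    """Return the memory left between the pod limit (in Mi) and the JVM heap maximum."""
    return limit_mib - xmx_bytes(java_tool_options) / 1024**2


if __name__ == "__main__":
    options = "-Xmx4000m -Xms500m -XX:+CrashOnOutOfMemoryError"
    print(headroom_mib(options, 4256))  # prints 256.0
```

A negative or very small result means the heap alone can reach the pod limit, in which case the container is likely to be killed before the JVM reports an out-of-memory error.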