Pod takes a long time to start
If the Workflow Authoring or Workflow Runtime pod takes a long time to start while all
other pods are up and running, the resource limits for that pod might be too low. To solve the problem,
increase the resource limits in the custom resource (CR) file and apply the
change.
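A container cannot use more than its resource limits. If the limits are too low, the container starts slowly and its probes fail before startup completes. While the readiness probe fails, the pod is not marked ready, and when the liveness probe reaches its failure threshold, Kubernetes restarts the container. For more information, see Managing Resources for Containers: Requests and limits.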
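Before you change the limits, it can help to confirm that the pod is actually being restarted or is failing its probes. The following is a minimal sketch that wraps standard `oc` queries in shell functions. The function names are hypothetical helpers, not part of the product, and the sketch assumes that you are logged in and that the current project is the one that contains the pod.

```shell
# Hypothetical helper functions; pass the pod name as the first argument.
# Assumes the current oc project is the namespace that contains the pod.

# Print the restart count of each container in the pod.
pod_restart_count() {
  oc get pod "$1" -o jsonpath='{.status.containerStatuses[*].restartCount}'
}

# Print why each container last terminated (for example, OOMKilled
# when the memory limit was exceeded). Empty if it never terminated.
pod_last_termination_reason() {
  oc get pod "$1" -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Print the probe failure lines from the pod description, if any.
pod_probe_events() {
  oc describe pod "$1" | grep -i 'probe failed'
}

# Usage (replace the placeholder with the real pod name):
#   pod_restart_count "<pod_name>"
#   pod_last_termination_reason "<pod_name>"
#   pod_probe_events "<pod_name>"
```

A rising restart count together with probe failure events suggests that the container does not finish starting within the time that the probes allow.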
Resolution
- Modify the workflow_authoring_configuration or baw_configuration section of your CR file to set the CPU and memory resource limits to a higher value. For example, add the following lines:

      resources:
        limits:
          ## CPU limit for Workflow Authoring
          cpu: 4
          ## Memory limit for Workflow Authoring
          memory: 3Gi

  To have your changes take effect, run the following command and wait for the next operator reconcile:

      oc apply -f <your_CR.yaml>
- Optional: Adjust the properties of the two probes (liveness probe and readiness probe). For example, add the following lines in the workflow_authoring_configuration or baw_configuration section of your CR file:

      probe:
        ws:
          liveness_probe:
            ## Number of seconds after the Workflow Authoring container starts before the liveness probe is initiated
            initial_delay_seconds: 300
            ## Number of seconds to wait before the next probe.
            period_seconds: 10
            ## Number of seconds after which the probe times out.
            timeout_seconds: 10
            ## When a probe fails, number of times that Kubernetes will try before giving up and restarting the container.
            failure_threshold: 3
            ## Minimum consecutive successes for the probe to be considered successful after it failed.
            success_threshold: 1
          readinessProbe:
            ## Number of seconds after the Workflow Authoring container starts before the readiness probe is initiated
            initial_delay_seconds: 240
            ## Number of seconds to wait before the next probe.
            period_seconds: 5
            ## Number of seconds after which the probe times out.
            timeout_seconds: 5
            ## When a probe fails, number of times that Kubernetes will try before giving up and restarting the container.
            failure_threshold: 6
            ## Minimum consecutive successes for the probe to be considered successful after it failed.
            success_threshold: 1

  To have your changes take effect, run the following command and wait for the next operator reconcile:

      oc apply -f <your_CR.yaml>
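To choose probe values, it can help to estimate how long a container has to start before a probe reaches its failure threshold. The following sketch uses a common approximation: the initial delay plus the failure threshold multiplied by the period. It ignores probe timeouts and scheduling jitter, so treat the result as a rough figure. The function name is a hypothetical helper.

```shell
# Rough startup budget in seconds before a probe reaches its failure threshold.
# Approximation only: initial delay + failure threshold * period.
# Arguments: initial_delay_seconds period_seconds failure_threshold
probe_budget_seconds() {
  initial_delay=$1
  period=$2
  failure_threshold=$3
  echo $(( initial_delay + failure_threshold * period ))
}

# Liveness probe values from the example: 300 + 3 * 10
probe_budget_seconds 300 10 3   # prints 330

# Readiness probe values from the example: 240 + 6 * 5
probe_budget_seconds 240 5 6    # prints 270
```

If the container regularly needs longer than the liveness estimate to start, increase initial_delay_seconds or failure_threshold for the liveness probe, or raise the resource limits so that startup completes sooner.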