Pod takes a long time to start

If the Workflow Authoring or Workflow Runtime pod takes a long time to start while all other pods are up and running, the pod's resource limits might be too low. To solve the problem, increase the resource limits in the custom resource (CR) file and apply the change.

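A container cannot use more than its resource limits. A container that reaches its CPU limit is throttled, which slows startup, and a container that exceeds its memory limit might be terminated. If the container is still starting when the liveness probe has failed the configured number of times, Kubernetes restarts the container and startup begins again. A failing readiness probe does not restart the container; it only stops the pod from receiving traffic. For more information, see Managing Resources for Containers: Requests and limits.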
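
To check which limit is involved, you can look at why the container last terminated. The following POSIX shell function is a hypothetical helper, not part of the product. It assumes that you pass it the termination reason that Kubernetes records in .status.containerStatuses[0].lastState.terminated.reason, and that the Workflow container is the first entry in containerStatuses. The value is empty when the container has not been restarted yet.

```shell
#!/bin/sh
# Hypothetical helper: map the reason recorded for the container's last
# termination to a likely cause and a suggested change.
explain_last_termination() {
  case "$1" in
    OOMKilled)
      # The container exceeded its memory limit and was killed.
      echo "memory limit reached: raise resources.limits.memory"
      ;;
    "")
      # No terminated state is recorded, so the container has not restarted yet.
      echo "no restart recorded yet: run oc describe pod and check the events"
      ;;
    *)
      # Any other reason (for example Error): repeated "Liveness probe failed"
      # events suggest that startup is slower than the probes allow.
      echo "check the events for 'Liveness probe failed': raise the CPU limit or the probe delays"
      ;;
  esac
}
```

For example, run explain_last_termination "$(oc get pod <pod_name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}')", where <pod_name> is the name of the slow pod.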

Resolution

  1. Modify the workflow_authoring_configuration or baw_configuration section of your CR file to set the CPU and memory resource limits to a higher value. For example, add the following lines:
        resources:
          limits:
            ## CPU limit for Workflow Authoring
            cpu: 4
            ## Memory limit for Workflow Authoring
            memory: 3Gi
    To apply your changes, run the following command and wait for the next operator reconciliation:
    oc apply -f <your_CR.yaml>
  2. Optional: Adjust the properties of the two probes (liveness probe and readiness probe). For example, add the following lines in the workflow_authoring_configuration or baw_configuration section of your CR file:
        probe:
          ws:
            liveness_probe:
              ## Number of seconds after the Workflow Authoring container starts before the liveness probe is initiated
              initial_delay_seconds: 300
              ## Number of seconds to wait before the next probe.
              period_seconds: 10
              ## Number of seconds after which the probe times out.
              timeout_seconds: 10
              ## Number of consecutive failures after which Kubernetes restarts the container.
              failure_threshold: 3
              ## Minimum consecutive successes for the probe to be considered successful after it failed.
              success_threshold: 1
            readinessProbe:
              ## Number of seconds after the Workflow Authoring container starts before the readiness probe is initiated
              initial_delay_seconds: 240
              ## Number of seconds to wait before the next probe.
              period_seconds: 5
              ## Number of seconds after which the probe times out.
              timeout_seconds: 5
              ## Number of consecutive failures after which Kubernetes marks the container as not ready. The container is not restarted.
              failure_threshold: 6
              ## Minimum consecutive successes for the probe to be considered successful after it failed.
              success_threshold: 1
    To apply your changes, run the following command and wait for the next operator reconciliation:
    oc apply -f <your_CR.yaml>
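
To judge whether the probe settings give the pod enough time to start, you can estimate when the liveness probe makes Kubernetes restart a container that never responds. The following shell function is a rough sketch, not an exact model of the kubelet: it assumes that each probe runs exactly period_seconds apart, starting at initial_delay_seconds, and it ignores scheduling jitter. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximate number of seconds after the container
# starts at which Kubernetes restarts it if the liveness probe never succeeds.
# Assumes probes run exactly every period_seconds from initial_delay_seconds.
liveness_restart_window() {
  initial_delay=$1  # initial_delay_seconds
  period=$2         # period_seconds
  timeout=$3        # timeout_seconds
  failures=$4       # failure_threshold
  # The last of the consecutive failing probes starts at
  # initial_delay + (failures - 1) * period and can run for up to timeout.
  echo $(( initial_delay + (failures - 1) * period + timeout ))
}

# liveness_probe values from step 2: 300 + (3 - 1) * 10 + 10 = 330
liveness_restart_window 300 10 10 3
```

If the pod logs show that the server needs longer than this window to start, increase initial_delay_seconds for the liveness probe, or raise the CPU limit so that startup finishes sooner.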