Some containers fail to start up successfully

When starting the IBM Z® Resource Discovery Data Service, some containers fail to complete their startup. Various error states may be shown.

Symptom

The dockerManageZoa.sh up or podmanManageZoa.sh up command fails to complete successfully, or it completes successfully but the output of a subsequent dockerManageZoa.sh ps or podmanManageZoa.sh ps shows error states on one or more containers.

Error states may include the following:

  • Container keeps restarting.
  • Container has a health status of “unhealthy”.
  • Container status is shown as “created” rather than as “Up nn minutes”.
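
The three error states above can be told apart from the STATUS column of the container listing. The following is a minimal sketch, not part of the product: the function names classify_status and report_states are hypothetical, and the status strings it matches ("Restarting ...", "... (unhealthy)", "Created") are assumed to follow the usual docker ps / podman ps wording, which may differ slightly between versions.

```shell
#!/bin/sh
# classify_status (hypothetical helper): map one STATUS string, as printed by
# "docker ps -a" or "podman ps -a", to one of the error states listed above.
classify_status() {
    case "$1" in
        Restarting*)        echo "restarting" ;;
        *"(unhealthy)"*)    echo "unhealthy" ;;
        Created*|created*)  echo "created" ;;
        Up*)                echo "running" ;;
        *)                  echo "other" ;;
    esac
}

# report_states (hypothetical helper): read "name|status" lines on stdin and
# print "name: state" for each container.
report_states() {
    while IFS='|' read -r name status; do
        [ -n "$name" ] || continue
        printf '%s: %s\n' "$name" "$(classify_status "$status")"
    done
}

# Example (assumes the Docker CLI; substitute podman as needed):
#   docker ps -a --format '{{.Names}}|{{.Status}}' | report_states
```

Any container reported as restarting, unhealthy, or created corresponds to one of the scenarios covered in the Solution section.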

Solution

The solution to this problem depends on the root cause of the observed failure.
  1. Containers that keep restarting have typically encountered a fatal error that causes them to shut down. Because the default container restart configuration is restart: always, Docker or Podman automatically attempts to create a new container whenever a container shuts down due to an error, which results in a restart loop.
    1. Determine the cause of the failure by obtaining the container log output: docker logs -f <container-name> or podman logs -f <container-name>.
    2. From the log output, determine the root cause of the problem and resolve it. If necessary, ask IBM Support for assistance.
    3. After the root cause has been resolved, shut down and restart the entire application:
      ./bin/dockerManageZoa.sh down
      ./bin/dockerManageZoa.sh up
      
      or
      ./bin/podmanManageZoa.sh down
      ./bin/podmanManageZoa.sh up
      
  2. Containers that show a health status of “unhealthy” have failed their built-in health check. This may have one of the following causes:
    • An internal error has occurred within the container that does not cause the container to shut down, but that will prevent it from functioning successfully as part of the overall solution. This internal error has been detected by a built-in health check, and the container was marked as “unhealthy” as a result.
    • No internal error has occurred, but the container took too long to reach the state at which it could pass its built-in health check. Containers with a built-in health check are expected to pass that health check within 180 seconds after startup; a container that fails to meet this requirement is also marked as “unhealthy”.
    To determine the next action, you must identify the exact cause of the “unhealthy” container status by reviewing the container log output.
    • If the container log does not show any errors, the problem was most likely caused by slow system performance. Run dockerManageZoa.sh up or podmanManageZoa.sh up again to initiate a new health check. If the affected containers have no actual errors, the startup should complete successfully, because the containers now have extra time to reach a steady state and pass their health checks.
    • If the container log does show errors, resolve those errors before taking any further action. If necessary, ask IBM Support for assistance. Then, shut down and restart the entire application.
  3. Containers that show a status of “created” instead of “Up nn minutes” typically depend on one or more other containers that have either encountered an error or failed their health checks. Containers in this condition will not start until all problems with their prerequisite containers have been resolved. This scenario therefore resolves itself as a byproduct of the corrective actions taken for scenario 1 or 2 above.
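
The decision flow across the three scenarios can be summarized as a small script. This is only a sketch under stated assumptions: suggest_action is a hypothetical helper, the state names are the ones used in this topic, and searching the log for the word "error" is a rough heuristic that is no substitute for actually reading the container log.

```shell
#!/bin/sh
# suggest_action (hypothetical helper): print the next step for a container
# in the given state. For "unhealthy", the container log is read from stdin.
suggest_action() {
    case "$1" in
        restarting)
            echo "resolve root cause from log, then run down followed by up" ;;
        unhealthy)
            # Assumption: a log line containing "error" (any case) indicates
            # a real failure rather than a slow startup.
            if grep -qi 'error'; then
                echo "resolve logged errors, then run down followed by up"
            else
                echo "run up again to initiate a new health check"
            fi ;;
        created)
            echo "resolve problems in prerequisite containers first" ;;
        *)
            echo "no action required" ;;
    esac
}

# Example (assumes the Docker CLI; substitute podman as needed):
#   docker logs <container-name> 2>&1 | suggest_action unhealthy
```

The script only suggests a step; the shutdown and restart themselves are still performed with dockerManageZoa.sh or podmanManageZoa.sh as described above.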