Troubleshooting core services errors

Diagnose and resolve common issues with core platform services, including authentication failures, service startup problems, and connectivity errors.

IBM Container Registry authentication failure

Symptom

The image pull fails because of an authentication or access error.

For example:
pull access denied
unauthorized
authentication required

Resolution

Log in to IBM Container Registry again:
docker login cp.icr.io
Enter the following credentials:
Username: cp
Password: <IBM entitlement key>
After you log in successfully, run the deployment again:
./deploy.sh up --profile core
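
For scripted or repeated deployments, the login step above can be made non-interactive. The following is a minimal sketch, assuming that the entitlement key is stored in a file that you pass in (the function name and the file location are hypothetical). It relies on the --password-stdin option of docker login so that the key does not appear in the shell history.

```shell
# Log in to cp.icr.io without an interactive prompt.
# $1: path to a file that contains only the IBM entitlement key
#     (hypothetical location; protect it with restrictive permissions).
registry_login() {
  local key_file="$1"
  if [ ! -s "$key_file" ]; then
    echo "entitlement key file is missing or empty: $key_file" >&2
    return 1
  fi
  # Username "cp" and registry "cp.icr.io" come from the steps above.
  docker login --username cp --password-stdin cp.icr.io < "$key_file"
}

# Example:
#   registry_login "$HOME/.ibm-entitlement-key" && ./deploy.sh up --profile core
```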

Core service fails to start

Symptom

One or more services do not run after deployment.

Resolution

Check the status:
./deploy.sh status
Check the logs for the failed service:
./deploy.sh logs <service-name> -f
Check all the containers:
docker ps -a
If you correct the configuration values after the first attempt, redeploy the core services:
./deploy.sh up --profile core
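
When many containers are listed, it can help to reduce the docker ps -a output to the containers that need attention. The sketch below is an illustration only: the function name is hypothetical, and it assumes the usual Docker status strings (for example "Up 2 hours" or "Exited (1) 3 minutes ago").

```shell
# Read "name|status" lines and print the names of containers that are
# not up, or that are up but report an unhealthy health check.
not_running() {
  awk -F '|' '$2 !~ /^Up/ || $2 ~ /\(unhealthy\)/ { print $1 }'
}

# Example:
#   docker ps -a --format '{{.Names}}|{{.Status}}' | not_running
```

Each name that is printed is a candidate for the log check described above.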

The NGINX endpoint is not accessible

Symptom

The endpoint does not open in the browser:
https://<host>:8443/

Resolution

Verify that NGINX is running:
./deploy.sh status
Check the NGINX logs:
./deploy.sh logs nginx -f
  • Verify that the server firewall allows inbound access to port 8443.
  • Verify that you use the correct hostname or IP address.
If the NGINX hostname or IP address changes after the initial deployment, regenerate the certificates:
./deploy.sh up --profile core --regenerate-certs
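
To separate network problems from browser problems, the endpoint can also be probed from the command line. This is a sketch under the assumption that curl is installed on the machine you test from; the function name is hypothetical. The -k option skips certificate verification, so the check covers reachability only.

```shell
# Print the HTTP status code returned by the NGINX endpoint.
# $1: hostname or IP address; $2: port (defaults to 8443, as above).
check_endpoint() {
  local host="$1" port="${2:-8443}"
  curl -k -sS -o /dev/null -w '%{http_code}\n' --connect-timeout 5 \
    "https://${host}:${port}/"
}

# Example: check_endpoint myhost.example.com
```

If curl exits with code 7 (failed to connect) or 28 (timed out), the cause is more likely the firewall, the port, or the hostname than NGINX itself.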

Browser displays certificate warning or certificate mismatch

Symptom

The browser displays a certificate warning because of a certificate mismatch, an expired certificate, or a trust issue.

Resolution

Verify that NGINX_HOSTNAME and NGINX_HOST_IP match the hostname or IP address used to access the service.

If the host details change, regenerate the certificates:
./deploy.sh up --profile core --regenerate-certs
Access the service again:
https://<host>:8443/
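
To see which names and dates the served certificate actually contains, you can inspect it with OpenSSL. The sketch below assumes OpenSSL 1.1.1 or later (required for the -ext option); the function name is hypothetical.

```shell
# Print the subject, validity dates, and subject alternative names of
# the certificate that the server presents.
# $1: hostname or IP address; $2: port (defaults to 8443, as above).
show_server_cert() {
  local host="$1" port="${2:-8443}"
  openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null |
    openssl x509 -noout -subject -dates -ext subjectAltName
}

# Example: show_server_cert myhost.example.com
```

Compare the printed names with the hostname or IP address in the URL, and the notAfter date with the current date.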

OpenSearch fails to start

Symptom

The OpenSearch container stops or shows an unhealthy status.

Resolution

Check the OpenSearch logs:
./deploy.sh logs opensearch -f
Verify that the following value is set in .env.core:
OPENSEARCH_INITIAL_ADMIN_PASSWORD

If the issue relates to certificates or a stale OpenSearch setup, a clean redeployment can resolve it. Because a clean redeployment removes OpenSearch-related core data, perform it only if data loss is acceptable.

For a complete fresh core deployment:
./deploy.sh cleanup --profile core --volumes
./deploy.sh up --profile core
Warning: This command removes persisted volumes for core services.
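
Before you resort to a clean redeployment, it can help to narrow the OpenSearch log stream to the lines that look like errors. The following filter is a sketch; the function name is hypothetical and the keyword list is an assumption, not a list of documented OpenSearch messages.

```shell
# Pass through only the log lines that look like errors.
# The keyword list is an assumption; adjust it to the messages you see.
filter_errors() {
  grep --line-buffered -iE 'error|exception|fatal|denied'
}

# Example:
#   ./deploy.sh logs opensearch -f 2>&1 | filter_errors
```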

PostgreSQL connection issue

Symptom

A service fails to connect to PostgreSQL.

Resolution

Verify that the following values are set correctly in .env.core:
POSTGRES_USER
POSTGRES_PASSWORD
POSTGRES_DB
GATEWAY_DB_NAME
GATEWAY_DB_USER
GATEWAY_DB_PASSWORD
Check the PostgreSQL logs:
./deploy.sh logs postgres -f
Check the service status:
./deploy.sh status
After you correct the configuration, redeploy the core services:
./deploy.sh up --profile core
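
The check for missing values can be sketched as a small helper. It assumes that .env.core uses plain KEY=value lines without quoting or an export prefix, and the function name is hypothetical. It only detects missing or empty keys; it cannot tell whether a value is correct.

```shell
# Print every key from the argument list that is missing or empty in an
# env file. Returns non-zero if at least one key is reported.
# $1: env file (for example .env.core); remaining arguments: key names.
missing_env_keys() {
  local file="$1" key status=0
  shift
  for key in "$@"; do
    # Match KEY=<at least one character> at the start of a line.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}

# Example:
#   missing_env_keys .env.core POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB \
#     GATEWAY_DB_NAME GATEWAY_DB_USER GATEWAY_DB_PASSWORD
```

The same helper works for any other list of keys on this page.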

Redis connection issue

Symptom

A service fails to connect to Redis.

Resolution

Check the Redis logs:
./deploy.sh logs redis -f

If you configure REDIS_PASSWORD, ensure that dependent services use the same value.

If Redis authentication is not required, leave REDIS_PASSWORD empty.

After you make the changes, redeploy the core services:
./deploy.sh up --profile core
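
To confirm that a given password is the one Redis accepts, you can send a PING from inside the Redis container. This is a sketch under two assumptions: redis-cli is available in the container image, and you look up the container name yourself (for example with docker ps). The function name is hypothetical.

```shell
# Send PING to Redis from inside its container and print the reply.
# $1: container name (not stated on this page; look it up with docker ps).
# $2: password; omit it when Redis authentication is not required.
redis_ping() {
  local container="$1" password="${2:-}"
  if [ -n "$password" ]; then
    # Note: -a places the password on the command line.
    docker exec "$container" redis-cli -a "$password" ping
  else
    docker exec "$container" redis-cli ping
  fi
}

# Example: redis_ping <redis-container-name> "$REDIS_PASSWORD"
```

A PONG reply indicates that the connection and the password are accepted. An authentication error suggests that the password differs from the configured REDIS_PASSWORD.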

AIOps connection issue

Symptom

AIOps-related services cannot connect to IBM Z Automation Web Console or IBM Z Workload Scheduler endpoints.

Resolution

Verify that the following values are correct:
SMU_HOSTNAME
SMU_USERNAME
SMU_PASSWORD
SMU_ADMIN_USERNAME
SMU_ADMIN_PASSWORD
ZWS_HOSTNAME
ZWS_USERNAME
ZWS_PASSWORD
AIOPS_USERNAME
AIOPS_PASSWORD
Check the AIOps logs:
./deploy.sh logs aiops -f

Verify that the deployment host can connect to the configured IBM Z Automation Web Console or IBM Z Workload Scheduler hosts.

After you correct the values, redeploy the core services:
./deploy.sh up --profile core
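
The connectivity check from the deployment host can be sketched with a plain TCP probe. This example assumes bash and the coreutils timeout command. The function name is hypothetical, and the port is a placeholder because this page does not state which ports the endpoints use.

```shell
# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Uses the /dev/tcp redirection that is built into bash.
can_connect() {
  local host="$1" port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (replace <port> with the port that your endpoint listens on):
#   can_connect <SMU_HOSTNAME value> <port> && echo reachable || echo unreachable
```

A successful probe shows only that the port is open. It does not validate the user names or passwords.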

Gateway service issue

Symptom

The gateway service fails to start, or dependent services fail to authenticate with the gateway.

Resolution

Verify that the following values are set:
GATEWAY_DB_NAME
GATEWAY_DB_USER
GATEWAY_DB_PASSWORD
GATEWAY_ADMIN_API_KEY
Check the gateway logs:
./deploy.sh logs gateway -f

GATEWAY_ADMIN_API_KEY is required for the first deployment.

After you correct the gateway configuration, redeploy the core services:
./deploy.sh up --profile core

Content Ingestion UI authentication issue

Symptom

The Content Ingestion UI prompts for credentials, but login fails.

Resolution

Verify the configured basic authentication values:
BASIC_AUTH_USERNAME
BASIC_AUTH_PASSWORD
Check the Content Ingestion UI logs:
./deploy.sh logs content-ingestion-ui -f
After you correct the values, redeploy the core services:
./deploy.sh up --profile core

Client ingestion authentication issue

Symptom

Client ingestion requests fail with unauthorized or authentication errors.

Resolution

Verify that the following value is configured:
CLIENT_INGESTION_AUTH_KEY
Check the client ingestion service logs:
./deploy.sh logs client-ingestion-service -f
After you correct the key, redeploy the core services:
./deploy.sh up --profile core

Docker Compose CLI container is not running (applicable only to zCX environments)

If the docker-compose-cli container is not running, complete the following steps:
  1. Verify the container status:
    docker ps -a | grep docker-compose-cli
  2. If the container is not running, start it:
    docker start docker-compose-cli
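
The two steps above can be combined into one helper, sketched here with a hypothetical function name. It uses docker container inspect to read the container state and starts the container only when needed.

```shell
# Start the named container only if it exists and is not already running.
ensure_running() {
  local name="$1" running
  # Prints "true" or "false"; fails if the container does not exist.
  running="$(docker container inspect --format '{{.State.Running}}' "$name" 2>/dev/null)" || {
    echo "container not found: $name" >&2
    return 1
  }
  if [ "$running" = "true" ]; then
    echo "$name is already running"
  else
    docker start "$name"
  fi
}

# Example: ensure_running docker-compose-cli
```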

Clean redeployment of core services

Use a clean redeployment only when you need a fresh setup and can afford to lose the persisted data:
./deploy.sh cleanup --profile core --volumes
./deploy.sh up --profile core
To also remove the images:
./deploy.sh cleanup --profile core --volumes --images
./deploy.sh up --profile core
Warning: The cleanup command with --volumes removes persisted data.
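
Because the cleanup is destructive, you might wrap it in a confirmation step. The following is a sketch only: the function name and the DEPLOY variable are hypothetical and not part of deploy.sh. The commands that it runs are the same two commands shown above.

```shell
# Ask for explicit confirmation before removing persisted volumes.
# DEPLOY is a hypothetical override for the script path; it defaults to
# ./deploy.sh as used throughout this page.
clean_redeploy() {
  local deploy="${DEPLOY:-./deploy.sh}" answer
  printf 'This removes persisted core data. Type "yes" to continue: ' >&2
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "Aborted; nothing was removed." >&2
    return 1
  fi
  "$deploy" cleanup --profile core --volumes && "$deploy" up --profile core
}

# Example: clean_redeploy
```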