Troubleshooting
Problem
You notice that the ax-environment-api-deploy pods in your Cloud Pak for Data cluster have gone into a CrashLoopBackOff state. Restarting these pods fails to resolve the problem.
Symptom
Symptoms for this problem are:
- ax-environment-api-deploy pods are in CrashLoopBackOff
- End-users report that they are unable to launch Jupyter notebook sessions (old or new) within the cluster
- The issue is intermittent: sometimes users are able to launch a notebook session, sometimes they are not
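As a quick check, the affected pods can be listed directly. This is a minimal sketch; the cp4d namespace is taken from the pod events shown later in this document, and the helper is plain grep:

```shell
# Filter "oc get pods" output down to pods stuck in CrashLoopBackOff.
crashlooping() {
  grep 'CrashLoopBackOff'
}

# Usage against a live cluster (cp4d is the namespace seen in the pod events):
#   oc get pods -n cp4d | crashlooping
```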
The output of "oc describe" for this pod shows:
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 28 Apr 2026 03:50:56 +0000
Finished: Tue, 28 Apr 2026 03:53:04 +0000
Ready: False
The pod YAML contains:
containerStatuses:
- containerID: cri-o://8392b1e2d59e2efc314424a0df4c41064ee0c85de2e449c2980160f8ec4b4ae8
image: cp.icr.io/cp/cpd/environments-api@sha256:6ef5dcb2b3e087183a1de2067295d38db0672547f717f9f59e2229bafc27eb95
imageID: cp.icr.io/cp/cpd/environments-api@sha256:6ef5dcb2b3e087183a1de2067295d38db0672547f717f9f59e2229bafc27eb95
lastState:
terminated:
containerID: cri-o://883e4b901bacac888b29093709bf0574689170a911feeae48f53af36d3236a5f
exitCode: 1
finishedAt: "2026-04-28T03:53:04Z"
reason: Error
startedAt: "2026-04-28T03:50:56Z"
name: runtime
ready: false
restartCount: 19
Cluster Events related to this pod:
Readiness probe failed: Get "https://10.210.22.132:3860/v2/environments/monitor": dial tcp 10.210.22.132:3860: connect: connection refused
Back-off restarting failed container runtime in pod ax-environments-api-deploy-7c66b46f-cwv4d_cp4d(49561815-82fa-4891-8e0c-71bfdc6465da)
Liveness probe failed: Get "https://10.210.22.132:3860/v2/environments/monitor": dial tcp 10.210.22.132:3860: connect: connection refused
Back-off restarting failed container runtime in pod ax-environments-api-deploy-7f7f7cf58-622zx_cp4d(2f0efd5c-1408-479b-9290-e07c70dffcf3)
failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
invalid metrics (1 invalid out of 1), first error is: failed to get cpu resource metric value: failed to get cpu utilization: did not receive metrics for targeted pods (pods might be unready)
ax-environment-api-deploy runtime log:
{"timestamp":"2026-04-28T03:58:21.094Z","category":"application","level":"ERROR","saveServiceCopy":false,"appname":"environments-api","name":"icp4d-environments-api","message":"Error querying Cloudant view to list environment specifications.","err":{},"viewParams":{"ddoc":"environment_specifications","view":"global_by_guid","descending":false,"includeDocs":true},"transaction_ID":"4ec0efa3-cede-461d-8872-f535c477a2a4"}
{"timestamp":"2026-04-28T03:58:21.095Z","category":"app","level":"ERROR","saveServiceCopy":false,"appname":"environments-api","name":"icp4d-environments-api","message":"Error: Callback was already called.\n at /home/node/environments-api/node_modules/async/dist/async.js:326:36\n at dbErrorCb (/home/node/environments-api/src/lib/specifications-helper/base/specifications.js:236:18)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)"}
{"timestamp":"2026-04-28T03:58:21.096Z","category":"app","level":"ERROR","saveServiceCopy":false,"appname":"environments-api","name":"icp4d-environments-api","message":"Slack notification is disabled but process is in an undefined state. Terminating process.","err":{}}
ax-ws-notebooks-ui-deploy runtime log:
[2026-04-28T03:58:20.991] [INFO ] [outgoing-request.projects-client] [transaction_ID=4ec0efa3-cede-461d-8872-f535c477a2a4] [200] GET https://internal-nginx-svc.cp4d.svc.cluster.local:12443/v2/projects/8867b602-c222-4e57-bb06-5b803c22d28f (21 ms), qs={"include":"everything"}
[2026-04-28T03:58:21.106] [ERROR] [outgoing-request.environments-client] [transaction_ID=4ec0efa3-cede-461d-8872-f535c477a2a4] [502] GET https://internal-nginx-svc.cp4d.svc.cluster.local:12443/v2/environments (115 ms), qs={"types":"default_spark,remote_spark,notebook,wxdata_spark","exclude_types":[],"project_id":"8867b602-c222-4e57-bb06-5b803c22d28f"}
[2026-04-28T03:58:21.107] [ERROR] [application ] [transaction_ID=4ec0efa3-cede-461d-8872-f535c477a2a4] StatusCodeError: Request failed with unexpected status code. Expected one of [200] but was 502
at StatusCodeError.ContextualError [as constructor] (/home/node/ws-notebooks-ui/node_modules/@ax/common-js-node-fetch/bundles/ax-common-js-node-fetch.umd.js:506:30)
at new StatusCodeError (/home/node/ws-notebooks-ui/node_modules/@ax/common-js-node-fetch/bundles/ax-common-js-node-fetch.umd.js:515:30)
at /home/node/ws-notebooks-ui/node_modules/@ax/common-js-node-fetch/bundles/ax-common-js-node-fetch.umd.js:748:64
at step (/home/node/ws-notebooks-ui/node_modules/@ax/common-js-node-fetch/bundles/ax-common-js-node-fetch.umd.js:215:29)
at Object.next (/home/node/ws-notebooks-ui/node_modules/@ax/common-js-node-fetch/bundles/ax-common-js-node-fetch.umd.js:164:55)
at fulfilled (/home/node/ws-notebooks-ui/node_modules/@ax/common-js-node-fetch/bundles/ax-common-js-node-fetch.umd.js:145:30)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5) {
transaction_ID: '4ec0efa3-cede-461d-8872-f535c477a2a4',
expectedStatusCodes: [ 200 ],
statusCode: 502,
status: 502,
method: 'GET',
url: 'https://internal-nginx-svc.cp4d.svc.cluster.local:12443/v2/environments',
response: Response {
size: 0,
timeout: 45000,
statusCode: 502,
[Symbol(Body internals)]: { body: [PassThrough], disturbed: true, error: null },
[Symbol(Response internals)]: {
url: 'https://internal-nginx-svc.cp4d.svc.cluster.local:12443/v2/environments?types=default_spark%2Cremote_spark%2Cnotebook%2Cwxdata_spark&exclude_types=&project_id=8867b602-c222-4e57-bb06-5b803c22d28f',
status: 502,
statusText: 'Bad Gateway',
headers: [Headers],
counter: 0
}
},
client: 'environments-client',
responseBody: 'Error 502 - Bad Gateway'
}
ax-ws-notebooks-ui-deploy runtime log:
[2026-04-28T03:58:10.727] [INFO ] [application ] [transaction_ID=e2f6f1f2-664b-42f5-b289-f19146bf07cc] Group membership established
[2026-04-28T03:58:10.989] [INFO ] [outgoing-request] [transaction_ID=null] [200] GET https://internal-nginx-svc.cp4d.svc.cluster.local:12443/v2/environments (261 ms), qs={"project_id":"8867b602-c222-4e57-bb06-5b803c22d28f"}
[2026-04-28T03:58:11.014] [INFO ] [incoming-request] [transaction_ID=e2f6f1f2-664b-42f5-b289-f19146bf07cc] [200] POST /analytics/notebooks/v2/api/notebooks/enhanced?project_id=8867b602-c222-4e57-bb06-5b803c22d28f (293 ms)
[2026-04-28T03:58:21.204] [INFO ] [incoming-request] [transaction_ID=f53eebcd-b99f-4633-aa64-96305c465c74] [200] GET /analytics/notebooks/v2/static/graphics/http-error-other-spread.svg (3 ms)
[2026-04-28T03:58:24.792] [ERROR] [iamid-auth ] [transaction_ID=null] Unexpected response from entitlement check: 500
[2026-04-28T03:58:24.792] [ERROR] [iamid-auth ] [transaction_ID=null] {"code":500,"error":"Internal Server Error","reason":"Failed to retrieve entitlements from cache: Failed to retrieve entitlements from Redis: Failed to get key 'pca:ent:999' from Redis store: Name: AbortError. Message: GET can't be processed. The connection is already closed...","message":"The API encountered an unexpected condition which prevented it from fulfilling the request.","description":"[500] Internal Server Error: Failed to retrieve entitlements from cache: Failed to retrieve entitlements from Redis: Failed to get key 'pca:ent:999' from Redis store: Name: AbortError. Message: GET can't be processed. The connection is already closed... The API encountered an unexpected condition which prevented it from fulfilling the request."}
[2026-04-28T03:58:24.792] [ERROR] [iamid-auth ] [transaction_ID=null] Unexpected response from entitlement check: 500
[2026-04-28T03:58:24.793] [INFO ] [application ] [transaction_ID=11dee872-0220-4e65-9343-414514a33c53] Group membership established
[2026-04-28T03:58:24.796] [INFO ] [incoming-request] [transaction_ID=11dee872-0220-4e65-9343-414514a33c53] [204] GET /analytics/notebooks/v2/api/accounts/sessionpulse (22 ms)
Cause
The error "Failed to get key 'pca:ent:999' from Redis store" does not originate in the Notebooks components; they only surface it in their logs. The root cause is in the v2/entitlements endpoint, which appears to be hosted by the portal-common-api-* pods. This is typically caused by stale Redis connections that need to be refreshed.
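To confirm this cause, the entitlement/Redis errors can be grepped out of the portal-common-api logs. A sketch under assumptions: the deployment name portal-common-api and the cp4d namespace are inferred from the pod names in this document, and the grep pattern simply matches the error text quoted above:

```shell
# Match the entitlement-cache / Redis failure messages quoted in the Cause section.
redis_entitlement_errors() {
  grep -E "Failed to (retrieve entitlements|get key)"
}

# Usage against a live cluster (the deployment name is an assumption based on
# the portal-common-api-* pod names; adjust it if yours differs):
#   oc logs -n cp4d deploy/portal-common-api --since=1h | redis_entitlement_errors
```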
Environment
OpenShift Container Platform : OCP 4.16
Cloud Platform : vSphere/VMware
Product Version : 5.2.0
Diagnosing The Problem
Collect the Software Hub diagnostics and review the logs for the Watson Studio and Common Core Services components.
Also check the following in your cluster:
- Worker nodes are in a Ready state.
- The cc-home-pvc has sufficient free space available.
- Run "oc get pod -A -o wide | grep -Ev '([[:digit:]])/\1.*R' | grep -v Completed" to list unhealthy pods. In this case, a number of pods were in a Pending/Init state, but none were related to the Watson Studio service.
- The active_scripts symlink in the asset-files-api pod is created properly, points to the correct folder, and has correct permissions (not owned by root).
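The pod filter used in the third check above can be kept as a small shell helper. A minimal sketch: the regex excludes fully-Ready pods (an n/n ready count followed by a Running-style state), and the second grep drops Completed jobs:

```shell
# List pods that are not fully Ready and not Completed.
# The regex removes lines whose ready count is n/n and whose state contains R
# (e.g. "1/1 Running"); Completed jobs are filtered out separately.
filter_unhealthy() {
  grep -Ev '([[:digit:]])/\1.*R' | grep -v Completed
}

# Usage against a live cluster:
#   oc get pod -A -o wide | filter_unhealthy
```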
Resolving The Problem
Restart the portal-common-api-* pods in your cluster to resolve this issue.
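Deleting the pods forces their Deployment to recreate them with fresh connections. A minimal sketch, assuming the cp4d namespace seen in the events above; the awk helper only selects pod names beginning with portal-common-api-:

```shell
# Select the portal-common-api pod names from "oc get pods --no-headers" output.
select_portal_pods() {
  awk '/^portal-common-api-/ {print $1}'
}

# Usage against a live cluster; the Deployment recreates the deleted pods:
#   oc get pods -n cp4d --no-headers | select_portal_pods | xargs -r oc delete pod -n cp4d
```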
Document Location
Worldwide
Document Information
Modified date:
11 May 2026
UID
ibm17272652