Troubleshooting
Problem
In Terraform Enterprise versions 2.0.0 and 2.0.1, some nodes in your installation may begin to fail health checks. Load balancers or orchestration platforms may report these nodes as unhealthy, leading to service degradation. When you inspect the readiness endpoint, it returns a 503 Service Unavailable HTTP status code, and the response body indicates that the atlas check is in an ERROR state.
Symptom
Symptoms of this issue include failing health checks to /api/v1/health/readiness and runs becoming stuck at "Sentinel policies running".

Cause
This issue is caused by a known bug in Terraform Enterprise related to Policy Checks and Policy Evaluations. When these features are in use, requests from the policy worker to the main web platform can generate database queries that hang indefinitely. This exhausts the available web workers in the atlas container, preventing it from serving new requests, including health checks.
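The exhaustion mechanism can be illustrated with a toy model. The sketch below is not Terraform Enterprise code: it assumes a fixed pool of two workers, uses a `threading.Event` as a stand-in for the hung database query, and all function names are hypothetical. It only shows why a trivial health check cannot be served once every worker is blocked.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def simulate_worker_exhaustion(pool_size=2, health_timeout=0.2):
    """Return True if a health check times out while every worker is stuck."""
    release = threading.Event()                  # stands in for the hung query
    all_busy = threading.Barrier(pool_size + 1)  # workers plus the main thread

    def hung_policy_request():
        all_busy.wait()   # report that this worker is now occupied
        release.wait()    # block until the "query" is released
        return "done"

    def health_check():
        return "OK"

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(pool_size):
            pool.submit(hung_policy_request)
        all_busy.wait()                    # every worker is now blocked
        probe = pool.submit(health_check)  # queued, but no worker is free
        try:
            probe.result(timeout=health_timeout)
            timed_out = False
        except FutureTimeout:
            timed_out = True
        finally:
            release.set()                  # unblock workers so the pool can exit
    return timed_out


if __name__ == "__main__":
    print("health check timed out:", simulate_worker_exhaustion())
```

In this model the health check itself is cheap. It fails only because it waits in the queue behind requests that never finish, which matches the behavior described above.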
Environment
- Terraform Enterprise 2.0.0 and 2.0.1
- Policy Checks or Policy Evaluation enforced on workspaces
Diagnosing The Problem
Running curl against the readiness endpoint from within a failing container shows the atlas check with a status of ERROR:
$ curl http://127.0.0.1:8080/api/v1/health/readiness
{"node":"terraform-enterprise-b677f7fbf-t8k8x","status":"ERROR","checks":[{"check":"archivist","status":"OK"},{"check":"atlas","status":"ERROR"},{"check":"database","status":"OK"},{"check":"redis","status":"OK"},{"check":"task-worker","status":"OK"},{"check":"vault","status":"OK"}]}
Further investigation by directly querying the internal health check for the atlas service (the web platform) shows that the request hangs indefinitely.
$ curl -v http://127.0.0.1:9292/_health_check
* Trying 127.0.0.1:9292...
* Connected to 127.0.0.1 (127.0.0.1) port 9292 (#0)
> GET /_health_check HTTP/1.1
> Host: 127.0.0.1:9292
> User-Agent: curl/7.81.0
> Accept: */*
>
## The request hangs here without receiving a response.
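Because the internal health check can hang indefinitely, it helps to probe it with a client-side timeout. The following is a minimal sketch using only the Python standard library. The function name `probe_health` and the five-second default are assumptions made for illustration, and the URL in the usage line is the one from the curl example above.

```python
import http.client
import socket
import urllib.error
import urllib.request


def probe_health(url, timeout=5.0):
    """Classify a health endpoint as 'http-<code>', 'timeout', or 'unreachable'."""
    # Bypass any proxy settings, since the probe targets a loopback address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"http-{resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return f"http-{err.code}"
    except urllib.error.URLError as err:
        # Connection-phase failures are wrapped in URLError.
        if isinstance(err.reason, socket.timeout):
            return "timeout"
        return "unreachable"
    except socket.timeout:
        # Connected, but no response arrived in time (the hang described above).
        return "timeout"
    except (OSError, http.client.HTTPException):
        return "unreachable"


if __name__ == "__main__":
    print(probe_health("http://127.0.0.1:9292/_health_check"))
```

A result of `timeout` on the internal endpoint, combined with an atlas check in ERROR on the readiness endpoint, is consistent with the hang shown in the curl output.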
Resolving The Problem
Two solutions are available. The first is a temporary workaround to restore service stability, and the second is the permanent fix.
Solution 1: Disable Policy Checks and Evaluations (Workaround)
This workaround restores service stability by disabling the feature that triggers the bug. You must apply this change to any workspace that uses policy sets.
- Identify Affected Workspaces: In each organization in your Terraform Enterprise instance that has policy sets configured, navigate to the Settings > Policies page.
- Remove or Disable Policies: Remove all workspaces from each Policy Set's scope. Disabling the policies prevents them from being evaluated during a run, which stops the problematic database queries from being generated.
- Restart Terraform Enterprise: Restart the Terraform Enterprise nodes to restore service.
- Verify the Fix: After disabling the policies, trigger runs in workspaces and monitor the health check endpoint on the nodes. You can do this by executing the curl command again inside the container:
$ curl http://127.0.0.1:8080/api/v1/health/readiness
The atlas check should return to an OK status, and the overall status should no longer be ERROR.
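When verifying several nodes, the readiness response can be checked programmatically rather than by eye. The sketch below assumes the response body has the shape shown in the diagnostic output earlier in this article (a top-level `status` and a `checks` list of `check`/`status` pairs). The function names are hypothetical.

```python
import json


def failing_checks(readiness_body):
    """Return the names of all checks whose status is not OK."""
    doc = json.loads(readiness_body)
    return [c.get("check") for c in doc.get("checks", []) if c.get("status") != "OK"]


def is_ready(readiness_body):
    """True when the overall status is not ERROR and no individual check fails."""
    doc = json.loads(readiness_body)
    return doc.get("status") != "ERROR" and not failing_checks(readiness_body)


if __name__ == "__main__":
    # Abbreviated sample taken from the failing response shown in this article.
    sample = (
        '{"node":"terraform-enterprise-b677f7fbf-t8k8x","status":"ERROR",'
        '"checks":[{"check":"atlas","status":"ERROR"},'
        '{"check":"database","status":"OK"}]}'
    )
    print(failing_checks(sample), is_ready(sample))
```

The body passed to these functions can be the output of the curl command above, captured to a file or a variable.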
Solution 2: Upgrade Terraform Enterprise (Permanent Fix)
The long-term solution is to upgrade your Terraform Enterprise instance to a version where this bug has been resolved.
- Monitor Release Notes: Review the official Terraform Enterprise release notes for information about a fix for the policy evaluation bug.
- Perform Upgrade: Once a version containing the fix is available, schedule and perform an upgrade of your Terraform Enterprise installation by following the official upgrade documentation.
After applying either the workaround or the permanent fix, the atlas service will no longer experience web worker exhaustion. The /api/v1/health/readiness endpoint will consistently return an HTTP 200 status code, and all health checks within the response body will report an OK status. This will restore the stability of your Terraform Enterprise installation and prevent nodes from being incorrectly marked as unhealthy.
Document Location
Worldwide
Document Information
Modified date:
12 May 2026
UID
ibm17272682