GPU and model issues
You might face watsonx Orchestrate installation issues that are related to GPUs and models. Go through the following sections to resolve these problems.
wo-ai-cognitive-mapper-svc cannot be scheduled on a GPU worker node
- Symptoms
wo-ai-cognitive-mapper-svc pods might not be scheduled onto GPU worker nodes.
- Diagnosing the problem
- The ReplicaSet wo-ai-cognitive-mapper-svc-6c44d9576d is configured to run only two replicas, and the corresponding pods are running normally without scheduling issues:
  - wo-ai-cognitive-mapper-svc-6c44d9576d-74z2c
  - wo-ai-cognitive-mapper-svc-6c44d9576d-cfp88
The upgrade did not require more replicas, and no GPU scheduling failures were found for the active pods.
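To confirm that only the two expected replicas exist, you can count the mapper pods in the output of oc get pods. The following is a minimal sketch, assuming Bash and assuming that every pod of the service starts with the wo-ai-cognitive-mapper-svc- name prefix; the function name count_mapper_pods is hypothetical and is not part of watsonx Orchestrate or the oc CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of watsonx Orchestrate): counts the
# wo-ai-cognitive-mapper-svc pods in `oc get pods --no-headers` output read from stdin.
# Assumption: every pod of the service starts with this name prefix.
count_mapper_pods() {
  # grep -c prints the number of matching lines (0 when nothing matches) and
  # exits with status 1 on no match; "|| true" keeps that status from aborting
  # callers that run with set -e.
  grep -c '^wo-ai-cognitive-mapper-svc-' || true
}
```

For example, oc get pods -n <namespace> --no-headers | count_mapper_pods is expected to print 2 in the healthy state described above. Reading from stdin keeps the helper independent of the cluster, so you can also run it against saved command output.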
- Solution
- If you still observe an unscheduled or failing pod, you can:
- Verify the pod status again and confirm that only two pods exist (expected).
- Check the pod events by running:
oc describe pod <pod-name> -n <namespace>
Look specifically for GPU scheduling errors, such as insufficient GPU resources, node taints, node selectors that no node matches, or missing tolerations.
In summary, the service is functioning as expected with two healthy replicas. Investigate scheduling issues only if additional pods appear or if the pod events indicate node scheduling constraints.
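Because oc describe pod prints a long report, it can help to keep only the lines that usually explain a GPU scheduling failure. The following is a minimal sketch, assuming Bash and NVIDIA GPUs exposed to the cluster as the nvidia.com/gpu extended resource; the function name gpu_scheduling_hints and its list of patterns are hypothetical and are not an exhaustive list of scheduler messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of watsonx Orchestrate): keeps only the lines of
# `oc describe pod` output (read from stdin) that commonly point to GPU scheduling problems.
# Assumption: NVIDIA GPUs are exposed as the nvidia.com/gpu extended resource.
gpu_scheduling_hints() {
  # Case-insensitive patterns:
  #   FailedScheduling               - warning event from the scheduler
  #   insufficient nvidia.com/gpu    - no node has a free GPU
  #   taint                          - a node taint that the pod does not tolerate
  #   node affinity, didn.t match    - node selector or affinity mismatch
  #   node-selectors, tolerations    - the pod's own placement settings
  # "|| true" avoids a non-zero exit status when no line matches.
  grep -Ei 'FailedScheduling|insufficient nvidia\.com/gpu|taint|node affinity|didn.t match|node-selectors|tolerations' || true
}
```

For example: oc describe pod <pod-name> -n <namespace> | gpu_scheduling_hints. An empty result suggests that the pod is not blocked by GPU capacity, taints, or node selection, which matches the healthy state described above.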
Agent creation failure without models
- Symptoms
- If no models are defined in an agentic installation of watsonx Orchestrate, agent creation and default agent invocation fail, with errors such as "Creating the agent failed. Please try again."
- Solution
- To resolve this issue, use one of the following approaches:
- For an agentic installation of watsonx Orchestrate, specify at least one model in the install-options.yml file.
- Use watsonx_ai_ifm to mirror the required models.