Configuring NVIDIA Inference Microservices (NIMs)

After you install CAS, you must create the cas-config ConfigMap, which provides the nv-ingest endpoint information.

To determine the nv-ingest endpoints, you must know the namespace in which nv-ingest is installed. The endpoint URLs follow the pattern http(s)://<service-name>.<namespace>.svc.cluster.local. Because nv-ingest does not support HTTPS by default, an installation in the default nv-ingest namespace results in the following URLs:
  • Embed service URL: http://llama-32-nv-embedqa-1b-v2.nv-ingest.svc.cluster.local
  • Ingest service URL: http://nv-ingest.nv-ingest.svc.cluster.local
Note: These URLs must not end with the / character.
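The URL pattern can be sketched in POSIX shell as follows. This is a minimal sketch that assumes nv-ingest runs in its default nv-ingest namespace over plain HTTP. The variable names and the strip_trailing_slash helper are hypothetical and exist only for illustration; change NV_INGEST_NS if nv-ingest is installed in a different namespace.

```shell
#!/bin/sh
# Hypothetical helper: remove one trailing "/" because the URLs must not end with it.
strip_trailing_slash() {
  printf '%s\n' "${1%/}"
}

# Assumption: default nv-ingest namespace, plain HTTP (HTTPS is not enabled by default).
NV_INGEST_NS="nv-ingest"
SCHEME="http"

# Pattern: <scheme>://<service-name>.<namespace>.svc.cluster.local
EMBED_URL=$(strip_trailing_slash "${SCHEME}://llama-32-nv-embedqa-1b-v2.${NV_INGEST_NS}.svc.cluster.local")
INGEST_URL=$(strip_trailing_slash "${SCHEME}://nv-ingest.${NV_INGEST_NS}.svc.cluster.local")

echo "Embed service URL:  ${EMBED_URL}"
echo "Ingest service URL: ${INGEST_URL}"
```

To confirm that the service names exist in your installation, you can list the services in the namespace with oc get svc -n nv-ingest.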
To create the ConfigMap, run the following command with the oc command-line tool, replacing the placeholders with the URLs that you determined:
oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cas-config
  namespace: ibm-cas
data:
  NVMM_EMBED_SERVICE: <Embed Service URL as determined above>
  NVMM_NIM_SERVICE: <Ingest Service URL as determined above>
EOF
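Instead of editing the placeholders by hand, you can generate the same manifest from shell variables. The following is a sketch under stated assumptions, not a product tool: the render_cas_config function is hypothetical, and the URL values assume the default nv-ingest namespace. The function rejects any URL that ends with "/" and quotes the values in the generated YAML.

```shell
#!/bin/sh
# Assumed values for a default nv-ingest installation; adjust to your namespace.
EMBED_URL="http://llama-32-nv-embedqa-1b-v2.nv-ingest.svc.cluster.local"
INGEST_URL="http://nv-ingest.nv-ingest.svc.cluster.local"

# Hypothetical helper: print the cas-config ConfigMap manifest to stdout.
# Returns non-zero if either URL ends with "/", because the URLs must not end with it.
render_cas_config() {
  for url in "$1" "$2"; do
    case "$url" in
      */) echo "error: URL must not end with /: $url" >&2; return 1 ;;
    esac
  done
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cas-config
  namespace: ibm-cas
data:
  NVMM_EMBED_SERVICE: "$1"
  NVMM_NIM_SERVICE: "$2"
EOF
}

render_cas_config "$EMBED_URL" "$INGEST_URL"
```

To apply the generated manifest, pipe it to oc: render_cas_config "$EMBED_URL" "$INGEST_URL" | oc apply -f -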

NVIDIA re-ranker

Optionally, you can enable the NVIDIA re-ranker service by setting NVMM_NEMO_RANKER in the cas-config ConfigMap. The re-ranker analyzes the semantic relevance of search results and reorders them, which improves precision in enterprise search and AI-driven workflows. For more information, see the NVIDIA documentation.

When nv-ingest is installed in its default namespace, the re-ranker service URL is:
  • Re-ranker service URL: http://llama-32-nv-rerankqa-1b-v2.nv-ingest.svc.cluster.local
Note: The service URL must not end with the / character.

To enable the NVIDIA re-ranker service, set the NVMM_NEMO_RANKER key to YES and the NVMM_NEMO_RANKER_SERVICE key to the re-ranker service URL in the cas-config ConfigMap by running the following command:

oc patch configmap cas-config \
  -n ibm-cas \
  --type merge \
  -p '{"data":{"NVMM_NEMO_RANKER": "YES", "NVMM_NEMO_RANKER_SERVICE": "<Re-ranker service URL as determined above>"}}'
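The merge-patch payload for the re-ranker can also be assembled from a shell variable, so that the URL is filled in consistently and checked for a trailing "/". This sketch assumes the default nv-ingest namespace; RERANKER_URL and build_ranker_patch are hypothetical names used only for illustration. It prints the same payload that the oc patch command uses.

```shell
#!/bin/sh
# Assumption: the re-ranker runs in the default nv-ingest namespace over plain HTTP.
RERANKER_URL="http://llama-32-nv-rerankqa-1b-v2.nv-ingest.svc.cluster.local"

# Hypothetical helper: print the JSON merge patch that enables the re-ranker.
# Returns non-zero if the URL ends with "/".
build_ranker_patch() {
  case "$1" in
    */) echo "error: URL must not end with /: $1" >&2; return 1 ;;
  esac
  printf '{"data":{"NVMM_NEMO_RANKER":"YES","NVMM_NEMO_RANKER_SERVICE":"%s"}}\n' "$1"
}

PATCH=$(build_ranker_patch "$RERANKER_URL") || exit 1
echo "$PATCH"

# To apply the patch (requires access to the cluster):
#   oc patch configmap cas-config -n ibm-cas --type merge -p "$PATCH"
# To inspect the resulting ConfigMap:
#   oc get configmap cas-config -n ibm-cas -o yaml
```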