Configuring NVIDIA Inference Microservices (NIMs)

After you install CAS, create the cas-config ConfigMap, which provides the NeMo Retriever Library endpoint information.

To determine the NeMo Retriever Library endpoint information, you must know the namespace in which NeMo Retriever Library is installed. The endpoints follow the pattern http(s)://<service-name>.<namespace>.svc.cluster.local. NeMo Retriever Library does not support https by default, so with the default installation namespace (nv-ingest) the URLs are as follows:
  • Embed service URL: http://llama-32-nv-embedqa-1b-v2.nv-ingest.svc.cluster.local
  • Ingest service URL: http://nv-ingest.nv-ingest.svc.cluster.local
Note: These URLs must not end with the / character.
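
If NeMo Retriever Library is installed in a different namespace, the URL pattern above can be applied with a small helper. The following is a minimal sketch, not part of the product: the function name service_url and the variables NIM_NAMESPACE, EMBED_URL, and INGEST_URL are hypothetical, and the service names are the ones from the default installation shown above.

```shell
# Hypothetical helper: builds <scheme>://<service>.<namespace>.svc.cluster.local
# The scheme defaults to http. No trailing / is added.
service_url() {
  service="$1"
  namespace="$2"
  scheme="${3:-http}"
  printf '%s://%s.%s.svc.cluster.local\n' "$scheme" "$service" "$namespace"
}

# Assumption: replace nv-ingest with the namespace used in your installation.
NIM_NAMESPACE="nv-ingest"
EMBED_URL=$(service_url llama-32-nv-embedqa-1b-v2 "$NIM_NAMESPACE")
INGEST_URL=$(service_url nv-ingest "$NIM_NAMESPACE")

echo "Embed service URL:  $EMBED_URL"
echo "Ingest service URL: $INGEST_URL"
```

To confirm the service names in your cluster, you can list them with oc get svc -n <namespace>.
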
To create the ConfigMap, run the following command with the oc command-line tool:
oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cas-config
  namespace: ibm-cas
data:
  NVMM_EMBED_SERVICE: <Embed service URL as determined above>
  NVMM_NIM_SERVICE: <Ingest service URL as determined above>
EOF
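
After the ConfigMap is applied, it can be useful to read the values back and check them against the rule that the URLs must not end with the / character. The following is a minimal sketch under that assumption: the function names check_service_url and verify_cas_config are hypothetical, and verify_cas_config requires oc with access to the cluster.

```shell
# Hypothetical check: succeeds only for http(s) URLs without a trailing /.
check_service_url() {
  case "$1" in
    */) echo "invalid: trailing slash: $1" >&2; return 1 ;;
    http://*|https://*) return 0 ;;
    *) echo "invalid: not an http(s) URL: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: reads each key from the cas-config ConfigMap and validates it.
verify_cas_config() {
  for key in NVMM_EMBED_SERVICE NVMM_NIM_SERVICE; do
    value=$(oc get configmap cas-config -n ibm-cas -o "jsonpath={.data.$key}") || return 1
    check_service_url "$value" || return 1
  done
}
```

Run verify_cas_config after the oc apply command. A nonzero exit status indicates that a key could not be read or that its value does not pass the check.
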

NVIDIA re-ranker

Optionally, you can set NVMM_NEMO_RANKER in the cas-config ConfigMap to enable the NVIDIA re-ranker service. This service analyzes the semantic relevance of search results and reorders them, which improves precision in enterprise search and streamlines AI-driven workflows. For more information, see the NVIDIA documentation.

  • Re-ranker service URL: http://llama-32-nv-rerankqa-1b-v2.nv-ingest.svc.cluster.local
Note: The service URL must not end with the / character.

To enable the NVIDIA re-ranker service, set the NVMM_NEMO_RANKER key to YES and the NVMM_NEMO_RANKER_SERVICE key to the re-ranker service URL in the cas-config ConfigMap:

oc patch configmap cas-config \
  -n ibm-cas \
  --type merge \
  -p '{"data":{"NVMM_NEMO_RANKER": "YES", "NVMM_NEMO_RANKER_SERVICE": "<Re-ranker service URL as determined above>"}}'