Configuring NVIDIA Inference Microservices (NIMs)
After you install CAS, you must create the cas-config ConfigMap, which provides the NeMo Retriever Library endpoint information.
To determine the NeMo Retriever Library endpoint information, you must know the namespace in which NeMo Retriever Library is installed. The endpoints are constructed by using the pattern http(s)://<service-name>.<namespace>.svc.cluster.local. NeMo Retriever Library does not support https by default, so with its default installation namespace (nv-ingest) the URLs are as follows:
- Embed service URL: http://llama-32-nv-embedqa-1b-v2.nv-ingest.svc.cluster.local
- Ingest service URL: http://nv-ingest.nv-ingest.svc.cluster.local
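If NeMo Retriever Library is installed in a different namespace, the same pattern can be applied with a small shell helper. The following is a minimal sketch, not part of CAS or the oc tool: the function name svc_url and the variable names are hypothetical, and the service names are the ones from the default installation listed above.

```shell
#!/bin/sh
# Hypothetical helper: build an in-cluster service URL.
#   $1 = service name, $2 = namespace, $3 = scheme (optional, defaults to http)
svc_url() {
  printf '%s://%s.%s.svc.cluster.local\n' "${3:-http}" "$1" "$2"
}

# Default namespace taken from the URLs above; change it if yours differs.
NIM_NAMESPACE="nv-ingest"

EMBED_URL=$(svc_url llama-32-nv-embedqa-1b-v2 "$NIM_NAMESPACE")
INGEST_URL=$(svc_url nv-ingest "$NIM_NAMESPACE")

echo "Embed service URL:  $EMBED_URL"
echo "Ingest service URL: $INGEST_URL"
```

The helper never appends a trailing / character, so the resulting values can be used as is for NVMM_EMBED_SERVICE and NVMM_NIM_SERVICE.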
Note: The service URLs must not end with the / character.
To complete this activity, use the oc command line tool and run the following command:
oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cas-config
  namespace: ibm-cas
data:
  NVMM_EMBED_SERVICE: <Embed Service URL as determined above>
  NVMM_NIM_SERVICE: <Ingest Service URL as determined above>
EOF

NVIDIA re-ranker
Optionally, you can set NVMM_NEMO_RANKER in the CasInstall CR
to enable the NVIDIA re-ranker service.
This service analyzes semantic relevance and reorders search results to improve precision in
enterprise search and streamline AI-driven workflows. For more information, see the NVIDIA documentation.
- Re-ranker service URL:
http://llama-32-nv-rerankqa-1b-v2.nv-ingest.svc.cluster.local
Note: The service URL must not end with the / character.
To enable the NVIDIA re-ranker service, set the NVMM_NEMO_RANKER flag to YES and the NVMM_NEMO_RANKER_SERVICE flag to the re-ranker service URL in the cas-config ConfigMap:
oc patch configmap cas-config \
-n ibm-cas \
--type merge \
-p '{"data":{"NVMM_NEMO_RANKER": "YES", "NVMM_NEMO_RANKER_SERVICE": "<Re-ranker service URL as determined above>"}}'
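Because the service URL must not end with the / character, it may be convenient to normalize the value and build the patch payload in a script before running oc patch. The following is a minimal sketch under stated assumptions: the helper names strip_trailing_slash and reranker_patch are hypothetical, and the URL is assumed to contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: remove any trailing "/" characters from a URL.
strip_trailing_slash() {
  url="$1"
  while [ "${url%/}" != "$url" ]; do
    url="${url%/}"
  done
  printf '%s\n' "$url"
}

# Hypothetical helper: build the JSON merge patch for the cas-config ConfigMap.
# Assumes the URL needs no JSON escaping (no quotes or backslashes).
reranker_patch() {
  printf '{"data":{"NVMM_NEMO_RANKER":"YES","NVMM_NEMO_RANKER_SERVICE":"%s"}}\n' \
    "$(strip_trailing_slash "$1")"
}

# Example input with an accidental trailing slash.
RERANK_URL="http://llama-32-nv-rerankqa-1b-v2.nv-ingest.svc.cluster.local/"
PATCH=$(reranker_patch "$RERANK_URL")
echo "$PATCH"

# With cluster access, the payload could then be applied and inspected:
#   oc patch configmap cas-config -n ibm-cas --type merge -p "$PATCH"
#   oc get configmap cas-config -n ibm-cas -o yaml
```

The oc commands are left as comments so that the script can be run without a cluster to inspect the generated payload first.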