Enabling GPU metrics computation

You can speed up generating evaluation results by running the data safety, answer quality, and retrieval quality metrics computation on GPUs.

Procedure

Complete the following steps to enable metrics computation to run on GPUs:

  1. Log in to the cluster.
    oc login -u <user> -p <password>
  2. Go to the project that contains the watsonxaiifm and openscale custom resources.
    oc project <project_name>
  3. Patch the watsonxaiifm-cr custom resource to run the detectors on GPUs.
    oc patch watsonxaiifm watsonxaiifm-cr -p '{"spec":{"fms_detector_hap_replicas":1,"fms_detector_hap_resources":{"limits":{"cpu":"1","memory":"4Gi","nvidia.com/gpu":"1"},"requests":{"cpu":"100m","memory":"2Gi","nvidia.com/gpu":"1"}},"fms_detector_pii_replicas":1,"fms_detector_pii_resources":{"limits":{"cpu":"1","memory":"4Gi","nvidia.com/gpu":"1"},"requests":{"cpu":"100m","memory":"2Gi","nvidia.com/gpu":"1"}},"answer_relevance_replicas":1,"answer_relevance_resources":{"limits":{"cpu":"1","memory":"4Gi","nvidia.com/gpu":"1"},"requests":{"cpu":"100m","memory":"2Gi","nvidia.com/gpu":"1"}},"context_relevance_replicas":1,"context_relevance_resources":{"limits":{"cpu":"1","memory":"4Gi","nvidia.com/gpu":"1"},"requests":{"cpu":"100m","memory":"2Gi","nvidia.com/gpu":"1"}},"faithfulness_replicas":1,"faithfulness_resources":{"limits":{"cpu":"1","memory":"4Gi","nvidia.com/gpu":"1"},"requests":{"cpu":"100m","memory":"2Gi","nvidia.com/gpu":"1"}}}}' --type=merge
    If you want to pin the detectors to specific GPU nodes, run the following command:
    oc patch watsonxaiifm watsonxaiifm-cr -p '{"spec":{"detector_overrides":{"hap":{"nodeSelector":{"kubernetes.io/hostname":"<worker_hostname>"}},"pii":{"nodeSelector":{"kubernetes.io/hostname":"<worker_hostname>"}},"answer_relevance":{"nodeSelector":{"kubernetes.io/hostname":"<worker_hostname>"}},"context_relevance":{"nodeSelector":{"kubernetes.io/hostname":"<worker_hostname>"}},"faithfulness":{"nodeSelector":{"kubernetes.io/hostname":"<worker_hostname>"}}}}}' --type=merge
  4. Verify that the watsonxaiifm-cr patch completes successfully.
    oc get watsonxaiifm watsonxaiifm-cr
  5. Patch the openscale custom resource to enable the flags that use the GPU-enabled models.
    oc patch woservice aiopenscale -p '{"spec": {"mcm": {"use_gr_api":"True", "enable_metric_parallellism":"True"} }}' --type=merge
  6. Verify that the openscale patch completes successfully.
    oc get woservice aiopenscale
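
The long inline JSON patch in step 3 can be easier to review and version-control as a patch file. The following sketch shows the same resource settings expressed as YAML for one detector (hap); the file name gpu-detectors.yaml is illustrative, and the same pattern would be repeated for the pii, answer_relevance, context_relevance, and faithfulness fields.

```yaml
# gpu-detectors.yaml -- illustrative excerpt of the step 3 patch.
# Shows the replica count and resource settings for the hap detector only;
# repeat the *_replicas and *_resources pattern for pii, answer_relevance,
# context_relevance, and faithfulness.
spec:
  fms_detector_hap_replicas: 1
  fms_detector_hap_resources:
    limits:
      cpu: "1"
      memory: 4Gi
      nvidia.com/gpu: "1"
    requests:
      cpu: 100m
      memory: 2Gi
      nvidia.com/gpu: "1"
```

Assuming a recent oc client (which inherits kubectl's `--patch-file` support), you could then apply it with `oc patch watsonxaiifm watsonxaiifm-cr --type=merge --patch-file gpu-detectors.yaml`; `--patch-file` accepts YAML as well as JSON.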
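
The node-pinning patch in step 3 can be kept as a file in the same way. A sketch, assuming the illustrative file name detector-node-pinning.yaml; the `<worker_hostname>` placeholder stays as-is and must be replaced with a real node name (for example, one listed by `oc get nodes`). Only two detectors are shown; the remaining ones follow the same pattern.

```yaml
# detector-node-pinning.yaml -- illustrative excerpt of the node-pinning
# patch. Pins each detector to a specific GPU worker node via a
# nodeSelector; repeat the pattern for answer_relevance, context_relevance,
# and faithfulness.
spec:
  detector_overrides:
    hap:
      nodeSelector:
        kubernetes.io/hostname: <worker_hostname>
    pii:
      nodeSelector:
        kubernetes.io/hostname: <worker_hostname>
```

Applied with `oc patch watsonxaiifm watsonxaiifm-cr --type=merge --patch-file detector-node-pinning.yaml`, this has the same effect as the inline command in step 3.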