Installing operators for services that require GPUs

If you plan to install services that require graphics processing units (GPUs), you must install several operators that manage the NVIDIA software components that are needed to provision GPUs. In addition, if you plan to install services that use inference foundation models, you must install Red Hat® OpenShift® AI to start and serve the models.

Installation phase
  • Setting up a client workstation
  • Setting up a cluster
  • Collecting required information
  • Preparing to run installs in a restricted network
  • Preparing to run installs from a private container registry
  • Preparing the cluster for Cloud Pak for Data (you are here)
  • Preparing to install an instance of Cloud Pak for Data
  • Installing an instance of Cloud Pak for Data
  • Setting up the Cloud Pak for Data control plane
  • Installing solutions and services

Who needs to complete this task?

A cluster administrator must complete this task.

When do you need to complete this task?

This is a one-time setup task. Complete it if you plan to install one or more of the following services:
  • IBM Knowledge Catalog Premium
  • IBM Knowledge Catalog Standard
  • Watson Machine Learning (required to use optional GPU features)
  • Watson Machine Learning Accelerator
  • Watson Studio Runtimes that require GPU
  • watsonx.ai
  • watsonx Assistant (required to use optional GPU features)
  • watsonx Code Assistant for Red Hat Ansible® Lightspeed
  • watsonx Code Assistant for Z
  • watsonx Code Assistant for Z Code Explanation
  • watsonx.governance (the service does not require GPUs but does have a dependency on Red Hat OpenShift AI)
  • watsonx Orchestrate

About this task

Every service that requires GPUs needs both the Node Feature Discovery Operator and the NVIDIA GPU Operator. Some services also require Red Hat OpenShift AI.

Review the following table to determine which operators you must install based on the Cloud Pak for Data services that you plan to install:

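| Service | Node Feature Discovery Operator | NVIDIA GPU Operator | Red Hat OpenShift AI |
|---|---|---|---|
| IBM Knowledge Catalog Premium | Required | Required | Required |
| IBM Knowledge Catalog Standard | Required | Required | Required |
| Watson Machine Learning | Required | Required | Not required |
| Watson Machine Learning Accelerator | Required | Required | Not required |
| Watson Studio Runtimes that require GPU | Required | Required | Not required |
| watsonx.ai | Required | Required | Required |
| watsonx Assistant | Required to use optional GPU features | Required to use optional GPU features | Required to use optional GPU features |
| watsonx Code Assistant for Red Hat Ansible Lightspeed | Required | Required | Required |
| watsonx Code Assistant for Z | Required | Required | Required |
| watsonx Code Assistant for Z Code Explanation | Required | Required | Required |
| watsonx.governance | Not required | Not required | 5.0.0: Required; 5.0.1 or later: Not required |
| watsonx Orchestrate | Required | Required | Required |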
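If you plan to install a mix of services, it can help to combine their rows from the table into a single list of operators. The following Python sketch encodes the table as a lookup. It is a hypothetical helper, not an IBM tool: the function name `required_operators` and the `cpd_version` parameter are illustrative, and the service names are copied from the table. List Watson Machine Learning or watsonx Assistant only if you plan to use their optional GPU features.

```python
from typing import Iterable

# Operator names, copied from the table columns.
NFD = "Node Feature Discovery Operator"
GPU = "NVIDIA GPU Operator"
RHOAI = "Red Hat OpenShift AI"

# Hypothetical transcription of the table above. Include Watson Machine Learning
# and watsonx Assistant only if you plan to use their optional GPU features.
REQUIREMENTS = {
    "IBM Knowledge Catalog Premium": {NFD, GPU, RHOAI},
    "IBM Knowledge Catalog Standard": {NFD, GPU, RHOAI},
    "Watson Machine Learning": {NFD, GPU},
    "Watson Machine Learning Accelerator": {NFD, GPU},
    "Watson Studio Runtimes that require GPU": {NFD, GPU},
    "watsonx.ai": {NFD, GPU, RHOAI},
    "watsonx Assistant": {NFD, GPU, RHOAI},
    "watsonx Code Assistant for Red Hat Ansible Lightspeed": {NFD, GPU, RHOAI},
    "watsonx Code Assistant for Z": {NFD, GPU, RHOAI},
    "watsonx Code Assistant for Z Code Explanation": {NFD, GPU, RHOAI},
    "watsonx Orchestrate": {NFD, GPU, RHOAI},
}


def required_operators(services: Iterable[str], cpd_version: str) -> set:
    """Return the union of operators that the given services need.

    cpd_version matters only for watsonx.governance, which needs
    Red Hat OpenShift AI in 5.0.0 but not in 5.0.1 or later.
    """
    needed = set()
    for service in services:
        if service == "watsonx.governance":
            # No GPU operators; Red Hat OpenShift AI only in 5.0.0.
            if cpd_version == "5.0.0":
                needed.add(RHOAI)
        elif service in REQUIREMENTS:
            needed |= REQUIREMENTS[service]
        else:
            raise ValueError(f"Service is not listed in the table: {service}")
    return needed
```

For example, `required_operators(["Watson Machine Learning", "watsonx.governance"], "5.0.0")` returns all three operators: Watson Machine Learning contributes the two GPU operators, and watsonx.governance 5.0.0 adds Red Hat OpenShift AI.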

Procedure

The steps that you must complete depend on whether your cluster is connected to the internet:

  1. To install the operators on a cluster that can connect to the internet:
    1. Complete "Installing the Node Feature Discovery (NFD) Operator" in the NVIDIA GPU Operator on Red Hat OpenShift Container Platform documentation.
    2. Complete "Installing the NVIDIA GPU Operator" in the NVIDIA GPU Operator on Red Hat OpenShift Container Platform documentation.
    3. If any of the services that you plan to install require Red Hat OpenShift AI, complete "Preparing Red Hat OpenShift AI for use in IBM Cloud Pak for Data" in the Red Hat OpenShift AI documentation.
  2. To install the operators on a cluster that is disconnected or air-gapped:
    1. Complete "Deploy GPU Operators in a disconnected or air-gapped environment" in the NVIDIA GPU Operator on Red Hat OpenShift Container Platform documentation.
    2. If any of the services that you plan to install require Red Hat OpenShift AI, complete "Preparing Red Hat OpenShift AI for use in IBM Cloud Pak for Data" in the Red Hat OpenShift AI documentation.
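Before you move on, you might want to confirm that each operator finished installing. The following Python sketch is one way to do that; it is not part of the documented procedure. It assumes that the `oc` CLI is installed and logged in with permission to list ClusterServiceVersions (CSVs) in all namespaces, and that the operators came from their default OperatorHub packages, whose CSV names are assumed to begin with the prefixes in `CSV_PREFIXES`. Check those prefixes against `oc get csv -A` on your own cluster. The helper names `fetch_csvs` and `missing_operators` are hypothetical.

```python
import json
import subprocess

# ASSUMPTION: CSV name prefixes for the default OperatorHub packages.
# Verify them with `oc get csv -A` on your cluster.
CSV_PREFIXES = {
    "Node Feature Discovery Operator": "nfd.",
    "NVIDIA GPU Operator": "gpu-operator-certified.",
    "Red Hat OpenShift AI": "rhods-operator.",
}


def fetch_csvs() -> list:
    """Return every ClusterServiceVersion on the cluster (needs a logged-in `oc`)."""
    result = subprocess.run(
        ["oc", "get", "csv", "--all-namespaces", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    # `oc get ... -o json` for many objects returns a List with an "items" array.
    return json.loads(result.stdout)["items"]


def missing_operators(csvs: list, required: set) -> set:
    """Return the required operators that have no CSV in the Succeeded phase."""
    ready = set()
    for csv in csvs:
        name = csv.get("metadata", {}).get("name", "")
        phase = csv.get("status", {}).get("phase")
        if phase != "Succeeded":
            continue  # Still installing, failed, or pending.
        for operator, prefix in CSV_PREFIXES.items():
            if name.startswith(prefix):
                ready.add(operator)
    return set(required) - ready
```

For example, `missing_operators(fetch_csvs(), {"Node Feature Discovery Operator", "NVIDIA GPU Operator"})` returns an empty set when both operators report the Succeeded phase. Keeping the parsing separate from the `oc` call lets you test the check against saved JSON output.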

What to do next

Now that you've installed the operators for services that require GPUs, you're ready to complete "Creating secrets for services that use Multicloud Object Gateway".

If your environment includes services that support Multi-Instance GPU (MIG), you can optionally complete "Configuring NVIDIA Multi-Instance GPU (MIG)".