Installing operators for services that require GPUs
If you plan to install services that require graphics processing units (GPUs), you must install the operators that manage the NVIDIA software components needed to provision GPUs. In addition, if you plan to install services that use Inference foundation models, you must install Red Hat® OpenShift® AI to start and serve the models.
Installation phase
- Setting up a client workstation
- Setting up a cluster
- Collecting required information
- Preparing to run installs in a restricted network
- Preparing to run installs from a private container registry
- Preparing the cluster for Cloud Pak for Data
- Preparing to install an instance of Cloud Pak for Data
- Installing an instance of Cloud Pak for Data
- Setting up the Cloud Pak for Data control plane
- Installing solutions and services
Who needs to complete this task?
Cluster administrator: A cluster administrator must complete this task.
When do you need to complete this task?
One-time setup: Complete this task if you plan to install one or more of the following services:
- IBM Knowledge Catalog Premium
- IBM Knowledge Catalog Standard
- Watson Machine Learning (required to use optional GPU features)
- Watson Machine Learning Accelerator
- Watson Studio Runtimes that require GPU
- watsonx.ai
- watsonx Assistant (required to use optional GPU features)
- watsonx Code Assistant for Red Hat Ansible® Lightspeed
- watsonx Code Assistant for Z
- watsonx Code Assistant for Z Code Explanation
- watsonx.governance (the service does not require GPUs but does have a dependency on Red Hat OpenShift AI)
- watsonx Orchestrate
About this task
Every service that requires GPUs requires both the Node Feature Discovery Operator and the NVIDIA GPU Operator. Some services also require Red Hat OpenShift AI.
Review the following table to determine which operators you must install based on the Cloud Pak for Data services that you plan to install:
Service | Node Feature Discovery Operator | NVIDIA GPU Operator | Red Hat OpenShift AI |
---|---|---|---|
IBM Knowledge Catalog Premium | ✓ | ✓ | ✓ |
IBM Knowledge Catalog Standard | ✓ | ✓ | ✓ |
Watson Machine Learning | ✓ | ✓ | Not required. |
Watson Machine Learning Accelerator | ✓ | ✓ | Not required. |
Watson Studio Runtimes that require GPU | ✓ | ✓ | Not required. |
watsonx.ai | ✓ | ✓ | ✓ |
watsonx Assistant | Required to use optional GPU features. | Required to use optional GPU features. | Required to use optional GPU features. |
watsonx Code Assistant for Red Hat Ansible Lightspeed | ✓ | ✓ | ✓ |
watsonx Code Assistant for Z | ✓ | ✓ | ✓ |
watsonx Code Assistant for Z Code Explanation | ✓ | ✓ | ✓ |
watsonx.governance | Not required. | Not required. | ✓ |
watsonx Orchestrate | ✓ | ✓ | ✓ |
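The table lookup can also be expressed as a small script. The following is a minimal sketch, not part of the product tooling: it transcribes the table into a Python dictionary and returns the union of operators for a list of planned services. The names `REQUIRED_OPERATORS`, `OPTIONAL_GPU_SERVICES`, and `operators_for` are hypothetical and exist only for this illustration. The watsonx.governance entry assumes the Red Hat OpenShift AI dependency noted in the services list.

```python
# Operator names as they appear in the table header.
NFD = "Node Feature Discovery Operator"
GPU = "NVIDIA GPU Operator"
RHOAI = "Red Hat OpenShift AI"

# Hypothetical lookup table, transcribed from the service/operator table.
REQUIRED_OPERATORS = {
    "IBM Knowledge Catalog Premium": {NFD, GPU, RHOAI},
    "IBM Knowledge Catalog Standard": {NFD, GPU, RHOAI},
    "Watson Machine Learning": {NFD, GPU},
    "Watson Machine Learning Accelerator": {NFD, GPU},
    "Watson Studio Runtimes that require GPU": {NFD, GPU},
    "watsonx.ai": {NFD, GPU, RHOAI},
    "watsonx Assistant": {NFD, GPU, RHOAI},
    "watsonx Code Assistant for Red Hat Ansible Lightspeed": {NFD, GPU, RHOAI},
    "watsonx Code Assistant for Z": {NFD, GPU, RHOAI},
    "watsonx Code Assistant for Z Code Explanation": {NFD, GPU, RHOAI},
    # Does not require GPUs, but depends on Red Hat OpenShift AI.
    "watsonx.governance": {RHOAI},
    "watsonx Orchestrate": {NFD, GPU, RHOAI},
}

# Services flagged "required to use optional GPU features" in the services list.
OPTIONAL_GPU_SERVICES = {"Watson Machine Learning", "watsonx Assistant"}


def operators_for(services, use_optional_gpu=True):
    """Return the sorted list of operators needed for the planned services.

    Set use_optional_gpu=False to skip services whose operators are needed
    only for optional GPU features.
    """
    needed = set()
    for service in services:
        if service not in REQUIRED_OPERATORS:
            raise ValueError(f"Unknown service: {service!r}")
        if service in OPTIONAL_GPU_SERVICES and not use_optional_gpu:
            continue
        needed |= REQUIRED_OPERATORS[service]
    return sorted(needed)


if __name__ == "__main__":
    planned = ["watsonx.governance", "Watson Machine Learning Accelerator"]
    for operator in operators_for(planned):
        print(operator)
```

Using sets keeps the result free of duplicates when several planned services share the same operators.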
Procedure
The steps that you must complete depend on whether your cluster is connected to the internet:
What to do next
Now that you've installed the operators for services that require GPUs, you're ready to complete Creating secrets for services that use Multicloud Object Gateway.
You can optionally complete Configuring NVIDIA Multi-Instance GPU (MIG) if your environment includes services that support MIG.