Configuring GPU features and models

GPU features give watsonx Assistant accelerated processing, support for complex models, real-time inference, and large-scale data handling, which improves the efficiency and accuracy of its AI models. Using the GPU features is optional.

Permissions you need for these tasks: You must be an administrator of the Red Hat® OpenShift® project to manage the cluster.

Complete the following tasks to configure the GPU features and supported models:

Enabling or disabling GPU features
Supported foundation models for GPU features
System requirements
Enabling a specific model
Disabling the Out of the Box model
Disabling the specialized model
Adjusting Replicas for a model
Adjusting Shards for Out of the Box model

Enabling or disabling GPU features

To enable the GPU features, use the following command:

oc patch wa wa --type=merge -p="{\"configOverrides\": {\"enabled_components\": {\"store\": {\"ifm\": true}}, \"watsonx_enabled\": true }}"

To disable the GPU features, use the following command:

oc patch wa wa --type=merge -p="{\"configOverrides\": {\"enabled_components\": {\"store\": {\"ifm\": false}}, \"watsonx_enabled\": false }}"
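The enable and disable commands differ only in the two boolean values. As a minimal sketch, the payload can be generated by a small shell function so that the JSON needs no escaped quotes. The function name `gpu_feature_patch` is hypothetical and not part of `oc`; the `oc patch` call is left commented out because it requires cluster access.

```shell
# Hypothetical helper (an assumption, not part of oc): prints the merge patch
# that turns the GPU features on or off.
gpu_feature_patch() {
  case "$1" in
    true|false) ;;
    *) echo "usage: gpu_feature_patch true|false" >&2; return 1 ;;
  esac
  printf '{"configOverrides": {"enabled_components": {"store": {"ifm": %s}}, "watsonx_enabled": %s}}\n' "$1" "$1"
}

# Preview the payload before applying it.
gpu_feature_patch true

# To apply it, pass the payload to the command from this section:
# oc patch wa wa --type=merge -p="$(gpu_feature_patch true)"
```

Previewing the payload first makes it easier to spot a malformed patch before it reaches the cluster.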

Supported foundation models for GPU features

5.1.2 and later

Note: Supported foundation models are available only when the GPU feature is enabled.

GPU features support the following foundation models during installation:

Specialized model in watsonx Assistant: ibm-granite-8b-unified-api-model-v2

Out of the Box models

granite-3-8b-instruct
llama-3-1-70b-instruct

You can install one or more models based on the GPU features that you want to enable. Use the following table to determine which models to install:

Model name	Requires additional model during installation	Conversational search Query rewrite	Conversational search Answer generation	Conversational skills Custom actions Information gathering
granite-3-8b-instruct	Yes. One of the following models: ibm-granite-8b-unified-api-model-v2 or llama-3-1-70b-instruct	No	Yes	No
ibm-granite-8b-unified-api-model-v2	Yes. One of the following models: granite-3-8b-instruct or llama-3-1-70b-instruct	Yes	No	Yes
llama-3-1-70b-instruct	No	Yes	Yes	Yes
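The table can also be read in the other direction, from a feature to the models that support it. The following sketch encodes the rows above as a lookup. The function name `models_for_feature` and the feature keys (`query-rewrite`, `answer-generation`, `information-gathering`) are hypothetical labels chosen for this example, not product identifiers.

```shell
# Hypothetical lookup (an assumption for illustration): prints the models that
# support a given feature, according to the table above.
models_for_feature() {
  case "$1" in
    query-rewrite|information-gathering)
      echo "ibm-granite-8b-unified-api-model-v2 llama-3-1-70b-instruct" ;;
    answer-generation)
      echo "granite-3-8b-instruct llama-3-1-70b-instruct" ;;
    *)
      echo "unknown feature: $1" >&2; return 1 ;;
  esac
}

models_for_feature answer-generation
```

Only llama-3-1-70b-instruct appears in every list, which matches the table: it is the only model that needs no additional model during installation.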

System requirements

5.1.2 and later

The following table lists the recommended number of GPUs to configure on a single OpenShift worker node for the models that are provided with watsonx Assistant, at the default context window length.

Specialized model in watsonx Assistant

Model name	Description	System requirements	Supported GPU
Model name: ibm-granite-8b-unified-api-model-v2; Model ID: `ibm-granite-8b-unified-api-model-v2`	Granite models are used for a wide range of generative and nongenerative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with more innovations from IBM Research and the open community.	CPUs: 10; Memory: 64 GB RAM; Storage: 45 GB	Configuration: You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S

For details on Out of the Box models and their system requirements, see Foundation models.

Enabling a specific model

5.1.2 and later

Note: If the available GPU memory is insufficient to install a new model, disable any unused models to free up memory. For more information, see Disabling the Out of the Box model or Disabling the specialized model.

When you enable GPU features, the ibm-granite-8b-unified-api-model-v2 and granite-3-8b-instruct models are installed automatically. To install a different model, use the following commands.

To enable the Out of the Box model without modifying replicas or shards:

oc patch wa wa --type='merge' -p='{"configOverrides":{"ifm":{"model_config":{"ootb":{"<model-name>":{}}}}}}'

To enable the Specialized model without modifying replicas or shards:

oc patch wa wa --type='merge' -p='{"configOverrides":{"ifm":{"model_config":{"syom":{"ibm-granite-8b-unified-api-model-v2":{}}}}}}'

If you change the model after installation, restart the store deployment so that it picks up the newly installed models:

oc rollout restart deployment wa-store
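As a rough sketch of the enable-then-restart sequence, the `<model-name>` placeholder can be filled in by a function that first checks the name. The function name `ootb_model_patch` is hypothetical, and the allow-list is an assumption based only on the two Out of the Box models named on this page; extend it if your release supports others. The `oc` calls are commented out because they require cluster access.

```shell
# Hypothetical helper (an assumption, not part of oc): prints the merge patch
# that enables one Out of the Box model with default replicas and shards.
ootb_model_patch() {
  case "$1" in
    granite-3-8b-instruct|llama-3-1-70b-instruct) ;;
    *) echo "unsupported Out of the Box model: $1" >&2; return 1 ;;
  esac
  printf '{"configOverrides":{"ifm":{"model_config":{"ootb":{"%s":{}}}}}}\n' "$1"
}

# Preview the payload for the chosen model.
ootb_model_patch llama-3-1-70b-instruct

# Apply the patch, then restart the store deployment so it picks up the model:
# oc patch wa wa --type='merge' -p="$(ootb_model_patch llama-3-1-70b-instruct)"
# oc rollout restart deployment wa-store
```

Rejecting unknown names up front avoids patching the custom resource with a misspelled model key.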

Disabling the Out of the Box model

5.1.2 and later

To disable the Out of the Box model, do the following steps:

Set the name of the model that you want to remove in MODEL_NAME.
```
export MODEL_NAME="<model-name>"
```

Remove the entry from the watsonx Assistant custom resource.

oc patch wa wa --type json --patch "[{ \"op\": \"remove\", \"path\": \"/configOverrides/ifm/model_config/ootb/$MODEL_NAME\" }]"

Remove the entry from the watsonx.ai™ IFM custom resource.

oc get watsonxaiifm watsonxaiifm-cr -o json | jq ".spec.install_model_list -= [\"${MODEL_NAME}\"]" | oc apply -f -

Remove the InferenceService resource for the model.
```
oc delete isvc ${MODEL_NAME}
```
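The JSON patch in these steps mixes shell variable expansion with JSON double quotes, which is easy to get wrong. One possible way to sidestep the quoting, shown here as a sketch, is to build the payload with `printf` and pass it as a single variable. The variable name `REMOVE_PATCH` and the example model name are assumptions for illustration; the `oc` call is commented out because it requires cluster access.

```shell
# Example value only; set this to the model that you want to remove.
MODEL_NAME="granite-3-8b-instruct"

# printf substitutes the model name, so the JSON quotes need no escaping.
REMOVE_PATCH=$(printf '[{ "op": "remove", "path": "/configOverrides/ifm/model_config/ootb/%s" }]' "$MODEL_NAME")

# Preview the payload before applying it.
echo "$REMOVE_PATCH"

# oc patch wa wa --type json --patch "$REMOVE_PATCH"
```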

Disabling the specialized model

5.1.2 and later

To disable the specialized model, run the following command:

oc patch wa wa --type json --patch '[{ "op": "remove", "path": "/configOverrides/ifm/model_config/syom" }]'

Adjusting Replicas for a model

5.1.2 and later

You can start extra model replicas to handle increased load.

To enable and adjust replicas for the Specialized model, use the following command:

oc patch wa wa --type='merge' -p='{"configOverrides":{"ifm":{"model_config":{"syom":{"ibm-granite-8b-unified-api-model-v2":{"replicas": <replica-value>}}}}}}'

To enable and adjust replicas for the Out of the Box model, use the following command:

oc patch wa wa --type='merge' -p='{"configOverrides":{"ifm":{"model_config":{"ootb":{"<model-name>":{"replicas": <replica-value>}}}}}}'
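The two replica commands differ only in the model group (`syom` or `ootb`), the model name, and the replica count. A minimal sketch that fills in those three values and rejects a non-numeric count follows. The function name `replicas_patch` is hypothetical, and the whole-number check is an assumption about what counts as a valid `<replica-value>`. The `oc` call is commented out because it requires cluster access.

```shell
# Hypothetical helper (an assumption, not part of oc): prints the replicas patch.
# $1 = model group ("syom" or "ootb"), $2 = model name, $3 = replica count
replicas_patch() {
  case "$1" in
    syom|ootb) ;;
    *) echo "group must be syom or ootb" >&2; return 1 ;;
  esac
  case "$3" in
    ''|*[!0-9]*) echo "replica count must be a whole number" >&2; return 1 ;;
  esac
  printf '{"configOverrides":{"ifm":{"model_config":{"%s":{"%s":{"replicas": %s}}}}}}\n' "$1" "$2" "$3"
}

# Preview the payload for two replicas of an Out of the Box model.
replicas_patch ootb granite-3-8b-instruct 2

# oc patch wa wa --type='merge' -p="$(replicas_patch ootb granite-3-8b-instruct 2)"
```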

Adjusting Shards for Out of the Box model

5.1.2 and later

To enable and adjust shards for the Out of the Box model, use the following command:

oc patch wa wa --type='merge' -p='{"configOverrides":{"ifm":{"model_config":{"ootb":{"<model-name>":{"shards": <shard-value>}}}}}}'
