Changing foundation model sharding configuration
You can divide foundation models into smaller units called shards and run the model shards on multiple GPUs. You can configure how foundation models are sharded for the IBM watsonx.ai service. Every model shard requires a GPU to run.
Before you begin
- You must be an instance administrator.
- Follow the appropriate steps for changing or removing the sharding configuration for foundation models that are installed in your deployment.
By default, the number of shards a foundation model is partitioned into is equal to the minimum number of GPUs required to run the model. When you remove the sharding configuration for a particular foundation model, the model is partitioned into the default number of shards that run on the minimum number of GPUs.
For information about how to shard foundation models during installation, see Adding foundation models.
Procedure
To change how a foundation model that is already installed is sharded:
You can patch the custom resource for the watsonxaiifm service to modify the sharding configuration in the following ways:
- To change the number of shards into which a foundation model is partitioned:
    oc patch watsonxaiifm watsonxaiifm-cr \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --type merge \
    --patch '{"spec": {"model_install_parameters":{"<model_id_with_underscore>":{"shard": <shard-value>}}}}'
  - When you specify the model ID in the <model_id_with_underscore> variable, replace hyphens in the model ID with underscores. For example, for the ibm-granite-13b-chat-v2 model, use "ibm_granite_13b_chat_v2".
  - Replace <shard-value> with the number of shards (number of GPUs) you want to use. Accepted shard values are 2, 4, or 8 only. If you specify a value other than one of these accepted values, the default shard value for the model is used. No message is shown to inform you that your configuration change is not applied.
- To remove all sharding configuration settings that were applied to the service:
    oc patch watsonxaiifm watsonxaiifm-cr \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --type json \
    --patch '[{ "op": "remove", "path": "/spec/model_install_parameters" }]'
- To remove the sharding configuration for a foundation model and return to the default sharding settings for that model:
    oc patch watsonxaiifm watsonxaiifm-cr \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --type json \
    --patch '[{ "op": "remove", "path": "/spec/model_install_parameters/<model_id_with_underscore>/shard" }]'
  When you specify the model ID of the foundation model in the <model_id_with_underscore> variable, replace hyphens in the model ID with underscores. For example, for the ibm-granite-13b-chat-v2 model, use "ibm_granite_13b_chat_v2".
- To remove node assignments for foundation model shards and let the nodes be assigned by the cluster based on availability:
    oc patch watsonxaiifm watsonxaiifm-cr \
    --namespace=${PROJECT_CPD_INST_OPERANDS} \
    --type json \
    --patch '[{ "op": "remove", "path": "/spec/model_install_parameters/<model_id_with_underscore>/nodeSelector" }]'
  When you specify the model ID in the <model_id_with_underscore> variable, replace hyphens in the model ID with underscores. For example, for the ibm-granite-13b-chat-v2 model, use "ibm_granite_13b_chat_v2".
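An unsupported shard value is ignored without any message, so it can help to validate the inputs before patching. The following is a minimal sketch, not part of the product: the function names to_model_key, is_valid_shard, and shard_patch are hypothetical helpers, and the sketch assumes only the rules stated in this topic (hyphens become underscores; accepted shard values are 2, 4, or 8). It builds the merge patch but does not run oc itself.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; these names are not part of
# watsonx.ai or the oc CLI.

# Convert a model ID to the key form used in the custom resource
# by replacing hyphens with underscores.
to_model_key() {
  printf '%s' "$1" | tr '-' '_'
}

# Succeed only for the shard values that this topic lists as accepted.
is_valid_shard() {
  case "$1" in
    2|4|8) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the merge-patch JSON for a model ID ($1) and shard count ($2).
# Fail with a message instead of letting the service silently fall
# back to the default shard value.
shard_patch() {
  if ! is_valid_shard "$2"; then
    echo "shard value must be 2, 4, or 8 (got: $2)" >&2
    return 1
  fi
  printf '{"spec": {"model_install_parameters":{"%s":{"shard": %s}}}}' \
    "$(to_model_key "$1")" "$2"
}

# Example use (not executed here):
#   patch=$(shard_patch ibm-granite-13b-chat-v2 4) &&
#   oc patch watsonxaiifm watsonxaiifm-cr \
#     --namespace=${PROJECT_CPD_INST_OPERANDS} \
#     --type merge \
#     --patch "$patch"
```

Because shard_patch returns a nonzero status for an unsupported value, chaining it with && as in the commented example prevents the patch from being submitted at all in that case.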