Properties and parameters for custom foundation models
You can set and adjust the parameters of your custom foundation model to define its behavior.
Model parameters
You must enter the following details when you register your custom foundation model:
| Field | Type | Required or optional | Description |
|---|---|---|---|
| model_id | String | Required | Specify the ID of the custom foundation model. |
| location | Object | Required | Specify the location of the custom foundation model. See Location properties. |
| tags | String | Optional | Provide additional metadata about the model. |
| parameters | Object | Optional | Specify the parameters of the model. See Global parameters for custom foundation models. |
| functions | String | Optional | Specify the functions of a model. For example: image_chat, audio_chat, embedding, or rerank. You must first verify the available functions in the model card. If the functions field is not specified, the model defaults to text generation and text chat (if a chat template is available). |
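As a minimal sketch of how these fields fit together, the following Python snippet assembles a registration entry as a dictionary and prints it as JSON. The field names come from the tables in this topic. The helper name `build_registration_entry` and every value (the model ID, the PVC name, and the subpath) are hypothetical placeholders, and the resource where the entry is ultimately submitted is not shown.

```python
import json


def build_registration_entry(model_id, pvc_name, sub_path=None, tags=None):
    """Assemble a registration entry from the documented fields (illustrative helper)."""
    if not model_id:
        raise ValueError("model_id is required")
    if not pvc_name:
        raise ValueError("location.pvc_name is required")

    # location is an object; pvc_name is required and sub_path is optional.
    location = {"pvc_name": pvc_name}
    if sub_path is not None:
        location["sub_path"] = sub_path

    entry = {"model_id": model_id, "location": location}
    if tags is not None:
        entry["tags"] = tags  # optional metadata string
    return entry


# Hypothetical values, for illustration only.
entry = build_registration_entry(
    "example-model", "example-model-pvc", sub_path="models/example"
)
print(json.dumps(entry, indent=2))
```

The optional fields are left out of the entry when they are not supplied, which mirrors the Required or optional column in the table.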
Location properties
You can use the following parameters to describe the location of your deployed custom foundation model:
| Location | Type | Required or optional | Description |
|---|---|---|---|
| pvc_name | String | Required | Use this parameter to specify the Persistent Volume Claim (PVC) where your custom foundation model is stored. |
| sub_path | String | Optional | Use this parameter to specify the subpath of the model within the PVC. |
Global parameters for custom foundation models
- Time series models do not take any parameters. Do not provide any global parameters when you are setting up or deploying a custom time series model.
- Models that use a custom inference runtime image don't accept parameters at the deployment creation stage. You must set these parameters either when you create the runtime definition or during model registration.
- You must set the values of your base model parameters within the ranges that are specified in the following table. Otherwise, your deployment might fail and inferencing will not be possible. If the default values for your model parameters result in an error, modify the model's registry in the watsonxaiifm CR.
You can use the following global parameters for your custom foundation models:
| Parameter | Type | Range of values | Default value | Description |
|---|---|---|---|---|
| max_num_seqs | Number | max_num_seqs >= 1 | 16 | Specifies the maximum number of sequences (requests) that are processed in parallel during inference. Higher values increase throughput but require more KV cache memory. |
| max_model_length | Number | max_model_length >= 20; max_model_length <= model_context_length x max_num_seqs <= available KV cache memory | 2048 | Specifies the maximum total number of tokens (input + output) per sequence. Must be within the model's context length and chosen based on the value of max_num_seqs. Both of these parameters affect KV cache memory usage. |
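The documented lower bounds and the context-length limit can be checked before you deploy. The following is a minimal sketch, not an official validator: the function name `check_global_parameters` is hypothetical, the caller must supply the model's context length (for example, from the model card), and the KV cache memory constraint is deliberately left out because it depends on the available hardware.

```python
def check_global_parameters(max_num_seqs, max_model_length, model_context_length):
    """Return a list of problems; an empty list means the values pass these checks."""
    problems = []
    if max_num_seqs < 1:
        problems.append("max_num_seqs must be >= 1")
    if max_model_length < 20:
        problems.append("max_model_length must be >= 20")
    # The description states that the value must be within the model's context length.
    if max_model_length > model_context_length:
        problems.append("max_model_length must not exceed the model's context length")
    return problems


# Documented defaults checked against a hypothetical 4096-token context length.
print(check_global_parameters(16, 2048, 4096))
# Values that break two of the checks.
print(check_global_parameters(0, 8192, 4096))
```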
These optional parameters apply only to models that have a chat API and use the vLLM runtime engine.
| Parameter | Type | Range of values | Default value | Description |
|---|---|---|---|---|
| tool_call_parser | String | Name of the tool parser that matches the model | N/A | Enables automatic selection from a list of tools that the user provides at the inference phase. You can find the list of available parsers in the vLLM documentation. |
| chat_template | String | Name of the template file | N/A | Overrides the standard chat template that is provided with the model. For more information, see Setting up storage and uploading the model. |
Starting with release 5.2.2, models that use the vLLM runtime engine have prefix caching set to true by default, which lowers token consumption and speeds up inferencing in repeated inference scenarios. If your use case is different, or you are experiencing issues such as high cache usage and out-of-memory (OOM) errors, add the enable_prefix_caching parameter to your model parameters and set its value to false.
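As an illustration of turning prefix caching off, the sketch below adds an enable_prefix_caching entry to a set of model parameters. It assumes that the parameters are expressed as a list of entries that use the name and default properties described in this topic. That list shape, the helper name `disable_prefix_caching`, and the sample values are assumptions.

```python
import json


def disable_prefix_caching(parameters):
    """Return a copy of the parameter list with enable_prefix_caching set to false."""
    # Drop any existing entry so that the parameter appears exactly once.
    updated = [p for p in parameters if p.get("name") != "enable_prefix_caching"]
    updated.append({"name": "enable_prefix_caching", "default": False})
    return updated


# Hypothetical starting parameters.
params = [{"name": "max_num_seqs", "default": 16}]
print(json.dumps(disable_prefix_caching(params), indent=2))
```

The helper returns a new list rather than modifying its input, so the original parameters stay available if you need to revert.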
Properties for global parameters for custom foundation models
You can use the following properties for the global parameters for custom foundation models:
| Property | Type | Required or optional | Description |
|---|---|---|---|
| name | String | Required | Use this property to specify the name of the parameter. |
| default | String, number, boolean | Required | Use this property to specify the default value of the parameter. |
| min | Number | Optional | Use this property to specify the minimum value of the parameter. The min value must be less than or equal to the entered value. |
| max | Number | Optional | Use this property to specify the maximum value of the parameter. The max value must be greater than or equal to the entered value. |
| options | String, number | Optional | Use this property to specify a list of options to choose from for the parameter. The type of each options value must be the same as the type of the parameter value. The selected value must be from the options list. |
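The rules in this table can be expressed as a small consistency check on a single parameter definition. This is a minimal sketch under stated assumptions: the function names are hypothetical, and "the entered value" is interpreted here as the value of the default property.

```python
def is_number(value):
    """True for int or float, but not for bool (a bool is a subclass of int in Python)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_parameter(param):
    """Return a list of problems found in one parameter definition."""
    problems = []
    for key in ("name", "default"):
        if key not in param:
            problems.append(f"missing required property: {key}")
    if "default" not in param:
        return problems

    value = param["default"]
    # min and max apply only to numeric values.
    if "min" in param and is_number(value) and value < param["min"]:
        problems.append("default is below min")
    if "max" in param and is_number(value) and value > param["max"]:
        problems.append("default is above max")
    if "options" in param:
        options = param["options"]
        if any(type(option) is not type(value) for option in options):
            problems.append("options must have the same type as the value")
        elif value not in options:
            problems.append("default is not one of the options")
    return problems


# Hypothetical definitions, for illustration only.
print(validate_parameter({"name": "max_num_seqs", "default": 16, "min": 1, "max": 64}))
print(validate_parameter({"name": "max_num_seqs", "default": 0, "min": 1}))
```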
- For models that use standard inference runtimes:
  - If you don't set default parameters during the model registration phase, the default parameters are set automatically at the deployment creation phase. You can then override them during an update.
  - If you set default model parameters at the model registration phase, you can then override them at the creation phase and during an update.
- Time-series models do not take any parameters. Do not provide any parameters when you are deploying a custom time-series model; any parameters that you provide have no effect.
- Models that use a custom inference runtime image ignore parameters that are set at the deployment creation stage. You must set these parameters either when you create the runtime definition or during model registration. Also, the list of accepted parameters might be different from the list of parameters that are used by models that use standard inference runtimes.
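For models that use standard inference runtimes, the override order described in these notes (registration defaults first, then values from deployment creation, then values from an update) can be pictured as successive dictionary merges. The sketch below only illustrates that order. The function name and the flat name-to-value mapping are assumptions, and it does not describe how the platform resolves values internally.

```python
def effective_parameters(registration_defaults, creation_overrides=None, update_overrides=None):
    """Merge parameter values so that later phases override earlier ones."""
    effective = dict(registration_defaults)   # values from model registration
    effective.update(creation_overrides or {})  # values from deployment creation
    effective.update(update_overrides or {})    # values from a deployment update
    return effective


# Hypothetical values, for illustration only.
merged = effective_parameters(
    {"max_num_seqs": 16, "max_model_length": 2048},
    creation_overrides={"max_num_seqs": 32},
    update_overrides={"max_model_length": 4096},
)
print(merged)
```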