Properties and parameters for custom foundation models

You can set and adjust the parameters of your custom foundation model to define its behavior.

Model parameters

You must enter the following details when you register your custom foundation model:

Field	Type	Required or optional	Description
`model_id`	String	Required	Specify the ID of the custom foundation model.
`location`	Object	Required	Specify the location of the custom foundation model. See Location properties.
`tags`	String	Optional	Provide additional metadata about the model.
`parameters`	Object	Optional	Specify the parameters of the model. See Global parameters for custom foundation models.
`functions`	String	Optional	Specify the functions of the model, for example `image_chat`, `audio_chat`, `embedding`, or `rerank`. Verify the available functions in the model card first. If the `functions` field is not specified, the default depends on whether the model includes a chat template: without a chat template, the default task is text generation; with a chat template, the default tasks are text generation and text chat.

Location properties

You can use the following parameters to describe the location of your deployed custom foundation model:

Parameter	Type	Required or optional	Description
`pvc_name`	String	Required	Use this parameter to specify the Persistent Volume Claim (PVC) where your custom foundation model is stored.
`sub_path`	String	Optional	Use this parameter to specify the subpath of the model within the PVC.
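
As a rough illustration of how the registration fields and the location properties fit together, the following Python sketch assembles a registration entry as a plain dictionary. The helper name `build_registration` and all example values are hypothetical. Only the field names come from the tables, and the exact request envelope that your release expects is not shown here.

```python
import json


def build_registration(model_id, pvc_name, sub_path=None, tags=None, functions=None):
    """Assemble a registration entry from the documented fields (hypothetical helper)."""
    if not model_id:
        raise ValueError("model_id is required")
    if not pvc_name:
        raise ValueError("location.pvc_name is required")

    location = {"pvc_name": pvc_name}
    if sub_path is not None:
        location["sub_path"] = sub_path  # optional subpath of the model within the PVC

    entry = {"model_id": model_id, "location": location}
    # Optional fields are left out entirely when they are not provided.
    if tags is not None:
        entry["tags"] = tags
    if functions is not None:
        entry["functions"] = functions
    return entry


# Placeholder values, not real model or PVC names.
example = build_registration("example-model", "example-pvc", sub_path="models/example")
print(json.dumps(example, indent=2))
```

Leaving optional fields out, instead of sending empty values, keeps the documented defaults in effect, for example the default tasks that apply when `functions` is not specified.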

Global parameters for custom foundation models

Important:

Time-series models do not take any parameters. Do not provide any global parameters when you are setting up or deploying a custom time-series model.
Models that use a custom inference runtime image don't accept parameters at the deployment creation stage. You must set these parameters either when you create the runtime definition or during model registration.
Set each base model parameter to a value within the range that is specified in the following table. Otherwise, your deployment might fail, which makes inferencing impossible. If the default values for your model parameters result in an error, modify the model's registry in the `watsonxaiifm` CR.

You can use the following global parameters for your custom foundation models:

Table 1. Global parameters for all custom foundation models
Parameter	Type	Range of values	Default value	Description
`max_num_seqs`	Number	`max_num_seqs` >= 1	16	Specifies the maximum number of sequences (requests) that are processed in parallel during inference. Higher values increase throughput but require more KV cache memory.
`max_model_length`	Number	`max_model_length` >= 20; `max_model_length` <= `model_context_length` x `max_num_seqs` <= available KV cache memory	2048	Specifies the maximum total number of tokens (input + output) per sequence. Must be within the model's context length and chosen based on the value of `max_num_seqs`. Both of these parameters affect KV cache memory usage.
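
The range rules in Table 1 can be checked before deployment with a small validator. This is a minimal sketch under stated assumptions: the function name `check_global_parameters` is hypothetical, `model_context_length` must be supplied by the caller (for example, from the model card), and the KV cache memory limit is not modelled because it depends on the available hardware.

```python
def check_global_parameters(model_context_length, max_num_seqs=16, max_model_length=2048):
    """Return a list of rule violations; an empty list means these checks pass."""
    problems = []
    if max_num_seqs < 1:
        problems.append("max_num_seqs must be >= 1")
    if max_model_length < 20:
        problems.append("max_model_length must be >= 20")
    # The description requires the value to stay within the model's context length.
    if max_model_length > model_context_length:
        problems.append("max_model_length must not exceed the model's context length")
    return problems


# Table 1 defaults against a hypothetical 4096-token context length.
print(check_global_parameters(model_context_length=4096))  # []

# Two violations: max_num_seqs below 1 and max_model_length above the context length.
print(check_global_parameters(4096, max_num_seqs=0, max_model_length=8192))
```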

These optional parameters apply only to models that have a chat API and use the vLLM runtime engine.

Table 2. Global parameters that apply only to models that have a chat API
Parameter	Type	Range of values	Default value	Description
`tool_call_parser`	String	Name of the tool parser that matches the model	N/A	Enables automatic selection from a list of tools that are provided by the user at inference time. You can find the list of available parsers in the vLLM documentation.
`chat_template`	String	Name of the template file	N/A	Overrides the standard chat template that is provided with the model. For more information, see Setting up storage and uploading the model.

Starting with release 5.2.2, models that use the vLLM runtime engine have prefix caching set to `true` by default, which lowers token consumption and increases inferencing speed in repeated inference scenarios. If your use case is different, or if you experience issues such as high cache usage and out-of-memory (OOM) errors, add the `enable_prefix_caching` parameter to your model parameters and set its value to `false`.
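
For example, a parameter entry that turns prefix caching off might look like the following. This sketch assumes that the entry uses the `name` and `default` properties that are described in Table 3. The surrounding registration payload is omitted, so confirm the exact shape against your release before you rely on it.

```python
import json

# Assumed shape: one object per parameter, using the `name` and `default`
# properties from Table 3.
parameters = [
    {"name": "enable_prefix_caching", "default": False},
]

# Python's False serializes to the JSON literal false.
print(json.dumps(parameters))
```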

Properties for global parameters for custom foundation models

You can use the following properties for the global parameters for custom foundation models:

Table 3. Properties for global parameters for custom foundation models
Property	Type	Required or optional	Description
`name`	String	Required	Use this property to specify the name of the parameter.
`default`	String, number, boolean	Required	Use this property to specify the default value of the parameter.
`min`	Number	Optional	Use this property to specify the minimum value of the parameter. The `min` value must be less than or equal to the entered value.
`max`	Number	Optional	Use this property to specify the maximum value of the parameter. The `max` value must be greater than or equal to the entered value.
`options`	String, number	Optional	Use this property to specify a list of options to choose from for the parameter. The type of each option must match the type of the parameter value. The selected value must be in the `options` list.
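
The constraints in Table 3 can be expressed as a short check. The following is a sketch under stated assumptions, not the product's own validation: the function name `check_parameter_entry` is hypothetical, and the "entered value" is interpreted here as the `default` value of the entry.

```python
def check_parameter_entry(entry):
    """Return a list of Table 3 rule violations for one parameter entry."""
    problems = []
    for key in ("name", "default"):
        if key not in entry:
            problems.append(f"missing required property: {key}")
    if problems:
        return problems

    value = entry["default"]
    # bool is a subclass of int in Python, so exclude it from numeric checks.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)

    if "min" in entry and is_number and entry["min"] > value:
        problems.append("min must be less than or equal to the value")
    if "max" in entry and is_number and entry["max"] < value:
        problems.append("max must be greater than or equal to the value")

    options = entry.get("options")
    if options is not None:
        if any(type(opt) is not type(value) for opt in options):
            problems.append("every option must have the same type as the value")
        elif value not in options:
            problems.append("the value must be one of the options")
    return problems


print(check_parameter_entry({"name": "max_num_seqs", "default": 16, "min": 1}))  # []
print(check_parameter_entry({"name": "max_model_length", "default": 10, "min": 20}))
```

The second call reports a violation because the value 10 is below the `min` of 20.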

Important:

For models that use standard inference runtimes:
- If you don't set default parameters during the model registration phase, the default parameters are set automatically at the deployment creation phase. You can then override them during an update.
- If you set default model parameters at the model registration phase, you can then override them at the creation phase and during an update.
- Time-series models do not take any parameters. Do not provide any parameters when you are deploying a custom time-series model. If you provide them anyway, they have no effect.
Models that use a custom inference runtime image ignore parameters that are set at the deployment creation stage. You must set these parameters either when you create the runtime definition or during model registration. Also, the list of accepted parameters might differ from the list of parameters that are used by models that use standard inference runtimes.