Adding custom foundation models to watsonx.ai Lightweight Engine

If the curated set of models in IBM watsonx.ai does not include the foundation model that you want to use for inferencing from your watsonx.ai lightweight engine installation, you can install your own custom model.

To check whether one of the curated foundation models that are available with IBM watsonx.ai might meet your needs, see System requirements for foundation models in IBM watsonx.ai.

Important: If you installed the full IBM watsonx.ai service, you follow different steps to add custom foundation models. For more information, see Deploying custom foundation models in IBM watsonx.ai.

Prerequisites

The IBM watsonx.ai service must be installed in lightweight engine mode.

Supported foundation model architectures

To check the architecture of a foundation model, find the config.json file for the foundation model, and then check the model_type value.
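As a minimal sketch of this check, the following shell function prints the `model_type` value from a model's `config.json` file. The function name `model_type_of` is hypothetical, and the sketch assumes that the key appears as a plain `"model_type": "<value>"` pair. If a JSON-aware tool is available, it is a more robust choice.

```shell
# Print the model_type value from a config.json file.
# Assumption: the key appears as a simple "model_type": "value" pair.
# Only the first matching line is reported.
model_type_of() {
  sed -n 's/.*"model_type"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage (the path is illustrative):
# model_type_of ./my-model/config.json
```

Compare the printed value with the Model type column in the tables that follow.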

The following table lists the general-purpose model architectures that are supported by the watsonx.ai lightweight engine:

Model type	Supported quantization methods
bloom	Not applicable
falcon	Not applicable
gemma2	Not applicable
gpt_bigcode	GPTQ
gpt_neox	Not applicable
gptj	Not applicable
llama	GPTQ
llama2	GPTQ
mistral	Not applicable
mixtral	GPTQ
nemotron	Not applicable
olmo	Not applicable
persimmon	Not applicable
phi	Not applicable
phi3	Not applicable
qwen2	AWQ
sphinx	Not applicable

The following table lists the time-series model architectures that are supported by the watsonx.ai lightweight engine:

Model type	Supported quantization methods
tinytimemixer	Not applicable

Quantization methods

Quantization is a process that reduces the amount of compute resources and memory used when you inference a foundation model. You can set the following quantization methods for foundation models with the architectures that are listed in the preceding tables:

Post-training quantization for generative pre-trained transformers (GPTQ)
Activation-aware weight quantization (AWQ)
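The architecture tables can be expressed as a small lookup, which might be useful in a pre-flight script. The following sketch only restates the tables in this topic. The function name `supported_quantization` and the `none` return value for "Not applicable" are illustrative choices, not part of the product.

```shell
# Print the quantization method that this topic lists for a model_type.
# Prints "none" where the table says "Not applicable" and returns a
# non-zero status for architectures that are not listed.
supported_quantization() {
  case "$1" in
    gpt_bigcode|llama|llama2|mixtral)
      echo "GPTQ" ;;
    qwen2)
      echo "AWQ" ;;
    bloom|falcon|gemma2|gpt_neox|gptj|mistral|nemotron|olmo|persimmon|phi|phi3|sphinx|tinytimemixer)
      echo "none" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1 ;;
  esac
}

# Usage:
# supported_quantization qwen2
```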

Procedure

A system administrator must complete these steps to add a custom foundation model to the IBM watsonx.ai lightweight engine.

Upload the model.
Follow the steps in the Setting up storage and uploading the model procedure.

Make a note of the pvc_name for the persistent volume claim where you store the downloaded model source files.

Important: Complete only the storage setup and model download tasks, and then return to this procedure. Other steps in the full-service installation instructions describe how to create a deployment to host the custom foundation model. You do not need to set up a deployment to use custom foundation models from a watsonx.ai lightweight engine installation.
Create a ConfigMap file for the custom foundation model.
ConfigMap files are used by the Red Hat® OpenShift® AI layer of the service to serve configuration information to independent containers that run in pods or to other system components, such as controllers. See Creating a ConfigMap file.
To register the custom foundation model, apply the ConfigMap file by using the following command:
```
oc apply -f configmap.yml
```
The service operator picks up the configuration information and applies it to your cluster.
You can check the status of the service by using the following command. When Completed is returned, the custom foundation models are ready for use.
```
oc get watsonxaiifm -n ${PROJECT_CPD_INST_OPERANDS}
```
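The status check can be repeated until the service reports Completed. The following is a minimal sketch: the function name `wait_for_completed`, the retry count, and the delay are illustrative, and the sketch assumes only that the word `Completed` appears somewhere in the command output.

```shell
# Poll the service status until "Completed" appears in the output.
# Arguments (both optional, both illustrative defaults):
#   $1 = maximum number of polls, $2 = seconds to wait between polls
wait_for_completed() {
  attempts="${1:-60}"
  delay="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Same status command as in the procedure; errors are ignored and retried.
    status="$(oc get watsonxaiifm -n "${PROJECT_CPD_INST_OPERANDS}" 2>/dev/null || true)"
    case "$status" in
      *Completed*)
        echo "Completed"
        return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for Completed" >&2
  return 1
}

# Usage:
# oc apply -f configmap.yml && wait_for_completed
```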

Creating a ConfigMap file

Create a ConfigMap file for the custom foundation model by copying the following template, and then replacing the variables in the template with the appropriate values for your foundation model. The following table lists the variables for you to replace in the template.

ConfigMap field	Description
`metadata.name`	Model name with hyphens as delimiters. For example, if the model name is `tiiuae/falcon-7b`, specify `tiiuae-falcon-7b`.
`data.model.<full_model_name>`	Model name with underscores as delimiters. For example, if the model name is `tiiuae/falcon-7b`, specify `tiiuae_falcon_7b`.
`data.model.<full_model_name>.pvc_name`	Persistent volume claim where the model source files are stored. Use the `pvc_name` that you noted in an earlier step. For example, `tiiuae-falcon-7b-pvc`.
`data.model.<full_model_name>.pvc_size`	Size of persistent volume claim where the model source files are stored. For example, `60Gi`.
`data.model.<full_model_name>.dir_name`	Directory where the model content is stored. This value matches the `MODEL_PATH` from the model download job. For example, `models--tiiuae-falcon-7b`.
`data.model.<full_model_name>.storage_uri`	Universal resource identifier for the directory where the model source files are stored with the syntax `pvc://<pvc where model is downloaded>/`. For example, `pvc://tiiuae-falcon-7b-pvc/`.
`data.model.<full_model_name>.env.DTYPE_STR`	Data type that the model uses, specified as a string. For example, `float16`. For more information about supported values, see Global parameters for custom foundation models.
`data.model.<full_model_name>.annotations.productVersion`	The IBM watsonx.ai service operator version. For example, `9.1.0`. To get this value, use the following command: `oc get watsonxaiifm watsonxaiifm-cr -o jsonpath="{.spec.version}"`
`data.model.<full_model_name>.annotations.cloudpakInstanceId`	The IBM® Software Hub instance ID. For example, `b0871d64-ceae-47e9-b186-6e336deaf1f1`. To get this value, use the following command: `oc get cm product-configmap -o jsonpath="{.data.CLOUD_PAK_INSTANCE_ID}"`
`data.model.<full_model_name>.labels_syom.icpdsupport/module`	Model name with hyphens as delimiters. For example, if the model name is `tiiuae/falcon-7b`, specify `tiiuae-falcon-7b`.
`data.model.<full_model_name>.labels_syom.app`	Model name with hyphens as delimiters and prefixed with `text-`. For example, if the model name is `tiiuae/falcon-7b`, specify `text-tiiuae-falcon-7b`.
`data.model.<full_model_name>.labels_syom.syom_model`	Model name with single hyphens as delimiters, except for the first delimiter, which uses two hyphens. For example, `tiiuae--falcon-7b`.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>`	Model ID. For example, `tiiuae/falcon-7b`.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.label`	Model name without provider prefix. For example, `falcon-7b`.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.provider`	Model provider. For example, `tiiuae`.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.short description of model`	Short description of the model in fewer than 100 characters.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.long description of model`	Long description of the model.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.min_shot_size`	Minimum shot size.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.tier`	Model tier.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.number_params`	Number of model parameters. For example, `7b`.
`data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.lifecycle.available.since_version`	The first IBM watsonx.ai service operator version in which the model was added. For example, `9.1.0`.
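Several fields in the table are variants of the same model ID. The following sketch derives them from an ID such as `tiiuae/falcon-7b` so that the values stay consistent. All function names are hypothetical, and the sketch handles only the `/` and `-` delimiters that appear in the examples in this topic.

```shell
# Derive the name variants that the table describes from a model ID.
# All function names are illustrative and are not part of the product.

# metadata.name and labels_syom.icpdsupport/module: hyphens as delimiters
hyphen_name() { printf '%s\n' "$1" | tr '/' '-'; }

# <full_model_name>: underscores as delimiters
underscore_name() { printf '%s\n' "$1" | sed 's|[/-]|_|g'; }

# labels_syom.app: hyphenated name with a text- prefix
app_label() { printf 'text-%s\n' "$(hyphen_name "$1")"; }

# labels_syom.syom_model: the first delimiter becomes two hyphens
syom_model() { printf '%s\n' "$1" | sed 's|/|--|'; }

# wx_inference_proxy provider and label: text before and after the first slash
provider_of() { printf '%s\n' "${1%%/*}"; }
label_of() { printf '%s\n' "${1#*/}"; }

# storage_uri: the argument is the pvc_name that you noted earlier
storage_uri() { printf 'pvc://%s/\n' "$1"; }

# Usage:
# underscore_name tiiuae/falcon-7b
# storage_uri tiiuae-falcon-7b-pvc
```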

For an example ConfigMap file, see Registering custom foundation models for global deployment.

What to do next

To test the custom foundation model that you added to a watsonx.ai lightweight engine installation, submit an inference request to the model programmatically. For more details, see Working with the watsonx.ai lightweight engine.