Adding custom foundation models to watsonx.ai Lightweight Engine
Before you add a custom model, review the curated foundation models that are available with IBM watsonx.ai to check whether an existing model meets your needs. See System requirements for foundation models in IBM watsonx.ai.
Prerequisites
The IBM watsonx.ai service must be installed in lightweight engine mode.
Supported foundation model architectures
To check the architecture of a foundation model, find the config.json file
for the foundation model, and then check the model_type value.
| Model type | Supported quantization methods |
|---|---|
| bloom | Not applicable |
| falcon | Not applicable |
| gemma2 | Not applicable |
| gpt_bigcode | GPTQ |
| gpt_neox | Not applicable |
| gptj | Not applicable |
| llama | GPTQ |
| llama2 | GPTQ |
| mistral | Not applicable |
| mixtral | GPTQ |
| nemotron | Not applicable |
| olmo | Not applicable |
| persimmon | Not applicable |
| phi | Not applicable |
| phi3 | Not applicable |
| qwen2 | AWQ |
| sphinx | Not applicable |
| tinytimemixer | Not applicable |
- Quantization methods
- Quantization is a process that reduces the amount of compute resources and memory that are used when you inference a foundation model. You can set the following quantization methods for foundation models with the architectures that are listed in the table:
- Post-training quantization for generative pre-trained transformers (GPTQ)
- Activation-aware weight quantization (AWQ)
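The architecture check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the product: the function names `quantization_for` and `check_model_dir` are hypothetical, the lookup table is copied from the supported architectures table, and an empty tuple stands for "Not applicable".

```python
import json
from pathlib import Path

# Supported model_type values and their quantization methods, copied from
# the table above. An empty tuple stands for "Not applicable".
SUPPORTED_ARCHITECTURES = {
    "bloom": (),
    "falcon": (),
    "gemma2": (),
    "gpt_bigcode": ("GPTQ",),
    "gpt_neox": (),
    "gptj": (),
    "llama": ("GPTQ",),
    "llama2": ("GPTQ",),
    "mistral": (),
    "mixtral": ("GPTQ",),
    "nemotron": (),
    "olmo": (),
    "persimmon": (),
    "phi": (),
    "phi3": (),
    "qwen2": ("AWQ",),
    "sphinx": (),
    "tinytimemixer": (),
}


def quantization_for(model_type):
    """Return the supported quantization methods for a model_type,
    or None if the architecture is not in the table."""
    return SUPPORTED_ARCHITECTURES.get(model_type)


def check_model_dir(model_dir):
    """Read config.json from a model directory and return a tuple of
    (model_type, supported quantization methods or None)."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    model_type = config.get("model_type")
    return model_type, quantization_for(model_type)
```

A result of `None` for the second value means that the architecture is not listed as supported; an empty tuple means that the architecture is supported but no quantization method applies.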
Procedure
- Upload the model.
  Follow the steps in the Setting up storage and uploading the model procedure. Make a note of the pvc_name for the persistent volume claim where you store the downloaded model source files.
  Important: Complete only the storage setup and model download tasks, and then return to this procedure. Other steps in the full-service installation instructions describe how to create a deployment to host the custom foundation model. You do not need to set up a deployment to use custom foundation models from a watsonx.ai lightweight engine installation.
- Create a ConfigMap file for the custom foundation model.
  ConfigMap files are used by the Red Hat® OpenShift® AI layer of the service to serve configuration information to independent containers that run in pods or to other system components, such as controllers. See Creating a ConfigMap file.
- To register the custom foundation model, apply the ConfigMap file by using the following command:
  oc apply -f configmap.yml
  The service operator picks up the configuration information and applies it to your cluster.
- You can check the status of the service by using the following command. When Completed is returned, the custom foundation models are ready for use.
  oc get watsonxaiifm -n ${PROJECT_CPD_INST_OPERANDS}
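If you want to script the status check instead of rerunning the command by hand, the following Python sketch polls until the word Completed appears in the command output. It relies on assumptions: that the `oc` CLI is on your path and logged in, and that matching the text `Completed` anywhere in the output is a sufficient test. The function names, the 30-second interval, and the attempt limit are hypothetical choices.

```python
import subprocess
import time


def get_status_output(namespace):
    """Run the status command from the procedure and return its text output."""
    result = subprocess.run(
        ["oc", "get", "watsonxaiifm", "-n", namespace],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if oc exits with an error
    )
    return result.stdout


def wait_until_completed(namespace, fetch=get_status_output,
                         interval_s=30, max_attempts=40, sleep=time.sleep):
    """Poll the status until 'Completed' appears in the output.

    Returns True when 'Completed' is seen, or False after max_attempts
    checks. The fetch and sleep parameters exist so that the loop can be
    exercised without a cluster.
    """
    for attempt in range(max_attempts):
        # Assumption: a plain substring match is enough to detect readiness.
        if "Completed" in fetch(namespace):
            return True
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return False
```

For example, `wait_until_completed(os.environ["PROJECT_CPD_INST_OPERANDS"])` (after `import os`) checks the same project that the command in the procedure uses.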
Creating a ConfigMap file
| ConfigMap field | Description |
|---|---|
| metadata.name | Model name with hyphens as delimiters. For example, if the model name is tiiuae/falcon-7b, specify tiiuae-falcon-7b. |
| data.model.<full_model_name> | Model name with underscores as delimiters (<full_model_name>). For example, if the model name is tiiuae/falcon-7b, specify tiiuae_falcon_7b. |
| data.model.<full_model_name>.pvc_name | Persistent volume claim where the model source files are stored. Use the pvc_name that you noted in an earlier step. For example, tiiuae-falcon-7b-pvc. |
| data.model.<full_model_name>.pvc_size | Size of the persistent volume claim where the model source files are stored. For example, 60Gi. |
| data.model.<full_model_name>.dir_name | Directory where the model content is stored. This value matches the MODEL_PATH from the model download job. For example, models--tiiuae-falcon-7b. |
| data.model.<full_model_name>.storage_uri | Universal resource identifier for the directory where the model source files are stored, with the syntax pvc://<pvc where model is downloaded>/. For example, pvc://tiiuae-falcon-7b-pvc/. |
| data.model.<full_model_name>.env.DTYPE_STR | Data type of text strings that the model can process. For example, float16. For more information about supported values, see Global parameters for custom foundation models. |
| data.model.<full_model_name>.annotations.productVersion | The IBM watsonx.ai service operator version. For example, 9.1.0. To get this value, use the following command: |
| data.model.<full_model_name>.annotations.cloudpakInstanceId | The IBM® Software Hub instance ID. For example, b0871d64-ceae-47e9-b186-6e336deaf1f1. To get this value, use the following command: |
| data.model.<full_model_name>.labels_syom.icpdsupport/module | Model name with hyphens as delimiters. For example, if the model name is tiiuae/falcon-7b, specify tiiuae-falcon-7b. |
| data.model.<full_model_name>.labels_syom.app | Model name with hyphens as delimiters and prefixed with text-. For example, if the model name is tiiuae/falcon-7b, specify text-tiiuae-falcon-7b. |
| data.model.<full_model_name>.labels_syom.syom_model | Model name with single hyphens as delimiters, except for the first delimiter, which uses two hyphens. For example, tiiuae--falcon-7b. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name> | Model ID (<full/model_name>). For example, tiiuae/falcon-7b. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.label | Model name without the provider prefix. For example, falcon-7b. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.provider | Model provider. For example, tiiuae. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.short description of model | Short description of the model in fewer than 100 characters. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.long description of model | Long description of the model. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.min_shot_size | Minimum shot size. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.tier | Model tier. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.number_params | Number of model parameters. For example, 7b. |
| data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.lifecycle.available.since_version | The first IBM watsonx.ai service operator version in which the model was added. For example, 9.1.0. |
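Several ConfigMap fields are spelling variants of the same model ID, so it can help to derive them in one place. The following Python sketch reproduces only the examples in the table for tiiuae/falcon-7b. The function name `configmap_names` and the dictionary keys are hypothetical, and the sketch assumes a model ID of the form provider/name in which "/" and "-" are the only delimiters; the table does not say how other characters, such as periods or uppercase letters, should be handled.

```python
def configmap_names(model_id, pvc_name):
    """Derive the name variants that the ConfigMap table describes.

    model_id: model ID in provider/name form, for example tiiuae/falcon-7b.
    pvc_name: persistent volume claim that holds the model source files.
    """
    provider, separator, label = model_id.partition("/")
    if not separator or not provider or not label:
        raise ValueError("expected a model ID in the form provider/name")

    # Hyphens as delimiters: tiiuae/falcon-7b -> tiiuae-falcon-7b
    hyphenated = model_id.replace("/", "-")
    # Underscores as delimiters: tiiuae/falcon-7b -> tiiuae_falcon_7b
    underscored = model_id.replace("/", "_").replace("-", "_")

    return {
        "metadata.name": hyphenated,
        "full_model_name": underscored,
        "labels_syom.icpdsupport/module": hyphenated,
        # Hyphenated name prefixed with text-
        "labels_syom.app": "text-" + hyphenated,
        # Two hyphens for the first delimiter, single hyphens after it
        "labels_syom.syom_model": provider + "--" + label,
        "storage_uri": "pvc://" + pvc_name + "/",
        "wx_inference_proxy.label": label,
        "wx_inference_proxy.provider": provider,
    }
```

Calling `configmap_names("tiiuae/falcon-7b", "tiiuae-falcon-7b-pvc")` yields the same values as the examples in the table, which you can then copy into the ConfigMap file.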
What to do next
To test the custom foundation model that you added to a watsonx.ai lightweight engine installation, submit an inference request to the model programmatically. For more details, see Working with the watsonx.ai lightweight engine.