Adding custom foundation models to watsonx.ai Lightweight Engine

If the curated set of models in IBM watsonx.ai does not include the foundation model that you want to use for inferencing from your watsonx.ai lightweight engine installation, you can install your own custom model.

To check whether one of the curated foundation models that are available with IBM watsonx.ai already meets your needs, see System requirements for foundation models in IBM watsonx.ai.

Important: If you installed the full IBM watsonx.ai service, the steps to add custom foundation models are different. For more information, see Deploying custom foundation models in IBM watsonx.ai.

Prerequisites

The IBM watsonx.ai service must be installed in lightweight engine mode.

Supported foundation model architectures

To check the architecture of a foundation model, find the config.json file for the foundation model, and then check the model_type value.
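
The check described above can be scripted. The following is a minimal sketch, not an official tool: the function name get_model_type is hypothetical, and the sketch assumes that the model_type key and its string value are on a single line, as is typical for a config.json file. If jq is installed, jq -r .model_type config.json is a more robust alternative.

```shell
#!/usr/bin/env bash
# Print the model_type value from a foundation model's config.json file.
# Assumption: the "model_type" key and its value are on the same line.
# If the file contains several model_type keys (for example, in nested
# sections), only the first matching line is reported.
get_model_type() {
  local config_file="$1"
  sed -n 's/.*"model_type"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file" | head -n 1
}

# Example usage (the path is a placeholder):
# get_model_type /path/to/model/config.json
```

Compare the printed value with the model types that are listed in the tables that follow.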

The following table lists the general-purpose model architectures that are supported by the watsonx.ai lightweight engine:

Model type     Supported quantization methods
bloom          Not applicable
falcon         Not applicable
gemma2         Not applicable
gpt_bigcode    GPTQ
gpt_neox       Not applicable
gptj           Not applicable
llama          GPTQ
llama2         GPTQ
mistral        Not applicable
mixtral        GPTQ
nemotron       Not applicable
olmo           Not applicable
persimmon      Not applicable
phi            Not applicable
phi3           Not applicable
qwen2          AWQ
sphinx         Not applicable

The following table lists the time-series model architectures that are supported by the watsonx.ai lightweight engine:

Model type     Supported quantization methods
tinytimemixer  Not applicable

Quantization methods

Quantization is a process that reduces the amount of compute resources and memory that are used when you inference a foundation model. You can set the following quantization methods for foundation models with the architectures that are listed in the preceding tables:
  • Post-training quantization for generative pre-trained transformers (GPTQ)
  • Activation-aware weight quantization (AWQ)
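
The architecture tables and quantization methods above can be expressed as a simple lookup. The following sketch is illustrative only: the function name supported_quantization is hypothetical, and the mapping is copied from the tables in this topic, so it must be updated if the supported list changes.

```shell
#!/usr/bin/env bash
# Print the quantization method that this topic lists for a model_type value.
# Prints "none" when quantization is not applicable for the architecture.
# Returns 1 when the architecture is not in the supported tables.
supported_quantization() {
  case "$1" in
    gpt_bigcode|llama|llama2|mixtral)
      echo "GPTQ" ;;
    qwen2)
      echo "AWQ" ;;
    bloom|falcon|gemma2|gpt_neox|gptj|mistral|nemotron|olmo|persimmon|phi|phi3|sphinx|tinytimemixer)
      echo "none" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1 ;;
  esac
}

# Example usage:
# supported_quantization mixtral
```

A nonzero return code indicates that the model type does not appear in either table.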

Procedure

A system administrator must complete these steps to add a custom foundation model to the IBM watsonx.ai lightweight engine.
  1. Upload the model.

    Follow the steps in the Setting up storage and uploading the model procedure.

    Make a note of the pvc_name for the persistent volume claim where you store the downloaded model source files.

    Important: Complete only the storage setup and model download tasks, and then return to this procedure. Other steps in the full-service installation instructions describe how to create a deployment to host the custom foundation model. You do not need to set up a deployment to use custom foundation models from a watsonx.ai lightweight engine installation.
  2. Create a ConfigMap file for the custom foundation model.

    ConfigMap files are used by the Red Hat® OpenShift® AI layer of the service to serve configuration information to independent containers that run in pods or to other system components, such as controllers. See Creating a ConfigMap file.

  3. To register the custom foundation model, apply the ConfigMap file by using the following command:
    oc apply -f configmap.yml
    The service operator picks up the configuration information and applies it to your cluster.
  4. Check the status of the service by using the following command. When Completed is returned, the custom foundation models are ready for use.
    oc get watsonxaiifm -n ${PROJECT_CPD_INST_OPERANDS}
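
Because the operator can take some time to reconcile, the status check in step 4 can be wrapped in a polling loop. The following is a sketch under stated assumptions: the function name wait_for_completed, the number of attempts, and the delay are hypothetical, and the sketch assumes that the word Completed appears in the default output of the oc get command, as step 4 implies.

```shell
#!/usr/bin/env bash
# Poll the watsonxaiifm custom resource until the output contains "Completed".
# Arguments (both optional, both illustrative defaults):
#   $1  maximum number of checks (default 30)
#   $2  seconds to wait between checks (default 60)
# Requires the PROJECT_CPD_INST_OPERANDS environment variable to be set.
wait_for_completed() {
  local attempts="${1:-30}"
  local delay="${2:-60}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if oc get watsonxaiifm -n "${PROJECT_CPD_INST_OPERANDS}" | grep -q "Completed"; then
      echo "Custom foundation models are ready for use."
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Status did not reach Completed after ${attempts} checks." >&2
  return 1
}

# Example usage: check up to 20 times, 30 seconds apart.
# wait_for_completed 20 30
```

Matching on the text of the command output is a deliberately simple choice. If you know the exact status field of the custom resource in your release, querying that field directly is more precise.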

Creating a ConfigMap file

Create a ConfigMap file for the custom foundation model by copying the following template, and then replacing the variables in the template with the appropriate values for your foundation model. The following table lists the variables for you to replace in the template.

ConfigMap field
    Description

metadata.name
    Model name with hyphens as delimiters. For example, if the model name is tiiuae/falcon-7b, specify tiiuae-falcon-7b.

data.model.<full_model_name>
    Model name with underscores as delimiters. For example, if the model name is tiiuae/falcon-7b, specify tiiuae_falcon_7b.

data.model.<full_model_name>.pvc_name
    Persistent volume claim where the model source files are stored. Use the pvc_name that you noted in an earlier step. For example, tiiuae-falcon-7b-pvc.

data.model.<full_model_name>.pvc_size
    Size of the persistent volume claim where the model source files are stored. For example, 60Gi.

data.model.<full_model_name>.dir_name
    Directory where the model content is stored. This value matches the MODEL_PATH from the model download job. For example, models--tiiuae-falcon-7b.

data.model.<full_model_name>.storage_uri
    Uniform resource identifier for the directory where the model source files are stored, with the syntax pvc://<pvc where model is downloaded>/. For example, pvc://tiiuae-falcon-7b-pvc/.

data.model.<full_model_name>.env.DTYPE_STR
    Data type of text strings that the model can process. For example, float16.
    For more information about supported values, see Global parameters for custom foundation models.

data.model.<full_model_name>.annotations.productVersion
    The IBM watsonx.ai service operator version. For example, 9.1.0.
    To get this value, use the following command: oc get watsonxaiifm watsonxaiifm-cr -o jsonpath="{.spec.version}"

data.model.<full_model_name>.annotations.cloudpakInstanceId
    The IBM® Software Hub instance ID. For example, b0871d64-ceae-47e9-b186-6e336deaf1f1.
    To get this value, use the following command: oc get cm product-configmap -o jsonpath="{.data.CLOUD_PAK_INSTANCE_ID}"

data.model.<full_model_name>.labels_syom.icpdsupport/module
    Model name with hyphens as delimiters. For example, if the model name is tiiuae/falcon-7b, specify tiiuae-falcon-7b.

data.model.<full_model_name>.labels_syom.app
    Model name with hyphens as delimiters and prefixed with text-. For example, if the model name is tiiuae/falcon-7b, specify text-tiiuae-falcon-7b.

data.model.<full_model_name>.labels_syom.syom_model
    Model name with single hyphens as delimiters, except for the first delimiter, which uses two hyphens. For example, tiiuae--falcon-7b.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>
    Model ID. For example, tiiuae/falcon-7b.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.label
    Model name without the provider prefix. For example, falcon-7b.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.provider
    Model provider. For example, tiiuae.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.short description of model
    Short description of the model in fewer than 100 characters.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.long description of model
    Long description of the model.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.min_shot_size
    Minimum shot size.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.tier
    Model tier.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.number_params
    Number of model parameters. For example, 7b.

data.model.<full_model_name>.wx_inference_proxy.<full/model_name>.lifecycle.available.since_version
    The first IBM watsonx.ai service operator version in which the model was added. For example, 9.1.0.

For an example ConfigMap, see Registering custom foundation models for global deployment.
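
Several of the fields above are variations of the same model name, so deriving them from one model ID can reduce copy errors. The following sketch is an illustration, not part of the product: all function names are hypothetical, and the sketch handles only the slash and hyphen delimiters that appear in the tiiuae/falcon-7b example. Verify the results against your own model name before you use them.

```shell
#!/usr/bin/env bash
# Derive the ConfigMap name variants from a model ID such as tiiuae/falcon-7b.

# Hyphens as delimiters: metadata.name and labels_syom.icpdsupport/module.
hyphen_name() { printf '%s\n' "$1" | tr '/' '-'; }

# Underscores as delimiters: <full_model_name>.
underscore_name() { printf '%s\n' "$1" | tr '/-' '__'; }

# Hyphenated name prefixed with text-: labels_syom.app.
app_label() { printf 'text-%s\n' "$(hyphen_name "$1")"; }

# First delimiter becomes two hyphens, the rest stay single: labels_syom.syom_model.
syom_model_name() { printf '%s\n' "$1" | sed 's|/|--|' | tr '/' '-'; }

# Parts before and after the first slash: wx_inference_proxy provider and label.
provider_name() { printf '%s\n' "${1%%/*}"; }
model_label() { printf '%s\n' "${1#*/}"; }

# Example usage:
# for fn in hyphen_name underscore_name app_label syom_model_name provider_name model_label; do
#   printf '%-16s %s\n' "$fn" "$("$fn" tiiuae/falcon-7b)"
# done
```

Values such as pvc_name and dir_name are not derived here because they come from the storage setup and the model download job.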

What to do next

To test the custom foundation model that you added to a watsonx.ai lightweight engine installation, submit an inference request to the model programmatically. For more details, see Working with the watsonx.ai lightweight engine.