Planning to deploy a custom foundation model in watsonx.ai

Review the considerations and requirements for deploying a custom foundation model for inferencing with watsonx.ai.

As you prepare to deploy a custom foundation model, review these requirements:

Consider the type of model that you are deploying. Tasks differ slightly depending on whether you are downloading a model from a public repository like Hugging Face or a model located in your environment. For each deployment task, follow the steps for your scenario.
Review the Role requirements for the tasks that are associated with deploying a custom foundation model.
Review the supported architectures for custom foundation models to make sure that your model is compatible.
Check whether your model requires a custom hardware specification. See Hardware requirements for custom foundation models.
Verify the list of modalities (text, audio, video, and image) that can be used when inferencing your model.

Role requirements for the tasks that are associated with deploying a custom foundation model

The system administrator must perform the following tasks:

Building a custom inference runtime image (if required).
Setting up storage.
Uploading the model.
Registering the model with watsonx.ai.

Requirements and usage notes for custom foundation models

Deployable custom models must meet these requirements:

The file list for the model must contain a config.json file, which is required to load the model in the inferencing runtime. After the model is uploaded to storage, the deployment service verifies that config.json exists in the foundation model content folder. See Planning to deploy a custom foundation model for steps on how to check for the file.
General-purpose models: the model must be in safetensors format with the supported transformers library. If the model is not in safetensors format but is otherwise compatible, a conversion utility makes the necessary changes as part of the model preparation process.
General-purpose models: the file list for the model must contain a tokenizer.json file. If your model directory does not contain this file, you can still try deploying the model, but you must manually override settings on your cluster. For
Time-series models: the model directory must contain the tsfm_config.json file. Time-series models that are hosted on Hugging Face (model_type: tinytimemixer) might not include this file. If the file is missing when the model is downloaded and deployed, forecasting fails. To avoid forecasting issues, you must perform an extra step when you download the model.

Note: If your model meets all the requirements, but still fails, see Troubleshooting.
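The file requirements above can be checked against a local copy of the model before you upload it. The following is a minimal sketch, not part of watsonx.ai: the function name check_model_dir and the assumption that all files sit at the top level of the model directory are illustrative only.

```python
from pathlib import Path


def check_model_dir(model_dir, time_series=False):
    """Return a list of findings for a local model directory.

    Hypothetical helper: assumes the model files are stored at the
    top level of model_dir.
    """
    root = Path(model_dir)
    findings = []

    # config.json is required for every custom foundation model.
    if not (root / "config.json").is_file():
        findings.append("missing config.json")

    if time_series:
        # Time-series models also need tsfm_config.json.
        if not (root / "tsfm_config.json").is_file():
            findings.append("missing tsfm_config.json")
    else:
        # General-purpose models: tokenizer.json and safetensors weights.
        if not (root / "tokenizer.json").is_file():
            findings.append("missing tokenizer.json")
        if not any(root.glob("*.safetensors")):
            findings.append("no .safetensors weight files")

    return findings
```

An empty list means that the files named in the requirements are present. A missing tokenizer.json or missing safetensors files do not necessarily block deployment, because the requirements describe a manual override and a conversion utility for those cases.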

Collecting the prerequisite details for a custom foundation model

As an example, for the falcon-40b model that is stored on Hugging Face, click Files and versions to view the file structure and check for config.json:

Figure: Viewing the repository and checking for the config.json file

The example model uses a version of the falcon architecture.

Figure: config.json file showing that the example model uses a version of the falcon architecture

This example model contains the tokenizer.json file and is in the .safetensors format:

Figure: Repository directory showing a file structure that contains tokenizer.json and model files in the .safetensors format
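If you already have a local copy of the model, you can read the same details directly from config.json instead of browsing the repository. The sketch below assumes the file uses the model_type and architectures keys that Hugging Face model configurations commonly contain. The function name describe_model is hypothetical.

```python
import json
from pathlib import Path


def describe_model(model_dir):
    """Return the config.json fields that identify the model architecture."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return {
        # Either key can be absent, so fall back to neutral defaults.
        "model_type": config.get("model_type"),
        "architectures": config.get("architectures", []),
    }
```

Compare the returned values with the list of supported architectures for custom foundation models.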

Hardware requirements for custom foundation models

The standard supported hardware configurations to deploy custom foundation models are:

NVIDIA A100 GPUs with 80 GB RAM
NVIDIA H100 GPUs with 80 GB RAM
NVIDIA H200 GPUs with 141 GB RAM

If your GPU configuration is different (for example, NVIDIA H100 GPUs with 40 GB RAM), you must create a custom hardware specification. For details, see Creating custom hardware specifications.
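The rule above is a lookup against the three standard configurations. As an illustrative sketch only (the function name and the way the GPU model is written are assumptions, not a watsonx.ai API):

```python
# Standard configurations from the list above: (GPU model, memory in GB).
STANDARD_GPU_CONFIGS = {("A100", 80), ("H100", 80), ("H200", 141)}


def needs_custom_hardware_spec(gpu_model, memory_gb):
    """Return True when the GPU is not one of the standard configurations."""
    return (gpu_model.upper(), memory_gb) not in STANDARD_GPU_CONFIGS
```

For example, needs_custom_hardware_spec("H100", 40) returns True, which matches the example given in the text.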

Restriction: You cannot use GPUs that are based on the Intel Gaudi 3 AI Accelerator architecture for custom foundation model deployments.