Planning to deploy a custom foundation model in watsonx.ai

Review the considerations and requirements for deploying a custom foundation model for inferencing with watsonx.ai.

As you prepare to deploy a custom foundation model, review these requirements:

Role requirements for the tasks that are associated with deploying a custom foundation model

The system administrator must perform the following tasks:
  1. Building a custom inference runtime image (if required).
  2. Setting up storage.
  3. Uploading the model.
  4. Registering the model with watsonx.ai.

Requirements and usage notes for custom foundation models

Deployable custom models must meet these requirements:

  • The file list for the model must contain a config.json file, which the inference runtime needs to load the model. After the model is uploaded to storage, the deployment service requires config.json to be present in the foundation model content folder. See Planning to deploy a custom foundation model for steps on how to check for the file.
  • General-purpose models: the model must be in safetensors format and compatible with the supported transformers library. If the model is not in safetensors format but is otherwise compatible, a conversion utility makes the necessary changes during model preparation.
  • General-purpose models: the file list for the model must contain a tokenizer.json file. If your model directory does not contain this file, you can still try deploying the model, but you must manually override settings on your cluster. For
  • Time-series models: the model directory must contain the tsfm_config.json file. Time-series models that are hosted on Hugging Face (model_type: tinytimemixer) might not include this file. If the file is missing when the model is downloaded and deployed, forecasting fails. To avoid forecasting issues, you must perform an extra step when you download the model.
Note: If your model meets all the requirements but still fails, see Troubleshooting.
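Before uploading, you might want to check a local model directory against the file requirements listed above. The following is a minimal sketch, not a watsonx.ai tool: `check_model_dir` and its `model_kind` values are hypothetical names, and it only inspects top-level file names. It does not validate file contents or transformers compatibility.

```python
from pathlib import Path


def check_model_dir(model_dir, model_kind="general"):
    """Hypothetical pre-upload check of the file requirements for a custom model.

    Returns a list of problem descriptions; an empty list means the
    directory passes these file-name checks.
    """
    files = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    problems = []

    # Required for every deployable custom model.
    if "config.json" not in files:
        problems.append("missing config.json")

    if model_kind == "general":
        # Not fatal on its own: a conversion utility can convert compatible models.
        if not any(name.endswith(".safetensors") for name in files):
            problems.append("no .safetensors weights (conversion needed)")
        # Deployment without it requires manual overrides on the cluster.
        if "tokenizer.json" not in files:
            problems.append("missing tokenizer.json")
    elif model_kind == "time-series":
        # Without this file, forecasting fails after deployment.
        if "tsfm_config.json" not in files:
            problems.append("missing tsfm_config.json")
    else:
        raise ValueError(f"unknown model_kind: {model_kind!r}")

    return problems
```

For example, `check_model_dir("/path/to/model", "time-series")` reports a missing tsfm_config.json so that you can take the extra download step before deploying.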

Collecting the prerequisite details for a custom foundation model

As an example, for the falcon-40b model that is stored on Hugging Face, click Files and versions to view the file structure and check for config.json:

[Image: Files and versions view of the falcon-40b repository, showing the config.json file]

The config.json file shows that the example model uses a version of the falcon architecture.

[Image: config.json file showing that the example model uses a version of the falcon architecture]

This example model contains the tokenizer.json file and is in the .safetensors format:

[Image: repository file list showing the tokenizer.json file and model weights in .safetensors format]
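The same checks that you perform in the Hugging Face web interface can be scripted. This sketch assumes the `huggingface_hub` package is installed (`list_repo_files` and `hf_hub_download` are functions from that library). `summarize_repo` and `fetch_repo_summary` are hypothetical helper names, and the repository ID in the usage note is an assumption for illustration.

```python
import json


def summarize_repo(files, config):
    """Summarize a repository file list and its parsed config.json."""
    return {
        "has_config": "config.json" in files,
        "has_tokenizer": "tokenizer.json" in files,
        "has_safetensors": any(f.endswith(".safetensors") for f in files),
        # The architecture is recorded in config.json.
        "model_type": config.get("model_type"),
        "architectures": config.get("architectures", []),
    }


def fetch_repo_summary(repo_id):
    """Download the file list and config.json for a Hugging Face repository.

    Requires network access and the huggingface_hub package; gated
    repositories may also need an access token.
    """
    from huggingface_hub import hf_hub_download, list_repo_files

    files = list_repo_files(repo_id)
    config_path = hf_hub_download(repo_id, "config.json")
    with open(config_path, encoding="utf-8") as fh:
        config = json.load(fh)
    return summarize_repo(files, config)
```

For example, `fetch_repo_summary("tiiuae/falcon-40b")` returns a dictionary that indicates whether config.json, tokenizer.json, and .safetensors weights are present, along with the `model_type` and `architectures` values from config.json.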

Hardware requirements for custom foundation models

The standard supported hardware configurations to deploy custom foundation models are:
  • NVIDIA A100 GPUs with 80 GB RAM
  • NVIDIA H100 GPUs with 80 GB RAM
  • NVIDIA H200 GPUs with 141 GB RAM
If your GPU configuration is different (for example, NVIDIA H100 GPUs with 40 GB RAM), you must create a custom hardware specification. For details, see Creating custom hardware specifications.
Restriction: You cannot use GPUs that are based on the Intel Gaudi 3 AI Accelerator architecture for custom foundation model deployments.
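The hardware rules above form a small decision table. The following hypothetical sketch encodes only what this section states: the three standard configurations, the custom-specification fallback, and the Gaudi 3 restriction. `hardware_plan` and its return strings are illustrative names, not part of watsonx.ai. Exact string matching on the GPU model is a simplifying assumption.

```python
# Standard configurations from this section: (GPU model, memory per GPU in GB).
STANDARD_CONFIGS = {
    ("NVIDIA A100", 80),
    ("NVIDIA H100", 80),
    ("NVIDIA H200", 141),
}


def hardware_plan(gpu_model, memory_gb):
    """Classify a GPU configuration according to the rules in this section."""
    # Gaudi 3 based GPUs cannot be used for custom foundation model deployments.
    if "gaudi 3" in gpu_model.lower():
        return "unsupported"
    if (gpu_model, memory_gb) in STANDARD_CONFIGS:
        return "standard"
    # Any other configuration needs a custom hardware specification.
    return "custom hardware specification required"
```

For example, `hardware_plan("NVIDIA H100", 40)` returns `"custom hardware specification required"`, which matches the example given above.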