Planning to deploy a custom foundation model in watsonx.ai
Review the considerations and requirements for deploying a custom foundation model for inferencing with watsonx.ai.
As you prepare to deploy a custom foundation model, review these requirements:
- Consider the type of model that you are deploying. Tasks differ slightly depending on whether you are downloading a model from a public repository such as Hugging Face or using a model that is located in your environment. For each deployment task, follow the steps for your scenario.
- Review the role requirements for the tasks that are associated with deploying a custom foundation model.
- Review the supported architectures for custom foundation models to make sure that your model is compatible.
- Review whether or not your model requires a custom hardware specification. See Hardware requirements for custom foundation models.
- Verify the list of modalities (text, audio, video, and image) that can be used when inferencing your model.
Role requirements for the tasks that are associated with deploying a custom foundation model
The system administrator must perform the following tasks:
- Building a custom inference runtime image (if required)
- Setting up storage
- Uploading the model
- Registering the model with watsonx.ai
Requirements and usage notes for custom foundation models
Deployable custom models must meet these requirements:
- The file list for the model must contain a config.json file. The config.json file is required to load the model in the inferencing runtime. The deployment service requires the config.json file to be present in the foundation model content folder after the model is uploaded to storage. See Planning to deploy a custom foundation model for steps on how to check for the file.
- General-purpose models: the model must be in safetensors format with the supported transformers library. If the model is not in safetensors format but is otherwise compatible, a conversion utility makes the necessary changes as part of the model preparation process.
- General-purpose models: the file list for the model must contain a tokenizer.json file. If your model directory does not contain this file, you can still try deploying the model, but you must manually override settings on your cluster.
- Time-series models: the model directory must contain the tsfm_config.json file. Time-series models that are hosted on Hugging Face (model_type: tinytimemixer) might not include this file. If the file is missing when the model is downloaded and deployed, forecasting fails. To avoid forecasting issues, you must perform an extra step when you download the model.
Note: If your model meets all the requirements but still fails, see Troubleshooting.
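Before you upload a model, you can check a local copy of the model directory against the file requirements above. The following Python sketch assumes that the model has already been downloaded to a local directory; check_model_dir and its kind parameter are hypothetical names for illustration. It checks only that the required files are present, not whether the architecture is supported or the file contents are valid.

```python
from pathlib import Path


def check_model_dir(model_dir, kind="general"):
    """Return a list of file-level problems found in a local model directory.

    kind is "general" for general-purpose models or "time-series" for
    time-series models (a hypothetical parameter for this sketch).
    """
    path = Path(model_dir)
    problems = []

    # Every custom foundation model needs config.json to load in the runtime.
    if not (path / "config.json").is_file():
        problems.append("missing config.json")

    if kind == "general":
        # Weights are expected in safetensors format. Other formats might be
        # converted during model preparation, so this entry is informational.
        if not any(path.glob("*.safetensors")):
            problems.append("no .safetensors files (conversion may be needed)")
        # Without tokenizer.json, cluster settings must be overridden manually.
        if not (path / "tokenizer.json").is_file():
            problems.append("missing tokenizer.json")
    elif kind == "time-series":
        # Forecasting fails if tsfm_config.json is missing.
        if not (path / "tsfm_config.json").is_file():
            problems.append("missing tsfm_config.json")
    else:
        raise ValueError(f"unknown model kind: {kind!r}")

    return problems
```

For example, check_model_dir("/path/to/model") returns an empty list when config.json, tokenizer.json, and at least one .safetensors file are present in that directory.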
Collecting the prerequisite details for a custom foundation model
As an example, for the falcon-40b model that is stored on Hugging Face, click Files and versions to view the file structure and check for config.json.
The example model uses a version of the falcon architecture.
This example model contains the tokenizer.json file and is in the .safetensors format.
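If you already have a local copy of the model, you can read the architecture from config.json instead of browsing the Hugging Face page. This Python sketch assumes that the file follows the Hugging Face transformers convention, where the model_type key names the architecture family and the architectures key lists the model classes. describe_architecture is a hypothetical helper; compare its output against the supported architectures for custom foundation models rather than treating it as a compatibility check.

```python
import json
from pathlib import Path


def describe_architecture(model_dir):
    """Return the config.json fields that identify a model's architecture."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return {
        # Architecture family, as used by the transformers library.
        "model_type": config.get("model_type"),
        # Model classes that the checkpoint was saved with (may be absent).
        "architectures": config.get("architectures", []),
    }
```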

Hardware requirements for custom foundation models
The standard supported hardware configurations to deploy custom foundation models are:
- NVIDIA A100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 80 GB RAM
- NVIDIA H200 GPUs with 141 GB RAM
Restriction: You cannot use GPUs that are based on the Intel Gaudi 3 AI Accelerator architecture for custom foundation model deployments.
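GPU memory is usually what decides whether a model fits one of these configurations. As a rough rule, the weights alone occupy the number of parameters times the bytes per parameter: a 40-billion-parameter model such as falcon-40b, stored in 16-bit precision, needs about 40 × 2 = 80 GB for its weights, which fills an 80 GB A100 or H100 before any memory is left for inference. The Python sketch below turns that arithmetic into a GPU count. The 20% overhead factor and the function name are assumptions for illustration, not watsonx.ai figures; actual requirements depend on the runtime, context length, and batch size.

```python
import math

# GPU memory per card for the standard configurations listed above.
GPU_MEMORY_GB = {"A100": 80, "H100": 80, "H200": 141}


def estimate_min_gpus(params_billion, bytes_per_param=2, gpu="A100", overhead=1.2):
    """Rough lower bound on the number of GPUs needed to serve a model.

    params_billion: model size in billions of parameters.
    bytes_per_param: 2 for 16-bit weights (float16 or bfloat16).
    overhead: assumed multiplier for KV cache, activations, and runtime
        buffers (a hypothetical rule of thumb).
    """
    weights_gb = params_billion * bytes_per_param  # 1e9 params x bytes = GB
    return math.ceil(weights_gb * overhead / GPU_MEMORY_GB[gpu])
```

With these assumptions, estimate_min_gpus(40) returns 2 for A100 GPUs and estimate_min_gpus(40, gpu="H200") returns 1, which suggests that a 40-billion-parameter model in 16-bit precision does not fit on a single 80 GB GPU.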