Deploying custom foundation models in IBM watsonx.ai

You can upload and deploy a custom foundation model for use with watsonx.ai™ inferencing capabilities.

In addition to working with foundation models that are curated by IBM, you can upload and deploy your own foundation models. After the models are registered with watsonx.ai and deployed, you can create prompts in the Prompt Lab that run inference against the custom models.

Restriction: You cannot use GPUs that are based on the Intel Gaudi 3 AI Accelerator architecture for custom foundation model deployments.

Deploying a custom foundation model gives you the flexibility to implement the AI solutions that are right for your use case. The deployment process differs slightly depending on the source of your custom foundation model.

It is best to get the model directly from the model builder. One place to find new models is Hugging Face, a repository for open source foundation models used by many model builders.

Watch this video to see how to set up storage for the custom model, load the model into the storage, and register the model to make it available for deployment.


Deploying custom foundation models

You must prepare the custom foundation model and upload it to persistent volume claim (PVC) storage. After you store the model, you must register it with watsonx.ai.

The following graphic shows the process followed by a system administrator:

[Diagram: the process that a system administrator follows to deploy a custom foundation model]

After you complete the storage and registration process, MLOps engineers can deploy the custom foundation model, and prompt engineers can use the deployed model for prompting. For more information, see Deploying custom foundation models in the IBM watsonx.ai and watsonx.governance™ documentation.

Preparing the model and uploading to PVC storage

Note:

The vLLM inference server provides an optimized inference runtime for serving many popular foundation model architectures, but some architectures are not yet supported. To use a custom foundation model with an unsupported architecture, you must add or build a custom inference runtime image for it.

To prepare the model and upload it to PVC storage, the system administrator must perform the following tasks:

Review the supported architecture frameworks, hardware specifications, and software specifications for custom foundation models. See Planning to deploy a custom foundation model.
If your model is not yet supported by the standard vLLM inference server, add or build a custom inference runtime image for it. See Building a custom inference runtime image for your custom foundation model.
Set up a storage repository for hosting the model and then upload the model to the storage repository. See Setting up storage and uploading the model.
Register the custom foundation model to use with watsonx.ai. See Registering a custom foundation model.
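
Before working through the tasks above, it can help to inspect the downloaded model files locally. The following Python sketch is an illustration only, not part of watsonx.ai. It assumes a Hugging Face style model directory that contains a config.json file with an "architectures" key. The function name summarize_model_dir and the supported_architectures argument are hypothetical; take the real list of supported architectures from Planning to deploy a custom foundation model.

```python
import json
from pathlib import Path


def summarize_model_dir(model_dir, supported_architectures):
    """Summarize a local model directory before uploading it to storage.

    Hypothetical helper: `supported_architectures` is a set of architecture
    names that you copy from the planning documentation. Nothing here calls
    a watsonx.ai API.
    """
    model_path = Path(model_dir)

    # Hugging Face style checkpoints describe the model in config.json.
    config = json.loads((model_path / "config.json").read_text(encoding="utf-8"))
    architectures = config.get("architectures") or []

    # Total size of all files, which is useful when sizing the storage volume.
    total_bytes = sum(
        p.stat().st_size for p in model_path.rglob("*") if p.is_file()
    )

    # If no declared architecture is in the supported set (or none is
    # declared), assume that a custom inference runtime image may be needed.
    needs_custom_runtime = not any(
        name in supported_architectures for name in architectures
    )

    return {
        "architectures": architectures,
        "total_bytes": total_bytes,
        "needs_custom_runtime": needs_custom_runtime,
    }
```

For example, calling summarize_model_dir with the path to the downloaded model and a set of architecture names returns a dictionary that shows the declared architectures, the total size in bytes, and whether the model might need a custom inference runtime image. Treat the result as a hint and confirm it against the planning documentation.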

For a watsonx.ai lightweight engine installation, the steps for adding custom foundation models are different. For details, see Adding custom foundation models to watsonx.ai lightweight engine.