Deploying custom foundation models in IBM watsonx.ai
You can upload and deploy a custom foundation model for use with watsonx.ai™ inferencing capabilities.
Deploying a custom foundation model provides the flexibility for you to implement the AI solutions that are right for your use case. The deployment process differs slightly depending on the source of your custom foundation model.
It is best to get the model directly from the model builder. One place to find new models is Hugging Face, a repository of open-source foundation models that many model builders use.
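For example, if the model is published on Hugging Face, you can download a local copy of the model files with the huggingface_hub Python library. This is a minimal sketch; the repository ID and target directory are placeholders that you replace with your own model's values.

```python
# Minimal sketch: download a local copy of a foundation model from Hugging Face.
# Requires: pip install huggingface_hub
# The repo_id and local_dir values are placeholders for your own model.
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="bigscience/bloom-560m",   # example repository; use your model builder's repo
    local_dir="models/bloom-560m",     # local directory that receives the model files
)
print(f"Model files downloaded to {model_path}")
```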
Watch this video to see how to set up storage for the custom model, load the model into storage, and register the model to make it available for deployment.
This video provides a visual method to learn the concepts and tasks in this documentation.
Deploying custom foundation models
You must prepare the custom foundation model and upload it to persistent volume claim (PVC) storage. After storing the model, you must register it with watsonx.ai.
The following graphic shows the process followed by a system administrator:
When you complete the storage and registration process, MLOps engineers can deploy the custom foundation model and prompt engineers can use the deployed model for prompting. For more information, see Deploying custom foundation models in the IBM watsonx.ai and watsonx.governance™ documentation.
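As an illustration of that last step, a prompt engineer can send inference requests to the deployed custom foundation model with the ibm-watsonx-ai Python SDK. This is a hedged sketch, not the full procedure: the endpoint URL, API key, space ID, and deployment ID are placeholders, and your installation may require different credential options (for example, username and password on a software installation).

```python
# Minimal sketch: prompt a deployed custom foundation model with the
# ibm-watsonx-ai Python SDK (pip install ibm-watsonx-ai).
# All credential values and IDs below are placeholders.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://<your-watsonx-ai-endpoint>",  # placeholder endpoint
    api_key="<your-api-key>",                  # placeholder credential
)

model = ModelInference(
    deployment_id="<deployment-id>",  # ID of the deployed custom foundation model
    credentials=credentials,
    space_id="<space-id>",            # deployment space that holds the deployment
)

response = model.generate_text(prompt="Summarize the benefits of custom foundation models.")
print(response)
```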
Preparing the model and uploading to PVC storage
The TGIS and vLLM inference servers provide optimized runtimes for serving many popular foundation model architectures. However, certain models are not yet supported by these inference servers. To use such custom foundation models, you must add or build a custom inference runtime image for your model.
- Review the supported architecture frameworks, hardware specifications, and software specifications for custom foundation models. See Planning to deploy a custom foundation model.
- Add or build a custom inference runtime image for your custom foundation model (required only for models that are not yet supported by the standard TGIS and vLLM inference servers). See Building a custom inference runtime image for your custom foundation model.
- Set up a storage repository for hosting the model and then upload the model to the storage repository. See Setting up storage and uploading the model.
- Register the custom foundation model for use with watsonx.ai. A sketch of the upload and registration steps follows this list.
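The following sketch illustrates the upload and registration steps on a Red Hat OpenShift cluster by calling the oc CLI from Python. It is illustrative only, under several assumptions: the namespace, transfer pod, PVC name, mount path, and especially the custom resource name and patch schema are placeholders, not the documented values. Follow the setup and registration topics linked in the list above for the exact resource names and schema in your installation.

```python
# Illustrative sketch only: copy model files into a pod that mounts the
# target PVC, then register the model by patching the watsonx.ai custom
# resource. The namespace, pod, PVC, paths, and patch schema below are
# assumptions; consult the setup and registration documentation for the
# exact values in your installation.
import json
import subprocess

NAMESPACE = "cpd-instance"            # placeholder project/namespace
TRANSFER_POD = "model-transfer-pod"   # placeholder pod that mounts the model PVC

# Step 1: copy the downloaded model files into the PVC mount path inside the pod.
subprocess.run(
    ["oc", "cp", "models/bloom-560m",
     f"{NAMESPACE}/{TRANSFER_POD}:/mnt/models/bloom-560m"],
    check=True,
)

# Step 2: register the model by patching the watsonx.ai custom resource.
# The resource name and field names here are illustrative assumptions.
patch = {
    "spec": {
        "custom_foundation_models": [
            {
                "model_id": "bloom-560m-custom",        # name used at deployment time
                "location": {
                    "pvc_name": "model-pvc",            # PVC that stores the model files
                    "sub_path": "bloom-560m",           # directory within the PVC
                },
            }
        ]
    }
}
subprocess.run(
    ["oc", "patch", "watsonxaiifm", "watsonxaiifm-cr",
     "-n", NAMESPACE, "--type", "merge", "-p", json.dumps(patch)],
    check=True,
)
```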
For a watsonx.ai lightweight engine installation, you follow different steps to add custom foundation models. For details, see Adding custom foundation models to watsonx.ai lightweight engine.