Deploying custom foundation models in IBM watsonx.ai
You can upload and deploy a custom foundation model for use with watsonx.ai™ inferencing capabilities.
Deploying a custom foundation model gives you the flexibility to implement the AI solutions that are right for your use case. The deployment process differs slightly depending on the source of your custom foundation model.
It is best to get the model directly from the model builder. One place to find new models is Hugging Face, a repository of open source foundation models that many model builders use.
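Before you upload a model that you downloaded, for example a Hugging Face snapshot, it can help to confirm that the local copy is complete. The following Python sketch assumes a Hugging Face-style layout with a config.json file, at least one tokenizer file, and weights in safetensors format. The function names and the file list are illustrative assumptions, not part of any watsonx.ai API; confirm the exact file requirements in Planning to deploy a custom foundation model.

```python
from pathlib import Path
from typing import Iterable

# Assumption: a Hugging Face-style snapshot. Adjust these expectations to the
# file requirements documented for your watsonx.ai release.
TOKENIZER_FILES = {"tokenizer.json", "tokenizer.model", "tokenizer_config.json"}


def check_model_files(filenames: Iterable[str]) -> list:
    """Return the problems found in a collection of model file names (empty if none)."""
    names = set(filenames)
    problems = []
    if "config.json" not in names:
        problems.append("missing config.json")
    if not (names & TOKENIZER_FILES):
        problems.append("no tokenizer file found")
    if not any(name.endswith(".safetensors") for name in names):
        problems.append("no .safetensors weight files found")
    return problems


def check_model_dir(model_dir: str) -> list:
    """Run check_model_files on the top-level files of a local model directory."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    return check_model_files(p.name for p in root.iterdir() if p.is_file())
```

For example, calling check_model_dir on a hypothetical ./my-model directory returns an empty list when all of the expected files are present; otherwise, the list names what to fix before you upload the model.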
Watch this video to see how to set up storage for the custom model, load the model into the storage, and register the model to make it available for deployment.
Deploying custom foundation models
You must prepare the custom foundation model and upload it to persistent volume claim (PVC) storage. After you store the model, you must register it with watsonx.ai.
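The PVC must be large enough to hold all of the model files. The following Python sketch measures a local model directory and builds a standard Kubernetes PersistentVolumeClaim manifest with some headroom. The 20% headroom, the ReadWriteMany access mode, and all names are assumptions for illustration; use the storage class, access mode, and namespace that your cluster administrator specifies.

```python
from pathlib import Path
from typing import Optional

GIB = 1024 ** 3


def dir_size_bytes(model_dir: str) -> int:
    """Total size in bytes of all files under model_dir, searched recursively."""
    return sum(p.stat().st_size for p in Path(model_dir).rglob("*") if p.is_file())


def pvc_manifest(name: str, model_bytes: int, storage_class: str,
                 namespace: Optional[str] = None, headroom_pct: int = 20) -> dict:
    """Build a PersistentVolumeClaim manifest sized for the model plus headroom.

    The size is rounded up to whole GiB with integer arithmetic, which avoids
    floating-point rounding pushing an exact size up by one GiB.
    """
    if model_bytes < 0 or headroom_pct < 0:
        raise ValueError("model_bytes and headroom_pct must be non-negative")
    required = model_bytes * (100 + headroom_pct)
    size_gib = max(1, -(-required // (100 * GIB)))  # ceiling division, minimum 1 GiB
    metadata = {"name": name}
    if namespace:
        metadata["namespace"] = namespace
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": metadata,
        "spec": {
            # Assumption: shared access; your storage class may require another mode.
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }
```

You could serialize the result with json.dumps, save it as pvc.json, and create the claim with oc apply -f pvc.json in the namespace that your administrator specifies. A model directory of exactly 10 GiB, for example, yields a 12Gi request with the default headroom.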
The following graphic shows the process followed by a system administrator:
When you complete the storage and registration process, MLOps engineers can deploy the custom foundation model and prompt engineers can use the deployed model for prompting. For more information, see Deploying custom foundation models in the IBM watsonx.ai and watsonx.governance™ documentation.
Preparing the model and uploading to PVC storage
The vLLM inferencing server provides an optimized inference runtime for serving many popular foundation model architectures. However, it does not yet support certain models. To use those custom foundation models, you must add or build a custom inference runtime image for them.
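Whether you need a custom runtime image depends on the model architecture, which Hugging Face-style models declare in the architectures field of config.json. The following Python sketch compares that field against a set of supported architectures. The set shown is a small illustrative assumption, not the list of architectures that watsonx.ai actually supports, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable

# Illustrative subset only. This is NOT the authoritative list of architectures
# that the watsonx.ai vLLM runtime supports; replace it with the list from
# "Planning to deploy a custom foundation model" for your release.
ASSUMED_SUPPORTED_ARCHITECTURES = frozenset({
    "LlamaForCausalLM",
    "MistralForCausalLM",
    "GPTBigCodeForCausalLM",
})


def load_config(model_dir: str) -> dict:
    """Read the Hugging Face-style config.json from a local model directory."""
    return json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))


def needs_custom_runtime(config: dict,
                         supported: Iterable[str] = ASSUMED_SUPPORTED_ARCHITECTURES) -> bool:
    """Return True if any architecture in config is outside the supported set.

    A config without an "architectures" entry is treated as unsupported,
    because the model type cannot be confirmed from the file alone.
    """
    architectures = config.get("architectures") or []
    if not architectures:
        return True
    supported_set = set(supported)
    return any(arch not in supported_set for arch in architectures)
```

For example, needs_custom_runtime(load_config("./my-model")) returns True for a hypothetical model whose architecture is not in the supplied set, which signals that you need to add or build a custom inference runtime image. Passing the official list through the supported parameter keeps the check independent of the placeholder set.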
- Review the supported architecture frameworks, hardware specifications, and software specifications for custom foundation models. See Planning to deploy a custom foundation model.
- If the standard vLLM inference server does not yet support your model, add or build a custom inference runtime image for your custom foundation model. See Building a custom inference runtime image for your custom foundation model.
- Set up a storage repository for hosting the model and then upload the model to the storage repository. See Setting up storage and uploading the model.
- Register the custom foundation model to use with watsonx.ai. See Registering a custom foundation model.
For a watsonx.ai lightweight engine installation, you follow different steps to add custom foundation models. For details, see Adding custom foundation models to watsonx.ai lightweight engine.