Deploying custom foundation models in IBM watsonx.ai
You can upload and deploy a custom foundation model for use with watsonx.ai™ inferencing capabilities.
Deploying a custom foundation model provides the flexibility for you to implement the AI solutions that are right for your use case. The deployment process differs slightly depending on the source of your custom foundation model.
It is best to get the model directly from the model builder. One place to find new models is Hugging Face, a repository for open source foundation models used by many model builders.
Watch this video to see how to set up storage for the custom model, load the model into the storage, and register the model to make it available for deployment.
Deploying custom foundation models
You must prepare the custom foundation model and upload the model to PVC storage. After storing the model, you must register the model with watsonx.ai.
The following graphic shows the process followed by a system administrator:
When you complete the storage and registration process, MLOps engineers can deploy the custom foundation model, and prompt engineers can use the deployed model for prompting. For more information, see Deploying custom foundation models in the IBM watsonx.ai and watsonx.governance™ documentation.
Preparing the model and uploading to PVC storage
The vLLM inference server provides an optimized inference runtime for serving many popular foundation model architectures. However, some model architectures are not yet supported. To use a custom foundation model with an unsupported architecture, you must add or build a custom inference runtime image for that model.
- Review the supported architecture frameworks, hardware specifications, and software specifications for custom foundation models. See Planning to deploy a custom foundation model.
- [OPTIONAL] Add or build a custom inference runtime image for your custom foundation model. This step applies only to models that are not yet supported by the standard vLLM inference server. See Building a custom inference runtime image for your custom foundation model.
- Set up a storage repository for hosting the model and then upload the model to the storage repository. See Setting up storage and uploading the model.
- Register the custom foundation model. You can register and deploy custom foundation models (CFMs) in two ways:
  - Registering a custom foundation model in an IBM watsonx.ai project or space. See Registering a custom foundation model. In this approach, the model is registered and deployed within a specific project or space. The deployment is scoped to that project or space, and access is limited accordingly.
  - Registering custom foundation models for global deployment. See Registering custom foundation models for global deployment. In this approach, models are deployed globally by using the IBM watsonx.ai Inference Frameworks Manager (IFM) operator. Models that are deployed in this way can be shared across multiple projects and spaces within the cluster, rather than being tied to a single project or space. Global deployment is particularly useful for environments with limited GPU resources: instead of deploying a separate instance of the same model in each project or space, a single shared deployment can serve multiple consumers. This approach helps optimize resource utilization and provides a scalable way for enterprises to use custom foundation models across the organization.
For a watsonx.ai lightweight engine installation, you follow different steps to add custom foundation models. For details, see Adding custom foundation models to watsonx.ai lightweight engine.
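The first two steps in the list above (reviewing the supported architectures and deciding whether a custom inference runtime image is needed) can be sketched as a local pre-upload check. The following is a minimal sketch, not part of watsonx.ai. It assumes that the model directory follows the common Hugging Face layout, in which `config.json` carries an `architectures` list. The set `ASSUMED_SUPPORTED_ARCHITECTURES` and the function names are hypothetical placeholders; take the real list from Planning to deploy a custom foundation model.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical placeholder set, NOT the official list. Replace it with the
# architectures documented in "Planning to deploy a custom foundation model".
ASSUMED_SUPPORTED_ARCHITECTURES = {"LlamaForCausalLM", "MistralForCausalLM"}


def read_architectures(model_dir):
    """Return the 'architectures' list from a Hugging Face style config.json.

    Returns an empty list when the key is absent.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    return list(config.get("architectures", []))


def needs_custom_runtime(model_dir, supported=ASSUMED_SUPPORTED_ARCHITECTURES):
    """Return True when none of the declared architectures is in `supported`.

    A model that declares no architecture at all is also reported as
    needing a custom runtime, because it cannot be matched to the list.
    """
    declared = read_architectures(model_dir)
    return not any(name in supported for name in declared)


if __name__ == "__main__":
    # Demonstration with a throwaway directory that stands in for a
    # downloaded model; no real model files are involved.
    with tempfile.TemporaryDirectory() as model_dir:
        config = {"architectures": ["LlamaForCausalLM"]}
        Path(model_dir, "config.json").write_text(
            json.dumps(config), encoding="utf-8"
        )
        print("Declared architectures:", read_architectures(model_dir))
        print("Custom runtime image needed:", needs_custom_runtime(model_dir))
```

Running a check like this before uploading the model to PVC storage can show early whether the optional custom runtime image step applies. The authoritative answer is always the supported-architecture list for your installed release.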