Planning to deploy a custom foundation model
Review the considerations and requirements for deploying a custom foundation model for inferencing with watsonx.ai.
Service requirement: The required watsonx.ai service and other supplemental services are not available by default. An administrator must install these services on the IBM Cloud Pak for Data platform. To determine whether a service is installed, open the Services catalog and check whether the service is enabled.
Deploying a custom foundation model is available starting with Cloud Pak for Data 4.8.4.
As you prepare to deploy a custom foundation model, review these requirements.
- Consider the type of model that you are deploying. Tasks differ slightly depending on whether you are downloading a model from a public repository, such as Hugging Face, or uploading a model located in your own environment. For each deployment task, follow the steps for your scenario.
- Review the role requirements for the tasks that are associated with deploying a custom foundation model.
| Task | Role |
| --- | --- |
| Set up storage | Cluster administrator |
| Upload the model | Cluster administrator |
| Register the model with watsonx.ai | Cluster administrator |
| Create the model asset | watsonx.ai user |
| Deploy the custom model | watsonx.ai user |
| Prompt the deployed model | watsonx.ai user |
- Confirm that the cluster where you are uploading the custom foundation model does not have Multi-Instance GPU (MIG) support enabled in Red Hat OpenShift. Deploying a custom foundation model is not supported on a cluster with MIG enabled.
- Review the supported architectures for custom foundation models to make sure that your model is compatible.
- Collect the details required as prerequisites for deploying a custom foundation model.
Collecting the prerequisite details for a custom foundation model
- Check for the existence of the file `config.json` in the foundation model content folder. The `config.json` file is required to load the model in the Text Generation Inference Server (TGIS) runtime. The deployment service requires that `config.json` exists in the foundation model content folder after the model is uploaded to the PVC. For example, for the falcon-40b model stored on Hugging Face, click Files and versions to view the file structure and check for `config.json`.
- Open the `config.json` file to confirm that the foundation model uses a supported architecture. The example model uses a version of the supported `falcon` architecture.
- View the list of files for the foundation model to check for the file `tokenizer.json` and to confirm that the model content is in `.safetensors` format. The example model shown here is in PyTorch format, not `safetensors`. If the model is not in `safetensors` format or does not include the `tokenizer.json` file, the necessary conversions are performed when the model is downloaded and set up.
Next steps
Set up storage and upload the custom foundation model
Parent topic: Deploying a custom foundation model