Planning to deploy a custom foundation model
Review the considerations and requirements for deploying a custom foundation model for inferencing with watsonx.ai.
Service requirement: The required watsonx.ai service and other supplemental services are not available by default. An administrator must install these services on the IBM Cloud Pak for Data platform. To determine whether a service is installed, open the Services catalog and check whether the service is enabled.
Deploying a custom foundation model is available starting with Cloud Pak for Data 4.8.4.
As you prepare to deploy a custom foundation model, review these requirements.
- Consider the type of model that you are deploying. Tasks differ slightly depending on whether you are downloading a model from a public repository, such as Hugging Face, or uploading a model located in your own environment. For each deployment task, follow the steps for your scenario.
- Review the role requirements for the tasks that are associated with deploying a custom foundation model.
| Task | Role |
| --- | --- |
| Set up storage | Cluster administrator |
| Upload the model | Cluster administrator |
| Register the model with watsonx.ai | Cluster administrator |
| Create the model asset | watsonx.ai user |
| Deploy the custom model | watsonx.ai user |
| Prompt the deployed model | watsonx.ai user |
- Confirm that the cluster where you are uploading the custom foundation model does not have Multi-Instance GPU (MIG) support enabled in Red Hat OpenShift. Deploying a custom foundation model is not supported on a cluster with MIG enabled.
- Review the supported architectures for custom foundation models to make sure that your model is compatible.
- Collect the details required as prerequisites for deploying a custom foundation model.
Collecting the prerequisite details for a custom foundation model
- Check for the existence of the file `config.json` in the foundation model content folder. The `config.json` file is required to load the model in the Text Generation Inference Server (TGIS) runtime. The deployment service requires that `config.json` exists in the foundation model content folder after the model is uploaded to the PVC. For example, for the falcon-40b model stored on Hugging Face, click Files and versions to view the file structure and check for `config.json`.
- Open the `config.json` file to confirm that the foundation model uses a supported architecture. The example model uses a version of the supported `falcon` architecture.
- View the list of files for the foundation model to check for the file `tokenizer.json` and to confirm that the model content is in `.safetensors` format. The example model shown here is in PyTorch format, not `safetensors`. If the model is not in `safetensors` format or does not include the `tokenizer.json` file, the necessary conversions are performed when the model is downloaded and set up.
Next steps
Set up storage and upload the custom foundation model
Parent topic: Deploying a custom foundation model