Deploying custom foundation models tuned with PEFT techniques by using the REST API
To deploy a custom foundation model that is tuned with a PEFT technique, first create a repository asset for the custom foundation model and deploy that asset. Then, create a repository asset for the LoRA or QLoRA adapter that was trained with the PEFT technique and deploy the adapter asset.
Before you begin
- The administrator must store the LLM in PVC storage and register the model with watsonx.ai. For more information, see Deploying custom foundation models in IBM watsonx.ai in the IBM Software Hub documentation.
- The custom foundation model must be trained with a PEFT technique. For more information, see Tuning a foundation model programmatically.
- You must authenticate by generating and entering your API key.
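Every request in this topic passes a bearer token in the Authorization header. As a minimal sketch, assuming your cluster exposes the IBM Software Hub /icp4d-api/v1/authorize endpoint for exchanging an API key for a token (verify the endpoint and field names for your installation), the exchange can be prepared and parsed like this:

```python
import json

def build_authorize_request(host: str, username: str, api_key: str):
    """Build the URL and JSON body for exchanging an API key for a bearer token.

    The /icp4d-api/v1/authorize endpoint and its field names are assumptions;
    confirm them against your cluster's documentation.
    """
    url = f"https://{host}/icp4d-api/v1/authorize"
    body = json.dumps({"username": username, "api_key": api_key})
    return url, body

def extract_token(response_text: str) -> str:
    """Pull the bearer token out of the authorization response JSON."""
    return json.loads(response_text)["token"]
```

Send the returned URL and body with any HTTP client, then pass the extracted token as `Authorization: Bearer <token>` in the requests that follow.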
Creating a repository asset for the custom foundation model
Create a Watson Machine Learning repository asset for the custom foundation model by providing the model details.
The following code sample shows how to create the custom foundation model asset by using the REST API:
curl -X POST "https://<HOST>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
  "name": "custom foundation model asset",
  "space_id": "<space_id>",
  "foundation_model": {
    "model_id": "ibm/granite-3-1-8b-base" // use the same model_id you used to register the CFM in watsonxaiifm-cr
  },
  "type": "custom_foundation_model_1.0",
  "software_spec": {
    "name": "watsonx-cfm-caikit-1.1"
  }
}'
Deploying the custom foundation model asset
When you create an online deployment for your custom foundation model, you must set the enable_lora parameter to true in the JSON payload so that you can deploy the LoRA or QLoRA adapters by using the custom foundation model.
The LoRA or QLoRA parameter values that are required to create the custom foundation model deployment can be set by the administrator or MLOps engineer. For example, the admin can set the values of parameters such as max_gpu_loras, max_cpu_loras, and max_lora_rank, as shown in the following code sample:
custom_foundation_models:
  - location:
      pvc_name: ibm-granite-3-1-8b-base-pvc
    model_id: ibm-granite/granite-3.1-8b-base
    parameters:
      - default: true
        name: enable_lora
      - default: 10
        name: max_gpu_loras
      - default: 8
        name: max_cpu_loras
      - default: 4
        name: max_lora_rank
If the administrator sets the values of the custom foundation model parameters after registering the model in watsonxaiifm-cr, you can override the default values set by the admin by specifying the updated values in the deployment payload.
The following code sample shows how to create an online deployment for the custom foundation model asset by using the REST API:
curl -X POST "https://<HOST>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
  "asset": {
    "id": "<asset_id>" // WML base foundation model asset
  },
  "online": {
    "parameters": {
      "serving_name": "cfmservingname",
      "foundation_model": {
        "max_batch_weight": 10000,
        "max_sequence_length": 8192,
        "enable_lora": true,
        "max_gpu_loras": 8,
        "max_cpu_loras": 16,
        "max_lora_rank": 32
      }
    }
  },
  "hardware_spec": { // Only one of "id" or "name" must be set
    "id": "<hardware_spec_id>",
    "num_nodes": 1
  },
  "description": "Testing deployment using base foundation model",
  "name": "custom_fm_deployment",
  "space_id": "<space_id>" // Either "project_id" or "space_id" is allowed, but not both
}'
Running this code returns the deployment ID in the metadata.id field, as shown here:
{
  "entity": {
    "asset": {
      "id": "d92c00ab-5242-4861-8410-813529cfcdf5"
    },
    "custom": {},
    "deployed_asset_type": "custom_foundation_model",
    "description": "Granite Base Model Deployment",
    "hardware_spec": {
      "id": "e5ebf6cd-a6e0-4a90-8326-c743b59a752c",
      "name": "custom_hw_spec",
      "num_nodes": 1
    }
    ....
  }
}
Polling for deployment status
Poll for the deployment status by using the deployment ID and wait until the state changes from initializing to ready.
curl -X GET "https://<HOST>/ml/v4/deployments/<deployment_id>?version=2024-01-29&space_id=<space_id>" \
-H "Authorization: Bearer <token>"
After successful creation of the deployment, the polling status returns the deployed_asset_type as custom_foundation_model.
"deployed_asset_type": "custom_foundation_model"
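The wait loop itself is not shown above. A minimal sketch in Python, assuming the deployment state is reported at entity.status.state in the GET response and using an injected getter so that any HTTP client can be plugged in, might look like this:

```python
import time

def wait_until_ready(get_deployment, poll_seconds=10, max_attempts=60):
    """Poll a deployment until its state leaves 'initializing'.

    `get_deployment` is any callable that returns the deployment resource
    as a dict, for example a wrapper around the GET request shown above.
    The field path entity.status.state is an assumption; verify it against
    your own deployment responses.
    """
    for _ in range(max_attempts):
        deployment = get_deployment()
        state = deployment["entity"]["status"]["state"]
        if state == "ready":
            return deployment
        if state == "failed":
            raise RuntimeError("deployment entered the failed state")
        time.sleep(poll_seconds)
    raise TimeoutError("deployment did not become ready in time")
```

The same loop can be reused later for the LoRA adapter deployment, since both deployments report their state through the same polling endpoint.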
Creating the LoRA or QLoRA adapter model asset
If you did not enable the auto_update_model option while training the model, you must create a repository asset for the LoRA or QLoRA adapters. If the auto_update_model option was enabled during training, the LoRA adapter model asset is already created in the Watson Machine Learning repository, and you can proceed with creating a deployment for the LoRA adapter model asset.
After the training process for fine-tuning the LoRA or QLoRA adapter completes, the content of the adapter is stored in the PVC, in a directory that is named after the model. You must provide the PVC name as the foundation_model.model_id value in the model creation payload.
The following code sample shows how to create the LoRA adapter model asset:
curl -X POST "https://<HOST>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
  "name": "lora adapter model asset",
  "space_id": "<space_id>",
  "foundation_model": {
    "model_id": "finetunedadapter-385316ea-1d25-4274-9550-e387a1355241" // the name of the PVC that is created after training
  },
  "type": "lora_adapter_1.0",
  "software_spec": {
    "name": "watsonx-cfm-caikit-1.1"
  },
  "training": {
    "base_model": {
      "model_id": "ibm/granite-3-1-8b-base" // your CFM model_id as registered in the watsonxaiifm-cr
    },
    "task_id": "summarization",
    "fine_tuning": {
      "peft_parameters": {
        "type": "lora",
        "rank": 8
      },
      "verbalizer": "<Replace with the verbalizer used for fine-tuning>" // For example: Input: {input} Output:
    }
  }
}'
Deploying the LoRA or QLoRA adapter model asset
Deploy the LoRA or QLoRA adapters as an additional layer on top of the deployed custom foundation model.
The following code sample shows how to create a deployment for the LoRA adapter model:
curl -X POST "https://<HOST>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
  "asset": {
    "id": "<asset_id>" // WML LoRA adapter model asset
  },
  "online": {
    "parameters": {
      "serving_name": "lora_adapter_dep"
    }
  },
  "base_deployment_id": "<replace with your WML custom foundation model deployment ID>",
  "description": "Testing deployment using lora adapter model",
  "name": "lora_adapter_deployment",
  "space_id": "<space_id>" // Either "project_id" or "space_id" is allowed, but not both
}'
Running this code returns the deployment ID in the metadata.id field, as shown here:
"deployed_asset_type": "lora_adapter",
"description": "Lora Trained CFM Granite Deployment",
"name": "lora_adapter_deployment_medical",
"online": {
  "parameters": {
  }
}
Polling for deployment status
Poll for the deployment status by using the deployment ID and wait until the state changes from initializing to ready.
curl -X GET "https://<HOST>/ml/v4/deployments/<deployment_id>?version=2024-01-29&space_id=<space_id>" \
-H "Authorization: Bearer <token>"
After successful creation of the deployment, the polling status returns the deployed_asset_type as lora_adapter.
"deployed_asset_type": "lora_adapter"
Parent topic: Deploying fine-tuned custom foundation models