Deploying PEFT models with REST API
To deploy a PEFT model with the watsonx.ai REST API, first create and deploy the base foundation model asset, and then create and deploy the LoRA adapter model asset. After both deployments are ready, you can use the fine-tuned model for online inferencing and real-time predictions.
Before you begin
- Review requirements for deploying PEFT models, including supported models, hardware and software requirements, and deployment types. For more information, see Requirements for deploying PEFT models with REST API.
- You must add the IBM shipped foundation model to watsonx.ai. For more information, see Adding foundation models to IBM watsonx.ai in the IBM Software Hub documentation.
- You must authenticate by generating and entering your API key.
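For example, on IBM watsonx.ai as a service you can exchange an IBM Cloud API key for a bearer token through the IAM identity service. This is a minimal sketch; the way that you generate the token depends on your environment, and the API key value is a placeholder:
# Exchange an IBM Cloud API key for an IAM bearer token.
# The access_token field of the response is the <token> value that is used in the examples that follow.
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
--data-urlencode "apikey=<your_api_key>"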
Creating the base foundation model asset
Use the REST API to create a Watson Machine Learning model asset that provides the details of the base foundation model.
The following code samples show how to create the base foundation model asset for a supported foundation model. The first example targets a cluster host (<HOST>); the second targets the IBM Cloud regional endpoint ({region}.ml.cloud.ibm.com):
curl -X POST "https://<HOST>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
"name":"base foundation model asset",
"space_id":"<space_id>",
"foundation_model":{
"model_id":"ibm/granite-3-1-8b-base"
},
"type":"base_foundation_model_1.0",
"software_spec":{
"name":"watsonx-cfm-caikit-1.1"
}
}'
curl -X POST "{region}.ml.cloud.ibm.com/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
"name":"base foundation model asset",
"space_id":"<space_id>",
"foundation_model":{
"model_id":"ibm/granite-3-1-8b-base"
},
"type":"base_foundation_model_1.0",
"software_spec":{
"name":"watsonx-cfm-caikit-1.1"
}
}'
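The response returns the ID of the new model asset in the metadata.id field; you use this value as <asset_id> when you deploy the asset. The following minimal sketch assumes that the JSON payload shown above is saved to a file named base_model_asset.json and that the jq utility is installed:
# Create the base foundation model asset and capture its ID from the response
ASSET_ID=$(curl -s -X POST "https://<HOST>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data @base_model_asset.json | jq -r '.metadata.id')
echo "Base foundation model asset ID: $ASSET_ID"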
Deploying the base foundation model asset
When you create an online deployment for your base foundation model, you must set the enable_lora parameter to true in the JSON payload so that you can deploy the LoRA or QLoRA adapters by using the base foundation model.
The following code sample shows how to create an online deployment for the base foundation model asset with the REST API. The inline // comments are annotations only; remove them before you run the command:
curl -X POST "https://<HOST>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
"asset":{
"id": "<asset_id>" // WML base foundation model asset
},
"online":{
"parameters":{
"serving_name": "basefmservingname",
"foundation_model": {
"max_batch_weight": 10000,
"max_sequence_length": 8192,
"enable_lora": true,
"max_gpu_loras": 8,
"max_cpu_loras": 16,
"max_lora_rank": 32
}
}
},
"hardware_spec": { // Only one, of "id" or "name" must be set.
"id": "<hardware_spec_id>",
"num_nodes": 1
},
"description": "Testing deployment using base foundation model",
"name":"base_fm_deployment",
"space_id": "<space_id>" // Either "project_id" (or) "space_id". Only "one" is allowed
}'
Running this code returns the deployment ID in the metadata.id field.
Polling for deployment status
Poll for the deployment status by using the deployment ID and wait until the state changes from initializing to ready.
curl -X GET "https://<HOST>/ml/v4/deployments/<deployment_id>?version=2024-01-29&space_id=<space_id>" \
-H "Authorization: Bearer <token>"
After successful creation of the deployment, the polling status returns the deployed_asset_type as base_foundation_model:
"deployed_asset_type": "base_foundation_model"
Creating the LoRA or QLoRA adapter model asset
If you did not enable the auto_update_model option when you trained the model, you must create a repository asset for the LoRA or QLoRA adapters. If the auto_update_model option was enabled during training, the LoRA adapter model asset is already created in the Watson Machine Learning repository, and you can proceed directly to creating a deployment for the LoRA adapter model asset.
After the training process for fine-tuning the LoRA or QLoRA adapter completes, the adapter content is stored in a persistent volume claim (PVC), in a directory that is named after the model. You must provide the PVC name as the foundation_model.model_id value in the model creation input payload.
The following code sample shows how to create the LoRA adapter model asset:
curl -X POST "https://<HOST>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
"name":"lora adapter model asset",
"space_id":"<space_id>",
"foundation_model":{
"model_id":"finetunedadapter-385316ea-1d25-4274-9550-e387a1355241"
},
"type":"lora_adapter_1.0",
"software_spec":{
"name":"watsonx-cfm-caikit-1.1"
},
"training":{
"base_model":{
"model_id":"ibm/granite-3-1-8b-base"
},
"task_id":"summarization",
"fine_tuning": {
"peft_parameters": {
"type": "lora",
"rank": 8
}
}
}
}'
Deploying the LoRA or QLoRA adapter model asset
Deploy the LoRA or QLoRA adapter as an additional layer on top of the deployed base foundation model.
The following code sample shows how to create a deployment for the LoRA adapter model:
curl -X POST "https://<HOST>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer <token>" \
-H "content-type: application/json" \
--data '{
"asset":{
"id":<asset_id> // WML LoRA adapter model asset
},
"online":{
"parameters":{
"serving_name":"lora_adapter_dep"
}
},
"base_deployment_id": "<replace with your WML base foundation model deployment ID>",
"description": "Testing deployment using lora adapter model",
"name":"lora_adapter_deployment",
"space_id":<space_id> // Either "project_id" (or) "space_id". Only "one" is allowed
}'
Running this code returns the deployment ID in the metadata.id field.
Polling for deployment status
Poll for the deployment status by using the deployment ID and wait until the state changes from initializing to ready.
curl -X GET "https://<HOST>/ml/v4/deployments/<deployment_id>?version=2024-01-29&space_id=<space_id>" \
-H "Authorization: Bearer <token>"
After successful creation of the deployment, the polling status returns the deployed_asset_type as lora_adapter:
"deployed_asset_type": "lora_adapter"
Parent topic: Deploying Parameter-Efficient Fine-Tuned (PEFT) models