Deploying base foundation models tuned with PEFT techniques
You can deploy base foundation models that are hosted by IBM and trained with PEFT techniques.
Before you deploy your fine-tuned model, you must train it so that its parameters are adjusted to the new dataset, which enables the model to adapt and specialize for the target domain. Fine-tuning the model with one of the PEFT techniques (LoRA or QLoRA) produces an adapter, which is used for deployment.
After fine-tuning the model, deploy the model in the production environment to get an endpoint for inferencing that can be used with your application. To deploy the LoRA or QLoRA adapter, start by deploying the base foundation model asset.
Use the deployed model to make predictions or generate text by providing input data to leverage the knowledge and adaptability gained during fine-tuning.
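As a rough illustration of the inference step, the sketch below assembles an HTTP request for a deployed endpoint using only the Python standard library. The URL path, version query parameter, and payload fields are illustrative assumptions, not the authoritative API contract; check the watsonx.ai REST API reference for the exact endpoint shape in your region and release.

```python
# Sketch of building an inference request for a deployed model endpoint.
# All endpoint paths, version dates, and payload fields here are assumptions
# for illustration; verify them against the watsonx.ai REST API reference.
import json
import urllib.request


def build_generation_request(base_url, deployment_id, api_token, prompt,
                             max_new_tokens=200, version="2024-05-01"):
    """Assemble (but do not send) a POST request for text generation."""
    url = (f"{base_url}/ml/v1/deployments/{deployment_id}"
           f"/text/generation?version={version}")
    body = json.dumps({
        "input": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_token}",  # bearer token from IAM
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


# Example with placeholder values; nothing is sent over the network here.
req = build_generation_request(
    "https://us-south.ml.cloud.ibm.com", "my-deployment-id", "MY_TOKEN",
    "Summarize the incident report:")
print(req.get_full_url())
```

Sending the request (for example with `urllib.request.urlopen`) would return the generated text, produced with the knowledge gained during fine-tuning.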
Tasks for deploying fine-tuned PEFT models
To deploy your fine-tuned model trained with PEFT techniques, you must follow a programmatic approach with the watsonx.ai REST API:
- Review requirements: Review supported architectures and hardware and software requirements to deploy fine-tuned models that are trained with a PEFT technique by using watsonx.ai.
- Optional: Create a customized hardware specification: If you have a GPU configuration that is different from the predefined GPU configurations available for deployment, create a customized hardware specification.
- Deploy the PEFT model: To deploy your PEFT model with watsonx.ai REST API, follow these steps:
a. Create repository asset for base foundation model: Create a repository asset for the base foundation model that you want to fine-tune with a PEFT technique.
b. Deploy base foundation model: Create an online deployment for your base foundation model for a supported architecture.
c. Optional: Create repository asset for LoRA or QLoRA model: If the auto_update_model option was not enabled during training, create a repository asset for the LoRA or QLoRA adapters. For more information, see Tuning a foundation model.
d. Deploy the repository asset for LoRA or QLoRA model: Deploy the trained model with the REST API.
- Inference the deployed PEFT model: Test your deployed PEFT model for online inferencing.
- Manage deployed PEFT model: Access, update, scale, or delete the deployment details.
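To make the deployment steps above concrete, the sketch below builds the request bodies for deploying the base foundation model and then the LoRA or QLoRA adapter that points back at it. The field names follow the general shape of a deployments API but are placeholder assumptions; consult the watsonx.ai REST API reference for the actual schema.

```python
# Sketch of the JSON payloads for the two deployment steps.
# Every field name and value here is an illustrative assumption;
# verify the real schema in the watsonx.ai REST API reference.

def base_model_deployment_payload(space_id, asset_id, serving_name,
                                  hardware_spec_name="gpu-spec"):
    """Online deployment request for the base foundation model (step b)."""
    return {
        "space_id": space_id,
        "name": serving_name,
        "asset": {"id": asset_id},  # repository asset from step a
        "online": {"parameters": {"serving_name": serving_name}},
        # Custom or predefined hardware specification (assumed field name):
        "hardware_request": {"size": hardware_spec_name},
    }


def adapter_deployment_payload(space_id, adapter_asset_id,
                               base_deployment_id, name):
    """Deployment request for the LoRA/QLoRA adapter asset (step d),
    referencing the already-deployed base model (assumed field name)."""
    return {
        "space_id": space_id,
        "name": name,
        "asset": {"id": adapter_asset_id},  # adapter asset from step c
        "online": {"parameters": {"base_deployment_id": base_deployment_id}},
    }
```

Each payload would be sent as the body of a POST to the deployments endpoint; the adapter deployment reuses the running base model rather than loading a second copy, which is why the base model is deployed first.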
Learn more
Parent topic: Deploying Parameter-Efficient Fine-Tuned models