Deploying prompt tuned models

Prompt tuning is a technique that involves optimizing a set of input prompts to improve the performance of a language model on a specific task or dataset. The primary goal of prompt tuning is to adapt a pre-trained LLM to a specific task or domain without modifying the model's weights. You can deploy a prompt tuned model directly from the Tuning Studio.

Before you begin

  1. Create a prompt tuned model with the Tuning Studio. For more information, see Tuning Studio.

Deploying prompt tuned models from the Tuning Studio

When you use the Tuning Studio to create your prompt tuning experiment, you can deploy the resulting prompt tuned model directly.

To deploy your prompt tuned model from the Tuning Studio, follow these steps:

  1. From your project, open the tuning experiment for your prompt tuned model.
  2. From the list of tuned models, click New deployment for the prompt tuned model that you want to deploy.
  3. For the Deployment container, choose one of the following options:
    • This project: Deploys the tuned model and adds it to your project where you can test the tuned model. You can promote the tuned model deployment to a deployment space at any time. Choose this option if you want to do more testing of the tuned model before the model is used in production.
    • Deployment space: Promotes the tuned model to a deployment space and deploys the tuned model. A deployment space is separate from the project where you create the asset. This separation enables you to promote assets from multiple projects to a space, and deploy assets to more than one space. Choose this option when the tuned model is ready to be promoted for production use.
  4. Optional: Enable the option to view your deployment in your project or deployment space.
  5. Click Create.

Inferencing the deployed model

You can test the prompt tuned model from one of the following:

  • Project: Useful when you want to test your model during the development and testing phases before moving it into production.
  • Deployment space: Useful when you want to test your model programmatically. From the API Reference tab, you can find information about the available endpoints and code examples. You can also submit input as text and choose to receive the output all at once or as a stream while it is generated. However, you cannot change the prompt parameters for the input text.
  • Prompt Lab: Useful when you want to use a tool with an intuitive user interface for prompting foundation models. You can customize the prompt parameters for each input. You can also save the prompt as a notebook so you can interact with it programmatically.
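The programmatic path through a deployment space can be sketched in code. The following is a minimal sketch, assuming a REST text-generation endpoint with bearer-token authentication; the payload fields (`input`, `parameters`, `max_new_tokens`) and the request shape are illustrative assumptions, so copy the exact endpoint URL and code snippet from the API Reference tab of your deployment.

```python
# Minimal sketch of calling a deployed prompt tuned model over REST.
# The payload fields below are illustrative assumptions -- take the
# real endpoint URL and body schema from the API Reference tab.

def build_generation_request(endpoint_url, api_token, prompt, max_new_tokens=100):
    """Assemble the URL, headers, and JSON body for a text-generation call."""
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    body = {
        "input": prompt,              # the prompt text to send
        "parameters": {               # generation parameters (illustrative)
            "max_new_tokens": max_new_tokens,
        },
    }
    return endpoint_url, headers, body

# Sending the request would then look like (requires the `requests` package):
#   import requests
#   url, headers, body = build_generation_request(url, token, "Classify: ...")
#   response = requests.post(url, headers=headers, json=body, timeout=30)
#   print(response.json())
```

Because the prompt parameters for a deployed tuned model are fixed at inference time, only the input text changes between calls.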

Follow these steps to inference your prompt tuned model deployment:

  1. From the Deployments tab of your project or deployment space, click the deployment name.
  2. Click the Test tab to input prompt text and get a response from the deployed asset.
  3. Enter test data in one of the following formats, depending on the type of asset that you deployed:
    • Text: Enter text input data to generate a block of text as output.
    • Stream: Enter text input data to generate a stream of text as output.
    • JSON: Enter JSON input data to generate output in JSON format.
  4. Click Generate to get results based on your prompt.

Alternatively, to test your prompt tuned model deployment in the Prompt Lab, follow these steps:

  1. From the Deployments tab of your project or deployment space, click the deployment name.
  2. Click Open in Prompt Lab. If you are working in a deployment space, you are prompted to choose the project where you want to work with the model. Prompt Lab opens and the tuned model that you deployed is selected in the Model field.
  3. In the Try section, add a prompt to the Input field that follows the prompt pattern that your tuned model is trained to recognize, and then click Generate.

Retrieving the endpoint

Follow these steps to retrieve the endpoint URL for your prompt tuned model deployment. You need this URL to access the deployment from your applications:

  1. From the Deployments tab of your project or deployment space, click the deployment name.
  2. In the API Reference tab, find the private and public endpoint links and code snippets that you can use to include the endpoint details in an application.
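After you have the endpoint URL, a streamed response can be consumed incrementally. The sketch below assumes the stream endpoint emits server-sent-event lines of the form `data: {...}` carrying JSON chunks with a `generated_text` field; both the event framing and the field name are assumptions, so verify them against the code snippets in the API Reference tab.

```python
import json

# Minimal sketch of parsing a streamed generation response, assuming
# server-sent-event framing (`data: {...}` lines). The "generated_text"
# field name is an assumption -- check the API Reference tab snippets.

def parse_stream_line(line):
    """Extract the generated text fragment from one SSE data line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # ignore comments, blank keep-alive lines, etc.
    payload = line[len("data:"):].strip()
    try:
        chunk = json.loads(payload)
    except json.JSONDecodeError:
        return None  # skip malformed or non-JSON data lines
    return chunk.get("generated_text")

# Consuming a live stream would then look like (requires `requests`):
#   with requests.post(stream_url, headers=headers, json=body, stream=True) as r:
#       for raw in r.iter_lines(decode_unicode=True):
#           text = parse_stream_line(raw or "")
#           if text:
#               print(text, end="", flush=True)
```

Parsing each line as it arrives lets your application display output while it is generated, matching the Stream option described earlier.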

Parent topic: Deploying tuned models