Deploying foundation models on-demand with Python client library
Deploy foundation models on-demand programmatically with the watsonx.ai Python client library. Deploying a foundation model on-demand makes it available on dedicated hardware for the exclusive use of your organization. IBM provides a set of curated models that are available for you to deploy on-demand.
Before you begin
- You must set up or enable your task credentials to deploy foundation models on-demand. For more information, see Managing task credentials.
- Review supported foundation model architectures, deployment types, and other considerations for deploying a foundation model on-demand. For more information, see Deploying foundation models on-demand.
- Review the model card to verify what modalities (text, image, audio, or video) the model supports.
Deploying foundation models on-demand with Python client library
To deploy a foundation model on-demand by using the Python client library, create a model asset in the repository by creating the metadata for your asset and storing the model. Then, retrieve the asset ID and create an online deployment for the asset.
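The code samples in this topic assume that an authenticated API client named client is already available. The following minimal sketch, with placeholder values for the endpoint URL, API key, and deployment space ID, shows one way to create it:
from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # replace with the endpoint for your region
    api_key="<your IBM Cloud API key>",
)

# Associate the client with the deployment space where the model will be deployed
client = APIClient(credentials, space_id="<your deployment space ID>")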
Creating a model asset in the watsonx.ai repository
You must create an asset in the watsonx.ai service repository for the foundation model that you want to deploy on-demand. First, create the metadata for your asset, and then store the model in the repository.
The following code snippet shows how to create metadata for your foundation model asset in the watsonx.ai repository:
metadata = {
    client.repository.ModelMetaNames.NAME: "curated FM asset",
    client.repository.ModelMetaNames.TYPE: client.repository.ModelAssetTypes.CURATED_FOUNDATION_MODEL_1_0,
}
After creating the metadata for your foundation model asset, store the model by using the client.repository.store_model() function:
stored_model_details = client.repository.store_model(model='ibm/granite-13b-chat-v2-curated', meta_props=metadata)
Retrieving the identifier for your asset
When the foundation model asset is stored in the watsonx.ai repository, you can retrieve the asset ID for your model. The asset ID is required to create the deployment for your foundation model.
You can list all stored curated foundation models and filter them by framework type:
client.repository.list(framework_filter='curated_foundation_model_1.0')
The following code snippet shows how to retrieve the ID for your foundation model asset:
stored_model_asset_id = client.repository.get_model_id(stored_model_details)
Deploying foundation model on-demand
To create a new deployment for a foundation model that can be deployed on-demand with the Python client library, define a meta_props dictionary that contains the metadata for your deployment.
You can optionally overwrite the model parameters when you create the metadata for your deployment. To overwrite the model parameters, pass a dictionary with the new parameter values in the FOUNDATION_MODEL field, as shown in the sketch after the following example.
The following sample shows how to create an online deployment for your foundation model by using the watsonx.ai Python client library:
meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "curated_fm_deployment",
    client.deployments.ConfigurationMetaNames.DESCRIPTION: "Testing deployment using curated foundation model",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SERVING_NAME: "test_curated_fm_01"
}
deployment_details = client.deployments.create(stored_model_asset_id, meta_props)
deployment_id = client.deployments.get_uid(deployment_details)
print("The deployment id:", deployment_id)
Testing deployed foundation model on-demand with Python client library
You can test a foundation model that is deployed on-demand for online inferencing from the Python client library, as shown in the following code samples:
Simple conversation:
from ibm_watsonx_ai.metanames import GenChatParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models import ModelInference
generate_params = {
    GenParams.MAX_COMPLETION_TOKENS: 100,
    GenParams.TEMPERATURE: 0.1
}

model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
]
generated_response = model.chat(messages=messages)
# Print full response
print(generated_response)
# Print only content
print(generated_response["choices"][0]["message"]["content"])
Inferencing with audio input:
import base64
from ibm_watsonx_ai.metanames import GenChatParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models import ModelInference
generate_params = {
    GenParams.MAX_COMPLETION_TOKENS: 100,
    GenParams.TEMPERATURE: 0.1
}

model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)

# Path to your MP3 file
file_path = "sample_audio_file.mp3"

# Read file as binary and encode to Base64
with open(file_path, "rb") as mp3_file:
    encoded_bytes = base64.b64encode(mp3_file.read())
    encoded_string = encoded_bytes.decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Please extract the text from the mp3 file"},
            {
                "type": "input_audio",
                "input_audio": {
                    "data": encoded_string,
                    "format": "mp3"
                }
            }
        ],
    }
]
generated_response = model.chat(messages=messages)
# Print full response
print(generated_response)
# Print only content
print(generated_response["choices"][0]["message"]["content"])
Inferencing with video input:
import base64
from ibm_watsonx_ai.metanames import GenChatParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models import ModelInference
generate_params = {
    GenParams.MAX_COMPLETION_TOKENS: 100,
    GenParams.TEMPERATURE: 0.1
}

model = ModelInference(
    deployment_id=deployment_id,
    params=generate_params,
    api_client=client
)

# Path to your MP4 file
file_path = "sample_video_file.mp4"

# Read file as binary and encode to Base64
with open(file_path, "rb") as mp4_file:
    encoded_bytes = base64.b64encode(mp4_file.read())
    encoded_string = encoded_bytes.decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Please extract the text from the mp4 file"},
            {
                "type": "video_url",
                "video_url": {
                    "url": "data:video/mp4;base64," + encoded_string,
                }
            }
        ],
    }
]
generated_response = model.chat(messages=messages)
# Print full response
print(generated_response)
# Print only content
print(generated_response["choices"][0]["message"]["content"])
For more details, see the ibm-watsonx-ai SDK documentation.
Managing deployed on-demand foundation models with Python client library
Update, scale, or delete foundation models that are deployed on-demand with the Python client library.
Retrieving deployment details
To retrieve the details of a deployment, use the get_details() function of the Python client library.
The following code sample shows how to use the Python client library to retrieve the details of a foundation model that is deployed on-demand, where model is the ModelInference object that was created earlier:
deployment_details = model.get_details()
Alternatively, you can retrieve the details of a specific deployment by passing the deployment_id, as shown in the following code sample.
deployment_details = client.deployments.get_details(deployment_id)
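Deployment details are returned as a nested dictionary. As a hedged illustration, assuming the usual entity and status structure of watsonx.ai deployment responses, you can read the deployment state as follows; verify the structure against the response that you receive:
# Assumed response structure; adjust if your service version differs
state = deployment_details["entity"]["status"]["state"]
print("Deployment state:", state)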
Updating the deployment
You can update the deployment for your foundation model that is deployed on-demand.
The following code sample shows how to update the deployment details from the Python client library:
metadata = {client.deployments.ConfigurationMetaNames.NAME: "Deployment on Demand v2"}
updated_deployment_details = client.deployments.update(deployment_id, changes=metadata)
Scaling the deployment
You can deploy only one instance of a foundation model that is deployed on-demand in a deployment space. To handle increased demand, you can scale the deployment by creating additional copies (replicas).
The following code sample shows how to scale the number of replicas for your deployment by updating the number of nodes in the hardware specification from the Python client library:
metadata = {client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {"num_nodes": 2}}
deployment_details = client.deployments.update(deployment_id, changes=metadata)
Deleting the deployment
You can delete your deployed foundation model when you no longer need it, to stop incurring billing charges.
The following code sample shows how to delete a foundation model deployed on-demand with the Python client library:
client.deployments.delete(deployment_id)
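To confirm that the deployment was removed, you can list the deployments that remain in your space:
client.deployments.list()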