Inferencing models through the model gateway

You can send requests to models through the model gateway's OpenAI-compatible endpoints by using either the REST API or the OpenAI Python SDK. You can generate text, create chat-based responses, produce embeddings, and develop scalable solutions across multiple models tailored to your specific use cases.

Using the REST API

The model gateway supports the following endpoints:

- /v1/providers
- /v1/models
- /v1/chat/completions
- /v1/completions
- /v1/embeddings

These endpoints expose an OpenAI-compatible, provider-agnostic API that the model gateway uses to route model requests. The gateway supports all of the preceding endpoints; however, some model providers might not support a specific endpoint's service in their backend. Trying to use a configured model provider with an unsupported endpoint service results in an error response.

Listing providers and models

You can list both the providers and models that you configured.

To list all configured model providers, use the following command:

curl -sS "${GATEWAY_URL}/v1/providers" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

To list all models enabled for a specific provider, use the following command:

curl -sS "${GATEWAY_URL}/v1/providers/${PROVIDER_UUID}/models" \
	-H "Content-Type: application/json" \
	-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

To list all enabled models across all configured providers, use the following command:

curl -sS "${GATEWAY_URL}/v1/models" \
	-H "Content-Type: application/json" \
	-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

Chat completions

To use the /v1/chat/completions endpoint, see the following example:

curl ${GATEWAY_URL}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "model": "azure/gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "Please explain everything in a way a 5th grader could understand—simple language, clear steps, and easy examples."
      },
      {
        "role": "user",
        "content": "Can you explain what TLS is and how I can use it?"
      }
    ]
  }'
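
The response follows the OpenAI chat completion format, as the SDK example later in this section also shows, so the assistant's reply is in choices[0].message.content. The following Python sketch sends the same request and prints the reply, assuming the requests library and the same environment variables as above:

import os
import requests

gateway_url = os.environ["GATEWAY_URL"].rstrip("/")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['IBM_CLOUD_APIKEY']}",
}
payload = {
    "model": "azure/gpt-4o",
    "messages": [
        {"role": "user", "content": "Can you explain what TLS is and how I can use it?"}
    ],
}

response = requests.post(f"{gateway_url}/v1/chat/completions", headers=headers, json=payload)
response.raise_for_status()

# The assistant's reply is the message content of the first choice.
print(response.json()["choices"][0]["message"]["content"])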

Text completions/generation

To use the /v1/completions endpoint, see the following example:

curl ${GATEWAY_URL}/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "model": "ibm/llama-3-3-70b-instruct",
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "temperature": 0
  }'

Embeddings generation

To use the /v1/embeddings endpoint, see the following example:

curl ${GATEWAY_URL}/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-3-large",
    "encoding_format": "float"
  }'
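
Each item in the response's data array carries one embedding vector (a list of floats). The following Python sketch sends the same request and reads the vector back, assuming the requests library, the same environment variables as above, and the standard OpenAI embeddings response shape:

import os
import requests

gateway_url = os.environ["GATEWAY_URL"].rstrip("/")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['IBM_CLOUD_APIKEY']}",
}
payload = {
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-3-large",
    "encoding_format": "float",
}

response = requests.post(f"{gateway_url}/v1/embeddings", headers=headers, json=payload)
response.raise_for_status()

# Each entry in "data" holds one embedding vector; print its length and first few values.
embedding = response.json()["data"][0]["embedding"]
print(len(embedding), embedding[:5])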

Using the OpenAI SDK

The model gateway maintains compatibility with the OpenAI API. As a result, you can use the OpenAI SDKs to interact with the gateway service by passing $GATEWAY_URL instead of the OpenAI URL and your IBM Cloud API key instead of the OpenAI API key.

To use the OpenAI Python SDK to make a chat completions request to the model gateway, see the following example:

import os
from openai import OpenAI

# Note that since we exported GATEWAY_URL=https://us-south.ml.cloud.ibm.com/ml/gateway/, we must specify the "/v1".
# This is because the client will invoke OpenAI child paths like "/chat/completions" not "/v1/chat/completions".
gateway_url = os.getenv("GATEWAY_URL") + "v1"
ibm_cloud_api_key = os.getenv("IBM_CLOUD_APIKEY")

print("Using GATEWAY_URL:", gateway_url)
print("Using IBM_CLOUD_APIKEY:", ibm_cloud_api_key)

# Customize client to connect to the model gateway using the IBM Cloud API key.
client = OpenAI(
    base_url=gateway_url,
    api_key=ibm_cloud_api_key,
)

# Create a Chat Completions request to the model gateway.
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(completion.choices[0].message)
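
The same client can reach the other gateway endpoints that the OpenAI SDK exposes, such as model listing and embeddings. As with the REST examples, the provider behind the model you request must support the corresponding endpoint service; otherwise the gateway returns an error. A minimal sketch:

# List the enabled models across all configured providers (served by /v1/models).
for model in client.models.list():
    print(model.id)

# Generate an embedding through /v1/embeddings.
embedding_response = client.embeddings.create(
    model="text-embedding-3-large",
    input="The food was delicious and the waiter...",
)
print(len(embedding_response.data[0].embedding))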