Inferencing models through the model gateway

You can send requests to models through the model gateway's OpenAI-compatible endpoints by using either the REST API or the OpenAI Python SDK. You can generate text, create chat-based responses, produce embeddings, and develop scalable solutions across multiple models tailored to your specific use cases.

Using the REST API

The model gateway supports the following endpoints:

- /v1/providers
- /v1/models
- /v1/chat/completions
- /v1/completions
- /v1/embeddings

These endpoints expose an OpenAI-compatible, provider-agnostic API that the model gateway uses to route model requests. The gateway supports all of the preceding endpoints; however, some model providers might not support a specific endpoint's service in their backend. Trying to use a configured model provider with an unsupported endpoint service results in an error response.

Listing providers and models

You can list both the providers and models that you configured.

To list all configured model providers, use the following command:

curl -sS "${GATEWAY_URL}/v1/providers" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

To list all models enabled for a specific provider, use the following command:

curl -sS "${GATEWAY_URL}/v1/providers/${PROVIDER_UUID}/models" \
	-H "Content-Type: application/json" \
	-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

To list all enabled models across all configured providers, use the following command:

curl -sS "${GATEWAY_URL}/v1/models" \
	-H "Content-Type: application/json" \
	-H "Authorization: Bearer ${IBM_CLOUD_APIKEY}"

Chat completions

To use the /v1/chat/completions endpoint, see the following example:

curl ${GATEWAY_URL}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "model": "azure/gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "Please explain everything in a way a 5th grader could understand—simple language, clear steps, and easy examples."
      },
      {
        "role": "user",
        "content": "Can you explain what TLS is and how I can use it?"
      }
    ]
  }'
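
The response follows the OpenAI chat completion format, as the SDK example later in this section also shows, so the assistant's reply is in choices[0].message.content. The following Python sketch sends the same request and prints the reply, assuming the requests library and the same environment variables as above:

import os
import requests

gateway_url = os.environ["GATEWAY_URL"].rstrip("/")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['IBM_CLOUD_APIKEY']}",
}
payload = {
    "model": "azure/gpt-4o",
    "messages": [
        {"role": "user", "content": "Can you explain what TLS is and how I can use it?"}
    ],
}

response = requests.post(f"{gateway_url}/v1/chat/completions", headers=headers, json=payload)
response.raise_for_status()

# The assistant's reply is the message content of the first choice.
print(response.json()["choices"][0]["message"]["content"])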

Text completions/generation

To use the /v1/completions endpoint, see the following example:

curl ${GATEWAY_URL}/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "model": "ibm/llama-3-3-70b-instruct",
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "temperature": 0
  }'

Embeddings generation

To use the /v1/embeddings endpoint, see the following example:

curl ${GATEWAY_URL}/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $IBM_CLOUD_APIKEY" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-3-large",
    "encoding_format": "float"
  }'
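
Each item in the response's data array carries one embedding vector (a list of floats). The following Python sketch sends the same request and reads the vector back, assuming the requests library, the same environment variables as above, and the standard OpenAI embeddings response shape:

import os
import requests

gateway_url = os.environ["GATEWAY_URL"].rstrip("/")
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['IBM_CLOUD_APIKEY']}",
}
payload = {
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-3-large",
    "encoding_format": "float",
}

response = requests.post(f"{gateway_url}/v1/embeddings", headers=headers, json=payload)
response.raise_for_status()

# Each entry in "data" holds one embedding vector; print its length and first few values.
embedding = response.json()["data"][0]["embedding"]
print(len(embedding), embedding[:5])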

Using the OpenAI SDK

The model gateway maintains compatibility with the OpenAI API. As a result, you can use the OpenAI SDKs to interact with the gateway service by passing $GATEWAY_URL instead of the OpenAI URL and your IBM Cloud API key instead of the OpenAI API key.

To use the OpenAI Python SDK to make a chat completions request to the model gateway, see the following example:

import os
from openai import OpenAI

# Note that since we exported GATEWAY_URL=https://us-south.ml.cloud.ibm.com/ml/gateway/, we must specify the "/v1".
# This is because the client will invoke OpenAI child paths like "/chat/completions" not "/v1/chat/completions".
gateway_url = os.getenv("GATEWAY_URL") + "v1"
ibm_cloud_api_key = os.getenv("IBM_CLOUD_APIKEY")

print("Using GATEWAY_URL:", gateway_url)
print("Using IBM_CLOUD_APIKEY:", ibm_cloud_api_key)

# Customize client to connect to the model gateway using the IBM Cloud API key.
client = OpenAI(
    base_url=gateway_url,
    api_key=ibm_cloud_api_key,
)

# Create a Chat Completions request to the model gateway.
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(completion.choices[0].message)
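
The same client can reach the other gateway endpoints that the OpenAI SDK exposes, such as model listing and embeddings. As with the REST examples, the provider behind the model you request must support the corresponding endpoint service; otherwise the gateway returns an error. A minimal sketch:

# List the enabled models across all configured providers (served by /v1/models).
for model in client.models.list():
    print(model.id)

# Generate an embedding through /v1/embeddings.
embedding_response = client.embeddings.create(
    model="text-embedding-3-large",
    input="The food was delicious and the waiter...",
)
print(len(embedding_response.data[0].embedding))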