Inferencing models through the model gateway
Send requests to models by using OpenAI-compatible endpoints through the model gateway. You can use either the REST API or the OpenAI Python SDK to generate text, create chat-based responses, produce embeddings, and develop scalable solutions across multiple models tailored to your specific use cases.
Required permissions
: To inference gateway models, you must have one of the following permissions:
  - Administrator platform
  - Manage configurations
Required credentials
: You must generate credentials to authenticate with watsonx.ai APIs. For details, see Generating a bearer token.
Ways to work
The model gateway endpoints provide a unified, OpenAI-compatible API that routes model requests to any configured provider.
You can inference gateway foundation models by using these programming methods:
REST API
For details about the model gateway APIs, see the watsonx.ai API reference documentation.
The model gateway supports the following endpoints:
- Listing providers and models
- Chat completions (supports streaming)
- Text completions/generations (supports streaming)
- Embeddings generation
Listing providers and models
You can list both the providers and models that you configured.
To list all configured model providers, use the following command:
curl -sS "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/providers" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}"
To list all models enabled for a specific provider, use the following command:
curl -sS "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/providers/${PROVIDER_UUID}/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}"
To list all enabled models across all configured providers, use the following command:
curl -sS "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}"
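The three listing calls differ only in the path that is appended to the gateway base URL. As a rough Python equivalent of the curl commands, the following sketch builds the same GET requests with the standard library. The helper names `build_list_request` and `list_json` are hypothetical and not part of the gateway API, and nothing is assumed about the response other than that it is JSON.

```python
import json
import urllib.request


def build_list_request(base_url, token, provider_id=None, models=False):
    """Build (but do not send) a GET request for one of the listing endpoints.

    base_url is the gateway root ending in /ml/gateway/v1 (assumed from the
    curl examples); token is the bearer token.
    """
    if provider_id is not None:
        path = f"/providers/{provider_id}/models"  # models of one provider
    elif models:
        path = "/models"                           # models across all providers
    else:
        path = "/providers"                        # all configured providers
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="GET",
    )


def list_json(request):
    """Send the request and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `list_json(build_list_request(gateway_url, token, models=True))` would correspond to the last curl command, where `gateway_url` and `token` are values that you supply.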
Chat completions
To use the /v1/chat/completions endpoint, see the following example:
curl "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}" \
-d '{
"model": "azure/gpt-4o",
"messages": [
{
"role": "system",
"content": "Please explain everything in a way a 5th grader could understand—simple language, clear steps, and easy examples."
},
{
"role": "user",
"content": "Can you explain what TLS is and how I can use it?"
}
]
}'
For more details and examples, see Chat completions in the OpenAI API documentation.
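The chat completions endpoint supports streaming. In the OpenAI API, adding `"stream": true` to the request body returns server-sent events: each `data:` line carries a JSON chunk with text in `choices[0].delta.content`, and the stream ends with `data: [DONE]`. Assuming the gateway passes that format through unchanged, which this page implies but does not state, a minimal sketch for reassembling the streamed text could look like this. The function name `iter_stream_text` is hypothetical, and the sample events are hand-written rather than captured gateway output.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines.

    `lines` is any iterable of decoded strings, for example the lines of a
    streaming HTTP response body.
    """
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel in the OpenAI format
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events for illustration only.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "TLS keeps "}}]}',
    'data: {"choices": [{"delta": {"content": "messages private."}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```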
Text completions/generation
To use the /v1/completions endpoint, see the following example:
curl "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}" \
-d '{
"model": "ibm/llama-3-3-70b-instruct",
"prompt": "Say this is a test",
"max_tokens": 7,
"temperature": 0
}'
For more details and examples, see Text generation in the OpenAI API documentation.
Embeddings generation
To use the /v1/embeddings endpoint, see the following example:
curl "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/embeddings" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${TOKEN}" \
-d '{
"input": "The food was delicious and the waiter...",
"model": "text-embedding-3-large",
"encoding_format": "float"
}'
For more details and examples, see How to get embeddings in the OpenAI API documentation.
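In the OpenAI API, an embeddings response holds one entry per input under `data`, each with an `index` and an `embedding` list of floats. Assuming the gateway returns that same shape, the following sketch extracts the vectors in input order and compares two of them with cosine similarity, a common way to use embeddings. The helper names and the tiny sample response are illustrative assumptions, not gateway output.

```python
import math


def embedding_vectors(response):
    """Return the vectors from an OpenAI-style embeddings response, in input order."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-written response with 2-dimensional vectors; real embeddings are much longer.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]},
    ],
}
first, second = embedding_vectors(sample_response)
print(cosine_similarity(first, second))
```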
Python SDK
To work with the foundation models in the model gateway, you can use the Gateway class in the watsonx.ai Python library.
To get started, see the following sample notebooks:
- To build LLM apps that route requests to providers by using the LangGraph framework and model gateway, see LangGraph Agent Template.
The model gateway maintains compatibility with the OpenAI API. Therefore, the OpenAI SDKs can be used to inference gateway models by passing the bearer token instead of the OpenAI API key.
To use the OpenAI Python SDK to make a chat completions request through the model gateway, see the following example:
import os
from openai import OpenAI

gateway_url = "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1"
# The bearer token is passed in place of an OpenAI API key.
bearer_token = os.getenv("TOKEN")

client = OpenAI(
    base_url=gateway_url,
    api_key=bearer_token,
)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(completion.choices[0].message)
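Because chat completions support streaming, the same SDK call can also be made with `stream=True`, in which case the OpenAI Python SDK returns an iterator of chunks instead of a single completion. The sketch below wraps that pattern in a hypothetical helper, `stream_chat`, which takes a client configured as in the preceding example. Whether a given provider behind the gateway streams is an assumption to verify for your configuration.

```python
def stream_chat(client, model, messages):
    """Yield text fragments from a streamed chat completion.

    `client` is an openai.OpenAI instance whose base_url points at the
    model gateway and whose api_key is the bearer token.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,  # ask for incremental chunks instead of one response
    )
    for chunk in stream:
        if not chunk.choices:  # some chunks carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:
            yield text


# Usage, assuming `client` was created as shown earlier:
# for piece in stream_chat(client, "openai/gpt-4o",
#                          [{"role": "user", "content": "Hello!"}]):
#     print(piece, end="", flush=True)
```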