Model gateway (preview)

You can securely access and interact with foundation models from multiple model providers through the model gateway. The model gateway provides an OpenAI-compatible API that routes requests to foundation models from various model providers.

Use the model gateway to efficiently switch between multiple model providers by routing and formatting requests through a unified interface. You can build and deploy AI agents, RAG patterns, and more by using the gateway models.

Note: The model gateway feature is in preview and available in the Toronto region only.

The model gateway is certified to access models from the following foundation model providers:

  • IBM watsonx.ai

  • OpenAI

  • Azure OpenAI

  • Anthropic

  • AWS Bedrock

  • Cerebras

  • NVIDIA NIM

  • Google Gemini

Capabilities

You can use model gateway with the following capabilities:

Secure management of access providers
Integrate with IBM Cloud Secrets Manager to securely store and manage API keys and other sensitive configuration data. Secrets Manager securely manages access credentials between the model providers that you select and watsonx.ai. You can integrate with IBM Cloud Identity and Access Management (IAM) to enforce access control over who can retrieve and manage these secrets.
Access to multiple model providers
Connect to various model providers through a single, unified interface. With an OpenAI-compatible API endpoint, you can interact with different models by using a consistent request format. Built-in load balancing distributes requests across available model to optimize performance and prevent overload. Accessing multiple providers gives flexibility to integrate models based on your use case and accelerates testing and deployment without requiring changes to existing codebase.
Custom model endpoints
Deploy and manage a set of foundation models curated by you by configuring endpoints through the model gateway. Custom endpoints provide secure and scalable integration of custom models into your applications.
Load balancing
Use a load balancer to ensure resiliency across multiple model backends, distribute traffic, and call a single stable alias while scaling backend capacity.
Rate limits
Set request-based and token-based limits to prevent resource-intensive workloads from consuming shared capacity and maintain fair allocation across providers.
Access policies
Use access policies to control access to models and load balancers (from the UI), and to provider endpoint, tenant enpoint, and policy endpoint (from the API).
Note:

Models added through the model gateway are not enabled for use in the Prompt Lab or Tuning Studio.

To open the model gateway, open the navigation menu, click on Administration, and then select Model Gateway.

Ways to work

You can use various methods to set up the model gateway. For details, see Setting up the model gateway.

To inference foundation models through the gateway, you can use the following methods:

  • watsonx.ai REST API
  • OpenAI Python SDK.

For details, see Inferencing models through the model gateway.

Workflow

The following diagram illustrates the workflow to set up the model gateway and inference models through the gateway:

Diagram that shows IBM watsonx.ai model gateway workflow

Here's a high-level overview of the steps that are required to set up and use the model gateway:

  1. Create a Secrets Manager service instance

  2. Allow the watsonx.ai Runtime service instance to access the Secrets Manager

  3. Configure foundation model providers through the model gateway. Add credentials and store them in Secrets Manager.

  4. Add models for each configured model provider

  5. Enable load balancing, create access policies, and set rate limits for models

  6. Inference foundation models that are accessible through the gateway

Learn more