Setting up the model gateway programmatically

Store configuration details for your model providers in IBM Cloud Secrets Manager, and then add models by using an OpenAI-compatible API.

Before you begin

Generate credentials to authenticate with watsonx.ai APIs. For details, see Generating a bearer token.
Get credentials for each supported model provider that you plan to use.
Get the list of available model providers and their UUIDs. For details, see Listing providers and models.
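The first and last prerequisites can be scripted. This is a minimal sketch, not the documented procedure. It assumes the IBM Software Hub platform authorization endpoint `/icp4d-api/v1/authorize`, which exchanges a username and API key for a bearer token. It also assumes that a GET request on `/ml/gateway/v1/providers` returns the provider list. Confirm both in Generating a bearer token and Listing providers and models. `CPD_URL`, `auth_payload`, `get_token`, and `list_providers` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; replace the placeholders in CPD_URL for your cluster.
CPD_URL="${CPD_URL:-https://cpd-<namespace-name>.apps.<OCP-domain>}"

# auth_payload USERNAME API_KEY: print the JSON body for the token request.
auth_payload() {
  jq -n --arg username "$1" --arg api_key "$2" \
    '{username: $username, api_key: $api_key}'
}

# get_token USERNAME API_KEY: request a bearer token and print the token field.
# Assumes the platform authorization endpoint described in the lead-in.
get_token() {
  curl -sS --fail-with-body -X POST "${CPD_URL}/icp4d-api/v1/authorize" \
    -H "Content-Type: application/json" \
    -d "$(auth_payload "$1" "$2")" | jq -r '.token'
}

# list_providers: ASSUMED GET on the providers collection; prints raw JSON.
list_providers() {
  curl -sS --fail-with-body "${CPD_URL}/ml/gateway/v1/providers" \
    -H "Authorization: Bearer ${TOKEN}"
}
```

For example, run `export TOKEN="$(get_token <username> <api-key>)"`, then run `list_providers | jq .` to inspect the provider list and UUIDs.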

Procedure

Create a secret in the IBM Software Hub vault for each model provider that you want to configure. If you use an external vault, specify the provider's arguments when you create the secret. For more information, see Managing secrets and vaults.

The following table shows the configuration requirements for each supported model provider. For more details, see the respective provider’s inference documentation.

Table 1. Arguments for setting up secrets for the supported model providers
Provider name	Required arguments	Optional arguments	Notes
OpenAI	• `apikey`	• `base_url`	–
IBM watsonx.ai	• `base_url`	• `apikey` • `project_id` • `space_id` • `auth_url` • `api_version`	Supports per-request `project_id` and `space_id` through the `X-IBM-Project-Id` and `X-IBM-Space-Id` headers
Azure OpenAI	• `apikey` • `resource_name` • `api_version`	• `subscription_id` • `resource_group_name` • `account_name`	`api_version` defaults to `2024-10-21`; `subscription_id`, `resource_group_name`, and `account_name` are needed to list models
Anthropic	• `apikey`	–	–
AWS Bedrock	• `access_key_id` • `secret_access_key` • `region`	• `session_token` • `base_url`	–
Cerebras	• `apikey`	–	–
Cohere	• `apikey`	–	–
Groq	• `apikey`	–	–
Mistral	• `apikey`	–	–
NVIDIA NIM	• `apikey`	–	–
Ollama	• `host`	• `keep_alive` • `clean_on_close`	Self-hosted or local deployment; no authentication; uses Ollama's own API format, which is not OpenAI-compatible; `keep_alive` defaults to 5 minutes
xAI	• `apikey`	–	–
Google Gemini	• `apikey`	–	–
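Table 1 can also serve as a local checklist before you create a secret. The following sketch reports any required argument that is missing for a provider, assuming that you pass the argument names that you are about to store. The array keys (`openai`, `bedrock`, and so on) and the `missing_args` function are hypothetical labels for illustration. They are not the provider identifiers that the gateway API expects. Optional arguments are not checked.

```shell
#!/usr/bin/env bash
# Required secret arguments per provider, transcribed from Table 1.
# The array keys are hypothetical labels, not the gateway's provider IDs.
declare -A REQUIRED_ARGS=(
  [openai]="apikey"
  [watsonxai]="base_url"
  [azure-openai]="apikey resource_name api_version"
  [anthropic]="apikey"
  [bedrock]="access_key_id secret_access_key region"
  [cerebras]="apikey"
  [cohere]="apikey"
  [groq]="apikey"
  [mistral]="apikey"
  [nvidia-nim]="apikey"
  [ollama]="host"
  [xai]="apikey"
  [gemini]="apikey"
)

# missing_args PROVIDER KEY...
# Prints, one per line, each required argument of PROVIDER that is not
# among KEY... Returns 2 for a provider that is not in the table.
missing_args() {
  local provider="$1"; shift
  if [[ -z "${REQUIRED_ARGS[$provider]+set}" ]]; then
    echo "unknown provider: ${provider}" >&2
    return 2
  fi
  local req have found
  # Word splitting on the space-separated list is intentional.
  for req in ${REQUIRED_ARGS[$provider]}; do
    found=no
    for have in "$@"; do
      if [[ "$have" == "$req" ]]; then found=yes; fi
    done
    if [[ "$found" == no ]]; then echo "$req"; fi
  done
}
```

For example, `missing_args bedrock access_key_id secret_access_key` prints `region`.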

Set the Vault_URN environment variable to the URN of your secret. To get the value, go to the Administration > Configurations and settings > Vaults and secrets page and click the Copy icon next to your secret name. For example:

export Vault_URN="<user-id>:<secret-name>"
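A quick local check can catch a value that still contains the example placeholders before you send any request. This sketch assumes only the `<user-id>:<secret-name>` shape shown above. `check_vault_urn` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_vault_urn URN
# Succeeds if URN has the <user-id>:<secret-name> shape (text before the first
# colon, non-empty text after it) and no leftover <placeholder> brackets.
check_vault_urn() {
  local urn="$1"
  if [[ "$urn" == *"<"* || "$urn" == *">"* ]]; then
    echo "Vault_URN still contains a placeholder: ${urn}" >&2
    return 1
  fi
  if [[ ! "$urn" =~ ^[^:]+:.+$ ]]; then
    echo "Vault_URN is not in <user-id>:<secret-name> form: ${urn}" >&2
    return 1
  fi
}
```

For example, run `check_vault_urn "${Vault_URN}" && echo ok` after the export.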

Run the following command to send a REST API request that configures the model provider by referencing the secret:

  curl -sS https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/providers/<provider> \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(jq -n \
      --arg resource "${Vault_URN}" \
      --arg name "<custom-name-for-provider>" \
      '{name: $name, data_reference: {resource: $resource}}')"
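You can wrap the request above so that you can inspect the body before you send it and so that HTTP errors are reported. This sketch assumes curl 7.76 or later, which provides `--fail-with-body`. `GATEWAY_URL`, `vault_provider_payload`, and `create_provider` are hypothetical names. `<provider>` has the same meaning as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical base URL; replace the placeholders for your cluster.
GATEWAY_URL="${GATEWAY_URL:-https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1}"

# vault_provider_payload NAME URN: build the same body as the request above.
vault_provider_payload() {
  jq -n --arg name "$1" --arg resource "$2" \
    '{name: $name, data_reference: {resource: $resource}}'
}

# create_provider PROVIDER NAME URN: POST the body and print the response.
# --fail-with-body makes curl exit non-zero on HTTP errors but keep the body.
create_provider() {
  curl -sS --fail-with-body -X POST "${GATEWAY_URL}/providers/$1" \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(vault_provider_payload "$2" "$3")"
}
```

For example, `vault_provider_payload my-openai "${Vault_URN}"` prints the body without contacting the cluster, and `create_provider <provider> my-openai "${Vault_URN}" | jq .` sends it. The response shape is not described in this topic. Find the provider's UUID for the next step in the response or by following Listing providers and models.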

If the internal vault is enabled, you can also pass the provider credentials directly in the request body instead of referencing a secret:

  curl --request POST \
    --url https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/providers/<provider> \
    -H "Authorization: Bearer ${TOKEN}" \
    -H 'Content-Type: application/json' \
    --data '{
      "data": {
        "apikey": "<model-provider-api-key>"
      },
      "name": "<custom-name-for-model-provider>"
    }'
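Some providers in Table 1, such as AWS Bedrock, have several required arguments, so writing the `data` object by hand is error-prone. The following sketch builds the same kind of body as the command above from `KEY=VALUE` pairs. It assumes jq 1.6 or later, which provides `$ARGS.named`. `GATEWAY_URL`, `direct_provider_payload`, and `create_provider_direct` are hypothetical names. The Authorization header is double-quoted so that `${TOKEN}` expands.

```shell
#!/usr/bin/env bash
# Hypothetical base URL; replace the placeholders for your cluster.
GATEWAY_URL="${GATEWAY_URL:-https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1}"

# direct_provider_payload NAME KEY=VALUE...
# Prints {"data": {KEY: VALUE, ...}, "name": NAME}. Each pair is split at its
# first "=", so values may themselves contain "=".
direct_provider_payload() {
  local name="$1"; shift
  local args=() pair
  for pair in "$@"; do
    args+=(--arg "${pair%%=*}" "${pair#*=}")
  done
  # $ARGS.named holds every --arg; drop the provider name from the data object.
  jq -n --arg _provider_name "$name" "${args[@]}" \
    '{data: ($ARGS.named | del(._provider_name)), name: $_provider_name}'
}

# create_provider_direct PROVIDER NAME KEY=VALUE...: send inline credentials.
create_provider_direct() {
  local provider="$1" name="$2"; shift 2
  curl -sS --fail-with-body -X POST "${GATEWAY_URL}/providers/${provider}" \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(direct_provider_payload "$name" "$@")"
}
```

For example, `create_provider_direct <provider> my-bedrock access_key_id=<id> secret_access_key=<key> region=<region>` sends the three required AWS Bedrock arguments from Table 1.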

After a model provider is added, you can add models from that provider by specifying the provider's UUID and the model's ID in the request. The model ID must be an existing unique identifier that the provider recognizes. Because some models are available from multiple providers, you can also assign a model alias, which is a custom name that you use to reference the model instead of its model ID. For example:
```
curl -X POST "https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1/providers/${PROVIDER_UUID}/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{ "alias": "<custom-name-for-model>", "id": "<model_id>"}'
```
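To register several models at once, you can repeat the same request for each alias. This sketch assumes that `PROVIDER_UUID` and `TOKEN` are already set and that aliases contain no `=` character. `GATEWAY_URL`, `model_payload`, and `add_models` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical base URL; replace the placeholders for your cluster.
GATEWAY_URL="${GATEWAY_URL:-https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1}"

# model_payload ALIAS MODEL_ID: body for the add-model request above.
model_payload() {
  jq -n --arg alias "$1" --arg id "$2" '{alias: $alias, id: $id}'
}

# add_models PROVIDER_UUID ALIAS=MODEL_ID...
# Adds each model in turn and stops at the first failed request.
add_models() {
  local uuid="$1"; shift
  local pair
  for pair in "$@"; do
    curl -sS --fail-with-body -X POST "${GATEWAY_URL}/providers/${uuid}/models" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer ${TOKEN}" \
      -d "$(model_payload "${pair%%=*}" "${pair#*=}")" || return 1
  done
}
```

For example: `add_models "${PROVIDER_UUID}" <alias-1>=<model_id_1> <alias-2>=<model_id_2>`.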
For more details on each supported model provider, see the watsonx.ai API reference documentation.

What to do next

You can now send requests to models through the model gateway. For details, see Inferencing gateway models. You can also manage existing connections and models, enable load balancing, create access policies, and set rate limits.
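If inference requests follow the OpenAI chat completions format, a first request might look like the following sketch. It assumes a `/ml/gateway/v1/chat/completions` path and assumes that the `model` field accepts the alias that you assigned. Check Inferencing gateway models for the actual endpoint. The sketch optionally sends the `X-IBM-Project-Id` header from Table 1, which applies to watsonx.ai providers. `GATEWAY_URL`, `PROJECT_ID`, `chat_payload`, and `chat` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical base URL; replace the placeholders for your cluster.
GATEWAY_URL="${GATEWAY_URL:-https://cpd-<namespace-name>.apps.<OCP-domain>/ml/gateway/v1}"

# chat_payload MODEL PROMPT: OpenAI-style chat completion body.
chat_payload() {
  jq -n --arg model "$1" --arg prompt "$2" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}

# chat MODEL PROMPT: send the request to the ASSUMED chat/completions path.
# Adds X-IBM-Project-Id only when PROJECT_ID is set.
chat() {
  local headers=(-H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json")
  if [[ -n "${PROJECT_ID:-}" ]]; then
    headers+=(-H "X-IBM-Project-Id: ${PROJECT_ID}")
  fi
  curl -sS --fail-with-body -X POST "${GATEWAY_URL}/chat/completions" \
    "${headers[@]}" -d "$(chat_payload "$1" "$2")"
}
```

For example: `chat <custom-name-for-model> "Hello" | jq .`.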