Creating a custom product for a watsonx.ai API
Create a product for an AI Gateway API and add a plan that contains the required assembly rate limits.
Attention: A product that is created later with the auto-publish feature (when
publishing your API) does not include the required plan for using the AI service. If you did not
select Create product to generate a product while creating the watsonx.ai API, you must create a custom product before you can publish the new
API.
Create the product as explained in Creating a draft Product. In the
product definition, add a plan with the watson-ai-default
and
watson-ai-infer-text
assembly rate limits, as shown in the following example:
plans:
default-plan:
title: Default Plan
description: Default Plan
approval: false
rate-limits:
default:
value: 100/1hour
assembly-rate-limits:
watson-ai-default:
- value: 200/1minute
hard-limit: true
cache-only: false
is-client: true
use-api-name: false
use-app-id: false
use-client-id: true
weight: '1'
watson-ai-infer-text:
- value: 200/1minute
hard-limit: true
cache-only: false
is-client: true
use-api-name: false
use-app-id: false
use-client-id: true
weight: aiGeneratedTokenCount
- The watsonx.ai assembly rate limits are required
- The product must include a plan with the
watson-ai-default
andwatson-ai-infer-text
assembly rate limits, even if you add other plans to the product. You can configure the rate limits as needed for your own requirements. - Token-based rate limiting requires a token count
- The
weight: aiGeneratedTokenCount
property is required in thewatson-ai-infer-text
assembly rate limit. This variable indicates the number of tokens that will be added to the counter with each API call, and then compared with the rate limit threshold. - Rate limiting can apply to the catalog or to individual client IDs.
- Rate limiting is configured for the catalog that contains the API. By default, each client ID
that subscribes to the plan within a particular catalog is assigned its own rate limit threshold. To
configure a single threshold for the entire catalog, set
use-client-id: false
for that rate limit.