Extensions
Requirements
To configure an extension, you must provide an OpenAPI specification (JSON, YAML, or YML). The specification can define more than one server, but it must define at least one.
- *.azure.com
- *.*.services.ai.azure.com
- *.*.models.ai.azure.com
- *.openai.azure.com
- *.lambda-url.*.on.aws
- *.googleapis.com
- api.openai.com
- api.llama.com
- api.mistral.ai
The first GET operation in the specification is used to test connectivity and authentication from the extensions UI. Ensure that default values are specified for all URL and path parameters of that GET operation, because those values are used during testing.
Without a GET operation, an extension cannot be tested, but it can still be enabled. In that case, any connectivity issues surface only at runtime.
If an extension is used for integrating custom machine learning models, connectivity can be tested using the custom machine learning configuration UI. A GET operation is not required in that case.
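The default-value requirement for the test GET operation can be sketched as the fragment below. This is a minimal illustration, not taken from the product: the host `api.example.com`, the path `/v1/items/{item_id}`, and the parameter names are hypothetical placeholders, and it assumes that "URL parameters" covers both server URL variables and query parameters.

```yaml
# Hypothetical fragment: host, path, and parameter names are placeholders.
servers:
  - url: https://{cluster_url}
    variables:
      cluster_url:
        default: api.example.com      # default for the server URL variable
paths:
  /v1/items/{item_id}:
    get:                              # first GET operation in the specification
      operationId: getItem
      parameters:
        - name: item_id
          in: path
          required: true
          schema:
            type: string
            default: 'example-item'   # default for the path parameter
        - name: version
          in: query
          required: true
          schema:
            type: string
            default: '2023-05-02'     # default for the query parameter
      responses:
        '200':
          description: OK
```

With every variable and parameter carrying a `default`, the GET request can be built without any further input.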
Authentication
- Basic authentication
- Bearer token
- API key (can be configured to use a custom header and a custom token prefix)
- OAuth 2.0 (Currently restricted to IBM®'s IAM API keys and MCSP.)
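For reference, the first three methods in the list above map onto standard OpenAPI 3.0 security scheme declarations, sketched below. The scheme names (`BasicAuth`, `BearerAuth`, `ApiKeyAuth`) and the header name `X-API-Key` are hypothetical, and how the extensions UI reads these declarations or applies a token prefix is not stated here, so treat this as an illustration of the OpenAPI syntax only.

```yaml
# Hypothetical scheme names; only the type/scheme/in/name keys are standard OpenAPI 3.0.
components:
  securitySchemes:
    BasicAuth:
      type: http
      scheme: basic        # Authorization: Basic <base64 credentials>
    BearerAuth:
      type: http
      scheme: bearer       # Authorization: Bearer <token>
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key      # custom header that carries the key
```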
Sample specification
openapi: 3.0.0
info:
  title: IBM watsonx.ai API
  description: |
    API for IBM watsonx.ai, a generative AI service that enables developers to build enterprise-ready AI applications.
    This API allows you to interact with foundation models for text generation and inferencing.
  version: '2023-05-02'
  contact:
    name: IBM Cloud Support
    url: https://cloud.ibm.com/unifiedsupport/supportcenter
servers:
  - url: https://dev.aws.wxai.ibm.com
    description: watsonx.ai API server in Mumbai region
  - url: https://ap-south-1.aws.wxai.ibm.com
    description: watsonx.ai API server in Mumbai region
  - url: https://us-east-1.aws.wxai.ibm.com
    description: watsonx.ai API server in us-east-1 (North Virginia)
  - url: https://{cluster_url}
    description: Custom cluster URL
    variables:
      cluster_url:
        default: ap-south-1.aws.wxai.ibm.com
        description: The cluster URL for your watsonx.ai instance
security:
  - IBMCloudAuth: []
paths:
  /ml/v1/deployments/{id_or_name}/text/generation:
    post:
      summary: Generate text using a foundation model
      description: |
        This endpoint allows you to generate text using a deployed foundation model.
        You provide prompt variables and generation parameters, and the model returns generated text.
      operationId: generateTextStream
      parameters:
        - name: id_or_name
          in: path
          description: The ID or name of the deployed model
          required: true
          schema:
            type: string
            default: '5e3d3ed9-c757-4a41-990c-b5347fc2b7dd'
        - name: version
          in: query
          description: The API version
          required: true
          schema:
            type: string
            default: '2021-05-01'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TextGenerationRequest'
      responses:
        '200':
          description: Successful text generation
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TextGenerationResponse'
        '400':
          description: Bad request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Model not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '500':
          description: Internal server error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
  /ml/v1/text/generation:
    post:
      summary: Generate text using a specified model
      description: |
        This endpoint allows you to generate text using a specified model.
        You provide the model ID, input prompt, and generation parameters, and the model returns generated text.
      operationId: generateTextWithModel
      parameters:
        - name: version
          in: query
          description: The API version
          required: true
          schema:
            type: string
            default: '2023-05-02'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TextRequest'
      responses:
        '200':
          description: Successful text generation
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TextGenerationResponse'
        '400':
          description: Bad request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Model not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '500':
          description: Internal server error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
  /ml/v4/deployments/{deployment_id}:
    get:
      summary: Get deployment details
      description: |
        Retrieve information about a specific deployment by its ID.
        This endpoint returns details about the deployment including its configuration, status, and metadata.
      operationId: getDeployment
      parameters:
        - name: deployment_id
          in: path
          description: The ID of the deployment to retrieve
          required: true
          schema:
            type: string
            default: '76633500-9ad8-49fc-a794-69aab4e35dcc'
            example: "5e3d3ed9-c757-4a41-990c-b5347fc2b7dd"
        - name: space_id
          in: query
          description: The ID of the space containing the deployment
          required: true
          schema:
            type: string
            default: '7b71d5d5-8a86-4415-a0e5-b5e962328be6'
            example: "7b71d5d5-8a86-4415-a0e5-b5e962328be6"
        - name: version
          in: query
          description: The API version
          required: true
          schema:
            type: string
            default: '2023-05-02'
      responses:
        '200':
          description: Successful retrieval of deployment details
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DeploymentResponse'
        '400':
          description: Bad request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Deployment not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '500':
          description: Internal server error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    TextGenerationRequest:
      type: object
      properties:
        parameters:
          type: object
          description: Parameters for text generation with prompt variables
          properties:
            prompt_variables:
              type: object
              description: Variables to be used in the prompt template
              additionalProperties:
                type: string
              example:
                Test: ""
            max_new_tokens:
              type: integer
              description: The maximum number of tokens to generate
              default: 100
              minimum: 1
              maximum: 2048
            time_limit:
              type: integer
              description: The time limit for generation in milliseconds
              default: 1000
              minimum: 100
            decoding_method:
              type: string
              description: The decoding method to use
              enum: [greedy, sample]
              default: greedy
            temperature:
              type: number
              description: Controls randomness in generation (higher = more random)
              minimum: 0
              maximum: 2
              default: 1.0
            top_p:
              type: number
              description: Nucleus sampling parameter
              minimum: 0
              maximum: 1
              default: 1.0
            top_k:
              type: integer
              description: Top-k sampling parameter
              minimum: 1
              default: 50
            repetition_penalty:
              type: number
              description: Penalty for repeating tokens
              minimum: 1.0
              default: 1.0
            stop_sequences:
              type: array
              description: Sequences that will stop generation when produced
              items:
                type: string
              example: ["\\n", "###"]
    TextGenerationResponse:
      type: object
      properties:
        model_id:
          type: string
          description: The ID of the model used for generation
        created_at:
          type: string
          format: date-time
          description: The timestamp when the response was created
        results:
          type: array
          description: The generated text results
          items:
            type: object
            properties:
              generated_text:
                type: string
                description: The generated text
              generated_token_count:
                type: integer
                description: The number of tokens generated
              input_token_count:
                type: integer
                description: The number of tokens in the input
              stop_reason:
                type: string
                description: The reason why generation stopped
                enum: [max_tokens, stop_sequence, time_limit]
    TextRequest:
      type: object
      required:
        - model_id
        - input
      properties:
        apikey:
          type: string
          description: API key for authentication
          example: "your-api-key-here"
        model_id:
          type: string
          description: The ID of the model to use for generation
          example: "meta-llama/llama-3-3-70b-instruct"
        input:
          type: string
          description: The input prompt for text generation
          example: ""
        parameters:
          type: object
          description: Parameters to control the text generation
          properties:
            decoding_method:
              type: string
              description: The decoding method to use
              enum: [greedy, sample]
              default: greedy
            max_new_tokens:
              type: integer
              description: The maximum number of tokens to generate
              default: 200
              minimum: 1
              maximum: 2048
              example: 200
            min_new_tokens:
              type: integer
              description: The minimum number of tokens to generate
              default: 0
              minimum: 0
              example: 0
            stop_sequences:
              type: array
              description: Sequences that will stop generation when produced
              items:
                type: string
              example: []
            repetition_penalty:
              type: number
              description: Penalty for repeating tokens
              minimum: 1.0
              default: 1.0
              example: 1
            time_limit:
              type: integer
              description: The time limit for generation in milliseconds
              default: 1000
              minimum: 100
              example: 1000
            temperature:
              type: number
              description: Controls randomness in generation (higher = more random)
              minimum: 0
              maximum: 2
              default: 1.0
            top_p:
              type: number
              description: Nucleus sampling parameter
              minimum: 0
              maximum: 1
              default: 1.0
            top_k:
              type: integer
              description: Top-k sampling parameter
              minimum: 1
              default: 50
        project_id:
          type: string
          description: The ID of the project
          example: "24d54a06-4324-493a-b800-aa4e25aeeda8"
        moderations:
          type: object
          description: Content moderation settings
          properties:
            hap:
              type: object
              description: Hate, Abuse, and Profanity moderation settings
              properties:
                input:
                  type: object
                  description: Input moderation settings for HAP
                  properties:
                    enabled:
                      type: boolean
                      description: Whether HAP input moderation is enabled
                      default: true
                    threshold:
                      type: number
                      description: Threshold for HAP detection
                      minimum: 0
                      maximum: 1
                      default: 0.5
                    mask:
                      type: object
                      description: Masking settings for detected content
                      properties:
                        remove_entity_value:
                          type: boolean
                          description: Whether to remove entity values
                          default: true
                output:
                  type: object
                  description: Output moderation settings for HAP
                  properties:
                    enabled:
                      type: boolean
                      description: Whether HAP output moderation is enabled
                      default: true
                    threshold:
                      type: number
                      description: Threshold for HAP detection
                      minimum: 0
                      maximum: 1
                      default: 0.5
                    mask:
                      type: object
                      description: Masking settings for detected content
                      properties:
                        remove_entity_value:
                          type: boolean
                          description: Whether to remove entity values
                          default: true
            pii:
              type: object
              description: Personally Identifiable Information moderation settings
              properties:
                input:
                  type: object
                  description: Input moderation settings for PII
                  properties:
                    enabled:
                      type: boolean
                      description: Whether PII input moderation is enabled
                      default: true
                    threshold:
                      type: number
                      description: Threshold for PII detection
                      minimum: 0
                      maximum: 1
                      default: 0.5
                    mask:
                      type: object
                      description: Masking settings for detected content
                      properties:
                        remove_entity_value:
                          type: boolean
                          description: Whether to remove entity values
                          default: true
                output:
                  type: object
                  description: Output moderation settings for PII
                  properties:
                    enabled:
                      type: boolean
                      description: Whether PII output moderation is enabled
                      default: true
                    threshold:
                      type: number
                      description: Threshold for PII detection
                      minimum: 0
                      maximum: 1
                      default: 0.5
                    mask:
                      type: object
                      description: Masking settings for detected content
                      properties:
                        remove_entity_value:
                          type: boolean
                          description: Whether to remove entity values
                          default: true
            granite_guardian:
              type: object
              description: Granite Guardian moderation settings
              properties:
                input:
                  type: object
                  description: Input moderation settings for Granite Guardian
                  properties:
                    enabled:
                      type: boolean
                      description: Whether Granite Guardian input moderation is enabled
                      default: false
                    threshold:
                      type: number
                      description: Threshold for detection
                      minimum: 0
                      maximum: 1
                      default: 1
    InferenceRequest:
      type: object
      required:
        - input
      properties:
        apikey:
          type: string
          description: API key for authentication
          example: "your-api-key-here"
        input:
          type: string
          description: The input data for inference
          example: "Classify the sentiment of this text: I really enjoyed the movie."
        parameters:
          type: object
          description: Parameters to control the inference
          properties:
            max_new_tokens:
              type: integer
              description: The maximum number of tokens to generate
              default: 100
              minimum: 1
              maximum: 2048
              example: 100
            time_limit:
              type: integer
              description: The time limit for inference in milliseconds
              default: 1000
              minimum: 100
              example: 1000
            task_type:
              type: string
              description: The type of inference task
              enum: [classification, question_answering, summarization, translation]
              example: "classification"
            return_options:
              type: object
              description: Options for what to return in the response
              properties:
                include_input:
                  type: boolean
                  description: Whether to include the input in the response
                  default: false
                include_intermediate_results:
                  type: boolean
                  description: Whether to include intermediate results
                  default: false
    InferenceResponse:
      type: object
      properties:
        model_id:
          type: string
          description: The ID of the model used for inference
        created_at:
          type: string
          format: date-time
          description: The timestamp when the response was created
        results:
          type: array
          description: The inference results
          items:
            type: object
            properties:
              result:
                type: string
                description: The inference result
              confidence:
                type: number
                description: Confidence score for the result
                minimum: 0
                maximum: 1
              processing_time:
                type: integer
                description: Processing time in milliseconds
    DeploymentListResponse:
      type: object
      properties:
        total_count:
          type: integer
          description: Total number of deployments matching the criteria
          example: 5
        resources:
          type: array
          description: Array of deployment resources
          items:
            $ref: '#/components/schemas/DeploymentResponse'
    DeploymentResponse:
      type: object
      properties:
        metadata:
          type: object
          description: Metadata about the deployment
          properties:
            id:
              type: string
              description: The unique identifier of the deployment
              example: "12345678-1234-1234-1234-123456789abc"
            name:
              type: string
              description: The name of the deployment
              example: "My Model Deployment"
            description:
              type: string
              description: Description of the deployment
            created_at:
              type: string
              format: date-time
              description: Timestamp when the deployment was created
            modified_at:
              type: string
              format: date-time
              description: Timestamp when the deployment was last modified
            space_id:
              type: string
              description: The ID of the space containing the deployment
              example: "aa6dc728-958e-42b7-acdf-d403e16d1e9e"
            owner:
              type: string
              description: The owner of the deployment
        entity:
          type: object
          description: The deployment entity details
          properties:
            asset:
              type: object
              description: Information about the deployed asset
              properties:
                id:
                  type: string
                  description: The ID of the deployed model or asset
                name:
                  type: string
                  description: The name of the deployed model or asset
            deployed_asset_type:
              type: string
              description: The type of asset being deployed
              example: "model"
            hardware_spec:
              type: object
              description: Hardware specifications for the deployment
              properties:
                id:
                  type: string
                  description: Hardware specification ID
                name:
                  type: string
                  description: Hardware specification name
                num_nodes:
                  type: integer
                  description: Number of nodes
            online:
              type: object
              description: Online deployment configuration
              properties:
                parameters:
                  type: object
                  description: Deployment parameters
            status:
              type: object
              description: Current status of the deployment
              properties:
                state:
                  type: string
                  description: The current state of the deployment
                  enum: [initializing, updating, ready, failed]
                  example: "ready"
                message:
                  type: string
                  description: Status message
    ErrorResponse:
      type: object
      properties:
        status_code:
          type: integer
          description: The HTTP status code
        error:
          type: string
          description: Error type
        message:
          type: string
          description: Error message
        trace_id:
          type: string
          description: Trace ID for debugging
  securitySchemes:
    IBMAuthMCSP:
      type: oauth2
      description: |
        Authentication for watsonx.ai API requires an IBM Cloud API key.
        The API key must be included in the request body as: {"apikey": "your-api-key-value"}
        To obtain an API key:
        1. Create an API key in IBM Cloud
        2. Include the API key in the request body of each API call
      flows:
        clientCredentials:
          tokenUrl: https://iam.platform.saas.ibm.com/siusermgr/api/1.0/apikeys/token
          scopes: {}
      x-ibm-iam-type: MCSP
    IBMCloudIAM:
      type: oauth2
      description: |
        Authentication for watsonx.ai API using IBM Cloud IAM.
        This scheme exchanges an API key for a bearer token using form-encoded credentials.
        The token endpoint expects a form-encoded body with:
        - grant_type: urn:ibm:params:oauth:grant-type:apikey
        - apikey: your-api-key-value
        To use this authentication:
        1. Create an API key in IBM Cloud
        2. Exchange it for a bearer token via the token endpoint
        3. Use the bearer token in the Authorization header
      flows:
        clientCredentials:
          tokenUrl: https://iam.cloud.ibm.com/identity/token
          scopes: {}
      x-ibm-iam-type: IBM
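In OpenAPI 3.0, each name listed in a `security` requirement must correspond to a key declared under `components.securitySchemes`. The fragment below is a minimal sketch of that pairing, using the `IBMCloudIAM` scheme name and token URL from the sample. Whether the extension should use the IAM or the MCSP scheme depends on your deployment, so the choice shown here is an assumption.

```yaml
# Sketch: the name under `security` matches a key under `components.securitySchemes`.
security:
  - IBMCloudIAM: []          # assumption: IBM Cloud IAM is the scheme in use
components:
  securitySchemes:
    IBMCloudIAM:             # same name as referenced above
      type: oauth2
      flows:
        clientCredentials:
          tokenUrl: https://iam.cloud.ibm.com/identity/token
          scopes: {}
```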