Generating text
Use foundation models in IBM watsonx.ai programmatically for text generation tasks.
Inference types
You can prompt a foundation model by using one of the following text generation methods:
- Infer text: Waits until the foundation model finishes generating, and then returns the complete output all at one time.
- Infer text event stream: Returns the output as it is generated by the foundation model. This method is useful in conversational use cases, where you want a chatbot or virtual assistant to respond to a user in a fluid way that mimics a real conversation.
For chat use cases, use the Chat API. See Adding generative chat function to your applications with the chat API.
Ways to develop
You can inference foundation models by using these programmatic methods:
- REST API
- Python library
- Node.js
Alternatively, you can use graphical tools from the watsonx.ai UI to inference foundation models. See Prompt Lab.
REST API
The method that you use to inference a foundation model differs depending on whether the foundation model is provided with watsonx.ai or is associated with a deployment.
To inference a foundation model that is provided with watsonx.ai, use the Text generation method.
curl -X POST 'https://{region}.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-02-11' \
-H 'Authorization: Bearer ${ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
--data-raw '{
"input": "Tell me about interest rates",
"parameters": {
"max_new_tokens": 200
},
"model_id": "ibm/granite-3-8b-instruct",
"project_id": "{project_id}"
}'
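To use the Infer text event stream method instead, send the same request body to the text generation stream endpoint. The following is a minimal sketch, assuming the same model and project as the previous example; the response is returned as server-sent events:
curl -X POST 'https://{region}.ml.cloud.ibm.com/ml/v1/text/generation_stream?version=2025-02-11' \
-H 'Authorization: Bearer ${ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'Accept: text/event-stream' \
--data-raw '{
"input": "Tell me about interest rates",
"parameters": {
"max_new_tokens": 200
},
"model_id": "ibm/granite-3-8b-instruct",
"project_id": "{project_id}"
}'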
To inference a tuned or custom foundation model, use the Deployments > Infer text method.
The model_id field is not required with this type of request because the deployment serves only one model.
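For example, a minimal sketch of a request to the deployment endpoint, where {deployment_id} is a placeholder for the ID of your deployment; the body is the same as for the Text generation method without the model_id:
curl -X POST 'https://{region}.ml.cloud.ibm.com/ml/v1/deployments/{deployment_id}/text/generation?version=2025-02-11' \
-H 'Authorization: Bearer ${ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
--data-raw '{
"input": "Tell me about interest rates",
"parameters": {
"max_new_tokens": 200
}
}'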
Applying AI guardrails when inferencing
When you prompt a foundation model by using the API, you can use the moderations field to apply AI guardrails to foundation model input and output. For more information, see Removing harmful language from model input and output.
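For example, the following is a minimal sketch that enables the hate, abuse, and profanity (HAP) filter on both model input and output; the threshold values are illustrative, not recommendations:
curl -X POST 'https://{region}.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-02-11' \
-H 'Authorization: Bearer ${ACCESS_TOKEN}' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
--data-raw '{
"input": "Tell me about interest rates",
"parameters": {
"max_new_tokens": 200
},
"moderations": {
"hap": {
"input": { "enabled": true, "threshold": 0.75 },
"output": { "enabled": true, "threshold": 0.75 }
}
},
"model_id": "ibm/granite-3-8b-instruct",
"project_id": "{project_id}"
}'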
Inferencing with a prompt template
You can inference a foundation model with input text that follows a pattern that is defined by a prompt template.
For details, see Create a prompt template.
To extract prompt template text to use as input to the text generation method, take the following steps:
1. Use the Search asset types method of the Watson Data API to get the prompt template ID.
curl -X POST 'https://api.dataplatform.cloud.ibm.com/v2/asset_types/wx_prompt/search?version=2024-07-29&project_id={project_id}' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ${ACCESS_TOKEN}' \
--data '{
"query": "asset.name:<template_name>"
}'
The prompt template ID is specified as the metadata.asset_id.
2. Use the Get the inference input string for a given prompt method to get the prompt template text.
curl -X POST 'https://api.dataplatform.cloud.ibm.com/wx/v1/prompts/{prompt-template-id}/input?version=2024-07-29&project_id={project_id}' ...
For more information, see Get the inference input string for a given prompt.
You can submit the extracted prompt text as input to the Text generation method.
Encrypting inference requests
You can use custom keys managed in your cloud provider account to encrypt inference requests to foundation models and to decrypt the model response.
1. Create a task credential of the type key_manager_api_key.
curl --request POST 'https://<Cloud provider endpoint URL>/ml/v1/task_credentials' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ${ACCESS_TOKEN}' \
--data '{
"name": "Key manager task credentials",
"description": "This is my task credentials",
"type": "key_manager_api_key"
}'
2. Generate a wrapped custom data encryption key (DEK) in an external key management service such as IBM Key Protect. For details, see Encrypting at rest data on IBM Cloud. A sketch of a wrap call appears after these steps.
Important: Save the base64-encoded cipher text for the DEK.
3. Set the key reference by appending the cipher text for the DEK to the cloud resource name (CRN) for the key. You can find the key CRN in the key details in the IBM Key Protect service UI.
The key reference format is as follows:
export DEK_REF="crn:v1:bluemix:public:kms:<region>:a/<account-id>:<service-instance>:key:<key-id>:wdek:<cipher-text-for-DEK>"
4. Encrypt inference requests to the foundation model by specifying the DEK reference in the text generation REST API request as follows:
curl -X POST 'https://<Cloud provider endpoint URL>/ml/v1/text/generation?version=2025-12-11' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ${ACCESS_TOKEN}' \
--data-raw '{
"input": "Tell me about interest rates",
"model_id": "ibm/granite-3-8b-instruct",
"project_id": "{project_id}",
"parameters": {
"max_new_tokens": 200
},
"crypto": {
"key_ref": "${DEK_REF}"
}
}'
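As mentioned in step 2, the following is a minimal sketch of generating a wrapped DEK with the IBM Key Protect wrap action. This call goes to your Key Protect instance, not the watsonx.ai API; <region>, <instance-id>, and <key-id> are placeholders for your Key Protect service instance and root key:
curl -X POST 'https://<region>.kms.cloud.ibm.com/api/v2/keys/<key-id>/actions/wrap' \
--header 'Authorization: Bearer ${ACCESS_TOKEN}' \
--header 'Bluemix-Instance: <instance-id>' \
--header 'Content-Type: application/json' \
--data '{}'
When the request body is empty, Key Protect generates a new DEK and returns both its plaintext and its wrapped form. Save the ciphertext value from the response as the cipher text for the DEK.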
Python library
See the ModelInference class of the watsonx.ai Python library.
For details about how to use the available sample Python notebooks, see the related sample notebook topics.
Node.js
See the following resources:
- Text generation
- Text generation stream