What's new and changed in watsonx.ai

watsonx.ai updates can include new features and fixes. Releases are listed in reverse chronological order so that the latest release is at the beginning of the topic.

You can see a list of the new features for the platform and all of the services at What's new in IBM Software Hub.

IBM watsonx™ Version 2.3.1

A new version of watsonx.ai was released in February 2026.

This release includes the following changes:

New features
This release of watsonx.ai includes the following features:
New foundation models in watsonx.ai

You can now use the following foundation models for inferencing from the Prompt Lab and the API:

  • ibm-defense-4-0-small
  • devstral-small-2512
  • devstral-medium-2512
  • ministral-14b-instruct-2512
  • mistral-large-2512

For details, see Supported foundation models.

Use text, audio, video, and images as context when you inference a custom foundation model
You can now add text, audio, video, and images as context when you inference any custom foundation model that has these capabilities.

For details, see Inferencing deployed custom foundation models.

Use audio input when you inference a foundation model with the watsonx.ai chat API
You can now add audio files, encoded as binary data, as context when you use the watsonx.ai chat API to inference multimodal foundation models that are installed in your cluster and support audio input.

For details, see Adding generative chat function to your applications with the chat API.
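As a sketch of what such a request might look like, the snippet below base64-encodes an audio clip and attaches it to a chat message alongside a text prompt. The content-part field names (`type`, `input_audio`, `format`) and the model ID are assumptions modeled on common multimodal chat APIs, not confirmed watsonx.ai fields; check the watsonx.ai chat API reference for the exact payload shape.

```python
import base64

def build_audio_chat_request(model_id, question, audio_bytes, audio_format="mp3"):
    """Encode an audio clip as base64 and attach it to a chat message.

    Field names are illustrative placeholders; verify them against the
    watsonx.ai chat API reference before use.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "model_id": model_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "input_audio",  # placeholder content-part name
                        "input_audio": {"data": encoded, "format": audio_format},
                    },
                ],
            }
        ],
    }

request = build_audio_chat_request(
    "my-custom-audio-model",   # placeholder model ID
    "Transcribe this recording.",
    b"\x00\x01\x02",           # stand-in for real audio bytes
)
```

The body would then be sent to the chat endpoint of your deployment; only the base64 encoding step is fixed by the release note, which specifies binary-encoded audio data.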

Pass model parameters when invoking deployed AutoAI RAG services
You can now include temperature and top_p model parameters when you invoke deployed AutoAI RAG services. With these additional settings, you have more flexible control over how the model responds for each request, instead of relying only on the default behavior. You can adjust the model’s tone and output without changing the deployment itself. 

For details, see Saving a RAG pattern as a deployable AI service.
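A minimal sketch of a per-request payload follows. The `temperature` and `top_p` parameter names come from the release note above; the surrounding payload shape (a `messages` list plus a `parameters` object) is an assumption and should be checked against the API reference for your deployed AI service.

```python
def build_rag_query(question, temperature=0.2, top_p=0.9):
    """Attach sampling parameters to a query for a deployed AutoAI RAG service.

    The payload envelope here is illustrative; temperature and top_p are
    the per-request settings named in the release note.
    """
    return {
        "messages": [{"role": "user", "content": question}],
        "parameters": {"temperature": temperature, "top_p": top_p},
    }

# A deterministic request: temperature 0.0 overrides the default for this
# call only, without changing the deployment itself.
payload = build_rag_query("What is the refund policy?", temperature=0.0)
```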

New runtime requirement in AutoAI RAG experiments
AutoAI RAG now runs on the GenAI runtime, replacing the previous 24.1 runtime. The GenAI runtime is compatible with the latest foundation models, improves performance, and provides a more stable environment for RAG workloads. Make sure that you have the GenAI runtime installed in your environment to successfully run RAG experiments and deployments.

For details, see Automating a RAG pattern with AutoAI.

New model requirement for AutoAI RAG experiments with SQL knowledge bases
If you have any models that use SQL knowledge bases in AutoAI RAG, you must ensure that these models have the autoai_sql_rag function. AutoAI RAG uses this function to correctly evaluate and select models that work with SQL knowledge bases for retrieval.

For details, see Foundation models supported for AutoAI experiments.

Updates
The following updates were introduced in this release:
Use NVIDIA L40S GPUs to tune foundation models
You can now use NVIDIA L40S GPUs to tune foundation models by using the full fine tuning and low-rank adaptation (LoRA) fine tuning methods.
Issues fixed in this release
The following issues were fixed in this release:
Upgrade to IBM® Software Hub version 5.3 fails with ImagePullBackOff error
  • Issue: After upgrading to version 5.3:
    • The watsonxaiifm and ibm_redis_cp custom resources (CRs) remain stuck in the In progress state indefinitely.
    • All the pods that belong to the watsonx_ai_ifm add-on have ImagePullBackOff status.
  • Resolution: The issue is now fixed.
Customer-reported issues fixed in this release
For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.
Deprecated features
The following features were deprecated in this release:
The watsonx.ai text generation API is deprecated
The "Infer text" and "Infer text event stream" endpoints of the watsonx.ai text generation API are now deprecated and will be removed in the future. Migrate any prompt sessions, templates, notebooks, and AI services that use the text generation API to the watsonx.ai chat API.

For details, see the watsonx.ai API reference documentation.

The deprecation of the text generation API includes the following changes:
  • Saved prompt sessions, templates, and notebooks that use the text generation API might not return inference results as expected. You cannot save any new prompt sessions or templates that use the text generation API as project assets.
  • The following features will not be supported after the API is removed:
    • Evaluating prompt templates
    • The Structured and Freeform modes in the Prompt Lab. Use Chat mode instead.
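As an illustration of the migration, the sketch below converts a text-generation style request body into a chat style body by wrapping the prompt as a user message. The field names on both sides (`input`, `max_new_tokens`, `max_tokens`) are assumptions for illustration only; confirm the exact request fields of both APIs in the watsonx.ai API reference documentation.

```python
def generation_to_chat(gen_request):
    """Convert a text-generation style request body to a chat style body.

    Field names are illustrative: verify the actual generation and chat
    request schemas in the watsonx.ai API reference before migrating.
    """
    return {
        "model_id": gen_request["model_id"],
        # The bare prompt becomes a single user message.
        "messages": [{"role": "user", "content": gen_request["input"]}],
        # Generation parameters largely carry over, but names can differ
        # between the two APIs (for example, max_new_tokens vs. max_tokens).
        "max_tokens": gen_request.get("parameters", {}).get("max_new_tokens", 200),
    }

old = {
    "model_id": "ibm/granite-13b-instruct-v2",
    "input": "Summarize the attached report.",
    "parameters": {"max_new_tokens": 100},
}
new = generation_to_chat(old)
```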

IBM watsonx Version 2.3.0

A new version of watsonx.ai was released in December 2025.

This release includes the following changes:

New features
This release of watsonx.ai includes the following features:
New foundation models in watsonx.ai

You can now use the following foundation models for inferencing from the Prompt Lab and the API:

  • granite-4-h-tiny
  • granite-docling-258M
  • ibm-defense-4-0-micro

For details, see Supported foundation models.

Access models across multiple providers with the model gateway

You can now securely configure and interact with foundation models from multiple providers with the model gateway by using the API. In addition, you can manage the model gateway with integrated load-balancing, access policies, and rate limits.

For details, see Model gateway.

Capture semantic meaning and refine retrieved results by using custom embedding and reranking models

You can now add custom embedding and reranking models to watsonx.ai and use them to capture semantic meaning and refine retrieved results.

All custom foundation models now use the vLLM inferencing server

All custom foundation models now use the vLLM inferencing server. If your deployed models use the TGIS inferencing server, you might have to migrate them.

For details, see Requirements for deploying custom foundation models.

New text classification API in watsonx.ai
You can now use the new text classification method in the watsonx.ai REST API to classify your document before you extract textual content to use in a RAG solution.

You can use the classification API to classify your document as one of several supported common document types without running a longer extraction task. By pre-processing the document, you can then customize your text extraction request to efficiently extract relevant details from your document.

For details about the text classification API and the document understanding library, see Text classification and Understanding documents.
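The classify-then-extract flow described above can be sketched as follows. The document types and extraction options here are hypothetical examples, not values from the watsonx.ai API; the point is only that a cheap classification step lets you tailor the more expensive extraction request.

```python
def plan_extraction(doc_type):
    """Choose text-extraction options based on a classification result.

    Document types and option names are illustrative placeholders; the
    supported types and extraction parameters are listed in the watsonx.ai
    REST API reference.
    """
    if doc_type == "invoice":
        # Table-heavy documents benefit from table-aware extraction.
        return {"mode": "tables", "ocr": True}
    if doc_type == "contract":
        # Mostly running text; skip OCR if the source is born-digital.
        return {"mode": "text", "ocr": False}
    # Conservative default for unrecognized types.
    return {"mode": "text", "ocr": True}
```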

New vector index transactional API in watsonx.ai
You can now use the new vector index transactional API methods to create and manage vector index assets in a project.

For details, see Create a vector index programmatically.

Use the user interface for Synthetic Data Generator to generate unstructured synthetic data
The user interface for creating jobs to generate unstructured synthetic data is now generally available. The user interface in Synthetic Data Generator makes creating and running jobs easier by organizing all the settings and seed document requirements into simple options and fields.

For details, see Creating jobs to generate unstructured synthetic data.

Improvements for AutoAI for RAG experiments
You can now use the following features for your AutoAI for RAG experiments:
  • Use semantic chunking in AutoAI for RAG experiments

    You can now use the semantic chunking method to break down documents in an AutoAI for RAG experiment. Semantic chunking splits documents based on meaning, making it well-suited for complex or unstructured data.

    For details, see Customizing chunking settings.

  • Use chat API models in AutoAI for RAG experiments

    You can now use chat API models in AutoAI for RAG experiments, instead of prompt template models. These models must have chat capabilities to work in AutoAI for RAG experiments.

    For details, see Supported foundation models.

  • Auto-deploy top pattern in AutoAI for RAG experiments

    You can now enable automatic deployment of the top-performing pattern after an AutoAI for RAG experiment completes. You can turn on auto-deployment when you set up the experiment. Auto-deployment helps reduce manual steps and further automates the experiment workflow.

    For details, see Creating the AutoAI RAG experiment.

  • Use multiple vector indexes in AutoAI for RAG experiments

    You can now select up to 20 vector indexes for your document collection in an AutoAI for RAG experiment. During experiment setup, when you add document and evaluation sources, choose Knowledge bases and then select up to 20 connections. You can then define details for each connection, such as index name and embedding models. Using multiple indexes gives you more flexibility and can improve the quality and performance of your experiments.

    For details, see Using vector store knowledge bases in AutoAI RAG experiments.

  • Use SQL database schemas in AutoAI for RAG experiments

    You can now choose an SQL database schema as a knowledge base in an AutoAI for RAG experiment. You can use SQL connections such as Db2, PostgreSQL, and MySQL. When you use SQL sources, chunking settings are disabled, and only answer correctness metrics are available for optimization. With SQL RAG, structured data is retrieved directly from the relational database, which can improve answer accuracy and relevance compared with document-based sources.

    For details, see Using SQL knowledge bases in AutoAI RAG experiments.

Updates
The following updates were introduced in this release:
New data connections added in AutoAI for RAG experiments
You can now use these data connections for document collections and test data in AutoAI for RAG experiments:
  • Google Cloud Storage
  • Box
  • Dropbox
Use the watsonx.ai chat API to control foundation model reasoning
You can now configure the reasoning capability of foundation models and specify the amount of details a model includes in the response by using new settings in the watsonx.ai chat API.

For details, see Adding generative chat function to your applications.
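A hedged sketch of such a request follows. The release note confirms that new chat API settings control reasoning and the level of detail in responses, but the setting name used below (`reasoning_effort`) is a placeholder modeled on common chat APIs; look up the actual field names in the watsonx.ai chat API reference.

```python
def build_reasoning_request(model_id, question, effort="low"):
    """Build a chat request that tunes the model's reasoning behavior.

    The "reasoning_effort" field name is a placeholder; substitute the
    setting names documented in the watsonx.ai chat API reference.
    """
    return {
        "model_id": model_id,
        "messages": [{"role": "user", "content": question}],
        "reasoning_effort": effort,  # placeholder setting name
    }

r = build_reasoning_request(
    "my-reasoning-model",              # placeholder model ID
    "Why does this proof fail?",
    effort="high",
)
```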

Use new storage types to store your documents and results from the text extraction API
You can now use the text extraction API with documents stored in the following data stores:
  • Box
  • IBM watsonx.data™
  • SharePoint
  • IBM FileNet P8
In addition, you can store the output from the text extraction process in a Box data store.

For details, see Text extraction.

Chat with documents in multiple vector data stores by using the Prompt Lab
You can now select documents stored in multiple vector data stores to ground foundation models in chat mode in the Prompt Lab.

For details, see Chatting with documents and media files.

Fine tune foundation models using parameter-efficient fine tuning methods in the Tuning Studio
You can now fine tune foundation models by using parameter-efficient tuning methods such as low-rank adaptation and quantized low-rank adaptation tuning in the Tuning Studio UI in addition to the watsonx.ai tuning API.

For details, see Tuning Studio.

Use NVIDIA RTX PRO 6000 GPUs with foundation models in watsonx.ai
You can now use NVIDIA RTX PRO 6000 GPUs to run your foundation models. The following models are supported with NVIDIA RTX PRO 6000 GPUs:
  • granite-4-h-small
  • granite-speech-3-3-8b
  • granite-guardian-3-2-5b
  • llama-3-2-11b-vision-instruct
  • llama-guard-3-11b-vision
Issues fixed in this release
The following issues were fixed in this release:
Installing the codestral-2508 foundation model results in a CrashLoopBackOff error
  • Issue: After installing the codestral-2508 model, the status indicates a CrashLoopBackOff error.
  • Resolution: The issue is now fixed.
Models might load indefinitely after upgrading to version 5.2.x
  • Issue: After upgrading watsonxaiifm, some models might fail to load and remain in an indefinite loading state during inference.
  • Resolution: The issue is now fixed.
Uploading large files in vector indexes might result in timeouts
  • Issue: When you create a vector index, uploading large file types that are greater than 1 GB might result in timeouts.
  • Resolution: The issue is now fixed.
Customer-reported issues fixed in this release
For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.
Deprecated features
The following features were deprecated in this release:
Prompt tuning is removed
You can no longer use prompt tuning as a method to tune foundation models. All existing prompt tuning deployments will be removed when you upgrade the watsonx.ai service.

For details about alternative tuning methods, see Foundation model tuning methods.

Deprecated and removed foundation models
The following foundation model is now deprecated and will be removed in a future release:
  • pixtral-12b
The following foundation models are now removed from the watsonx.ai service:
  • codestral-22b
  • flan-t5-xl-3b
  • granite-13b-instruct-v2
  • jais-13b-chat
  • llama-4-scout-17b-16e-instruct
  • llama-3-405b-instruct
  • llama-2-13b-chat
  • mistral-large
  • mistral-small-24b-instruct-2501
  • mistral-small-instruct
  • mixtral-8x7b-instruct-v01

For details, see Foundation model lifecycle.