What's new and changed in watsonx.ai
watsonx.ai updates can include new features and fixes. Releases are listed in reverse chronological order so that the latest release is at the beginning of the topic.
You can see a list of the new features for the platform and all of the services at What's new in IBM Software Hub.
IBM watsonx™ Version 2.3.1
A new version of watsonx.ai was released in February 2026.
This release includes the following changes:
- New features
- This release of watsonx.ai includes the following features:
- New foundation models in watsonx.ai
- You can now use the following foundation models for inferencing from the Prompt Lab and the API:
- ibm-defense-4-0-small
- devstral-small-2512
- devstral-medium-2512
- ministral-14b-instruct-2512
- mistral-large-2512
For details, see Supported foundation models.
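As a sketch of inferencing one of the new models through the API, the following Python snippet builds a minimal chat request body. The payload shape follows the public watsonx.ai chat API; the project ID, host, and API version are placeholders that you supply for your own cluster.

```python
import json

# Minimal chat request body for one of the newly added models.
# <PROJECT_ID> is a placeholder -- supply your own project ID.
payload = {
    "model_id": "mistral-large-2512",
    "project_id": "<PROJECT_ID>",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 50,
}

# POST this body to {host}/ml/v1/text/chat?version=<date> with a bearer token.
print(json.dumps(payload, indent=2))
```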
- Use text, audio, video, and images as context when you inference a custom foundation model
- You can now add text, audio, video, and images as context when you inference any custom
foundation model that has these capabilities.
For details, see Inferencing deployed custom foundation models.
- Use audio input when you inference a foundation model with the watsonx.ai chat API
- You can now add audio files encoded as binary data as context when you inference multimodal
foundation models installed in your cluster that support audio input with the watsonx.ai chat API.
For details, see Adding generative chat function to your applications with the chat API.
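A minimal sketch of passing audio as binary data in a chat request follows. The `input_audio` content-part type mirrors common chat-API conventions and is an assumption; check the chat API reference for the exact schema in your release.

```python
import base64
import json

def build_audio_chat_payload(model_id, project_id, question, audio_bytes, fmt="mp3"):
    """Build a chat payload that includes an audio clip as base64 binary data."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "model_id": model_id,
        "project_id": project_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    # "input_audio" is an assumed content-part type; verify
                    # against the chat API reference for your cluster.
                    {"type": "input_audio",
                     "input_audio": {"data": encoded, "format": fmt}},
                ],
            }
        ],
    }

payload = build_audio_chat_payload(
    "my-multimodal-model", "<PROJECT_ID>", "Transcribe this clip.", b"\x00\x01"
)
print(json.dumps(payload)[:80])
```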
- Pass model parameters when invoking deployed AutoAI RAG services
- You can now include the temperature and top_p model parameters when you invoke deployed AutoAI RAG services. With these additional settings, you have more flexible control over how the model responds to each request, instead of relying only on the default behavior. You can adjust the model's tone and output without changing the deployment itself.
For details, see Saving a RAG pattern as a deployable AI service.
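The per-request settings can be sketched as a request body like the following. The placement of the fields is an assumption; consult the AI service schema for your deployment.

```python
import json

def build_rag_request(question, temperature=0.2, top_p=0.9):
    """Build a request body for a deployed AutoAI RAG AI service with
    per-request sampling parameters (field placement is an assumption)."""
    return {
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,  # lower values = more deterministic answers
        "top_p": top_p,              # nucleus-sampling cutoff
    }

body = build_rag_request("What is our refund policy?", temperature=0.7)
print(json.dumps(body))
```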
- New runtime requirement in AutoAI RAG experiments
- AutoAI RAG now runs on the GenAI runtime,
replacing the previous 24.1 runtime. The GenAI runtime is compatible with the latest foundation
models, improves performance, and provides a more stable environment for RAG workloads. Make sure
that you have the GenAI runtime installed in your environment to successfully run RAG experiments
and deployments.
For details, see Automating a RAG pattern with AutoAI.
- New model requirement for AutoAI RAG experiments with SQL knowledge bases
- If you have any models that use SQL knowledge bases in AutoAI RAG, you must ensure that these models have the autoai_sql_rag function. AutoAI RAG uses this function to correctly evaluate and select models that work with SQL knowledge bases for retrieval.
For details, see Foundation models supported for AutoAI experiments.
- Updates
- The following updates were introduced in this release:
- Use NVIDIA L40S GPUs to tune foundation models
- You can now use NVIDIA L40S GPUs to tune foundation models by using the full fine tuning and low-rank adaptation (LoRA) fine tuning methods.
- Issues fixed in this release
- The following issues were fixed in this release:
- Upgrade to IBM® Software Hub version 5.3 fails with an ImagePullBackOff error
- Issue: After upgrading to version 5.3:
- The watsonxaiifm and ibm_redis_cp custom resources (CRs) remain stuck in the in progress state indefinitely.
- All the pods that belong to the watsonx_ai_ifm add-on have ImagePullBackOff status.
- Resolution: The issue is now fixed.
- Customer-reported issues fixed in this release
- For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.
- Deprecated features
- The following features were deprecated in this release:
- The watsonx.ai text generation API is deprecated
- The "Infer text" and "Infer text event stream" endpoints of the watsonx.ai text generation API are now deprecated
and will be removed in the future. Migrate any prompt sessions, templates, notebooks, and AI services that use the text generation API to the watsonx.ai chat API.
For details, see the watsonx.ai API reference documentation.
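A minimal migration sketch follows: the single `input` prompt of a text generation request becomes a user message in a chat request. The endpoint paths in the comments follow the public watsonx.ai API reference; the parameter mapping shown is a simplified assumption.

```python
def generation_to_chat(gen_body):
    """Map a text generation request body to a chat request body.
    old: POST /ml/v1/text/generation with {"input": ..., "parameters": {...}}
    new: POST /ml/v1/text/chat       with {"messages": [...]}"""
    chat_body = {
        "model_id": gen_body["model_id"],
        "project_id": gen_body.get("project_id"),
        "messages": [{"role": "user", "content": gen_body["input"]}],
    }
    params = gen_body.get("parameters", {})
    # Simplified mapping: only max_new_tokens is carried over here.
    if "max_new_tokens" in params:
        chat_body["max_tokens"] = params["max_new_tokens"]
    return chat_body

old = {
    "model_id": "ibm/granite-13b-instruct-v2",
    "input": "Summarize the release notes.",
    "parameters": {"max_new_tokens": 200},
}
new = generation_to_chat(old)
print(new["messages"][0]["content"])
```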
IBM watsonx Version 2.3.0
A new version of watsonx.ai was released in December 2025.
This release includes the following changes:
- New features
- This release of watsonx.ai includes the following features:
- New foundation models in watsonx.ai
- You can now use the following foundation models for inferencing from the Prompt Lab and the API:
- granite-4-h-tiny
- granite-docling-258M
- ibm-defense-4-0-micro
For details, see Supported foundation models.
- Access models across multiple providers with the model gateway
- You can now securely configure and interact with foundation models from multiple providers through the model gateway by using the API. In addition, you can manage the model gateway with integrated load balancing, access policies, and rate limits.
For details, see Model gateway.
- Capture semantic meaning and refine retrieved results by using custom embedding and reranking models
- You can now add custom embedding and reranking models to watsonx.ai and use them to capture semantic meaning and refine retrieved results.
- All custom foundation models now use the vLLM inferencing server
- All custom foundation models now use the vLLM inferencing server. If your deployed models use the TGIS inferencing server, you might have to migrate them.
For details, see Requirements for deploying custom foundation models.
- New text classification API in watsonx.ai
- You can now use the new text classification method in the watsonx.ai REST API to classify your document
before you extract textual content to use in a RAG solution.
You can classify your document with the classification API into one of several supported common document types without running a longer extraction task. By pre-processing the document, you can then customize your text extraction request to efficiently extract relevant details from your document.
For details about the text classification API and the document understanding library, see Text classification and Understanding documents.
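The classify-then-extract flow can be sketched as follows. The document-type names and extraction option fields are illustrative assumptions, standing in for the types and options documented in the API reference.

```python
def choose_extraction_options(doc_type):
    """Tailor a text extraction request to a classified document type.
    The type names and option fields here are hypothetical examples."""
    if doc_type == "invoice":
        # Invoices benefit from table and key-value extraction.
        return {"tables": True, "key_value_pairs": True}
    if doc_type == "contract":
        # Contracts are mostly prose organized into sections.
        return {"tables": False, "sections": True}
    return {}  # fall back to the default extraction behavior

# In practice, doc_type would come from the text classification API response.
opts = choose_extraction_options("invoice")
print(opts)
```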
- New vector index transactional API in watsonx.ai
- You can now use the new vector index transactional API methods to create and manage vector index
assets in a project.
For details, see Create a vector index programmatically.
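A hypothetical sketch of a create request for a vector index asset follows. The field names and structure are assumptions; check "Create a vector index programmatically" for the exact schema.

```python
def build_vector_index_request(name, project_id, embedding_model):
    """Build a request body for creating a vector index asset in a project.
    Field names are assumptions, not the documented schema."""
    return {
        "name": name,
        "project_id": project_id,
        "settings": {
            "embedding_model_id": embedding_model,  # model used to embed chunks
            "top_k": 5,                             # chunks retrieved per query
        },
    }

req = build_vector_index_request(
    "faq-index", "<PROJECT_ID>", "ibm/slate-125m-english-rtrvr"
)
print(req["name"])
```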
- Use the user interface for Synthetic Data Generator to generate unstructured synthetic data
- The user interface for creating jobs to generate unstructured synthetic data is now generally
available. The user interface in Synthetic Data Generator makes
creating and running jobs easier by organizing all the settings and seed document requirements into
simple options and fields.
For details, see Creating jobs to generate unstructured synthetic data.
- Improvements for AutoAI for RAG experiments
- You can now use the following features for your AutoAI for RAG experiments:
- Use semantic chunking in AutoAI for RAG experiments
- You can now use the semantic chunking method to break down documents in an AutoAI for RAG experiment. Semantic chunking splits documents based on meaning, making it well suited for complex or unstructured data.
For details, see Customizing chunking settings.
- Use chat API models in AutoAI for RAG experiments
- You can now use chat API models in AutoAI for RAG experiments, instead of prompt template models. These models must have chat capabilities to work in AutoAI for RAG experiments.
For details, see Supported foundation models.
- Auto-deploy the top pattern in AutoAI for RAG experiments
- You can now enable automatic deployment of the top-performing pattern after an AutoAI for RAG experiment completes. You can turn on auto-deployment when you set up the experiment. Auto-deployment reduces manual steps and further automates the experiment workflow.
For details, see Creating the AutoAI RAG experiment.
- Use multiple vector indexes in AutoAI for RAG experiments
- You can now select up to 20 vector indexes for your document collection in an AutoAI for RAG experiment. During experiment setup, when you add document and evaluation sources, choose Knowledge bases, and then select up to 20 connections. You can then define details for each connection, such as the index name and embedding models. Using multiple indexes gives you more flexibility and can improve the quality and performance of your experiments.
For details, see Using vector store knowledge bases in AutoAI RAG experiments.
- Use SQL database schemas in AutoAI for RAG experiments
- You can now choose an SQL database schema as a knowledge base in an AutoAI for RAG experiment. You can use SQL connections such as Db2, PostgreSQL, and MySQL. When you use SQL sources, chunking settings are disabled, and only answer correctness metrics are available for optimization. With SQL RAG, structured data can be retrieved directly from the relational database, which can improve answer accuracy and relevance compared with document-based sources.
For details, see Using SQL knowledge bases in AutoAI RAG experiments.
- Updates
- The following updates were introduced in this release:
- New data connections added in AutoAI for RAG experiments
- You can now use these data connections for document collections and test data in AutoAI for RAG experiments:
- Google Cloud Storage
- Box
- Dropbox
- Use the watsonx.ai chat API to control foundation model reasoning
- You can now configure the reasoning capability of foundation models and specify the amount of
details a model includes in the response by using new settings in the watsonx.ai chat API.
For details, see Adding generative chat function to your applications.
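A sketch of a chat request that configures reasoning follows. The `reasoning_effort` field name mirrors common chat-API conventions and is an assumption; check the watsonx.ai chat API reference for the actual setting names in your release.

```python
def build_reasoning_request(model_id, prompt, effort="low"):
    """Build a chat request that configures how much reasoning detail the
    model includes in its response. "reasoning_effort" is an assumed name."""
    return {
        "model_id": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # e.g. "low", "medium", "high" (assumed values)
    }

req = build_reasoning_request("my-reasoning-model", "Plan a data migration.", effort="high")
print(req["reasoning_effort"])
```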
- Use new storage types to store your documents and results from the text extraction API
- You can now use the text extraction API with documents stored in the following data stores:
- Box
- IBM watsonx.data™
- SharePoint
- IBM FileNet P8
For details, see Text extraction.
- Chat with documents in multiple vector data stores by using the Prompt Lab
- You can now select documents stored in multiple vector data stores to ground foundation models
in the chat mode in Prompt Lab.
For details, see Chatting with documents and media files.
- Fine tune foundation models using parameter-efficient fine tuning methods in the Tuning Studio
- You can now fine tune foundation models by using parameter-efficient tuning methods such as
low-rank adaptation and quantized low-rank adaptation tuning in the Tuning Studio UI in addition to the watsonx.ai tuning API.
For details, see Tuning Studio.
- Use NVIDIA RTX PRO 6000 GPUs with foundation models in watsonx.ai
- You can now use NVIDIA RTX PRO 6000 GPUs to run your foundation models. The following models are supported with NVIDIA RTX PRO 6000 GPUs:
- granite-4-h-small
- granite-speech-3-3-8b
- granite-guardian-3-2-5b
- llama-3-2-11b-vision-instruct
- llama-guard-3-11b-vision
- Issues fixed in this release
- The following issues were fixed in this release:
- Installing the codestral-2508 foundation model results in a CrashLoopBackOff error
- Issue: After installing the codestral-2508 model, the status indicates a CrashLoopBackOff error.
- Resolution: The issue is now fixed.
- Models may load indefinitely after upgrading to version 5.2.x
- Issue: After upgrading watsonxaiifm, some models may fail to load and remain in an indefinite loading state during inference.
- Resolution: The issue is now fixed.
- Uploading large files in vector indexes may result in timeouts
- Issue: When you create a vector index, uploading files larger than 1 GB may result in timeouts.
- Resolution: The issue is now fixed.
- Customer-reported issues fixed in this release
- For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.
- Deprecated features
- The following features were deprecated in this release:
- Prompt tuning is removed
- You can no longer use prompt tuning as a method to tune foundation models. All existing prompt
tuning deployments will be removed when you upgrade the watsonx.ai service.
For details about alternative tuning methods, see Foundation model tuning methods.
- Deprecated and removed foundation models
- The following foundation model is now deprecated and will be removed in a future release:
- pixtral-12b
- The following foundation models are now removed:
- codestral-22b
- flan-t5-xl-3b
- granite-13b-instruct-v2
- jais-13b-chat
- llama-4-scout-17b-16e-instruct
- llama-3-405b-instruct
- llama-2-13b-chat
- mistral-large
- mistral-small-24b-instruct-2501
- mistral-small-instruct
- mixtral-8x7b-instruct-v01
For details, see Foundation model lifecycle.