What's new and changed in watsonx.ai

watsonx.ai updates can include new features and fixes. Releases are listed in reverse chronological order so that the latest release is at the beginning of the topic.

You can see a list of the new features for the platform and all of the services at What's new in IBM Software Hub.

IBM® watsonx™ Version 2.1.2

A new version of watsonx.ai was released in March 2025.

This release includes the following changes:

New features
This release of watsonx.ai includes the following features:
Work with new foundation models in Prompt Lab
You can now use the following foundation models for inferencing from the API and from the Prompt Lab in watsonx.ai:
  • granite-3-2-8b-instruct
  • granite-vision-3-2-2b
  • mistral-small-24b-instruct-2501

For details, see Supported foundation models.
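
For example, a text generation call to one of these models through the REST API might look like the following sketch. The base URL, bearer token, project ID, and version date are placeholders, and the payload field names are assumptions based on the watsonx.ai text generation REST API; verify exact values against the API reference.

  import requests

  # Placeholders (assumptions): your cluster URL, a valid bearer token,
  # and the ID of the project that the prompt runs in.
  BASE_URL = "https://<cluster-host>"
  TOKEN = "<bearer-token>"
  PROJECT_ID = "<project-id>"

  response = requests.post(
      f"{BASE_URL}/ml/v1/text/generation",
      params={"version": "2024-05-31"},  # date-based API version (assumption)
      headers={"Authorization": f"Bearer {TOKEN}"},
      json={
          "model_id": "mistral-small-24b-instruct-2501",
          "project_id": PROJECT_ID,
          "input": "Summarize the benefits of retrieval-augmented generation.",
          "parameters": {"max_new_tokens": 200},
      },
  )
  response.raise_for_status()
  print(response.json()["results"][0]["generated_text"])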

Deploy custom foundation models globally
Administrators can now deploy custom foundation models globally by using the watsonx.ai Inference Framework Manager (IFM) operator, which makes these models available in all projects and deployment spaces. Globally deployed models let administrators centrally manage and deploy custom models while giving data scientists and AI builders access to them across multiple projects and spaces. These models can then be used to create and deploy structured or freeform prompt templates in any project or space, and can be used for text generation or inferencing.

To learn more about inferencing custom foundation models that are deployed globally, see Inferencing deployed custom foundation models.

Enhanced interaction with deployed custom foundation models through the chat API
You can now interact with your deployed custom foundation models through the chat API. Chatting with a deployed custom foundation model provides a more natural and intuitive way to interact with the model, making it easier to ask questions, receive answers, and complete tasks. You can use the chat API for use cases such as customer service, language translation, and content generation to drive business value and improve user satisfaction.

For more information, see Inferencing deployed custom foundation models.
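
As an illustration, a chat request to a deployed custom foundation model might look like the following sketch. The deployment-scoped chat endpoint and the message schema shown here are assumptions based on the watsonx.ai REST API; see Inferencing deployed custom foundation models for the exact contract.

  import requests

  BASE_URL = "https://<cluster-host>"    # placeholder
  TOKEN = "<bearer-token>"               # placeholder
  DEPLOYMENT_ID = "<deployment-id>"      # ID of the deployed custom model

  response = requests.post(
      f"{BASE_URL}/ml/v1/deployments/{DEPLOYMENT_ID}/text/chat",
      params={"version": "2024-05-31"},  # date-based API version (assumption)
      headers={"Authorization": f"Bearer {TOKEN}"},
      json={
          "messages": [
              {"role": "system", "content": "You are a helpful support agent."},
              {"role": "user", "content": "How do I reset my password?"},
          ]
      },
  )
  print(response.json()["choices"][0]["message"]["content"])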

Train, deploy, and inference custom foundation models that are fine-tuned with PEFT techniques
You can now train, deploy, inference, and manage custom foundation models that are fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) techniques in a more flexible way. By deploying models that are fine-tuned with non-quantized (LoRA) or quantized (QLoRA) adapters, you can save resources while taking advantage of the flexibility and customization options that a custom foundation model provides.

To learn more about fine tuning models, see Methods for tuning foundation models.

To learn more about deploying fine-tuned custom foundation models, see Deploying custom foundation models fine-tuned with PEFT.
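
After the fine-tuned model is deployed, inferencing it should follow the same deployment-scoped pattern as other deployed models. The following sketch assumes a placeholder deployment ID and the deployment text generation endpoint; confirm the details in Deploying custom foundation models fine-tuned with PEFT.

  import requests

  BASE_URL = "https://<cluster-host>"    # placeholder
  TOKEN = "<bearer-token>"               # placeholder
  DEPLOYMENT_ID = "<deployment-id>"      # deployment of the PEFT-tuned model

  response = requests.post(
      f"{BASE_URL}/ml/v1/deployments/{DEPLOYMENT_ID}/text/generation",
      params={"version": "2024-05-31"},  # date-based API version (assumption)
      headers={"Authorization": f"Bearer {TOKEN}"},
      json={
          "input": "Classify the sentiment of: 'The update fixed my issue.'",
          "parameters": {"max_new_tokens": 20},
      },
  )
  print(response.json()["results"][0]["generated_text"])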

Streamlined deployment for AutoAI for RAG experiments
From the AutoAI user interface, you can now save and deploy a RAG pattern as an AI service so that you can easily access the inferencing endpoint and test the pattern. This feature is available for RAG patterns developed using either the in-memory Chroma database or a Milvus database.

For more information, see Saving a RAG pattern.

Issues fixed in this release
The following issues were fixed in this release:
The codestral-2501 foundation model is not removed after being uninstalled
  • Issue: After you follow the instructions to remove the codestral-2501 foundation model from your deployment, the model remains on your cluster.
  • Resolution: The codestral-2501 model is now removed after it is uninstalled.
Customer-reported issues fixed in this release
For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak® for Data on the IBM Support website.
Deprecated features
The following features were deprecated in this release:
Deprecated foundation models
The following models are now deprecated and will be withdrawn in a future release:
  • granite-8b-japanese
Withdrawn foundation models
The following models are now withdrawn from the watsonx.ai service:
  • codellama-34b-instruct-hf
  • granite-13b-chat-v2
  • granite-20b-multilingual

For details, see Foundation model lifecycle.

Deprecation of Federated Learning
Federated Learning is deprecated and will be removed in a future release.

IBM watsonx Version 2.1.1

A new version of watsonx.ai was released in February 2025.

This release includes the following changes:

New features
This release of watsonx.ai includes the following features:
Save resources by training, deploying, and inferencing fine-tuned models with LoRA and QLoRA adapters

Fine tuning is the process of training a model on additional training data by updating the weights of the model, which results in a completely new model. Fine tuning is a resource-intensive process and normally requires multiple GPUs due to memory requirements. To save resources, you can now use parameter-efficient fine tuning (PEFT) techniques such as low-rank adaptation (LoRA) and quantized low-rank adaptation (QLoRA) to train, deploy, and inference your fine-tuned models programmatically.

The following foundation models are now available for you to customize by using fine tuning techniques:

  • granite-3-1-8b-base
  • llama-3-1-8b
  • llama-3-1-70b
  • llama-3-1-70b-gptq

To learn more about deploying and inferencing a fine-tuned model with LoRA and QLoRA adapters, see Deploying a parameter-efficient fine-tuned (PEFT) model.

Deploy generative AI applications with AI services
You can now use AI services in watsonx.ai to deploy your applications. An AI service is a deployable unit of code that you can use to capture the logic of your generative AI use cases. While Python functions are the traditional way to deploy machine learning assets, AI services offer a more flexible option for deploying code for generative AI applications, with support for capabilities such as streaming. When your AI service is successfully deployed, you can use its endpoint for inferencing from your application.

For more information, see Deploying AI services.
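
Conceptually, an AI service is a Python function that closes over your application logic and returns one or more request handlers. The shape below is a minimal illustrative sketch; the actual contract (function signature, context object, streaming handler) is defined in Deploying AI services.

  # Minimal sketch of an AI service (illustrative; names are assumptions).
  def deployable_ai_service(context, **custom_params):
      # One-time setup runs here: read credentials from the context,
      # connect to a vector store, initialize clients, and so on.

      def generate(context) -> dict:
          # Per-request handler: read the payload, run your generative
          # AI logic, and return a JSON-serializable response.
          payload = context.get_json()        # assumption: request body accessor
          question = payload.get("question", "")
          answer = f"You asked: {question}"   # replace with a model call
          return {"body": {"answer": answer}}

      # A streaming variant can also be returned, which is one reason
      # AI services are more flexible than plain Python functions for
      # generative AI applications.
      return generate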

Deploy and inference DeepSeek-R1 distilled models with watsonx.ai for accelerated development
You can now securely deploy and inference the distilled variants of DeepSeek-R1, a powerful open-source reasoning model, with watsonx.ai, which enables you to accelerate the development of AI-powered solutions. The DeepSeek-R1 distilled models can be deployed as custom foundation models with watsonx.ai.

To learn more about deploying custom foundation models, see Requirements for deploying custom foundation models.

Enhancements for AutoAI for RAG experiments
These features for AutoAI RAG experiments are new with this release:
  • You can now add documents in Markdown format to your document collections.
  • You can exercise greater control over chunking settings for creating RAG patterns.
  • You can use either a standalone Milvus vector database or a Milvus database created with watsonx.ai for storing your indexed documents.

For more information, see Automating a RAG pattern with AutoAI.

Work with new foundation models in Prompt Lab
You can now use the following foundation models for inferencing from the API and from the Prompt Lab in watsonx.ai:
  • codestral-2501
  • llama-3-3-70b-instruct
  • mistral-large-instruct-2411
  • pixtral-large-instruct-2411

For details, see Supported foundation models.

Work with new Granite embedding models for text matching and retrieval tasks
You can now use the following embedding models to vectorize text in multiple languages:
  • granite-embedding-107m-multilingual-rtrvr
  • granite-embedding-278m-multilingual-rtrvr

For details, see Supported encoder models.
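
A call to one of these embedding models through the REST API might look like the following sketch. The endpoint path and field names are assumptions based on the watsonx.ai text embeddings API; verify exact values against the API reference.

  import requests

  BASE_URL = "https://<cluster-host>"    # placeholder
  TOKEN = "<bearer-token>"               # placeholder
  PROJECT_ID = "<project-id>"            # placeholder

  response = requests.post(
      f"{BASE_URL}/ml/v1/text/embeddings",
      params={"version": "2024-05-31"},  # date-based API version (assumption)
      headers={"Authorization": f"Bearer {TOKEN}"},
      json={
          "model_id": "granite-embedding-278m-multilingual-rtrvr",
          "project_id": PROJECT_ID,
          "inputs": ["Wie beantrage ich eine Rückerstattung?",
                     "How do I request a refund?"],
      },
  )
  # Each result carries a vector for the corresponding input string.
  vectors = [r["embedding"] for r in response.json()["results"]]
  print(len(vectors), len(vectors[0]))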

Use IBM Granite time series foundation models and the watsonx.ai API to forecast future values
Use the new time series API to pass historical data observations to an IBM Granite time series foundation model that can forecast future values with zero-shot inferencing. You can now work with the following time series models provided by IBM:
  • granite-ttm-512-96-r2
  • granite-ttm-1024-96-r2
  • granite-ttm-1536-96-r2

For details about how to use the forecast method of the watsonx.ai API, see Forecast future data values.
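
A forecast call passes recent observations in a columnar layout together with a schema that names the timestamp and target columns. The sketch below is illustrative; the exact endpoint and field names are documented in Forecast future data values. The granite-ttm-512-96-r2 model name suggests a 512-point history window and a 96-point forecast horizon, which this sketch assumes.

  import requests

  BASE_URL = "https://<cluster-host>"    # placeholder
  TOKEN = "<bearer-token>"               # placeholder
  PROJECT_ID = "<project-id>"            # placeholder

  # Columnar history data; in practice this holds the full history
  # window (for example, 512 rows). Field names are assumptions.
  history = {
      "date": ["2024-01-01T00:00:00", "2024-01-01T01:00:00"],  # ...
      "demand": [412.0, 401.5],                                # ...
  }

  response = requests.post(
      f"{BASE_URL}/ml/v1/time_series/forecast",
      params={"version": "2024-05-31"},  # date-based API version (assumption)
      headers={"Authorization": f"Bearer {TOKEN}"},
      json={
          "model_id": "granite-ttm-512-96-r2",
          "project_id": PROJECT_ID,
          "data": history,
          "schema": {"timestamp_column": "date", "target_columns": ["demand"]},
      },
  )
  print(response.json())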

Updates
The following updates were introduced in this release:
Use the text extraction API to process more file types, including files added directly in the API request body
You can extract text from the following file types in addition to PDF files:
  • GIF
  • JPG
  • PNG
  • TIFF

For details, see Extracting text from documents.

New versions of the Granite Instruct foundation models
The granite-3-2b-instruct and granite-3-8b-instruct foundation models are updated to version 1.1.0. The latest version can handle larger prompts and provides better support for coding tasks. The context window length (input + output) for prompts is increased to 131,072 tokens.

For details, see Supported foundation models.

New versions of the Granite Guardian foundation models
The granite-guardian-3-2b and granite-guardian-3-8b foundation models are updated to version 1.1.0. The latest versions of the models are trained with additional synthetic data to improve performance on detecting risks that are related to hallucination and jailbreaking. The context window length (input + output) for prompts is increased to 131,072 tokens.

For details, see Supported foundation models.

Issues fixed in this release
The following issues were fixed in this release:
Duplicate file names generate an error when training an AutoAI RAG pattern
  • Issue: The following message is generated during training: AutoAI RAG experiment failed with: Not unique document file names passed in connections.
  • Resolution: Remove any duplicate files or rename files with different content but the same file name before you run the experiment.
AutoAI RAG experiment can fail because too many tokens are submitted for some models
  • Issue: If you are running an AutoAI experiment for retrieval-augmented generation (RAG), the experiment might fail with this error:
    Failure during generate. (POST [Internal URL])
    Status code: 400, body: {"errors":[{"code":"invalid_input_argument",
    "message":"Invalid input argument for Model 'google/flan-ul2': the number
    of input tokens 5601 cannot exceed the total tokens limit 4096 for this model
  • Resolution: To resolve the issue, open the Experiment settings page for the experiment and try these configuration changes:
    • Deselect the smallest foundation model available for the experiment.
    • Disable the window retrieval method.
    Run the experiment again.
Customer-reported issues fixed in this release
For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.
Deprecated features
The following features were deprecated in this release:
Deprecated foundation models
The following models are now deprecated and will be withdrawn in a future release:
  • granite-13b-chat-v2
  • granite-20b-multilingual
  • codellama-34b-instruct-hf
  • llama-3-1-8b-instruct
  • llama-3-1-70b-instruct

For details, see Foundation model lifecycle.

IBM watsonx Version 2.1.0

A new version of watsonx.ai was released in December 2024.

This release includes the following changes:

New features
This release of watsonx.ai includes the following features:
New software specification for deploying custom foundation models
You can now deploy custom foundation models by using the new watsonx-cfm-caikit-1.1 software specification. This software specification is not available with every model architecture.

For details, see Requirements for deploying custom foundation models.

New model architectures for deploying custom foundation models
You can now deploy custom foundation models from the following model architectures with the vLLM runtime:
  • Bloom
  • Databricks
  • exaone
  • Falcon
  • Gemma
  • Gemma2
  • GPT_BigCode
  • GPT_Neox
  • GPTJ
  • GPT2
  • Granite
  • Jais
  • Llama
  • Marlin
  • Mistral
  • Mixtral
  • MPT
  • Nemotron
  • Olmo
  • Persimmon
  • Phi
  • Phi3
  • Qwen2

For more information, see Requirements for deploying custom foundation models.

Deploy custom foundation models on MIG-enabled clusters

You can now deploy custom foundation models on a cluster with Multi-Instance GPU (MIG) enablement. MIG is useful when you want to deploy an application that does not require the full power of an entire GPU.

Review the supported model architectures and hardware and software requirements for deployment. For more information, see Requirements for deploying custom foundation models on MIG-enabled clusters.

Deploy custom foundation models on specific GPU nodes
You can now deploy custom foundation models on specific GPU nodes when you have multiple GPU nodes available for deployment. To target a specific GPU node for your custom foundation model, create a customized hardware specification.

For more information, see Creating custom hardware specifications.

Automate the building of RAG patterns with AutoAI
Use AutoAI to automate the retrieval-augmented generation (RAG) process for a generative AI solution. Upload a collection of documents and transform them into vectors that can be used to improve the output from a large language model. Compare optimized pipelines to select the best RAG pattern for your application.
[Screen capture of an AutoAI RAG experiment summary]

For details, see Building RAG solutions with AutoAI.

Simplify complex documents by using the text extraction API
Simplify your complex documents so that they can be processed by foundation models as part of a generative AI workflow. The text extraction API uses document understanding processing to extract text from document structures such as images, diagrams, and tables that foundation models often cannot interpret correctly.

For details, see Extracting text from documents.

Chat with multimodal foundation models about images
Add an image to your prompt and chat about the content of the image with a multimodal foundation model that supports image-to-text tasks. You can chat about images from the Prompt Lab in chat mode or by using the Chat API.

For details, see Chatting with documents and images.
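
With the Chat API, an image can be passed as part of the message content, for example as a base64-encoded data URL. The content schema and the example model ID below follow the common chat-completions convention and are assumptions; see Chatting with documents and images for the supported format.

  import base64
  import requests

  BASE_URL = "https://<cluster-host>"    # placeholder
  TOKEN = "<bearer-token>"               # placeholder
  PROJECT_ID = "<project-id>"            # placeholder

  with open("chart.png", "rb") as f:
      image_b64 = base64.b64encode(f.read()).decode("utf-8")

  response = requests.post(
      f"{BASE_URL}/ml/v1/text/chat",
      params={"version": "2024-05-31"},  # date-based API version (assumption)
      headers={"Authorization": f"Bearer {TOKEN}"},
      json={
          "model_id": "llama-3-2-11b-vision-instruct",  # example ID (assumption)
          "project_id": PROJECT_ID,
          "messages": [{
              "role": "user",
              "content": [
                  {"type": "text", "text": "What trend does this chart show?"},
                  {"type": "image_url",
                   "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
              ],
          }],
      },
  )
  print(response.json()["choices"][0]["message"]["content"])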

Build conversational workflows with the watsonx.ai chat API
Use the watsonx.ai chat API to add generative AI capabilities, including agent-driven calls to third-party tools and services, into your applications.

For details, see Adding generative chat function to your apps and Building agent-driven chat workflows.
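
Agent-driven tool calls are expressed by describing the available tools in the chat request; the model can then respond with a tool call for your application to execute. The tool schema below follows the common function-calling convention and is an assumption, and get_weather is a hypothetical tool; see Building agent-driven chat workflows for the supported format.

  import requests

  BASE_URL = "https://<cluster-host>"    # placeholder
  TOKEN = "<bearer-token>"               # placeholder
  PROJECT_ID = "<project-id>"            # placeholder

  response = requests.post(
      f"{BASE_URL}/ml/v1/text/chat",
      params={"version": "2024-05-31"},  # date-based API version (assumption)
      headers={"Authorization": f"Bearer {TOKEN}"},
      json={
          "model_id": "mistral-small",   # example model (assumption)
          "project_id": PROJECT_ID,
          "messages": [
              {"role": "user", "content": "What is the weather in Boston?"}
          ],
          "tools": [{
              "type": "function",
              "function": {
                  "name": "get_weather",  # hypothetical tool
                  "description": "Look up the current weather for a city.",
                  "parameters": {
                      "type": "object",
                      "properties": {"city": {"type": "string"}},
                      "required": ["city"],
                  },
              },
          }],
      },
  )
  # If the model decides to call the tool, the response message contains
  # a tool call with arguments for your application to execute.
  print(response.json()["choices"][0]["message"])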

Add contextual information to foundation model prompts in Prompt Lab
Help a foundation model generate factual and up-to-date answers in RAG use cases by adding relevant contextual information to your prompt as grounding data. You can upload relevant documents or connect to a third-party vector store that has relevant data. When a new question is submitted, the question is used to query the grounding data for relevant facts. The top search results plus the original question are submitted as model input to help the foundation model incorporate relevant facts in its output.

For details, see Adding vectorized documents for grounding foundation model prompts.

Work with new foundation models in Prompt Lab
You can now use the following foundation models for inferencing from the API and from the Prompt Lab in watsonx.ai:
  • Granite Guardian 3.0 models in 2 billion and 8 billion parameter sizes
  • Granite Instruct 3.0 models in 2 billion and 8 billion parameter sizes
  • granite-20b-code-base-sql-gen
  • granite-20b-code-base-schema-linking
  • codestral-22b
  • Llama 3.2 Instruct models in 1 billion and 3 billion parameter sizes
  • Llama 3.2 Vision Instruct models in 11 billion and 90 billion parameter sizes
  • llama-guard-3-11b-vision
  • mistral-small
  • ministral-8b
  • pixtral-12b

For details, see Supported foundation models.

Work with new embedding models for text matching and retrieval tasks
You can now use the following embedding models to vectorize text in watsonx.ai:
  • all-minilm-l12-v2

For details, see Supported encoder models.

Enhance search and retrieval tasks with the text rerank API
Use the text rerank method in the watsonx.ai REST API together with the ms-marco-minilm-l-12-v2 reranker model to reorder a set of document passages based on their similarity to a specified query. Reranking adds precision to your answer retrieval workflows.

For details, see Reranking document passages.
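
A rerank request pairs a query with candidate passages and returns them scored by relevance. The field names below are assumptions based on the watsonx.ai text rerank API; see Reranking document passages for exact values.

  import requests

  BASE_URL = "https://<cluster-host>"    # placeholder
  TOKEN = "<bearer-token>"               # placeholder
  PROJECT_ID = "<project-id>"            # placeholder

  response = requests.post(
      f"{BASE_URL}/ml/v1/text/rerank",
      params={"version": "2024-05-31"},  # date-based API version (assumption)
      headers={"Authorization": f"Bearer {TOKEN}"},
      json={
          "model_id": "ms-marco-minilm-l-12-v2",
          "project_id": PROJECT_ID,
          "query": "How do I rotate an API key?",
          "inputs": [
              {"text": "API keys can be rotated from the access settings page."},
              {"text": "The service supports three pricing plans."},
          ],
      },
  )
  # Results are scored by similarity to the query; higher is more relevant.
  for result in response.json()["results"]:
      print(result)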

Review benchmarks that show how foundation models perform on common tasks

Review foundation model benchmarks to learn about the capabilities of foundation models deployed in watsonx.ai before you try them out. Compare how various foundation models perform on the tasks that matter most for your use case.

For details, see Foundation model benchmarks.

Configure AI guardrails for user input and foundation model output separately in Prompt Lab
Adjust the sensitivity of the AI guardrails that find and remove harmful content when you experiment with foundation model prompts in Prompt Lab. You can set different filter sensitivity levels for user input and model output text, and you can save effective AI guardrails settings in prompt templates.

For details, see Removing harmful content.

Updates
The following updates were introduced in this release:
New version of the granite-3b-code-instruct foundation model
The granite-3b-code-instruct foundation model is updated to version 2.0.0. The latest version can handle larger prompts. The context window length (input + output) for prompts is increased from 8,192 to 128,000 tokens.

For details, see Supported foundation models.

Issues fixed in this release
The following issues were fixed in this release:
Prompt template or prompt session loses the associated custom foundation model after an upgrade
  • Issue: After you upgrade the software, when you open a prompt template or prompt session asset for a custom foundation model that was saved in an earlier software version, the foundation model field is empty. The Model field shows No model selected.
  • Resolution: A prompt template asset saved in an earlier software version retains the associated custom foundation model after an upgrade.
Cannot preview chat and freeform prompt templates that are saved to a catalog
  • Issue: When you save a prompt template that was created in the Prompt Lab in structured mode, and then add the prompt template to a catalog, you can preview the prompt template asset from the Assets page of the catalog. However, when you try to preview a prompt template asset that was created in chat or freeform mode, a blank page is displayed.
  • Resolution: You can preview prompt templates that are created in chat and freeform modes and saved to a catalog.
Customer-reported issues fixed in this release
For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.
Deprecated features
The following features were deprecated in this release:
Deprecated foundation models
The following models are now deprecated and will be withdrawn in a future release:
  • granite-7b-lab
  • llama2-13b-dpo-v7
  • llama-3-8b-instruct
  • llama-3-70b-instruct
  • mt0-xxl-13b
Withdrawn foundation models
The following models are now withdrawn from the watsonx.ai service:
  • llama-2-70b-chat
  • merlinite-7b

For details, see Foundation model lifecycle.