What's new and changed in watsonx.ai
watsonx.ai updates can include new features and fixes. Releases are listed in reverse chronological order so that the latest release is at the beginning of the topic.
You can see a list of the new features for the platform and all of the services at What's new in IBM Software Hub.
IBM® watsonx™ Version 2.1.2
A new version of watsonx.ai was released in March 2025.
This release includes the following changes:
- New features
This release of watsonx.ai includes the following features:
- Work with new foundation models in Prompt Lab
- You can now use the following foundation models for inferencing from the API and from the
Prompt Lab in watsonx.ai:
- granite-3-2-8b-instruct
- granite-vision-3-2-2b
- mistral-small-24b-instruct-2501
For details, see Supported foundation models.
- Deploy custom foundation models globally
- Administrators can now deploy custom foundation models globally by using the watsonx.ai Inference Framework Manager (IFM)
operator, which makes these models available for use in all projects and deployment spaces. Globally
deployed models enable administrators to centrally manage and deploy custom models, while providing
data scientists and AI builders with access to these models across multiple projects and spaces.
These models can then be used for creating and deploying structured or freeform prompt templates in
all projects and spaces, and can be used for text generation or inferencing.
To learn more about inferencing custom foundation models that are deployed globally, see Inferencing deployed custom foundation models.
- Enhanced interaction with deployed custom foundation models through chat API
- You can now interact with your deployed custom foundation models through the chat API. Chatting with
a deployed custom foundation model provides you with a more natural and intuitive way to interact
with the model, making it easier to ask questions, receive answers, and complete tasks. You can use
the chat API to interact with your model for use cases such as customer service, language
translation, and content generation to drive business value and improve user satisfaction.
For more information, see Inferencing deployed custom foundation models.
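The request shape for chatting with a deployment can be sketched as follows. This is a minimal illustration, not the authoritative contract: the host, deployment ID, token, and API version value are placeholders, and the endpoint path is an assumption to verify against the watsonx.ai API reference.

```python
# Sketch: chatting with a deployed custom foundation model over REST.
# Host, deployment ID, token, and version are placeholder assumptions.
import json

def build_chat_payload(user_message, max_tokens=200):
    """JSON body for a chat request in the messages format."""
    return {
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

def chat_url(host, deployment_id, version="2025-02-01"):
    """Compose a per-deployment chat endpoint URL (assumed shape)."""
    return f"{host}/ml/v1/deployments/{deployment_id}/text/chat?version={version}"

payload = build_chat_payload("Summarize our refund policy in one sentence.")
url = chat_url("https://<cpd-host>", "<deployment-id>")

# Sending the request needs a valid bearer token, for example:
#   import requests
#   resp = requests.post(url, json=payload,
#                        headers={"Authorization": "Bearer <token>"})
#   print(resp.json())
print(json.dumps(payload, indent=2))
```

Because the body uses the messages format, multi-turn conversations are expressed by appending prior user and assistant turns to the `messages` list.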
- Train, deploy, and inference custom foundation models that are fine-tuned with PEFT techniques
- You can now train, deploy, inference, and manage your custom foundation models that are
fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) techniques in a more flexible way. By
deploying non-quantized (LoRA) or quantized (QLoRA) models using PEFT techniques, you can save
resources while taking advantage of the flexibility and customization options provided by using a
custom foundation model.
To learn more about fine tuning models, see Methods for tuning foundation models.
To learn more about deploying fine-tuned custom foundation models, see Deploying custom foundation models fine-tuned with PEFT.
- Streamlined deployment for AutoAI for RAG experiments
- From the AutoAI user interface, you can
now save and deploy a RAG pattern as an AI service so that you can easily access the inferencing
endpoint and test the pattern. This feature is available for RAG patterns developed using either the
in-memory Chroma database or a Milvus
database.
For more information, see Saving a RAG pattern.
- Issues fixed in this release
- The following issues were fixed in this release:
- The codestral-2501 foundation model is not removed after being uninstalled
- Issue: After you follow the instructions to remove the codestral-2501 foundation model from your deployment, the model remains on your cluster.
- Resolution: The codestral-2501 model is now removed after it is uninstalled.
- Customer-reported issues fixed in this release
- For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak® for Data on the IBM Support website.
- Deprecated features
- The following features were deprecated in this release:
- Deprecated foundation models
- The following models are now deprecated and will be withdrawn in a future release:
granite-8b-japanese
- Withdrawn foundation models
- The following models are now withdrawn from the watsonx.ai service:
- codellama-34b-instruct-hf
- granite-13b-chat-v2
- granite-20b-multilingual
- Deprecation of Federated Learning
- Federated Learning is deprecated and will be removed in a future release.
For details, see Foundation model lifecycle.
IBM watsonx Version 2.1.1
A new version of watsonx.ai was released in February 2025.
This release includes the following changes:
- New features
This release of watsonx.ai includes the following features:
- Save resources by training, deploying, and inferencing fine-tuned models with LoRA and QLoRA adapters
Fine tuning is the process of training a model on additional training data by updating the weights of the model, which results in a completely new model. Fine tuning is a resource-intensive process and normally requires you to have multiple GPUs due to memory requirements. To save resources, you can now use parameter-efficient fine tuning (PEFT) techniques such as low-rank adaptation (LoRA) and quantized low-rank adaptation (QLoRA) to train, deploy, and inference your fine-tuned models programmatically.
The following foundation models are now available for you to customize by using fine tuning techniques:
- granite-3-1-8b-base
- llama-3-1-8b
- llama-3-1-70b
- llama-3-1-70b-gptq
To learn more about fine tuning foundation models, see Methods for tuning foundation models. To learn more about deploying and inferencing a fine-tuned model with LoRA and QLoRA adapters, see Deploying a parameter-efficient fine-tuned (PEFT) model.
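The resource savings from LoRA come from training small low-rank factors instead of a full weight matrix. This illustrative arithmetic (not watsonx.ai-specific) shows the difference for a single projection layer; the layer size and rank are example values.

```python
# Illustrative sketch: why LoRA adapters save resources versus full
# fine tuning. Instead of updating a full d_out x d_in weight matrix,
# LoRA trains two low-rank factors B (d_out x r) and A (r x d_in).
def full_finetune_params(d_out, d_in):
    """Trainable weights when the whole matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, r):
    """Trainable weights for a rank-r LoRA adapter on the same layer."""
    return d_out * r + r * d_in

# Example: one 4096 x 4096 projection layer with a rank-8 adapter.
full = full_finetune_params(4096, 4096)   # 16,777,216 trainable weights
lora = lora_params(4096, 4096, r=8)       # 65,536 trainable weights
print(f"LoRA trains {lora / full:.2%} of this layer's weights")
```

QLoRA goes further by also holding the frozen base weights in a quantized format, which reduces memory rather than the trainable parameter count.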
- Deploy generative AI applications with AI services
- You can now use AI services in watsonx.ai to deploy your applications. An AI
service is a deployable unit of code that you can use to capture the logic of your generative AI use
cases. While Python functions are the traditional way to deploy machine learning assets, AI services
offer a more flexible option to deploy code for generative AI applications, such as streaming. When
your AI services are successfully deployed, you can use the endpoint for inferencing from your
application.
For more information, see Deploying AI services.
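The shape of an AI service can be sketched as a function that returns a request handler. This is a hedged illustration of the general pattern: the handler name, the context interface, and the response fields here are assumptions, so confirm the exact contract in the watsonx.ai documentation before deploying.

```python
# Sketch of the AI service pattern: a deployable unit of code is an
# outer function that returns a per-request handler. Field names and
# the context interface are assumptions for illustration only.
def deployable_ai_service(context, **custom):
    """Outer function runs once at deployment time; it can capture
    configuration (for example, a model ID) in its closure."""
    model_id = custom.get("model_id", "example/model")

    def generate(context):
        """Inner handler runs per request and returns the response body."""
        question = context.get_json().get("question", "")
        # A real service would call a foundation model here; we echo instead.
        answer = f"[{model_id}] You asked: {question}"
        return {"body": {"answer": answer}}

    return generate

# Local smoke test with a stand-in context object.
class FakeContext:
    def __init__(self, payload):
        self._payload = payload
    def get_json(self):
        return self._payload

handler = deployable_ai_service(None, model_id="ibm/granite-3-8b-instruct")
print(handler(FakeContext({"question": "What is an AI service?"})))
```

Testing the handler locally with a stand-in context, as above, is a useful habit before deploying, because the closure-based structure behaves the same way once hosted.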
- Deploy and inference DeepSeek-R1 distilled models with watsonx.ai for accelerated development
- You can now use the distilled variants of DeepSeek-R1, a powerful open-source reasoning model, to securely deploy and inference DeepSeek-R1 models with watsonx.ai, which enables you to accelerate the development of AI-powered solutions. The DeepSeek-R1 model can be deployed as a custom foundation model with watsonx.ai.
To learn more about deploying custom foundation models, see Requirements for deploying custom foundation models.
- Enhancements for AutoAI for RAG experiments
- These features for AutoAI RAG experiments
are new with this release:
- You can now add documents in Markdown format for your document collections.
- You can exercise greater control over chunking settings for creating RAG patterns.
- Use either a standalone Milvus vector database or a Milvus database created with watsonx.ai for storing your indexed documents.
For more information, see Automating a RAG pattern with AutoAI.
- Work with new foundation models in Prompt Lab
- You can now use the following foundation models for inferencing from the API and from the
Prompt Lab in watsonx.ai:
- codestral-2501
- llama-3-3-70b-instruct
- mistral-large-instruct-2411
- pixtral-large-instruct-2411
For details, see Supported foundation models.
- Work with new Granite embedding models for text matching and retrieval tasks
- You can now use the following embedding models to vectorize text in multiple languages:
- granite-embedding-107m-multilingual-rtrvr
- granite-embedding-278m-multilingual-rtrvr
For details, see Supported encoder models.
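A request to vectorize text with one of these models can be sketched as below. The endpoint path, version value, and project ID are placeholder assumptions; only the payload-building step runs here.

```python
# Sketch: request body for vectorizing multilingual text with a Granite
# embedding model through the watsonx.ai API. Endpoint, version, and
# project ID are placeholder assumptions.
def build_embeddings_payload(texts, model_id, project_id):
    """JSON body for a text embeddings request."""
    return {
        "model_id": model_id,
        "project_id": project_id,
        "inputs": texts,
    }

payload = build_embeddings_payload(
    ["Wie funktioniert die Suche?", "How does retrieval work?"],
    model_id="granite-embedding-278m-multilingual-rtrvr",
    project_id="<project-id>",
)
# POST {host}/ml/v1/text/embeddings?version=... with this body and a
# bearer token; the response contains one vector per input string.
print(len(payload["inputs"]), "texts to embed")
```

Because these models are multilingual, queries and documents in different languages can be embedded into the same vector space for cross-lingual retrieval.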
- Use IBM Granite time series foundation models and the watsonx.ai API to forecast future values
- Use the new time series API to pass historical data observations to an IBM Granite time series
foundation model that can forecast future values with zero-shot inferencing. You can now work with
the following time series models provided by IBM:
- granite-ttm-512-96-r2
- granite-ttm-1024-96-r2
- granite-ttm-1536-96-r2
For details about how to use the forecast method of the watsonx.ai API, see Forecast future data values.
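Passing historical observations to the forecast method can be sketched as follows. The schema field names and the column names are assumptions used for illustration; check the watsonx.ai API reference for the authoritative request shape.

```python
# Sketch: building a request body for the time series forecast method.
# Schema field names ("timestamp_column", "target_columns") and column
# names are assumptions; see the watsonx.ai API reference.
from datetime import datetime, timedelta

def build_forecast_payload(timestamps, values, model_id, project_id):
    """JSON body pairing historical observations with a column schema."""
    return {
        "model_id": model_id,
        "project_id": project_id,
        "schema": {"timestamp_column": "date", "target_columns": ["value"]},
        "data": {"date": timestamps, "value": values},
    }

# granite-ttm-512-96-r2 is named for its shape: 512 history points in,
# 96 forecast points out, with zero-shot inferencing (no training).
start = datetime(2024, 1, 1)
stamps = [(start + timedelta(hours=i)).isoformat() for i in range(512)]
history = [100.0 + 5.0 * (i % 24) for i in range(512)]  # toy hourly cycle
payload = build_forecast_payload(stamps, history,
                                 "granite-ttm-512-96-r2", "<project-id>")
```

The three model variants differ in how many history points they consume (512, 1024, or 1536), so the length of the supplied series determines which variant fits best.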
- Updates
- The following updates were introduced in this release:
- Use the text extraction API to process more file types, including files added directly in the API request body
- You can extract text from the following file types in addition to PDF files:
- GIF
- JPG
- PNG
- TIFF
For details, see Extracting text from documents.
- New versions of the Granite Instruct foundation models
- The granite-3-2b-instruct and granite-3-8b-instruct foundation models are updated to version 1.1.0. The latest modifications can handle larger prompts and provide better support for coding tasks. The context window length (input + output) for prompts is increased to 131,072 tokens.
For details, see Supported foundation models.
- New versions of the Granite Guardian foundation models
- The granite-guardian-3-2b and granite-guardian-3-8b foundation models are updated to version 1.1.0. The latest versions of the models are trained with additional synthetic data to improve performance for risks that are related to hallucination and jailbreak. The context window length (input + output) for prompts is increased to 131,072 tokens.
For details, see Supported foundation models.
- Issues fixed in this release
- The following issues were fixed in this release:
- Duplicate file names generate an error when training an AutoAI RAG pattern
- Issue: The following message is generated during training: AutoAI RAG experiment failed with: Not unique document file names passed in connections.
- Resolution: Remove any duplicate files or rename files with different content but the same file name before you run the experiment.
- AutoAI RAG experiment can fail because too many tokens are submitted for some models
- Issue: If you are running an AutoAI
experiment for retrieval-augmented generation (RAG), the experiment might fail with this
error:
Failure during generate. (POST [Internal URL]\nStatus code: 400, body: {"errors":[{"code":"invalid_input_argument","message":"Invalid input argument for Model \'google/flan-ul2\': the number of input tokens 5601 cannot exceed the total tokens limit 4096 for this model
- Resolution: To resolve the issue, open the Experiment settings page
for the experiment and try these configuration changes:
- Deselect the smallest foundation model available for the experiment.
- Disable the window retrieval method.
- Customer-reported issues fixed in this release
- For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.
- Deprecated features
- The following features were deprecated in this release:
- Deprecated foundation models
- The following models are now deprecated and will be withdrawn in a future release:
- granite-13b-chat-v2
- granite-20b-multilingual
- codellama-34b-instruct-hf
- llama-3-1-8b-instruct
- llama-3-1-70b-instruct
For details, see Foundation model lifecycle.
IBM watsonx Version 2.1.0
A new version of watsonx.ai was released in December 2024.
This release includes the following changes:
- New features
This release of watsonx.ai includes the following features:
- New software specification for deploying custom foundation models
- You can now deploy custom foundation models by using the new watsonx-cfm-caikit-1.1 software specification. This software specification is not available with every model architecture.
For details, see Requirements for deploying custom foundation models.
- New model architectures for deploying custom foundation models
- You can now deploy custom foundation models from the following model architectures with the vLLM runtime:
- Bloom
- Databricks
- exaone
- Falcon
- GPTJ
- Gemma
- Gemma2
- GPT_BigCode
- GPT_Neox
- GPT2
- Granite
- Jais
- Llama
- Marlin
- Mistral
- Mixtral
- MPT
- Nemotron
- Olmo
- Persimmon
- Phi
- Phi3
- Qwen2
For more information, see Requirements for deploying custom foundation models.
- Deploy custom foundation models on MIG-enabled clusters
You can now deploy custom foundation models on a cluster with Multi-Instance GPU (MIG) enablement. MIG is useful when you want to deploy an application that does not require the full power of an entire GPU.
Review the supported model architectures and hardware and software requirements for deployment. For more information, see Requirements for deploying custom foundation models on MIG-enabled clusters.
- Deploy custom foundation models on specific GPU nodes
- You can now deploy custom foundation models on specific GPU nodes when you have multiple GPU
nodes available for deployment. Review the process of creating a customized hardware specification
to use a specific GPU node for deploying your custom foundation model.
For more information, see Creating custom hardware specifications.
- Automate the building of RAG patterns with AutoAI
- Use AutoAI to automate the
retrieval-augmented generation (RAG) process for a generative AI solution. Upload a collection of
documents and transform them into vectors that can be used to improve the output from a large
language model. Compare optimized pipelines to select the best RAG pattern for your
application.
For details, see Building RAG solutions with AutoAI.
- Simplify complex documents by using the text extraction API
- Simplify your complex documents so that they can be processed by foundation models as part of a
generative AI workflow. The text extraction API uses document understanding processing to extract
text from document structures such as images, diagrams, and tables that foundation models often
cannot interpret correctly.
For details, see Extracting text from documents.
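A text extraction request pairs a source document reference with a results reference. The sketch below builds such a body; the field names follow the general document/results reference pattern, and the connection IDs and file paths are placeholder assumptions to verify against the API reference.

```python
# Sketch: request body for starting a text extraction job. Field names,
# connection IDs, and file paths are illustrative assumptions.
def build_extraction_payload(project_id, source, results):
    """JSON body pointing at an input document and an output location."""
    return {
        "project_id": project_id,
        "document_reference": source,
        "results_reference": results,
    }

payload = build_extraction_payload(
    "<project-id>",
    source={"type": "connection_asset",
            "connection": {"id": "<connection-id>"},
            "location": {"file_name": "reports/q3-summary.pdf"}},
    results={"type": "connection_asset",
             "connection": {"id": "<connection-id>"},
             "location": {"file_name": "reports/q3-summary.json"}},
)
# POST {host}/ml/v1/text/extractions?version=... starts the job; poll
# the job status until the extracted text is written to the results
# location, then feed that text to a foundation model prompt.
```

Extraction runs asynchronously because large documents can take a while to process, so the response returns a job to poll rather than the extracted text itself.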
- Chat with multimodal foundation models about images
- Add an image to your prompt and chat about the content of the image with a multimodal foundation
model that supports image-to-text tasks. You can chat about images from the Prompt Lab in chat mode or by using the Chat
API.
For details, see Chatting with documents and images.
- Build conversational workflows with the watsonx.ai chat API
- Use the watsonx.ai chat API to add
generative AI capabilities, including agent-driven calls to third-party tools and services, into
your applications.
For details, see Adding generative chat function to your apps and Building agent-driven chat workflows.
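An agent-driven chat request advertises tools that the model may choose to call. The sketch below shows the general messages-plus-tools shape; the tool name, its parameters, and the model and project IDs are illustrative assumptions, not part of the product documentation.

```python
# Sketch: a chat request that lets the foundation model drive a call to
# a third-party tool. Tool name and parameters are illustrative.
def build_tool_chat_payload(model_id, project_id, question):
    """JSON body for a chat request that advertises one function tool."""
    return {
        "model_id": model_id,
        "project_id": project_id,
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

payload = build_tool_chat_payload(
    "ibm/granite-3-8b-instruct", "<project-id>",
    "What is the weather in Rome?",
)
# If the model decides to use the tool, the response contains a tool
# call with arguments such as {"city": "Rome"}; your application runs
# the tool and sends the result back as a "tool" role message.
```

The loop of tool call, tool execution, and follow-up message is what turns a single chat request into an agent-driven workflow.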
- Add contextual information to foundation model prompts in Prompt Lab
- Help a foundation model generate factual and up-to-date answers in RAG use cases by adding
relevant contextual information to your prompt as grounding data. You can upload relevant documents
or connect to a third-party vector store that has relevant data. When a new question is submitted,
the question is used to query the grounding data for relevant facts. The top search results plus the
original question are submitted as model input to help the foundation model incorporate relevant
facts in its output.
For details, see Adding vectorized documents for grounding foundation model prompts.
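The grounding flow described above (query the grounding data, then prepend the top results to the prompt) can be sketched end to end. The toy keyword scorer below stands in for a real vector-store query; everything here is illustrative.

```python
# Sketch of the grounding flow: query the grounding data for relevant
# facts, then submit the top results plus the question as model input.
def retrieve(question, documents, top_k=2):
    """Toy keyword scorer standing in for a vector-store query."""
    words = question.lower().replace("?", "").split()
    scored = sorted(documents,
                    key=lambda d: -sum(w in d.lower() for w in words))
    return scored[:top_k]

def build_grounded_prompt(question, documents):
    """Prepend the top search results to the question as grounding data."""
    facts = retrieve(question, documents)
    context = "\n".join(f"- {f}" for f in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"

docs = [
    "Returns are accepted within 30 days of purchase.",
    "Shipping is free on orders over $50.",
    "Support hours are 9am to 5pm on weekdays.",
]
print(build_grounded_prompt("What are the support hours?", docs))
```

In Prompt Lab, the uploaded documents or connected vector store play the role of `docs`, and the retrieval step uses vector similarity rather than keyword matching.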
- Work with new foundation models in Prompt Lab
- You can now use the following foundation models for inferencing from the API and from the
Prompt Lab in watsonx.ai:
- Granite Guardian 3.0 models in 2 billion and 8 billion parameter sizes
- Granite Instruct 3.0 models in 2 billion and 8 billion parameter sizes
- granite-20b-code-base-sql-gen
- granite-20b-code-base-schema-linking
- codestral-22b
- Llama 3.2 Instruct models in 1 billion and 3 billion parameter sizes
- Llama 3.2 Vision Instruct models in 11 billion and 90 billion parameter sizes
- llama-guard-3-11b-vision
- mistral-small
- ministral-8b
- pixtral-12b
For details, see Supported foundation models.
- Work with new embedding models for text matching and retrieval tasks
- You can now use the following embedding models to vectorize text in watsonx.ai:
all-minilm-l12-v2
For details, see Supported encoder models.
- Enhance search and retrieval tasks with the text rerank API
- Use the text rerank method in the watsonx.ai REST API together with the ms-marco-minilm-l-12-v2 reranker model to reorder a set of document passages based on their similarity to a specified query. Reranking adds precision to your answer retrieval workflows.
For details, see Reranking document passages.
- Review benchmarks that show how foundation models perform on common tasks
Review foundation model benchmarks to learn about the capabilities of foundation models deployed in watsonx.ai before you try them out. Compare how various foundation models perform on the tasks that matter most for your use case.
For details, see Foundation model benchmarks.
- Configure AI guardrails for user input and foundation model output separately in Prompt Lab
- Adjust the sensitivity of the AI guardrails that find and remove harmful content when you
experiment with foundation model prompts in Prompt Lab. You can set different filter
sensitivity levels for user input and model output text, and you can save effective AI guardrails
settings in prompt templates.
For details, see Removing harmful content.
- Updates
- The following updates were introduced in this release:
- New version of the granite-3b-code-instruct foundation model
- The granite-3b-code-instruct foundation model is updated to version 2.0.0. The latest modification can handle larger prompts. The context window length (input + output) for prompts increased from 8,192 to 128,000 tokens.
For details, see Supported foundation models.
- Issues fixed in this release
- The following issues were fixed in this release:
- Prompt template or prompt session loses the associated custom foundation model after an upgrade
- Issue: After you upgrade the software, when you open a prompt template or prompt session asset for a custom foundation model that was saved in an earlier software version, the foundation model field is empty. The Model field shows No model selected.
- Resolution: A prompt template asset saved in an earlier software version retains the associated custom foundation model after an upgrade.
- Cannot preview chat and freeform prompt templates that are saved to a catalog
- Issue: When you save a prompt template that was created in the Prompt Lab in structured mode, and then add the prompt template to a catalog, you can preview the prompt template asset from the Assets page of the catalog. However, when you try to preview a prompt template asset that was created in chat or freeform mode, a blank page is displayed.
- Resolution: You can preview prompt templates that are created in chat and freeform modes and saved to a catalog.
- Customer-reported issues fixed in this release
- For a list of customer-reported issues that were fixed in this release, see the Fix List for IBM Cloud Pak for Data on the IBM Support website.
- Deprecated features
- The following features were deprecated in this release:
- Deprecated foundation models
- The following models are now deprecated and will be withdrawn in a future release:
- granite-7b-lab
- llama2-13b-dpo-v7
- llama-3-8b-instruct
- llama-3-70b-instruct
- mt0-xxl-13b
- Withdrawn foundation models
- The following models are now withdrawn from the watsonx.ai service:
- llama-2-70b-chat
- merlinite-7b
For details, see Foundation model lifecycle.