Foundation models in IBM watsonx.ai
You can deploy a collection of third-party and IBM models in IBM watsonx.ai.
GPU requirements
- NVIDIA A100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 94 GB RAM
- NVIDIA L40S GPUs with 48 GB RAM (Not supported with all models. See tables for details.)
- L40S: GPU memory requirement / 48. You can use only a 1, 2, 4, or 8 GPU configuration, so round up to the next supported configuration. For example, if the model needs 246 GB of GPU memory, 246/48 = 5.1, so the model needs 8 GPUs.
- A100/H100: GPU memory requirement / 80. You can use only a 1, 2, 4, or 8 GPU configuration, so round up to the next supported configuration. For example, if the model needs 246 GB of GPU memory, 246/80 = 3.1, so the model needs 4 GPUs.
You might be able to run some models with fewer GPUs at context lengths shorter than the maximum, or subject to other performance tradeoffs and constraints. If you use a configuration with fewer than the recommended number of GPUs, test the deployment to verify that performance is satisfactory before you use the configuration in production. If you use a configuration with more than the recommended number of GPUs, increase the number of CPUs as well; the number of CPUs should exceed the number of GPUs by at least one.
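The sizing rule above can be sketched as a small helper. The function name and memory table are illustrative, not part of the product:

```python
import math

# Supported GPU configurations for a watsonx.ai deployment.
SUPPORTED_GPU_COUNTS = (1, 2, 4, 8)

# GPU memory per card, in GB (L40S = 48; A100/H100 = 80).
GPU_MEMORY_GB = {"L40S": 48, "A100": 80, "H100": 80}

def gpus_needed(model_memory_gb: float, gpu_type: str) -> int:
    """Return the smallest supported GPU count whose combined memory
    covers the model's GPU memory requirement."""
    per_gpu = GPU_MEMORY_GB[gpu_type]
    raw = math.ceil(model_memory_gb / per_gpu)
    for count in SUPPORTED_GPU_COUNTS:
        if count >= raw:
            return count
    raise ValueError(f"{model_memory_gb} GB exceeds an 8x {gpu_type} node")

# The examples from the text: a model that needs 246 GB of GPU memory.
print(gpus_needed(246, "L40S"))  # 246/48 = 5.1 -> next supported count is 8
print(gpus_needed(246, "A100"))  # 246/80 = 3.1 -> next supported count is 4
```

Remember that the recommended CPU count for the chosen configuration is at least the GPU count plus one.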
You can optionally partition A100 or H100 GPU processors to add more than one foundation model to
a GPU. For more information, see Partitioning GPU processors in IBM watsonx.ai. Models that can be
partitioned indicate Yes for NVIDIA Multi-Instance GPU support in the foundation models table.
When you calculate the total number of GPUs that you need for your deployment, consider whether you plan to customize any foundation models by tuning them. For more information, see Planning for foundation model tuning in IBM watsonx.ai.
Provided foundation models
The following table lists the recommended number of GPUs to configure on a single OpenShift® worker node for the various foundation models that are provided with IBM watsonx.ai at the default context window length for each model. Minimum system requirements may vary based on the context length you set, the number of model parameters, the model parameters' precision, and more.
For details about the foundation models provided with IBM watsonx.ai, including the default context window length, see Supported foundation models.
| Foundation model | Description | System requirements | Supported GPUs | Group name |
|---|---|---|---|---|
| | A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support conversational tasks. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxAllam113bInstruct |
| | Code Llama is an AI model built on top of Llama 2, fine-tuned for generating and discussing code. | | | ibmwxCodellamaCodellama34bInstructHf |
| | Ideal for complex tasks that require large reasoning capabilities or are highly specialized. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxCodestral2501 |
| | Ideal for complex tasks that require large reasoning capabilities or are highly specialized. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxCodestral22B |
| | General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese. | | | ibmwxElyzaJapaneseLlama27bInstruct |
| | General use with zero- or few-shot prompts. Note: This foundation model can be prompt tuned. | | | ibmwxGoogleFlanT5xl |
| | General use with zero- or few-shot prompts. | | | ibmwxGoogleFlanT5xxl |
| | General use with zero- or few-shot prompts. | | | ibmwxGoogleFlanul2 |
| | InstructLab foundation model from IBM that supports knowledge and skills contributed by the open source community. | | | ibmwxGranite7bLab |
| | A pretrained instruct-variant model from IBM designed to work with Japanese text. | | | ibmwxGranite8bJapanese |
| | General use model from IBM that is optimized for dialogue use cases. | | | ibmwxGranite13bChatv2 |
| | General use model from IBM that is optimized for question and answer use cases. Note: This model can be prompt tuned. | | | ibmwxGranite13bInstructv2 |
| | The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well-suited for generative tasks. | | | ibmwxGranite20bMultilingual |
| | Granite 3.2 8b Instruct is a text-only model with reasoning capabilities that you can enable or disable to fit your use case. | | | ibmwxGranite328BInstruct |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGranite32BInstruct |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGranite38BInstruct |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGraniteGuardian32b |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGraniteGuardian38b |
| | A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxGranite3bCodeInstruct |
| | An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxGranite8bCodeInstruct |
| | A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxGranite20bCodeInstruct |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGranite20bCodeBaseSchemaLinking |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGranite20bCodeBaseSqlGen |
| | A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. | | | ibmwxGranite34bCodeInstruct |
| | Granite 3.2 Vision is an image-and-text-in, text-out model that can understand visual content such as charts, for enterprise computer vision use cases. | | | ibmwxGraniteVision322Bs |
| | General use foundation model for generative tasks in Arabic. | | | ibmwxCore42Jais13bChat |
| | A state-of-the-art refresh of the Llama 3.1 70B Instruct model that uses the latest advancements in post-training techniques. | | | ibmwxLlama3370BInstruct |
| | A pretrained and fine-tuned generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlama321bInstruct |
| | A pretrained and fine-tuned generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlama323bInstruct |
| | A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlama3211bVisionInstruct |
| | A pretrained and fine-tuned generative text model with 90 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlama3290bVisionInstruct |
| | A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlamaGuard311bVision |
| | An auto-regressive language model that uses an optimized transformer architecture. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxLlama318bInstruct |
| | An auto-regressive language model that uses an optimized transformer architecture. | | | ibmwxLlama3170bInstruct |
| | Meta's largest open-sourced foundation model to date, with 405 billion parameters, optimized for dialogue use cases. | | | ibmwxLlama3405bInstruct |
| | Pre-trained and instruction tuned generative text model optimized for dialogue use cases. | | | ibmwxMetaLlamaLlama38bInstruct |
| | Pre-trained and instruction tuned generative text model optimized for dialogue use cases. | | | ibmwxMetaLlamaLlama370bInstruct |
| | General use with zero- or few-shot prompts. Optimized for dialogue use cases. Note: This model can be prompt tuned. | | | ibmwxMetaLlamaLlama213bChat |
| | General use foundation model for generative tasks in Korean. | | | ibmwxMncaiLlama213bDpov7 |
| | Ideal for complex tasks that require large reasoning capabilities or are highly specialized. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxMinistral8BInstruct |
| | Mistral Small 3 (2501) sets a new benchmark in the small Large Language Models category (fewer than 70 billion parameters). With 24 billion parameters, the model achieves state-of-the-art capabilities comparable to larger models. | | | ibmwxMistralSmall24BInstruct2501 |
| | Ideal for complex tasks that require large reasoning capabilities or are highly specialized. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxMistralSmallInstruct |
| | The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxMistralLargeInstruct2411 |
| | The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxMistralLarge |
| | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B is not a commercial model and does not require a separate entitlement. | | | ibmwxMistralaiMixtral8x7bInstructv01 |
| | General use with zero- or few-shot prompts. Supports prompts in languages other than English and multilingual prompts. | | | ibmwxBigscienceMt0xxl |
| | A 124-billion parameter multimodal model built on top of Mistral Large 2 that demonstrates frontier-level image understanding. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxPixtralLargeInstruct |
| | A 12-billion parameter model pre-trained and fine-tuned for generative tasks in text and image domains. The model is optimized for multilingual use cases and provides robust performance in creative content generation. | | | ibmwxPixtral12b |
You cannot add deprecated or withdrawn models to your deployment. For more information about how deprecated and withdrawn models are handled, see Foundation model lifecycle.
Custom foundation models
- Full-service installation: Deploying custom foundation models
- Lightweight engine installation: Adding custom foundation models to watsonx.ai Lightweight Engine
Embedding and reranker models
| Model | System requirements | Group name |
|---|---|---|
| | | ibmwxAllMinilmL6V2 |
| | | ibmwxAllMinilmL12V2 |
| | | ibmwxGranite107MMultilingualRtrvr |
| | | ibmwxGranite278MMultilingualRtrvr |
| | | ibmwxMsMarcoMinilmL12V2 |
| | | ibmwxMultilingualE5Large |
| | | ibmwxSlate30mEnglishRtrvr |
| | | ibmwxSlate125mEnglishRtrvr |
Text extraction models
| Model | System requirements | Group name |
|---|---|---|
| | | Not necessary. The models are always downloaded because they have a small footprint. |
Time series foundation models
5.1.1 and later: You can use the time series API to pass historical data observations to a time series foundation model that can forecast future values. You can deploy the following time series foundation models:
| Model | System requirements | Group name |
|---|---|---|
| | | ibmwxGraniteTimeseriesTtmV1 |
| | | ibmwxGraniteTimeseriesTtmV1 |
| | | ibmwxGraniteTimeseriesTtmV1 |
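As a rough sketch, a forecast call packages historical observations into a JSON request body. The field names and model ID below are illustrative assumptions, not the documented request schema; check the watsonx.ai time series API reference for the exact format:

```python
import json

def build_forecast_payload(model_id, timestamps, values):
    """Package historical observations as a JSON-serializable request body.
    All field names here are hypothetical placeholders for illustration."""
    return {
        "model_id": model_id,
        "parameters": {"prediction_length": 96},
        "schema": {"timestamp_column": "date", "target_columns": ["value"]},
        "data": {
            "date": list(timestamps),
            "value": list(values),
        },
    }

payload = build_forecast_payload(
    "granite-timeseries-ttm-v1",  # hypothetical model ID
    [f"2024-01-{d:02d}" for d in range(1, 11)],
    [10.0, 11.5, 9.8, 12.1, 13.0, 12.4, 11.9, 14.2, 13.7, 15.0],
)
# This is the body you would POST to the time series forecast endpoint.
print(json.dumps(payload, indent=2))
```

The response would contain forecasted values for the configured prediction length.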
Foundation models compatible with LoRA and QLoRA fine tuning
5.1.1 and later: You can use Parameter-Efficient Fine Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) to train, deploy, and inference foundation models. The foundation models that are compatible with LoRA and QLoRA tuning can only be fine tuned. Unlike most large language models that are provided with IBM watsonx.ai, these models cannot be inferenced in the Prompt Lab or programmatically by using the API right away. The only way to inference one of these base models is to deploy the model as a custom foundation model.
You can deploy the following foundation models that are compatible with LoRA and QLoRA fine tuning:
| Model | System requirements | Supported GPUs | Group name |
|---|---|---|---|
| | | | ibmwxGranite318BBase |
| | | | ibmwxLlama318B |
| | | | ibmwxLlama318B |
| | | | ibmwxLlama3170BGptq |
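The LoRA technique described above can be illustrated with a minimal numeric sketch. The dimensions and scaling factor here are arbitrary examples, not watsonx.ai tuning defaults:

```python
import numpy as np

# Low-Rank Adaptation (LoRA): instead of updating a full d_out x d_in
# weight matrix W, train two small factors A (r x d_in) and B (d_out x r)
# so that the adapted weight is W + (alpha / r) * B @ A. Only A and B
# are trained; the base weights stay frozen.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 128, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init

delta = (alpha / r) * B @ A                # low-rank update (zero before training)
W_adapted = W + delta

# With B zero-initialized, the adapted model starts identical to the base model.
x = rng.standard_normal(d_in)
assert np.allclose(W @ x, W_adapted @ x)

# Trainable parameters: r * (d_in + d_out) instead of d_in * d_out.
print(r * (d_in + d_out), "vs", d_in * d_out)  # 1536 vs 8192
```

QLoRA applies the same idea while holding the frozen base weights in quantized form, which further reduces the GPU memory needed during tuning.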