Foundation models in IBM watsonx.ai

You can deploy a collection of third-party and IBM models in IBM watsonx.ai.

GPU requirements

One of the following types of GPUs is required to support the use of foundation models in IBM watsonx.ai:
  • NVIDIA A100 GPUs with 80 GB RAM
  • NVIDIA H100 GPUs with 80 GB RAM
  • NVIDIA H100 GPUs with 94 GB RAM
  • NVIDIA L40S GPUs with 48 GB RAM (Not supported with all models. See tables for details.)
Attention: You can install the IBM watsonx.ai service on the VMware vSphere platform with GPUs configured in passthrough mode. You cannot use virtual GPUs (vGPUs) with watsonx.ai™.
A general guideline for calculating the number of GPUs required for hosting the model is as follows:
  • L40S: GPU memory requirement / 48. You can use only a 1, 2, 4, or 8 GPU configuration, so round up to the next supported size. For example, if the model needs 246 GB of GPU memory, 246/48 ≈ 5.1, so the model needs 8 GPUs.
  • A100/H100: GPU memory requirement / 80. You can use only a 1, 2, 4, or 8 GPU configuration, so round up to the next supported size. For example, if the model needs 246 GB of GPU memory, 246/80 ≈ 3.1, so the model needs 4 GPUs.
To get an idea of the minimum GPU memory requirements, multiply the number of billion parameters of the model by 3. For example, for a foundation model with 12 billion parameters, multiply 12 by 3. An initial estimate of the memory required by the model is 36 GB. Then add 1 GB per 100,000 tokens in the context window length.
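To make the arithmetic concrete, the following Python sketch applies these guidelines. The 3 GB-per-billion-parameters factor, the 1 GB-per-100,000-tokens factor, and the supported GPU configurations come from the guidance above; the function and variable names are illustrative.

    # Rough sizing sketch based on the guidelines above; not an official sizing tool.
    SUPPORTED_GPU_COUNTS = (1, 2, 4, 8)            # only these configurations are supported
    GPU_MEMORY_GB = {"L40S": 48, "A100": 80, "H100": 80}

    def estimate_memory_gb(billions_of_parameters: float, context_window_tokens: int) -> float:
        """Initial estimate: 3 GB per billion parameters plus 1 GB per 100,000 context tokens."""
        return billions_of_parameters * 3 + context_window_tokens / 100_000

    def gpus_required(memory_gb: float, gpu_type: str) -> int:
        """Divide by per-GPU memory, then round up to the next supported configuration size."""
        raw = memory_gb / GPU_MEMORY_GB[gpu_type]
        return next(n for n in SUPPORTED_GPU_COUNTS if n >= raw)  # assumes the model fits on at most 8 GPUs

    print(estimate_memory_gb(12, 100_000))  # 12B parameters, 100K context -> 37.0 GB
    print(gpus_required(246, "L40S"))       # 246/48 ≈ 5.1 -> 8 GPUs
    print(gpus_required(246, "A100"))       # 246/80 ≈ 3.1 -> 4 GPUs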

You might be able to run some models with fewer GPUs at context lengths shorter than the maximum, or subject to other performance tradeoffs and constraints. If you use a configuration with fewer than the recommended number of GPUs, test the deployment to verify that performance is satisfactory before you use the configuration in production. If you use a configuration with more than the recommended number of GPUs, increase the number of CPUs accordingly; the number of CPUs should exceed the number of GPUs by at least one.

You can optionally partition A100 or H100 GPU processors to add more than one foundation model to a GPU. For more information, see Partitioning GPU processors in IBM watsonx.ai. Models that can be partitioned indicate Yes for NVIDIA Multi-Instance GPU support in the foundation models table.

Restriction: You cannot tune foundation models in NVIDIA Multi-Instance GPU enabled clusters.

When you calculate the total number of GPUs that you need for your deployment, consider whether you plan to customize any foundation models by tuning them. For more information, see Planning for foundation model tuning in IBM watsonx.ai.

Provided foundation models

The following table lists the recommended number of GPUs to configure on a single OpenShift® worker node for the various foundation models that are provided with IBM watsonx.ai at the default context window length for each model. Minimum system requirements might vary based on the context length that you set, the number of model parameters, the precision of the model parameters, and other factors.

For details about the foundation models provided with IBM watsonx.ai, including the default context window length, see Supported foundation models.

Note: You do not need to prepare these resources in addition to the overall service hardware requirements. If you meet the prerequisite hardware requirements for the service, you already have the resources you need. The following table describes the subset of resources that are required per model.
The following entries describe the provided foundation models that you can deploy after you install the service. Each entry lists the model name, model ID, a description, the system requirements (CPUs, memory, storage), the supported GPU configuration, NVIDIA Multi-Instance GPU support, and the group name.
Model name
allam-1-13b-instruct
Model ID
allam-1-13b-instruct
A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support conversational tasks.
CPUs
2
Memory
96 GB RAM
Storage
30 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
ibmwxAllam113bInstruct
Model name
codellama-34b-instruct-hf
Model ID
codellama-codellama-34b-instruct-hf
Restriction: Withdrawn in 5.1.2
Code Llama is an AI model built on top of Llama 2, fine-tuned for generating and discussing code.
CPUs
3
Memory
96 GB RAM
Storage
77 GB
Configuration
You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 2 NVIDIA L40S

The 2 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxCodellamaCodellama34bInstructHf
Model name
codestral-2501
Model ID
codestral-2501
New in 5.1.1
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
CPUs
2
Memory
96 GB RAM
Storage
30 GB
Configuration
You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100

The 2 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
Yes, with additional configuration. For details, see Installing models on GPU partitions.
ibmwxCodestral2501
Model name
codestral-22b
Model ID
codestral-22b
New in 5.1.0
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
CPUs
2
Memory
96 GB RAM
Storage
50 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
ibmwxCodestral22B
Model name
elyza-japanese-llama-2-7b-instruct
Model ID
elyza-japanese-llama-2-7b-instruct
General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese.
CPUs
2
Memory
96 GB RAM
Storage
50 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxElyzaJapaneseLlama27bInstruct
Model name
flan-t5-xl-3b
Model ID
google-flan-t5-xl
General use with zero- or few-shot prompts.
Note: This foundation model can be prompt tuned.
CPUs
2
Memory
128 GB RAM
Storage
21 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGoogleFlanT5xl
Model name
flan-t5-xxl-11b
Model ID
google-flan-t5-xxl
General use with zero- or few-shot prompts.
CPUs
2
Memory
128 GB RAM
Storage
52 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGoogleFlanT5xxl
Model name
flan-ul2-20b
Model ID
google-flan-ul2
General use with zero- or few-shot prompts.
CPUs
Based on the GPU type, the following number of CPUs are required:
  • 2 with NVIDIA A100
  • 2 with NVIDIA H100
  • 3 with NVIDIA L40S
Memory
128 GB RAM
Storage
85 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 2 NVIDIA L40S

The GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxGoogleFlanul2
Model name
granite-7b-lab
Model ID
ibm-granite-7b-lab
Restriction: Deprecated
InstructLab foundation model from IBM that supports knowledge and skills contributed by the open source community.
CPUs
2
Memory
96 GB RAM
Storage
30 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGranite7bLab
Model name
granite-8b-japanese
Model ID
ibm-granite-8b-japanese
Restriction: Deprecated
A pretrained instruct-variant model from IBM that is designed to work with Japanese text.
CPUs
2
Memory
96 GB RAM
Storage
50 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGranite8bJapanese
Model name
granite-13b-chat-v2
Model ID
ibm-granite-13b-chat-v2
Restriction: Withdrawn in 5.1.2
General use model from IBM that is optimized for dialogue use cases.
CPUs
2
Memory
128 GB RAM
Storage
36 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGranite13bChatv2
Model name
granite-13b-instruct-v2
Model ID
ibm-granite-13b-instruct-v2
General use model from IBM that is optimized for question and answer use cases.
Note: This model can be prompt tuned.
CPUs
2
Memory
128 GB RAM
Storage
62 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGranite13bInstructv2
Model name
granite-20b-multilingual
Model ID
ibm-granite-20b-multilingual
Restriction: Withdrawn in 5.1.2
The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well-suited for generative tasks.
CPUs
2
Memory
96 GB RAM
Storage
100 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Note: This foundation model cannot be sharded across multiple GPUs.
NVIDIA Multi-Instance GPU support
No
ibmwxGranite20bMultilingual
Model name
granite-3-2-8b-instruct
Model ID
granite-3-2-8b-instruct
New in 5.1.2
Granite 3.2 8B Instruct is a text-only model with reasoning capability that you can enable or disable to fit your use case.
CPUs
2
Memory
32 GB RAM
Storage
20 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 2 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
ibmwxGranite328BInstruct
Model name
granite-3-2b-instruct
Model ID
granite-3-2b-instruct
New in 5.1.0
Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.
CPUs
2
Memory
96 GB RAM
Storage
6 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGranite32BInstruct
Model name
granite-3-8b-instruct
Model ID
granite-3-8b-instruct
New in 5.1.0
Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.
CPUs
2
Memory
96 GB RAM
Storage
20 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGranite38BInstruct
Model name
granite-guardian-3-2b
Model ID
granite-guardian-3-2b
New in 5.1.0
Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.
CPUs
2
Memory
96 GB RAM
Storage
10 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGraniteGuardian32b
Model name
granite-guardian-3-8b
Model ID
granite-guardian-3-8b
New in 5.1.0
Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.
CPUs
2
Memory
96 GB RAM
Storage
20 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxGraniteGuardian38b
Model name
granite-3b-code-instruct
Model ID
granite-3b-code-instruct
A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.
CPUs
2
Memory
96 GB RAM
Storage
9 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
ibmwxGranite3bCodeInstruct
Model name
granite-8b-code-instruct
Model ID
granite-8b-code-instruct
An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.
CPUs
2
Memory
96 GB RAM
Storage
19 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
ibmwxGranite8bCodeInstruct
Model name
granite-20b-code-instruct
Model ID
granite-20b-code-instruct
A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.
CPUs
2
Memory
96 GB RAM
Storage
70 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
ibmwxGranite20bCodeInstruct
Model name
granite-20b-code-base-schema-linking
Model ID
granite-20b-code-base-schema-linking
New in 5.1.0
Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.
CPUs
2
Memory
96 GB RAM
Storage
44 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
ibmwxGranite20bCodeBaseSchemaLinking
Model name
granite-20b-code-base-sql-gen
Model ID
granite-20b-code-base-sql-gen
New in 5.1.0
Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community.
CPUs
2
Memory
96 GB RAM
Storage
44 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
ibmwxGranite20bCodeBaseSqlGen
Model name
granite-34b-code-instruct
Model ID
granite-34b-code-instruct
A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.
CPUs
2
Memory
96 GB RAM
Storage
78 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
ibmwxGranite34bCodeInstruct
Model name
granite-vision-3-2-2b
Model ID
granite-vision-3-2-2b
New in 5.1.2
Granite 3.2 Vision is an image-text-in, text-out model that can understand images such as charts, supporting enterprise computer vision use cases.
CPUs
2
Memory
32 GB RAM
Storage
7 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
NVIDIA Multi-Instance GPU support
No
ibmwxGraniteVision322Bs
Model name
jais-13b-chat
Model ID
core42-jais-13b-chat
General use foundation model for generative tasks in Arabic.
CPUs
2
Memory
96 GB RAM
Storage
60 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
Note: This foundation model cannot be sharded across multiple GPUs.
NVIDIA Multi-Instance GPU support
No
ibmwxCore42Jais13bChat
Model name
llama-3-3-70b-instruct
Model ID
llama-3-3-70b-instruct
New in 5.1.1
A state-of-the-art refresh of the Llama 3.1 70B Instruct model that uses the latest advancements in post-training techniques.
CPUs
3
Memory
96 GB RAM
Storage
75 GB
Configuration
You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 4 NVIDIA L40S

The GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxLlama3370BInstruct
Model name
llama-3-2-1b-instruct
Model ID
llama-3-2-1b-instruct
New in 5.1.0
A pretrained and fine-tuned generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and code output.
CPUs
2
Memory
96 GB RAM
Storage
10 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxLlama321bInstruct
Model name
llama-3-2-3b-instruct
Model ID
llama-3-2-3b-instruct
New in 5.1.0
A pretrained and fine-tuned generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and code output.
CPUs
2
Memory
96 GB RAM
Storage
8 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxLlama323bInstruct
Model name
llama-3-2-11b-vision-instruct
Model ID
llama-3-2-11b-vision-instruct
New in 5.1.0
A pretrained and fine-tuned multimodal model with 11 billion parameters that takes images and text as input and generates text, optimized for visual recognition, image reasoning, and captioning use cases.
CPUs
2
Memory
96 GB RAM
Storage
30 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 2 NVIDIA L40S

The GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxLlama3211bVisionInstruct
Model name
llama-3-2-90b-vision-instruct
Model ID
llama-3-2-90b-vision-instruct
New in 5.1.0
A pretrained and fine-tuned multimodal model with 90 billion parameters that takes images and text as input and generates text, optimized for visual recognition, image reasoning, and captioning use cases.
CPUs
16
Memory
246 GB RAM
Storage
200 GB
Configuration
You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100
  • 8 NVIDIA L40S

The 8 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxLlama3290bVisionInstruct
Model name
llama-guard-3-11b-vision
Model ID
llama-guard-3-11b-vision
New in 5.1.0
A fine-tuned model with 11 billion parameters for content safety classification that can screen both text and image inputs.
CPUs
2
Memory
96 GB RAM
Storage
30 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
ibmwxLlamaGuard311bVision
Model name
llama-3-1-8b-instruct
Model ID
llama-3-1-8b-instruct
Restriction: Deprecated in 5.1.1
An auto-regressive language model that uses an optimized transformer architecture.
CPUs
2
Memory
96 GB RAM
Storage
20 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
ibmwxLlama318bInstruct
Model name
llama-3-1-70b-instruct
Model ID
llama-3-1-70b-instruct
Restriction: Deprecated in 5.1.1
An auto-regressive language model that uses an optimized transformer architecture.
CPUs
16
Memory
246 GB RAM
Storage
163 GB
Configuration
You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA L40S

The 4 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxLlama3170bInstruct
Model name
llama-3-405b-instruct
Model ID
llama-3-405b-instruct
Meta's largest open-sourced foundation model to date, with 405 billion parameters, and optimized for dialogue use cases.
CPUs
16
Memory
246 GB RAM
Storage
500 GB
Configuration
You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100
  • 8 NVIDIA L40S

The 8 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxLlama3405bInstruct
Model name
llama-3-8b-instruct
Model ID
meta-llama-llama-3-8b-instruct
Restriction: Deprecated
Pre-trained and instruction tuned generative text model optimized for dialogue use cases.
CPUs
2
Memory
96 GB RAM
Storage
40 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxMetaLlamaLlama38bInstruct
Model name
llama-3-70b-instruct
Model ID
meta-llama-llama-3-70b-instruct
Restriction: Deprecated
Pre-trained and instruction tuned generative text model optimized for dialogue use cases.
CPUs
10
Memory
246 GB RAM
Storage
180 GB
Configuration
You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA L40S

The 4 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxMetaLlamaLlama370bInstruct
Model name
llama-2-13b-chat
Model ID
meta-llama-llama-2-13b-chat
General use with zero- or few-shot prompts. Optimized for dialogue use cases.
Note: This model can be prompt tuned.
CPUs
2
Memory
128 GB RAM
Storage
62 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxMetaLlamaLlama213bChat
Model name
llama2-13b-dpo-v7
Model ID
mncai-llama-2-13b-dpo-v7
Restriction: Deprecated
General use foundation model for generative tasks in Korean.
CPUs
2
Memory
96 GB RAM
Storage
30 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
NVIDIA Multi-Instance GPU support
Yes
ibmwxMncaiLlama213bDpov7
Model name
ministral-8b-instruct
Model ID
ministral-8b-instruct
New in 5.1.0
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
CPUs
2
Memory
96 GB RAM
Storage
35 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
Yes
ibmwxMinistral8BInstruct
Model name
mistral-small-24b-instruct-2501
Model ID
mistral-small-24b-instruct-2501
New in 5.1.2
Mistral Small 3 (2501) sets a new benchmark in the category of small large language models (fewer than 70 billion parameters). With 24 billion parameters, the model achieves state-of-the-art capabilities comparable to larger models.
CPUs
2
Memory
32 GB RAM
Storage
50 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
NVIDIA Multi-Instance GPU support
No
ibmwxMistralSmall24BInstruct2501
Model name
mistral-small-instruct
Model ID
mistral-small-instruct
New in 5.1.0
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
CPUs
2
Memory
96 GB RAM
Storage
50 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
ibmwxMistralSmallInstruct
Model name
mistral-large-instruct-2411
Model ID
mistral-large-instruct-2411
New in 5.1.1
The most advanced Large Language Model (LLM) developed by Mistral AI with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
CPUs
5
Memory
246 GB RAM
Storage
140 GB
Configuration
You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100

The 4 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxMistralLargeInstruct2411
Model name
mistral-large
Model ID
mistral-large
The most advanced Large Language Model (LLM) developed by Mistral AI with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
CPUs
16
Memory
246 GB RAM
Storage
240 GB
Configuration
You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100
  • 8 NVIDIA L40S

The 8 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxMistralLarge
Model name
mixtral-8x7b-instruct-v01
Model ID
mistralai-mixtral-8x7b-instruct-v01
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.

Mixtral-8x7B is not a commercial model and does not require a separate entitlement.

CPUs
3
Memory
96 GB RAM
Storage
195 GB
Configuration
You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 4 NVIDIA L40S

The GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxMistralaiMixtral8x7bInstructv01
Model name
mt0-xxl-13b
Model ID
bigscience-mt0-xxl
Restriction: Deprecated
General use with zero- or few-shot prompts. Supports prompts in languages other than English and multilingual prompts.
CPUs
2
Memory
128 GB RAM
Storage
62 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Note: This foundation model cannot be sharded across multiple GPUs.
NVIDIA Multi-Instance GPU support
Yes
ibmwxBigscienceMt0xxl
Model name
pixtral-large-instruct-2411
Model ID
pixtral-large-instruct
New in 5.1.1
A 124-billion parameter multimodal model that is built on top of Mistral Large 2 and demonstrates frontier-level image understanding.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
CPUs
16
Memory
246 GB RAM
Storage
240 GB
Configuration
You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100

The 8 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxPixtralLargeInstruct
Model name
pixtral-12b
Model ID
pixtral-12b
New in 5.1.0
A 12-billion parameter model pre-trained and fine-tuned for generative tasks in text and image domains. The model is optimized for multilingual use cases and provides robust performance in creative content generation.
CPUs
2
Memory
96 GB RAM
Storage
30 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support
No
ibmwxPixtral12b

You cannot add deprecated or withdrawn models to your deployment. For more information about how deprecated and withdrawn models are handled, see Foundation model lifecycle.
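
After you deploy one of the provided models, you can inference it programmatically through the watsonx.ai API. The following Python sketch shows a minimal text generation request; the endpoint path, version parameter, and payload fields are assumptions based on the watsonx.ai inference API, and the host, token, and project ID are placeholders, so verify the details against the API reference for your release.

    # Minimal inference sketch; verify endpoint and payload against the API reference.
    import requests

    CPD_HOST = "https://cpd.example.com"   # hypothetical cluster URL
    TOKEN = "..."                          # bearer token obtained from the platform
    PROJECT_ID = "..."                     # your watsonx.ai project ID

    response = requests.post(
        f"{CPD_HOST}/ml/v1/text/generation",
        params={"version": "2024-05-02"},
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "model_id": "granite-3-8b-instruct",  # a model ID from the table above
            "input": "Summarize the benefits of foundation models in one sentence.",
            "parameters": {"max_new_tokens": 100},
            "project_id": PROJECT_ID,
        },
    )
    print(response.json())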

Custom foundation models

In addition to foundation models that are curated by IBM, you can upload and deploy your own foundation models. For more information about how to upload, register, and deploy a custom foundation model, see the custom foundation models documentation.

Embedding and reranker models

You can use the following text embedding and reranker models. Each entry lists the model name, model ID, system requirements (CPUs, memory, storage), and group name.
Model name
all-minilm-l6-v2
Model ID
all-minilm-l6-v2
CPUs
2
Memory
4 GB
Storage
1 GB
ibmwxAllMinilmL6V2
Model name
all-minilm-l12-v2
Model ID
all-minilm-l12-v2
New in 5.1.0
CPUs
2
Memory
4 GB
Storage
1 GB
ibmwxAllMinilmL12V2
Model name
granite-embedding-107m-multilingual
Model ID
granite-embedding-107m-multilingual
New in 5.1.1
CPUs
2
Memory
4 GB
Storage
2 GB
ibmwxGranite107MMultilingualRtrvr
Model name
granite-embedding-278m-multilingual
Model ID
granite-embedding-278m-multilingual
New in 5.1.1
CPUs
2
Memory
4 GB
Storage
2 GB
ibmwxGranite278MMultilingualRtrvr
Model name
ms-marco-MiniLM-L-12-v2
Model ID
ms-marco-minilm-l-12-v2
New in 5.1.0
CPUs
2
Memory
4 GB
Storage
10 GB
ibmwxMsMarcoMinilmL12V2
Model name
multilingual-e5-large
Model ID
multilingual-e5-large
CPUs
4
Memory
8 GB
Storage
10 GB
ibmwxMultilingualE5Large
Model name
slate-30m-english-rtrvr
Model ID
ibm-slate-30m-english-rtrvr
CPUs
2
Memory
4 GB
Storage
10 GB
ibmwxSlate30mEnglishRtrvr
Model name
slate-125m-english-rtrvr
Model ID
ibm-slate-125m-english-rtrvr
CPUs
2
Memory
4 GB
Storage
10 GB
ibmwxSlate125mEnglishRtrvr
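
The embedding models are inferenced through the text embeddings API rather than the generation API. A minimal sketch follows; the endpoint path and payload fields are assumptions, so check them against the watsonx.ai API reference for your release.

    # Minimal embeddings sketch; verify endpoint and payload against the API reference.
    import requests

    CPD_HOST = "https://cpd.example.com"   # hypothetical cluster URL
    TOKEN = "..."                          # bearer token
    PROJECT_ID = "..."                     # your watsonx.ai project ID

    response = requests.post(
        f"{CPD_HOST}/ml/v1/text/embeddings",
        params={"version": "2024-05-02"},
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "model_id": "ibm-slate-125m-english-rtrvr",  # a model ID from the table above
            "inputs": ["What GPUs does watsonx.ai support?"],
            "project_id": PROJECT_ID,
        },
    )
    print(response.json())  # each input string maps to one embedding vector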

Text extraction models

Text extraction is a programmatic method for extracting text from images, tables, and structured PDF documents by using the IBM watsonx.ai API. To use the text extraction API, you must install a set of machine learning models that perform the natural language understanding processing during text extraction. A request sketch follows the table below.
Note: You cannot install text extraction models with a watsonx.ai lightweight engine installation.
The following entry lists the model, its system requirements (CPUs, memory, storage), and group name.
Model name
A set of text extraction models that are represented by the wdu identifier.
Model ID
wdu
CPUs
9
Memory
31 GB
Storage
20 GB
Not necessary. The models are always downloaded because they have a small footprint.
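
As referenced above, the following is a minimal sketch of a text extraction request. The endpoint path and the reference structure are assumptions; in practice, both references typically point at files in a storage connection that is defined in your project, so verify the payload against the watsonx.ai API reference for your release.

    # Minimal text extraction sketch; verify endpoint and payload against the API reference.
    import requests

    CPD_HOST = "https://cpd.example.com"   # hypothetical cluster URL
    TOKEN = "..."                          # bearer token
    PROJECT_ID = "..."                     # your watsonx.ai project ID

    response = requests.post(
        f"{CPD_HOST}/ml/v1/text/extractions",
        params={"version": "2024-10-18"},
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "project_id": PROJECT_ID,
            # Input document and output location; both are illustrative and would
            # normally reference a storage connection asset in the project.
            "document_reference": {"type": "connection_asset", "location": {"file_name": "contract.pdf"}},
            "results_reference": {"type": "connection_asset", "location": {"file_name": "contract.json"}},
        },
    )
    print(response.json())  # returns an extraction job; poll its status until it completes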

Time series foundation models

5.1.1 and later

You can use the time series API to pass historical data observations to a time series foundation model that can forecast future values. You can deploy the following time series foundation models:

Each entry lists the model name, model ID, system requirements (CPUs, memory, storage), and group name.
Model name
granite-ttm-512-96-r2
Model ID
granite-ttm-512-96-r2
CPUs
2
Memory
4 GB
Storage
1 GB
ibmwxGraniteTimeseriesTtmV1
Model name
granite-ttm-1024-96-r2
Model ID
granite-ttm-1024-96-r2
CPUs
2
Memory
4 GB
Storage
1 GB
ibmwxGraniteTimeseriesTtmV1
Model name
granite-ttm-1536-96-r2
Model ID
granite-ttm-1536-96-r2
CPUs
2
Memory
4 GB
Storage
1 GB
ibmwxGraniteTimeseriesTtmV1
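
The numbers in each model name encode the context and forecast lengths: for example, granite-ttm-512-96-r2 consumes 512 historical observations per series and forecasts the next 96 values. The following is a minimal sketch of a forecast request; the endpoint path and payload fields are assumptions, so verify them against the watsonx.ai time series API reference for your release.

    # Minimal forecast sketch; verify endpoint and payload against the API reference.
    from datetime import datetime, timedelta
    import requests

    CPD_HOST = "https://cpd.example.com"   # hypothetical cluster URL
    TOKEN = "..."                          # bearer token
    PROJECT_ID = "..."                     # your watsonx.ai project ID

    # Build 512 hourly observations, the context length that granite-ttm-512-96-r2 expects.
    start = datetime(2024, 1, 1)
    history = {
        "date": [(start + timedelta(hours=i)).isoformat() for i in range(512)],
        "value": [float(i % 24) for i in range(512)],   # toy daily pattern
    }

    response = requests.post(
        f"{CPD_HOST}/ml/v1/time_series/forecast",
        params={"version": "2025-02-20"},
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "model_id": "granite-ttm-512-96-r2",  # a model ID from the table above
            "data": history,
            "schema": {"timestamp_column": "date", "target_columns": ["value"]},
            "project_id": PROJECT_ID,
        },
    )
    print(response.json())  # 96 forecast values per target column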

Foundation models compatible with LoRA and QLoRA fine tuning

5.1.1 and later

You can use Parameter-Efficient Fine Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) to train, deploy, and inference foundation models. The foundation models compatible with LoRA and QLoRA tuning can only be fine tuned. Unlike most large language models that are provided with IBM watsonx.ai, these models cannot be inferenced in the Prompt Lab or programmatically by using the API right away. The only way to inference one of these base models is to deploy the model as a custom foundation model.
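
For background, LoRA freezes the base model weights and trains small low-rank adapter matrices that are added to selected layers; QLoRA applies the same idea over a quantized base model (note the GPTQ variant in the table below). The following sketch illustrates the technique with the open source peft library; it is not the watsonx.ai tuning API, and the model name and hyperparameters are examples only.

    # Illustration of LoRA with the open source peft library; not the watsonx.ai tuning API.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Example base checkpoint; substitute the model that you are actually tuning.
    base = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-8b-base")

    config = LoraConfig(
        r=16,                                  # rank of the adapter matrices
        lora_alpha=32,                         # scaling factor for the adapter output
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the adapters train; base weights stay frozen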

You can deploy the following foundation models that are compatible with LoRA and QLoRA fine tuning:

Each entry lists the model name, model ID, fine-tuning method, system requirements (CPUs, memory, storage), supported GPU configuration, NVIDIA Multi-Instance GPU support, and group name.
Model name
granite-3-1-8b-base
Model ID
granite-3-1-8b-base
Fine tuning method
LoRA
CPUs
2
Memory
96 GB
Storage
20 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
NVIDIA Multi-Instance GPU support
No
ibmwxGranite318BBase
Model name
llama-3-1-8b
Model ID
llama-3-1-8b
Fine tuning method
LoRA
CPUs
2
Memory
96 GB
Storage
20 GB
Configuration
You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
NVIDIA Multi-Instance GPU support
No
ibmwxLlama318B
Model name
llama-3-1-70b
Model ID
llama-3-1-70b
Fine tuning method
LoRA
CPUs
16
Memory
246 GB
Storage
280 GB
Configuration
You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100

The 4 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxLlama318B
Model name
llama-3-1-70b-gptq
Model ID
llama-3-1-70b-gptq
Fine tuning method
QLoRA
CPUs
5
Memory
246 GB
Storage
40 GB
Configuration
You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100

The 4 GPUs must be hosted on a single OpenShift worker node.

NVIDIA Multi-Instance GPU support
No
ibmwxLlama3170BGptq