You can deploy a collection of third-party and IBM models in IBM watsonx.ai. You can deploy the following types of foundation models: provided foundation models, custom foundation models, embedding models, reranker models, time series foundation models, and foundation models that are compatible with LoRA and QLoRA fine tuning.
GPU requirements
One of the following types of GPUs is required to support the use of foundation models in IBM watsonx.ai:
- NVIDIA A100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 94 GB RAM
- NVIDIA L40S GPUs with 48 GB RAM (not supported with all models; see the tables for details)
Attention: You can install the IBM watsonx.ai service on the VMware vSphere platform with GPUs configured in passthrough mode. You cannot use virtual GPUs (vGPUs) with watsonx.ai.
A general guideline for calculating the number of GPUs required to host a foundation model is as follows:
- L40S: GPU memory requirement / 48. You can use only a 1, 2, 4, or 8 GPU configuration, so round up to the next valid configuration. For example, if the model needs 246 GB of GPU memory, 246/48 = 5.1, so the model needs 8 GPUs.
- A100/H100: GPU memory requirement / 80. You can use only a 1, 2, 4, or 8 GPU configuration, so round up to the next valid configuration. For example, if the model needs 246 GB of GPU memory, 246/80 = 3.1, so the model needs 4 GPUs.
To get an idea of the minimum GPU memory requirement, multiply the number of billions of parameters in the model by 3. For example, for a foundation model with 12 billion parameters, multiply 12 by 3 for an initial estimate of 36 GB. Then add 1 GB per 100,000 tokens of context window length. The sketch that follows applies these heuristics.
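The following Python sketch is a minimal implementation of the sizing heuristics above. It is an estimate only, not an official sizing tool: the multiply-by-3 rule and the 1 GB per 100,000 tokens rule come from this section, and the valid GPU counts (1, 2, 4, or 8 on a single worker node) follow the guideline above.

```python
import math

VALID_GPU_COUNTS = (1, 2, 4, 8)
GPU_MEMORY_GB = {"L40S": 48, "A100": 80, "H100": 80}

def estimate_model_memory_gb(billion_parameters: float, context_tokens: int) -> float:
    """Rough minimum GPU memory: 3 GB per billion parameters, plus
    1 GB per 100,000 tokens of context window length."""
    return billion_parameters * 3 + context_tokens / 100_000

def gpus_required(memory_gb: float, gpu_type: str) -> int:
    """Divide by the per-GPU memory and round up to the next supported
    configuration (1, 2, 4, or 8 GPUs on a single worker node)."""
    raw = math.ceil(memory_gb / GPU_MEMORY_GB[gpu_type])
    for count in VALID_GPU_COUNTS:
        if count >= raw:
            return count
    raise ValueError(f"Model does not fit on 8 {gpu_type} GPUs in one node")

# Example: a 70-billion parameter model with a 128,000-token context window.
memory = estimate_model_memory_gb(70, 128_000)  # 211.28 GB
print(gpus_required(memory, "A100"))            # 4
print(gpus_required(memory, "L40S"))            # 8
```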
You might be able to run some models with fewer GPUs at context lengths other than the maximum, or subject to other performance tradeoffs and constraints. If you use a configuration with fewer than the recommended number of GPUs, test the deployment to verify that the performance is satisfactory before you use the configuration in production. If you use a configuration with more than the recommended number of GPUs, increase the number of CPUs as well; the number of CPUs should exceed the number of GPUs by at least one.
You can optionally partition A100 or H100 GPU processors to add more than one foundation model to
a GPU. For more information, see Partitioning GPU processors in IBM watsonx.ai. Models that can be
partitioned indicate Yes
for NVIDIA Multi-Instance GPU support in the foundation models table.
Restriction: You cannot tune foundation models in NVIDIA Multi-Instance GPU enabled clusters.
When you calculate the total number of GPUs that you need for your deployment, consider whether
you plan to customize any foundation models by tuning them. For more information, see Planning for foundation model tuning in IBM watsonx.ai.
Provided foundation models
The following table lists the recommended number of GPUs to configure on a single OpenShift® worker node for the various foundation
models that are provided with IBM
watsonx.ai at the default context window length for each
model. Minimum system requirements may vary based on the context length you set, the number of model
parameters, the model parameters' precision, and more.
For details about the foundation models provided with IBM
watsonx.ai, including the default
context window length, see Supported foundation models.
Note: You do not need to prepare these resources in addition to the overall service
hardware requirements. If you meet the prerequisite hardware requirements for the service, you
already have the resources you need. The following table describes the subset of resources that are
required per model.
- allam-1-13b-instruct
  - Status: Available
  - Model ID: allam-1-13b-instruct
  - Group: ibmwxAllam113bInstruct
  A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support conversational tasks.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 30 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | Yes |
- codestral-2501
  - Status: Available
  - Model ID: codestral-2501
  - Group: ibmwxCodestral2501
  Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
  Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 30 GB | 1 NVIDIA A100 or 1 NVIDIA H100 | Yes, with additional configuration. For details, see Installing models on GPU partitions. |
- codestral-22b
  - Status: Available
  - Model ID: codestral-22b
  - Group: ibmwxCodestral22B
  Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
  Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 50 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | No |
- elyza-japanese-llama-2-7b-instruct
  - Status: Available
  - Model ID: elyza-japanese-llama-2-7b-instruct
  - Group: ibmwxElyzaJapaneseLlama27bInstruct
  General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 50 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- flan-t5-xl-3b
  - Status: Available
  - Model ID: google-flan-t5-xl
  - Group: ibmwxGoogleFlanT5xl
  General use with zero- or few-shot prompts. Note: This foundation model can be prompt tuned.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 128 GB RAM | 21 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- flan-t5-xxl-11b
  - Status: Available
  - Model ID: google-flan-t5-xxl
  - Group: ibmwxGoogleFlanT5xxl
  General use with zero- or few-shot prompts.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 128 GB RAM | 52 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- flan-ul2-20b
  - Status: Available
  - Model ID: google-flan-ul2
  - Group: ibmwxGoogleFlanul2
  General use with zero- or few-shot prompts.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  |  | 128 GB RAM | 85 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 2 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | Yes |
- granite-7b-lab
  - Status: Deprecated
  - Model ID: ibm-granite-7b-lab
  - Group: ibmwxGranite7bLab
  InstructLab foundation model from IBM that supports knowledge and skills contributed by the open source community.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 30 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-8b-japanese
  - Status: Withdrawn in 5.2.0
  - Model ID: ibm-granite-8b-japanese
  - Group: ibmwxGranite8bJapanese
  A pretrained instruct variant model from IBM designed to work with Japanese text.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 50 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-13b-instruct-v2
  - Status: Available
  - Model ID: ibm-granite-13b-instruct-v2
  - Group: ibmwxGranite13bInstructv2
  General use model from IBM that is optimized for question and answer use cases. Note: This model can be prompt tuned.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 128 GB RAM | 62 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-3-3-8b-instruct
  - Status: Available in 5.2.0
  - Model ID: granite-3-3-8b-instruct
  - Group: ibmwxGranite338BInstruct
  An IBM-trained, dense decoder-only model, which is particularly well-suited for generative tasks.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 18 GB | 1 NVIDIA A100 or 1 NVIDIA H100 | No |
- granite-3-2-8b-instruct
  - Status: Available
  - Model ID: granite-3-2-8b-instruct
  - Group: ibmwxGranite328BInstruct
  A text-only model that is capable of reasoning. You can choose whether reasoning is enabled, based on your use case.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 32 GB RAM | 20 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 2 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- granite-3-2b-instruct
  - Status: Available
  - Model ID: granite-3-2b-instruct
  - Group: ibmwxGranite32BInstruct
  Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 6 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-3-8b-instruct
  - Status: Available
  - Model ID: granite-3-8b-instruct
  - Group: ibmwxGranite38BInstruct
  Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 20 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-guardian-3-2b
  - Status: Available
  - Model ID: granite-guardian-3-2b
  - Group: ibmwxGraniteGuardian32b
  Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 10 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-guardian-3-8b
  - Status: Available
  - Model ID: granite-guardian-3-8b
  - Group: ibmwxGraniteGuardian38b
  Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 20 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-3b-code-instruct
  - Status: Available
  - Model ID: granite-3b-code-instruct
  - Group: ibmwxGranite3bCodeInstruct
  A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 9 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | Yes |
- granite-8b-code-instruct
  - Status: Available
  - Model ID: granite-8b-code-instruct
  - Group: ibmwxGranite8bCodeInstruct
  An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 19 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | Yes |
- granite-20b-code-instruct
  - Status: Available
  - Model ID: granite-20b-code-instruct
  - Group: ibmwxGranite20bCodeInstruct
  A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 70 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | No |
- granite-20b-code-base-schema-linking
  - Status: Available
  - Model ID: granite-20b-code-base-schema-linking
  - Group: ibmwxGranite20bCodeBaseSchemaLinking
  Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 44 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | No |
- granite-20b-code-base-sql-gen
  - Status: Available
  - Model ID: granite-20b-code-base-sql-gen
  - Group: ibmwxGranite20bCodeBaseSqlGen
  Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 44 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | No |
- granite-34b-code-instruct
  - Status: Available
  - Model ID: granite-34b-code-instruct
  - Group: ibmwxGranite34bCodeInstruct
  A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 78 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | No |
- granite-vision-3-2-2b
  - Status: Available
  - Model ID: granite-vision-3-2-2b
  - Group: ibmwxGraniteVision322Bs
  Granite 3.2 Vision is an image-text-in, text-out model that can understand images, such as charts, for enterprise computer vision use cases.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 32 GB RAM | 7 GB | 1 NVIDIA A100 or 1 NVIDIA H100 | No |
- jais-13b-chat
  - Status: Available
  - Model ID: core42-jais-13b-chat
  - Group: ibmwxCore42Jais13bChat
  General use foundation model for generative tasks in Arabic.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 60 GB | 1 NVIDIA A100 or 1 NVIDIA H100 | No |
- llama-4-maverick-17b-128e-instruct-fp8
  - Status: Available in 5.2.0
  - Model ID: llama-4-maverick-17b-128e-instruct-fp8
  - Group: ibmwxLlama4Maverick17B128EInstructFp8
  The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 9 | 96 GB RAM | 425 GB | 8 NVIDIA A100 or 8 NVIDIA H100. All GPUs must be hosted on a single OpenShift worker node. | No |
- llama-4-scout-17b-16e-instruct
  - Status: Available in 5.2.0
  - Model ID: llama-4-scout-17b-16e-instruct
  - Group: ibmwxLlama4Scout17B16EInstruct
  The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 9 | 96 GB RAM | 215 GB | 8 NVIDIA A100 or 8 NVIDIA H100. All GPUs must be hosted on a single OpenShift worker node. | No |
- llama-3-3-70b-instruct
  - Status: Available
  - Model ID: llama-3-3-70b-instruct
  - Group: ibmwxLlama3370BInstruct
  A state-of-the-art refresh of the Llama 3.1 70B Instruct model that uses the latest advancements in post-training techniques.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 3 | 96 GB RAM | 75 GB | 2 NVIDIA A100, 2 NVIDIA H100, or 4 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- llama-3-2-1b-instruct
  - Status: Available
  - Model ID: llama-3-2-1b-instruct
  - Group: ibmwxLlama321bInstruct
  A pretrained and fine-tuned generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and code output.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 10 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- llama-3-2-3b-instruct
  - Status: Available
  - Model ID: llama-3-2-3b-instruct
  - Group: ibmwxLlama323bInstruct
  A pretrained and fine-tuned generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and code output.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 9 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- llama-3-2-11b-vision-instruct
  - Status: Available
  - Model ID: llama-3-2-11b-vision-instruct
  - Group: ibmwxLlama3211bVisionInstruct
  A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 30 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 2 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- llama-3-2-90b-vision-instruct
  - Status: Available
  - Model ID: llama-3-2-90b-vision-instruct
  - Group: ibmwxLlama3290bVisionInstruct
  A pretrained and fine-tuned generative text model with 90 billion parameters, optimized for multilingual dialogue use cases and code output.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 16 | 246 GB RAM | 200 GB | 8 NVIDIA A100, 8 NVIDIA H100, or 8 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- llama-guard-3-11b-vision
  - Status: Available
  - Model ID: llama-guard-3-11b-vision
  - Group: ibmwxLlamaGuard311bVision
  A pretrained Llama 3.2 11B model that is fine-tuned for content safety classification.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 30 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | No |
- llama-3-1-8b-instruct
  Note: The deprecation of this model is reversed in 5.2.0.
  - Status: Available
  - Model ID: llama-3-1-8b-instruct
  - Group: ibmwxLlama318bInstruct
  An auto-regressive language model that uses an optimized transformer architecture.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 20 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | Yes |
- llama-3-1-70b-instruct
  Note: The deprecation of this model is reversed in 5.2.0.
  - Status: Available
  - Model ID: llama-3-1-70b-instruct
  - Group: ibmwxLlama3170bInstruct
  An auto-regressive language model that uses an optimized transformer architecture.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 16 | 246 GB RAM | 163 GB | 4 NVIDIA A100, 4 NVIDIA H100, or 4 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- llama-3-405b-instruct
  - Status: Available
  - Model ID: llama-3-405b-instruct
  - Group: ibmwxLlama3405bInstruct
  Meta's largest open-sourced foundation model to date, with 405 billion parameters, optimized for dialogue use cases.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 16 | 246 GB RAM | 500 GB | 8 NVIDIA A100, 8 NVIDIA H100, or 8 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- llama-3-8b-instruct
  - Status: Deprecated
  - Model ID: meta-llama-llama-3-8b-instruct
  - Group: ibmwxMetaLlamaLlama38bInstruct
  Pretrained and instruction tuned generative text model optimized for dialogue use cases.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 40 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- llama-3-70b-instruct
  - Status: Deprecated
  - Model ID: meta-llama-llama-3-70b-instruct
  - Group: ibmwxMetaLlamaLlama370bInstruct
  Pretrained and instruction tuned generative text model optimized for dialogue use cases.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 10 | 246 GB RAM | 180 GB | 4 NVIDIA A100, 4 NVIDIA H100, or 4 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- llama-2-13b-chat
  - Status: Available
  - Model ID: meta-llama-llama-2-13b-chat
  - Group: ibmwxMetaLlamaLlama213bChat
  General use with zero- or few-shot prompts. Optimized for dialogue use cases. Note: This model can be prompt tuned.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 128 GB RAM | 62 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- llama2-13b-dpo-v7
  - Status: Withdrawn in 5.2.0
  - Model ID: mncai-llama-2-13b-dpo-v7
  - Group: ibmwxMncaiLlama213bDpov7
  General use foundation model for generative tasks in Korean.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 30 GB | 1 NVIDIA A100 or 1 NVIDIA H100 | Yes |
- ministral-8b-instruct
  - Status: Available
  - Model ID: ministral-8b-instruct
  - Group: ibmwxMinistral8BInstruct
  Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
  Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 35 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- mistral-small-3-1-24b-instruct-2503
  - Status: Available in 5.2.0
  - Model ID: mistral-small-3-1-24b-instruct-2503
  - Group: ibmwxMistralSmall3124BInstruct2503
  Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding, enhances long-context capabilities, and is suitable for function calling and agents.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 3 | 96 GB RAM | 105 GB | 2 NVIDIA A100 or 2 NVIDIA H100 | Yes |
- mistral-small-24b-instruct-2501
  - Status: Available
  - Model ID: mistral-small-24b-instruct-2501
  - Group: ibmwxMistralSmall24BInstruct2501
  Mistral Small 3 (2501) sets a new benchmark in the small large language model category with fewer than 70 billion parameters. At 24 billion parameters, the model achieves state-of-the-art capabilities comparable to larger models.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 32 GB RAM | 50 GB | 1 NVIDIA A100 or 1 NVIDIA H100 | No |
- mistral-small-instruct
  - Status: Available
  - Model ID: mistral-small-instruct
  - Group: ibmwxMistralSmallInstruct
  Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
  Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 50 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | No |
- mistral-large-instruct-2411
  - Status: Available
  - Model ID: mistral-large-instruct-2411
  - Group: ibmwxMistralLargeInstruct2411
  The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.
  Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 5 | 246 GB RAM | 140 GB | 4 NVIDIA A100 or 4 NVIDIA H100. All GPUs must be hosted on a single OpenShift worker node. | No |
- mistral-large
  - Status: Available
  - Model ID: mistral-large
  - Group: ibmwxMistralLarge
  The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.
  Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 16 | 246 GB RAM | 240 GB | 8 NVIDIA A100, 8 NVIDIA H100, or 8 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- mixtral-8x7b-instruct-v01
  - Status: Available
  - Model ID: mixtral-8x7b-instruct-v01
  - Group: ibmwxMistralaiMixtral8x7bInstructv01
  The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B is not a commercial model and does not require a separate entitlement.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 3 | 96 GB RAM | 195 GB | 2 NVIDIA A100, 2 NVIDIA H100, or 4 NVIDIA L40S. All GPUs must be hosted on a single OpenShift worker node. | No |
- mt0-xxl-13b
  - Status: Available
  - Model ID: bigscience-mt0-xxl
  - Group: ibmwxBigscienceMt0xxl
  General use with zero- or few-shot prompts. Supports prompts in languages other than English and multilingual prompts.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 128 GB RAM | 62 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- pixtral-large-instruct-2411
  - Status: Available
  - Model ID: pixtral-large-instruct
  - Group: ibmwxPixtralLargeInstruct
  A 124-billion parameter multimodal model built on top of Mistral Large 2 that demonstrates frontier-level image understanding.
  Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 16 | 246 GB RAM | 240 GB | 8 NVIDIA A100 or 8 NVIDIA H100. All GPUs must be hosted on a single OpenShift worker node. | No |
- pixtral-12b
  - Status: Available
  - Model ID: pixtral-12b
  - Group: ibmwxPixtral12b
  A 12-billion parameter model pretrained and fine-tuned for generative tasks in text and image domains. The model is optimized for multilingual use cases and provides robust performance in creative content generation.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 30 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | No |
You cannot add deprecated or withdrawn models to your deployment. For more information about how
deprecated and withdrawn models are handled, see Foundation model
lifecycle.
Custom foundation models
In addition to foundation models that are curated by IBM, you can upload and deploy your own foundation models. For more information about how to upload, register, and deploy a custom foundation model, see the custom foundation model documentation.
Embedding models
Text embedding models are small enough to run without a GPU. However, if you need better performance from the embedding models, you can configure them to use a GPU. A usage sketch follows the list of embedding models.
- all-minilm-l6-v2
  - Status: Available
  - Model ID: all-minilm-l6-v2
  - Group: ibmwxAllMinilmL6V2
  Use all-minilm-l6-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 1 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- all-minilm-l12-v2
  - Status: Available
  - Model ID: all-minilm-l12-v2
  - Group: ibmwxAllMinilmL12V2
  Use all-minilm-l12-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 1 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-embedding-107m-multilingual
  - Status: Available
  - Model ID: granite-embedding-107m-multilingual
  - Group: ibmwxGranite107MMultilingualRtrvr
  A 107 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings for a given input like a query, passage, or document.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 2 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- granite-embedding-278m-multilingual
  - Status: Available
  - Model ID: granite-embedding-278m-multilingual
  - Group: ibmwxGranite278MMultilingualRtrvr
  A 278 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings for a given input like a query, passage, or document.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 2 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- multilingual-e5-large
  - Status: Available
  - Model ID: multilingual-e5-large
  - Group: ibmwxMultilingualE5Large
  An embedding model built by Microsoft and provided by Hugging Face. The multilingual-e5-large model is useful for tasks such as passage or information retrieval, semantic similarity, bitext mining, and paraphrase retrieval.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 4 | 8 GB | 10 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- slate-30m-english-rtrvr
  - Status: Available
  - Model ID: ibm-slate-30m-english-rtrvr
  - Group: ibmwxSlate30mEnglishRtrvr
  The IBM provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 10 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- slate-125m-english-rtrvr
  - Status: Available
  - Model ID: ibm-slate-125m-english-rtrvr
  - Group: ibmwxSlate125mEnglishRtrvr
  The IBM provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 10 GB | 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
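As an illustration of how a deployed embedding model is called, the following sketch uses the ibm-watsonx-ai Python SDK. It is a sketch only, not part of this deployment documentation: the connection values are placeholders, and the class names, parameters, and the exact inference model ID should be verified against the SDK reference for your release.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

# Placeholder connection details for a Cloud Pak for Data deployment.
credentials = Credentials(
    url="https://<cpd-host>",
    username="<username>",
    api_key="<api-key>",
    instance_id="openshift",
    version="5.2",
)

embeddings = Embeddings(
    model_id="ibm/slate-125m-english-rtrvr",  # verify the inference ID for your release
    credentials=credentials,
    project_id="<project-id>",
)

# Each input text is encoded into one embedding vector.
vectors = embeddings.embed_documents(texts=["What GPUs does watsonx.ai support?"])
print(len(vectors), len(vectors[0]))  # number of vectors, vector dimensionality
```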
Reranker models
Reranker models are small enough to run without a GPU.
- ms-marco-MiniLM-L-12-v2
  - Status: Available
  - Model ID: ms-marco-minilm-l-12-v2
  - Group: ibmwxMsMarcoMinilmL12V2
  A reranker model built by Microsoft and provided by Hugging Face. Given query text and a set of document passages, the model ranks the list of passages from most to least related to the query.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 10 GB | This model does not require any GPU. | Not applicable. |
Text extraction models
- Text extraction
  - Status: Available
  - Model ID: wdu
  - Group: Not necessary. The models are always downloaded because they have a small footprint.
  A set of text extraction models that are represented by the "wdu" identifier.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 9 | 31 GB | 20 GB | These models do not require any GPU. | Not applicable. |
Time series foundation models
You can use the time series API to pass historical data observations to a time series foundation model that can forecast future values. You can deploy the following time series foundation models (a forecast request sketch follows the list):
- granite-ttm-512-96-r2
  - Status: Available
  - Model ID: granite-ttm-512-96-r2
  - Group: ibmwxGraniteTimeseriesTtmV1
  The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
- granite-ttm-1024-96-r2
  - Status: Available
  - Model ID: granite-ttm-1024-96-r2
  - Group: ibmwxGraniteTimeseriesTtmV1
  The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
- granite-ttm-1536-96-r2
  - Status: Available
  - Model ID: granite-ttm-1536-96-r2
  - Group: ibmwxGraniteTimeseriesTtmV1
  The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
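The following sketch shows the forecast flow described above by using the ibm-watsonx-ai Python SDK. Treat it as an assumption to verify against the SDK and API reference for your release: the connection values are placeholders, and the class name, method, and payload fields are not confirmed by this document.

```python
import pandas as pd
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import TSModelInference

# Placeholder connection details for a Cloud Pak for Data deployment.
credentials = Credentials(
    url="https://<cpd-host>",
    username="<username>",
    api_key="<api-key>",
    instance_id="openshift",
    version="5.2",
)

# 512 hourly observations, matching the context length implied by the
# granite-ttm-512-96-r2 name; the model forecasts 96 points per channel.
history = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=512, freq="h"),
    "demand": [float(i % 24) for i in range(512)],
})

ts_model = TSModelInference(
    model_id="granite-ttm-512-96-r2",
    credentials=credentials,
    project_id="<project-id>",
)

forecast = ts_model.forecast(
    data=history,
    params={"timestamp_column": "timestamp", "target_columns": ["demand"]},
)
print(forecast)
```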
Foundation models compatible with LoRA and QLoRA fine tuning
You can use Parameter-Efficient Fine Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) to train, deploy, and inference foundation models. The foundation models that are compatible with LoRA and QLoRA tuning can only be fine tuned. Unlike most large language models that are provided with IBM watsonx.ai, these models cannot be inferenced in the Prompt Lab or programmatically by using the API right away. The only way to inference one of these base models is to deploy the model as a custom foundation model.
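To make the LoRA technique concrete, here is a minimal sketch that uses the open source Hugging Face peft library. It illustrates the idea only and is not the watsonx.ai tuning API; the Hugging Face model path is an assumption that mirrors the granite-3-1-8b-base entry below.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the frozen base model (its weights stay fixed during LoRA training).
base = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-8b-base")

# LoRA inserts small trainable low-rank matrices (rank r) into the attention
# projections instead of updating all of the base model's parameters.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```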
You can deploy the following foundation models that are compatible with LoRA and QLoRA fine
tuning:
- granite-3-1-8b-base
  - Status: Available
  - Fine tuning method: LoRA
  - Model ID: granite-3-1-8b-base
  - Group: ibmwxGranite318BBase
  Granite 3.1 8b base is a pretrained autoregressive foundation model with a context length of 128k intended for tuning.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 20 GB | 1 NVIDIA A100 or 1 NVIDIA H100 | No |
- llama-3-1-8b
  - Status: Available
  - Fine tuning method: LoRA
  - Model ID: llama-3-1-8b
  - Group: ibmwxLlama318B
  Llama-3-1-8b is a pretrained and fine-tuned generative text model with 8 billion parameters, optimized for multilingual dialogue use cases and code output.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 2 | 96 GB RAM | 20 GB | 1 NVIDIA A100 or 1 NVIDIA H100 | No |
- llama-3-1-70b
  - Status: Available
  - Fine tuning method: LoRA
  - Model ID: llama-3-1-70b
  - Group: ibmwxLlama318B
  Llama-3-1-70b is a pretrained and fine-tuned generative text model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 16 | 246 GB RAM | 280 GB | 4 NVIDIA A100 or 4 NVIDIA H100 | No |
- llama-3-1-70b-gptq
  - Status: Available
  - Fine tuning method: QLoRA
  - Model ID: llama-3-1-70b-gptq
  - Group: ibmwxLlama3170BGptq
  Llama 3.1 70b is a pretrained and fine-tuned generative text base model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.

  | CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
  | --- | --- | --- | --- | --- |
  | 5 | 246 GB RAM | 40 GB | 4 NVIDIA A100 or 4 NVIDIA H100 | No |