You can deploy a collection of foundation models developed by IBM or third-party providers
in your cluster.
You can deploy the following types of foundation models in
IBM
watsonx.ai:
GPU requirements
One of the following types of GPUs is required to support the use of foundation models in
IBM
watsonx.ai:
- NVIDIA A100 GPUs with 80 GB
RAM
- NVIDIA H20 GPUs with 96 GB
RAM (Not supported with all models. See
tables for details.)
Restriction: You cannot use AutoAI to automate search for retrieval-augmented
generation (RAG) patterns with NVIDIA H20
GPUs.
- NVIDIA H100 GPUs with 80 GB
RAM
- NVIDIA H100 GPUs with 94 GB
RAM
- NVIDIA H200 GPUs with 141 GB
RAM
- NVIDIA L40S GPUs with 48 GB
RAM (Not supported with all models.
See tables for details.)
- NVIDIA RTX PRO 6000 GPUs with 96
GB RAM (Not supported with all
models. See tables for details.)
- Intel Gaudi 3 AI
Accelerator GPUs with 128 GB
RAM (Not supported with all
models. See tables for details.)
Restriction: You cannot tune foundation models with
Intel Gaudi 3 AI
Accelerator GPUs.
For details about how to deploy models with
Intel Gaudi 3 AI
Accelerator and
NVIDIA RTX PRO 6000 GPUs, see
Installing foundation models with a custom vLLM image.
A general guideline for calculating the number of GPUs required for hosting a foundation model is
as follows:
- NVIDIA L40S: GPU memory requirement / 48.
You can use only a 1, 2, 4, or 8 GPU configuration, so round up. For example, if the model needs 246
GB GPU memory, that's 246/48 = 5.1. The model needs 8 GPUs.
- Intel Gaudi 3 AI
Accelerator: GPU memory requirement /
128. You can use only a 1, 2, 4, or 8 GPU configuration, so round up. For example, if the model
needs 450 GB GPU memory, that's 450/128 = 3.5. The model needs 4 GPUs.
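The rounding rule in the guideline above can be sketched in Python. This is an illustrative helper only, not part of the product: the function name `gpus_required` and the lookup table are assumptions made for this example, and the per-GPU memory sizes are taken from the GPU list earlier in this topic.

```python
import math

# Per-GPU memory in GB, as listed in the GPU requirements above.
GPU_MEMORY_GB = {
    "NVIDIA L40S": 48,
    "Intel Gaudi 3 AI Accelerator": 128,
}

# Only these GPU counts are valid configurations.
ALLOWED_GPU_COUNTS = (1, 2, 4, 8)


def gpus_required(model_memory_gb: float, gpu_type: str) -> int:
    """Return the smallest allowed GPU count that covers the memory requirement."""
    per_gpu_gb = GPU_MEMORY_GB[gpu_type]
    # Divide the requirement by the per-GPU memory and round up to a whole GPU.
    whole_gpus = math.ceil(model_memory_gb / per_gpu_gb)
    # Then round up again to the next allowed configuration (1, 2, 4, or 8).
    for count in ALLOWED_GPU_COUNTS:
        if count >= whole_gpus:
            return count
    raise ValueError(
        f"{model_memory_gb} GB does not fit in 8 {gpu_type} GPUs"
    )


print(gpus_required(246, "NVIDIA L40S"))                    # 246/48 rounds up to 8
print(gpus_required(450, "Intel Gaudi 3 AI Accelerator"))   # 450/128 rounds up to 4
```

The two calls reproduce the worked examples in the guideline. Treat the result as a starting point and confirm it against the system requirements tables for the specific model.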
Attention: You can install the IBM
watsonx.ai service on the VMware
vSphere platform with GPUs configured in
passthrough mode. You cannot use virtual GPUs (vGPUs) with watsonx.ai™.
To get an idea of the minimum
GPU memory requirements, multiply the number of billion parameters of the model by 3. For example,
for a foundation model with 12 billion parameters, multiply 12 by 3. An initial estimate of the
memory required by the model is 36 GB. Then add 1 GB per 100,000 tokens in the context window
length.
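As a rough sketch, the estimate above can be written as a small Python function. The function name is hypothetical, and the example assumes that the context window allowance scales proportionally (1 GB per 100,000 tokens) rather than in whole-GB steps, which the guideline does not specify.

```python
def estimate_min_gpu_memory_gb(params_billions: float, context_tokens: int = 0) -> float:
    """Estimate minimum GPU memory in GB from the rule of thumb above."""
    # Initial estimate: 3 GB per billion parameters.
    base_gb = params_billions * 3
    # Assumption: the context allowance is proportional, 1 GB per 100,000 tokens.
    context_gb = context_tokens / 100_000
    return base_gb + context_gb


# A 12 billion parameter model, before the context window is considered.
print(estimate_min_gpu_memory_gb(12))            # 36
# The same model with a 200,000 token context window.
print(estimate_min_gpu_memory_gb(12, 200_000))   # 38.0
```

This is only a first approximation. Actual requirements also depend on factors such as parameter precision, so check the system requirements tables for each model.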
You might be able to run some models with fewer GPUs at context lengths other than the maximum or
subject to other performance tradeoffs and constraints. If you use a configuration with fewer than
the recommended number of GPUs, make sure to test the deployment to verify that the performance is
satisfactory before you use the configuration in production. If you use a configuration with more
than the recommended number of GPUs, make sure to increase the number of CPUs you use. It is
recommended that the number of CPUs exceeds the number of GPUs by one at a minimum.
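The CPU recommendation can be expressed as a simple check. The helper names below are hypothetical and exist only to illustrate the rule that the CPU count should exceed the GPU count by at least one.

```python
def min_recommended_cpus(gpu_count: int) -> int:
    """Return the minimum recommended CPU count for a given GPU count."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    # The number of CPUs should exceed the number of GPUs by at least one.
    return gpu_count + 1


def cpus_sufficient(cpu_count: int, gpu_count: int) -> bool:
    """Check a planned configuration against the recommendation."""
    return cpu_count >= min_recommended_cpus(gpu_count)


print(min_recommended_cpus(4))   # 5
print(cpus_sufficient(3, 4))     # False
```

For example, a configuration that is scaled up from 2 GPUs to 4 GPUs would need at least 5 CPUs under this rule.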
You can divide foundation models into smaller units called shards and run the model shards
on multiple GPUs. By default, the number of shards a foundation model is partitioned into is equal
to the minimum number of GPUs required to run the model. For information on the minimum number of
GPUs required for each model, see the model system requirements tables. To customize the number of
shards a model is partitioned into, see Changing foundation model sharding configuration.
You can optionally partition A100 or H100 GPU processors to add more than one foundation model to
a GPU. For more information, see Partitioning GPU processors in IBM watsonx.ai. Models that can be
partitioned indicate Yes for NVIDIA Multi-Instance GPU support in the foundation models' system
requirement tables.
Restriction: You cannot tune foundation models in NVIDIA Multi-Instance GPU enabled clusters.
When you calculate the total number of GPUs that you need for your deployment, consider whether
you plan to customize any foundation models by tuning them. For more information, see Planning for foundation model tuning in IBM watsonx.ai.
Foundation models
The following table lists the recommended number of GPUs to configure on a single OpenShift® worker node for the various foundation
models that are provided with IBM
watsonx.ai at the default context window length for each
model. Minimum system requirements may vary based on the context length you set, the number of model
parameters, the model parameters' precision, and more.
For details about how to use foundation models provided with IBM
watsonx.ai, including the
default context window length and capabilities, see Supported foundation models.
Note:
- You do not need to prepare these resources in addition to the overall service
hardware requirements. If you meet the prerequisite hardware requirements for the service, you
already have the resources you need.
- The foundation model system requirements describe the subset of resources that are required per
model. For models that require more than 1 GPU, all GPUs must be hosted on a single OpenShift worker node.
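For planning purposes, the single-node rule can be sketched as a check of one worker node against one model's requirements. The class and function names are hypothetical, and comparing raw node capacity is a simplification of what is actually allocatable on a node. The sample values are the llama-3-3-70b-instruct requirements from the tables in this topic.

```python
from dataclasses import dataclass


@dataclass
class ModelRequirement:
    cpus: int
    memory_gb: int
    gpus_by_type: dict  # GPU type -> number of GPUs required on one node


@dataclass
class WorkerNode:
    gpu_type: str
    gpu_count: int
    cpus: int
    memory_gb: int


def node_can_host(node: WorkerNode, req: ModelRequirement) -> bool:
    """Return True if this single node meets the model's per-model requirements."""
    needed_gpus = req.gpus_by_type.get(node.gpu_type)
    if needed_gpus is None:
        # The model does not list this GPU type as supported.
        return False
    # All required GPUs must be on this one node; they cannot span nodes.
    return (
        node.gpu_count >= needed_gpus
        and node.cpus >= req.cpus
        and node.memory_gb >= req.memory_gb
    )


# Requirements for llama-3-3-70b-instruct, copied from its table in this topic.
llama_3_3_70b = ModelRequirement(
    cpus=3,
    memory_gb=96,
    gpus_by_type={
        "NVIDIA A100": 2,
        "NVIDIA H20": 2,
        "NVIDIA H100": 2,
        "NVIDIA H200": 1,
        "NVIDIA L40S": 4,
    },
)

print(node_can_host(WorkerNode("NVIDIA H100", 2, 16, 256), llama_3_3_70b))  # True
print(node_can_host(WorkerNode("NVIDIA L40S", 2, 16, 256), llama_3_3_70b))  # False
```

The second node fails the check because the model needs 4 NVIDIA L40S GPUs on one node and the node has only 2.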
- allam-1-13b-instruct
-
- Status:
Deprecated
- Model ID:
allam-1-13b-instruct
A bilingual large language model
for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support
conversational tasks.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
30 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
Note: This model can be full fine tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
|
Yes |
- codestral-2501
-
- Status:
Deprecated
- Model ID:
codestral-2501
-
Ideal for complex tasks that
require large reasoning capabilities or are highly
specialized.
Attention: You must purchase the
Mistral AI with IBM license separately to download and use this
model.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
30 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
|
Yes, with additional configuration. For details, see
Installing models on GPU partitions. |
- codestral-2508
-
- Status:
Available
- Model ID:
codestral-2508
-
Ideal for code generation and
high-precision fill-in-the-middle (FIM) completion. The foundation model is optimized for production
engineering environments that are latency-sensitive, context-aware, and
self-deployable.
Attention: You must purchase the
Mistral AI with IBM license separately to download and use this
model.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 3 |
96 GB RAM |
30 GB |
You can use any of the following GPU types:
- 2 NVIDIA
A100
- 2 NVIDIA
H20
- 2 NVIDIA
H100
- 2 NVIDIA
H200
|
No
|
-
devstral-medium-2507
-
- Status:
Available
- Model ID:
devstral-medium-2507
-
The devstral-medium-2507
foundation model from Mistral AI is a high-performance code generation and agentic reasoning model.
Ideal for generalization across prompt styles and tool use in code agents and frameworks.
Attention: You must purchase the
Mistral AI with IBM license separately to download and use this
model.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 5 |
246 GB
RAM |
250 GB |
You can use any of the following GPU types:
- 4 NVIDIA
A100
- 4 NVIDIA
H20
- 4 NVIDIA
H100
- 4 NVIDIA
H200
|
No
|
-
devstral-medium-2512
-
- Status:
Available
- Model ID:
devstral-medium-2512
-
The devstral-medium-2512
foundation model from Mistral AI is an agentic model for software engineering tasks from the
Devstral 2 model family that excels at using tools to explore code bases, editing multiple files,
and powering software engineering agents.
Attention: You must purchase the
Mistral AI with IBM license separately to download and use this
model.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 5 |
246 GB
RAM |
200 GB |
You can use any of the following GPU types:
- 4 NVIDIA
A100
- 4 NVIDIA
H100
- 4 NVIDIA
H200
|
No
|
-
devstral-small-2512
-
- Status:
Available
- Model ID:
devstral-small-2512
-
The devstral-small-2512 foundation
model from Mistral AI is an agentic model for software engineering tasks from the Devstral 2 model
family that excels at using tools to explore code bases, editing multiple files, and powering software
engineering agents.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 5 |
246 Gi
RAM |
30 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
|
No
|
-
gpt-oss-20b
-
- Status:
Available
- Model ID:
gpt-oss-20b
-
The gpt-oss foundation models are
OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, fine-tuning, and various
developer use cases.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
128 GB
RAM |
100 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
|
No
|
-
gpt-oss-120b
-
- Status:
Available
- Model ID:
gpt-oss-120b
-
The gpt-oss foundation models are
OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, fine-tuning, and various
developer use cases.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 6 |
96 GB RAM |
195 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 2 NVIDIA
L40S
|
No
|
-
granite-4-h-micro
-
- Status:
Available
- Model ID:
granite-4-h-micro
-
The Granite 4.0 foundation models
belong to the IBM Granite family of models. The granite-4-h-micro is a 3 billion parameter
foundation model built for structured and long-context capabilities. The model is ideal for
instruction following and tool-calling.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
30 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
|
No
|
-
granite-4-h-tiny
-
- Status:
Available
- Model ID:
granite-4-h-tiny
-
The Granite 4.0 foundation models
belong to the IBM Granite family of models. The granite-4-h-tiny is a 7 billion parameter
long-context instruction-tuned model developed using a diverse set of techniques with a structured
chat format, including supervised fine-tuning, model alignment using reinforcement learning, and
model merging. This model is ideal for instruction following and tool-calling
capabilities.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
30 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
No
|
-
granite-4-h-small
-
- Status:
Available
- Model ID:
granite-4-h-small
-
The Granite 4.0 foundation models
belong to the IBM Granite family of models. The granite-4-h-small is a 30 billion parameter foundation
model built for structured and long-context capabilities. The model is ideal for instruction
following and tool-calling capabilities.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 3 |
96 GB RAM |
150 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA RTX PRO
6000
|
No
|
- granite-3-3-8b-instruct
-
- Status:
Available
- Model ID:
granite-3-3-8b-instruct
-
An IBM-trained, dense decoder-only
model, which is particularly well-suited for generative
tasks.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
18 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 NVIDIA RTX PRO
6000
|
No |
- granite-3-2-8b-instruct
-
- Status:
Deprecated
- Model ID:
granite-3-2-8b-instruct
-
A text-only model that is capable
of reasoning. You can choose whether reasoning is enabled, based on your use
case.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 GB RAM |
20 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 2 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
|
No |
- granite-3-2b-instruct
-
- Status:
Available
- Model ID:
granite-3-2b-instruct
-
Granite models are designed to be
used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only
architecture, with additional innovations from IBM Research and the open-source
community.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
6 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
Yes |
- granite-3-8b-instruct
-
- Status:
Available
- Model ID:
granite-3-8b-instruct
-
Granite models are designed to be
used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only
architecture, with additional innovations from IBM Research and the open-source
community.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
20 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
|
Yes |
- granite-3b-code-instruct
-
- Status:
Available
- Model ID:
granite-3b-code-instruct
-
A 3-billion parameter instruction
fine-tuned model from IBM that supports code discussion, generation, and
conversion.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
9 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
Note: This model can be fine tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
|
Yes |
- granite-8b-code-instruct
-
- Status:
Available
- Model ID:
granite-8b-code-instruct
-
An 8-billion parameter instruction
fine-tuned model from IBM that supports code discussion, generation, and
conversion.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
19 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
Note: This model can be full fine tuned when configured to use an NVIDIA A100, NVIDIA H100, NVIDIA H200, or NVIDIA L40S GPU.
|
Yes |
- granite-20b-code-instruct
-
- Status:
Available
- Model ID:
granite-20b-code-instruct
-
A 20-billion parameter instruction
fine-tuned model from IBM that supports code discussion, generation, and
conversion.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
70 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
Note: This model can be full fine tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
|
No |
- granite-20b-code-base-schema-linking
-
- Status:
Available
- Model ID:
granite-20b-code-base-schema-linking
-
Granite models are designed to be
used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only
architecture, with additional innovations from IBM Research and the open-source
community.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
44 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
No |
- granite-20b-code-base-sql-gen
-
- Status:
Available
- Model ID:
granite-20b-code-base-sql-gen
-
Granite models are designed to be
used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only
architecture, with additional innovations from IBM Research and the open-source
community.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
44 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
No |
- granite-34b-code-instruct
-
- Status:
Available
- Model ID:
granite-34b-code-instruct
-
A 34-billion parameter instruction
fine-tuned model from IBM that supports code discussion, generation, and
conversion.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
78 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 2 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
|
No |
-
granite-docling-258M
-
- Status:
Available
- Model ID:
granite-docling-258M
-
Granite Docling is a multimodal
image-text-to-text model that is efficient for document conversion. The model preserves the core features of
Docling while maintaining seamless integration with Docling documents to ensure full
compatibility.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 GB RAM |
10 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
No
|
- granite-guardian-3-2b
-
- Status:
Deprecated
- Model ID:
granite-guardian-3-2b
-
Granite models are designed to be
used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only
architecture, with additional innovations from IBM Research and the open-source
community.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
10 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
|
Yes |
- granite-guardian-3-8b
-
- Status:
Deprecated
- Model ID:
granite-guardian-3-8b
-
Granite models are designed to be
used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only
architecture, with additional innovations from IBM Research and the open-source
community.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
20 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
Yes |
- granite-guardian-3-2-5b
-
- Status:
Available
- Model ID:
granite-guardian-3-2-5b
-
The Granite model series is a
family of IBM-trained, dense decoder-only models, which are particularly well suited for generative
tasks. This model cannot be used through the API.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 1 |
4 GB RAM |
15 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
L40S
- 1 NVIDIA RTX PRO
6000
|
No
|
- granite-vision-3-2-2b
-
- Status:
Deprecated
- Model ID:
granite-vision-3-2-2b
-
Granite 3.2 Vision is an
image-text-in, text-out model capable of understanding images like charts for enterprise use cases
for computer vision tasks.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 GB RAM |
7 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
No |
- granite-vision-3-3-2b
-
- Status:
Available
- Model ID:
granite-vision-3-3-2b
-
Granite 3.2 Vision is an
image-text-in, text-out model capable of understanding images like charts for enterprise use cases
for computer vision tasks.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 1 |
128 GB
RAM |
10 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
L40S
|
No
|
- granite-4-1b-speech
-
- Status:
Available
- Model ID:
ibm/granite-4-1b-speech
-
Granite-4-1b-speech is a compact
and efficient speech-language model, specifically designed for multilingual automatic speech
recognition (ASR) and bidirectional automatic speech translation (AST). The model was trained on a
collection of public corpora comprising diverse datasets for ASR and AST as well as synthetic
datasets tailored to support Japanese ASR, keyword-biased ASR and speech translation.
Granite-4-1b-speech was trained by modality aligning granite-4-1b-base to speech on publicly
available open source corpora containing audio inputs and text
targets.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
10Gi |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
L40S
|
Yes
|
- ibm-defense-4-0-micro
-
- Status:
Available
- Model ID:
ibm-defense-4-0-micro
-
The ibm-defense-4-0-micro is a
defense-focused large language model (LLM) fine-tuned from an IBM Granite model. This model is
designed to work with Janes foundation defense data, delivering fast, reliable and contextual
results for mission-critical tasks in defense
organizations.
Attention: You must purchase the IBM
watsonx.ai Defense Model entitlement separately to download and use this
model.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 GB RAM |
60 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
No
|
- ibm-defense-4-0-small
-
- Status:
Available
- Model ID:
ibm-defense-4-0-small
-
The ibm-defense-4-0-small is a
defense-focused large language model fine-tuned from an IBM Granite model. This model is designed to
work with Janes foundation defense data, delivering fast, reliable and contextual results for
mission-critical tasks in defense organizations.
Attention: You must purchase the IBM
watsonx.ai Defense Model entitlement separately to download and use this
model.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
85 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
|
No
|
- ibm-defense-3-3-8b-instruct
-
- Status:
Available
- Model ID:
ibm-defense-3-3-8b-instruct
-
The IBM watsonx.ai Defense Model
is a specialized fine-tuned version of IBM’s granite-3-3-8b-instruct base model. The model is
developed through Janes trusted open-source defense data to support defense and intelligence
operations.
Attention: You must purchase the IBM
watsonx.ai Defense Model entitlement separately to download and use this
model.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
8 GB RAM |
18 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
No
|
- llama-4-maverick-17b-128e-instruct-fp8
-
- Status:
Available
- Model ID:
llama-4-maverick-17b-128e-instruct-fp8
-
The Llama 4 collection of models
are natively multimodal AI models that enable text and multimodal experiences. These models leverage
a mixture-of-experts architecture to offer industry-leading performance in text and image
understanding.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 9 |
96 GB RAM |
425 GB |
You can use any of the following GPU types:
- 8 NVIDIA
A100
- 8 NVIDIA
H100
- 4 NVIDIA
H200
|
No |
- llama-4-maverick-17b-128e-instruct-int4
-
- Status:
Available
- Model ID:
llama-4-maverick-17b-128e-instruct-int4
-
The Llama 4 collection of models
are multimodal AI models that enable text and multimodal experiences. These models leverage a
mixture-of-experts architecture to offer industry-leading performance in text and image
understanding.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 4 |
128 GB
RAM |
250 GB |
You can use any of the following GPU types:
- 4 NVIDIA
A100
- 4 NVIDIA
H100
- NVIDIA L40S not
supported
|
No
|
- llama-4-scout-17b-16e-instruct-int4
-
- Status:
Available
- Model ID:
llama-4-scout-17b-16e-instruct-int4
-
The Llama 4 collection of models
are natively multimodal AI models that enable text and multimodal experiences. These models leverage
a mixture-of-experts architecture to offer industry-leading performance in text and image
understanding.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 1 |
128 GB
RAM |
215 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- NVIDIA L40S not
supported
|
No
|
- llama-3-3-70b-instruct
-
- Status:
Available
- Model ID:
llama-3-3-70b-instruct
-
A state-of-the-art refresh of the
Llama 3.1 70B Instruct model that uses the latest advancements in post-training
techniques.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 3 |
96 GB RAM |
75 GB |
You can use any of the following GPU types:
- 2 NVIDIA
A100
- 2 NVIDIA
H20
- 2 NVIDIA
H100
- 1 NVIDIA
H200
- 4 NVIDIA
L40S
|
No |
-
llama-3-2-1b-instruct
-
- Status:
Available
- Model ID:
llama-3-2-1b-instruct
-
A pretrained and fine-tuned
generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and
code output.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
10 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
|
Yes |
- llama-3-2-3b-instruct
-
- Status:
Available
- Model ID:
llama-3-2-3b-instruct
-
A pretrained and fine-tuned
generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and
code output.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
9 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
|
Yes |
- llama-3-2-11b-vision-instruct
-
- Status:
Available
- Model ID:
llama-3-2-11b-vision-instruct
-
A pretrained and fine-tuned
generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and
code output.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
30 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H20
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 2 NVIDIA
L40S
- 1 NVIDIA RTX PRO
6000
|
No |
- llama-3-2-90b-vision-instruct
-
- Status:
Available
- Model ID:
llama-3-2-90b-vision-instruct
-
A pretrained and fine-tuned
generative text model with 90 billion parameters, optimized for multilingual dialogue use cases and
code output.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 16 |
246 GB
RAM |
200 GB |
You can use any of the following GPU types:
- 8 NVIDIA
A100
- 8 NVIDIA
H20
- 8 NVIDIA
H100
- 4 NVIDIA
H200
- 8 NVIDIA
L40S
|
No |
- llama-guard-3-11b-vision
-
- Status:
Available
- Model ID:
llama-guard-3-11b-vision
-
A pretrained and fine-tuned
generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and
code output.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
30 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 NVIDIA RTX PRO
6000
|
No |
- llama-3-1-8b-instruct
-
- Status:
Available
- Model ID:
llama-3-1-8b-instruct
-
An auto-regressive language model
that uses an optimized transformer architecture.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
96 GB RAM |
20 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
- 1 Intel Gaudi 3 AI
Accelerator
Note: This model can be full fine tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA L40S GPU.
|
Yes |
- llama-3-1-70b-instruct
-
- Status:
Available
- Model ID:
llama-3-1-70b-instruct
-
An auto-regressive language model
that uses an optimized transformer architecture.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 16 |
246 GB
RAM |
163 GB |
You can use any of the following GPU types:
- 4 NVIDIA
A100
- 4 NVIDIA
H100
- 2 NVIDIA
H200
- 4 NVIDIA
L40S
|
No |
- magistral-small-2509
-
- Status:
Available
- Model ID:
magistral-small-2509
-
Builds upon Mistral Small 3.2
(2506) with added reasoning capabilities from supervised fine-tuning (SFT) on Magistral Medium traces,
followed by reinforcement learning (RL). It is a small, efficient reasoning model with 24 billion
parameters.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 3 |
96 Gi RAM |
120Gi |
You can use any of the following GPU types:
- 2 NVIDIA
A100
- 2 NVIDIA
H100
- 2 NVIDIA
L40S
|
Yes |
- magistral-medium-2509
-
- Status:
Available
- Model ID:
magistral-medium-2509
-
Magistral Medium 2509 is an update
to the 2507 version with improvements in math and coding benchmarks, along with image input
support.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
400Gi |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
L40S
|
Yes |
- ministral-14b-instruct-2512
-
- Status:
Available
- Model ID:
ministral-14b-instruct-2512
-
Ideal for complex tasks that
require large reasoning capabilities or are highly
specialized.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 GB RAM |
20 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
|
No |
- ministral-8b-instruct
-
- Status:
Deprecated
- Model ID:
ministral-8b-instruct
-
Ideal for complex tasks that
require large reasoning capabilities or are highly
specialized.
Attention: You must purchase the
Mistral AI with IBM license separately to download and use this
model.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
35 GB |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
H200
- 1 NVIDIA
L40S
|
Yes |
- ministral-3b-instruct-2512
-
- Status:
Available
- Model ID:
ministral-3b-instruct-2512
-
The smallest model in the
Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision
capabilities.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM
|
18Gi |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
L40S
|
Yes |
- ministral-8b-instruct-2512
-
- Status:
Available
- Model ID:
ministral-8b-instruct-2512
-
A balanced model in the Ministral
3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision
capabilities.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM |
18Gi |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
L40S
|
Yes |
- ministral-3-14b-instruct-2512-bf16
-
- Status:
Available
- Model ID:
ministral-3-14b-instruct-2512-bf16
-
The smallest model in the
Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision
capabilities.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 2 |
32 Gi RAM
|
18Gi |
You can use any of the following GPU types:
- 1 NVIDIA
A100
- 1 NVIDIA
H100
- 1 NVIDIA
L40S
|
Yes |
-
mistral-small-3-2-24b-instruct-2506
-
- Status:
Available
- Model ID:
mistral-small-3-2-24b-instruct-2506
-
The
mistral-small-3-2-24b-instruct-2506 foundation model is an enhancement to
mistral-small-3-1-24b-instruct-2503, with better instruction following and tool calling performance.
-
| CPU |
Memory |
Storage |
Supported GPUs |
NVIDIA Multi-Instance GPU support |
| 3 |
96 GB RAM |
210 GB |
You can use any of the following GPU types:
- 2 NVIDIA
A100
- 2 NVIDIA
H20
- 2 NVIDIA
H100
- 2 NVIDIA
H200
|
No
|
- mistral-small-3-1-24b-instruct-2503
-
- Status:
Available
- Model ID:
mistral-small-3-1-24b-instruct-2503
-
Building upon Mistral Small 3
(2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding, enhances
long-context capabilities, and is suitable for function calling and
agents.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB RAM | 105 GB | You can use any of the following GPU types: 2 NVIDIA A100, 2 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S | Yes |
- mistral-medium-2505
-
- Status:
Deprecated
- Model ID:
mistral-medium-2505
-
Mistral Medium 3 features
multimodal capabilities and an extended context length of up to 128k. The model can process and
understand visual inputs and long documents, and it supports many
languages.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 Gi RAM | 280 GB | You can use any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H100, 4 NVIDIA L40S | No |
- mistral-medium-2508
-
- Status:
Available
- Model ID:
mistral-medium-2508
-
The mistral-medium-2508 foundation
model is an enhancement of mistral-medium-2505, with state-of-the-art performance in coding and
multimodal understanding.
Attention: You must purchase the
Mistral AI with IBM license separately to download and use this
model.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB RAM | 300 GB | You can use any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H20, 4 NVIDIA H100, 4 NVIDIA H200 | No |
- mistral-large-2512
-
- Status:
Available
- Model ID:
mistral-large-2512
-
The mistral-large-2512 foundation
model, also known as Mistral Large 3, is a state-of-the-art general-purpose multimodal granular
mixture-of-experts model with 41 billion active parameters and 675 billion total parameters. The
model was trained from the ground up with 3000 NVIDIA H200
GPUs.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 48 | 512 GB RAM | 969 GB | You can use any of the following GPU types: | No |
- mistral-large-instruct-2411
-
- Status:
Available
- Model ID:
mistral-large-instruct-2411
-
The most advanced Large Language
Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied
to any language-based task, including the most sophisticated
ones.
Attention: You must purchase the
Mistral AI with IBM license separately to download and use this
model.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB RAM | 140 GB | You can use any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H20, 4 NVIDIA H100, 4 NVIDIA H200 | No |
- nvidia-nemotron-nano-12b-v2-vl-fp8
-
- Status:
Available
- Model ID:
nvidia/nvidia-nemotron-nano-12b-v2-vl-fp8
-
NVIDIA-Nemotron-Nano-VL-12B-V2-FP8
is the quantized version of the NVIDIA Nemotron Nano VL V2 model, which is an auto-regressive vision
language model that uses an optimized transformer
architecture.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 30 Gi | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
- nvidia-nemotron-3-nano-30b-a3b-fp8
-
- Status:
Available
- Model ID:
nvidia/nemotron-3-nano-30b-a3b-fp8
-
Nemotron-Nano-3-30B-A3B-FP8 is a
quantized version of Nemotron-Nano-3-30B-A3B, a large language model (LLM) that was trained from
scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It
responds to user queries and tasks by first generating a reasoning trace and then concluding with a
final response.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 4 | 64 Gi RAM | 40 Gi | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA L40S | Yes |
- pixtral-large-instruct-2411
-
- Status:
Available
- Model ID:
pixtral-large-instruct
-
A 124 billion parameter multimodal model
that is built on top of Mistral Large 2 and demonstrates frontier-level image
understanding.
Attention: You must purchase the
Mistral AI with IBM license separately to download and use this
model.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 16 | 246 GB RAM | 240 GB | You can use any of the following GPU types: 8 NVIDIA A100, 8 NVIDIA H20, 8 NVIDIA H100, 4 NVIDIA H200 | No |
- pixtral-12b
-
- Status:
Deprecated in 5.3.0
- Model ID:
pixtral-12b
-
A 12 billion parameter model
pretrained and fine-tuned for generative tasks in text and image domains. The model is optimized for
multilingual use cases and provides robust performance in creative content
generation.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 30 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S | No |
-
voxtral-small-24b-2507
-
- Status:
Available
- Model ID:
voxtral-small-24b-2507
-
Voxtral Small is an enhancement of
Mistral Small 3.1 that incorporates state-of-the-art audio capabilities and text performance and
can process up to 30 minutes of audio.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 210 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200 | No |
- voxtral-mini-2507
-
- Status:
Available
- Model ID:
voxtral-mini-2507
-
Voxtral Mini is an enhancement of
Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class
text performance. It excels at speech transcription, translation, and audio
understanding.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 18 Gi | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
You cannot add deprecated or withdrawn models to your deployment. For more information about how
deprecated and withdrawn models are handled, see Foundation model
lifecycle.
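When you plan which models to add, it can help to drop deprecated or withdrawn entries from the candidate list before you total the resources. The following is a minimal sketch of that filtering step. The `ModelSpec` record and the `plan_storage` function are hypothetical helpers and not part of any IBM API; the status and storage values are copied from the model entries in this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record holding values copied from the tables in this topic."""
    model_id: str
    status: str
    storage_gb: int


# Values taken from the model entries above (storage is listed in GB).
CANDIDATES = [
    ModelSpec("mistral-small-3-2-24b-instruct-2506", "Available", 210),
    ModelSpec("mistral-medium-2505", "Deprecated", 280),
    ModelSpec("mistral-medium-2508", "Available", 300),
    ModelSpec("pixtral-12b", "Deprecated in 5.3.0", 30),
]


def plan_storage(candidates):
    """Return (deployable model IDs, total storage in GB), skipping any
    model whose status starts with "Deprecated" or "Withdrawn"."""
    blocked = ("deprecated", "withdrawn")
    deployable = [m for m in candidates
                  if not m.status.lower().startswith(blocked)]
    total_gb = sum(m.storage_gb for m in deployable)
    return [m.model_id for m in deployable], total_gb


if __name__ == "__main__":
    ids, total_gb = plan_storage(CANDIDATES)
    print(ids, total_gb)
```

The sketch only sums storage because the selected entries all use GB; entries that list Gi values would need a unit conversion first.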
Embedding models
Text embedding models are small enough that the models can run without a GPU. However, if you need
better performance from the embedding models, you can configure them to use GPUs. For details, see
Installing embedding models on GPUs.
For details about how to use embedding models provided with IBM
watsonx.ai, including their
capabilities, see Supported encoder models.
- all-minilm-l6-v2
-
- Status:
Available
- Model ID:
all-minilm-l6-v2
-
Use all-minilm-l6-v2 as a sentence
and short paragraph encoder. Given an input text, the model generates a vector that captures the
semantic information in the text.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- all-minilm-l12-v2
-
- Status:
Available
- Model ID:
all-minilm-l12-v2
-
Use all-minilm-l12-v2 as a
sentence and short paragraph encoder. Given an input text, the model generates a vector that
captures the semantic information in the text.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- granite-embedding-107m-multilingual
-
- Status:
Available
- Model ID:
granite-embedding-107m-multilingual
-
A 107 million parameter model from
the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text
embeddings for a given input like a query, passage, or
document.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 2 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- granite-embedding-278m-multilingual
-
- Status:
Available
- Model ID:
granite-embedding-278m-multilingual
-
A 278 million parameter model from
the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text
embeddings for a given input like a query, passage, or
document.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 2 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- multilingual-e5-large
-
- Status:
Available
- Model ID:
multilingual-e5-large
-
An embedding model built by
Microsoft and provided by Hugging Face. The multilingual-e5-large model is useful for tasks such as
passage or information retrieval, semantic similarity, bitext mining, and paraphrase
retrieval.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 4 | 8 GB | 10 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- slate-30m-english-rtrvr
-
- Status:
Available
- Model ID:
ibm-slate-30m-english-rtrvr
-
The IBM provided slate embedding
models are built to generate embeddings for various inputs such as queries, passages, or
documents.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- slate-125m-english-rtrvr
-
- Status:
Available
- Model ID:
ibm-slate-125m-english-rtrvr
-
The IBM provided slate embedding
models are built to generate embeddings for various inputs such as queries, passages, or
documents.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
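The embedding models listed above each turn input text into a vector. A common way to compare two such vectors is cosine similarity, and the following minimal sketch shows that calculation in plain Python. The vectors are made-up placeholders rather than real model output, and `cosine_similarity` is an illustrative helper, not part of the watsonx.ai API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for the embeddings of three texts.
query = [1.0, 0.0, 1.0]
close_passage = [2.0, 0.0, 2.0]  # points in the same direction as the query
far_passage = [0.0, 3.0, 0.0]    # orthogonal to the query

if __name__ == "__main__":
    print(cosine_similarity(query, close_passage))
    print(cosine_similarity(query, far_passage))
```

Values near 1 indicate vectors that point in nearly the same direction; values near 0 indicate unrelated directions.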
Reranker models
Reranker models are small enough that the models can run without a GPU. For details about how to
use reranker models provided with IBM
watsonx.ai, including their capabilities, see Supported encoder models.
-
granite-embedding-english-reranker-r2
-
- Status:
Available
- Model ID:
granite-embedding-english-reranker-r2
-
A 149 million parameter model from
the Granite Embeddings suite provided by IBM. The model is based on granite-embedding-english-r2
and has been trained for passage reranking in RAG pipelines.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB RAM | 1 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H20, 1 NVIDIA H200 | No |
- ms-marco-MiniLM-L-12-v2
-
- Status:
Available
- Model ID:
ms-marco-minilm-l-12-v2
-
A reranker model built by
Microsoft and provided by Hugging Face. Given query text and a set of document passages, the model
ranks the list of passages from most-to-least related to the
query.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | This model does not require any GPU. | Not applicable. |
- Document text
processing
-
- Status:
Available
- Model ID:
wdu
-
A set of natural language text
processing models that are represented by the "wdu"
identifier.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 9 | 31 GB | 20 GB | These models do not require any GPU. | Not applicable. |
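The reranker entries above describe ranking passages from most to least related to a query. As a rough illustration of that final ordering step, the following sketch sorts passages by a relevance score. The scores are placeholder numbers; in practice a reranker model would produce them, and `rank_passages` is a hypothetical helper, not a watsonx.ai function.

```python
def rank_passages(passages, scores):
    """Order passages from most to least related, given one relevance
    score per passage (a higher score means more related)."""
    if len(passages) != len(scores):
        raise ValueError("need exactly one score per passage")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked]


# Placeholder scores; a real reranker model would supply these values.
passages = ["passage A", "passage B", "passage C"]
scores = [0.12, 0.87, 0.45]

if __name__ == "__main__":
    print(rank_passages(passages, scores))
```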
Time series foundation models
You can use the time series API to pass historical data observations to a time series foundation
model that can forecast future values. For details about how to use time series models provided with
IBM
watsonx.ai, including their capabilities, see Supported foundation models.
You can deploy the following time series foundation models:
- granite-ttm-512-96-r2
-
- Status:
Available
- Model ID:
granite-ttm-512-96-r2
-
The Granite time series models are
compact pretrained models for multivariate time series forecasting from IBM Research, also known as
Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and
generate a forecast dataset with 96 data points per channel by default.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
- granite-ttm-1024-96-r2
-
- Status:
Available
- Model ID:
granite-ttm-1024-96-r2
-
The Granite time series models are
compact pretrained models for multivariate time series forecasting from IBM Research, also known as
Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and
generate a forecast dataset with 96 data points per channel by default.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
- granite-ttm-1536-96-r2
-
- Status:
Available
- Model ID:
granite-ttm-1536-96-r2
-
The Granite time series models are
compact pretrained models for multivariate time series forecasting from IBM Research, also known as
Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and
generate a forecast dataset with 96 data points per channel by default.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
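The three Granite time series model IDs differ only in one number. The following sketch assumes, without confirmation from this topic, that the IDs follow the pattern `granite-ttm-<context length>-<forecast length>-r2`, where the first number is the count of historical data points that the model expects and the second matches the 96 forecast data points mentioned above. Under that assumption, the hypothetical `pick_model` helper chooses the model with the longest context that the available history can fill.

```python
import re

# Assumed pattern: granite-ttm-<context length>-<forecast length>-r2
_TTM_ID = re.compile(r"^granite-ttm-(\d+)-(\d+)-r2$")


def parse_ttm_id(model_id):
    """Split a Granite TTM model ID into (context_length, forecast_length)."""
    match = _TTM_ID.match(model_id)
    if match is None:
        raise ValueError(f"not a recognized Granite TTM model ID: {model_id}")
    return int(match.group(1)), int(match.group(2))


def pick_model(history_points, model_ids):
    """Return the model ID with the longest assumed context that the
    history can fill, or None if the history is too short for all of them."""
    usable = [(parse_ttm_id(m)[0], m) for m in model_ids
              if parse_ttm_id(m)[0] <= history_points]
    return max(usable)[1] if usable else None


MODEL_IDS = [
    "granite-ttm-512-96-r2",
    "granite-ttm-1024-96-r2",
    "granite-ttm-1536-96-r2",
]

if __name__ == "__main__":
    print(pick_model(1200, MODEL_IDS))
```

Choosing the longest usable context is only one possible heuristic; check the model documentation before you rely on it.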
Foundation models available for tuning
You can use various techniques such as full fine tuning, low-rank adaptation (LoRA) tuning, or
quantized low-rank adaptation (QLoRA) tuning to train and deploy foundation models that are compatible
with fine tuning.
These foundation models can only be fine tuned. Unlike most foundation models that are provided
with IBM
watsonx.ai, these models cannot be directly inferenced in the Prompt Lab or programmatically by using the
API. For details about resource requirements for model tuning, see Planning for foundation model tuning in IBM watsonx.ai.
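To give a sense of why the tuning method matters for resources, the following sketch compares how many values are trained for a single weight matrix. Full fine tuning updates the whole matrix, whereas LoRA keeps the matrix frozen and trains two small low-rank factors; QLoRA additionally keeps the frozen weights in a quantized form. The layer sizes and rank are illustrative assumptions, not values from any model in this topic, and the function names are hypothetical.

```python
def full_finetune_params(d_out, d_in):
    """Trainable values when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable values when only the low-rank factors B (d_out x rank)
    and A (rank x d_in) are updated and the original matrix stays frozen."""
    return rank * (d_out + d_in)


# Illustrative sizes only; not taken from any model in this topic.
d_out, d_in, rank = 4096, 4096, 8

if __name__ == "__main__":
    full = full_finetune_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(full, lora, full // lora)
```

With these example sizes, LoRA trains 256 times fewer values for the layer than full fine tuning does.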
You can install the following foundation models that can be fine tuned:
- granite-3-1-8b-base
-
- Status:
Available
- Tuning method: Full fine tuning, LoRA fine tuning
- Model ID:
granite-3-1-8b-base
-
Granite 3.1 8b base is a
pretrained autoregressive foundation model with a context length of 128k intended for
tuning.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 20 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S | No |
- llama-3-1-8b
-
- Status:
Available
- Tuning method: Full fine tuning, LoRA fine tuning
- Model ID:
llama-3-1-8b
-
Llama-3-1-8b is a pretrained
generative text model with 8 billion parameters, optimized for multilingual dialogue use cases and
code output.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 20 GB | You can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S | No |
- llama-3-1-70b
-
- Status:
Available
- Tuning method: Full fine tuning, LoRA fine tuning
- Model ID:
llama-3-1-70b
-
Llama-3-1-70b is a pretrained
generative text model with 70 billion parameters, optimized for multilingual dialogue use cases and
code output.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 16 | 246 GB RAM | 280 GB | You can use any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H100, 4 NVIDIA H200, 2 NVIDIA L40S | No |
- llama-3-1-70b-gptq
-
- Status:
Available
- Tuning method: QLoRA fine tuning
- Model ID:
llama-3-1-70b-gptq
-
Llama 3.1 70b is a pretrained
generative text base model with 70 billion parameters, optimized for multilingual dialogue use cases
and code output.
-
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB RAM | 40 GB | You can use any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H100, 4 NVIDIA H200 | No |
Custom foundation models
In addition to foundation models that are curated by IBM, you can upload and deploy your own
foundation models. For more information about how to upload, register, and deploy a custom
foundation model, see the following information:
Supported Accelerators on s390x
You can use hardware accelerators to improve foundation model inference performance on IBM Z and
LinuxONE systems running the s390x architecture. Accelerators provide optimized compute for AI
workloads on mainframe infrastructure.
- IBM Spyre accelerator
- IBM Spyre accelerators are inference accelerators for AI workloads on s390x systems. Spyre
accelerators integrate with IBM
watsonx.ai to provide hardware-accelerated inference for
foundation models on IBM Z and LinuxONE platforms.
-
- Hardware requirements
-
- 10 dedicated IFLs
- Two additional shared IFLs are required for the Spyre Support Appliance and the Appliance
Control Center (SSA/ACC).
- 758 GB Memory
- 2.1 TB Disk (Filesystem + FDF)
- 8x Spyre Cards
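The hardware requirements above can be checked against a planned system before installation. The following minimal sketch compares an inventory with those minimums and reports any shortfall. The dictionary keys, the `missing_resources` helper, and the example inventory are hypothetical; only the minimum values come from the list above.

```python
# Minimums copied from the hardware requirements list above.
SPYRE_MINIMUMS = {
    "dedicated_ifls": 10,
    "shared_ifls": 2,   # for the Spyre Support Appliance and Appliance Control Center
    "memory_gb": 758,
    "disk_tb": 2.1,
    "spyre_cards": 8,
}


def missing_resources(inventory):
    """Return {resource: shortfall} for every minimum that the inventory misses."""
    shortfalls = {}
    for name, required in SPYRE_MINIMUMS.items():
        available = inventory.get(name, 0)
        if available < required:
            shortfalls[name] = required - available
    return shortfalls


# Hypothetical inventory, for illustration only.
example_inventory = {
    "dedicated_ifls": 10,
    "shared_ifls": 2,
    "memory_gb": 512,
    "disk_tb": 4.0,
    "spyre_cards": 8,
}

if __name__ == "__main__":
    print(missing_resources(example_inventory))
```

An empty result means that the inventory meets every listed minimum.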