Foundation models in IBM watsonx.ai

You can deploy a collection of third-party and IBM models in IBM watsonx.ai.

GPU requirements

One of the following types of GPUs is required to support the use of foundation models in IBM watsonx.ai:
  • NVIDIA A100 GPUs with 80 GB RAM
  • NVIDIA H100 GPUs with 80 GB RAM
  • NVIDIA H100 GPUs with 94 GB RAM
  • NVIDIA L40S GPUs with 48 GB RAM (Not supported with all models. See tables for details.)
Attention: You can install the IBM watsonx.ai service on the VMware vSphere platform with GPUs configured in passthrough mode. You cannot use virtual GPUs (vGPUs) with watsonx.ai™.
A general guideline for calculating the number of GPUs required for hosting a foundation model is as follows:
  • L40S: GPU memory requirement / 48. You can use only a 1, 2, 4, or 8 GPU configuration, so round up. For example, if the model needs 246 GB of GPU memory, 246/48 ≈ 5.1, so the model needs 8 GPUs.
  • A100/H100: GPU memory requirement / 80. You can use only a 1, 2, 4, or 8 GPU configuration, so round up. For example, if the model needs 246 GB of GPU memory, 246/80 ≈ 3.1, so the model needs 4 GPUs.
To get an idea of the minimum GPU memory requirements, multiply the number of billion parameters of the model by 3. For example, for a foundation model with 12 billion parameters, multiply 12 by 3. An initial estimate of the memory required by the model is 36 GB. Then add 1 GB per 100,000 tokens in the context window length.
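
As a worked illustration of these guidelines, the following Python sketch applies the same rules of thumb. It produces rough estimates only; the per-model tables later in this topic remain the authoritative source for recommended configurations.

VALID_GPU_COUNTS = (1, 2, 4, 8)  # the only supported GPU configurations

def estimate_gpu_memory_gb(billion_params: float, context_tokens: int) -> float:
    """3 GB per billion parameters, plus 1 GB per 100,000 context tokens."""
    return billion_params * 3 + context_tokens / 100_000

def gpus_needed(memory_gb: float, gpu_memory_gb: float) -> int:
    """Round the raw GPU count up to the next supported configuration."""
    raw = memory_gb / gpu_memory_gb
    for count in VALID_GPU_COUNTS:
        if count >= raw:
            return count
    raise ValueError("model does not fit on a single 8-GPU worker node")

# A 12-billion parameter model with a 131,072-token context window:
mem = estimate_gpu_memory_gb(12, 131_072)          # 36 GB + ~1.3 GB ≈ 37.3 GB
print(gpus_needed(mem, 48), gpus_needed(mem, 80))  # 1 L40S; 1 A100/H100

# The 246 GB example from the guidelines above:
print(gpus_needed(246, 48))  # 246/48 ≈ 5.1 -> 8 L40S GPUs
print(gpus_needed(246, 80))  # 246/80 ≈ 3.1 -> 4 A100/H100 GPUs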

You might be able to run some models with fewer GPUs at context lengths other than the maximum, or subject to other performance tradeoffs and constraints. If you use a configuration with fewer than the recommended number of GPUs, test the deployment to verify that the performance is satisfactory before you use it in production. If you use a configuration with more than the recommended number of GPUs, increase the number of CPUs as well; it is recommended that the number of CPUs exceed the number of GPUs by at least one.

You can optionally partition A100 or H100 GPU processors to add more than one foundation model to a GPU. For more information, see Partitioning GPU processors in IBM watsonx.ai. Models that can be partitioned indicate Yes for NVIDIA Multi-Instance GPU support in the foundation models table.

Restriction: You cannot tune foundation models in NVIDIA Multi-Instance GPU enabled clusters.

When you calculate the total number of GPUs that you need for your deployment, consider whether you plan to customize any foundation models by tuning them. For more information, see Planning for foundation model tuning in IBM watsonx.ai.

Provided foundation models

The following table lists the recommended number of GPUs to configure on a single OpenShift® worker node for the various foundation models that are provided with IBM watsonx.ai at the default context window length for each model. Minimum system requirements may vary based on the context length you set, the number of model parameters, the model parameters' precision, and more.

For details about the foundation models provided with IBM watsonx.ai, including the default context window length, see Supported foundation models.

Note: You do not need to prepare these resources in addition to the overall service hardware requirements. If you meet the prerequisite hardware requirements for the service, you already have the resources you need. The following table describes the subset of resources that are required per model.
allam-1-13b-instruct
  • Status: Available
  • Model ID: allam-1-13b-instruct
  • Group: ibmwxAllam113bInstruct

A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support conversational tasks.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 30 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
    Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
  • NVIDIA Multi-Instance GPU support: Yes
codestral-2501
  • Status: Available
  • Model ID: codestral-2501
  • Group: ibmwxCodestral2501

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 30 GB
  • Supported GPUs: 1 NVIDIA A100 or 1 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: Yes, with additional configuration. For details, see Installing models on GPU partitions.
codestral-22b
  • Status: Available
  • Model ID: codestral-22b
  • Group: ibmwxCodestral22B

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 50 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: No
elyza-japanese-llama-2-7b-instruct
  • Status: Available
  • Model ID: elyza-japanese-llama-2-7b-instruct
  • Group: ibmwxElyzaJapaneseLlama27bInstruct

General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 50 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
flan-t5-xl-3b
  • Status: Available
  • Model ID: google-flan-t5-xl
  • Group: ibmwxGoogleFlanT5xl
General use with zero- or few-shot prompts.
Note: This foundation model can be prompt tuned.
  • CPU: 2
  • Memory: 128 GB RAM
  • Storage: 21 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
flan-t5-xxl-11b
  • Status: Available
  • Model ID: google-flan-t5-xxl
  • Group: ibmwxGoogleFlanT5xxl

General use with zero- or few-shot prompts.

  • CPU: 2
  • Memory: 128 GB RAM
  • Storage: 52 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
flan-ul2-20b
  • Status: Available
  • Model ID: google-flan-ul2
  • Group: ibmwxGoogleFlanul2

General use with zero- or few-shot prompts.

  • Memory: 128 GB RAM
  • Storage: 85 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 2 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: Yes
granite-7b-lab
  • Status: Deprecated
  • Model ID: ibm-granite-7b-lab
  • Group: ibmwxGranite7bLab

InstructLab foundation model from IBM that supports knowledge and skills contributed by the open source community.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 30 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-8b-japanese
  • Status: Withdrawn in 5.2.0
  • Model ID: ibm-granite-8b-japanese
  • Group: ibmwxGranite8bJapanese

A pretrained instruct variant model from IBM designed to work with Japanese text.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 50 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-13b-instruct-v2
  • Status: Available
  • Model ID: ibm-granite-13b-instruct-v2
  • Group: ibmwxGranite13bInstructv2
General use model from IBM that is optimized for question and answer use cases.
Note: This model can be prompt tuned.
  • CPU: 2
  • Memory: 128 GB RAM
  • Storage: 62 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-3-3-8b-instruct
  • Status: Available in 5.2.0
  • Model ID: granite-3-3-8b-instruct
  • Group: ibmwxGranite338BInstruct

An IBM-trained, dense decoder-only model, which is particularly well-suited for generative tasks.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 18 GB
  • Supported GPUs: 1 NVIDIA A100 or 1 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: No
granite-3-2-8b-instruct
  • Status: Available
  • Model ID: granite-3-2-8b-instruct
  • Group: ibmwxGranite328BInstruct

A text-only model that is capable of reasoning. You can choose whether reasoning is enabled, based on your use case.

  • CPU: 2
  • Memory: 32 GB RAM
  • Storage: 20 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 2 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
granite-3-2b-instruct
  • Status: Available
  • Model ID: granite-3-2b-instruct
  • Group: ibmwxGranite32BInstruct

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 6 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-3-8b-instruct
  • Status: Available
  • Model ID: granite-3-8b-instruct
  • Group: ibmwxGranite38BInstruct

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 20 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-guardian-3-2b
  • Status: Available
  • Model ID: granite-guardian-3-2b
  • Group: ibmwxGraniteGuardian32b

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 10 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-guardian-3-8b
  • Status: Available
  • Model ID: granite-guardian-3-8b
  • Group: ibmwxGraniteGuardian38b

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 20 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-3b-code-instruct
  • Status: Available
  • Model ID: granite-3b-code-instruct
  • Group: ibmwxGranite3bCodeInstruct

A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 9 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
    Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
  • NVIDIA Multi-Instance GPU support: Yes
granite-8b-code-instruct
  • Status: Available
  • Model ID: granite-8b-code-instruct
  • Group: ibmwxGranite8bCodeInstruct

An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 19 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
    Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
  • NVIDIA Multi-Instance GPU support: Yes
granite-20b-code-instruct
  • Status: Available
  • Model ID: granite-20b-code-instruct
  • Group: ibmwxGranite20bCodeInstruct

A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 70 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
    Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
  • NVIDIA Multi-Instance GPU support: No
granite-20b-code-base-schema-linking
  • Status: Available
  • Model ID: granite-20b-code-base-schema-linking
  • Group: ibmwxGranite20bCodeBaseSchemaLinking

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 44 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: No
granite-20b-code-base-sql-gen
  • Status: Available
  • Model ID: granite-20b-code-base-sql-gen
  • Group: ibmwxGranite20bCodeBaseSqlGen

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 44 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: No
granite-34b-code-instruct
  • Status: Available
  • Model ID: granite-34b-code-instruct
  • Group: ibmwxGranite34bCodeInstruct

A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 78 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: No
granite-vision-3-2-2b
  • Status: Available
  • Model ID: granite-vision-3-2-2b
  • Group: ibmwxGraniteVision322Bs

Granite 3.2 Vision is an image-text-in, text-out model capable of understanding images such as charts, built for enterprise computer vision use cases.

  • CPU: 2
  • Memory: 32 GB RAM
  • Storage: 7 GB
  • Supported GPUs: 1 NVIDIA A100 or 1 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: No
jais-13b-chat
  • Status: Available
  • Model ID: core42-jais-13b-chat
  • Group: ibmwxCore42Jais13bChat

General use foundation model for generative tasks in Arabic.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 60 GB
  • Supported GPUs: 1 NVIDIA A100 or 1 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: No
llama-4-maverick-17b-128e-instruct-fp8
  • Status: Available in 5.2.0
  • Model ID: llama-4-maverick-17b-128e-instruct-fp8
  • Group: ibmwxLlama4Maverick17B128EInstructFp8

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

  • CPU: 9
  • Memory: 96 GB RAM
  • Storage: 425 GB
  • Supported GPUs: 8 NVIDIA A100 or 8 NVIDIA H100
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
llama-4-scout-17b-16e-instruct
  • Status: Available in 5.2.0
  • Model ID: llama-4-scout-17b-16e-instruct
  • Group: ibmwxLlama4Scout17B16EInstruct

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

  • CPU: 9
  • Memory: 96 GB RAM
  • Storage: 215 GB
  • Supported GPUs: 8 NVIDIA A100 or 8 NVIDIA H100
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
llama-3-3-70b-instruct
  • Status: Available
  • Model ID: llama-3-3-70b-instruct
  • Group: ibmwxLlama3370BInstruct

A state-of-the-art refresh of the Llama 3.1 70B Instruct model that uses the latest advancements in post-training techniques.

  • CPU: 3
  • Memory: 96 GB RAM
  • Storage: 75 GB
  • Supported GPUs: 2 NVIDIA A100, 2 NVIDIA H100, or 4 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
llama-3-2-1b-instruct
  • Status: Available
  • Model ID: llama-3-2-1b-instruct
  • Group: ibmwxLlama321bInstruct

A pretrained and fine-tuned generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and code output.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 10 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
llama-3-2-3b-instruct
  • Status: Available
  • Model ID: llama-3-2-3b-instruct
  • Group: ibmwxLlama323bInstruct

A pretrained and fine-tuned generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and code output.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 9 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
llama-3-2-11b-vision-instruct
  • Status: Available
  • Model ID: llama-3-2-11b-vision-instruct
  • Group: ibmwxLlama3211bVisionInstruct

A pretrained and fine-tuned generative model with 11 billion parameters that accepts image and text input, optimized for visual recognition, image reasoning, captioning, and answering questions about images.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 30 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 2 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
llama-3-2-90b-vision-instruct
  • Status: Available
  • Model ID: llama-3-2-90b-vision-instruct
  • Group: ibmwxLlama3290bVisionInstruct

A pretrained and fine-tuned generative model with 90 billion parameters that accepts image and text input, optimized for visual recognition, image reasoning, captioning, and answering questions about images.

  • CPU: 16
  • Memory: 246 GB RAM
  • Storage: 200 GB
  • Supported GPUs: 8 NVIDIA A100, 8 NVIDIA H100, or 8 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
llama-guard-3-11b-vision
  • Status: Available
  • Model ID: llama-guard-3-11b-vision
  • Group: ibmwxLlamaGuard311bVision

A Llama 3.2-based model with 11 billion parameters that is fine-tuned for content safety classification of text and image inputs.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 30 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: No
llama-3-1-8b-instruct
Note: The deprecation of this model is reversed in 5.2.0.
  • Status: Available
  • Model ID: llama-3-1-8b-instruct
  • Group: ibmwxLlama318bInstruct

An auto-regressive language model that uses an optimized transformer architecture.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 20 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
    Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU.
  • NVIDIA Multi-Instance GPU support: Yes
llama-3-1-70b-instruct
Note: The deprecation of this model is reversed in 5.2.0.
  • Status: Available
  • Model ID: llama-3-1-70b-instruct
  • Group: ibmwxLlama3170bInstruct

An auto-regressive language model that uses an optimized transformer architecture.

  • CPU: 16
  • Memory: 246 GB RAM
  • Storage: 163 GB
  • Supported GPUs: 4 NVIDIA A100, 4 NVIDIA H100, or 4 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
llama-3-405b-instruct
  • Status: Available
  • Model ID: llama-3-405b-instruct
  • Group: ibmwxLlama3405bInstruct

Meta's largest openly available foundation model to date, with 405 billion parameters, optimized for dialogue use cases.

  • CPU: 16
  • Memory: 246 GB RAM
  • Storage: 500 GB
  • Supported GPUs: 8 NVIDIA A100, 8 NVIDIA H100, or 8 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
llama-3-8b-instruct
  • Status: Deprecated
  • Model ID: meta-llama-llama-3-8b-instruct
  • Group: ibmwxMetaLlamaLlama38bInstruct

Pretrained and instruction tuned generative text model optimized for dialogue use cases.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 40 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
llama-3-70b-instruct
  • Status: Deprecated
  • Model ID: meta-llama-llama-3-70b-instruct
  • Group: ibmwxMetaLlamaLlama370bInstruct

Pretrained and instruction tuned generative text model optimized for dialogue use cases.

  • CPU: 10
  • Memory: 246 GB RAM
  • Storage: 180 GB
  • Supported GPUs: 4 NVIDIA A100, 4 NVIDIA H100, or 4 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
llama-2-13b-chat
  • Status: Available
  • Model ID: meta-llama-llama-2-13b-chat
  • Group: ibmwxMetaLlamaLlama213bChat
General use with zero- or few-shot prompts. Optimized for dialogue use cases.
Note: This model can be prompt tuned.
  • CPU: 2
  • Memory: 128 GB RAM
  • Storage: 62 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
llama2-13b-dpo-v7
  • Status: Withdrawn in 5.2.0
  • Model ID: mncai-llama-2-13b-dpo-v7
  • Group: ibmwxMncaiLlama213bDpov7

General use foundation model for generative tasks in Korean.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 30 GB
  • Supported GPUs: 1 NVIDIA A100 or 1 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: Yes
ministral-8b-instruct
  • Status: Available
  • Model ID: ministral-8b-instruct
  • Group: ibmwxMinistral8BInstruct
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 35 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
mistral-small-3-1-24b-instruct-2503
  • Status: Available in 5.2.0
  • Model ID: mistral-small-3-1-24b-instruct-2503
  • Group: ibmwxMistralSmall3124BInstruct2503

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding, enhances long-context capabilities, and is suitable for function calling and agents.

  • CPU: 3
  • Memory: 96 GB RAM
  • Storage: 105 GB
  • Supported GPUs: 2 NVIDIA A100 or 2 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: Yes
mistral-small-24b-instruct-2501
  • Status: Available
  • Model ID: mistral-small-24b-instruct-2501
  • Group: ibmwxMistralSmall24BInstruct2501

Mistral Small 3 (2501) sets a new benchmark in the category of small large language models (under 70 billion parameters). With 24 billion parameters, the model achieves state-of-the-art capabilities comparable to larger models.

  • CPU: 2
  • Memory: 32 GB RAM
  • Storage: 50 GB
  • Supported GPUs: 1 NVIDIA A100 or 1 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: No
mistral-small-instruct
  • Status: Available
  • Model ID: mistral-small-instruct
  • Group: ibmwxMistralSmallInstruct
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 50 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: No
mistral-large-instruct-2411
  • Status: Available
  • Model ID: mistral-large-instruct-2411
  • Group: ibmwxMistralLargeInstruct2411
The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
  • CPU: 5
  • Memory: 246 GB RAM
  • Storage: 140 GB
  • Supported GPUs: 4 NVIDIA A100 or 4 NVIDIA H100
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
mistral-large
  • Status: Available
  • Model ID: mistral-large
  • Group: ibmwxMistralLarge
The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
  • CPU: 16
  • Memory: 246 GB RAM
  • Storage: 240 GB
  • Supported GPUs: 8 NVIDIA A100, 8 NVIDIA H100, or 8 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
mixtral-8x7b-instruct-v01
  • Status: Available
  • Model ID: mixtral-8x7b-instruct-v01
  • Group: ibmwxMistralaiMixtral8x7bInstructv01

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.

Mixtral-8x7B is not a commercial model and does not require a separate entitlement.

  • CPU: 3
  • Memory: 96 GB RAM
  • Storage: 195 GB
  • Supported GPUs: 2 NVIDIA A100, 2 NVIDIA H100, or 4 NVIDIA L40S
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
mt0-xxl-13b
  • Status: Available
  • Model ID: bigscience-mt0-xxl
  • Group: ibmwxBigscienceMt0xxl

General use with zero- or few-shot prompts. Supports prompts in languages other than English and multilingual prompts.

  • CPU: 2
  • Memory: 128 GB RAM
  • Storage: 62 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
pixtral-large-instruct-2411
  • Status: Available
  • Model ID: pixtral-large-instruct
  • Group: ibmwxPixtralLargeInstruct
A 124-billion parameter multimodal model built on top of Mistral Large 2 that demonstrates frontier-level image understanding.
Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model.
  • CPU: 16
  • Memory: 246 GB RAM
  • Storage: 240 GB
  • Supported GPUs: 8 NVIDIA A100 or 8 NVIDIA H100
    All GPUs must be hosted on a single OpenShift worker node.
  • NVIDIA Multi-Instance GPU support: No
pixtral-12b
  • Status: Available
  • Model ID: pixtral-12b
  • Group: ibmwxPixtral12b

A 12-billion parameter model pretrained and fine-tuned for generative tasks in text and image domains. The model is optimized for multilingual use cases and provides robust performance in creative content generation.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 30 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: No

You cannot add deprecated or withdrawn models to your deployment. For more information about how deprecated and withdrawn models are handled, see Foundation model lifecycle.

Custom foundation models

In addition to foundation models that are curated by IBM, you can upload and deploy your own foundation models. For more information about how to upload, register, and deploy a custom foundation model, see the custom foundation model documentation.

Embedding models

Text embedding models are small enough to run without a GPU. However, if you need better performance from the embedding models, you can configure them to use a GPU.
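
For reference, here is a minimal inference sketch that assumes the ibm-watsonx-ai Python SDK. The class and method names reflect recent releases of that SDK, and the credential values and model identifier are placeholders that you must adapt to your cluster; the tables below list the model IDs.

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

# Placeholder credentials for a Cloud Pak for Data cluster (assumed fields).
credentials = Credentials(
    url="https://<cpd-host>",
    username="<username>",
    api_key="<api-key>",
    instance_id="openshift",
    version="5.2",
)

# Use the ID of an embedding model that is installed on your cluster.
embeddings = Embeddings(
    model_id="<embedding-model-id>",
    credentials=credentials,
    project_id="<project-id>",
)

vectors = embeddings.embed_documents(texts=["First passage.", "Second passage."])
query_vector = embeddings.embed_query(text="example search query")
print(len(vectors), len(query_vector))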

all-minilm-l6-v2
  • Status: Available
  • Model ID: all-minilm-l6-v2
  • Group: ibmwxAllMinilmL6V2

Use all-minilm-l6-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 1 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
all-minilm-l12-v2
  • Status: Available
  • Model ID: all-minilm-l12-v2
  • Group: ibmwxAllMinilmL12V2

Use all-minilm-l12-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 1 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-embedding-107m-multilingual
  • Status: Available
  • Model ID: granite-embedding-107m-multilingual
  • Group: ibmwxGranite107MMultilingualRtrvr

A 107 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings for a given input like a query, passage, or document.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 2 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
granite-embedding-278m-multilingual
  • Status: Available
  • Model ID: granite-embedding-278m-multilingual
  • Group: ibmwxGranite278MMultilingualRtrvr

A 278 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings for a given input like a query, passage, or document.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 2 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
multilingual-e5-large
  • Status: Available
  • Model ID: multilingual-e5-large
  • Group: ibmwxMultilingualE5Large

An embedding model built by Microsoft and provided by Hugging Face. The multilingual-e5-large model is useful for tasks such as passage or information retrieval, semantic similarity, bitext mining, and paraphrase retrieval.

  • CPU: 4
  • Memory: 8 GB
  • Storage: 10 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
slate-30m-english-rtrvr
  • Status: Available
  • Model ID: ibm-slate-30m-english-rtrvr
  • Group: ibmwxSlate30mEnglishRtrvr

The IBM provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 10 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes
slate-125m-english-rtrvr
  • Status: Available
  • Model ID: ibm-slate-125m-english-rtrvr
  • Group: ibmwxSlate125mEnglishRtrvr

The IBM provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 10 GB
  • Supported GPUs: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S
  • NVIDIA Multi-Instance GPU support: Yes

Reranker models

Reranker models are small enough to run without a GPU.

ms-marco-MiniLM-L-12-v2
  • Status: Available
  • Model ID: ms-marco-minilm-l-12-v2
  • Group: ibmwxMsMarcoMinilmL12V2

A reranker model built by Microsoft and provided by Hugging Face. Given query text and a set of document passages, the model ranks the list of passages from most-to-least related to the query.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 10 GB
  • Supported GPUs: None. This model does not require a GPU.
  • NVIDIA Multi-Instance GPU support: Not applicable

Text extraction models

Text extraction is a programmatic method for extracting text from images, tables, and structured PDF documents by using the IBM watsonx.ai API. To use the text extraction API, you must install a set of machine learning models that do the natural language understanding processing during text extraction.
Note: You cannot install text extraction models with a watsonx.ai lightweight engine installation.
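
For orientation, the following sketch shows what a text extraction request can look like. It is hypothetical: the endpoint path, version parameter, and payload fields are assumptions modeled on the public watsonx.ai REST API and may differ in your release, so check the watsonx.ai API reference before use.

import requests

BASE_URL = "https://<cpd-host>"

payload = {
    "project_id": "<project-id>",
    # Input document and output location in connected storage (assumed shapes).
    "document_reference": {
        "type": "connection_asset",
        "connection": {"id": "<connection-id>"},
        "location": {"file_name": "documents/invoice.pdf"},
    },
    "results_reference": {
        "type": "connection_asset",
        "connection": {"id": "<connection-id>"},
        "location": {"file_name": "results/invoice.json"},
    },
}

response = requests.post(
    f"{BASE_URL}/ml/v1/text/extractions",  # assumed endpoint path
    params={"version": "2024-10-18"},       # assumed version date
    headers={"Authorization": "Bearer <token>"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())  # returns a job record that you poll for results
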
Text extraction
  • Status: Available
  • Model ID: wdu
  • Group: Not necessary. The models are always downloaded because they have a small footprint.

A set of text extraction models that are represented by the "wdu" identifier.

  • CPU: 9
  • Memory: 31 GB
  • Storage: 20 GB
  • Supported GPUs: None. These models do not require a GPU.
  • NVIDIA Multi-Instance GPU support: Not applicable

Time series foundation models

You can use the time series API to pass historical data observations to a time series foundation model that can forecast future values. You can deploy the following time series foundation models:

granite-ttm-512-96-r2
  • Status: Available
  • Model ID: granite-ttm-512-96-r2
  • Group: ibmwxGraniteTimeseriesTtmV1

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 1 GB
  • Supported GPUs: None. This model does not require a GPU.
  • NVIDIA Multi-Instance GPU support: Not applicable
granite-ttm-1024-96-r2
  • Status: Available
  • Model ID: granite-ttm-1024-96-r2
  • Group: ibmwxGraniteTimeseriesTtmV1

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 1 GB
  • Supported GPUs: None. This model does not require a GPU.
  • NVIDIA Multi-Instance GPU support: Not applicable
granite-ttm-1536-96-r2
  • Status: Available
  • Model ID: granite-ttm-1536-96-r2
  • Group: ibmwxGraniteTimeseriesTtmV1

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

  • CPU: 2
  • Memory: 4 GB
  • Storage: 1 GB
  • Supported GPUs: None. This model does not require a GPU.
  • NVIDIA Multi-Instance GPU support: Not applicable
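
The numbers in the TTM model names encode the input and output sizes: assuming the conventional Granite TTM naming, granite-ttm-512-96-r2 expects 512 historical data points per channel and forecasts 96. A small illustrative helper for choosing a variant based on how much history you have:

# Illustrative only; assumes the "context-forecast" naming convention above.
TTM_VARIANTS = {
    512: "granite-ttm-512-96-r2",
    1024: "granite-ttm-1024-96-r2",
    1536: "granite-ttm-1536-96-r2",
}

def pick_ttm_model(history_points: int) -> str:
    """Pick the variant with the largest context window your history can fill."""
    eligible = [ctx for ctx in TTM_VARIANTS if ctx <= history_points]
    if not eligible:
        raise ValueError("the TTM models need at least 512 historical observations")
    return TTM_VARIANTS[max(eligible)]

print(pick_ttm_model(2160))  # 90 days of hourly readings -> granite-ttm-1536-96-r2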

Foundation models compatible with LoRA and QLoRA fine tuning

You can use Parameter-Efficient Fine Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) to train, deploy, and inference foundation models. The foundation models compatible with LoRA and QLoRA tuning can only be fine tuned. Unlike most large language models that are provided with IBM watsonx.ai, these models cannot be inferenced in the Prompt Lab or programmatically by using the API right away. The only way to inference one of these base models is to deploy the model as a custom foundation model.
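
As background, LoRA freezes the base model weights and trains small low-rank update matrices, and QLoRA applies the same idea over a quantized base model (for example, a GPTQ checkpoint such as llama-3-1-70b-gptq below). The following sketch is illustrative only: it uses the open-source Hugging Face peft library rather than the watsonx.ai tuning API, and the Hugging Face checkpoint name is an assumption.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a base model (the Hugging Face checkpoint name is an assumption).
base = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-8b-base")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable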

You can deploy the following foundation models that are compatible with LoRA and QLoRA fine tuning:

granite-3-1-8b-base
  • Status: Available
  • Fine tuning method: LoRA
  • Model ID: granite-3-1-8b-base
  • Group: ibmwxGranite318BBase

Granite 3.1 8B base is a pretrained autoregressive foundation model with a context length of 128K that is intended for tuning.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 20 GB
  • Supported GPUs: 1 NVIDIA A100 or 1 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: No
llama-3-1-8b
  • Status: Available
  • Fine tuning method: LoRA
  • Model ID: llama-3-1-8b
  • Group: ibmwxLlama318B

Llama-3-1-8b is a pretrained and fine-tuned generative text model with 8 billion parameters, optimized for multilingual dialogue use cases and code output.

  • CPU: 2
  • Memory: 96 GB RAM
  • Storage: 20 GB
  • Supported GPUs: 1 NVIDIA A100 or 1 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: No
llama-3-1-70b
  • Status: Available
  • Fine tuning method: LoRA
  • Model ID: llama-3-1-70b
  • Group: ibmwxLlama318B

Llama-3-1-70b is a pretrained and fine-tuned generative text model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.

  • CPU: 16
  • Memory: 246 GB RAM
  • Storage: 280 GB
  • Supported GPUs: 4 NVIDIA A100 or 4 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: No
llama-3-1-70b-gptq
  • Status: Available
  • Fine tuning method: QLoRA
  • Model ID: llama-3-1-70b-gptq
  • Group: ibmwxLlama3170BGptq

Llama 3.1 70b is a pretrained and fine-tuned generative text base model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.

  • CPU: 5
  • Memory: 246 GB RAM
  • Storage: 40 GB
  • Supported GPUs: 4 NVIDIA A100 or 4 NVIDIA H100
  • NVIDIA Multi-Instance GPU support: No