GPU requirements for models
If you plan to install services that use models, ensure that you have sufficient GPUs and that your GPUs are supported by the models that you need or want to use.
IBM Software Hub AI assistant
The following models are required only if you plan to use the IBM Software Hub AI assistant and you want to use models that are hosted on a local instance of watsonx.ai™.
- IBM Software Hub AI Assistant Cartridge
- watsonx.ai
- granite-4-h-small
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB RAM | 150 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA RTX PRO 6000 | No |
- slate-30m-english-rtrvr
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | Not required. If you need better performance, you can use any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
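If both models run on the same local watsonx.ai instance, a rough sizing is the sum of their minimum requirements. The following Python sketch does that arithmetic with the values from the two tables above. It assumes that the requirements are simply additive (one deployment of each model, no shared resources) and that slate-30m-english-rtrvr runs without a GPU; `ModelRequirements` and `total_requirements` are hypothetical helpers, not part of any IBM tool.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelRequirements:
    """Minimum resources for one model, copied from the tables above."""
    cpu: int
    memory_gb: int
    storage_gb: int
    gpus: int  # 0 means the model runs without a GPU


AI_ASSISTANT_MODELS: Dict[str, ModelRequirements] = {
    "granite-4-h-small": ModelRequirements(cpu=3, memory_gb=96, storage_gb=150, gpus=1),
    "slate-30m-english-rtrvr": ModelRequirements(cpu=2, memory_gb=4, storage_gb=10, gpus=0),
}


def total_requirements(models: Dict[str, ModelRequirements]) -> ModelRequirements:
    """Add up the requirements, assuming each model is deployed once and separately."""
    return ModelRequirements(
        cpu=sum(m.cpu for m in models.values()),
        memory_gb=sum(m.memory_gb for m in models.values()),
        storage_gb=sum(m.storage_gb for m in models.values()),
        gpus=sum(m.gpus for m in models.values()),
    )


if __name__ == "__main__":
    print(total_requirements(AI_ASSISTANT_MODELS))
    # ModelRequirements(cpu=5, memory_gb=100, storage_gb=160, gpus=1)
```

If you choose to run slate-30m-english-rtrvr on a GPU for better performance, the GPU total would grow accordingly.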
IBM Knowledge Catalog Premium
IBM Knowledge Catalog Standard
- granite-4-h-small
Required if the following statements are true:
- You plan to enable gen AI based features.
- You want to run the gen AI based features on GPU.
You can optionally run the gen AI based features on:
- CPU
  Restriction: This option can be used only for expanding metadata and term assignment when enriching metadata (enableSemanticEnrichment: true). This option is not supported for converting natural language queries to SQL queries (enableTextToSql: true).
- A remote instance of watsonx.ai
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB RAM | 150 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA RTX PRO 6000 | No |
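The rules above can be read as a small decision procedure. The following Python sketch encodes them under stated assumptions: `choose_gen_ai_backend` is a hypothetical helper, the `settings` dictionary only mirrors the two setting names that appear above (`enableSemanticEnrichment` and `enableTextToSql`), and the sketch considers only those two features. Where you actually set these options is not covered here.

```python
from typing import Dict


def choose_gen_ai_backend(settings: Dict[str, bool], has_supported_gpu: bool) -> str:
    """Suggest where to run the gen AI based features (hypothetical helper).

    settings: the two feature switches named above, for example
              {"enableSemanticEnrichment": True, "enableTextToSql": False}.
    has_supported_gpu: True if a GPU type listed for granite-4-h-small is available.
    """
    text_to_sql = settings.get("enableTextToSql", False)
    enrichment = settings.get("enableSemanticEnrichment", False)
    if not (text_to_sql or enrichment):
        return "none"  # no gen AI based features, so the model is not needed
    if has_supported_gpu:
        return "local GPU"  # granite-4-h-small on a supported GPU
    if text_to_sql:
        return "remote watsonx.ai"  # the CPU option does not support text-to-SQL
    return "CPU"  # semantic enrichment only


print(choose_gen_ai_backend({"enableSemanticEnrichment": True}, has_supported_gpu=False))  # CPU
print(choose_gen_ai_backend({"enableTextToSql": True}, has_supported_gpu=False))  # remote watsonx.ai
```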
Watson Studio Runtimes
If you plan to use Watson Studio Runtimes that require GPU, the service requires at least one GPU.
- Runtime 24.1 on Python 3.11 for GPU
- The following table includes the default resource requirements. However, you might need to increase the resources depending on your use case.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 (default) | 2 GB (default) | No storage required. If you need storage, you can connect to a data store. | Any of: 1 NVIDIA A30, 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes. NVIDIA Multi-Instance GPU support is limited to the NVIDIA A100 and NVIDIA H100 GPU types. All of the partitions must be the same configuration and size. |
- Runtime 25.1 on Python 3.12 for GPU
- The following table includes the default resource requirements. However, you might need to increase the resources depending on your use case.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 (default) | 2 GB (default) | No storage required. If you need storage, you can connect to a data store. | Any of: 1 NVIDIA A30, 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes. NVIDIA Multi-Instance GPU support is limited to the NVIDIA A100 and NVIDIA H100 GPU types. All of the partitions must be the same configuration and size. |
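Before you partition GPUs for these runtimes, you can check a proposed layout against the two rules above: only NVIDIA A100 and NVIDIA H100 GPUs support Multi-Instance GPU, and all partitions must have the same configuration and size. This Python sketch is illustrative only; `validate_mig_layout` is a hypothetical helper, and profile strings such as `1g.10gb` and `3g.40gb` are example NVIDIA MIG profile names, one per partition.

```python
from typing import List

# GPU types that support NVIDIA Multi-Instance GPU for these runtimes.
MIG_SUPPORTED_GPUS = {"NVIDIA A100", "NVIDIA H100"}


def validate_mig_layout(gpu_type: str, partition_profiles: List[str]) -> List[str]:
    """Return the problems with a proposed MIG layout; an empty list means it passes.

    gpu_type: GPU model name as written in the tables above, e.g. "NVIDIA A100".
    partition_profiles: one MIG profile name per partition, e.g. ["3g.40gb", "3g.40gb"].
    """
    problems: List[str] = []
    if gpu_type not in MIG_SUPPORTED_GPUS:
        problems.append(f"{gpu_type} is not supported with Multi-Instance GPU")
    distinct = sorted(set(partition_profiles))
    if len(distinct) > 1:
        # Every partition must use the same configuration and size.
        problems.append("partitions differ: " + ", ".join(distinct))
    return problems


print(validate_mig_layout("NVIDIA A100", ["3g.40gb", "3g.40gb"]))  # []
print(validate_mig_layout("NVIDIA A100", ["3g.40gb", "1g.10gb"]))  # ['partitions differ: 1g.10gb, 3g.40gb']
```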
Watson Machine Learning
Watson Machine Learning does not provide any models. You can bring or create your own machine learning models, deep learning models, and foundation models.
If you plan to use deep learning models or other models that require GPUs, the service requires at least one GPU.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| The number of CPUs depends on the model that you use. | The amount of memory depends on the model that you use. | The amount of storage depends on the model that you use. | You can use any of the following GPU types: All GPU nodes on the cluster must be the same type of GPU. | Yes. NVIDIA Multi-Instance GPU support is limited to the following GPU types: All of the partitions must be the same configuration and size. |
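The requirement that all GPU nodes on the cluster use the same type of GPU is easy to check once you have a list of nodes and their GPU models. The following Python sketch assumes that you have already collected that mapping, for example from the `nvidia.com/gpu.product` node label that NVIDIA GPU Feature Discovery typically sets; the node names and label values shown are made-up examples, and `has_uniform_gpu_type` is a hypothetical helper.

```python
from typing import Dict, Optional, Set


def distinct_gpu_types(node_gpu_products: Dict[str, Optional[str]]) -> Set[str]:
    """Collect the GPU models used by GPU nodes; nodes mapped to None have no GPU."""
    return {product for product in node_gpu_products.values() if product is not None}


def has_uniform_gpu_type(node_gpu_products: Dict[str, Optional[str]]) -> bool:
    """True when every GPU node reports the same GPU model (CPU-only nodes are ignored)."""
    return len(distinct_gpu_types(node_gpu_products)) <= 1


# Example inventory: node name -> GPU model, or None for a CPU-only node.
nodes = {
    "worker-0": "NVIDIA-A100-SXM4-80GB",
    "worker-1": "NVIDIA-A100-SXM4-80GB",
    "worker-2": None,
}
print(has_uniform_gpu_type(nodes))  # True
nodes["worker-3"] = "NVIDIA-L40S"
print(has_uniform_gpu_type(nodes))  # False
```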
Watson Speech services
- mistral-small-3-1-24b-instruct-2503
- Required only if you plan to enable enrichment to improve the readability and usability of raw Automatic Speech Recognition (ASR) transcripts.
watsonx.ai
You can choose which foundation models to install.
watsonx.ai supports the following types of foundation models:
Foundation models
- allam-1-13b-instruct
Status: Deprecated
A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and fine-tuned to support conversational tasks.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 30 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
Note: This model supports full fine-tuning when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
- codestral-2501
Status: Deprecated
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 30 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200 | Yes, with additional configuration. For details, see Installing models on GPU partitions. |
- codestral-2508
Status: Available
Ideal for code generation and high-precision fill-in-the-middle (FIM) completion. The foundation model is optimized for production engineering environments that are latency-sensitive, context-aware, and self-deployable.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB RAM | 30 GB | Any of: 2 NVIDIA A100, 2 NVIDIA H20, 2 NVIDIA H100, 2 NVIDIA H200 | No |
- devstral-medium-2507
Status: Available
The devstral-medium-2507 foundation model from Mistral AI is a high-performance code generation and agentic reasoning model. It is ideal for generalization across prompt styles and for tool use in code agents and frameworks.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB RAM | 250 GB | Any of: 4 NVIDIA A100, 4 NVIDIA H20, 4 NVIDIA H100, 4 NVIDIA H200 | No |
- devstral-medium-2512
Status: Available
The devstral-medium-2512 foundation model from Mistral AI is an agentic model for software engineering tasks from the Devstral 2 model family. It excels at using tools to explore codebases, edit multiple files, and power software engineering agents.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB RAM | 200 GB | Any of: 4 NVIDIA A100, 4 NVIDIA H100, 4 NVIDIA H200 | No |
- devstral-small-2512
Status: Available
The devstral-small-2512 foundation model from Mistral AI is an agentic model for software engineering tasks from the Devstral 2 model family. It excels at using tools to explore codebases, edit multiple files, and power software engineering agents.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 Gi RAM | 30 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200 | No |
- gpt-oss-20b
Status: Available
The gpt-oss foundation models are OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, fine-tuning, and various developer use cases.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 128 GB RAM | 100 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200 | No |
- gpt-oss-120b
Status: Available
The gpt-oss foundation models are OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, fine-tuning, and various developer use cases.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 6 | 96 GB RAM | 195 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S | No |
- granite-3-2b-instruct
Status: Available
Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 6 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- granite-3-8b-instruct
Status: Available
Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 20 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | Yes |
- granite-3-2-8b-instruct
Status: Deprecated
A text-only model that is capable of reasoning. You can choose whether reasoning is enabled, based on your use case.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 GB RAM | 20 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | No |
- granite-3-3-8b-instruct
Status: Available
An IBM-trained, dense decoder-only model, which is particularly well-suited for generative tasks.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 18 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 NVIDIA RTX PRO 6000 | No |
- granite-3b-code-instruct
Status: Available
A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 9 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | Yes |
Note: This model can be fine-tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
- granite-8b-code-instruct
Status: Available
An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 19 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | Yes |
Note: This model supports full fine-tuning when configured to use an NVIDIA A100, NVIDIA H100, NVIDIA H200, or NVIDIA L40S GPU.
- granite-20b-code-instruct
Status: Available
A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 70 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | No |
Note: This model supports full fine-tuning when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
- granite-34b-code-instruct
Status: Available
A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 78 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | No |
- granite-20b-code-base-schema-linking
Status: Available
Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 44 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | No |
- granite-20b-code-base-sql-gen
Status: Available
Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 44 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | No |
- granite-4-h-micro
Status: Available
The Granite 4.0 foundation models belong to the IBM Granite family of models. The granite-4-h-micro is a 3 billion parameter foundation model built for structured and long-context capabilities. The model is ideal for instruction following and tool-calling.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 30 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200 | No |
- granite-4-h-small
Status: Available
The Granite 4.0 foundation models belong to the IBM Granite family of models. The granite-4-h-small is a 30 billion parameter foundation model built for structured and long-context capabilities. The model is ideal for instruction following and tool calling.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB RAM | 150 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA RTX PRO 6000 | No |
- granite-4-h-tiny
Status: Available
The Granite 4.0 foundation models belong to the IBM Granite family of models. The granite-4-h-tiny is a 7 billion parameter long-context instruction-tuned model developed using a diverse set of techniques with a structured chat format, including supervised fine-tuning, model alignment using reinforcement learning, and model merging. This model is ideal for instruction following and tool-calling capabilities.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 30 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | No |
- granite-docling-258M
Status: Available
Granite Docling is a multimodal image-text-to-text model for efficient document conversion. The model preserves the core features of Docling while maintaining seamless integration with Docling documents to ensure full compatibility.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 GB RAM | 10 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | No |
- granite-guardian-3-2b
Status: Deprecated
Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 10 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | Yes |
- granite-guardian-3-8b
Status: Deprecated
Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 20 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- granite-guardian-3-2-5b
Status: Available
The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well suited for generative tasks. This model cannot be used through the API.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 | 4 GB RAM | 15 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 NVIDIA RTX PRO 6000 | No |
- granite-4-1b-speech
Status: Available
Granite-4-1b-speech is a compact and efficient speech-language model, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST). The model was trained on a collection of public corpora comprising diverse datasets for ASR and AST, as well as synthetic datasets tailored to support Japanese ASR, keyword-biased ASR, and speech translation. Granite-4-1b-speech was trained by modality-aligning granite-4-1b-base to speech on publicly available open source corpora containing audio inputs and text targets.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 10 Gi | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
- granite-vision-3-2-2b
Status: Deprecated
Granite 3.2 Vision is an image-text-in, text-out model capable of understanding images like charts for enterprise use cases for computer vision tasks.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 GB RAM | 7 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | No |
- granite-vision-3-3-2b
Status: Available
Granite 3.3 Vision is an image-text-in, text-out model capable of understanding images like charts for enterprise use cases for computer vision tasks.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 | 128 GB RAM | 10 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | No |
- ibm-defense-3-3-8b-instruct
Status: Available
The IBM watsonx.ai Defense Model is a specialized fine-tuned version of IBM’s granite-3-3-8b-instruct base model. The model is developed with Janes trusted open-source defense data to support defense and intelligence operations.
Attention: You must purchase the IBM watsonx.ai Defense Model entitlement separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 8 GB RAM | 18 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | No |
- ibm-defense-4-0-micro
Status: Available
The ibm-defense-4-0-micro is a defense-focused large language model (LLM) fine-tuned from an IBM Granite model. This model is designed to work with Janes foundation defense data, delivering fast, reliable, and contextual results for mission-critical tasks in defense organizations.
Attention: You must purchase the IBM watsonx.ai Defense Model entitlement separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 GB RAM | 60 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | No |
- ibm-defense-4-0-small
Status: Available
The ibm-defense-4-0-small is a defense-focused large language model fine-tuned from an IBM Granite model. This model is designed to work with Janes foundation defense data, delivering fast, reliable, and contextual results for mission-critical tasks in defense organizations.
Attention: You must purchase the IBM watsonx.ai Defense Model entitlement separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 85 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200 | No |
- llama-3-1-8b-instruct
Status: Available
An auto-regressive language model that uses an optimized transformer architecture.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 20 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | Yes |
Note: This model supports full fine-tuning when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA L40S GPU.
- llama-3-1-70b-instruct
Status: Available
An auto-regressive language model that uses an optimized transformer architecture.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 16 | 246 GB RAM | 163 GB | Any of: 4 NVIDIA A100, 4 NVIDIA H100, 2 NVIDIA H200, 4 NVIDIA L40S | No |
- llama-3-2-1b-instruct
Status: Available
A pretrained and fine-tuned generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and code output.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 10 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | Yes |
- llama-3-2-3b-instruct
Status: Available
A pretrained and fine-tuned generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and code output.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 9 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 Intel Gaudi 3 AI Accelerator | Yes |
- llama-3-3-70b-instruct
Status: Available
A state-of-the-art refresh of the Llama 3.1 70B Instruct model that uses the latest advancements in post-training techniques.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB RAM | 75 GB | Any of: 2 NVIDIA A100, 2 NVIDIA H20, 2 NVIDIA H100, 1 NVIDIA H200, 4 NVIDIA L40S | No |
- llama-3-2-11b-vision-instruct
Status: Available
A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 30 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S, 1 NVIDIA RTX PRO 6000 | No |
- llama-3-2-90b-vision-instruct
Status: Available
A pretrained and fine-tuned generative text model with 90 billion parameters, optimized for multilingual dialogue use cases and code output.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 16 | 246 GB RAM | 200 GB | Any of: 8 NVIDIA A100, 8 NVIDIA H20, 8 NVIDIA H100, 4 NVIDIA H200, 8 NVIDIA L40S | No |
- llama-guard-3-11b-vision
Status: Available
A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 30 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, 1 NVIDIA RTX PRO 6000 | No |
- llama-4-maverick-17b-128e-instruct-fp8
Status: Available
The Llama 4 collection of models consists of natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 9 | 96 GB RAM | 425 GB | Any of: 8 NVIDIA A100, 8 NVIDIA H100, 4 NVIDIA H200 | No |
- llama-4-maverick-17b-128e-instruct-int4
Status: Available
The Llama 4 collection of models consists of multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 4 | 128 GB RAM | 250 GB | Any of: 4 NVIDIA A100, 4 NVIDIA H100, 4 NVIDIA H200 | No |
- llama-4-scout-17b-16e-instruct-int4
Status: Available
The Llama 4 collection of models consists of natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 | 128 GB RAM | 215 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200 | No |
- magistral-medium-2509
Status: Available
Magistral Medium 2509 is an update to the 2507 version with improvements in math and coding benchmarks, along with image input support.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 400 Gi | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
- magistral-small-2509
Status: Available
Magistral Small 2509 builds on Mistral Small 3.2 (2506) and adds reasoning capabilities through supervised fine-tuning (SFT) on Magistral Medium traces, followed by reinforcement learning (RL). It is a small, efficient reasoning model with 24B parameters.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 Gi RAM | 120 Gi | Any of: 2 NVIDIA A100, 2 NVIDIA H100, 2 NVIDIA L40S | Yes |
- ministral-3b-instruct-2512
Status: Available
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 18 Gi | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
- ministral-8b-instruct
Status: Deprecated
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 35 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S | Yes |
- ministral-8b-instruct-2512
Status: Available
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 18 Gi | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
- ministral-14b-instruct-2512
Status: Available
Ideal for complex tasks that require large reasoning capabilities or are highly specialized.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 GB RAM | 20 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200 | No |
- ministral-3-14b-instruct-2512-bf16
Status: Available
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 18 Gi | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
- mistral-large-2512
Status: Available
The mistral-large-2512 foundation model, also known as Mistral Large 3, is a state-of-the-art, general-purpose, multimodal, granular mixture-of-experts model with 41 billion active parameters and 675 billion total parameters. The model was trained from the ground up on 3000 NVIDIA H200 GPUs.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 48 | 512 GB RAM | 969 GB | 8 NVIDIA H200 | No |
- mistral-large-instruct-2411
Status: Available
The most advanced large language model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB RAM | 140 GB | Any of: 4 NVIDIA A100, 4 NVIDIA H20, 4 NVIDIA H100, 4 NVIDIA H200 | No |
- mistral-medium-2505
Status: Deprecated
Mistral Medium 3 features multimodal capabilities and an extended context length of up to 128k. The model can process and understand visual inputs and long documents, and supports many languages.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 Gi RAM | 280 GB | Any of: 4 NVIDIA A100, 4 NVIDIA H100, 4 NVIDIA H200, 4 NVIDIA L40S | No |
- mistral-medium-2508
Status: Available
The mistral-medium-2508 foundation model is an enhancement of mistral-medium-2505, with state-of-the-art performance in coding and multimodal understanding.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB RAM | 300 GB | Any of: 4 NVIDIA A100, 4 NVIDIA H20, 4 NVIDIA H100, 4 NVIDIA H200 | No |
- mistral-small-3-1-24b-instruct-2503
Status: Available
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding, enhances long-context capabilities, and is suitable for function calling and agents.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB RAM | 105 GB | Any of: 2 NVIDIA A100, 2 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S | Yes |
- mistral-small-3-2-24b-instruct-2506
Status: Available
The mistral-small-3-2-24b-instruct-2506 foundation model is an enhancement to mistral-small-3-1-24b-instruct-2503, with better instruction following and tool calling performance.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB RAM | 210 GB | Any of: 2 NVIDIA A100, 2 NVIDIA H20, 2 NVIDIA H100, 2 NVIDIA H200 | No |
- nvidia-nemotron-nano-12b-v2-vl-fp8
Status: Available
NVIDIA-Nemotron-Nano-VL-12B-V2-FP8 is the quantized version of the NVIDIA Nemotron Nano VL V2 model, which is an auto-regressive vision language model that uses an optimized transformer architecture.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi RAM | 30 Gi | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
- nvidia-nemotron-3-nano-30b-a3b-fp8
Status: Available
Nemotron-Nano-3-30B-A3B-FP8 is a quantized version of Nemotron-Nano-3-30B-A3B, a large language model (LLM) that NVIDIA trained from scratch and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 4 | 64 Gi RAM | 40 Gi | Any of: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA L40S | Yes |
- pixtral-12b
Status: Deprecated
A 12 billion parameter model pretrained and fine-tuned for generative tasks in text and image domains. The model is optimized for multilingual use cases and provides robust performance in creative content generation.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB RAM | 30 GB | Any of: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, 2 NVIDIA L40S | No |
- pixtral-large-instruct-2411
Status: Available
A 124 billion parameter multimodal model that is built on top of Mistral Large 2 and demonstrates frontier-level image understanding.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 16 | 246 GB RAM | 240 GB | Any of: 8 NVIDIA A100, 8 NVIDIA H20, 8 NVIDIA H100, 4 NVIDIA H200 | No |
- voxtral-mini-2507
- Status: Available
Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi | 18 Gi | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | Yes |
- voxtral-small-24b-2507
- Status: Available
Voxtral Small is an enhancement of Mistral Small 3.1 that offers state-of-the-art audio capabilities and text performance and can process up to 30 minutes of audio.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB | 210 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, or 1 NVIDIA H200 | No |
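Each requirement table above lists alternative GPU configurations, and you need only one of them. The following Python sketch checks whether the free GPUs on a single worker node satisfy at least one alternative for a model. It assumes that one alternative cannot be split across GPU types or across nodes; the inventory dictionary, the `ModelGpuSpec` type, and the `matching_option` function are hypothetical illustrations and are not part of any IBM Software Hub tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelGpuSpec:
    """GPU alternatives for one model: each option is (GPU type, count)."""
    name: str
    options: tuple


# Values copied from the requirement tables above.
PIXTRAL_LARGE = ModelGpuSpec(
    "pixtral-large-instruct-2411",
    (("NVIDIA A100", 8), ("NVIDIA H20", 8), ("NVIDIA H100", 8), ("NVIDIA H200", 4)),
)
VOXTRAL_SMALL = ModelGpuSpec(
    "voxtral-small-24b-2507",
    (("NVIDIA A100", 1), ("NVIDIA H20", 1), ("NVIDIA H100", 1), ("NVIDIA H200", 1)),
)


def matching_option(spec, free_gpus_on_node):
    """Return the first (type, count) alternative that the node can satisfy, or None.

    free_gpus_on_node is a hypothetical inventory that you collect yourself:
    GPU type -> number of free GPUs of that type on ONE worker node.
    """
    for gpu_type, count in spec.options:
        if free_gpus_on_node.get(gpu_type, 0) >= count:
            return gpu_type, count
    return None


print(matching_option(PIXTRAL_LARGE, {"NVIDIA H200": 4}))
print(matching_option(PIXTRAL_LARGE, {"NVIDIA A100": 4}))
```

For example, a node with four free NVIDIA H200 GPUs satisfies pixtral-large-instruct-2411 through its 4 NVIDIA H200 alternative, whereas a node with four free NVIDIA A100 GPUs does not, because the A100 alternative needs eight.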
Embedding models
Text embedding models are small enough to run without GPU. However, if you need better performance from the embedding models, you can configure them to use GPU.
- all-minilm-l6-v2
- Status: Available
Use all-minilm-l6-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- all-minilm-l12-v2
- Status: Available
Use all-minilm-l12-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- granite-embedding-107m-multilingual
- Status: Available
A 107 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high-quality text embeddings for a given input like a query, passage, or document.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 2 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- granite-embedding-278m-multilingual
- Status: Available
A 278 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high-quality text embeddings for a given input like a query, passage, or document.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 2 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- granite-embedding-english-reranker-r2
- Status: Available
A 149 million parameter model from the Granite Embeddings suite provided by IBM. The model is based on granite-embedding-english-r2 and is trained for passage reranking in RAG pipelines.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, or 1 NVIDIA H200 | No |
- multilingual-e5-large
- Status: Available
An embedding model built by Microsoft and provided by Hugging Face. The multilingual-e5-large model is useful for tasks such as passage or information retrieval, semantic similarity, bitext mining, and paraphrase retrieval.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 4 | 8 GB | 10 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- slate-30m-english-rtrvr
- Status: Available
The IBM provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- slate-125m-english-rtrvr
- Status: Available
The IBM provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
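The embedding models above map text to vectors, and a common way to measure how semantically similar two texts are is the cosine similarity of their vectors. The following Python sketch computes cosine similarity with the standard library only. The short vectors are made-up stand-ins for real model output, which typically has hundreds of dimensions or more, and the code does not call watsonx.ai.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for embeddings of a query and two passages.
query = [0.2, 0.9, 0.1]
close_passage = [0.25, 0.85, 0.05]
far_passage = [0.9, 0.05, 0.4]

print(cosine_similarity(query, close_passage))
print(cosine_similarity(query, far_passage))
```

A retrieval pipeline would embed the query and each passage with the same model and keep the passages whose vectors score highest against the query vector.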
Reranker models
Reranker models are small enough to run without GPU.
- ms-marco-MiniLM-L-12-v2
- Status: Available
A reranker model built by Microsoft and provided by Hugging Face. Given query text and a set of document passages, the model ranks the list of passages from most-to-least related to the query.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | This model does not require any GPU. | Not applicable. |
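A reranker scores each (query, passage) pair and then orders the passages by score, highest first. The following Python sketch shows only that ordering step. The word-overlap function `overlap_score` is a hypothetical stand-in for the relevance score; ms-marco-MiniLM-L-12-v2 is a cross-encoder that computes the score with a neural network instead.

```python
def overlap_score(query, passage):
    """Stand-in relevance score: fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(query, passages, score=overlap_score):
    """Return the passages ordered from most to least related to the query."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)


passages = [
    "Time series models forecast future values.",
    "GPU requirements for reranker models",
    "Reranker models run without GPU.",
]
for passage in rerank("reranker models GPU", passages):
    print(passage)
```

Passing a different `score` callable swaps in another scoring method without changing the ordering logic.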
Document text processing models
Document text processing models are small enough to run without GPU.
- wdu
- Status: Available
A set of natural language text processing models that are represented by the "wdu" identifier.
Restriction: You cannot install text processing models on the watsonx.ai lightweight engine.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 9 | 31 GB | 20 GB | These models do not require any GPU. | Not applicable. |
Time series models
Time series models are small enough to run without GPU.
- granite-ttm-512-96-r2
- Status: Available
The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
- granite-ttm-1024-96-r2
- Status: Available
The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
- granite-ttm-1536-96-r2
- Status: Available
The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 1 GB | This model does not require any GPU. | Not applicable. |
Foundation models available for tuning
- granite-3-1-8b-base
- Status: Available
Tuning method: Full fine tuning, LoRA fine tuning
Granite 3.1 8b base is a pretrained autoregressive foundation model with a context length of 128k intended for tuning.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB | 20 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 2 NVIDIA L40S | No |
- llama-3-1-8b
- Status: Available
Tuning method: Full fine tuning, LoRA fine tuning
Llama-3-1-8b is a pretrained generative text model with 8 billion parameters, optimized for multilingual dialogue use cases and code output.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 96 GB | 20 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 2 NVIDIA L40S | No |
- llama-3-1-70b
- Status: Available
Tuning method: Full fine tuning, LoRA fine tuning
Llama-3-1-70b is a pretrained generative text model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 16 | 246 GB | 280 GB | Any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H100, 4 NVIDIA H200, or 2 NVIDIA L40S | No |
- llama-3-1-70b-gptq
- Status: Available
Tuning method: QLoRA fine tuning
Llama 3.1 70b is a pretrained generative text base model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB | 40 GB | Any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H100, or 4 NVIDIA H200 | No |
watsonx Assistant
- gpt-oss-120b
- Required only if you plan to enable one or more of the following features:
- Rewrite user questions to an understood format for conversational search
- Answer conversational search questions
- Gather information to fill in variables in custom actions
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 6 | 96 GB | 195 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 2 NVIDIA L40S | No |
- Deprecated models
- If you are upgrading to IBM Software Hub Version 5.4, migrate to the gpt-oss-120b model. The following models are deprecated for use with watsonx Assistant:
- granite-3-8b-instruct
- ibm-granite-8b-unified-api-model-v2
- llama-3-1-70b-instruct
- llama-3-3-70b-instruct
watsonx BI
- gpt-oss-120b
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 6 | 96 GB | 195 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 2 NVIDIA L40S | No |
- granite-4-h-small
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB | 150 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA H200. NVIDIA L40S and Intel Gaudi 3 AI Accelerator are not supported. | No |
- slate-30m-english-rtrvr
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
watsonx Code Assistant™
- granite-3-3-8b-instruct
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi | 18 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S* | No |
* NVIDIA L40S GPUs have lower performance than other GPUs and are suitable only for smaller payloads.
- ibm-granite-20b-code-javaenterprise-v2
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 | 2 GB | 45 GB | 1 NVIDIA H100 | No |
- ms-marco-MiniLM-L-12-v2
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | This model does not require any GPU. | Not applicable. |
watsonx Code Assistant for Red Hat Ansible Lightspeed
- ibm-granite-20b-code-8k-ansible
- Required. The model provides the following features:
- Ansible® task generation
- Ansible role generation
- Ansible code explanation
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 | 2 GB | 45 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, or 1 NVIDIA L40S | No |
watsonx Code Assistant for Z Agentic
- mistral-medium-2508
- Required.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB | 300 GB | Any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H20, 4 NVIDIA H100, or 4 NVIDIA H200 | No |
watsonx Code Assistant for Z Understand
- mistral-medium-2508
- Required.
Attention: You must purchase the Mistral AI with IBM Z® for IBM Z license to download and use this model.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 5 | 246 GB | 300 GB | Any of the following GPU types: 4 NVIDIA A100, 4 NVIDIA H20, 4 NVIDIA H100, or 4 NVIDIA H200 | No |
- ms-marco-MiniLM-L-12-v2
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | This model does not require any GPU. | Not applicable. |
- slate-125m-english-rtrvr
- Required.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
watsonx.data™ Premium
watsonx.data intelligence
- granite-4-h-small
- Required if the following statements are true:
- You plan to enable gen AI-based features
- You want to run the gen AI-based features on GPU
You can optionally run the gen AI-based features on:
- CPU
- A remote instance of watsonx.ai
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB | 150 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA RTX PRO 6000 | No |
- Unstructured Data Integration
- If you plan to install the Unstructured Data Integration feature, the feature requires several models. Use the following table to determine the GPU requirements based on the models that you plan to use.
| What is the model used for? | Models you can use |
|---|---|
| Data class assignment | You must use the following model: mistral-small-3-1-24b-instruct-2503 |
| Embedding | Pick one of the following models: granite-embedding-278m-multilingual or multilingual-e5-large. Alternatively, you can use another embedding model if it is already available in your environment. |
| Text extraction | You must use the following model: mistral-small-3-1-24b-instruct-2503 |
| HAP filtering | Pick one of the following models: granite-guardian-3-2-5b or granite-guardian-3-8b |
Important: For models that use more than 1 GPU, all GPUs must be hosted on a single Red Hat OpenShift Container Platform worker node.
- mistral-small-3-1-24b-instruct-2503
- Required. This model is required for:
- Data class assignment
- Text extraction
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB | 105 GB | Any of the following GPU types: 2 NVIDIA A100, 2 NVIDIA H100, 1 NVIDIA H200, or 2 NVIDIA L40S | Yes |
- granite-embedding-278m-multilingual
- Optional.
This model can be used for embedding.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 2 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- multilingual-e5-large
- Optional.
This model can be used for embedding.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 4 | 8 GB | 10 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- granite-guardian-3-2-5b
- Optional.
This model can be used for HAP filtering.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 | 4 GB | 15 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, or 1 NVIDIA RTX PRO 6000 | No |
- granite-guardian-3-8b
- Optional.
This model can be used for HAP filtering.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi | 20 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
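To size a cluster for Unstructured Data Integration, you can add up the GPUs for the models that you pick from the table and confirm that each multi-GPU model fits on one worker node. The following Python sketch does this for the NVIDIA A100 alternative only. The node names, the free-GPU inventory, and the `place_models` function are hypothetical; the largest-first greedy placement is one simple heuristic, not an IBM scheduling algorithm; and the sketch assumes that one mistral-small-3-1-24b-instruct-2503 deployment serves both data class assignment and text extraction, so it is counted once. Other GPU types need their own counts (for example, mistral-small-3-1-24b-instruct-2503 needs 1 NVIDIA H200 instead of 2 NVIDIA A100).

```python
# NVIDIA A100 count per model, copied from the Unstructured Data Integration tables above.
A100_COUNTS = {
    "mistral-small-3-1-24b-instruct-2503": 2,
    "granite-embedding-278m-multilingual": 1,
    "multilingual-e5-large": 1,
    "granite-guardian-3-2-5b": 1,
    "granite-guardian-3-8b": 1,
}


def place_models(models, free_gpus_per_node):
    """Assign each model to ONE worker node that has enough free A100 GPUs.

    free_gpus_per_node is a hypothetical inventory: node name -> free A100 count.
    Returns {model: node} or raises RuntimeError if a model cannot fit on any node.
    """
    free = dict(free_gpus_per_node)
    placement = {}
    # Place the largest models first so that small models do not fragment a node.
    for model in sorted(models, key=lambda m: A100_COUNTS[m], reverse=True):
        needed = A100_COUNTS[model]
        node = next((n for n, available in free.items() if available >= needed), None)
        if node is None:
            raise RuntimeError(f"no single node has {needed} free GPUs for {model}")
        free[node] -= needed
        placement[model] = node
    return placement


# One valid selection: the required model, one embedding model, one HAP model.
selection = [
    "mistral-small-3-1-24b-instruct-2503",
    "granite-embedding-278m-multilingual",
    "granite-guardian-3-2-5b",
]
print("Total A100 GPUs:", sum(A100_COUNTS[m] for m in selection))
print(place_models(selection, {"worker-1": 1, "worker-2": 3}))
```

A greedy placement can fail on inventories where a different assignment would succeed, and in a real cluster the Kubernetes scheduler in Red Hat OpenShift Container Platform makes the actual pod placement decision; treat the sketch as a quick capacity check only.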
watsonx.data integration
If you plan to install the Unstructured Data Integration feature, the feature requires several models. Use the following table to determine the GPU requirements based on the models that you plan to use.
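| What is the model used for? | Models you can use |
|---|---|
| Data class assignment | You must use the following model: mistral-small-3-1-24b-instruct-2503 |
| Embedding | Pick one of the following models: granite-embedding-278m-multilingual or multilingual-e5-large. Alternatively, you can use another embedding model if it is already available in your environment. |
| Text extraction | You must use the following model: mistral-small-3-1-24b-instruct-2503 |
| HAP filtering | Pick one of the following models: granite-guardian-3-2-5b or granite-guardian-3-8b |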
- mistral-small-3-1-24b-instruct-2503
- Required. This model is required for:
- Data class assignment
- Text extraction
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 3 | 96 GB | 105 GB | Any of the following GPU types: 2 NVIDIA A100, 2 NVIDIA H100, 1 NVIDIA H200, or 2 NVIDIA L40S | Yes |
- granite-embedding-278m-multilingual
- Optional.
This model can be used for embedding.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 2 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- multilingual-e5-large
- Optional.
This model can be used for embedding.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 4 | 8 GB | 10 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- granite-guardian-3-2-5b
- Optional.
This model can be used for HAP filtering.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 1 | 4 GB | 15 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, 1 NVIDIA L40S, or 1 NVIDIA RTX PRO 6000 | No |
- granite-guardian-3-8b
- Optional.
This model can be used for HAP filtering.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 32 Gi | 20 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
watsonx Orchestrate
You can choose where the foundation models that you need are hosted:
- The same cluster as watsonx Orchestrate
Choosing a model: You must use the models provided by IBM.
GPU requirements: You must have sufficient GPU on the cluster where you plan to install watsonx Orchestrate.
- A remote or external cluster by using AI gateway
Choosing a model: You can choose whether to use:
- The models provided by IBM
- A custom model
If you use a custom model, you must register the external model through AI gateway.
GPU requirements: Local GPU is not required. Remote GPU might be required:
- If you plan to host models on a remote cluster, you must have sufficient GPU on the cluster where you plan to install the foundation models. For more information on GPU requirements, consult the documentation from the model provider.
- If you plan to use models hosted by a third party, you don't need GPU.
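The hosting choices above determine where, if anywhere, you need GPU for the watsonx Orchestrate foundation models. The following Python sketch encodes that decision as a small lookup. The `Hosting` enum and the `gpu_requirements` function are hypothetical names used only for illustration; the sketch covers the foundation models only, because slate-30m-english-rtrvr is always installed on the same cluster as watsonx Orchestrate and does not require GPU.

```python
from enum import Enum


class Hosting(Enum):
    """Where the watsonx Orchestrate foundation models run (hypothetical labels)."""
    SAME_CLUSTER = "same cluster as watsonx Orchestrate"
    REMOTE_CLUSTER = "remote cluster, reached through AI gateway"
    THIRD_PARTY = "third-party hosted model, reached through AI gateway"


def gpu_requirements(hosting):
    """Return (local_gpu_required, remote_gpu_required) for a hosting choice."""
    if hosting is Hosting.SAME_CLUSTER:
        # IBM-provided models run next to watsonx Orchestrate, so this cluster needs GPU.
        return True, False
    if hosting is Hosting.REMOTE_CLUSTER:
        # The remote cluster that hosts the models needs GPU; size it from the provider's docs.
        return False, True
    # A third party hosts the model, so you do not supply GPU.
    return False, False


for choice in Hosting:
    local, remote = gpu_requirements(choice)
    print(f"{choice.value}: local GPU={local}, remote GPU={remote}")
```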
- Models provided by IBM
- Important: For models that use more than 1 GPU, all GPUs must be hosted on a single Red Hat OpenShift Container Platform worker node.
- gpt-oss-120b
- Required if you use local models. The model is used to:
- Answer conversational search questions
- Rewrite user questions to an understood format for conversational search
- Gather information to fill in variables in custom actions
- Select, connect, and coordinate multiple tools or APIs by using agentic AI
- Use prebuilt agentic AI agents that target specific domains
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 6 | 96 GB | 195 GB | Any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H20, 1 NVIDIA H100, 1 NVIDIA H200, or 2 NVIDIA L40S | No |
- slate-30m-english-rtrvr
- Required.
The model is used to:
- Provide semantic search of the watsonx Orchestrate catalog
- Support agent knowledge file upload
This model does not require GPU and is always installed on the same cluster as watsonx Orchestrate.
| CPU | Memory | Storage | Supported GPUs | NVIDIA Multi-Instance GPU support |
|---|---|---|---|---|
| 2 | 4 GB | 10 GB | GPUs are not required. If you need better performance, you can use any of the following GPU types: 1 NVIDIA A100, 1 NVIDIA H100, 1 NVIDIA H200, or 1 NVIDIA L40S | Yes |
- Deprecated models
- If you are upgrading to IBM Software Hub Version 5.4, migrate to the gpt-oss-120b model. The following models are deprecated for use with watsonx Orchestrate:
- granite-3-8b-instruct
- ibm-granite-8b-unified-api-model-v2
- llama-3-1-70b-instruct
- llama-3-2-90b-vision-instruct