GPU requirements for models

If you plan to install services that use models, ensure that you have sufficient GPU capacity and that your GPUs are compatible with the models that you need or want to use.

IBM Knowledge Catalog Premium

granite-3-8b-instruct
Required if both of the following statements are true:
  • You plan to enable gen AI-based features
  • You want to run the gen AI-based features on GPU
You can optionally run the gen AI-based features on:
  • CPU
    Restriction: This option can be used only for expanding metadata and term assignment when enriching metadata (enableSemanticEnrichment: true).

    This option is not supported for converting natural language queries to SQL queries (enableTextToSql: true).

  • A remote instance of watsonx.ai™
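The restriction above refers to two installation options, enableSemanticEnrichment and enableTextToSql. As an illustrative sketch only (the option names come from this page, but the surrounding file structure is an assumption, not the documented install format), a CPU-only deployment might set:

```yaml
# Sketch of gen AI feature flags for a CPU-only deployment.
# Option names are from this page; the file layout is assumed.
enableSemanticEnrichment: true   # metadata expansion and term assignment work on CPU
enableTextToSql: false           # text-to-SQL requires GPU or a remote watsonx.ai instance
```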
CPU: 2
Memory: 96 GB RAM
Storage: 20 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
NVIDIA Multi-Instance GPU support: Yes

IBM Knowledge Catalog Standard

granite-3-8b-instruct
Required if both of the following statements are true:
  • You plan to enable gen AI-based features
  • You want to run the gen AI-based features on GPU
You can optionally run the gen AI-based features on:
  • CPU
    Restriction: This option can be used only for expanding metadata and term assignment when enriching metadata (enableSemanticEnrichment: true).

    This option is not supported for converting natural language queries to SQL queries (enableTextToSql: true).

  • A remote instance of watsonx.ai
CPU: 2
Memory: 96 GB RAM
Storage: 20 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
NVIDIA Multi-Instance GPU support: Yes

Watson Studio Runtimes

If you plan to use Watson Studio Runtimes that require GPU, the service requires at least one GPU.

Runtime 24.1 on Python 3.11 for GPU
The following table includes the default resource requirements. However, you might need to increase the resources depending on your use case.
CPU: 1 (default)
Memory: 2 GB (default)
Storage: No storage required. If you need storage, you can connect to a data store.
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A30
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

NVIDIA Multi-Instance GPU support is limited to the following GPU types:

  • NVIDIA A100
  • NVIDIA H100

All of the partitions must be the same configuration and size.
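The equal-partition requirement can be expressed in the NVIDIA GPU operator's MIG manager, which applies a mig-parted configuration to every GPU on a node. A minimal sketch, assuming the standard mig-parted config format (the config name "all-1g.10gb" and the A100 80GB profile are illustrative):

```yaml
# Sketch of a mig-parted configuration that slices every GPU on the
# node into identical partitions, satisfying the same-size requirement.
version: v1
mig-configs:
  all-1g.10gb:
    - devices: all          # apply to every GPU on the node
      mig-enabled: true
      mig-devices:
        "1g.10gb": 7        # seven identical 1g.10gb partitions per A100 80GB
```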

Runtime 25.1 on Python 3.12 for GPU
The following table includes the default resource requirements. However, you might need to increase the resources depending on your use case.
CPU: 1 (default)
Memory: 2 GB (default)
Storage: No storage required. If you need storage, you can connect to a data store.
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A30
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

NVIDIA Multi-Instance GPU support is limited to the following GPU types:

  • NVIDIA A100
  • NVIDIA H100

All of the partitions must be the same configuration and size.

Watson Machine Learning

Watson Machine Learning does not provide any models. You can bring or create your own machine learning models, deep learning models, and foundation models.

If you plan to use deep learning models or other models that require a GPU, the service requires at least one GPU.

CPU: Depends on the model that you use.
Memory: Depends on the model that you use.
Storage: Depends on the model that you use.
Supported GPUs: You can use any of the following GPU types:
  • NVIDIA A100
  • NVIDIA H100
  • NVIDIA V100
  • NVIDIA L40S
  All GPU nodes on the cluster must be the same type of GPU.
NVIDIA Multi-Instance GPU support: Yes

NVIDIA Multi-Instance GPU support is limited to the following GPU types:

  • NVIDIA A100
  • NVIDIA H100

All of the partitions must be the same configuration and size.
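To confirm that all GPU nodes carry the same GPU type, you can inspect the node labels and resources that the NVIDIA GPU operator's feature discovery applies. A command sketch (assumes the operator's default `nvidia.com/gpu.product` label and `nvidia.com/gpu` extended resource are present on your cluster):

```shell
# List each node's GPU product name and allocatable GPU count.
# Label and resource names are the NVIDIA GPU operator defaults.
oc get nodes -o custom-columns='NODE:.metadata.name,PRODUCT:.metadata.labels.nvidia\.com/gpu\.product,GPUS:.status.allocatable.nvidia\.com/gpu'
```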

watsonx.ai

You can choose which foundation models to install.

Foundation models

allam-1-13b-instruct

Status: Available

A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support conversational tasks.

CPU: 2
Memory: 96 GB RAM
Storage: 30 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  Note: This model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU.
NVIDIA Multi-Instance GPU support: Yes

codestral-2501

Status: Available

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 2
Memory: 96 GB RAM
Storage: 30 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
NVIDIA Multi-Instance GPU support: Yes, with additional configuration. For details, see Installing models on GPU partitions.

codestral-2508

Status: Available

5.2.2 and later This model is available starting in IBM® Software Hub Version 5.2.2.

Ideal for code generation and high-precision fill-in-the-middle (FIM) completion. The foundation model is optimized for production engineering environments: it is low-latency, context-aware, and self-deployable.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 3
Memory: 96 GB RAM
Storage: 30 GB
Supported GPUs: You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H20
  • 2 NVIDIA H100
  • 2 NVIDIA H200
  • NVIDIA L40S: not supported
NVIDIA Multi-Instance GPU support: No

codestral-22b

Status: Deprecated

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 2
Memory: 96 GB RAM
Storage: 50 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
NVIDIA Multi-Instance GPU support: No

devstral-medium-2507

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

The devstral-medium-2507 foundation model is a high-performance code generation and agentic reasoning model. Ideal for generalization across prompt styles and tool use in code agents and frameworks.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 5
Memory: 246 GB RAM
Storage: 150 GB
Supported GPUs: You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H20
  • 4 NVIDIA H100
  • 4 NVIDIA H200
  • NVIDIA L40S: not supported
NVIDIA Multi-Instance GPU support: No

elyza-japanese-llama-2-7b-instruct

Status: Withdrawn

5.2.2 The model was withdrawn in IBM Software Hub Version 5.2.2.

General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese.

CPU: 2
Memory: 96 GB RAM
Storage: 50 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

flan-t5-xl-3b

Status: Deprecated

General use with zero- or few-shot prompts.

Note: This foundation model can be prompt tuned.
CPU: 2
Memory: 128 GB RAM
Storage: 21 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

flan-t5-xxl-11b

Status: Withdrawn

5.2.2 The model was withdrawn in IBM Software Hub Version 5.2.2.

General use with zero- or few-shot prompts.

CPU: 2
Memory: 128 GB RAM
Storage: 52 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

flan-ul2-20b

Status: Withdrawn

5.2.2 The model was withdrawn in IBM Software Hub Version 5.2.2.

General use with zero- or few-shot prompts.

Memory: 128 GB RAM
Storage: 85 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: Yes

gpt-oss-20b

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

The gpt-oss foundation models are OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, fine-tuning, and various developer use cases.

CPU: 2
Memory: 128 GB RAM
Storage: 100 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
NVIDIA Multi-Instance GPU support: No

gpt-oss-120b

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

The gpt-oss foundation models are OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, fine-tuning, and various developer use cases.

CPU: 6
Memory: 96 GB RAM
Storage: 195 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

granite-4-h-micro

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

The Granite 4.0 foundation models belong to the IBM Granite family of models. The granite-4-h-micro model is a 3-billion parameter foundation model built for structured and long-context capabilities, and is ideal for instruction following and tool calling.

CPU: 2
Memory: 96 GB RAM
Storage: 30 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
NVIDIA Multi-Instance GPU support: No

granite-4-h-small

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

The Granite 4.0 foundation models belong to the IBM Granite family of models. The granite-4-h-small model is a 30-billion parameter foundation model built for structured and long-context capabilities, and is ideal for instruction following and tool calling.

CPU: 3
Memory: 96 GB RAM
Storage: 150 GB
Supported GPUs: You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H20
  • 2 NVIDIA H100
  • 2 NVIDIA H200
NVIDIA Multi-Instance GPU support: No

granite-7b-lab

Status: Withdrawn

5.2.2 The model was withdrawn in IBM Software Hub Version 5.2.2.

InstructLab foundation model from IBM that supports knowledge and skills contributed by the open source community.

CPU: 2
Memory: 96 GB RAM
Storage: 30 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

granite-8b-japanese

Status: Withdrawn

5.2.0 The model was withdrawn in IBM Software Hub Version 5.2.0.

A pretrained instruct variant model from IBM designed to work with Japanese text.

CPU: 2
Memory: 96 GB RAM
Storage: 50 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

granite-13b-instruct-v2

Status: Deprecated

General use model from IBM that is optimized for question and answer use cases.

Note: This model can be prompt tuned.
CPU: 2
Memory: 128 GB RAM
Storage: 62 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

granite-3-3-8b-instruct

Status: Available

An IBM-trained, dense decoder-only model, which is particularly well-suited for generative tasks.

CPU: 2
Memory: 8 GB RAM
Storage: 18 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

granite-3-2-8b-instruct

Status: Available

A text-only model that is capable of reasoning. You can choose whether reasoning is enabled, based on your use case.

CPU: 2
Memory: 32 GB RAM
Storage: 20 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

granite-3-2b-instruct

Status: Available

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU: 2
Memory: 96 GB RAM
Storage: 6 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

granite-3-8b-instruct

Status: Available

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU: 2
Memory: 96 GB RAM
Storage: 20 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
NVIDIA Multi-Instance GPU support: Yes

granite-guardian-3-2b

Status: Available

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU: 2
Memory: 96 GB RAM
Storage: 10 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
NVIDIA Multi-Instance GPU support: Yes

granite-guardian-3-8b

Status: Available

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU: 2
Memory: 96 GB RAM
Storage: 20 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

granite-guardian-3-2-5b

Status: Available

5.2.1 and later This model is available starting in IBM Software Hub Version 5.2.1.

The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well suited for generative tasks. This model cannot be used through the API.

CPU: 1
Memory: 4 GB RAM
Storage: 15 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

granite-3b-code-instruct

Status: Available

A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

CPU: 2
Memory: 96 GB RAM
Storage: 9 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
  Note: This model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU.
NVIDIA Multi-Instance GPU support: Yes

granite-8b-code-instruct

Status: Available

An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

CPU: 2
Memory: 96 GB RAM
Storage: 19 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
  Note: This model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU.
NVIDIA Multi-Instance GPU support: Yes

granite-20b-code-instruct

Status: Available

A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

CPU: 2
Memory: 96 GB RAM
Storage: 70 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
  Note: This model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU.
NVIDIA Multi-Instance GPU support: No

granite-20b-code-base-schema-linking

Status: Available

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU: 2
Memory: 96 GB RAM
Storage: 44 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

granite-20b-code-base-sql-gen

Status: Available

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU: 2
Memory: 96 GB RAM
Storage: 44 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

granite-34b-code-instruct

Status: Available

A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

CPU: 2
Memory: 96 GB RAM
Storage: 78 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
NVIDIA Multi-Instance GPU support: No

granite-vision-3-2-2b

Status: Deprecated

Granite 3.2 Vision is an image-text-in, text-out model that can understand images such as charts, for enterprise computer vision use cases.

CPU: 2
Memory: 32 GB RAM
Storage: 7 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

granite-vision-3-3-2b

Status: Available

5.2.1 and later This model is available starting in IBM Software Hub Version 5.2.1.

Granite 3.3 Vision is an image-text-in, text-out model that can understand images such as charts, for enterprise computer vision use cases.

CPU: 1
Memory: 128 GB RAM
Storage: 10 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

ibm-defense-3-3-8b-instruct

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

The IBM watsonx.ai Defense Model is a specialized fine-tuned version of IBM's granite-3-3-8b-instruct base model. The model is trained on trusted open-source defense data from Janes to support defense and intelligence operations.

CPU: 2
Memory: 8 GB RAM
Storage: 18 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

jais-13b-chat

Status: Deprecated

General use foundation model for generative tasks in Arabic.

CPU: 2
Memory: 96 GB RAM
Storage: 60 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
NVIDIA Multi-Instance GPU support: No

llama-4-maverick-17b-128e-instruct-fp8

Status: Available

The models in the Llama 4 collection are natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

CPU: 9
Memory: 96 GB RAM
Storage: 425 GB
Supported GPUs: You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100
  • 4 NVIDIA H200
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

llama-4-maverick-17b-128e-instruct-int4

Status: Available

5.2.1 and later This model is available starting in IBM Software Hub Version 5.2.1.

The models in the Llama 4 collection are multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

CPU: 4
Memory: 128 GB RAM
Storage: 250 GB
Supported GPUs: You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA H200
NVIDIA Multi-Instance GPU support: No

llama-4-scout-17b-16e-instruct

Status: Deprecated

The models in the Llama 4 collection are natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

CPU: 9
Memory: 96 GB RAM
Storage: 215 GB
Supported GPUs: You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100
  • 4 NVIDIA H200
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

llama-4-scout-17b-16e-instruct-int4

Status: Available

5.2.1 and later This model is available starting in IBM Software Hub Version 5.2.1.

The models in the Llama 4 collection are natively multimodal AI models that enable text and multimodal experiences. These models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

CPU: 1
Memory: 128 GB RAM
Storage: 215 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
NVIDIA Multi-Instance GPU support: No

llama-3-3-70b-instruct

Status: Available

A state-of-the-art refresh of the Llama 3.1 70B Instruct model that uses the latest advancements in post-training techniques.

CPU: 3
Memory: 96 GB RAM
Storage: 75 GB
Supported GPUs: You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H20
  • 2 NVIDIA H100
  • 1 NVIDIA H200
  • 4 NVIDIA L40S
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

llama-3-2-1b-instruct

Status: Available

A pretrained and fine-tuned generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU: 2
Memory: 96 GB RAM
Storage: 10 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
NVIDIA Multi-Instance GPU support: Yes

llama-3-2-3b-instruct

Status: Available

A pretrained and fine-tuned generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU: 2
Memory: 96 GB RAM
Storage: 9 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
NVIDIA Multi-Instance GPU support: Yes

llama-3-2-11b-vision-instruct

Status: Available

A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU: 2
Memory: 96 GB RAM
Storage: 30 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

llama-3-2-90b-vision-instruct

Status: Available

A pretrained and fine-tuned generative text model with 90 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU: 16
Memory: 246 GB RAM
Storage: 200 GB
Supported GPUs: You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H20
  • 8 NVIDIA H100
  • 4 NVIDIA H200
  • 8 NVIDIA L40S
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

llama-guard-3-11b-vision

Status: Available

A pretrained and fine-tuned model with 11 billion parameters, optimized for content safety classification of text and image content.

CPU: 2
Memory: 96 GB RAM
Storage: 30 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

llama-3-1-8b-instruct

Status: Available

An auto-regressive language model that uses an optimized transformer architecture.

CPU: 2
Memory: 96 GB RAM
Storage: 20 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
  Note: This model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU.
NVIDIA Multi-Instance GPU support: Yes

llama-3-1-70b-instruct

Status: Available

An auto-regressive language model that uses an optimized transformer architecture.

CPU: 16
Memory: 246 GB RAM
Storage: 163 GB
Supported GPUs: You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 2 NVIDIA H200
  • 4 NVIDIA L40S
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

llama-3-405b-instruct

Status: Deprecated

Meta's largest open source foundation model to date, with 405 billion parameters, optimized for dialogue use cases.

CPU: 16
Memory: 246 GB RAM
Storage: 500 GB
Supported GPUs: You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100
  • 4 NVIDIA H200
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

llama-3-8b-instruct

Status: Withdrawn

5.2.2 The model was withdrawn in IBM Software Hub Version 5.2.2.

Pretrained and instruction tuned generative text model optimized for dialogue use cases.

CPU: 2
Memory: 96 GB RAM
Storage: 40 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

llama-3-70b-instruct

Status: Withdrawn

5.2.2 The model was withdrawn in IBM Software Hub Version 5.2.2.

Pretrained and instruction tuned generative text model optimized for dialogue use cases.

CPU: 10
Memory: 246 GB RAM
Storage: 180 GB
Supported GPUs: You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA H200
  • 4 NVIDIA L40S
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

llama-2-13b-chat

Status: Deprecated

General use with zero- or few-shot prompts. Optimized for dialogue use cases.

Note: This model can be prompt tuned.
CPU: 2
Memory: 128 GB RAM
Storage: 62 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

llama2-13b-dpo-v7

Status: Withdrawn

5.2.0 The model was withdrawn in IBM Software Hub Version 5.2.0.

General use foundation model for generative tasks in Korean.

CPU: 2
Memory: 96 GB RAM
Storage: 30 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
NVIDIA Multi-Instance GPU support: Yes

ministral-8b-instruct

Status: Available

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 2
Memory: 96 GB RAM
Storage: 35 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

mistral-small-3-1-24b-instruct-2503

Status: Available

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding, enhances long-context capabilities, and is suitable for function calling and agents.

CPU: 3
Memory: 96 GB RAM
Storage: 105 GB
Supported GPUs: You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
NVIDIA Multi-Instance GPU support: Yes

mistral-small-3-2-24b-instruct-2506

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

The mistral-small-3-2-24b-instruct-2506 foundation model is an enhancement to mistral-small-3-1-24b-instruct-2503, with better instruction following and tool calling performance.

CPU: 3
Memory: 96 GB RAM
Storage: 210 GB
Supported GPUs: You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H20
  • 2 NVIDIA H100
  • 2 NVIDIA H200
  • NVIDIA L40S: not supported
NVIDIA Multi-Instance GPU support: No

mistral-small-24b-instruct-2501

Status: Deprecated

Mistral Small 3 (2501) sets a new benchmark in the category of large language models with fewer than 70 billion parameters. With 24 billion parameters, the model achieves state-of-the-art capabilities comparable to larger models.

CPU: 2
Memory: 32 GB RAM
Storage: 50 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
NVIDIA Multi-Instance GPU support: No

mistral-small-instruct

Status: Deprecated

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 2
Memory: 96 GB RAM
Storage: 50 GB
Supported GPUs: You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

mistral-medium-2505

Status: Available

5.2.1 and later This model is available starting in IBM Software Hub Version 5.2.1.

Mistral Medium 3 features multimodal capabilities and an extended context length of up to 128K. The model can process and understand visual inputs and long documents, and supports many languages.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 4
Memory: 258 GB RAM
Storage: 280 GB
Supported GPUs: You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA H200
  • 4 NVIDIA L40S
NVIDIA Multi-Instance GPU support: No

mistral-medium-2508

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

The mistral-medium-2508 foundation model is an enhancement of mistral-medium-2505, with state-of-the-art performance in coding and multimodal understanding.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 5
Memory: 246 GB RAM
Storage: 300 GB
Supported GPUs: You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H20
  • 4 NVIDIA H100
  • 4 NVIDIA H200
NVIDIA Multi-Instance GPU support: No

mistral-large-instruct-2411

Status: Available

The most advanced large language model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 5
Memory: 246 GB RAM
Storage: 140 GB
Supported GPUs: You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H20
  • 4 NVIDIA H100
  • 4 NVIDIA H200
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

mistral-large

Status: Deprecated

The most advanced large language model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU: 16
Memory: 246 GB RAM
Storage: 240 GB
Supported GPUs: You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100
  • 4 NVIDIA H200
  • 8 NVIDIA L40S
  All GPUs must be hosted on a single OpenShift worker node.
NVIDIA Multi-Instance GPU support: No

mixtral-8x7b-instruct-v01

Status: Deprecated

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.

Mixtral-8x7B is not a commercial model and does not require a separate entitlement.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 GB RAM 195 GB You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 1 NVIDIA H200
  • 4 NVIDIA L40S
All GPUs must be hosted on a single OpenShift worker node.
No
mt0-xxl-13b

Status: Withdrawn

5.2.2 The model was withdrawn in IBM Software Hub Version 5.2.2.

General use with zero- or few-shot prompts. Supports prompts in languages other than English and multilingual prompts.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 128 GB RAM 62 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
pixtral-large-instruct-2411

Status: Available

A 124-billion parameter multimodal model built on top of Mistral Large 2 that demonstrates frontier-level image understanding.

Attention: You must purchase Mistral AI with IBM to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 240 GB You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H20
  • 8 NVIDIA H100
  • 4 NVIDIA H200
All GPUs must be hosted on a single OpenShift worker node.
No
pixtral-12b

Status: Available

A 12-billion parameter model pretrained and fine-tuned for generative tasks in text and image domains. The model is optimized for multilingual use cases and provides robust performance in creative content generation.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
No
voxtral-small-2507

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

Voxtral Small is an enhancement of Mistral Small 3.1, incorporating state-of-the-art audio capabilities and text performance, capable of processing up to 30 minutes of audio.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 210 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No

Embedding models

Text embedding models are small enough that they can run without GPU. However, if you need better performance from the embedding models, you can configure them to use GPU.
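Embedding vectors are typically compared with cosine similarity: texts with similar meaning map to vectors that point in similar directions. A minimal sketch in plain Python (the vectors here are hypothetical stand-ins for real model output, which for these models has hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings (a real model such as
# all-minilm-l6-v2 produces 384-dimensional vectors).
query = [0.1, 0.3, 0.5, 0.1]
passage_close = [0.1, 0.28, 0.52, 0.09]
passage_far = [0.9, -0.2, 0.0, 0.4]

# The semantically closer passage scores higher against the query.
assert cosine_similarity(query, passage_close) > cosine_similarity(query, passage_far)
```

In a retrieval pipeline, this score is computed between the query vector and each stored passage vector, and the top-scoring passages are returned.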

all-minilm-l6-v2

Status: Available

Use all-minilm-l6-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
all-minilm-l12-v2

Status: Available

Use all-minilm-l12-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
granite-embedding-107m-multilingual

Status: Available

A 107 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings for a given input like a query, passage, or document.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 2 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
granite-embedding-278m-multilingual

Status: Available

A 278 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings for a given input like a query, passage, or document.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 2 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
granite-embedding-english-reranker-r2

Status: Available

5.2.2 and later This model is available starting in IBM Software Hub Version 5.2.2.

A 149 million parameter model from the Granite Embeddings suite provided by IBM. The model is trained for passage reranking and is based on granite-embedding-english-r2, for use in RAG pipelines.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB RAM 1 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
multilingual-e5-large

Status: Available

An embedding model built by Microsoft and provided by Hugging Face. The multilingual-e5-large model is useful for tasks such as passage or information retrieval, semantic similarity, bitext mining, and paraphrase retrieval.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
4 8 GB 10 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
slate-30m-english-rtrvr

Status: Available

The IBM provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 10 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
slate-125m-english-rtrvr

Status: Available

The IBM provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 10 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes

Reranker models

Reranker models are small enough that they run without GPU.

ms-marco-MiniLM-L-12-v2

Status: Available

A reranker model built by Microsoft and provided by Hugging Face. Given query text and a set of document passages, the model ranks the list of passages from most-to-least related to the query.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 10 GB This model does not require any GPU. Not applicable.
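Downstream code uses the reranker's scores only to order the candidate passages. A minimal sketch of that final step (the passages and scores are hypothetical stand-ins for real model output):

```python
def rerank(query_scores):
    # A reranker model assigns each (query, passage) pair a relevance
    # score; downstream code then sorts the passages by that score,
    # highest (most related) first.
    return [p for p, _ in sorted(query_scores, key=lambda pair: pair[1], reverse=True)]

# Hypothetical scores for three candidate passages.
scored = [
    ("Passage about GPUs", 4.2),
    ("Unrelated passage", -1.7),
    ("Passage about GPU memory", 2.9),
]
assert rerank(scored)[0] == "Passage about GPUs"
```

In a RAG pipeline, only the top few reranked passages are passed on to the generation model.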

Text extraction models

Text extraction models are small enough that they run without GPU.

wdu

Status: Available

A set of text extraction models that are represented by the "wdu" identifier.

Restriction: You cannot install text extraction models on watsonx.ai lightweight engine.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
9 31 GB 20 GB These models do not require any GPU. Not applicable.

Time series models

Time series models are small enough that they run without GPU.

granite-ttm-512-96-r2

Status: Available

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB This model does not require any GPU. Not applicable.
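The model name encodes the window sizes: granite-ttm-512-96-r2 reads a 512-point context per channel and emits a 96-point forecast. A minimal sketch of how a caller might shape the input (the helper function and data are illustrative, not the actual API):

```python
CONTEXT_LENGTH = 512   # data points the model reads per channel
FORECAST_LENGTH = 96   # data points the model predicts per channel

def split_context(series, context_length=CONTEXT_LENGTH):
    # The model forecasts from the most recent `context_length` points,
    # so callers pass the tail of each channel's history.
    if len(series) < context_length:
        raise ValueError(f"need at least {context_length} points, got {len(series)}")
    return series[-context_length:]

# Hypothetical single-channel history sampled at hourly intervals.
history = [float(i % 24) for i in range(1000)]
context = split_context(history)
assert len(context) == CONTEXT_LENGTH
# A real inference call would return FORECAST_LENGTH predicted points per channel.
```

The granite-ttm-1024-96-r2 and granite-ttm-1536-96-r2 variants follow the same pattern with longer context windows (1024 and 1536 points).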
granite-ttm-1024-96-r2

Status: Available

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB This model does not require any GPU. Not applicable.
granite-ttm-1536-96-r2

Status: Available

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB This model does not require any GPU. Not applicable.

LoRA and QLoRA fine tuning

granite-3-1-8b-base

Status: Available

Fine tuning method: LoRA

Granite 3.1 8b base is a pretrained autoregressive foundation model with a context length of 128k, intended for tuning.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
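As background for these entries: LoRA fine tuning freezes the base weight matrix W and trains only a small low-rank update, so the tuned layer computes W + (alpha / r) * B A, where B is d x r, A is r x d, and the rank r is much smaller than d. A minimal sketch with plain Python lists (the dimensions are illustrative, not the models' actual sizes):

```python
def matmul(A, B):
    # Naive matrix multiply, sufficient for small illustrative matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lora_update(W, A, B, alpha, r):
    # LoRA keeps W frozen and adds a scaled low-rank correction
    # (alpha / r) * B @ A; only A and B are trained, which is why
    # tuning needs far less GPU memory than full fine tuning.
    BA = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Illustrative 2x2 base weights with rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]      # d x r
A = [[0.5, 0.5]]        # r x d
W_prime = lora_update(W, A, B, alpha=2.0, r=1)
# Only the first row changed: the update has rank 1.
assert W_prime == [[2.0, 1.0], [0.0, 1.0]]
```

QLoRA applies the same low-rank update on top of a quantized base model (as in the llama-3-1-70b-gptq entry below), reducing memory requirements further.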
llama-3-1-8b

Status: Available

Fine tuning method: LoRA

Llama-3-1-8b is a pretrained and fine-tuned generative text model with 8 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
llama-3-1-70b

Status: Available

Fine tuning method: LoRA

Llama-3-1-70b is a pretrained and fine-tuned generative text model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 280 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA H200
No
llama-3-1-70b-gptq

Status: Available

Fine tuning method: QLoRA

Llama 3.1 70b is a pretrained and fine-tuned generative text base model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
5 246 GB RAM 40 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA H200
No

watsonx Assistant

If you plan to enable features that require GPUs, you must have GPUs that support the models that you plan to use.

You can install one or more models based on the features that you want to enable. Use the following table to determine which models to install:

Model Conversational search: Query rewrite Conversational search: Answer generation Conversational skills: Custom actions information gathering
granite-3-8b-instruct Yes No Yes
ibm-granite-8b-unified-api-model-v2 No Yes No
llama-3-1-70b-instruct Yes Yes Yes
granite-3-8b-instruct

Status: Available

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Yes
ibm-granite-8b-unified-api-model-v2
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
10 64 GB RAM 45 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
llama-3-1-70b-instruct

Status: Available

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 163 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 2 NVIDIA H200
  • 4 NVIDIA L40S
All GPUs must be hosted on a single OpenShift worker node.
No

watsonx BI

granite-3-8b-instruct

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Yes
slate-30m-english-rtrvr

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 10 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
The Intel Gaudi 3 AI Accelerator is not supported.
Yes

watsonx Code Assistant

granite-3-3-8b-instruct

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 8 GB RAM 18 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
ibm-granite-20b-code-javaenterprise-v2

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
1 2 GB RAM 45 GB You can use any of the following GPU types:
  • 1 NVIDIA H100
No

watsonx Code Assistant for Red Hat Ansible Lightspeed

ibm-granite-20b-code-8k-ansible
The default model for watsonx Code Assistant™ for Red Hat® Ansible® Lightspeed. The model provides the following features:
  • Ansible task generation
  • Ansible role generation
  • Ansible code explanation
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
1 2 GB RAM 45 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
No
ibm-granite-3b-code-v1
Optional. The model provides the following features:
  • Ansible task generation

The model can be tuned with your own data to get results that are tailored for your specific use case.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
1 2 GB RAM 15 GB You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 2 NVIDIA L40S
No

watsonx Code Assistant for Z

granite-20b-code-cobol-v1

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 96 GB RAM 1 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
No

watsonx Code Assistant for Z Agentic

granite-code-z-xplain

The watsonx Code Assistant for Z Agentic service uses the granite-code-z-xplain model that is installed by the watsonx Code Assistant for Z Code Explanation service.

If watsonx Code Assistant for Z Code Explanation is already installed, no additional GPUs are required for watsonx Code Assistant for Z Agentic.

Otherwise, watsonx Code Assistant for Z Code Explanation and the granite-code-z-xplain model are installed when you install watsonx Code Assistant for Z Agentic.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 4 GB RAM 1 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes

watsonx Code Assistant for Z Code Explanation

granite-code-z-xplain

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 4 GB RAM 1 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes

watsonx Code Assistant for Z Code Generation

The model that you need depends on the version of IBM Software Hub that is installed:
granite-4-h-small
5.2.2 Required.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 GB RAM 150 GB You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H20
  • 2 NVIDIA H100
  • 2 NVIDIA H200
No
wca4z23-6base-64k-merged3.1-v1-chat

5.2.0 5.2.1 Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 17 GB You can use any of the following GPU types:
  • 1 NVIDIA A100 with 80 GB RAM
  • 1 NVIDIA H100 with 80 GB RAM
  • 1 NVIDIA H100 with 94 GB RAM
Yes

watsonx.data™ Premium

granite-3-2b-instruct

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 6 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
No
llama-3-3-70b-instruct

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 GB RAM 75 GB You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 1 NVIDIA H200
  • 4 NVIDIA L40S
No
pixtral-12b

Required.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
No

watsonx.data intelligence

granite-3-8b-instruct
Required if the following statements are true:
  • You plan to enable gen AI based features
  • You want to run the gen AI based features on GPU
You can optionally run the gen AI based features on:
  • CPU
  • A remote instance of watsonx.ai.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Yes

watsonx Orchestrate

You can choose where the foundation models that you need are hosted:

The same cluster as watsonx Orchestrate
Choosing a model GPU requirements
You must use one of the models provided by IBM.

The features that you plan to use determine the model or models that you must install.

You must have sufficient GPU on the cluster where you plan to install watsonx Orchestrate.
A remote or external cluster by using AI gateway
Choosing a model GPU requirements
You can choose whether to use:
  • One of the models provided by IBM

    If you use the models provided by IBM, the features that you plan to use determine the models that you must install.

  • A custom model

    If you use a custom model, you must register the external model through AI gateway.

Local GPU is not required.
Remote GPU might be required:
  • If you plan to host models on a remote cluster, you must have sufficient GPU on the cluster where you plan to install the foundation models.

    For more information on GPU requirements, consult the documentation from the model provider.

  • If you plan to use models hosted by a third party, you don't need GPU.
Models provided by IBM

Review the following table to determine which model or models provide the features that you need:

Model Agentic AI: Domain agents Agentic AI: Tool and API orchestration Conversational search: Answer generation Conversational search: Query rewrite Conversational skills: Custom actions information gathering
granite-3-8b-instruct No No Yes No No
ibm-granite-8b-unified-api-model-v2 No No No Yes Yes
llama-3-1-70b-instruct No Yes Yes Yes Yes
llama-3-2-90b-vision-instruct Yes Yes Yes Yes Yes
Important: The llama-3-2-90b-vision-instruct model is recommended over the llama-3-1-70b-instruct model. The llama-3-2-90b-vision-instruct model offers:
  • Better performance
  • More accurate results
slate-30m-english-rtrvr

Required.

This model provides semantic search of the watsonx Orchestrate catalog.

This model does not require GPU and is always installed on the same cluster as watsonx Orchestrate.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 10 GB GPUs are not required.

If you need better performance, you can use any of the following GPU types:

  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
granite-3-8b-instruct
Install this model if you want to enable watsonx Orchestrate to:
  • Answer conversational search questions
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Yes
ibm-granite-8b-unified-api-model-v2
Install this model if you want to enable watsonx Orchestrate to:
  • Rewrite user questions to an understood format for conversational search
  • Gather information to fill in variables in a conversational skill
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
10 64 GB RAM 45 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
llama-3-1-70b-instruct
Important: The llama-3-2-90b-vision-instruct model is recommended over the llama-3-1-70b-instruct model. The llama-3-2-90b-vision-instruct model offers:
  • Better performance
  • More accurate results
Install this model if you want to enable watsonx Orchestrate to:
  • Answer conversational search questions
  • Rewrite user questions to an understood format for conversational search
  • Gather information to fill in variables in a conversational skill
  • Select, connect, and coordinate multiple tools or APIs by using agentic AI
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 163 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 2 NVIDIA H200
  • 4 NVIDIA L40S
All GPUs must be hosted on a single OpenShift worker node.
No
llama-3-2-90b-vision-instruct
Install this model if you want to enable watsonx Orchestrate to:
  • Answer conversational search questions
  • Rewrite user questions to an understood format for conversational search
  • Gather information to fill in variables in a conversational skill
  • Select, connect, and coordinate multiple tools or APIs by using agentic AI
  • Use prebuilt agentic AI agents that target specific domains
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 200 GB You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H20
  • 8 NVIDIA H100
  • 4 NVIDIA H200
  • 8 NVIDIA L40S
All GPUs must be hosted on a single OpenShift worker node.
No