System requirements for foundation models in IBM watsonx.ai

You can deploy a collection of foundation models developed by IBM or third-party providers in your cluster.

GPU requirements

One of the following types of GPUs is required to support the use of foundation models in IBM watsonx.ai:
  • NVIDIA A100 GPUs with 80 GB RAM
  • NVIDIA H20 GPUs with 96 GB RAM (Not supported with all models. See tables for details.)
    Restriction: You cannot use AutoAI to automate search for retrieval-augmented generation (RAG) patterns with NVIDIA H20 GPUs.
  • NVIDIA H100 GPUs with 80 GB RAM
  • NVIDIA H100 GPUs with 94 GB RAM
  • NVIDIA H200 GPUs with 141 GB RAM
  • NVIDIA L40S GPUs with 48 GB RAM (Not supported with all models. See tables for details.)
  • NVIDIA RTX PRO 6000 GPUs with 96 GB RAM (Not supported with all models. See tables for details.)
  • Intel Gaudi 3 AI Accelerator GPUs with 128 GB RAM (Not supported with all models. See tables for details.)
    Restriction: You cannot tune foundation models with Intel Gaudi 3 AI Accelerator GPUs.
For details about how to deploy models with Intel Gaudi 3 AI Accelerator and NVIDIA RTX PRO 6000 GPUs, see Installing foundation models with a custom vLLM image.
A general guideline for calculating the number of GPUs required for hosting a foundation model is as follows:
  • NVIDIA L40S: GPU memory requirement / 48. You can use only a 1, 2, 4, or 8 GPU configuration, so round up. For example, if the model needs 246 GB GPU memory, that's 246/48 ≈ 5.1. The model needs 8 GPUs.
  • Intel Gaudi 3 AI Accelerator: GPU memory requirement / 128. You can use only a 1, 2, 4, or 8 GPU configuration, so round up. For example, if the model needs 450 GB GPU memory, that's 450/128 ≈ 3.5. The model needs 4 GPUs.
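The rounding rule in this guideline can be sketched in Python. The function name `gpus_required` and the constant `ALLOWED_CONFIGS` are illustrative only and are not part of any watsonx.ai API. The per-GPU memory values (48 GB for NVIDIA L40S, 128 GB for Intel Gaudi 3) come from the GPU list in this topic.

```python
# Hypothetical helper: GPU counts allowed by the guideline (1, 2, 4, or 8).
ALLOWED_CONFIGS = (1, 2, 4, 8)

def gpus_required(model_memory_gb: float, gpu_memory_gb: float) -> int:
    """Return the smallest allowed GPU count that covers the memory requirement."""
    raw = model_memory_gb / gpu_memory_gb
    for count in ALLOWED_CONFIGS:
        if count >= raw:
            return count
    raise ValueError("Requirement exceeds the largest allowed configuration (8 GPUs)")

print(gpus_required(246, 48))   # NVIDIA L40S example: 8
print(gpus_required(450, 128))  # Intel Gaudi 3 example: 4
```

Treat the result as a starting estimate. The system requirements table for each model remains the reference for supported GPU types and counts.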
Attention: You can install the IBM watsonx.ai service on the VMware vSphere platform with GPUs configured in passthrough mode. You cannot use virtual GPUs (vGPUs) with watsonx.ai™.
To get an idea of the minimum GPU memory requirements, multiply the number of billion parameters of the model by 3. For example, for a foundation model with 12 billion parameters, multiply 12 by 3. An initial estimate of the memory required by the model is 36 GB. Then add 1 GB per 100,000 tokens in the context window length.
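As a rough illustration, the estimate can be written as a small Python function. The name `estimate_gpu_memory_gb` is hypothetical, and the sketch assumes that the context-window overhead scales linearly at 1 GB per 100,000 tokens, which is one reading of the guideline rather than a documented formula.

```python
def estimate_gpu_memory_gb(params_billions: float,
                           context_tokens: int,
                           gb_per_billion_params: float = 3.0) -> float:
    """Initial GPU memory estimate in GB, following the guideline above."""
    # Base estimate: billions of parameters multiplied by 3.
    base = params_billions * gb_per_billion_params
    # Assumption: 1 GB per 100,000 tokens of context, scaled linearly.
    context_overhead = context_tokens / 100_000
    return base + context_overhead

# A 12-billion-parameter model with a 100,000-token context window.
print(estimate_gpu_memory_gb(12, 100_000))  # 37.0
```

The result is only a first approximation. Precision, quantization, and serving configuration can shift the actual requirement.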

You might be able to run some models with fewer GPUs at context lengths other than the maximum or subject to other performance tradeoffs and constraints. If you use a configuration with fewer than the recommended number of GPUs, make sure to test the deployment to verify that the performance is satisfactory before you use the configuration in production. If you use a configuration with more than the recommended number of GPUs, make sure to increase the number of CPUs you use. It is recommended that the number of CPUs exceeds the number of GPUs by one at a minimum.
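The two checks in the preceding paragraph can be sketched as a simple review function. The function name and message text are illustrative assumptions. Only the rules themselves (test when using fewer GPUs than recommended, and keep at least one more CPU than GPUs) come from this topic.

```python
def review_configuration(cpus: int, gpus: int, recommended_gpus: int) -> list[str]:
    """Return advisory notes for a planned CPU and GPU configuration."""
    notes = []
    if gpus < recommended_gpus:
        # Fewer GPUs than recommended: performance must be verified first.
        notes.append("Fewer GPUs than recommended: test the deployment before production use.")
    if cpus < gpus + 1:
        # The CPU count should exceed the GPU count by at least one.
        notes.append(f"Increase the number of CPUs to at least {gpus + 1}.")
    return notes

# Example: 8 GPUs where 4 are recommended, but only 5 CPUs.
for note in review_configuration(cpus=5, gpus=8, recommended_gpus=4):
    print(note)
```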

You can divide foundation models into smaller units called shards and run the model shards on multiple GPUs. By default, the number of shards a foundation model is partitioned into is equal to the minimum number of GPUs required to run the model. For information on the minimum number of GPUs required for each model, see the model system requirements tables. To customize the number of shards a model is partitioned into, see Changing foundation model sharding configuration.

You can optionally partition A100 or H100 GPU processors to add more than one foundation model to a GPU. For more information, see Partitioning GPU processors in IBM watsonx.ai. Models that can be partitioned indicate Yes for NVIDIA Multi-Instance GPU support in the foundation models' system requirement tables.

Restriction: You cannot tune foundation models in NVIDIA Multi-Instance GPU enabled clusters.

When you calculate the total number of GPUs that you need for your deployment, consider whether you plan to customize any foundation models by tuning them. For more information, see Planning for foundation model tuning in IBM watsonx.ai.

Foundation models

The following table lists the recommended number of GPUs to configure on a single OpenShift® worker node for the various foundation models that are provided with IBM watsonx.ai at the default context window length for each model. Minimum system requirements may vary based on the context length you set, the number of model parameters, the model parameters' precision, and more.

For details about how to use foundation models provided with IBM watsonx.ai, including the default context window length and capabilities, see Supported foundation models.

Note:
  • You do not need to prepare these resources in addition to the overall service hardware requirements. If you meet the prerequisite hardware requirements for the service, you already have the resources you need.
  • The foundation model system requirements describe the subset of resources that are required per model. For models that require more than 1 GPU, all GPUs must be hosted on a single OpenShift worker node.
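The single-node constraint means that the total GPU count in a cluster is not enough to decide whether a model fits. A minimal Python sketch of that check follows. The node names and the `nodes_that_can_host` helper are hypothetical, and in practice the per-node GPU counts would come from your own cluster inventory.

```python
def nodes_that_can_host(required_gpus: int, free_gpus_by_node: dict[str, int]) -> list[str]:
    """Return the worker nodes that have enough free GPUs on their own."""
    return sorted(
        name for name, free in free_gpus_by_node.items() if free >= required_gpus
    )

# Hypothetical inventory: 14 GPUs in total, spread across three worker nodes.
cluster = {"worker-a": 2, "worker-b": 4, "worker-c": 8}

print(nodes_that_can_host(4, cluster))  # ['worker-b', 'worker-c']
print(nodes_that_can_host(8, cluster))  # ['worker-c']
```

In this example, a model that needs 8 GPUs can run only on the node that has 8 free GPUs, even though the other two nodes together hold 6.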
allam-1-13b-instruct
  • Status: Deprecated
  • Model ID: allam-1-13b-instruct

A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support conversational tasks.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Note: This model can be full fine tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
Yes
codestral-2501
  • Status: Deprecated
  • Model ID: codestral-2501

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
Yes, with additional configuration. For details, see Installing models on GPU partitions.
codestral-2508
  • Status: Available
  • Model ID: codestral-2508

Ideal for code generation and high-precision fill-in-the-middle (FIM) completion. The foundation model is optimized for production engineering environments that are latency-sensitive, context-aware, and self-deployable.

Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 GB RAM 30 GB You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H20
  • 2 NVIDIA H100
  • 2 NVIDIA H200
No
devstral-medium-2507
  • Status: Available
  • Model ID: devstral-medium-2507

The devstral-medium-2507 foundation model from Mistral AI is a high-performance code generation and agentic reasoning model. Ideal for generalization across prompt styles and tool use in code agents and frameworks.

Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
5 246 GB RAM 250 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H20
  • 4 NVIDIA H100
  • 4 NVIDIA H200
No
devstral-medium-2512
  • Status: Available
  • Model ID: devstral-medium-2512

The devstral-medium-2512 foundation model from Mistral AI is an agentic model for software engineering tasks from the Devstral 2 model family that excels at using tools to explore code bases, editing multiple files, and powering software engineering agents.

Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
5 246 GB RAM 200 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA H200
No
devstral-small-2512
  • Status: Available
  • Model ID: devstral-small-2512

The devstral-small-2512 foundation model from Mistral AI is an agentic model for software engineering tasks from the Devstral 2 model family that excels at using tools to explore code bases, editing multiple files, and powering software engineering agents.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
5 246 Gi RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
gpt-oss-20b
  • Status: Available
  • Model ID: gpt-oss-20b

The gpt-oss foundation models are OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, fine-tuning, and various developer use cases.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 128 GB RAM 100 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
gpt-oss-120b
  • Status: Available
  • Model ID: gpt-oss-120b

The gpt-oss foundation models are OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, fine-tuning, and various developer use cases.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
6 96 GB RAM 195 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
No
granite-4-h-micro
  • Status: Available
  • Model ID: granite-4-h-micro

The Granite 4.0 foundation models belong to the IBM Granite family of models. The granite-4-h-micro is a 3 billion parameter foundation model built for structured and long-context capabilities. The model is ideal for instruction following and tool-calling.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
granite-4-h-tiny
  • Status: Available
  • Model ID: granite-4-h-tiny

The Granite 4.0 foundation models belong to the IBM Granite family of models. The granite-4-h-tiny is a 7 billion parameter long-context instruction-tuned model developed using a diverse set of techniques with a structured chat format, including supervised fine-tuning, model alignment using reinforcement learning, and model merging. This model is ideal for instruction following and tool-calling capabilities.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
No
granite-4-h-small
  • Status: Available
  • Model ID: granite-4-h-small

The Granite 4.0 foundation models belong to the IBM Granite family of models. The granite-4-h-small is a 30 billion parameter foundation model built for structured and long-context capabilities. The model is ideal for instruction following and tool-calling capabilities.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 GB RAM 150 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA RTX PRO 6000
No
granite-3-3-8b-instruct
  • Status: Available
  • Model ID: granite-3-3-8b-instruct

An IBM-trained, dense decoder-only model, which is particularly well-suited for generative tasks.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 18 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 NVIDIA RTX PRO 6000
No
granite-3-2-8b-instruct
  • Status: Deprecated
  • Model ID: granite-3-2-8b-instruct

A text-only model that is capable of reasoning. You can choose whether reasoning is enabled, based on your use case.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
No
granite-3-2b-instruct
  • Status: Available
  • Model ID: granite-3-2b-instruct

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 6 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
granite-3-8b-instruct
  • Status: Available
  • Model ID: granite-3-8b-instruct

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Yes
granite-3b-code-instruct
  • Status: Available
  • Model ID: granite-3b-code-instruct

A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 9 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Note: This model can be fine tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
Yes
granite-8b-code-instruct
  • Status: Available
  • Model ID: granite-8b-code-instruct

An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 19 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Note: This model can be full fine tuned when configured to use an NVIDIA A100, NVIDIA H100, NVIDIA H200, or NVIDIA L40S GPU.
Yes
granite-20b-code-instruct
  • Status: Available
  • Model ID: granite-20b-code-instruct

A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 70 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Note: This model can be full fine tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA H200 GPU.
No
granite-20b-code-base-schema-linking
  • Status: Available
  • Model ID: granite-20b-code-base-schema-linking

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 44 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
No
granite-20b-code-base-sql-gen
  • Status: Available
  • Model ID: granite-20b-code-base-sql-gen

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 44 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
No
granite-34b-code-instruct
  • Status: Available
  • Model ID: granite-34b-code-instruct

A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 78 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
No
granite-docling-258M
  • Status: Available
  • Model ID: granite-docling-258M

Granite Docling is a multimodal image-text-to-text model designed for efficient document conversion. The model preserves the core features of Docling while maintaining seamless integration with Docling documents to ensure full compatibility.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 GB RAM 10 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
No
granite-guardian-3-2b
  • Status: Deprecated
  • Model ID: granite-guardian-3-2b

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 10 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Yes
granite-guardian-3-8b
  • Status: Deprecated
  • Model ID: granite-guardian-3-8b

Granite models are designed to be used for a wide range of generative and non-generative tasks. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open-source community.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
granite-guardian-3-2-5b
  • Status: Available
  • Model ID: granite-guardian-3-2-5b

The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well suited for generative tasks. This model cannot be used through the API.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
1 4 GB RAM 15 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
  • 1 NVIDIA RTX PRO 6000
No
granite-vision-3-2-2b
  • Status: Deprecated
  • Model ID: granite-vision-3-2-2b

Granite 3.2 Vision is an image-text-in, text-out model capable of understanding images like charts for enterprise use cases for computer vision tasks.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 GB RAM 7 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
No
granite-vision-3-3-2b
  • Status: Available
  • Model ID: granite-vision-3-3-2b

Granite 3.3 Vision is an image-text-in, text-out model capable of understanding images like charts for enterprise use cases for computer vision tasks.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
1 128 GB RAM 10 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
No
granite-4-1b-speech
  • Status: Available
  • Model ID: ibm/granite-4-1b-speech

Granite-4-1b-speech is a compact and efficient speech-language model, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST). The model was trained on a collection of public corpora comprising diverse datasets for ASR and AST, as well as synthetic datasets tailored to support Japanese ASR, keyword-biased ASR, and speech translation. Granite-4-1b-speech was trained by modality aligning granite-4-1b-base to speech on publicly available open source corpora containing audio inputs and text targets.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 10Gi You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes
ibm-defense-4-0-micro
  • Status: Available
  • Model ID: ibm-defense-4-0-micro

The ibm-defense-4-0-micro is a defense-focused large language model (LLM) fine-tuned from an IBM Granite model. This model is designed to work with Janes foundation defense data, delivering fast, reliable, and contextual results for mission-critical tasks in defense organizations.

Attention: You must purchase the IBM watsonx.ai Defense Model entitlement separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 GB RAM 60 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
No
ibm-defense-4-0-small
  • Status: Available
  • Model ID: ibm-defense-4-0-small

The ibm-defense-4-0-small is a defense-focused large language model fine-tuned from an IBM Granite model. This model is designed to work with Janes foundation defense data, delivering fast, reliable, and contextual results for mission-critical tasks in defense organizations.

Attention: You must purchase the IBM watsonx.ai Defense Model entitlement separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 85 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
ibm-defense-3-3-8b-instruct
  • Status: Available
  • Model ID: ibm-defense-3-3-8b-instruct

The IBM watsonx.ai Defense Model is a specialized fine-tuned version of IBM’s granite-3-3-8b-instruct base model. The model is developed through Janes trusted open-source defense data to support defense and intelligence operations.

Attention: You must purchase the IBM watsonx.ai Defense Model entitlement separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 8 GB RAM 18 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
No
llama-4-maverick-17b-128e-instruct-fp8
  • Status: Available
  • Model ID: llama-4-maverick-17b-128e-instruct-fp8

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
9 96 GB RAM 425 GB You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H100
  • 4 NVIDIA H200
No
llama-4-maverick-17b-128e-instruct-int4
  • Status: Available
  • Model ID: llama-4-maverick-17b-128e-instruct-int4

The Llama 4 collection of models are multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
4 128 GB RAM 250 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • NVIDIA L40S not supported
No
llama-4-scout-17b-16e-instruct-int4
  • Status: Available
  • Model ID: llama-4-scout-17b-16e-instruct-int4

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
1 128 GB RAM 215 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • NVIDIA L40S not supported
No
llama-3-3-70b-instruct
  • Status: Available
  • Model ID: llama-3-3-70b-instruct

A state-of-the-art refresh of the Llama 3.1 70B Instruct model that uses the latest advancements in post-training techniques.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 GB RAM 75 GB You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H20
  • 2 NVIDIA H100
  • 1 NVIDIA H200
  • 4 NVIDIA L40S
No
llama-3-2-1b-instruct
  • Status: Available
  • Model ID: llama-3-2-1b-instruct

A pretrained and fine-tuned generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 10 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Yes
llama-3-2-3b-instruct
  • Status: Available
  • Model ID: llama-3-2-3b-instruct

A pretrained and fine-tuned generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 9 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Yes
llama-3-2-11b-vision-instruct
  • Status: Available
  • Model ID: llama-3-2-11b-vision-instruct

A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
  • 1 NVIDIA RTX PRO 6000
No
llama-3-2-90b-vision-instruct
  • Status: Available
  • Model ID: llama-3-2-90b-vision-instruct

A pretrained and fine-tuned generative text model with 90 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 200 GB You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H20
  • 8 NVIDIA H100
  • 4 NVIDIA H200
  • 8 NVIDIA L40S
No
llama-guard-3-11b-vision
  • Status: Available
  • Model ID: llama-guard-3-11b-vision

A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 NVIDIA RTX PRO 6000
No
llama-3-1-8b-instruct
  • Status: Available
  • Model ID: llama-3-1-8b-instruct

An auto-regressive language model that uses an optimized transformer architecture.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
  • 1 Intel Gaudi 3 AI Accelerator
Note: This model can be full fine tuned when configured to use an NVIDIA A100, NVIDIA H100, or NVIDIA L40S GPU.
Yes
llama-3-1-70b-instruct
  • Status: Available
  • Model ID: llama-3-1-70b-instruct

An auto-regressive language model that uses an optimized transformer architecture.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 163 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 2 NVIDIA H200
  • 4 NVIDIA L40S
No
magistral-small-2509
  • Status: Available
  • Model ID: magistral-small-2509

Built on Mistral Small 3.2 (2506) with added reasoning capabilities from supervised fine-tuning (SFT) on Magistral Medium traces followed by reinforcement learning (RL), this is a small, efficient reasoning model with 24B parameters.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 Gi RAM 120Gi You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 2 NVIDIA L40S
Yes
magistral-medium-2509
  • Status: Available
  • Model ID: magistral-medium-2509

Magistral Medium 2509 is an update to the 2507 version with improvements in math and coding benchmarks, along with image input support.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 400Gi You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes
ministral-14b-instruct-2512
  • Status: Available
  • Model ID: ministral-14b-instruct-2512

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
ministral-8b-instruct
  • Status: Deprecated
  • Model ID: ministral-8b-instruct

Ideal for complex tasks that require large reasoning capabilities or are highly specialized.

Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 35 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
ministral-3b-instruct-2512
  • Status: Available
  • Model ID: ministral-3b-instruct-2512

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 18Gi You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes
ministral-8b-instruct-2512
  • Status: Available
  • Model ID: ministral-8b-instruct-2512

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 18Gi You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes
ministral-3-14b-instruct-2512-bf16
  • Status: Available
  • Model ID: ministral-3-14b-instruct-2512-bf16

The largest model in the Ministral 3 family, Ministral 3 14B is a powerful, efficient language model with vision capabilities.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 18Gi You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes
mistral-small-3-2-24b-instruct-2506
  • Status: Available
  • Model ID: mistral-small-3-2-24b-instruct-2506

The mistral-small-3-2-24b-instruct-2506 foundation model is an enhancement to mistral-small-3-1-24b-instruct-2503, with better instruction following and tool calling performance.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 GB RAM 210 GB You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H20
  • 2 NVIDIA H100
  • 2 NVIDIA H200
No
mistral-small-3-1-24b-instruct-2503
  • Status: Available
  • Model ID: mistral-small-3-1-24b-instruct-2503

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities and is suitable for function calling and agents.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
3 96 GB RAM 105 GB You can use any of the following GPU types:
  • 2 NVIDIA A100
  • 2 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
Yes
mistral-medium-2505
  • Status: Deprecated
  • Model ID: mistral-medium-2505

Mistral Medium 3 features multimodal capabilities and an extended context length of up to 128k. The model can process and understand visual inputs, long documents and supports many languages.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
5 246 Gi RAM 280 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA L40S
No
mistral-medium-2508
  • Status: Available
  • Model ID: mistral-medium-2508

The mistral-medium-2508 foundation model is an enhancement of mistral-medium-2505, with state-of-the-art performance in coding and multimodal understanding.

Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
5 246 GB RAM 300 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H20
  • 4 NVIDIA H100
  • 4 NVIDIA H200
No
mistral-large-2512
  • Status: Available
  • Model ID: mistral-large-2512

The mistral-large-2512 foundation model, also known as Mistral Large 3, is a state-of-the-art general-purpose multimodal granular mixture-of-experts model with 41 billion active parameters and 675 billion total parameters. The model was trained from the ground up with 3000 NVIDIA H200 GPUs.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
48 512 GB RAM 969 GB You can use any of the following GPU types:
  • 8 NVIDIA H200
No
mistral-large-instruct-2411
  • Status: Available
  • Model ID: mistral-large-instruct-2411
The most advanced large language model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
5 246 GB RAM 140 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H20
  • 4 NVIDIA H100
  • 4 NVIDIA H200
No
nvidia-nemotron-nano-12b-v2-vl-fp8
  • Status: Available
  • Model ID: nvidia/nvidia-nemotron-nano-12b-v2-vl-fp8

NVIDIA-Nemotron-Nano-VL-12B-V2-FP8 is the quantized version of the NVIDIA Nemotron Nano VL V2 model, which is an auto-regressive vision language model that uses an optimized transformer architecture.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 30 Gi You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes
nvidia-nemotron-3-nano-30b-a3b-fp8
  • Status: Available
  • Model ID: nvidia/nemotron-3-nano-30b-a3b-fp8

Nemotron-Nano-3-30B-A3B-FP8 is a quantized version of Nemotron-Nano-3-30B-A3B, a large language model (LLM) that was trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
4 64 Gi RAM 40 Gi You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA L40S
Yes
pixtral-large-instruct-2411
  • Status: Available
  • Model ID: pixtral-large-instruct
A 124 billion parameter multimodal model that is built on top of Mistral Large 2 and demonstrates frontier-level image understanding.
Attention: You must purchase the Mistral AI with IBM license separately to download and use this model.
CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 240 GB You can use any of the following GPU types:
  • 8 NVIDIA A100
  • 8 NVIDIA H20
  • 8 NVIDIA H100
  • 4 NVIDIA H200
No
pixtral-12b
  • Status: Deprecated in 5.3.0
  • Model ID: pixtral-12b

A 12 billion parameter model pretrained and fine-tuned for generative tasks in text and image domains. The model is optimized for multilingual use cases and provides robust performance in creative content generation.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 30 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
No
voxtral-small-24b-2507
  • Status: Available
  • Model ID: voxtral-small-24b-2507

Voxtral Small is an enhancement of Mistral Small 3.1, incorporating state-of-the-art audio capabilities and text performance, capable of processing up to 30 minutes of audio.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 210 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
No
voxtral-mini-2507
  • Status: Available
  • Model ID: voxtral-mini-2507

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation, and audio understanding.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 32 Gi RAM 18 Gi You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA L40S
Yes

You cannot add deprecated or withdrawn models to your deployment. For more information about how deprecated and withdrawn models are handled, see Foundation model lifecycle.
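
The rule above can be expressed as a simple filter on the Status value of each entry. This is a minimal Python sketch for planning purposes only: the function name and the dictionary layout are hypothetical, and the statuses are copied from a few of the entries in this section. It assumes that any status beginning with "Deprecated" or "Withdrawn" (for example, "Deprecated in 5.3.0") blocks the model from being added.

```python
# Statuses copied from a few of the entries listed in this section.
MODEL_STATUS = {
    "mistral-medium-2505": "Deprecated",
    "mistral-medium-2508": "Available",
    "pixtral-12b": "Deprecated in 5.3.0",
    "voxtral-mini-2507": "Available",
}


def can_add_to_deployment(status):
    """Return False for deprecated or withdrawn models, True otherwise."""
    normalized = status.strip().lower()
    return not normalized.startswith(("deprecated", "withdrawn"))


# Keep only the model IDs that can still be added to a deployment.
addable = sorted(
    model_id
    for model_id, status in MODEL_STATUS.items()
    if can_add_to_deployment(status)
)
print(addable)
```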

Embedding models

Text embedding models are small enough to run without a GPU. However, if you need better performance from the embedding models, you can configure them to use GPUs. For details, see Installing embedding models on GPUs.

For details about how to use embedding models provided with IBM watsonx.ai, including their capabilities, see Supported encoder models.

all-minilm-l6-v2
  • Status: Available
  • Model ID: all-minilm-l6-v2

Use all-minilm-l6-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
all-minilm-l12-v2
  • Status: Available
  • Model ID: all-minilm-l12-v2

Use all-minilm-l12-v2 as a sentence and short paragraph encoder. Given an input text, the model generates a vector that captures the semantic information in the text.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
granite-embedding-107m-multilingual
  • Status: Available
  • Model ID: granite-embedding-107m-multilingual

A 107 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings for a given input like a query, passage, or document.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 2 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
granite-embedding-278m-multilingual
  • Status: Available
  • Model ID: granite-embedding-278m-multilingual

A 278 million parameter model from the Granite Embeddings suite provided by IBM. The model can be used to generate high quality text embeddings for a given input like a query, passage, or document.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 2 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
multilingual-e5-large
  • Status: Available
  • Model ID: multilingual-e5-large

An embedding model built by Microsoft and provided by Hugging Face. The multilingual-e5-large model is useful for tasks such as passage or information retrieval, semantic similarity, bitext mining, and paraphrase retrieval.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
4 8 GB 10 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
slate-30m-english-rtrvr
  • Status: Available
  • Model ID: ibm-slate-30m-english-rtrvr

The IBM-provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 10 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
slate-125m-english-rtrvr
  • Status: Available
  • Model ID: ibm-slate-125m-english-rtrvr

The IBM-provided slate embedding models are built to generate embeddings for various inputs such as queries, passages, or documents.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 10 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H20
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 1 NVIDIA L40S
Yes
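
The embedding models in this section turn input text into vectors that capture semantic information. A common way to compare two such vectors is cosine similarity. The following Python sketch shows that calculation. It does not call any IBM watsonx.ai API, and the three-dimensional vectors are made-up placeholders. Real embedding vectors have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for the embeddings of a query and two passages.
query = [0.2, 0.7, 0.1]
passage_a = [0.25, 0.65, 0.05]   # points in nearly the same direction as query
passage_b = [0.9, -0.1, 0.3]     # points in a different direction

print(cosine_similarity(query, passage_a) > cosine_similarity(query, passage_b))
```

A value near 1 means that the two vectors point in almost the same direction, which is usually read as the two texts being semantically close.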

Reranker models

Reranker models are small enough to run without a GPU. For details about how to use reranker models provided with IBM watsonx.ai, including their capabilities, see Supported encoder models.

granite-embedding-english-reranker-r2
  • Status: Available
  • Model ID: granite-embedding-english-reranker-r2

A 149 million parameter model from the Granite Embeddings suite provided by IBM. The model is based on granite-embedding-english-r2 and was trained for passage reranking in RAG pipelines.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB RAM 1 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H20
  • 1 NVIDIA H200
No
ms-marco-MiniLM-L-12-v2
  • Status: Available
  • Model ID: ms-marco-minilm-l-12-v2

A reranker model built by Microsoft and provided by Hugging Face. Given query text and a set of document passages, the model ranks the list of passages from most to least related to the query.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 10 GB This model does not require any GPU. Not applicable.
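
To make the ranking behavior concrete, the following Python sketch orders passages by a relevance score, from most to least related. It only illustrates the sorting step. The scores are made-up placeholders for the values that a reranker model would produce, and the function name is hypothetical.

```python
def rank_passages(passages, scores):
    """Return (passage, score) pairs sorted from highest to lowest score.

    Passages with equal scores keep their original relative order.
    """
    if len(passages) != len(scores):
        raise ValueError("each passage needs exactly one score")
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [(passages[i], scores[i]) for i in order]


passages = [
    "GPU memory is measured in GB.",
    "Reranker models order passages by relevance.",
    "Storage requirements vary by model.",
]
scores = [0.12, 0.93, 0.05]   # placeholder relevance scores for one query

for passage, score in rank_passages(passages, scores):
    print(f"{score:.2f}  {passage}")
```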

Document text processing models

Document text processing uses a programmatic method to extract and classify text from images, tables, and structured PDF documents by using multiple IBM watsonx.ai APIs.

To use the text classification and extraction APIs, you must install a set of machine learning models that do natural language understanding in the text processing pipeline. For details, see Installing models with the default configuration.
Restriction: You cannot install text processing models with a watsonx.ai lightweight engine installation.
Document text processing
  • Status: Available
  • Model ID: wdu

A set of natural language text processing models that are represented by the "wdu" identifier.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
9 31 GB 20 GB These models do not require any GPU. Not applicable.

Time series foundation models

You can use the time series API to pass historical data observations to a time series foundation model that can forecast future values. For details about how to use time series models provided with IBM watsonx.ai, including their capabilities, see Supported foundation models.

You can deploy the following time series foundation models:

granite-ttm-512-96-r2
  • Status: Available
  • Model ID: granite-ttm-512-96-r2

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB This model does not require any GPU. Not applicable.
granite-ttm-1024-96-r2
  • Status: Available
  • Model ID: granite-ttm-1024-96-r2

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB This model does not require any GPU. Not applicable.
granite-ttm-1536-96-r2
  • Status: Available
  • Model ID: granite-ttm-1536-96-r2

The Granite time series models are compact pretrained models for multivariate time series forecasting from IBM Research, also known as Tiny Time Mixers (TTM). The models work best with data points in minute or hour intervals and generate a forecast dataset with 96 data points per channel by default.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 4 GB 1 GB This model does not require any GPU. Not applicable.
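
The model IDs above follow a common pattern. The following Python sketch splits an ID into its numeric parts so that input data can be checked before a forecast is requested. The second number matches the 96 data points per channel that are described above. Treating the first number (512, 1024, or 1536) as the number of historical data points that the model expects per channel is an assumption based on the naming pattern, not something that this section states. The function names are hypothetical.

```python
import re

# Pattern for IDs such as granite-ttm-512-96-r2.
_TTM_ID = re.compile(r"^granite-ttm-(\d+)-(\d+)-r(\d+)$")


def parse_ttm_model_id(model_id):
    """Return (context_length, forecast_length, release) parsed from the ID.

    context_length is an assumed interpretation of the first number.
    """
    match = _TTM_ID.match(model_id)
    if match is None:
        raise ValueError(f"unrecognized time series model ID: {model_id!r}")
    context_length, forecast_length, release = (int(g) for g in match.groups())
    return context_length, forecast_length, release


def has_enough_history(model_id, observations_per_channel):
    """Check the history length against the assumed context length."""
    context_length, _, _ = parse_ttm_model_id(model_id)
    return observations_per_channel >= context_length


print(parse_ttm_model_id("granite-ttm-512-96-r2"))
print(has_enough_history("granite-ttm-1024-96-r2", 600))
```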

Foundation models available for tuning

You can use techniques such as full fine tuning, low-rank adaptation (LoRA) tuning, or quantized low-rank adaptation (QLoRA) tuning to train and deploy foundation models that are compatible with fine tuning.

These foundation models can only be fine tuned. Unlike most foundation models that are provided with IBM watsonx.ai, these models cannot be directly inferenced in the Prompt Lab or programmatically by using the API. For details about resource requirements for model tuning, see Planning for foundation model tuning in IBM watsonx.ai.

You can install the following foundation models that can be fine tuned:

granite-3-1-8b-base
  • Status: Available
  • Tuning method: Full fine tuning, LoRA fine tuning
  • Model ID: granite-3-1-8b-base

Granite 3.1 8b base is a pretrained autoregressive foundation model with a context length of 128k intended for tuning.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
No
llama-3-1-8b
  • Status: Available
  • Tuning method: Full fine tuning, LoRA fine tuning
  • Model ID: llama-3-1-8b

Llama-3-1-8b is a pretrained generative text model with 8 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
2 96 GB RAM 20 GB You can use any of the following GPU types:
  • 1 NVIDIA A100
  • 1 NVIDIA H100
  • 1 NVIDIA H200
  • 2 NVIDIA L40S
No
llama-3-1-70b
  • Status: Available
  • Tuning method: Full fine tuning, LoRA fine tuning
  • Model ID: llama-3-1-70b

Llama-3-1-70b is a pretrained generative text model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
16 246 GB RAM 280 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA H200
  • 2 NVIDIA L40S
No
llama-3-1-70b-gptq
  • Status: Available
  • Tuning method: QLoRA fine tuning
  • Model ID: llama-3-1-70b-gptq

Llama 3.1 70b is a pretrained generative text base model with 70 billion parameters, optimized for multilingual dialogue use cases and code output.

CPU Memory Storage Supported GPUs NVIDIA Multi-Instance GPU support
5 246 GB RAM 40 GB You can use any of the following GPU types:
  • 4 NVIDIA A100
  • 4 NVIDIA H100
  • 4 NVIDIA H200
No

Custom foundation models

In addition to foundation models that are curated by IBM, you can upload and deploy your own foundation models. For more information about how to upload, register, and deploy a custom foundation model, see the following information:

Supported accelerators on s390x

You can use hardware accelerators to improve foundation model inference performance on IBM Z and LinuxONE systems running the s390x architecture. Accelerators provide optimized compute for AI workloads on mainframe infrastructure.

IBM Spyre accelerator
IBM Spyre accelerators are inference accelerators for AI workloads on s390x systems. Spyre accelerators integrate with IBM watsonx.ai to provide hardware-accelerated inference for foundation models on IBM Z and LinuxONE platforms.
Hardware requirements
  • 10 dedicated IFLs
    • Two additional shared IFLs are required for the Spyre Support Appliance and the Appliance Control Center (SSA/ACC).
  • 758 GB Memory
  • 2.1 TB Disk (Filesystem + FDF)
  • 8x Spyre Cards