Foundation models in IBM watsonx.ai
You can deploy a collection of third-party and IBM models in IBM watsonx.ai.
GPU requirements
- NVIDIA A100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 94 GB RAM
- NVIDIA L40S GPUs with 48 GB RAM (Not supported with all models. See tables for details.)
- L40S: GPU memory requirement / 48. You can use only a 1, 2, 4, or 8 GPU configuration, so round up to the next supported configuration. For example, if the model needs 246 GB of GPU memory, 246/48 = 5.1, so the model needs 8 GPUs.
- A100/H100: GPU memory requirement / 80. You can use only a 1, 2, 4, or 8 GPU configuration, so round up to the next supported configuration. For example, if the model needs 246 GB of GPU memory, 246/80 = 3.1, so the model needs 4 GPUs.
You might be able to run some models with fewer GPUs at context lengths shorter than the maximum, or subject to other performance tradeoffs and constraints. If you use a configuration with fewer than the recommended number of GPUs, test the deployment to verify that performance is satisfactory before you use the configuration in production. If you use a configuration with more than the recommended number of GPUs, increase the number of CPUs as well; the number of CPUs should exceed the number of GPUs by at least one.
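The sizing rule above can be sketched as a small helper. The function name and memory table are illustrative, not part of the product:

```python
import math

# Supported GPU configurations for a watsonx.ai deployment.
SUPPORTED_GPU_COUNTS = (1, 2, 4, 8)

# GPU memory per card, in GB (L40S = 48; A100/H100 = 80).
GPU_MEMORY_GB = {"L40S": 48, "A100": 80, "H100": 80}

def gpus_needed(model_memory_gb: float, gpu_type: str) -> int:
    """Return the smallest supported GPU count whose combined memory
    covers the model's GPU memory requirement."""
    per_gpu = GPU_MEMORY_GB[gpu_type]
    raw = math.ceil(model_memory_gb / per_gpu)
    for count in SUPPORTED_GPU_COUNTS:
        if count >= raw:
            return count
    raise ValueError(f"{model_memory_gb} GB exceeds an 8x {gpu_type} node")

# The examples from the text: a model that needs 246 GB of GPU memory.
print(gpus_needed(246, "L40S"))  # 246/48 = 5.1 -> next supported count is 8
print(gpus_needed(246, "A100"))  # 246/80 = 3.1 -> next supported count is 4
```

Remember that the recommended CPU count for the chosen configuration is at least the GPU count plus one.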
You can optionally partition A100 or H100 GPU processors to add more than one foundation model to
a GPU. For more information, see Partitioning GPU processors in IBM watsonx.ai. Models that can be
partitioned indicate Yes for NVIDIA Multi-Instance GPU support in the foundation models table.
When you calculate the total number of GPUs that you need for your deployment, consider whether you plan to customize any foundation models by tuning them. For more information, see Planning for foundation model tuning in IBM watsonx.ai.
Provided foundation models
The following table lists the recommended number of GPUs to configure on a single OpenShift® worker node for the various foundation models that are provided with IBM watsonx.ai at the default context window length for each model. Minimum system requirements may vary based on the context length you set, the number of model parameters, the model parameters' precision, and more.
For details about the foundation models provided with IBM watsonx.ai, including the default context window length, see Supported foundation models.
| Foundation model | Description | System requirements | Supported GPUs | Group name |
|---|---|---|---|---|
| | A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support conversational tasks. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxAllam113bInstruct |
| | Code Llama is an AI model built on top of Llama 2, fine-tuned for generating and discussing code. | | | ibmwxCodellamaCodellama34bInstructHf |
| | Ideal for complex tasks that require large reasoning capabilities or are highly specialized. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxCodestral2501 |
| | Ideal for complex tasks that require large reasoning capabilities or are highly specialized. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxCodestral22B |
| | General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese. | | | ibmwxElyzaJapaneseLlama27bInstruct |
| | General use with zero- or few-shot prompts. Note: This foundation model can be prompt tuned. | | | ibmwxGoogleFlanT5xl |
| | General use with zero- or few-shot prompts. | | | ibmwxGoogleFlanT5xxl |
| | General use with zero- or few-shot prompts. | | | ibmwxGoogleFlanul2 |
| | InstructLab foundation model from IBM that supports knowledge and skills contributed by the open source community. | | | ibmwxGranite7bLab |
| | A pretrained instruct-variant model from IBM designed to work with Japanese text. | | | ibmwxGranite8bJapanese |
| | General use model from IBM that is optimized for dialogue use cases. | | | ibmwxGranite13bChatv2 |
| | General use model from IBM that is optimized for question and answer use cases. Note: This model can be prompt tuned. | | | ibmwxGranite13bInstructv2 |
| | The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well-suited for generative tasks. | | | ibmwxGranite20bMultilingual |
| | Granite 3.2 8b Instruct is a text-only model with reasoning capabilities that you can enable or disable to fit your use case. | | | ibmwxGranite328BInstruct |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGranite32BInstruct |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGranite38BInstruct |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGraniteGuardian32b |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGraniteGuardian38b |
| | A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxGranite3bCodeInstruct |
| | An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxGranite8bCodeInstruct |
| | A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxGranite20bCodeInstruct |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGranite20bCodeBaseSchemaLinking |
| | Granite models are designed to be used for a wide range of generative and non-generative tasks with appropriate prompt engineering. They employ a GPT-style decoder-only architecture, with additional innovations from IBM Research and the open community. | | | ibmwxGranite20bCodeBaseSqlGen |
| | A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. | | | ibmwxGranite34bCodeInstruct |
| | Granite 3.2 Vision is an image-and-text-in, text-out model that can understand visual content such as charts, for enterprise computer vision use cases. | | | ibmwxGraniteVision322Bs |
| | General use foundation model for generative tasks in Arabic. | | | ibmwxCore42Jais13bChat |
| | A state-of-the-art refresh of the Llama 3.1 70B Instruct model that uses the latest advancements in post-training techniques. | | | ibmwxLlama3370BInstruct |
| | A pretrained and fine-tuned generative text model with 1 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlama321bInstruct |
| | A pretrained and fine-tuned generative text model with 3 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlama323bInstruct |
| | A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlama3211bVisionInstruct |
| | A pretrained and fine-tuned generative text model with 90 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlama3290bVisionInstruct |
| | A pretrained and fine-tuned generative text model with 11 billion parameters, optimized for multilingual dialogue use cases and code output. | | | ibmwxLlamaGuard311bVision |
| | An auto-regressive language model that uses an optimized transformer architecture. Note: This model can be fine tuned when configured to use an NVIDIA A100 or H100 GPU. | | | ibmwxLlama318bInstruct |
| | An auto-regressive language model that uses an optimized transformer architecture. | | | ibmwxLlama3170bInstruct |
| | Meta's largest open-sourced foundation model to date, with 405 billion parameters, optimized for dialogue use cases. | | | ibmwxLlama3405bInstruct |
| | Pre-trained and instruction tuned generative text model optimized for dialogue use cases. | | | ibmwxMetaLlamaLlama38bInstruct |
| | Pre-trained and instruction tuned generative text model optimized for dialogue use cases. | | | ibmwxMetaLlamaLlama370bInstruct |
| | General use with zero- or few-shot prompts. Optimized for dialogue use cases. Note: This model can be prompt tuned. | | | ibmwxMetaLlamaLlama213bChat |
| | General use foundation model for generative tasks in Korean. | | | ibmwxMncaiLlama213bDpov7 |
| | Ideal for complex tasks that require large reasoning capabilities or are highly specialized. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxMinistral8BInstruct |
| | Mistral Small 3 (2501) sets a new benchmark in the small Large Language Models category (fewer than 70 billion parameters). With 24 billion parameters, the model achieves state-of-the-art capabilities comparable to larger models. | | | ibmwxMistralSmall24BInstruct2501 |
| | Ideal for complex tasks that require large reasoning capabilities or are highly specialized. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxMistralSmallInstruct |
| | The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxMistralLargeInstruct2411 |
| | The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxMistralLarge |
| | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B is not a commercial model and does not require a separate entitlement. | | | ibmwxMistralaiMixtral8x7bInstructv01 |
| | General use with zero- or few-shot prompts. Supports prompts in languages other than English and multilingual prompts. | | | ibmwxBigscienceMt0xxl |
| | A 124-billion parameter multimodal model built on top of Mistral Large 2 that demonstrates frontier-level image understanding. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | | ibmwxPixtralLargeInstruct |
| | A 12-billion parameter model pre-trained and fine-tuned for generative tasks in text and image domains. The model is optimized for multilingual use cases and provides robust performance in creative content generation. | | | ibmwxPixtral12b |
You cannot add deprecated or withdrawn models to your deployment. For more information about how deprecated and withdrawn models are handled, see Foundation model lifecycle.
Custom foundation models
- Full-service installation: Deploying custom foundation models
- Lightweight engine installation: Adding custom foundation models to watsonx.ai Lightweight Engine
Embedding and reranker models
| Model | System requirements | Group name |
|---|---|---|
| | | ibmwxAllMinilmL6V2 |
| | | ibmwxAllMinilmL12V2 |
| | | ibmwxGranite107MMultilingualRtrvr |
| | | ibmwxGranite278MMultilingualRtrvr |
| | | ibmwxMsMarcoMinilmL12V2 |
| | | ibmwxMultilingualE5Large |
| | | ibmwxSlate30mEnglishRtrvr |
| | | ibmwxSlate125mEnglishRtrvr |
Text extraction models
| Model | System requirements | Group name |
|---|---|---|
| | | Not necessary. The models are always downloaded because they have a small footprint. |
Time series foundation models
5.1.1 and later: You can use the time series API to pass historical data observations to a time series foundation model that can forecast future values. You can deploy the following time series foundation models:
| Model | System requirements | Group name |
|---|---|---|
| | | ibmwxGraniteTimeseriesTtmV1 |
| | | ibmwxGraniteTimeseriesTtmV1 |
| | | ibmwxGraniteTimeseriesTtmV1 |
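As a rough sketch, a forecast call packages historical observations into a JSON request body. The field names and model ID below are illustrative assumptions, not the documented request schema; check the watsonx.ai time series API reference for the exact format:

```python
import json

def build_forecast_payload(model_id, timestamps, values):
    """Package historical observations as a JSON-serializable request body.
    All field names here are hypothetical placeholders for illustration."""
    return {
        "model_id": model_id,
        "parameters": {"prediction_length": 96},
        "schema": {"timestamp_column": "date", "target_columns": ["value"]},
        "data": {
            "date": list(timestamps),
            "value": list(values),
        },
    }

payload = build_forecast_payload(
    "granite-timeseries-ttm-v1",  # hypothetical model ID
    [f"2024-01-{d:02d}" for d in range(1, 11)],
    [10.0, 11.5, 9.8, 12.1, 13.0, 12.4, 11.9, 14.2, 13.7, 15.0],
)
# This is the body you would POST to the time series forecast endpoint.
print(json.dumps(payload, indent=2))
```

The response would contain forecasted values for the configured prediction length.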
Foundation models compatible with LoRA and QLoRA fine tuning
5.1.1 and later: You can use Parameter-Efficient Fine Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) to train, deploy, and inference foundation models. The foundation models that are compatible with LoRA and QLoRA tuning can only be fine tuned. Unlike most large language models that are provided with IBM watsonx.ai, these models cannot be inferenced in the Prompt Lab or programmatically by using the API right away. The only way to inference one of these base models is to deploy the model as a custom foundation model.
You can deploy the following foundation models that are compatible with LoRA and QLoRA fine tuning:
| Model | System requirements | Supported GPUs | Group name |
|---|---|---|---|
| | | | ibmwxGranite318BBase |
| | | | ibmwxLlama318B |
| | | | ibmwxLlama318B |
| | | | ibmwxLlama3170BGptq |
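The LoRA technique described above can be illustrated with a minimal numeric sketch. The dimensions and scaling factor here are arbitrary examples, not watsonx.ai tuning defaults:

```python
import numpy as np

# Low-Rank Adaptation (LoRA): instead of updating a full d_out x d_in
# weight matrix W, train two small factors A (r x d_in) and B (d_out x r)
# so that the adapted weight is W + (alpha / r) * B @ A. Only A and B
# are trained; the base weights stay frozen.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 128, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init

delta = (alpha / r) * B @ A                # low-rank update (zero before training)
W_adapted = W + delta

# With B zero-initialized, the adapted model starts identical to the base model.
x = rng.standard_normal(d_in)
assert np.allclose(W @ x, W_adapted @ x)

# Trainable parameters: r * (d_in + d_out) instead of d_in * d_out.
print(r * (d_in + d_out), "vs", d_in * d_out)  # 1536 vs 8192
```

QLoRA applies the same idea while holding the frozen base weights in quantized form, which further reduces the GPU memory needed during tuning.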