Foundation models in IBM watsonx.ai
You can deploy a collection of third-party and IBM models in IBM watsonx.ai. The following GPU types are supported:
- NVIDIA A100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 80 GB RAM
- NVIDIA L40S GPUs with 48 GB RAM (Not supported with all models. See tables for details.)
Starting with 5.0.3, you can optionally partition A100 or H100 GPU processors to add more than one foundation model to a GPU. For more information, see Partitioning GPU processors in IBM watsonx.ai. Models that can be partitioned indicate Yes for NVIDIA Multi-Instance GPU support in the foundation models table.
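If you plan to partition GPUs, one way to reason about sizing is to match each model's memory footprint to the smallest NVIDIA MIG profile that holds it. The sketch below is a rough illustration only: the profile names are a subset of the standard A100 80GB MIG profiles, and the model memory figures are invented placeholders, not published requirements.

```python
# Hypothetical sizing sketch: fit per-model memory footprints (GB) into
# NVIDIA MIG profiles available on an 80 GB A100. The profile names are a
# subset of the standard A100 80GB profiles; the footprints are made up.
A100_80GB_MIG_PROFILES = {
    "1g.10gb": 10,
    "2g.20gb": 20,
    "3g.40gb": 40,
    "7g.80gb": 80,  # the whole GPU, that is, no partitioning
}

def smallest_fitting_profile(model_memory_gb: float) -> str | None:
    """Return the smallest MIG profile whose memory covers the model."""
    for name, capacity in sorted(A100_80GB_MIG_PROFILES.items(),
                                 key=lambda kv: kv[1]):
        if model_memory_gb <= capacity:
            return name
    return None  # model needs more than a single full GPU

# Placeholder footprints -- use your models' actual memory requirements.
for model, mem in {"small-embedding-model": 4, "mid-size-llm": 35}.items():
    print(model, "->", smallest_fitting_profile(mem))
```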
The following table lists the recommended number of GPUs to configure on a single OpenShift® worker node for the various foundation models that are available with IBM watsonx.ai. You might be able to run some models with fewer GPUs at context lengths other than the maximum or subject to other performance tradeoffs and constraints. If you use a configuration with fewer than the recommended number of GPUs, be sure to test the deployment to verify that the performance is satisfactory before you use the configuration in production.
When you calculate the total number of GPUs that you need for your deployment, consider whether you plan to customize any foundation models by tuning them. If you plan to tune a foundation model, factor in one GPU that can be reserved for tuning tasks. Do not partition the GPU that will be used for tuning a foundation model.
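The capacity arithmetic described in the two paragraphs above can be sanity-checked in a few lines. The sketch below is a minimal illustration only; the model names and per-model GPU counts are placeholders, so substitute the recommended values from the following table.

```python
# Minimal capacity sketch: sum the recommended GPU counts for the models you
# plan to deploy, and reserve one whole (unpartitioned) GPU if you plan to
# tune a foundation model. Names and counts here are placeholders.
planned_models = {
    "example-13b-model": 1,   # recommended GPUs per model (see table)
    "example-70b-model": 4,
}
plan_to_tune = True

total_gpus = sum(planned_models.values())
if plan_to_tune:
    total_gpus += 1  # one dedicated, unpartitioned GPU for tuning tasks

print(f"Provision at least {total_gpus} GPUs across your worker nodes")
```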
| Foundation model | Description | System requirements | Group name (5.0.1 or later) |
|---|---|---|---|
| allam-1-13b-instruct | A bilingual large language model for Arabic and English that is initialized with Llama-2 weights and is fine-tuned to support conversational tasks. Note: Starting with 5.0.3, this model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU. | | ibmwxAllam113bInstruct |
| codellama-34b-instruct-hf | Code Llama is an AI model built on top of Llama 2, fine-tuned for generating and discussing code. | | ibmwxCodellamaCodellama34bInstructHf |
| elyza-japanese-llama-2-7b-instruct | General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese. | | ibmwxElyzaJapaneseLlama27bInstruct |
| flan-t5-xl | General use with zero- or few-shot prompts. Note: This foundation model can be prompt-tuned. | | ibmwxGoogleFlanT5xl |
| flan-t5-xxl | General use with zero- or few-shot prompts. | | ibmwxGoogleFlanT5xxl |
| flan-ul2 | General use with zero- or few-shot prompts. Note: In 5.0.0 only, if you want to use this model with L40S GPUs, you must take some extra steps. See Adding foundation models for details. | | ibmwxGoogleFlanul2 |
| granite-7b-lab | InstructLab foundation model from IBM that supports knowledge and skills contributed by the open source community. | | ibmwxGranite7bLab |
| granite-8b-japanese | A pretrained instruct-variant model from IBM that is designed to work with Japanese text. | | ibmwxGranite8bJapanese |
| granite-13b-chat-v2 | General use model from IBM that is optimized for dialogue use cases. | | ibmwxGranite13bChatv2 |
| granite-13b-instruct-v2 | General use model from IBM that is optimized for question and answer use cases. Note: This model can be prompt-tuned. | | ibmwxGranite13bInstructv2 |
| granite-20b-multilingual | The Granite model series is a family of IBM-trained, dense decoder-only models, which are particularly well suited for generative tasks. | | ibmwxGranite20bMultilingual |
| granite-3b-code-instruct | A 3-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: New in 5.0.1. Note: Starting with 5.0.3, this model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU. | | ibmwxGranite3bCodeInstruct |
| granite-8b-code-instruct | An 8-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: New in 5.0.1. Note: Starting with 5.0.3, this model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU. | | ibmwxGranite8bCodeInstruct |
| granite-20b-code-instruct | A 20-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: New in 5.0.1. Note: Starting with 5.0.3, this model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU. | | ibmwxGranite20bCodeInstruct |
| granite-34b-code-instruct | A 34-billion parameter instruction fine-tuned model from IBM that supports code discussion, generation, and conversion. Note: New in 5.0.1. | | ibmwxGranite34bCodeInstruct |
| jais-13b-chat | General use foundation model for generative tasks in Arabic. | | ibmwxCore42Jais13bChat |
| llama-2-13b-chat | General use with zero- or few-shot prompts. Optimized for dialogue use cases. Note: This model can be prompt-tuned. | | ibmwxMetaLlamaLlama213bChat |
| llama-2-70b-chat | General use with zero- or few-shot prompts. Optimized for dialogue use cases. | | ibmwxMetaLlamaLlama270bChat |
| llama2-13b-dpo-v7 | General use foundation model for generative tasks in Korean. | | ibmwxMncaiLlama213bDpov7 |
| llama-3-1-8b-instruct | An auto-regressive language model that uses an optimized transformer architecture. Note: New in 5.0.3. Note: Starting with 5.0.3, this model can be fine-tuned when configured to use an NVIDIA A100 or H100 GPU. | | ibmwxLlama318bInstruct |
| llama-3-1-70b-instruct | An auto-regressive language model that uses an optimized transformer architecture. Note: New in 5.0.3. | | ibmwxLlama3170bInstruct |
| llama-3-405b-instruct | Meta's largest open-sourced foundation model to date, with 405 billion parameters, optimized for dialogue use cases. Note: New in 5.0.3. | | ibmwxLlama3405bInstruct |
| llama-3-8b-instruct | Pre-trained and instruction-tuned generative text model optimized for dialogue use cases. | | ibmwxMetaLlamaLlama38bInstruct |
| llama-3-70b-instruct | Pre-trained and instruction-tuned generative text model optimized for dialogue use cases. | | ibmwxMetaLlamaLlama370bInstruct |
| merlinite-7b | General use foundation model tuned by IBM that supports knowledge and skills contributed by the open source community. | | ibmwxMistralaiMerlinite7b |
| mistral-large | The most advanced Large Language Model (LLM) developed by Mistral AI, with state-of-the-art reasoning capabilities that can be applied to any language-based task, including the most sophisticated ones. Note: New in 5.0.3. Attention: You must purchase Mistral AI with IBM separately before you are entitled to download and use this model. | | ibmwxMistralLarge |
| mixtral-8x7b-instruct-v01 | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. | | ibmwxMistralaiMixtral8x7bInstructv01 |
| mt0-xxl | General use with zero- or few-shot prompts. Supports prompts in languages other than English and multilingual prompts. | | ibmwxBigscienceMt0xxl |
You cannot add deprecated or withdrawn models to your deployment. For more information about how deprecated and withdrawn models are handled, see Foundation model lifecycle.
You can also deploy custom foundation models:
- Full-service installation: Deploying custom foundation models
- Lightweight engine installation: Adding custom foundation models to watsonx.ai Lightweight Engine
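After a foundation model is deployed, you can prompt it programmatically. One option is the ibm-watsonx-ai Python SDK; the following is a minimal sketch that assumes SDK version 1.x, with placeholder URL, credentials, and project ID. Check the SDK documentation for the exact connection parameters that your installation requires.

```python
# Sketch only: prompting a deployed foundation model with the ibm-watsonx-ai
# Python SDK (pip install ibm-watsonx-ai). URL, credentials, and project ID
# are placeholders; parameter details may differ by SDK release.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://<your-cpd-cluster-route>",  # placeholder
    username="<username>",                   # placeholder
    api_key="<api-key>",                     # placeholder
    instance_id="openshift",  # Cloud Pak for Data software deployments
    version="5.0",
)

model = ModelInference(
    model_id="ibm/granite-13b-chat-v2",  # a model from the table above
    credentials=credentials,
    project_id="<project-id>",           # placeholder
    params={"max_new_tokens": 100},
)

print(model.generate_text(prompt="Summarize the benefits of GPU partitioning."))
```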
The following table lists the embedding models that are available with IBM watsonx.ai.

| Embedding model | System requirements | Group name (5.0.1 or later) |
|---|---|---|
| all-minilm-l6-v2. Note: New in 5.0.3. | | ibmwxAllMinilmL6V2 |
| multilingual-e5-large. Note: New in 5.0.3. | | ibmwxMultilingualE5Large |
| slate-30m-english-rtrvr. Note: This model was updated to version 2.0.1 in CPD 5.0.3. | | ibmwxSlate30mEnglishRtrvr |
| slate-125m-english-rtrvr. Note: This model was updated to version 2.0.1 in CPD 5.0.3. | | ibmwxSlate125mEnglishRtrvr |
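The embedding models can likewise be called through the ibm-watsonx-ai Python SDK once they are deployed. Again a minimal sketch, assuming SDK 1.x, with placeholder credentials and IDs:

```python
# Sketch only: generating embeddings with the ibm-watsonx-ai Python SDK.
# Credentials and IDs are placeholders; see the SDK docs for your release.
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

credentials = Credentials(
    url="https://<your-cpd-cluster-route>",  # placeholder
    username="<username>",                   # placeholder
    api_key="<api-key>",                     # placeholder
    instance_id="openshift",
    version="5.0",
)

embedding = Embeddings(
    model_id="ibm/slate-30m-english-rtrvr",  # a model from the table above
    credentials=credentials,
    project_id="<project-id>",               # placeholder
)

vectors = embedding.embed_documents(texts=["watsonx.ai supports embedding models."])
print(len(vectors[0]))  # embedding dimensionality
```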