Explore the IBM library of AI models available in the watsonx.ai studio
Select the IBM® Granite®, open-source or third-party model best suited for your business and deploy it on-premises or in the cloud.
Choose the model that best fits your specific use case, budget considerations, regional interests and risk profile.
Tailored for business, the IBM Granite family of open, performant and trusted models delivers exceptional performance at a competitive price, without compromising safety.
Llama models are open, efficient large language models designed for versatility and strong performance across a wide range of natural language tasks.
Mistral models are fast, performant, open-weight language models designed for modularity and optimized for text generation, reasoning and multilingual applications.
There are several foundation models from other providers available on watsonx.ai.
| Model | Provider | Description | Context length (tokens) |
| --- | --- | --- | --- |
| granite-4-h-small | IBM | Supports question answering (Q&A), fill-in-the-middle, function calling, multilingual dialog, summarization, classification, generation, extraction, RAG and coding tasks. | 128k |
| granite-3-3-8b-instruct | IBM | Supports reasoning and planning, question answering (Q&A), fill-in-the-middle, summarization, classification, generation, extraction, RAG and coding tasks. | 128k |
| granite-3-2-8b-instruct | IBM | Supports reasoning and planning, Q&A, summarization, classification, generation, extraction, RAG and coding tasks. | 128k |
| granite-vision-3-2-2b | IBM | Supports image-to-text use cases for chart, graph and infographic analysis, and context Q&A. | 16,384 |
| granite-3-2b-instruct (v3.1) | IBM | Supports Q&A, summarization, classification, generation, extraction, RAG and coding tasks. | 128k |
| granite-3-8b-instruct (v3.1) | IBM | Supports Q&A, summarization, classification, generation, extraction, RAG and coding tasks. | 128k |
| granite-guardian-3-8b (v3.1) | IBM | Supports detection of HAP or PII, jailbreaking, bias, violence and other harmful content. | 128k |
| granite-guardian-3-2b (v3.1) | IBM | Supports detection of HAP or PII, jailbreaking, bias, violence and other harmful content. | 128k |
| granite-13b-instruct | IBM | Supports Q&A, summarization, classification, generation, extraction and RAG tasks. | 8,192 |
| granite-8b-code-instruct | IBM | Task-specific model for code: generates, explains and translates code from a natural-language prompt. | 128k |
| granite-timeseries-ttm-r2 | IBM | Supports time-series forecasting, anomaly detection in sequential data, and predictive maintenance. | N/A |
*Prices shown are indicative, may vary by country, exclude any applicable taxes and duties, and are subject to product offering availability in a locale.
| Model | Provider | Description | Context length (tokens) |
| --- | --- | --- | --- |
| llama-4-maverick-17b-128e-instruct-fp8 | Meta | Multimodal reasoning, long-context processing (10M tokens), code generation and analysis, multilingual operations (200 languages supported), STEM and logical reasoning. | 128k |
| llama-3-3-70b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k |
| llama-3-2-90b-vision-instruct | Meta | Supports image captioning, image-to-text transcription (OCR) including handwriting, data extraction and processing, context Q&A and object identification. | 128k |
| llama-3-2-11b-vision-instruct | Meta | Supports image captioning, image-to-text transcription (OCR) including handwriting, data extraction and processing, context Q&A and object identification. | 128k |
| llama-guard-3-11b-vision | Meta | Supports image filtering, HAP or PII detection and harmful content filtering. | 128k |
| llama-3-2-1b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k |
| llama-3-2-3b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k |
| llama-3-405b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k |
| llama-2-13b-chat | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 4,096 |
| Model | Provider | Description | Context length (tokens) |
| --- | --- | --- | --- |
| mistral-medium-2505 | Mistral AI | Supports coding, image captioning, image-to-text transcription, function calling, data extraction and processing, context Q&A and mathematical reasoning. | 128k |
| mistral-small-3-1-24b-instruct-2503 | Mistral AI | Supports image captioning, image-to-text transcription, function calling, data extraction and processing, context Q&A and object identification. | 128k |
| pixtral-12b | Mistral AI | Supports image captioning, image-to-text transcription (OCR) including handwriting, data extraction and processing, context Q&A and object identification. | 128k |
| mistral-large-2 | Mistral AI | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in French, German, Italian, Spanish and English. | 128k* |
| Model | Provider | Description | Context length (tokens) |
| --- | --- | --- | --- |
| allam-1-13b-instruct | SDAIA | Supports Q&A, summarization, classification, generation, extraction, RAG and translation in Arabic. | 4,096 |
| jais-13b-chat (Arabic) | core42 | Supports Q&A, summarization, classification, generation, extraction and translation in Arabic. | 2,048 |
| flan-t5-xl-3b | Google | Supports Q&A, summarization, classification, generation, extraction and RAG tasks. Available for prompt tuning. | 4,096 |
| flan-t5-xxl-11b | Google | Supports Q&A, summarization, classification, generation, extraction and RAG tasks. | 4,096 |
| flan-ul2-20b | Google | Supports Q&A, summarization, classification, generation, extraction and RAG tasks. | 4,096 |
| gpt-oss-120b | OpenAI | Supports private on-premises or edge deployment, reasoning workflows, tool use (e.g., search, code execution), customizable chain-of-thought, structured outputs and adjustable reasoning effort. | 128k |
| Model | Provider | Description | Context length (tokens) |
| --- | --- | --- | --- |
| granite-embedding-107m-multilingual | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 |
| granite-embedding-278m-multilingual | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 |
| slate-125m-english-rtrvr-v2 | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 |
| slate-125m-english-rtrvr | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 |
| slate-30m-english-rtrvr-v2 | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 |
| slate-30m-english-rtrvr | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 |
| Model | Provider | Description | Context length (tokens) |
| --- | --- | --- | --- |
| all-minilm-l6-v2 | Microsoft | Retrieval augmented generation, semantic search and document comparison tasks. | 256 |
| multilingual-e5-large | Intel | Retrieval augmented generation, semantic search and document comparison tasks. | 512 |
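The embedding models listed above map text to fixed-length vectors for retrieval, semantic search and document comparison. As an illustration only — using toy 3-dimensional vectors in place of real model output, and document names chosen purely for the example — semantic search ranks documents by cosine similarity between the query embedding and each document embedding:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings; a real system would obtain these from an
# embedding model such as granite-embedding-278m-multilingual.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "warranty terms": [0.7, 0.2, 0.1],
}
query = [0.8, 0.15, 0.05]

# Rank documents by similarity to the query embedding.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked[0])  # the most semantically similar document
```

The same ranking step underlies RAG pipelines: the top-ranked passages are passed to a generative model as context.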
What happens when you train a powerful AI model with your own unique data? Better customer experiences and faster value with AI. Explore these stories and see how.
Wimbledon used watsonx.ai foundation models to train its AI to create tennis commentary.
The Recording Academy used AI Stories with IBM watsonx to generate and scale editorial content around GRAMMY nominees.
The Masters uses watsonx.ai to bring AI-powered hole insights combined with expert opinions to digital platforms.
AddAI.Life uses watsonx.ai to access selected open-source large language models to build higher quality virtual assistants.
IBM believes in the creation, deployment and utilization of AI models that advance innovation across the enterprise responsibly. The IBM watsonx AI portfolio has an end-to-end process for building and testing foundation models and generative AI. For IBM-developed models, we search for and remove duplication, and we employ URL blocklists, filters for objectionable content and document quality, sentence splitting and tokenization techniques, all before model training.
During the data training process, we work to prevent misalignments in the model outputs and use supervised fine-tuning to enable better instruction following, so that the model can be used to complete enterprise tasks through prompt engineering. We are continuing to develop the Granite models in several directions, including other modalities, industry-specific content and more data annotations for training, while also deploying regular, ongoing data protection safeguards for IBM-developed models.
Given the rapidly changing generative AI technology landscape, our end-to-end processes are expected to continuously evolve and improve. As a testament to the rigor IBM puts into the development and testing of its foundation models, the company provides its standard contractual intellectual property indemnification for IBM-developed models, similar to those it provides for IBM hardware and software products.
Moreover, contrary to some other providers of large language models and consistent with the IBM standard approach on indemnification, IBM does not require its customers to indemnify IBM for a customer’s use of IBM-developed models. Also, consistent with the IBM approach to its indemnification obligation, IBM does not cap its indemnification liability for the IBM-developed models.
The watsonx models currently under these protections include:
(1) Slate family of encoder-only models
(2) Granite family of decoder-only models
* Context length as supported by the model provider; the actual context length available on the platform may be lower. For more information, please see Documentation.
Inference is billed in Resource Units. 1 Resource Unit is 1,000 tokens. Input and completion tokens are charged at the same rate. 1,000 tokens are generally about 750 words.
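The billing rule above can be sketched in a few lines. This is a minimal illustration of the stated arithmetic only — the function names are invented for the example, and no actual watsonx.ai prices are assumed:

```python
# Billing rule from the text: 1 Resource Unit (RU) = 1,000 tokens,
# with input and completion tokens charged at the same rate,
# and roughly 750 words per 1,000 tokens.
TOKENS_PER_RU = 1_000
WORDS_PER_1K_TOKENS = 750

def resource_units(input_tokens: int, output_tokens: int) -> float:
    """Total Resource Units consumed by one inference call."""
    return (input_tokens + output_tokens) / TOKENS_PER_RU

def approx_tokens(word_count: int) -> int:
    """Rough token estimate from a word count (~750 words per 1,000 tokens)."""
    return round(word_count * TOKENS_PER_RU / WORDS_PER_1K_TOKENS)

# Example: a 1,500-word prompt (~2,000 tokens) with a 500-token completion
# consumes (2,000 + 500) / 1,000 = 2.5 Resource Units.
prompt_tokens = approx_tokens(1_500)
total_ru = resource_units(prompt_tokens, 500)
print(prompt_tokens, total_ru)  # 2000 2.5
```

Multiplying the resulting Resource Units by a model's per-RU rate gives the cost of a call.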
Not all models are available in all regions. See our documentation for details.
Context length is expressed in tokens.
IBM statements regarding its plans, directions and intent are subject to change or withdrawal without notice at IBM's sole discretion. See Pricing for more details. Unless otherwise specified under Software pricing, all features, capabilities and potential updates refer exclusively to SaaS. IBM makes no representation that SaaS and software features and capabilities are the same.