Third-party foundation models
You can choose from a collection of third-party foundation models in IBM watsonx.ai.
The following models are available in watsonx.ai:
- allam-1-13b-instruct
- codellama-34b-instruct-hf
- codestral-2501
- deepseek-r1-distill-llama-8b
- deepseek-r1-distill-llama-70b
- eurollm-1-7b-instruct
- eurollm-9b-instruct
- gpt-oss-20b
- gpt-oss-120b
- llama-4-maverick-17b-128e-instruct-fp8
- llama-4-maverick-17b-128e-instruct-int4
- llama-4-scout-17b-16e-instruct-fp8-dynamic
- llama-3-3-70b-instruct
- llama-3-2-11b-vision-instruct
- llama-3-2-90b-vision-instruct
- llama-guard-3-11b-vision
- llama-3-1-8b
- llama-3-1-8b-instruct
- llama-3-1-70b
- llama-3-1-70b-instruct
- llama-3-1-70b-gptq
- llama-3-1-405b-instruct-fp8
- llama-3-405b-instruct
- llama-2-13b-chat
- llama-2-70b-chat
- ministral-3b-instruct-2512
- ministral-8b-instruct-2512
- ministral-8b-instruct-2410
- mistral-large-2512
- mistral-large
- mistral-large-instruct-2407
- mistral-large-instruct-2411
- mistral-medium-2505
- mistral-medium-2508
- mistral-nemo-instruct-2407
- mistral-small-3-2-24b-instruct-2506
- mistral-small-3-1-24b-instruct-2503
- mixtral-8x7b-base
- mixtral-8x7b-instruct-v01
- mt0-xxl-13b
- pixtral-12b
- poro-34b-chat
To learn more about the various ways that these models can be deployed, and to see a summary of pricing and context window length information for the models, see Supported foundation models.
For details about IBM foundation models, see IBM foundation models.
How to choose a model
To review factors that can help you to choose a model, such as supported tasks and languages, see Choosing a model and Foundation model benchmarks.
A deprecated foundation model is highlighted with a deprecated warning icon. For details about model deprecation and withdrawal, see Foundation model lifecycle.
Foundation model details
The foundation models in watsonx.ai support a range of use cases for both natural languages and programming languages. To see the types of tasks that these models can do, review and try the sample prompts. To view pricing details for deploy on demand foundation models, see Hourly billing rates for deploy on demand models.
If your watsonx region is the Dallas data center on IBM Cloud, you can follow the model card links. Otherwise, search for the model name in the Resource hub. Some models might not be available in all regions or on all cloud platforms.
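For example, the following sketch shows one way to prompt a model from this page with the ibm-watsonx-ai Python SDK. The model ID, region URL, API key, and project ID are placeholders that you replace with your own values; treat the snippet as a minimal outline rather than a definitive recipe.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

# Placeholder credentials: replace the URL, API key, and project ID with your own values.
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

# mistral-large is one of the multitenant models that is described on this page.
model = ModelInference(
    model_id="mistralai/mistral-large",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# max_new_tokens caps the number of tokens that the model generates for this request.
response = model.generate_text(
    prompt="Summarize the benefits of retrieval-augmented generation in two sentences.",
    params={"max_new_tokens": 200},
)
print(response)
```

Deploy on demand models are inferenced through your own deployment rather than a shared model ID; in that case, pass a deployment_id to ModelInference instead of model_id.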
allam-1-13b-instruct
The allam-1-13b-instruct foundation model is a bilingual large language model for Arabic and English provided by the National Center for Artificial Intelligence and supported by the Saudi Authority for Data and Artificial Intelligence that is fine-tuned to support conversational tasks. The ALLaM series is a collection of powerful language models designed to advance Arabic language technology. These models are initialized with Llama-2 weights and undergo training on both Arabic and English languages.
- Usage
- Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.
- Size
- 13 billion parameters
- API pricing tier
- Class 2 for the multitenant model deployment. For pricing details, see Table 3. For pricing details on dedicated use, see Table 5.
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy on demand for dedicated use.
- Try it out
- Experiment with samples:
- Token limits
- Context window length (input + output): 4,096
- Supported natural languages
- Arabic (Modern Standard Arabic) and English
- Instruction tuning information
- allam-1-13b-instruct is based on the Allam-13b-base model, which is a foundation model that is pre-trained on a total of 3 trillion tokens in English and Arabic, including the tokens seen from its initialization. The Arabic dataset contains 500 billion tokens after cleaning and deduplication. The additional data is collected from open source collections and web crawls. The allam-1-13b-instruct foundation model is fine-tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.
- Model architecture
- Decoder-only
- License
- Llama 2 community license and ALLaM license
- Learn more
- Read the following resource:
codellama-34b-instruct-hf
A programmatic code generation model that is based on Llama 2 from Meta. Code Llama is fine-tuned for generating and discussing code.
- Usage
- Use Code Llama to create prompts that generate code based on natural language inputs, explain code, or that complete and debug code.
- Size
- 34 billion parameters
- API pricing tier
- For pricing details, see Table 5.
- Availability
- Deploy on demand for dedicated use.
- Try it out
- Experiment with samples:
- Token limits
- Context window length (input + output): 16,384
- Note: The maximum new tokens, which means the tokens that are generated by the foundation model per request, is limited to 8,192.
- Supported natural languages
- English
- Supported programming languages
- The codellama-34b-instruct-hf foundation model supports many programming languages, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and more.
- Instruction tuning information
- The instruction fine-tuned version was fed natural language instruction input and the expected output to guide the model to generate helpful and safe answers in natural language.
- Model architecture
- Decoder
- License
- License
- Learn more
- Read the following resources:
codestral-2501
The codestral-2501 foundation model is a state-of-the-art coding model developed by Mistral AI. The model is based on the original codestral-22b model and has a more efficient architecture and an improved tokenizer. The codestral-2501 model performs code generation and code completion tasks approximately twice as fast as the original model.
- Usage
-
The codestral-2501 model is optimized for low-latency, high-frequency use cases and supports tasks such as fill-in-the-middle (FIM) completion, code correction, and test case generation. A sketch that prompts the model through the Python SDK follows this entry.
- Size
-
22 billion parameters
- API pricing tier
-
For pricing details, see Hourly billing rates for deploy on demand models.
- Availability
-
Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 256,000
Note: The maximum new tokens, which means the tokens that are generated by the foundation model per request, is limited to 8,192.
- Supported natural languages
-
English
- Supported programming languages
-
The codestral-2501 model is proficient in over 80 programming languages, including popular languages such as Python, Java, C, C++, JavaScript, and Bash. The model also works well with more specialized languages like Swift and Fortran.
- Model architecture
-
Decoder
- License
-
For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
- Read the following resources:
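Because the model is available as deploy on demand only, you inference it through your own deployment. The following minimal sketch assumes the ibm-watsonx-ai Python SDK and a placeholder deployment ID, and prompts the model to generate test cases, one of the tasks that is listed in the usage notes.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

# Deploy on demand models are addressed by the ID of your deployment, not a shared model ID.
model = ModelInference(
    deployment_id="YOUR_CODESTRAL_DEPLOYMENT_ID",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

prompt = """Write pytest test cases for the following function:

def slugify(text: str) -> str:
    return "-".join(text.lower().split())
"""

# Keep the output well below the 8,192 maximum new tokens for this model.
print(model.generate_text(prompt=prompt, params={"max_new_tokens": 512}))
```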
DeepSeek-R1 distilled models
The distilled variants of the DeepSeek-R1 model, based on Llama models, are provided by DeepSeek AI. The DeepSeek-R1 models are open-source models with powerful reasoning capabilities. Data samples that are generated by the DeepSeek-R1 model are used to fine-tune a base Llama model.
The deepseek-r1-distill-llama-8b and deepseek-r1-distill-llama-70b models are distilled versions of the DeepSeek-R1 model based on the Llama 3.1 8B and the Llama 3.3 70B models respectively.
- Usage
-
General use with zero- or few-shot prompts. The models are designed to excel in instruction-following tasks such as summarization, classification, reasoning, code tasks, and math. A sketch that separates the model's visible reasoning from its final answer follows this entry.
- Available sizes
-
- 8 billion parameters
- 70 billion parameters
- API pricing tier
-
8b: Small
-
70b: Large
For pricing details on dedicated use, see Table 5.
- Availability
-
8b and 70b: Deploy on demand for dedicated use.
- Try it out
-
Experiment with samples:
- Token limits
-
8b and 70b: Context window length (input + output): 131,072
Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 32,768.
- Supported natural languages
-
English
- Instruction tuning information
-
The DeepSeek-R1 models are trained by using large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. The subsequent RL and SFT stages aim to improve reasoning patterns and align the model with human preferences. DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
- Model architecture
-
Decoder
- License
-
8b: License
-
70b: License
- Learn more
-
Read the following resources:
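The distilled DeepSeek-R1 models typically emit their intermediate reasoning between <think> and </think> tags before the final answer; that tag convention comes from the model's published behavior, not from the watsonx.ai API. The following sketch, which assumes the ibm-watsonx-ai Python SDK and a placeholder deployment ID, shows one way to separate the reasoning from the answer.

```python
import re

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

# Deploy on demand models are addressed by the ID of your deployment.
model = ModelInference(
    deployment_id="YOUR_DEEPSEEK_DEPLOYMENT_ID",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

raw = model.generate_text(
    prompt="How many positive divisors does 360 have? Explain briefly.",
    params={"max_new_tokens": 2048},
)

# Split the visible reasoning (inside <think>...</think>) from the final answer.
match = re.search(r"<think>(.*?)</think>(.*)", raw, flags=re.DOTALL)
reasoning, answer = (match.group(1), match.group(2)) if match else ("", raw)
print(answer.strip())
```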
EuroLLM Instruct
The EuroLLM series of models is developed by the Unified Transcription and Translation for Extended Reality (UTTER) Project and the European Union. The EuroLLM Instruct models are open-source models specialized in understanding and generating text across all the 24 official European Union (EU) languages, as well as 11 commercially and strategically important international languages.
- Usage
-
Suited for multilingual language tasks like general instruction following and language translation.
- Sizes
-
- 1.7 billion parameters
- 9 billion parameters
- API pricing tier
-
1.7b: Small
-
9b: Small
For pricing details on dedicated use, see Table 5.
- Availability
-
Deploy on demand for dedicated use.
- Token limits
-
1.7b and 9b: Context window length (input + output): 4,096
- Supported natural languages
-
Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
- Instruction tuning information
-
The models are trained on 4 trillion tokens across the supported natural languages from web data, parallel data, Wikipedia, Arxiv, multiple books, and Apollo datasets.
- Model architecture
-
Decoder
- License
- Learn more
-
Read the following resources:
gpt-oss models
The gpt-oss foundation models, gpt-oss-20b and gpt-oss-120b, are OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and various developer use cases. The models are designed for production-grade, general-purpose, high-reasoning use and can be fine-tuned for a variety of specialized use cases.
You can set the reasoning level in the system prompt to suit your task, choosing from three levels (a sketch follows this entry):
- Low: Fast responses for general dialogue.
- Medium: Balanced speed and detail.
- High: Deep and detailed analysis.
- Usage
- Supports Q&A, summarization, classification, generation, extraction, translation, function calling and code generation and conversion.
- Sizes
-
- 20 billion parameters
- 120 billion parameters
- API pricing tier
- 120b: Input tier: Class 8, Output tier: Class 1. For pricing details, see Table 3.
For pricing details on deploying the 20b and 120b models on demand, see Hourly billing rates for deploy on demand models.
- Availability
-
- 120b: Provided by IBM deployed on multitenant hardware.
- 20b and 120b: Deploy on demand for dedicated use.
- Token limits
- Context window length (input + output): 131,072
- Supported natural languages
- English. The gpt-oss-120b supports multilingual understanding.
- Instruction tuning information
- Pretrained on a mostly English, text-only dataset, with a focus on STEM, coding, and general knowledge.
- Model architecture
- Decoder
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
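As a minimal sketch, the reasoning level can be requested in the system message of a chat request. The example assumes the ibm-watsonx-ai Python SDK, the model ID openai/gpt-oss-120b for the multitenant deployment, and that the model honors a Reasoning: high directive in the system prompt; verify the model ID in the Resource hub for your region.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

model = ModelInference(
    model_id="openai/gpt-oss-120b",  # assumed ID of the multitenant deployment
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

messages = [
    # The reasoning level (Low, Medium, or High) is requested in the system prompt.
    {"role": "system", "content": "You are a careful analyst. Reasoning: high"},
    {"role": "user", "content": "Compare two approaches to caching API responses."},
]

response = model.chat(messages=messages)
print(response["choices"][0]["message"]["content"])
```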
Llama 4 Instruct models
The Llama 4 collection of foundation models is provided by Meta. The llama-4-maverick-17b-128e-instruct-fp8 and llama-4-scout-17b-16e-instruct-fp8-dynamic models are multimodal models that use a mixture-of-experts (MoE) architecture for optimized, best-in-class performance in text and image understanding. The llama-4-maverick-17b-128e-instruct-int4 model is a quantized version of the base model, with weights converted to the INT4 data type.
The Llama 4 Maverick model is a 17 billion active parameter multimodal model with 128 experts. The Llama 4 Scout model is a 17 billion active parameter model with 16 experts.
Some parameters are not supported for the llama-4-maverick-17b-128e-instruct-fp8 foundation model. For details, see Known issues and limitations.
- Usage
-
Generates multilingual dialog output like a chatbot and uses a model-specific prompt format. Optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
- Size
-
17 billion parameters
- API pricing tier
-
- Input tier: Class 9
- Output tier: Class 16
For pricing details on the llama-4-maverick-17b-128e-instruct-fp8 multitenant model deployment, see Table 3. For pricing details on dedicated use, see Hourly billing rates for deploy on demand models.
- Availability
-
- llama-4-maverick-17b-128e-instruct-fp8 only: Provided by IBM deployed on multitenant hardware and deploy on demand for dedicated use.
- llama-4-maverick-17b-128e-instruct-int4: Deploy on demand for dedicated use.
- llama-4-scout-17b-16e-instruct-fp8-dynamic: Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 131,072
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192.
- Supported natural languages
-
Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.
- Instruction tuning information
-
Llama 4 was pre-trained on a broader collection of 200 languages. The Llama 4 Scout model was pre-trained on approximately 40 trillion tokens and the Llama 4 Maverick model was pre-trained on approximately 22 trillion tokens of multimodal data from publicly available and licensed information from Meta.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 3.3 70B Instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model (text in/text out) with 70 billion parameters.
The llama-3-3-70b-instruct model is a revision of the popular Llama 3.1 70B Instruct foundation model. The Llama 3.3 foundation model is better at coding, step-by-step reasoning, and tool calling. Despite its smaller size, the Llama 3.3 model's performance is similar to that of the Llama 3.1 405b model, making it a great choice for developers.
- Usage
-
Generates multilingual dialog output like a chatbot. Uses a model-specific prompt format.
- Size
-
70 billion parameters
- API pricing tier
-
Class 13 for the llama-3-3-70b-instruct multitenant model deployment. For pricing details, see Table 3.
For pricing details on dedicated use, see Hourly billing rates for deploy on demand models.
- Availability
-
- A quantized version of the model is provided by IBM deployed on multitenant hardware.
- Two versions of the model are available to deploy on demand for dedicated use:
- llama-3-3-70b-instruct-hf: Original version published on Hugging Face by Meta.
- llama-3-3-70b-instruct: A quantized version of the model that can be deployed with 2 GPUs instead of 4.
- Try it out
-
Experiment with samples:
- Token limits
-
Context window length (input + output): 131,072
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- Instruction tuning information
-
Llama 3.3 was pretrained on 15 trillion tokens of data from publicly available sources. The fine tuning data includes publicly available instruction datasets, as well as over 25 million synthetically generated examples.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 3.2 Vision Instruct
The Meta Llama 3.2 collection of foundation models is provided by Meta. The llama-3-2-11b-vision-instruct and llama-3-2-90b-vision-instruct models are built for image-in, text-out use cases such as document-level understanding, interpretation of charts and graphs, and captioning of images.
- Usage
-
Generates dialog output like a chatbot and can perform computer vision tasks, including classification, object detection and identification, image-to-text transcription (including handwriting), contextual Q&A, data extraction and processing, image comparison, and personal visual assistance. Uses a model-specific prompt format. A sketch that sends an image to the 11b model follows this entry.
- Sizes
-
- 11 billion parameters
- 90 billion parameters
- API pricing tier
-
- 11b: Class 9
- 90b: Class 10
For pricing details on multitenant model deployment, see Table 3. For pricing details on dedicated use, see Hourly billing rates for deploy on demand models.
- Availability
-
- 11b and 90b: Provided by IBM deployed on multitenant hardware.
- 11b and 90b: Deploy on demand for dedicated use.
The IBM-provided deployment of the llama-3-2-90b-vision-instruct foundation model is deprecated. See Foundation model lifecycle.
- Try it out
- Token limits
-
Context window length (input + output)
- 11b: 131,072
- 90b: 131,072
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192. The tokens that are counted for an image that you submit to the model are not included in the context window length.
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai with text-only inputs. English only when an image is included with the input.
- Instruction tuning information
-
Llama 3.2 Vision models use image-reasoning adaptor weights that are trained separately from the core large language model weights. This separation preserves the general knowledge of the model and makes the model more efficient both at pretraining time and run time. The Llama 3.2 Vision models were pretrained on 6 billion image-and-text pairs, which required far fewer compute resources than were needed to pretrain the Llama 3.1 70B foundation model alone. Llama 3.2 models also run efficiently because they can tap more compute resources for image reasoning only when the input requires it.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
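The following sketch sends an image together with a text question in a chat request. It assumes the ibm-watsonx-ai Python SDK, the model ID meta-llama/llama-3-2-11b-vision-instruct, and the image_url message format that the chat API accepts for multimodal models; the image file name is a placeholder.

```python
import base64

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

model = ModelInference(
    model_id="meta-llama/llama-3-2-11b-vision-instruct",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# Encode a local chart image (placeholder file name) as a base64 data URL.
with open("quarterly-revenue.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the trend that is shown in this chart."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }
]

response = model.chat(messages=messages)
print(response["choices"][0]["message"]["content"])
```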
llama-guard-3-11b-vision
The Meta Llama 3.2 collection of foundation models is provided by Meta. The llama-guard-3-11b-vision model is a multimodal evolution of the text-only Llama-Guard-3 model. The model can be used to classify image and text content in user inputs (prompt classification) as safe or unsafe.
- Usage
-
Use the model to check the safety of the image and text in an image-to-text prompt.
- Size
-
- 11 billion parameters
- API pricing tier
-
Class 9 for the multitenant model deployment. For pricing details, see Table 3.
- Availability
-
Provided by IBM deployed on multitenant hardware.
- Try it out
- Token limits
-
Context window length (input + output): 131,072
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192. The tokens that are counted for an image that you submit to the model are not included in the context window length.
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai with text-only inputs. English only when an image is included with the input.
- Instruction tuning information
-
Pretrained model that is fine-tuned for content safety classification. For more information about the types of content that are classified as unsafe, see the model card.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 3.1 base
The Meta Llama 3.1 collection of foundation models is provided by Meta. The Llama 3.1 base foundation models, llama-3-1-8b and llama-3-1-70b, are multilingual models that support tool use and have stronger overall reasoning capabilities.
- Usage
-
Use for long-form text summarization and with multilingual conversational agents or coding assistants.
You can use the following foundation models from the Llama 3.1 model family for fine tuning purposes:
- llama-3-1-8b
- llama-3-1-70b-gptq
- Sizes
-
- 8 billion parameters
- 70 billion parameters
- API pricing tier
-
For pricing details, see Table 5.
- Availability
-
8b and 70b: Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output):
- 8b: 131,072
- 70b: 131,072
- Supported natural languages
-
English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 3.1 Instruct
The Meta Llama 3.1 collection of foundation models is provided by Meta. The Llama 3.1 foundation models are pretrained and instruction tuned text-only generative models that are optimized for multilingual dialogue use cases. The models use supervised fine-tuning and reinforcement learning with human feedback to align with human preferences for helpfulness and safety.
The llama-3-405b-instruct-fp8 model is Meta's largest open-sourced foundation model in the Llama 3.1 family. This foundation model can also be used as a synthetic data generator, post-training data ranking judge, or model teacher/supervisor that can improve specialized capabilities in more inference-friendly, derivative models.
- Usage
- Generates dialog output like a chatbot. Uses a model-specific prompt format.
- Sizes
-
- 8 billion parameters
- 70 billion parameters
- 405 billion parameters
- API pricing tier
- For pricing details on dedicated use, see Table 5.
- Availability
-
- 8b, 70b and 405b: Deploy on demand for dedicated use.
- Try it out
- Token limits
- Context window length (input + output)
-
- 8b: 131,072
- 70b: 131,072
- 405b: 16,384
Although the 405b model supports a context window length of 131,072, the window is limited to 16,384 to reduce the time it takes for the model to generate a response.
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 4,096.
- Supported natural languages
- English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
- Instruction tuning information
- Llama 3.1 was pretrained on 15 trillion tokens of data from publicly available sources. The fine tuning data includes publicly available instruction datasets, as well as over 25 million synthetically generated examples.
- Model architecture
- Decoder-only
- License
- Learn more
- Read the following resources:
Llama 3 Instruct
The Meta Llama 3 family of foundation models is a set of accessible, open large language models that are built with Meta Llama 3 and provided by Meta on Hugging Face. The Llama 3 foundation models are instruction fine-tuned language models that can support various use cases.
- Usage
-
Generates dialog output like a chatbot.
- Sizes
-
- 8 billion parameters
- 70 billion parameters
- API pricing tier
-
For pricing details on dedicated use, see Table 5.
- Availability
-
- Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output)
- 8b: 8,192
- 70b: 8,192
Note: The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 4,096.
- Supported natural languages
-
English
- Instruction tuning information
-
Llama 3 features improvements in post-training procedures that reduce false refusal rates, improve alignment, and increase diversity in the foundation model output. The result is better reasoning, code generation, and instruction-following capabilities. Llama 3 has more training tokens (15T) that result in better language comprehension.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 2 Chat
The Llama 2 Chat models are provided by Meta on Hugging Face. The fine-tuned models are useful for chat generation. The models are pretrained with publicly available online data and fine-tuned using reinforcement learning from human feedback.
You can choose to use the 13 billion parameter or 70 billion parameter version of the model.
- Usage
-
Generates dialog output like a chatbot. Uses a model-specific prompt format; a sketch of the prompt format follows this entry.
- Size
-
- 13 billion parameters
- 70 billion parameters
- API pricing tier
-
For pricing details for dedicated use, see Table 5 and Hourly billing rates for deploy on demand models.
- Availability
-
- 13b: Deploy on demand for dedicated use
- 70b: Deploy on demand for dedicated use
The IBM-provided deployment of this foundation model is deprecated. See Foundation model lifecycle.
- Try it out
-
Experiment with samples:
- Token limits
-
Context window length (input + output)
- 13b: 4,096
- 70b: 4,096
- Supported natural languages
-
English
- Instruction tuning information
-
Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets and more than one million new examples that were annotated by humans.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
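The usage notes mention a model-specific prompt format. As a sketch, the widely documented Llama 2 chat template wraps the system prompt in <<SYS>> tags inside an [INST] block; the deployment ID is a placeholder for your on-demand deployment.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

# Deploy on demand models are addressed by the ID of your deployment.
model = ModelInference(
    deployment_id="YOUR_LLAMA2_DEPLOYMENT_ID",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# Llama 2 chat template: the system prompt sits in <<SYS>> tags inside an [INST] block.
system_prompt = "You are a concise, helpful assistant."
user_message = "Explain what a context window is in one paragraph."
prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

print(model.generate_text(prompt=prompt, params={"max_new_tokens": 300}))
```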
Ministral 3
The ministral-3b-instruct-2512 and ministral-8b-instruct-2512 foundation models are instruction fine-tuned models developed by Mistral AI. The Ministral 3 family is designed for edge deployment, providing flexibility to run models across many different environments. The models are powerful, efficient tiny language models with vision capabilities, ideal for chat and instruction-based use cases.
- Usage
- Suitable for classification, generation, extraction, translation, retrieval-augmented generation, code tasks, function calling, and more.
- Sizes
-
- 3 billion parameters
- 8 billion parameters
- API pricing tier
- For pricing details on dedicated use, see Hourly billing rates for deploy on demand models.
- Availability
-
- 3b and 8b: Deploy on demand for dedicated use.
- Try it out
- Sample prompts
- Token limits
- Context window length (input + output): 256,000
- Supported natural languages
- English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic, and dozens of other languages.
- Supported programming languages
- The Ministral 3 foundation models have been trained on several programming languages.
- Instruction tuning information
- The Ministral 3 foundation models are trained on a large proportion of multilingual and code data.
- Model architecture
- Decoder-only
- License
- For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
- Read the following resources:
ministral-8b-instruct
The ministral-8b-instruct foundation model is an instruction fine-tuned model developed by Mistral AI. The ministral-8b-instruct model is optimized for on-device computing, local intelligence and at-the-edge use cases. The model works well for critical applications that run on edge devices and require privacy-first inferencing.
- Usage
-
Suitable for translation, function calling, and reasoning tasks, including text understanding and transformation, internet-less smart assistants, local analytics, and autonomous robotics.
- Size
-
8 billion parameters
- API pricing tier
-
For pricing details on dedicated use, see Table 5 and Hourly billing rates for deploy on demand models.
Attention: This foundation model has an additional access fee that is applied per hour of use.
- Availability
-
- Deploy on demand for dedicated use
- Try it out
- Token limits
-
Context window length (input + output): 32,768
- Supported natural languages
-
English, French, German, Italian, Spanish, and dozens of other languages.
- Supported programming languages
-
The ministral-8b-instruct model has been trained on several programming languages.
- Instruction tuning information
-
The ministral-8b-instruct foundation model is trained on a large proportion of multilingual and code data.
- Model architecture
-
Decoder-only
- License
-
For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
-
Read the following resources:
mistral-large-2512
The mistral-large-2512 model, also known as Mistral Large 3, is a large language model developed by Mistral AI. The mistral-large-2512 foundation model is a state-of-the-art, open-weight, general-purpose multimodal model with a granular mixture-of-experts architecture. The model is instruction post-trained and designed for reliability and long-context comprehension. It is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows. The model has a large context window, which means you can add large documents as contextual information in prompts that you submit for retrieval-augmented generation (RAG) use cases.
For more getting started information, see the watsonx.ai page on the Mistral AI website.
- Usage
-
Suitable for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Due to the model's large context window, use the max tokens parameter to specify a token limit when prompting the model; a sketch follows this entry.
- API pricing tier
-
- Input tier: Class 1
- Output tier: Class 2
For pricing details on multitenant model deployment, see Table 3. For pricing details on dedicated use, see Table 5 and Hourly billing rates for deploy on demand models.
- Availability
-
- Provided by IBM deployed on multitenant hardware
- Deploy on demand for dedicated use
- Try it out
- Token limits
-
Context window length (input + output): 256,000
- Supported natural languages
-
English, French, German, Italian, Spanish, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, and dozens of other languages.
- Supported programming languages
-
The mistral-large-2512 model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.
- Instruction tuning information
-
The mistral-large-2512 foundation model is pre-trained on diverse datasets like text, codebases, and mathematical data from various domains.
- Model architecture
-
Decoder-only
- License
-
For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
- Read the following resources:
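The usage notes recommend setting a token limit when you prompt this long-context model. The following sketch passes a retrieved document as grounding context and caps the output with max_new_tokens; it assumes the ibm-watsonx-ai Python SDK, and the model ID and document file are placeholders that you verify in the Resource hub.

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

model = ModelInference(
    model_id="mistralai/mistral-large-2512",  # assumed ID of the multitenant deployment
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# A retrieved document (placeholder file) is pasted into the prompt as grounding context.
with open("product-faq.txt", encoding="utf-8") as f:
    document = f.read()

prompt = (
    "Answer the question by using only the document below.\n\n"
    f"Document:\n{document}\n\n"
    "Question: What is the refund policy?"
)

# Cap the generated output even though the context window is 256,000 tokens.
print(model.generate_text(prompt=prompt, params={"max_new_tokens": 400}))
```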
mistral-large
Mistral Large 2, also known as Mistral-Large-Instruct-2407, is a family of large language models developed by Mistral AI. The mistral-large foundation model is fluent in and understands the grammar and cultural context of English, French, Spanish, German, and Italian. The foundation model can also understand dozens of other languages. The model has a large context window, which means you can add large documents as contextual information in prompts that you submit for retrieval-augmented generation (RAG) use cases. The mistral-large foundation model is effective at programmatic tasks, such as generating, reviewing, and commenting on code, and at function calling, and it can generate results in JSON format; a function-calling sketch follows this entry.
For more getting started information, see the watsonx.ai page on the Mistral AI website.
- Usage
-
Suitable for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Due to the model's large context window, use the max tokens parameter to specify a token limit when prompting the model.
- API pricing tier
-
For pricing details of deploy on demand models, see Table 5.
Attention: This foundation model has an additional access fee that is applied per hour of use.
- Availability
-
- Deploy on demand for dedicated use
This model appears in Resource hub as follows:
- Provided by IBM deployed on multitenant hardware: mistral-large
- Deploy on demand for dedicated use: mistral-large-instruct-2407
- Try it out
- Token limits
-
Context window length (input + output): 131,072
Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.
- Supported natural languages
-
English, French, German, Italian, Spanish, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, and dozens of other languages.
- Supported programming languages
-
The mistral-large model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.
- Instruction tuning information
-
The mistral-large foundation model is pre-trained on diverse datasets like text, codebases, and mathematical data from various domains.
- Model architecture
-
Decoder-only
- License
-
For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
- Read the following resources:
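The model description notes support for function calling and JSON output. The following sketch assumes the ibm-watsonx-ai Python SDK chat method with an OpenAI-style tool definition; the get_weather tool and its schema are hypothetical.

```python
import json

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

model = ModelInference(
    model_id="mistralai/mistral-large",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# Hypothetical tool definition in the OpenAI-style schema that the chat API accepts.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = model.chat(
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
)

# If the model decides to call the tool, the arguments arrive as a JSON string.
message = response["choices"][0]["message"]
for call in message.get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```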
mistral-large-instruct-2411
The mistral-large-instruct-2411 foundation model is from Mistral AI and belongs to the Mistral Large 2 family of models. The model specializes in reasoning, knowledge, and coding. The model extends the capabilities of the Mistral-Large-Instruct-2407 foundation model to include better handling of long prompt contexts, system prompt instructions, and function calling requests.
- Usage
-
The mistral-large-instruct-2411 foundation model is multilingual, proficient in coding, agent-centric, and adheres to system prompts to aid in retrieval-augmented generation tasks and other use cases where prompts with large context need to be handled.
- Size
-
123 billion parameters
- API pricing tier
-
For pricing details of deploy on demand models, see Table 5.
Attention: This foundation model has an additional access fee that is applied per hour of use.
- Availability
-
Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 131,072
- Supported natural languages
-
Multiple languages and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
- Supported programming languages
-
The mistral-large-instruct-2411 foundation model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.
- Instruction tuning information
-
The mistral-large-instruct-2411 foundation model extends the Mistral-Large-Instruct-2407 foundation model from Mistral AI. Training enhanced the reasoning capabilities of the model. Training also focused on reducing hallucinations by fine tuning the model to be more cautious and discerning in its responses and to acknowledge when it cannot find solutions or does not have sufficient information to provide a confident answer.
- License
-
For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
- Read the following resources:
mistral-medium-2505
Mistral Medium 3 is a family of medium language models developed by Mistral AI. The Mistral Medium models have a large context window, which means you can add large documents as contextual information in prompts that you submit for retrieval-augmented generation (RAG) use cases.
The mistral-medium-2505 foundation model is fluent in and understands the grammar and cultural context of many languages. The mistral-medium-2505 model can process visual inputs and is effective at programming, mathematical reasoning, document understanding and dialogue.
- Usage
-
Suitable for complex multilingual reasoning tasks, long document understanding, code generation, function calling, and agentic workflows.
- API pricing tier
-
For pricing details of the multitenant model deployment, see Table 3.
Pricing for inferencing the provided Mistral Medium model is not assigned by a multiplier. The following special pricing tiers are used:
- Input tier: Mistral Large Input
- Output tier: Mistral Large
For pricing details on dedicated use, see Table 5 and Hourly billing rates for deploy on demand models.
- Availability
-
- 2505: Provided by IBM deployed on multitenant hardware and deploy on demand for dedicated use.
- 2508: Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 131,072
Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.
- Supported natural languages
-
Arabic, Farsi, Urdu, Hebrew, Turkish, Indonesian, Lao, Malay, Thai, Tagalog, Vietnamese, Hindi, Bengali, Gujarati, Kannada, Marathi, Nepali, Punjabi, Tamil, Telugu, Breton, Catalan, Czech, Danish, Greek, Finnish, Croatian, Dutch, Norwegian, Polish, Romanian, Swedish, Serbian, Ukrainian, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, and Chinese.
- Supported programming languages
-
The mistral-medium model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.
- Instruction tuning information
-
The mistral-medium foundation model is pre-trained on diverse datasets like text, codebases, and mathematical data from various domains.
- Model architecture
-
Decoder-only
- License
-
For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
- Read the following resources:
mistral-nemo-instruct-2407
The mistral-nemo-instruct-2407 foundation model from Mistral AI was built in collaboration with NVIDIA. Mistral NeMo performs exceptionally well in reasoning, world knowledge, and coding accuracy, especially for a model of its size.
- Usage
- The Mistral NeMo model is multilingual and is trained on function calling.
- Size
- 12 billion parameters
- API pricing tier
- For pricing details of deploy on demand models, see Table 5.
- Availability
- Deploy on demand for dedicated use.
- Token limits
- Context window length (input + output): 131,072
- Supported natural languages
- Multiple languages and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
- Supported programming languages
- The Mistral NeMo model has been trained on several programming languages.
- Instruction tuning information
- Mistral NeMo had an advanced fine-tuning and alignment phase.
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
mistral-small-3-2-24b-instruct-2506
The Mistral Small 3.2 foundation model builds upon Mistral Small 3.1, developed by Mistral AI. The mistral-small-3-2-24b-instruct-2506 model improves instruction following and function calling, and produces fewer repetitive or infinite outputs. The model is instruction fine-tuned and comes with improved text performance, instruction following, conversational assistance, image understanding, multimodal understanding, and advanced reasoning. It is built to support agentic applications, with adherence to system prompts and function calling with JSON output generation.
For more information to get started, see the watsonx.ai page on the Mistral AI website.
- Usage
- Suitable for conversational agents and function calling.
- Size
- 24 billion parameters
- API pricing tier
- For pricing details on dedicated use, see Table 5 and Hourly billing rates for deploy on demand models.
- Availability
-
- Deploy on demand for dedicated use.
- Try it out
- Sample prompts
- Token limits
- Context window length (input + output): 131,072
- Supported natural languages
- English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi and many other languages.
- Supported programming languages
- The mistral-small-3-2-24b-instruct-2506 model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.
- Instruction tuning information
- The mistral-small-3-2-24b-instruct-2506 foundation model is pre-trained on diverse datasets like text, codebases, and mathematical data from various domains.
- Model architecture
- Decoder-only
- License
- Apache 2.0 license
- Learn more
mistral-small-3-1-24b-instruct-2503
The Mistral Small 3.1 foundation model builds upon Mistral Small 3, developed by Mistral AI, enhancing vision understanding and long context capabilities without compromising text performance. The mistral-small-3-1-24b-instruct-2503 model is instruction fine-tuned and comes with improved text performance, instruction following, conversational assistance, image understanding, multimodal understanding, and advanced reasoning. It is built to support agentic applications, with adherence to system prompts and function calling with JSON output generation.
For more getting started information, see the watsonx.ai page on the Mistral AI website.
- Usage
-
Suitable for conversational agents and function calling.
- API pricing tier
-
For pricing details on the multitenant model deployment, see Table 3. For pricing details of the deploy on demand model, see Table 5 and Hourly billing rates for deploy on demand models.
- Availability
-
- Provided by IBM deployed on multitenant hardware.
- Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 131,072
Note:
- The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.
- Supported natural languages
-
English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi and dozens of other languages.
- Supported programming languages
-
The mistral-small-3-1-24b-instruct-2503 model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.
- Instruction tuning information
-
The mistral-small-3-1-24b-instruct-2503 foundation model is pre-trained on diverse datasets like text, codebases, and mathematical data from various domains.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
mixtral-8x7b-base
The mixtral-8x7b-base foundation model is provided by Mistral AI. The mixtral-8x7b-base foundation model is a generative sparse mixture-of-experts network that groups the model parameters, and then for each token chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but only uses 13 billion active parameters for inferencing, which reduces costs and latency.
- Usage
-
Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation.
- Size
-
46.7 billion parameters
- API pricing tier
-
For pricing details of deploy on demand models, see Table 5.
- Availability
-
Deploy on demand for dedicated use.
- Token limits
-
Context window length (input + output): 32,768
Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.
- Supported natural languages
-
English, French, German, Italian, Spanish
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
mixtral-8x7b-instruct-v01
The mixtral-8x7b-instruct-v01 foundation model is provided by Mistral AI. The mixtral-8x7b-instruct-v01 foundation model is a pretrained generative sparse mixture-of-experts network that groups the model parameters, and then for each token chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but only uses 13 billion active parameters for inferencing, which reduces costs and latency.
- Usage
-
Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation. Due to the model's unusually large context window, use the max tokens parameter to specify a token limit when prompting the model.
- Size
-
46.7 billion parameters
- API pricing tier
-
For pricing details of deploy on demand models, see Table 5.
- Availability
-
Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 32,768
Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.
- Supported natural languages
-
English, French, German, Italian, Spanish
- Instruction tuning information
-
The Mixtral foundation model is pretrained on internet data. The Mixtral 8x7B Instruct foundation model is fine-tuned to follow instructions.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
mt0-xxl-13b
The mt0-xxl-13b model is provided by BigScience on Hugging Face. The model is optimized to support language generation and translation tasks with English, languages other than English, and multilingual prompts.
- Usage
- General use with zero- or few-shot prompts. For translation tasks, include a period to indicate the end of the text that you want translated, or the model might continue the sentence rather than translate it. A translation sketch follows this entry.
- Size
- 13 billion parameters
- API pricing tier
- For pricing details of deploy on demand models, see Table 5.
- Availability
-
- Deploy on demand for dedicated use.
- Try it out
- Experiment with the following samples:
- Token limits
- Context window length (input + output): 4,096
- Supported natural languages
- Multilingual. The model is pretrained on multilingual data in 108 languages and fine-tuned with multilingual data in 46 languages to perform multilingual tasks.
- Instruction tuning information
- BigScience publishes details about its code and datasets.
- Model architecture
- Encoder-decoder
- License
- Apache 2.0 license
- Learn more
- Read the following resources:
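As the usage guidance notes, translation prompts should end with a period so that the model translates the text instead of continuing it. A minimal sketch, assuming the ibm-watsonx-ai Python SDK and a placeholder deployment ID for an on-demand deployment:

```python
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key="YOUR_IBM_CLOUD_API_KEY",
)

# Deploy on demand models are addressed by the ID of your deployment.
model = ModelInference(
    deployment_id="YOUR_MT0_DEPLOYMENT_ID",
    credentials=credentials,
    project_id="YOUR_PROJECT_ID",
)

# The trailing period marks the end of the text to translate.
prompt = "Translate to French: The meeting starts at nine tomorrow morning."
print(model.generate_text(prompt=prompt, params={"max_new_tokens": 100}))
```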
pixtral-12b
Pixtral 12B is a multimodal model developed by Mistral AI. The pixtral-12b foundation model is trained to understand both natural images and documents and is able to ingest images at their natural resolution and aspect ratio, providing flexibility on the number of tokens used to process an image. The foundation model supports multiple images in its long context window. The model is effective in image-in, text-out multimodal tasks and excels at instruction following.
- Usage
-
Chart and figure understanding, document question answering, multimodal reasoning, and instruction following.
- Size
-
12 billion parameters
- API pricing tier
-
- For pricing details of deploy on demand models, see Hourly billing rates for deploy on demand models.
- Availability
-
- Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 128,000
The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192.
- Supported natural languages
-
English
- Instruction tuning information
-
The pixtral-12b model is trained with interleaved image and text data and is based on the Mistral Nemo model with a 400 million parameter vision encoder trained from scratch.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
poro-34b-chat
Poro 34b chat is designed to support chat use cases, with training to follow instructions in both Finnish and English. Poro is developed by Silo AI in collaboration with TurkuNLP and High Performance Language Technologies (HPLT).
- Usage
-
Use the model to generate dialog output like a chatbot.
- Size
-
34 billion parameters
- API pricing tier
-
For pricing details of deploy on demand models, see Table 5.
- Availability
-
Deploy on demand for dedicated use.
- Try it out
- Token limits
-
Context window length (input + output): 2,048
- Supported natural languages
-
English, Finnish
- Instruction tuning information
-
Poro-34b-Chat is developed through supervised fine-tuning (SFT) of the base Poro-34b model using instruction-following datasets in both English and Finnish.
- Model architecture
-
Decoder
- License
-
For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.
- Learn more
- Read the following resources: