Third-party foundation models

You can choose from a collection of third-party foundation models in IBM watsonx.ai.

For details about IBM foundation models, see IBM foundation models.

How to choose a model

To review factors that can help you to choose a model, such as supported tasks and languages, see Choosing a model and Foundation model benchmarks.

A deprecated foundation model is highlighted with a deprecated warning icon . For details about model deprecation and withdrawal, see Foundation model lifecycle.

Foundation model details

The foundation models in watsonx.ai support a range of use cases for both natural languages and programming languages. To see the types of tasks that these models can do, review and try the sample prompts.

allam-1-13b-instruct

The allam‑1‑13b‑instruct foundation model is a bilingual large language model for Arabic and English provided by the National Center for Artificial Intelligence. The model is supported by the Saudi Authority for Data and Artificial Intelligence and is fine‑tuned to support conversational tasks. The ALLaM series is a collection of powerful language models that are designed to advance Arabic language technology. These models are initialized with Llama-2 weights and undergo training on both Arabic and English languages.

Note:

When you inference this model from the Prompt Lab, disable AI guardrails.

Usage

Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.

Size

13 billion parameters

Try it out

Experiment with samples:

Token limits

Context window length (input + output): 4,096

Supported natural languages

Arabic (Modern Standard Arabic) and English

Instruction tuning information

allam-1-13b-instruct is based on the Allam-13b-base model, which is a foundation model that is pretrained on a total of 3 trillion tokens in English and Arabic, including the tokens seen from its initialization. The Arabic dataset contains 500 billion tokens after cleaning and deduplication. The additional data is collected from open source collections and web crawls. The allam-1-13b-instruct foundation model is fine-tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.

Model architecture

Decoder-only

License

Llama 2 community license and ALLaM license

codestral-2501

The codestral-2501 foundation model is a state-of-the-art coding model developed by Mistral AI. The model is based off the original codestral-22b model and has a more efficient architecture and an improved tokenizer. The codestral-2501 model performs code generation and code completion tasks approximately twice as fast as the original model.

Usage

The codestral-2501 is optimized for low-latency, high-frequency use cases, and supports tasks such as fill-in-the-middle (FIM), code correction, and generating test cases.

Size

22 billion parameters

Try it out

Sample prompt: Code

Token limits

Context window length (input + output): 256,000 Note:

The maximum new tokens, which means the tokens that are generated by the foundation model per request, is limited to 8,192.

Supported natural languages

English

Instruction tuning information

The codestral-2501 model is proficient in over 80 programming languages, including popular languages such as Python, Java, C, C++, JavaScript, and Bash. The model also works well with more specialized languages like Swift and Fortran.

Model architecture

Decoder

Learn more

Read the following resources:

Blog post for Codestral 25.01

gpt-oss models

The gpt-oss foundation models, gpt-oss-20b and gpt-oss-120b, are OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and various developer use cases. The models are designed for production, general‑purpose use, and high‑level reasoning and can be fine-tuned for various specialized use cases.

You can set the reasoning level in the system prompts that suit your task across three levels:

Low: Fast responses for general dialogue.
Medium: Balanced speed and detail.
High: Deep and detailed analysis.

Usage

Supports Q&A, summarization, classification, generation, extraction, translation, function calling, and code generation and conversion.

Size

20 billion parameters
120 billion parameters

Token limits

Context window length (input + output):

gpt-oss-20b: 131,072
gpt-oss-120b: 131,072

Supported natural languages

English. The gpt-oss-120b foundation model supports multilingual understanding.

Instruction tuning information

Pretrained on a mostly English, text-only dataset, with a focus on STEM, coding, and general knowledge.

Model architecture

Decoder

License

Apache 2.0 license

Learn more

Read the following resources:

OpenAI blog

Llama 4 Instruct models

Usage

Generates multilingual dialog output like a chatbot, uses a model-specific prompt format, optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

Size

17 billion parameters

Try it out

Token limits

Context window length (input + output):

llama-4-maverick-17b-128e-instruct-fp8 only: 131,072
llama-4-maverick-17b-128e-instruct-int4: 131,072

The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192.

Supported natural languages

Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.

Instruction tuning information

Llama 4 was pretrained on a broader collection of 200 languages. The Llama 4 Scout model was pretrained on approximately 40 trillion tokens and the Llama 4 Maverick model was pretrained on approximately 22 trillion tokens of multimodal data from publicly available and licensed information from Meta.

Model architecture

Decoder-only

License

Meta Llama 4 Community License

Learn more

Read the following resources:

Meta AI blog

Llama 3.3 70B Instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model (text in/text out) with 70 billion parameters.

The llama-3-3-70b-instruct is a revision of the popular Llama 3.1 70B Instruct foundation model. The Llama 3.3 foundation model is better at coding, step-by-step reasoning, and tool-calling. Despite its smaller size, the Llama 3.3 model's performance is similar to that of the Llama 3.1 405b model, making it a great choice for developers.

Usage

Generates multilingual dialog output like a chatbot. Uses a model-specific prompt format.

Size

70 billion parameters

Try it out

Experiment with samples:

Token limits

Context window length (input + output): 131,072

Supported natural languages

English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Instruction tuning information

Llama 3.3 was pretrained on 15 trillion tokens of data from publicly available sources. The fine tuning data includes publicly available instruction datasets, as well as over 25 million synthetically generated examples.

Model architecture

Decoder-only

License

Meta Llama 3.3 Community License

Learn more

Read the following resources:

Llama 3.2 Vision Instruct

The Meta Llama 3.2 collection of foundation models is provided by Meta. The llama-3-2-11b-vision-instruct and llama-3-2-90b-vision-instruct models are built for image-in, text-out use cases such as document-level understanding, interpretation of charts and graphs, and captioning of images.

Usage

Generates dialog output like a chatbot and can perform computer vision tasks including classification, object detection and identification, image-to-text transcription (including handwriting), contextual Q&A, data extraction and processing, image comparison, and personal visual assistance. Uses a model-specific prompt format.

Sizes

11 billion parameters
90 billion parameters

Try it out

Token limits

Context window length (input + output)

llama-3-2-11b-vision-instruct : 131,072
llama-3-2-90b-vision-instruct: 131,072

The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192. The tokens that are counted for an image that you submit to the model are not included in the context window length.

Supported natural languages

English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai with text-only inputs. English only when an image is included with the input.

Instruction tuning information

Llama 3.2 Vision models use image-reasoning adaptor weights that are trained separately from the core large language model weights. This separation preserves the general knowledge of the model and makes the model more efficient both at pretraining time and run time. The Llama 3.2 Vision models were pretrained on 6 billion image-and-text pairs, which required far fewer compute resources than were needed to pretrain the Llama 3.1 70B foundation model alone. Llama 3.2 models also run efficiently because they can tap more compute resources for image reasoning only when required by the input.

Model architecture

Decoder-only

License

Meta Llama 3.2 Community License

Learn more

Read the following resources:

llama-guard-3-11b-vision

The Meta Llama 3.2 collection of foundation models is provided by Meta. The llama-guard-3-11b-vision is a multimodal evolution of the text-only Llama-Guard-3 model. The model can be used to classify image and text content in user inputs (prompt classification) as safe or unsafe.

Usage

Use the model to check the safety of the image and text in an image-to-text prompt.

Size

11 billion parameters

Try it out

Token limits

Context window length (input + output): 131,072

Supported natural languages

English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai with text-only inputs. English only when an image is included with the input.

Instruction tuning information

Pretrained model that is fine-tuned for content safety classification. For more information about the types of content that are classified as unsafe, see the model card.

Model architecture

Decoder-only

License

Meta Llama 3.2 Community License

Learn more

Read the following resources:

Ministral 3

Usage: Suitable for classification, generation, extraction, translation, retrieval augmented generation, code, function calling, and more.

Size

Try it out

Sample prompts

Token limits

Context window length (input + output):

Supported natural languages

English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic, and dozens of other languages.

Supported programming languages

The Ministral 3 foundation models have been trained on several programming languages.

Instruction tuning information

The Ministral 3 foundation models are trained on a large proportion of multilingual and code data.

Model architecture

Decoder-only

Learn more

Read the following resources:

Blog post for Ministral 3

ministral-8b-instruct

The ministral-8b-instruct foundation model is an instruction fine-tuned model developed by Mistral AI. The ministral-8b-instruct model is optimized for on-device computing, local intelligence, and at-the-edge use cases. The model works well for critical applications that run on edge devices and require privacy-first inferencing.

Usage: Suitable for translation, function-calling, reasoning tasks, including text understanding and transformation, internet-less smart assistants, local analytics, and autonomous robotics.
Size: 8 billion parameters
Try it out: Sample prompts

Token limits

Supported natural languages

English, French, German, Italian, Spanish, and dozens of other languages.

Supported programming languages

The ministral-8b-instruct model has been trained on several programming languages.

Instruction tuning information

The ministral-8b-instruct foundation model is trained on a large proportion of multilingual and code data.

Model architecture

Decoder-only

Learn more

Read the following resources:

Blog post for Ministral 8b

mistral-large-2512

The mistral-large-2512, also known as Mistral Large 3, is a large language model developed by Mistral AI. The mistral-large-2512 foundation model is a state-of-the-art, open-weight, general-purpose multimodal model with a granular mixture-of-experts architecture. The model is instruction post-trained and designed for reliability and long-context comprehension. The Mistral Large 3 foundation models are engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows. The mistral-large-2512 has a large context window, which means you can add large documents as contextual information in prompts that you submit for retrieval-augmented generation (RAG) use cases.

For more getting started information, see the watsonx.ai page on the Mistral AI website.

Usage

Suitable for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Ideal for chat, agentic, and instruction based use cases. Due to the model's large context window, use the max tokens parameter to specify a token limit when prompting the model.

Try it out

Sample prompts

Token limits

Context window length (input + output): 256,000

Supported natural languages

English, French, German, Italian, Spanish, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, and dozens of other languages.

Supported programming languages

The mistral-large-2512 model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.

Instruction tuning information

The mistral-large-2512 foundation model is pretrained on diverse datasets like text, codebases, and mathematical data from various domains.

Model architecture

Decoder-only

Learn more

Read the following resources:

Blog post for Mistral Large 3

mistral-large

Mistral Large 2, also known as the Mistral-Large-Instruct-2407, is a family of large language models developed by Mistral AI. The mistral-large foundation model is fluent in and understands the grammar and cultural context of English, French, Spanish, German, and Italian. The foundation model can also understand dozens of other languages. The model has a large context window, which means you can add large documents as contextual information in prompts that you submit for retrieval-augmented generation (RAG) use cases. The mistral-large foundation model is effective at programmatic tasks, such as generating, reviewing, and commenting on code, function calling, and generating results in JSON format.

For more getting started information, see the watsonx.ai page on the Mistral AI website.

Usage

Suitable for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Due to the model's large context window, use the max tokens parameter to specify a token limit when prompting the model.

Try it out

Token limits

Context window length (input + output): 131,072

Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.

Supported natural languages

English, French, German, Italian, Spanish, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, and dozens of other languages.

Supported programming languages

The mistral-large model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.

Instruction tuning information

The mistral-large foundation model is pretrained on diverse datasets like text, codebases, and mathematical data from various domains.

Model architecture

Decoder-only

Learn more

Read the following resources:

mistral-large-instruct-2411

The mistral-large-instruct-2411 foundation model from Mistral AI belongs to the Mistral Large 2 family of models. The model specializes in reasoning, knowledge, and coding. The model extends the capabilities of the Mistral-Large-Instruct-2407 foundation model to include better handling of long prompt contexts, system prompt instructions, and function calling requests.

Usage

The mistral-large-instruct-2411 foundation model is multilingual, proficient in coding, agent-centric, and adheres to system prompts to aid in retrieval-augmented generation tasks and other use cases where prompts with large context need to be handled.

Size

123 billion parameters

Try it out

Sample prompts

Token limits

Context window length (input + output): 131,072

Supported natural languages

Multiple languages and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

Supported programming languages

The mistral-large-instruct-2411 foundation model is trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.

Instruction tuning information

The mistral-large-instruct-2411 foundation model extends the Mistral-Large-Instruct-2407 foundation model from Mistral AI. Training enhanced the reasoning capabilities of the model. Training also focused on reducing hallucinations by fine tuning the model to be more cautious and discerning in its responses and to acknowledge when it cannot find solutions or does not have sufficient information to provide a confident answer.

Learn more

Read the following resources:

Blog post from Mistral AI

mistral-small-3-2-24b-instruct-2506

The Mistral Small 3.2 foundation model builds upon Mistral Small 3.1 developed by Mistral AI. The mistral-small-3-2-24b-instruct-2506 improves instruction following, function calling and produces fewer repetitive or infinite outputs. The model is instruction fine-tuned and comes with improved text performance, instruction following, conversational assistance, image understanding, multimodal understanding, and advanced reasoning. Built to support agentic application, with adherence to system prompts and function calling with JSON output generation.

For more information to get started, see the watsonx.ai page on the Mistral AI website.

Usage

Suitable for conversational agents and function calling.

Size

24 billion parameters

Try it out

Sample prompts

Token limits

Context window length (input + output): 131,072

Supported natural languages

English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi, and many other languages.

Supported programming languages

The mistral-small-3-2-24b-instruct-2506 model is trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.

Instruction tuning information

The mistral-small-3-2-24b-instruct-2506 foundation model is pretrained on diverse datasets like text, codebases, and mathematical data from various domains.

Model architecture

Decoder-only

License

Apache 2.0 license

Learn more

Research paper

mistral-small-3-1-24b-instruct-2503

The Mistral Small 3.1 foundation model builds upon Mistral Small 3 developed by Mistral AI, enhancing vision understanding and long context capabilities without compromising text performance. The mistral-small-3-1-24b-instruct-2503 model is instruction fine-tuned and comes with improved text performance, instruction following, conversational assistance, image understanding, multimodal understanding, and advanced reasoning. The model is built to support agentic application, with adherence to system prompts and function calling with JSON output generation.

For more getting started information, see the watsonx.ai page on the Mistral AI website.

Usage

Suitable for conversational agents and function calling.

Try it out

Sample prompts

Token limits

Context window length (input + output): 131,072

Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.

Supported natural languages

Supported programming languages

The mistral-small-3-1-24b-instruct-2503 model is trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.

Instruction tuning information

The mistral-small-3-1-24b-instruct-2503 foundation model is pretrained on diverse datasets like text, codebases, and mathematical data from various domains.

Model architecture

Decoder-only

License

Apache 2.0 license

Learn more

Read the following resources:

mixtral-8x7b-instruct-v01

The mixtral-8x7b-instruct-v01 foundation model is provided by Mistral AI. The mixtral-8x7b-instruct-v01 foundation model is a pretrained generative sparse mixture-of-experts network that groups the model parameters, and then for each token, chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but uses only 13 billion active parameters for inferencing, which reduces costs and latency.

Usage

Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation. Due to the model's unusually large context window, use the max tokens parameter to specify a token limit when prompting the model.

Size

46.7 billion parameters

Try it out

Sample prompts

Token limits

Context window length (input + output): 32,768

Note: The maximum new tokens, which means the tokens generated by the foundation model per request, is limited to 16,384.

Supported natural languages

English, French, German, Italian, and Spanish.

Instruction tuning information

The Mixtral foundation model is pretrained on internet data. The Mixtral 8x7B Instruct foundation model is fine-tuned to follow instructions.

Model architecture

Decoder-only

License

Apache 2.0 license

Learn more

Read the following resources:

pixtral-12b

Pixtral 12B is a multimodal model developed by Mistral AI. The pixtral-12b foundation model is trained to understand both natural images and documents. The model is able to analyze images at their natural resolution and aspect ratio, providing flexibility on the number of tokens used to process an image. The foundation model supports multiple images in its long context window. The model is effective in image-in, text-out multimodal tasks and excels at instruction following.

Usage

Chart and figure understanding, document question answering, multimodal reasoning, and instruction following.

Size

12 billion parameters

Try it out

Chatting with documents and images

Token limits

Context window length (input + output): 128,000

The maximum new tokens, which means the tokens generated by the foundation models per request, is limited to 8,192.

Supported natural languages

English

Instruction tuning information

The pixtral-12b model is trained with interleaved image and text data and is based on the Mistral Nemo model with a 400 million parameter vision encoder trained from scratch.

Model architecture

Decoder-only

License

Apache 2.0 license

Learn more

Read the following resources: