7 April 2025
IBM is excited to announce the addition of Meta’s latest generation of open models, Llama 4, to watsonx.ai. Llama 4 Scout and Llama 4 Maverick, the first mixture of experts (MoE) models released by Meta, provide frontier multimodal performance, high speeds, low cost, and industry-leading context length.
The release of Llama 4 initiates a new era for the Llama series, introducing both an exciting evolution of the Llama architecture and an innovative approach to integrating different data modalities (text, image and video) much earlier in the process than conventionally trained models. Both new models support a wide variety of text-in, text-out and image-in, text-out use cases.
With the introduction of these latest offerings from Meta, IBM now supports a total of 13 Meta models in the expansive library of foundation models available in watsonx.ai. In keeping with IBM’s open, multi-model strategy for generative AI, we continue to provide our platform customers with the most performant open models on the market today.
The mixture of experts (MoE) architecture aims to balance the knowledge capacity of larger models with the inference efficiency of smaller models by subdividing the layers of the model’s neural network into multiple “experts.” Rather than activating every model parameter for each token, MoE models use a gating function that activates only the “experts” best suited to processing that token.
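To make the gating idea concrete, here is a minimal, generic sketch of top-k expert routing in plain Python. It is not Meta's implementation: the gate weights, the toy "experts" and the function names (`route_token`, `moe_layer`) are all hypothetical and chosen purely for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, gate_weights, top_k=1):
    """Score every expert for one token and keep the top_k best.

    token: list of floats (the token's hidden vector)
    gate_weights: one weight vector per expert (hypothetical learned gate)
    Returns a list of (expert_index, normalized_weight) pairs.
    """
    scores = [sum(w * x for w, x in zip(weights, token)) for weights in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, gate_weights, experts, top_k=1):
    """Run only the selected experts and mix their outputs by gate weight."""
    output = [0.0] * len(token)
    for index, weight in route_token(token, gate_weights, top_k):
        expert_out = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output


# Toy setup (assumption): 16 experts and a hidden size of 4.
random.seed(0)
NUM_EXPERTS, HIDDEN = 16, 4
gate_weights = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
# Each toy "expert" simply scales its input by a different factor.
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
routing = route_token(token, gate_weights, top_k=1)
result = moe_layer(token, gate_weights, experts, top_k=1)
print("selected expert(s):", routing)
```

With `top_k=1`, only one of the 16 toy experts runs for the token; the other 15 are never evaluated, which is where the inference savings come from.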
Llama 4 Scout, the smaller of the two new models with a total parameter count of 109B, is divided into 16 experts. At inference, it has an active parameter count of only 17B, enabling it to serve more users in parallel. Trained on 40 trillion tokens of data, Llama 4 Scout offers performance rivalling or exceeding that of models with significantly larger active parameter counts while keeping costs and latency low. Despite those lean compute requirements, Llama 4 Scout beats comparable models on coding, reasoning, long context and image understanding benchmarks.
Llama 4 Maverick is divided into 128 experts, drawing from the knowledge of its 400B total parameters while maintaining the same 17B active parameter count as Llama 4 Scout. Per Meta AI’s official announcement, Llama 4 Maverick beats OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash “across the board” on a wide range of multimodal benchmarks and rivals the much larger DeepSeek-V3 on reasoning and coding tasks.
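The parameter counts above can be turned into a quick back-of-the-envelope comparison. The short sketch below uses only the figures quoted in this post; the dictionary layout and the helper name `active_fraction` are illustrative assumptions.

```python
# Figures quoted in this post, in billions of parameters.
MODELS = {
    "Llama 4 Scout": {"total_b": 109, "active_b": 17, "experts": 16},
    "Llama 4 Maverick": {"total_b": 400, "active_b": 17, "experts": 128},
}


def active_fraction(total_b, active_b):
    """Share of the model's parameters that are active for a single token."""
    return active_b / total_b


for name, spec in MODELS.items():
    frac = active_fraction(spec["total_b"], spec["active_b"])
    print(
        f"{name}: {spec['active_b']}B of {spec['total_b']}B parameters "
        f"active per token across {spec['experts']} experts ({frac:.1%})"
    )
```

Roughly one sixth of Scout's parameters and about one twenty-fourth of Maverick's are active for any given token, which is why both models can share the same 17B inference footprint despite very different total sizes.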
In addition, Llama 4 Scout offers an industry-best context window of 10 million tokens while preserving excellent accuracy on long-context benchmarks such as Needle-in-a-haystack (NiH). This unprecedented leap forward opens up exciting opportunities for multi-document summarization, reasoning over vast codebases, and personalization through an extensive memory of user activity.
As Meta’s announcement explains, this massive expansion in context length comes primarily from two innovations: the use of interleaved attention layers without positional embeddings and inference-time temperature scaling of the models’ attention mechanism. This novel architecture, which Meta calls “iRoPE,” represents an important step toward Meta’s long-term goal of supporting “infinite” context length.
Whereas large language models (LLMs) are conventionally pre-trained exclusively on text data, then adapted to other data modalities (such as image data) afterwards during post-training, Llama 4 models are designed with “native multimodality.” This allowed Meta to jointly pre-train the models with large quantities of unlabeled text, image and video data all at once, efficiently enriching the models with integrated knowledge from diverse sources.
Training of the Llama 4 models incorporated the “fusion” of different types of data early in the processing pipeline, seamlessly integrating text and vision tokens to enable them to train as a single unified system. Consequently, Llama 4 Maverick and Llama 4 Scout offer excellent performance on an array of image understanding tasks: they can both address text prompts pertaining to multiple images at once and anchor model responses to specific regions within a single image.
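As a rough illustration of what early fusion means in practice, the sketch below places text tokens and image-patch tokens in one shared sequence before any further processing. Everything here is a toy assumption: the `Token` class, the embedding functions and the 4-wide vectors are hypothetical stand-ins, not Meta's tokenizer or vision encoder.

```python
from dataclasses import dataclass
from typing import List

DIM = 4  # toy shared embedding width (assumption)


@dataclass
class Token:
    modality: str        # "text" or "image"
    vector: List[float]  # embedding in the shared model dimension


def embed_text(words: List[str]) -> List[Token]:
    """Toy text embedder: one vector per word, derived from word length."""
    return [Token("text", [float(len(w))] * DIM) for w in words]


def embed_image(patches: List[List[float]]) -> List[Token]:
    """Toy vision encoder: average each patch and broadcast to DIM values."""
    return [Token("image", [sum(p) / len(p)] * DIM) for p in patches]


def early_fusion(segments: List[List[Token]]) -> List[Token]:
    """Concatenate per-modality tokens into one sequence for a single backbone."""
    return [tok for seg in segments for tok in seg]


sequence = early_fusion([
    embed_text(["Describe", "this", "image"]),
    embed_image([[0.1, 0.3], [0.8, 0.6]]),
    embed_text(["briefly"]),
])
modalities = [tok.modality for tok in sequence]
print(modalities)
```

Because every token, whatever its origin, ends up in the same sequence with the same width, a single model can attend across text and image content from its first layer onward instead of bolting on vision after text-only pre-training.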
Developers and businesses can select their preferred Llama 4 model from the extensive catalog of foundation models on IBM watsonx.ai, then fine-tune, distill and deploy it across cloud, on-premises, or edge environments of their choice. IBM further enhances this flexibility with its advanced AI infrastructure, seamless integration with agent frameworks and compatibility with vector databases.
IBM watsonx streamlines development with a suite of code, low-code and no-code tools in an enterprise-grade studio that supports the entire AI lifecycle while fostering collaboration across teams. IBM watsonx also offers robust end-to-end AI governance, ensuring responsible and accelerated workflows. Drawing on IBM’s deep expertise in technology transformation, the partnership with Meta delivers tailored strategies to efficiently and effectively address specific enterprise needs.
See how to build an AI Personal Trainer with Meta Llama 4 on watsonx.ai.
Start using Llama 4 models on watsonx.ai today.