On Wednesday, 24 July 2024, Mistral AI announced the release of Mistral Large 2, an advanced multilingual large language model (LLM) that “vastly outperforms” the previous version of Mistral Large, released in February 2024. The new and improved model offers exciting advances over its predecessor in code generation, mathematics, reasoning, instruction following, function calling and support for a wide array of languages.
Mistral Large 2 was released under the Mistral Research License, allowing open usage and modification for research and noncommercial purposes. While commercial usage entailing self-deployment requires contacting Mistral AI to request a Mistral Commercial License, Mistral Large 2 is now readily available for commercial deployment in IBM® watsonx™.
Mistral Large 2 now replaces Mistral Large in the watsonx foundation model catalog with no increase in pricing, reflecting IBM’s commitment to providing clients with the latest and greatest open models available on the market.
Mistral Large 2 is “large enough” to compete with leading models
Mistral Large 2, or (more officially) Mistral-Large-2407, is a dense, transformer-based LLM of 123 billion parameters. “Dense,” in this context, implies a conventional neural network architecture (in contrast to “sparse” mixture of experts architectures employed by models such as Mistral’s Mixtral-8x7B).
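The distinction matters for inference cost: a dense model activates every parameter for every token, whereas a sparse mixture-of-experts model activates only its shared weights plus a few routed experts per token. A minimal sketch of that arithmetic, using rough illustrative figures rather than official parameter breakdowns:

```python
def active_params(shared_b, per_expert_b, n_experts, top_k):
    """Return (total, per-token active) parameter counts in billions.

    A dense model is the degenerate case with no experts: everything is shared.
    An MoE model routes each token through only top_k of n_experts experts.
    """
    total = shared_b + n_experts * per_expert_b
    active = shared_b + top_k * per_expert_b
    return total, active

# Dense Mistral Large 2: all 123B parameters participate in every token.
dense_total, dense_active = active_params(123.0, 0.0, 0, 0)

# Rough Mixtral-8x7B-style MoE: 8 experts, 2 routed per token.
# The split into shared vs. per-expert weights here is illustrative only.
moe_total, moe_active = active_params(1.9, 5.6, 8, 2)

print(dense_active, round(moe_active, 1))  # 123.0 13.1
```

Under these illustrative numbers, the MoE model holds roughly 46.7 billion parameters in total but uses only about 13 billion per token, while the dense model pays for all 123 billion on every token.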
At 123 billion parameters, Mistral Large 2 carves out a unique niche in an LLM landscape that generally leaps from models of 70 billion parameters directly to models with many hundreds of billions, or even trillions, of parameters. In its official release announcement, Mistral AI indicated that Mistral Large 2 was sized with the goal of allowing it to run at high throughput on a single node.
Though most state-of-the-art LLMs (with the exception of Meta’s Llama 3.1 405B) are closed models whose parameter counts are often not disclosed, all available information would indicate that Mistral Large 2 is competitive with leading models, despite its significantly leaner parameter count.
For example, with a score of 84.0% on the MMLU benchmark—which evaluates general undergraduate-level knowledge—the base pretrained Mistral Large 2 rivals all open models except the much larger Llama 3.1 405B. It also matches or beats many proprietary models, including Google’s largest model, Gemini 1.0 Ultra.
Vastly improved code and mathematics
The updated Mistral Large 2 was trained on a very large proportion of code, enabling it to “vastly outperform” its predecessor and achieve parity with cutting-edge models such as OpenAI’s GPT-4o, Anthropic’s Claude 3 Opus and Meta’s Llama 3.1 405B. Mistral Large 2 includes support for an impressive range of over 80 programming languages.
When evaluated across multiple programming languages, including Python, C++, Bash, Java, TypeScript, PHP and C#, Mistral Large 2’s average performance accuracy (76.9%) rivaled that of GPT-4o (77.9%). On the popular HumanEval code benchmark, the instruction-tuned Mistral Large 2 achieved a state-of-the-art 92.0%, matched only by Claude 3.5 Sonnet.
Mistral Large 2 also achieves impressive mathematical performance. With a score of 71.5% on the MATH problem-solving benchmark, Mistral Large 2 significantly beats many closed models, including Gemini 1.5 Pro, Gemini 1.0 Ultra, GPT-4 and Claude 3 Opus.
Enhanced reasoning, accuracy and accountability
Mistral AI put significant effort into enhancing Mistral Large 2’s reasoning capabilities, particularly with regard to minimizing hallucinations: plausible-sounding (but false) responses. A key part of that effort involved training Mistral Large 2 to acknowledge when it cannot find a solution or lacks sufficient information to provide a confident answer.
This commitment to safety, echoing IBM’s own dedication to responsible innovation in AI and Forrester-endorsed approach to trustworthy AI, helps businesses enjoy the benefits of generative AI while minimizing its associated risks. As such, Mistral Large is the first third-party model to be distributed in watsonx under IBM’s own customer license agreements.
These assurances allow IBM customers to deploy Mistral Large 2, as well as models from the IBM Granite family, with full confidence even in an enterprise environment.
Concise conversations and intelligent instruction-following
While generating lengthy responses can often improve a model’s score on certain performance benchmarks, conciseness is important in a business context. Shorter model responses typically facilitate quicker interactions and more cost-effective inference. Mistral AI endeavored to optimize Mistral Large 2 for generating succinct, to-the-point responses.
To quantify their success in that endeavor, Mistral AI measured the average length of responses generated by Mistral Large 2 against those of competitors, including Claude 3 Opus, Claude 3.5 Sonnet, Llama 3.1 and GPT-4o. Despite having the second shortest average response length, Mistral Large 2 bested nearly all of those models on MT-Bench (with GPT-4o as a judge) and similar instruction-following benchmarks.
Advanced tool use and function calling
Mistral Large 2 was trained for enhanced function calling capabilities—such as those necessary for integration with enterprise applications in watsonx—and for retrieval augmented generation (RAG).
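In practice, function calling works by passing the model a JSON schema describing each available tool and then parsing the structured call the model emits back. The sketch below uses a hand-written mock of a tool-call payload in the common JSON function-calling format; the tool name, schema and dispatch helper are illustrative assumptions, not part of any specific watsonx or Mistral API:

```python
import json

# A tool schema in the common JSON function-calling format (names are illustrative).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch_tool_call(tool_call, registry):
    """Route a model-emitted tool call to the matching Python function."""
    name = tool_call["function"]["name"]
    # The model returns arguments as a JSON string, so decode before calling.
    args = json.loads(tool_call["function"]["arguments"])
    return registry[name](**args)

# Mock of a tool call as a model might emit it (a real one comes from the API).
mock_call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
registry = {"get_weather": lambda city: f"Sunny in {city}"}
print(dispatch_tool_call(mock_call, registry))  # Sunny in Paris
```

In a real application, the tool’s return value would be appended to the conversation and sent back to the model so it can compose a final natural-language answer.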
Robust multilingual support
Mistral Large 2 supports dozens of languages, including English, French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese and Korean.
Whereas some ostensibly multilingual LLMs struggle to maintain their overall performance and accuracy across languages (usually due to relying on a disproportionate amount of training data in the model’s primary language), Mistral Large 2 maintains impressive performance across its language spectrum.
Per Mistral AI’s release announcement, the model’s multilingual MMLU performance is remarkably consistent for French (82.8%), German (81.6%), Spanish (82.7%), Italian (82.7%) and Portuguese (81.6%). This consistency even crosses alphabets, as demonstrated by competitive marks for Russian (79.0%) and Japanese (78.8%).
Get started with Mistral Large 2 in IBM watsonx.ai™
Continued partnership with Mistral AI reflects IBM’s commitment to furthering open source innovation in AI and providing our clients with access to best-in-class open models in watsonx, including both third-party models and the IBM Granite model family.
Effectively implementing generative AI is not as simple as selecting the most performant general-purpose model available. Choosing a model entails a series of tradeoffs: for instance, larger, more capable models typically incur higher inference costs, while increased model accuracy often comes at the cost of speed. Identifying the ideal model requires thoughtful consideration of competing priorities to strike the right balance for the task at hand.
With its wide selection of foundation models, IBM watsonx helps clients match the right model to the right use case and explore the optimal balance of price and performance. Models in the watsonx platform can be further customized for clients’ specific needs through flexible deployment options and the ability to fine-tune models on your own data.
Mistral Large 2 is currently available as SaaS through IBM watsonx.ai, with support for on-premises deployment and fine-tuning coming later this year. Readily build custom AI applications for your business, manage all data sources and accelerate responsible AI workflows—all on one platform.
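As a starting point, invoking the model through the `ibm-watsonx-ai` Python SDK might look like the following sketch. The model ID, endpoint URL and parameter names are assumptions to verify against the current watsonx.ai documentation, and the call itself requires a valid IBM Cloud API key and project ID:

```python
# Generation parameters in the plain-dict form the watsonx.ai SDK accepts
# (parameter names assumed; confirm against the current SDK documentation).
generation_params = {
    "decoding_method": "greedy",  # deterministic decoding
    "max_new_tokens": 200,        # cap response length for concise output
}

def run_inference(api_key: str, project_id: str) -> str:
    """Send one prompt to Mistral Large 2 on watsonx.ai.

    Requires `pip install ibm-watsonx-ai` and real credentials, so the SDK
    import is kept local to this function.
    """
    from ibm_watsonx_ai import Credentials
    from ibm_watsonx_ai.foundation_models import ModelInference

    model = ModelInference(
        model_id="mistralai/mistral-large",  # assumed catalog ID; check the watsonx model list
        credentials=Credentials(
            url="https://us-south.ml.cloud.ibm.com",  # assumed regional endpoint
            api_key=api_key,
        ),
        project_id=project_id,
        params=generation_params,
    )
    return model.generate_text(prompt="Summarize the benefits of concise model responses.")
```

The same `generation_params` dict can be reused across calls, and swapping `model_id` is all it takes to compare Mistral Large 2 against other models in the watsonx catalog.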
Try out Mistral Large 2 in watsonx.ai™