01 October 2024
Mistral AI is a France-based artificial intelligence (AI) startup known primarily for its open source large language models (LLMs). Since its founding in 2023, it has become one of the world’s leading generative AI developers.
Mistral AI was cofounded in April 2023 by Arthur Mensch, formerly of Google DeepMind, alongside Guillaume Lample and Timothée Lacroix, formerly of Meta AI. The cofounders, who originally met while studying at École Polytechnique in the suburbs of Paris, named their company after the strong northwesterly wind that blows from southern France into the Mediterranean. As of June 2024, the French company was, by valuation, the largest AI startup in Europe and the largest outside of the San Francisco Bay Area.1
At DeepMind, Mensch was one of the lead authors of the seminal paper “Training Compute-Optimal Large Language Models.” The paper, and the “Chinchilla” model introduced therein, explored scaling laws for LLMs and established several highly influential findings regarding the relationship between model size, training data, efficiency and performance for autoregressive language models. At Meta, Lacroix and Lample were among the researchers behind the original LLaMA models.
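The Chinchilla paper’s central finding is often summarized by a simple rule of thumb: compute-optimal training uses roughly 20 training tokens per model parameter. The 20:1 ratio is an approximation drawn from the paper’s results, and the sketch below is purely illustrative:

```python
# Chinchilla rule of thumb (approximate): compute-optimal training
# uses about 20 training tokens per model parameter.
def compute_optimal_tokens(n_params: int, tokens_per_param: int = 20) -> int:
    """Approximate compute-optimal training-token count for a model."""
    return n_params * tokens_per_param

# Chinchilla itself: 70B parameters -> ~1.4T training tokens
print(f"{compute_optimal_tokens(70_000_000_000) / 1e12:.1f}T tokens")  # 1.4T tokens
```

Chinchilla (70B parameters, trained on about 1.4 trillion tokens) matched this ratio and outperformed the much larger, but undertrained, models that preceded it.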
The cofounders’ combined expertise in efficiency and LLM development has yielded an array of mostly open source models whose performance often matches that of significantly larger LLMs. Among the European company’s most notable early contributions to the development of generative AI were innovations in sparse mixture of experts (MoE) models.
Its stated mission involves a “strong commitment to open, portable and customizable solutions, and an extreme focus on shipping the most advanced technology in limited time.”
Mistral AI generally divides its LLMs into 3 categories: “general purpose” models, “specialist” models and “research” models.
Though Mistral offers many of its models with open weights across most common machine learning (ML) platforms under an Apache 2.0 license, it typically places some constraints on commercial deployment for its most performant models.
Mistral uses a simple, albeit unconventional, naming system for its models. The names of some models, such as Mistral 7B or Pixtral 12B, indicate parameter counts, while others refer to size more descriptively, such as “Mistral Large” or “Mistral Small,” or not at all. Many, like “Mixtral” or “Mathstral,” entail a play on the company’s name.
Some model version updates are reflected in primary model names, while others are not. For instance, Mistral Large and Mistral Small were first released in February 2024. The former was updated in July as “Mistral Large 2,” but the latter remained “Mistral Small” after a September update.
The models that Mistral AI categorizes as “general purpose” models are typically text-in, text-out LLMs that approach state-of-the-art performance for their respective model sizes, costs or computational demands. As the category’s name suggests, these models are well suited for general natural language processing (NLP) and text generation use cases.
Mistral Large 2 is Mistral’s flagship LLM and largest model. Upon its release in July 2024, its performance on common benchmarks bested all open models (except the much larger Meta Llama 3.1 405B) and rivaled that of many leading closed models.
With 123B parameters, Mistral Large 2 occupies a unique niche in the LLM landscape, being larger than any “mid-size” model but significantly smaller than its direct competitors. In its official release announcement, Mistral AI indicated that the model was sized with the goal of enabling it to run at large throughput on a single node.
Per Mistral AI, the multilingual Mistral Large 2 supports dozens of languages, including English, French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese and Korean. It also includes support for over 80 coding languages.
Mistral Large 2 was released under the Mistral Research License, which allows open usage and modification only for noncommercial purposes. Commercial deployment requires contacting the AI provider directly to request a Mistral Commercial License or accessing the model through select partners, such as IBM watsonx™.
Mistral Small was first released in February 2024 as an enterprise model, but was relegated to “legacy model” status before receiving an overhaul and returning as an “enterprise-grade” model, Mistral Small v24.09, in September. Despite its name, Mistral offers multiple models smaller than Mistral Small.
At 22B parameters, Mistral Small represents a cost-efficient midpoint between Mistral Large 2 and the smaller Mistral NeMo 12B. Like Mistral Large 2, Mistral Small v24.09 is offered under the Mistral Research License.
Mistral NeMo was built in collaboration with NVIDIA. At 12B parameters, it’s among the most performant models in its size category, with multilingual support for Romance languages, Chinese, Japanese, Korean, Hindi and Arabic. Of Mistral’s general-purpose models, Mistral NeMo is the only LLM that’s fully open sourced under an Apache 2.0 license.
Unlike its general purpose models, Mistral AI’s “specialist” models are trained for specific tasks and domains, rather than for general text-in, text-out applications.
It’s worth noting, however, that this is not a rigid designation: Mistral AI categorizes some additional specialized models, such as Mathstral, under “research models” rather than “specialist models.” The distinction is based primarily on available usage rights: specialist models might have certain restrictions on deployment environments or commercial use, whereas research models do not.
Codestral is a 22B open-weight model specializing in code generation tasks, fluent in over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift and Fortran. It was released under the Mistral AI Non-Production License, which allows its use for research and testing purposes. Commercial licenses can be granted on request by contacting Mistral directly.
Mistral Embed is an embedding model trained to generate text embeddings: numerical vector representations that capture the semantic meaning of a piece of text. At present, it supports only the English language.
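Embeddings like those produced by Mistral Embed are typically compared with cosine similarity, so that semantically related texts score higher than unrelated ones. A minimal sketch with hypothetical toy vectors (not actual Mistral Embed output, which is far higher-dimensional):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real embedding vectors are much longer)
v_cat = [0.9, 0.1, 0.0]
v_kitten = [0.8, 0.2, 0.1]
v_car = [0.0, 0.1, 0.9]

# Related concepts should sit closer together in embedding space
print(cosine_similarity(v_cat, v_kitten) > cosine_similarity(v_cat, v_car))  # True
```

This kind of similarity scoring is the basis of common embedding use cases such as semantic search and retrieval-augmented generation.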
Pixtral 12B is an open multimodal model, offered under an Apache 2.0 license, capable of both text-in, text-out and image-in, text-out tasks. Its architecture combines a 12B multimodal decoder based on Mistral NeMo and a 400M parameter vision encoder trained from scratch on image data. Pixtral can be used in conversational interfaces, similarly to how one interacts with standard text-only LLMs, with the added ability to upload images and prompt the model to answer questions about them.
Relative to multimodal models of comparable size, both proprietary and open source, Pixtral achieved highly competitive results on most multimodal benchmarks. For instance, Pixtral outperformed Anthropic’s Claude 3 Haiku, Google’s Gemini 1.5 Flash 8B and Microsoft’s Phi 3.5 Vision models on benchmarks measuring college-level problem solving (MMMU), visual mathematical reasoning (MathVista), chart understanding (ChartQA), document understanding (DocVQA) and general vision question answering (VQAv2).2
Mistral’s research models are each offered as fully open source models, with no restrictions on commercial use, deployment environments or the ability to fine-tune.
Mixtral is a family of decoder-only sparse mixture of experts (MoE) models. Unlike conventional feedforward neural networks, which use the entire network for each inference, MoE models are subdivided into distinct groups of parameters called experts. For each token, a router network selects only a certain number of experts at each layer to process the input.
In training, this structure enables each expert network to specialize in the processing of certain kinds of inputs. During inference, the model uses only a fraction of the total available parameters—specifically, the parameters in the expert networks best suited to the task at hand—for each input. In doing so, the MoE architecture significantly reduces the cost and latency of inference without a corresponding decrease in performance.
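The routing mechanism described above can be sketched in a few lines. The following is a simplified numerical illustration of sparse top-k routing with 8 experts and k=2 (mirroring Mixtral’s 8-expert layout), not Mistral’s actual implementation; the toy “experts” and random router weights are stand-ins for trained networks:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, router_weights, k=2):
    """Route a token vector to its top-k experts and mix their outputs."""
    # Router scores: one logit per expert (here, a simple dot product)
    logits = [sum(w * x for w, x in zip(weights, token)) for weights in router_weights]
    # Select only the k highest-scoring experts for this token
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate values over the selected experts only
    gates = softmax([logits[i] for i in top])
    # Only k of the expert networks are ever evaluated per token
    mixed = [0.0] * len(token)
    for gate, i in zip(gates, top):
        out = experts[i](token)
        mixed = [m + gate * o for m, o in zip(mixed, out)]
    return mixed, top

# Toy setup: 8 "experts" that just scale the input, plus a random router
random.seed(0)
dim, n_experts = 4, 8
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(n_experts)]
router = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]

out, chosen = moe_layer([1.0, 0.5, -0.5, 0.25], experts, router, k=2)
print(f"experts used: {sorted(chosen)} of {n_experts}")
```

Only 2 of the 8 expert networks run for each token, which is why an MoE model’s inference cost tracks its active parameter count rather than its total parameter count.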
Mixtral is offered in 2 variants, each of which is subdivided into 8 expert networks: Mixtral 8x7B and Mixtral 8x22B. The former is among the foundation models available in IBM watsonx.
Mathstral is a variant of Mistral 7B—which is now relegated to “legacy model” status—optimized for solving mathematical problems, available under the Apache 2.0 license.
Whereas the original Codestral model uses the standard transformer architecture common to nearly all large language models, Codestral Mamba uses the distinct Mamba architecture. Research on Mamba models is still in its earliest stages (Mamba was first introduced in a 2023 paper), but the novel architecture offers significant theoretical advantages in both speed and context length.
Le Chat is Mistral’s chatbot service, similar to OpenAI’s ChatGPT, first released in beta on 26 February 2024. Alongside Mistral Large and Mistral Small, Mistral recently added the multimodal Pixtral 12B to the roster of LLMs available in Le Chat.
La Plateforme is Mistral’s development and deployment API-serving platform, providing API endpoints and an ecosystem to experiment, fine-tune on custom data sets, evaluate and prototype with Mistral models.
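As a sketch of what working against such an API endpoint looks like, the snippet below builds (but does not send) a chat-completion request. The endpoint URL, payload shape and model name reflect Mistral’s publicly documented REST API at the time of writing, but should be verified against the current documentation before use; the API key is a placeholder:

```python
import json
import urllib.request

# Endpoint and payload shape based on Mistral's public chat-completions
# API; verify against current La Plateforme docs before relying on this.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct an HTTP request for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("YOUR_API_KEY", "mistral-small-latest", "Hello!")
# urllib.request.urlopen(req) would actually send the request (not executed here)
print(req.full_url)
```

Mistral also publishes official client SDKs that wrap these endpoints, which is the more typical path for production use.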
1 "OpenAI’s French rival Mistral AI is now worth $6 billion. That’s still a fraction of its top competitors," Quartz, 13 June 2024.
2 "Announcing Pixtral 12B," Mistral AI, 17 September 2024.