A list of large language models

10 March 2025

Authors

Cole Stryker

Editorial Lead, AI Models, Gather

The generative AI (gen AI) boom has put a spotlight on the driving force behind it: large language models (LLMs). Dozens of LLMs already exist, but with the technology advancing rapidly, more of these artificial intelligence (AI) models continue to crop up.

Think of it through the lens of the auto industry. Hundreds of car manufacturers across the world have their own models catering to varied consumer needs. Cars have transformed over time too, from gas-powered automobiles to electric vehicles with many smart features.

The same is true for LLMs. These AI systems began as foundation models made up of multiple neural network layers trained on vast volumes of data.

They employ deep learning techniques to accomplish natural language processing (NLP) and natural language understanding (NLU) tasks. Since then, their capabilities have expanded to include agentic AI functions and reasoning.

This fast-paced evolution means that the LLM landscape is constantly changing. AI developers must continuously update their models or even build new ones to keep up with the swift progress.

While NLP and NLU tasks such as content summarization, machine translation, sentiment analysis and text generation continue to be mainstays, AI developers are tailoring their models to certain use cases.

For instance, some LLMs are crafted specifically for code generation, while others are made to handle vision language tasks.

While it’s impossible to mention every LLM out there, here’s a list of some of the most current and popular large language models to help organizations narrow their options and consider which model meets their needs:

Claude

Developer: Anthropic

Release date: February 2025 for Claude 3.7 Sonnet

Number of parameters: Not publicly disclosed

Context window: 200,000 tokens

License: Proprietary

Access: Anthropic API, Amazon Bedrock, Google Cloud Vertex AI

Input: Multimodal (image, text)

Output: Text

Claude is a family of LLMs based on a transformer architecture, and it's the model behind Anthropic's conversational AI assistant of the same name. Claude's design is guided by constitutional AI principles, which focus on AI safety to reduce harmful behaviors such as AI bias.

The Claude family consists of three AI models:

    ● Claude Haiku

    ● Claude Sonnet

    ● Claude Opus

Claude Haiku

Claude 3.5 Haiku is the fastest model. It’s ideal for low-latency use cases, such as customer service chatbots and code completion to speed up software development workflows.

Claude Sonnet

Claude 3.7 Sonnet is what Anthropic calls its “most intelligent model to date.” This reasoning model has an “extended thinking” mode, allowing it to self-reflect before responding. Those using the Anthropic API can also specify how long the model can think for.
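
For illustration, here's a minimal sketch of requesting extended thinking through the Anthropic Python SDK. The model alias and token budgets shown are assumptions to verify against Anthropic's current documentation:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # assumed alias; check Anthropic's model list
        max_tokens=2048,  # must be larger than the thinking budget
        thinking={"type": "enabled", "budget_tokens": 1024},  # cap on thinking tokens
        messages=[{"role": "user", "content": "How many prime numbers are below 100?"}],
    )

    # The reply interleaves thinking blocks with text blocks; print the final text block.
    print(response.content[-1].text)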

Claude 3.7 Sonnet can be implemented for more specific tasks such as code generation, computer use (allowing the LLM to use a computer the way a human does), extracting information from visual data and question answering.

Claude Opus

Claude 3 Opus is the most powerful model among the three. It can handle in-depth analysis and longer, more complex tasks with multiple steps.

Command

Developer: Cohere

Release date: April 2024 for Command R+ and December 2024 for Command R7B

Number of parameters: Up to 104 billion

Context window: 128,000 tokens

License: Proprietary

Access: Cohere API, Amazon Bedrock, Microsoft Azure AI Studio, Oracle Cloud Infrastructure Generative AI

Input: Text

Output: Text

Command is Cohere’s flagship language model. This family of enterprise-focused LLMs includes these models:

    ● Command R

    ● Command R+

    ● Command R7B

Command R

Command R is a multilingual text-generation model with 32 billion parameters.1 It has been trained to ground its retrieval-augmented generation (RAG) outputs by supplying citations in its responses. Command R also offers conversational tool use capabilities.
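
A minimal sketch of grounded generation with the Cohere Python SDK, using illustrative placeholder documents:

    import cohere

    co = cohere.Client()  # reads the CO_API_KEY environment variable

    response = co.chat(
        model="command-r",
        message="What is the refund window?",
        documents=[  # placeholder grounding documents
            {"title": "Policy FAQ", "snippet": "Refunds are accepted within 30 days of purchase."},
        ],
    )

    print(response.text)       # the grounded answer
    print(response.citations)  # answer spans linked back to the supplied documents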

Command R+

Command R+ is a more powerful version with 104 billion parameters.2 It can handle complex RAG functions and multistep tool use, allowing AI agents to gather the latest information and update their knowledge base by calling on external tools.

Command R7B

Command R7B is the smallest and fastest model at 7 billion parameters. It's ideal for CPU-based deployments, low-end GPUs and other edge devices, and it can be implemented for on-device inference.

DeepSeek-R1

Developer: DeepSeek

Release date: January 2025

Number of parameters: 671 billion

Context window: 128,000 tokens

License: Open source (MIT License)

Access: DeepSeek API, Hugging Face

Input: Text

Output: Text

DeepSeek-R1 is an open source reasoning model from Chinese AI startup DeepSeek. It uses a Mixture of Experts (MoE) machine learning architecture and was trained using large-scale reinforcement learning to refine its reasoning abilities.

DeepSeek-R1's performance matches or even exceeds OpenAI's o1 series of reasoning models on certain LLM benchmarks. DeepSeek also used knowledge distillation to fine-tune several smaller Llama and Qwen models on reasoning data generated by the full-size DeepSeek-R1 LLM.

The resulting distilled models showed stronger reasoning than their original counterparts and even outperformed some larger models.3
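
Because DeepSeek exposes an OpenAI-compatible endpoint, DeepSeek-R1 can be called with the standard OpenAI Python SDK. A minimal sketch, with a placeholder API key and the model ID and reasoning field as documented by DeepSeek:

    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
        base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
    )

    response = client.chat.completions.create(
        model="deepseek-reasoner",  # DeepSeek's ID for the R1 reasoning model
        messages=[{"role": "user", "content": "Which is larger: 9.11 or 9.9?"}],
    )

    message = response.choices[0].message
    print(message.reasoning_content)  # the chain of thought (a DeepSeek extension)
    print(message.content)            # the final answer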

Falcon

Developer: Technology Innovation Institute

Release date: December 2024 for Falcon 3

Number of parameters: Up to 180 billion

Context window: Up to 32,000 tokens

License: Open source

Access: Hugging Face

Input: Text

Output: Text

Falcon is a group of open source models developed by researchers at the UAE’s Technology Innovation Institute (TII). These models were trained on TII’s own RefinedWeb, a huge dataset containing filtered English web data.

Falcon consists of these LLMs:

    ● Falcon 2

    ● Falcon 3

    ● Falcon Mamba 7B

Other earlier and larger Falcon versions include Falcon 40B with 40 billion parameters and Falcon 180B with 180 billion parameters.

Falcon 2

Falcon 2 11B is a causal decoder-only model with 11 billion parameters. It offers multilingual support and will soon feature vision-to-language capabilities.

Falcon 3

Falcon 3 uses a decoder-only design and comes in lightweight parameter sizes of 1, 3, 7 and 10 billion. It improves upon its predecessor, particularly in reasoning capabilities.

Falcon Mamba 7B

Falcon Mamba 7B is a state space language model (SSLM), deviating from the typical LLM transformer architecture. Transformer models use an attention mechanism to “focus their attention” on the most important tokens in the input sequence. However, as the context window grows, transformers require more memory and computing power.

SSLMs continuously update a “state” during processing and employ a selection algorithm to adjust parameters dynamically according to the input. This allows Falcon Mamba 7B to process long sequences of text without needing additional memory and to generate new tokens in the same amount of time regardless of context length.
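
As a sketch, the open checkpoint can be run with Hugging Face Transformers, assuming a recent release with Mamba support and enough GPU memory:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tiiuae/falcon-mamba-7b"  # TII's published checkpoint on Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "State space models differ from transformers because"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))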

Gemini

Developer: Google DeepMind

Release date: December 2024

Number of parameters: Not publicly disclosed

Context window: 1 million tokens

License: Proprietary

Access: Gemini API, Google AI Studio, Google Cloud Vertex AI

Input: Multimodal (audio, image, text, video)

Output: Text

Gemini is Google's suite of multimodal models. It also powers Google's generative AI chatbot of the same name (formerly known as Bard).

Gemini employs a transformer model, a neural network architecture that originated from Google itself, and builds upon the company’s previous foundational language models, including BERT (Bidirectional Encoder Representations from Transformers) and PaLM 2 (Pathways Language Model).

The latest version, Gemini 2.0, is “built for the agentic era,” according to Google. Gemini 2.0 comes in several variants:

    ● Gemini 2.0 Flash

    ● Gemini 2.0 Flash-Lite

    ● Gemini 2.0 Flash Thinking

    ● Gemini 2.0 Pro

Gemini 2.0 Flash

Gemini 2.0 Flash is a lightweight model that supports tool use. Features coming soon include image generation and text-to-speech.
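
A minimal sketch of calling the model with the google-generativeai Python package; the model name is an assumption to verify against Google's current model list:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder

    model = genai.GenerativeModel("gemini-2.0-flash")  # assumed model name
    response = model.generate_content("Summarize the transformer architecture in two sentences.")
    print(response.text)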

Gemini 2.0 Flash-Lite

Gemini 2.0 Flash-Lite is an improved version of the previous lightweight and cost-efficient 1.5 Flash. It retains 1.5 Flash's speed and cost while improving quality.

Gemini 2.0 Flash Thinking

Gemini 2.0 Flash Thinking is a reasoning model trained to break up a human language prompt into a sequence of steps, showing its thought process along the way for better explainability.4

It excels in math, science and multimodal reasoning LLM benchmarks. Gemini 2.0 Flash Thinking is currently in the experimental stage and supports only images and text.

Gemini 2.0 Pro

Gemini 2.0 Pro is what Google calls its strongest model for coding and tackling complex prompts, thanks to its tool use capabilities and a longer context window of 2 million tokens. It's still in the experimental phase.

GPT

Developer: OpenAI

Release date: May 2024 for GPT-4o and July 2024 for GPT-4o mini

Number of parameters: Not publicly disclosed

Context window: 128,000 tokens

License: Proprietary

Access: OpenAI API using .NET, JavaScript, Python, TypeScript

Input: Multimodal (audio, image, text, video)

Output: Multimodal (audio, image, text)

Generative pretrained transformers (GPTs) are a line of large language models developed by OpenAI. The GPT family includes these LLMs:

    ● GPT-4o

    ● GPT-4o mini

GPT-4o

GPT-4o is a multilingual and multimodal model. As one of the most advanced LLMs, GPT-4o is capable of processing audio, text and visual inputs and producing any blend of audio, image and text outputs.

It has improved performance over its GPT-4 Turbo and GPT-4 predecessors. GPT-4o is the current LLM powering OpenAI’s ChatGPT generative AI chatbot.
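
A minimal sketch of a multimodal GPT-4o request with the OpenAI Python SDK; the image URL is a placeholder:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)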

GPT-4o mini

GPT-4o mini is a smaller, more affordable model that accepts image and text inputs and generates text outputs. It has surpassed GPT-3.5 Turbo in terms of performance.

Granite

Developer: IBM®

Release date: February 2025

Number of parameters: Up to 34 billion

Context window: 128,000 tokens

License: Open source (Apache 2.0)

Access: IBM® watsonx.ai™, Hugging Face, LM Studio, Ollama, Replicate

Input: Multimodal (image, text)

Output: Text

IBM® Granite™ is a series of enterprise-ready, open source LLMs. It includes these models:

    ● Granite 3.2

    ● Granite Vision

Granite 3.2

Granite 3.2 incorporates enhanced reasoning capabilities and advanced features for RAG tasks. It comes in 2 and 8 billion parameter sizes.

Granite 3.2's training data is a mix of open source datasets with permissive licenses and internally collected high-quality synthetic datasets tailored for solving long-context problems.
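
As a sketch, the smaller checkpoint can be run locally with Hugging Face Transformers; the model ID is assumed from IBM's Granite collection on Hugging Face:

    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="ibm-granite/granite-3.2-2b-instruct",  # assumed model ID
        device_map="auto",
    )

    prompt = "List three enterprise uses of retrieval-augmented generation."
    print(generator(prompt, max_new_tokens=200)[0]["generated_text"])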

Granite Vision

Granite Vision is a 2-billion-parameter vision language model tailored for visual document understanding. It’s designed for efficient content extraction from charts, diagrams and tables, making it suitable for structured data analysis.

Other LLMs in the Granite series consist of these specialized models:

    ● Granite Code

    ● Granite Guardian

    ● Granite Embedding

Granite Code

These decoder-only models are designed for generative code tasks, including code editing, code explanation and code generation. Granite Code models were trained with code written in 116 programming languages and are available in sizes of 3, 8, 20 and 34 billion parameters.

Granite Guardian

Granite Guardian models are LLM-based guardrails designed to detect risks in prompts and responses. Granite Guardian is available in 2, 3, 5 and 8 billion parameter sizes.

Granite Embedding

Granite Embedding models are sentence-transformer models purpose-built for retrieval-based applications such as semantic search and RAG.

Grok

Developer: xAI

Release date: February 2025 for Grok 3

Number of parameters: 314 billion for Grok-1; not disclosed for Grok 3

Context window: 128,000 tokens

License: Proprietary

Access: xAI API

Input: Multimodal (image, text)

Output: Text

Grok is a language model from xAI. The first-generation LLM, Grok-1, is an MoE model with 314 billion parameters. Because of its MoE routing, only 25% of Grok-1's weights are active on a given input token.

In March 2024, xAI released Grok-1.5 with a context window of 128,000 tokens and enhanced problem-solving capabilities. Five months later, xAI launched the beta versions of Grok-2 and its smaller counterpart, Grok-2 mini. Grok-2 further improves chat, coding and reasoning abilities and adds support for vision-based tasks.

The latest releases, Grok 3 and Grok 3 mini, are equipped with advanced reasoning and AI agent functions.
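
xAI's API follows OpenAI-compatible conventions, so a request can be sketched with the OpenAI Python SDK; the base URL and model ID are assumptions to verify against xAI's documentation:

    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_XAI_API_KEY",      # placeholder
        base_url="https://api.x.ai/v1",  # xAI's OpenAI-compatible endpoint
    )

    response = client.chat.completions.create(
        model="grok-2-latest",  # assumed model ID
        messages=[{"role": "user", "content": "Explain mixture of experts in one paragraph."}],
    )
    print(response.choices[0].message.content)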

Llama

Developer: Meta

Release date: December 2024 for Llama 3.3

Number of parameters: Up to 405 billion

Context window: 128,000 tokens

License: Open source

Access: Meta, Hugging Face, Kaggle

Input: Multimodal (image, text)

Output: Text

Llama is Meta AI’s collection of LLMs. These autoregressive models implement an optimized transformer architecture, with tuned versions that apply supervised fine-tuning and reinforcement learning with human feedback (RLHF).5
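
As a sketch, a small Llama checkpoint can be run with Hugging Face Transformers; Meta's repositories on Hugging Face are gated, so this assumes access has already been granted:

    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.2-1B-Instruct",  # gated repo; request access first
        device_map="auto",
    )
    print(pipe("Explain RLHF in one sentence.", max_new_tokens=60)[0]["generated_text"])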

The Llama 3 collection succeeds the Llama 2 LLMs and offers these models:

    ● Llama 3.1

    ● Llama 3.2

    ● Llama 3.3

Llama 3.1

Llama 3.1 includes an 8-billion-parameter model and a 405-billion-parameter flagship foundation model. Both are multilingual text-only models.

Llama 3.2

Llama 3.2 comes in 1 and 3 billion parameter sizes that are compact enough for mobile and edge devices. The 11 and 90 billion parameter sizes are multimodal LLMs optimized for answering general questions about an image, captioning, image reasoning and visual recognition.6

Llama 3.3

Llama 3.3 is a 70-billion-parameter multilingual text-only model. It offers comparable or even better performance than Llama 3.1 405B while being more cost-efficient.

Mistral

Developer: Mistral AI

Release date: July 2024 for Mistral Large 2

Number of parameters: Up to 124 billion

Context window: Up to 256,000 tokens

License: Mistral Research License, Mistral Commercial License, Apache 2.0

Access: La Plateforme, Amazon Bedrock, Microsoft Azure AI Studio, Google Cloud Vertex AI, IBM watsonx.ai

Input: Multimodal (image, text)

Output: Text

France-based company Mistral AI has a suite of LLMs encompassing these models:

    ● Mistral Large

    ● Mistral Small

    ● Codestral

    ● Pixtral Large

Mistral Large

Mistral Large 2 is Mistral AI’s flagship model. It has 123 billion parameters and a context window of 128,000 tokens. It performs well in code generation, math and reasoning. Mistral Large 2 offers multilingual support and function calling capabilities.
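
A minimal sketch of function calling with the mistralai Python SDK; the tool schema here is hypothetical:

    from mistralai import Mistral

    client = Mistral(api_key="YOUR_API_KEY")  # placeholder

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
    )
    print(response.choices[0].message.tool_calls)  # the requested function call, if any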

Mistral Small

Mistral Small 3 is a more compact version at 24 billion parameters. This model is suitable for rapid-response conversational AI, low-latency function calling and handling inference locally on resource-constrained machines. Mistral Small 3 is open source and released under the Apache 2.0 license.

Codestral

Codestral 25.01 is the latest generation of Mistral AI’s coding model. It features a context length of 256,000 tokens and supports tasks such as code completion, code correction, code generation and test generation.

Pixtral Large

Pixtral Large is a 124-billion-parameter multimodal model. It’s built on top of Mistral Large 2 and extends its capabilities to include image understanding.

o1

Developer: OpenAI

Release date: September 2024 for o1, January 2025 for o3-mini

Number of parameters: Not publicly disclosed

Context window: Up to 200,000 tokens

License: Proprietary

Access: OpenAI API

Input: Multimodal (image, text)

Output: Text

The o1 series of AI models includes o1 and o1-mini. Compared to OpenAI’s GPT models, o1 LLMs are equipped with more advanced reasoning capabilities. Both o1 and o1-mini were trained with large-scale reinforcement learning, allowing them to “think” before responding. They can generate a long chain of thought before answering.

The o1 LLM accepts both image and text inputs, while o1-mini can only handle text inputs.7 Compared to o1, o1-mini is smaller, faster and more cost-effective. It also excels at STEM reasoning and coding.

Meanwhile, o3-mini is the latest reasoning model. Like o1-mini, its strength lies in coding, math and science. It supports function calling and offers 3 reasoning effort options (low, medium and high), letting users allot more reasoning to complex problems or get faster responses on simpler ones.
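
A minimal sketch of selecting a reasoning effort level with the OpenAI Python SDK:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort="high",  # "low", "medium" or "high"
        messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    )
    print(response.choices[0].message.content)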

Qwen

Developer: Alibaba Cloud

Release date: September 2024 for Qwen2.5 and January 2025 for Qwen2.5-Max

Number of parameters: Up to 72 billion

Context window: Up to 1 million tokens

License: Open source (Apache 2.0), Proprietary for larger models

Access: Alibaba Cloud, Hugging Face

Input: Multimodal (audio, image, text, video)

Output: Text

Qwen is a series of LLMs from Chinese cloud computing company Alibaba Cloud. Qwen includes language models and variants optimized for audio, coding, math and vision tasks.

Qwen offers these models:

    ● Qwen2.5

    ● Qwen Audio

    ● Qwen Coder

    ● Qwen Math

    ● Qwen VL

Qwen2.5

Qwen2.5 models are decoder-only models for multilingual language processing tasks. They come in 0.5, 3, 7, 14, 32 and 72 billion parameter sizes. Larger models, such as the 72-billion variant, are available only through API access on Alibaba’s proprietary cloud platform.
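
As a sketch, one of the open checkpoints can be loaded with Hugging Face Transformers:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen2.5-7B-Instruct"  # one of the open checkpoints on Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "Write a haiku about autumn."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))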

Qwen2.5-Turbo features a longer context length of 1 million tokens and a quicker inference speed. Meanwhile, Qwen2.5-Max is the latest large-scale MoE model.

Qwen Audio

Qwen 2 Audio is purpose-built for audio-based tasks. This 7-billion-parameter model can be used to transcribe, detect and classify sounds, handle voice commands and identify musical elements.

Qwen Coder

Qwen2.5 Coder is a code-specific LLM. It's available in 1.5, 7, 14 and 32 billion parameter sizes.

Qwen Math

Qwen 2 Math is a collection of math-optimized LLMs. These models are suitable for advanced mathematical reasoning and solving complex math problems. Qwen 2 Math comes in 1.5, 7 and 72 billion parameter sizes.

Qwen VL

Qwen 2 VL is a vision-language model that combines visual processing with natural language understanding. Sample use cases entail extracting information from visual data and generating captions and summaries for images and videos. Qwen 2 VL is available in 2, 7 and 72 billion parameter sizes.

Stable LM

Developer: Stability AI

Release date: April 2024 for Stable LM 2 12B

Number of parameters: Up to 12 billion

Context window: 4,096 tokens

License: Stability AI Community License or Enterprise License

Access: Stability AI, Hugging Face

Input: Text

Output: Text

Stable LM is a group of open-access language models from Stability AI, the makers of text-to-image model Stable Diffusion. Stable LM 2 12B has 12 billion parameters, while Stable LM 2 1.6B has 1.6 billion parameters. These are decoder-only LLMs trained on multilingual data and code datasets. Both models incorporate function calling and tool use.

Stable Code 3B is another LLM fine-tuned on code-related datasets. As a lightweight model with 3 billion parameters, Stable Code 3B can be run in real time on devices, even those without a GPU.
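
As a sketch, Stable Code 3B can be run for code completion with Hugging Face Transformers; on a machine without a GPU it simply runs on CPU:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "stabilityai/stable-code-3b"  # Stability AI's checkpoint on Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)  # small enough for CPU

    inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=48)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))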

Footnotes

All links reside outside ibm.com

1 Model Card for C4AI Command R 08-2024, Hugging Face, Accessed 14 February 2025.

2 Model Card for C4AI Command R+ 08-2024, Hugging Face, Accessed 14 February 2025.

3 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, GitHub, 23 January 2025.

4 Access the latest 2.0 experimental models in the Gemini app, Google, 5 February 2025.

5 Model Information, GitHub, 30 September 2024.

6 Model Information, GitHub, 30 September 2024.

7 o1 and o1-mini, OpenAI, Accessed 14 February 2025.
