The generative AI (gen AI) boom has put a spotlight on the driving force behind it: large language models (LLMs). Dozens of LLMs already exist, but with the technology advancing rapidly, more of these artificial intelligence (AI) models continue to crop up.
Think of it through the lens of the auto industry. Hundreds of car manufacturers across the world have their own models catering to varied consumer needs. Cars have transformed over time too, from gas-powered automobiles to electric vehicles with many smart features.
The same is true for LLMs. These AI systems began as foundation models: stacks of neural network layers trained on vast volumes of data.
They employ deep learning techniques to accomplish natural language processing (NLP) and natural language understanding (NLU) tasks. However, their capabilities have improved to include agentic AI functions and reasoning.
This fast-paced evolution means that the LLM landscape is constantly changing. AI developers must continuously update their models or even build new ones to keep up with the swift progress.
While NLP and NLU tasks such as content summarization, machine translation, sentiment analysis and text generation continue to be mainstays, AI developers are tailoring their models to certain use cases.
For instance, some LLMs are crafted specifically for code generation, while others are made to handle vision language tasks.
While it’s impossible to mention every LLM out there, here’s a list of some of the most current and popular large language models to help organizations narrow their options and consider which model meets their needs:
Developer: Anthropic
Release date: February 2025 for Claude 3.7 Sonnet
Number of parameters: Not publicly disclosed
Context window: 200,000 tokens
License: Proprietary
Access: Anthropic API, Amazon Bedrock, Google Cloud Vertex AI
Input: Multimodal (image, text)
Output: Text
Claude is a family of LLMs based on a transformer architecture. These models power the conversational AI assistant of the same name. Claude’s design is guided by constitutional AI principles, which focus on AI safety to reduce harmful behaviors such as AI bias.
The Claude family consists of 3 AI models:
● Claude Haiku
● Claude Sonnet
● Claude Opus
Claude 3.5 Haiku is the fastest model. It’s ideal for low-latency use cases, such as customer service chatbots and code completion to speed up software development workflows.
Claude 3.7 Sonnet is what Anthropic calls its “most intelligent model to date.” This reasoning model has an “extended thinking” mode, allowing it to self-reflect before responding. Those using the Anthropic API can also specify how long the model is allowed to think, as shown in the sketch at the end of this section.
Claude 3.7 Sonnet can be implemented for more specific tasks such as code generation, computer use (allowing the LLM to use a computer the way a human does), extracting information from visual data and question answering.
Claude 3 Opus is the most powerful model among the three. It can handle in-depth analysis and longer, more complex tasks with multiple steps.
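For developers, the extended thinking budget mentioned above is exposed through the Anthropic API. Here’s a minimal sketch using the Anthropic Python SDK; the model ID and token budget are illustrative and may differ by account.

```python
# A minimal sketch of extended thinking with the Anthropic Python SDK
# (pip install anthropic). Model ID and budget values are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,  # must exceed the thinking budget
    # "budget_tokens" caps how many tokens the model may spend on reasoning
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "How many prime numbers are below 100?"}],
)
print(response.content)  # thinking blocks followed by the final text answer
```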
Developer: Cohere
Release date: April 2024 for Command R+ and December 2024 for Command R7B
Number of parameters: Up to 104 billion
Context window: 128,000 tokens
License: Proprietary
Access: Cohere API, Amazon Bedrock, Microsoft Azure AI Studio, Oracle Cloud Infrastructure Generative AI
Input: Text
Output: Text
Command is Cohere’s flagship language model. This family of enterprise-focused LLMs includes these models:
● Command R
● Command R+
● Command R7B
Command R is a multilingual text-generation model with 32 billion parameters.1 It is trained to ground its retrieval-augmented generation (RAG) responses by supplying citations, and it also offers conversational tool use capabilities.
Command R+ is a more powerful version with 104 billion parameters.2 It can handle complex RAG functions and multistep tool use, allowing AI agents to gather the latest information and update their knowledge base by calling on external tools.
Command R7B is the smallest and fastest model at 7 billion parameters. It’s ideal for CPU-based deployments, low-end GPUs and other edge devices and can be implemented for on-device inference.
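As a sketch of how Command’s grounded generation might be called, here is a hedged example using the Cohere Python SDK’s chat endpoint with inline documents. Exact field names may vary across SDK versions, and the document snippet is invented for illustration.

```python
# A minimal sketch of grounded (RAG) chat with citations using the Cohere
# Python SDK (pip install cohere); exact fields may vary by SDK version.
import cohere

co = cohere.Client()  # reads CO_API_KEY from the environment

response = co.chat(
    model="command-r-plus",
    message="What does the handbook say about remote work?",
    # Supplying documents asks the model to ground its answer in them
    documents=[
        {
            "title": "HR handbook",  # invented example document
            "snippet": "Employees may work remotely up to 3 days a week.",
        }
    ],
)
print(response.text)       # the grounded answer
print(response.citations)  # spans of the answer linked back to the documents
```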
Developer: DeepSeek
Release date: January 2025
Number of parameters: 671 billion
Context window: 128,000 tokens
License: Open source (MIT License)
Access: DeepSeek API, Hugging Face
Input: Text
Output: Text
DeepSeek-R1 is an open source reasoning model from Chinese AI startup DeepSeek. It uses a Mixture of Experts (MoE) machine learning architecture and was trained using large-scale reinforcement learning to refine its reasoning abilities.
DeepSeek-R1 performs on par with, or even better than, OpenAI’s o1 series of reasoning models on certain LLM benchmarks. DeepSeek also used knowledge distillation to fine-tune several smaller Llama and Qwen models on reasoning data generated by the much bigger DeepSeek-R1 LLM.
The resulting distilled models showed enhanced reasoning capabilities compared to their original counterparts and even outperformed some larger models.3
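Because the distilled checkpoints are openly published, they can be run locally. A minimal sketch with Hugging Face Transformers, assuming the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B repository and enough local memory:

```python
# A minimal sketch: running a distilled DeepSeek-R1 checkpoint locally with
# Hugging Face Transformers (pip install transformers torch).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # small distilled variant
    device_map="auto",  # place weights on a GPU if one is available
)
messages = [{"role": "user", "content": "Is 1001 prime? Think step by step."}]
# The model emits its chain of thought before the final answer
print(generator(messages, max_new_tokens=512)[0]["generated_text"])
```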
Developer: Technology Innovation Institute
Release date: December 2024 for Falcon 3
Number of parameters: Up to 180 billion
Context window: Up to 32,000 tokens
License: Open source
Access: Hugging Face
Input: Text
Output: Text
Falcon is a group of open source models developed by researchers at the UAE’s Technology Innovation Institute (TII). These models were trained on TII’s own RefinedWeb, a huge dataset containing filtered English web data.
Falcon consists of these LLMs:
● Falcon 2
● Falcon 3
● Falcon Mamba 7B
Other earlier and larger Falcon versions include Falcon 40B with 40 billion parameters and Falcon 180B with 180 billion parameters.
Falcon 2 11B is a causal decoder-only model with 11 billion parameters. It offers multilingual support and will soon feature vision-to-language capabilities.
Falcon 3 takes on a decoder-only design and comes in lightweight parameter sizes of 1, 3, 7 and 10 billion. It improves upon its predecessor, advancing its reasoning capabilities.
Falcon Mamba 7B is a state space language model (SSLM), deviating from the typical LLM transformer architecture. Transformer models use an attention mechanism to “focus their attention” on the most important tokens in the input sequence. However, as the context window grows, transformers require more memory and computing power.
SSLMs continuously update a “state” during processing and employ a selection algorithm to adjust parameters dynamically according to the input. This allows Falcon Mamba 7B to process long sequences of text without needing additional memory and to generate new tokens in the same amount of time regardless of context length.
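To make the contrast concrete, here is a toy Python sketch of the state space recurrence. It is a deliberate simplification: real SSLMs such as Mamba learn input-dependent (selective) parameters, while this version keeps them fixed.

```python
# Toy sketch of the state space idea: a fixed-size state is updated once per
# token, so memory stays constant however long the sequence grows. Real SSLMs
# make A, B, C input-dependent ("selective"); here they are fixed constants.
import numpy as np

d_state, d_in = 16, 8
rng = np.random.default_rng(0)
A = np.eye(d_state) * 0.9                 # state transition (gradual decay)
B = rng.standard_normal((d_state, d_in))  # input projection
C = rng.standard_normal((d_in, d_state))  # output projection

def run(sequence):
    h = np.zeros(d_state)   # the fixed-size "state"
    outputs = []
    for x in sequence:      # one constant-cost update per token
        h = A @ h + B @ x
        outputs.append(C @ h)
    return outputs

tokens = [rng.standard_normal(d_in) for _ in range(1000)]
ys = run(tokens)            # memory use is independent of sequence length
```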
Developer: Google
Release date: December 2024 for Gemini 2.0
Number of parameters: Not publicly disclosed
Context window: 1 million tokens
License: Proprietary
Access: Gemini API, Google AI Studio, Google Cloud Vertex AI
Input: Multimodal (audio, image, text, video)
Output: Text
Gemini is Google’s suite of multimodal models. It also powers the generative AI chatbot (formerly known as Bard) of the same name.
Gemini employs a transformer model, a neural network architecture that originated from Google itself, and builds upon the company’s previous foundational language models, including BERT (Bidirectional Encoder Representations from Transformers) and PaLM 2 (Pathways Language Model).
The latest version, Gemini 2.0, is “built for the agentic era,” according to Google. Gemini 2.0 is available in several variants:
● Gemini 2.0 Flash
● Gemini 2.0 Flash-Lite
● Gemini 2.0 Flash Thinking
● Gemini 2.0 Pro
Gemini 2.0 Flash is a lightweight model that supports tool use. Features coming soon include image generation and text-to-speech.
Gemini 2.0 Flash-Lite is an improved version of the previous lightweight and cost-efficient 1.5 Flash. It retains the same speed and cost while enhancing quality.
Gemini 2.0 Flash Thinking is a reasoning model trained to break up a human language prompt into a sequence of steps, showing its thought process along the way for better explainability.4
It excels in math, science and multimodal reasoning LLM benchmarks. Gemini 2.0 Flash Thinking is currently in the experimental stage and supports only images and text.
Gemini 2.0 Pro is what Google calls its strongest model for coding and tackling complex prompts, thanks to its tool use capabilities and a longer context window of 2 million tokens. It’s still in the experimental phase.
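A minimal sketch of prompting Gemini 2.0 Flash through the Google AI Python SDK; the model ID reflects Google’s public naming at the time of writing and may change.

```python
# A minimal sketch using the Google AI Python SDK
# (pip install google-generativeai); model IDs may change over time.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Summarize the plot of Hamlet in two sentences.")
print(response.text)
```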
Developer: OpenAI
Release date: May 2024 for GPT-4o and July 2024 for GPT-4o mini
Number of parameters: Not publicly disclosed
Context window: 128,000 tokens
License: Proprietary
Access: OpenAI API (with .NET, JavaScript, Python and TypeScript SDKs)
Input: Multimodal (audio, image, text, video)
Output: Multimodal (audio, image, text)
Generative pretrained transformers (GPTs) are a line of large language models developed by OpenAI. GPT includes these LLMs:
● GPT-4o
● GPT-4o mini
GPT-4o is a multilingual and multimodal model. As one of the most advanced LLMs, GPT-4o is capable of processing audio, text and visual inputs and producing any blend of audio, image and text outputs.
It has improved performance over its GPT-4 Turbo and GPT-4 predecessors. GPT-4o is the current LLM powering OpenAI’s ChatGPT generative AI chatbot.
GPT-4o mini is a smaller, more affordable model that accepts image and text inputs and generates text outputs. It has surpassed GPT-3.5 Turbo in terms of performance.
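Mixed image-and-text prompts can be sent to GPT-4o through the OpenAI chat completions endpoint. A minimal sketch; the image URL is a placeholder.

```python
# A minimal sketch of a multimodal GPT-4o request with the OpenAI Python SDK
# (pip install openai). The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```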
Developer: IBM®
Release date: February 2025
Number of parameters: Up to 34 billion
Context window: 128,000 tokens
License: Open source (Apache 2.0)
Access: IBM® watsonx.ai™, Hugging Face, LM Studio, Ollama, Replicate
Input: Multimodal (image, text)
Output: Text
IBM® Granite™ is a series of enterprise-ready, open source LLMs. It includes these models:
● Granite 3.2
● Granite Vision
Granite 3.2 incorporates enhanced reasoning capabilities and advanced features for RAG tasks. It comes in 2 and 8 billion parameter sizes.
Granite 3.2’s training data is a mix of permissively licensed open source datasets and internally collected, high-quality synthetic datasets tailored for solving long-context problems.
Granite Vision is a 2-billion-parameter vision language model tailored for visual document understanding. It’s designed for efficient content extraction from charts, diagrams and tables, making it suitable for structured data analysis.
Other LLMs in the Granite series consist of these specialized models:
● Granite Code
● Granite Guardian
● Granite Embedding
Granite Code models are decoder-only models designed for generative code tasks, including code editing, code explanation and code generation. They were trained on code written in 116 programming languages and are available in sizes of 3, 8, 20 and 34 billion parameters.
Granite Guardian models are LLM-based guardrails designed to detect risks in prompts and responses. Granite Guardian is available in 2, 3, 5 and 8 billion parameter sizes.
Granite Embedding models are sentence-transformer models purpose-built for retrieval-based applications such as semantic search and RAG.
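Since the Granite models are Apache 2.0 licensed and published on Hugging Face, they can be pulled down and run locally. A minimal sketch, assuming the ibm-granite/granite-3.2-8b-instruct repository name:

```python
# A minimal sketch: loading an open source Granite model from Hugging Face
# (pip install transformers torch). The repo name is an assumption.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ibm-granite/granite-3.2-8b-instruct",
    device_map="auto",
)
messages = [{"role": "user", "content": "List three enterprise uses of RAG."}]
print(generator(messages, max_new_tokens=256)[0]["generated_text"])
```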
Developer: xAI
Release date: February 2025 for Grok 3
Number of parameters: 314 billion
Context window: 128,000 tokens
License: Proprietary
Access: xAI API
Input: Multimodal (image, text)
Output: Text
Grok is a family of language models from xAI. The first-generation LLM, Grok-1, is an MoE model with 314 billion parameters. Because of its MoE routing, only 25% of Grok-1’s weights are active on a given input token.
In March 2024, xAI released Grok-1.5 with a context window of 128,000 tokens and enhanced problem-solving capabilities. Five months later, xAI launched the beta versions of Grok-2 and its smaller counterpart, Grok-2 mini. Grok-2 further improves chat, coding and reasoning abilities and adds support for vision-based tasks.
The latest releases, Grok 3 and Grok 3 mini, are equipped with advanced reasoning and AI agent functions.
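The xAI API follows an OpenAI-compatible format, so the OpenAI SDK can be pointed at it. A hedged sketch; the base URL and model name are assumptions that should be checked against xAI’s current documentation.

```python
# A hedged sketch: calling Grok through xAI's OpenAI-compatible API using the
# OpenAI Python SDK. Base URL and model name are assumptions; verify with xAI docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_XAI_API_KEY",
    base_url="https://api.x.ai/v1",  # assumed xAI endpoint
)
response = client.chat.completions.create(
    model="grok-2-latest",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain mixture of experts in one paragraph."}],
)
print(response.choices[0].message.content)
```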
Developer: Meta
Release date: December 2024 for Llama 3.3
Number of parameters: Up to 405 billion
Context window: 128,000 tokens
License: Open source
Access: Meta, Hugging Face, Kaggle
Input: Multimodal (image, text)
Output: Text
Llama is Meta AI’s collection of LLMs. These autoregressive models implement an optimized transformer architecture, with tuned versions that apply supervised fine-tuning and reinforcement learning with human feedback (RLHF).5
The Llama 3 collection succeeds the Llama 2 LLMs and offers these models:
● Llama 3.1
● Llama 3.2
● Llama 3.3
Llama 3.1 has an 8-billion-parameter model and a 405-billion-parameter flagship foundation model. Both are multilingual text-only models.
Llama 3.2 comes in 1 and 3 billion parameter sizes that are compact enough for mobile and edge devices. The 11 and 90 billion parameter sizes are multimodal LLMs optimized for answering general questions about an image, captioning, image reasoning and visual recognition.6
Llama 3.3 is a 70-billion-parameter multilingual text-only model. It performs comparably to, or even better than, Llama 3.1 405B while being more cost-efficient.
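Llama checkpoints are gated behind Meta’s license on Hugging Face, but once access is approved they load like any Transformers model. A minimal sketch, assuming the meta-llama/Llama-3.3-70B-Instruct repository and hardware that can host a 70B model:

```python
# A minimal sketch: running Llama 3.3 70B with Hugging Face Transformers
# (pip install transformers torch). The checkpoint is gated and needs an
# approved license; a 70B model also needs multiple GPUs or quantization.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",
    device_map="auto",
)
messages = [{"role": "user",
             "content": "Translate 'good morning' into French, Spanish and German."}]
print(generator(messages, max_new_tokens=128)[0]["generated_text"])
```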
Developer: Mistral AI
Release date: July 2024 for Mistral Large 2
Number of parameters: Up to 124 billion
Context window: Up to 256,000 tokens
License: Mistral Research License, Mistral Commercial License, Apache 2.0
Access: La Plateforme, Amazon Bedrock, Microsoft Azure AI Studio, Google Cloud Vertex AI, IBM watsonx.ai
Input: Multimodal (image, text)
Output: Text
France-based company Mistral AI has a suite of LLMs encompassing these models:
● Mistral Large
● Mistral Small
● Codestral
● Pixtral Large
Mistral Large 2 is Mistral AI’s flagship model. It has 123 billion parameters and a context window of 128,000 tokens. It performs well in code generation, math and reasoning. Mistral Large 2 offers multilingual support and function calling capabilities.
Mistral Small 3 is a more compact version at 24 billion parameters. This model is suitable for rapid-response conversational AI, low-latency function calling and handling inference locally on resource-constrained machines. Mistral Small 3 is open source and released under the Apache 2.0 license.
Codestral 25.01 is the latest generation of Mistral AI’s coding model. It features a context length of 256,000 tokens and supports tasks such as code completion, code correction, code generation and test generation.
Pixtral Large is a 124-billion-parameter multimodal model. It’s built on top of Mistral Large 2 and extends its capabilities to include image understanding.
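Of these, Mistral Small 3 is the one released under Apache 2.0, so it can be run locally. A minimal sketch, assuming the mistralai/Mistral-Small-24B-Instruct-2501 repository name on Hugging Face:

```python
# A minimal sketch: local inference with Mistral Small 3 via Hugging Face
# Transformers (pip install transformers torch). The repo name is an assumption.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    device_map="auto",  # 24B parameters still needs a large GPU or quantization
)
messages = [{"role": "user", "content": "Draft a two-line out-of-office reply."}]
print(generator(messages, max_new_tokens=128)[0]["generated_text"])
```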
Developer: OpenAI
Release date: September 2024 for o1, January 2025 for o3-mini
Number of parameters: Not publicly disclosed
Context window: Up to 200,000 tokens
License: Proprietary
Access: OpenAI API
Input: Multimodal (image, text)
Output: Text
The o1 series of AI models includes o1 and o1-mini. Compared to OpenAI’s GPT models, o1 LLMs are equipped with more advanced reasoning capabilities. Both o1 and o1-mini were trained with large-scale reinforcement learning, allowing them to “think” before responding. They can generate a long chain of thought before answering.
The o1 LLM accepts both image and text inputs, while o1-mini can only handle text inputs.7 Compared to o1, o1-mini is smaller, faster and more cost-effective. It also excels at STEM reasoning and coding.
Meanwhile, o3-mini is the latest reasoning model. Like o1-mini, its strength lies in coding, math and science. It supports function calling and offers 3 reasoning effort options (low, medium and high) to optimize for different scenarios, such as complex problems that need more reasoning effort or simpler problems that require rapid responses and can use less reasoning.
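The reasoning effort options map to a single parameter in the OpenAI API. A minimal sketch:

```python
# A minimal sketch: selecting a reasoning effort level for o3-mini with the
# OpenAI Python SDK (pip install openai).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium" or "high"
    messages=[{"role": "user",
               "content": "Prove that the square root of 2 is irrational."}],
)
print(response.choices[0].message.content)
```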
Developer: Alibaba Cloud
Release date: September 2024 for Qwen 2.5 and January 2025 for Qwen2.5-Max
Number of parameters: Up to 72 billion
Context window: Up to 1 million tokens
License: Open source (Apache 2.0), Proprietary for larger models
Access: Alibaba Cloud, Hugging Face
Input: Multimodal (audio, image, text, video)
Output: Text
Qwen is a series of LLMs from Chinese cloud computing company Alibaba Cloud. Qwen includes language models and variants optimized for audio, coding, math and vision tasks.
Qwen offers these models:
● Qwen 2.5
● Qwen Audio
● Qwen Coder
● Qwen Math
● Qwen VL
Qwen2.5 models are decoder-only models for multilingual language processing tasks. They come in 0.5, 3, 7, 14, 32 and 72 billion parameter sizes. Larger models, such as the 72-billion variant, are available only through API access on Alibaba’s proprietary cloud platform.
Qwen2.5-Turbo features a longer context length of 1 million tokens and a quicker inference speed. Meanwhile, Qwen2.5-Max is the latest large-scale MoE model.
Qwen2-Audio is purpose-built for audio-based tasks. This 7-billion-parameter model can be used to transcribe, detect and classify sounds, handle voice commands and identify musical elements.
Qwen2.5-Coder is a code-specific LLM. It’s available in 1.5, 7, 14 and 32 billion parameter sizes.
Qwen2-Math is a collection of math-optimized LLMs. These models are suitable for advanced mathematical reasoning and solving complex math problems. Qwen2-Math comes in 1.5, 7 and 72 billion parameter sizes.
Qwen2-VL is a vision language model that combines visual processing with natural language understanding. Sample use cases include extracting information from visual data and generating captions and summaries for images and videos. Qwen2-VL is available in 2, 7 and 72 billion parameter sizes.
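The open source Qwen2.5 checkpoints on Hugging Face load with standard Transformers tooling. A minimal sketch with the 7-billion-parameter instruct variant:

```python
# A minimal sketch: running an open source Qwen2.5 checkpoint with Hugging
# Face Transformers (pip install transformers torch).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-7B-Instruct",
    device_map="auto",
)
messages = [{"role": "user", "content": "Write a haiku about autumn in Kyoto."}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```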
Developer: Stability AI
Release date: April 2024 for Stable LM 2 12B
Number of parameters: Up to 12 billion
Context window: 4,096 tokens
License: Stability AI Community License or Enterprise License
Access: Stability AI, Hugging Face
Input: Text
Output: Text
Stable LM is a group of open-access language models from Stability AI, the makers of text-to-image model Stable Diffusion. Stable LM 2 12B has 12 billion parameters, while Stable LM 2 1.6B has 1.6 billion parameters. These are decoder-only LLMs trained on multilingual data and code datasets. Both models incorporate function calling and tool use.
Stable Code 3B is another LLM fine-tuned on code-related datasets. As a lightweight model with 3 billion parameters, Stable Code 3B can be run in real time on devices, even those without a GPU.
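Because Stable Code 3B is small enough for CPU inference, a local completion sketch looks like this, assuming the stabilityai/stable-code-3b repository on Hugging Face:

```python
# A minimal sketch: code completion with Stable Code 3B via Hugging Face
# Transformers (pip install transformers torch); small enough to run on CPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stable-code-3b",
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("stabilityai/stable-code-3b",
                                             trust_remote_code=True)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```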
All links reside outside ibm.com
1 Model Card for C4AI Command R 08-2024, Hugging Face, Accessed 14 February 2025.
2 Model Card for C4AI Command R+ 08-2024, Hugging Face, Accessed 14 February 2025.
3 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, GitHub, 23 January 2025.
4 Access the latest 2.0 experimental models in the Gemini app, Google, 5 February 2025.
5 Model Information, GitHub, 30 September 2024.
6 Model Information, GitHub, 30 September 2024.
7 o1 and o1-mini, OpenAI, Accessed 14 February 2025.