Llama 2 is a family of pre-trained and fine-tuned large language models (LLMs) released by Meta AI in 2023. Released free of charge for research and commercial use, Llama 2 models are capable of a wide range of natural language processing (NLP) tasks, from generating text to writing programming code.
The Llama 2 model family, offered as both base foundation models and fine-tuned “chat” models, serves as the successor to the original LLaMa 1 models, which were released in 2023 under a noncommercial license granting access on a case-by-case basis exclusively to research institutions. Unlike their predecessors, Llama 2 models are available free of charge for both AI research and commercial use.
Meta’s Llama models thus aim to play an important role in democratizing the generative AI ecosystem. As noted in the Llama 2 research paper (link resides outside ibm.com), though the methodology for pre-training autoregressive LLMs via self-supervised learning is, by now, relatively straightforward and well understood, the immense computational requirements of the process have largely limited the development of cutting-edge LLMs to a few key players. Because most state-of-the-art LLMs, like OpenAI’s GPT, Anthropic’s Claude and Google’s Bard, are proprietary (and massive) closed-source models, the AI research community’s access—access that might help explain how and why these models work, and how to better align their development with human interests—has been markedly limited.
In addition to making its code and model weights freely available, the Llama project has focused on advancing the performance of smaller models rather than on increasing parameter count. Whereas most prominent closed-source models have hundreds of billions of parameters, Llama 2 models are offered with seven billion (7B), 13 billion (13B) or 70 billion (70B) parameters.
This enables smaller organizations, like startups and members of the research community, to deploy local instances of Llama 2 models—or Llama-based models developed by the AI community—without needing prohibitively expensive computing time or infrastructure investments.
The Llama 2 research paper details several advantages the newer generation of AI models offers over the original LLaMa models.
Though Meta has made the starting code and model weights for Llama 2 models freely available for research and commercial use, certain restrictions in its licensing agreement have caused debate regarding whether it can properly be called “open source.”
The debate is somewhat technical and semantic: though “open source” is often used colloquially to refer to any software (or other programming tools) whose source code is distributed free of charge, it is actually a formal designation stewarded by the Open Source Initiative (OSI). The OSI only certifies a given software license as “Open Source Initiative approved” if it deems said license to meet the ten requirements listed in the official Open Source Definition (OSD) (link resides outside ibm.com).
As explained in a statement from OSI Executive Director Stefano Maffulli, “OSI is pleased to see that Meta is lowering barriers for access to powerful AI systems. Unfortunately, the tech giant has created the misunderstanding that LLaMa 2 is ‘open source’ – it is not.” 1
The discrepancy stems from two aspects of the Llama 2 license agreement: 2

- Any company with more than 700 million monthly active users must request a license from Meta, which Meta may grant or withhold at its sole discretion.
- Licensees may not use Llama 2 (or its outputs) to improve any other large language model besides Llama 2 and its derivatives.

These restrictions contradict two points of the OSD: 3

- No discrimination against persons or groups: licensing terms cannot single out specific users or organizations.
- No discrimination against fields of endeavor: licensing terms cannot restrict the purposes for which the software may be used.
To acknowledge both the open spirit of Llama 2 and its failure to meet the technical definition of “open source,” some in the tech community have used the term “open approach.” 4
Llama 2 base models are pre-trained foundation models meant to be fine-tuned for specific use cases, whereas Llama 2 chat models are already optimized for dialogue.
Llama 2 is a family of transformer-based autoregressive causal language models. Autoregressive language models take a sequence of words as input and recursively predict—output—the next word(s).
During self-supervised pre-training, LLMs are provided the beginning of sample sentences drawn from a massive corpus of unlabeled data and tasked with predicting the next word. In training the model to minimize the divergence between ground truth (the actual next word) and its own predictions, the model learns to replicate linguistic and logical patterns in the training data. Though the research paper notably omits details on specific data sources, it states that Llama 2 was trained on 2 trillion tokens—the numerically represented words, word parts, phrases and other semantic fragments that transformer-based neural networks use for language processing—drawn from publicly available sources.
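This next-word prediction behavior is easy to observe by loading a base model and asking it to continue a snippet of text. The sketch below is a minimal illustration, not Meta’s own code: it assumes the Hugging Face transformers library and the gated "meta-llama/Llama-2-7b-hf" checkpoint, which requires accepting Meta’s license on the Hugging Face Hub before downloading.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Gated checkpoint: requires accepting Meta's license on the Hugging Face Hub.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The base model simply continues the input text, one predicted token at a time.
inputs = tokenizer(
    "The immense computational requirements of pre-training", return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```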
On a fundamental level, base foundation models are not pre-trained to actually answer a prompt: they append text to it in a grammatically coherent way. An out-of-the-box foundation model might respond to a prompt of “teach me to bake cookies” with “for a holiday party.” Further fine-tuning, via techniques like supervised learning and reinforcement learning, is required to train a foundation model for a specific application like dialogue, instruction following or creative writing.
Instead, base Llama 2 models are intended to serve as a foundation to build a purpose-specific model upon. To date, Llama 2 (and the original LLaMa) models have served as the base of several prominent open source LLMs, including:

- Alpaca: Stanford’s instruction-following model, created by fine-tuning LLaMa 7B with supervised learning. 5
- Vicuna: a chat model from LMSYS Org, fine-tuned on LLaMa 13B using user-shared conversations. 6
- Orca 2: Microsoft’s fine-tuned version of Llama 2, designed to teach smaller models to reason. 7
- WizardLM: a Llama-based model fine-tuned to follow complex instructions. 8
Llama-2-chat models are fine-tuned for dialogue-driven use cases, similar to the specific GPT model versions used in ChatGPT.
Supervised fine-tuning (SFT) was used to prime the pre-trained Llama 2 base model to generate responses in the format expected by users in a chatbot or virtual agent setting. In a series of supervised learning tasks, labeled pairs of dialogue-style exchanges, annotated as (prompt, response), are used to train the model to minimize the divergence between its own response to a given prompt and the example response provided by the labeled data. The model thus learns, for example, that the proper response to a prompt of “teach me to bake cookies” is to provide actual instructions for baking cookies, rather than merely completing the sentence.
Rather than using millions of labeled examples, the paper states that results were improved by using “fewer but higher-quality examples,” noting that Meta AI collected 27,540 annotated samples.
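A minimal sketch of that supervised objective is shown below, assuming the Hugging Face transformers library, the same gated 7B base checkpoint used earlier and a single hypothetical (prompt, response) pair. The cross-entropy loss is computed only over the response tokens, with prompt positions masked out (the Llama 2 paper likewise zeroes out the loss on prompt tokens).

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # gated base checkpoint, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# One hypothetical labeled (prompt, response) pair.
prompt = "Teach me to bake cookies."
response = "Preheat the oven to 180 C, cream the butter and sugar, then..."

prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + " " + response, return_tensors="pt").input_ids

# Cross-entropy is computed only on the response tokens: prompt positions
# are set to -100, which the loss function ignores.
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()  # in a real SFT loop, an optimizer step would follow
```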
Following SFT, Meta used reinforcement learning with human feedback (RLHF) to further align the chat models’ behavior with human preferences and instructions. In RLHF, direct human feedback is used to train a “reward model” to learn which kinds of responses humans prefer. The reward model’s predictions of whether a given response would be preferred by humans are translated into a scalar reward signal, which is then used to further train Llama-2-chat via reinforcement learning.
There are many different methods and formats in which that human feedback can be collected. Meta AI used a simple method of binary comparison: human annotators were asked to write a prompt, then choose between two model responses—based on criteria provided by Meta—generated by two different variants of Llama 2. To help the reward model properly weight these choices, annotators were also asked to rate the degree to which they preferred their chosen response over the other: “significantly better,” “slightly better” or “negligibly better/unsure.”
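The Llama 2 paper trains its reward models on these comparisons with a binary ranking loss that includes a margin term reflecting how strongly the annotator preferred the chosen response. A minimal sketch of that loss, with made-up reward scores and margin values, might look like this:

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen, r_rejected, margin):
    # The preferred response's reward should exceed the rejected response's
    # reward by at least the margin; larger margins correspond to stronger
    # stated preferences ("significantly better" > "slightly better").
    return -F.logsigmoid(r_chosen - r_rejected - margin).mean()

# Illustrative scalar rewards for a batch of three comparisons (not real data).
r_chosen = torch.tensor([1.2, 0.4, 0.9])
r_rejected = torch.tensor([0.3, 0.5, 0.1])
margin = torch.tensor([1.0, 0.3, 0.0])

print(reward_ranking_loss(r_chosen, r_rejected, margin))
```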
Human preferences were used to train two separate reward models: one optimized for helpfulness, the other optimized for safety (that is, avoiding toxic or hateful responses, or responses that might aid violence or criminal activity). In addition to proximal policy optimization (PPO), the algorithm typically used to update LLM model weights in RLHF, Meta also used rejection sampling (link resides outside ibm.com) to update Llama-2-chat-70B.
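At a high level, rejection sampling draws several candidate responses from the current model, keeps whichever one the reward model scores highest, and uses those best responses as new fine-tuning targets. The sketch below assumes an already-loaded model and tokenizer and a hypothetical `reward_model` callable (Meta’s reward models are not publicly released):

```python
def rejection_sample(model, tokenizer, reward_model, prompt, k=4):
    """Draw k candidate responses and keep the one the reward model scores highest."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    candidates = []
    for _ in range(k):
        out = model.generate(
            **inputs, do_sample=True, top_p=0.9, temperature=0.9, max_new_tokens=128
        )
        candidates.append(tokenizer.decode(out[0], skip_special_tokens=True))
    scores = [reward_model(c) for c in candidates]  # hypothetical scalar reward per candidate
    return candidates[scores.index(max(scores))]    # best candidate becomes a new training target
```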
Code Llama, built on top of Llama 2, is fine-tuned for generating code (and natural language about code) from both code-based and natural language-based prompts. Introduced shortly after the release of the Llama 2 base and chat models, it’s free for research and commercial use.
Supporting most popular programming languages, including Python, C++, Java, PHP, and JavaScript (among others), it’s available in model sizes of 7B, 13B and 34B parameters, and boasts a context length of up to 100,000 tokens. Two additional variations, Code Llama - Python and Code Llama - Instruct, are fine-tuned for Python (and PyTorch) and instruction following, respectively.
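As a sketch of how Code Llama can be prompted, the example below uses the Hugging Face transformers library with the "codellama/CodeLlama-7b-Instruct-hf" checkpoint (one hosted variant of Code Llama - Instruct; access terms apply) and a plain natural-language request wrapped in the Llama 2 chat-style [INST] tags:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "codellama/CodeLlama-7b-Instruct-hf"  # instruction-tuned 7B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Code Llama - Instruct expects an instruction-style prompt.
prompt = "[INST] Write a Python function that checks whether a string is a palindrome. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```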
Relative to their closed-source competitors, Llama 2 models excel in areas like safety and factual accuracy. Though Llama 2 may not match the full capabilities of much larger models, its open availability and greater efficiency present unique advantages.
In comparing Llama 2 to the flagship proprietary models from competitors like OpenAI, Anthropic and Google, it’s important to consider scale. Though closed-source models do not always disclose the full details of their architecture, available information strongly suggests that they all greatly exceed the largest Llama 2 models’ 70 billion parameters:

- OpenAI’s GPT-3 has 175 billion parameters; GPT-4’s size is undisclosed, but has been reported to be roughly 1 trillion parameters. 9
- Google’s PaLM has 540 billion parameters, 11 and its successor, PaLM 2, was reportedly trained on nearly five times as much data. 10
- Anthropic has not disclosed parameter counts for Claude, but its published research describes models with as many as 175 billion parameters. 12
According to the Llama 2 research paper, human evaluators preferred Llama-2-chat 70B responses to those of GPT-3.5-turbo-0301, the standard model for ChatGPT: Llama 2 responses had a win rate of 36% and a tie rate of 31.5%. Relative to PaLM Bison, the second-largest PaLM model, Llama-2-chat 70B had a win rate of over 50%.
In Meta’s testing, the 7B, 13B and 70B Llama 2 models all had significantly lower safety violation percentages than PaLM Bison—roughly 3% to 4%, compared to PaLM’s 27%—as well as lower safety violation percentages than ChatGPT’s 7%. This is a major strength for enterprise use cases, in which toxic, hateful or inflammatory language from chatbots can have major consequences.
An inherent advantage of smaller, open models over massive closed-source models is the freedom for businesses to run local model instances, and the ability to do so cost-effectively without massive investments in infrastructure or cloud computing. Running a local model ensures that proprietary code, training modifications and proprietary data can be used to fine-tune model performance without being uploaded to a commercial server or potentially being used in future training of closed-source models. Furthermore, smaller model sizes, like the 7B and 13B variants, enable smoother performance in environments like mobile apps, where processing power is limited.
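One common way to make such local deployment affordable is quantization. The sketch below assumes the Hugging Face transformers and bitsandbytes libraries and the same gated 7B checkpoint used earlier, loading the model in 4-bit precision so it can fit on a single consumer GPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # gated checkpoint, as above

# 4-bit quantization via bitsandbytes shrinks memory use enough for a single
# consumer GPU, at a modest cost in accuracy.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
```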
Llama 2 does not have its own dedicated API, but it’s accessible through multiple providers.
All links reside outside ibm.com.
1 "Meta's LLaMa 2 license is not Open Source", Voices of Open Source, 20 July 2023
2 "Llama 2 Community License Agreement", Meta, 18 July 2023
3 "The Open Source Definition", Open Source Initiative, last modified 22 Feb 2023
4 "Statement of Support for Meta’s Open Approach to Today’s AI", Meta, 18 July 2023
5 "Alpaca: A Strong, Replicable Instruction-Following Model", Stanford CRFM, 13 Mar 2023
6 "Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality", LMSYS Org, 30 Mar 2023
7 "Orca-2: Teaching Small Language Models How to Reason", Microsoft, Nov 2023
8 "WizardLM: Empowering Large Language Models to Follow Complex Instructions", arXiv, 10 June 2023
9 "The secret history of Elon Musk, Sam Altman, and OpenAI", Semafor, 24 Mar 2023
10 "Google’s newest A.I. model uses nearly five times more text data for training than its predecessor", CNBC, 16 May 2023
11 "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance", Google, 4 Apr 2023
12 "The Capacity for Moral Self-Correction in Large Language Models", arXiv, 18 Feb 2023