Foundation models are artificial intelligence (AI) models trained on vast datasets that can perform a broad range of general tasks. They serve as the base, or building blocks, for crafting more specialized applications.
Their flexibility and massive size set them apart from traditional machine learning models, which are trained on smaller datasets to accomplish specific tasks, such as object detection or trend forecasting. Foundation models, meanwhile, employ transfer learning to apply the knowledge learned from one task to another. This makes them fit for more expansive domains, including computer vision, natural language processing (NLP) and speech recognition.
Researchers at Stanford University’s Center for Research on Foundation Models and Institute for Human-Centered Artificial Intelligence coined the term “foundation models” in a 2021 paper. They characterize these models as a “paradigm shift” and describe the reasoning behind their naming: “[A] foundation model is itself incomplete but serves as the common basis from which many task-specific models are built via adaptation. We also chose the term ‘foundation’ to connote the significance of architectural stability, safety and security: poorly constructed foundations are a recipe for disaster and well-executed foundations are a reliable bedrock for future applications.”1
Building a foundation model often involves a series of steps akin to developing a conventional machine learning model:
The first step is to collate a huge corpus of data from diverse sources. This sweeping spectrum of unlabeled, unstructured data allows foundation models to infer patterns, recognize relationships, discern context and generalize their knowledge.
Modality refers to the type of data that a model can process, including audio, images, software code, text and video. Foundation models can be either unimodal or multimodal. Unimodal models are designed to handle a single type of data, such as receiving text inputs and generating text outputs. Multimodal models can combine information from multiple modalities, such as taking a text prompt and creating an image or producing written transcripts from a voice recording.
Many foundation models employ a deep learning architecture, which uses multilayered neural networks to mimic the human brain’s decision-making process.
A type of deep learning model known as the transformer model has been an architecture of choice for foundation models, particularly those for NLP like the generative pre-trained transformer (GPT) line of models. Here’s a brief overview of the transformer architecture:
Encoders transform input sequences into numerical representations called embeddings that capture the semantics and position of tokens in the input sequence.
A self-attention mechanism allows transformers to “focus their attention” on the most important tokens in the input sequence, regardless of their position.
Decoders use this self-attention mechanism and the encoders’ embeddings to generate the most statistically probable output sequence.
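The attention step described above can be illustrated with a minimal, single-head version of scaled dot-product self-attention. This is an illustrative sketch in NumPy; the sequence length, dimensions and random weights are made up for demonstration and come from no real model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: projection matrices (learned in a real model).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how strongly each token attends to the others
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)
```

In practice, transformers stack many such attention heads and layers, and the projection matrices are learned during training rather than drawn at random.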
Diffusion models are another architecture implemented in foundation models. Diffusion-based neural networks gradually “diffuse” training data with random noise, then learn to reverse that diffusion process to reconstruct the original data. Diffusion models are primarily used in text-to-image foundation models like Google’s Imagen, OpenAI’s DALL-E (starting with DALL-E 2) and Stability AI’s Stable Diffusion.
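The forward, "noising" half of that process has a simple closed form: a training example is mixed with Gaussian noise according to a schedule, and the network learns to predict the noise that was added. A minimal NumPy sketch, with an arbitrary toy schedule rather than one from any published model:

```python
import numpy as np

def add_noise(x0, alpha_bar, rng):
    """Sample a noised version of x0 at one diffusion step.

    alpha_bar: cumulative noise-schedule term; near 1 means little
    noise, near 0 means the sample is almost pure noise.
    """
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps
    return xt, eps  # a diffusion model is trained to predict eps from xt

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))      # a toy stand-in for an image
for ab in (0.99, 0.5, 0.01):      # progressively noisier steps
    xt, eps = add_noise(x0, ab, rng)
    print(ab, round(float(np.corrcoef(x0.ravel(), xt.ravel())[0, 1]), 2))
```

The printed correlation between the original and noised samples shrinks as the schedule term decreases; generation runs this process in reverse, starting from noise.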
Training typically entails self-supervised learning, in which foundation models learn inherent correlations in unlabeled data. Training happens over multiple iterations, with model weights adjusted to minimize prediction errors and hyperparameters tuned to find the optimal configuration variables for training. Regularization methods can also be applied to correct for overfitting (when a model fits too closely or even exactly to its training data) and to improve a foundation model’s ability to generalize.
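That loop can be reduced to a toy example: a linear model that predicts the next value of a sequence from its recent context. The setup is self-supervised because the "labels" come from the unlabeled sequence itself. The data, window size and learning rate here are arbitrary choices for illustration:

```python
import numpy as np

series = np.sin(np.linspace(0, 20, 200))   # unlabeled sequence data
# Each training example: a 3-value context window and the value that follows it.
X = np.stack([series[i:i + 3] for i in range(len(series) - 3)])
y = series[3:]                             # the "label" comes from the data itself

w = np.zeros(3)
lr = 0.05                                  # learning rate (a hyperparameter)
for step in range(500):                    # multiple training iterations
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                         # adjust weights to reduce prediction error

print(round(float(np.mean((X @ w - y) ** 2)), 4))  # training error after the loop
```

Real foundation-model training follows the same shape, but with billions of parameters, enormous corpora and objectives such as next-token prediction.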
A foundation model’s performance can be validated by using standardized benchmarks. The results from these assessments can inform further improvements or performance optimizations.
Developing a foundation model from scratch can be a costly, computationally intensive and time-consuming process. That’s why enterprises might consider adapting existing foundation models for their particular needs. These models can be accessed through an application programming interface (API) or by using a local copy of the model.
Here are two common approaches to adaptation:
During fine-tuning, a pretrained foundation model adapts its general knowledge to a particular task. This involves further training by using supervised learning on a smaller, domain-specific or task-specific dataset that includes labeled examples. The model’s parameters are updated to optimize its performance on the task.
Because fine-tuning alters a model’s parameters, it might affect how the model performs on other tasks. Creating a labeled dataset is also a tedious process.
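A stripped-down sketch of the fine-tuning idea, using synthetic stand-ins: the "pretrained" weights here are just a random linear layer, and the small labeled dataset is generated from a hidden rule. Continuing gradient descent from the pretrained weights adapts the parameters to the new task:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a pretrained model: a linear layer with "general" weights.
w_pretrained = rng.normal(size=4)

# Small labeled, task-specific dataset (synthetic for illustration).
X = rng.normal(size=(64, 4))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0]) > 0).astype(float)  # hidden task rule

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Fine-tuning: continue supervised training from the pretrained weights.
w = w_pretrained.copy()
for _ in range(300):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / len(y)   # gradient step on the cross-entropy loss

acc_before = np.mean((sigmoid(X @ w_pretrained) > 0.5) == y)
acc_after = np.mean((sigmoid(X @ w) > 0.5) == y)
print(acc_before, acc_after)
```

Because the weights themselves change, the adapted model fits the new task at the possible expense of whatever the original weights encoded, which is the trade-off noted above.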
Prompting entails providing a prompt to tailor a foundation model to a certain task. The prompt comes in the form of task-related instructions or task-relevant examples that guide the model, allowing it to gain context and generate a plausible output, an ability known as in-context learning.
While prompting doesn’t require training a model or changing its parameters, it can take several tries to get the right prompt that conditions a model to understand context and make fitting predictions.
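In-context learning needs no training code at all; the work is in assembling the prompt. A small sketch of few-shot prompt construction follows (the sentiment task and the prompt format are illustrative assumptions, not tied to any particular model):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a prompt that conditions a model via in-context examples."""
    lines = [instruction, ""]
    for text, label in examples:                   # task-relevant examples
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")   # the model completes this line
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as Positive or Negative.",
    [("Loved it, would buy again.", "Positive"),
     ("Broke after two days.", "Negative")],
    "Fast shipping and works perfectly.",
)
print(prompt)
```

Iterating on the instruction wording, the examples chosen and their order is typically what it takes to find a prompt that conditions the model reliably.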
The adaptability and general-purpose nature of foundation models mean they can be implemented for various real-world applications:
Computer vision
Natural language processing
Healthcare
Robotics
Software code generation
Foundation models can be used to generate and classify images and to detect, identify and describe objects. DALL-E, Imagen and Stable Diffusion are examples of text-to-image foundation models.
Large language models (LLMs) are a class of foundation models that excel in NLP and natural language understanding (NLU). Their capabilities encompass question answering, text summarization, transcription, translation and video captioning, among others.
Here are a few popular foundation models in the NLP space:
BERT (Bidirectional Encoder Representations from Transformers) was one of the first foundation models. Released by Google in 2018, this open source AI system was trained on only a plain-text corpus.2
BLOOM is an open-access multilingual language model trained on 46 languages. It’s the result of a collaborative effort between Hugging Face and BigScience, a community of AI researchers.3
Claude is Anthropic’s family of foundation models with advanced reasoning and multilingual processing capabilities.
GPT, OpenAI’s foundation model, is the backbone of ChatGPT, the company’s generative AI chatbot. GPT-3.5 powers the free version of ChatGPT, while GPT-4 is behind the premium version. The GPT-4 series is also the generative AI model that supports Microsoft’s Copilot AI assistant.
Granite is the IBM® flagship series of LLM foundation models, based on a decoder-only transformer architecture. The Granite 13b chat model is optimized for dialogue use cases and works well with virtual agent and chat applications, while the Granite multilingual model is trained to understand and generate text in English, German, Spanish, French and Portuguese.
PaLM 2 is Google’s next-generation language model with enhanced multilingual and reasoning capabilities.
Within the healthcare field, foundation models can aid in a range of tasks, from creating summaries of patient visits and searching medical literature to answering patient questions, matching patients with clinical trials and facilitating drug discovery. The Med-PaLM 2 language model, for instance, can answer medical questions, and Google is designing a multimodal version that can synthesize information from medical images.4
In the realm of robotics, foundation models can help robots rapidly adapt to new environments and generalize across various tasks, scenarios and machine embodiments. For example, the PaLM-E embodied multimodal language model transfers knowledge from PaLM’s language and visual domains to robotics systems and is trained on robot sensor data.5
Foundation models can assist with completing, debugging, explaining and generating code in different programming languages. These text-to-code foundation models include Anthropic’s Claude, Google’s Codey and PaLM 2 and IBM’s Granite Code model family trained on 116 programming languages.
With so many options, how can organizations choose the right foundation model for AI development? A six-step AI model selection framework can help.
Building upon foundation models can lead to automation and innovation for enterprises. Here are other advantages businesses can gain from foundation models:
Accelerated time to value and time to scale: Adopting existing models eliminates the development and pretraining phases, allowing companies to swiftly customize and deploy fine-tuned models.
Access to data: Organizations don’t need to compile large amounts of data for pretraining that they might not have the means to acquire.
Baseline accuracy and performance: Foundation models have already been evaluated for accuracy and performance, offering a high-quality starting point.
Reduced cost: Enterprises won’t need to spend on the resources needed to create a foundation model from the ground up.
Like other AI models, foundation models carry the inherent risks of AI. This is a factor to keep in mind for enterprises considering foundation models as the technology underpinning their internal workflows or commercial AI applications.
Bias: A model can learn from the human bias present in the training data, and that bias can trickle down to the outputs of fine-tuned models.
Computational costs: Using existing foundation models still requires significant memory, advanced hardware such as GPUs (graphics processing units) and other computational resources to fine-tune, deploy and maintain.
Data privacy and intellectual property: Foundation models might be trained on data obtained without the consent or knowledge of its owners. Exercise caution when feeding data into algorithms to avoid infringing on the copyright of others or exposing personally identifiable or proprietary business information.
Environmental toll: Training and running large-scale foundation models involves energy-intensive computations that contribute to increased carbon emissions and water consumption.
Hallucinations: Verifying the results of AI foundation models is essential to make sure they’re producing factually correct outputs.
All links reside outside ibm.com
1 On the Opportunities and Risks of Foundation Models, Stanford Center for Research on Foundation Models and Stanford Institute for Human-Centered Artificial Intelligence, 2021
2 Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing, Google Research, 2 November 2018
3 BigScience Large Open-science Open-access Multilingual Language Model, Hugging Face, 6 July 2022
4 Med-PaLM, Google Research, Accessed 8 October 2024
5 PaLM-E: An embodied multimodal language model, Google Research, 10 March 2023