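A foundation model is an AI model trained on vast amounts of broad data so that it can be adapted to a wide range of downstream tasks. Many foundation models are built on the transformer architecture, but the two terms are not synonyms. The term “foundation model” was coined by the Stanford Institute for Human-Centered Artificial Intelligence in 2021.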

A foundation model is built on a neural network architecture, which is loosely inspired by how the human brain processes information. Foundation models can be trained to perform tasks such as data classification, identifying objects within images (computer vision) and understanding and generating text (natural language processing, or NLP) with a high degree of accuracy. Because they are typically pretrained with self-supervised learning, they can generalize and apply their knowledge to new tasks.

Instead of spending time and effort training a model from scratch, data scientists can use pretrained foundation models as starting points to create or customize generative AI models for a specific use case. For example, a foundation model might serve as the basis for a generative AI model that is then fine-tuned with additional manufacturing datasets to help discover safer and faster ways to manufacture a type of product.
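The idea of starting from a pretrained model can be sketched in a few lines. The example below is a toy under stated assumptions, not a real fine-tuning pipeline: `frozen_features` is a hypothetical stand-in for a pretrained model (a real one would return learned embeddings), the manufacturing sentences are invented, and only a small logistic-regression head is trained on top of the frozen features. That is a lightweight form of adaptation rather than full fine-tuning, which would also update the pretrained weights.

```python
import math

# Hypothetical stand-in for a frozen, pretrained model. A real foundation
# model would return learned embeddings; here we count a few risk words.
RISK_WORDS = {"crack", "leak", "overheat"}

def frozen_features(text):
    words = text.lower().split()
    risk = sum(w in RISK_WORDS for w in words)
    return [1.0, float(risk)]  # [bias term, number of risk words]

def train_head(examples, epochs=200, lr=0.5):
    """Fit a small logistic-regression head on top of the frozen features."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for text, label in examples:
            x = frozen_features(text)
            z = sum(w * xi for w, xi in zip(weights, x))
            pred = 1.0 / (1.0 + math.exp(-z))
            # Gradient step on the log loss; only the head's weights change.
            weights = [w - lr * (pred - label) * xi for w, xi in zip(weights, x)]
    return weights

def predict(weights, text):
    z = sum(w * xi for w, xi in zip(weights, frozen_features(text)))
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5

# Small, invented task-specific dataset: 1 = needs attention, 0 = fine.
task_data = [
    ("weld shows a crack near the joint", 1),
    ("coolant leak detected on line two", 1),
    ("unit passed final inspection", 0),
    ("packaging completed on schedule", 0),
]
head = train_head(task_data)

print(predict(head, "motor may overheat under load"))
print(predict(head, "shipment left the dock"))
```

The point of the design is that the expensive part (the feature extractor) is reused as-is, so only a handful of task-specific examples and parameters are needed.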

A specific kind of foundation model known as a large language model (LLM) is trained on vast amounts of text data for NLP tasks. BERT (Bidirectional Encoder Representations from Transformers) is one of the earliest LLM foundation models. Google released BERT as an open-source model in 2018. It was pretrained on a large corpus of English-language data with self-supervision and can be used for a variety of tasks such as:

Analyzing customer/audience sentiment

Answering customer service questions

Predicting text from input data

Generating text based on user prompts

Summarizing large, complex documents

Foundation models versus traditional machine learning models

A foundation model used for generative AI differs from a traditional machine learning model because it can be trained on large quantities of unlabeled data to support applications that generate content or perform tasks.

Meanwhile, a traditional machine learning model is typically trained to perform a single task using labeled data, such as using labeled images of cars to train the model to then recognize cars in unlabeled images.
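The contrast between unlabeled and labeled training data can be made concrete with a short sketch. It assumes a masked-word objective in the spirit of BERT-style pretraining; the function name `make_masked_examples`, the sample sentence and the image file names are all hypothetical. For simplicity it masks each word in turn, whereas real pretraining typically masks a random subset of tokens.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Derive (input, target) training pairs from one unlabeled sentence."""
    words = sentence.split()
    pairs = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        pairs.append((" ".join(masked), target))
    return pairs

# Self-supervised: the text itself supplies the targets, no annotator needed.
unlabeled_text = "the car is red"
self_supervised_pairs = make_masked_examples(unlabeled_text)

# Supervised: every example needs a human-provided label (hypothetical files).
labeled_images = [("img_001.jpg", "car"), ("img_002.jpg", "not_car")]

for masked_input, target in self_supervised_pairs:
    print(f"{masked_input!r} -> {target!r}")
```

Because the targets come from the data itself, this kind of objective scales to very large text corpora, while the labeled set grows only as fast as people can annotate it.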

Foundation models focused on enterprise value

IBM’s watsonx.ai studio offers a suite of language and code foundation models, each with a geology-themed code name, that can be customized for a range of enterprise tasks. All watsonx.ai models are trained on IBM’s curated, enterprise-focused data lake.

Available now: Slate

Slate refers to a family of encoder-only models, which, while not generative, are fast and effective for many enterprise NLP tasks.

Coming soon: Granite

Granite models are based on a decoder-only, GPT-like architecture for generative tasks.
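One common way to picture the difference between the encoder-only and decoder-only designs is the attention mask each one uses. The sketch below is a general illustration of these architecture families, not a description of the internals of Slate or Granite; `attention_mask` and `show` are hypothetical helper names.

```python
def attention_mask(n_tokens, causal):
    """Return mask[i][j] == True when token i may attend to token j."""
    return [
        [(j <= i) if causal else True for j in range(n_tokens)]
        for i in range(n_tokens)
    ]

def show(mask):
    """Render a mask as rows of 'x' (allowed) and '.' (blocked)."""
    return "\n".join(" ".join("x" if ok else "." for ok in row) for row in mask)

# Encoder-only: every token sees the whole input (bidirectional context).
print("encoder-only (bidirectional):")
print(show(attention_mask(4, causal=False)))

# Decoder-only: each token sees only itself and earlier tokens, which is
# what allows text to be generated one token at a time.
print("decoder-only (causal):")
print(show(attention_mask(4, causal=True)))
```

The causal mask is what makes a decoder-only model suitable for generation, while the unrestricted mask lets an encoder-only model use context from both sides of a word when classifying or extracting information.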

Coming soon: Sandstone

Sandstone models use an encoder-decoder architecture and are well suited for fine-tuning on specific tasks.

Coming soon: Obsidian

Obsidian models use a new modular architecture developed by IBM Research that provides high inference efficiency and strong performance across a variety of tasks.