We stand on the frontier of an AI revolution. Over the past decade, deep learning arose from a seismic collision of data availability and sheer compute power, enabling a host of impressive AI capabilities. But we’ve faced a paradoxical challenge: automation is labor intensive. It sounds like a joke, but it’s not, as anyone who has tried to solve business problems with AI may know.
Traditional AI tools, while powerful, can be expensive, time-consuming, and difficult to use. Data must be laboriously collected, curated, and labeled with task-specific annotations to train AI models. Building a model requires specialized, hard-to-find skills — and each new task requires repeating the process. As a result, businesses have focused mainly on automating tasks with abundant data and high business value, leaving everything else on the table. But this is starting to change.
The emergence of transformers and self-supervised learning methods has allowed us to tap into vast quantities of unlabeled data, paving the way for large pre-trained models, sometimes called “foundation models.” These large models have lowered the cost and labor involved in automation.
Foundation models provide a powerful and versatile base for a variety of AI applications. We can use them to quickly perform tasks with limited annotated data and minimal effort; in some cases, we need only to describe the task at hand to coax the model into solving it.
But these powerful technologies also introduce new risks and challenges for enterprises. Many of today’s models are trained on datasets of unknown quality and provenance, leading to offensive, biased, or factually incorrect responses. The largest models are expensive, energy-intensive to train and run, and complex to deploy.
We at IBM have been developing an approach that addresses the core challenges of using foundation models in the enterprise. Today, we announced watsonx.ai, IBM’s gateway to the latest AI tools and technologies on the market. In a testament to how fast the field is moving, some tools are just weeks old, and we are adding new ones as I write.
What’s included in watsonx.ai, part of IBM’s larger watsonx offerings announced this week, is varied and will continue to evolve, but our overarching promise remains the same: to provide safe, enterprise-ready automation products.
It’s part of our ongoing work at IBM to accelerate our customers’ journey to derive value from this new paradigm in AI. Here, I’ll describe our work to build a suite of enterprise-grade, IBM-trained foundation models, including our approach to data and model architectures. I’ll also outline our new platform and tooling that enables enterprises to build and deploy foundation model-based solutions using a wide catalog of open-source models, in addition to our own.
Data: the foundation of your foundation model
Data quality matters. An AI model trained on biased or toxic data will naturally tend to produce biased or toxic outputs. This problem is compounded in the era of foundation models, where the data used to train models typically comes from many sources and is so abundant that no human being could reasonably comb through it all.
Since data is the fuel that drives foundation models, we at IBM have focused on meticulously curating everything that goes into our models. We have developed AI tools to aggressively filter our data for hate and profanity, licensing restrictions, and bias. When objectionable data is identified, we remove it, retrain the model, and repeat.
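To make the filter-and-remove step concrete, here is a minimal sketch in Python. It is not IBM’s tooling: the blocklist, the license allowlist, and the function names are all hypothetical stand-ins, and simple keyword checks take the place of the AI-based filters described above.

```python
from typing import Callable, Dict, List, Tuple

Document = Dict[str, str]
# A filter returns True when the document should be removed.
Filter = Callable[[Document], bool]

# Hypothetical placeholders; a real pipeline would use trained classifiers.
BLOCKED_TERMS = {"badword"}
ALLOWED_LICENSES = {"cc0", "mit", "apache-2.0"}


def has_blocked_term(doc: Document) -> bool:
    """Flag documents containing any term on the (hypothetical) blocklist."""
    words = doc["text"].lower().split()
    return any(word.strip(".,!?") in BLOCKED_TERMS for word in words)


def has_restricted_license(doc: Document) -> bool:
    """Flag documents whose license is not on the (hypothetical) allowlist."""
    return doc.get("license", "").lower() not in ALLOWED_LICENSES


def curate(
    docs: List[Document], filters: List[Tuple[str, Filter]]
) -> Tuple[List[Document], List[Tuple[Document, str]]]:
    """Split docs into kept and removed, recording the first filter that fired."""
    kept: List[Document] = []
    removed: List[Tuple[Document, str]] = []
    for doc in docs:
        reason = next((name for name, check in filters if check(doc)), None)
        if reason is None:
            kept.append(doc)
        else:
            removed.append((doc, reason))
    return kept, removed


docs = [
    {"text": "Quarterly revenue grew.", "license": "cc0"},
    {"text": "This contains a badword!", "license": "mit"},
    {"text": "Proprietary manual.", "license": "all-rights-reserved"},
]
filters = [("profanity", has_blocked_term), ("license", has_restricted_license)]
kept, removed = curate(docs, filters)
```

Recording the reason alongside each removed document is one simple way to keep the curation step auditable when the filters are later refined and rerun.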
Data curation is a task that’s never truly finished. We continue to develop and refine new methods to improve data quality and controls, to meet an evolving set of legal and regulatory requirements. We have built an end-to-end framework to track the raw data that’s been cleaned, the methods that were used, and the models that each datapoint has touched.
We continue to gather high-quality data to help tackle some of the most pressing business challenges across a range of domains like finance, law, cybersecurity, and sustainability. We are currently targeting more than 1 terabyte of curated text for training our foundation models, while adding curated software code, satellite data, and IT network event data and logs.
IBM Research is also developing techniques to infuse trust throughout the foundation model lifecycle, to mitigate bias and improve model safety. Our work in this area includes FairIJ, which identifies biased data points in data used to tune a model, so that they can be edited out. Other methods, like fairness reprogramming, allow us to mitigate biases in a model even after it has been trained.
Efficient foundation models focused on enterprise value
IBM’s new watsonx.ai studio offers a suite of foundation models aimed at delivering enterprise value. They’ve been incorporated into a range of IBM products that will be made available to IBM customers in the coming months.
Recognizing that one size doesn’t fit all, we’re building a family of language and code foundation models of different sizes and architectures. Each model family has a geology-themed code name (Granite, Sandstone, Obsidian, or Slate) and brings together cutting-edge innovations from IBM Research and the open research community. Each model can be customized for a range of enterprise tasks.
Our Granite models are based on a decoder-only, GPT-like architecture for generative tasks. Sandstone models use an encoder-decoder architecture, are well suited for fine-tuning on specific tasks, and can be used interchangeably with Google’s popular T5 models. Obsidian models utilize a new modular architecture developed by IBM Research, providing high inference efficiency and high levels of performance across a variety of tasks. Slate refers to a family of encoder-only (RoBERTa-based) models, which, while not generative, are fast and effective for many enterprise NLP tasks. All watsonx.ai models are trained on IBM’s curated, enterprise-focused data lake, on our custom-designed cloud-native AI supercomputer, Vela.
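One common way to picture the difference between decoder-only and encoder-only architectures is through the attention mask each one uses. The sketch below is a generic illustration, not code from any IBM model, and the function names are hypothetical. A decoder-only (GPT-like) model uses a causal mask, so each token attends only to itself and earlier tokens, which is what makes left-to-right generation possible. An encoder-only (RoBERTa-like) model lets every token attend to every other token.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(n: int) -> Mask:
    """Decoder-only style: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> Mask:
    """Encoder-only style: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def show(mask: Mask) -> str:
    """Render a mask as rows of 1s (may attend) and 0s (masked out)."""
    return "\n".join(" ".join("1" if cell else "0" for cell in row) for row in mask)


print("causal (decoder-only):")
print(show(causal_mask(4)))
print("bidirectional (encoder-only):")
print(show(bidirectional_mask(4)))
```

An encoder-decoder model in the T5 style combines the two: a bidirectional encoder reads the input, and a causal decoder generates the output while attending to the encoder’s representation.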
Efficiency and sustainability are core design principles for watsonx.ai. At IBM Research, we’ve invented new technologies for efficient model training, including our “LiGO” algorithm that recycles small models and “grows” them into larger ones. This method can save from 40% to 70% of the time, cost, and carbon output required to train a model. To improve inference speeds, we’re leveraging our deep expertise in quantization, or shrinking models from 32-bit floating-point arithmetic to much smaller integer bit formats. Reducing AI model precision brings huge efficiency benefits without sacrificing accuracy. We hope to soon run these compressed models on our AI-optimized chip, the IBM AIU.
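For readers who want to see what quantization means in practice, here is a minimal sketch of one textbook scheme: symmetric, per-tensor quantization of floating-point weights to signed 8-bit integers. It is an illustration only, not the method IBM uses, and the function names and sample weights are made up for the example. Each weight is divided by a shared scale factor and rounded, so the rounding error per weight is at most half a scale step.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 for an empty or all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 0.8]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Largest absolute difference between original and restored weights.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Storing each weight in 8 bits instead of 32 cuts memory by roughly a factor of four, at the cost of the small rounding error measured above. Production systems typically refine this idea with per-channel scales and calibration data.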
Hybrid cloud tools for foundation models
The final piece of the foundation model puzzle is creating an easy-to-use software platform for tuning and deploying models. IBM’s hybrid, cloud-native inference stack, built on Red Hat OpenShift, has been optimized for training and serving foundation models. Enterprises can leverage OpenShift’s flexibility to run models from anywhere, including on-premises.
We’ve created a suite of tools in watsonx.ai that provide customers with a user-friendly interface and developer-friendly libraries for building foundation model-based solutions. Our Prompt Lab enables users to rapidly perform AI tasks with just a few labeled examples. The Tuning Studio enables rapid and robust model customization using your own data, based on state-of-the-art efficient fine-tuning techniques developed by IBM Research.
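Performing a task with just a few labeled examples usually means few-shot prompting: the examples are placed directly in the text sent to the model, followed by the new input. The sketch below shows one plausible way to assemble such a prompt. It does not use any watsonx.ai API; the function name, the "Input:"/"Output:" layout, and the sample reviews are all assumptions made for illustration.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str, examples: List[Tuple[str, str]], query: str
) -> str:
    """Assemble an instruction, labeled examples, and a new query into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # The prompt ends with an open "Output:" for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical labeled examples for a sentiment task.
examples = [
    ("The onboarding was quick and painless.", "positive"),
    ("Support never answered my ticket.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each customer comment.",
    examples,
    "The dashboard keeps crashing.",
)
print(prompt)
```

The resulting string would then be sent to whichever generative model is in use, and the model’s completion after the final "Output:" is taken as the answer.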
In addition to IBM’s own models, watsonx.ai provides seamless access to a broad catalog of open-source models for enterprises to experiment with and quickly iterate on. In a new partnership with Hugging Face, IBM will offer thousands of open-source Hugging Face foundation models, datasets, and libraries in watsonx.ai. Hugging Face, in turn, will offer all of IBM’s proprietary and open-access models and tools on watsonx.ai.
Foundation models are changing the landscape of AI, and progress in recent years has only been accelerating. We at IBM are excited to help chart the frontiers of this rapidly evolving field and translate innovation into real enterprise value.