May 9, 2023 | By David D. Cox

We stand on the frontier of an AI revolution. Over the past decade, deep learning arose from a seismic collision of data availability and sheer compute power, enabling a host of impressive AI capabilities. But we’ve faced a paradoxical challenge: automation is labor intensive. It sounds like a joke, but it’s not, as anyone who has tried to solve business problems with AI may know.  

Traditional AI tools, while powerful, can be expensive, time-consuming, and difficult to use. Data must be laboriously collected, curated, and labeled with task-specific annotations to train AI models. Building a model requires specialized, hard-to-find skills — and each new task requires repeating the process. As a result, businesses have focused mainly on automating tasks with abundant data and high business value, leaving everything else on the table. But this is starting to change. 

The emergence of transformers and self-supervised learning methods has allowed us to tap into vast quantities of unlabeled data, paving the way for large pre-trained models, sometimes called “foundation models.” These large models have lowered the cost and labor involved in automation.  

Foundation models provide a powerful and versatile base for a variety of AI applications. We can use them to quickly perform tasks with limited annotated data and minimal effort; in some cases, we need only describe the task at hand to coax the model into solving it.

But these powerful technologies also introduce new risks and challenges for enterprises. Many of today’s models are trained on datasets of unknown quality and provenance, which can lead to offensive, biased, or factually incorrect responses. The largest models are expensive and energy-intensive to train and run, and complex to deploy.

We at IBM have been developing an approach that addresses the core challenges of using foundation models in the enterprise. Today, we announced our new studio, IBM’s gateway to the latest AI tools and technologies on the market. In a testament to how fast the field is moving, some tools are just weeks old, and we are adding new ones as I write.

What’s included in the studio, which is part of IBM’s larger watsonx offerings announced this week, is varied and will continue to evolve, but our overarching promise is the same: to provide safe, enterprise-ready automation products.

It’s part of our ongoing work at IBM to accelerate our customers’ journey to derive value from this new paradigm in AI. Here, I’ll describe our work to build a suite of enterprise-grade, IBM-trained foundation models, including our approach to data and model architectures. I’ll also outline our new platform and tooling that enables enterprises to build and deploy foundation model-based solutions using a wide catalog of open-source models, in addition to our own. 

Data: the foundation of your foundation model  

Data quality matters. An AI model trained on biased or toxic data will naturally tend to produce biased or toxic outputs. This problem is compounded in the era of foundation models, where the data used to train models typically comes from many sources and is so abundant that no human being could reasonably comb through it all. 

Since data is the fuel that drives foundation models, we at IBM have focused on meticulously curating everything that goes into our models. We have developed AI tools to aggressively filter our data for hate and profanity, licensing restrictions, and bias. When objectionable data is identified, we remove it, retrain the model, and repeat. 
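To make the filtering step concrete, here is a minimal sketch of a rule-based pass over a corpus. It is an illustration only, not IBM's tooling: the `Document` type, the `BLOCKED_TERMS` and `ALLOWED_LICENSES` sets, and the function names are all hypothetical, and a production pipeline would rely on trained classifiers rather than word lists. The bias check mentioned above is left out because it does not reduce to a simple rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical placeholder values, chosen only for illustration.
BLOCKED_TERMS = {"badword", "slur"}
ALLOWED_LICENSES = {"cc0", "cc-by", "mit", "apache-2.0"}


@dataclass
class Document:
    doc_id: str
    text: str
    license: str


def rejection_reason(doc: Document) -> Optional[str]:
    """Return why a document is rejected, or None if it passes every check."""
    if doc.license.lower() not in ALLOWED_LICENSES:
        return "license"
    # Normalize words by stripping common punctuation and lowercasing.
    words = {w.strip(".,!?;:").lower() for w in doc.text.split()}
    if words & BLOCKED_TERMS:
        return "hate_or_profanity"
    return None


def filter_corpus(docs: List[Document]) -> Tuple[List[Document], Dict[str, str]]:
    """Split a corpus into kept documents and a map of rejected id -> reason."""
    kept: List[Document] = []
    rejected: Dict[str, str] = {}
    for doc in docs:
        reason = rejection_reason(doc)
        if reason is None:
            kept.append(doc)
        else:
            rejected[doc.doc_id] = reason
    return kept, rejected
```

Recording a reason for each rejected document, rather than silently dropping it, is what makes the "remove, retrain, repeat" loop auditable.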

Data curation is a task that’s never truly finished. We continue to develop and refine new methods to improve data quality and controls, to meet an evolving set of legal and regulatory requirements. We have built an end-to-end framework to track the raw data that’s been cleaned, the methods that were used, and the models that each datapoint has touched.  
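One way to picture such a tracking framework is as a small lineage store that records, for each datapoint, its source, the cleaning methods applied, and the models trained on it. The sketch below is an assumption about the general shape of such a system, not a description of IBM's framework; `LineageRecord`, `LineageStore`, and every method name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LineageRecord:
    datapoint_id: str
    source: str
    cleaning_steps: List[str] = field(default_factory=list)
    used_by_models: Set[str] = field(default_factory=set)


class LineageStore:
    """In-memory lineage index (hypothetical; a real system would persist this)."""

    def __init__(self) -> None:
        self._records: Dict[str, LineageRecord] = {}

    def register(self, datapoint_id: str, source: str) -> None:
        self._records[datapoint_id] = LineageRecord(datapoint_id, source)

    def log_cleaning(self, datapoint_id: str, method: str) -> None:
        self._records[datapoint_id].cleaning_steps.append(method)

    def log_training(self, datapoint_id: str, model_name: str) -> None:
        self._records[datapoint_id].used_by_models.add(model_name)

    def models_touched_by(self, datapoint_id: str) -> Set[str]:
        """Which models has this datapoint been used to train?"""
        return set(self._records[datapoint_id].used_by_models)

    def datapoints_in(self, model_name: str) -> List[str]:
        """Which datapoints went into this model? Sorted for stable output."""
        return sorted(
            r.datapoint_id
            for r in self._records.values()
            if model_name in r.used_by_models
        )
```

With both lookups available, a datapoint later flagged as objectionable can be traced to every model that needs retraining.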

We continue to gather high-quality data to help tackle some of the most pressing business challenges across a range of domains like finance, law, cybersecurity, and sustainability.  We are currently targeting more than 1 terabyte of curated text for training our foundation models, while adding curated software code, satellite data, and IT network event data and logs.  

IBM Research is also developing techniques to infuse trust throughout the foundation model lifecycle, to mitigate bias and improve model safety. Our work in this area includes FairIJ, which identifies biased data points in data used to tune a model, so that they can be edited out. Other methods, like fairness reprogramming, allow us to mitigate biases in a model even after it has been trained. 

Efficient foundation models focused on enterprise value 

IBM’s new studio offers a suite of foundation models aimed at delivering enterprise value. They’ve been incorporated into a range of IBM products that will be made available to IBM customers in the coming months. 

Recognizing that one size doesn’t fit all, we’re building a family of language and code foundation models of different sizes and architectures. Each model family has a geology-themed code name (Granite, Sandstone, Obsidian, and Slate) and brings together cutting-edge innovations from IBM Research and the open research community. Each model can be customized for a range of enterprise tasks.

Our Granite models are based on a decoder-only, GPT-like architecture for generative tasks. Sandstone models use an encoder-decoder architecture and are well suited for fine-tuning on specific tasks, interchangeable with Google’s popular T5 models. Obsidian models utilize a new modular architecture developed by IBM Research, providing high inference efficiency and high levels of performance across a variety of tasks. Slate refers to a family of encoder-only (RoBERTa-based) models, which, while not generative, are fast and effective for many enterprise NLP tasks. All models are trained on IBM’s curated, enterprise-focused data lake, on our custom-designed cloud-native AI supercomputer, Vela.

Efficiency and sustainability are core design principles for the studio. At IBM Research, we’ve invented new technologies for efficient model training, including our “LiGO” algorithm that recycles small models and “grows” them into larger ones. This method can save from 40% to 70% of the time, cost, and carbon output required to train a model. To improve inference speeds, we’re leveraging our deep expertise in quantization, or shrinking models from 32-bit floating-point arithmetic to much smaller integer bit formats. Reducing AI model precision brings huge efficiency benefits without sacrificing accuracy. We hope to soon run these compressed models on our AI-optimized chip, the IBM AIU.
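For readers new to quantization, the sketch below shows the basic idea using symmetric 8-bit quantization with a single shared scale. This is a generic textbook scheme chosen for illustration, not the specific method IBM uses; the function names are hypothetical. Storing each weight in 8 bits instead of 32 cuts memory by a factor of four, at the price of a rounding error of at most half the scale per weight.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.27]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("integers:", q)
    print("worst-case rounding error:", worst)
```

Whether accuracy survives at a given bit width depends on the model and the scheme, which is why practical methods are considerably more careful than this sketch.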

Hybrid cloud tools for foundation models 

The final piece of the foundation model puzzle is creating an easy-to-use software platform for tuning and deploying models. IBM’s hybrid, cloud-native inference stack, built on Red Hat OpenShift, has been optimized for training and serving foundation models. Enterprises can leverage OpenShift’s flexibility to run models from anywhere, including on-premises.

We’ve created a suite of tools in the studio that provide customers with a user-friendly interface and developer-friendly libraries for building foundation model-based solutions. Our Prompt Lab enables users to rapidly perform AI tasks with just a few labeled examples. The Tuning Studio enables rapid and robust model customization using your own data, based on state-of-the-art efficient fine-tuning techniques developed by IBM Research.
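Performing a task "with just a few labeled examples" usually means few-shot prompting: the examples are placed directly in the prompt, followed by the new input. The sketch below shows one common way to assemble such a prompt. The `Input:`/`Output:` layout and the `build_few_shot_prompt` helper are assumptions made for illustration; they are not the Prompt Lab's actual format or API.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble an instruction, labeled examples, and a new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # Leave the final output blank for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Classify the sentiment of each customer comment.",
        [("Great service", "positive"), ("Slow and buggy", "negative")],
        "Works as expected",
    )
    print(prompt)
```

The resulting string would then be sent to a generative model, which is expected to continue the pattern and fill in the last label.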

In addition to IBM’s own models, the studio provides seamless access to a broad catalog of open-source models for enterprises to experiment with and quickly iterate on. In a new partnership with Hugging Face, IBM will offer thousands of open-source Hugging Face foundation models, datasets, and libraries in the studio. Hugging Face, in turn, will offer all of IBM’s proprietary and open-access models and tools.

To try out a new model, simply select it from a drop-down menu.

Looking to the future 

Foundation models are changing the landscape of AI, and progress in recent years has only been accelerating. We at IBM are excited to help chart the frontiers of this rapidly evolving field and translate innovation into real enterprise value. 
