With the increasing importance of processing data where work is performed, serving AI models at the enterprise edge enables near-real-time predictions while abiding by data sovereignty and privacy requirements. By combining the IBM watsonx data and AI platform's capabilities for foundation models (FMs) with edge computing, enterprises can run AI workloads for FM fine-tuning and inferencing at the operational edge. This lets enterprises scale AI deployments at the edge, reducing the time and cost to deploy while delivering faster response times.
Foundation models (FMs), which are trained on a broad set of unlabeled data at scale, are driving state-of-the-art artificial intelligence (AI) applications. They can be adapted to a wide range of downstream tasks and fine-tuned for an array of applications. Earlier AI models, which execute specific tasks in a single domain, are giving way to FMs because FMs learn more generally and work across domains and problems. As the name suggests, an FM can be the foundation for many applications of the AI model.
FMs address two key challenges that have kept enterprises from scaling AI adoption. First, enterprises produce a vast amount of unlabeled data, only a fraction of which is labeled for AI model training. Second, labeling and annotation are extremely human-intensive, often requiring hundreds of hours of a subject matter expert's (SME) time. This makes it cost-prohibitive to scale across use cases, since it would require armies of SMEs and data experts. By ingesting vast amounts of unlabeled data and using self-supervised techniques for model training, FMs have removed these bottlenecks and opened the avenue for widescale adoption of AI across the enterprise. The massive amounts of data that exist in every business are waiting to be unleashed to drive insights.
What are large language models?
Large language models (LLMs) are a class of foundation models (FMs) consisting of layers of neural networks trained on massive amounts of unlabeled data. They use self-supervised learning algorithms to perform a variety of natural language processing (NLP) tasks in ways that are similar to how humans use language (see Figure 1).
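The key property of self-supervised learning is that the training signal comes from the data itself, with no human labels. A toy masked-token sketch in Python illustrates the idea (the whitespace tokenizer and mask rate are illustrative placeholders, not any specific model's scheme):

```python
import random

MASK = "<mask>"

def make_self_supervised_example(text, mask_rate=0.15, seed=0):
    """Turn raw, unlabeled text into a (masked input, targets) training pair.

    The 'labels' are the original tokens themselves, so no human
    annotation is needed -- this is the core of self-supervision.
    """
    rng = random.Random(seed)
    tokens = text.split()  # toy whitespace tokenizer; real LLMs use subwords
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[i] = tok  # the model is trained to reconstruct this token
        else:
            masked.append(tok)
    return " ".join(masked), targets

inp, tgt = make_self_supervised_example(
    "foundation models learn from vast amounts of unlabeled text", mask_rate=0.3)
print(inp)   # some tokens replaced by <mask>
print(tgt)   # positions the model must predict
```

Because every sentence in an unlabeled corpus yields training pairs this way, the labeling bottleneck described above disappears.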
Scale and accelerate the impact of AI
There are several steps to building and deploying a foundation model (FM). These include data ingestion, data selection, data pre-processing, FM pre-training, model tuning to one or more downstream tasks, inference serving, and data and AI model governance and lifecycle management—all of which can be described as FMOps.
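The data-side FMOps stages above can be pictured as a pipeline in which each stage feeds the next. A schematic sketch (the stage functions are placeholders named after the list above, not a real FMOps API):

```python
def ingest(raw_sources):
    """Data ingestion: flatten documents from multiple sources."""
    return [doc for src in raw_sources for doc in src]

def select(docs):
    """Data selection: drop empty or unusable documents (illustrative filter)."""
    return [d for d in docs if d.strip()]

def preprocess(docs):
    """Data pre-processing: normalize text before pre-training."""
    return [d.strip().lower() for d in docs]

def run_data_stages(raw_sources):
    """Chain the data-side stages; FM pre-training, tuning, inference
    serving and governance would consume this output downstream."""
    return preprocess(select(ingest(raw_sources)))

corpus = run_data_stages([["Edge AI ", ""], ["Foundation Models"]])
print(corpus)  # cleaned, normalized documents ready for pre-training
```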
To help with all this, IBM is offering enterprises the necessary tools and capabilities to leverage the power of these FMs via IBM watsonx, an enterprise-ready AI and data platform designed to multiply the impact of AI across an enterprise. IBM watsonx consists of the following:
IBM watsonx.ai brings new generative AI capabilities—powered by FMs and traditional machine learning (ML)—into a powerful studio spanning the AI lifecycle.
IBM watsonx.data is a fit-for-purpose data store built on an open lakehouse architecture to scale AI workloads for all of your data, anywhere.
IBM watsonx.governance is an end-to-end automated AI lifecycle governance toolkit that is built to enable responsible, transparent and explainable AI workflows.
Another key vector is the increasing importance of computing at the enterprise edge, such as industrial locations, manufacturing floors, retail stores and telco edge sites. More specifically, AI at the enterprise edge enables the processing of data where work is being performed for near-real-time analysis. The enterprise edge is where vast amounts of enterprise data are being generated and where AI can provide valuable, timely and actionable business insights.
Serving AI models at the edge enables near-real-time predictions while abiding by data sovereignty and privacy requirements. This significantly reduces the latency often associated with the acquisition, transmission, transformation and processing of inspection data. Working at the edge allows us to safeguard sensitive enterprise data and reduce data transfer costs with faster response times.
Scaling AI deployments at the edge, however, is not an easy task amid challenges related to data (heterogeneity, volume and regulation) and constrained resources (compute, network connectivity, storage and even IT skills). These can broadly be described in two categories:
Time/cost to deploy: Each deployment consists of several layers of hardware and software that need to be installed, configured and tested prior to deployment. Today, a service professional can take up to a week or two for installation at each location, severely limiting how fast and cost-effectively enterprises can scale up deployments across their organization.
Day-2 management: The vast number of deployed edges and the geographical location of each deployment could often make it prohibitively expensive to provide local IT support at each location to monitor, maintain and update these deployments.
Edge AI deployments
IBM developed an edge architecture that addresses these challenges by bringing an integrated hardware/software (HW/SW) appliance model to edge AI deployments. It consists of several key paradigms that aid the scalability of AI deployments:
Policy-based, zero-touch provisioning of the full software stack
Continuous monitoring of edge-system health
Capabilities to manage and push software/security/configuration updates to numerous edge locations, all from a central cloud-based location for day-2 management
A distributed hub-and-spoke architecture can be utilized to scale enterprise AI deployments at the edge, wherein a central cloud or enterprise data center acts as the hub and the edge-in-a-box appliance acts as a spoke at each edge location. This hub-and-spoke model, extending across hybrid cloud and edge environments, best illustrates the balance necessary to optimally utilize the resources needed for FM operations (see Figure 2).
Pre-training of these base large language models (LLMs) and other types of foundation models using self-supervised techniques on vast unlabeled datasets often needs significant compute (GPU) resources and is best performed at the hub. The virtually limitless compute resources and large datasets stored in the cloud allow for pre-training of large-parameter models and continual improvement in the accuracy of these base foundation models.
On the other hand, tuning of these base FMs for downstream tasks (which requires only a few tens or hundreds of labeled data samples) and inference serving can be accomplished with only a few GPUs at the enterprise edge. This allows sensitive labeled data (or enterprise crown-jewel data) to safely stay within the enterprise operational environment while also reducing data transfer costs.
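Why do a few labeled samples suffice at the edge? Because the pre-trained backbone stays frozen and only a small task head is fitted to the edge-resident labels. A minimal pure-Python sketch of that idea (the "backbone" here is a stand-in for real FM embeddings; all names and numbers are illustrative):

```python
import math

def frozen_backbone(x):
    """Stand-in for a pre-trained FM: maps a raw input to a feature
    vector. In practice this runs on the edge GPUs and is NOT updated."""
    return [x[0] + x[1], x[0] - x[1]]

def fine_tune_head(samples, labels, lr=0.5, epochs=200):
    """Fit a tiny logistic-regression head on a few labeled samples,
    leaving the backbone untouched (parameter-efficient tuning)."""
    w, b = [0.0, 0.0], 0.0
    feats = [frozen_backbone(x) for x in samples]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid probability
            g = p - y                        # gradient of the log-loss
            w = [w[0] - lr * g * f[0], w[1] - lr * g * f[1]]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = frozen_backbone(x)
    return int(w[0] * f[0] + w[1] * f[1] + b > 0)

# A "few tens" of labels is plenty for the head; here even four suffice.
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y = [0, 0, 1, 1]
w, b = fine_tune_head(X, y)
print(predict(w, b, (0.05, 0.1)), predict(w, b, (1.05, 0.95)))
```

Only the two weights and the bias of the head are trained, which is why a few edge GPUs (or even CPUs, in this toy case) are enough and the labeled data never leaves the site.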
Using a full-stack approach for deploying applications to the edge, a data scientist can perform fine-tuning, testing and deployment of the models. This can be accomplished in a single environment while shrinking the development lifecycle for serving new AI models to end users. Platforms like Red Hat OpenShift Data Science (RHODS) and the recently announced Red Hat OpenShift AI provide tools to rapidly develop and deploy production-ready AI models in distributed cloud and edge environments.
Finally, serving the fine-tuned AI model at the enterprise edge significantly reduces the latency often associated with the acquisition, transmission, transformation and processing of data. Decoupling the pre-training in the cloud from fine-tuning and inferencing on the edge lowers the overall operational cost by reducing the time required and data movement costs associated with any inference task (see Figure 3).
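The latency and data-movement argument is easy to quantify with back-of-the-envelope arithmetic. A sketch comparing shipping a high-definition inspection image to the cloud versus scoring it at the edge (all numbers are illustrative assumptions, not measurements):

```python
def cloud_roundtrip_s(image_mb, uplink_mbps, rtt_ms, cloud_infer_s):
    """Time to upload one image, infer in the cloud and return the answer."""
    upload_s = image_mb * 8 / uplink_mbps   # MB -> megabits, then Mbps
    return upload_s + rtt_ms / 1000 + cloud_infer_s

def edge_infer_only_s(edge_infer_s):
    """At the edge the image never leaves the site: inference time only."""
    return edge_infer_s

# Illustrative assumptions: 12 MB drone image, 20 Mbps uplink, 60 ms RTT.
cloud = cloud_roundtrip_s(image_mb=12, uplink_mbps=20, rtt_ms=60, cloud_infer_s=0.05)
edge = edge_infer_only_s(0.20)   # slower edge accelerator, but no transfer
print(f"cloud: {cloud:.2f} s  edge: {edge:.2f} s")
```

Under these assumptions the upload alone dominates the cloud path, so even a slower edge accelerator wins on end-to-end response time, and the per-image transfer cost disappears entirely.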
To demonstrate this value proposition end-to-end, an exemplar vision-transformer-based foundation model for civil infrastructure (pre-trained using public and custom industry-specific datasets) was fine-tuned and deployed for inference on a three-node edge (spoke) cluster. The software stack included the Red Hat OpenShift Container Platform and Red Hat OpenShift Data Science. This edge cluster was also connected to an instance of Red Hat Advanced Cluster Management for Kubernetes (RHACM) hub running in the cloud.
Policy-based, zero-touch provisioning was done with Red Hat Advanced Cluster Management for Kubernetes (RHACM) via policies and placement tags, which bind specific edge clusters to a set of software components and configurations. These software components, extending across the full stack and covering compute, storage, network and the AI workload, were installed using various OpenShift operators, along with provisioning of the requisite application services and an S3 storage bucket.
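The policy-and-placement mechanism can be pictured as label matching: a placement rule selects every cluster whose tags match, and the bound policy's software stack is then applied to each match. A schematic Python sketch of that binding logic (cluster names, tags and component lists are made up for illustration; real RHACM policies and placement rules are Kubernetes resources, not Python dicts):

```python
clusters = {
    "edge-plant-a": {"region": "us-east", "role": "edge", "gpu": "true"},
    "edge-store-b": {"region": "eu-west", "role": "edge", "gpu": "false"},
    "hub-dc":       {"region": "us-east", "role": "hub",  "gpu": "true"},
}

# Placement tags: only GPU-equipped edge clusters get the AI stack.
placement = {"role": "edge", "gpu": "true"}

# Policy: the full-stack components to ensure on every matched cluster.
policy = ["gpu-operator", "storage-operator", "model-server", "s3-bucket-config"]

def matches(labels, selector):
    """A cluster matches when every selector tag is present with that value."""
    return all(labels.get(k) == v for k, v in selector.items())

def bind(clusters, placement, policy):
    """Desired state computed at the hub: cluster -> components to install."""
    return {name: list(policy)
            for name, labels in clusters.items()
            if matches(labels, placement)}

print(bind(clusters, placement, policy))  # only edge-plant-a matches both tags
```

The hub then continuously reconciles each spoke toward this desired state, which is what makes the provisioning "zero-touch": adding a tagged cluster is enough to trigger a full-stack install.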
The pre-trained foundation model (FM) for civil infrastructure was fine-tuned via a Jupyter Notebook within Red Hat OpenShift Data Science (RHODS) using labeled data to classify six types of defects found on concrete bridges. Inference serving of this fine-tuned FM was also demonstrated using a Triton server. Furthermore, monitoring the health of this edge system was made possible by aggregating observability metrics from the hardware and software components via Prometheus to the central RHACM dashboard in the cloud. Civil infrastructure enterprises can deploy these FMs at their edge locations and use drone imagery to detect defects in near real-time, accelerating the time-to-insight and reducing the cost of moving large volumes of high-definition data to and from the cloud.
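Health monitoring of the spokes follows Prometheus's standard pull model: each edge node exposes gauges in the Prometheus text exposition format, which the hub-side dashboard scrapes and aggregates. A minimal sketch of rendering edge health metrics in that format (the metric names and values are illustrative):

```python
def prometheus_lines(metrics, labels):
    """Render gauge metrics in the Prometheus text exposition format, e.g.
    edge_gpu_utilization{cluster="edge-plant-a"} 0.73"""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)

sample = {
    "edge_gpu_utilization": 0.73,     # illustrative gauge values
    "edge_disk_free_bytes": 5.2e10,
}
print(prometheus_lines(sample, {"cluster": "edge-plant-a"}))
```

Because every spoke reports with a `cluster` label, the central dashboard can slice the same metric across all edge locations without any per-site configuration.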
Combining IBM watsonx data and AI platform capabilities for foundation models (FMs) with an edge-in-a-box appliance allows enterprises to run AI workloads for FM fine-tuning and inferencing at the operational edge. This appliance can handle complex use cases out of the box, and it builds the hub-and-spoke framework for centralized management, automation and self-service. Edge FM deployments can be reduced from weeks to hours with repeatable success, higher resiliency and security.