IBM watsonx
The watsonx.data on-premises solution integrates with Red Hat® OpenShift® and leverages IBM Storage Ceph for cloud-scale object storage and IBM Fusion for optimized application platform storage.
watsonx.data is an open, hybrid, and governed data lakehouse optimized for all data and AI workloads. It offers a single point of entry where you can store data or attach data sources to manage and analyze your enterprise data (structured, semi-structured, and unstructured), enabling access to all data across cloud and on-premises environments.
watsonx.data is provided as part of IBM Software Hub (formerly IBM Cloud Pak for Data), which is a modular set of integrated software components for data analysis, organization, and management. Red Hat OpenShift Container Platform, hosted on bare-metal x86 servers, provides the container runtime for watsonx.data. IBM Storage Ceph is a software-defined storage platform based on an open source development model that deploys on industry-standard x86 hardware. It provides non-disruptive, horizontal scaling of object, block, and file storage with access to large capacities of data (petabytes to exabytes), which makes it well suited to modern AI frameworks that require data lake capabilities.
As part of this solution, IBM Fusion acts as the storage provider for the IBM Software Hub platform that hosts watsonx.data. IBM Fusion exposes storage classes that provide the ReadWriteOnce (RWO) and ReadWriteMany (RWX) persistent volumes that IBM Software Hub needs to run. IBM Fusion obtains storage by remotely mounting the ESS 3500 file system using the Global Data Platform service. When Container Storage Interface (CSI) requests are made, the Global Data Platform provisions storage from the ESS 3500 file system and provides persistent volumes to the container workload.
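The storage request flow above can be illustrated with a minimal sketch of the PersistentVolumeClaim manifests a workload might submit. The manifest fields follow the standard Kubernetes PersistentVolumeClaim schema; the storage class name, claim names, and sizes are hypothetical placeholders, not values defined by IBM Fusion.

```python
import json

# Access modes named in the text: RWO and RWX.
VALID_ACCESS_MODES = ("ReadWriteOnce", "ReadWriteMany")


def make_pvc(name, storage_class, access_mode, size):
    """Build a Kubernetes PersistentVolumeClaim manifest as a plain dict."""
    if access_mode not in VALID_ACCESS_MODES:
        raise ValueError(f"unsupported access mode: {access_mode}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical storage class name; the real name depends on the deployment.
STORAGE_CLASS = "example-fusion-storage-class"

# One claim mountable read-write by a single node, one shared by many nodes.
rwo_claim = make_pvc("example-rwo-claim", STORAGE_CLASS, "ReadWriteOnce", "10Gi")
rwx_claim = make_pvc("example-rwx-claim", STORAGE_CLASS, "ReadWriteMany", "50Gi")

# JSON is valid YAML, so this output could be saved to a file and applied.
manifest_text = json.dumps(rwx_claim, indent=2)
```

When a claim like this is created, the CSI driver behind the named storage class is what receives the provisioning request and returns a persistent volume to the workload.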
IBM Fusion also provides watsonx.data with S3 access to the storage acceleration cache on the ESS 3500. Global Data Platform creates a statically provisioned volume that maps to the ESS 3500 fileset that acts as a cache for an accelerated object bucket. IBM Fusion then uses a technology called the Multi-cloud Gateway to expose the cache over the S3 protocol. watsonx.data attaches to the S3 bucket provided by the Multi-cloud Gateway, enabling query engines to take advantage of the storage acceleration cache.
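As a rough illustration of what attaching to such a bucket involves, the sketch below models the connection details a client would need for an S3-compatible endpoint. The endpoint URL, bucket name, and credentials are hypothetical placeholders, and path-style addressing (bucket name in the URL path) is an assumption made here, not a documented property of the Multi-cloud Gateway.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class S3BucketConnection:
    """Connection details for a bucket on an S3-compatible endpoint."""

    endpoint: str    # base URL of the S3 service
    bucket: str      # bucket name exposed by the gateway
    access_key: str  # S3 access key ID
    secret_key: str  # S3 secret access key

    def object_url(self, key: str) -> str:
        """Return the path-style URL for an object key in this bucket."""
        base = self.endpoint.rstrip("/")
        # quote() keeps "/" unescaped, so nested keys stay readable.
        return f"{base}/{quote(self.bucket)}/{quote(key)}"


# All values below are placeholders for illustration only.
cache_bucket = S3BucketConnection(
    endpoint="https://s3.example.internal",
    bucket="example-cache-bucket",
    access_key="EXAMPLE_ACCESS_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
)

table_file_url = cache_bucket.object_url("warehouse/sales/part-0.parquet")
```

A query engine would be configured with the same four values (endpoint, bucket, access key, secret key) so that its reads are served through the cache rather than directly from the backing object store.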