IBM Fusion Data Foundation architecture

Internal components

IBM Fusion Data Foundation is based on an open-source technology stack including Rook, Ceph, and NooBaa.

Ceph
Ceph is IBM Fusion Data Foundation’s core storage platform. Ceph is based on RADOS (Reliable Autonomic Distributed Object Store), an open-source object store that is an integral part of the Ceph distributed storage system.
Rook
Rook is a storage orchestrator for Kubernetes and coordinates the services that are provided by Ceph. Rook has operators to support different storage backends.
NooBaa
NooBaa (Multi Cloud Object Gateway) provides consistent S3 API endpoints across different backend infrastructures, such as AWS, Azure, GCP, IBM Cloud, Bare Metal, VMware, or OpenStack.
Figure 1. Components of IBM Fusion Data Foundation
The content of this image is explained in the surrounding text.

Ceph

After Ceph has been installed, it exposes the following storage classes:

  • RADOS Block Device (RBD), provisioned by openshift-storage.rbd.csi.ceph.com
  • Ceph File System (CephFS), provisioned by openshift-storage.cephfs.csi.ceph.com
  • RADOS Object Gateway (RGW), provisioned by openshift-storage.ceph.rook.io/bucket
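For orientation, the RBD storage class created at installation typically resembles the following sketch. The names and parameters shown are illustrative of a default deployment, not authoritative; inspect the actual definitions on your cluster with `oc get storageclass -o yaml`.

```yaml
# Illustrative sketch of the RBD storage class in a default installation.
# Secret-reference parameters and image features are omitted for brevity.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ocs-storagecluster-ceph-rbd
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage
  pool: ocs-storagecluster-cephblockpool   # default block pool name (assumed)
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
```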

A key characteristic of Ceph is its massive scalability, up to petabytes of data. Ceph includes mechanisms for self-management, self-healing, and distribution of data across servers and disks. The storage entities that Ceph orchestrates are referred to as Object Storage Daemons (OSDs). An OSD holds the actual data that is exposed as persistent volumes, including multiple copies for redundancy.

Note: In addition to the RGW object storage provided by Ceph, the NooBaa Multi Cloud Object Gateway provides an alternative implementation of object storage. Both expose the S3 API, a REST-based de facto industry standard for interfacing with object storage.

Rook

Rook, as storage orchestrator, is responsible for the packaging, management, and scaling of the storage services that are provided by Ceph. Rook makes Ceph much easier to consume because complex operations and configuration are taken care of by Rook’s automation and abstraction. Rook also detects and repairs inconsistent objects, ensuring that data is protected and coherent.

Multi Cloud Object Gateway (NooBaa)

While Ceph and Rook build the core software stack of IBM Fusion Data Foundation, NooBaa provides the Multi Cloud Object Gateway (MCG) interface to interact with object storage. As mentioned, this approach is an alternate implementation of the object storage class, which IBM Fusion Data Foundation offers in addition to the Ceph Object Gateway (Ceph’s native object storage interface). Using MCG, applications create object bucket claims to request object storage. MCG creates the corresponding object bucket and also generates a ConfigMap with the endpoint details and a Secret with the credentials that are required to access the bucket. Object buckets and object bucket claims are analogous to PVs and PVCs.
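As a sketch, an object bucket claim for MCG might look like the following. The claim and bucket names are illustrative; `openshift-storage.noobaa.io` is the bucket storage class typically created for MCG.

```yaml
# Hypothetical object bucket claim: MCG provisions a bucket for it and
# generates a ConfigMap (endpoint, bucket name) and a Secret (S3 credentials)
# with the same name as the claim.
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-app-bucket
spec:
  generateBucketName: my-app-bucket
  storageClassName: openshift-storage.noobaa.io
```

The application then reads the generated ConfigMap and Secret to obtain the S3 endpoint and credentials, in the same way it would consume any other Kubernetes configuration.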

MCG acts as a proxy between the local Red Hat OpenShift cluster and external object storage services that are offered by one or more cloud vendors, such as Amazon Web Services and IBM Cloud. The big advantage of this setup is that data can be kept in place at the remote cloud vendor. When applications running within the Red Hat OpenShift cluster use this object storage, there is no need to copy data back and forth; instead, a highly efficient caching mechanism achieves significant performance improvements. You can use NooBaa to manage S3-compatible object storage resources in cloud services such as IBM Cloud Object Storage, AWS S3, or Azure Blob Storage.

When an application claims object storage through an object bucket claim, MCG creates the corresponding object bucket. Each bucket can have its own data placement and access control policies, and these can be changed over time, which allows adaptation to the changing needs of applications and environments. For example, a bucket can span multiple cloud storage providers while MCG manages the necessary authentication and caching.

For applications that use data sources residing in more than one cloud service, or in a combination of cloud and on-premises object stores, namespace buckets can be used. MCG namespace buckets provide flexible data federation over multiple clouds and data object stores. Namespace buckets do not replicate or mirror data from the underlying data resources; the storage administrator decides the scope of the namespace.
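As an illustrative sketch (all resource names are hypothetical, and the field layout follows the NooBaa operator’s BucketClass custom resource), a namespace bucket class that reads from two clouds and writes to one might be defined like this:

```yaml
# Hypothetical BucketClass with a Multi namespace policy: reads are federated
# across two NamespaceStore resources, writes go to one of them.
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: multi-cloud-namespace-class
  namespace: openshift-storage
spec:
  namespacePolicy:
    type: Multi
    multi:
      writeResource: aws-namespace-store    # hypothetical NamespaceStore
      readResources:
        - aws-namespace-store
        - azure-namespace-store             # hypothetical NamespaceStore
```

An object bucket claim that references such a bucket class then exposes the federated namespace to the application through the standard S3 API.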

In short, the NooBaa Multi Cloud Object Gateway opens IBM Fusion Data Foundation to a broader hybrid cloud topology.

Figure 2. NooBaa
The content of this image is explained in the surrounding text.

Red Hat OpenShift Operators

IBM Fusion Data Foundation is deployed inside OpenShift Container Platform by using operators that are provided by IBM Fusion and the Red Hat Operator Hub. Along with the Local Storage Operator and the Rook-Ceph operator, the installation creates a containerized Ceph storage cluster running on OpenShift Container Platform. The operators are available through the Red Hat OpenShift Container Platform Service Catalog.

Installing software add-ons to Red Hat OpenShift through operators is user-friendly and straightforward, and it is seamlessly integrated into the overall Red Hat OpenShift administration experience. Making IBM Fusion Data Foundation part of all of this is a key advantage and a reason to choose IBM Fusion Data Foundation as the persistence layer for stateful containerized workloads.

Note: Currently, IBM Fusion Data Foundation is the name of the IBM-branded code base that is obtained as part of the larger IBM Fusion offering. The same code base is also available from the Red Hat Operator Hub as the Red Hat-branded Red Hat OpenShift Data Foundation. This also applies to the naming of the operators in the Red Hat OpenShift catalog.
Figure 3. Operators of Red Hat OpenShift Data Foundation
The content of this image is explained in the surrounding text.

Storage classes provided by IBM Fusion Data Foundation

IBM Fusion Data Foundation supports various storage classes, which give developers flexibility when they implement workloads. Differentiators between storage classes are the supported attachment modes and data handling characteristics. Storage classes provide a way for administrators to describe the characteristics of the storage they offer. Different classes might map to quality-of-service levels, backup policies, or arbitrary policies determined by the cluster administrators.

For data access, storage classes are distinguished by whether volumes can be shared across OpenShift Container Platform nodes:

  • ReadWriteOnce (RWO): the volume can be mounted as read-write by a single node.
  • ReadOnlyMany (ROX): the volume can be mounted read-only by many nodes.
  • ReadWriteMany (RWX): the volume can be mounted as read-write by many nodes.
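For example, a claim for a shared volume that many pods can write to concurrently might look like the following sketch. The claim name is illustrative; `ocs-storagecluster-cephfs` is the CephFS storage class typically created at installation.

```yaml
# Hypothetical PVC requesting a shared (RWX) volume backed by CephFS.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany          # many nodes may mount the volume read-write
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-cephfs
```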

A key differentiator is the type of data the storage class contains:

Block Storage
Appropriate for the ReadWriteOnce access mode. ReadWriteMany can also be appropriate if the application itself maintains data consistency and integrity. Suitable for databases and systems of record.
File Storage (Shared and distributed file system)
Appropriate for both ReadWriteOnce and ReadWriteMany access modes, because the underlying file system is designed for multiple threads and multi-tenancy. Suitable for messaging, data aggregation, machine learning, and deep learning workloads.
Object Storage
Suitable for images, nonbinary files, documents, snapshots, or backups. Object storage is also commonly used for AI workloads. In IBM Fusion Data Foundation, either Ceph RGW or the NooBaa Multi Cloud Object Gateway (MCG) can be used. Both expose the S3 REST API to the application.
Figure 4. Storage classes of IBM Fusion Data Foundation
The content of this image is explained in the surrounding text.

IBM Fusion Data Foundation also exposes the standard NFS protocol. This allows IBM Fusion to be inserted directly into an existing NFS-based solution without changing the application implementation.

How IBM Fusion Data Foundation works internally

IBM Fusion Data Foundation is deployed and runs inside an OpenShift Container Platform cluster as storage nodes. Storage nodes can run on either compute or infrastructure nodes. A complete IBM Fusion Data Foundation installation requires at least three storage nodes.

A containerized application runs on a Red Hat OpenShift compute node. When the application needs to store state and data, it creates a persistent volume claim (PVC) and references it in its configuration. Each PVC specifies the storage class to be used (for example, ocs-storagecluster-ceph-rbd), the amount of storage needed (for example, 5 Gi), and other characteristics.

A PVC is bound to a persistent volume (PV). A PV is a Kubernetes concept, the logical representation of a storage unit; it is not the physical storage itself.
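The claim-and-bind flow described above can be sketched as follows, assuming the default RBD storage class and an illustrative application pod (all names other than the storage class are hypothetical):

```yaml
# Hypothetical PVC for 5 Gi of block-backed storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-state
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: ocs-storagecluster-ceph-rbd
---
# Hypothetical pod that references the claim; the bound PV is mounted
# into the container at /var/data.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: app
      image: registry.access.redhat.com/ubi9/ubi
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: state
          mountPath: /var/data
  volumes:
    - name: state
      persistentVolumeClaim:
        claimName: app-state
```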

IBM Fusion Data Foundation orchestrates the mapping of a persistent volume to real physical storage to satisfy the persistent volume claims requested by the applications:

  • A PV is managed by a Ceph CSI driver on the node on which the consuming application is running. The Ceph CSI driver uses the RADOS Block Device or Ceph File System component of the Linux kernel to facilitate the interaction between the PV and a corresponding Ceph Object Storage Daemon (OSD) on the IBM Fusion Data Foundation storage nodes. An OSD is a storage entity that is managed by Ceph.
  • Behind the scenes, the OSDs map the logical PVs to physical volumes that represent real physical storage, which is exposed by the storage adapter (or network adapter) and the Linux kernel. The storage and network adapters can be connected to FCP or DASD/FICON storage. For this mapping, IBM Fusion Data Foundation uses the Red Hat OpenShift Local Storage Operator to interact with the physical storage.
  • IBM Fusion Data Foundation implements redundancy by ensuring that each block of a logical PV that is used on a compute node is stored within at least three OSDs on the storage nodes. IBM Fusion Data Foundation also provides capabilities for seamless operation, such as storage monitoring. All of this is orchestrated by Rook. Rook creates several management components that coordinate the lifecycle of the OSDs: Monitors (MON), Metadata Servers (MDS), and the Storage Manager (MGR). Rook also provides an object gateway (RGW) for object storage.

The IBM Fusion Data Foundation storage nodes also expose an alternative object storage API via NooBaa’s Multi Cloud Object Gateway (MCG). MCG exposes an S3-compatible interface, a well-established industry standard for application workloads that interact directly with object storage. The object storage can be spread across multiple providers, and NooBaa takes care of federating the data into a virtualized object store. NooBaa provides capabilities to administer object storage and to define buckets for access control and account management. Namespacing can be used to separate data between applications, and a cache can be applied for optimized performance. In this NooBaa path, Ceph and its OSDs are not involved.

Figure 5. Internal architecture of IBM Fusion Data Foundation
The content of this image is explained in the surrounding text.
Figure 6. Internal architecture of IBM Fusion Data Foundation (detailed view)
The content of this image is explained in the surrounding text.