Key components of IBM Fusion Data Foundation

This section covers key components of IBM Fusion Data Foundation such as block storage, file storage, and object storage.

IBM Fusion Data Foundation is based on a technology stack including Rook, Ceph®, and NooBaa.

  • Ceph is the core storage platform of IBM Fusion Data Foundation. Ceph is based on RADOS (Reliable Autonomic Distributed Object Store), which by itself is an open source object storage service and an integral part of the Ceph distributed storage system.
  • Rook is a storage orchestrator for Kubernetes and coordinates the services that are provided by IBM Fusion Data Foundation. Rook has operators to support different storage backends.
  • NooBaa (Multicloud Object Gateway) provides consistent S3 endpoints across different backend infrastructures, such as AWS, Azure, GCP, bare metal, VMware, or OpenStack.
Figure 1. Key Technologies used by IBM Fusion Data Foundation

IBM Fusion Data Foundation is deployed inside Red Hat OpenShift Container Platform (RHOCP) by using operators. With the Local Storage Operator and the Rook-Ceph operator, the installation creates a containerized Red Hat Ceph Storage cluster that runs on RHOCP.

Figure 2. IBM Fusion Data Foundation related Operators

Storage classes

IBM Fusion Data Foundation supports file, block, and object storage, which gives developers flexibility when they implement workloads. The storage types differ in their data handling characteristics and in the access modes that they support:

  • ReadWriteOnce (rwo)
  • ReadOnlyMany (rox)
  • ReadWriteMany (rwx)

The choice of storage class affects whether data can be shared across RHOCP nodes.
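As an illustration of how an access mode is requested, the following minimal sketch builds a Kubernetes PersistentVolumeClaim manifest as a Python dictionary. The helper name `build_pvc`, the claim name, the size, and the storage class name `example-block-sc` are hypothetical and chosen only for illustration; use the storage class names that exist in your cluster.

```python
import json

# Short forms used in this section, mapped to the Kubernetes access mode names.
ACCESS_MODES = {
    "rwo": "ReadWriteOnce",
    "rox": "ReadOnlyMany",
    "rwx": "ReadWriteMany",
}


def build_pvc(name, storage_class, mode, size="10Gi"):
    """Return a PersistentVolumeClaim manifest as a plain dict (hypothetical helper)."""
    if mode not in ACCESS_MODES:
        raise ValueError(f"unknown access mode: {mode!r}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [ACCESS_MODES[mode]],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # "example-block-sc" is an assumed storage class name, not a product default.
    print(json.dumps(build_pvc("db-data", "example-block-sc", "rwo"), indent=2))
```

The printed JSON can be passed to a cluster client such as `oc apply -f -`, because Kubernetes accepts JSON manifests as well as YAML.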

Block Storage
Appropriate for the rwo access mode. However, rwx might be appropriate if the application can maintain data consistency and integrity. Suitable for databases and systems of record.
File Storage (Shared and distributed file system)
Appropriate for both rwo and rwx access modes, as the underlying file system is designed for multiple threads and multi-tenancy. Suitable for messaging, data aggregation, machine learning, and deep learning workloads.
Object Storage (Multicloud object storage accessed through a lightweight S3 API endpoint)
Appropriate for retrieval of data from multiple cloud object stores. Suitable for images, nonbinary files, documents, snapshots, or backups.
Figure 3. Storage Classes supported by IBM Fusion Data Foundation
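The access-mode guidance for block and file storage can be encoded as a small lookup table. This is only a sketch of the guidance stated in this section, not a product API; the function name `check_access` and the result labels are hypothetical. Object storage is accessed through S3 rather than through a volume access mode, so it is not part of the table.

```python
# Guidance from this section, encoded as data.
# "conditional" marks rwx on block storage, which is appropriate only when the
# application itself maintains data consistency and integrity.
GUIDANCE = {
    "block": {"rwo": "appropriate", "rwx": "conditional"},
    "file": {"rwo": "appropriate", "rwx": "appropriate"},
}


def check_access(storage_type, mode):
    """Return the guidance label for a storage type and access mode."""
    modes = GUIDANCE.get(storage_type)
    if modes is None:
        raise ValueError(f"no access-mode guidance for storage type {storage_type!r}")
    # Combinations that this section does not mention are reported as such.
    return modes.get(mode, "not covered")


if __name__ == "__main__":
    for storage_type in GUIDANCE:
        for mode in ("rwo", "rox", "rwx"):
            print(storage_type, mode, check_access(storage_type, mode))
```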

Encryption

IBM Fusion Data Foundation supports encryption of stored data (data at rest) by using Linux Unified Key Setup version 2 (LUKS2). Two levels of granularity are possible:

  • OSD level: an entire storage device is encrypted
  • PV level: a specific persistent volume that is used by an application is encrypted individually

For OSD-level encryption, the keys that are used for the encryption process can be stored either internally or externally.

  • Internal key management is done within the Red Hat OpenShift cluster. For this purpose, the key is stored as a name-value pair inside the Red Hat OpenShift etcd database.
  • External key management can be provided by a third-party keystore, such as HashiCorp Vault.

An external key management solution is preferred because it ensures a clear separation between the keys and the protected data.

For PV-level encryption, external key management is mandatory; keys cannot be stored internally in the etcd database. PV-level encryption scopes the encryption to specific persistent volumes that an application uses, which makes it possible to protect one application (or tenant) from another. IBM Fusion Data Foundation separates the persistent volumes by creating a set of storage classes for each PV or tenant and assigning keys specifically to a storage class.

Important: When keys are stored internally, secure and back up the etcd database correctly.
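The key management rules in this section can be summarized as a small validation function. This is a minimal sketch of the stated rules only; the function name `validate_encryption` and the string values are hypothetical and do not correspond to any product configuration field.

```python
def validate_encryption(level, key_store):
    """Check an encryption level against a key store choice.

    level:     "osd" (entire storage device) or "pv" (single persistent volume)
    key_store: "internal" (etcd) or "external" (third-party keystore)
    Returns a tuple (allowed, note).
    """
    if level not in ("osd", "pv"):
        raise ValueError(f"unknown encryption level: {level!r}")
    if key_store not in ("internal", "external"):
        raise ValueError(f"unknown key store: {key_store!r}")

    if level == "pv" and key_store == "internal":
        # PV-level encryption cannot keep its keys in etcd.
        return (False, "PV-level encryption requires external key management")
    if key_store == "internal":
        # Allowed for OSD level, but the keys then live next to the cluster data.
        return (True, "allowed; secure and back up etcd, external is preferred")
    return (True, "allowed; keys are separated from the protected data")


if __name__ == "__main__":
    for level in ("osd", "pv"):
        for key_store in ("internal", "external"):
            print(level, key_store, validate_encryption(level, key_store))
```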

Deployment topologies

IBM Fusion Data Foundation can be deployed in different modes.

Internal mode
IBM Fusion Data Foundation is installed within a single RHOCP cluster. It provides highly scalable, enterprise-grade storage that is fully integrated into RHOCP lifecycle, monitoring, and management.
  • Application pods and IBM Fusion Data Foundation pods can be scheduled on the same compute nodes. In this case, compute and storage infrastructure scale together within the same cluster. This setup is optimized for simplicity of management.
  • As an alternative, application pods and IBM Fusion Data Foundation pods can be scheduled on different nodes, for example, infrastructure nodes. In this setup, compute hosts and storage hosts scale independently, and a more balanced deployment can be achieved.
External mode
Application pods run on one or more Red Hat OpenShift clusters, while IBM Fusion Data Foundation storage is provided from an external IBM Storage Ceph cluster. As a result, the lifecycle, monitoring, and management of RHOCP and IBM Fusion Data Foundation are decoupled, and compute and storage infrastructure scale independently in different clusters. This mode is optimized for scale and performance (on-premises only). Typically, the external IBM Fusion Data Foundation is deployed as a Ceph storage environment on x86 bare metal hardware.
Figure 4. Internal and external mode
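The trade-offs between the deployment modes can be written down as data and filtered by requirement. The following sketch encodes only the properties that this section states; the class `Topology`, its field names, and the function `candidates` are hypothetical and are not part of any product interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    name: str
    integrated_with_rhocp: bool   # lifecycle, monitoring, management shared with RHOCP
    scales_independently: bool    # compute and storage can grow separately


# Properties as described in this section.
TOPOLOGIES = [
    Topology("internal, shared nodes", True, False),
    Topology("internal, dedicated nodes", True, True),
    Topology("external", False, True),
]


def candidates(independent_scaling=None, integrated_lifecycle=None):
    """Return topology names that match the given requirements (None = no preference)."""
    result = []
    for topology in TOPOLOGIES:
        if independent_scaling is not None and topology.scales_independently != independent_scaling:
            continue
        if integrated_lifecycle is not None and topology.integrated_with_rhocp != integrated_lifecycle:
            continue
        result.append(topology.name)
    return result


if __name__ == "__main__":
    print(candidates(independent_scaling=True, integrated_lifecycle=True))
```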

Comparing IBM Fusion Data Foundation with IBM Storage Scale and NFS

Table 1. Comparing IBM Fusion Data Foundation with IBM Storage Scale and NFS
Key Value Proposition
IBM Fusion Data Foundation:
  • Cluster based on Ceph
  • Tightly integrated into RHOCP
  • Specially developed for RHOCP
IBM Storage Scale:
  • Cluster based on General Parallel File System (GPFS)
  • Can share data between different architectures (x86, ppc64le, s390x)
  • RHOCP attached via CSI API
NFS Storage:
  • Popular and common usage with same API for all HW architectures
  • Not highly performant
  • Great for tests and simple use cases

Additional Aspects
IBM Fusion Data Foundation:
  • Software-defined storage Rook in SAN
  • Can federate RHOCP storage across local and cloud environments via NooBaa
  • SAN using FCP/SCSI Storage
IBM Storage Scale:
  • Can also host Db2, Oracle …
  • Data Tiering included
  • Backup and HA / DR functions
  • Can use SCSI + ECKD disks
  • RHOCP attaches to an existing cluster
  • Transparent access to data from different architectures
NFS Storage:
  • Not highly securable
  • No tiering or auto scaling
  • Not recommended for production

Scale
IBM Fusion Data Foundation:
  • Min of 3 storage nodes
  • Min 1 disk
  • Maximum of 3 x 500 PVs
  • Max of 81 TiB (3 x 27) storage
IBM Storage Scale:
  • Highly scalable to Petabytes
  • Implementation variations for scalability and performance
NFS Storage:
  • Can scale based on the NFS limits

Storage Classes
IBM Fusion Data Foundation:
  • General purpose file, block, or object Cloud native storage
  • Multicloud Object storage via MCG gateway (NooBaa)
IBM Storage Scale:
  • General-purpose file storage
  • Can be used for RHOCP or other data and solutions
NFS Storage:
  • General-purpose file storage
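
The IBM Fusion Data Foundation figures in the Scale row of Table 1 can be turned into a quick sizing check. The sketch below only multiplies out the stated factors (3 x 500 PVs and 3 x 27 TiB); the table does not say what the factor 3 stands for or whether the capacity is raw or usable, so the constant names and the function `fits` are assumptions for illustration.

```python
# Factors as written in Table 1. What the factor 3 represents is not stated there.
FACTOR = 3
PVS_PER_UNIT = 500
TIB_PER_UNIT = 27

MAX_PVS = FACTOR * PVS_PER_UNIT   # 3 x 500 = 1500 persistent volumes
MAX_TIB = FACTOR * TIB_PER_UNIT   # 3 x 27 = 81 TiB


def fits(pv_count, total_tib):
    """Return True if a planned workload stays within both stated maximums."""
    return pv_count <= MAX_PVS and total_tib <= MAX_TIB


if __name__ == "__main__":
    print("max PVs:", MAX_PVS)
    print("max TiB:", MAX_TIB)
    print("1200 PVs, 60 TiB fits:", fits(1200, 60))
```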