Using CephFS

Understand how Ceph File System (CephFS) integrates within the Ceph cluster, including its data flow, the role of the Metadata Server (MDS), and aspects of volume management.

Understanding the CephFS data flow

Understand the data flow in CephFS to recognize the interactions between the CephFS client, Metadata Server (MDS), and the Ceph Storage Cluster.

Figure 1 illustrates the role of the Metadata Server in CephFS.

Figure 1. CephFS Data Flow
The following are the primary functions of the CephFS client, the Metadata Server, and the Ceph Storage Cluster:
CephFS Client
Mounting
Connects to CephFS by using either the kernel client (kcephfs) or the FUSE client (ceph-fuse).
Metadata requests
Requests metadata from the MDS for file operations.
Metadata Server
Metadata Operations
Manages file system metadata including file creation, directory management, and access control.
Caching
Improves performance by caching frequently accessed metadata. Caching is distributed and coordinated with clients and, in multi-MDS deployments, with the other MDS daemons.
Ceph Storage Cluster
RADOS Storage
Handles distributed object storage.
OSDs
Manage data storage, retrieval, and replication across the cluster.
Data Access and Storage
Accessing Data
After retrieving metadata from the MDS, the client accesses data directly from OSDs. Data is stored as objects within the Ceph Storage Cluster, with the CRUSH algorithm ensuring efficient distribution and redundancy.
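The direct client-to-OSD data path can be illustrated with a minimal sketch of how a byte range in a file maps to RADOS objects. It assumes the default file layout (4 MiB objects, a stripe count of 1) and the usual data object naming of the inode number in hexadecimal, a dot, and an 8-digit hexadecimal object index. The function names and the inode value are hypothetical and are not part of any Ceph API.

```python
from typing import List, Tuple

# Assumption: default CephFS file layout (4 MiB objects, stripe_count = 1).
OBJECT_SIZE = 4 * 1024 * 1024


def object_name(inode: int, index: int) -> str:
    """Build a data object name: inode in hex, a dot, an 8-digit hex index."""
    return f"{inode:x}.{index:08x}"


def objects_for_range(inode: int, offset: int, length: int,
                      object_size: int = OBJECT_SIZE) -> List[Tuple[str, int, int]]:
    """Return (object name, offset within object, byte count) for a file byte range."""
    extents = []
    end = offset + length
    while offset < end:
        index = offset // object_size       # which object holds this byte
        within = offset % object_size       # position inside that object
        count = min(object_size - within, end - offset)
        extents.append((object_name(inode, index), within, count))
        offset += count
    return extents


if __name__ == "__main__":
    # Hypothetical inode number; a 2 MiB read starting 3 MiB into the file
    # touches two objects, which the client can fetch from OSDs in parallel.
    for extent in objects_for_range(0x10000000000, 3 * 1024 * 1024, 2 * 1024 * 1024):
        print(extent)
```

In this sketch, the MDS is only needed to resolve the file name to an inode and layout. The client can then compute the object names itself, and CRUSH determines which OSDs hold each object.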

Comparison with other FSaaS Solutions

The following are some of the key comparisons between CephFS and other FSaaS solutions:
Scalability
CephFS inherits all the benefits of RADOS, the core Ceph internal storage engine. File systems can be increased in capacity and performance by adding more storage nodes. The data is automatically distributed and protected.
Performance
CephFS is a parallel file system. Data streams can run in parallel from multiple CephFS clients to multiple Ceph storage servers and OSDs. Metadata operations run independently of data operations.
Integration
CephFS is a POSIX-compatible file system. Applications and infrastructure software, such as backup and recovery solutions, can rely on the POSIX nature of CephFS for integration.
Feature Set
CephFS provides parallel data access. Throughput and performance can be scaled up and down by changing the number of storage nodes and storage devices. Like the rest of Ceph, CephFS is designed to have no single point of failure, and data redundancy is a built-in feature of the Ceph RADOS architecture. Available physical storage capacity is managed in thinly provisioned pools that all share the same infrastructure. CephFS subvolumes can also be exported through NFS or SMB to clients, which enables many more use cases.

Data Protection in CephFS

Because CephFS builds on top of RADOS, it inherits all the data durability available from that system. File data and metadata are regularly scrubbed, and node failures are automatically handled. CephFS also provides metadata scrubbing that you can start on demand.

For geographically resilient storage at scale, you can use CephFS snapshots together with a file system backup solution of your choice to generate consistent versions. If a data center disaster occurs, CephFS provides fsck-like tools to rebuild the file system from data objects. If your system reports data issues, contact IBM Support for help.
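
As a minimal sketch of the snapshot step that precedes a backup run: a CephFS snapshot of a directory is created by making a subdirectory inside that directory's hidden snapshot directory. The sketch assumes that snapshots are enabled on the file system, that the client uses the default snapshot directory name `.snap`, and that the file system is mounted at the hypothetical path `/mnt/cephfs`. The helper names are illustrative only.

```python
import os

# Assumption: the client uses the default snapshot directory name.
SNAP_DIR = ".snap"


def snapshot_path(directory: str, name: str) -> str:
    """Return the path that represents snapshot `name` of `directory`."""
    if not name or "/" in name:
        raise ValueError("snapshot name must be a single, non-empty path component")
    return os.path.join(directory, SNAP_DIR, name)


def create_snapshot(directory: str, name: str) -> str:
    """Create a snapshot by making a directory under .snap.

    This only works on a mounted CephFS directory with snapshots enabled.
    """
    path = snapshot_path(directory, name)
    os.mkdir(path)
    return path


if __name__ == "__main__":
    # Hypothetical mount point and directory; the path is only printed here.
    # A backup tool could read from this path to get a consistent view.
    print(snapshot_path("/mnt/cephfs/projects", "before-backup"))
```

Reading the backup source from the snapshot path rather than from the live directory gives the backup tool a view that does not change while it runs. Removing the snapshot afterward is a matter of removing the same directory.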