Using CephFS
Understand how the Ceph File System (CephFS) integrates with the Ceph cluster, how its data flows, the role of the Metadata Server (MDS), and aspects of volume management.
Understanding the CephFS data flow
Understand the data flow in CephFS to recognize the interactions between the CephFS client, Metadata Server (MDS), and the Ceph Storage Cluster.
Figure 1 illustrates the role of the Metadata Server in CephFS.
- CephFS Client
- Mounting
- Connects to CephFS by using either kcephfs or ceph-fuse.
- Metadata requests
- Requests metadata from the MDS for file operations.
- Metadata Server
- Metadata Operations
- Manages file system metadata including file creation, directory management, and access control.
- Caching
- Improves performance by caching frequently accessed metadata. Caching is distributed and coordinated with clients and, in multi-MDS deployments, with the other MDS daemons.
- Ceph Storage Cluster
- RADOS Storage
- Handles distributed object storage.
- OSDs
- Manage data storage, retrieval, and replication across the cluster.
- Data Access and Storage
- Accessing Data
- After retrieving metadata from the MDS, the client accesses data directly from OSDs. Data is stored as objects within the Ceph Storage Cluster, with the CRUSH algorithm ensuring efficient distribution and redundancy.
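The split between metadata and data access can be sketched in a few lines of Python. This is an illustrative model, not the client implementation: it assumes the default CephFS file layout (4 MiB objects, no striping across objects) and the data-object naming scheme of the inode number in hexadecimal followed by an eight-digit hexadecimal object index. The names `ObjectExtent` and `file_extents` are hypothetical. In a real client, the inode number and layout come from the MDS, and CRUSH then maps each object name to OSDs.

```python
from dataclasses import dataclass

# Assumed default CephFS file layout: 4 MiB objects, stripe_count = 1.
# Real layouts are configurable per file or directory.
OBJECT_SIZE = 4 * 1024 * 1024


@dataclass(frozen=True)
class ObjectExtent:
    """One piece of a file read, addressed to a single RADOS object."""
    object_name: str  # "<inode in hex>.<object index as 8 hex digits>"
    offset: int       # byte offset inside the object
    length: int       # number of bytes to read from the object


def file_extents(inode: int, offset: int, length: int,
                 object_size: int = OBJECT_SIZE) -> list[ObjectExtent]:
    """Map a byte range of a file onto the RADOS objects that hold it."""
    extents = []
    end = offset + length
    while offset < end:
        index = offset // object_size            # which object holds this byte
        within = offset % object_size            # position inside that object
        chunk = min(object_size - within, end - offset)
        extents.append(ObjectExtent(f"{inode:x}.{index:08x}", within, chunk))
        offset += chunk
    return extents
```

For example, reading 2 MiB starting at offset 3 MiB of a file with the hypothetical inode `0x10000000000` yields two extents, one in object `10000000000.00000000` and one in `10000000000.00000001`. The client can request both from the OSDs without involving the MDS again.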
Comparison with other FSaaS Solutions
- Scalability
- CephFS inherits all the benefits of RADOS, the core Ceph storage engine. You can increase the capacity and performance of a file system by adding more storage nodes. Data is automatically distributed and protected.
- Performance
- CephFS is a parallel file system. Data streams can run in parallel from multiple CephFS clients to multiple Ceph storage servers and OSDs. Metadata operations run independently of data operations.
- Integration
- CephFS is a POSIX-compatible file system. Applications and infrastructure software, such as backup and recovery solutions, can rely on the POSIX nature of CephFS.
- Feature Set
- CephFS provides parallel data access, and throughput and performance scale with the number of storage nodes and storage devices. Like the rest of Ceph, CephFS is designed to have no single point of failure, and data redundancy is a built-in feature of the Ceph RADOS architecture. Available physical storage capacity is managed in thinly provisioned pools that all share the same infrastructure. CephFS subvolumes can also be exported through NFS or SMB to clients, which enables many more use cases.
Data Protection in CephFS
Because CephFS builds on top of RADOS, it inherits all the data durability available from that system. File data and metadata are regularly scrubbed, and node failures are handled automatically. CephFS also provides metadata scrubbing that you can start manually.
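As an illustration of starting a metadata scrub, the following Python sketch only builds the command line and does not run it. It assumes the upstream Ceph form `ceph tell mds.<fs_name>:0 scrub start <path> [options]`, where the options are a comma-separated list such as `recursive` and `repair`. Verify the syntax against the documentation for your release. The helper name `scrub_command` and the file system name `cephfs` are hypothetical.

```python
def scrub_command(fs_name: str, path: str = "/", recursive: bool = True,
                  repair: bool = False) -> list[str]:
    """Build (but do not run) a metadata scrub command for rank 0 of a file system.

    Assumes the upstream syntax: ceph tell mds.<fs_name>:0 scrub start <path> [options]
    """
    # Collect only the scrub options that were requested.
    options = [name for name, enabled in
               (("recursive", recursive), ("repair", repair)) if enabled]
    command = ["ceph", "tell", f"mds.{fs_name}:0", "scrub", "start", path]
    if options:
        command.append(",".join(options))
    return command


# Hypothetical file system name "cephfs"; prints the command for review.
print(" ".join(scrub_command("cephfs")))
```

Returning the argument list instead of executing it lets an operator review the command first, or pass it to `subprocess.run` on a host that has the `ceph` CLI and suitable credentials.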
For geographic resilience, you can use CephFS snapshots with your preferred file system backup solution to generate consistent versions. If a data center disaster occurs, CephFS provides fsck-like tools to rebuild the file system from data objects. If your system reports data issues, contact IBM Support for help.
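A snapshot taken before a backup run might look like the following minimal sketch. It assumes that snapshots are enabled on the file system, that the client is allowed to create them, and that the snapshot directory uses the default name `.snap`. CephFS then creates a snapshot when a subdirectory is made inside that virtual directory. The helper name `create_snapshot` and the paths in the usage note are hypothetical.

```python
import os


def create_snapshot(directory: str, name: str, snap_dir: str = ".snap") -> str:
    """Create a snapshot of `directory` by making a subdirectory under its
    snapshot directory, and return the path of the new snapshot.

    On CephFS the ".snap" directory is virtual and always present; on other
    file systems it is an ordinary directory that must already exist.
    """
    snapshot = os.path.join(directory, snap_dir, name)
    os.mkdir(snapshot)  # raises FileExistsError if the name is already used
    return snapshot
```

A backup tool can then read from the returned path, for example `create_snapshot("/mnt/cephfs/projects", "before-backup")`, so that the copy reflects a single point in time while clients continue to write to the live directory.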