Multiple data pools use-cases
Ceph File System (CephFS) supports creating multiple data pools in a CephFS volume. Each data pool can be assigned to a dedicated subvolume or directory in the same file system. This helps administrators optimize their Ceph cluster’s performance and storage behavior.
CephFS is a distributed, POSIX-compliant file system. All CephFS deployments require a metadata RADOS pool. By default, a single, additional RADOS pool is provisioned for file data. Deploying additional data pools can be beneficial for supporting multiple workloads, storage media types, and data protection strategies.
Additional data pools can be attached to an existing file system, so there is no need to:
- Recreate the entire CephFS file system
- Migrate all existing data to the new setup
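As a minimal sketch of attaching a pool to an existing file system, the helper below only builds the two Ceph CLI invocations involved (`ceph osd pool create` and `ceph fs add_data_pool`) and prints them; it does not run them. The file system name `cephfs` and the pool name `cephfs_data_ssd` are assumptions chosen for illustration.

```python
from typing import List


def add_data_pool_commands(fs_name: str, pool_name: str) -> List[List[str]]:
    """Build the CLI commands that create a pool and attach it to a file system."""
    return [
        # Create a new RADOS pool; placement group sizing is left to the cluster defaults.
        ["ceph", "osd", "pool", "create", pool_name],
        # Attach the pool to the existing file system as an additional data pool.
        ["ceph", "fs", "add_data_pool", fs_name, pool_name],
    ]


if __name__ == "__main__":
    # "cephfs" and "cephfs_data_ssd" are hypothetical names used for illustration.
    for cmd in add_data_pool_commands("cephfs", "cephfs_data_ssd"):
        print(" ".join(cmd))
```

Existing files stay in the pool they were written to; the new pool is used only by the directories or subvolumes that are later pointed at it.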
The following lists the use-cases for multiple data pools:
- Layered file services
- Assign dedicated data pools to subvolumes used for NFS shares or SMB shares. This isolates protocol-specific workloads and enables targeted performance tuning.
- File System as a Service (FSaaS)
- Service providers can offer differentiated storage tiers by mapping subvolumes to pools with distinct performance and durability characteristics.
- Workload segmentation
- Assign different workloads to subvolumes or directories backed by pools optimized for their specific needs. For optimized performance and cost efficiency:
  - Use fast media (SSDs) and replicated pools for throughput-intensive workloads requiring low latency and high IOPS.
  - Store archival or infrequently accessed data in subvolumes or directories backed by erasure-coded (EC) HDD or Quad-Level Cell (QLC) SSD pools to reduce total cost of ownership (TCO) while maintaining durability.
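The mapping of workloads to pools described in these use-cases can be sketched as follows. The code only assembles command lines: `ceph fs subvolume create` with its `--pool_layout` option for subvolumes, and `setfattr` on the `ceph.dir.layout.pool` extended attribute for plain directories. The workload names, pool names, volume name, and mount path are all hypothetical, and each pool is assumed to be already attached to the file system as a data pool.

```python
from typing import Dict, List

# Hypothetical mapping from workload to the data pool that should back it.
WORKLOAD_POOLS: Dict[str, str] = {
    "database": "cephfs_data_ssd_rep",  # replicated pool on SSDs
    "archive": "cephfs_data_hdd_ec",    # erasure-coded pool on HDDs
}


def subvolume_create_command(volume: str, subvolume: str, pool: str) -> List[str]:
    """Build the command that creates a subvolume whose data lands in `pool`."""
    return ["ceph", "fs", "subvolume", "create", volume, subvolume,
            "--pool_layout", pool]


def directory_layout_command(directory: str, pool: str) -> List[str]:
    """Build the command that sets the data pool for new files under `directory`."""
    return ["setfattr", "-n", "ceph.dir.layout.pool", "-v", pool, directory]


if __name__ == "__main__":
    # One subvolume per workload in a hypothetical volume named "cephfs".
    for workload, pool in WORKLOAD_POOLS.items():
        print(" ".join(subvolume_create_command("cephfs", workload, pool)))
    # A plain directory on a hypothetical client mount point.
    print(" ".join(directory_layout_command("/mnt/cephfs/backups",
                                            WORKLOAD_POOLS["archive"])))
```

A directory layout affects only files created after it is set; files that already exist remain in their original pool.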
Benefits of using multiple pools
Administrators can optimize their Ceph clusters by selecting pool attributes that align each subvolume's configuration with its workload priorities.
| Attribute | Type | Description |
|---|---|---|
| Data protection scheme | Replicated pools | Recommended for performance-sensitive workloads. |
| Data protection scheme | EC pools | Suitable for cost-efficient, durable storage. |
| Storage media | SSD | High throughput and low latency. |
| Storage media | HDD or QLC SSD | Cost-effective capacity for less demanding workloads. |
| Performance vs. cost trade-offs | Throughput-optimized | SSDs with replication for high-performance workloads. |
| Performance vs. cost trade-offs | TCO-optimized | HDDs or QLC SSDs with erasure coding for archival or backup workloads. |
By leveraging these attributes and following the recommendations above, administrators can optimize CephFS deployments to meet diverse operational requirements.
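One way the two ends of this trade-off might be provisioned is sketched below. The functions only return Ceph CLI command lines for a throughput-optimized pool (replicated, restricted to the `ssd` device class through a CRUSH rule) and a TCO-optimized pool (erasure-coded, restricted to the `hdd` device class). The pool, rule, and profile names, the `default` CRUSH root, the `host` failure domain, and the `k=4`, `m=2` erasure-coding parameters are illustrative assumptions, not recommendations from this document.

```python
from typing import List


def replicated_ssd_pool_commands(pool: str, rule: str = "ssd_rule") -> List[List[str]]:
    """Commands for a replicated pool placed on SSD-class OSDs."""
    return [
        # CRUSH rule selecting only OSDs of device class "ssd" (assumed root and failure domain).
        ["ceph", "osd", "crush", "rule", "create-replicated", rule, "default", "host", "ssd"],
        ["ceph", "osd", "pool", "create", pool],
        # Point the new pool at the SSD-only rule.
        ["ceph", "osd", "pool", "set", pool, "crush_rule", rule],
    ]


def ec_hdd_pool_commands(pool: str, profile: str = "hdd_ec_profile",
                         k: int = 4, m: int = 2) -> List[List[str]]:
    """Commands for an erasure-coded pool placed on HDD-class OSDs."""
    return [
        # Erasure-code profile; k and m are illustrative values only.
        ["ceph", "osd", "erasure-code-profile", "set", profile,
         f"k={k}", f"m={m}", "crush-device-class=hdd"],
        ["ceph", "osd", "pool", "create", pool, "erasure", profile],
        # CephFS data on an EC pool needs overwrites enabled.
        ["ceph", "osd", "pool", "set", pool, "allow_ec_overwrites", "true"],
    ]


if __name__ == "__main__":
    # Pool names are hypothetical.
    commands = (replicated_ssd_pool_commands("cephfs_data_ssd_rep")
                + ec_hdd_pool_commands("cephfs_data_hdd_ec"))
    for cmd in commands:
        print(" ".join(cmd))
```

Either pool would still need to be attached to the file system as a data pool before a subvolume or directory can use it.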