Multiple data pool use cases

Ceph File System (CephFS) supports multiple data pools in a single CephFS volume. Each data pool can be assigned to a dedicated subvolume or directory in the same file system. Administrators can then match data placement, storage media, and data protection to the workload that uses each subvolume or directory.
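The following minimal Python sketch shows the two assignment methods. It does not run anything. It builds the Ceph CLI commands as argument lists and prints them. It assumes the RADOS pool already exists and that the directory is on a mounted CephFS client. The file system name `cephfs`, pool `fast_pool`, subvolume `app01`, mount path `/mnt/cephfs/projects`, and the `assign_pool_commands` helper are hypothetical.

```python
import shlex


def assign_pool_commands(fs_name, pool, subvolume=None, directory=None):
    """Build (but do not run) commands that attach an existing RADOS pool
    to a CephFS file system and direct a subvolume and/or directory to it.
    Hypothetical helper for illustration only."""
    # Register the pool as an additional data pool of the file system.
    commands = [["ceph", "fs", "add_data_pool", fs_name, pool]]
    if subvolume is not None:
        # Create a subvolume whose file data is stored in `pool`.
        commands.append(["ceph", "fs", "subvolume", "create", fs_name,
                         subvolume, "--pool_layout", pool])
    if directory is not None:
        # Set the directory layout from a client mount. Only files created
        # after this change are placed in `pool`; existing files stay put.
        commands.append(["setfattr", "-n", "ceph.dir.layout.pool",
                         "-v", pool, directory])
    return commands


if __name__ == "__main__":
    for argv in assign_pool_commands("cephfs", "fast_pool", subvolume="app01",
                                     directory="/mnt/cephfs/projects"):
        print(shlex.join(argv))
```

Because the sketch only prints commands, it is safe to run anywhere. On an admin node, you could pass each list to `subprocess.run(argv, check=True)` to execute it.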

CephFS is a distributed, POSIX-compliant file system. Every CephFS file system requires a RADOS pool for metadata. By default, it also uses a single additional RADOS pool for file data. Adding data pools lets one file system support multiple workloads, storage media types, and data protection strategies.

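Note: When creating a new CephFS file system, configure the initial (default) data pool as a replicated SSD pool rather than an erasure-coded (EC) pool or an HDD pool. The default data pool stores the backtrace information for every file in the file system, so every file has at least one object in it, even when its contents are placed in another pool. Backtrace updates therefore generate small writes that benefit from replicated SSD storage. The CephFS metadata pool and the default data pool may be colocated on the same SSDs.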
Note: The default data pool cannot be changed after the file system is created. If the first data pool is not a replicated SSD pool, the only way to correct this is to:
  • Recreate the entire CephFS file system
  • Migrate all existing data to the new file system
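The following minimal Python sketch shows the recommended layout. It builds, without running, the commands that create an SSD-only replicated CRUSH rule, place the metadata pool and the default data pool on it, and then create the file system. It assumes the following:
  • OSDs report the `ssd` device class
  • The CRUSH root is named `default`
  • The Ceph release allows `pg_num` to be omitted, so the placement group autoscaler sizes the pools
The rule name, the pool names, and the `new_fs_on_ssd` helper are hypothetical.

```python
import shlex


def new_fs_on_ssd(fs_name, rule="ssd_rule", size=3):
    """Build (but do not run) commands for a new CephFS whose metadata pool
    and default data pool are both replicated on SSD OSDs.
    Hypothetical helper for illustration only."""
    metadata_pool = f"{fs_name}_metadata"
    data_pool = f"{fs_name}_data"
    # Replicated rule: root "default", one replica per host, SSD OSDs only.
    commands = [["ceph", "osd", "crush", "rule", "create-replicated",
                 rule, "default", "host", "ssd"]]
    for pool in (metadata_pool, data_pool):
        commands += [
            ["ceph", "osd", "pool", "create", pool],  # replicated by default
            ["ceph", "osd", "pool", "set", pool, "crush_rule", rule],
            ["ceph", "osd", "pool", "set", pool, "size", str(size)],
        ]
    # The data pool named here becomes the default data pool; it cannot be
    # removed from the file system later.
    commands.append(["ceph", "fs", "new", fs_name, metadata_pool, data_pool])
    return commands


if __name__ == "__main__":
    for argv in new_fs_on_ssd("cephfs"):
        print(shlex.join(argv))
```

The sketch creates the rule and both pools before `ceph fs new`, so the default data pool sits on replicated SSDs from the start. No later change is needed.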

Common use cases for multiple data pools include the following:

Layered file services

Assign dedicated data pools to subvolumes used for NFS shares or SMB shares. This isolates protocol-specific workloads and enables targeted performance tuning.
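One way to set this up is to create a subvolume group with `--pool_layout`. Every share subvolume created in the group then inherits the group's data pool, and you can look up each subvolume's path for the NFS export or SMB share definition. This minimal Python sketch builds those commands without running them. It assumes the dedicated pool was already added to the file system with `ceph fs add_data_pool`. The group `nfs_shares`, pool `nfs_pool`, share names, and the `share_group_commands` helper are hypothetical. The export or share definition is protocol-specific and is not shown.

```python
import shlex


def share_group_commands(vol_name, group, pool, shares):
    """Build (but do not run) commands that back a set of share subvolumes
    with one dedicated data pool. Hypothetical helper for illustration only."""
    # The group directory gets `pool` as its data pool layout.
    commands = [["ceph", "fs", "subvolumegroup", "create", vol_name, group,
                 "--pool_layout", pool]]
    for share in shares:
        # Without --pool_layout, a subvolume uses its parent directory's
        # layout, which here is the group's pool.
        commands.append(["ceph", "fs", "subvolume", "create", vol_name,
                         share, "--group_name", group])
        # Prints the subvolume path to use in the NFS export or SMB share.
        commands.append(["ceph", "fs", "subvolume", "getpath", vol_name,
                         share, "--group_name", group])
    return commands


if __name__ == "__main__":
    for argv in share_group_commands("cephfs", "nfs_shares", "nfs_pool",
                                     ["eng", "finance"]):
        print(shlex.join(argv))
```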

File System as a Service (FSaaS)

Service providers can offer differentiated storage tiers by mapping subvolumes to pools with distinct performance and durability characteristics.

Workload segmentation

Assign different workloads to subvolumes or directories backed by pools optimized for their specific needs. For optimized performance and cost efficiency:
  • Use fast media (SSDs) and replicated pools for throughput-intensive workloads requiring low latency and high IOPS.
  • Store archival or infrequently accessed data in subvolumes or directories backed by erasure-coded (EC) HDD or Quad-Level Cell (QLC) SSD pools to reduce total cost of ownership (TCO) while maintaining durability.
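For the archival case, the following minimal Python sketch prepares an erasure-coded pool for CephFS. It builds the commands without running them. It assumes the following:
  • OSDs use BlueStore, which EC overwrites require
  • The cluster has at least six hosts, which a k=4, m=2 profile with a host failure domain needs
  • The target OSDs are in the `hdd` device class
The device class is a parameter because how QLC SSDs are classed depends on the cluster. The profile `ec42`, the pool `archive_ec`, and the `archive_pool_commands` helper are hypothetical.

```python
import shlex


def archive_pool_commands(fs_name, pool, profile, k=4, m=2, device_class="hdd"):
    """Build (but do not run) commands that create an erasure-coded pool and
    add it to a CephFS file system. Hypothetical helper for illustration only."""
    return [
        # k data chunks + m coding chunks, each on a different host,
        # restricted to OSDs of one device class.
        ["ceph", "osd", "erasure-code-profile", "set", profile,
         f"k={k}", f"m={m}", "crush-failure-domain=host",
         f"crush-device-class={device_class}"],
        ["ceph", "osd", "pool", "create", pool, "erasure", profile],
        # CephFS needs partial overwrites enabled before it can use an EC pool.
        ["ceph", "osd", "pool", "set", pool, "allow_ec_overwrites", "true"],
        # Attach as an additional (non-default) data pool.
        ["ceph", "fs", "add_data_pool", fs_name, pool],
    ]


if __name__ == "__main__":
    for argv in archive_pool_commands("cephfs", "archive_ec", "ec42"):
        print(shlex.join(argv))
```

After the pool is added, you can direct an archive subvolume or directory to it, as with any other data pool. Use `--pool_layout` for a subvolume or the `ceph.dir.layout.pool` attribute for a directory.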

Benefits of using multiple pools

Administrators can optimize their Ceph clusters by choosing, for each subvolume or directory, the pool attributes that best match its workload priorities.

Table 1. Attribute selection considerations for CephFS optimization
Attribute                       Type                  Considerations
Data protection scheme          Replicated pools      Recommended for performance-sensitive workloads.
                                EC pools              Suitable for cost-efficient, durable storage.
Storage media                   SSD                   High throughput and low latency.
                                HDD or QLC SSD        Cost-effective capacity for less demanding workloads.
Performance vs. cost trade-off  Throughput-optimized  SSDs with replication for high-performance workloads.
                                TCO-optimized         HDDs or QLC SSDs with erasure coding for archival or backup workloads.
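Table 1 amounts to a two-way choice. The following Python sketch encodes it as a lookup. It assumes a workload can be classified simply as performance-sensitive or not. Real placement decisions also weigh capacity, failure domains, and cost figures that the table does not cover. The `PoolProfile` type, the `choose_profile` function, and the example workload names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolProfile:
    protection: str  # data protection scheme
    media: str       # storage media
    tradeoff: str    # performance vs. cost trade-off


# The two options described in Table 1.
THROUGHPUT = PoolProfile("replicated", "SSD", "throughput-optimized")
TCO = PoolProfile("erasure-coded", "HDD or QLC SSD", "TCO-optimized")


def choose_profile(performance_sensitive: bool) -> PoolProfile:
    """Return the Table 1 option that matches the workload."""
    return THROUGHPUT if performance_sensitive else TCO


if __name__ == "__main__":
    workloads = {"build scratch space": True, "backup archive": False}
    for name, sensitive in workloads.items():
        p = choose_profile(sensitive)
        print(f"{name}: {p.protection} pool on {p.media} ({p.tradeoff})")
```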

By selecting these attributes and following the recommendations above, administrators can tailor CephFS deployments to diverse operational requirements.