Caching options

The user-space implementation of the Ceph Block Device, librbd, cannot take advantage of the Linux page cache, so it includes its own in-memory caching, called RBD caching. Ceph Block Device caching behaves like well-behaved hard disk caching.

When the operating system sends a barrier or a flush request, all dirty data is written to the Ceph OSDs. This means that using write-back caching is as safe as using a well-behaved physical hard disk with a virtual machine that properly sends flushes, that is, a guest running Linux kernel version 2.6.32 or later. The cache uses a Least Recently Used (LRU) algorithm, and in write-back mode it can coalesce contiguous requests for better throughput.

Ceph Block Devices support write-back caching. To enable write-back caching, set rbd_cache = true in the [client] section of the Ceph configuration file. When caching is disabled, writes and reads go directly to the storage cluster, and writes return only when the data is on disk on all replicas. With caching enabled, writes return immediately, unless there are more than rbd_cache_max_dirty unflushed bytes. In that case, the write triggers a write-back and blocks until enough bytes are flushed.
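For example, write-back caching can be enabled with settings such as the following in ceph.conf. The byte values shown are illustrative examples, not tuning recommendations:

```ini
[client]
# Enable the librbd in-memory cache (write-back mode).
rbd_cache = true
# Total cache size in bytes (example value: 64 MiB).
rbd_cache_size = 67108864
# Dirty-byte limit at which a write blocks until data is flushed
# (example value: 48 MiB; must be less than rbd_cache_size).
rbd_cache_max_dirty = 50331648
```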

Ceph Block Devices also support write-through caching. You can set the size of the cache, and you can set targets and limits to switch from write-back caching to write-through caching. To enable write-through mode, set rbd_cache_max_dirty to 0. Writes then return only when the data is on disk on all replicas, but reads can still be served from the cache. The cache is in memory on the client, and each Ceph Block Device image has its own cache. Because the cache is local to the client, there is no coherency when other clients access the same image. Running a shared file system, such as GFS or OCFS, on top of Ceph Block Devices does not work with caching enabled.
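A minimal write-through configuration, following the rbd_cache_max_dirty = 0 rule above, might look like this:

```ini
[client]
# Keep the cache enabled so reads can be served from memory.
rbd_cache = true
# With no dirty bytes allowed, every write goes straight to the OSDs
# on all replicas before it returns (write-through).
rbd_cache_max_dirty = 0
```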

The Ceph configuration settings for Ceph Block Devices must be set in the [client] section of the Ceph configuration file, by default /etc/ceph/ceph.conf.

Table 1 lists the available cache settings.
Table 1. IBM Storage Ceph cache option settings
rbd_cache
    Description: Enable caching for RADOS Block Device (RBD).
    Type: Boolean | Required: No | Constraint: N/A | Default: true

rbd_cache_size
    Description: The RADOS Block Device (RBD) cache size in bytes.
    Type: 64-bit integer | Required: No | Constraint: N/A | Default: 32 MiB

rbd_cache_max_dirty
    Description: The dirty limit in bytes at which the cache triggers a write-back. If 0, uses write-through caching.
    Type: 64-bit integer | Required: No | Constraint: Must be less than rbd_cache_size. | Default: 24 MiB

rbd_cache_target_dirty
    Description: The dirty target before the cache begins writing data to the data storage. Does not block writes to the cache.
    Type: 64-bit integer | Required: No | Constraint: Must be less than rbd_cache_max_dirty. | Default: 16 MiB

rbd_cache_max_dirty_age
    Description: The number of seconds dirty data is in the cache before write-back starts.
    Type: Float | Required: No | Constraint: N/A | Default: 1.0

rbd_cache_max_dirty_object
    Description: The dirty limit for objects. Set to 0 to auto-calculate from rbd_cache_size.
    Type: Integer | Required: N/A | Constraint: N/A | Default: 0

rbd_cache_block_writes_upfront
    Description: If true, blocks writes to the cache before the aio_write call completes. If false, blocks before the aio_completion is called.
    Type: Boolean | Required: N/A | Constraint: N/A | Default: false

rbd_cache_writethrough_until_flush
    Description: Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but safe setting in case VMs running on RBD are too old to send flushes, such as the virtio driver in Linux before 2.6.32.
    Type: Boolean | Required: No | Constraint: N/A | Default: true
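Putting the options from Table 1 together, a [client] section that tunes the cache might look like the sketch below. All numeric values are illustrative only and must satisfy the listed constraints (target dirty < max dirty < cache size):

```ini
[client]
rbd_cache = true
# Stay in write-through mode until the guest sends its first flush,
# which guards against old guests that never send flushes.
rbd_cache_writethrough_until_flush = true
# Example sizes: 64 MiB cache, 48 MiB dirty limit, 32 MiB dirty target.
rbd_cache_size = 67108864
rbd_cache_max_dirty = 50331648
rbd_cache_target_dirty = 33554432
# Start writing back dirty data after it has been cached for 2 seconds.
rbd_cache_max_dirty_age = 2.0
```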