Ceph BlueStore BlueFS
The BlueStore block database stores metadata as key-value pairs in a RocksDB database. The block database resides on a small BlueFS partition on the storage device. BlueFS is a minimal file system that is designed to hold the RocksDB files.
BlueFS files
RocksDB produces three types of files:
Control files, for example CURRENT, IDENTITY, and MANIFEST-00011.
Database (DB) table files, for example 004112.sst.
Write ahead logs (WAL), for example 00038.log.
There is also an internal, hidden file, ino 1, that serves as the BlueFS replay log. It works as the directory structure, file mapping, and operations log.
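As an illustration of the three file types, the following minimal Python sketch sorts file names into control, DB, and WAL categories. The function name classify_rocksdb_file and the matching patterns are assumptions derived only from the example names above; they are not part of Ceph or RocksDB, and any other file name is reported as unknown.

```python
import re

# Hypothetical helper: patterns are inferred from the example names in this
# section (CURRENT, IDENTITY, MANIFEST-00011, 004112.sst, 00038.log).
def classify_rocksdb_file(name: str) -> str:
    """Return 'control', 'db', 'wal', or 'unknown' for a RocksDB file name."""
    if name in ("CURRENT", "IDENTITY") or re.fullmatch(r"MANIFEST-\d+", name):
        return "control"
    if re.fullmatch(r"\d+\.sst", name):
        return "db"       # database table file
    if re.fullmatch(r"\d+\.log", name):
        return "wal"      # write ahead log
    return "unknown"      # not one of the three types listed above


if __name__ == "__main__":
    for name in ("CURRENT", "MANIFEST-00011", "004112.sst", "00038.log"):
        print(name, "->", classify_rocksdb_file(name))
```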
Fallback hierarchy
With BlueFS it is possible to put any file on any device. Parts of a file can even reside on different devices, that is, WAL, DB, and SLOW. BlueFS places files in a fixed order: a file is put on secondary storage only when primary storage is exhausted, and on tertiary storage only when secondary storage is exhausted.
For each file type, BlueFS tries the devices in the following order:
Write ahead logs: WAL, DB, SLOW
Replay log ino 1: DB, SLOW
Control and DB files: DB, SLOW
Control and DB file order when running out of space: SLOW
IMPORTANT: There is an exception to the control and DB file order. When RocksDB detects that you are running out of space for DB files, it directly notifies you to put the file on the SLOW device.
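The fallback order and its exception can be modeled with a short Python sketch. This is a simplified illustration, not the BlueFS allocator: the names FALLBACK_ORDER, pick_device, and db_space_low are hypothetical, free space is passed in as a plain dictionary, and each request is placed whole on one device even though BlueFS can split parts of a file across devices.

```python
# Device preference per file type, copied from the list above.
FALLBACK_ORDER = {
    "wal": ["WAL", "DB", "SLOW"],
    "replay_log": ["DB", "SLOW"],
    "control": ["DB", "SLOW"],
    "db": ["DB", "SLOW"],
}


def pick_device(file_type, free_bytes, size, db_space_low=False):
    """Return the first device in the fallback order with room for `size` bytes.

    `db_space_low` models the exception: when space for DB files is running
    out, control and DB files go straight to SLOW.
    """
    order = FALLBACK_ORDER[file_type]
    if db_space_low and file_type in ("control", "db"):
        order = ["SLOW"]
    for device in order:
        if free_bytes.get(device, 0) >= size:
            return device
    raise RuntimeError(f"no device has {size} free bytes for a {file_type} file")


if __name__ == "__main__":
    free = {"WAL": 0, "DB": 100, "SLOW": 1000}
    print(pick_device("wal", free, 50))                    # WAL is full, falls back to DB
    print(pick_device("db", free, 50, db_space_low=True))  # exception applies: SLOW
```

In this model a WAL file lands on DB only because the WAL device has no free space, which mirrors the rule that secondary storage is used only when primary storage is exhausted.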