IBM Support

IBM Storage Ceph: BlueStore OSD bluestore_min_alloc_size, bluestore_min_alloc_size_hdd, bluestore_min_alloc_size_ssd values and ramifications

How To


Summary

- What are the bluestore_min_alloc_size options about?

- How to best determine values for our environment? Can these values be changed dynamically?

- How to find the actual value of min_alloc_size for a BlueStore OSD?

Environment

  • IBM Storage Ceph
  • OpenShift Data Foundation
  • Fusion Data Foundation

Steps

  • BlueStore OSDs allocate and manage underlying raw storage in units no smaller than bluestore_min_alloc_size.
  • bluestore_min_alloc_size_hdd is used for HDD based OSDs and bluestore_min_alloc_size_ssd is used for SSD and NVMe based OSDs.
  • Default values are:
    # ceph config show-with-defaults osd.0 | grep bluestore_min_alloc_size
    bluestore_min_alloc_size         0
    bluestore_min_alloc_size_hdd     4096
    bluestore_min_alloc_size_ssd     4096
  • For example, a 1 KB RADOS object allocates 4 KB of raw capacity, 3 KB of which is padding.  This is called space amplification. Note that IBM Storage Ceph 9 brings improvements that partly mitigate this effect.
  • The value of 0 for bluestore_min_alloc_size indicates that the value to use for a given OSD will be that of bluestore_min_alloc_size_hdd or bluestore_min_alloc_size_ssd, depending on the detected device type when each OSD is created. The device type is determined by the kernel's rotational attribute for the main OSD block device. For example, an HDD reports as below:
    # cat  /sys/block/sdp/queue/rotational
    1
  • The 4KB defaults in recent IBM Storage Ceph releases work well for most deployments on HDDs or conventional SSDs.
  • There are rare use-cases that may benefit from raising these values, for example to the historical setting of 64 KB.  The defaults were lowered because this larger size resulted in significant space amplification for RGW and CephFS pools storing large numbers of user objects or files smaller than several multiples of the min_alloc_size, especially for erasure coded (EC) pools.  IBM advises that the default 4 KB values not be modified.  If you suspect that you would benefit from adjusted values, please contact your IBM Storage Ceph support team for guidance.
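The space amplification described above can be estimated with simple arithmetic. The following is a minimal sketch, not part of any Ceph tooling: the function name allocated_bytes is hypothetical, and the model assumes an object is simply rounded up to the next multiple of min_alloc_size. It ignores replication, erasure coding, compression, metadata, and the IBM Storage Ceph 9 improvements mentioned above.

```shell
# Hypothetical helper: round an object size up to a multiple of min_alloc_size.
# $1 = object size in bytes, $2 = min_alloc_size in bytes
allocated_bytes() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

allocated_bytes 1024 4096     # 1 KB object with the 4 KB default
allocated_bytes 1024 65536    # same object with the historical 64 KB setting
```

Under this simplified model, the same 1 KB object occupies sixteen times more raw capacity with a 64 KB min_alloc_size than with the 4 KB default.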

  • Recent QLC and QLC-class SSDs may have a coarse IU (indirection unit) that is larger than 4 KB.  Examples include the Intel / Solidigm P5316 and P5336 and the Micron 6550 ION.  These models perform best when writes are aligned to IU boundaries.  When provisioning OSDs on such SSDs, be sure to set the value below before deployment:
    # ceph config set global bluestore_use_optimal_io_size_for_min_alloc_size true
    This directs the OSD creation logic to query /sys/block/<device>/queue/optimal_io_size (for example /sys/block/sdp/queue/optimal_io_size) for each OSD's main block device.  With recent kernels this reflects the appropriate value for coarse-IU QLC-class SSDs.  If the value is 0, the logic falls back to the 4 KB defaults above.
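Before deploying, it can help to preview what the selection described above would likely yield for a given device. The sketch below is an illustration of the behaviour as described in this document, not the actual OSD creation code. It assumes bluestore_use_optimal_io_size_for_min_alloc_size is set to true and that the 4096 defaults are unchanged. The function name preview_min_alloc and the device name sdp are placeholders.

```shell
# Hypothetical helper: preview the min_alloc_size a new OSD would likely get.
# $1 = contents of queue/optimal_io_size, $2 = contents of queue/rotational
preview_min_alloc() {
    if [ "$1" -gt 0 ]; then
        echo "$1"      # kernel-reported optimal I/O size (coarse-IU SSDs)
    elif [ "$2" -eq 1 ]; then
        echo 4096      # default bluestore_min_alloc_size_hdd
    else
        echo 4096      # default bluestore_min_alloc_size_ssd
    fi
}

# Usage on an OSD host; replace sdp with the OSD's main block device.
dev=sdp
if [ -r "/sys/block/$dev/queue/optimal_io_size" ]; then
    preview_min_alloc "$(cat "/sys/block/$dev/queue/optimal_io_size")" \
                      "$(cat "/sys/block/$dev/queue/rotational")"
fi
```

The value an OSD actually received should still be confirmed after deployment from the OSD's metadata or startup log.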

Additional Information

Important
  • The bluestore_min_alloc_size_(hdd|ssd) central config values can be changed dynamically, but the setting in effect when a given OSD is created is baked in and cannot be changed for that OSD without zapping and redeploying it with the new value set in the central config database. Additional information may be found in the administration guide.
  • BlueStore OSDs deployed on releases earlier than IBM Storage Ceph 6 may have been created with min_alloc_size values larger than 4 KB.  These OSDs may be detected as described below and should be zapped and redeployed, one at a time, allowing recovery to complete before proceeding to the next affected OSD.
  • How to identify the value of min_alloc_size with which a given OSD was created:
    • In IBM Storage Ceph 7 and later releases, this can be found from each OSD's metadata:
      # ceph osd metadata osd.1701 | grep alloc
      "bluestore_min_alloc_size": "4096"
    • In IBM Storage Ceph 6 and earlier releases, however, the above command does not display this attribute. We recommend that clusters running these older releases update to IBM Storage Ceph 8 or later to gain this ability and a host of other functionality and performance improvements. Until then, look for min_alloc_size in each OSD's log messages during OSD startup:
      2024-02-18T03:31:30.353+0000 7fa4d062c200  1 bluestore(/var/lib/ceph/osd/ceph-4) _open_super_meta min_alloc_size 0x1000

      Converting the hexadecimal value 0x1000 to decimal gives 4096 bytes (4 KB).
  • The bluestore_min_alloc_size_(hdd|ssd) values cannot be set lower than bdev_block_size (default 4096 bytes, 4 KB). If objects smaller than this are expected, bdev_block_size would have to be lowered as well, and its value must be a multiple of 512. Note that lowering it is not recommended, as it may cause the OSD to crash at startup.

  • There may be OSDs of the same type (SSD or HDD) in your cluster that were built with differing bluestore_min_alloc_size_(hdd|ssd) values.  Unless the specific SSD OSDs are on coarse-IU QLC-class devices, these should be serially zapped and redeployed as described above.
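When checking many OSD startup logs on older releases, the hexadecimal conversion can be scripted. The following is a minimal sketch that assumes the log line has the same shape as the example above. The function name log_min_alloc_bytes is hypothetical and is not a Ceph command.

```shell
# Hypothetical helper: extract the hex min_alloc_size from one OSD startup
# log line and print it as a decimal byte count.
log_min_alloc_bytes() {
    hex=$(printf '%s\n' "$1" | sed -n 's/.*min_alloc_size \(0x[0-9a-fA-F]*\).*/\1/p')
    if [ -n "$hex" ]; then
        printf '%d\n' "$hex"    # printf accepts 0x-prefixed integers for %d
    fi
}

log_min_alloc_bytes '2024-02-18T03:31:30.353+0000 7fa4d062c200  1 bluestore(/var/lib/ceph/osd/ceph-4) _open_super_meta min_alloc_size 0x1000'
```

For the example line this prints 4096. A result larger than 4096 on a device that is not a coarse-IU SSD would mark that OSD as a candidate for the zap and redeploy procedure described above.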

Document Location

Worldwide


Document Information

Modified date:
21 July 2025

UID

ibm17229530