IBM Support

OSD deployment fails due to non-empty or unclean storage drives

Troubleshooting


Problem

  • OSD drives are not cleaned properly when redeploying OSDs.
  • ODF deployment fails to deploy OSDs due to unclean devices.
 

Symptom

OSD recreation fails if the same drive is reused without proper cleanup.

Cause

As per PR 55374, Ceph writes the bdev label at the start of the device (offset 0) and copies it to four redundant locations: 1GB, 10GB, 100GB, and 1000GB. If the label copies do not all read and match properly, the OSD refuses to start.
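Because a naive wipe of only the start of the disk leaves the higher-offset copies intact, it helps to know exactly where those copies live. A minimal sketch that prints the byte offsets implied by the locations above:

```shell
# Byte offsets of the BlueStore bdev label copies (0 = original, rest are replicas)
for gb in 0 1 10 100 1000; do
  echo "${gb}GB -> byte offset $((gb * 1024 * 1024 * 1024))"
done
```

Any cleanup procedure has to zero the label at each of these offsets, not just at offset 0.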
 
 

Environment

  • IBM Storage Ceph 8.x and above
  • IBM Storage Fusion Data Foundation 4.18 and above
  • Red Hat OpenShift Data Foundation 4.18 and above
  • Baremetal and VMware deployments with LSO 
 

Diagnosing The Problem

  • In an ODF environment, after zapping a device with only the commands "sgdisk --zap-all /dev/<disk> && wipefs -af /dev/<disk>", the OSD does not start and reports the following error:
 
sha256-0cc6d999e5e52bfe425b80493657ffd973e8d8729faf520d2217e6ccef6c08ca/namespaces/openshift-storage/pods/rook-ceph-osd-3-6cf5c9757f-ffq2f/expand-bluefs/expand-bluefs/logs/current.log
2025-04-01T07:19:54.918623613Z inferring bluefs devices from bluestore path
2025-04-01T07:19:54.919354059Z 2025-04-01T07:19:54.918+0000 7f65f1cffa40 -1 bluestore(/var/lib/ceph/osd/ceph-3/block) _read_bdev_label /var/lib/ceph/osd/ceph-3/block data at 0, unable to decode label
2025-04-01T07:19:55.190101707Z 2025-04-01T07:19:55.189+0000 7f65f1cffa40 -1 bluestore(/var/lib/ceph/osd/ceph-3/block) _read_bdev_label /var/lib/ceph/osd/ceph-3/block data at 0, unable to decode label
2025-04-01T07:19:55.190313374Z 2025-04-01T07:19:55.189+0000 7f65f1cffa40 -1 bluestore(/var/lib/ceph/osd/ceph-3) _check_main_bdev_label not all labels read properly
2025-04-01T07:19:55.458628292Z /builddir/build/BUILD/ceph-19.2.0/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7f65f1cffa40 time 2025-04-01T07:19:55.458515+0000
2025-04-01T07:19:55.458628292Z /builddir/build/BUILD/ceph-19.2.0/src/os/bluestore/BlueStore.cc: 8837: FAILED ceph_assert(r == 0)
2025-04-01T07:19:55.459052905Z  ceph version 19.2.0-98.el9cp (d5c3cf625491b0bd76b4585e77aa0d907446f314) squid (stable)
2025-04-01T07:19:55.459052905Z  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x11e) [0x7f65f2ec2cb0]
2025-04-01T07:19:55.459052905Z  2: /usr/lib64/ceph/libceph-common.so.2(+0x182e6f) [0x7f65f2ec2e6f]
2025-04-01T07:19:55.459052905Z  3: (BlueStore::expand_devices(std::ostream&)+0xa43) [0x55a10722c023]
2025-04-01T07:19:55.459052905Z  4: main()
2025-04-01T07:19:55.459052905Z  5: /lib64/libc.so.6(+0x295d0) [0x7f65f28375d0]
2025-04-01T07:19:55.459052905Z  6: __libc_start_main()
2025-04-01T07:19:55.459052905Z  7: _start()
2025-04-01T07:19:55.459052905Z *** Caught signal (Aborted) **
2025-04-01T07:19:55.459052905Z  in thread 7f65f1cffa40 thread_name:ceph-bluestore-
2025-04-01T07:19:55.459069615Z 2025-04-01T07:19:55.458+0000 7f65f1cffa40 -1 /builddir/build/BUILD/ceph-19.2.0/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7f65f1cffa40 time
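One way to confirm that stale label copies are the culprit is to look for the BlueStore magic string ("bluestore block device") at each offset listed under Cause. The read-only sketch below is hedged: it runs against a scratch image file instead of a real disk, and only checks the offsets that fit inside the small image; on a real device you would point it at /dev/<disk> and also check the 10GB, 100GB, and 1000GB offsets.

```shell
# Demo: stamp the BlueStore label magic at offset 0 of a scratch image,
# then scan the label-copy offsets (read-only on the "disk") for it.
IMG=$(mktemp)
truncate -s 2G "$IMG"                                  # sparse 2 GiB stand-in for /dev/<disk>
printf 'bluestore block device' | dd of="$IMG" conv=notrunc 2>/dev/null

found=""
for gb in 0 1; do                                      # on a real disk also check 10 100 1000
  if dd if="$IMG" bs=1 count=22 skip=$((gb * 1024 * 1024 * 1024)) 2>/dev/null \
       | grep -q 'bluestore block device'; then
    found="$found ${gb}GB"
  fi
done
echo "label copies found at:$found"
rm -f "$IMG"
```

If this finds the magic string at any offset after a wipe, the wipe did not reach all of the label copies.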

Resolving The Problem

  • To avoid OSD creation failure, clean the drive with the following steps:
DISK="/dev/sdX"

# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
sgdisk --zap-all $DISK

# Wipe 800 KiB at each bdev label location (0, 1GB, 10GB, 100GB, 1000GB);
# this also removes LVM metadata that may be present.
# seek counts 4 KiB blocks, so gb * 262144 blocks = gb GiB.
for gb in 0 1 10 100 1000; do dd if=/dev/zero of="$DISK" bs=4K count=200 oflag=direct,dsync seek=$((gb * 262144)); done

# SSDs may be better cleaned with blkdiscard instead of dd
# This might not be supported on all devices
blkdiscard $DISK

# Inform the OS of partition table changes
partprobe $DISK
  • An enhancement to have the ODF operator clean up Ceph BlueStore metadata from OSD devices before deploying the cluster is being tracked as ODFRFE-19.
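Because the dd loop above is destructive, it can be worth previewing the exact commands before running them against a device. A small sketch under stated assumptions: wipe_bdev_labels is our own helper name, not a Ceph or ODF tool, and DRY_RUN=1 makes it print the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper around the dd wipe step; not a Ceph/ODF tool.
wipe_bdev_labels() {
  disk="$1"
  for gb in 0 1 10 100 1000; do
    # seek counts 4 KiB blocks: gb * 262144 blocks = the gb-GiB label offset
    cmd="dd if=/dev/zero of=$disk bs=4K count=200 oflag=direct,dsync seek=$((gb * 262144))"
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$cmd"; else $cmd; fi
  done
}

DRY_RUN=1
wipe_bdev_labels /dev/sdX
```

Reviewing the five printed commands (and the target device name) before re-running without DRY_RUN is a cheap guard against wiping the wrong disk.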

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB69","label":"Storage TPS"},"Business Unit":{"code":"BU048","label":"IBM Software"},"Product":{"code":"SSEG27","label":"IBM Storage Ceph"},"ARM Category":[{"code":"a8m3p000000UoIPAA0","label":"Support Reference Guide"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"8.0.0;8.1.0;9.0.0"},{"Type":"MASTER","Line of Business":{"code":"LOB66","label":"Technology Lifecycle Services"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SSSEWFV","label":"Storage Fusion Data Foundation"},"ARM Category":[{"code":"a8m3p000000UoIPAA0","label":"Support Reference Guide"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":""}]

Document Information

Modified date:
26 November 2025

UID

ibm17242545