Bug fixes

`cephadm` utility

Improved error reporting for OSD redeployment when DB device slots are unavailable

Previously, the ceph-volume batch command did not correctly count devices with existing LVM data. After zapping an OSD, redeployment could fail silently because no DB device slots were available.

With this fix, the lack of available DB device slots is treated as an error, and a clear message is logged. As a result, when deployment fails because no DB device slots are available, users are informed through explicit error messages in the logs.

(IBMCEPH-12884)

Invalid HA cluster port values are now prevented

Previously, when creating a high availability (HA) cluster with a custom port, adding an internal offset to that port could cause the assigned port to exceed the valid range.

With this fix, validation has been added to ensure that the resulting port value does not exceed the maximum allowed limit of 65535. As a result, if the provided custom port plus the offset exceeds the valid range, the HA cluster creation command fails with a validation error instead of assigning an invalid port.
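The validation rule can be illustrated with a minimal Python sketch. The function name validate_ha_port and its parameters are hypothetical and are not part of cephadm. Only the 65535 upper limit comes from this fix.

```python
MAX_PORT = 65535  # highest valid TCP/UDP port number


def validate_ha_port(custom_port: int, offset: int) -> int:
    """Return the effective port, or raise ValueError if it exceeds 65535.

    Hypothetical helper that mirrors the documented rule: the custom port
    plus the internal offset must not exceed the maximum port number.
    """
    effective = custom_port + offset
    if effective > MAX_PORT:
        raise ValueError(
            f"port {custom_port} + offset {offset} = {effective} "
            f"exceeds the maximum allowed port {MAX_PORT}"
        )
    return effective
```

For example, validate_ha_port(8080, 1) returns 8081. validate_ha_port(65535, 1) raises an error instead of producing an unusable port.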

(IBMCEPH-11819)

Prometheus can now access NFS exporter metrics endpoint

Previously, Prometheus could not access the NFS exporter metrics endpoint because the NFS monitoring port was not opened in the firewall rules. As a result, NFS metrics URLs were unreachable.

With this fix, the required NFS monitoring port is opened in the firewall rules. Prometheus can now access the metrics endpoint without manual firewall configuration.

(IBMCEPH-13144)

Ceph Manager

Manager modules now load reliably after failover

Previously, a race condition in the manager (MGR) component could cause modules to fail to load in time after a failover. This could happen during an upgrade or when running the ceph mgr fail command. As a result, MGR module commands could fail with misleading error messages indicating that a module was not enabled or loaded, even though the issue was related to delayed loading.

With this fix, the MGR declares itself active only after all modules have been successfully loaded and are ready to receive commands. In addition, error messages have been improved to clearly distinguish between modules that failed to load in time and those that are not enabled. As a result, MGR module commands now execute reliably after failover, and clearer health warnings are provided if any modules fail to load in time.
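The readiness gate and the two distinct errors can be modeled with a simplified Python sketch. ManagerSketch, ModuleNotEnabled, and ModuleNotLoadedYet are hypothetical names and do not correspond to the actual ceph-mgr code.

```python
class ModuleNotEnabled(Exception):
    """The command targets a module that is not enabled."""


class ModuleNotLoadedYet(Exception):
    """The module is enabled but has not finished loading."""


class ManagerSketch:
    """Hypothetical model of the readiness gate; not the real ceph-mgr code."""

    def __init__(self, enabled_modules):
        self.enabled = set(enabled_modules)
        self.loaded = set()

    def mark_loaded(self, name):
        # Called when a module finishes loading and can accept commands.
        if name in self.enabled:
            self.loaded.add(name)

    @property
    def active(self):
        # The manager reports itself active only once every enabled module is ready.
        return self.loaded == self.enabled

    def handle_command(self, module, cmd):
        # Distinguish "not enabled" from "enabled but still loading".
        if module not in self.enabled:
            raise ModuleNotEnabled(f"module '{module}' is not enabled")
        if module not in self.loaded:
            raise ModuleNotLoadedYet(f"module '{module}' has not finished loading")
        return f"{module}: ran {cmd}"
```

Keeping the two error types separate prevents a module that is only loading slowly from being reported as disabled.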

(IBMCEPH-9881)

Ceph Dashboard

Improved clarity of performance card empty states

Previously, a generic empty state message was displayed when storage or Prometheus was not configured, providing no clear indication of the issue.

With this fix, the empty state message is selected dynamically based on the configuration status. As a result, users can clearly identify whether Prometheus is disabled or not configured, or whether storage is not configured. When everything is configured, the performance charts are displayed as expected.
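The following is a hypothetical Python sketch of how a card could choose its empty state from the configuration status. The function name, the order of the checks, and the message strings are assumptions and do not reflect the Ceph Dashboard source.

```python
def performance_card_state(storage_configured: bool,
                           prometheus_configured: bool,
                           prometheus_enabled: bool):
    """Return an empty-state message, or None to render the charts.

    Hypothetical logic: the most fundamental missing piece is reported first.
    """
    if not storage_configured:
        return "Storage is not configured"
    if not prometheus_configured:
        return "Prometheus is not configured"
    if not prometheus_enabled:
        return "Prometheus is disabled"
    return None  # no empty state: display the performance charts
```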

(IBMCEPH-13420)

Duplicate storage class name validation in Ceph Dashboard

Previously, the Ceph Dashboard did not validate duplicate storage class names during creation. As a result, creating a storage class with a name that already existed unintentionally modified the existing storage class instead of creating a new one.

With this fix, validation has been added to reject duplicate storage class names. As a result, users can no longer create a storage class with a name that already exists, and existing storage classes are no longer modified unintentionally.
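The difference between an upsert and a validated create can be shown with a minimal Python sketch. In this model, an unconditional assignment reproduces the old behavior. The names create_storage_class and DuplicateStorageClassError are hypothetical.

```python
class DuplicateStorageClassError(ValueError):
    """Raised when a storage class with the requested name already exists."""


def create_storage_class(storage_classes: dict, name: str, config: dict) -> None:
    """Add a new storage class, refusing to overwrite an existing one.

    Hypothetical helper. Without the membership check, the assignment below
    would silently modify an existing storage class with the same name.
    """
    if name in storage_classes:
        raise DuplicateStorageClassError(f"storage class '{name}' already exists")
    storage_classes[name] = config
```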

(IBMCEPH-13044)

Ceph File System (CephFS)

Quota display is now accurate in mixed quota modes

Previously, if a quota was set on a parent directory, that directory was used as the quota root for displaying all quota information. This caused incorrect values when different quota types, such as max_files and max_bytes, were defined at different directory levels.

With this fix, the system identifies the correct quota root separately for max_files and max_bytes. As a result, quotas are displayed correctly in mixed quota configurations.
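The idea of resolving each quota attribute independently can be sketched in Python. The function find_quota_root and the dictionary layout are hypothetical and are not the CephFS implementation. A value of 0 is treated as "no quota", consistent with CephFS quota semantics.

```python
import posixpath


def find_quota_root(path: str, quotas: dict, attr: str):
    """Return the nearest directory at or above `path` that sets `attr`.

    `quotas` maps directory paths to dicts such as {"max_bytes": 100}.
    Each attribute is resolved on its own, so max_files and max_bytes may
    come from different ancestors. A value of 0 counts as "no quota".
    """
    current = path
    while True:
        if quotas.get(current, {}).get(attr):
            return current
        if current == "/":
            return None
        current = posixpath.dirname(current)
```

With max_bytes set on /a and max_files set on /a/b, a lookup from /a/b/c resolves max_files to /a/b and max_bytes to /a. This is the mixed configuration that a single shared quota root displays incorrectly.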

(IBMCEPH-12286)

Fixed stability issues in LogSegment lifecycle management

Previously, LogSegment objects were managed using raw pointers, allowing multiple MDS components to reference them without a clear ownership model. This could leave dangling pointers when segments were trimmed or expired.

With this fix, reference counting has been introduced for LogSegment. As a result, segments are released only after all references are cleared, eliminating dangling pointers and improving MDS stability during operations such as replay and recovery.
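The ownership model behind this fix can be illustrated with a minimal Python sketch of a reference-counted segment. LogSegmentSketch is hypothetical and is not the MDS C++ class. In C++, the same role is commonly filled by std::shared_ptr or an intrusive reference count.

```python
class LogSegmentSketch:
    """Hypothetical reference-counted segment; not the MDS implementation."""

    def __init__(self, seq: int):
        self.seq = seq
        self.refs = 1  # reference held by the journal that created the segment
        self.released = False

    def get(self):
        # Another component takes a reference before using the segment.
        if self.released:
            raise RuntimeError(f"segment {self.seq} used after release")
        self.refs += 1
        return self

    def put(self):
        # Drop a reference. The segment is freed only when the last one goes.
        if self.refs == 0:
            raise RuntimeError(f"segment {self.seq}: put() without matching get()")
        self.refs -= 1
        if self.refs == 0:
            self.released = True
```

In this model, trimming the segment from the journal is just another put(). A component that still holds a reference keeps the segment alive instead of being left with a dangling pointer.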

(IBMCEPH-11811)

Snapshot info now correctly includes enctag

Previously, the enctag field was not populated when querying filesystem snapshot information. As a result, an error occurred and the snapshot info output was not fully displayed.

With this fix, the enctag field is populated during snapshot queries. As a result, snapshot information is displayed correctly and includes all attributes, such as enctag.

(IBMCEPH-13575)

NFS client mounts remain accessible after MDS failover

Previously, after an MDS failover, the new active MDS could not properly regenerate client sessions during journal replay because the auth_name field was empty. As a result, the MDS returned -EPERM when NFS clients attempted to reclaim sessions, making mounts inaccessible.

With this fix, the auth_name field is consistently encoded and decoded for each ESession event. As a result, the MDS correctly restores client sessions during replay, allowing NFS clients to reclaim sessions and maintain mount accessibility.
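Symmetric encoding and decoding of a field can be sketched in Python. The byte layout is hypothetical: a little-endian u64 client ID followed by a u32 length-prefixed auth_name. It is not Ceph's actual versioned ESession encoding. The point is that a field written on encode must be read back on decode, so replay does not see it as empty.

```python
import struct

HEADER = "<QI"  # hypothetical layout: u64 client id, u32 auth_name length


def encode_esession(client_id: int, auth_name: str) -> bytes:
    name = auth_name.encode("utf-8")
    # auth_name is always written, so replay can restore it.
    return struct.pack(HEADER, client_id, len(name)) + name


def decode_esession(buf: bytes):
    client_id, name_len = struct.unpack_from(HEADER, buf, 0)
    offset = struct.calcsize(HEADER)
    auth_name = buf[offset:offset + name_len].decode("utf-8")
    return client_id, auth_name
```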

(IBMCEPH-14183)

MDS no longer crashes due to improper mds_lock usage

Previously, a bug in the MDS allowed certain code paths to execute without holding the required mds_lock, leading to unsafe behavior and unpredictable crashes.

With this fix, the mds_lock is consistently held wherever required. As a result, the MDS operates reliably without unexpected crashes.
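One common way to enforce "hold the lock wherever required" is to check lock ownership at the top of each code path that touches shared state. The Python sketch below is hypothetical: OwnedLock, update_session, and session_state are illustrative stand-ins and are not MDS code.

```python
import threading


class OwnedLock:
    """Lock that records its owning thread so callers can check ownership."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def __enter__(self):
        self._lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, *exc):
        self._owner = None
        self._lock.release()
        return False

    def held_by_me(self) -> bool:
        return self._owner == threading.get_ident()


mds_lock = OwnedLock()  # hypothetical stand-in for the MDS mds_lock
session_state = {}


def update_session(client_id, state):
    # Refuse to touch shared state unless the caller holds mds_lock.
    if not mds_lock.held_by_me():
        raise RuntimeError("update_session() called without mds_lock held")
    session_state[client_id] = state
```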

(IBMCEPH-13700, IBMCEPH-13705)

Ceph Block Device

Clone image sparseness is preserved during live migration

Previously, due to an implementation defect, sparseness was lost when live migrating a clone image. As a result, the destination image could consume significantly more space than the source clone image, including its entire parent chain.

With this fix, the implementation has been corrected to preserve sparseness during live migration. As a result, the destination image now consumes the same amount of space as the source clone image, including its parent chain.

(IBMCEPH-13504)

Ceph Object Gateway

Replayed multipart upload requests now return correct etag

Previously, the CompleteMultipartUpload::etag member was not assigned when replaying an already completed multipart upload request. As a result, an empty etag header was returned for replayed requests.

With this fix, the etag member is assigned the validated value in the replay response path. As a result, replayed requests now return the correct etag as expected.
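A simplified Python sketch of an idempotent completion path follows. On replay, it returns the ETag recorded at first completion instead of an empty value. The ETag format follows the widely used S3 multipart convention: the MD5 of the concatenated binary part MD5s, followed by "-" and the part count. The names complete_multipart_upload, multipart_etag, and completed are hypothetical. The sketch also skips the part validation that a real gateway performs on replay.

```python
import hashlib


def multipart_etag(part_bodies) -> str:
    """S3-style multipart ETag: MD5 of the concatenated part MD5 digests, plus '-<count>'."""
    digests = b"".join(hashlib.md5(body).digest() for body in part_bodies)
    return f'"{hashlib.md5(digests).hexdigest()}-{len(part_bodies)}"'


completed = {}  # upload_id -> ETag recorded when the upload first completed


def complete_multipart_upload(upload_id, part_bodies) -> str:
    # Replay of an already completed upload: return the stored ETag
    # rather than leaving the response ETag empty.
    if upload_id in completed:
        return completed[upload_id]
    etag = multipart_etag(part_bodies)
    completed[upload_id] = etag
    return etag
```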

(IBMCEPH-13821)

Conditional multipart upload operations now function as expected

Previously, HTTP preconditions were not supported for conditional multipart upload, even though the multipart upload S3 operations themselves were supported.

With this fix, HTTP preconditions for conditional multipart upload are fully implemented. As a result, conditional multipart upload operations are now supported as expected.
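The Python sketch below shows how If-Match and If-None-Match preconditions could be evaluated when completing a multipart upload. It uses simplified HTTP precondition semantics (strong comparison of a single ETag or "*", 412 on failure). The function name is hypothetical, and the exact error codes returned by Ceph Object Gateway or Amazon S3 may differ.

```python
def evaluate_preconditions(current_etag, if_match=None, if_none_match=None) -> int:
    """Return 200 if the write may proceed, or 412 (Precondition Failed).

    current_etag is the ETag of the existing object, or None if there is none.
    """
    # If-Match: proceed only if the object exists and ("*" or) its ETag matches.
    if if_match is not None:
        if current_etag is None or (if_match != "*" and if_match != current_etag):
            return 412
    # If-None-Match: "*" means proceed only if no object exists yet.
    if if_none_match is not None and current_etag is not None:
        if if_none_match == "*" or if_none_match == current_etag:
            return 412
    return 200
```

For example, sending If-None-Match: * on completion prevents a concurrent writer from overwriting an object that another upload has already created.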

(IBMCEPH-13557)

Unordered bucket listings no longer repeat entries or loop indefinitely

Previously, unordered bucket listing had a bug when traversing regions of the entry space, which could cause entries to repeat or the operation to loop indefinitely.

With this fix, the traversal logic has been corrected. As a result, unordered bucket listing completes as expected without duplicate entries or infinite loops.

(IBMCEPH-12291)

Ceph Object Gateway no longer crashes during lifecycle processing of versioned buckets

Previously, the rgw daemon could crash during lifecycle (LC) processing on versioned buckets with a large number of object versions, particularly when expiration and noncurrent version expiration rules were applied concurrently.

With this fix, lifecycle processing handles object listing boundaries correctly, preventing invalid memory access during concurrent or batched operations. As a result, Ceph Object Gateway (rgw) no longer crashes during lifecycle deletion, and lifecycle processing completes reliably on large versioned buckets.

(IBMCEPH-14308)

Log buckets are now supported in erasure-coded pools

Previously, log buckets could not be created in erasure-coded (EC) pools due to append operation limitations.

With this fix, log records are first written to a temporary object in the replicated default.rgw.log pool and then copied to the EC pool upon commit. Implicit commit operations are handled asynchronously by a new BucketLoggingManager, while explicit commits continue to run synchronously. As a result, log buckets can now be used with EC pools, enabling more flexible storage configurations.
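The write path described above can be modeled with a simplified Python sketch. Appends go to a temporary object that stands in for the replicated default.rgw.log pool. A commit copies the whole object to a dictionary that stands in for the EC pool. Implicit commits are queued to a background worker, and explicit commits run inline. BucketLoggingSketch is hypothetical and is not the BucketLoggingManager implementation.

```python
import queue
import threading


class BucketLoggingSketch:
    """Hypothetical model of the temp-object-then-copy logging path."""

    def __init__(self):
        self.replicated_tmp = {}  # stands in for temp objects in default.rgw.log
        self.ec_pool = {}         # stands in for the erasure-coded log bucket pool
        self.lock = threading.Lock()
        self.pending = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def append(self, key, record: bytes):
        # Appends target the replicated pool, which supports them.
        with self.lock:
            self.replicated_tmp[key] = self.replicated_tmp.get(key, b"") + record

    def _commit(self, key):
        # One full-object write to the EC pool, then drop the temp object.
        with self.lock:
            data = self.replicated_tmp.pop(key, None)
            if data is not None:
                self.ec_pool[key] = data

    def commit_explicit(self, key):
        self._commit(key)  # synchronous: the caller waits for completion

    def commit_implicit(self, key):
        self.pending.put(key)  # asynchronous: handled by the background worker

    def _worker(self):
        while True:
            key = self.pending.get()
            self._commit(key)
            self.pending.task_done()
```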

(IBMCEPH-11747)

Restore operations now validate storage class and return correct results

Previously, restore operations targeting a non-existent storage class succeeded silently and defaulted to the STANDARD class. However, the head-object API still reported the non-existent storage class, creating a mismatch between actual data placement and reported metadata.

With this fix, the restore operation validates whether the requested storage class exists before proceeding and fails with an appropriate error if it does not. As a result, restore operations no longer succeed with invalid storage classes, and object metadata accurately reflects the actual storage class.
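The Python sketch below illustrates validating the target storage class before any restore work begins. The names restore_object and InvalidStorageClassError, the "CLOUDTIER" class, and the dictionary-based object model are hypothetical.

```python
class InvalidStorageClassError(ValueError):
    """Raised when a restore targets a storage class that does not exist."""


def restore_object(obj: dict, requested_class: str, known_classes: set) -> dict:
    """Validate the target storage class first, then record the restore."""
    if requested_class not in known_classes:
        # Fail instead of silently falling back to STANDARD.
        raise InvalidStorageClassError(
            f"storage class '{requested_class}' does not exist"
        )
    restored = dict(obj)
    # Metadata reflects the class where the data was actually placed.
    restored["storage_class"] = requested_class
    return restored
```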

(IBMCEPH-11517)

Cloud restore and transition operations are more robust and reliable

Previously, multiple issues affected cloud restore and transition workflows. Setting Days=0 in restore requests could leave objects in a broken state, multipart upload resumption could fail, restored objects could include incorrect ETag formatting, and the restore process was not shut down in the correct order.

With this fix, these issues are addressed across the restore and transition modules to ensure correct request handling, proper ETag formatting, and orderly process management. As a result, cloud restore and transition operations are more reliable, improving overall stability and service availability.

(IBMCEPH-9849)

Ceph Object Gateway multi-site

Bucket index metadata remains consistent after lifecycle expiration

Previously, in rare cases, lifecycle expiration in versioned buckets could leave behind stale bucket index metadata. This occurred due to incorrect shutdown and destruction ordering, which could trigger a segmentation violation and prevent proper cleanup. As a result, stale index entries remained even after objects were removed, potentially impacting operations that rely on accurate bucket index listings and causing errors such as (27) File too large.

With this fix, the Asio run queue is drained before destroying the storage backend, ensuring proper shutdown sequencing and cleanup. As a result, bucket index metadata is now correctly maintained after lifecycle expiration, preventing stale entries and improving stability.
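The shutdown ordering can be illustrated with a Python sketch. Here, a thread and queue.Queue stand in for the Asio run queue, and a Backend object stands in for the storage backend. All names are hypothetical. The essential step is waiting for all queued work to finish before destroying anything that work uses.

```python
import queue
import threading


class Backend:
    """Hypothetical stand-in for the storage backend."""

    def __init__(self):
        self.closed = False
        self.writes = []

    def write(self, item):
        if self.closed:
            raise RuntimeError("backend used after destruction")
        self.writes.append(item)

    def close(self):
        self.closed = True


def run_worker(work: queue.Queue, backend: Backend):
    # Process queued tasks until the shutdown sentinel arrives.
    while True:
        item = work.get()
        if item is None:
            break
        backend.write(item)


def shutdown(work: queue.Queue, worker: threading.Thread, backend: Backend):
    work.put(None)   # sentinel: no more work will be queued
    worker.join()    # drain: wait until every queued task has run
    backend.close()  # only now tear down the backend the tasks depend on
```

Reversing the last two steps would let queued tasks run against a destroyed backend. That is the kind of ordering problem the fix describes.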

(IBMCEPH-12980)

Ceph Object Gateway no longer crashes on client disconnect and bulk delete operations

Previously, two issues could cause Ceph Object Gateway to crash during request handling. When a client disconnected while Ceph Object Gateway was sending an error response, an unhandled exception could trigger an abort. In addition, bulk delete requests could cause an assertion failure due to attempts to close an already closed XML section.

With this fix, broken pipe errors from client disconnects are handled gracefully at write time, and redundant XML section close calls are removed. As a result, Ceph Object Gateway no longer crashes during client disconnects or bulk delete operations, improving overall stability.
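Handling a client disconnect at write time can be sketched in Python. Here, a broken pipe or connection reset is logged and reported to the caller instead of being allowed to propagate and abort the process. The function send_error_response and the logger name are hypothetical.

```python
import logging

log = logging.getLogger("rgw-sketch")  # hypothetical logger name


def send_error_response(stream, body: bytes) -> bool:
    """Write an error response, treating a vanished client as non-fatal.

    Returns True if the response was written, or False if the client was gone.
    """
    try:
        stream.write(body)
        stream.flush()
        return True
    except (BrokenPipeError, ConnectionResetError) as exc:
        # The client disconnected mid-response: log it and carry on.
        log.info("client disconnected while sending error response: %s", exc)
        return False
```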

(IBMCEPH-14447)

RADOS

Idle PGs are no longer incorrectly flagged as stuck peering

Previously, when OSDs restarted or were marked down, the cluster could report placement groups (PGs) as stuck peering even when they were active. This led to misleading HEALTH_WARN messages.

With this fix, PG statistics are updated more frequently, including for idle PGs. The handling of the last_active and last_peered timestamps and the related reporting logic have also been adjusted. As a result, idle PGs are no longer incorrectly flagged as stuck peering, and HEALTH_WARN messages are triggered only after genuine timeouts of more than 60 seconds.
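The timeout check can be illustrated with a simplified Python sketch. It assumes that last_peered is refreshed regularly, including for idle PGs, and uses the 60-second threshold from this fix. The function name and signature are hypothetical, and the real monitor logic considers more state than this.

```python
STUCK_THRESHOLD_SECONDS = 60  # threshold stated in the fix description


def is_stuck_peering(state: str, last_peered: float, now: float,
                     threshold: float = STUCK_THRESHOLD_SECONDS) -> bool:
    """Flag a PG only if it is not active and has not peered within the threshold.

    `state` is a '+'-separated PG state string such as "active+clean".
    """
    if "active" in state.split("+"):
        return False  # active PGs, idle or not, are never stuck peering
    return now - last_peered > threshold
```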

(IBMCEPH-6744)

SMB file services

Wildcard permissions are now correctly applied for OSD access

Previously, when wildcard (*) permissions were set on a volume, incorrect OSD permissions were applied. As a result, an SMB service configured with * permissions could not access the OSD.

With this fix, the permission parsing logic correctly interprets wildcard permissions. As a result, setting * permissions now applies the correct access rights, allowing the SMB service to access the OSD as expected.

(IBMCEPH-13415)

Ceph NVMe-oF gateway

Auto-listeners can no longer be deleted manually

Previously, the delete_listener gRPC method did not verify whether a listener was created manually or automatically. As a result, the listener del command could remove auto-listeners.

With this fix, the method returns an error when the target is an auto-listener. As a result, users cannot delete auto-listeners manually. Auto-listeners can only be removed by updating the subsystem network mask using the subsystem add_network or subsystem del_network commands.
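The check added to delete_listener can be modeled in Python. Listener, ListenerError, and this delete_listener function are hypothetical stand-ins for the gateway's gRPC handler and do not reproduce its signature. The NQN and port 4420 in the usage test are example values only.

```python
from dataclasses import dataclass


@dataclass
class Listener:
    nqn: str
    address: str
    port: int
    auto: bool  # True if created automatically from the subsystem network mask


class ListenerError(Exception):
    pass


def delete_listener(listeners: list, nqn: str, address: str, port: int) -> None:
    """Delete a manually created listener; refuse to delete auto-listeners."""
    for lst in listeners:
        if (lst.nqn, lst.address, lst.port) == (nqn, address, port):
            if lst.auto:
                raise ListenerError(
                    "cannot delete an auto-listener; update the subsystem "
                    "network mask with 'subsystem del_network' instead"
                )
            listeners.remove(lst)
            return  # stop iterating right after mutating the list
    raise ListenerError("listener not found")
```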

(IBMCEPH-13379)

NFS service

Multi-active high availability is now supported for NFS server gateways

Previously, multiple active NFS server gateways were not supported behind HAProxy. As a result, only a single active gateway could be used, limiting scalability and not meeting high availability expectations.

With this fix, support for multiple active NFS gateways has been implemented using a Virtual IP (VIP) managed by HAProxy. This enables multiple gateways to operate concurrently. As a result, if a gateway fails, client NFS connections are automatically maintained or re-established with another available gateway, providing high availability and improved scalability.

(ISCE-2041)

RDMA mounts now correctly use RDMA

Previously, in RDMA-enabled configurations, if a client mounted using TCP first and then using RDMA on the same IP address, the mount incorrectly remained on TCP (NFSv4.2), even when RDMA was supported.

With this fix, the connection handling logic separates RDMA and TCP paths correctly. As a result, mounts now use RDMA as intended, instead of incorrectly falling back to TCP.
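The Python sketch below illustrates one way connection state can leak between transports. If the lookup key contains only the client IP address, an existing TCP entry is reused for a later RDMA mount. Including the transport in the key keeps the two paths separate. ConnectionTable is hypothetical and does not reflect the NFS server's internal data structures.

```python
class ConnectionTable:
    """Hypothetical connection table keyed by (client IP, transport)."""

    def __init__(self):
        self._conns = {}

    def get_or_create(self, client_ip: str, transport: str) -> dict:
        # Keying on client_ip alone would return the earlier TCP entry for
        # an RDMA mount from the same address; the transport prevents that.
        key = (client_ip, transport)
        if key not in self._conns:
            self._conns[key] = {"ip": client_ip, "transport": transport}
        return self._conns[key]
```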

(IBMCEPH-13411)