IBM Support

FDF/ODF: PGs stuck in unknown, misplaced, and/or remapped state after upgrade due to changed device classes of OSDs

Troubleshooting


Problem

  • Pools are utilizing a crush rule that references a device class that is not currently in use in the cluster, which results in PGs being in an unknown, misplaced, and/or remapped state
  • More than 100 percent of objects are misplaced in Ceph after upgrading the OpenShift Data Foundation operator to 4.16.x
  • Rook created crush rules to utilize a device class that is not used in the cluster; these crush rules are now applied to pools, causing issues
  • The defaultCephDeviceClass in the StorageCluster CR is incorrect
  • Transitioned from unsupported HDDs to SSDs in ODF, resulting in data unavailability

Example:
Ceph status shows a large number of objects misplaced

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph status -c /var/lib/rook/openshift-storage/openshift-storage.config

  cluster:
    id:     [REDACTED]
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,d (age 7d)
    mgr: b(active, since 7d), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 7d), 6 in (since 4M); 169 remapped pgs
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   12 pools, 281 pgs
    objects: 184.82k objects, 265 GiB
    usage:   1.2 TiB used, 17 TiB / 18 TiB avail
    pgs:     33.808% pgs unknown
             886324/554472 objects misplaced (159.850%)
             138 active+clean+remapped
             95  unknown
             48  active+undersized+remapped

All OSDs are utilizing the device class ssd

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph osd tree -c /var/lib/rook/openshift-storage/openshift-storage.config

ID  CLASS  WEIGHT   TYPE NAME                                         STATUS  REWEIGHT  PRI-AFF
-1         5.85956  root default                                                               
-5         1.95319      host [REDACTED]                           
 1    ssd  0.97659          osd.1                                         up   1.00000  1.00000
 5    ssd  0.97659          osd.5                                         up   1.00000  1.00000
-7         1.95319      host [REDACTED]                           
 2    ssd  0.97659          osd.2                                         up   1.00000  1.00000
 4    ssd  0.97659          osd.4                                         up   1.00000  1.00000
-3         1.95319      host [REDACTED]                           
 0    ssd  0.97659          osd.0                                         up   1.00000  1.00000
 3    ssd  0.97659          osd.3                                         up   1.00000  1.00000

The pools are utilizing a crush rule that references HDD, while there are only SSDs in the cluster

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph osd pool ls detail -c /var/lib/rook/openshift-storage/openshift-storage.config

pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 25 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9437 lfor 0/0/30 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd

The crush rules being utilized reference HDDs, while there are only SSDs in the cluster

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph osd crush rule dump -c /var/lib/rook/openshift-storage/openshift-storage.config

    {
        "rule_id": 25,
        "rule_name": "ocs-storagecluster-cephblockpool_host_hdd",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~hdd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },

 

Cause

When upgrading the OpenShift Data Foundation operator to 4.16.x, new crush rules are created for the Ceph pools. These new crush rules are device class specific. The pools then utilize a crush rule with a device class that is not in use in the cluster, resulting in PGs being in an unknown, misplaced, and/or remapped state.

The default behavior of Rook when creating these crush rules is to choose the first item in the list returned by the command $ ceph osd crush class ls. This causes issues for customers who have transitioned from unsupported HDDs to SSDs, as hdd may still appear first in that list.
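Given the JSON output of $ ceph osd crush class ls, the class Rook defaults to can be previewed by taking the first entry of that list. A minimal sketch, using the sample output shown later in this article (the variable and parsing approach are illustrative, not part of Rook itself):

```shell
# Hypothetical output of: ceph osd crush class ls
classes='["hdd","ssd"]'

# Rook's default: the first class in the list
first=$(printf '%s' "$classes" | python3 -c 'import sys, json; print(json.load(sys.stdin)[0])')
echo "$first"   # hdd -- this is why the newly created rules referenced hdd
```

If hdd comes first even though no OSD uses it, the new rules end up unsatisfiable, which matches the symptoms above.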

This issue is currently being tracked through a Jira.

Artifacts

Product/Version   Related BZ/Jira   Errata            Fixed Version
ODF/4.19          DFBUGS-948        N/A               4.19.0
ODF/4.18          DFBUGS-1666       N/A               4.18.1
ODF/4.17          DFBUGS-1667       RHSA-2025:17145   4.17.14
ODF/4.16          DFBUGS-1668       RHBA-2025:17157   4.16.16

Environment

IBM Storage Fusion Data Foundation (FDF) 4.x

Red Hat OpenShift Data Foundation (ODF) 4.x

Diagnosing The Problem

  • The Ceph command $ ceph osd crush class ls has two items in the list, with ssd not being the first in the list
$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph osd crush class ls -c /var/lib/rook/openshift-storage/openshift-storage.config

[
    "hdd",
    "ssd"
]
  • The storagecluster CR has hdd as the defaultCephDeviceClass
$ oc get storagecluster -o json | jq '.items[].status.defaultCephDeviceClass'

"hdd"
  • The CephCluster CR has the device class hdd and ssd in the deviceClasses section. Notably hdd is listed first
$ oc get cephcluster -o json | jq '.items[].status.storage.deviceClasses'

[
  {
    "name": "hdd"
  },
  {
    "name": "ssd"
  }
]
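The checks above can be combined into a quick cross-check: the problem state is a defaultCephDeviceClass that no OSD actually carries. A sketch using sample values matching this article (in a live cluster, substitute the oc and ceph outputs from the commands shown above):

```shell
# Sample values (hypothetical), matching the diagnostic outputs above
default_class="hdd"      # from: oc get storagecluster -o json | jq '.items[].status.defaultCephDeviceClass'
osd_classes="ssd
ssd
ssd
ssd
ssd
ssd"                     # CLASS column of: ceph osd tree

# The problem state: the default class does not match any OSD's class
if printf '%s\n' "$osd_classes" | grep -qx "$default_class"; then
  echo "defaultCephDeviceClass matches an OSD class"
else
  echo "MISMATCH: no OSD uses device class $default_class"
fi
```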

Resolving The Problem

For the Ceph commands, use KCS Configuring the Rook-Ceph Toolbox in Data Foundation 4.x or KCS Accessing the Ceph Storage CLI in Data Foundation 4.x.

The following steps involve the deletion of crush rules and crush classes. It is recommended to open a support case with Red Hat before applying this solution to minimize disruptions to your environment.

1. Switch to the openshift-storage project and scale down the rook-ceph-operator and ocs-operator:

$ oc project openshift-storage; oc scale deployment rook-ceph-operator ocs-operator --replicas=0

2. Switch to Ceph Tools and make note of HDD OSDs:

$ ceph osd df | grep -w hdd   ## Save output for step 11
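The OSD IDs from this output can be captured for reuse in step 12. A sketch, assuming a hypothetical `ceph osd df` excerpt where ID is column 1 and CLASS is column 2:

```shell
# Hypothetical excerpt of `ceph osd df` (header row stripped for brevity)
osd_df="1  hdd  0.97659
5  hdd  0.97659
2  ssd  0.97659"

# Collect the IDs of OSDs reporting class hdd; save this list for step 12
hdd_osds=$(printf '%s\n' "$osd_df" | awk '$2 == "hdd" {print $1}')
echo "$hdd_osds"
```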

3. Set the flags nobackfill, norecover, and norebalance

$ ceph osd set nobackfill
$ ceph osd set norecover
$ ceph osd set norebalance

NOTE: From the step below until the final step is complete, the data will be unavailable for read and write

4. Change all pools to utilize a generic crush rule that does not reference a device class, hdd or ssd.
In this case we are utilizing the crush rule replicated_rule, as it does not have a device class specified.
NOTE: Do not use the for loop if you have custom pools with custom crush rules

$ ceph osd crush rule ls
$ for i in $(ceph osd pool ls); do ceph osd pool set $i crush_rule replicated_rule; done
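Before changing anything, the rule each pool currently uses can be read from the `crush_rule` field of $ ceph osd pool ls detail. A sketch parsing the sample line shown earlier in this article (the sed expression is illustrative):

```shell
# One line of `ceph osd pool ls detail` output (from the example above)
detail="pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 25 object_hash rjenkins"

# Extract the pool name and its crush_rule id
printf '%s\n' "$detail" | sed -E "s/^pool [0-9]+ '([^']+)'.* crush_rule ([0-9]+).*/pool \1 uses crush_rule \2/"
```

Comparing these ids against $ ceph osd crush rule dump shows which pools sit on a device-class-specific rule.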

5. Attempt to delete the hdd crush class. NOTE: This fails because crush rules still reference hdd; make note of the rules listed in the error output

$ ceph osd crush class ls
$ ceph osd crush class rm hdd

6. Delete the old crush rules one by one that reference hdd (Utilize the rules noted in the previous step)

$ ceph osd crush rule rm <crush_rules_hdd>
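If several rules reference hdd, the list from step 5 can be filtered and fed to the rm command. A sketch with hypothetical rule names (review the filtered list carefully before deleting anything):

```shell
# Hypothetical output of `ceph osd crush rule ls`
rules="replicated_rule
ocs-storagecluster-cephblockpool_host_hdd
ocs-storagecluster-cephfilesystem-metadata_host_hdd"

# Keep only the rules that reference hdd
hdd_rules=$(printf '%s\n' "$rules" | grep hdd)
echo "$hdd_rules"

# Then, after reviewing the list:
# for r in $hdd_rules; do ceph osd crush rule rm "$r"; done
```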

7. Remove the old crush class

$ ceph osd crush class rm hdd

8. Scale up the rook-ceph-operator and ocs-operator

$ oc scale deployment rook-ceph-operator ocs-operator --replicas=1

9. Remove the defaultCephDeviceClass parameter from the StorageCluster status using the patch command:

$ oc patch storagecluster ocs-storagecluster -n openshift-storage --type=json --subresource=status --patch '[{"op": "remove", "path": "/status/defaultCephDeviceClass"}]'

10. Wait five to ten minutes to allow the operators time to reconcile

$ sleep 300

11. Verify new rules have been created in Ceph that reference the device class ssd and the pools are utilizing these new crush rules

$ ceph osd pool ls detail 
$ ceph osd crush rule dump

12. Change the OSDs that report the HDD class and unset nobackfill/norecover/norebalance:
You will need the output from step 2.

$ ceph osd crush rm-device-class osd.{num}; ceph osd crush set-device-class ssd osd.{num}  ## Repeat for all OSDs seen as an HDD.

$ ceph osd unset nobackfill
$ ceph osd unset norecover
$ ceph osd unset norebalance
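The per-OSD reclassification above can be scripted over the list saved in step 2. A dry-run sketch with hypothetical OSD IDs 1 and 5; it only prints the commands, so remove the echo wrappers to actually execute them:

```shell
# OSD IDs recorded in step 2 (hypothetical)
hdd_osds="1 5"

for n in $hdd_osds; do
  # Dry run: print the reclassification commands instead of running them
  echo "ceph osd crush rm-device-class osd.$n"
  echo "ceph osd crush set-device-class ssd osd.$n"
done
```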

NOTE: Monitor the Ceph status until the rebuild is finished

13. Verify the health of Ceph

$ ceph status

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB66","label":"Technology Lifecycle Services"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SSSEWFV","label":"Storage Fusion Data Foundation"},"ARM Category":[{"code":"a8m3p000000UoIPAA0","label":"Support Reference Guide"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":""}]

Document Information

Modified date:
05 May 2026

UID

ibm17249971