Identifying stuck placement groups

A placement group is not necessarily problematic just because it is not in an active+clean state. When placement groups get stuck, however, Ceph's ability to self-repair might not be working.

Before you begin

Make sure that you have root-level access to the node.

About this task

The stuck states include unclean, inactive, and stale.
Unclean
Placement groups contain objects that are not replicated the desired number of times. They should be recovering.
Inactive
Placement groups cannot process reads or writes because they are waiting for an OSD with the most up-to-date data to come back up.
Stale
Placement groups are in an unknown state because the OSDs that host them have not reported to the monitor cluster in a while. The length of this interval can be configured with the mon osd report timeout setting.
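
To judge whether a stale report is expected, it can help to check the report timeout that the monitors currently use. The following is a minimal sketch, assuming the setting is exposed to the ceph config command under the underscore form of its name, mon_osd_report_timeout, and that the command runs where the ceph CLI has admin access. The helper name show_report_timeout is hypothetical and not part of Ceph.

```shell
# Hypothetical helper: print the OSD report timeout that applies to monitors.
# Assumes the option name mon_osd_report_timeout; the value is in seconds.
show_report_timeout() {
    ceph config get mon mon_osd_report_timeout
}
```

Calling show_report_timeout prints the current value. The default differs between releases, so compare the result with the documentation for your version.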

Procedure

Identify stuck placement groups by running the ceph pg dump_stuck command.
ceph pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {INT}
For example:
[ceph: root@host01 /]# ceph pg dump_stuck stale
OK
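
To check every stuck state in one pass, the command can be wrapped in a small loop. The following is a sketch only, assuming the ceph CLI is on the PATH of the shell where it runs (for example, inside the cephadm shell). The helper name report_stuck_pgs and the variable STUCK_STATES are hypothetical and not part of Ceph. The list of states is taken from the syntax line above.

```shell
# States accepted by `ceph pg dump_stuck`, taken from the syntax line.
STUCK_STATES="inactive unclean stale undersized degraded"

# Hypothetical helper: query each stuck state in turn and label the output.
report_stuck_pgs() {
    for state in $STUCK_STATES; do
        echo "== ${state} =="
        # Continue with the remaining states even if one query fails.
        ceph pg dump_stuck "${state}" || echo "query for ${state} failed" >&2
    done
}
```

Running report_stuck_pgs prints one labeled section per state, so a state with no stuck placement groups is easy to tell apart from one that needs attention.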