Identifying stuck placement groups

A placement group is not necessarily problematic just because it is not in an active+clean state. However, when placement groups get stuck, Ceph’s ability to self-repair might not be working.

Before you begin

Make sure that you have root-level access to the node.

About this task

The stuck states include unclean, inactive, and stale.

Unclean: Placement groups contain objects that are not replicated the desired number of times. They should be recovering.
Inactive: Placement groups cannot process reads or writes because they are waiting for an OSD with the most up-to-date data to come back up.
Stale: Placement groups are in an unknown state because the OSDs that host them have not reported to the monitor cluster in a while. The length of this interval can be configured with the mon osd report timeout setting.
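
To see which reporting interval currently applies to the stale state, you can query the configuration database. The following is a minimal sketch, not an official procedure. It assumes that the option is addressed by its underscore name, mon_osd_report_timeout, and that it is read from the mon section. The function name show_report_timeout and the CEPH_CMD variable are hypothetical and exist only so that the command can be dry-run without a cluster.

```shell
# Hypothetical helper: show_report_timeout is not part of Ceph.
# Assumption: the option is stored under the mon section with the
# underscore name mon_osd_report_timeout.
# CEPH_CMD is an illustrative override for dry runs (for example CEPH_CMD=echo).
show_report_timeout() {
    ${CEPH_CMD:-ceph} config get mon mon_osd_report_timeout
}
```

Calling show_report_timeout on a node with a working ceph client prints the configured value. Running it as CEPH_CMD=echo show_report_timeout prints the command line instead of executing it.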

Procedure

Identify the stuck placement groups by running the ceph pg dump_stuck command.

ceph pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {INT}

For example:

[ceph: root@host01 /]# ceph pg dump_stuck stale
OK
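
Because the command reports on the states that you name, it can be convenient to check the three stuck states described in this topic in one pass. The following is a minimal sketch under stated assumptions, not a supported tool. The function name check_stuck_pgs and the CEPH_CMD variable are hypothetical; CEPH_CMD exists only so that the loop can be dry-run on a machine without a cluster.

```shell
# Hypothetical helper: check_stuck_pgs is not part of Ceph.
# It runs "ceph pg dump_stuck" once for each stuck state listed in this topic
# and prints a header line before each result.
# CEPH_CMD is an illustrative override for dry runs (for example CEPH_CMD=echo).
check_stuck_pgs() {
    for state in inactive unclean stale; do
        echo "== ${state} =="
        ${CEPH_CMD:-ceph} pg dump_stuck "${state}"
    done
}
```

Run check_stuck_pgs from a shell that has root-level access and a working ceph client. Running it as CEPH_CMD=echo check_stuck_pgs prints the three command lines without contacting the cluster.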