Unfound objects
Understand and troubleshoot unfound objects.
The ceph health command returns an error message that contains the unfound keyword, similar to the following example:
HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%)
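When monitoring for this condition from a script, the unfound counts can be pulled out of the summary line. The following is a minimal sketch, assuming the summary text has the same shape as the example above; the function name `parse_unfound` is hypothetical and not part of Ceph.

```python
import re

# Matches the "<unfound>/<total> unfound (<percent>%)" fragment of the summary line.
# Assumption: the text format matches the example shown in this topic.
UNFOUND_RE = re.compile(r"(\d+)/(\d+) unfound \(([\d.]+)%\)")


def parse_unfound(summary):
    """Return (unfound, total, percent), or None if no unfound objects are reported."""
    match = UNFOUND_RE.search(summary)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


print(parse_unfound("HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%)"))
# prints (78, 3778, 2.065)
```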
What this means
Example situation
A placement group stores data on osd.1 and osd.2.
- osd.1 goes down.
- osd.2 handles some write operations.
- osd.1 comes up.
- A peering process between osd.1 and osd.2 starts, and the objects missing on osd.1 are queued for recovery.
- Before Ceph copies new objects, osd.2 goes down.
As a result, osd.1 knows that these objects exist, but no available OSD has a copy of them.
In this scenario, Ceph waits for the failed node to become accessible again, and the unfound objects block the recovery process.
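The sequence of events above can be replayed with a toy model. This is only an illustrative sketch, not how Ceph implements peering or recovery: the `ToyOSD` class and the `write`, `peer`, `unfound`, and `run_scenario` helpers are all hypothetical, and the model tracks nothing more than which objects each OSD stores and which object names it knows about.

```python
class ToyOSD:
    """Hypothetical stand-in for an OSD (not a Ceph API)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.objects = {}   # object name -> data actually stored on this OSD
        self.known = set()  # object names this OSD knows should exist


def write(osds, obj, data):
    """Store obj on every OSD that is currently up."""
    for osd in osds:
        if osd.up:
            osd.objects[obj] = data
            osd.known.add(obj)


def peer(a, b):
    """Exchange knowledge of which objects exist, without copying any data."""
    merged = a.known | b.known
    a.known = set(merged)
    b.known = set(merged)


def unfound(osds):
    """Objects that an up OSD knows about but that no up OSD stores."""
    up = [osd for osd in osds if osd.up]
    known = set().union(*(osd.known for osd in up))
    stored = set().union(*(osd.objects for osd in up))
    return sorted(known - stored)


def run_scenario():
    osd1, osd2 = ToyOSD("osd.1"), ToyOSD("osd.2")
    pg = [osd1, osd2]
    write(pg, "obj_a", b"v1")  # both OSDs store obj_a
    osd1.up = False            # osd.1 goes down
    write(pg, "obj_b", b"v1")  # osd.2 handles a write alone
    osd1.up = True             # osd.1 comes up
    peer(osd1, osd2)           # osd.1 learns that obj_b exists
    osd2.up = False            # osd.2 goes down before obj_b is copied
    return unfound(pg)


print(run_scenario())  # prints ['obj_b']
```

In the model, obj_b becomes findable again as soon as osd.2 is marked up, which mirrors why Ceph waits for the failed node.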
Troubleshooting this problem
- Log in to the cephadm shell.
  For example,
  [root@host01 ~]# cephadm shell
- Use the ceph health detail command to determine which placement group contains unfound objects.
  For example,
  [ceph: root@host01 /]# ceph health detail
  HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; recovery 5/937611 objects degraded (0.001%); 1/312537 unfound (0.000%)
  pg 3.8a5 is stuck unclean for 803946.712780, current state active+recovering, last acting [320,248,0]
  pg 3.8a5 is active+recovering, acting [320,248,0], 1 unfound
  recovery 5/937611 objects degraded (0.001%); 1/312537 unfound (0.000%)
- List more information about the placement group.
  ceph pg ID query
  Replace ID with the ID of the placement group containing the unfound objects.
  For example,
  [ceph: root@host01 /]# ceph pg 3.8a5 query
  { "state": "active+recovering",
    "epoch": 10741,
    "up": [ 320, 248, 0],
    "acting": [ 320, 248, 0],
  <snip>
    "recovery_state": [
      { "name": "Started\/Primary\/Active",
        "enter_time": "2021-08-28 19:30:12.058136",
        "might_have_unfound": [
          { "osd": "0", "status": "already probed"},
          { "osd": "248", "status": "already probed"},
          { "osd": "301", "status": "already probed"},
          { "osd": "362", "status": "already probed"},
          { "osd": "395", "status": "already probed"},
          { "osd": "429", "status": "osd is down"}],
        "recovery_progress": {
          "backfill_targets": [],
          "waiting_on_backfill": [],
          "last_backfill_started": "0\/\/0\/\/-1",
          "backfill_info": {
            "begin": "0\/\/0\/\/-1",
            "end": "0\/\/0\/\/-1",
            "objects": []},
          "peer_backfill_info": [],
          "backfills_in_flight": [],
          "recovering": [],
          "pg_backend": {
            "pull_from_peer": [],
            "pushing": []}},
        "scrub": {
          "scrubber.epoch_start": "0",
          "scrubber.active": 0,
          "scrubber.block_writes": 0,
          "scrubber.finalizing": 0,
          "scrubber.waiting_on": 0,
          "scrubber.waiting_on_whom": []}},
      { "name": "Started",
        "enter_time": "2021-08-28 19:30:11.044020"}],
  The might_have_unfound section includes OSDs where Ceph tried to locate the unfound objects:
  - The already probed status indicates that Ceph cannot locate the unfound objects in that OSD.
  - The osd is down status indicates that Ceph cannot contact that OSD.
- Troubleshoot the OSDs that are marked as down. See Down OSDs for details.
- If you are unable to fix the problem that causes the OSD to be down, open a support ticket. For more information, see IBM Support.
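
On clusters with many placement groups, the lookup steps in this procedure can be scripted. The following is a rough sketch under two assumptions: the `ceph health detail` text has the same per-line format as the example in this topic, and the `ceph pg ID query` output has already been captured and parsed into a dictionary. The helper names `pgs_with_unfound` and `osds_by_status` are hypothetical, and the sample data is a trimmed copy of the example output, not a complete query result.

```python
import json
import re

# Matches detail lines such as
# "pg 3.8a5 is active+recovering, acting [320,248,0], 1 unfound".
# Assumption: the detail text uses the format shown in this topic.
PG_UNFOUND_RE = re.compile(r"^pg (\S+) is .*?(\d+) unfound\s*$")


def pgs_with_unfound(health_detail):
    """Map placement group ID -> unfound count, from `ceph health detail` text."""
    result = {}
    for line in health_detail.splitlines():
        match = PG_UNFOUND_RE.match(line.strip())
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


def osds_by_status(pg_query):
    """Group the OSD IDs in every might_have_unfound list by reported status."""
    grouped = {}
    for state in pg_query.get("recovery_state", []):
        for entry in state.get("might_have_unfound", []):
            grouped.setdefault(entry["status"], []).append(entry["osd"])
    return grouped


SAMPLE_DETAIL = """\
HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; 1/312537 unfound (0.000%)
pg 3.8a5 is stuck unclean for 803946.712780, current state active+recovering, last acting [320,248,0]
pg 3.8a5 is active+recovering, acting [320,248,0], 1 unfound
"""

# Trimmed sample containing only the fields that the helper reads.
SAMPLE_QUERY = json.loads("""
{ "state": "active+recovering",
  "recovery_state": [
    { "name": "Started/Primary/Active",
      "might_have_unfound": [
        { "osd": "0", "status": "already probed"},
        { "osd": "248", "status": "already probed"},
        { "osd": "429", "status": "osd is down"}]},
    { "name": "Started"}]}
""")

print(pgs_with_unfound(SAMPLE_DETAIL))  # prints {'3.8a5': 1}
print(osds_by_status(SAMPLE_QUERY))
# prints {'already probed': ['0', '248'], 'osd is down': ['429']}
```

The OSDs listed under the osd is down status are the ones to troubleshoot first, because they might still hold the only copy of the unfound objects.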