
Ceph status reporting clock skew warning

Troubleshooting


Problem

  • How to resolve a MON clock skew issue in Ceph/FDF/ODF
  • Ceph OSD node with a time sync issue

Symptom

  • The command # ceph -s shows that one or more MONs are out of time sync:
# ceph -s
  cluster:
    id:     1111111-2222-3333-4444-555556666666
    health: HEALTH_WARN
            clock skew detected on mon.b, mon.c 
...
  • OSDs fail to start, repeatedly logging "unable to obtain rotating service keys; retrying" (an additional check follows the log excerpt below):
Jun 24 09:55:38 node-406 conmon[285430]: 2022-06-24 09:55:38.834 7f0a516cddc0 -1 osd.2 17318 unable to obtain rotating service keys; retrying
Jun 24 09:55:44 node-406 conmon[287350]: 2022-06-24 09:55:44.308 7f2c99f5fdc0 -1 osd.26 17318 unable to obtain rotating service keys; retrying
Jun 24 09:56:04 node-406 conmon[288107]: 2022-06-24 09:56:04.078 7f8ad797fdc0 -1 osd.14 17318 unable to obtain rotating service keys; retrying
Jun 24 09:56:06 node-406 conmon[284444]: 2022-06-24 09:56:06.466 7ffb7a2c3dc0 -1 osd.38 17318 unable to obtain rotating service keys; retrying
Jun 24 09:56:08 node-406 conmon[285430]: 2022-06-24 09:56:08.835 7f0a516cddc0 -1 osd.2 17318 unable to obtain rotating service keys; retrying
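
To confirm which MONs are reporting the skew and how large the offset is, check the cluster health detail and the monitor time-sync report. This is a minimal check, assuming the ceph CLI and admin keyring are available (for FDF/ODF it is typically run from the rook-ceph tools pod, if enabled); ceph health detail lists each MON reporting skew with the measured offset, and ceph time-sync-status shows the per-MON report from the lead monitor:
# ceph health detail
# ceph time-sync-status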

Cause

The Ceph/FDF/ODF nodes are unable to sync with the NTP servers.
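
To verify the cause, check which NTP servers the node is configured to use and whether chrony can currently reach any of them. A quick sketch using standard chrony tooling (the configuration path shown is the default; a node managed through a MachineConfig or drop-in files may keep it elsewhere):
# grep -E '^(server|pool)' /etc/chrony.conf
# chronyc activity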

Environment

  • IBM Storage Fusion Data Foundation (FDF) 4.x 
  • Red Hat OpenShift Data Foundation (ODF) 4.x
  • IBM Storage Ceph 5.x and above

Diagnosing The Problem

  • Example of a node that is not in time sync (a check of the NTP sources follows the output):
  • Note that Leap status is not Normal:

# chronyc tracking
Reference ID    : AAABBBCCCC (111-22-333-444-55.redhat.com)
Stratum         : 3
Ref time (UTC)  : Fri Jul 24 11:22:33 2020
System time     : 0.000123456 seconds slow of NTP time
Last offset     : -0.00034568 seconds
RMS offset      : 0.00078901234 seconds
Frequency       : 0.147 ppm slow
Residual freq   : -0.074 ppm
Skew            : 0.061 ppm
Root delay      : 0.0456789 seconds
Root dispersion : 0.023456789 seconds
Update interval : 1234.5 seconds
Leap status     : Not synchronised
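
When Leap status is not Normal, list the configured NTP sources to see whether the node can reach and select a server at all. A brief check using standard chronyc subcommands; in the sources output, '*' marks the currently selected source and '?' marks a source that is unreachable:
# chronyc sources -v
# chronyc sourcestats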

Resolving The Problem

Manually force chronyc to sync the clocks by running the following steps:
  • Connect to the node reporting one of the above-mentioned issues and temporarily turn off SELinux.
  • For Ceph, use SSH to connect to the Ceph node.
  • For FDF/ODF, use 'oc debug node/<node-name>' or SSH (if keys are configured for the core user) to connect to the node.
NOTE: Make sure you turn SELinux back on afterwards; you DO NOT want to keep it off for an extended period.
 $ ssh core@<node>
 $ sudo -i
 or
 $ oc debug node/<node-name>
 $ chroot /host
 Temporarily disable SELinux
 $ setenforce 0
 Run the makestep command shown in the next step, then re-enable SELinux
 $ setenforce 1
  • Manually force a time sync adjustment using chronyc:
# chronyc -a makestep
  • If the above step does not resolve the issue, restart the chronyd service:
# systemctl stop chronyd; systemctl start chronyd; systemctl enable chronyd
  • If the commands above do not sync the time, check the NTP servers; for reference, see Best practices for NTP and How to configure chrony. A short verification check follows this list.

  • For FDF/ODF running on OCP nodes, be mindful to use a machine configuration as detailed in Configuring chrony time service.

  • OSD containers that fail to start with the message "unable to obtain rotating service keys" indicate a time sync issue on the node hosting those OSD containers/pods.
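
Once the clock has been stepped (and SELinux re-enabled), a minimal verification that the node and cluster have recovered: Leap status in chronyc tracking should return to Normal, and the clock skew warning in ceph -s typically clears within a few minutes, after which OSDs that were failing with the rotating service keys message should start normally:
# chronyc tracking
# ceph -s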

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB66","label":"Technology Lifecycle Services"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SSSEWFV","label":"Storage Fusion Data Foundation"},"ARM Category":[{"code":"a8m3p000000UoIPAA0","label":"Support Reference Guide"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Document Information

Modified date:
28 March 2025

UID

ibm17171519