Troubleshooting
Problem
- How to resolve MON clock skew issue in Ceph/FDF/ODF
- Ceph OSD node with time sync issue
Symptom
- The command ceph -s shows that one or more MONs are out of time sync:
# ceph -s
cluster:
id: 1111111-2222-3333-4444-555556666666
health: HEALTH_WARN
clock skew detected on mon.b, mon.c
...
- OSDs fail to start and repeatedly log the message "unable to obtain rotating service keys; retrying":
Jun 24 09:55:38 node-406 conmon[285430]: 2022-06-24 09:55:38.834 7f0a516cddc0 -1 osd.2 17318 unable to obtain rotating service keys; retrying
Jun 24 09:55:44 node-406 conmon[287350]: 2022-06-24 09:55:44.308 7f2c99f5fdc0 -1 osd.26 17318 unable to obtain rotating service keys; retrying
Jun 24 09:56:04 node-406 conmon[288107]: 2022-06-24 09:56:04.078 7f8ad797fdc0 -1 osd.14 17318 unable to obtain rotating service keys; retrying
Jun 24 09:56:06 node-406 conmon[284444]: 2022-06-24 09:56:06.466 7ffb7a2c3dc0 -1 osd.38 17318 unable to obtain rotating service keys; retrying
Jun 24 09:56:08 node-406 conmon[285430]: 2022-06-24 09:56:08.835 7f0a516cddc0 -1 osd.2 17318 unable to obtain rotating service keys; retrying
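As a rough triage aid, both symptoms above can be pulled out of captured output with a small shell sketch. The function names skewed_mons and osds_missing_keys are hypothetical helpers introduced here, not part of Ceph, and they assume only the message formats shown in the examples above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ceph); they assume the message
# formats shown in the examples above.

# Print each monitor named in a "clock skew detected on ..." line,
# one per line. Reads `ceph -s` style text on stdin.
skewed_mons() {
    sed -n 's/.*clock skew detected on //p' | tr ',' '\n' | sed 's/^ *//; s/ *$//'
}

# Print the unique OSD ids that logged the rotating-keys message.
# Reads journal or log lines on stdin.
osds_missing_keys() {
    grep 'unable to obtain rotating service keys' | grep -o 'osd\.[0-9][0-9]*' | sort -u
}

# Example usage on a live system (left as comments so nothing runs by accident):
#   ceph -s | skewed_mons
#   journalctl --since "1 hour ago" | osds_missing_keys
```

If every OSD listed by osds_missing_keys runs on the same node, that node is the first place to check for a time sync problem.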
Cause
The Ceph/FDF/ODF nodes are unable to sync with the NTP servers.
Environment
- IBM Storage Fusion Data Foundation (FDF) 4.x
- Red Hat OpenShift Data Foundation (ODF) 4.x
- IBM Storage Ceph 5.x and above
Diagnosing The Problem
- Example of a node that is not in time sync. Note that Leap status is not Normal:
# chronyc tracking
Reference ID : AAABBBCCCC (111-22-333-444-55.redhat.com)
Stratum : 3
Ref time (UTC) : Fri July 24 11:22:33 2020
System time : 0.000123456 seconds slow of NTP time
Last offset : -0.00034568 seconds
RMS offset : 0.00078901234 seconds
Frequency : 0.147 ppm slow
Residual freq : -0.074 ppm
Skew : 0.061 ppm
Root delay : 0.0456789 seconds
Root dispersion : 0.023456789 seconds
Update interval : 1234.5 seconds
Leap status : Not synchronised
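One way to script this check is to read the Leap status field from the chronyc tracking output. The sketch below assumes the field layout shown above. The names leap_status and is_time_synced are hypothetical helpers, not chrony commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of chrony); they assume the
# "Leap status : <value>" layout shown in the example above.

# Print the value of the "Leap status" field from `chronyc tracking`
# output read on stdin.
leap_status() {
    awk -F' *: *' '/^Leap status/ { print $2 }'
}

# Succeed (exit 0) only when the leap status is exactly "Normal".
is_time_synced() {
    [ "$(leap_status)" = "Normal" ]
}

# Example usage on a live node (requires chronyd to be running):
#   chronyc tracking | is_time_synced && echo "in sync" || echo "NOT in sync"
```

A value of "Not synchronised", as in the example above, makes is_time_synced fail, which marks the node as a candidate for the steps in the resolution section.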
Resolving The Problem
Manually force a time sync with chronyc by running the following steps:
- Connect to the node reporting one of the issues above and temporarily turn off SELinux.
- For Ceph, use SSH to connect to the Ceph node.
- For FDF/ODF, use 'oc debug node/<node-name>' or SSH (if keys are configured for the core user) to connect to the node.
NOTE: Make sure you turn SELinux back on afterwards. Do NOT leave it off for an extended period.
$ ssh core@<node>
$ sudo -i
or
$ oc debug node/<node-name>
$ chroot /host
Temporarily disable SELinux:
$ setenforce 0
Run the makestep command shown in the next step, then re-enable SELinux:
$ setenforce 1
- Manually force the time adjustment using chronyc:
# chronyc -a makestep
- If the above step does not resolve the issue, restart the chronyd service:
# systemctl stop chronyd; systemctl start chronyd; systemctl enable chronyd
- If the commands above do not sync the time, check the NTP servers. For reference, see "Best practices for NTP" and "How to configure chrony".
- For FDF/ODF running on OCP nodes, use a machine configuration as detailed in "Configuring chrony time service".
- When a group of OSD containers fails to start with the message "unable to obtain rotating service keys", the failure is related to a time sync issue on the node hosting the OSD containers/pods.
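The order of the manual steps above (disable SELinux, step the clock, re-enable SELinux) can be sketched as a small shell function. This is an illustration, not an official procedure. The names run and force_time_step and the DRY_RUN switch are hypothetical and introduced only for this example, and the sketch assumes the node was in enforcing mode beforehand.

```shell
#!/bin/sh
# Sketch only. run, force_time_step and DRY_RUN are hypothetical names
# introduced for this example. Assumes SELinux was enforcing beforehand.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step the clock with SELinux temporarily permissive (run as root).
force_time_step() {
    run setenforce 0 || return 1
    # Step the clock immediately instead of slewing it gradually.
    run chronyc -a makestep
    rc=$?
    # Always switch SELinux back to enforcing, even if makestep failed.
    run setenforce 1 || return 1
    return "$rc"
}

# Example usage:
#   DRY_RUN=1; force_time_step    # print the commands without running them
#   DRY_RUN=0; force_time_step    # run them on the node as root
```

Capturing the makestep exit status before re-enabling SELinux keeps the node from being left in permissive mode when the clock step fails.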
Document Location
Worldwide
Document Information
Modified date:
28 March 2025
UID
ibm17171519