Since upgrading my jail server to 11.1-RELEASE, the clock occasionally jumps backwards by 5-35 minutes for no apparent reason. Has anybody seen something like this?

Details
====

* Happens about once a day on my jail server, and has happened at least once on a separate bhyve server.

* The jumps almost always happen between 1 and 3 AM, but I've also seen them happen at 06:30 and 20:15.

* The jumps are always backwards, never forwards.

* Inspecting the logs of both the host and its jails shows nothing interesting that's correlated with the jumps. Sometimes I find Amanda doing a backup, but not always.

* Sometimes the jumps happen immediately after ntpd adds a new server to its list, but not always.

* I'm using the default ntp.conf file.

* ntpd is running on both, and it should be the only process touching the clock. I have a script running "ntpq -c peers" once a minute, which shows the offset for one server suddenly jump to a large negative number. Then the offsets for other servers jump to the same value, then either ntpd fixes the clock or exits because the offset is too high.

* Said script is sleeping using the monotonic clock, not the realtime clock. As expected, successive timestamps differ by about 6.5 minutes when ntpd corrects a 5.5 minute clock offset. However, when the clock presumably jumps backwards I _don't_ see successive timestamps go backwards too. They keep marching forward at the expected rate.

This makes me wonder if the entire machine is hanging. But it would have to be a pretty serious hang to stop the clock from ticking. Any ideas?
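For anyone who wants to watch for the same symptom, here is a minimal Python sketch of the monotonic-versus-realtime comparison described above. It is not the script from the original report: the function names, the 60 second interval and the 5 second threshold are all assumptions chosen for illustration. It can only report a step of the realtime clock relative to the monotonic clock; it cannot say what caused the step.

```python
import time


def clock_step(prev_mono, prev_wall, mono, wall):
    """Return how far the realtime clock moved relative to the monotonic clock.

    A value near 0 means both clocks advanced together. A large negative
    value means the realtime clock was stepped backwards between samples.
    """
    return (wall - prev_wall) - (mono - prev_mono)


def watch(interval=60.0, threshold=5.0):
    """Sample both clocks every `interval` seconds and report large steps.

    `interval` and `threshold` are illustrative defaults, not values taken
    from the original report.
    """
    prev_mono, prev_wall = time.monotonic(), time.time()
    while True:
        time.sleep(interval)
        mono, wall = time.monotonic(), time.time()
        step = clock_step(prev_mono, prev_wall, mono, wall)
        if abs(step) > threshold:
            print(f"{time.ctime(wall)}: realtime clock stepped by {step:+.1f} s")
        prev_mono, prev_wall = mono, wall
```

Calling `watch()` runs the loop until interrupted. Logging its output next to the "ntpq -c peers" samples would show whether a reported offset change lines up with an actual step of the realtime clock.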
On 22/01/2018 17:07, Alan Somers wrote:
> Since upgrading my jail server to 11.1-RELEASE, the clock occasionally
> jumps backwards by 5-35 minutes for no apparent reason. Has anybody seen
> something like this?
>
> Details
> ====
>
> * Happens about once a day on my jail server, and has happened at least
> once on a separate bhyve server.
>
> * The jumps almost always happen between 1 and 3 AM, but I've also seen
> them happen at 06:30 and 20:15.
>
That's the window when the periodic scripts are run. With a default configuration and a lot of jails, that puts the system under a lot of stress.

> * I'm using the default ntp.conf file.
>
Are you running ntpd inside the jails or on the jail host? On my jail systems (which are 10.3 and 11.1) I run ntpd on the jail host, outside all jails, and not inside the jails. The jails then get accurate time because the underlying host has accurate time.

Mike

-- 
Mike Pumford | Senior Software Engineer
T: +44 (0) 1225 710635
BSQUARE - The business of IoT
www.bsquare.com <http://www.bsquare.com/>
On Jan 22, 2018, at 12:07 PM, Alan Somers <asomers at freebsd.org> wrote:
>
> * Sometimes the jumps happen immediately after ntpd adds a new server to
> its list, but not always.
>
> * I'm using the default ntp.conf file.
>
> * ntpd is running on both, and it should be the only process touching the
> clock. I have a script running "ntpq -c peers" once a minute, which shows
> the offset for one server suddenly jump to a large negative number. Then
> the offsets for other servers jump to the same value, then either ntpd
> fixes the clock or exits because the offset is too high.

- Lose ntpd running in jails and run it only on the host. Running it in the jails is totally unnecessary.

- Is this a bare metal server or a VM? There are lots of clock issues with VMs.

- Stagger your periodic jobs on the host and the jails so they don't all run at the same time, slamming the host.

-- 
inoc.net!rblayzor
XMPP: rblayzor.AT.inoc.net
PGP: https://inoc.net/~rblayzor/
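As a rough illustration of the staggering suggestion, the Python sketch below generates one "periodic daily" crontab line per system, spaced a fixed number of minutes apart. It assumes the host and each jail keep the daily run in their own /etc/crontab using the usual "minute hour mday month wday who command" layout. The function name, the jail names, the start time and the spacing are hypothetical and not taken from this thread.

```python
def staggered_crontab(names, start_hour=3, start_minute=1, step_minutes=20):
    """Return a dict mapping each name to a 'periodic daily' crontab line.

    Entries are spaced `step_minutes` apart, starting at
    start_hour:start_minute. All defaults are illustrative only.
    """
    lines = {}
    for i, name in enumerate(names):
        total = start_hour * 60 + start_minute + i * step_minutes
        hour, minute = (total // 60) % 24, total % 60
        # Fields: minute hour mday month wday who command
        lines[name] = f"{minute}\t{hour}\t*\t*\t*\troot\tperiodic daily"
    return lines


# Hypothetical names: the host plus two jails.
for name, line in staggered_crontab(["host", "www", "db"]).items():
    print(f"{name}: {line}")
```

The spacing that works in practice depends on how long each daily run takes on the machine in question, so the 20 minute step here is only a placeholder.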