On 22/01/2018 17:07, Alan Somers wrote:
> Since upgrading my jail server to 11.1-RELEASE, the clock occasionally
> jumps backwards by 5-35 minutes for no apparent reason. Has anybody seen
> something like this?
>
> Details
> ====
>
> * Happens about once a day on my jail server, and has happened at least
>   once on a separate bhyve server.
>
> * The jumps almost always happen between 1 and 3 AM, but I've also seen
>   them happen at 06:30 and 20:15.

That's the window when the periodic scripts run. With a default
configuration and a lot of jails, they put the system under a lot of
stress.

> * I'm using the default ntp.conf file.

Are you running ntpd inside the jail or on the jail host? On my jail
systems (which are 10.3 and 11.1) I run ntpd on the jail host, outside all
jails, and not inside the jails. The jails then get accurate time because
the underlying host has accurate time.

Mike

--
Mike Pumford | Senior Software Engineer
T: +44 (0) 1225 710635
BSQUARE - The business of IoT
www.bsquare.com

On Tue, Jan 23, 2018 at 3:48 AM, Mike Pumford <michaelp at bsquare.com> wrote:
> On 22/01/2018 17:07, Alan Somers wrote:
>> Since upgrading my jail server to 11.1-RELEASE, the clock occasionally
>> jumps backwards by 5-35 minutes for no apparent reason. Has anybody seen
>> something like this?
>>
>> Details
>> ====
>>
>> * Happens about once a day on my jail server, and has happened at least
>>   once on a separate bhyve server.
>>
>> * The jumps almost always happen between 1 and 3 AM, but I've also seen
>>   them happen at 06:30 and 20:15.
>
> That's the window when the periodic scripts are run which if you have a
> default configuration and a lot of jails will put the system under a lot
> of stress.

That did not escape my notice. However, none of the jails' periodic jobs
involve the clock in any way. And I wouldn't think that a high CPU load
could cause clock drift, could it? This isn't Windows XP, after all.

>> * I'm using the default ntp.conf file.
>
> Are you running ntpd inside the jail or on the jail host? On my jail
> systems (which are 10.3 and 11.1) I run ntpd on the jail host (outside
> all jails) and not inside the jails and the jails then get the accurate
> time as the underlying host has accurate time.

Only on the host.

New info: there is a possibility that my NFS server is hanging for a
while. That would explain my problem's timing. However, ntpd shouldn't be
accessing any NFS shares, and I wouldn't think that a hung NFS server
should be able to pause the clock. I'm doing a new experiment that should
be more informative. But I'll have to wait until the problem recurs to
learn anything.

-Alan

On Tue, Jan 23, 2018 at 8:40 AM, Alan Somers <asomers at freebsd.org> wrote:
> New info: there is a possibility that my NFS server is hanging for a
> while. That would explain my problem's timing. However, ntpd shouldn't
> be accessing any NFS shares, and I wouldn't think that a hung NFS server
> should be able to pause the clock. I'm doing a new experiment that should
> be more informative. But I'll have to wait until the problem recurs to
> learn anything.

I have a little more data now. The problem happens much more frequently
than I originally realized, but usually for just a few seconds at a time.
It looks like the system is hanging for a while and then recovering. Or at
least, the clocks are hanging. The only other possibility would be for
both the realtime _and_ monotonic clocks to jump backwards. In any case,
the problem is not ntpd's fault. I don't know what could cause a system to
hang for up to 30 minutes without crashing, and I'm not sure how to tell
unless it happens during working hours. I'll send another update if I
learn more.

-Alan
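
For anyone who wants to tell "the clocks jumped backwards" apart from "the
system stalled", one possible approach is to sample the realtime and
monotonic clocks side by side and compare consecutive readings. The sketch
below is only an illustration of that idea and is not the experiment
described above. The names sample_clocks, find_anomalies and Anomaly, and
the interval and tolerance values, are hypothetical and chosen just for
this example.

```python
import time
from typing import List, NamedTuple, Sequence, Tuple

Sample = Tuple[float, float]  # (realtime seconds, monotonic seconds)


class Anomaly(NamedTuple):
    index: int         # index of the later sample in the offending pair
    kind: str          # "realtime_backwards", "monotonic_backwards" or "stall"
    delta_real: float  # change in the realtime clock, in seconds
    delta_mono: float  # change in the monotonic clock, in seconds


def sample_clocks(count: int, interval: float) -> List[Sample]:
    """Read both clocks `count` times, sleeping `interval` seconds between reads."""
    samples: List[Sample] = []
    for i in range(count):
        samples.append((time.time(), time.monotonic()))
        if i + 1 < count:
            time.sleep(interval)
    return samples


def find_anomalies(samples: Sequence[Sample], interval: float,
                   tolerance: float = 1.0) -> List[Anomaly]:
    """Compare consecutive samples that were meant to be `interval` seconds apart.

    A negative delta means that clock went backwards between two reads.
    A monotonic delta well above `interval` means the sampler itself was not
    scheduled on time, which is what a temporary hang would look like.
    The tolerance is an assumed allowance for ordinary scheduling jitter.
    """
    anomalies: List[Anomaly] = []
    for i in range(1, len(samples)):
        d_real = samples[i][0] - samples[i - 1][0]
        d_mono = samples[i][1] - samples[i - 1][1]
        if d_real < 0:
            anomalies.append(Anomaly(i, "realtime_backwards", d_real, d_mono))
        if d_mono < 0:
            anomalies.append(Anomaly(i, "monotonic_backwards", d_real, d_mono))
        if d_mono > interval + tolerance:
            anomalies.append(Anomaly(i, "stall", d_real, d_mono))
    return anomalies


if __name__ == "__main__":
    # Short demonstration run: ten readings, one second apart.
    for anomaly in find_anomalies(sample_clocks(10, 1.0), 1.0):
        print(anomaly)
```

If something like this is left running overnight, it probably makes sense
to write its output to a local disk rather than an NFS share, so that a
hung NFS server cannot block the logging itself.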