On Mon, 2015-07-13 at 04:31 +1000, Peter Jeremy wrote:
> On 2015-Jul-12 09:41:43 -0600, Ian Lepore <ian at freebsd.org> wrote:
> >And let's all just hope that a week or two of testing is enough when
> >jumping a major piece of software forward several years in its
> >independent evolution.
>
> Whilst I support John's desire for NTP to be updated, I also do not
> think this is the appropriate time to do so. That said, the final
> decision is up to re at .
>
> >The import of 4.2.8p2 several months ago resulted in complete failure of
> >timekeeping on all my arm systems. Just last week I tracked it down to
> >a kernel bug (which I haven't committed the fix for yet). While the bug
> >has been in the kernel for years, it took a small change in ntpd
> >behavior to trigger it.
> >
> >Granted it's an odd corner-case problem that won't affect most users
> >because they just use the stock ntp.conf file (and it only affects
> >systems that have a large time step due to no battery-backed clock).
> >But it took me weeks to find enough time to track down the cause of the
> >problem.
>
> I'm not using the stock ntp.conf on my RPis and didn't notice any NTP
> issues. Are you able to provide more details of either the ntp.conf
> options that trigger the bug or the kernel bug itself? A quick search
> failed to find anything.
>
I just committed the kernel fix as r285424; the commit message has some
info on why the new ntpd made the problem visible.
I should have said "stock rc.conf and ntp.conf"... To get the problem to
happen you've got to set rc.conf ntpd_sync_on_start=NO and allow ntpd to
make a large step (-g without -q, or tinker panic 0). I don't remember
why I had sync on start disabled on most of my arm systems (probably a
one-time experiment that I forgot to undo and it got copied around), but
I suspect most people who don't have battery clocks will have it set to
yes, and that's why nobody else saw this problem.
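
For reference, the combination described above would look roughly like
the fragment below. This is only a sketch: the ntpd_enable line and the
exact ntpd_flags value are assumptions for illustration, not settings
taken from the affected systems, and only one of the two options is
needed to allow the large step.

```
# /etc/rc.conf (sketch; ntpd_enable and the ntpd_flags value are assumed)
ntpd_enable="YES"
ntpd_sync_on_start="NO"
# Option 1: let the running daemon make one large step (-g without -q).
# Setting ntpd_flags replaces the default flags, so re-add any you need.
ntpd_flags="-g"

# /etc/ntp.conf
# Option 2: disable the panic threshold so ntpd accepts any offset.
tinker panic 0
```
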
To me, the problem was mainly illustrative of how a tiny innocuous
change (ntpd making a series of ntp_adjtime() calls in a different, but
still correct, order than it used to) can expose a completely unexpected
longstanding bug in our code. Gotta wonder if any more of those are
lurking. :/
-- Ian