On Thu, 27 Feb 2020 at 06:43, Peter Jeremy <peter at rulingia.com> wrote:
> On 2020-Feb-26 16:37:43 +1100, Dewayne Geraghty <dewaynegeraghty at gmail.com> wrote:
> >I usually run ntpd with both aslr and as user ntpd. While testing I
> >noticed that my server with a direct network cable to my main time keeper,
> >jumped from the expected stratum 2 to 14 as follows (I record the date so I
> >can synch with the debug log, also below):
> >
> >vm.loadavg={ 0.09 0.10 0.18 }
> >
> >Wed 26 Feb 2020 15:16:38 AEDT
> >     remote           refid      st t when poll reach   delay   offset  jitter
> >==============================================================================
> > 10.0.7.6        203.35.83.242    2 u   44   64  377    0.147  -227.12  33.560
> >*127.127.1.1     .LOCL.          14 l   59  128  377    0.000    0.000   0.000
>
> >26 Feb 15:03:40 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== bad
>
> Why is this bad? You've specified that this is a valid clock source so
> ntpd is free to use it if it decides it is the best source of time.
>
> >server 127.127.1.1 minpoll 7 maxpoll 7
> >fudge 127.127.1.1 stratum 14
>
> Synchronizing to the local clock (ie using 127.127.1.x as a reference) is
> almost never correct. What external (to NTP) source is being used to
> synchronize the local clock?
>
> >I'm also very surprised that the jitter on the server (under testing) is so
> >poor. The internet facing time server is
> >*x.y.z.t .ATOM. 1 u 73 512 7 23.776 34.905 95.961
> >but its very old and not running aslr.
>
> The 23ms distance to the peer suggests that this is over the Internet. What
> sort of link do you have to the Internet and how heavily loaded is it? The
> NTP protocol includes the assumption that the client-server path delay is
> symmetric - this is often untrue for SOHO connections. And SOHO connections
> will often wind up saturated in one direction - which skews the apparent
> timestamps and shows up as high jitter values.
>
> > /usr/local/sbin/ntpd -c /etc/ntp.conf -g -g -u ntpd --nofork
> ...
> >I get similar results with /usr/sbin/ntpd, I've been testing both and
> >happened to record details for the port ntpd.
>
> It's probably not relevant but it would be useful for you to say up front
> which ntpd you are having problems with and which version of the port you
> have installed.
>
> --
> Peter Jeremy
>
Hi Peter, I appreciate your thoughts. Yes, using LOCL is bad because it
should only be used when the stratum 2 machine is unavailable, and that
machine is reachable (representative ping time average 0.15ms). The load is
less than 10% on both devices, and both the internet and internal traffic are
typically less than 50Kb. :/
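For reference, here is a minimal sketch of why the symmetric-delay assumption matters. The offset and delay formulas are the standard NTP on-wire calculation; the helper simulate_exchange and the delay figures are made up purely for illustration and are not measurements from either of my machines.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation (all times in seconds).

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, delay_out, delay_back, server_hold=0.0):
    """Hypothetical helper: build the four timestamps for one exchange.

    true_offset is how far the server clock is ahead of the client clock.
    delay_out / delay_back are the one-way path delays in each direction.
    """
    t1 = 0.0
    t2 = t1 + delay_out + true_offset        # arrival, read on the server clock
    t3 = t2 + server_hold                    # server processing time
    t4 = (t3 - true_offset) + delay_back     # arrival, read on the client clock
    return ntp_offset_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Illustrative numbers only: 24 ms round trip, clocks actually in sync.
    for out_ms, back_ms in [(12.0, 12.0), (20.0, 4.0)]:
        offset, delay = simulate_exchange(0.0, out_ms / 1000, back_ms / 1000)
        print(f"out={out_ms}ms back={back_ms}ms -> "
              f"offset={offset * 1000:.3f}ms delay={delay * 1000:.3f}ms")
```

With a symmetric path the computed offset is zero. With the same round trip split 20 ms / 4 ms, the computed offset is wrong by half the difference (8 ms) even though the two clocks agree, and the measured delay alone gives no hint of that.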
The use of the LOCL clock was a fix, as named failed if ntpd only used the
timeserver and it lost sync (due to some ipsec work, another story). I
suspect kerberos had a part, as it uses tkey-gssapi-keytab. I should
investigate why the use of the LOCL clock makes things work, but ... it's a
workaround and I'm ok with it.
I'm at my wits' end. I've systematically changed one variable at a time from
the list below, and the system clock always reverts to LOCL within 20 minutes,
if not immediately. This is FreeBSD 12.1-STABLE #0 r356046M: Tue Dec 24. I
think it's time to try an earlier ntp to see if that helps (???) :(
The variables tested, one changed at a time:
- security.mac.ntpd.enabled
- kern.elf64.aslr.enable and kern.elf64.aslr.stack_gap, changed as a pair
- security.mac.portacl.rules
- run as root or ntpd
- use of proccontrol (which was changed with different combinations of
  aslr, stack_gap)
- all off and run as root
- and of course changes to the command line (-g or -G or -g -x)
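To note the moment the system peer falls back to LOCL during these tests, something like the following could be run over saved peers billboards. It is only a rough sketch: the function names are my own invention, it assumes the usual ten-column layout with the tally code in the first character ('*' marking the system peer), and it treats any 127.127.x.x remote as a reference clock driver. The sample text is the billboard quoted above.

```python
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.0.7.6        203.35.83.242    2 u   44   64  377    0.147  -227.12  33.560
*127.127.1.1     .LOCL.          14 l   59  128  377    0.000    0.000   0.000
"""


def parse_peers(text):
    """Parse a peers billboard into a list of dicts (hypothetical helper)."""
    peers = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header and the '====' separator.
        if not stripped or stripped.startswith("remote") or set(stripped) == {"="}:
            continue
        if line[0].isalnum():
            # No tally column present; treat the peer as unmarked.
            tally, fields = " ", line.split()
        else:
            tally, fields = line[0], line[1:].split()
        if len(fields) < 10:
            continue  # not a peer line in the assumed layout
        peers.append({
            "tally": tally,
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "delay": float(fields[7]),
            "offset": float(fields[8]),
            "jitter": float(fields[9]),
        })
    return peers


def system_peer(peers):
    """Return the peer marked '*' (current system peer), or None."""
    for peer in peers:
        if peer["tally"] == "*":
            return peer
    return None


def is_local_refclock(peer):
    """True if the peer address is in the 127.127.x.x refclock range."""
    return peer is not None and peer["remote"].startswith("127.127.")


if __name__ == "__main__":
    chosen = system_peer(parse_peers(SAMPLE))
    if is_local_refclock(chosen):
        print("system peer is a local refclock:", chosen["remote"],
              "stratum", chosen["stratum"])
```

The "when", "poll" and "reach" columns are deliberately left unparsed, since "when" can carry a unit suffix or a dash and "reach" is octal.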
I guess everyone else is using ntpd without a problem? (or not...)
Cheers, Dewayne
PS Apologies for the delay in getting back; gmail placed your reply in the
spam folder :/