thr3ads.net - freebsd stable - ntp problems stratum 2 to 14? [Feb 2020]

Dewayne Geraghty

2020-Feb-26 05:37 UTC

ntp problems stratum 2 to 14?

I usually run ntpd with ASLR enabled and as user ntpd.  While testing I
noticed that my server, which has a direct network cable to my main time
keeper, jumped from the expected stratum 2 to 14 as follows (I record the
date so I can sync with the debug log, also below):

vm.loadavg={ 0.09 0.10 0.18 }

Wed 26 Feb 2020 15:16:38 AEDT
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.0.7.6        203.35.83.242    2 u   44   64  377    0.147  -227.12  33.560
*127.127.1.1     .LOCL.          14 l   59  128  377    0.000    0.000   0.000
Wed 26 Feb 2020 15:18:46 AEDT
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.0.7.6        LOCAL(1)        14 u   42   64  377    0.147  -227.12  44.529
*127.127.1.1     .LOCL.          14 l   59  128  377    0.000    0.000   0.000
Wed 26 Feb 2020 15:20:54 AEDT
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.0.7.6        LOCAL(1)        14 u   42   64  377    0.147  -227.12  73.969
*127.127.1.1     .LOCL.          14 l   59  128  377    0.000    0.000   0.000
Wed 26 Feb 2020 15:23:02 AEDT
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.0.7.6        LOCAL(1)        14 u   37   64  377    0.164  -370.64  74.119
 127.127.1.1     .LOCL.          14 l   59  128  377    0.000    0.000   0.000
Time marches on
Wed 26 Feb 2020 16:03:35 AEDT
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.0.7.6        LOCAL(1)        14 u   11   64  177    0.133   -3.148  72.295
 127.127.1.1     .LOCL.          14 l  406  128   10    0.000    0.000   0.000
Wed 26 Feb 2020 16:05:43 AEDT
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.0.7.6        203.35.83.242    2 u    7   64  377    0.164  -42.789  73.762
 127.127.1.1     .LOCL.          14 l  534  128   20    0.000    0.000   0.000

The debug for the above is:
26 Feb 14:58:33 ntpd[8772]: Command line: /usr/local/sbin/ntpd -c /etc/ntp.conf -g -g -u ntpd --nofork
...
26 Feb 14:58:34 ntpd[8772]: 10.0.7.6 e014 84 reachable
26 Feb 14:58:35 ntpd[8772]: LOCAL(1) 8014 84 reachable
26 Feb 15:03:40 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== bad
26 Feb 15:03:40 ntpd[8772]: 0.0.0.0 c515 05 clock_sync
26 Feb 15:22:25 ntpd[8772]: 10.0.7.6 f01a 8a sys_peer  <=== Good!
26 Feb 15:22:25 ntpd[8772]: 0.0.0.0 0613 03 spike_detect -0.370644 s
26 Feb 15:30:03 ntpd[8772]: 0.0.0.0 061c 0c clock_step -0.536289 s
26 Feb 15:30:02 ntpd[8772]: 0.0.0.0 0615 05 clock_sync
26 Feb 15:30:03 ntpd[8772]: 0.0.0.0 c618 08 no_sys_peer
26 Feb 15:30:03 ntpd[8772]: 10.0.7.6 e014 84 reachable
26 Feb 15:30:07 ntpd[8772]: LOCAL(1) 8014 84 reachable
26 Feb 15:30:21 ntpd[8772]: 10.0.7.6 f01a 8a sys_peer
...
26 Feb 15:46:49 ntpd[8772]: 0.0.0.0 c618 08 no_sys_peer
26 Feb 15:46:57 ntpd[8772]: 10.0.7.6 f01a 8a sys_peer

...
26 Feb 15:56:58 ntpd[8772]: 10.0.7.6 f01a 8a sys_peer
...
26 Feb 16:24:33 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== and stays LOCAL, which is now normal for this box  :(

Should the jump to stratum 14 be expected?  Anything obviously wrong with
the ntp.conf?

I've had a few days of testing on what is usually a very stable system
(time-wise); it seems that running at prio 20 is required.

/etc/ntp.conf contains
rlimit memlock -1
rlimit filenum 32
driftfile /var/db/ntp/drift
disable bclient
server 10.0.7.6 iburst minpoll 4 maxpoll 6 version 4 key 23057 prefer

server 127.127.1.1 minpoll 7 maxpoll 7
fudge  127.127.1.1 stratum 14

restrict -4 default ignore
restrict -6 default ignore
restrict 127.0.0.1  nomodify nopeer notrap
restrict -6 ::1     nomodify nopeer notrap
restrict 0.0.0.0 ignore

restrict 10.0.7.6 nomodify nopeer noquery notrap ntpport
restrict 10.169.168.91 mask 255.255.255.0 nomodify nopeer noquery notrap ntpport kod limited


I'm also very surprised that the jitter on the server (under testing) is so
poor.  The internet-facing time server is
*x.y.z.t   .ATOM.           1 u   73  512    7   23.776   34.905  95.961
but it's very old and not running ASLR.

Any ideas or pointers would be appreciated.  This is very, time consuming.
:)

I'm using the following command sequence, as these are all being changed:
sysctl kern.elf64.aslr.enable=1 kern.elf64.aslr.stack_gap=1 \
  security.mac.ntpd.enabled=1 && \
/usr/bin/proccontrol -m aslr -s disable /usr/local/sbin/ntpdate -v -a 23057 \
  -k /etc/ntp.keys 10.0.7.6 && sleep 2 && \
/rescue/nice -n -20 /usr/bin/proccontrol -m aslr -s disable \
  /usr/local/sbin/ntpd -c /etc/ntp.conf -g -g -u ntpd --nofork

I get similar results with /usr/sbin/ntpd; I've been testing both and
happened to record details for the port ntpd.

Regards, Dewayne

Peter Jeremy

2020-Feb-26 19:43 UTC

On 2020-Feb-26 16:37:43 +1100, Dewayne Geraghty <dewaynegeraghty at gmail.com> wrote:
>I usually run ntpd with both aslr and as user ntpd.  While testing I
>noticed that my server with a direct network cable to my main time keeper,
>jumped from the expected stratum 2 to 14 as follows (I record the date so I
>can synch with the debug log, also below):
>
>vm.loadavg={ 0.09 0.10 0.18 }
>
>Wed 26 Feb 2020 15:16:38 AEDT
>     remote           refid      st t when poll reach   delay   offset  jitter
>==============================================================================
> 10.0.7.6        203.35.83.242    2 u   44   64  377    0.147  -227.12  33.560
>*127.127.1.1     .LOCL.          14 l   59  128  377    0.000    0.000   0.000
>26 Feb 15:03:40 ntpd[8772]: LOCAL(1) 901a 8a sys_peer <== bad

Why is this bad?  You've specified that this is a valid clock source so
ntpd is free to use it if it decides it is the best source of time.

>server 127.127.1.1 minpoll 7 maxpoll 7
>fudge  127.127.1.1 stratum 14

Synchronizing to the local clock (ie using 127.127.1.x as a reference) is
almost never correct.  What external (to NTP) source is being used to
synchronize the local clock?

>I'm also very surprised that the jitter on the server (under testing) is so
>poor.  The internet facing time server is
>*x.y.z.t   .ATOM.           1 u   73  512    7   23.776   34.905  95.961
>but its very old and not running aslr.

The 23ms distance to the peer suggests that this is over the Internet.  What
sort of link do you have to the Internet and how heavily loaded is it?  The
NTP protocol includes the assumption that the client-server path delay is
symmetric - this is often untrue for SOHO connections.  And SOHO connections
will often wind up saturated in one direction - which skews the apparent
timestamps and shows up as high jitter values.
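
To make the symmetry assumption concrete, here is a minimal Python sketch of the standard NTP on-wire calculation from the four packet timestamps. The `simulate` helper and the delay figures are hypothetical, chosen only for illustration; they are not taken from the measurements in this thread. With an asymmetric path the computed offset is wrong by half the difference between the outbound and return delays, while the round-trip delay looks the same.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (seconds).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset, out_delay, back_delay, server_proc=0.0):
    """Build timestamps for a hypothetical exchange and run the calculation.

    true_offset is (server clock - client clock); delays are one-way, in seconds.
    """
    t1 = 0.0                                         # read on the client clock
    t2 = t1 + out_delay + true_offset                # read on the server clock
    t3 = t2 + server_proc                            # read on the server clock
    t4 = t1 + out_delay + server_proc + back_delay   # read on the client clock
    return ntp_offset_delay(t1, t2, t3, t4)


# Same 24 ms round trip, true offset 10 ms in both cases (illustrative values).
sym_offset, sym_delay = simulate(0.010, 0.012, 0.012)
asym_offset, asym_delay = simulate(0.010, 0.020, 0.004)

print(f"symmetric:  offset={sym_offset * 1e3:.3f} ms  delay={sym_delay * 1e3:.3f} ms")
print(f"asymmetric: offset={asym_offset * 1e3:.3f} ms  delay={asym_delay * 1e3:.3f} ms")
```

In the asymmetric case the reported offset is off by (20 ms - 4 ms) / 2 = 8 ms even though the measured delay is unchanged, which is why a link saturated in one direction can produce offsets that swing with load.
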
> /usr/local/sbin/ntpd -c /etc/ntp.conf -g -g  -u ntpd --nofork
...
>I get similar results with /usr/sbin/ntpd, I've been testing both and
>happened to record details for the port ntpd.

It's probably not relevant but it would be useful for you to say up front
which ntpd you are having problems with and which version of the port you
have installed.

-- 
Peter Jeremy

Ian Lepore

2020-Mar-05 18:19 UTC

On Wed, 2020-02-26 at 16:37 +1100, Dewayne Geraghty wrote:
> [...]

Using a local clock is a bad bad idea.  I'm not sure what problem you
think it solves, but I'm fairly sure that configuring local clocks is
at the root of your problems.  The only valid configuration that
includes a local clock is when some external mechanism other than ntp
is disciplining the kernel clock and ntpd is being used only to monitor
that performance and serve time to others.

Your configuration is almost the perfect setup for failure:  one
unreliable network clock, and the local clock.  Ntpd's strong point is
being able to select reliable servers from a collection of candidates. 
When you reduce its choices to just two, there is no way it can make a
correct choice.
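
As a rough illustration of why two candidates are not enough, here is a simplified Marzullo-style interval intersection in Python. It is only a sketch: ntpd's real selection and clustering logic (described in RFC 5905) is considerably more elaborate, and the source names and error bounds below are hypothetical. Only the -227 ms offset echoes the figures in this thread.

```python
def largest_agreeing_group(sources):
    """sources: list of (name, offset_s, error_bound_s).

    Returns (count, (lo, hi)): the interval covered by the largest number
    of sources, found by sweeping over the sorted interval endpoints.
    """
    edges = []
    for _, offset, err in sources:
        edges.append((offset - err, -1))   # interval start
        edges.append((offset + err, +1))   # interval end
    edges.sort()                           # at equal positions, starts sort first
    best = count = 0
    best_lo = best_hi = None
    for i, (x, kind) in enumerate(edges):
        count -= kind                      # a start adds one, an end removes one
        if count > best:                   # only possible right after a start
            best = count
            best_lo, best_hi = x, edges[i + 1][0]
    return best, (best_lo, best_hi)


def has_majority(sources):
    """True if more than half of the sources overlap in one interval."""
    count, _ = largest_agreeing_group(sources)
    return count > len(sources) / 2


# Two disagreeing sources: neither can be outvoted.
two = [("network", -0.227, 0.010), ("local", 0.000, 0.010)]
# Four sources (hypothetical): three agree, one is the odd one out.
four = [("a", 0.002, 0.010), ("b", 0.004, 0.010),
        ("c", -0.001, 0.010), ("d", -0.227, 0.010)]

print(has_majority(two), largest_agreeing_group(two)[0])
print(has_majority(four), largest_agreeing_group(four)[0])
```

With only two sources that disagree, each interval stands alone and no majority exists; with four, the three overlapping sources outvote the outlier.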

The sequence you show above starts out with 10.0.7.6 sync'd to another
network server, with an offset of 227ms, making it a bad candidate as
system peer.  Two minutes later 10.0.7.6 has switched itself to its
local clock and dropped to stratum 14; ntpd is not going to choose a
remote stratum 14 server over the stratum 14 local clock.

Eventually 10.0.7.6 steps its clock and resumes operation at stratum 2.
Shortly after that, the problem system follows by stepping its clock
then switching the system peer to 10.0.7.6.
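
The sequence above is read off the tally character, refid, stratum, and reach columns of the `ntpq -p` billboard. As an aid for following such a sequence in one's own captures, here is a minimal Python sketch that parses unwrapped peer rows. It assumes the usual ten-column layout with the tally character in the first column; the `Peer` type and function names are made up for this example. The two sample rows are the 16:03:35 capture from the original post.

```python
from typing import List, NamedTuple, Optional


class Peer(NamedTuple):
    tally: str        # first column; '*' marks the current system peer
    remote: str
    refid: str
    stratum: int
    reach: int        # 8-bit reachability register (ntpq prints it in octal)
    offset_ms: float


def parse_peer_line(line: str) -> Peer:
    """Parse one unwrapped peer row from `ntpq -p` output."""
    tally, fields = line[0], line[1:].split()
    # fields: remote refid st t when poll reach delay offset jitter
    return Peer(tally, fields[0], fields[1], int(fields[2]),
                int(fields[6], 8), float(fields[8]))


def reach_history(peer: Peer) -> str:
    """Last eight polls, oldest first; '1' means a reply was received."""
    return format(peer.reach, "08b")


def system_peer(peers: List[Peer]) -> Optional[Peer]:
    """Return the peer flagged with '*', or None if there is none."""
    return next((p for p in peers if p.tally == "*"), None)


rows = [
    "*10.0.7.6        LOCAL(1)        14 u   11   64  177    0.133   -3.148  72.295",
    " 127.127.1.1     .LOCL.          14 l  406  128   10    0.000    0.000   0.000",
]
peers = [parse_peer_line(r) for r in rows]
chosen = system_peer(peers)
print(chosen.remote, chosen.refid, chosen.stratum, reach_history(chosen))
```

A reach of 177 (octal) is 01111111 in binary, so the register holds seven consecutive replies, while the local clock's 10 (octal) shows a single reply that is now four polls old.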

You're going to keep having these kinds of problems as long as local
clocks are configured.  Especially since multiple servers in your
synchronization net are configured that way, and don't have enough peers
to sanity-check the times.

-- Ian
