I have 30 identical Lenovo desktop systems running CentOS 5.1. On one of those systems the clock is running slow (5+ minutes from yesterday to this morning and another minute since this morning), even though NTP is running on all of them and they all have the exact same /etc/ntp.conf file (I compared the MD5 sums of that file on all the systems). Here is the output of "grep ntp /var/log/messages" on the system with the problem since I restarted the NTP daemon earlier today:

May 20 11:35:38 hepdsw03 ntpd[31791]: ntpd 4.2.2p1 at 1.1570-o Sat Nov 10 12:33:50 UTC 2007 (1)
May 20 11:35:38 hepdsw03 ntpd[31792]: precision = 1.000 usec
May 20 11:35:38 hepdsw03 ntpd[31792]: Listening on interface wildcard, 0.0.0.0#123 Disabled
May 20 11:35:38 hepdsw03 ntpd[31792]: Listening on interface wildcard, ::#123 Disabled
May 20 11:35:38 hepdsw03 ntpd[31792]: Listening on interface lo, ::1#123 Enabled
May 20 11:35:38 hepdsw03 ntpd[31792]: Listening on interface eth0, fe80::210:c6ff:feab:dd92#123 Enabled
May 20 11:35:38 hepdsw03 ntpd[31792]: Listening on interface lo, 127.0.0.1#123 Enabled
May 20 11:35:38 hepdsw03 ntpd[31792]: Listening on interface eth0, 10.66.42.109#123 Enabled
May 20 11:35:38 hepdsw03 ntpd[31792]: kernel time sync status 0040
May 20 11:35:38 hepdsw03 ntpd[31792]: frequency initialized 0.000 PPM from /var/lib/ntp/drift
May 20 11:38:55 hepdsw03 ntpd[31792]: synchronized to LOCAL(0), stratum 10
May 20 11:38:55 hepdsw03 ntpd[31792]: kernel time sync disabled 0001
May 20 11:39:59 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104, stratum 3
May 20 11:40:58 hepdsw03 ntpd[31792]: synchronized to LOCAL(0), stratum 10
May 20 11:42:09 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104, stratum 3
May 20 11:47:26 hepdsw03 ntpd[31792]: synchronized to LOCAL(0), stratum 10
May 20 11:49:31 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104, stratum 3
May 20 11:52:48 hepdsw03 ntpd[31792]: synchronized to LOCAL(0), stratum 10
May 20 11:54:54 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104, stratum 3
May 20 12:01:26 hepdsw03 ntpd[31792]: synchronized to LOCAL(0), stratum 10

Any idea what could be causing this? For now I have disabled the NTP daemon and am running ntpdate once an hour. What complicates the matter is that we are using Kerberos for authentication, and after a day or so the user cannot log into his system anymore because of the time skew.

Alfred
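
The back-and-forth between LOCAL(0) and 10.101.32.104 in the log above can be counted mechanically. The following is a minimal Python sketch, not part of the original post; the function names are hypothetical, and the sample text is a short excerpt of the log lines quoted above.

```python
import re

# Matches the "synchronized to <source>, stratum <n>" messages that ntpd logs.
SYNC_RE = re.compile(r"synchronized to (\S+), stratum (\d+)")

def sync_sources(log_text):
    """Return the sync sources in the order they appear in the log (hypothetical helper)."""
    return [m.group(1) for m in SYNC_RE.finditer(log_text)]

def count_switches(sources):
    """Count how often consecutive sync sources differ (hypothetical helper)."""
    return sum(1 for a, b in zip(sources, sources[1:]) if a != b)

# Excerpt of the log lines from the post above.
LOG = """\
May 20 11:38:55 hepdsw03 ntpd[31792]: synchronized to LOCAL(0), stratum 10
May 20 11:38:55 hepdsw03 ntpd[31792]: kernel time sync disabled 0001
May 20 11:39:59 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104, stratum 3
May 20 11:40:58 hepdsw03 ntpd[31792]: synchronized to LOCAL(0), stratum 10
May 20 11:42:09 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104, stratum 3
"""

sources = sync_sources(LOG)
switches = count_switches(sources)
print(sources)
print("source changes:", switches)
```

Running the same functions over the full output of the grep on a healthy machine and on the slow one would make the difference in sync stability easy to compare.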

On Tue, 20 May 2008, Alfred von Campe wrote:

> I have 30 identical Lenovo desktop systems running CentOS 5.1. On
> one of those systems the clock is running slow (5+ minutes from
> yesterday to this morning and another minute since this morning)
> despite the fact that NTP is running on all of them and they all
> have the exact same /etc/ntp.conf file (I compared the MD5 sums of
> that file on all the systems). Here is the output of "grep ntp
> /var/log/messages" on the system with the problem since I restarted
> the NTP daemon earlier today:

A slew of 5 min/24 hrs should be in the range of fixable.

> May 20 11:35:38 hepdsw03 ntpd[31792]: frequency initialized 0.000
> PPM from /var/lib/ntp/drift

This is very suspect. Are there any SELinux or other log messages suggesting that ntpd isn't able to write to its drift file? Your local clock is definitely drifting, so a 0.000 value is bogus. It may indicate that there's a disconnect between ntpd and the filesystem.

I'd be interested in the output of "ntpdc -c kerninfo"; on most systems the 'pll frequency' value is a close match to the figure in the drift file.

> May 20 11:38:55 hepdsw03 ntpd[31792]: synchronized to LOCAL(0),
> stratum 10
> May 20 11:38:55 hepdsw03 ntpd[31792]: kernel time sync disabled 0001
> May 20 11:39:59 hepdsw03 ntpd[31792]: synchronized to 10.101.32.104,
> stratum 3

This is ungood. Syncing to local before your network time server means that your machine doesn't want to believe your server -- and you should see a "kernel time sync enabled" message once the machine has synced with the time server.

You said the machines are identical. Could there be any variation in the BIOS revision level or its settings? Sometimes ACPI stuff can mess up ntp.

Also -- the log messages you provide have no "step time server" reference. Do you have a valid /etc/ntp/step-tickers file?

--
Paul Heinlein <> heinlein at madboa.com <> http://www.madboa.com/
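
For scale, the "5 min/24 hrs" figure above can be converted to parts per million (PPM), the unit used in the drift file. The following Python sketch is an editorial illustration, not something from the thread; the function name is hypothetical, and the 500 PPM constant is the maximum frequency correction that ntpd applies when slewing.

```python
# ntpd limits its frequency correction to 500 PPM when slewing the clock.
NTPD_MAX_SLEW_PPM = 500.0

def drift_ppm(offset_seconds, interval_seconds):
    """Clock error expressed in parts per million (hypothetical helper)."""
    return offset_seconds / interval_seconds * 1e6

# 5 minutes lost over 24 hours, as quoted in the message above.
rate = drift_ppm(5 * 60, 24 * 3600)
print(f"{rate:.0f} PPM")            # 3472 PPM
print(rate > NTPD_MAX_SLEW_PPM)     # True
```

At roughly 3472 PPM the drift is well beyond what frequency adjustment alone can absorb, which may be one reason the daemon keeps abandoning the network server; it also makes a 0.000 PPM drift value look even less plausible.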

On Tue, May 20, 2008 at 3:46 PM, Alfred von Campe <alfred at von-campe.com> wrote:

> I have 30 identical Lenovo desktop systems running CentOS 5.1. On one of
> those systems the clock is running slow (5+ minutes from yesterday to this
> morning and another minute since this morning) despite the fact that NTP is
> running on all of them and they all have the exact same /etc/ntp.conf file
> (I compared the MD5 sums of that file on all the systems). Here is the
> output of "grep ntp /var/log/messages" on the system with the problem since
> I restarted the NTP daemon earlier today:

Hi Alfred,

What is the output of "ntpq -np"? You should have a line with a star (*); otherwise it is not synchronizing. Start running NTP again, wait for half an hour, and issue that command to see what your output is.

It could be a problem related to filtering UDP traffic, either with iptables on the machine itself or on some other router in the network your machine is in.

HTH,
Filipe
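
The check for a starred line in the "ntpq -np" output can be scripted. Below is a minimal Python sketch under stated assumptions: the function name is hypothetical, and the sample output is invented for illustration (only the address 10.101.32.104 comes from the thread; the refid and all numbers are made up). It relies only on the fact that ntpq marks the currently selected peer with a leading "*".

```python
def selected_peer(ntpq_output):
    """Return the address of the peer marked '*' in `ntpq -np` output, or None.

    Hypothetical helper: it only inspects the tally character in column one.
    """
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            return line[1:].split()[0]
    return None

# Illustrative sample only; the values are NOT real output from the affected machine.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.101.32.104   10.101.1.1       3 u   33   64  377    0.512   -1.234   0.087
 127.127.1.0     .LOCL.          10 l   12   64  377    0.000    0.000   0.001
"""

peer = selected_peer(SAMPLE)
print(peer if peer else "no peer selected; ntpd is not synchronized")
```

If the function returns None half an hour after restarting ntpd, that would point toward the daemon never settling on the server, for example because of filtered UDP traffic on port 123.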