Hey all,

I was watching some logs on a domU today and I suddenly noticed that the timestamps were off by something on the order of 47 seconds. I was surprised because *I don't* run independent wall clocks. I checked some other domUs and the "drift" was also very close to that of the first domU.

I also checked another dom0. Here the domUs were "only" out of sync by ~11 seconds.

The dom0s are all Debian squeeze with Xen 4.0.1-2. The domUs are also Debian squeeze, running PV with the paravirt ops in the normal Debian linux-image-2.6.32 kernel.

I am currently using ntpdate (in cron.hourly) on my dom0s. Could this be the problem? Is there a difference between the way ntpdate updates the time and the way ntpd does? IIRC ntpd updates time continuously, correct?

Can someone explain why this would happen? Could this be caused by a xend restart?

After googling, I'm surprised a best practice solution isn't listed on XenFaq, considering how many users seem to be frustrated with this issue.

All the best,

D.
-- 
Dmitry Nedospasov <dmitry@nedos.net> -- Twitter: @nedos
Web: http://nedos.net -- Github: http://github.com/nedos
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Hi,

Ensure that ntpdate or ntpd is adjusting the hwclock in the dom0, as this is what is passed through to guests.

Joseph.

On 14 May 2011 21:05, Dmitry Nedospasov <dmitry@nedos.net> wrote:
> I am currently using ntpdate (in cron.hourly) on my dom0s, could this be
> the problem? Is there a difference in the way ntpdate updates time the
> way ntpd does? IIRC ntpd updates time continuously, correct?

-- 
Kind regards,
Joseph.
Founder | Director
Orion Virtualisation Solutions | www.orionvm.com.au | Phone: 1300 56 99 52 | Mobile: 0428 754 846
Hey Joseph,

Thanks for your reply!

On Sat, May 14, 2011 at 09:18:25PM +1000, Joseph Glanville wrote:
> Ensure that ntpdate or ntpd is adjusting the hwclock in the dom0 as
> this is what is passed through to guests.

What do you generally do? Is it sufficient to just do a hwclock --adjust on an hourly basis?

Thanks,

D.
Hi,

Generally that will work, but I would advise configuring your ntp daemon to do it for you if you are going to use ntp.

If you are just using cron then run:

    # update system time
    ntpdate <ntp.example.org>
    # write time to hwclock
    hwclock --systohc

This should suffice, but in my opinion it isn't best practice except at boot. After boot I prefer to use ntpd to adjust the time (ntpdate is only needed because ntpd has issues correcting large offsets in a reasonable timeframe).

I will add this to the list of articles I need to write.

Joseph.

On 14 May 2011 22:45, Dmitry Nedospasov <dmitry@nedos.net> wrote:
> What do you generally do? Is it sufficient to just do a hwclock --adjust
> on an hourly basis?
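The two cron commands in Joseph's message could be wrapped in a small script so that the hardware clock is only written when the NTP query succeeded. This is a minimal sketch, not a tested production job: the function name sync_clock is made up for illustration, and ntp.example.org is the same placeholder server used in the message.

```shell
#!/bin/sh
# Sketch of an hourly dom0 clock sync (for example, a file in /etc/cron.hourly/).
# NTP_SERVER is a placeholder; substitute a server you actually use.
NTP_SERVER="${NTP_SERVER:-ntp.example.org}"

# sync_clock is a hypothetical helper name, not part of any package.
sync_clock() {
    # Step the system clock from the given NTP server. Stop if that fails,
    # so an unsynchronised time is never copied to the hardware clock.
    ntpdate "$1" || return 1
    # Write the corrected system time to the hardware clock.
    hwclock --systohc
}
```

Adding a final line `sync_clock "$NTP_SERVER"` turns the sketch into the cron job itself. Both commands need root, which cron.hourly scripts normally have.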
On Sat, May 14, 2011 at 01:05:36PM +0200, Dmitry Nedospasov wrote:
> I was watching some logs on a domU today and i suddenly noticed that the
> timestamps were off by something on the order of 47 seconds. I was
> surprised because *I don't* run independent wall clocks.

How did you check? This was a feature of the old Xen-Linux tree. However, it is no longer available in the Xen support of upstream Linux.

> I am currently using ntpdate (in cron.hourly) on my dom0s, could this be
> the problem? Is there a difference in the way ntpdate updates time the
> way ntpd does? IIRC ntpd updates time continuously, correct?

Use ntpd, in all domains.

Bastian

-- 
Suffocating together ... would create heroic camaraderie.
		-- Khan Noonian Singh, "Space Seed", stardate 3142.8
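One way to answer Bastian's "How did you check?" is to look for the sysctl file that the old Xen-Linux tree exposed. The sketch below assumes that file lives at /proc/sys/xen/independent_wallclock and holds 1 when the guest keeps its own wall clock; on a pvops kernel the file is expected to be absent. The function name wallclock_mode and its output strings are made up for illustration.

```shell
#!/bin/sh
# Report how a guest's wall clock is configured, based on the
# independent_wallclock knob of the old Xen-Linux kernels.
# The path can be overridden by the first argument (useful for testing).
wallclock_mode() {
    knob="${1:-/proc/sys/xen/independent_wallclock}"
    if [ ! -e "$knob" ]; then
        echo "unavailable"     # no such knob, e.g. an upstream pvops kernel
    elif [ "$(cat "$knob")" = "1" ]; then
        echo "independent"     # guest maintains its own wall clock
    else
        echo "follows-xen"     # guest wall clock tracks the hypervisor
    fi
}
```

On the Debian squeeze pvops guests described in this thread, the expected answer is "unavailable", which fits Bastian's remark that the feature no longer exists there.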
Dmitry Nedospasov wrote:
>
> I was watching some logs on a domU today and i suddenly noticed that the
> timestamps were off by something on the order of 47 seconds. I was
> surprised because *I don't* run independent wall clocks. I checked
> some other domUs and the "drift" was also very close to that of the
> first domU.
>
> I also checked another dom0, Here the domUs were "only" out of sync by
> ~11 seconds.
>
> The dom0s are all debian squeeze with Xen 4.0.1-2. The domUs are also
> debian squeeze and utilizing PV with the ParaVirtOPs in the normal
> debian linux-image-2.6.32 kernel.
>

I've been fighting this problem (clock running +47 seconds) for several months. My OS setup is like yours: dom0 is Debian Squeeze x64 running Xen 4.0.1-2. DomUs are Debian Squeeze x64 or Lenny x86:

dom0:      Debian Squeeze x64, running ntpd
           Xen version 4.0.1 (Debian 4.0.1-2)

Risk domU: Debian Squeeze x64, running ntpd
Coop domU: Debian Squeeze x64, running ntpd
T4 domU:   Debian Lenny x86, not running ntpd

Last night I wrote a Perl script to remotely monitor the dom0 and domU clocks via 'rsh <host> date +%s' from a non-Xen server. The script runs every minute and records any time change > 2 sec from the previous minute. Here is the result:

----------------------------------------
Fri Jul 1 23:00:05 PDT 2011
dom0      = localtime + 1s
Risk domU = localtime + 1s
Coop domU = localtime + 1s
T5 domU   = localtime + 93s
----------------------------------------
Fri Jul 1 23:13:04 PDT 2011
T5 domU   = localtime + 1s ..... (ran ntpdate manually)
----------------------------------------
Sat Jul 2 05:26:04 PDT 2011
dom0      = localtime + 47s
Risk domU = localtime + 47s
Coop domU = localtime + 48s
T5 domU   = localtime + 47s
----------------------------------------
Sat Jul 2 05:59:04 PDT 2011
Risk domU = localtime + 0s
----------------------------------------
Sat Jul 2 07:50:04 PDT 2011
Coop domU = localtime + 0s
----------------------------------------
Sat Jul 2 08:11:04 PDT 2011
dom0      = localtime + 0s
----------------------------------------
Sat Jul 2 09:13:05 PDT 2011
T5 domU   = localtime - 1s ..... (ran ntpdate manually)

At 5:26 am, there was a "time quake" on the Xen server, which caused dom0 and all domU clocks to move ahead by 47 seconds. Risk domU, running NTP, corrected its clock at 5:59 am by abruptly jerking it back to normal time. Coop domU and dom0 also did the same thing a while later. T5 domU, not running NTP, never corrected itself. I manually executed ntpdate on it.

Several things are odd about this problem. First, the "time quake" is exact and reproducible: +47 seconds, same as Dmitry. My server is a dual Xeon 5345 on a SuperMicro X7DBR-E motherboard. Platform timer is "3.579MHz ACPI PM Timer" (from xm dmesg).

Secondly, I thought NTP is supposed to adjust the clock gradually (-5ms each second) instead of skipping many seconds at once. (Or it might be running the clock VERY SLOWLY for a few seconds to offset +47 secs.)

Thirdly, after the initial "time quake", domUs and dom0 had to correct their clocks individually, at different times.

Although a long shot, I will try "clocksource=pit" in the Xen command line this weekend...

P.S. "+47 secs" often causes my Perl POE scripts to hang, that's why this is a critical problem for me.

--
View this message in context: http://xen.1045712.n5.nabble.com/DomU-clock-out-of-sync-tp4395454p4545936.html
Sent from the Xen - User mailing list archive at Nabble.com.
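Andy's Perl script is not shown in the thread, so the following is only a guess at its core logic, written in shell: compute each host's offset from the monitoring machine and flag a host when that offset moves by more than 2 seconds between consecutive samples. The function names clock_offset and offset_jumped are made up for illustration.

```shell
#!/bin/sh
# Offset of a remote clock relative to the local one, in whole seconds.
# Both arguments are Unix timestamps as printed by `date +%s`.
clock_offset() {
    echo $(( $1 - $2 ))
}

# Succeeds when the offset changed by more than THRESHOLD seconds between
# the previous sample ($1) and the current sample ($2), in either direction.
THRESHOLD=2
offset_jumped() {
    delta=$(( $2 - $1 ))
    if [ "$delta" -lt 0 ]; then
        delta=$(( 0 - delta ))
    fi
    [ "$delta" -gt "$THRESHOLD" ]
}

# Intended use, once a minute per host (rsh as in the original message):
#   now=$(clock_offset "$(rsh "$host" date +%s)" "$(date +%s)")
#   offset_jumped "$prev" "$now" && echo "$host = localtime + ${now}s"
#   prev=$now
```

Because only changes are reported, a host that jumps by +47s and is later corrected shows up twice in the log, which matches the shape of the output above.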
Dave Stevens
2011-Jul-02 23:16 UTC
Re: [Xen-users] Re: DomU clock out of sync (and Dom0 too)
Quoting Andy Lee <yikes2000@gmail.com>:
> P.S. "+47 secs" often cause my Perl POE scripts to hang, that's why this is
> a critical problem for me.

Depending on the direction of the drift, dovecot won't like it either; your mail can stop working.

Dave

-- 
"It is no measure of health to be well adjusted to a profoundly sick society."
Krishnamurti
Steve Allison
2011-Jul-03 11:03 UTC
Re: [Xen-users] Re: DomU clock out of sync (and Dom0 too)
On 03/07/2011 00:16, Dave Stevens wrote:
> Quoting Andy Lee <yikes2000@gmail.com>:
>> I've been fighting this problem (clock running +47 seconds) for several
>> months. My OS setup is like yours, dom0 is Debian Squeeze x64
>> running Xen 4.0.1-2. DomU's are Debian Squeeze x64 or Lenny x86:

A friend of mine has been suffering from this same issue and has yet to find a solution. He runs a Squeeze dom0 with a mixture of pvgrub domUs running Squeeze and CentOS, plus 3 Windows HVM domUs. I also believe it was a Supermicro machine, although I'll get confirmation later.

The time movement is also in the region of 48 seconds, but it causes catastrophic failure of the Windows HVM domUs: they BSOD or just restart after the time shift. He has also suffered from a spontaneously restarting dom0. He never found the cause, but suspected the time shift was related.

The same machine is now running CentOS & KVM.

The machine was running Debian Lenny with Xen 3.x for some time. It had the same time issues, but they seemed to produce only warnings and no other symptoms. Since the move to Debian Squeeze and Xen 4.0.1 he has been plagued with this issue, and after hours of troubleshooting Xen he was reluctantly forced to explore and learn CentOS with KVM. The dom0 restarts would happen randomly, anywhere between 1 hour and 1.5 days apart.

Don't think he gave up easily though. He tried every combination of:

- Debian Xen kernel / kernel from Jeremy, 2.6.32-*
- Debian Squeeze Xen 4.0.1 / compiling Xen from source, 4.1.1

I am asking him to write a mail, which he will either post here himself or I'll pass on. I have pieced this mail together from the correspondence we have had over the last couple of weeks.

======================
On boot dmesg would show

[    0.064660] PM-Timer failed consistency check (0x0xffffff) - aborting.
======================
This log was captured prior to HVM deaths and dom0 reboots:

[31853.028654] hrtimer: interrupt took 48149483 ns
======================
Some process crashes on domU would show this during heavy I/O, don't know if it's related...

[266640.072386] INFO: task flush-202:3:8547 blocked for more than 120 seconds.
[266640.072393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[266640.072400] flush-202:3 D 0000000000000002 0 8547 2 0x00000000
[266640.072410] ffff88001fd81c40 0000000000000246 0000000000000000 0000000000000000
[266640.072424] 0000000000000001 0000000000000001 000000000000f9e0 ffff8800136dbfd8
[266640.072437] 0000000000015780 0000000000015780 ffff88001df58e20 ffff88001df59118
[266640.072451] Call Trace:
[266640.072458] [<ffffffff8102cdcc>] ? pvclock_clocksource_read+0x3a/0x8b
[266640.072467] [<ffffffff8110e16e>] ? sync_buffer+0x0/0x40
[266640.072474] [<ffffffff8110e16e>] ? sync_buffer+0x0/0x40
[266640.072481] [<ffffffff812fb0d2>] ? io_schedule+0x73/0xb7
[266640.072489] [<ffffffff8110e1a9>] ? sync_buffer+0x3b/0x40
I may have found a solution: add "clocksource=pit" to the Xen command line. E.g. my Grub2 stanza for booting the Xen server (dom0) is:

menuentry 'DEFAULT: Debian Squeeze, kernel 2.6.32-5-xen-amd64' {
        insmod part_msdos
        insmod ext2
        set root='(hd0,msdos1)'
        search --no-floppy --fs-uuid --set cdd50d18-e2bd-42b3-8042-c9c4d7aedb99
        echo 'Loading Linux 2.6.32-5-xen-amd64 ...'
        multiboot /xen-4.0-amd64.gz placeholder dom0_mem=512M *clocksource=pit*
        module /vmlinuz-2.6.32-5-xen-amd64 placeholder root=/dev/md2 ro quiet
        echo 'Loading initial ramdisk ...'
        module /initrd.img-2.6.32-5-xen-amd64
}

My explanation:

http://lists.xensource.com/archives/html/xen-devel/2011-02/msg01367.html ...

Back in February 2011, Olivier Hanesse reported a TSC bug that caused a time jump of 50 minutes into the future. He was using HPET as clock source (platform timer) and "xm dmesg" showed this entry:

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2850 (count=3)

http://lists.xensource.com/archives/html/xen-devel/2011-02/msg01406.html ...

Keir Fraser explained that Xen detects when the platform timer counter wraps and has "to account for that based on trusting the CPU's 64-bit TSC." An unreliable TSC may therefore cause erroneous clock jumps.

With HPET as the platform timer, the clock jump was ~50 minutes. I theorize that 47 seconds is the clock jump when ACPI PM is the platform timer. My "xm dmesg" showed:

(XEN) Platform timer is 3.579MHz ACPI PM Timer
...
(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2870 (count=1)

Keir suggested using the PIT platform timer as a workaround. After booting dom0 with "clocksource=pit", my "xm dmesg" showed:

(XEN) Platform timer is 1.193MHz PIT

It's been 48 hours without any 47sec time jump. Too soon to declare victory, but encouraging nonetheless.

I only have one Xen server to test, so I ask everyone with this problem to try it. Thank you.
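Andy's theory ties the size of the jump to the platform timer wrapping. As a rough sanity check of that idea, and not a confirmed diagnosis, the sketch below computes how long a free-running counter takes to wrap. It assumes the nominal ACPI PM timer frequency of 3.579545 MHz and tries both 24-bit and 32-bit counter widths, since ACPI permits either; the width on this particular server is not stated in the thread. The function name wrap_ms is made up for illustration, and the shell is assumed to use 64-bit arithmetic.

```shell
#!/bin/sh
# Wrap period in milliseconds of a free-running counter that is $1 bits
# wide and ticks $2 times per second: 2^bits * 1000 / hz (integer division).
wrap_ms() {
    bits=$1
    hz=$2
    echo $(( (1 << bits) * 1000 / hz ))
}

# Nominal ACPI PM timer frequency, the "3.579MHz" reported by xm dmesg.
ACPI_PM_HZ=3579545

wrap_ms 24 "$ACPI_PM_HZ"   # 24-bit counter: a little under 4.7 seconds
wrap_ms 32 "$ACPI_PM_HZ"   # 32-bit counter: about 20 minutes
```

Under these assumptions neither width gives 47 seconds for a single wrap, although ten wraps of a 24-bit counter would come to roughly 46.9 seconds. That is only suggestive; nothing in the thread confirms how many wraps, if any, Xen miscounted.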
Nabble added '*' around "clocksource=pit". There is no '*':

menuentry 'DEFAULT: Debian Squeeze, kernel 2.6.32-5-xen-amd64' {
        ...
        multiboot /xen-4.0-amd64.gz ... clocksource=pit
        ...
}
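After rebooting with the changed stanza, it is worth confirming which platform timer Xen actually selected, as Andy did with "xm dmesg". The sketch below reads Xen boot messages on standard input and extracts the timer name; it assumes the "(XEN) Platform timer is ..." line format quoted earlier in the thread. The function name platform_timer is made up for illustration.

```shell
#!/bin/sh
# Print the platform timer named in Xen boot messages read from stdin.
# Prints nothing if no "Platform timer is" line is present.
platform_timer() {
    sed -n 's/.*Platform timer is //p' | head -n 1
}

# Intended use on the dom0 (needs root):
#   xm dmesg | platform_timer
```

With the workaround active, the expected result on Andy's server would be "1.193MHz PIT" rather than "3.579MHz ACPI PM Timer".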