thr3ads.net - Xen users - [Xen-users] DomU clock out of sync [May 2011]

Dmitry Nedospasov

2011-May-14 11:05 UTC

[Xen-users] DomU clock out of sync

Hey all,

I was watching some logs on a domU today and I suddenly noticed that the
timestamps were off by something on the order of 47 seconds. I was
surprised because *I don't* run independent wall clocks. I checked
some other domUs and the "drift" was also very close to that of the
first domU.

I also checked another dom0. Here the domUs were "only" out of sync by
~11 seconds.

The dom0s are all Debian Squeeze with Xen 4.0.1-2. The domUs are also
Debian Squeeze, running PV with the paravirt_ops support in the stock
Debian linux-image-2.6.32 kernel.

I am currently using ntpdate (in cron.hourly) on my dom0s. Could this be
the problem? Is there a difference between the way ntpdate updates the time
and the way ntpd does? IIRC ntpd updates the time continuously, correct?

Can someone explain why this would happen? Could this be caused by a
xend restart?

After googling, I'm surprised a best-practice solution isn't listed on
XenFaq, considering how many users seem to be frustrated with this
issue.

All the best,

D.
-- 
Dmitry Nedospasov <dmitry@nedos.net> -- Twitter: @nedos
Web: http://nedos.net -- Github: http://github.com/nedos

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Joseph Glanville

2011-May-14 11:18 UTC

Re: [Xen-users] DomU clock out of sync

Hi,

Ensure that ntpdate or ntpd is adjusting the hwclock in the dom0 as
this is what is passed through to guests.

Joseph.

-- 
Kind regards,
Joseph.
Founder | Director
Orion Virtualisation Solutions | www.orionvm.com.au | Phone: 1300 56
99 52 | Mobile: 0428 754 846


Dmitry Nedospasov

2011-May-14 12:45 UTC

Re: [Xen-users] DomU clock out of sync

Hey Joseph,

Thanks for your reply!

On Sat, May 14, 2011 at 09:18:25PM +1000, Joseph Glanville wrote:
> Ensure that ntpdate or ntpd is adjusting the hwclock in the dom0 as
> this is what is passed through to guests.

What do you generally do? Is it sufficient to just do a hwclock --adjust
on an hourly basis?

Thanks,

D.

Joseph Glanville

2011-May-14 12:51 UTC

Re: [Xen-users] DomU clock out of sync

Hi,

Generally that will work, but I would advise configuring your NTP
daemon to do it for you if you are going to use NTP.
If you are just using cron, then run:

# update system time
ntpdate <ntp.example.org>
# write time to hwclock
hwclock --systohc

This should suffice, but in my opinion it isn't best practice except at
boot.
After boot I prefer to use ntpd to adjust the time (ntpdate is only
required because ntpd has issues adjusting large offsets in a reasonable
timeframe).

I will add this to the list of articles I need to write.

Joseph.


Bastian Blank

2011-May-14 14:41 UTC

Re: [Xen-users] DomU clock out of sync

On Sat, May 14, 2011 at 01:05:36PM +0200, Dmitry Nedospasov wrote:
> I was watching some logs on a domU today and I suddenly noticed that the
> timestamps were off by something on the order of 47 seconds. I was
> surprised because *I don't* run independent wall clocks.

How did you check? This was a feature of the old Xen-Linux tree. However,
it is no longer available in the Xen support of upstream Linux.

> I am currently using ntpdate (in cron.hourly) on my dom0s, could this be
> the problem? Is there a difference in the way ntpdate updates time the
> way ntpd does? IIRC ntpd updates time continuously, correct?

Use ntpd, in all domains.

Bastian

-- 
Suffocating together ... would create heroic camaraderie.
		-- Khan Noonian Singh, "Space Seed", stardate 3142.8


Andy Lee

2011-Jul-02 22:53 UTC

[Xen-users] Re: DomU clock out of sync (and Dom0 too)

Dmitry Nedospasov wrote:
>
> I was watching some logs on a domU today and I suddenly noticed that the
> timestamps were off by something on the order of 47 seconds. I was
> surprised because *I don't* run independent wall clocks. I checked
> some other domUs and the "drift" was also very close to that of the
> first domU.
>
> I also checked another dom0. Here the domUs were "only" out of sync by
> ~11 seconds.
>
> The dom0s are all debian squeeze with Xen 4.0.1-2. The domUs are also
> debian squeeze and utilizing PV with the ParaVirtOPs in the normal
> debian linux-image-2.6.32 kernel.
>

I've been fighting this problem (clock running +47 seconds) for several
months.  My OS setup is like yours: dom0 is Debian Squeeze x64 running Xen
4.0.1-2.  The domUs are Debian Squeeze x64 or Lenny x86:

       dom0: Debian Squeeze x64, running ntpd
             Xen version 4.0.1 (Debian 4.0.1-2)

  Risk domU: Debian Squeeze x64, running ntpd
  Coop domU: Debian Squeeze x64, running ntpd
    T5 domU: Debian Lenny x86, not running ntpd

Last night I wrote a Perl script to remotely monitor the dom0 and domU
clocks via 'rsh <host> date +%s' from a non-Xen server.  The script runs
every minute and records any time change of more than 2 seconds from the
previous minute.  Here is the result:

----------------------------------------
Fri Jul  1 23:00:05 PDT 2011
           dom0 = localtime + 1s
      Risk domU = localtime + 1s
      Coop domU = localtime + 1s
        T5 domU = localtime + 93s
----------------------------------------
Fri Jul  1 23:13:04 PDT 2011
        T5 domU = localtime + 1s ..... (ran ntpdate manually)
----------------------------------------
Sat Jul  2 05:26:04 PDT 2011
           dom0 = localtime + 47s
      Risk domU = localtime + 47s
      Coop domU = localtime + 48s
        T5 domU = localtime + 47s
----------------------------------------
Sat Jul  2 05:59:04 PDT 2011
      Risk domU = localtime + 0s
----------------------------------------
Sat Jul  2 07:50:04 PDT 2011
      Coop domU = localtime + 0s
----------------------------------------
Sat Jul  2 08:11:04 PDT 2011
           dom0 = localtime + 0s
----------------------------------------
Sat Jul  2 09:13:05 PDT 2011
        T5 domU = localtime - 1s ..... (ran ntpdate manually)

At 5:26 am, there was a "time quake" on the Xen server, which caused dom0
and all domU clocks to move ahead by 47 seconds.  Risk domU, running NTP,
corrected its clock at 5:59 am by abruptly jerking it back to normal time.
Coop domU and dom0 also did the same thing a while later.  T5 domU, not
running NTP, never corrected itself.  I manually executed ntpdate on it.
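
For anyone who wants to reproduce this kind of monitoring, the comparison step described above can be sketched roughly as follows. This is not the original Perl script; the function names and the way offsets are collected are assumptions made for illustration, and the sample values are taken from the log above.

```python
from typing import Callable, Dict, Iterable


def sample_offsets(hosts: Iterable[str],
                   read_remote_epoch: Callable[[str], int],
                   local_epoch: int) -> Dict[str, int]:
    """Return each host's clock offset (remote minus local) in seconds.

    read_remote_epoch is a hypothetical callable, e.g. a wrapper that runs
    'date +%s' on the remote host and returns the result as an int.
    """
    return {host: read_remote_epoch(host) - local_epoch for host in hosts}


def find_clock_jumps(previous: Dict[str, int],
                     current: Dict[str, int],
                     threshold: float = 2.0) -> Dict[str, int]:
    """Return the hosts whose offset moved by more than threshold seconds
    since the previous sample, mapped to their current offset."""
    jumps = {}
    for host, offset in current.items():
        if host in previous and abs(offset - previous[host]) > threshold:
            jumps[host] = offset
    return jumps


if __name__ == "__main__":
    # Offsets one minute apart, modelled on the 05:26 entry in the log.
    before = {"dom0": 1, "risk": 1, "coop": 1, "t5": 1}
    after = {"dom0": 47, "risk": 47, "coop": 48, "t5": 47}
    for host, offset in sorted(find_clock_jumps(before, after).items()):
        print(f"{host:>6} = localtime {offset:+d}s")
```

Comparing against the previous sample, rather than against zero, is what lets a log like the one above stay quiet while a clock is merely wrong and speak up only when it moves.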

Several things are odd about this problem.  First, the "time quake" is exact
and reproducible, +47 seconds, same as Dmitry's.  My server is a dual Xeon 5345
on a SuperMicro X7DBR-E motherboard.  The platform timer is "3.579MHz ACPI PM
Timer" (from xm dmesg).

Secondly, I thought NTP is supposed to adjust the clock gradually (-5ms each
second) instead of skipping many seconds at once.  (Or it might be running
the clock VERY SLOWLY for a few seconds to offset +47 secs.)  Thirdly, after
the initial "time quake", the domUs and dom0 had to correct their clocks
individually, at different times.
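
A rough calculation may explain the abrupt correction. It assumes ntpd's usual defaults, which are not stated anywhere in this thread: a maximum slew rate of 500 ppm and a step threshold of 128 ms. Under those assumptions a 47-second offset is far above the step threshold, and slewing it away would take roughly 26 hours, so a step is the expected behaviour. The names below are illustrative only.

```python
# Assumed ntpd defaults (not taken from this thread).
MAX_SLEW_PPM = 500          # maximum slew rate, parts per million
STEP_THRESHOLD_S = 0.128    # offsets above this are stepped, not slewed


def slew_duration_seconds(offset_s: float, slew_ppm: float = MAX_SLEW_PPM) -> float:
    """Time needed to remove offset_s by slewing at slew_ppm."""
    return abs(offset_s) * 1_000_000 / slew_ppm


def would_step(offset_s: float, threshold_s: float = STEP_THRESHOLD_S) -> bool:
    """True if the offset exceeds the step threshold."""
    return abs(offset_s) > threshold_s


if __name__ == "__main__":
    offset = 47.0
    hours = slew_duration_seconds(offset) / 3600
    print(f"offset {offset:.0f} s: step expected = {would_step(offset)}, "
          f"slewing alone would take about {hours:.1f} h")
```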

Although it is a long shot, I will try "clocksource=pit" on the Xen command
line this weekend...

P.S. "+47 secs" often causes my Perl POE scripts to hang, which is why this
is a critical problem for me.

--
View this message in context:
http://xen.1045712.n5.nabble.com/DomU-clock-out-of-sync-tp4395454p4545936.html
Sent from the Xen - User mailing list archive at Nabble.com.


Dave Stevens

2011-Jul-02 23:16 UTC

Re: [Xen-users] Re: DomU clock out of sync (and Dom0 too)

Quoting Andy Lee <yikes2000@gmail.com>:
> P.S. "+47 secs" often causes my Perl POE scripts to hang, which is why this
> is a critical problem for me.

Depending on the direction of the drift, dovecot won't like it either;
your mail can stop working.


Dave


-- 
"It is no measure of health to be well adjusted to a profoundly sick
society."
   Krishnamurti



Steve Allison

2011-Jul-03 11:03 UTC

Re: [Xen-users] Re: DomU clock out of sync (and Dom0 too)

On 03/07/2011 00:16, Dave Stevens wrote:
> Quoting Andy Lee <yikes2000@gmail.com>:
>
>> Dmitry Nedospasov wrote:
>>>
>>> I was watching some logs on a domU today and I suddenly noticed that
>>> the timestamps were off by something on the order of 47 seconds. I was
>>> surprised because *I don't* run independent wall clocks. I checked
>>> some other domUs and the "drift" was also very close to that of the
>>> first domU.
>>>
>>> I also checked another dom0. Here the domUs were "only" out of sync by
>>> ~11 seconds.
>>>
>>> The dom0s are all debian squeeze with Xen 4.0.1-2. The domUs are also
>>> debian squeeze and utilizing PV with the ParaVirtOPs in the normal
>>> debian linux-image-2.6.32 kernel.
>>>
>>
>> I've been fighting this problem (clock running +47 seconds) for several
>> months.  My OS setup is like yours: dom0 is Debian Squeeze x64 running
>> Xen 4.0.1-2.  The domUs are Debian Squeeze x64 or Lenny x86:

A friend of mine has been suffering from this same issue, and has yet
to find a solution.

He is running a Squeeze dom0 with a mixture of pvgrub domUs running more
Squeeze and CentOS, and 3 Windows HVM domUs. I also believe it was a
Supermicro machine, although I'll get confirmation later.

The time movement is also in the region of 48 seconds, but it causes
catastrophic failure of the Windows HVM domUs. The HVM domUs will BSOD, or
just restart, after the time shift. He has also suffered from a
spontaneously restarting dom0, for which he never found the cause, but he
suspected the time shift was related. The same machine is now running
CentOS & KVM.

The machine was running Debian Lenny with Xen 3.x for some time. It had
the same time issues, but they seemed to produce only warnings and no
other symptoms. Since the move to Debian Squeeze and Xen 4.0.1, he has
been plagued with this issue, and was reluctantly forced to explore and
learn CentOS with KVM after hours of troubleshooting Xen.

The dom0 restarts would happen randomly, at intervals ranging from 1 hour
to 1.5 days.

Don't think he gave up easily, though; he tried every combination of

Debian Xen Kernel / Kernel from Jeremy, 2.6.32-*
Debian Squeeze Xen 4.0.1 / Compiling Xen from source, 4.1.1

I am asking him to write a mail, which he will either post here himself
or I'll pass on. I have pieced this mail together from the
correspondence we have had over the last couple of weeks.
======================
On boot dmesg would show

[    0.064660] PM-Timer failed consistency check  (0x0xffffff) - aborting.
======================
This was logged prior to the HVM deaths and dom0 reboots:

[31853.028654] hrtimer: interrupt took 48149483 ns
======================
Some process crashes on the domU would show this during heavy I/O; I don't
know if it's related...

[266640.072386] INFO: task flush-202:3:8547 blocked for more than 120 seconds.
[266640.072393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[266640.072400] flush-202:3   D 0000000000000002     0  8547      2 0x00000000
[266640.072410]  ffff88001fd81c40 0000000000000246 0000000000000000 0000000000000000
[266640.072424]  0000000000000001 0000000000000001 000000000000f9e0 ffff8800136dbfd8
[266640.072437]  0000000000015780 0000000000015780 ffff88001df58e20 ffff88001df59118
[266640.072451] Call Trace:
[266640.072458]  [<ffffffff8102cdcc>] ? pvclock_clocksource_read+0x3a/0x8b
[266640.072467]  [<ffffffff8110e16e>] ? sync_buffer+0x0/0x40
[266640.072474]  [<ffffffff8110e16e>] ? sync_buffer+0x0/0x40
[266640.072481]  [<ffffffff812fb0d2>] ? io_schedule+0x73/0xb7
[266640.072489]  [<ffffffff8110e1a9>] ? sync_buffer+0x3b/0x40
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Andy Lee

2011-Jul-06 07:17 UTC

[Xen-users] Re: DomU clock out of sync (and Dom0 too)

I may have found a solution: add "clocksource=pit" to the Xen command line,
e.g. my Grub2 stanza for booting the Xen server (dom0) is:

menuentry 'DEFAULT: Debian Squeeze, kernel 2.6.32-5-xen-amd64' {
        insmod part_msdos
        insmod ext2
        set root='(hd0,msdos1)'
        search --no-floppy --fs-uuid --set cdd50d18-e2bd-42b3-8042-c9c4d7aedb99
        echo 'Loading Linux 2.6.32-5-xen-amd64 ...'
        multiboot /xen-4.0-amd64.gz placeholder dom0_mem=512M *clocksource=pit*
        module /vmlinuz-2.6.32-5-xen-amd64 placeholder root=/dev/md2 ro quiet
        echo 'Loading initial ramdisk ...'
        module /initrd.img-2.6.32-5-xen-amd64
}

My explanation:

http://lists.xensource.com/archives/html/xen-devel/2011-02/msg01367.html ...
Back in February 2011, Olivier Hanesse reported a TSC bug that caused a time
jump of 50 minutes into the future.  He was using HPET as clock source
(platform timer) and "xm dmesg" showed this entry:

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2850 (count=3)

http://lists.xensource.com/archives/html/xen-devel/2011-02/msg01406.html ...
Keir Fraser explained that Xen detects the platform timer counter wrapping
and "to account for that based on trusting the CPU's 64-bit TSC."  An
unreliable TSC may erroneously cause clock jumps.  With HPET as the platform
timer, the clock jump was ~50 minutes.  I theorize that 47 seconds is the
clock jump when using ACPI PM as the platform timer.  My "xm dmesg" showed:

(XEN) Platform timer is 3.579MHz ACPI PM Timer
...
(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2870 (count=1)
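
As a sanity check on that theory, the wrap periods of the two counters can be worked out directly. The sketch below rests on assumptions that do not come from this thread: a 24-bit ACPI PM counter at 3.579545 MHz and a 32-bit HPET counter at 14.31818 MHz (a common HPET frequency, though not universal). With those figures the PM timer wraps about every 4.7 seconds and the HPET about every 300 seconds, so the two observed jumps (47 seconds and 50 minutes) are both close to ten wrap periods. Whether that factor of ten reflects how Xen actually miscounts is not established here; it is only a numerical observation.

```python
ACPI_PM_HZ = 3_579_545    # ACPI PM timer frequency
HPET_HZ = 14_318_180      # assumed HPET frequency (hardware dependent)


def wrap_period_seconds(counter_bits: int, frequency_hz: int) -> float:
    """Seconds until a free-running counter of the given width wraps."""
    return (2 ** counter_bits) / frequency_hz


if __name__ == "__main__":
    pm_wrap = wrap_period_seconds(24, ACPI_PM_HZ)    # assumed 24-bit counter
    hpet_wrap = wrap_period_seconds(32, HPET_HZ)     # assumed 32-bit counter
    print(f"ACPI PM: wraps every {pm_wrap:.2f} s, "
          f"ten wraps = {10 * pm_wrap:.1f} s")
    print(f"HPET:    wraps every {hpet_wrap:.1f} s, "
          f"ten wraps = {10 * hpet_wrap / 60:.1f} min")
```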

Keir suggested using PIT platform timer as a workaround.  After booting dom0
with "clocksource=pit", my "xm dmesg" showed:

(XEN) Platform timer is 1.193MHz PIT
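
To check which platform timer a host is using without reading the whole boot log, the relevant line can be pulled out of the "xm dmesg" text. This is a minimal sketch: the function name is made up for illustration, and the pattern is based only on the two log lines quoted in this message, so other Xen versions may word the line differently.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "(XEN) Platform timer is 1.193MHz PIT".
PLATFORM_TIMER_RE = re.compile(
    r"\(XEN\) Platform timer is (?P<freq>[\d.]+)MHz (?P<name>.+)"
)


def platform_timer(dmesg_text: str) -> Optional[Tuple[str, float]]:
    """Return (timer name, frequency in MHz) from 'xm dmesg' output,
    or None if no platform timer line is found."""
    match = PLATFORM_TIMER_RE.search(dmesg_text)
    if match is None:
        return None
    return match.group("name").strip(), float(match.group("freq"))


if __name__ == "__main__":
    # In practice the text would come from running 'xm dmesg' on the dom0.
    sample = "(XEN) Platform timer is 3.579MHz ACPI PM Timer\n(XEN) ..."
    print(platform_timer(sample))
```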

It's been 48 hours without any 47sec time jump.  Too soon to declare
victory, but encouraging nonetheless.  I only have one Xen server to test,
so I ask everyone with this problem to try it.  Thank you.

--
View this message in context:
http://xen.1045712.n5.nabble.com/DomU-clock-out-of-sync-tp4395454p4555976.html
Sent from the Xen - User mailing list archive at Nabble.com.


Andy Lee

2011-Jul-06 07:22 UTC

[Xen-users] Re: DomU clock out of sync (and Dom0 too)

Nabble added '*' around "clocksource=pit".  There is no '*':

menuentry 'DEFAULT: Debian Squeeze, kernel 2.6.32-5-xen-amd64' {
       ...
       multiboot /xen-4.0-amd64.gz ... clocksource=pit
       ...
}



--
View this message in context:
http://xen.1045712.n5.nabble.com/DomU-clock-out-of-sync-tp4395454p4555992.html
Sent from the Xen - User mailing list archive at Nabble.com.

