thr3ads.net - Xen devel - [Xen-devel] Xen 4 TSC problems [Feb 2011]

Olivier Hanesse

2011-Feb-23 10:49 UTC

[Xen-devel] Xen 4 TSC problems

Hello

I have a timekeeping issue with Xen 4.0 (Debian squeeze release).

My problem is described here (hopefully I am not the only one, so there
might be a bug somewhere):
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599161#50
After some time, I get this error: Clocksource tsc unstable (delta =
-2999660334211 ns). It has happened on several servers.

Looking at the output of "xm debug-key s":

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2850 (count=3)

I am using an "Intel(R) Xeon(R) CPU L5420  @ 2.50GHz", which has the
"constant_tsc" flag but not the "nonstop_tsc" one.
On other systems with a newer CPU that has "nonstop_tsc", I don't have
this issue (those systems run the same distribution with the same config).

I tried booting with "max_cstate=0", but nothing changed: my TSC is still
reported as not reliable, and after some time I get the "50min" issue again.

I don't understand how a system can jump "50min" into the future. Why
50min? It is not 40min, not 1 hour; it is always 50min.
I don't know how to make my TSC "reliable" (I have already disabled
everything related to power states in the BIOS settings).

Any ideas?

Regards

Olivier


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Dan Magenheimer

2011-Feb-23 16:16 UTC

[Xen-users] RE: [Xen-devel] Xen 4 TSC problems

It's very unlikely this is a problem with TSC. It is most likely a Xen
(or possibly a PV Linux) problem where a guest (or dom0) either "goes out
to lunch" for a long period, or some other timer gets stuck.  The
"clocksource tsc unstable" message is a side effect of this...
it's very likely the TSC that IS stable and correct and the other
clocksource (pvclock) has lost/gained 50 minutes!

Mark Adams cc'ed and his original xen-devel posting below.  The fact
that two different users (possibly on the same processor/system type?) have
submitted the message with a delta so similar would lead me to believe there
is some timer that is "wrapping".  And since pvclock is usually the
clocksource for dom0, and pvclock is driven by Xen's "system time", a
reasonable guess is that the timer that is wrapping is in Xen itself.

Mark's delta = -2999660303788 ns
Your delta = -2999660334211 ns

Googling, I see the HPET wraparound is ~306 seconds and this delta is about
3000 seconds, so that may be a bad guess.

Keir, any thoughts on this?  Do you recall any post-4.0 patches that may have
fixed this?

Thanks,
Dan

References:
http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00210.html
https://lkml.org/lkml/2010/10/26/126
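
To make the similarity of the two reported deltas concrete, here is a minimal Python sketch that converts both values to seconds and minutes and computes how far apart they are. The two constants are the deltas quoted in this message; the helper name `describe` is only illustrative.

```python
# Deltas from the two "Clocksource tsc unstable" reports quoted in the thread.
MARK_DELTA_NS = -2999660303788
OLIVIER_DELTA_NS = -2999660334211

NS_PER_SECOND = 1_000_000_000


def describe(delta_ns: int) -> tuple[float, float]:
    """Return the magnitude of a delta in seconds and in minutes."""
    seconds = abs(delta_ns) / NS_PER_SECOND
    return seconds, seconds / 60


mark_s, mark_min = describe(MARK_DELTA_NS)
olivier_s, olivier_min = describe(OLIVIER_DELTA_NS)

# Absolute difference between the two independent reports, in nanoseconds.
difference_ns = abs(MARK_DELTA_NS - OLIVIER_DELTA_NS)

print(f"Mark:    {mark_s:.3f} s ({mark_min:.2f} min)")
print(f"Olivier: {olivier_s:.3f} s ({olivier_min:.2f} min)")
print(f"The two reports differ by {difference_ns} ns")
```

Both deltas come out just under 50 minutes and differ from each other by only a few tens of microseconds, which fits the suggestion that a fixed-period counter is involved rather than random drift.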

 


_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Keir Fraser

2011-Feb-23 17:19 UTC

Re: [Xen-devel] Xen 4 TSC problems

On 23/02/2011 16:16, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> Googling, I see the HPET wraparound is ~306 seconds and this delta is about
> 3000 seconds, so that may be a bad guess.
>
> Keir, any thoughts on this?  Do you recall any post-4.0 patches that may
> have fixed this?

I've never seen a 3000s wrap, and I don't know of anything that would have
fixed a bug like this. If this is a Xen time wrap of some kind then it would
affect all running guests; it's not clear here whether only one, or all,
guests see the wrap.

 K.

Olivier Hanesse

2011-Feb-23 19:04 UTC

Re: [Xen-devel] Xen 4 TSC problems

I am sorry for the lack of information.
All domUs on the dom0 are affected by this bug at exactly the same time.

I have had this bug on a dozen servers (all running on the same hardware)
since October (when I switched from Xen 3.2 to 4.0).

Regards

Olivier


Keir Fraser

2011-Feb-24 07:16 UTC

Re: [Xen-devel] Xen 4 TSC problems

Please send Xen boot output (xm dmesg). Getting it from Xen 3.2 as well
would be interesting, if you still have it installed on any of these
machines.

 -- Keir


Olivier Hanesse

2011-Feb-24 09:59 UTC

Re: [Xen-devel] Xen 4 TSC problems

xm dmesg :

(XEN) Xen version 4.0.1 (Debian 4.0.1-2) (waldi@debian.org) (gcc version 4.4.5 (Debian 4.4.5-10) ) Wed Jan 12 14:04:06 UTC 2011
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 2 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009ac00 (usable)
(XEN)  000000000009ac00 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000bffc7980 (usable)
(XEN)  00000000bffc7980 - 00000000bffcee80 (ACPI data)
(XEN)  00000000bffcee80 - 00000000c0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fec00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 00000002c0000000 (usable)
(XEN) ACPI: RSDP 000FDFD0, 0024 (r2 IBM   )
(XEN) ACPI: XSDT BFFCED40, 0054 (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: FACP BFFCEC80, 0084 (r2 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: DSDT BFFC7980, 2EDA (r2 IBM    SERDEFNT     1000 INTL 20041203)
(XEN) ACPI: FACS BFFCAB00, 0040
(XEN) ACPI: APIC BFFCEB80, 00BC (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: SRAT BFFCEA00, 0128 (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: HPET BFFCE9C0, 0038 (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: MCFG BFFCE980, 003C (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) ACPI: ERST BFFCAB40, 0230 (r1 IBM    SERDEFNT     1000 IBM  45444F43)
(XEN) System RAM: 10239MB (10485124kB)
(XEN) SRAT: PXM 0 -> APIC 0 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 1 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 2 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 3 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 4 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 5 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 6 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 7 -> Node 0
(XEN) SRAT: Node 0 PXM 0 0-c0000000
(XEN) SRAT: Node 0 PXM 0 100000000-2c0000000
(XEN) SRAT: hot plug zone found 2c0000000 - 1000000000
(XEN) SRAT: Node 0 PXM 0 2c0000000-1000000000
(XEN) NUMA: Allocated memnodemap from 2bfdfe000 - 2bfdff000
(XEN) NUMA: Using 18 for the hash shift.
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 0009ad40
(XEN) DMI 2.4 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x588
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) ACPI:                  wakeup_vec[bffcab0c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
(XEN) Processor #3 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
(XEN) Processor #4 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
(XEN) Processor #5 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
(XEN) Processor #6 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
(XEN) Processor #7 7:7 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x07] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 14, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed00000
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 20
(XEN) PCI: MCFG area at e0000000 reserved in E820
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2493.798 MHz processor.
(XEN) Initing memory sharing.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) Total of 8 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 8 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 64 KiB.
(XEN) microcode.c:73:d32767 microcode: CPU2 resumed
(XEN) microcode.c:73:d32767 microcode: CPU1 resumed
(XEN) microcode.c:73:d32767 microcode: CPU3 resumed
(XEN) Brought up 8 CPUs
(XEN) microcode.c:73:d32767 microcode: CPU4 resumed
(XEN) microcode.c:73:d32767 microcode: CPU5 resumed
(XEN) microcode.c:73:d32767 microcode: CPU6 resumed
(XEN) microcode.c:73:d32767 microcode: CPU7 resumed
(XEN) HPET: 3 timers in total, 0 timers will be used for broadcast
(XEN) ACPI sleep modes: S3
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x16b2000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   00000002b4000000->00000002b8000000 (114688 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff816b2000
(XEN)  Init. ramdisk: ffffffff816b2000->ffffffff82e05400
(XEN)  Phys-Mach map: ffffffff82e06000->ffffffff82f06000
(XEN)  Start info:    ffffffff82f06000->ffffffff82f064b4
(XEN)  Page tables:   ffffffff82f07000->ffffffff82f22000
(XEN)  Boot stack:    ffffffff82f22000->ffffffff82f23000
(XEN)  TOTAL:         ffffffff80000000->ffffffff83000000
(XEN)  ENTRY ADDRESS: ffffffff81502200
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM: ................................................................................................done.
(XEN) trace.c:89:d32767 calc_tinfo_first_offset: NR_CPUs 128, offset_in_bytes 258, t_info_first_offset 65
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 176kB init memory.
(XEN) PCI add device 00:00.0
(XEN) PCI add device 00:02.0
(XEN) PCI add device 00:03.0
(XEN) PCI add device 00:04.0
(XEN) PCI add device 00:05.0
(XEN) PCI add device 00:06.0
(XEN) PCI add device 00:07.0
(XEN) PCI add device 00:08.0
(XEN) PCI add device 00:10.0
(XEN) PCI add device 00:10.1
(XEN) PCI add device 00:10.2
(XEN) PCI add device 00:11.0
(XEN) PCI add device 00:13.0
(XEN) PCI add device 00:15.0
(XEN) PCI add device 00:16.0
(XEN) PCI add device 00:1c.0
(XEN) PCI add device 00:1d.0
(XEN) PCI add device 00:1d.1
(XEN) PCI add device 00:1d.2
(XEN) PCI add device 00:1d.7
(XEN) PCI add device 00:1e.0
(XEN) PCI add device 00:1f.0
(XEN) PCI add device 00:1f.1
(XEN) PCI add device 00:1f.3
(XEN) PCI add device 10:00.0
(XEN) PCI add device 10:00.3
(XEN) PCI add device 11:00.0
(XEN) PCI add device 11:01.0
(XEN) PCI add device 07:00.0
(XEN) PCI add device 07:00.1
(XEN) PCI add device 03:00.0
(XEN) PCI add device 04:00.0
(XEN) PCI add device 02:00.0
(XEN) PCI add device 05:00.0
(XEN) PCI add device 06:00.0
(XEN) PCI add device 01:01.0

When the issue happens:

(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
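
The boot log above reports "Platform timer is 14.318MHz HPET", and this message says the platform timer wrapped 10 or more times. As a rough check only, the sketch below assumes a 32-bit counter running at the nominal 14.31818 MHz (neither figure is stated in the thread) and compares ten wrap periods with the delta seen in dom0's kern.log.

```python
# Assumptions (not confirmed anywhere in the thread):
HPET_HZ = 14_318_180   # nominal frequency behind "14.318MHz HPET"
COUNTER_BITS = 32      # assumed width of the counter that wraps

# Values taken from the thread:
WRAPS = 10                            # "wrapped 10 or more times"
OBSERVED_DELTA_S = 2999.660335950     # |delta| from dom0 kern.log, in seconds

# Time for the counter to go through all 2**32 values once.
wrap_period_s = 2**COUNTER_BITS / HPET_HZ

# Time that would go unaccounted for if ten whole wraps were missed.
ten_wraps_s = WRAPS * wrap_period_s

print(f"one wrap:  {wrap_period_s:.6f} s")
print(f"ten wraps: {ten_wraps_s:.6f} s")
print(f"observed:  {OBSERVED_DELTA_S:.6f} s")
print(f"mismatch:  {abs(ten_wraps_s - OBSERVED_DELTA_S) * 1000:.3f} ms")
```

Under these assumptions ten wraps come to within a millisecond of the reported delta, which would explain why the jump is always about 50 minutes. This is arithmetic on the logged numbers, not a confirmed diagnosis.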

Output of xm debug-key s :

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2684 (count=4)
(XEN) dom1: mode=0,ofs=0xa8dcbfb9a,khz=2493798,inc=1,vtsc count: 1756100739 kernel, 20526533 user
(XEN) dom2: mode=0,ofs=0xc257d49df,khz=2493798,inc=1,vtsc count: 900668266 kernel, 30618121 user
(XEN) dom3: mode=0,ofs=0xdb1299744,khz=2493798,inc=1,vtsc count: 16656509047 kernel, 709406217 user
(XEN) dom4: mode=0,ofs=0xf8627e616,khz=2493798,inc=1,vtsc count: 1174828915 kernel, 194957775 user
(XEN) dom5: mode=0,ofs=0x115a0f2a67,khz=2493798,inc=1,vtsc count: 332007967 kernel, 5766769 user
(XEN) dom6: mode=0,ofs=0x13bf462f38,khz=2493798,inc=1,vtsc count: 3137076938 kernel, 1076320679 user
(XEN) dom10: mode=0,ofs=0x1b99e41f4b,khz=2493798,inc=1,vtsc count: 411433049 kernel, 19532319 user
(XEN) dom11: mode=0,ofs=0x1e4991cf40,khz=2493798,inc=1,vtsc count: 415406148 kernel, 19223482 user
(XEN) dom12: mode=0,ofs=0x1fe8c10600,khz=2493798,inc=1,vtsc count: 1012850399 kernel, 63603352 user
(XEN) dom13: mode=0,ofs=0x21ef9b9531,khz=2493798,inc=1,vtsc count: 813097186 kernel, 27536004 user
(XEN) dom14: mode=0,ofs=0x23f5b4e429,khz=2493798,inc=1,vtsc count: 2461059718 kernel, 48182776 user
(XEN) dom18: mode=0,ofs=0x2bdc302048,khz=2493798,inc=1,vtsc count: 624333824 kernel, 5166805 user
(XEN) dom19: mode=0,ofs=0x2e67227085,khz=2493798,inc=1,vtsc count: 1037952789 kernel, 5778635 user
(XEN) dom20: mode=0,ofs=0x562ce020eea4,khz=2493798,inc=1,vtsc count: 643491360 kernel, 31771029 user
(XEN) dom21: mode=0,ofs=0x563a017eea82,khz=2493798,inc=1,vtsc count: 715148727 kernel, 24430809 user
(XEN) dom25: mode=0,ofs=0x1d0c5230cdfad,khz=2493798,inc=1,vtsc count: 2103227324 kernel, 656635140 user
(XEN) dom27: mode=0,ofs=0x1d868b8c1fbbf,khz=2493798,inc=1,vtsc count: 476542178 kernel, 12976786 user
(XEN) dom31: mode=0,ofs=0x1dc08da161ebc,khz=2493798,inc=1,vtsc count: 2747233178 kernel, 466863700 user
(XEN) dom32: mode=0,ofs=0x1ecde6eb53d2c,khz=2493798,inc=1,vtsc count: 305360096 kernel, 11705823 user
(XEN) dom33: mode=0,ofs=0x1ece1bf734f61,khz=2493798,inc=1,vtsc count: 516548852 kernel, 18662125 user

Output of xm debug-key t :

(XEN) Synced stime skew: max=1405ns avg=1405ns samples=1 current=1405ns
(XEN) Synced cycles skew: max=2377 avg=2377 samples=1 current=2377

Output of /proc/cpuinfo :

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
stepping        : 6
cpu MHz         : 2493.798
cache size      : 6144 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc up rep_good aperfmperf pni est ssse3 cx16 sse4_1 hypervisor lahf_lm
bogomips        : 4987.59
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:
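
The flags line above is what shows "constant_tsc" present and "nonstop_tsc" absent on this CPU. As an illustration, a small Python sketch can pull those two flags out of /proc/cpuinfo-style text. The function name `tsc_flags` is hypothetical, and the sample is trimmed from the output above.

```python
def tsc_flags(cpuinfo_text: str) -> dict[str, bool]:
    """Report which TSC-related flags appear in the first "flags" line."""
    wanted = ("constant_tsc", "nonstop_tsc")
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return {name: name in present for name in wanted}
    # No "flags" line found at all: report both flags as absent.
    return {name: False for name in wanted}


# Trimmed copy of the /proc/cpuinfo output posted in the thread.
SAMPLE = """\
processor       : 0
model name      : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc up rep_good aperfmperf pni est ssse3 cx16 sse4_1 hypervisor lahf_lm
"""

print(tsc_flags(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/cpuinfo instead of the sample string.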

Output of xm info :

release                : 2.6.32-bpo.5-xen-amd64
version                : #1 SMP Mon Jan 17 22:05:11 UTC 2011
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2493
hw_caps                : bfebfbff:20000800:00000000:00000940:000ce3bd:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 10239
free_memory            : 910
node_to_cpu            : node0:0-7
node_to_memory         : node0:910
node_to_dma32_mem      : node0:910
max_node_id            : 0
xen_major              : 4
xen_minor              : 0
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
cc_compiler            : gcc version 4.4.5 (Debian 4.4.5-10)
cc_compile_by          : waldi
cc_compile_domain      : debian.org
cc_compile_date        : Wed Jan 12 14:04:06 UTC 2011
xend_config_format     : 4

in dom0 /var/log/kern.log :

Feb 23 22:40:54 dom0 kernel: [995452.618519] Clocksource tsc unstable (delta = -2999660335950 ns)

In the domU, I don't see any log messages; the time just "jumps" 50min
into the future (see /var/log/daemon.log):

Feb 23 21:50:51 domU snmpd[1037]: Connection from UDP: [10.16.2.101]:58303
Feb 23 22:40:55 domU snmpd[1037]: Connection from UDP: [10.16.2.101]:45713
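
The size of the jump can be read off these two consecutive log lines. The sketch below parses the two syslog timestamps and subtracts them. The year 2011 is taken from the thread date because syslog lines carry none, and the month abbreviation is assumed to parse under an English/C locale.

```python
from datetime import datetime

# Syslog timestamps have no year, so one is supplied explicitly.
ASSUMED_YEAR = 2011
FORMAT = "%Y %b %d %H:%M:%S"


def jump_seconds(before: str, after: str) -> float:
    """Seconds between two syslog-style timestamps from the same year."""
    t0 = datetime.strptime(f"{ASSUMED_YEAR} {before}", FORMAT)
    t1 = datetime.strptime(f"{ASSUMED_YEAR} {after}", FORMAT)
    return (t1 - t0).total_seconds()


# Timestamps of the two snmpd lines from the domU's daemon.log.
gap = jump_seconds("Feb 23 21:50:51", "Feb 23 22:40:55")

# Magnitude of the delta logged by dom0 at about the same moment.
reported_delta_s = 2999660335950 / 1_000_000_000

print(f"gap between log lines: {gap:.0f} s ({gap / 60:.2f} min)")
print(f"reported delta:        {reported_delta_s:.3f} s")
print(f"remainder:             {gap - reported_delta_s:.3f} s")
```

The remainder of a few seconds is plausibly just the real time that passed between the two SNMP requests, though the logs alone cannot confirm that.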

The clocksource is set to "xen" in both dom0 and domU:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
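
The same check can be scripted. A minimal sketch, assuming a Linux guest that exposes the sysfs directory used in the command above: it reads the current clocksource and also the neighbouring available_clocksource file. The function name `read_clocksource` is illustrative.

```python
from pathlib import Path

# Directory taken from the cat command above.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = CLOCKSOURCE_DIR) -> dict[str, str]:
    """Read the current and available clocksources, tolerating missing files."""
    result = {}
    for name in ("current_clocksource", "available_clocksource"):
        try:
            result[name] = (base / name).read_text().strip()
        except OSError:
            # The file is absent or unreadable, e.g. on a non-Linux system.
            result[name] = "(not readable on this system)"
    return result


if __name__ == "__main__":
    for key, value in read_clocksource().items():
        print(f"{key}: {value}")
```

Running it in dom0 and in each domU gives a quick way to confirm that every domain reports the same clocksource.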

Regards

Olivier

2011/2/24 Keir Fraser <keir.xen@gmail.com>
> Please send Xen boot output (xm dmesg). Getting it from Xen 3.2 as well
> would be interesting, if you still have it installed on any of these
> machines.
>
>  -- Keir
>
> On 23/02/2011 19:04, "Olivier Hanesse"
<olivier.hanesse@gmail.com> wrote:
>
> > I am sorry for the lack of information.
> > Every domUs on the dom0 are affected by this bug at the exact same
time.
> >
> > And I had this bug on a dozen servers (all running on the same hw)
since
> > October (when I switched from Xen 3.2 to 4.0).
> >
> > Regards
> >
> > Olivier
> >
> > Le 23/02/2011 18:19, Keir Fraser a écrit :
> >> On 23/02/2011 16:16, "Dan
Magenheimer"<dan.magenheimer@oracle.com>
>  wrote:
> >>
> >>> It¹s very unlikely this is a problem with TSC. It is most
likely a Xen
> (or
> >>> possibly a PV Linux) problem where a guest (or dom0) either
³goes out
> to
> >>> lunch² for a long period, or some other timer gets stuck.  The
> ³clocksource
> >>> tsc unstable² message is a side effect of this... it¹s very
likely the
> TSC
> >>> that IS stable and correct and the other clocksource (pvclock)
has
> >>> lost/gained
> >>> 50 minutes!
> >>>
> >>> Mark Adams cc¹ed and his original xen-devel posting below. 
The fact
> that
> >>> two
> >>> different users (possibly on the same processor/system type?)
have
> submitted
> >>> the message with a delta so similar would lead me to believe
there is
> some
> >>> timer that is ³wrapping².  And since pvclock is usually the
clocksource
> for
> >>> dom0, and pvclock is driven!  by Xen¹s ³system time², a
reasonable
> guess is
> >>> that the timer that is wrapping is in Xen itself.
> >>>
> >>> Mark¹s delta = -2999660303788 ns
> >>> Your delta = -2999660334211 ns
> >>>
> >>> Googling, I see the HPET wraparound is ~306 seconds and this
delta is
> about
> >>> 3000 seconds, so that may be a bad guess.
> >>>
> >>> Keir, any thoughts on this?  Do you recall any post-4.0
patches that
> may
> >>> have
> >>> fixed this?
> >> I''ve never seen a 3000s wrap, and I don''t know
of anything that would
> have
> >> fixed a bug like this. If this is a Xen time wrap of some kind
then it
> would
> >> affect all running guests; it''s not clear here whether
only one, or all,
> >> guests see the wrap.
> >>
> >>   K.
> >>
> >>> Thanks,
> >>> Dan
> >>>
> >>> References:
> >>>
> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00210.html
> >>> https://lkml.org/lkml/2010/10/26/126
> >>>


Jan Beulich

2011-Feb-24 10:59 UTC

Re: [Xen-devel] Xen 4 TSC problems

>>> On 24.02.11 at 10:59, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> 
> Output of xm debug-key s :
> 
> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
> warp=2684 (count=4)

Did you try turning off use of C states ("cpuidle=0" on the Xen
command line)?

Jan



Keir Fraser

2011-Feb-24 11:30 UTC

[Xen-users] Re: [Xen-devel] Xen 4 TSC problems

On 24/02/2011 10:59, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 24.02.11 at 10:59, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
>> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
>> 
>> Output of xm debug-key s :
>> 
>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
>> warp=2684 (count=4)
> 
> Did you try turning off use of C states ("cpuidle=0" on the Xen
> command line)?

Another thing to try is changing the platform timer that Xen uses. It's
using HPET on your machines, so try clocksource=pit on Xen command line, and
confirm that the 'Platform timer is xxx' message changes in xm dmesg.

However, this bug looks more like a CPU's TSC jumping forward (or maybe
backward) for some inexplicable reason. We added code post 3.2 to detect the
platform timer counter wrapping, and to account for that based on trusting
the CPU's 64-bit TSC. But if the TSC value is bogus then we can detect a
wrap when it didn't happen and the new code will do more harm than good. It
is not currently possible to disable the code via a boot parameter -- maybe
we could add that. However, if the problem is a jumpy TSC then it is better
to fix that as Xen relies so heavily on TSC for time handling.

 -- Keir
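
To make the failure mode described in the message above concrete, here is a minimal Python sketch. It is not Xen's code: the function name, the 14.31818 MHz HPET rate, the 2.5 GHz TSC rate and the rounding rule are all assumptions chosen for illustration. It guesses how many times a 32-bit platform counter wrapped by trusting TSC-derived elapsed time, and shows how a bogus TSC jump would be booked as wraps that never happened.

```python
# Illustrative only: NOT the Xen implementation. All constants are assumptions.
HPET_HZ = 14_318_180        # assumed HPET frequency (14.31818 MHz)
TSC_HZ = 2_500_000_000      # assumed TSC rate for a 2.50 GHz CPU
WRAP_TICKS = 1 << 32        # a 32-bit platform counter wraps after 2**32 ticks


def estimate_wraps(prev_count, curr_count, tsc_delta_cycles):
    """Guess how many counter wraps happened between two reads, trusting the TSC."""
    tsc_elapsed_s = tsc_delta_cycles / TSC_HZ
    # Ticks directly visible in the counter (only known modulo one wrap).
    visible_ticks = (curr_count - prev_count) % WRAP_TICKS
    # Ticks the counter should have advanced if the TSC is telling the truth.
    expected_ticks = tsc_elapsed_s * HPET_HZ
    return max(0, round((expected_ticks - visible_ticks) / WRAP_TICKS))


# Healthy case: one second really elapsed and the TSC agrees.
healthy = estimate_wraps(1000, 1000 + HPET_HZ, TSC_HZ)

# Bogus case: same counter reads, but the TSC claims about 3000 s went by.
bogus = estimate_wraps(1000, 1000 + HPET_HZ, 3000 * TSC_HZ)

print("wraps (healthy TSC):", healthy)
print("wraps (jumped TSC): ", bogus)
```

In this toy model the counter is innocent in both cases; only the TSC input differs, which is why a jumpy TSC would make wrap-compensation logic do more harm than good.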

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Olivier Hanesse

2011-Feb-24 11:57 UTC

[Xen-users] Re: [Xen-devel] Xen 4 TSC problems

Jan :

I tried to turn off cstates with max_cstate=0 without success (still "not
reliable").

With cpuidle=0, I also got :

(XEN) TSC has constant rate, deep Cstates possible, so not reliable,
warp=3022 (count=1)

xm info | grep command
xen_commandline        : dom0_mem=512M cpuidle=0 loglvl=all guest_loglvl=all
dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1

Keir :

Using clocksource=pit :

(XEN) Platform timer is 1.193MHz PIT

I also got :

(XEN) TSC has constant rate, deep Cstates possible, so not reliable,
warp=3262 (count=2)


Jan Beulich

2011-Feb-24 12:37 UTC

Re: [Xen-devel] Xen 4 TSC problems

>>> On 24.02.11 at 12:57, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> I tried to turn off cstates with max_cstate=0 without success (still "not
> reliable").
> 
> With cpuidle=0, I also got :
> 
> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
> warp=3022 (count=1)

This message by itself isn't telling much, I believe.

> xm info | grep command
> xen_commandline        : dom0_mem=512M cpuidle=0 loglvl=all guest_loglvl=all
> dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
> 
> Keir :
> 
> Using clocksource=pit :
> 
> (XEN) Platform timer is 1.193MHz PIT
> 
> I also got :
> 
> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
> warp=3262 (count=2)

The question is whether any of this eliminates the time jumps seen
by your DomU-s (from your past mails I wasn't actually sure whether
Dom0 also experienced this problem, albeit it would be odd if it
didn't).

Jan



Olivier Hanesse

2011-Feb-24 14:20 UTC

Re: [Xen-devel] Xen 4 TSC problems

Both dom0 and domUs are affected by this "jump".

I expect to see something like "TSC marked as reliable, warp = 0".
I got this on newer hardware with same config/distros.

Is there a way to measure whether it is a TSC warp, to point to a CPU TSC
issue?



Keir Fraser

2011-Feb-24 14:52 UTC

[Xen-users] Re: [Xen-devel] Xen 4 TSC problems

On 24/02/2011 14:20, "Olivier Hanesse" <olivier.hanesse@gmail.com> wrote:

> Both dom0 and domUs are affected by this "jump".
> 
> I expect to see something like "TSC marked as reliable, warp = 0".
> I got this on newer hardware with same config/distros.

It depends on the CPU itself; older CPUs do not have the super-stable TSC
features. But that should never cause a massive 3000s time jump.

> Is there a way to measure if it is a TSC warp ? to point out a cpu tsc
> issue ?

The TSC warps or out-of-sync issues that we could reasonably expect would be
on the order of microseconds. A 3000s warp is something else entirely. Xen
is very confused and/or some TSC or platform timer has jumped a long way
(indicating a hardware/firmware issue).

 -- Keir

Dan Magenheimer

2011-Feb-24 17:43 UTC

[Xen-users] RE: [Xen-devel] Xen 4 TSC problems

Just a wild guess, but this in Olivier's posted output:

(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.

and the fact that a 32-bit HPET wrap is ~300 seconds and, with the
"10 or more times", 10 * 300 seconds is 3000 seconds, might be a clue
(or a complete red herring, but I thought it worth mentioning).

Mark and Olivier, it would be interesting to know if you are
using the same processor/system.
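
The "10 * 300 seconds" arithmetic above can be checked in a few lines of Python. The sketch assumes the HPET runs at the common 14.31818 MHz rate, which is not stated anywhere in this thread; the two deltas are the ones reported by Mark and Olivier. A close match is suggestive rather than proof.

```python
# Assumption: HPET at the common 14.31818 MHz rate (not confirmed in this thread).
HPET_HZ = 14_318_180
WRAP_TICKS = 2 ** 32  # a 32-bit counter wraps after 2**32 ticks

wrap_seconds = WRAP_TICKS / HPET_HZ
ten_wraps_seconds = 10 * wrap_seconds

# Deltas from the "Clocksource tsc unstable" messages quoted in this thread.
reported_deltas_ns = {
    "Mark": -2999660303788,
    "Olivier": -2999660334211,
}

print(f"one wrap  = {wrap_seconds:.3f} s")
print(f"ten wraps = {ten_wraps_seconds:.3f} s (~{ten_wraps_seconds / 60:.1f} min)")

for name, delta_ns in reported_deltas_ns.items():
    delta_s = abs(delta_ns) / 1e9
    gap_ms = abs(delta_s - ten_wraps_seconds) * 1e3
    print(f"{name}: |delta| = {delta_s:.6f} s, off from ten wraps by {gap_ms:.3f} ms")
```

Under that assumed frequency, ten wraps of a 32-bit counter come to roughly 2999.66 seconds, which is within a fraction of a millisecond of both reported deltas and would explain why the jump is always about 50 minutes.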

Olivier Hanesse

2011-Feb-24 17:58 UTC

Re: [Xen-devel] Xen 4 TSC problems

Mark is running an E5620 Xeon processor.
I have an L5420.

What is very strange is that this jump is always 50min, not more, not less.
And Mark and I are not the only ones with this issue.
So there might be an explanation somewhere (bad counter, overflow, bug or
something).
So maybe this 300 seconds * 10 is a lead.

Another point: what does the number "warp" really mean in the output of
"xm debug-key s"?
Should I monitor this number? Maybe I could predict a jump by watching
this value?
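
For anyone who wants to log the value over time, here is a small Python sketch that extracts warp and count from the console line quoted in this thread. The regular expression is an assumption based only on that quoted wording, and whether the number predicts a jump is exactly the open question here.

```python
import re

# Pattern based on the line quoted in this thread; other Xen versions may word it differently.
WARP_RE = re.compile(r"warp=(\d+)\s*\(count=(\d+)\)")


def parse_tsc_warp(console_text):
    """Return (warp, count) from the last matching line, or None if none is found."""
    # The message is often wrapped across two lines, so search the whole text.
    matches = WARP_RE.findall(console_text)
    if not matches:
        return None
    warp, count = matches[-1]
    return int(warp), int(count)


sample = (
    "(XEN) TSC has constant rate, deep Cstates possible, so not reliable,\n"
    "warp=3262 (count=2)\n"
)
print(parse_tsc_warp(sample))
```

In practice the text would come from the Xen console log after running "xm debug-key s"; the sample string stands in for that output.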


Jeremy Fitzhardinge

2011-Feb-24 19:01 UTC

[Xen-users] Re: [Xen-devel] Xen 4 TSC problems

On 02/24/2011 09:43 AM, Dan Magenheimer wrote:
> Just a wild guess, but this in Olivier's posted output:
>
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
>
> and the fact that a 32-bit HPET wrap is ~300 seconds and, with the
> "10 or more times", 10 * 300 seconds is 3000 seconds, might be a clue
> (or a complete red herring, but I thought it worth mentioning).
>
> Mark and Olivier, it would be interesting to know if you are
> using the same processor/system.

It definitely seems like some kind of problem on the host system rather
than anything in the guests themselves.  If the platform timer is
misbehaving, then Xen could be completely screwing up the pvclock
calibration which it then passes to guests.

Could it be one of those "platform clock stops in certain power states"
problems?

    J

Olivier Hanesse

2011-Feb-28 14:37 UTC

[Xen-users] Re: [Xen-devel] Xen 4 TSC problems

Hello,

It happened again twice this weekend.

What about setting "tsc_mode=2" for my VMs? Should this mode prevent this
bug (coming from a badly emulated TSC due to a firmware issue? Is that
possible?) from affecting time in domUs?

Setting clocksource=pit makes 'tsc' available in
"/sys/devices/system/clocksource/clocksource0/available_clocksource"
(otherwise only xen is available; is that normal?).

Should I bypass the xen clocksource and use tsc as the clocksource for
dom0/domU, or would that make things worse?
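
As a read-only way to see what a given dom0 or domU offers, the following Python sketch reads the sysfs files mentioned above. The directory is the standard Linux location; the function name is made up for this example, and nothing here switches the clocksource.

```python
from pathlib import Path

# Standard Linux sysfs directory, as quoted in the message above.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs not mounted: report nothing rather than fail.
        return None, []
    return current, available


current, available = read_clocksources()
print("current:  ", current)
print("available:", available)
```

Running it before and after a change to the Xen command line shows whether 'tsc' appears alongside 'xen' in the available list.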

Regards

Olivier


Keir Fraser

2011-Feb-28 15:00 UTC

Re: [Xen-devel] Xen 4 TSC problems

The message about detecting wrapped platform timer on Xen console indicates
a host problem rather than a guest configuration problem. Did you try
running long term with changed platform timer source on Xen command line
(clocksource=pit), and also cpuidle=0?

 K.


Dan Magenheimer

2011-Feb-28 15:14 UTC

[Xen-users] RE: [Xen-devel] Xen 4 TSC problems

Hi Olivier -

It is the Xen clocksource that you want to try to change, not the dom0
clocksource.  To do this, you need to specify "clocksource=pit" on the
Xen boot line (and reboot), not the dom0 boot line.

I believe Mark Adams played with tsc_mode to see if it solved his (similar?
identical?) problem last year, and it didn't make any difference.

Please try booting Xen with "clocksource=pit" and ensure that
"Platform timer is 1.193MHz PIT" appears in the Xen boot messages.  If
the 50min jump does not appear again, it would point to a problem in the HPET,
either hardware or software.

Thanks,
Dan
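
The check described above (option on the Xen boot line, matching "Platform timer" message) can be scripted. This is a minimal Python sketch with hypothetical helper names; the sample command line is the one posted earlier in this thread with clocksource=pit added for illustration, and the real text would come from "xm info" and "xm dmesg".

```python
import re

PLATFORM_TIMER_RE = re.compile(r"Platform timer is\s+(\S+)\s+(\S+)")


def platform_timer(xen_dmesg_text):
    """Return (frequency, name) from the 'Platform timer is ...' line, or None."""
    match = PLATFORM_TIMER_RE.search(xen_dmesg_text)
    return match.groups() if match else None


def xen_option(command_line, key):
    """Return the value of key=value on the Xen command line, or None if unset."""
    for token in command_line.split():
        name, _, value = token.partition("=")
        if name == key:
            return value
    return None


# Sample inputs; the command line is hypothetical (clocksource=pit added by hand).
dmesg_sample = "(XEN) Platform timer is 1.193MHz PIT\n"
cmdline_sample = "dom0_mem=512M cpuidle=0 clocksource=pit loglvl=all"

print(platform_timer(dmesg_sample))
print(xen_option(cmdline_sample, "clocksource"))
```

If the option is set on the command line but the reported platform timer is still HPET, the parameter most likely ended up on the dom0 kernel line rather than the Xen line.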

 


Olivier Hanesse

2011-Feb-28 15:23 UTC

[Xen-users] Re: [Xen-devel] Xen 4 TSC problems

Keir :

Yes, it is in progress.
To make this change, I have to reboot every server, so it is taking time
(production servers :()
So I was hoping to find a quick method to mitigate this issue on domUs while
the servers are being rebooted.

As this bug has happened only once or twice per server since October, I can't
say right now that changing the platform timer to PIT fixed it. I have to wait
(I hope forever!) for this bug to happen again on a 'patched' server ...

But even with clocksource=pit, I am seeing some warp=3000+ in the debug
messages. I guess that is not a good sign, is it?

Jan : I was hoping to find a way to make the domU clocksource more
"independent", like with Xen 3.2.



Dan Magenheimer

2011-Feb-28 15:30 UTC

[Xen-users] RE: [Xen-devel] Xen 4 TSC problems

Hi Olivier -

By "warp=3000+ in debug message" do you mean the Xen boot message
"TSC has constant rate..., warp = NNNN"?

If so, this is a very different "warp", measured in cycles, not in
seconds, so 3000 is more like a microsecond, not an hour, and this is normal
(not a bad sign).

Dan
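
The cycles-versus-seconds point can be checked with a quick calculation. The sketch below is illustrative only: it assumes the TSC ticks at the nominal 2.50 GHz of the Xeon L5420 mentioned at the start of the thread, and `cycles_to_microseconds` is a hypothetical helper name.

```python
# Convert a TSC "warp" value, reported in CPU cycles, into time.
# Assumption: the TSC runs at the nominal 2.50 GHz of the Xeon L5420.
TSC_HZ = 2.5e9


def cycles_to_microseconds(cycles, tsc_hz=TSC_HZ):
    """Return the duration of `cycles` TSC ticks in microseconds."""
    return cycles / tsc_hz * 1e6


# Warp values quoted in the Xen messages in this thread.
for warp in (2850, 3022, 3262):
    print(f"warp={warp} cycles is about {cycles_to_microseconds(warp):.2f} us")
```

Under that assumption each reported warp comes out at a little over one microsecond, roughly nine orders of magnitude smaller than the 3000-second jump.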


Keir Fraser

2011-Feb-28 15:39 UTC

[Xen-users] Re: [Xen-devel] Xen 4 TSC problems

On 28/02/2011 15:23, "Olivier Hanesse" <olivier.hanesse@gmail.com> wrote:
> Keir :
>
> Yes, it is in progress.
> To make this change, I have to reboot every server, so it is taking time
> (production servers :()
> So I was hoping to find a quick method to mitigate this issue on domUs while
> the servers are being rebooted.
>
> As this bug has happened only once or twice per server since October, I can't
> say right now that changing the platform timer to PIT fixed it. I have to wait
> (I hope forever!) for this bug to happen again on a 'patched' server ...
>
> But even with clocksource=pit, I am seeing some warp=3000+ in the debug
> messages. I guess that is not a good sign, is it?

Better not to have it, but honestly you're very unlikely to see any problem
from it. It's totally unrelated to the 3000-second time jumps.

 -- Keir

Olivier Hanesse

2011-Feb-28 15:54 UTC

Re: [Xen-devel] Xen 4 TSC problems

Yes this is what I mean.
I am glad to hear that it isn't a bad sign :)
I thought it was a bad sign because, on systems with a "reliable TSC", this
counter is always 0.


andre.arnold

2011-Apr-15 07:51 UTC

[Xen-devel] Re: Xen 4 TSC problems

Hi,

we experienced the same problem with xen 4 under debian squeeze on
our DELL PowerEdge R815 Servers.

Does the "clocksource=pit" setting solve the problem?

Cheers
Andre

--
View this message in context:
http://xen.1045712.n5.nabble.com/Xen-4-TSC-problems-tp3396848p4304962.html
Sent from the Xen - Dev mailing list archive at Nabble.com.


Olivier Hanesse

2011-Apr-15 16:31 UTC

Re: [Xen-devel] Re: Xen 4 TSC problems

So far, yes.
Let's wait another month before saying that setting clocksource=pit was the
solution :)



Philippe Simonet

2011-Sep-13 07:16 UTC

Re: [Xen-devel] Xen 4 TSC problems

Hi Xen developers

I would just like to inform you that I have exactly the same problem
with Debian squeeze and Xen, with a 50-minute time jump on my dom0 and
domUs. NTP is running on all dom0/domU, and the clocksource is 'xen'
everywhere.

Some messages:

syslog :
Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)

xm dmesg :
...
(XEN) Platform timer is 14.318MHz HPET
...
(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
(XEN) TSC marked as reliable, warp = 0 (count=2)
...
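
As a rough cross-check of the figures above against the "32-bit HPET wrap is ~300 seconds" guess made earlier in the thread, the wrap period can be computed directly. This is only a sketch under stated assumptions: the exact frequency of 14,318,180 Hz (the usual nominal value behind a "14.318MHz" printout) and the 32-bit counter width are assumed, and the helper names are hypothetical.

```python
# Assumption: nominal HPET frequency of 14.31818 MHz; the Xen log only
# prints the rounded value "14.318MHz".
HPET_HZ = 14_318_180
COUNTER_BITS = 32  # assumed 32-bit main counter, as discussed in the thread


def wrap_period_seconds(freq_hz=HPET_HZ, bits=COUNTER_BITS):
    """Seconds until a free-running counter of `bits` bits wraps around."""
    return (1 << bits) / freq_hz


def wraps_in(delta_ns, freq_hz=HPET_HZ, bits=COUNTER_BITS):
    """Express a time delta in nanoseconds as a number of counter wraps."""
    return abs(delta_ns) / 1e9 / wrap_period_seconds(freq_hz, bits)


observed_ns = -2_999_662_111_513  # delta from the syslog line above
print(f"one wrap  : {wrap_period_seconds():.3f} s")
print(f"ten wraps : {10 * wrap_period_seconds():.3f} s")
print(f"observed  : {abs(observed_ns) / 1e9:.3f} s, "
      f"{wraps_in(observed_ns):.4f} wraps")
```

Under these assumptions the observed delta lands within a few milliseconds of ten wrap periods. That is consistent with the "wrapped 10 or more times" message and explains why the jump is always about 50 minutes, although it does not by itself show whether the hardware or Xen's handling of the timer is at fault.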

I had some contact with Olivier Hanesse, and he indicated that he doesn't
have any solution for this problem, and that nothing proposed in February
solved it.

All suggestions are welcome.

Best regards

Philippe


config :
--------------------------------------
Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
--------------------------------------
HP DL385
--------------------------------------
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 9
model name      : AMD Opteron(tm) Processor 6174
stepping        : 1
cpu MHz         : 3058776.574
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm pni cx16 popcnt hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch nodeid_msr
bogomips        : 4409.03
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
--------------------------------------

--------------------------------------
PCI :
00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port D)
00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port F)
00:0a.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external gfx1 port A)
00:0b.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (NB-SB link)
00:0d.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external gfx1 port B)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3d)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1a.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1a.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1a.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1a.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1a.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1b.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1b.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1b.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1b.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1b.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
02:00.0 System peripheral: Hewlett-Packard Company iLO3 Slave instrumentation & System support (rev 04)
02:00.2 System peripheral: Hewlett-Packard Company iLO3 Management 
Processor Support and Messaging (rev 04)
02:00.4 USB Controller: Hewlett-Packard Company Proliant iLO2/iLO3 
virtual USB controller (rev 01)
03:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6 
controllers (rev 01)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 
Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 
Gigabit Ethernet (rev 20)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 
Gigabit Ethernet (rev 20)
05:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 
Gigabit Ethernet (rev 20)
09:00.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI 
Express Gen 2 (5.0 GT/s) Switch (rev bb)
0a:04.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI 
Express Gen 2 (5.0 GT/s) Switch (rev bb)
0a:05.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI 
Express Gen 2 (5.0 GT/s) Switch (rev bb)
0a:06.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI 
Express Gen 2 (5.0 GT/s) Switch (rev bb)
0c:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network 
Connection (rev 01)
0c:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network 
Connection (rev 01)
0c:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network 
Connection (rev 01)
0c:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network 
Connection (rev 01)
0f:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network 
Connection (rev 01)
0f:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network 
Connection (rev 01)
0f:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network 
Connection (rev 01)
0f:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network 
Connection (rev 01)
--------------------------------------




On 8:59 PM, Olivier Hanesse wrote:
> Hello
>
> I've got an issue about time keeping with Xen 4.0 (Debian squeeze
> release).
>
> My problem is here (hopefully I amn't the only one, so there might
> be a bug
> somewhere) : http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599161#50
> After some times,  I got this error : Clocksource tsc unstable (delta
> -2999660334211 ns). It has happened on several servers.
>
> Looking at the output of "xm debug-key s;"
>
> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
> warp=2850 (count=3)
>
> I am using a "Intel(R) Xeon(R) CPU L5420  @ 2.50GHz", which has the
> "constant_tsc", but not the "nonstop_tsc" one.
> On other systems with a newer cpu with "nonstop_tsc", I don't have this
> issue (systems are running the same distros with same config).
>
> I tried to boot with "max_cstate=0", but nothing changed, my TSC isn't
> reliable and after some times, I will got the "50min" issue again.
>
> I don't understand how a system can do a jump of "50min" in the future. Why
> 50min ? it is not 40min, not 1 hour, it is always 50min.
> I don't know how to make my TSC "reliable" (I already disable everything
> about Powerstate in BIOS Settings).
>
> Any ideas ?
>
> Regards
>
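
The quoted message ties the problem to a CPU that has "constant_tsc" but not "nonstop_tsc". The /proc/cpuinfo dump earlier in this message can be checked against that idea. The sketch below is only an illustration: the helper name tsc_flags is made up here, and the flags string is copied from the dump above.

```python
# Flags line copied from the /proc/cpuinfo dump above (AMD Opteron 6174).
FLAGS = (
    "fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext "
    "3dnow constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm pni cx16 "
    "popcnt hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a "
    "misalignsse 3dnowprefetch nodeid_msr"
)


def tsc_flags(flags_line):
    """Report which TSC-related feature flags appear in a cpuinfo flags line.

    Hypothetical helper for illustration; not part of any Xen or Linux tool.
    """
    present = set(flags_line.split())
    wanted = ("tsc", "constant_tsc", "nonstop_tsc")
    return {name: name in present for name in wanted}


for name, found in tsc_flags(FLAGS).items():
    print(f"{name:13s} {'yes' if found else 'no'}")
```

On this box all three flags are present, yet the same "Clocksource tsc unstable" delta shows up, so a missing "nonstop_tsc" flag does not look like the whole story. On a live system the flags line could be read from /proc/cpuinfo instead of being pasted in.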

Konrad Rzeszutek Wilk

2011-Sep-15 08:23 UTC

Re: [Xen-devel] Xen 4 TSC problems

On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
> Hi Xen developers
> 
> i just would like to inform you that I have exactly the same problem
> with Debian squeeze and xen, with
> 50 seconds time jump on my dom0 and domu. NTP is running on all
> dom0/domuU, clocksource is 'xen'
> everywhere.
> 
> some messages :
> syslog :
> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc
> unstable (delta = -2999662111513 ns)
> 
> xm dmesg :
> ...
> (XEN) Platform timer is 14.318MHz HPET
> ...
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> (XEN) TSC marked as reliable, warp = 0 (count=2)
> ...
> 
> I had some contact with Olivier Hanesse and it indicates that he
> doesn't have any solution for this problem,
> and all what was proposed in February didn't solved this problem.

Which was the max_cstate=0 ?
..
> config :
> --------------------------------------
> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14
> 12:46:30 UTC 2011 x86_64 GNU/Linux
> --------------------------------------
> HP DL385
> --------------------------------------
> vendor_id       : AuthenticAMD
> cpu family      : 16
> model           : 9
> model name      : AMD Opteron(tm) Processor 6174
> stepping        : 1
> cpu MHz         : 3058776.574
OK, that is really messed up. Your house must be on fire for the machine
to be running at 3058GHz!

Jeremy, this sounds familiar - did we have a patch for this in
your 2.6.32 tree?
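
As a rough sanity check of the "cpu MHz" figure, here is a small sketch that converts it and compares it with the bogomips value from the same dump. It assumes BogoMIPS is roughly twice the calibrated clock in MHz, which is a common rule of thumb on x86 kernels of this era and not a specification.

```python
# Values copied from the /proc/cpuinfo dump quoted in this thread.
reported_mhz = 3058776.574
bogomips = 4409.03

# Plain unit conversion: MHz to GHz.
reported_ghz = reported_mhz / 1000.0

# Assumption (rule of thumb only): BogoMIPS is about 2x the clock in MHz.
implied_mhz = bogomips / 2.0

# How far off the reported figure is from the bogomips-implied clock.
ratio = reported_mhz / implied_mhz

print(f"reported by cpuinfo : {reported_ghz:.3f} GHz")
print(f"implied by bogomips : {implied_mhz / 1000.0:.3f} GHz")
print(f"reported / implied  : {ratio:.1f}x")
```

Under that assumption the bogomips value points at a clock of a bit over 2 GHz, so the "cpu MHz" line is off by about three orders of magnitude while the delay-loop calibration looks sane.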

Konrad Rzeszutek Wilk

2011-Sep-15 08:24 UTC

Re: [Xen-devel] Xen 4 TSC problems

On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
> Hi Xen developers
Let's try this again, this time Cc-ing Jeremy.
>
> i just would like to inform you that I have exactly the same problem
> with Debian squeeze and xen, with
> 50 seconds time jump on my dom0 and domu. NTP is running on all
> dom0/domuU, clocksource is 'xen'
> everywhere.
> 
> some messages :
> syslog :
> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc
> unstable (delta = -2999662111513 ns)
> 
> xm dmesg :
> ...
> (XEN) Platform timer is 14.318MHz HPET
> ...
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> (XEN) TSC marked as reliable, warp = 0 (count=2)
> ...
> 
> I had some contact with Olivier Hanesse and it indicates that he
> doesn't have any solution for this problem,
> and all what was proposed in February didn't solved this problem.

Which was the max_cstate=0 ?
..
> config :
> --------------------------------------
> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14
> 12:46:30 UTC 2011 x86_64 GNU/Linux
> --------------------------------------
> HP DL385
> --------------------------------------
> vendor_id       : AuthenticAMD
> cpu family      : 16
> model           : 9
> model name      : AMD Opteron(tm) Processor 6174
> stepping        : 1
> cpu MHz         : 3058776.574
OK, that is really messed up. Your house must be on fire for the machine
to be running at 3058GHz!

Jeremy, this sounds familiar - did we have a patch for this in
your 2.6.32 tree?

George Dunlap

2011-Sep-15 10:36 UTC

Re: [Xen-devel] Xen 4 TSC problems

On Tue, Sep 13, 2011 at 8:16 AM, Philippe Simonet
<philippe.simonet@bluewin.ch> wrote:
> Hi Xen developers
>
> i just would like to inform you that I have exactly the same problem with
> Debian squeeze and xen, with
> 50 seconds time jump on my dom0 and domu. NTP is running on all dom0/domuU,
> clocksource is 'xen'
> everywhere.
>
> some messages :
> syslog :
> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable
> (delta = -2999662111513 ns)
>
> xm dmesg :
> ...
> (XEN) Platform timer is 14.318MHz HPET
> ...
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> (XEN) TSC marked as reliable, warp = 0 (count=2)
> ...
I haven't been following this conversation, so I don't know if this is
relevant, but I've just discovered this morning that the TSC warp
check in Xen is done at the wrong time (before any secondary cpus are
brought up), and thus always returns warp=0.  I've submitted a patch
to do the check after secondary CPUs are brought up; that should cause
Xen to do periodic synchronization of TSCs when there is drift.

 -George
>
> I had some contact with Olivier Hanesse and it indicates that he doesn't
> have any solution for this problem,
> and all what was proposed in February didn't solved this problem.
>
> all suggestions are welcomed.
>
> Best regards
>
> Philippe
>
>
> config :
> --------------------------------------
> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC
> 2011 x86_64 GNU/Linux
> --------------------------------------
> HP DL385
> --------------------------------------
> vendor_id       : AuthenticAMD
> cpu family      : 16
> model           : 9
> model name      : AMD Opteron(tm) Processor 6174
> stepping        : 1
> cpu MHz         : 3058776.574
> cache size      : 512 KB
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 5
> wp              : yes
> flags           : fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush
> mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow
> constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm pni cx16 popcnt
> hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse
> 3dnowprefetch nodeid_msr
> bogomips        : 4409.03
> TLB size        : 1024 4K pages
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 48 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate
> --------------------------------------
>
> --------------------------------------
> PCI :
> 00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot
> (2x16) PCI-e GFX Hydra part (rev 02)
> 00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI
> express gpp port D)
> 00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI
> express gpp port F)
> 00:0a.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external
> gfx1 port A)
> 00:0b.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (NB-SB
> link)
> 00:0d.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external
> gfx1 port B)
> 00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller
> [IDE mode]
> 00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0
> Controller
> 00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
> 00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI
> Controller
> 00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0
> Controller
> 00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
> 00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI
> Controller
> 00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3d)
> 00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
> 00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
> 00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> HyperTransport Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM
> Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> Miscellaneous Control
> 00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link
> Control
> 00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> HyperTransport Configuration
> 00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> Address Map
> 00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM
> Controller
> 00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> Miscellaneous Control
> 00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link
> Control
> 00:1a.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> HyperTransport Configuration
> 00:1a.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> Address Map
> 00:1a.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM
> Controller
> 00:1a.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> Miscellaneous Control
> 00:1a.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link
> Control
> 00:1b.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> HyperTransport Configuration
> 00:1b.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> Address Map
> 00:1b.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM
> Controller
> 00:1b.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor
> Miscellaneous Control
> 00:1b.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link
> Control
> 01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
> 02:00.0 System peripheral: Hewlett-Packard Company iLO3 Slave
> instrumentation & System support (rev 04)
> 02:00.2 System peripheral: Hewlett-Packard Company iLO3 Management
> Processor Support and Messaging (rev 04)
> 02:00.4 USB Controller: Hewlett-Packard Company Proliant iLO2/iLO3 virtual
> USB controller (rev 01)
> 03:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6
> controllers (rev 01)
> 04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709
> Gigabit Ethernet (rev 20)
> 04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709
> Gigabit Ethernet (rev 20)
> 05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709
> Gigabit Ethernet (rev 20)
> 05:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709
> Gigabit Ethernet (rev 20)
> 09:00.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI
> Express Gen 2 (5.0 GT/s) Switch (rev bb)
> 0a:04.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI
> Express Gen 2 (5.0 GT/s) Switch (rev bb)
> 0a:05.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI
> Express Gen 2 (5.0 GT/s) Switch (rev bb)
> 0a:06.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI
> Express Gen 2 (5.0 GT/s) Switch (rev bb)
> 0c:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network
> Connection (rev 01)
> 0c:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network
> Connection (rev 01)
> 0c:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network
> Connection (rev 01)
> 0c:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network
> Connection (rev 01)
> 0f:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network
> Connection (rev 01)
> 0f:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network
> Connection (rev 01)
> 0f:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network
> Connection (rev 01)
> 0f:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network
> Connection (rev 01)
> --------------------------------------
>
>
>
>
> On 8:59 PM, Olivier Hanesse wrote:
>>
>> Hello
>>
>> I've got an issue about time keeping with Xen 4.0 (Debian squeeze
>> release).
>>
>> My problem is here (hopefully I amn't the only one, so there might be a
>> bug
>> somewhere) : http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599161#50
>> After some times,  I got this error : Clocksource tsc unstable (delta
>> -2999660334211 ns). It has happened on several servers.
>>
>> Looking at the output of "xm debug-key s;"
>>
>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
>> warp=2850 (count=3)
>>
>> I am using a "Intel(R) Xeon(R) CPU L5420  @ 2.50GHz", which has the
>> "constant_tsc", but not the "nonstop_tsc" one.
>> On other systems with a newer cpu with "nonstop_tsc", I don't have this
>> issue (systems are running the same distros with same config).
>>
>> I tried to boot with "max_cstate=0", but nothing changed, my TSC isn't
>> reliable and after some times, I will got the "50min" issue again.
>>
>> I don't understand how a system can do a jump of "50min" in the future.
>> Why
>> 50min ? it is not 40min, not 1 hour, it is always 50min.
>> I don't know how to make my TSC "reliable" (I already disable everything
>> about Powerstate in BIOS Settings).
>>
>> Any ideas ?
>>
>> Regards
>>
>
>
Jeremy Fitzhardinge

2011-Sep-15 16:24 UTC

Re: [Xen-devel] Xen 4 TSC problems

On 09/15/2011 01:24 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
>> Hi Xen developers
> Lets try this again, this time Cc-ing Jeremy.
>> i just would like to inform you that I have exactly the same problem
>> with Debian squeeze and xen, with
>> 50 seconds time jump on my dom0 and domu. NTP is running on all
>> dom0/domuU, clocksource is 'xen'
>> everywhere.
>>
>> some messages :
>> syslog :
>> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc
>> unstable (delta = -2999662111513 ns)
>>
>> xm dmesg :
>> ...
>> (XEN) Platform timer is 14.318MHz HPET
>> ...
>> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more
>> times.
>> (XEN) TSC marked as reliable, warp = 0 (count=2)
>> ...
>>
>> I had some contact with Olivier Hanesse and it indicates that he
>> doesn't have any solution for this problem,
>> and all what was proposed in February didn't solved this problem.
That looks like Xen itself is having problems keeping track of time.  If
it can't manage it, then there's not much the guest kernels can do about it.
>
> Which was the max_cstate=0 ?
> ..
>> config :
>> --------------------------------------
>> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14
>> 12:46:30 UTC 2011 x86_64 GNU/Linux
>> --------------------------------------
>> HP DL385
>> --------------------------------------
>> vendor_id       : AuthenticAMD
>> cpu family      : 16
>> model           : 9
>> model name      : AMD Opteron(tm) Processor 6174
>> stepping        : 1
>> cpu MHz         : 3058776.574
> OK, that is really messed up. Your house must be on fire for the machine
> to be running at 3058GHz!
>
> Jeremy, this sounds familiar - did we have a patch for this in
> your 2.6.32 tree?
Not that I can think of.  All I can suggest from the kernel side is that
perhaps some of the ACPI power stuff isn't being set up properly, and
that makes the CPU do very strange things with its TSC/power states in
general.

    J
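
The numbers in the quoted logs can be lined up with the idea that Xen itself loses track of time. The sketch below assumes the platform timer is read as a 32-bit counter and runs at the nominal 14.31818 MHz (the log only says "14.318MHz HPET"); both are assumptions, not something stated in this thread.

```python
# Assumptions, not taken from the logs: nominal HPET rate and counter width.
HPET_HZ = 14_318_180
COUNTER_BITS = 32

# Time for the counter to wrap once, and for the "10 or more" wraps
# mentioned in the "xm dmesg" output.
wrap_seconds = 2**COUNTER_BITS / HPET_HZ
ten_wraps_seconds = 10 * wrap_seconds

# Magnitude of the delta from the syslog line quoted above, in seconds.
delta_seconds = 2999662111513 / 1e9

print(f"one wrap   : {wrap_seconds:.3f} s")
print(f"ten wraps  : {ten_wraps_seconds:.3f} s "
      f"({ten_wraps_seconds / 60:.2f} min)")
print(f"log delta  : {delta_seconds:.3f} s ({delta_seconds / 60:.2f} min)")
print(f"difference : {abs(ten_wraps_seconds - delta_seconds) * 1000:.2f} ms")
```

Under those assumptions one wrap takes about 300 s, and ten wraps land within a few milliseconds of the reported delta, i.e. roughly 50 minutes rather than 50 seconds. That is suggestive of the "unexpectedly wrapped 10 or more times" path being involved, but it is only arithmetic, not a diagnosis.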


Dan Magenheimer

2011-Sep-15 18:38 UTC

RE: [Xen-devel] Xen 4 TSC problems

> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
> Sent: Thursday, September 15, 2011 4:36 AM
> To: Philippe Simonet
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Xen 4 TSC problems
> 
> On Tue, Sep 13, 2011 at 8:16 AM, Philippe Simonet
> <philippe.simonet@bluewin.ch> wrote:
> > Hi Xen developers
> >
> > i just would like to inform you that I have exactly the same problem with
> > Debian squeeze and xen, with
> > 50 seconds time jump on my dom0 and domu. NTP is running on all dom0/domuU,
> > clocksource is 'xen'
> > everywhere.
> >
> > some messages :
> > syslog :
> > Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable
> > (delta = -2999662111513 ns)
> >
> > xm dmesg :
> > ...
> > (XEN) Platform timer is 14.318MHz HPET
> > ...
> > (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> > (XEN) TSC marked as reliable, warp = 0 (count=2)
> > ...
> 
> I haven't been following this conversation, so I don't know if this is
> relevant, but I've just discovered this morning that the TSC warp
> check in Xen is done at the wrong time (before any secondary cpus are
> brought up), and thus always returns warp=0.  I've submitted a patch
> to do the check after secondary CPUs are brought up; that should cause
> Xen to do periodic synchronization of TSCs when there is drift.
Wow, nice catch, George!  I wonder if this is the underlying bug
for many of the mysterious time problems that have been reported
for a year or two now... at least on certain AMD boxes.
Any idea when this was introduced?  Or has it always been wrong?

Dan
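
To see why a warp check run with only the boot CPU online reports warp=0, here is a toy simulation. It is not Xen's code: measure_warp, the per-CPU offsets and the step size are all invented for illustration, and the 2850 offset merely borrows the warp value from Olivier's log.

```python
def measure_warp(cpu_offsets, rounds=100, step=10):
    """Toy model of a TSC warp check (hypothetical, not Xen's implementation).

    Each CPU's TSC is modelled as a shared cycle count plus a fixed offset.
    CPUs take turns reading their TSC; "warp" is the largest amount by which
    a reading falls behind the previous reading taken on any CPU.
    """
    base = 0          # shared elapsed cycles
    last = None       # most recent reading from any CPU
    max_warp = 0
    for _ in range(rounds):
        for offset in cpu_offsets:
            base += step
            now = base + offset
            if last is not None and now < last:
                max_warp = max(max_warp, last - now)
            last = now
    return max_warp


# Only the boot CPU online: a single counter never goes backwards.
print("boot CPU only      :", measure_warp([0]))
# Two CPUs, the second one lagging by 2850 cycles (illustrative value).
print("with secondary CPU :", measure_warp([0, -2850]))
```

With a single CPU in the list every reading is ahead of the last one, so the result is always 0 no matter how badly the other CPUs would have been skewed. Once a lagging second CPU takes part, the same loop reports a non-zero warp.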

<Philippe.Simonet@swisscom.com>

2011-Sep-16 06:03 UTC

RE: [Xen-devel] Xen 4 TSC problems

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of Jeremy Fitzhardinge
> Sent: Thursday, September 15, 2011 6:25 PM
> To: Konrad Rzeszutek Wilk
> Cc: xen-devel@lists.xensource.com; Philippe Simonet
> Subject: Re: [Xen-devel] Xen 4 TSC problems
> 
> On 09/15/2011 01:24 AM, Konrad Rzeszutek Wilk wrote:
> > On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
> >> Hi Xen developers
> > Lets try this again, this time Cc-ing Jeremy.
> >> i just would like to inform you that I have exactly the same problem
> >> with Debian squeeze and xen, with
> >> 50 seconds time jump on my dom0 and domu. NTP is running on all
> >> dom0/domuU, clocksource is 'xen'
> >> everywhere.
> >>
> >> some messages :
> >> syslog :
> >> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc
> >> unstable (delta = -2999662111513 ns)
> >>
> >> xm dmesg :
> >> ...
> >> (XEN) Platform timer is 14.318MHz HPET
> >> ...
> >> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more
> >> times.
> >> (XEN) TSC marked as reliable, warp = 0 (count=2)
> >> ...
> >>
> >> I had some contact with Olivier Hanesse and it indicates that he
> >> doesn't have any solution for this problem, and all what was proposed
> >> in February didn't solved this problem.
> 
> That looks like Xen itself is having problems keeping track of time.  If it can't
> manage it, then there's not much the guest kernels can do about it.
> 
> >
> > Which was the max_cstate=0 ?
> > ..
> >> config :
> >> --------------------------------------
> >> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14
> >> 12:46:30 UTC 2011 x86_64 GNU/Linux
> >> --------------------------------------
> >> HP DL385
> >> --------------------------------------
> >> vendor_id       : AuthenticAMD
> >> cpu family      : 16
> >> model           : 9
> >> model name      : AMD Opteron(tm) Processor 6174
> >> stepping        : 1
> >> cpu MHz         : 3058776.574
> > OK, that is really messed up. Your house must be on fire for the
> > machine to be running at 3058GHz!
> >
> > Jeremy, this sounds familiar - did we have a patch for this in your
> > 2.6.32 tree?
> 
> Not that I can think of.  All I can suggest from the kernel side is that perhaps
> some of the ACPI power stuff isn't being set up properly, and that makes the
> CPU do very strange things with its TSC/power states in general.
> 
how can i detect that ? 

the /proc/acpi/processor path is empty, 

find /proc/acpi
 /proc/acpi
 /proc/acpi/processor
 /proc/acpi/button
 /proc/acpi/button/power
 /proc/acpi/button/power/PWRF
 /proc/acpi/button/power/PWRF/info
 /proc/acpi/thermal_zone
 /proc/acpi/wakeup
 /proc/acpi/sleep
 /proc/acpi/fadt
 /proc/acpi/dsdt
 /proc/acpi/info
 /proc/acpi/power_resource
 /proc/acpi/embedded_controller

dmesg | grep -I acpi
 [    1.205647] hpet_acpi_add: no address or irqs in _CRS

lsmod | grep -i acpi
 acpi_processor          5087  1 processor,[permanent]

>     J
> 
> 
Jeremy Fitzhardinge

2011-Sep-16 22:40 UTC

Re: [Xen-devel] Xen 4 TSC problems

On 09/15/2011 11:03 PM, Philippe.Simonet@swisscom.com wrote:
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
>> bounces@lists.xensource.com] On Behalf Of Jeremy Fitzhardinge
>> Sent: Thursday, September 15, 2011 6:25 PM
>> To: Konrad Rzeszutek Wilk
>> Cc: xen-devel@lists.xensource.com; Philippe Simonet
>> Subject: Re: [Xen-devel] Xen 4 TSC problems
>>
>> On 09/15/2011 01:24 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
>>>> Hi Xen developers
>>> Lets try this again, this time Cc-ing Jeremy.
>>>> i just would like to inform you that I have exactly the same problem
>>>> with Debian squeeze and xen, with
>>>> 50 seconds time jump on my dom0 and domu. NTP is running on all
>>>> dom0/domuU, clocksource is 'xen'
>>>> everywhere.
>>>>
>>>> some messages :
>>>> syslog :
>>>> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc
>>>> unstable (delta = -2999662111513 ns)
>>>>
>>>> xm dmesg :
>>>> ...
>>>> (XEN) Platform timer is 14.318MHz HPET
>>>> ...
>>>> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more
>>>> times.
>>>> (XEN) TSC marked as reliable, warp = 0 (count=2)
>>>> ...
>>>>
>>>> I had some contact with Olivier Hanesse and it indicates that he
>>>> doesn't have any solution for this problem, and all what was proposed
>>>> in February didn't solved this problem.
>> That looks like Xen itself is having problems keeping track of time.  If it can't
>> manage it, then there's not much the guest kernels can do about it.
>>
>>> Which was the max_cstate=0 ?
>>> ..
>>>> config :
>>>> --------------------------------------
>>>> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14
>>>> 12:46:30 UTC 2011 x86_64 GNU/Linux
>>>> --------------------------------------
>>>> HP DL385
>>>> --------------------------------------
>>>> vendor_id       : AuthenticAMD
>>>> cpu family      : 16
>>>> model           : 9
>>>> model name      : AMD Opteron(tm) Processor 6174
>>>> stepping        : 1
>>>> cpu MHz         : 3058776.574
>>> OK, that is really messed up. Your house must be on fire for the
>>> machine to be running at 3058GHz!
>>>
>>> Jeremy, this sounds familiar - did we have a patch for this in your
>>> 2.6.32 tree?
>> Not that I can think of.  All I can suggest from the kernel side is that perhaps
>> some of the ACPI power stuff isn't being set up properly, and that makes the
>> CPU do very strange things with its TSC/power states in general.
>>
> how can i detect that ? 
>
> the /proc/acpi/processor path is empty, 
>
> find /proc/acpi
>  /proc/acpi
>  /proc/acpi/processor
>  /proc/acpi/button
>  /proc/acpi/button/power
>  /proc/acpi/button/power/PWRF
>  /proc/acpi/button/power/PWRF/info
>  /proc/acpi/thermal_zone
>  /proc/acpi/wakeup
>  /proc/acpi/sleep
>  /proc/acpi/fadt
>  /proc/acpi/dsdt
>  /proc/acpi/info
>  /proc/acpi/power_resource
>  /proc/acpi/embedded_controller
>
> dmesg | grep -I acpi
>  [    1.205647] hpet_acpi_add: no address or irqs in _CRS
>
> lsmod | grep -i acpi
>  acpi_processor          5087  1 processor,[permanent]
What does "xenpm start 5" say?

    J

Philippe Simonet

2011-Sep-19 05:45 UTC

Re: [Xen-devel] Xen 4 TSC problems

On 9/17/2011 12:40 AM, Jeremy Fitzhardinge wrote:
> On 09/15/2011 11:03 PM, Philippe.Simonet@swisscom.com wrote:
>>> -----Original Message-----
>>> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
>>> bounces@lists.xensource.com] On Behalf Of Jeremy Fitzhardinge
>>> Sent: Thursday, September 15, 2011 6:25 PM
>>> To: Konrad Rzeszutek Wilk
>>> Cc: xen-devel@lists.xensource.com; Philippe Simonet
>>> Subject: Re: [Xen-devel] Xen 4 TSC problems
>>>
>>> On 09/15/2011 01:24 AM, Konrad Rzeszutek Wilk wrote:
>>>> On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
>>>>> Hi Xen developers
>>>> Lets try this again, this time Cc-ing Jeremy.
>>>>> i just would like to inform you that I have exactly the same problem
>>>>> with Debian squeeze and xen, with
>>>>> 50 seconds time jump on my dom0 and domu. NTP is running on all
>>>>> dom0/domuU, clocksource is 'xen'
>>>>> everywhere.
>>>>>
>>>>> some messages :
>>>>> syslog :
>>>>> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc
>>>>> unstable (delta = -2999662111513 ns)
>>>>>
>>>>> xm dmesg :
>>>>> ...
>>>>> (XEN) Platform timer is 14.318MHz HPET
>>>>> ...
>>>>> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more
>>>>> times.
>>>>> (XEN) TSC marked as reliable, warp = 0 (count=2)
>>>>> ...
>>>>>
>>>>> I had some contact with Olivier Hanesse and it indicates that he
>>>>> doesn't have any solution for this problem, and all what was proposed
>>>>> in February didn't solved this problem.
>>> That looks like Xen itself is having problems keeping track of time.  If it can't
>>> manage it, then there's not much the guest kernels can do about it.
>>>
>>>> Which was the max_cstate=0 ?
>>>> ..
>>>>> config :
>>>>> --------------------------------------
>>>>> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14
>>>>> 12:46:30 UTC 2011 x86_64 GNU/Linux
>>>>> --------------------------------------
>>>>> HP DL385
>>>>> --------------------------------------
>>>>> vendor_id       : AuthenticAMD
>>>>> cpu family      : 16
>>>>> model           : 9
>>>>> model name      : AMD Opteron(tm) Processor 6174
>>>>> stepping        : 1
>>>>> cpu MHz         : 3058776.574
>>>> OK, that is really messed up. Your house must be on fire for the
>>>> machine to be running at 3058GHz!
>>>>
>>>> Jeremy, this sounds familiar - did we have a patch for this in your
>>>> 2.6.32 tree?
>>> Not that I can think of.  All I can suggest from the kernel side is that perhaps
>>> some of the ACPI power stuff isn't being set up properly, and that makes the
>>> CPU do very strange things with its TSC/power states in general.
>>>
>> how can i detect that ?
>>
>> the /proc/acpi/processor path is empty,
>>
>> find /proc/acpi
>>   /proc/acpi
>>   /proc/acpi/processor
>>   /proc/acpi/button
>>   /proc/acpi/button/power
>>   /proc/acpi/button/power/PWRF
>>   /proc/acpi/button/power/PWRF/info
>>   /proc/acpi/thermal_zone
>>   /proc/acpi/wakeup
>>   /proc/acpi/sleep
>>   /proc/acpi/fadt
>>   /proc/acpi/dsdt
>>   /proc/acpi/info
>>   /proc/acpi/power_resource
>>   /proc/acpi/embedded_controller
>>
>> dmesg | grep -I acpi
>>   [    1.205647] hpet_acpi_add: no address or irqs in _CRS
>>
>> lsmod | grep -i acpi
>>   acpi_processor          5087  1 processor,[permanent]
> What does "xenpm start 5" say?
>
>      J
>
here it is :

root@dnsit22.swissptt.ch ~# xenpm start 5
Timeout set to 5 seconds
Start sampling, waiting for CTRL-C or SIGINT or SIGALARM signal ...
Elapsed time (ms): 5028

CPU0:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU1:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU2:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU3:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU4:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU5:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU6:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU7:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU8:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU9:   Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU10:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU11:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU12:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU13:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU14:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU15:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU16:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU17:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU18:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU19:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU20:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU21:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU22:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz

CPU23:  Residency(ms)           Avg Res(ms)
   Avg freq      18      KHz






George Dunlap

2011-Sep-19 10:39 UTC

Re: [Xen-devel] Xen 4 TSC problems

On Thu, Sep 15, 2011 at 7:38 PM, Dan Magenheimer
<dan.magenheimer@oracle.com> wrote:
>> I haven't been following this conversation, so I don't know if this is
>> relevant, but I've just discovered this morning that the TSC warp
>> check in Xen is done at the wrong time (before any secondary cpus are
>> brought up), and thus always returns warp=0.  I've submitted a patch
>> to do the check after secondary CPUs are brought up; that should cause
>> Xen to do periodic synchronization of TSCs when there is drift.
>
> Wow, nice catch, George!  I wonder if this is the underlying bug
> for many of the mysterious time problems that have been reported
> for a year or two now... at least on certain AMD boxes.
> Any idea when this was introduced?  Or has it always been wrong?

Well the comment in 20823:89907dab1aef seems to indicate that's where
the "assume it's reliable on AMD until proven otherwise" started; that
would be January 2010.

I looked as far back as 20705:a74aca4b9386, and there the TSC
reliability checks were again in init_xen_time().  Figuring out where
things were before then is getting into archeology. :-)

The comment at the top of init_xen_time() is correct now, but from the
time it was first written through 4.1 it was just plain wrong -- it
said init_xen_time() happened after all cpus were up, which has never
been true.

 -George
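
To illustrate why a warp check that runs while only the boot CPU is online can never report anything but warp=0, here is a simplified model. It is a sketch under stated assumptions, not Xen's implementation: `max_warp` and `readings` are hypothetical helpers, and "warp" is modelled as the largest amount by which a counter reading lags the reading taken just before it.

```python
STEP = 500  # assumed number of ticks that elapse between consecutive readings


def readings(cpu_offsets, rounds):
    """Simulate CPUs taking turns reading their TSC.

    cpu_offsets maps a CPU id to the offset of its counter relative to
    "true" time. Returns (cpu, tsc) tuples in the order they were taken.
    """
    t = 10_000
    out = []
    for _ in range(rounds):
        for cpu, offset in cpu_offsets.items():
            out.append((cpu, t + offset))
            t += STEP
    return out


def max_warp(samples):
    """Largest backwards step observed between consecutive readings."""
    worst = 0
    last = None
    for _cpu, tsc in samples:
        if last is not None and tsc < last:
            worst = max(worst, last - tsc)
        last = tsc
    return worst


# Only the boot CPU is online: its own counter never runs backwards.
print(max_warp(readings({0: 0}, rounds=4)))

# A second CPU whose counter is 2850 ticks behind makes the warp visible.
print(max_warp(readings({0: 0, 1: -2850}, rounds=2)))
```

With a single CPU the readings are monotonic by construction, so the measured warp is always zero regardless of how far the other CPUs' counters have drifted.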

Jan Beulich

2011-Sep-22 12:07 UTC

Re: [Xen-devel] Xen 4 TSC problems

>>> On 19.09.11 at 12:39, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> The comment at the top of init_xen_time() is correct now, but from the
> time it was first written through 4.1 it was just plain wrong -- it
> said init_xen_time() happened after all cpus were up, which has never
> been true.

Not really - CPUs got booted by that time originally (pre-4.0, which is
what the old comment said), but not onlined. Prior to the use of
smp_call_function() for the TSC reliability check I assume only CPU
feature flags got looked at, which were available at that point prior to
Keir's re-work of the SMP boot process.

Jan


<Philippe.Simonet@swisscom.com>

2011-Sep-30 06:33 UTC

RE: [Xen-devel] Xen 4 TSC problems

Hi Xen developers,

I need some good tips to move forward with my TSC problem.

First, a quick summary of the problem:

- clock jumps 50 minutes forward (xm dmesg):
	(XEN) TSC is reliable, synchronization unnecessary
	(XEN) Platform timer is 14.318MHz HPET
	(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times

	(syslog)
	Sep 28 17:45:06 dnsit11 kernel: [1970548.356130] Clocksource tsc unstable (delta = -2999660112689 ns)
	Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)

- I can't reproduce or force the problem

- it happens on 2 different HP DL385 G7 machines with Debian squeeze:
	xen-hypervisor-4.0-amd64                4.0.1-2
	dom0 : linux-image-2.6.32-5-xen-amd64          2.6.32-35
	domUs : 5 -> 15 Debian machines
	2 * 12-core AMD Opteron(tm) Processor 6174

- I have had this problem since the beginning of September; before that, the machines had been
running for 3 months without problems.
	At the beginning of September I did an upgrade (dom0 and domUs):
	linux-image-2.6.32-5-xen-amd64:amd64 (2.6.32-31, automatic)  ->
	linux-image-2.6.32-5-xen-amd64:amd64 (2.6.32-31, 2.6.32-35)

- what is strange (I don't know if it is linked to the problem):
	/proc/cpuinfo in dom0 gives me:

	cpu MHz         : 3249880.888
  -- or --
	cpu MHz         : 2300454.255
	...		(different after each reboot)

	In the domU this value is OK (cpu MHz         : 2200.112), and the bogomips is also OK
	(bogomips        : 4400.21).
	If I start the machine in a non-Xen environment, the values are also OK.
	
I now have an identical machine on which I can run some tests.

Could you give me some tips that I could test or implement?
	- hardware problem? hypervisor problem? dom0 problem?
	- try another hypervisor version?
	- try linux-image-3.0.0-1-amd64 3.0.0-3
	- try reproducing the problem? (how? log it? ...)

All your help is welcome!

Many thanks

Philippe
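
The two deltas in the syslog lines above can be converted to minutes, and compared with how long a free-running counter at the reported platform timer frequency takes to wrap. This is arithmetic only, not a confirmed diagnosis. The 32-bit counter width and the nominal 14.31818 MHz frequency (Xen prints it rounded as "14.318MHz") are assumptions, and the variable names are made up for the example.

```python
HPET_HZ = 14_318_180   # assumed nominal HPET frequency (14.31818 MHz)
COUNTER_BITS = 32      # assumption: the platform counter is treated as 32 bits wide

# Deltas copied from the two "Clocksource tsc unstable" syslog lines.
deltas_ns = [-2999660112689, -2999662111513]

for delta in deltas_ns:
    seconds = abs(delta) / 1e9
    print(f"{seconds:.3f} s = {seconds / 60:.2f} min")

# Time for a counter of the assumed width to wrap once, and ten times.
wrap_seconds = 2**COUNTER_BITS / HPET_HZ
print(f"one wrap: {wrap_seconds:.3f} s, ten wraps: {10 * wrap_seconds:.3f} s")
```

Under these assumptions ten wraps take about 2999.66 seconds, which is within a few milliseconds of both reported deltas. That makes it plausible, though not proven, that the recurring "50 minutes" and the "wrapped 10 or more times" message are related.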




tommics

2011-Sep-30 09:36 UTC

[Xen-devel] RE: Xen 4 TSC problems

Hey there,

I just wanted to report that we have also been experiencing the same problem since
upgrading to Xen 4.0.1 on Debian squeeze. We are running the latest Debian
stable release.

ii  libxenstore3.0                          4.0.1-2          
ii  linux-image-2.6.32-5-xen-amd64          2.6.32-35squeeze2
ii  linux-image-xen-amd64                   2.6.32+29        
ii  xen-hypervisor-4.0-amd64                4.0.1-2          
ii  xen-linux-system-2.6-xen-amd64          2.6.32+29        
ii  xen-linux-system-2.6.32-5-xen-amd64     2.6.32-35squeeze2
ii  xen-qemu-dm-4.0                         4.0.1-2          
ii  xen-tools                               4.2-1            
ii  xen-utils-4.0                           4.0.1-2          
ii  xen-utils-common                        4.0.0-1          
ii  xenstore-utils                          4.0.1-2          

We also have the clock jumping 50 minutes into the future.

We are running IBM Blades HS21 XM (Type 7995) with Intel(R) Xeon(R) CPU
E5345  @ 2.33GHz.
We are also running the same configuration on another machine with Intel(R)
Xeon(R) CPU X7550  @ 2.00GHz, where we don't experience these problems. We also
didn't have these bugs running Xen 3.1.0 with the 2.6.18-5-xen-amd64 kernel.

We are currently running one blade with HPET disabled, clocksource=pit and
cpuidle=0, and another with HPET on and nothing else configured.

The main problem in debugging this is having to wait for the next error to appear.
Until now both machines have run fine without time jumps, but we did see that
time jump on the machine which is running with HPET enabled and no other
settings.

We could help with debugging here.

Regards
Thomas Pöhler


Dan Magenheimer

2011-Sep-30 17:16 UTC

RE: [Xen-devel] Xen 4 TSC problems

> From: Philippe.Simonet@swisscom.com [mailto:Philippe.Simonet@swisscom.com]
> Subject: RE: [Xen-devel] Xen 4 TSC problems
> 
> Hi Xen developpers
> 
> i need some good tips to go forward with my TSC problem :
> 
> Could you give me some tips that I could test or implement ?
> 	- try other hypervisor version ?

Hi Philippe --

It would definitely be worthwhile to see if you can reproduce
the problem on the latest xen-unstable bits.  (Please make sure
that the bug George reported earlier in this thread, the TSC warp
check running before secondary CPUs are brought up, is fixed in
your build.)  A lot has changed since 4.0.1.

Dan

P.S. I will be mostly offline for the next week or so...

Mauro

2012-Sep-27 15:54 UTC

Re: [Xen-devel] Xen 4 TSC problems

On 28 February 2011 16:54, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> Yes this is what I mean.
> I am glad to hear that it isn't a bad sign :)
> I thought of a bad sign, because on system with "reliable TSC", this counter
> is always 0.

Hey men.
I have exactly the same problem.
I have two cluster nodes.
The servers are two HP Proliant DL 580 G4 with four Quad Core Intel(R)
Xeon(R) CPU E7330  @ 2.40GHz.
I'm running Debian squeeze in dom0s and domUs.
xm info:

host                   : xen-p01
release                : 2.6.32-5-xen-amd64
version                : #1 SMP Sun May 6 08:57:29 UTC 2012
machine                : x86_64
nr_cpus                : 16
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2400
hw_caps                : bfebfbff:20100800:00000000:00000940:0004e3bd:00000000:00000001:00000000
virt_caps              : hvm
total_memory           : 65532
free_memory            : 40317
node_to_cpu            : node0:0-15
node_to_memory         : node0:40317
node_to_dma32_mem      : node0:3256
max_node_id            : 0
xen_major              : 4
xen_minor              : 0
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : placeholder dom0_mem=3072M loglvl=warning guest_loglvl=warning
cc_compiler            : gcc version 4.4.5 (Debian 4.4.5-8)
cc_compile_by          : ultrotter
cc_compile_domain      : debian.org
cc_compile_date        : Sat Sep  8 19:15:46 UTC 2012
xend_config_format     : 4

Every week I'm experiencing a clock jump ahead of about 50 minutes on dom0.
I'm seriously in trouble because every time it causes a reboot of one
of the two cluster nodes.

Dan Magenheimer

2012-Sep-27 19:27 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

> From: Mauro [mailto:mrsanna1@gmail.com]
> Sent: Thursday, September 27, 2012 9:55 AM
> To: Olivier Hanesse
> Cc: Dan Magenheimer; Jeremy Fitzhardinge; xen-devel@lists.xensource.com; Keir Fraser; Jan Beulich;
> Keir Fraser; Xen Users; Mark Adams
> Subject: Re: [Xen-users] Re: [Xen-devel] Xen 4 TSC problems
>
> On 28 February 2011 16:54, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> > Yes this is what I mean.
> > I am glad to hear that it isn't a bad sign :)
> > I thought of a bad sign, because on system with "reliable TSC", this counter
> > is always 0.
>
> Hey men.
> I have exactly the same problem.
> I have two cluster nodes.
> Server are two HP Proliant DL 580 G4 with four Quad Core Intel(R)
> Xeon(R) CPU E7330  @ 2.40GHz.
> I'm running debian squeeze in dom0s end domUs.

Hi Mauro --

There's been a lot of work on clocks since 4.0 (by other Xen developers,
not me).  I don't think this specific problem was ever reproduced
by a developer so I don't think anyone knows if it has been
already fixed or not, nor are there any plans to backport all the
timer work to 4.0.

You might try upgrading your Xen hypervisor to the just-released
Xen 4.2 [1] and see if the problem goes away.  If the problem still
exists in 4.2, it may be easier to get some developer to pay attention
to it.  It may be specific hardware or processors or power
management or firmware or even dom0 kernel, so the first thing
to do is try later hypervisor bits.

Sorry I can't be more helpful.  Good luck!

Dan

[1] Sorry, I'm not familiar with the 4.0->4.2 upgrade process
so you may want to confirm with others.

Olivier Hanesse

2012-Sep-27 21:28 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

Hello,

From my point of view, this was a kind of Xen hardware
"incompatibility/bug": I was able to reproduce this bug on more than 50
identical servers, but not on another farm of servers with different
hardware.
The Xen version and Debian kernel were exactly the same on both farms.

Regards

Olivier

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Mauro

2012-Sep-27 21:42 UTC

Re: [Xen-devel] Xen 4 TSC problems

On 27 September 2012 23:28, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> Hello,
>
> From my point of view, this was a kind of xen hardware "incompatibility/bug"
> : I was able to reproduce this bug on more than 50 identical servers, but
> not on another farm of servers with a different hardware.
> Xen version, Debian Kernel was exactly the same on both farm.

Yes, I think so.
The problem occurs where I use Debian squeeze with Xen 4.0.
On another server with the same hardware but with Debian lenny and Xen
3.2 I have no problems.
I've read that a workaround is to set clocksource=pit on the Xen boot
line in the grub conf. I hope this works because I can't change
hardware.
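
Whether an option such as clocksource=pit actually reached the hypervisor can be checked from the `(XEN) Command line:` line that `xm dmesg` prints at boot. The sketch below assumes that line format, as seen in the boot logs posted in this thread; the function name is hypothetical, and the sample line is taken from a system on which the option has not been set.

```python
def xen_cmdline_options(xm_dmesg_text):
    """Return the hypervisor boot options found in xm dmesg output.

    Assumes the options appear on a single line that starts with
    "(XEN) Command line:". Returns an empty list if no such line exists.
    """
    prefix = "(XEN) Command line:"
    for line in xm_dmesg_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].split()
    return []


# Sample line in the format printed by xm dmesg.
sample = "(XEN) Command line: placeholder dom0_mem=3072M loglvl=warning guest_loglvl=warning"

options = xen_cmdline_options(sample)
print(options)
print("clocksource=pit" in options)
```

This only confirms that the option was passed on the hypervisor command line; it says nothing about whether the option has the desired effect.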

Olivier Hanesse

2012-Sep-29 08:08 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

It didn't work for me :(
clocksource=pit produced another "time jump" (I don't remember how much, but it
was worse than 50 min)


Mauro

2012-Sep-29 09:41 UTC

Re: [Xen-devel] Xen 4 TSC problems

On 29 September 2012 10:08, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> It didn't work for me :(
> clocksource=pit made another "time jump" (don't remember how much, but it
> was worst than 50min)

Damn... so there isn't a solution; it is a huge problem.
What processors do you have?
I have HP Proliant DL580 G5 systems with four quad core Intel(R)
Xeon(R) CPU E7330  @ 2.40GHz and Debian Linux as the OS.

Mauro

2012-Sep-29 12:19 UTC

Re: [Xen-devel] Xen 4 TSC problems

It has happened again: the system date is 50 minutes ahead.
Is there really no solution?

root@xen-p02:~# date
sab 29 set 2012, 15.06.25, CEST

root@xen-p02:~# hwclock --debug
hwclock from util-linux-ng 2.17.2
Using /dev interface to clock.
Last drift adjustment done at 1348816781 seconds after 1969
Last calibration done at 1348816781 seconds after 1969
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2012/09/29 12:16:58
Hw clock time : 2012/09/29 12:16:58 = 1348921018 seconds since 1969
sab 29 set 2012 14:16:58 CEST  -0.751536 seconds

root@xen-p02:~# hwclock --show
sab 29 set 2012 14:17:12 CEST  -0.423643 seconds
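
The size of the jump can be read off the two outputs above by subtracting the hardware clock time from the system time. A minimal sketch, with both timestamps copied by hand from the `date` and `hwclock --debug` output (both in CEST); the variable names are illustrative.

```python
from datetime import datetime

# Values transcribed from the command output above (local time, CEST).
system_time = datetime(2012, 9, 29, 15, 6, 25)     # from `date`
hardware_clock = datetime(2012, 9, 29, 14, 16, 58)  # from `hwclock --debug`

offset = system_time - hardware_clock
print(offset)
print(offset.total_seconds(), "seconds")
```

The two commands were run one after the other rather than at the same instant, so the result differs from the true offset by the few seconds between them; it is still consistent with the roughly 50-minute jump reported throughout the thread.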

Mauro

2012-Sep-29 15:13 UTC

Re: [Xen-devel] Xen 4 TSC problems

On 24 February 2011 08:16, Keir Fraser <keir.xen@gmail.com> wrote:
> Please send Xen boot output (xm dmesg). Getting it from Xen 3.2 as well
> would be interesting, if you still have it installed on any of these
> machines.

In case it is useful, here is the xm dmesg of Xen 4.0 on a Debian squeeze system:

(XEN) Xen version 4.0.1 (Debian 4.0.1-5.4) (ultrotter@debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) Sat Sep  8 19:15:46 UTC 2012
(XEN) Bootloader: GRUB 1.98+20100804-14+squeeze1
(XEN) Command line: placeholder dom0_mem=3072M loglvl=warning guest_loglvl=warning
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 2 seconds
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009f400 (usable)
(XEN)  000000000009f400 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000cfd43000 (usable)
(XEN)  00000000cfd43000 - 00000000cfd4c000 (ACPI data)
(XEN)  00000000cfd4c000 - 00000000cfd4d000 (usable)
(XEN)  00000000cfd4d000 - 00000000d0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fec00000 - 00000000fed00000 (reserved)
(XEN)  00000000fee00000 - 00000000fee10000 (reserved)
(XEN)  00000000ffc00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 000000102ffff000 (usable)
(XEN) ACPI: RSDP 000F4F20, 0024 (r2 HP    )
(XEN) ACPI: XSDT CFD43900, 007C (r1 HP     ProLiant        2   �     162E)
(XEN) ACPI: FACP CFD439C0, 00F4 (r3 HP     ProLiant        2   �     162E)
(XEN) ACPI: DSDT CFD43AC0, 30C9 (r1 HP         DSDT        1 INTL 20030228)
(XEN) ACPI: FACS CFD43100, 0040
(XEN) ACPI: SPCR CFD43140, 0050 (r1 HP     SPCRRBSU        1   �     162E)
(XEN) ACPI: MCFG CFD431C0, 003C (r1 HP     ProLiant        1             0)
(XEN) ACPI: HPET CFD43200, 0038 (r1 HP     ProLiant        2   �     162E)
(XEN) ACPI: FFFF CFD43240, 0064 (r2 HP     P61             2   �     162E)
(XEN) ACPI: SPMI CFD432C0, 0040 (r5 HP     ProLiant        1   �     162E)
(XEN) ACPI: ERST CFD43300, 01D0 (r1 HP     ProLiant        1   �     162E)
(XEN) ACPI: APIC CFD43500, 0176 (r1 HP     ProLiant        2             0)
(XEN) ACPI: FFFF CFD43680, 0176 (r1 HP     ProLiant        1   �     162E)
(XEN) ACPI: BERT CFD43800, 0030 (r1 HP     ProLiant        1   �     162E)
(XEN) ACPI: HEST CFD43840, 00BC (r1 HP     ProLiant        1   �     162E)
(XEN) System RAM: 65532MB (67105672kB)
(XEN) Domain heap initialised
(XEN) Processor #0 6:15 APIC version 20
(XEN) Processor #8 6:15 APIC version 20
(XEN) Processor #16 6:15 APIC version 20
(XEN) Processor #24 6:15 APIC version 20
(XEN) Processor #1 6:15 APIC version 20
(XEN) Processor #9 6:15 APIC version 20
(XEN) Processor #17 6:15 APIC version 20
(XEN) Processor #25 6:15 APIC version 20
(XEN) Processor #2 6:15 APIC version 20
(XEN) Processor #10 6:15 APIC version 20
(XEN) Processor #18 6:15 APIC version 20
(XEN) Processor #26 6:15 APIC version 20
(XEN) Processor #3 6:15 APIC version 20
(XEN) Processor #11 6:15 APIC version 20
(XEN) Processor #19 6:15 APIC version 20
(XEN) Processor #27 6:15 APIC version 20
(XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
(XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec80000, GSI 24-47
(XEN) IOAPIC[2]: apic_id 3, version 32, address 0xfec81000, GSI 48-71
(XEN) IOAPIC[3]: apic_id 4, version 32, address 0xfec81800, GSI 72-95
(XEN) Enabling APIC mode:  Phys.  Using 4 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2400.145 MHz processor.
(XEN) Initing memory sharing.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) I/O virtualisation disabled
(XEN) Total of 16 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) checking TSC synchronization across 16 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 32 KiB.
(XEN) Brought up 16 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1708000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   000000083c000000->0000000840000000 (770048 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff81708000
(XEN)  Init. ramdisk: ffffffff81708000->ffffffff81efb000
(XEN)  Phys-Mach map: ffffffff81efb000->ffffffff824fb000
(XEN)  Start info:    ffffffff824fb000->ffffffff824fb4b4
(XEN)  Page tables:   ffffffff824fc000->ffffffff82513000
(XEN)  Boot stack:    ffffffff82513000->ffffffff82514000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82800000
(XEN)  ENTRY ADDRESS: ffffffff81531200
(XEN) Dom0 has maximum 16 VCPUs
(XEN) Scrubbing Free RAM:
.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Errors and warnings
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 176kB init memory.
(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.


And here is the xm dmesg output of Xen 3.2 on a Debian lenny system running on
the same hardware; on this system I don't have clock problems:

(XEN) Xen version 3.2-1 (Debian 3.2.1-2) (waldi@debian.org) (gcc version 4.3.1 (Debian 4.3.1-2) ) Sat Jun 28 09:32:18 UTC 2008
(XEN) Command line:
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 2 seconds
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009f400 (usable)
(XEN)  000000000009f400 - 00000000000a0000 (reserved)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000cfd43000 (usable)
(XEN)  00000000cfd43000 - 00000000cfd4c000 (ACPI data)
(XEN)  00000000cfd4c000 - 00000000cfd4d000 (usable)
(XEN)  00000000cfd4d000 - 00000000d0000000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fec00000 - 00000000fed00000 (reserved)
(XEN)  00000000fee00000 - 00000000fee10000 (reserved)
(XEN)  00000000ffc00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000f2ffff000 (usable)
(XEN) System RAM: 61436MB (62911368kB)
(XEN) Xen heap: 12MB (13128kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) Processor #0 6:15 APIC version 20
(XEN) Processor #8 6:15 APIC version 20
(XEN) Processor #16 6:15 APIC version 20
(XEN) Processor #24 6:15 APIC version 20
(XEN) Processor #1 6:15 APIC version 20
(XEN) Processor #9 6:15 APIC version 20
(XEN) Processor #17 6:15 APIC version 20
(XEN) Processor #25 6:15 APIC version 20
(XEN) Processor #2 6:15 APIC version 20
(XEN) Processor #10 6:15 APIC version 20
(XEN) Processor #18 6:15 APIC version 20
(XEN) Processor #26 6:15 APIC version 20
(XEN) Processor #3 6:15 APIC version 20
(XEN) Processor #11 6:15 APIC version 20
(XEN) Processor #19 6:15 APIC version 20
(XEN) Processor #27 6:15 APIC version 20
(XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
(XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec80000, GSI 24-47
(XEN) IOAPIC[2]: apic_id 3, version 32, address 0xfec81000, GSI 48-71
(XEN) IOAPIC[3]: apic_id 4, version 32, address 0xfec81800, GSI 72-95
(XEN) Enabling APIC mode:  Phys.  Using 4 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2400.136 MHz processor.
(XEN) HVM: VMX enabled
(XEN) CPU0: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 1/8 eip 8c000
(XEN) CPU1: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 2/16 eip 8c000
(XEN) CPU2: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 3/24 eip 8c000
(XEN) CPU3: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 4/1 eip 8c000
(XEN) CPU4: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 5/9 eip 8c000
(XEN) CPU5: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 6/17 eip 8c000
(XEN) CPU6: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 7/25 eip 8c000
(XEN) CPU7: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 8/2 eip 8c000
(XEN) CPU8: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 9/10 eip 8c000
(XEN) CPU9: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 10/18 eip 8c000
(XEN) CPU10: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 11/26 eip 8c000
(XEN) CPU11: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 12/3 eip 8c000
(XEN) CPU12: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 13/11 eip 8c000
(XEN) CPU13: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 14/19 eip 8c000
(XEN) CPU14: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Booting processor 15/27 eip 8c000
(XEN) CPU15: Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz stepping 0b
(XEN) Total of 16 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) Platform timer overflows in 14998 jiffies.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 16 CPUs
(XEN) AMD IOMMU: Disabled
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, lsb, paddr 0x200000 -> 0x631918
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   00000008e0000000->00000008f0000000 (15422585 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff80200000->ffffffff80631918
(XEN)  Init. ramdisk: ffffffff80632000->ffffffff80d40e00
(XEN)  Phys-Mach map: ffffffff80d41000->ffffffff8836b3c8
(XEN)  Start info:    ffffffff8836c000->ffffffff8836c4a4
(XEN)  Page tables:   ffffffff8836d000->ffffffff883b4000
(XEN)  Boot stack:    ffffffff883b4000->ffffffff883b5000
(XEN)  TOTAL:         ffffffff80000000->ffffffff88800000
(XEN)  ENTRY ADDRESS: ffffffff80200000
(XEN) Dom0 has maximum 16 VCPUs
(XEN) Initrd len 0x70ee00, start at 0xffffffff80632000
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 104kB init memory.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users

Pasi Kärkkäinen

2012-Sep-30 15:13 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
> It's happened another time, system date 50 minutes ahead.
> There is really no solution?
>

Try a recent Xen hypervisor version: Xen 4.1.3 or 4.2.0.
It helps a lot to know whether the issue is still present in the latest
hypervisor versions.

4.0.1 is quite old already, and besides, 4.0.4 is the latest version in the
4.0 branch.

-- Pasi
> root@xen-p02:~# date
> sab 29 set 2012, 15.06.25, CEST
> 
> root@xen-p02:~# hwclock --debug
> hwclock from util-linux-ng 2.17.2
> Using /dev interface to clock.
> Last drift adjustment done at 1348816781 seconds after 1969
> Last calibration done at 1348816781 seconds after 1969
> Hardware clock is on UTC time
> Assuming hardware clock is kept in UTC time.
> Waiting for clock tick...
> ...got clock tick
> Time read from Hardware Clock: 2012/09/29 12:16:58
> Hw clock time : 2012/09/29 12:16:58 = 1348921018 seconds since 1969
> sab 29 set 2012 14:16:58 CEST  -0.751536 seconds
> 
> root@xen-p02:~# hwclock --show
> sab 29 set 2012 14:17:12 CEST  -0.423643 seconds
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Mauro

2012-Sep-30 19:23 UTC

Re: [Xen-devel] Re: Xen 4 TSC problems

On 30 September 2012 17:13, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
>> It's happened another time, system date 50 minutes ahead.
>> There is really no solution?
>>
>
> Try with a recent Xen hypervisor version. Xen 4.1.3 or 4.2.0.
> It helps a lot to know if the issue is still in the latest hypervisor
> versions or not.

I'm using the Debian squeeze Xen kernel, and this kernel has Xen 4.0.


Mauro

2012-Sep-30 20:19 UTC

Re: [Xen-devel] Re: Xen 4 TSC problems

On 30 September 2012 21:23, Mauro <mrsanna1@gmail.com> wrote:
> On 30 September 2012 17:13, Pasi Kärkkäinen <pasik@iki.fi> wrote:
>> On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
>>> It's happened another time, system date 50 minutes ahead.
>>> There is really no solution?
>>>
>>
>> Try with a recent Xen hypervisor version. Xen 4.1.3 or 4.2.0.
>> It helps a lot to know if the issue is still in the latest hypervisor
>> versions or not.

Is there anyone who had this problem and solved it by using a recent Xen
hypervisor?


Zary Matej

2012-Oct-01 11:39 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>From: xen-users-bounces@lists.xen.org [xen-users-bounces@lists.xen.org] On Behalf Of Mauro [mrsanna1@gmail.com]
>On 30 September 2012 17:13, Pasi Kärkkäinen <pasik@iki.fi> wrote:
>> On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
>>> It's happened another time, system date 50 minutes ahead.
>>> There is really no solution?
>>>
>>
>> Try with a recent Xen hypervisor version. Xen 4.1.3 or 4.2.0.
>> It helps a lot to know if the issue is still in the latest hypervisor
>> versions or not.
>
>I'm using debian squeeze xen kernel and this kernel has xen 4.0.
>

Xen and the kernel are two different things; you can mix and match them in many
ways. If you don't want to compile from source, use the Xen packages from
Debian Wheezy (4.1.3-2), Sid (4.1.3-3) or Experimental (4.2.0-1). I doubt you
will move forward without trying newer versions, no matter how many mails you
write to the list (all the people) ... :)

Matej

Olivier Hanesse

2012-Oct-15 07:39 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

Hello,

Strange things on this rainy morning: I got this bug again on different
hardware, with the Xen hypervisor from Wheezy and the kernel from Squeeze.

ii  linux-image-2.6.32-5-xen-amd64      2.6.32-46                  Linux 2.6.32 for 64-bit PCs, Xen dom0 support
ii  xen-hypervisor-4.1-amd64               4.1.3-2                      Xen Hypervisor on AMD64

I upgraded this server last week. I don't know if it is linked or not,
but I didn't get any 'time wrap' on this server for more than 250 days.
Maybe it is related to the upgrade process. Before the upgrade, my versions
were:

ii  linux-image-2.6.32-5-xen-amd64   2.6.32-41squeeze2            Linux 2.6.32 for 64-bit PCs, Xen dom0 support
ii  xen-hypervisor-4.1-amd64            4.1.2-2                      Xen Hypervisor on AMD64

I only upgraded half of my servers, so I will wait a little bit before
upgrading the other half and see if this issue occurs again only on updated
servers.

For the record, always the same errors:

xm dmesg => (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.

/var/log/*   => Oct 14 22:46:07 eul2400468 kernel: [734618.562219] Clocksource tsc unstable (delta = -2999660313370 ns)

I thought this issue was hardware related; maybe not.
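
[Editor's note] The two log signatures above are the ones worth checking for when collecting data on this bug. A minimal sketch, assuming the log text has already been gathered into a string (for example from xm dmesg and the kernel log); the function name scan_log and the regular expressions are illustrative, not part of any Xen or Debian tool:

```python
import re

# Patterns derived from the two messages quoted in this thread.
WRAP_RE = re.compile(r"Platform timer appears to have unexpectedly wrapped")
# Accepts both "(delta = -N ns)" and "(delta -N ns)" spellings.
DELTA_RE = re.compile(r"Clocksource tsc unstable \(delta\s*=?\s*(-?\d+) ns\)")


def scan_log(text):
    """Return (wrap_warning_seen, [delta in minutes, ...]) for a log excerpt."""
    wrap_seen = bool(WRAP_RE.search(text))
    deltas = [int(m.group(1)) / 1e9 / 60 for m in DELTA_RE.finditer(text)]
    return wrap_seen, deltas


# Sample built from the lines reported above (joined onto single lines).
sample = (
    "(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.\n"
    "Oct 14 22:46:07 eul2400468 kernel: [734618.562219] "
    "Clocksource tsc unstable (delta = -2999660313370 ns)\n"
)

wrap_seen, deltas = scan_log(sample)
print(wrap_seen, [round(d, 2) for d in deltas])
```

For the sample, the reported delta converts to just under 50 minutes, which matches the size of the clock jump described in the thread.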

Olivier

2012/9/30 Mauro <mrsanna1@gmail.com>
> On 30 September 2012 21:23, Mauro <mrsanna1@gmail.com> wrote:
> > On 30 September 2012 17:13, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> >> On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
> >>> It's happened another time, system date 50 minutes ahead.
> >>> There is really no solution?
> >>>
> >>
> >> Try with a recent Xen hypervisor version. Xen 4.1.3 or 4.2.0.
> >> It helps a lot to know if the issue is still in the latest hypervisor
> >> versions or not.
>
> There is someone that had the problem and solved using a recent xen
> hypervisor?
>


<Philippe.Simonet@swisscom.com>

2012-Oct-15 08:05 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

Hi Olivier,

Bad news: this means that Xen 4.1.xxx doesn't solve this issue ...

On my side, on the hardware that produces this bug, I never had the problem with
this combination (100% Wheezy):
ii  xen-hypervisor-4.1-amd64                  4.1.3-2                   amd64    Xen Hypervisor on AMD64
ii  linux-image-3.2.0-3-amd64                 3.2.23-1                  amd64    Linux 3.2 for 64-bit PCs

(This was because I had great hope that 4.1.xxx had solved the problem ...)

Philippe




Mauro

2012-Oct-15 09:39 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

Really, really bad news. I hope this annoying problem will be resolved very soon.



Jan Beulich

2012-Oct-15 10:32 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 15.10.12 at 09:39, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> For the record, always the same errors :
>
> xm dmesg => (XEN) Platform timer appears to have unexpectedly wrapped 10 or
> more times.

This is what needs to be analyzed: For it to happen, timer softirqs
must not occur for quite long a period of time, and it needs to be
determined what that is. Since only very few people can actually
see this problem, we depend on at least some data collection
(including figuring out what hardware and/or software components
are involved in surfacing the problem) being done by them.

Jan

Mauro

2012-Oct-15 11:24 UTC

Re: [Xen-devel] Re: Xen 4 TSC problems

On 15 October 2012 12:32, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 15.10.12 at 09:39, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
>> For the record, always the same errors :
>>
>> xm dmesg => (XEN) Platform timer appears to have unexpectedly wrapped 10 or
>> more times.
>
> This is what needs to be analyzed: For it to happen, timer softirqs
> must not occur for quite long a period of time, and it needs to be
> determined what that is. Since only very few people can actually
> see this problem, we depend on at least some data collection
> (including figuring out what hardware and/or software components
> are involved in surfacing the problem) being done by them.

I have the problem on this hardware type:

HP ProLiant DL580 G5 with four Intel(R) Xeon(R) CPU E7330  @ 2.40GHz.
It seems that
GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
put in /etc/default/grub (I use Debian Linux)
solves the problem for me.

Jan Beulich

2012-Oct-15 12:49 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
> I have the problem on this hardware type:
>
> HP ProLiant DL580 G5 with four Intel(R) Xeon(R) CPU E7330  @ 2.40GHz.
> It seems that
> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
> put in /etc/default/grub (I use Debian Linux)
> solves the problem for me.

Did you check whether either or both options on their own also
make the problem go away?

Jan

Mauro

2012-Oct-15 14:25 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
>> I have the problem on this hardware type:
>>
>> HP ProLiant DL580 G5 with four Intel(R) Xeon(R) CPU E7330  @ 2.40GHz.
>> It seems that
>> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
>> put in /etc/default/grub (I use Debian Linux)
>> solves the problem for me.
>
> Did you check whether either or both options on their own also
> make the problem go away?

clocksource=pit on its own does not solve the problem. I have not yet tried
cpuidle=0 on its own; I will try soon.

Keir Fraser

2012-Oct-17 16:15 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On 15/10/2012 15:25, "Mauro" <mrsanna1@gmail.com> wrote:

> On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
>>> I have the problem on this hardware type:
>>>
>>> HP ProLiant DL580 G5 with four Intel(R) Xeon(R) CPU E7330  @ 2.40GHz.
>>> It seems that
>>> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
>>> put in /etc/default/grub (I use Debian Linux)
>>> solves the problem for me.
>>
>> Did you check whether either or both options on their own also
>> make the problem go away?
>
> Only clocksource=pit does not solve the problem, I've not tried with
> only cpuidle=0, I will try soon.

The problem here is that the platform timer has *not* wrapped. In fact it is
almost certainly correct, and it is the calculation of current-system-time
extrapolated from the local CPU's TSC that has gone haywire. The
overflow-handling logic in plt_overflow() then propagates that incorrectness
into plt_stamp64 (up to a maximum of 10 times wrapping the platform timer's
counter). This means that platform time is incorrect (skips forward) and
soon after will infect the local time estimation for all CPUs.
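
[Editor's note] The "up to a maximum of 10 times" limit appears to account for why the jump is always about 50 minutes. A rough arithmetic sketch follows; two values are assumptions not stated in the thread: the HPET is taken to run at the nominal 14.31818 MHz (the boot log only prints "14.318MHz"), and the platform counter is taken to be 32 bits wide.

```python
# Assumption: nominal HPET frequency; the boot log only says "14.318MHz".
HPET_HZ = 14_318_180
# Assumption: width of the platform timer counter that wraps.
COUNTER_BITS = 32

# Time for the counter to wrap once, and ten times.
wrap_seconds = 2**COUNTER_BITS / HPET_HZ
ten_wraps_seconds = 10 * wrap_seconds

# Magnitude of the delta from the kernel log line quoted in this thread.
observed_seconds = 2999660313370 / 1e9

print(f"one wrap:  {wrap_seconds:.3f} s")
print(f"ten wraps: {ten_wraps_seconds:.3f} s = {ten_wraps_seconds / 60:.2f} min")
print(f"observed:  {observed_seconds:.3f} s")
```

Under these assumptions one wrap takes roughly 300 seconds, so ten wraps come to just under 50 minutes, within a millisecond of the delta in the "Clocksource tsc unstable" message.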

I've attached a patch which will (a) stop plt_overflow() from misguidedly
trying to fix up apparent platform timer overflow; and (b) print
possibly-useful diagnostics when apparent 'timer overflow' occurs. Such
lines will be prefixed "XXX plt_overflow:" in the hypervisor log. The patch is
against xen-unstable but I'm sure it must backport to older trees quite
trivially.
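
[Editor's note] The fix-up behaviour described above can be modelled in a few lines. This is a simplified sketch of the idea only, not the hypervisor code: the function name wrap_fixup, the use of seconds instead of counter ticks, and the 300-second wrap period are all illustrative assumptions.

```python
def wrap_fixup(plt_now, now, wrap_period, max_wraps=10):
    """Simplified model of the overflow fix-up.

    Add whole wrap periods to the platform time while doing so moves it
    closer to `now` (the TSC-based estimate); stop after `max_wraps`.
    Returns the adjusted platform time and the number of wraps applied.
    """
    wraps = 0
    for _ in range(max_wraps):
        plt_wrap = plt_now + wrap_period
        if abs(plt_wrap - now) > abs(plt_now - now):
            break  # one more wrap would move us further from `now`
        plt_now = plt_wrap
        wraps += 1
    return plt_now, wraps


WRAP = 300.0  # seconds per counter wrap (assumed round figure)

# Sane TSC-based estimate: no wrap is applied.
print(wrap_fixup(1000.0, 1000.5, WRAP))

# Bogus TSC-based estimate far in the future: all ten wraps are applied,
# so platform time skips forward by ten wrap periods.
print(wrap_fixup(1000.0, 1_000_000.0, WRAP))
```

The model shows why a wrong `now` is enough to cause the jump: the loop trusts the TSC-based estimate and drags a correct platform time towards it, capped only by the iteration limit.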

 -- Keir




Ian Campbell

2012-Oct-18 07:40 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On Wed, 2012-10-17 at 17:15 +0100, Keir Fraser wrote:
> @@ -540,6 +541,14 @@ static void plt_overflow(void *unused)
>          plt_wrap = __read_platform_stime(plt_stamp64 + plt_mask + 1);
>          if ( ABS(plt_wrap - now) > ABS(plt_now - now) )
>              break;
> +        rdtscll(tsc);
> +        printk("XXX plt_overflow: plt_now=%"PRIx64" plt_wrap=%"PRIx64
> +               " now=%"PRIx64" old_stamp=%"PRIx64" new_stamp=%"PRIx64
> +               " plt_stamp64=%"PRIx64" plt_mask=%"PRIx64
> +               " tsc=%"PRIx64" tsc_stamp=%"PRIx64"\n",
> +               plt_now, plt_wrap, now, old_stamp, plt_stamp, plt_stamp64,
> +               plt_mask, tsc, this_cpu(cpu_time).local_tsc_stamp);
> +        break;

Is the break here, making the following update to plt_stamp64 dead code
deliberate?

>          plt_stamp64 += plt_mask + 1;
>      }
>      if ( i != 0 )

Ian.

Keir Fraser

2012-Oct-18 07:55 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On 18/10/2012 08:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

> On Wed, 2012-10-17 at 17:15 +0100, Keir Fraser wrote:
>> @@ -540,6 +541,14 @@ static void plt_overflow(void *unused)
>>          plt_wrap = __read_platform_stime(plt_stamp64 + plt_mask + 1);
>>          if ( ABS(plt_wrap - now) > ABS(plt_now - now) )
>>              break;
>> +        rdtscll(tsc);
>> +        printk("XXX plt_overflow: plt_now=%"PRIx64" plt_wrap=%"PRIx64
>> +               " now=%"PRIx64" old_stamp=%"PRIx64" new_stamp=%"PRIx64
>> +               " plt_stamp64=%"PRIx64" plt_mask=%"PRIx64
>> +               " tsc=%"PRIx64" tsc_stamp=%"PRIx64"\n",
>> +               plt_now, plt_wrap, now, old_stamp, plt_stamp, plt_stamp64,
>> +               plt_mask, tsc, this_cpu(cpu_time).local_tsc_stamp);
>> +        break;
>
> Is the break here, making the following update to plt_stamp64 dead code
> deliberate?

Yes, it's a hack to disable the timer-has-apparently-wrapped workaround.

 -- Keir

>>          plt_stamp64 += plt_mask + 1;
>>      }
>>      if ( i != 0 )
>
> Ian.

Ian Campbell

2012-Oct-18 08:33 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On Thu, 2012-10-18 at 08:55 +0100, Keir Fraser wrote:
> On 18/10/2012 08:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:
> > Is the break here, making the following update to plt_stamp64 dead code
> > deliberate?
>
> Yes, it's a hack to disable the timer-has-apparently-wrapped workaround.

OK, good.

I wonder if this explains some of the issues which have been plaguing
Debian Squeeze (4.0 based) for a while now. I'll see if I can get
someone there to give it a go.

Ian.

Mauro

2012-Oct-18 08:56 UTC

Re: [Xen-devel] Re: Xen 4 TSC problems

On 18 October 2012 10:33, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> I wonder if this explains some of the issues which have been plaguing
> Debian Squeeze (4.0 based) for a while now. I'll see if I can get
> someone there to give it a go.

If that patch works, the Debian kernel maintainers could be asked to
include it and release a new kernel that works on squeeze.

Ian Campbell

2012-Oct-18 09:36 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On Thu, 2012-10-18 at 09:56 +0100, Mauro wrote:
> On 18 October 2012 10:33, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > I wonder if this explains some of the issues which have been plaguing
> > Debian Squeeze (4.0 based) for a while now. I'll see if I can get
> > someone there to give it a go.
>
> If that patch works debian kernel maintainers can be advised if they
> can include that patch and release a new kernel working for squeeze.

AFAIK this is a debug patch, not something we would deploy as is, but it
should give us the information required to work out a real fix.

Ian

<Philippe.Simonet@swisscom.com>

2012-Oct-18 13:45 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

In the meantime, it would be cool to have a kernel boot parameter that could
disable this wrapping 'correction', something like <check-timer-wrap=false>?

Keir Fraser

2012-Oct-18 16:43 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

We have no idea yet whether this workaround even does any good.

 -- Keir

On 18/10/2012 14:45, "Philippe.Simonet@swisscom.com"
<Philippe.Simonet@swisscom.com> wrote:
> in the meantime, it would be cool to have a kernel boot parameter that
could
> disable this wrapping''
> correction'' ? like <check-timer-wrap=false>
> 
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>> bounces@lists.xen.org] On Behalf Of Ian Campbell
>> Sent: Thursday, October 18, 2012 9:40 AM
>> To: Keir Fraser
>> Cc: Jeremy Fitzhardinge; xen-devel@lists.xensource.com; Dan
>> Magenheimer; Mauro; Olivier Hanesse; Jan Beulich; Xen Users; Mark Adams
>> Subject: Re: [Xen-devel] [Xen-users] Re: Xen 4 TSC problems
>> 
>> On Wed, 2012-10-17 at 17:15 +0100, Keir Fraser wrote:
>>> @@ -540,6 +541,14 @@ static void plt_overflow(void *unused)
>>>          plt_wrap = __read_platform_stime(plt_stamp64 + plt_mask + 1);
>>>          if ( ABS(plt_wrap - now) > ABS(plt_now - now) )
>>>              break;
>>> +        rdtscll(tsc);
>>> +        printk("XXX plt_overflow: plt_now=%"PRIx64" plt_wrap=%"PRIx64
>>> +               " now=%"PRIx64" old_stamp=%"PRIx64" new_stamp=%"PRIx64
>>> +               " plt_stamp64=%"PRIx64" plt_mask=%"PRIx64
>>> +               " tsc=%"PRIx64" tsc_stamp=%"PRIx64"\n",
>>> +               plt_now, plt_wrap, now, old_stamp, plt_stamp, plt_stamp64,
>>> +               plt_mask, tsc, this_cpu(cpu_time).local_tsc_stamp);
>>> +        break;
>> 
>> Is the break here, making the following update to plt_stamp64 dead code
>> deliberate?
>> 
>>>          plt_stamp64 += plt_mask + 1;
>>>      }
>>>      if ( i != 0 )
>> 
>> Ian.
>> 
>> 
>> 

Mauro

2012-Oct-21 20:52 UTC

Re: [Xen-devel] Re: Xen 4 TSC problems

On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
>> I have the problem on this hardware type:
>>
>> Hp Proliant DL580 G5 with four Intel(R) Xeon(R) CPU E7330  @ 2.40GHz.
>> It seems that
>> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
>> put in /etc/default/grub (I use Debian Linux)
>> solves the problem for me.
>
> Did you check whether either or both options on their own also
> make the problem go away?

It seems that with Debian squeeze on my HP ProLiant DL580 G5 servers
it is sufficient to use
GRUB_CMDLINE_XEN="cpuidle=0".
I have had no clock jumps for about 20 days.
Before that I had a clock jump every week.
I hope this is the final workaround for me.
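
For readers who want to try the same setting on Debian, here is a minimal shell sketch of changing the GRUB_CMDLINE_XEN line mentioned above. The helper name set_xen_cmdline is made up for illustration and is not part of GRUB or Xen; the sketch assumes GNU sed and that the option string contains no '|' or '&' characters.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRUB or Xen): set GRUB_CMDLINE_XEN in a
# grub defaults file, replacing an existing assignment or appending one.
# Assumes GNU sed (-i) and that the options contain no '|' or '&'.
set_xen_cmdline() {
    file=$1
    opts=$2
    if grep -q '^GRUB_CMDLINE_XEN=' "$file"; then
        # An assignment already exists: rewrite that line in place.
        sed -i "s|^GRUB_CMDLINE_XEN=.*|GRUB_CMDLINE_XEN=\"$opts\"|" "$file"
    else
        # No assignment yet: append one.
        printf 'GRUB_CMDLINE_XEN="%s"\n' "$opts" >> "$file"
    fi
}

# Example (as root), followed by regenerating grub.cfg and rebooting:
#   set_xen_cmdline /etc/default/grub "cpuidle=0"
#   update-grub
```

The change only takes effect after update-grub has been run and the host has been rebooted into Xen.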

Jan Beulich

2012-Oct-22 06:54 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 21.10.12 at 22:52, Mauro <mrsanna1@gmail.com> wrote:
> On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
>>> I have the problem on this hardware type:
>>>
>>> Hp Proliant DL580 G5 with four Intel(R) Xeon(R) CPU E7330  @ 2.40GHz.
>>> It seems that
>>> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
>>> put in /etc/default/grub (I use Debian Linux)
>>> solves the problem for me.
>>
>> Did you check whether either or both options on their own also
>> make the problem go away?
> 
> It seems that with Debian squeeze on my HP ProLiant DL580 G5 servers
> it is sufficient to use
> GRUB_CMDLINE_XEN="cpuidle=0".
> I have had no clock jumps for about 20 days.
> Before that I had a clock jump every week.
> I hope this is the final workaround for me.

So what's the contents of /proc/cpuinfo (any one CPU suffices)
under a native recent kernel on that system? The most likely
issue here is that we're mis-identifying the CPU as having an
always running APIC timer (ARAT)...

For a second, less intrusive try: Could you replace "cpuidle=0"
with "max_cstate=1" (assuming the former didn't meanwhile
turn out not to cure the problem)? If that works too (expected),
try "max_cstate=2" followed eventually by
"max_cstate=2 local_apic_timer_c2_ok".

Jan

Mauro

2012-Oct-22 09:17 UTC

Re: [Xen-devel] Re: Xen 4 TSC problems

On 22 October 2012 08:54, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 21.10.12 at 22:52, Mauro <mrsanna1@gmail.com> wrote:
>> On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
>>>> I have the problem on this hardware type:
>>>>
>>>> Hp Proliant DL580 G5 with four Intel(R) Xeon(R) CPU E7330  @ 2.40GHz.
>>>> It seems that
>>>> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
>>>> put in /etc/default/grub (I use Debian Linux)
>>>> solves the problem for me.
>>>
>>> Did you check whether either or both options on their own also
>>> make the problem go away?
>>
>> It seems that with Debian squeeze on my HP ProLiant DL580 G5 servers
>> it is sufficient to use
>> GRUB_CMDLINE_XEN="cpuidle=0".
>> I have had no clock jumps for about 20 days.
>> Before that I had a clock jump every week.
>> I hope this is the final workaround for me.
>
> So what's the contents of /proc/cpuinfo (any one CPU suffices)
> under a native recent kernel on that system? The most likely
> issue here is that we're mis-identifying the CPU as having an
> always running APIC timer (ARAT)...

uname -a

Linux xen-p01 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012
x86_64 GNU/Linux

cat /proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.176
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.35
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

> For a second, less intrusive try: Could you replace "cpuidle=0"
> with "max_cstate=1" (assuming the former didn't meanwhile
> turn out not to cure the problem)? If that works too (expected),
> try "max_cstate=2" followed eventually by
> "max_cstate=2 local_apic_timer_c2_ok".

I'll try, but before I can say whether it works I have to wait at least
two weeks.

Jan Beulich

2012-Oct-22 09:27 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 22.10.12 at 11:17, Mauro <mrsanna1@gmail.com> wrote:
> On 22 October 2012 08:54, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 21.10.12 at 22:52, Mauro <mrsanna1@gmail.com> wrote:
>>> On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>> So what's the contents of /proc/cpuinfo (any one CPU suffices)
>> under a native recent kernel on that system? The most likely
>> issue here is that we're mis-identifying the CPU as having an
>> always running APIC timer (ARAT)...
> 
> uname -a
> 
> Linux xen-p01 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012
> x86_64 GNU/Linux

I had specifically asked to do this under a _native_ kernel.

> cat /proc/cpuinfo
> 
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 15
> model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
> stepping        : 11
> cpu MHz         : 2400.176
> cache size      : 3072 KB
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 10
> wp              : yes
> flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
> pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
> rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
> bogomips        : 4800.35
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
> 
>>
>> For a second, less intrusive try: Could you replace "cpuidle=0"
>> with "max_cstate=1" (assuming the former didn't meanwhile
>> turn out not to cure the problem)? If that works too (expected),
>> try "max_cstate=2" followed eventually by
>> "max_cstate=2 local_apic_timer_c2_ok".
>
> I'll try, but before I can say whether it works I have to wait at least
> two weeks.

I understand that this takes quite a bit of time.

Jan

Mauro

2012-Oct-22 10:40 UTC

Re: [Xen-devel] Re: Xen 4 TSC problems

On 22 October 2012 11:27, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 22.10.12 at 11:17, Mauro <mrsanna1@gmail.com> wrote:
>> On 22 October 2012 08:54, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 21.10.12 at 22:52, Mauro <mrsanna1@gmail.com> wrote:
>>>> On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>> So what's the contents of /proc/cpuinfo (any one CPU suffices)
>>> under a native recent kernel on that system? The most likely
>>> issue here is that we're mis-identifying the CPU as having an
>>> always running APIC timer (ARAT)...
>>
>> uname -a
>>
>> Linux xen-p01 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012
>> x86_64 GNU/Linux
>
> I had specifically asked to do this under a _native_ kernel.

Sorry for my ignorance, what does "native kernel" mean?

Jan Beulich

2012-Oct-22 12:06 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 22.10.12 at 12:40, Mauro <mrsanna1@gmail.com> wrote:
> On 22 October 2012 11:27, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 22.10.12 at 11:17, Mauro <mrsanna1@gmail.com> wrote:
>>> On 22 October 2012 08:54, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 21.10.12 at 22:52, Mauro <mrsanna1@gmail.com> wrote:
>>>>> On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>>> So what's the contents of /proc/cpuinfo (any one CPU suffices)
>>>> under a native recent kernel on that system? The most likely
>>>> issue here is that we're mis-identifying the CPU as having an
>>>> always running APIC timer (ARAT)...
>>>
>>> uname -a
>>>
>>> Linux xen-p01 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012
>>> x86_64 GNU/Linux
>>
>> I had specifically asked to do this under a _native_ kernel.
>
> Sorry for my ignorance, what does "native kernel" mean?

A kernel run with no Xen underneath it.

Jan

Mauro

2012-Oct-23 07:19 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

> A kernel run with no Xen underneath it.

Here is:

uname -a
Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
x86_64 GNU/Linux

cat /proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2399.822
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf
pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm
tpr_shadow vnmi flexpriority
bogomips        : 4799.64
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
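
The flags line above can be checked mechanically for the timer-related features discussed in this thread. The following is a minimal shell sketch; the function names has_cpu_flag and report_timer_flags are made up for illustration, and it assumes the kernel prints the whole flags list on a single line, as /proc/cpuinfo does (the wrapping above comes from the mail client).

```shell
#!/bin/sh
# Report whether a CPU feature flag is listed in a cpuinfo-style file.
# Usage: has_cpu_flag FLAG [FILE]   (FILE defaults to /proc/cpuinfo)
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Take the first "flags" line, split it into one word per line,
    # and look for an exact match so "tsc" does not match "constant_tsc".
    grep -m1 '^flags' "$file" | tr ' \t' '\n\n' | grep -qx "$flag"
}

# Print yes/no for the flags relevant to this thread.
# Usage: report_timer_flags [FILE]
report_timer_flags() {
    file=${1:-/proc/cpuinfo}
    for f in constant_tsc nonstop_tsc arat; do
        if has_cpu_flag "$f" "$file"; then
            echo "$f: yes"
        else
            echo "$f: no"
        fi
    done
}

# Example:
#   report_timer_flags
```

A flag that is absent from the output of an older kernel is not conclusive, because that kernel may simply not know the flag name; this is one reason to collect the data under a recent native kernel.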

Jan Beulich

2012-Oct-23 07:58 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>>  A kernel run with no Xen underneath it.
> 
> Here is:
> 
> uname -a
> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
> x86_64 GNU/Linux

I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
I had asked for). The thing is that the (unfortunately incomplete)
hypervisor log you sent earlier leaves open whether we mis-detect
ARAT on that system (the relevant HPET message is info level, but
you had "loglvl=warning" in place), so we can't be certain of either
fact (ARAT actually being reported by the CPU as well as whether
HPET broadcast is getting set up).

Irrespective of that it would also be useful to know whether the
native kernel (and in particular its CPU idle management) works
on that system, and which C-states it actually makes use of. So
getting us the contents of the respective sysfs nodes would also
be helpful for reference.

Jan
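
One possible way to collect the C-state information requested above on a native Linux kernel is to read the cpuidle directory in sysfs. This is a minimal shell sketch, not something posted in the thread; the function name list_cstates is made up for illustration, and it assumes the kernel exposes /sys/devices/system/cpu/cpuN/cpuidle/stateM/ with name, usage and time files (the directory is absent when cpuidle is not in use).

```shell
#!/bin/sh
# List the C-states that Linux's cpuidle framework exposes for one CPU.
# Usage: list_cstates [DIR]   (DIR defaults to cpu0's cpuidle directory)
list_cstates() {
    base=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for d in "$base"/state*; do
        # Skip the unexpanded glob when no state directories exist.
        [ -d "$d" ] || continue
        # name: state label, usage: entry count, time: residency in microseconds
        printf '%s name=%s usage=%s time_us=%s\n' \
            "${d##*/}" "$(cat "$d/name")" "$(cat "$d/usage")" "$(cat "$d/time")"
    done
}

# Example:
#   list_cstates
#   list_cstates /sys/devices/system/cpu/cpu3/cpuidle
```

States whose usage counter stays at zero after the machine has been idle for a while are presumably not being entered by that kernel.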

Mauro

2012-Oct-23 08:40 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>>>  A kernel run with no Xen underneath it.
>>
>> Here is:
>>
>> uname -a
>> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
>> x86_64 GNU/Linux
>
> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
> I had asked for).

Sorry, I'm using squeeze on production servers and I don't have a test
machine on which to install wheezy.
Is there nothing else I can do?
As reported before, with the cpuidle=0 workaround I don't have any
problem now.

Jan Beulich

2012-Oct-23 08:50 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 23.10.12 at 10:40, Mauro <mrsanna1@gmail.com> wrote:
> On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>>>>  A kernel run with no Xen underneath it.
>>>
>>> Here is:
>>>
>>> uname -a
>>> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
>>> x86_64 GNU/Linux
>>
>> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
>> I had asked for).
>
> Sorry, I'm using squeeze on production servers and I don't have a test
> machine on which to install wheezy.

And can't/don't want to install a self-built kernel?

> Is there nothing else I can do?

I told you yesterday what less invasive command line options you
could try. Plus in an earlier mail today I also asked for specific
information on which C-states the native kernel uses. If all you've
got is 2.6.32, obtaining the information there is better than nothing.

Jan
> As reported before, with the cpuidle=0 workaround I don't have any
> problem now.

Konrad Rzeszutek Wilk

2012-Oct-23 11:50 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On Tue, Oct 23, 2012 at 09:50:13AM +0100, Jan Beulich wrote:
> >>> On 23.10.12 at 10:40, Mauro <mrsanna1@gmail.com> wrote:
> > On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
> >>>>  A kernel run with no Xen underneath it.
> >>>
> >>> Here is:
> >>>
> >>> uname -a
> >>> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
> >>> x86_64 GNU/Linux
> >>
> >> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
> >> I had asked for).
> >
> > Sorry, I'm using squeeze on production servers and I don't have a test
> > machine on which to install wheezy.
>
> And can't/don't want to install a self-built kernel?

You could also boot one of those Live Image kernels, such as a Fedora or
Ubuntu one, and just capture this. That way you don't overwrite anything.

Mauro

2012-Oct-23 14:07 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On 23 October 2012 13:50, Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
> On Tue, Oct 23, 2012 at 09:50:13AM +0100, Jan Beulich wrote:
>> >>> On 23.10.12 at 10:40, Mauro <mrsanna1@gmail.com> wrote:
>> > On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
>> >>>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>> >>>>  A kernel run with no Xen underneath it.
>> >>>
>> >>> Here is:
>> >>>
>> >>> uname -a
>> >>> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
>> >>> x86_64 GNU/Linux
>> >>
>> >> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
>> >> I had asked for).
>> >
>> > Sorry, I'm using squeeze on production servers and I don't have a test
>> > machine on which to install wheezy.
>>
>> And can't/don't want to install a self-built kernel?
>
> You could also boot one of those Live Image kernels, such as a Fedora or
> Ubuntu one, and just capture this. That way you don't overwrite anything.

OK, I'll try the live image.
Today I had another clock jump, so cpuidle=0 doesn't work.
I'll keep using clocksource=pit and cpuidle=0 for a while to see if
they work together.
But how can I find out which C-states the kernel uses?

Jan Beulich

2012-Oct-23 14:43 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 23.10.12 at 16:07, Mauro <mrsanna1@gmail.com> wrote:
> On 23 October 2012 13:50, Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
>> On Tue, Oct 23, 2012 at 09:50:13AM +0100, Jan Beulich wrote:
>>> >>> On 23.10.12 at 10:40, Mauro <mrsanna1@gmail.com> wrote:
>>> > On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
>>> >>>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>>> >>>>  A kernel run with no Xen underneath it.
>>> >>>
>>> >>> Here is:
>>> >>>
>>> >>> uname -a
>>> >>> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
>>> >>> x86_64 GNU/Linux
>>> >>
>>> >> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
>>> >> I had asked for).
>>> >
>>> > Sorry, I'm using squeeze on production servers and I don't have a test
>>> > machine on which to install wheezy.
>>>
>>> And can't/don't want to install a self-built kernel?
>>
>> You could also boot one of those Live Image kernels, such as a Fedora or
>> Ubuntu one, and just capture this. That way you don't overwrite anything.
>
> OK, I'll try the live image.
> Today I had another clock jump, so cpuidle=0 doesn't work.
> I'll keep using clocksource=pit and cpuidle=0 for a while to see if
> they work together.
> But how can I find out which C-states the kernel uses?

If "cpuidle=0" alone doesn't work, your problem is not C-state
related, and you don't need to look up how much of it the native
kernel uses.

Jan

Mauro

2012-Oct-23 14:46 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On 23 October 2012 16:43, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 23.10.12 at 16:07, Mauro <mrsanna1@gmail.com> wrote:
>> On 23 October 2012 13:50, Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
>>> On Tue, Oct 23, 2012 at 09:50:13AM +0100, Jan Beulich wrote:
>>>> >>> On 23.10.12 at 10:40, Mauro <mrsanna1@gmail.com> wrote:
>>>> > On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
>>>> >>>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>>>> >>>>  A kernel run with no Xen underneath it.
>>>> >>>
>>>> >>> Here is:
>>>> >>>
>>>> >>> uname -a
>>>> >>> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
>>>> >>> x86_64 GNU/Linux
>>>> >>
>>>> >> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
>>>> >> I had asked for).
>>>> >
>>>> > Sorry, I'm using squeeze on production servers and I don't have a test
>>>> > machine on which to install wheezy.
>>>>
>>>> And can't/don't want to install a self-built kernel?
>>>
>>> You could also boot one of those Live Image kernels, such as a Fedora or
>>> Ubuntu one, and just capture this. That way you don't overwrite anything.
>>
>> OK, I'll try the live image.
>> Today I had another clock jump, so cpuidle=0 doesn't work.
>> I'll keep using clocksource=pit and cpuidle=0 for a while to see if
>> they work together.
>> But how can I find out which C-states the kernel uses?
>
> If "cpuidle=0" alone doesn't work, your problem is not C-state
> related, and you don't need to look up how much of it the native
> kernel uses.

OK, but cpuidle=0 still has to be used, because clocksource=pit alone
doesn't work either.
Now I'll use both parameters and see what happens.

Mauro

2012-Oct-23 15:34 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

On 23 October 2012 16:46, Mauro <mrsanna1@gmail.com> wrote:
> On 23 October 2012 16:43, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 23.10.12 at 16:07, Mauro <mrsanna1@gmail.com> wrote:
>>> On 23 October 2012 13:50, Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
>>>> On Tue, Oct 23, 2012 at 09:50:13AM +0100, Jan Beulich wrote:
>>>>> >>> On 23.10.12 at 10:40, Mauro <mrsanna1@gmail.com> wrote:
>>>>> > On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> >>>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>>>>> >>>>  A kernel run with no Xen underneath it.
>>>>> >>>
>>>>> >>> Here is:
>>>>> >>>
>>>>> >>> uname -a
>>>>> >>> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
>>>>> >>> x86_64 GNU/Linux
>>>>> >>
>>>>> >> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
>>>>> >> I had asked for).
>>>>> >
>>>>> > Sorry, I'm using squeeze on production servers and I don't have a test
>>>>> > machine on which to install wheezy.
>>>>>
>>>>> And can't/don't want to install a self-built kernel?
>>>>
>>>> You could also boot one of those Live Image kernels, such as a Fedora or
>>>> Ubuntu one, and just capture this. That way you don't overwrite anything.
>>>
>>> OK, I'll try the live image.
>>> Today I had another clock jump, so cpuidle=0 doesn't work.
>>> I'll keep using clocksource=pit and cpuidle=0 for a while to see if
>>> they work together.
>>> But how can I find out which C-states the kernel uses?
>>
>> If "cpuidle=0" alone doesn't work, your problem is not C-state
>> related, and you don't need to look up how much of it the native
>> kernel uses.
>
> OK, but cpuidle=0 still has to be used, because clocksource=pit alone
> doesn't work either.
> Now I'll use both parameters and see what happens.

Sorry for the noise, I'm not sure whether it really was a clock jump.
I'll retry using only cpuidle=0 and then what Jan has suggested.
If you want to see the C-states, tell me how to do it.

P.S. Sorry for my bad English.

Jan Beulich

2012-Oct-23 15:49 UTC

Re: [Xen-users] Re: Xen 4 TSC problems

>>> On 23.10.12 at 17:34, Mauro <mrsanna1@gmail.com> wrote:
> On 23 October 2012 16:46, Mauro <mrsanna1@gmail.com> wrote:
>> On 23 October 2012 16:43, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 23.10.12 at 16:07, Mauro <mrsanna1@gmail.com> wrote:
>>>> Today I had another clock jump, so cpuidle=0 doesn't work.
>>>> I'll keep using clocksource=pit and cpuidle=0 for a while to see if
>>>> they work together.
>>>> But how can I find out which C-states the kernel uses?
>>>
>>> If "cpuidle=0" alone doesn't work, your problem is not C-state
>>> related, and you don't need to look up how much of it the native
>>> kernel uses.
>>
>> OK, but cpuidle=0 still has to be used, because clocksource=pit alone
>> doesn't work either.
>> Now I'll use both parameters and see what happens.
>
> Sorry for the noise, I'm not sure whether it really was a clock jump.
> I'll retry using only cpuidle=0 and then what Jan has suggested.
> If you want to see the C-states, tell me how to do it.

Let's wait with that until you're settled on whether "cpuidle=0"
alone works.

Jan
