Hello

I've got an issue with timekeeping on Xen 4.0 (Debian squeeze release). My problem is described here (hopefully I'm not the only one, so there might be a bug somewhere): http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599161#50

After some time, I get this error: Clocksource tsc unstable (delta = -2999660334211 ns). It has happened on several servers.

Looking at the output of "xm debug-key s":

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2850 (count=3)

I am using an "Intel(R) Xeon(R) CPU L5420 @ 2.50GHz", which has the "constant_tsc" flag but not "nonstop_tsc". On other systems with a newer CPU that has "nonstop_tsc", I don't have this issue (the systems run the same distro with the same config).

I tried booting with "max_cstate=0", but nothing changed: my TSC is still not reliable, and after some time I get the "50min" issue again.

I don't understand how a system can jump 50 minutes into the future. Why 50 minutes? It is not 40 minutes, not 1 hour; it is always 50 minutes.

I don't know how to make my TSC "reliable" (I have already disabled everything related to power states in the BIOS settings).

Any ideas?

Regards

Olivier
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
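To check which of the TSC-related flags discussed here a given host advertises, a small parser over /proc/cpuinfo is enough. This is a minimal sketch: the function names `tsc_flags` and `read_tsc_flags` are hypothetical, and it assumes the Linux flag spellings `constant_tsc` and `nonstop_tsc`. Inside a Xen PV dom0 or domU the flags the kernel sees are filtered by the hypervisor, so they may differ from what the same CPU reports on bare metal.

```python
# Sketch: report TSC-related CPU flags from /proc/cpuinfo text.
# Assumes Linux flag names; under Xen PV the list may be filtered.

TSC_FLAGS = ("tsc", "constant_tsc", "nonstop_tsc")


def tsc_flags(cpuinfo_text: str) -> dict[str, bool]:
    """Return {flag: present} based on the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in TSC_FLAGS}
    raise ValueError("no 'flags' line found in cpuinfo text")


def read_tsc_flags(path: str = "/proc/cpuinfo") -> dict[str, bool]:
    """Convenience wrapper that reads the cpuinfo file itself."""
    with open(path) as f:
        return tsc_flags(f.read())
```

On the L5420 hosts, `read_tsc_flags()` would be expected to show `constant_tsc` present and `nonstop_tsc` absent, matching the description above.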
It's very unlikely this is a problem with TSC. It is most likely a Xen (or possibly a PV Linux) problem where a guest (or dom0) either "goes out to lunch" for a long period, or some other timer gets stuck. The "clocksource tsc unstable" message is a side effect of this... it's very likely the TSC that IS stable and correct and the other clocksource (pvclock) has lost/gained 50 minutes!

Mark Adams cc'ed and his original xen-devel posting below. The fact that two different users (possibly on the same processor/system type?) have submitted the message with a delta so similar would lead me to believe there is some timer that is "wrapping". And since pvclock is usually the clocksource for dom0, and pvclock is driven by Xen's "system time", a reasonable guess is that the timer that is wrapping is in Xen itself.

Mark's delta = -2999660303788 ns
Your delta = -2999660334211 ns

Googling, I see the HPET wraparound is ~306 seconds and this delta is about 3000 seconds, so that may be a bad guess.

Keir, any thoughts on this? Do you recall any post-4.0 patches that may have fixed this?

Thanks,
Dan

References:
http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00210.html
https://lkml.org/lkml/2010/10/26/126

From: Olivier Hanesse [mailto:olivier.hanesse@gmail.com]
Sent: Wednesday, February 23, 2011 3:50 AM
To: xen-devel@lists.xensource.com; Xen Users
Subject: [Xen-devel] Xen 4 TSC problems
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
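Dan's wraparound guess can be checked with a few lines of arithmetic. The sketch below assumes the HPET is used as a 32-bit counter at its nominal 14.31818 MHz rate (the boot log posted later in this thread reports "Platform timer is 14.318MHz HPET"); neither the counter width nor the exact rate is stated in this message, so both are assumptions.

```python
# Compare the reported clocksource deltas with an HPET wrap period.
# Assumptions: 32-bit HPET counter, nominal 14.31818 MHz rate.

HPET_HZ = 14_318_180        # assumed nominal HPET rate ("14.318MHz HPET")
COUNTER_BITS = 32           # assumed counter width used for the platform timer

WRAP_S = 2**COUNTER_BITS / HPET_HZ   # seconds between counter wraps

DELTAS_NS = {"Mark": -2999660303788, "Olivier": -2999660334211}


def wraps_equivalent(delta_ns: int) -> float:
    """Express a clocksource delta (ns) as a multiple of the HPET wrap period."""
    return abs(delta_ns) / 1e9 / WRAP_S


print(f"HPET wrap period: {WRAP_S:.3f} s")
for who, delta in DELTAS_NS.items():
    print(f"{who}: {abs(delta) / 60e9:.2f} min = {wraps_equivalent(delta):.4f} wraps")
```

Under these assumptions it prints a wrap period of 299.966 s and 10.0000 wraps (49.99 minutes) for both deltas, which lines up with the "Platform timer appears to have unexpectedly wrapped 10 or more times" message Xen logs on the affected host. The match suggests, but does not prove, that ten spurious HPET wraps are being accounted.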
On 23/02/2011 16:16, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Keir, any thoughts on this? Do you recall any post-4.0 patches that may have
> fixed this?

I've never seen a 3000s wrap, and I don't know of anything that would have fixed a bug like this. If this is a Xen time wrap of some kind then it would affect all running guests; it's not clear here whether only one, or all, guests see the wrap.

 K.
I am sorry for the lack of information.
Every domU on the dom0 is affected by this bug at exactly the same time.

And I have had this bug on a dozen servers (all running on the same hardware) since October (when I switched from Xen 3.2 to 4.0).

Regards

Olivier

On 23/02/2011 18:19, Keir Fraser wrote:
> If this is a Xen time wrap of some kind then it would
> affect all running guests; it's not clear here whether only one, or all,
> guests see the wrap.
Please send Xen boot output (xm dmesg). Getting it from Xen 3.2 as well would be interesting, if you still have it installed on any of these machines.

 -- Keir

On 23/02/2011 19:04, "Olivier Hanesse" <olivier.hanesse@gmail.com> wrote:

> I am sorry for the lack of information.
> Every domU on the dom0 is affected by this bug at exactly the same time.
>
> And I have had this bug on a dozen servers (all running on the same hardware)
> since October (when I switched from Xen 3.2 to 4.0).
xm dmesg:

(XEN) Xen version 4.0.1 (Debian 4.0.1-2) (waldi@debian.org) (gcc version 4.4.5 (Debian 4.4.5-10) ) Wed Jan 12 14:04:06 UTC 2011
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: none; EDID transfer time: 2 seconds
(XEN) EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009ac00 (usable)
(XEN) 000000000009ac00 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000bffc7980 (usable)
(XEN) 00000000bffc7980 - 00000000bffcee80 (ACPI data)
(XEN) 00000000bffcee80 - 00000000c0000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fec00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 00000002c0000000 (usable)
(XEN) ACPI: RSDP 000FDFD0, 0024 (r2 IBM )
(XEN) ACPI: XSDT BFFCED40, 0054 (r1 IBM SERDEFNT 1000 IBM 45444F43)
(XEN) ACPI: FACP BFFCEC80, 0084 (r2 IBM SERDEFNT 1000 IBM 45444F43)
(XEN) ACPI: DSDT BFFC7980, 2EDA (r2 IBM SERDEFNT 1000 INTL 20041203)
(XEN) ACPI: FACS BFFCAB00, 0040
(XEN) ACPI: APIC BFFCEB80, 00BC (r1 IBM SERDEFNT 1000 IBM 45444F43)
(XEN) ACPI: SRAT BFFCEA00, 0128 (r1 IBM SERDEFNT 1000 IBM 45444F43)
(XEN) ACPI: HPET BFFCE9C0, 0038 (r1 IBM SERDEFNT 1000 IBM 45444F43)
(XEN) ACPI: MCFG BFFCE980, 003C (r1 IBM SERDEFNT 1000 IBM 45444F43)
(XEN) ACPI: ERST BFFCAB40, 0230 (r1 IBM SERDEFNT 1000 IBM 45444F43)
(XEN) System RAM: 10239MB (10485124kB)
(XEN) SRAT: PXM 0 -> APIC 0 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 1 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 2 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 3 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 4 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 5 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 6 -> Node 0
(XEN) SRAT: PXM 0 -> APIC 7 -> Node 0
(XEN) SRAT: Node 0 PXM 0 0-c0000000
(XEN) SRAT: Node 0 PXM 0 100000000-2c0000000
(XEN) SRAT: hot plug zone found 2c0000000 - 1000000000
(XEN) SRAT: Node 0 PXM 0 2c0000000-1000000000
(XEN) NUMA: Allocated memnodemap from 2bfdfe000 - 2bfdff000
(XEN) NUMA: Using 18 for the hash shift.
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 0009ad40
(XEN) DMI 2.4 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x588
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[584,0], pm1x_evt[580,0]
(XEN) ACPI: wakeup_vec[bffcab0c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
(XEN) Processor #1 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
(XEN) Processor #3 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
(XEN) Processor #4 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
(XEN) Processor #5 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
(XEN) Processor #6 7:7 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
(XEN) Processor #7 7:7 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x07] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 14, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed00000
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 20
(XEN) PCI: MCFG area at e0000000 reserved in E820
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2493.798 MHz processor.
(XEN) Initing memory sharing.
(XEN) VMX: Supported advanced features:
(XEN) - APIC MMIO access virtualisation
(XEN) - APIC TPR shadow
(XEN) - Virtual NMI
(XEN) - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) Total of 8 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) checking TSC synchronization across 8 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 64 KiB.
(XEN) microcode.c:73:d32767 microcode: CPU2 resumed
(XEN) microcode.c:73:d32767 microcode: CPU1 resumed
(XEN) microcode.c:73:d32767 microcode: CPU3 resumed
(XEN) Brought up 8 CPUs
(XEN) microcode.c:73:d32767 microcode: CPU4 resumed
(XEN) microcode.c:73:d32767 microcode: CPU5 resumed
(XEN) microcode.c:73:d32767 microcode: CPU6 resumed
(XEN) microcode.c:73:d32767 microcode: CPU7 resumed
(XEN) HPET: 3 timers in total, 0 timers will be used for broadcast
(XEN) ACPI sleep modes: S3
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x16b2000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 00000002b4000000->00000002b8000000 (114688 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff81000000->ffffffff816b2000
(XEN) Init. ramdisk: ffffffff816b2000->ffffffff82e05400
(XEN) Phys-Mach map: ffffffff82e06000->ffffffff82f06000
(XEN) Start info: ffffffff82f06000->ffffffff82f064b4
(XEN) Page tables: ffffffff82f07000->ffffffff82f22000
(XEN) Boot stack: ffffffff82f22000->ffffffff82f23000
(XEN) TOTAL: ffffffff80000000->ffffffff83000000
(XEN) ENTRY ADDRESS: ffffffff81502200
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM: ................................................................................................done.
(XEN) trace.c:89:d32767 calc_tinfo_first_offset: NR_CPUs 128, offset_in_bytes 258, t_info_first_offset 65
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 176kB init memory.
(XEN) PCI add device 00:00.0
(XEN) PCI add device 00:02.0
(XEN) PCI add device 00:03.0
(XEN) PCI add device 00:04.0
(XEN) PCI add device 00:05.0
(XEN) PCI add device 00:06.0
(XEN) PCI add device 00:07.0
(XEN) PCI add device 00:08.0
(XEN) PCI add device 00:10.0
(XEN) PCI add device 00:10.1
(XEN) PCI add device 00:10.2
(XEN) PCI add device 00:11.0
(XEN) PCI add device 00:13.0
(XEN) PCI add device 00:15.0
(XEN) PCI add device 00:16.0
(XEN) PCI add device 00:1c.0
(XEN) PCI add device 00:1d.0
(XEN) PCI add device 00:1d.1
(XEN) PCI add device 00:1d.2
(XEN) PCI add device 00:1d.7
(XEN) PCI add device 00:1e.0
(XEN) PCI add device 00:1f.0
(XEN) PCI add device 00:1f.1
(XEN) PCI add device 00:1f.3
(XEN) PCI add device 10:00.0
(XEN) PCI add device 10:00.3
(XEN) PCI add device 11:00.0
(XEN) PCI add device 11:01.0
(XEN) PCI add device 07:00.0
(XEN) PCI add device 07:00.1
(XEN) PCI add device 03:00.0
(XEN) PCI add device 04:00.0
(XEN) PCI add device 02:00.0
(XEN) PCI add device 05:00.0
(XEN) PCI add device 06:00.0
(XEN) PCI add device 01:01.0

When the issue happens:

(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
Output of xm debug-key s:

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2684 (count=4)
(XEN) dom1: mode=0,ofs=0xa8dcbfb9a,khz=2493798,inc=1,vtsc count: 1756100739 kernel, 20526533 user
(XEN) dom2: mode=0,ofs=0xc257d49df,khz=2493798,inc=1,vtsc count: 900668266 kernel, 30618121 user
(XEN) dom3: mode=0,ofs=0xdb1299744,khz=2493798,inc=1,vtsc count: 16656509047 kernel, 709406217 user
(XEN) dom4: mode=0,ofs=0xf8627e616,khz=2493798,inc=1,vtsc count: 1174828915 kernel, 194957775 user
(XEN) dom5: mode=0,ofs=0x115a0f2a67,khz=2493798,inc=1,vtsc count: 332007967 kernel, 5766769 user
(XEN) dom6: mode=0,ofs=0x13bf462f38,khz=2493798,inc=1,vtsc count: 3137076938 kernel, 1076320679 user
(XEN) dom10: mode=0,ofs=0x1b99e41f4b,khz=2493798,inc=1,vtsc count: 411433049 kernel, 19532319 user
(XEN) dom11: mode=0,ofs=0x1e4991cf40,khz=2493798,inc=1,vtsc count: 415406148 kernel, 19223482 user
(XEN) dom12: mode=0,ofs=0x1fe8c10600,khz=2493798,inc=1,vtsc count: 1012850399 kernel, 63603352 user
(XEN) dom13: mode=0,ofs=0x21ef9b9531,khz=2493798,inc=1,vtsc count: 813097186 kernel, 27536004 user
(XEN) dom14: mode=0,ofs=0x23f5b4e429,khz=2493798,inc=1,vtsc count: 2461059718 kernel, 48182776 user
(XEN) dom18: mode=0,ofs=0x2bdc302048,khz=2493798,inc=1,vtsc count: 624333824 kernel, 5166805 user
(XEN) dom19: mode=0,ofs=0x2e67227085,khz=2493798,inc=1,vtsc count: 1037952789 kernel, 5778635 user
(XEN) dom20: mode=0,ofs=0x562ce020eea4,khz=2493798,inc=1,vtsc count: 643491360 kernel, 31771029 user
(XEN) dom21: mode=0,ofs=0x563a017eea82,khz=2493798,inc=1,vtsc count: 715148727 kernel, 24430809 user
(XEN) dom25: mode=0,ofs=0x1d0c5230cdfad,khz=2493798,inc=1,vtsc count: 2103227324 kernel, 656635140 user
(XEN) dom27: mode=0,ofs=0x1d868b8c1fbbf,khz=2493798,inc=1,vtsc count: 476542178 kernel, 12976786 user
(XEN) dom31: mode=0,ofs=0x1dc08da161ebc,khz=2493798,inc=1,vtsc count: 2747233178 kernel, 466863700 user
(XEN) dom32: mode=0,ofs=0x1ecde6eb53d2c,khz=2493798,inc=1,vtsc count: 305360096 kernel, 11705823 user
(XEN) dom33: mode=0,ofs=0x1ece1bf734f61,khz=2493798,inc=1,vtsc count: 516548852 kernel, 18662125 user

Output of xm debug-key t:

(XEN) Synced stime skew: max=1405ns avg=1405ns samples=1 current=1405ns
(XEN) Synced cycles skew: max=2377 avg=2377 samples=1 current=2377

Output of /proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU L5420 @ 2.50GHz
stepping : 6
cpu MHz : 2493.798
cache size : 6144 KB
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc up rep_good aperfmperf pni est ssse3 cx16 sse4_1 hypervisor lahf_lm
bogomips : 4987.59
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

Output of xm info:

release : 2.6.32-bpo.5-xen-amd64
version : #1 SMP Mon Jan 17 22:05:11 UTC 2011
machine : x86_64
nr_cpus : 8
nr_nodes : 1
cores_per_socket : 4
threads_per_core : 1
cpu_mhz : 2493
hw_caps : bfebfbff:20000800:00000000:00000940:000ce3bd:00000000:00000001:00000000
virt_caps : hvm
total_memory : 10239
free_memory : 910
node_to_cpu : node0:0-7
node_to_memory : node0:910
node_to_dma32_mem : node0:910
max_node_id : 0
xen_major : 4
xen_minor : 0
xen_extra : .1
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : unavailable
xen_commandline : dom0_mem=512M loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1
cc_compiler : gcc version 4.4.5 (Debian 4.4.5-10)
cc_compile_by : waldi
cc_compile_domain : debian.org
cc_compile_date : Wed Jan 12 14:04:06 UTC 2011
xend_config_format : 4

In dom0, /var/log/kern.log:

Feb 23 22:40:54 dom0 kernel: [995452.618519] Clocksource tsc unstable (delta = -2999660335950 ns)

In the domU I don't see any logs; the time just "jumps" 50 minutes into the future (see /var/log/daemon.log):

Feb 23 21:50:51 domU snmpd[1037]: Connection from UDP: [10.16.2.101]:58303
Feb 23 22:40:55 domU snmpd[1037]: Connection from UDP: [10.16.2.101]:45713

The clocksource is set to "xen" in both dom0 and domU:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

Regards

Olivier

2011/2/24 Keir Fraser <keir.xen@gmail.com>
> Please send Xen boot output (xm dmesg). Getting it from Xen 3.2 as well
> would be interesting, if you still have it installed on any of these
> machines.
>>> On 24.02.11 at 10:59, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
>
> Output of xm debug-key s :
>
> (XEN) TSC has constant rate, deep Cstates possible, so not reliable,
> warp=2684 (count=4)

Did you try turning off use of C states ("cpuidle=0" on the Xen command line)?

Jan
On 24/02/2011 10:59, "Jan Beulich" <JBeulich@novell.com> wrote: > Did you try turning off use of C states ("cpuidle=0" on the Xen > command line)? Another thing to try is changing the platform timer that Xen uses. It's using HPET on your machines, so try clocksource=pit on the Xen command line, and confirm that the 'Platform timer is xxx' message changes in xm dmesg. However, this bug looks more like a CPU's TSC jumping forward (or maybe backward) for some inexplicable reason. We added code post 3.2 to detect the platform timer counter wrapping, and to account for that based on trusting the CPU's 64-bit TSC. But if the TSC value is bogus then we can detect a wrap when it didn't happen, and the new code will do more harm than good. It is not currently possible to disable the code via a boot parameter -- maybe we could add that. However, if the problem is a jumpy TSC then it is better to fix that, as Xen relies so heavily on the TSC for time handling. -- Keir _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users
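Keir's check (does the 'Platform timer is xxx' line change?) can be scripted so it is repeated after every reboot. This is a minimal sketch, assuming the Xen 4.0 xm toolstack, root access in dom0, and that the hypervisor log line has the form quoted elsewhere in this thread ("(XEN) Platform timer is 1.193MHz PIT"); the helper names are hypothetical.

```python
import re
import subprocess

# Hypervisor log line as quoted in this thread, e.g.
# "(XEN) Platform timer is 1.193MHz PIT"
PLATFORM_RE = re.compile(r"Platform timer is (.+)$", re.MULTILINE)


def platform_timer(log_text):
    """Return the most recent 'Platform timer is ...' description, or None."""
    matches = PLATFORM_RE.findall(log_text)
    return matches[-1].strip() if matches else None


def read_xen_dmesg():
    """Fetch the hypervisor console ring (assumes the xm toolstack)."""
    result = subprocess.run(
        ["xm", "dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout


def main():
    try:
        log_text = read_xen_dmesg()
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run 'xm dmesg': {exc}")
        return
    timer = platform_timer(log_text)
    print(f"Platform timer: {timer or 'not found in xm dmesg'}")


if __name__ == "__main__":
    main()
```

If the reported timer is not PIT after booting with clocksource=pit, the option most likely ended up on the dom0 kernel line instead of the Xen line.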
Jan: I tried to turn off C-states with max_cstate=0 without success (still "not reliable"). With cpuidle=0, I also got: (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3022 (count=1) xm info | grep command xen_commandline : dom0_mem=512M cpuidle=0 loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1 Keir: Using clocksource=pit: (XEN) Platform timer is 1.193MHz PIT I also got: (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3262 (count=2)
>>> On 24.02.11 at 12:57, Olivier Hanesse <olivier.hanesse@gmail.com> wrote: > I tried to turn off cstates with max_cstate=0 without success (still "not reliable"). > > With cpuidle=0, I also got : > > (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3022 (count=1) This message by itself isn't telling much, I believe. > xm info | grep command > xen_commandline : dom0_mem=512M cpuidle=0 loglvl=all guest_loglvl=all dom0_max_vcpus=1 dom0_vcpus_pin console=vga,com1 com1=19200,8n1 > > Keir : > > Using clocksource=pit : > > (XEN) Platform timer is 1.193MHz PIT > > I also got : > > (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3262 (count=2) The question is whether any of this eliminates the time jumps seen by your DomUs (from your past mails I wasn't actually sure whether Dom0 also experienced this problem, albeit it would be odd if it didn't). Jan
Both dom0 and domUs are affected by this "jump". I expect to see something like "TSC marked as reliable, warp = 0"; I get this on newer hardware with the same config/distros. Is there a way to measure whether it is a TSC warp, to point to a CPU TSC issue?
On 24/02/2011 14:20, "Olivier Hanesse" <olivier.hanesse@gmail.com> wrote: > Both dom0 and domUs are affected by this "jump". > > I expect to see something like "TSC marked as reliable, warp = 0". > I got this on newer hardware with same config/distros. It depends on the CPU itself: older CPUs do not have the super-stable TSC features. But that should never cause a massive 3000s time jump. > Is there a way to measure if it is a TSC warp? to point out a cpu tsc issue? The TSC warps or out-of-sync issues that we could reasonably expect would be on the order of microseconds. A 3000s warp is something else entirely. Xen is very confused and/or some TSC or platform timer has jumped a long way (indicating a hardware/firmware issue). -- Keir
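Since a normal TSC warp is on the order of microseconds, a jump of thousands of seconds is easy to spot from inside dom0 or a domU: poll a clock at a fixed interval and flag any step far from that interval. This is a minimal sketch, not a tool from the thread; it assumes the clocksource jump is visible through Python's `time.monotonic()` (CLOCK_MONOTONIC on Linux), and `detect_jumps` is a hypothetical name. It cannot tell a real clock jump apart from the guest simply not being scheduled for a long time, which was raised earlier as another possibility.

```python
import time


def detect_jumps(clock=time.monotonic, sleep=time.sleep,
                 interval_s=1.0, tolerance_s=1.0, iterations=60):
    """Poll `clock` every `interval_s` seconds and return (poll_index, gap_s)
    for every poll whose measured gap differs from the expected interval by
    more than `tolerance_s`. A forward clock jump shows up as a large gap."""
    jumps = []
    previous = clock()
    for i in range(iterations):
        sleep(interval_s)
        now = clock()
        gap = now - previous
        if abs(gap - interval_s) > tolerance_s:
            jumps.append((i, gap))
        previous = now
    return jumps
```

For example, `detect_jumps(iterations=86400)` polls once a second for roughly a day; logging each hit with the wall-clock time would make it possible to line the events up with the "Clocksource tsc unstable" kernel messages. The injectable `clock` and `sleep` parameters exist only so the logic can be exercised without waiting.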
Just a wild guess, but this in Olivier's posted output: (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times. and the fact that a 32-bit HPET wrap is ~300 seconds and, with the "10 or more times", 10 * 300 seconds is 3000 seconds, might be a clue (or a complete red herring, but I thought it worth mentioning). Mark and Olivier, it would be interesting to know if you are using the same processor/system.
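Dan's arithmetic can be checked against the exact deltas reported in the thread. The sketch below assumes the HPET ticks at 14.31818 MHz (a rate commonly used on Intel chipsets, but not confirmed anywhere in this thread) and that Xen reads it as a 32-bit counter. Under those assumptions both deltas land within about a millisecond of exactly ten counter wraps (just under 50 minutes), which is consistent with the hypothesis but does not prove it.

```python
# Assumption (not stated in the thread): the HPET ticks at 14.31818 MHz,
# a rate commonly used by Intel chipsets, and Xen sees a 32-bit counter.
HPET_HZ = 14_318_180
COUNTER_BITS = 32

wrap_s = 2**COUNTER_BITS / HPET_HZ   # time for one full counter wrap
ten_wraps_s = 10 * wrap_s            # "wrapped 10 or more times"

# Clocksource watchdog deltas reported in this thread (magnitudes, in ns).
observed_ns = {"Mark": 2_999_660_303_788, "Olivier": 2_999_660_334_211}

print(f"one wrap:  {wrap_s:.6f} s")
print(f"ten wraps: {ten_wraps_s:.6f} s = {ten_wraps_s / 60:.2f} min")
for who, ns in observed_ns.items():
    diff_ms = (ns / 1e9 - ten_wraps_s) * 1e3
    print(f"{who}: {ns / 1e9:.6f} s, {diff_ms:+.3f} ms away from ten wraps")
```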
Mark is running an E5620 Xeon processor; I have an L5420. What is very strange is that this jump is always 50 min, not more, not less. And Mark and I are not the only ones with this issue, so there might be an explanation somewhere (bad counter, overflow, bug or something). So maybe this 300 seconds * 10 is a lead. Another point: what does the "warp" number in the output of "xm debug-key s" really mean? Should I monitor this number? Maybe I could predict a jump by watching this value?
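Nothing in this thread establishes what the warp value measures or whether it predicts a jump (Jan remarks that the message by itself isn't telling much). If you want to record it over time anyway, a small parser is enough. A sketch, assuming the output of xm debug-key s is read back through xm dmesg and keeps the format quoted in this thread; the regex also tolerates the "warp = 0" spacing Olivier mentions, and the function names are hypothetical.

```python
import re

# Matches e.g. "(XEN) TSC has constant rate, deep Cstates possible,
# so not reliable, warp=2850 (count=3)"
WARP_RE = re.compile(r"TSC .*?warp\s*=\s*(\d+)\s*\(count=(\d+)\)")


def tsc_warp_readings(log_text):
    """Return every (warp, count) pair found in the log text, oldest first."""
    return [(int(warp), int(count)) for warp, count in WARP_RE.findall(log_text)]


def latest_warp(log_text):
    """Return the most recent (warp, count) pair, or None if there is none."""
    readings = tsc_warp_readings(log_text)
    return readings[-1] if readings else None
```

Appending `latest_warp(...)` with a timestamp to a file from cron would show whether the value changes shortly before a jump; whether it does is exactly the open question.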
On 02/24/2011 09:43 AM, Dan Magenheimer wrote: > Just a wild guess, but this in Olivier's posted output: > > (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times. > > and the fact that a 32-bit HPET wrap is ~300 seconds and, with the > "10 or more times", 10 * 300 seconds is 3000 seconds, might be a clue > (or a complete red herring, but I thought it worth mentioning). > > Mark and Olivier, it would be interesting to know if you are > using the same processor/system. It definitely seems like some kind of problem on the host system rather than anything in the guests themselves. If the platform timer is misbehaving, then Xen could be completely screwing up the pvclock calibration which it then passes to guests. Could it be one of those "platform clock stops in certain power states" problems? J
Hello, It happened again twice this weekend. What about setting "tsc_mode=2" for my VMs? Would this mode prevent this bug (if it comes from a badly emulated TSC due to a firmware issue; is that possible?) from affecting time in domUs? Setting clocksource=pit makes 'tsc' available in "/sys/devices/system/clocksource/clocksource0/available_clocksource" (otherwise only xen is available; is that normal?). Should I bypass the xen clocksource and use tsc as the clocksource for dom0/domU, or will it be worse? Regards Olivier
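The available_clocksource file Olivier mentions has a sibling, current_clocksource, in the same standard Linux sysfs directory; reading both in dom0 and each domU records which clocksource the kernel is actually using next to the ones it offers. This is a read-only sketch that changes nothing. Per Dan's reply, the setting that matters for this bug is Xen's own clocksource= option, not the guest kernel's choice.

```python
from pathlib import Path

# Standard Linux sysfs location for the system clocksource.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(current_text, available_text):
    """Turn the raw sysfs file contents into (current, [available...])."""
    return current_text.strip(), available_text.split()


def read_clocksources(base=SYSFS_DIR):
    """Read the clocksource the kernel is using and the ones it could use."""
    return parse_clocksources(
        (base / "current_clocksource").read_text(),
        (base / "available_clocksource").read_text(),
    )


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
    except OSError as exc:
        print(f"could not read {SYSFS_DIR}: {exc}")
    else:
        print(f"current: {current}; available: {', '.join(available)}")
```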
The message about detecting a wrapped platform timer on the Xen console indicates a host problem rather than a guest configuration problem. Did you try running long term with a changed platform timer source on the Xen command line (clocksource=pit), and also cpuidle=0? K.
Hi Olivier, It is the Xen clocksource that you want to try to change, not the dom0 clocksource. To do this, you need to specify "clocksource=pit" on the Xen boot line (and reboot), not the dom0 boot line. I believe Mark Adams played with tsc_mode to see if it solved his (similar? identical?) problem last year, and it didn't make any difference. Please try booting Xen with "clocksource=pit" and ensure that "Platform timer is 1.193MHz PIT" appears in the Xen boot messages. If the 50-minute jump does not appear again, it would point to a problem in the HPET, either hardware or software. Thanks, Dan
Keir : Yes, it is "under progress". To make this change, I had to reboot every server, so it is taking time (production server :() So i was hoping to find a quick method to mitigate this issue on domUs while rebooting servers. As this bug happens once or twice per server since October, I can''t say that right now that changing platform timer to PIT fixed it. I have to wait (I hope forever!) this bug to happen again on a ''patched'' server ... But even with clcoksource=pit, I am seeing some warp=3000+ in debug message ? I guess it is not a good sign, is it ? Jan : I was hoping to find a way to make the domU clocksource more "independent" like with xen3.2. 2011/2/28 Dan Magenheimer <dan.magenheimer@oracle.com>> Hi Olivier – > > > > It is the Xen clocksource that you want to try to change, not the dom0 > clocksource. To do this, you need to specify “clocksource=pit” on the Xen > boot line (and reboot), not the dom0 boot line. > > > > I believe Mark Adams played with tsc_mode to see if it solved! his > (similar? identical?) problem last year, and it didn’t make any difference. > > > Please try booting Xen with “clocksource=pit” and ensure that “Platform > timer is 1.19MHz PIT” appears in the Xen boot messages. If the 50min jump > does not appear again, it would point to a problem in the hpet, either > hardware or software. > > > > Thanks, > > Dan > > > > *From:* Olivier Hanesse [mailto:olivier.hanesse@gmail.com] > *Sent:* Monday, February 28, 2011 7:37 AM > *To:* Jeremy Fitzhardinge > *Cc:* Dan Magenheimer; Keir Fraser; Jan Beulich; Mark Adams; > xen-devel@lists.xensource.com; Xen Users; Keir Fraser > > *Subject:* Re: [Xen-devel] Xen 4 TSC problems > > > > Hello, > > > > It happened again twice this weekend. > > > > What about setting "tsc_mode=2" for my vms ? Should this mode prevent this > bug (coming from a bad emulated tsc due to firmware issue ? is it possible > ?) from affecting time in domUs ? 
Hi Olivier - By "warp=3000+ in debug message" do you mean the Xen boot message "TSC has constant rate..., warp = NNNN"? If so, this is a very different "warp" measured in cycles, not in seconds, so 3000 is more like a microsecond not an hour, and this is normal (not a bad sign). Dan
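Dan's point about units can be checked with a little arithmetic. The sketch below assumes the TSC ticks at the Xeon L5420's nominal 2.5 GHz (an assumption; other CPUs differ) and converts the warp values reported in this thread (2850, 3022 and 3262 cycles) into microseconds, then compares each with the roughly 3000-second clock jump.

```python
# Sketch: convert the TSC "warp" values Xen reports (in CPU cycles) into time.
# Assumption: the TSC ticks at the Xeon L5420's nominal 2.5 GHz.

TSC_HZ = 2.5e9          # assumed TSC rate in Hz
JUMP_SECONDS = 2999.66  # approximate size of the observed clock jump


def cycles_to_seconds(cycles: int, tsc_hz: float = TSC_HZ) -> float:
    """Duration of `cycles` TSC ticks at `tsc_hz`, in seconds."""
    return cycles / tsc_hz


for warp in (2850, 3022, 3262):  # warp values quoted in this thread
    micros = cycles_to_seconds(warp) * 1e6
    ratio = JUMP_SECONDS / cycles_to_seconds(warp)
    print(f"warp={warp} cycles -> {micros:.2f} us (jump is {ratio:.1e}x larger)")
```

Under that assumption the three warps come to about 1.14, 1.21 and 1.30 µs, roughly nine orders of magnitude below the 50-minute jump, which fits Keir's remark that the two are unrelated.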
On 28/02/2011 15:23, "Olivier Hanesse" <olivier.hanesse@gmail.com> wrote:
> But even with clocksource=pit, I am seeing some warp=3000+ in debug message? I guess it is not a good sign, is it?

Better not to have it, but honestly you're very unlikely to see any problem from it. It's totally unrelated to the 3000-second time jumps. -- Keir
Yes, this is what I mean. I am glad to hear that it isn't a bad sign :) I thought it was a bad sign because on systems with a "reliable TSC", this counter is always 0.
Hi, we experienced the same problem with Xen 4 under Debian squeeze on our DELL PowerEdge R815 servers. Does the "clocksource=pit" setting solve the problem? Cheers Andre
So far, yes. Let's wait another month before saying that setting clocksource=pit was the solution :)
Hi Xen developers, I would just like to inform you that I have exactly the same problem with Debian squeeze and Xen, with the 50-minute time jump on my dom0 and domUs. NTP is running on all dom0/domUs, and the clocksource is 'xen' everywhere. Some messages:
syslog:
Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
xm dmesg:
...
(XEN) Platform timer is 14.318MHz HPET
...
(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
(XEN) TSC marked as reliable, warp = 0 (count=2)
...
I had some contact with Olivier Hanesse and he indicated that he doesn't have any solution for this problem, and nothing that was proposed in February solved it. All suggestions are welcome. Best regards, Philippe
config:
--------------------------------------
Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
--------------------------------------
HP DL385
--------------------------------------
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6174
stepping : 1
cpu MHz : 3058776.574
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm pni cx16 popcnt hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch nodeid_msr
bogomips : 4409.03
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
--------------------------------------
--------------------------------------
PCI:
00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port D)
00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port F)
00:0a.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external gfx1 port A)
00:0b.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (NB-SB link)
00:0d.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external gfx1 port B)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3d)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1a.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1a.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1a.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1a.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1a.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
00:1b.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:1b.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:1b.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:1b.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:1b.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
02:00.0 System peripheral: Hewlett-Packard Company iLO3 Slave instrumentation & System support (rev 04)
02:00.2 System peripheral: Hewlett-Packard Company iLO3 Management Processor Support and Messaging (rev 04)
02:00.4 USB Controller: Hewlett-Packard Company Proliant iLO2/iLO3 virtual USB controller (rev 01)
03:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6 controllers (rev 01)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
05:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
09:00.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
0a:04.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
0a:05.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
0a:06.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
0c:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
0c:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
0c:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
0c:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
0f:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
0f:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
0f:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
0f:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
--------------------------------------
> > I don''t understand how a system can do a jump of "50min" in the future. Why > 50min ? it is not 40min, not 1 hour, it is always 50min. > I don''t know how to make my TSC "reliable" (I already disable everything > about Powerstate in BIOS Settings). > > Any ideas ? > > Regards >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
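Olivier's comparison hinges on two /proc/cpuinfo flags. In the Linux kernel's naming, `constant_tsc` means the TSC ticks at a fixed rate across CPU frequency changes, while `nonstop_tsc` means it also keeps counting in deep C-states. The Python sketch below uses hypothetical helper names (`tsc_flag_summary`, `first_flags_line`) and only reports which of the two flags a host advertises; it does not reproduce Xen's own reliability decision.

```python
def tsc_flag_summary(flags_line: str) -> dict:
    """Report which TSC-related flags a /proc/cpuinfo 'flags' line contains.

    Accepts either the full 'flags : ...' line or a bare, space-separated
    flag list. Hypothetical helper, not part of any Xen or Linux tool.
    """
    _key, sep, rest = flags_line.partition(":")
    names = set((rest if sep else flags_line).split())
    return {
        # TSC ticks at a fixed rate regardless of CPU frequency changes.
        "constant_tsc": "constant_tsc" in names,
        # TSC keeps counting while the core is in deep C-states.
        "nonstop_tsc": "nonstop_tsc" in names,
    }


def first_flags_line(path: str = "/proc/cpuinfo") -> str:
    """Return the first 'flags' line of a cpuinfo-style file ('' if none)."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return line
    return ""
```

Calling `tsc_flag_summary(first_flags_line())` on each host gives a quick side-by-side comparison; the flags Philippe posted for his Opteron 6174 include both `constant_tsc` and `nonstop_tsc`.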
On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
> Hi Xen developers
>
> i just would like to inform you that I have exactly the same problem
> with Debian squeeze and xen, with
> 50 seconds time jump on my dom0 and domu. NTP is running on all
> dom0/domuU, clocksource is 'xen'
> everywhere.
>
> some messages :
> syslog :
> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
>
> xm dmesg :
> ...
> (XEN) Platform timer is 14.318MHz HPET
> ...
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> (XEN) TSC marked as reliable, warp = 0 (count=2)
> ...
>
> I had some contact with Olivier Hanesse and it indicates that he
> doesn't have any solution for this problem,
> and all what was proposed in February didn't solved this problem.

Which was the max_cstate=0 ?

..
> config :
> --------------------------------------
> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
> --------------------------------------
> HP DL385
> --------------------------------------
> vendor_id : AuthenticAMD
> cpu family : 16
> model : 9
> model name : AMD Opteron(tm) Processor 6174
> stepping : 1
> cpu MHz : 3058776.574

OK, that is really messed up. Your house must be on fire for the machine
to be running at 3058GHz!

Jeremy, this sounds familiar - did we have a patch for this in
your 2.6.32 tree?
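The `cpu MHz : 3058776.574` value is what Konrad is reacting to: read as MHz it is about 3058 GHz. One rough way to flag such values is to compare `cpu MHz` with `bogomips / 2`, a rule of thumb that the domU figures Philippe reports later in the thread satisfy (2200.112 MHz, 4400.21 bogomips). The sketch below assumes that rule of thumb and an arbitrary 10% tolerance; neither comes from the kernel or from Xen.

```python
def parse_cpuinfo_block(text: str) -> dict:
    """Parse the 'key : value' lines of one /proc/cpuinfo processor block."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Keys are padded with tabs/spaces in real cpuinfo output.
            fields[key.strip()] = value.strip()
    return fields


def mhz_looks_plausible(fields: dict, tolerance: float = 0.10) -> bool:
    """True if 'cpu MHz' is within `tolerance` of bogomips / 2.

    The bogomips / 2 comparison is a heuristic assumed for this sketch.
    """
    mhz = float(fields["cpu MHz"])
    expected = float(fields["bogomips"]) / 2
    return abs(mhz - expected) <= tolerance * expected
```

With the numbers from this thread, the dom0 block (3058776.574 MHz, 4409.03 bogomips) fails the check and the domU block (2200.112 MHz, 4400.21 bogomips) passes.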
On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
> Hi Xen developers

Let's try this again, this time Cc-ing Jeremy.

>
> i just would like to inform you that I have exactly the same problem
> with Debian squeeze and xen, with
> 50 seconds time jump on my dom0 and domu. NTP is running on all
> dom0/domuU, clocksource is 'xen'
> everywhere.
>
> some messages :
> syslog :
> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
>
> xm dmesg :
> ...
> (XEN) Platform timer is 14.318MHz HPET
> ...
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> (XEN) TSC marked as reliable, warp = 0 (count=2)
> ...
>
> I had some contact with Olivier Hanesse and it indicates that he
> doesn't have any solution for this problem,
> and all what was proposed in February didn't solved this problem.

Which was the max_cstate=0 ?

..
> config :
> --------------------------------------
> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
> --------------------------------------
> HP DL385
> --------------------------------------
> vendor_id : AuthenticAMD
> cpu family : 16
> model : 9
> model name : AMD Opteron(tm) Processor 6174
> stepping : 1
> cpu MHz : 3058776.574

OK, that is really messed up. Your house must be on fire for the machine
to be running at 3058GHz!

Jeremy, this sounds familiar - did we have a patch for this in
your 2.6.32 tree?
On Tue, Sep 13, 2011 at 8:16 AM, Philippe Simonet <philippe.simonet@bluewin.ch> wrote:
> Hi Xen developers
>
> i just would like to inform you that I have exactly the same problem with
> Debian squeeze and xen, with
> 50 seconds time jump on my dom0 and domu. NTP is running on all dom0/domuU,
> clocksource is 'xen'
> everywhere.
>
> some messages :
> syslog :
> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
>
> xm dmesg :
> ...
> (XEN) Platform timer is 14.318MHz HPET
> ...
> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> (XEN) TSC marked as reliable, warp = 0 (count=2)
> ...

I haven't been following this conversation, so I don't know if this is
relevant, but I've just discovered this morning that the TSC warp
check in Xen is done at the wrong time (before any secondary cpus are
brought up), and thus always returns warp=0. I've submitted a patch
to do the check after secondary CPUs are brought up; that should cause
Xen to do periodic synchronization of TSCs when there is drift.

-George

>
> I had some contact with Olivier Hanesse and it indicates that he doesn't
> have any solution for this problem,
> and all what was proposed in February didn't solved this problem.
>
> all suggestions are welcomed.
>
> Best regards
>
> Philippe
>
>
> config :
> --------------------------------------
> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
> --------------------------------------
> HP DL385
> --------------------------------------
> vendor_id : AuthenticAMD
> cpu family : 16
> model : 9
> model name : AMD Opteron(tm) Processor 6174
> stepping : 1
> cpu MHz : 3058776.574
> cache size : 512 KB
> fpu : yes
> fpu_exception : yes
> cpuid level : 5
> wp : yes
> flags : fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm pni cx16 popcnt hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch nodeid_msr
> bogomips : 4409.03
> TLB size : 1024 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate
> --------------------------------------
>
> --------------------------------------
> PCI :
> 00:00.0 Host bridge: ATI Technologies Inc RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
> 00:04.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port D)
> 00:06.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port F)
> 00:0a.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external gfx1 port A)
> 00:0b.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (NB-SB link)
> 00:0d.0 PCI bridge: ATI Technologies Inc RD890 PCI to PCI bridge (external gfx1 port B)
> 00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
> 00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
> 00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
> 00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
> 00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
> 00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
> 00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
> 00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3d)
> 00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
> 00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
> 00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
> 00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
> 00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
> 00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
> 00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
> 00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
> 00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
> 00:1a.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
> 00:1a.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
> 00:1a.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
> 00:1a.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
> 00:1a.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
> 00:1b.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
> 00:1b.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
> 00:1b.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
> 00:1b.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
> 00:1b.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
> 01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
> 02:00.0 System peripheral: Hewlett-Packard Company iLO3 Slave instrumentation & System support (rev 04)
> 02:00.2 System peripheral: Hewlett-Packard Company iLO3 Management Processor Support and Messaging (rev 04)
> 02:00.4 USB Controller: Hewlett-Packard Company Proliant iLO2/iLO3 virtual USB controller (rev 01)
> 03:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6 controllers (rev 01)
> 04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
> 04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
> 05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
> 05:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
> 09:00.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
> 0a:04.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
> 0a:05.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
> 0a:06.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
> 0c:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 0c:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 0c:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 0c:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 0f:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 0f:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 0f:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> 0f:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
> --------------------------------------
>
> On 8:59 PM, Olivier Hanesse wrote:
>>
>> Hello
>>
>> I've got an issue about time keeping with Xen 4.0 (Debian squeeze
>> release).
>>
>> My problem is here (hopefully I amn't the only one, so there might be a
>> bug somewhere) : http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599161#50
>> After some times, I got this error : Clocksource tsc unstable (delta
>> -2999660334211 ns). It has happened on several servers.
>>
>> Looking at the output of "xm debug-key s;"
>>
>> (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=2850 (count=3)
>>
>> I am using a "Intel(R) Xeon(R) CPU L5420 @ 2.50GHz", which has the
>> "constant_tsc", but not the "nonstop_tsc" one.
>> On other systems with a newer cpu with "nonstop_tsc", I don't have this
>> issue (systems are running the same distros with same config).
>>
>> I tried to boot with "max_cstate=0", but nothing changed, my TSC isn't
>> reliable and after some times, I will got the "50min" issue again.
>>
>> I don't understand how a system can do a jump of "50min" in the future.
>> Why 50min ? it is not 40min, not 1 hour, it is always 50min.
>> I don't know how to make my TSC "reliable" (I already disable everything
>> about Powerstate in BIOS Settings).
>>
>> Any ideas ?
>>
>> Regards
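George's point can be illustrated with a toy model. This is not Xen's code, and the function name and snapshot values are invented for illustration: a warp check measures skew between CPUs, so if it runs while only the boot CPU is online there is nothing to compare against and the result is always 0.

```python
def max_tsc_warp(tsc_by_cpu: dict, online_cpus) -> int:
    """Largest difference between the TSC readings of the CPUs we can sample.

    Toy model only: a real check reads the TSC on different CPUs at (nearly)
    the same moment; here each CPU's reading is simply given.
    """
    readings = [tsc_by_cpu[cpu] for cpu in online_cpus]
    if len(readings) < 2:
        # With a single CPU online there is nothing to compare against.
        return 0
    return max(readings) - min(readings)


# Hypothetical snapshot: CPU 3's TSC runs 2850 cycles ahead of the others.
SNAPSHOT = {0: 1_000_000, 1: 1_000_000, 2: 1_000_000, 3: 1_002_850}
```

Checking with only CPU 0 online, `max_tsc_warp(SNAPSHOT, [0])`, reports no warp; checking after all CPUs are up, `max_tsc_warp(SNAPSHOT, SNAPSHOT)`, reports the 2850-cycle skew. That is the difference George's patch is meant to make by moving the check later in boot.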
On 09/15/2011 01:24 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
>> Hi Xen developers
> Let's try this again, this time Cc-ing Jeremy.
>> i just would like to inform you that I have exactly the same problem
>> with Debian squeeze and xen, with
>> 50 seconds time jump on my dom0 and domu. NTP is running on all
>> dom0/domuU, clocksource is 'xen'
>> everywhere.
>>
>> some messages :
>> syslog :
>> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
>>
>> xm dmesg :
>> ...
>> (XEN) Platform timer is 14.318MHz HPET
>> ...
>> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
>> (XEN) TSC marked as reliable, warp = 0 (count=2)
>> ...
>>
>> I had some contact with Olivier Hanesse and it indicates that he
>> doesn't have any solution for this problem,
>> and all what was proposed in February didn't solved this problem.

That looks like Xen itself is having problems keeping track of time. If it can't
manage it, then there's not much the guest kernels can do about it.

>
> Which was the max_cstate=0 ?
> ..
>> config :
>> --------------------------------------
>> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
>> --------------------------------------
>> HP DL385
>> --------------------------------------
>> vendor_id : AuthenticAMD
>> cpu family : 16
>> model : 9
>> model name : AMD Opteron(tm) Processor 6174
>> stepping : 1
>> cpu MHz : 3058776.574
> OK, that is really messed up. Your house must be on fire for the machine
> to be running at 3058GHz!
>
> Jeremy, this sounds familiar - did we have a patch for this in
> your 2.6.32 tree?

Not that I can think of. All I can suggest from the kernel side is that perhaps
some of the ACPI power stuff isn't being set up properly, and that makes the
CPU do very strange things with its TSC/power states in general.
J
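Jeremy's suspicion that Xen itself is losing track of time can be checked against numbers already in this thread. Assuming Xen treats the HPET as a 32-bit counter ticking at the nominal 14.31818 MHz (the logs only say "14.318MHz HPET"; the counter width is an assumption here), one wrap takes 2^32 / 14318180 s, or about 300 s, so ten wraps come to roughly 3000 s, or 50 minutes. Every delta reported in the thread lands within about 2 ms of that, which fits the "wrapped 10 or more times" message. This is a correlation worked out by hand, not a diagnosis of the root cause.

```python
# Assumptions, not stated in the thread: the HPET counter Xen reads is
# 32 bits wide and ticks at the nominal 14.31818 MHz.
HPET_HZ = 14_318_180
COUNTER_BITS = 32

# Clocksource deltas reported in this thread, in nanoseconds (sign dropped).
OBSERVED_NS = {
    "Olivier": 2_999_660_334_211,
    "Mark": 2_999_660_303_788,
    "Philippe (dnsit22)": 2_999_662_111_513,
    "Philippe (dnsit11)": 2_999_660_112_689,
}


def wrap_period_s(hz: int = HPET_HZ, bits: int = COUNTER_BITS) -> float:
    """Seconds a `bits`-wide counter running at `hz` takes to wrap once."""
    return (1 << bits) / hz


def wraps_in(delta_ns: int) -> float:
    """Express a time delta as a (fractional) number of counter wraps."""
    return delta_ns / 1e9 / wrap_period_s()


period = wrap_period_s()
print(f"one wrap = {period:.3f} s, ten wraps = {10 * period / 60:.2f} min")
for who, ns in OBSERVED_NS.items():
    print(f"{who:20} {ns / 1e9:10.3f} s = {wraps_in(ns):.5f} wraps")
```

If the assumptions hold, each reported jump is almost exactly ten counter periods, which would also explain why the jump is "always 50min".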
> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
> Sent: Thursday, September 15, 2011 4:36 AM
> To: Philippe Simonet
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] Xen 4 TSC problems
>
> On Tue, Sep 13, 2011 at 8:16 AM, Philippe Simonet
> <philippe.simonet@bluewin.ch> wrote:
> > Hi Xen developers
> >
> > i just would like to inform you that I have exactly the same problem with
> > Debian squeeze and xen, with
> > 50 seconds time jump on my dom0 and domu. NTP is running on all dom0/domuU,
> > clocksource is 'xen'
> > everywhere.
> >
> > some messages :
> > syslog :
> > Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
> >
> > xm dmesg :
> > ...
> > (XEN) Platform timer is 14.318MHz HPET
> > ...
> > (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> > (XEN) TSC marked as reliable, warp = 0 (count=2)
> > ...
>
> I haven't been following this conversation, so I don't know if this is
> relevant, but I've just discovered this morning that the TSC warp
> check in Xen is done at the wrong time (before any secondary cpus are
> brought up), and thus always returns warp=0. I've submitted a patch
> to do the check after secondary CPUs are brought up; that should cause
> Xen to do periodic synchronization of TSCs when there is drift.

Wow, nice catch, George! I wonder if this is the underlying bug
for many of the mysterious time problems that have been reported
for a year or two now... at least on certain AMD boxes.
Any idea when this was introduced? Or has it always been wrong?

Dan
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of Jeremy Fitzhardinge
> Sent: Thursday, September 15, 2011 6:25 PM
> To: Konrad Rzeszutek Wilk
> Cc: xen-devel@lists.xensource.com; Philippe Simonet
> Subject: Re: [Xen-devel] Xen 4 TSC problems
>
> On 09/15/2011 01:24 AM, Konrad Rzeszutek Wilk wrote:
> > On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
> >> Hi Xen developers
> > Let's try this again, this time Cc-ing Jeremy.
> >> i just would like to inform you that I have exactly the same problem
> >> with Debian squeeze and xen, with
> >> 50 seconds time jump on my dom0 and domu. NTP is running on all
> >> dom0/domuU, clocksource is 'xen'
> >> everywhere.
> >>
> >> some messages :
> >> syslog :
> >> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
> >>
> >> xm dmesg :
> >> ...
> >> (XEN) Platform timer is 14.318MHz HPET
> >> ...
> >> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
> >> (XEN) TSC marked as reliable, warp = 0 (count=2)
> >> ...
> >>
> >> I had some contact with Olivier Hanesse and it indicates that he
> >> doesn't have any solution for this problem, and all what was proposed
> >> in February didn't solved this problem.
>
> That looks like Xen itself is having problems keeping track of time. If it can't
> manage it, then there's not much the guest kernels can do about it.
>
> >
> > Which was the max_cstate=0 ?
> > ..
> >> config :
> >> --------------------------------------
> >> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
> >> --------------------------------------
> >> HP DL385
> >> --------------------------------------
> >> vendor_id : AuthenticAMD
> >> cpu family : 16
> >> model : 9
> >> model name : AMD Opteron(tm) Processor 6174
> >> stepping : 1
> >> cpu MHz : 3058776.574
> > OK, that is really messed up. Your house must be on fire for the
> > machine to be running at 3058GHz!
> >
> > Jeremy, this sounds familiar - did we have a patch for this in your
> > 2.6.32 tree?
>
> Not that I can think of. All I can suggest from the kernel side is that perhaps
> some of the ACPI power stuff isn't being set up properly, and that makes the
> CPU do very strange things with its TSC/power states in general.
>

how can i detect that ?

the /proc/acpi/processor path is empty,

find /proc/acpi
/proc/acpi
/proc/acpi/processor
/proc/acpi/button
/proc/acpi/button/power
/proc/acpi/button/power/PWRF
/proc/acpi/button/power/PWRF/info
/proc/acpi/thermal_zone
/proc/acpi/wakeup
/proc/acpi/sleep
/proc/acpi/fadt
/proc/acpi/dsdt
/proc/acpi/info
/proc/acpi/power_resource
/proc/acpi/embedded_controller

dmesg | grep -I acpi
[ 1.205647] hpet_acpi_add: no address or irqs in _CRS

lsmod | grep -i acpi
acpi_processor 5087 1 processor,[permanent]

> J
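Philippe's manual check (whether `/proc/acpi/processor` contains any per-CPU entries) can be repeated on each host with a few lines of Python. The helper names are hypothetical, and the path is taken from his message; other kernels may expose this information elsewhere. The script only lists the directory: whether an empty listing means the ACPI processor setup failed under Xen is Jeremy's hypothesis, not something this check proves.

```python
import os


def acpi_processor_entries(root: str = "/proc/acpi/processor") -> list:
    """Sorted entries under `root`; [] if the directory does not exist."""
    try:
        return sorted(os.listdir(root))
    except FileNotFoundError:
        return []


def report(root: str = "/proc/acpi/processor") -> str:
    """One-line summary suitable for pasting into a bug report."""
    entries = acpi_processor_entries(root)
    if not entries:
        return f"{root}: no processor entries"
    return f"{root}: {len(entries)} entries ({', '.join(entries)})"
```

Running `print(report())` in dom0 on the affected machines and on the "exact same machine" Philippe later mentions would show whether the empty directory is consistent across hosts.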
On 09/15/2011 11:03 PM, Philippe.Simonet@swisscom.com wrote:
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
>> bounces@lists.xensource.com] On Behalf Of Jeremy Fitzhardinge
>> Sent: Thursday, September 15, 2011 6:25 PM
>> To: Konrad Rzeszutek Wilk
>> Cc: xen-devel@lists.xensource.com; Philippe Simonet
>> Subject: Re: [Xen-devel] Xen 4 TSC problems
>>
>> On 09/15/2011 01:24 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
>>>> Hi Xen developers
>>> Let's try this again, this time Cc-ing Jeremy.
>>>> i just would like to inform you that I have exactly the same problem
>>>> with Debian squeeze and xen, with
>>>> 50 seconds time jump on my dom0 and domu. NTP is running on all
>>>> dom0/domuU, clocksource is 'xen'
>>>> everywhere.
>>>>
>>>> some messages :
>>>> syslog :
>>>> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
>>>>
>>>> xm dmesg :
>>>> ...
>>>> (XEN) Platform timer is 14.318MHz HPET
>>>> ...
>>>> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
>>>> (XEN) TSC marked as reliable, warp = 0 (count=2)
>>>> ...
>>>>
>>>> I had some contact with Olivier Hanesse and it indicates that he
>>>> doesn't have any solution for this problem, and all what was proposed
>>>> in February didn't solved this problem.
>> That looks like Xen itself is having problems keeping track of time. If it can't
>> manage it, then there's not much the guest kernels can do about it.
>>
>>> Which was the max_cstate=0 ?
>>> ..
>>>> config :
>>>> --------------------------------------
>>>> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
>>>> --------------------------------------
>>>> HP DL385
>>>> --------------------------------------
>>>> vendor_id : AuthenticAMD
>>>> cpu family : 16
>>>> model : 9
>>>> model name : AMD Opteron(tm) Processor 6174
>>>> stepping : 1
>>>> cpu MHz : 3058776.574
>>> OK, that is really messed up. Your house must be on fire for the
>>> machine to be running at 3058GHz!
>>>
>>> Jeremy, this sounds familiar - did we have a patch for this in your
>>> 2.6.32 tree?
>> Not that I can think of. All I can suggest from the kernel side is that perhaps
>> some of the ACPI power stuff isn't being set up properly, and that makes the
>> CPU do very strange things with its TSC/power states in general.
>>
> how can i detect that ?
>
> the /proc/acpi/processor path is empty,
>
> find /proc/acpi
> /proc/acpi
> /proc/acpi/processor
> /proc/acpi/button
> /proc/acpi/button/power
> /proc/acpi/button/power/PWRF
> /proc/acpi/button/power/PWRF/info
> /proc/acpi/thermal_zone
> /proc/acpi/wakeup
> /proc/acpi/sleep
> /proc/acpi/fadt
> /proc/acpi/dsdt
> /proc/acpi/info
> /proc/acpi/power_resource
> /proc/acpi/embedded_controller
>
> dmesg | grep -I acpi
> [ 1.205647] hpet_acpi_add: no address or irqs in _CRS
>
> lsmod | grep -i acpi
> acpi_processor 5087 1 processor,[permanent]

What does "xenpm start 5" say?

J
On 9/17/2011 12:40 AM, Jeremy Fitzhardinge wrote:
> On 09/15/2011 11:03 PM, Philippe.Simonet@swisscom.com wrote:
>>> -----Original Message-----
>>> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
>>> bounces@lists.xensource.com] On Behalf Of Jeremy Fitzhardinge
>>> Sent: Thursday, September 15, 2011 6:25 PM
>>> To: Konrad Rzeszutek Wilk
>>> Cc: xen-devel@lists.xensource.com; Philippe Simonet
>>> Subject: Re: [Xen-devel] Xen 4 TSC problems
>>>
>>> On 09/15/2011 01:24 AM, Konrad Rzeszutek Wilk wrote:
>>>> On Tue, Sep 13, 2011 at 09:16:27AM +0200, Philippe Simonet wrote:
>>>>> Hi Xen developers
>>>> Let's try this again, this time Cc-ing Jeremy.
>>>>> i just would like to inform you that I have exactly the same problem
>>>>> with Debian squeeze and xen, with
>>>>> 50 seconds time jump on my dom0 and domu. NTP is running on all
>>>>> dom0/domuU, clocksource is 'xen'
>>>>> everywhere.
>>>>>
>>>>> some messages :
>>>>> syslog :
>>>>> Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)
>>>>>
>>>>> xm dmesg :
>>>>> ...
>>>>> (XEN) Platform timer is 14.318MHz HPET
>>>>> ...
>>>>> (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
>>>>> (XEN) TSC marked as reliable, warp = 0 (count=2)
>>>>> ...
>>>>>
>>>>> I had some contact with Olivier Hanesse and it indicates that he
>>>>> doesn't have any solution for this problem, and all what was proposed
>>>>> in February didn't solved this problem.
>>> That looks like Xen itself is having problems keeping track of time. If it can't
>>> manage it, then there's not much the guest kernels can do about it.
>>>
>>>> Which was the max_cstate=0 ?
>>>> ..
>>>>> config :
>>>>> --------------------------------------
>>>>> Linux dnsit22.swissptt.ch 2.6.32-5-xen-amd64 #1 SMP Tue Jun 14 12:46:30 UTC 2011 x86_64 GNU/Linux
>>>>> --------------------------------------
>>>>> HP DL385
>>>>> --------------------------------------
>>>>> vendor_id : AuthenticAMD
>>>>> cpu family : 16
>>>>> model : 9
>>>>> model name : AMD Opteron(tm) Processor 6174
>>>>> stepping : 1
>>>>> cpu MHz : 3058776.574
>>>> OK, that is really messed up. Your house must be on fire for the
>>>> machine to be running at 3058GHz!
>>>>
>>>> Jeremy, this sounds familiar - did we have a patch for this in your
>>>> 2.6.32 tree?
>>> Not that I can think of. All I can suggest from the kernel side is that perhaps
>>> some of the ACPI power stuff isn't being set up properly, and that makes the
>>> CPU do very strange things with its TSC/power states in general.
>>>
>> how can i detect that ?
>>
>> the /proc/acpi/processor path is empty,
>>
>> find /proc/acpi
>> /proc/acpi
>> /proc/acpi/processor
>> /proc/acpi/button
>> /proc/acpi/button/power
>> /proc/acpi/button/power/PWRF
>> /proc/acpi/button/power/PWRF/info
>> /proc/acpi/thermal_zone
>> /proc/acpi/wakeup
>> /proc/acpi/sleep
>> /proc/acpi/fadt
>> /proc/acpi/dsdt
>> /proc/acpi/info
>> /proc/acpi/power_resource
>> /proc/acpi/embedded_controller
>>
>> dmesg | grep -I acpi
>> [ 1.205647] hpet_acpi_add: no address or irqs in _CRS
>>
>> lsmod | grep -i acpi
>> acpi_processor 5087 1 processor,[permanent]
> What does "xenpm start 5" say?
>
> J
>

here it is :

root@dnsit22.swissptt.ch ~# xenpm start 5
Timeout set to 5 seconds
Start sampling, waiting for CTRL-C or SIGINT or SIGALARM signal ...
Elapsed time (ms): 5028
CPU0: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU1: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU2: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU3: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU4: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU5: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU6: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU7: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU8: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU9: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU10: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU11: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU12: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU13: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU14: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU15: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU16: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU17: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU18: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU19: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU20: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU21: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU22: Residency(ms) Avg Res(ms) Avg freq 18 KHz
CPU23: Residency(ms) Avg Res(ms) Avg freq 18 KHz
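In Philippe's output every CPU shows empty residency columns and an average frequency of 18 KHz. That is hard to reconcile with the roughly 2200 MHz the domU reports. One reading, consistent with Jeremy's ACPI suspicion, is that Xen has no usable P-/C-state data for these CPUs. The sketch below only scans `xenpm start` output for implausibly low frequencies. Its regular expression assumes the layout shown in this archive, and the 100 MHz cut-off is arbitrary.

```python
import re

# One match per "CPUn: ... Avg freq N KHz" group. The tempered pattern
# (?:(?!CPU\d+:).)*? stops a CPU's match from running into the next
# "CPUm:" header; DOTALL lets it span lines in multi-line output.
_AVG_FREQ = re.compile(r"CPU(\d+):(?:(?!CPU\d+:).)*?Avg freq\s+(\d+)\s*KHz", re.DOTALL)


def avg_freq_khz(xenpm_output: str) -> dict:
    """Map CPU number -> reported average frequency in KHz."""
    return {int(cpu): int(khz) for cpu, khz in _AVG_FREQ.findall(xenpm_output)}


def suspicious_cpus(freqs: dict, min_khz: int = 100_000) -> list:
    """CPUs whose average frequency is below min_khz (arbitrary threshold)."""
    return sorted(cpu for cpu, khz in freqs.items() if khz < min_khz)
```

Fed the output above, every CPU from 0 to 23 would be flagged. On a host where Xen does have P-state data, the list should be empty.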
On Thu, Sep 15, 2011 at 7:38 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>> I haven't been following this conversation, so I don't know if this is
>> relevant, but I've just discovered this morning that the TSC warp
>> check in Xen is done at the wrong time (before any secondary cpus are
>> brought up), and thus always returns warp=0. I've submitted a patch
>> to do the check after secondary CPUs are brought up; that should cause
>> Xen to do periodic synchronization of TSCs when there is drift.
>
> Wow, nice catch, George! I wonder if this is the underlying bug
> for many of the mysterious time problems that have been reported
> for a year or two now... at least on certain AMD boxes.
> Any idea when this was introduced? Or has it always been wrong?

Well the comment in 20823:89907dab1aef seems to indicate that's where the
"assume it's reliable on AMD until proven otherwise" started; that would be
January 2010.

I looked as far back as 20705:a74aca4b9386, and there the TSC reliability
checks were again in init_xen_time(). Figuring out where things were before
then is getting into archeology. :-)

The comment at the top of init_xen_time() is correct now, but from the
time it was first written through 4.1 it was just plain wrong -- it
said init_xen_time() happened after all cpus were up, which has never
been true.

-George
>>> On 19.09.11 at 12:39, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> The comment at the top of init_xen_time() is correct now, but from the
> time it was first written through 4.1 it was just plain wrong -- it
> said init_xen_time() happened after all cpus were up, which has never
> been true.

Not really - CPUs got booted by that time originally (pre-4.0, which is
what the old comment said), but not onlined. Prior to the use of
smp_call_function() for the TSC reliability check I assume only CPU
feature flags got looked at, which were available at that point prior to
Keir's re-work of the SMP boot process.

Jan
Hi Xen developers,

I need some good tips to move forward with my TSC problem.

First, a quick summary of the problem:

- The clock jumps 50 minutes forward:

  (xm dmesg)
  (XEN) TSC is reliable, synchronization unnecessary
  (XEN) Platform timer is 14.318MHz HPET
  (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times

  (syslog)
  Sep 28 17:45:06 dnsit11 kernel: [1970548.356130] Clocksource tsc unstable (delta = -2999660112689 ns)
  Sep 11 13:56:50 dnsit22 kernel: [571603.359863] Clocksource tsc unstable (delta = -2999662111513 ns)

- I can't reproduce or force the problem.

- It happens on 2 different HP DL 385 G7 machines, with Debian squeeze:
  xen-hypervisor-4.0-amd64 4.0.1-2
  dom0: linux-image-2.6.32-5-xen-amd64 2.6.32-35
  domUs: 5 -> 15 Debian machines
  2 * 12-core AMD Opteron(tm) Processor 6174

- I have had this problem since the beginning of September; before that, the machines had been running for 3 months without problems.
  At the beginning of September I did an upgrade (dom0 and domUs):
  linux-image-2.6.32-5-xen-amd64:amd64 (2.6.32-31, automatic) -> linux-image-2.6.32-5-xen-amd64:amd64 (2.6.32-31, 2.6.32-35)

- What is strange (I don't know if it is related to the problem): /proc/cpuinfo in dom0 gives me
  cpu MHz : 3249880.888
  -- or --
  cpu MHz : 2300454.255
  .... (different after each reboot)
  In the domU this value is OK (cpu MHz : 2200.112), and the bogomips value is also OK (bogomips : 4400.21).
  If I start the machine in a non-Xen environment, the values are also OK.

I now have an identical machine where I can run some tests.

Could you give me some tips on what I could test or implement?
- hardware problem? hypervisor problem? dom0 problem?
- try another hypervisor version?
- try linux-image-3.0.0-1-amd64 3.0.0-3?
- try reproducing the problem? (how? log it? ....)

All your help is welcome!
many thanks Philippe> -----Original Message----- > From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel- > bounces@lists.xensource.com] On Behalf Of George Dunlap > Sent: Monday, September 19, 2011 12:40 PM > To: Dan Magenheimer > Cc: Keir Fraser; jeremy@goop.org; xen-devel@lists.xensource.com; Philippe > Simonet; Konrad Wilk > Subject: Re: [Xen-devel] Xen 4 TSC problems > > On Thu, Sep 15, 2011 at 7:38 PM, Dan Magenheimer > <dan.magenheimer@oracle.com> wrote: > >> I haven''t been following this conversation, so I don''t know if this > >> is relevant, but I''ve just discovered this morning that the TSC warp > >> check in Xen is done at the wrong time (before any secondary cpus are > >> brought up), and thus always returns warp=0. I''ve submitted a patch > >> to do the check after secondary CPUs are brought up; that should > >> cause Xen to do periodic synchronization of TSCs when there is drift. > > > > Wow, nice catch, George! I wonder if this is the underlying bug for > > many of the mysterious time problems that have been reported for a > > year or two now... at least on certain AMD boxes. > > Any idea when this was introduced? Or has it always been wrong? > > Well the comment in 20823:89907dab1aef seems to indicate that''s where the > "assume it''s reliable on AMD until proven otherwise" started; that would be > January 2010. > > I looked as far back as 20705:a74aca4b9386, and there the TSC reliability > checks were again in init_xen_time(). Figuring out where things were before > then is getting into archeology. :-) > > The comment at the top of init_xen_time() is correct now, but from the time > it was first written through 4.1 is was just plain wrong -- it said > init_xen_time() happened after all cpus were up, which has never been true. 
> > -George > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
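A back-of-the-envelope check may explain why the jump is "always 50 minutes". Assuming the HPET main counter is handled as a 32-bit counter ticking at the nominal 14.31818 MHz (both are assumptions; the log only says "14.318MHz HPET"), one wrap takes 2^32 / 14,318,180 Hz, about 299.966 s. Ten wraps, the figure suggested by "wrapped 10 or more times", come to about 2999.66 s, which is within a few milliseconds of the deltas in the syslog lines above. This sketch is arithmetic only; it says nothing about why the extra wraps are being counted.

```python
# Back-of-the-envelope check of the "always 50 minutes" jump.
# Assumptions (not read from the Xen source): the HPET main counter is
# treated as 32 bits wide and ticks at the nominal 14.31818 MHz.
HPET_HZ = 14_318_180
COUNTER_BITS = 32


def wrap_period_ns(freq_hz: int = HPET_HZ, bits: int = COUNTER_BITS) -> float:
    """Time for the counter to run through all 2**bits values once, in ns."""
    return 2 ** bits / freq_hz * 1e9


# Deltas from the syslog lines above (sign dropped), in ns.
reported_deltas_ns = {
    "dnsit11": 2999660112689,
    "dnsit22": 2999662111513,
}

one_wrap = wrap_period_ns()
ten_wraps = 10 * one_wrap
print(f"one wrap = {one_wrap / 1e9:.3f} s, ten wraps = {ten_wraps / 60e9:.2f} min")
for host, delta in reported_deltas_ns.items():
    print(f"{host}: {delta / one_wrap:.5f} wraps, "
          f"{abs(delta - ten_wraps) / 1e6:.3f} ms away from exactly ten")
```

If the ratio printed for every host is 10 to within a few parts per million, the "mysterious" 50 minutes is simply ten HPET wrap periods.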
Hey there,

Just wanted to report that we have been experiencing the same problem since upgrading to Xen 4.0.1 on Debian squeeze. We are running the latest Debian stable release:

ii libxenstore3.0 4.0.1-2
ii linux-image-2.6.32-5-xen-amd64 2.6.32-35squeeze2
ii linux-image-xen-amd64 2.6.32+29
ii xen-hypervisor-4.0-amd64 4.0.1-2
ii xen-linux-system-2.6-xen-amd64 2.6.32+29
ii xen-linux-system-2.6.32-5-xen-amd64 2.6.32-35squeeze2
ii xen-qemu-dm-4.0 4.0.1-2
ii xen-tools 4.2-1
ii xen-utils-4.0 4.0.1-2
ii xen-utils-common 4.0.0-1
ii xenstore-utils 4.0.1-2

We also have the clock jumping 50 minutes into the future. We are running IBM HS21 XM blades (Type 7995) with Intel(R) Xeon(R) CPU E5345 @ 2.33GHz. We are also running the same configuration on another machine with Intel(R) Xeon(R) CPU X7550 @ 2.00GHz, where we don't experience this problem. We also didn't have these bugs when running Xen 3.1.0 with the 2.6.18-5-xen-amd64 kernel.

We are currently running one blade with HPET disabled, clocksource=pit and cpuidle=0, and another with HPET on and nothing else configured. The main problem in debugging this is having to wait for the next error to appear. So far both machines have run fine without time jumps, but we did see the time jump on the machine that runs with HPET enabled and no other settings. We could help with debugging here.

Regards
Thomas Pöhler
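Since the event is rare, scanning the logs automatically may save some waiting. The sketch below is a hypothetical helper, not part of any Xen or Debian tool: it extracts the delta from kernel lines of the form "Clocksource tsc unstable (delta = ... ns)" and flags deltas that sit close to a whole number of HPET wraps, assuming a 32-bit HPET counter at the nominal 14.31818 MHz.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the kernel message quoted in this thread, e.g.
# "Clocksource tsc unstable (delta = -2999660112689 ns)".
DELTA_RE = re.compile(r"Clocksource tsc unstable \(delta = (-?\d+) ns\)")

# Assumption: a 32-bit HPET counter at the nominal 14.31818 MHz,
# so one wrap lasts roughly 300 s.
HPET_WRAP_NS = 2 ** 32 / 14_318_180 * 1e9


def find_wrap_jumps(lines: Iterable[str],
                    tolerance_ns: float = 10e6) -> Iterator[Tuple[int, int]]:
    """Yield (delta_ns, wraps) for deltas close to a whole number of HPET wraps."""
    for line in lines:
        match = DELTA_RE.search(line)
        if match is None:
            continue
        delta = int(match.group(1))
        wraps = round(abs(delta) / HPET_WRAP_NS)
        if wraps > 0 and abs(abs(delta) - wraps * HPET_WRAP_NS) <= tolerance_ns:
            yield delta, wraps


sample = [
    "Sep 28 17:45:06 dnsit11 kernel: [1970548.356130] Clocksource tsc "
    "unstable (delta = -2999660112689 ns)",
    "Sep 28 17:45:07 dnsit11 kernel: an unrelated message",
]
for delta, wraps in find_wrap_jumps(sample):
    print(f"jump of {delta / 1e9:.1f} s, about {wraps} HPET wraps")
```

On a real host the same function could be fed an open log file instead of `sample`; on Debian, kernel messages usually also end up in /var/log/kern.log. The 10 ms tolerance is an arbitrary choice that comfortably covers the deltas reported in this thread.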
> From: Philippe.Simonet@swisscom.com [mailto:Philippe.Simonet@swisscom.com]
> Subject: RE: [Xen-devel] Xen 4 TSC problems
>
> Hi Xen developers
>
> I need some good tips to go forward with my TSC problem:
>
> Could you give me some tips that I could test or implement?
> - try other hypervisor version?

Hi Philippe --

It would definitely be worthwhile to see if you can reproduce the problem on the latest xen-unstable bits. (Please make sure that the bug George reported below is fixed in your build.) A lot has changed since 4.0.1.

Dan

P.S. I will be mostly offline for the next week or so...

> > -----Original Message-----
> > From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of George Dunlap
> > Sent: Monday, September 19, 2011 12:40 PM
> > To: Dan Magenheimer
> > Cc: Keir Fraser; jeremy@goop.org; xen-devel@lists.xensource.com; Philippe Simonet; Konrad Wilk
> > Subject: Re: [Xen-devel] Xen 4 TSC problems
> >
> > On Thu, Sep 15, 2011 at 7:38 PM, Dan Magenheimer
> > <dan.magenheimer@oracle.com> wrote:
> > >> I haven't been following this conversation, so I don't know if this
> > >> is relevant, but I've just discovered this morning that the TSC warp
> > >> check in Xen is done at the wrong time (before any secondary cpus are
> > >> brought up), and thus always returns warp=0. I've submitted a patch
> > >> to do the check after secondary CPUs are brought up; that should
> > >> cause Xen to do periodic synchronization of TSCs when there is drift.
> > >
> > > Wow, nice catch, George! I wonder if this is the underlying bug for
> > > many of the mysterious time problems that have been reported for a
> > > year or two now... at least on certain AMD boxes.
> > > Any idea when this was introduced? Or has it always been wrong?
> >
> > Well the comment in 20823:89907dab1aef seems to indicate that's where the
> > "assume it's reliable on AMD until proven otherwise" started; that would be
> > January 2010.
> >
> > I looked as far back as 20705:a74aca4b9386, and there the TSC reliability
> > checks were again in init_xen_time(). Figuring out where things were before
> > then is getting into archeology. :-)
> >
> > The comment at the top of init_xen_time() is correct now, but from the time
> > it was first written through 4.1 it was just plain wrong -- it said
> > init_xen_time() happened after all cpus were up, which has never been true.
> >
> > -George
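George's point about timing can be illustrated with a toy model. This is not Xen's actual warp-check code; the function and the per-CPU TSC readings below are made up. A warp check can only compare CPUs that are online when it runs, so if it runs while only the boot CPU is up, it has nothing to compare against and reports zero warp, however far apart the TSCs really are.

```python
from typing import Dict, Iterable


def max_tsc_warp(tsc_by_cpu: Dict[int, int], online_cpus: Iterable[int]) -> int:
    """Toy model: largest TSC difference among the CPUs that are online.

    tsc_by_cpu maps a CPU number to a TSC value sampled at (nominally) the
    same instant. CPUs not yet brought up are invisible to the check.
    """
    samples = [tsc_by_cpu[cpu] for cpu in online_cpus]
    if len(samples) < 2:
        return 0  # a single CPU has nothing to be compared with
    return max(samples) - min(samples)


# Made-up readings: CPU 3's TSC has drifted ahead of the others.
readings = {0: 1_000_000, 1: 1_000_010, 2: 999_995, 3: 1_000_500}

warp_before_smp = max_tsc_warp(readings, online_cpus=[0])      # only the boot CPU is up
warp_after_smp = max_tsc_warp(readings, online_cpus=readings)  # all CPUs are up
print(f"checked before SMP bring-up: {warp_before_smp}")
print(f"checked after SMP bring-up:  {warp_after_smp}")
```

Under these assumptions the early check always returns 0, which would leave Xen treating the TSC as synchronized and skipping the periodic resynchronization George mentions.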
On 28 February 2011 16:54, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> Yes this is what I mean.
> I am glad to hear that it isn't a bad sign :)
> I thought of a bad sign, because on system with "reliable TSC", this counter
> is always 0.

Hi all,

I have exactly the same problem. I have two cluster nodes: two HP ProLiant DL 580 G4 servers, each with four quad-core Intel(R) Xeon(R) CPU E7330 @ 2.40GHz. I'm running Debian squeeze in the dom0s and domUs.

xm info:

host : xen-p01
release : 2.6.32-5-xen-amd64
version : #1 SMP Sun May 6 08:57:29 UTC 2012
machine : x86_64
nr_cpus : 16
nr_nodes : 1
cores_per_socket : 4
threads_per_core : 1
cpu_mhz : 2400
hw_caps : bfebfbff:20100800:00000000:00000940:0004e3bd:00000000:00000001:00000000
virt_caps : hvm
total_memory : 65532
free_memory : 40317
node_to_cpu : node0:0-15
node_to_memory : node0:40317
node_to_dma32_mem : node0:3256
max_node_id : 0
xen_major : 4
xen_minor : 0
xen_extra : .1
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : unavailable
xen_commandline : placeholder dom0_mem=3072M loglvl=warning guest_loglvl=warning
cc_compiler : gcc version 4.4.5 (Debian 4.4.5-8)
cc_compile_by : ultrotter
cc_compile_domain : debian.org
cc_compile_date : Sat Sep 8 19:15:46 UTC 2012
xend_config_format : 4

Every week I get a clock jump of about 50 minutes ahead on dom0. I'm in serious trouble, because each time it causes one of the two cluster nodes to reboot.
> From: Mauro [mailto:mrsanna1@gmail.com]
> Sent: Thursday, September 27, 2012 9:55 AM
> To: Olivier Hanesse
> Cc: Dan Magenheimer; Jeremy Fitzhardinge; xen-devel@lists.xensource.com; Keir Fraser; Jan Beulich;
> Keir Fraser; Xen Users; Mark Adams
> Subject: Re: [Xen-users] Re: [Xen-devel] Xen 4 TSC problems
>
> I have exactly the same problem.
> I have two cluster nodes.
> Server are two HP Proliant DL 580 G4 with four Quad Core Intel(R)
> Xeon(R) CPU E7330 @ 2.40GHz.
> I'm running debian squeeze in dom0s and domUs.

Hi Mauro --

There's been a lot of work on clocks since 4.0 (by other Xen developers, not me). I don't think this specific problem was ever reproduced by a developer, so I don't think anyone knows whether it has already been fixed, nor are there any plans to backport all the timer work to 4.0.

You might try upgrading your Xen hypervisor to the just-released Xen 4.2 [1] and see if the problem goes away. If the problem still exists in 4.2, it may be easier to get a developer to pay attention to it. It may be specific to the hardware, processors, power management, firmware or even the dom0 kernel, so the first thing to do is try later hypervisor bits.

Sorry I can't be more helpful. Good luck!

Dan

[1] Sorry, I'm not familiar with the 4.0->4.2 upgrade process, so you may want to confirm with others.
Hello,

From my point of view, this is a kind of Xen/hardware "incompatibility/bug": I was able to reproduce it on more than 50 identical servers, but not on another farm of servers with different hardware. The Xen version and Debian kernel were exactly the same on both farms.

Regards
Olivier

2012/9/27 Dan Magenheimer <dan.magenheimer@oracle.com>
> It may be specific hardware or processors or power
> management or firmware or even dom0 kernel, so the first thing
> to do is try later hypervisor bits.
On 27 September 2012 23:28, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> Hello,
>
> From my point of view, this was a kind of xen hardware "incompatibility/bug"
> : I was able to reproduce this bug on more than 50 identical servers, but
> not on another farm of servers with a different hardware.
> Xen version, Debian Kernel was exactly the same on both farm.

Yes, I think so. The problem appears when I use Debian squeeze with Xen 4.0. On another server with the same hardware, but running Debian lenny and Xen 3.2, I have no problems. I've read that a workaround is to set clocksource=pit on the Xen boot line in the GRUB configuration; I hope this works, because I can't change the hardware.
It didn't work for me :( clocksource=pit caused another "time jump" (I don't remember how large, but it was worse than 50 minutes).

2012/9/27 Mauro <mrsanna1@gmail.com>
> I've read that a workaround is to set clocksource=pit on the xen boot
> line in the grub conf, I hope this works because I can't change hardware.
On 29 September 2012 10:08, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> It didn't work for me :(
> clocksource=pit made another "time jump" (don't remember how much, but it
> was worse than 50min)

Damn... so there isn't a solution? This is a huge problem. What processors do you have? I have HP ProLiant DL580 G5 systems with four quad-core Intel(R) Xeon(R) CPU E7330 @ 2.40GHz, running Debian Linux as the OS.
It has happened again: the system date is 50 minutes ahead. Is there really no solution?

root@xen-p02:~# date
sab 29 set 2012, 15.06.25, CEST

root@xen-p02:~# hwclock --debug
hwclock from util-linux-ng 2.17.2
Using /dev interface to clock.
Last drift adjustment done at 1348816781 seconds after 1969
Last calibration done at 1348816781 seconds after 1969
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC time.
Waiting for clock tick...
...got clock tick
Time read from Hardware Clock: 2012/09/29 12:16:58
Hw clock time : 2012/09/29 12:16:58 = 1348921018 seconds since 1969
sab 29 set 2012 14:16:58 CEST -0.751536 seconds

root@xen-p02:~# hwclock --show
sab 29 set 2012 14:17:12 CEST -0.423643 seconds
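The size of the jump can be read off the two outputs above: `date` prints local time (CEST, UTC+2, Italian locale), while `hwclock --debug` reports the RTC as 1348921018 seconds since the epoch, i.e. 12:16:58 UTC. A minimal sketch of the comparison follows. Note that `hwclock` ran a few seconds after `date`, so the real offset is slightly larger than the figure computed here.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2), "CEST")

# `date` output: "sab 29 set 2012, 15.06.25, CEST" (Italian locale).
system_time = datetime(2012, 9, 29, 15, 6, 25, tzinfo=CEST)

# `hwclock --debug`: "... = 1348921018 seconds since 1969" (RTC kept in UTC).
rtc_time = datetime.fromtimestamp(1348921018, tz=timezone.utc)

offset = system_time - rtc_time
print(f"RTC reads {rtc_time:%H:%M:%S} UTC")
print(f"system clock is ahead by {offset} ({offset.total_seconds() / 60:.2f} min)")
```

The result, roughly 49.5 minutes plus the few seconds between the two commands, is in line with the "about 50 minutes" reported by everyone in this thread.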
On 24 February 2011 08:16, Keir Fraser <keir.xen@gmail.com> wrote:
> Please send Xen boot output (xm dmesg). Getting it from Xen 3.2 as well
> would be interesting, if you still have it installed on any of these
> machines.

In case it is useful, here is the xm dmesg output of Xen 4.0 on a Debian squeeze system:

(XEN) Xen version 4.0.1 (Debian 4.0.1-5.4) (ultrotter@debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) Sat Sep 8 19:15:46 UTC 2012
(XEN) Bootloader: GRUB 1.98+20100804-14+squeeze1
(XEN) Command line: placeholder dom0_mem=3072M loglvl=warning guest_loglvl=warning
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: V2; EDID transfer time: 2 seconds
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009f400 (usable)
(XEN) 000000000009f400 - 00000000000a0000 (reserved)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000cfd43000 (usable)
(XEN) 00000000cfd43000 - 00000000cfd4c000 (ACPI data)
(XEN) 00000000cfd4c000 - 00000000cfd4d000 (usable)
(XEN) 00000000cfd4d000 - 00000000d0000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fed00000 (reserved)
(XEN) 00000000fee00000 - 00000000fee10000 (reserved)
(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000102ffff000 (usable)
(XEN) ACPI: RSDP 000F4F20, 0024 (r2 HP )
(XEN) ACPI: XSDT CFD43900, 007C (r1 HP ProLiant 2 � 162E)
(XEN) ACPI: FACP CFD439C0, 00F4 (r3 HP ProLiant 2 � 162E)
(XEN) ACPI: DSDT CFD43AC0, 30C9 (r1 HP DSDT 1 INTL 20030228)
(XEN) ACPI: FACS CFD43100, 0040
(XEN) ACPI: SPCR CFD43140, 0050 (r1 HP SPCRRBSU 1 � 162E)
(XEN) ACPI: MCFG CFD431C0, 003C (r1 HP ProLiant 1 0)
(XEN) ACPI: HPET CFD43200, 0038 (r1 HP ProLiant 2 � 162E)
(XEN) ACPI: FFFF CFD43240, 0064 (r2 HP P61 2 � 162E)
(XEN) ACPI: SPMI CFD432C0, 0040 (r5 HP ProLiant 1 � 162E)
(XEN) ACPI: ERST CFD43300, 01D0 (r1 HP ProLiant 1 � 162E)
(XEN) ACPI: APIC CFD43500, 0176 (r1 HP ProLiant 2 0)
(XEN) ACPI: FFFF CFD43680, 0176 (r1 HP ProLiant 1 � 162E)
(XEN) ACPI: BERT CFD43800, 0030 (r1 HP ProLiant 1 � 162E)
(XEN) ACPI: HEST CFD43840, 00BC (r1 HP ProLiant 1 � 162E)
(XEN) System RAM: 65532MB (67105672kB)
(XEN) Domain heap initialised
(XEN) Processor #0 6:15 APIC version 20
(XEN) Processor #8 6:15 APIC version 20
(XEN) Processor #16 6:15 APIC version 20
(XEN) Processor #24 6:15 APIC version 20
(XEN) Processor #1 6:15 APIC version 20
(XEN) Processor #9 6:15 APIC version 20
(XEN) Processor #17 6:15 APIC version 20
(XEN) Processor #25 6:15 APIC version 20
(XEN) Processor #2 6:15 APIC version 20
(XEN) Processor #10 6:15 APIC version 20
(XEN) Processor #18 6:15 APIC version 20
(XEN) Processor #26 6:15 APIC version 20
(XEN) Processor #3 6:15 APIC version 20
(XEN) Processor #11 6:15 APIC version 20
(XEN) Processor #19 6:15 APIC version 20
(XEN) Processor #27 6:15 APIC version 20
(XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
(XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec80000, GSI 24-47
(XEN) IOAPIC[2]: apic_id 3, version 32, address 0xfec81000, GSI 48-71
(XEN) IOAPIC[3]: apic_id 4, version 32, address 0xfec81800, GSI 72-95
(XEN) Enabling APIC mode: Phys. Using 4 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2400.145 MHz processor.
(XEN) Initing memory sharing.
(XEN) VMX: Supported advanced features:
(XEN) - APIC MMIO access virtualisation
(XEN) - APIC TPR shadow
(XEN) - Virtual NMI
(XEN) - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) I/O virtualisation disabled
(XEN) Total of 16 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) checking TSC synchronization across 16 CPUs: passed.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 32 KiB.
(XEN) Brought up 16 CPUs (XEN) *** LOADING DOMAIN 0 *** (XEN) Xen kernel: 64-bit, lsb, compat32 (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1708000 (XEN) PHYSICAL MEMORY ARRANGEMENT: (XEN) Dom0 alloc.: 000000083c000000->0000000840000000 (770048 pages to be allocated) (XEN) VIRTUAL MEMORY ARRANGEMENT: (XEN) Loaded kernel: ffffffff81000000->ffffffff81708000 (XEN) Init. ramdisk: ffffffff81708000->ffffffff81efb000 (XEN) Phys-Mach map: ffffffff81efb000->ffffffff824fb000 (XEN) Start info: ffffffff824fb000->ffffffff824fb4b4 (XEN) Page tables: ffffffff824fc000->ffffffff82513000 (XEN) Boot stack: ffffffff82513000->ffffffff82514000 (XEN) TOTAL: ffffffff80000000->ffffffff82800000 (XEN) ENTRY ADDRESS: ffffffff81531200 (XEN) Dom0 has maximum 16 VCPUs (XEN) Scrubbing Free RAM: .........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................done. (XEN) Xen trace buffers: disabled (XEN) Std. Loglevel: Errors and warnings (XEN) Guest Loglevel: Errors and warnings (XEN) Xen is relinquishing VGA console. (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen) (XEN) Freed 176kB init memory. (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times. 
And here is the xm dmesg output of Xen 3.2 on a Debian lenny system running on the same hardware; on this system I don't have clock problems:

(XEN) Xen version 3.2-1 (Debian 3.2.1-2) (waldi@debian.org) (gcc version 4.3.1 (Debian 4.3.1-2) ) Sat Jun 28 09:32:18 UTC 2008
(XEN) Command line:
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: V2; EDID transfer time: 2 seconds
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009f400 (usable)
(XEN) 000000000009f400 - 00000000000a0000 (reserved)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000cfd43000 (usable)
(XEN) 00000000cfd43000 - 00000000cfd4c000 (ACPI data)
(XEN) 00000000cfd4c000 - 00000000cfd4d000 (usable)
(XEN) 00000000cfd4d000 - 00000000d0000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fed00000 (reserved)
(XEN) 00000000fee00000 - 00000000fee10000 (reserved)
(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000f2ffff000 (usable)
(XEN) System RAM: 61436MB (62911368kB)
(XEN) Xen heap: 12MB (13128kB)
(XEN) Domain heap initialised: DMA width 32 bits
(XEN) Processor #0 6:15 APIC version 20
(XEN) Processor #8 6:15 APIC version 20
(XEN) Processor #16 6:15 APIC version 20
(XEN) Processor #24 6:15 APIC version 20
(XEN) Processor #1 6:15 APIC version 20
(XEN) Processor #9 6:15 APIC version 20
(XEN) Processor #17 6:15 APIC version 20
(XEN) Processor #25 6:15 APIC version 20
(XEN) Processor #2 6:15 APIC version 20
(XEN) Processor #10 6:15 APIC version 20
(XEN) Processor #18 6:15 APIC version 20
(XEN) Processor #26 6:15 APIC version 20
(XEN) Processor #3 6:15 APIC version 20
(XEN) Processor #11 6:15 APIC version 20
(XEN) Processor #19 6:15 APIC version 20
(XEN) Processor #27 6:15 APIC version 20
(XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
(XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec80000, GSI 24-47
(XEN) IOAPIC[2]: apic_id 3, version 32, address 0xfec81000, GSI 48-71
(XEN) IOAPIC[3]: apic_id 4, version 32, address 0xfec81800, GSI 72-95
(XEN) Enabling APIC mode: Phys. Using 4 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2400.136 MHz processor.
(XEN) HVM: VMX enabled
(XEN) CPU0: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 1/8 eip 8c000
(XEN) CPU1: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 2/16 eip 8c000
(XEN) CPU2: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 3/24 eip 8c000
(XEN) CPU3: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 4/1 eip 8c000
(XEN) CPU4: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 5/9 eip 8c000
(XEN) CPU5: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 6/17 eip 8c000
(XEN) CPU6: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 7/25 eip 8c000
(XEN) CPU7: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 8/2 eip 8c000
(XEN) CPU8: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 9/10 eip 8c000
(XEN) CPU9: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 10/18 eip 8c000
(XEN) CPU10: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 11/26 eip 8c000
(XEN) CPU11: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 12/3 eip 8c000
(XEN) CPU12: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 13/11 eip 8c000
(XEN) CPU13: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 14/19 eip 8c000
(XEN) CPU14: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Booting processor 15/27 eip 8c000
(XEN) CPU15: Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping 0b
(XEN) Total of 16 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) Platform timer overflows in 14998 jiffies.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 16 CPUs
(XEN) AMD IOMMU: Disabled
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, lsb, paddr 0x200000 -> 0x631918
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 00000008e0000000->00000008f0000000 (15422585 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff80200000->ffffffff80631918
(XEN) Init. ramdisk: ffffffff80632000->ffffffff80d40e00
(XEN) Phys-Mach map: ffffffff80d41000->ffffffff8836b3c8
(XEN) Start info: ffffffff8836c000->ffffffff8836c4a4
(XEN) Page tables: ffffffff8836d000->ffffffff883b4000
(XEN) Boot stack: ffffffff883b4000->ffffffff883b5000
(XEN) TOTAL: ffffffff80000000->ffffffff88800000
(XEN) ENTRY ADDRESS: ffffffff80200000
(XEN) Dom0 has maximum 16 VCPUs
(XEN) Initrd len 0x70ee00, start at 0xffffffff80632000
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 104kB init memory.
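The Xen 3.2 line "Platform timer overflows in 14998 jiffies" is consistent with the HPET figures in both logs. The sketch below checks this under three assumptions of ours, not read from the Xen 3.2 source: a 32-bit HPET counter at the nominal 14.31818 MHz, a Xen tick rate of HZ = 100, and an overflow handler scheduled every half wrap so that no wrap is missed.

```python
# Assumptions (ours, not read from the Xen 3.2 source): a 32-bit HPET counter
# at the nominal 14.31818 MHz, a Xen tick rate of HZ = 100, and an overflow
# handler scheduled every half wrap so no wrap can be missed.
HPET_HZ = 14_318_180
XEN_HZ = 100

wrap_s = 2 ** 32 / HPET_HZ                   # one full wrap, ~300 s
overflow_jiffies = int(wrap_s / 2 * XEN_HZ)  # half a wrap, in ticks

print(f"full wrap: {wrap_s:.3f} s, half wrap: {overflow_jiffies} jiffies")
```

If these assumptions hold, both hypervisors use the same roughly 300 s HPET wrap period; the difference between the two logs is that Xen 4.0 additionally reports having compensated for ten or more unexpected wraps.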
On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
> It's happened another time, system date 50 minutes ahead.
> There is really no solution?
>

Try a recent Xen hypervisor version, such as Xen 4.1.3 or 4.2.0. It helps a lot to know whether the issue still exists in the latest hypervisor versions. 4.0.1 is already quite old, and in any case 4.0.4 is the latest release in the 4.0 branch.

-- Pasi
On 30 September 2012 17:13, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
>> It's happened another time, system date 50 minutes ahead.
>> There is really no solution?
>>
>
> Try with a recent Xen hypervisor version. Xen 4.1.3 or 4.2.0.
> It helps a lot to know if the issue is still in the latest hypervisor versions or not.

I'm using the Debian squeeze Xen kernel, and it comes with Xen 4.0.
On 30 September 2012 21:23, Mauro <mrsanna1@gmail.com> wrote:
> On 30 September 2012 17:13, Pasi Kärkkäinen <pasik@iki.fi> wrote:
>> On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
>>> It's happened another time, system date 50 minutes ahead.
>>> There is really no solution?
>>>
>>
>> Try with a recent Xen hypervisor version. Xen 4.1.3 or 4.2.0.
>> It helps a lot to know if the issue is still in the latest hypervisor versions or not.

Has anyone had this problem and solved it by using a more recent Xen hypervisor?
>From: xen-users-bounces@lists.xen.org [xen-users-bounces@lists.xen.org] On Behalf Of Mauro [mrsanna1@gmail.com]
>On 30 September 2012 17:13, Pasi Kärkkäinen <pasik@iki.fi> wrote:
>> On Sat, Sep 29, 2012 at 02:19:55PM +0200, Mauro wrote:
>>> It's happened another time, system date 50 minutes ahead.
>>> There is really no solution?
>>>
>>
>> Try with a recent Xen hypervisor version. Xen 4.1.3 or 4.2.0.
>> It helps a lot to know if the issue is still in the latest hypervisor versions or not.
>
>I'm using debian squeeze xen kernel and this kernel has xen 4.0.

The Xen hypervisor and the kernel are two different things; you can mix and match them in many ways. If you don't want to compile from source, use the Xen packages from Debian Wheezy (4.1.3-2), Sid (4.1.3-3) or Experimental (4.2.0-1).

I doubt you will make progress without trying newer versions, no matter how many emails you send to the list (and to all the people on it) ... :)

Matej
Hello,

Something strange on this rainy morning: I hit this bug again on different hardware, running the Xen hypervisor from Wheezy with the kernel from Squeeze:

ii linux-image-2.6.32-5-xen-amd64 2.6.32-46 Linux 2.6.32 for 64-bit PCs, Xen dom0 support
ii xen-hypervisor-4.1-amd64 4.1.3-2 Xen Hypervisor on AMD64

I upgraded this server last week. I don't know whether it is related, but I hadn't had any "time wrap" on this server for more than 250 days, so maybe it is linked to the upgrade. Before the upgrade, my versions were:

ii linux-image-2.6.32-5-xen-amd64 2.6.32-41squeeze2 Linux 2.6.32 for 64-bit PCs, Xen dom0 support
ii xen-hypervisor-4.1-amd64 4.1.2-2 Xen Hypervisor on AMD64

I have only upgraded half of my servers, so I will wait a little before upgrading the other half, to see whether this issue recurs only on the upgraded servers.

For the record, the errors are always the same:

xm dmesg => (XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
/var/log/* => Oct 14 22:46:07 eul2400468 kernel: [734618.562219] Clocksource tsc unstable (delta = -2999660313370 ns)

I thought this issue was hardware-related; maybe not.

Olivier

2012/9/30 Mauro <mrsanna1@gmail.com>
> There is someone that had the problem and solved using a recent xen
> hypervisor?
<Philippe.Simonet@swisscom.com>
2012-Oct-15 08:05 UTC
Re: [Xen-users] Re: Xen 4 TSC problems
Hi Olivier,

Bad news: this means that Xen 4.1.x doesn't solve this issue...

On my side, on the hardware that produces this bug, I never had the problem with this combination (100% Wheezy):

ii  xen-hypervisor-4.1-amd64   4.1.3-2   amd64  Xen Hypervisor on AMD64
ii  linux-image-3.2.0-3-amd64  3.2.23-1  amd64  Linux 3.2 for 64-bit PCs

(I tried it because I had high hopes that 4.1.x had solved the problem...)

Philippe

From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Olivier Hanesse
Sent: Monday, October 15, 2012 9:40 AM
To: Mauro
Cc: Dan Magenheimer; xen-devel@lists.xensource.com; Keir Fraser; Jeremy Fitzhardinge; Keir Fraser; Xen Users; Mark Adams
Subject: Re: [Xen-devel] [Xen-users] Re: Xen 4 TSC problems

[...]
Really, really bad news. I hope this annoying problem will be resolved very soon.

On 15 October 2012 10:05, <Philippe.Simonet@swisscom.com> wrote:
> Hi Oliver
>
> bad news, this means that xen 4.1xxx doesn’t solve this issue …
>
> [...]
>>> On 15.10.12 at 09:39, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
> For the record, always the same errors :
>
> xm dmesg => (XEN) Platform timer appears to have unexpectedly wrapped 10 or
> more times.

This is what needs to be analyzed: for it to happen, timer softirqs must not run for quite a long period of time, and we need to determine what causes that. Since only very few people can actually see this problem, we depend on them to do at least some of the data collection (including figuring out which hardware and/or software components are involved in surfacing the problem).

Jan
On 15 October 2012 12:32, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 15.10.12 at 09:39, Olivier Hanesse <olivier.hanesse@gmail.com> wrote:
>> For the record, always the same errors :
>>
>> xm dmesg => (XEN) Platform timer appears to have unexpectedly wrapped 10 or
>> more times.
>
> This is what needs to be analyzed: For it to happen, timer softirqs
> must not occur for quite long a period of time, and it needs to be
> determined what that is. Since only very few people can actually
> see this problem, we depend on at least some data collection
> (including figuring out what hardware and/or software components
> are involved in surfacing the problem) being done by them.

I have the problem on this hardware type:

HP ProLiant DL580 G5 with four Intel(R) Xeon(R) CPU E7330 @ 2.40GHz.

It seems that putting

GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"

in /etc/default/grub (I use Debian Linux) solves the problem for me.
>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
> I have the problem on this hardware type:
>
> Hp Proliant DL580 G5 with four Intel(R) Xeon(R) CPU E7330 @ 2.40GHz.
> It seem that
> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
> put in in /etc/default/grup (I use linux debian)
> solves the problem for me.

Did you check whether either option on its own also makes the problem go away?

Jan
On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
>> [...]
>> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
>> [...]
>
> Did you check whether either or both options on their own also
> make the problem go away?

clocksource=pit alone does not solve the problem. I haven't tried cpuidle=0 alone yet; I will try soon.
On 15/10/2012 15:25, "Mauro" <mrsanna1@gmail.com> wrote:

> On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>> Did you check whether either or both options on their own also
>> make the problem go away?
>
> Only clocksource=pit does not solve the problem, I've not tried with
> only cpuidle=0, I will try soon.

The problem here is that the platform timer has *not* wrapped. In fact it is almost certainly correct; it is the calculation of current system time, extrapolated from the local CPU's TSC, that has gone haywire. The overflow-handling logic in plt_overflow() then propagates that error into plt_stamp64 (up to a maximum of 10 wraps of the platform timer's counter). As a result, platform time becomes incorrect (it skips forward), and soon afterwards the error infects the local time estimation of all CPUs.

I've attached a patch which will (a) stop plt_overflow() from misguidedly trying to fix up apparent platform timer overflow, and (b) print possibly useful diagnostics whenever an apparent 'timer overflow' occurs. Such lines will be prefixed "XXX plt_overflow:" in the hypervisor log. The patch is against xen-unstable, but I'm sure it will backport to older trees quite trivially.

 -- Keir
On Wed, 2012-10-17 at 17:15 +0100, Keir Fraser wrote:
> @@ -540,6 +541,14 @@ static void plt_overflow(void *unused)
>          plt_wrap = __read_platform_stime(plt_stamp64 + plt_mask + 1);
>          if ( ABS(plt_wrap - now) > ABS(plt_now - now) )
>              break;
> +        rdtscll(tsc);
> +        printk("XXX plt_overflow: plt_now=%"PRIx64" plt_wrap=%"PRIx64
> +               " now=%"PRIx64" old_stamp=%"PRIx64" new_stamp=%"PRIx64
> +               " plt_stamp64=%"PRIx64" plt_mask=%"PRIx64
> +               " tsc=%"PRIx64" tsc_stamp=%"PRIx64"\n",
> +               plt_now, plt_wrap, now, old_stamp, plt_stamp, plt_stamp64,
> +               plt_mask, tsc, this_cpu(cpu_time).local_tsc_stamp);
> +        break;

Is the break here, which makes the following update to plt_stamp64 dead code, deliberate?

>          plt_stamp64 += plt_mask + 1;
>      }
>      if ( i != 0 )

Ian.
On 18/10/2012 08:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

> On Wed, 2012-10-17 at 17:15 +0100, Keir Fraser wrote:
>> @@ -540,6 +541,14 @@ static void plt_overflow(void *unused)
>> [...]
>> +        break;
>
> Is the break here, making the following update to plt_stamp64 dead code
> deliberate?

Yes, it's a hack to disable the timer-has-apparently-wrapped workaround.

 -- Keir
On Thu, 2012-10-18 at 08:55 +0100, Keir Fraser wrote:
> On 18/10/2012 08:40, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:
>
> > Is the break here, making the following update to plt_stamp64 dead code
> > deliberate?
>
> Yes, it's a hack to disable the timer-has-apparently-wrapped workaround.

OK, good.

I wonder if this explains some of the issues which have been plaguing Debian Squeeze (4.0 based) for a while now. I'll see if I can get someone there to give it a go.

Ian.
On 18 October 2012 10:33, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> I wonder if this explains some of the issues which have been plaguing
> Debian Squeeze (4.0 based) for a while now. I'll see if I can get
> someone there to give it a go.

If that patch works, the Debian kernel maintainers could be asked to include it and release a new kernel that works on Squeeze.
On Thu, 2012-10-18 at 09:56 +0100, Mauro wrote:
> If that patch works debian kernel maintainers can be advised if they
> can include that patch and release a new kernel working for squeeze.

AFAIK this is a debug patch, not something we would deploy as is, but it should give us the information required to work out a real fix.

Ian
<Philippe.Simonet@swisscom.com>
2012-Oct-18 13:45 UTC
Re: [Xen-users] Re: Xen 4 TSC problems
In the meantime, it would be nice to have a kernel boot parameter that could disable this wrapping 'correction', something like <check-timer-wrap=false>?

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: Thursday, October 18, 2012 9:40 AM
> To: Keir Fraser
> Cc: Jeremy Fitzhardinge; xen-devel@lists.xensource.com; Dan Magenheimer; Mauro; Olivier Hanesse; Jan Beulich; Xen Users; Mark Adams
> Subject: Re: [Xen-devel] [Xen-users] Re: Xen 4 TSC problems
>
> On Wed, 2012-10-17 at 17:15 +0100, Keir Fraser wrote:
> > @@ -540,6 +541,14 @@ static void plt_overflow(void *unused)
> > [...]
>
> Is the break here, making the following update to plt_stamp64 dead code
> deliberate?
>
> [...]
We have no idea yet whether this workaround even does any good.

 -- Keir

On 18/10/2012 14:45, "Philippe.Simonet@swisscom.com" <Philippe.Simonet@swisscom.com> wrote:

> in the meantime, it would be cool to have a kernel boot parameter that could
> disable this wrapping 'correction' ? like <check-timer-wrap=false>
>
> [...]
On 15 October 2012 14:49, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 15.10.12 at 13:24, Mauro <mrsanna1@gmail.com> wrote:
>> [...]
>> GRUB_CMDLINE_XEN="clocksource=pit cpuidle=0"
>> [...]
>
> Did you check whether either or both options on their own also
> make the problem go away?

It seems that, with Debian Squeeze on my HP ProLiant DL580 G5 servers, it is sufficient to use

GRUB_CMDLINE_XEN="cpuidle=0"

I have had no clock jumps for about 20 days; before that, I had a clock jump every week. I hope this is the final workaround for me.
>>> On 21.10.12 at 22:52, Mauro <mrsanna1@gmail.com> wrote:
> It seems that with debian squeeze on my HP Proliant Dl 580 G5 servers
> is sufficient to use
> GRUB_CMDLINE_XEN="cpuidle=0".
> Is from about 20 days that I have no clock jumps.
> Before I had a clock jump every week.
> Hope this is the final workaround for me.

So what are the contents of /proc/cpuinfo (any one CPU suffices) under a recent native kernel on that system? The most likely issue here is that we're misidentifying the CPU as having an always-running APIC timer (ARAT)...

As a second, less intrusive try: could you replace "cpuidle=0" with "max_cstate=1" (assuming the former hasn't meanwhile turned out not to cure the problem)? If that works too (as expected), try "max_cstate=2", followed eventually by "max_cstate=2 local_apic_timer_c2_ok".

Jan
On 22 October 2012 08:54, Jan Beulich <JBeulich@suse.com> wrote:
> So what's the contents of /proc/cpuinfo (any one CPU suffices)
> under a native recent kernel on that system? The most likely
> issue here is that we're mis-identifying the CPU as having an
> always running APIC timer (ARAT)...

uname -a
Linux xen-p01 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012 x86_64 GNU/Linux

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.176
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.35
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

> For a second, less intrusive try: Could you replace "cpuidle=0"
> with "max_cstate=1" (assuming the former didn't meanwhile
> turn out not to cure the problem)? If that works too (expected),
> try "max_cstate=2" followed eventually by
> "max_cstate=2 local_apic_timer_c2_ok".

I'll try, but before I can say that it works I have to wait at least two weeks.
>>> On 22.10.12 at 11:17, Mauro <mrsanna1@gmail.com> wrote:
> On 22 October 2012 08:54, Jan Beulich <JBeulich@suse.com> wrote:
>> So what's the contents of /proc/cpuinfo (any one CPU suffices)
>> under a native recent kernel on that system? The most likely
>> issue here is that we're mis-identifying the CPU as having an
>> always running APIC timer (ARAT)...
>
> uname -a
>
> Linux xen-p01 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012
> x86_64 GNU/Linux

I had specifically asked for this to be done under a _native_ kernel.

> cat /proc/cpuinfo
> [...]
>
>> For a second, less intrusive try: [...]
>
> I'll try but to say that it works I've to wait at least two weeks.

I understand that this takes quite a bit of time.

Jan
On 22 October 2012 11:27, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 22.10.12 at 11:17, Mauro <mrsanna1@gmail.com> wrote:
>>> uname -a
>>>
>>> Linux xen-p01 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012
>>> x86_64 GNU/Linux
>
> I had specifically asked to do this under a _native_ kernel.

Sorry for my ignorance, but what does "native kernel" mean?
>>> On 22.10.12 at 12:40, Mauro <mrsanna1@gmail.com> wrote:
> On 22 October 2012 11:27, Jan Beulich <JBeulich@suse.com> wrote:
>> I had specifically asked to do this under a _native_ kernel.
>
> sorry for my ignorance, what does it mean native_kernel.

A kernel run with no Xen underneath it.

Jan
> A kernel run with no Xen underneath it.

Here it is:

uname -a
Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012 x86_64 GNU/Linux

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
stepping        : 11
cpu MHz         : 2399.822
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 4799.64
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>> A kernel run with no Xen underneath it.
>
> Here is:
>
> uname -a
> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
> x86_64 GNU/Linux

I'm sorry to say it, but 2.6.32 is nowhere close to "recent" (as I had asked for). The thing is that the (unfortunately incomplete) hypervisor log you sent earlier leaves open whether we mis-detect ARAT on that system (the relevant HPET message is at info level, but you had "loglvl=warning" in place), so we can't be certain of either fact (whether the CPU actually reports ARAT, and whether HPET broadcast is getting set up).

Irrespective of that, it would also be useful to know whether the native kernel (and in particular its CPU idle management) works on that system, and which C-states it actually makes use of. So getting us the contents of the respective sysfs nodes would also be helpful for reference.

Jan

> cat /proc/cpuinfo
> [...]
On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 23.10.12 at 09:18, Mauro <mrsanna1@gmail.com> wrote:
>>> A kernel run with no Xen underneath it.
>>
>> Here is:
>>
>> uname -a
>> Linux xen-p02 2.6.32-5-amd64 #1 SMP Sun Sep 23 10:07:46 UTC 2012
>> x86_64 GNU/Linux
>
> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
> I had asked for).

Sorry, I'm using Squeeze on production servers and I don't have a test machine on which to install Wheezy. Is there nothing else I can do? As reported before, with the cpuidle=0 workaround I don't have any problems now.
>>> On 23.10.12 at 10:40, Mauro <mrsanna1@gmail.com> wrote:
> On 23 October 2012 09:58, Jan Beulich <JBeulich@suse.com> wrote:
>> I'm sorry to say that, but 2.6.32 is nowhere close to "recent" (as
>> I had asked for).
>
> Sorry I'm using squeeze in production servers and I don't have a test
> machine on which install wheezy.

And you can't (or don't want to) install a self-built kernel?

> There's nothing else I can do?

I told you yesterday which less invasive command line options you could try. Plus, in an earlier mail today I also asked for specific information on which C-states the native kernel uses. If all you've got is 2.6.32, obtaining the information there is better than nothing.

Jan

> As reported before with the workaround cpuidle=0 I don't have any problem
> now.
On Tue, Oct 23, 2012 at 09:50:13AM +0100, Jan Beulich wrote:
> >>> On 23.10.12 at 10:40, Mauro <mrsanna1@gmail.com> wrote:
> > Sorry I'm using squeeze in production servers and I don't have a test
> > machine on which install wheezy.
>
> And can't/don't want to install a self-built kernel?

You could also boot one of those live images, like Fedora or Ubuntu, and just capture this. That way you don't overwrite anything.
On 23 October 2012 13:50, Konrad Rzeszutek Wilk <konrad@kernel.org> wrote:
> On Tue, Oct 23, 2012 at 09:50:13AM +0100, Jan Beulich wrote:
>> And can't/don't want to install a self-built kernel?
>
> You could also boot one of those Live Image kernels. Like an Fedora or
> Ubuntu and just capture this. That way you don't over-write anything.

OK, I'll try the live image.
Today I had another clock jump, so cpuidle=0 on its own doesn't work.
I'll keep using clocksource=pit and cpuidle=0 together for a while to see whether the combination works.
But... how do I find out which C-states the kernel uses?
>>> On 23.10.12 at 16:07, Mauro <mrsanna1@gmail.com> wrote:
> Today I had another clock jump, so cpuidle=0 doesn't work on its own.
> I'll keep using clocksource=pit and cpuidle=0 for a while to see whether
> they work together.
> But how can I find out which C-states the kernel uses?

If "cpuidle=0" alone doesn't work, your problem is not C-state
related, and you don't need to look up how much the native kernel
uses C-states.

Jan
On 23 October 2012 16:43, Jan Beulich <JBeulich@suse.com> wrote:
> If "cpuidle=0" alone doesn't work, your problem is not C-state
> related, and you don't need to look up how much the native kernel
> uses C-states.

Ok, but I still need cpuidle=0, because clocksource=pit alone doesn't
work either. Now I'll use both parameters and see what happens.
On 23 October 2012 16:46, Mauro <mrsanna1@gmail.com> wrote:
> Ok, but I still need cpuidle=0, because clocksource=pit alone doesn't
> work either. Now I'll use both parameters and see what happens.

Sorry for the noise; I'm not sure it really was a clock jump.
I'll retry using only cpuidle=0, and then what Jan suggested.
If you want to see the C-states, tell me how to do it.

P.S. Sorry for my bad English.
>>> On 23.10.12 at 17:34, Mauro <mrsanna1@gmail.com> wrote:
> Sorry for the noise; I'm not sure it really was a clock jump.
> I'll retry using only cpuidle=0, and then what Jan suggested.
> If you want to see the C-states, tell me how to do it.

Let's hold off on that until you have settled whether "cpuidle=0" alone
works.

Jan
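If the C-state check Mauro offered does become necessary, one way to see which C-states a native (non-Xen) Linux kernel, such as the live image suggested earlier, exposes and how often each has been entered is the cpuidle sysfs interface. The Python sketch below is an illustration only: it assumes the standard layout "/sys/devices/system/cpu/cpuN/cpuidle/stateK/" with "name" and "usage" files, and the function name "cstate_usage" is hypothetical. Under Xen the hypervisor, not dom0, manages C-states, so a dom0 kernel may expose nothing here; Xen's own "xenpm" tool is the likelier place to look in that case.

```python
"""List cpuidle C-states and their entry counts from Linux sysfs.

Assumes the standard cpuidle sysfs layout:
  <root>/cpuN/cpuidle/stateK/{name,usage}
where <root> defaults to /sys/devices/system/cpu.
"""
import os
import re
import sys

DEFAULT_ROOT = "/sys/devices/system/cpu"


def _read(path):
    # Each sysfs attribute file holds a single value plus a trailing newline.
    with open(path) as f:
        return f.read().strip()


def cstate_usage(root=DEFAULT_ROOT):
    """Return {cpu: [(state_name, usage_count), ...]}, ordered numerically."""
    # Keep only per-CPU directories (cpu0, cpu1, ...), skipping entries such
    # as the global "cpuidle" or "cpufreq" directories.
    cpus = [c for c in os.listdir(root) if re.fullmatch(r"cpu\d+", c)]
    cpus.sort(key=lambda c: int(c[3:]))
    result = {}
    for cpu in cpus:
        idle_dir = os.path.join(root, cpu, "cpuidle")
        if not os.path.isdir(idle_dir):
            continue  # no cpuidle driver registered for this CPU
        states = [s for s in os.listdir(idle_dir) if re.fullmatch(r"state\d+", s)]
        states.sort(key=lambda s: int(s[len("state"):]))
        result[cpu] = [
            (_read(os.path.join(idle_dir, s, "name")),
             int(_read(os.path.join(idle_dir, s, "usage"))))
            for s in states
        ]
    return result


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_ROOT
    usage = cstate_usage(root) if os.path.isdir(root) else {}
    if not usage:
        print("no cpuidle states exposed (possibly running under Xen, "
              "or cpuidle is disabled)")
    for cpu, states in usage.items():
        for name, count in states:
            print(f"{cpu} {name}: entered {count} times")
```

Run it on the booted live image with no arguments; an optional path argument lets it read a copied sysfs tree instead. Comparing the "usage" counts before and after a period of load would show which states the native kernel actually enters.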