Derek Murray
2007-Feb-23 19:10 UTC
[Xen-devel] Soft lockup/Time went backwards in latest unstable
I'm running Xen on a Lenovo Thinkpad T60p, based on an Intel Core Duo T2500 processor. Since upgrading to the latest unstable, the console is filled with "Time went backwards" messages. An example:

    Timer ISR/1: Time went backwards: delta=-52249969 delta_cpu=67750031 shadow=197099583154 off=168169166 processed=197320000000 cpu_processed=197200000000
     0: 197310000000
     1: 197200000000
    Timer ISR/1: Time went backwards: delta=-76271442 delta_cpu=73728558 shadow=197099583154 off=224147323 processed=197400000000 cpu_processed=197250000000
     0: 197390000000
     1: 197250000000
    Timer ISR/1: Time went backwards: delta=-90285138 delta_cpu=69714862 shadow=197099583154 off=280133627 processed=197470000000 cpu_processed=197310000000
     0: 197460000000
     1: 197310000000

In addition, I noticed a soft lockup, with the following stack trace:

    BUG: soft lockup detected on CPU#0!
     [<c014ef57>] softlockup_tick+0x97/0xd0
     [<c01097ea>] timer_interrupt+0x2fa/0x6b0
     [<ee06a875>] sd_rw_intr+0x75/0x2e0 [sd_mod]
     [<c014f2c3>] handle_IRQ_event+0x33/0xa0
     [<c014f3d8>] __do_IRQ+0xa8/0x120
     [<c010733f>] do_IRQ+0x3f/0xa0
     [<c0244b4e>] evtchn_do_upcall+0xbe/0x100
     [<c010580d>] hypervisor_callback+0x3d/0x48
     [<c020007b>] cfb_imageblit+0x10b/0x530
     [<c0243f5a>] force_evtchn_callback+0xa/0x10
     [<ee07c63f>] acpi_processor_idle+0x26d/0x422 [processor]
     [<ee07c3d2>] acpi_processor_idle+0x0/0x422 [processor]
     [<c01035a1>] cpu_idle+0x71/0xd0
     [<c03b482d>] start_kernel+0x39d/0x480
     [<c03b4220>] unknown_bootoption+0x0/0x270

and

    BUG: soft lockup detected on CPU#1!
     [<c014ef57>] softlockup_tick+0x97/0xd0
     [<c01097ea>] timer_interrupt+0x2fa/0x6b0
     [<c014f2c3>] handle_IRQ_event+0x33/0xa0
     [<c014f3d8>] __do_IRQ+0xa8/0x120
     [<c010733f>] do_IRQ+0x3f/0xa0
     [<c0107344>] do_IRQ+0x44/0xa0
     [<c0244b4e>] evtchn_do_upcall+0xbe/0x100
     [<c010580d>] hypervisor_callback+0x3d/0x48
     [<c0243f5a>] force_evtchn_callback+0xa/0x10
     [<ee07c63f>] acpi_processor_idle+0x26d/0x422 [processor]
     [<ee07c3d2>] acpi_processor_idle+0x0/0x422 [processor]
     [<c01035a1>] cpu_idle+0x71/0xd0

It appears that this issue was introduced in Xen 3.0.4, as 3.0.4 exhibits the same problem, while 3.0.3 works fine. The version of unstable that I was using previously (which did not have this issue) worked with kernel version 2.6.16.29; 3.0.4 comes with 2.6.16.33. Could the problem have been introduced in the move from 2.6.16.29 to 2.6.16.33 (or somewhere in between)?

I'm not sure how to go about handling this apparent bug, though I am prepared to offer any assistance necessary in fixing it. I've attached my xm dmesg, in case that's any help.

Thanks in advance.

Regards,
Derek Murray.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
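The numbers in the "Time went backwards" messages can be checked against each other. The Python sketch below parses the three quoted messages and tests one relation between the fields. It assumes that delta is measured against processed and delta_cpu against cpu_processed, so that both imply the same current time; that reading comes from the field names, not from anything stated in this thread, and the helper names are made up for illustration.

```python
import re

# The three messages quoted in the report above (per-CPU lines omitted).
LOG = """\
Timer ISR/1: Time went backwards: delta=-52249969 delta_cpu=67750031 shadow=197099583154 off=168169166 processed=197320000000 cpu_processed=197200000000
Timer ISR/1: Time went backwards: delta=-76271442 delta_cpu=73728558 shadow=197099583154 off=224147323 processed=197400000000 cpu_processed=197250000000
Timer ISR/1: Time went backwards: delta=-90285138 delta_cpu=69714862 shadow=197099583154 off=280133627 processed=197470000000 cpu_processed=197310000000
"""

FIELD = re.compile(r"(\w+)=(-?\d+)")


def parse(line):
    """Return the key=value fields of one message as a dict of ints."""
    return {key: int(value) for key, value in FIELD.findall(line)}


def is_consistent(fields):
    """Check the assumed relation: both deltas imply the same 'now'."""
    now_global = fields["processed"] + fields["delta"]
    now_cpu = fields["cpu_processed"] + fields["delta_cpu"]
    return now_global == now_cpu


records = [parse(line) for line in LOG.splitlines()
           if "Time went backwards" in line]

for rec in records:
    # How far the global processed time is ahead of this CPU's, in ms.
    gap_ms = (rec["processed"] - rec["cpu_processed"]) // 1_000_000
    print(f"delta={rec['delta']} ns  gap={gap_ms} ms  "
          f"consistent={is_consistent(rec)}")
```

For the three messages quoted, the relation holds each time, and the global processed time is ahead of CPU 1's by 120, 150, and 160 ms respectively, so the gap grows from one message to the next.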
Keir Fraser
2007-Feb-23 23:08 UTC
Re: [Xen-devel] Soft lockup/Time went backwards in latest unstable
On 23/2/07 19:10, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote:

> I'm not sure how to go about handling this apparent bug, though I am
> prepared to offer any assistance necessary in fixing it. I've
> attached my xm dmesg, in case that's any help.

You should be able to run a 3.0.3 dom0 on 3.0.4 Xen and vice versa. By trying both combinations you should be able to work out whether it is the Xen upgrade or the Linux upgrade which introduces the problem.

 -- Keir
Derek Murray
2007-Feb-26 12:39 UTC
Re: [Xen-devel] Soft lockup/Time went backwards in latest unstable
On 23 Feb 2007, at 23:08, Keir Fraser wrote:

> You should be able to run a 3.0.3 dom0 on 3.0.4 Xen and vice versa. By
> trying both combinations you should be able to work out whether it
> is the Xen upgrade or the Linux upgrade which introduces the problem.

It appears I was being hasty in blaming 3.0.4 for introducing the bug. I tested the following configurations:

Xen 3.0.3_0: works with 3.0.3 kernel (2.6.16.29-xen) and 3.0.4 kernel (2.6.16.33-xen). Fails with -unstable kernel (2.6.18).
Xen 3.0.4_1: works with 3.0.3 and 3.0.4 kernels. Fails with -unstable kernel.
Xen unstable: works with 3.0.3 and 3.0.4 kernels. Fails with -unstable kernel.

So, 3.0.3 and 3.0.4 work as distributed, but the latest -unstable is broken.

The reason I attributed the blame to 3.0.4 was because I had seen the same bug in -unstable during the 3.0.3-3.0.4 interregnum; however, this bug seems to have been fixed for the 3.0.4 release, and has re-emerged since with the transition to 2.6.18.

Is there a pattern for how these bugs have been fixed in the past, so that I could go about trying to make a patch?

Regards,
Derek Murray.
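The test matrix above can be written down as data to make the conclusion mechanical. This is a minimal Python sketch; the dictionary holds only the nine results reported in the message, and the function names are made up for illustration.

```python
# (Xen version, dom0 kernel) -> True if the combination worked.
RESULTS = {
    ("3.0.3_0", "2.6.16.29-xen"): True,
    ("3.0.3_0", "2.6.16.33-xen"): True,
    ("3.0.3_0", "2.6.18"): False,
    ("3.0.4_1", "2.6.16.29-xen"): True,
    ("3.0.4_1", "2.6.16.33-xen"): True,
    ("3.0.4_1", "2.6.18"): False,
    ("unstable", "2.6.16.29-xen"): True,
    ("unstable", "2.6.16.33-xen"): True,
    ("unstable", "2.6.18"): False,
}

XEN, KERNEL = 0, 1  # positions within each key tuple


def outcomes_by(axis):
    """Group the observed outcomes by one factor (XEN or KERNEL)."""
    grouped = {}
    for combo, works in RESULTS.items():
        grouped.setdefault(combo[axis], set()).add(works)
    return grouped


def decides_outcome(axis):
    """True if every value of this factor always gave the same result."""
    return all(len(seen) == 1 for seen in outcomes_by(axis).values())


print("Xen version decides outcome:", decides_outcome(XEN))
print("Kernel decides outcome:", decides_outcome(KERNEL))
print("Failing kernels:",
      sorted(k for k, seen in outcomes_by(KERNEL).items() if seen == {False}))
```

Each Xen version both works and fails depending on the kernel, while each kernel gives the same result under every hypervisor, which is why the blame moves from the hypervisor to the 2.6.18 dom0 kernel.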
Keir Fraser
2007-Feb-26 13:22 UTC
Re: [Xen-devel] Soft lockup/Time went backwards in latest unstable
On 26/2/07 12:39, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote:

> The reason I attributed the blame to 3.0.4 was because I had seen the
> same bug in -unstable during the 3.0.3-3.0.4 interregnum; however,
> this bug seems to have been fixed for the 3.0.4 release, and
> re-emerged since with the transition to 2.6.18.
>
> Is there a pattern for how these bugs have been fixed in the past, so
> that I could go about trying to make a patch?

Unfortunately not. There were big changes to Linux's time handling between 2.6.16 and 2.6.18, so bugs may have been introduced while porting our own timer code. It might be worth diffing the file time-xen.c from working and non-working Linux kernels. It's weird that the failure mode is bad on the T60p yet no one else has reported this bug, nor has our testing reproduced it. :-(

 -- Keir
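The diff suggested above can be produced with any diff tool; a minimal Python sketch using the standard difflib module is shown here. The two source strings are placeholders invented for the example, not excerpts from time-xen.c, and the working/ and broken/ labels are arbitrary.

```python
import difflib


def diff_sources(working, broken, name="time-xen.c"):
    """Return a unified diff between two versions of a source file."""
    return list(difflib.unified_diff(
        working.splitlines(),
        broken.splitlines(),
        fromfile=f"working/{name}",
        tofile=f"broken/{name}",
        lineterm="",
    ))


# Placeholder contents only; in practice read both copies of the file
# from the two kernel trees, e.g. with pathlib.Path(...).read_text().
working_text = "int a;\nint b;\n"
broken_text = "int a;\nint c;\n"

for line in diff_sources(working_text, broken_text):
    print(line)
```

Because the kernels differ in version, a diff of the one Xen-specific file is likely to be much smaller than a diff of the whole tree, which is presumably the point of the suggestion.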
Derek Murray
2007-Feb-28 11:25 UTC
Re: [Xen-devel] Soft lockup/Time went backwards in latest unstable
On 26 Feb 2007, at 13:22, Keir Fraser wrote:

> There were big changes to Linux's time handling between
> 2.6.16 and 2.6.18 so bugs may have been introduced while porting
> our own timer code. It might be worth diffing the file time-xen.c from
> working and non-working Linux kernels. It's weird that the failure
> mode is bad on the T60p yet no one else has reported this bug, nor
> has our testing reproduced it. :-(

Some more information on the bug hunt in progress, in case anyone with more experience has any ideas:

* Changeset 13508 (i.e. the last one before the transition to a 2.6.17 kernel) fails in the same way as the current unstable, but it uses a 2.6.16.33 kernel.
* Changeset 13213 (which introduced the idle=poll option in the kernel when running on Xen) also fails with a 2.6.16.33 kernel.
* Changeset 13216 fails to boot at all (or, at least, locks up for >= 30 seconds on boot).
* Changeset 13217 ("Make sure we always have a sensible idle function; this fixed problems") fails with time going backwards.
* Changeset 13212, however, works.

The only relevant changes between changesets 13212 and 13213 apply to the file linux-2.6-xen-sparse/arch/i386/kernel/process-xen.c (and the corresponding x86_64 file), to which a poll_idle() function has been added. I don't quite understand the significance of this, but could it be responsible for the bug that I'm seeing?

Regards,
Derek Murray.
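The changeset hunt above amounts to a bisection between a known-good and a known-bad revision. The following Python sketch shows the idea under two assumptions that the thread does not state: that changeset numbers form a linear history, and that there is a single working-to-failing transition in the range. The works() predicate stands in for building and booting a changeset by hand; here it is a stub chosen to agree with the results reported in the message.

```python
# Results reported in the message; "failed to boot" is counted as failing.
OBSERVED = {13212: True, 13213: False, 13216: False, 13217: False, 13508: False}


def first_bad(good, bad, works):
    """Return the earliest failing changeset in the range (good, bad].

    Assumes works(good) is True, works(bad) is False, and that there is
    exactly one transition from working to failing in between.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if works(mid):
            good = mid   # transition lies after mid
        else:
            bad = mid    # transition is at or before mid
    return bad


def works(changeset):
    """Stub for a manual build-and-boot test (hypothetical)."""
    return changeset <= 13212


# The stub must agree with every result that was actually observed.
assert all(works(cs) == ok for cs, ok in OBSERVED.items())

print(first_bad(13212, 13508, works))
```

Each step halves the range, so the 296 changesets between 13212 and 13508 need about nine builds. A changeset that cannot be tested at all, such as one that does not boot for unrelated reasons, would need to be skipped in favour of a neighbour, which this sketch does not handle.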
Keir Fraser
2007-Feb-28 12:03 UTC
Re: [Xen-devel] Soft lockup/Time went backwards in latest unstable
On 28/2/07 11:25, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote:

> The only relevant changes between changesets 13212 and 13213 apply to
> the file linux-2.6-xen-sparse/arch/i386/kernel/process-xen.c (and the
> corresponding x86_64 file), to which a poll_idle() function has been
> added. I don't quite understand the significance of this, but could
> it be responsible for the bug that I'm seeing?

You are almost certainly picking up acpi_processor_idle() or apm_cpu_idle(). You might want to check this. I don't think any good could come of this. :-) I'll have a think about the best fix...

 -- Keir
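One way to check which idle routine is in use is to look for it in a captured stack trace. The Python sketch below scans the CPU#1 trace from the first message of this thread for the two routine names mentioned above. The frame pattern is an assumption based on the format of that trace, and the function name idle_routines_in is made up for illustration.

```python
import re

# One stack frame looks like: [<c01035a1>] cpu_idle+0x71/0xd0
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# The two routines named in the reply above.
IDLE_ROUTINES = {"acpi_processor_idle", "apm_cpu_idle"}

# CPU#1 trace as posted in the first message of the thread.
TRACE = """\
BUG: soft lockup detected on CPU#1!
 [<c014ef57>] softlockup_tick+0x97/0xd0
 [<c01097ea>] timer_interrupt+0x2fa/0x6b0
 [<c014f2c3>] handle_IRQ_event+0x33/0xa0
 [<c014f3d8>] __do_IRQ+0xa8/0x120
 [<c010733f>] do_IRQ+0x3f/0xa0
 [<c0107344>] do_IRQ+0x44/0xa0
 [<c0244b4e>] evtchn_do_upcall+0xbe/0x100
 [<c010580d>] hypervisor_callback+0x3d/0x48
 [<c0243f5a>] force_evtchn_callback+0xa/0x10
 [<ee07c63f>] acpi_processor_idle+0x26d/0x422 [processor]
 [<ee07c3d2>] acpi_processor_idle+0x0/0x422 [processor]
 [<c01035a1>] cpu_idle+0x71/0xd0
"""


def idle_routines_in(trace):
    """Return the named idle routines that appear as frames in a trace."""
    symbols = set(FRAME.findall(trace))
    return sorted(symbols & IDLE_ROUTINES)


print(idle_routines_in(TRACE))
```

In the trace as posted, acpi_processor_idle appears directly below cpu_idle, which is consistent with the first of the two routines named in the reply.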