[ Please keep me in Cc:, I'm not subscribed to this list ]

Hi,
I'm working on getting domain0 support in NetBSD (i386 for now) for Xen-3.
I have a bootable kernel which works fine under xen-3.0.0, with hardware
device support and limited support for xentools.
Now I tried to switch to 3.0.2-2, and the serial console hangs shortly after
my domain0 kernel enables interrupts. If I call the NetBSD kernel debugger
just after enabling interrupts, I can type a few characters before it hangs.
With ^A^A^A I can switch the input to Xen, and the message "switching
input ..." is printed. However, from there only R is functional (it prints
the message and reboots); all other commands produce no output, including
'h' (this is a Xen kernel rebuilt with debug options). The same Xen kernel
works fine with a Linux dom0, including the debug actions (I tested h, q, i
at least), so it's probably not a compile option issue.

Maybe it's an interrupt issue, but I'm not sure, as ^A and R are still
working fine. I noticed there were changes in the include/public interfaces
between 3.0.0 and 3.0.2, including a new hypercall to unmask events.
I've not updated my NetBSD domain0 to this yet. Could this be the cause?
Any other ideas on what could cause this?

--
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

On Sun, May 07, 2006 at 10:57:10AM +0200, Manuel Bouyer wrote:
> [ Please keep me in Cc:, I'm not subscribed to this list ]
>
> Hi,
> I'm working on getting domain0 support in NetBSD (i386 for now) for Xen-3.
> I have a bootable kernel which works fine under xen-3.0.0, with hardware
> device support and limited support for xentools.
> Now I tried to switch to 3.0.2-2, and the serial console hangs shortly after
> my domain0 kernel enables interrupts. If I call the NetBSD kernel debugger
> just after enabling interrupts, I can type a few characters before it hangs.
> With ^A^A^A I can switch the input to Xen, and the message "switching
> input ..." is printed. However, from there only R is functional (it prints
> the message and reboots); all other commands produce no output, including
> 'h' (this is a Xen kernel rebuilt with debug options). The same Xen kernel
> works fine with a Linux dom0, including the debug actions (I tested h, q, i
> at least), so it's probably not a compile option issue.
>
> Maybe it's an interrupt issue, but I'm not sure, as ^A and R are still
> working fine. I noticed there were changes in the include/public interfaces
> between 3.0.0 and 3.0.2, including a new hypercall to unmask events.
> I've not updated my NetBSD domain0 to this yet. Could this be the cause?
> Any other ideas on what could cause this?

This seems to be SMP-related: if I boot with nosmp, things work properly
again.
I'm booting with noapic (I don't have complete ioapic support yet), and as
it's an ASUS P2B, acpi=ht seems to be added automatically.

When the hypervisor is hung, 'd' still works (in fact, soft interrupts are
blocked), and I get:
(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).
(XEN) 'd' pressed -> dumping registers
(XEN) ----[ Xen-3.0.2-2 Not tainted ]----
(XEN) CPU: 0
(XEN) EIP: e008:[<ff13e31a>] on_selected_cpus+0xdd/0x113
(XEN) EFLAGS: 00000202 CONTEXT: hypervisor
(XEN) eax: 00000000 ebx: ff1fc180 ecx: 00000008 edx: ff1b5ea4
(XEN) esi: 0000000f edi: ff1b5fac ebp: ff1b5ecc esp: ff1b5e84
(XEN) cr0: 8005003b cr3: 0fc6e000
(XEN) ds: e010 es: e010 fs: 0031 gs: 0011 ss: e010 cs: e008
(XEN) Xen stack trace from esp=ff1b5e84:
(XEN)    00000002 000000fb ff1b5e9c ff13db33 ff1b5eb4 ff1b5ea4 ff1b5eac 00000001
(XEN)    ff155cd0 00000000 00000001 00000000 00000000 00000002 ff1b5ecc ff13ddc1
(XEN)    00000000 ff1b5ee8 ff1b5eec ff13e23b 00000002 ff155cd0 00000000 00000001
(XEN)    00000001 00000002 ff1b5f0c ff155dff ff155cd0 00000000 00000001 00000001
(XEN)    ff1ede80 00000002 ff1b5f2c ff155db6 ff155cd0 00000000 00000001 00000001
(XEN)    ff1fc180 ff1fc180 ff1b5f7c ff11d4a7 00000000 ff1e30a8 ff1b5f5c ff11cfe4
(XEN)    00000000 00000000 80818efd 00000003 ff1bf680 ff1e30a8 00000000 ff155d8c
(XEN)    8080cbad 00000003 ff1b5f8c ff1ede80 ff1e30a8 00000000 ff1b5fac ff11bc9d
(XEN)    00000000 ff1be500 00e4a037 ff17d36f 7a936ca1 00000003 deadbeef 00000001
(XEN)    00000000 00000000 00e4a037 ff17d3f6 7a936ca1 00000003 00000000 00000000
(XEN)    0000000f c0a01e98 00000000 00e00000 c04d1523 00000009 00000246 c0a01e60
(XEN)    00000011 00000011 00000011 00000031 00000011 00000000 ff1fc180
(XEN) Xen call trace:
(XEN)    [<ff13e31a>] on_selected_cpus+0xdd/0x113
(XEN)    [<ff13e23b>] smp_call_function+0x4e/0x50
(XEN)    [<ff155dff>] on_each_cpu+0x26/0x39
(XEN)    [<ff155db6>] mce_work_fn+0x2a/0x4d
(XEN)    [<ff11d4a7>] timer_softirq_action+0xea/0x17f
(XEN)    [<ff11bc9d>] do_softirq+0xa1/0xb8
(XEN)

--
Manuel Bouyer <bouyer@antioche.eu.org>
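
One detail that can be read directly from the dump above is the EFLAGS value
of CPU 0. The sketch below decodes it; the helper name
eflags_interrupts_enabled() is made up for illustration and is not part of
Xen, while the bit position (IF is bit 9 of EFLAGS) is architectural on x86.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 EFLAGS bit 9: IF, the maskable-interrupt enable flag. */
#define X86_EFLAGS_IF (UINT32_C(1) << 9)

/*
 * Hypothetical helper (not a Xen function): returns true when the saved
 * EFLAGS value says maskable interrupts were enabled on that CPU.
 */
bool eflags_interrupts_enabled(uint32_t eflags)
{
    return (eflags & X86_EFLAGS_IF) != 0;
}
```

Applied to the value in the dump (00000202), this reports interrupts as
enabled on CPU 0, which would be consistent with 'd' still being handled
while that CPU spins in on_selected_cpus(). It says nothing about the state
of the other CPU.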

> This seems to be SMP-related: if I boot with nosmp, things work
> properly again.
> I'm booting with noapic (I don't have complete ioapic
> support yet), and as it's an ASUS P2B, acpi=ht seems to be
> added automatically.

Please try -unstable: this sounds like it could be the missing
spin_unlock that Juan Quintela found. It only affects machines without
ioapics.

Ian

# HG changeset patch
# User kaf24@firebug.cl.cam.ac.uk
# Node ID 65a2cf84b33552eb749bba1990aa35c4fa887a16
# Parent  5afb142646294a6c446e275c5bef60ff7d477881
Add missing spin_unlock_irq() at xen/arch/x86/irq.c

Changeset 9889:42a8e3101c6c reorganized the code on this file, and
missed this spin_unlock_irq().

Without this patch, my machine hangs completely during boot. With
this, it works.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>

diff -r 5afb14264629 -r 65a2cf84b335 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c        Fri May 05 00:27:10 2006 +0100
+++ b/xen/arch/x86/irq.c        Fri May 05 13:41:35 2006 +0100
@@ -318,6 +318,7 @@ static void __pirq_guest_eoi(struct doma
     {
         ASSERT(cpus_empty(action->cpu_eoi_map));
         desc->handler->end(irq_to_vector(irq));
+        spin_unlock_irq(&desc->lock);
         return;
     }
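
The class of bug that changeset fixes, an early return that skips the unlock,
can be reproduced in a few lines. This is only a sketch under assumptions: it
assumes the lock is taken earlier in the function (as the commit message
implies), and toy_lock_t, eoi_buggy(), eoi_fixed() and lock_is_free() are
invented names built on a C11 atomic_flag, not Xen's spinlock or IRQ code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock standing in for desc->lock; NOT Xen's spinlock implementation. */
typedef struct { atomic_flag held; } toy_lock_t;
#define TOY_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the lock was free and is now held by the caller. */
bool toy_trylock(toy_lock_t *l) { return !atomic_flag_test_and_set(&l->held); }
void toy_unlock(toy_lock_t *l)  { atomic_flag_clear(&l->held); }

/* Buggy shape: the early-return path leaves the lock held. */
void eoi_buggy(toy_lock_t *lock, bool early_exit)
{
    while (!toy_trylock(lock)) { /* spin until acquired */ }
    if (early_exit)
        return;                  /* lock is leaked here */
    toy_unlock(lock);
}

/* Fixed shape: every return path releases the lock. */
void eoi_fixed(toy_lock_t *lock, bool early_exit)
{
    while (!toy_trylock(lock)) { /* spin until acquired */ }
    if (early_exit) {
        toy_unlock(lock);        /* the line the changeset adds, in spirit */
        return;
    }
    toy_unlock(lock);
}

/* Probe: true if nobody currently holds the lock. */
bool lock_is_free(toy_lock_t *l)
{
    if (toy_trylock(l)) {
        toy_unlock(l);
        return true;
    }
    return false;
}
```

After eoi_buggy() takes the early exit, lock_is_free() reports the lock as
still held, so the next acquirer would spin forever; after eoi_fixed() it is
free again.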

On 7 May 2006, at 16:52, Manuel Bouyer wrote:
> Seems to be SMP-related; if I boot nosmp things are working properly
> again.
> I'm booting with noapic (I don't have completed ioapic support yet) and
> as it's an asus P2B, acpi=ht seems to be automatically added.
>
> When the hypervisor is hung, d is still working (in fact, soft
> interrupt are blocked) and I get:

The CPU is probably stuck waiting for some other CPU to run its
smp_call_function callback handler. This may be because the other CPU
has got interrupts permanently disabled. It's worth disassembling your
Xen image, or adding tracing to on_selected_cpus(), to find out where
the CPU has got stuck and what it is waiting on. Then perhaps we can
help you some more.

Ian's suggestion of trying -unstable is perhaps also worthwhile,
although not for the reason he states (the fix that he points out is
not applicable to our 3.0.2 tree).

 -- Keir
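
The failure mode described in this message can be modelled with a small,
deterministic toy. It is a sketch only, not Xen's on_selected_cpus(): the
function name cross_call_completes(), the MAX_CPUS limit and the bounded
spin count are all assumptions made so that the example terminates. The
initiating CPU marks a request pending for every other CPU and then spins
until all of them have acknowledged; a CPU with interrupts disabled never
takes the request.

```c
#include <stdbool.h>

#define MAX_CPUS 8

/*
 * Toy model of a cross-CPU call (hypothetical, single-threaded).
 * irqs_enabled[cpu] says whether that CPU can take the request.
 * Returns true if every target acknowledged within max_spins iterations;
 * false models the initiator being stuck in its wait loop.
 */
bool cross_call_completes(const bool irqs_enabled[], int ncpus,
                          int initiator, int max_spins)
{
    bool pending[MAX_CPUS] = { false };
    int outstanding = 0;

    if (ncpus > MAX_CPUS)
        ncpus = MAX_CPUS;

    /* "Send" the request to every CPU except the initiator. */
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu != initiator) {
            pending[cpu] = true;
            outstanding++;
        }
    }

    /* Initiator spins; only CPUs with interrupts enabled ever acknowledge. */
    for (int spin = 0; spin < max_spins && outstanding > 0; spin++) {
        for (int cpu = 0; cpu < ncpus; cpu++) {
            if (pending[cpu] && irqs_enabled[cpu]) {
                pending[cpu] = false;
                outstanding--;
            }
        }
    }

    return outstanding == 0;
}
```

With two CPUs that both have interrupts enabled the call completes; if the
second CPU has interrupts disabled, the initiator never sees the
acknowledgement, which matches a register dump that shows CPU 0 sitting in
on_selected_cpus().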

On Mon, May 08, 2006 at 12:58:18PM +0100, Ian Pratt wrote:
> > Seems to be SMP-related; if I boot nosmp things are working
> > properly again.
> > I'm booting with noapic (I don't have completed ioapic
> > support yet) and as it's an asus P2B, acpi=ht seems to be
> > automatically added.
>
> Please try -unstable: this sounds like it could be the missing
> spin_unlock that Juan Quintela found. It only affects machines without
> ioapics.

Thanks, but it doesn't help in my case. It may well be because I use
noapic on an SMP box; I'm not sure the 2 CPUs will be usable in such
a configuration anyway.

--
Manuel Bouyer <bouyer@antioche.eu.org>

On 8 May 2006, at 16:01, Manuel Bouyer wrote:
>> Please try -unstable: this sounds like it could be the missing
>> spin_unlock that Juan Quintela found. It only affects machines without
>> ioapics.
>
> Thanks, but it doesn't help for me. Anyway, it may well be because I
> use noapic on a SMP box; I'm not sure the 2 CPUs will be useable in
> such a configuration anyway.

"noapic" should be okay I think, but I wouldn't guarantee it. That
simply stops use of the I/O APICs. Whether it is possible to run in SMP
mode with interrupts routed through the legacy PIC probably depends on
the motherboard and chipset. Anyway, clearly it worked for you with
3.0.0. :-)

 -- Keir