Christoph Egger
2011-Jun-30 12:06 UTC
[Xen-devel] xen kernel crash at boot since 23598:b24018319772
Changeset 23598:b24018319772 causes the xen kernel crash at boot:

(XEN) Assertion 'spin_is_locked(&d->event_lock)' failed at irq.c:965
(XEN) ----[ Xen-4.2-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c480160766>] pirq_spin_lock_irq_desc+0x2e/0xfb
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830421db3e80 rcx: 0000000000000000
(XEN) rdx: 0000000000000292 rsi: ffff830421db3e80 rdi: ffff830421de9194
(XEN) rbp: ffff82c48029fd28 rsp: ffff82c48029fcf8 r8: ffff82c48029fd18
(XEN) r9: 0000000000000000 r10: 0000000000001000 r11: ffffffffffffffc0
(XEN) r12: ffff830421de9000 r13: 000000000000002c r14: ffff830421db3e80
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000406f0
(XEN) cr3: 0000000420f2b000 cr2: 0000000000000000
(XEN) ds: 0013 es: 0013 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029fcf8:
(XEN) ffff82c480134dfb ffff830421db3e80 ffff830421de9000 ffff830421de9000
(XEN) 00000000000000c7 ffff830421de91a8 ffff82c48029fd48 ffff82c48016351b
(XEN) 000000000000000a 000000000000000c ffff82c48029fe18 ffff82c4801635dc
(XEN) ffff82c48029ff18 ffff82c48029fd68 ffff830421db3d00 ffff830421dc2e80
(XEN) ffff830421db1980 ffff830421db1b00 ffff830421db1c00 ffff830421db1880
(XEN) ffff830421db1700 ffff830421db1600 ffff830421db1500 ffff830421db3f80
(XEN) ffff830421db3e80 ffff830421db3e00 ffffffff8020d2df ffff82c48029ff18
(XEN) ffff82c48029ff08 ffff82c4801812a2 000000000000f800 ffff8300cfd92000
(XEN) ffffffff80bd85c8 0000000000000001 0000000000000000 0000000000000001
(XEN) ffff82c48029feb8 ffff82c480174a63 ffff82c48029fe88 0000000000000092
(XEN) ffff8300cfd92000 000000005f129b56 0000001300000001 000082c400000000
(XEN) ffff82c480123806 00ff830421db3ee8 ffff82c48029feb8 ffff82c480126453
(XEN) 0000000800000008 ffffffff8020d2df ffff82c48029feb8 ffff82c480182e4e
(XEN) ffffffff80bd85c0 0000000000002480 0000000000000001 0000000000000000
(XEN) ffff82c48029fef8 ffff82c48014e671 0000000000000004 0000000000000000
(XEN) 0000000000000000 ffffffff80c7e870 00000000deadbeef ffff8300cfd92000
(XEN) 00007d3b7fd600c7 ffff82c480216d78 ffffffff8010126a 0000000000000013
(XEN) 0000000000000001 0000000000000000 0000000000000001 0000000000002480
(XEN) ffffffff80f3cd48 0000000000000005 0000000000000202 0000000000000000
(XEN) ffffffff80993d23 0000000000000004 0000000000000013 ffffffff8010126a
(XEN) Xen call trace:
(XEN) [<ffff82c480160766>] pirq_spin_lock_irq_desc+0x2e/0xfb
(XEN) [<ffff82c48016351b>] pirq_guest_eoi+0x33/0x4b
(XEN) [<ffff82c4801635dc>] pirq_guest_unmask+0xa9/0xdf
(XEN) [<ffff82c480174a63>] do_physdev_op+0x403/0x1388
(XEN) [<ffff82c48014e671>] do_physdev_op_compat+0x6c/0x7b
(XEN) [<ffff82c480216d78>] syscall_enter+0xc8/0x122
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'spin_is_locked(&d->event_lock)' failed at irq.c:965
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2011-Jun-30 12:43 UTC
Re: [Xen-devel] xen kernel crash at boot since 23598:b24018319772
>>> On 30.06.11 at 14:06, Christoph Egger <Christoph.Egger@amd.com> wrote:
> Changeset 23598:b24018319772 causes the xen kernel crash at boot:

That's likely not the one, but rather 23573:584c2e5e03d9. And
indeed it seems like the assertion is a stale leftover from the
original non-RCU version of the patch. There are a few more
similar ones which may similarly be candidates for removal.

Keir, what's your take on this?

Jan
Ian Jackson
2011-Jun-30 13:23 UTC
[Xen-devel] Re: xen kernel crash at boot since 23598:b24018319772
Christoph Egger writes ("xen kernel crash at boot since 23598:b24018319772"):
> Changeset 23598:b24018319772 causes the xen kernel crash at boot:

23598:b24018319772 is a merge. This occurs if two tree maintainers
with push access "race" and commit changes locally.

In this case the two sub-branches are:

1. Olaf's xenpaging series ending with 23591:3dcb553f3ba9,
   applied by me, affecting only tools/
2. A series of hypervisor patches ending with 23597:e2235fe267eb,
   applied by Keir.

You should be able to see this information in "hg log" (the commit
message "Merge" is a giveaway) but it's easier if you use "hg view".

It seems likely that something in 2. is responsible.

Ian.
Christoph Egger
2011-Jun-30 13:29 UTC
[Xen-devel] Re: xen kernel crash at boot since 23598:b24018319772
On 06/30/11 15:23, Ian Jackson wrote:
> It seems likely that something in 2. is responsible.

As Jan already pointed out, it is changeset 23573:584c2e5e03d9.
When I remove the assertion Jan pointed out, Xen boots again.

Christoph
Ian Jackson
2011-Jun-30 13:48 UTC
[Xen-devel] Re: xen kernel crash at boot since 23598:b24018319772
Christoph Egger writes ("Re: xen kernel crash at boot since 23598:b24018319772"):
> On 06/30/11 15:23, Ian Jackson wrote:
> > It seems likely that something in 2. is responsible.
>
> As Jan already pointed out, it is changeset 23573:584c2e5e03d9.
> When I remove the assertion Jan pointed out, Xen boots again.

Yes. (Thanks, Jan.) I was writing to help you (and others reading)
identify problematic changesets more accurately in future.

Ian.
Keir Fraser
2011-Jun-30 14:21 UTC
Re: [Xen-devel] xen kernel crash at boot since 23598:b24018319772
On 30/06/2011 13:43, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 30.06.11 at 14:06, Christoph Egger <Christoph.Egger@amd.com> wrote:
>> Changeset 23598:b24018319772 causes the xen kernel crash at boot:
>
> That's likely not the one, but rather 23573:584c2e5e03d9. And
> indeed it seems like the assertion is a stale leftover from the
> original non-RCU version of the patch. There are a few more
> similar ones which may similarly be candidates for removal.
>
> Keir, what's your take on this?

Not sure: pirq_spin_lock_irq_desc() has a comment about the event_lock
preventing the PIRQ-IRQ mapping from changing under its feet. Why would the
radix-tree patch change what code is protected by event_lock, anyway?

 -- Keir
Jan Beulich
2011-Jun-30 15:49 UTC
Re: [Xen-devel] xen kernel crash at boot since 23598:b24018319772
>>> On 30.06.11 at 16:21, Keir Fraser <keir.xen@gmail.com> wrote:
> On 30/06/2011 13:43, "Jan Beulich" <JBeulich@novell.com> wrote:
>> That's likely not the one, but rather 23573:584c2e5e03d9. And
>> indeed it seems like the assertion is a stale leftover from the
>> original non-RCU version of the patch. There are a few more
>> similar ones which may similarly be candidates for removal.
>>
>> Keir, what's your take on this?
>
> Not sure: pirq_spin_lock_irq_desc() has a comment about the event_lock
> preventing the PIRQ-IRQ mapping from changing under its feet. Why would the
> radix-tree patch change what code is protected by event_lock, anyway?

The whole function (including the comment) got added by that patch.
Hence either comment and assertion need fixing, or both need to
stay and calling code needs adjustment.

The RCU-ness, as I understand it, allows read accesses to the
PIRQ -> IRQ mapping to be done lockless, hence d->event_lock
needs to be held only if the intention is to alter the mapping
(which in particular isn't the case when unmasking an IRQ). Or
did I still not get my RCU thinking right?

Jan
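To make the reader/writer split Jan describes concrete, here is a minimal C11 sketch. It is a toy model, not Xen code: the toy_* names, the test-and-set lock standing in for d->event_lock, and the single published slot standing in for the pirq_tree are all hypothetical. Readers fetch the record without taking any lock; only the writer, which changes the mapping, asserts that the lock is held.

```c
/* Hypothetical toy model of lockless readers / locked writers; not Xen code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Test-and-set lock standing in for a Xen spinlock. */
typedef struct { atomic_int held; } toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
        ; /* spin until the previous holder releases */
}

static void toy_unlock(toy_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

static int toy_is_locked(toy_lock_t *l)
{
    return atomic_load_explicit(&l->held, memory_order_relaxed);
}

struct toy_pirq { int irq; };

struct toy_domain {
    toy_lock_t event_lock;           /* taken only by writers */
    _Atomic(struct toy_pirq *) slot; /* published PIRQ record */
};

/* Reader side: no lock. The acquire load pairs with the writer's release
 * store, so the record's fields are visible once the pointer is. */
static struct toy_pirq *toy_pirq_lookup(struct toy_domain *d)
{
    return atomic_load_explicit(&d->slot, memory_order_acquire);
}

/* Writer side: changing the mapping still requires event_lock, so the
 * assertion belongs here rather than on the read path. */
static void toy_pirq_publish(struct toy_domain *d, struct toy_pirq *p)
{
    assert(toy_is_locked(&d->event_lock));
    atomic_store_explicit(&d->slot, p, memory_order_release);
}
```

What the sketch leaves out is what makes RCU safe in practice: a replaced record must not be freed until every lockless reader that may still be using it is done (in Xen, after an RCU grace period). It also only covers obtaining the pointer; fields inside the record that other CPUs may rewrite still need their own synchronisation.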
Keir Fraser
2011-Jun-30 16:33 UTC
Re: [Xen-devel] xen kernel crash at boot since 23598:b24018319772
On 30/06/2011 16:49, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 30.06.11 at 16:21, Keir Fraser <keir.xen@gmail.com> wrote:
>> On 30/06/2011 13:43, "Jan Beulich" <JBeulich@novell.com> wrote:
>>> That's likely not the one, but rather 23573:584c2e5e03d9. And
>>> indeed it seems like the assertion is a stale leftover from the
>>> original non-RCU version of the patch. There are a few more
>>> similar ones which may similarly be candidates for removal.
>>>
>>> Keir, what's your take on this?
>>
>> Not sure: pirq_spin_lock_irq_desc() has a comment about the event_lock
>> preventing the PIRQ-IRQ mapping from changing under its feet. Why would the
>> radix-tree patch change what code is protected by event_lock, anyway?
>
> The whole function (including the comment) got added by that patch.
> Hence either comment and assertion need fixing, or both need to
> stay and calling code needs adjustment.

Well, I guess at least it's good that you wrote it, and recently. We're
not dealing with anyone else's hidden assumptions in that case.

> The RCU-ness, as I understand it, allows read accesses to the
> PIRQ -> IRQ mapping to be done lockless, hence d->event_lock
> needs to be held only if the intention is to alter the mapping
> (which in particular isn't the case when unmasking an IRQ). Or
> did I still not get my RCU thinking right?

We're nearly there. *Yes*, it is now safe to look up pirq structs in the
pirq_tree with no lock held. And the resulting pirq struct can safely be
accessed basically until you might yield (i.e., do softirq work). This is
safe because pirq structs are freed after an RCU safety period.

*However*, you still need to worry about concurrency aspects of access to
the contents of the pirq structure. *In particular*, pirq->arch.irq could
apparently be modified concurrently with the execution of
pirq_spin_lock_irq_desc(): the modifying CPU holds both d->event_lock and
pirq->arch.irq's desc_lock, but pirq_spin_lock_irq_desc() may hold neither.

Note that domain_spin_lock_irq_desc() has a retry loop for a reason! It
knows that pirq-irq mapping may change under its feet, so it needs to
re-check the mapping with the desc_lock held, at which point the mapping
cannot change *if* it obtained the correct desc_lock in time!

Perhaps pirq_spin_lock_irq_desc() needs a similar retry loop? Perhaps
pirq_spin_lock_irq_desc() should never have been forked from
domain_spin_lock_irq_desc(), and all callers should simply use the latter?

 -- Keir
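The retry loop Keir describes can be sketched roughly as follows. This is a simplified C11 model under stated assumptions, not the actual Xen implementation: the toy_* names, the fixed descriptor array, and the use of 0 for "unmapped" are hypothetical. The pattern is to read the mapping without a lock, take the lock of the descriptor it points to, and then re-check that the mapping has not moved; if it has, drop the lock and start over.

```c
/* Hypothetical toy model of a lock-then-recheck retry loop; not Xen code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_NR_IRQS 8

typedef struct { atomic_int held; } toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
        ; /* spin */
}

static void toy_unlock(toy_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

struct toy_irq_desc { toy_lock_t lock; };
static struct toy_irq_desc toy_descs[TOY_NR_IRQS];

/* irq may be rewritten by another CPU that holds event_lock and the old
 * descriptor's lock; 0 means "not mapped". */
struct toy_pirq { atomic_int irq; };

/* Returns the locked descriptor the PIRQ currently maps to, or NULL. */
static struct toy_irq_desc *toy_pirq_lock_desc(struct toy_pirq *pirq)
{
    for (;;) {
        int irq = atomic_load_explicit(&pirq->irq, memory_order_acquire);
        if (irq <= 0 || irq >= TOY_NR_IRQS)
            return NULL;                    /* unmapped: nothing to lock */

        struct toy_irq_desc *desc = &toy_descs[irq];
        toy_lock(&desc->lock);

        /* Re-check under the descriptor lock: a remapper needs this same
         * lock to move the PIRQ away from irq, so if the value still
         * matches, it cannot change until we unlock. */
        if (atomic_load_explicit(&pirq->irq, memory_order_relaxed) == irq)
            return desc;

        toy_unlock(&desc->lock);            /* lost the race: try again */
    }
}
```

The re-check is what makes this correct: since the CPU changing the mapping holds the old descriptor's lock, once the loop holds that lock and still sees the same IRQ number, the mapping stays put until the caller unlocks.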
Jan Beulich
2011-Jul-01 10:02 UTC
Re: [Xen-devel] xen kernel crash at boot since 23598:b24018319772
>>> On 30.06.11 at 18:33, Keir Fraser <keir.xen@gmail.com> wrote:
> Note that domain_spin_lock_irq_desc() has a retry loop for a reason! It
> knows that pirq-irq mapping may change under its feet, so it needs to
> re-check the mapping with the desc_lock held, at which point the mapping
> cannot change *if* it obtained the correct desc_lock in time!
>
> Perhaps pirq_spin_lock_irq_desc() needs a similar retry loop? Perhaps

Yes. Will send a patch soon.

> pirq_spin_lock_irq_desc() should never have been forked from
> domain_spin_lock_irq_desc(), and all callers should simply use the latter?

I'd rather not: the lookup isn't really inexpensive (and doesn't need
to be re-done on each iteration either), which is why I created the
clone in the first place. Instead, I think that with the retry loop
added here, domain_spin_lock_irq_desc() could become a simple
wrapper around pirq_spin_lock_irq_desc().

Jan
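Jan's proposed split (do the expensive lookup once, retry only the cheap lock-and-recheck step, and make the domain-level function a thin wrapper) might look roughly like this. Again this is a hypothetical C11 toy, not Xen's code: toy_pirq_info() stands in for the pirq_tree lookup, toy_lookups merely counts how often it runs, and none of the toy_* names exist in Xen.

```c
/* Hypothetical toy model: lookup once, retry only the lock step; not Xen code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_NR_PIRQS 16
#define TOY_NR_IRQS  16

typedef struct { atomic_int held; } toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
        ; /* spin */
}

static void toy_unlock(toy_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

struct toy_irq_desc { toy_lock_t lock; };
struct toy_pirq { atomic_int irq; };                        /* 0 = unmapped */
struct toy_domain { struct toy_pirq *pirqs[TOY_NR_PIRQS]; }; /* stands in for pirq_tree */

static struct toy_irq_desc toy_descs[TOY_NR_IRQS];
static int toy_lookups;                                     /* counts lookups */

/* The comparatively expensive step: find the pirq record for a number. */
static struct toy_pirq *toy_pirq_info(struct toy_domain *d, int pirq)
{
    toy_lookups++;
    return (pirq >= 0 && pirq < TOY_NR_PIRQS) ? d->pirqs[pirq] : NULL;
}

/* Core routine: the caller has already done the lookup, so a retry only
 * repeats the cheap read / lock / re-check sequence. */
static struct toy_irq_desc *toy_pirq_lock_desc(struct toy_pirq *pirq)
{
    for (;;) {
        int irq = atomic_load_explicit(&pirq->irq, memory_order_acquire);
        if (irq <= 0 || irq >= TOY_NR_IRQS)
            return NULL;
        struct toy_irq_desc *desc = &toy_descs[irq];
        toy_lock(&desc->lock);
        if (atomic_load_explicit(&pirq->irq, memory_order_relaxed) == irq)
            return desc;                    /* mapping pinned by our lock */
        toy_unlock(&desc->lock);            /* remapped meanwhile: retry */
    }
}

/* Thin wrapper for callers that only have a PIRQ number: one lookup,
 * then the shared retry loop. */
static struct toy_irq_desc *toy_domain_lock_desc(struct toy_domain *d, int pirq)
{
    struct toy_pirq *info = toy_pirq_info(d, pirq);
    return info ? toy_pirq_lock_desc(info) : NULL;
}
```

Because the lookup sits outside the loop, a lost race costs only another read/lock/re-check round rather than another tree walk, and callers that already hold the pirq record can call the core routine directly.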
Keir Fraser
2011-Jul-01 10:18 UTC
Re: [Xen-devel] xen kernel crash at boot since 23598:b24018319772
On 01/07/2011 11:02, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 30.06.11 at 18:33, Keir Fraser <keir.xen@gmail.com> wrote:
>> Note that domain_spin_lock_irq_desc() has a retry loop for a reason! It
>> knows that pirq-irq mapping may change under its feet, so it needs to
>> re-check the mapping with the desc_lock held, at which point the mapping
>> cannot change *if* it obtained the correct desc_lock in time!
>>
>> Perhaps pirq_spin_lock_irq_desc() needs a similar retry loop? Perhaps
>
> Yes. Will send a patch soon.
>
>> pirq_spin_lock_irq_desc() should never have been forked from
>> domain_spin_lock_irq_desc(), and all callers should simply use the latter?
>
> I'd rather not: the lookup isn't really inexpensive (and doesn't need
> to be re-done on each iteration either), which is why I created the
> clone in the first place. Instead, I think that with the retry loop
> added here, domain_spin_lock_irq_desc() could become a simple
> wrapper around pirq_spin_lock_irq_desc().

Yes please!

 -- Keir