I have a testing Intel machine with 4 physical CPUs running 64-bit Xen 4.1.3-rc1.

I have a particular Linux VM which, at kernel boot time (as domU), makes XEN and Dom0 unresponsive (even the XEN serial console freezes indefinitely). The domU Linux kernel is 3.2.1 and has XEN support compiled in.

Trying to find where the problem is, I found this: at the point where XEN/Dom0 freezes, physical CPUs 0, 1 and 2 are executing idle_loop() from xen/arch/x86/domain.c, and CPU 3 spins indefinitely on d->arch.hvm_domain.irq_lock (where "d" is the domain of my domU).

This is the Xen call trace of CPU 3 at the point where it spins forever:

(XEN) [<ffff82c480126a06>] _spin_lock+0x56/0x110
(XEN) [<ffff82c4801d0814>] hvm_set_callback_irq_level+0x34/0x150
(XEN) [<ffff82c4801d1245>] hvm_assert_evtchn_irq+0x65/0x90
(XEN) [<ffff82c48016e505>] vcpu_mark_events_pending+0x35/0x40
(XEN) [<ffff82c4801073f5>] evtchn_set_pending+0x135/0x1c0
(XEN) [<ffff82c4801075e7>] send_guest_pirq+0x57/0x70
(XEN) [<ffff82c4801d04cb>] assert_gsi+0x5b/0x60
(XEN) [<ffff82c4801d07bb>] hvm_isa_irq_assert+0x9b/0xc0
(XEN) [<ffff82c4801cd642>] do_hvm_op+0x1982/0x22e0
(XEN) [<ffff82c480235ffe>] compat_hypercall+0xae/0x107

A history of acquisitions of d->arch.hvm_domain.irq_lock shows that CPU 3 tries to acquire the same lock twice, first here, in hvm_isa_irq_assert():

(XEN) [<ffff82c480126a06>] _spin_lock+0x56/0x110
(XEN) [<ffff82c4801d0762>] hvm_isa_irq_assert+0x42/0xc0
(XEN) [<ffff82c4801cd642>] do_hvm_op+0x1982/0x22e0
(XEN) [<ffff82c480235ffe>] compat_hypercall+0xae/0x107

and then again in hvm_set_callback_irq_level(), so it spins indefinitely. Also, vcpu 0 of Dom0 is bound to CPU 3 at that time.

I am not familiar with XEN internals, so I don't know the best way to create a patch to fix this. That's why I am posting this to the Xen developers list.

All the best,
Paulian
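The hang described above is a non-recursive lock taken twice on the same CPU. Below is a minimal C11 model of that call path, not Xen code: model_isa_irq_assert() and model_set_callback_irq_level() are hypothetical stand-ins for hvm_isa_irq_assert() and hvm_set_callback_irq_level(), and a plain atomic flag stands in for irq_lock. A try-lock replaces spin_lock() so that the second acquisition reports failure instead of spinning forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of irq_lock as a simple non-recursive lock (not Xen's spinlock_t). */
static atomic_bool irq_lock = false;

/* Returns true if the lock was free and the caller now holds it. */
static bool model_trylock(atomic_bool *l)
{
    return !atomic_exchange(l, true);
}

static void model_unlock(atomic_bool *l)
{
    atomic_store(l, false);
}

/* Hypothetical stand-in for hvm_set_callback_irq_level(). A real
 * spin_lock() would spin forever here if irq_lock is already held;
 * the try-lock lets the model report that instead. */
static bool model_set_callback_irq_level(void)
{
    if (!model_trylock(&irq_lock))
        return false;              /* already held: self-deadlock */
    model_unlock(&irq_lock);
    return true;
}

/* Hypothetical stand-in for hvm_isa_irq_assert(). When the GSI is routed
 * to an event channel whose upcall is itself delivered through an emulated
 * IRQ, assert_gsi -> send_guest_pirq -> ... -> hvm_assert_evtchn_irq ends
 * in model_set_callback_irq_level() while irq_lock is still held. */
static bool model_isa_irq_assert(bool gsi_evtchn_via_emulated_irq)
{
    bool ok = model_trylock(&irq_lock);    /* first acquisition */
    assert(ok);                            /* uncontended in this model */
    if (gsi_evtchn_via_emulated_irq)
        ok = model_set_callback_irq_level();   /* second one, same CPU */
    model_unlock(&irq_lock);
    return ok;
}
```

Calling model_isa_irq_assert(true) returns false, which is the point where the real spin_lock() leaves CPU 3 spinning; the outer unlock still runs in the model, so a later call with false succeeds again.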
On 27/01/2012 23:20, "Paulian Bogdan Marinca" <paulian@marinca.net> wrote:

> I have a testing intel machine with 4 physical cpus running 64 bit Xen
> 4.1.3-rc1.
[...]
> Trying to find where the problem is, I found this: at the point where
> XEN/Dom0 freezes, the physical cpus 0, 1 and 2 are executing the
> idle_loop() from xen/arch/x86/domain.c and the cpu 3 spins
> indefinitely in the d->arch.hvm_domain.irq_lock

Thanks, the offending patch is xen-unstable:22409, I think. It needs an HVM guest which maps emulated IRQs onto event channels, but then has event channel notifications mapped onto an emulated IRQ (rather than 'direct vector delivery'). An unlikely state of affairs; maybe it's a temporary setup during guest boot, or maybe it indicates a bug in the guest as well.

In any case, cc'ing the author, Stefano. This ought to be easy to fix. We could make the irq_lock a recursive lock for example.
-- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
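Keir's first suggestion, making irq_lock recursive, lets the CPU that already owns the lock take it again instead of spinning on itself. Below is a minimal C11 sketch of that idea, assuming each caller passes its own CPU id; rec_lock_t and its functions are hypothetical names, and this is not Xen's spinlock implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Model of a recursive spinlock: the owning CPU may re-acquire it. */
typedef struct {
    atomic_int owner;   /* CPU id holding the lock, or NO_OWNER */
    int depth;          /* nesting count; only touched by the owner */
} rec_lock_t;

static void rec_lock_init(rec_lock_t *l)
{
    atomic_init(&l->owner, NO_OWNER);
    l->depth = 0;
}

/* Non-blocking acquire for CPU 'cpu'; returns true on success. */
static bool rec_trylock(rec_lock_t *l, int cpu)
{
    /* Only 'cpu' itself can store 'cpu' into owner, so this check is safe. */
    if (atomic_load(&l->owner) == cpu) {
        l->depth++;                     /* re-entry by the current owner */
        return true;
    }
    int expected = NO_OWNER;
    if (atomic_compare_exchange_strong(&l->owner, &expected, cpu)) {
        l->depth = 1;                   /* first acquisition */
        return true;
    }
    return false;                       /* held by another CPU */
}

/* Blocking acquire: spins only while a different CPU owns the lock. */
static void rec_lock(rec_lock_t *l, int cpu)
{
    while (!rec_trylock(l, cpu))
        ;                               /* busy-wait */
}

static void rec_unlock(rec_lock_t *l, int cpu)
{
    assert(atomic_load(&l->owner) == cpu && l->depth > 0);
    if (--l->depth == 0)
        atomic_store(&l->owner, NO_OWNER);
}
```

With such a lock, the nested acquisition on CPU 3 in the trace above would succeed and be balanced by two unlocks. The trade-off is that it quietly accepts a guest configuration that should not arise; Stefano later prefers refusing that configuration outright.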
On Sat, 28 Jan 2012, Keir Fraser wrote:
> On 27/01/2012 23:20, "Paulian Bogdan Marinca" <paulian@marinca.net> wrote:
>
> > I have a testing intel machine with 4 physical cpus running 64 bit Xen
> > 4.1.3-rc1.
[...]
> Thanks, the offending patch is xen-unstable:22409, I think. It needs an HVM
> guest which maps emulated IRQs onto event channels, but then has event
> channel notifications mapped onto an emulated IRQ (rather than 'direct
> vector delivery'). An unlikely state of affairs, maybe it's a temporary
> setup during guest boot, or maybe it indicates a bug in the guest as well.

That's right, this scenario should not be possible because whenever XENFEAT_hvm_pirqs is available, XENFEAT_hvm_callback_vector should also be available.
In any case I'll submit a patch to Linux to make sure this doesn't happen, explicitly checking for xen_have_vector_callback.
What is your Linux kernel version? Could you please post your kernel config as well?

> In any case, cc'ing the author, Stefano. This ought to be easy to fix. We
> could make the irq_lock a recursive lock for example.

We could also fail to map IRQs into event channels if the delivery method is not HVMIRQ_callback_vector:

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index df92cc7..7d89ed6 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -93,6 +93,11 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
 
     if ( domid == DOMID_SELF && is_hvm_domain(d) )
     {
+        if ( !is_hvm_pv_evtchn_domain(d) )
+        {
+            ret = -EINVAL;
+            goto free_domain;
+        }
         ret = physdev_hvm_map_pirq(d, type, index, pirq_p);
         goto free_domain;
     }
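The patch above moves the check to mapping time: an HVM guest asking to map a PIRQ for itself is refused with -EINVAL unless its event channel upcall uses vector delivery, so the nested emulated-IRQ path never gets set up. The C sketch below models only that decision; the struct, enum and function names are hypothetical stand-ins loosely based on is_hvm_pv_evtchn_domain() and the HVMIRQ_callback_* delivery methods, not Xen's actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the HVM callback delivery methods. */
enum model_callback_via { VIA_NONE, VIA_GSI, VIA_PCI_INTX, VIA_VECTOR };

struct model_domain {
    bool is_hvm;
    enum model_callback_via callback_via_type;
};

/* Models the intent of is_hvm_pv_evtchn_domain(): true only when event
 * channel upcalls use direct vector delivery. */
static bool model_is_hvm_pv_evtchn_domain(const struct model_domain *d)
{
    return d->is_hvm && d->callback_via_type == VIA_VECTOR;
}

/* Hypothetical stand-in for the DOMID_SELF/HVM branch of physdev_map_pirq()
 * after the patch: refuse to route emulated IRQs onto event channels unless
 * the upcall bypasses the emulated IRQ path (and therefore irq_lock). */
static int model_map_pirq_self(const struct model_domain *d)
{
    assert(d->is_hvm);          /* this model only covers the HVM branch */
    if (!model_is_hvm_pv_evtchn_domain(d))
        return -EINVAL;
    return 0;                   /* real code: physdev_hvm_map_pirq(...) */
}
```

A guest configured like Paulian's (callback via GSI or PCI INTx) gets -EINVAL from the mapping request, while a guest using vector delivery proceeds as before.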
Paulian Bogdan Marinca
2012-Jan-30 13:13 UTC
Re: XEN 4.1.3-rc1 bug, spinlock acquired twice
On 30 January 2012 11:32, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Sat, 28 Jan 2012, Keir Fraser wrote:
>> On 27/01/2012 23:20, "Paulian Bogdan Marinca" <paulian@marinca.net> wrote:
[...]
> That's right, this scenario should not be possible because whenever
> XENFEAT_hvm_pirqs is available, XENFEAT_hvm_callback_vector should also
> be available.
> In any case I'll submit a patch to Linux to make sure this doesn't
> happen, explicitly checking for xen_have_vector_callback.
> What is your Linux kernel version? Could you please post your kernel
> config as well?

Yes, you are right, this does not happen normally in a booting kernel. What happens is that I needed to test my own PV drivers in all possible scenarios, so I hacked the domU Linux kernel to "believe" that XEN does not have XENFEAT_hvm_callback_vector. Basically, I forced

xen_have_vector_callback = 0;

in enlighten.c in the domU kernel. This is probably why it triggered this unusual situation.

I attach my kernel config (btw, it is actually a 3.0.6 kernel, not 3.2.1).

I will try to apply your patch against XEN.

Thanks,
Paulian
On Mon, 30 Jan 2012, Paulian Bogdan Marinca wrote:
> On 30 January 2012 11:32, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
[...]
> Yes, you are right, this does not happen normally in a booting kernel.
> What happens is that I needed to test my own PV drivers in all
> possible scenarios,
> so I hacked the domU linux kernel to "believe" that XEN does not have
> XENFEAT_hvm_callback_vector, basically I forced a
>
> xen_have_vector_callback = 0;
>
> in enlighten.c in domU kernel. This is probably why it triggered this
> unusual situation.
>
> I attach my kernel config (btw is actually a 3.0.6 kernel not 3.2.1)
>
> I will try to apply your patch against XEN.

In that case it is a matter of protecting Xen against misbehaving guests, so I would rather have the patch below than try to handle the case correctly.

> > diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
> > index df92cc7..7d89ed6 100644
> > --- a/xen/arch/x86/physdev.c
> > +++ b/xen/arch/x86/physdev.c
> > @@ -93,6 +93,11 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
> >  
> >      if ( domid == DOMID_SELF && is_hvm_domain(d) )
> >      {
> > +        if ( !is_hvm_pv_evtchn_domain(d) )
> > +        {
> > +            ret = -EINVAL;
> > +            goto free_domain;
> > +        }
> >          ret = physdev_hvm_map_pirq(d, type, index, pirq_p);
> >          goto free_domain;
> >      }
Paulian Bogdan Marinca
2012-Jan-30 14:14 UTC
Re: XEN 4.1.3-rc1 bug, spinlock acquired twice
>> I attach my kernel config (btw is actually a 3.0.6 kernel not 3.2.1)
>>
>> I will try to apply your patch against XEN.
>
> In that case it is a matter of protecting Xen against misbehaving
> guests, so I would rather have the patch below than try to handle the
> case correctly.
>
>> > diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
[...]

I tested the patch and yes, it prevents Xen being locked up.
On Mon, 30 Jan 2012, Paulian Bogdan Marinca wrote:
> > In that case it is a matter of protecting Xen against misbehaving
> > guests, so I would rather have the patch below than try to handle the
> > case correctly.
[...]
> I tested the patch and yes, it prevents Xen being locked up.

Thanks, I'll resend and add your Tested-by.