Stefan Bader
2011-Sep-21 13:03 UTC
[Xen-devel] Still struggling with HVM: tx timeouts on emulated nics
This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and gets configured via dhcp. And initial pings also get routed and done correctly. But slightly higher traffic (like checking for updates) hangs. And after a while there are messages about tx timeouts. The ne2k_pci type nic almost immediately has those issues and never comes up correctly. I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how this should be but both nics get configured with level,low IRQs. Disk emulation seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be at least not level. Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC? Another problem came up recently though that may just be me doing the wrong thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated devices. xen-blkfront is a module in my case and I thought I once had been able to use that by removing the unplug arg and making the blkfront driver load. But when I recently tried the module loaded but no disks appeared... Again, not sure I just forgot how to do that right or that was different when using a 4.1.0 hypervisor still... -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Sep-21 13:31 UTC
[Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On Wed, 21 Sep 2011, Stefan Bader wrote:> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the > default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and > gets configured via dhcp. And initial pings also get routed and done correctly. > But slightly higher traffic (like checking for updates) hangs. And after a while > there are messages about tx timeouts. > The ne2k_pci type nic almost immediately has those issues and never comes up > correctly. > > I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how > this should be but both nics get configured with level,low IRQs. Disk emulation > seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be > at least not level.Does the e1000 emulated card work correctly? What happens if you disable interrupt remapping (see patch below)? diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c index 1017c7b..472a58b 100644 --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -354,8 +354,7 @@ int __init pci_xen_init(void) int __init pci_xen_hvm_init(void) { - if (!xen_feature(XENFEAT_hvm_pirqs)) - return 0; + return 0; #ifdef CONFIG_ACPI /*> Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC?xen-pirq-ioapic interrupts are interrupts that have been remapped onto event channels> Another problem came up recently though that may just be me doing the wrong > thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated > devices. xen-blkfront is a module in my case and I thought I once had been able > to use that by removing the unplug arg and making the blkfront driver load. But > when I recently tried the module loaded but no disks appeared... Again, not sure > I just forgot how to do that right or that was different when using a 4.1.0 > hypervisor still...xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on older hypervisors that didn''t support the unplug protocol and had other ways to cope with multiple drivers accessing the same devices. You can use xen_emul_unplug=never to prevent any unplug but you won''t get any PV interfaces. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Sep-21 15:34 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 21.09.2011 15:31, Stefano Stabellini wrote:> On Wed, 21 Sep 2011, Stefan Bader wrote: >> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the >> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and >> gets configured via dhcp. And initial pings also get routed and done correctly. >> But slightly higher traffic (like checking for updates) hangs. And after a while >> there are messages about tx timeouts. >> The ne2k_pci type nic almost immediately has those issues and never comes up >> correctly. >> >> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how >> this should be but both nics get configured with level,low IRQs. Disk emulation >> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be >> at least not level. >> Does the e1000 emulated card work correctly?Yes, that one seems to work ok.> What happens if you disable interrupt remapping (see patch below)?8139cp seems to work correctly now (much higher irq stats as well) and e1000 still works. Both then using IOAPIC-fasteoi.> > > diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c > index 1017c7b..472a58b 100644 > --- a/arch/x86/pci/xen.c > +++ b/arch/x86/pci/xen.c > @@ -354,8 +354,7 @@ int __init pci_xen_init(void) > > int __init pci_xen_hvm_init(void) > { > - if (!xen_feature(XENFEAT_hvm_pirqs)) > - return 0; > + return 0; > > #ifdef CONFIG_ACPI > /* > > >> Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC? > > xen-pirq-ioapic interrupts are interrupts that have been remapped onto > event channelsAh ok, thanks.> > >> Another problem came up recently though that may just be me doing the wrong >> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated >> devices. xen-blkfront is a module in my case and I thought I once had been able >> to use that by removing the unplug arg and making the blkfront driver load. But >> when I recently tried the module loaded but no disks appeared... Again, not sure >> I just forgot how to do that right or that was different when using a 4.1.0 >> hypervisor still... > > xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on > older hypervisors that didn''t support the unplug protocol and had other > ways to cope with multiple drivers accessing the same devices. > You can use xen_emul_unplug=never to prevent any unplug but you won''t > get any PV interfaces.Hm, odd. Somehow I thought that I had been using pv interfaces that way when the interrupts for the emulated ide was broken. A bit suboptimal atm, because without any option and a kernel compiled with the platform pci and pv drivers (as modules) booting in HVM mode the kernel decides that having both is no use and unplugs the emulated devices. Which then leaves you with ... none. -Stefan> > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Sep-22 10:30 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On Wed, 21 Sep 2011, Stefan Bader wrote:> On 21.09.2011 15:31, Stefano Stabellini wrote: > > On Wed, 21 Sep 2011, Stefan Bader wrote: > >> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the > >> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and > >> gets configured via dhcp. And initial pings also get routed and done correctly. > >> But slightly higher traffic (like checking for updates) hangs. And after a while > >> there are messages about tx timeouts. > >> The ne2k_pci type nic almost immediately has those issues and never comes up > >> correctly. > >> > >> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how > >> this should be but both nics get configured with level,low IRQs. Disk emulation > >> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be > >> at least not level. > > > > > Does the e1000 emulated card work correctly? > > Yes, that one seems to work ok. > > > What happens if you disable interrupt remapping (see patch below)? > > 8139cp seems to work correctly now (much higher irq stats as well) and e1000 > still works. Both then using IOAPIC-fasteoi. >That means there must be another subtle bug in Xen in interrupt remapping that only affects 8139p emulation> >> Another problem came up recently though that may just be me doing the wrong > >> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated > >> devices. xen-blkfront is a module in my case and I thought I once had been able > >> to use that by removing the unplug arg and making the blkfront driver load. But > >> when I recently tried the module loaded but no disks appeared... Again, not sure > >> I just forgot how to do that right or that was different when using a 4.1.0 > >> hypervisor still... > > > > xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on > > older hypervisors that didn''t support the unplug protocol and had other > > ways to cope with multiple drivers accessing the same devices. > > You can use xen_emul_unplug=never to prevent any unplug but you won''t > > get any PV interfaces. > > Hm, odd. Somehow I thought that I had been using pv interfaces that way when the > interrupts for the emulated ide was broken. > A bit suboptimal atm, because without any option and a kernel compiled with the > platform pci and pv drivers (as modules) booting in HVM mode the kernel decides > that having both is no use and unplugs the emulated devices. Which then leaves > you with ... none.In theory you would have the PV frontend modules in the initrd. On the other hand having both can easily cause data corruptions on your drive. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Sep-22 11:58 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 22.09.2011 12:30, Stefano Stabellini wrote:> On Wed, 21 Sep 2011, Stefan Bader wrote: >> On 21.09.2011 15:31, Stefano Stabellini wrote: >>> On Wed, 21 Sep 2011, Stefan Bader wrote: >>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the >>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and >>>> gets configured via dhcp. And initial pings also get routed and done correctly. >>>> But slightly higher traffic (like checking for updates) hangs. And after a while >>>> there are messages about tx timeouts. >>>> The ne2k_pci type nic almost immediately has those issues and never comes up >>>> correctly. >>>> >>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how >>>> this should be but both nics get configured with level,low IRQs. Disk emulation >>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be >>>> at least not level. >>> >> >>> Does the e1000 emulated card work correctly? >> >> Yes, that one seems to work ok. >> >>> What happens if you disable interrupt remapping (see patch below)? >> >> 8139cp seems to work correctly now (much higher irq stats as well) and e1000 >> still works. Both then using IOAPIC-fasteoi. >> > > That means there must be another subtle bug in Xen in interrupt > remapping that only affects 8139p emulation >Right, or to be complete: - e1000: ok - 8139cp: unstable (setup is possible) - ne2k_pci: not working (tx problems from the beginning) The behaviour feels a bit like interrupts may get lost if occurring at a higher rate. Why this affects various drivers differently is a bit weird.> >>>> Another problem came up recently though that may just be me doing the wrong >>>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated >>>> devices. xen-blkfront is a module in my case and I thought I once had been able >>>> to use that by removing the unplug arg and making the blkfront driver load. But >>>> when I recently tried the module loaded but no disks appeared... Again, not sure >>>> I just forgot how to do that right or that was different when using a 4.1.0 >>>> hypervisor still... >>> >>> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on >>> older hypervisors that didn''t support the unplug protocol and had other >>> ways to cope with multiple drivers accessing the same devices. >>> You can use xen_emul_unplug=never to prevent any unplug but you won''t >>> get any PV interfaces. >> >> Hm, odd. Somehow I thought that I had been using pv interfaces that way when the >> interrupts for the emulated ide was broken. >> A bit suboptimal atm, because without any option and a kernel compiled with the >> platform pci and pv drivers (as modules) booting in HVM mode the kernel decides >> that having both is no use and unplugs the emulated devices. Which then leaves >> you with ... none. > > In theory you would have the PV frontend modules in the initrd. > On the other hand having both can easily cause data corruptions on your > drive.They _are_ in the initrd. And the boot rightfully drops to a maintenance shell right now (without any argument and the emulated devices unplugged). And "modprobe xen-blkfront" loads the module but it does _not_ detect any pv device. -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Sep-22 14:32 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 22.09.2011 13:58, Stefan Bader wrote:> On 22.09.2011 12:30, Stefano Stabellini wrote: >> On Wed, 21 Sep 2011, Stefan Bader wrote: >>> On 21.09.2011 15:31, Stefano Stabellini wrote: >>>> On Wed, 21 Sep 2011, Stefan Bader wrote: >>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the >>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and >>>>> gets configured via dhcp. And initial pings also get routed and done correctly. >>>>> But slightly higher traffic (like checking for updates) hangs. And after a while >>>>> there are messages about tx timeouts. >>>>> The ne2k_pci type nic almost immediately has those issues and never comes up >>>>> correctly. >>>>> >>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how >>>>> this should be but both nics get configured with level,low IRQs. Disk emulation >>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be >>>>> at least not level. >>>> >>> >>>> Does the e1000 emulated card work correctly? >>> >>> Yes, that one seems to work ok. >>> >>>> What happens if you disable interrupt remapping (see patch below)? >>> >>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000 >>> still works. Both then using IOAPIC-fasteoi. >>> >> >> That means there must be another subtle bug in Xen in interrupt >> remapping that only affects 8139p emulation >> > Right, or to be complete: > - e1000: ok > - 8139cp: unstable (setup is possible) > - ne2k_pci: not working (tx problems from the beginning) > > The behaviour feels a bit like interrupts may get lost if occurring at a higher > rate. Why this affects various drivers differently is a bit weird. >>This is mainly speculating... Quite a while back there was this patch to events: commit dffe2e1e1a1ddb566a76266136c312801c66dcf7 Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Date: Fri Aug 20 19:10:01 2010 -0700 xen: handle events as edge-triggered The commit message stated that Xen events are logically edge triggered. So PV events were changed to be handled as edge interrupts. Would that not mean that for xen-pirq-apic being using events this would apply the same and those should be apic-edge instead of level?>>>>> Another problem came up recently though that may just be me doing the wrong >>>>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated >>>>> devices. xen-blkfront is a module in my case and I thought I once had been able >>>>> to use that by removing the unplug arg and making the blkfront driver load. But >>>>> when I recently tried the module loaded but no disks appeared... Again, not sure >>>>> I just forgot how to do that right or that was different when using a 4.1.0 >>>>> hypervisor still... >>>> >>>> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on >>>> older hypervisors that didn''t support the unplug protocol and had other >>>> ways to cope with multiple drivers accessing the same devices. >>>> You can use xen_emul_unplug=never to prevent any unplug but you won''t >>>> get any PV interfaces. >>> >>> Hm, odd. Somehow I thought that I had been using pv interfaces that way when the >>> interrupts for the emulated ide was broken. >>> A bit suboptimal atm, because without any option and a kernel compiled with the >>> platform pci and pv drivers (as modules) booting in HVM mode the kernel decides >>> that having both is no use and unplugs the emulated devices. Which then leaves >>> you with ... none. >> >> In theory you would have the PV frontend modules in the initrd. >> On the other hand having both can easily cause data corruptions on your >> drive. > > They _are_ in the initrd. And the boot rightfully drops to a maintenance shell > right now (without any argument and the emulated devices unplugged). And > "modprobe xen-blkfront" loads the module but it does _not_ detect any pv device. > > -Stefan > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Sep-22 17:44 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On Thu, 22 Sep 2011, Stefan Bader wrote:> On 22.09.2011 13:58, Stefan Bader wrote: > > On 22.09.2011 12:30, Stefano Stabellini wrote: > >> On Wed, 21 Sep 2011, Stefan Bader wrote: > >>> On 21.09.2011 15:31, Stefano Stabellini wrote: > >>>> On Wed, 21 Sep 2011, Stefan Bader wrote: > >>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the > >>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and > >>>>> gets configured via dhcp. And initial pings also get routed and done correctly. > >>>>> But slightly higher traffic (like checking for updates) hangs. And after a while > >>>>> there are messages about tx timeouts. > >>>>> The ne2k_pci type nic almost immediately has those issues and never comes up > >>>>> correctly. > >>>>> > >>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how > >>>>> this should be but both nics get configured with level,low IRQs. Disk emulation > >>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be > >>>>> at least not level. > >>>> > >>> > >>>> Does the e1000 emulated card work correctly? > >>> > >>> Yes, that one seems to work ok. > >>> > >>>> What happens if you disable interrupt remapping (see patch below)? > >>> > >>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000 > >>> still works. Both then using IOAPIC-fasteoi. > >>> > >> > >> That means there must be another subtle bug in Xen in interrupt > >> remapping that only affects 8139p emulation > >> > > Right, or to be complete: > > - e1000: ok > > - 8139cp: unstable (setup is possible) > > - ne2k_pci: not working (tx problems from the beginning) > > > > The behaviour feels a bit like interrupts may get lost if occurring at a higher > > rate. Why this affects various drivers differently is a bit weird. > >> > > This is mainly speculating... Quite a while back there was this patch to events: > > commit dffe2e1e1a1ddb566a76266136c312801c66dcf7 > Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > Date: Fri Aug 20 19:10:01 2010 -0700 > > xen: handle events as edge-triggered > > The commit message stated that Xen events are logically edge triggered. So PV > events were changed to be handled as edge interrupts. Would that not mean that > for xen-pirq-apic being using events this would apply the same and those should > be apic-edge instead of level?That commit is referring to the internal way Linux handles these event, that look like normal interrupt to the Linux irq subsystem. It is not related to the way actual events are delivered from Xen to Linux, so it shouldn''t matter here. I would add lots of printk''s in: xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert xen/arch/x86/hvm/irq.c:assert_irq xen/arch/x86/hvm/irq.c:assert_gsi to find out why xen is not injecting those interrupts _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Sep-30 09:13 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 22.09.2011 19:44, Stefano Stabellini wrote:> On Thu, 22 Sep 2011, Stefan Bader wrote: >> On 22.09.2011 13:58, Stefan Bader wrote: >>> On 22.09.2011 12:30, Stefano Stabellini wrote: >>>> On Wed, 21 Sep 2011, Stefan Bader wrote: >>>>> On 21.09.2011 15:31, Stefano Stabellini wrote: >>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote: >>>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the >>>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and >>>>>>> gets configured via dhcp. And initial pings also get routed and done correctly. >>>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while >>>>>>> there are messages about tx timeouts. >>>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up >>>>>>> correctly. >>>>>>> >>>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how >>>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation >>>>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be >>>>>>> at least not level. >>>>>> >>>>> >>>>>> Does the e1000 emulated card work correctly? >>>>> >>>>> Yes, that one seems to work ok. >>>>> >>>>>> What happens if you disable interrupt remapping (see patch below)? >>>>> >>>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000 >>>>> still works. Both then using IOAPIC-fasteoi. >>>>> >>>> >>>> That means there must be another subtle bug in Xen in interrupt >>>> remapping that only affects 8139p emulation >>>> >>> Right, or to be complete: >>> - e1000: ok >>> - 8139cp: unstable (setup is possible) >>> - ne2k_pci: not working (tx problems from the beginning) >>> >>> The behaviour feels a bit like interrupts may get lost if occurring at a higher >>> rate. Why this affects various drivers differently is a bit weird. >>>> >> >> This is mainly speculating... Quite a while back there was this patch to events: >> >> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7 >> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >> Date: Fri Aug 20 19:10:01 2010 -0700 >> >> xen: handle events as edge-triggered >> >> The commit message stated that Xen events are logically edge triggered. So PV >> events were changed to be handled as edge interrupts. Would that not mean that >> for xen-pirq-apic being using events this would apply the same and those should >> be apic-edge instead of level? > > That commit is referring to the internal way Linux handles these event, > that look like normal interrupt to the Linux irq subsystem. It is not > related to the way actual events are delivered from Xen to Linux, so it > shouldn''t matter here. > > I would add lots of printk''s in: > > xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert > xen/arch/x86/hvm/irq.c:assert_irq > xen/arch/x86/hvm/irq.c:assert_gsi > > to find out why xen is not injecting those interrupts > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-develIt took quite a bit of time but at least I got some hopefully useful information now. So in general, whenever an interrupt is asserted, the hypervisor runs through this: __hvm_pci_intx_assert: when assert count was 0 before incrementing call assert_gsi call send_guest_pirq (when hvm uses pirq) In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one of the first actions whether evtchn_pending in the shared_info is set. If that is the case the call immediately returns with 1. Adding printks to call_assert_gsi, I noticed that - When things stop working, the last call to send_guest_pirq returned 1. - But not every time the return code is one, the stall happens. - e1000 also has cases where send_guest_pirq returns 1 but they happen much less often (than using the 8139cp). Usually every intx_assert has a intx_deassert call that follows. when the stall occurs, this does not happen. Right here I got some troubles to understand where this intx_deassert is actually triggered. With an added WARN_ON the stack traces seem odd, like this: (XEN) [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130 (XEN) [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60 (XEN) [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60 (XEN) [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220 (XEN) [<ffff82c48017dba6>] copy_from_user+0x26/0x90 (XEN) [<ffff82c4801f9446>] do_iret+0xb6/0x1a0 (XEN) [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d Not really sure how one gets from do_update_descriptor to do_hvm_op and the only thing in there which does the deassert is some irq level setting. Actually the guest does not really do much do EOI (which I had been assuming). But since domain_pirq_to_irq maps to 0 for emuirqs, the call to PHYSDEVOP_irq_status_query will hit the following and not set the flag for needing EOI. irq_status_query.flags = 0; if ( is_hvm_domain(v->domain) && domain_pirq_to_irq(v->domain, irq) <= 0 ) { ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; break; } So all the guest is doing is to clear evtchn_pending in the pirq EOI function. I fail to understand what actually is doing the hvm_pci_intx_deassert calls but the way the fasteoi code in the guest looks to be working, there seems to be some gap between calling the handler and the eoi function... So from what I see, I would assume the following: dom0 domU - intx_assert (count 0->1) - send_guest_pirq = 0 (evtchn_pending = 1) - upcall starts fasteoi handler - something does intx_deassert (count 1->0) - intx_assert (count 0->1) - send_guest_pirq = 1 (evtchn_pending still set) - handler->eoi sets evtchn to 0 but otherwise does nothing - there is no intx_deassert, so even when another intx_assert would happen (which does not seem to be the case) no further send_guest_pirq would be called. Unfortunately I do miss some details on the inner working here. Generally I wonder whether not setting the needsEOI flag for those pirqs just is the problem. But it also could be intentional... -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Sep-30 14:09 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On Fri, 30 Sep 2011, Stefan Bader wrote:> On 22.09.2011 19:44, Stefano Stabellini wrote: > > On Thu, 22 Sep 2011, Stefan Bader wrote: > >> On 22.09.2011 13:58, Stefan Bader wrote: > >>> On 22.09.2011 12:30, Stefano Stabellini wrote: > >>>> On Wed, 21 Sep 2011, Stefan Bader wrote: > >>>>> On 21.09.2011 15:31, Stefano Stabellini wrote: > >>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote: > >>>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the > >>>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and > >>>>>>> gets configured via dhcp. And initial pings also get routed and done correctly. > >>>>>>> But slightly higher traffic (like checking for updates) hangs. And after a while > >>>>>>> there are messages about tx timeouts. > >>>>>>> The ne2k_pci type nic almost immediately has those issues and never comes up > >>>>>>> correctly. > >>>>>>> > >>>>>>> I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how > >>>>>>> this should be but both nics get configured with level,low IRQs. Disk emulation > >>>>>>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be > >>>>>>> at least not level. > >>>>>> > >>>>> > >>>>>> Does the e1000 emulated card work correctly? > >>>>> > >>>>> Yes, that one seems to work ok. > >>>>> > >>>>>> What happens if you disable interrupt remapping (see patch below)? > >>>>> > >>>>> 8139cp seems to work correctly now (much higher irq stats as well) and e1000 > >>>>> still works. Both then using IOAPIC-fasteoi. > >>>>> > >>>> > >>>> That means there must be another subtle bug in Xen in interrupt > >>>> remapping that only affects 8139p emulation > >>>> > >>> Right, or to be complete: > >>> - e1000: ok > >>> - 8139cp: unstable (setup is possible) > >>> - ne2k_pci: not working (tx problems from the beginning) > >>> > >>> The behaviour feels a bit like interrupts may get lost if occurring at a higher > >>> rate. Why this affects various drivers differently is a bit weird. > >>>> > >> > >> This is mainly speculating... Quite a while back there was this patch to events: > >> > >> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7 > >> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > >> Date: Fri Aug 20 19:10:01 2010 -0700 > >> > >> xen: handle events as edge-triggered > >> > >> The commit message stated that Xen events are logically edge triggered. So PV > >> events were changed to be handled as edge interrupts. Would that not mean that > >> for xen-pirq-apic being using events this would apply the same and those should > >> be apic-edge instead of level? > > > > That commit is referring to the internal way Linux handles these event, > > that look like normal interrupt to the Linux irq subsystem. It is not > > related to the way actual events are delivered from Xen to Linux, so it > > shouldn''t matter here. > > > > I would add lots of printk''s in: > > > > xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert > > xen/arch/x86/hvm/irq.c:assert_irq > > xen/arch/x86/hvm/irq.c:assert_gsi > > > > to find out why xen is not injecting those interrupts > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xensource.com > > http://lists.xensource.com/xen-devel > > It took quite a bit of time but at least I got some hopefully useful information > now. So in general, whenever an interrupt is asserted, > the hypervisor runs through this: > > __hvm_pci_intx_assert: > when assert count was 0 before incrementing > call assert_gsi > call send_guest_pirq (when hvm uses pirq) > > In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one > of the first actions whether evtchn_pending in the shared_info is set. If that > is the case the call immediately returns with 1. > > Adding printks to call_assert_gsi, I noticed that > - When things stop working, the last call to send_guest_pirq returned 1. > - But not every time the return code is one, the stall happens. > - e1000 also has cases where send_guest_pirq returns 1 but they happen much > less often (than using the 8139cp). > > Usually every intx_assert has a intx_deassert call that follows. when the stall > occurs, this does not happen. Right here I got some troubles to understand where > this intx_deassert is actually triggered. With an added WARN_ON the stack traces > seem odd, like this: > > (XEN) [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130 > (XEN) [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60 > (XEN) [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60 > (XEN) [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220 > (XEN) [<ffff82c48017dba6>] copy_from_user+0x26/0x90 > (XEN) [<ffff82c4801f9446>] do_iret+0xb6/0x1a0 > (XEN) [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d > > Not really sure how one gets from do_update_descriptor to do_hvm_op and the only > thing in there which does the deassert is some irq level setting. > > Actually the guest does not really do much do EOI (which I had been assuming). > But since domain_pirq_to_irq maps to 0 for emuirqs, the call to > PHYSDEVOP_irq_status_query will hit the following and not set the flag for > needing EOI. > > irq_status_query.flags = 0; > if ( is_hvm_domain(v->domain) && > domain_pirq_to_irq(v->domain, irq) <= 0 ) > { > ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; > break; > } > > So all the guest is doing is to clear evtchn_pending in the pirq EOI function. I > fail to understand what actually is doing the hvm_pci_intx_deassert calls but > the way the fasteoi code in the guest looks to be working, there seems to be > some gap between calling the handler and the eoi function... So from what I see, > I would assume the following: > > dom0 domU > - intx_assert (count 0->1) > - send_guest_pirq = 0 > (evtchn_pending = 1) > - upcall starts fasteoi handler > - something does intx_deassert > (count 1->0) > - intx_assert (count 0->1) > - send_guest_pirq = 1 > (evtchn_pending still set) > - handler->eoi sets evtchn to 0 but > otherwise does nothing > - there is no intx_deassert, so even > when another intx_assert would happen > (which does not seem to be the case) > no further send_guest_pirq would be > called. > > Unfortunately I do miss some details on the inner working here. Generally I > wonder whether not setting the needsEOI flag for those pirqs just is the > problem. But it also could be intentional...Thanks for the very detailed analysis. It seems to me that the problem is that if the interrupt is a level triggered interrupt when the guest issues an EOI we should be reinjecting the interrupt again if it has been issued a second time in the meantime. However this doesn''t happen if the interrupt has been remapped onto an even channel. In that case the guest is not even going to issue an EOI at all. So I wrote a patch to force the guest to issue EOIs even on remapped irqs; in the hypercall handler we check whether we need to reinject the interrupt and if that is the case we set the corresponding event channel pending. Could you please try the patch I appended? I haven''t been able to reproduce your problem so I am not really sure if it works. diff -r e042fb60e0ee xen/arch/x86/physdev.c --- a/xen/arch/x86/physdev.c Thu Sep 29 11:23:01 2011 +0000 +++ b/xen/arch/x86/physdev.c Fri Sep 30 14:01:46 2011 +0000 @@ -11,6 +11,7 @@ #include <asm/current.h> #include <asm/io_apic.h> #include <asm/msi.h> +#include <asm/hvm/irq.h> #include <asm/hypercall.h> #include <public/xen.h> #include <public/physdev.h> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H if ( !is_hvm_domain(v->domain) || domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) pirq_guest_eoi(pirq); + if ( is_hvm_domain(v->domain) && + domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 ) + { + struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq; + int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq); + + /* if this is a level irq and count > 0, send another + * notification */ + if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */ + && hvm_irq->gsi_assert_count[gsi] ) + send_guest_pirq(v->domain, pirq); + } spin_unlock(&v->domain->event_lock); ret = 0; break; @@ -327,12 +340,6 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H if ( (irq < 0) || (irq >= v->domain->nr_pirqs) ) break; irq_status_query.flags = 0; - if ( is_hvm_domain(v->domain) && - domain_pirq_to_irq(v->domain, irq) <= 0 ) - { - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; - break; - } /* * Even edge-triggered or message-based IRQs can need masking from _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Sep-30 16:06 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 30.09.2011 16:09, Stefano Stabellini wrote:> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H > if ( !is_hvm_domain(v->domain) || > domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) > pirq_guest_eoi(pirq); > + if ( is_hvm_domain(v->domain) && > + domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 ) > + { > + struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq; > + int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq); > + > + /* if this is a level irq and count > 0, send another > + * notification */ > + if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */ > + && hvm_irq->gsi_assert_count[gsi] ) > + send_guest_pirq(v->domain, pirq); > + } > spin_unlock(&v->domain->event_lock); > ret = 0; > break;This hunk looks substantially different from my 4.1.1 based code. There is no spin_lock acquired. Not sure that could be a reason for the different behaviour, too. I''ll add that spinlock too. case PHYSDEVOP_eoi: { struct physdev_eoi eoi; ret = -EFAULT; if ( copy_from_guest(&eoi, arg, 1) != 0 ) break; ret = -EINVAL; if ( eoi.irq >= v->domain->nr_pirqs ) break; if ( v->domain->arch.pirq_eoi_map ) evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]); if ( !is_hvm_domain(v->domain) || domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) ret = pirq_guest_eoi(v->domain, eoi.irq); else ret = 0; break; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Sep-30 17:59 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 30.09.2011 18:06, Stefan Bader wrote:> On 30.09.2011 16:09, Stefano Stabellini wrote: >> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H >> if ( !is_hvm_domain(v->domain) || >> domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) >> pirq_guest_eoi(pirq); >> + if ( is_hvm_domain(v->domain) && >> + domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 ) >> + { >> + struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq; >> + int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq); >> + >> + /* if this is a level irq and count > 0, send another >> + * notification */ >> + if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */ >> + && hvm_irq->gsi_assert_count[gsi] ) >> + send_guest_pirq(v->domain, pirq); >> + } >> spin_unlock(&v->domain->event_lock); >> ret = 0; >> break; > > This hunk looks substantially different from my 4.1.1 based code. There is no > spin_lock acquired. Not sure that could be a reason for the different behaviour, > too. I''ll add that spinlock too. > > case PHYSDEVOP_eoi: { > struct physdev_eoi eoi; > ret = -EFAULT; > if ( copy_from_guest(&eoi, arg, 1) != 0 ) > break; > ret = -EINVAL; > if ( eoi.irq >= v->domain->nr_pirqs ) > break; > if ( v->domain->arch.pirq_eoi_map ) > evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]); > if ( !is_hvm_domain(v->domain) || > domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) > ret = pirq_guest_eoi(v->domain, eoi.irq); > else > ret = 0; > break; > } > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-develOk, so I had been modifying that hunk to spin_lock(&v->domain->event_lock); if ( v->domain->arch.pirq_eoi_map ) evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]); if ( !is_hvm_domain(v->domain) || domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) pirq_guest_eoi(v->domain, eoi.irq); if ( is_hvm_domain(v->domain) && domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 ) { struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq; int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq); /* if this is a level irq and count > 0, send another * notification */ if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */ && hvm_irq->gsi_assert_count[gsi] ) { printk("re-send event for gsi%i\n", gsi); send_guest_pirq(v->domain, eoi.irq); } } spin_unlock(&v->domain->event_lock); ret = 0; Also I did not completely remove the section that would return the status without setting needsEOI. I just changed the if condition to be <0 instead of <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check could be useful for something. irq_status_query.flags = 0; if ( is_hvm_domain(v->domain) && domain_pirq_to_irq(v->domain, irq) < 0 ) { ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; break; } With that a quick test shows both the re-sends done sometimes and the domU doing EOIs. And there is no stall apparent. Did the same quick test with the e1000 emulated NIC and that still seems ok. Those were not very thorough tests but at least I would have observed a stall pretty quick otherwise. -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Oct-03 17:24 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On Fri, 30 Sep 2011, Stefan Bader wrote:> Also I did not completely remove the section that would return the status > without setting needsEOI. I just changed the if condition to be <0 instead of > <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check > could be useful for something. > > irq_status_query.flags = 0; > if ( is_hvm_domain(v->domain) && > domain_pirq_to_irq(v->domain, irq) < 0 ) > { > ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; > break; > } >You need to remove the entire test because we want to receive notifications in all cases.> With that a quick test shows both the re-sends done sometimes and the domU doing > EOIs. And there is no stall apparent. Did the same quick test with the e1000 > emulated NIC and that still seems ok. Those were not very thorough tests but at > least I would have observed a stall pretty quick otherwise.I am glad it fixes the problem for you. I am going to send a different patch upstream for Xen 4.2, because I would also like it to cover the very unlikely scenario in which a PV guest (like dom0 or a PV guest with PCI passthrough) is loosing level interrupts because when Xen tries to set the corresponding event channel pending the bit is alreay set. The codebase is different enough that making the same change on 4.1 is non-trivial. I am appending the new patch to this email, it would be great if you could test it. You just need a 4.2 hypervisor, not the entire system. You should be able to perform the test updating only xen.gz. If you have trouble if xen-unstable.hg tip, try changeset 23843. --- diff -r bf533533046c xen/arch/x86/hvm/irq.c --- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000 +++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000 @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d, if ( hvm_domain_use_pirq(d, pirq) ) { - send_guest_pirq(d, pirq); + if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS ) + pirq->lost++; return; } vioapic_irq_positive_edge(d, ioapic_gsi); @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert( { struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq; unsigned int gsi, link, isa_irq; + struct pirq *pirq; ASSERT((device <= 31) && (intx <= 3)); @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert( gsi = hvm_pci_intx_gsi(device, intx); if ( hvm_irq->gsi_assert_count[gsi]++ == 0 ) assert_gsi(d, gsi); + else { + pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi)); + if ( hvm_domain_use_pirq(d, pirq) ) + pirq->lost++; + } link = hvm_pci_intx_link(device, intx); isa_irq = hvm_irq->pci_link.route[link]; diff -r bf533533046c xen/arch/x86/irq.c --- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000 +++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000 @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq) !test_and_set_bool(pirq->masked) ) action->in_flight++; if ( !hvm_do_IRQ_dpci(d, pirq) ) - send_guest_pirq(d, pirq); + { + if ( send_guest_pirq(d, pirq) && + action->ack_type == ACKTYPE_EOI ) + pirq->lost++; + } } if ( action->ack_type != ACKTYPE_NONE ) diff -r bf533533046c xen/arch/x86/physdev.c --- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000 +++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000 @@ -11,6 +11,7 @@ #include <asm/current.h> #include <asm/io_apic.h> #include <asm/msi.h> +#include <asm/hvm/irq.h> #include <asm/hypercall.h> #include <public/xen.h> #include <public/physdev.h> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H if ( !is_hvm_domain(v->domain) || domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) pirq_guest_eoi(pirq); + if ( pirq->lost > 0) { + if ( !send_guest_pirq(v->domain, pirq) ) + pirq->lost--; + } spin_unlock(&v->domain->event_lock); ret = 0; break; @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H break; irq_status_query.flags = 0; if ( is_hvm_domain(v->domain) && - domain_pirq_to_irq(v->domain, irq) <= 0 ) + domain_pirq_to_irq(v->domain, irq) <= 0 && + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND ) { - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; + ret = -EINVAL; break; } diff -r bf533533046c xen/include/xen/irq.h --- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000 +++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000 @@ -146,6 +146,7 @@ struct pirq { int pirq; u16 evtchn; bool_t masked; + u32 lost; struct rcu_head rcu_head; struct arch_pirq arch; }; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Oct-03 18:13 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
CC''ing Jan, that probably is going to have an opinion on this. Let me add a bit of background: Stefan found out that PV on HVM guests could loose level interrupts coming from emulated devices. Looking through the code I realized that we need to add some logic to inject a pirq in the guest if a level interrupt has been raised while the guest is servicing the first one. While this is all very specific to interrupt remapping and emulated devices, I realized that something similar could happen even with dom0 or other PV guests with PCI passthrough: 1) the device raises a level interrupt and xen injects it into the guest; 2) the guest is temporarely stuck: it does not ack it or eoi it; 3) the xen timer kicks in and eois the interrupt; 4) the device thinks it is all fine and sends a second interrupt; 5) Xen fails to inject the second interrupt into the guest because the guest has still the event channel pending bit set; at this point the guest looses the second interrupt notification, that is not supposed to happen with level interrupts and I think it might cause problems with some devices. Jan, do you think we should try to handle this case, or is it too unlikely? In any case we need to handle the PV on HVM remapping bug, that because of the way interrupts are emulated is much more likely to happen... On Mon, 3 Oct 2011, Stefano Stabellini wrote:> On Fri, 30 Sep 2011, Stefan Bader wrote: > > Also I did not completely remove the section that would return the status > > without setting needsEOI. I just changed the if condition to be <0 instead of > > <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check > > could be useful for something. > > > > irq_status_query.flags = 0; > > if ( is_hvm_domain(v->domain) && > > domain_pirq_to_irq(v->domain, irq) < 0 ) > > { > > ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; > > break; > > } > > > > You need to remove the entire test because we want to receive > notifications in all cases. > > > > With that a quick test shows both the re-sends done sometimes and the domU doing > > EOIs. And there is no stall apparent. Did the same quick test with the e1000 > > emulated NIC and that still seems ok. Those were not very thorough tests but at > > least I would have observed a stall pretty quick otherwise. > > I am glad it fixes the problem for you. > > I am going to send a different patch upstream for Xen 4.2, because I > would also like it to cover the very unlikely scenario in which a PV > guest (like dom0 or a PV guest with PCI passthrough) is loosing level > interrupts because when Xen tries to set the corresponding event channel > pending the bit is alreay set. The codebase is different enough that > making the same change on 4.1 is non-trivial. I am appending the new > patch to this email, it would be great if you could test it. You just > need a 4.2 hypervisor, not the entire system. You should be able to > perform the test updating only xen.gz. > If you have trouble if xen-unstable.hg tip, try changeset 23843. > > --- > > > diff -r bf533533046c xen/arch/x86/hvm/irq.c > --- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000 > +++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000 > @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d, > > if ( hvm_domain_use_pirq(d, pirq) ) > { > - send_guest_pirq(d, pirq); > + if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS ) > + pirq->lost++; > return; > } > vioapic_irq_positive_edge(d, ioapic_gsi); > @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert( > { > struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq; > unsigned int gsi, link, isa_irq; > + struct pirq *pirq; > > ASSERT((device <= 31) && (intx <= 3)); > > @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert( > gsi = hvm_pci_intx_gsi(device, intx); > if ( hvm_irq->gsi_assert_count[gsi]++ == 0 ) > assert_gsi(d, gsi); > + else { > + pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi)); > + if ( hvm_domain_use_pirq(d, pirq) ) > + pirq->lost++; > + } > > link = hvm_pci_intx_link(device, intx); > isa_irq = hvm_irq->pci_link.route[link]; > diff -r bf533533046c xen/arch/x86/irq.c > --- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000 > +++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000 > @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq) > !test_and_set_bool(pirq->masked) ) > action->in_flight++; > if ( !hvm_do_IRQ_dpci(d, pirq) ) > - send_guest_pirq(d, pirq); > + { > + if ( send_guest_pirq(d, pirq) && > + action->ack_type == ACKTYPE_EOI ) > + pirq->lost++; > + } > } > > if ( action->ack_type != ACKTYPE_NONE ) > diff -r bf533533046c xen/arch/x86/physdev.c > --- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000 > +++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000 > @@ -11,6 +11,7 @@ > #include <asm/current.h> > #include <asm/io_apic.h> > #include <asm/msi.h> > +#include <asm/hvm/irq.h> > #include <asm/hypercall.h> > #include <public/xen.h> > #include <public/physdev.h> > @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H > if ( !is_hvm_domain(v->domain) || > domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) > pirq_guest_eoi(pirq); > + if ( pirq->lost > 0) { > + if ( !send_guest_pirq(v->domain, pirq) ) > + pirq->lost--; > + } > spin_unlock(&v->domain->event_lock); > ret = 0; > break; > @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H > break; > irq_status_query.flags = 0; > if ( is_hvm_domain(v->domain) && > - domain_pirq_to_irq(v->domain, irq) <= 0 ) > + domain_pirq_to_irq(v->domain, irq) <= 0 && > + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND ) > { > - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; > + ret = -EINVAL; > break; > } > > diff -r bf533533046c xen/include/xen/irq.h > --- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000 > +++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000 > @@ -146,6 +146,7 @@ struct pirq { > int pirq; > u16 evtchn; > bool_t masked; > + u32 lost; > struct rcu_head rcu_head; > struct arch_pirq arch; > }; >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andrew Cooper
2011-Oct-04 10:07 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 03/10/11 19:13, Stefano Stabellini wrote:> CC''ing Jan, that probably is going to have an opinion on this. > > Let me add a bit of background: Stefan found out that PV on HVM guests > could loose level interrupts coming from emulated devices. Looking > through the code I realized that we need to add some logic to inject a > pirq in the guest if a level interrupt has been raised while the guest > is servicing the first one. > While this is all very specific to interrupt remapping and emulated > devices, I realized that something similar could happen even with dom0 > or other PV guests with PCI passthrough: > > 1) the device raises a level interrupt and xen injects it into the > guest; > > 2) the guest is temporarely stuck: it does not ack it or eoi it; > > 3) the xen timer kicks in and eois the interrupt; > > 4) the device thinks it is all fine and sends a second interrupt; > > 5) Xen fails to inject the second interrupt into the guest because the > guest has still the event channel pending bit set; > > at this point the guest looses the second interrupt notification, that > is not supposed to happen with level interrupts and I think it might > cause problems with some devices. > > Jan, do you think we should try to handle this case, or is it too > unlikely?I am not certain whether this is relevant, but the ICH10 IO-APIC documentation indicated that early EOI''ing of a line level interrupt should not have this effect. Specifically, it states that EOI''ing a line level interrupt whos line is still asserted will cause the interrupt to be "re-raised" from the IO-APIC. It uses this to assert that it is fine to use multiple IO-APIC entries with the same vector, with a broadcast of vector number alone to EOI the interrupt. In this case, while Xen sees two interrupts, from the devices point of view, only I has happened. In the case where the device has dropped its line level interrupt of its own accord, then I would agree that the current Xen behavior is wrong. I cant offhand think of a good reason why this would occur. I know it is not helpful in this case, but as a rule of thumb, line level interrupts should not be used with Xen. The average response time on an unloaded system is ~30ms, ranging from 5 to 150. (On a sample set of a Dell R710, Xen 4.1.0 and 2.6.32 dom0, over 2 weeks of debugging another line level interrupt bug). ~Andrew> In any case we need to handle the PV on HVM remapping bug, that because > of the way interrupts are emulated is much more likely to happen... > > > On Mon, 3 Oct 2011, Stefano Stabellini wrote: >> On Fri, 30 Sep 2011, Stefan Bader wrote: >>> Also I did not completely remove the section that would return the status >>> without setting needsEOI. I just changed the if condition to be <0 instead of >>> <=0 (I knew from the tests that the mapping was always 0 and maybe the <0 check >>> could be useful for something. >>> >>> irq_status_query.flags = 0; >>> if ( is_hvm_domain(v->domain) && >>> domain_pirq_to_irq(v->domain, irq) < 0 ) >>> { >>> ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; >>> break; >>> } >>> >> You need to remove the entire test because we want to receive >> notifications in all cases. >> >> >>> With that a quick test shows both the re-sends done sometimes and the domU doing >>> EOIs. And there is no stall apparent. Did the same quick test with the e1000 >>> emulated NIC and that still seems ok. Those were not very thorough tests but at >>> least I would have observed a stall pretty quick otherwise. >> I am glad it fixes the problem for you. >> >> I am going to send a different patch upstream for Xen 4.2, because I >> would also like it to cover the very unlikely scenario in which a PV >> guest (like dom0 or a PV guest with PCI passthrough) is loosing level >> interrupts because when Xen tries to set the corresponding event channel >> pending the bit is alreay set. The codebase is different enough that >> making the same change on 4.1 is non-trivial. I am appending the new >> patch to this email, it would be great if you could test it. You just >> need a 4.2 hypervisor, not the entire system. You should be able to >> perform the test updating only xen.gz. >> If you have trouble if xen-unstable.hg tip, try changeset 23843. >> >> --- >> >> >> diff -r bf533533046c xen/arch/x86/hvm/irq.c >> --- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000 >> +++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000 >> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d, >> >> if ( hvm_domain_use_pirq(d, pirq) ) >> { >> - send_guest_pirq(d, pirq); >> + if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS ) >> + pirq->lost++; >> return; >> } >> vioapic_irq_positive_edge(d, ioapic_gsi); >> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert( >> { >> struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq; >> unsigned int gsi, link, isa_irq; >> + struct pirq *pirq; >> >> ASSERT((device <= 31) && (intx <= 3)); >> >> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert( >> gsi = hvm_pci_intx_gsi(device, intx); >> if ( hvm_irq->gsi_assert_count[gsi]++ == 0 ) >> assert_gsi(d, gsi); >> + else { >> + pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi)); >> + if ( hvm_domain_use_pirq(d, pirq) ) >> + pirq->lost++; >> + } >> >> link = hvm_pci_intx_link(device, intx); >> isa_irq = hvm_irq->pci_link.route[link]; >> diff -r bf533533046c xen/arch/x86/irq.c >> --- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000 >> +++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000 >> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq) >> !test_and_set_bool(pirq->masked) ) >> action->in_flight++; >> if ( !hvm_do_IRQ_dpci(d, pirq) ) >> - send_guest_pirq(d, pirq); >> + { >> + if ( send_guest_pirq(d, pirq) && >> + action->ack_type == ACKTYPE_EOI ) >> + pirq->lost++; >> + } >> } >> >> if ( action->ack_type != ACKTYPE_NONE ) >> diff -r bf533533046c xen/arch/x86/physdev.c >> --- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000 >> +++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000 >> @@ -11,6 +11,7 @@ >> #include <asm/current.h> >> #include <asm/io_apic.h> >> #include <asm/msi.h> >> +#include <asm/hvm/irq.h> >> #include <asm/hypercall.h> >> #include <public/xen.h> >> #include <public/physdev.h> >> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H >> if ( !is_hvm_domain(v->domain) || >> domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) >> pirq_guest_eoi(pirq); >> + if ( pirq->lost > 0) { >> + if ( !send_guest_pirq(v->domain, pirq) ) >> + pirq->lost--; >> + } >> spin_unlock(&v->domain->event_lock); >> ret = 0; >> break; >> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H >> break; >> irq_status_query.flags = 0; >> if ( is_hvm_domain(v->domain) && >> - domain_pirq_to_irq(v->domain, irq) <= 0 ) >> + domain_pirq_to_irq(v->domain, irq) <= 0 && >> + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND ) >> { >> - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; >> + ret = -EINVAL; >> break; >> } >> >> diff -r bf533533046c xen/include/xen/irq.h >> --- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000 >> +++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000 >> @@ -146,6 +146,7 @@ struct pirq { >> int pirq; >> u16 evtchn; >> bool_t masked; >> + u32 lost; >> struct rcu_head rcu_head; >> struct arch_pirq arch; >> }; >> > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Oct-04 14:13 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On Tue, 4 Oct 2011, Andrew Cooper wrote:> On 03/10/11 19:13, Stefano Stabellini wrote: > > CC''ing Jan, that probably is going to have an opinion on this. > > > > Let me add a bit of background: Stefan found out that PV on HVM guests > > could loose level interrupts coming from emulated devices. Looking > > through the code I realized that we need to add some logic to inject a > > pirq in the guest if a level interrupt has been raised while the guest > > is servicing the first one. > > While this is all very specific to interrupt remapping and emulated > > devices, I realized that something similar could happen even with dom0 > > or other PV guests with PCI passthrough: > > > > 1) the device raises a level interrupt and xen injects it into the > > guest; > > > > 2) the guest is temporarely stuck: it does not ack it or eoi it; > > > > 3) the xen timer kicks in and eois the interrupt; > > > > 4) the device thinks it is all fine and sends a second interrupt; > > > > 5) Xen fails to inject the second interrupt into the guest because the > > guest has still the event channel pending bit set; > > > > at this point the guest looses the second interrupt notification, that > > is not supposed to happen with level interrupts and I think it might > > cause problems with some devices. > > > > Jan, do you think we should try to handle this case, or is it too > > unlikely? > > I am not certain whether this is relevant, but the ICH10 IO-APIC > documentation indicated that early EOI''ing of a line level interrupt > should not have this effect. Specifically, it states that EOI''ing a > line level interrupt whos line is still asserted will cause the > interrupt to be "re-raised" from the IO-APIC. It uses this to assert > that it is fine to use multiple IO-APIC entries with the same vector, > with a broadcast of vector number alone to EOI the interrupt. > > In this case, while Xen sees two interrupts, from the devices point of > view, only I has happened. > > In the case where the device has dropped its line level interrupt of its > own accord, then I would agree that the current Xen behavior is wrong. > I cant offhand think of a good reason why this would occur.I think this scenario is actually possible. It is certainly happening with qemu''s emulated devices. This patch would take care of re-injecting the interrupts both in the case of the device deasserting and reasserting the interrupt while the guest hasn''t cleared the pending bit yet and in case a PV on HVM guest eois the interrupt too early. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Oct-05 16:10 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 03.10.2011 19:24, Stefano Stabellini wrote:> I am going to send a different patch upstream for Xen 4.2, because I > would also like it to cover the very unlikely scenario in which a PV > guest (like dom0 or a PV guest with PCI passthrough) is loosing level > interrupts because when Xen tries to set the corresponding event channel > pending the bit is alreay set. The codebase is different enough that > making the same change on 4.1 is non-trivial. I am appending the new > patch to this email, it would be great if you could test it. You just > need a 4.2 hypervisor, not the entire system. You should be able to > perform the test updating only xen.gz. > If you have trouble if xen-unstable.hg tip, try changeset 23843.Hi Stefano, currently I would have the problem that I don''t have too much time to move to another hypervisor (tests may or may not be useful there with substantial changes beside this one) with our next release being close. But I think I got a usable backport of your change to 4.1.1 (you think it looks ok?) and have given that a quick test which seems to be ok... Though one drawback is that I don''t have a setup which would use passthrough, so that path is not tested. I think I did see (with a debugging version) that the lost count was incremented and decremented in dom0, though. -Stefan --- Index: xen-4.1.1/xen/arch/x86/domain.c ==================================================================--- xen-4.1.1.orig/xen/arch/x86/domain.c 2011-10-05 15:03:19.405815293 +0200 +++ xen-4.1.1/xen/arch/x86/domain.c 2011-10-05 15:09:59.781816622 +0200 @@ -514,6 +514,12 @@ int arch_domain_create(struct domain *d, memset(d->arch.pirq_irq, 0, d->nr_pirqs * sizeof(*d->arch.pirq_irq)); + d->arch.pirq_lost = xmalloc_array(int, d->nr_pirqs); + if ( !d->arch.pirq_lost) + goto fail; + memset(d->arch.pirq_lost, 0, + d->nr_pirqs * sizeof(*d->arch.pirq_lost)); + d->arch.irq_pirq = xmalloc_array(int, nr_irqs); if ( !d->arch.irq_pirq ) goto fail; @@ -575,6 +581,7 @@ int arch_domain_create(struct domain *d, fail: d->is_dying = DOMDYING_dead; vmce_destroy_msr(d); + xfree(d->arch.pirq_lost); xfree(d->arch.pirq_irq); xfree(d->arch.irq_pirq); xfree(d->arch.pirq_emuirq); @@ -628,6 +635,7 @@ void arch_domain_destroy(struct domain * #endif free_xenheap_page(d->shared_info); + xfree(d->arch.pirq_lost); xfree(d->arch.pirq_irq); xfree(d->arch.irq_pirq); xfree(d->arch.pirq_emuirq); Index: xen-4.1.1/xen/arch/x86/hvm/irq.c ==================================================================--- xen-4.1.1.orig/xen/arch/x86/hvm/irq.c 2011-10-05 15:14:35.441815292 +0200 +++ xen-4.1.1/xen/arch/x86/hvm/irq.c 2011-10-05 17:55:43.603986605 +0200 @@ -33,7 +33,9 @@ static void assert_gsi(struct domain *d, int pirq = domain_emuirq_to_pirq(d, ioapic_gsi); if ( hvm_domain_use_pirq(d, pirq) ) { - send_guest_pirq(d, pirq); + if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS ) + if (d->arch.pirq_lost) + d->arch.pirq_lost[pirq]++; return; } vioapic_irq_positive_edge(d, ioapic_gsi); @@ -67,6 +69,12 @@ static void __hvm_pci_intx_assert( gsi = hvm_pci_intx_gsi(device, intx); if ( hvm_irq->gsi_assert_count[gsi]++ == 0 ) assert_gsi(d, gsi); + else { + int pirq = domain_emuirq_to_pirq(d, gsi); + + if ( hvm_domain_use_pirq(d, pirq) && d->arch.pirq_lost) + d->arch.pirq_lost[pirq]++; + } link = hvm_pci_intx_link(device, intx); isa_irq = hvm_irq->pci_link.route[link]; Index: xen-4.1.1/xen/arch/x86/irq.c ==================================================================--- xen-4.1.1.orig/xen/arch/x86/irq.c 2011-10-05 15:26:58.477815292 +0200 +++ xen-4.1.1/xen/arch/x86/irq.c 2011-10-05 17:56:23.191986535 +0200 @@ -888,10 +888,13 @@ static void __do_IRQ_guest(int irq) desc->status |= IRQ_INPROGRESS; /* cleared during hvm eoi */ } } - else if ( send_guest_pirq(d, pirq) && - (action->ack_type == ACKTYPE_NONE) ) - { - already_pending++; + else { + if ( send_guest_pirq(d, pirq) ) { + if ( action->ack_type == ACKTYPE_EOI && d->arch.pirq_lost) + d->arch.pirq_lost[pirq]++; + else if ( action->ack_type == ACKTYPE_NONE ) + already_pending++; + } } } Index: xen-4.1.1/xen/arch/x86/physdev.c ==================================================================--- xen-4.1.1.orig/xen/arch/x86/physdev.c 2011-10-05 15:36:14.545815292 +0200 +++ xen-4.1.1/xen/arch/x86/physdev.c 2011-10-05 17:57:06.055986460 +0200 @@ -261,13 +261,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H ret = -EINVAL; if ( eoi.irq >= v->domain->nr_pirqs ) break; + spin_lock(&v->domain->event_lock); if ( v->domain->arch.pirq_eoi_map ) evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]); if ( !is_hvm_domain(v->domain) || domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) - ret = pirq_guest_eoi(v->domain, eoi.irq); - else - ret = 0; + pirq_guest_eoi(v->domain, eoi.irq); + if ( v->domain->arch.pirq_lost && v->domain->arch.pirq_lost[eoi.irq]) { + if ( !send_guest_pirq(v->domain, eoi.irq) ) + v->domain->arch.pirq_lost[eoi.irq]--; + } + ret = 0; + spin_unlock(&v->domain->event_lock); break; } @@ -323,9 +328,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H break; irq_status_query.flags = 0; if ( is_hvm_domain(v->domain) && - domain_pirq_to_irq(v->domain, irq) <= 0 ) + domain_pirq_to_irq(v->domain, irq) <= 0 && + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND ) { - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; + ret = -EINVAL; break; } Index: xen-4.1.1/xen/include/asm-x86/domain.h ==================================================================--- xen-4.1.1.orig/xen/include/asm-x86/domain.h 2011-10-05 15:10:11.709815293 +0200 +++ xen-4.1.1/xen/include/asm-x86/domain.h 2011-10-05 15:12:46.237815276 +0200 @@ -312,6 +312,9 @@ struct arch_domain (possibly other cases in the future */ uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */ uint64_t vtsc_usercount; /* not used for hvm */ + + /* Protected by d->event_lock, count of lost pirqs */ + int *pirq_lost; } __cacheline_aligned; #define has_arch_pdevs(d) (!list_empty(&(d)->arch.pdev_list)) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Oct-06 10:12 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On Wed, 5 Oct 2011, Stefan Bader wrote:> On 03.10.2011 19:24, Stefano Stabellini wrote: > > I am going to send a different patch upstream for Xen 4.2, because I > > would also like it to cover the very unlikely scenario in which a PV > > guest (like dom0 or a PV guest with PCI passthrough) is loosing level > > interrupts because when Xen tries to set the corresponding event channel > > pending the bit is alreay set. The codebase is different enough that > > making the same change on 4.1 is non-trivial. I am appending the new > > patch to this email, it would be great if you could test it. You just > > need a 4.2 hypervisor, not the entire system. You should be able to > > perform the test updating only xen.gz. > > If you have trouble if xen-unstable.hg tip, try changeset 23843. > > Hi Stefano, > > currently I would have the problem that I don''t have too much time to move to > another hypervisor (tests may or may not be useful there with substantial > changes beside this one) with our next release being close. > But I think I got a usable backport of your change to 4.1.1 (you think it looks > ok?) and have given that a quick test which seems to be ok... > Though one drawback is that I don''t have a setup which would use passthrough, so > that path is not tested. I think I did see (with a debugging version) that the > lost count was incremented and decremented in dom0, though. >Honestly if you have to commit to a backport for your package right now, I would go for the previous version, because it is simpler and less likely to introduce regressions. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefan Bader
2011-Oct-06 12:16 UTC
Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics
On 06.10.2011 12:12, Stefano Stabellini wrote:> On Wed, 5 Oct 2011, Stefan Bader wrote: >> On 03.10.2011 19:24, Stefano Stabellini wrote: >>> I am going to send a different patch upstream for Xen 4.2, because I >>> would also like it to cover the very unlikely scenario in which a PV >>> guest (like dom0 or a PV guest with PCI passthrough) is loosing level >>> interrupts because when Xen tries to set the corresponding event channel >>> pending the bit is alreay set. The codebase is different enough that >>> making the same change on 4.1 is non-trivial. I am appending the new >>> patch to this email, it would be great if you could test it. You just >>> need a 4.2 hypervisor, not the entire system. You should be able to >>> perform the test updating only xen.gz. >>> If you have trouble if xen-unstable.hg tip, try changeset 23843. >> >> Hi Stefano, >> >> currently I would have the problem that I don''t have too much time to move to >> another hypervisor (tests may or may not be useful there with substantial >> changes beside this one) with our next release being close. >> But I think I got a usable backport of your change to 4.1.1 (you think it looks >> ok?) and have given that a quick test which seems to be ok... >> Though one drawback is that I don''t have a setup which would use passthrough, so >> that path is not tested. I think I did see (with a debugging version) that the >> lost count was incremented and decremented in dom0, though. >> > > Honestly if you have to commit to a backport for your package right now, > I would go for the previous version, because it is simpler and less > likely to introduce regressions.Agreed. Well at least I hope that since that backport seemed to fix the issue I saw in 4.1.1 it will give some more confidence for you on the 4.2 version. With the drawback of the passthrough not being tested. -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Oct-27 10:37 UTC
[Xen-devel] [PATCH] xen: do not loose level interrupt notifications
PV on HVM guests can loose level interrupts coming from emulated devices: we are missing code to retry to inject a pirq in the guest if it corresponds to a level interrupt and the interrupt has been raised while the guest is servicing the first one. The same thing could also happen with PV guests, including dom0, even though it is much more unlikely. In case of PV guests the scenario would be the following: 1) a device raises a level interrupt and xen injects it into the guest; 2) the guest is temporarely stuck: it does not ack it or eoi it; 3) the xen timer kicks in and eois the interrupt; 4) the device thinks it is all fine and sends a second interrupt; 5) Xen fails to inject the second interrupt into the guest because the guest has still the event channel pending bit set; at this point the guest looses the second interrupt notification, that is not supposed to happen with level interrupts and it might cause problems with some devices. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> diff -r bf533533046c xen/arch/x86/hvm/irq.c --- a/xen/arch/x86/hvm/irq.c Fri Sep 30 14:12:35 2011 +0000 +++ b/xen/arch/x86/hvm/irq.c Mon Oct 03 16:54:51 2011 +0000 @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d, if ( hvm_domain_use_pirq(d, pirq) ) { - send_guest_pirq(d, pirq); + if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS ) + pirq->lost++; return; } vioapic_irq_positive_edge(d, ioapic_gsi); @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert( { struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq; unsigned int gsi, link, isa_irq; + struct pirq *pirq; ASSERT((device <= 31) && (intx <= 3)); @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert( gsi = hvm_pci_intx_gsi(device, intx); if ( hvm_irq->gsi_assert_count[gsi]++ == 0 ) assert_gsi(d, gsi); + else { + pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi)); + if ( hvm_domain_use_pirq(d, pirq) ) + pirq->lost++; + } link = hvm_pci_intx_link(device, intx); isa_irq = hvm_irq->pci_link.route[link]; diff -r bf533533046c xen/arch/x86/irq.c --- a/xen/arch/x86/irq.c Fri Sep 30 14:12:35 2011 +0000 +++ b/xen/arch/x86/irq.c Mon Oct 03 16:54:51 2011 +0000 @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq) !test_and_set_bool(pirq->masked) ) action->in_flight++; if ( !hvm_do_IRQ_dpci(d, pirq) ) - send_guest_pirq(d, pirq); + { + if ( send_guest_pirq(d, pirq) && + action->ack_type == ACKTYPE_EOI ) + pirq->lost++; + } } if ( action->ack_type != ACKTYPE_NONE ) diff -r bf533533046c xen/arch/x86/physdev.c --- a/xen/arch/x86/physdev.c Fri Sep 30 14:12:35 2011 +0000 +++ b/xen/arch/x86/physdev.c Mon Oct 03 16:54:51 2011 +0000 @@ -11,6 +11,7 @@ #include <asm/current.h> #include <asm/io_apic.h> #include <asm/msi.h> +#include <asm/hvm/irq.h> #include <asm/hypercall.h> #include <public/xen.h> #include <public/physdev.h> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H if ( !is_hvm_domain(v->domain) || domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) pirq_guest_eoi(pirq); + if ( pirq->lost > 0) { + if ( !send_guest_pirq(v->domain, pirq) ) + pirq->lost--; + } spin_unlock(&v->domain->event_lock); ret = 0; break; @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H break; irq_status_query.flags = 0; if ( is_hvm_domain(v->domain) && - domain_pirq_to_irq(v->domain, irq) <= 0 ) + domain_pirq_to_irq(v->domain, irq) <= 0 && + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND ) { - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; + ret = -EINVAL; break; } diff -r bf533533046c xen/include/xen/irq.h --- a/xen/include/xen/irq.h Fri Sep 30 14:12:35 2011 +0000 +++ b/xen/include/xen/irq.h Mon Oct 03 16:54:51 2011 +0000 @@ -146,6 +146,7 @@ struct pirq { int pirq; u16 evtchn; bool_t masked; + u32 lost; struct rcu_head rcu_head; struct arch_pirq arch; }; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Oct-27 11:18 UTC
Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications
On 27/10/2011 11:37, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com> wrote:> PV on HVM guests can loose level interrupts coming from emulated > devices: we are missing code to retry to inject a pirq in the guest if > it corresponds to a level interrupt and the interrupt has been raised > while the guest is servicing the first one. > > The same thing could also happen with PV guests, including dom0, even > though it is much more unlikely. In case of PV guests the scenario would > be the following: > > 1) a device raises a level interrupt and xen injects it into the > guest; > > 2) the guest is temporarely stuck: it does not ack it or eoi it; > > 3) the xen timer kicks in and eois the interrupt; > > 4) the device thinks it is all fine and sends a second interrupt; > > 5) Xen fails to inject the second interrupt into the guest because the > guest has still the event channel pending bit set; > > at this point the guest looses the second interrupt notification, that > is not supposed to happen with level interrupts and it might cause > problems with some devices.You can''t really lose a level-triggered interrupt. In step (4) the device isn''t really actively involved in sending another interrupt -- it never deasserted its INTx line, and nor will it until the guest''s ISR quenches the interrupt at the device. If the guest misses such an interrupt, and doesn''t execute the relevant ISR when it should, then another interrupt will simply be raised by the interrupt controller when the guest does finally EOI the interrupt. Because the device is *still* asserting the line. Well, that''s the PV case anyway. I don''t see any problem with our handling of the PV case. Is PV-HVM so different? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Oct-27 11:42 UTC
Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications
On Thu, 27 Oct 2011, Keir Fraser wrote:> On 27/10/2011 11:37, "Stefano Stabellini" <Stefano.Stabellini@eu.citrix.com> > wrote: > > > PV on HVM guests can loose level interrupts coming from emulated > > devices: we are missing code to retry to inject a pirq in the guest if > > it corresponds to a level interrupt and the interrupt has been raised > > while the guest is servicing the first one. > > > > The same thing could also happen with PV guests, including dom0, even > > though it is much more unlikely. In case of PV guests the scenario would > > be the following: > > > > 1) a device raises a level interrupt and xen injects it into the > > guest; > > > > 2) the guest is temporarely stuck: it does not ack it or eoi it; > > > > 3) the xen timer kicks in and eois the interrupt; > > > > 4) the device thinks it is all fine and sends a second interrupt; > > > > 5) Xen fails to inject the second interrupt into the guest because the > > guest has still the event channel pending bit set; > > > > at this point the guest looses the second interrupt notification, that > > is not supposed to happen with level interrupts and it might cause > > problems with some devices. > > You can''t really lose a level-triggered interrupt. In step (4) the device > isn''t really actively involved in sending another interrupt -- it never > deasserted its INTx line, and nor will it until the guest''s ISR quenches the > interrupt at the device. If the guest misses such an interrupt, and doesn''t > execute the relevant ISR when it should, then another interrupt will simply > be raised by the interrupt controller when the guest does finally EOI the > interrupt. Because the device is *still* asserting the line. > > Well, that''s the PV case anyway. I don''t see any problem with our handling > of the PV case.OK, you convinced me.> Is PV-HVM so different?Yes, it is. In the PV on HVM case we need to reassert an emulated interrupt if the guest EOIs it without quenching the interrupt in the ISR. We are doing it already in the emulated code path but we are not doing it for interrupts that have been remapped into pirqs. That said, if we don''t care about the PV case we can simplify the patch, I am appending a new one that only takes care of the PV on HVM case. --- xen: re-inject emulated level pirqs in PV on HVM guests if still asserted PV on HVM guests can loose level interrupts coming from emulated devices if they have been remapped onto event channels. The reason is that we are missing the code to inject a pirq again in the guest when the guest EOIs it, if it corresponds to an emulated level interrupt and the interrupt is still asserted. Fix this issue and also return error when the guest tries to get the irq_status of a non-existing pirq. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> diff -r c681dd5aecf3 xen/arch/x86/physdev.c --- a/xen/arch/x86/physdev.c Tue Oct 25 19:22:09 2011 +0100 +++ b/xen/arch/x86/physdev.c Thu Oct 27 11:30:55 2011 +0000 @@ -11,6 +11,7 @@ #include <asm/current.h> #include <asm/io_apic.h> #include <asm/msi.h> +#include <asm/hvm/irq.h> #include <asm/hypercall.h> #include <public/xen.h> #include <public/physdev.h> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H if ( !is_hvm_domain(v->domain) || domain_pirq_to_irq(v->domain, eoi.irq) > 0 ) pirq_guest_eoi(pirq); + if ( is_hvm_domain(v->domain) && + domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 ) + { + struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq; + int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq); + + /* if this is a level irq and count > 0, send another + * notification */ + if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */ + && hvm_irq->gsi_assert_count[gsi] ) + send_guest_pirq(v->domain, pirq); + } spin_unlock(&v->domain->event_lock); ret = 0; break; @@ -328,9 +341,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H break; irq_status_query.flags = 0; if ( is_hvm_domain(v->domain) && - domain_pirq_to_irq(v->domain, irq) <= 0 ) + domain_pirq_to_irq(v->domain, irq) <= 0 && + domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND ) { - ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0; + ret = -EINVAL; break; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Oct-27 12:17 UTC
Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications
On 27/10/2011 12:42, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:>> Is PV-HVM so different? > > Yes, it is. In the PV on HVM case we need to reassert an emulated > interrupt if the guest EOIs it without quenching the interrupt in the > ISR. > We are doing it already in the emulated code path but we are not doing > it for interrupts that have been remapped into pirqs. > > That said, if we don''t care about the PV case we can simplify the patch, > I am appending a new one that only takes care of the PV on HVM case.Ah yes, when we are *emulating* an INTx line, either for an emulated device or because we are doing MSI-INTx emulation, we do have to remember to reassert. That makes sense. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel