thr3ads.net - Xen devel - [Xen-devel] Still struggling with HVM: tx timeouts on emulated nics [Sep 2011]

If this information is useful, please help other people find it:
Share via:

Stefan Bader

2011-Sep-21 13:03 UTC

[Xen-devel] Still struggling with HVM: tx timeouts on emulated nics

This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using the
default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up and
gets configured via dhcp. And initial pings also get routed and done correctly.
But slightly higher traffic (like checking for updates) hangs. And after a while
there are messages about tx timeouts.
The ne2k_pci type nic almost immediately has those issues and never comes up
correctly.

I am attaching the dmesg of the guest with apic=debug enabled. I am not sure how
this should be but both nics get configured with level,low IRQs. Disk emulation
seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem to be
at least not level.

Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC?

Another problem came up recently though that may just be me doing the wrong
thing. Normally I boot with xen_emul_unplug=unnecessary as I want the emulated
devices. xen-blkfront is a module in my case and I thought I once had been able
to use that by removing the unplug arg and making the blkfront driver load. But
when I recently tried the module loaded but no disks appeared... Again, not sure
I just forgot how to do that right or that was different when using a 4.1.0
hypervisor still...

-Stefan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Sep-21 13:31 UTC

head link

[Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On Wed, 21 Sep 2011, Stefan Bader wrote:> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried using
the
> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes up
and
> gets configured via dhcp. And initial pings also get routed and done
correctly.
> But slightly higher traffic (like checking for updates) hangs. And after a
while
> there are messages about tx timeouts.
> The ne2k_pci type nic almost immediately has those issues and never comes
up
> correctly.
> 
> I am attaching the dmesg of the guest with apic=debug enabled. I am not
sure how
> this should be but both nics get configured with level,low IRQs. Disk
emulation
> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs seem
to be
> at least not level.
Does the e1000 emulated card work correctly?
What happens if you disable interrupt remapping (see patch below)?


diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 1017c7b..472a58b 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -354,8 +354,7 @@ int __init pci_xen_init(void)
 
 int __init pci_xen_hvm_init(void)
 {
-	if (!xen_feature(XENFEAT_hvm_pirqs))
-		return 0;
+	return 0;
 
 #ifdef CONFIG_ACPI
 	/*

> Btw, what exactly is the difference between xen-pirq-ioapic and IO-APIC?
xen-pirq-ioapic interrupts are interrupts that have been remapped onto
event channels

> Another problem came up recently though that may just be me doing the wrong
> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the
emulated
> devices. xen-blkfront is a module in my case and I thought I once had been
able
> to use that by removing the unplug arg and making the blkfront driver load.
But
> when I recently tried the module loaded but no disks appeared... Again, not
sure
> I just forgot how to do that right or that was different when using a 4.1.0
> hypervisor still... 
xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
older hypervisors that didn''t support the unplug protocol and had other
ways to cope with multiple drivers accessing the same devices.
You can use xen_emul_unplug=never to prevent any unplug but you won''t
get any PV interfaces.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan Bader

2011-Sep-21 15:34 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 21.09.2011 15:31, Stefano Stabellini wrote:> On Wed, 21 Sep 2011, Stefan Bader wrote:
>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I tried
using the
>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least comes
up and
>> gets configured via dhcp. And initial pings also get routed and done
correctly.
>> But slightly higher traffic (like checking for updates) hangs. And
after a while
>> there are messages about tx timeouts.
>> The ne2k_pci type nic almost immediately has those issues and never
comes up
>> correctly.
>>
>> I am attaching the dmesg of the guest with apic=debug enabled. I am not
sure how
>> this should be but both nics get configured with level,low IRQs. Disk
emulation
>> seems to be ok but that seem to use IO-APIC-edge. And any other IRQs
seem to be
>> at least not level.
> 
> Does the e1000 emulated card work correctly?
Yes, that one seems to work ok.
> What happens if you disable interrupt remapping (see patch below)?
8139cp seems to work correctly now (much higher irq stats as well) and e1000
still works. Both then using IOAPIC-fasteoi.
> 
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 1017c7b..472a58b 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -354,8 +354,7 @@ int __init pci_xen_init(void)
>  
>  int __init pci_xen_hvm_init(void)
>  {
> -	if (!xen_feature(XENFEAT_hvm_pirqs))
> -		return 0;
> +	return 0;
>  
>  #ifdef CONFIG_ACPI
>  	/*
> 
> 
>> Btw, what exactly is the difference between xen-pirq-ioapic and
IO-APIC?
> 
> xen-pirq-ioapic interrupts are interrupts that have been remapped onto
> event channels
Ah ok, thanks.
> 
> 
>> Another problem came up recently though that may just be me doing the
wrong
>> thing. Normally I boot with xen_emul_unplug=unnecessary as I want the
emulated
>> devices. xen-blkfront is a module in my case and I thought I once had
been able
>> to use that by removing the unplug arg and making the blkfront driver
load. But
>> when I recently tried the module loaded but no disks appeared... Again,
not sure
>> I just forgot how to do that right or that was different when using a
4.1.0
>> hypervisor still...
>  
> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
> older hypervisors that didn''t support the unplug protocol and had
other
> ways to cope with multiple drivers accessing the same devices.
> You can use xen_emul_unplug=never to prevent any unplug but you
won''t
> get any PV interfaces.
Hm, odd. Somehow I thought that I had been using pv interfaces that way when the
interrupts for the emulated ide was broken.
A bit suboptimal atm, because without any option and a kernel compiled with the
platform pci and pv drivers (as modules) booting in HVM mode the kernel decides
that having both is no use and unplugs the emulated devices. Which then leaves
you with ... none.

-Stefan> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Sep-22 10:30 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On Wed, 21 Sep 2011, Stefan Bader wrote:> On 21.09.2011 15:31, Stefano Stabellini wrote:
> > On Wed, 21 Sep 2011, Stefan Bader wrote:
> >> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I
tried using the
> >> default 8139cp and ne2k_pci emulated nic. The 8139cp one at least
comes up and
> >> gets configured via dhcp. And initial pings also get routed and
done correctly.
> >> But slightly higher traffic (like checking for updates) hangs. And
after a while
> >> there are messages about tx timeouts.
> >> The ne2k_pci type nic almost immediately has those issues and
never comes up
> >> correctly.
> >>
> >> I am attaching the dmesg of the guest with apic=debug enabled. I
am not sure how
> >> this should be but both nics get configured with level,low IRQs.
Disk emulation
> >> seems to be ok but that seem to use IO-APIC-edge. And any other
IRQs seem to be
> >> at least not level.
> > 
> 
> > Does the e1000 emulated card work correctly?
> 
> Yes, that one seems to work ok.
> 
> > What happens if you disable interrupt remapping (see patch below)?
> 
> 8139cp seems to work correctly now (much higher irq stats as well) and
e1000
> still works. Both then using IOAPIC-fasteoi.
> 
That means there must be another subtle bug in Xen in interrupt
remapping that only affects 8139p emulation

> >> Another problem came up recently though that may just be me doing
the wrong
> >> thing. Normally I boot with xen_emul_unplug=unnecessary as I want
the emulated
> >> devices. xen-blkfront is a module in my case and I thought I once
had been able
> >> to use that by removing the unplug arg and making the blkfront
driver load. But
> >> when I recently tried the module loaded but no disks appeared...
Again, not sure
> >> I just forgot how to do that right or that was different when
using a 4.1.0
> >> hypervisor still...
> >  
> > xen_emul_unplug=unnecessary allows the kernel to use PV interfaces on
> > older hypervisors that didn''t support the unplug protocol and
had other
> > ways to cope with multiple drivers accessing the same devices.
> > You can use xen_emul_unplug=never to prevent any unplug but you
won''t
> > get any PV interfaces.
> 
> Hm, odd. Somehow I thought that I had been using pv interfaces that way
when the
> interrupts for the emulated ide was broken.
> A bit suboptimal atm, because without any option and a kernel compiled with
the
> platform pci and pv drivers (as modules) booting in HVM mode the kernel
decides
> that having both is no use and unplugs the emulated devices. Which then
leaves
> you with ... none.
In theory you would have the PV frontend modules in the initrd.
On the other hand having both can easily cause data corruptions on your
drive.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan Bader

2011-Sep-22 11:58 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 22.09.2011 12:30, Stefano Stabellini wrote:> On Wed, 21 Sep 2011, Stefan Bader wrote:
>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor. I
tried using the
>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at
least comes up and
>>>> gets configured via dhcp. And initial pings also get routed and
done correctly.
>>>> But slightly higher traffic (like checking for updates) hangs.
And after a while
>>>> there are messages about tx timeouts.
>>>> The ne2k_pci type nic almost immediately has those issues and
never comes up
>>>> correctly.
>>>>
>>>> I am attaching the dmesg of the guest with apic=debug enabled.
I am not sure how
>>>> this should be but both nics get configured with level,low
IRQs. Disk emulation
>>>> seems to be ok but that seem to use IO-APIC-edge. And any other
IRQs seem to be
>>>> at least not level.
>>>
>>
>>> Does the e1000 emulated card work correctly?
>>
>> Yes, that one seems to work ok.
>>
>>> What happens if you disable interrupt remapping (see patch below)?
>>
>> 8139cp seems to work correctly now (much higher irq stats as well) and
e1000
>> still works. Both then using IOAPIC-fasteoi.
>>
> 
> That means there must be another subtle bug in Xen in interrupt
> remapping that only affects 8139p emulation
> Right, or to be complete:
- e1000: ok
- 8139cp: unstable (setup is possible)
- ne2k_pci: not working (tx problems from the beginning)

The behaviour feels a bit like interrupts may get lost if occurring at a higher
rate. Why this affects various drivers differently is a bit
weird.> 
>>>> Another problem came up recently though that may just be me
doing the wrong
>>>> thing. Normally I boot with xen_emul_unplug=unnecessary as I
want the emulated
>>>> devices. xen-blkfront is a module in my case and I thought I
once had been able
>>>> to use that by removing the unplug arg and making the blkfront
driver load. But
>>>> when I recently tried the module loaded but no disks
appeared... Again, not sure
>>>> I just forgot how to do that right or that was different when
using a 4.1.0
>>>> hypervisor still...
>>>  
>>> xen_emul_unplug=unnecessary allows the kernel to use PV interfaces
on
>>> older hypervisors that didn''t support the unplug protocol
and had other
>>> ways to cope with multiple drivers accessing the same devices.
>>> You can use xen_emul_unplug=never to prevent any unplug but you
won''t
>>> get any PV interfaces.
>>
>> Hm, odd. Somehow I thought that I had been using pv interfaces that way
when the
>> interrupts for the emulated ide was broken.
>> A bit suboptimal atm, because without any option and a kernel compiled
with the
>> platform pci and pv drivers (as modules) booting in HVM mode the kernel
decides
>> that having both is no use and unplugs the emulated devices. Which then
leaves
>> you with ... none.
> 
> In theory you would have the PV frontend modules in the initrd.
> On the other hand having both can easily cause data corruptions on your
> drive.
They _are_ in the initrd. And the boot rightfully drops to a maintenance shell
right now (without any argument and the emulated devices unplugged). And
"modprobe xen-blkfront" loads the module but it does _not_ detect any
pv device.

-Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan Bader

2011-Sep-22 14:32 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 22.09.2011 13:58, Stefan Bader wrote:> On 22.09.2011 12:30, Stefano Stabellini wrote:
>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1 hypervisor.
I tried using the
>>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp one at
least comes up and
>>>>> gets configured via dhcp. And initial pings also get routed
and done correctly.
>>>>> But slightly higher traffic (like checking for updates)
hangs. And after a while
>>>>> there are messages about tx timeouts.
>>>>> The ne2k_pci type nic almost immediately has those issues
and never comes up
>>>>> correctly.
>>>>>
>>>>> I am attaching the dmesg of the guest with apic=debug
enabled. I am not sure how
>>>>> this should be but both nics get configured with level,low
IRQs. Disk emulation
>>>>> seems to be ok but that seem to use IO-APIC-edge. And any
other IRQs seem to be
>>>>> at least not level.
>>>>
>>>
>>>> Does the e1000 emulated card work correctly?
>>>
>>> Yes, that one seems to work ok.
>>>
>>>> What happens if you disable interrupt remapping (see patch
below)?
>>>
>>> 8139cp seems to work correctly now (much higher irq stats as well)
and e1000
>>> still works. Both then using IOAPIC-fasteoi.
>>>
>>
>> That means there must be another subtle bug in Xen in interrupt
>> remapping that only affects 8139p emulation
>>
> Right, or to be complete:
> - e1000: ok
> - 8139cp: unstable (setup is possible)
> - ne2k_pci: not working (tx problems from the beginning)
> 
> The behaviour feels a bit like interrupts may get lost if occurring at a
higher
> rate. Why this affects various drivers differently is a bit weird.
>>
This is mainly speculating... Quite a while back there was this patch to events:

commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Fri Aug 20 19:10:01 2010 -0700

    xen: handle events as edge-triggered

The commit message stated that Xen events are logically edge triggered. So PV
events were changed to be handled as edge interrupts. Would that not mean that
for xen-pirq-apic being using events this would apply the same and those should
be apic-edge instead of level?
>>>>> Another problem came up recently though that may just be me
doing the wrong
>>>>> thing. Normally I boot with xen_emul_unplug=unnecessary as
I want the emulated
>>>>> devices. xen-blkfront is a module in my case and I thought
I once had been able
>>>>> to use that by removing the unplug arg and making the
blkfront driver load. But
>>>>> when I recently tried the module loaded but no disks
appeared... Again, not sure
>>>>> I just forgot how to do that right or that was different
when using a 4.1.0
>>>>> hypervisor still...
>>>>  
>>>> xen_emul_unplug=unnecessary allows the kernel to use PV
interfaces on
>>>> older hypervisors that didn''t support the unplug
protocol and had other
>>>> ways to cope with multiple drivers accessing the same devices.
>>>> You can use xen_emul_unplug=never to prevent any unplug but you
won''t
>>>> get any PV interfaces.
>>>
>>> Hm, odd. Somehow I thought that I had been using pv interfaces that
way when the
>>> interrupts for the emulated ide was broken.
>>> A bit suboptimal atm, because without any option and a kernel
compiled with the
>>> platform pci and pv drivers (as modules) booting in HVM mode the
kernel decides
>>> that having both is no use and unplugs the emulated devices. Which
then leaves
>>> you with ... none.
>>
>> In theory you would have the PV frontend modules in the initrd.
>> On the other hand having both can easily cause data corruptions on your
>> drive.
> 
> They _are_ in the initrd. And the boot rightfully drops to a maintenance
shell
> right now (without any argument and the emulated devices unplugged). And
> "modprobe xen-blkfront" loads the module but it does _not_ detect
any pv device.
> 
> -Stefan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Sep-22 17:44 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On Thu, 22 Sep 2011, Stefan Bader wrote:> On 22.09.2011 13:58, Stefan Bader wrote:
> > On 22.09.2011 12:30, Stefano Stabellini wrote:
> >> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>> On 21.09.2011 15:31, Stefano Stabellini wrote:
> >>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>> This is on 3.0.4 based dom0 and domU with 4.1.1
hypervisor. I tried using the
> >>>>> default 8139cp and ne2k_pci emulated nic. The 8139cp
one at least comes up and
> >>>>> gets configured via dhcp. And initial pings also get
routed and done correctly.
> >>>>> But slightly higher traffic (like checking for
updates) hangs. And after a while
> >>>>> there are messages about tx timeouts.
> >>>>> The ne2k_pci type nic almost immediately has those
issues and never comes up
> >>>>> correctly.
> >>>>>
> >>>>> I am attaching the dmesg of the guest with apic=debug
enabled. I am not sure how
> >>>>> this should be but both nics get configured with
level,low IRQs. Disk emulation
> >>>>> seems to be ok but that seem to use IO-APIC-edge. And
any other IRQs seem to be
> >>>>> at least not level.
> >>>>
> >>>
> >>>> Does the e1000 emulated card work correctly?
> >>>
> >>> Yes, that one seems to work ok.
> >>>
> >>>> What happens if you disable interrupt remapping (see patch
below)?
> >>>
> >>> 8139cp seems to work correctly now (much higher irq stats as
well) and e1000
> >>> still works. Both then using IOAPIC-fasteoi.
> >>>
> >>
> >> That means there must be another subtle bug in Xen in interrupt
> >> remapping that only affects 8139p emulation
> >>
> > Right, or to be complete:
> > - e1000: ok
> > - 8139cp: unstable (setup is possible)
> > - ne2k_pci: not working (tx problems from the beginning)
> > 
> > The behaviour feels a bit like interrupts may get lost if occurring at
a higher
> > rate. Why this affects various drivers differently is a bit weird.
> >>
> 
> This is mainly speculating... Quite a while back there was this patch to
events:
> 
> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date:   Fri Aug 20 19:10:01 2010 -0700
> 
>     xen: handle events as edge-triggered
> 
> The commit message stated that Xen events are logically edge triggered. So
PV
> events were changed to be handled as edge interrupts. Would that not mean
that
> for xen-pirq-apic being using events this would apply the same and those
should
> be apic-edge instead of level?
That commit is referring to the internal way Linux handles these event,
that look like normal interrupt to the Linux irq subsystem. It is not
related to the way actual events are delivered from Xen to Linux, so it
shouldn''t matter here.

I would add lots of printk''s in:

xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
xen/arch/x86/hvm/irq.c:assert_irq
xen/arch/x86/hvm/irq.c:assert_gsi

to find out why xen is not injecting those interrupts

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan Bader

2011-Sep-30 09:13 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 22.09.2011 19:44, Stefano Stabellini wrote:> On Thu, 22 Sep 2011, Stefan Bader wrote:
>> On 22.09.2011 13:58, Stefan Bader wrote:
>>> On 22.09.2011 12:30, Stefano Stabellini wrote:
>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
>>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
>>>>>>> This is on 3.0.4 based dom0 and domU with 4.1.1
hypervisor. I tried using the
>>>>>>> default 8139cp and ne2k_pci emulated nic. The
8139cp one at least comes up and
>>>>>>> gets configured via dhcp. And initial pings also
get routed and done correctly.
>>>>>>> But slightly higher traffic (like checking for
updates) hangs. And after a while
>>>>>>> there are messages about tx timeouts.
>>>>>>> The ne2k_pci type nic almost immediately has those
issues and never comes up
>>>>>>> correctly.
>>>>>>>
>>>>>>> I am attaching the dmesg of the guest with
apic=debug enabled. I am not sure how
>>>>>>> this should be but both nics get configured with
level,low IRQs. Disk emulation
>>>>>>> seems to be ok but that seem to use IO-APIC-edge.
And any other IRQs seem to be
>>>>>>> at least not level.
>>>>>>
>>>>>
>>>>>> Does the e1000 emulated card work correctly?
>>>>>
>>>>> Yes, that one seems to work ok.
>>>>>
>>>>>> What happens if you disable interrupt remapping (see
patch below)?
>>>>>
>>>>> 8139cp seems to work correctly now (much higher irq stats
as well) and e1000
>>>>> still works. Both then using IOAPIC-fasteoi.
>>>>>
>>>>
>>>> That means there must be another subtle bug in Xen in interrupt
>>>> remapping that only affects 8139p emulation
>>>>
>>> Right, or to be complete:
>>> - e1000: ok
>>> - 8139cp: unstable (setup is possible)
>>> - ne2k_pci: not working (tx problems from the beginning)
>>>
>>> The behaviour feels a bit like interrupts may get lost if occurring
at a higher
>>> rate. Why this affects various drivers differently is a bit weird.
>>>>
>>
>> This is mainly speculating... Quite a while back there was this patch
to events:
>>
>> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
>> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> Date:   Fri Aug 20 19:10:01 2010 -0700
>>
>>     xen: handle events as edge-triggered
>>
>> The commit message stated that Xen events are logically edge triggered.
So PV
>> events were changed to be handled as edge interrupts. Would that not
mean that
>> for xen-pirq-apic being using events this would apply the same and
those should
>> be apic-edge instead of level?
> 
> That commit is referring to the internal way Linux handles these event,
> that look like normal interrupt to the Linux irq subsystem. It is not
> related to the way actual events are delivered from Xen to Linux, so it
> shouldn''t matter here.
> 
> I would add lots of printk''s in:
> 
> xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
> xen/arch/x86/hvm/irq.c:assert_irq
> xen/arch/x86/hvm/irq.c:assert_gsi
> 
> to find out why xen is not injecting those interrupts
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
It took quite a bit of time but at least I got some hopefully useful information
now. So in general, whenever an interrupt is asserted,
the hypervisor runs through this:

__hvm_pci_intx_assert:
  when assert count was 0 before incrementing
    call assert_gsi
      call send_guest_pirq (when hvm uses pirq)

In the send_guest_pirq chain is a call to evtchn_set_pending which tests as one
of the first actions whether evtchn_pending in the shared_info is set. If that
is the case the call immediately returns with 1.

Adding printks to call_assert_gsi, I noticed that
- When things stop working, the last call to send_guest_pirq returned 1.
- But not every time the return code is one, the stall happens.
- e1000 also has cases where send_guest_pirq returns 1 but they happen much
  less often (than using the 8139cp).

Usually every intx_assert has a intx_deassert call that follows. when the stall
occurs, this does not happen. Right here I got some troubles to understand where
this intx_deassert is actually triggered. With an added WARN_ON the stack traces
seem odd, like this:

(XEN)    [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
(XEN)    [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
(XEN)    [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
(XEN)    [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
(XEN)    [<ffff82c48017dba6>] copy_from_user+0x26/0x90
(XEN)    [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
(XEN)    [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d

Not really sure how one gets from do_update_descriptor to do_hvm_op and the only
thing in there which does the deassert is some irq level setting.

Actually the guest does not really do much do EOI (which I had been assuming).
But since domain_pirq_to_irq maps to 0 for emuirqs, the call to
PHYSDEVOP_irq_status_query will hit the following and not set the flag for
needing EOI.

        irq_status_query.flags = 0;
        if ( is_hvm_domain(v->domain) &&
             domain_pirq_to_irq(v->domain, irq) <= 0 )
        {
            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
            break;
        }

So all the guest is doing is to clear evtchn_pending in the pirq EOI function. I
fail to understand what actually is doing the hvm_pci_intx_deassert calls but
the way the fasteoi code in the guest looks to be working, there seems to be
some gap between calling the handler and the eoi function... So from what I see,
I would assume the following:

dom0                                     domU
- intx_assert (count 0->1)
- send_guest_pirq = 0
  (evtchn_pending = 1)
                                         - upcall starts fasteoi handler
- something does intx_deassert
  (count 1->0)
- intx_assert (count 0->1)
- send_guest_pirq = 1
  (evtchn_pending still set)
                                         - handler->eoi sets evtchn to 0 but
                                           otherwise does nothing
- there is no intx_deassert, so even
  when another intx_assert would happen
  (which does not seem to be the case)
  no further send_guest_pirq would be
  called.

Unfortunately I do miss some details on the inner working here. Generally I
wonder whether not setting the needsEOI flag for those pirqs just is the
problem. But it also could be intentional...

-Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Sep-30 14:09 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On Fri, 30 Sep 2011, Stefan Bader wrote:> On 22.09.2011 19:44, Stefano Stabellini wrote:
> > On Thu, 22 Sep 2011, Stefan Bader wrote:
> >> On 22.09.2011 13:58, Stefan Bader wrote:
> >>> On 22.09.2011 12:30, Stefano Stabellini wrote:
> >>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>> On 21.09.2011 15:31, Stefano Stabellini wrote:
> >>>>>> On Wed, 21 Sep 2011, Stefan Bader wrote:
> >>>>>>> This is on 3.0.4 based dom0 and domU with
4.1.1 hypervisor. I tried using the
> >>>>>>> default 8139cp and ne2k_pci emulated nic. The
8139cp one at least comes up and
> >>>>>>> gets configured via dhcp. And initial pings
also get routed and done correctly.
> >>>>>>> But slightly higher traffic (like checking for
updates) hangs. And after a while
> >>>>>>> there are messages about tx timeouts.
> >>>>>>> The ne2k_pci type nic almost immediately has
those issues and never comes up
> >>>>>>> correctly.
> >>>>>>>
> >>>>>>> I am attaching the dmesg of the guest with
apic=debug enabled. I am not sure how
> >>>>>>> this should be but both nics get configured
with level,low IRQs. Disk emulation
> >>>>>>> seems to be ok but that seem to use
IO-APIC-edge. And any other IRQs seem to be
> >>>>>>> at least not level.
> >>>>>>
> >>>>>
> >>>>>> Does the e1000 emulated card work correctly?
> >>>>>
> >>>>> Yes, that one seems to work ok.
> >>>>>
> >>>>>> What happens if you disable interrupt remapping
(see patch below)?
> >>>>>
> >>>>> 8139cp seems to work correctly now (much higher irq
stats as well) and e1000
> >>>>> still works. Both then using IOAPIC-fasteoi.
> >>>>>
> >>>>
> >>>> That means there must be another subtle bug in Xen in
interrupt
> >>>> remapping that only affects 8139p emulation
> >>>>
> >>> Right, or to be complete:
> >>> - e1000: ok
> >>> - 8139cp: unstable (setup is possible)
> >>> - ne2k_pci: not working (tx problems from the beginning)
> >>>
> >>> The behaviour feels a bit like interrupts may get lost if
occurring at a higher
> >>> rate. Why this affects various drivers differently is a bit
weird.
> >>>>
> >>
> >> This is mainly speculating... Quite a while back there was this
patch to events:
> >>
> >> commit dffe2e1e1a1ddb566a76266136c312801c66dcf7
> >> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> >> Date:   Fri Aug 20 19:10:01 2010 -0700
> >>
> >>     xen: handle events as edge-triggered
> >>
> >> The commit message stated that Xen events are logically edge
triggered. So PV
> >> events were changed to be handled as edge interrupts. Would that
not mean that
> >> for xen-pirq-apic being using events this would apply the same and
those should
> >> be apic-edge instead of level?
> > 
> > That commit is referring to the internal way Linux handles these
event,
> > that look like normal interrupt to the Linux irq subsystem. It is not
> > related to the way actual events are delivered from Xen to Linux, so
it
> > shouldn''t matter here.
> > 
> > I would add lots of printk''s in:
> > 
> > xen/arch/x86/hvm/irq.c:__hvm_pci_intx_assert
> > xen/arch/x86/hvm/irq.c:assert_irq
> > xen/arch/x86/hvm/irq.c:assert_gsi
> > 
> > to find out why xen is not injecting those interrupts
> > 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
> 
> It took quite a bit of time but at least I got some hopefully useful
information
> now. So in general, whenever an interrupt is asserted,
> the hypervisor runs through this:
> 
> __hvm_pci_intx_assert:
>   when assert count was 0 before incrementing
>     call assert_gsi
>       call send_guest_pirq (when hvm uses pirq)
> 
> In the send_guest_pirq chain is a call to evtchn_set_pending which tests as
one
> of the first actions whether evtchn_pending in the shared_info is set. If
that
> is the case the call immediately returns with 1.
> 
> Adding printks to call_assert_gsi, I noticed that
> - When things stop working, the last call to send_guest_pirq returned 1.
> - But not every time the return code is one, the stall happens.
> - e1000 also has cases where send_guest_pirq returns 1 but they happen much
>   less often (than using the 8139cp).
> 
> Usually every intx_assert has a intx_deassert call that follows. when the
stall
> occurs, this does not happen. Right here I got some troubles to understand
where
> this intx_deassert is actually triggered. With an added WARN_ON the stack
traces
> seem odd, like this:
> 
> (XEN)    [<ffff82c4801abd9c>] __hvm_pci_intx_deassert+0x6c/0x130
> (XEN)    [<ffff82c4801ac43e>] hvm_pci_intx_deassert+0x3e/0x60
> (XEN)    [<ffff82c4801a8148>] do_hvm_op+0x3b8/0x1e60
> (XEN)    [<ffff82c480168ea1>] do_update_descriptor+0x171/0x220
> (XEN)    [<ffff82c48017dba6>] copy_from_user+0x26/0x90
> (XEN)    [<ffff82c4801f9446>] do_iret+0xb6/0x1a0
> (XEN)    [<ffff82c4801f4f28>] syscall_enter+0x88/0x8d
> 
> Not really sure how one gets from do_update_descriptor to do_hvm_op and the
only
> thing in there which does the deassert is some irq level setting.
> 
> Actually the guest does not really do much do EOI (which I had been
assuming).
> But since domain_pirq_to_irq maps to 0 for emuirqs, the call to
> PHYSDEVOP_irq_status_query will hit the following and not set the flag for
> needing EOI.
> 
>         irq_status_query.flags = 0;
>         if ( is_hvm_domain(v->domain) &&
>              domain_pirq_to_irq(v->domain, irq) <= 0 )
>         {
>             ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT :
0;
>             break;
>         }
> 
> So all the guest is doing is to clear evtchn_pending in the pirq EOI
function. I
> fail to understand what actually is doing the hvm_pci_intx_deassert calls
but
> the way the fasteoi code in the guest looks to be working, there seems to
be
> some gap between calling the handler and the eoi function... So from what I
see,
> I would assume the following:
> 
> dom0                                     domU
> - intx_assert (count 0->1)
> - send_guest_pirq = 0
>   (evtchn_pending = 1)
>                                          - upcall starts fasteoi handler
> - something does intx_deassert
>   (count 1->0)
> - intx_assert (count 0->1)
> - send_guest_pirq = 1
>   (evtchn_pending still set)
>                                          - handler->eoi sets evtchn to 0
but
>                                            otherwise does nothing
> - there is no intx_deassert, so even
>   when another intx_assert would happen
>   (which does not seem to be the case)
>   no further send_guest_pirq would be
>   called.
> 
> Unfortunately I do miss some details on the inner working here. Generally I
> wonder whether not setting the needsEOI flag for those pirqs just is the
> problem. But it also could be intentional... 
Thanks for the very detailed analysis.
It seems to me that the problem is that if the interrupt is a level
triggered interrupt when the guest issues an EOI we should be
reinjecting the interrupt again if it has been issued a second time in
the meantime. However this doesn''t happen if the interrupt has been
remapped onto an even channel. In that case the guest is not even going
to issue an EOI at all.
So I wrote a patch to force the guest to issue EOIs even on remapped
irqs; in the hypercall handler we check whether we need to reinject the
interrupt and if that is the case we set the corresponding event channel
pending.
Could you please try the patch I appended? I haven''t been able to
reproduce
your problem so I am not really sure if it works.



diff -r e042fb60e0ee xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Thu Sep 29 11:23:01 2011 +0000
+++ b/xen/arch/x86/physdev.c	Fri Sep 30 14:01:46 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( is_hvm_domain(v->domain) &&
+                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+        {
+            struct hvm_irq *hvm_irq =
&v->domain->arch.hvm_domain.irq;
+            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+            /* if this is a level irq and count > 0, send another
+             * notification */ 
+            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+                    && hvm_irq->gsi_assert_count[gsi] )
+                send_guest_pirq(v->domain, pirq);
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -327,12 +340,6 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( (irq < 0) || (irq >= v->domain->nr_pirqs) )
             break;
         irq_status_query.flags = 0;
-        if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
-        {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
-            break;
-        }
 
         /*
          * Even edge-triggered or message-based IRQs can need masking from

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan Bader

2011-Sep-30 16:06 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 30.09.2011 16:09, Stefano Stabellini wrote:> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>          if ( !is_hvm_domain(v->domain) ||
>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>              pirq_guest_eoi(pirq);
> +        if ( is_hvm_domain(v->domain) &&
> +                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
> +        {
> +            struct hvm_irq *hvm_irq =
&v->domain->arch.hvm_domain.irq;
> +            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
> +
> +            /* if this is a level irq and count > 0, send another
> +             * notification */ 
> +            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
> +                    && hvm_irq->gsi_assert_count[gsi] )
> +                send_guest_pirq(v->domain, pirq);
> +        }
>          spin_unlock(&v->domain->event_lock);
>          ret = 0;
>          break;
This hunk looks substantially different from my 4.1.1 based code. There is no
spin_lock acquired. Not sure that could be a reason for the different behaviour,
too. I''ll add that spinlock too.

    case PHYSDEVOP_eoi: {
        struct physdev_eoi eoi;
        ret = -EFAULT;
        if ( copy_from_guest(&eoi, arg, 1) != 0 )
            break;
        ret = -EINVAL;
        if ( eoi.irq >= v->domain->nr_pirqs )
            break;
        if ( v->domain->arch.pirq_eoi_map )
            evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
        if ( !is_hvm_domain(v->domain) ||
             domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
            ret = pirq_guest_eoi(v->domain, eoi.irq);
        else
            ret = 0;
        break;
    }

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan Bader

2011-Sep-30 17:59 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 30.09.2011 18:06, Stefan Bader wrote:> On 30.09.2011 16:09, Stefano Stabellini wrote:
>> @@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>>          if ( !is_hvm_domain(v->domain) ||
>>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>>              pirq_guest_eoi(pirq);
>> +        if ( is_hvm_domain(v->domain) &&
>> +                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
>> +        {
>> +            struct hvm_irq *hvm_irq =
&v->domain->arch.hvm_domain.irq;
>> +            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
>> +
>> +            /* if this is a level irq and count > 0, send another
>> +             * notification */ 
>> +            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered
*/
>> +                    && hvm_irq->gsi_assert_count[gsi] )
>> +                send_guest_pirq(v->domain, pirq);
>> +        }
>>          spin_unlock(&v->domain->event_lock);
>>          ret = 0;
>>          break;
> 
> This hunk looks substantially different from my 4.1.1 based code. There is
no
> spin_lock acquired. Not sure that could be a reason for the different
behaviour,
> too. I''ll add that spinlock too.
> 
>     case PHYSDEVOP_eoi: {
>         struct physdev_eoi eoi;
>         ret = -EFAULT;
>         if ( copy_from_guest(&eoi, arg, 1) != 0 )
>             break;
>         ret = -EINVAL;
>         if ( eoi.irq >= v->domain->nr_pirqs )
>             break;
>         if ( v->domain->arch.pirq_eoi_map )
>             evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
>         if ( !is_hvm_domain(v->domain) ||
>              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>             ret = pirq_guest_eoi(v->domain, eoi.irq);
>         else
>             ret = 0;
>         break;
>     }
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
Ok, so I had been modifying that hunk to

        spin_lock(&v->domain->event_lock);
        if ( v->domain->arch.pirq_eoi_map )
            evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
        if ( !is_hvm_domain(v->domain) ||
             domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
            pirq_guest_eoi(v->domain, eoi.irq);
        if ( is_hvm_domain(v->domain) &&
                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
        {
            struct hvm_irq *hvm_irq = &v->domain->arch.hvm_domain.irq;
            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);

            /* if this is a level irq and count > 0, send another
             * notification */
            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
                    && hvm_irq->gsi_assert_count[gsi] ) {
                printk("re-send event for gsi%i\n", gsi);
                send_guest_pirq(v->domain, eoi.irq);
           }
        }
        spin_unlock(&v->domain->event_lock);
        ret = 0;

Also I did not completely remove the section that would return the status
without setting needsEOI. I just changed the if condition to be <0 instead of
<=0 (I knew from the tests that the mapping was always 0 and maybe the <0
check
could be useful for something.

        irq_status_query.flags = 0;
        if ( is_hvm_domain(v->domain) &&
             domain_pirq_to_irq(v->domain, irq) < 0 )
        {
            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
            break;
        }

With that a quick test shows both the re-sends done sometimes and the domU doing
EOIs. And there is no stall apparent. Did the same quick test with the e1000
emulated NIC and that still seems ok. Those were not very thorough tests but at
least I would have observed a stall pretty quick otherwise.

-Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Oct-03 17:24 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On Fri, 30 Sep 2011, Stefan Bader wrote:> Also I did not completely remove the section that would return the status
> without setting needsEOI. I just changed the if condition to be <0
instead of
> <=0 (I knew from the tests that the mapping was always 0 and maybe the
<0 check
> could be useful for something.
> 
>         irq_status_query.flags = 0;
>         if ( is_hvm_domain(v->domain) &&
>              domain_pirq_to_irq(v->domain, irq) < 0 )
>         {
>             ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT :
0;
>             break;
>         }
> 
You need to remove the entire test because we want to receive
notifications in all cases.

> With that a quick test shows both the re-sends done sometimes and the domU
doing
> EOIs. And there is no stall apparent. Did the same quick test with the
e1000
> emulated NIC and that still seems ok. Those were not very thorough tests
but at
> least I would have observed a stall pretty quick otherwise.
I am glad it fixes the problem for you.

I am going to send a different patch upstream for Xen 4.2, because I
would also like it to cover the very unlikely scenario in which a PV
guest (like dom0 or a PV guest with PCI passthrough) is loosing level
interrupts because when Xen tries to set the corresponding event channel
pending the bit is alreay set. The codebase is different enough that
making the same change on 4.1 is non-trivial. I am appending the new
patch to this email, it would be great if you could test it. You just
need a 4.2 hypervisor, not the entire system. You should be able to
perform the test updating only xen.gz.
If you have trouble if xen-unstable.hg tip, try changeset 23843.

---


diff -r bf533533046c xen/arch/x86/hvm/irq.c
--- a/xen/arch/x86/hvm/irq.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/hvm/irq.c	Mon Oct 03 16:54:51 2011 +0000
@@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
 
     if ( hvm_domain_use_pirq(d, pirq) )
     {
-        send_guest_pirq(d, pirq);
+        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+            pirq->lost++;
         return;
     }
     vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
 {
     struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
     unsigned int gsi, link, isa_irq;
+    struct pirq *pirq;
 
     ASSERT((device <= 31) && (intx <= 3));
 
@@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
     gsi = hvm_pci_intx_gsi(device, intx);
     if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
+    else {
+        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
+        if ( hvm_domain_use_pirq(d, pirq) )
+            pirq->lost++;
+    }
 
     link    = hvm_pci_intx_link(device, intx);
     isa_irq = hvm_irq->pci_link.route[link];
diff -r bf533533046c xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/irq.c	Mon Oct 03 16:54:51 2011 +0000
@@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
              !test_and_set_bool(pirq->masked) )
             action->in_flight++;
         if ( !hvm_do_IRQ_dpci(d, pirq) )
-            send_guest_pirq(d, pirq);
+        {
+            if ( send_guest_pirq(d, pirq) &&
+                    action->ack_type == ACKTYPE_EOI )
+                pirq->lost++;
+        }
     }
 
     if ( action->ack_type != ACKTYPE_NONE )
diff -r bf533533046c xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/physdev.c	Mon Oct 03 16:54:51 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( pirq->lost > 0) {
+            if ( !send_guest_pirq(v->domain, pirq) )
+                pirq->lost--;
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }
 
diff -r bf533533046c xen/include/xen/irq.h
--- a/xen/include/xen/irq.h	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/include/xen/irq.h	Mon Oct 03 16:54:51 2011 +0000
@@ -146,6 +146,7 @@ struct pirq {
     int pirq;
     u16 evtchn;
     bool_t masked;
+    u32 lost;
     struct rcu_head rcu_head;
     struct arch_pirq arch;
 };

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Oct-03 18:13 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

CC''ing Jan, that probably is going to have an opinion on this.

Let me add a bit of background: Stefan found out that PV on HVM guests
could loose level interrupts coming from emulated devices. Looking
through the code I realized that we need to add some logic to inject a
pirq in the guest if a level interrupt has been raised while the guest
is servicing the first one.
While this is all very specific to interrupt remapping and emulated
devices, I realized that something similar could happen even with dom0
or other PV guests with PCI passthrough:

1) the device raises a level interrupt and xen injects it into the
guest;

2) the guest is temporarely stuck: it does not ack it or eoi it;

3) the xen timer kicks in and eois the interrupt;

4) the device thinks it is all fine and sends a second interrupt;

5) Xen fails to inject the second interrupt into the guest because the
guest has still the event channel pending bit set;

at this point the guest looses the second interrupt notification, that
is not supposed to happen with level interrupts and I think it might
cause problems with some devices.

Jan, do you think we should try to handle this case, or is it too
unlikely?

In any case we need to handle the PV on HVM remapping bug, that because
of the way interrupts are emulated is much more likely to happen...


On Mon, 3 Oct 2011, Stefano Stabellini wrote:> On Fri, 30 Sep 2011, Stefan Bader wrote:
> > Also I did not completely remove the section that would return the
status
> > without setting needsEOI. I just changed the if condition to be <0
instead of
> > <=0 (I knew from the tests that the mapping was always 0 and maybe
the <0 check
> > could be useful for something.
> > 
> >         irq_status_query.flags = 0;
> >         if ( is_hvm_domain(v->domain) &&
> >              domain_pirq_to_irq(v->domain, irq) < 0 )
> >         {
> >             ret = copy_to_guest(arg, &irq_status_query, 1) ?
-EFAULT : 0;
> >             break;
> >         }
> > 
> 
> You need to remove the entire test because we want to receive
> notifications in all cases.
> 
> 
> > With that a quick test shows both the re-sends done sometimes and the
domU doing
> > EOIs. And there is no stall apparent. Did the same quick test with the
e1000
> > emulated NIC and that still seems ok. Those were not very thorough
tests but at
> > least I would have observed a stall pretty quick otherwise.
> 
> I am glad it fixes the problem for you.
> 
> I am going to send a different patch upstream for Xen 4.2, because I
> would also like it to cover the very unlikely scenario in which a PV
> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> interrupts because when Xen tries to set the corresponding event channel
> pending the bit is alreay set. The codebase is different enough that
> making the same change on 4.1 is non-trivial. I am appending the new
> patch to this email, it would be great if you could test it. You just
> need a 4.2 hypervisor, not the entire system. You should be able to
> perform the test updating only xen.gz.
> If you have trouble if xen-unstable.hg tip, try changeset 23843.
> 
> ---
> 
> 
> diff -r bf533533046c xen/arch/x86/hvm/irq.c
> --- a/xen/arch/x86/hvm/irq.c	Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/hvm/irq.c	Mon Oct 03 16:54:51 2011 +0000
> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>  
>      if ( hvm_domain_use_pirq(d, pirq) )
>      {
> -        send_guest_pirq(d, pirq);
> +        if ( send_guest_pirq(d, pirq) && ioapic_gsi >=
NR_ISAIRQS )
> +            pirq->lost++;
>          return;
>      }
>      vioapic_irq_positive_edge(d, ioapic_gsi);
> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
>  {
>      struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
>      unsigned int gsi, link, isa_irq;
> +    struct pirq *pirq;
>  
>      ASSERT((device <= 31) && (intx <= 3));
>  
> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
>      gsi = hvm_pci_intx_gsi(device, intx);
>      if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
>          assert_gsi(d, gsi);
> +    else {
> +        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
> +        if ( hvm_domain_use_pirq(d, pirq) )
> +            pirq->lost++;
> +    }
>  
>      link    = hvm_pci_intx_link(device, intx);
>      isa_irq = hvm_irq->pci_link.route[link];
> diff -r bf533533046c xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c	Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/irq.c	Mon Oct 03 16:54:51 2011 +0000
> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
>               !test_and_set_bool(pirq->masked) )
>              action->in_flight++;
>          if ( !hvm_do_IRQ_dpci(d, pirq) )
> -            send_guest_pirq(d, pirq);
> +        {
> +            if ( send_guest_pirq(d, pirq) &&
> +                    action->ack_type == ACKTYPE_EOI )
> +                pirq->lost++;
> +        }
>      }
>  
>      if ( action->ack_type != ACKTYPE_NONE )
> diff -r bf533533046c xen/arch/x86/physdev.c
> --- a/xen/arch/x86/physdev.c	Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/arch/x86/physdev.c	Mon Oct 03 16:54:51 2011 +0000
> @@ -11,6 +11,7 @@
>  #include <asm/current.h>
>  #include <asm/io_apic.h>
>  #include <asm/msi.h>
> +#include <asm/hvm/irq.h>
>  #include <asm/hypercall.h>
>  #include <public/xen.h>
>  #include <public/physdev.h>
> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>          if ( !is_hvm_domain(v->domain) ||
>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>              pirq_guest_eoi(pirq);
> +        if ( pirq->lost > 0) {
> +            if ( !send_guest_pirq(v->domain, pirq) )
> +                pirq->lost--;
> +        }
>          spin_unlock(&v->domain->event_lock);
>          ret = 0;
>          break;
> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>              break;
>          irq_status_query.flags = 0;
>          if ( is_hvm_domain(v->domain) &&
> -             domain_pirq_to_irq(v->domain, irq) <= 0 )
> +             domain_pirq_to_irq(v->domain, irq) <= 0 &&
> +             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
>          {
> -            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT :
0;
> +            ret = -EINVAL;
>              break;
>          }
>  
> diff -r bf533533046c xen/include/xen/irq.h
> --- a/xen/include/xen/irq.h	Fri Sep 30 14:12:35 2011 +0000
> +++ b/xen/include/xen/irq.h	Mon Oct 03 16:54:51 2011 +0000
> @@ -146,6 +146,7 @@ struct pirq {
>      int pirq;
>      u16 evtchn;
>      bool_t masked;
> +    u32 lost;
>      struct rcu_head rcu_head;
>      struct arch_pirq arch;
>  };
> 
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Andrew Cooper

2011-Oct-04 10:07 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 03/10/11 19:13, Stefano Stabellini wrote:> CC''ing Jan, that probably is going to have an opinion on this.
>
> Let me add a bit of background: Stefan found out that PV on HVM guests
> could loose level interrupts coming from emulated devices. Looking
> through the code I realized that we need to add some logic to inject a
> pirq in the guest if a level interrupt has been raised while the guest
> is servicing the first one.
> While this is all very specific to interrupt remapping and emulated
> devices, I realized that something similar could happen even with dom0
> or other PV guests with PCI passthrough:
>
> 1) the device raises a level interrupt and xen injects it into the
> guest;
>
> 2) the guest is temporarely stuck: it does not ack it or eoi it;
>
> 3) the xen timer kicks in and eois the interrupt;
>
> 4) the device thinks it is all fine and sends a second interrupt;
>
> 5) Xen fails to inject the second interrupt into the guest because the
> guest has still the event channel pending bit set;
>
> at this point the guest looses the second interrupt notification, that
> is not supposed to happen with level interrupts and I think it might
> cause problems with some devices.
>
> Jan, do you think we should try to handle this case, or is it too
> unlikely?
I am not certain whether this is relevant, but the ICH10 IO-APIC
documentation indicated that early EOI''ing of a line level interrupt
should not have this effect.  Specifically, it states that EOI''ing a
line level interrupt whos line is still asserted will cause the
interrupt to be "re-raised" from the IO-APIC.  It uses this to assert
that it is fine to use multiple IO-APIC entries with the same vector,
with a broadcast of vector number alone to EOI the interrupt.

In this case, while Xen sees two interrupts, from the devices point of
view, only I has happened.

In the case where the device has dropped its line level interrupt of its
own accord, then I would agree that the current Xen behavior is wrong. 
I cant offhand think of a good reason why this would occur.

I know it is not helpful in this case, but as a rule of thumb, line
level interrupts should not be used with Xen.  The average response time
on an unloaded system is ~30ms, ranging from 5 to 150.  (On a sample set
of a Dell R710, Xen 4.1.0 and 2.6.32 dom0, over 2 weeks of debugging
another line level interrupt bug).

~Andrew
> In any case we need to handle the PV on HVM remapping bug, that because
> of the way interrupts are emulated is much more likely to happen...
>
>
> On Mon, 3 Oct 2011, Stefano Stabellini wrote:
>> On Fri, 30 Sep 2011, Stefan Bader wrote:
>>> Also I did not completely remove the section that would return the
status
>>> without setting needsEOI. I just changed the if condition to be
<0 instead of
>>> <=0 (I knew from the tests that the mapping was always 0 and
maybe the <0 check
>>> could be useful for something.
>>>
>>>         irq_status_query.flags = 0;
>>>         if ( is_hvm_domain(v->domain) &&
>>>              domain_pirq_to_irq(v->domain, irq) < 0 )
>>>         {
>>>             ret = copy_to_guest(arg, &irq_status_query, 1) ?
-EFAULT : 0;
>>>             break;
>>>         }
>>>
>> You need to remove the entire test because we want to receive
>> notifications in all cases.
>>
>>
>>> With that a quick test shows both the re-sends done sometimes and
the domU doing
>>> EOIs. And there is no stall apparent. Did the same quick test with
the e1000
>>> emulated NIC and that still seems ok. Those were not very thorough
tests but at
>>> least I would have observed a stall pretty quick otherwise.
>> I am glad it fixes the problem for you.
>>
>> I am going to send a different patch upstream for Xen 4.2, because I
>> would also like it to cover the very unlikely scenario in which a PV
>> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
>> interrupts because when Xen tries to set the corresponding event
channel
>> pending the bit is alreay set. The codebase is different enough that
>> making the same change on 4.1 is non-trivial. I am appending the new
>> patch to this email, it would be great if you could test it. You just
>> need a 4.2 hypervisor, not the entire system. You should be able to
>> perform the test updating only xen.gz.
>> If you have trouble if xen-unstable.hg tip, try changeset 23843.
>>
>> ---
>>
>>
>> diff -r bf533533046c xen/arch/x86/hvm/irq.c
>> --- a/xen/arch/x86/hvm/irq.c	Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/hvm/irq.c	Mon Oct 03 16:54:51 2011 +0000
>> @@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
>>  
>>      if ( hvm_domain_use_pirq(d, pirq) )
>>      {
>> -        send_guest_pirq(d, pirq);
>> +        if ( send_guest_pirq(d, pirq) && ioapic_gsi >=
NR_ISAIRQS )
>> +            pirq->lost++;
>>          return;
>>      }
>>      vioapic_irq_positive_edge(d, ioapic_gsi);
>> @@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
>>  {
>>      struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
>>      unsigned int gsi, link, isa_irq;
>> +    struct pirq *pirq;
>>  
>>      ASSERT((device <= 31) && (intx <= 3));
>>  
>> @@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
>>      gsi = hvm_pci_intx_gsi(device, intx);
>>      if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
>>          assert_gsi(d, gsi);
>> +    else {
>> +        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
>> +        if ( hvm_domain_use_pirq(d, pirq) )
>> +            pirq->lost++;
>> +    }
>>  
>>      link    = hvm_pci_intx_link(device, intx);
>>      isa_irq = hvm_irq->pci_link.route[link];
>> diff -r bf533533046c xen/arch/x86/irq.c
>> --- a/xen/arch/x86/irq.c	Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/irq.c	Mon Oct 03 16:54:51 2011 +0000
>> @@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
>>               !test_and_set_bool(pirq->masked) )
>>              action->in_flight++;
>>          if ( !hvm_do_IRQ_dpci(d, pirq) )
>> -            send_guest_pirq(d, pirq);
>> +        {
>> +            if ( send_guest_pirq(d, pirq) &&
>> +                    action->ack_type == ACKTYPE_EOI )
>> +                pirq->lost++;
>> +        }
>>      }
>>  
>>      if ( action->ack_type != ACKTYPE_NONE )
>> diff -r bf533533046c xen/arch/x86/physdev.c
>> --- a/xen/arch/x86/physdev.c	Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/arch/x86/physdev.c	Mon Oct 03 16:54:51 2011 +0000
>> @@ -11,6 +11,7 @@
>>  #include <asm/current.h>
>>  #include <asm/io_apic.h>
>>  #include <asm/msi.h>
>> +#include <asm/hvm/irq.h>
>>  #include <asm/hypercall.h>
>>  #include <public/xen.h>
>>  #include <public/physdev.h>
>> @@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>>          if ( !is_hvm_domain(v->domain) ||
>>               domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
>>              pirq_guest_eoi(pirq);
>> +        if ( pirq->lost > 0) {
>> +            if ( !send_guest_pirq(v->domain, pirq) )
>> +                pirq->lost--;
>> +        }
>>          spin_unlock(&v->domain->event_lock);
>>          ret = 0;
>>          break;
>> @@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
>>              break;
>>          irq_status_query.flags = 0;
>>          if ( is_hvm_domain(v->domain) &&
>> -             domain_pirq_to_irq(v->domain, irq) <= 0 )
>> +             domain_pirq_to_irq(v->domain, irq) <= 0 &&
>> +             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
>>          {
>> -            ret = copy_to_guest(arg, &irq_status_query, 1) ?
-EFAULT : 0;
>> +            ret = -EINVAL;
>>              break;
>>          }
>>  
>> diff -r bf533533046c xen/include/xen/irq.h
>> --- a/xen/include/xen/irq.h	Fri Sep 30 14:12:35 2011 +0000
>> +++ b/xen/include/xen/irq.h	Mon Oct 03 16:54:51 2011 +0000
>> @@ -146,6 +146,7 @@ struct pirq {
>>      int pirq;
>>      u16 evtchn;
>>      bool_t masked;
>> +    u32 lost;
>>      struct rcu_head rcu_head;
>>      struct arch_pirq arch;
>>  };
>>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Oct-04 14:13 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On Tue, 4 Oct 2011, Andrew Cooper wrote:> On 03/10/11 19:13, Stefano Stabellini wrote:
> > CC''ing Jan, that probably is going to have an opinion on
this.
> >
> > Let me add a bit of background: Stefan found out that PV on HVM guests
> > could loose level interrupts coming from emulated devices. Looking
> > through the code I realized that we need to add some logic to inject a
> > pirq in the guest if a level interrupt has been raised while the guest
> > is servicing the first one.
> > While this is all very specific to interrupt remapping and emulated
> > devices, I realized that something similar could happen even with dom0
> > or other PV guests with PCI passthrough:
> >
> > 1) the device raises a level interrupt and xen injects it into the
> > guest;
> >
> > 2) the guest is temporarely stuck: it does not ack it or eoi it;
> >
> > 3) the xen timer kicks in and eois the interrupt;
> >
> > 4) the device thinks it is all fine and sends a second interrupt;
> >
> > 5) Xen fails to inject the second interrupt into the guest because the
> > guest has still the event channel pending bit set;
> >
> > at this point the guest looses the second interrupt notification, that
> > is not supposed to happen with level interrupts and I think it might
> > cause problems with some devices.
> >
> > Jan, do you think we should try to handle this case, or is it too
> > unlikely?
> 
> I am not certain whether this is relevant, but the ICH10 IO-APIC
> documentation indicated that early EOI''ing of a line level
interrupt
> should not have this effect.  Specifically, it states that EOI''ing
a
> line level interrupt whos line is still asserted will cause the
> interrupt to be "re-raised" from the IO-APIC.  It uses this to
assert
> that it is fine to use multiple IO-APIC entries with the same vector,
> with a broadcast of vector number alone to EOI the interrupt.
> 
> In this case, while Xen sees two interrupts, from the devices point of
> view, only I has happened.
> 
> In the case where the device has dropped its line level interrupt of its
> own accord, then I would agree that the current Xen behavior is wrong. 
> I cant offhand think of a good reason why this would occur.
I think this scenario is actually possible. It is certainly happening
with qemu''s emulated devices.
This patch would take care of re-injecting the interrupts both in the
case of the device deasserting and reasserting the interrupt while the
guest hasn''t cleared the pending bit yet and in case a PV on HVM guest
eois the interrupt too early.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan Bader

2011-Oct-05 16:10 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 03.10.2011 19:24, Stefano Stabellini wrote:> I am going to send a different patch upstream for Xen 4.2, because I
> would also like it to cover the very unlikely scenario in which a PV
> guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> interrupts because when Xen tries to set the corresponding event channel
> pending the bit is alreay set. The codebase is different enough that
> making the same change on 4.1 is non-trivial. I am appending the new
> patch to this email, it would be great if you could test it. You just
> need a 4.2 hypervisor, not the entire system. You should be able to
> perform the test updating only xen.gz.
> If you have trouble if xen-unstable.hg tip, try changeset 23843.
Hi Stefano,

currently I would have the problem that I don''t have too much time to
move to
another hypervisor (tests may or may not be useful there with substantial
changes beside this one) with our next release being close.
But I think I got a usable backport of your change to 4.1.1 (you think it looks
ok?) and have given that a quick test which seems to be ok...
Though one drawback is that I don''t have a setup which would use
passthrough, so
that path is not tested. I think I did see (with a debugging version) that the
lost count was incremented and decremented in dom0, though.

-Stefan

---

Index: xen-4.1.1/xen/arch/x86/domain.c
==================================================================---
xen-4.1.1.orig/xen/arch/x86/domain.c	2011-10-05 15:03:19.405815293 +0200
+++ xen-4.1.1/xen/arch/x86/domain.c	2011-10-05 15:09:59.781816622 +0200
@@ -514,6 +514,12 @@ int arch_domain_create(struct domain *d,
         memset(d->arch.pirq_irq, 0,
                d->nr_pirqs * sizeof(*d->arch.pirq_irq));

+        d->arch.pirq_lost = xmalloc_array(int, d->nr_pirqs);
+        if ( !d->arch.pirq_lost)
+            goto fail;
+        memset(d->arch.pirq_lost, 0,
+               d->nr_pirqs * sizeof(*d->arch.pirq_lost));
+
         d->arch.irq_pirq = xmalloc_array(int, nr_irqs);
         if ( !d->arch.irq_pirq )
             goto fail;
@@ -575,6 +581,7 @@ int arch_domain_create(struct domain *d,
  fail:
     d->is_dying = DOMDYING_dead;
     vmce_destroy_msr(d);
+    xfree(d->arch.pirq_lost);
     xfree(d->arch.pirq_irq);
     xfree(d->arch.irq_pirq);
     xfree(d->arch.pirq_emuirq);
@@ -628,6 +635,7 @@ void arch_domain_destroy(struct domain *
 #endif

     free_xenheap_page(d->shared_info);
+    xfree(d->arch.pirq_lost);
     xfree(d->arch.pirq_irq);
     xfree(d->arch.irq_pirq);
     xfree(d->arch.pirq_emuirq);
Index: xen-4.1.1/xen/arch/x86/hvm/irq.c
==================================================================---
xen-4.1.1.orig/xen/arch/x86/hvm/irq.c	2011-10-05 15:14:35.441815292 +0200
+++ xen-4.1.1/xen/arch/x86/hvm/irq.c	2011-10-05 17:55:43.603986605 +0200
@@ -33,7 +33,9 @@ static void assert_gsi(struct domain *d,
     int pirq = domain_emuirq_to_pirq(d, ioapic_gsi);
     if ( hvm_domain_use_pirq(d, pirq) )
     {
-        send_guest_pirq(d, pirq);
+        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+            if (d->arch.pirq_lost)
+                d->arch.pirq_lost[pirq]++;
         return;
     }
     vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -67,6 +69,12 @@ static void __hvm_pci_intx_assert(
     gsi = hvm_pci_intx_gsi(device, intx);
     if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
+    else {
+        int pirq = domain_emuirq_to_pirq(d, gsi);
+
+        if ( hvm_domain_use_pirq(d, pirq) && d->arch.pirq_lost)
+            d->arch.pirq_lost[pirq]++;
+    }

     link    = hvm_pci_intx_link(device, intx);
     isa_irq = hvm_irq->pci_link.route[link];
Index: xen-4.1.1/xen/arch/x86/irq.c
==================================================================---
xen-4.1.1.orig/xen/arch/x86/irq.c	2011-10-05 15:26:58.477815292 +0200
+++ xen-4.1.1/xen/arch/x86/irq.c	2011-10-05 17:56:23.191986535 +0200
@@ -888,10 +888,13 @@ static void __do_IRQ_guest(int irq)
                 desc->status |= IRQ_INPROGRESS; /* cleared during hvm eoi */
             }
         }
-        else if ( send_guest_pirq(d, pirq) &&
-                  (action->ack_type == ACKTYPE_NONE) )
-        {
-            already_pending++;
+        else {
+            if ( send_guest_pirq(d, pirq) ) {
+	        if ( action->ack_type == ACKTYPE_EOI &&
d->arch.pirq_lost)
+                    d->arch.pirq_lost[pirq]++;
+                else if ( action->ack_type == ACKTYPE_NONE )
+                    already_pending++;
+            }
         }
     }

Index: xen-4.1.1/xen/arch/x86/physdev.c
==================================================================---
xen-4.1.1.orig/xen/arch/x86/physdev.c	2011-10-05 15:36:14.545815292 +0200
+++ xen-4.1.1/xen/arch/x86/physdev.c	2011-10-05 17:57:06.055986460 +0200
@@ -261,13 +261,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         ret = -EINVAL;
         if ( eoi.irq >= v->domain->nr_pirqs )
             break;
+        spin_lock(&v->domain->event_lock);
         if ( v->domain->arch.pirq_eoi_map )
             evtchn_unmask(v->domain->pirq_to_evtchn[eoi.irq]);
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
-            ret = pirq_guest_eoi(v->domain, eoi.irq);
-        else
-            ret = 0;
+            pirq_guest_eoi(v->domain, eoi.irq);
+        if ( v->domain->arch.pirq_lost &&
v->domain->arch.pirq_lost[eoi.irq]) {
+            if ( !send_guest_pirq(v->domain, eoi.irq) )
+                v->domain->arch.pirq_lost[eoi.irq]--;
+        }
+        ret = 0;
+        spin_unlock(&v->domain->event_lock);
         break;
     }

@@ -323,9 +328,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }

Index: xen-4.1.1/xen/include/asm-x86/domain.h
==================================================================---
xen-4.1.1.orig/xen/include/asm-x86/domain.h	2011-10-05 15:10:11.709815293 +0200
+++ xen-4.1.1/xen/include/asm-x86/domain.h	2011-10-05 15:12:46.237815276 +0200
@@ -312,6 +312,9 @@ struct arch_domain
                                 (possibly other cases in the future */
     uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */
     uint64_t vtsc_usercount; /* not used for hvm */
+
+    /* Protected by d->event_lock, count of lost pirqs */
+    int *pirq_lost;
 } __cacheline_aligned;

 #define has_arch_pdevs(d)    (!list_empty(&(d)->arch.pdev_list))


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Oct-06 10:12 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On Wed, 5 Oct 2011, Stefan Bader wrote:> On 03.10.2011 19:24, Stefano Stabellini wrote:
> > I am going to send a different patch upstream for Xen 4.2, because I
> > would also like it to cover the very unlikely scenario in which a PV
> > guest (like dom0 or a PV guest with PCI passthrough) is loosing level
> > interrupts because when Xen tries to set the corresponding event
channel
> > pending the bit is alreay set. The codebase is different enough that
> > making the same change on 4.1 is non-trivial. I am appending the new
> > patch to this email, it would be great if you could test it. You just
> > need a 4.2 hypervisor, not the entire system. You should be able to
> > perform the test updating only xen.gz.
> > If you have trouble if xen-unstable.hg tip, try changeset 23843.
> 
> Hi Stefano,
> 
> currently I would have the problem that I don''t have too much time
to move to
> another hypervisor (tests may or may not be useful there with substantial
> changes beside this one) with our next release being close.
> But I think I got a usable backport of your change to 4.1.1 (you think it
looks
> ok?) and have given that a quick test which seems to be ok...
> Though one drawback is that I don''t have a setup which would use
passthrough, so
> that path is not tested. I think I did see (with a debugging version) that
the
> lost count was incremented and decremented in dom0, though.
> 
Honestly if you have to commit to a backport for your package right now,
I would go for the previous version, because it is simpler and less
likely to introduce regressions.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefan Bader

2011-Oct-06 12:16 UTC

head link

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

On 06.10.2011 12:12, Stefano Stabellini wrote:> On Wed, 5 Oct 2011, Stefan Bader wrote:
>> On 03.10.2011 19:24, Stefano Stabellini wrote:
>>> I am going to send a different patch upstream for Xen 4.2, because
I
>>> would also like it to cover the very unlikely scenario in which a
PV
>>> guest (like dom0 or a PV guest with PCI passthrough) is loosing
level
>>> interrupts because when Xen tries to set the corresponding event
channel
>>> pending the bit is alreay set. The codebase is different enough
that
>>> making the same change on 4.1 is non-trivial. I am appending the
new
>>> patch to this email, it would be great if you could test it. You
just
>>> need a 4.2 hypervisor, not the entire system. You should be able to
>>> perform the test updating only xen.gz.
>>> If you have trouble if xen-unstable.hg tip, try changeset 23843.
>>
>> Hi Stefano,
>>
>> currently I would have the problem that I don''t have too much
time to move to
>> another hypervisor (tests may or may not be useful there with
substantial
>> changes beside this one) with our next release being close.
>> But I think I got a usable backport of your change to 4.1.1 (you think
it looks
>> ok?) and have given that a quick test which seems to be ok...
>> Though one drawback is that I don''t have a setup which would
use passthrough, so
>> that path is not tested. I think I did see (with a debugging version)
that the
>> lost count was incremented and decremented in dom0, though.
>>
> 
> Honestly if you have to commit to a backport for your package right now,
> I would go for the previous version, because it is simpler and less
> likely to introduce regressions.
Agreed. Well at least I hope that since that backport seemed to fix the issue I
saw in 4.1.1 it will give some more confidence for you on the 4.2 version. With
the drawback of the passthrough not being tested.

-Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Oct-27 10:37 UTC

head link

[Xen-devel] [PATCH] xen: do not loose level interrupt notifications

PV on HVM guests can loose level interrupts coming from emulated
devices: we are missing code to retry to inject a pirq in the guest if
it corresponds to a level interrupt and the interrupt has been raised
while the guest is servicing the first one.

The same thing could also happen with PV guests, including dom0, even
though it is much more unlikely. In case of PV guests the scenario would
be the following:

1) a device raises a level interrupt and xen injects it into the
guest;

2) the guest is temporarely stuck: it does not ack it or eoi it;

3) the xen timer kicks in and eois the interrupt;

4) the device thinks it is all fine and sends a second interrupt;

5) Xen fails to inject the second interrupt into the guest because the
guest has still the event channel pending bit set;

at this point the guest looses the second interrupt notification, that
is not supposed to happen with level interrupts and it might cause
problems with some devices.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff -r bf533533046c xen/arch/x86/hvm/irq.c
--- a/xen/arch/x86/hvm/irq.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/hvm/irq.c	Mon Oct 03 16:54:51 2011 +0000
@@ -36,7 +36,8 @@ static void assert_gsi(struct domain *d,
 
     if ( hvm_domain_use_pirq(d, pirq) )
     {
-        send_guest_pirq(d, pirq);
+        if ( send_guest_pirq(d, pirq) && ioapic_gsi >= NR_ISAIRQS )
+            pirq->lost++;
         return;
     }
     vioapic_irq_positive_edge(d, ioapic_gsi);
@@ -63,6 +64,7 @@ static void __hvm_pci_intx_assert(
 {
     struct hvm_irq *hvm_irq = &d->arch.hvm_domain.irq;
     unsigned int gsi, link, isa_irq;
+    struct pirq *pirq;
 
     ASSERT((device <= 31) && (intx <= 3));
 
@@ -72,6 +74,11 @@ static void __hvm_pci_intx_assert(
     gsi = hvm_pci_intx_gsi(device, intx);
     if ( hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
+    else {
+        pirq = pirq_info(d, domain_emuirq_to_pirq(d, gsi));
+        if ( hvm_domain_use_pirq(d, pirq) )
+            pirq->lost++;
+    }
 
     link    = hvm_pci_intx_link(device, intx);
     isa_irq = hvm_irq->pci_link.route[link];
diff -r bf533533046c xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/irq.c	Mon Oct 03 16:54:51 2011 +0000
@@ -965,7 +965,11 @@ static void __do_IRQ_guest(int irq)
              !test_and_set_bool(pirq->masked) )
             action->in_flight++;
         if ( !hvm_do_IRQ_dpci(d, pirq) )
-            send_guest_pirq(d, pirq);
+        {
+            if ( send_guest_pirq(d, pirq) &&
+                    action->ack_type == ACKTYPE_EOI )
+                pirq->lost++;
+        }
     }
 
     if ( action->ack_type != ACKTYPE_NONE )
diff -r bf533533046c xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/arch/x86/physdev.c	Mon Oct 03 16:54:51 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( pirq->lost > 0) {
+            if ( !send_guest_pirq(v->domain, pirq) )
+                pirq->lost--;
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -328,9 +333,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }
 
diff -r bf533533046c xen/include/xen/irq.h
--- a/xen/include/xen/irq.h	Fri Sep 30 14:12:35 2011 +0000
+++ b/xen/include/xen/irq.h	Mon Oct 03 16:54:51 2011 +0000
@@ -146,6 +146,7 @@ struct pirq {
     int pirq;
     u16 evtchn;
     bool_t masked;
+    u32 lost;
     struct rcu_head rcu_head;
     struct arch_pirq arch;
 };

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2011-Oct-27 11:18 UTC

head link

Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications

On 27/10/2011 11:37, "Stefano Stabellini"
<Stefano.Stabellini@eu.citrix.com>
wrote:
> PV on HVM guests can loose level interrupts coming from emulated
> devices: we are missing code to retry to inject a pirq in the guest if
> it corresponds to a level interrupt and the interrupt has been raised
> while the guest is servicing the first one.
> 
> The same thing could also happen with PV guests, including dom0, even
> though it is much more unlikely. In case of PV guests the scenario would
> be the following:
> 
> 1) a device raises a level interrupt and xen injects it into the
> guest;
> 
> 2) the guest is temporarely stuck: it does not ack it or eoi it;
> 
> 3) the xen timer kicks in and eois the interrupt;
> 
> 4) the device thinks it is all fine and sends a second interrupt;
> 
> 5) Xen fails to inject the second interrupt into the guest because the
> guest has still the event channel pending bit set;
> 
> at this point the guest looses the second interrupt notification, that
> is not supposed to happen with level interrupts and it might cause
> problems with some devices.
You can''t really lose a level-triggered interrupt. In step (4) the
device
isn''t really actively involved in sending another interrupt -- it never
deasserted its INTx line, and nor will it until the guest''s ISR
quenches the
interrupt at the device. If the guest misses such an interrupt, and
doesn''t
execute the relevant ISR when it should, then another interrupt will simply
be raised by the interrupt controller when the guest does finally EOI the
interrupt. Because the device is *still* asserting the line.

Well, that''s the PV case anyway. I don''t see any problem with
our handling
of the PV case.

Is PV-HVM so different?

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Stefano Stabellini

2011-Oct-27 11:42 UTC

head link

Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications

On Thu, 27 Oct 2011, Keir Fraser wrote:> On 27/10/2011 11:37, "Stefano Stabellini"
<Stefano.Stabellini@eu.citrix.com>
> wrote:
> 
> > PV on HVM guests can loose level interrupts coming from emulated
> > devices: we are missing code to retry to inject a pirq in the guest if
> > it corresponds to a level interrupt and the interrupt has been raised
> > while the guest is servicing the first one.
> > 
> > The same thing could also happen with PV guests, including dom0, even
> > though it is much more unlikely. In case of PV guests the scenario
would
> > be the following:
> > 
> > 1) a device raises a level interrupt and xen injects it into the
> > guest;
> > 
> > 2) the guest is temporarely stuck: it does not ack it or eoi it;
> > 
> > 3) the xen timer kicks in and eois the interrupt;
> > 
> > 4) the device thinks it is all fine and sends a second interrupt;
> > 
> > 5) Xen fails to inject the second interrupt into the guest because the
> > guest has still the event channel pending bit set;
> > 
> > at this point the guest looses the second interrupt notification, that
> > is not supposed to happen with level interrupts and it might cause
> > problems with some devices.
> 
> You can''t really lose a level-triggered interrupt. In step (4) the
device
> isn''t really actively involved in sending another interrupt -- it
never
> deasserted its INTx line, and nor will it until the guest''s ISR
quenches the
> interrupt at the device. If the guest misses such an interrupt, and
doesn''t
> execute the relevant ISR when it should, then another interrupt will simply
> be raised by the interrupt controller when the guest does finally EOI the
> interrupt. Because the device is *still* asserting the line.
> 
> Well, that''s the PV case anyway. I don''t see any problem
with our handling
> of the PV case.
OK, you convinced me.

> Is PV-HVM so different?
Yes, it is. In the PV on HVM case we need to reassert an emulated
interrupt if the guest EOIs it without quenching the interrupt in the
ISR.
We are doing it already in the emulated code path but we are not doing
it for interrupts that have been remapped into pirqs.

That said, if we don''t care about the PV case we can simplify the
patch,
I am appending a new one that only takes care of the PV on HVM case.

---


xen: re-inject emulated level pirqs in PV on HVM guests if still asserted

PV on HVM guests can loose level interrupts coming from emulated devices
if they have been remapped onto event channels.  The reason is that we
are missing the code to inject a pirq again in the guest when the guest
EOIs it, if it corresponds to an emulated level interrupt and the
interrupt is still asserted.

Fix this issue and also return error when the guest tries to get the
irq_status of a non-existing pirq.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff -r c681dd5aecf3 xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Tue Oct 25 19:22:09 2011 +0100
+++ b/xen/arch/x86/physdev.c	Thu Oct 27 11:30:55 2011 +0000
@@ -11,6 +11,7 @@
 #include <asm/current.h>
 #include <asm/io_apic.h>
 #include <asm/msi.h>
+#include <asm/hvm/irq.h>
 #include <asm/hypercall.h>
 #include <public/xen.h>
 #include <public/physdev.h>
@@ -270,6 +271,18 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
         if ( !is_hvm_domain(v->domain) ||
              domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
+        if ( is_hvm_domain(v->domain) &&
+                domain_pirq_to_emuirq(v->domain, eoi.irq) > 0 )
+        {
+            struct hvm_irq *hvm_irq =
&v->domain->arch.hvm_domain.irq;
+            int gsi = domain_pirq_to_emuirq(v->domain, eoi.irq);
+
+            /* if this is a level irq and count > 0, send another
+             * notification */ 
+            if ( gsi >= NR_ISAIRQS /* ISA irqs are edge triggered */
+                    && hvm_irq->gsi_assert_count[gsi] )
+                send_guest_pirq(v->domain, pirq);
+        }
         spin_unlock(&v->domain->event_lock);
         ret = 0;
         break;
@@ -328,9 +341,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
             break;
         irq_status_query.flags = 0;
         if ( is_hvm_domain(v->domain) &&
-             domain_pirq_to_irq(v->domain, irq) <= 0 )
+             domain_pirq_to_irq(v->domain, irq) <= 0 &&
+             domain_pirq_to_emuirq(v->domain, irq) == IRQ_UNBOUND )
         {
-            ret = copy_to_guest(arg, &irq_status_query, 1) ? -EFAULT : 0;
+            ret = -EINVAL;
             break;
         }
 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2011-Oct-27 12:17 UTC

head link

Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications

On 27/10/2011 12:42, "Stefano Stabellini"
<stefano.stabellini@eu.citrix.com>
wrote:
>> Is PV-HVM so different?
> 
> Yes, it is. In the PV on HVM case we need to reassert an emulated
> interrupt if the guest EOIs it without quenching the interrupt in the
> ISR.
> We are doing it already in the emulated code path but we are not doing
> it for interrupts that have been remapped into pirqs.
> 
> That said, if we don''t care about the PV case we can simplify the
patch,
> I am appending a new one that only takes care of the PV on HVM case.
Ah yes, when we are *emulating* an INTx line, either for an emulated device
or because we are doing MSI-INTx emulation, we do have to remember to
reassert. That makes sense.

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Xen devel - Sep 2011 - Still struggling with HVM: tx timeouts on emulated nics

[Xen-devel] Still struggling with HVM: tx timeouts on emulated nics

[Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

Re: [Xen-devel] Re: Still struggling with HVM: tx timeouts on emulated nics

[Xen-devel] [PATCH] xen: do not loose level interrupt notifications

Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications

Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications

Re: [Xen-devel] [PATCH] xen: do not loose level interrupt notifications