thr3ads.net - Xen devel - [Xen-devel] spurious interrupts [Apr 2006]

Jan Beulich

2006-Apr-12 13:37 UTC

[Xen-devel] spurious interrupts

While I haven't heard whether others were able to reproduce this, I have
now been able to narrow it down to a simple operation. On the system I'm
testing with, handling the interrupt from the SCSI controller triggers a
simultaneous interrupt from one of the USB controllers, which (judging by
native execution) is known not to generate any interrupts. Hence it was
possible to BUG() the box the first time such an interrupt appears. This
happens when mask_and_ack_level_ioapic_irq() masks the irq from the first
SCSI controller (pin 0 of IOAPIC 3). No matter how long a delay I insert
before calling mask_IO_APIC_irq(), the other interrupt (pin 16 of IOAPIC 0)
becomes visible (in the redirection table's irr bit) immediately after the
write that sets the mask bit for the first interrupt.
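
As a reading aid for the description above, here is a minimal C sketch of
how the low 32 bits of an IO-APIC redirection table entry can be decoded.
The bit positions follow the 82093AA-style layout (vector in bits 0-7,
remote IRR in bit 14, trigger mode in bit 15, mask in bit 16). The struct
and function names are hypothetical and are not taken from the Xen or Linux
sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit fields in the low dword of a redirection table entry (RTE). */
#define RTE_VECTOR_MASK   0xffu       /* bits 0-7: interrupt vector      */
#define RTE_REMOTE_IRR    (1u << 14)  /* level irq accepted, not yet EOI */
#define RTE_TRIGGER_LEVEL (1u << 15)  /* 1 = level, 0 = edge             */
#define RTE_MASKED        (1u << 16)  /* 1 = pin masked                  */

/* Hypothetical decoded view of the fields discussed in this thread. */
struct rte_state {
    uint8_t vector;
    bool remote_irr;
    bool level_triggered;
    bool masked;
};

/* Decode the low dword of an RTE into its interesting fields. */
struct rte_state decode_rte_low(uint32_t low)
{
    struct rte_state s;

    s.vector = (uint8_t)(low & RTE_VECTOR_MASK);
    s.remote_irr = (low & RTE_REMOTE_IRR) != 0;
    s.level_triggered = (low & RTE_TRIGGER_LEVEL) != 0;
    s.masked = (low & RTE_MASKED) != 0;
    return s;
}

/* Register index of the low dword of a pin's RTE: 0x10 + 2 * pin. */
unsigned int rte_low_index(unsigned int pin)
{
    return 0x10u + 2u * pin;
}
```

With this layout, pin 16 of an IO-APIC is read through register index 0x30,
and the "irr bit" mentioned above corresponds to RTE_REMOTE_IRR in the
sketch.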

Obviously I am lost here - I have no way to tell why writing one IOAPIC's
redir entry affects an interrupt routed through a completely different
IOAPIC. Nevertheless it is clear that the problem is unique to Xen, because
native Linux doesn't try to mask the IRQ.

Besides the massive spurious interrupts that lead to IRQs getting shut off,
I'm also seeing occasional ones on other interrupt lines, which must have a
different reason. I wonder whether this is related to attempts to do irq
balancing (which doesn't seem to work at all under Xen - all device
interrupts are always seen bound to vcpu 0).

While looking at all this, I also found that CONFIG_PCI_MSI not being
supported under Xen is a significant limitation, as some PCI Express devices
may not work at all without this (on the box I'm working with, all bridges
supporting hotplug). Are there any plans to get this working?

Further, while this also may be a native Linux problem, I wonder whether it
is appropriate for assign_irq_vector to not use any serialization regardless
of the fact that it accesses static variables (in the xenlinux case, the
static current_vector can actually be easily converted into an automatic
variable).
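
To make the serialization concern concrete, here is a small user-space C
sketch of a vector allocator whose static state is guarded by a lock. It is
not the kernel's assign_irq_vector; the names, the vector range and the
table size are all assumptions chosen for illustration, and a C11
atomic_flag spinlock stands in for whatever lock the kernel would use.

```c
#include <stdatomic.h>

/* Hypothetical vector range and table size, for illustration only. */
#define FIRST_VECTOR_SKETCH 0x31
#define LAST_VECTOR_SKETCH  0xef
#define NR_IRQS_SKETCH      32

/* Static state shared by all callers; this is what needs protecting. */
static atomic_flag vector_lock = ATOMIC_FLAG_INIT;
static int irq_vector[NR_IRQS_SKETCH];          /* 0 = not assigned yet */
static int next_vector = FIRST_VECTOR_SKETCH;

/*
 * Return the vector for irq, allocating one on first use.
 * Returns -1 for an invalid irq or when the range is exhausted.
 */
int assign_vector_sketch(int irq)
{
    int vector;

    if (irq < 0 || irq >= NR_IRQS_SKETCH)
        return -1;

    /* Spin until the lock is ours, so lookup + update is atomic. */
    while (atomic_flag_test_and_set_explicit(&vector_lock,
                                             memory_order_acquire))
        ;

    if (irq_vector[irq] != 0) {
        vector = irq_vector[irq];       /* already assigned */
    } else if (next_vector > LAST_VECTOR_SKETCH) {
        vector = -1;                    /* no vectors left */
    } else {
        vector = next_vector++;
        irq_vector[irq] = vector;
    }

    atomic_flag_clear_explicit(&vector_lock, memory_order_release);
    return vector;
}
```

Without the lock, two concurrent callers could both read the same
next_vector value and hand the same vector to two different irqs.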

Finally, sufficiently unrelated, I wonder whether
xen_create_contiguous_region() (or its caller(s)) shouldn't special case
order being zero, as it seems pointless to go through numerous hypercalls in
that case.

Thanks for any comments/explanations, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Tian, Kevin

2006-Apr-12 14:00 UTC

RE: [Xen-devel] spurious interrupts

>From: Jan Beulich
>Sent: April 12, 2006 21:37
>
>Finally, sufficiently unrelated, I wonder whether
>xen_create_contiguous_region() (or its caller(s)) shouldn't special
>case order being zero, as it seems pointless to go through numerous
>hypercalls in that case.
>
The problem is that the corresponding gmfns are not necessarily contiguous.
Maybe instead of setting order to non-zero, we can set nr_extents
to >1 by allocating a buffer to hold all gmfns to be released.

Another approach could be to incorporate set_phys_to_machine
into decrease/increase_reservation, and thus allow
xen_create_contiguous_region to be used by auto_translated_mode, if
we want translated mode to work for backends as it does for xen/ia64...

Thanks,
Kevin

Keir Fraser

2006-Apr-12 17:21 UTC

Re: [Xen-devel] spurious interrupts

This happens definitely immediately after writing to the IO-APIC? There 
is an ack_APIC_irq() not many instructions after that, and it would 
make much more sense for the IRR flag to be set after that. :-)

  -- Keir

On 12 Apr 2006, at 14:37, Jan Beulich wrote:
> While I haven't heard anything whether others were able to reproduce
> this, I was now able to nail this down to a simple operation: On the
> system I'm testing with I was able to identify that the handling of
> the interrupt from the SCSI controller triggers a simultaneous
> interrupt from one of the USB controllers, of which (by way of looking
> at the native execution) it is known that it doesn't generate any
> interrupts. Hence it was possible to BUG() the box the first time
> such an interrupt appears. This happens when
> mask_and_ack_level_ioapic_irq() masks the irq from the first SCSI
> controller (pin 0 of IOAPIC 3). No matter how large delays I insert
> before calling mask_IO_APIC_irq(), the other interrupt (pin 16 of
> IOAPIC 0) becomes visible (in the redirection table's irr bit)
> immediately after the write that sets the mask bit for the first
> interrupt.
>
> Obviously I am lost here - I have no way to tell why writing on
> IOAPIC's redir entry affects an interrupt routed through a completely
> different IOAPIC. Nevertheless it is clear that the problem is unique
> to Xen, because native Linux doesn't try to mask the IRQ.

Stephen C. Tweedie

2006-Apr-12 17:48 UTC

Re: [Xen-devel] spurious interrupts

Hi,

On Wed, 2006-04-12 at 15:37 +0200, Jan Beulich wrote:
> While I haven't heard anything whether others were able to reproduce
> this, I was now able to nail this down to a simple operation: On the
> system I'm testing with I was able to identify that the handling of
> the interrupt from the SCSI controller triggers a simultaneous
> interrupt from one of the USB controllers

I've been seeing spurious irq16 on a USB port with no devices attached
under Xen for some time now, resulting in

        irq 16: nobody cared

on boot and

         Disabling IRQ #16

ten minutes later once it reaches 100,000 unhandled interrupts.  With
today's build, including Keir's latest APIC changes, not only is serial
console input fixed (thanks Keir!), but the spurious IRQ also seems to
be solved.  (At least, at the time of writing that box is showing only
15,000 interrupts on irq16 after an hour instead of 100,000 after 10
minutes, so I should know in another 5 hours whether or not it
actually still gets disabled at 100,000.)
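
For readers unfamiliar with where the "Disabling IRQ" message comes from,
the following C sketch models a windowed spurious-interrupt detector of the
kind described above. It is only loosely modelled on the Linux kernel of
that era: the 100,000 window matches the figure quoted in this message, but
the 99,900 threshold and all names are assumptions, not a copy of the kernel
code.

```c
#include <stdbool.h>

#define IRQ_WINDOW_SKETCH          100000u /* figure quoted in the thread */
#define UNHANDLED_THRESHOLD_SKETCH  99900u /* assumed threshold           */

/* Hypothetical per-IRQ bookkeeping. */
struct irq_stats {
    unsigned int irq_count;      /* interrupts seen in current window  */
    unsigned int irqs_unhandled; /* of those, how many nobody claimed  */
    bool disabled;               /* line has been shut off             */
};

/*
 * Record one interrupt on the line. Returns true if this call caused
 * the line to be disabled because almost every interrupt in the
 * window went unhandled.
 */
bool note_interrupt_sketch(struct irq_stats *s, bool handled)
{
    bool stuck;

    if (s->disabled)
        return false;

    if (!handled)
        s->irqs_unhandled++;

    if (++s->irq_count < IRQ_WINDOW_SKETCH)
        return false;

    /* Window is full: decide, then start a fresh window. */
    stuck = s->irqs_unhandled > UNHANDLED_THRESHOLD_SKETCH;
    s->irq_count = 0;
    s->irqs_unhandled = 0;
    if (stuck)
        s->disabled = true;
    return stuck;
}
```

Under this model a line that receives a mix of handled and unhandled
interrupts is never disabled, while a line on which no handler ever claims
the interrupt is shut off at the end of the first full window.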

--Stephen

Stephen C. Tweedie

2006-Apr-12 21:16 UTC

Re: [Xen-devel] spurious interrupts

Hi,

On Wed, 2006-04-12 at 13:48 -0400, Stephen C. Tweedie wrote:
> > While I haven't heard anything whether others were able to reproduce
> > this, I was now able to nail this down to a simple operation: On the
> > system I'm testing with I was able to identify that the handling of
> > the interrupt from the SCSI controller triggers a simultaneous
> > interrupt from one of the USB controllers

Turns out that I'm seeing the same.  On the box that was showing

>         irq 16: nobody cared
>          Disabling IRQ #16

it seems that the irq16 is indeed getting dispatched every time an
interrupt arrives for the disk.  In this case it's sata, ICH5 on irq17.

> (At least, at the time of writing that box is showing only
> at 15,000 interrupts on irq16 after an hour instead of 100,000 after 10
> minutes

This turned out to be related to disk activity, and a heavy disk load
rapidly reproduced the problem even with the latest Xen kernel.

--Stephen

Deuskar, Prafulla

2006-Apr-12 22:11 UTC

RE: [Xen-devel] spurious interrupts

>On Wed, 2006-04-12 at 13:48 -0400, Stephen C. Tweedie wrote:
>
>> > While I haven't heard anything whether others were able to reproduce
>> > this, I was now able to nail this down to a simple operation: On the
>> > system I'm testing with I was able to identify that the handling of
>> > the interrupt from the SCSI controller triggers a simultaneous
>> > interrupt from one of the USB controllers
>
>Turns out that I'm seeing the same.  On the box that was showing
>
>>         irq 16: nobody cared
>>          Disabling IRQ #16
>
>it seems that the irq16 is indeed getting dispatched every time an
>interrupt arrives for the disk.  In this case it's sata, ICH5 on irq17.
>
>> (At least, at the time of writing that box is showing only
>> at 15,000 interrupts on irq16 after an hour instead of 100,000 after 10
>> minutes
>
>this turned out to be related to disk activity, and a heavy disk load
>rapidly reproduced the problem even with the latest Xen kernel.

We see similar behavior with network traffic - irq16 (attached to USB)
gets dispatched every time a NIC interrupt is triggered.

Thanks,
Prafulla

Jan Beulich

2006-Apr-13 11:42 UTC

Re: [Xen-devel] spurious interrupts

>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 12.04.06 19:21:44 >>>
>This happens definitely immediately after writing to the IO-APIC? There
>is an ack_APIC_irq() not many instructions after that, and it would
>make much more sense for the IRR flag to be set after that. :-)

Yes, definitely there. I added (long) delays before masking and after
masking (and also after ack-ing). The delay routine spins on reading the
other redir entry (the one expected to show the problem), and its first
iteration after the masking sees the offending irr bit turned on.
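
The kind of delay routine described here can be sketched in C as a bounded
polling loop that reports the first iteration at which the remote IRR bit
is seen. This is not Jan's actual debugging code: the accessor callback, the
simulated IO-APIC and all names are hypothetical, and real code would read
the entry through the IO-APIC's index/data registers instead.

```c
#include <stdint.h>

#define RTE_REMOTE_IRR (1u << 14)  /* remote IRR bit in the RTE low dword */

/* Hypothetical accessor: returns the low dword of a pin's redir entry. */
typedef uint32_t (*rte_read_fn)(unsigned int pin, void *ctx);

/*
 * Poll a pin's redirection entry up to max_iters times.
 * Returns the zero-based iteration at which remote IRR was first
 * observed, or -1 if it never showed up.
 */
long spin_watch_irr(rte_read_fn read_rte, void *ctx,
                    unsigned int pin, long max_iters)
{
    for (long i = 0; i < max_iters; i++) {
        if (read_rte(pin, ctx) & RTE_REMOTE_IRR)
            return i;
    }
    return -1;
}

/* Simulated IO-APIC: remote IRR appears after irr_after reads. */
struct fake_ioapic {
    long reads;
    long irr_after;
};

uint32_t fake_read_rte(unsigned int pin, void *ctx)
{
    struct fake_ioapic *f = ctx;
    uint32_t low = (1u << 15) | 0xa9u;  /* level-triggered, vector 0xa9 */

    (void)pin;                          /* the fake has a single entry */
    if (f->reads++ >= f->irr_after)
        low |= RTE_REMOTE_IRR;
    return low;
}
```

In terms of this sketch, the observation reported above corresponds to
spin_watch_irr() returning 0 when called right after the mask write, no
matter how many iterations were spent polling before it.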

Jan
