Jeremy Fitzhardinge
2010-Aug-24 21:35 UTC
[Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
Hi Linus,

This pair of patches fixes a long-standing bug whose most noticeable
symptom was that the Xen blkfront driver would hang up very occasionally
(sometimes never, sometimes after weeks or months of uptime). We worked
out the root cause was that it was incorrectly treating Xen events as
level rather than edge triggered interrupts, which works fine unless
you're handling one interrupt, the interrupt gets migrated to another
cpu and then re-raised. This ends up losing the interrupt because the
edge-triggering of the second interrupt is lost.

The other change changes IPI and VIRQ event sources to use
handle_percpu_irq, because treating them as level is also wrong, and
they're actually inherently percpu events, much like LAPIC vectors.

I'd like to get this fix into the current kernel and into stable sooner
rather than later.

Thanks,
	J

The following changes since commit 76be97c1fc945db08aae1f1b746012662d643e97:

  Linux 2.6.36-rc2 (2010-08-22 17:43:29 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git upstream/core

Jeremy Fitzhardinge (2):
      xen: use percpu interrupts for IPIs and VIRQs
      xen: handle events as edge-triggered

 drivers/xen/events.c |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 72f91bf..13365ba 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -112,6 +112,7 @@ static inline unsigned long *cpu_evtchn_mask(int cpu)
 #define VALID_EVTCHN(chn)	((chn) != 0)
 
 static struct irq_chip xen_dynamic_chip;
+static struct irq_chip xen_percpu_chip;
 
 /* Constructor for packed IRQ information. */
 static struct irq_info mk_unbound_info(void)
@@ -377,7 +378,7 @@ int bind_evtchn_to_irq(unsigned int evtchn)
 		irq = find_unbound_irq();
 
 		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
-					      handle_level_irq, "event");
+					      handle_edge_irq, "event");
 
 		evtchn_to_irq[evtchn] = irq;
 		irq_info[irq] = mk_evtchn_info(evtchn);
@@ -403,8 +404,8 @@ static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
 		if (irq < 0)
 			goto out;
 
-		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
-					      handle_level_irq, "ipi");
+		set_irq_chip_and_handler_name(irq, &xen_percpu_chip,
+					      handle_percpu_irq, "ipi");
 
 		bind_ipi.vcpu = cpu;
 		if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_ipi,
@@ -444,8 +445,8 @@ static int bind_virq_to_irq(unsigned int virq, unsigned int cpu)
 		irq = find_unbound_irq();
 
-		set_irq_chip_and_handler_name(irq, &xen_dynamic_chip,
-					      handle_level_irq, "virq");
+		set_irq_chip_and_handler_name(irq, &xen_percpu_chip,
+					      handle_percpu_irq, "virq");
 
 		evtchn_to_irq[evtchn] = irq;
 		irq_info[irq] = mk_virq_info(evtchn, virq);
@@ -964,6 +965,16 @@ static struct irq_chip xen_dynamic_chip __read_mostly = {
 	.retrigger	= retrigger_dynirq,
 };
 
+static struct irq_chip xen_percpu_chip __read_mostly = {
+	.name		= "xen-percpu",
+
+	.disable	= disable_dynirq,
+	.mask		= disable_dynirq,
+	.unmask		= enable_dynirq,
+
+	.ack		= ack_dynirq,
+};
+
 int xen_set_callback_via(uint64_t via)
 {
 	struct xen_hvm_param a;
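For readers unfamiliar with the two genirq flow handlers involved, the
following is a heavily simplified sketch -- not the real kernel/irq/chip.c
code, and run_handlers() is just a stand-in for the usual action dispatch --
of the difference that matters for this bug: with the level flow, an event
that arrives while the handler is already running on another cpu is simply
dropped, whereas the edge flow records it as pending and replays it.

/*
 * Heavily simplified sketch of the two flow handlers, for illustration
 * only; run_handlers() is a placeholder for the action dispatch.
 */
static void simplified_level_flow(struct irq_desc *desc)
{
	if (desc->status & IRQ_INPROGRESS)
		return;			/* second occurrence is simply dropped */

	desc->status |= IRQ_INPROGRESS;
	run_handlers(desc);
	desc->status &= ~IRQ_INPROGRESS;
}

static void simplified_edge_flow(struct irq_desc *desc)
{
	if (desc->status & IRQ_INPROGRESS) {
		desc->status |= IRQ_PENDING;	/* remembered ...        */
		return;
	}

	do {
		desc->status &= ~IRQ_PENDING;
		desc->status |= IRQ_INPROGRESS;
		run_handlers(desc);
		desc->status &= ~IRQ_INPROGRESS;
	} while (desc->status & IRQ_PENDING);	/* ... and replayed here */
}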
Jan Beulich
2010-Aug-25 07:52 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> We worked out the root cause was that it was incorrectly treating Xen
> events as level rather than edge triggered interrupts, which works fine
> unless you're handling one interrupt, the interrupt gets migrated to
> another cpu and then re-raised. This ends up losing the interrupt
> because the edge-triggering of the second interrupt is lost.

While this description would seem plausible at the first glance, it
doesn't match up with unmask_evtchn() already taking care of
exactly this case. Or are you implicitly saying that this code is
broken in some way (if so, how, and shouldn't it then be that
code that needs fixing, or removing if you want to stay with the
edge handling)?

I do however agree that using handle_level_irq() is problematic
(see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
but as said there I think using the fasteoi logic is preferable. No
matter whether using edge or level, the ->end() method will
never be called (whereas fasteoi calls ->eoi(), which would
just need to be vectored to the same function as ->end()).
Without end_pirq() ever called, you can't let Xen know of
bad PIRQs (so that it can disable them instead of continuing
to call the [now shortcut] handler in the owning domain).

> The other change changes IPI and VIRQ event sources to use
> handle_percpu_irq, because treating them as level is also wrong, and
> they're actually inherently percpu events, much like LAPIC vectors.

This doesn't seem right for the general VIRQ case: global ones
should not be disallowed migration between CPUs. Since in your
model the requestor has to pass IRQF_PERCPU anyway,
shouldn't you make the selection of the handler dependent
upon this flag?

Jan
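The unmask_evtchn() behaviour Jan refers to looks roughly like the sketch
below, paraphrased from drivers/xen/events.c of that era (details differ
between trees): when the mask bit is cleared and the port was raised while
masked, the upcall is re-triggered, so an edge that arrived while the
channel was masked is not lost.

/* Rough paraphrase of unmask_evtchn(); not verbatim code. */
static void unmask_evtchn_sketch(int port)
{
	struct shared_info *s = HYPERVISOR_shared_info;
	unsigned int cpu = get_cpu();

	if (unlikely(cpu != cpu_from_evtchn(port))) {
		/* Remote port: let Xen do the unmask (and any resend). */
		struct evtchn_unmask unmask = { .port = port };
		HYPERVISOR_event_channel_op(EVTCHNOP_unmask, &unmask);
	} else {
		struct vcpu_info *v = &s->vcpu_info[cpu];

		sync_clear_bit(port, &s->evtchn_mask[0]);

		/*
		 * If the event was raised while masked, re-raise the
		 * upcall so the "edge" isn't lost.
		 */
		if (sync_test_bit(port, &s->evtchn_pending[0]) &&
		    !sync_test_and_set_bit(port / BITS_PER_LONG,
					   &v->evtchn_pending_sel))
			v->evtchn_upcall_pending = 1;
	}

	put_cpu();
}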
Daniel Stodden
2010-Aug-25 10:04 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On Wed, 2010-08-25 at 03:52 -0400, Jan Beulich wrote:
> >>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > We worked out the root cause was that it was incorrectly treating Xen
> > events as level rather than edge triggered interrupts, which works fine
> > unless you're handling one interrupt, the interrupt gets migrated to
> > another cpu and then re-raised. This ends up losing the interrupt
> > because the edge-triggering of the second interrupt is lost.
>
> While this description would seem plausible at the first glance, it
> doesn't match up with unmask_evtchn() already taking care of
> exactly this case. Or are you implicitly saying that this code is
> broken in some way (if so, how, and shouldn't it then be that
> code that needs fixing, or removing if you want to stay with the
> edge handling)?

Not broken, but a different problem. The unmask 'resend' only catches
the edge lost if the event was raised while it was still masked. But
level irq doesn't have to save PENDING state. In the Xen event migration
case the edge isn't lost, but the upcall will drop the invocation when
the handler is found inprogress on the previous cpu.

> I do however agree that using handle_level_irq() is problematic
> (see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
> but as said there I think using the fasteoi logic is preferable. No
> matter whether using edge or level, the ->end() method will
> never be called (whereas fasteoi calls ->eoi(), which would
> just need to be vectored to the same function as ->end()).
> Without end_pirq() ever called, you can't let Xen know of
> bad PIRQs (so that it can disable them instead of continuing
> to call the [now shortcut] handler in the owning domain).

Not an opinion, just confused: Isn't all that dealt with in
chip->disable?

Daniel
Jan Beulich
2010-Aug-25 11:24 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 25.08.10 at 12:04, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> On Wed, 2010-08-25 at 03:52 -0400, Jan Beulich wrote:
>> >>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> > We worked out the root cause was that it was incorrectly treating Xen
>> > events as level rather than edge triggered interrupts, which works fine
>> > unless you're handling one interrupt, the interrupt gets migrated to
>> > another cpu and then re-raised. This ends up losing the interrupt
>> > because the edge-triggering of the second interrupt is lost.
>>
>> While this description would seem plausible at the first glance, it
>> doesn't match up with unmask_evtchn() already taking care of
>> exactly this case. Or are you implicitly saying that this code is
>> broken in some way (if so, how, and shouldn't it then be that
>> code that needs fixing, or removing if you want to stay with the
>> edge handling)?
>
> Not broken, but a different problem. The unmask 'resend' only catches
> the edge lost if the event was raised while it was still masked. But
> level irq doesn't have to save PENDING state. In the Xen event migration
> case the edge isn't lost, but the upcall will drop the invocation when
> the handler is found inprogress on the previous cpu.

Hmm, indeed. But that problem must have existed in all post-2.6.18
kernels then... And that shouldn't be a problem with fasteoi, as that
one calls ->eoi() even when INPROGRESS was set (other than level, which
calls unmask only when it wasn't set).

>> I do however agree that using handle_level_irq() is problematic
>> (see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
>> but as said there I think using the fasteoi logic is preferable. No
>> matter whether using edge or level, the ->end() method will
>> never be called (whereas fasteoi calls ->eoi(), which would
>> just need to be vectored to the same function as ->end()).
>> Without end_pirq() ever called, you can't let Xen know of
>> bad PIRQs (so that it can disable them instead of continuing
>> to call the [now shortcut] handler in the owning domain).
>
> Not an opinion, just confused: Isn't all that dealt with in
> chip->disable?

With disable_pirq() being empty (at least in the branches I looked at)?

Jan
Jeremy Fitzhardinge
2010-Aug-25 17:54 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On 08/25/2010 12:52 AM, Jan Beulich wrote:
> >>> On 24.08.10 at 23:35, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> We worked out the root cause was that it was incorrectly treating Xen
>> events as level rather than edge triggered interrupts, which works fine
>> unless you're handling one interrupt, the interrupt gets migrated to
>> another cpu and then re-raised. This ends up losing the interrupt
>> because the edge-triggering of the second interrupt is lost.
> While this description would seem plausible at the first glance, it
> doesn't match up with unmask_evtchn() already taking care of
> exactly this case. Or are you implicitly saying that this code is
> broken in some way (if so, how, and shouldn't it then be that
> code that needs fixing, or removing if you want to stay with the
> edge handling)?
>
> I do however agree that using handle_level_irq() is problematic
> (see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
> but as said there I think using the fasteoi logic is preferable. No
> matter whether using edge or level, the ->end() method will
> never be called (whereas fasteoi calls ->eoi(), which would
> just need to be vectored to the same function as ->end()).
> Without end_pirq() ever called, you can't let Xen know of
> bad PIRQs (so that it can disable them instead of continuing
> to call the [now shortcut] handler in the owning domain).

Note that this patch is specifically for upstream Xen, which doesn't
have any pirq support in it at present.

However, I did consider using fasteoi, but I couldn't see how to make
it work.  The problem is that it only does a single call into the
irq_chip for EOI after calling the interrupt handler, but there is no
call beforehand to ack the interrupt (which means clear the event flag
in our case).  This leads to a race where an event can be lost after the
interrupt handler has returned, but before the event flag has been
cleared (because Xen won't set pending or call the upcall function if
the event is already set).  I guess I could pre-clear the event in the
upcall function, but I'm not sure that's any better.

In the dom0 kernels, I followed the example of the IOAPIC irq_chip
implementation, which does the hardware EOI in the ack call at the start
of handle_edge_irq, and did the EOI hypercall (when necessary) there.  I
welcome a reviewer's eye on this though.

>> The other change changes IPI and VIRQ event sources to use
>> handle_percpu_irq, because treating them as level is also wrong, and
>> they're actually inherently percpu events, much like LAPIC vectors.
> This doesn't seem right for the general VIRQ case: global ones
> should not be disallowed migration between CPUs. Since in your
> model the requestor has to pass IRQF_PERCPU anyway,
> shouldn't you make the selection of the handler dependent
> upon this flag?

I was thinking specifically of the timer, debug and console virqs.  The
last is the only one which could conceivably be non-percpu, but in
practice I think it would be a bad idea to put it on anything other than
cpu0.  What other global virqs are there?  Nothing that's high-frequency
enough to be worth migrating?

But yes, if necessary, I could make it depend on the IRQF_PERCPU flag.

    J
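The "pre-clear the event in the upcall" idea Jeremy mentions (and which,
per Jan's next reply, the SUSE kernels already do) would amount to
something like the minimal sketch below.  The helper name is a placeholder,
not a function from either tree; the point is only the ordering.

/*
 * Hypothetical sketch: mask and clear the event in the upcall before
 * entering the flow handler, so handle_fasteoi_irq() can be used
 * without a separate ->ack() step.  Anything that fires again while
 * the handler runs stays latched in evtchn_pending until the ->eoi()
 * method unmasks the port.
 */
static void process_port_sketch(unsigned int port)
{
	int irq = evtchn_to_irq[port];

	mask_evtchn(port);
	clear_evtchn(port);

	if (irq != -1)
		generic_handle_irq(irq);	/* -> handle_fasteoi_irq() */
}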
Jan Beulich
2010-Aug-26 06:46 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Note that this patch is specifically for upstream Xen, which doesn't
> have any pirq support in it at present.

I understand that, but saw that you had paralleling changes to the
pirq handling in your Dom0 tree.

> However, I did consider using fasteoi, but I couldn't see how to make
> it work. The problem is that it only does a single call into the
> irq_chip for EOI after calling the interrupt handler, but there is no
> call beforehand to ack the interrupt (which means clear the event flag
> in our case). This leads to a race where an event can be lost after the
> interrupt handler has returned, but before the event flag has been
> cleared (because Xen won't set pending or call the upcall function if
> the event is already set). I guess I could pre-clear the event in the
> upcall function, but I'm not sure that's any better.

That's precisely what we're doing.

> In the dom0 kernels, I followed the example of the IOAPIC irq_chip
> implementation, which does the hardware EOI in the ack call at the start
> of handle_edge_irq, and did the EOI hypercall (when necessary) there. I
> welcome a reviewer's eye on this though.

This I didn't actually notice so far.

That doesn't look right, at least in combination with ->mask() being
wired to disable_pirq(), which is empty (and btw., if the latter was
right, you should also wire ->mask_ack() to disable_pirq() to avoid
a pointless indirect call).

But even with ->mask() actually masking the IRQ I'm not certain
this is right. At the very least it'll make Xen see a potential
second instance of the same IRQ much earlier than you will
really be able to handle it. This is particularly bad for level
triggered ones, as Xen will see them right again after you
passed it the EOI notification - as a result there'll be twice as
many interrupts seen by Xen on the respective lines.

The native I/O APIC can validly do this as ->ack() only gets
called for edge triggered interrupts (which is why ->eoi() is
wired to ack_apic_level()).

> I was thinking specifically of the timer, debug and console virqs. The
> last is the only one which could conceivably be non-percpu, but in
> practice I think it would be a bad idea to put it on anything other than
> cpu0. What other global virqs are there? Nothing that's high-frequency
> enough to be worth migrating?

Once supported in your tree, oprofile could have high interrupt
rates, and the trace buffer ones might too. Admittedly both are
debugging aids, but I don't think it'd be nice for them to influence
performance more than necessary.

Jan
Jeremy Fitzhardinge
2010-Aug-26 16:32 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On 08/25/2010 11:46 PM, Jan Beulich wrote:
> >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> Note that this patch is specifically for upstream Xen, which doesn't
>> have any pirq support in it at present.
> I understand that, but saw that you had paralleling changes to the
> pirq handling in your Dom0 tree.
>
>> However, I did consider using fasteoi, but I couldn't see how to make
>> it work. The problem is that it only does a single call into the
>> irq_chip for EOI after calling the interrupt handler, but there is no
>> call beforehand to ack the interrupt (which means clear the event flag
>> in our case). This leads to a race where an event can be lost after the
>> interrupt handler has returned, but before the event flag has been
>> cleared (because Xen won't set pending or call the upcall function if
>> the event is already set). I guess I could pre-clear the event in the
>> upcall function, but I'm not sure that's any better.
> That's precisely what we're doing.

You mean pre-clearing the event?  OK.

But aren't you still subject to the bug the switch to handle_edge_irq fixed?

With handle_fasteoi_irq:

    cpu A                          cpu B
    get event
    set INPROGRESS
    call action
      :
      :                            <migrate event channel to B>
      :                            get event
      :                            INPROGRESS set? -> EOI, return
      :
    action returns
    clear INPROGRESS
    EOI

The event arriving on B is lost, and there's no record made of it ever
having arrived.  How do you avoid this?

With handle_edge_irq, the second event will set PENDING in the irq_desc,
and a loop will keep running on cpu A until PENDING is clear, so nothing
is ever lost.

>> In the dom0 kernels, I followed the example of the IOAPIC irq_chip
>> implementation, which does the hardware EOI in the ack call at the start
>> of handle_edge_irq, and did the EOI hypercall (when necessary) there. I
>> welcome a reviewer's eye on this though.
> This I didn't actually notice so far.
>
> That doesn't look right, at least in combination with ->mask() being
> wired to disable_pirq(), which is empty (and btw., if the latter was
> right, you should also wire ->mask_ack() to disable_pirq() to avoid
> a pointless indirect call).
>
> But even with ->mask() actually masking the IRQ I'm not certain
> this is right. At the very least it'll make Xen see a potential
> second instance of the same IRQ much earlier than you will
> really be able to handle it. This is particularly bad for level
> triggered ones, as Xen will see them right again after you
> passed it the EOI notification - as a result there'll be twice as
> many interrupts seen by Xen on the respective lines.
>
> The native I/O APIC can validly do this as ->ack() only gets
> called for edge triggered interrupts (which is why ->eoi() is
> wired to ack_apic_level()).

Yes.  The code as-is works OK, but I haven't checked to see if Xen is
taking many spurious interrupts.  It probably helps that my test machine
has MSI for almost everything.

But does that mean the pirq code needs to have different ack/eoi
behaviour depending on the triggering of the ioapic interrupt?

>> I was thinking specifically of the timer, debug and console virqs. The
>> last is the only one which could conceivably be non-percpu, but in
>> practice I think it would be a bad idea to put it on anything other than
>> cpu0. What other global virqs are there? Nothing that's high-frequency
>> enough to be worth migrating?
> Once supported in your tree, oprofile could have high interrupt
> rates, and the trace buffer ones might too. Admittedly both are
> debugging aids, but I don't think it'd be nice for them to influence
> performance more than necessary.

VIRQ_XENOPROF is percpu as well.  VIRQ_TBUF is global, but it has to
happen on *some* vcpu - but yes, I guess it would be useful to put it on
another vcpu so that it doesn't affect all the stuff tied to vcpu 0.
(At the moment it would be tied to whatever vcpu set up the virq, so you
could control initial placement that way...)

    J
Jan Beulich
2010-Aug-27 08:56 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 08/25/2010 11:46 PM, Jan Beulich wrote:
>> >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> Note that this patch is specifically for upstream Xen, which doesn't
>>> have any pirq support in it at present.
>> I understand that, but saw that you had paralleling changes to the
>> pirq handling in your Dom0 tree.
>>
>>> However, I did consider using fasteoi, but I couldn't see how to make
>>> it work. The problem is that it only does a single call into the
>>> irq_chip for EOI after calling the interrupt handler, but there is no
>>> call beforehand to ack the interrupt (which means clear the event flag
>>> in our case). This leads to a race where an event can be lost after the
>>> interrupt handler has returned, but before the event flag has been
>>> cleared (because Xen won't set pending or call the upcall function if
>>> the event is already set). I guess I could pre-clear the event in the
>>> upcall function, but I'm not sure that's any better.
>> That's precisely what we're doing.
>
> You mean pre-clearing the event?  OK.
>
> But aren't you still subject to the bug the switch to handle_edge_irq fixed?
>
> With handle_fasteoi_irq:
>
>     cpu A                          cpu B
>     get event

mask and clear event

>     set INPROGRESS
>     call action
>       :
>       :                            <migrate event channel to B>
>       :                            get event

Cannot happen, event is masked (i.e. all that would happen is
that the event occurrence would be logged evtchn_pending).

>       :                            INPROGRESS set? -> EOI, return
>       :
>     action returns
>     clear INPROGRESS
>     EOI

unmask event, checking for whether the event got re-bound (and
doing the unmask through a hypercall if necessary), thus re-raising
the event in any case

> The event arriving on B is lost, and there's no record made of it ever
> having arrived.  How do you avoid this?
>
> With handle_edge_irq, the second event will set PENDING in the irq_desc,
> and a loop will keep running on cpu A until PENDING is clear, so nothing
> is ever lost.

Actually, considering that you mask and unmask just like we do, I cannot
even see why you would have run into above scenario with
handle_level_irq(). Where is the window that I'm missing?

>>> In the dom0 kernels, I followed the example of the IOAPIC irq_chip
>>> implementation, which does the hardware EOI in the ack call at the start
>>> of handle_edge_irq, and did the EOI hypercall (when necessary) there. I
>>> welcome a reviewer's eye on this though.
>> This I didn't actually notice so far.
>>
>> That doesn't look right, at least in combination with ->mask() being
>> wired to disable_pirq(), which is empty (and btw., if the latter was
>> right, you should also wire ->mask_ack() to disable_pirq() to avoid
>> a pointless indirect call).
>>
>> But even with ->mask() actually masking the IRQ I'm not certain
>> this is right. At the very least it'll make Xen see a potential
>> second instance of the same IRQ much earlier than you will
>> really be able to handle it. This is particularly bad for level
>> triggered ones, as Xen will see them right again after you
>> passed it the EOI notification - as a result there'll be twice as
>> many interrupts seen by Xen on the respective lines.
>>
>> The native I/O APIC can validly do this as ->ack() only gets
>> called for edge triggered interrupts (which is why ->eoi() is
>> wired to ack_apic_level()).
>
> Yes.  The code as-is works OK, but I haven't checked to see if Xen is
> taking many spurious interrupts.  It probably helps that my test machine
> has MSI for almost everything.
>
> But does that mean the pirq code needs to have different ack/eoi
> behaviour depending on the triggering of the ioapic interrupt?

If you want to continue to use handle_edge_irq(), I think you will.
With handle_fasteoi_irq(), you would leverage Xen's handling of
edge/level, and wouldn't need to make any distinction.

Jan
Daniel Stodden
2010-Aug-27 20:43 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
> >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > On 08/25/2010 11:46 PM, Jan Beulich wrote:
> >> >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >>> Note that this patch is specifically for upstream Xen, which doesn't
> >>> have any pirq support in it at present.
> >> I understand that, but saw that you had paralleling changes to the
> >> pirq handling in your Dom0 tree.
> >>
> >>> However, I did consider using fasteoi, but I couldn't see how to make
> >>> it work. The problem is that it only does a single call into the
> >>> irq_chip for EOI after calling the interrupt handler, but there is no
> >>> call beforehand to ack the interrupt (which means clear the event flag
> >>> in our case). This leads to a race where an event can be lost after the
> >>> interrupt handler has returned, but before the event flag has been
> >>> cleared (because Xen won't set pending or call the upcall function if
> >>> the event is already set). I guess I could pre-clear the event in the
> >>> upcall function, but I'm not sure that's any better.
> >> That's precisely what we're doing.
> >
> > You mean pre-clearing the event?  OK.
> >
> > But aren't you still subject to the bug the switch to handle_edge_irq fixed?
> >
> > With handle_fasteoi_irq:
> >
> >     cpu A                          cpu B
> >     get event
>
> mask and clear event

Argh. Right, I guess that's my fault, I was the one who came up with the
PENDING theory, but indeed I failed to see the event masking bits.

However, please read on.

> >     set INPROGRESS
> >     call action
> >       :
> >       :                            <migrate event channel to B>
> >       :                            get event
>
> Cannot happen, event is masked (i.e. all that would happen is
> that the event occurrence would be logged evtchn_pending).
>
> >       :                            INPROGRESS set? -> EOI, return
> >       :
> >     action returns
> >     clear INPROGRESS
> >     EOI
>
> unmask event, checking for whether the event got re-bound (and
> doing the unmask through a hypercall if necessary), thus re-raising
> the event in any case

Yes. I agree. So let's come up with a new theory. Right now I'm still
looking at xen/next. Correct me if I'm mistaken:

mask_ack_pirq will:
   1. chip->mask
   2. chip->ack

Where chip->ack will:
   1. move_native_irq
   2. clear_evtchn.

Now if you look into move_native_irq, it will:
   1. chip->mask (gratuitous)
   2. move
   3. chip->unmask (aiiiiiie).

That explains why edge_irq still fixed the problem.

Price question is if that's the kind of fix we wanted then.

Cheers,
Daniel
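For context, move_native_irq() in kernels of that era (kernel/irq/migration.c)
did roughly the following -- paraphrased, exact details vary by version --
which is where the unmask in the middle of the Xen ->ack() path comes from:

/* Rough paraphrase of move_native_irq() from that era; not verbatim. */
void move_native_irq(int irq)
{
	struct irq_desc *desc = irq_to_desc(irq);

	if (likely(!(desc->status & IRQ_MOVE_PENDING)))
		return;

	if (unlikely(desc->status & IRQ_DISABLED))
		return;

	desc->chip->mask(irq);		/* the "gratuitous" mask ...        */
	move_masked_irq(irq);		/* update affinity while masked     */
	desc->chip->unmask(irq);	/* ... and the premature unmask     */
}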
Daniel Stodden
2010-Aug-27 21:49 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On Fri, 2010-08-27 at 13:43 -0700, Daniel Stodden wrote:
> On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
> > >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > > On 08/25/2010 11:46 PM, Jan Beulich wrote:
> > >> >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> > >>> Note that this patch is specifically for upstream Xen, which doesn't
> > >>> have any pirq support in it at present.
> > >> I understand that, but saw that you had paralleling changes to the
> > >> pirq handling in your Dom0 tree.
> > >>
> > >>> However, I did consider using fasteoi, but I couldn't see how to make
> > >>> it work. The problem is that it only does a single call into the
> > >>> irq_chip for EOI after calling the interrupt handler, but there is no
> > >>> call beforehand to ack the interrupt (which means clear the event flag
> > >>> in our case). This leads to a race where an event can be lost after the
> > >>> interrupt handler has returned, but before the event flag has been
> > >>> cleared (because Xen won't set pending or call the upcall function if
> > >>> the event is already set). I guess I could pre-clear the event in the
> > >>> upcall function, but I'm not sure that's any better.
> > >> That's precisely what we're doing.
> > >
> > > You mean pre-clearing the event?  OK.
> > >
> > > But aren't you still subject to the bug the switch to handle_edge_irq fixed?
> > >
> > > With handle_fasteoi_irq:
> > >
> > >     cpu A                          cpu B
> > >     get event
> >
> > mask and clear event
>
> Argh. Right, I guess that's my fault, I was the one who came up with the
> PENDING theory, but indeed I failed to see the event masking bits.
>
> However, please read on.
>
> > >     set INPROGRESS
> > >     call action
> > >       :
> > >       :                            <migrate event channel to B>
> > >       :                            get event
> >
> > Cannot happen, event is masked (i.e. all that would happen is
> > that the event occurrence would be logged evtchn_pending).
> >
> > >       :                            INPROGRESS set? -> EOI, return
> > >       :
> > >     action returns
> > >     clear INPROGRESS
> > >     EOI
> >
> > unmask event, checking for whether the event got re-bound (and
> > doing the unmask through a hypercall if necessary), thus re-raising
> > the event in any case
>
> Yes. I agree. So let's come up with a new theory. Right now I'm still
> looking at xen/next. Correct me if I'm mistaken:
>
> mask_ack_pirq will:
>    1. chip->mask
>    2. chip->ack
>
> Where chip->ack will:
>    1. move_native_irq
>    2. clear_evtchn.
>
> Now if you look into move_native_irq, it will:
>    1. chip->mask (gratuitous)
>    2. move
>    3. chip->unmask (aiiiiiie).
>
> That explains why edge_irq still fixed the problem.
>
> Price question is if that's the kind of fix we wanted then.

XCP has, presumably older, mask_ack() and ack() handlers in
core/evtchn.c. Those

   1. move
   2. mask
   3. ack

and therefore don't have that problem.

So maybe this was caused by some pvops specific patch a while ago?

Cheers,
Daniel
Jeremy Fitzhardinge
2010-Aug-27 23:43 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On 08/27/2010 01:43 PM, Daniel Stodden wrote:
> On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote:
>>>>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>> On 08/25/2010 11:46 PM, Jan Beulich wrote:
>>>> >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>>>>> Note that this patch is specifically for upstream Xen, which doesn't
>>>>> have any pirq support in it at present.
>>>> I understand that, but saw that you had paralleling changes to the
>>>> pirq handling in your Dom0 tree.
>>>>
>>>>> However, I did consider using fasteoi, but I couldn't see how to make
>>>>> it work. The problem is that it only does a single call into the
>>>>> irq_chip for EOI after calling the interrupt handler, but there is no
>>>>> call beforehand to ack the interrupt (which means clear the event flag
>>>>> in our case). This leads to a race where an event can be lost after the
>>>>> interrupt handler has returned, but before the event flag has been
>>>>> cleared (because Xen won't set pending or call the upcall function if
>>>>> the event is already set). I guess I could pre-clear the event in the
>>>>> upcall function, but I'm not sure that's any better.
>>>> That's precisely what we're doing.
>>> You mean pre-clearing the event?  OK.
>>>
>>> But aren't you still subject to the bug the switch to handle_edge_irq fixed?
>>>
>>> With handle_fasteoi_irq:
>>>
>>>     cpu A                          cpu B
>>>     get event
>> mask and clear event
> Argh. Right, I guess that's my fault, I was the one who came up with the
> PENDING theory, but indeed I failed to see the event masking bits.
>
> However, please read on.
>
>>>     set INPROGRESS
>>>     call action
>>>       :
>>>       :                            <migrate event channel to B>
>>>       :                            get event
>> Cannot happen, event is masked (i.e. all that would happen is
>> that the event occurrence would be logged evtchn_pending).
>>
>>>       :                            INPROGRESS set? -> EOI, return
>>>       :
>>>     action returns
>>>     clear INPROGRESS
>>>     EOI
>> unmask event, checking for whether the event got re-bound (and
>> doing the unmask through a hypercall if necessary), thus re-raising
>> the event in any case
> Yes. I agree. So let's come up with a new theory. Right now I'm still
> looking at xen/next. Correct me if I'm mistaken:
>
> mask_ack_pirq will:
>    1. chip->mask
>    2. chip->ack
>
> Where chip->ack will:
>    1. move_native_irq
>    2. clear_evtchn.
>
> Now if you look into move_native_irq, it will:
>    1. chip->mask (gratuitous)
>    2. move
>    3. chip->unmask (aiiiiiie).
>
> That explains why edge_irq still fixed the problem.

Good point.  I guess the simplest fix in that case would have been to
use move_masked_irq()...

The current fix is not wrong, so we can leave it as-is upstream for now.
But I think I will try Jan's idea about masking/clearing in the event
upcall then using fasteoi as the handler.

    J
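For illustration, the "simplest fix" Jeremy mentions could look like the
hypothetical sketch below (this is not a patch that was posted in the
thread); the caller is assumed to have already masked the channel, e.g.
via the ->mask_ack() path, so migration happens without the intermediate
unmask.

/* Hypothetical sketch only -- not a posted patch. */
static void ack_pirq_sketch(unsigned int irq)
{
	int evtchn = evtchn_from_irq(irq);

	move_masked_irq(irq);		/* migrate without unmasking here */

	if (VALID_EVTCHN(evtchn))
		clear_evtchn(evtchn);
}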
Jan Beulich
2010-Aug-30 08:03 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 27.08.10 at 22:43, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> So let's come up with a new theory. Right now I'm still
> looking at xen/next. Correct me if I'm mistaken:
>
> mask_ack_pirq will:
>    1. chip->mask
>    2. chip->ack
>
> Where chip->ack will:
>    1. move_native_irq
>    2. clear_evtchn.
>
> Now if you look into move_native_irq, it will:
>    1. chip->mask (gratuitous)
>    2. move
>    3. chip->unmask (aiiiiiie).
>
> That explains why edge_irq still fixed the problem.

This totals to it having been wrong to break up ->mask() and ->ack() -
when using the combined ->mask_ack() (like in XCP etc) it would have
been pretty obvious that move_native_irq() cannot go between
mask() and ack().

For us, using fasteoi, move_native_irq() sits in ->eoi(), before
un-masking. One could, as Jeremy suggests, call move_masked_irq()
here, but I didn't want to duplicate the IRQ_DISABLED check done
in move_native_irq(), mainly to not depend on following potential
future changes (additions) to the set of conditions checked there.
Helpful would be if the function returned whether it actually went
through the mask/unmask pair, as I'm not sure the double
unmasking really is a good idea, especially in the PIRQ case (for
the moment I'm considering putting an already-unmasked check
into both ->eoi() handlers).

Jan
Jan Beulich
2010-Aug-30 08:43 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 30.08.10 at 10:03, "Jan Beulich" <JBeulich@novell.com> wrote:
> Helpful would be if the function returned whether it actually went
> through the mask/unmask pair, as I'm not sure the double
> unmasking really is a good idea, especially in the PIRQ case (for
> the moment I'm considering putting an already-unmasked check
> into both ->eoi() handlers).

Actually, it seems to me that this check really (also) belongs into
unmask_evtchn(). Keir (I think you wrote this code originally), is
there a reason (other than the implied assumption that it won't
get called on an already unmasked event channel) the function
uses sync_clear_bit() rather than sync_test_and_clear_bit(),
doing the other actions only if the bit wasn't already clear, just
like Xen itself does for the respective hypercall implementation?

Thanks, Jan
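Expressed against the unmask_evtchn() shape shown earlier, Jan's proposal
would amount to something like the sketch below (a proposal sketch only,
not committed code; the cross-cpu hypercall path is omitted for brevity).

/*
 * Sketch of the suggested change: make the local-cpu unmask path a
 * no-op when the port wasn't actually masked, mirroring what Xen's
 * EVTCHNOP_unmask implementation does.
 */
static void unmask_evtchn_checked_sketch(int port)
{
	struct shared_info *s = HYPERVISOR_shared_info;
	struct vcpu_info *v = &s->vcpu_info[smp_processor_id()];

	if (!sync_test_and_clear_bit(port, &s->evtchn_mask[0]))
		return;		/* already unmasked: nothing to do */

	if (sync_test_bit(port, &s->evtchn_pending[0]) &&
	    !sync_test_and_set_bit(port / BITS_PER_LONG,
				   &v->evtchn_pending_sel))
		v->evtchn_upcall_pending = 1;
}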
Keir Fraser
2010-Aug-30 08:48 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On 30/08/2010 09:43, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 30.08.10 at 10:03, "Jan Beulich" <JBeulich@novell.com> wrote:
>> Helpful would be if the function returned whether it actually went
>> through the mask/unmask pair, as I'm not sure the double
>> unmasking really is a good idea, especially in the PIRQ case (for
>> the moment I'm considering putting an already-unmasked check
>> into both ->eoi() handlers).
>
> Actually, it seems to me that this check really (also) belongs into
> unmask_evtchn(). Keir (I think you wrote this code originally), is
> there a reason (other than the implied assumption that it won't
> get called on an already unmasked event channel) the function
> uses sync_clear_bit() rather than sync_test_and_clear_bit(),
> doing the other actions only if the bit wasn't already clear, just
> like Xen itself does for the respective hypercall implementation?

Well, in 2.6.18 it was at least very unlikely that unmask_evtchn() would
be called on an already-unmasked port. And the implementation of
unmask_evtchn() is safe in the case that it is called for an
already-unmasked port -- it just does a bit more work than necessary in
that case.

 -- Keir
Jan Beulich
2010-Aug-30 09:06 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 30.08.10 at 10:48, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Well, in 2.6.18 it was at least very unlikely that unmask_evtchn() would be
> called on an already-unmasked port. And the implementation of
> unmask_evtchn() is safe in the case that it is called for an
> already-unmasked port -- it just does a bit more work than necessary in
> that case.

... plus possibly causes an unnecessary upcall.

However, do you also think that pirq_unmask_and_notify() is safe
to be called twice? I would think the double EOI potentially sent to
Xen could lead to an interrupt getting ack-ed that didn't even get
started to be serviced yet. And this, afaict, can happen in 2.6.18
as well (ack_pirq() -> move_native_irq() -> disable_pirq()/
enable_pirq() -> pirq_unmask_and_notify() followed by end_pirq()
-> pirq_unmask_and_notify()). Here, however, you couldn't even
use the mask bit to detect the situation, since the masking only
happens after already having called move_native_irq() (i.e. the
event channel will be masked when you get into
pirq_unmask_and_notify() the second time).

Jan
Keir Fraser
2010-Aug-30 09:15 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On 30/08/2010 10:06, "Jan Beulich" <JBeulich@novell.com> wrote:

> However, do you also think that pirq_unmask_and_notify() is safe
> to be called twice? I would think the double EOI potentially sent to
> Xen could lead to an interrupt getting ack-ed that didn't even get
> started to be serviced yet.

Erm, well if this is a race that happens only occasionally, does it
matter? Worst case you get another interrupt straight away. Only a
problem if it happens often enough to cause a performance issue or even
livelock interrupt storm.

The obvious fix would be for the kernel to privately keep track of which
event channels or pirqs are masked and/or disabled (e.g., with two
bitflags per port). Then have the evtchn_mask flag be the OR of the two.
If this is actually a real problem at all. I doubt move_native_irq()
should be doing work very often when it is called.

 -- Keir

> And this, afaict, can happen in 2.6.18
> as well (ack_pirq() -> move_native_irq() -> disable_pirq()/
> enable_pirq() -> pirq_unmask_and_notify() followed by end_pirq()
> -> pirq_unmask_and_notify()). Here, however, you couldn't even
> use the mask bit to detect the situation, since the masking only
> happens after already having called move_native_irq() (i.e. the
> event channel will be masked when you get into
> pirq_unmask_and_notify() the second time).
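Keir's two-bitflag idea, sketched as guest-side bookkeeping (hypothetical
helper names and bitmap size; nothing like this exists in the trees being
discussed):

/*
 * Hypothetical sketch of the suggestion: track "masked" and "disabled"
 * separately in the guest and only drive the shared evtchn_mask bit
 * from the OR of the two.
 */
#define SKETCH_NR_PORTS 4096	/* assumed port-space size */

static DECLARE_BITMAP(port_masked, SKETCH_NR_PORTS);
static DECLARE_BITMAP(port_disabled, SKETCH_NR_PORTS);

static void sync_shared_mask(int port)
{
	if (test_bit(port, port_masked) || test_bit(port, port_disabled))
		mask_evtchn(port);
	else
		unmask_evtchn(port);	/* unmask only when neither flag is set */
}

static void set_port_disabled(int port, bool disabled)
{
	if (disabled)
		set_bit(port, port_disabled);
	else
		clear_bit(port, port_disabled);

	sync_shared_mask(port);
}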
Jan Beulich
2010-Aug-30 09:22 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 30.08.10 at 11:15, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 30/08/2010 10:06, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>> However, do you also think that pirq_unmask_and_notify() is safe
>> to be called twice? I would think the double EOI potentially sent to
>> Xen could lead to an interrupt getting ack-ed that didn't even get
>> started to be serviced yet.
>
> Erm, well if this is a race that happens only occasionally, does it matter?
> Worst case you get another interrupt straight away. Only a problem if it
> happens often enough to cause a performance issue or even livelock
> interrupt storm.

I wasn't worried about the level case, but rather the edge one. I
forgot, however, that edge ones are ACKTYPE_NONE in Xen, so indeed
there shouldn't be a problem unless IRQs get moved around at a very
high rate (which clearly they shouldn't).

Jan
Jeremy Fitzhardinge
2010-Aug-30 16:36 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On 08/30/2010 01:03 AM, Jan Beulich wrote:
> This totals to it having been wrong to break up ->mask() and ->ack() -
> when using the combined ->mask_ack() (like in XCP etc) it would have
> been pretty obvious that move_native_irq() cannot go between
> mask() and ack().

Probably, but I just viewed the mask_ack() as an optimisation that
didn't need to be addressed on the first pass.

I converted the basic event channels to fasteoi and it seems to work OK.
I'll look at pirq today.

> For us, using fasteoi, move_native_irq() sits in ->eoi(), before
> un-masking. One could, as Jeremy suggests, call move_masked_irq()
> here, but I didn't want to duplicate the IRQ_DISABLED check done
> in move_native_irq(), mainly to not depend on following potential
> future changes (additions) to the set of conditions checked there.

Is there actually a problem with moving a IRQ_DISABLED interrupt?  If
so, shouldn't that IRQ_DISABLED check also be in move_masked_irq()?

    J
Jan Beulich
2010-Aug-31 06:38 UTC
Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
>>> On 30.08.10 at 18:36, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 08/30/2010 01:03 AM, Jan Beulich wrote:
>> For us, using fasteoi, move_native_irq() sits in ->eoi(), before
>> un-masking. One could, as Jeremy suggests, call move_masked_irq()
>> here, but I didn't want to duplicate the IRQ_DISABLED check done
>> in move_native_irq(), mainly to not depend on following potential
>> future changes (additions) to the set of conditions checked there.
>
> Is there actually a problem with moving a IRQ_DISABLED interrupt?  If
> so, shouldn't that IRQ_DISABLED check also be in move_masked_irq()?

I don't know, all I do know is that it initially wasn't that way, but
got changed to this at some point. Maybe it's more like being pointless
to move a disabled interrupt?

Jan
On 08/25/2010 12:52 AM, Jan Beulich wrote:
> I do however agree that using handle_level_irq() is problematic
> (see http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01178.html),
> but as said there I think using the fasteoi logic is preferable.

I've been looking at this again.  For non-pirq interrupts, fasteoi seems
like a solid win.  It looks like an overall simplification and I haven't
seen any problems.

However, I've had more trouble extending this to pirq.  My first attempt
appeared to work in emulation, but when I run it on real hardware, msi
interrupts are not getting through.  If I boot with "pci=nomsi" then it
sometimes works, but it often crashes Xen (see separate mail).

Part of the problem is that I'm not really sure what the various
irq_chip functions are really supposed to do, and the documentation is
awful.  .startup and .shutdown I understand, and I think they're being
called when we expect them to be (ie, when the driver registers an irq
for the first time).

Using .startup/.shutdown for enable/disable seems very heavyweight.  Do
we really want to be rebinding the pirq each time?  Isn't unmask/masking
the event channel sufficient?

At the moment my xen_evtchn_do_upcall() is masking and clearing the
event channel before calling into generic_handle_irq_desc(), which will
call handle_fasteoi_irq fairly directly.  That runs straight through and
the pirq_chip's eoi just does an EOI on the pirq if Xen says it needs
one.

But apparently this isn't enough.  Is there anything else I should be
doing?

(I just implemented PHYSDEVOP_pirq_eoi_gmfn method of getting the pirq
eoi flags, but I haven't tested it yet.  I'm also not really sure what
the advantage of it is.)

Thanks,
	J
>>> On 03.09.10 at 20:46, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Using .startup/.shutdown for enable/disable seems very heavyweight. Do
> we really want to be rebinding the pirq each time? Isn't unmask/masking
> the event channel sufficient?

Depends - the original (2.6.18) implementation only makes enable_pirq()
a conditional startup (and really startup_pirq() is conditional too),
while disable_pirq() does nothing at all. While forward porting,
considering the contexts in which ->disable() gets called (namely
note_interrupt()) and after initially having had no
->enable()/->disable() methods at all (for default_enable() calling
->unmask() anyway, and default_disable() being a no-op as much as
disable_pirq() was), I got to the conclusion that it might be better to
do a full shutdown/startup, since it isn't known whether a disable is
permanent or just temporary.

Now part of the question whether this is actually a good choice is
why default_disable() doesn't mask the affected IRQ - likely because
IRQ_DISABLED is checked for and handled accordingly in all non-
trivial flow handlers.

The other aspect is that with the (original) switch to use
handle_level_irq() we noticed at some point that the disabling of
bad IRQs (where e.g. storms are noticed) didn't work anymore,
due to that logic sitting in ->end(), which doesn't usually get
called at all (nor does any other method except ->unmask() for
the level case). Right now I don't really remember whether making
->disable() an alias of shutdown wasn't just a (failed iirc) attempt
at getting Xen to know of the need to shut down such a bad IRQ.
After the switch to fasteoi this logic should now be working again
independent of what ->disable() does, so I will have to consider
un-aliasing disable_pirq() and shutdown_pirq() again.

> At the moment my xen_evtchn_do_upcall() is masking and clearing the
> event channel before calling into generic_handle_irq_desc(), which will
> call handle_fasteoi_irq fairly directly. That runs straight through and
> the pirq_chip's eoi just does an EOI on the pirq if Xen says it needs one.
>
> But apparently this isn't enough. Is there anything else I should be doing?

I can't see anything, and our kernel also doesn't.

> (I just implemented PHYSDEVOP_pirq_eoi_gmfn method of getting the pirq
> eoi flags, but I haven't tested it yet. I'm also not really sure what
> the advantage of it is.)

This is for you to avoid the EOI hypercall if it would be a no-op in
Xen anyway.

Jan
Jeremy Fitzhardinge
2010-Sep-07 01:53 UTC
[Xen-devel] Re: Using handle_fasteoi_irq for pirqs
On 09/06/2010 05:58 PM, Jan Beulich wrote:
> >>> On 03.09.10 at 20:46, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> Using .startup/.shutdown for enable/disable seems very heavyweight. Do
>> we really want to be rebinding the pirq each time? Isn't unmask/masking
>> the event channel sufficient?
> Depends - the original (2.6.18) implementation only makes enable_pirq()
> a conditional startup (and really startup_pirq() is conditional too), while
> disable_pirq() does nothing at all. While forward porting, considering
> the contexts in which ->disable() gets called (namely note_interrupt())
> and after initially having had no ->enable()/->disable() methods at all
> (for default_enable() calling ->unmask() anyway, and default_disable()
> being a no-op as much as disable_pirq() was), I got to the conclusion
> that it might be better to do a full shutdown/startup, since it isn't
> known whether a disable is permanent or just temporary.
>
> Now part of the question whether this is actually a good choice is
> why default_disable() doesn't mask the affected IRQ - likely because
> IRQ_DISABLED is checked for and handled accordingly in all non-
> trivial flow handlers.
>
> The other aspect is that with the (original) switch to use
> handle_level_irq() we noticed at some point that the disabling of
> bad IRQs (where e.g. storms are noticed) didn't work anymore,
> due to that logic sitting in ->end(), which doesn't usually get
> called at all (nor does any other method except ->unmask() for
> the level case). Right now I don't really remember whether making
> ->disable() an alias of shutdown wasn't just a (failed iirc) attempt
> at getting Xen to know of the need to shut down such a bad IRQ.
> After the switch to fasteoi this logic should now be working again
> independent of what ->disable() does, so I will have to consider
> un-aliasing disable_pirq() and shutdown_pirq() again.

At the moment, I'm using this:

static struct irq_chip xen_pirq_chip __read_mostly = {
	.name		= "xen-pirq",

	.startup	= startup_pirq,
	.shutdown	= shutdown_pirq,

	.enable		= pirq_eoi,
	.unmask		= unmask_irq,

	.disable	= mask_irq,
	.mask		= mask_irq,

	.eoi		= ack_pirq,
	.end		= end_pirq,

	.set_affinity	= set_affinity_irq,

	.retrigger	= retrigger_irq,
};

which seems to work OK now.  The "enable=pirq_eoi" is essentially the
same as "enable=startup_pirq", because your startup_pirq just does an
EOI if the evtchn is already valid (and EOI always ends up unmasking
too).  ack_pirq and pirq_eoi are almost the same, except the former also
does the call to move_masked_irq().

>> At the moment my xen_evtchn_do_upcall() is masking and clearing the
>> event channel before calling into generic_handle_irq_desc(), which will
>> call handle_fasteoi_irq fairly directly. That runs straight through and
>> the pirq_chip's eoi just does an EOI on the pirq if Xen says it needs one.
>>
>> But apparently this isn't enough. Is there anything else I should be doing?
> I can't see anything, and our kernel also doesn't.

Where's the source to your kernel?  The one I looked at most recently
was using handle_level_irq for everything.

But anyway, I found my bug; I'd overlooked where MSI interrupts were
being set up, and they were still using handle_edge_irq; changing them
to fasteoi seems to have done the trick.

>> (I just implemented PHYSDEVOP_pirq_eoi_gmfn method of getting the pirq
>> eoi flags, but I haven't tested it yet. I'm also not really sure what
>> the advantage of it is.)
> This is for you to avoid the EOI hypercall if it would be a no-op in
> Xen anyway.

Yes, but there's also PHYSDEVOP_irq_status_query call.  Does the "needs
EOI" actually change from moment to moment in a way where having a
shared page makes sense?

	J
>>> On 07.09.10 at 03:53, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 09/06/2010 05:58 PM, Jan Beulich wrote:
>> >>> On 03.09.10 at 20:46, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Where's the source to your kernel? The one I looked at most recently
> was using handle_level_irq for everything.

Indeed, I did the switch in the master (and SLE11 SP1) kernels only
relatively recently (and only the master one is already visible to the
outside) - look at either
ftp://ftp.suse.com/pub/projects/kernel/kotd/master/ or (un-expanded
tree of patches)
http://gitorious.org/opensuse/kernel-source/trees/master.

>>> (I just implemented PHYSDEVOP_pirq_eoi_gmfn method of getting the pirq
>>> eoi flags, but I haven't tested it yet. I'm also not really sure what
>>> the advantage of it is.)
>> This is for you to avoid the EOI hypercall if it would be a no-op in
>> Xen anyway.
>
> Yes, but there's also PHYSDEVOP_irq_status_query call. Does the "needs
> EOI" actually change from moment to moment in a way where having a
> shared page makes sense?

No, it doesn't - using this function you can of course maintain the
bitmap on your own (which we also fall back to if PHYSDEVOP_pirq_eoi_gmfn
is not supported by the hypervisor). The actual advantage of using
PHYSDEVOP_pirq_eoi_gmfn is that it implies an unmask of the
corresponding event channel - see
http://xenbits.xensource.com/xen-unstable.hg?rev/c820bf73a914.

Also, regarding your earlier question on .disable - I just ran across
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/51b2b0d0921c,
which makes me think that .enable indeed should remain an alias of
.startup, but I also think that .disable should nevertheless do the
masking (i.e. be an alias of .mask).

Jan
Jeremy Fitzhardinge
2010-Sep-07 08:02 UTC
[Xen-devel] Re: Using handle_fasteoi_irq for pirqs
On 09/07/2010 04:58 PM, Jan Beulich wrote:
>> Yes, but there's also PHYSDEVOP_irq_status_query call. Does the "needs
>> EOI" actually change from moment to moment in a way where having a
>> shared page makes sense?
> No, it doesn't - using this function you can of course maintain the
> bitmap on your own (which we also fall back to if PHYSDEVOP_pirq_eoi_gmfn
> is not supported by the hypervisor). The actual advantage of using
> PHYSDEVOP_pirq_eoi_gmfn is that it implies an unmask of the
> corresponding event channel - see
> http://xenbits.xensource.com/xen-unstable.hg?rev/c820bf73a914.

Yes, though it seems like an odd way of implementing it.  The shared
memory region for EOI flags looks like overkill when all it saves is one
hypercall on pirq setup, and making that also have the side-effect of
changing an existing hypercall's behaviour is surprising.  Too late now,
I suppose, but it would seem that just adding a new PHYSDEVOP_eoi_unmask
hypercall would have been a simpler approach.  (But unmask typically
doesn't need a hypercall anyway, unless you've managed to get into a
cross-cpu unmask situation, so it doesn't seem like a big saving?)

Also, in the restore path, the code does a BUG if it fails to call
PHYSDEVOP_pirq_eoi_gmfn again.  Couldn't it just clear
pirq_eoi_does_unmask (and re-initialize all the needs-eoi flags using
PHYSDEVOP_irq_status_query) and carry on the old way?

But the comment in PHYSDEVOP_irq_status_query says:

    /*
     * Even edge-triggered or message-based IRQs can need masking from
     * time to time. If teh guest is not dynamically checking for this
     * via the new pirq_eoi_map mechanism, it must conservatively always
     * execute the EOI hypercall. In practice, this only really makes a
     * difference for maskable MSI sources, and if those are supported
     * then dom0 is probably modern anyway.
     */

which suggests that the "needs EOI" status for each pirq can change
after the pirq has been initialized (or if not, why isn't using
PHYSDEVOP_irq_status_query sufficient?).  OTOH, if it can change
spontaneously, then it all seems a bit racy...

IOW, what does "from time to time" mean?

> Also, regarding your earlier question on .disable - I just ran across
> http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/51b2b0d0921c,
> which makes me think that .enable indeed should remain an alias of
> .startup, but I also think that .disable should nevertheless do the
> masking (i.e. be an alias of .mask)

That particular change has the strange asymmetry of making .enable do a
startup (which only does eoi if the event channel is still valid),
.disable a no-op, and .end shuts the pirq down iff the irq is pending
and disabled.  I see how this works in the context of the old __do_IRQ
stuff in 2.6.18 though.

What's the specific deadlock mentioned in the commit comment?  Is that
still an issue with Xen's auto-EOI-after-timeout behaviour?

Thanks,
	J
>>> On 07.09.10 at 10:02, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 09/07/2010 04:58 PM, Jan Beulich wrote:
> Also, in the restore path, the code does a BUG if it fails to call
> PHYSDEVOP_pirq_eoi_gmfn again. Couldn't it just clear
> pirq_eoi_does_unmask (and re-initialize all the needs-eoi flags using
> PHYSDEVOP_irq_status_query) and carry on the old way?

That would seem possible, yes.

> But the comment in PHYSDEVOP_irq_status_query says:
>
>     /*
>      * Even edge-triggered or message-based IRQs can need masking from
>      * time to time. If teh guest is not dynamically checking for this
>      * via the new pirq_eoi_map mechanism, it must conservatively always
>      * execute the EOI hypercall. In practice, this only really makes a
>      * difference for maskable MSI sources, and if those are supported
>      * then dom0 is probably modern anyway.
>      */
>
> which suggests that the "needs EOI" status for each pirq can change
> after the pirq has been initialized (or if not, why isn't using
> PHYSDEVOP_irq_status_query sufficient?). OTOH, if it can change
> spontaneously, then it all seems a bit racy...
>
> IOW, what does "from time to time" mean?

Just look at the places where set_pirq_eoi() gets called in
xen/arch/x86/irq.c: The flag (obviously) always gets set for IRQs that
aren't ACKTYPE_NONE, but there is an additional case in __do_IRQ_guest()
where Xen wants the notification to re-enable the interrupt after
disabling it due to all possible handler domains having it pending
already.

And you see that Xen would force you to always execute the EOI
notification hypercall without the shared page
(PHYSDEVOP_irq_status_query setting XENIRQSTAT_needs_eoi
unconditionally), so using it you save a hypercall on each interrupt
that doesn't need an EOI (namely MSI-X and maskable MSI; I don't think
edge triggered ones are very important), and if you (as usual) don't
need to unmask via hypercall, you don't need any hypercall at all in
the IRQ exit path in these cases.

>> Also, regarding your earlier question on .disable - I just ran across
>> http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/51b2b0d0921c,
>> which makes me think that .enable indeed should remain an alias of
>> .startup, but I also think that .disable should nevertheless do the
>> masking (i.e. be an alias of .mask)
>
> That particular change has the strange asymmetry of making .enable do a
> startup (which only does eoi if the event channel is still valid),
> .disable a no-op, and .end shuts the pirq down iff the irq is pending
> and disabled. I see how this works in the context of the old __do_IRQ
> stuff in 2.6.18 though.

And it should similarly be fine with handle_fasteoi_irq() afaict,
provided .end and .eoi reference the same function. The asymmetry was
part of what made me change .disable to alias .shutdown.

> What's the specific deadlock mentioned in the commit comment? Is that
> still an issue with Xen's auto-EOI-after-timeout behaviour?

Honestly, I don't know. Keir?

Jan
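For readers following the PHYSDEVOP_pirq_eoi_gmfn discussion, the guest
side of the mechanism amounts to roughly the sketch below -- illustration
only, with names and details that differ between the trees involved; the
fallback-to-PHYSDEVOP_irq_status_query bookkeeping is not shown.

/* Rough guest-side sketch of the pirq_eoi_gmfn mechanism; not real code. */
static unsigned long *pirq_needs_eoi_bits;	/* page shared with Xen */

static void register_pirq_eoi_map_sketch(void)
{
	struct physdev_pirq_eoi_gmfn eoi_gmfn;

	pirq_needs_eoi_bits = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
	eoi_gmfn.gmfn = virt_to_mfn(pirq_needs_eoi_bits);

	/* On failure, fall back to PHYSDEVOP_irq_status_query bookkeeping. */
	if (HYPERVISOR_physdev_op(PHYSDEVOP_pirq_eoi_gmfn, &eoi_gmfn))
		pirq_needs_eoi_bits = NULL;
}

static void pirq_eoi_sketch(int pirq)
{
	struct physdev_eoi eoi = { .irq = pirq };

	/* Xen keeps the bit set only while an EOI notification is needed. */
	if (!pirq_needs_eoi_bits || test_bit(pirq, pirq_needs_eoi_bits))
		HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
}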