All, I have a laptop here that has some interrupt trouble, unrelated to Xen (I believe it''s a bug in the ACPI tables, but I digress). On the bare-metal kernel, if 999,900 out of 100,000 interrupts go un-acknowledged to a particular interrupt line, the kernel disables that interrupt line. In the Xen kernel, the same logic applies. However, when the Xen kernel goes to disable the physical interrupt, it only masks out the event channel: static void disable_pirq(unsigned int irq) { int evtchn = evtchn_from_irq(irq); if (VALID_EVTCHN(evtchn)) mask_evtchn(evtchn); } And at this point, *all* interrupts on the affected machine seem to stop, not just this physical IRQ line. I believe the problem is that when we go to disable a PIRQ, we actually need to get the HV to mask it out on the IOAPIC. This patch does exactly that; on disable_pirq(), we actually just call out to shutdown_pirq(), which ends up hypercalling and getting the HV to mask out the interrupt. On the other side, I needed to modify enable_pirq() to call out to startup_pirq(), so that it would actually allocate the event channel. With this patch in place, the affected laptop no longer hangs up when the kernel decides to disable that particular interrupt line. Signed-off-by: Chris Lalancette <clalance@redhat.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Not sure if the change is a bit over-kill, since enable_pirq is has void return type, while startup_pirq is "int" return type, with possibility to fail. For example, in following situation, the startup_pirq may fail : 1) when startup_pirq again, fail to get free port, 2) if another domain try to bind the pirq with BIND_PIRQ_WILL_SHARE cleared (like to probing, will it happen?) between the shutdown_pirq/startup_pirq sequence. -- Yunhong Jiang xen-devel-bounces@lists.xensource.com <> wrote:> All, > I have a laptop here that has some interrupt trouble, > unrelated to Xen (I > believe it''s a bug in the ACPI tables, but I digress). On thebare-metal> kernel, if 999,900 out of 100,000 interrupts go > un-acknowledged to a particular > interrupt line, the kernel disables that interrupt line. In > the Xen kernel, the > same logic applies. However, when the Xen kernel goes to > disable the physical > interrupt, it only masks out the event channel: > > static void disable_pirq(unsigned int irq) > { > int evtchn = evtchn_from_irq(irq); > > if (VALID_EVTCHN(evtchn)) > mask_evtchn(evtchn); > } > > And at this point, *all* interrupts on the affected machine > seem to stop, not > just this physical IRQ line. I believe the problem is that > when we go to > disable a PIRQ, we actually need to get the HV to mask it out > on the IOAPIC. > This patch does exactly that; on disable_pirq(), we actually > just call out to > shutdown_pirq(), which ends up hypercalling and getting the HV > to mask out the > interrupt. On the other side, I needed to modify > enable_pirq() to call out to > startup_pirq(), so that it would actually allocate the event channel. > > With this patch in place, the affected laptop no longer hangs > up when the kernel > decides to disable that particular interrupt line. > > Signed-off-by: Chris Lalancette <clalance@redhat.com>_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong wrote:> Not sure if the change is a bit over-kill, since enable_pirq is has void > return type, while startup_pirq is "int" return type, with possibility > to fail.Thanks for looking! This is true, startup_pirq() *could* fail; but if you notice in the code, it doesn''t actually have anything but a "return 0", so it doesn''t report errors currently anyway.> > For example, in following situation, the startup_pirq may fail : 1) when > startup_pirq again, fail to get free port, 2) if another domain try to > bind the pirq with BIND_PIRQ_WILL_SHARE cleared (like to probing, will > it happen?) between the shutdown_pirq/startup_pirq sequence.Yes, you are right, this can happen if another domain is probing. However, I''m not sure that it is any different from when you are calling ->startup() for the first time; you will just fail to get the event channel. Without introducing another event channel op (which seems like a LOT of overkill), I''m not aware of another way of asking the HV to mask out that IRQ on the IOAPIC. Chris Lalancette _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
It''s a slightly tricky one. I was actually thinking before about adding explicit pirq mask/unmask physdev_op hypercalls, but actually doing it with startup/shutdown seems maybe more elegant. I need to read the code a bit more closely and check we can''t lose edge-triggered interrupts this way, or anything like that. But I think the higher-level irq code deals with that. -- Keir On 15/11/07 14:10, "Chris Lalancette" <clalance@redhat.com> wrote:> Jiang, Yunhong wrote: >> Not sure if the change is a bit over-kill, since enable_pirq is has void >> return type, while startup_pirq is "int" return type, with possibility >> to fail. > > Thanks for looking! > > This is true, startup_pirq() *could* fail; but if you notice in the code, it > doesn''t actually have anything but a "return 0", so it doesn''t report errors > currently anyway. > >> >> For example, in following situation, the startup_pirq may fail : 1) when >> startup_pirq again, fail to get free port, 2) if another domain try to >> bind the pirq with BIND_PIRQ_WILL_SHARE cleared (like to probing, will >> it happen?) between the shutdown_pirq/startup_pirq sequence. > > Yes, you are right, this can happen if another domain is probing. However, > I''m > not sure that it is any different from when you are calling ->startup() for > the > first time; you will just fail to get the event channel. Without introducing > another event channel op (which seems like a LOT of overkill), I''m not aware > of > another way of asking the HV to mask out that IRQ on the IOAPIC. > > Chris Lalancette > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Yes, seems the startup_irq''s return value does not matter, even to request_irq(). -- Yunhong Jiang Chris Lalancette <mailto:clalance@redhat.com> wrote:> Jiang, Yunhong wrote: >> Not sure if the change is a bit over-kill, since enable_pirq is hasvoid>> return type, while startup_pirq is "int" return type, withpossibility>> to fail. > > Thanks for looking! > > This is true, startup_pirq() *could* fail; but if you notice > in the code, it > doesn''t actually have anything but a "return 0", so it doesn''t reporterrors> currently anyway. > >> >> For example, in following situation, the startup_pirq may fail : 1)when>> startup_pirq again, fail to get free port, 2) if another domain tryto>> bind the pirq with BIND_PIRQ_WILL_SHARE cleared (like to probing,will>> it happen?) between the shutdown_pirq/startup_pirq sequence. > > Yes, you are right, this can happen if another domain is probing.However,> I''m not sure that it is any different from when you are calling->startup()> for the first time; you will just fail to get the event channel.Without> introducing another event channel op (which seems like a LOT ofoverkill),> I''m not aware of another way of asking the HV to mask out that IRQ onthe> IOAPIC. > > Chris Lalancette_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Chris, Could you try the alternative attached patch? It''s not been build or run tested, but hopefully you can see the gist of what''s going on. Basically, in case the irq is edge-triggered, I want to have an ''edge in hand'' before we actually shutdown the irq. -- Keir On 14/11/07 19:10, "Chris Lalancette" <clalance@redhat.com> wrote:> All, > I have a laptop here that has some interrupt trouble, unrelated to Xen (I > believe it''s a bug in the ACPI tables, but I digress). On the bare-metal > kernel, if 999,900 out of 100,000 interrupts go un-acknowledged to a > particular > interrupt line, the kernel disables that interrupt line. In the Xen kernel, > the > same logic applies. However, when the Xen kernel goes to disable the physical > interrupt, it only masks out the event channel: > > static void disable_pirq(unsigned int irq) > { > int evtchn = evtchn_from_irq(irq); > > if (VALID_EVTCHN(evtchn)) > mask_evtchn(evtchn); > } > > And at this point, *all* interrupts on the affected machine seem to stop, not > just this physical IRQ line. I believe the problem is that when we go to > disable a PIRQ, we actually need to get the HV to mask it out on the IOAPIC. > This patch does exactly that; on disable_pirq(), we actually just call out to > shutdown_pirq(), which ends up hypercalling and getting the HV to mask out the > interrupt. On the other side, I needed to modify enable_pirq() to call out to > startup_pirq(), so that it would actually allocate the event channel. > > With this patch in place, the affected laptop no longer hangs up when the > kernel > decides to disable that particular interrupt line. > > Signed-off-by: Chris Lalancette <clalance@redhat.com> > --- linux-2.6.18.noarch/drivers/xen/core/evtchn.c.orig > +++ linux-2.6.18.noarch/drivers/xen/core/evtchn.c > @@ -629,20 +629,12 @@ static void shutdown_pirq(unsigned int i > > static void enable_pirq(unsigned int irq) > { > - int evtchn = evtchn_from_irq(irq); > - > - if (VALID_EVTCHN(evtchn)) { > - unmask_evtchn(evtchn); > - pirq_unmask_notify(irq_to_pirq(irq)); > - } > + startup_pirq(irq); > } > > static void disable_pirq(unsigned int irq) > { > - int evtchn = evtchn_from_irq(irq); > - > - if (VALID_EVTCHN(evtchn)) > - mask_evtchn(evtchn); > + shutdown_pirq(irq); > } > > static void ack_pirq(unsigned int irq) > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser wrote:> Chris, > > Could you try the alternative attached patch? It''s not been build or run > tested, but hopefully you can see the gist of what''s going on. Basically, in > case the irq is edge-triggered, I want to have an ''edge in hand'' before we > actually shutdown the irq. >Keir, I was off at some training today; the laptop in question is at work. I''ll give the patch a whirl on Monday. Thanks, Chris Lalancette _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser wrote:> Chris, > > Could you try the alternative attached patch? It''s not been build or run > tested, but hopefully you can see the gist of what''s going on. Basically, in > case the irq is edge-triggered, I want to have an ''edge in hand'' before we > actually shutdown the irq. > > -- Keir >Keir, I just got the hardware this afternoon. I''ve been testing it the past couple of hours, and your new patch seems to work fine. That is, when it goes to disable the pIRQ, it doesn''t hang up the machine. Thanks, Chris Lalancette _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel