Tian, Kevin
2011-May-06 06:43 UTC
[Xen-devel] [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
x86: don''t unmask disabled irqs when migrating them it doesn''t make sense to mask/unmask a disabled irq when migrating it from offlined cpu to another, because it''s not expected to handle any instance of it. Current mask/set_affinity/unmask steps may trigger unexpected instance on disabled irq which then simply bug on when there is no handler for it. One failing example is observed in Xen. Xen pvops guest marks a special type of irqs as disabled, which are simply used as a notification mechanism to wake up blocked guest in event polling state. Absolutely unmask them may cause the notification instance instead injected into the guest and then cause trouble. Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Jan Beulich <JBeulich@novell.com> --- linux-2.6.39-rc6.orig/arch/x86/kernel/irq.c 2011-05-04 10:59:13.000000000 +0800 +++ linux-2.6.39-rc6/arch/x86/kernel/irq.c 2011-05-06 09:20:25.563963000 +0800 @@ -237,6 +237,7 @@ void fixup_irqs(void) for_each_irq_desc(irq, desc) { int break_affinity = 0; int set_affinity = 1; + int do_mask; const struct cpumask *affinity; if (!desc) @@ -268,7 +269,9 @@ void fixup_irqs(void) } chip = irq_data_get_irq_chip(data); - if (!irqd_can_move_in_process_context(data) && chip->irq_mask) + do_mask = !irqd_irq_disabled(data) && + !irqd_can_move_in_process_context(data) && chip->irq_mask; + if (do_mask) chip->irq_mask(data); if (chip->irq_set_affinity) @@ -276,7 +279,7 @@ void fixup_irqs(void) else if (!(warned++)) set_affinity = 0; - if (!irqd_can_move_in_process_context(data) && chip->irq_unmask) + if (do_mask) chip->irq_unmask(data); raw_spin_unlock(&desc->lock); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Thomas Gleixner
2011-May-06 09:59 UTC
[Xen-devel] Re: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On Fri, 6 May 2011, Tian, Kevin wrote:> x86: don''t unmask disabled irqs when migrating them > > it doesn''t make sense to mask/unmask a disabled irq when migrating it > from offlined cpu to another, because it''s not expected to handle > any instance of it. Current mask/set_affinity/unmask steps may trigger > unexpected instance on disabled irq which then simply bug on when there > is no handler for it. One failing example is observed in Xen. Xen pvopsSo there is no handler, why the heck is there an irq action? if (!irq_has_action(irq) .... continue; Should have caught an uninitialized interrupt. If Xen abuses interrupts that way, then it rightfully explodes. And we do not fix it by magic somewhere else.> guest marks a special type of irqs as disabled, which are simply usedAs I explained before several times, IRQF_DISABLED has absolutely nothing to do with it and pvops _CANNOT_ mark an interrupt disabled.> > chip = irq_data_get_irq_chip(data); > - if (!irqd_can_move_in_process_context(data) && chip->irq_mask) > + do_mask = !irqd_irq_disabled(data) && > + !irqd_can_move_in_process_context(data) && chip->irq_mask; > + if (do_mask) > chip->irq_mask(data);This is completely wrong. irqd_irq_disabled() is a status information which does not tell you whether the interrupt is actually masked at the hardware level because we do lazy interrupt hardware masking. So your change would keep the line unmasked at the hardware level for all interrupts which are in the lazy disabled state. The only conditional which is interesting is the unmask path and that''s a simple optimization and not a correctness problem. Thanks, tglx _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2011-May-06 12:54 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
> From: Thomas Gleixner > Sent: Friday, May 06, 2011 6:00 PM > > On Fri, 6 May 2011, Tian, Kevin wrote: > > x86: don''t unmask disabled irqs when migrating them > > > > it doesn''t make sense to mask/unmask a disabled irq when migrating it > > from offlined cpu to another, because it''s not expected to handle any > > instance of it. Current mask/set_affinity/unmask steps may trigger > > unexpected instance on disabled irq which then simply bug on when > > there is no handler for it. One failing example is observed in Xen. > > Xen pvops > > So there is no handler, why the heck is there an irq action? > > if (!irq_has_action(irq) .... > continue; > > Should have caught an uninitialized interrupt. If Xen abuses interrupts that way, > then it rightfully explodes. And we do not fix it by magic somewhere else.sorry that my bad description here. there does be a dummy handler registered on such irqs which simply throws out a BUG_ON when hit. I should just say such injection is not expected instead of no handler. :-)> > > guest marks a special type of irqs as disabled, which are simply used > > As I explained before several times, IRQF_DISABLED has absolutely nothing to > do with it and pvops _CANNOT_ mark an interrupt disabled.I have to admit that I need more study about whole interrupt sub-system, to better understand your explanation here. Also here again my description is not accurate enough. I meant that Xen pvops request the special irq with below flags: IRQF_DISABLED|IRQF_PERCPU|IRQF_NOBALANCING and then later explicitly disable it with disable_irq(). As you said that IRQF_DISABLED itself has nothing to do with it, and it''s the later disable_irq() which takes real effect because Xen event chip hooks this callback to mask the irq from the chip level.> > > > > chip = irq_data_get_irq_chip(data); > > - if (!irqd_can_move_in_process_context(data) && chip->irq_mask) > > + do_mask = !irqd_irq_disabled(data) && > > + !irqd_can_move_in_process_context(data) && chip->irq_mask; > > + if (do_mask) > > chip->irq_mask(data); > > This is completely wrong. irqd_irq_disabled() is a status information which does > not tell you whether the interrupt is actually masked at the hardware level > because we do lazy interrupt hardware masking. So your change would keep > the line unmasked at the hardware level for all interrupts which are in the lazy > disabled state.Got it.> > The only conditional which is interesting is the unmask path and that''s a simple > optimization and not a correctness problem. >So what''s your suggestion based on my updated information? Is there any interface I may take to differentiate above exception with normal case? Basically in Xen usage we want such irqs permanently disabled at the chip level. Or could we only do mask/unmask for irqs which are unmasked atm if as you said it''s just an optimization step? :-) Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2011-May-06 13:14 UTC
RE: [Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
> From: Tian, Kevin > Sent: Friday, May 06, 2011 8:55 PM > > > From: Thomas Gleixner > > Sent: Friday, May 06, 2011 6:00 PM > > > > On Fri, 6 May 2011, Tian, Kevin wrote: > > > x86: don''t unmask disabled irqs when migrating them > > > > > > it doesn''t make sense to mask/unmask a disabled irq when migrating > > > it from offlined cpu to another, because it''s not expected to handle > > > any instance of it. Current mask/set_affinity/unmask steps may > > > trigger unexpected instance on disabled irq which then simply bug on > > > when there is no handler for it. One failing example is observed in Xen. > > > Xen pvops > > > > So there is no handler, why the heck is there an irq action? > > > > if (!irq_has_action(irq) .... > > continue; > > > > Should have caught an uninitialized interrupt. If Xen abuses > > interrupts that way, then it rightfully explodes. And we do not fix it by magic > somewhere else. > > sorry that my bad description here. there does be a dummy handler registered > on such irqs which simply throws out a BUG_ON when hit. I should just say such > injection is not expected instead of no handler. :-) > > > > > > guest marks a special type of irqs as disabled, which are simply > > > used > > > > As I explained before several times, IRQF_DISABLED has absolutely > > nothing to do with it and pvops _CANNOT_ mark an interrupt disabled. > > I have to admit that I need more study about whole interrupt sub-system, to > better understand your explanation here. Also here again my description is not > accurate enough. I meant that Xen pvops request the special irq with below > flags: > IRQF_DISABLED|IRQF_PERCPU|IRQF_NOBALANCING > and then later explicitly disable it with disable_irq(). As you said that > IRQF_DISABLED itself has nothing to do with it, and it''s the later disable_irq() > which takes real effect because Xen event chip hooks this callback to mask the > irq from the chip level. > > > > > > > > > chip = irq_data_get_irq_chip(data); > > > - if (!irqd_can_move_in_process_context(data) && chip->irq_mask) > > > + do_mask = !irqd_irq_disabled(data) && > > > + !irqd_can_move_in_process_context(data) && > chip->irq_mask; > > > + if (do_mask) > > > chip->irq_mask(data); > > > > This is completely wrong. irqd_irq_disabled() is a status information > > which does not tell you whether the interrupt is actually masked at > > the hardware level because we do lazy interrupt hardware masking. So > > your change would keep the line unmasked at the hardware level for all > > interrupts which are in the lazy disabled state. > > Got it. > > > > > The only conditional which is interesting is the unmask path and > > that''s a simple optimization and not a correctness problem. > > > > So what''s your suggestion based on my updated information? Is there any > interface I may take to differentiate above exception with normal case? > Basically in Xen usage we want such irqs permanently disabled at the chip level. > Or could we only do mask/unmask for irqs which are unmasked atm if as you > said it''s just an optimization step? :-) >After another thought, I just got myself messed. This is not required by Xen actually, as the 1st patch to handle IRQF_PERCPU has been enough to cover the special requirement. Could you ack the 1st one if you agree? This 2nd patch is cooked upon the question whether generally a irq disabled at chip level should be unmasked at migration. If there''s no action registered, it won''t be migrated. The only question is if there does be action registered, and this irq is disabled at chip level, will there be any issue to force unmask it at migration, or should we only mask/unmask for a irq which is unmasked at the migration? Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Thomas Gleixner
2011-May-06 13:24 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On Fri, 6 May 2011, Tian, Kevin wrote:> > From: Thomas Gleixner > > Sent: Friday, May 06, 2011 6:00 PM > > > > On Fri, 6 May 2011, Tian, Kevin wrote: > > > x86: don''t unmask disabled irqs when migrating them > > > > > > it doesn''t make sense to mask/unmask a disabled irq when migrating it > > > from offlined cpu to another, because it''s not expected to handle any > > > instance of it. Current mask/set_affinity/unmask steps may trigger > > > unexpected instance on disabled irq which then simply bug on when > > > there is no handler for it. One failing example is observed in Xen. > > > Xen pvops > > > > So there is no handler, why the heck is there an irq action? > > > > if (!irq_has_action(irq) .... > > continue; > > > > Should have caught an uninitialized interrupt. If Xen abuses interrupts that way, > > then it rightfully explodes. And we do not fix it by magic somewhere else. > > sorry that my bad description here. there does be a dummy handler registered > on such irqs which simply throws out a BUG_ON when hit. I should just say such > injection is not expected instead of no handler. :-)So can please someone point me to that particular incarnation of nonsense and provide a reasonable explanation for this abuse? What is the point of an interrupt, which is permanently disabled, has a handler with a BUG() inside and an irqaction assigned ? What''s the purpose of this? Why is the irqaction there in the first place? To be called by some other weird means than by the irq handling code?> > The only conditional which is interesting is the unmask path and that''s a simple > > optimization and not a correctness problem. > > > > So what''s your suggestion based on my updated information? Is there any > interface I may take to differentiate above exception with normal case? Basically > in Xen usage we want such irqs permanently disabled at the chip level. Or > could we only do mask/unmask for irqs which are unmasked atm if as you said > it''s just an optimization step? :-)No we can make the unmask conditional on !irqd_irq_disabled() because that''s not violating any of the semantics. The interrupt would be masked anyway when it arrives and the handler code sees that it is lazy disabled. I mean real handler code, not the Xen abomination. The only valid reason why I''d apply that patch is that it avoids a potential extra interrupt, but not to prevent screwed up handlers from exploding. Thanks, tglx _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-May-06 14:04 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On Fri, 2011-05-06 at 14:24 +0100, Thomas Gleixner wrote:> On Fri, 6 May 2011, Tian, Kevin wrote: > > > From: Thomas Gleixner > > > Sent: Friday, May 06, 2011 6:00 PM > > > > > > On Fri, 6 May 2011, Tian, Kevin wrote: > > > > x86: don''t unmask disabled irqs when migrating them > > > > > > > > it doesn''t make sense to mask/unmask a disabled irq when migrating it > > > > from offlined cpu to another, because it''s not expected to handle any > > > > instance of it. Current mask/set_affinity/unmask steps may trigger > > > > unexpected instance on disabled irq which then simply bug on when > > > > there is no handler for it. One failing example is observed in Xen. > > > > Xen pvops > > > > > > So there is no handler, why the heck is there an irq action? > > > > > > if (!irq_has_action(irq) .... > > > continue; > > > > > > Should have caught an uninitialized interrupt. If Xen abuses interrupts that way, > > > then it rightfully explodes. And we do not fix it by magic somewhere else. > > > > sorry that my bad description here. there does be a dummy handler registered > > on such irqs which simply throws out a BUG_ON when hit. I should just say such > > injection is not expected instead of no handler. :-) > > So can please someone point me to that particular incarnation of > nonsense and provide a reasonable explanation for this abuse? > > What is the point of an interrupt, which is permanently disabled, has > a handler with a BUG() inside and an irqaction assigned ? > > What''s the purpose of this? Why is the irqaction there in the first > place? To be called by some other weird means than by the irq > handling code?The Xen PV spinlock code (arch/x86/xen/spinlock.c) allocates an IRQ (per-cpu lock_kicker_irq). I think it is there purely in order to have the associated underlying evtchn to use as the thing to poll (Xen has an evtchn poll hypercall, see xen_poll_irq()) on the slow path and kick on release. There is never any need to call a handler for that evtchn -- just notifying the evtchn is enough to wake the poller. The irq is setup using request_irq(). Is there a different API to register an IRQ without attaching a handler/action to it? (I can''t think why such a thing would exist). I''m not really sure why these can''t just be an evtchn without an associated IRQ since it doesn''t really have any interrupt-like semantics. Perhaps just a general desire to keep event channels abstracted into the core Xen event subsystem with IRQs as the public facing API? Jeremy? Ian.> > > The only conditional which is interesting is the unmask path and that''s a simple > > > optimization and not a correctness problem. > > > > > > > So what''s your suggestion based on my updated information? Is there any > > interface I may take to differentiate above exception with normal case? Basically > > in Xen usage we want such irqs permanently disabled at the chip level. Or > > could we only do mask/unmask for irqs which are unmasked atm if as you said > > it''s just an optimization step? :-) > > No we can make the unmask conditional on !irqd_irq_disabled() because > that''s not violating any of the semantics. The interrupt would be > masked anyway when it arrives and the handler code sees that it is > lazy disabled. I mean real handler code, not the Xen abomination. > > The only valid reason why I''d apply that patch is that it avoids a > potential extra interrupt, but not to prevent screwed up handlers from > exploding. > > Thanks, > > tglx_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-May-06 14:28 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On Fri, 6 May 2011, Tian, Kevin wrote:> > From: Thomas Gleixner > > Sent: Friday, May 06, 2011 6:00 PM > > > > On Fri, 6 May 2011, Tian, Kevin wrote: > > > x86: don''t unmask disabled irqs when migrating them > > > > > > it doesn''t make sense to mask/unmask a disabled irq when migrating it > > > from offlined cpu to another, because it''s not expected to handle any > > > instance of it. Current mask/set_affinity/unmask steps may trigger > > > unexpected instance on disabled irq which then simply bug on when > > > there is no handler for it. One failing example is observed in Xen. > > > Xen pvops > > > > So there is no handler, why the heck is there an irq action? > > > > if (!irq_has_action(irq) .... > > continue; > > > > Should have caught an uninitialized interrupt. If Xen abuses interrupts that way, > > then it rightfully explodes. And we do not fix it by magic somewhere else. > > sorry that my bad description here. there does be a dummy handler registered > on such irqs which simply throws out a BUG_ON when hit. I should just say such > injection is not expected instead of no handler. :-)I don''t think this patch is necessary anymore after the event channel handling cleanup patches I have just sent to the list. Could you please try the following two patches: http://marc.info/?l=linux-kernel&m=130468120032172&w=2 http://marc.info/?l=linux-kernel&m=130468178200468&w=2 and let me know if you still need this patch? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2011-May-06 21:43 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] > Sent: Friday, May 06, 2011 10:29 PM > > On Fri, 6 May 2011, Tian, Kevin wrote: > > > From: Thomas Gleixner > > > Sent: Friday, May 06, 2011 6:00 PM > > > > > > On Fri, 6 May 2011, Tian, Kevin wrote: > > > > x86: don''t unmask disabled irqs when migrating them > > > > > > > > it doesn''t make sense to mask/unmask a disabled irq when migrating > > > > it from offlined cpu to another, because it''s not expected to > > > > handle any instance of it. Current mask/set_affinity/unmask steps > > > > may trigger unexpected instance on disabled irq which then simply > > > > bug on when there is no handler for it. One failing example is observed in > Xen. > > > > Xen pvops > > > > > > So there is no handler, why the heck is there an irq action? > > > > > > if (!irq_has_action(irq) .... > > > continue; > > > > > > Should have caught an uninitialized interrupt. If Xen abuses > > > interrupts that way, then it rightfully explodes. And we do not fix it by magic > somewhere else. > > > > sorry that my bad description here. there does be a dummy handler > > registered on such irqs which simply throws out a BUG_ON when hit. I > > should just say such injection is not expected instead of no handler. > > :-) > > I don''t think this patch is necessary anymore after the event channel handling > cleanup patches I have just sent to the list. > Could you please try the following two patches: > > http://marc.info/?l=linux-kernel&m=130468120032172&w=2 > http://marc.info/?l=linux-kernel&m=130468178200468&w=2 > > and let me know if you still need this patch?thanks, and I''ll take a look at them. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-May-08 01:44 UTC
[Xen-devel] Re: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On 05/07/2011 12:04 AM, Ian Campbell wrote:> I''m not really sure why these can''t just be an evtchn without an > associated IRQ since it doesn''t really have any interrupt-like > semantics. Perhaps just a general desire to keep event channels > abstracted into the core Xen event subsystem with IRQs as the public > facing API? Jeremy?It doesn''t really need to be an irq. The main reason was so that it would appear in /proc/interrupts so I could use the counter as a "number of times a spinlock was kicked" counter. That could be exposed in some other way if being part of the interrupt infrastructure brings too much baggage with it. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2011-May-09 00:44 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org] > Sent: Sunday, May 08, 2011 9:45 AM > > On 05/07/2011 12:04 AM, Ian Campbell wrote: > > I'm not really sure why these can't just be an evtchn without an > > associated IRQ since it doesn't really have any interrupt-like > > semantics. Perhaps just a general desire to keep event channels > > abstracted into the core Xen event subsystem with IRQs as the public > > facing API? Jeremy? > > It doesn't really need to be an irq. The main reason was so that it would > appear in /proc/interrupts so I could use the counter as a "number of times a > spinlock was kicked" counter. That could be exposed in some other way if > being part of the interrupt infrastructure brings too much baggage with it. >Perhaps we don't need an irq binding here. Just like a local APIC interrupt source which only needs vector. Somehow the virq or vipi concept in Xen context is similar. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-May-09 01:45 UTC
[Xen-devel] Re: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On 05/09/2011 10:44 AM, Tian, Kevin wrote:>> It doesn''t really need to be an irq. The main reason was so that it would >> appear in /proc/interrupts so I could use the counter as a "number of times a >> spinlock was kicked" counter. That could be exposed in some other way if >> being part of the interrupt infrastructure brings too much baggage with it. >> > Perhaps we don''t need an irq binding here. Just like a local APIC interrupt > source which only needs vector. Somehow the virq or vipi concept in Xen > context is similar.An event channel is logically equivalent to a vector, so that would make sense. We currently allocate irqs for cross-cpu call and reschedule event channels, whereas native x86 simply uses a naked vector for those. But they are real interrupts, so an irq at least makes some logical sense in those cases. For spinlocks, the event channel is more like a usermode-level signal which is always blocked and only ever tested with sigsuspend (or is it sigpoll? something like that). J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2011-May-09 02:11 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] > Sent: Friday, May 06, 2011 10:29 PM > > On Fri, 6 May 2011, Tian, Kevin wrote: > > > From: Thomas Gleixner > > > Sent: Friday, May 06, 2011 6:00 PM > > > > > > On Fri, 6 May 2011, Tian, Kevin wrote: > > > > x86: don''t unmask disabled irqs when migrating them > > > > > > > > it doesn''t make sense to mask/unmask a disabled irq when migrating > > > > it from offlined cpu to another, because it''s not expected to > > > > handle any instance of it. Current mask/set_affinity/unmask steps > > > > may trigger unexpected instance on disabled irq which then simply > > > > bug on when there is no handler for it. One failing example is observed in > Xen. > > > > Xen pvops > > > > > > So there is no handler, why the heck is there an irq action? > > > > > > if (!irq_has_action(irq) .... > > > continue; > > > > > > Should have caught an uninitialized interrupt. If Xen abuses > > > interrupts that way, then it rightfully explodes. And we do not fix it by magic > somewhere else. > > > > sorry that my bad description here. there does be a dummy handler > > registered on such irqs which simply throws out a BUG_ON when hit. I > > should just say such injection is not expected instead of no handler. > > :-) > > I don''t think this patch is necessary anymore after the event channel handling > cleanup patches I have just sent to the list. > Could you please try the following two patches: > > http://marc.info/?l=linux-kernel&m=130468120032172&w=2 > http://marc.info/?l=linux-kernel&m=130468178200468&w=2 > > and let me know if you still need this patch?yes, with your patch this issue disappears, since you explicitly make mask/unmask as a nop for xen_percpu_chip, which effectively avoids them from undesired unmask when doing the migration. Though it works, it''s not intuitive as to me it''s an workaround to make Xen chip implementation adapting to specific fixup_irqs logic. If the logic within fixup_irqs is changed in the future, it''s still possible to expose this issue again. If we explicitly avoid the percpu irqs in the 1st place in fixup_irqs like what my 1st patch does, it''d be clearer. But based on your referred patchesand Jeremy''s info about possible unbinding xenn event from irq namespace, my patches are not that necessary now as before. So I''ll leave to Thomas to decide whether he wants to take my 1st patch. the 2nd one talked in this thread could be dropped. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-May-09 12:02 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On Mon, 9 May 2011, Tian, Kevin wrote:> yes, with your patch this issue disappears, since you explicitly make mask/unmask as > a nop for xen_percpu_chip, which effectively avoids them from undesired unmask > when doing the migration. Though it works, it''s not intuitive as to me it''s an > workaround to make Xen chip implementation adapting to specific fixup_irqs logic.I have been tring to follow the example of existing supported drivers. The only x86 driver I could find that uses handle_percpu_irq is uv_irq that does exatly the same thing. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Thomas Gleixner
2011-May-09 12:36 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On Mon, 9 May 2011, Stefano Stabellini wrote:> On Mon, 9 May 2011, Tian, Kevin wrote: > > yes, with your patch this issue disappears, since you explicitly make mask/unmask as > > a nop for xen_percpu_chip, which effectively avoids them from undesired unmask > > when doing the migration. Though it works, it''s not intuitive as to me it''s an > > workaround to make Xen chip implementation adapting to specific fixup_irqs logic. > > I have been tring to follow the example of existing supported drivers. > The only x86 driver I could find that uses handle_percpu_irq is uv_irq > that does exatly the same thing.Which is a good enough argument to make that change at the common code level instead of having fancy workarounds here and there. Thanks, tglx _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2011-May-10 03:24 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
> From: Thomas Gleixner [mailto:tglx@linutronix.de] > Sent: Monday, May 09, 2011 8:37 PM > > On Mon, 9 May 2011, Stefano Stabellini wrote: > > > On Mon, 9 May 2011, Tian, Kevin wrote: > > > yes, with your patch this issue disappears, since you explicitly > > > make mask/unmask as a nop for xen_percpu_chip, which effectively > > > avoids them from undesired unmask when doing the migration. Though > > > it works, it''s not intuitive as to me it''s an workaround to make Xen chip > implementation adapting to specific fixup_irqs logic. > > > > I have been tring to follow the example of existing supported drivers. > > The only x86 driver I could find that uses handle_percpu_irq is uv_irq > > that does exatly the same thing. > > Which is a good enough argument to make that change at the common code > level instead of having fancy workarounds here and there. >So Thomas, what''s your suggestion to continue here? Is my original patch to skip percpu irq in common code a good option to go, or you want a cleaner code in other form? Once it''s clear I''ll discuss with Stefano e.g. possibly merge with his cleanup patch series. :-) Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2011-May-18 23:49 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
> From: Tian, Kevin > Sent: Tuesday, May 10, 2011 11:24 AM > > > From: Thomas Gleixner [mailto:tglx@linutronix.de] > > Sent: Monday, May 09, 2011 8:37 PM > > > > On Mon, 9 May 2011, Stefano Stabellini wrote: > > > > > On Mon, 9 May 2011, Tian, Kevin wrote: > > > > yes, with your patch this issue disappears, since you explicitly > > > > make mask/unmask as a nop for xen_percpu_chip, which effectively > > > > avoids them from undesired unmask when doing the migration. Though > > > > it works, it''s not intuitive as to me it''s an workaround to make > > > > Xen chip > > implementation adapting to specific fixup_irqs logic. > > > > > > I have been tring to follow the example of existing supported drivers. > > > The only x86 driver I could find that uses handle_percpu_irq is > > > uv_irq that does exatly the same thing. > > > > Which is a good enough argument to make that change at the common code > > level instead of having fancy workarounds here and there. > > > > So Thomas, what''s your suggestion to continue here? Is my original patch to > skip percpu irq in common code a good option to go, or you want a cleaner code > in other form? Once it''s clear I''ll discuss with Stefano e.g. possibly merge with > his cleanup patch series. :-) >Hi, Thomas, any response on this? Sorry that you may explain your comments clearly in earlier thread, but I may not catch all of them in one place. I sent out two fixes here: [1/2] don''t move irq when it''s percpu type [2/2] don''t unmask irq when it''s disabled at irqchip level for [2/2], as you explained it''s legitimate to mask/unmask a disabled irq since from irqchip level mask/disable actually means same thing. The only possible gain to do that is to avoid a potential extra interrupt, which is a different purpose as what I originally target. So for [2/2] I think it''s not required now. for [1/2] I think it''s still necessary as it''s meaningless to migrate a percpu type irq. However Stefano has sent out a cleanup patch for Xen percpu irqchip which uses nop mask/unmask hack borrowed from uv machine to work around the issue. As you suggested it''s better to consolidate into the common place instead of scattering in different places. My view on this common logic is what [1/2] tries to address, is it correct? If yes, would you consider taking this patch? Stefano told me that his patches will go in in next merge window. So I think either you can take [1/2] now and then I''ll do cleanup after Stefano''s patch is in, or I can rebase my [1/2] after Stefano''s patch to clean both xen and uv parts. Let me know your thought. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-May-19 12:08 UTC
[Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On Thu, 19 May 2011, Tian, Kevin wrote:> for [1/2] I think it''s still necessary as it''s meaningless to migrate a percpu type irq. > However Stefano has sent out a cleanup patch for Xen percpu irqchip which uses > nop mask/unmask hack borrowed from uv machine to work around the issue. As > you suggested it''s better to consolidate into the common place instead of scattering > in different places. My view on this common logic is what [1/2] tries to address, is > it correct? If yes, would you consider taking this patch? Stefano told me that his > patches will go in in next merge window. So I think either you can take [1/2] now and > then I''ll do cleanup after Stefano''s patch is in, or I can rebase my [1/2] after Stefano''s > patch to clean both xen and uv parts.Actually I think Kevin''s generic patch is better too. If you ack it I''ll remove my patch right away from the queue (maybe I should remove it anyway?). Kevin probably needs to write a cleanup patch to remove the equivalent hack from the uv_irq. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-May-19 16:18 UTC
Re: [Xen-devel] RE: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them
On Thu, May 19, 2011 at 01:08:07PM +0100, Stefano Stabellini wrote:> On Thu, 19 May 2011, Tian, Kevin wrote: > > for [1/2] I think it''s still necessary as it''s meaningless to migrate a percpu type irq. > > However Stefano has sent out a cleanup patch for Xen percpu irqchip which uses > > nop mask/unmask hack borrowed from uv machine to work around the issue. As > > you suggested it''s better to consolidate into the common place instead of scattering > > in different places. My view on this common logic is what [1/2] tries to address, is > > it correct? If yes, would you consider taking this patch? Stefano told me that his > > patches will go in in next merge window. So I think either you can take [1/2] now and > > then I''ll do cleanup after Stefano''s patch is in, or I can rebase my [1/2] after Stefano''s > > patch to clean both xen and uv parts. > > Actually I think Kevin''s generic patch is better too. > If you ack it I''ll remove my patch right away from the queue (maybe I > should remove it anyway?).I dropped your patch. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jonathan Nieder
2011-Aug-29 04:15 UTC
[Xen-devel] [regression] Ideapad S10-3 does not wake up from suspend (Re: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them)
Hi, Lars Boegild Thomsen writes[1]:> After update from 2.6 kernel to 3.0 my Idepad S10-3 will not wake up after > sleep. Back to latest 2.6 kernel works fine.[...]> Upon wakeup, the power light go from slow flashing to on, the battery light > goes from off to on, the hdd light blink once and then everything is dead. > Nothing happens on the screen, all keys dead. The fan/hdd switch on > physically (very hard to hear on this model or I am getting deaf). > Ctrl+alt+del or the alt+sysreq is non-responsive. The only LED that show > keyboard status is CAPS lock and that is unresponsive too. Only way I have > found to get it rebooted is holding down the power button a few secs until it > switch physically off and then switch it on again.[...]> Here''s the result of the final bisect: > > 983bbf1af0664b78689612b247acb514300f62c7 is the first bad commit[...]> I also tried to go back to HEAD and manually change arch/x86/irq.c revert this > particular commit and it works.For reference:> commit 983bbf1af0664b78689612b247acb514300f62c7 > Author: Tian, Kevin <kevin.tian@intel.com> > Date: Fri May 6 14:43:56 2011 +0800 > > x86: Don''t unmask disabled irqs when migrating them > > It doesn''t make sense to unconditionally unmask a disabled irq when > migrating it from offlined cpu to another. If the irq triggers then it > will be disabled in the interrupt handler anyway. So we can just avoid > unmasking it. > > [ tglx: Made masking unconditional again and fixed the changelog ] > > Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com> > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > Cc: Ian Campbell <Ian.Campbell@citrix.com> > Cc: Jan Beulich <JBeulich@novell.com> > Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com> > Link: http://lkml.kernel.org/r/%3C625BA99ED14B2D499DC4E29D8138F1505C8ED7F7E3%40shsmsx502.ccr.corp.intel.com%3 > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c > index 544efe2741be..6c0802eb2f7f 100644 > --- a/arch/x86/kernel/irq.c > +++ b/arch/x86/kernel/irq.c > @@ -276,7 +276,8 @@ void fixup_irqs(void) > else if (!(warned++)) > set_affinity = 0; > > - if (!irqd_can_move_in_process_context(data) && chip->irq_unmask) > + if (!irqd_can_move_in_process_context(data) && > + !irqd_irq_disabled(data) && chip->irq_unmask) > chip->irq_unmask(data); > > raw_spin_unlock(&desc->lock);Known problem? Ideas? Regards, Jonathan [1] http://bugs.debian.org/635575 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dave Hansen
2011-Aug-31 01:04 UTC
[Xen-devel] Re: [regression] Ideapad S10-3 does not wake up from suspend (Re: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them)
On Sun, 2011-08-28 at 23:15 -0500, Jonathan Nieder wrote:> Lars Boegild Thomsen writes[1]: > > After update from 2.6 kernel to 3.0 my Idepad S10-3 will not wake up after > > sleep. Back to latest 2.6 kernel works fine. > [...] > > Upon wakeup, the power light go from slow flashing to on, the battery light > > goes from off to on, the hdd light blink once and then everything is dead. > > Nothing happens on the screen, all keys dead. The fan/hdd switch on > > physically (very hard to hear on this model or I am getting deaf). > > Ctrl+alt+del or the alt+sysreq is non-responsive. The only LED that show > > keyboard status is CAPS lock and that is unresponsive too. Only way I have > > found to get it rebooted is holding down the power button a few secs until it > > switch physically off and then switch it on again. > [...] > > Here''s the result of the final bisect: > > > > 983bbf1af0664b78689612b247acb514300f62c7 is the first bad commit > [...] > > I also tried to go back to HEAD and manually change arch/x86/irq.c revert this > > particular commit and it works. > > For reference: > > > commit 983bbf1af0664b78689612b247acb514300f62c7 > > Author: Tian, Kevin <kevin.tian@intel.com> > > Date: Fri May 6 14:43:56 2011 +0800 > > > > x86: Don''t unmask disabled irqs when migrating them > > > > It doesn''t make sense to unconditionally unmask a disabled irq when > > migrating it from offlined cpu to another. If the irq triggers then it > > will be disabled in the interrupt handler anyway. So we can just avoid > > unmasking it. > > > > [ tglx: Made masking unconditional again and fixed the changelog ] > > > > Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com> > > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > > Cc: Ian Campbell <Ian.Campbell@citrix.com> > > Cc: Jan Beulich <JBeulich@novell.com> > > Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com> > > Link: http://lkml.kernel.org/r/%3C625BA99ED14B2D499DC4E29D8138F1505C8ED7F7E3%40shsmsx502.ccr.corp.intel.com%3 > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > > > > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c > > index 544efe2741be..6c0802eb2f7f 100644 > > --- a/arch/x86/kernel/irq.c > > +++ b/arch/x86/kernel/irq.c > > @@ -276,7 +276,8 @@ void fixup_irqs(void) > > else if (!(warned++)) > > set_affinity = 0; > > > > - if (!irqd_can_move_in_process_context(data) && chip->irq_unmask) > > + if (!irqd_can_move_in_process_context(data) && > > + !irqd_irq_disabled(data) && chip->irq_unmask) > > chip->irq_unmask(data); > > > > raw_spin_unlock(&desc->lock); > > Known problem? Ideas? > [1] http://bugs.debian.org/635575cc''ing Len Brown who tried to fix this, but in different code: http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4731fdcf6f7bdab3e369a3f844d4ea4d4017284d I''m seeing the exact same symptoms on my S10-3, fwiw. They definitely don''t happen when intel_idle is compiled out or when intel_idle.max_cstate=0 is specified on the kernel command-line. -- Dave _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jonathan Nieder
2011-Aug-31 08:22 UTC
[Xen-devel] Re: [regression] Ideapad S10-3 does not wake up from suspend (Re: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them)
Hi Dave, Dave Hansen wrote:> I''m seeing the exact same symptoms on my S10-3, fwiw. They definitely > don''t happen when intel_idle is compiled out or when > intel_idle.max_cstate=0 is specified on the kernel command-line.Lars reminds me[1] that the kernel he was testing already had the intel_idle driver compiled out, so you''re probably seeing a different (possibly related) bug. (By the way, Lars, it is fine to communicate with lkml directly. :) The people there don''t bite.) Over and out, Jonathan [1] http://bugs.debian.org/635575#42 (.config attached to that bug in a later comment) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2011-Sep-01 06:24 UTC
[Xen-devel] RE: [regression] Ideapad S10-3 does not wake up from suspend (Re: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them)
> From: Jonathan Nieder [mailto:jrnieder@gmail.com] > Sent: Monday, August 29, 2011 12:16 PM > > Hi, > > Lars Boegild Thomsen writes[1]: > > > After update from 2.6 kernel to 3.0 my Idepad S10-3 will not wake up after > > sleep. Back to latest 2.6 kernel works fine. > [...] > > Upon wakeup, the power light go from slow flashing to on, the battery light > > goes from off to on, the hdd light blink once and then everything is dead. > > Nothing happens on the screen, all keys dead. The fan/hdd switch on > > physically (very hard to hear on this model or I am getting deaf). > > Ctrl+alt+del or the alt+sysreq is non-responsive. The only LED that show > > keyboard status is CAPS lock and that is unresponsive too. Only way I have > > found to get it rebooted is holding down the power button a few secs until it > > switch physically off and then switch it on again. > [...] > > Here''s the result of the final bisect: > > > > 983bbf1af0664b78689612b247acb514300f62c7 is the first bad commit > [...] > > I also tried to go back to HEAD and manually change arch/x86/irq.c revert this > > particular commit and it works. > > For reference: > > > commit 983bbf1af0664b78689612b247acb514300f62c7 > > Author: Tian, Kevin <kevin.tian@intel.com> > > Date: Fri May 6 14:43:56 2011 +0800 > > > > x86: Don''t unmask disabled irqs when migrating them > > > > It doesn''t make sense to unconditionally unmask a disabled irq when > > migrating it from offlined cpu to another. If the irq triggers then it > > will be disabled in the interrupt handler anyway. So we can just avoid > > unmasking it. > > > > [ tglx: Made masking unconditional again and fixed the changelog ] > > > > Signed-off-by: Fengzhe Zhang <fengzhe.zhang@intel.com> > > Signed-off-by: Kevin Tian <kevin.tian@intel.com> > > Cc: Ian Campbell <Ian.Campbell@citrix.com> > > Cc: Jan Beulich <JBeulich@novell.com> > > Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com> > > Link: > http://lkml.kernel.org/r/%3C625BA99ED14B2D499DC4E29D8138F1505C8ED7F > 7E3%40shsmsx502.ccr.corp.intel.com%3 > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > > > > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c > > index 544efe2741be..6c0802eb2f7f 100644 > > --- a/arch/x86/kernel/irq.c > > +++ b/arch/x86/kernel/irq.c > > @@ -276,7 +276,8 @@ void fixup_irqs(void) > > else if (!(warned++)) > > set_affinity = 0; > > > > - if (!irqd_can_move_in_process_context(data) && chip->irq_unmask) > > + if (!irqd_can_move_in_process_context(data) && > > + !irqd_irq_disabled(data) && chip->irq_unmask) > > chip->irq_unmask(data); > > > > raw_spin_unlock(&desc->lock); > > Known problem? Ideas? >this is the 1st problem reported on this change. But honestly speaking I didn''t see obvious problem with it. As the commit msg says, it simply completes a delayed irq disable action, by keeping interrupt line masked, instead of relying on a future interrupt handler to actually mask it. fixup_irqs is invoked in suspend path. The only impact this change may bring to resume path is the interrupt line state, which is saved later in suspend and then restored in resume. w/ above change after resume given interrupt line is always masked, while w/o it there may be at least one interrupt raising. If this does matter to make your ideapad working, I''d think there may have other bugs which are hidden originally, e.g. by triggering a reschedule from that interrupt though the handler itself does nothing except masking the interrupt line. So... above commit is not important which can be easily reverted. My only concern is whether other severe issues are just hidden. btw, any serial output you may capture? Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Serge E. Hallyn
2011-Sep-02 03:01 UTC
[Xen-devel] Re: [regression] Ideapad S10-3 does not wake up from suspend (Re: [PATCH v2 2/2] x86: don''t unmask disabled irqs when migrating them)
Quoting Jonathan Nieder (jrnieder@gmail.com):> Hi Dave, > > Dave Hansen wrote: > > > I''m seeing the exact same symptoms on my S10-3, fwiw. They definitely > > don''t happen when intel_idle is compiled out or when > > intel_idle.max_cstate=0 is specified on the kernel command-line. > > Lars reminds me[1] that the kernel he was testing already had the > intel_idle driver compiled out, so you''re probably seeing a different > (possibly related) bug. (By the way, Lars, it is fine to communicate > with lkml directly. :) The people there don''t bite.)Right, over the past year or so this has been hit or miss. Sometimes intel_idle.max_cstate=0 would fix it. Sometimes (like today) not. Ever since the problems have started, using intel_idle has not worked once, but disabling it is not 100% reliable. And when it doesn''t work, it doesn''t work at all until I get a new kernel. Sorry, not very helpful. -serge _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jonathan Nieder
2012-Apr-15 14:06 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
Hi, Quick summary and update.> Lars Boegild Thomsen writes[1]:>> After update from 2.6 kernel to 3.0 my Idepad S10-3 will not wake up after >> sleep.[...]>> 983bbf1af0664b78689612b247acb514300f62c7 is the first bad commit983bbf1af06 is "x86: Don''t unmask disabled irqs when migrating them", 2011-05-06, and looks like this:>> --- a/arch/x86/kernel/irq.c >> +++ b/arch/x86/kernel/irq.c >> @@ -276,7 +276,8 @@ void fixup_irqs(void) >> else if (!(warned++)) >> set_affinity = 0; >> >> - if (!irqd_can_move_in_process_context(data) && chip->irq_unmask) >> + if (!irqd_can_move_in_process_context(data) && >> + !irqd_irq_disabled(data) && chip->irq_unmask) >> chip->irq_unmask(data);Robert Scott found[1], using 3.2.12:> I''m getting the same behaviour on my Lenovo Ideapad S10-3An anonymous contributor[2] also reports the same problem in v3.3. Lars, Robert, anon: can you try 3.4-rc2 or newer and let us know how it goes? I suspect v3.4-rc2~24^2~4 ("x86: Preserve lazy irq disable semantics in fixup_irqs()") will fix this. Liu Chuansheng et al: do you think that commit would be a good candidate for inclusion in -stable kernels? Thanks and hope that helps, Jonathan> [1] http://bugs.debian.org/635575[2] https://bugzilla.kernel.org/show_bug.cgi?id=41932
Robert Scott
2012-Apr-16 18:05 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
On Sunday 15 April 2012, Jonathan Nieder wrote:> Lars, Robert, anon: can you try 3.4-rc2 or newer and let us know how > it goes? I suspect v3.4-rc2~24^2~4 ("x86: Preserve lazy irq disable > semantics in fixup_irqs()") will fix this.Hmm, no, 3.4-rc2 seems to produce the same results I''m afraid. robert.
Jonathan Nieder
2012-Apr-17 02:04 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
Robert Scott wrote:> On Sunday 15 April 2012, Jonathan Nieder wrote:>> Lars, Robert, anon: can you try 3.4-rc2 or newer and let us know how >> it goes? I suspect v3.4-rc2~24^2~4 ("x86: Preserve lazy irq disable >> semantics in fixup_irqs()") will fix this. > > Hmm, no, 3.4-rc2 seems to produce the same results I''m afraid.Alas. Does suspend-to-disk ("echo disk >/sys/power/state") have the same problem? Can you get a log of the failure with "no_console_suspend" appended to the kernel command line, for example with a serial console or netconsole? If someone has time to go through the steps in "Documentation/power/basic-pm-debugging.txt", that would also be useful. Thanks, and sorry to take so long to get to this. Sincerely, Jonathan
Lars Boegild Thomsen
2012-Apr-18 10:03 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
On Tuesday 17 April 2012 10:04:33 Jonathan Nieder wrote:> > Hmm, no, 3.4-rc2 seems to produce the same results I''m afraid. > Alas. Does suspend-to-disk ("echo disk >/sys/power/state") have the > same problem? Can you get a log of the failure with > "no_console_suspend" appended to the kernel command line, for example > with a serial console or netconsole?Using a serial console is a bit of a problem on this netbook as it hasn''t got a serial port. Not sure how the netconsole works - will read up on that. And no - suspend to disk works fine. I have by the way experimentally switched to 32bit version and it''s the same thing (originally I was running 64bit version but that really doesn''t make sense on a netbook - except doing it because it could be done). //Lars...
Robert Scott
2012-Apr-21 13:14 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
On Tuesday 17 April 2012, Jonathan Nieder wrote:> Robert Scott wrote: > > On Sunday 15 April 2012, Jonathan Nieder wrote: > > >> Lars, Robert, anon: can you try 3.4-rc2 or newer and let us know how > >> it goes? I suspect v3.4-rc2~24^2~4 ("x86: Preserve lazy irq disable > >> semantics in fixup_irqs()") will fix this. > > > > Hmm, no, 3.4-rc2 seems to produce the same results I''m afraid. > > Alas. Does suspend-to-disk ("echo disk >/sys/power/state") have the > same problem? Can you get a log of the failure with > "no_console_suspend" appended to the kernel command line, for example > with a serial console or netconsole?(back on 3.2.0-0.bpo.2-686-pae 3.2.12-1~bpo60+1*:) eth0 seems to go down/come up to early/late to get anything useful from netconsole: [ 745.161322] PM: Syncing filesystems ... done. [ 747.088247] PM: Preparing system for mem sleep [ 747.187932] Freezing user space processes ... (elapsed 0.01 seconds) done. [ 747.204325] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done. [ 747.220416] PM: Entering mem sleep [ 747.221085] sd 1:0:0:0: [sda] Synchronizing SCSI cache [ 747.222247] sd 1:0:0:0: [sda] Stopping disk (then nothing) I may be able to hook up a usb-serial adaptor to try and get more info but it''ll take me a bit longer what with all the rewiring fun as I don''t have a null modem cable lying around.> If someone has time to go through the steps in > "Documentation/power/basic-pm-debugging.txt", that would also be useful."freezer", "devices", "platform", "processors", or "core" pm_test modes all work fine, naturally. robert. * I notice I accidentally copy/pasted the linux-modules version in a previous mail but you knew what I was talking about
Pasi Kärkkäinen
2012-Apr-22 16:34 UTC
Re: [Xen-devel] [regression] Ideapad S10-3 does not wake up from suspend
On Wed, Apr 18, 2012 at 06:03:00PM +0800, Lars Boegild Thomsen wrote:> On Tuesday 17 April 2012 10:04:33 Jonathan Nieder wrote: > > > Hmm, no, 3.4-rc2 seems to produce the same results I''m afraid. > > Alas. Does suspend-to-disk ("echo disk >/sys/power/state") have the > > same problem? Can you get a log of the failure with > > "no_console_suspend" appended to the kernel command line, for example > > with a serial console or netconsole? > > Using a serial console is a bit of a problem on this netbook as it hasn''t got > a serial port. Not sure how the netconsole works - will read up on that. >You can get an ExpressCard Serial Card and use that.. or if the laptop has vPro then it has Serial Over LAN in the AMT chip. http://wiki.xen.org/wiki/Xen_Serial_Console -- Pasi> And no - suspend to disk works fine. > > I have by the way experimentally switched to 32bit version and it''s the same > thing (originally I was running 64bit version but that really doesn''t make > sense on a netbook - except doing it because it could be done). > > //Lars... > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Robert Scott
2012-May-06 12:44 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
On Tuesday 17 April 2012, Jonathan Nieder wrote:> Robert Scott wrote: > > On Sunday 15 April 2012, Jonathan Nieder wrote: > > >> Lars, Robert, anon: can you try 3.4-rc2 or newer and let us know how > >> it goes? I suspect v3.4-rc2~24^2~4 ("x86: Preserve lazy irq disable > >> semantics in fixup_irqs()") will fix this. > > > > Hmm, no, 3.4-rc2 seems to produce the same results I''m afraid. > > Alas. Does suspend-to-disk ("echo disk >/sys/power/state") have the > same problem? Can you get a log of the failure with > "no_console_suspend" appended to the kernel command line, for example > with a serial console or netconsole?Well, for what it''s worth, using a USB serial console to capture output, all you get is: [ 814.016541] PM: Syncing filesystems ... done. [ 814.018516] PM: Preparing system for mem sleep [ 814.100393] Freezing user space processes ... (elapsed 0.01 se before it goes to sleep, and it doesn''t output anything on (attempted) wakeup. Though I don''t know if this is due to USB being woken up quite late or something. robert.
Jonathan Nieder
2012-May-12 23:13 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
Hi again, In September, Kevin Tian wrote:>> Lars Boegild Thomsen writes[1]:>>> After update from 2.6 kernel to 3.0 my Idepad S10-3 will not wake up after >>> sleep. Back to latest 2.6 kernel works fine. > > [...] >>> Upon wakeup, the power light go from slow flashing to on, the battery light >>> goes from off to on, the hdd light blink once and then everything is dead.[...]>>> 983bbf1af0664b78689612b247acb514300f62c7 is the first bad commit >> [...] >>> I also tried to go back to HEAD and manually change arch/x86/irq.c revert this >>> particular commit and it works.[...]> fixup_irqs is invoked in suspend path. The only impact this change may > bring to resume path is the interrupt line state, which is saved later > in suspend and then restored in resume. w/ above change after resume > given interrupt line is always masked, while w/o it there may be at least > one interrupt raising. If this does matter to make your ideapad working, > I''d think there may have other bugs which are hidden originally, e.g. by > triggering a reschedule from that interrupt though the handler itself > does nothing except masking the interrupt line. > > So... above commit is not important which can be easily reverted. My > only concern is whether other severe issues are just hidden. > > btw, any serial output you may capture?Sorry for the slow response. Result from reporters is: - 3.4-rc2 is affected as well - this only affects suspend-to-RAM --- suspend-to-disk works fine - all five pm_test modes for suspend-to-RAM work fine - this netbook doesn''t have a serial port. Netconsole gives: | [ 745.161322] PM: Syncing filesystems ... done. | [ 747.088247] PM: Preparing system for mem sleep | [ 747.187932] Freezing user space processes ... (elapsed 0.01 seconds) done. | [ 747.204325] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done. | [ 747.220416] PM: Entering mem sleep | [ 747.221085] sd 1:0:0:0: [sda] Synchronizing SCSI cache | [ 747.222247] sd 1:0:0:0: [sda] Stopping disk | | (then nothing) Serial console via a USB-to-serial converter gives: | [ 814.016541] PM: Syncing filesystems ... done. | [ 814.018516] PM: Preparing system for mem sleep | [ 814.100393] Freezing user space processes ... (elapsed 0.01 se | | before it goes to sleep, and it doesn''t output anything on (attempted) wakeup. - passing parameters "hpet=disable highres=off nohz=off" helps some people if I understand correctly, but I might have misunderstood.[2] I''d be interested to hear whether the same problem occurs when trying to suspend from the minimal initramfs environment. (On systems like Debian that use initramfs-tools, that means passing the kernel command line parameter "break=top", booting, loading some appropriate minimal collection of modules --- maybe none ---, and then running "echo mem >/sys/power/state". initramfs-tools(8) has details.) Hope that helps, Jonathan>> [1] http://bugs.debian.org/635575[2] https://bugzilla.kernel.org/show_bug.cgi?id=41932
Lars Boegild Thomsen
2012-May-13 01:22 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
On Sunday 13 May 2012 07:13:49 Jonathan Nieder wrote:> In September, Kevin Tian wrote: > >> Lars Boegild Thomsen writes[1]: > >>> After update from 2.6 kernel to 3.0 my Idepad S10-3 will not wake up > >>> after sleep. Back to latest 2.6 kernel works fine. > - passing parameters "hpet=disable highres=off nohz=off" helps some > people if I understand correctly, but I might have misunderstood.[2]I didn''t notice this one before but that actually works for me. Adding those kernel params and sleep works again. I tried combinations thereof but no-go - all 3 required.> I''d be interested to hear whether the same problem occurs when trying > to suspend from the minimal initramfs environment. (On systems like > Debian that use initramfs-tools, that means passing the kernel command > line parameter "break=top", booting, loading some appropriate minimal > collection of modules --- maybe none ---, and then running > "echo mem >/sys/power/state". initramfs-tools(8) has details.)And I just tried this loading no modules manually and it''s the same - never wakes up after sleep. //Lars...
Jonathan Nieder
2012-Jul-15 23:24 UTC
Re: [regression] Ideapad S10-3 does not wake up from suspend
Hi again, In May, Lars Boegild Thomsen wrote:> On Sunday 13 May 2012 07:13:49 Jonathan Nieder wrote: >>>> Lars Boegild Thomsen writes:>>>>> After update from 2.6 kernel to 3.0 my Idepad S10-3 will not wake up >>>>> after sleep. Back to latest 2.6 kernel works fine. >> >> - passing parameters "hpet=disable highres=off nohz=off" helps some >> people if I understand correctly,[...]> I didn''t notice this one before but that actually works for me. Adding those > kernel params and sleep works again. I tried combinations thereof but no-go - > all 3 required. > >> I''d be interested to hear whether the same problem occurs when trying >> to suspend from the minimal initramfs environment.[...]> And I just tried this loading no modules manually and it''s the same - never > wakes up after sleep.Thanks. Please test with the debugging patch below from Feng Tang[1]. I don''t know if it will get any useful information because e.g. serial console and netconsole are not very convenient on this machine[2], but it seems worth a try. If one of the people cc-ed needs help building a patched kernel to test, feel free to write privately and I can give more detailed instructions. Hope that helps, Jonathan [1] https://bugzilla.kernel.org/show_bug.cgi?id=41932#c18 [2] http://thread.gmane.org/gmane.linux.kernel/1136253/focus=1285790 diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index f113755..d2e0c90 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -439,6 +439,14 @@ again: * in the event mask */ if (next_event.tv64 != KTIME_MAX) { + s64 delta; + + delta = next_event.tv64 - now.tv64; + if (delta >= 10000000000) { + printk("%s(): The delta is big: %lld\n", __func__, delta); + next_event.tv64 = now.tv64 + 3000000000; + } + /* * Rearm the broadcast device. If event expired, * repeat the above