Jan Beulich
2013-Mar-18 11:12 UTC
[PATCH] x86/HPET: mask interrupt while changing affinity
While being unable to reproduce the "No irq handler for vector ..." messages observed on other systems, the change done by 5dc3fd2 (''x86: extend diagnostics for "No irq handler for vector" messages'') appears to point at the lack of masking - at least I can''t see what else might be wrong with the HPET MSI code that could trigger these warnings. While at it, also adjust the message printed by aforementioned commit to not pointlessly insert spaces - we don''t need aligned tabular output here. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -466,7 +466,9 @@ static void set_channel_irq_affinity(con ASSERT(!local_irq_is_enabled()); spin_lock(&desc->lock); + hpet_msi_mask(desc); hpet_msi_set_affinity(desc, cpumask_of(ch->cpu)); + hpet_msi_unmask(desc); spin_unlock(&desc->lock); } --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -826,7 +826,7 @@ void do_IRQ(struct cpu_user_regs *regs) if ( ~irq < nr_irqs && irq_desc_initialized(desc) ) { spin_lock(&desc->lock); - printk("IRQ%d a=%04lx[%04lx,%04lx] v=%02x[%02x] t=%-15s s=%08x\n", + printk("IRQ%d a=%04lx[%04lx,%04lx] v=%02x[%02x] t=%s s=%08x\n", ~irq, *cpumask_bits(desc->affinity), *cpumask_bits(desc->arch.cpu_mask), *cpumask_bits(desc->arch.old_cpu_mask), _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Keir Fraser
2013-Mar-18 12:09 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
On 18/03/2013 11:12, "Jan Beulich" <JBeulich@suse.com> wrote:> While being unable to reproduce the "No irq handler for vector ..." > messages observed on other systems, the change done by 5dc3fd2 (''x86: > extend diagnostics for "No irq handler for vector" messages'') appears > to point at the lack of masking - at least I can''t see what else might > be wrong with the HPET MSI code that could trigger these warnings. > > While at it, also adjust the message printed by aforementioned commit > to not pointlessly insert spaces - we don''t need aligned tabular output > here. > > Signed-off-by: Jan Beulich <jbeulich@suse.com>Acked-by: Keir Fraser <keir@xen.org>> --- a/xen/arch/x86/hpet.c > +++ b/xen/arch/x86/hpet.c > @@ -466,7 +466,9 @@ static void set_channel_irq_affinity(con > > ASSERT(!local_irq_is_enabled()); > spin_lock(&desc->lock); > + hpet_msi_mask(desc); > hpet_msi_set_affinity(desc, cpumask_of(ch->cpu)); > + hpet_msi_unmask(desc); > spin_unlock(&desc->lock); > } > > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -826,7 +826,7 @@ void do_IRQ(struct cpu_user_regs *regs) > if ( ~irq < nr_irqs && irq_desc_initialized(desc) ) > { > spin_lock(&desc->lock); > - printk("IRQ%d a=%04lx[%04lx,%04lx] v=%02x[%02x] t=%-15s > s=%08x\n", > + printk("IRQ%d a=%04lx[%04lx,%04lx] v=%02x[%02x] t=%s > s=%08x\n", > ~irq, *cpumask_bits(desc->affinity), > *cpumask_bits(desc->arch.cpu_mask), > *cpumask_bits(desc->arch.old_cpu_mask), > > >
Sander Eikelenboom
2013-Mar-19 15:53 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
Hi Jan, Could this change have a averse affect on AMD systems ? With this patch booting the dom0 kernel slowly seems to come to a halt (sometime trying to mount rootfs, sometimes a little further trying to bring networking up.) I don''t see any evident warnings or errors, reverting this commit makes the system boot OK again. (System is a 890fx motherboard with amd phenom II x6) -- Sander Monday, March 18, 2013, 12:12:50 PM, you wrote:> While being unable to reproduce the "No irq handler for vector ..." > messages observed on other systems, the change done by 5dc3fd2 (''x86: > extend diagnostics for "No irq handler for vector" messages'') appears > to point at the lack of masking - at least I can''t see what else might > be wrong with the HPET MSI code that could trigger these warnings.> While at it, also adjust the message printed by aforementioned commit > to not pointlessly insert spaces - we don''t need aligned tabular output > here.> Signed-off-by: Jan Beulich <jbeulich@suse.com>> --- a/xen/arch/x86/hpet.c > +++ b/xen/arch/x86/hpet.c > @@ -466,7 +466,9 @@ static void set_channel_irq_affinity(con > > ASSERT(!local_irq_is_enabled()); > spin_lock(&desc->lock); > + hpet_msi_mask(desc); > hpet_msi_set_affinity(desc, cpumask_of(ch->cpu)); > + hpet_msi_unmask(desc); > spin_unlock(&desc->lock); > } > > --- a/xen/arch/x86/irq.c > +++ b/xen/arch/x86/irq.c > @@ -826,7 +826,7 @@ void do_IRQ(struct cpu_user_regs *regs) > if ( ~irq < nr_irqs && irq_desc_initialized(desc) ) > { > spin_lock(&desc->lock); > - printk("IRQ%d a=%04lx[%04lx,%04lx] v=%02x[%02x] t=%-15s s=%08x\n", > + printk("IRQ%d a=%04lx[%04lx,%04lx] v=%02x[%02x] t=%s s=%08x\n", > ~irq, *cpumask_bits(desc->affinity), > *cpumask_bits(desc->arch.cpu_mask), > *cpumask_bits(desc->arch.old_cpu_mask),
Jan Beulich
2013-Mar-19 16:00 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
>>> On 19.03.13 at 16:53, Sander Eikelenboom <linux@eikelenboom.it> wrote: > Could this change have a averse affect on AMD systems ?It shouldn''t, ...> With this patch booting the dom0 kernel slowly seems to come to a halt > (sometime trying to mount rootfs, sometimes a little further trying to bring > networking up.)... but apparently does (apart from also having the intended effect of eliminating vector-without-IRQ warnings). But it''s not obvious how that would be - after all, the two added calls should be pretty benign performance wise.> I don''t see any evident warnings or errors, reverting this commit makes the > system boot OK again.Does the watchdog work on it? If so, could you see whether enabling that catches something? Or else, do the debug keys still work when the box stopped? Jan
Sander Eikelenboom
2013-Mar-19 22:48 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
Tuesday, March 19, 2013, 5:00:33 PM, you wrote:>>>> On 19.03.13 at 16:53, Sander Eikelenboom <linux@eikelenboom.it> wrote: >> Could this change have a averse affect on AMD systems ?> It shouldn''t, ...>> With this patch booting the dom0 kernel slowly seems to come to a halt >> (sometime trying to mount rootfs, sometimes a little further trying to bring >> networking up.)> ... but apparently does (apart from also having the intended effect > of eliminating vector-without-IRQ warnings). But it''s not obvious > how that would be - after all, the two added calls should be pretty > benign performance wise.>> I don''t see any evident warnings or errors, reverting this commit makes the >> system boot OK again.> Does the watchdog work on it? If so, could you see whether enabling > that catches something? Or else, do the debug keys still work when > the box stopped?Yes they still work, and somehow using the debug keys seems to make it continu for a bit (slowly and ending up in infinite loop printing firewall messages) Serial log attached, hope i have used the debug keys you are interested in, if not please do specify ... -- Sander> Jan_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Andrew Cooper
2013-Mar-19 23:24 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
On 19/03/13 22:48, Sander Eikelenboom wrote:> Tuesday, March 19, 2013, 5:00:33 PM, you wrote: > >>>>> On 19.03.13 at 16:53, Sander Eikelenboom <linux@eikelenboom.it> wrote: >>> Could this change have a averse affect on AMD systems ? >> It shouldn''t, ... >>> With this patch booting the dom0 kernel slowly seems to come to a halt >>> (sometime trying to mount rootfs, sometimes a little further trying to bring >>> networking up.) >> ... but apparently does (apart from also having the intended effect >> of eliminating vector-without-IRQ warnings). But it''s not obvious >> how that would be - after all, the two added calls should be pretty >> benign performance wise. >>> I don''t see any evident warnings or errors, reverting this commit makes the >>> system boot OK again. >> Does the watchdog work on it? If so, could you see whether enabling >> that catches something? Or else, do the debug keys still work when >> the box stopped? > Yes they still work, and somehow using the debug keys seems to make it continu for a bit (slowly and ending up in infinite loop printing firewall messages) > > Serial log attached, hope i have used the debug keys you are interested in, if not please do specify ... > > -- > Sanderhpet_msi_mask() flips the Timer Interrupt Enable bit, which causes the timer not to generate interrupts, but to continue running. Are we by any chance suffering from a bad interaction of oneshot timers not being re-armed for exceedingly long periods of time ? ~Andrew> > >> Jan
Jan Beulich
2013-Mar-20 08:22 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
>>> On 19.03.13 at 23:48, Sander Eikelenboom <linux@eikelenboom.it> wrote: > Yes they still work, and somehow using the debug keys seems to make it > continu for a bit (slowly and ending up in infinite loop printing firewall > messages)That information is perhaps more important than the contents of the log. On my way home yesterday I had already realized that the temporary masking of the interrupt may cause wakeup events to be missed. Unfortunately the fix for this is quite a bit larger than the three of four lines I had hoped would need to be added. Could you give the attached patch a try? (Without seeing the problem, I obviously can''t say anything other about it than I know it compiles.) Thanks, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Mar-20 08:31 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
>>> On 20.03.13 at 00:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 19/03/13 22:48, Sander Eikelenboom wrote: >> Tuesday, March 19, 2013, 5:00:33 PM, you wrote: >> >>>>>> On 19.03.13 at 16:53, Sander Eikelenboom <linux@eikelenboom.it> wrote: >>>> Could this change have a averse affect on AMD systems ? >>> It shouldn''t, ... >>>> With this patch booting the dom0 kernel slowly seems to come to a halt >>>> (sometime trying to mount rootfs, sometimes a little further trying to bring > >>>> networking up.) >>> ... but apparently does (apart from also having the intended effect >>> of eliminating vector-without-IRQ warnings). But it''s not obvious >>> how that would be - after all, the two added calls should be pretty >>> benign performance wise. >>>> I don''t see any evident warnings or errors, reverting this commit makes the >>>> system boot OK again. >>> Does the watchdog work on it? If so, could you see whether enabling >>> that catches something? Or else, do the debug keys still work when >>> the box stopped? >> Yes they still work, and somehow using the debug keys seems to make it > continu for a bit (slowly and ending up in infinite loop printing firewall > messages) >> >> Serial log attached, hope i have used the debug keys you are interested in, > if not please do specify ... >> >> -- >> Sander > > hpet_msi_mask() flips the Timer Interrupt Enable bit, which causes the > timer not to generate interrupts, but to continue running. > > Are we by any chance suffering from a bad interaction of oneshot timers > not being re-armed for exceedingly long periods of time ?See the response/patch I just sent - I don''t think it''s the lack of re-arming (which is only an indirect effect), but the missing of interrupts. Apparently this can be close to negligible (on the two machines I tested this on) to having a severe effect (on Sander''s) system. Jan
Sander Eikelenboom
2013-Mar-20 10:13 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
Wednesday, March 20, 2013, 9:22:33 AM, you wrote:>>>> On 19.03.13 at 23:48, Sander Eikelenboom <linux@eikelenboom.it> wrote: >> Yes they still work, and somehow using the debug keys seems to make it >> continu for a bit (slowly and ending up in infinite loop printing firewall >> messages)> That information is perhaps more important than the contents of > the log. On my way home yesterday I had already realized that > the temporary masking of the interrupt may cause wakeup events > to be missed. Unfortunately the fix for this is quite a bit larger > than the three of four lines I had hoped would need to be added. > Could you give the attached patch a try? (Without seeing the > problem, I obviously can''t say anything other about it than I > know it compiles.)One of your assertions failed, serial log attached ...> Thanks, Jan_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Mar-20 11:02 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
>>> On 20.03.13 at 11:13, Sander Eikelenboom <linux@eikelenboom.it> wrote:> Wednesday, March 20, 2013, 9:22:33 AM, you wrote: > >>>>> On 19.03.13 at 23:48, Sander Eikelenboom <linux@eikelenboom.it> wrote: >>> Yes they still work, and somehow using the debug keys seems to make it >>> continu for a bit (slowly and ending up in infinite loop printing firewall >>> messages) > >> That information is perhaps more important than the contents of >> the log. On my way home yesterday I had already realized that >> the temporary masking of the interrupt may cause wakeup events >> to be missed. Unfortunately the fix for this is quite a bit larger >> than the three of four lines I had hoped would need to be added. >> Could you give the attached patch a try? (Without seeing the >> problem, I obviously can''t say anything other about it than I >> know it compiles.) > > > One of your assertions failed, serial log attached ...That was due to me blindly doing the same thing to hpet_detach_channel() that I did for hpet_attach_channel(). Fixed now. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Sander Eikelenboom
2013-Mar-20 11:55 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
Wednesday, March 20, 2013, 12:02:00 PM, you wrote:>>>> On 20.03.13 at 11:13, Sander Eikelenboom <linux@eikelenboom.it> wrote:>> Wednesday, March 20, 2013, 9:22:33 AM, you wrote: >> >>>>>> On 19.03.13 at 23:48, Sander Eikelenboom <linux@eikelenboom.it> wrote: >>>> Yes they still work, and somehow using the debug keys seems to make it >>>> continu for a bit (slowly and ending up in infinite loop printing firewall >>>> messages) >> >>> That information is perhaps more important than the contents of >>> the log. On my way home yesterday I had already realized that >>> the temporary masking of the interrupt may cause wakeup events >>> to be missed. Unfortunately the fix for this is quite a bit larger >>> than the three of four lines I had hoped would need to be added. >>> Could you give the attached patch a try? (Without seeing the >>> problem, I obviously can''t say anything other about it than I >>> know it compiles.) >> >> >> One of your assertions failed, serial log attached ...> That was due to me blindly doing the same thing to > hpet_detach_channel() that I did for hpet_attach_channel(). > Fixed now.Close but not entirely ;) See attached serial log> Jan_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Mar-20 13:38 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
>>> On 20.03.13 at 12:55, Sander Eikelenboom <linux@eikelenboom.it> wrote: > Close but not entirely ;)Close to not crashing, maybe, but whether this really helps with your problem is still entirely unclear.> See attached serial logOkay, I wasn''t even aware of that assertion in _spin_lock_irq(). Keir, do you really think this is necessary? In the prior patch version, handle_hpet_broadcast() had a flow like this spin_lock_irqsave(); ... spin_unlock_irqrestore(); ... if ( next_event != STIME_MAX ) { spin_lock_irq(); ... spin_unlock_irqrestore(); } avoiding the saving of the flags in the second lock acquire. Said assertion makes it impossible to do this. Sander, in any case, attached a fixed version of the patch (I had to guess which of the two spin_lock_irq() calls it was, as the log was incomplete in that the stack trace got dropped, but am pretty positive that it was the one in handle_hpet_broadcast()). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Andrew Cooper
2013-Mar-20 13:46 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
On 20/03/13 13:38, Jan Beulich wrote:>>>> On 20.03.13 at 12:55, Sander Eikelenboom <linux@eikelenboom.it> wrote: >> Close but not entirely ;) > Close to not crashing, maybe, but whether this really helps with your > problem is still entirely unclear. > >> See attached serial log > Okay, I wasn''t even aware of that assertion in _spin_lock_irq(). > > Keir, do you really think this is necessary? In the prior patch > version, handle_hpet_broadcast() had a flow like thisAs I have been playing with spinlocks and the recursive NMI path, I would say that the assertion is necessary, given the other callsites of spin_{un,}lock_irq()> > spin_lock_irqsave(); > ... > spin_unlock_irqrestore(); > ... > if ( next_event != STIME_MAX ) > { > spin_lock_irq(); > ... > spin_unlock_irqrestore(); > } > > avoiding the saving of the flags in the second lock acquire. Said > assertion makes it impossible to do this.I would agree that in this case the logic is correct despite the assertion. Overall, I would argue that obviously correct matching locking pairs is more important the performance penalty from an additional pushf & mov ~Andrew> > Sander, in any case, attached a fixed version of the patch (I had > to guess which of the two spin_lock_irq() calls it was, as the log > was incomplete in that the stack trace got dropped, but am pretty > positive that it was the one in handle_hpet_broadcast()). > > Jan >
Keir Fraser
2013-Mar-20 14:19 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
On 20/03/2013 13:38, "Jan Beulich" <JBeulich@suse.com> wrote:>>>> On 20.03.13 at 12:55, Sander Eikelenboom <linux@eikelenboom.it> wrote: >> Close but not entirely ;) > > Close to not crashing, maybe, but whether this really helps with your > problem is still entirely unclear. > >> See attached serial log > > Okay, I wasn''t even aware of that assertion in _spin_lock_irq(). > > Keir, do you really think this is necessary?You are more cunning than some others. ;) I''m pretty confident those spinlock assertions save us from real bugs. Also I''d rather have blindingly obvious code here than slightly faster code. -- Keir> In the prior patch > version, handle_hpet_broadcast() had a flow like this > > spin_lock_irqsave(); > ... > spin_unlock_irqrestore(); > ... > if ( next_event != STIME_MAX ) > { > spin_lock_irq(); > ... > spin_unlock_irqrestore(); > } > > avoiding the saving of the flags in the second lock acquire. Said > assertion makes it impossible to do this. > > Sander, in any case, attached a fixed version of the patch (I had > to guess which of the two spin_lock_irq() calls it was, as the log > was incomplete in that the stack trace got dropped, but am pretty > positive that it was the one in handle_hpet_broadcast()). > > Jan >
Sander Eikelenboom
2013-Mar-20 14:35 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
Wednesday, March 20, 2013, 2:38:58 PM, you wrote:>>>> On 20.03.13 at 12:55, Sander Eikelenboom <linux@eikelenboom.it> wrote: >> Close but not entirely ;)> Close to not crashing, maybe, but whether this really helps with your > problem is still entirely unclear.>> See attached serial log> Okay, I wasn''t even aware of that assertion in _spin_lock_irq().> Keir, do you really think this is necessary? In the prior patch > version, handle_hpet_broadcast() had a flow like this> spin_lock_irqsave(); > ... > spin_unlock_irqrestore(); > ... > if ( next_event != STIME_MAX ) > { > spin_lock_irq(); > ... > spin_unlock_irqrestore(); > }> avoiding the saving of the flags in the second lock acquire. Said > assertion makes it impossible to do this.> Sander, in any case, attached a fixed version of the patch (I had > to guess which of the two spin_lock_irq() calls it was, as the log > was incomplete in that the stack trace got dropped, but am pretty > positive that it was the one in handle_hpet_broadcast()).Yes this one makes it boot again, thx ! -- Sander> Jan
Jan Beulich
2013-Mar-20 14:41 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
>>> On 20.03.13 at 15:35, Sander Eikelenboom <linux@eikelenboom.it> wrote:> Wednesday, March 20, 2013, 2:38:58 PM, you wrote: > >>>>> On 20.03.13 at 12:55, Sander Eikelenboom <linux@eikelenboom.it> wrote: >>> Close but not entirely ;) > >> Close to not crashing, maybe, but whether this really helps with your >> problem is still entirely unclear. > >>> See attached serial log > >> Okay, I wasn''t even aware of that assertion in _spin_lock_irq(). > >> Keir, do you really think this is necessary? In the prior patch >> version, handle_hpet_broadcast() had a flow like this > >> spin_lock_irqsave(); >> ... >> spin_unlock_irqrestore(); >> ... >> if ( next_event != STIME_MAX ) >> { >> spin_lock_irq(); >> ... >> spin_unlock_irqrestore(); >> } > >> avoiding the saving of the flags in the second lock acquire. Said >> assertion makes it impossible to do this. > >> Sander, in any case, attached a fixed version of the patch (I had >> to guess which of the two spin_lock_irq() calls it was, as the log >> was incomplete in that the stack trace got dropped, but am pretty >> positive that it was the one in handle_hpet_broadcast()). > > Yes this one makes it boot again, thx !I''ll put a Tested-by: <you> on it then, if you don''t mind. Jan
Sander Eikelenboom
2013-Mar-20 14:44 UTC
Re: [PATCH] x86/HPET: mask interrupt while changing affinity
Wednesday, March 20, 2013, 3:41:45 PM, you wrote:>>>> On 20.03.13 at 15:35, Sander Eikelenboom <linux@eikelenboom.it> wrote:>> Wednesday, March 20, 2013, 2:38:58 PM, you wrote: >> >>>>>> On 20.03.13 at 12:55, Sander Eikelenboom <linux@eikelenboom.it> wrote: >>>> Close but not entirely ;) >> >>> Close to not crashing, maybe, but whether this really helps with your >>> problem is still entirely unclear. >> >>>> See attached serial log >> >>> Okay, I wasn''t even aware of that assertion in _spin_lock_irq(). >> >>> Keir, do you really think this is necessary? In the prior patch >>> version, handle_hpet_broadcast() had a flow like this >> >>> spin_lock_irqsave(); >>> ... >>> spin_unlock_irqrestore(); >>> ... >>> if ( next_event != STIME_MAX ) >>> { >>> spin_lock_irq(); >>> ... >>> spin_unlock_irqrestore(); >>> } >> >>> avoiding the saving of the flags in the second lock acquire. Said >>> assertion makes it impossible to do this. >> >>> Sander, in any case, attached a fixed version of the patch (I had >>> to guess which of the two spin_lock_irq() calls it was, as the log >>> was incomplete in that the stack trace got dropped, but am pretty >>> positive that it was the one in handle_hpet_broadcast()). >> >> Yes this one makes it boot again, thx !> I''ll put a Tested-by: <you> on it then, if you don''t mind.Sure I don''t mind !> Jan