Ian Campbell
2010-Oct-15 10:52 UTC
[Xen-devel] [PATCH] xen: always handle VIRQ_TIMER first.
This ensures that system time is updated before calling any hard irq
handlers after a long period of ticklessness. If we do not do this
then hardirq will see a jiffies from before the period of ticklessness
and make incorrect decisions regarding timer expiry etc.

This resolves issues e.g. with USB keyboard timer repeats.

Based on a patch by Keir Fraser.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: keir@xen.org
---
 drivers/xen/events.c |   22 +++++++++++++++++++++-
 1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index c9d1d4a..1496ba5 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -1052,6 +1052,7 @@ static void __xen_evtchn_do_upcall(struct pt_regs *regs)
 
 	do {
 		unsigned long pending_words;
+		int irq;
 
 		vcpu_info->evtchn_upcall_pending = 0;
 
@@ -1062,6 +1063,24 @@ static void __xen_evtchn_do_upcall(struct pt_regs *regs)
 		/* Clear master flag /before/ clearing selector flag. */
 		wmb();
 #endif
+
+		/*
+		 * Handle timer interrupts before all others, so that all
+		 * hardirq handlers see an up-to-date system time even if we
+		 * have just woken from a long idle period.
+		 */
+		irq = percpu_read(virq_to_irq[VIRQ_TIMER]);
+		if (irq != -1) {
+			int word_idx;
+			int bit_idx;
+			int port = evtchn_from_irq(irq);
+			word_idx = port / BITS_PER_LONG;
+			bit_idx = port % BITS_PER_LONG;
+			if (VALID_EVTCHN(port) &&
+			    (active_evtchns(cpu, s, word_idx) & (1UL<<bit_idx)))
+				(void)handle_irq(irq, regs);
+		}
+
 		pending_words = xchg(&vcpu_info->evtchn_pending_sel, 0);
 		while (pending_words != 0) {
 			unsigned long pending_bits;
@@ -1071,9 +1090,10 @@ static void __xen_evtchn_do_upcall(struct pt_regs *regs)
 			while ((pending_bits = active_evtchns(cpu, s, word_idx)) != 0) {
 				int bit_idx = __ffs(pending_bits);
 				int port = (word_idx * BITS_PER_LONG) + bit_idx;
-				int irq = evtchn_to_irq[port];
 				struct irq_desc *desc;
 
+				irq = evtchn_to_irq[port];
+
 				mask_evtchn(port);
 				clear_evtchn(port);
-- 
1.5.6.5

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
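The check the patch adds for the timer port (split the port number into a
word index and a bit index, then test that bit in the "active" word) can be
tried outside the kernel. The following is only a minimal sketch under stated
assumptions: every demo_* name is hypothetical, the bitmaps stand in for the
shared-info pending and mask arrays, and the per-CPU filtering that the real
active_evtchns() takes its cpu argument for is left out.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Bits in one unsigned long; the kernel calls this BITS_PER_LONG. */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Hypothetical number of event channel ports in this toy model. */
#define DEMO_NR_PORTS 1024
#define DEMO_NR_WORDS (DEMO_NR_PORTS / DEMO_BITS_PER_LONG)

/* Hypothetical stand-in for the pending and mask bitmaps. */
struct demo_evtchn_state {
	unsigned long pending[DEMO_NR_WORDS];
	unsigned long mask[DEMO_NR_WORDS];
};

/* One word of "active" channels: pending and not masked. */
static unsigned long demo_active_word(const struct demo_evtchn_state *s,
				      size_t word_idx)
{
	return s->pending[word_idx] & ~s->mask[word_idx];
}

/* Mark a port pending (what the hypervisor would do on event delivery). */
static void demo_set_pending(struct demo_evtchn_state *s, unsigned int port)
{
	s->pending[port / DEMO_BITS_PER_LONG] |=
		1UL << (port % DEMO_BITS_PER_LONG);
}

/* Same test the patch applies to the timer port before the main scan. */
static bool demo_port_is_active(const struct demo_evtchn_state *s,
				unsigned int port)
{
	size_t word_idx, bit_idx;

	if (port >= DEMO_NR_PORTS)	/* plays the role of VALID_EVTCHN() */
		return false;

	word_idx = port / DEMO_BITS_PER_LONG;
	bit_idx = port % DEMO_BITS_PER_LONG;
	return (demo_active_word(s, word_idx) & (1UL << bit_idx)) != 0;
}
```

A caller would mark a port with demo_set_pending() and then ask
demo_port_is_active() for that one port before walking the rest of the
bitmap, which is the ordering the patch is after.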
Jeremy Fitzhardinge
2010-Oct-15 17:18 UTC
[Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
On 10/15/2010 03:52 AM, Ian Campbell wrote:
> This ensures that system time is updated before calling any hard irq
> handlers after a long period of ticklessness. If we do not do this
> then hardirq will see a jiffies from before the period of ticklessness
> and make incorrect decisions regarding timer expiry etc.
>
> This resolves issues e.g. with USB keyboard timer repeats.
>
> Based on a patch by Keir Fraser.

I talked about this with James, and it makes no sense to me at all.

    J
Keir Fraser
2010-Oct-15 18:30 UTC
[Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
On 15/10/2010 18:18, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
> On 10/15/2010 03:52 AM, Ian Campbell wrote:
>> This ensures that system time is updated before calling any hard irq
>> handlers after a long period of ticklessness. If we do not do this
>> then hardirq will see a jiffies from before the period of ticklessness
>> and make incorrect decisions regarding timer expiry etc.
>>
>> This resolves issues e.g. with USB keyboard timer repeats.
>>
>> Based on a patch by Keir Fraser.
>
> I talked about this with James, and it makes no sense to me at all.

When guest resumes execution after a long period blocked, the unblocking
interrupt may be handled before the inevitable timer interrupt which
actually syncs up jiffies to current system time. The unblocking interrupt
sees old jiffies -- most hardirq handlers really don't care about time, but
it happens that USB keyboard repeat is handled at that level -- it sees the
key pressed at old jiffies and not released until new jiffies plus small
delta. The difference between old and new jiffies can easily be enough to
cause phantom key repeats.

One question of course is whether the same hardirq key repeat mechanism can
be foxed simply by involuntary preemption of the guest. I suppose it could,
but it's vastly more unlikely than the systematic deterministic race
introduced by resume-from-block. Also we would hope that a runnable guest
would not be descheduled for as long periods as a guest can be voluntarily
blocked (bit arm waving that one I'll admit ;-).

 -- Keir
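The race Keir describes can be put into numbers with a toy model. This is
only a sketch under stated assumptions: the demo_* names, the repeat delay
and period, and the re-arm rule (next expiry is the firing time plus one
period) are all hypothetical, and it is not the input layer's or the USB
driver's code. The key-press handler arms a repeat timer from the jiffies
value it observes, while timers actually fire against the true tick count.

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b", in the spirit of the kernel's time_after(). */
static bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Count autorepeat events for one key press in this toy model.
 *
 *   seen_jiffies  jiffies value the key-press handler observed
 *   true_jiffies  real tick count at the moment of the press
 *   hold_ticks    how long the key is really held down
 *   delay         ticks before the first repeat (hypothetical value)
 *   period        ticks between later repeats (hypothetical value)
 */
static unsigned int demo_count_repeats(unsigned long seen_jiffies,
				       unsigned long true_jiffies,
				       unsigned long hold_ticks,
				       unsigned long delay,
				       unsigned long period)
{
	unsigned long release = true_jiffies + hold_ticks;
	unsigned long now = true_jiffies;
	unsigned long expiry = seen_jiffies + delay;	/* armed from what was seen */
	unsigned int repeats = 0;

	if (period == 0)	/* treat as "no autorepeat" and avoid looping forever */
		return 0;

	for (;;) {
		/* A timer that is already overdue fires right away. */
		unsigned long fire = demo_time_after(expiry, now) ? expiry : now;

		if (!demo_time_after(release, fire))
			break;	/* key released before this firing */

		repeats++;
		now = fire;
		expiry = fire + period;
	}
	return repeats;
}
```

With a 25-tick delay, a 3-tick period and a key held for 10 ticks, a handler
that saw current jiffies gets no repeats, while one that saw a value 1000
ticks stale gets phantom repeats, because its first expiry is already in the
past once jiffies catches up.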
Jeremy Fitzhardinge
2010-Oct-15 21:11 UTC
[Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
On 10/15/2010 11:30 AM, Keir Fraser wrote:
> On 15/10/2010 18:18, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>
>> On 10/15/2010 03:52 AM, Ian Campbell wrote:
>>> This ensures that system time is updated before calling any hard irq
>>> handlers after a long period of ticklessness. If we do not do this
>>> then hardirq will see a jiffies from before the period of ticklessness
>>> and make incorrect decisions regarding timer expiry etc.
>>>
>>> This resolves issues e.g. with USB keyboard timer repeats.
>>>
>>> Based on a patch by Keir Fraser.
>> I talked about this with James, and it makes no sense to me at all.
> When guest resumes execution after a long period blocked, the unblocking
> interrupt may be handled before the inevitable timer interrupt which

Why "inevitable"? What if the next timer event is still some time in
the future? Or are you assuming the timer is driven by the default Xen
100Hz timer?

> actually syncs up jiffies to current system time. The unblocking interrupt
> sees old jiffies -- most hardirq handlers really don't care about time, but
> it happens that USB keyboard repeat is handled at that level -- it sees the
> key pressed at old jiffies and not released until new jiffies plus small
> delta. The difference between old and new jiffies can easily be enough to
> cause phantom key repeats.

Yes, but...

If the system is idle and has disabled timer ticks, and the next
interrupt is from a piece of hardware, then jiffies will be out of date,
but there won't necessarily be a pending timer tick. If a device
interrupt handler is allowed to rely on jiffies, then there must be some
generic mechanism to update jiffies before calling any interrupt
handler. This situation doesn't seem like it is in any way Xen
dependent, and AFAIK there's no general requirement that timer
interrupts be handled first.

I'm guessing that this particular problem is in the forward-port Xen
kernel as a side-effect of its bespoke time handling code (including
IDLE_NO_HZ) which is not doing something that the core time
infrastructure would normally do. (I don't see why the forward-port
kernels couldn't use the existing Xen time support in mainline rather
than replacing it.)

Or perhaps there is a real bug here, but again, I don't think it is
Xen-specific, or should be addressed in Xen code.

> One question of course is whether the same hardirq key repeat mechanism can
> be foxed simply by involuntary preemption of the guest. I suppose it could,
> but it's vastly more unlikely than the systematic deterministic race
> introduced by resume-from-block. Also we would hope that a runnable guest
> would not be descheduled for as long periods as a guest can be voluntarily
> blocked (bit arm waving that one I'll admit ;-).

I've seen unexpected key repeats in guests when using kvm keyboards.

    J
Ian Campbell
2010-Oct-16 06:48 UTC
[Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
On Fri, 2010-10-15 at 22:11 +0100, Jeremy Fitzhardinge wrote:
> This situation doesn't seem like it is in any way Xen
> dependent, and AFAIK there's no general requirement that timer
> interrupts be handled first.

It's not implicit somehow on native due to timer interrupt always being
IRQ0 or something like that?

> I'm guessing that this particular problem is in the forward-port Xen
> kernel as a side-effect of its bespoke time handling code (including
> IDLE_NO_HZ) which is not doing something that the core time
> infrastructure would normally do.

You are right, it's very possible this is a forward-port Xen only issue.
The patch is out there now so perhaps we should not worry about it and
revisit it if someone shows up with a plausible looking issue affecting
pvops.

> (I don't see why the forward-port
> kernels couldn't use the existing Xen time support in mainline rather
> than replacing it.)

Agreed. Things like *front and the /dev/xen/* drivers would be good first
candidates for this sort of convergence too if someone were interested.
FWIW netback and blktap2 are already mostly converged in the XCP tree
which has made pushing patches back and forth much easier.

Ian.
Keir Fraser
2010-Oct-16 07:14 UTC
[Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
On 15/10/2010 22:11, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>> When guest resumes execution after a long period blocked, the unblocking
>> interrupt may be handled before the inevitable timer interrupt which
>
> Why "inevitable"? What if the next timer event is still some time in
> the future? Or are you assuming the timer is driven by the default Xen
> 100Hz timer?

Do you sometimes disable, or indeed never use, VCPUOP_set_periodic_timer?

Hmmm... Perhaps as you suggest this would be a generic issue with any
tickless kernel, and the correct upstream fix for issues such as USB kbd
repeat -- if indeed such issues still exist -- is to fix such hardirq
handlers to not depend on jiffies.

We fixed it the way we did in 'classic Xen' patched kernels since it seemed
architecturally neatest. I can accept that in the tickless kernel world that
may not actually be true.

 -- Keir
Jeremy Fitzhardinge
2010-Oct-17 06:11 UTC
Re: [Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
On 10/16/2010 12:14 AM, Keir Fraser wrote:
> On 15/10/2010 22:11, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>
>>> When guest resumes execution after a long period blocked, the unblocking
>>> interrupt may be handled before the inevitable timer interrupt which
>> Why "inevitable"? What if the next timer event is still some time in
>> the future? Or are you assuming the timer is driven by the default Xen
>> 100Hz timer?
> Do you sometimes disable, or indeed never use, VCPUOP_set_periodic_timer?

I disable it ASAP at boot and always use VCPUOP_set_singleshot_timer
from then on.

> Hmmm... Perhaps as you suggest this would be a generic issue with any
> tickless kernel, and the correct upstream fix for issues such as USB kbd
> repeat -- if indeed such issues still exist -- is to fix such hardirq
> handlers to not depend on jiffies.
>
> We fixed it the way we did in 'classic Xen' patched kernels since it seemed
> architecturally neatest. I can accept that in the tickless kernel world that
> may not actually be true.

I think (but I haven't spelunked into that code lately) that after a
tickless idle period it will update jiffies N ticks based on the
clocksource, and then run any other interrupt handler code, so jiffies
will always appear to be up to date.

Ah, yes, here it is:

/**
 * tick_nohz_update_jiffies - update jiffies when idle was interrupted
 *
 * Called from interrupt entry when the CPU was idle
 *
 * In case the sched_tick was stopped on this CPU, we have to check if jiffies
 * must be updated. Otherwise an interrupt handler could use a stale jiffy
 * value. We do this unconditionally on any cpu, as we don't know whether the
 * cpu, which has the update task assigned is in a long sleep.
 */
static void tick_nohz_update_jiffies(ktime_t now)
{
	...
}

    J
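The idea in that comment, catching jiffies up from the clocksource at
interrupt entry before any handler runs, can be sketched in a few lines of
user-space C. This is an illustration under stated assumptions and not the
kernel's implementation: the demo_* names, the 100 Hz tick length and the
single-CPU bookkeeping are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical tick length: 100 Hz, i.e. 10 ms per jiffy. */
#define DEMO_TICK_NSEC 10000000ULL

/* Toy timekeeping state for one CPU. */
struct demo_timekeeper {
	uint64_t last_update_ns;	/* clocksource time of the last jiffies update */
	unsigned long jiffies;
};

/* Advance jiffies by the number of whole ticks elapsed since the last update. */
static void demo_update_jiffies(struct demo_timekeeper *tk, uint64_t now_ns)
{
	uint64_t ticks;

	if (now_ns < tk->last_update_ns)
		return;		/* clock went backwards: leave state alone */

	ticks = (now_ns - tk->last_update_ns) / DEMO_TICK_NSEC;
	if (ticks == 0)
		return;

	tk->jiffies += (unsigned long)ticks;
	tk->last_update_ns += ticks * DEMO_TICK_NSEC;
}

typedef void (*demo_handler_t)(unsigned long jiffies_seen, void *ctx);

/*
 * Interrupt entry in the toy model: if the periodic tick was stopped,
 * bring jiffies up to date first, then call the device handler.
 */
static void demo_irq_enter_and_handle(struct demo_timekeeper *tk,
				      int tick_stopped, uint64_t now_ns,
				      demo_handler_t handler, void *ctx)
{
	if (tick_stopped)
		demo_update_jiffies(tk, now_ns);
	handler(tk->jiffies, ctx);
}

/* Example handler: records the jiffies value it was shown. */
static void demo_record_jiffies(unsigned long jiffies_seen, void *ctx)
{
	*(unsigned long *)ctx = jiffies_seen;
}
```

With the update done at entry, the handler sees a current value no matter
which interrupt ended the idle period, so the order in which pending events
are dispatched stops mattering for this purpose.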
Keir Fraser
2010-Oct-17 07:38 UTC
Re: [Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
On 17/10/2010 07:11, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>> We fixed it the way we did in 'classic Xen' patched kernels since it seemed
>> architecturally neatest. I can accept that in the tickless kernel world that
>> may not actually be true.
>
> I think (but I haven't spelunked into that code lately) that after a
> tickless idle period it will update jiffies N ticks based on the
> clocksource, and then run any other interrupt handler code, so jiffies
> will always appear to be up to date.

Okay, that should suffice. That presumably calls into the clocksource and
gives us our opportunity to sync with Xen's current system time. Effectively
it's just hooking into the interrupt handling preamble at a different, more
generic, point. :-)

 -- Keir
Jan Beulich
2010-Oct-18 13:20 UTC
Re: [Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
>>> On 17.10.10 at 08:11, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> I think (but I haven't spelunked into that code lately) that after a
> tickless idle period it will update jiffies N ticks based on the
> clocksource, and then run any other interrupt handler code, so jiffies
> will always appear to be up to date.
>
> Ah, yes, here it is:
>
> /**
>  * tick_nohz_update_jiffies - update jiffies when idle was interrupted
>  *
>  * Called from interrupt entry when the CPU was idle
>  *
>  * In case the sched_tick was stopped on this CPU, we have to check if jiffies
>  * must be updated. Otherwise an interrupt handler could use a stale jiffy
>  * value. We do this unconditionally on any cpu, as we don't know whether the
>  * cpu, which has the update task assigned is in a long sleep.
>  */
> static void tick_nohz_update_jiffies(ktime_t now)
> {
> 	...
> }

But this is available only with CONFIG_NO_HZ, which is a freely
selectable option. So perhaps the code should still be added
inside an #ifndef CONFIG_NO_HZ?

Jan
Jeremy Fitzhardinge
2010-Oct-18 16:52 UTC
Re: [Xen-devel] Re: [PATCH] xen: always handle VIRQ_TIMER first.
On 10/18/2010 06:20 AM, Jan Beulich wrote:
>>>> On 17.10.10 at 08:11, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> I think (but I haven't spelunked into that code lately) that after a
>> tickless idle period it will update jiffies N ticks based on the
>> clocksource, and then run any other interrupt handler code, so jiffies
>> will always appear to be up to date.
>>
>> Ah, yes, here it is:
>>
>> /**
>>  * tick_nohz_update_jiffies - update jiffies when idle was interrupted
>>  *
>>  * Called from interrupt entry when the CPU was idle
>>  *
>>  * In case the sched_tick was stopped on this CPU, we have to check if jiffies
>>  * must be updated. Otherwise an interrupt handler could use a stale jiffy
>>  * value. We do this unconditionally on any cpu, as we don't know whether the
>>  * cpu, which has the update task assigned is in a long sleep.
>>  */
>> static void tick_nohz_update_jiffies(ktime_t now)
>> {
>> 	...
>> }
> But this is available only with CONFIG_NO_HZ, which is a freely
> selectable option. So perhaps the code should still be added
> inside an #ifndef CONFIG_NO_HZ?

Non-NO_HZ is a pretty pessimal configuration for a VM, or indeed any
system which cares about power. Are there any use cases for which it's a
good idea?

However, you could change it to always update jiffies on any interrupt
entrypoint, regardless of whether it's coming from an idle state. Or even
just change "jiffies" into a macro which calls a function to just
compute the current value without needing to rely on interrupts at all.

    J
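The last suggestion, turning "jiffies" into a macro that computes the value
on demand, might look roughly like the sketch below. It is a user-space
illustration only: the demo_* names, the 100 Hz rate and the injectable clock
callback are assumptions made for the example, and the clock is assumed to be
monotonic and never earlier than the recorded boot time.

```c
#include <stdint.h>

#define DEMO_HZ 100ULL				/* hypothetical tick rate */
#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Clock source callback; returns monotonic time in nanoseconds. */
typedef uint64_t (*demo_clock_fn)(void);

struct demo_clock {
	demo_clock_fn read_ns;	/* assumed monotonic */
	uint64_t boot_ns;	/* clock reading taken at "boot" */
};

/* Derive the tick count from elapsed time; no timer interrupt involved. */
static unsigned long demo_compute_jiffies(const struct demo_clock *clk)
{
	uint64_t elapsed_ns = clk->read_ns() - clk->boot_ns;

	return (unsigned long)(elapsed_ns / (DEMO_NSEC_PER_SEC / DEMO_HZ));
}

/* Readers write demo_jiffies(clk) where they would have read a variable. */
#define demo_jiffies(clk) demo_compute_jiffies(clk)

/* Fake clock so the example is deterministic. */
static uint64_t demo_fake_now_ns;

static uint64_t demo_fake_read_ns(void)
{
	return demo_fake_now_ns;
}
```

The trade-off in this sketch is that every read costs a clock access and a
division, in exchange for a value that cannot be stale regardless of how
long the CPU was idle or which interrupt woke it.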