David Vrabel
2013-Jun-20 19:16 UTC
[PATCH 1/4] hrtimers: provide a hrtimers_late_resume() call
From: David Vrabel <david.vrabel@citrix.com> Xen suspends (and resumes) without disabling non-boot CPUs as doing so adds considerable delay to live migrations. A 4 VCPU guest had more than 200 ms of additional downtime if disable_nonboot_cpus() was called prior to suspending. As a consequence, only high resolution timers on the current CPU are retriggered when resuming. The Xen resume path worked around this with a call to clock_was_set() to retrigger timers on all the CPUs. A subsequent change will make clock_was_set() internal to hrtimers so add an new call (hrtimers_late_resume()) to do the same job. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Thomas Gleixner <tglx@linutronix.de> --- drivers/xen/manage.c | 8 ++++++-- include/linux/hrtimer.h | 1 + kernel/hrtimer.c | 9 +++++++++ 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c index 412b96c..75bc2d5 100644 --- a/drivers/xen/manage.c +++ b/drivers/xen/manage.c @@ -166,8 +166,12 @@ out_resume: dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE); - /* Make sure timer events get retriggered on all CPUs */ - clock_was_set(); + /* + * syscore_resume() ends up calling hrtimer_resume() but this + * only retriggers timer events on the current CPU. We need + * to retrigger the events on all the other CPUS. + */ + hrtimers_late_resume(); out_thaw: #ifdef CONFIG_PREEMPT diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index d19a5c2..13df0fa 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -323,6 +323,7 @@ extern void timerfd_clock_was_set(void); static inline void timerfd_clock_was_set(void) { } #endif extern void hrtimers_resume(void); +extern void hrtimers_late_resume(void); extern ktime_t ktime_get(void); extern ktime_t ktime_get_real(void); diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index fd4b13b..34384b4 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -784,6 +784,15 @@ void hrtimers_resume(void) timerfd_clock_was_set(); } +/* + * If non-boot CPUs were online during resume, we need to retrigger + * the events for all the non-boot CPUs. + */ +void hrtimers_late_resume(void) +{ + clock_was_set(); +} + static inline void timer_stats_hrtimer_set_start_info(struct hrtimer *timer) { #ifdef CONFIG_TIMER_STATS -- 1.7.2.5
Thomas Gleixner
2013-Jun-21 07:53 UTC
Re: [PATCH 1/4] hrtimers: provide a hrtimers_late_resume() call
On Thu, 20 Jun 2013, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > Xen suspends (and resumes) without disabling non-boot CPUs as doing so > adds considerable delay to live migrations. A 4 VCPU guest had more > than 200 ms of additional downtime if disable_nonboot_cpus() was > called prior to suspending. > > As a consequence, only high resolution timers on the current CPU are > retriggered when resuming. The Xen resume path worked around this > with a call to clock_was_set() to retrigger timers on all the CPUs. > > A subsequent change will make clock_was_set() internal to hrtimers so > add an new call (hrtimers_late_resume()) to do the same job. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > --- > drivers/xen/manage.c | 8 ++++++-- > include/linux/hrtimer.h | 1 + > kernel/hrtimer.c | 9 +++++++++ > 3 files changed, 16 insertions(+), 2 deletions(-) > > diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c > index 412b96c..75bc2d5 100644 > --- a/drivers/xen/manage.c > +++ b/drivers/xen/manage.c > @@ -166,8 +166,12 @@ out_resume: > > dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE); > > - /* Make sure timer events get retriggered on all CPUs */ > - clock_was_set(); > + /* > + * syscore_resume() ends up calling hrtimer_resume() but this > + * only retriggers timer events on the current CPU. We need > + * to retrigger the events on all the other CPUS. > + */ > + hrtimers_late_resume();This is the completely wrong approach. If an architecture does not shut down the non boot cpus on suspend, then this wants to be handled in the core code and not in some random arch specific driver. Thanks, tglx
David Vrabel
2013-Jun-21 12:32 UTC
Re: [PATCH 1/4] hrtimers: provide a hrtimers_late_resume() call
On 21/06/13 08:53, Thomas Gleixner wrote:> On Thu, 20 Jun 2013, David Vrabel wrote: >> From: David Vrabel <david.vrabel@citrix.com> >> >> Xen suspends (and resumes) without disabling non-boot CPUs as doing so >> adds considerable delay to live migrations. A 4 VCPU guest had more >> than 200 ms of additional downtime if disable_nonboot_cpus() was >> called prior to suspending. >> >> As a consequence, only high resolution timers on the current CPU are >> retriggered when resuming. The Xen resume path worked around this >> with a call to clock_was_set() to retrigger timers on all the CPUs. >> >> A subsequent change will make clock_was_set() internal to hrtimers so >> add an new call (hrtimers_late_resume()) to do the same job. >> >> Signed-off-by: David Vrabel <david.vrabel@citrix.com> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> --- >> drivers/xen/manage.c | 8 ++++++-- >> include/linux/hrtimer.h | 1 + >> kernel/hrtimer.c | 9 +++++++++ >> 3 files changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c >> index 412b96c..75bc2d5 100644 >> --- a/drivers/xen/manage.c >> +++ b/drivers/xen/manage.c >> @@ -166,8 +166,12 @@ out_resume: >> >> dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE); >> >> - /* Make sure timer events get retriggered on all CPUs */ >> - clock_was_set(); >> + /* >> + * syscore_resume() ends up calling hrtimer_resume() but this >> + * only retriggers timer events on the current CPU. We need >> + * to retrigger the events on all the other CPUS. >> + */ >> + hrtimers_late_resume(); > > This is the completely wrong approach. If an architecture does not > shut down the non boot cpus on suspend, then this wants to be handled > in the core code and not in some random arch specific driver.Agreed. Does the following meet your requirements? David 8<------------------------------------------------------- hrtimers: support resuming with two or more CPUs online (but stopped) hrtimers_resume() only reprograms the timers for the current CPU as it assumes that all other CPUs are offline at this point in the resume process. If other CPUs are online then their timers will not be corrected and they may fire at the wrong time. When running as a Xen guest, this assumption is not true. Non-boot CPUs are only stopped with IRQs disabled instead of offlining them. This is a performance optimization as disabling the CPUs would add an unacceptable amount of additional downtime during a live migration (> 200 ms for a 4 VCPU guest). hrtimers_resume() cannot call on_each_cpu(retrigger_next_event,...) as the other CPUs will be stopped with IRQs disabled. Instead, defer the call to the next softirq. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Thomas Gleixner <tglx@linutronix.de> --- drivers/xen/manage.c | 3 --- kernel/hrtimer.c | 10 +++++++--- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c index 412b96c..421da85 100644 --- a/drivers/xen/manage.c +++ b/drivers/xen/manage.c @@ -166,9 +166,6 @@ out_resume: dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE); - /* Make sure timer events get retriggered on all CPUs */ - clock_was_set(); - out_thaw: #ifdef CONFIG_PREEMPT thaw_processes(); diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index fd4b13b..74aa7c5 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -773,15 +773,19 @@ void clock_was_set(void) /* * During resume we might have to reprogram the high resolution timer - * interrupt (on the local CPU): + * interrupt on all online CPUs. However, all other CPUs will be + * stopped with IRQs interrupts disabled at this point so defer this + * to the softirq. */ void hrtimers_resume(void) { + struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases); + WARN_ONCE(!irqs_disabled(), KERN_INFO "hrtimers_resume() called with IRQs enabled!"); - retrigger_next_event(NULL); - timerfd_clock_was_set(); + cpu_base->clock_was_set = 1; + __raise_softirq_irqoff(HRTIMER_SOFTIRQ); } static inline void timer_stats_hrtimer_set_start_info(struct hrtimer *timer) -- 1.7.2.5
Thomas Gleixner
2013-Jun-21 14:32 UTC
Re: [PATCH 1/4] hrtimers: provide a hrtimers_late_resume() call
On Fri, 21 Jun 2013, David Vrabel wrote:> On 21/06/13 08:53, Thomas Gleixner wrote: > > This is the completely wrong approach. If an architecture does not > > shut down the non boot cpus on suspend, then this wants to be handled > > in the core code and not in some random arch specific driver. > > Agreed. Does the following meet your requirements?Indeed. That''s looks way more reasonable. Though...> hrtimers_resume() cannot call on_each_cpu(retrigger_next_event,...) > as the other CPUs will be stopped with IRQs disabled. Instead, defer > the call to the next softirq.that''s just working by chance and not by design as there is no guarantee that the next interrupt, which invokes the softirq, will arrive in time. So you want to make sure that an interrupt arrives. Invoking retrigger_next_event(NULL) from hrtimer_resume() should do the trick. Thanks, tglx
David Vrabel
2013-Jun-21 17:30 UTC
Re: [PATCH 1/4] hrtimers: provide a hrtimers_late_resume() call
On 21/06/13 15:32, Thomas Gleixner wrote:> On Fri, 21 Jun 2013, David Vrabel wrote: >> On 21/06/13 08:53, Thomas Gleixner wrote: >>> This is the completely wrong approach. If an architecture does not >>> shut down the non boot cpus on suspend, then this wants to be handled >>> in the core code and not in some random arch specific driver. >> >> Agreed. Does the following meet your requirements? > > Indeed. That''s looks way more reasonable. Though... > >> hrtimers_resume() cannot call on_each_cpu(retrigger_next_event,...) >> as the other CPUs will be stopped with IRQs disabled. Instead, defer >> the call to the next softirq. > > that''s just working by chance and not by design as there is no > guarantee that the next interrupt, which invokes the softirq, will > arrive in time. So you want to make sure that an interrupt arrives.That is a good point.> Invoking retrigger_next_event(NULL) from hrtimer_resume() should do > the trick.But I''m not sure that would be sufficient, although I may not be understanding how the hrtimers work or are used. There may be timers on other CPUs that are supposed to fire earlier than those on the current CPU. There may even be no timers scheduled on the current CPU. I think there needs to be something like: hrtimers_resume() { retrigger_next_event(NULL); /* Timers on other CPUs might expire earlier. Program an earlier event and use this to kick the softirq to correctly reprogram the events on the other CPUs. */ expires_next = cpu_base->expires_next; for_each_online_cpu(cpu) { expires = hrtimers_next_event_on_cpu(cpu) if (expires < expires_next) expires_next = expires; } tick_program_event(expires_next, 1); cpu_base->clock_was_set = 1; __raise_softirq_irqoff(HRTIMER_SOFTIRQ); } However, since hrtimers require the use of a one-shot ticker and when one-shot timers are resumed they are armed to fire immediately (see tick_resume_oneshot()) this interrupt is sufficient to kick the require softirq. So, as proposed before: hrtimers_resume() { /* This CPU''s tick is armed to fire immediately by tick_oneshot_resume(). Just need raise a softirq to program the timers on all CPUs. */ cpu_base->clock_was_set = 1; __raise_softirq_irqoff(HRTIMER_SOFTIRQ); } Do you agree or disagree? David
Thomas Gleixner
2013-Jun-21 21:24 UTC
Re: [PATCH 1/4] hrtimers: provide a hrtimers_late_resume() call
On Fri, 21 Jun 2013, David Vrabel wrote:> On 21/06/13 15:32, Thomas Gleixner wrote: > However, since hrtimers require the use of a one-shot ticker and when > one-shot timers are resumed they are armed to fire immediately (see > tick_resume_oneshot()) this interrupt is sufficient to kick the require > softirq. > > So, as proposed before: > > hrtimers_resume() > { > /* This CPU''s tick is armed to fire immediately by > tick_oneshot_resume(). Just need raise a softirq to program > the timers on all CPUs. */ > cpu_base->clock_was_set = 1; > __raise_softirq_irqoff(HRTIMER_SOFTIRQ); > } > > Do you agree or disagree?Fair enough. I did not think about that. With the comment in place it is clear. It might be a bit more elaborate for the casual reader. Thanks, tglx