Yu, Ke
2009-Mar-20 12:51 UTC
[Xen-devel] [PATCH] cpuidle: suspend/resume scheduler tick timer during cpu idle entry/exit
cpuidle can collaborate with scheduler to reduce unnecessary timer interrupt. For example, credit scheduler accounting timer doesn''t need to be active at idle time, so it can be stopped at cpuidle entry and resumed at cpuidle exit. This patch implements this function by adding two ops in scheduler: tick_suspend/tick_resume, and implement them for credit scheduler With this patch, under idle scenario, timer interrupt frequency decreased from ~100HZ to ~10HZ, and average C state residency increase from ~10ms to larger than 100ms. Also in a two-socket machine, about 4% idle power saving is observed. However, one issue is observed with this patch, i.e. there is soft-lockup in dom0 occasionally. This issue is still under debugging. Currently we already find a >1s VCPUOP_set_singleshot_timer timeout, which imply this may be a dom0 issue. we are working hard to figure the root cause. Considering the very visible effect of this patch, and the issue mentioned above only occurs when cpuidle is enabled, and has no impact to normal user, we finally decide to send out this patch to see if it is possible for 3.4 inclusion. In the bug fix phase, we will send out bug fix for the issue. Signed-off-by: Yu Ke <ke.yu@intel.com> Tian Kevin <kevin.tian@intel.com> diff -r 1e3171c4ea26 xen/arch/x86/acpi/cpu_idle.c --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -288,6 +288,8 @@ static void acpi_processor_idle(void) power->last_state = cx; + sched_tick_suspend(); + /* * Sleep: * ------ @@ -389,14 +391,20 @@ static void acpi_processor_idle(void) default: local_irq_enable(); + sched_tick_resume(0); return; } cx->usage++; if ( sleep_ticks > 0 ) { + sched_tick_resume(acpi_pm_tick_to_ns(sleep_ticks)); power->last_residency = acpi_pm_tick_to_ns(sleep_ticks) / 1000UL; cx->time += sleep_ticks; + } + else + { + sched_tick_resume(0); } if ( cpuidle_current_governor->reflect ) diff -r 1e3171c4ea26 xen/common/sched_credit.c --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -154,6 +154,7 @@ struct csched_private { spinlock_t lock; struct list_head active_sdom; uint32_t ncpus; + struct timer master_ticker; unsigned int master; cpumask_t idlers; uint32_t weight; @@ -754,7 +755,7 @@ csched_runq_sort(unsigned int cpu) } static void -csched_acct(void) +csched_acct(void* dummy) { unsigned long flags; struct list_head *iter_vcpu, *next_vcpu; @@ -962,18 +963,6 @@ csched_tick(void *_cpu) */ if ( !is_idle_vcpu(current) ) csched_vcpu_acct(cpu); - - /* - * Host-wide accounting duty - * - * Note: Currently, this is always done by the master boot CPU. Eventually, - * we could distribute or at the very least cycle the duty. - */ - if ( (csched_priv.master == cpu) && - (spc->tick % CSCHED_TICKS_PER_ACCT) == 0 ) - { - csched_acct(); - } /* * Check if runq needs to be sorted @@ -1307,10 +1296,33 @@ static __init int csched_start_tickers(v set_timer(&spc->ticker, NOW() + MILLISECS(CSCHED_MSECS_PER_TICK)); } + init_timer( &csched_priv.master_ticker, csched_acct, NULL, + csched_priv.master); + + set_timer( &csched_priv.master_ticker, NOW() + + CSCHED_MSECS_PER_TICK * CSCHED_TICKS_PER_ACCT ); + return 0; } __initcall(csched_start_tickers); +static void csched_tick_suspend(void) +{ + struct csched_pcpu *spc; + + spc = CSCHED_PCPU(smp_processor_id()); + + stop_timer(&spc->ticker); +} + +static void csched_tick_resume(s_time_t elaps) +{ + struct csched_pcpu *spc; + + spc = CSCHED_PCPU(smp_processor_id()); + + set_timer(&spc->ticker, spc->ticker.expires + elaps); +} struct scheduler sched_credit_def = { .name = "SMP Credit Scheduler", @@ -1334,4 +1346,7 @@ struct scheduler sched_credit_def = { .dump_cpu_state = csched_dump_pcpu, .dump_settings = csched_dump, .init = csched_init, + + .tick_suspend = csched_tick_suspend, + .tick_resume = csched_tick_resume, }; diff -r 1e3171c4ea26 xen/common/schedule.c --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -955,6 +955,16 @@ void dump_runq(unsigned char key) local_irq_restore(flags); } +inline void sched_tick_suspend(void) +{ + SCHED_OP(tick_suspend); +} + +inline void sched_tick_resume(s_time_t elaps) +{ + SCHED_OP(tick_resume, elaps); +} + #ifdef CONFIG_COMPAT #include "compat/schedule.c" #endif diff -r 1e3171c4ea26 xen/include/xen/sched-if.h --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -77,6 +77,9 @@ struct scheduler { struct xen_domctl_scheduler_op *); void (*dump_settings) (void); void (*dump_cpu_state) (int); + + void (*tick_suspend) (void); + void (*tick_resume) (s_time_t); }; #endif /* __XEN_SCHED_IF_H__ */ diff -r 1e3171c4ea26 xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -428,6 +428,8 @@ void sched_destroy_domain(struct domain void sched_destroy_domain(struct domain *d); long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *); int sched_id(void); +void sched_tick_suspend(void); +void sched_tick_resume(s_time_t elaps); void vcpu_wake(struct vcpu *d); void vcpu_sleep_nosync(struct vcpu *d); void vcpu_sleep_sync(struct vcpu *d); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 13:18 UTC
[Xen-devel] Re: [PATCH] cpuidle: suspend/resume scheduler tick timer during cpu idle entry/exit
On 20/03/2009 12:51, "Yu, Ke" <ke.yu@intel.com> wrote:> cpuidle can collaborate with scheduler to reduce unnecessary timer interrupt. > For example, credit scheduler accounting timer doesn''t need to be active at > idle time, so it can be stopped at cpuidle entry and resumed at cpuidle exit. > This patch implements this function by adding two ops in scheduler: > tick_suspend/tick_resume, and implement them for credit scheduler > > With this patch, under idle scenario, timer interrupt frequency decreased from > ~100HZ to ~10HZ, and average C state residency increase from ~10ms to larger > than 100ms. Also in a two-socket machine, about 4% idle power saving is > observed. > > However, one issue is observed with this patch, i.e. there is soft-lockup in > dom0 occasionally. This issue is still under debugging. Currently we already > find a >1s VCPUOP_set_singleshot_timer timeout, which imply this may be a dom0 > issue. we are working hard to figure the root cause.I don''t really want to take the patch while it is soft locking up. I would expect linux-2.6.18-xen.hg:22 to avoid lockup warnings due to too long singleshot timeouts (I assume you are testing with the 2.6.18 tree?). Personally I would rather have cpuidle be enabled by default (or even always with no disable option) and get existing Cx benefits for everyone, rather than have a slightly broken cpuidle option. Is there a reason not to turn on cpuidle by default now? Or even enable and then remove the boot option? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Yu, Ke
2009-Mar-20 14:28 UTC
[Xen-devel] RE: [PATCH] cpuidle: suspend/resume scheduler tick timer during cpu idle entry/exit
>-----Original Message----- >From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] > >On 20/03/2009 12:51, "Yu, Ke" <ke.yu@intel.com> wrote: > >> cpuidle can collaborate with scheduler to reduce unnecessary timer interrupt. >> For example, credit scheduler accounting timer doesn''t need to be active at >> idle time, so it can be stopped at cpuidle entry and resumed at cpuidle exit. >> This patch implements this function by adding two ops in scheduler: >> tick_suspend/tick_resume, and implement them for credit scheduler >> >> With this patch, under idle scenario, timer interrupt frequency decreased from >> ~100HZ to ~10HZ, and average C state residency increase from ~10ms to larger >> than 100ms. Also in a two-socket machine, about 4% idle power saving is >> observed. >> >> However, one issue is observed with this patch, i.e. there is soft-lockup in >> dom0 occasionally. This issue is still under debugging. Currently we already >> find a >1s VCPUOP_set_singleshot_timer timeout, which imply this may be a dom0 >> issue. we are working hard to figure the root cause. > >I don''t really want to take the patch while it is soft locking up. I would >expect linux-2.6.18-xen.hg:22 to avoid lockup warnings due to too long >singleshot timeouts (I assume you are testing with the 2.6.18 tree?).Right, I am testing it with 2.6.18 tree. I am also looking into the dom0 code, to see if should change dom0.> >Personally I would rather have cpuidle be enabled by default (or even always >with no disable option) and get existing Cx benefits for everyone, rather >than have a slightly broken cpuidle option.Agree.> >Is there a reason not to turn on cpuidle by default now? Or even enable and >then remove the boot option?There is no obvious obstacle to turn on cpuidle by default. According to our testing and measurement, cpuidle is pretty stable now, maybe it is time to enable it by default. Best Regards Ke _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-20 15:34 UTC
[Xen-devel] Re: [PATCH] cpuidle: suspend/resume scheduler tick timer during cpu idle entry/exit
On 20/03/2009 14:28, "Yu, Ke" <ke.yu@intel.com> wrote:>> Is there a reason not to turn on cpuidle by default now? Or even enable and >> then remove the boot option? > > There is no obvious obstacle to turn on cpuidle by default. According to our > testing and measurement, cpuidle is pretty stable now, maybe it is time to > enable it by default.Okay, I''ll do this. K. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel