Quan Xu
2017-Nov-13 10:06 UTC
[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path
From: Yang Zhang <yang.zhang.wz at gmail.com>

Implement a generic idle poll which resembles the functionality
found in arch/. Provide weak arch_cpu_idle_poll function which
can be overridden by the architecture code if needed.

Interrupts arrive which may not cause a reschedule in idle loops.
In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
for interrupts and VM-exit immediately. Also this becomes more
expensive than bare metal. Add a generic idle poll before enter
real idle path. When a reschedule event is pending, we can bypass
the real idle path.

Signed-off-by: Quan Xu <quan.xu0 at gmail.com>
Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Ingo Molnar <mingo at redhat.com>
Cc: "H. Peter Anvin" <hpa at zytor.com>
Cc: x86 at kernel.org
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Borislav Petkov <bp at alien8.de>
Cc: Kyle Huey <me at kylehuey.com>
Cc: Len Brown <len.brown at intel.com>
Cc: Andy Lutomirski <luto at kernel.org>
Cc: Tom Lendacky <thomas.lendacky at amd.com>
Cc: Tobias Klauser <tklauser at distanz.ch>
Cc: linux-kernel at vger.kernel.org
---
 arch/x86/kernel/process.c |    7 +++++++
 kernel/sched/idle.c       |    2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c676853..f7db8b5 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -333,6 +333,13 @@ void arch_cpu_idle(void)
 	x86_idle();
 }
 
+#ifdef CONFIG_PARAVIRT
+void arch_cpu_idle_poll(void)
+{
+	paravirt_idle_poll();
+}
+#endif
+
 /*
  * We use this if we don't have any better idle routine..
  */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 257f4f0..df7c422 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -74,6 +74,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
 }
 
 /* Weak implementations for optional arch specific functions */
+void __weak arch_cpu_idle_poll(void) { }
 void __weak arch_cpu_idle_prepare(void) { }
 void __weak arch_cpu_idle_enter(void) { }
 void __weak arch_cpu_idle_exit(void) { }
@@ -219,6 +220,7 @@ static void do_idle(void)
 	 */
 
 	__current_set_polling();
+	arch_cpu_idle_poll();
 	quiet_vmstat();
 	tick_nohz_idle_enter();
 
-- 
1.7.1
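
For readers following the thread, the "poll before the real idle path" idea in the commit message can be illustrated with a small userspace sketch. This is not the patch's code: paravirt_idle_poll() is defined elsewhere in the series and is not shown in this message. The names resched_pending, poll_for_resched(), idle_once() and the spin budget max_spins are all hypothetical stand-ins chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's "reschedule pending" state. */
static atomic_bool resched_pending;

/* Hypothetical outcome of one pass through the idle path. */
enum idle_action { IDLE_BYPASSED, IDLE_HALTED };

/*
 * Spin for at most max_spins iterations, watching for a pending
 * reschedule. Returns true if one is seen inside the window.
 */
static bool poll_for_resched(unsigned int max_spins)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		if (atomic_load(&resched_pending))
			return true;
	}
	return false;
}

/*
 * One idle pass: poll first, and only fall through to the expensive
 * halt path (a VM-exit in a guest) when nothing became runnable.
 */
static enum idle_action idle_once(unsigned int max_spins)
{
	if (poll_for_resched(max_spins))
		return IDLE_BYPASSED;
	return IDLE_HALTED;
}
```

With the flag clear, idle_once() falls through to the halt outcome once the spin budget is used up; with the flag set and a non-zero budget, it returns the bypass outcome immediately. How large the budget should be is exactly the tuning question the rest of the thread argues about.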
Thomas Gleixner
2017-Nov-15 22:03 UTC
[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path
On Wed, 15 Nov 2017, Peter Zijlstra wrote:
> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
> > From: Yang Zhang <yang.zhang.wz at gmail.com>
> >
> > Implement a generic idle poll which resembles the functionality
> > found in arch/. Provide weak arch_cpu_idle_poll function which
> > can be overridden by the architecture code if needed.
>
> No, we want less of those magic hooks, not more.
>
> > Interrupts arrive which may not cause a reschedule in idle loops.
> > In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
> > for interrupts and VM-exit immediately. Also this becomes more
> > expensive than bare metal. Add a generic idle poll before enter
> > real idle path. When a reschedule event is pending, we can bypass
> > the real idle path.
>
> Why not do a HV specific idle driver?

If I understand the problem correctly then he wants to avoid the heavy
lifting in tick_nohz_idle_enter() in the first place, but there is already
an interesting quirk there which makes it exit early. See commit
3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
looks similar. But let's not proliferate that. I'd rather see that go away.

But the irq_timings stuff is heading in the same direction, with a more
complex prediction logic which should tell you pretty well how long that
idle period is going to be, and in case of an interrupt heavy workload this
would skip the extra work of stopping and restarting the tick and provide a
very good input into a polling decision.

This can be handled either in a HV specific idle driver or even in the
generic core code. If the interrupt does not arrive within the predicted
time then you can assume that the flood stopped and invoke halt or
whatever.

That avoids all of that 'tunable and tweakable' x86 specific hackery and
utilizes common functionality which is mostly there already.

Thanks,

	tglx
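
The prediction-driven polling decision described in the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the kernel or from this series: choose_idle(), should_stop_polling(), the halt_cost_ns threshold and the convention that the prediction is an absolute timestamp in nanoseconds are all hypothetical. In a real implementation the prediction would come from the irq_timings infrastructure the reply refers to.

```c
#include <stdbool.h>
#include <stdint.h>

enum idle_choice { CHOOSE_POLL, CHOOSE_HALT };

/*
 * Hypothetical policy: poll only when the next interrupt is predicted
 * to arrive sooner than a halt/wakeup round trip would cost.
 */
static enum idle_choice choose_idle(uint64_t now_ns,
				    uint64_t predicted_irq_ns,
				    uint64_t halt_cost_ns)
{
	/* No usable prediction, or the predicted time is already past. */
	if (predicted_irq_ns <= now_ns)
		return CHOOSE_HALT;

	if (predicted_irq_ns - now_ns < halt_cost_ns)
		return CHOOSE_POLL;

	return CHOOSE_HALT;
}

/*
 * While polling: stop once an interrupt has been seen, or once the
 * predicted arrival time has passed without one. In the latter case
 * the caller assumes the flood stopped and goes on to halt.
 */
static bool should_stop_polling(uint64_t now_ns,
				uint64_t predicted_irq_ns,
				bool irq_seen)
{
	return irq_seen || now_ns >= predicted_irq_ns;
}
```

The point of the design is that the only inputs are a prediction and a cost estimate, so nothing here needs an x86 specific tunable; whether this lives in a HV specific idle driver or in the generic core code is left open, as in the reply.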