thr3ads.net - Linux Virtualization - [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path [Nov 2017]


Thomas Gleixner

2017-Nov-15 22:03 UTC

[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On Wed, 15 Nov 2017, Peter Zijlstra wrote:
> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
> > From: Yang Zhang <yang.zhang.wz at gmail.com>
> > 
> > Implement a generic idle poll which resembles the functionality
> > found in arch/. Provide a weak arch_cpu_idle_poll() function which
> > can be overridden by the architecture code if needed.
> 
> No, we want less of those magic hooks, not more.
> 
> > Interrupts that arrive in idle loops may not cause a reschedule.
> > In a KVM guest, this costs several VM-exit/VM-entry cycles: a VM-entry
> > for the interrupt and an immediate VM-exit. This makes it more
> > expensive than bare metal. Add a generic idle poll before entering
> > the real idle path. When a reschedule event is pending, we can bypass
> > the real idle path.
> 
> Why not do a HV specific idle driver?

If I understand the problem correctly then he wants to avoid the heavy
lifting in tick_nohz_idle_enter() in the first place, but there is already
an interesting quirk there which makes it exit early.  See commit
3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this
commit looks similar. But let's not proliferate that. I'd rather see that
go away.

But the irq_timings stuff is heading into the same direction, with more
complex prediction logic which should tell you pretty well how long that
idle period is going to be and in case of an interrupt heavy workload this
would skip the extra work of stopping and restarting the tick and provide a
very good input into a polling decision.

This can be handled either in a HV specific idle driver or even in the
generic core code. If the interrupt does not arrive within the predicted
time, then you can assume that the flood stopped and invoke halt or
whatever.

That avoids all of that 'tunable and tweakable' x86 specific hackery and
utilizes common functionality which is mostly there already.
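The poll-or-halt policy described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not code from the thread or from the kernel: the names idle_action, choose_idle_action(), poll_window_expired() and the poll_limit_ns budget are hypothetical, and the predicted delay until the next interrupt is simply taken as an input, as a predictor such as irq_timings might supply it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decision outcome; not a kernel type. */
enum idle_action {
	IDLE_POLL,	/* spin, expecting an interrupt soon */
	IDLE_HALT,	/* enter the real idle path (e.g. HLT) */
};

/*
 * Choose how to idle given a predicted delay until the next interrupt
 * (predicted_ns) and the longest time we are willing to spend polling
 * (poll_limit_ns). Both inputs are illustrative assumptions.
 */
static enum idle_action choose_idle_action(uint64_t predicted_ns,
					   uint64_t poll_limit_ns)
{
	return predicted_ns <= poll_limit_ns ? IDLE_POLL : IDLE_HALT;
}

/*
 * Checked while polling: if the predicted window has passed and no
 * interrupt arrived, assume the flood stopped and fall back to halt.
 */
static bool poll_window_expired(uint64_t poll_start_ns, uint64_t now_ns,
				uint64_t predicted_ns, bool irq_arrived)
{
	assert(now_ns >= poll_start_ns);
	return !irq_arrived && (now_ns - poll_start_ns) > predicted_ns;
}
```

Keeping the prediction as a plain input means the same policy could sit either in generic core code or in an HV specific idle driver.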

Thanks,

	tglx

Peter Zijlstra

2017-Nov-16 08:45 UTC


[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On Wed, Nov 15, 2017 at 11:03:08PM +0100, Thomas Gleixner wrote:
> If I understand the problem correctly then he wants to avoid the heavy
> lifting in tick_nohz_idle_enter() in the first place, but there is already
> an interesting quirk there which makes it exit early.

Sure. And there are people who want to do the same for native.

Adding more ugly and special cases just isn't the way to go about doing
that.

I'm fairly sure I've told the various groups that want to tinker with
this to work together on this. I've also in fairly significant detail
sketched how to rework the idle code and idle predictors.

At this point I'm too tired to dig any of that up, so I'll just keep
saying no to patches that don't even attempt to go in the right
direction.

Thomas Gleixner

2017-Nov-16 08:58 UTC


[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On Thu, 16 Nov 2017, Peter Zijlstra wrote:
> On Wed, Nov 15, 2017 at 11:03:08PM +0100, Thomas Gleixner wrote:
> > If I understand the problem correctly then he wants to avoid the heavy
> > lifting in tick_nohz_idle_enter() in the first place, but there is already
> > an interesting quirk there which makes it exit early.
> 
> Sure. And there are people who want to do the same for native.
> 
> Adding more ugly and special cases just isn't the way to go about doing
> that.
> 
> I'm fairly sure I've told the various groups that want to tinker with
> this to work together on this. I've also in fairly significant detail
> sketched how to rework the idle code and idle predictors.
> 
> At this point I'm too tired to dig any of that up, so I'll just keep
> saying no to patches that don't even attempt to go in the right
> direction.

That's why I said: But let's not proliferate that. I'd rather see that go
away.

And yes, the VM folks should talk to those who are trying to solve similar
problems for native (embedded/mobile).

Thanks,

	tglx

Quan Xu

2017-Nov-16 09:12 UTC


[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On 2017-11-16 06:03, Thomas Gleixner wrote:
> On Wed, 15 Nov 2017, Peter Zijlstra wrote:
>
>> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
>>> From: Yang Zhang <yang.zhang.wz at gmail.com>
>>>
>>> Implement a generic idle poll which resembles the functionality
>>> found in arch/. Provide a weak arch_cpu_idle_poll() function which
>>> can be overridden by the architecture code if needed.
>> No, we want less of those magic hooks, not more.
>>
>>> Interrupts that arrive in idle loops may not cause a reschedule.
>>> In a KVM guest, this costs several VM-exit/VM-entry cycles: a VM-entry
>>> for the interrupt and an immediate VM-exit. This makes it more
>>> expensive than bare metal. Add a generic idle poll before entering
>>> the real idle path. When a reschedule event is pending, we can bypass
>>> the real idle path.
>> Why not do a HV specific idle driver?
> If I understand the problem correctly then he wants to avoid the heavy
> lifting in tick_nohz_idle_enter() in the first place, but there is already
> an interesting quirk there which makes it exit early.  See commit
> 3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this
> commit looks similar. But let's not proliferate that. I'd rather see that
> go away.

Agreed.

Even though we could get more benefit than commit 3c5d92a0cfb5 ("nohz:
Introduce arch_needs_cpu") in a KVM guest, I won't proliferate that.

> But the irq_timings stuff is heading into the same direction, with more
> complex prediction logic which should tell you pretty well how long that
> idle period is going to be and in case of an interrupt heavy workload this
> would skip the extra work of stopping and restarting the tick and provide a
> very good input into a polling decision.

Interesting. I have tested with IRQ_TIMINGS related code, which seems
not working so far.
Also, I'd like to help as much as I can.

> This can be handled either in a HV specific idle driver or even in the
> generic core code. If the interrupt does not arrive within the predicted
> time, then you can assume that the flood stopped and invoke halt or
> whatever.
>
> That avoids all of that 'tunable and tweakable' x86 specific hackery and
> utilizes common functionality which is mostly there already.

Here is some sample code. It polls for a while before entering halt in
cpuidle_enter_state(). If I get a reschedule event, then it doesn't try to
enter halt. (I hope this is the right direction, as Peter mentioned in
another email.)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -210,6 +210,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 		target_state = &drv->states[index];
 	}

+#ifdef CONFIG_PARAVIRT
+	paravirt_idle_poll();
+
+	if (need_resched())
+		return -EBUSY;
+#endif
+
 	/* Take note of the planned idle state. */
 	sched_idle_set_state(target_state);
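The control flow of this diff can be modelled in user space to show what the poll-then-bail-out step does. This is a hedged sketch: the body of paravirt_idle_poll() is not shown in the thread, so the bounded busy-wait here is an assumption; the atomic flag resched_pending is a hypothetical stand-in for need_resched(), and poll_ns is an illustrative budget. As in the diff, -EBUSY tells the caller to skip the real idle state.

```c
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for need_resched(); set by a waker. */
static atomic_bool resched_pending;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll for at most poll_ns. Return -EBUSY if a reschedule became pending
 * (the caller then skips the real idle state), or 0 once the window has
 * expired and the caller should go on to halt.
 */
static int idle_poll_then_enter(uint64_t poll_ns)
{
	uint64_t start = now_ns();

	assert(poll_ns > 0);
	while (now_ns() - start < poll_ns) {
		if (atomic_load_explicit(&resched_pending, memory_order_acquire))
			return -EBUSY;
	}
	/* Mirrors the need_resched() check that follows the poll. */
	if (atomic_load_explicit(&resched_pending, memory_order_acquire))
		return -EBUSY;
	return 0;
}
```

In a kernel setting the flag would be raised by a remote wakeup or an interrupt handler while the loop spins, and the loop would normally call cpu_relax(); the sketch omits that to stay portable.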




thanks,

Quan
Alibaba Cloud

Quan Xu

2017-Nov-16 09:29 UTC


[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On 2017-11-16 16:45, Peter Zijlstra wrote:
> On Wed, Nov 15, 2017 at 11:03:08PM +0100, Thomas Gleixner wrote:
>> If I understand the problem correctly then he wants to avoid the heavy
>> lifting in tick_nohz_idle_enter() in the first place, but there is already
>> an interesting quirk there which makes it exit early.
> Sure. And there are people who want to do the same for native.
>
> Adding more ugly and special cases just isn't the way to go about doing
> that.
>
> I'm fairly sure I've told the various groups that want to tinker with
> this to work together on this. I've also in fairly significant detail
> sketched how to rework the idle code and idle predictors.
>
> At this point I'm too tired to dig any of that up, so I'll just keep
> saying no to patches that don't even attempt to go in the right
> direction.

Peter, take care.

I really have considered this, and I am trying my best not to interfere
with the scheduler/idle code. If the irq_timings code is ready, I can use it
directly. I think irq_timings is not an easy task; I'd like to help as much
as I can. Also, I won't try to touch the tick_nohz* code again.

As tglx suggested, this can be handled either in a HV specific idle driver
or even in the generic core code.

I hope this is in the right direction.

Quan
Alibaba Cloud

Daniel Lezcano

2017-Nov-16 09:45 UTC


[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On 16/11/2017 10:12, Quan Xu wrote:
> 
> 
> On 2017-11-16 06:03, Thomas Gleixner wrote:
>> On Wed, 15 Nov 2017, Peter Zijlstra wrote:
>>
>>> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
>>>> From: Yang Zhang <yang.zhang.wz at gmail.com>
>>>>
>>>> Implement a generic idle poll which resembles the functionality
>>>> found in arch/. Provide a weak arch_cpu_idle_poll() function which
>>>> can be overridden by the architecture code if needed.
>>> No, we want less of those magic hooks, not more.
>>>
>>>> Interrupts that arrive in idle loops may not cause a reschedule.
>>>> In a KVM guest, this costs several VM-exit/VM-entry cycles: a VM-entry
>>>> for the interrupt and an immediate VM-exit. This makes it more
>>>> expensive than bare metal. Add a generic idle poll before entering
>>>> the real idle path. When a reschedule event is pending, we can bypass
>>>> the real idle path.
>>> Why not do a HV specific idle driver?
>> If I understand the problem correctly then he wants to avoid the heavy
>> lifting in tick_nohz_idle_enter() in the first place, but there is
>> already an interesting quirk there which makes it exit early.  See commit
>> 3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this
>> commit looks similar. But let's not proliferate that. I'd rather see that
>> go away.
> 
> Agreed.
> 
> Even though we could get more benefit than commit 3c5d92a0cfb5 ("nohz:
> Introduce arch_needs_cpu") in a KVM guest, I won't proliferate that.
> 
>> But the irq_timings stuff is heading into the same direction, with more
>> complex prediction logic which should tell you pretty well how long that
>> idle period is going to be and in case of an interrupt heavy workload
>> this would skip the extra work of stopping and restarting the tick and
>> provide a very good input into a polling decision.
> 
> 
> Interesting. I have tested with IRQ_TIMINGS related code, which seems
> not working so far.

I don't know how you tested it; can you elaborate on what you meant by
"seems not working so far"?

There is still some work to do to make it more efficient. The prediction
based on the irq timings is all right if the interrupts have a simple
periodicity. But as soon as there is a pattern, the current code can't
handle it properly and makes bad predictions.
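To illustrate why a purely periodic predictor struggles with patterned interrupt streams, here is a toy mean-of-recent-intervals predictor. It is an assumed simplification for illustration only, not the algorithm in the kernel's irq_timings code; predict_next_interval(), count_good_predictions() and the window and tolerance parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Naive period predictor: the next interval is the mean of the given
 * intervals. Illustrative only; not the kernel's irq_timings algorithm.
 */
static uint64_t predict_next_interval(const uint64_t *intervals, size_t n)
{
	uint64_t sum = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += intervals[i];
	return sum / n;
}

/*
 * Replay a recorded interval sequence: predict each interval from the
 * previous `window` intervals and count predictions within `tol`.
 */
static size_t count_good_predictions(const uint64_t *iv, size_t n,
				     size_t window, uint64_t tol)
{
	size_t good = 0;

	assert(window > 0);
	for (size_t i = window; i < n; i++) {
		uint64_t pred = predict_next_interval(&iv[i - window], window);
		uint64_t diff = pred > iv[i] ? pred - iv[i] : iv[i] - pred;

		if (diff <= tol)
			good++;
	}
	return good;
}
```

With a constant interval every prediction lands on target, but with a repeating 10, 10, 100 pattern every three-interval window averages to 40, so every prediction misses. That is the kind of case a pattern detector is meant to handle.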

I'm working on a self-learning pattern detection, which is currently too
heavy for the kernel; with it we should be able to detect the patterns
properly and re-adjust the period if it changes. I'm in the process of
making it suitable for kernel code (both math and perf).

One improvement which can be done right now, and which can help you, is
the interrupt rate on the CPU. It is possible to compute it, and that
will give accurate information for the polling decision.
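A per-CPU interrupt rate can be tracked cheaply with an exponentially weighted moving average of the gaps between interrupts (the inverse of the rate). This is a sketch under assumptions: struct irq_rate, irq_rate_record(), should_poll(), the 1/8 weight and the poll threshold are all hypothetical choices, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU interrupt statistics; not a kernel structure. */
struct irq_rate {
	uint64_t last_ns;	/* timestamp of the previous interrupt */
	uint64_t avg_gap_ns;	/* EWMA of gaps between interrupts */
	unsigned int nr_irqs;	/* interrupts recorded so far */
};

/* New samples get a weight of 1/8 (an assumed tuning choice). */
#define IRQ_RATE_SHIFT 3

/* Record an interrupt that arrived at now_ns and update the average gap. */
static void irq_rate_record(struct irq_rate *r, uint64_t now_ns)
{
	if (r->nr_irqs > 0) {
		uint64_t gap;
		int64_t delta;

		assert(now_ns >= r->last_ns);
		gap = now_ns - r->last_ns;
		if (r->nr_irqs == 1) {
			/* The first measured gap seeds the average. */
			r->avg_gap_ns = gap;
		} else {
			/* avg += (gap - avg) / 8, in signed arithmetic. */
			delta = (int64_t)gap - (int64_t)r->avg_gap_ns;
			r->avg_gap_ns = (uint64_t)((int64_t)r->avg_gap_ns +
						   delta / (1 << IRQ_RATE_SHIFT));
		}
	}
	r->last_ns = now_ns;
	r->nr_irqs++;
}

/* Poll only once a gap has been measured and interrupts arrive quickly. */
static bool should_poll(const struct irq_rate *r, uint64_t poll_threshold_ns)
{
	return r->nr_irqs >= 2 && r->avg_gap_ns <= poll_threshold_ns;
}
```

A short average gap means interrupts are arriving quickly, so polling is likely to pay off; a single long gap raises the average and steers the decision back toward halting.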



-- 
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs


Thomas Gleixner

2017-Nov-16 09:53 UTC


[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On Thu, 16 Nov 2017, Quan Xu wrote:
> On 2017-11-16 06:03, Thomas Gleixner wrote:
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -210,6 +210,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  		target_state = &drv->states[index];
>  	}
> 
> +#ifdef CONFIG_PARAVIRT
> +	paravirt_idle_poll();
> +
> +	if (need_resched())
> +		return -EBUSY;
> +#endif

That's just plain wrong. We don't want to see any of this PARAVIRT crap in
anything outside the architecture/hypervisor interfacing code which really
needs it.

The problem can and must be solved at the generic level in the first place
to gather the data which can be used to make such decisions.

How that information is used might be either completely generic or requires
system specific variants. But as long as we don't have any information at
all we cannot discuss that.

Please sit down and write up which data needs to be considered to make
decisions about probabilistic polling. Then we need to compare and contrast
that with the data which is necessary to make power/idle state decisions.

I would be very surprised if this data did not overlap by at least 90%.

Thanks,

	tglx
