thr3ads.net - Linux Virtualization - [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path [Nov 2017]


Daniel Lezcano

2017-Nov-16 09:45 UTC

[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On 16/11/2017 10:12, Quan Xu wrote:
>
> On 2017-11-16 06:03, Thomas Gleixner wrote:
>> On Wed, 15 Nov 2017, Peter Zijlstra wrote:
>>
>>> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
>>>> From: Yang Zhang <yang.zhang.wz at gmail.com>
>>>>
>>>> Implement a generic idle poll which resembles the functionality
>>>> found in arch/. Provide weak arch_cpu_idle_poll function which
>>>> can be overridden by the architecture code if needed.
>>> No, we want less of those magic hooks, not more.
>>>
>>>> Interrupts arrive which may not cause a reschedule in idle loops.
>>>> In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
>>>> for interrupts and VM-exit immediately. Also this becomes more
>>>> expensive than bare metal. Add a generic idle poll before enter
>>>> real idle path. When a reschedule event is pending, we can bypass
>>>> the real idle path.
>>> Why not do a HV specific idle driver?
>> If I understand the problem correctly then he wants to avoid the heavy
>> lifting in tick_nohz_idle_enter() in the first place, but there is already
>> an interesting quirk there which makes it exit early. See commit
>> 3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
>> looks similar. But let's not proliferate that. I'd rather see that go
>> away.
> 
> agreed.
> 
> We can get even more benefit than commit 3c5d92a0cfb5 ("nohz: Introduce
> arch_needs_cpu") in a KVM guest. I won't proliferate that.
> 
>> But the irq_timings stuff is heading into the same direction, with a more
>> complex prediction logic which should tell you pretty good how long that
>> idle period is going to be and in case of an interrupt heavy workload this
>> would skip the extra work of stopping and restarting the tick and provide a
>> very good input into a polling decision.
> 
> 
> interesting. I have tested with IRQ_TIMINGS related code, which seems
> not working so far.
I don't know how you tested it, can you elaborate what you meant by
"seems not working so far" ?

There is still some work to do to make it more efficient. The prediction
based on the irq timings is all right if the interrupts have a simple
periodicity. But as soon as there is a pattern, the current code can't
handle it properly and makes bad predictions.
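
To illustrate the limitation, here is a minimal sketch of a predictor that only handles simple periodicity. It is a deliberately simplified stand-in and not what the kernel's irq timings code does: the helper name predict_next_interval() and the use of a plain arithmetic mean are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): predict the next interrupt
 * interval, in microseconds, as the arithmetic mean of the last n
 * observed intervals.
 */
static uint64_t predict_next_interval(const uint64_t *intervals, size_t n)
{
	uint64_t sum = 0;

	assert(n > 0);	/* a prediction needs at least one sample */

	for (size_t i = 0; i < n; i++)
		sum += intervals[i];

	return sum / n;
}
```

With a strictly periodic series such as 10, 10, 10, 10 the mean equals every real interval. With a repeating pattern such as 1, 1, 10, 1, 1, 10 the mean is 4, a value that never occurs in the series, so a decision based on it is wrong every time.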

I'm working on a self-learning pattern detection which is currently too
heavy for the kernel. With it we should be able to detect the patterns
properly and re-adjust the period if it changes. I'm in the process of
making it suitable for kernel code (both math and perf).

One improvement which can be done right now and which can help you is
the interrupt rate on the CPU. It is possible to compute it, and that
will give accurate information for the polling decision.



-- 
 <http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Quan Xu

2017-Nov-20 07:05 UTC


[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On 2017-11-16 17:45, Daniel Lezcano wrote:
> On 16/11/2017 10:12, Quan Xu wrote:
>>
>> On 2017-11-16 06:03, Thomas Gleixner wrote:
>>> On Wed, 15 Nov 2017, Peter Zijlstra wrote:
>>>
>>>> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
>>>>> From: Yang Zhang <yang.zhang.wz at gmail.com>
>>>>>
>>>>> Implement a generic idle poll which resembles the functionality
>>>>> found in arch/. Provide weak arch_cpu_idle_poll function which
>>>>> can be overridden by the architecture code if needed.
>>>> No, we want less of those magic hooks, not more.
>>>>
>>>>> Interrupts arrive which may not cause a reschedule in idle loops.
>>>>> In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
>>>>> for interrupts and VM-exit immediately. Also this becomes more
>>>>> expensive than bare metal. Add a generic idle poll before enter
>>>>> real idle path. When a reschedule event is pending, we can bypass
>>>>> the real idle path.
>>>> Why not do a HV specific idle driver?
>>> If I understand the problem correctly then he wants to avoid the heavy
>>> lifting in tick_nohz_idle_enter() in the first place, but there is already
>>> an interesting quirk there which makes it exit early. See commit
>>> 3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
>>> looks similar. But let's not proliferate that. I'd rather see that go
>>> away.
>> agreed.
>>
>> We can get even more benefit than commit 3c5d92a0cfb5 ("nohz: Introduce
>> arch_needs_cpu") in a KVM guest. I won't proliferate that.
>>
>>> But the irq_timings stuff is heading into the same direction, with a more
>>> complex prediction logic which should tell you pretty good how long that
>>> idle period is going to be and in case of an interrupt heavy workload this
>>> would skip the extra work of stopping and restarting the tick and provide a
>>> very good input into a polling decision.
>>
>> interesting. I have tested with IRQ_TIMINGS related code, which seems
>> not working so far.
> I don't know how you tested it, can you elaborate what you meant by
> "seems not working so far" ?

Daniel, I tried to enable IRQ_TIMINGS* manually and used
irq_timings_next_event() to return an estimate of the earliest interrupt.
However, I got a constant.

> There is still some work to do to make it more efficient. The prediction
> based on the irq timings is all right if the interrupts have a simple
> periodicity. But as soon as there is a pattern, the current code can't
> handle it properly and makes bad predictions.
>
> I'm working on a self-learning pattern detection which is currently too
> heavy for the kernel. With it we should be able to detect the patterns
> properly and re-adjust the period if it changes. I'm in the process of
> making it suitable for kernel code (both math and perf).
>
> One improvement which can be done right now and which can help you is
> the interrupt rate on the CPU. It is possible to compute it, and that
> will give accurate information for the polling decision.
>

As tglx said, talk to each other / work together to make it usable for
all use cases.
Could you share how to enable it to get the interrupt rate on the CPU?
I can try it in a cloud scenario. Of course, I'd like to work with you
to improve it.

Quan
Alibaba Cloud

Daniel Lezcano

2017-Nov-20 18:01 UTC


[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

On 20/11/2017 08:05, Quan Xu wrote:

[ ... ]
>>>> But the irq_timings stuff is heading into the same direction, with a more
>>>> complex prediction logic which should tell you pretty good how long that
>>>> idle period is going to be and in case of an interrupt heavy workload this
>>>> would skip the extra work of stopping and restarting the tick and provide a
>>>> very good input into a polling decision.
>>>
>>> interesting. I have tested with IRQ_TIMINGS related code, which seems
>>> not working so far.
>> I don't know how you tested it, can you elaborate what you meant by
>> "seems not working so far" ?
> 
> Daniel, I tried to enable IRQ_TIMINGS* manually. used
> irq_timings_next_event()
> to return estimation of the earliest interrupt. However I got a constant.

The irq timings give you an indication of the next interrupt deadline.

This information is only one piece of the puzzle: you need to combine it
with the next timer expiration and the next scheduling event, then take
the earliest of these events on the timeline.

Using the trivial scheme above will work well with workloads like video
or mp3 playback but will fail as soon as the interrupts do not arrive on
a regular basis, and this is where the pattern recognition algorithm must
act.

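The combination step described above can be sketched as follows. This is an illustrative sketch only: struct idle_events and the helper names are hypothetical, all deadlines are assumed to be absolute nanosecond timestamps on the same clock, and UINT64_MAX is assumed to mean "no event known".

```c
#include <stdint.h>

/* Hypothetical container for the three predicted deadlines (ns). */
struct idle_events {
	uint64_t next_irq_ns;	/* estimate from the irq timings */
	uint64_t next_timer_ns;	/* next timer expiration */
	uint64_t next_sched_ns;	/* next scheduling event */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Return the earliest of the three deadlines. */
static uint64_t earliest_event_ns(const struct idle_events *ev)
{
	return min_u64(ev->next_irq_ns,
		       min_u64(ev->next_timer_ns, ev->next_sched_ns));
}

/* Expected idle duration from now, or 0 if an event is already due. */
static uint64_t expected_idle_ns(const struct idle_events *ev, uint64_t now_ns)
{
	uint64_t earliest = earliest_event_ns(ev);

	return earliest > now_ns ? earliest - now_ns : 0;
}
```

A caller could compare the value returned by expected_idle_ns() with the cost of entering and leaving the real idle path and keep polling when the expected idle time is shorter.
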
>> There is still some work to do to make it more efficient. The prediction
>> based on the irq timings is all right if the interrupts have a simple
>> periodicity. But as soon as there is a pattern, the current code can't
>> handle it properly and makes bad predictions.
>>
>> I'm working on a self-learning pattern detection which is currently too
>> heavy for the kernel. With it we should be able to detect the patterns
>> properly and re-adjust the period if it changes. I'm in the process of
>> making it suitable for kernel code (both math and perf).
>>
>> One improvement which can be done right now and which can help you is
>> the interrupt rate on the CPU. It is possible to compute it, and that
>> will give accurate information for the polling decision.
>>
>>
> As tglx said, talk to each other / work together to make it usable for
> all use cases.
> could you share how to enable it to get the interrupts rate on the CPU?
> I can try it
> in cloud scenario. of course, I'd like to work with you to improve it.

Sure, I will be glad if we can collaborate. I have some draft code, but
before sharing it I would like us to define what the rate is and what
kind of information we expect to infer from it. From my point of view it
is a value indicating the interrupt period per CPU: a short value
indicates a high number of interrupts on the CPU.

This value must decay with time. The question here is which decay
function we apply to the rate from the last timestamp?
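
One possible answer, given purely as a sketch and not as the draft code mentioned above: keep an exponentially weighted average of the interrupt period per CPU, and when the value is read, never report a period shorter than the time already elapsed since the last interrupt. The struct, the function names and the 1/8 weight are all assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state; timestamps are in nanoseconds. */
struct irq_rate {
	uint64_t avg_period_ns;	/* smoothed interval between interrupts */
	uint64_t last_ts_ns;	/* timestamp of the last interrupt */
	bool has_ts;		/* at least one interrupt was seen */
	bool has_avg;		/* at least one interval was measured */
};

/* Record an interrupt at now_ns and fold the interval into the average. */
static void irq_rate_update(struct irq_rate *r, uint64_t now_ns)
{
	if (r->has_ts) {
		uint64_t interval = now_ns - r->last_ts_ns;

		if (r->has_avg) {
			/* New sample weighs 1/8 (assumed weight). */
			r->avg_period_ns = (7 * r->avg_period_ns + interval) / 8;
		} else {
			r->avg_period_ns = interval;
			r->has_avg = true;
		}
	}
	r->last_ts_ns = now_ns;
	r->has_ts = true;
}

/*
 * Read the period with decay applied: once the CPU has been quiet for
 * longer than the average, the reported period grows with the quiet
 * time. UINT64_MAX means no interval has been measured yet.
 */
static uint64_t irq_rate_period_ns(const struct irq_rate *r, uint64_t now_ns)
{
	uint64_t quiet_ns;

	if (!r->has_avg)
		return UINT64_MAX;

	quiet_ns = now_ns - r->last_ts_ns;
	return quiet_ns > r->avg_period_ns ? quiet_ns : r->avg_period_ns;
}
```

With this choice the rate falls roughly as 1/t after the last interrupt. An exponential decay with a fixed half-life would be another candidate; which one gives the better polling decision is exactly the open question.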




-- 
 <http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
