From: Yang Zhang <yang.zhang.wz at gmail.com>

Some latency-intensive workloads have seen an obvious performance drop
when running inside a VM. The main reason is that the overhead is
amplified when running inside a VM. The biggest cost I have seen is in
the idle path.

This patch introduces a new mechanism to poll for a while before
entering the idle state. If a reschedule is needed during the poll, we
do not need to go through the heavy overhead path.

Here is the data we get when running the contextswitch benchmark to
measure latency (lower is better):

1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
   3402.9 ns/ctxsw -- 199.8 %CPU

2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
   halt_poll_threshold=10000  -- 1151.4 ns/ctxsw -- 200.1 %CPU
   halt_poll_threshold=20000  -- 1149.7 ns/ctxsw -- 199.9 %CPU
   halt_poll_threshold=30000  -- 1151.0 ns/ctxsw -- 199.9 %CPU
   halt_poll_threshold=40000  -- 1155.4 ns/ctxsw -- 199.3 %CPU
   halt_poll_threshold=50000  -- 1161.0 ns/ctxsw -- 200.0 %CPU
   halt_poll_threshold=100000 -- 1163.8 ns/ctxsw -- 200.4 %CPU
   halt_poll_threshold=300000 -- 1159.4 ns/ctxsw -- 201.9 %CPU
   halt_poll_threshold=500000 -- 1163.5 ns/ctxsw -- 205.5 %CPU

3. w/ kvm dynamic poll:
   halt_poll_ns=10000  -- 3470.5 ns/ctxsw -- 199.6 %CPU
   halt_poll_ns=20000  -- 3273.0 ns/ctxsw -- 199.7 %CPU
   halt_poll_ns=30000  -- 3628.7 ns/ctxsw -- 199.4 %CPU
   halt_poll_ns=40000  -- 2280.6 ns/ctxsw -- 199.5 %CPU
   halt_poll_ns=50000  -- 3200.3 ns/ctxsw -- 199.7 %CPU
   halt_poll_ns=100000 -- 2186.6 ns/ctxsw -- 199.6 %CPU
   halt_poll_ns=300000 -- 3178.7 ns/ctxsw -- 199.6 %CPU
   halt_poll_ns=500000 -- 3505.4 ns/ctxsw -- 199.7 %CPU

4. w/ patch and w/ kvm dynamic poll:
   halt_poll_ns=10000 & halt_poll_threshold=10000 -- 1155.5 ns/ctxsw -- 199.8 %CPU
   halt_poll_ns=10000 & halt_poll_threshold=20000 -- 1165.6 ns/ctxsw -- 199.8 %CPU
   halt_poll_ns=10000 & halt_poll_threshold=30000 -- 1161.1 ns/ctxsw -- 200.0 %CPU
   halt_poll_ns=20000 & halt_poll_threshold=10000 -- 1158.1 ns/ctxsw -- 199.8 %CPU
   halt_poll_ns=20000 & halt_poll_threshold=20000 -- 1161.0 ns/ctxsw -- 199.7 %CPU
   halt_poll_ns=20000 & halt_poll_threshold=30000 -- 1163.7 ns/ctxsw -- 199.9 %CPU
   halt_poll_ns=30000 & halt_poll_threshold=10000 -- 1158.7 ns/ctxsw -- 199.7 %CPU
   halt_poll_ns=30000 & halt_poll_threshold=20000 -- 1153.8 ns/ctxsw -- 199.8 %CPU
   halt_poll_ns=30000 & halt_poll_threshold=30000 -- 1155.1 ns/ctxsw -- 199.8 %CPU

5. idle=poll
   3957.57 ns/ctxsw -- 999.4 %CPU

Here is the data we get when running the netperf benchmark:

1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
   29031.6 bit/s -- 76.1 %CPU

2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
   halt_poll_threshold=10000  -- 29021.7 bit/s -- 105.1 %CPU
   halt_poll_threshold=20000  -- 33463.5 bit/s -- 128.2 %CPU
   halt_poll_threshold=30000  -- 34436.4 bit/s -- 127.8 %CPU
   halt_poll_threshold=40000  -- 35563.3 bit/s -- 129.6 %CPU
   halt_poll_threshold=50000  -- 35787.7 bit/s -- 129.4 %CPU
   halt_poll_threshold=100000 -- 35477.7 bit/s -- 130.0 %CPU
   halt_poll_threshold=300000 -- 35730.0 bit/s -- 132.4 %CPU
   halt_poll_threshold=500000 -- 34978.4 bit/s -- 134.2 %CPU

3. w/ kvm dynamic poll:
   halt_poll_ns=10000  -- 28849.8 bit/s -- 75.2 %CPU
   halt_poll_ns=20000  -- 29004.8 bit/s -- 76.1 %CPU
   halt_poll_ns=30000  -- 35662.0 bit/s -- 199.7 %CPU
   halt_poll_ns=40000  -- 35874.8 bit/s -- 187.5 %CPU
   halt_poll_ns=50000  -- 35603.1 bit/s -- 199.8 %CPU
   halt_poll_ns=100000 -- 35588.8 bit/s -- 200.0 %CPU
   halt_poll_ns=300000 -- 35912.4 bit/s -- 200.0 %CPU
   halt_poll_ns=500000 -- 35735.6 bit/s -- 200.0 %CPU

4. w/ patch and w/ kvm dynamic poll:
   halt_poll_ns=10000 & halt_poll_threshold=10000 -- 29427.9 bit/s -- 107.8 %CPU
   halt_poll_ns=10000 & halt_poll_threshold=20000 -- 33048.4 bit/s -- 128.1 %CPU
   halt_poll_ns=10000 & halt_poll_threshold=30000 -- 35129.8 bit/s -- 129.1 %CPU
   halt_poll_ns=20000 & halt_poll_threshold=10000 -- 31091.3 bit/s -- 130.3 %CPU
   halt_poll_ns=20000 & halt_poll_threshold=20000 -- 33587.9 bit/s -- 128.9 %CPU
   halt_poll_ns=20000 & halt_poll_threshold=30000 -- 35532.9 bit/s -- 129.1 %CPU
   halt_poll_ns=30000 & halt_poll_threshold=10000 -- 35633.1 bit/s -- 199.4 %CPU
   halt_poll_ns=30000 & halt_poll_threshold=20000 -- 42225.3 bit/s -- 198.7 %CPU
   halt_poll_ns=30000 & halt_poll_threshold=30000 -- 42210.7 bit/s -- 200.3 %CPU

5. idle=poll
   37081.7 bit/s -- 998.1 %CPU

---
V2 -> V3:
- Move the poll update into arch/. In v3, the poll update is based on
  the duration of the last idle loop, measured from tick_nohz_idle_enter
  to tick_nohz_idle_exit, and we try our best not to interfere with the
  scheduler/idle code. (This seems not to follow Peter's v2 comment,
  however we had a f2f discussion about it in Prague.)
- Enhance the patch descriptions.
- Enhance the Documentation and sysctls.
- Tested with the IRQ_TIMINGS related code, which does not seem to be
  working so far.

V1 -> V2:
- Integrate the smart halt poll into the paravirt code.
- Use idle_stamp instead of check_poll.
- Since it is hard to know whether a vCPU is the only task on its pCPU,
  we do not consider it in this series. (May improve it in the future.)

---
Quan Xu (4):
  x86/paravirt: Add pv_idle_ops to paravirt ops
  KVM guest: register kvm_idle_poll for pv_idle_ops
  Documentation: Add three sysctls for smart idle poll
  tick: get duration of the last idle loop

Yang Zhang (2):
  sched/idle: Add a generic poll before enter real idle path
  KVM guest: introduce smart idle poll algorithm

 Documentation/sysctl/kernel.txt       | 35 ++++++++++++++++
 arch/x86/include/asm/paravirt.h       |  5 ++
 arch/x86/include/asm/paravirt_types.h |  6 +++
 arch/x86/kernel/kvm.c                 | 73 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/paravirt.c            | 10 +++++
 arch/x86/kernel/process.c             |  7 +++
 include/linux/kernel.h                |  6 +++
 include/linux/tick.h                  |  2 +
 kernel/sched/idle.c                   |  2 +
 kernel/sysctl.c                       | 34 +++++++++++++++
 kernel/time/tick-sched.c              | 11 +++++
 kernel/time/tick-sched.h              |  3 +
 12 files changed, 194 insertions(+), 0 deletions(-)
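[Editor's note: patches 4/6-6/6 (the sysctls, the last-idle-duration
helper and the smart poll algorithm) are not reproduced in this excerpt.
Based on the changelog above, the self-tuning step presumably looks
roughly like the sketch below. It is illustrative only: the names
halt_poll_threshold, halt_poll_grow, halt_poll_shrink and
update_poll_duration() are assumptions, not identifiers from the posted
series; only poll_duration_ns comes from patch 2/6.]

/*
 * Illustrative sketch, not code from the series: grow the per-CPU poll
 * window when the last idle period was short (an event arrived soon,
 * so polling would likely have caught it), and shrink it when the CPU
 * stayed idle for a long time (polling would have been wasted).
 * Assumed to be called at idle exit with the idle duration measured by
 * the tick code in patch 4/6.
 */
static unsigned long halt_poll_threshold  = 30000;	/* ns, poll ceiling (assumed) */
static unsigned long halt_poll_grow       = 2;		/* assumed tunables */
static unsigned long halt_poll_shrink     = 2;
static unsigned long halt_poll_grow_start = 10000;	/* ns, first grow step */

static void update_poll_duration(unsigned long last_idle_ns)
{
	unsigned long val = this_cpu_read(poll_duration_ns);

	if (last_idle_ns < halt_poll_threshold) {
		/* Idle was interrupted quickly: polling looks worthwhile, grow. */
		val = val ? val * halt_poll_grow : halt_poll_grow_start;
		if (val > halt_poll_threshold)
			val = halt_poll_threshold;
	} else {
		/* Long idle period: polling would have been wasted, shrink. */
		val /= halt_poll_shrink;
	}

	this_cpu_write(poll_duration_ns, val);
}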
Quan Xu
2017-Nov-13 10:06 UTC
[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
From: Quan Xu <quan.xu0 at gmail.com>

So far, pv_idle_ops.poll is the only op for pv_idle. .poll is called
in the idle path and will poll for a while before we enter the real
idle state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which hurt
performance especially for latency-intensive workloads such as message
passing tasks. The cost comes mainly from the VM-exit, which is a
hardware context switch between the virtual machine and the hypervisor.
Our solution is to poll for a while and not enter the real idle path if
we get a schedule event during polling.

Polling may waste CPU, so we adopt a smart polling mechanism to reduce
useless polling.

Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com>
Signed-off-by: Quan Xu <quan.xu0 at gmail.com>
Cc: Juergen Gross <jgross at suse.com>
Cc: Alok Kataria <akataria at vmware.com>
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Ingo Molnar <mingo at redhat.com>
Cc: "H. Peter Anvin" <hpa at zytor.com>
Cc: x86 at kernel.org
Cc: virtualization at lists.linux-foundation.org
Cc: linux-kernel at vger.kernel.org
Cc: xen-devel at lists.xenproject.org
---
 arch/x86/include/asm/paravirt.h       | 5 +++++
 arch/x86/include/asm/paravirt_types.h | 6 ++++++
 arch/x86/kernel/paravirt.c            | 6 ++++++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index fd81228..3c83727 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -198,6 +198,11 @@ static inline unsigned long long paravirt_read_pmc(int counter)
 
 #define rdpmcl(counter, val) ((val) = paravirt_read_pmc(counter))
 
+static inline void paravirt_idle_poll(void)
+{
+	PVOP_VCALL0(pv_idle_ops.poll);
+}
+
 static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 {
 	PVOP_VCALL2(pv_cpu_ops.alloc_ldt, ldt, entries);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 10cc3b9..95c0e3e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -313,6 +313,10 @@ struct pv_lock_ops {
 	struct paravirt_callee_save vcpu_is_preempted;
 } __no_randomize_layout;
 
+struct pv_idle_ops {
+	void (*poll)(void);
+} __no_randomize_layout;
+
 /* This contains all the paravirt structures: we get a convenient
  * number for each function using the offset which we use to indicate
  * what to patch. */
@@ -323,6 +327,7 @@ struct paravirt_patch_template {
 	struct pv_irq_ops pv_irq_ops;
 	struct pv_mmu_ops pv_mmu_ops;
 	struct pv_lock_ops pv_lock_ops;
+	struct pv_idle_ops pv_idle_ops;
 } __no_randomize_layout;
 
 extern struct pv_info pv_info;
@@ -332,6 +337,7 @@ struct paravirt_patch_template {
 extern struct pv_irq_ops pv_irq_ops;
 extern struct pv_mmu_ops pv_mmu_ops;
 extern struct pv_lock_ops pv_lock_ops;
+extern struct pv_idle_ops pv_idle_ops;
 
 #define PARAVIRT_PATCH(x) \
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 19a3e8f..67cab22 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -128,6 +128,7 @@ unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 		.pv_lock_ops = pv_lock_ops,
 #endif
+		.pv_idle_ops = pv_idle_ops,
 	};
 	return *((void **)&tmpl + type);
 }
@@ -312,6 +313,10 @@ struct pv_time_ops pv_time_ops = {
 	.steal_clock = native_steal_clock,
 };
 
+struct pv_idle_ops pv_idle_ops = {
+	.poll = paravirt_nop,
+};
+
 __visible struct pv_irq_ops pv_irq_ops = {
 	.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
 	.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
@@ -463,3 +468,4 @@ struct pv_mmu_ops pv_mmu_ops __ro_after_init = {
 EXPORT_SYMBOL    (pv_mmu_ops);
 EXPORT_SYMBOL_GPL(pv_info);
 EXPORT_SYMBOL    (pv_irq_ops);
+EXPORT_SYMBOL    (pv_idle_ops);
-- 
1.7.1
Quan Xu
2017-Nov-13 10:06 UTC
[PATCH RFC v3 2/6] KVM guest: register kvm_idle_poll for pv_idle_ops
From: Quan Xu <quan.xu0 at gmail.com>

Although smart idle poll has nothing to do with paravirt, it cannot
bring any benefit to native, so we only enable it when Linux runs as a
KVM guest (it can also be extended to other hypervisors such as Xen,
Hyper-V and VMware).

Introduce the per-CPU variable poll_duration_ns to control the maximum
poll time.

Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com>
Signed-off-by: Quan Xu <quan.xu0 at gmail.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: "Radim Kr?m??" <rkrcmar at redhat.com>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Ingo Molnar <mingo at redhat.com>
Cc: "H. Peter Anvin" <hpa at zytor.com>
Cc: x86 at kernel.org
Cc: kvm at vger.kernel.org
Cc: linux-kernel at vger.kernel.org
---
 arch/x86/kernel/kvm.c | 26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8bb9594..2a6e402 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,6 +75,7 @@ static int parse_no_kvmclock_vsyscall(char *arg)
 
 early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
 
+static DEFINE_PER_CPU(unsigned long, poll_duration_ns);
 static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
 static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
 static int has_steal_clock = 0;
@@ -364,6 +365,29 @@ static void kvm_guest_cpu_init(void)
 		kvm_register_steal_time();
 }
 
+static void kvm_idle_poll(void)
+{
+	unsigned long poll_duration = this_cpu_read(poll_duration_ns);
+	ktime_t start, cur, stop;
+
+	start = cur = ktime_get();
+	stop = ktime_add_ns(ktime_get(), poll_duration);
+
+	do {
+		if (need_resched())
+			break;
+		cur = ktime_get();
+	} while (ktime_before(cur, stop));
+}
+
+static void kvm_guest_idle_init(void)
+{
+	if (!kvm_para_available())
+		return;
+
+	pv_idle_ops.poll = kvm_idle_poll;
+}
+
 static void kvm_pv_disable_apf(void)
 {
 	if (!__this_cpu_read(apf_reason.enabled))
@@ -499,6 +523,8 @@ void __init kvm_guest_init(void)
 	kvm_guest_cpu_init();
 #endif
 
+	kvm_guest_idle_init();
+
 	/*
 	 * Hard lockup detection is enabled by default. Disable it, as guests
 	 * can get false positives too easily, for example if the host is
-- 
1.7.1
Quan Xu
2017-Nov-13 10:06 UTC
[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path
From: Yang Zhang <yang.zhang.wz at gmail.com>

Implement a generic idle poll which resembles the functionality found
in arch/. Provide a weak arch_cpu_idle_poll() function which can be
overridden by architecture code if needed.

Interrupts may arrive in idle loops without causing a reschedule. In a
KVM guest, each such interrupt costs a VM-exit/VM-entry cycle: a
VM-entry to deliver the interrupt and a VM-exit immediately afterwards.
This makes idle much more expensive than on bare metal. Add a generic
idle poll before entering the real idle path. When a reschedule event
is pending, we can bypass the real idle path.

Signed-off-by: Quan Xu <quan.xu0 at gmail.com>
Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Ingo Molnar <mingo at redhat.com>
Cc: "H. Peter Anvin" <hpa at zytor.com>
Cc: x86 at kernel.org
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Borislav Petkov <bp at alien8.de>
Cc: Kyle Huey <me at kylehuey.com>
Cc: Len Brown <len.brown at intel.com>
Cc: Andy Lutomirski <luto at kernel.org>
Cc: Tom Lendacky <thomas.lendacky at amd.com>
Cc: Tobias Klauser <tklauser at distanz.ch>
Cc: linux-kernel at vger.kernel.org
---
 arch/x86/kernel/process.c | 7 +++++++
 kernel/sched/idle.c       | 2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c676853..f7db8b5 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -333,6 +333,13 @@ void arch_cpu_idle(void)
 	x86_idle();
 }
 
+#ifdef CONFIG_PARAVIRT
+void arch_cpu_idle_poll(void)
+{
+	paravirt_idle_poll();
+}
+#endif
+
 /*
  * We use this if we don't have any better idle routine..
  */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 257f4f0..df7c422 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -74,6 +74,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
 }
 
 /* Weak implementations for optional arch specific functions */
+void __weak arch_cpu_idle_poll(void) { }
 void __weak arch_cpu_idle_prepare(void) { }
 void __weak arch_cpu_idle_enter(void) { }
 void __weak arch_cpu_idle_exit(void) { }
@@ -219,6 +220,7 @@ static void do_idle(void)
 	 */
 	__current_set_polling();
 
+	arch_cpu_idle_poll();
 	quiet_vmstat();
 	tick_nohz_idle_enter();
 
-- 
1.7.1
Juergen Gross
2017-Nov-13 10:53 UTC
[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
On 13/11/17 11:06, Quan Xu wrote:
> From: Quan Xu <quan.xu0 at gmail.com>
>
> So far, pv_idle_ops.poll is the only op for pv_idle. .poll is called
> in the idle path and will poll for a while before we enter the real
> idle state.
>
> In virtualization, the idle path includes several heavy operations,
> including timer access (LAPIC timer or TSC deadline timer), which hurt
> performance especially for latency-intensive workloads such as message
> passing tasks. The cost comes mainly from the VM-exit, which is a
> hardware context switch between the virtual machine and the hypervisor.
> Our solution is to poll for a while and not enter the real idle path if
> we get a schedule event during polling.
>
> Polling may waste CPU, so we adopt a smart polling mechanism to reduce
> useless polling.
>
> Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com>
> Signed-off-by: Quan Xu <quan.xu0 at gmail.com>
> Cc: Juergen Gross <jgross at suse.com>
> Cc: Alok Kataria <akataria at vmware.com>
> Cc: Rusty Russell <rusty at rustcorp.com.au>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: Ingo Molnar <mingo at redhat.com>
> Cc: "H. Peter Anvin" <hpa at zytor.com>
> Cc: x86 at kernel.org
> Cc: virtualization at lists.linux-foundation.org
> Cc: linux-kernel at vger.kernel.org
> Cc: xen-devel at lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a
new pvops function is necessary? Wouldn't a function pointer, maybe
guarded by a static key, be enough? A further advantage would be that
this would work on other architectures, too.


Juergen
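[Editor's note: as a rough illustration of the alternative Juergen
suggests, a static-key-guarded function pointer could look like the
sketch below. None of these identifiers (guest_idle_poll_key,
guest_idle_poll_fn, guest_idle_register_poll, guest_idle_poll) appear
in the posted series; they are assumptions made for the example.]

/*
 * Minimal sketch, assuming invented names: a function pointer guarded
 * by a static key instead of a new pvops entry.
 */
#include <linux/jump_label.h>

static DEFINE_STATIC_KEY_FALSE(guest_idle_poll_key);
static void (*guest_idle_poll_fn)(void);

/* A guest (KVM, Xen, ...) registers its poll routine at boot. */
void guest_idle_register_poll(void (*poll)(void))
{
	guest_idle_poll_fn = poll;
	static_branch_enable(&guest_idle_poll_key);
}

/* Called from the generic idle path; patched to a NOP on bare metal. */
static inline void guest_idle_poll(void)
{
	if (static_branch_unlikely(&guest_idle_poll_key))
		guest_idle_poll_fn();
}

Compared to a pvops slot, this would work on any architecture and avoid
touching the paravirt patching machinery, at the cost of one statically
patched branch in the idle path.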
Peter Zijlstra
2017-Nov-15 12:11 UTC
[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path
On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
> From: Yang Zhang <yang.zhang.wz at gmail.com>
>
> Implement a generic idle poll which resembles the functionality found
> in arch/. Provide a weak arch_cpu_idle_poll() function which can be
> overridden by architecture code if needed.

No, we want less of those magic hooks, not more.

> Interrupts may arrive in idle loops without causing a reschedule. In a
> KVM guest, each such interrupt costs a VM-exit/VM-entry cycle: a
> VM-entry to deliver the interrupt and a VM-exit immediately afterwards.
> This makes idle much more expensive than on bare metal. Add a generic
> idle poll before entering the real idle path. When a reschedule event
> is pending, we can bypass the real idle path.

Why not do a HV specific idle driver?
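[Editor's note: as a rough illustration of Peter's suggestion (not code
from this thread), a hypervisor-specific idle driver could be built on
the cpuidle framework. All names and tunables below, including
guest_poll_limit_ns and guest_idle_driver, are invented for the sketch.]

/*
 * Hedged sketch of a guest-side cpuidle driver that polls briefly
 * instead of halting, as an alternative to hooking the generic idle
 * loop.  Illustrative only.
 */
#include <linux/cpuidle.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/sched/clock.h>

static u64 guest_poll_limit_ns = 200000;	/* assumed tunable poll budget */

static int guest_poll_enter(struct cpuidle_device *dev,
			    struct cpuidle_driver *drv, int index)
{
	u64 deadline = local_clock() + guest_poll_limit_ns;

	/* Busy-poll for a short, bounded time before giving up. */
	while (!need_resched() && local_clock() < deadline)
		cpu_relax();

	return index;
}

static struct cpuidle_driver guest_idle_driver = {
	.name		= "guest_poll_idle",	/* illustrative name */
	.owner		= THIS_MODULE,
	.states = {
		{
			.enter			= guest_poll_enter,
			.name			= "POLL",
			.desc			= "poll before halting",
			.exit_latency		= 0,
			.target_residency	= 0,
		},
		/* a second state that actually halts would follow here */
	},
	.state_count	= 1,
};

static int __init guest_idle_init(void)
{
	/* In practice this would only be registered when running as a guest. */
	return cpuidle_register(&guest_idle_driver, NULL);
}
device_initcall(guest_idle_init);

For what it's worth, guest-side halt polling did later land upstream in
roughly this shape, as the cpuidle-haltpoll driver.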
Konrad Rzeszutek Wilk
2017-Nov-15 21:31 UTC
[Xen-devel] [PATCH RFC v3 0/6] x86/idle: add halt poll support
On Mon, Nov 13, 2017 at 06:05:59PM +0800, Quan Xu wrote:
> From: Yang Zhang <yang.zhang.wz at gmail.com>
>
> Some latency-intensive workloads have seen an obvious performance drop
> when running inside a VM. The main reason is that the overhead is
> amplified when running inside a VM. The biggest cost I have seen is in
> the idle path.

Meaning a VMEXIT b/c it is a 'halt' operation? And then going back into
the guest (VMRESUME) takes time. And hence your latency gets all
whacked b/c of this?

So if I understand - you want to use your _full_ timeslice (of the
guest) without ever (or as much as possible) going into the hypervisor?

Which means in effect you don't care about power-saving or CPUfreq
savings, you just want to eat the full CPU for snack?

> This patch introduces a new mechanism to poll for a while before
> entering the idle state. If a reschedule is needed during the poll, we
> do not need to go through the heavy overhead path.

Schedule of what? The guest or the host?
Quan Xu
2017-Nov-20 07:18 UTC
[Xen-devel] [PATCH RFC v3 0/6] x86/idle: add halt poll support
On 2017-11-16 05:31, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 13, 2017 at 06:05:59PM +0800, Quan Xu wrote:
>> From: Yang Zhang <yang.zhang.wz at gmail.com>
>>
>> Some latency-intensive workloads have seen an obvious performance drop
>> when running inside a VM. The main reason is that the overhead is
>> amplified when running inside a VM. The biggest cost I have seen is in
>> the idle path.
> Meaning a VMEXIT b/c it is a 'halt' operation? And then going back into
> the guest (VMRESUME) takes time. And hence your latency gets all
> whacked b/c of this?

Konrad, I can't follow 'b/c' here.. sorry.

> So if I understand - you want to use your _full_ timeslice (of the
> guest) without ever (or as much as possible) going into the hypervisor?

As much as possible.

> Which means in effect you don't care about power-saving or CPUfreq
> savings, you just want to eat the full CPU for snack?

Actually, we do care about power-saving. The poll duration is
self-tuning, otherwise it would be almost the same as 'halt=poll'. Also
we always report the CPU usage together with the netperf/ctxsw
benchmark results: we get much better performance with only a limited
increase in CPU usage.

>> This patch introduces a new mechanism to poll for a while before
>> entering the idle state. If a reschedule is needed during the poll, we
>> do not need to go through the heavy overhead path.
> Schedule of what? The guest or the host?

A reschedule by the guest scheduler; it is the guest.

Quan
Alibaba Cloud