Quan Xu
2017-Nov-13 10:06 UTC
[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
From: Quan Xu <quan.xu0 at gmail.com> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called in idle path which will poll for a while before we enter the real idle state. In virtualization, idle path includes several heavy operations includes timer access(LAPIC timer or TSC deadline timer) which will hurt performance especially for latency intensive workload like message passing task. The cost is mainly from the vmexit which is a hardware context switch between virtual machine and hypervisor. Our solution is to poll for a while and do not enter real idle path if we can get the schedule event during polling. Poll may cause the CPU waste so we adopt a smart polling mechanism to reduce the useless poll. Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com> Signed-off-by: Quan Xu <quan.xu0 at gmail.com> Cc: Juergen Gross <jgross at suse.com> Cc: Alok Kataria <akataria at vmware.com> Cc: Rusty Russell <rusty at rustcorp.com.au> Cc: Thomas Gleixner <tglx at linutronix.de> Cc: Ingo Molnar <mingo at redhat.com> Cc: "H. Peter Anvin" <hpa at zytor.com> Cc: x86 at kernel.org Cc: virtualization at lists.linux-foundation.org Cc: linux-kernel at vger.kernel.org Cc: xen-devel at lists.xenproject.org --- arch/x86/include/asm/paravirt.h | 5 +++++ arch/x86/include/asm/paravirt_types.h | 6 ++++++ arch/x86/kernel/paravirt.c | 6 ++++++ 3 files changed, 17 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index fd81228..3c83727 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -198,6 +198,11 @@ static inline unsigned long long paravirt_read_pmc(int counter) #define rdpmcl(counter, val) ((val) = paravirt_read_pmc(counter)) +static inline void paravirt_idle_poll(void) +{ + PVOP_VCALL0(pv_idle_ops.poll); +} + static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries) { PVOP_VCALL2(pv_cpu_ops.alloc_ldt, ldt, entries); diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 10cc3b9..95c0e3e 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -313,6 +313,10 @@ struct pv_lock_ops { struct paravirt_callee_save vcpu_is_preempted; } __no_randomize_layout; +struct pv_idle_ops { + void (*poll)(void); +} __no_randomize_layout; + /* This contains all the paravirt structures: we get a convenient * number for each function using the offset which we use to indicate * what to patch. */ @@ -323,6 +327,7 @@ struct paravirt_patch_template { struct pv_irq_ops pv_irq_ops; struct pv_mmu_ops pv_mmu_ops; struct pv_lock_ops pv_lock_ops; + struct pv_idle_ops pv_idle_ops; } __no_randomize_layout; extern struct pv_info pv_info; @@ -332,6 +337,7 @@ struct paravirt_patch_template { extern struct pv_irq_ops pv_irq_ops; extern struct pv_mmu_ops pv_mmu_ops; extern struct pv_lock_ops pv_lock_ops; +extern struct pv_idle_ops pv_idle_ops; #define PARAVIRT_PATCH(x) \ (offsetof(struct paravirt_patch_template, x) / sizeof(void *)) diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 19a3e8f..67cab22 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -128,6 +128,7 @@ unsigned paravirt_patch_jmp(void *insnbuf, const void *target, #ifdef CONFIG_PARAVIRT_SPINLOCKS .pv_lock_ops = pv_lock_ops, #endif + .pv_idle_ops = pv_idle_ops, }; return *((void **)&tmpl + type); } @@ -312,6 +313,10 @@ struct pv_time_ops pv_time_ops = { .steal_clock = native_steal_clock, }; +struct pv_idle_ops pv_idle_ops = { + .poll = paravirt_nop, +}; + __visible struct pv_irq_ops pv_irq_ops = { .save_fl = __PV_IS_CALLEE_SAVE(native_save_fl), .restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl), @@ -463,3 +468,4 @@ struct pv_mmu_ops pv_mmu_ops __ro_after_init = { EXPORT_SYMBOL (pv_mmu_ops); EXPORT_SYMBOL_GPL(pv_info); EXPORT_SYMBOL (pv_irq_ops); +EXPORT_SYMBOL (pv_idle_ops); -- 1.7.1
Wanpeng Li
2017-Nov-13 11:09 UTC
[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
2017-11-13 18:53 GMT+08:00 Juergen Gross <jgross at suse.com>:> On 13/11/17 11:06, Quan Xu wrote: >> From: Quan Xu <quan.xu0 at gmail.com> >> >> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called >> in idle path which will poll for a while before we enter the real idle >> state. >> >> In virtualization, idle path includes several heavy operations >> includes timer access(LAPIC timer or TSC deadline timer) which will >> hurt performance especially for latency intensive workload like message >> passing task. The cost is mainly from the vmexit which is a hardware >> context switch between virtual machine and hypervisor. Our solution is >> to poll for a while and do not enter real idle path if we can get the >> schedule event during polling. >> >> Poll may cause the CPU waste so we adopt a smart polling mechanism to >> reduce the useless poll. >> >> Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com> >> Signed-off-by: Quan Xu <quan.xu0 at gmail.com> >> Cc: Juergen Gross <jgross at suse.com> >> Cc: Alok Kataria <akataria at vmware.com> >> Cc: Rusty Russell <rusty at rustcorp.com.au> >> Cc: Thomas Gleixner <tglx at linutronix.de> >> Cc: Ingo Molnar <mingo at redhat.com> >> Cc: "H. Peter Anvin" <hpa at zytor.com> >> Cc: x86 at kernel.org >> Cc: virtualization at lists.linux-foundation.org >> Cc: linux-kernel at vger.kernel.org >> Cc: xen-devel at lists.xenproject.org > > Hmm, is the idle entry path really so critical to performance that a new > pvops function is necessary? Wouldn't a function pointer, maybe guarded > by a static key, be enough? A further advantage would be that this would > work on other architectures, too.There is a "Adaptive halt-polling" which are merged to upstream more than two years ago avoids to thread the critical path and has already been ported to other architectures. https://lkml.org/lkml/2015/9/3/615 Regards, Wanpeng Li
Quan Xu
2017-Nov-14 07:02 UTC
[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
On 2017/11/13 18:53, Juergen Gross wrote:> On 13/11/17 11:06, Quan Xu wrote: >> From: Quan Xu <quan.xu0 at gmail.com> >> >> So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called >> in idle path which will poll for a while before we enter the real idle >> state. >> >> In virtualization, idle path includes several heavy operations >> includes timer access(LAPIC timer or TSC deadline timer) which will >> hurt performance especially for latency intensive workload like message >> passing task. The cost is mainly from the vmexit which is a hardware >> context switch between virtual machine and hypervisor. Our solution is >> to poll for a while and do not enter real idle path if we can get the >> schedule event during polling. >> >> Poll may cause the CPU waste so we adopt a smart polling mechanism to >> reduce the useless poll. >> >> Signed-off-by: Yang Zhang <yang.zhang.wz at gmail.com> >> Signed-off-by: Quan Xu <quan.xu0 at gmail.com> >> Cc: Juergen Gross <jgross at suse.com> >> Cc: Alok Kataria <akataria at vmware.com> >> Cc: Rusty Russell <rusty at rustcorp.com.au> >> Cc: Thomas Gleixner <tglx at linutronix.de> >> Cc: Ingo Molnar <mingo at redhat.com> >> Cc: "H. Peter Anvin" <hpa at zytor.com> >> Cc: x86 at kernel.org >> Cc: virtualization at lists.linux-foundation.org >> Cc: linux-kernel at vger.kernel.org >> Cc: xen-devel at lists.xenproject.org > Hmm, is the idle entry path really so critical to performance that a new > pvops function is necessary?Juergen, Here is the data we get when running benchmark netperf: ?1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0): ??? 29031.6 bit/s -- 76.1 %CPU ?2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0): ??? 35787.7 bit/s -- 129.4 %CPU ?3. w/ kvm dynamic poll: ??? 35735.6 bit/s -- 200.0 %CPU ?4. w/patch and w/ kvm dynamic poll: ??? 42225.3 bit/s -- 198.7 %CPU ?5. idle=poll ??? 37081.7 bit/s -- 998.1 %CPU ?w/ this patch, we will improve performance by 23%.. even we could improve ?performance by 45.4%, if we use w/patch and w/ kvm dynamic poll. also the ?cost of CPU is much lower than 'idle=poll' case..> Wouldn't a function pointer, maybe guarded > by a static key, be enough? A further advantage would be that this would > work on other architectures, too.I assume this feature will be ported to other archs.. a new pvops makes code clean and easy to maintain. also I tried to add it into existed pvops, but it doesn't match. Quan Alibaba Cloud> > Juergen >
Reasonably Related Threads
- [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
- [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
- [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
- [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops
- [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops