Peter Zijlstra
2022-Jun-27 07:49 UTC
[PATCH v2] x86/paravirt: useless assignment instructions cause Unixbench full core performance degradation
On Mon, Jun 27, 2022 at 10:13:50AM +0800, Guo Hui wrote:
> The instructions assigned to the vcpu_is_preempted function parameter
> in the X86 architecture physical machine are redundant instructions,
> causing the multi-core performance of Unixbench to drop by about 4% to 5%.
> The C function is as follows:
> static bool vcpu_is_preempted(long vcpu);
>
> The parameter 'vcpu' in the function osq_lock
> that calls the function vcpu_is_preempted is assigned as follows:
>
> The C code is in the function node_cpu:
> cpu = node->cpu - 1;
>
> The instructions corresponding to the C code are:
> mov 0x14(%rax),%edi
> sub $0x1,%edi
>
> The above instructions are unnecessary
> in the X86 Native operating environment,
> causing high cache-misses and degrading performance.

The above basically says that argument setup is not patched out and
causes significant pain due to a cache-miss.

> Signed-off-by: Guo Hui <guohui at uniontech.com>
> ---
>  arch/x86/kernel/paravirt-spinlocks.c | 4 ++++
>  kernel/locking/osq_lock.c            | 9 ++++++++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
> index 9e1ea99ad..7a55f8407 100644
> --- a/arch/x86/kernel/paravirt-spinlocks.c
> +++ b/arch/x86/kernel/paravirt-spinlocks.c
> @@ -33,6 +33,8 @@ bool pv_is_native_vcpu_is_preempted(void)
>  		__raw_callee_save___native_vcpu_is_preempted;
>  }
>
> +DECLARE_STATIC_KEY_FALSE(preemted_key);
> +
>  void __init paravirt_set_cap(void)
>  {
>  	if (!pv_is_native_spin_unlock())
> @@ -40,4 +42,6 @@ void __init paravirt_set_cap(void)
>
>  	if (!pv_is_native_vcpu_is_preempted())
>  		setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
> +	else
> +		static_branch_enable(&preemted_key);
>  }

At least for x86 it makes sense to have the static_key default the
other way around. That is, enable it along with vcpu_is_preempted().

> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index d5610ad52..a8798e701 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -22,9 +22,16 @@ static inline int encode_cpu(int cpu_nr)
>  	return cpu_nr + 1;
>  }
>
> +DEFINE_STATIC_KEY_FALSE(preemted_key);
> +
>  static inline int node_cpu(struct optimistic_spin_node *node)
>  {
> -	return node->cpu - 1;
> +	int cpu = 0;
> +
> +	if (!static_branch_unlikely(&preemted_key))
> +		cpu = node->cpu - 1;
> +
> +	return cpu;
>  }

Would not something like:

static inline bool vcpu_is_preempted_node(struct optimistic_spin_node *node)
{
	if (!static_branch_unlikely(&vcpu_has_preemption))
		return false;

	return vcpu_is_preempted(node_cpu(node->prev));
}

And then use that like:

	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
				  vcpu_is_preempted_node(node)))

Not generate better code still?
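
To make the "other way around" suggestion concrete, here is a minimal sketch of how the pieces could fit together. The key name vcpu_has_preemption is taken from the snippet above; where the key is defined and where it gets enabled (osq_lock.c and paravirt_set_cap()) are assumptions inferred from the posted patch, not something either mail spells out:

/* kernel/locking/osq_lock.c -- default off, so bare metal never takes the branch */
DEFINE_STATIC_KEY_FALSE(vcpu_has_preemption);

static inline bool vcpu_is_preempted_node(struct optimistic_spin_node *node)
{
	if (!static_branch_unlikely(&vcpu_has_preemption))
		return false;

	return vcpu_is_preempted(node_cpu(node->prev));
}

/* used in the osq_lock() spin loop in place of the open-coded check */
	if (smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
				  vcpu_is_preempted_node(node)))
		goto unqueue;

/*
 * arch/x86/kernel/paravirt-spinlocks.c, in paravirt_set_cap(), with a
 * DECLARE_STATIC_KEY_FALSE(vcpu_has_preemption) visible here: enable the
 * key only when vcpu_is_preempted() is not the native always-false stub.
 */
	if (!pv_is_native_vcpu_is_preempted()) {
		setup_force_cpu_cap(X86_FEATURE_VCPUPREEMPT);
		static_branch_enable(&vcpu_has_preemption);
	}

With the key off by default, the native path compiles to a NOP that falls through to "return false", so the node->prev load the changelog blames for the cache misses is never issued on bare metal, and node_cpu() itself can stay as it is.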