Juergen Gross
2015-Apr-30 10:53 UTC
[PATCH 0/6] x86: reduce paravirtualized spinlock overhead
Paravirtualized spinlocks produce some overhead even if the kernel is running on bare metal. The main reason are the more complex locking and unlocking functions. Especially unlocking is no longer just one instruction but so complex that it is no longer inlined. This patch series addresses this issue by adding two more pvops functions to reduce the size of the inlined spinlock functions. When running on bare metal unlocking is again basically one instruction. Compile tested with CONFIG_PARAVIRT_SPINLOCKS on and off, 32 and 64 bits. Functional testing on bare metal and as Xen dom0. Correct patching verified by disassembly of active kernel. Juergen Gross (6): x86: use macro instead of "0" for setting TICKET_SLOWPATH_FLAG x86: move decision about clearing slowpath flag into arch_spin_lock() x86: introduce new pvops function clear_slowpath x86: introduce new pvops function spin_unlock x86: switch config from UNINLINE_SPIN_UNLOCK to INLINE_SPIN_UNLOCK x86: remove no longer needed paravirt_ticketlocks_enabled arch/x86/Kconfig | 1 - arch/x86/include/asm/paravirt.h | 13 +++++++++ arch/x86/include/asm/paravirt_types.h | 12 ++++++++ arch/x86/include/asm/spinlock.h | 53 ++++++++++++----------------------- arch/x86/include/asm/spinlock_types.h | 3 +- arch/x86/kernel/kvm.c | 14 +-------- arch/x86/kernel/paravirt-spinlocks.c | 42 +++++++++++++++++++++++++-- arch/x86/kernel/paravirt.c | 12 ++++++++ arch/x86/kernel/paravirt_patch_32.c | 25 +++++++++++++++++ arch/x86/kernel/paravirt_patch_64.c | 24 ++++++++++++++++ arch/x86/xen/spinlock.c | 23 +-------------- include/linux/spinlock_api_smp.h | 2 +- kernel/Kconfig.locks | 7 +++-- kernel/Kconfig.preempt | 3 +- kernel/locking/spinlock.c | 2 +- lib/Kconfig.debug | 1 - 16 files changed, 154 insertions(+), 83 deletions(-) -- 2.1.4
Juergen Gross
2015-Apr-30 10:53 UTC
[PATCH 1/6] x86: use macro instead of "0" for setting TICKET_SLOWPATH_FLAG
For paravirtualized spinlocks setting the "slowpath" flag in __ticket_enter_slowpath() is done via setting bit "0" in lock->tickets.head instead of using a macro. Change this by defining an appropriate macro. Signed-off-by: Juergen Gross <jgross at suse.com> --- arch/x86/include/asm/spinlock.h | 3 ++- arch/x86/include/asm/spinlock_types.h | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index cf87de3..8ceec8d 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct static_key *key); static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) { - set_bit(0, (volatile unsigned long *)&lock->tickets.head); + set_bit(TICKET_SLOWPATH_BIT, + (volatile unsigned long *)&lock->tickets.head); } #else /* !CONFIG_PARAVIRT_SPINLOCKS */ diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index 5f9d757..579409a 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -4,8 +4,9 @@ #include <linux/types.h> #ifdef CONFIG_PARAVIRT_SPINLOCKS +#define TICKET_SLOWPATH_BIT 0 #define __TICKET_LOCK_INC 2 -#define TICKET_SLOWPATH_FLAG ((__ticket_t)1) +#define TICKET_SLOWPATH_FLAG ((__ticket_t)(1 << TICKET_SLOWPATH_BIT)) #else #define __TICKET_LOCK_INC 1 #define TICKET_SLOWPATH_FLAG ((__ticket_t)0) -- 2.1.4
Juergen Gross
2015-Apr-30 10:53 UTC
[PATCH 2/6] x86: move decision about clearing slowpath flag into arch_spin_lock()
The decision whether the slowpath flag is to be cleared for paravirtualized spinlocks is located in __ticket_check_and_clear_slowpath() today. Move that decision into arch_spin_lock() and add an unlikely attribute to it to avoid calling a function in case the compiler chooses not to inline __ticket_check_and_clear_slowpath() and the slowpath flag isn't set. Signed-off-by: Juergen Gross <jgross at suse.com> --- arch/x86/include/asm/spinlock.h | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 8ceec8d..268b9da 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -66,20 +66,18 @@ static inline int __tickets_equal(__ticket_t one, __ticket_t two) return !((one ^ two) & ~TICKET_SLOWPATH_FLAG); } -static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock, - __ticket_t head) +static inline void __ticket_clear_slowpath(arch_spinlock_t *lock, + __ticket_t head) { - if (head & TICKET_SLOWPATH_FLAG) { - arch_spinlock_t old, new; + arch_spinlock_t old, new; - old.tickets.head = head; - new.tickets.head = head & ~TICKET_SLOWPATH_FLAG; - old.tickets.tail = new.tickets.head + TICKET_LOCK_INC; - new.tickets.tail = old.tickets.tail; + old.tickets.head = head; + new.tickets.head = head & ~TICKET_SLOWPATH_FLAG; + old.tickets.tail = new.tickets.head + TICKET_LOCK_INC; + new.tickets.tail = old.tickets.tail; - /* try to clear slowpath flag when there are no contenders */ - cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); - } + /* try to clear slowpath flag when there are no contenders */ + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); } static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) @@ -120,7 +118,8 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock) __ticket_lock_spinning(lock, inc.tail); } clear_slowpath: - __ticket_check_and_clear_slowpath(lock, inc.head); + if (unlikely(inc.head & TICKET_SLOWPATH_FLAG)) + __ticket_clear_slowpath(lock, inc.head); out: barrier(); /* make sure nothing creeps before the lock is taken */ } -- 2.1.4
Juergen Gross
2015-Apr-30 10:54 UTC
[PATCH 3/6] x86: introduce new pvops function clear_slowpath
To speed up paravirtualized spinlock handling when running on bare metal introduce a new pvops function "clear_slowpath". This is a nop when the kernel is running on bare metal. As the clear_slowpath function is common for all users add a new initialization function to set the pvops function pointer in order to avoid spreading the knowledge which function to use. Signed-off-by: Juergen Gross <jgross at suse.com> --- arch/x86/include/asm/paravirt.h | 7 +++++++ arch/x86/include/asm/paravirt_types.h | 1 + arch/x86/include/asm/spinlock.h | 18 ++++-------------- arch/x86/kernel/kvm.c | 2 ++ arch/x86/kernel/paravirt-spinlocks.c | 22 ++++++++++++++++++++++ arch/x86/xen/spinlock.c | 2 ++ 6 files changed, 38 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 8957810..318f077 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -724,6 +724,13 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); } +static __always_inline void __ticket_clear_slowpath(arch_spinlock_t *lock, + __ticket_t head) +{ + PVOP_VCALL2(pv_lock_ops.clear_slowpath, lock, head); +} + +void pv_lock_activate(void); #endif #ifdef CONFIG_X86_32 diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index f7b0b5c..3432713 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -336,6 +336,7 @@ typedef u16 __ticket_t; struct pv_lock_ops { struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket); + void (*clear_slowpath)(arch_spinlock_t *lock, __ticket_t head); }; /* This contains all the paravirt structures: we get a convenient diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 268b9da..ab76c3e 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -60,26 +60,16 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock, { } +static inline void __ticket_clear_slowpath(arch_spinlock_t *lock, + __ticket_t head) +{ +} #endif /* CONFIG_PARAVIRT_SPINLOCKS */ static inline int __tickets_equal(__ticket_t one, __ticket_t two) { return !((one ^ two) & ~TICKET_SLOWPATH_FLAG); } -static inline void __ticket_clear_slowpath(arch_spinlock_t *lock, - __ticket_t head) -{ - arch_spinlock_t old, new; - - old.tickets.head = head; - new.tickets.head = head & ~TICKET_SLOWPATH_FLAG; - old.tickets.tail = new.tickets.head + TICKET_LOCK_INC; - new.tickets.tail = old.tickets.tail; - - /* try to clear slowpath flag when there are no contenders */ - cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); -} - static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) { return __tickets_equal(lock.tickets.head, lock.tickets.tail); diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 9435620..c3b4b43 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -830,6 +830,8 @@ void __init kvm_spinlock_init(void) pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning); pv_lock_ops.unlock_kick = kvm_unlock_kick; + + pv_lock_activate(); } static __init int kvm_spinlock_init_jump(void) diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c index bbb6c73..5ece813 100644 --- a/arch/x86/kernel/paravirt-spinlocks.c +++ b/arch/x86/kernel/paravirt-spinlocks.c @@ -8,10 +8,32 @@ #include <asm/paravirt.h> +#ifdef CONFIG_SMP +static void pv_ticket_clear_slowpath(arch_spinlock_t *lock, __ticket_t head) +{ + arch_spinlock_t old, new; + + old.tickets.head = head; + new.tickets.head = head & ~TICKET_SLOWPATH_FLAG; + old.tickets.tail = new.tickets.head + TICKET_LOCK_INC; + new.tickets.tail = old.tickets.tail; + + /* try to clear slowpath flag when there are no contenders */ + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); +} + +void pv_lock_activate(void) +{ + pv_lock_ops.clear_slowpath = pv_ticket_clear_slowpath; +} +EXPORT_SYMBOL_GPL(pv_lock_activate); +#endif + struct pv_lock_ops pv_lock_ops = { #ifdef CONFIG_SMP .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop), .unlock_kick = paravirt_nop, + .clear_slowpath = paravirt_nop, #endif }; EXPORT_SYMBOL(pv_lock_ops); diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 956374c..988c895 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -282,6 +282,8 @@ void __init xen_init_spinlocks(void) printk(KERN_DEBUG "xen: PV spinlocks enabled\n"); pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning); pv_lock_ops.unlock_kick = xen_unlock_kick; + + pv_lock_activate(); } /* -- 2.1.4
Juergen Gross
2015-Apr-30 10:54 UTC
[PATCH 4/6] x86: introduce new pvops function spin_unlock
To speed up paravirtualized spinlock handling when running on bare metal introduce a new pvops function "spin_unlock". This is a simple add instruction (possibly with lock prefix) when the kernel is running on bare metal. As the patched instruction includes a lock prefix in some configurations annotate this location to be subject to lock prefix patching. This is working even if paravirtualization requires a call instead of the unlock instruction, because patching the lock or it's replacement is done only if the correct counterpart is found at the location to be patched. We even are not dependant on the order of lock prefix and paravirtualization patching as the sample instruction is subject to lock prefix patching as well. Signed-off-by: Juergen Gross <jgross at suse.com> --- arch/x86/include/asm/paravirt.h | 6 ++++++ arch/x86/include/asm/paravirt_types.h | 11 +++++++++++ arch/x86/include/asm/spinlock.h | 24 ++++++++++-------------- arch/x86/kernel/paravirt-spinlocks.c | 16 ++++++++++++++++ arch/x86/kernel/paravirt.c | 12 ++++++++++++ arch/x86/kernel/paravirt_patch_32.c | 25 +++++++++++++++++++++++++ arch/x86/kernel/paravirt_patch_64.c | 24 ++++++++++++++++++++++++ 7 files changed, 104 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 318f077..2f39129 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -730,6 +730,11 @@ static __always_inline void __ticket_clear_slowpath(arch_spinlock_t *lock, PVOP_VCALL2(pv_lock_ops.clear_slowpath, lock, head); } +static __always_inline void __ticket_unlock(arch_spinlock_t *lock) +{ + PVOP_VCALL1_LOCK(pv_lock_ops.unlock, lock); +} + void pv_lock_activate(void); #endif @@ -843,6 +848,7 @@ static inline notrace unsigned long arch_local_irq_save(void) #undef PVOP_VCALL0 #undef PVOP_CALL0 #undef PVOP_VCALL1 +#undef PVOP_VCALL1_LOCK #undef PVOP_CALL1 #undef PVOP_VCALL2 #undef PVOP_CALL2 diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 3432713..a26af74 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -337,6 +337,7 @@ struct pv_lock_ops { struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket); void (*clear_slowpath)(arch_spinlock_t *lock, __ticket_t head); + void (*unlock)(arch_spinlock_t *lock); }; /* This contains all the paravirt structures: we get a convenient @@ -398,6 +399,7 @@ extern struct pv_lock_ops pv_lock_ops; unsigned paravirt_patch_nop(void); unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len); unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len); +unsigned paravirt_patch_unlock(void *insnbuf, unsigned len); unsigned paravirt_patch_ignore(unsigned len); unsigned paravirt_patch_call(void *insnbuf, const void *target, u16 tgt_clobbers, @@ -620,6 +622,12 @@ int paravirt_disable_iospace(void); __PVOP_CALL(rettype, op, "", "", PVOP_CALL_ARG1(arg1)) #define PVOP_VCALL1(op, arg1) \ __PVOP_VCALL(op, "", "", PVOP_CALL_ARG1(arg1)) +/* + * Using LOCK_PREFIX_HERE works because the lock prefix or it's replacement + * is checked to be present before being replaced. + */ +#define PVOP_VCALL1_LOCK(op, arg1) \ + __PVOP_VCALL(op, LOCK_PREFIX_HERE, "", PVOP_CALL_ARG1(arg1)) #define PVOP_CALLEE1(rettype, op, arg1) \ __PVOP_CALLEESAVE(rettype, op, "", "", PVOP_CALL_ARG1(arg1)) @@ -690,6 +698,9 @@ void paravirt_flush_lazy_mmu(void); void _paravirt_nop(void); u32 _paravirt_ident_32(u32); u64 _paravirt_ident_64(u64); +#ifdef CONFIG_PARAVIRT_SPINLOCKS +void _paravirt_native_ticket_unlock(arch_spinlock_t *lock); +#endif #define paravirt_nop ((void *)_paravirt_nop) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index ab76c3e..40a1091 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -42,6 +42,10 @@ extern struct static_key paravirt_ticketlocks_enabled; static __always_inline bool static_key_false(struct static_key *key); +static inline void ___ticket_unlock(arch_spinlock_t *lock) +{ + __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX); +} #ifdef CONFIG_PARAVIRT_SPINLOCKS static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) @@ -64,6 +68,11 @@ static inline void __ticket_clear_slowpath(arch_spinlock_t *lock, __ticket_t head) { } + +static inline void __ticket_unlock(arch_spinlock_t *lock) +{ + ___ticket_unlock(lock); +} #endif /* CONFIG_PARAVIRT_SPINLOCKS */ static inline int __tickets_equal(__ticket_t one, __ticket_t two) { @@ -131,20 +140,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { - if (TICKET_SLOWPATH_FLAG && - static_key_false(¶virt_ticketlocks_enabled)) { - __ticket_t head; - - BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS); - - head = xadd(&lock->tickets.head, TICKET_LOCK_INC); - - if (unlikely(head & TICKET_SLOWPATH_FLAG)) { - head &= ~TICKET_SLOWPATH_FLAG; - __ticket_unlock_kick(lock, (head + TICKET_LOCK_INC)); - } - } else - __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX); + __ticket_unlock(lock); } static inline int arch_spin_is_locked(arch_spinlock_t *lock) diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c index 5ece813..91273fb 100644 --- a/arch/x86/kernel/paravirt-spinlocks.c +++ b/arch/x86/kernel/paravirt-spinlocks.c @@ -22,9 +22,24 @@ static void pv_ticket_clear_slowpath(arch_spinlock_t *lock, __ticket_t head) cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); } +static void pv_ticket_unlock(arch_spinlock_t *lock) +{ + __ticket_t head; + + BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS); + + head = xadd(&lock->tickets.head, TICKET_LOCK_INC); + + if (unlikely(head & TICKET_SLOWPATH_FLAG)) { + head &= ~TICKET_SLOWPATH_FLAG; + __ticket_unlock_kick(lock, (head + TICKET_LOCK_INC)); + } +} + void pv_lock_activate(void) { pv_lock_ops.clear_slowpath = pv_ticket_clear_slowpath; + pv_lock_ops.unlock = pv_ticket_unlock; } EXPORT_SYMBOL_GPL(pv_lock_activate); #endif @@ -34,6 +49,7 @@ struct pv_lock_ops pv_lock_ops = { .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop), .unlock_kick = paravirt_nop, .clear_slowpath = paravirt_nop, + .unlock = _paravirt_native_ticket_unlock, #endif }; EXPORT_SYMBOL(pv_lock_ops); diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index c614dd4..5c5e9f1 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -57,6 +57,13 @@ u64 _paravirt_ident_64(u64 x) return x; } +#ifdef CONFIG_PARAVIRT_SPINLOCKS +void _paravirt_native_ticket_unlock(arch_spinlock_t *lock) +{ + ___ticket_unlock(lock); +} +#endif + void __init default_banner(void) { printk(KERN_INFO "Booting paravirtualized kernel on %s\n", @@ -153,6 +160,11 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf, else if (opfunc == _paravirt_ident_64) ret = paravirt_patch_ident_64(insnbuf, len); +#ifdef CONFIG_PARAVIRT_SPINLOCKS + else if (opfunc == _paravirt_native_ticket_unlock) + ret = paravirt_patch_unlock(insnbuf, len); +#endif + else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) || type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) || type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret32) || diff --git a/arch/x86/kernel/paravirt_patch_32.c b/arch/x86/kernel/paravirt_patch_32.c index d9f32e6..628c23b 100644 --- a/arch/x86/kernel/paravirt_patch_32.c +++ b/arch/x86/kernel/paravirt_patch_32.c @@ -1,4 +1,6 @@ #include <asm/paravirt.h> +#include <asm/spinlock.h> +#include <linux/stringify.h> DEF_NATIVE(pv_irq_ops, irq_disable, "cli"); DEF_NATIVE(pv_irq_ops, irq_enable, "sti"); @@ -12,6 +14,14 @@ DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax"); DEF_NATIVE(pv_cpu_ops, clts, "clts"); DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc"); +DEF_NATIVE(, unlock1, UNLOCK_LOCK_PREFIX + "addb $"__stringify(__TICKET_LOCK_INC)", (%eax)"); +DEF_NATIVE(, unlock2, UNLOCK_LOCK_PREFIX + "addw $"__stringify(__TICKET_LOCK_INC)", (%eax)"); + +extern void __unlock_wrong_size(void) + __compiletime_error("Bad argument size for unlock"); + unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) { /* arg in %eax, return in %eax */ @@ -24,6 +34,21 @@ unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len) return 0; } +unsigned paravirt_patch_unlock(void *insnbuf, unsigned len) +{ + switch (sizeof(__ticket_t)) { + case __X86_CASE_B: + return paravirt_patch_insns(insnbuf, len, + start__unlock1, end__unlock1); + case __X86_CASE_W: + return paravirt_patch_insns(insnbuf, len, + start__unlock2, end__unlock2); + default: + __unlock_wrong_size(); + } + return 0; +} + unsigned native_patch(u8 type, u16 clobbers, void *ibuf, unsigned long addr, unsigned len) { diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c index a1da673..dc4d9af 100644 --- a/arch/x86/kernel/paravirt_patch_64.c +++ b/arch/x86/kernel/paravirt_patch_64.c @@ -1,5 +1,6 @@ #include <asm/paravirt.h> #include <asm/asm-offsets.h> +#include <asm/spinlock.h> #include <linux/stringify.h> DEF_NATIVE(pv_irq_ops, irq_disable, "cli"); @@ -21,6 +22,14 @@ DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs"); DEF_NATIVE(, mov32, "mov %edi, %eax"); DEF_NATIVE(, mov64, "mov %rdi, %rax"); +DEF_NATIVE(, unlock1, UNLOCK_LOCK_PREFIX + "addb $"__stringify(__TICKET_LOCK_INC)", (%rdi)"); +DEF_NATIVE(, unlock2, UNLOCK_LOCK_PREFIX + "addw $"__stringify(__TICKET_LOCK_INC)", (%rdi)"); + +extern void __unlock_wrong_size(void) + __compiletime_error("Bad argument size for unlock"); + unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len) { return paravirt_patch_insns(insnbuf, len, @@ -33,6 +42,21 @@ unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len) start__mov64, end__mov64); } +unsigned paravirt_patch_unlock(void *insnbuf, unsigned len) +{ + switch (sizeof(__ticket_t)) { + case __X86_CASE_B: + return paravirt_patch_insns(insnbuf, len, + start__unlock1, end__unlock1); + case __X86_CASE_W: + return paravirt_patch_insns(insnbuf, len, + start__unlock2, end__unlock2); + default: + __unlock_wrong_size(); + } + return 0; +} + unsigned native_patch(u8 type, u16 clobbers, void *ibuf, unsigned long addr, unsigned len) { -- 2.1.4
Juergen Gross
2015-Apr-30 10:54 UTC
[PATCH 5/6] x86: switch config from UNINLINE_SPIN_UNLOCK to INLINE_SPIN_UNLOCK
There is no need any more for a special treatment of _raw_spin_unlock() regarding inlining compared to the other spinlock functions. Just treat it like all the other spinlock functions. Remove selecting UNINLINE_SPIN_UNLOCK in case of PARAVIRT_SPINLOCKS. Signed-off-by: Juergen Gross <jgross at suse.com> --- arch/x86/Kconfig | 1 - include/linux/spinlock_api_smp.h | 2 +- kernel/Kconfig.locks | 7 ++++--- kernel/Kconfig.preempt | 3 +-- kernel/locking/spinlock.c | 2 +- lib/Kconfig.debug | 1 - 6 files changed, 7 insertions(+), 9 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 226d569..4f85c7e 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -666,7 +666,6 @@ config PARAVIRT_DEBUG config PARAVIRT_SPINLOCKS bool "Paravirtualization layer for spinlocks" depends on PARAVIRT && SMP - select UNINLINE_SPIN_UNLOCK ---help--- Paravirtualized spinlocks allow a pvops backend to replace the spinlock implementation with something virtualization-friendly diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h index 5344268..839a804 100644 --- a/include/linux/spinlock_api_smp.h +++ b/include/linux/spinlock_api_smp.h @@ -69,7 +69,7 @@ _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags) #define _raw_spin_trylock_bh(lock) __raw_spin_trylock_bh(lock) #endif -#ifndef CONFIG_UNINLINE_SPIN_UNLOCK +#ifdef CONFIG_INLINE_SPIN_UNLOCK #define _raw_spin_unlock(lock) __raw_spin_unlock(lock) #endif diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index 08561f1..9cc5f72 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -87,9 +87,6 @@ config ARCH_INLINE_WRITE_UNLOCK_IRQ config ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE bool -config UNINLINE_SPIN_UNLOCK - bool - # # lock_* functions are inlined when: # - DEBUG_SPINLOCK=n and GENERIC_LOCKBREAK=n and ARCH_INLINE_*LOCK=y @@ -132,6 +129,10 @@ config INLINE_SPIN_LOCK_IRQSAVE def_bool y depends on !GENERIC_LOCKBREAK && ARCH_INLINE_SPIN_LOCK_IRQSAVE +config INLINE_SPIN_UNLOCK + def_bool y + depends on !PREEMPT || ARCH_INLINE_SPIN_UNLOCK + config INLINE_SPIN_UNLOCK_BH def_bool y depends on ARCH_INLINE_SPIN_UNLOCK_BH diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt index 3f9c974..6aca8987 100644 --- a/kernel/Kconfig.preempt +++ b/kernel/Kconfig.preempt @@ -36,7 +36,6 @@ config PREEMPT_VOLUNTARY config PREEMPT bool "Preemptible Kernel (Low-Latency Desktop)" select PREEMPT_COUNT - select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK help This option reduces the latency of the kernel by making all kernel code (that is not executing in a critical section) diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c index db3ccb1..0e2b531 100644 --- a/kernel/locking/spinlock.c +++ b/kernel/locking/spinlock.c @@ -177,7 +177,7 @@ void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock) EXPORT_SYMBOL(_raw_spin_lock_bh); #endif -#ifdef CONFIG_UNINLINE_SPIN_UNLOCK +#ifndef CONFIG_INLINE_SPIN_UNLOCK void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock) { __raw_spin_unlock(lock); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 1767057..0b4cc3c 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -920,7 +920,6 @@ config RT_MUTEX_TESTER config DEBUG_SPINLOCK bool "Spinlock and rw-lock debugging: basic checks" depends on DEBUG_KERNEL - select UNINLINE_SPIN_UNLOCK help Say Y here and build SMP to catch missing spinlock initialization and certain other kinds of spinlock errors commonly made. This is -- 2.1.4
Juergen Gross
2015-Apr-30 10:54 UTC
[PATCH 6/6] x86: remove no longer needed paravirt_ticketlocks_enabled
With the paravirtualized spinlock unlock function being a pvops function paravirt_ticketlocks_enabled is no longer needed. Remove it. Signed-off-by: Juergen Gross <jgross at suse.com> --- arch/x86/include/asm/spinlock.h | 3 --- arch/x86/kernel/kvm.c | 14 -------------- arch/x86/kernel/paravirt-spinlocks.c | 4 +--- arch/x86/xen/spinlock.c | 23 ----------------------- 4 files changed, 1 insertion(+), 43 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 40a1091..2ac8118 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -39,9 +39,6 @@ /* How long a lock should spin before we consider blocking */ #define SPIN_THRESHOLD (1 << 15) -extern struct static_key paravirt_ticketlocks_enabled; -static __always_inline bool static_key_false(struct static_key *key); - static inline void ___ticket_unlock(arch_spinlock_t *lock) { __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX); diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index c3b4b43..27d815a 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -834,18 +834,4 @@ void __init kvm_spinlock_init(void) pv_lock_activate(); } -static __init int kvm_spinlock_init_jump(void) -{ - if (!kvm_para_available()) - return 0; - if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) - return 0; - - static_key_slow_inc(¶virt_ticketlocks_enabled); - printk(KERN_INFO "KVM setup paravirtual spinlock\n"); - - return 0; -} -early_initcall(kvm_spinlock_init_jump); - #endif /* CONFIG_PARAVIRT_SPINLOCKS */ diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c index 91273fb..6b5f33c 100644 --- a/arch/x86/kernel/paravirt-spinlocks.c +++ b/arch/x86/kernel/paravirt-spinlocks.c @@ -40,6 +40,7 @@ void pv_lock_activate(void) { pv_lock_ops.clear_slowpath = pv_ticket_clear_slowpath; pv_lock_ops.unlock = pv_ticket_unlock; + pr_info("paravirtual spinlocks activated\n"); } EXPORT_SYMBOL_GPL(pv_lock_activate); #endif @@ -53,6 +54,3 @@ struct pv_lock_ops pv_lock_ops = { #endif }; EXPORT_SYMBOL(pv_lock_ops); - -struct static_key paravirt_ticketlocks_enabled = STATIC_KEY_INIT_FALSE; -EXPORT_SYMBOL(paravirt_ticketlocks_enabled); diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 988c895..a125d67 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -265,10 +265,6 @@ void xen_uninit_lock_cpu(int cpu) /* - * Our init of PV spinlocks is split in two init functions due to us - * using paravirt patching and jump labels patching and having to do - * all of this before SMP code is invoked. - * * The paravirt patching needs to be done _before_ the alternative asm code * is started, otherwise we would not patch the core kernel code. */ @@ -286,25 +282,6 @@ void __init xen_init_spinlocks(void) pv_lock_activate(); } -/* - * While the jump_label init code needs to happend _after_ the jump labels are - * enabled and before SMP is started. Hence we use pre-SMP initcall level - * init. We cannot do it in xen_init_spinlocks as that is done before - * jump labels are activated. - */ -static __init int xen_init_spinlocks_jump(void) -{ - if (!xen_pvspin) - return 0; - - if (!xen_domain()) - return 0; - - static_key_slow_inc(¶virt_ticketlocks_enabled); - return 0; -} -early_initcall(xen_init_spinlocks_jump); - static __init int xen_parse_nopvspin(char *arg) { xen_pvspin = false; -- 2.1.4
Jeremy Fitzhardinge
2015-Apr-30 16:39 UTC
[PATCH 0/6] x86: reduce paravirtualized spinlock overhead
On 04/30/2015 03:53 AM, Juergen Gross wrote:> Paravirtualized spinlocks produce some overhead even if the kernel is > running on bare metal. The main reason are the more complex locking > and unlocking functions. Especially unlocking is no longer just one > instruction but so complex that it is no longer inlined. > > This patch series addresses this issue by adding two more pvops > functions to reduce the size of the inlined spinlock functions. When > running on bare metal unlocking is again basically one instruction.Out of curiosity, is there a measurable difference? J
Juergen Gross
2015-May-04 05:55 UTC
[PATCH 0/6] x86: reduce paravirtualized spinlock overhead
On 04/30/2015 06:39 PM, Jeremy Fitzhardinge wrote:> On 04/30/2015 03:53 AM, Juergen Gross wrote: >> Paravirtualized spinlocks produce some overhead even if the kernel is >> running on bare metal. The main reason are the more complex locking >> and unlocking functions. Especially unlocking is no longer just one >> instruction but so complex that it is no longer inlined. >> >> This patch series addresses this issue by adding two more pvops >> functions to reduce the size of the inlined spinlock functions. When >> running on bare metal unlocking is again basically one instruction. > > Out of curiosity, is there a measurable difference?I did a small measurement of the pure locking functions on bare metal without and with my patches. spin_lock() for the first time (lock and code not in cache) dropped from about 600 to 500 cycles. spin_unlock() for first time dropped from 145 to 87 cycles. spin_lock() in a loop dropped from 48 to 45 cycles. spin_unlock() in the same loop dropped from 24 to 22 cycles. Juergen
Juergen Gross
2015-May-15 12:16 UTC
[PATCH 0/6] x86: reduce paravirtualized spinlock overhead
Ping? On 04/30/2015 12:53 PM, Juergen Gross wrote:> Paravirtualized spinlocks produce some overhead even if the kernel is > running on bare metal. The main reason are the more complex locking > and unlocking functions. Especially unlocking is no longer just one > instruction but so complex that it is no longer inlined. > > This patch series addresses this issue by adding two more pvops > functions to reduce the size of the inlined spinlock functions. When > running on bare metal unlocking is again basically one instruction. > > Compile tested with CONFIG_PARAVIRT_SPINLOCKS on and off, 32 and 64 > bits. > > Functional testing on bare metal and as Xen dom0. > > Correct patching verified by disassembly of active kernel. > > Juergen Gross (6): > x86: use macro instead of "0" for setting TICKET_SLOWPATH_FLAG > x86: move decision about clearing slowpath flag into arch_spin_lock() > x86: introduce new pvops function clear_slowpath > x86: introduce new pvops function spin_unlock > x86: switch config from UNINLINE_SPIN_UNLOCK to INLINE_SPIN_UNLOCK > x86: remove no longer needed paravirt_ticketlocks_enabled > > arch/x86/Kconfig | 1 - > arch/x86/include/asm/paravirt.h | 13 +++++++++ > arch/x86/include/asm/paravirt_types.h | 12 ++++++++ > arch/x86/include/asm/spinlock.h | 53 ++++++++++++----------------------- > arch/x86/include/asm/spinlock_types.h | 3 +- > arch/x86/kernel/kvm.c | 14 +-------- > arch/x86/kernel/paravirt-spinlocks.c | 42 +++++++++++++++++++++++++-- > arch/x86/kernel/paravirt.c | 12 ++++++++ > arch/x86/kernel/paravirt_patch_32.c | 25 +++++++++++++++++ > arch/x86/kernel/paravirt_patch_64.c | 24 ++++++++++++++++ > arch/x86/xen/spinlock.c | 23 +-------------- > include/linux/spinlock_api_smp.h | 2 +- > kernel/Kconfig.locks | 7 +++-- > kernel/Kconfig.preempt | 3 +- > kernel/locking/spinlock.c | 2 +- > lib/Kconfig.debug | 1 - > 16 files changed, 154 insertions(+), 83 deletions(-) >
Juergen Gross
2015-Jun-08 04:09 UTC
[PATCH 0/6] x86: reduce paravirtualized spinlock overhead
Ping? Anything missing from my side? On 04/30/2015 12:53 PM, Juergen Gross wrote:> Paravirtualized spinlocks produce some overhead even if the kernel is > running on bare metal. The main reason are the more complex locking > and unlocking functions. Especially unlocking is no longer just one > instruction but so complex that it is no longer inlined. > > This patch series addresses this issue by adding two more pvops > functions to reduce the size of the inlined spinlock functions. When > running on bare metal unlocking is again basically one instruction. > > Compile tested with CONFIG_PARAVIRT_SPINLOCKS on and off, 32 and 64 > bits. > > Functional testing on bare metal and as Xen dom0. > > Correct patching verified by disassembly of active kernel. > > Juergen Gross (6): > x86: use macro instead of "0" for setting TICKET_SLOWPATH_FLAG > x86: move decision about clearing slowpath flag into arch_spin_lock() > x86: introduce new pvops function clear_slowpath > x86: introduce new pvops function spin_unlock > x86: switch config from UNINLINE_SPIN_UNLOCK to INLINE_SPIN_UNLOCK > x86: remove no longer needed paravirt_ticketlocks_enabled > > arch/x86/Kconfig | 1 - > arch/x86/include/asm/paravirt.h | 13 +++++++++ > arch/x86/include/asm/paravirt_types.h | 12 ++++++++ > arch/x86/include/asm/spinlock.h | 53 ++++++++++++----------------------- > arch/x86/include/asm/spinlock_types.h | 3 +- > arch/x86/kernel/kvm.c | 14 +-------- > arch/x86/kernel/paravirt-spinlocks.c | 42 +++++++++++++++++++++++++-- > arch/x86/kernel/paravirt.c | 12 ++++++++ > arch/x86/kernel/paravirt_patch_32.c | 25 +++++++++++++++++ > arch/x86/kernel/paravirt_patch_64.c | 24 ++++++++++++++++ > arch/x86/xen/spinlock.c | 23 +-------------- > include/linux/spinlock_api_smp.h | 2 +- > kernel/Kconfig.locks | 7 +++-- > kernel/Kconfig.preempt | 3 +- > kernel/locking/spinlock.c | 2 +- > lib/Kconfig.debug | 1 - > 16 files changed, 154 insertions(+), 83 deletions(-)
Juergen Gross
2015-Jun-16 14:37 UTC
[PATCH 0/6] x86: reduce paravirtualized spinlock overhead
AFAIK there are no outstanding questions for more than one month now. I'd appreciate some feedback or accepting these patches. Juergen On 04/30/2015 12:53 PM, Juergen Gross wrote:> Paravirtualized spinlocks produce some overhead even if the kernel is > running on bare metal. The main reason are the more complex locking > and unlocking functions. Especially unlocking is no longer just one > instruction but so complex that it is no longer inlined. > > This patch series addresses this issue by adding two more pvops > functions to reduce the size of the inlined spinlock functions. When > running on bare metal unlocking is again basically one instruction. > > Compile tested with CONFIG_PARAVIRT_SPINLOCKS on and off, 32 and 64 > bits. > > Functional testing on bare metal and as Xen dom0. > > Correct patching verified by disassembly of active kernel. > > Juergen Gross (6): > x86: use macro instead of "0" for setting TICKET_SLOWPATH_FLAG > x86: move decision about clearing slowpath flag into arch_spin_lock() > x86: introduce new pvops function clear_slowpath > x86: introduce new pvops function spin_unlock > x86: switch config from UNINLINE_SPIN_UNLOCK to INLINE_SPIN_UNLOCK > x86: remove no longer needed paravirt_ticketlocks_enabled > > arch/x86/Kconfig | 1 - > arch/x86/include/asm/paravirt.h | 13 +++++++++ > arch/x86/include/asm/paravirt_types.h | 12 ++++++++ > arch/x86/include/asm/spinlock.h | 53 ++++++++++++----------------------- > arch/x86/include/asm/spinlock_types.h | 3 +- > arch/x86/kernel/kvm.c | 14 +-------- > arch/x86/kernel/paravirt-spinlocks.c | 42 +++++++++++++++++++++++++-- > arch/x86/kernel/paravirt.c | 12 ++++++++ > arch/x86/kernel/paravirt_patch_32.c | 25 +++++++++++++++++ > arch/x86/kernel/paravirt_patch_64.c | 24 ++++++++++++++++ > arch/x86/xen/spinlock.c | 23 +-------------- > include/linux/spinlock_api_smp.h | 2 +- > kernel/Kconfig.locks | 7 +++-- > kernel/Kconfig.preempt | 3 +- > kernel/locking/spinlock.c | 2 +- > lib/Kconfig.debug | 1 - > 16 files changed, 154 insertions(+), 83 deletions(-) >
Thomas Gleixner
2015-Jun-16 15:18 UTC
[PATCH 0/6] x86: reduce paravirtualized spinlock overhead
On Tue, 16 Jun 2015, Juergen Gross wrote:> AFAIK there are no outstanding questions for more than one month now. > I'd appreciate some feedback or accepting these patches.They are against dead code, which will be gone soon. We switched over to queued locks. Thanks, tglx
Reasonably Related Threads
- [PATCH 0/6] x86: reduce paravirtualized spinlock overhead
- [PATCH V5] x86 spinlock: Fix memory corruption on completing completions
- [PATCH V5] x86 spinlock: Fix memory corruption on completing completions
- [PATCH V3] x86 spinlock: Fix memory corruption on completing completions
- [PATCH V3] x86 spinlock: Fix memory corruption on completing completions