Christian Borntraeger
2016-Oct-25 09:03 UTC
[GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield
Peter, here is v2 with some improved patch descriptions and some fixes. The previous version has survived one day of linux-next and I only changed small parts. So unless there is some other issue, feel free to pull (or to apply the patches) to tip/locking. The following changes since commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69: Linux 4.9-rc2 (2016-10-23 17:10:14 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git tags/cpurelax for you to fetch changes up to dcc37f9044436438360402714b7544a8e8779b07: processor.h: remove cpu_relax_lowlatency (2016-10-25 09:49:57 +0200) ---------------------------------------------------------------- cpu_relax: drop lowlatency, introduce yield For spinning loops people do often use barrier() or cpu_relax(). For most architectures cpu_relax and barrier are the same, but on some architectures cpu_relax can add some latency. For example on power,sparc64 and arc, cpu_relax can shift the CPU towards other hardware threads in an SMT environment. On s390 cpu_relax does even more, it uses an hypercall to the hypervisor to give up the timeslice. In contrast to the SMT yielding this can result in larger latencies. In some places this latency is unwanted, so another variant "cpu_relax_lowlatency" was introduced. Before this is used in more and more places, lets revert the logic and provide a cpu_relax_yield that can be called in places where yielding is more important than latency. By default this is the same as cpu_relax on all architectures. So my proposal boils down to: - lowest latency: use barrier() or mb() if necessary - low latency: use cpu_relax (e.g. might give up some cpu for the other _hardware_ threads) - really give up CPU: use cpu_relax_yield PS: In the long run I would also try to provide for s390 something like cpu_relax_yield_to with a cpu number (or just add that to cpu_relax_yield), since a yield_to is always better than a yield as long as we know the waiter. ---------------------------------------------------------------- Christian Borntraeger (5): processor.h: introduce cpu_relax_yield stop_machine: yield CPU during stop machine s390: make cpu_relax a barrier again processor.h: Remove cpu_relax_lowlatency users processor.h: remove cpu_relax_lowlatency arch/alpha/include/asm/processor.h | 2 +- arch/arc/include/asm/processor.h | 4 ++-- arch/arm/include/asm/processor.h | 2 +- arch/arm64/include/asm/processor.h | 2 +- arch/avr32/include/asm/processor.h | 2 +- arch/blackfin/include/asm/processor.h | 2 +- arch/c6x/include/asm/processor.h | 2 +- arch/cris/include/asm/processor.h | 2 +- arch/frv/include/asm/processor.h | 2 +- arch/h8300/include/asm/processor.h | 2 +- arch/hexagon/include/asm/processor.h | 2 +- arch/ia64/include/asm/processor.h | 2 +- arch/m32r/include/asm/processor.h | 2 +- arch/m68k/include/asm/processor.h | 2 +- arch/metag/include/asm/processor.h | 2 +- arch/microblaze/include/asm/processor.h | 2 +- arch/mips/include/asm/processor.h | 2 +- arch/mn10300/include/asm/processor.h | 2 +- arch/nios2/include/asm/processor.h | 2 +- arch/openrisc/include/asm/processor.h | 2 +- arch/parisc/include/asm/processor.h | 2 +- arch/powerpc/include/asm/processor.h | 2 +- arch/s390/include/asm/processor.h | 4 ++-- arch/s390/kernel/processor.c | 4 ++-- arch/score/include/asm/processor.h | 2 +- arch/sh/include/asm/processor.h | 2 +- arch/sparc/include/asm/processor_32.h | 2 +- arch/sparc/include/asm/processor_64.h | 2 +- arch/tile/include/asm/processor.h | 2 +- arch/unicore32/include/asm/processor.h | 2 +- arch/x86/include/asm/processor.h | 2 +- arch/x86/um/asm/processor.h | 2 +- arch/xtensa/include/asm/processor.h | 2 +- drivers/gpu/drm/i915/i915_gem_request.c | 2 +- drivers/vhost/net.c | 4 ++-- kernel/locking/mcs_spinlock.h | 4 ++-- kernel/locking/mutex.c | 4 ++-- kernel/locking/osq_lock.c | 6 +++--- kernel/locking/qrwlock.c | 6 +++--- kernel/locking/rwsem-xadd.c | 4 ++-- kernel/stop_machine.c | 2 +- lib/lockref.c | 2 +- 42 files changed, 53 insertions(+), 53 deletions(-)
Christian Borntraeger
2016-Oct-25 09:03 UTC
[GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
For spinning loops people do often use barrier() or cpu_relax(). For most architectures cpu_relax and barrier are the same, but on some architectures cpu_relax can add some latency. For example on power,sparc64 and arc, cpu_relax can shift the CPU towards other hardware threads in an SMT environment. On s390 cpu_relax does even more, it uses an hypercall to the hypervisor to give up the timeslice. In contrast to the SMT yielding this can result in larger latencies. In some places this latency is unwanted, so another variant "cpu_relax_lowlatency" was introduced. Before this is used in more and more places, lets revert the logic and provide a cpu_relax_yield that can be called in places where yielding is more important than latency. By default this is the same as cpu_relax on all architectures. Signed-off-by: Christian Borntraeger <borntraeger at de.ibm.com> --- arch/alpha/include/asm/processor.h | 1 + arch/arc/include/asm/processor.h | 2 ++ arch/arm/include/asm/processor.h | 1 + arch/arm64/include/asm/processor.h | 1 + arch/avr32/include/asm/processor.h | 1 + arch/blackfin/include/asm/processor.h | 1 + arch/c6x/include/asm/processor.h | 1 + arch/cris/include/asm/processor.h | 1 + arch/frv/include/asm/processor.h | 1 + arch/h8300/include/asm/processor.h | 1 + arch/hexagon/include/asm/processor.h | 1 + arch/ia64/include/asm/processor.h | 1 + arch/m32r/include/asm/processor.h | 1 + arch/m68k/include/asm/processor.h | 1 + arch/metag/include/asm/processor.h | 1 + arch/microblaze/include/asm/processor.h | 1 + arch/mips/include/asm/processor.h | 1 + arch/mn10300/include/asm/processor.h | 1 + arch/nios2/include/asm/processor.h | 1 + arch/openrisc/include/asm/processor.h | 1 + arch/parisc/include/asm/processor.h | 1 + arch/powerpc/include/asm/processor.h | 1 + arch/s390/include/asm/processor.h | 3 ++- arch/s390/kernel/processor.c | 4 ++-- arch/score/include/asm/processor.h | 1 + arch/sh/include/asm/processor.h | 1 + arch/sparc/include/asm/processor_32.h | 1 + arch/sparc/include/asm/processor_64.h | 1 + arch/tile/include/asm/processor.h | 1 + arch/unicore32/include/asm/processor.h | 1 + arch/x86/include/asm/processor.h | 1 + arch/x86/um/asm/processor.h | 1 + arch/xtensa/include/asm/processor.h | 1 + 33 files changed, 36 insertions(+), 3 deletions(-) diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h index 43a7559..0556fda 100644 --- a/arch/alpha/include/asm/processor.h +++ b/arch/alpha/include/asm/processor.h @@ -58,6 +58,7 @@ unsigned long get_wchan(struct task_struct *p); ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #define ARCH_HAS_PREFETCH diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h index 16b630f..6c158d5 100644 --- a/arch/arc/include/asm/processor.h +++ b/arch/arc/include/asm/processor.h @@ -60,6 +60,7 @@ struct task_struct; #ifndef CONFIG_EZNPS_MTM_EXT #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #else @@ -67,6 +68,7 @@ struct task_struct; #define cpu_relax() \ __asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory") +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() barrier() #endif diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index 8a1e8e9..db660e0 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -82,6 +82,7 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #endif +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #define task_pt_regs(p) \ diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 60e3482..3f9b0e5 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -149,6 +149,7 @@ static inline void cpu_relax(void) asm volatile("yield" ::: "memory"); } +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* Thread switching */ diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h index 941593c..e412e8b 100644 --- a/arch/avr32/include/asm/processor.h +++ b/arch/avr32/include/asm/processor.h @@ -92,6 +92,7 @@ extern struct avr32_cpuinfo boot_cpu_data; #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3)) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #define cpu_sync_pipeline() asm volatile("sub pc, -2" : : : "memory") diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h index 0c265ab..8b8704a 100644 --- a/arch/blackfin/include/asm/processor.h +++ b/arch/blackfin/include/asm/processor.h @@ -92,6 +92,7 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) ((tsk) == current ? rdusp() : (tsk)->thread.usp) #define cpu_relax() smp_mb() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* Get the Silicon Revision of the chip */ diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h index f2ef31b..914d730 100644 --- a/arch/c6x/include/asm/processor.h +++ b/arch/c6x/include/asm/processor.h @@ -121,6 +121,7 @@ extern unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(task) (task_pt_regs(task)->sp) #define cpu_relax() do { } while (0) +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() extern const struct seq_operations cpuinfo_op; diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h index 862126b..01dd52e 100644 --- a/arch/cris/include/asm/processor.h +++ b/arch/cris/include/asm/processor.h @@ -63,6 +63,7 @@ static inline void release_thread(struct task_struct *dead_task) #define init_stack (init_thread_union.stack) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() void default_idle(void); diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h index 73f0a79..4d00d65 100644 --- a/arch/frv/include/asm/processor.h +++ b/arch/frv/include/asm/processor.h @@ -107,6 +107,7 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) ((tsk)->thread.frame0->sp) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* data cache prefetch */ diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h index 111df73..683a061 100644 --- a/arch/h8300/include/asm/processor.h +++ b/arch/h8300/include/asm/processor.h @@ -127,6 +127,7 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) ((tsk) == current ? rdusp() : (tsk)->thread.usp) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #define HARD_RESET_NOW() ({ \ diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h index d850113..1558ddb 100644 --- a/arch/hexagon/include/asm/processor.h +++ b/arch/hexagon/include/asm/processor.h @@ -56,6 +56,7 @@ struct thread_struct { } #define cpu_relax() __vmyield() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h index ce53c50..4654b71 100644 --- a/arch/ia64/include/asm/processor.h +++ b/arch/ia64/include/asm/processor.h @@ -547,6 +547,7 @@ ia64_eoi (void) } #define cpu_relax() ia64_hint(ia64_hint_pause) +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() static inline int diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h index 9f8fd9b..b262037 100644 --- a/arch/m32r/include/asm/processor.h +++ b/arch/m32r/include/asm/processor.h @@ -133,6 +133,7 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) ((tsk)->thread.sp) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #endif /* _ASM_M32R_PROCESSOR_H */ diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h index c84a218..13e07ae 100644 --- a/arch/m68k/include/asm/processor.h +++ b/arch/m68k/include/asm/processor.h @@ -156,6 +156,7 @@ unsigned long get_wchan(struct task_struct *p); #define task_pt_regs(tsk) ((struct pt_regs *) ((tsk)->thread.esp0)) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #endif diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h index a0333eb..61d6e27 100644 --- a/arch/metag/include/asm/processor.h +++ b/arch/metag/include/asm/processor.h @@ -152,6 +152,7 @@ unsigned long get_wchan(struct task_struct *p); #define user_stack_pointer(regs) ((regs)->ctx.AX[0].U0) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() extern void setup_priv(void); diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h index c38d0dd..fd7dd11 100644 --- a/arch/microblaze/include/asm/processor.h +++ b/arch/microblaze/include/asm/processor.h @@ -22,6 +22,7 @@ extern const struct seq_operations cpuinfo_op; # define cpu_relax() barrier() +# define cpu_relax_yield() cpu_relax() # define cpu_relax_lowlatency() cpu_relax() #define task_pt_regs(tsk) \ diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index 0d36c87..9a656f6 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -389,6 +389,7 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h index b10ba12..89f63d1 100644 --- a/arch/mn10300/include/asm/processor.h +++ b/arch/mn10300/include/asm/processor.h @@ -69,6 +69,7 @@ extern void print_cpu_info(struct mn10300_cpuinfo *); extern void dodgy_tsc(void); #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h index 1c953f0..303e593 100644 --- a/arch/nios2/include/asm/processor.h +++ b/arch/nios2/include/asm/processor.h @@ -88,6 +88,7 @@ extern unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) ((tsk)->thread.kregs->sp) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #endif /* __ASSEMBLY__ */ diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h index 70334c9..6ecfc2a 100644 --- a/arch/openrisc/include/asm/processor.h +++ b/arch/openrisc/include/asm/processor.h @@ -92,6 +92,7 @@ extern unsigned long thread_saved_pc(struct task_struct *t); #define init_stack (init_thread_union.stack) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #endif /* __ASSEMBLY__ */ diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h index 2e674e1..ea2ff9f 100644 --- a/arch/parisc/include/asm/processor.h +++ b/arch/parisc/include/asm/processor.h @@ -309,6 +309,7 @@ extern unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) ((tsk)->thread.regs.gr[30]) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index c07c31b..908fa7c 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -404,6 +404,7 @@ static inline unsigned long __pack_fe01(unsigned int fpmode) #define cpu_relax() barrier() #endif +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* Check that a certain kernel stack pointer is valid in task_struct p */ diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h index 0332317..d05965b 100644 --- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -234,8 +234,9 @@ static inline unsigned short stap(void) /* * Give up the time slice of the virtual PU. */ -void cpu_relax(void); +void cpu_relax_yield(void); +#define cpu_relax() cpu_relax_yield() #define cpu_relax_lowlatency() barrier() #define ECAG_CACHE_ATTRIBUTE 0 diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c index 81d0808..9e60ef1 100644 --- a/arch/s390/kernel/processor.c +++ b/arch/s390/kernel/processor.c @@ -53,7 +53,7 @@ void s390_update_cpu_mhz(void) on_each_cpu(update_cpu_mhz, NULL, 0); } -void notrace cpu_relax(void) +void notrace cpu_relax_yield(void) { if (!smp_cpu_mtid && MACHINE_HAS_DIAG44) { diag_stat_inc(DIAG_STAT_X044); @@ -61,7 +61,7 @@ void notrace cpu_relax(void) } barrier(); } -EXPORT_SYMBOL(cpu_relax); +EXPORT_SYMBOL(cpu_relax_yield); /* * cpu_init - initializes state that is per-CPU. diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h index 851f441..e8e87b4 100644 --- a/arch/score/include/asm/processor.h +++ b/arch/score/include/asm/processor.h @@ -24,6 +24,7 @@ extern unsigned long get_wchan(struct task_struct *p); #define current_text_addr() ({ __label__ _l; _l: &&_l; }) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #define release_thread(thread) do {} while (0) diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h index f9a0994..099a991 100644 --- a/arch/sh/include/asm/processor.h +++ b/arch/sh/include/asm/processor.h @@ -97,6 +97,7 @@ extern struct sh_cpuinfo cpu_data[]; #define cpu_sleep() __asm__ __volatile__ ("sleep" : : : "memory") #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() void default_idle(void); diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h index 812fd08..50e908a3c 100644 --- a/arch/sparc/include/asm/processor_32.h +++ b/arch/sparc/include/asm/processor_32.h @@ -119,6 +119,7 @@ extern struct task_struct *last_task_used_math; int do_mathemu(struct pt_regs *regs, struct task_struct *fpt); #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() extern void (*sparc_idle)(void); diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h index ce2595c..3e8fac7 100644 --- a/arch/sparc/include/asm/processor_64.h +++ b/arch/sparc/include/asm/processor_64.h @@ -216,6 +216,7 @@ unsigned long get_wchan(struct task_struct *task); "nop\n\t" \ ".previous" \ ::: "memory") +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* Prefetch support. This is tuned for UltraSPARC-III and later. diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h index 0684e88..91a39a5 100644 --- a/arch/tile/include/asm/processor.h +++ b/arch/tile/include/asm/processor.h @@ -264,6 +264,7 @@ static inline void cpu_relax(void) barrier(); } +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* Info on this processor (see fs/proc/cpuinfo.c) */ diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h index 8d21b7a..fc54d5d 100644 --- a/arch/unicore32/include/asm/processor.h +++ b/arch/unicore32/include/asm/processor.h @@ -71,6 +71,7 @@ extern void release_thread(struct task_struct *); unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #define task_pt_regs(p) \ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 984a7bf..44adada 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -588,6 +588,7 @@ static __always_inline void cpu_relax(void) rep_nop(); } +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* Stop speculative execution and prefetching of modified code. */ diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h index 233ee09..01597af 100644 --- a/arch/x86/um/asm/processor.h +++ b/arch/x86/um/asm/processor.h @@ -26,6 +26,7 @@ static inline void rep_nop(void) } #define cpu_relax() rep_nop() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() #define task_pt_regs(t) (&(t)->thread.regs) diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h index b42d68b..fe14dc2 100644 --- a/arch/xtensa/include/asm/processor.h +++ b/arch/xtensa/include/asm/processor.h @@ -206,6 +206,7 @@ extern unsigned long get_wchan(struct task_struct *p); #define KSTK_ESP(tsk) (task_pt_regs(tsk)->areg[1]) #define cpu_relax() barrier() +#define cpu_relax_yield() cpu_relax() #define cpu_relax_lowlatency() cpu_relax() /* Special register access. */ -- 2.5.5
Christian Borntraeger
2016-Oct-25 09:03 UTC
[GIT PULL v2 2/5] stop_machine: yield CPU during stop machine
Some time ago commit 57f2ffe14fd125c2 ("s390: remove diag 44 calls from cpu_relax()") did stop cpu_relax on s390 yielding to the hypervisor. As it turns out this made stop_machine run really slow on virtualized overcommited systems. For example the kprobes test during bootup took several seconds instead of just running unnoticed with large guests. Therefore, the yielding was reintroduced with commit 4d92f50249eb ("s390: reintroduce diag 44 calls for cpu_relax()"), but in fact the stop machine code seems to be the only place where this yielding was really necessary. This place is probably the most important one as it makes all but one guest CPUs wait for one guest CPU. As we now have cpu_relax_yield, we can use this in multi_cpu_stop. For now lets only add it here. We can add it later in other places when necessary. Signed-off-by: Christian Borntraeger <borntraeger at de.ibm.com> --- kernel/stop_machine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index ec9ab2f..1eb8266 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data) /* Simple state machine */ do { /* Chill out and ensure we re-read multi_stop_state. */ - cpu_relax(); + cpu_relax_yield(); if (msdata->state != curstate) { curstate = msdata->state; switch (curstate) { -- 2.5.5
Christian Borntraeger
2016-Oct-25 09:03 UTC
[GIT PULL v2 3/5] s390: make cpu_relax a barrier again
stop_machine seemed to be the only important place for yielding during cpu_relax. This was fixed by using cpu_relax_yield. Therefore, we can now redefine cpu_relax to be a barrier instead on s390, making s390 identical to all other architectures. Signed-off-by: Christian Borntraeger <borntraeger at de.ibm.com> --- arch/s390/include/asm/processor.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h index d05965b..5d262cf 100644 --- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -236,7 +236,7 @@ static inline unsigned short stap(void) */ void cpu_relax_yield(void); -#define cpu_relax() cpu_relax_yield() +#define cpu_relax() barrier() #define cpu_relax_lowlatency() barrier() #define ECAG_CACHE_ATTRIBUTE 0 -- 2.5.5
Christian Borntraeger
2016-Oct-25 09:03 UTC
[GIT PULL v2 4/5] processor.h: Remove cpu_relax_lowlatency users
With the s390 special case of a yielding cpu_relax implementation gone, we can now remove all users of cpu_relax_lowlatency and replace them with cpu_relax. Signed-off-by: Christian Borntraeger <borntraeger at de.ibm.com> --- drivers/gpu/drm/i915/i915_gem_request.c | 2 +- drivers/vhost/net.c | 4 ++-- kernel/locking/mcs_spinlock.h | 4 ++-- kernel/locking/mutex.c | 4 ++-- kernel/locking/osq_lock.c | 6 +++--- kernel/locking/qrwlock.c | 6 +++--- kernel/locking/rwsem-xadd.c | 4 ++-- lib/lockref.c | 2 +- 8 files changed, 16 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c index 8832f8e..383d134 100644 --- a/drivers/gpu/drm/i915/i915_gem_request.c +++ b/drivers/gpu/drm/i915/i915_gem_request.c @@ -723,7 +723,7 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req, if (busywait_stop(timeout_us, cpu)) break; - cpu_relax_lowlatency(); + cpu_relax(); } while (!need_resched()); return false; diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 5dc128a..5dc3465 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -342,7 +342,7 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net, endtime = busy_clock() + vq->busyloop_timeout; while (vhost_can_busy_poll(vq->dev, endtime) && vhost_vq_avail_empty(vq->dev, vq)) - cpu_relax_lowlatency(); + cpu_relax(); preempt_enable(); r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), out_num, in_num, NULL, NULL); @@ -533,7 +533,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk) while (vhost_can_busy_poll(&net->dev, endtime) && !sk_has_rx_data(sk) && vhost_vq_avail_empty(&net->dev, vq)) - cpu_relax_lowlatency(); + cpu_relax(); preempt_enable(); diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index c835270..6a385aa 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -28,7 +28,7 @@ struct mcs_spinlock { #define arch_mcs_spin_lock_contended(l) \ do { \ while (!(smp_load_acquire(l))) \ - cpu_relax_lowlatency(); \ + cpu_relax(); \ } while (0) #endif @@ -108,7 +108,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node) return; /* Wait until the next pointer is set */ while (!(next = READ_ONCE(node->next))) - cpu_relax_lowlatency(); + cpu_relax(); } /* Pass lock to next waiter. */ diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index a70b90d..4463405 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -241,7 +241,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner) break; } - cpu_relax_lowlatency(); + cpu_relax(); } rcu_read_unlock(); @@ -377,7 +377,7 @@ static bool mutex_optimistic_spin(struct mutex *lock, * memory barriers as we'll eventually observe the right * values at the cost of a few extra spins. */ - cpu_relax_lowlatency(); + cpu_relax(); } osq_unlock(&lock->osq); diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index 05a3785..4ea2710 100644 --- a/kernel/locking/osq_lock.c +++ b/kernel/locking/osq_lock.c @@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock, break; } - cpu_relax_lowlatency(); + cpu_relax(); } return next; @@ -122,7 +122,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) if (need_resched()) goto unqueue; - cpu_relax_lowlatency(); + cpu_relax(); } return true; @@ -148,7 +148,7 @@ bool osq_lock(struct optimistic_spin_queue *lock) if (smp_load_acquire(&node->locked)) return true; - cpu_relax_lowlatency(); + cpu_relax(); /* * Or we race against a concurrent unqueue()'s step-B, in which diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c index 19248dd..cc3ed0c 100644 --- a/kernel/locking/qrwlock.c +++ b/kernel/locking/qrwlock.c @@ -54,7 +54,7 @@ static __always_inline void rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts) { while ((cnts & _QW_WMASK) == _QW_LOCKED) { - cpu_relax_lowlatency(); + cpu_relax(); cnts = atomic_read_acquire(&lock->cnts); } } @@ -130,7 +130,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock) (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0)) break; - cpu_relax_lowlatency(); + cpu_relax(); } /* When no more readers, set the locked flag */ @@ -141,7 +141,7 @@ void queued_write_lock_slowpath(struct qrwlock *lock) _QW_LOCKED) == _QW_WAITING)) break; - cpu_relax_lowlatency(); + cpu_relax(); } unlock: arch_spin_unlock(&lock->wait_lock); diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c index 2337b4b..2fa2e2e6 100644 --- a/kernel/locking/rwsem-xadd.c +++ b/kernel/locking/rwsem-xadd.c @@ -368,7 +368,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem) return false; } - cpu_relax_lowlatency(); + cpu_relax(); } rcu_read_unlock(); out: @@ -423,7 +423,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem) * memory barriers as we'll eventually observe the right * values at the cost of a few extra spins. */ - cpu_relax_lowlatency(); + cpu_relax(); } osq_unlock(&sem->osq); done: diff --git a/lib/lockref.c b/lib/lockref.c index 5a92189..c4bfcb8 100644 --- a/lib/lockref.c +++ b/lib/lockref.c @@ -20,7 +20,7 @@ if (likely(old.lock_count == prev.lock_count)) { \ SUCCESS; \ } \ - cpu_relax_lowlatency(); \ + cpu_relax(); \ } \ } while (0) -- 2.5.5
Christian Borntraeger
2016-Oct-25 09:03 UTC
[GIT PULL v2 5/5] processor.h: remove cpu_relax_lowlatency
As there are no users left, we can remove cpu_relax_lowlatency. Signed-off-by: Christian Borntraeger <borntraeger at de.ibm.com> --- arch/alpha/include/asm/processor.h | 1 - arch/arc/include/asm/processor.h | 2 -- arch/arm/include/asm/processor.h | 1 - arch/arm64/include/asm/processor.h | 1 - arch/avr32/include/asm/processor.h | 1 - arch/blackfin/include/asm/processor.h | 1 - arch/c6x/include/asm/processor.h | 1 - arch/cris/include/asm/processor.h | 1 - arch/frv/include/asm/processor.h | 1 - arch/h8300/include/asm/processor.h | 1 - arch/hexagon/include/asm/processor.h | 1 - arch/ia64/include/asm/processor.h | 1 - arch/m32r/include/asm/processor.h | 1 - arch/m68k/include/asm/processor.h | 1 - arch/metag/include/asm/processor.h | 1 - arch/microblaze/include/asm/processor.h | 1 - arch/mips/include/asm/processor.h | 1 - arch/mn10300/include/asm/processor.h | 1 - arch/nios2/include/asm/processor.h | 1 - arch/openrisc/include/asm/processor.h | 1 - arch/parisc/include/asm/processor.h | 1 - arch/powerpc/include/asm/processor.h | 1 - arch/s390/include/asm/processor.h | 1 - arch/score/include/asm/processor.h | 1 - arch/sh/include/asm/processor.h | 1 - arch/sparc/include/asm/processor_32.h | 1 - arch/sparc/include/asm/processor_64.h | 1 - arch/tile/include/asm/processor.h | 1 - arch/unicore32/include/asm/processor.h | 1 - arch/x86/include/asm/processor.h | 1 - arch/x86/um/asm/processor.h | 1 - arch/xtensa/include/asm/processor.h | 1 - 32 files changed, 33 deletions(-) diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h index 0556fda..31e8dbe 100644 --- a/arch/alpha/include/asm/processor.h +++ b/arch/alpha/include/asm/processor.h @@ -59,7 +59,6 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #define ARCH_HAS_PREFETCH #define ARCH_HAS_PREFETCHW diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h index 6c158d5..d102a49 100644 --- a/arch/arc/include/asm/processor.h +++ b/arch/arc/include/asm/processor.h @@ -61,7 +61,6 @@ struct task_struct; #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #else @@ -69,7 +68,6 @@ struct task_struct; __asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory") #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() barrier() #endif diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index db660e0..9e71c58b 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -83,7 +83,6 @@ unsigned long get_wchan(struct task_struct *p); #endif #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #define task_pt_regs(p) \ ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1) diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 3f9b0e5..6132f64 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -150,7 +150,6 @@ static inline void cpu_relax(void) } #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* Thread switching */ extern struct task_struct *cpu_switch_to(struct task_struct *prev, diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h index e412e8b..ee62365 100644 --- a/arch/avr32/include/asm/processor.h +++ b/arch/avr32/include/asm/processor.h @@ -93,7 +93,6 @@ extern struct avr32_cpuinfo boot_cpu_data; #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #define cpu_sync_pipeline() asm volatile("sub pc, -2" : : : "memory") struct cpu_context { diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h index 8b8704a..57acfb1 100644 --- a/arch/blackfin/include/asm/processor.h +++ b/arch/blackfin/include/asm/processor.h @@ -93,7 +93,6 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() smp_mb() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* Get the Silicon Revision of the chip */ static inline uint32_t __pure bfin_revid(void) diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h index 914d730..1fd22e7 100644 --- a/arch/c6x/include/asm/processor.h +++ b/arch/c6x/include/asm/processor.h @@ -122,7 +122,6 @@ extern unsigned long get_wchan(struct task_struct *p); #define cpu_relax() do { } while (0) #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() extern const struct seq_operations cpuinfo_op; diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h index 01dd52e..1a57841 100644 --- a/arch/cris/include/asm/processor.h +++ b/arch/cris/include/asm/processor.h @@ -64,7 +64,6 @@ static inline void release_thread(struct task_struct *dead_task) #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() void default_idle(void); diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h index 4d00d65..c1e5f2a 100644 --- a/arch/frv/include/asm/processor.h +++ b/arch/frv/include/asm/processor.h @@ -108,7 +108,6 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* data cache prefetch */ #define ARCH_HAS_PREFETCH diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h index 683a061..42d6053 100644 --- a/arch/h8300/include/asm/processor.h +++ b/arch/h8300/include/asm/processor.h @@ -128,7 +128,6 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #define HARD_RESET_NOW() ({ \ local_irq_disable(); \ diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h index 1558ddb..5d694cc 100644 --- a/arch/hexagon/include/asm/processor.h +++ b/arch/hexagon/include/asm/processor.h @@ -57,7 +57,6 @@ struct thread_struct { #define cpu_relax() __vmyield() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* * Decides where the kernel will search for a free chunk of vm space during diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h index 4654b71..0c2c3b2 100644 --- a/arch/ia64/include/asm/processor.h +++ b/arch/ia64/include/asm/processor.h @@ -548,7 +548,6 @@ ia64_eoi (void) #define cpu_relax() ia64_hint(ia64_hint_pause) #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() static inline int ia64_get_irr(unsigned int vector) diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h index b262037..9b83a13 100644 --- a/arch/m32r/include/asm/processor.h +++ b/arch/m32r/include/asm/processor.h @@ -134,6 +134,5 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #endif /* _ASM_M32R_PROCESSOR_H */ diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h index 13e07ae..b0d0442 100644 --- a/arch/m68k/include/asm/processor.h +++ b/arch/m68k/include/asm/processor.h @@ -157,6 +157,5 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #endif diff --git a/arch/metag/include/asm/processor.h b/arch/metag/include/asm/processor.h index 61d6e27..ee302a6 100644 --- a/arch/metag/include/asm/processor.h +++ b/arch/metag/include/asm/processor.h @@ -153,7 +153,6 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() extern void setup_priv(void); diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h index fd7dd11..08ec1f7 100644 --- a/arch/microblaze/include/asm/processor.h +++ b/arch/microblaze/include/asm/processor.h @@ -23,7 +23,6 @@ extern const struct seq_operations cpuinfo_op; # define cpu_relax() barrier() # define cpu_relax_yield() cpu_relax() -# define cpu_relax_lowlatency() cpu_relax() #define task_pt_regs(tsk) \ (((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1) diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h index 9a656f6..8ea95e7 100644 --- a/arch/mips/include/asm/processor.h +++ b/arch/mips/include/asm/processor.h @@ -390,7 +390,6 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* * Return_address is a replacement for __builtin_return_address(count) diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h index 89f63d1..d11397b 100644 --- a/arch/mn10300/include/asm/processor.h +++ b/arch/mn10300/include/asm/processor.h @@ -70,7 +70,6 @@ extern void dodgy_tsc(void); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* * User space process size: 1.75GB (default). diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h index 303e593..d32c176 100644 --- a/arch/nios2/include/asm/processor.h +++ b/arch/nios2/include/asm/processor.h @@ -89,7 +89,6 @@ extern unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #endif /* __ASSEMBLY__ */ diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h index 6ecfc2a..7f47fc7 100644 --- a/arch/openrisc/include/asm/processor.h +++ b/arch/openrisc/include/asm/processor.h @@ -93,7 +93,6 @@ extern unsigned long thread_saved_pc(struct task_struct *t); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #endif /* __ASSEMBLY__ */ #endif /* __ASM_OPENRISC_PROCESSOR_H */ diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h index ea2ff9f..a4a07f4 100644 --- a/arch/parisc/include/asm/processor.h +++ b/arch/parisc/include/asm/processor.h @@ -310,7 +310,6 @@ extern unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* * parisc_requires_coherency() is used to identify the combined VIPT/PIPT diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 908fa7c..5684e68 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -405,7 +405,6 @@ static inline unsigned long __pack_fe01(unsigned int fpmode) #endif #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* Check that a certain kernel stack pointer is valid in task_struct p */ int validate_sp(unsigned long sp, struct task_struct *p, diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h index 5d262cf..17c001a 100644 --- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -237,7 +237,6 @@ static inline unsigned short stap(void) void cpu_relax_yield(void); #define cpu_relax() barrier() -#define cpu_relax_lowlatency() barrier() #define ECAG_CACHE_ATTRIBUTE 0 #define ECAG_CPU_ATTRIBUTE 1 diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h index e8e87b4..a1e97c0 100644 --- a/arch/score/include/asm/processor.h +++ b/arch/score/include/asm/processor.h @@ -25,7 +25,6 @@ extern unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #define release_thread(thread) do {} while (0) /* diff --git a/arch/sh/include/asm/processor.h b/arch/sh/include/asm/processor.h index 099a991..9454ff1 100644 --- a/arch/sh/include/asm/processor.h +++ b/arch/sh/include/asm/processor.h @@ -98,7 +98,6 @@ extern struct sh_cpuinfo cpu_data[]; #define cpu_sleep() __asm__ __volatile__ ("sleep" : : : "memory") #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() void default_idle(void); void stop_this_cpu(void *); diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h index 50e908a3c..fc32b73 100644 --- a/arch/sparc/include/asm/processor_32.h +++ b/arch/sparc/include/asm/processor_32.h @@ -120,7 +120,6 @@ int do_mathemu(struct pt_regs *regs, struct task_struct *fpt); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() extern void (*sparc_idle)(void); diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h index 3e8fac7..12787df 100644 --- a/arch/sparc/include/asm/processor_64.h +++ b/arch/sparc/include/asm/processor_64.h @@ -217,7 +217,6 @@ unsigned long get_wchan(struct task_struct *task); ".previous" \ ::: "memory") #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* Prefetch support. This is tuned for UltraSPARC-III and later. * UltraSPARC-I will treat these as nops, and UltraSPARC-II has diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h index 91a39a5..c1c228b 100644 --- a/arch/tile/include/asm/processor.h +++ b/arch/tile/include/asm/processor.h @@ -265,7 +265,6 @@ static inline void cpu_relax(void) } #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* Info on this processor (see fs/proc/cpuinfo.c) */ struct seq_operations; diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h index fc54d5d..eeefe7c 100644 --- a/arch/unicore32/include/asm/processor.h +++ b/arch/unicore32/include/asm/processor.h @@ -72,7 +72,6 @@ unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #define task_pt_regs(p) \ ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 44adada..7513c99 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -589,7 +589,6 @@ static __always_inline void cpu_relax(void) } #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* Stop speculative execution and prefetching of modified code. */ static inline void sync_core(void) diff --git a/arch/x86/um/asm/processor.h b/arch/x86/um/asm/processor.h index 01597af..b4bd63b 100644 --- a/arch/x86/um/asm/processor.h +++ b/arch/x86/um/asm/processor.h @@ -27,7 +27,6 @@ static inline void rep_nop(void) #define cpu_relax() rep_nop() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() #define task_pt_regs(t) (&(t)->thread.regs) diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h index fe14dc2..7d8d6be 100644 --- a/arch/xtensa/include/asm/processor.h +++ b/arch/xtensa/include/asm/processor.h @@ -207,7 +207,6 @@ extern unsigned long get_wchan(struct task_struct *p); #define cpu_relax() barrier() #define cpu_relax_yield() cpu_relax() -#define cpu_relax_lowlatency() cpu_relax() /* Special register access. */ -- 2.5.5
Christian Borntraeger
2016-Nov-15 10:15 UTC
[GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield
On 10/25/2016 11:03 AM, Christian Borntraeger wrote:> Peter, > > here is v2 with some improved patch descriptions and some fixes. The > previous version has survived one day of linux-next and I only changed > small parts. > So unless there is some other issue, feel free to pull (or to apply > the patches) to tip/locking. > > The following changes since commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69: > > Linux 4.9-rc2 (2016-10-23 17:10:14 -0700) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git tags/cpurelax > > for you to fetch changes up to dcc37f9044436438360402714b7544a8e8779b07: > > processor.h: remove cpu_relax_lowlatency (2016-10-25 09:49:57 +0200)Ping. Peter, you had these patches in your https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/ repository, but now the patches are gone. Any feedback?> > ---------------------------------------------------------------- > cpu_relax: drop lowlatency, introduce yield > > For spinning loops people do often use barrier() or cpu_relax(). > For most architectures cpu_relax and barrier are the same, but on > some architectures cpu_relax can add some latency. > For example on power,sparc64 and arc, cpu_relax can shift the CPU > towards other hardware threads in an SMT environment. > On s390 cpu_relax does even more, it uses an hypercall to the > hypervisor to give up the timeslice. > In contrast to the SMT yielding this can result in larger latencies. > In some places this latency is unwanted, so another variant > "cpu_relax_lowlatency" was introduced. Before this is used in more > and more places, lets revert the logic and provide a cpu_relax_yield > that can be called in places where yielding is more important than > latency. By default this is the same as cpu_relax on all architectures. > > So my proposal boils down to: > - lowest latency: use barrier() or mb() if necessary > - low latency: use cpu_relax (e.g. might give up some cpu for the other > _hardware_ threads) > - really give up CPU: use cpu_relax_yield > > PS: In the long run I would also try to provide for s390 something > like cpu_relax_yield_to with a cpu number (or just add that to > cpu_relax_yield), since a yield_to is always better than a yield as > long as we know the waiter. > > ---------------------------------------------------------------- > Christian Borntraeger (5): > processor.h: introduce cpu_relax_yield > stop_machine: yield CPU during stop machine > s390: make cpu_relax a barrier again > processor.h: Remove cpu_relax_lowlatency users > processor.h: remove cpu_relax_lowlatency > > arch/alpha/include/asm/processor.h | 2 +- > arch/arc/include/asm/processor.h | 4 ++-- > arch/arm/include/asm/processor.h | 2 +- > arch/arm64/include/asm/processor.h | 2 +- > arch/avr32/include/asm/processor.h | 2 +- > arch/blackfin/include/asm/processor.h | 2 +- > arch/c6x/include/asm/processor.h | 2 +- > arch/cris/include/asm/processor.h | 2 +- > arch/frv/include/asm/processor.h | 2 +- > arch/h8300/include/asm/processor.h | 2 +- > arch/hexagon/include/asm/processor.h | 2 +- > arch/ia64/include/asm/processor.h | 2 +- > arch/m32r/include/asm/processor.h | 2 +- > arch/m68k/include/asm/processor.h | 2 +- > arch/metag/include/asm/processor.h | 2 +- > arch/microblaze/include/asm/processor.h | 2 +- > arch/mips/include/asm/processor.h | 2 +- > arch/mn10300/include/asm/processor.h | 2 +- > arch/nios2/include/asm/processor.h | 2 +- > arch/openrisc/include/asm/processor.h | 2 +- > arch/parisc/include/asm/processor.h | 2 +- > arch/powerpc/include/asm/processor.h | 2 +- > arch/s390/include/asm/processor.h | 4 ++-- > arch/s390/kernel/processor.c | 4 ++-- > arch/score/include/asm/processor.h | 2 +- > arch/sh/include/asm/processor.h | 2 +- > arch/sparc/include/asm/processor_32.h | 2 +- > arch/sparc/include/asm/processor_64.h | 2 +- > arch/tile/include/asm/processor.h | 2 +- > arch/unicore32/include/asm/processor.h | 2 +- > arch/x86/include/asm/processor.h | 2 +- > arch/x86/um/asm/processor.h | 2 +- > arch/xtensa/include/asm/processor.h | 2 +- > drivers/gpu/drm/i915/i915_gem_request.c | 2 +- > drivers/vhost/net.c | 4 ++-- > kernel/locking/mcs_spinlock.h | 4 ++-- > kernel/locking/mutex.c | 4 ++-- > kernel/locking/osq_lock.c | 6 +++--- > kernel/locking/qrwlock.c | 6 +++--- > kernel/locking/rwsem-xadd.c | 4 ++-- > kernel/stop_machine.c | 2 +- > lib/lockref.c | 2 +- > 42 files changed, 53 insertions(+), 53 deletions(-) >
Russell King - ARM Linux
2016-Nov-15 12:30 UTC
[GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield
On Tue, Oct 25, 2016 at 11:03:11AM +0200, Christian Borntraeger wrote:> For spinning loops people do often use barrier() or cpu_relax(). > For most architectures cpu_relax and barrier are the same, but on > some architectures cpu_relax can add some latency. > For example on power,sparc64 and arc, cpu_relax can shift the CPU > towards other hardware threads in an SMT environment. > On s390 cpu_relax does even more, it uses an hypercall to the > hypervisor to give up the timeslice. > In contrast to the SMT yielding this can result in larger latencies. > In some places this latency is unwanted, so another variant > "cpu_relax_lowlatency" was introduced. Before this is used in more > and more places, lets revert the logic and provide a cpu_relax_yield > that can be called in places where yielding is more important than > latency. By default this is the same as cpu_relax on all architectures.Rather than having to update all these architectures in this way, can't we put in some linux/*.h header something like: #ifndef cpu_relax_yield #define cpu_relax_yield() cpu_relax() #endif so only those architectures that need to do something need to be modified? -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Possibly Parallel Threads
- [PATCH 1/1] sched: provide common cpu_relax_yield definition
- [PATCH 1/1] sched: provide common cpu_relax_yield definition
- [GIT PULL v2 5/5] processor.h: remove cpu_relax_lowlatency
- [GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield
- [GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield