Michael S. Tsirkin
2016-Jan-28 17:02 UTC
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.

So we really should use the locked variant everywhere, except that the Intel
manual says that clflush is only ordered by mfence, so we can't.

Note: some callers of clflush seem to assume sfence will order it, so there
could be existing bugs around this code.  Fortunately no callers of clflush
(except one) order it using smp_mb(), so after fixing that one caller, it
seems safe to override smp_mb straight away.

Down the road, it might make sense to introduce clflush_mb() and switch to
that for clflush callers.

While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h.

The documentation fixes are included first - I verified that they do not
change the generated code at all.  Borislav Petkov said they will appear in
tip eventually, included here for completeness.

The last patch changes __smp_mb() to lock addl.  I was unable to measure a
speed difference on a macro benchmark, but I noted that even doing
	#define mb() barrier()
seems to make no difference for most benchmarks (it causes hangs sometimes,
of course).

Lightly tested on my laptop.

HPA asked that the last patch be deferred until we hear back from Intel,
which makes sense of course.  So it needs HPA's ack.

Changes from v4:
	Fix up the 64 bit version.
Changes from v3:
	Leave mb() alone for now since it's used to order clflush, which
	requires mfence.  Optimize smp_mb instead.
Changes from v2:
	add patch adding cc clobber for addl
	tweak commit log for patch 2
	use addl at SP-4 (as opposed to SP) to reduce data dependencies

Michael S. Tsirkin (5):
  x86: add cc clobber for addl
  x86: drop a comment left over from X86_OOSTORE
  x86: tweak the comment about use of wmb for IO
  x86: use mb() around clflush
  x86: drop mfence in favor of lock+addl

 arch/x86/include/asm/barrier.h | 21 ++++++++++++---------
 arch/x86/kernel/process.c      |  4 ++--
 2 files changed, 14 insertions(+), 11 deletions(-)

-- 
MST
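For context, the kind of user-space micro-benchmark behind a claim like this
can be sketched as follows (hypothetical 64-bit sketch, not the benchmark
actually used for the 2-3x figure; the file name fence-bench.c and the
iteration count are made up, and results vary by CPU):

/* fence-bench.c - hypothetical sketch: time mfence vs. a locked add.
 * Build with: gcc -O2 -o fence-bench fence-bench.c (x86-64 only). */
#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

#define ITERS 100000000UL

int main(void)
{
	uint64_t start, mfence_cycles, lock_cycles;
	unsigned long i;

	start = rdtsc();
	for (i = 0; i < ITERS; i++)
		asm volatile("mfence" ::: "memory");
	mfence_cycles = rdtsc() - start;

	start = rdtsc();
	for (i = 0; i < ITERS; i++)
		asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
	lock_cycles = rdtsc() - start;

	printf("mfence:    %.2f cycles/op\n", (double)mfence_cycles / ITERS);
	printf("lock addl: %.2f cycles/op\n", (double)lock_cycles / ITERS);
	return 0;
}

On many recent CPUs the locked add comes out noticeably cheaper than mfence,
which is the gap the series exploits.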
Michael S. Tsirkin
2016-Jan-28 17:02 UTC
[PATCH v5 1/5] x86: add cc clobber for addl
addl clobbers flags (such as CF) but barrier.h didn't tell this to gcc.
Historically, gcc doesn't need one on x86, and always considers flags
clobbered.  We are probably missing the cc clobber in a *lot* of places
for this reason.

But even if not necessary, it's probably a good thing to add for
documentation, and in case gcc semantics ever change.

Reported-by: Borislav Petkov <bp at alien8.de>
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 arch/x86/include/asm/barrier.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a584e1c..a65bdb1 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -15,9 +15,12 @@
  * Some non-Intel clones support out of order store. wmb() ceases to be a
  * nop for these.
  */
-#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
-#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
-#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
+#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
+				      X86_FEATURE_XMM2) ::: "memory", "cc")
+#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
+				       X86_FEATURE_XMM2) ::: "memory", "cc")
+#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \
+				       X86_FEATURE_XMM2) ::: "memory", "cc")
 #else
 #define mb() asm volatile("mfence":::"memory")
 #define rmb() asm volatile("lfence":::"memory")
-- 
MST
Michael S. Tsirkin
2016-Jan-28 17:02 UTC
[PATCH v5 2/5] x86: drop a comment left over from X86_OOSTORE
The comment about wmb being non-nop to deal with non-Intel CPUs is a leftover
from before commit 09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE").

It makes no sense now: in particular, wmb is not a nop even for regular Intel
CPUs because of weird use-cases, e.g. dealing with WC memory.

Drop this comment.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 arch/x86/include/asm/barrier.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a65bdb1..a291745 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,10 +11,6 @@
  */
 
 #ifdef CONFIG_X86_32
-/*
- * Some non-Intel clones support out of order store. wmb() ceases to be a
- * nop for these.
- */
 #define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
 				      X86_FEATURE_XMM2) ::: "memory", "cc")
 #define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
-- 
MST
Michael S. Tsirkin
2016-Jan-28 17:02 UTC
[PATCH v5 3/5] x86: tweak the comment about use of wmb for IO
On x86, we *do* still use the non-nop rmb/wmb for IO barriers, but even that
is generally questionable.

Leave them around as historical unless somebody can point to a case where
they care about the performance, but tweak the comment so people don't think
they are strictly required in all cases.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 arch/x86/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a291745..bfb28ca 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -6,7 +6,7 @@
 
 /*
  * Force strict CPU ordering.
- * And yes, this is required on UP too when we're talking
+ * And yes, this might be required on UP too when we're talking
  * to devices.
  */
 
-- 
MST
Michael S. Tsirkin
2016-Jan-28 17:02 UTC
[PATCH v5 4/5] x86: use mb() around clflush
commit f8e617f4582995f7c25ef25b4167213120ad122b ("sched/idle/x86:
Optimize unnecessary mwait_idle() resched IPIs") adds
memory barriers around clflush, but this seems wrong
for UP since barrier() has no effect on clflush.
We really want mfence so switch to mb() instead.

Cc: Mike Galbraith <bitbucket at online.de>
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 arch/x86/kernel/process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c..9decee2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -418,9 +418,9 @@ static void mwait_idle(void)
 	if (!current_set_polling_and_test()) {
 		trace_cpu_idle_rcuidle(1, smp_processor_id());
 		if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
-			smp_mb(); /* quirk */
+			mb(); /* quirk */
 			clflush((void *)&current_thread_info()->flags);
-			smp_mb(); /* quirk */
+			mb(); /* quirk */
 		}
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-- 
MST
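The reason barrier() is too weak here: on !CONFIG_SMP builds the smp_*()
barriers reduce to a compiler-only barrier, which emits no fence instruction
and therefore cannot order the CLFLUSH against the later MONITOR, while mb()
always emits a real fence.  A simplified sketch of the relevant definitions
(condensed from the kernel headers of that era, not a verbatim copy):

/* Simplified sketch (not verbatim kernel code): why smp_mb() is not enough
 * for the quirk on UP builds. */
#define barrier()	asm volatile("" ::: "memory")		/* compiler-only */
#define mb()		asm volatile("mfence" ::: "memory")	/* real CPU fence */

#ifdef CONFIG_SMP
#define smp_mb()	mb()		/* fence instruction when SMP is configured */
#else
#define smp_mb()	barrier()	/* no fence instruction at all on UP builds */
#endif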
Michael S. Tsirkin
2016-Jan-28 17:02 UTC
[PATCH v5 5/5] x86: drop mfence in favor of lock+addl
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.

Just poking at SP would be the most natural, but if we then read the value
from SP, we get a false dependency which will slow us down.

This was noted in this article:
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/

It is also easy to reproduce by sticking a barrier in a small non-inline
function.

So let's use a negative offset - which avoids this problem since we build
with the red zone disabled.

Unfortunately there's some code that wants to order clflush instructions
using mb(), so we can't replace that - but smp_mb should be safe to replace.

Update mb/rmb/wmb on 32 bit to use the negative offset, too, for consistency.

Suggested-by: Andy Lutomirski <luto at amacapital.net>
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 arch/x86/include/asm/barrier.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index bfb28ca..3c6ba1e 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,11 +11,11 @@
  */
 
 #ifdef CONFIG_X86_32
-#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
+#define mb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "mfence", \
 				      X86_FEATURE_XMM2) ::: "memory", "cc")
-#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
+#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "lfence", \
 				       X86_FEATURE_XMM2) ::: "memory", "cc")
-#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \
+#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "sfence", \
 				       X86_FEATURE_XMM2) ::: "memory", "cc")
 #else
 #define mb() asm volatile("mfence":::"memory")
@@ -30,7 +30,11 @@
 #endif
 #define dma_wmb()	barrier()
 
-#define __smp_mb()	mb()
+#ifdef CONFIG_X86_32
+#define __smp_mb()	asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc")
+#else
+#define __smp_mb()	asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")
+#endif
 #define __smp_rmb()	dma_rmb()
 #define __smp_wmb()	barrier()
 #define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
-- 
MST
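The reproducer described above ("sticking a barrier in a small non-inline
function") can be sketched like this in 64-bit user space (hypothetical code,
not the author's actual test; dep-bench.c and the loop count are made up, and
numbers vary by CPU):

/* dep-bench.c - hypothetical sketch of the false-dependency effect.
 * Build with: gcc -O2 -o dep-bench dep-bench.c (x86-64 only). */
#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* The locked add at 0(%rsp) read-modify-writes the return address slot
 * (adding 0 leaves it unchanged); the following ret then reloads it. */
__attribute__((noinline)) static void barrier_at_sp(void)
{
	asm volatile("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
}

/* Same barrier, but below SP: the ret's load is not fed by the RMW. */
__attribute__((noinline)) static void barrier_below_sp(void)
{
	asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
}

#define ITERS 100000000UL

int main(void)
{
	uint64_t start, at_sp, below_sp;
	unsigned long i;

	start = rdtsc();
	for (i = 0; i < ITERS; i++)
		barrier_at_sp();
	at_sp = rdtsc() - start;

	start = rdtsc();
	for (i = 0; i < ITERS; i++)
		barrier_below_sp();
	below_sp = rdtsc() - start;

	printf("lock addl at 0(%%rsp):  %.2f cycles/call\n", (double)at_sp / ITERS);
	printf("lock addl at -4(%%rsp): %.2f cycles/call\n", (double)below_sp / ITERS);
	return 0;
}

In the first variant the ret has to wait for the value just stored by the
locked RMW, which is the false dependency the commit message refers to; the
-4 variant leaves the return-address load independent of the barrier's store.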
On Thu, Jan 28, 2016 at 07:02:51PM +0200, Michael S. Tsirkin wrote:
> commit f8e617f4582995f7c25ef25b4167213120ad122b ("sched/idle/x86:
> Optimize unnecessary mwait_idle() resched IPIs") adds
> memory barriers around clflush, but this seems wrong
> for UP since barrier() has no effect on clflush.
> We really want mfence so switch to mb() instead.
> 
> Cc: Mike Galbraith <bitbucket at online.de>
> Signed-off-by: Michael S. Tsirkin <mst at redhat.com>

Acked-by: Peter Zijlstra (Intel) <peterz at infradead.org>
tip-bot for Michael S. Tsirkin
2016-Jan-29 11:32 UTC
[tip:locking/core] locking/x86: Add cc clobber for ADDL
Commit-ID:  bd922477d9350a3006d73dabb241400e6c4181b0
Gitweb:     http://git.kernel.org/tip/bd922477d9350a3006d73dabb241400e6c4181b0
Author:     Michael S. Tsirkin <mst at redhat.com>
AuthorDate: Thu, 28 Jan 2016 19:02:29 +0200
Committer:  Ingo Molnar <mingo at kernel.org>
CommitDate: Fri, 29 Jan 2016 09:40:10 +0100

locking/x86: Add cc clobber for ADDL

ADDL clobbers flags (such as CF) but barrier.h didn't tell this to GCC.
Historically, GCC doesn't need one on x86, and always considers flags
clobbered.  We are probably missing the cc clobber in a *lot* of places
for this reason.

But even if not necessary, it's probably a good thing to add for
documentation, and in case GCC semantics ever change.

Reported-by: Borislav Petkov <bp at alien8.de>
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz at infradead.org>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl at google.com>
Cc: Andy Lutomirski <luto at amacapital.net>
Cc: Andy Lutomirski <luto at kernel.org>
Cc: Borislav Petkov <bp at suse.de>
Cc: Brian Gerst <brgerst at gmail.com>
Cc: Davidlohr Bueso <dave at stgolabs.net>
Cc: Davidlohr Bueso <dbueso at suse.de>
Cc: Denys Vlasenko <dvlasenk at redhat.com>
Cc: H. Peter Anvin <hpa at zytor.com>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
Cc: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: virtualization <virtualization at lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-2-git-send-email-mst at redhat.com
Signed-off-by: Ingo Molnar <mingo at kernel.org>
---
 arch/x86/include/asm/barrier.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a584e1c..a65bdb1 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -15,9 +15,12 @@
  * Some non-Intel clones support out of order store. wmb() ceases to be a
  * nop for these.
  */
-#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
-#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
-#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
+#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
+				      X86_FEATURE_XMM2) ::: "memory", "cc")
+#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
+				       X86_FEATURE_XMM2) ::: "memory", "cc")
+#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \
+				       X86_FEATURE_XMM2) ::: "memory", "cc")
 #else
 #define mb() asm volatile("mfence":::"memory")
 #define rmb() asm volatile("lfence":::"memory")
tip-bot for Michael S. Tsirkin
2016-Jan-29 11:32 UTC
[tip:locking/core] locking/x86: Drop a comment left over from X86_OOSTORE
Commit-ID:  e37cee133c72c9529f74a20d9b7eb3b6dfb928b5
Gitweb:     http://git.kernel.org/tip/e37cee133c72c9529f74a20d9b7eb3b6dfb928b5
Author:     Michael S. Tsirkin <mst at redhat.com>
AuthorDate: Thu, 28 Jan 2016 19:02:37 +0200
Committer:  Ingo Molnar <mingo at kernel.org>
CommitDate: Fri, 29 Jan 2016 09:40:10 +0100

locking/x86: Drop a comment left over from X86_OOSTORE

The comment about wmb being non-NOP to deal with non-Intel CPUs is a
leftover from before the following commit:

  09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE")

It makes no sense now: in particular, wmb() is not a NOP even for regular
Intel CPUs because of weird use-cases, e.g. dealing with WC memory.

Drop this comment.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz at infradead.org>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl at google.com>
Cc: Andy Lutomirski <luto at amacapital.net>
Cc: Andy Lutomirski <luto at kernel.org>
Cc: Borislav Petkov <bp at alien8.de>
Cc: Borislav Petkov <bp at suse.de>
Cc: Brian Gerst <brgerst at gmail.com>
Cc: Davidlohr Bueso <dave at stgolabs.net>
Cc: Davidlohr Bueso <dbueso at suse.de>
Cc: Denys Vlasenko <dvlasenk at redhat.com>
Cc: H. Peter Anvin <hpa at zytor.com>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
Cc: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: virtualization <virtualization at lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-3-git-send-email-mst at redhat.com
Signed-off-by: Ingo Molnar <mingo at kernel.org>
---
 arch/x86/include/asm/barrier.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a65bdb1..a291745 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,10 +11,6 @@
  */
 
 #ifdef CONFIG_X86_32
-/*
- * Some non-Intel clones support out of order store. wmb() ceases to be a
- * nop for these.
- */
 #define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
 				      X86_FEATURE_XMM2) ::: "memory", "cc")
 #define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
tip-bot for Michael S. Tsirkin
2016-Jan-29 11:32 UTC
[tip:locking/core] locking/x86: Tweak the comment about use of wmb() for IO
Commit-ID:  57d9b1b43433a6ba7267c80b87d8e8f6e86edceb
Gitweb:     http://git.kernel.org/tip/57d9b1b43433a6ba7267c80b87d8e8f6e86edceb
Author:     Michael S. Tsirkin <mst at redhat.com>
AuthorDate: Thu, 28 Jan 2016 19:02:44 +0200
Committer:  Ingo Molnar <mingo at kernel.org>
CommitDate: Fri, 29 Jan 2016 09:40:10 +0100

locking/x86: Tweak the comment about use of wmb() for IO

On x86, we *do* still use the non-NOP rmb()/wmb() for IO barriers, but
even that is generally questionable.

Leave them around as historical unless somebody can point to a case where
they care about the performance, but tweak the comment so people don't
think they are strictly required in all cases.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz at infradead.org>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl at google.com>
Cc: Andy Lutomirski <luto at amacapital.net>
Cc: Andy Lutomirski <luto at kernel.org>
Cc: Borislav Petkov <bp at alien8.de>
Cc: Borislav Petkov <bp at suse.de>
Cc: Brian Gerst <brgerst at gmail.com>
Cc: Davidlohr Bueso <dave at stgolabs.net>
Cc: Davidlohr Bueso <dbueso at suse.de>
Cc: Denys Vlasenko <dvlasenk at redhat.com>
Cc: H. Peter Anvin <hpa at zytor.com>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
Cc: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: virtualization <virtualization at lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-4-git-send-email-mst at redhat.com
Signed-off-by: Ingo Molnar <mingo at kernel.org>
---
 arch/x86/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a291745..bfb28ca 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -6,7 +6,7 @@
 
 /*
  * Force strict CPU ordering.
- * And yes, this is required on UP too when we're talking
+ * And yes, this might be required on UP too when we're talking
  * to devices.
  */
tip-bot for Michael S. Tsirkin
2016-Jan-29 11:33 UTC
[tip:locking/core] locking/x86: Use mb() around clflush()
Commit-ID:  ca59809ff6d572ae58fc6bedf7500f5a60fdbd64
Gitweb:     http://git.kernel.org/tip/ca59809ff6d572ae58fc6bedf7500f5a60fdbd64
Author:     Michael S. Tsirkin <mst at redhat.com>
AuthorDate: Thu, 28 Jan 2016 19:02:51 +0200
Committer:  Ingo Molnar <mingo at kernel.org>
CommitDate: Fri, 29 Jan 2016 09:40:10 +0100

locking/x86: Use mb() around clflush()

The following commit:

  f8e617f4582995f ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")

adds memory barriers around clflush(), but this seems wrong for UP since
barrier() has no effect on clflush().  We really want MFENCE, so switch
to mb() instead.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz at infradead.org>
Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: Andy Lutomirski <luto at amacapital.net>
Cc: Andy Lutomirski <luto at kernel.org>
Cc: Borislav Petkov <bp at alien8.de>
Cc: Brian Gerst <brgerst at gmail.com>
Cc: Davidlohr Bueso <dave at stgolabs.net>
Cc: Davidlohr Bueso <dbueso at suse.de>
Cc: Denys Vlasenko <dvlasenk at redhat.com>
Cc: H. Peter Anvin <hpa at zytor.com>
Cc: Len Brown <len.brown at intel.com>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
Cc: Mike Galbraith <bitbucket at online.de>
Cc: Oleg Nesterov <oleg at redhat.com>
Cc: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: virtualization <virtualization at lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-5-git-send-email-mst at redhat.com
Signed-off-by: Ingo Molnar <mingo at kernel.org>
---
 arch/x86/kernel/process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c..9decee2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -418,9 +418,9 @@ static void mwait_idle(void)
 	if (!current_set_polling_and_test()) {
 		trace_cpu_idle_rcuidle(1, smp_processor_id());
 		if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
-			smp_mb(); /* quirk */
+			mb(); /* quirk */
 			clflush((void *)&current_thread_info()->flags);
-			smp_mb(); /* quirk */
+			mb(); /* quirk */
 		}
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);