Michael S. Tsirkin
2015-Dec-30 13:24 UTC
[PATCH 00/34] arch: barrier cleanup + __smp_XXX barriers for virt
This is really trying to cleanup some virt code, as suggested by Peter, who said> You could of course go fix that instead of mutilating things into > sort-of functional state.This work is needed for virtio, so it's probably easiest to merge it through my tree - is this fine by everyone? Arnd, if you agree, could you ack this please? Note to arch maintainers: please don't cherry-pick patches out of this patchset as it's been structured in this order to avoid breaking bisect. Please send acks instead! Sometimes, virtualization is weird. For example, virtio does this (conceptually): #ifdef CONFIG_SMP smp_mb(); #else mb(); #endif Similarly, Xen calls mb() when it's not doing any MMIO at all. Of course it's wrong in the sense that it's suboptimal. What we would really like is to have, on UP, exactly the same barrier as on SMP. This is because a UP guest can run on an SMP host. But Linux doesn't provide this ability: if CONFIG_SMP is not defined is optimizes most barriers out to a compiler barrier. Consider for example x86: what we want is xchg (NOT mfence - there's no real IO going on here - just switching out of the VM - more like a function call really) but if built without CONFIG_SMP smp_store_mb does not include this. Virt in general is probably the only use-case, because this really is an artifact of interfacing with an SMP host while running an UP kernel, but since we have (at least) two users, it seems to make sense to put these APIs in a central place. In fact, smp_ barriers are stubs on !SMP, so they can be defined as follows: arch/XXX/include/asm/barrier.h: #define __smp_mb() DOSOMETHING include/asm-generic/barrier.h: #ifdef CONFIG_SMP #define smp_mb() __smp_mb() #else #define smp_mb() barrier() #endif This has the benefit of cleaning out a bunch of duplicated ifdefs on a bunch of architectures - this patchset brings about a net reduction in LOC, compensated for by extra documentation :) Then virt can use __smp_XXX when talking to an SMP host. Touching all archs is a tad tedious, but its fairly straight forward. The rest of the patchset is structured as follows: -. Patch 1 documents the __smp APIs, and explains why they are useful for virt -. Patches 2-7 makes sure barrier.h on ia64, powerpc, s390 and sparc includes asm-generic/barrier.h: this is safe because code there matches asm-generic/barrier.h almost verbatim. Minor code tweaks were required in a couple of places. Macros duplicated from asm-generic/barrier.h are dropped in the process. I compiled these arches before and after the changes, dumped the .text section (using objdump -O binary) and made sure that the object code is exactly identical before and after the change. -. Patch 8 fixes up smp_store_mb in asm-generic/barrier.h: it should call smp_mb and not mb. Luckily only used in core kernel so we know no one misused it for an MMIO barrier (so far). This is the only patch that affects code behaviour outside virt, and that for !SMP only. Compile-tested on a bunch of architectures (x86 hardware which I have is unaffected by this). It's very straightforward so I hope it's enough. Hoping for some acks on this architecture. -. Patches 9-14 make sure barrier.h on all remaining architectures includes asm-generic/barrier.h: after the change in Patch 8, code there matches asm-generic/barrier.h almost verbatim. Macros duplicated from asm-generic/barrier.h are dropped in the process. After all that preparatory work, we are getting to the actual change. -. Patches 15 adds generic smp_XXX wrappers in asm-generic these select __smp_XXX or barrier() depending on CONFIG_SMP -. Patches 16-28 change all architectures to define __smp_XXX macros; the generic code in asm-generic/barrier.h then defines smp_XXX macros I compiled the affected arches before and after the changes, dumped the .text section (using objdump -O binary) and made sure that the object code is exactly identical before and after the change. I couldn't fully build sh,tile,xtensa but I did this test kernel/rcu/tree.o kernel/sched/wait.o and kernel/futex.o and tested these instead. Unfortunately, I don't have a metag cross-build toolset ready. Hoping for some acks on this architecture. Finally, the following patches put the __smp_XXX APIs to work for virt: -. Patches 29-31 convert virtio and xen drivers to use the __smp_XXX APIs xen patches are untested virtio ones have been tested on x86 -. Patches 33-34 teach virtio to use __smp_load_acquire/__smp_store_release/__smp_store_mb This is what started all this work. tested on x86 The patchset has been in linux-next for a bit, so far without issues. Michael S. Tsirkin (34): Documentation/memory-barriers.txt: document __smb_mb() asm-generic: guard smp_store_release/load_acquire ia64: rename nop->iosapic_nop ia64: reuse asm-generic/barrier.h powerpc: reuse asm-generic/barrier.h s390: reuse asm-generic/barrier.h sparc: reuse asm-generic/barrier.h asm-generic: smp_store_mb should use smp_mb arm: reuse asm-generic/barrier.h arm64: reuse asm-generic/barrier.h metag: reuse asm-generic/barrier.h mips: reuse asm-generic/barrier.h x86/um: reuse asm-generic/barrier.h x86: reuse asm-generic/barrier.h asm-generic: add __smp_XXX wrappers powerpc: define __smp_XXX arm64: define __smp_XXX arm: define __smp_XXX blackfin: define __smp_XXX ia64: define __smp_XXX metag: define __smp_XXX mips: define __smp_XXX s390: define __smp_XXX sh: define __smp_XXX, fix smp_store_mb for !SMP sparc: define __smp_XXX tile: define __smp_XXX xtensa: define __smp_XXX x86: define __smp_XXX Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" virtio_ring: update weak barriers to use __smp_XXX xenbus: use __smp_XXX barriers xen/io: use __smp_XXX barriers virtio: use __smp_load_acquire/__smp_store_release virtio_ring: use __smp_store_mb arch/arm/include/asm/barrier.h | 35 ++------------ arch/arm64/include/asm/barrier.h | 19 +++----- arch/blackfin/include/asm/barrier.h | 4 +- arch/ia64/include/asm/barrier.h | 24 +++------- arch/metag/include/asm/barrier.h | 55 +++++++-------------- arch/mips/include/asm/barrier.h | 51 +++++++------------- arch/powerpc/include/asm/barrier.h | 33 ++++--------- arch/s390/include/asm/barrier.h | 25 +++++----- arch/sh/include/asm/barrier.h | 3 +- arch/sparc/include/asm/barrier_32.h | 1 - arch/sparc/include/asm/barrier_64.h | 29 +++-------- arch/sparc/include/asm/processor.h | 3 -- arch/tile/include/asm/barrier.h | 9 ++-- arch/x86/include/asm/barrier.h | 36 ++++++-------- arch/x86/um/asm/barrier.h | 9 +--- arch/xtensa/include/asm/barrier.h | 4 +- include/asm-generic/barrier.h | 95 +++++++++++++++++++++++++++++++++---- include/linux/virtio_ring.h | 48 ++++++++++++++++--- include/xen/interface/io/ring.h | 16 +++---- arch/ia64/kernel/iosapic.c | 6 +-- drivers/virtio/virtio_ring.c | 35 +++++++------- drivers/xen/xenbus/xenbus_comms.c | 8 ++-- Documentation/memory-barriers.txt | 33 +++++++++++-- 23 files changed, 292 insertions(+), 289 deletions(-) -- MST
Michael S. Tsirkin
2015-Dec-30 13:24 UTC
[PATCH 01/34] Documentation/memory-barriers.txt: document __smb_mb()
Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- Documentation/memory-barriers.txt | 33 ++++++++++++++++++++++++++++----- 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index aef9487..a20f7ef 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1655,17 +1655,18 @@ macro is a good place to start looking. SMP memory barriers are reduced to compiler barriers on uniprocessor compiled systems because it is assumed that a CPU will appear to be self-consistent, and will order overlapping accesses correctly with respect to itself. +However, see the subsection on "Virtual Machine Guests" below. [!] Note that SMP memory barriers _must_ be used to control the ordering of references to shared memory on SMP systems, though the use of locking instead is sufficient. Mandatory barriers should not be used to control SMP effects, since mandatory -barriers unnecessarily impose overhead on UP systems. They may, however, be -used to control MMIO effects on accesses through relaxed memory I/O windows. -These are required even on non-SMP systems as they affect the order in which -memory operations appear to a device by prohibiting both the compiler and the -CPU from reordering them. +barriers impose unnecessary overhead on both SMP and UP systems. They may, +however, be used to control MMIO effects on accesses through relaxed memory I/O +windows. These barriers are required even on non-SMP systems as they affect +the order in which memory operations appear to a device by prohibiting both the +compiler and the CPU from reordering them. There are some more advanced barrier functions: @@ -2948,6 +2949,28 @@ The Alpha defines the Linux kernel's memory barrier model. See the subsection on "Cache Coherency" above. +VIRTUAL MACHINE GUESTS +------------------- + +Guests running within virtual machines might be affected by +SMP effects even if the guest itself is compiled within +SMP support. + +This is an artifact of interfacing with an SMP host while +running an UP kernel. + +Using mandatory barriers for this use-case would be possible +but is often suboptimal. + +To handle this case optimally, low-level __smp_mb() etc macros are available. +These have the same effect as smp_mb() etc when SMP is enabled, but generate +identical code for SMP and non-SMP systems. For example, virtual machine guests +should use __smp_mb() rather than smp_mb() when synchronizing against a +(possibly SMP) host. + +These are equivalent to smp_mb() etc counterparts in all other respects, +in particular, they do not control MMIO effects: to control +MMIO effects, use mandatory barriers. =========== EXAMPLE USES -- MST
Michael S. Tsirkin
2015-Dec-30 13:24 UTC
[PATCH 02/34] asm-generic: guard smp_store_release/load_acquire
Allow architectures to override smp_store_release and smp_load_acquire by guarding the defines in asm-generic/barrier.h with ifndef directives. This is in preparation to reusing asm-generic/barrier.h on architectures which have their own definition of these macros. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- include/asm-generic/barrier.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h index b42afad..538f8d1 100644 --- a/include/asm-generic/barrier.h +++ b/include/asm-generic/barrier.h @@ -104,13 +104,16 @@ #define smp_mb__after_atomic() smp_mb() #endif +#ifndef smp_store_release #define smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ smp_mb(); \ WRITE_ONCE(*p, v); \ } while (0) +#endif +#ifndef smp_load_acquire #define smp_load_acquire(p) \ ({ \ typeof(*p) ___p1 = READ_ONCE(*p); \ @@ -118,6 +121,7 @@ do { \ smp_mb(); \ ___p1; \ }) +#endif #endif /* !__ASSEMBLY__ */ #endif /* __ASM_GENERIC_BARRIER_H */ -- MST
asm-generic/barrier.h defines a nop() macro. To be able to use this header on ia64, we shouldn't call local functions/variables nop(). There's one instance where this breaks on ia64: rename the function to iosapic_nop to avoid the conflict. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/ia64/kernel/iosapic.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c index d2fae05..90fde5b 100644 --- a/arch/ia64/kernel/iosapic.c +++ b/arch/ia64/kernel/iosapic.c @@ -256,7 +256,7 @@ set_rte (unsigned int gsi, unsigned int irq, unsigned int dest, int mask) } static void -nop (struct irq_data *data) +iosapic_nop (struct irq_data *data) { /* do nothing... */ } @@ -415,7 +415,7 @@ iosapic_unmask_level_irq (struct irq_data *data) #define iosapic_shutdown_level_irq mask_irq #define iosapic_enable_level_irq unmask_irq #define iosapic_disable_level_irq mask_irq -#define iosapic_ack_level_irq nop +#define iosapic_ack_level_irq iosapic_nop static struct irq_chip irq_type_iosapic_level = { .name = "IO-SAPIC-level", @@ -453,7 +453,7 @@ iosapic_ack_edge_irq (struct irq_data *data) } #define iosapic_enable_edge_irq unmask_irq -#define iosapic_disable_edge_irq nop +#define iosapic_disable_edge_irq iosapic_nop static struct irq_chip irq_type_iosapic_edge = { .name = "IO-SAPIC-edge", -- MST
On ia64 smp_rmb, smp_wmb, read_barrier_depends, smp_read_barrier_depends and smp_store_mb() match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/ia64/include/asm/barrier.h | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h index df896a1..2f93348 100644 --- a/arch/ia64/include/asm/barrier.h +++ b/arch/ia64/include/asm/barrier.h @@ -48,12 +48,6 @@ # define smp_mb() barrier() #endif -#define smp_rmb() smp_mb() -#define smp_wmb() smp_mb() - -#define read_barrier_depends() do { } while (0) -#define smp_read_barrier_depends() do { } while (0) - #define smp_mb__before_atomic() barrier() #define smp_mb__after_atomic() barrier() @@ -77,12 +71,12 @@ do { \ ___p1; \ }) -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) - /* * The group barrier in front of the rsm & ssm are necessary to ensure * that none of the previous instructions in the same group are * affected by the rsm/ssm. */ +#include <asm-generic/barrier.h> + #endif /* _ASM_IA64_BARRIER_H */ -- MST
Michael S. Tsirkin
2015-Dec-30 13:24 UTC
[PATCH 05/34] powerpc: reuse asm-generic/barrier.h
On powerpc read_barrier_depends, smp_read_barrier_depends smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/powerpc/include/asm/barrier.h | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 0eca6ef..980ad0c 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -34,8 +34,6 @@ #define rmb() __asm__ __volatile__ ("sync" : : : "memory") #define wmb() __asm__ __volatile__ ("sync" : : : "memory") -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) - #ifdef __SUBARCH_HAS_LWSYNC # define SMPWMB LWSYNC #else @@ -60,9 +58,6 @@ #define smp_wmb() barrier() #endif /* CONFIG_SMP */ -#define read_barrier_depends() do { } while (0) -#define smp_read_barrier_depends() do { } while (0) - /* * This is a barrier which prevents following instructions from being * started until the value of the argument x is known. For example, if @@ -87,8 +82,8 @@ do { \ ___p1; \ }) -#define smp_mb__before_atomic() smp_mb() -#define smp_mb__after_atomic() smp_mb() #define smp_mb__before_spinlock() smp_mb() +#include <asm-generic/barrier.h> + #endif /* _ASM_POWERPC_BARRIER_H */ -- MST
On s390 read_barrier_depends, smp_read_barrier_depends smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/s390/include/asm/barrier.h | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h index d68e11e..c358c31 100644 --- a/arch/s390/include/asm/barrier.h +++ b/arch/s390/include/asm/barrier.h @@ -30,14 +30,6 @@ #define smp_rmb() rmb() #define smp_wmb() wmb() -#define read_barrier_depends() do { } while (0) -#define smp_read_barrier_depends() do { } while (0) - -#define smp_mb__before_atomic() smp_mb() -#define smp_mb__after_atomic() smp_mb() - -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) - #define smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ @@ -53,4 +45,6 @@ do { \ ___p1; \ }) +#include <asm-generic/barrier.h> + #endif /* __ASM_BARRIER_H */ -- MST
On sparc 64 bit dma_rmb, dma_wmb, smp_store_mb, smp_mb, smp_rmb, smp_wmb, read_barrier_depends and smp_read_barrier_depends match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. nop uses __asm__ __volatile but is otherwise identical to the generic version, drop that as well. This is in preparation to refactoring this code area. Note: nop() was in processor.h and not in barrier.h as on other architectures. Nothing seems to depend on it being there though. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/sparc/include/asm/barrier_32.h | 1 - arch/sparc/include/asm/barrier_64.h | 21 ++------------------- arch/sparc/include/asm/processor.h | 3 --- 3 files changed, 2 insertions(+), 23 deletions(-) diff --git a/arch/sparc/include/asm/barrier_32.h b/arch/sparc/include/asm/barrier_32.h index ae69eda..8059130 100644 --- a/arch/sparc/include/asm/barrier_32.h +++ b/arch/sparc/include/asm/barrier_32.h @@ -1,7 +1,6 @@ #ifndef __SPARC_BARRIER_H #define __SPARC_BARRIER_H -#include <asm/processor.h> /* for nop() */ #include <asm-generic/barrier.h> #endif /* !(__SPARC_BARRIER_H) */ diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h index 14a9286..26c3f72 100644 --- a/arch/sparc/include/asm/barrier_64.h +++ b/arch/sparc/include/asm/barrier_64.h @@ -37,25 +37,6 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ #define rmb() __asm__ __volatile__("":::"memory") #define wmb() __asm__ __volatile__("":::"memory") -#define dma_rmb() rmb() -#define dma_wmb() wmb() - -#define smp_store_mb(__var, __value) \ - do { WRITE_ONCE(__var, __value); membar_safe("#StoreLoad"); } while(0) - -#ifdef CONFIG_SMP -#define smp_mb() mb() -#define smp_rmb() rmb() -#define smp_wmb() wmb() -#else -#define smp_mb() __asm__ __volatile__("":::"memory") -#define smp_rmb() __asm__ __volatile__("":::"memory") -#define smp_wmb() __asm__ __volatile__("":::"memory") -#endif - -#define read_barrier_depends() do { } while (0) -#define smp_read_barrier_depends() do { } while (0) - #define smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ @@ -74,4 +55,6 @@ do { \ #define smp_mb__before_atomic() barrier() #define smp_mb__after_atomic() barrier() +#include <asm-generic/barrier.h> + #endif /* !(__SPARC64_BARRIER_H) */ diff --git a/arch/sparc/include/asm/processor.h b/arch/sparc/include/asm/processor.h index 2fe99e6..9da9646 100644 --- a/arch/sparc/include/asm/processor.h +++ b/arch/sparc/include/asm/processor.h @@ -5,7 +5,4 @@ #else #include <asm/processor_32.h> #endif - -#define nop() __asm__ __volatile__ ("nop") - #endif -- MST
Michael S. Tsirkin
2015-Dec-30 13:24 UTC
[PATCH 08/34] asm-generic: smp_store_mb should use smp_mb
asm-generic variant of smp_store_mb() calls mb() which is stronger than implied by both the name and the documentation. smp_store_mb is only used by core kernel code at the moment, so we know no one mis-uses it for an MMIO barrier. Make it call smp_mb consistently before some arch-specific code uses it as such by mistake. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- include/asm-generic/barrier.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h index 538f8d1..987b2e0 100644 --- a/include/asm-generic/barrier.h +++ b/include/asm-generic/barrier.h @@ -93,7 +93,7 @@ #endif /* CONFIG_SMP */ #ifndef smp_store_mb -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) #endif #ifndef smp_mb__before_atomic -- MST
On arm smp_store_mb, read_barrier_depends, smp_read_barrier_depends, smp_store_release, smp_load_acquire, smp_mb__before_atomic and smp_mb__after_atomic match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/arm/include/asm/barrier.h | 23 +---------------------- 1 file changed, 1 insertion(+), 22 deletions(-) diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h index 3ff5642..31152e8 100644 --- a/arch/arm/include/asm/barrier.h +++ b/arch/arm/include/asm/barrier.h @@ -70,28 +70,7 @@ extern void arm_heavy_mb(void); #define smp_wmb() dmb(ishst) #endif -#define smp_store_release(p, v) \ -do { \ - compiletime_assert_atomic_type(*p); \ - smp_mb(); \ - WRITE_ONCE(*p, v); \ -} while (0) - -#define smp_load_acquire(p) \ -({ \ - typeof(*p) ___p1 = READ_ONCE(*p); \ - compiletime_assert_atomic_type(*p); \ - smp_mb(); \ - ___p1; \ -}) - -#define read_barrier_depends() do { } while(0) -#define smp_read_barrier_depends() do { } while(0) - -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) - -#define smp_mb__before_atomic() smp_mb() -#define smp_mb__after_atomic() smp_mb() +#include <asm-generic/barrier.h> #endif /* !__ASSEMBLY__ */ #endif /* __ASM_BARRIER_H */ -- MST
On arm64 nop, read_barrier_depends, smp_read_barrier_depends smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/arm64/include/asm/barrier.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h index 9622eb4..91a43f4 100644 --- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h @@ -91,14 +91,7 @@ do { \ __u.__val; \ }) -#define read_barrier_depends() do { } while(0) -#define smp_read_barrier_depends() do { } while(0) - -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) -#define nop() asm volatile("nop"); - -#define smp_mb__before_atomic() smp_mb() -#define smp_mb__after_atomic() smp_mb() +#include <asm-generic/barrier.h> #endif /* __ASSEMBLY__ */ -- MST
On metag dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends, smp_read_barrier_depends, smp_store_release and smp_load_acquire match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- This is straightforward, but unfortunately untested: I don't have a metag compiler. I would appreciate an ack from someone with a metag compiler. arch/metag/include/asm/barrier.h | 25 ++----------------------- 1 file changed, 2 insertions(+), 23 deletions(-) diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h index 172b7e5..b5b778b 100644 --- a/arch/metag/include/asm/barrier.h +++ b/arch/metag/include/asm/barrier.h @@ -44,9 +44,6 @@ static inline void wr_fence(void) #define rmb() barrier() #define wmb() mb() -#define dma_rmb() rmb() -#define dma_wmb() wmb() - #ifndef CONFIG_SMP #define fence() do { } while (0) #define smp_mb() barrier() @@ -81,27 +78,9 @@ static inline void fence(void) #endif #endif -#define read_barrier_depends() do { } while (0) -#define smp_read_barrier_depends() do { } while (0) - -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) - -#define smp_store_release(p, v) \ -do { \ - compiletime_assert_atomic_type(*p); \ - smp_mb(); \ - WRITE_ONCE(*p, v); \ -} while (0) - -#define smp_load_acquire(p) \ -({ \ - typeof(*p) ___p1 = READ_ONCE(*p); \ - compiletime_assert_atomic_type(*p); \ - smp_mb(); \ - ___p1; \ -}) - #define smp_mb__before_atomic() barrier() #define smp_mb__after_atomic() barrier() +#include <asm-generic/barrier.h> + #endif /* _ASM_METAG_BARRIER_H */ -- MST
On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends, smp_read_barrier_depends, smp_store_release and smp_load_acquire match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/mips/include/asm/barrier.h | 25 ++----------------------- 1 file changed, 2 insertions(+), 23 deletions(-) diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h index 752e0b8..3eac4b9 100644 --- a/arch/mips/include/asm/barrier.h +++ b/arch/mips/include/asm/barrier.h @@ -10,9 +10,6 @@ #include <asm/addrspace.h> -#define read_barrier_depends() do { } while(0) -#define smp_read_barrier_depends() do { } while(0) - #ifdef CONFIG_CPU_HAS_SYNC #define __sync() \ __asm__ __volatile__( \ @@ -87,8 +84,6 @@ #define wmb() fast_wmb() #define rmb() fast_rmb() -#define dma_wmb() fast_wmb() -#define dma_rmb() fast_rmb() #if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP) # ifdef CONFIG_CPU_CAVIUM_OCTEON @@ -112,9 +107,6 @@ #define __WEAK_LLSC_MB " \n" #endif -#define smp_store_mb(var, value) \ - do { WRITE_ONCE(var, value); smp_mb(); } while (0) - #define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory") #ifdef CONFIG_CPU_CAVIUM_OCTEON @@ -129,22 +121,9 @@ #define nudge_writes() mb() #endif -#define smp_store_release(p, v) \ -do { \ - compiletime_assert_atomic_type(*p); \ - smp_mb(); \ - WRITE_ONCE(*p, v); \ -} while (0) - -#define smp_load_acquire(p) \ -({ \ - typeof(*p) ___p1 = READ_ONCE(*p); \ - compiletime_assert_atomic_type(*p); \ - smp_mb(); \ - ___p1; \ -}) - #define smp_mb__before_atomic() smp_mb__before_llsc() #define smp_mb__after_atomic() smp_llsc_mb() +#include <asm-generic/barrier.h> + #endif /* __ASM_BARRIER_H */ -- MST
Michael S. Tsirkin
2015-Dec-30 13:25 UTC
[PATCH 13/34] x86/um: reuse asm-generic/barrier.h
On x86/um CONFIG_SMP is never defined. As a result, several macros match the asm-generic variant exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/x86/um/asm/barrier.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h index 755481f..174781a 100644 --- a/arch/x86/um/asm/barrier.h +++ b/arch/x86/um/asm/barrier.h @@ -36,13 +36,6 @@ #endif /* CONFIG_X86_PPRO_FENCE */ #define dma_wmb() barrier() -#define smp_mb() barrier() -#define smp_rmb() barrier() -#define smp_wmb() barrier() - -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0) - -#define read_barrier_depends() do { } while (0) -#define smp_read_barrier_depends() do { } while (0) +#include <asm-generic/barrier.h> #endif -- MST
As on most architectures, on x86 read_barrier_depends and smp_read_barrier_depends are empty. Drop the local definitions and pull the generic ones from asm-generic/barrier.h instead: they are identical. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/x86/include/asm/barrier.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index 0681d25..cc4c2a7 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -43,9 +43,6 @@ #define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0) #endif /* SMP */ -#define read_barrier_depends() do { } while (0) -#define smp_read_barrier_depends() do { } while (0) - #if defined(CONFIG_X86_PPRO_FENCE) /* @@ -91,4 +88,6 @@ do { \ #define smp_mb__before_atomic() barrier() #define smp_mb__after_atomic() barrier() +#include <asm-generic/barrier.h> + #endif /* _ASM_X86_BARRIER_H */ -- MST
Michael S. Tsirkin
2015-Dec-30 13:25 UTC
[PATCH 15/34] asm-generic: add __smp_XXX wrappers
On !SMP, most architectures define their barriers as compiler barriers. On SMP, most need an actual barrier. Make it possible to remove the code duplication for !SMP by defining a low-level __smp_XXX barriers which do not depend on the value of SMP, then use them from asm-generic conditionally. Besides reducing code duplication, these low level APIs will also be useful for virtualization, where a barrier is sometimes needed even if !SMP since we might be talking to another kernel on the same SMP system. Both virtio and Xen drivers will benefit. The smp_XXX variants should use __smp_XXX ones or barrier() depending on SMP, identically for all architectures. We keep ifndef guards around them for now - once all architectures have been converted to use the generic code, we'll be able to remove these. Suggested-by: Peter Zijlstra <peterz at infradead.org> Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- include/asm-generic/barrier.h | 91 ++++++++++++++++++++++++++++++++++++++----- 1 file changed, 82 insertions(+), 9 deletions(-) diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h index 987b2e0..8752964 100644 --- a/include/asm-generic/barrier.h +++ b/include/asm-generic/barrier.h @@ -54,22 +54,38 @@ #define read_barrier_depends() do { } while (0) #endif +#ifndef __smp_mb +#define __smp_mb() mb() +#endif + +#ifndef __smp_rmb +#define __smp_rmb() rmb() +#endif + +#ifndef __smp_wmb +#define __smp_wmb() wmb() +#endif + +#ifndef __smp_read_barrier_depends +#define __smp_read_barrier_depends() read_barrier_depends() +#endif + #ifdef CONFIG_SMP #ifndef smp_mb -#define smp_mb() mb() +#define smp_mb() __smp_mb() #endif #ifndef smp_rmb -#define smp_rmb() rmb() +#define smp_rmb() __smp_rmb() #endif #ifndef smp_wmb -#define smp_wmb() wmb() +#define smp_wmb() __smp_wmb() #endif #ifndef smp_read_barrier_depends -#define smp_read_barrier_depends() read_barrier_depends() +#define smp_read_barrier_depends() __smp_read_barrier_depends() #endif #else /* !CONFIG_SMP */ @@ -92,23 +108,78 @@ #endif /* CONFIG_SMP */ +#ifndef __smp_store_mb +#define __smp_store_mb(var, value) do { WRITE_ONCE(var, value); __smp_mb(); } while (0) +#endif + +#ifndef __smp_mb__before_atomic +#define __smp_mb__before_atomic() __smp_mb() +#endif + +#ifndef __smp_mb__after_atomic +#define __smp_mb__after_atomic() __smp_mb() +#endif + +#ifndef __smp_store_release +#define __smp_store_release(p, v) \ +do { \ + compiletime_assert_atomic_type(*p); \ + __smp_mb(); \ + WRITE_ONCE(*p, v); \ +} while (0) +#endif + +#ifndef __smp_load_acquire +#define __smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = READ_ONCE(*p); \ + compiletime_assert_atomic_type(*p); \ + __smp_mb(); \ + ___p1; \ +}) +#endif + +#ifdef CONFIG_SMP + +#ifndef smp_store_mb +#define smp_store_mb(var, value) __smp_store_mb(var, value) +#endif + +#ifndef smp_mb__before_atomic +#define smp_mb__before_atomic() __smp_mb__before_atomic() +#endif + +#ifndef smp_mb__after_atomic +#define smp_mb__after_atomic() __smp_mb__after_atomic() +#endif + +#ifndef smp_store_release +#define smp_store_release(p, v) __smp_store_release(p, v) +#endif + +#ifndef smp_load_acquire +#define smp_load_acquire(p) __smp_load_acquire(p) +#endif + +#else /* !CONFIG_SMP */ + #ifndef smp_store_mb -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0) #endif #ifndef smp_mb__before_atomic -#define smp_mb__before_atomic() smp_mb() +#define smp_mb__before_atomic() barrier() #endif #ifndef smp_mb__after_atomic -#define smp_mb__after_atomic() smp_mb() +#define smp_mb__after_atomic() barrier() #endif #ifndef smp_store_release #define smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ - smp_mb(); \ + barrier(); \ WRITE_ONCE(*p, v); \ } while (0) #endif @@ -118,10 +189,12 @@ do { \ ({ \ typeof(*p) ___p1 = READ_ONCE(*p); \ compiletime_assert_atomic_type(*p); \ - smp_mb(); \ + barrier(); \ ___p1; \ }) #endif +#endif + #endif /* !__ASSEMBLY__ */ #endif /* __ASM_GENERIC_BARRIER_H */ -- MST
This defines __smp_XXX barriers for powerpc for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h This reduces the amount of arch-specific boiler-plate code. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/powerpc/include/asm/barrier.h | 24 ++++++++---------------- 1 file changed, 8 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 980ad0c..c0deafc 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -44,19 +44,11 @@ #define dma_rmb() __lwsync() #define dma_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") -#ifdef CONFIG_SMP -#define smp_lwsync() __lwsync() +#define __smp_lwsync() __lwsync() -#define smp_mb() mb() -#define smp_rmb() __lwsync() -#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") -#else -#define smp_lwsync() barrier() - -#define smp_mb() barrier() -#define smp_rmb() barrier() -#define smp_wmb() barrier() -#endif /* CONFIG_SMP */ +#define __smp_mb() mb() +#define __smp_rmb() __lwsync() +#define __smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory") /* * This is a barrier which prevents following instructions from being @@ -67,18 +59,18 @@ #define data_barrier(x) \ asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory"); -#define smp_store_release(p, v) \ +#define __smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ - smp_lwsync(); \ + __smp_lwsync(); \ WRITE_ONCE(*p, v); \ } while (0) -#define smp_load_acquire(p) \ +#define __smp_load_acquire(p) \ ({ \ typeof(*p) ___p1 = READ_ONCE(*p); \ compiletime_assert_atomic_type(*p); \ - smp_lwsync(); \ + __smp_lwsync(); \ ___p1; \ }) -- MST
This defines __smp_XXX barriers for arm64, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Note: arm64 does not support !SMP config, so smp_XXX and __smp_XXX are always equivalent. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/arm64/include/asm/barrier.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h index 91a43f4..dae5c49 100644 --- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h @@ -35,11 +35,11 @@ #define dma_rmb() dmb(oshld) #define dma_wmb() dmb(oshst) -#define smp_mb() dmb(ish) -#define smp_rmb() dmb(ishld) -#define smp_wmb() dmb(ishst) +#define __smp_mb() dmb(ish) +#define __smp_rmb() dmb(ishld) +#define __smp_wmb() dmb(ishst) -#define smp_store_release(p, v) \ +#define __smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ switch (sizeof(*p)) { \ @@ -62,7 +62,7 @@ do { \ } \ } while (0) -#define smp_load_acquire(p) \ +#define __smp_load_acquire(p) \ ({ \ union { typeof(*p) __val; char __c[1]; } __u; \ compiletime_assert_atomic_type(*p); \ -- MST
This defines __smp_XXX barriers for arm, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h This reduces the amount of arch-specific boiler-plate code. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/arm/include/asm/barrier.h | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h index 31152e8..112cc1a 100644 --- a/arch/arm/include/asm/barrier.h +++ b/arch/arm/include/asm/barrier.h @@ -60,15 +60,9 @@ extern void arm_heavy_mb(void); #define dma_wmb() barrier() #endif -#ifndef CONFIG_SMP -#define smp_mb() barrier() -#define smp_rmb() barrier() -#define smp_wmb() barrier() -#else -#define smp_mb() dmb(ish) -#define smp_rmb() smp_mb() -#define smp_wmb() dmb(ishst) -#endif +#define __smp_mb() dmb(ish) +#define __smp_rmb() __smp_mb() +#define __smp_wmb() dmb(ishst) #include <asm-generic/barrier.h> -- MST
This defines __smp_XXX barriers for blackfin, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/blackfin/include/asm/barrier.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/blackfin/include/asm/barrier.h b/arch/blackfin/include/asm/barrier.h index dfb66fe..7cca51c 100644 --- a/arch/blackfin/include/asm/barrier.h +++ b/arch/blackfin/include/asm/barrier.h @@ -78,8 +78,8 @@ #endif /* !CONFIG_SMP */ -#define smp_mb__before_atomic() barrier() -#define smp_mb__after_atomic() barrier() +#define __smp_mb__before_atomic() barrier() +#define __smp_mb__after_atomic() barrier() #include <asm-generic/barrier.h> -- MST
This defines __smp_XXX barriers for ia64, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h This reduces the amount of arch-specific boiler-plate code. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/ia64/include/asm/barrier.h | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h index 2f93348..588f161 100644 --- a/arch/ia64/include/asm/barrier.h +++ b/arch/ia64/include/asm/barrier.h @@ -42,28 +42,24 @@ #define dma_rmb() mb() #define dma_wmb() mb() -#ifdef CONFIG_SMP -# define smp_mb() mb() -#else -# define smp_mb() barrier() -#endif +# define __smp_mb() mb() -#define smp_mb__before_atomic() barrier() -#define smp_mb__after_atomic() barrier() +#define __smp_mb__before_atomic() barrier() +#define __smp_mb__after_atomic() barrier() /* * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no * need for asm trickery! */ -#define smp_store_release(p, v) \ +#define __smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ barrier(); \ WRITE_ONCE(*p, v); \ } while (0) -#define smp_load_acquire(p) \ +#define __smp_load_acquire(p) \ ({ \ typeof(*p) ___p1 = READ_ONCE(*p); \ compiletime_assert_atomic_type(*p); \ -- MST
This defines __smp_XXX barriers for metag, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Note: as __smp_XX macros should not depend on CONFIG_SMP, they can not use the existing fence() macro since that is defined differently between SMP and !SMP. For this reason, this patch introduces a wrapper metag_fence() that doesn't depend on CONFIG_SMP. fence() is then defined using that, depending on CONFIG_SMP. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/metag/include/asm/barrier.h | 32 +++++++++++++++----------------- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h index b5b778b..84880c9 100644 --- a/arch/metag/include/asm/barrier.h +++ b/arch/metag/include/asm/barrier.h @@ -44,13 +44,6 @@ static inline void wr_fence(void) #define rmb() barrier() #define wmb() mb() -#ifndef CONFIG_SMP -#define fence() do { } while (0) -#define smp_mb() barrier() -#define smp_rmb() barrier() -#define smp_wmb() barrier() -#else - #ifdef CONFIG_METAG_SMP_WRITE_REORDERING /* * Write to the atomic memory unlock system event register (command 0). This is @@ -60,26 +53,31 @@ static inline void wr_fence(void) * incoherence). It is therefore ineffective if used after and on the same * thread as a write. */ -static inline void fence(void) +static inline void metag_fence(void) { volatile int *flushptr = (volatile int *) LINSYSEVENT_WR_ATOMIC_UNLOCK; barrier(); *flushptr = 0; barrier(); } -#define smp_mb() fence() -#define smp_rmb() fence() -#define smp_wmb() barrier() +#define __smp_mb() metag_fence() +#define __smp_rmb() metag_fence() +#define __smp_wmb() barrier() #else -#define fence() do { } while (0) -#define smp_mb() barrier() -#define smp_rmb() barrier() -#define smp_wmb() barrier() +#define metag_fence() do { } while (0) +#define __smp_mb() barrier() +#define __smp_rmb() barrier() +#define __smp_wmb() barrier() #endif + +#ifdef CONFIG_SMP +#define fence() metag_fence() +#else +#define fence() do { } while (0) #endif -#define smp_mb__before_atomic() barrier() -#define smp_mb__after_atomic() barrier() +#define __smp_mb__before_atomic() barrier() +#define __smp_mb__after_atomic() barrier() #include <asm-generic/barrier.h> -- MST
This defines __smp_XXX barriers for mips, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Note: the only exception is smp_mb__before_llsc which is mips-specific. We define both the __smp_mb__before_llsc variant (for use in asm/barriers.h) and smp_mb__before_llsc (for use elsewhere on this architecture). Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/mips/include/asm/barrier.h | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h index 3eac4b9..d296633 100644 --- a/arch/mips/include/asm/barrier.h +++ b/arch/mips/include/asm/barrier.h @@ -85,20 +85,20 @@ #define wmb() fast_wmb() #define rmb() fast_rmb() -#if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP) +#if defined(CONFIG_WEAK_ORDERING) # ifdef CONFIG_CPU_CAVIUM_OCTEON -# define smp_mb() __sync() -# define smp_rmb() barrier() -# define smp_wmb() __syncw() +# define __smp_mb() __sync() +# define __smp_rmb() barrier() +# define __smp_wmb() __syncw() # else -# define smp_mb() __asm__ __volatile__("sync" : : :"memory") -# define smp_rmb() __asm__ __volatile__("sync" : : :"memory") -# define smp_wmb() __asm__ __volatile__("sync" : : :"memory") +# define __smp_mb() __asm__ __volatile__("sync" : : :"memory") +# define __smp_rmb() __asm__ __volatile__("sync" : : :"memory") +# define __smp_wmb() __asm__ __volatile__("sync" : : :"memory") # endif #else -#define smp_mb() barrier() -#define smp_rmb() barrier() -#define smp_wmb() barrier() +#define __smp_mb() barrier() +#define __smp_rmb() barrier() +#define __smp_wmb() barrier() #endif #if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP) @@ -111,6 +111,7 @@ #ifdef CONFIG_CPU_CAVIUM_OCTEON #define smp_mb__before_llsc() smp_wmb() +#define __smp_mb__before_llsc() __smp_wmb() /* Cause previous writes to become visible on all CPUs as soon as possible */ #define nudge_writes() __asm__ __volatile__(".set push\n\t" \ ".set arch=octeon\n\t" \ @@ -118,11 +119,12 @@ ".set pop" : : : "memory") #else #define smp_mb__before_llsc() smp_llsc_mb() +#define __smp_mb__before_llsc() smp_llsc_mb() #define nudge_writes() mb() #endif -#define smp_mb__before_atomic() smp_mb__before_llsc() -#define smp_mb__after_atomic() smp_llsc_mb() +#define __smp_mb__before_atomic() __smp_mb__before_llsc() +#define __smp_mb__after_atomic() smp_llsc_mb() #include <asm-generic/barrier.h> -- MST
This defines __smp_XXX barriers for s390, for use by virtualization. Some smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Note: smp_mb, smp_rmb and smp_wmb are defined as full barriers unconditionally on this architecture. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/s390/include/asm/barrier.h | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h index c358c31..fbd25b2 100644 --- a/arch/s390/include/asm/barrier.h +++ b/arch/s390/include/asm/barrier.h @@ -26,18 +26,21 @@ #define wmb() barrier() #define dma_rmb() mb() #define dma_wmb() mb() -#define smp_mb() mb() -#define smp_rmb() rmb() -#define smp_wmb() wmb() - -#define smp_store_release(p, v) \ +#define __smp_mb() mb() +#define __smp_rmb() rmb() +#define __smp_wmb() wmb() +#define smp_mb() __smp_mb() +#define smp_rmb() __smp_rmb() +#define smp_wmb() __smp_wmb() + +#define __smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ barrier(); \ WRITE_ONCE(*p, v); \ } while (0) -#define smp_load_acquire(p) \ +#define __smp_load_acquire(p) \ ({ \ typeof(*p) ___p1 = READ_ONCE(*p); \ compiletime_assert_atomic_type(*p); \ -- MST
Michael S. Tsirkin
2015-Dec-30 13:25 UTC
[PATCH 24/34] sh: define __smp_XXX, fix smp_store_mb for !SMP
sh variant of smp_store_mb() calls xchg() on !SMP which is stronger than implied by both the name and the documentation. define __smp_store_mb instead: code in asm-generic/barrier.h will then define smp_store_mb correctly depending on CONFIG_SMP. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/sh/include/asm/barrier.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h index bf91037..f887c64 100644 --- a/arch/sh/include/asm/barrier.h +++ b/arch/sh/include/asm/barrier.h @@ -32,7 +32,8 @@ #define ctrl_barrier() __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop") #endif -#define smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0) +#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0) +#define smp_store_mb(var, value) __smp_store_mb(var, value) #include <asm-generic/barrier.h> -- MST
This defines __smp_XXX barriers for sparc, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/sparc/include/asm/barrier_64.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h index 26c3f72..c9f6ee6 100644 --- a/arch/sparc/include/asm/barrier_64.h +++ b/arch/sparc/include/asm/barrier_64.h @@ -37,14 +37,14 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \ #define rmb() __asm__ __volatile__("":::"memory") #define wmb() __asm__ __volatile__("":::"memory") -#define smp_store_release(p, v) \ +#define __smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ barrier(); \ WRITE_ONCE(*p, v); \ } while (0) -#define smp_load_acquire(p) \ +#define __smp_load_acquire(p) \ ({ \ typeof(*p) ___p1 = READ_ONCE(*p); \ compiletime_assert_atomic_type(*p); \ @@ -52,8 +52,8 @@ do { \ ___p1; \ }) -#define smp_mb__before_atomic() barrier() -#define smp_mb__after_atomic() barrier() +#define __smp_mb__before_atomic() barrier() +#define __smp_mb__after_atomic() barrier() #include <asm-generic/barrier.h> -- MST
This defines __smp_XXX barriers for tile, for use by virtualization. Some smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Note: for 32 bit, keep smp_mb__after_atomic around since it's faster than the generic implementation. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/tile/include/asm/barrier.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/tile/include/asm/barrier.h b/arch/tile/include/asm/barrier.h index 96a42ae..d552228 100644 --- a/arch/tile/include/asm/barrier.h +++ b/arch/tile/include/asm/barrier.h @@ -79,11 +79,12 @@ mb_incoherent(void) * But after the word is updated, the routine issues an "mf" before returning, * and since it's a function call, we don't even need a compiler barrier. */ -#define smp_mb__before_atomic() smp_mb() -#define smp_mb__after_atomic() do { } while (0) +#define __smp_mb__before_atomic() __smp_mb() +#define __smp_mb__after_atomic() do { } while (0) +#define smp_mb__after_atomic() __smp_mb__after_atomic() #else /* 64 bit */ -#define smp_mb__before_atomic() smp_mb() -#define smp_mb__after_atomic() smp_mb() +#define __smp_mb__before_atomic() __smp_mb() +#define __smp_mb__after_atomic() __smp_mb() #endif #include <asm-generic/barrier.h> -- MST
This defines __smp_XXX barriers for xtensa, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/xtensa/include/asm/barrier.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/xtensa/include/asm/barrier.h b/arch/xtensa/include/asm/barrier.h index 5b88774..956596e 100644 --- a/arch/xtensa/include/asm/barrier.h +++ b/arch/xtensa/include/asm/barrier.h @@ -13,8 +13,8 @@ #define rmb() barrier() #define wmb() mb() -#define smp_mb__before_atomic() barrier() -#define smp_mb__after_atomic() barrier() +#define __smp_mb__before_atomic() barrier() +#define __smp_mb__after_atomic() barrier() #include <asm-generic/barrier.h> -- MST
This defines __smp_XXX barriers for x86, for use by virtualization. smp_XXX barriers are removed as they are defined correctly by asm-generic/barriers.h Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- arch/x86/include/asm/barrier.h | 31 ++++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index cc4c2a7..a584e1c 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -31,17 +31,10 @@ #endif #define dma_wmb() barrier() -#ifdef CONFIG_SMP -#define smp_mb() mb() -#define smp_rmb() dma_rmb() -#define smp_wmb() barrier() -#define smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0) -#else /* !SMP */ -#define smp_mb() barrier() -#define smp_rmb() barrier() -#define smp_wmb() barrier() -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0) -#endif /* SMP */ +#define __smp_mb() mb() +#define __smp_rmb() dma_rmb() +#define __smp_wmb() barrier() +#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0) #if defined(CONFIG_X86_PPRO_FENCE) @@ -50,31 +43,31 @@ * model and we should fall back to full barriers. */ -#define smp_store_release(p, v) \ +#define __smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ - smp_mb(); \ + __smp_mb(); \ WRITE_ONCE(*p, v); \ } while (0) -#define smp_load_acquire(p) \ +#define __smp_load_acquire(p) \ ({ \ typeof(*p) ___p1 = READ_ONCE(*p); \ compiletime_assert_atomic_type(*p); \ - smp_mb(); \ + __smp_mb(); \ ___p1; \ }) #else /* regular x86 TSO memory ordering */ -#define smp_store_release(p, v) \ +#define __smp_store_release(p, v) \ do { \ compiletime_assert_atomic_type(*p); \ barrier(); \ WRITE_ONCE(*p, v); \ } while (0) -#define smp_load_acquire(p) \ +#define __smp_load_acquire(p) \ ({ \ typeof(*p) ___p1 = READ_ONCE(*p); \ compiletime_assert_atomic_type(*p); \ @@ -85,8 +78,8 @@ do { \ #endif /* Atomic operations are already serializing on x86 */ -#define smp_mb__before_atomic() barrier() -#define smp_mb__after_atomic() barrier() +#define __smp_mb__before_atomic() barrier() +#define __smp_mb__after_atomic() barrier() #include <asm-generic/barrier.h> -- MST
Michael S. Tsirkin
2015-Dec-30 13:26 UTC
[PATCH 29/34] Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
This reverts commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9. While that commit optimizes !CONFIG_SMP, it mixes up DMA and SMP concepts, making the code hard to figure out. A better way to optimize this is with the new __smp_XXX barriers. As a first step, go back to full rmb/wmb barriers for !SMP. We switch to __smp_XXX barriers in the next patch. Cc: Peter Zijlstra <peterz at infradead.org> Cc: Alexander Duyck <alexander.duyck at gmail.com> Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- include/linux/virtio_ring.h | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h index 8e50888..67e06fe 100644 --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -21,20 +21,19 @@ * actually quite cheap. */ +#ifdef CONFIG_SMP static inline void virtio_mb(bool weak_barriers) { -#ifdef CONFIG_SMP if (weak_barriers) smp_mb(); else -#endif mb(); } static inline void virtio_rmb(bool weak_barriers) { if (weak_barriers) - dma_rmb(); + smp_rmb(); else rmb(); } @@ -42,10 +41,26 @@ static inline void virtio_rmb(bool weak_barriers) static inline void virtio_wmb(bool weak_barriers) { if (weak_barriers) - dma_wmb(); + smp_wmb(); else wmb(); } +#else +static inline void virtio_mb(bool weak_barriers) +{ + mb(); +} + +static inline void virtio_rmb(bool weak_barriers) +{ + rmb(); +} + +static inline void virtio_wmb(bool weak_barriers) +{ + wmb(); +} +#endif struct virtio_device; struct virtqueue; -- MST
Michael S. Tsirkin
2015-Dec-30 13:26 UTC
[PATCH 30/34] virtio_ring: update weak barriers to use __smp_XXX
virtio ring uses smp_wmb on SMP and wmb on !SMP, the reason for the later being that it might be talking to another kernel on the same SMP machine. This is exactly what __smp_XXX barriers do, so switch to these instead of homegrown ifdef hacks. Cc: Peter Zijlstra <peterz at infradead.org> Cc: Alexander Duyck <alexander.duyck at gmail.com> Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- include/linux/virtio_ring.h | 25 ++++--------------------- 1 file changed, 4 insertions(+), 21 deletions(-) diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h index 67e06fe..71176be 100644 --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -12,7 +12,7 @@ * anyone care? * * For virtio_pci on SMP, we don't need to order with respect to MMIO - * accesses through relaxed memory I/O windows, so smp_mb() et al are + * accesses through relaxed memory I/O windows, so __smp_mb() et al are * sufficient. * * For using virtio to talk to real devices (eg. other heterogeneous @@ -21,11 +21,10 @@ * actually quite cheap. */ -#ifdef CONFIG_SMP static inline void virtio_mb(bool weak_barriers) { if (weak_barriers) - smp_mb(); + __smp_mb(); else mb(); } @@ -33,7 +32,7 @@ static inline void virtio_mb(bool weak_barriers) static inline void virtio_rmb(bool weak_barriers) { if (weak_barriers) - smp_rmb(); + __smp_rmb(); else rmb(); } @@ -41,26 +40,10 @@ static inline void virtio_rmb(bool weak_barriers) static inline void virtio_wmb(bool weak_barriers) { if (weak_barriers) - smp_wmb(); + __smp_wmb(); else wmb(); } -#else -static inline void virtio_mb(bool weak_barriers) -{ - mb(); -} - -static inline void virtio_rmb(bool weak_barriers) -{ - rmb(); -} - -static inline void virtio_wmb(bool weak_barriers) -{ - wmb(); -} -#endif struct virtio_device; struct virtqueue; -- MST
drivers/xen/xenbus/xenbus_comms.c uses full memory barriers to communicate with the other side. For guests compiled with CONFIG_SMP, smp_wmb and smp_mb would be sufficient, so mb() and wmb() here are only needed if a non-SMP guest runs on an SMP host. Switch to __smp_XXX barriers which serve this exact purpose. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- This is straight-forward, but untested. I can either merge this patchset through my tree if this is acked, or defer this and merge the patchset first, and xen bits through xen tree afterwards. Pls let me know. drivers/xen/xenbus/xenbus_comms.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c index fdb0f33..09b17c7 100644 --- a/drivers/xen/xenbus/xenbus_comms.c +++ b/drivers/xen/xenbus/xenbus_comms.c @@ -123,14 +123,14 @@ int xb_write(const void *data, unsigned len) avail = len; /* Must write data /after/ reading the consumer index. */ - mb(); + __smp_mb(); memcpy(dst, data, avail); data += avail; len -= avail; /* Other side must not see new producer until data is there. */ - wmb(); + __smp_wmb(); intf->req_prod += avail; /* Implies mb(): other side will see the updated producer. */ @@ -180,14 +180,14 @@ int xb_read(void *data, unsigned len) avail = len; /* Must read data /after/ reading the producer index. */ - rmb(); + __smp_rmb(); memcpy(data, src, avail); data += avail; len -= avail; /* Other side must not see free space until we've copied out */ - mb(); + __smp_mb(); intf->rsp_cons += avail; pr_debug("Finished read of %i bytes (%i to go)\n", avail, len); -- MST
include/xen/interface/io/ring.h uses full memory barriers to communicate with the other side. For guests compiled with CONFIG_SMP, smp_wmb and smp_mb would be sufficient, so mb() and wmb() here are only needed if a non-SMP guest runs on an SMP host. Switch to __smp_XXX barriers which serve this exact purpose. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- include/xen/interface/io/ring.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h index 7dc685b..46dfc65 100644 --- a/include/xen/interface/io/ring.h +++ b/include/xen/interface/io/ring.h @@ -208,12 +208,12 @@ struct __name##_back_ring { \ #define RING_PUSH_REQUESTS(_r) do { \ - wmb(); /* back sees requests /before/ updated producer index */ \ + __smp_wmb(); /* back sees requests /before/ updated producer index */ \ (_r)->sring->req_prod = (_r)->req_prod_pvt; \ } while (0) #define RING_PUSH_RESPONSES(_r) do { \ - wmb(); /* front sees responses /before/ updated producer index */ \ + __smp_wmb(); /* front sees responses /before/ updated producer index */ \ (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \ } while (0) @@ -250,9 +250,9 @@ struct __name##_back_ring { \ #define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \ RING_IDX __old = (_r)->sring->req_prod; \ RING_IDX __new = (_r)->req_prod_pvt; \ - wmb(); /* back sees requests /before/ updated producer index */ \ + __smp_wmb(); /* back sees requests /before/ updated producer index */ \ (_r)->sring->req_prod = __new; \ - mb(); /* back sees new requests /before/ we check req_event */ \ + __smp_mb(); /* back sees new requests /before/ we check req_event */ \ (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \ (RING_IDX)(__new - __old)); \ } while (0) @@ -260,9 +260,9 @@ struct __name##_back_ring { \ #define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \ RING_IDX __old = (_r)->sring->rsp_prod; \ RING_IDX __new = (_r)->rsp_prod_pvt; \ - wmb(); /* front sees responses /before/ updated producer index */ \ + __smp_wmb(); /* front sees responses /before/ updated producer index */ \ (_r)->sring->rsp_prod = __new; \ - mb(); /* front sees new responses /before/ we check rsp_event */ \ + __smp_mb(); /* front sees new responses /before/ we check rsp_event */ \ (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \ (RING_IDX)(__new - __old)); \ } while (0) @@ -271,7 +271,7 @@ struct __name##_back_ring { \ (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ if (_work_to_do) break; \ (_r)->sring->req_event = (_r)->req_cons + 1; \ - mb(); \ + __smp_mb(); \ (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ } while (0) @@ -279,7 +279,7 @@ struct __name##_back_ring { \ (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ if (_work_to_do) break; \ (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \ - mb(); \ + __smp_mb(); \ (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ } while (0) -- MST
Michael S. Tsirkin
2015-Dec-30 13:26 UTC
[PATCH 33/34] virtio: use __smp_load_acquire/__smp_store_release
virtio ring entries have exactly the acquire/release semantics: - reading used index acquires a ring entry from host - updating the available index releases it to host Thus when using weak barriers (as most people do), __smp_load_acquire and __smp_store_release will do exactly the right thing to synchronize with the host. In fact, QEMU already uses __atomic_thread_fence(__ATOMIC_ACQUIRE) and __atomic_thread_fence(__ATOMIC_RELEASE); Documentation/circular-buffers.txt suggests smp_load_acquire and smp_store_release for head and tail updates. Since we have to worry about strong barrier users, let's add APIs to wrap these, and use in virtio_ring.c: in case of the string barriers, we fall back on rmb/wmb as previously. It is tempting to use this in vhost as well, this needs a bit more work to make acquire/release macros work on __user pointers. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- include/linux/virtio_ring.h | 26 ++++++++++++++++++++++++++ drivers/virtio/virtio_ring.c | 20 ++++++++++---------- 2 files changed, 36 insertions(+), 10 deletions(-) diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h index 71176be..528cf81 100644 --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -45,6 +45,32 @@ static inline void virtio_wmb(bool weak_barriers) wmb(); } +/* a load + acquire barrier, but only guaranteed to order reads */ +static inline __virtio16 virtio_load_acquire_rmb(bool weak_barriers, + __virtio16 *p) +{ + if (weak_barriers) + return __smp_load_acquire(p); + else { + __virtio16 v = READ_ONCE(*p); + rmb(); + return v; + } +} + +/* a release barrier + store, but only guaranteed to order writes */ +static inline void virtio_store_release_wmb(bool weak_barriers, + __virtio16 *p, __virtio16 v) +{ + if (weak_barriers) + __smp_store_release(p, v); + else { + wmb(); + WRITE_ONCE(*p, v); + return; + } +} + struct virtio_device; struct virtqueue; diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index ee663c4..edc01d0 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -244,11 +244,11 @@ static inline int virtqueue_add(struct virtqueue *_vq, avail = vq->avail_idx_shadow & (vq->vring.num - 1); vq->vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head); + vq->avail_idx_shadow++; /* Descriptors and available array need to be set before we expose the * new available array entries. */ - virtio_wmb(vq->weak_barriers); - vq->avail_idx_shadow++; - vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow); + virtio_store_release_wmb(vq->weak_barriers, &vq->vring.avail->idx, + cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow)); vq->num_added++; pr_debug("Added buffer head %i to %p\n", head, vq); @@ -453,9 +453,10 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head) vq->vq.num_free++; } -static inline bool more_used(const struct vring_virtqueue *vq) +static inline +bool more_used(const struct vring_virtqueue *vq, __virtio16 used_idx) { - return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx); + return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, used_idx); } /** @@ -488,15 +489,14 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len) return NULL; } - if (!more_used(vq)) { + /* Only get used array entries after they have been exposed by host. */ + if (!more_used(vq, virtio_load_acquire_rmb(vq->weak_barriers, + &vq->vring.used->idx))) { pr_debug("No more buffers in queue\n"); END_USE(vq); return NULL; } - /* Only get used array entries after they have been exposed by host. */ - virtio_rmb(vq->weak_barriers); - last_used = (vq->last_used_idx & (vq->vring.num - 1)); i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id); *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len); @@ -704,7 +704,7 @@ irqreturn_t vring_interrupt(int irq, void *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); - if (!more_used(vq)) { + if (!more_used(vq, vq->vring.used->idx)) { pr_debug("virtqueue interrupt with no work for %p\n", vq); return IRQ_NONE; } -- MST
We need a full barrier after writing out event index, using __smp_store_mb there seems better than open-coding. As usual, we need a wrapper to account for strong barriers. It's tempting to use this in vhost as well, for that, we'll need a variant of smp_store_mb that works on __user pointers. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- include/linux/virtio_ring.h | 12 ++++++++++++ drivers/virtio/virtio_ring.c | 15 +++++++++------ 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h index 528cf81..8ff0c25 100644 --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -45,6 +45,18 @@ static inline void virtio_wmb(bool weak_barriers) wmb(); } +static inline void virtio_store_mb(bool weak_barriers, + __virtio16 *p, __virtio16 v) +{ + if (weak_barriers) + __smp_store_mb(*p, v); + else + { + WRITE_ONCE(*p, v); + mb(); + } +} + /* a load + acquire barrier, but only guaranteed to order reads */ static inline __virtio16 virtio_load_acquire_rmb(bool weak_barriers, __virtio16 *p) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index edc01d0..9befa32 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -517,10 +517,10 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len) /* If we expect an interrupt for the next entry, tell host * by writing event index and flush out the write before * the read in the next get_buf call. */ - if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) { - vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx); - virtio_mb(vq->weak_barriers); - } + if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) + virtio_store_mb(vq->weak_barriers, + &vring_used_event(&vq->vring), + cpu_to_virtio16(_vq->vdev, vq->last_used_idx)); #ifdef DEBUG vq->last_add_time_valid = false; @@ -653,8 +653,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq) } /* TODO: tune this threshold */ bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4; - vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs); - virtio_mb(vq->weak_barriers); + + virtio_store_mb(vq->weak_barriers, + &vring_used_event(&vq->vring), + cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs)); + if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) { END_USE(vq); return false; -- MST
Arnd Bergmann
2015-Dec-30 13:44 UTC
[PATCH 08/34] asm-generic: smp_store_mb should use smp_mb
On Wednesday 30 December 2015 15:24:47 Michael S. Tsirkin wrote:> asm-generic variant of smp_store_mb() calls mb() which is stronger > than implied by both the name and the documentation. > > smp_store_mb is only used by core kernel code at the moment, so > we know no one mis-uses it for an MMIO barrier. > Make it call smp_mb consistently before some arch-specific > code uses it as such by mistake. > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > --- > include/asm-generic/barrier.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h > index 538f8d1..987b2e0 100644 > --- a/include/asm-generic/barrier.h > +++ b/include/asm-generic/barrier.h > @@ -93,7 +93,7 @@ > #endif /* CONFIG_SMP */ > > #ifndef smp_store_mb > -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); mb(); } while (0) > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0) > #endif > > #ifndef smp_mb__before_atomic >The same patch is already in the tip tree scheduled for 4.5 as d5a73cadf3fd ("lcoking/barriers, arch: Use smp barriers in smp_store_release()"). I think you can drop your version. arnd
Arnd Bergmann
2015-Dec-30 13:49 UTC
[PATCH 00/34] arch: barrier cleanup + __smp_XXX barriers for virt
On Wednesday 30 December 2015 15:24:12 Michael S. Tsirkin wrote:> This is really trying to cleanup some virt code, as suggested by Peter, who > said > > You could of course go fix that instead of mutilating things into > > sort-of functional state. > > This work is needed for virtio, so it's probably easiest to > merge it through my tree - is this fine by everyone? > Arnd, if you agree, could you ack this please?I've looked over all patches now, and everything seems fine (I had one small comment about a duplicate patch). It's a very nice cleanup, both for using the asm-generic file, and for the virtualization users. Please add my "Acked-by: Arnd Bergmann <arnd at arndb.de>" when merging the series through your tree. Arnd
> asm-generic/barrier.h defines a nop() macro. > To be able to use this header on ia64, we shouldn't > call local functions/variables nop(). > > There's one instance where this breaks on ia64: > rename the function to iosapic_nop to avoid the conflict.Acked-by: Tony Luck <tony.luck at intel.com>
> On ia64 smp_rmb, smp_wmb, read_barrier_depends, smp_read_barrier_depends > and smp_store_mb() match the asm-generic variants exactly. Drop the > local definitions and pull in asm-generic/barrier.h instead. > > This is in preparation to refactoring this code area.Acked-by: Tony Luck <tony.luck at intel.com>
> This defines __smp_XXX barriers for ia64, > for use by virtualization. > > smp_XXX barriers are removed as they are > defined correctly by asm-generic/barriers.h > > This reduces the amount of arch-specific boiler-plate code.Acked-by: Tony Luck <tony.luck at intel.com>
I find it a little bit irritating when I receive a patch directly addressed to me to review, but I don't see the "00/34" cover letter. I want to see what this series is about, and what the motiviation for this change is. I'm also highly frustrated, in general, with mix-and-match CC: lists for patch series. Just put everyone relevant in the CC: for the entire series. That way I can easily inspect other parts of the series to see infrastructure buildup, and examples of other conversions or versions of the change. Thanks.
David Miller
2015-Dec-30 20:46 UTC
[PATCH 00/34] arch: barrier cleanup + __smp_xxx barriers for virt
From: "Michael S. Tsirkin" <mst at redhat.com> Date: Wed, 30 Dec 2015 14:58:19 +0200> -. Patch 1 documents the __smp APIs, and explains why they are > useful for virtIf virt is doing things like interacting with descriptors that are shared with a (potentially SMP) host, why don't we just annotate those specific cases? The other memory barriers in the kernel do not matter for SMP'ness when build UP.
Michael S. Tsirkin
2015-Dec-30 21:36 UTC
[PATCH 00/34] arch: barrier cleanup + __smp_xxx barriers for virt
On Wed, Dec 30, 2015 at 03:46:46PM -0500, David Miller wrote:> From: "Michael S. Tsirkin" <mst at redhat.com> > Date: Wed, 30 Dec 2015 14:58:19 +0200 > > > -. Patch 1 documents the __smp APIs, and explains why they are > > useful for virt > > If virt is doing things like interacting with descriptors that are > shared with a (potentially SMP) host, why don't we just annotate those > specific cases?Using a bunch of per-arch ifdefs in virtio? That's fundamentally what we have now. But basically the rework reduces the LOC count in kernel anyway by moving all ifdef CONFIG_SMP hacks into asm-generic. So why not let virt benefit? Or do you mean wrappers for __smp_XXX that explicitly say they are for talking to host? E.g. pv_mb() pv_rmb() etc. That sounds very reasonable to me. __smp_XXX things then become an implementation detail.> The other memory barriers in the kernel do not matter for SMP'ness > when build UP.
Michael S. Tsirkin
2015-Dec-30 21:45 UTC
[PATCH 00/34] arch: barrier cleanup + __smp_xxx barriers for virt
On Wed, Dec 30, 2015 at 02:49:53PM +0100, Arnd Bergmann wrote:> On Wednesday 30 December 2015 15:24:12 Michael S. Tsirkin wrote: > > This is really trying to cleanup some virt code, as suggested by Peter, who > > said > > > You could of course go fix that instead of mutilating things into > > > sort-of functional state. > > > > This work is needed for virtio, so it's probably easiest to > > merge it through my tree - is this fine by everyone? > > Arnd, if you agree, could you ack this please? > > I've looked over all patches now, and everything seems fine (I had one > small comment about a duplicate patch). It's a very nice cleanup, both > for using the asm-generic file, and for the virtualization users. > > Please add my "Acked-by: Arnd Bergmann <arnd at arndb.de>" when merging the > series through your tree. > > ArndTo all patches in the series?
Stefano Stabellini
2016-Jan-04 11:08 UTC
[PATCH 01/34] Documentation/memory-barriers.txt: document __smb_mb()
On Wed, 30 Dec 2015, Michael S. Tsirkin wrote:> Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > --- > Documentation/memory-barriers.txt | 33 ++++++++++++++++++++++++++++----- > 1 file changed, 28 insertions(+), 5 deletions(-) > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt > index aef9487..a20f7ef 100644 > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1655,17 +1655,18 @@ macro is a good place to start looking. > SMP memory barriers are reduced to compiler barriers on uniprocessor compiled > systems because it is assumed that a CPU will appear to be self-consistent, > and will order overlapping accesses correctly with respect to itself. > +However, see the subsection on "Virtual Machine Guests" below. > > [!] Note that SMP memory barriers _must_ be used to control the ordering of > references to shared memory on SMP systems, though the use of locking instead > is sufficient. > > Mandatory barriers should not be used to control SMP effects, since mandatory > -barriers unnecessarily impose overhead on UP systems. They may, however, be > -used to control MMIO effects on accesses through relaxed memory I/O windows. > -These are required even on non-SMP systems as they affect the order in which > -memory operations appear to a device by prohibiting both the compiler and the > -CPU from reordering them. > +barriers impose unnecessary overhead on both SMP and UP systems. They may, > +however, be used to control MMIO effects on accesses through relaxed memory I/O > +windows. These barriers are required even on non-SMP systems as they affect > +the order in which memory operations appear to a device by prohibiting both the > +compiler and the CPU from reordering them. > > > There are some more advanced barrier functions: > @@ -2948,6 +2949,28 @@ The Alpha defines the Linux kernel's memory barrier model. > > See the subsection on "Cache Coherency" above. > > +VIRTUAL MACHINE GUESTS > +------------------- > + > +Guests running within virtual machines might be affected by > +SMP effects even if the guest itself is compiled within^ without> +SMP support. > + > +This is an artifact of interfacing with an SMP host while > +running an UP kernel. > + > +Using mandatory barriers for this use-case would be possible > +but is often suboptimal. > + > +To handle this case optimally, low-level __smp_mb() etc macros are available. > +These have the same effect as smp_mb() etc when SMP is enabled, but generate > +identical code for SMP and non-SMP systems. For example, virtual machine guests > +should use __smp_mb() rather than smp_mb() when synchronizing against a > +(possibly SMP) host. > + > +These are equivalent to smp_mb() etc counterparts in all other respects, > +in particular, they do not control MMIO effects: to control > +MMIO effects, use mandatory barriers. > > ===========> EXAMPLE USES > -- > MST >
On Wed, 30 Dec 2015, Michael S. Tsirkin wrote:> drivers/xen/xenbus/xenbus_comms.c uses > full memory barriers to communicate with the other side. > > For guests compiled with CONFIG_SMP, smp_wmb and smp_mb > would be sufficient, so mb() and wmb() here are only needed if > a non-SMP guest runs on an SMP host. > > Switch to __smp_XXX barriers which serve this exact purpose. > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com>Reviewed-by: Stefano Stabellini <stefano.stabellini at eu.citrix.com>> This is straight-forward, but untested. > I can either merge this patchset through my tree if this is > acked, or defer this and merge the patchset first, > and xen bits through xen tree afterwards. > > Pls let me know.I think it can go via your tree. However you have missed a few Xen source files which need the same mb/__smb_mb conversions: drivers/xen/grant-table.c drivers/xen/evtchn.c drivers/xen/events/events_fifo.c drivers/xen/xen-scsiback.c drivers/xen/tmem.c drivers/xen/xen-pciback/pci_stub.c drivers/xen/xen-pciback/pciback_ops.c> drivers/xen/xenbus/xenbus_comms.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c > index fdb0f33..09b17c7 100644 > --- a/drivers/xen/xenbus/xenbus_comms.c > +++ b/drivers/xen/xenbus/xenbus_comms.c > @@ -123,14 +123,14 @@ int xb_write(const void *data, unsigned len) > avail = len; > > /* Must write data /after/ reading the consumer index. */ > - mb(); > + __smp_mb(); > > memcpy(dst, data, avail); > data += avail; > len -= avail; > > /* Other side must not see new producer until data is there. */ > - wmb(); > + __smp_wmb(); > intf->req_prod += avail; > > /* Implies mb(): other side will see the updated producer. */ > @@ -180,14 +180,14 @@ int xb_read(void *data, unsigned len) > avail = len; > > /* Must read data /after/ reading the producer index. */ > - rmb(); > + __smp_rmb(); > > memcpy(data, src, avail); > data += avail; > len -= avail; > > /* Other side must not see free space until we've copied out */ > - mb(); > + __smp_mb(); > intf->rsp_cons += avail; > > pr_debug("Finished read of %i bytes (%i to go)\n", avail, len); > -- > MST >
On Wed, 30 Dec 2015, Michael S. Tsirkin wrote:> include/xen/interface/io/ring.h uses > full memory barriers to communicate with the other side. > > For guests compiled with CONFIG_SMP, smp_wmb and smp_mb > would be sufficient, so mb() and wmb() here are only needed if > a non-SMP guest runs on an SMP host. > > Switch to __smp_XXX barriers which serve this exact purpose. > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com>Reviewed-by: Stefano Stabellini <stefano.stabellini at eu.citrix.com>> include/xen/interface/io/ring.h | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h > index 7dc685b..46dfc65 100644 > --- a/include/xen/interface/io/ring.h > +++ b/include/xen/interface/io/ring.h > @@ -208,12 +208,12 @@ struct __name##_back_ring { \ > > > #define RING_PUSH_REQUESTS(_r) do { \ > - wmb(); /* back sees requests /before/ updated producer index */ \ > + __smp_wmb(); /* back sees requests /before/ updated producer index */ \ > (_r)->sring->req_prod = (_r)->req_prod_pvt; \ > } while (0) > > #define RING_PUSH_RESPONSES(_r) do { \ > - wmb(); /* front sees responses /before/ updated producer index */ \ > + __smp_wmb(); /* front sees responses /before/ updated producer index */ \ > (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt; \ > } while (0) > > @@ -250,9 +250,9 @@ struct __name##_back_ring { \ > #define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do { \ > RING_IDX __old = (_r)->sring->req_prod; \ > RING_IDX __new = (_r)->req_prod_pvt; \ > - wmb(); /* back sees requests /before/ updated producer index */ \ > + __smp_wmb(); /* back sees requests /before/ updated producer index */ \ > (_r)->sring->req_prod = __new; \ > - mb(); /* back sees new requests /before/ we check req_event */ \ > + __smp_mb(); /* back sees new requests /before/ we check req_event */ \ > (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) < \ > (RING_IDX)(__new - __old)); \ > } while (0) > @@ -260,9 +260,9 @@ struct __name##_back_ring { \ > #define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do { \ > RING_IDX __old = (_r)->sring->rsp_prod; \ > RING_IDX __new = (_r)->rsp_prod_pvt; \ > - wmb(); /* front sees responses /before/ updated producer index */ \ > + __smp_wmb(); /* front sees responses /before/ updated producer index */ \ > (_r)->sring->rsp_prod = __new; \ > - mb(); /* front sees new responses /before/ we check rsp_event */ \ > + __smp_mb(); /* front sees new responses /before/ we check rsp_event */ \ > (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) < \ > (RING_IDX)(__new - __old)); \ > } while (0) > @@ -271,7 +271,7 @@ struct __name##_back_ring { \ > (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ > if (_work_to_do) break; \ > (_r)->sring->req_event = (_r)->req_cons + 1; \ > - mb(); \ > + __smp_mb(); \ > (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r); \ > } while (0) > > @@ -279,7 +279,7 @@ struct __name##_back_ring { \ > (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ > if (_work_to_do) break; \ > (_r)->sring->rsp_event = (_r)->rsp_cons + 1; \ > - mb(); \ > + __smp_mb(); \ > (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r); \ > } while (0) > > -- > MST >
Possibly Parallel Threads
- [PATCH 01/34] Documentation/memory-barriers.txt: document __smb_mb()
- [PATCH 01/34] Documentation/memory-barriers.txt: document __smb_mb()
- [PATCH 00/34] arch: barrier cleanup + __smp_XXX barriers for virt
- [PATCH 00/34] arch: barrier cleanup + __smp_XXX barriers for virt
- [PATCH v2 00/34] arch: barrier cleanup + barriers for virt