Hi all,

Although [smp_]read_barrier_depends() became part of READ_ONCE() in
commit 76ebbe78f739 ("locking/barriers: Add implicit
smp_read_barrier_depends() to READ_ONCE()"), it still limps on in the
Linux memory model with the sinister hope of attracting innocent new
users so that it becomes impossible to remove altogether.

Let's strike before it's too late: there's only one user outside of
arch/alpha/ and that lives in the vhost code which I don't think you can
actually compile for Alpha. Even if you could, it appears to be
redundant.

The rest of these patches remove any mention of the barrier from
Documentation and comments, as well as removing its use from the Alpha
backend and finally dropping it from the memory model completely.

After this series, there are still two places where it is mentioned:

  1. The Korean translation of memory-barriers.txt. I'd appreciate some
     help fixing this because it's not entirely a straightforward
     deletion.

  2. The virtio vring tests under tools/. This is userspace code so I'm
     not too fussed about it.

There's a chunk of header reshuffling at the start of the series so
that READ_ONCE() can sensibly be overridden by arch code.

Feedback welcome.

Cheers,

Will

Cc: Yunjae Lee <lyj7694 at gmail.com>
Cc: SeongJae Park <sj38.park at gmail.com>
Cc: "Paul E. McKenney" <paulmck at kernel.org>
Cc: Josh Triplett <josh at joshtriplett.org>
Cc: Matt Turner <mattst88 at gmail.com>
Cc: Ivan Kokshaysky <ink at jurassic.park.msu.ru>
Cc: Richard Henderson <rth at twiddle.net>
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Alan Stern <stern at rowland.harvard.edu>
Cc: Michael Ellerman <mpe at ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst at redhat.com>
Cc: Jason Wang <jasowang at redhat.com>
Cc: Arnd Bergmann <arnd at arndb.de>
Cc: Joe Perches <joe at perches.com>
Cc: Boqun Feng <boqun.feng at gmail.com>
Cc: linux-alpha at vger.kernel.org
Cc: virtualization at lists.linux-foundation.org

--->8

Will Deacon (13):
  compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
  READ_ONCE: Undefine internal __READ_ONCE_SIZE macro after use
  READ_ONCE: Allow __READ_ONCE_SIZE cases to be overridden by the
    architecture
  vhost: Remove redundant use of read_barrier_depends() barrier
  alpha: Override READ_ONCE() with barriered implementation
  READ_ONCE: Remove smp_read_barrier_depends() invocation
  alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()
  locking/barriers: Remove definitions for [smp_]read_barrier_depends()
  Documentation/barriers: Remove references to
    [smp_]read_barrier_depends()
  tools/memory-model: Remove smp_read_barrier_depends() from informal
    doc
  powerpc: Remove comment about read_barrier_depends()
  include/linux: Remove smp_read_barrier_depends() from comments
  checkpatch: Remove checks relating to [smp_]read_barrier_depends()

 .../RCU/Design/Requirements/Requirements.html |  11 +-
 Documentation/memory-barriers.txt             | 156 +-----------------
 arch/alpha/include/asm/atomic.h               |  16 +-
 arch/alpha/include/asm/barrier.h              |  61 +------
 arch/alpha/include/asm/pgtable.h              |  10 +-
 arch/alpha/include/asm/rwonce.h               |  22 +++
 arch/powerpc/include/asm/barrier.h            |   2 -
 drivers/vhost/vhost.c                         |   5 -
 include/asm-generic/Kbuild                    |   1 +
 include/asm-generic/barrier.h                 |  17 -
 include/asm-generic/rwonce.h                  | 131 +++++++++++++++
 include/linux/compiler.h                      | 114 +------------
 include/linux/compiler_attributes.h           |  12 ++
 include/linux/percpu-refcount.h               |   2 +-
 include/linux/ptr_ring.h                      |   2 +-
 mm/memory.c                                   |   2 +-
 scripts/checkpatch.pl                         |   9 +-
 .../Documentation/explanation.txt             |  26 ++-
 18 files changed, 217 insertions(+), 382 deletions(-)
 create mode 100644 arch/alpha/include/asm/rwonce.h
 create mode 100644 include/asm-generic/rwonce.h

--
2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 01/13] compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
In preparation for allowing architectures to define their own implementation of the 'READ_ONCE()' macro, move the generic '{READ,WRITE}_ONCE()' definitions out of the unwieldy 'linux/compiler.h' and into a new 'rwonce.h' header under 'asm-generic'. Signed-off-by: Will Deacon <will at kernel.org> --- include/asm-generic/Kbuild | 1 + include/asm-generic/rwonce.h | 110 +++++++++++++++++++++++++++ include/linux/compiler.h | 114 +--------------------------- include/linux/compiler_attributes.h | 12 +++ 4 files changed, 125 insertions(+), 112 deletions(-) create mode 100644 include/asm-generic/rwonce.h diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index adff14fcb8e4..2a7a1e94d4cb 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -4,4 +4,5 @@ # (This file is not included when SRCARCH=um since UML borrows several # asm headers from the host architecutre.) +mandatory-y += rwonce.h mandatory-y += simd.h diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h new file mode 100644 index 000000000000..d189455ae038 --- /dev/null +++ b/include/asm-generic/rwonce.h @@ -0,0 +1,110 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Prevent the compiler from merging or refetching reads or writes. The + * compiler is also forbidden from reordering successive instances of + * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some + * particular ordering. One way to make the compiler aware of ordering is to + * put the two invocations of READ_ONCE or WRITE_ONCE in different C + * statements. + * + * These two macros will also work on aggregate data types like structs or + * unions. If the size of the accessed data type exceeds the word size of + * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will + * fall back to memcpy(). There's at least two memcpy()s: one for the + * __builtin_memcpy() and then one for the macro doing the copy of variable + * - '__u' allocated on the stack. + * + * Their two major use cases are: (1) Mediating communication between + * process-level code and irq/NMI handlers, all running on the same CPU, + * and (2) Ensuring that the compiler does not fold, spindle, or otherwise + * mutilate accesses that either do not require ordering or that interact + * with an explicit memory barrier or atomic instruction that provides the + * required ordering. 
+ */ +#ifndef __ASM_GENERIC_RWONCE_H +#define __ASM_GENERIC_RWONCE_H + +#ifndef __ASSEMBLY__ + +#include <linux/compiler_attributes.h> +#include <linux/kasan-checks.h> + +#include <uapi/linux/types.h> + +#include <asm/barrier.h> + +#define __READ_ONCE_SIZE \ +({ \ + switch (size) { \ + case 1: *(__u8 *)res = *(volatile __u8 *)p; break; \ + case 2: *(__u16 *)res = *(volatile __u16 *)p; break; \ + case 4: *(__u32 *)res = *(volatile __u32 *)p; break; \ + case 8: *(__u64 *)res = *(volatile __u64 *)p; break; \ + default: \ + barrier(); \ + __builtin_memcpy((void *)res, (const void *)p, size); \ + barrier(); \ + } \ +}) + +static __always_inline +void __read_once_size(const volatile void *p, void *res, int size) +{ + __READ_ONCE_SIZE; +} + +static __no_kasan_or_inline +void __read_once_size_nocheck(const volatile void *p, void *res, int size) +{ + __READ_ONCE_SIZE; +} + +static __always_inline void __write_once_size(volatile void *p, void *res, int size) +{ + switch (size) { + case 1: *(volatile __u8 *)p = *(__u8 *)res; break; + case 2: *(volatile __u16 *)p = *(__u16 *)res; break; + case 4: *(volatile __u32 *)p = *(__u32 *)res; break; + case 8: *(volatile __u64 *)p = *(__u64 *)res; break; + default: + barrier(); + __builtin_memcpy((void *)p, (const void *)res, size); + barrier(); + } +} + +#define __READ_ONCE(x, check) \ +({ \ + union { typeof(x) __val; char __c[1]; } __u; \ + if (check) \ + __read_once_size(&(x), __u.__c, sizeof(x)); \ + else \ + __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \ + smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \ + __u.__val; \ +}) +#define READ_ONCE(x) __READ_ONCE(x, 1) + +/* + * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need + * to hide memory access from KASAN. + */ +#define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0) + +static __no_kasan_or_inline +unsigned long read_word_at_a_time(const void *addr) +{ + kasan_check_read(addr, 1); + return *(unsigned long *)addr; +} + +#define WRITE_ONCE(x, val) \ +({ \ + union { typeof(x) __val; char __c[1]; } __u = \ + { .__val = (__force typeof(x)) (val) }; \ + __write_once_size(&(x), __u.__c, sizeof(x)); \ + __u.__val; \ +}) + +#endif /* __ASSEMBLY__ */ +#endif /* __ASM_GENERIC_RWONCE_H */ diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 5e88e7e33abe..6de09d2f42ad 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -177,118 +177,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, # define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__) #endif -#include <uapi/linux/types.h> - -#define __READ_ONCE_SIZE \ -({ \ - switch (size) { \ - case 1: *(__u8 *)res = *(volatile __u8 *)p; break; \ - case 2: *(__u16 *)res = *(volatile __u16 *)p; break; \ - case 4: *(__u32 *)res = *(volatile __u32 *)p; break; \ - case 8: *(__u64 *)res = *(volatile __u64 *)p; break; \ - default: \ - barrier(); \ - __builtin_memcpy((void *)res, (const void *)p, size); \ - barrier(); \ - } \ -}) - -static __always_inline -void __read_once_size(const volatile void *p, void *res, int size) -{ - __READ_ONCE_SIZE; -} - -#ifdef CONFIG_KASAN -/* - * We can't declare function 'inline' because __no_sanitize_address confilcts - * with inlining. Attempt to inline it may cause a build failure. - * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368 - * '__maybe_unused' allows us to avoid defined-but-not-used warnings. 
- */ -# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused -#else -# define __no_kasan_or_inline __always_inline -#endif - -static __no_kasan_or_inline -void __read_once_size_nocheck(const volatile void *p, void *res, int size) -{ - __READ_ONCE_SIZE; -} - -static __always_inline void __write_once_size(volatile void *p, void *res, int size) -{ - switch (size) { - case 1: *(volatile __u8 *)p = *(__u8 *)res; break; - case 2: *(volatile __u16 *)p = *(__u16 *)res; break; - case 4: *(volatile __u32 *)p = *(__u32 *)res; break; - case 8: *(volatile __u64 *)p = *(__u64 *)res; break; - default: - barrier(); - __builtin_memcpy((void *)p, (const void *)res, size); - barrier(); - } -} - -/* - * Prevent the compiler from merging or refetching reads or writes. The - * compiler is also forbidden from reordering successive instances of - * READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some - * particular ordering. One way to make the compiler aware of ordering is to - * put the two invocations of READ_ONCE or WRITE_ONCE in different C - * statements. - * - * These two macros will also work on aggregate data types like structs or - * unions. If the size of the accessed data type exceeds the word size of - * the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will - * fall back to memcpy(). There's at least two memcpy()s: one for the - * __builtin_memcpy() and then one for the macro doing the copy of variable - * - '__u' allocated on the stack. - * - * Their two major use cases are: (1) Mediating communication between - * process-level code and irq/NMI handlers, all running on the same CPU, - * and (2) Ensuring that the compiler does not fold, spindle, or otherwise - * mutilate accesses that either do not require ordering or that interact - * with an explicit memory barrier or atomic instruction that provides the - * required ordering. - */ -#include <asm/barrier.h> -#include <linux/kasan-checks.h> - -#define __READ_ONCE(x, check) \ -({ \ - union { typeof(x) __val; char __c[1]; } __u; \ - if (check) \ - __read_once_size(&(x), __u.__c, sizeof(x)); \ - else \ - __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \ - smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \ - __u.__val; \ -}) -#define READ_ONCE(x) __READ_ONCE(x, 1) - -/* - * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need - * to hide memory access from KASAN. - */ -#define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0) - -static __no_kasan_or_inline -unsigned long read_word_at_a_time(const void *addr) -{ - kasan_check_read(addr, 1); - return *(unsigned long *)addr; -} - -#define WRITE_ONCE(x, val) \ -({ \ - union { typeof(x) __val; char __c[1]; } __u = \ - { .__val = (__force typeof(x)) (val) }; \ - __write_once_size(&(x), __u.__c, sizeof(x)); \ - __u.__val; \ -}) - #endif /* __KERNEL__ */ /* @@ -356,4 +244,6 @@ static inline void *offset_to_ptr(const int *off) /* &a[0] degrades to a pointer: a different type from an array */ #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0])) +#include <asm/rwonce.h> + #endif /* __LINUX_COMPILER_H */ diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index cdf016596659..7cb92286de9f 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -270,4 +270,16 @@ */ #define __weak __attribute__((__weak__)) +#ifdef CONFIG_KASAN +/* + * We can't declare function 'inline' because __no_sanitize_address confilcts + * with inlining. 
Attempt to inline it may cause a build failure. + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368 + * '__maybe_unused' allows us to avoid defined-but-not-used warnings. + */ +# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused +#else +# define __no_kasan_or_inline __always_inline +#endif + #endif /* __LINUX_COMPILER_ATTRIBUTES_H */ -- 2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 02/13] READ_ONCE: Undefine internal __READ_ONCE_SIZE macro after use
The '__READ_ONCE_SIZE()' macro is only used as part of building
'READ_ONCE()', so undefine it once we don't need it anymore.

Signed-off-by: Will Deacon <will at kernel.org>
---
 include/asm-generic/rwonce.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
index d189455ae038..abf326634ecd 100644
--- a/include/asm-generic/rwonce.h
+++ b/include/asm-generic/rwonce.h
@@ -59,6 +59,8 @@ void __read_once_size_nocheck(const volatile void *p, void *res, int size)
 	__READ_ONCE_SIZE;
 }
 
+#undef __READ_ONCE_SIZE
+
 static __always_inline void __write_once_size(volatile void *p, void *res, int size)
 {
 	switch (size) {
--
2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 03/13] READ_ONCE: Allow __READ_ONCE_SIZE cases to be overridden by the architecture
The meat and potatoes of 'READ_ONCE()' is defined by the
'__READ_ONCE_SIZE()' macro, which uses volatile casts in an attempt to
avoid tearing of byte, halfword, word and double-word accesses. Allow
this to be overridden by the architecture code in the case that things
like memory barriers are also required.

Signed-off-by: Will Deacon <will at kernel.org>
---
 include/asm-generic/rwonce.h | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
index abf326634ecd..2c2ac0948c94 100644
--- a/include/asm-generic/rwonce.h
+++ b/include/asm-generic/rwonce.h
@@ -33,13 +33,29 @@
 
 #include <asm/barrier.h>
 
+#ifndef __read_once_size_1
+#define __read_once_size_1(p)	(*(volatile __u8 *)(p))
+#endif
+
+#ifndef __read_once_size_2
+#define __read_once_size_2(p)	(*(volatile __u16 *)(p))
+#endif
+
+#ifndef __read_once_size_4
+#define __read_once_size_4(p)	(*(volatile __u32 *)(p))
+#endif
+
+#ifndef __read_once_size_8
+#define __read_once_size_8(p)	(*(volatile __u64 *)(p))
+#endif
+
 #define __READ_ONCE_SIZE						\
 ({									\
 	switch (size) {							\
-	case 1: *(__u8 *)res = *(volatile __u8 *)p; break;		\
-	case 2: *(__u16 *)res = *(volatile __u16 *)p; break;		\
-	case 4: *(__u32 *)res = *(volatile __u32 *)p; break;		\
-	case 8: *(__u64 *)res = *(volatile __u64 *)p; break;		\
+	case 1: *(__u8 *)res = __read_once_size_1(p); break;		\
+	case 2: *(__u16 *)res = __read_once_size_2(p); break;		\
+	case 4: *(__u32 *)res = __read_once_size_4(p); break;		\
+	case 8: *(__u64 *)res = __read_once_size_8(p); break;		\
 	default:							\
 		barrier();						\
 		__builtin_memcpy((void *)res, (const void *)p, size);	\
@@ -59,6 +75,10 @@ void __read_once_size_nocheck(const volatile void *p, void *res, int size)
 	__READ_ONCE_SIZE;
 }
 
+#undef __read_once_size_1
+#undef __read_once_size_2
+#undef __read_once_size_4
+#undef __read_once_size_8
 #undef __READ_ONCE_SIZE
 
 static __always_inline void __write_once_size(volatile void *p, void *res, int size)
--
2.24.0.rc1.363.gb1bccd3e3d-goog
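For readers who don't want to chase the macro soup, the default
(non-overridden) path for a 4-byte object boils down to a single
volatile load into a stack union. The helper below is a hand-expanded
illustration only, not literal preprocessor output and not part of the
patch:

/*
 * Roughly what READ_ONCE(x) does for a 4-byte 'x' with the default
 * __read_once_size_4() definition above: one volatile load into a
 * union allocated on the stack. Hand-expanded sketch only.
 */
static inline __u32 read_once_u32_sketch(const __u32 *p)
{
	union { __u32 __val; char __c[1]; } __u;

	*(__u32 *)__u.__c = *(volatile __u32 *)p;	/* __read_once_size_4(p) */
	return __u.__val;
}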
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 04/13] vhost: Remove redundant use of read_barrier_depends() barrier
Since commit 76ebbe78f739 ("locking/barriers: Add implicit
smp_read_barrier_depends() to READ_ONCE()"), there is no need to use
'smp_read_barrier_depends()' outside of the Alpha architecture code.

Unfortunately, there is precisely >one< user in the vhost code, and
there isn't an obvious 'READ_ONCE()' access making the barrier
redundant. However, on closer inspection (thanks, Jason), it appears
that vring synchronisation between the producer and consumer occurs via
the 'avail_idx' field, which is followed up by an 'rmb()' in
'vhost_get_vq_desc()', making the 'read_barrier_depends()' redundant on
Alpha.

Jason says:

 | I'm also confused about the barrier here, basically in driver side
 | we did:
 |
 | 1) allocate pages
 | 2) store pages in indirect->addr
 | 3) smp_wmb()
 | 4) increase the avail idx (somehow a tail pointer of vring)
 |
 | in vhost we did:
 |
 | 1) read avail idx
 | 2) smp_rmb()
 | 3) read indirect->addr
 | 4) read from indirect->addr
 |
 | It looks to me even the data dependency barrier is not necessary
 | since we have rmb() which is sufficient for us to the correct
 | indirect->addr and driver are not expected to do any writing to
 | indirect->addr after avail idx is increased

Remove the redundant barrier invocation.

Suggested-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Will Deacon <will at kernel.org>
---
 drivers/vhost/vhost.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 36ca2cf419bf..865bc91b783c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2128,11 +2128,6 @@ static int get_indirect(struct vhost_virtqueue *vq,
 		return ret;
 	}
 	iov_iter_init(&from, READ, vq->indirect, ret, len);
-
-	/* We will use the result as an address to read from, so most
-	 * architectures only need a compiler barrier here. */
-	read_barrier_depends();
-
 	count = len / sizeof desc;
 	/* Buffers are chained via a 16 bit next field, so
 	 * we can have at most 2^16 of these. */
--
2.24.0.rc1.363.gb1bccd3e3d-goog
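The ordering argument quoted above can be compressed into the following
two-CPU sketch; the types and field names are simplified stand-ins
rather than the real vring/vhost structures:

/* Guest driver (producer) side, mirroring Jason's steps 2-4. */
struct ring_sketch {
	void *addr;			/* stands in for indirect->addr	*/
	unsigned short avail_idx;	/* published tail index		*/
};

static void producer_publish(struct ring_sketch *r, void *pages)
{
	r->addr = pages;				/* 2) store pages	*/
	smp_wmb();					/* 3) order vs. idx	*/
	WRITE_ONCE(r->avail_idx, r->avail_idx + 1);	/* 4) bump avail idx	*/
}

/* Host (vhost) consumer side, mirroring steps 1-4. */
static void *consumer_fetch(struct ring_sketch *r, unsigned short last_idx)
{
	if (READ_ONCE(r->avail_idx) == last_idx)	/* 1) read avail idx	*/
		return NULL;
	smp_rmb();					/* 2) pairs with smp_wmb() */
	return r->addr;					/* 3)/4) safe to use	*/
}

The consumer-side smp_rmb() already orders the read of avail_idx against
the later read of the descriptor address, which is why the
read_barrier_depends() removed by this patch adds nothing.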
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 05/13] alpha: Override READ_ONCE() with barriered implementation
Rather then relying on the core code to use 'smp_read_barrier_depends()' as part of the 'READ_ONCE()' definition, instead override 'READ_ONCE()' in the Alpha code so that it is treated the same way as 'smp_load_acquire()'. Signed-off-by: Will Deacon <will at kernel.org> --- arch/alpha/include/asm/barrier.h | 61 ++++---------------------------- arch/alpha/include/asm/rwonce.h | 22 ++++++++++++ 2 files changed, 29 insertions(+), 54 deletions(-) create mode 100644 arch/alpha/include/asm/rwonce.h diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h index 92ec486a4f9e..1f6abe2d1392 100644 --- a/arch/alpha/include/asm/barrier.h +++ b/arch/alpha/include/asm/barrier.h @@ -2,64 +2,17 @@ #ifndef __BARRIER_H #define __BARRIER_H -#include <asm/compiler.h> - #define mb() __asm__ __volatile__("mb": : :"memory") #define rmb() __asm__ __volatile__("mb": : :"memory") #define wmb() __asm__ __volatile__("wmb": : :"memory") -/** - * read_barrier_depends - Flush all pending reads that subsequents reads - * depend on. - * - * No data-dependent reads from memory-like regions are ever reordered - * over this barrier. All reads preceding this primitive are guaranteed - * to access memory (but not necessarily other CPUs' caches) before any - * reads following this primitive that depend on the data return by - * any of the preceding reads. This primitive is much lighter weight than - * rmb() on most CPUs, and is never heavier weight than is - * rmb(). - * - * These ordering constraints are respected by both the local CPU - * and the compiler. - * - * Ordering is not guaranteed by anything other than these primitives, - * not even by data dependencies. See the documentation for - * memory_barrier() for examples and URLs to more information. - * - * For example, the following code would force ordering (the initial - * value of "a" is zero, "b" is one, and "p" is "&a"): - * - * <programlisting> - * CPU 0 CPU 1 - * - * b = 2; - * memory_barrier(); - * p = &b; q = p; - * read_barrier_depends(); - * d = *q; - * </programlisting> - * - * because the read of "*q" depends on the read of "p" and these - * two reads are separated by a read_barrier_depends(). However, - * the following code, with the same initial values for "a" and "b": - * - * <programlisting> - * CPU 0 CPU 1 - * - * a = 2; - * memory_barrier(); - * b = 3; y = b; - * read_barrier_depends(); - * x = a; - * </programlisting> - * - * does not enforce ordering, since there is no data dependency between - * the read of "a" and the read of "b". Therefore, on some CPUs, such - * as Alpha, "y" could be set to 3 and "x" to 0. Use rmb() - * in cases like this where there are no data dependencies. - */ -#define read_barrier_depends() __asm__ __volatile__("mb": : :"memory") +#define __smp_load_acquire(p) \ +({ \ + typeof(*p) ___p1 = (*(volatile typeof(*p) *)(p)); \ + compiletime_assert_atomic_type(*p); \ + mb(); \ + ___p1; \ +}) #ifdef CONFIG_SMP #define __ASM_SMP_MB "\tmb\n" diff --git a/arch/alpha/include/asm/rwonce.h b/arch/alpha/include/asm/rwonce.h new file mode 100644 index 000000000000..ef5601352b55 --- /dev/null +++ b/arch/alpha/include/asm/rwonce.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2019 Google LLC. + */ +#ifndef __ASM_RWONCE_H +#define __ASM_RWONCE_H + +#include <asm/barrier.h> + +/* + * Alpha is apparently daft enough to reorder address-dependent loads + * on some CPU implementations. Knock some common sense into it with + * a memory barrier in READ_ONCE(). 
+ */ +#define __read_once_size_1(p) __smp_load_acquire((u8 *)(p)) +#define __read_once_size_2(p) __smp_load_acquire((u16 *)(p)) +#define __read_once_size_4(p) __smp_load_acquire((u32 *)(p)) +#define __read_once_size_8(p) __smp_load_acquire((u64 *)(p)) + +#include <asm-generic/rwonce.h> + +#endif /* __ASM_RWONCE_H */ -- 2.24.0.rc1.363.gb1bccd3e3d-goog
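In other words, with this override in place a READ_ONCE() of a 32-bit
quantity on Alpha behaves roughly like the hand-expanded helper below
(illustrative only; mb() is the Alpha full barrier from barrier.h):

/*
 * Hand-expanded view of READ_ONCE(*p) for a __u32 pointer on Alpha
 * after this patch: a volatile load followed by a memory barrier,
 * courtesy of __smp_load_acquire(). Not literal macro output.
 */
static inline __u32 alpha_read_once_u32_sketch(const volatile __u32 *p)
{
	__u32 val = *p;		/* the volatile load			*/

	mb();			/* order against later dependent accesses */
	return val;
}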
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 06/13] READ_ONCE: Remove smp_read_barrier_depends() invocation
Alpha overrides '__read_once_size_n()' directly, so there's no need to
use 'smp_read_barrier_depends()' in the core code.

Signed-off-by: Will Deacon <will at kernel.org>
---
 include/asm-generic/rwonce.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
index 2c2ac0948c94..2e3289268a89 100644
--- a/include/asm-generic/rwonce.h
+++ b/include/asm-generic/rwonce.h
@@ -102,7 +102,6 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
 		__read_once_size(&(x), __u.__c, sizeof(x));		\
 	else								\
 		__read_once_size_nocheck(&(x), __u.__c, sizeof(x));	\
-	smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
 	__u.__val;							\
 })
 #define READ_ONCE(x) __READ_ONCE(x, 1)
--
2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 07/13] alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()
In preparation for removing 'smp_read_barrier_depends()' altogether, move the Alpha code over to using 'smp_rmb()' and 'smp_mb()' directly. Signed-off-by: Will Deacon <will at kernel.org> --- arch/alpha/include/asm/atomic.h | 16 ++++++++-------- arch/alpha/include/asm/pgtable.h | 10 +++++----- mm/memory.c | 2 +- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/arch/alpha/include/asm/atomic.h b/arch/alpha/include/asm/atomic.h index 2144530d1428..2f8f7e54792f 100644 --- a/arch/alpha/include/asm/atomic.h +++ b/arch/alpha/include/asm/atomic.h @@ -16,10 +16,10 @@ /* * To ensure dependency ordering is preserved for the _relaxed and - * _release atomics, an smp_read_barrier_depends() is unconditionally - * inserted into the _relaxed variants, which are used to build the - * barriered versions. Avoid redundant back-to-back fences in the - * _acquire and _fence versions. + * _release atomics, an smp_mb() is unconditionally inserted into the + * _relaxed variants, which are used to build the barriered versions. + * Avoid redundant back-to-back fences in the _acquire and _fence + * versions. */ #define __atomic_acquire_fence() #define __atomic_post_full_fence() @@ -70,7 +70,7 @@ static inline int atomic_##op##_return_relaxed(int i, atomic_t *v) \ ".previous" \ :"=&r" (temp), "=m" (v->counter), "=&r" (result) \ :"Ir" (i), "m" (v->counter) : "memory"); \ - smp_read_barrier_depends(); \ + smp_mb(); \ return result; \ } @@ -88,7 +88,7 @@ static inline int atomic_fetch_##op##_relaxed(int i, atomic_t *v) \ ".previous" \ :"=&r" (temp), "=m" (v->counter), "=&r" (result) \ :"Ir" (i), "m" (v->counter) : "memory"); \ - smp_read_barrier_depends(); \ + smp_mb(); \ return result; \ } @@ -123,7 +123,7 @@ static __inline__ s64 atomic64_##op##_return_relaxed(s64 i, atomic64_t * v) \ ".previous" \ :"=&r" (temp), "=m" (v->counter), "=&r" (result) \ :"Ir" (i), "m" (v->counter) : "memory"); \ - smp_read_barrier_depends(); \ + smp_mb(); \ return result; \ } @@ -141,7 +141,7 @@ static __inline__ s64 atomic64_fetch_##op##_relaxed(s64 i, atomic64_t * v) \ ".previous" \ :"=&r" (temp), "=m" (v->counter), "=&r" (result) \ :"Ir" (i), "m" (v->counter) : "memory"); \ - smp_read_barrier_depends(); \ + smp_mb(); \ return result; \ } diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h index 065b57f408c3..b807793646c7 100644 --- a/arch/alpha/include/asm/pgtable.h +++ b/arch/alpha/include/asm/pgtable.h @@ -288,9 +288,9 @@ extern inline pte_t pte_mkspecial(pte_t pte) { return pte; } #define pgd_offset(mm, address) ((mm)->pgd+pgd_index(address)) /* - * The smp_read_barrier_depends() in the following functions are required to - * order the load of *dir (the pointer in the top level page table) with any - * subsequent load of the returned pmd_t *ret (ret is data dependent on *dir). + * The smp_rmb() in the following functions are required to order the load of + * *dir (the pointer in the top level page table) with any subsequent load of + * the returned pmd_t *ret (ret is data dependent on *dir). * * If this ordering is not enforced, the CPU might load an older value of * *ret, which may be uninitialized data. 
See mm/memory.c:__pte_alloc for @@ -304,7 +304,7 @@ extern inline pte_t pte_mkspecial(pte_t pte) { return pte; } extern inline pmd_t * pmd_offset(pgd_t * dir, unsigned long address) { pmd_t *ret = (pmd_t *) pgd_page_vaddr(*dir) + ((address >> PMD_SHIFT) & (PTRS_PER_PAGE - 1)); - smp_read_barrier_depends(); /* see above */ + smp_rmb(); /* see above */ return ret; } @@ -313,7 +313,7 @@ extern inline pte_t * pte_offset_kernel(pmd_t * dir, unsigned long address) { pte_t *ret = (pte_t *) pmd_page_vaddr(*dir) + ((address >> PAGE_SHIFT) & (PTRS_PER_PAGE - 1)); - smp_read_barrier_depends(); /* see above */ + smp_rmb(); /* see above */ return ret; } diff --git a/mm/memory.c b/mm/memory.c index b1ca51a079f2..c4a74e6d2c5c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -420,7 +420,7 @@ int __pte_alloc(struct mm_struct *mm, pmd_t *pmd) * of a chain of data-dependent loads, meaning most CPUs (alpha * being the notable exception) will already guarantee loads are * seen in-order. See the alpha page table accessors for the - * smp_read_barrier_depends() barriers in page table walking code. + * smp_rmb() barriers in page table walking code. */ smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */ -- 2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 08/13] locking/barriers: Remove definitions for [smp_]read_barrier_depends()
There are no remaining users of '[smp_]read_barrier_depends()', so remove it from the generic implementation of 'barrier.h'. Signed-off-by: Will Deacon <will at kernel.org> --- include/asm-generic/barrier.h | 17 ----------------- 1 file changed, 17 deletions(-) diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h index 85b28eb80b11..4b2a60f738b2 100644 --- a/include/asm-generic/barrier.h +++ b/include/asm-generic/barrier.h @@ -46,10 +46,6 @@ #define dma_wmb() wmb() #endif -#ifndef read_barrier_depends -#define read_barrier_depends() do { } while (0) -#endif - #ifndef __smp_mb #define __smp_mb() mb() #endif @@ -62,10 +58,6 @@ #define __smp_wmb() wmb() #endif -#ifndef __smp_read_barrier_depends -#define __smp_read_barrier_depends() read_barrier_depends() -#endif - #ifdef CONFIG_SMP #ifndef smp_mb @@ -80,10 +72,6 @@ #define smp_wmb() __smp_wmb() #endif -#ifndef smp_read_barrier_depends -#define smp_read_barrier_depends() __smp_read_barrier_depends() -#endif - #else /* !CONFIG_SMP */ #ifndef smp_mb @@ -98,10 +86,6 @@ #define smp_wmb() barrier() #endif -#ifndef smp_read_barrier_depends -#define smp_read_barrier_depends() do { } while (0) -#endif - #endif /* CONFIG_SMP */ #ifndef __smp_store_mb @@ -196,7 +180,6 @@ do { \ #define virt_mb() __smp_mb() #define virt_rmb() __smp_rmb() #define virt_wmb() __smp_wmb() -#define virt_read_barrier_depends() __smp_read_barrier_depends() #define virt_store_mb(var, value) __smp_store_mb(var, value) #define virt_mb__before_atomic() __smp_mb__before_atomic() #define virt_mb__after_atomic() __smp_mb__after_atomic() -- 2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 09/13] Documentation/barriers: Remove references to [smp_]read_barrier_depends()
The '[smp_]read_barrier_depends()' barrier macros no longer exist as part of the Linux memory model, so remove all references to them from the Documentation/ directory. Although this is fairly mechanical on the whole, we drop the "CACHE COHERENCY" section entirely from 'memory-barriers.txt' as it doesn't make any sense now that the dependency barriers have been removed. Signed-off-by: Will Deacon <will at kernel.org> --- .../RCU/Design/Requirements/Requirements.html | 11 +- Documentation/memory-barriers.txt | 156 +----------------- 2 files changed, 13 insertions(+), 154 deletions(-) diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 467251f7fef6..4b8357def8ff 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -577,12 +577,11 @@ guarantee to also allow data elements to be removed from RCU-protected linked data structures, again without disrupting RCU readers. <p> -This guarantee was only partially premeditated. -DYNIX/ptx used an explicit memory barrier for publication, but had nothing -resembling <tt>rcu_dereference()</tt> for subscription, nor did it -have anything resembling the <tt>smp_read_barrier_depends()</tt> -that was later subsumed into <tt>rcu_dereference()</tt> and later -still into <tt>READ_ONCE()</tt>. +This guarantee was only partially premeditated. DYNIX/ptx used an explicit +memory barrier for publication, but had nothing resembling +<tt>rcu_dereference()</tt> for subscription, nor did it have anything +resembling the dependency-ordering barrier that was later subsumed into +<tt>rcu_dereference()</tt> and later still into <tt>READ_ONCE()</tt>. The need for these operations made itself known quite suddenly at a late-1990s meeting with the DEC Alpha architects, back in the days when DEC was still a free-standing company. diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 1adbb8a371c7..16b6aa7c5fe4 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -556,12 +556,12 @@ There are certain things that the Linux kernel memory barriers do not guarantee: DATA DEPENDENCY BARRIERS (HISTORICAL) ------------------------------------- -As of v4.15 of the Linux kernel, an smp_read_barrier_depends() was -added to READ_ONCE(), which means that about the only people who -need to pay attention to this section are those working on DEC Alpha -architecture-specific code and those working on READ_ONCE() itself. -For those who need it, and for those who are interested in the history, -here is the story of data-dependency barriers. +As of v4.15 of the Linux kernel, an smp_mb() was added to READ_ONCE() for +DEC Alpha, which means that about the only people who need to pay attention +to this section are those working on DEC Alpha architecture-specific code +and those working on READ_ONCE() itself. For those who need it, and for +those who are interested in the history, here is the story of +data-dependency barriers. The usage requirements of data dependency barriers are a little subtle, and it's not always obvious that they're needed. To illustrate, consider the @@ -2709,144 +2709,6 @@ the properties of the memory window through which devices are accessed and/or the use of any special device communication instructions the CPU may have. 
-CACHE COHERENCY ---------------- - -Life isn't quite as simple as it may appear above, however: for while the -caches are expected to be coherent, there's no guarantee that that coherency -will be ordered. This means that while changes made on one CPU will -eventually become visible on all CPUs, there's no guarantee that they will -become apparent in the same order on those other CPUs. - - -Consider dealing with a system that has a pair of CPUs (1 & 2), each of which -has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D): - - : - : +--------+ - : +---------+ | | - +--------+ : +--->| Cache A |<------->| | - | | : | +---------+ | | - | CPU 1 |<---+ | | - | | : | +---------+ | | - +--------+ : +--->| Cache B |<------->| | - : +---------+ | | - : | Memory | - : +---------+ | System | - +--------+ : +--->| Cache C |<------->| | - | | : | +---------+ | | - | CPU 2 |<---+ | | - | | : | +---------+ | | - +--------+ : +--->| Cache D |<------->| | - : +---------+ | | - : +--------+ - : - -Imagine the system has the following properties: - - (*) an odd-numbered cache line may be in cache A, cache C or it may still be - resident in memory; - - (*) an even-numbered cache line may be in cache B, cache D or it may still be - resident in memory; - - (*) while the CPU core is interrogating one cache, the other cache may be - making use of the bus to access the rest of the system - perhaps to - displace a dirty cacheline or to do a speculative load; - - (*) each cache has a queue of operations that need to be applied to that cache - to maintain coherency with the rest of the system; - - (*) the coherency queue is not flushed by normal loads to lines already - present in the cache, even though the contents of the queue may - potentially affect those loads. - -Imagine, then, that two writes are made on the first CPU, with a write barrier -between them to guarantee that they will appear to reach that CPU's caches in -the requisite order: - - CPU 1 CPU 2 COMMENT - =============== =============== ======================================- u == 0, v == 1 and p == &u, q == &u - v = 2; - smp_wmb(); Make sure change to v is visible before - change to p - <A:modify v=2> v is now in cache A exclusively - p = &v; - <B:modify p=&v> p is now in cache B exclusively - -The write memory barrier forces the other CPUs in the system to perceive that -the local CPU's caches have apparently been updated in the correct order. But -now imagine that the second CPU wants to read those values: - - CPU 1 CPU 2 COMMENT - =============== =============== ======================================- ... - q = p; - x = *q; - -The above pair of reads may then fail to happen in the expected order, as the -cacheline holding p may get updated in one of the second CPU's caches while -the update to the cacheline holding v is delayed in the other of the second -CPU's caches by some other cache event: - - CPU 1 CPU 2 COMMENT - =============== =============== ======================================- u == 0, v == 1 and p == &u, q == &u - v = 2; - smp_wmb(); - <A:modify v=2> <C:busy> - <C:queue v=2> - p = &v; q = p; - <D:request p> - <B:modify p=&v> <D:commit p=&v> - <D:read p> - x = *q; - <C:read *q> Reads from v before v updated in cache - <C:unbusy> - <C:commit v=2> - -Basically, while both cachelines will be updated on CPU 2 eventually, there's -no guarantee that, without intervention, the order of update will be the same -as that committed on CPU 1. 
- - -To intervene, we need to interpolate a data dependency barrier or a read -barrier between the loads (which as of v4.15 is supplied unconditionally -by the READ_ONCE() macro). This will force the cache to commit its -coherency queue before processing any further requests: - - CPU 1 CPU 2 COMMENT - =============== =============== ======================================- u == 0, v == 1 and p == &u, q == &u - v = 2; - smp_wmb(); - <A:modify v=2> <C:busy> - <C:queue v=2> - p = &v; q = p; - <D:request p> - <B:modify p=&v> <D:commit p=&v> - <D:read p> - smp_read_barrier_depends() - <C:unbusy> - <C:commit v=2> - x = *q; - <C:read *q> Reads from v after v updated in cache - - -This sort of problem can be encountered on DEC Alpha processors as they have a -split cache that improves performance by making better use of the data bus. -While most CPUs do imply a data dependency barrier on the read when a memory -access depends on a read, not all do, so it may not be relied on. - -Other CPUs may also have split caches, but must coordinate between the various -cachelets for normal memory accesses. The semantics of the Alpha removes the -need for hardware coordination in the absence of memory barriers, which -permitted Alpha to sport higher CPU clock rates back in the day. However, -please note that (again, as of v4.15) smp_read_barrier_depends() should not -be used except in Alpha arch-specific code and within the READ_ONCE() macro. - - CACHE COHERENCY VS DMA ---------------------- @@ -3010,10 +2872,8 @@ caches with the memory coherence system, thus making it seem like pointer changes vs new data occur in the right order. The Alpha defines the Linux kernel's memory model, although as of v4.15 -the Linux kernel's addition of smp_read_barrier_depends() to READ_ONCE() -greatly reduced Alpha's impact on the memory model. - -See the subsection on "Cache Coherency" above. +the Linux kernel's addition of smp_mb() to READ_ONCE() on Alpha greatly +reduced its impact on the memory model. VIRTUAL MACHINE GUESTS -- 2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 10/13] tools/memory-model: Remove smp_read_barrier_depends() from informal doc
'smp_read_barrier_depends()' has gone the way of mmiowb() and so many
esoteric memory barriers before it. Drop the two mentions of this
deceased barrier from the LKMM informal explanation document.

Signed-off-by: Will Deacon <will at kernel.org>
---
 .../Documentation/explanation.txt | 26 +++++++++----------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index 488f11f6c588..3050bf67b8d0 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -1118,12 +1118,10 @@ maintain at least the appearance of FIFO order.
 In practice, this difficulty is solved by inserting a special fence
 between P1's two loads when the kernel is compiled for the Alpha
 architecture. In fact, as of version 4.15, the kernel automatically
-adds this fence (called smp_read_barrier_depends() and defined as
-nothing at all on non-Alpha builds) after every READ_ONCE() and atomic
-load. The effect of the fence is to cause the CPU not to execute any
-po-later instructions until after the local cache has finished
-processing all the stores it has already received. Thus, if the code
-was changed to:
+adds this fence after every READ_ONCE() and atomic load on Alpha. The
+effect of the fence is to cause the CPU not to execute any po-later
+instructions until after the local cache has finished processing all
+the stores it has already received. Thus, if the code was changed to:
 
 	P1()
 	{
@@ -1142,14 +1140,14 @@ READ_ONCE() or another synchronization primitive rather than accessed
 directly.
 
 The LKMM requires that smp_rmb(), acquire fences, and strong fences
-share this property with smp_read_barrier_depends(): They do not allow
-the CPU to execute any po-later instructions (or po-later loads in the
-case of smp_rmb()) until all outstanding stores have been processed by
-the local cache. In the case of a strong fence, the CPU first has to
-wait for all of its po-earlier stores to propagate to every other CPU
-in the system; then it has to wait for the local cache to process all
-the stores received as of that time -- not just the stores received
-when the strong fence began.
+share this property: They do not allow the CPU to execute any po-later
+instructions (or po-later loads in the case of smp_rmb()) until all
+outstanding stores have been processed by the local cache. In the
+case of a strong fence, the CPU first has to wait for all of its
+po-earlier stores to propagate to every other CPU in the system; then
+it has to wait for the local cache to process all the stores received
+as of that time -- not just the stores received when the strong fence
+began.
 
 And of course, none of this matters for any architecture other than
 Alpha.
--
2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 11/13] powerpc: Remove comment about read_barrier_depends()
'read_barrier_depends()' doesn't exist anymore so stop talking about
it.

Signed-off-by: Will Deacon <will at kernel.org>
---
 arch/powerpc/include/asm/barrier.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index fbe8df433019..123adcefd40f 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -18,8 +18,6 @@
  * mb() prevents loads and stores being reordered across this point.
  * rmb() prevents loads being reordered across this point.
  * wmb() prevents stores being reordered across this point.
- * read_barrier_depends() prevents data-dependent loads being reordered
- * across this point (nop on PPC).
  *
  * *mb() variants without smp_ prefix must order all types of memory
  * operations with one another. sync is the only instruction sufficient
--
2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 12/13] include/linux: Remove smp_read_barrier_depends() from comments
'smp_read_barrier_depends()' doesn't exist any more, so reword the two
comments that mention it to refer to "dependency ordering" instead.

Signed-off-by: Will Deacon <will at kernel.org>
---
 include/linux/percpu-refcount.h | 2 +-
 include/linux/ptr_ring.h        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 7aef0abc194a..246760b93715 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -155,7 +155,7 @@ static inline bool __ref_is_percpu(struct percpu_ref *ref,
 	 * between contaminating the pointer value, meaning that
 	 * READ_ONCE() is required when fetching it.
 	 *
-	 * The smp_read_barrier_depends() implied by READ_ONCE() pairs
+	 * The dependency ordering from the READ_ONCE() pairs
 	 * with smp_store_release() in __percpu_ref_switch_to_percpu().
 	 */
 	percpu_ptr = READ_ONCE(ref->percpu_count_ptr);
diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 0abe9a4fc842..b6e94b60fc12 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -106,7 +106,7 @@ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
 		return -ENOSPC;
 
 	/* Make sure the pointer we are storing points to a valid data. */
-	/* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
+	/* Pairs with the dependency ordering in __ptr_ring_consume. */
 	smp_wmb();
 
 	WRITE_ONCE(r->queue[r->producer++], ptr);
--
2.24.0.rc1.363.gb1bccd3e3d-goog
Will Deacon
2019-Nov-08 17:01 UTC
[PATCH 13/13] checkpatch: Remove checks relating to [smp_]read_barrier_depends()
The '[smp_]read_barrier_depends()' macros no longer exist, so we don't
need to deal with them in the checkpatch script.

Signed-off-by: Will Deacon <will at kernel.org>
---
 scripts/checkpatch.pl | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 6fcc66afb088..7f5a368206bf 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5787,8 +5787,7 @@ sub process {
 	my $barriers = qr{
 		mb|
 		rmb|
-		wmb|
-		read_barrier_depends
+		wmb
 	}x;
 	my $barrier_stems = qr{
 		mb__before_atomic|
@@ -5829,12 +5828,6 @@ sub process {
 			}
 		}
 
-# check for smp_read_barrier_depends and read_barrier_depends
-		if (!$file && $line =~ /\b(smp_|)read_barrier_depends\s*\(/) {
-			WARN("READ_BARRIER_DEPENDS",
-			     "$1read_barrier_depends should only be used in READ_ONCE or DEC Alpha code\n" . $herecurr);
-		}
-
 # check of hardware specific defines
 		if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) {
 			CHK("ARCH_DEFINES",
--
2.24.0.rc1.363.gb1bccd3e3d-goog
Alan Stern
2019-Nov-08 17:42 UTC
[PATCH 10/13] tools/memory-model: Remove smp_read_barrier_depends() from informal doc
On Fri, 8 Nov 2019, Will Deacon wrote:> 'smp_read_barrier_depends()' has gone the way of mmiowb() and so many > esoteric memory barriers before it. Drop the two mentions of this > deceased barrier from the LKMM informal explanation document. > > Signed-off-by: Will Deacon <will at kernel.org> > --- > .../Documentation/explanation.txt | 26 +++++++++---------- > 1 file changed, 12 insertions(+), 14 deletions(-) > > diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt > index 488f11f6c588..3050bf67b8d0 100644 > --- a/tools/memory-model/Documentation/explanation.txt > +++ b/tools/memory-model/Documentation/explanation.txt > @@ -1118,12 +1118,10 @@ maintain at least the appearance of FIFO order. > In practice, this difficulty is solved by inserting a special fence > between P1's two loads when the kernel is compiled for the Alpha > architecture. In fact, as of version 4.15, the kernel automatically > -adds this fence (called smp_read_barrier_depends() and defined as > -nothing at all on non-Alpha builds) after every READ_ONCE() and atomic > -load. The effect of the fence is to cause the CPU not to execute any > -po-later instructions until after the local cache has finished > -processing all the stores it has already received. Thus, if the code > -was changed to: > +adds this fence after every READ_ONCE() and atomic load on Alpha. The > +effect of the fence is to cause the CPU not to execute any po-later > +instructions until after the local cache has finished processing all > +the stores it has already received. Thus, if the code was changed to: > > P1() > { > @@ -1142,14 +1140,14 @@ READ_ONCE() or another synchronization primitive rather than accessed > directly. > > The LKMM requires that smp_rmb(), acquire fences, and strong fences > -share this property with smp_read_barrier_depends(): They do not allow > -the CPU to execute any po-later instructions (or po-later loads in the > -case of smp_rmb()) until all outstanding stores have been processed by > -the local cache. In the case of a strong fence, the CPU first has to > -wait for all of its po-earlier stores to propagate to every other CPU > -in the system; then it has to wait for the local cache to process all > -the stores received as of that time -- not just the stores received > -when the strong fence began. > +share this property: They do not allow the CPU to execute any po-later > +instructions (or po-later loads in the case of smp_rmb()) until all > +outstanding stores have been processed by the local cache. In the > +case of a strong fence, the CPU first has to wait for all of its > +po-earlier stores to propagate to every other CPU in the system; then > +it has to wait for the local cache to process all the stores received > +as of that time -- not just the stores received when the strong fence > +began. > > And of course, none of this matters for any architecture other than > Alpha.Acked-by: Alan Stern <stern at rowland.harvard.edu>
Arnd Bergmann
2019-Nov-08 19:57 UTC
[PATCH 01/13] compiler.h: Split {READ,WRITE}_ONCE definitions out into rwonce.h
On Fri, Nov 8, 2019 at 6:01 PM Will Deacon <will at kernel.org> wrote:
>
> In preparation for allowing architectures to define their own
> implementation of the 'READ_ONCE()' macro, move the generic
> '{READ,WRITE}_ONCE()' definitions out of the unwieldy 'linux/compiler.h'
> and into a new 'rwonce.h' header under 'asm-generic'.

Adding Christian Bornträger to Cc, he originally added the
READ_ONCE()/WRITE_ONCE() code.

I wonder if it would be appropriate now to revert back to a much
simpler version of these helpers for any modern compiler. As I
understand it, only gcc-4.6 and gcc-4.7 actually need the
song-and-dance version with the union and switch/case, while for
the others we might be able to go back to a macro doing a volatile
access.

        Arnd
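For reference, the "much simpler version" alluded to here is
essentially a pair of volatile-qualified pointer casts, along the
lines of the sketch below. This is illustrative only and covers scalar
types; it deliberately ignores the aggregate-type memcpy() fallback,
the KASAN _NOCHECK variant and the Alpha barrier, and is not what this
series implements:

/* Simplified helpers usable once gcc-4.6/4.7 support is dropped. */
#define READ_ONCE(x)							\
	(*(const volatile typeof(x) *)&(x))

#define WRITE_ONCE(x, val)						\
do {									\
	*(volatile typeof(x) *)&(x) = (val);				\
} while (0)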