So as explained in Documentation/memory-barriers.txt e.g. a load followed by a store requires a full memory barrier, to avoid the store being ordered before the load. Similarly load-load requires a read memory barrier.

Thinking about it, we can actually create a data dependency by mixing the first loaded value into the pointer being accessed.

This adds an API for this and uses it in virtio.

Written over the holiday and build tested only so far.

This patchset is also suboptimal on e.g. x86 where e.g. smp_rmb is a nop.

Sending out for early feedback/flames.

Michael S. Tsirkin (4):
  include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
  include/linux/compiler.h: allow memory operands
  barriers: convert a control to a data dependency
  virtio: use dependent_ptr_mb

 Documentation/memory-barriers.txt | 20 ++++++++++++++++++++
 arch/alpha/include/asm/barrier.h  |  1 +
 drivers/virtio/virtio_ring.c      |  6 ++++--
 include/asm-generic/barrier.h     | 18 ++++++++++++++++++
 include/linux/compiler-clang.h    |  5 ++---
 include/linux/compiler-gcc.h      |  4 ----
 include/linux/compiler-intel.h    |  4 +---
 include/linux/compiler.h          |  8 +++++++-
 8 files changed, 53 insertions(+), 13 deletions(-)

-- 
MST
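For readers less familiar with the problem the cover letter describes, here is a minimal userspace sketch of the conventional approach the series tries to avoid: a load of a flag, a read barrier, then a load of unrelated data. It is an illustration only. It uses C11 atomics rather than kernel primitives, the acquire fence is an assumed stand-in for smp_rmb() (it is at least as strong), and the names flag, payload, publish and consume are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative shared state; not taken from the patches. */
static atomic_int flag;
static atomic_int payload;

/* Writer side: store the payload, then set the flag with release ordering. */
static void publish(int value)
{
	assert(value >= 0);	/* -1 is reserved for "nothing published" */
	atomic_store_explicit(&payload, value, memory_order_relaxed);
	atomic_store_explicit(&flag, 1, memory_order_release);
}

/*
 * Reader side: the two loads hit unrelated locations, so without the fence
 * a weakly ordered CPU could satisfy the payload load before the flag load.
 * Returns -1 when nothing has been published yet.
 */
static int consume(void)
{
	if (!atomic_load_explicit(&flag, memory_order_relaxed))
		return -1;
	/* Stand-in for smp_rmb(): order the flag load before the payload load. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&payload, memory_order_relaxed);
}
```

The series aims to replace the fence in consume() with an address dependency, which is free on most architectures that honour such dependencies.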
Michael S. Tsirkin
2019-Jan-02 20:57 UTC
[PATCH RFC 1/4] include/linux/compiler*.h: fix OPTIMIZER_HIDE_VAR
Since commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") clang no longer reuses the OPTIMIZER_HIDE_VAR macro from compiler-gcc - instead it gets the version in include/linux/compiler.h. Unfortunately that version doesn't actually prevent the compiler from optimizing out the variable.

Fix up by moving the macro out from compiler-gcc.h to compiler.h. Compilers without inline asm support will keep working since it's protected by an ifdef.

Also fix up comments to match reality since we are no longer overriding any macros.

Build-tested with gcc and clang.

Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Cc: Eli Friedman <efriedma at codeaurora.org>
Cc: Joe Perches <joe at perches.com>
Cc: Nick Desaulniers <ndesaulniers at google.com>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 include/linux/compiler-clang.h | 5 ++---
 include/linux/compiler-gcc.h   | 4 ----
 include/linux/compiler-intel.h | 4 +---
 include/linux/compiler.h       | 4 +++-
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 3e7dafb3ea80..7ddaeb5182e3 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -3,9 +3,8 @@
 #error "Please don't include <linux/compiler-clang.h> directly, include <linux/compiler.h> instead."
 #endif
 
-/* Some compiler specific definitions are overwritten here
- * for Clang compiler
- */
+/* Compiler specific definitions for Clang compiler */
+
 #define uninitialized_var(x) x = *(&(x))
 
 /* same as gcc, this was present in clang-2.6 so we can assume it works
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 2010493e1040..72054d9f0eaa 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -58,10 +58,6 @@
 	(typeof(ptr)) (__ptr + (off)); \
 })
 
-/* Make the optimizer believe the variable can be manipulated arbitrarily. */
-#define OPTIMIZER_HIDE_VAR(var) \
-	__asm__ ("" : "=r" (var) : "0" (var))
-
 /*
  * A trick to suppress uninitialized variable warning without generating any
  * code
diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h
index 517bd14e1222..b17f3cd18334 100644
--- a/include/linux/compiler-intel.h
+++ b/include/linux/compiler-intel.h
@@ -5,9 +5,7 @@
 
 #ifdef __ECC
 
-/* Some compiler specific definitions are overwritten here
- * for Intel ECC compiler
- */
+/* Compiler specific definitions for Intel ECC compiler */
 
 #include <asm/intrinsics.h>
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 06396c1cf127..1ad367b4cd8d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -152,7 +152,9 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 #endif
 
 #ifndef OPTIMIZER_HIDE_VAR
-#define OPTIMIZER_HIDE_VAR(var) barrier()
+/* Make the optimizer believe the variable can be manipulated arbitrarily. */
+#define OPTIMIZER_HIDE_VAR(var) \
+	__asm__ ("" : "=r" (var) : "0" (var))
 #endif
 
 /* Not-quite-unique ID. */
-- 
MST
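To make the effect of the macro concrete, the following userspace sketch reuses the same empty asm statement outside the kernel. It assumes GCC or Clang inline asm; the function hidden_difference is hypothetical and exists only for this illustration. The asm emits no instructions, but because the variable is listed as an output the compiler has to treat it as unknown afterwards, so it should not fold x - y to a constant.

```c
#include <assert.h>

/* Same shape as the macro moved into compiler.h; GCC/Clang inline asm only. */
#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* Hypothetical demo: computes x - x in a way the optimizer should not fold. */
static long hidden_difference(long x)
{
	long y = x;

	/* After this the compiler has to treat y as an unknown value. */
	OPTIMIZER_HIDE_VAR(y);
	assert(y == x);		/* the run-time value itself is unchanged */
	return x - y;		/* 0 at run time */
}
```

By contrast, barrier() only clobbers memory, so a variable held in a register stays fully visible to the optimizer, which is the problem the commit message describes.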
Michael S. Tsirkin
2019-Jan-02 20:57 UTC
[PATCH RFC 2/4] include/linux/compiler.h: allow memory operands
We don't really care whether the variable is in-register or in-memory. Relax the constraint accordingly.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 include/linux/compiler.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 1ad367b4cd8d..6601d39e8c48 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -154,7 +154,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 #ifndef OPTIMIZER_HIDE_VAR
 /* Make the optimizer believe the variable can be manipulated arbitrarily. */
 #define OPTIMIZER_HIDE_VAR(var) \
-	__asm__ ("" : "=r" (var) : "0" (var))
+	__asm__ ("" : "=rm" (var) : "0" (var))
 #endif
 
 /* Not-quite-unique ID. */
-- 
MST
Michael S. Tsirkin
2019-Jan-02 20:57 UTC
[PATCH RFC 3/4] barriers: convert a control to a data dependency
It's not uncommon to have to access two unrelated memory locations in a specific order. At the moment one has to use a memory barrier for this.

However, if the first access was a read and the second used an address depending on the first one we would have a data dependency and no barrier would be necessary.

This adds a new interface: dependent_ptr_mb which does exactly this: it returns a pointer with a data dependency on the supplied value.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 Documentation/memory-barriers.txt | 20 ++++++++++++++++++++
 arch/alpha/include/asm/barrier.h  |  1 +
 include/asm-generic/barrier.h     | 18 ++++++++++++++++++
 include/linux/compiler.h          |  4 ++++
 4 files changed, 43 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index c1d913944ad8..9dbaa2e1dbf6 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -691,6 +691,18 @@ case what's actually required is:
 		p = READ_ONCE(b);
 	}
 
+Alternatively, a control dependency can be converted to a data dependency,
+e.g.:
+
+	q = READ_ONCE(a);
+	if (q) {
+		b = dependent_ptr_mb(b, q);
+		p = READ_ONCE(b);
+	}
+
+Note how the result of dependent_ptr_mb must be used with the following
+accesses in order to have an effect.
+
 However, stores are not speculated. This means that ordering -is- provided
 for load-store control dependencies, as in the following example:
 
@@ -836,6 +848,12 @@ out-guess your code. More generally, although READ_ONCE() does force
 the compiler to actually emit code for a given load, it does not force
 the compiler to use the results.
 
+Converting to a data dependency helps with this too:
+
+	q = READ_ONCE(a);
+	b = dependent_ptr_mb(b, q);
+	WRITE_ONCE(b, 1);
+
 In addition, control dependencies apply only to the then-clause and
 else-clause of the if-statement in question. In particular, it does
 not necessarily apply to code following the if-statement:
@@ -875,6 +893,8 @@ to the CPU containing it. See the section on "Multicopy atomicity"
 for more information.
 
 
+
+
 In summary:
 
   (*) Control dependencies can order prior loads against later stores.
diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h
index 92ec486a4f9e..b4934e8c551b 100644
--- a/arch/alpha/include/asm/barrier.h
+++ b/arch/alpha/include/asm/barrier.h
@@ -59,6 +59,7 @@
  * as Alpha, "y" could be set to 3 and "x" to 0. Use rmb()
  * in cases like this where there are no data dependencies.
  */
+#define ARCH_NEEDS_READ_BARRIER_DEPENDS 1
 #define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
 
 #ifdef CONFIG_SMP
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 2cafdbb9ae4c..fa2e2ef72b68 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -70,6 +70,24 @@
 #define __smp_read_barrier_depends() read_barrier_depends()
 #endif
 
+#if defined(COMPILER_HAS_OPTIMIZER_HIDE_VAR) && \
+	!defined(ARCH_NEEDS_READ_BARRIER_DEPENDS)
+
+#define dependent_ptr_mb(ptr, val) ({ \
+	long dependent_ptr_mb_val = (long)(val); \
+	long dependent_ptr_mb_ptr = (long)(ptr) - dependent_ptr_mb_val; \
+	\
+	BUILD_BUG_ON(sizeof(val) > sizeof(long)); \
+	OPTIMIZER_HIDE_VAR(dependent_ptr_mb_val); \
+	(typeof(ptr))(dependent_ptr_mb_ptr + dependent_ptr_mb_val); \
+})
+
+#else
+
+#define dependent_ptr_mb(ptr, val) ({ mb(); (ptr); })
+
+#endif
+
 #ifdef CONFIG_SMP
 
 #ifndef smp_mb
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 6601d39e8c48..f599c30f1b28 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -152,9 +152,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 #endif
 
 #ifndef OPTIMIZER_HIDE_VAR
+
 /* Make the optimizer believe the variable can be manipulated arbitrarily. */
 #define OPTIMIZER_HIDE_VAR(var) \
 	__asm__ ("" : "=rm" (var) : "0" (var))
+
+#define COMPILER_HAS_OPTIMIZER_HIDE_VAR 1
+
 #endif
 
 /* Not-quite-unique ID. */
-- 
MST
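The pointer arithmetic in dependent_ptr_mb can be tried outside the kernel. The sketch below is a userspace approximation of the fast path only, not the kernel implementation: it assumes GCC or Clang (statement expressions, __typeof__, inline asm and the __atomic builtins), swaps in unsigned arithmetic so that wraparound is well defined, and uses _Static_assert in place of BUILD_BUG_ON. The function read_if_set is hypothetical and mirrors the documentation example from the patch. Whether a given compiler and CPU actually preserve the intended ordering is exactly what the RFC asks reviewers about, so treat this as a demonstration of the arithmetic rather than a proof of ordering.

```c
#include <assert.h>

#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/*
 * Userspace approximation of the fast path of dependent_ptr_mb().
 * The result equals ptr, but is computed as (ptr - val) + val with val
 * hidden from the optimizer, so the address depends on val.
 */
#define dependent_ptr_mb(ptr, val) ({					\
	unsigned long dep_val = (unsigned long)(val);			\
	unsigned long dep_ptr = (unsigned long)(ptr) - dep_val;		\
									\
	_Static_assert(sizeof(val) <= sizeof(long), "val too wide");	\
	OPTIMIZER_HIDE_VAR(dep_val);					\
	(__typeof__(ptr))(dep_ptr + dep_val);				\
})

/* Hypothetical reader mirroring the documentation example in the patch. */
static int read_if_set(const int *a, const int *b)
{
	int q;

	assert(a && b);
	q = __atomic_load_n(a, __ATOMIC_RELAXED);
	if (!q)
		return -1;
	/* b now carries a data dependency on q; its value is unchanged. */
	b = dependent_ptr_mb(b, q);
	return __atomic_load_n(b, __ATOMIC_RELAXED);
}
```

As the documentation hunk notes, only accesses made through the returned pointer benefit; a load through the original pointer would carry no dependency.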
Michael S. Tsirkin
[PATCH RFC 4/4] virtio: use dependent_ptr_mb
Use dependent_ptr_mb which is - on some architectures - more light-weight than an rmb.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 drivers/virtio/virtio_ring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 814b395007b2..2d320396eff8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -702,6 +702,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 	void *ret;
 	unsigned int i;
 	u16 last_used;
+	bool more;
 
 	START_USE(vq);
 
@@ -710,14 +711,15 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 		return NULL;
 	}
 
-	if (!more_used(vq)) {
+	more = more_used(vq);
+	if (!more) {
 		pr_debug("No more buffers in queue\n");
 		END_USE(vq);
 		return NULL;
 	}
 
 	/* Only get used array entries after they have been exposed by host. */
-	virtio_rmb(vq->weak_barriers);
+	vq = dependent_ptr_mb(vq, more);
 
 	last_used = (vq->last_used_idx & (vq->vring.num - 1));
 	i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
-- 
MST
Matthew Wilcox
2019-Jan-02 21:00 UTC
[PATCH RFC 3/4] barriers: convert a control to a data dependency
On Wed, Jan 02, 2019 at 03:57:58PM -0500, Michael S. Tsirkin wrote:
> @@ -875,6 +893,8 @@ to the CPU containing it. See the section on "Multicopy atomicity"
> for more information.
>
>
> +
> +
> In summary:
>
> (*) Control dependencies can order prior loads against later stores.

Was this hunk intentional?
Jason Wang
2019-Jan-07 03:58 UTC
[PATCH RFC 3/4] barriers: convert a control to a data dependency
On 2019/1/3 4:57 AM, Michael S. Tsirkin wrote:
> It's not uncommon to have to access two unrelated memory locations in a
> specific order. At the moment one has to use a memory barrier for this.
>
> However, if the first access was a read and the second used an address
> depending on the first one we would have a data dependency and no
> barrier would be necessary.
>
> This adds a new interface: dependent_ptr_mb which does exactly this: it
> returns a pointer with a data dependency on the supplied value.
>
> Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
> ---
> Documentation/memory-barriers.txt | 20 ++++++++++++++++++++
> arch/alpha/include/asm/barrier.h | 1 +
> include/asm-generic/barrier.h | 18 ++++++++++++++++++
> include/linux/compiler.h | 4 ++++
> 4 files changed, 43 insertions(+)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index c1d913944ad8..9dbaa2e1dbf6 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -691,6 +691,18 @@ case what's actually required is:
> 		p = READ_ONCE(b);
> 	}
>
> +Alternatively, a control dependency can be converted to a data dependency,
> +e.g.:
> +
> +	q = READ_ONCE(a);
> +	if (q) {
> +		b = dependent_ptr_mb(b, q);
> +		p = READ_ONCE(b);
> +	}
> +
> +Note how the result of dependent_ptr_mb must be used with the following
> +accesses in order to have an effect.
> +
> However, stores are not speculated. This means that ordering -is- provided
> for load-store control dependencies, as in the following example:
>
> @@ -836,6 +848,12 @@ out-guess your code. More generally, although READ_ONCE() does force
> the compiler to actually emit code for a given load, it does not force
> the compiler to use the results.
>
> +Converting to a data dependency helps with this too:
> +
> +	q = READ_ONCE(a);
> +	b = dependent_ptr_mb(b, q);
> +	WRITE_ONCE(b, 1);
> +
> In addition, control dependencies apply only to the then-clause and
> else-clause of the if-statement in question. In particular, it does
> not necessarily apply to code following the if-statement:
> @@ -875,6 +893,8 @@ to the CPU containing it. See the section on "Multicopy atomicity"
> for more information.
>
>
> +
> +
> In summary:
>
> (*) Control dependencies can order prior loads against later stores.
> diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h
> index 92ec486a4f9e..b4934e8c551b 100644
> --- a/arch/alpha/include/asm/barrier.h
> +++ b/arch/alpha/include/asm/barrier.h
> @@ -59,6 +59,7 @@
> * as Alpha, "y" could be set to 3 and "x" to 0. Use rmb()
> * in cases like this where there are no data dependencies.
> */
> +#define ARCH_NEEDS_READ_BARRIER_DEPENDS 1
> #define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
>
> #ifdef CONFIG_SMP
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 2cafdbb9ae4c..fa2e2ef72b68 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -70,6 +70,24 @@
> #define __smp_read_barrier_depends() read_barrier_depends()
> #endif
>
> +#if defined(COMPILER_HAS_OPTIMIZER_HIDE_VAR) && \
> +	!defined(ARCH_NEEDS_READ_BARRIER_DEPENDS)
> +
> +#define dependent_ptr_mb(ptr, val) ({ \
> +	long dependent_ptr_mb_val = (long)(val); \
> +	long dependent_ptr_mb_ptr = (long)(ptr) - dependent_ptr_mb_val; \
> +	\
> +	BUILD_BUG_ON(sizeof(val) > sizeof(long)); \
> +	OPTIMIZER_HIDE_VAR(dependent_ptr_mb_val); \
> +	(typeof(ptr))(dependent_ptr_mb_ptr + dependent_ptr_mb_val); \
> +})
> +
> +#else
> +
> +#define dependent_ptr_mb(ptr, val) ({ mb(); (ptr); })

So for the example of patch 4, we'd better fall back to rmb() or need a dependent_ptr_rmb()?

Thanks

> +
> +#endif
> +
> #ifdef CONFIG_SMP
>
> #ifndef smp_mb
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 6601d39e8c48..f599c30f1b28 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -152,9 +152,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> #endif
>
> #ifndef OPTIMIZER_HIDE_VAR
> +
> /* Make the optimizer believe the variable can be manipulated arbitrarily. */
> #define OPTIMIZER_HIDE_VAR(var) \
> 	__asm__ ("" : "=rm" (var) : "0" (var))
> +
> +#define COMPILER_HAS_OPTIMIZER_HIDE_VAR 1
> +
> #endif
>
> /* Not-quite-unique ID. */
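Jason's question concerns the fallback path: patch 4 replaces a read barrier with dependent_ptr_mb, whose fallback is a full mb(). One way to read his suggestion is a read-only variant whose fallback is only a read barrier. The sketch below is purely hypothetical. The name dependent_ptr_rmb comes from the question itself, nothing in the posted series defines it, and the acquire fence is an assumed userspace stand-in for rmb(). The helper read_second_after_first is likewise invented for illustration. GCC or Clang is assumed.

```c
#include <assert.h>

/*
 * Hypothetical read-only fallback: issue a read barrier and hand the
 * pointer back unchanged. Like the mb() fallback in the patch, it does
 * not evaluate val.
 */
#define dependent_ptr_rmb(ptr, val) ({			\
	__atomic_thread_fence(__ATOMIC_ACQUIRE);	\
	(ptr);						\
})

/* Hypothetical load-load user: read *second only after *first is seen set. */
static int read_second_after_first(const int *first, const int *second)
{
	int seen;

	assert(first && second);
	seen = __atomic_load_n(first, __ATOMIC_RELAXED);
	if (!seen)
		return -1;
	/* Fallback path: ordering comes from the fence, not from the address. */
	second = dependent_ptr_rmb(second, seen);
	return __atomic_load_n(second, __ATOMIC_RELAXED);
}
```

A variant like this would only be appropriate where the later accesses are all loads, as in the virtio_ring hunk; a load followed by a store would still need the stronger form.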