Displaying results from an estimated 4000 matches similar to: "[LLVMdev] confused about llvm.memory.barrier"
2008 Sep 25
0
[LLVMdev] confused about llvm.memory.barrier
On Thu, 2008-09-25 at 10:28 -0400, Luke Dalessandro wrote:
> When I request a write-before-read memory barrier on x86 I would expect
> to get an assembly instruction that would enforce this ordering (mfence,
> xchg, cas), but it just turns into a nop.
In its usual configuration, an x86 family CPU implements a strong memory
ordering constraint for all loads and stores, so as long as the
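For context, a minimal sketch (not from this thread; the globals and the C11 phrasing are assumed, since the old llvm.memory.barrier intrinsic is long gone) of requesting the one ordering x86 does not provide for free, store-before-load, which clang/LLVM lower to an mfence or an equivalent locked instruction rather than a nop:

    #include <stdatomic.h>

    atomic_int published;    /* illustrative globals, not from the thread */
    atomic_int other_flag;

    int store_then_load(void)
    {
        /* Store-before-load (StoreLoad) is the one case x86's strong
         * ordering does not cover, so the seq_cst fence must become a
         * real instruction (mfence, or a locked RMW), not a nop. */
        atomic_store_explicit(&published, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&other_flag, memory_order_relaxed);
    }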
2009 Jun 16
0
[LLVMdev] PIC documentation ?
>> 2. ABI docs for Darwin (x86, x86_64, ppc, ppc64) you might find
>> somewhere @apple.com. There you can have all 3 types of PIC code:
>> static (no pic at all), DynamicNoPIC and full PIC.
>
> Okay. We need documentation: what is the difference between DynamicNoPIC
> and full PIC?
The best way to figure this out is to run a small program through and
look at
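A sketch of that experiment, with a hypothetical file and symbol name; the relocation-model names accepted by llc are static, pic, and dynamic-no-pic:

    /* pictest.c - hypothetical test case for comparing code generation
     * under the three relocation models mentioned above:
     *
     *   clang -S -emit-llvm pictest.c -o pictest.ll
     *   llc -relocation-model=static         pictest.ll -o pictest-static.s
     *   llc -relocation-model=dynamic-no-pic pictest.ll -o pictest-dnp.s
     *   llc -relocation-model=pic            pictest.ll -o pictest-pic.s
     *
     * Diffing the .s files shows how the access to the external global
     * changes; roughly, static addresses everything directly, DynamicNoPIC
     * keeps the code at fixed addresses but indirects external data, and
     * full PIC makes the code itself position-independent as well. */
    extern int external_counter;   /* hypothetical external global */

    int bump(void)
    {
        return ++external_counter;
    }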
2010 Oct 10
1
[LLVMdev] More questions about non_lazy_ptr
I have a problem where my LLVM-generated code works on Linux but not on OS
X, and the problem involves non_lazy_ptr.
I have an external symbol named "@gc_safepoint_map", which is generated by
the linker's GCStrategy plugin. Since it is not generated until link time, I
declared it as an external symbol so that the modules that use it can
compile without error.
Here's what the
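For reference, a minimal sketch of the C-side declaration being described (the element type is an assumption); on Darwin the compiler cannot assume the symbol is defined in the same image, so the reference is emitted through an L_gc_safepoint_map$non_lazy_ptr indirection rather than a direct address:

    /* Declared but never defined in this module: the definition is
     * produced at link time by the GC plugin described above. */
    extern unsigned char gc_safepoint_map[];   /* element type assumed */

    const unsigned char *safepoint_map_base(void)
    {
        /* On Darwin this reference goes through a non_lazy_ptr slot. */
        return gc_safepoint_map;
    }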
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> > consistently cheaper (by at least half the latency) than MFENCE. While there
> > was a decent amount of variation, this difference
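A rough userspace re-creation of that comparison (the loop count and the rdtsc-based timing are assumptions, not the original harness); build with gcc -O2 on x86-64:

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>   /* __rdtsc() */

    #define ITERS 10000000UL

    int main(void)
    {
        unsigned long reg_val = 0, mem_val = 0;
        uint64_t t0, t1, t2;
        unsigned long i;

        t0 = __rdtsc();
        for (i = 0; i < ITERS; i++)
            asm volatile("mfence" ::: "memory");
        t1 = __rdtsc();
        for (i = 0; i < ITERS; i++)
            /* xchg with a memory operand is implicitly locked, which is
             * what makes it usable as a full barrier. */
            asm volatile("xchg %0, %1"
                         : "+r"(reg_val), "+m"(mem_val) :: "memory");
        t2 = __rdtsc();

        printf("mfence: %.1f cycles/op, xchg: %.1f cycles/op\n",
               (double)(t1 - t0) / ITERS, (double)(t2 - t1) / ITERS);
        return 0;
    }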
2016 Jan 12
5
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds
<torvalds at linux-foundation.org> wrote:
> On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski <luto at kernel.org> wrote:
>>
>> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
>> was better because it avoided stomping on very-likely-to-be-hot write
>> buffers.
>
> I suspect it
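The shapes under discussion, written out as illustrative GCC inline asm; the exact displacement is the open question in the thread (the smp_mb() that eventually landed in arch/x86/include/asm/barrier.h uses a small negative offset, while mb() kept mfence), so the offsets below are examples only:

    /* Illustrative userspace sketches of the candidates discussed above,
     * not the definitions that finally landed in the kernel. */

    static inline void barrier_mfence(void)
    {
        asm volatile("mfence" ::: "memory");
    }

    /* Locked add of zero: a full barrier on x86 that leaves memory
     * contents unchanged; the only question is which cache line to dirty. */
    static inline void barrier_lock_add_sp(void)
    {
        asm volatile("lock; addl $0,(%%rsp)" ::: "memory", "cc");
    }

    /* The "32(%rsp) or so (maybe even 64)" variant quoted above, aimed at
     * a dword further from the very top of the stack. */
    static inline void barrier_lock_add_off(void)
    {
        asm volatile("lock; addl $0,32(%%rsp)" ::: "memory", "cc");
    }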
2015 Dec 17
4
[PATCH] virtio_ring: use smp_store_mb
We need a full barrier after writing out the event index; using smp_store_mb
there seems better than open-coding it.
As usual, we need a wrapper to account for strong barriers and non-SMP builds.
It's tempting to use this in vhost as well; for that, we'll
need a variant of smp_store_mb that works on __user pointers.
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
Seems to give a speedup
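What smp_store_mb() buys at that spot, sketched as a self-contained userspace analogue (the names and types here are illustrative, not virtio code): store the new event index, then a full barrier before the subsequent reads of the ring.

    #include <stdatomic.h>

    /* Userspace analogue of the pattern the patch wants: smp_store_mb(*p, v)
     * is "store v to *p, then a full memory barrier". */
    static inline void store_event_idx_mb(atomic_ushort *event_idx,
                                          unsigned short new_idx)
    {
        atomic_store_explicit(event_idx, new_idx, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);  /* full barrier after the
                                                       index is written out */
    }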
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Results:
perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before:
0.922565990 seconds time elapsed ( +- 1.15% )
After:
0.578667024 seconds time elapsed
2013 Mar 08
0
[LLVMdev] ARM assembler's syntax in clang
> And be warned that the PC doesn't point at the next instruction when you
> use it like this - I believe you don't need to modify it at all if you swap
> the pop and the .long.
Bernie, is it related to the ARM pipeline? I'm interested in this; is there
any additional information?
On Fri, Mar 8, 2013 at 4:59 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> Hi Ashi,
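On the pipeline question: in ARM (A32) state, reading the PC yields the address of the current instruction plus 8, a leftover of the original fetch/decode/execute pipeline, which is why the constant in the pop/.long trick looks off by two instructions. A small sketch (an assumed test harness, ARM state only) that makes the offset visible:

    #include <stdio.h>

    int main(void)
    {
    #if defined(__arm__) && !defined(__thumb__)
        void *pc_read, *mov_addr;

        asm volatile(
            "adr %1, 0f\n\t"      /* %1 = address of the mov below          */
            "0: mov %0, pc\n\t"   /* %0 = PC as seen by that mov (addr + 8) */
            : "=r"(pc_read), "=r"(mov_addr));

        printf("pc read as %p, instruction at %p, offset %ld\n",
               pc_read, mov_addr, (long)((char *)pc_read - (char *)mov_addr));
        /* Expect an offset of 8 in ARM state. */
    #else
        puts("build for 32-bit ARM (A32) to see the +8 offset");
    #endif
        return 0;
    }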
2015 Dec 17
2
[PATCH] virtio_ring: use smp_store_mb
On Thu, Dec 17, 2015 at 11:52:38AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > +static inline void virtio_store_mb(bool weak_barriers,
> > + __virtio16 *p, __virtio16 v)
> > +{
> > +#ifdef CONFIG_SMP
> > + if (weak_barriers)
> > + smp_store_mb(*p, v);
> > + else
> > +#endif
>
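The excerpt cuts off before the strong-barrier branch; a hedged reconstruction of the helper's overall shape, in kernel context (the else path is assumed from the surrounding discussion, not copied from the final patch):

    static inline void virtio_store_mb(bool weak_barriers,
                                       __virtio16 *p, __virtio16 v)
    {
    #ifdef CONFIG_SMP
            if (weak_barriers)
                    smp_store_mb(*p, v);
            else
    #endif
            {
                    WRITE_ONCE(*p, v);
                    mb();   /* strong-barrier path (and the !CONFIG_SMP case) */
            }
    }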
2016 Jan 27
6
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.
So we really should use the locked variant everywhere, except that the Intel manual
says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seem to assume sfence will
order it, so there could be existing bugs around
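The clflush constraint in practice, as a userspace intrinsics sketch (illustrative, not kernel code): the flush needs a real mfence after it, which is exactly why mb() cannot simply become a locked instruction.

    #include <emmintrin.h>   /* _mm_clflush(), _mm_mfence() */

    /* Flush one cache line, then fence so the flush is ordered with later
     * operations. Per the manual language quoted above, clflush is ordered
     * by mfence; relying on sfence or on a lock-prefixed instruction
     * instead is the potential bug the note warns about. */
    static inline void flush_line_then_fence(const void *line)
    {
        _mm_clflush(line);
        _mm_mfence();
    }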
2016 Jan 12
7
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
So let's use the locked variant everywhere - helps keep the code simple as
well.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h.
I hope I'm not splitting this up too much - the reason
2016 Jan 26
2
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> >
> > So let's use the locked variant everywhere - helps keep the code simple as
>