Displaying results from an estimated 4000 matches similar to: "[LLVMdev] confused about llvm.memory.barrier"
2008 Sep 25
0
[LLVMdev] confused about llvm.memory.barrier
On Thu, 2008-09-25 at 10:28 -0400, Luke Dalessandro wrote:
> When I request a write-before-read memory barrier on x86 I would expect
> to get an assembly instruction that would enforce this ordering (mfence,
> xchg, cas), but it just turns into a nop.
In its usual configuration, an x86 family CPU implements a strong memory
ordering constraint for all loads and stores, so as long as the
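For context, a minimal sketch (not from this thread; the globals and the C11 phrasing are assumed, since the old llvm.memory.barrier intrinsic is long gone) of requesting the one ordering x86 does not provide for free, store-before-load, which clang/LLVM lower to an mfence or an equivalent locked instruction rather than a nop:

    #include <stdatomic.h>

    atomic_int published;    /* illustrative globals, not from the thread */
    atomic_int other_flag;

    int store_then_load(void)
    {
        /* Store-before-load (StoreLoad) is the one case x86's strong
         * ordering does not cover, so the seq_cst fence must become a
         * real instruction (mfence, or a locked RMW), not a nop. */
        atomic_store_explicit(&published, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&other_flag, memory_order_relaxed);
    }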
2009 Jun 16
0
[LLVMdev] PIC documentation ?
>> 2. ABI docs for Darwin (x86, x86_64, ppc, ppc64) you might find
>> somewhere @apple.com. There you can have all 3 types of PIC code:
>> static (no pic at all), DynamicNoPIC and full PIC.
>
> Okay. We need documentation: what is the difference between DynamicNoPIC
> and full PIC?
The best way to figure this out is to run a small program through and
look at
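A sketch of that experiment, with a hypothetical file and symbol name; the relocation-model names accepted by llc are static, pic, and dynamic-no-pic:

    /* pictest.c - hypothetical test case for comparing code generation
     * under the three relocation models mentioned above:
     *
     *   clang -S -emit-llvm pictest.c -o pictest.ll
     *   llc -relocation-model=static         pictest.ll -o pictest-static.s
     *   llc -relocation-model=dynamic-no-pic pictest.ll -o pictest-dnp.s
     *   llc -relocation-model=pic            pictest.ll -o pictest-pic.s
     *
     * Diffing the .s files shows how the access to the external global
     * changes; roughly, static addresses everything directly, DynamicNoPIC
     * keeps the code at fixed addresses but indirects external data, and
     * full PIC makes the code itself position-independent as well. */
    extern int external_counter;   /* hypothetical external global */

    int bump(void)
    {
        return ++external_counter;
    }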
2010 Oct 10
1
[LLVMdev] More questions about non_lazy_ptr
I have a problem where my LLVM-generated code works on Linux but not on OS
X, and the problem involves non_lazy_ptr.
I have an external symbol named "@gc_safepoint_map", which is generated by
the linker's GCStrategy plugin. Since it is not generated until link time, I
declared it as an external symbol so that the modules that use it can
compile without error.
Here's what the
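For reference, a minimal sketch of the C-side declaration being described (the element type is an assumption); on Darwin the compiler cannot assume the symbol is defined in the same image, so the reference is emitted through an L_gc_safepoint_map$non_lazy_ptr indirection rather than a direct address:

    /* Declared but never defined in this module: the definition is
     * produced at link time by the GC plugin described above. */
    extern unsigned char gc_safepoint_map[];   /* element type assumed */

    const unsigned char *safepoint_map_base(void)
    {
        /* On Darwin this reference goes through a non_lazy_ptr slot. */
        return gc_safepoint_map;
    }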
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> > consistently cheaper (by at least half the latency) than MFENCE. While there
> > was a decent amount of variation, this difference
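A rough userspace re-creation of that comparison (the loop count and the rdtsc-based timing are assumptions, not the original harness); build with gcc -O2 on x86-64:

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>   /* __rdtsc() */

    #define ITERS 10000000UL

    int main(void)
    {
        unsigned long reg_val = 0, mem_val = 0;
        uint64_t t0, t1, t2;
        unsigned long i;

        t0 = __rdtsc();
        for (i = 0; i < ITERS; i++)
            asm volatile("mfence" ::: "memory");
        t1 = __rdtsc();
        for (i = 0; i < ITERS; i++)
            /* xchg with a memory operand is implicitly locked, which is
             * what makes it usable as a full barrier. */
            asm volatile("xchg %0, %1"
                         : "+r"(reg_val), "+m"(mem_val) :: "memory");
        t2 = __rdtsc();

        printf("mfence: %.1f cycles/op, xchg: %.1f cycles/op\n",
               (double)(t1 - t0) / ITERS, (double)(t2 - t1) / ITERS);
        return 0;
    }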
2016 Jan 12
5
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 12:54 PM, Linus Torvalds
<torvalds at linux-foundation.org> wrote:
> On Tue, Jan 12, 2016 at 12:30 PM, Andy Lutomirski <luto at kernel.org> wrote:
>>
>> I recall reading somewhere that lock addl $0, 32(%rsp) or so (maybe even 64)
>> was better because it avoided stomping on very-likely-to-be-hot write
>> buffers.
>
> I suspect it
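The shapes under discussion, written out as illustrative GCC inline asm; the exact displacement is the open question in the thread (the smp_mb() that eventually landed in arch/x86/include/asm/barrier.h uses a small negative offset, while mb() kept mfence), so the offsets below are examples only:

    /* Illustrative userspace sketches of the candidates discussed above,
     * not the definitions that finally landed in the kernel. */

    static inline void barrier_mfence(void)
    {
        asm volatile("mfence" ::: "memory");
    }

    /* Locked add of zero: a full barrier on x86 that leaves memory
     * contents unchanged; the only question is which cache line to dirty. */
    static inline void barrier_lock_add_sp(void)
    {
        asm volatile("lock; addl $0,(%%rsp)" ::: "memory", "cc");
    }

    /* The "32(%rsp) or so (maybe even 64)" variant quoted above, aimed at
     * a dword further from the very top of the stack. */
    static inline void barrier_lock_add_off(void)
    {
        asm volatile("lock; addl $0,32(%%rsp)" ::: "memory", "cc");
    }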
2015 Dec 17
4
[PATCH] virtio_ring: use smp_store_mb
We need a full barrier after writing out the event index; using smp_store_mb
there seems better than open-coding it.
As usual, we need a wrapper to account for strong barriers and non-SMP builds.
It's tempting to use this in vhost as well; for that, we'll
need a variant of smp_store_mb that works on __user pointers.
Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
Seems to give a speedup
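What smp_store_mb() buys at that spot, sketched as a self-contained userspace analogue (the names and types here are illustrative, not virtio code): store the new event index, then a full barrier before the subsequent reads of the ring.

    #include <stdatomic.h>

    /* Userspace analogue of the pattern the patch wants: smp_store_mb(*p, v)
     * is "store v to *p, then a full memory barrier". */
    static inline void store_event_idx_mb(atomic_ushort *event_idx,
                                          unsigned short new_idx)
    {
        atomic_store_explicit(event_idx, new_idx, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);  /* full barrier after the
                                                       index is written out */
    }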
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Results:
perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before:
0.922565990 seconds time elapsed ( +- 1.15% )
After:
0.578667024 seconds time elapsed
2013 Mar 08
0
[LLVMdev] ARM assembler's syntax in clang
> And be warned that the PC doesn't point at the next instruction when you
> use it like this - I believe you don't need to modify it at all if you swap
> the pop and the .long.
Bernie, is it related to the ARM pipeline? I'm interested in this; is there
any additional information?
On Fri, Mar 8, 2013 at 4:59 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> Hi Ashi,
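On the pipeline question: in ARM (A32) state, reading the PC yields the address of the current instruction plus 8, a leftover of the original fetch/decode/execute pipeline, which is why the constant in the pop/.long trick looks off by two instructions. A small sketch (an assumed test harness, ARM state only) that makes the offset visible:

    #include <stdio.h>

    int main(void)
    {
    #if defined(__arm__) && !defined(__thumb__)
        void *pc_read, *mov_addr;

        asm volatile(
            "adr %1, 0f\n\t"      /* %1 = address of the mov below          */
            "0: mov %0, pc\n\t"   /* %0 = PC as seen by that mov (addr + 8) */
            : "=r"(pc_read), "=r"(mov_addr));

        printf("pc read as %p, instruction at %p, offset %ld\n",
               pc_read, mov_addr, (long)((char *)pc_read - (char *)mov_addr));
        /* Expect an offset of 8 in ARM state. */
    #else
        puts("build for 32-bit ARM (A32) to see the +8 offset");
    #endif
        return 0;
    }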
2015 Dec 17
2
[PATCH] virtio_ring: use smp_store_mb
On Thu, Dec 17, 2015 at 11:52:38AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > +static inline void virtio_store_mb(bool weak_barriers,
> > + __virtio16 *p, __virtio16 v)
> > +{
> > +#ifdef CONFIG_SMP
> > + if (weak_barriers)
> > + smp_store_mb(*p, v);
> > + else
> > +#endif
>
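The excerpt cuts off before the strong-barrier branch; a hedged reconstruction of the helper's overall shape, in kernel context (the else path is assumed from the surrounding discussion, not copied from the final patch):

    static inline void virtio_store_mb(bool weak_barriers,
                                       __virtio16 *p, __virtio16 v)
    {
    #ifdef CONFIG_SMP
            if (weak_barriers)
                    smp_store_mb(*p, v);
            else
    #endif
            {
                    WRITE_ONCE(*p, v);
                    mb();   /* strong-barrier path (and the !CONFIG_SMP case) */
            }
    }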
2016 Jan 27
6
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.
So we really should use the locked variant everywhere, except that the Intel manual
says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seem to assume sfence will
order it, so there could be existing bugs around
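The clflush constraint in practice, as a userspace intrinsics sketch (illustrative, not kernel code): the flush needs a real mfence after it, which is exactly why mb() cannot simply become a locked instruction.

    #include <emmintrin.h>   /* _mm_clflush(), _mm_mfence() */

    /* Flush one cache line, then fence so the flush is ordered with later
     * operations. Per the manual language quoted above, clflush is ordered
     * by mfence; relying on sfence or on a lock-prefixed instruction
     * instead is the potential bug the note warns about. */
    static inline void flush_line_then_fence(const void *line)
    {
        _mm_clflush(line);
        _mm_mfence();
    }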
2016 Jan 12
7
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
So let's use the locked variant everywhere - helps keep the code simple as
well.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h.
I hope I'm not splitting this up too much - the reason
2016 Jan 26
2
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> >
> > So let's use the locked variant everywhere - helps keep the code simple as
>