When I request a write-before-read memory barrier on x86 I would expect
to get an assembly instruction that would enforce this ordering (mfence,
xchg, cas), but it just turns into a nop.

; ModuleID = 'test.bc'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
target triple = "i686-apple-darwin9"
@a = external global i32		; <i32*> [#uses=1]
@b = external global i32		; <i32*> [#uses=1]

define i32 @_Z3foov() nounwind {
entry:
	store i32 1, i32* @a, align 4
	tail call void @llvm.memory.barrier(i1 true, i1 true, i1 true, i1 true, i1 false)
	%0 = load i32* @b, align 4		; <i32> [#uses=1]
	ret i32 %0
}

declare void @llvm.memory.barrier(i1, i1, i1, i1, i1) nounwind

compiled with:

	llc -mcpu=core2 -mattr=+sse2,+sse3 -f -o test.s test.bc

becomes:

	.text
	.align	4,0x90
	.globl	__Z3foov
__Z3foov:
	movl	L_a$non_lazy_ptr, %eax
	movl	$1, (%eax)
	nop
	movl	L_b$non_lazy_ptr, %eax
	movl	(%eax), %eax
	ret

	.section	__IMPORT,__pointers,non_lazy_symbol_pointers
L_a$non_lazy_ptr:
	.indirect_symbol	_a
	.long	0
L_b$non_lazy_ptr:
	.indirect_symbol	_b
	.long	0
	.subsections_via_symbols

Is the problem related to the fact that I get i386 from uname -m? If
so, how can I override this setting during compilation?

Thanks,
Luke
On Thu, 2008-09-25 at 10:28 -0400, Luke Dalessandro wrote:
> When I request a write-before-read memory barrier on x86 I would expect
> to get an assembly instruction that would enforce this ordering (mfence,
> xchg, cas), but it just turns into a nop.

In its usual configuration, an x86 family CPU implements a strong memory
ordering constraint for all loads and stores, so as long as the ordering
of the read and write operations is preserved no atomic operation is
required between them. XCHG and CAS only become necessary when you are
coordinating reads and writes across processors. MFENCE similarly.

So the current behavior of LLVM is correct, but there is a valid concern
hiding here: there exist programs that intentionally alter the strong
ordering contract in high-performance applications for the sake of
performance, and in those applications it really is necessary to do some
operation that suitably serializes the memory subsystem on the
processor.

The LLVM team may already have a better answer for this, but my first
reaction is that this is effectively a different target architecture. My
second, and possibly more interesting, reaction is that

  a) There needs to be some means (through annotation) to insist that
     these instructions are not removed. Perhaps some means already
     exists; I have not looked.

  b) It might be interesting to examine whether coherency behavior
     could be handled as an attribute of address spaces in LLVM.
     Offhand, this would seem to require a notion of address spaces
     that are exact duplicates of each other except for coherency
     behavior, but there might be some cleaner way to handle that.

The entire LLVM address space notion intrigues me, and I just haven't
had any chance to dig in to it.

shap
Jonathan S. Shapiro wrote:
> On Thu, 2008-09-25 at 10:28 -0400, Luke Dalessandro wrote:
>> When I request a write-before-read memory barrier on x86 I would expect
>> to get an assembly instruction that would enforce this ordering (mfence,
>> xchg, cas), but it just turns into a nop.
>
> In its usual configuration, an x86 family CPU implements a strong memory
> ordering constraint for all loads and stores, so as long as the ordering
> of the read and write operations is preserved no atomic operation is
> required between them. XCHG and CAS only become necessary when you are
> coordinating reads and writes across processors. MFENCE similarly.

IA32 (http://www.intel.com/products/processor/manuals/318147.pdf) always
allows load bypassing.

I found the problem. llvm-gcc compiles __sync_synchronize() ("a full
memory barrier") as:

	tail call void @llvm.memory.barrier(i1 true, i1 true, i1 true, i1 true, i1 false)

As pointed out on IRC, that 5th parameter being false is what is
generating the nop. If I go in and manually change it to true, I get
the mfence.

Did llvm.memory.barrier always have 5 parameters? What's the purpose of
the 5th? Why isn't requesting an ls barrier enough? I think this might
be a change that llvm-gcc doesn't know about yet (yet == r56496).

Luke
Consider the following example (A and B are global variables that
initially contain 0):

Processor 1:
	store 1, A
	x = load B

Processor 2:
	store 1, B
	y = load A

Is it possible to end up with x = 0 and y = 0? Yes! This is exactly the
example in table 2.3.a of
http://www.intel.com/products/processor/manuals/318147.pdf. Yet it seems
impossible to use gcc memory barriers to prevent this, since they
compile to nothing on x86...

Ciao,

Duncan.
On Thursday 25 September 2008 09:41, Jonathan S. Shapiro wrote:
> In its usual configuration, an x86 family CPU implements a strong memory
> ordering constraint for all loads and stores, so as long as the ordering
> of the read and write operations is preserved no atomic operation is
> required between them. XCHG and CAS only become necessary when you are
> coordinating reads and writes across processors. MFENCE similarly.

That's not quite true. If you use non-temporal stores you need a way to
generate a real mfence.

> So the current behavior of LLVM is correct, but there is a valid concern
> hiding here: there exist programs that intentionally alter the strong
> ordering contract in high-performance applications for the sake of
> performance, and in those applications it really is necessary to do some
> operation that suitably serializes the memory subsystem on the
> processor.

This is going to become more and more common on x86.

> The LLVM team may already have a better answer for this, but my first
> reaction is that this is effectively a different target architecture. My
> second, and possibly more interesting reaction is that

No, it's not a separate target architecture. That would be overkill.

> a) There needs to be some means (through annotation) to insist that
>    these instructions are not removed. Perhaps some means already
>    exists; I have not looked.

As Luke discovered, it's the argument to llvm.memory.barrier that makes
the difference. See X86InstrSSE.td.

> b) It might be interesting to examine whether coherency behavior
>    could be handled as an attribute of address spaces in LLVM.
>    Offhand, this would seem to require a notion of address spaces that
>    are exact duplicates of each other except for coherency behavior,
>    but there might be some cleaner way to handle that.

That's an interesting thought, as that's exactly how WC and non-WC
memory is described in the Opteron manuals.

-Dave
> In its usual configuration, an x86 family CPU implements a strong memory
> ordering constraint for all loads and stores, so as long as the ordering
> of the read and write operations is preserved no atomic operation is
> required between them. XCHG and CAS only become necessary when you are
> coordinating reads and writes across processors. MFENCE similarly.

So... gcc's memory barriers are of no use on a multi-processor system?
These are pretty common nowadays, so that sounds very bad...

Ciao,

Duncan.