There was some discussion a while back about adding a C++0x-style memory model and atomics for LLVM a while back (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/31295), but it got stalled. I'm going to try and restart progress on it. Attached are two patches; the first adds a section to LangRef with just the memory model, without directly changing the documentation or implementation of atomic intrinsics. This mostly comes from https://docs.google.com/View?docID=ddb4mhxz_22dz5g98dd&revision=_latest, but it's been modified a bit, so please look at the attached version. The second fixes the one place I know of where LLVM violates that proposed memory model. I would appreciate any reviews (primarily for the LangRef bits; I'm reasonably confident the patch to LICM is correct). There was some discussion about allowing targets to implement unaligned stores by widening, but I've left that out. No current backend uses such an implementation, so there isn't a substantial benefit, and allowing it would break some current optimizations (like transforming a memset to an unaligned store). See also the FIXME in the LangRef patch about architectures without a proper single-byte store instruction. -Eli -------------- next part -------------- Index: docs/LangRef.html ==================================================================--- docs/LangRef.html (revision 135415) +++ docs/LangRef.html (working copy) @@ -53,6 +53,7 @@ <li><a href="#datalayout">Data Layout</a></li> <li><a href="#pointeraliasing">Pointer Aliasing Rules</a></li> <li><a href="#volatile">Volatile Memory Accesses</a></li> + <li><a href="#memmodel">Memory Model for Concurrent Operations</a></li> </ol> </li> <li><a href="#typesystem">Type System</a> @@ -1470,8 +1471,93 @@ </div> +<!-- ======================================================================= --> +<h3> + <a name="memmodel">Memory Model for Concurrent Operations</a> +</h3> + +<div> + +<p>The LLVM IR does not define any way to start parallel threads of execution +or to register signal handlers. Nonetheless, there are platform-specific +ways to create them, and we define LLVM IR's behavior in their presence. This +model is inspired by the C++0x memory model.</p> + +<p>We define a <i>happens-before</i> partial order as the least partial order +that</p> +<ul> + <li>Is a superset of single-thread program order, and</li> + <li>When a <i>synchronizes-with</i> <tt>b</tt>, includes an edge from + <tt>a</tt> to <tt>b</tt>. <i>Synchronizes-with</i> pairs are introduced + by platform-specific techniques, like pthread locks, thread + creation, thread joining, etc., and by the atomic operations described + in the <a href="#int_atomics">Atomic intrinsics</a> section.</li> +</ul> + +<p>Note that program order does not introduce <i>happens-before</i> edges +between a thread and signals executing inside that thread.</p> + +<p>Every (defined) read operation (load instructions, memcpy, atomic +loads/read-modify-writes, etc.) <var>R</var> reads a series of bytes written by +(defined) write operations (store instructions, atomic +stores/read-modify-writes, memcpy, etc.). For each byte, <var>R</var> reads the +value written by some write that it <i>may see</i>, given any relevant +<i>happens-before</i> constraints. <var>R<sub>byte</sub></var> may +see any write to the same byte, except:</p> + +<ul> + <li>If <var>write<sub>1</sub></var> happens before + <var>write<sub>2</sub></var>, and <var>write<sub>2</sub></var> happens + before <var>R<sub>byte</sub></var>, then <var>R<sub>byte</sub></var> + must not see <var>write<sub>1</sub></var>. + <li>If <var>R<sub>byte</sub></var> happens before <var>write<sub>3</var>, + then <var>R<sub>byte</sub></var> must not see + <var>write<sub>3</sub></var>. +</ul> + +<p>Given that definition, <var>R<sub>byte</sub></var> is defined as follows: +<ul> + <li>If there is no write to the same byte that happens before + <var>R<sub>byte</sub></var>, <var>R<sub>byte</sub></var> returns + <tt>undef</tt> for that byte. + <li>If <var>R<sub>byte</sub></var> may see exactly one write, + <var>R<sub>byte</sub></var> returns the value written by that + write.</li> + <li>If <var>R<sub>byte</sub></var> and all the writes it may see are + atomic, it chooses one of those writes and returns it value. + Given any two bytes in a given read <var>R</var>, if the set of + writes <var>R<sub>byte</sub></var> may see is the same as the set + of writes another byte may see, they will both choose the same write. + <li>Otherwise <var>R<sub>byte</sub></var> returns <tt>undef</tt>.</li> +</ul> + +<p><var>R</var> returns the value composed of the series of bytes it read. +This implies that some bytes within the value may be <tt>undef</tt> +<b>without</b> the entire value being <tt>undef</tt>. Note that this only +defines the semantics of the operation; it doesn't mean that targets will +emit more than one instruction to read the series of bytes.</p> + +<p>Note that in cases where none of the atomic intrinsics are used, this model +places only one restriction on IR transformations on top of what is required +for single-threaded execution: introducing a store to a byte which might not +otherwise be stored to can introduce undefined behavior.</p> + +<!-- FIXME: This model assumes all targets where concurrency is relevant have +a byte-size store which doesn't affect adjacent bytes. As far as I can tell, +none of the backends currently in the tree fall into this category; however, +there might be targets which care. If there are, we want a paragraph +like the following: + +Targets may specify that stores narrower than a certain width are not +available; on such a target, for the purposes of this model, treat any non-atomic +write with an alignment or width less than the minimum width as if it writes +to the relevant surrounding bytes. +--> + </div> +</div> + <!-- *********************************************************************** --> <h2><a name="typesystem">Type System</a></h2> <!-- *********************************************************************** --> -------------- next part -------------- Index: test/Transforms/LICM/scalar-promote-memmodel.ll ==================================================================--- test/Transforms/LICM/scalar-promote-memmodel.ll (revision 0) +++ test/Transforms/LICM/scalar-promote-memmodel.ll (revision 0) @@ -0,0 +1,37 @@ +; RUN: opt < %s -basicaa -licm -S | FileCheck %s + +; Make sure we don't hoist a conditionally-executed store out of the loop; +; it would violate the concurrency memory model + + at g = common global i32 0, align 4 + +define void @bar(i32 %n, i32 %b) nounwind uwtable ssp { +entry: + br label %for.cond + +for.cond: ; preds = %for.inc, %entry + %i.0 = phi i32 [ 0, %entry ], [ %inc5, %for.inc ] + %cmp = icmp slt i32 %i.0, %n + br i1 %cmp, label %for.body, label %for.end + +for.body: ; preds = %for.cond + %tobool = icmp eq i32 %b, 0 + br i1 %tobool, label %for.inc, label %if.then + +if.then: ; preds = %for.body + %tmp3 = load i32* @g, align 4 + %inc = add nsw i32 %tmp3, 1 + store i32 %inc, i32* @g, align 4 + br label %for.inc + +; CHECK: load i32* +; CHECK-NEXT: add +; CHECK-NEXT: store i32 + +for.inc: ; preds = %for.body, %if.then + %inc5 = add nsw i32 %i.0, 1 + br label %for.cond + +for.end: ; preds = %for.cond + ret void +} Index: lib/Transforms/Scalar/LICM.cpp ==================================================================--- lib/Transforms/Scalar/LICM.cpp (revision 135415) +++ lib/Transforms/Scalar/LICM.cpp (working copy) @@ -151,6 +151,11 @@ /// bool isSafeToExecuteUnconditionally(Instruction &I); + /// isGuaranteedToExecute - Check that the instruction is guaranteed to + /// execute. + /// + bool isGuaranteedToExecute(Instruction &I); + /// pointerInvalidatedByLoop - Return true if the body of this loop may /// store into the memory location pointed to by V. /// @@ -577,6 +582,10 @@ if (Inst.isSafeToSpeculativelyExecute()) return true; + return isGuaranteedToExecute(Inst); +} + +bool LICM::isGuaranteedToExecute(Instruction &Inst) { // Otherwise we have to check to make sure that the instruction dominates all // of the exit blocks. If it doesn't, then there is a path out of the loop // which does not execute this instruction, so we can't hoist it. @@ -713,34 +722,37 @@ // If there is an non-load/store instruction in the loop, we can't promote // it. - unsigned InstAlignment; - if (LoadInst *load = dyn_cast<LoadInst>(Use)) { + if (isa<LoadInst>(Use)) { assert(!cast<LoadInst>(Use)->isVolatile() && "AST broken"); - InstAlignment = load->getAlignment(); } else if (StoreInst *store = dyn_cast<StoreInst>(Use)) { // Stores *of* the pointer are not interesting, only stores *to* the // pointer. if (Use->getOperand(1) != ASIV) continue; - InstAlignment = store->getAlignment(); + unsigned InstAlignment = store->getAlignment(); assert(!cast<StoreInst>(Use)->isVolatile() && "AST broken"); + + // Note that we only check GuaranteedToExecute inside the store case + // so that we do not introduce stores where they did not exist before + // (which would break the LLVM concurrency model). + + // If the alignment of this instruction allows us to specify a more + // restrictive (and performant) alignment and if we are sure this + // instruction will be executed, update the alignment. + // Larger is better, with the exception of 0 being the best alignment. + if ((InstAlignment > Alignment || InstAlignment == 0) + && (Alignment != 0)) + if (isGuaranteedToExecute(*Use)) { + GuaranteedToExecute = true; + Alignment = InstAlignment; + } + + if (!GuaranteedToExecute) + GuaranteedToExecute = isGuaranteedToExecute(*Use); + } else return; // Not a load or store. - // If the alignment of this instruction allows us to specify a more - // restrictive (and performant) alignment and if we are sure this - // instruction will be executed, update the alignment. - // Larger is better, with the exception of 0 being the best alignment. - if ((InstAlignment > Alignment || InstAlignment == 0) - && (Alignment != 0)) - if (isSafeToExecuteUnconditionally(*Use)) { - GuaranteedToExecute = true; - Alignment = InstAlignment; - } - - if (!GuaranteedToExecute) - GuaranteedToExecute = isSafeToExecuteUnconditionally(*Use); - LoopUses.push_back(Use); } }
I noticed the patch was already merged into the current LLVM language reference manual with new memory instructions, fence, cmpxchg and atomicrmw. Will the instructions be available in LLVM 3.0? On Mon, Jul 18, 2011 at 8:22 PM, Eli Friedman <eli.friedman at gmail.com> wrote:> There was some discussion a while back about adding a C++0x-style > memory model and atomics for LLVM a while back > (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/31295), but > it got stalled. I'm going to try and restart progress on it. > > Attached are two patches; the first adds a section to LangRef with > just the memory model, without directly changing the documentation or > implementation of atomic intrinsics. This mostly comes from > https://docs.google.com/View?docID=ddb4mhxz_22dz5g98dd&revision=_latest, > but it's been modified a bit, so please look at the attached version. > The second fixes the one place I know of where LLVM violates that > proposed memory model. > > I would appreciate any reviews (primarily for the LangRef bits; I'm > reasonably confident the patch to LICM is correct). > > There was some discussion about allowing targets to implement > unaligned stores by widening, but I've left that out. No current > backend uses such an implementation, so there isn't a substantial > benefit, and allowing it would break some current optimizations (like > transforming a memset to an unaligned store). See also the FIXME in > the LangRef patch about architectures without a proper single-byte > store instruction. > > -Eli > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-- Jianzhou
The current memory model section ends with the following discussions: "Note that in cases where none of the atomic intrinsics are used, this model places only one restriction on IR transformations on top of what is required for single-threaded execution: introducing a store to a byte which might not otherwise be stored to can introduce undefined behavior.... " Why is introducing additional loads allowed? For example, in a program that already contains two unordered stores to a location l, if we introduced an unordered load to l, then the load cannot see more than one stores, it must return undef. The question is whether the memory model takes the two unordered stores as a data race that has already makes the program's behavior undefined. But the spec does not make this point clear. My second question is whether the "undefined behavior" in "... introducing a store to a byte which might not otherwise be stored to can introduce undefined behavior.... " is same to the ``undef'' in the definition of R_{byte} that can be undef in several cases. Does the undef mean the LLVM undef values (http://llvm.org/docs/LangRef.html#undefvalues)? If "undefined behavior" means a program can do anything, why does R_{byte} have to return undef? Thanks. On Mon, Jul 18, 2011 at 8:22 PM, Eli Friedman <eli.friedman at gmail.com> wrote:> There was some discussion a while back about adding a C++0x-style > memory model and atomics for LLVM a while back > (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/31295), but > it got stalled. I'm going to try and restart progress on it. > > Attached are two patches; the first adds a section to LangRef with > just the memory model, without directly changing the documentation or > implementation of atomic intrinsics. This mostly comes from > https://docs.google.com/View?docID=ddb4mhxz_22dz5g98dd&revision=_latest, > but it's been modified a bit, so please look at the attached version. > The second fixes the one place I know of where LLVM violates that > proposed memory model. > > I would appreciate any reviews (primarily for the LangRef bits; I'm > reasonably confident the patch to LICM is correct). > > There was some discussion about allowing targets to implement > unaligned stores by widening, but I've left that out. No current > backend uses such an implementation, so there isn't a substantial > benefit, and allowing it would break some current optimizations (like > transforming a memset to an unaligned store). See also the FIXME in > the LangRef patch about architectures without a proper single-byte > store instruction. > > -Eli > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-- Jianzhou
On Sun, Jul 31, 2011 at 3:04 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:> The current memory model section ends with the following discussions: > > "Note that in cases where none of the atomic intrinsics are used, this > model places only one restriction on IR transformations on top of what > is required for single-threaded execution: introducing a store to a > byte which might not otherwise be stored to can introduce undefined > behavior.... " > > Why is introducing additional loads allowed? For example, in a program > that already contains two unordered stores to a location l, if we > introduced an unordered load to l, then the load cannot see more than > one stores, it must return undef.The intent of the model is that if the behavior of a program doesn't depend on the value of a load, it's okay if it is undef. This allows, for example, hoisting loads which are conditionally executed out of loops.> My second question is whether the "undefined behavior" in "... > introducing a store to a byte which might not otherwise be stored to > can introduce undefined behavior.... " is same to the ``undef'' in the > definition of R_{byte} that can be undef in several cases. Does the > undef mean the LLVM undef values > (http://llvm.org/docs/LangRef.html#undefvalues)? If "undefined > behavior" means a program can do anything, why does R_{byte} have to > return undef? Thanks."can introduce undefined behavior", I meant "Specifically, in the case where another thread might write to and read from an address, introducing a store can change a load that may see exactly one write into a load that may see multiple writes." Any suggestion for how to phrase that paragraph more clearly? -Eli
On Sun, Jul 31, 2011 at 12:49 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:> I noticed the patch was already merged into the current LLVM language > reference manual with new memory instructions, fence, cmpxchg and > atomicrmw. Will the instructions be available in LLVM 3.0?Hopefully, yes; the implementation is in progress. -Eli
C++ and Java memory models impose restrictions for locks and unlocks, such as a thread that releases a lock must acquired the lock, or the number of locks must be larger than the number of unlocks in the same thread... for enabling some optimizations, for example, simplifying trylocks (http://www.hpl.hp.com/techreports/2008/HPL-2008-56.html), and moving some instructions inside lock acquires (http://www.hpl.hp.com/techreports/2005/HPL-2005-217R1.html). What is the rationale that the LLVM memory model ignores such restrictions? On Mon, Jul 18, 2011 at 8:22 PM, Eli Friedman <eli.friedman at gmail.com> wrote:> There was some discussion a while back about adding a C++0x-style > memory model and atomics for LLVM a while back > (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/31295), but > it got stalled. I'm going to try and restart progress on it. > > Attached are two patches; the first adds a section to LangRef with > just the memory model, without directly changing the documentation or > implementation of atomic intrinsics. This mostly comes from > https://docs.google.com/View?docID=ddb4mhxz_22dz5g98dd&revision=_latest, > but it's been modified a bit, so please look at the attached version. > The second fixes the one place I know of where LLVM violates that > proposed memory model. > > I would appreciate any reviews (primarily for the LangRef bits; I'm > reasonably confident the patch to LICM is correct). > > There was some discussion about allowing targets to implement > unaligned stores by widening, but I've left that out. No current > backend uses such an implementation, so there isn't a substantial > benefit, and allowing it would break some current optimizations (like > transforming a memset to an unaligned store). See also the FIXME in > the LangRef patch about architectures without a proper single-byte > store instruction. > > -Eli > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-- Jianzhou
On Sun, Jul 31, 2011 at 8:16 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:> C++ and Java memory models impose restrictions for locks and unlocks, > such as a thread that releases a lock must acquired the lock, or the > number of locks must be larger than the number of unlocks in the same > thread... for enabling some optimizations, for example, simplifying > trylocks (http://www.hpl.hp.com/techreports/2008/HPL-2008-56.html), > and moving some instructions inside lock acquires > (http://www.hpl.hp.com/techreports/2005/HPL-2005-217R1.html). What is > the rationale that the LLVM memory model ignores such restrictions?The LLVM memory model doesn't include locks at all, so it can't include restrictions about them. One would implement locks, whether C++0x, posix, or something else, using the atomic instructions in Eli's patches, along with syscalls like futex or kqueue. C++0x's restrictions on locks allow an implementor to choose cheaper LLVM instructions than they'd otherwise have to. For example, the fact that try_lock is allowed to fail spuriously allows lock and try_lock to be implemented with acquire cmpxchg instructions rather than acq_rel cmpxchg instructions. The current cmpxchg instruction still leaves a small amount of performance on the table by having only a single ordering argument, rather than the two arguments that C++0x takes. In the case of try_lock, I expect this to manifest as an unnecessary fence on the failing path on ARM and PPC. Only time will tell whether this matters in practice, but if it does matter, I expect it to be straightforward to add the second argument. Jeffrey
In the definition of 'monotonic' ordering, ... "If an address is written monotonically by one thread, and other threads monotonically read that address repeatedly, the other threads must eventually see the write"... Does this mean if a thread does multi-writes monotonically, monotonic reads from other threads should see all of them? But intuitively, it seems to be fine for a read to ``miss'' some of the writes as long as the writes seen are monotonic in the sense that later reads should see the same write of earlier reads, or any write monotonically after the writes seen. In the case there is only one monotonic write, what does 'eventually' mean? Can we know a write must be seen when some condition holds, for example, a number of instructions executed, the thread that did the write executes a fence, ...? C++ memory model does not have ``unordered'', and "monotonic", but have "modification ordering" (is it same to the relaxed atomic variables the LLVM IR mentions?). If I am compiling C++ to LLVM, can all modification atomic be compiled to monotonic? And when should we use "unordered"? On Mon, Jul 18, 2011 at 8:22 PM, Eli Friedman <eli.friedman at gmail.com> wrote:> There was some discussion a while back about adding a C++0x-style > memory model and atomics for LLVM a while back > (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/31295), but > it got stalled. I'm going to try and restart progress on it. > > Attached are two patches; the first adds a section to LangRef with > just the memory model, without directly changing the documentation or > implementation of atomic intrinsics. This mostly comes from > https://docs.google.com/View?docID=ddb4mhxz_22dz5g98dd&revision=_latest, > but it's been modified a bit, so please look at the attached version. > The second fixes the one place I know of where LLVM violates that > proposed memory model. > > I would appreciate any reviews (primarily for the LangRef bits; I'm > reasonably confident the patch to LICM is correct). > > There was some discussion about allowing targets to implement > unaligned stores by widening, but I've left that out. No current > backend uses such an implementation, so there isn't a substantial > benefit, and allowing it would break some current optimizations (like > transforming a memset to an unaligned store). See also the FIXME in > the LangRef patch about architectures without a proper single-byte > store instruction. > > -Eli > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-- Jianzhou
On Mon, Aug 22, 2011 at 9:55 AM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:> In the definition of 'monotonic' ordering, > ... "If an address is written monotonically by one thread, and other > threads monotonically read that address repeatedly, the other threads > must eventually see the write"...It's supposed to mean that if you have a something like looks like a spinloop with monotonic reads, it shouldn't spin forever if the value changes. I'll take another look at rewording that.> Does this mean if a thread does multi-writes monotonically, monotonic > reads from other threads should see all of them? But intuitively, it > seems to be fine for a read to ``miss'' some of the writes as long as > the writes seen are monotonic in the sense that later reads should see > the same write of earlier reads, or any write monotonically after the > writes seen. > > In the case there is only one monotonic write, what does 'eventually' > mean? Can we know a write must be seen when some condition holds, for > example, a number of instructions executed, the thread that did the > write executes a fence, ...? > > C++ memory model does not have ``unordered'', and "monotonic", but > have "modification ordering" (is it same to the relaxed atomic > variables the LLVM IR mentions?). If I am compiling C++ to LLVM, can > all modification atomic be compiled to monotonic? And when should we > use "unordered"?http://llvm.org/docs/Atomics.html is an attempt to make things much more straightforward than the stuff in LangRef. -Eli