On Mon, Aug 22, 2011 at 3:49 PM, Eli Friedman <eli.friedman at gmail.com> wrote:> On Mon, Aug 22, 2011 at 3:40 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote: >> On Mon, Aug 22, 2011 at 6:08 PM, Eli Friedman <eli.friedman at gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 2:49 PM, Santosh Nagarakatte >>> <santosh.nagarakatte at gmail.com> wrote: >>>> Hi all, >>>> >>>> I have been trying to understand the use of undef in both sequential >>>> and concurrent programs. >>>> >>>> >From the LLVM Language Reference Manual, I see the following >>>> definition of undef. >>>> "Undef can be used anywhere a constant is expected, and indicates that >>>> the user of the value may receive an unspecified bit-pattern". >>>> LLVM Language Reference manual also demonstrates how optimizers can >>>> use these undef values to optimize the program. >>>> >>>> However, on the other hand, with the LLVM Atomics and Concurrency >>>> Guide states that >>>> If code accesses a memory location from multiple threads at the same >>>> time, the resulting loads return 'undef'. >>>> This is different from the C++ memory model, which provides undefined >>>> behavior. What is the rationale for returning an undef on racing >>>> reads? >>>> >>>> LLVM Atomics and Concurrency guide also states the following >>>> "Note that speculative loads are allowed; a load which is part of a >>>> race returns undef, but does not have undefined behavior" >>>> >>>> If the speculative loads returns an undef and the returned value is >>>> used, then it results in an undefined behavior. Am I correct? >>> >>> It behaves like any other undef value... which do often lead to >>> undefined behavior. >>> >>>> If so, what is the purpose of returning an undef with a speculative load? >>>> Is it to ensure that the subsequent uses of the value of the >>>> speculatively introduced load is caught/detected by the optimization? >>> >>> The point is primarily to allow optimizations like LICM to introduce >>> loads whose value is never used. It also keeps consistent semantics >>> through CodeGen, where some targets widen loads. >>> >>>> Is it possible to separate the "undef" in a sequential setting and >>>> "undef" with speculative loads in a concurrent setting with separate >>>> undefs? >>> >>> The intention is that they should have the same semantics. >> >> Suppose there are three threads T1, T2 and T3, >> T1 (S1 )stores to a location l as non-atomic, >> T2 then (S2)stores to l as SC-atomic, >> later T3 (L3)loads from l as SC-atomic. >> >> I think the load @ T3 should return undef, since it can see both >> writes from T1 T2. Then the question is if the SC store @ T2 --- S2 >> and the SC load @ T3 --- L3 introduces an acq/rel (synchronized-with) >> edge. >> >> This will affect if later conflicting accesses are ordered or not, and >> whether memory accesses are ordered makes load return undef or not. >> >> If the S2 and L3 still introduces an SW edge, the next question is >> suppose there is a later SC-load L3' @ T3, does it also return undef? >> But this potentially makes the SC atomic sequence S2 L3 L3' >> inconsistent --- later SC loads can read different writes from earlier >> loads if there are no SC-stores in between. >> >> So I think data-races SC/acq/rel atomics cannot introduce SW edges. >> Intuitively if the locations designed for locks are used by non-atomic >> memory accesses, the locks cannot behave correctly. Is it correct? > > Yes, that is correct. I am now convinced that it is worthwhile to > include the definition of how SW edges form in LangRef.It's already there: "We define a happens-before partial order as the least partial order that * Is a superset of single-thread program order, and * When a synchronizes-with b, includes an edge from a to b. Synchronizes-with pairs are introduced by platform-specific techniques, like pthread locks, thread creation, thread joining, etc., and by atomic instructions. (See also Atomic Memory Ordering Constraints)." ... "acquire - In addition to the guarantees of monotonic, if this operation reads a value written by a release atomic operation, it synchronizes-with that operation." (Strictly, the release operation synchronizes-with the acquire, not the other way around.) Since atomic/non-atomic races are defined to return undef from the load, even if the load has seq_cst ordering, the load never reads a value written, so none of the stores synchronize with the load. The text does say that all seq_cst loads and stores participate in the global seq_cst ordering that's compatible with the happens-before ordering, but that doesn't imply that happens-before is a superset of the seq_cst ordering. I'm not sure whether any paradoxes arise from allowing racy seq_cst loads to return undef, but I haven't seen any examples so far. I don't object to clarifying the text, of course. Jeffrey
On Mon, Aug 22, 2011 at 7:27 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:> On Mon, Aug 22, 2011 at 3:49 PM, Eli Friedman <eli.friedman at gmail.com> wrote: >> On Mon, Aug 22, 2011 at 3:40 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote: >>> On Mon, Aug 22, 2011 at 6:08 PM, Eli Friedman <eli.friedman at gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 2:49 PM, Santosh Nagarakatte >>>> <santosh.nagarakatte at gmail.com> wrote: >>>>> Hi all, >>>>> >>>>> I have been trying to understand the use of undef in both sequential >>>>> and concurrent programs. >>>>> >>>>> >From the LLVM Language Reference Manual, I see the following >>>>> definition of undef. >>>>> "Undef can be used anywhere a constant is expected, and indicates that >>>>> the user of the value may receive an unspecified bit-pattern". >>>>> LLVM Language Reference manual also demonstrates how optimizers can >>>>> use these undef values to optimize the program. >>>>> >>>>> However, on the other hand, with the LLVM Atomics and Concurrency >>>>> Guide states that >>>>> If code accesses a memory location from multiple threads at the same >>>>> time, the resulting loads return 'undef'. >>>>> This is different from the C++ memory model, which provides undefined >>>>> behavior. What is the rationale for returning an undef on racing >>>>> reads? >>>>> >>>>> LLVM Atomics and Concurrency guide also states the following >>>>> "Note that speculative loads are allowed; a load which is part of a >>>>> race returns undef, but does not have undefined behavior" >>>>> >>>>> If the speculative loads returns an undef and the returned value is >>>>> used, then it results in an undefined behavior. Am I correct? >>>> >>>> It behaves like any other undef value... which do often lead to >>>> undefined behavior. >>>> >>>>> If so, what is the purpose of returning an undef with a speculative load? >>>>> Is it to ensure that the subsequent uses of the value of the >>>>> speculatively introduced load is caught/detected by the optimization? >>>> >>>> The point is primarily to allow optimizations like LICM to introduce >>>> loads whose value is never used. It also keeps consistent semantics >>>> through CodeGen, where some targets widen loads. >>>> >>>>> Is it possible to separate the "undef" in a sequential setting and >>>>> "undef" with speculative loads in a concurrent setting with separate >>>>> undefs? >>>> >>>> The intention is that they should have the same semantics. >>> >>> Suppose there are three threads T1, T2 and T3, >>> T1 (S1 )stores to a location l as non-atomic, >>> T2 then (S2)stores to l as SC-atomic, >>> later T3 (L3)loads from l as SC-atomic. >>> >>> I think the load @ T3 should return undef, since it can see both >>> writes from T1 T2. Then the question is if the SC store @ T2 --- S2 >>> and the SC load @ T3 --- L3 introduces an acq/rel (synchronized-with) >>> edge. >>> >>> This will affect if later conflicting accesses are ordered or not, and >>> whether memory accesses are ordered makes load return undef or not. >>> >>> If the S2 and L3 still introduces an SW edge, the next question is >>> suppose there is a later SC-load L3' @ T3, does it also return undef? >>> But this potentially makes the SC atomic sequence S2 L3 L3' >>> inconsistent --- later SC loads can read different writes from earlier >>> loads if there are no SC-stores in between. >>> >>> So I think data-races SC/acq/rel atomics cannot introduce SW edges. >>> Intuitively if the locations designed for locks are used by non-atomic >>> memory accesses, the locks cannot behave correctly. Is it correct? >> >> Yes, that is correct. I am now convinced that it is worthwhile to >> include the definition of how SW edges form in LangRef. > > It's already there: > > "We define a happens-before partial order as the least partial order that > > * Is a superset of single-thread program order, and > * When a synchronizes-with b, includes an edge from a to b. > Synchronizes-with pairs are introduced by platform-specific > techniques, like pthread locks, thread creation, thread joining, etc., > and by atomic instructions. (See also Atomic Memory Ordering > Constraints)." > > ... > > "acquire - In addition to the guarantees of monotonic, if this > operation reads a value written by a release atomic operation, it > synchronizes-with that operation." > > (Strictly, the release operation synchronizes-with the acquire, not > the other way around.) > > Since atomic/non-atomic races are defined to return undef from the > load, even if the load has seq_cst ordering, the load never reads a > value written, so none of the stores synchronize with the load.A undef can be replaced by any concrete value. If the undef returned from the racy SC load happens to be instantiated by a value of the latest SC store, does it consider as "... reads a value written by ..."? I think our answer is still no, right?> > The text does say that all seq_cst loads and stores participate in the > global seq_cst ordering that's compatible with the happens-before > ordering, but that doesn't imply that happens-before is a superset of > the seq_cst ordering. I'm not sure whether any paradoxes arise from > allowing racy seq_cst loads to return undef, but I haven't seen any > examples so far. > > I don't object to clarifying the text, of course. > > Jeffrey >-- Jianzhou
On Mon, Aug 22, 2011 at 5:29 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:> On Mon, Aug 22, 2011 at 7:27 PM, Jeffrey Yasskin <jyasskin at google.com> wrote: >> On Mon, Aug 22, 2011 at 3:49 PM, Eli Friedman <eli.friedman at gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 3:40 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote: >>>> On Mon, Aug 22, 2011 at 6:08 PM, Eli Friedman <eli.friedman at gmail.com> wrote: >>>>> On Mon, Aug 22, 2011 at 2:49 PM, Santosh Nagarakatte >>>>> <santosh.nagarakatte at gmail.com> wrote: >>>>>> Hi all, >>>>>> >>>>>> I have been trying to understand the use of undef in both sequential >>>>>> and concurrent programs. >>>>>> >>>>>> >From the LLVM Language Reference Manual, I see the following >>>>>> definition of undef. >>>>>> "Undef can be used anywhere a constant is expected, and indicates that >>>>>> the user of the value may receive an unspecified bit-pattern". >>>>>> LLVM Language Reference manual also demonstrates how optimizers can >>>>>> use these undef values to optimize the program. >>>>>> >>>>>> However, on the other hand, with the LLVM Atomics and Concurrency >>>>>> Guide states that >>>>>> If code accesses a memory location from multiple threads at the same >>>>>> time, the resulting loads return 'undef'. >>>>>> This is different from the C++ memory model, which provides undefined >>>>>> behavior. What is the rationale for returning an undef on racing >>>>>> reads? >>>>>> >>>>>> LLVM Atomics and Concurrency guide also states the following >>>>>> "Note that speculative loads are allowed; a load which is part of a >>>>>> race returns undef, but does not have undefined behavior" >>>>>> >>>>>> If the speculative loads returns an undef and the returned value is >>>>>> used, then it results in an undefined behavior. Am I correct? >>>>> >>>>> It behaves like any other undef value... which do often lead to >>>>> undefined behavior. >>>>> >>>>>> If so, what is the purpose of returning an undef with a speculative load? >>>>>> Is it to ensure that the subsequent uses of the value of the >>>>>> speculatively introduced load is caught/detected by the optimization? >>>>> >>>>> The point is primarily to allow optimizations like LICM to introduce >>>>> loads whose value is never used. It also keeps consistent semantics >>>>> through CodeGen, where some targets widen loads. >>>>> >>>>>> Is it possible to separate the "undef" in a sequential setting and >>>>>> "undef" with speculative loads in a concurrent setting with separate >>>>>> undefs? >>>>> >>>>> The intention is that they should have the same semantics. >>>> >>>> Suppose there are three threads T1, T2 and T3, >>>> T1 (S1 )stores to a location l as non-atomic, >>>> T2 then (S2)stores to l as SC-atomic, >>>> later T3 (L3)loads from l as SC-atomic. >>>> >>>> I think the load @ T3 should return undef, since it can see both >>>> writes from T1 T2. Then the question is if the SC store @ T2 --- S2 >>>> and the SC load @ T3 --- L3 introduces an acq/rel (synchronized-with) >>>> edge. >>>> >>>> This will affect if later conflicting accesses are ordered or not, and >>>> whether memory accesses are ordered makes load return undef or not. >>>> >>>> If the S2 and L3 still introduces an SW edge, the next question is >>>> suppose there is a later SC-load L3' @ T3, does it also return undef? >>>> But this potentially makes the SC atomic sequence S2 L3 L3' >>>> inconsistent --- later SC loads can read different writes from earlier >>>> loads if there are no SC-stores in between. >>>> >>>> So I think data-races SC/acq/rel atomics cannot introduce SW edges. >>>> Intuitively if the locations designed for locks are used by non-atomic >>>> memory accesses, the locks cannot behave correctly. Is it correct? >>> >>> Yes, that is correct. I am now convinced that it is worthwhile to >>> include the definition of how SW edges form in LangRef. >> >> It's already there: >> >> "We define a happens-before partial order as the least partial order that >> >> * Is a superset of single-thread program order, and >> * When a synchronizes-with b, includes an edge from a to b. >> Synchronizes-with pairs are introduced by platform-specific >> techniques, like pthread locks, thread creation, thread joining, etc., >> and by atomic instructions. (See also Atomic Memory Ordering >> Constraints)." >> >> ... >> >> "acquire - In addition to the guarantees of monotonic, if this >> operation reads a value written by a release atomic operation, it >> synchronizes-with that operation." >> >> (Strictly, the release operation synchronizes-with the acquire, not >> the other way around.) >> >> Since atomic/non-atomic races are defined to return undef from the >> load, even if the load has seq_cst ordering, the load never reads a >> value written, so none of the stores synchronize with the load. > > A undef can be replaced by any concrete value. If the undef returned > from the racy SC load happens to be instantiated by a value of the > latest SC store, does it consider as "... reads a value written by > ..."? I think our answer is still no, right?Right; semantically, undef isn't any concrete value, even though backends are forced to eventually turn it into one. -Eli