thr3ads.net - search: "seq

Displaying 20 results from an estimated 51 matches for "seq_cst".

[LLVMdev] Intended semantics for ``fence seq_cst``

2013 Jul 31

Hi, TL;DR: should we add a new memory ordering to fences? ``fence seq_cst`` is currently used to represent two things: - GCC-style builtin ``__sync_synchronize()`` [0][1]. - C11/C++11's sequentially-consistent thread fence ``std::atomic_thread_fence(std::memory_order_seq_cst)`` [2]. As far as I understand: - The former orders all memory and emits an actual fence i...
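For readers unfamiliar with the two constructs named in this excerpt, here is a minimal C++ sketch of each. The function names `thread_a`, `thread_b`, and `legacy_full_barrier` are hypothetical and do not come from the thread; the builtin `__sync_synchronize()` is assumed to be available, which holds for GCC and Clang but not for every compiler. The first two functions are the halves of the classic store-buffering test: with a seq_cst fence between the store and the load in each thread, the C++11 model rules out both loads returning 0.

```cpp
#include <atomic>

// Thread A's half of a store-buffering test: store x, fence, then load y.
int thread_a(std::atomic<int>& x, std::atomic<int>& y) {
    x.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // C++11 seq_cst fence
    return y.load(std::memory_order_relaxed);
}

// Thread B's half: store y, fence, then load x.
int thread_b(std::atomic<int>& x, std::atomic<int>& y) {
    y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // C++11 seq_cst fence
    return x.load(std::memory_order_relaxed);
}

// The legacy GCC-style builtin discussed in the thread (GCC/Clang only).
// It is not part of the C++ standard, so its guarantees are compiler-defined.
inline void legacy_full_barrier() {
    __sync_synchronize();
}
```

In a real test each of `thread_a` and `thread_b` would run on its own `std::thread`; calling them one after the other on a single thread is simply one of the allowed interleavings.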

[LLVMdev] Intended semantics for ``fence seq_cst``

2013 Jul 31

2013/7/31 JF Bastien <jfb at google.com>: > Hi, > > TL;DR: should we add a new memory ordering to fences? > > > ``fence seq_cst`` is currently used to represent two things: > - GCC-style builtin ``__sync_synchronize()`` [0][1]. > - C11/C++11's sequentially-consistent thread fence > ``std::atomic_thread_fence(std::memory_order_seq_cst)`` [2]. > > As far as I understand: > - The former orders all memor...

[LLVMdev] [RFC] Add second "failure" AtomicOrdering to cmpxchg instruction

2014 Mar 07

...+ case Monotonic: Out << " monotonic"; break; + case Acquire: Out << " acquire"; break; + case Release: Out << " release"; break; + case AcquireRelease: Out << " acq_rel"; break; + case SequentiallyConsistent: Out << " seq_cst"; break; + } + + switch (FailureOrdering) { + default: Out << " <bad ordering " << int(FailureOrdering) << ">"; break; + case Unordered: Out << " unordered"; break; + case Monotonic: Out << " monotonic"; break; +...

[LLVMdev] Intended semantics for ``fence seq_cst``

2013 Jul 31

...y model, it doesn't matter that it's undefined behavior and relies on a GCC-style builtin to be "correct". The current standards offer all you need to write new code that can express the above intended behavior, but __sync_synchronize isn't a 1:1 mapping to atomic_thread_fence(seq_cst), it has stronger semantics and that's constraining which optimizations can be done on ``fence seq_cst``. LLVM therefore probably wants to distinguish both, so that it can fully optimize C++11 code without leaving legacy code in a bad position. 2013/7/31 Jeffrey Yasskin <jyasskin at google....

[LLVMdev] Intended semantics for ``fence seq_cst``

2013 Aug 01

Ok, so the semantics of your fence would be that it's a volatile memory access (http://llvm.org/docs/LangRef.html#volatile-memory-accesses), and that it provides happens-before edges for volatile accesses in the same way that a seq_cst fence provides for atomic accesses. FWIW, I don't think we should add that, because it's an attempt to define behavior that's undefined for other reasons (the data race on the volatile). If you (PNaCl?) explicitly want to define the behavior of legacy code that used 'volatile'...

[LLVMdev] Plan to optimize atomics in LLVM

2014 Aug 08

> I am planning on doing it in IR, but with target-specific passes (such as X86ExpandAtomicPass) > that just share some of the code This would more normally be done via target hooks in LLVM, though the principle is sound. > But it must be target-dependent as for example on Power a > seq_cst store has a fence before it, while on ARM it has a fence > both before and after it (per http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html) That certainly seems to suggest some kind of parametrisation. > For this exact reason, I am planning on splitting AtomicExpandLoadLinkedPass > i...

RFC: atomic operations on SI+

2016 Mar 28

...+}}:{{[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 offset:16{{$}} > +define void @atomic_cmpxchg_i32_offset(i32 addrspace(1)* %out, i32 %in, i32 %old) { > +entry: > + %gep = getelementptr i32, i32 addrspace(1)* %out, i32 4 > + %0 = cmpxchg volatile i32 addrspace(1)* %gep, i32 %old, i32 %in seq_cst seq_cst > + ret void > +} > + > +; FUNC-LABEL: {{^}}atomic_cmpxchg_i32_ret_offset: > +; GCN: buffer_atomic_cmpswap v{{\[}}[[RET:[0-9]+]]{{:[0-9]+}}], s[{{[0-9]+}}:{{[0-9]+}}], 0 offset:16 glc{{$}} > +; GCN: buffer_store_dword v[[RET]] > +define void @atomic_cmpxchg_i32_ret_off...

RFC: atomic operations on SI+

2016 Mar 25

Hi Tom, Matt, I'm working on a project that needs a few coherent atomic operations (HSA mode: load, store, compare-and-swap) for std::atomic_uint in HCC. The attached patch implements atomic compare and swap for SI+ (untested). I tried to stay within what was available, but there are a few issues that I was unsure how to address: 1.) it currently uses v2i32 for both input and output. This

[LLVMdev] Intended semantics for ``fence seq_cst``

2013 Aug 01

...you compile volatile > accesses to 'atomic volatile monotonic' accesses? Then the normal > memory model would apply, and I don't think the instructions emitted > would change at all on the platforms I'm familiar with. I actually go further for now and promote volatiles to seq_cst atomics. This promotion happens after opt, but before most architecture-specific optimizations. I could have used relaxed ordering, but as a conservative first approach went with seq_cst. For PNaCl it's correct because we only support 8/16/32/64 bit types, require natural alignment (though we s...

[LLVMdev] Proposal: "load linked" and "store conditional" atomic instructions

2014 May 29

...For example the return value of the C++11 and C11 compare_exchange operations is actually whether the exchange succeeded, which leads to some common idioms in Clang-produced IR. From "if(__c11_compare_exchange_strong(...))": %loaded = cmpxchg i32* %addr, i32 %oldval, i32 %newval seq_cst seq_cst %success = icmp eq i32 %loaded, %oldval br i1 %success, label %true, label %false the control-flow here should be something like: loop: %loaded = load linked i32* %addr seq_cst %trystore = icmp eq %loaded, %oldval br i1 %trystore, label %store.cond, label %fals...
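The source-level idiom this excerpt refers to can be sketched in C++ as follows. The helper name `try_swap` is hypothetical; `compare_exchange_strong` is the standard `std::atomic` member, and both the success and failure orderings are spelled out as seq_cst to mirror the `seq_cst seq_cst` pair in the IR above.

```cpp
#include <atomic>

// Attempt to replace `oldval` with `newval` in `addr`.
// Returns true when the exchange succeeded, which is the boolean
// that an `if (compare_exchange_strong(...))` branches on.
bool try_swap(std::atomic<int>& addr, int oldval, int newval) {
    // On failure, compare_exchange_strong writes the value it observed
    // into `oldval`; that is our local copy, so the caller is unaffected.
    return addr.compare_exchange_strong(oldval, newval,
                                        std::memory_order_seq_cst,   // success ordering
                                        std::memory_order_seq_cst);  // failure ordering
}
```

A caller would typically write `if (try_swap(a, expected, desired)) { ... }`, which is the shape that produces the compare-and-branch sequence quoted in the excerpt.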

[LLVMdev] Intended semantics for ``fence seq_cst``

2013 Aug 01

On Wed, Jul 31, 2013 at 6:10 PM, JF Bastien <jfb at google.com> wrote: > This promotion happens after opt, but before most > architecture-specific optimizations > You will need to do this in the frontend. The target independent optimizers are allowed to use the memory model.

[LLVMdev] Intended semantics for ``fence seq_cst``

2013 Aug 01

On Wed, Jul 31, 2013 at 6:39 PM, JF Bastien <jfb at google.com> wrote: > > You will need to do this in the frontend. The target independent > optimizers are allowed to use the memory model. > > We discussed doing this, and concluded that doing it pre-opt was overly > restrictive on correct code. Doing it post-opt bakes the behavior into the > portable code, so in a way

[LLVMdev] Intended semantics for ``fence seq_cst``

2013 Aug 01

> You will need to do this in the frontend. The target independent optimizers are allowed to use the memory model. We discussed doing this, and concluded that doing it pre-opt was overly restrictive on correct code. Doing it post-opt bakes the behavior into the portable code, so in a way it'll be reliably broken but won't penalize good code. FWIW it's easy to change from one to

RFC: non-temporal fencing in LLVM IR

2016 Jan 14

...o think that risks unexpected coherence miss problems, though they would >>>> probably be very rare. But they would be very surprising if they did occur. >>>> >>> >>> Today's LLVM already emits 'lock or %eax, (%esp)' for 'fence >>> seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) when >>> targeting 32-bit x86 machines which do not support mfence. What >>> instruction sequence should we be using instead? >>> >> >> Do they have non-temporal accesses in the ISA? >> >...

[LLVMdev] LLVM Concurrency and Undef

2011 Aug 22

...f this operation reads a value written by a release atomic operation, it synchronizes-with that operation." (Strictly, the release operation synchronizes-with the acquire, not the other way around.) Since atomic/non-atomic races are defined to return undef from the load, even if the load has seq_cst ordering, the load never reads a value written, so none of the stores synchronize with the load. The text does say that all seq_cst loads and stores participate in the global seq_cst ordering that's compatible with the happens-before ordering, but that doesn't imply that happens-before is...
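The synchronizes-with relationship quoted in this excerpt has a direct C++ counterpart, sketched below under stated assumptions. The `Channel` type and the `publish` and `try_consume` functions are hypothetical names introduced for illustration only. The flag is a proper atomic, so there is no atomic/non-atomic race on it; the undef case discussed in the thread arises only when that rule is broken.

```cpp
#include <atomic>

// A one-slot channel: a plain payload guarded by an atomic flag.
struct Channel {
    int payload = 0;
    std::atomic<bool> ready{false};
};

// Writer: the release store orders the payload write before the flag write.
void publish(Channel& c, int value) {
    c.payload = value;
    c.ready.store(true, std::memory_order_release);
}

// Reader: if the acquire load reads the value written by the release store,
// the store synchronizes-with this load and the payload read is race-free.
bool try_consume(Channel& c, int& out) {
    if (!c.ready.load(std::memory_order_acquire)) {
        return false;  // nothing published yet; do not touch payload
    }
    out = c.payload;
    return true;
}
```

The reader only touches `payload` after observing `ready == true`, so every access to the non-atomic field is ordered by the release/acquire pair.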

Swift to IR, generates wrong IR

2015 Oct 05

...with 'swiftc test.swift -emit-ir -o test.ll' When I try to run the .ll file or apply optimization with opt, I get errors like this one: lli: test.ll:548:110: error: expected instruction opcode %9 = cmpxchg i64* bitcast (%swift.type*** @field_type_vector_TipCalculator to i64*), i64 0, i64 %8 seq_cst seq_cst ^ I know swift is not part of the LLVM oss project, but is this problem coming from swift generating wrong IR code or...

[LLVMdev] Optimization on Atomics (and the OpenMP memory model)

2015 Apr 10

..., ARM, ARM v8, Xeon Phi, Nvidia GPUs, etc.)? [I'll just answer "yes" for that one ;)] * What is a flush lowered to in assembly for each of the supported architectures? For instance, a flush might be implemented as an MFENCE on the x86 architecture in some compilers. * What are non-seq_cst atomic read, write, update and capture lowered to for each of your targets? * What are seq_cst atomic read, write, update and capture lowered to for each of your targets? * What is the taskwait construct lowered to for each of your targets? * What are omp_set_lock and omp_unset_lock lowered to for...

[LLVMdev] piping into lli broken on darwin

2012 Oct 31

While testing llvm/polly svn on x86_64-apple-darwin10/11/12, I noticed that three darwin specific polly-test regressions exist. At least part of these failures appear to be due to lli on darwin not accepting piped input such that the test... opt -load /sw/src/fink.build/llvm32-3.2-0/llvm-3.2/build/lib/LLVMPolly.so -basicaa -polly-prepare -polly-region-simplify -O3

RFC: non-temporal fencing in LLVM IR

2016 Jan 15

...lems, though they would probably be very > rare. But they would be very surprising if they > did occur. > > > Today's LLVM already emits 'lock or %eax, (%esp)' for > 'fence > seq_cst'/__sync_synchronize/__atomic_thread_fence(__ATOMIC_SEQ_CST) > when targeting 32-bit x86 machines which do not > support mfence. What instruction sequence should we > be using instead? > > > Do they have non-temporal...

[LLVMdev] LLVM Concurrency and Undef

2011 Aug 23

...itten by a release atomic operation, it > synchronizes-with that operation." > > (Strictly, the release operation synchronizes-with the acquire, not > the other way around.) > > Since atomic/non-atomic races are defined to return undef from the > load, even if the load has seq_cst ordering, the load never reads a > value written, so none of the stores synchronize with the load. An undef can be replaced by any concrete value. If the undef returned from the racy SC load happens to be instantiated by a value of the latest SC store, is it considered as "... reads a value...
