search for: or4i

Displaying 3 results from an estimated 3 matches for "or4i".

2016 Jan 13
4
RFC: non-temporal fencing in LLVM IR
...s with Hans Boehm on the topic. Any mistakes below are my own, all the clever bits are theirs. *Why?* Ignore non-temporals for a moment: on most x86 targets LLVM generates an mfence for seq_cst atomic fencing. One could instead use a locked idempotent atomic access to top-of-stack, such as lock or4i [RSP-8] 0. Philip has measured this as equivalent on micro-benchmarks, but as ~25% faster in macro-benchmarks (other codebases confirm this). There's one problem with this approach: non-temporal accesses on x86 are only ordered by fence instructions! This means that code using non-temporal acce...
2016 Jan 14
2
RFC: non-temporal fencing in LLVM IR
...e clever bits are theirs. Why? Ignore non-temporals for a moment: on most x86 targets LLVM generates an mfence for seq_cst atomic fencing. One could instead use a locked idempotent atomic access to top-of-stack, such as lock or4i [RSP-8] 0. Philip has measured this as equivalent on micro-benchmarks, but as ~25% faster in macro-benchmarks (other codebases confirm this). There's one problem with this approach: non-temporal accesses on x86 are only ordered by fen...
2016 Jan 13
2
RFC: non-temporal fencing in LLVM IR
...all the clever bits are theirs. *Why?* Ignore non-temporals for a moment: on most x86 targets LLVM generates an mfence for seq_cst atomic fencing. One could instead use a locked idempotent atomic access to top-of-stack, such as lock or4i [RSP-8] 0. Philip has measured this as equivalent on micro-benchmarks, but as ~25% faster in macro-benchmarks (other codebases confirm this). There's one problem with this approach: non-temporal accesses on x86 are only ordered by fence instructions! This means that code usi...
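
The two fencing strategies mentioned in the snippets above can be sketched in C++ as follows. This is an illustration under stated assumptions, not code from the thread: the function names are hypothetical, and the inline assembly is one possible GNU-syntax rendering of the "lock or4i [RSP-8] 0" idea for x86-64 with GCC or Clang. A store-buffering litmus test checks that neither fence lets both threads read 0. A clean run is only consistent with the ordering guarantee and does not prove it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Portable sequentially consistent fence. According to the quoted RFC,
// LLVM generates an mfence for this on most x86 targets.
inline void fence_portable() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Hypothetical alternative: a locked OR of 0 into a slot just below the
// stack pointer. The OR leaves the value unchanged (idempotent); the lock
// prefix is what orders ordinary loads and stores. As the RFC points out,
// this does NOT order non-temporal accesses. On other targets or compilers
// this sketch falls back to the portable fence.
inline void fence_locked_or() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ __volatile__("lock orl $0, -8(%%rsp)" ::: "memory", "cc");
#else
    fence_portable();
#endif
}

using Fence = void (*)();

// Store-buffering litmus test: each thread stores to one variable, fences,
// then loads the other. With a full fence in both threads, the outcome
// r1 == 0 && r2 == 0 must not occur.
int count_forbidden_outcomes(Fence fence, int iterations) {
    assert(iterations >= 0);
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            fence();
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            fence();
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

Calling count_forbidden_outcomes(fence_portable, 100) or count_forbidden_outcomes(fence_locked_or, 100) should return 0. The test uses only ordinary atomic accesses, so it cannot expose the non-temporal ordering problem the RFC is concerned with.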