thr3ads.net - search: "or4i"

Displaying 3 results from an estimated 3 matches for "or4i".

2016 Jan 13

RFC: non-temporal fencing in LLVM IR

...s with Hans Boehm on the topic. Any mistakes below are my own; all the clever bits are theirs. *Why?* Ignore non-temporals for a moment: on most x86 targets LLVM generates an mfence for seq_cst atomic fencing. One could instead use a locked idempotent atomic access to the top of the stack, such as lock or4i [RSP-8] 0. Philip has measured this as equivalent on micro-benchmarks, but as ~25% faster in macro-benchmarks (other codebases confirm this). There's one problem with this approach: non-temporal accesses on x86 are only ordered by fence instructions! This means that code using non-temporal acce...
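
The guarantee a seq_cst fence has to provide, whichever instruction implements it, can be illustrated with the classic store-buffering litmus test. The following is a minimal sketch in Rust, chosen here only for illustration; it is not taken from the thread, and `count_both_zero` is a hypothetical helper name. It does not control or show which instruction the backend emits for the fence (an mfence or a locked operation); that depends on the compiler version and target.

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};
use std::thread;

/// Hypothetical helper (not from the thread): runs the store-buffering
/// litmus test `iterations` times and returns how many runs ended with
/// both threads reading 0. With a seq_cst fence between the store and
/// the load in each thread, that outcome is forbidden.
fn count_both_zero(iterations: usize) -> usize {
    let mut both_zero = 0;
    for _ in 0..iterations {
        let x = AtomicU32::new(0);
        let y = AtomicU32::new(0);

        // Scoped threads may borrow `x` and `y` directly.
        let (r1, r2) = thread::scope(|s| {
            let a = s.spawn(|| {
                x.store(1, Ordering::Relaxed);
                // The fence under discussion: orders the store above
                // before the load below.
                fence(Ordering::SeqCst);
                y.load(Ordering::Relaxed)
            });
            let b = s.spawn(|| {
                y.store(1, Ordering::Relaxed);
                fence(Ordering::SeqCst);
                x.load(Ordering::Relaxed)
            });
            (a.join().unwrap(), b.join().unwrap())
        });

        if r1 == 0 && r2 == 0 {
            both_zero += 1;
        }
    }
    both_zero
}
```

Calling `count_both_zero(1000)` should return 0 on a conforming implementation. Removing both fences makes the both-zero outcome permitted, although a short run may not happen to observe it. The sketch uses only ordinary atomic accesses, so it says nothing about the non-temporal case the snippet goes on to raise.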

2016 Jan 14

RFC: non-temporal fencing in LLVM IR

> > ...e clever bits are theirs. Why? Ignore non-temporals for a moment: on most x86 targets LLVM generates an mfence for seq_cst atomic fencing. One could instead use a locked idempotent atomic access to the top of the stack, such as lock or4i [RSP-8] 0. Philip has measured this as equivalent on micro-benchmarks, but as ~25% faster in macro-benchmarks (other codebases confirm this). There's one problem with this approach: non-temporal accesses on x86 are only ordered by fen...

2016 Jan 13

RFC: non-temporal fencing in LLVM IR

> ...all the clever bits are theirs. *Why?* Ignore non-temporals for a moment: on most x86 targets LLVM generates an mfence for seq_cst atomic fencing. One could instead use a locked idempotent atomic access to the top of the stack, such as lock or4i [RSP-8] 0. Philip has measured this as equivalent on micro-benchmarks, but as ~25% faster in macro-benchmarks (other codebases confirm this). There's one problem with this approach: non-temporal accesses on x86 are only ordered by fence instructions! This means that code usi...
