2016 Jan 13
4
RFC: non-temporal fencing in LLVM IR
...s with Hans Boehm on the topic. Any mistakes below are my own;
all the clever bits are theirs.
*Why?*
Ignore non-temporals for a moment: on most x86 targets, LLVM generates an
mfence for seq_cst atomic fencing. One could instead use a locked,
idempotent atomic access to the top of the stack, such as lock or4i [RSP-8] 0.
Philip has measured this as equivalent in micro-benchmarks but ~25%
faster in macro-benchmarks (other codebases confirm this). There's one
problem with this approach: non-temporal accesses on x86 are only ordered
by fence instructions! This means that code using non-temporal acce...
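A minimal C11 sketch of the two x86 fencing strategies described above, and of why non-temporal stores complicate the locked-instruction approach. It assumes GCC or Clang targeting x86-64; other targets fall back to plain code. The helper names `seq_cst_fence`, `locked_or_fence`, and `publish_nt` are hypothetical, and `lock orl $0, -8(%rsp)` is an AT&T-syntax reading of `lock or4i [RSP-8] 0`. The claim that non-temporal stores are ordered only by fence instructions is taken from the RFC, not verified here.

```c
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_stream_si32 (SSE2); also brings in _mm_sfence */
#endif

/* The seq_cst fence discussed above; on most x86 targets LLVM lowers it to mfence. */
static inline void seq_cst_fence(void) {
    atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical helper: the cheaper alternative, an idempotent LOCK-prefixed
 * OR of 0 into the top of the stack. The value in memory is unchanged, but the
 * locked RMW acts as a full barrier for ordinary loads and stores. */
static inline void locked_or_fence(void) {
#if defined(__x86_64__)
    __asm__ __volatile__("lock orl $0, -8(%%rsp)" ::: "memory", "cc");
#else
    seq_cst_fence(); /* fallback on non-x86-64 targets */
#endif
}

/* Hypothetical producer: fill buf with non-temporal (streaming) stores, then
 * publish it via a release store. Per the RFC, non-temporal stores are only
 * ordered by fence instructions, so an explicit sfence precedes the flag store. */
void publish_nt(int *buf, size_t n, atomic_int *ready) {
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__)
        _mm_stream_si32(&buf[i], (int)i); /* weakly ordered, bypasses the cache */
#else
        buf[i] = (int)i;
#endif
    }
#if defined(__x86_64__)
    _mm_sfence(); /* order the streaming stores before the flag store */
#endif
    atomic_store_explicit(ready, 1, memory_order_release);
}
```

The design point of the RFC shows up in `publish_nt`: if the compiler lowered seq_cst fences to `locked_or_fence` and code relied on that fence alone, the streaming stores would not be covered, so an explicit sfence (or mfence) is still needed around non-temporal accesses.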
2016 Jan 14
2
RFC: non-temporal fencing in LLVM IR
...e clever bits are theirs.
>
> > Why?
>
> > Ignore non-temporals for a moment: on most x86 targets, LLVM
> > generates an mfence for seq_cst atomic fencing. One could instead
> > use a locked, idempotent atomic access to the top of the stack, such
> > as lock or4i [RSP-8] 0. Philip has measured this as equivalent in
> > micro-benchmarks but ~25% faster in macro-benchmarks (other
> > codebases confirm this). There's one problem with this approach:
> > non-temporal accesses on x86 are only ordered by fen...
2016 Jan 13
2
RFC: non-temporal fencing in LLVM IR
...> all the clever bits are theirs.
>
> *Why?*
>
> Ignore non-temporals for a moment: on most x86 targets, LLVM generates an
> mfence for seq_cst atomic fencing. One could instead use a locked,
> idempotent atomic access to the top of the stack, such as lock or4i [RSP-8] 0.
> Philip has measured this as equivalent in micro-benchmarks but ~25%
> faster in macro-benchmarks (other codebases confirm this). There's one
> problem with this approach: non-temporal accesses on x86 are only ordered
> by fence instructions! This means that code usi...