Paul Peet via llvm-dev
2016-Dec-29 23:20 UTC
[llvm-dev] [compiler-rt] Improve atomic locking?
Hey,

I am wondering whether there is room to improve the locking of a pointer when an atomic operation is performed. I've noticed that one could increase SPINLOCK_COUNT in lib/builtins/atomic.c to (1 << 13), an 8x increase in the number of available locks, if we also change the type of the atomic lock from uintptr_t to a single byte (uint8_t). That seems more appropriate, since on systems other than FreeBSD and macOS the lock can only have two states.

Am I missing something? It seems like choosing uintptr_t was intentional.

~ Paul
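For reference, here is a condensed sketch of the scheme in question, paraphrased from lib/builtins/atomic.c rather than quoted verbatim (the real file's pointer hash and its FreeBSD/macOS lock paths are more involved):

    #include <stdint.h>

    /* Current scheme, simplified. The proposal above: make Lock a byte
       wide (_Atomic(uint8_t)) and raise SPINLOCK_COUNT to (1 << 13),
       keeping the same 8KB footprint. */
    #define SPINLOCK_COUNT (1 << 10)
    #define SPINLOCK_MASK (SPINLOCK_COUNT - 1)

    typedef _Atomic(uintptr_t) Lock;
    static Lock locks[SPINLOCK_COUNT]; /* 1024 * 8 bytes = 8KB on LP64 */

    /* Spin until the lock is observed free (0) and we swap in 1. The
       __c11_atomic_* builtins are clang-specific, as in the real file. */
    static void lock(Lock *l) {
      uintptr_t expected = 0;
      while (!__c11_atomic_compare_exchange_weak(l, &expected, 1,
                                                 __ATOMIC_ACQUIRE,
                                                 __ATOMIC_RELAXED))
        expected = 0;
    }

    static void unlock(Lock *l) {
      __c11_atomic_store(l, (uintptr_t)0, __ATOMIC_RELEASE);
    }

    /* Map an address to one of the locks; the hash in atomic.c mixes
       more bits than this. */
    static Lock *lock_for_pointer(void *ptr) {
      intptr_t hash = (intptr_t)ptr;
      hash >>= 4; /* the low bits carry little entropy */
      return locks + (hash & SPINLOCK_MASK);
    }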
David Chisnall via llvm-dev
2016-Dec-30 10:01 UTC
[llvm-dev] [compiler-rt] Improve atomic locking?
On 30 Dec 2016, at 00:20, Paul Peet via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hey,
>
> I am wondering whether there is room to improve the locking of a
> pointer when an atomic operation is performed. I've noticed that one
> could increase SPINLOCK_COUNT in lib/builtins/atomic.c to (1 << 13),
> an 8x increase in the number of available locks, if we also change the
> type of the atomic lock from uintptr_t to a single byte (uint8_t).
> That seems more appropriate, since on systems other than FreeBSD and
> macOS the lock can only have two states.
>
> Am I missing something? It seems like choosing uintptr_t was intentional.

It’s mostly about false sharing, and it depends a bit on how the atomics are implemented for any given architecture and microarchitecture. In many implementations, atomicity is guaranteed by the cache coherency protocol, so the granularity is one cache line. With 64-byte cache lines and an 8-byte lock, you have 8 locks per cache line, so with 2^10 locks there’s roughly a 1/128 chance of false sharing causing contention on the locks even when two addresses map to different locks. With uint8_t on the same system you now have 64 locks per cache line, so a 1/16 chance of false sharing.

Additionally, on many architectures there are no atomic instructions that work on quantities smaller than a register (or a 32-bit value on 64-bit systems). For example, on MIPS you have 32- and 64-bit load-linked and store-conditional. If you want to do an 8-bit compare and swap, then you will encounter contention with other operations on the same 32-bit value, even on a microarchitecture that supports word-granularity atomics (using something finer grained than the cache coherency protocol). Even on systems where the ISA supports 8-bit atomics, they’re often microcoded as shift-and-mask sequences.

Note: on Linux, we should be using futexes, but no one who uses Linux appears to care enough to implement this code path.

Do you have an example in which lock contention here is a bottleneck? I’d be interested in seeing it. If you do encounter contention here, you’d likely see more of a speedup from simply bumping SPINLOCK_COUNT without changing the underlying type. There’s no reason on most systems not to allocate 64KB for the spinlock array: memory is cheap, most of it won’t be in the cache, and a sparser structure will reduce contention.

David
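As a worked check of the figures above, a small standalone program reproduces the arithmetic (the 64-byte cache line and 8-byte uintptr_t are assumptions for a typical LP64 target, not values taken from atomic.c):

    #include <stdio.h>

    int main(void) {
      const long line = 64; /* assumed cache-line size */
      const struct { const char *name; long count, size; } cfg[] = {
        { "current:   2^10 x uintptr_t", 1L << 10, 8 },
        { "byte lock: 2^10 x uint8_t  ", 1L << 10, 1 },
        { "proposal:  2^13 x uint8_t  ", 1L << 13, 1 },
        { "64KB:      2^13 x uintptr_t", 1L << 13, 8 },
      };

      for (int i = 0; i < 4; i++) {
        /* Two addresses that hash to different locks still contend when
           those locks share a cache line: odds of roughly 1 in the
           number of distinct lines the array spans. */
        long lines = cfg[i].count * cfg[i].size / line;
        printf("%s -> %4ld cache lines, ~1/%ld false-sharing odds\n",
               cfg[i].name, lines, lines);
      }
      return 0;
    }

Following this arithmetic through, the original proposal (2^13 byte locks) spans the same 128 cache lines as the current layout, while keeping uintptr_t and growing the array to 64KB spans 1024, which is the point of the last paragraph above.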