thr3ads.net - llvm dev - [llvm-dev] Creating sanity in LLVM/Clang's atomics impl [Mar 2016]

If this information is useful, please help other people find it:
Share via:
James Y Knight via llvm-dev
2016-Mar-15 23:34 UTC
[llvm-dev] Creating sanity in LLVM/Clang's atomics impl

I've uploaded a few changes to review for implementing Step "A1"
in the
atomic cleanup plan:

http://reviews.llvm.org/D18200
  Add __atomic_* lowering to AtomicExpandPass.

http://reviews.llvm.org/D18201
  Switch over targets to use AtomicExpandPass, and clean up target atomics
code.

There's also this, semi-related cleanup:
http://reviews.llvm.org/D17933
 Clang: Set MaxAtomicInlineWidth properly for i386, i486, and x86-64 cpus
without cmpxchg16b. Also, remove all the manual definition of
GCC_HAVE_SYNC_COMPARE_AND_SWAP_* macros.


There's some kinda annoying issues in LLVM I've run into so far:

1) The required duplication of logic for which platforms have what-sized
atomics between LLVM and Clang is unfortunate. This falls into the general
problem of clang's lib/Basic/Targets.cpp file being mostly an awful
repetition of information available in the LLVM backends.

We really ought to have Clang start depending on the LLVM backends.

2) The inability to access C ABI information in LLVM, whilst creating the
libcalls is somewhat of a problem.

The atomic libcalls need the following information:
- What "size_t" and "int" mean? (DL.getIntPtrType() seems
okayish for the
former (malloc already does that), and can use getInt32() for the
latter...except that MSP430 seems to use 16-bit "int")
- What sizes of integers actually exist? You need to know which functions
of the form "T __atomic_load_N(T*, int)", with some integral type T,
can
actually exist in the atomics library. (e.g. if the platform has no __int128,
there won't be an "__int128 __atomic_load_16(__int128*, int)" you
can
actually call.

All that information is available in Clang, but not in LLVM. I think such
low-level C ABI details would be a good idea to have in LLVM.


I also ran into a couple of other issues whilst on this quest:
1) The C11 _Atomic ABI is incompatible between GCC and Clang . As is
libc++'s std::atomic. (But not libstdc++: that uses the same ABI in both
compilers).

Also, in Clang, _Atomic and libc++'s std::atomic use unfortunate choices of
"maximum size to promote" that depend on the platform, and don't
allow for
easy future extension of the set of supported atomic sizes.

https://llvm.org/bugs/show_bug.cgi?id=26462

2) LLVM may not have a strong enough memory model to support GCC's __sync_*
builtins properly (this may affect only ARMv8/AArch64 at the moment?). GCC
has (recently) decided that __sync_fetch_and_add(&val, 4); needs *stronger*
memory barriers than "__atomic_fetch_add(&val, 4,
__ATOMIC_SEQ_CST);" . In
particular, the former should not allow the processor to reorder a
subsequent store before the store-release instruction at the end of the
LL/SC sequence generated by atomic_fetch_add.

The C11, and LLVM, definition of a seq_cst memory operation only makes it
ordered with other seq_cst operations, NOT with other non-seq_cst atomics.
But, apparently GCC feels that the definition of its
__sync_fetch_and_add builtin
requires that it act like it act like a seq_cst FENCE, preventing
non-seq_cst memory operations from moving across it. And thus, it now emits
an explicit barrier instruction after the store-release.

https://gcc.gnu.org/PR65697

Should we do similar to GCC, and add a 6th kind of ordering,
"seq_cst_fence" to model this?


On Wed, Jan 27, 2016 at 2:25 PM, James Knight <jyknight at google.com>
wrote:
> Right now, the atomics implementation in clang is a mess. It has three
> basically completely different code-paths:
>
>    1. There's the legacy __sync_* builtins, which clang lowers directly
>    to atomic IR instructions. Then, the llvm atomic IR instructions
themselves
>    can sometimes emit libcalls to __sync_* library functions (which are
>    basically undocumented, and users are typically responsible for
>    implementing themselves if they want it).
>    2. There's the new __atomic_* builtins, which clang will, depending
on
>    size and alignment and target, lower either to a libcall to a "
>    standardized-by-GCC
<https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary>"
>    __atomic_* library function (implemented by libatomic), or, to the
atomic
>    IR instructions. (Also included here are the __c11_atomic_* builtins,
which
>    are almost-identical clang-specific aliases of the __atomic_* builtins.
>    They share 99% of the same behavior and code.)
>    3. There's the C11 "_Atomic" type modifier and some OpenMP
stuff,
>    which clang lowers into atomic IR or libcalls to __atomic_* library
>    functions, almost completely separately from #2. (Note: doesn't
apply to
>    C++ std::atomic<>, which gets implemented with the builtins).
Beyond just a
>    lot of duplicate code, the _Atomic impl is just broken: sometimes it
emits
>    llvm IR instructions directly (letting things fall back to __sync_*
calls),
>    and sometimes it checks if it should emit a libcall first (falling back
to
>    __atomic_* calls). This is certainly bad behavior.
>
>
> I'd like to make a proposal for cleaning this mess up.
>
> BTW, as a sidenote [[
> One thing that's important to remember: at least on the C/C++ level, if
> you support lock-free atomic-ops for a given size/alignment, *ALL* the
> atomic ops for that size/alignment must be lock-free. This property is
> usually quite easy, because you'll have LL/SC or CAS instructions, with
> which everything else is implementable, and thus wouldn't ever emit a
> libcall, except for misaligned or too-large atomics, in any case.
>
> On the other hand, many older CPU models are missing those instructions,
> such as ARMv5, Sparcv8, and Intel 80386. When your minimum CPU is set to
> such a CPU, all atomic operations must be done via libcall -- it's not
> acceptable for atomic_store to emit an atomic "store"
instruction, but
> atomic_fetch_add to require a libcall which is implemented with a lock. If
> that were to happen, then atomic_fetch_add could not actually be atomic
> versus a simultaneous atomic_store.
> ]]
>
> So anyhow, there's basically two paths I think we could take for
cleanup.
> I like "Alternative A" below better, but would be interested to
hear if
> others have reasons to think the other would be preferable for some reason.
>
> *Both* alternatives I've suggested will have the effect that the
__sync_*
> builtins in clang will now lower to __atomic_* function calls on platforms
> without inline atomics (or for unsupported sizes), and C11 atomics will
> stop being schizophrenic. Clang will cease to ever emit a call to a
> __sync_* function from any of its builtins or language support. *This
> could theoretically cause an incompatibility on some target. *
>
> However, I think it ought to be pretty safe, by now: I can't imagine
> anyone has __sync_* functions implemented for their platform but
doesn't
> have the __atomic_* library functions implemented, as the __atomic_*
> builtins (which already use those library calls) are in widespread use.
> Notably, both libstdc++ and libc++ use them, as does a lot of third-party
> code. IMO it's worth having that potential incompatibility, in order to
> simplify the situation for users (only one set of atomic library calls to
> worry about), and to have a single code-path for atomics in the compiler.
>
>
>
> *Alternative A*: Move all atomic libcall emission into LLVM
>
> In this alternative, LLVM will be responsible for lowering all atomic
> operations, for all three code-paths listed at the beginning. Clang will
> emit no libcalls for atomic operations.
>
>
> A1) In LLVM: when lowering "store atomic", "load
atomic", "atomicrmw", and
> "cmpxchg" instructions that aren't supported by the target,
emit libcalls
> to the new __atomic_* functions, (rather than the current behavior of
> calling the legacy __sync_* functions.)
>
> Additionally, I'd like to move the decision to emit a libcall into
> AtomicExpandPass, and remove the ability to use Expand in ISel to create a
> libcall for ISD::ATOMIC_*. Putting the decision there allows querying a
> target attribute, up front, before any other lowering decisions (both in
> AtomicExpandPass and in ISel). When choosing whether to inline or libcall,
> the target will get to see only the size of the operation, and not the
> operation itself or the type. This will ensure that the "all or
none"
> property is adhered to.
>
> (Q: what about the implementation of
> __atomic_is_lock_free/__atomic_always_lock_free in clang? The clang
> frontend can't query target information from the llvm target backend,
can
> it? Is there some other way to expose the information of what sizes are
> supported by a backend, such that those builtins -- the latter of which
> must be a C++ constant expression -- can use it?)
>
> A2) In clang, start removing the code that knows how to do optimized
> library call lowering for atomics -- always emit llvm atomic ops. Except
> for one case: clang will still need to emit library calls itself for data
> not aligned naturally for its size. The LLVM atomic instructions currently
> will not handle unaligned data, but that *is* allowed for the four
"slab
> of memory" builtins (__atomic_load, __atomic_store,
> __atomic_compare_exchange, and __atomic_exchange).
>
> A3) Hopefully after that removal, the three codepaths in clang will be
> sufficiently the same that they can be merged or have been simplified so
> drastically that the remaining duplication is not worthwhile to merge.
>
> A4) In LLVM, add "align" attributes to cmpxchg and atomicrmw, and
allow
> specifying "align" values for "load atomic" and
"store atomic". LLVM will
> lower them to the generic library calls. In clang, start lowering
> misaligned atomics to these llvm instructions as well.
>
>
>
>
> *Alternative B*: Move all libcall emission for atomics into Clang.
>
> In this alternative, LLVM will *never* emit atomic libcalls from the
> atomic IR instructions. If the operation requested is not possible on a
> given target, that is an error. So, the cleanups here are to get clang to
> stop emitting LLVM IR atomics that cannot be lowered without libcalls.
>
> B1) In Clang: have the legacy __sync_* builtins become essentially aliases
> for the __atomic_* builtins (thus they will emit calls to __atomic_*
> library functions when applicable).
>
> B2) In Clang: Have the C11 _Atomic type support and openmp stuff call into
> the __atomic builtins' code to do their dirty work, instead of having
its
> own separate implementation.
>
> B3) After those two changes, I believe clang will only ever emit atomic IR
> instructions when they can be lowered. So then, in LLVM: get rid of the
> fallback to libcalls to __sync_* from the atomic IR instructions. If an
> atomic IR operation is requested, and not implementable lock-free, it will
> be an error, with no fallback.
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160315/424ccd23/attachment-0001.html>
llvm dev - Mar 2016 - Creating sanity in LLVM/Clang's atomics impl

[llvm-dev] Creating sanity in LLVM/Clang's atomics impl