Geoff Berry via llvm-dev
2017-May-30 22:29 UTC
[llvm-dev] [atomics][AArch64] Possible bug in cmpxchg lowering
Currently the AtomicExpandPass will lower the following IR:

  define i1 @foo(i32* %obj, i32 %old, i32 %new) {
  entry:
    %v0 = cmpxchg weak volatile i32* %obj, i32 %old, i32 %new release acquire
    %v1 = extractvalue { i32, i1 } %v0, 1
    ret i1 %v1
  }

to the equivalent of the following on AArch64:

    ldxr w8, [x0]
    cmp w8, w1
    b.ne .LBB0_3
  // BB#1:                                // %cmpxchg.trystore
    stlxr w8, w2, [x0]
    cbz w8, .LBB0_4
  // BB#2:                                // %cmpxchg.failure
    mov w0, wzr
    ret
  .LBB0_3:                                // %cmpxchg.nostore
    clrex
    mov w0, wzr
    ret
  .LBB0_4:
    orr w0, wzr, #0x1
    ret

GCC instead generates an ldaxr for the initial load, which seems more
correct to me since it honors the requested failure-case acquire
ordering. I'd like to get other opinions on this before filing a bug.

I believe the code in AtomicExpand::expandAtomicCmpXchg() is responsible
for this discrepancy, since it only uses the failure-case memory order
for targets that use fences (i.e. when
TLI->shouldInsertFencesForAtomic(CI) is true).

--
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.
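P.S. For reference, the logic I'm looking at is roughly the following (a
simplified sketch from my reading of the pass, not the verbatim code):

  // Inside AtomicExpand::expandAtomicCmpXchg(), schematically:
  bool ShouldInsertFences = TLI->shouldInsertFencesForAtomic(CI);
  AtomicOrdering MemOpOrder = ShouldInsertFences
                                  ? AtomicOrdering::Monotonic
                                  : CI->getSuccessOrdering();
  // When fences aren't used (e.g. on AArch64), the failure ordering never
  // feeds into MemOpOrder, so the initial load-exclusive is emitted with
  // the success ordering only:
  Value *Loaded = TLI->emitLoadLinked(Builder, Addr, MemOpOrder);
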
Tim Northover via llvm-dev
2017-May-30 22:40 UTC
[llvm-dev] [atomics][AArch64] Possible bug in cmpxchg lowering
Hi Geoff,

On 30 May 2017 at 15:29, Geoff Berry via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> GCC instead generates an ldaxr for the initial load, which seems more
> correct to me since it honors the requested failure-case acquire
> ordering. I'd like to get other opinions on this before filing a bug.

It's all a bit uncertain at the moment, as the C++ standard is at the
very least ambiguous (as is the LangRef). I think historically I decided
to interpret "no stronger than" as forbidding a failure ordering from
having acquire semantics if the success ordering didn't (as in, you can
only drop requirements). I don't know if I ever convinced anyone else of
that (or even mentioned it to them), though.

But it looks like the constraints may well be relaxed in C++
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0418r2.html),
and I know at least some Swift people are keen on what you're describing
being valid.

So changing LLVM to do the same as GCC seems like a decent idea to me.
But remember to update the LangRef to make it clear.

Cheers.

Tim.
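P.S. Concretely, this is the C++-level shape of Geoff's IR, and it's the
kind of call P0418 would make unambiguously well-formed (a minimal
sketch; try_publish is just an illustrative name):

  #include <atomic>

  // Release on success, acquire on failure: whether the failure order
  // may "exceed" the success order like this is the ambiguity at issue.
  bool try_publish(std::atomic<int> &obj, int &expected, int desired) {
    return obj.compare_exchange_weak(expected, desired,
                                     std::memory_order_release,
                                     std::memory_order_acquire);
  }
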
Friedman, Eli via llvm-dev
2017-May-30 22:43 UTC
[llvm-dev] [atomics][AArch64] Possible bug in cmpxchg lowering
On 5/30/2017 3:29 PM, Geoff Berry via llvm-dev wrote:
> Currently the AtomicExpandPass will lower the following IR:
>
>   define i1 @foo(i32* %obj, i32 %old, i32 %new) {
>   entry:
>     %v0 = cmpxchg weak volatile i32* %obj, i32 %old, i32 %new release acquire
>     %v1 = extractvalue { i32, i1 } %v0, 1
>     ret i1 %v1
>   }
>
> to the equivalent of the following on AArch64:
>
>     ldxr w8, [x0]
>     cmp w8, w1
>     b.ne .LBB0_3
>   // BB#1:                                // %cmpxchg.trystore
>     stlxr w8, w2, [x0]
>     cbz w8, .LBB0_4
>   // BB#2:                                // %cmpxchg.failure
>     mov w0, wzr
>     ret
>   .LBB0_3:                                // %cmpxchg.nostore
>     clrex
>     mov w0, wzr
>     ret
>   .LBB0_4:
>     orr w0, wzr, #0x1
>     ret
>
> GCC instead generates an ldaxr for the initial load, which seems more
> correct to me since it honors the requested failure-case acquire
> ordering. I'd like to get other opinions on this before filing a bug.
>
> I believe the code in AtomicExpand::expandAtomicCmpXchg() is responsible
> for this discrepancy, since it only uses the failure-case memory order
> for targets that use fences (i.e. when
> TLI->shouldInsertFencesForAtomic(CI) is true).

From LangRef: "the ordering constraint on failure must be no stronger
than that on success". It's not really explained in LangRef, but per
C++11, the way we convert a success ordering to a failure ordering for
cmpxchg is by converting acq_rel to acquire and release to relaxed.
Acquire is stronger than relaxed, so your cmpxchg is illegal. Not sure
why the verifier isn't catching it.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
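P.S. The conversion I'm describing is essentially what
AtomicCmpXchgInst::getStrongestFailureOrdering() computes; roughly (a
simplified sketch from memory, see llvm/IR/Instructions.h for the real
thing):

  AtomicOrdering strongestFailureOrdering(AtomicOrdering Success) {
    switch (Success) {
    case AtomicOrdering::Release:
    case AtomicOrdering::Monotonic:
      return AtomicOrdering::Monotonic;  // release success -> relaxed failure
    case AtomicOrdering::AcquireRelease:
    case AtomicOrdering::Acquire:
      return AtomicOrdering::Acquire;    // acq_rel success -> acquire failure
    case AtomicOrdering::SequentiallyConsistent:
      return AtomicOrdering::SequentiallyConsistent;
    default:
      llvm_unreachable("invalid cmpxchg success ordering");
    }
  }

By this rule the strongest failure ordering for a release cmpxchg is
relaxed, which is why the release/acquire pair above would be considered
illegal.
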
Tim Northover via llvm-dev
2017-May-30 22:49 UTC
[llvm-dev] [atomics][AArch64] Possible bug in cmpxchg lowering
On 30 May 2017 at 15:40, Tim Northover <t.p.northover at gmail.com> wrote:
> So changing LLVM to do the same as GCC seems like a decent idea to me.
> But remember to update the LangRef to make it clear.

Oh, and if you do this you may want to check what Clang does when the
ordering isn't a compile-time constant. Schematically, it emits a couple
of switches and a nested set of cmpxchgs with all possible ordering
constraints. Those switches may or may not include the cases you're
interested in.

Cheers.

Tim.
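P.S. Something like this at the source level; the question is which
(success, failure) pairs those generated switches actually cover. This
is a hand-written schematic, not Clang's actual output, and only the
release-success cases are spelled out:

  #include <atomic>

  bool cas(std::atomic<int> &obj, int &expected, int desired,
           std::memory_order success, std::memory_order failure) {
    switch (success) {
    case std::memory_order_release:
      switch (failure) {
      case std::memory_order_acquire:
        // The pair under discussion: is it among the emitted cases?
        return obj.compare_exchange_weak(expected, desired,
                                         std::memory_order_release,
                                         std::memory_order_acquire);
      default:
        return obj.compare_exchange_weak(expected, desired,
                                         std::memory_order_release,
                                         std::memory_order_relaxed);
      }
    // ... analogous inner switches for the other success orderings ...
    default:
      return obj.compare_exchange_weak(expected, desired,
                                       std::memory_order_seq_cst,
                                       std::memory_order_seq_cst);
    }
  }
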