David Chisnall
2014-May-10 17:29 UTC
[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities
On 10 May 2014, at 18:14, Tim Northover <t.p.northover at gmail.com> wrote:

>> The easiest solution would be to extend the cmpxchg instruction with a
>> weak variant. It is then trivial to map load, modify, weak-cmpxchg to
>> load-linked, modify, store-conditional (that is what weak cmpxchg was
>> intended for in the C[++]11 memory model).
>
> That would certainly be the easiest. But you'd get less scope for
> optimising control flow around the instructions (say an early return
> on failure or something). I think quite a bit can be done if LLVM
> *really* knows what's going to be going on with these atomic ops on
> LL/SC architectures.

I am not aware of any transforms that we'd want to do that need to know about the difference between ll-modify-sc and load-modify-weak-cmpxchg, other than microarchitecture-specific ones.

>>> I don't suppose you have any plans to port Mips to the IR-level LL/SC
>>> expansion? Now that the infrastructure is present it's quite a
>>> simplification (r206490 in ARM64 for example, though you need existing
>>> target-specific intrinsics at the moment). It would be good to iron
>>> out any ARM-specific assumptions I've made.
>>
>> I'd rather avoid it, because doing it that late precludes a lot of optimisations
>> that we're interested in. I'd much rather extend the IR to support them at a
>> generic level.
>
> I think you might be misinterpreting what the change actually is.
> Currently the expansion happens post-ISel (emitAtomicBinary and
> friends building the control flow and MachineInstrs directly).
>
> This moves it to before ISel but still late in the pipeline (actually,
> you could even put it earlier: I didn't because of fears of opaque
> @llvm.arm.ldrex intrinsics pessimising mid-end optimisations).
> Strictly earlier than what happens now, and a reasonable
> stepping-stone to generic load-linked instructions or intrinsics.

The problem is that the optimisations that we're most interested in should be done by the mid-level optimisers and are architecture agnostic.

> In my experience, CodeGen has improved with the change. ISelDAG gets
> to make use of more information when choosing how to do the operation:
> values already known to be sign/zero extended, immediates, etc.

Yes, it's definitely an improvement in the short term, but I'm not convinced by the approach in the long term. It's a useful hack that works around a shortcoming in the IR, not a solution.

David
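
[Editorial illustration, not part of the original message.] The "load, modify, weak-cmpxchg" pattern discussed above can be sketched at the source level with C++11 atomics. The helper name `atomic_fetch_max` and the choice of memory orderings are assumptions made for illustration only; the comments about LL/SC describe the mapping proposed in the thread, not what any particular compiler is guaranteed to emit.

```cpp
#include <atomic>

// Hypothetical helper: atomically raise `target` to at least `candidate`.
// Returns the value that was observed before any update took place.
inline int atomic_fetch_max(std::atomic<int>& target, int candidate) {
    // "load": the step the thread proposes mapping to load-linked.
    int observed = target.load(std::memory_order_relaxed);

    // "modify" is the comparison; "weak-cmpxchg" is the step the thread
    // proposes mapping to store-conditional. A weak compare-exchange may
    // fail spuriously, so it is used inside a retry loop.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
        // On failure, `observed` now holds the current value; loop and retry.
    }
    return observed;
}
```

Because the loop already retries, a spurious failure of the weak compare-exchange costs one extra iteration rather than affecting correctness, which is why the weak form is the natural fit for this pattern.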
Tim Northover
2014-May-10 17:41 UTC
[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities
>> In my experience, CodeGen has improved with the change. ISelDAG gets
>> to make use of more information when choosing how to do the operation:
>> values already known to be sign/zero extended, immediates, etc.
>
> Yes, it's definitely an improvement in the short term, but I'm not convinced
> by the approach in the long term. It's a useful hack that works around a
> shortcoming in the IR, not a solution.

Hmm, so it sounds like you're not actually after an IR-level LL/SC, but a higher-level "cmpxchg weak". Fair enough, I suppose I'd envisaged putting that burden on Clang.

Tim.
David Chisnall
2014-May-10 18:01 UTC
[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities
On 10 May 2014, at 18:41, Tim Northover <t.p.northover at gmail.com> wrote:

>>> In my experience, CodeGen has improved with the change. ISelDAG gets
>>> to make use of more information when choosing how to do the operation:
>>> values already known to be sign/zero extended, immediates, etc.
>>
>> Yes, it's definitely an improvement in the short term, but I'm not convinced
>> by the approach in the long term. It's a useful hack that works around a
>> shortcoming in the IR, not a solution.
>
> Hmm, so it sounds like you're not actually after an IR-level LL/SC,
> but a higher-level "cmpxchg weak". Fair enough, I suppose I'd
> envisaged putting that burden on Clang.

Yes. The weak cmpxchg is what the C[++]11 memory model provides, so there's a lot of work proving soundness for various transforms involving it. Once it gets to pre-codegen IR passes, it's trivial to map a load that's paired with a weak cmpxchg to an ll / ldrex and the cmpxchg to the sc / strex. This could be a generic IR pass that is parameterised with the names of the ll / sc intrinsics (or even some architecture-agnostic intrinsics for ll / sc, since they're fairly common), but ideally the optimisation would be on something that closely resembles the memory model of the source language. There are also microarchitectural optimisations that can happen later.

In clang currently, we approximate a weak cmpxchg with a strong cmpxchg, but that approximation is not quite semantically valid for all architectures (strong cmpxchg is permitted to block, weak is not) and is not ideal for optimisation either.

David
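
[Editorial illustration, not part of the original message.] The weak/strong distinction is visible at the source level when a weak compare-exchange is used without a retry loop, as in the "early return on failure" case mentioned earlier in the thread. The following is a minimal sketch; the helper name `try_claim` and the memory orderings are assumptions for illustration, and nothing here describes how clang actually lowers the operation.

```cpp
#include <atomic>

// Hypothetical helper: make a single attempt to claim `flag`.
// Returns true if this call changed the flag from false to true.
// Because the compare-exchange is weak, the call may also return false
// spuriously (even when the flag was false), so callers must treat a
// false result as "not claimed this time" rather than "already claimed".
inline bool try_claim(std::atomic<bool>& flag) {
    bool expected = false;
    return flag.compare_exchange_weak(expected, true,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed);
}
```

A caller that needs a definite answer would either retry in a loop or use `compare_exchange_strong` instead, which does not fail spuriously.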