David Chisnall
2014-May-10 16:38 UTC
[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities
On 10 May 2014, at 16:18, Tim Northover <t.p.northover at gmail.com> wrote:> Actually, I really agree there. I considered it recently, but decided > to leave it as an intrinsic for now (the new IR expansion pass happens > after most optimisations so there wouldn't be much benefit, but if we > did it earlier and the mid-end understood what an ldrex/strex meant, I > could see code getting much better). > > Load linked would be fairly easy (perhaps even written as "load > linked", a minor extension to "load atomic"). Store conditional would > be a bigger change since stores don't return anything at the moment; > passes may not be expecting to have to ReplaceAllUses on them.The easiest solution would be to extend the cmpxchg instruction with a weak variant. It is then trivial to map load, modify, weak-cmpxchg to load-linked, modify, store-conditional (that is what weak cmpxchg was intended for in the C[++]11 memory model).> I'm hoping to have some more time to spend on atomics soon, after this > merge business is done. Perhaps then. > > I don't suppose you have any plans to port Mips to the IR-level LL/SC > expansion? Now that the infrastructure is present it's quite a > simplification (r206490 in ARM64 for example, though you need existing > target-specific intrinsics at the moment). It would be good to iron > out any ARM-specific assumptions I've made.I'd rather avoid it, because it doing it that late precludes a lot of optimisations that we're interested in. I'd much rather extend the IR to support them at a generic level. We have a couple of plans for variations of atomic operations in our architecture, so we'll likely end up trying and throwing away a few approaches over the next couple of years.> But it would still be a construct that probably just couldn't be used > on x86 efficiently, not really a step towards a target independent IR.On x86, we could map weak cmpxchg to the same thing as a strong cmpxchg, so it would still generate the same code. The same is true for all architectures with a non-blocking compare and exchange operation. David
Tim Northover
2014-May-10 17:14 UTC
[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities
> The easiest solution would be to extend the cmpxchg instruction with a > weak variant. It is then trivial to map load, modify, weak-cmpxchg to > load-linked, modify, store-conditional (that is what weak cmpxchg was > intended for in the C[++]11 memory model).That would certainly be the easiest. But you'd get less scope for optimising control flow around the instructions (say an early return on failure or something). I think quite a bit can be done if LLVM *really* knows what's going to be going on with these atomic ops on LL/SC architectures.>> I don't suppose you have any plans to port Mips to the IR-level LL/SC >> expansion? Now that the infrastructure is present it's quite a >> simplification (r206490 in ARM64 for example, though you need existing >> target-specific intrinsics at the moment). It would be good to iron >> out any ARM-specific assumptions I've made. > > I'd rather avoid it, because it doing it that late precludes a lot of optimisations > that we're interested in. I'd much rather extend the IR to support them at a > generic level.I think you might be misinterpreting what the change actually is. Currently the expansion happens post-ISel (emitAtomicBinary and friends building the control flow and MachineInstrs directly). This moves it to before ISel but still late in the pipeline (actually, you could even put it earlier: I didn't because of fears of opaque @llvm.arm.ldrex intrinsics pessimising mid-end optimisations). Strictly earlier than what happens now, and a reasonable stepping-stone to generic load-linked instructions or intrinsics. In my experience, CodeGen has improved with the change. ISelDAG gets to make use of more information when choosing how to do the operation: values already known to be sign/zero extended, immediates, etc. Tim.
David Chisnall
2014-May-10 17:29 UTC
[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities
On 10 May 2014, at 18:14, Tim Northover <t.p.northover at gmail.com> wrote:>> The easiest solution would be to extend the cmpxchg instruction with a >> weak variant. It is then trivial to map load, modify, weak-cmpxchg to >> load-linked, modify, store-conditional (that is what weak cmpxchg was >> intended for in the C[++]11 memory model). > > That would certainly be the easiest. But you'd get less scope for > optimising control flow around the instructions (say an early return > on failure or something). I think quite a bit can be done if LLVM > *really* knows what's going to be going on with these atomic ops on > LL/SC architectures.I am not sure of any transforms that we'd want to do that aren't microarchitecture-specific that need to know about the difference between ll-modify-sc and load-modify-weak-cmpxchg.>> I don't suppose you have any plans to port Mips to the IR-level LL/SC >>> expansion? Now that the infrastructure is present it's quite a >>> simplification (r206490 in ARM64 for example, though you need existing >>> target-specific intrinsics at the moment). It would be good to iron >>> out any ARM-specific assumptions I've made. >> >> I'd rather avoid it, because it doing it that late precludes a lot of optimisations >> that we're interested in. I'd much rather extend the IR to support them at a >> generic level. > > I think you might be misinterpreting what the change actually is. > Currently the expansion happens post-ISel (emitAtomicBinary and > friends building the control flow and MachineInstrs directly). > > This moves it to before ISel but still late in the pipeline (actually, > you could even put it earlier: I didn't because of fears of opaque > @llvm.arm.ldrex intrinsics pessimising mid-end optimisations). > Strictly earlier than what happens now, and a reasonable > stepping-stone to generic load-linked instructions or intrinsics.The problem is that the optimisations that we're most interested in should be done by the mid-level optimisers and are architecture agnostic.> In my experience, CodeGen has improved with the change. ISelDAG gets > to make use of more information when choosing how to do the operation: > values already known to be sign/zero extended, immediates, etc.Yes, it's definitely an improvement in the short term, but I'm not convinced by the approach in the long term. It's a useful hack that works around a shortcoming in the IR, not a solution. David