thr3ads.net - llvm dev - [LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities [May 2014]

If this information is useful, please help other people find it:
Share via:

David Chisnall

2014-May-10 16:38 UTC

[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities

On 10 May 2014, at 16:18, Tim Northover <t.p.northover at gmail.com>
wrote:
> Actually, I really agree there. I considered it recently, but decided
> to leave it as an intrinsic for now (the new IR expansion pass happens
> after most optimisations so there wouldn't be much benefit, but if we
> did it earlier and the mid-end understood what an ldrex/strex meant, I
> could see code getting much better).
> 
> Load linked would be fairly easy (perhaps even written as "load
> linked", a minor extension to "load atomic"). Store
conditional would
> be a bigger change since stores don't return anything at the moment;
> passes may not be expecting to have to ReplaceAllUses on them.
The easiest solution would be to extend the cmpxchg instruction with a weak
variant.  It is then trivial to map load, modify, weak-cmpxchg to load-linked,
modify, store-conditional (that is what weak cmpxchg was intended for in the
C[++]11 memory model).
> I'm hoping to have some more time to spend on atomics soon, after this
> merge business is done. Perhaps then.
> 
> I don't suppose you have any plans to port Mips to the IR-level LL/SC
> expansion? Now that the infrastructure is present it's quite a
> simplification (r206490 in ARM64 for example, though you need existing
> target-specific intrinsics at the moment). It would be good to iron
> out any ARM-specific assumptions I've made.
I'd rather avoid it, because it doing it that late precludes a lot of
optimisations that we're interested in.  I'd much rather extend the IR
to support them at a generic level.

We have a couple of plans for variations of atomic operations in our
architecture, so we'll likely end up trying and throwing away a few
approaches over the next couple of years.
> But it would still be a construct that probably just couldn't be used
> on x86 efficiently, not really a step towards a target independent IR.
On x86, we could map weak cmpxchg to the same thing as a strong cmpxchg, so it
would still generate the same code.  The same is true for all architectures with
a non-blocking compare and exchange operation.

David

Tim Northover

2014-May-10 17:14 UTC

head link

[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities

> The easiest solution would be to extend the cmpxchg instruction with a
> weak variant.  It is then trivial to map load, modify, weak-cmpxchg to
> load-linked, modify, store-conditional (that is what weak cmpxchg was
> intended for in the C[++]11 memory model).
That would certainly be the easiest. But you'd get less scope for
optimising control flow around the instructions (say an early return
on failure or something). I think quite a bit can be done if LLVM
*really* knows what's going to be going on with these atomic ops on
LL/SC architectures.
>> I don't suppose you have any plans to port Mips to the IR-level
LL/SC
>> expansion? Now that the infrastructure is present it's quite a
>> simplification (r206490 in ARM64 for example, though you need existing
>> target-specific intrinsics at the moment). It would be good to iron
>> out any ARM-specific assumptions I've made.
>
> I'd rather avoid it, because it doing it that late precludes a lot of
optimisations
> that we're interested in.  I'd much rather extend the IR to support
them at a
> generic level.
I think you might be misinterpreting what the change actually is.
Currently the expansion happens post-ISel (emitAtomicBinary and
friends building the control flow and MachineInstrs directly).

This moves it to before ISel but still late in the pipeline (actually,
you could even put it earlier: I didn't because of fears of opaque
@llvm.arm.ldrex intrinsics pessimising mid-end optimisations).
Strictly earlier than what happens now, and a reasonable
stepping-stone to generic load-linked instructions or intrinsics.

In my experience, CodeGen has improved with the change. ISelDAG gets
to make use of more information when choosing how to do the operation:
values already known to be sign/zero extended, immediates, etc.

Tim.

David Chisnall

2014-May-10 17:29 UTC

head link

[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities

On 10 May 2014, at 18:14, Tim Northover <t.p.northover at gmail.com>
wrote:
>> The easiest solution would be to extend the cmpxchg instruction with a
>> weak variant.  It is then trivial to map load, modify, weak-cmpxchg to
>> load-linked, modify, store-conditional (that is what weak cmpxchg was
>> intended for in the C[++]11 memory model).
> 
> That would certainly be the easiest. But you'd get less scope for
> optimising control flow around the instructions (say an early return
> on failure or something). I think quite a bit can be done if LLVM
> *really* knows what's going to be going on with these atomic ops on
> LL/SC architectures.
I am not sure of any transforms that we'd want to do that aren't
microarchitecture-specific that need to know about the difference between
ll-modify-sc and load-modify-weak-cmpxchg.
>> I don't suppose you have any plans to port Mips to the IR-level
LL/SC
>>> expansion? Now that the infrastructure is present it's quite a
>>> simplification (r206490 in ARM64 for example, though you need
existing
>>> target-specific intrinsics at the moment). It would be good to iron
>>> out any ARM-specific assumptions I've made.
>> 
>> I'd rather avoid it, because it doing it that late precludes a lot
of optimisations
>> that we're interested in.  I'd much rather extend the IR to
support them at a
>> generic level.
> 
> I think you might be misinterpreting what the change actually is.
> Currently the expansion happens post-ISel (emitAtomicBinary and
> friends building the control flow and MachineInstrs directly).
> 
> This moves it to before ISel but still late in the pipeline (actually,
> you could even put it earlier: I didn't because of fears of opaque
> @llvm.arm.ldrex intrinsics pessimising mid-end optimisations).
> Strictly earlier than what happens now, and a reasonable
> stepping-stone to generic load-linked instructions or intrinsics.
The problem is that the optimisations that we're most interested in should
be done by the mid-level optimisers and are architecture agnostic.
> In my experience, CodeGen has improved with the change. ISelDAG gets
> to make use of more information when choosing how to do the operation:
> values already known to be sign/zero extended, immediates, etc.
Yes, it's definitely an improvement in the short term, but I'm not
convinced by the approach in the long term.  It's a useful hack that works
around a shortcoming in the IR, not a solution.

David

llvm dev - May 2014 - [LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities

[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities

[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities

[LLVMdev] Replacing Platform Specific IR Codes with Generic Implementation and Introducing Macro Facilities