thr3ads.net - llvm dev - [llvm-dev] Rotates, once again [May 2018]

John Regehr via llvm-dev

2018-May-16 20:20 UTC

[llvm-dev] Rotates, once again

On 5/16/18 1:58 PM, Sanjay Patel via llvm-dev wrote:
> An informal metric might be: if the operation is supported as a 
> primitive op or built-in in source languages and it is supported as a 
> single target instruction, can we guarantee that 1-to-1 translation 
> through optimization?
It seems perfectly reasonable for LLVM users to expect this to happen 
reliably.

I'd like to take a look at the other side of the equation: the cost of 
adding a new intrinsic in terms of teaching passes to see through it, so 
we don't lose optimizations that worked before the intrinsic was added.

For example, clearly ValueTracking needs a few lines added so that 
computeKnownBits and friends don't get stopped by a rotate. Anyone have 
a reasonably complete list of files that need similar changes?

John
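
To make the ValueTracking point concrete, here is a minimal sketch of how known-bits information could be carried through a rotate by a constant amount. The struct KnownBits32 and the helpers rotl32 and knownBitsOfRotl are hypothetical names for illustration only, fixed at 32 bits; this is not LLVM's KnownBits class or the code in computeKnownBits. The idea is simply that a rotate moves bits without changing them, so the known-zero and known-one masks can be rotated by the same amount.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a known-bits record, fixed at 32 bits.
struct KnownBits32 {
  uint32_t Zero; // bits known to be 0
  uint32_t One;  // bits known to be 1
};

// Rotate left, with the amount taken modulo 32 (an assumption of this sketch).
inline uint32_t rotl32(uint32_t X, unsigned Amt) {
  Amt &= 31;
  return Amt == 0 ? X : (X << Amt) | (X >> (32 - Amt));
}

// Known bits of rotl(X, Amt) for a constant Amt: every bit just moves to a
// new position, so both masks are rotated by the same amount.
inline KnownBits32 knownBitsOfRotl(KnownBits32 Src, unsigned Amt) {
  return {rotl32(Src.Zero, Amt), rotl32(Src.One, Amt)};
}
```

For instance, if the upper 16 bits of X are known zero and bit 0 is known one, then after a rotate left by 8 the known-zero mask covers bits 24..31 and 0..7, and bit 8 is known one. Without handling like this, analysis would have to treat the rotate result as fully unknown.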

Craig Topper via llvm-dev

2018-May-16 23:26 UTC

[llvm-dev] Rotates, once again

What do we want to do about masking the rotate amount on this proposed
intrinsic? Do we need an explicit AND to keep it in bounds? X86 can delete
the AND during isel because the hardware is well behaved for out-of-range
values. The hardware only masks the amount to 5 bits for 8/16-bit rotates
for the purpose of flags, but the data result is rotated modulo the bit
width. Since we don't use the flags from rotates, we can remove the mask.
But if the mask is explicit in IR, then LICM might hoist it, and isel won't
see it to remove it.

~Craig
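
As an illustration of the "explicit AND" option, the following sketch shows the usual source-level rotate idiom with the amount masked to the bit width. The function names rotlMasked32 and rotlMasked8 are hypothetical, and the choice of modulo-bit-width semantics is an assumption of the sketch; the thread had not settled what the intrinsic would do with out-of-range amounts.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left with the amount explicitly masked to the bit width.
// Both shift amounts stay in [0, 31], so no shift is out of range,
// and an amount of 0 (or any multiple of 32) returns X unchanged.
inline uint32_t rotlMasked32(uint32_t X, uint32_t Amt) {
  return (X << (Amt & 31)) | (X >> ((0u - Amt) & 31));
}

// Same idea for 8 bits: the result depends on the amount modulo 8.
// X is promoted to int before shifting, and the cast truncates the
// result back to 8 bits.
inline uint8_t rotlMasked8(uint8_t X, uint32_t Amt) {
  return static_cast<uint8_t>((X << (Amt & 7)) | (X >> ((0u - Amt) & 7)));
}
```

With this form, rotlMasked32(X, 33) gives the same result as rotlMasked32(X, 1). The two AND operations are the masks in question: a target whose rotate instruction already reduces the amount modulo the bit width can drop them, but only if instruction selection still sees them next to the shifts.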


On Wed, May 16, 2018 at 1:21 PM John Regehr via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 5/16/18 1:58 PM, Sanjay Patel via llvm-dev wrote:
>
> > An informal metric might be: if the operation is supported as a
> > primitive op or built-in in source languages and it is supported as a
> > single target instruction, can we guarantee that 1-to-1 translation
> > through optimization?
>
> It seems perfectly reasonable for LLVM users to expect this to happen
> reliably.
>
> I'd like to take a look at the other side of the equation: the cost of
> adding a new intrinsic in terms of teaching passes to see through it, so
> we don't lose optimizations that worked before the intrinsic was added.
>
> For example, clearly ValueTracking needs a few lines added so that
> computeKnownBits and friends don't get stopped by a rotate. Anyone have
> a reasonably complete list of files that need similar changes?
>
> John

Sanjay Patel via llvm-dev

2018-May-17 17:14 UTC

[llvm-dev] Rotates, once again

A rotate intrinsic should be relatively close in cost/complexity to the
existing bswap.

A grep of Intrinsic::bswap says we'd probably add code in:
InstCombine
InstructionSimplify
ConstantFolding
DemandedBits
ValueTracking
VectorUtils
SelectionDAGBuilder

But I don't think it's fair to view those additions as pure added cost. As
an example, consider that we have to add hacks to EarlyCSE to recognize
multi-IR-instruction min/max/abs patterns. Intrinsics just work as-is
there. So if you search for 'matchSelectPattern', you get an idea (I see 32
hits in 10 files) of the cost of *not* having intrinsics for those
operations that we've decided are not worthy of intrinsics.
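
To show what that cost looks like, here is a toy sketch of recognizing a signed-min written as compare-plus-select, next to the same operation as a single node. Everything in it (the Op enum, the Node struct, matchSMin) is hypothetical and greatly simplified; it is not LLVM's IR classes or the real matchSelectPattern.

```cpp
#include <cassert>

// Hypothetical toy IR, not LLVM's classes.
enum class Op { Arg, ICmpSLT, Select, SMin };

struct Node {
  Op Kind;
  const Node *A; // operand 0
  const Node *B; // operand 1
  const Node *C; // operand 2 (used by Select only)
};

// Returns true if N computes smin(X, Y), either as a single SMin node or
// as the two-instruction pattern select(icmp slt X, Y), X, Y.
inline bool matchSMin(const Node *N, const Node *&X, const Node *&Y) {
  if (N->Kind == Op::SMin) { // single node: nothing to pattern-match
    X = N->A;
    Y = N->B;
    return true;
  }
  if (N->Kind != Op::Select || N->A->Kind != Op::ICmpSLT)
    return false;
  const Node *Cmp = N->A;
  // The select arms must be the compare operands, in the same order.
  if (Cmp->A != N->B || Cmp->B != N->C)
    return false;
  X = Cmp->A;
  Y = Cmp->B;
  return true;
}
```

Every pass that wants to treat the select form as a min has to call something like this matcher and handle its variants, while the single-node form is found by an ordinary opcode check.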


On Wed, May 16, 2018 at 2:20 PM, John Regehr via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On 5/16/18 1:58 PM, Sanjay Patel via llvm-dev wrote:
>
> > An informal metric might be: if the operation is supported as a primitive
> > op or built-in in source languages and it is supported as a single target
> > instruction, can we guarantee that 1-to-1 translation through optimization?
>
> It seems perfectly reasonable for LLVM users to expect this to happen
> reliably.
>
> I'd like to take a look at the other side of the equation: the cost of
> adding a new intrinsic in terms of teaching passes to see through it, so we
> don't lose optimizations that worked before the intrinsic was added.
>
> For example, clearly ValueTracking needs a few lines added so that
> computeKnownBits and friends don't get stopped by a rotate. Anyone have a
> reasonably complete list of files that need similar changes?
>
> John
>

John Regehr via llvm-dev

2018-May-17 23:23 UTC

[llvm-dev] Rotates, once again

Thanks Sanjay!

At this point the cost/benefit tradeoff for rotate intrinsics seems 
pretty good.

John


On 05/17/2018 11:14 AM, Sanjay Patel wrote:
> A rotate intrinsic should be relatively close in cost/complexity to the
> existing bswap.
> 
> A grep of Intrinsic::bswap says we'd probably add code in:
> InstCombine
> InstructionSimplify
> ConstantFolding
> DemandedBits
> ValueTracking
> VectorUtils
> SelectionDAGBuilder
> 
> But I don't think it's fair to view those additions as pure added cost.
> As an example, consider that we have to add hacks to EarlyCSE to
> recognize multi-IR-instruction min/max/abs patterns. Intrinsics just
> work as-is there. So if you search for 'matchSelectPattern', you get an
> idea (I see 32 hits in 10 files) of the cost of *not* having intrinsics
> for those operations that we've decided are not worthy of intrinsics.
> 
> 
> On Wed, May 16, 2018 at 2:20 PM, John Regehr via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> 
>     On 5/16/18 1:58 PM, Sanjay Patel via llvm-dev wrote:
> 
>         An informal metric might be: if the operation is supported as a
>         primitive op or built-in in source languages and it is supported
>         as a single target instruction, can we guarantee that 1-to-1
>         translation through optimization?
> 
> 
>     It seems perfectly reasonable for LLVM users to expect this to
>     happen reliably.
> 
>     I'd like to take a look at the other side of the equation: the cost
>     of adding a new intrinsic in terms of teaching passes to see through
>     it, so we don't lose optimizations that worked before the intrinsic
>     was added.
> 
>     For example, clearly ValueTracking needs a few lines added so that
>     computeKnownBits and friends don't get stopped by a rotate. Anyone
>     have a reasonably complete list of files that need similar changes?
> 
>     John
> 
