thr3ads.net - llvm dev - [LLVMdev] some superoptimizer results [Jul 2015]

If this information is useful, please help other people find it:
Share via:

Philip Reames

2015-Jul-22 20:49 UTC

[LLVMdev] some superoptimizer results

On 07/22/2015 01:28 PM, Sean Silva wrote:>
>
> On Wed, Jul 22, 2015 at 12:54 PM, Hal Finkel <hfinkel at anl.gov 
> <mailto:hfinkel at anl.gov>> wrote:
>
>     One thing that is important to consider is where in the pipeline
>     these kinds of optimizations fit. We normally try to put the IR
>     into a canonical simplified form in the mid-level optimizer. This
>     form is supposed to be whatever is most useful for exposing other
>     optimizations, and for lowering, but only in a generic sense. We
>     do have some optimizations near the end of our pipeline
>     (vectorization, partial unrolling, etc.) that consider
>     target-specific properties, but only because the alternative is
>     doing those loop optimizations after instruction selection.
>
>     Considering ILP and other pipeline-level costs are something we
>     generally consider only in the SelectionDAG and after. If these
>     are IR optimizations, then I'm not sure that considering ILP, etc.
>     is the right metric -- so long as the transformations are
>     sufficiently reversible to allow of efficient lowering afterward.
>
>
> Agreed. It might just be that these initial results are from the 
> "burn-in" specifically targeting short simple sequences, but most
of
> the transformations in the link seem to be things that, if applicable, 
> we would want to do in the backend instead of in the middle-end.Looking through the items, I see a number which are suitable for mid 
level canonicalization.  For example, the two for converting and/cmps 
into truncs seem like good candidates.  We need to make sure to apply 
judgement here, but not *all* of these are backend specific.


>
> -- Sean Silva
>
>
>      -Hal
>
>     ----- Original Message -----
>     > From: "Sean Silva" <chisophugis at gmail.com
>     <mailto:chisophugis at gmail.com>>
>     > To: "John Regehr" <regehr at cs.utah.edu
<mailto:regehr at cs.utah.edu>>
>     > Cc: llvmdev at cs.uiuc.edu <mailto:llvmdev at cs.uiuc.edu>
>     > Sent: Wednesday, July 22, 2015 2:35:51 PM
>     > Subject: Re: [LLVMdev] some superoptimizer results
>     >
>     >
>     >
>     > Are you taking into account critical path length? Because e.g.
for:
>     >
>     >
>     >
>     > %0:i64 = var
>     > %1:i1 = slt 18446744073709551615:i64, %0
>     > %2:i64 = subnsw 0:i64, %0
>     > %3:i64 = select %1, %0, %2
>     > infer %3
>     > %4:i64 = ashr %0, 63:i64
>     > %5:i64 = add %0, %4
>     > %6:i64 = xor %5, %4
>     > result %6
>     >
>     >
>     > In the former case, the cmp and sub are independent, so can be
>     > executed in parallel, while in the latter case all 3 instructions
>     > are dependent. So the former case can execute in 2 cycles while
the
>     > latter takes 3. Modern OoO chips do in fact exploit this kind of
>     > thing.
>     >
>     >
>     > -- Sean Silva
>     >
>     >
>     >
>     >
>     > On Wed, Jul 22, 2015 at 10:15 AM, John Regehr <
>     regehr at cs.utah.edu <mailto:regehr at cs.utah.edu> >
>     > wrote:
>     >
>     >
>     > We (the folks working on Souper) would appreciate any feedback on
>     > these IR-level superoptimizer results:
>     >
>     > http://blog.regehr.org/extra_files/souper-jul-15.html
>    
<https://urldefense.proofpoint.com/v2/url?u=http-3A__blog.regehr.org_extra-5Ffiles_souper-2Djul-2D15.html&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=Zws35DV5cT6VNt0_Wc8MbZqd8-UXh4q602LshCf-frE&s=2cBkApfDIxWcm7AqVD7-qdjeHG3okw8lDLBn_URJO2s&e=>
>     >
>     > My impression is that while there's clearly plenty of material
in
>     > here that doesn't want to get implemented in an opt pass,
there are
>     > a number of gems hiding in there that are worth implementing.
>     >
>     > Blog post containing additional explanation and caveats is here:
>     >
>     > http://blog.regehr.org/archives/1252
>    
<https://urldefense.proofpoint.com/v2/url?u=http-3A__blog.regehr.org_archives_1252&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=Zws35DV5cT6VNt0_Wc8MbZqd8-UXh4q602LshCf-frE&s=Ek-DIAbJ7gqXWn_mfOpgfQE3i2dh1D_5peAOqtO1oYc&e=>
>     >
>     > Thanks!
>     >
>     > John
>     >
>     > _______________________________________________
>     > LLVM Developers mailing list
>     >LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>     >
>     >
>     > _______________________________________________
>     > LLVM Developers mailing list
>     > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>     >
>
>     --
>     Hal Finkel
>     Assistant Computational Scientist
>     Leadership Computing Facility
>     Argonne National Laboratory
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/6ea2f36b/attachment.html>

Philip Reames

2015-Jul-22 21:52 UTC

head link

[LLVMdev] some superoptimizer results

On 07/22/2015 01:49 PM, Philip Reames wrote:>
>
> On 07/22/2015 01:28 PM, Sean Silva wrote:
>>
>>
>> On Wed, Jul 22, 2015 at 12:54 PM, Hal Finkel <hfinkel at anl.gov 
>> <mailto:hfinkel at anl.gov>> wrote:
>>
>>     One thing that is important to consider is where in the pipeline
>>     these kinds of optimizations fit. We normally try to put the IR
>>     into a canonical simplified form in the mid-level optimizer. This
>>     form is supposed to be whatever is most useful for exposing other
>>     optimizations, and for lowering, but only in a generic sense. We
>>     do have some optimizations near the end of our pipeline
>>     (vectorization, partial unrolling, etc.) that consider
>>     target-specific properties, but only because the alternative is
>>     doing those loop optimizations after instruction selection.
>>
>>     Considering ILP and other pipeline-level costs are something we
>>     generally consider only in the SelectionDAG and after. If these
>>     are IR optimizations, then I'm not sure that considering ILP,
>>     etc. is the right metric -- so long as the transformations are
>>     sufficiently reversible to allow of efficient lowering afterward.
>>
>>
>> Agreed. It might just be that these initial results are from the 
>> "burn-in" specifically targeting short simple sequences, but
most of
>> the transformations in the link seem to be things that, if 
>> applicable, we would want to do in the backend instead of in the 
>> middle-end.
> Looking through the items, I see a number which are suitable for mid 
> level canonicalization.  For example, the two for converting and/cmps 
> into truncs seem like good candidates.  We need to make sure to apply 
> judgement here, but not *all* of these are backend specific.I took a shot at implementing this and it looks like instcombine is 
actually canonicalizing in the opposite direction.  All truncs to i1 are 
being converted to an and/cmp pattern.  Anyone know why this is?  I 
expected that as a lowering in selectiondag, but doing in InstCombine 
seems too early.... I guess I can see an argument that we're not 
actually loosing any information (i.e. ComputeKnownBits will handle the 
and/cmp just fine), but this still seems a bit surprising.  Anyone have 
thoughts here?

Anyways, at least those half dozen items from the list appear to be 
WONTFIX.  :)>
>
>
>>
>> -- Sean Silva
>>
>>
>>      -Hal
>>
>>     ----- Original Message -----
>>     > From: "Sean Silva" <chisophugis at gmail.com
>>     <mailto:chisophugis at gmail.com>>
>>     > To: "John Regehr" <regehr at cs.utah.edu
<mailto:regehr at cs.utah.edu>>
>>     > Cc: llvmdev at cs.uiuc.edu <mailto:llvmdev at
cs.uiuc.edu>
>>     > Sent: Wednesday, July 22, 2015 2:35:51 PM
>>     > Subject: Re: [LLVMdev] some superoptimizer results
>>     >
>>     >
>>     >
>>     > Are you taking into account critical path length? Because e.g.
for:
>>     >
>>     >
>>     >
>>     > %0:i64 = var
>>     > %1:i1 = slt 18446744073709551615:i64, %0
>>     > %2:i64 = subnsw 0:i64, %0
>>     > %3:i64 = select %1, %0, %2
>>     > infer %3
>>     > %4:i64 = ashr %0, 63:i64
>>     > %5:i64 = add %0, %4
>>     > %6:i64 = xor %5, %4
>>     > result %6
>>     >
>>     >
>>     > In the former case, the cmp and sub are independent, so can be
>>     > executed in parallel, while in the latter case all 3
instructions
>>     > are dependent. So the former case can execute in 2 cycles
while the
>>     > latter takes 3. Modern OoO chips do in fact exploit this kind
of
>>     > thing.
>>     >
>>     >
>>     > -- Sean Silva
>>     >
>>     >
>>     >
>>     >
>>     > On Wed, Jul 22, 2015 at 10:15 AM, John Regehr <
>>     regehr at cs.utah.edu <mailto:regehr at cs.utah.edu> >
>>     > wrote:
>>     >
>>     >
>>     > We (the folks working on Souper) would appreciate any feedback
on
>>     > these IR-level superoptimizer results:
>>     >
>>     > http://blog.regehr.org/extra_files/souper-jul-15.html
>>    
<https://urldefense.proofpoint.com/v2/url?u=http-3A__blog.regehr.org_extra-5Ffiles_souper-2Djul-2D15.html&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=Zws35DV5cT6VNt0_Wc8MbZqd8-UXh4q602LshCf-frE&s=2cBkApfDIxWcm7AqVD7-qdjeHG3okw8lDLBn_URJO2s&e=>
>>     >
>>     > My impression is that while there's clearly plenty of
material in
>>     > here that doesn't want to get implemented in an opt pass,
there are
>>     > a number of gems hiding in there that are worth implementing.
>>     >
>>     > Blog post containing additional explanation and caveats is
here:
>>     >
>>     > http://blog.regehr.org/archives/1252
>>    
<https://urldefense.proofpoint.com/v2/url?u=http-3A__blog.regehr.org_archives_1252&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=Zws35DV5cT6VNt0_Wc8MbZqd8-UXh4q602LshCf-frE&s=Ek-DIAbJ7gqXWn_mfOpgfQE3i2dh1D_5peAOqtO1oYc&e=>
>>     >
>>     > Thanks!
>>     >
>>     > John
>>     >
>>     > _______________________________________________
>>     > LLVM Developers mailing list
>>     >LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>     http://llvm.cs.uiuc.edu
>>     > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>     >
>>     >
>>     > _______________________________________________
>>     > LLVM Developers mailing list
>>     > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>>     http://llvm.cs.uiuc.edu
>>     > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>     >
>>
>>     --
>>     Hal Finkel
>>     Assistant Computational Scientist
>>     Leadership Computing Facility
>>     Argonne National Laboratory
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu          http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/76d11290/attachment.html>

Sean Silva

2015-Jul-22 23:34 UTC

head link

[LLVMdev] some superoptimizer results

On Wed, Jul 22, 2015 at 1:49 PM, Philip Reames <listmail at
philipreames.com>
wrote:
>
>
> On 07/22/2015 01:28 PM, Sean Silva wrote:
>
>
>
> On Wed, Jul 22, 2015 at 12:54 PM, Hal Finkel <hfinkel at anl.gov>
wrote:
>
>> One thing that is important to consider is where in the pipeline these
>> kinds of optimizations fit. We normally try to put the IR into a
canonical
>> simplified form in the mid-level optimizer. This form is supposed to be
>> whatever is most useful for exposing other optimizations, and for
lowering,
>> but only in a generic sense. We do have some optimizations near the end
of
>> our pipeline (vectorization, partial unrolling, etc.) that consider
>> target-specific properties, but only because the alternative is doing
those
>> loop optimizations after instruction selection.
>>
>> Considering ILP and other pipeline-level costs are something we
generally
>> consider only in the SelectionDAG and after. If these are IR
optimizations,
>> then I'm not sure that considering ILP, etc. is the right metric --
so long
>> as the transformations are sufficiently reversible to allow of
efficient
>> lowering afterward.
>>
>
>  Agreed. It might just be that these initial results are from the
> "burn-in" specifically targeting short simple sequences, but most
of the
> transformations in the link seem to be things that, if applicable, we would
> want to do in the backend instead of in the middle-end.
>
> Looking through the items, I see a number which are suitable for mid level
> canonicalization.  For example, the two for converting and/cmps into truncs
> seem like good candidates.  We need to make sure to apply judgement here,
> but not *all* of these are backend specific.
>
Hence "most". From the blog post it seems like this batch is
deliberately
obtained by running Souper on particularly short sequences; I'm looking
forward to seeing what it does with longer sequences, especially ones
containing "phis and path conditions".

-- Sean Silva

>
>
>
>
>  -- Sean Silva
>
>
>>
>>  -Hal
>>
>> ----- Original Message -----
>> > From: "Sean Silva" <chisophugis at gmail.com>
>> > To: "John Regehr" <regehr at cs.utah.edu>
>> > Cc: llvmdev at cs.uiuc.edu
>> > Sent: Wednesday, July 22, 2015 2:35:51 PM
>> > Subject: Re: [LLVMdev] some superoptimizer results
>> >
>> >
>> >
>> > Are you taking into account critical path length? Because e.g.
for:
>> >
>> >
>> >
>> > %0:i64 = var
>> > %1:i1 = slt 18446744073709551615:i64, %0
>> > %2:i64 = subnsw 0:i64, %0
>> > %3:i64 = select %1, %0, %2
>> > infer %3
>> > %4:i64 = ashr %0, 63:i64
>> > %5:i64 = add %0, %4
>> > %6:i64 = xor %5, %4
>> > result %6
>> >
>> >
>> > In the former case, the cmp and sub are independent, so can be
>> > executed in parallel, while in the latter case all 3 instructions
>> > are dependent. So the former case can execute in 2 cycles while
the
>> > latter takes 3. Modern OoO chips do in fact exploit this kind of
>> > thing.
>> >
>> >
>> > -- Sean Silva
>> >
>> >
>> >
>> >
>> > On Wed, Jul 22, 2015 at 10:15 AM, John Regehr < regehr at
cs.utah.edu >
>> > wrote:
>> >
>> >
>> > We (the folks working on Souper) would appreciate any feedback on
>> > these IR-level superoptimizer results:
>> >
>> > http://blog.regehr.org/extra_files/souper-jul-15.html
>>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__blog.regehr.org_extra-5Ffiles_souper-2Djul-2D15.html&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=Zws35DV5cT6VNt0_Wc8MbZqd8-UXh4q602LshCf-frE&s=2cBkApfDIxWcm7AqVD7-qdjeHG3okw8lDLBn_URJO2s&e=>
>> >
>> > My impression is that while there's clearly plenty of material
in
>> > here that doesn't want to get implemented in an opt pass,
there are
>> > a number of gems hiding in there that are worth implementing.
>> >
>> > Blog post containing additional explanation and caveats is here:
>> >
>> > http://blog.regehr.org/archives/1252
>>
<https://urldefense.proofpoint.com/v2/url?u=http-3A__blog.regehr.org_archives_1252&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=Zws35DV5cT6VNt0_Wc8MbZqd8-UXh4q602LshCf-frE&s=Ek-DIAbJ7gqXWn_mfOpgfQE3i2dh1D_5peAOqtO1oYc&e=>
>> >
>> > Thanks!
>> >
>> > John
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>>  > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>>
>>  --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing listLLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/1f0ab8a5/attachment.html>

John Regehr

2015-Jul-23 04:19 UTC

head link

[LLVMdev] some superoptimizer results

> Hence "most". From the blog post it seems like this batch is
deliberately obtained by running Souper on
> particularly short sequences; I'm looking forward to seeing what it
does with longer sequences,
> especially ones containing "phis and path conditions".
I can easily provide these (and will) but I have been concerned that 
Souper's output can be really hard to understand.  I was hoping that these 
shallow expressions would provide you folks with the right kind of 
information.

In any case, my only goal here is to do whatever is most broadly useful to 
LLVM.  I suspect a fair amount of iteration will be required.

I might work on a backend superoptmizer next.  I was not interested in 
doign that one first since it sounds somewhat more complex.

John

llvm dev - Jul 2015 - [LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results