thr3ads.net - llvm dev - [LLVMdev] some superoptimizer results [Jul 2015]

If this information is useful, please help other people find it:
Share via:

John Regehr

2015-Jul-22 17:15 UTC

[LLVMdev] some superoptimizer results

We (the folks working on Souper) would appreciate any feedback on these 
IR-level superoptimizer results:

   http://blog.regehr.org/extra_files/souper-jul-15.html

My impression is that while there's clearly plenty of material in here 
that doesn't want to get implemented in an opt pass, there are a number 
of gems hiding in there that are worth implementing.

Blog post containing additional explanation and caveats is here:

   http://blog.regehr.org/archives/1252

Thanks!

John

escha

2015-Jul-22 17:54 UTC

head link

[LLVMdev] some superoptimizer results

> On Jul 22, 2015, at 10:15 AM, John Regehr <regehr at cs.utah.edu>
wrote:
> 
> We (the folks working on Souper) would appreciate any feedback on these
IR-level superoptimizer results:
> 
>  http://blog.regehr.org/extra_files/souper-jul-15.html
> 
> My impression is that while there's clearly plenty of material in here
that doesn't want to get implemented in an opt pass, there are a number of
gems hiding in there that are worth implementing.
> 
> Blog post containing additional explanation and caveats is here:
> 
>  http://blog.regehr.org/archives/1252
> 
> Thanks!
> 
> John
I was reading about these; it looks really interesting! One thing that I worry
about is that we want to make sure not to IR-transform into “weird constructs”
that might be very suboptimal on certain targets (and hard or impossible to
invert later); e.g. you noted that selects had cost 3 in your model, but I work
on a GPU where select_cc (fcmp/icmp + select) has cost 1. Doing sneaky bit math
to avoid a select might be good in some cases, but not others. Maybe it could
use target hooks of some sort so it only does transformations that makes sense
given a certain cost model?

—escha

Sean Silva

2015-Jul-22 19:24 UTC

head link

[LLVMdev] some superoptimizer results

On Wed, Jul 22, 2015 at 10:54 AM, escha <escha at apple.com> wrote:
>
> > On Jul 22, 2015, at 10:15 AM, John Regehr <regehr at
cs.utah.edu> wrote:
> >
> > We (the folks working on Souper) would appreciate any feedback on
these
> IR-level superoptimizer results:
> >
> >  http://blog.regehr.org/extra_files/souper-jul-15.html
> >
> > My impression is that while there's clearly plenty of material in
here
> that doesn't want to get implemented in an opt pass, there are a number
of
> gems hiding in there that are worth implementing.
> >
> > Blog post containing additional explanation and caveats is here:
> >
> >  http://blog.regehr.org/archives/1252
> >
> > Thanks!
> >
> > John
>
> I was reading about these; it looks really interesting! One thing that I
> worry about is that we want to make sure not to IR-transform into “weird
> constructs” that might be very suboptimal on certain targets (and hard or
> impossible to invert later); e.g. you noted that selects had cost 3 in your
> model, but I work on a GPU where select_cc (fcmp/icmp + select) has cost 1.
> Doing sneaky bit math to avoid a select might be good in some cases, but
> not others. Maybe it could use target hooks of some sort so it only does
> transformations that makes sense given a certain cost model?
>
Even on x86, materializing i1's residing in the condition flags is a pain
(e.g. to sext) but CMOVcc based on them is easy.

-- Sean Silva

>
> —escha
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/d651128b/attachment.html>

Sean Silva

2015-Jul-22 19:35 UTC

head link

[LLVMdev] some superoptimizer results

Are you taking into account critical path length? Because e.g. for:

%0:i64 = var
%1:i1 = slt 18446744073709551615:i64, %0
%2:i64 = subnsw 0:i64, %0
%3:i64 = select %1, %0, %2
infer %3
%4:i64 = ashr %0, 63:i64
%5:i64 = add %0, %4
%6:i64 = xor %5, %4
result %6

In the former case, the cmp and sub are independent, so can be executed in
parallel, while in the latter case all 3 instructions are dependent. So the
former case can execute in 2 cycles while the latter takes 3. Modern OoO
chips do in fact exploit this kind of thing.

-- Sean Silva


On Wed, Jul 22, 2015 at 10:15 AM, John Regehr <regehr at cs.utah.edu>
wrote:
> We (the folks working on Souper) would appreciate any feedback on these
> IR-level superoptimizer results:
>
>   http://blog.regehr.org/extra_files/souper-jul-15.html
>
> My impression is that while there's clearly plenty of material in here
> that doesn't want to get implemented in an opt pass, there are a number
of
> gems hiding in there that are worth implementing.
>
> Blog post containing additional explanation and caveats is here:
>
>   http://blog.regehr.org/archives/1252
>
> Thanks!
>
> John
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/d164ea11/attachment.html>

Hal Finkel

2015-Jul-22 19:54 UTC

head link

[LLVMdev] some superoptimizer results

One thing that is important to consider is where in the pipeline these kinds of
optimizations fit. We normally try to put the IR into a canonical simplified
form in the mid-level optimizer. This form is supposed to be whatever is most
useful for exposing other optimizations, and for lowering, but only in a generic
sense. We do have some optimizations near the end of our pipeline
(vectorization, partial unrolling, etc.) that consider target-specific
properties, but only because the alternative is doing those loop optimizations
after instruction selection.

Considering ILP and other pipeline-level costs are something we generally
consider only in the SelectionDAG and after. If these are IR optimizations, then
I'm not sure that considering ILP, etc. is the right metric -- so long as
the transformations are sufficiently reversible to allow of efficient lowering
afterward.

 -Hal

----- Original Message -----> From: "Sean Silva" <chisophugis at gmail.com>
> To: "John Regehr" <regehr at cs.utah.edu>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Wednesday, July 22, 2015 2:35:51 PM
> Subject: Re: [LLVMdev] some superoptimizer results
> 
> 
> 
> Are you taking into account critical path length? Because e.g. for:
> 
> 
> 
> %0:i64 = var
> %1:i1 = slt 18446744073709551615:i64, %0
> %2:i64 = subnsw 0:i64, %0
> %3:i64 = select %1, %0, %2
> infer %3
> %4:i64 = ashr %0, 63:i64
> %5:i64 = add %0, %4
> %6:i64 = xor %5, %4
> result %6
> 
> 
> In the former case, the cmp and sub are independent, so can be
> executed in parallel, while in the latter case all 3 instructions
> are dependent. So the former case can execute in 2 cycles while the
> latter takes 3. Modern OoO chips do in fact exploit this kind of
> thing.
> 
> 
> -- Sean Silva
> 
> 
> 
> 
> On Wed, Jul 22, 2015 at 10:15 AM, John Regehr < regehr at cs.utah.edu
>
> wrote:
> 
> 
> We (the folks working on Souper) would appreciate any feedback on
> these IR-level superoptimizer results:
> 
> http://blog.regehr.org/extra_files/souper-jul-15.html
> 
> My impression is that while there's clearly plenty of material in
> here that doesn't want to get implemented in an opt pass, there are
> a number of gems hiding in there that are worth implementing.
> 
> Blog post containing additional explanation and caveats is here:
> 
> http://blog.regehr.org/archives/1252
> 
> Thanks!
> 
> John
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Sean Silva

2015-Jul-23 00:15 UTC

head link

[LLVMdev] some superoptimizer results

I just noticed: most of the results in this batch seem to be about
exploiting `[zs]ext i1` having cost 1 in order to replace a select of cost
3.

Could you do a run where select has cost 1 and [zs]ext i1 (and trunc to i1)
has cost 2 or 3?

-- Sean Silva

On Wed, Jul 22, 2015 at 10:15 AM, John Regehr <regehr at cs.utah.edu>
wrote:
> We (the folks working on Souper) would appreciate any feedback on these
> IR-level superoptimizer results:
>
>   http://blog.regehr.org/extra_files/souper-jul-15.html
>
> My impression is that while there's clearly plenty of material in here
> that doesn't want to get implemented in an opt pass, there are a number
of
> gems hiding in there that are worth implementing.
>
> Blog post containing additional explanation and caveats is here:
>
>   http://blog.regehr.org/archives/1252
>
> Thanks!
>
> John
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/4eccbf5a/attachment.html>

John Regehr

2015-Jul-23 01:30 UTC

head link

[LLVMdev] some superoptimizer results

> some cases, but not others. Maybe it could use target hooks of some sort 
> so it only does transformations that makes sense given a certain cost 
> model?
Yeah, I would love to explore that, when Souper is being invoked as a 
pass.

On the other hand, here I was only hoping to suggest some IR-level 
optimizations that might be useful...

John

John Regehr

2015-Jul-23 04:14 UTC

head link

[LLVMdev] some superoptimizer results

> Are you taking into account critical path length?
No, but of course that't not hard.  There are a lot of things that could 
be taken into account in the cost functions, but I was hoping to keep it 
really simple, so that the results would not be opaque.
> case all 3 instructions are dependent. So the former case can execute in 2
cycles while the latter takes
> 3. Modern OoO chips do in fact exploit this kind of thing.
Sure.  One way to deal with this would be to use the HW as its own model: 
run a lot of code fragments on real chips under a benchmark harness and 
then use the results to train a cost model.

John

John Regehr

2015-Jul-23 04:27 UTC

head link

[LLVMdev] some superoptimizer results

> I just noticed: most of the results in this batch seem to be about
exploiting `[zs]ext i1` having cost 1
> in order to replace a select of cost 3.
> Could you do a run where select has cost 1 and [zs]ext i1 (and trunc to i1)
has cost 2 or 3?
I tried this (or something quite similar) earlier and it caused Souper to 
introduce *a lot* of selects.  So the problem is that Souper's preferences 
become skewed too far in the other direction.

How about for the next run I give everything (except maybe div/rem) cost 
1?  Then the only thing Souper will tell us about is when it can eliminate 
instructions, which seems to be a generally desirable thing.

As I said in the blog post, in the next batch Souper will exploit known 
bits.  I'm excited to see what kind of results come out of Philip's 
value-tracking-dom-conditions.

John

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Jul 2015 - [LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

[LLVMdev] some superoptimizer results

Possibly Parallel Threads