> I just noticed: most of the results in this batch seem to be about
> exploiting `[zs]ext i1` having cost 1 in order to replace a select of
> cost 3. Could you do a run where select has cost 1 and [zs]ext i1
> (and trunc to i1) has cost 2 or 3?

I tried this (or something quite similar) earlier and it caused Souper to
introduce *a lot* of selects. So the problem is that Souper's preferences
become skewed too far in the other direction.

How about for the next run I give everything (except maybe div/rem) cost
1? Then the only thing Souper will tell us about is when it can eliminate
instructions, which seems to be a generally desirable thing.

As I said in the blog post, in the next batch Souper will exploit known
bits. I'm excited to see what kind of results come out of Philip's
value-tracking-dom-conditions.

John
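[Illustrative sketch, not one of the actual results from this batch: the
kind of rewrite being described looks roughly like the following. With
select costed at 3 and zext i1 costed at 1, Souper prefers the second
form, even though both functions return the same value for every %c.]

```llvm
define i32 @with_select(i1 %c) {
  %r = select i1 %c, i32 1, i32 0   ; cost 3 under the current model
  ret i32 %r
}

define i32 @with_zext(i1 %c) {
  %r = zext i1 %c to i32            ; cost 1 under the current model
  ret i32 %r
}
```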
On Wed, Jul 22, 2015 at 9:27 PM, John Regehr <regehr at cs.utah.edu> wrote:

>> I just noticed: most of the results in this batch seem to be about
>> exploiting `[zs]ext i1` having cost 1 in order to replace a select of
>> cost 3. Could you do a run where select has cost 1 and [zs]ext i1
>> (and trunc to i1) has cost 2 or 3?
>
> I tried this (or something quite similar) earlier and it caused Souper
> to introduce *a lot* of selects. So the problem is that Souper's
> preferences become skewed too far in the other direction.

Interesting. Do you happen to have some examples lying around?

> How about for the next run I give everything (except maybe div/rem)
> cost 1?

That seems reasonable. Probably give mul a cost greater than 1 as well,
due to its higher latency.

> Then the only thing Souper will tell us about is when it can eliminate
> instructions, which seems to be a generally desirable thing.

Except for decode-limited situations, decreasing the critical path length
is in general more important than eliminating instructions. The critical
path length is a target-independent lower bound on the maximum achievable
execution speed; the number of instructions is not (assuming our chips
don't start doing significant algebraic simplifications on the fly, which
seems like a safe bet).

-- Sean Silva

> As I said in the blog post, in the next batch Souper will exploit known
> bits. I'm excited to see what kind of results come out of Philip's
> value-tracking-dom-conditions.
>
> John
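[Illustrative sketch of the critical-path point, not an example from the
thread: both functions below execute three adds, but the serial chain has
a dependence height of three while the tree form has a height of two, so
the second can finish sooner on a machine with multiple ALUs even though
no instruction was eliminated.]

```llvm
define i32 @serial(i32 %a, i32 %b, i32 %c, i32 %d) {
  %t1 = add i32 %a, %b     ; start of a three-deep dependence chain
  %t2 = add i32 %t1, %c    ; waits on %t1
  %t3 = add i32 %t2, %d    ; waits on %t2
  ret i32 %t3
}

define i32 @tree(i32 %a, i32 %b, i32 %c, i32 %d) {
  %t1 = add i32 %a, %b     ; independent
  %t2 = add i32 %c, %d     ; independent, can issue alongside %t1
  %t3 = add i32 %t1, %t2   ; critical path is only two adds
  ret i32 %t3
}
```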
> Interesting. Do you happen to have some examples lying around?

Sorry, no, I'll have to re-run. I felt that the select-heavy results were
sort of humorous and clever at first, but then just annoying.

> Except for decode-limited situations, decreasing the critical path
> length is in general more important than eliminating instructions. The
> critical path length is a target-independent lower bound on the maximum
> achievable execution speed; the number of instructions is not.

Sounds easy enough; I'll give it a try. I've observed many situations
where Souper can shorten the critical path, but I hadn't really thought
about making that a top-level goal.

Thanks,

John
On 07/22/2015 09:27 PM, John Regehr wrote:

>> I just noticed: most of the results in this batch seem to be about
>> exploiting `[zs]ext i1` having cost 1 in order to replace a select of
>> cost 3. Could you do a run where select has cost 1 and [zs]ext i1
>> (and trunc to i1) has cost 2 or 3?
>
> I tried this (or something quite similar) earlier and it caused Souper
> to introduce *a lot* of selects. So the problem is that Souper's
> preferences become skewed too far in the other direction.
>
> How about for the next run I give everything (except maybe div/rem)
> cost 1? Then the only thing Souper will tell us about is when it can
> eliminate instructions, which seems to be a generally desirable thing.
>
> As I said in the blog post, in the next batch Souper will exploit
> known bits. I'm excited to see what kind of results come out of
> Philip's value-tracking-dom-conditions.
>
> John

Your timing for mentioning this is slightly ironic. I'm in the process of
deciding whether I should just delete all the code in question. It's been
in tree for a while now and I haven't been able to justify the time to
make it ready for production.

At this point, I'm not really seeing much remaining benefit from enabling
the dominating-condition work in ValueTracking. I've got one benchmark
(one!) that shows a very slight improvement with the code enabled. There
are also a number of crashing bugs with the option enabled which I
haven't yet had time to track down; I suspect these are miscompiles. The
good news is that all of them have the same "signature", so I suspect
it's only one or two specific issues. In theory, these crashes could be
hiding performance improvements, but I have little evidence of that.

Unless someone else steps up to help drive this work, I'm likely to
delete the code from the tree for now. It's not a failed experiment per
se, just one that I don't have the time to take to completion, and I
don't want the code sitting in tree and bit-rotting if it's not serving
any purpose.

For the purposes of documentation, let me summarize a few lessons learned
(a small illustrative sketch follows the list):

- The use-list-based implementation appears to be *very* slightly faster
  than the dom-walk-based one. Both implementations scale to tens of
  thousands of (uses, blocks) with around a 2-3% compile-time impact. The
  measurements collected were fairly noisy, so add 1-2% for possible
  measurement error.

- Given its simplicity, both in implementation and conceptually, I now
  think the dominance-based approach should be preferred. It also has
  room for more performance tweaking and makes caching more obvious.

- Increasing the depth of the known-bits search at which dominating
  conditions are applied increases compile time without material
  improvement in result quality.

- Enabling this seems to tickle a lot of unrelated bugs. Taking the time
  to triage those might be good purely from a robustness standpoint,
  since in *theory* this is just returning a result that the known-bits
  analysis could have figured out otherwise.

- At least for me, the ability to reason about non-nullness is more
  important than the values of particular bits. If I were to start this
  from scratch, I'd probably begin by making isKnownNonNull aware of
  dominating conditions. Given that starting place, a separate pass
  (i.e. not instcombine) might make far more sense and help to decouple
  the code more.

- I suspect that many of the cases found by this approach could be
  implemented as special cases. To use nullness as an example,
  recognizing the pattern { if (!p) p = malloc(X) } is much easier than
  handling general control flow.
  The general framework might be most useful in finding the special
  patterns that could be manually implemented. My suspicion is that
  handling the top-N cases will get 95% of the gain.
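[Illustrative sketch, not taken from Philip's benchmarks, of the kind of
fact the dominating-condition work lets ValueTracking recover: in the
taken block, the branch condition dominates the second `and`, so the low
bit of %x is known to be 1 there and %lo can fold to the constant 1.]

```llvm
define i32 @dom_cond(i32 %x) {
entry:
  %bit = and i32 %x, 1
  %c = icmp eq i32 %bit, 1
  br i1 %c, label %taken, label %other

taken:
  ; %c is known true here; with dominating conditions, the low bit of %x
  ; becomes a known bit, so %lo simplifies to 1.
  %lo = and i32 %x, 1
  ret i32 %lo

other:
  ret i32 0
}
```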
> Your timing for mentioning this is slightly ironic. I'm in the process
> of deciding whether I should just delete all the code in question. It's
> been in tree for a while now and I haven't been able to justify the
> time to make it ready for production.

Ouch, I'll try to get the next batch of Souper results done before you
remove the code! Also, if I have time, I'll do runs with and without your
code and then we'll have another data point about its efficacy.

> - Enabling this seems to tickle a lot of unrelated bugs. Taking the
>   time to triage those might be good purely from a robustness
>   standpoint, since in *theory* this is just returning a result that
>   the known-bits analysis could have figured out otherwise.

I would like to hear more. Do you have any suspicions about which pass
you're tickling problems in? If you think it would be helpful, I can
crank up Csmith with your option turned on and go looking for trouble.

I'd love to see the lessons learned; people should always do that.

John