> I just noticed: most of the results in this batch seem to be about
> exploiting `[zs]ext i1` having cost 1 in order to replace a select of
> cost 3. Could you do a run where select has cost 1 and [zs]ext i1
> (and trunc to i1) has cost 2 or 3?

I tried this (or something quite similar) earlier and it caused Souper to
introduce *a lot* of selects. So the problem is that Souper's preferences
become skewed too far in the other direction.

How about for the next run I give everything (except maybe div/rem) cost
1? Then the only thing Souper will tell us about is when it can eliminate
instructions, which seems to be a generally desirable thing.

As I said in the blog post, in the next batch Souper will exploit known
bits. I'm excited to see what kind of results come out of Philip's
value-tracking-dom-conditions.

John
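[Illustrative sketch, not one of the actual results from this batch: the
kind of rewrite being described looks roughly like the following. With
select costed at 3 and zext i1 costed at 1, Souper prefers the second
form, even though both functions return the same value for every %c.]

```llvm
define i32 @with_select(i1 %c) {
  %r = select i1 %c, i32 1, i32 0   ; cost 3 under the current model
  ret i32 %r
}

define i32 @with_zext(i1 %c) {
  %r = zext i1 %c to i32            ; cost 1 under the current model
  ret i32 %r
}
```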
On Wed, Jul 22, 2015 at 9:27 PM, John Regehr <regehr at cs.utah.edu> wrote:

>> I just noticed: most of the results in this batch seem to be about
>> exploiting `[zs]ext i1` having cost 1 in order to replace a select of
>> cost 3. Could you do a run where select has cost 1 and [zs]ext i1
>> (and trunc to i1) has cost 2 or 3?
>
> I tried this (or something quite similar) earlier and it caused Souper
> to introduce *a lot* of selects. So the problem is that Souper's
> preferences become skewed too far in the other direction.

Interesting. Do you happen to have some examples lying around?

> How about for the next run I give everything (except maybe div/rem)
> cost 1?

That seems reasonable. Probably give mul a cost greater than 1 as well,
due to its higher latency.

> Then the only thing Souper will tell us about is when it can eliminate
> instructions, which seems to be a generally desirable thing.

Except for decode-limited situations, decreasing the critical path length
is in general more important than eliminating instructions. The critical
path length is a target-independent lower bound on the maximum achievable
execution speed; the number of instructions is not (assuming our chips
don't start doing significant algebraic simplifications on the fly, which
seems like a safe bet).

-- Sean Silva

> As I said in the blog post, in the next batch Souper will exploit known
> bits. I'm excited to see what kind of results come out of Philip's
> value-tracking-dom-conditions.
>
> John
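[Illustrative sketch of the critical-path point, not an example from the
thread: both functions below execute three adds, but the serial chain has
a dependence height of three while the tree form has a height of two, so
the second can finish sooner on a machine with multiple ALUs even though
no instruction was eliminated.]

```llvm
define i32 @serial(i32 %a, i32 %b, i32 %c, i32 %d) {
  %t1 = add i32 %a, %b     ; start of a three-deep dependence chain
  %t2 = add i32 %t1, %c    ; waits on %t1
  %t3 = add i32 %t2, %d    ; waits on %t2
  ret i32 %t3
}

define i32 @tree(i32 %a, i32 %b, i32 %c, i32 %d) {
  %t1 = add i32 %a, %b     ; independent
  %t2 = add i32 %c, %d     ; independent, can issue alongside %t1
  %t3 = add i32 %t1, %t2   ; critical path is only two adds
  ret i32 %t3
}
```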
> Interesting. Do you happen to have some examples lying around?

Sorry, no, I'll have to re-run. I felt that the select-heavy results were
sort of humorous and clever at first, but then just annoying.

> Except for decode-limited situations, decreasing the critical path
> length is in general more important than eliminating instructions. The
> critical path length is a target-independent lower bound on the maximum
> achievable execution speed; the number of instructions is not.

Sounds easy enough; I'll give it a try. I've observed many situations
where Souper can shorten the critical path, but I hadn't really thought
about making that a top-level goal.

Thanks,

John
On 07/22/2015 09:27 PM, John Regehr wrote:

>> I just noticed: most of the results in this batch seem to be about
>> exploiting `[zs]ext i1` having cost 1 in order to replace a select of
>> cost 3. Could you do a run where select has cost 1 and [zs]ext i1
>> (and trunc to i1) has cost 2 or 3?
>
> I tried this (or something quite similar) earlier and it caused Souper
> to introduce *a lot* of selects. So the problem is that Souper's
> preferences become skewed too far in the other direction.
>
> How about for the next run I give everything (except maybe div/rem)
> cost 1? Then the only thing Souper will tell us about is when it can
> eliminate instructions, which seems to be a generally desirable thing.
>
> As I said in the blog post, in the next batch Souper will exploit
> known bits. I'm excited to see what kind of results come out of
> Philip's value-tracking-dom-conditions.
>
> John

Your timing for mentioning this is slightly ironic. I'm in the process of
deciding whether I should just delete all the code in question. It's been
in tree for a while now and I haven't been able to justify the time to
make it ready for production.

At this point, I'm not really seeing much remaining benefit from enabling
the dominating-condition work in ValueTracking. I've got one benchmark
(one!) that shows a very slight improvement with the code enabled. There
are also a number of crashing bugs with the option enabled which I
haven't yet had time to track down; I suspect these are miscompiles. The
good news is that all of them have the same "signature", so I suspect
it's only one or two specific issues. In theory, these crashes could be
hiding performance improvements, but I have little evidence of that.

Unless someone else steps up to help drive this work, I'm likely to
delete the code from the tree for now. It's not a failed experiment per
se, just one that I don't have the time to take to completion, and I
don't want the code sitting in tree and bit-rotting if it's not serving
any purpose.

For the purposes of documentation, let me summarize a few lessons learned
(a small illustrative sketch follows the list):

- The use-list-based implementation appears to be *very* slightly faster
  than the dom-walk-based one. Both implementations scale to tens of
  thousands of (uses, blocks) with around a 2-3% compile-time impact. The
  measurements collected were fairly noisy, so add 1-2% for possible
  measurement error.

- Given its simplicity, both in implementation and conceptually, I now
  think the dominance-based approach should be preferred. It also has
  room for more performance tweaking and makes caching more obvious.

- Increasing the depth of the known-bits search at which dominating
  conditions are applied increases compile time without material
  improvement in result quality.

- Enabling this seems to tickle a lot of unrelated bugs. Taking the time
  to triage those might be good purely from a robustness standpoint,
  since in *theory* this is just returning a result that the known-bits
  analysis could have figured out otherwise.

- At least for me, the ability to reason about non-nullness is more
  important than the values of particular bits. If I were to start this
  from scratch, I'd probably begin by making isKnownNonNull aware of
  dominating conditions. Given that starting place, a separate pass
  (i.e. not instcombine) might make far more sense and help to decouple
  the code more.

- I suspect that many of the cases found by this approach could be
  implemented as special cases. To use nullness as an example,
  recognizing the pattern { if (!p) p = malloc(X) } is much easier than
  handling general control flow.
  The general framework might be most useful in finding the special
  patterns that could be manually implemented. My suspicion is that
  handling the top-N cases will get 95% of the gain.
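[Illustrative sketch, not taken from Philip's benchmarks, of the kind of
fact the dominating-condition work lets ValueTracking recover: in the
taken block, the branch condition dominates the second `and`, so the low
bit of %x is known to be 1 there and %lo can fold to the constant 1.]

```llvm
define i32 @dom_cond(i32 %x) {
entry:
  %bit = and i32 %x, 1
  %c = icmp eq i32 %bit, 1
  br i1 %c, label %taken, label %other

taken:
  ; %c is known true here; with dominating conditions, the low bit of %x
  ; becomes a known bit, so %lo simplifies to 1.
  %lo = and i32 %x, 1
  ret i32 %lo

other:
  ret i32 0
}
```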
> Your timing for mentioning this is slightly ironic. I'm in the process
> of deciding whether I should just delete all the code in question. It's
> been in tree for a while now and I haven't been able to justify the
> time to make it ready for production.

Ouch, I'll try to get the next batch of Souper results done before you
remove the code! Also, if I have time, I'll do runs with and without your
code and then we'll have another data point about its efficacy.

> - Enabling this seems to tickle a lot of unrelated bugs. Taking the
>   time to triage those might be good purely from a robustness
>   standpoint, since in *theory* this is just returning a result that
>   the known-bits analysis could have figured out otherwise.

I would like to hear more. Do you have any suspicions about which pass
you're tickling problems in? If you think it would be helpful, I can
crank up Csmith with your option turned on and go looking for trouble.

I'd love to see the lessons learned; people should always do that.

John