We (the folks working on Souper) would appreciate any feedback on these IR-level superoptimizer results: http://blog.regehr.org/extra_files/souper-jul-15.html My impression is that while there's clearly plenty of material in here that doesn't want to get implemented in an opt pass, there are a number of gems hiding in there that are worth implementing. Blog post containing additional explanation and caveats is here: http://blog.regehr.org/archives/1252 Thanks! John
> On Jul 22, 2015, at 10:15 AM, John Regehr <regehr at cs.utah.edu> wrote: > > We (the folks working on Souper) would appreciate any feedback on these IR-level superoptimizer results: > > http://blog.regehr.org/extra_files/souper-jul-15.html > > My impression is that while there's clearly plenty of material in here that doesn't want to get implemented in an opt pass, there are a number of gems hiding in there that are worth implementing. > > Blog post containing additional explanation and caveats is here: > > http://blog.regehr.org/archives/1252 > > Thanks! > > JohnI was reading about these; it looks really interesting! One thing that I worry about is that we want to make sure not to IR-transform into “weird constructs” that might be very suboptimal on certain targets (and hard or impossible to invert later); e.g. you noted that selects had cost 3 in your model, but I work on a GPU where select_cc (fcmp/icmp + select) has cost 1. Doing sneaky bit math to avoid a select might be good in some cases, but not others. Maybe it could use target hooks of some sort so it only does transformations that makes sense given a certain cost model? —escha
On Wed, Jul 22, 2015 at 10:54 AM, escha <escha at apple.com> wrote:> > > On Jul 22, 2015, at 10:15 AM, John Regehr <regehr at cs.utah.edu> wrote: > > > > We (the folks working on Souper) would appreciate any feedback on these > IR-level superoptimizer results: > > > > http://blog.regehr.org/extra_files/souper-jul-15.html > > > > My impression is that while there's clearly plenty of material in here > that doesn't want to get implemented in an opt pass, there are a number of > gems hiding in there that are worth implementing. > > > > Blog post containing additional explanation and caveats is here: > > > > http://blog.regehr.org/archives/1252 > > > > Thanks! > > > > John > > I was reading about these; it looks really interesting! One thing that I > worry about is that we want to make sure not to IR-transform into “weird > constructs” that might be very suboptimal on certain targets (and hard or > impossible to invert later); e.g. you noted that selects had cost 3 in your > model, but I work on a GPU where select_cc (fcmp/icmp + select) has cost 1. > Doing sneaky bit math to avoid a select might be good in some cases, but > not others. Maybe it could use target hooks of some sort so it only does > transformations that makes sense given a certain cost model? >Even on x86, materializing i1's residing in the condition flags is a pain (e.g. to sext) but CMOVcc based on them is easy. -- Sean Silva> > —escha > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/d651128b/attachment.html>
Are you taking into account critical path length? Because e.g. for: %0:i64 = var %1:i1 = slt 18446744073709551615:i64, %0 %2:i64 = subnsw 0:i64, %0 %3:i64 = select %1, %0, %2 infer %3 %4:i64 = ashr %0, 63:i64 %5:i64 = add %0, %4 %6:i64 = xor %5, %4 result %6 In the former case, the cmp and sub are independent, so can be executed in parallel, while in the latter case all 3 instructions are dependent. So the former case can execute in 2 cycles while the latter takes 3. Modern OoO chips do in fact exploit this kind of thing. -- Sean Silva On Wed, Jul 22, 2015 at 10:15 AM, John Regehr <regehr at cs.utah.edu> wrote:> We (the folks working on Souper) would appreciate any feedback on these > IR-level superoptimizer results: > > http://blog.regehr.org/extra_files/souper-jul-15.html > > My impression is that while there's clearly plenty of material in here > that doesn't want to get implemented in an opt pass, there are a number of > gems hiding in there that are worth implementing. > > Blog post containing additional explanation and caveats is here: > > http://blog.regehr.org/archives/1252 > > Thanks! > > John > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/d164ea11/attachment.html>
One thing that is important to consider is where in the pipeline these kinds of optimizations fit. We normally try to put the IR into a canonical simplified form in the mid-level optimizer. This form is supposed to be whatever is most useful for exposing other optimizations, and for lowering, but only in a generic sense. We do have some optimizations near the end of our pipeline (vectorization, partial unrolling, etc.) that consider target-specific properties, but only because the alternative is doing those loop optimizations after instruction selection. Considering ILP and other pipeline-level costs are something we generally consider only in the SelectionDAG and after. If these are IR optimizations, then I'm not sure that considering ILP, etc. is the right metric -- so long as the transformations are sufficiently reversible to allow of efficient lowering afterward. -Hal ----- Original Message -----> From: "Sean Silva" <chisophugis at gmail.com> > To: "John Regehr" <regehr at cs.utah.edu> > Cc: llvmdev at cs.uiuc.edu > Sent: Wednesday, July 22, 2015 2:35:51 PM > Subject: Re: [LLVMdev] some superoptimizer results > > > > Are you taking into account critical path length? Because e.g. for: > > > > %0:i64 = var > %1:i1 = slt 18446744073709551615:i64, %0 > %2:i64 = subnsw 0:i64, %0 > %3:i64 = select %1, %0, %2 > infer %3 > %4:i64 = ashr %0, 63:i64 > %5:i64 = add %0, %4 > %6:i64 = xor %5, %4 > result %6 > > > In the former case, the cmp and sub are independent, so can be > executed in parallel, while in the latter case all 3 instructions > are dependent. So the former case can execute in 2 cycles while the > latter takes 3. Modern OoO chips do in fact exploit this kind of > thing. > > > -- Sean Silva > > > > > On Wed, Jul 22, 2015 at 10:15 AM, John Regehr < regehr at cs.utah.edu > > wrote: > > > We (the folks working on Souper) would appreciate any feedback on > these IR-level superoptimizer results: > > http://blog.regehr.org/extra_files/souper-jul-15.html > > My impression is that while there's clearly plenty of material in > here that doesn't want to get implemented in an opt pass, there are > a number of gems hiding in there that are worth implementing. > > Blog post containing additional explanation and caveats is here: > > http://blog.regehr.org/archives/1252 > > Thanks! > > John > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
I just noticed: most of the results in this batch seem to be about exploiting `[zs]ext i1` having cost 1 in order to replace a select of cost 3. Could you do a run where select has cost 1 and [zs]ext i1 (and trunc to i1) has cost 2 or 3? -- Sean Silva On Wed, Jul 22, 2015 at 10:15 AM, John Regehr <regehr at cs.utah.edu> wrote:> We (the folks working on Souper) would appreciate any feedback on these > IR-level superoptimizer results: > > http://blog.regehr.org/extra_files/souper-jul-15.html > > My impression is that while there's clearly plenty of material in here > that doesn't want to get implemented in an opt pass, there are a number of > gems hiding in there that are worth implementing. > > Blog post containing additional explanation and caveats is here: > > http://blog.regehr.org/archives/1252 > > Thanks! > > John > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150722/4eccbf5a/attachment.html>
> some cases, but not others. Maybe it could use target hooks of some sort > so it only does transformations that makes sense given a certain cost > model?Yeah, I would love to explore that, when Souper is being invoked as a pass. On the other hand, here I was only hoping to suggest some IR-level optimizations that might be useful... John
> Are you taking into account critical path length?No, but of course that't not hard. There are a lot of things that could be taken into account in the cost functions, but I was hoping to keep it really simple, so that the results would not be opaque.> case all 3 instructions are dependent. So the former case can execute in 2 cycles while the latter takes > 3. Modern OoO chips do in fact exploit this kind of thing.Sure. One way to deal with this would be to use the HW as its own model: run a lot of code fragments on real chips under a benchmark harness and then use the results to train a cost model. John
> I just noticed: most of the results in this batch seem to be about exploiting `[zs]ext i1` having cost 1 > in order to replace a select of cost 3. > Could you do a run where select has cost 1 and [zs]ext i1 (and trunc to i1) has cost 2 or 3?I tried this (or something quite similar) earlier and it caused Souper to introduce *a lot* of selects. So the problem is that Souper's preferences become skewed too far in the other direction. How about for the next run I give everything (except maybe div/rem) cost 1? Then the only thing Souper will tell us about is when it can eliminate instructions, which seems to be a generally desirable thing. As I said in the blog post, in the next batch Souper will exploit known bits. I'm excited to see what kind of results come out of Philip's value-tracking-dom-conditions. John