On 15 January 2015 at 10:49, Chandler Carruth <chandlerc at google.com> wrote:> On Thu, Jan 15, 2015 at 10:30 AM, Sanjay Patel <spatel at rotateright.com> > wrote: > >> Would it be wrong to generate the llvm.assume IR suggested below? in GVN? >> > > I think so... Because: > > One very small tweak you could make would be to add an llvm.assume inside >>> the if case of the if condition. (Yes, that is as silly as it sounds.) I >>> tested that, and it did optimize as expected. It's essentially working >>> around a deficiency in the optimizer around path constraints. >>> >> > Once upon a time, there was an optimization for this called predsimplify. > I believe Nick can tell you a long, glorious, tragic tale about its life > and ultimate death. >Okay, gather 'round the fire. ;-) Predsimplify was the first optimization pass I tried to write, to fix PR807. The basis was that it would propagate properties (icmp's between two SSA values) down paths in the domtree. In its various rewrites it knew all sorts of tricks. For instance it could do: a = b + c if b == 5: if c == 3: print a # replace 'a' with 8 which requires that it solve a bidirectional dataflow problem. It also knew that a < b implied b > 0, and a < b < c implies that c > 1. Some of the things predsimplify used to do are now implemented inside gvn's processEquality. Predsimplify also had an odd trick: a = phi [0, a.i] b = phi [10, x] if a == 1: # here we know that b == x. (x could still equal 10 though.) which I haven't seen replicated elsewhere in llvm. I went through many iterations of the implementation trying to minimize the compile time impact. So why predsimplify die? The first largest problem is that it rarely fired. It turns out that people don't write code which needs this sort of optimization in C very often, or at least in the llvm test suite. Instcombine already does a ton of optimization that overlapped with predsimplify in its demanded bits analysis, and that does a much better job of getting the cases that show up in real-world code. The cost of the analysis that predsimplify was doing, eagerly building up its value graph was too expensive to fit in llvm, especially when instcombine's demanded bits analysis tended to get the cases already. If your IR looks different, it might be worth adding to the pass pipeline for you. I think a future VRP (value range propagation) pass would need to be a superset of simplify-demanded-bits, so that we could remove the cost of running simplify-demanded-bits to help pay for the VRP. I've been thinking about the data structure we could use for that, but haven't worked out all the operations yet. Nick In particular:> > >>> define void @example_tweaked(i64* %x0) #0 { >>> %1 = load i64* %x0, align 8 >>> %2 = and i64 %1, 3 >>> %3 = icmp eq i64 %2, 2 >>> br i1 %3, label %4, label %8 >>> ; <label>:4 ; preds = %0 >>> call void @llvm.assume(i1 %3) >>> >> > We don't need the assume here. If we want to do predicate-based > simplification, we can just have <whatever> look at dominating branch > conditions, much like it looks at dominating assume calls. Worst case is > that it requires more fancy-ness in the assumption tracking infrastructure > to actually funnel queries through to different ways of saying "this value > is true at this point". > > However, that is the core essence of predsimplify. Historically, we have > found a really terrifying amount of complexity and compile time hit for > relatively small gains in terms of generated code quality. =/ Maybe we just > need to handle the "obvious" cases much like the very limited calls to > assumption tracker do, but I think this path needs to be trod with great > care. > > >>> %5 = and i64 %1, -4 ; this should be optimized away >>> %6 = or i64 %5, 2 ; this should be optimized away >>> %7 = add nsw i64 %6, 12 >>> store i64 %7, i64* %x0, align 8 >>> ret void >>> ; <label>:8 ; preds = %0 >>> tail call void @go_error() #2 >>> unreachable >>> } >>> >>> declare void @llvm.assume(i1) >>> >>> >>> >>> Thanks for any ideas, >>> /Lars >>> >>> /***************************************************/ >>> >>> /* The two LSB of x0 are 'tag bits' */ >>> /* that we want to manipulate. */ >>> extern long x0; >>> >>> void go_error(void) __attribute__ ((noreturn)); >>> >>> void example_not_optimized(void) >>> { >>> if((x0 & 3) == 2) { >>> // Here the tag bits are removed and added >>> // with bitwise 'and' and 'or'. >>> x0 = ((x0 & ~3) | 2) + 12; >>> } else { >>> go_error(); >>> } >>> } >>> >>> /* >>> define void @example_not_optimized() #0 { >>> %1 = load i64* @x0, align 8, !tbaa !1 >>> %2 = and i64 %1, 3 >>> %3 = icmp eq i64 %2, 2 >>> br i1 %3, label %4, label %8 >>> >>> ; <label>:4 ; preds = %0 >>> %5 = and i64 %1, -4 ; this should be optimized away >>> %6 = or i64 %5, 2 ; this should be optimized away >>> %7 = add nsw i64 %6, 12 >>> store i64 %7, i64* @x0, align 8, !tbaa !1 >>> ret void >>> >>> ; <label>:8 ; preds = %0 >>> tail call void @go_error() #2 >>> unreachable >>> } >>> */ >>> >>> >>> void example_optimized(void) >>> { >>> if((x0 & 3) == 2) { >>> // Here the tag bits are removed and added >>> // with subtraction and addition. >>> x0 = (x0 - (x0 & 3) + 2) + 12; >>> } else { >>> go_error(); >>> } >>> } >>> >>> /* >>> define void @example_optimized() #0 { >>> %1 = load i64* @x0, align 8, !tbaa !1 >>> %2 = and i64 %1, 3 >>> %3 = icmp eq i64 %2, 2 >>> br i1 %3, label %4, label %6 >>> >>> ; <label>:4 ; preds = %0 >>> %5 = add i64 %1, 12 >>> store i64 %5, i64* @x0, align 8, !tbaa !1 >>> ret void >>> >>> ; <label>:6 ; preds = %0 >>> tail call void @go_error() #2 >>> unreachable >>> } >>> >>> */ >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150115/6d396d58/attachment.html>
Thanks, LLVM Ol' Timers! Your ghost stories are scary enough that I'll tread quietly past predsimplify's grave. I was peeking into this because - and I'm happy to be corrected if I've misdiagnosed these - we've seen a few VRP-related bugs cropping up lately: 1. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-November/079038.html (remove alignment checks) 2. http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-December/079345.html (remove tag bit tests) 3. http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080314.html (remove FP range checks, patch submitted) 4. http://llvm.org/bugs/show_bug.cgi?id=17683 (range check to improve division) 5. http://llvm.org/bugs/show_bug.cgi?id=17713 (propagate FP equalities, patch committed) On Thu, Jan 15, 2015 at 3:42 PM, Nick Lewycky <nlewycky at google.com> wrote:> On 15 January 2015 at 10:49, Chandler Carruth <chandlerc at google.com> > wrote: > >> On Thu, Jan 15, 2015 at 10:30 AM, Sanjay Patel <spatel at rotateright.com> >> wrote: >> >>> Would it be wrong to generate the llvm.assume IR suggested below? in GVN? >>> >> >> I think so... Because: >> >> One very small tweak you could make would be to add an llvm.assume inside >>>> the if case of the if condition. (Yes, that is as silly as it sounds.) I >>>> tested that, and it did optimize as expected. It's essentially working >>>> around a deficiency in the optimizer around path constraints. >>>> >>> >> Once upon a time, there was an optimization for this called predsimplify. >> I believe Nick can tell you a long, glorious, tragic tale about its life >> and ultimate death. >> > > Okay, gather 'round the fire. ;-) > > Predsimplify was the first optimization pass I tried to write, to fix > PR807. The basis was that it would propagate properties (icmp's between two > SSA values) down paths in the domtree. In its various rewrites it knew all > sorts of tricks. For instance it could do: > > a = b + c > if b == 5: > if c == 3: > print a # replace 'a' with 8 > > which requires that it solve a bidirectional dataflow problem. It also > knew that a < b implied b > 0, and a < b < c implies that c > 1. Some of > the things predsimplify used to do are now implemented inside gvn's > processEquality. Predsimplify also had an odd trick: > > a = phi [0, a.i] > b = phi [10, x] > if a == 1: > # here we know that b == x. (x could still equal 10 though.) > > which I haven't seen replicated elsewhere in llvm. > > I went through many iterations of the implementation trying to minimize > the compile time impact. So why predsimplify die? > > The first largest problem is that it rarely fired. It turns out that > people don't write code which needs this sort of optimization in C very > often, or at least in the llvm test suite. Instcombine already does a ton > of optimization that overlapped with predsimplify in its demanded bits > analysis, and that does a much better job of getting the cases that show up > in real-world code. The cost of the analysis that predsimplify was doing, > eagerly building up its value graph was too expensive to fit in llvm, > especially when instcombine's demanded bits analysis tended to get the > cases already. If your IR looks different, it might be worth adding to the > pass pipeline for you. > > I think a future VRP (value range propagation) pass would need to be a > superset of simplify-demanded-bits, so that we could remove the cost of > running simplify-demanded-bits to help pay for the VRP. I've been thinking > about the data structure we could use for that, but haven't worked out all > the operations yet. > > Nick > > In particular: >> >> >>>> define void @example_tweaked(i64* %x0) #0 { >>>> %1 = load i64* %x0, align 8 >>>> %2 = and i64 %1, 3 >>>> %3 = icmp eq i64 %2, 2 >>>> br i1 %3, label %4, label %8 >>>> ; <label>:4 ; preds = %0 >>>> call void @llvm.assume(i1 %3) >>>> >>> >> We don't need the assume here. If we want to do predicate-based >> simplification, we can just have <whatever> look at dominating branch >> conditions, much like it looks at dominating assume calls. Worst case is >> that it requires more fancy-ness in the assumption tracking infrastructure >> to actually funnel queries through to different ways of saying "this value >> is true at this point". >> >> However, that is the core essence of predsimplify. Historically, we have >> found a really terrifying amount of complexity and compile time hit for >> relatively small gains in terms of generated code quality. =/ Maybe we just >> need to handle the "obvious" cases much like the very limited calls to >> assumption tracker do, but I think this path needs to be trod with great >> care. >> >> >>>> %5 = and i64 %1, -4 ; this should be optimized away >>>> %6 = or i64 %5, 2 ; this should be optimized away >>>> %7 = add nsw i64 %6, 12 >>>> store i64 %7, i64* %x0, align 8 >>>> ret void >>>> ; <label>:8 ; preds = %0 >>>> tail call void @go_error() #2 >>>> unreachable >>>> } >>>> >>>> declare void @llvm.assume(i1) >>>> >>>> >>>> >>>> Thanks for any ideas, >>>> /Lars >>>> >>>> /***************************************************/ >>>> >>>> /* The two LSB of x0 are 'tag bits' */ >>>> /* that we want to manipulate. */ >>>> extern long x0; >>>> >>>> void go_error(void) __attribute__ ((noreturn)); >>>> >>>> void example_not_optimized(void) >>>> { >>>> if((x0 & 3) == 2) { >>>> // Here the tag bits are removed and added >>>> // with bitwise 'and' and 'or'. >>>> x0 = ((x0 & ~3) | 2) + 12; >>>> } else { >>>> go_error(); >>>> } >>>> } >>>> >>>> /* >>>> define void @example_not_optimized() #0 { >>>> %1 = load i64* @x0, align 8, !tbaa !1 >>>> %2 = and i64 %1, 3 >>>> %3 = icmp eq i64 %2, 2 >>>> br i1 %3, label %4, label %8 >>>> >>>> ; <label>:4 ; preds = %0 >>>> %5 = and i64 %1, -4 ; this should be optimized away >>>> %6 = or i64 %5, 2 ; this should be optimized away >>>> %7 = add nsw i64 %6, 12 >>>> store i64 %7, i64* @x0, align 8, !tbaa !1 >>>> ret void >>>> >>>> ; <label>:8 ; preds = %0 >>>> tail call void @go_error() #2 >>>> unreachable >>>> } >>>> */ >>>> >>>> >>>> void example_optimized(void) >>>> { >>>> if((x0 & 3) == 2) { >>>> // Here the tag bits are removed and added >>>> // with subtraction and addition. >>>> x0 = (x0 - (x0 & 3) + 2) + 12; >>>> } else { >>>> go_error(); >>>> } >>>> } >>>> >>>> /* >>>> define void @example_optimized() #0 { >>>> %1 = load i64* @x0, align 8, !tbaa !1 >>>> %2 = and i64 %1, 3 >>>> %3 = icmp eq i64 %2, 2 >>>> br i1 %3, label %4, label %6 >>>> >>>> ; <label>:4 ; preds = %0 >>>> %5 = add i64 %1, 12 >>>> store i64 %5, i64* @x0, align 8, !tbaa !1 >>>> ret void >>>> >>>> ; <label>:6 ; preds = %0 >>>> tail call void @go_error() #2 >>>> unreachable >>>> } >>>> >>>> */ >>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing listLLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>> >>>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>> >>> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150116/6d8e2d0d/attachment.html>
The general problem of exploiting path sensitive facts for optimization is a very open one. I've been thinking about this myself a bit recently, but haven't come up with a particularly workable approach yet. The major problem is finding the right trade off between compile time and optimization quality. One simplification I think I've talked myself into is completely ignoring merging of conditions (i.e. dataflow) and simply rely on dominating conditions. At least so far, this seems to handle most of the cases I care about. (I'm mostly looking at cases involving either null checks or range checks.) I've more or less convinced myself that any approach taken would have to be lazy, and make aggressive use of analysis caching. The only exception to this might be a early dominance based early propagation restricted to specific narrow cases. (Either an extension to EarlyCSE, or something similar.) In an ideal world, the analysis results would feed all other optimization passes (in particular, InstSimplify and InstCombine). Not sure if this is doable for version one though. It's also worth mentioning that a between LVI, GVN, & LoopUnswitch, we actually do get a lot of conditions inside loops. In particular, any check inside a loop *header* block with an immediately dominating check outside the loop is generally taken care of. Philip On 01/16/2015 08:46 AM, Sanjay Patel wrote:> Thanks, LLVM Ol' Timers! Your ghost stories are scary enough that I'll > tread quietly past predsimplify's grave. > > I was peeking into this because - and I'm happy to be corrected if > I've misdiagnosed these - we've seen a few VRP-related bugs cropping > up lately: > 1. > http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-November/079038.html > (remove alignment checks) > 2. > http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-December/079345.html > (remove tag bit tests) > 3. http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080314.html > (remove FP range checks, patch submitted) > 4. http://llvm.org/bugs/show_bug.cgi?id=17683 (range check to improve > division) > 5. http://llvm.org/bugs/show_bug.cgi?id=17713 (propagate FP > equalities, patch committed) > > > On Thu, Jan 15, 2015 at 3:42 PM, Nick Lewycky <nlewycky at google.com > <mailto:nlewycky at google.com>> wrote: > > On 15 January 2015 at 10:49, Chandler Carruth > <chandlerc at google.com <mailto:chandlerc at google.com>> wrote: > > On Thu, Jan 15, 2015 at 10:30 AM, Sanjay Patel > <spatel at rotateright.com <mailto:spatel at rotateright.com>> wrote: > > Would it be wrong to generate the llvm.assume IR suggested > below? in GVN? > > > I think so... Because: > > One very small tweak you could make would be to add an > llvm.assume inside the if case of the if condition. > (Yes, that is as silly as it sounds.) I tested that, > and it did optimize as expected. It's essentially > working around a deficiency in the optimizer around > path constraints. > > > Once upon a time, there was an optimization for this called > predsimplify. I believe Nick can tell you a long, glorious, > tragic tale about its life and ultimate death. > > > Okay, gather 'round the fire. ;-) > > Predsimplify was the first optimization pass I tried to write, to > fix PR807. The basis was that it would propagate properties > (icmp's between two SSA values) down paths in the domtree. In its > various rewrites it knew all sorts of tricks. For instance it > could do: > > a = b + c > if b == 5: > if c == 3: > print a # replace 'a' with 8 > > which requires that it solve a bidirectional dataflow problem. It > also knew that a < b implied b > 0, and a < b < c implies that c > > 1. Some of the things predsimplify used to do are now implemented > inside gvn's processEquality. Predsimplify also had an odd trick: > > a = phi [0, a.i] > b = phi [10, x] > if a == 1: > # here we know that b == x. (x could still equal 10 though.) > > which I haven't seen replicated elsewhere in llvm. > > I went through many iterations of the implementation trying to > minimize the compile time impact. So why predsimplify die? > > The first largest problem is that it rarely fired. It turns out > that people don't write code which needs this sort of optimization > in C very often, or at least in the llvm test suite. Instcombine > already does a ton of optimization that overlapped with > predsimplify in its demanded bits analysis, and that does a much > better job of getting the cases that show up in real-world code. > The cost of the analysis that predsimplify was doing, eagerly > building up its value graph was too expensive to fit in llvm, > especially when instcombine's demanded bits analysis tended to get > the cases already. If your IR looks different, it might be worth > adding to the pass pipeline for you. > > I think a future VRP (value range propagation) pass would need to > be a superset of simplify-demanded-bits, so that we could remove > the cost of running simplify-demanded-bits to help pay for the > VRP. I've been thinking about the data structure we could use for > that, but haven't worked out all the operations yet. > > Nick > > In particular: > > > define void @example_tweaked(i64* %x0) #0 { > %1 = load i64* %x0, align 8 > %2 = and i64 %1, 3 > %3 = icmp eq i64 %2, 2 > br i1 %3, label %4, label %8 > ; <label>:4 ; preds = %0 > call void @llvm.assume(i1 %3) > > > We don't need the assume here. If we want to do > predicate-based simplification, we can just have <whatever> > look at dominating branch conditions, much like it looks at > dominating assume calls. Worst case is that it requires more > fancy-ness in the assumption tracking infrastructure to > actually funnel queries through to different ways of saying > "this value is true at this point". > > However, that is the core essence of predsimplify. > Historically, we have found a really terrifying amount of > complexity and compile time hit for relatively small gains in > terms of generated code quality. =/ Maybe we just need to > handle the "obvious" cases much like the very limited calls to > assumption tracker do, but I think this path needs to be trod > with great care. > > > %5 = and i64 %1, -4 ; this should be optimized away > %6 = or i64 %5, 2 ; this should be optimized away > %7 = add nsw i64 %6, 12 > store i64 %7, i64* %x0, align 8 > ret void > ; <label>:8 ; preds = %0 > tail call void @go_error() #2 > unreachable > } > > declare void @llvm.assume(i1) > > >> >> Thanks for any ideas, >> /Lars >> >> /***************************************************/ >> >> /* The two LSB of x0 are 'tag bits' */ >> /* that we want to manipulate. */ >> extern long x0; >> >> void go_error(void) __attribute__ ((noreturn)); >> >> void example_not_optimized(void) >> { >> if((x0 & 3) == 2) { >> // Here the tag bits are removed and added >> // with bitwise 'and' and 'or'. >> x0 = ((x0 & ~3) | 2) + 12; >> } else { >> go_error(); >> } >> } >> >> /* >> define void @example_not_optimized() #0 { >> %1 = load i64* @x0, align 8, !tbaa !1 >> %2 = and i64 %1, 3 >> %3 = icmp eq i64 %2, 2 >> br i1 %3, label %4, label %8 >> >> ; <label>:4 ; preds = %0 >> %5 = and i64 %1, -4 ; this should be optimized away >> %6 = or i64 %5, 2 ; this should be >> optimized away >> %7 = add nsw i64 %6, 12 >> store i64 %7, i64* @x0, align 8, !tbaa !1 >> ret void >> >> ; <label>:8 ; preds = %0 >> tail call void @go_error() #2 >> unreachable >> } >> */ >> >> >> void example_optimized(void) >> { >> if((x0 & 3) == 2) { >> // Here the tag bits are removed and added >> // with subtraction and addition. >> x0 = (x0 - (x0 & 3) + 2) + 12; >> } else { >> go_error(); >> } >> } >> >> /* >> define void @example_optimized() #0 { >> %1 = load i64* @x0, align 8, !tbaa !1 >> %2 = and i64 %1, 3 >> %3 = icmp eq i64 %2, 2 >> br i1 %3, label %4, label %6 >> >> ; <label>:4 ; preds = %0 >> %5 = add i64 %1, 12 >> store i64 %5, i64* @x0, align 8, !tbaa !1 >> ret void >> >> ; <label>:6 ; preds = %0 >> tail call void @go_error() #2 >> unreachable >> } >> >> */ >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> > http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150116/f685c5c3/attachment.html>