Andrew Trick
2013-Apr-25 09:07 UTC
[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
On Apr 23, 2013, at 10:37 AM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:> As far as I can understand of the code, the Reassociate tries to achieve this result by its "ranking" mechanism. > > If it dose not, it is not hard to achieve this result, just restructure the expression in a way such that > the earlier definition of the sub-expression is permute earlier in the resulting expr. > > e.g. > outer-loop1 > x> outer-loop2 > y > > inner-loop3 > z > t = z * y * z > > the RHS is better restructured into "x * y * z", such that x * y can moved as early as possible. > Restructuring expr this expr is also able to reduce the critical path even if no sub-expr is > moved out of loop. > > By no means can "canonicalization" can achieve this result, because as this transformation need > some context. (It seems the name "cannonicalization" is badly abused in LLVM; it could refers > to this kind of xform. IMO, cannonicalization is sort of blind but consistent restructuring) > > BTW, I did similar stuff before in other compiler, I didn't see any %.Right. Reassociate ranks expressions by their order in the IR, so shouldn't make matters worse, but it doesn't know about loops so can't expose more of these opportunities AFAIK. Jeroen, it may be worth filing a bug. I'm not sure why LSR isn't handling it later. For cases that LSR doesn't handle, we've been wanting an MI reassociate pass in the backend anyway to expose ILP. That could clean up cases like this when it's actually on the critical path and we don't mind the regpressure. -Andy> On 4/23/13 6:48 AM, Jeroen Dobbelaere wrote: >> Hi, >> >> I am investigating a performance degradation between llvm-3.1 and llvm-3.2 >> (Note: current top-of-tree shows a similar degradation) >> >> One issue I see is the following: >> - 'loop invariant code motion' seems to be depending on the result of the 'reassociate expression' pass: >> >> In the samples below I observer the following behavior: >> >> Both start with the same expression: >> %add = add i32 %total.0, %next.0 >> %add1 = add i32 %add, 1 >> >> -LLVM-3.1 converts this into: >> --after Reassociate expressions: >> %add = add i32 %next.0, 1 >> %add1 = add i32 %add, %total.0 >> -- after Canonicalize natural loops: >> %add = add i32 %next.0.ph, 1 >> %add1 = add i32 %add, %total.0 >> -- and during 'loop invariant code motion': >> LICM hoisting to while.cond.outer: %add = add i32 %next.0.ph, 1 >> >> - LLVM-3.2 converts it into: (Notice the reversion of %total.0 and %next.0) >> --after Reassociate expressions: >> %add = add i32 %total.0, 1 >> %add1 = add i32 %add, %next.0 >> -- after Canonicalize natural loops: >> %add = add i32 %total.0, 1 >> %add1 = add i32 %add, %next.0.ph >> -- and during 'loop invariant code motion': >> -> %next.0.ph and '1' are loop invariant, but because those are spreaded, they are not moved any more. >> >> >> Is there a way that 'Reassociate expressions' of 'LICM' can rearrange the expressions (likely after 'Canonicalize natural loops'), >> and group loop-invariant parts of the expression ? >> Or is there another pass that I can/should enable to have this effect ? >> >> Greetings, >> Jeroen Dobbelaere >> >> --- >> >> llvm-3.1 >> >> >> while.cond: ; preds = %sw.bb, %sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry >> %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8, %sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ] >> %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [ %total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [ %total.1, %sw.bb ] >> %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [ %4, %sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb ] >> %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end ], [ %inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9, %sw.bb ] >> %add = add i32 %total.0, %next.0 >> %add1 = add i32 %add, 1 >> %cmp = icmp ult i32 %add1, %dec >> br i1 %cmp, label %while.body, label %while.cond29 >> >> *** IR Dump After Reassociate expressions *** >> define void @foo(i32 %size, i16 signext %seed, i8* nocapture %p) nounwind { >> entry: >> %dec = add i32 %size, -1 >> br label %while.cond >> >> while.cond: ; preds = %sw.bb, %sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry >> %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8, %sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ] >> %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [ %total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [ %total.1, %sw.bb ] >> %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [ %4, %sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb ] >> %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end ], [ %inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9, %sw.bb ] >> %add = add i32 %next.0, 1 >> %add1 = add i32 %add, %total.0 >> %cmp = icmp ult i32 %add1, %dec >> br i1 %cmp, label %while.body, label %while.cond29 >> >> >> *** IR Dump After Canonicalize natural loops *** >> while.cond: ; preds = %while.cond.outer, %if.end >> %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph, %while.cond.outer ] >> %seed.addr.0 = phi i16 [ %inc9, %if.end ], [ %seed.addr.0.ph, %while.cond.outer ] >> %add = add i32 %next.0.ph, 1 >> %add1 = add i32 %add, %total.0 >> %cmp = icmp ult i32 %add1, %dec >> br i1 %cmp, label %while.body, label %while.cond29.preheader >> [...] >> >> for.cond.for.end_crit_edge: ; preds = %for.body >> %split = phi i32 [ %inc, %for.body ] >> br label %for.end >> LICM hoisting to while.cond.outer: %add = add i32 %next.0.ph, 1 >> LICM hoisting to while.cond.outer: %cmp2 = icmp eq i32 %next.0.ph, 0 >> LICM hoisting to while.cond.outer: %cmp343 = icmp ult i32 0, %next.0.ph >> LICM hoisting to while.cond.outer: %add740 = or i32 %next.0.ph, 1 >> *** IR Dump After Loop Invariant Code Motion *** >> while.cond: ; preds = %while.cond.outer, %if.end >> %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph, %while.cond.outer ] >> %seed.addr.0 = phi i16 [ %inc9, %if.end ], [ %seed.addr.0.ph, %while.cond.outer ] >> %add1 = add i32 %add, %total.0 >> %cmp = icmp ult i32 %add1, %dec >> br i1 %cmp, label %while.body, label %while.cond29.preheader >> [...] >> >> ------- >> >> llvm-3.2 >> >> >> while.cond: ; preds = %sw.bb, %sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry >> %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end ], [ %inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9, %sw.bb ] >> %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [ %total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [ %total.1, %sw.bb ] >> %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8, %sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ] >> %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [ %4, %sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb ] >> %add = add i32 %total.0, %next.0 >> %add1 = add i32 %add, 1 >> %cmp = icmp ult i32 %add1, %dec >> br i1 %cmp, label %while.body, label %while.cond29 >> >> *** IR Dump After Reassociate expressions *** >> define void @foo(i32 %size, i16 signext %seed, i8* nocapture %p) nounwind { >> entry: >> %dec = add i32 %size, -1 >> br label %while.cond >> >> while.cond: ; preds = %sw.bb, %sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry >> %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end ], [ %inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9, %sw.bb ] >> %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [ %total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [ %total.1, %sw.bb ] >> %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8, %sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ] >> %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [ %4, %sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb ] >> %add = add i32 %total.0, 1 >> %add1 = add i32 %add, %next.0 >> %cmp = icmp ult i32 %add1, %dec >> br i1 %cmp, label %while.body, label %while.cond29 >> >> >> *** IR Dump After Canonicalize natural loops *** >> while.cond: ; preds = %while.cond.outer, %if.end >> %seed.addr.0 = phi i16 [ %inc9, %if.end ], [ %seed.addr.0.ph, %while.cond.outer ] >> %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph, %while.cond.outer ] >> %add = add i32 %total.0, 1 >> %add1 = add i32 %add, %next.0.ph >> %cmp = icmp ult i32 %add1, %dec >> br i1 %cmp, label %while.body, label %while.cond29.preheader >> [.....] >> for.cond.for.end_crit_edge: ; preds = %for.body >> %split = phi i32 [ %inc, %for.body ] >> br label %for.end >> LICM hoisting to while.cond.outer: %cmp2 = icmp eq i32 %next.0.ph, 0 >> LICM hoisting to while.cond.outer: %cmp366 = icmp ult i32 0, %next.0.ph >> LICM hoisting to while.cond.outer: %add763 = or i32 %next.0.ph, 1 >> *** IR Dump After Loop Invariant Code Motion *** >> while.cond: ; preds = %while.cond.outer, %if.end >> %seed.addr.0 = phi i16 [ %inc9, %if.end ], [ %seed.addr.0.ph, %while.cond.outer ] >> %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph, %while.cond.outer ] >> %add = add i32 %total.0, 1 >> %add1 = add i32 %add, %next.0.ph >> %cmp = icmp ult i32 %add1, %dec >> br i1 %cmp, label %while.body, label %while.cond29.preheader >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Dan Gohman
2013-Apr-25 14:59 UTC
[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
On Thu, Apr 25, 2013 at 2:07 AM, Andrew Trick <atrick at apple.com> wrote:> > On Apr 23, 2013, at 10:37 AM, Shuxin Yang <shuxin.llvm at gmail.com> wrote: > > > As far as I can understand of the code, the Reassociate tries to achieve > this result by its "ranking" mechanism. > > > > If it dose not, it is not hard to achieve this result, just restructure > the expression in a way such that > > the earlier definition of the sub-expression is permute earlier in the > resulting expr. > > > > e.g. > > outer-loop1 > > x> > outer-loop2 > > y > > > > inner-loop3 > > z > > t = z * y * z >I assume this was meant to be z * y * x.> > > > the RHS is better restructured into "x * y * z", such that x * y can > moved as early as possible. > > Restructuring expr this expr is also able to reduce the critical path > even if no sub-expr is > > moved out of loop. > > >> > > > By no means can "canonicalization" can achieve this result, because as > this transformation need > > some context. (It seems the name "cannonicalization" is badly abused in > LLVM; it could refers > > to this kind of xform. IMO, cannonicalization is sort of blind but > consistent restructuring) >Actually, canonicalization is not quite that blind; it does intentionally achieve exactly the result you want in your example.> > BTW, I did similar stuff before in other compiler, I didn't see any %. > > Right. Reassociate ranks expressions by their order in the IR, so > shouldn't make matters worse, but it doesn't know about loops so can't > expose more of these opportunities AFAIK. >Reassociate orders instruction with a reverse post order "which effectively gives values in deep loops higher rank than values not in loops". The problem in the original testcase is that %next.0 is not trivially loop-invariant when reassociate sees it. It's a phi node, and it's in the same block as %total.0. It's hard to say since we don't have a complete testcase, but probably this is causing it to turn into something actually loop-invariant under loop-simplify. Consequently, this may just be a phase-ordering problem. Alternatively, perhaps reassociate should rank phi nodes in the same block by the ranks of their inputs. Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130425/b43d1afe/attachment.html>
Jeroen Dobbelaere
2013-Apr-26 14:32 UTC
[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
Hi Andrew, Shuxin, Thanks for the explanation about 'Reassociate'. The problem is indeed that one of the variables, becomes loop invariant after some of the loop transformations.> -----Original Message----- > From: Andrew Trick [mailto:atrick at apple.com][...]> Right. Reassociate ranks expressions by their order in the IR, so shouldn't > make matters worse, but it doesn't know about loops so can't expose more > of these opportunities AFAIK. > > Jeroen, it may be worth filing a bug. I'm not sure why LSR isn't handling it > later. > > For cases that LSR doesn't handle, we've been wanting an MI reassociate > pass in the backend anyway to expose ILP. That could clean up cases like this > when it's actually on the critical path and we don't mind the regpressure. >I inserted a small optimization pass to recognize and fix this exact sequence. This did indeed improve the performance (good ;) ), but not to the amount that I hoped.. I'll try to come up with a reduced file that shows this problem and file a bugreport for it. Greetings, Jeroen [...]
Andrew Trick
2013-Apr-30 03:09 UTC
[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
On Apr 26, 2013, at 7:32 AM, Jeroen Dobbelaere <Jeroen.Dobbelaere at synopsys.com> wrote:> Hi Andrew, Shuxin, > > Thanks for the explanation about 'Reassociate'. > > The problem is indeed that one of the variables, becomes loop invariant after > some of the loop transformations.I imagine LoopSimplify is enough to fix your loop invariant phi so it's an obvious constant. At least obvious enough for InstCombine to remove it from the loop. Unless LoopSimplify is being defeated by an indirect branch, which is just silly in the case of a loop where the back edge is not indirect.>> -----Original Message----- >> From: Andrew Trick [mailto:atrick at apple.com] > [...] >> Right. Reassociate ranks expressions by their order in the IR, so shouldn't >> make matters worse, but it doesn't know about loops so can't expose more >> of these opportunities AFAIK. >> >> Jeroen, it may be worth filing a bug. I'm not sure why LSR isn't handling it >> later. >> >> For cases that LSR doesn't handle, we've been wanting an MI reassociate >> pass in the backend anyway to expose ILP. That could clean up cases like this >> when it's actually on the critical path and we don't mind the regpressure. >> > > I inserted a small optimization pass to recognize and fix this exact sequence. > This did indeed improve the performance (good ;) ), but not to the amount that I hoped.. > I'll try to come up with a reduced file that shows this problem and file a bugreport for it.A bug would be nice. It maybe something to consider if we decide to reorder passes. -Andy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130429/bf035ec5/attachment.html>
Seemingly Similar Threads
- [LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
- [LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
- [LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
- [LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'
- [LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'