thr3ads.net - llvm dev - [LLVMdev] 'loop invariant code motion' and 'Reassociate Expression' [Apr 2013]

If this information is useful, please help other people find it:
Share via:

Preston Briggs

2013-Apr-25 15:51 UTC

[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'

It's an interesting problem.
The best stuff I've seen published is by Cooper, Eckhart, & Kennedy, in
PACT '08.
Cooper gives a nice intro in one of his lectures:
http://www.cs.rice.edu/~keith/512/2012/Lectures/26ReassocII-1up.pdf
I can't tell, quickly, what's going on in Reassociate;
as usual, the documentation resolutely avoids giving any credit for the
ideas.
Why is that?

Preston

> On Apr 23, 2013, at 10:37 AM, Shuxin Yang <shuxin.llvm at gmail.com>
wrote:
>
> > As far as I can understand of the code, the Reassociate tries toachieve this result by its "ranking"
mechanism.> >
> > If it dose not, it is not hard to achieve this result, just
restructure
the expression in a way such that> > the earlier definition of the sub-expression is permute earlier in the
resulting expr.> >
> > e.g.
> >   outer-loop1
> >       x> >       outer-loop2
> >          y > >
> >          inner-loop3
> >              z > >              t = z * y * z
> >
> >    the RHS is better restructured into "x * y * z", such
that x * y can
moved as early as possible.> > Restructuring expr this expr is also able to reduce the critical path
even if no sub-expr is> > moved out of loop.
> >
> >   By no means can "canonicalization" can achieve this
result, because
as this transformation need> > some context. (It seems the name "cannonicalization" is
badly abused in
LLVM; it could refers> > to this kind of xform. IMO, cannonicalization is sort of blind but
consistent restructuring)> >
> >   BTW, I did similar stuff before in other compiler, I didn't see
any %.
>
> Right. Reassociate ranks expressions by their order in the IR, soshouldn't make matters worse, but it doesn't know about loops so
can't
expose more of these opportunities AFAIK.>
> Jeroen, it may be worth filing a bug. I'm not sure why LSR isn't
handling
it later.>
> For cases that LSR doesn't handle, we've been wanting an MI
reassociatepass in the backend anyway to expose ILP. That could clean up cases like
this when it's actually on the critical path and we don't mind the
regpressure.>
> -Andy
>
> > On 4/23/13 6:48 AM, Jeroen Dobbelaere wrote:
> >> Hi,
> >>
> >> I am investigating a performance degradation between llvm-3.1 and
llvm-3.2> >> (Note: current top-of-tree shows a similar degradation)
> >>
> >> One issue I see is the following:
> >> - 'loop invariant code motion' seems to be depending on
the result of
the 'reassociate expression' pass:> >>
> >> In the samples below I observer the following behavior:
> >>
> >> Both start with the same expression:
> >>   %add = add i32 %total.0, %next.0
> >>   %add1 = add i32 %add, 1
> >>
> >> -LLVM-3.1 converts this into:
> >> --after Reassociate expressions:
> >>   %add = add i32 %next.0, 1
> >>   %add1 = add i32 %add, %total.0
> >> -- after Canonicalize natural loops:
> >>   %add = add i32 %next.0.ph, 1
> >>   %add1 = add i32 %add, %total.0
> >> -- and during 'loop invariant code motion':
> >> LICM hoisting to while.cond.outer:   %add = add i32 %next.0.ph, 1
> >>
> >> - LLVM-3.2 converts it into:  (Notice the reversion of %total.0
and
%next.0)> >> --after Reassociate expressions:
> >>   %add = add i32 %total.0, 1
> >>   %add1 = add i32 %add, %next.0
> >> -- after Canonicalize natural loops:
> >>   %add = add i32 %total.0, 1
> >>   %add1 = add i32 %add, %next.0.ph
> >> -- and during 'loop invariant code motion':
> >> -> %next.0.ph and '1' are loop invariant, but because
those are
spreaded, they are not moved any more.> >>
> >>
> >> Is there a way that 'Reassociate expressions' of
'LICM' can rearrangethe expressions (likely after 'Canonicalize natural
loops'),> >> and group loop-invariant parts of the expression ?
> >> Or is there another pass that I can/should enable to have this
effect ?
> >>
> >> Greetings,
> >>  Jeroen Dobbelaere
> >>
> >> ---
> >>
> >> llvm-3.1
> >>
> >>
> >> while.cond:                                       ; preds =
%sw.bb,
%sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry> >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8,
%sw.bb23
], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ]> >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [
%total.1,%sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [ %total.1, %
sw.bb ]> >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [ %4,%sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb
]> >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end ], [%inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb]> >>   %add = add i32 %total.0, %next.0
> >>   %add1 = add i32 %add, 1
> >>   %cmp = icmp ult i32 %add1, %dec
> >>   br i1 %cmp, label %while.body, label %while.cond29
> >>
> >> *** IR Dump After Reassociate expressions ***
> >> define void @foo(i32 %size, i16 signext %seed, i8* nocapture %p)
nounwind {> >> entry:
> >>   %dec = add i32 %size, -1
> >>   br label %while.cond
> >>
> >> while.cond:                                       ; preds =
%sw.bb,
%sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry> >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8,
%sw.bb23
], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ]> >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [
%total.1,%sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [ %total.1, %
sw.bb ]> >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [ %4,%sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb
]> >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end ], [%inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb]> >>   %add = add i32 %next.0, 1
> >>   %add1 = add i32 %add, %total.0
> >>   %cmp = icmp ult i32 %add1, %dec
> >>   br i1 %cmp, label %while.body, label %while.cond29
> >>
> >>
> >> *** IR Dump After Canonicalize natural loops ***
> >> while.cond:                                       ; preds
%while.cond.outer, %if.end
> >>   %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph,
%while.cond.outer ]> >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [ %seed.addr.0.ph,
%while.cond.outer ]> >>   %add = add i32 %next.0.ph, 1
> >>   %add1 = add i32 %add, %total.0
> >>   %cmp = icmp ult i32 %add1, %dec
> >>   br i1 %cmp, label %while.body, label %while.cond29.preheader
> >> [...]
> >>
> >> for.cond.for.end_crit_edge:                       ; preds =
%for.body
> >>   %split = phi i32 [ %inc, %for.body ]
> >>   br label %for.end
> >> LICM hoisting to while.cond.outer:   %add = add i32 %next.0.ph, 1
> >> LICM hoisting to while.cond.outer:   %cmp2 = icmp eq i32
%next.0.ph, 0
> >> LICM hoisting to while.cond.outer:   %cmp343 = icmp ult i32 0, %
next.0.ph> >> LICM hoisting to while.cond.outer:   %add740 = or i32 %next.0.ph,
1
> >> *** IR Dump After Loop Invariant Code Motion ***
> >> while.cond:                                       ; preds
%while.cond.outer, %if.end
> >>   %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph,
%while.cond.outer ]> >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [ %seed.addr.0.ph,
%while.cond.outer ]> >>   %add1 = add i32 %add, %total.0
> >>   %cmp = icmp ult i32 %add1, %dec
> >>   br i1 %cmp, label %while.body, label %while.cond29.preheader
> >> [...]
> >>
> >> -------
> >>
> >> llvm-3.2
> >>
> >>
> >> while.cond:                                       ; preds =
%sw.bb,
%sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry> >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end ], [%inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb]> >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [
%total.1,%sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [ %total.1, %
sw.bb ]> >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8,
%sw.bb23
], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ]> >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [ %4,%sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb
]> >>   %add = add i32 %total.0, %next.0
> >>   %add1 = add i32 %add, 1
> >>   %cmp = icmp ult i32 %add1, %dec
> >>   br i1 %cmp, label %while.body, label %while.cond29
> >>
> >> *** IR Dump After Reassociate expressions ***
> >> define void @foo(i32 %size, i16 signext %seed, i8* nocapture %p)
nounwind {> >> entry:
> >>   %dec = add i32 %size, -1
> >>   br label %while.cond
> >>
> >> while.cond:                                       ; preds =
%sw.bb,
%sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry> >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end ], [%inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb]> >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [
%total.1,%sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [ %total.1, %
sw.bb ]> >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8,
%sw.bb23
], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ]> >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [ %4,%sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb
]> >>   %add = add i32 %total.0, 1
> >>   %add1 = add i32 %add, %next.0
> >>   %cmp = icmp ult i32 %add1, %dec
> >>   br i1 %cmp, label %while.body, label %while.cond29
> >>
> >>
> >> *** IR Dump After Canonicalize natural loops ***
> >> while.cond:                                       ; preds
%while.cond.outer, %if.end
> >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [ %seed.addr.0.ph,
%while.cond.outer ]> >>   %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph,
%while.cond.outer ]> >>   %add = add i32 %total.0, 1
> >>   %add1 = add i32 %add, %next.0.ph
> >>   %cmp = icmp ult i32 %add1, %dec
> >>   br i1 %cmp, label %while.body, label %while.cond29.preheader
> >> [.....]
> >> for.cond.for.end_crit_edge:                       ; preds =
%for.body
> >>   %split = phi i32 [ %inc, %for.body ]
> >>   br label %for.end
> >> LICM hoisting to while.cond.outer:   %cmp2 = icmp eq i32
%next.0.ph, 0
> >> LICM hoisting to while.cond.outer:   %cmp366 = icmp ult i32 0, %
next.0.ph> >> LICM hoisting to while.cond.outer:   %add763 = or i32 %next.0.ph,
1
> >> *** IR Dump After Loop Invariant Code Motion ***
> >> while.cond:                                       ; preds
%while.cond.outer, %if.end
> >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [ %seed.addr.0.ph,
%while.cond.outer ]> >>   %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph,
%while.cond.outer ]> >>   %add = add i32 %total.0, 1
> >>   %add1 = add i32 %add, %next.0.ph
> >>   %cmp = icmp ult i32 %add1, %dec
> >>   br i1 %cmp, label %while.body, label %while.cond29.preheader
> >>
> >>
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
> > __-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130425/101c8a28/attachment.html>

Arnold Schwaighofer

2013-Apr-25 16:18 UTC

head link

[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'

On Apr 25, 2013, at 10:51 AM, Preston Briggs <preston.briggs at gmail.com>
wrote:
> It's an interesting problem.
> The best stuff I've seen published is by Cooper, Eckhart, &
Kennedy, in PACT '08.
> Cooper gives a nice intro in one of his lectures:
http://www.cs.rice.edu/~keith/512/2012/Lectures/26ReassocII-1up.pdf
> I can't tell, quickly, what's going on in Reassociate;
Reasscociate is using a reverse postorder numbering, my guess is that the idea
is from Briggs, Cooper ’94.
> as usual, the documentation resolutely avoids giving any credit for the
ideas.
> Why is that?
> 
> Preston
> 
> 
> > On Apr 23, 2013, at 10:37 AM, Shuxin Yang <shuxin.llvm at
gmail.com> wrote:
> >
> > > As far as I can understand of the code, the Reassociate tries to
achieve this result by its "ranking" mechanism.
> > >
> > > If it dose not, it is not hard to achieve this result, just
restructure the expression in a way such that
> > > the earlier definition of the sub-expression is permute earlier
in the resulting expr.
> > >
> > > e.g.
> > >   outer-loop1
> > >       x> > >       outer-loop2
> > >          y > > >
> > >          inner-loop3
> > >              z > > >              t = z * y * z
> > >
> > >    the RHS is better restructured into "x * y * z",
such that x * y can moved as early as possible.
> > > Restructuring expr this expr is also able to reduce the critical
path even if no sub-expr is
> > > moved out of loop.
> > >
> > >   By no means can "canonicalization" can achieve this
result, because as this transformation need
> > > some context. (It seems the name "cannonicalization" is
badly abused in LLVM; it could refers
> > > to this kind of xform. IMO, cannonicalization is sort of blind
but consistent restructuring)
> > >
> > >   BTW, I did similar stuff before in other compiler, I didn't
see any %.
> >
> > Right. Reassociate ranks expressions by their order in the IR, so
shouldn't make matters worse, but it doesn't know about loops so
can't expose more of these opportunities AFAIK.
> >
> > Jeroen, it may be worth filing a bug. I'm not sure why LSR
isn't handling it later.
> >
> > For cases that LSR doesn't handle, we've been wanting an MI
reassociate pass in the backend anyway to expose ILP. That could clean up cases
like this when it's actually on the critical path and we don't mind the
regpressure.
> >
> > -Andy
> >
> > > On 4/23/13 6:48 AM, Jeroen Dobbelaere wrote:
> > >> Hi,
> > >>
> > >> I am investigating a performance degradation between llvm-3.1
and llvm-3.2
> > >> (Note: current top-of-tree shows a similar degradation)
> > >>
> > >> One issue I see is the following:
> > >> - 'loop invariant code motion' seems to be depending
on the result of the 'reassociate expression' pass:
> > >>
> > >> In the samples below I observer the following behavior:
> > >>
> > >> Both start with the same expression:
> > >>   %add = add i32 %total.0, %next.0
> > >>   %add1 = add i32 %add, 1
> > >>
> > >> -LLVM-3.1 converts this into:
> > >> --after Reassociate expressions:
> > >>   %add = add i32 %next.0, 1
> > >>   %add1 = add i32 %add, %total.0
> > >> -- after Canonicalize natural loops:
> > >>   %add = add i32 %next.0.ph, 1
> > >>   %add1 = add i32 %add, %total.0
> > >> -- and during 'loop invariant code motion':
> > >> LICM hoisting to while.cond.outer:   %add = add i32
%next.0.ph, 1
> > >>
> > >> - LLVM-3.2 converts it into:  (Notice the reversion of
%total.0 and %next.0)
> > >> --after Reassociate expressions:
> > >>   %add = add i32 %total.0, 1
> > >>   %add1 = add i32 %add, %next.0
> > >> -- after Canonicalize natural loops:
> > >>   %add = add i32 %total.0, 1
> > >>   %add1 = add i32 %add, %next.0.ph
> > >> -- and during 'loop invariant code motion':
> > >> -> %next.0.ph and '1' are loop invariant, but
because those are spreaded, they are not moved any more.
> > >>
> > >>
> > >> Is there a way that 'Reassociate expressions' of
'LICM' can rearrange the expressions (likely after 'Canonicalize
natural loops'),
> > >> and group loop-invariant parts of the expression ?
> > >> Or is there another pass that I can/should enable to have
this effect ?
> > >>
> > >> Greetings,
> > >>  Jeroen Dobbelaere
> > >>
> > >> ---
> > >>
> > >> llvm-3.1
> > >>
> > >>
> > >> while.cond:                                       ; preds =
%sw.bb, %sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry
> > >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8,
%sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ]
> > >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [
%total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [
%total.1, %sw.bb ]
> > >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [
%4, %sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb ]
> > >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end
], [ %inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb ]
> > >>   %add = add i32 %total.0, %next.0
> > >>   %add1 = add i32 %add, 1
> > >>   %cmp = icmp ult i32 %add1, %dec
> > >>   br i1 %cmp, label %while.body, label %while.cond29
> > >>
> > >> *** IR Dump After Reassociate expressions ***
> > >> define void @foo(i32 %size, i16 signext %seed, i8* nocapture
%p) nounwind {
> > >> entry:
> > >>   %dec = add i32 %size, -1
> > >>   br label %while.cond
> > >>
> > >> while.cond:                                       ; preds =
%sw.bb, %sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry
> > >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8,
%sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ]
> > >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [
%total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [
%total.1, %sw.bb ]
> > >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [
%4, %sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb ]
> > >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end
], [ %inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb ]
> > >>   %add = add i32 %next.0, 1
> > >>   %add1 = add i32 %add, %total.0
> > >>   %cmp = icmp ult i32 %add1, %dec
> > >>   br i1 %cmp, label %while.body, label %while.cond29
> > >>
> > >>
> > >> *** IR Dump After Canonicalize natural loops ***
> > >> while.cond:                                       ; preds =
%while.cond.outer, %if.end
> > >>   %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph,
%while.cond.outer ]
> > >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [
%seed.addr.0.ph, %while.cond.outer ]
> > >>   %add = add i32 %next.0.ph, 1
> > >>   %add1 = add i32 %add, %total.0
> > >>   %cmp = icmp ult i32 %add1, %dec
> > >>   br i1 %cmp, label %while.body, label
%while.cond29.preheader
> > >> [...]
> > >>
> > >> for.cond.for.end_crit_edge:                       ; preds =
%for.body
> > >>   %split = phi i32 [ %inc, %for.body ]
> > >>   br label %for.end
> > >> LICM hoisting to while.cond.outer:   %add = add i32
%next.0.ph, 1
> > >> LICM hoisting to while.cond.outer:   %cmp2 = icmp eq i32
%next.0.ph, 0
> > >> LICM hoisting to while.cond.outer:   %cmp343 = icmp ult i32
0, %next.0.ph
> > >> LICM hoisting to while.cond.outer:   %add740 = or i32
%next.0.ph, 1
> > >> *** IR Dump After Loop Invariant Code Motion ***
> > >> while.cond:                                       ; preds =
%while.cond.outer, %if.end
> > >>   %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph,
%while.cond.outer ]
> > >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [
%seed.addr.0.ph, %while.cond.outer ]
> > >>   %add1 = add i32 %add, %total.0
> > >>   %cmp = icmp ult i32 %add1, %dec
> > >>   br i1 %cmp, label %while.body, label
%while.cond29.preheader
> > >> [...]
> > >>
> > >> -------
> > >>
> > >> llvm-3.2
> > >>
> > >>
> > >> while.cond:                                       ; preds =
%sw.bb, %sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry
> > >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end
], [ %inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb ]
> > >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [
%total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [
%total.1, %sw.bb ]
> > >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8,
%sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ]
> > >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [
%4, %sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb ]
> > >>   %add = add i32 %total.0, %next.0
> > >>   %add1 = add i32 %add, 1
> > >>   %cmp = icmp ult i32 %add1, %dec
> > >>   br i1 %cmp, label %while.body, label %while.cond29
> > >>
> > >> *** IR Dump After Reassociate expressions ***
> > >> define void @foo(i32 %size, i16 signext %seed, i8* nocapture
%p) nounwind {
> > >> entry:
> > >>   %dec = add i32 %size, -1
> > >>   br label %while.cond
> > >>
> > >> while.cond:                                       ; preds =
%sw.bb, %sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry
> > >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9, %if.end
], [ %inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb ]
> > >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end ], [
%total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [
%total.1, %sw.bb ]
> > >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ], [ 8,
%sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb ]
> > >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end ], [
%4, %sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb ]
> > >>   %add = add i32 %total.0, 1
> > >>   %add1 = add i32 %add, %next.0
> > >>   %cmp = icmp ult i32 %add1, %dec
> > >>   br i1 %cmp, label %while.body, label %while.cond29
> > >>
> > >>
> > >> *** IR Dump After Canonicalize natural loops ***
> > >> while.cond:                                       ; preds =
%while.cond.outer, %if.end
> > >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [
%seed.addr.0.ph, %while.cond.outer ]
> > >>   %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph,
%while.cond.outer ]
> > >>   %add = add i32 %total.0, 1
> > >>   %add1 = add i32 %add, %next.0.ph
> > >>   %cmp = icmp ult i32 %add1, %dec
> > >>   br i1 %cmp, label %while.body, label
%while.cond29.preheader
> > >> [.....]
> > >> for.cond.for.end_crit_edge:                       ; preds =
%for.body
> > >>   %split = phi i32 [ %inc, %for.body ]
> > >>   br label %for.end
> > >> LICM hoisting to while.cond.outer:   %cmp2 = icmp eq i32
%next.0.ph, 0
> > >> LICM hoisting to while.cond.outer:   %cmp366 = icmp ult i32
0, %next.0.ph
> > >> LICM hoisting to while.cond.outer:   %add763 = or i32
%next.0.ph, 1
> > >> *** IR Dump After Loop Invariant Code Motion ***
> > >> while.cond:                                       ; preds =
%while.cond.outer, %if.end
> > >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [
%seed.addr.0.ph, %while.cond.outer ]
> > >>   %total.0 = phi i32 [ %total.1, %if.end ], [ %total.0.ph,
%while.cond.outer ]
> > >>   %add = add i32 %total.0, 1
> > >>   %add1 = add i32 %add, %next.0.ph
> > >>   %cmp = icmp ult i32 %add1, %dec
> > >>   br i1 %cmp, label %while.body, label
%while.cond29.preheader
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > >> LLVM Developers mailing list
> > >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > >
> > > __
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Preston Briggs

2013-Apr-25 19:18 UTC

head link

[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'

On Thu, Apr 25, 2013 at 9:18 AM, Arnold Schwaighofer <
aschwaighofer at apple.com> wrote:>
> On Apr 25, 2013, at 10:51 AM, Preston Briggs <preston.briggs at
gmail.com>
wrote:>
> > It's an interesting problem.
> > The best stuff I've seen published is by Cooper, Eckhart, &
Kennedy, in
PACT '08.> > Cooper gives a nice intro in one of his lectures:
http://www.cs.rice.edu/~keith/512/2012/Lectures/26ReassocII-1up.pdf> > I can't tell, quickly, what's going on in Reassociate;
>
> Reasscociate is using a reverse postorder numbering, my guess is that theidea is from Briggs, Cooper ’94.

I believe some of the ideas are from Briggs & Cooper,
but there's other stuff in there I've never seen, e.g., Carmichael
Numbers.
I mean, we can all google Carmichael Numbers, but I had never heard of them
in relation to this problem.

Preston


>
> > as usual, the documentation resolutely avoids giving any credit for
the
ideas.> > Why is that?
> >
> > Preston
> >
> >
> > > On Apr 23, 2013, at 10:37 AM, Shuxin Yang <shuxin.llvm at
gmail.com>
wrote:> > >
> > > > As far as I can understand of the code, the Reassociate
tries toachieve this result by its "ranking"
mechanism.> > > >
> > > > If it dose not, it is not hard to achieve this result, just
restructure the expression in a way such that> > > > the earlier definition of the sub-expression is permute
earlier in
the resulting expr.> > > >
> > > > e.g.
> > > >   outer-loop1
> > > >       x> > > >       outer-loop2
> > > >          y > > > >
> > > >          inner-loop3
> > > >              z > > > >              t = z * y *
z
> > > >
> > > >    the RHS is better restructured into "x * y *
z", such that x * y
can moved as early as possible.> > > > Restructuring expr this expr is also able to reduce the
critical
path even if no sub-expr is> > > > moved out of loop.
> > > >
> > > >   By no means can "canonicalization" can achieve
this result,
because as this transformation need> > > > some context. (It seems the name
"cannonicalization" is badly
abused in LLVM; it could refers> > > > to this kind of xform. IMO, cannonicalization is sort of
blind but
consistent restructuring)> > > >
> > > >   BTW, I did similar stuff before in other compiler, I
didn't see
any %.> > >
> > > Right. Reassociate ranks expressions by their order in the IR, soshouldn't make matters worse, but it doesn't know about loops so
can't
expose more of these opportunities AFAIK.> > >
> > > Jeroen, it may be worth filing a bug. I'm not sure why LSR
isn't
handling it later.> > >
> > > For cases that LSR doesn't handle, we've been wanting an
MIreassociate pass in the backend anyway to expose ILP. That could clean up
cases like this when it's actually on the critical path and we don't
mind
the regpressure.> > >
> > > -Andy
> > >
> > > > On 4/23/13 6:48 AM, Jeroen Dobbelaere wrote:
> > > >> Hi,
> > > >>
> > > >> I am investigating a performance degradation between
llvm-3.1 and
llvm-3.2> > > >> (Note: current top-of-tree shows a similar degradation)
> > > >>
> > > >> One issue I see is the following:
> > > >> - 'loop invariant code motion' seems to be
depending on the result
of the 'reassociate expression' pass:> > > >>
> > > >> In the samples below I observer the following behavior:
> > > >>
> > > >> Both start with the same expression:
> > > >>   %add = add i32 %total.0, %next.0
> > > >>   %add1 = add i32 %add, 1
> > > >>
> > > >> -LLVM-3.1 converts this into:
> > > >> --after Reassociate expressions:
> > > >>   %add = add i32 %next.0, 1
> > > >>   %add1 = add i32 %add, %total.0
> > > >> -- after Canonicalize natural loops:
> > > >>   %add = add i32 %next.0.ph, 1
> > > >>   %add1 = add i32 %add, %total.0
> > > >> -- and during 'loop invariant code motion':
> > > >> LICM hoisting to while.cond.outer:   %add = add i32
%next.0.ph, 1
> > > >>
> > > >> - LLVM-3.2 converts it into:  (Notice the reversion of
%total.0
and %next.0)> > > >> --after Reassociate expressions:
> > > >>   %add = add i32 %total.0, 1
> > > >>   %add1 = add i32 %add, %next.0
> > > >> -- after Canonicalize natural loops:
> > > >>   %add = add i32 %total.0, 1
> > > >>   %add1 = add i32 %add, %next.0.ph
> > > >> -- and during 'loop invariant code motion':
> > > >> -> %next.0.ph and '1' are loop invariant, but
because those are
spreaded, they are not moved any more.> > > >>
> > > >>
> > > >> Is there a way that 'Reassociate expressions' of
'LICM' canrearrange the expressions (likely after 'Canonicalize natural
loops'),> > > >> and group loop-invariant parts of the expression ?
> > > >> Or is there another pass that I can/should enable to
have this
effect ?> > > >>
> > > >> Greetings,
> > > >>  Jeroen Dobbelaere
> > > >>
> > > >> ---
> > > >>
> > > >> llvm-3.1
> > > >>
> > > >>
> > > >> while.cond:                                       ;
preds = %sw.bb,
%sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry> > > >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ],
[ 8,%sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb
]> > > >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end
], [%total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [
%total.1, %sw.bb ]> > > >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end
], [ %4,%sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb
]> > > >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9,
%if.end ], [%inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb]> > > >>   %add = add i32 %total.0, %next.0
> > > >>   %add1 = add i32 %add, 1
> > > >>   %cmp = icmp ult i32 %add1, %dec
> > > >>   br i1 %cmp, label %while.body, label %while.cond29
> > > >>
> > > >> *** IR Dump After Reassociate expressions ***
> > > >> define void @foo(i32 %size, i16 signext %seed, i8*
nocapture %p)
nounwind {> > > >> entry:
> > > >>   %dec = add i32 %size, -1
> > > >>   br label %while.cond
> > > >>
> > > >> while.cond:                                       ;
preds = %sw.bb,
%sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry> > > >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ],
[ 8,%sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb
]> > > >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end
], [%total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [
%total.1, %sw.bb ]> > > >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end
], [ %4,%sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb
]> > > >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9,
%if.end ], [%inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb]> > > >>   %add = add i32 %next.0, 1
> > > >>   %add1 = add i32 %add, %total.0
> > > >>   %cmp = icmp ult i32 %add1, %dec
> > > >>   br i1 %cmp, label %while.body, label %while.cond29
> > > >>
> > > >>
> > > >> *** IR Dump After Canonicalize natural loops ***
> > > >> while.cond:                                       ;
preds %while.cond.outer, %if.end
> > > >>   %total.0 = phi i32 [ %total.1, %if.end ], [
%total.0.ph,
%while.cond.outer ]> > > >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [
%seed.addr.0.ph,
%while.cond.outer ]> > > >>   %add = add i32 %next.0.ph, 1
> > > >>   %add1 = add i32 %add, %total.0
> > > >>   %cmp = icmp ult i32 %add1, %dec
> > > >>   br i1 %cmp, label %while.body, label
%while.cond29.preheader
> > > >> [...]
> > > >>
> > > >> for.cond.for.end_crit_edge:                       ;
preds %for.body
> > > >>   %split = phi i32 [ %inc, %for.body ]
> > > >>   br label %for.end
> > > >> LICM hoisting to while.cond.outer:   %add = add i32
%next.0.ph, 1
> > > >> LICM hoisting to while.cond.outer:   %cmp2 = icmp eq i32
%next.0.ph,
0> > > >> LICM hoisting to while.cond.outer:   %cmp343 = icmp ult
i32 0, %
next.0.ph> > > >> LICM hoisting to while.cond.outer:   %add740 = or i32
%next.0.ph, 1
> > > >> *** IR Dump After Loop Invariant Code Motion ***
> > > >> while.cond:                                       ;
preds %while.cond.outer, %if.end
> > > >>   %total.0 = phi i32 [ %total.1, %if.end ], [
%total.0.ph,
%while.cond.outer ]> > > >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [
%seed.addr.0.ph,
%while.cond.outer ]> > > >>   %add1 = add i32 %add, %total.0
> > > >>   %cmp = icmp ult i32 %add1, %dec
> > > >>   br i1 %cmp, label %while.body, label
%while.cond29.preheader
> > > >> [...]
> > > >>
> > > >> -------
> > > >>
> > > >> llvm-3.2
> > > >>
> > > >>
> > > >> while.cond:                                       ;
preds = %sw.bb,
%sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry> > > >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9,
%if.end ], [%inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb]> > > >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end
], [%total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [
%total.1, %sw.bb ]> > > >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ],
[ 8,%sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb
]> > > >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end
], [ %4,%sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb
]> > > >>   %add = add i32 %total.0, %next.0
> > > >>   %add1 = add i32 %add, 1
> > > >>   %cmp = icmp ult i32 %add1, %dec
> > > >>   br i1 %cmp, label %while.body, label %while.cond29
> > > >>
> > > >> *** IR Dump After Reassociate expressions ***
> > > >> define void @foo(i32 %size, i16 signext %seed, i8*
nocapture %p)
nounwind {> > > >> entry:
> > > >>   %dec = add i32 %size, -1
> > > >>   br label %while.cond
> > > >>
> > > >> while.cond:                                       ;
preds = %sw.bb,
%sw.bb13, %sw.bb18, %sw.bb23, %if.end, %entry> > > >>   %seed.addr.0 = phi i16 [ %seed, %entry ], [ %inc9,
%if.end ], [%inc9, %sw.bb23 ], [ %inc9, %sw.bb18 ], [ %inc9, %sw.bb13 ], [ %inc9,
%sw.bb]> > > >>   %total.0 = phi i32 [ 0, %entry ], [ %total.1, %if.end
], [%total.1, %sw.bb23 ], [ %total.1, %sw.bb18 ], [ %total.1, %sw.bb13 ], [
%total.1, %sw.bb ]> > > >>   %next.0 = phi i32 [ 0, %entry ], [ %next.0, %if.end ],
[ 8,%sw.bb23 ], [ 8, %sw.bb18 ], [ 8, %sw.bb13 ], [ 4, %sw.bb
]> > > >>   %buf.0 = phi i8* [ null, %entry ], [ %buf.0, %if.end
], [ %4,%sw.bb23 ], [ %3, %sw.bb18 ], [ %2, %sw.bb13 ], [ %1, %sw.bb
]> > > >>   %add = add i32 %total.0, 1
> > > >>   %add1 = add i32 %add, %next.0
> > > >>   %cmp = icmp ult i32 %add1, %dec
> > > >>   br i1 %cmp, label %while.body, label %while.cond29
> > > >>
> > > >>
> > > >> *** IR Dump After Canonicalize natural loops ***
> > > >> while.cond:                                       ;
preds %while.cond.outer, %if.end
> > > >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [
%seed.addr.0.ph,
%while.cond.outer ]> > > >>   %total.0 = phi i32 [ %total.1, %if.end ], [
%total.0.ph,
%while.cond.outer ]> > > >>   %add = add i32 %total.0, 1
> > > >>   %add1 = add i32 %add, %next.0.ph
> > > >>   %cmp = icmp ult i32 %add1, %dec
> > > >>   br i1 %cmp, label %while.body, label
%while.cond29.preheader
> > > >> [.....]
> > > >> for.cond.for.end_crit_edge:                       ;
preds %for.body
> > > >>   %split = phi i32 [ %inc, %for.body ]
> > > >>   br label %for.end
> > > >> LICM hoisting to while.cond.outer:   %cmp2 = icmp eq i32
%next.0.ph,
0> > > >> LICM hoisting to while.cond.outer:   %cmp366 = icmp ult
i32 0, %
next.0.ph> > > >> LICM hoisting to while.cond.outer:   %add763 = or i32
%next.0.ph, 1
> > > >> *** IR Dump After Loop Invariant Code Motion ***
> > > >> while.cond:                                       ;
preds %while.cond.outer, %if.end
> > > >>   %seed.addr.0 = phi i16 [ %inc9, %if.end ], [
%seed.addr.0.ph,
%while.cond.outer ]> > > >>   %total.0 = phi i32 [ %total.1, %if.end ], [
%total.0.ph,
%while.cond.outer ]> > > >>   %add = add i32 %total.0, 1
> > > >>   %add1 = add i32 %add, %next.0.ph
> > > >>   %cmp = icmp ult i32 %add1, %dec
> > > >>   br i1 %cmp, label %while.body, label
%while.cond29.preheader
> > > >>
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> LLVM Developers mailing list
> > > >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > > >
> > > > __
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130425/23798f17/attachment.html>

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Apr 2013 - [LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'

[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'

[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'

[LLVMdev] 'loop invariant code motion' and 'Reassociate Expression'

Reasonably Related Threads