Preston Briggs via llvm-dev
2015-Dec-09 17:00 UTC
[llvm-dev] persuading licm to do the right thing
I suppose your view is reasonable, and perhaps common. My own "taste" has always preferred machine-independent code that is as simple as possible, so GEPs reduced to nothing more than an add, etc, i.e., quite risc-like. Then optimize it to reduce the total number of operations (as best we can), then raise the level during instruction selection, taking advantage of available instructions. I guess my whole scheme of using opt in this context is probably wrong headed. Thanks On Wed, Dec 9, 2015 at 8:45 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:> > On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at reservoir.com> wrote: > > I'm trying to make the IR "better", in a machine-independent fashion, > without having to do any lowering. > > > The question is “would the IR be more canonical” with the representation > you suggest? Why would the optimizer benefit from this representation > instead of the current one in general? > Right now this GEP reads as an offset from a constant global, which seems > pretty optimal to me. > > My impression is that when you reach a point where the “better” is target > specific, this is part of the lowering (I’m using lowering in the sense > that you go away from the canonical representation the optimizer expects). > I believe it is pretty common that targets need to do this kind of lowering. > > — > Mehdi > > > > I've written code that rewrites GEPs as simple adds and multiplies, > which helps a lot, but there's still some sort of re-canonicalization > that's getting in my way. Is there perhaps a way to suppress it? > > > Thanks, > Preston > > > On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at apple.com> wrote: > >> I guess is has to be done as part of the lowering for such a target, >> either during CodegenPrepare or during something like MachineLICM. >> >> — >> Mehdi >> >> >> >> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at reservoir.com> wrote: >> >> On some targets with limited addressing modes, >> getting that 64-bit relocatable but loop-invariant value into a register >> requires several instructions. I'd like those several instruction outside >> the loop, where they belong. >> >> Yes, my experience is that something (I assume instcombine) >> recanonicalizes. >> >> Thanks, >> Preston >> >> >> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at apple.com> >> wrote: >> >>> Hi Preston, >>> >>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>> When I compile two different modules using >>> >>> clang -O -S -emit-llvm >>> >>> >>> I get different .ll files, no surprise. >>> >>> The first looks like >>> >>> double *v; >>> >>> double zap(long n) { >>> double sum = 0; >>> for (long i = 0; i < n; i++) >>> sum += v[i]; >>> return sum; >>> } >>> >>> >>> yielding >>> >>> @v = common global double* null, align 8 >>> >>> ; Function Attrs: nounwind readonly uwtable >>> define double @zap(i64 %n) #0 { >>> entry: >>> %cmp4 = icmp sgt i64 %n, 0 >>> br i1 %cmp4, label %for.body.lr.ph, label %for.end >>> >>> for.body.lr.ph: ; preds = %entry >>> %0 = load double** @v, align 8, !tbaa !1 >>> br label %for.body >>> >>> for.body: ; preds = %for.body, % >>> for.body.lr.ph >>> %i.06 = phi i64 [ 0, %for.body.lr.ph ], [ %inc, %for.body ] >>> %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph ], [ %add, >>> %for.body ] >>> %arrayidx = getelementptr inbounds double* %0, i64 %i.06 >>> %1 = load double* %arrayidx, align 8, !tbaa !5 >>> %add = fadd double %sum.05, %1 >>> %inc = add nsw i64 %i.06, 1 >>> >>> %exitcond = icmp eq i64 %inc, %n >>> br i1 %exitcond, label %for.end, label %for.body >>> >>> for.end: ; preds = %for.body, >>> %entry >>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ] >>> ret double %sum.0.lcssa >>> } >>> >>> >>> and the second looks like >>> >>> double v[10000]; >>> >>> double zap(long n) { >>> double sum = 0; >>> for (long i = 0; i < n; i++) >>> sum += v[i]; >>> return sum; >>> } >>> >>> >>> yielding >>> >>> ; ModuleID = 'z.c' >>> target datalayout >>> "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128" >>> target triple = "x86_64-unknown-linux-gnu" >>> >>> @v = common global [10000 x double] zeroinitializer, align 16 >>> >>> ; Function Attrs: nounwind readonly uwtable >>> define double @zap(i64 %n) #0 { >>> entry: >>> %cmp4 = icmp sgt i64 %n, 0 >>> br i1 %cmp4, label %for.body, label %for.end >>> >>> for.body: ; preds = %entry, >>> %for.body >>> %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ] >>> %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry ] >>> %arrayidx = getelementptr inbounds [10000 x double]* @v, i64 0, i64 >>> %i.06 >>> %0 = load double* %arrayidx, align 8, !tbaa !1 >>> %add = fadd double %sum.05, %0 >>> %inc = add nsw i64 %i.06, 1 >>> %exitcond = icmp eq i64 %inc, %n >>> br i1 %exitcond, label %for.end, label %for.body >>> >>> for.end: ; preds = %for.body, >>> %entry >>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ] >>> ret double %sum.0.lcssa >>> } >>> >>> attributes #0 = { nounwind readonly uwtable "less-precise-fpmad"="false" >>> "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" >>> "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" >>> "unsafe-fp-math"="false" "use-soft-float"="false" } >>> >>> !llvm.ident = !{!0} >>> >>> !0 = metadata !{metadata !"Clang Front-End version 3.4.1 >>> (tags/RELEASE_34/final)"} >>> !1 = metadata !{metadata !2, metadata !2, i64 0} >>> !2 = metadata !{metadata !"double", metadata !3, i64 0} >>> !3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0} >>> !4 = metadata !{metadata !"Simple C/C++ TBAA"} >>> >>> >>> (I included all the metadata and such for the 2nd case, on the off >>> chance it matters.) >>> >>> Is there any way I can convince licm (or something) to rip open the GEP >>> and hoist the reference to @v outside the loop, similar to the first >>> example? >>> >>> >>> >>> I believe that in the second case, there is no need to load the address >>> of v as it is constant. However you have a constant address to an array, >>> which is represented by [10000 x double]* @v in the IR, which requires to >>> use the two-level GEP. >>> >>> You “could” manage to represent it this way: >>> >>> define double @zap(i64 %n) #0 { >>> entry: >>> %cmp6 = icmp sgt i64 %n, 0 >>> %hoisted = bitcast [10000 x double]* @v to double* >>> br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup >>> >>> for.body.preheader: ; preds = %entry >>> br label %for.body >>> >>> for.cond.cleanup.loopexit: ; preds = %for.body >>> %add.lcssa = phi double [ %add, %for.body ] >>> br label %for.cond.cleanup >>> >>> for.cond.cleanup: ; preds >>> %for.cond.cleanup.loopexit, %entry >>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, >>> %for.cond.cleanup.loopexit ] >>> ret double %sum.0.lcssa >>> >>> for.body: ; preds >>> %for.body.preheader, %for.body >>> %i.08 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader ] >>> %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00, >>> %for.body.preheader ] >>> %arrayidx = getelementptr double, double* %hoisted, i64 %i.08 >>> %0 = load double, double* %arrayidx, align 8, !tbaa !2 >>> %add = fadd double %sum.07, %0 >>> %inc = add nuw nsw i64 %i.08, 1 >>> %exitcond = icmp eq i64 %inc, %n >>> br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body >>> } >>> >>> >>> However instcombine will recanonicalize it like it was originally. >>> >>> Since it is a GEP that operate on a constant address, this shouldn’t >>> matter, why would you want to split this? >>> >>> Best, >>> >>> — >>> Mehdi >>> >>> >> >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/9b036d71/attachment.html>
Mehdi Amini via llvm-dev
2015-Dec-09 17:17 UTC
[llvm-dev] persuading licm to do the right thing
> On Dec 9, 2015, at 9:00 AM, Preston Briggs <briggs at reservoir.com> wrote: > > I suppose your view is reasonable, and perhaps common. > My own "taste" has always preferred machine-independent code > that is as simple as possible, so GEPs reduced to nothing more than an > add, etc, i.e., quite risc-like. Then optimize it to reduce the total number > of operations (as best we can), then raise the level during instruction > selection, taking advantage of available instructions.I’m not sure I see something related to risc-like here, it seems to me that your problem is not GEP vs ADD but rather that you want to expose a mode where global addresses need to be loaded and can’t be referenced directly. (Unless I misunderstood the problem which is very possible as well) Maybe you could do this with a transformation that would put all the global variable addresses in a global array and reference them through the array. That’s the only workaround I could see. — Mehdi> > I guess my whole scheme of using opt in this context is probably wrong headed. > > Thanks > > > > On Wed, Dec 9, 2015 at 8:45 AM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote: > >> On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at reservoir.com <mailto:briggs at reservoir.com>> wrote: >> >> I'm trying to make the IR "better", in a machine-independent fashion, >> without having to do any lowering. > > The question is “would the IR be more canonical” with the representation you suggest? Why would the optimizer benefit from this representation instead of the current one in general? > Right now this GEP reads as an offset from a constant global, which seems pretty optimal to me. > > My impression is that when you reach a point where the “better” is target specific, this is part of the lowering (I’m using lowering in the sense that you go away from the canonical representation the optimizer expects). I believe it is pretty common that targets need to do this kind of lowering. > > — > Mehdi > > >> >> I've written code that rewrites GEPs as simple adds and multiplies, >> which helps a lot, but there's still some sort of re-canonicalization >> that's getting in my way. Is there perhaps a way to suppress it? >> >> Thanks, >> Preston >> >> >> On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote: >> I guess is has to be done as part of the lowering for such a target, either during CodegenPrepare or during something like MachineLICM. >> >> — >> Mehdi >> >> >> >>> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at reservoir.com <mailto:briggs at reservoir.com>> wrote: >>> >>> On some targets with limited addressing modes, >>> getting that 64-bit relocatable but loop-invariant value into a register >>> requires several instructions. I'd like those several instruction outside >>> the loop, where they belong. >>> >>> Yes, my experience is that something (I assume instcombine) recanonicalizes. >>> >>> Thanks, >>> Preston >>> >>> >>> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote: >>> Hi Preston, >>> >>>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>>> >>>> When I compile two different modules using >>>> >>>> clang -O -S -emit-llvm >>>> >>>> I get different .ll files, no surprise. >>>> >>>> The first looks like >>>> >>>> double *v; >>>> >>>> double zap(long n) { >>>> double sum = 0; >>>> for (long i = 0; i < n; i++) >>>> sum += v[i]; >>>> return sum; >>>> } >>>> >>>> yielding >>>> >>>> @v = common global double* null, align 8 >>>> >>>> ; Function Attrs: nounwind readonly uwtable >>>> define double @zap(i64 %n) #0 { >>>> entry: >>>> %cmp4 = icmp sgt i64 %n, 0 >>>> br i1 %cmp4, label %for.body.lr.ph <http://for.body.lr.ph/>, label %for.end >>>> >>>> for.body.lr.ph <http://for.body.lr.ph/>: ; preds = %entry >>>> %0 = load double** @v, align 8, !tbaa !1 >>>> br label %for.body >>>> >>>> for.body: ; preds = %for.body, %for.body.lr.ph <http://for.body.lr.ph/> >>>> %i.06 = phi i64 [ 0, %for.body.lr.ph <http://for.body.lr.ph/> ], [ %inc, %for.body ] >>>> %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph <http://for.body.lr.ph/> ], [ %add, %for.body ] >>>> %arrayidx = getelementptr inbounds double* %0, i64 %i.06 >>>> %1 = load double* %arrayidx, align 8, !tbaa !5 >>>> %add = fadd double %sum.05, %1 >>>> %inc = add nsw i64 %i.06, 1 >>>> >>>> %exitcond = icmp eq i64 %inc, %n >>>> br i1 %exitcond, label %for.end, label %for.body >>>> >>>> for.end: ; preds = %for.body, %entry >>>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ] >>>> ret double %sum.0.lcssa >>>> } >>>> >>>> and the second looks like >>>> >>>> double v[10000]; >>>> >>>> double zap(long n) { >>>> double sum = 0; >>>> for (long i = 0; i < n; i++) >>>> sum += v[i]; >>>> return sum; >>>> } >>>> >>>> yielding >>>> >>>> ; ModuleID = 'z.c' >>>> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128" >>>> target triple = "x86_64-unknown-linux-gnu" >>>> >>>> @v = common global [10000 x double] zeroinitializer, align 16 >>>> >>>> ; Function Attrs: nounwind readonly uwtable >>>> define double @zap(i64 %n) #0 { >>>> entry: >>>> %cmp4 = icmp sgt i64 %n, 0 >>>> br i1 %cmp4, label %for.body, label %for.end >>>> >>>> for.body: ; preds = %entry, %for.body >>>> %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ] >>>> %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry ] >>>> %arrayidx = getelementptr inbounds [10000 x double]* @v, i64 0, i64 %i.06 >>>> %0 = load double* %arrayidx, align 8, !tbaa !1 >>>> %add = fadd double %sum.05, %0 >>>> %inc = add nsw i64 %i.06, 1 >>>> %exitcond = icmp eq i64 %inc, %n >>>> br i1 %exitcond, label %for.end, label %for.body >>>> >>>> for.end: ; preds = %for.body, %entry >>>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ] >>>> ret double %sum.0.lcssa >>>> } >>>> >>>> attributes #0 = { nounwind readonly uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } >>>> >>>> !llvm.ident = !{!0} >>>> >>>> !0 = metadata !{metadata !"Clang Front-End version 3.4.1 (tags/RELEASE_34/final)"} >>>> !1 = metadata !{metadata !2, metadata !2, i64 0} >>>> !2 = metadata !{metadata !"double", metadata !3, i64 0} >>>> !3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0} >>>> !4 = metadata !{metadata !"Simple C/C++ TBAA"} >>>> >>>> (I included all the metadata and such for the 2nd case, on the off chance it matters.) >>>> >>>> Is there any way I can convince licm (or something) to rip open the GEP and hoist the reference to @v outside the loop, similar to the first example? >>> >>> >>> I believe that in the second case, there is no need to load the address of v as it is constant. However you have a constant address to an array, which is represented by [10000 x double]* @v in the IR, which requires to use the two-level GEP. >>> >>> You “could” manage to represent it this way: >>> >>> define double @zap(i64 %n) #0 { >>> entry: >>> %cmp6 = icmp sgt i64 %n, 0 >>> %hoisted = bitcast [10000 x double]* @v to double* >>> br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup >>> >>> for.body.preheader: ; preds = %entry >>> br label %for.body >>> >>> for.cond.cleanup.loopexit: ; preds = %for.body >>> %add.lcssa = phi double [ %add, %for.body ] >>> br label %for.cond.cleanup >>> >>> for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry >>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %for.cond.cleanup.loopexit ] >>> ret double %sum.0.lcssa >>> >>> for.body: ; preds = %for.body.preheader, %for.body >>> %i.08 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader ] >>> %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00, %for.body.preheader ] >>> %arrayidx = getelementptr double, double* %hoisted, i64 %i.08 >>> %0 = load double, double* %arrayidx, align 8, !tbaa !2 >>> %add = fadd double %sum.07, %0 >>> %inc = add nuw nsw i64 %i.08, 1 >>> %exitcond = icmp eq i64 %inc, %n >>> br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body >>> } >>> >>> >>> However instcombine will recanonicalize it like it was originally. >>> >>> Since it is a GEP that operate on a constant address, this shouldn’t matter, why would you want to split this? >>> >>> Best, >>> >>> — >>> Mehdi >>> >>> >> >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/d9bb750b/attachment-0001.html>
Preston Briggs via llvm-dev
2015-Dec-09 19:16 UTC
[llvm-dev] persuading licm to do the right thing
A GEP can represent a potentially large tree of instructions. Seems like all the sub-trees are hidden from optimization; that is, I never see licm or value numbering doing anything with them. If I rewrite the GEPs as lots of little adds and multiplies, then opt will do a better job (I speculate this happens during lowering). One of the computations that's hidden in the GEP in my example is the non-zero effort required to get the value of a label into a register. I'd like to expose that effort to the optimizer; instead, it seems hidden, papered over with the GEP. Your point about putting labels in global variables seems to work if I do it by hand. Kind of embarassing though, don't you think, introducing an indirection to achieve better code? Thanks, Preston On Wed, Dec 9, 2015 at 9:17 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:> > On Dec 9, 2015, at 9:00 AM, Preston Briggs <briggs at reservoir.com> wrote: > > I suppose your view is reasonable, and perhaps common. > My own "taste" has always preferred machine-independent code > that is as simple as possible, so GEPs reduced to nothing more than an > add, etc, i.e., quite risc-like. Then optimize it to reduce the total > number > of operations (as best we can), then raise the level during instruction > selection, taking advantage of available instructions. > > > I’m not sure I see something related to risc-like here, it seems to me > that your problem is not GEP vs ADD but rather that you want to expose a > mode where global addresses need to be loaded and can’t be referenced > directly. > (Unless I misunderstood the problem which is very possible as well) > > Maybe you could do this with a transformation that would put all the > global variable addresses in a global array and reference them through the > array. That’s the only workaround I could see. > > — > Mehdi > > > > > I guess my whole scheme of using opt in this context is probably wrong > headed. > > Thanks > > > > On Wed, Dec 9, 2015 at 8:45 AM, Mehdi Amini <mehdi.amini at apple.com> wrote: > >> >> On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at reservoir.com> wrote: >> >> I'm trying to make the IR "better", in a machine-independent fashion, >> without having to do any lowering. >> >> >> The question is “would the IR be more canonical” with the representation >> you suggest? Why would the optimizer benefit from this representation >> instead of the current one in general? >> Right now this GEP reads as an offset from a constant global, which seems >> pretty optimal to me. >> >> My impression is that when you reach a point where the “better” is target >> specific, this is part of the lowering (I’m using lowering in the sense >> that you go away from the canonical representation the optimizer expects). >> I believe it is pretty common that targets need to do this kind of lowering. >> >> — >> Mehdi >> >> >> >> I've written code that rewrites GEPs as simple adds and multiplies, >> which helps a lot, but there's still some sort of re-canonicalization >> that's getting in my way. Is there perhaps a way to suppress it? >> >> >> Thanks, >> Preston >> >> >> On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at apple.com> >> wrote: >> >>> I guess is has to be done as part of the lowering for such a target, >>> either during CodegenPrepare or during something like MachineLICM. >>> >>> — >>> Mehdi >>> >>> >>> >>> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at reservoir.com> wrote: >>> >>> On some targets with limited addressing modes, >>> getting that 64-bit relocatable but loop-invariant value into a register >>> requires several instructions. I'd like those several instruction outside >>> the loop, where they belong. >>> >>> Yes, my experience is that something (I assume instcombine) >>> recanonicalizes. >>> >>> Thanks, >>> Preston >>> >>> >>> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at apple.com> >>> wrote: >>> >>>> Hi Preston, >>>> >>>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev < >>>> llvm-dev at lists.llvm.org> wrote: >>>> >>>> When I compile two different modules using >>>> >>>> clang -O -S -emit-llvm >>>> >>>> >>>> I get different .ll files, no surprise. >>>> >>>> The first looks like >>>> >>>> double *v; >>>> >>>> double zap(long n) { >>>> double sum = 0; >>>> for (long i = 0; i < n; i++) >>>> sum += v[i]; >>>> return sum; >>>> } >>>> >>>> >>>> yielding >>>> >>>> @v = common global double* null, align 8 >>>> >>>> ; Function Attrs: nounwind readonly uwtable >>>> define double @zap(i64 %n) #0 { >>>> entry: >>>> %cmp4 = icmp sgt i64 %n, 0 >>>> br i1 %cmp4, label %for.body.lr.ph, label %for.end >>>> >>>> for.body.lr.ph: ; preds = %entry >>>> %0 = load double** @v, align 8, !tbaa !1 >>>> br label %for.body >>>> >>>> for.body: ; preds = %for.body, % >>>> for.body.lr.ph >>>> %i.06 = phi i64 [ 0, %for.body.lr.ph ], [ %inc, %for.body ] >>>> %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph ], [ %add, >>>> %for.body ] >>>> %arrayidx = getelementptr inbounds double* %0, i64 %i.06 >>>> %1 = load double* %arrayidx, align 8, !tbaa !5 >>>> %add = fadd double %sum.05, %1 >>>> %inc = add nsw i64 %i.06, 1 >>>> >>>> %exitcond = icmp eq i64 %inc, %n >>>> br i1 %exitcond, label %for.end, label %for.body >>>> >>>> for.end: ; preds = %for.body, >>>> %entry >>>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body >>>> ] >>>> ret double %sum.0.lcssa >>>> } >>>> >>>> >>>> and the second looks like >>>> >>>> double v[10000]; >>>> >>>> double zap(long n) { >>>> double sum = 0; >>>> for (long i = 0; i < n; i++) >>>> sum += v[i]; >>>> return sum; >>>> } >>>> >>>> >>>> yielding >>>> >>>> ; ModuleID = 'z.c' >>>> target datalayout >>>> "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128" >>>> target triple = "x86_64-unknown-linux-gnu" >>>> >>>> @v = common global [10000 x double] zeroinitializer, align 16 >>>> >>>> ; Function Attrs: nounwind readonly uwtable >>>> define double @zap(i64 %n) #0 { >>>> entry: >>>> %cmp4 = icmp sgt i64 %n, 0 >>>> br i1 %cmp4, label %for.body, label %for.end >>>> >>>> for.body: ; preds = %entry, >>>> %for.body >>>> %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ] >>>> %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry ] >>>> %arrayidx = getelementptr inbounds [10000 x double]* @v, i64 0, i64 >>>> %i.06 >>>> %0 = load double* %arrayidx, align 8, !tbaa !1 >>>> %add = fadd double %sum.05, %0 >>>> %inc = add nsw i64 %i.06, 1 >>>> %exitcond = icmp eq i64 %inc, %n >>>> br i1 %exitcond, label %for.end, label %for.body >>>> >>>> for.end: ; preds = %for.body, >>>> %entry >>>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body >>>> ] >>>> ret double %sum.0.lcssa >>>> } >>>> >>>> attributes #0 = { nounwind readonly uwtable >>>> "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" >>>> "no-infs-fp-math"="false" "no-nans-fp-math"="false" >>>> "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" >>>> "use-soft-float"="false" } >>>> >>>> !llvm.ident = !{!0} >>>> >>>> !0 = metadata !{metadata !"Clang Front-End version 3.4.1 >>>> (tags/RELEASE_34/final)"} >>>> !1 = metadata !{metadata !2, metadata !2, i64 0} >>>> !2 = metadata !{metadata !"double", metadata !3, i64 0} >>>> !3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0} >>>> !4 = metadata !{metadata !"Simple C/C++ TBAA"} >>>> >>>> >>>> (I included all the metadata and such for the 2nd case, on the off >>>> chance it matters.) >>>> >>>> Is there any way I can convince licm (or something) to rip open the GEP >>>> and hoist the reference to @v outside the loop, similar to the first >>>> example? >>>> >>>> >>>> >>>> I believe that in the second case, there is no need to load the address >>>> of v as it is constant. However you have a constant address to an array, >>>> which is represented by [10000 x double]* @v in the IR, which requires to >>>> use the two-level GEP. >>>> >>>> You “could” manage to represent it this way: >>>> >>>> define double @zap(i64 %n) #0 { >>>> entry: >>>> %cmp6 = icmp sgt i64 %n, 0 >>>> %hoisted = bitcast [10000 x double]* @v to double* >>>> br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup >>>> >>>> for.body.preheader: ; preds = %entry >>>> br label %for.body >>>> >>>> for.cond.cleanup.loopexit: ; preds = %for.body >>>> %add.lcssa = phi double [ %add, %for.body ] >>>> br label %for.cond.cleanup >>>> >>>> for.cond.cleanup: ; preds >>>> %for.cond.cleanup.loopexit, %entry >>>> %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, >>>> %for.cond.cleanup.loopexit ] >>>> ret double %sum.0.lcssa >>>> >>>> for.body: ; preds >>>> %for.body.preheader, %for.body >>>> %i.08 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader ] >>>> %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00, >>>> %for.body.preheader ] >>>> %arrayidx = getelementptr double, double* %hoisted, i64 %i.08 >>>> %0 = load double, double* %arrayidx, align 8, !tbaa !2 >>>> %add = fadd double %sum.07, %0 >>>> %inc = add nuw nsw i64 %i.08, 1 >>>> %exitcond = icmp eq i64 %inc, %n >>>> br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body >>>> } >>>> >>>> >>>> However instcombine will recanonicalize it like it was originally. >>>> >>>> Since it is a GEP that operate on a constant address, this shouldn’t >>>> matter, why would you want to split this? >>>> >>>> Best, >>>> >>>> — >>>> Mehdi >>>> >>>> >>> >>> >> >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/be80c8fd/attachment.html>