thr3ads.net - llvm dev - [llvm-dev] persuading licm to do the right thing [Dec 2015]

If this information is useful, please help other people find it:
Share via:

Preston Briggs via llvm-dev

2015-Dec-09 19:16 UTC

[llvm-dev] persuading licm to do the right thing

A GEP can represent a potentially large tree of instructions.
Seems like all the sub-trees are hidden from optimization;
that is, I never see licm or value numbering doing anything with them.
If I rewrite the GEPs as lots of little adds and multiplies,
then opt will do a better job (I speculate this happens during lowering).

One of the computations that's hidden in the GEP in my example
is the non-zero effort required to get the value of a label into a register.
I'd like to expose that effort to the optimizer; instead, it seems
hidden, papered over with the GEP.

Your point about putting labels in global variables seems to work
if I do it by hand. Kind of embarassing though, don't you think,
introducing an indirection to achieve better code?

Thanks,
Preston





On Wed, Dec 9, 2015 at 9:17 AM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
>
> On Dec 9, 2015, at 9:00 AM, Preston Briggs <briggs at reservoir.com>
wrote:
>
> I suppose your view is reasonable, and perhaps common.
> My own "taste" has always preferred machine-independent code
> that is as simple as possible, so GEPs reduced to nothing more than an
> add, etc, i.e., quite risc-like. Then optimize it to reduce the total
> number
> of operations (as best we can), then raise the level during instruction
> selection, taking advantage of available instructions.
>
>
> I’m not sure I see something related to risc-like here, it seems to me
> that  your problem is not GEP vs ADD but rather that you want to expose a
> mode where global addresses need to be loaded and can’t be referenced
> directly.
> (Unless I misunderstood the problem which is very possible as well)
>
> Maybe you could do this with a transformation that would put all the
> global variable addresses in a global array and reference them through the
> array. That’s the only workaround I could see.
>
> —
> Mehdi
>
>
>
>
> I guess my whole scheme of using opt in this context is probably wrong
> headed.
>
> Thanks
>
>
>
> On Wed, Dec 9, 2015 at 8:45 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>
>>
>> On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at
reservoir.com> wrote:
>>
>> I'm trying to make the IR "better", in a
machine-independent fashion,
>> without having to do any lowering.
>>
>>
>> The question is “would the IR be more canonical” with the
representation
>> you suggest? Why would the optimizer benefit from this representation
>> instead of the current one in general?
>> Right now this GEP reads as an offset from a constant global, which
seems
>> pretty optimal to me.
>>
>> My impression is that when you reach a point where the “better” is
target
>> specific, this is part of the lowering (I’m using lowering in the sense
>> that you go away from the canonical representation the optimizer
expects).
>> I believe it is pretty common that targets need to do this kind of
lowering.
>>
>> —
>> Mehdi
>>
>>
>>
>> I've written code that rewrites GEPs as simple adds and multiplies,
>> which helps a lot, but there's still some sort of
re-canonicalization
>> that's getting in my way. Is there perhaps a way to suppress it?
>>
>>
>> Thanks,
>> Preston
>>
>>
>> On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at
apple.com>
>> wrote:
>>
>>> I guess is has to be done as part of the lowering for such a
target,
>>> either during CodegenPrepare or during something like MachineLICM.
>>>
>>> —
>>> Mehdi
>>>
>>>
>>>
>>> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at
reservoir.com> wrote:
>>>
>>> On some targets with limited addressing modes,
>>> getting that 64-bit relocatable but loop-invariant value into a
register
>>> requires several instructions. I'd like those several
instruction outside
>>> the loop, where they belong.
>>>
>>> Yes, my experience is that something (I assume instcombine)
>>> recanonicalizes.
>>>
>>> Thanks,
>>> Preston
>>>
>>>
>>> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at
apple.com>
>>> wrote:
>>>
>>>> Hi Preston,
>>>>
>>>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> When I compile two different modules using
>>>>
>>>> clang -O -S -emit-llvm
>>>>
>>>>
>>>> I get different .ll files, no surprise.
>>>>
>>>> The first looks like
>>>>
>>>> double *v;
>>>>
>>>> double zap(long n) {
>>>>   double sum = 0;
>>>>   for (long i = 0; i < n; i++)
>>>>     sum += v[i];
>>>>   return sum;
>>>> }
>>>>
>>>>
>>>> yielding
>>>>
>>>> @v = common global double* null, align 8
>>>>
>>>> ; Function Attrs: nounwind readonly uwtable
>>>> define double @zap(i64 %n) #0 {
>>>> entry:
>>>>   %cmp4 = icmp sgt i64 %n, 0
>>>>   br i1 %cmp4, label %for.body.lr.ph, label %for.end
>>>>
>>>> for.body.lr.ph:                                   ; preds =
%entry
>>>>   %0 = load double** @v, align 8, !tbaa !1
>>>>   br label %for.body
>>>>
>>>> for.body:                                         ; preds =
%for.body, %
>>>> for.body.lr.ph
>>>>   %i.06 = phi i64 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
>>>>   %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph ], [
%add,
>>>> %for.body ]
>>>>   %arrayidx = getelementptr inbounds double* %0, i64 %i.06
>>>>   %1 = load double* %arrayidx, align 8, !tbaa !5
>>>>   %add = fadd double %sum.05, %1
>>>>   %inc = add nsw i64 %i.06, 1
>>>>
>>>> %exitcond = icmp eq i64 %inc, %n
>>>>   br i1 %exitcond, label %for.end, label %for.body
>>>>
>>>> for.end:                                          ; preds =
%for.body,
>>>> %entry
>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add,
%for.body
>>>> ]
>>>>   ret double %sum.0.lcssa
>>>> }
>>>>
>>>>
>>>> and the second looks like
>>>>
>>>> double v[10000];
>>>>
>>>> double zap(long n) {
>>>>   double sum = 0;
>>>>   for (long i = 0; i < n; i++)
>>>>     sum += v[i];
>>>>   return sum;
>>>> }
>>>>
>>>>
>>>> yielding
>>>>
>>>> ; ModuleID = 'z.c'
>>>> target datalayout >>>>
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128"
>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>
>>>> @v = common global [10000 x double] zeroinitializer, align 16
>>>>
>>>> ; Function Attrs: nounwind readonly uwtable
>>>> define double @zap(i64 %n) #0 {
>>>> entry:
>>>>   %cmp4 = icmp sgt i64 %n, 0
>>>>   br i1 %cmp4, label %for.body, label %for.end
>>>>
>>>> for.body:                                         ; preds =
%entry,
>>>> %for.body
>>>>   %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
>>>>   %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00,
%entry ]
>>>>   %arrayidx = getelementptr inbounds [10000 x double]* @v, i64
0, i64
>>>> %i.06
>>>>   %0 = load double* %arrayidx, align 8, !tbaa !1
>>>>   %add = fadd double %sum.05, %0
>>>>   %inc = add nsw i64 %i.06, 1
>>>>   %exitcond = icmp eq i64 %inc, %n
>>>>   br i1 %exitcond, label %for.end, label %for.body
>>>>
>>>> for.end:                                          ; preds =
%for.body,
>>>> %entry
>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add,
%for.body
>>>> ]
>>>>   ret double %sum.0.lcssa
>>>> }
>>>>
>>>> attributes #0 = { nounwind readonly uwtable
>>>> "less-precise-fpmad"="false"
"no-frame-pointer-elim"="false"
>>>> "no-infs-fp-math"="false"
"no-nans-fp-math"="false"
>>>> "stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
>>>> "use-soft-float"="false" }
>>>>
>>>> !llvm.ident = !{!0}
>>>>
>>>> !0 = metadata !{metadata !"Clang Front-End version 3.4.1
>>>> (tags/RELEASE_34/final)"}
>>>> !1 = metadata !{metadata !2, metadata !2, i64 0}
>>>> !2 = metadata !{metadata !"double", metadata !3, i64
0}
>>>> !3 = metadata !{metadata !"omnipotent char", metadata
!4, i64 0}
>>>> !4 = metadata !{metadata !"Simple C/C++ TBAA"}
>>>>
>>>>
>>>> (I included all the metadata and such for the 2nd case, on the
off
>>>> chance it matters.)
>>>>
>>>> Is there any way I can convince licm (or something) to rip open
the GEP
>>>> and hoist the reference to @v outside the loop, similar to the
first
>>>> example?
>>>>
>>>>
>>>>
>>>> I believe that in the second case, there is no need to load the
address
>>>> of v as it is constant. However you have a constant address to
an array,
>>>> which is represented by [10000 x double]* @v in the IR, which
requires to
>>>> use the two-level GEP.
>>>>
>>>> You “could” manage to represent it this way:
>>>>
>>>> define double @zap(i64 %n) #0 {
>>>> entry:
>>>>   %cmp6 = icmp sgt i64 %n, 0
>>>>   %hoisted = bitcast [10000 x double]* @v to double*
>>>>   br i1 %cmp6, label %for.body.preheader, label
%for.cond.cleanup
>>>>
>>>> for.body.preheader:                               ; preds =
%entry
>>>>   br label %for.body
>>>>
>>>> for.cond.cleanup.loopexit:                        ; preds =
%for.body
>>>>   %add.lcssa = phi double [ %add, %for.body ]
>>>>   br label %for.cond.cleanup
>>>>
>>>> for.cond.cleanup:                                 ; preds
>>>> %for.cond.cleanup.loopexit, %entry
>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add.lcssa,
>>>> %for.cond.cleanup.loopexit ]
>>>>   ret double %sum.0.lcssa
>>>>
>>>> for.body:                                         ; preds
>>>> %for.body.preheader, %for.body
>>>>   %i.08 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader
]
>>>>   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00,
>>>> %for.body.preheader ]
>>>>   %arrayidx = getelementptr double, double* %hoisted, i64 %i.08
>>>>   %0 = load double, double* %arrayidx, align 8, !tbaa !2
>>>>   %add = fadd double %sum.07, %0
>>>>   %inc = add nuw nsw i64 %i.08, 1
>>>>   %exitcond = icmp eq i64 %inc, %n
>>>>   br i1 %exitcond, label %for.cond.cleanup.loopexit, label
%for.body
>>>> }
>>>>
>>>>
>>>> However instcombine will recanonicalize it like it was
originally.
>>>>
>>>> Since it is a GEP that operate on a constant address, this
shouldn’t
>>>> matter, why would you want to split this?
>>>>
>>>> Best,
>>>>
>>>> —
>>>> Mehdi
>>>>
>>>>
>>>
>>>
>>
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/be80c8fd/attachment.html>

Daniel Berlin via llvm-dev

2015-Dec-09 19:48 UTC

head link

[llvm-dev] persuading licm to do the right thing

On Wed, Dec 9, 2015 at 11:16 AM, Preston Briggs via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> A GEP can represent a potentially large tree of instructions.
>Seems like all the sub-trees are hidden from
optimization;> that is, I never see licm or value numbering doing anything with them.
>
Without arguing about the broader point of "we can't eliminate the
*lowered* operations these turn into", the higher level issue of not really
doing much with them  are often just deficiencies in those passes.
There are plenty of those kinds of deficiencies
value numbering also doesn't really value number loads or stores, so would
never discover that a load and an add have the same value, etc.
Smart enough value numbering would understand the equivalence between
various forms of geps.

This won't address the issue of optimizing the lowered form of course.

If I rewrite the GEPs as lots of little adds and
multiplies,> then opt will do a better job (I speculate this happens during lowering).
>
For some definition of better job. For example,  this has a cost of making
things like AA significant more difficult

Thus it is likely you want the traditional answer, where the high level
form is optimized as well as it can, they are lowered, and then more
optimization is done.

> One of the computations that's hidden in the GEP in my example
> is the non-zero effort required to get the value of a label into a
> register.
> I'd like to expose that effort to the optimizer; instead, it seems
> hidden, papered over with the GEP.
>
> Your point about putting labels in global variables seems to work
> if I do it by hand. Kind of embarassing though, don't you think,
> introducing an indirection to achieve better code?
>
> Thanks,
> Preston
>
>
>
>
>
> On Wed, Dec 9, 2015 at 9:17 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>
>>
>> On Dec 9, 2015, at 9:00 AM, Preston Briggs <briggs at
reservoir.com> wrote:
>>
>> I suppose your view is reasonable, and perhaps common.
>> My own "taste" has always preferred machine-independent code
>> that is as simple as possible, so GEPs reduced to nothing more than an
>> add, etc, i.e., quite risc-like. Then optimize it to reduce the total
>> number
>> of operations (as best we can), then raise the level during instruction
>> selection, taking advantage of available instructions.
>>
>>
>> I’m not sure I see something related to risc-like here, it seems to me
>> that  your problem is not GEP vs ADD but rather that you want to expose
a
>> mode where global addresses need to be loaded and can’t be referenced
>> directly.
>> (Unless I misunderstood the problem which is very possible as well)
>>
>> Maybe you could do this with a transformation that would put all the
>> global variable addresses in a global array and reference them through
the
>> array. That’s the only workaround I could see.
>>
>> —
>> Mehdi
>>
>>
>>
>>
>> I guess my whole scheme of using opt in this context is probably wrong
>> headed.
>>
>> Thanks
>>
>>
>>
>> On Wed, Dec 9, 2015 at 8:45 AM, Mehdi Amini <mehdi.amini at
apple.com>
>> wrote:
>>
>>>
>>> On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at
reservoir.com> wrote:
>>>
>>> I'm trying to make the IR "better", in a
machine-independent fashion,
>>> without having to do any lowering.
>>>
>>>
>>> The question is “would the IR be more canonical” with the
representation
>>> you suggest? Why would the optimizer benefit from this
representation
>>> instead of the current one in general?
>>> Right now this GEP reads as an offset from a constant global, which
>>> seems pretty optimal to me.
>>>
>>> My impression is that when you reach a point where the “better” is
>>> target specific, this is part of the lowering (I’m using lowering
in the
>>> sense that you go away from the canonical representation the
optimizer
>>> expects). I believe it is pretty common that targets need to do
this kind
>>> of lowering.
>>>
>>> —
>>> Mehdi
>>>
>>>
>>>
>>> I've written code that rewrites GEPs as simple adds and
multiplies,
>>> which helps a lot, but there's still some sort of
re-canonicalization
>>> that's getting in my way. Is there perhaps a way to suppress
it?
>>>
>>>
>>> Thanks,
>>> Preston
>>>
>>>
>>> On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at
apple.com>
>>> wrote:
>>>
>>>> I guess is has to be done as part of the lowering for such a
target,
>>>> either during CodegenPrepare or during something like
MachineLICM.
>>>>
>>>> —
>>>> Mehdi
>>>>
>>>>
>>>>
>>>> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at
reservoir.com>
>>>> wrote:
>>>>
>>>> On some targets with limited addressing modes,
>>>> getting that 64-bit relocatable but loop-invariant value into a
register
>>>> requires several instructions. I'd like those several
instruction
>>>> outside
>>>> the loop, where they belong.
>>>>
>>>> Yes, my experience is that something (I assume instcombine)
>>>> recanonicalizes.
>>>>
>>>> Thanks,
>>>> Preston
>>>>
>>>>
>>>> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at
apple.com>
>>>> wrote:
>>>>
>>>>> Hi Preston,
>>>>>
>>>>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev
<
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> When I compile two different modules using
>>>>>
>>>>> clang -O -S -emit-llvm
>>>>>
>>>>>
>>>>> I get different .ll files, no surprise.
>>>>>
>>>>> The first looks like
>>>>>
>>>>> double *v;
>>>>>
>>>>> double zap(long n) {
>>>>>   double sum = 0;
>>>>>   for (long i = 0; i < n; i++)
>>>>>     sum += v[i];
>>>>>   return sum;
>>>>> }
>>>>>
>>>>>
>>>>> yielding
>>>>>
>>>>> @v = common global double* null, align 8
>>>>>
>>>>> ; Function Attrs: nounwind readonly uwtable
>>>>> define double @zap(i64 %n) #0 {
>>>>> entry:
>>>>>   %cmp4 = icmp sgt i64 %n, 0
>>>>>   br i1 %cmp4, label %for.body.lr.ph, label %for.end
>>>>>
>>>>> for.body.lr.ph:                                   ; preds =
%entry
>>>>>   %0 = load double** @v, align 8, !tbaa !1
>>>>>   br label %for.body
>>>>>
>>>>> for.body:                                         ; preds =
%for.body,
>>>>> %for.body.lr.ph
>>>>>   %i.06 = phi i64 [ 0, %for.body.lr.ph ], [ %inc, %for.body
]
>>>>>   %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph ], [
%add,
>>>>> %for.body ]
>>>>>   %arrayidx = getelementptr inbounds double* %0, i64 %i.06
>>>>>   %1 = load double* %arrayidx, align 8, !tbaa !5
>>>>>   %add = fadd double %sum.05, %1
>>>>>   %inc = add nsw i64 %i.06, 1
>>>>>
>>>>> %exitcond = icmp eq i64 %inc, %n
>>>>>   br i1 %exitcond, label %for.end, label %for.body
>>>>>
>>>>> for.end:                                          ; preds =
%for.body,
>>>>> %entry
>>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add,
>>>>> %for.body ]
>>>>>   ret double %sum.0.lcssa
>>>>> }
>>>>>
>>>>>
>>>>> and the second looks like
>>>>>
>>>>> double v[10000];
>>>>>
>>>>> double zap(long n) {
>>>>>   double sum = 0;
>>>>>   for (long i = 0; i < n; i++)
>>>>>     sum += v[i];
>>>>>   return sum;
>>>>> }
>>>>>
>>>>>
>>>>> yielding
>>>>>
>>>>> ; ModuleID = 'z.c'
>>>>> target datalayout >>>>>
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128"
>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>
>>>>> @v = common global [10000 x double] zeroinitializer, align
16
>>>>>
>>>>> ; Function Attrs: nounwind readonly uwtable
>>>>> define double @zap(i64 %n) #0 {
>>>>> entry:
>>>>>   %cmp4 = icmp sgt i64 %n, 0
>>>>>   br i1 %cmp4, label %for.body, label %for.end
>>>>>
>>>>> for.body:                                         ; preds =
%entry,
>>>>> %for.body
>>>>>   %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
>>>>>   %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00,
%entry ]
>>>>>   %arrayidx = getelementptr inbounds [10000 x double]* @v,
i64 0, i64
>>>>> %i.06
>>>>>   %0 = load double* %arrayidx, align 8, !tbaa !1
>>>>>   %add = fadd double %sum.05, %0
>>>>>   %inc = add nsw i64 %i.06, 1
>>>>>   %exitcond = icmp eq i64 %inc, %n
>>>>>   br i1 %exitcond, label %for.end, label %for.body
>>>>>
>>>>> for.end:                                          ; preds =
%for.body,
>>>>> %entry
>>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add,
>>>>> %for.body ]
>>>>>   ret double %sum.0.lcssa
>>>>> }
>>>>>
>>>>> attributes #0 = { nounwind readonly uwtable
>>>>> "less-precise-fpmad"="false"
"no-frame-pointer-elim"="false"
>>>>> "no-infs-fp-math"="false"
"no-nans-fp-math"="false"
>>>>> "stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
>>>>> "use-soft-float"="false" }
>>>>>
>>>>> !llvm.ident = !{!0}
>>>>>
>>>>> !0 = metadata !{metadata !"Clang Front-End version
3.4.1
>>>>> (tags/RELEASE_34/final)"}
>>>>> !1 = metadata !{metadata !2, metadata !2, i64 0}
>>>>> !2 = metadata !{metadata !"double", metadata !3,
i64 0}
>>>>> !3 = metadata !{metadata !"omnipotent char",
metadata !4, i64 0}
>>>>> !4 = metadata !{metadata !"Simple C/C++ TBAA"}
>>>>>
>>>>>
>>>>> (I included all the metadata and such for the 2nd case, on
the off
>>>>> chance it matters.)
>>>>>
>>>>> Is there any way I can convince licm (or something) to rip
open the
>>>>> GEP and hoist the reference to @v outside the loop, similar
to the first
>>>>> example?
>>>>>
>>>>>
>>>>>
>>>>> I believe that in the second case, there is no need to load
the
>>>>> address of v as it is constant. However you have a constant
address to an
>>>>> array, which is represented by [10000 x double]* @v in the
IR, which
>>>>> requires to use the two-level GEP.
>>>>>
>>>>> You “could” manage to represent it this way:
>>>>>
>>>>> define double @zap(i64 %n) #0 {
>>>>> entry:
>>>>>   %cmp6 = icmp sgt i64 %n, 0
>>>>>   %hoisted = bitcast [10000 x double]* @v to double*
>>>>>   br i1 %cmp6, label %for.body.preheader, label
%for.cond.cleanup
>>>>>
>>>>> for.body.preheader:                               ; preds =
%entry
>>>>>   br label %for.body
>>>>>
>>>>> for.cond.cleanup.loopexit:                        ; preds =
%for.body
>>>>>   %add.lcssa = phi double [ %add, %for.body ]
>>>>>   br label %for.cond.cleanup
>>>>>
>>>>> for.cond.cleanup:                                 ; preds
>>>>> %for.cond.cleanup.loopexit, %entry
>>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add.lcssa,
>>>>> %for.cond.cleanup.loopexit ]
>>>>>   ret double %sum.0.lcssa
>>>>>
>>>>> for.body:                                         ; preds
>>>>> %for.body.preheader, %for.body
>>>>>   %i.08 = phi i64 [ %inc, %for.body ], [ 0,
%for.body.preheader ]
>>>>>   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00,
>>>>> %for.body.preheader ]
>>>>>   %arrayidx = getelementptr double, double* %hoisted, i64
%i.08
>>>>>   %0 = load double, double* %arrayidx, align 8, !tbaa !2
>>>>>   %add = fadd double %sum.07, %0
>>>>>   %inc = add nuw nsw i64 %i.08, 1
>>>>>   %exitcond = icmp eq i64 %inc, %n
>>>>>   br i1 %exitcond, label %for.cond.cleanup.loopexit, label
%for.body
>>>>> }
>>>>>
>>>>>
>>>>> However instcombine will recanonicalize it like it was
originally.
>>>>>
>>>>> Since it is a GEP that operate on a constant address, this
shouldn’t
>>>>> matter, why would you want to split this?
>>>>>
>>>>> Best,
>>>>>
>>>>> —
>>>>> Mehdi
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/25121166/attachment.html>

Mehdi Amini via llvm-dev

2015-Dec-09 21:10 UTC

head link

[llvm-dev] persuading licm to do the right thing

> On Dec 9, 2015, at 11:16 AM, Preston Briggs <briggs at reservoir.com>
wrote:
> 
> A GEP can represent a potentially large tree of instructions.
> Seems like all the sub-trees are hidden from optimization;
> that is, I never see licm or value numbering doing anything with them.
> If I rewrite the GEPs as lots of little adds and multiplies,
> then opt will do a better job (I speculate this happens during lowering).
> 
> One of the computations that's hidden in the GEP in my example
> is the non-zero effort required to get the value of a label into a
register.
> I'd like to expose that effort to the optimizer; instead, it seems
> hidden, papered over with the GEP.
> 
> Your point about putting labels in global variables seems to work
> if I do it by hand. Kind of embarassing though, don't you think,
> introducing an indirection to achieve better code?
Isn’t this indirection just how your target work? 
It seems to me that this indirection just achieve exactly what you want: a more
risc-like and less implicit representation!

As far as I can tell, accessing a “label” does not require a load at the IR
level, (and indeed it does not on most targets). It is baked in everywhere in
the code that operates on the IR. You will fight the model all the time if you
try to circumvent that.
For instance the bitcast of the global label I used in my first answer is a
compile time constant expression for LLVM.
On the same topic, integer literals are assumed to be inlined in the code, we
don’t expose any load in the IR to operate on a literal value.

Also, you are relying on an implicit load (only on your target) for global
labels, which I believe is not valid IR. For LLVM IR, a GEP is nothing more than
pointer arithmetic.
Per LangRef:

    The ‘getelementptr‘ instruction is used to get the address of a subelement
of an aggregate data structure. It performs address calculation only and does
not access memory. The instruction can also be used to calculate a vector of
such addresses.

Note the *does not access memory* part (now, you may argue that the “load” you
are emitting is not accessing memory exposed in the optimizer memory model and
this does not apply).

— 
Mehdi



> 
> Thanks,
> Preston
> 
> 
> 
> 
> 
> On Wed, Dec 9, 2015 at 9:17 AM, Mehdi Amini <mehdi.amini at apple.com
<mailto:mehdi.amini at apple.com>> wrote:
> 
>> On Dec 9, 2015, at 9:00 AM, Preston Briggs <briggs at reservoir.com
<mailto:briggs at reservoir.com>> wrote:
>> 
>> I suppose your view is reasonable, and perhaps common.
>> My own "taste" has always preferred machine-independent code
>> that is as simple as possible, so GEPs reduced to nothing more than an
>> add, etc, i.e., quite risc-like. Then optimize it to reduce the total
number
>> of operations (as best we can), then raise the level during instruction
>> selection, taking advantage of available instructions.
> 
> I’m not sure I see something related to risc-like here, it seems to me that
your problem is not GEP vs ADD but rather that you want to expose a mode where
global addresses need to be loaded and can’t be referenced directly.
> (Unless I misunderstood the problem which is very possible as well)
> 
> Maybe you could do this with a transformation that would put all the global
variable addresses in a global array and reference them through the array.
That’s the only workaround I could see.
> 
> — 
> Mehdi
> 
> 
> 
>> 
>> I guess my whole scheme of using opt in this context is probably wrong
headed.
>> 
>> Thanks
>> 
>> 
>> 
>> On Wed, Dec 9, 2015 at 8:45 AM, Mehdi Amini <mehdi.amini at
apple.com <mailto:mehdi.amini at apple.com>> wrote:
>> 
>>> On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at
reservoir.com <mailto:briggs at reservoir.com>> wrote:
>>> 
>>> I'm trying to make the IR "better", in a
machine-independent fashion,
>>> without having to do any lowering.
>> 
>> The question is “would the IR be more canonical” with the
representation you suggest? Why would the optimizer benefit from this
representation instead of the current one in general?
>> Right now this GEP reads as an offset from a constant global, which
seems pretty optimal to me.
>> 
>> My impression is that when you reach a point where the “better” is
target specific, this is part of the lowering (I’m using lowering in the sense
that you go away from the canonical representation the optimizer expects). I
believe it is pretty common that targets need to do this kind of lowering.
>> 
>> — 
>> Mehdi
>> 
>> 
>>> 
>>> I've written code that rewrites GEPs as simple adds and
multiplies,
>>> which helps a lot, but there's still some sort of
re-canonicalization
>>> that's getting in my way. Is there perhaps a way to suppress
it?
>>> 
>>> Thanks,
>>> Preston
>>> 
>>> 
>>> On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at
apple.com <mailto:mehdi.amini at apple.com>> wrote:
>>> I guess is has to be done as part of the lowering for such a
target, either during CodegenPrepare or during something like MachineLICM.
>>> 
>>> — 
>>> Mehdi
>>> 
>>> 
>>> 
>>>> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at
reservoir.com <mailto:briggs at reservoir.com>> wrote:
>>>> 
>>>> On some targets with limited addressing modes,
>>>> getting that 64-bit relocatable but loop-invariant value into a
register
>>>> requires several instructions. I'd like those several
instruction outside
>>>> the loop, where they belong.
>>>> 
>>>> Yes, my experience is that something (I assume instcombine)
recanonicalizes.
>>>> 
>>>> Thanks,
>>>> Preston
>>>> 
>>>> 
>>>> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at
apple.com <mailto:mehdi.amini at apple.com>> wrote:
>>>> Hi Preston,
>>>> 
>>>>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>>>> 
>>>>> When I compile two different modules using
>>>>> 
>>>>> clang -O -S -emit-llvm
>>>>> 
>>>>> I get different .ll files, no surprise.
>>>>> 
>>>>> The first looks like
>>>>> 
>>>>> double *v;
>>>>> 
>>>>> double zap(long n) {
>>>>>   double sum = 0;
>>>>>   for (long i = 0; i < n; i++)
>>>>>     sum += v[i];
>>>>>   return sum;
>>>>> }
>>>>> 
>>>>> yielding
>>>>> 
>>>>> @v = common global double* null, align 8
>>>>> 
>>>>> ; Function Attrs: nounwind readonly uwtable
>>>>> define double @zap(i64 %n) #0 {
>>>>> entry:
>>>>>   %cmp4 = icmp sgt i64 %n, 0
>>>>>   br i1 %cmp4, label %for.body.lr.ph
<http://for.body.lr.ph/>, label %for.end
>>>>> 
>>>>> for.body.lr.ph <http://for.body.lr.ph/>:             
; preds = %entry
>>>>>   %0 = load double** @v, align 8, !tbaa !1
>>>>>   br label %for.body
>>>>> 
>>>>> for.body:                                         ; preds =
%for.body, %for.body.lr.ph <http://for.body.lr.ph/>
>>>>>   %i.06 = phi i64 [ 0, %for.body.lr.ph
<http://for.body.lr.ph/> ], [ %inc, %for.body ]
>>>>>   %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph
<http://for.body.lr.ph/> ], [ %add, %for.body ]
>>>>>   %arrayidx = getelementptr inbounds double* %0, i64 %i.06
>>>>>   %1 = load double* %arrayidx, align 8, !tbaa !5
>>>>>   %add = fadd double %sum.05, %1
>>>>>   %inc = add nsw i64 %i.06, 1
>>>>>   
>>>>> %exitcond = icmp eq i64 %inc, %n
>>>>>   br i1 %exitcond, label %for.end, label %for.body
>>>>> 
>>>>> for.end:                                          ; preds =
%for.body, %entry
>>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add, %for.body ]
>>>>>   ret double %sum.0.lcssa
>>>>> }
>>>>> 
>>>>> and the second looks like
>>>>> 
>>>>> double v[10000];
>>>>> 
>>>>> double zap(long n) {
>>>>>   double sum = 0;
>>>>>   for (long i = 0; i < n; i++)
>>>>>     sum += v[i];
>>>>>   return sum;
>>>>> }
>>>>> 
>>>>> yielding
>>>>> 
>>>>> ; ModuleID = 'z.c'
>>>>> target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128"
>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>> 
>>>>> @v = common global [10000 x double] zeroinitializer, align
16
>>>>> 
>>>>> ; Function Attrs: nounwind readonly uwtable
>>>>> define double @zap(i64 %n) #0 {
>>>>> entry:
>>>>>   %cmp4 = icmp sgt i64 %n, 0
>>>>>   br i1 %cmp4, label %for.body, label %for.end
>>>>> 
>>>>> for.body:                                         ; preds =
%entry, %for.body
>>>>>   %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
>>>>>   %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00,
%entry ]
>>>>>   %arrayidx = getelementptr inbounds [10000 x double]* @v,
i64 0, i64 %i.06
>>>>>   %0 = load double* %arrayidx, align 8, !tbaa !1
>>>>>   %add = fadd double %sum.05, %0
>>>>>   %inc = add nsw i64 %i.06, 1
>>>>>   %exitcond = icmp eq i64 %inc, %n
>>>>>   br i1 %exitcond, label %for.end, label %for.body
>>>>> 
>>>>> for.end:                                          ; preds =
%for.body, %entry
>>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add, %for.body ]
>>>>>   ret double %sum.0.lcssa
>>>>> }
>>>>> 
>>>>> attributes #0 = { nounwind readonly uwtable
"less-precise-fpmad"="false"
"no-frame-pointer-elim"="false"
"no-infs-fp-math"="false"
"no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
"use-soft-float"="false" }
>>>>> 
>>>>> !llvm.ident = !{!0}
>>>>> 
>>>>> !0 = metadata !{metadata !"Clang Front-End version
3.4.1 (tags/RELEASE_34/final)"}
>>>>> !1 = metadata !{metadata !2, metadata !2, i64 0}
>>>>> !2 = metadata !{metadata !"double", metadata !3,
i64 0}
>>>>> !3 = metadata !{metadata !"omnipotent char",
metadata !4, i64 0}
>>>>> !4 = metadata !{metadata !"Simple C/C++ TBAA"}
>>>>> 
>>>>> (I included all the metadata and such for the 2nd case, on
the off chance it matters.)
>>>>> 
>>>>> Is there any way I can convince licm (or something) to rip
open the GEP and hoist the reference to @v outside the loop, similar to the
first example?
>>>> 
>>>> 
>>>> I believe that in the second case, there is no need to load the
address of v as it is constant. However you have a constant address to an array,
which is represented by [10000 x double]* @v in the IR, which requires to use
the two-level GEP.
>>>> 
>>>> You “could” manage to represent it this way:
>>>> 
>>>> define double @zap(i64 %n) #0 {
>>>> entry:
>>>>   %cmp6 = icmp sgt i64 %n, 0
>>>>   %hoisted = bitcast [10000 x double]* @v to double*
>>>>   br i1 %cmp6, label %for.body.preheader, label
%for.cond.cleanup
>>>> 
>>>> for.body.preheader:                               ; preds =
%entry
>>>>   br label %for.body
>>>> 
>>>> for.cond.cleanup.loopexit:                        ; preds =
%for.body
>>>>   %add.lcssa = phi double [ %add, %for.body ]
>>>>   br label %for.cond.cleanup
>>>> 
>>>> for.cond.cleanup:                                 ; preds =
%for.cond.cleanup.loopexit, %entry
>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add.lcssa, %for.cond.cleanup.loopexit ]
>>>>   ret double %sum.0.lcssa
>>>> 
>>>> for.body:                                         ; preds =
%for.body.preheader, %for.body
>>>>   %i.08 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader
]
>>>>   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00,
%for.body.preheader ]
>>>>   %arrayidx = getelementptr double, double* %hoisted, i64 %i.08
>>>>   %0 = load double, double* %arrayidx, align 8, !tbaa !2
>>>>   %add = fadd double %sum.07, %0
>>>>   %inc = add nuw nsw i64 %i.08, 1
>>>>   %exitcond = icmp eq i64 %inc, %n
>>>>   br i1 %exitcond, label %for.cond.cleanup.loopexit, label
%for.body
>>>> }
>>>> 
>>>> 
>>>> However instcombine will recanonicalize it like it was
originally.
>>>> 
>>>> Since it is a GEP that operate on a constant address, this
shouldn’t matter, why would you want to split this?
>>>> 
>>>> Best,
>>>> 
>>>> — 
>>>> Mehdi
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/7ef1a31e/attachment.html>

Preston Briggs via llvm-dev

2015-Dec-09 22:25 UTC

head link

[llvm-dev] persuading licm to do the right thing

I understand that GEPs do not access memory.
They do a (possibly expensive) address calculation,
perhaps adding a few values to a label and leaving the result in a register.
Getting a label into a register is (to me) just like loading a 64-bit
integer value
into a register. It can happen in many places and it can cost a few
instructions
and several bytes. I'd like to see such things commoned and hoisted when
possible.

I agree with you that I may be fighting the spirit of the IR.

I also agree with Daniel that the classic solution is to lower and
re-optimize.
Indeed, I tried it, sort of, by rewriting GEPs into a sequence of adds and
multiplies,
with bitcasts to gets the types right. This helps in some places, but the
re-canonicalization defeats my efforts in others places.

So now I feel like I'm going to have to write my own LICM and VN passes.
I can do it and it won't take that long, but I had imagined that I could
pick and choose from the tools already available to achieve the desired
effect.
Seems not, and I'm surprised.

Preston


On Wed, Dec 9, 2015 at 1:10 PM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
>
> On Dec 9, 2015, at 11:16 AM, Preston Briggs <briggs at reservoir.com>
wrote:
>
> A GEP can represent a potentially large tree of instructions.
> Seems like all the sub-trees are hidden from optimization;
> that is, I never see licm or value numbering doing anything with them.
> If I rewrite the GEPs as lots of little adds and multiplies,
> then opt will do a better job (I speculate this happens during lowering).
>
> One of the computations that's hidden in the GEP in my example
> is the non-zero effort required to get the value of a label into a
> register.
> I'd like to expose that effort to the optimizer; instead, it seems
> hidden, papered over with the GEP.
>
> Your point about putting labels in global variables seems to work
> if I do it by hand. Kind of embarassing though, don't you think,
> introducing an indirection to achieve better code?
>
>
> Isn’t this indirection just how your target work?
> It seems to me that this indirection just achieve exactly what you want: a
> more risc-like and less implicit representation!
>
> As far as I can tell, accessing a “label” does not require a load at the
> IR level, (and indeed it does not on most targets). It is baked in
> everywhere in the code that operates on the IR. You will fight the model
> all the time if you try to circumvent that.
> For instance the bitcast of the global label I used in my first answer is
> a compile time constant expression for LLVM.
> On the same topic, integer literals are assumed to be inlined in the code,
> we don’t expose any load in the IR to operate on a literal value.
>
> Also, you are relying on an implicit load (only on your target) for global
> labels, which I believe is not valid IR. For LLVM IR, a GEP is nothing more
> than pointer arithmetic.
> Per LangRef:
>
>     The ‘getelementptr‘ instruction is used to get the address of a
> subelement of an aggregate data structure. It performs address calculation
> only and does not access memory. The instruction can also be used to
> calculate a vector of such addresses.
>
> Note the *does not access memory* part (now, you may argue that the “load”
> you are emitting is not accessing memory exposed in the optimizer memory
> model and this does not apply).
>
> —
> Mehdi
>
>
>
>
>
> Thanks,
> Preston
>
>
>
>
>
> On Wed, Dec 9, 2015 at 9:17 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>
>>
>> On Dec 9, 2015, at 9:00 AM, Preston Briggs <briggs at
reservoir.com> wrote:
>>
>> I suppose your view is reasonable, and perhaps common.
>> My own "taste" has always preferred machine-independent code
>> that is as simple as possible, so GEPs reduced to nothing more than an
>> add, etc, i.e., quite risc-like. Then optimize it to reduce the total
>> number
>> of operations (as best we can), then raise the level during instruction
>> selection, taking advantage of available instructions.
>>
>>
>> I’m not sure I see something related to risc-like here, it seems to me
>> that  your problem is not GEP vs ADD but rather that you want to expose
a
>> mode where global addresses need to be loaded and can’t be referenced
>> directly.
>> (Unless I misunderstood the problem which is very possible as well)
>>
>> Maybe you could do this with a transformation that would put all the
>> global variable addresses in a global array and reference them through
the
>> array. That’s the only workaround I could see.
>>
>> —
>> Mehdi
>>
>>
>>
>>
>> I guess my whole scheme of using opt in this context is probably wrong
>> headed.
>>
>> Thanks
>>
>>
>>
>> On Wed, Dec 9, 2015 at 8:45 AM, Mehdi Amini <mehdi.amini at
apple.com>
>> wrote:
>>
>>>
>>> On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at
reservoir.com> wrote:
>>>
>>> I'm trying to make the IR "better", in a
machine-independent fashion,
>>> without having to do any lowering.
>>>
>>>
>>> The question is “would the IR be more canonical” with the
representation
>>> you suggest? Why would the optimizer benefit from this
representation
>>> instead of the current one in general?
>>> Right now this GEP reads as an offset from a constant global, which
>>> seems pretty optimal to me.
>>>
>>> My impression is that when you reach a point where the “better” is
>>> target specific, this is part of the lowering (I’m using lowering
in the
>>> sense that you go away from the canonical representation the
optimizer
>>> expects). I believe it is pretty common that targets need to do
this kind
>>> of lowering.
>>>
>>> —
>>> Mehdi
>>>
>>>
>>>
>>> I've written code that rewrites GEPs as simple adds and
multiplies,
>>> which helps a lot, but there's still some sort of
re-canonicalization
>>> that's getting in my way. Is there perhaps a way to suppress
it?
>>>
>>>
>>> Thanks,
>>> Preston
>>>
>>>
>>> On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at
apple.com>
>>> wrote:
>>>
>>>> I guess is has to be done as part of the lowering for such a
target,
>>>> either during CodegenPrepare or during something like
MachineLICM.
>>>>
>>>> —
>>>> Mehdi
>>>>
>>>>
>>>>
>>>> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at
reservoir.com>
>>>> wrote:
>>>>
>>>> On some targets with limited addressing modes,
>>>> getting that 64-bit relocatable but loop-invariant value into a
register
>>>> requires several instructions. I'd like those several
instruction
>>>> outside
>>>> the loop, where they belong.
>>>>
>>>> Yes, my experience is that something (I assume instcombine)
>>>> recanonicalizes.
>>>>
>>>> Thanks,
>>>> Preston
>>>>
>>>>
>>>> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at
apple.com>
>>>> wrote:
>>>>
>>>>> Hi Preston,
>>>>>
>>>>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev
<
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> When I compile two different modules using
>>>>>
>>>>> clang -O -S -emit-llvm
>>>>>
>>>>>
>>>>> I get different .ll files, no surprise.
>>>>>
>>>>> The first looks like
>>>>>
>>>>> double *v;
>>>>>
>>>>> double zap(long n) {
>>>>>   double sum = 0;
>>>>>   for (long i = 0; i < n; i++)
>>>>>     sum += v[i];
>>>>>   return sum;
>>>>> }
>>>>>
>>>>>
>>>>> yielding
>>>>>
>>>>> @v = common global double* null, align 8
>>>>>
>>>>> ; Function Attrs: nounwind readonly uwtable
>>>>> define double @zap(i64 %n) #0 {
>>>>> entry:
>>>>>   %cmp4 = icmp sgt i64 %n, 0
>>>>>   br i1 %cmp4, label %for.body.lr.ph, label %for.end
>>>>>
>>>>> for.body.lr.ph:                                   ; preds =
%entry
>>>>>   %0 = load double** @v, align 8, !tbaa !1
>>>>>   br label %for.body
>>>>>
>>>>> for.body:                                         ; preds =
%for.body,
>>>>> %for.body.lr.ph
>>>>>   %i.06 = phi i64 [ 0, %for.body.lr.ph ], [ %inc, %for.body
]
>>>>>   %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph ], [
%add,
>>>>> %for.body ]
>>>>>   %arrayidx = getelementptr inbounds double* %0, i64 %i.06
>>>>>   %1 = load double* %arrayidx, align 8, !tbaa !5
>>>>>   %add = fadd double %sum.05, %1
>>>>>   %inc = add nsw i64 %i.06, 1
>>>>>
>>>>> %exitcond = icmp eq i64 %inc, %n
>>>>>   br i1 %exitcond, label %for.end, label %for.body
>>>>>
>>>>> for.end:                                          ; preds =
%for.body,
>>>>> %entry
>>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add,
>>>>> %for.body ]
>>>>>   ret double %sum.0.lcssa
>>>>> }
>>>>>
>>>>>
>>>>> and the second looks like
>>>>>
>>>>> double v[10000];
>>>>>
>>>>> double zap(long n) {
>>>>>   double sum = 0;
>>>>>   for (long i = 0; i < n; i++)
>>>>>     sum += v[i];
>>>>>   return sum;
>>>>> }
>>>>>
>>>>>
>>>>> yielding
>>>>>
>>>>> ; ModuleID = 'z.c'
>>>>> target datalayout >>>>>
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128"
>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>
>>>>> @v = common global [10000 x double] zeroinitializer, align
16
>>>>>
>>>>> ; Function Attrs: nounwind readonly uwtable
>>>>> define double @zap(i64 %n) #0 {
>>>>> entry:
>>>>>   %cmp4 = icmp sgt i64 %n, 0
>>>>>   br i1 %cmp4, label %for.body, label %for.end
>>>>>
>>>>> for.body:                                         ; preds =
%entry,
>>>>> %for.body
>>>>>   %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
>>>>>   %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00,
%entry ]
>>>>>   %arrayidx = getelementptr inbounds [10000 x double]* @v,
i64 0, i64
>>>>> %i.06
>>>>>   %0 = load double* %arrayidx, align 8, !tbaa !1
>>>>>   %add = fadd double %sum.05, %0
>>>>>   %inc = add nsw i64 %i.06, 1
>>>>>   %exitcond = icmp eq i64 %inc, %n
>>>>>   br i1 %exitcond, label %for.end, label %for.body
>>>>>
>>>>> for.end:                                          ; preds =
%for.body,
>>>>> %entry
>>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add,
>>>>> %for.body ]
>>>>>   ret double %sum.0.lcssa
>>>>> }
>>>>>
>>>>> attributes #0 = { nounwind readonly uwtable
>>>>> "less-precise-fpmad"="false"
"no-frame-pointer-elim"="false"
>>>>> "no-infs-fp-math"="false"
"no-nans-fp-math"="false"
>>>>> "stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
>>>>> "use-soft-float"="false" }
>>>>>
>>>>> !llvm.ident = !{!0}
>>>>>
>>>>> !0 = metadata !{metadata !"Clang Front-End version
3.4.1
>>>>> (tags/RELEASE_34/final)"}
>>>>> !1 = metadata !{metadata !2, metadata !2, i64 0}
>>>>> !2 = metadata !{metadata !"double", metadata !3,
i64 0}
>>>>> !3 = metadata !{metadata !"omnipotent char",
metadata !4, i64 0}
>>>>> !4 = metadata !{metadata !"Simple C/C++ TBAA"}
>>>>>
>>>>>
>>>>> (I included all the metadata and such for the 2nd case, on
the off
>>>>> chance it matters.)
>>>>>
>>>>> Is there any way I can convince licm (or something) to rip
open the
>>>>> GEP and hoist the reference to @v outside the loop, similar
to the first
>>>>> example?
>>>>>
>>>>>
>>>>>
>>>>> I believe that in the second case, there is no need to load
the
>>>>> address of v as it is constant. However you have a constant
address to an
>>>>> array, which is represented by [10000 x double]* @v in the
IR, which
>>>>> requires to use the two-level GEP.
>>>>>
>>>>> You “could” manage to represent it this way:
>>>>>
>>>>> define double @zap(i64 %n) #0 {
>>>>> entry:
>>>>>   %cmp6 = icmp sgt i64 %n, 0
>>>>>   %hoisted = bitcast [10000 x double]* @v to double*
>>>>>   br i1 %cmp6, label %for.body.preheader, label
%for.cond.cleanup
>>>>>
>>>>> for.body.preheader:                               ; preds =
%entry
>>>>>   br label %for.body
>>>>>
>>>>> for.cond.cleanup.loopexit:                        ; preds =
%for.body
>>>>>   %add.lcssa = phi double [ %add, %for.body ]
>>>>>   br label %for.cond.cleanup
>>>>>
>>>>> for.cond.cleanup:                                 ; preds
>>>>> %for.cond.cleanup.loopexit, %entry
>>>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [
%add.lcssa,
>>>>> %for.cond.cleanup.loopexit ]
>>>>>   ret double %sum.0.lcssa
>>>>>
>>>>> for.body:                                         ; preds
>>>>> %for.body.preheader, %for.body
>>>>>   %i.08 = phi i64 [ %inc, %for.body ], [ 0,
%for.body.preheader ]
>>>>>   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00,
>>>>> %for.body.preheader ]
>>>>>   %arrayidx = getelementptr double, double* %hoisted, i64
%i.08
>>>>>   %0 = load double, double* %arrayidx, align 8, !tbaa !2
>>>>>   %add = fadd double %sum.07, %0
>>>>>   %inc = add nuw nsw i64 %i.08, 1
>>>>>   %exitcond = icmp eq i64 %inc, %n
>>>>>   br i1 %exitcond, label %for.cond.cleanup.loopexit, label
%for.body
>>>>> }
>>>>>
>>>>>
>>>>> However instcombine will recanonicalize it like it was
originally.
>>>>>
>>>>> Since it is a GEP that operate on a constant address, this
shouldn’t
>>>>> matter, why would you want to split this?
>>>>>
>>>>> Best,
>>>>>
>>>>> —
>>>>> Mehdi
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/51bfea6b/attachment-0001.html>

llvm dev - Dec 2015 - persuading licm to do the right thing

[llvm-dev] persuading licm to do the right thing

[llvm-dev] persuading licm to do the right thing

[llvm-dev] persuading licm to do the right thing

[llvm-dev] persuading licm to do the right thing