thr3ads.net - llvm dev - [llvm-dev] persuading licm to do the right thing [Dec 2015]

If this information is useful, please help other people find it:
Share via:

Preston Briggs via llvm-dev

2015-Dec-09 15:58 UTC

[llvm-dev] persuading licm to do the right thing

I'm trying to make the IR "better", in a machine-independent
fashion,
without having to do any lowering.

I've written code that rewrites GEPs as simple adds and multiplies,
which helps a lot, but there's still some sort of re-canonicalization
that's getting in my way. Is there perhaps a way to suppress it?

Thanks,
Preston


On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
> I guess is has to be done as part of the lowering for such a target,
> either during CodegenPrepare or during something like MachineLICM.
>
> —
> Mehdi
>
>
>
> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at reservoir.com>
wrote:
>
> On some targets with limited addressing modes,
> getting that 64-bit relocatable but loop-invariant value into a register
> requires several instructions. I'd like those several instruction
outside
> the loop, where they belong.
>
> Yes, my experience is that something (I assume instcombine)
> recanonicalizes.
>
> Thanks,
> Preston
>
>
> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at
apple.com>
> wrote:
>
>> Hi Preston,
>>
>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> When I compile two different modules using
>>
>> clang -O -S -emit-llvm
>>
>>
>> I get different .ll files, no surprise.
>>
>> The first looks like
>>
>> double *v;
>>
>> double zap(long n) {
>>   double sum = 0;
>>   for (long i = 0; i < n; i++)
>>     sum += v[i];
>>   return sum;
>> }
>>
>>
>> yielding
>>
>> @v = common global double* null, align 8
>>
>> ; Function Attrs: nounwind readonly uwtable
>> define double @zap(i64 %n) #0 {
>> entry:
>>   %cmp4 = icmp sgt i64 %n, 0
>>   br i1 %cmp4, label %for.body.lr.ph, label %for.end
>>
>> for.body.lr.ph:                                   ; preds = %entry
>>   %0 = load double** @v, align 8, !tbaa !1
>>   br label %for.body
>>
>> for.body:                                         ; preds = %for.body,
%
>> for.body.lr.ph
>>   %i.06 = phi i64 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
>>   %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph ], [ %add,
>> %for.body ]
>>   %arrayidx = getelementptr inbounds double* %0, i64 %i.06
>>   %1 = load double* %arrayidx, align 8, !tbaa !5
>>   %add = fadd double %sum.05, %1
>>   %inc = add nsw i64 %i.06, 1
>>
>> %exitcond = icmp eq i64 %inc, %n
>>   br i1 %exitcond, label %for.end, label %for.body
>>
>> for.end:                                          ; preds = %for.body,
>> %entry
>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body
]
>>   ret double %sum.0.lcssa
>> }
>>
>>
>> and the second looks like
>>
>> double v[10000];
>>
>> double zap(long n) {
>>   double sum = 0;
>>   for (long i = 0; i < n; i++)
>>     sum += v[i];
>>   return sum;
>> }
>>
>>
>> yielding
>>
>> ; ModuleID = 'z.c'
>> target datalayout >>
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128"
>> target triple = "x86_64-unknown-linux-gnu"
>>
>> @v = common global [10000 x double] zeroinitializer, align 16
>>
>> ; Function Attrs: nounwind readonly uwtable
>> define double @zap(i64 %n) #0 {
>> entry:
>>   %cmp4 = icmp sgt i64 %n, 0
>>   br i1 %cmp4, label %for.body, label %for.end
>>
>> for.body:                                         ; preds = %entry,
>> %for.body
>>   %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
>>   %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry ]
>>   %arrayidx = getelementptr inbounds [10000 x double]* @v, i64 0, i64
>> %i.06
>>   %0 = load double* %arrayidx, align 8, !tbaa !1
>>   %add = fadd double %sum.05, %0
>>   %inc = add nsw i64 %i.06, 1
>>   %exitcond = icmp eq i64 %inc, %n
>>   br i1 %exitcond, label %for.end, label %for.body
>>
>> for.end:                                          ; preds = %for.body,
>> %entry
>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body
]
>>   ret double %sum.0.lcssa
>> }
>>
>> attributes #0 = { nounwind readonly uwtable
"less-precise-fpmad"="false"
>> "no-frame-pointer-elim"="false"
"no-infs-fp-math"="false"
>> "no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
>> "unsafe-fp-math"="false"
"use-soft-float"="false" }
>>
>> !llvm.ident = !{!0}
>>
>> !0 = metadata !{metadata !"Clang Front-End version 3.4.1
>> (tags/RELEASE_34/final)"}
>> !1 = metadata !{metadata !2, metadata !2, i64 0}
>> !2 = metadata !{metadata !"double", metadata !3, i64 0}
>> !3 = metadata !{metadata !"omnipotent char", metadata !4, i64
0}
>> !4 = metadata !{metadata !"Simple C/C++ TBAA"}
>>
>>
>> (I included all the metadata and such for the 2nd case, on the off
chance
>> it matters.)
>>
>> Is there any way I can convince licm (or something) to rip open the GEP
>> and hoist the reference to @v outside the loop, similar to the first
>> example?
>>
>>
>>
>> I believe that in the second case, there is no need to load the address
>> of v as it is constant. However you have a constant address to an
array,
>> which is represented by [10000 x double]* @v in the IR, which requires
to
>> use the two-level GEP.
>>
>> You “could” manage to represent it this way:
>>
>> define double @zap(i64 %n) #0 {
>> entry:
>>   %cmp6 = icmp sgt i64 %n, 0
>>   %hoisted = bitcast [10000 x double]* @v to double*
>>   br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup
>>
>> for.body.preheader:                               ; preds = %entry
>>   br label %for.body
>>
>> for.cond.cleanup.loopexit:                        ; preds = %for.body
>>   %add.lcssa = phi double [ %add, %for.body ]
>>   br label %for.cond.cleanup
>>
>> for.cond.cleanup:                                 ; preds >>
%for.cond.cleanup.loopexit, %entry
>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa,
>> %for.cond.cleanup.loopexit ]
>>   ret double %sum.0.lcssa
>>
>> for.body:                                         ; preds >>
%for.body.preheader, %for.body
>>   %i.08 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader ]
>>   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00,
>> %for.body.preheader ]
>>   %arrayidx = getelementptr double, double* %hoisted, i64 %i.08
>>   %0 = load double, double* %arrayidx, align 8, !tbaa !2
>>   %add = fadd double %sum.07, %0
>>   %inc = add nuw nsw i64 %i.08, 1
>>   %exitcond = icmp eq i64 %inc, %n
>>   br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
>> }
>>
>>
>> However instcombine will recanonicalize it like it was originally.
>>
>> Since it is a GEP that operate on a constant address, this shouldn’t
>> matter, why would you want to split this?
>>
>> Best,
>>
>> —
>> Mehdi
>>
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/6e2c8a68/attachment.html>

Mehdi Amini via llvm-dev

2015-Dec-09 16:45 UTC

head link

[llvm-dev] persuading licm to do the right thing

> On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at reservoir.com>
wrote:
> 
> I'm trying to make the IR "better", in a machine-independent
fashion,
> without having to do any lowering.
The question is “would the IR be more canonical” with the representation you
suggest? Why would the optimizer benefit from this representation instead of the
current one in general?
Right now this GEP reads as an offset from a constant global, which seems pretty
optimal to me.

My impression is that when you reach a point where the “better” is target
specific, this is part of the lowering (I’m using lowering in the sense that you
go away from the canonical representation the optimizer expects). I believe it
is pretty common that targets need to do this kind of lowering.

— 
Mehdi

> 
> I've written code that rewrites GEPs as simple adds and multiplies,
> which helps a lot, but there's still some sort of re-canonicalization
> that's getting in my way. Is there perhaps a way to suppress it?
> 
> Thanks,
> Preston
> 
> 
> On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at apple.com
<mailto:mehdi.amini at apple.com>> wrote:
> I guess is has to be done as part of the lowering for such a target, either
during CodegenPrepare or during something like MachineLICM.
> 
> — 
> Mehdi
> 
> 
> 
>> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at reservoir.com
<mailto:briggs at reservoir.com>> wrote:
>> 
>> On some targets with limited addressing modes,
>> getting that 64-bit relocatable but loop-invariant value into a
register
>> requires several instructions. I'd like those several instruction
outside
>> the loop, where they belong.
>> 
>> Yes, my experience is that something (I assume instcombine)
recanonicalizes.
>> 
>> Thanks,
>> Preston
>> 
>> 
>> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at
apple.com <mailto:mehdi.amini at apple.com>> wrote:
>> Hi Preston,
>> 
>>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev
<llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
wrote:
>>> 
>>> When I compile two different modules using
>>> 
>>> clang -O -S -emit-llvm
>>> 
>>> I get different .ll files, no surprise.
>>> 
>>> The first looks like
>>> 
>>> double *v;
>>> 
>>> double zap(long n) {
>>>   double sum = 0;
>>>   for (long i = 0; i < n; i++)
>>>     sum += v[i];
>>>   return sum;
>>> }
>>> 
>>> yielding
>>> 
>>> @v = common global double* null, align 8
>>> 
>>> ; Function Attrs: nounwind readonly uwtable
>>> define double @zap(i64 %n) #0 {
>>> entry:
>>>   %cmp4 = icmp sgt i64 %n, 0
>>>   br i1 %cmp4, label %for.body.lr.ph
<http://for.body.lr.ph/>, label %for.end
>>> 
>>> for.body.lr.ph <http://for.body.lr.ph/>:                     
; preds = %entry
>>>   %0 = load double** @v, align 8, !tbaa !1
>>>   br label %for.body
>>> 
>>> for.body:                                         ; preds =
%for.body, %for.body.lr.ph <http://for.body.lr.ph/>
>>>   %i.06 = phi i64 [ 0, %for.body.lr.ph
<http://for.body.lr.ph/> ], [ %inc, %for.body ]
>>>   %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph
<http://for.body.lr.ph/> ], [ %add, %for.body ]
>>>   %arrayidx = getelementptr inbounds double* %0, i64 %i.06
>>>   %1 = load double* %arrayidx, align 8, !tbaa !5
>>>   %add = fadd double %sum.05, %1
>>>   %inc = add nsw i64 %i.06, 1
>>>   
>>> %exitcond = icmp eq i64 %inc, %n
>>>   br i1 %exitcond, label %for.end, label %for.body
>>> 
>>> for.end:                                          ; preds =
%for.body, %entry
>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add,
%for.body ]
>>>   ret double %sum.0.lcssa
>>> }
>>> 
>>> and the second looks like
>>> 
>>> double v[10000];
>>> 
>>> double zap(long n) {
>>>   double sum = 0;
>>>   for (long i = 0; i < n; i++)
>>>     sum += v[i];
>>>   return sum;
>>> }
>>> 
>>> yielding
>>> 
>>> ; ModuleID = 'z.c'
>>> target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128"
>>> target triple = "x86_64-unknown-linux-gnu"
>>> 
>>> @v = common global [10000 x double] zeroinitializer, align 16
>>> 
>>> ; Function Attrs: nounwind readonly uwtable
>>> define double @zap(i64 %n) #0 {
>>> entry:
>>>   %cmp4 = icmp sgt i64 %n, 0
>>>   br i1 %cmp4, label %for.body, label %for.end
>>> 
>>> for.body:                                         ; preds = %entry,
%for.body
>>>   %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
>>>   %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry
]
>>>   %arrayidx = getelementptr inbounds [10000 x double]* @v, i64 0,
i64 %i.06
>>>   %0 = load double* %arrayidx, align 8, !tbaa !1
>>>   %add = fadd double %sum.05, %0
>>>   %inc = add nsw i64 %i.06, 1
>>>   %exitcond = icmp eq i64 %inc, %n
>>>   br i1 %exitcond, label %for.end, label %for.body
>>> 
>>> for.end:                                          ; preds =
%for.body, %entry
>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add,
%for.body ]
>>>   ret double %sum.0.lcssa
>>> }
>>> 
>>> attributes #0 = { nounwind readonly uwtable
"less-precise-fpmad"="false"
"no-frame-pointer-elim"="false"
"no-infs-fp-math"="false"
"no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
"use-soft-float"="false" }
>>> 
>>> !llvm.ident = !{!0}
>>> 
>>> !0 = metadata !{metadata !"Clang Front-End version 3.4.1
(tags/RELEASE_34/final)"}
>>> !1 = metadata !{metadata !2, metadata !2, i64 0}
>>> !2 = metadata !{metadata !"double", metadata !3, i64 0}
>>> !3 = metadata !{metadata !"omnipotent char", metadata !4,
i64 0}
>>> !4 = metadata !{metadata !"Simple C/C++ TBAA"}
>>> 
>>> (I included all the metadata and such for the 2nd case, on the off
chance it matters.)
>>> 
>>> Is there any way I can convince licm (or something) to rip open the
GEP and hoist the reference to @v outside the loop, similar to the first
example?
>> 
>> 
>> I believe that in the second case, there is no need to load the address
of v as it is constant. However you have a constant address to an array, which
is represented by [10000 x double]* @v in the IR, which requires to use the
two-level GEP.
>> 
>> You “could” manage to represent it this way:
>> 
>> define double @zap(i64 %n) #0 {
>> entry:
>>   %cmp6 = icmp sgt i64 %n, 0
>>   %hoisted = bitcast [10000 x double]* @v to double*
>>   br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup
>> 
>> for.body.preheader:                               ; preds = %entry
>>   br label %for.body
>> 
>> for.cond.cleanup.loopexit:                        ; preds = %for.body
>>   %add.lcssa = phi double [ %add, %for.body ]
>>   br label %for.cond.cleanup
>> 
>> for.cond.cleanup:                                 ; preds =
%for.cond.cleanup.loopexit, %entry
>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa,
%for.cond.cleanup.loopexit ]
>>   ret double %sum.0.lcssa
>> 
>> for.body:                                         ; preds =
%for.body.preheader, %for.body
>>   %i.08 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader ]
>>   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00,
%for.body.preheader ]
>>   %arrayidx = getelementptr double, double* %hoisted, i64 %i.08
>>   %0 = load double, double* %arrayidx, align 8, !tbaa !2
>>   %add = fadd double %sum.07, %0
>>   %inc = add nuw nsw i64 %i.08, 1
>>   %exitcond = icmp eq i64 %inc, %n
>>   br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
>> }
>> 
>> 
>> However instcombine will recanonicalize it like it was originally.
>> 
>> Since it is a GEP that operate on a constant address, this shouldn’t
matter, why would you want to split this?
>> 
>> Best,
>> 
>> — 
>> Mehdi
>> 
>> 
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/09a20f1d/attachment.html>

Preston Briggs via llvm-dev

2015-Dec-09 17:00 UTC

head link

[llvm-dev] persuading licm to do the right thing

I suppose your view is reasonable, and perhaps common.
My own "taste" has always preferred machine-independent code
that is as simple as possible, so GEPs reduced to nothing more than an
add, etc, i.e., quite risc-like. Then optimize it to reduce the total number
of operations (as best we can), then raise the level during instruction
selection, taking advantage of available instructions.

I guess my whole scheme of using opt in this context is probably wrong
headed.

Thanks



On Wed, Dec 9, 2015 at 8:45 AM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
>
> On Dec 9, 2015, at 7:58 AM, Preston Briggs <briggs at reservoir.com>
wrote:
>
> I'm trying to make the IR "better", in a machine-independent
fashion,
> without having to do any lowering.
>
>
> The question is “would the IR be more canonical” with the representation
> you suggest? Why would the optimizer benefit from this representation
> instead of the current one in general?
> Right now this GEP reads as an offset from a constant global, which seems
> pretty optimal to me.
>
> My impression is that when you reach a point where the “better” is target
> specific, this is part of the lowering (I’m using lowering in the sense
> that you go away from the canonical representation the optimizer expects).
> I believe it is pretty common that targets need to do this kind of
lowering.
>
> —
> Mehdi
>
>
>
> I've written code that rewrites GEPs as simple adds and multiplies,
> which helps a lot, but there's still some sort of re-canonicalization
> that's getting in my way. Is there perhaps a way to suppress it?
>
>
> Thanks,
> Preston
>
>
> On Wed, Dec 9, 2015 at 7:47 AM, Mehdi Amini <mehdi.amini at
apple.com> wrote:
>
>> I guess is has to be done as part of the lowering for such a target,
>> either during CodegenPrepare or during something like MachineLICM.
>>
>> —
>> Mehdi
>>
>>
>>
>> On Dec 9, 2015, at 7:13 AM, Preston Briggs <briggs at
reservoir.com> wrote:
>>
>> On some targets with limited addressing modes,
>> getting that 64-bit relocatable but loop-invariant value into a
register
>> requires several instructions. I'd like those several instruction
outside
>> the loop, where they belong.
>>
>> Yes, my experience is that something (I assume instcombine)
>> recanonicalizes.
>>
>> Thanks,
>> Preston
>>
>>
>> On Tue, Dec 8, 2015 at 11:21 PM, Mehdi Amini <mehdi.amini at
apple.com>
>> wrote:
>>
>>> Hi Preston,
>>>
>>> On Dec 8, 2015, at 10:56 PM, Preston Briggs via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>> When I compile two different modules using
>>>
>>> clang -O -S -emit-llvm
>>>
>>>
>>> I get different .ll files, no surprise.
>>>
>>> The first looks like
>>>
>>> double *v;
>>>
>>> double zap(long n) {
>>>   double sum = 0;
>>>   for (long i = 0; i < n; i++)
>>>     sum += v[i];
>>>   return sum;
>>> }
>>>
>>>
>>> yielding
>>>
>>> @v = common global double* null, align 8
>>>
>>> ; Function Attrs: nounwind readonly uwtable
>>> define double @zap(i64 %n) #0 {
>>> entry:
>>>   %cmp4 = icmp sgt i64 %n, 0
>>>   br i1 %cmp4, label %for.body.lr.ph, label %for.end
>>>
>>> for.body.lr.ph:                                   ; preds = %entry
>>>   %0 = load double** @v, align 8, !tbaa !1
>>>   br label %for.body
>>>
>>> for.body:                                         ; preds =
%for.body, %
>>> for.body.lr.ph
>>>   %i.06 = phi i64 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
>>>   %sum.05 = phi double [ 0.000000e+00, %for.body.lr.ph ], [ %add,
>>> %for.body ]
>>>   %arrayidx = getelementptr inbounds double* %0, i64 %i.06
>>>   %1 = load double* %arrayidx, align 8, !tbaa !5
>>>   %add = fadd double %sum.05, %1
>>>   %inc = add nsw i64 %i.06, 1
>>>
>>> %exitcond = icmp eq i64 %inc, %n
>>>   br i1 %exitcond, label %for.end, label %for.body
>>>
>>> for.end:                                          ; preds =
%for.body,
>>> %entry
>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add,
%for.body ]
>>>   ret double %sum.0.lcssa
>>> }
>>>
>>>
>>> and the second looks like
>>>
>>> double v[10000];
>>>
>>> double zap(long n) {
>>>   double sum = 0;
>>>   for (long i = 0; i < n; i++)
>>>     sum += v[i];
>>>   return sum;
>>> }
>>>
>>>
>>> yielding
>>>
>>> ; ModuleID = 'z.c'
>>> target datalayout >>>
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64-S128"
>>> target triple = "x86_64-unknown-linux-gnu"
>>>
>>> @v = common global [10000 x double] zeroinitializer, align 16
>>>
>>> ; Function Attrs: nounwind readonly uwtable
>>> define double @zap(i64 %n) #0 {
>>> entry:
>>>   %cmp4 = icmp sgt i64 %n, 0
>>>   br i1 %cmp4, label %for.body, label %for.end
>>>
>>> for.body:                                         ; preds = %entry,
>>> %for.body
>>>   %i.06 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
>>>   %sum.05 = phi double [ %add, %for.body ], [ 0.000000e+00, %entry
]
>>>   %arrayidx = getelementptr inbounds [10000 x double]* @v, i64 0,
i64
>>> %i.06
>>>   %0 = load double* %arrayidx, align 8, !tbaa !1
>>>   %add = fadd double %sum.05, %0
>>>   %inc = add nsw i64 %i.06, 1
>>>   %exitcond = icmp eq i64 %inc, %n
>>>   br i1 %exitcond, label %for.end, label %for.body
>>>
>>> for.end:                                          ; preds =
%for.body,
>>> %entry
>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add,
%for.body ]
>>>   ret double %sum.0.lcssa
>>> }
>>>
>>> attributes #0 = { nounwind readonly uwtable
"less-precise-fpmad"="false"
>>> "no-frame-pointer-elim"="false"
"no-infs-fp-math"="false"
>>> "no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
>>> "unsafe-fp-math"="false"
"use-soft-float"="false" }
>>>
>>> !llvm.ident = !{!0}
>>>
>>> !0 = metadata !{metadata !"Clang Front-End version 3.4.1
>>> (tags/RELEASE_34/final)"}
>>> !1 = metadata !{metadata !2, metadata !2, i64 0}
>>> !2 = metadata !{metadata !"double", metadata !3, i64 0}
>>> !3 = metadata !{metadata !"omnipotent char", metadata !4,
i64 0}
>>> !4 = metadata !{metadata !"Simple C/C++ TBAA"}
>>>
>>>
>>> (I included all the metadata and such for the 2nd case, on the off
>>> chance it matters.)
>>>
>>> Is there any way I can convince licm (or something) to rip open the
GEP
>>> and hoist the reference to @v outside the loop, similar to the
first
>>> example?
>>>
>>>
>>>
>>> I believe that in the second case, there is no need to load the
address
>>> of v as it is constant. However you have a constant address to an
array,
>>> which is represented by [10000 x double]* @v in the IR, which
requires to
>>> use the two-level GEP.
>>>
>>> You “could” manage to represent it this way:
>>>
>>> define double @zap(i64 %n) #0 {
>>> entry:
>>>   %cmp6 = icmp sgt i64 %n, 0
>>>   %hoisted = bitcast [10000 x double]* @v to double*
>>>   br i1 %cmp6, label %for.body.preheader, label %for.cond.cleanup
>>>
>>> for.body.preheader:                               ; preds = %entry
>>>   br label %for.body
>>>
>>> for.cond.cleanup.loopexit:                        ; preds =
%for.body
>>>   %add.lcssa = phi double [ %add, %for.body ]
>>>   br label %for.cond.cleanup
>>>
>>> for.cond.cleanup:                                 ; preds
>>> %for.cond.cleanup.loopexit, %entry
>>>   %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa,
>>> %for.cond.cleanup.loopexit ]
>>>   ret double %sum.0.lcssa
>>>
>>> for.body:                                         ; preds
>>> %for.body.preheader, %for.body
>>>   %i.08 = phi i64 [ %inc, %for.body ], [ 0, %for.body.preheader ]
>>>   %sum.07 = phi double [ %add, %for.body ], [ 0.000000e+00,
>>> %for.body.preheader ]
>>>   %arrayidx = getelementptr double, double* %hoisted, i64 %i.08
>>>   %0 = load double, double* %arrayidx, align 8, !tbaa !2
>>>   %add = fadd double %sum.07, %0
>>>   %inc = add nuw nsw i64 %i.08, 1
>>>   %exitcond = icmp eq i64 %inc, %n
>>>   br i1 %exitcond, label %for.cond.cleanup.loopexit, label
%for.body
>>> }
>>>
>>>
>>> However instcombine will recanonicalize it like it was originally.
>>>
>>> Since it is a GEP that operate on a constant address, this
shouldn’t
>>> matter, why would you want to split this?
>>>
>>> Best,
>>>
>>> —
>>> Mehdi
>>>
>>>
>>
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20151209/9b036d71/attachment.html>

Reasonably Related Threads

Search for more seemingly similar threads

llvm dev - Dec 2015 - persuading licm to do the right thing

[llvm-dev] persuading licm to do the right thing

[llvm-dev] persuading licm to do the right thing

[llvm-dev] persuading licm to do the right thing

Reasonably Related Threads