thr3ads.net - llvm dev - [LLVMdev] Optimization puzzle... [Mar 2015]

If this information is useful, please help other people find it:
Share via:

Daniel Berlin

2015-Mar-25 19:26 UTC

[LLVMdev] Optimization puzzle...

Here's a version that doesn't try to do block deletion on it's own.
If you
use -adce then -simplifycfg, you get what you want.
It passes all tests except one, which is that we delete an invoke of a pure
function, IE Transforms/ADCE/dce_pure_invoke.ll -
I'm not sure why that's bad.

The reason we delete it is because it returns false to
I.mayHaveSideEffects(), and in particular, mayThrow returns false as well!

That seems really wrong if we expect it to stay around, so i'm actually
okay with failing this case unless someone tells me why we shouldn't.

(and why the right thing isn't to make it return true to mayThrow)



On Wed, Mar 25, 2015 at 12:02 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> So, the first  gets deleted with -dce because it's trivially dead.
>
> The second, nothing gets deleted because no DCE we have does DCE using
> control dependence, so it marks Terminators as live.  This makes everything
> executable in your case.
>
> SCCP and most other passes don't delete it because they are forward
> passes, and start by assuming the entryblock is executable and go from
> there.  If you make this assumption (which is not fixable for forward
> passes), everything here is executable.
>
> DCE is normally the backwards pass that fixes this.
>
> You can make ADCE fix this particular case, even without control
> dependence, with a small change:
>
> Right now it marks all terminators as alive.
>
> It should mark all return's as alive.
> Other terminators, It should only mark terminators as live once it reaches
> a block with an alive operation in it (IE the first time it gets into that
> block).
>
>
> That would make it mark the return as useful, then discover no other block
> has useful operations, and delete them all.
>
> I've attached a mostly untested patch to do this.
> It does what you want (It removes everything but the return)
> It should work in all cases.
> I ripped the erasure code from SCCP
> It will crash because it doesn't know what to do when it needs to
redirect
> the entry block
> SCCP doesn't have this problem because it is only redirecting, not
> erasing, so the entry block always has a terminator to redirect.
> (before, it crashed because it wanted to remove the entry block)
>
>
> So this needs to be fixed.
> I marked it with a TODO.
> If you want to do this, maybe refactor out the code for deleting dead
> blocks from SCCP/DCE, and then submit the patch, it would be great.
>
>
> On Wed, Mar 25, 2015 at 10:47 AM, Benoit Belley <
> Benoit.Belley at autodesk.com> wrote:
>
>> Hi everyone,
>>
>>
>> I am wondering what¹s stopping the LLVM optimizer (opt -O3) from
>> eliminating the apparently useless « icmp sgt » instruction in the
>> following piece of LLVM IR.
>>
>>     > ; ModuleID = 'lambda-opt.bc'
>>     > target datalayout =
"e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>     > target triple = "x86_64-apple-macosx10.10.0"
>>     >
>>     > ; Function Attrs: nounwind readnone ssp uwtable
>>     > define { <2 x float>, float }
@_Z18sampleNullOperator5PointS_(i64
>> %pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32 %pmax.coerce1)
#0
>> {
>>     > _ZN15SamplingClosureD1Ev.exit:
>>     > %0 = icmp sgt i32 %pmin.coerce1, %pmax.coerce1
>>     > ret { <2 x float>, float } zeroinitializer
>>     > }
>>     >
>>     > attributes #0 = { nounwind readnone ssp uwtable
>> "less-precise-fpmad"="false"
"no-frame-pointer-elim"="true"
>> "no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="false"
>> "no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
>> "unsafe-fp-math"="false"
"use-soft-float"="false" }
>>     >
>>     > !llvm.ident = !{!0}
>>     >
>>     > !0 = metadata !{metadata !"Apple LLVM version 6.0
(clang-600.0.57)
>> (based on LLVM 3.5svn)"}
>>
>>
>> The X86 code generation passes seems to eliminate it for the simple
case
>> shown above. The generated assembly code is actually:
>>
>>
>>     > __Z18sampleNullOperator5PointS_: ##
@_Z18sampleNullOperator5PointS_
>>     > .cfi_startproc
>>     > ## BB#0: ## %_ZN15SamplingClosureD1Ev.exit
>>     > push      rbp
>>     > Ltmp0:
>>     > .cfi_def_cfa_offset 16
>>     > Ltmp1:
>>     > .cfi_offset rbp, -16
>>     > mov       rbp, rsp
>>     > Ltmp2:
>>     > .cfi_def_cfa_register rbp
>>     > vxorps    xmm0, xmm0, xmm0
>>     > vxorps    xmm1, xmm1, xmm1
>>     > pop       rbp
>>     > ret
>>
>>
>> I am wondering because I think that it might explain why the LLVM IR
code
>> shown below does not get simplified to a single "ret { <2 x
float>, float
>> } zeroinitializer" instruction. It seems to me that the nested
loops have
>> no side-effects and are guaranteed to terminate in a finite amount of
>> time. Unless, the nested "icmp sgt" instructions cannot be
eliminated.
>>
>> Of course, in that case, the X86 code generation passes can no longer
do
>> their magic...
>>
>>
>>     > ; ModuleID = 'lambda-opt.bc'
>>     > target datalayout =
"e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>     > target triple = "x86_64-apple-macosx10.10.0"
>>     >
>>     > ; Function Attrs: ssp uwtable
>>     > define { <2 x float>, float }
@_Z18sampleNullOperator5PointS_(i64
>>     > %pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32
>> %pmax.coerce1) #0
>>     > {
>>     >   %1 = icmp slt i32 %pmin.coerce1, %pmax.coerce1
>>     >   br i1 %1, label %.lr.ph35.i, label
%_ZN15SamplingClosureD1Ev.exit
>>     >
>>     > .lr.ph35.i:                                       ; preds = %0
>>     >   %2 = lshr i64 %pmin.coerce0, 32
>>     >   %3 = trunc i64 %2 to i32
>>     >   %4 = lshr i64 %pmax.coerce0, 32
>>     >   %5 = trunc i64 %4 to i32
>>     >   %6 = icmp slt i32 %3, %5
>>     >   %7 = trunc i64 %pmin.coerce0 to i32
>>     >   %8 = trunc i64 %pmax.coerce0 to i32
>>     >   %9 = icmp slt i32 %7, %8
>>     >   br i1 %6, label %.lr.ph23.us.i.preheader, label
>>     > %_ZN15SamplingClosureD1Ev.exit
>>     >
>>     > .lr.ph23.us.i.preheader:                          ; preds
>> %.lr.ph35.i
>>     >   %10 = trunc i64 %pmax.coerce0 to i32
>>     >   %11 = trunc i64 %pmin.coerce0 to i32
>>     >   %12 = sub i32 %10, %11
>>     >   br label %.lr.ph23.us.i
>>     >
>>     > ._crit_edge24.us-lcssa.us57.i:                    ; preds =
%14,
>>     > %.lr.ph23.us.i
>>     >   %13 = add nsw i32 %z.028.us.i, 1
>>     >   %exitcond66.i = icmp eq i32 %13, %pmax.coerce1
>>     >   br i1 %exitcond66.i, label %_ZN15SamplingClosureD1Ev.exit,
label
>>     > %.lr.ph23.us.i
>>     >
>>     > .lr.ph23.us.i:                                    ; preds
>>     > %._crit_edge24.us-lcssa.us57.i, %.lr.ph23.us.i.preheader
>>     >   %z.028.us.i = phi i32 [ %13, %._crit_edge24.us-lcssa.us57.i
], [
>>     > %pmin.coerce1, %.lr.ph23.us.i.preheader ]
>>     >   br i1 %9, label %.lr.ph.us.us.i, label
>> %._crit_edge24.us-lcssa.us57.i
>>     >
>>     > ; <label>:14                                      ;
preds = %.noexc,
>>     > %middle.block
>>     >   %15 = add nsw i32 %y.018.us.us.i, 1
>>     >   %exitcond65.i = icmp eq i32 %15, %5
>>     >   br i1 %exitcond65.i, label %._crit_edge24.us-lcssa.us57.i,
label
>>     > %.lr.ph.us.us.i
>>     >
>>     > .lr.ph.us.us.i:                                   ; preds =
%14,
>>     > %.lr.ph23.us.i
>>     >   %y.018.us.us.i = phi i32 [ %15, %14 ], [ %3, %.lr.ph23.us.i
]
>>     >   %end.idx = add i32 %12, %7
>>     >   %n.vec = and i32 %12, -128
>>     >   %end.idx.rnd.down = add i32 %n.vec, %7
>>     >   %cmp.zero = icmp eq i32 %n.vec, 0
>>     >   br i1 %cmp.zero, label %middle.block, label %vector.body
>>     >
>>     > vector.body:                                      ; preds
>> %vector.body,
>>     > %.lr.ph.us.us.i
>>     >   %index = phi i32 [ %index.next, %vector.body ], [ %7,
>> %.lr.ph.us.us.i ]
>>     >   %index.next = add i32 %index, 128
>>     >   %16 = icmp eq i32 %index.next, %end.idx.rnd.down
>>     >   br i1 %16, label %middle.block, label %vector.body,
!llvm.loop !1
>>     >
>>     > middle.block:                                     ; preds
>> %vector.body,
>>     > %.lr.ph.us.us.i
>>     >   %resume.val = phi i32 [ %7, %.lr.ph.us.us.i ], [
>> %end.idx.rnd.down,
>>     > %vector.body ]
>>     >   %cmp.n = icmp eq i32 %end.idx, %resume.val
>>     >   br i1 %cmp.n, label %14, label %.noexc
>>     >
>>     > .noexc:                                           ; preds =
%.noexc,
>>     > %middle.block
>>     >   %x.012.us.us.i = phi i32 [ %17, %.noexc ], [ %resume.val,
>> %middle.block ]
>>     >   %17 = add nsw i32 %x.012.us.us.i, 1
>>     >   %exitcond64.i = icmp eq i32 %17, %8
>>     >   br i1 %exitcond64.i, label %14, label %.noexc, !llvm.loop !4
>>     >
>>     > _ZN15SamplingClosureD1Ev.exit:                    ; preds
>>     > %._crit_edge24.us-lcssa.us57.i, %.lr.ph35.i, %0
>>     >   ret { <2 x float>, float } zeroinitializer
>>     > }
>>     >
>>     > attributes #0 = { ssp uwtable
"less-precise-fpmad"="false"
>>     > "no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"
>>     > "no-infs-fp-math"="false"
"no-nans-fp-math"="false"
>>     > "stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
>>     > "use-soft-float"="false" }
>>     >
>>     > !llvm.ident = !{!0}
>>     >
>>     > !0 = metadata !{metadata !"Apple LLVM version 6.0
(clang-600.0.57)
>> (based
>>     > on LLVM 3.5svn)"}
>>     > !1 = metadata !{metadata !1, metadata !2, metadata !3}
>>     > !2 = metadata !{metadata !"llvm.vectorizer.width",
i32 1}
>>     > !3 = metadata !{metadata !"llvm.vectorizer.unroll",
i32 1}
>>     > !4 = metadata !{metadata !4, metadata !2, metadata !3}
>>
>>
>> Thanks for you help,
>> Benoit
>>
>> Benoit Belley
>> Sr Principal Developer
>> M&E-Product Development Group
>> MAIN +1 514 393 1616
>> DIRECT +1 438 448 6304
>> FAX +1 514 393 0110
>> Twitter <http://twitter.com/autodesk>
>> Facebook <https://www.facebook.com/Autodesk>
>> Autodesk, Inc.
>> 10 Duke Street
>> Montreal, Quebec, Canada H3C 2L7
>> www.autodesk.com <http://www.autodesk.com/>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150325/667a7b7f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: adce2.patch
Type: application/octet-stream
Size: 4114 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150325/667a7b7f/attachment.obj>

Daniel Berlin

2015-Mar-25 19:31 UTC

head link

[LLVMdev] Optimization puzzle...

Note: this version is likely broken in other ways, because it replaces the
unconditional terminators with unreachable, which may leave the function
with no returns at all if it does it to the entry block.

I.E:

it will produce

; Function Attrs: readnone
declare i32 @strlen(i8*) #0

define i32 @test() {
  unreachable

Cont:                                             ; No predecessors!
  ret i32 0

Other:                                            ; No predecessors!
  %exn = landingpad { i8*, i32 } personality i32 (...)*
@__gxx_personality_v0
          cleanup
  ret i32 1
}

So i don't know which is better to go with.



On Wed, Mar 25, 2015 at 12:26 PM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> Here's a version that doesn't try to do block deletion on it's
own. If you
> use -adce then -simplifycfg, you get what you want.
> It passes all tests except one, which is that we delete an invoke of a
> pure function, IE Transforms/ADCE/dce_pure_invoke.ll -
> I'm not sure why that's bad.
>
> The reason we delete it is because it returns false to
> I.mayHaveSideEffects(), and in particular, mayThrow returns false as well!
>
> That seems really wrong if we expect it to stay around, so i'm actually
> okay with failing this case unless someone tells me why we shouldn't.
>
> (and why the right thing isn't to make it return true to mayThrow)
>
>
>
> On Wed, Mar 25, 2015 at 12:02 PM, Daniel Berlin <dberlin at
dberlin.org>
> wrote:
>
>> So, the first  gets deleted with -dce because it's trivially dead.
>>
>> The second, nothing gets deleted because no DCE we have does DCE using
>> control dependence, so it marks Terminators as live.  This makes
everything
>> executable in your case.
>>
>> SCCP and most other passes don't delete it because they are forward
>> passes, and start by assuming the entryblock is executable and go from
>> there.  If you make this assumption (which is not fixable for forward
>> passes), everything here is executable.
>>
>> DCE is normally the backwards pass that fixes this.
>>
>> You can make ADCE fix this particular case, even without control
>> dependence, with a small change:
>>
>> Right now it marks all terminators as alive.
>>
>> It should mark all return's as alive.
>> Other terminators, It should only mark terminators as live once it
>> reaches a block with an alive operation in it (IE the first time it
gets
>> into that block).
>>
>>
>> That would make it mark the return as useful, then discover no other
>> block has useful operations, and delete them all.
>>
>> I've attached a mostly untested patch to do this.
>> It does what you want (It removes everything but the return)
>> It should work in all cases.
>> I ripped the erasure code from SCCP
>> It will crash because it doesn't know what to do when it needs to
>> redirect the entry block
>> SCCP doesn't have this problem because it is only redirecting, not
>> erasing, so the entry block always has a terminator to redirect.
>> (before, it crashed because it wanted to remove the entry block)
>>
>>
>> So this needs to be fixed.
>> I marked it with a TODO.
>> If you want to do this, maybe refactor out the code for deleting dead
>> blocks from SCCP/DCE, and then submit the patch, it would be great.
>>
>>
>> On Wed, Mar 25, 2015 at 10:47 AM, Benoit Belley <
>> Benoit.Belley at autodesk.com> wrote:
>>
>>> Hi everyone,
>>>
>>>
>>> I am wondering what¹s stopping the LLVM optimizer (opt -O3) from
>>> eliminating the apparently useless « icmp sgt » instruction in the
>>> following piece of LLVM IR.
>>>
>>>     > ; ModuleID = 'lambda-opt.bc'
>>>     > target datalayout =
"e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>>     > target triple = "x86_64-apple-macosx10.10.0"
>>>     >
>>>     > ; Function Attrs: nounwind readnone ssp uwtable
>>>     > define { <2 x float>, float }
@_Z18sampleNullOperator5PointS_(i64
>>> %pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32
%pmax.coerce1)
>>> #0
>>> {
>>>     > _ZN15SamplingClosureD1Ev.exit:
>>>     > %0 = icmp sgt i32 %pmin.coerce1, %pmax.coerce1
>>>     > ret { <2 x float>, float } zeroinitializer
>>>     > }
>>>     >
>>>     > attributes #0 = { nounwind readnone ssp uwtable
>>> "less-precise-fpmad"="false"
"no-frame-pointer-elim"="true"
>>> "no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="false"
>>> "no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
>>> "unsafe-fp-math"="false"
"use-soft-float"="false" }
>>>     >
>>>     > !llvm.ident = !{!0}
>>>     >
>>>     > !0 = metadata !{metadata !"Apple LLVM version 6.0
(clang-600.0.57)
>>> (based on LLVM 3.5svn)"}
>>>
>>>
>>> The X86 code generation passes seems to eliminate it for the simple
case
>>> shown above. The generated assembly code is actually:
>>>
>>>
>>>     > __Z18sampleNullOperator5PointS_: ##
@_Z18sampleNullOperator5PointS_
>>>     > .cfi_startproc
>>>     > ## BB#0: ## %_ZN15SamplingClosureD1Ev.exit
>>>     > push      rbp
>>>     > Ltmp0:
>>>     > .cfi_def_cfa_offset 16
>>>     > Ltmp1:
>>>     > .cfi_offset rbp, -16
>>>     > mov       rbp, rsp
>>>     > Ltmp2:
>>>     > .cfi_def_cfa_register rbp
>>>     > vxorps    xmm0, xmm0, xmm0
>>>     > vxorps    xmm1, xmm1, xmm1
>>>     > pop       rbp
>>>     > ret
>>>
>>>
>>> I am wondering because I think that it might explain why the LLVM
IR code
>>> shown below does not get simplified to a single "ret { <2 x
float>, float
>>> } zeroinitializer" instruction. It seems to me that the nested
loops have
>>> no side-effects and are guaranteed to terminate in a finite amount
of
>>> time. Unless, the nested "icmp sgt" instructions cannot
be eliminated.
>>>
>>> Of course, in that case, the X86 code generation passes can no
longer do
>>> their magic...
>>>
>>>
>>>     > ; ModuleID = 'lambda-opt.bc'
>>>     > target datalayout =
"e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>>     > target triple = "x86_64-apple-macosx10.10.0"
>>>     >
>>>     > ; Function Attrs: ssp uwtable
>>>     > define { <2 x float>, float }
@_Z18sampleNullOperator5PointS_(i64
>>>     > %pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32
>>> %pmax.coerce1) #0
>>>     > {
>>>     >   %1 = icmp slt i32 %pmin.coerce1, %pmax.coerce1
>>>     >   br i1 %1, label %.lr.ph35.i, label
%_ZN15SamplingClosureD1Ev.exit
>>>     >
>>>     > .lr.ph35.i:                                       ; preds
= %0
>>>     >   %2 = lshr i64 %pmin.coerce0, 32
>>>     >   %3 = trunc i64 %2 to i32
>>>     >   %4 = lshr i64 %pmax.coerce0, 32
>>>     >   %5 = trunc i64 %4 to i32
>>>     >   %6 = icmp slt i32 %3, %5
>>>     >   %7 = trunc i64 %pmin.coerce0 to i32
>>>     >   %8 = trunc i64 %pmax.coerce0 to i32
>>>     >   %9 = icmp slt i32 %7, %8
>>>     >   br i1 %6, label %.lr.ph23.us.i.preheader, label
>>>     > %_ZN15SamplingClosureD1Ev.exit
>>>     >
>>>     > .lr.ph23.us.i.preheader:                          ; preds
>>> %.lr.ph35.i
>>>     >   %10 = trunc i64 %pmax.coerce0 to i32
>>>     >   %11 = trunc i64 %pmin.coerce0 to i32
>>>     >   %12 = sub i32 %10, %11
>>>     >   br label %.lr.ph23.us.i
>>>     >
>>>     > ._crit_edge24.us-lcssa.us57.i:                    ; preds
= %14,
>>>     > %.lr.ph23.us.i
>>>     >   %13 = add nsw i32 %z.028.us.i, 1
>>>     >   %exitcond66.i = icmp eq i32 %13, %pmax.coerce1
>>>     >   br i1 %exitcond66.i, label
%_ZN15SamplingClosureD1Ev.exit, label
>>>     > %.lr.ph23.us.i
>>>     >
>>>     > .lr.ph23.us.i:                                    ; preds
>>>     > %._crit_edge24.us-lcssa.us57.i, %.lr.ph23.us.i.preheader
>>>     >   %z.028.us.i = phi i32 [ %13,
%._crit_edge24.us-lcssa.us57.i ], [
>>>     > %pmin.coerce1, %.lr.ph23.us.i.preheader ]
>>>     >   br i1 %9, label %.lr.ph.us.us.i, label
>>> %._crit_edge24.us-lcssa.us57.i
>>>     >
>>>     > ; <label>:14                                      ;
preds >>> %.noexc,
>>>     > %middle.block
>>>     >   %15 = add nsw i32 %y.018.us.us.i, 1
>>>     >   %exitcond65.i = icmp eq i32 %15, %5
>>>     >   br i1 %exitcond65.i, label
%._crit_edge24.us-lcssa.us57.i, label
>>>     > %.lr.ph.us.us.i
>>>     >
>>>     > .lr.ph.us.us.i:                                   ; preds
= %14,
>>>     > %.lr.ph23.us.i
>>>     >   %y.018.us.us.i = phi i32 [ %15, %14 ], [ %3,
%.lr.ph23.us.i ]
>>>     >   %end.idx = add i32 %12, %7
>>>     >   %n.vec = and i32 %12, -128
>>>     >   %end.idx.rnd.down = add i32 %n.vec, %7
>>>     >   %cmp.zero = icmp eq i32 %n.vec, 0
>>>     >   br i1 %cmp.zero, label %middle.block, label %vector.body
>>>     >
>>>     > vector.body:                                      ; preds
>>> %vector.body,
>>>     > %.lr.ph.us.us.i
>>>     >   %index = phi i32 [ %index.next, %vector.body ], [ %7,
>>> %.lr.ph.us.us.i ]
>>>     >   %index.next = add i32 %index, 128
>>>     >   %16 = icmp eq i32 %index.next, %end.idx.rnd.down
>>>     >   br i1 %16, label %middle.block, label %vector.body,
!llvm.loop !1
>>>     >
>>>     > middle.block:                                     ; preds
>>> %vector.body,
>>>     > %.lr.ph.us.us.i
>>>     >   %resume.val = phi i32 [ %7, %.lr.ph.us.us.i ], [
>>> %end.idx.rnd.down,
>>>     > %vector.body ]
>>>     >   %cmp.n = icmp eq i32 %end.idx, %resume.val
>>>     >   br i1 %cmp.n, label %14, label %.noexc
>>>     >
>>>     > .noexc:                                           ; preds
>>> %.noexc,
>>>     > %middle.block
>>>     >   %x.012.us.us.i = phi i32 [ %17, %.noexc ], [
%resume.val,
>>> %middle.block ]
>>>     >   %17 = add nsw i32 %x.012.us.us.i, 1
>>>     >   %exitcond64.i = icmp eq i32 %17, %8
>>>     >   br i1 %exitcond64.i, label %14, label %.noexc,
!llvm.loop !4
>>>     >
>>>     > _ZN15SamplingClosureD1Ev.exit:                    ; preds
>>>     > %._crit_edge24.us-lcssa.us57.i, %.lr.ph35.i, %0
>>>     >   ret { <2 x float>, float } zeroinitializer
>>>     > }
>>>     >
>>>     > attributes #0 = { ssp uwtable
"less-precise-fpmad"="false"
>>>     > "no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"
>>>     > "no-infs-fp-math"="false"
"no-nans-fp-math"="false"
>>>     > "stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
>>>     > "use-soft-float"="false" }
>>>     >
>>>     > !llvm.ident = !{!0}
>>>     >
>>>     > !0 = metadata !{metadata !"Apple LLVM version 6.0
(clang-600.0.57)
>>> (based
>>>     > on LLVM 3.5svn)"}
>>>     > !1 = metadata !{metadata !1, metadata !2, metadata !3}
>>>     > !2 = metadata !{metadata
!"llvm.vectorizer.width", i32 1}
>>>     > !3 = metadata !{metadata
!"llvm.vectorizer.unroll", i32 1}
>>>     > !4 = metadata !{metadata !4, metadata !2, metadata !3}
>>>
>>>
>>> Thanks for you help,
>>> Benoit
>>>
>>> Benoit Belley
>>> Sr Principal Developer
>>> M&E-Product Development Group
>>> MAIN +1 514 393 1616
>>> DIRECT +1 438 448 6304
>>> FAX +1 514 393 0110
>>> Twitter <http://twitter.com/autodesk>
>>> Facebook <https://www.facebook.com/Autodesk>
>>> Autodesk, Inc.
>>> 10 Duke Street
>>> Montreal, Quebec, Canada H3C 2L7
>>> www.autodesk.com <http://www.autodesk.com/>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150325/0bdfe967/attachment.html>

Daniel Berlin

2015-Mar-25 20:40 UTC

head link

[LLVMdev] Optimization puzzle...

No, it's some loop specific pass working backwards.
It's probably still worth fixing ADCE though, since the fix is simple.

I've updated the change to work with invokes (after talking with nick, we
don't consider the control flow change to be a side effect).
I'll test and submit it

On Wed, Mar 25, 2015 at 1:29 PM, Benoit Belley <Benoit.Belley at
autodesk.com>
wrote:
>  I just find out that LLVM 3.7/Trunk seems to be already capable of
> eliminating the useless nested loops on its own. Here’s what I am getting:
>
>   ; Function Attrs: nounwind readnone ssp uwtable
> define { <2 x float>, float } @_Z18sampleNullOperator5PointS_(i64
> %pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32 %pmax.coerce1) #0
{
>   ret { <2 x float>, float } zeroinitializer
> }
>
>  Would 3.7 already contain a version of your patch ?
>
>  Judging from dumping the pass outputs, it seems to be the following
> passes which are eliminating the dead loops (see the attached file):
>
>     7786:*** IR Dump Before Recognize loop idioms ***
>    7789:*** IR Dump After Recognize loop idioms ***
>    7792:*** IR Dump Before Delete dead loops ***
>
>  The ADCE and SCFG passes do the remaining clean-up afterward.
>
>  Benoit
>
>
>  *Benoit Belley*
>
> Sr Principal Developer
>
> M&E-Product Development Group
>
>
>
> *MAIN* +1 514 393 1616
>
> *DIRECT* +1 438 448 6304
>
> *FAX* +1 514 393 0110
>
>
>
> Twitter <http://twitter.com/autodesk>
>
> Facebook <https://www.facebook.com/Autodesk>
>
>
>
> *Autodesk, Inc.*
>
> 10 Duke Street
>
> Montreal, Quebec, Canada H3C 2L7
>
> www.autodesk.com
>
>
>
> [image: Description: Email_Signature_Logobar]
>
>
>
>   From: Daniel Berlin <dberlin at dberlin.org>
> Date: mercredi 25 mars 2015 15:31
> To: Benoit Belley <benoit.belley at autodesk.com>
> Cc: "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu>
> Subject: Re: [LLVMdev] Optimization puzzle...
>
>   Note: this version is likely broken in other ways, because it replaces
> the unconditional terminators with unreachable, which may leave the
> function with no returns at all if it does it to the entry block.
>
>  I.E:
>
>  it will produce
>
>  ; Function Attrs: readnone
> declare i32 @strlen(i8*) #0
>
>  define i32 @test() {
>   unreachable
>
>  Cont:                                             ; No predecessors!
>   ret i32 0
>
>  Other:                                            ; No predecessors!
>   %exn = landingpad { i8*, i32 } personality i32 (...)*
> @__gxx_personality_v0
>           cleanup
>   ret i32 1
> }
>
>  So i don't know which is better to go with.
>
>
>
> On Wed, Mar 25, 2015 at 12:26 PM, Daniel Berlin <dberlin at
dberlin.org>
> wrote:
>
>> Here's a version that doesn't try to do block deletion on
it's own. If
>> you use -adce then -simplifycfg, you get what you want.
>> It passes all tests except one, which is that we delete an invoke of a
>> pure function, IE Transforms/ADCE/dce_pure_invoke.ll -
>> I'm not sure why that's bad.
>>
>>  The reason we delete it is because it returns false to
>> I.mayHaveSideEffects(), and in particular, mayThrow returns false as
well!
>>
>>  That seems really wrong if we expect it to stay around, so i'm
actually
>> okay with failing this case unless someone tells me why we
shouldn't.
>>
>>  (and why the right thing isn't to make it return true to mayThrow)
>>
>>
>>
>> On Wed, Mar 25, 2015 at 12:02 PM, Daniel Berlin <dberlin at
dberlin.org>
>> wrote:
>>
>>> So, the first  gets deleted with -dce because it's trivially
dead.
>>>
>>>  The second, nothing gets deleted because no DCE we have does DCE
using
>>> control dependence, so it marks Terminators as live.  This makes
everything
>>> executable in your case.
>>>
>>>  SCCP and most other passes don't delete it because they are
forward
>>> passes, and start by assuming the entryblock is executable and go
from
>>> there.  If you make this assumption (which is not fixable for
forward
>>> passes), everything here is executable.
>>>
>>>  DCE is normally the backwards pass that fixes this.
>>>
>>>  You can make ADCE fix this particular case, even without control
>>> dependence, with a small change:
>>>
>>>  Right now it marks all terminators as alive.
>>>
>>>  It should mark all return's as alive.
>>> Other terminators, It should only mark terminators as live once it
>>> reaches a block with an alive operation in it (IE the first time it
gets
>>> into that block).
>>>
>>>
>>>  That would make it mark the return as useful, then discover no
other
>>> block has useful operations, and delete them all.
>>>
>>>  I've attached a mostly untested patch to do this.
>>> It does what you want (It removes everything but the return)
>>> It should work in all cases.
>>> I ripped the erasure code from SCCP
>>> It will crash because it doesn't know what to do when it needs
to
>>> redirect the entry block
>>> SCCP doesn't have this problem because it is only redirecting,
not
>>> erasing, so the entry block always has a terminator to redirect.
>>> (before, it crashed because it wanted to remove the entry block)
>>>
>>>
>>>  So this needs to be fixed.
>>> I marked it with a TODO.
>>> If you want to do this, maybe refactor out the code for deleting
dead
>>> blocks from SCCP/DCE, and then submit the patch, it would be great.
>>>
>>>
>>> On Wed, Mar 25, 2015 at 10:47 AM, Benoit Belley <
>>> Benoit.Belley at autodesk.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>>
>>>> I am wondering what¹s stopping the LLVM optimizer (opt -O3)
from
>>>> eliminating the apparently useless ? icmp sgt ? instruction in
the
>>>> following piece of LLVM IR.
>>>>
>>>>     > ; ModuleID = 'lambda-opt.bc'
>>>>     > target datalayout =
"e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>>>     > target triple = "x86_64-apple-macosx10.10.0"
>>>>     >
>>>>     > ; Function Attrs: nounwind readnone ssp uwtable
>>>>     > define { <2 x float>, float }
@_Z18sampleNullOperator5PointS_(i64
>>>> %pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32
%pmax.coerce1)
>>>> #0
>>>> {
>>>>     > _ZN15SamplingClosureD1Ev.exit:
>>>>     > %0 = icmp sgt i32 %pmin.coerce1, %pmax.coerce1
>>>>     > ret { <2 x float>, float } zeroinitializer
>>>>     > }
>>>>     >
>>>>     > attributes #0 = { nounwind readnone ssp uwtable
>>>> "less-precise-fpmad"="false"
"no-frame-pointer-elim"="true"
>>>> "no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="false"
>>>> "no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
>>>> "unsafe-fp-math"="false"
"use-soft-float"="false" }
>>>>     >
>>>>     > !llvm.ident = !{!0}
>>>>     >
>>>>     > !0 = metadata !{metadata !"Apple LLVM version 6.0
(clang-600.0.57)
>>>> (based on LLVM 3.5svn)"}
>>>>
>>>>
>>>> The X86 code generation passes seems to eliminate it for the
simple case
>>>> shown above. The generated assembly code is actually:
>>>>
>>>>
>>>>     > __Z18sampleNullOperator5PointS_: ##
>>>> @_Z18sampleNullOperator5PointS_
>>>>     > .cfi_startproc
>>>>     > ## BB#0: ## %_ZN15SamplingClosureD1Ev.exit
>>>>     > push      rbp
>>>>     > Ltmp0:
>>>>     > .cfi_def_cfa_offset 16
>>>>     > Ltmp1:
>>>>     > .cfi_offset rbp, -16
>>>>     > mov       rbp, rsp
>>>>     > Ltmp2:
>>>>     > .cfi_def_cfa_register rbp
>>>>     > vxorps    xmm0, xmm0, xmm0
>>>>     > vxorps    xmm1, xmm1, xmm1
>>>>     > pop       rbp
>>>>     > ret
>>>>
>>>>
>>>> I am wondering because I think that it might explain why the
LLVM IR
>>>> code
>>>> shown below does not get simplified to a single "ret {
<2 x float>,
>>>> float
>>>> } zeroinitializer" instruction. It seems to me that the
nested loops
>>>> have
>>>> no side-effects and are guaranteed to terminate in a finite
amount of
>>>> time. Unless, the nested "icmp sgt" instructions
cannot be eliminated.
>>>>
>>>> Of course, in that case, the X86 code generation passes can no
longer do
>>>> their magic...
>>>>
>>>>
>>>>     > ; ModuleID = 'lambda-opt.bc'
>>>>     > target datalayout =
"e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>>>>     > target triple = "x86_64-apple-macosx10.10.0"
>>>>     >
>>>>     > ; Function Attrs: ssp uwtable
>>>>     > define { <2 x float>, float }
@_Z18sampleNullOperator5PointS_(i64
>>>>     > %pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0,
i32
>>>> %pmax.coerce1) #0
>>>>     > {
>>>>     >   %1 = icmp slt i32 %pmin.coerce1, %pmax.coerce1
>>>>     >   br i1 %1, label %.lr.ph35.i, label
>>>> %_ZN15SamplingClosureD1Ev.exit
>>>>     >
>>>>     > .lr.ph35.i:                                       ;
preds = %0
>>>>     >   %2 = lshr i64 %pmin.coerce0, 32
>>>>     >   %3 = trunc i64 %2 to i32
>>>>     >   %4 = lshr i64 %pmax.coerce0, 32
>>>>     >   %5 = trunc i64 %4 to i32
>>>>     >   %6 = icmp slt i32 %3, %5
>>>>     >   %7 = trunc i64 %pmin.coerce0 to i32
>>>>     >   %8 = trunc i64 %pmax.coerce0 to i32
>>>>     >   %9 = icmp slt i32 %7, %8
>>>>     >   br i1 %6, label %.lr.ph23.us.i.preheader, label
>>>>     > %_ZN15SamplingClosureD1Ev.exit
>>>>     >
>>>>     > .lr.ph23.us.i.preheader:                          ;
preds >>>> %.lr.ph35.i
>>>>     >   %10 = trunc i64 %pmax.coerce0 to i32
>>>>     >   %11 = trunc i64 %pmin.coerce0 to i32
>>>>     >   %12 = sub i32 %10, %11
>>>>     >   br label %.lr.ph23.us.i
>>>>     >
>>>>     > ._crit_edge24.us-lcssa.us57.i:                    ;
preds = %14,
>>>>     > %.lr.ph23.us.i
>>>>     >   %13 = add nsw i32 %z.028.us.i, 1
>>>>     >   %exitcond66.i = icmp eq i32 %13, %pmax.coerce1
>>>>     >   br i1 %exitcond66.i, label
%_ZN15SamplingClosureD1Ev.exit, label
>>>>     > %.lr.ph23.us.i
>>>>     >
>>>>     > .lr.ph23.us.i:                                    ;
preds >>>>     > %._crit_edge24.us-lcssa.us57.i,
%.lr.ph23.us.i.preheader
>>>>     >   %z.028.us.i = phi i32 [ %13,
%._crit_edge24.us-lcssa.us57.i ], [
>>>>     > %pmin.coerce1, %.lr.ph23.us.i.preheader ]
>>>>     >   br i1 %9, label %.lr.ph.us.us.i, label
>>>> %._crit_edge24.us-lcssa.us57.i
>>>>     >
>>>>     > ; <label>:14                                    
; preds >>>> %.noexc,
>>>>     > %middle.block
>>>>     >   %15 = add nsw i32 %y.018.us.us.i, 1
>>>>     >   %exitcond65.i = icmp eq i32 %15, %5
>>>>     >   br i1 %exitcond65.i, label
%._crit_edge24.us-lcssa.us57.i, label
>>>>     > %.lr.ph.us.us.i
>>>>     >
>>>>     > .lr.ph.us.us.i:                                   ;
preds = %14,
>>>>     > %.lr.ph23.us.i
>>>>     >   %y.018.us.us.i = phi i32 [ %15, %14 ], [ %3,
%.lr.ph23.us.i ]
>>>>     >   %end.idx = add i32 %12, %7
>>>>     >   %n.vec = and i32 %12, -128
>>>>     >   %end.idx.rnd.down = add i32 %n.vec, %7
>>>>     >   %cmp.zero = icmp eq i32 %n.vec, 0
>>>>     >   br i1 %cmp.zero, label %middle.block, label
%vector.body
>>>>     >
>>>>     > vector.body:                                      ;
preds >>>> %vector.body,
>>>>     > %.lr.ph.us.us.i
>>>>     >   %index = phi i32 [ %index.next, %vector.body ], [
%7,
>>>> %.lr.ph.us.us.i ]
>>>>     >   %index.next = add i32 %index, 128
>>>>     >   %16 = icmp eq i32 %index.next, %end.idx.rnd.down
>>>>     >   br i1 %16, label %middle.block, label %vector.body,
!llvm.loop
>>>> !1
>>>>     >
>>>>     > middle.block:                                     ;
preds >>>> %vector.body,
>>>>     > %.lr.ph.us.us.i
>>>>     >   %resume.val = phi i32 [ %7, %.lr.ph.us.us.i ], [
>>>> %end.idx.rnd.down,
>>>>     > %vector.body ]
>>>>     >   %cmp.n = icmp eq i32 %end.idx, %resume.val
>>>>     >   br i1 %cmp.n, label %14, label %.noexc
>>>>     >
>>>>     > .noexc:                                           ;
preds >>>> %.noexc,
>>>>     > %middle.block
>>>>     >   %x.012.us.us.i = phi i32 [ %17, %.noexc ], [
%resume.val,
>>>> %middle.block ]
>>>>     >   %17 = add nsw i32 %x.012.us.us.i, 1
>>>>     >   %exitcond64.i = icmp eq i32 %17, %8
>>>>     >   br i1 %exitcond64.i, label %14, label %.noexc,
!llvm.loop !4
>>>>     >
>>>>     > _ZN15SamplingClosureD1Ev.exit:                    ;
preds >>>>     > %._crit_edge24.us-lcssa.us57.i, %.lr.ph35.i, %0
>>>>     >   ret { <2 x float>, float } zeroinitializer
>>>>     > }
>>>>     >
>>>>     > attributes #0 = { ssp uwtable
"less-precise-fpmad"="false"
>>>>     > "no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"
>>>>     > "no-infs-fp-math"="false"
"no-nans-fp-math"="false"
>>>>     > "stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
>>>>     > "use-soft-float"="false" }
>>>>     >
>>>>     > !llvm.ident = !{!0}
>>>>     >
>>>>     > !0 = metadata !{metadata !"Apple LLVM version 6.0
(clang-600.0.57)
>>>> (based
>>>>     > on LLVM 3.5svn)"}
>>>>     > !1 = metadata !{metadata !1, metadata !2, metadata !3}
>>>>     > !2 = metadata !{metadata
!"llvm.vectorizer.width", i32 1}
>>>>     > !3 = metadata !{metadata
!"llvm.vectorizer.unroll", i32 1}
>>>>     > !4 = metadata !{metadata !4, metadata !2, metadata !3}
>>>>
>>>>
>>>> Thanks for you help,
>>>> Benoit
>>>>
>>>> Benoit Belley
>>>> Sr Principal Developer
>>>> M&E-Product Development Group
>>>> MAIN +1 514 393 1616
>>>> DIRECT +1 438 448 6304
>>>> FAX +1 514 393 0110
>>>> Twitter <http://twitter.com/autodesk>
>>>> Facebook <https://www.facebook.com/Autodesk>
>>>> Autodesk, Inc.
>>>> 10 Duke Street
>>>> Montreal, Quebec, Canada H3C 2L7
>>>> www.autodesk.com <http://www.autodesk.com/>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>
>>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150325/759660cf/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 350F40DB-4457-4455-A632-0DF05738AF15[8].png
Type: image/png
Size: 4316 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150325/759660cf/attachment.png>

Benoit Belley

2015-Mar-25 20:53 UTC

head link

[LLVMdev] Optimization puzzle...

Excellent!
Benoit

From: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
Date: mercredi 25 mars 2015 16:40
To: Benoit Belley <benoit.belley at autodesk.com<mailto:benoit.belley at
autodesk.com>>
Cc: "llvmdev at cs.uiuc.edu<mailto:llvmdev at cs.uiuc.edu>"
<llvmdev at cs.uiuc.edu<mailto:llvmdev at cs.uiuc.edu>>
Subject: Re: [LLVMdev] Optimization puzzle...

No, it's some loop specific pass working backwards.
It's probably still worth fixing ADCE though, since the fix is simple.

I've updated the change to work with invokes (after talking with nick, we
don't consider the control flow change to be a side effect).
I'll test and submit it

On Wed, Mar 25, 2015 at 1:29 PM, Benoit Belley <Benoit.Belley at
autodesk.com<mailto:Benoit.Belley at autodesk.com>> wrote:
I just find out that LLVM 3.7/Trunk seems to be already capable of eliminating
the useless nested loops on its own. Here’s what I am getting:

; Function Attrs: nounwind readnone ssp uwtable
define { <2 x float>, float } @_Z18sampleNullOperator5PointS_(i64
%pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32 %pmax.coerce1) #0 {
  ret { <2 x float>, float } zeroinitializer
}

Would 3.7 already contain a version of your patch ?

Judging from dumping the pass outputs, it seems to be the following passes which
are eliminating the dead loops (see the attached file):

   7786:*** IR Dump Before Recognize loop idioms ***
   7789:*** IR Dump After Recognize loop idioms ***
   7792:*** IR Dump Before Delete dead loops ***

The ADCE and SCFG passes do the remaining clean-up afterward.

Benoit

Benoit Belley
Sr Principal Developer
M&E-Product Development Group

MAIN +1 514 393 1616<tel:514%20393%201616>
DIRECT +1 438 448 6304<tel:438%20448%206304>
FAX +1 514 393 0110<tel:514%20393%200110>

Twitter<http://twitter.com/autodesk>
Facebook<https://www.facebook.com/Autodesk>

Autodesk, Inc.
10 Duke Street
Montreal, Quebec, Canada H3C 2L7
www.autodesk.com<http://www.autodesk.com/>

[Description: Email_Signature_Logobar]

From: Daniel Berlin <dberlin at dberlin.org<mailto:dberlin at
dberlin.org>>
Date: mercredi 25 mars 2015 15:31
To: Benoit Belley <benoit.belley at autodesk.com<mailto:benoit.belley at
autodesk.com>>
Cc: "llvmdev at cs.uiuc.edu<mailto:llvmdev at cs.uiuc.edu>"
<llvmdev at cs.uiuc.edu<mailto:llvmdev at cs.uiuc.edu>>
Subject: Re: [LLVMdev] Optimization puzzle...

Note: this version is likely broken in other ways, because it replaces the
unconditional terminators with unreachable, which may leave the function with no
returns at all if it does it to the entry block.

I.E:

it will produce

; Function Attrs: readnone
declare i32 @strlen(i8*) #0

define i32 @test() {
  unreachable

Cont:                                             ; No predecessors!
  ret i32 0

Other:                                            ; No predecessors!
  %exn = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
          cleanup
  ret i32 1
}

So i don't know which is better to go with.

On Wed, Mar 25, 2015 at 12:26 PM, Daniel Berlin <dberlin at
dberlin.org<mailto:dberlin at dberlin.org>> wrote:
Here's a version that doesn't try to do block deletion on it's own.
If you use -adce then -simplifycfg, you get what you want.
It passes all tests except one, which is that we delete an invoke of a pure
function, IE Transforms/ADCE/dce_pure_invoke.ll -
I'm not sure why that's bad.

The reason we delete it is because it returns false to I.mayHaveSideEffects(),
and in particular, mayThrow returns false as well!

That seems really wrong if we expect it to stay around, so i'm actually okay
with failing this case unless someone tells me why we shouldn't.

(and why the right thing isn't to make it return true to mayThrow)

On Wed, Mar 25, 2015 at 12:02 PM, Daniel Berlin <dberlin at
dberlin.org<mailto:dberlin at dberlin.org>> wrote:
So, the first  gets deleted with -dce because it's trivially dead.

The second, nothing gets deleted because no DCE we have does DCE using control
dependence, so it marks Terminators as live.  This makes everything executable
in your case.

SCCP and most other passes don't delete it because they are forward passes,
and start by assuming the entryblock is executable and go from there.  If you
make this assumption (which is not fixable for forward passes), everything here
is executable.

DCE is normally the backwards pass that fixes this.

You can make ADCE fix this particular case, even without control dependence,
with a small change:

Right now it marks all terminators as alive.

It should mark all return's as alive.
Other terminators, It should only mark terminators as live once it reaches a
block with an alive operation in it (IE the first time it gets into that block).

That would make it mark the return as useful, then discover no other block has
useful operations, and delete them all.

I've attached a mostly untested patch to do this.
It does what you want (It removes everything but the return)
It should work in all cases.
I ripped the erasure code from SCCP
It will crash because it doesn't know what to do when it needs to redirect
the entry block
SCCP doesn't have this problem because it is only redirecting, not erasing,
so the entry block always has a terminator to redirect.
(before, it crashed because it wanted to remove the entry block)

So this needs to be fixed.
I marked it with a TODO.
If you want to do this, maybe refactor out the code for deleting dead blocks
from SCCP/DCE, and then submit the patch, it would be great.

On Wed, Mar 25, 2015 at 10:47 AM, Benoit Belley <Benoit.Belley at
autodesk.com<mailto:Benoit.Belley at autodesk.com>> wrote:
Hi everyone,

I am wondering what¹s stopping the LLVM optimizer (opt -O3) from
eliminating the apparently useless ? icmp sgt ? instruction in the
following piece of LLVM IR.

    > ; ModuleID = 'lambda-opt.bc'
    > target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
    > target triple = "x86_64-apple-macosx10.10.0"
    >
    > ; Function Attrs: nounwind readnone ssp uwtable
    > define { <2 x float>, float } @_Z18sampleNullOperator5PointS_(i64
%pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32 %pmax.coerce1) #0
{
    > _ZN15SamplingClosureD1Ev.exit:
    > %0 = icmp sgt i32 %pmin.coerce1, %pmax.coerce1
    > ret { <2 x float>, float } zeroinitializer
    > }
    >
    > attributes #0 = { nounwind readnone ssp uwtable
"less-precise-fpmad"="false"
"no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="false"
"no-nans-fp-math"="false"
"stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
"use-soft-float"="false" }
    >
    > !llvm.ident = !{!0}
    >
    > !0 = metadata !{metadata !"Apple LLVM version 6.0 (clang-600.0.57)
(based on LLVM 3.5svn)"}

The X86 code generation passes seems to eliminate it for the simple case
shown above. The generated assembly code is actually:

    > __Z18sampleNullOperator5PointS_: ## @_Z18sampleNullOperator5PointS_
    > .cfi_startproc
    > ## BB#0: ## %_ZN15SamplingClosureD1Ev.exit
    > push      rbp
    > Ltmp0:
    > .cfi_def_cfa_offset 16
    > Ltmp1:
    > .cfi_offset rbp, -16
    > mov       rbp, rsp
    > Ltmp2:
    > .cfi_def_cfa_register rbp
    > vxorps    xmm0, xmm0, xmm0
    > vxorps    xmm1, xmm1, xmm1
    > pop       rbp
    > ret

I am wondering because I think that it might explain why the LLVM IR code
shown below does not get simplified to a single "ret { <2 x float>,
float
} zeroinitializer" instruction. It seems to me that the nested loops have
no side-effects and are guaranteed to terminate in a finite amount of
time. Unless, the nested "icmp sgt" instructions cannot be eliminated.

Of course, in that case, the X86 code generation passes can no longer do
their magic...

    > ; ModuleID = 'lambda-opt.bc'
    > target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
    > target triple = "x86_64-apple-macosx10.10.0"
    >
    > ; Function Attrs: ssp uwtable
    > define { <2 x float>, float } @_Z18sampleNullOperator5PointS_(i64
    > %pmin.coerce0, i32 %pmin.coerce1, i64 %pmax.coerce0, i32
%pmax.coerce1) #0
    > {
    >   %1 = icmp slt i32 %pmin.coerce1, %pmax.coerce1
    >   br i1 %1, label %.lr.ph35.i, label %_ZN15SamplingClosureD1Ev.exit
    >
    > .lr.ph35.i:                                       ; preds = %0
    >   %2 = lshr i64 %pmin.coerce0, 32
    >   %3 = trunc i64 %2 to i32
    >   %4 = lshr i64 %pmax.coerce0, 32
    >   %5 = trunc i64 %4 to i32
    >   %6 = icmp slt i32 %3, %5
    >   %7 = trunc i64 %pmin.coerce0 to i32
    >   %8 = trunc i64 %pmax.coerce0 to i32
    >   %9 = icmp slt i32 %7, %8
    >   br i1 %6, label %.lr.ph23.us.i.preheader, label
    > %_ZN15SamplingClosureD1Ev.exit
    >
    > .lr.ph23.us.i.preheader:                          ; preds %.lr.ph35.i
    >   %10 = trunc i64 %pmax.coerce0 to i32
    >   %11 = trunc i64 %pmin.coerce0 to i32
    >   %12 = sub i32 %10, %11
    >   br label %.lr.ph23.us.i
    >
    > ._crit_edge24.us-lcssa.us57.i:                    ; preds = %14,
    > %.lr.ph23.us.i
    >   %13 = add nsw i32 %z.028.us.i, 1
    >   %exitcond66.i = icmp eq i32 %13, %pmax.coerce1
    >   br i1 %exitcond66.i, label %_ZN15SamplingClosureD1Ev.exit, label
    > %.lr.ph23.us.i
    >
    > .lr.ph23.us.i:                                    ; preds     >
%._crit_edge24.us-lcssa.us57.i, %.lr.ph23.us.i.preheader
    >   %z.028.us.i = phi i32 [ %13, %._crit_edge24.us-lcssa.us57.i ], [
    > %pmin.coerce1, %.lr.ph23.us.i.preheader ]
    >   br i1 %9, label %.lr.ph.us.us.i, label
%._crit_edge24.us-lcssa.us57.i
    >
    > ; <label>:14                                      ; preds =
%.noexc,
    > %middle.block
    >   %15 = add nsw i32 %y.018.us.us.i, 1
    >   %exitcond65.i = icmp eq i32 %15, %5
    >   br i1 %exitcond65.i, label %._crit_edge24.us-lcssa.us57.i, label
    > %.lr.ph.us.us.i
    >
    > .lr.ph.us.us.i:                                   ; preds = %14,
    > %.lr.ph23.us.i
    >   %y.018.us.us.i = phi i32 [ %15, %14 ], [ %3, %.lr.ph23.us.i ]
    >   %end.idx = add i32 %12, %7
    >   %n.vec = and i32 %12, -128
    >   %end.idx.rnd.down = add i32 %n.vec, %7
    >   %cmp.zero = icmp eq i32 %n.vec, 0
    >   br i1 %cmp.zero, label %middle.block, label %vector.body
    >
    > vector.body:                                      ; preds %vector.body,
    > %.lr.ph.us.us.i
    >   %index = phi i32 [ %index.next, %vector.body ], [ %7,
%.lr.ph.us.us.i ]
    >   %index.next = add i32 %index, 128
    >   %16 = icmp eq i32 %index.next, %end.idx.rnd.down
    >   br i1 %16, label %middle.block, label %vector.body, !llvm.loop !1
    >
    > middle.block:                                     ; preds %vector.body,
    > %.lr.ph.us.us.i
    >   %resume.val = phi i32 [ %7, %.lr.ph.us.us.i ], [ %end.idx.rnd.down,
    > %vector.body ]
    >   %cmp.n = icmp eq i32 %end.idx, %resume.val
    >   br i1 %cmp.n, label %14, label %.noexc
    >
    > .noexc:                                           ; preds = %.noexc,
    > %middle.block
    >   %x.012.us.us.i = phi i32 [ %17, %.noexc ], [ %resume.val,
%middle.block ]
    >   %17 = add nsw i32 %x.012.us.us.i, 1
    >   %exitcond64.i = icmp eq i32 %17, %8
    >   br i1 %exitcond64.i, label %14, label %.noexc, !llvm.loop !4
    >
    > _ZN15SamplingClosureD1Ev.exit:                    ; preds     >
%._crit_edge24.us-lcssa.us57.i, %.lr.ph35.i, %0
    >   ret { <2 x float>, float } zeroinitializer
    > }
    >
    > attributes #0 = { ssp uwtable
"less-precise-fpmad"="false"
    > "no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"
    > "no-infs-fp-math"="false"
"no-nans-fp-math"="false"
    > "stack-protector-buffer-size"="8"
"unsafe-fp-math"="false"
    > "use-soft-float"="false" }
    >
    > !llvm.ident = !{!0}
    >
    > !0 = metadata !{metadata !"Apple LLVM version 6.0 (clang-600.0.57)
(based
    > on LLVM 3.5svn)"}
    > !1 = metadata !{metadata !1, metadata !2, metadata !3}
    > !2 = metadata !{metadata !"llvm.vectorizer.width", i32 1}
    > !3 = metadata !{metadata !"llvm.vectorizer.unroll", i32 1}
    > !4 = metadata !{metadata !4, metadata !2, metadata !3}

Thanks for you help,
Benoit

Benoit Belley
Sr Principal Developer
M&E-Product Development Group
MAIN +1 514 393 1616<tel:%2B1%20514%20393%201616>
DIRECT +1 438 448 6304<tel:%2B1%20438%20448%206304>
FAX +1 514 393 0110<tel:%2B1%20514%20393%200110>
Twitter <http://twitter.com/autodesk>
Facebook <https://www.facebook.com/Autodesk>
Autodesk, Inc.
10 Duke Street
Montreal, Quebec, Canada H3C 2L7
www.autodesk.com<http://www.autodesk.com> <http://www.autodesk.com/>

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu>        
http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150325/1b8aac13/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 350F40DB-4457-4455-A632-0DF05738AF15[8].png
Type: image/png
Size: 4316 bytes
Desc: 350F40DB-4457-4455-A632-0DF05738AF15[8].png
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150325/1b8aac13/attachment.png>

llvm dev - Mar 2015 - [LLVMdev] Optimization puzzle...

[LLVMdev] Optimization puzzle...

[LLVMdev] Optimization puzzle...

[LLVMdev] Optimization puzzle...

[LLVMdev] Optimization puzzle...