thr3ads.net - llvm dev - [LLVMdev] Not enough optimisations in the SelectionDAG phase? [Apr 2012]

Fan Dawei

2012-Apr-25 06:48 UTC

[LLVMdev] Not enough optimisations in the SelectionDAG phase?

For the following code fragment,

; <label>:27                                      ; preds = %27, %entry
  %28 = load volatile i32* inttoptr (i64 2149581832 to i32*), align 8
  %29 = icmp slt i32 %28, 0
  br i1 %29, label %27, label %loop.exit

loop.exit:                                  ; preds = %27

llc generates the following MIPS code:

$BB0_1:
  lui $3, 32800
  ori $3, $3, 1032
  lw  $3, 0($3)
  bltz  $3, $BB0_1
  nop
# BB#2:


The two operations lui and ori, which compute the memory address, are
loop invariant. They should be moved out of the loop. I thought it
might be a limitation of the MIPS backend, so I also tried the ARM
backend:

 .LBB1_1:
  ldr r2, .LCPI1_2
  ldr r2, [r2]
  cmp r2, #0
  blt .LBB1_1
@ BB#2:

The first ldr instruction loads the address from the constant pool. It
should also be outside the loop.

I'm not sure whether this is because the common SelectionDAG
optimisation phase does not optimise enough, or whether this kind of
optimisation should be implemented in the SelectionDAG instruction
lowering phase of each target.
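
As a sanity check on the MIPS listing, the lui/ori pair can be shown to rebuild the constant address from the IR. The sketch below is only illustrative arithmetic: the helper name mips_lui_ori is made up here, and it assumes 32-bit MIPS semantics (lui places its 16-bit immediate in the upper half of the register, ori ORs in a zero-extended 16-bit immediate).

```python
def mips_lui_ori(hi16: int, lo16: int) -> int:
    """Value left in a 32-bit register by `lui r, hi16` then `ori r, r, lo16`.

    Hypothetical helper for illustration; assumes MIPS32 semantics.
    """
    if not (0 <= hi16 <= 0xFFFF and 0 <= lo16 <= 0xFFFF):
        raise ValueError("immediates must fit in 16 bits")
    upper = (hi16 << 16) & 0xFFFFFFFF   # lui: immediate goes to bits 31..16
    return upper | lo16                 # ori: zero-extended immediate OR-ed in


# Immediates taken from the MIPS listing: lui $3, 32800 / ori $3, $3, 1032
address = mips_lui_ori(32800, 1032)
print(address, hex(address))  # 2149581832 0x80200408
```

The result matches the inttoptr constant in the IR (2149581832), and it depends on nothing computed inside the loop, which is why both instructions are candidates for hoisting.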

Duncan Sands

2012-Apr-29 14:04 UTC

I suggest you open a bug report about this.

Ciao, Duncan.

Evan Cheng

2012-Apr-29 18:19 UTC

On Apr 24, 2012, at 11:48 PM, Fan Dawei wrote:
> I'm not sure if this is because of the optimisations are not enough in
> the common SelectionDAG optimisation phase, or should this kind of
> optimisation be implemented by the SelectionDAG instruction lowering
> phase for each target?

SelectionDAG doesn't do LICM. Are you running the machine-licm pass?

Evan

Matt Johnson

2012-Apr-29 19:34 UTC

On 04/29/2012 01:19 PM, Evan Cheng wrote:
> On Apr 24, 2012, at 11:48 PM, Fan Dawei wrote:
>> I'm not sure if this is because of the optimisations are not enough
>> in the common SelectionDAG optimisation phase, or should this kind of
>> optimisation be implemented by the SelectionDAG instruction lowering
>> phase for each target?

I had a mailing list thread on this exact topic last month (see
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048076.html).

The underlying cause is that lui and ori are both 'cheap' instructions.
It used to be that cheap instructions would not get hoisted at all
during Machine LICM.

There was a patch a couple of weeks back (r154455) that is a bit more
aggressive and will hoist cheap instructions if they don't increase
register pressure, but it doesn't help us in this case because lui/ori
are a pair of dependent cheap instructions. There is a chicken-and-egg
problem where neither can be hoisted without the other, and MachineLICM
is not aggressive enough to recognize chains of dependent,
loop-invariant cheap instructions.

At the time, the advice was to implement a PseudoInstruction for lui+ori
and lower it in a C++ pass, as is done in ARM (see MOVi32imm in
ARMInstrInfo.td and ARMExpandPseudoInsts.cpp).

I did this for my target and it worked fine, so MIPS could do the same.
To me, that solution isn't too satisfying because you have to do this
for every multi-instruction TableGen pattern to get them hoisted out of
loops, but the philosophy seems to be to keep MachineLICM simple.
-Matt
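
The chicken-and-egg problem described above can be illustrated with a toy model. This is a minimal sketch and not LLVM's MachineLICM: the Instr class, the simple_licm function, and the LoadImm32 pseudo opcode are all hypothetical names, and the hoisting rule is the simplified older behaviour mentioned in the message (cheap instructions are never hoisted; everything is decided one instruction at a time). It also assumes the pseudo-instruction is not flagged as cheap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    dest: str                 # virtual register written ("" if none)
    srcs: tuple = ()          # virtual registers read
    cheap: bool = False       # stand-in for the 'cheap' property
    hoistable: bool = True    # False for the volatile load and the branch


def simple_licm(loop):
    """Toy per-instruction LICM (assumed rule, not LLVM's heuristic).

    An instruction is hoisted only if it is hoistable, is not cheap, and
    reads no register that is still defined inside the loop.
    """
    preheader, body = [], list(loop)
    changed = True
    while changed:
        changed = False
        defs_in_loop = {i.dest for i in body if i.dest}
        for instr in body:
            invariant = all(s not in defs_in_loop for s in instr.srcs)
            if instr.hoistable and invariant and not instr.cheap:
                body.remove(instr)
                preheader.append(instr)
                changed = True
                break  # restart the scan with updated loop definitions
    return preheader, body


# Loop as selected today: two dependent cheap instructions build the address.
two_instr_loop = [
    Instr("LUi", "%hi", (), cheap=True),
    Instr("ORi", "%addr", ("%hi",), cheap=True),
    Instr("LW", "%val", ("%addr",), hoistable=False),
    Instr("BLTZ", "", ("%val",), hoistable=False),
]

# Same loop with a single (hypothetical) pseudo that is expanded after LICM.
pseudo_loop = [
    Instr("LoadImm32", "%addr", ()),
    Instr("LW", "%val", ("%addr",), hoistable=False),
    Instr("BLTZ", "", ("%val",), hoistable=False),
]

for name, loop in [("lui+ori", two_instr_loop), ("pseudo", pseudo_loop)]:
    pre, body = simple_licm(loop)
    print(name, [i.opcode for i in pre], [i.opcode for i in body])
```

In this model LUi is invariant but skipped because it is cheap, and ORi is never invariant because its operand is still defined in the loop, so nothing moves. The single pseudo has no in-loop operands and is hoisted by the same simple rule, which mirrors the workaround described above.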