thr3ads.net - llvm dev - [LLVMdev] Missed optimization on array initialization [Feb 2012]

Carlo Alberto Ferraris

2012-Feb-25 11:17 UTC

[LLVMdev] Missed optimization on array initialization

Prompted by a SO post 
(http://stackoverflow.com/questions/9441882/compiler-instruction-reordering-optimizations-in-c-and-what-inhibits-them/9442363)
I checked and found that LLVM yields the same (seemingly) suboptimal 
code as MSVC.
Consider the following, simplified, C snippet:

extern void bar(int*);

void foo(int a)
{
     int ar[100] = {a};
     if (a)
         return;
     bar(ar);
}

Ideally, the array initialization should be sunk past the early return, but 
in Clang/LLVM 3.0 this doesn't happen:

; ModuleID = '/tmp/webcompile/_11079_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @_Z3fooi(i32 %a) uwtable {
  %ar = alloca [100 x i32], align 16
  %1 = bitcast [100 x i32]* %ar to i8*
  call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 400, i32 16, i1 false)
  %2 = getelementptr inbounds [100 x i32]* %ar, i64 0, i64 0
  store i32 %a, i32* %2, align 16, !tbaa !0
  %3 = icmp eq i32 %a, 0
  br i1 %3, label %4, label %5

; <label>:4                                       ; preds = %0
  call void @_Z3barPi(i32* %2)
  br label %5

; <label>:5                                       ; preds = %4, %0
  ret void
}

declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind

declare void @_Z3barPi(i32*)

!0 = metadata !{metadata !"int", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}

and this gets emitted as (for x64, but x86 is similar):

# BB#0:
	pushq	%rbx
.Ltmp3:
	.cfi_def_cfa_offset 16
	subq	$400, %rsp              # imm = 0x190
.Ltmp4:
	.cfi_def_cfa_offset 416
.Ltmp5:
	.cfi_offset %rbx, -16
	movl	%edi, %ebx
	leaq	(%rsp), %rdi
	xorl	%esi, %esi
	movl	$400, %edx              # imm = 0x190
	callq	memset
	movl	%ebx, (%rsp)
	testl	%ebx, %ebx
	jne	.LBB0_2
# BB#1:
	leaq	(%rsp), %rdi
	callq	_Z3barPi
.LBB0_2:
	addq	$400, %rsp              # imm = 0x190
	popq	%rbx
	ret

I don't have ToT at hand, so I don't know if this is still the case. Any
idea why this might be happening?


Duncan Sands

2012-Feb-25 12:17 UTC


Hi Carlo, for what it's worth, gcc-4.7 doesn't get this either.

Ciao, Duncan.


Chris Lattner

2012-Feb-25 18:32 UTC


On Feb 25, 2012, at 3:17 AM, Carlo Alberto Ferraris wrote:
> Prompted by a SO post
> (http://stackoverflow.com/questions/9441882/compiler-instruction-reordering-optimizations-in-c-and-what-inhibits-them/9442363)
> I checked and found that LLVM yields the same (seemingly) suboptimal code as
> MSVC.
> Consider the following, simplified, C snippet:
>
> extern void bar(int*);
>
> void foo(int a)
> {
>     int ar[100] = {a};
>     if (a)
>         return;
>     bar(ar);
> }
>
> Ideally, the array initialization should be sunk after the return, but in
> Clang/LLVM 3.0 this doesn't happen:

This is a straightforward form of code motion that we don't implement; it
would be built on partially dead store analysis.  Our dead store analysis in
general isn't very powerful, and cannot see across blocks.  It turns out
that a more powerful analysis is pretty expensive and doesn't often lead to
big performance wins.  That said, it is certainly an area that should be
improved.
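
For illustration only, here is a hand-written C sketch of the result that a partially-dead-store sinking transformation would produce for the snippet above. It is not output from any compiler. The names foo_original and foo_sunk are made up for the comparison, and bar() is a hypothetical stand-in for the external function that records what it was passed so the two versions can be checked against each other.

```c
#include <string.h>

/* Hypothetical stand-in for the external bar() in the original snippet:
   it counts calls and sums the array it receives. */
static int bar_calls;
static int bar_sum;

static void bar(int *p)
{
    bar_calls++;
    for (int i = 0; i < 100; i++)
        bar_sum += p[i];
}

/* The function as written in the thread: the whole array is
   initialized before the early return is tested. */
void foo_original(int a)
{
    int ar[100] = {a};
    if (a)
        return;
    bar(ar);
}

/* Hand-sunk version: the initialization is dead on the early-return
   path, so it is performed only on the path that reaches bar(). */
void foo_sunk(int a)
{
    int ar[100];
    if (a)
        return;
    memset(ar, 0, sizeof ar);   /* zero-fill, as in the emitted memset */
    ar[0] = a;                  /* a is known to be 0 on this path */
    bar(ar);
}
```

Both versions pass bar() the same array contents whenever it is called; the sunk version just skips the 400-byte clear when a is nonzero.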

I'll note that the original example from SO is more complex.  Instead of a
single store, it is a whole loop that initializes the array.  Handling this case
requires moving the entire loop, which requires fairly heroic compiler analysis.
The saving grace is that that case is equivalent to a memcpy, so we may be able
to handle *that* someday.
>   %ar = alloca [100 x i32], align 16
>   %1 = bitcast [100 x i32]* %ar to i8*
>   call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 400, i32 16, i1 false)
>   %2 = getelementptr inbounds [100 x i32]* %ar, i64 0, i64 0
>   store i32 %a, i32* %2, align 16, !tbaa !0

I'm surprised that we're not shortening the memset to skip setting the
dead element.  That *is* something that we should be able to handle.  Pete,
didn't you implement this a while ago?

-Chris


Peter Cooper

2012-Feb-25 18:51 UTC


On Feb 25, 2012, at 10:32 AM, Chris Lattner <clattner at apple.com> wrote:
>>   %ar = alloca [100 x i32], align 16
>>   %1 = bitcast [100 x i32]* %ar to i8*
>>   call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 400, i32 16, i1 false)
>>   %2 = getelementptr inbounds [100 x i32]* %ar, i64 0, i64 0
>>   store i32 %a, i32* %2, align 16, !tbaa !0
>
> I'm surprised that we're not shortening the memset to skip setting
> the dead element.  That *is* something that we should be able to handle.  Pete,
> didn't you implement this a while ago?

Yeah. I think my implementation only trims a memset when the later store
overwrites its end, but here the store overwrites its start. I'll take a look
at improving that. We will probably only want to shorten the start of the
memset when that doesn't leave it with a horribly unaligned start position,
but that's ok here.
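
As a rough sketch of the trimming decision described above (not LLVM's actual implementation), the C helper below models a memset as a byte range and a later store as the range it overwrites. The names Range and shorten_memset, and the min_align parameter (the smallest start alignment the caller will accept), are all hypothetical. The sketch also assumes the memset's base pointer is itself aligned to at least min_align.

```c
#include <stddef.h>

/* Byte range still to be written by the memset, relative to its
   original destination pointer. Hypothetical type for this sketch. */
typedef struct {
    size_t offset;
    size_t length;
} Range;

/* Given a memset of memset_len bytes followed by a store that overwrites
   [store_off, store_off + store_len), return the part of the memset that
   still has to be performed.
   - A store covering the end trims the tail.
   - A store covering the start trims the head, but only by a multiple of
     min_align, so the new start keeps at least that alignment.
   - A store in the middle leaves the memset unchanged. */
Range shorten_memset(size_t memset_len, size_t min_align,
                     size_t store_off, size_t store_len)
{
    Range r = {0, memset_len};

    if (min_align == 0)
        min_align = 1;                      /* treat 0 as "no constraint" */
    if (store_len == 0 || store_off >= memset_len)
        return r;                           /* no overlap with the memset */

    size_t store_end = store_off + store_len;

    if (store_end >= memset_len) {
        /* Store overwrites the tail (or everything, if store_off == 0). */
        r.length = store_off;
    } else if (store_off == 0) {
        /* Store overwrites the head: round the trim down so the new
           start offset is a multiple of min_align. */
        size_t trim = store_end - (store_end % min_align);
        r.offset = trim;
        r.length = memset_len - trim;
    }
    return r;
}
```

For the case in this thread (a 400-byte memset followed by a 4-byte store at offset 0), accepting a 4-byte-aligned start gives a memset of 396 bytes starting at offset 4. A 1-byte store at offset 0 under the same constraint would leave the memset untouched.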

Pete
