thr3ads.net - llvm dev - [LLVMdev] Poor register allocations vs gcc [Jul 2015]

deco33000 Jog

2015-Jul-13 18:08 UTC

[LLVMdev] Poor register allocations vs gcc

Hello,
Ecx is a problem because you have to xor it. Which is avoided in the gcc
compilation. -fomit-frame-pointer helps.

Now llvm is one instruction from gcc. If ecx was not used, it would be as
fast.
-- 
Sent from Yandex.Mail for mobile

20:03, 13 July 2015, Matthias Braun <mbraun@apple.com>:

> > On Jul 13, 2015, at 10:03 AM, deco33000@yandex.com wrote:
> >
> > Hello,
> >
> > I have an issue with the llvm optimizations. I need to create object
> > codes.
> >
> > the -ON PURPOSE poor && useless- code :
> > ---------------------------------------------------
> > #include <stdio.h>
> > #include <stdlib.h>
> >
> > int ci(int a){
> >
> >         return 23;
> >
> > }
> > int flop(int a, char ** c){
> >
> >         a += 71;
> >
> >         int b = 0;
> >
> >         if (a == 56){
> >
> >                 b = 69;
> >                 b += ci(a);
> >         }
> >
> >         puts("ok");
> >         return a + b;
> > }
> > --------------------------------------
> >
> > Compiled that way (using the versions I downloaded and eventually
> > compiled) :
> > clang_custom -std=c11 -O3 -march=native -c app2.c -S
> >
> > against gcc:
> > gcc_custom -std=c11 -O3 -march=native -c app2.c -S
> >
> > Versions (latest for each, downloaded just a few days ago):
> > gcc : 5.1
> > clang/llvm: clang+llvm-3.6.1-x86_64-apple-darwin
> >
> > Host:
> > osx yosemite.
> >
> > The assembly (cut to the essential):
> >
> > LLVM:
> >         pushq %rbp
> >         movq %rsp, %rbp
> >         pushq %r14
> >         pushq %rbx
> >         movl %edi, %r14d
> >         leal 71(%r14), %eax
> >         xorl %ecx, %ecx
> >         cmpl $56, %eax
> >         movl $92, %ebx
> >         cmovnel %ecx, %ebx
> >         leaq L_.str(%rip), %rdi
> >         callq _puts
> >         leal 71(%rbx,%r14), %eax
> >         popq %rbx
> >         popq %r14
> >         popq %rbp
> >         retq
> >
> > and the gcc one:
> >
> >         pushq %rbp
> >         movl $0, %eax
> >         movl $92, %ebp
> >         pushq %rbx
> >         leal 71(%rdi), %ebx
> >         leaq LC1(%rip), %rdi
> >         subq $8, %rsp
> >         cmpl $56, %ebx
> >         cmovne %eax, %ebp
> >         call _puts
> >         addq $8, %rsp
> >         leal 0(%rbp,%rbx), %eax
> >         popq %rbx
> >         popq %rbp
> >         ret
> >
> > As we can see, llvm makes poor register allocations (ecx and r14),
> > leading to more instructions for the same result.
> >
> > Are there some optimizations I can bring on the table to avoid this ?
>
> As far as I know clang on OS X always sets up a frame pointer unless you
> explicitly use -fomit-frame-pointer. I think the reasoning is that dtrace
> and others rely on frame pointers being present.
>
> I don't see why using %ecx would be a problem, there are no extra
> spill/reloads produced because of that.
>
> - Matthias

Matthias Braun

2015-Jul-13 18:25 UTC

[LLVMdev] Poor register allocations vs gcc

> On Jul 13, 2015, at 11:08 AM, deco33000 Jog <deco33000 at yandex.com> wrote:
>
> Hello,
> Ecx is a problem because you have to xor it. Which is avoided in the gcc
> compilation. -fomit-frame-pointer helps.
>
> Now llvm is one instruction from gcc. If ecx was not used, it would be as
> fast.

Register allocation is not the problem here. If you look at the gcc-produced
code you see "movl $0, %eax" as well (no idea why it wouldn't use
xorl to zero the register).
I looked into it again, and the reason LLVM's version is one instruction
longer is that the addition of 71 is folded into the last leal. That means
both the value before adding 71 and the value plus 71 are live in the part
before the puts call, so an additional mov instruction is needed to
duplicate the value. You could file a PR if you really care about the issue.

- Matthias
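
To make the point about the folded addition concrete, here is a rough C sketch of the data flow in each listing. It is only an illustration under stated assumptions: the function names flop_folded, flop_single and check_shapes_agree are made up for this note, the constant 92 is 69 + ci(a) already folded as in both listings, and the C code models which values have to survive the puts call rather than anything the compilers do internally.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names; each function mirrors the data flow of one listing. */

/* Shape of the clang listing: the original argument is kept across puts()
   and the +71 is folded into the final add. */
int flop_folded(int a) {
    int orig = a;                   /* movl %edi, %r14d  (the extra copy)    */
    int sum  = orig + 71;           /* leal 71(%r14), %eax  (compare only)   */
    int b    = (sum == 56) ? 92 : 0;/* movl $92 / xorl / cmovnel             */
    puts("ok");                     /* orig and b must survive this call     */
    return 71 + b + orig;           /* leal 71(%rbx,%r14), %eax              */
}

/* Shape of the gcc listing: a + 71 is computed once and reused. */
int flop_single(int a) {
    int sum = a + 71;               /* leal 71(%rdi), %ebx                   */
    int b   = (sum == 56) ? 92 : 0; /* movl $92 / movl $0 / cmovne           */
    puts("ok");                     /* sum and b must survive this call      */
    return b + sum;                 /* leal 0(%rbp,%rbx), %eax               */
}

/* Both shapes should return the same value; checked on a few sample inputs. */
void check_shapes_agree(void) {
    const int samples[] = { -15, 0, 1, 100 };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(flop_folded(samples[i]) == flop_single(samples[i]));
}
```

In the first shape the incoming argument is needed twice (once for the compare, once for the final add), so it is copied into a callee-saved register before the call. In the second shape only a + 71 is needed afterwards, so the leal that computes it also serves as the move out of %edi.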
> -- 
> Sent from Yandex.Mail for mobile
> 
> 20:03, 13 July 2015, Matthias Braun <mbraun at apple.com>:
> 
> 
>  On Jul 13, 2015, at 10:03 AM, deco33000 at yandex.com wrote:
> 
>  Hello,
> 
>  I have an issue with the llvm optimizations. I need to create object codes.
> 
>  the -ON PURPOSE poor && useless- code :
>  ---------------------------------------------------
>  #include <stdio.h>
>  #include <stdlib.h>
> 
>  int ci(int a){
> 
>          return 23;
> 
>  }
>  int flop(int a, char ** c){
> 
>          a += 71;
> 
>          int b = 0;
> 
>          if (a == 56){
> 
>                  b = 69;
>                  b += ci(a);
>          }
> 
>          puts("ok");
>          return a + b;
>  }
>  --------------------------------------
> 
>  Compiled that way (using the versions I downloaded and eventually
>  compiled) :
>  clang_custom -std=c11 -O3 -march=native -c app2.c -S
> 
>  against gcc:
>  gcc_custom -std=c11 -O3 -march=native -c app2.c -S
> 
>  Versions (latest for each, downloaded just a few days ago):
>  gcc : 5.1
>  clang/llvm: clang+llvm-3.6.1-x86_64-apple-darwin
> 
>  Host:
>  osx yosemite.
> 
>  The assembly (cut to the essential):
> 
>  LLVM:
>          pushq %rbp
>          movq %rsp, %rbp
>          pushq %r14
>          pushq %rbx
>          movl %edi, %r14d
>          leal 71(%r14), %eax
>          xorl %ecx, %ecx
>          cmpl $56, %eax
>          movl $92, %ebx
>          cmovnel %ecx, %ebx
>          leaq L_.str(%rip), %rdi
>          callq _puts
>          leal 71(%rbx,%r14), %eax
>          popq %rbx
>          popq %r14
>          popq %rbp
>          retq
> 
>  and the gcc one:
> 
>          pushq %rbp
>          movl $0, %eax
>          movl $92, %ebp
>          pushq %rbx
>          leal 71(%rdi), %ebx
>          leaq LC1(%rip), %rdi
>          subq $8, %rsp
>          cmpl $56, %ebx
>          cmovne %eax, %ebp
>          call _puts
>          addq $8, %rsp
>          leal 0(%rbp,%rbx), %eax
>          popq %rbx
>          popq %rbp
>          ret
> 
>  As we can see, llvm makes poor register allocations (ecx and r14), leading
>  to more instructions for the same result.
> 
>  Are there some optimizations I can bring on the table to avoid this ?
> 
> As far as I know clang on OS X always sets up a frame pointer unless you
> explicitly use -fomit-frame-pointer. I think the reasoning is that dtrace
> and others rely on frame pointers being present.
> 
> I don't see why using %ecx would be a problem, there are no extra
> spill/reloads produced because of that.
> 
> - Matthias
> 

deco33000 Jog

2015-Jul-13 18:26 UTC

[LLVMdev] Poor register allocations vs gcc

I will, Matthias.

Thanks!
-- 
Sent from Yandex.Mail for mobile

20:25, 13 July 2015, Matthias Braun <mbraun@apple.com>:

> > On Jul 13, 2015, at 11:08 AM, deco33000 Jog <deco33000@yandex.com> wrote:
> >
> > Hello,
> > Ecx is a problem because you have to xor it. Which is avoided in the gcc
> > compilation. -fomit-frame-pointer helps.
> >
> > Now llvm is one instruction from gcc. If ecx was not used, it would be
> > as fast.
>
> Register allocation is not the problem here. If you look at the
> gcc-produced code you see "movl $0, %eax" as well (no idea why it
> wouldn't use xorl to zero the register).
> I looked into it again, and the reason LLVM's version is one instruction
> longer is that the addition of 71 is folded into the last leal. That means
> both the value before adding 71 and the value plus 71 are live in the part
> before the puts call, so an additional mov instruction is needed to
> duplicate the value. You could file a PR if you really care about the
> issue.
>
> - Matthias
