<br />Hello,<br />
%ecx is a problem because you have to xor it, which the gcc compilation avoids. -fomit-frame-pointer helps.<br /><br />
Now the llvm output is just one instruction longer than gcc's. If %ecx were not used, it would be as fast.<br />
-- <br />Sent from Yandex.Mail for mobile<br /><br />
20:03, 13 July 2015, Matthias Braun &lt;mbraun@apple.com&gt;:<br />
<blockquote><br /><br />
<blockquote>On Jul 13, 2015, at 10:03 AM, deco33000@yandex.com wrote:<br /><br />
Hello,<br /><br />
I have an issue with the llvm optimizations. I need to generate object code.<br /><br />
The code (poor and useless ON PURPOSE):<br />
---------------------------------------------------<br />
#include &lt;stdio.h&gt;<br />
#include &lt;stdlib.h&gt;<br /><br />
int ci(int a){<br /><br />
&nbsp;&nbsp;&nbsp;&nbsp;return 23;<br /><br />
}<br />
int flop(int a, char ** c){<br /><br />
&nbsp;&nbsp;&nbsp;&nbsp;a += 71;<br /><br />
&nbsp;&nbsp;&nbsp;&nbsp;int b = 0;<br /><br />
&nbsp;&nbsp;&nbsp;&nbsp;if (a == 56){<br /><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;b = 69;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;b += ci(a);<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br /><br />
&nbsp;&nbsp;&nbsp;&nbsp;puts("ok");<br />
&nbsp;&nbsp;&nbsp;&nbsp;return a + b;<br />
}<br />
--------------------------------------<br /><br />
Compiled this way (using the versions I downloaded and, where needed, built myself):<br />
clang_custom -std=c11 -O3 -march=native -c app2.c -S<br /><br />
against gcc:<br />
gcc_custom -std=c11 -O3 -march=native -c app2.c -S<br /><br />
Versions (latest for each, downloaded just a few days ago):<br />
gcc: 5.1<br />
clang/llvm: clang+llvm-3.6.1-x86_64-apple-darwin<br /><br />
Host:<br />
OS X Yosemite.<br /><br />
The assembly (cut to the essentials):<br /><br />
LLVM:<br />
&nbsp;&nbsp;&nbsp;&nbsp;pushq %rbp<br />
&nbsp;&nbsp;&nbsp;&nbsp;movq %rsp, %rbp<br />
&nbsp;&nbsp;&nbsp;&nbsp;pushq %r14<br />
&nbsp;&nbsp;&nbsp;&nbsp;pushq %rbx<br />
&nbsp;&nbsp;&nbsp;&nbsp;movl %edi, %r14d<br />
&nbsp;&nbsp;&nbsp;&nbsp;leal 71(%r14), %eax<br />
&nbsp;&nbsp;&nbsp;&nbsp;xorl %ecx, %ecx<br />
&nbsp;&nbsp;&nbsp;&nbsp;cmpl $56, %eax<br />
&nbsp;&nbsp;&nbsp;&nbsp;movl $92, %ebx<br />
&nbsp;&nbsp;&nbsp;&nbsp;cmovnel %ecx, %ebx<br />
&nbsp;&nbsp;&nbsp;&nbsp;leaq L_.str(%rip), %rdi<br />
&nbsp;&nbsp;&nbsp;&nbsp;callq _puts<br />
&nbsp;&nbsp;&nbsp;&nbsp;leal 71(%rbx,%r14), %eax<br />
&nbsp;&nbsp;&nbsp;&nbsp;popq %rbx<br />
&nbsp;&nbsp;&nbsp;&nbsp;popq %r14<br />
&nbsp;&nbsp;&nbsp;&nbsp;popq %rbp<br />
&nbsp;&nbsp;&nbsp;&nbsp;retq<br /><br />
and the gcc one:<br /><br />
&nbsp;&nbsp;&nbsp;&nbsp;pushq %rbp<br />
&nbsp;&nbsp;&nbsp;&nbsp;movl $0, %eax<br />
&nbsp;&nbsp;&nbsp;&nbsp;movl $92, %ebp<br />
&nbsp;&nbsp;&nbsp;&nbsp;pushq %rbx<br />
&nbsp;&nbsp;&nbsp;&nbsp;leal 71(%rdi), %ebx<br />
&nbsp;&nbsp;&nbsp;&nbsp;leaq LC1(%rip), %rdi<br />
&nbsp;&nbsp;&nbsp;&nbsp;subq $8, %rsp<br />
&nbsp;&nbsp;&nbsp;&nbsp;cmpl $56, %ebx<br />
&nbsp;&nbsp;&nbsp;&nbsp;cmovne %eax, %ebp<br />
&nbsp;&nbsp;&nbsp;&nbsp;call _puts<br />
&nbsp;&nbsp;&nbsp;&nbsp;addq $8, %rsp<br />
&nbsp;&nbsp;&nbsp;&nbsp;leal 0(%rbp,%rbx), %eax<br />
&nbsp;&nbsp;&nbsp;&nbsp;popq %rbx<br />
&nbsp;&nbsp;&nbsp;&nbsp;popq %rbp<br />
&nbsp;&nbsp;&nbsp;&nbsp;ret<br /><br />
As we can see, llvm makes poor register allocations (ecx and r14), leading to more instructions for the same result.<br /><br />
Are there any optimization options I can use to avoid this?<br /></blockquote><br />
As far as I know, clang on OS X always sets up a frame pointer unless you explicitly use -fomit-frame-pointer. I think the reasoning is that dtrace and other tools rely on frame pointers being present.<br /><br />
I don't see why using %ecx would be a problem; no extra spills or reloads are produced because of it.<br /><br />
- Matthias<br /><br /></blockquote>
> On Jul 13, 2015, at 11:08 AM, deco33000 Jog <deco33000 at yandex.com> wrote:
>
> Hello,
> %ecx is a problem because you have to xor it, which the gcc compilation avoids. -fomit-frame-pointer helps.
>
> Now the llvm output is just one instruction longer than gcc's. If %ecx were not used, it would be as fast.

Register allocation is not the problem here. If you look at the gcc-produced code you see "movl $0, %eax" as well (no idea why it wouldn't use xorl to zero the register).

I looked into it again. LLVM's version is one instruction longer because the addition of 71 is folded into the last leal. That means both the value before adding 71 and the value plus 71 are live in the part before the puts call, so an additional mov instruction is needed to duplicate the value. You could file a PR if you really care about the issue.

- Matthias
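Matthias's point about the folded addition can be paraphrased at source level. The sketch below is only an illustration, not what either compiler does internally: `flop_sum_once`, `flop_add_late` and `check_equivalence` are hypothetical names introduced here, and the constant 92 comes from the posted assembly (69 + 23 after `ci` is inlined). The first variant mirrors the gcc listing, where a + 71 is computed once and reused. The second mirrors the LLVM listing, where the original `a` stays live across the `puts` call and 71 is added again at the end. An optimizer may well compile both to identical code.

```c
#include <assert.h>
#include <stdio.h>

/* Same helper as in the thread; always returns 23, so 69 + ci(a) is 92. */
static int ci(int a) {
    return 23;
}

/* The function from the original post, reformatted only. */
int flop(int a, char **c) {
    a += 71;
    int b = 0;
    if (a == 56) {
        b = 69;
        b += ci(a);
    }
    puts("ok");
    return a + b;
}

/* Hypothetical paraphrase of the gcc listing: a + 71 is computed once,
   kept across the call, and reused for the compare and the result. */
int flop_sum_once(int a) {
    int sum = a + 71;
    int b = (sum == 56) ? 92 : 0;
    puts("ok");
    return sum + b;
}

/* Hypothetical paraphrase of the LLVM listing: a + 71 feeds only the
   compare, the original a stays live across the call, and 71 is added
   again in the final expression (leal 71(%rbx,%r14), %eax). */
int flop_add_late(int a) {
    int b = (a + 71 == 56) ? 92 : 0;
    puts("ok");
    return a + b + 71;
}

/* Asserts that both paraphrases agree with the original for one input.
   Inputs must be small enough that a + 71 + 92 does not overflow int. */
void check_equivalence(int a) {
    int expected = flop(a, NULL);
    assert(flop_sum_once(a) == expected);
    assert(flop_add_late(a) == expected);
}
```

Calling `check_equivalence(-15)` exercises the `a == 56` branch, and `check_equivalence(0)` exercises the other one.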
<br />I will, Matthias.<br /><br />Thanks!<br /><br />
20:25, 13 July 2015, Matthias Braun &lt;mbraun@apple.com&gt;:<br />
<blockquote>You could file a PR if you really care about the issue.<br /><br />
- Matthias<br /></blockquote>