Hello,

I have an issue with the llvm optimizations. I need to create object codes.

the -ON PURPOSE poor && useless- code :
---------------------------------------------------
#include <stdio.h>
#include <stdlib.h>

int ci(int a){
    return 23;
}

int flop(int a, char ** c){
    a += 71;

    int b = 0;

    if (a == 56){
        b = 69;
        b += ci(a);
    }

    puts("ok");
    return a + b;
}
--------------------------------------

Compiled that way (using the versions I downloaded and eventually compiled) :
clang_custom -std=c11 -O3 -march=native -c app2.c -S

against gcc:
gcc_custom -std=c11 -O3 -march=native -c app2.c -S

Versions (latest for each, downloaded just a few days ago):
gcc : 5.1
clang/llvm: clang+llvm-3.6.1-x86_64-apple-darwin

Host:
osx yosemite.

The assembly (cut to the essential):

LLVM:
    pushq %rbp
    movq %rsp, %rbp
    pushq %r14
    pushq %rbx
    movl %edi, %r14d
    leal 71(%r14), %eax
    xorl %ecx, %ecx
    cmpl $56, %eax
    movl $92, %ebx
    cmovnel %ecx, %ebx
    leaq L_.str(%rip), %rdi
    callq _puts
    leal 71(%rbx,%r14), %eax
    popq %rbx
    popq %r14
    popq %rbp
    retq

and the gcc one:

    pushq %rbp
    movl $0, %eax
    movl $92, %ebp
    pushq %rbx
    leal 71(%rdi), %ebx
    leaq LC1(%rip), %rdi
    subq $8, %rsp
    cmpl $56, %ebx
    cmovne %eax, %ebp
    call _puts
    addq $8, %rsp
    leal 0(%rbp,%rbx), %eax
    popq %rbx
    popq %rbp
    ret

As we can see, llvm makes poor register allocations (ecx and r14), leading to more instructions for the same result.

Are there some optimizations I can bring on the table to avoid this ?
--
Jog
> On Jul 13, 2015, at 10:03 AM, deco33000 at yandex.com wrote:
>
> As we can see, llvm makes poor register allocations (ecx and r14), leading to more instructions for the same result.
>
> Are there some optimizations I can bring on the table to avoid this ?

As far as I know, clang on OS X always sets up a frame pointer unless you explicitly use -fomit-frame-pointer. I think the reasoning is that dtrace and other tools rely on frame pointers being present.

I don't see why using %ecx would be a problem; no extra spills/reloads are produced because of it.

- Matthias
Hello,

Ecx is a problem because you have to xor it, which is avoided in the gcc compilation. -fomit-frame-pointer helps.

Now llvm is one instruction from gcc. If ecx was not used, it would be as fast.

20:03, 13 July 2015, Matthias Braun <mbraun@apple.com>:
> I don't see why using %ecx would be a problem, there are no extra spill/reloads produced because of that.
Hi Jog,

This looks like a scheduling problem to me.

The main difference here is that in GCC the final “a + b” is scheduled before the call, whereas in the LLVM case it is scheduled after the call.
Because of that, %rdi cannot be used in the final add, and the incoming value of a has to be saved somewhere else (%r14 in the LLVM output).

You can see that in effect by replacing:

    puts("ok");
    return a + b;

by

    b += a;
    puts("ok");
    return b;

That being said, you shouldn't have to do that to get the nice code.

Could you file a PR for the scheduling problem?

Thanks,
-Quentin

> On Jul 13, 2015, at 10:03 AM, deco33000 at yandex.com wrote:
>
> Are there some optimizations I can bring on the table to avoid this ?
> --
> Jog
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
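For readers following along, the reordering Quentin describes can be written out as a full function next to the original. This is only a sketch: the name flop_reordered is hypothetical (it is not in the thread), and the other names are copied from the original post. Both versions return the same value; the second simply computes the sum before the call to puts, so the incoming argument no longer has to stay live across the call.

```c
#include <stdio.h>

int ci(int a) {
    return 23;
}

/* Original version from the first post: a + b is computed after puts(). */
int flop(int a, char **c) {
    a += 71;

    int b = 0;

    if (a == 56) {
        b = 69;
        b += ci(a);
    }

    puts("ok");
    return a + b;
}

/* Hypothetical name: the same function with the source-level change
 * suggested in the reply, i.e. the sum is folded into b before the call. */
int flop_reordered(int a, char **c) {
    a += 71;

    int b = 0;

    if (a == 56) {
        b = 69;
        b += ci(a);
    }

    b += a;        /* sum computed before the call */
    puts("ok");
    return b;
}
```

Compiling this file with the same flags as in the first post (-O3 -S) should let you compare the assembly for the two functions side by side; the exact output will depend on the compiler version.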
Hi,

I certainly will, Quentin!

Thanks

20:10, 13 July 2015, Quentin Colombet <qcolombet@apple.com>:
> Could you file a PR for the scheduling problem?
By the way Quentin,

Your modification makes llvm much faster than gcc (12 ops vs 15 ops): fewer pushq/popq, better use of the registers.

This code is silly at best, but thanks to you I could learn something about llvm.

Thanks a lot :)
--
Jog