Stefan Kanthak via llvm-dev
2018-Dec-01 17:28 UTC
[llvm-dev] Where's the optimiser gone? (part 5.b): missed tail calls, and more...
Compile the following functions with "-O3 -target i386"
(see <https://godbolt.org/z/VmKlXL>):

long long div(long long foo, long long bar)
{
    return foo / bar;
}

On the left the generated code; on the right the expected,
properly optimised code:

div:                                # @div
    push    ebp                     |
    mov     ebp, esp                |
    push    dword ptr [ebp + 20]    |
    push    dword ptr [ebp + 16]    |
    push    dword ptr [ebp + 12]    |
    push    dword ptr [ebp + 8]     |
    call    __divdi3                |   jmp     __divdi3
    add     esp, 16                 |
    pop     ebp                     |
    ret                             |


long long mod(long long foo, long long bar)
{
    return foo % bar;
}

mod:                                # @mod
    push    ebp                     |
    mov     ebp, esp                |
    push    dword ptr [ebp + 20]    |
    push    dword ptr [ebp + 16]    |
    push    dword ptr [ebp + 12]    |
    push    dword ptr [ebp + 8]     |
    call    __moddi3                |   jmp     __moddi3
    add     esp, 16                 |
    pop     ebp                     |
    ret                             |


long long mul(long long foo, long long bar)
{
    return foo * bar;
}

mul:                                # @mul
    push    ebp
    mov     ebp, esp
    push    esi
    mov     ecx, dword ptr [ebp + 16]
    mov     esi, dword ptr [ebp + 8]
    mov     eax, ecx
    imul    ecx, dword ptr [ebp + 12]
    mul     esi
    imul    esi, dword ptr [ebp + 20]
    add     edx, ecx
    add     edx, esi
    pop     esi
    pop     ebp
    ret
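For readers following the mul listing above: the instruction sequence computes the low 64 bits of the product from 32-bit halves (one widening mul plus two truncating imul). A minimal C sketch of the same arithmetic follows. The function name mul64_via_32 and the variable names are illustrative assumptions, not part of the original post, and unsigned types are used so that wraparound is well defined.

```c
#include <stdint.h>

/* Sketch of the arithmetic in the mul listing; names are hypothetical.
 * Returns the low 64 bits of foo * bar using only 32-bit operands. */
uint64_t mul64_via_32(uint64_t foo, uint64_t bar)
{
    uint32_t foo_lo = (uint32_t)foo;
    uint32_t foo_hi = (uint32_t)(foo >> 32);
    uint32_t bar_lo = (uint32_t)bar;
    uint32_t bar_hi = (uint32_t)(bar >> 32);

    /* "mul esi": full 32x32 -> 64 bit product of the low halves. */
    uint64_t low = (uint64_t)foo_lo * bar_lo;

    /* The two "imul": cross products, only their low 32 bits matter
     * because they are added into the upper half of the result. */
    uint32_t cross = foo_hi * bar_lo + foo_lo * bar_hi;

    /* The two "add edx, ...": fold the cross terms into the high half. */
    uint32_t high = (uint32_t)(low >> 32) + cross;

    return ((uint64_t)high << 32) | (uint32_t)low;
}
```

The hi*hi term never appears because it only contributes to bits 64 and above, which a 64-bit result discards.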
Craig Topper via llvm-dev
2018-Dec-01 19:07 UTC
[llvm-dev] Where's the optimiser gone? (part 5.b): missed tail calls, and more...
Clang's -target option is supposed to take a CPU type and an operating system, so "-target i386" gives it no operating system. This prevents frame pointer elimination, which is why ebp is being updated. If you pass "-target i386-linux" you get slightly better code.

The division/remainder operations are turned into library calls as part of instruction selection. This code is somewhat independent of how other calls are handled. We probably don't support tail calls in it. Is it really realistic that a user would have a non-inlined function that contains just a division? Why should we optimize for that case?

~Craig
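To try the suggestion in the reply above, a test file along the following lines could be used. This is only a sketch: the file name div.c and the function names div64 and mod64 are assumptions (renamed from div and mod so they cannot clash with the standard library's div when <stdlib.h> is in scope). The triple i386-linux is the one named in the reply; -S and -masm=intel ask clang for Intel-syntax assembly.

```c
/* div.c (hypothetical file name)
 *
 * One way to inspect the generated code, assuming a clang that
 * supports the i386-linux triple:
 *
 *   clang -O3 -target i386-linux -S -masm=intel -o - div.c
 *
 * On i386 the 64-bit '/' and '%' below are lowered to calls to the
 * runtime helpers __divdi3 and __moddi3, as in the listings in this
 * thread.
 */

long long div64(long long foo, long long bar)
{
    return foo / bar;   /* becomes a call to __divdi3 on i386 */
}

long long mod64(long long foo, long long bar)
{
    return foo % bar;   /* becomes a call to __moddi3 on i386 */
}
```

Comparing the output for "-target i386" and "-target i386-linux" should show whether the ebp prologue and epilogue are still emitted.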
Stefan Kanthak via llvm-dev
2018-Dec-01 19:55 UTC
[llvm-dev] Where's the optimiser gone? (part 5.b): missed tail calls, and more...
"Craig Topper" <craig.topper at gmail.com> wrote:

> Clang's -target option is supposed to take a cpu type and an operating
> system. So "-target i386" is giving it no operating system. This is
> preventing frame pointer elimination which is why ebp is being updated. If
> you pass "-target i386-linux" you get slightly better code.

The frame pointer is not the point here, though.

> The division/remainder operations are turned into library calls as part of
> instruction selection. This code is somewhat independent of how other calls
> are handled. We probably don't support tail calls in it. Is it really
> realistic that a user would have a non-inlined function that contains just
> a division? Why should we optimize for that case?

I've seen quite a few libraries that implement such functions as target-independent wrappers which just call another function with the same prototype. So the question is not whether it's just a division, but more generally the call of a function having the same prototype.

regards
Stefan
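The pattern described above, a wrapper that only forwards its arguments to another function with the same prototype, can be sketched as follows. The names impl_div and wrapped_div are hypothetical, and both functions sit in one file only so the sketch is self-contained; in the libraries described, the callee would normally live in a separate translation unit. Because the call is the wrapper's last action and the argument layout is identical, a compiler is in principle free to emit a jump instead of call/ret, though whether a given compiler does so is not guaranteed.

```c
/* Hypothetical target-specific implementation. In a real library this
 * would typically be defined in another translation unit. */
long long impl_div(long long foo, long long bar)
{
    return foo / bar;
}

/* Target-independent wrapper with the same prototype. The call is in
 * tail position: nothing happens after it except returning its result,
 * and the incoming arguments are passed through unchanged. */
long long wrapped_div(long long foo, long long bar)
{
    return impl_div(foo, bar);
}
```

With a stack-based convention such as the one in the listings, the incoming argument area already holds exactly what the callee expects, which is why the "expected" column in the original post is a single jmp.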