Stefan Kanthak via llvm-dev
2018-Nov-29 14:20 UTC
[llvm-dev] Where's the optimiser gone? (part 4): 64-bit division routines for IA32
Hi @ll,

compiler-rt implements the 64-bit division routines __divdi3(), __moddi3(),
__udivdi3() and __umoddi3() for IA32 alias x86 in assembler (see the
directory compiler-rt/lib/builtins/i386/).

While Stephen Canon did a decent job back in December 2008, he left QUITE
some room for improvement^Woptimisation: see the attached patch.

All 4 routines have two almost identical code branches of 20+ and 22+
instructions, with just TWO additional instructions in the second branch:

- divdi3.S lines 72-102 vs. 103-104
- moddi3.S lines 71-104 vs. 104-144
- udivdi3.S lines 43-67 vs. 68-100
- umoddi3.S lines 44-72 vs. 73-108

These two branches can of course be folded into just one branch, saving
20+ instructions.

The third branch, where both dividend and divisor are below 2**32, always
performs a "long division", even if a single DIV would be sufficient.
Adding just 2 instructions, a CMP and a Jcc, saves the execution of a DIV
and QUITE some processor cycles (on average about 10-16 cycles per call,
from a total of about 42-56 cycles).

See <https://skanthak.homepage.t-online.de/msvc.html#sidenote> for a
comparison of these improved routines with other implementations.

regards
Stefan Kanthak

PS: is there any special reason why __divmoddi4() and __udivmoddi4() are
not implemented in assembler?
What about __udivmodti4() etc. for AMD64 alias x86-64? See the directory
compiler-rt/lib/builtins/x86_64/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: i386_di3.patch
Type: application/octet-stream
Size: 26444 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181129/2cc0afd2/attachment-0001.obj>
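
For readers without the patch at hand, the CMP/Jcc shortcut described for the third branch can be modelled roughly as below. This is only a sketch in C, not the code from i386_di3.patch: the name udiv64by32() is hypothetical, and it assumes that the single-DIV case is the one where the upper half of the dividend is smaller than the divisor, so the quotient fits into 32 bits. Each commented "DIV" stands for one 64-by-32-bit DIV instruction; compiled as plain C for IA32 the 64-bit / and % would themselves call the library routines, so this shows the control flow only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of compiler-rt: unsigned division of a
   64-bit dividend by a nonzero 32-bit divisor, returning the quotient
   and storing the remainder through *remainder. */
uint64_t udiv64by32(uint64_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t hi = (uint32_t)(dividend >> 32);
    uint32_t lo = (uint32_t)dividend;

    assert(divisor != 0);

    /* The extra CMP + Jcc: if hi < divisor, the quotient fits into
       32 bits and a single DIV of hi:lo by divisor is sufficient. */
    if (hi < divisor) {
        uint32_t quotient = (uint32_t)(dividend / divisor);   /* DIV */
        *remainder = (uint32_t)(dividend % divisor);
        return quotient;
    }

    /* Otherwise "long division" with two DIVs: first the upper half,
       then its remainder combined with the lower half. */
    uint32_t qhi = hi / divisor;                              /* DIV #1 */
    uint32_t rhi = hi % divisor;
    uint64_t rest = ((uint64_t)rhi << 32) | lo;               /* rhi < divisor */
    uint32_t qlo = (uint32_t)(rest / divisor);                /* DIV #2 */
    *remainder = (uint32_t)(rest % divisor);
    return ((uint64_t)qhi << 32) | qlo;
}
```

With a dividend below 2**32 the upper half is 0, so for any nonzero divisor the first path is taken and only one DIV is modelled.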