For the simple C program below, I show the output of clang and the
output of the Visual Studio compiler (I am on Windows). Maybe this is
obvious to you, but is it really faster to do 2 multiplications, 3 movl
instructions, 2 shifts, 1 add, and 1 subtract than to do 1 mov, 1
cdq, and 1 idiv?
I ran into this while trying to understand why my code runs slower
with LLVM than a comparable program on Windows.
Thanks for any help,
Brent
int f(int n)
{
    return (n + 1) % 18;
}
"clang -O2 -S" produces this code:
_f:                                     # @f
# BB#0:
        movl    4(%esp), %ecx
        incl    %ecx
        movl    $954437177, %edx        # imm = 0x38E38E39
        movl    %ecx, %eax
        imull   %edx
        movl    %edx, %eax
        shrl    $31, %eax
        sarl    $2, %edx
        addl    %eax, %edx
        imull   $18, %edx, %eax
        subl    %eax, %ecx
        movl    %ecx, %eax
        ret
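As far as I can tell, the clang sequence replaces the division with a
multiplication by a precomputed reciprocal: 0x38E38E39 is 2^34 / 18
rounded up, so the high half of the 64-bit product, shifted right by 2,
gives the quotient, and the shrl/addl pair corrects it for negative
inputs. Below is a rough C sketch of my reading of the assembly, not
anything either compiler emitted. The names rem18_idiv, rem18_magic and
check_range are made up for illustration, and the sketch assumes a
32-bit int, an arithmetic right shift for negative values, and
n below INT_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: the original function, which MSVC compiles to idiv. */
static int rem18_idiv(int n)
{
    return (n + 1) % 18;
}

/* Hypothetical C rendering of the clang sequence (no divide instruction).
   Assumes 32-bit int and arithmetic >> on negative values. */
static int rem18_magic(int n)
{
    int32_t x = n + 1;                        /* incl  %ecx               */
    int64_t prod = (int64_t)x * 954437177;    /* imull %edx -> edx:eax    */
    int32_t hi = (int32_t)(prod >> 32);       /* high half of the product */
    int32_t sign = (int32_t)((uint32_t)hi >> 31); /* shrl $31: 1 if hi < 0 */
    int32_t q = (hi >> 2) + sign;             /* sarl $2, addl: x / 18    */
    return x - q * 18;                        /* imull $18, subl: x % 18  */
}

/* Compare both versions over [lo, hi]; hi must be below INT_MAX. */
static void check_range(int lo, int hi)
{
    for (int n = lo; n <= hi; ++n)
        assert(rem18_magic(n) == rem18_idiv(n));
}
```

Calling check_range(-100000, 100000) is one way to confirm that the two
versions agree on a range of inputs; timing them against each other
would be a separate experiment.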
The Visual Studio compiler (/O2) instead issues the idiv instruction:
PUBLIC  _f
; Function compile flags: /Ogtpy
; COMDAT _f
_TEXT   SEGMENT
_n$ = 8                                 ; size = 4
_f      PROC                            ; COMDAT
; File c:\a.c
; Line 6
        mov     eax, DWORD PTR _n$[esp-4]
        inc     eax
        cdq
        mov     ecx, 18                 ; 00000012H
        idiv    ecx
        mov     eax, edx
; Line 7
        ret     0
_f      ENDP
_TEXT   ENDS
END