thr3ads.net - llvm dev - [LLVMdev] Poor register allocations vs gcc [Jul 2015]

deco33000 at yandex.com

2015-Jul-13 17:03 UTC

[LLVMdev] Poor register allocations vs gcc

Hello,

I have an issue with LLVM's optimizations. I need to generate object code.

The (deliberately poor and useless) code:
---------------------------------------------------
#include <stdio.h>
#include <stdlib.h>

int ci(int a){

	return 23;

}
int flop(int a, char ** c){
	
	a += 71;

	int b = 0;

	if (a == 56){

		b = 69;
		b += ci(a);
	}

	puts("ok");
	return a + b;
}
--------------------------------------

Compiled this way (using the versions I downloaded and, in some cases, compiled myself):
clang_custom -std=c11 -O3 -march=native -c app2.c -S

against gcc:
gcc_custom -std=c11 -O3 -march=native -c app2.c -S

Versions (latest for each, downloaded just a few days ago):
gcc : 5.1
clang/llvm: clang+llvm-3.6.1-x86_64-apple-darwin

Host:
osx yosemite.

The assembly (cut to the essential):

LLVM:
	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%r14
	pushq	%rbx
	movl	%edi, %r14d
	leal	71(%r14), %eax
	xorl	%ecx, %ecx
	cmpl	$56, %eax
	movl	$92, %ebx
	cmovnel	%ecx, %ebx
	leaq	L_.str(%rip), %rdi
	callq	_puts
	leal	71(%rbx,%r14), %eax
	popq	%rbx
	popq	%r14
	popq	%rbp
	retq

and the gcc one:

	pushq	%rbp
	movl	$0, %eax
	movl	$92, %ebp
	pushq	%rbx
	leal	71(%rdi), %ebx
	leaq	LC1(%rip), %rdi
	subq	$8, %rsp
	cmpl	$56, %ebx
	cmovne	%eax, %ebp
	call	_puts
	addq	$8, %rsp
	leal	0(%rbp,%rbx), %eax
	popq	%rbx
	popq	%rbp
	ret

As we can see, llvm makes poor register allocations (ecx and r14), leading to
more instructions for the same result.

Are there any optimizations I can apply to avoid this?
-- 
Jog

Matthias Braun

2015-Jul-13 18:03 UTC

As far as I know, clang on OS X always sets up a frame pointer unless you
explicitly use -fomit-frame-pointer. I think the reasoning is that dtrace
and other tools rely on frame pointers being present.

I don't see why using %ecx would be a problem; no extra
spills/reloads are produced because of it.
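
For reference, the zero held in %ecx (or %eax in the gcc listing) is just the "not taken" operand of the conditional move: both compilers appear to have folded ci() into the constant 23, so the branch becomes a select between 92 (69 + 23) and 0. Below is a minimal C sketch of what the two listings compute. The name flop_folded is hypothetical and used only for illustration; it is not something either compiler emits.

```c
#include <stdio.h>

/* Functions from the original post. */
int ci(int a) { (void)a; return 23; }

int flop(int a, char **c) {
    (void)c;
    a += 71;
    int b = 0;
    if (a == 56) {
        b = 69;
        b += ci(a);
    }
    puts("ok");
    return a + b;
}

/* Hypothetical hand-folded equivalent: ci() always returns 23, so the
   branch reduces to choosing between 92 (69 + 23) and 0, which is the
   job of the cmovne/cmovnel instruction in both listings. */
int flop_folded(int a, char **c) {
    (void)c;
    a += 71;
    int b = (a == 56) ? 92 : 0;
    puts("ok");
    return a + b;
}
```

Calling both functions with a = -15 (which takes the branch, since -15 + 71 == 56) and with a = 0 (which does not) should give the same results, 148 and 71 respectively.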

- Matthias

deco33000 Jog

2015-Jul-13 18:08 UTC

Hello,

Ecx is a problem because you have to xor it, which is avoided in the gcc
compilation. -fomit-frame-pointer helps.

Now llvm is one instruction away from gcc. If ecx was not used, it would be as
fast.

Quentin Colombet

2015-Jul-13 18:10 UTC

Hi Jog,

This looks like a scheduling problem to me.

The main difference here is that in GCC the final “a + b” is scheduled before
the call, whereas in LLVM's case it is scheduled after the call.
Because of that, %rdi cannot be used in the final add and it has to be saved
somewhere else.

You can see the effect by replacing:

	puts("ok");
	return a + b;

with:

	b += a;
	puts("ok");
	return b;

That being said, you shouldn’t have to do that to get the nice code.
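
For reference, here is a minimal self-contained sketch of that change applied to the whole function. The name flop_early_add is hypothetical, chosen only to keep it apart from the original flop. The point it illustrates is a source-level one: once a is folded into b before the call, only one value has to stay live across puts(), so the compiler may need one fewer callee-saved register. Whether a given compiler version actually takes advantage of that is not guaranteed.

```c
#include <stdio.h>

int ci(int a) { (void)a; return 23; }

/* Variant with the suggested source change: the addition of a into b
   happens before the call to puts(), so a is dead across the call and
   only b has to be preserved. */
int flop_early_add(int a, char **c) {
    (void)c;
    a += 71;
    int b = 0;
    if (a == 56) {
        b = 69;
        b += ci(a);
    }
    b += a;       /* moved ahead of the call */
    puts("ok");
    return b;
}
```

The result is unchanged from the original function: an input of -15 takes the branch and yields 148, and an input of 0 skips it and yields 71.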

Could you file a PR for the scheduling problem?

Thanks,
-Quentin

deco33000 Jog

2015-Jul-13 18:14 UTC

Hi,

I certainly will, Quentin!

Thanks

deco33000 at yandex.com

2015-Jul-13 21:44 UTC

By the way Quentin,

Your modification makes llvm much faster than gcc (12 ops vs 15 ops): fewer
pushq/popq, better use of the registers.
This code is silly at best, but thanks to you I learned something about llvm.

Thanks a lot :)

-- 
Jog
