Let me describe more precisely what I am doing and why the results I got may
help improve LLVM's performance on modern x86-64 processors regardless of
the front end (GCC, Clang or DragonEgg).
I am running ALL my tests on an Intel Xeon E5540 processor, which is an x86-64
Nehalem processor. The OS is a 64-bit version of Ubuntu. So, I am running all my
tests on the same x86-64 machine and am only experimenting with compiler
options. What I meant by running in x86-32 mode is simply using the -m32 option
(passed through the llvm-gcc front end, though I believe the front end is
irrelevant to this matter; can anyone confirm?). The original reason
for using this option was forcing the back end to work with a smaller number of
physical registers to study register pressure reduction under more stringent
constraints (as part of an academic research project). However, the observation
that I am trying to communicate in this posting may not be related to register
pressure.
The observation is that using the -march=core2 and -mtune=core2 options makes a
significant positive difference in x86-32 mode (that is, with the -m32 option),
while it does not make any significant difference in the x86-64 mode (without
the -m32 option). My hypothesis is that the LLVM back end is making some good
target-specific optimizations when the -m32 option is used but those good
optimizations are disabled when this option is not used. This implies that
enabling those optimizations in the native x86-64 mode may lead to a significant
performance improvement. Here are the SPEC CPU2006 geometric-mean scores that I
am measuring:
Native x86-64 mode (without the -m32 option):
Using -O3 only:                       INT score: 19.24   FP score: 15.64
Using -O3 -march=core2 -mtune=core2:  INT score: 19.16   FP score: 15.57
So, there is no significant difference in this case. The small difference may
just be random variation.
x86-32 mode (adding the -m32 option):
Using -O3 -m32 only:                       INT score: 16.86   FP score: 14.09
Using -O3 -m32 -march=core2 -mtune=core2:  INT score: 17.02   FP score: 15.24
So, there is a significant difference in this case. In fact, the 8%
geometric-mean improvement on FP2006 is a huge gain, driven by double-digit
percentage improvements on several individual benchmarks. The biggest
improvement is 48% on gromacs, a CPU-intensive FP benchmark.
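For concreteness, the percentage improvements implied by the scores quoted
above can be checked with a few lines of arithmetic (this is just a sanity
check on the quoted numbers, not a new measurement):

```python
# Speedup implied by two SPEC scores (a higher score means faster).
def speedup_pct(tuned, base):
    return (tuned / base - 1.0) * 100.0

# Scores quoted above for x86-32 mode (-m32):
print(round(speedup_pct(17.02, 16.86), 1))  # INT2006: 0.9 (~1%)
print(round(speedup_pct(15.24, 14.09), 1))  # FP2006: 8.2 (~8%)
```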
The above geometric means are consistent with the logical expectation that
LLVM's performance in the native x86-64 mode is better, because more spill
code is generated when the -m32 option is used. However, these aggregate numbers
hide the fact that LLVM generates much faster code for some benchmarks when the
-m32 option is used. An extreme example is the bwaves benchmark, which has a
score of 15.5 in the native mode and a score of 23 (a 48% speedup) when the -m32
option is used. If LLVM is capable of achieving the higher score in the -m32
mode, it should be able to achieve at least the same score in the native mode.
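One way to see how an aggregate number can mask a large swing on a single
benchmark: a geometric mean over N benchmarks dampens any one benchmark's
change. The sketch below uses hypothetical per-benchmark scores (the real
per-benchmark SPEC data is not reproduced here):

```python
import math

# Geometric mean, the aggregation SPEC uses for per-benchmark scores.
def geomean(scores):
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Hypothetical 4-benchmark suite: a 48% gain on one benchmark
# (10.0 -> 14.8) moves the aggregate by only about 10%.
print(round(geomean([10.0, 10.0, 10.0, 10.0]), 2))  # 10.0
print(round(geomean([10.0, 10.0, 10.0, 14.8]), 2))  # 11.03
```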
So, the question is: are there any good optimizations that we are losing in the
native x86-64 mode? If yes, can we enable them to get better performance on
x86-64?
It seems to me that the back end is somehow assuming that modern x86-64 machines
magically solve all the scheduling and tuning problems and do not need any help
from the compiler. Any truth to this?
Can anyone who is interested in performance on x86-64 try rerunning their
tests with the -m32 option to see whether it gives any speedup?
Many thanks!
-Ghassan
________________________________
From: Gordon Keiser <gkeiser at arxan.com>
To: Ghassan Shobaki <ghassan_shobaki at yahoo.com>; "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu>
Sent: Monday, January 16, 2012 2:34 AM
Subject: RE: [LLVMdev] -march and -mtune options on x86
Which options are you seeing that cause the largest difference, and on which
targets? As Chandler mentioned, there has been a large amount of variation
among x86 targets, and there are certain optimizations worth doing on, say, a
Pentium (scheduling pairable, non-dependent instructions so the U and V
pipelines are saturated without contention, for example) that aren't worth the
time on a true i386 target. Likewise, there were certain optimizations on the
i386, such as decomposing multiplications into left shifts, which generally
wouldn't be needed on a modern processor. x64 hasn't seen nearly the same
amount of variation since it came into existence.
-Gordon
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Ghassan Shobaki
Sent: Sunday, January 15, 2012 2:55 AM
To: llvmdev at cs.uiuc.edu
Subject: [LLVMdev] -march and -mtune options on x86
I have been doing some benchmarking on x86 using llvm 2.9 with the llvm-gcc 4.2
front end. I noticed that the -march and -mtune options make a significant
positive difference in x86-32 mode but hardly make any difference in x86-64
mode. The small difference that I am measuring when the target is x86-64 could
easily be random variation, while for the x86-32 target I am measuring a huge
difference on some benchmarks. Does anyone have an explanation for this? Does
the llvm back end somehow ignore these options when the target is x86-64?
Thank you in advance!
-Ghassan