On Tue, 27 Apr 2004, Valery A. Khamenya wrote:
> I was thinking that this question was not good
> right after release 1.0, but now perhaps it is OK...
> If not, then I am sorry.
You could always ask, it's just that the answer changes over time. :)
> So, what about the current status of the benchmarks?
> I mean comparison to gcc.
It's slowly getting better. :)
> I have looked at
> http://llvm.cs.uiuc.edu/testresults/X86/
>
> Unfortunately the graph lines are hard for the human eye to read,
> but the tables are OK.
Note that there is often a bit of noise in those numbers. In particular,
the programs are only run once and "real" time is reported. The nightly
tester runs in the middle of the night so the machine is otherwise
unloaded, but noise is still an issue.
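To give a concrete (completely made-up) example of what a lower-noise
measurement could look like, here is a tiny C harness that runs a workload
several times and reports the best wall-clock time instead of a single
"real" time. This is just a sketch, not what the nightly tester actually
does:

  #include <stdio.h>
  #include <sys/time.h>

  /* Hypothetical stand-in for the benchmark body being measured. */
  static void workload(void) {
    volatile double x = 0.0;
    int i;
    for (i = 0; i < 10000000; ++i)
      x += i * 0.5;
  }

  int main(void) {
    int run;
    double best = 1e30;
    for (run = 0; run < 5; ++run) {
      struct timeval start, end;
      double secs;
      gettimeofday(&start, NULL);
      workload();
      gettimeofday(&end, NULL);
      secs = (end.tv_sec - start.tv_sec) +
             (end.tv_usec - start.tv_usec) / 1e6;
      if (secs < best)   /* keep the fastest of the repeated runs */
        best = secs;
    }
    printf("best of 5 runs: %f seconds\n", best);
    return 0;
  }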
> I give my own interpretation of the April 26, 2004 benchmarking tests;
> please don't beat me if I am wrong, but correct me. The focus of my
> attention is the execution time of the programs, i.e., the fields GCC/CBE
> and GCC/LLC.
Yup, that's a good idea. You might also be interested in LLC-LS, which is
the X86 backend with the global register allocator. Not surprisingly, it
can make a substantial impact over the local allocator: it generates code
that is about twice as fast as LLC for programs like 254.gap and
256.bzip2.
> For me it looks like the following.
>
> ----------------------------------------------------
> 1. Programs/External:
>
> a) CBE code is already comparable with GCC code
> (some tests are slower, but some are quicker).
> b) LLC code is still rather slower than GCC code.
This is about right. With the CBE, we are *consistently* faster on
179.art (a 2-2.5x speedup), 252.eon (~20% speedup), 255.vortex (~15%
speedup), and 130.li (~20% speedup). On some of the other benchmarks we
lag behind; others are extremely noisy.
LLC generates code that is generally pretty slow compared to the CBE on
X86. This is largely due to the lack of a global register allocator for
floating point (even with linear scan), and some of the other issues
described here:
http://mail.cs.uiuc.edu/pipermail/llvmdev/2004-April/001020.html
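As a purely illustrative sketch (not code from any of these benchmarks) of
why FP register allocation matters so much on X86, consider a loop like the
one below. With only a local allocator, values like 'sum' and 'scale' tend
to get spilled to and reloaded from the stack on every iteration instead of
living in registers for the whole loop:

  #include <stdio.h>

  /* Hypothetical example, not taken from SPEC. */
  double dot_scaled(const double *a, const double *b, int n, double scale) {
    double sum = 0.0;
    int i;
    for (i = 0; i < n; ++i)
      sum += scale * a[i] * b[i];  /* sum and scale want to stay in FP regs */
    return sum;
  }

  int main(void) {
    double a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
    printf("%f\n", dot_scaled(a, b, 3, 0.5));
    return 0;
  }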
> 2. Programs/MultiSource
>
> a) CBE code is already rather quicker than GCC code
> (some tests are still moderately slower,
> but some are much quicker, up to 5 times).
> b) LLC code is still rather slower than GCC code.
> However, some tests show up to a 5x speedup.
Be careful comparing these numbers. I see that we have a 23x speedup
today over GCC on the "burg" test, but we go from 0.093 -> 0.004s.
:)
The shorter the test runs get, the more noisy they get, so unfortunately
we're not getting a realistic 23x speedup here. ;-)
That said, there are quite a few 20%, 40%, and even 85% speedups here.
> 3. Programs/SingleSource
> a) CBE code is rather quicker than GCC code
> (some tests are still moderately slower,
> but some are much quicker, up to 6 times).
These are even more dubious. In particular, only the first 6 rows contain
programs with reasonable runtimes. This means that the 7x speedups for
going from 0.021 -> 0.003 don't really count. :)
That said, we are still getting 1.88x and 2.32x speedups on the int/fp
dhrystones and a 1.82x speedup on whetstone.
> Overall impression:
> 1) CBE code is already rather quicker than GCC
> 2) LLC code is rather slower than CBE, but comparable to GCC
LLC code is only really comparable on testcases where the LLVM optimizer
is doing really good things, such as C++ programs. Right now with the
linear scan allocator on the X86, I would say that LLC generates code that
is 20-50% slower than the C backend.
> BTW, guys, why not focus more attention on slow
> tests like: UnitTests/2002-10-09-ArrayResolution?
This one is just noise; if you look today, the ratios are 1.0 straight
across the board. Also note that the test runs for 0.003 seconds, which is
the resolution of the time command on the system the program is run on:
this is not a good test for checking performance. :)
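For what it's worth, the usual workaround when a test finishes near the
timer's resolution is to wrap the interesting work in a repeat loop so the
total runtime is measured in seconds rather than milliseconds. A made-up
sketch (the kernel and the repeat count are invented for illustration):

  #include <stdio.h>

  #define REPEAT 50000000  /* invented count; pick it so the run takes seconds */

  /* Stand-in for the tiny computation such a unit test performs. */
  static int kernel(int seed) {
    volatile int x = seed;
    return x * 3 + 1;
  }

  int main(void) {
    int i, acc = 0;
    for (i = 0; i < REPEAT; ++i)
      acc ^= kernel(i);
    printf("%d\n", acc);  /* keep a result live so the loop isn't removed */
    return 0;
  }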
> SPEC/CFP2000/179.art/179.art or
Hrm, you're not happy with the 2.25x speedup we get now? With GCC it
takes 9.639s to run the test, with LLVM-CBE it takes 4.3s, and with
LLVM-LLC-LS it takes 4.963s. I think these are pretty good numbers. :)
> or maybe it is already under hard work? :)
Actually we spend *VERY* little time tuning and tweaking the optimizer for
performance. Something that would be *INCREDIBLY* useful would be for
someone to pick some benchmark or other program we do poorly on (e.g.
Ptrdist-ks), and find out *WHAT* we could be doing to improve it. A good
way to do this is to take the program, run it in a profiler (llvm-prof or
gprof), find the hot spots, see what code we're generating for them, and
suggest ways that it could be improved. If something performs well
with the CBE but not with LLC-LS, then compare the native machine code
generated; if it performs poorly with both, then it's probably a missing
LLVM optimization.
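To make that concrete, here is a rough, made-up sketch of the sort of
micro-investigation I mean (the file and function names are invented).
Build the reduced program with profiling, run it, and ask gprof where the
time goes, e.g. "gcc -O2 -pg hot.c -o hot", "./hot", "gprof ./hot gmon.out";
then compare the code GCC, the CBE, and LLC-LS emit for the hot function:

  #include <stdio.h>

  /* Hypothetical hot spot: in a real benchmark, gprof's flat profile would
   * point at a function like this one. */
  static long count_matches(const int *a, long n, int key) {
    long i, hits = 0;
    for (i = 0; i < n; ++i)
      if (a[i] == key)
        ++hits;
    return hits;
  }

  int main(void) {
    enum { N = 1 << 20 };
    static int data[N];
    long i, total = 0;
    for (i = 0; i < N; ++i)
      data[i] = (int)(i % 17);
    for (i = 0; i < 500; ++i)       /* repeat so the hot spot dominates */
      total += count_matches(data, N, 5);
    printf("%ld\n", total);
    return 0;
  }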
At this point there are a huge number of possibilities for improvement.
We have very little in the way of loop optimizations, and we don't
actually use an interprocedural pointer analysis (I hope to rectify this
for 1.3; it should make a huge difference). Even if you're not into
hacking on LLVM optimizations, identifying code that we could improve
(and reducing it down to small examples of code we compile poorly) is
incredibly useful.
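As a small, invented example of the kind of reduced testcase that would be
useful to report, consider something like the code below: the get_scale()
call is loop-invariant, but hoisting it out of the loop requires knowing
that it has no side effects and doesn't write through the pointers, which
is exactly the sort of thing better loop optimizations and interprocedural
pointer analysis would let us exploit (in a real report the helper would
live in a separate file so its body isn't visible to the compiler):

  #include <stdio.h>

  /* Hypothetical helper; imagine it is defined in another translation unit. */
  double get_scale(void) { return 4.0; }

  void normalize(double *out, const double *in, int n) {
    int i;
    for (i = 0; i < n; ++i)
      out[i] = in[i] / get_scale();  /* candidate for hoisting out of the loop */
  }

  int main(void) {
    double in[4] = {1, 2, 3, 4}, out[4];
    int i;
    normalize(out, in, 4);
    for (i = 0; i < 4; ++i)
      printf("%f\n", out[i]);
    return 0;
  }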
Consider this a small plea for help. :) Once we know what to fix, it's
usually pretty easy to do so, but identifying the problems takes time, and
we have plenty of other things we need to be doing as well.
-Chris
--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/