On Sat, 7 May 2005, Markus F.X.J. Oberhumer wrote:
> Actually I feel that the current state of the art of inlining is where
> register allocation has been about 10 years ago. It's pretty fine for most
> things, but back then I remember writing code like "register const char *p
> __asm__("%esi");" where just adding the explicit __asm__ boosted performance
> of some compression routine by about 10% for gcc 2.6.3. Fortunately these
> times have passed for register allocation, but not yet for inlining.
You've just ignored all of the reasons I gave you above about why this is
a bad idea.
> Basically using things like __attribute__((__noinline__)) and
> __declspec(noinline) means the same - they may be unnecessary in ten years,
> but definitely not today. Just like in the past you can easily boost
> performance by putting them in the right place, even if it may be necessary
> to surround this by #ifdef REGISTER_STARVED_CPU tests.
I understand what you're saying. Again, read my last email for why I
think this is a bad idea :)
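
For concreteness, the kind of annotation being discussed could be sketched
roughly as below. This is only an illustration, not code from either of us:
NOINLINE, handle_rare_byte, and sum_bytes are made-up names, and
REGISTER_STARVED_CPU is assumed to be a macro the build defines itself (no
compiler sets it).

```c
#include <stddef.h>

/* NOINLINE is a hypothetical macro name used only for this sketch.
 * REGISTER_STARVED_CPU is assumed to be defined by the build system. */
#if defined(REGISTER_STARVED_CPU) && (defined(__GNUC__) || defined(__clang__))
#  define NOINLINE __attribute__((__noinline__))
#elif defined(REGISTER_STARVED_CPU) && defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE /* no hint: leave the decision to the inliner */
#endif

/* Rarely taken path, kept out of line on register-starved targets so it
 * does not compete for registers with the hot loop below. */
static NOINLINE unsigned handle_rare_byte(unsigned acc, unsigned char b)
{
    return acc * 31u + b;
}

/* Hot loop: plain sum, with a rare special case for 0xFF bytes. */
unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned acc = 0;
    for (size_t i = 0; i < n; ++i) {
        if (p[i] == 0xFFu)
            acc = handle_rare_byte(acc, p[i]);
        else
            acc += p[i];
    }
    return acc;
}
```

The result is the same with or without the hint; only the generated code
changes, which is why it is so easy for such annotations to go stale as the
compiler improves.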
> Furthermore things like link-time IPO are not that important in C++ template
> style programs, where much of the whole "program" is available to the
> compiler during each compilation pass.
I disagree. You're basically saying that because the compiler has all of
the stuff as inline functions, it can just inline it all instead of using
IPO.
> And profile-guided feedback optimizations are just evolving (and IMHO
> currently still mostly a marketing issue).
Like I said: "If you really care about the performance of your code..."
>> Finally, if you have a piece of code that the LLVM optimizer is doing a
>> poor job on, please please please file a bug so we can fix it!!
>
> I'll always try my best to help improving LLVM :-)
Thanks!!
> Still, in the tests that are important to me LLVM currently does overall not
> perform too well,
Can you share those tests?
> but looking at the disassembly suggests that this might mainly be an
> issue of x86 codegen, which is rather young as compared to other
> compilers.
If you're testing on X86, I would be strongly suspicious of the X86 backend,
which can be improved in many ways. If you use PowerPC, for example,
things are much better (in terms of codegen quality). Using the C backend
should help a lot with this though.
Before pointing the finger at the inliner, it would be good to understand
what is going on in the testcase. Can you share or reduce the problem to
a small testcase?
-Chris
--
http://nondot.org/sabre/
http://llvm.cs.uiuc.edu/