I've started investigating -O0 -g compile times with llvm-gcc, which
are pretty important for people in development mode (e.g. all debug
builds of llvm itself!).
I've found some interesting things. I'm testing with mainline as of
r52596 in a Release build and with checking disabled in the front-end.
My testcase is a large C++ source file: my friend
InstructionCombining.cpp. I build it the way we normally build it in
debug mode, but with the output redirected to /dev/null:
  time llvm-g++ -I/Users/sabre/llvm/include \
    -I/Users/sabre/llvm/lib/Transforms/Scalar \
    -D_DEBUG -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -g -fno-exceptions \
    -Woverloaded-virtual -pedantic -Wall -W -Wwrite-strings \
    -Wno-long-long -Wunused -Wno-unused-parameter -c -MMD -MP \
    -MF "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.d.tmp" \
    -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.lo" \
    -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.o" \
    -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.d" \
    InstructionCombining.cpp -o /dev/null
One thing that is interesting is that we are significantly slower than
g++-4.2 on this testcase. I'm seeing these timings:
GCC 4.2 -c: 4.27s
GCC 4.2 -S: 3.59s
LLVM4.2 -c: 9.30s
LLVM4.2 -S: 8.40s
One thing I noticed is that with llvm-gcc, the assembler is taking
longer than with gcc 4.2 (.9s vs .68s). This turns out to be because
we produce much larger output than GCC does (sizes in bytes):
gcc.s -> 8943786
llvm.s -> 13424378
gcc.o -> 2055892
llvm.o -> 3044512
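To put those sizes in proportion, here is a quick Python sketch that computes how much larger the llvm-gcc output is. The numbers are copied from the listing above; the names `sizes` and `growth` are just illustrative.

```python
# Output sizes in bytes, copied from the listing above.
sizes = {
    ".s": {"gcc": 8943786, "llvm": 13424378},
    ".o": {"gcc": 2055892, "llvm": 3044512},
}

def growth(gcc_bytes, llvm_bytes):
    """Return the llvm-gcc size as a multiple of the gcc size."""
    return llvm_bytes / gcc_bytes

ratios = {ext: growth(v["gcc"], v["llvm"]) for ext, v in sizes.items()}
for ext, ratio in ratios.items():
    print(f"{ext}: llvm-gcc output is {ratio:.2f}x the size of gcc's")
```

Both the assembly and the object file come out roughly one and a half times the size of what gcc produces.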
Why is this? Let's look at the contents:
$ sdiff -w 120 gcc.size llvm.size
Segment : 1495968                           | Segment : 2211617
Section (__TEXT, __text): 251661            | Section (__TEXT, __text): 290873
Section (__DWARF, __debug_frame): 82752     | Section (__DWARF, __debug_frame): 80240
Section (__DWARF, __debug_info): 671478     | Section (__DWARF, __debug_info): 1240778
Section (__DWARF, __debug_abbrev): 3241     | Section (__DWARF, __debug_abbrev): 1535
Section (__DWARF, __debug_aranges): 48      | Section (__DWARF, __debug_aranges): 0
Section (__DWARF, __debug_macinfo): 0         Section (__DWARF, __debug_macinfo): 0
Section (__DWARF, __debug_line): 126106     | Section (__DWARF, __debug_line): 149797
Section (__DWARF, __debug_loc): 0             Section (__DWARF, __debug_loc): 0
Section (__DWARF, __debug_pubnames): 168873 | Section (__DWARF, __debug_pubnames): 165104
Section (__DWARF, __debug_pubtypes): 32449  |
Section (__DWARF, __debug_str): 17541       | Section (__DWARF, __debug_str): 0
Section (__DWARF, __debug_ranges): 456      | Section (__DWARF, __debug_ranges): 0
Section (__DATA, __const): 100              | Section (__DATA, __const): 136
Section (__TEXT, __cstring): 11543          | Section (__TEXT, __cstring): 12678
Section (__DATA, __data): 64                | Section (__DATA, __data): 76
Section (__DATA, __const_coal): 48          |
Section (__TEXT, __const_coal): 128         |
Section (__DATA, __mod_init_func): 4        | Section (__DATA, __mod_init_func): 4
Section (__DATA, __bss): 32                 | Section (__DATA, __bss): 65
Section (__TEXT, __textcoal_nt): 116324     | Section (__TEXT, __textcoal_nt): 168920
Section (__TEXT, __literal8): 8             | Section (__TEXT, __eh_frame): 88636
Section (__TEXT, __StaticInit): 147         | Section (__TEXT, __StaticInit): 166
Section (__IMPORT, __jump_table): 12790     | Section (__IMPORT, __jump_table): 12410
Section (__IMPORT, __pointers): 136         | Section (__IMPORT, __pointers): 128
total 1495929                               | total 2211546
total 1495968                               | total 2211617
There are several problems here:
1. We're emitting __eh_frame even though the file is being built with
-fno-exceptions: http://llvm.org/PR2481. The excess labels alone give
the assembler a lot more work to do.
2. The __debug_info section is nearly twice as big (1240778 vs 671478
bytes) and the __debug_line section is a bit bigger:
http://llvm.org/PR2482
3. We aren't outputting text or data __const_coal sections. I'm not
sure what these are, but they seem preferable to __textcoal_nt:
http://llvm.org/PR2483
Also, we have no __debug_pubtypes section and emit nothing into
__debug_aranges, __debug_str, or __debug_ranges. I have no idea what
these are, but their absence could be a problem :)
Fixing these is important for a couple of reasons. Generating more
output takes more time, both in the assembler and in the compiler,
which has to push all this data around.
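To see which sections account for most of the growth, the sizes from the sdiff output can be ranked by delta. This is only a rough Python sketch: it lists the larger sections rather than all of them, treats a section that is missing on one side as 0 bytes, and the names `sections`, `segment_growth`, and `deltas` are my own.

```python
# (gcc_bytes, llvm_bytes) per section, copied from the sdiff output.
# A section missing on one side is recorded as 0 (an assumption made
# for this comparison). Only the larger sections are listed.
sections = {
    "__text": (251661, 290873),
    "__debug_info": (671478, 1240778),
    "__debug_line": (126106, 149797),
    "__debug_pubnames": (168873, 165104),
    "__debug_pubtypes": (32449, 0),
    "__debug_str": (17541, 0),
    "__textcoal_nt": (116324, 168920),
    "__eh_frame": (0, 88636),
}

# Difference between the two "Segment" totals.
segment_growth = 2211617 - 1495968

# Rank sections by how many bytes llvm-gcc adds relative to gcc.
deltas = sorted(
    ((name, llvm - gcc) for name, (gcc, llvm) in sections.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, delta in deltas:
    share = delta / segment_growth
    print(f"{name:18} {delta:+8d} bytes ({share:+.1%} of growth)")
```

By this count __debug_info alone accounts for most of the extra bytes, with __eh_frame and __textcoal_nt a distant second and third.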
Moving up from the assembler, according to -ftime-report, our time in
cc1plus is basically going into:
LLVM Passes:
2.65s -> X86 DAG->DAG Instruction Selection (all selectiondag stuff)
0.54s -> X86 AT&T-Style Assembly Printer
0.42s -> Live Variable Analysis
0.19s -> Local Register Allocator
...
C++ Front-end time:
- 2.22s Tree to LLVM translator
- 1.94s parser
- 2.07s name lookup
- 0.66s preprocessor
- 0.20s gimplify
This doesn't add up to 8.4s because -ftime-report adds significant
overhead. It isn't to be trusted, but is a decent indicator.
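As a sanity check on that, here is a small Python sketch that sums just the entries listed above and compares them with the 8.40s measured for -S. The passes elided by "..." are not included, so the real -ftime-report total is higher still; the variable names are illustrative.

```python
# Times in seconds as listed above from -ftime-report.
llvm_passes = {
    "X86 DAG->DAG Instruction Selection": 2.65,
    "X86 AT&T-Style Assembly Printer": 0.54,
    "Live Variable Analysis": 0.42,
    "Local Register Allocator": 0.19,
}
front_end = {
    "Tree to LLVM translator": 2.22,
    "parser": 1.94,
    "name lookup": 2.07,
    "preprocessor": 0.66,
    "gimplify": 0.20,
}

measured = 8.40  # wall-clock time for "LLVM4.2 -S" without -ftime-report

listed_total = sum(llvm_passes.values()) + sum(front_end.values())
excess = listed_total - measured

print(f"listed entries: {listed_total:.2f}s")
print(f"measured -S:    {measured:.2f}s")
print(f"excess:         {excess:.2f}s")
```

Even this partial list overshoots the measured time, which fits with the timers themselves adding overhead.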
From this, it looks like there is significant room for improvement in
many of the LLVM pieces. The two that stick out are the tree to llvm
translator and the selection dag related stuff. However, even the
asmprinter is taking a significant amount of time. This is partially
because it has to output a ton of stuff, but even then it could be
improved.
For example, picking on the frontend for a bit, we spend 10% of
"-emit-llvm -O0 -g -c" time in DebugInfo::EmitFunctionStart, most of
which is spent recursively walking the debug info with DISerializer.
We also spend 9.3% of the time in DebugInfo::EmitDeclare, 10% in
eraseLocalLLVMValues, 12% writing the .bc file (which isn't relevant
to normal use), and 21% parsing (which we can't help).
Anyone interested in picking off a piece and tackling it?
-Chris