Hi everyone, llvm is great! But there is one exception ;) llvm components are generally fast, but instruction selection is slooow. Let me explain. I am developing a toolkit for building virtual machines which can automatically generate a JIT compiler using the interpreter specification. llvm does the hard work of machine code generation. (Thanks to you all) I discovered that JIT compilation is taking much longer than had hoped. Some profiling showed that is almost entirely due to the llvm instruction selector (The register allocator and optimisation passes are fast) The split in execution time for JIT compilation being approximately: (My) IR code generation: 1 (llvm) Register allocation: 1 (llvm) Instruction selection: 12 The ladyvm JVM paper noted the same problem: http://vmkit.llvm.org/ladyvm.html So I did a quick experiment with llvm and lcc (http://www.cs.princeton.edu/software/lcc), in order to see how much faster instruction selection could be. lcc uses a BURG-type (called lburg) instruction-selector. The following is for x86/linux (ubuntu) I am interested in JIT performance so I have only counted user+sys time. Compiling a small test program (the dhrystone benchmark): lcc (Fraser and Hanson) (after pre-processing): 4ms. llc : 16ms. Incidentally, optimisation is respectably fast opt -O3 : 16ms. (My machine is quite slow) Using -time-passes shows that almost all of the time spent by llc is in the DAG-to-DAG instruction selection. lcc does lexing, parsing, type-checking, IR code-generation, register allocation AND final code generation in much less time that llvm spends doing instruction selection. Also, there is no way that instruction selection should take as long as -O3 optimisation. It would seem that lcc's lburg instruction selector is at least ten times faster than than llvm, possibly 20 times faster. Does a fast instruction selector for llvm exist? A BURG based instruction selector would improve the JIT compilation capabilities of llvm hugely, and benefit the static compiler as well. A fast compiler is really useful in the compile-test-debug cycle. Sadly, I do not have the time to implement such an instruction selector myself. Cheers, Mark. -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3683 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20081215/22b34198/attachment.bin>
Hi Mark, I don't know the internals of the instruction selection in LLVM (vburg and stuff), but I do know that work has been made to improve it. The "-fast" command line argument in lli and llc generates less optimized machine code quickly. You can also use the local register allocator which is faster than the default linear scan. I did some benchmarks in vmkit (the project that encompass ladyvm) and compilation time is roughly 50% faster with both fast and local register allocator. Nicolas Mark Shannon wrote:> Hi everyone, > > llvm is great! > > But there is one exception ;) > llvm components are generally fast, but instruction selection is slooow. > > Let me explain. > I am developing a toolkit for building virtual machines which can automatically > generate a JIT compiler using the interpreter specification. > llvm does the hard work of machine code generation. (Thanks to you all) > > I discovered that JIT compilation is taking much longer than had hoped. > Some profiling showed that is almost entirely due > to the llvm instruction selector (The register allocator and optimisation passes are fast) > > The split in execution time for JIT compilation being approximately: > (My) IR code generation: 1 > (llvm) Register allocation: 1 > (llvm) Instruction selection: 12 > > The ladyvm JVM paper noted the same problem: > http://vmkit.llvm.org/ladyvm.html > > So I did a quick experiment with llvm and lcc (http://www.cs.princeton.edu/software/lcc), > in order to see how much faster instruction selection could be. > lcc uses a BURG-type (called lburg) instruction-selector. > > The following is for x86/linux (ubuntu) > I am interested in JIT performance so I have only counted user+sys time. > > Compiling a small test program (the dhrystone benchmark): > lcc (Fraser and Hanson) (after pre-processing): 4ms. > llc : 16ms. > Incidentally, optimisation is respectably fast > opt -O3 : 16ms. > > (My machine is quite slow) > > Using -time-passes shows that almost all of the time spent by llc is in the DAG-to-DAG > instruction selection. > > lcc does lexing, parsing, type-checking, IR code-generation, register allocation > AND final code generation in much less time that llvm spends doing instruction selection. > Also, there is no way that instruction selection should take as long as -O3 optimisation. > > It would seem that lcc's lburg instruction selector is at least ten times faster than > than llvm, possibly 20 times faster. > > Does a fast instruction selector for llvm exist? > > A BURG based instruction selector would improve the JIT compilation capabilities of llvm hugely, > and benefit the static compiler as well. > A fast compiler is really useful in the compile-test-debug cycle. > > Sadly, I do not have the time to implement such an instruction selector myself. > > Cheers, > Mark. > > ------------------------------------------------------------------------ > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
On Dec 15, 2008, at 9:14 AM, Mark Shannon wrote:> I discovered that JIT compilation is taking much longer than had > hoped. > Some profiling showed that is almost entirely due > to the llvm instruction selector (The register allocator and > optimisation passes are fast) > > The split in execution time for JIT compilation being approximately: > (My) IR code generation: 1 > (llvm) Register allocation: 1 > (llvm) Instruction selection: 12Interesting. What version of LLVM are you using here?> > > The ladyvm JVM paper noted the same problem: > http://vmkit.llvm.org/ladyvm.html > > So I did a quick experiment with llvm and lcc (http://www.cs.princeton.edu/software/lcc > ), > in order to see how much faster instruction selection could be. > lcc uses a BURG-type (called lburg) instruction-selector. > > The following is for x86/linux (ubuntu) > I am interested in JIT performance so I have only counted user+sys > time. > > Compiling a small test program (the dhrystone benchmark): > lcc (Fraser and Hanson) (after pre-processing): 4ms. > llc : 16ms. > Incidentally, optimisation is respectably fast > opt -O3 : 16ms. > > (My machine is quite slow) > > Using -time-passes shows that almost all of the time spent by llc is > in the DAG-to-DAG > instruction selection. > > lcc does lexing, parsing, type-checking, IR code-generation, > register allocation > AND final code generation in much less time that llvm spends doing > instruction selection. > Also, there is no way that instruction selection should take as long > as -O3 optimisation.Note that these proportions can vary greatly depending on the nature of the input, and dhrystone is a very small amount of code.> > > It would seem that lcc's lburg instruction selector is at least ten > times faster than > than llvm, possibly 20 times faster. > > Does a fast instruction selector for llvm exist?The -fast option directs llc to generate code quickly, at the expense of the quality of the generated code. If your JIT is designed such that JIT compilation time is the dominating factor, this can be a very valuable tradeoff. A lot of work was done in this area for LLVM 2.4.> > > A BURG based instruction selector would improve the JIT compilation > capabilities of llvm hugely, > and benefit the static compiler as well.This may be true, though in the data I've seen currently the selection algorithm itself isn't the most significant bottleneck. FWIW, what -time-passes calls "instruction selection" includes a lot more work than what its name suggests. A fair amount of pre-selection optimization and lowering happens during this phase, as well as scheduling.> A fast compiler is really useful in the compile-test-debug cycle.Yep.> > > Sadly, I do not have the time to implement such an instruction > selector myself.Do you have time to do more detailed profiling? It might be interesting to see which parts of codegen are hot in your use cases. Dan